Changeset 7252:3a737f288391 in orange


Timestamp:
02/02/11 22:05:52 (3 years ago)
Author:
anze <anze.staric@…>
Branch:
default
Convert:
2a2511d74e3fa22469386886c75e8fea8f4aa8f2
Message:

updated bayes documentation

Location:
orange
Files:
2 edited

  • orange/Orange/classification/bayes.py

    r7225 r7252  
    77 
    88.. index:: Naive Bayesian Learner 
    9 .. autoclass:: Orange.classification.bayes.NaiveBayes 
     9.. autoclass:: Orange.classification.bayes.NaiveBayesLearner 
    1010   :members: 
    1111    
     12.. autoclass:: Orange.classification.bayes.NaiveBayesClassifier 
     13   :members: 
     14    
     15Examples 
     16======== 
     17Let us load the data, induce a classifier and see how it performs on the first 
     18five examples. 
     19 
     20>>> data = orange.ExampleTable("lenses") 
     21>>> bayes = orange.BayesLearner(data) 
     22>>> 
     23>>> for ex in data[:5]: 
     24...    print ex.getclass(), bayes(ex) 
     25no no 
     26no no 
     27soft soft 
     28no no 
     29hard hard 
     30 
     31The classifier is correct in all five cases. Interested in probabilities, 
     32maybe? 
     33 
     34>>> for ex in data[:5]: 
     35...     print ex.getclass(), bayes(ex, orange.Classifier.GetProbabilities) 
     36no <0.423, 0.000, 0.577> 
     37no <0.000, 0.000, 1.000> 
     38soft <0.000, 0.668, 0.332> 
     39no <0.000, 0.000, 1.000> 
     40hard <0.715, 0.000, 0.285> 
     41 
     42While very confident about the second and the fourth example, the classifier 
     43guessed the correct class of the first one only by a small margin of 42 vs. 
     4458 percent. 
     45 
     46Now, let us peek into the classifier. 
     47 
     48>>> print bayes.estimator 
     49None 
     50>>> print bayes.distribution 
     51<0.167, 0.208, 0.625> 
     52>>> print bayes.conditionalEstimators 
     53None 
     54>>> print bayes.conditionalDistributions[0] 
     55<'young': <0.250, 0.250, 0.500>, 'p_psby': <0.125, 0.250, 0.625>, (...) 
     56>>> bayes.conditionalDistributions[0]["young"] 
     57<0.250, 0.250, 0.500> 
     58 
     59The classifier has no estimator since probabilities are stored in distribution. 
     60The probability of the first class is 0.167, of the second 0.208 and the 
     61probability of the third class is 0.625. Nor does it have 
     62conditionalEstimators; those probabilities are stored in conditionalDistributions. 
     63We printed the contingency matrix for the first attribute and, in the last 
     64line, conditional probabilities of the three classes when the value of the 
     65first attribute is "young". 
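How stored distributions like these can combine into a prediction is sketched below. This is a minimal, self-contained illustration and not Orange's code: the function name `posterior` is hypothetical, and the formula P(c) * prod_i P(c|a_i) / P(c) is one common way of writing naive Bayes in terms of tables of P(class | value); the core implementation may differ in detail. The prior is written as 4/24, 5/24, 15/24 on the assumption that these are the fractions behind the printed 0.167, 0.208, 0.625, and the second conditional distribution is made up for illustration.

```python
def posterior(prior, conditionals):
    """Combine a class prior with per-attribute distributions P(class | value).

    Each conditional contributes the factor P(c | a_i) / P(c); the product is
    multiplied by the prior P(c) and normalized so the result sums to 1.
    """
    scores = []
    for c, p_c in enumerate(prior):
        score = p_c
        for cond in conditionals:
            # A class with zero prior keeps a zero score (avoids dividing by 0).
            score = score * cond[c] / p_c if p_c > 0 else 0.0
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


prior = [4 / 24, 5 / 24, 15 / 24]   # assumed fractions behind <0.167, 0.208, 0.625>
young = [0.250, 0.250, 0.500]       # conditionalDistributions[0]["young"]
other = [0.100, 0.300, 0.600]       # hypothetical value of a second attribute

single = posterior(prior, [young])          # one attribute only
combined = posterior(prior, [young, other]) # two attributes
```

With a single attribute the prior cancels out and the result is just the conditional distribution for that value; every further attribute rescales it by how much its own conditional departs from the prior.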
     66 
     67Let us now use m-estimate instead of relative frequencies. 
     68 
     69>>> bayesl = orange.BayesLearner() 
     70>>> bayesl.estimatorConstructor = orange.ProbabilityEstimatorConstructor_m(m=2.0) 
     71>>> bayes = bayesl(data) 
     72 
     73The classifier is still correct for all examples. 
     74 
     75>>> for ex in data[:5]: 
     76...     print ex.getclass(), bayes(ex, orange.Classifier.GetProbabilities) 
       no <0.375, 0.063, 0.562> 
     77no <0.016, 0.003, 0.981> 
     78soft <0.021, 0.607, 0.372> 
     79no <0.001, 0.039, 0.960> 
     80hard <0.632, 0.030, 0.338> 
     81 
     82Observing probabilities shows a shift towards the third, more frequent class - 
     83as compared to probabilities above, where relative frequencies were used. 
     84 
     85>>> print bayes.conditionalDistributions[0] 
     86<'young': <0.233, 0.242, 0.525>, 'p_psby': <0.133, 0.242, 0.625>, (...) 
     87 
     88Note that the change in probability estimation did not have any effect on a priori 
     89probabilities: 
     90 
     91>>> print bayes.distribution 
     92<0.167, 0.208, 0.625> 
     93 
     94The reason for this is that this same distribution was used as the a priori 
     95distribution for m-estimation. (How to enforce another a priori distribution? 
     96While the orange C++ core supports it, this feature has not been exported 
     97to Python yet.) 
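The effect of the m-estimate on the numbers above can be checked by hand. The sketch below assumes the usual m-estimate formula, (n_c + m * prior_c) / (n + m), and assumes the raw counts behind the printed relative frequencies: 24 instances with class counts 4, 5, 15, of which 8 are "young" with class counts 2, 2, 4. These counts are inferred from the printed distributions, not read from the data file, and the helper name `m_estimate` is hypothetical.

```python
def m_estimate(counts, prior, m):
    """Return (n_c + m * prior_c) / (n + m) for every class c."""
    n = sum(counts)
    return [(n_c + m * p_c) / (n + m) for n_c, p_c in zip(counts, prior)]


class_counts = [4, 5, 15]                       # assumed overall class counts
prior = [c / sum(class_counts) for c in class_counts]

young_counts = [2, 2, 4]                        # assumed class counts for "young"

relative = m_estimate(young_counts, prior, m=0)    # plain relative frequencies
smoothed = m_estimate(young_counts, prior, m=2.0)  # pulled towards the prior

print(["%.3f" % p for p in relative])
print(["%.3f" % p for p in smoothed])
```

Under these assumptions, m=0 gives back the relative frequencies and m=2.0 moves each value towards the a priori distribution, matching the shift seen in conditionalDistributions[0]. Estimating the class distribution itself with its own relative frequencies as the prior leaves it unchanged, which is why bayes.distribution prints the same values in both cases.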
     98 
     99Finally, let us show an example with continuous attributes. We will take the iris 
     100dataset, which contains four continuous and no discrete attributes. 
     101 
     102>>> data = orange.ExampleTable("iris") 
     103>>> bayes = orange.BayesLearner(data) 
     104>>> for exi in range(0, len(data), 20): 
     105...     print data[exi].getclass(), bayes(data[exi], orange.Classifier.GetBoth) 
     106 
     107The classifier works well. To get a glimpse of how it works, let us observe 
     108the conditional distributions for the first attribute. They are stored in 
     109conditionalDistributions, as before, except that it now behaves as a dictionary, 
     110not as a list like before (see the information on distributions). 
     111 
     112>>> print bayes.conditionalDistributions[0] 
     113<4.300: <0.837, 0.137, 0.026>, 4.333: <0.834, 0.140, 0.026>, 4.367: <0.830, (...) 
     114 
     115For a nicer picture, we can print out the probabilities, copy and paste them into 
     116some graph drawing program ... and get something like the figure below. 
     117 
     118>>> for x, probs in bayes.conditionalDistributions[0].items(): 
     119...     print "%5.3f\t%5.3f\t%5.3f\t%5.3f" % (x, probs[0], probs[1], probs[2]) 
     1204.300   0.837   0.137   0.026 
     1214.333   0.834   0.140   0.026 
     1224.367   0.830   0.144   0.026 
     1234.400   0.826   0.147   0.027 
     1244.433   0.823   0.150   0.027 
     125(...) 
     126 
     127If sepal lengths are shorter, the most probable class is "setosa". Irises with middle sepal lengths belong to "versicolor", while longer sepal lengths indicate "virginica". Critical values where the decision would change are at about 5.4 and 6.3. 
     128 
     129It is important to stress that the curves are relatively smooth although no fitting (either manual or automatic) of parameters took place. 
     130 
    12131""" 
    13132 
     
    16135from Orange.core import BayesLearner as _BayesLearner 
    17136 
    18 class NaiveBayes(Orange.core.Learner): 
    19     """ 
    20     Naive bayes learner 
     137class NaiveBayesLearner(Orange.core.Learner): 
     138    """ 
     139    Probabilistic classifier based on applying Bayes' theorem (from Bayesian 
     140    statistics) with strong (naive) independence assumptions. 
     141    If data instances are provided to the constructor, the learning algorithm 
     142    is called and the resulting classifier is returned instead of the learner. 
     143     
     144    :param adjustTreshold: If set and the class is binary, the classifier's 
     145            threshold will be set so as to optimize the classification accuracy. 
     146            The threshold is tuned by observing the probabilities predicted on 
     147            learning data. Setting it to True can increase the 
     148            accuracy considerably. 
     149    :type adjustTreshold: boolean 
     150    :param m: m for m-estimate. If set, probabilities are estimated with 
     151            :class:`orange.ProbabilityEstimatorConstructor_m`. 
     152            This attribute is ignored if you also set estimatorConstructor. 
     153    :type m: integer 
     154    :param estimatorConstructor: Probability estimator constructor for 
     155            prior class probabilities. Defaults to 
     156            :class:`orange.ProbabilityEstimatorConstructor_relative`. 
     157            Setting this attribute disables the above described attribute m. 
     158    :type estimatorConstructor: orange.ProbabilityEstimatorConstructor 
     159    :param conditionalEstimatorConstructor: Probability estimator constructor 
     160            for conditional probabilities for discrete features. If omitted, 
     161            the estimator for prior probabilities will be used. 
     162    :type conditionalEstimatorConstructor: orange.ConditionalProbabilityEstimatorConstructor 
     163    :param conditionalEstimatorConstructorContinuous: Probability estimator constructor 
     164            for conditional probabilities for continuous features. Defaults to 
     165            :class:`orange.ConditionalProbabilityEstimatorConstructor_loess` 
     166    :type conditionalEstimatorConstructorContinuous: orange.ConditionalProbabilityEstimatorConstructor 
     167    :rtype: :class:`Orange.classification.bayes.NaiveBayesLearner` or 
     168            :class:`Orange.classification.bayes.NaiveBayesClassifier`  
    21169    """ 
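The threshold tuning described for adjustTreshold can be pictured with the following sketch. It is an illustration under stated assumptions, not the procedure used by the Orange core: the function `best_threshold` is hypothetical, candidate thresholds are taken as midpoints between consecutive predicted probabilities, and accuracy is measured on the same (learning) data, as the docstring describes.

```python
def best_threshold(probs, labels):
    """Pick the cut on P(positive class) that maximizes accuracy.

    probs  -- predicted probabilities of the positive class on learning data
    labels -- true classes, 1 for positive and 0 for negative
    Returns a (threshold, accuracy) pair; an instance is predicted positive
    when its probability is strictly greater than the threshold.
    """
    distinct = sorted(set(probs))
    # Candidate cuts: both extremes plus midpoints between neighbouring values.
    cuts = [0.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])] + [1.0]
    best = (0.5, -1.0)
    for t in cuts:
        hits = sum((p > t) == bool(y) for p, y in zip(probs, labels))
        accuracy = hits / len(labels)
        if accuracy > best[1]:
            best = (t, accuracy)
    return best


# Hypothetical predictions in which the default cut at 0.5 misses two positives.
probs = [0.1, 0.2, 0.3, 0.4, 0.8]
labels = [0, 0, 1, 1, 1]
threshold, accuracy = best_threshold(probs, labels)
```

Because the cut is chosen on the learning data, the accuracy it reports there is optimistic; the gain should be confirmed on separate test data.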
    22170     
     
    29177            return self 
    30178         
    31     def __init__(self, normalizePredictions=True, adjustTreshold=False, 
    32                  m=0, estimatorConstructor=None, conditionalEstimatorConstructor=None, 
     179    def __init__(self, adjustTreshold=False, m=0, estimatorConstructor=None, 
     180                 conditionalEstimatorConstructor=None, 
    33181                 conditionalEstimatorConstructorContinuous=None,**argkw): 
    34         """ 
    35         :param adjustTreshold: If set and the class is binary, the classifier's 
    36                 threshold will be set as to optimize the classification accuracy. 
    37                 The threshold is tuned by observing the probabilities predicted on 
    38                 learning data. Default is False (to conform with the usual naive 
    39                 bayesian classifiers), but setting it to True can increase the 
    40                 accuracy considerably. 
    41         :type adjustTreshold: boolean 
    42         :param m: m for m-estimate. If set, m-estimation of probabilities 
    43                 will be used using :class:`orange.ProbabilityEstimatorConstructor_m` 
    44                 This attribute is ignored if you also set estimatorConstructor. 
    45         :type m: integer 
    46         :param estimatorConstructor: Probability estimator constructor for 
    47                 prior class probabilities. Defaults to 
    48                 :`class:orange.ProbabilityEstimatorConstructor_relative` 
    49                 Setting this attribute disables the above described attribute m. 
    50         :type estimatorConstructor: orange.ProbabilityEstimatorConstructor 
    51         :param conditionalEstimatorConstructor: Probability estimator constructor 
    52                 for conditional probabilities for discrete features. If omitted, 
    53                 the estimator for a priori will be used. 
    54                 class probabilities. 
    55         :type conditionalEstimatorConstructor: orange.ConditionalProbabilityEstimatorConstructor 
    56         :param conditionalEstimatorConstructorContinuous: Probability estimator constructor 
    57                 for conditional probabilities for continuous features. Defaults to 
    58                 :class:`orange.ConditionalProbabilityEstimatorConstructor_loess` 
    59         :type conditionalEstimatorConstructorContinuous: orange.ConditionalProbabilityEstimatorConstructor 
    60         """ 
    61182        self.adjustThreshold = adjustTreshold 
    62183        self.m = m 
     
    66187        self.__dict__.update(argkw) 
    67188 
    68     def __call__(self, examples, weight=0): 
     189    def __call__(self, instances, weight=0): 
     190        """Learn from the given table of data instances. 
     191         
     192        :param instances: Data instances to learn from. 
     193        :type instances: Orange.data.Table 
     194        :param weight: Id of meta attribute with weights of instances 
     195        :type weight: integer 
     196        :rtype: :class:`Orange.classification.bayes.NaiveBayesClassifier` 
     197        """ 
    69198        bayes = _BayesLearner() 
    70199        if self.estimatorConstructor: 
     
    86215            bayes.conditionalEstimatorConstructorContinuous = self.conditionalEstimatorConstructorContinuous 
    87216             
    88         return NaiveBayesClassifier(bayes(examples, weight)) 
     217        return NaiveBayesClassifier(bayes(instances, weight)) 
    89218             
    90219class NaiveBayesClassifier(Orange.core.Classifier): 
    91     def __init__(self, nbc): 
    92         self.nativeBayesClassifier = nbc 
     220    """ 
     221    Wraps a native BayesClassifier to add a print method. 
     222    :param nativeBayesClassifier: the native classifier to wrap 
     223    """ 
     224     
     225    def __init__(self, nativeBayesClassifier): 
     226        self.nativeBayesClassifier = nativeBayesClassifier 
    93227        for k, v in self.nativeBayesClassifier.__dict__.items(): 
    94228            self.__dict__[k] = v 
    95229   
    96     def __call__(self, *args, **kwdargs): 
    97         self.nativeBayesClassifier(*args, **kwdargs) 
     230    def __call__(self, instance, *args, **kwdargs): 
     231        """Classify a new instance 
     232        :param instance: instance to be classified 
     233        :type instance: :class:`Orange.data.Instance` 
     234        :rtype: :class:Orange.data.` 
     235        """ 
     236        return self.nativeBayesClassifier(instance, *args, **kwdargs) 
    98237 
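A wrapper of this kind only behaves like a classifier if its __call__ hands the wrapped object's result back to the caller; without a return statement every call evaluates to None. The sketch below shows the delegation pattern in isolation. Both classes are hypothetical stand-ins and do not use Orange.

```python
class NativeStub(object):
    """Hypothetical stand-in for a native classifier object."""

    def __call__(self, instance, *args, **kwargs):
        return "class-of-%s" % instance


class Wrapper(object):
    """Forwards calls to the wrapped object and returns its answer."""

    def __init__(self, native):
        self.native = native

    def __call__(self, instance, *args, **kwargs):
        # The return is essential: it passes the prediction to the caller.
        return self.native(instance, *args, **kwargs)


wrapped = Wrapper(NativeStub())
result = wrapped("x")
```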
    99238    def __setattr__(self, name, value): 
     
    105244        self.__dict__[name] = value 
    106245     
     246    def p(self, class_, instance): 
     247        """Return the probability of a single class. 
     248         
     249        The probability is not normalized and can differ from the probability 
     250        returned by __call__. 
     251        """ 
     252        return self.nativeBayesClassifier.p(class_, instance) 
    107253     
    108254    def printModel(self): 
     255        """Print the classifier in a human-friendly format.""" 
    109256        nValues=len(self.classVar.values) 
    110257        frmtStr=' %10.3f'*nValues 
  • orange/doc/Orange/rst/code/bayes-run.py

    r7225 r7252  
    1 # Description: Self Organizing Maps on iris data set 
    2 # Category:    modelling 
    3 # Uses:        iris 
    4 # Referenced:  orngSOM.htm 
    5 # Classes:     orngSOM.SOMLearner 
    6  
    71import Orange 
    82som = Orange.projection.som.SOMLearner(map_shape=(10, 20), initialize=Orange.projection.som.InitializeRandom) 