Changeset 7418:dc66f4fdfe2f in orange


Timestamp:
02/04/11 12:26:25 (3 years ago)
Author:
anze <anze.staric@…>
Branch:
default
Convert:
42cb7723fb7beecc4f8fcfdd556364f157fe71d2
Message:

Fixed bayes regression tests

Location:
orange
Files:
3 edited

  • orange/Orange/classification/bayes.py

    r7375 r7418  
    11"""  
    2    index:: naive bayes 
    3  
    4 ========================= 
    5 Naive Bayesian Classifier 
    6 ========================= 
    7  
    8 The most primitive bayesian classifier is :obj:`NaiveLearner`. The class 
    9 estimates conditional probabilities from train data and uses them for 
    10 prediction of new examples.  
     2   index:: naive Bayes classifier 
     3    
     4.. index::  
     5   single: classification; naive Bayes classifier 
     6 
     7====================== 
     8Naive Bayes Classifier 
     9====================== 
     10 
     11The most primitive Bayesian classifier is :obj:`NaiveLearner`.  
     12(http://en.wikipedia.org/wiki/Naive_Bayes_classifier) 
     13The class estimates conditional probabilities from training data and uses them 
     14for classification of new examples.  
    1115 
    1216Example (`bayes-run.py`_, uses `iris.tab`_) 
     
    2630Examples 
    2731======== 
     32Example (`bayes-run.py`_, uses `iris.tab`_) 
     33 
     34.. literalinclude:: code/bayes-run.py 
     35    :lines: 7- 
     36     
    2837Let us load the data, induce a classifier and see how it performs on the first 
    2938five examples. 
     
    4554 
    4655>>> for ex in table[:5]: 
    47 ...     print ex.getclass(), bayes(ex, Orange.classification.Classifier.GetProbabilities) 
     56...     print ex.getclass(), bayes(ex, \ 
     57Orange.classification.Classifier.GetProbabilities) 
    4858no <0.423, 0.000, 0.577> 
    4959no <0.000, 0.000, 1.000> 
     
    8595 
    8696>>> for ex in table[:5]: 
    87 ...     print ex.getclass(), bayes(ex, Orange.classification.Classifier.GetBoth)  
     97...     print ex.getclass(), bayes(ex, \ 
     98Orange.classification.Classifier.GetBoth) 
    8899no <0.375, 0.063, 0.562>; 
    89100no <0.016, 0.003, 0.981> 
     
    105116 
    106117The reason for this is that this same distribution was used as the a priori 
    107 distribution for m-estimation. (How to enforce another apriori distribution? 
    108 While the orange C++ core supports of it, this feature has not been exported 
    109 to Python yet.) 
     118distribution for m-estimation. 
    110119 
    111120Finally, let us show an example with continuous attributes. We will take iris 
     
    115124>>> bayes = orange.BayesLearner(table) 
    116125>>> for exi in range(0, len(table), 20): 
    117 ...     print data[exi].getclass(), bayes(table[exi], orange.Classifier.GetBoth) 
     126...     print data[exi].getclass(), bayes(table[exi], \ 
     127orange.Classifier.GetBoth) 
    118128 
    119129The classifier works well. To see a glimpse of how it works, let us observe 
    120130conditional distributions for the first attribute. It is stored in 
    121 conditionalDistributions, as before, except that it now behaves as a dictionary, 
    122 not as a list like before (see information on distributions. 
     131conditionalDistributions, as before, except that it now behaves as a 
      132dictionary, not as a list like before (see information on distributions). 
    123133 
    124134>>> print bayes.conditionalDistributions[0] 
    125 <4.300: <0.837, 0.137, 0.026>;, 4.333: <0.834, 0.140, 0.026>, 4.367: <0.830, (...) 
     135<4.300: <0.837, 0.137, 0.026>;, 4.333: <0.834, 0.140, 0.026>, 4.367: <0.830, \ 
     136(...) 
    126137 
    127138For a nicer picture, we can print out the probabilities, copy and paste it to 
     
    137148(...) 
    138149 
    139 If petal lengths are shorter, the most probable class is "setosa". Irises with middle petal lengths belong to "versicolor", while longer petal lengths indicate for "virginica". Critical values where the decision would change are at about 5.4 and 6.3. 
    140  
    141 It is important to stress that the curves are relatively smooth although no fitting (either manual or automatic) of parameters took place. 
     150If petal lengths are shorter, the most probable class is "setosa". Irises with 
     151middle petal lengths belong to "versicolor", while longer petal lengths 
      152indicate "virginica". Critical values where the decision would change are 
     153at about 5.4 and 6.3. 
     154 
     155It is important to stress that the curves are relatively smooth although no 
     156fitting (either manual or automatic) of parameters took place. 
    142157 
    143158 
    144159.. _bayes-run.py: code/bayes-run.py 
    145160.. _iris.tab: code/iris.tab 
     161 
     162====================== 
     163Implementation Details 
     164====================== 
     165 
     166Orange.core.BayesLearner 
     167======================== 
     168The first three fields are empty (None) by default. 
     169 
     170If estimatorConstructor is left undefined, p(C) will be estimated by relative 
     171frequencies of examples (see ProbabilityEstimatorConstructor_relative). 
     172When conditionalEstimatorConstructor is left undefined, it will use the same 
     173constructor as for estimating unconditional probabilities (estimatorConstructor 
      174is used as an estimator in ConditionalProbabilityEstimatorConstructor_ByRows). 
     175That is, by default, both will use relative frequencies. But when 
     176estimatorConstructor is set to, for instance, estimate probabilities by 
     177m-estimate with m=2.0, m-estimates with m=2.0 will be used for estimation of 
     178conditional probabilities, too. 
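The effect of the m-estimate mentioned above can be sketched in plain Python. This is a generic illustration of the m-estimate formula (n_c + m·p_a)/(N + m), not Orange's internal implementation; the counts and prior below are made up:

```python
def m_estimate(n_class, n_total, prior, m=2.0):
    # m-estimate of probability: (n_c + m * p_a) / (N + m),
    # where p_a is the a priori class probability and m controls
    # how strongly the estimate is pulled toward that prior
    return (n_class + m * prior) / (n_total + m)

# hypothetical counts: 3 of 5 examples belong to the class, prior 0.5
print(m_estimate(3, 5, 0.5, m=2.0))  # (3 + 1) / 7 = 4/7
```

With m=0 the estimate reduces to the relative frequency 3/5; larger m pulls it toward the prior 0.5.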
      179Probabilities p(C|vi) for continuous attributes are, by default, estimated with loess (a 
     180variant of locally weighted linear regression), using 
     181ConditionalProbabilityEstimatorConstructor_loess. 
     182The learner first constructs an estimator for p(C). It tries to get a 
     183precomputed distribution of probabilities; if the estimator is capable of 
     184returning it, the distribution is stored in the classifier's field distribution 
     185and the just constructed estimator is disposed. Otherwise, the estimator is 
     186stored in the classifier's field estimator, while the distribution is left 
     187empty. 
     188 
     189The same is then done for conditional probabilities. Different constructors are 
     190used for discrete and continuous attributes. If the constructed estimator can 
     191return all conditional probabilities in form of Contingency, the contingency is 
     192stored and the estimator disposed. If not, the estimator is stored. If there 
     193are no contingencies when the learning is finished, the resulting classifier's 
     194conditionalDistributions is None. Alternatively, if all probabilities are 
     195stored as contingencies, the conditionalEstimators fields is None. 
     196 
     197Field normalizePredictions is copied to the resulting classifier. 
     198 
     199Orange.core.BayesClassifier 
     200=========================== 
     201Class NaiveClassifier represents a naive Bayesian classifier. Probability of 
     202class C, knowing that values of features :math:`F_1, F_2, ..., F_n` are 
     203:math:`v_1, v_2, ..., v_n`, is computed as :math:`p(C|v_1, v_2, ..., v_n) = \ 
     204p(C) \\cdot \\frac{p(C|v_1)}{p(C)} \\cdot \\frac{p(C|v_2)}{p(C)} \\cdot ... \ 
     205\\cdot \\frac{p(C|v_n)}{p(C)}`. 
     206 
     207Note that when relative frequencies are used to estimate probabilities, the 
     208more usual formula (with factors of form :math:`\\frac{p(v_i|C)}{p(v_i)}`) and 
     209the above formula are exactly equivalent (without any additional assumptions of 
      210independence, as one might think at first glance). The difference becomes 
     211important when using other ways to estimate probabilities, like, for instance, 
     212m-estimate. In this case, the above formula is much more appropriate.  
     213 
     214When computing the formula, probabilities p(C) are read from distribution which 
     215is of type Distribution and stores a (normalized) probability of each class. 
     216When distribution is None, BayesClassifier calls estimator to assess the 
     217probability. The former method is faster and is actually used by all existing 
     218methods of probability estimation. The latter is more flexible. 
     219 
     220Conditional probabilities are computed similarly. Field conditionalDistribution 
     221is of type DomainContingency which is basically a list of instances of 
     222Contingency, one for each attribute; the outer variable of the contingency is 
     223the attribute and the inner is the class. Contingency can be seen as a list of 
     224normalized probability distributions. For attributes for which there is no 
     225contingency in conditionalDistribution a corresponding estimator in 
     226conditionalEstimators is used. The estimator is given the attribute value and 
     227returns distributions of classes. 
     228 
      229If neither a pre-computed contingency nor a conditional estimator exists, the 
     230attribute is ignored without issuing any warning. The attribute is also ignored 
      231if its value is undefined; this cannot be overridden by estimators. 
     232 
     233Any field (distribution, estimator, conditionalDistributions, 
     234conditionalEstimators) can be None. For instance, BayesLearner normally 
     235constructs a classifier which has either distribution or estimator defined. 
      236While it is not an error to have both, only distribution will be used in that 
     237case. As for the other two fields, they can be both defined and used 
     238complementarily; the elements which are missing in one are defined in the 
     239other. However, if there is no need for estimators, BayesLearner will not 
     240construct an empty list; it will not construct a list at all, but leave the 
     241field conditionalEstimators empty. 
     242 
      243If you only need the probability of an individual class, call BayesClassifier's 
     244method p(class, example) to compute the probability of this class only. Note 
     245that this probability will not be normalized and will thus, in general, not 
     246equal the probability returned by the call operator. 
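How an unnormalized per-class score relates to the normalized probabilities returned by the call operator can be sketched as follows (the priors and conditional probabilities are hypothetical, and this is not Orange's actual code):

```python
def naive_bayes_score(prior, cond_probs):
    # unnormalized class score: p(C) * prod(p(C|v_i) / p(C))
    score = prior
    for p in cond_probs:
        score *= p / prior
    return score

# hypothetical two-class, two-attribute case
scores = {
    "yes": naive_bayes_score(0.6, [0.7, 0.8]),
    "no":  naive_bayes_score(0.4, [0.3, 0.2]),
}
# the call operator would additionally normalize over classes
total = sum(scores.values())
normalized = {c: s / total for c, s in scores.items()}
```

The raw score for a single class generally differs from its normalized probability, which is why p(class, example) need not equal the value obtained from the call operator.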
    146247""" 
    147248 
     
    197298    """ 
    198299     
    199     def __new__(cls, examples = None, weightID = 0, **argkw): 
     300    def __new__(cls, instances = None, weightID = 0, **argkw): 
    200301        self = Orange.classification.Learner.__new__(cls, **argkw) 
    201         if examples: 
     302        if instances: 
    202303            self.__init__(**argkw) 
    203             return self.__call__(examples, weightID) 
     304            return self.__call__(instances, weightID) 
    204305        else: 
    205306            return self 
     
    247348class NaiveClassifier(Orange.classification.Classifier): 
    248349    """ 
    249     Predictor based on calculated probabilities 
    250      
    251     :param baseClassifier: 
    252     :type: 
     350    Predictor based on calculated probabilities. It wraps an 
     351    :class:`Orange.core.BayesClassifier` that does the actual classification. 
     352     
     353    :param baseClassifier: an :class:`Orange.core.BayesLearner` to wrap. If 
     354            not set, a new :class:`Orange.core.BayesLearner` is created. 
     355    :type baseClassifier: :class:`Orange.core.BayesLearner` 
    253356     
    254357    :var distribution: Stores probabilities of classes, i.e. p(C) for each 
     
    301404        returned from __call__ 
    302405         
    303         :param class_: 
    304         :type class_: 
     406        :param class_: class variable for which the probability should be 
      407                output 
      408        :type class_: :class:`Orange.data.Variable` 
    305409        :param instance: instance to be classified 
    306410        :type instance: :class:`Orange.data.Instance` 
  • orange/doc/Orange/rst/code/bayes-run.py

    r7368 r7418  
    1010learner = Orange.classification.bayes.NaiveLearner() 
    1111classifier = learner(table) 
    12 prediction = classifier(table[0]) 
     12 
     13for ex in table[:5]: 
     14    print ex.getclass(), classifier(ex) 
  • orange/doc/Orange/rst/code/selection-bayes.py

    r7319 r7418  
    1 # Description: Compares naive Bayes with and withouth feature subset selection 
     1# Description: Compares naive Bayes with and without feature subset selection 
    22# Category:    feature selection 
    33# Uses:        voting.tab 
     
    2323        ma = orngFSS.attMeasure(table) 
    2424        filtered = orngFSS.selectBestNAtts(table, ma, self.N) 
    25         model = Orange.classification.bayes.NaiveBayesLearner(filtered) 
     25        model = Orange.classification.bayes.NaiveLearner(filtered) 
    2626        return BayesFSS_Classifier(classifier=model, N=self.N, name=self.name) 
    2727 
     
    3636import orngStat, orngTest 
    3737table = Orange.data.Table("voting") 
    38 learners = (Orange.classification.bayes.NaiveBayesLearner(name='Naive Bayes'), 
     38learners = (Orange.classification.bayes.NaiveLearner(name='Naive Bayes'), 
    3939            BayesFSS(name="with FSS")) 
    4040results = orngTest.crossValidation(learners, table) 