Changeset 8107:5c3d5f9d69bf in orange


Timestamp: 07/21/11 13:21:12
Author: anze <anze.staric@…>
Branch: default
Convert: 79d5678cf1692ce33d8e6b94685e9b17bf0f462d
Message:

Fixed casing of attributes in documentation and output of models with continuous features.
Closes #796

File: 1 edited

  • orange/Orange/classification/bayes.py

    r8042 r8107  
    1 """  
    2 .. index:: naive Bayes classifier 
    3     
    4 .. index::  
    5    single: classification; naive Bayes classifier 
    6  
    7 ********************************** 
    8 Naive Bayes classifier (``bayes``) 
    9 ********************************** 
    10  
    11 The most primitive Bayesian classifier is :obj:`NaiveLearner`.  
    12 `Naive Bayes classification algorithm <http://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_  
    13 estimates conditional probabilities from training data and uses them 
    14 for classification of new data instances. The algorithm learns very fast if all features 
    15 in the training data set are discrete. If some features are continuous, though, the
    16 algorithm runs slower because of the time spent estimating continuous conditional distributions.
    17  
    18 The following example demonstrates a straightforward invocation of 
    19 this algorithm (`bayes-run.py`_, uses `titanic.tab`_): 
    20  
    21 .. literalinclude:: code/bayes-run.py 
    22    :lines: 7- 
    23  
    24 .. index:: Naive Bayesian Learner 
    25 .. autoclass:: Orange.classification.bayes.NaiveLearner 
    26    :members: 
    27    :show-inheritance: 
    28   
    29 .. autoclass:: Orange.classification.bayes.NaiveClassifier 
    30    :members: 
    31    :show-inheritance: 
    32     
    33     
    34 Examples 
    35 ======== 
    36  
    37 :obj:`NaiveLearner` can estimate probabilities using relative frequencies or 
    38 m-estimate (`bayes-mestimate.py`_, uses `lenses.tab`_): 
    39  
    40 .. literalinclude:: code/bayes-mestimate.py 
    41     :lines: 7- 
    42  
    43 Observing conditional probabilities in an m-estimate based classifier shows a
    44 shift towards the second class, as compared to the probabilities above, where
    45 relative frequencies were used. Note that the change in probability estimation
    46 did not affect the a priori probabilities. The next example adjusts the
    47 classification threshold (`bayes-thresholdAdjustment.py`_, uses `adult-sample.tab`_):
    48  
    49 .. literalinclude:: code/bayes-thresholdAdjustment.py 
    50     :lines: 7- 
    51      
    52 Setting the adjustThreshold parameter can sometimes improve the results. These are
    53 the classification accuracies from 10-fold cross-validation of a normal naive
    54 Bayesian classifier and of one with an adjusted threshold::
    55  
    56     [0.7901746265516516, 0.8280138859667578] 
    57  
    58 Probabilities for continuous features are estimated with \ 
    59 :class:`ProbabilityEstimatorConstructor_loess`
    60 (`bayes-plot-iris.py`_, uses `iris.tab`_): 
    61  
    62 .. literalinclude:: code/bayes-plot-iris.py 
    63     :lines: 4- 
    64      
    65 .. image:: code/bayes-iris.png 
    66    :scale: 50 % 
    67  
    68 For short petal lengths, the most probable class is "setosa". Irises with
    69 medium petal lengths are classified as "versicolor", while longer petal lengths
    70 indicate "virginica". The critical values at which the decision changes are
    71 about 5.4 and 6.3.
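The curves in the plot are class probabilities p(C|x) along one continuous feature. The sketch below shows, under stated assumptions, how such a curve can be estimated locally. It is a deliberately simpler stand-in and not Orange's estimator: it uses a local-constant (Nadaraya-Watson style) average with a tricube kernel, whereas loess fits local linear regressions. The function name `local_class_probs` and its `window` parameter are hypothetical.

```python
from collections import defaultdict


def local_class_probs(xs, ys, x0, window=0.5):
    """Hypothetical sketch: estimate p(C|x0) from (x, class) pairs with a
    tricube-weighted local average (local-constant smoothing, NOT loess)."""
    # Bandwidth h: distance to the k-th nearest point, k = window * len(xs).
    k = min(len(xs), max(1, int(round(window * len(xs)))))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    weights = defaultdict(float)
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u < 1.0:
            weights[y] += (1.0 - u ** 3) ** 3  # tricube kernel weight
    classes = sorted(set(ys))
    total = sum(weights.values())
    if total == 0.0:
        # No point strictly inside the window: fall back to a uniform estimate.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: weights[c] / total for c in classes}


# Toy one-dimensional data (made up): class "a" on the left, "b" on the right.
xs = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4]
ys = ["a", "a", "a", "b", "b", "b"]
for x0 in (1.1, 3.2, 5.3):
    print(x0, local_class_probs(xs, ys, x0))
```

Sweeping `x0` over a grid and plotting each class's probability gives curves of the kind shown above; the points where two curves cross are the decision boundaries.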
    72  
    73  
    74 .. _bayes-run.py: code/bayes-run.py 
    75 .. _bayes-thresholdAdjustment.py: code/bayes-thresholdAdjustment.py 
    76 .. _bayes-mestimate.py: code/bayes-mestimate.py 
    77 .. _bayes-plot-iris.py: code/bayes-plot-iris.py 
    78 .. _adult-sample.tab: code/adult-sample.tab 
    79 .. _iris.tab: code/iris.tab 
    80 .. _titanic.tab: code/titanic.tab
    81 .. _lenses.tab: code/lenses.tab 
    82  
    83 Implementation details 
    84 ====================== 
    85  
    86 The following two classes are implemented in C++ (*bayes.cpp*). They are not 
    87 intended to be used directly. Here we provide implementation details for those 
    88 interested. 
    89  
    90 Orange.core.BayesLearner 
    91 ------------------------ 
    92 Fields estimatorConstructor, conditionalEstimatorConstructor and 
    93 conditionalEstimatorConstructorContinuous are empty (None) by default. 
    94  
    95 If estimatorConstructor is left undefined, p(C) will be estimated by relative 
    96 frequencies of examples (see ProbabilityEstimatorConstructor_relative). 
    97 When conditionalEstimatorConstructor is left undefined, it will use the same 
    98 constructor as for estimating unconditional probabilities (estimatorConstructor 
    99 is used as an estimator in ConditionalProbabilityEstimatorConstructor_ByRows). 
    100 That is, by default, both will use relative frequencies. But when 
    101 estimatorConstructor is set to, for instance, estimate probabilities by 
    102 m-estimate with m=2.0, the same estimator will be used for estimation of 
    103 conditional probabilities, too. 
    104 For continuous attributes, p(C|v_i) is by default estimated with loess (a
    105 variant of locally weighted linear regression), using 
    106 ConditionalProbabilityEstimatorConstructor_loess. 
    107 The learner first constructs an estimator for p(C). It tries to get a 
    108 precomputed distribution of probabilities; if the estimator is capable of 
    109 returning it, the distribution is stored in the classifier's field distribution 
    110 and the newly constructed estimator is discarded. Otherwise, the estimator is
    111 stored in the classifier's field estimator, while the distribution is left 
    112 empty. 
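To make the default chain concrete, here is a small pure-Python sketch that contrasts relative frequencies with an m-estimate and applies the same estimator to each row of a contingency (one row per attribute value), which is how we read the description of ConditionalProbabilityEstimatorConstructor_ByRows above. It assumes the standard m-estimate form (n_c + m·p_c) / (n + m) with an explicitly supplied prior p_c; which prior Orange itself plugs in is not stated here. All function names are hypothetical.

```python
def relative_frequencies(counts):
    """p(c) = n_c / n for a {class: count} mapping."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def m_estimate(counts, prior, m=2.0):
    """Standard m-estimate: (n_c + m * prior_c) / (n + m)."""
    total = sum(counts.values())
    return {c: (counts.get(c, 0) + m * prior[c]) / (total + m) for c in prior}


def by_rows(contingency, estimate):
    """Apply one estimator to each row {class: count} of a contingency,
    giving {attribute value: {class: p(C|value)}}."""
    return {value: estimate(row) for value, row in contingency.items()}


# Toy counts for a single discrete attribute (made up for illustration).
class_counts = {"yes": 6, "no": 4}
contingency = {"sunny": {"yes": 3, "no": 0}, "rain": {"yes": 3, "no": 4}}

prior = relative_frequencies(class_counts)  # {'yes': 0.6, 'no': 0.4}
cond_rel = by_rows(contingency, relative_frequencies)
cond_m = by_rows(contingency, lambda row: m_estimate(row, prior, m=2.0))
```

With relative frequencies, "sunny" gets p(no|sunny) = 0, which would zero the whole product in the naive Bayes formula; with m = 2 the same row is pulled towards the prior (0.84 for "yes", 0.16 for "no").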
    113  
    114 The same is then done for conditional probabilities. Different constructors are 
    115 used for discrete and continuous attributes. If the constructed estimator can 
    116 return all conditional probabilities in the form of a Contingency, the contingency is
    117 stored and the estimator discarded. If not, the estimator is stored. If there
    118 are no contingencies when learning is finished, the resulting classifier's
    119 conditionalDistributions is None. Conversely, if all probabilities are
    120 stored as contingencies, the conditionalEstimators field is None.
    121  
    122 Field normalizePredictions is copied to the resulting classifier. 
    123  
    124 Orange.core.BayesClassifier 
    125 --------------------------- 
    126 Class BayesClassifier represents a naive Bayesian classifier. The probability of
    127 class C, given that the values of features :math:`F_1, F_2, ..., F_n` are
    128 :math:`v_1, v_2, ..., v_n`, is computed (up to normalization) as :math:`p(C|v_1, v_2, ..., v_n) \\propto \
    129 p(C) \\cdot \\frac{p(C|v_1)}{p(C)} \\cdot \\frac{p(C|v_2)}{p(C)} \\cdot ... \
    130 \\cdot \\frac{p(C|v_n)}{p(C)}`.
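As a quick numeric check of the formula, the sketch below multiplies the prior by the ratios p(C|v_i)/p(C) and then normalizes the scores so that they sum to one. The probabilities are invented for illustration and the function name is hypothetical; this is not Orange's API.

```python
def naive_bayes_posterior(prior, conditionals):
    """prior: {class: p(C)}; conditionals: one {class: p(C|v_i)} per feature,
    already looked up for the observed values v_i."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for cond in conditionals:
            score *= cond[c] / p_c  # factor p(C|v_i) / p(C)
        scores[c] = score
    total = sum(scores.values())
    normalized = {c: s / total for c, s in scores.items()}
    return scores, normalized


prior = {"a": 0.5, "b": 0.5}
conditionals = [{"a": 0.8, "b": 0.2}, {"a": 0.6, "b": 0.4}]
raw, post = naive_bayes_posterior(prior, conditionals)
# raw  is approximately {'a': 0.96, 'b': 0.16}
# post is approximately {'a': 0.857, 'b': 0.143}
```

The raw scores do not sum to one; only after dividing by their sum are they class probabilities.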
    131  
    132 Note that when relative frequencies are used to estimate probabilities, the
    133 more usual formula (with factors of the form :math:`\\frac{p(v_i|C)}{p(v_i)}`) and
    134 the above formula are exactly equivalent (without any additional assumptions of
    135 independence, as one might think at first glance). The difference becomes
    136 important when probabilities are estimated in other ways, for instance with the
    137 m-estimate. In this case, the above formula is much more appropriate.
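The equivalence follows from Bayes' rule, p(C|v)/p(C) = p(v|C)/p(v), which holds exactly when all four quantities are relative frequencies from the same data. The sketch below checks this numerically on a made-up toy data set without missing values; the data and the function name `posterior` are illustrative only.

```python
from collections import Counter

# Made-up toy data: ((outlook, temperature), class); no missing values.
DATA = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "yes"),
    (("rain", "mild"), "yes"),
    (("rain", "hot"), "no"),
    (("sunny", "hot"), "yes"),
    (("rain", "mild"), "no"),
    (("sunny", "mild"), "yes"),
]


def posterior(data, query, class_given_value=True):
    """Naive Bayes posterior from relative frequencies, using either
    p(C|v_i)/p(C) factors or the textbook p(v_i|C)/p(v_i) factors."""
    n = len(data)
    n_c = Counter(c for _, c in data)
    scores = {}
    for c in n_c:
        p_c = n_c[c] / n
        score = p_c
        for i, v in enumerate(query):
            n_v = sum(1 for x, _ in data if x[i] == v)
            n_cv = sum(1 for x, k in data if x[i] == v and k == c)
            if class_given_value:
                score *= (n_cv / n_v) / p_c  # p(C|v_i) / p(C)
            else:
                score *= (n_cv / n_c[c]) / (n_v / n)  # p(v_i|C) / p(v_i)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


a = posterior(DATA, ("sunny", "hot"), class_given_value=True)
b = posterior(DATA, ("sunny", "hot"), class_given_value=False)
assert all(abs(a[c] - b[c]) < 1e-12 for c in a)  # same posterior either way
```

With smoothed estimates such as the m-estimate, the estimated p(C|v) and p(v|C) generally no longer satisfy Bayes' rule exactly, so the two products can differ.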
    138  
    139 When computing the formula, probabilities p(C) are read from the field distribution, which
    140 is of type Distribution and stores the (normalized) probability of each class.
    141 When distribution is None, BayesClassifier calls estimator to assess the 
    142 probability. The former method is faster and is actually used by all existing 
    143 methods of probability estimation. The latter is more flexible. 
    144  
    145 Conditional probabilities are computed similarly. Field conditionalDistributions
    146 is of type DomainContingency, which is basically a list of instances of
    147 Contingency, one for each attribute; the outer variable of the contingency is
    148 the attribute and the inner is the class. Contingency can be seen as a list of
    149 normalized probability distributions. For attributes for which there is no
    150 contingency in conditionalDistributions, a corresponding estimator in
    151 conditionalEstimators is used. The estimator is given the attribute value and 
    152 returns distributions of classes. 
    153  
    154 If neither a pre-computed contingency nor a conditional estimator exists, the
    155 attribute is ignored without issuing any warning. The attribute is also ignored
    156 if its value is undefined; this cannot be overridden by estimators.
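The lookup order described in the preceding paragraphs can be restated as a short pure-Python sketch. Plain dicts and callables stand in for Distribution, DomainContingency and the estimator objects; the function and parameter names are hypothetical (they mirror the field names, not Orange's actual C++ code), and None marks an undefined value.

```python
def class_scores(values, classes, distribution=None, estimator=None,
                 conditional_distributions=None, conditional_estimators=None):
    """Unnormalized naive Bayes scores following the lookup order above.

    distribution: {class: p(C)} or None; estimator: callable class -> p(C).
    conditional_distributions: per attribute, {value: {class: p(C|value)}} or None.
    conditional_estimators: per attribute, callable value -> {class: p(C|value)} or None.
    """
    n = len(values)
    cond_dists = conditional_distributions or [None] * n
    cond_ests = conditional_estimators or [None] * n

    # p(C): the stored distribution wins; the estimator is only a fallback.
    if distribution is not None:
        prior = {c: distribution[c] for c in classes}
    else:
        prior = {c: estimator(c) for c in classes}

    scores = dict(prior)
    for value, table, est in zip(values, cond_dists, cond_ests):
        if value is None:
            continue  # undefined value: attribute ignored
        if table is not None:
            cond = table[value]  # pre-computed contingency row
        elif est is not None:
            cond = est(value)  # fall back to the conditional estimator
        else:
            continue  # neither exists: silently ignored
        for c in classes:
            scores[c] *= cond[c] / prior[c]
    return scores


classes = ["a", "b"]
tables = [{"x": {"a": 0.8, "b": 0.2}}, None, None]
estimators = [None, lambda v: {"a": 0.6, "b": 0.4}, None]
full = class_scores(["x", "y", "z"], classes, {"a": 0.5, "b": 0.5},
                    conditional_distributions=tables,
                    conditional_estimators=estimators)
# The third attribute has no source and is skipped: full ~ {'a': 0.96, 'b': 0.16}
```

The two per-attribute lists are used complementarily: each attribute takes its contingency if there is one and the estimator otherwise.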
    157  
    158 Any field (distribution, estimator, conditionalDistributions, 
    159 conditionalEstimators) can be None. For instance, BayesLearner normally 
    160 constructs a classifier which has either distribution or estimator defined. 
    161 While it is not an error to have both, only distribution will be used in that 
    162 case. As for the other two fields, they can be both defined and used 
    163 complementarily; the elements which are missing in one are defined in the 
    164 other. However, if there is no need for estimators, BayesLearner will not 
    165 construct an empty list; it will not construct a list at all, but leave the
    166 field conditionalEstimators set to None.
    167  
    168 If you only need the probability of an individual class, call BayesClassifier's
    169 method p(class, example) to compute the probability of that class only. Note
    170 that this probability will not be normalized and will thus, in general, not 
    171 equal the probability returned by the call operator. 
    172 """ 
    173  
    174 1 import Orange
    175 2 from Orange.core import BayesClassifier as _BayesClassifier
     
    185 12
    186 13    ..
    187         :param adjustTreshold: sets the corresponding attribute 
    188         :type adjustTreshold: boolean 
     14        :param adjust_threshold: sets the corresponding attribute 
     15        :type adjust_threshold: boolean 
    189 16        :param m: sets the :obj:`estimatorConstructor` to
    190 17            :class:`orange.ProbabilityEstimatorConstructor_m` with specified m
    191 18        :type m: integer
    192         :param estimatorConstructor: sets the corresponding attribute 
    193         :type estimatorConstructor: orange.ProbabilityEstimatorConstructor 
    194         :param conditionalEstimatorConstructor: sets the corresponding attribute 
    195         :type conditionalEstimatorConstructor: 
     19        :param estimator_constructor: sets the corresponding attribute 
     20        :type estimator_constructor: orange.ProbabilityEstimatorConstructor 
     21        :param conditional_estimator_constructor: sets the corresponding attribute 
     22        :type conditional_estimator_constructor: 
    196 23                :class:`orange.ConditionalProbabilityEstimatorConstructor`
    197         :param conditionalEstimatorConstructorContinuous: sets the corresponding 
     24        :param conditional_estimator_constructor_continuous: sets the corresponding 
    198 25                attribute
    199         :type conditionalEstimatorConstructorContinuous:  
     26        :type conditional_estimator_constructor_continuous:  
    200 27                :class:`orange.ConditionalProbabilityEstimatorConstructor`
    201 28
     
    205 32    Constructor parameters set the corresponding attributes.
    206 33
    207     .. attribute:: adjustTreshold 
     34    .. attribute:: adjust_threshold 
    208 35
    209 36        If set and the class is binary, the classifier's
     
    219 46        This attribute is ignored if you also set estimatorConstructor.
    220 47
    221     .. attribute:: estimatorConstructor 
     48    .. attribute:: estimator_constructor 
    222 49
    223 50        Probability estimator constructor for
     
    226 53        Setting this attribute disables the above described attribute m.
    227 54
    228     .. attribute:: conditionalEstimatorConstructor 
     55    .. attribute:: conditional_estimator_constructor 
    229 56
    230 57        Probability estimator constructor
     
    232 59        the estimator for prior probabilities will be used.
    233 60
    234     .. attribute:: conditionalEstimatorConstructorContinuous 
     61    .. attribute:: conditional_estimator_constructor_continuous 
    235 62
    236 63        Probability estimator constructor for conditional probabilities for
     
    292 119      "conditionalEstimatorConstructorContinuous":"conditional_estimator_constructor_continuous",
    293 120      "weightID": "weight_id"
    294 }, in_place=False)(NaiveLearner) 
     121 }, in_place=True)(NaiveLearner)
    295 122
    296 123
     
    300 127    :class:`Orange.core.BayesClassifier` that does the actual classification.
    301 128
    302     :param baseClassifier: an :class:`Orange.core.BayesLearner` to wrap. If 
     129    :param base_classifier: an :class:`Orange.core.BayesLearner` to wrap. If 
    303 130            not set, a new :class:`Orange.core.BayesLearner` is created.
    304     :type baseClassifier: :class:`Orange.core.BayesLearner` 
     131    :type base_classifier: :class:`Orange.core.BayesLearner` 
    305 132
    306 133    .. attribute:: distribution
     
    312 139        An object that returns a probability of class p(C) for a given class C.
    313 140
    314     .. attribute:: conditionalDistributions 
     141    .. attribute:: conditional_distributions 
    315 142
    316 143        A list of conditional probabilities.
    317 144
    318     .. attribute:: conditionalEstimators 
     145    .. attribute:: conditional_estimators 
    319 146
    320 147        A list of estimators for conditional probabilities.
    321 148
    322     .. attribute:: adjustThreshold 
     149    .. attribute:: adjust_threshold 
    323 150
    324 151        For binary classes, this tells the learner to
     
    328 155    """
    329 156
    330     def __init__(self, baseClassifier=None): 
    331         if not baseClassifier: baseClassifier = _BayesClassifier() 
    332         self.nativeBayesClassifier = baseClassifier 
    333         for k, v in self.nativeBayesClassifier.__dict__.items(): 
     157    def __init__(self, base_classifier=None): 
     158        if not base_classifier: base_classifier = _BayesClassifier() 
     159        self.native_bayes_classifier = base_classifier 
     160        for k, v in self.native_bayes_classifier.__dict__.items(): 
    334 161            self.__dict__[k] = v
    335 162
     
    347 174              :class:`Orange.statistics.Distribution` or a tuple with both
    348 175        """
    349         return self.nativeBayesClassifier(instance, result_type, *args, **kwdargs) 
     176        return self.native_bayes_classifier(instance, result_type, *args, **kwdargs) 
    350 177
    351 178    def __setattr__(self, name, value):
    352         if name == "nativeBayesClassifier": 
     179        if name == "native_bayes_classifier": 
    353 180            self.__dict__[name] = value
    354 181            return
    355         if name in self.nativeBayesClassifier.__dict__: 
    356             self.nativeBayesClassifier.__dict__[name] = value 
     182        if name in self.native_bayes_classifier.__dict__: 
     183            self.native_bayes_classifier.__dict__[name] = value 
    357 184        self.__dict__[name] = value
    358 185
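The new NaiveClassifier keeps a reference to the native classifier, copies its attributes into its own `__dict__`, and routes assignments to attributes the native object already has back to it. The sketch below reproduces that delegation pattern with a plain-Python stand-in, as an illustration only; `_NativeStandIn` and `DelegatingWrapper` are hypothetical names, not Orange classes.

```python
class _NativeStandIn(object):
    """Hypothetical stand-in for the native (C++-backed) classifier."""

    def __init__(self):
        self.distribution = None
        self.adjust_threshold = False


class DelegatingWrapper(object):
    """Sketch of the delegation pattern used by NaiveClassifier above."""

    def __init__(self, native=None):
        if native is None:
            native = _NativeStandIn()
        self.native = native  # stored directly by __setattr__ below
        for k, v in native.__dict__.items():
            self.__dict__[k] = v  # mirror the native attributes

    def __setattr__(self, name, value):
        if name == "native":
            self.__dict__[name] = value
            return
        if name in self.native.__dict__:
            self.native.__dict__[name] = value  # keep the native object in sync
        self.__dict__[name] = value


w = DelegatingWrapper()
w.adjust_threshold = True  # written to the wrapper and forwarded to the native object
w.note = "local only"      # unknown to the native object: stays on the wrapper
```

Because attributes are copied once, at construction, a later reassignment made directly on the native object is not seen through the wrapper, while writes through the wrapper reach both. The sketch tests `native is None` instead of the original's truthiness check.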
     
    370 197
    371 198        """
    372         return self.nativeBayesClassifier.p(class_, instance) 
     199        return self.native_bayes_classifier.p(class_, instance) 
    373 200
    374 201    def __str__(self):
    375         """return classifier in human friendly format.""" 
    376         nValues=len(self.classVar.values) 
    377         frmtStr=' %10.3f'*nValues 
    378         classes=" "*20+ ((' %10s'*nValues) % tuple([i[:10] for i in self.classVar.values])) 
     202        """Return classifier in human friendly format.""" 
     203        nvalues=len(self.class_var.values) 
     204        frmtStr=' %10.3f'*nvalues 
     205        classes=" "*20+ ((' %10s'*nvalues) % tuple([i[:10] for i in self.class_var.values])) 
    379 206
    380 207        return "\n".join([
     
    388 215                    ("%20s" % i.variable.values[v][:20]) + (frmtStr % tuple(i[v]))
    389 216                    for v in xrange(len(i.variable.values)))]
    390                 ) for i in self.conditionalDistributions])]) 
     217                ) for i in self.conditional_distributions 
     218                        if i.variable.var_type == Orange.data.variable.Discrete])]) 
    391 219
    392 220
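The changed `__str__` now skips conditional distributions whose variable is not discrete, since a continuous feature has no finite list of values to tabulate. The sketch below mimics the visible formatting (20-character row labels, 10-character class columns) in plain Python; the input structure and the name `format_conditionals` are hypothetical, and the elided parts of the original method are not reproduced.

```python
def format_conditionals(class_values, features):
    """features: list of (name, is_discrete, rows); for discrete features,
    rows maps each attribute value to a list of class probabilities."""
    nvalues = len(class_values)
    row_fmt = " %10.3f" * nvalues
    header = " " * 20 + (" %10s" * nvalues) % tuple(v[:10] for v in class_values)
    lines = [header]
    for name, is_discrete, rows in features:
        if not is_discrete:
            continue  # continuous feature: no finite list of values to print
        lines.append(name)
        for value, probs in rows.items():
            lines.append(("%20s" % value[:20]) + (row_fmt % tuple(probs)))
    return "\n".join(lines)


out = format_conditionals(
    ["yes", "no"],
    [("outlook", True, {"sunny": [0.4, 0.6], "rain": [0.7, 0.3]}),
     ("temperature", False, None)])
print(out)
```

Filtering on the variable type, rather than catching a formatting error, keeps the output for discrete features unchanged while simply omitting the continuous ones.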