orange/Orange/classification/bayes.py
.. index:: Naive Bayesian Learner
.. autoclass:: Orange.classification.bayes.NaiveBayesLearner
   :members:

.. autoclass:: Orange.classification.bayes.NaiveBayesClassifier
   :members:

Examples
========
Let us load the data, induce a classifier and see how it performs on the first
five examples.

>>> import orange
>>> data = orange.ExampleTable("lenses")
>>> bayes = orange.BayesLearner(data)
>>> for ex in data[:5]:
...     print ex.getclass(), bayes(ex)
no no
no no
soft soft
no no
hard hard

The classifier is correct in all five cases. Interested in probabilities,
maybe?

>>> for ex in data[:5]:
...     print ex.getclass(), bayes(ex, orange.Classifier.GetProbabilities)
no <0.423, 0.000, 0.577>
no <0.000, 0.000, 1.000>
soft <0.000, 0.668, 0.332>
no <0.000, 0.000, 1.000>
hard <0.715, 0.000, 0.285>

While very confident about the second and the fourth examples, the classifier
guessed the correct class of the first one only by a small margin of 58 vs.
42 percent.

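The prediction in each row is simply the class with the highest probability. The following plain-Python sketch makes no Orange calls; the helper name predict_with_margin is hypothetical. It picks that class and reports its lead over the runner-up. The class order hard, soft, no is an assumption inferred from the printed rows.

```python
def predict_with_margin(class_values, probs):
    # Rank class indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best, runner_up = order[0], order[1]
    # Return the winning class and how far it leads the second-best class.
    return class_values[best], probs[best] - probs[runner_up]


# Class order assumed from the printed rows above: hard, soft, no.
classes = ["hard", "soft", "no"]
label, margin = predict_with_margin(classes, [0.423, 0.000, 0.577])
print("%s %.3f" % (label, margin))  # prints: no 0.154
```

For the first example this prints the narrow margin discussed above.
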
Now, let us peek into the classifier.

>>> print bayes.estimator
None
>>> print bayes.distribution
<0.167, 0.208, 0.625>
>>> print bayes.conditionalEstimators
None
>>> print bayes.conditionalDistributions[0]
<'young': <0.250, 0.250, 0.500>, 'p_psby': <0.125, 0.250, 0.625>, (...)
>>> bayes.conditionalDistributions[0]["young"]
<0.250, 0.250, 0.500>

The classifier has no estimator since the prior class probabilities are stored
in distribution: the probability of the first class is 0.167, of the second
0.208 and of the third 0.625. Nor does it have conditionalEstimators; the
conditional probabilities are stored in conditionalDistributions. We printed
the contingency matrix for the first attribute and, in the last line, the
conditional probabilities of the three classes when the value of the first
attribute is "young".

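These stored pieces are enough to recompute a prediction by hand. conditionalDistributions holds class probabilities given an attribute value, P(c | a_i = v_i). Bayes' rule gives P(a_i = v_i | c) = P(c | a_i = v_i) P(a_i = v_i) / P(c). Because P(a_i = v_i) is the same for every class, the naive Bayesian posterior is proportional to P(c) times the product of P(c | a_i = v_i) / P(c) over all attributes.

The plain-Python sketch below applies this rule. It is not taken from Orange's C++ code, and the helper name naive_bayes_posterior and the two-class numbers are hypothetical.

```python
def naive_bayes_posterior(prior, conditionals):
    # prior: list of P(c).
    # conditionals: one list of P(c | a_i = v_i) per attribute.
    # Assumes every prior probability is non-zero.
    scores = list(prior)
    for cond in conditionals:
        for c in range(len(scores)):
            # Each attribute multiplies the score by P(c | v_i) / P(c).
            scores[c] *= cond[c] / float(prior[c])
    # Normalize so that the class probabilities sum to 1.
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical two-class example with two attributes.
print(naive_bayes_posterior([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]]))
```

With a single attribute the ratio cancels the prior. For example, passing the lenses prior <0.167, 0.208, 0.625> together with the "young" row simply returns <0.250, 0.250, 0.500>.
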
Let us now use the m-estimate instead of relative frequencies.

>>> bayesl = orange.BayesLearner()
>>> bayesl.estimatorConstructor = orange.ProbabilityEstimatorConstructor_m(m=2.0)
>>> bayes = bayesl(data)

The classifier is still correct for all five examples.

>>> for ex in data[:5]:
...     print ex.getclass(), bayes(ex, orange.Classifier.GetProbabilities)
no <0.375, 0.063, 0.562>
no <0.016, 0.003, 0.981>
soft <0.021, 0.607, 0.372>
no <0.001, 0.039, 0.960>
hard <0.632, 0.030, 0.338>

Compared with the probabilities above, where relative frequencies were used,
the zero probabilities have disappeared. For the third and fifth examples, the
probability shifts towards the third, more frequent class.

>>> print bayes.conditionalDistributions[0]
<'young': <0.233, 0.242, 0.525>, 'p_psby': <0.133, 0.242, 0.625>, (...)

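These numbers follow the m-estimate p(c | v) = (n_c + m * p_a(c)) / (n + m). Here n is the number of training instances with value v, n_c the number of those in class c, and p_a(c) the a priori probability of class c. The plain-Python sketch below reproduces the "young" row.

It rests on two assumptions, both consistent with the printed frequencies but not read from Orange:

- 8 of the 24 lenses instances are young, split 2, 2 and 4 over the three classes.
- The class totals are 4, 5 and 15.

The helper name m_estimate is hypothetical.

```python
def m_estimate(counts, prior, m):
    # counts: class counts among the instances with the given attribute value.
    # prior: a priori class probabilities used to spread the m virtual instances.
    n = sum(counts)
    return [(n_c + m * p_c) / float(n + m) for n_c, p_c in zip(counts, prior)]


# Lenses class distribution: 4 hard, 5 soft, 15 no out of 24 instances.
prior = [4 / 24.0, 5 / 24.0, 15 / 24.0]
# Assumed counts for age = young (8 instances): 2 hard, 2 soft, 4 no.
print(["%.3f" % p for p in m_estimate([2, 2, 4], prior, m=2.0)])
# prints ['0.233', '0.242', '0.525']
```

Two further checks:

- Setting m=0 gives back the relative frequencies <0.250, 0.250, 0.500>.
- The assumed "p_psby" counts 1, 2 and 5 give 0.133, 0.242 and 0.625, matching the second entry printed above.
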
Note that the change in probability estimation did not have any effect on the
a priori probabilities:

>>> print bayes.distribution
<0.167, 0.208, 0.625>

The reason for this is that this same distribution was used as the a priori
distribution for m-estimation. (How to enforce another a priori distribution?
While the Orange C++ core supports it, this feature has not been exported
to Python yet.)

Finally, let us show an example with continuous attributes. We will use the
iris dataset, which contains four continuous attributes and no discrete ones.

>>> data = orange.ExampleTable("iris")
>>> bayes = orange.BayesLearner(data)
>>> for exi in range(0, len(data), 20):
...     print data[exi].getclass(), bayes(data[exi], orange.Classifier.GetBoth)

The classifier works well. To get a glimpse of how it works, let us look at the
conditional distributions for the first attribute. They are stored in
conditionalDistributions, as before, except that the distribution for a
continuous attribute behaves as a dictionary rather than as a list (see the
information on distributions).

>>> print bayes.conditionalDistributions[0]
<4.300: <0.837, 0.137, 0.026>, 4.333: <0.834, 0.140, 0.026>, 4.367: <0.830, (...)

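For a continuous attribute, the distribution is tabulated only at a grid of points (4.300, 4.333, 4.367, ...). This documentation does not say how a value falling between two grid points is handled. Purely for illustration, the plain-Python sketch below assumes linear interpolation between the two neighbouring points and clamps values outside the grid to the nearest end. The helper name interpolate_distribution is hypothetical, and the two grid rows are the first two printed above.

```python
import bisect


def interpolate_distribution(points, x):
    # points: list of (value, [p_1, ..., p_k]) pairs sorted by value.
    xs = [v for v, _ in points]
    # Clamp values outside the tabulated range to the nearest end point.
    if x <= xs[0]:
        return list(points[0][1])
    if x >= xs[-1]:
        return list(points[-1][1])
    # Find the grid points immediately to the left and right of x.
    i = bisect.bisect_right(xs, x)
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    t = (x - x0) / (x1 - x0)
    # Blend the two neighbouring class distributions linearly.
    return [a + t * (b - a) for a, b in zip(p0, p1)]


grid = [(4.300, [0.837, 0.137, 0.026]), (4.333, [0.834, 0.140, 0.026])]
print(interpolate_distribution(grid, 4.3165))
```

Halfway between 4.300 and 4.333, the sketch returns roughly <0.8355, 0.1385, 0.026>.
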
For a nicer picture, we can print out the probabilities, copy and paste them
into a graph drawing program and get something like the figure below.

>>> for x, probs in bayes.conditionalDistributions[0].items():
...     print "%5.3f\t%5.3f\t%5.3f\t%5.3f" % (x, probs[0], probs[1], probs[2])
4.300   0.837   0.137   0.026
4.333   0.834   0.140   0.026
4.367   0.830   0.144   0.026
4.400   0.826   0.147   0.027
4.433   0.823   0.150   0.027
(...)

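Instead of copying the printed lines by hand, the same rows can be written to a tab-separated file, which most spreadsheet and plotting programs open directly. This Python 3 sketch uses only the standard csv module with its built-in "excel-tab" dialect. The helper name write_tsv and the column labels p0, p1 and p2 are hypothetical, and the rows are the first two from the output above.

```python
import csv
import io


def write_tsv(rows, out):
    # The built-in "excel-tab" dialect separates fields with tabs.
    writer = csv.writer(out, dialect="excel-tab")
    writer.writerow(["x", "p0", "p1", "p2"])
    for x, probs in rows:
        # Format the numbers as in the printed table above.
        writer.writerow(["%5.3f" % x] + ["%5.3f" % p for p in probs])


rows = [(4.300, [0.837, 0.137, 0.026]), (4.333, [0.834, 0.140, 0.026])]
buf = io.StringIO()
write_tsv(rows, buf)
print(buf.getvalue())
```

In practice, the rows would be the (x, probs) pairs produced by the loop above, with probs converted to plain lists. They would be written to a file opened with open(path, "w", newline="") instead of an in-memory buffer.
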
If sepal lengths are short, the most probable class is "setosa". Irises with medium sepal lengths belong to "versicolor", while longer sepal lengths indicate "virginica". The critical values where the decision changes are at about 5.4 and 6.3.

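The critical values can also be found programmatically: walk through the tabulated rows in increasing order of x and report every point where the most probable class changes. The plain-Python sketch below places each boundary halfway between the two rows that straddle the change. The helper name decision_boundaries and the rows in the example are made up for illustration; they are not Orange output.

```python
def decision_boundaries(rows):
    # rows: (x, [p_1, ..., p_k]) pairs; sort them by x first.
    rows = sorted(rows, key=lambda r: r[0])
    boundaries = []
    prev_x, prev_best = None, None
    for x, probs in rows:
        # Index of the most probable class at this x.
        best = max(range(len(probs)), key=lambda i: probs[i])
        if prev_best is not None and best != prev_best:
            # Report the midpoint between the two rows that straddle the change.
            boundaries.append(((prev_x + x) / 2.0, prev_best, best))
        prev_x, prev_best = x, best
    return boundaries


# Made-up rows: class 0 wins at low x, class 1 in the middle, class 2 at high x.
rows = [(5.0, [0.6, 0.3, 0.1]), (5.5, [0.3, 0.5, 0.2]),
        (6.0, [0.2, 0.5, 0.3]), (6.5, [0.1, 0.3, 0.6])]
print(decision_boundaries(rows))  # prints [(5.25, 0, 1), (6.25, 1, 2)]
```

On the full table printed above, the same scan should locate the critical values approximately. Its resolution is limited by the grid spacing.
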
It is important to stress that the curves are relatively smooth although no fitting (either manual or automatic) of parameters took place.

"""

import Orange
from Orange.core import BayesLearner as _BayesLearner

class NaiveBayesLearner(Orange.core.Learner):
    """
    Probabilistic classifier based on applying Bayes' theorem (from Bayesian
    statistics) with strong (naive) independence assumptions.
    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param adjustTreshold: If set and the class is binary, the classifier's
        threshold will be set so as to optimize the classification accuracy.
        The threshold is tuned by observing the probabilities predicted on
        learning data. Setting it to True can increase the
        accuracy considerably.
    :type adjustTreshold: boolean
    :param m: m for m-estimate. If set, m-estimation of probabilities
        will be used via :class:`orange.ProbabilityEstimatorConstructor_m`.
        This attribute is ignored if you also set estimatorConstructor.
    :type m: integer
    :param estimatorConstructor: Probability estimator constructor for
        prior class probabilities. Defaults to
        :class:`orange.ProbabilityEstimatorConstructor_relative`.
        Setting this attribute disables the attribute m described above.
    :type estimatorConstructor: orange.ProbabilityEstimatorConstructor
    :param conditionalEstimatorConstructor: Probability estimator constructor
        for conditional probabilities for discrete features. If omitted,
        the estimator for prior probabilities will be used.
    :type conditionalEstimatorConstructor: orange.ConditionalProbabilityEstimatorConstructor
    :param conditionalEstimatorConstructorContinuous: Probability estimator constructor
        for conditional probabilities for continuous features. Defaults to
        :class:`orange.ConditionalProbabilityEstimatorConstructor_loess`
    :type conditionalEstimatorConstructorContinuous: orange.ConditionalProbabilityEstimatorConstructor
    :rtype: :class:`Orange.classification.bayes.NaiveBayesLearner` or
        :class:`Orange.classification.bayes.NaiveBayesClassifier`
    """

    # ...
        return self

    def __init__(self, adjustTreshold=False, m=0, estimatorConstructor=None,
                 conditionalEstimatorConstructor=None,
                 conditionalEstimatorConstructorContinuous=None, **argkw):
        self.adjustThreshold = adjustTreshold
        self.m = m
        # ...
        self.__dict__.update(argkw)

    def __call__(self, instances, weight=0):
        """Learn from the given table of data instances.

        :param instances: Data instances to learn from.
        :type instances: Orange.data.Table
        :param weight: Id of meta attribute with weights of instances
        :type weight: integer
        :rtype: :class:`Orange.classification.bayes.NaiveBayesClassifier`
        """
        bayes = _BayesLearner()
        if self.estimatorConstructor:
            # ...
            bayes.conditionalEstimatorConstructorContinuous = self.conditionalEstimatorConstructorContinuous

        return NaiveBayesClassifier(bayes(instances, weight))

class NaiveBayesClassifier(Orange.core.Classifier):
    """
    Wraps a native BayesClassifier to add a print method.

    :param nativeBayesClassifier: the native Bayes classifier to wrap
    """

    def __init__(self, nativeBayesClassifier):
        self.nativeBayesClassifier = nativeBayesClassifier
        for k, v in self.nativeBayesClassifier.__dict__.items():
            self.__dict__[k] = v

    def __call__(self, instance, *args, **kwdargs):
        """Classify a new instance.

        :param instance: instance to be classified
        :type instance: :class:`Orange.data.Instance`
        :rtype: :class:Orange.data.`
        """
        # Return the native classifier's result instead of discarding it.
        return self.nativeBayesClassifier(instance, *args, **kwdargs)

    def __setattr__(self, name, value):
        # ...
        self.__dict__[name] = value

    def p(self, class_, instance):
        """Return the probability of a single class.

        The probability is not normalized and can differ from the
        probability returned by __call__.
        """
        return self.nativeBayesClassifier.p(class_, instance)

    def printModel(self):
        """Print the classifier in a human-friendly format."""
        nValues=len(self.classVar.values)
        frmtStr=' %10.3f'*nValues
