## Naive Bayes classifier (bayes)

A Naive Bayes classifier is a probabilistic classifier that estimates conditional probabilities of the dependent variable from training data and uses them to classify new data instances. The algorithm is very fast for discrete features, but runs slower for continuous features.
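
To make the idea concrete, here is a minimal pure-Python sketch of the computation for discrete features. It is an illustration only, not Orange's implementation: the helper names `train` and `predict` and the toy data are hypothetical, and probabilities are estimated with plain relative frequencies.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return normalized class probabilities for one instance."""
    class_counts, value_counts = model
    total = float(sum(class_counts.values()))
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                                    # prior p(C)
        for i, value in enumerate(row):
            score *= value_counts[(i, c)][value] / float(n_c)  # p(x_i | C)
        scores[c] = score
    norm = sum(scores.values())
    return dict((c, s / norm) for c, s in scores.items())

# Hypothetical toy data: (outlook, temperature) -> class
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

model = train(rows, labels)
probabilities = predict(model, ("rainy", "mild"))
```

With this toy data, `probabilities["yes"]` comes out at 2/3 and `probabilities["no"]` at 1/3. A feature value never seen with a class zeroes out that class entirely, which is the weakness that smoothed estimators such as the m-estimate address.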

The following example demonstrates a straightforward invocation of this algorithm:

```
import Orange
titanic = Orange.data.Table("titanic.tab")

learner = Orange.classification.bayes.NaiveLearner()
classifier = learner(titanic)

# Print the true class next to the predicted class for the first five instances.
for inst in titanic[:5]:
    print inst.getclass(), classifier(inst)
```
class Orange.classification.bayes.NaiveLearner(adjust_threshold=False, m=0, estimator_constructor=None, conditional_estimator_constructor=None, conditional_estimator_constructor_continuous=None, **argkw)

Probabilistic classifier based on applying Bayes’ theorem (from Bayesian statistics) with strong (naive) independence assumptions. Constructor parameters set the corresponding attributes.

adjust_threshold

If set and the class is binary, the classifier’s threshold is chosen to optimize classification accuracy. The threshold is tuned by observing the probabilities predicted on the learning data. Setting it to True can increase the accuracy considerably.

m

The parameter m of the m-estimate. If set, probabilities are estimated with the m-estimate using this value. This attribute is ignored if you also set estimator_constructor.

estimator_constructor

Probability estimator constructor for prior class probabilities. Defaults to RelativeFrequency. Setting this attribute disables the attribute m described above.

conditional_estimator_constructor

Probability estimator constructor for conditional probabilities for discrete features. If omitted, the estimator for prior probabilities will be used.

conditional_estimator_constructor_continuous

Probability estimator constructor for conditional probabilities for continuous features. Defaults to Loess.

__call__(data, weight=0)

Learn from the given table of data instances.

Parameters:

- data (Table) – Data instances to learn from.
- weight (int) – Id of meta attribute with weights of instances.

Return type: NaiveClassifier

class Orange.classification.bayes.NaiveClassifier(base_classifier=None)

Predictor based on calculated probabilities.

distribution

Stores probabilities of classes, i.e. p(C) for each class C.

estimator

An object that returns a probability of class p(C) for a given class C.

conditional_distributions

A list of conditional probabilities.

conditional_estimators

A list of estimators for conditional probabilities.

adjust_threshold

For binary classes, this tells the learner to determine the optimal threshold probability according to 0-1 loss on the training set. For problems with more than two classes, it has no effect.

__call__(instance, result_type=0, *args, **kwdargs)

Classify a new instance.

Parameters:

- instance (Instance) – instance to be classified.
- result_type – GetValue or GetProbabilities or GetBoth

Return type: Value, Distribution or a tuple with both

__str__()

Return the classifier in a human-friendly format.

p(class_, instance)

Return the probability of a single class. The probability is not normalized and can differ from the probability returned by __call__.

Parameters:

- class_ (Value) – class value for which the probability should be output.
- instance (Instance) – instance to be classified.
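
The difference between an unnormalized value and a normalized probability can be illustrated with plain numbers. This sketch assumes that the unnormalized score is the product of the class prior and the per-feature conditional probabilities; Orange's internals are not shown in this document, and all numbers and variable names below are hypothetical.

```python
# Hypothetical unnormalized scores p(C) * prod_i p(x_i | C) for one instance.
unnormalized = {
    "yes": 0.5 * 0.6 * 0.5,   # 0.15
    "no": 0.5 * 0.3 * 0.2,    # 0.03
}

# Dividing by the sum over all classes yields probabilities that add up to 1.
total = sum(unnormalized.values())
normalized = dict((c, s / total) for c, s in unnormalized.items())
```

The ranking of the classes is the same either way; only the normalized values can be read as probabilities that sum to one.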

### Examples

NaiveLearner can estimate probabilities using relative frequencies or m-estimate:

```
import Orange

lenses = Orange.data.Table("lenses.tab")

bayes_L = Orange.classification.bayes.NaiveLearner(name="Naive Bayes")
bayesWithM_L = Orange.classification.bayes.NaiveLearner(m=2, name="Naive Bayes w/ m-estimate")
bayes = bayes_L(lenses)
bayesWithM = bayesWithM_L(lenses)

print bayes.conditional_distributions
# prints: <<'pre-presbyopic': <0.625, 0.125, 0.250>, 'presbyopic': <0.750, 0.125, 0.125>, ...>>
print bayesWithM.conditional_distributions
# prints: <<'pre-presbyopic': <0.625, 0.133, 0.242>, 'presbyopic': <0.725, 0.133, 0.142>, ...>>

print bayes.distribution
# prints: <0.625, 0.167, 0.208>
print bayesWithM.distribution
# prints: <0.625, 0.167, 0.208>
```

Conditional probabilities in the m-estimate based classifier show a shift towards the second class compared to the probabilities obtained with relative frequencies. The change of estimator did not affect the prior class probabilities.
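
The shift can be reproduced by hand. The sketch below assumes the standard m-estimate formula, (n_c + m * prior_c) / (n + m), and assumes that the 'pre-presbyopic' group holds 8 instances with class counts 5, 1 and 2. Those counts are inferred from the relative frequencies printed above, not read from the data. The function name `m_estimate` is hypothetical and not part of Orange.

```python
def m_estimate(counts, priors, m):
    """Blend observed class counts with prior class probabilities."""
    n = float(sum(counts))
    return [(n_c + m * p) / (n + m) for n_c, p in zip(counts, priors)]

priors = [15 / 24.0, 4 / 24.0, 5 / 24.0]   # roughly <0.625, 0.167, 0.208>
counts = [5, 1, 2]                          # assumed counts for 'pre-presbyopic'

relative = m_estimate(counts, priors, m=0)  # m=0 reduces to relative frequencies
smoothed = m_estimate(counts, priors, m=2)

print(", ".join("%.3f" % p for p in relative))  # 0.625, 0.125, 0.250
print(", ".join("%.3f" % p for p in smoothed))  # 0.625, 0.133, 0.242
```

Under these assumptions the first class is unchanged because its relative frequency already equals its prior, while the other two are pulled towards their priors.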

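```
import Orange
from Orange.classification import bayes
from Orange.evaluation import testing, scoring

# Assumption: the data set is not named in the original listing; "adult.tab"
# is used here, and any table with a binary class works.
data = Orange.data.Table("adult.tab")

nb = bayes.NaiveLearner(name="Naive Bayes")
adjusted_nb = bayes.NaiveLearner(adjust_threshold=True, name="Adjusted Naive Bayes")

# 10-fold cross-validation of both learners, then classification accuracy (CA).
results = testing.cross_validation([nb, adjusted_nb], data)
print "%.6f, %.6f" % tuple(scoring.CA(results))
```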

Setting adjust_threshold can improve the results. The script above runs 10-fold cross-validation of a plain naive Bayesian classifier and of one with an adjusted threshold; their classification accuracies are:

```
[0.7901746265516516, 0.8280138859667578]
```
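
How such a threshold might be chosen can be sketched in a few lines. The code below is an illustration under stated assumptions, not Orange's implementation: it takes hypothetical predicted probabilities of the positive class on the training data, tries each of them as a candidate threshold, and keeps the one with the highest training accuracy (the lowest 0-1 loss). The function names `accuracy` and `best_threshold` and all numbers are made up.

```python
def accuracy(probs, labels, threshold):
    """Share of correct predictions when predicting 1 iff p >= threshold."""
    correct = sum(1 for p, y in zip(probs, labels)
                  if (p >= threshold) == (y == 1))
    return correct / float(len(labels))

def best_threshold(probs, labels):
    """Pick the candidate threshold with the highest training accuracy."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: accuracy(probs, labels, t))

# Hypothetical predicted probabilities of the positive class and true labels.
probs = [0.10, 0.25, 0.30, 0.35, 0.45, 0.60, 0.80]
labels = [0, 0, 1, 0, 1, 1, 1]

default_acc = accuracy(probs, labels, 0.5)   # 5 of 7 correct
threshold = best_threshold(probs, labels)    # 0.30
tuned_acc = accuracy(probs, labels, threshold)  # 6 of 7 correct
```

Because the threshold is tuned on the same data the classifier was trained on, the gain should be confirmed on held-out data, for example by cross-validation.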

Probability distributions for continuous features are estimated with Loess.

```
import Orange
import pylab

iris = Orange.data.Table("iris.tab")
# Passing the data to the learner's constructor returns a trained classifier.
nb = Orange.classification.bayes.NaiveLearner(iris)

# Conditional distribution of the first feature (sepal length):
# each item maps a feature value to the three class probabilities.
sepal_length, probabilities = zip(*nb.conditional_distributions[0].items())
p_setosa, p_versicolor, p_virginica = zip(*probabilities)

pylab.xlabel("sepal length")
pylab.ylabel("probability")
pylab.plot(sepal_length, p_setosa, label="setosa", linewidth=2)
pylab.plot(sepal_length, p_versicolor, label="versicolor", linewidth=2)
pylab.plot(sepal_length, p_virginica, label="virginica", linewidth=2)

pylab.legend(loc="best")
pylab.savefig("bayes-iris.png")
```

For short sepal lengths, the most probable class is “setosa”. Irises with medium sepal lengths most likely belong to “versicolor”, while longer sepal lengths indicate “virginica”. The critical values at which the decision changes are at about 5.4 and 6.3.