Orange


My First Orange Classifier

There are two types of objects introduced in this lesson: learners and classifiers. Orange has a number of built-in learners. For instance, orange.BayesLearner is a naive Bayesian learner. When data is passed to a learner (e.g., orange.BayesLearner(data)), it returns a classifier. When a data instance is presented to a classifier, it returns a class, a vector of class probabilities, or both.
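A minimal sketch of this learner/classifier pattern is shown below. It assumes that voting.tab (introduced in the next section) is in the working directory, and that the orange.GetBoth flag is available alongside orange.GetProbabilities, which is used later in this lesson:

import orange

data = orange.ExampleTable("voting")

# a learner can also be constructed first and applied to data later;
# calling it with data returns a classifier
learner = orange.BayesLearner()
classifier = learner(data)

# a classifier called on a data instance can return the class,
# the class probabilities, or both
instance = data[0]
value = classifier(instance)
probabilities = classifier(instance, orange.GetProbabilities)
value, probabilities = classifier(instance, orange.GetBoth)
print value, probabilities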

A Simple Classifier

Let us see how this works in practice. To start, we will construct a naive Bayesian classifier from the voting data set and use it to classify the first five instances of that data set (don't worry about overfitting for now).

classifier.py (uses voting.tab)

import orange
data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)
for i in range(5):
    c = classifier(data[i])
    print "original", data[i].getclass(), "classified as", c

The script loads the data, uses it to construct a classifier with the naive Bayesian method, and then classifies the first five instances of the data set. Note that both the original class and the class assigned by the classifier are printed out.

The data set we use records the votes of each U.S. House of Representatives Congressman on 16 key issues; the class is the representative's party. There are 435 data instances - 267 democrats and 168 republicans - in the data set (see the UCI ML Repository and its voting-records data set for a further description). This is how our classifier performs on the first five instances:

original republican classified as republican
original republican classified as republican
original democrat classified as republican
original democrat classified as democrat
original democrat classified as democrat

You can see that naive Bayes makes a mistake on the third instance, but otherwise predicts correctly.
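If you prefer to check such a comparison programmatically rather than by eye, a short loop in the spirit of classifier.py will do. This is only a sketch; it assumes, as elsewhere in this lesson, that the class value returned by the classifier can be compared directly with the value returned by getclass():

import orange

data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)

# count how many of the first five instances are classified correctly
correct = 0
for i in range(5):
    if classifier(data[i]) == data[i].getclass():
        correct += 1
print "%d of 5 instances classified correctly" % correct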

Obtaining Class Probabilities

To find out what probability the classifier assigns to, say, the democrat class, we need to call the classifier with the additional parameter orange.GetProbabilities. Also note that democrats have class index 1 (we can find this out with print data.domain.classVar.values; remember that indices in Python start with 0; also note that we have already fixed the order of classes in voting.tab: instead of writing discrete for the class variable, we listed its set of possible values in the desired order).

classifier2.py (uses voting.tab)

import orange
data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)
print "Possible classes:", data.domain.classVar.values
print "Probabilities for democrats:"
for i in range(5):
    p = classifier(data[i], orange.GetProbabilities)
    print "%d: %5.3f (originally %s)" % (i+1, p[1], data[i].getclass())

The output of this script is:

Possible classes: <republican, democrat>
Probabilities for democrats:
1: 0.000 (originally republican)
2: 0.000 (originally republican)
3: 0.005 (originally democrat)
4: 0.998 (originally democrat)
5: 0.957 (originally democrat)

The printout shows, for example, that on the third instance naive Bayes has not only misclassified, but missed by a wide margin: it assigned a probability of only 0.005 to the correct class.
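Nothing stops us from printing the whole distribution rather than a single entry. The following sketch prints the probability assigned to each class for the third instance; it assumes only what classifier2.py already relies on, namely that the returned probabilities can be indexed by the class's position:

import orange

data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)

p = classifier(data[2], orange.GetProbabilities)  # third instance
for i, value in enumerate(data.domain.classVar.values):
    print "P(%s) = %5.3f" % (value, p[i])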

Where Next?

In Orange, most classifiers support the prediction of the class, the class probabilities, or both, so what you have learned here is rather general. If you want to get a taste of some of Orange's other classifiers, check the next lesson. Alternatively, you may go directly to see how classifiers are tested and evaluated.


