# source:orange/orange/doc/ofb/c_basics.htm@6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 4.5 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

<body>

<p class="Path">
Prev: <a href="classification.htm">Classification</a>
Next: <a href="c_otherclass.htm">Selected Classification Methods</a>,
Up: <a href="classification.htm">Classification</a>
</p>

<H1>My First Orange Classifier</H1>
<index name="classifiers+naive Bayesian classifier">

<p>This lesson introduces two types of objects: learners and
classifiers. Orange has a number of built-in learners. For instance,
<code>orange.BayesLearner</code> is a naive Bayesian learner. When data
is passed to a learner (e.g., <code>orange.BayesLearner(data)</code>),
it returns a classifier. When a data instance is presented to a
classifier, it returns a class, a vector of class probabilities, or
both.</p>
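<p>The division of labor between learners and classifiers can be mimicked in a few lines of plain Python. The sketch below is not Orange code: <code>naive_bayes_learner</code> and the <code>GET_VALUE</code>/<code>GET_PROBABILITIES</code>/<code>GET_BOTH</code> flags are hypothetical names, the toy votes are made up, and the add-one (Laplace) probability estimates are an assumption; <code>orange.BayesLearner</code> may estimate probabilities differently. It only illustrates the pattern: a learner is called with data and returns a classifier, and the classifier is called with an instance and returns a class, class probabilities, or both.</p>

```python
from collections import Counter

# Hypothetical flags choosing what a classifier returns; they only mimic the
# idea behind orange.GetProbabilities and are not part of Orange's API.
GET_VALUE, GET_PROBABILITIES, GET_BOTH = 0, 1, 2


def naive_bayes_learner(data, class_values):
    """A learner: takes training data, returns a classifier (a function).

    data is a list of (attribute_values, class_value) pairs.
    """
    n = len(data)
    class_counts = Counter(cls for _, cls in data)
    n_attrs = len(data[0][0])
    # counts[i][(c, v)]: how often attribute i has value v among instances of class c
    counts = [Counter() for _ in range(n_attrs)]
    seen_values = [set() for _ in range(n_attrs)]
    for attrs, cls in data:
        for i, v in enumerate(attrs):
            counts[i][(cls, v)] += 1
            seen_values[i].add(v)

    def classifier(instance, what=GET_VALUE):
        scores = []
        for c in class_values:
            # Add-one (Laplace) estimates of P(c) and of P(value | c).
            score = (class_counts[c] + 1.0) / (n + len(class_values))
            for i, v in enumerate(instance):
                score *= (counts[i][(c, v)] + 1.0) / (class_counts[c] + len(seen_values[i]))
            scores.append(score)
        total = sum(scores)
        probs = [s / total for s in scores]  # normalize scores to class probabilities
        value = class_values[probs.index(max(probs))]  # most probable class
        if what == GET_PROBABILITIES:
            return probs
        if what == GET_BOTH:
            return value, probs
        return value

    return classifier


# Made-up votes on two issues, just to exercise the code.
toy = [
    (("y", "n"), "republican"),
    (("y", "y"), "republican"),
    (("n", "y"), "democrat"),
    (("n", "y"), "democrat"),
    (("n", "n"), "democrat"),
]
classifier = naive_bayes_learner(toy, ["republican", "democrat"])
print(classifier(("y", "n")))
print(["%.3f" % p for p in classifier(("y", "n"), GET_PROBABILITIES)])
```

<p>For the made-up instance <code>("y", "n")</code> the sketch predicts republican with probability 225/289 (about 0.779); calling <code>classifier(("y", "n"), GET_BOTH)</code> returns the class and the probabilities together.</p>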

<h2>A Simple Classifier</h2>

<p>Let us see how this works in practice. For a start, we will
construct a naive Bayesian classifier from the voting data set and
use it to classify the first five instances from the same data set
(don't worry about overfitting for now).</p>

<p class="header" ><a href="classifier.py">classifier.py</a> (uses <a href=
"voting.tab">voting.tab</a>)</p>
<xmp class="code">import orange
data = orange.ExampleTable("voting")
# calling the learner with data returns a classifier
classifier = orange.BayesLearner(data)
for i in range(5):
    # calling the classifier with an instance returns its predicted class
    c = classifier(data[i])
    print "%d: %s (originally %s)" % (i+1, c, data[i].getclass())
</xmp>

<p>The script loads the data, uses it to construct a classifier
with the naive Bayesian method, and then classifies the first five
instances of the data set. For each instance, it prints both the
class assigned by the classifier and the original class.</p>

<p>The data set that we use contains the votes of each member of
the U.S. House of Representatives on 16 key votes; the class is the
representative's party. There are 435 data instances in the data
set (267 democrats and 168 republicans); see the voting-records data
set in the UCI ML Repository for a further description. This is how
our classifier performs on the first five instances:</p>

<xmp class="code">1: republican (originally republican)
2: republican (originally republican)
3: republican (originally democrat)
4: democrat (originally democrat)
5: democrat (originally democrat)
</xmp>

<p>You can see that naive Bayes makes a mistake on the third
instance but otherwise predicts correctly.</p>
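<p>To put these predictions in context, a trivial classifier that always predicts the majority class would always say democrat on this data set. The plain-Python sketch below (not Orange code) works out that baseline from the class counts given above and compares it with the five predictions printed by the script; the two lists are copied from that output.</p>

```python
# Class counts of the voting data set, as given in the text above.
counts = {"democrat": 267, "republican": 168}
total = sum(counts.values())
majority = max(counts, key=counts.get)
baseline_accuracy = float(counts[majority]) / total
print("%s %d %.3f" % (majority, total, baseline_accuracy))

# Original and predicted classes of the first five instances (script output).
original = ["republican", "republican", "democrat", "democrat", "democrat"]
predicted = ["republican", "republican", "republican", "democrat", "democrat"]
bayes_hits = sum(1 for o, p in zip(original, predicted) if o == p)
baseline_hits = sum(1 for o in original if o == majority)
print("naive Bayes: %d/5, majority class: %d/5" % (bayes_hits, baseline_hits))
```

<p>The majority class covers 267 of 435 instances (about 0.614). Five instances, classified by a model trained on those same instances, are far too few to compare methods; the sketch only illustrates the arithmetic.</p>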

<h2>Obtaining Class Probabilities</h2>

<p>To find out what probability the classifier assigns to, say, the
democrat class, we need to call the classifier with the additional
parameter <code>orange.GetProbabilities</code>. Also note that the
democrats have class index 1 (we find this out by printing
<code>data.domain.classVar.values</code>; notice that indices in Python start
with 0; also notice that we have set the order of classes in
<code>voting.tab</code>: instead of writing discrete for the class
variable, we listed its set of possible values in the desired
order).</p>
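<p>The class index is simply the position of a value in the ordered list of class values, and the probability vector returned by the classifier is indexed the same way. The plain-Python sketch below illustrates this with a tuple standing in for <code>data.domain.classVar.values</code>; the helper <code>prob_of</code> and the probability values are hypothetical, chosen only for illustration.</p>

```python
# Stand-in for data.domain.classVar.values; the order is the one declared in the data file.
class_values = ("republican", "democrat")

# Python indices start at 0, so democrat is at index 1.
democrat_index = class_values.index("democrat")
print(democrat_index)

# A class probability vector uses the same order:
# p[0] is P(republican), p[1] is P(democrat).
p = [0.25, 0.75]  # hypothetical probabilities for one instance


def prob_of(probs, values, name):
    """Hypothetical helper: look up a class probability by class name."""
    return probs[list(values).index(name)]


print("%.2f" % prob_of(p, class_values, "democrat"))
```

<p>Looking the index up by name, rather than hard-coding 1, keeps such code correct if the class values are ever declared in a different order.</p>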

<p class="header"><a href="classifier2.py">classifier2.py</a> (uses <a href=
"voting.tab">voting.tab</a>)</p>
<xmp class="code">import orange
data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)
print "Possible classes:", data.domain.classVar.values
print "Probabilities for democrats:"
for i in range(5):
    # orange.GetProbabilities makes the classifier return class probabilities
    p = classifier(data[i], orange.GetProbabilities)
    # p[1] is the probability of the class with index 1 (democrat)
    print "%d: %5.3f (originally %s)" % (i+1, p[1], data[i].getclass())
</xmp>

<p>The output of this script is:</p>
<xmp class="code">Possible classes: <republican, democrat>
Probabilities for democrats:
1: 0.000 (originally republican)
2: 0.000 (originally republican)
3: 0.005 (originally democrat)
4: 0.998 (originally democrat)
5: 0.957 (originally democrat)
</xmp>

<p>The printout shows, for example, that on the third instance naive
Bayes did not just misclassify but missed by a wide margin: it
assigned a probability of only 0.005 to the correct class.</p>
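<p>With two classes, the predicted class is simply the more probable one, so the probabilities above should reproduce the classifications printed by the first script. The plain-Python sketch below checks this using the printed (rounded) probabilities for the democrat class, and also computes the probability that each instance's correct class received. Sending a tie at exactly 0.5 to republican is an arbitrary choice of this sketch, not necessarily what Orange does.</p>

```python
# P(democrat) for the first five instances and their original classes,
# copied from the output above (rounded to three decimals).
p_democrat = [0.000, 0.000, 0.005, 0.998, 0.957]
original = ["republican", "republican", "democrat", "democrat", "democrat"]

# With two classes, predict the more probable one; a tie at 0.5 goes to republican here.
predicted = ["democrat" if p > 0.5 else "republican" for p in p_democrat]

# 1-based numbers of the misclassified instances.
errors = [i + 1 for i, (pr, o) in enumerate(zip(predicted, original)) if pr != o]

# Probability assigned to the correct class: p for democrats, 1 - p for republicans.
p_correct = [p if o == "democrat" else 1 - p for p, o in zip(p_democrat, original)]

print(predicted)
print(errors)
print(["%.3f" % q for q in p_correct])
```

<p>The predictions match the first script's output, the only error is instance 3, and its correct class received the lowest probability of the five (0.005).</p>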

<h2>Where Next?</h2>

<p>In Orange, most classifiers can predict the class, the class
probabilities, or both, so what you have learned here is fairly
general. If you want a taste of some other Orange classifiers, check
the <a href=
"c_otherclass.htm">next lesson</a>. Alternatively, you may go
directly to see how the classifiers are tested and <a href=
"c_performance.htm">evaluated</a>.</p>

</body></html>