source: orange/Orange/doc/ofb/c_otherclass.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 9.7 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">

<p class="Path">
Prev: <a href="c_basics.htm">My first Orange classifier</a>,
Next: <a href="c_performance.htm">Testing and Evaluating</a>,
Up: <a href="classification.htm">Classification</a>
</p>
<H1>Selected Classification Methods</H1>
<p>Orange supports a number of classification techniques, for instance
classification trees, variants of naive Bayes,
k-nearest neighbors, classification through association rules,
function decomposition, logistic regression, and support vector
machines. We have already seen naive Bayes, and here we will
look at a few more. Bear in mind that probably the best (and
sometimes the only) way to access the different methods is through
their associated <a href="../modules/default.htm">modules</a>, so you
should look there for more detailed documentation.</p>
<h2>Classification Tree</h2>
<index name="classifiers+classification trees">
<p>Let us look briefly at a different learning method.
The classification tree learner (yes, this is the same as a decision tree)
is another native Orange learner, but because it is a rather
complex object that is, for its versatility, composed of a number of
other objects (for attribute estimation, stopping criteria, etc.),
a wrapper (module) called <code>orngTree</code> was built around it to simplify
the use of classification trees and to assemble the learner with
some usual (default) components. Here is a script that uses it:</p>
<p class="header"><a href=""></a> (uses <a href=""></a>)</p>
<xmp class="code">import orange, orngTree
data = orange.ExampleTable("voting")

tree = orngTree.TreeLearner(data, sameMajorityPruning=1, mForPruning=2)
print "Possible classes:", data.domain.classVar.values
print "Probabilities for democrats:"
for i in range(5):
    p = tree(data[i], orange.GetProbabilities)
    print "%d: %5.3f (originally %s)" % (i+1, p[1], data[i].getclass())
</xmp>
<p>Note that this script is almost the same as the one for naive
Bayes (<a href=""></a>), except
that we have imported another module (<code>orngTree</code>) and used the learner
<code>orngTree.TreeLearner</code> to build a classifier called <code>tree</code>.</p>
<p>For those of you that are at home with machine learning: the
default parameters for the tree learner assume that a single example is
enough to have a leaf for it, gain ratio is used for measuring
the quality of attributes that are considered for internal nodes of
the tree, and after the tree is constructed no pruning takes
place (see <a href="../modules/orngTree.htm">orngTree documentation</a> for details).
The resulting tree with default parameters would be rather big, so we have additionally
requested that leaves that share a common predecessor (node) are pruned if they classify to the same
class, and that the tree is post-pruned using the m-error estimate pruning method with
parameter m set to 2.0.</p>
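<p>For those curious about the m-error estimate itself: it smooths the class probability in a node toward the prior. Below is a minimal stand-alone sketch of the standard m-estimate formula in plain Python; it is an illustration, not the orngTree internals:</p>

```python
def m_estimate(n_c, n, p_a, m=2.0):
    """m-estimate of probability: (n_c + m * p_a) / (n + m).

    n_c -- number of examples of the class in the node
    n   -- number of all examples in the node
    p_a -- prior (a priori) probability of the class
    m   -- the m parameter; m=2 matches mForPruning=2 above
    """
    return (n_c + m * p_a) / (n + m)

# a node with 3 of 4 examples in one class, prior 0.6:
print(m_estimate(3, 4, 0.6))   # pulled from the raw 0.75 toward the prior
```

With a small node (small n), the estimate stays close to the prior; with many examples, it approaches the raw frequency n_c/n, which is what makes it useful for pruning small, unreliable leaves.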
<p>The output of the script that uses the classification tree learner is:</p>
<xmp class="code">Possible classes: <republican, democrat>
Probabilities for democrats:
1: 0.051 (originally republican)
2: 0.027 (originally republican)
3: 0.989 (originally democrat)
4: 0.985 (originally democrat)
5: 0.985 (originally democrat)
</xmp>
<p>Notice that all of the instances are classified correctly. The last
line of the script prints out the tree that was used for
classification:</p>

<p class="header">output of running the <a href=""></a> script</p>
<xmp class="code">physician-fee-freeze=n: democrat (98.52%)
physician-fee-freeze=y
|    synfuels-corporation-cutback=n: republican (97.25%)
|    synfuels-corporation-cutback=y
|    |    mx-missile=n
|    |    |    el-salvador-aid=y
|    |    |    |    adoption-of-the-budget-resolution=n: republican (85.33%)
|    |    |    |    adoption-of-the-budget-resolution=y
|    |    |    |    |    anti-satellite-test-ban=n: democrat (99.54%)
|    |    |    |    |    anti-satellite-test-ban=y: republican (100.00%)
|    |    |    el-salvador-aid=n
|    |    |    |    handicapped-infants=n: republican (100.00%)
|    |    |    |    handicapped-infants=y: democrat (99.77%)
|    |    mx-missile=y
|    |    |    religious-groups-in-schools=y: democrat (99.54%)
|    |    |    religious-groups-in-schools=n
|    |    |    |    immigration=y: republican (98.63%)
|    |    |    |    immigration=n
|    |    |    |    |    handicapped-infants=n: republican (98.63%)
|    |    |    |    |    handicapped-infants=y: democrat (99.77%)
</xmp>
<p>Notice that the printout states the decision at internal nodes and, for leaves, the class label that the tree would assign. The leaves also carry a probability for that class, estimated from the learning set of examples.</p>
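<p>The indented printout above can be produced by a short recursive walk over the tree. The sketch below is illustrative only: it uses a hypothetical nested-tuple structure (not the actual orngTree objects), and the probabilities in the smaller branches are invented:</p>

```python
def format_tree(node, depth=0):
    """Render a tree in the indented format shown above.

    A leaf is a (class_label, probability) tuple; an internal node is
    (attribute_name, {value: subtree}) -- a hypothetical structure.
    """
    lines = []
    attr, branches = node
    for value, sub in branches.items():
        prefix = "|    " * depth + "%s=%s" % (attr, value)
        if isinstance(sub[1], dict):          # internal node: recurse deeper
            lines.append(prefix)
            lines.extend(format_tree(sub, depth + 1))
        else:                                 # leaf: label and probability
            lines.append("%s: %s (%.2f%%)" % (prefix, sub[0], sub[1]))
    return lines

tree = ("physician-fee-freeze",
        {"n": ("democrat", 98.52),
         "y": ("synfuels-corporation-cutback",
               {"n": ("republican", 97.25),
                "y": ("mx-missile",           # probabilities below are invented
                      {"n": ("democrat", 60.00),
                       "y": ("republican", 55.00)})})})
print("\n".join(format_tree(tree)))
```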
<p>If you are more of a "visual" type, you may like the following presentation of
the tree better. This was achieved by printing out the tree to a so-called dot file
(the line of the script required for this is
<code>orngTree.printDot(tree, fileName='', internalNodeShape="ellipse", leafShape="box")</code>),
which was then compiled to PNG using <a href="">
AT&amp;T's Graphviz</a> program called dot (see <a href="../modules/orngTree.htm">orngTree documentation</a>
for more):</p>
<img src="tree.png" alt="A picture of decision tree" border="0">
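<p>If you are wondering what a dot file actually contains, the fragment below writes a tiny two-node graph by hand (<code>orngTree.printDot</code> does the same for the whole tree); the node names and labels here are just an illustration of the dot syntax:</p>

```python
# a minimal Graphviz dot source: one ellipse internal node, one box leaf
dot_source = """digraph tree {
    n0 [shape=ellipse, label="physician-fee-freeze"];
    n1 [shape=box, label="democrat\\n98.52%"];
    n0 -> n1 [label="n"];
}
"""
with open("tree_sketch.dot", "w") as f:
    f.write(dot_source)
# compile to a PNG with Graphviz: dot -Tpng tree_sketch.dot -o tree_sketch.png
```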
<h2>A Handful of Others</h2>
<index name="classifiers+k nearest neighbours">
<index name="classifiers+majority classifier">
<p>Let us here check on two other classifiers. The first one,
called the majority classifier, will seem rather useless, as it always
classifies to the majority class of the learning set. It predicts
class probabilities that are equal to the class distribution of the
learning set. While useless as such, it is often good to
compare this simplest classifier to any other classifier you test
&ndash; if your other classifier is not significantly better than the
majority classifier, then this may be a reason to sit back and
rethink your approach.</p>
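<p>A majority classifier is simple enough to write from scratch. The sketch below is plain Python, not the <code>orange.MajorityLearner</code> API; the 267/168 split used in the example matches the voting data set:</p>

```python
from collections import Counter

class MajorityClassifier:
    """Stores the training class distribution; always predicts the mode."""
    def __init__(self, labels):
        counts = Counter(labels)
        total = float(len(labels))
        self.distribution = {c: n / total for c, n in counts.items()}
        self.majority = counts.most_common(1)[0][0]

    def __call__(self, instance=None):
        # the instance is ignored entirely -- every prediction is the same
        return self.majority

clf = MajorityClassifier(["democrat"] * 267 + ["republican"] * 168)
print(clf("any instance"), round(clf.distribution["republican"], 3))
```

The republican share here, 168/435, is the constant 0.386 you will see repeated in the Majority column of the output later in this lesson.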
<p>The second classifier we are introducing here is based on the
k-nearest neighbors algorithm, an instance-based method that finds the
k examples from the learning set that are most similar to the instance
that has to be classified. From the set it obtains in this way, it
estimates class probabilities and uses the most frequent class for
prediction.</p>
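<p>The idea fits in a few lines of plain Python. This sketch is an illustration over numeric vectors only, not <code>orange.kNNLearner</code> (which also handles discrete attributes and can weight neighbors by distance):</p>

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (vector, label) pairs; returns the most frequent
    label among the k training examples closest to query (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(knn_classify(train, (0.2, 0.1), k=3))  # two "a" neighbours outvote one "b"
```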
<p>The following script takes naive Bayes, a classification tree
(which we have already seen), and the majority and k-nearest neighbors
classifiers (the new ones) and prints predictions for the first 10 instances
of the voting data set.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">import orange, orngTree
data = orange.ExampleTable("voting")

# setting up the classifiers
majority = orange.MajorityLearner(data)
bayes = orange.BayesLearner(data)
tree = orngTree.TreeLearner(data, sameMajorityPruning=1, mForPruning=2)
knn = orange.kNNLearner(data, k=21) = "Majority" = "Naive Bayes" = "Tree" = "kNN"

classifiers = [majority, bayes, tree, knn]

# print the head
print "Possible classes:", data.domain.classVar.values
print "Probability for republican:"
print "Original Class",
for l in classifiers:
    print "%-13s" % (,
print

# classify first 10 instances and print probabilities
for example in data[:10]:
    print "(%-10s)  " % (example.getclass()),
    for c in classifiers:
        p = apply(c, [example, orange.GetProbabilities])
        print "%5.3f        " % (p[0]),
    print
</xmp>
<p>The code is somewhat long, due to our effort to print the results
nicely. The first part of the code sets up our four classifiers
and gives them names. The classifiers are then put into the list
denoted by the variable <code>classifiers</code> (this is nice since, if we
needed to add another classifier, we would just define it and put it
in the list, and the rest of the code would not need to change). The
script then prints the header with the names of
the classifiers, and finally uses the classifiers to compute the
probabilities of classes. Note the special function <code>apply</code>, which we
have not met yet: it simply calls the function given as its
first argument, and passes it the arguments given in the
list. In our case, <code>apply</code> invokes our classifiers with a data
instance and a request to compute probabilities. The output of our
script is:</p>
<xmp class="code">Possible classes: <republican, democrat>
Probability for republican:
Original Class Majority      Naive Bayes   Tree          kNN
(republican)   0.386         1.000         0.949         1.000
(republican)   0.386         1.000         0.973         1.000
(democrat  )   0.386         0.995         0.011         0.138
(democrat  )   0.386         0.002         0.015         0.468
(democrat  )   0.386         0.043         0.015         0.035
(democrat  )   0.386         0.228         0.015         0.442
(democrat  )   0.386         1.000         0.973         0.977
(republican)   0.386         1.000         0.973         1.000
(republican)   0.386         1.000         0.973         1.000
(democrat  )   0.386         0.000         0.015         0.000
</xmp>
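<p>A side note on <code>apply</code>: the built-in was removed in Python 3, where <code>apply(f, args)</code> is spelled <code>f(*args)</code>. A tiny stand-alone illustration (the <code>classify</code> function here is a hypothetical stand-in, not an Orange classifier):</p>

```python
def classify(example, what):
    # stand-in for a classifier call; just reports its arguments
    return "%s(%s)" % (what, example)

args = ["instance-1", "GetProbabilities"]
# Python 2 spelling, as in the script above:  apply(classify, args)
# modern equivalent -- argument unpacking:
print(classify(*args))
```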
<p>Notice that the prediction of the majority class classifier does not
depend on the instance it classifies (of course!). Other than that,
it would be inappropriate to say anything conclusive about the quality
of the classifiers &ndash; for this, we will need to resort to
statistical methods for the comparison of classification models, about
which you can read in our <a href=
"c_performance.htm">next lesson</a>.</p>
<hr><br><p class="Path">
Prev: <a href="c_basics.htm">My first Orange classifier</a>,
Next: <a href="c_performance.htm">Testing and Evaluating</a>,
Up: <a href="classification.htm">Classification</a>
</p>