<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>

<p class="Path">
Prev: <a href="c_pythonlearner.htm">Build Your Own Learner</a>,
Next: <a href="c_nb.htm">Naive Bayes in Python</a>,
Up: <a href="c_pythonlearner.htm">Build Your Own Learner</a>,
</p>

<H1>Naive Bayes with Discretization</H1>

<p>Let us build a learner/classifier that extends the built-in
naive Bayes and that discretizes (categorizes) the data before learning
(see also the lesson on <a href="o_categorization.htm">Categorization</a>).
We will define a module <a href="nbdisc.py">nbdisc.py</a> that
implements our method. As we have explained in the <a href=
"c_pythonlearner.htm">introductory text on learners/classifiers</a>,
it will implement two classes, Learner and Classifier. First,
here is the definition of the Learner class:</p>

<p class="header">class <code>Learner</code> from <a href=
"nbdisc.py">nbdisc.py</a></p>
<xmp class="code">import orange

class Learner(object):
    def __new__(cls, examples=None, name='discretized bayes', **kwds):
        learner = object.__new__(cls, **kwds)
        if examples:
            learner.__init__(name)   # force init
            return learner(examples) # return a classifier built on the examples
        else:
            return learner  # Python will invoke __init__ on the returned learner

    def __init__(self, name='discretized bayes'):
        self.name = name

    def __call__(self, data, weight=None):
        # discretize with Fayyad & Irani's entropy-based method,
        # then build a naive Bayesian model on the discretized data
        disc = orange.Preprocessor_discretize(
            data, method=orange.EntropyDiscretization())
        model = orange.BayesLearner(disc, weight)
        return Classifier(classifier=model)
</xmp>

<p><code>Learner</code> has three methods. Method <code>__new__</code>
creates the object and returns either a learner or a classifier, depending
on whether examples were passed to the call. If examples were passed,
the method calls the freshly created learner (invoking its
<code>__call__</code> method) and returns the resulting classifier. Method
<code>__init__</code> is invoked whenever a new instance of the class is
created. Notice that all it does is remember the only argument that this
class can be called with, i.e. the argument <code>name</code>, which
defaults to &lsquo;discretized bayes&rsquo;. If you expect any other
arguments for your learners, you should handle them here (store them
as the object&rsquo;s attributes using <code>self</code>), as sketched
below.</p>

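<p>For illustration, here is a minimal sketch of how such an extra argument
could be handled: a hypothetical variant of the learner that accepts a
<code>discretization</code> argument (our own addition, not part of
<a href="nbdisc.py">nbdisc.py</a>), stores it in <code>__init__</code>, and
uses it later in <code>__call__</code>:</p>

<xmp class="code"># hypothetical sketch, not part of nbdisc.py
class Learner(object):
    def __new__(cls, examples=None, **kwds):
        learner = object.__new__(cls)
        learner.__init__(**kwds)      # force init with all keyword arguments
        if examples:
            return learner(examples)  # data given: return a classifier at once
        return learner

    def __init__(self, name='discretized bayes', discretization=None):
        self.name = name
        # remember the extra argument so that __call__ can use it later
        self.discretization = discretization or orange.EntropyDiscretization()

    def __call__(self, data, weight=None):
        disc = orange.Preprocessor_discretize(data, method=self.discretization)
        return Classifier(classifier=orange.BayesLearner(disc, weight))
</xmp>
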
<p>If we have created an instance of the learner (and did not pass the
examples as an argument), calling it with data will invoke its
<code>__call__</code> method, where the essence of our learner is
implemented. Notice also that we have included an argument for a vector
of instance weights, which is passed on to the naive Bayesian learner.
In our learner, we first discretize the data using Fayyad &amp;
Irani&rsquo;s entropy-based discretization, then build a naive Bayesian
model and finally pass it to the class <code>Classifier</code>. At
construction, the <code>Classifier</code> simply remembers the model it
was called with:</p>

<p class="header">class <code>Classifier</code> from <a href=
"nbdisc.py">nbdisc.py</a></p>
<xmp class="code">class Classifier:
    def __init__(self, **kwds):
        # remember all keyword arguments (here: the wrapped naive Bayesian model)
        self.__dict__.update(kwds)

    def __call__(self, example, resultType=orange.GetValue):
        # delegate classification to the wrapped model
        return self.classifier(example, resultType)
</xmp>


<p>The method <code>__init__</code> in <code>Classifier</code> is
rather general: it makes <code>Classifier</code> remember all keyword
arguments it was called with. They are then accessible as the
<code>Classifier</code>&rsquo;s attributes
(<code>self.argument_name</code>). When the <code>Classifier</code> is
called, it expects an example and an optional argument that specifies
the type of result to be returned.</p>

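<p>For instance, since the learner passes the model as
<code>classifier=model</code>, the wrapped naive Bayesian model can later be
reached as an ordinary attribute, and the optional argument can request class
probabilities instead of a value. A small illustration on the iris data (the
variable names are ours):</p>

<xmp class="code">import orange, nbdisc
data = orange.ExampleTable("iris")
c = nbdisc.Learner(data)
print c.classifier                        # the model stored as 'classifier'
print c(data[0], orange.GetProbabilities) # class probabilities instead of a value
</xmp>
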
<p>This completes our code for a naive Bayesian classifier with
discretization. You can see that the code is fairly short (fewer than
20 lines), and it can easily be extended or changed if we want it to do
something else as well (include feature subset selection, for instance,
&hellip;); see the sketch below.</p>

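<p>As one possible illustration of such an extension, here is a sketch of a
learner that selects the three best attributes before discretization. It
assumes the <code>orngFSS</code> module from the lesson on feature subset
selection (<code>attMeasure</code> and <code>selectBestNAtts</code>); both the
use of <code>orngFSS</code> and the choice of three attributes are our own,
not part of <a href="nbdisc.py">nbdisc.py</a>:</p>

<xmp class="code"># hypothetical sketch, not part of nbdisc.py
import orange, orngFSS, nbdisc

class FSSLearner(nbdisc.Learner):
    def __call__(self, data, weight=None):
        measures = orngFSS.attMeasure(data)                # score the attributes
        data = orngFSS.selectBestNAtts(data, measures, 3)  # keep the best three
        disc = orange.Preprocessor_discretize(
            data, method=orange.EntropyDiscretization())
        model = orange.BayesLearner(disc, weight)
        return nbdisc.Classifier(classifier=model)
</xmp>
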
<p>Here are a few lines to test our code:</p>

<p class="header">uses <a href="iris.tab">iris.tab</a> and <a href=
"nbdisc.py">nbdisc.py</a></p>
<pre class="code">
> <code>python</code>
>>> <code>import orange, nbdisc</code>
>>> <code>data = orange.ExampleTable("iris")</code>
>>> <code>classifier = nbdisc.Learner(data)</code>
>>> <code>print classifier(data[100])</code>
Iris-virginica
>>> <code>classifier(data[100], orange.GetBoth)</code>
(&lt;orange.Value 'iris'='Iris-virginica'&gt;, &lt;0.000, 0.001, 0.999&gt;)
>>>
</pre>

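<p>If you are curious about what the discretization step inside the learner
actually does to the data, a few lines like the following (a sketch reusing
the same <code>Preprocessor_discretize</code> call as in the learner) print
the intervals that entropy-based discretization finds on iris:</p>

<xmp class="code">import orange
data = orange.ExampleTable("iris")
disc = orange.Preprocessor_discretize(
    data, method=orange.EntropyDiscretization())
for attribute in disc.domain.attributes:
    print attribute.name, attribute.values
</xmp>
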
<p>For a more elaborate test that also shows the use of a learner that
is not given the data at initialization, here is a script that
performs 10-fold cross-validation:</p>

<p class="header">
<a href=
"nbdisc_test.py">nbdisc_test.py</a>
(uses <a href="iris.tab">iris.tab</a> and
<a href="nbdisc.py">nbdisc.py</a>)</p>
<xmp class="code">import orange, orngEval, nbdisc
data = orange.ExampleTable("iris")
results = orngEval.CrossValidation([nbdisc.Learner()], data)
print "Accuracy = %5.3f" % orngEval.CA(results)[0]
</xmp>

<p>The accuracy on this data set is about 92%. You may try to obtain
better accuracy by using some other type of discretization, or try
some other learner on this data (hint: k-NN should perform
better); one such comparison is sketched below.</p>

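<p>For example, a sketch of such a comparison (our own; it assumes Orange&rsquo;s
k-NN learner, <code>orange.kNNLearner</code>, and an arbitrary choice of
<code>k=10</code>) could run both learners through the same cross-validation:</p>

<xmp class="code">import orange, orngEval, nbdisc

data = orange.ExampleTable("iris")

knn = orange.kNNLearner(k=10)   # k=10 is an arbitrary choice for this sketch
knn.name = "k-NN"

learners = [nbdisc.Learner(), knn]
results = orngEval.CrossValidation(learners, data)
for i in range(len(learners)):
    print "%-20s %5.3f" % (learners[i].name, orngEval.CA(results)[i])
</xmp>
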
<p>You can now read on to see how the same scheme for developing
your own classifier was used to assemble an all-in-Python <a href=
"c_nb.htm">naive Bayesian method</a>, and to see how easy it is to
implement <a href="c_bagging.htm">bagging</a>.</p>

<hr><br><p class="Path">
Prev: <a href="c_pythonlearner.htm">Build Your Own Learner</a>,
Next: <a href="c_nb.htm">Naive Bayes in Python</a>,
Up: <a href="c_pythonlearner.htm">Build Your Own Learner</a>,
</p>

</body>
</html>
