source: orange/orange/doc/ofb/o_ensemble.htm @ 6538:a5f65d7f0b2c

<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>

<p class="Path">
Prev: <a href="o_fss.htm">Feature subset selection</a>,
Next: <a href="domain.htm">Basic data manipulation</a>,
Up: <a href="other.htm">Other Techniques for Orange Scripting</a>
</p>

<H1>Ensemble Techniques</H1>
<index name="ensemble learning">

<p>Building ensemble classifiers in Orange is simple and
easy. Starting from learners/classifiers that can predict
probabilities and, if needed, use example weights, ensembles are
essentially wrappers that aggregate the predictions of a list of
constructed classifiers. These wrappers behave exactly like other
Orange learners/classifiers. We will first show how to use the
bagging and boosting module included in the Orange distribution (the <a
href="../modules/orngEnsemble.htm">orngEnsemble</a> module), and then,
for a somewhat more advanced example, build our own ensemble
learner.</p>

<h2>Ensemble learning using orngEnsemble</h2>
<index name="modules+bagging">
<index name="modules+boosting">
<index name="bagging">
<index name="boosting">

<p>First, there is a module for <a
href="../modules/orngEnsemble.htm">Bagging and Boosting</a>, and
using it is very easy: you define a learner and give it to a bagger
or a booster, which in turn returns a new (bagged or boosted)
learner. Here is an example:</p>

<p class="header"><a href="ensemble3.py">ensemble3.py</a> (uses <a href=
"promoters.tab">promoters.tab</a>)</p>
<xmp class=code>import orange, orngTest, orngStat, orngEnsemble
data = orange.ExampleTable("promoters")

majority = orange.MajorityLearner()
majority.name = "default"
knn = orange.kNNLearner(k=11)
knn.name = "k-NN (k=11)"

bagged_knn = orngEnsemble.BaggedLearner(knn, t=10)
bagged_knn.name = "bagged k-NN"
boosted_knn = orngEnsemble.BoostedLearner(knn, t=10)
boosted_knn.name = "boosted k-NN"

learners = [majority, knn, bagged_knn, boosted_knn]
results = orngTest.crossValidation(learners, data, folds=10)
print "        Learner   CA     Brier Score"
for i in range(len(learners)):
    print ("%15s:  %5.3f  %5.3f") % (learners[i].name,
        orngStat.CA(results)[i], orngStat.BrierScore(results)[i])
</xmp>

<p>Most of the code is used for defining and naming the objects that
learn, and the last few lines report the evaluation
results. Notice that bagging or boosting a learner takes only a single
line of code (like <code>bagged_knn = orngEnsemble.BaggedLearner(knn,
t=10)</code>)! The parameter <code>t</code> in bagging and boosting is
the number of classifiers that will be used for voting (or, if you
prefer, the number of bagging/boosting iterations). Depending on
your random generator, you may get something like:</p>

<XMP class=code>        Learner   CA     Brier Score
        default:  0.473  0.501
    k-NN (k=11):  0.859  0.240
    bagged k-NN:  0.813  0.257
   boosted k-NN:  0.830  0.244
</XMP>
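
<p>The number of iterations <code>t</code> is the main knob to turn
here. The following is a minimal sketch (not part of the original
scripts) that compares bagged k-NN for a few values of
<code>t</code>, using only the calls shown above:</p>

<xmp class=code>import orange, orngTest, orngStat, orngEnsemble
data = orange.ExampleTable("promoters")
knn = orange.kNNLearner(k=11)

# build one bagged learner per setting of t and compare their accuracies
learners = []
for t in [1, 5, 10, 20]:
    bagged = orngEnsemble.BaggedLearner(knn, t=t)
    bagged.name = "bagged k-NN (t=%d)" % t
    learners.append(bagged)

results = orngTest.crossValidation(learners, data, folds=10)
for i in range(len(learners)):
    print "%20s: %5.3f" % (learners[i].name, orngStat.CA(results)[i])
</xmp>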


<h2>Build Your Own Ensemble Learner</h2>
<index name="ensemble learning/in Python">
<index name="bagging/in Python">

<p>If you have followed this tutorial from the start, building
your own ensemble learner is nothing new: you have already
built a <a href="c_bagging.htm">module for bagging</a>.
Here is another, similar example: we will build a learner that
takes a list of learners, obtains classifiers by training them
on the example set, and, when classifying, uses the classifiers
to estimate probabilities and goes with the class that is predicted
with the highest probability. That is, in the end, the prediction
of a single classifier counts. If class probabilities are requested,
they are reported as computed by this very classifier.</p>

<p>Here is the code that implements our learner and classifier:</p>

<p class="header">part of <a href="ensemble2.py">ensemble2.py</a></p>
<xmp class=code>def WinnerLearner(examples=None, **kwds):
  learner = apply(WinnerLearner_Class, (), kwds)
  if examples:
    return learner(examples)
  else:
    return learner

class WinnerLearner_Class:
  def __init__(self, name='winner classifier', learners=None):
    self.name = name
    self.learners = learners

  def __call__(self, data, learners=None, weight=None):
    if learners:
      self.learners = learners
    # train each of the learners on the data
    classifiers = []
    for l in self.learners:
      classifiers.append(l(data))
    return WinnerClassifier(classifiers = classifiers)

class WinnerClassifier:
  def __init__(self, **kwds):
    self.__dict__.update(kwds)

  def __call__(self, example, resultType = orange.GetValue):
    # collect class probability distributions from all classifiers
    pmatrix = []
    for c in self.classifiers:
      pmatrix.append(c(example, orange.GetProbabilities))

    maxp = []  # stores the max class probability for each classifier
    for pv in pmatrix:
      maxp.append(max(pv))

    p = max(maxp)  # the highest class probability overall
    classifier_index = maxp.index(p)
    c = pmatrix[classifier_index].modus()  # the winning classifier's prediction

    if resultType == orange.GetValue:
      return c
    elif resultType == orange.GetProbabilities:
      return pmatrix[classifier_index]
    else:
      return (c, pmatrix[classifier_index])
</xmp>

<p><code>WinnerLearner_Class</code> stores the learners and, when
called with a data set, constructs the classifiers and passes them to
<code>WinnerClassifier</code>. When the latter is called with a data
instance, it computes the class probabilities, finds the highest
probability assigned to any class, and accordingly reports the
class, the probabilities, or both. Notice that we have also taken
care that this learner/classifier conforms to everything that is
expected of such objects in Orange, so it may be used by other
modules, most importantly for classifier validation.</p>
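
<p>Since the classifier follows the standard calling convention, it
can also be invoked directly on single examples. A minimal sketch
(assuming the definitions above and promoters.tab):</p>

<xmp class=code>import orange, orngTree

data = orange.ExampleTable("promoters")
winner = WinnerLearner(learners=[orngTree.TreeLearner(), orange.BayesLearner()])
classifier = winner(data)

# ask for the predicted value, the probabilities, or both
example = data[0]
print classifier(example)
print classifier(example, orange.GetProbabilities)
print classifier(example, orange.GetBoth)
</xmp>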

<p>The use of our new learner is exemplified by the following
script:</p>

<p class="header">part of <a href="ensemble2.py">ensemble2.py</a> (uses <a href=
"promoters.tab">promoters.tab</a>)</p>
<xmp class=code>tree = orngTree.TreeLearner(mForPruning=5.0)
tree.name = 'class. tree'
bayes = orange.BayesLearner()
bayes.name = 'naive bayes'
winner = WinnerLearner(learners=[tree, bayes])
winner.name = 'winner'

majority = orange.MajorityLearner()
majority.name = 'default'
learners = [majority, tree, bayes, winner]

data = orange.ExampleTable("promoters")

results = orngTest.crossValidation(learners, data)
print "Classification Accuracy:"
for i in range(len(learners)):
    print ("%15s: %5.3f") % (learners[i].name, orngStat.CA(results)[i])
</xmp>

<p>Notice again that invoking our new objects and using them for
machine learning is just as easy as using any other learner. When
run, this script may report something like:</p>

<XMP class=code>Classification Accuracy:
        default: 0.472
    class. tree: 0.830
    naive bayes: 0.868
         winner: 0.877
</XMP>

<p>The script above that implements <code>WinnerLearner</code> and
<code>WinnerClassifier</code> may easily be adapted to the ensemble
learner you need. As an exercise, change the learner so that it
uses cross-validation to estimate the probability that each
classifier will predict correctly, and, when classifying, use these
probabilities to weight the classifiers, turning the winner schema
into weighted voting.
</p>
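
<p>As a starting point for the exercise, here is one possible sketch
(our own, not part of ensemble2.py). The learner cross-validates each
of its learners on the training data to obtain a classification
accuracy, and the classifier sums the accuracy-weighted class
probabilities before picking the winner; it assumes the same Orange
calling conventions used by <code>WinnerLearner</code> above:</p>

<xmp class=code>class VotingLearner_Class:
  def __init__(self, name='voting classifier', learners=None):
    self.name = name
    self.learners = learners

  def __call__(self, data, weight=None):
    # weight each learner by its cross-validated accuracy on the training data
    results = orngTest.crossValidation(self.learners, data, folds=5)
    weights = orngStat.CA(results)
    classifiers = [l(data) for l in self.learners]
    return VotingClassifier(classifiers=classifiers, weights=weights)

class VotingClassifier:
  def __init__(self, **kwds):
    self.__dict__.update(kwds)

  def __call__(self, example, resultType=orange.GetValue):
    # sum accuracy-weighted class probabilities over all classifiers
    n = len(example.domain.classVar.values)
    votes = [0.0] * n
    for c, w in zip(self.classifiers, self.weights):
      p = c(example, orange.GetProbabilities)
      for i in range(n):
        votes[i] += w * p[i]
    total = sum(votes)
    votes = [v / total for v in votes]  # renormalize to probabilities
    winner = orange.Value(example.domain.classVar,
                          votes.index(max(votes)))

    if resultType == orange.GetValue:
      return winner
    elif resultType == orange.GetProbabilities:
      return orange.DiscDistribution(votes)
    else:
      return (winner, orange.DiscDistribution(votes))
</xmp>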

<hr><br><p class="Path">
Prev: <a href="o_fss.htm">Feature subset selection</a>,
Next: <a href="domain.htm">Basic data manipulation</a>,
Up: <a href="other.htm">Other Techniques for Orange Scripting</a>
</p>


</body></html>