##################################
Ensemble algorithms (``ensemble``)
##################################

.. index:: ensemble

`Ensembles <http://en.wikipedia.org/wiki/Ensemble_learning>`_ use
multiple models to improve prediction performance. The module
implements a number of popular approaches, including bagging,
boosting, stacking and random forests. Most of these are available
both for classification and regression, with the exception of
stacking, which in the present implementation supports classification
only.
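
In outline, and as a minimal sketch rather than a complete script, the
usage pattern is to wrap a base learner into an ensemble learner and
then use the wrapper like any other Orange learner; the ``t``
parameter below sets the number of models in the ensemble::

    import Orange

    data = Orange.data.Table("lymphography")
    tree = Orange.classification.tree.TreeLearner()

    # wrap the base learner into bagging and boosting ensembles
    bagged = Orange.ensemble.bagging.BaggedLearner(tree, t=10)
    boosted = Orange.ensemble.boosting.BoostedLearner(tree, t=10)

    # ensemble learners behave like any other Orange learner
    classifier = bagged(data)
    print(classifier(data[0]))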

*******
Bagging
*******

.. index:: bagging
.. index::
   single: ensemble; bagging

.. autoclass:: Orange.ensemble.bagging.BaggedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.bagging.BaggedClassifier
   :members:
   :show-inheritance:

********
Boosting
********

.. index:: boosting
.. index::
   single: ensemble; boosting

.. autoclass:: Orange.ensemble.boosting.BoostedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.boosting.BoostedClassifier
   :members:
   :show-inheritance:

Example
=======

Let us try boosting and bagging on the lymphography data set, using a
TreeLearner with post-pruning as the base learner. For testing, we use
10-fold cross validation and observe classification accuracy.

:download:`ensemble.py <code/ensemble.py>`

.. literalinclude:: code/ensemble.py
  :lines: 7-

Running this script, we may get something like::

    Classification Accuracy:
               tree: 0.764
       boosted tree: 0.770
        bagged tree: 0.790

*************
Random Forest
*************

.. index:: random forest
.. index::
   single: ensemble; random forest

.. autoclass:: Orange.ensemble.forest.RandomForestLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.forest.RandomForestClassifier
   :members:
   :show-inheritance:

Example
=======

The following script assembles a random forest learner and compares it
to a tree learner on the liver disorders (bupa) and housing data sets.

:download:`ensemble-forest.py <code/ensemble-forest.py>`

.. literalinclude:: code/ensemble-forest.py
  :lines: 7-

Notice that our forest contains 50 trees. Learners are compared through
3-fold cross validation::

    Classification: bupa.tab
    Learner  CA     Brier  AUC
    tree     0.586  0.829  0.575
    forest   0.710  0.392  0.752

    Regression: housing.tab
    Learner  MSE     RSE    R2
    tree     23.708  0.281  0.719
    forest   11.988  0.142  0.858

The main purpose of the following example is to show how to access the
individual classifiers once they are assembled into the forest, and
how to assemble a tree learner to be used in random forests. The best
feature for the decision nodes is selected among three randomly chosen
features, and maxDepth and minExamples are both set to 5.

:download:`ensemble-forest2.py <code/ensemble-forest2.py>`

.. literalinclude:: code/ensemble-forest2.py
  :lines: 7-

Running the above code reports the sizes (number of nodes) of the
trees in the constructed random forest.

Feature scoring
===============

L. Breiman (2001) suggested the possibility of using random forests as
a non-myopic measure of feature importance.

The assessment of feature relevance with random forests is based on
the idea that randomly changing the value of an important feature
greatly affects an instance's classification, while changing the value
of an unimportant feature does not affect it much. The implemented
algorithm accumulates feature scores over a given number of trees. The
importance of a feature for a single tree is computed as the number of
correctly classified out-of-bag (OOB) instances minus the number of
correctly classified OOB instances when the feature's values are
randomly shuffled. The accumulated feature scores are divided by the
number of trees used and multiplied by 100 before they are returned.
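
The sketch below restates this computation in plain Python. Everything
in it is a stand-in for the internals of a fitted forest: ``trees`` is
assumed to be a list of callables mapping a feature list to a class
label, and ``oob_indices`` holds, for each tree, the indices of its
out-of-bag instances::

    import random

    def oob_permutation_score(trees, oob_indices, X, y, feature):
        # X: list of feature lists, y: list of class labels,
        # feature: index of the column whose importance is assessed
        total = 0
        for tree, oob in zip(trees, oob_indices):
            # accuracy on the untouched out-of-bag instances
            correct = sum(tree(X[i]) == y[i] for i in oob)
            # permute this feature's values among the OOB instances
            values = [X[i][feature] for i in oob]
            random.shuffle(values)
            permuted = 0
            for i, v in zip(oob, values):
                row = list(X[i])
                row[feature] = v
                permuted += tree(row) == y[i]
            total += correct - permuted
        # average over trees and scale by 100, as described above
        return 100.0 * total / len(trees)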

.. autoclass:: Orange.ensemble.forest.ScoreFeature
   :members:

Computation of feature importance with random forests is rather slow,
and importances for all features need to be computed
simultaneously. When the scorer is called to compute the quality of a
certain feature, it computes the qualities of all features in the
dataset. When called again, it uses the stored results if the domain
is still the same and the data table has not changed (this is done by
checking the data table's version, and is not foolproof; it will not
detect changes to the values of existing instances, but will notice
adding and removing of instances; see the page on
:class:`Orange.data.Table` for details).
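
For illustration, here is a minimal sketch of that behaviour, assuming
the standard Orange 2.x feature-scoring call convention in which a
scoring object is called with a feature and a data table::

    import Orange

    data = Orange.data.Table("iris")
    measure = Orange.ensemble.forest.ScoreFeature()

    # the first call grows the forest and scores all features at once
    print(measure(data.domain.attributes[0], data))
    # further calls on the unchanged table reuse the cached scores
    print(measure(data.domain.attributes[1], data))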

:download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>`

.. literalinclude:: code/ensemble-forest-measure.py
  :lines: 7-

The output of the above script is::

    DATA:iris.tab

    first: 3.91, second: 0.38

    different random seed
    first: 3.39, second: 0.46

    All importances:
       sepal length:   3.39
        sepal width:   0.46
       petal length:  30.15
        petal width:  31.98

References
----------

* L Breiman. Bagging Predictors. `Technical report No. 421
  <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of
  California, Berkeley, 1994.
* Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm
  <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine
  Learning: Proceedings of the Thirteenth International Conference
  (ICML'96), 1996.
* JR Quinlan. `Boosting, bagging, and C4.5
  <http://www.rulequest.com/Personal/q.aaai96.ps>`_. In Proc. of 13th
  National Conference on Artificial Intelligence (AAAI'96),
  pp. 725-730, 1996.
* L Breiman. `Random Forests
  <http://www.springerlink.com/content/u0p06167n6173512/>`_. Machine
  Learning, 45, 5-32, 2001.
* M Robnik-Sikonja. `Improving Random Forests
  <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In
  Proc. of European Conference on Machine Learning (ECML 2004),
  pp. 359-370, 2004.
191"""

.. automodule:: Orange.ensemble