source: orange/Orange/ensemble/__init__.py @ 9671:a7b056375472

Revision 9671:a7b056375472, 5.7 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

Line 
1"""
2
3.. index:: ensemble
4
5Module Orange.ensemble implements Breiman's bagging and Random Forest,
6and Freund and Schapire's boosting algorithms.
7
8
9*******
10Bagging
11*******
12
13.. index:: bagging
14.. index::
15   single: ensemble; ensemble
16
17.. autoclass:: Orange.ensemble.bagging.BaggedLearner
18   :members:
19   :show-inheritance:
20
21.. autoclass:: Orange.ensemble.bagging.BaggedClassifier
22   :members:
23   :show-inheritance:
24
25********
26Boosting
27********
28
29.. index:: boosting
30.. index::
31   single: ensemble; boosting
32
33
34.. autoclass:: Orange.ensemble.boosting.BoostedLearner
35  :members:
36  :show-inheritance:
37
38.. autoclass:: Orange.ensemble.boosting.BoostedClassifier
39   :members:
40   :show-inheritance:
41
42Example
43=======
44Let us try boosting and bagging on Lymphography data set and use TreeLearner
45with post-pruning as a base learner. For testing, we use 10-fold cross
46validation and observe classification accuracy.
47
48:download:`ensemble.py <code/ensemble.py>` (uses :download:`lymphography.tab <code/lymphography.tab>`)
49
50.. literalinclude:: code/ensemble.py
51  :lines: 7-
52
53Running this script, we may get something like::
54
55    Classification Accuracy:
56               tree: 0.764
57       boosted tree: 0.770
58        bagged tree: 0.790
59
60
61*************
62Random Forest
63*************
64
65.. index:: random forest
66.. index::
67   single: ensemble; random forest
68   
69.. autoclass:: Orange.ensemble.forest.RandomForestLearner
70  :members:
71  :show-inheritance:
72
73.. autoclass:: Orange.ensemble.forest.RandomForestClassifier
74  :members:
75  :show-inheritance:
76
77
78Example
79========
80
81The following script assembles a random forest learner and compares it
82to a tree learner on a liver disorder (bupa) and housing data sets.
83
84:download:`ensemble-forest.py <code/ensemble-forest.py>` (uses :download:`bupa.tab <code/bupa.tab>`, :download:`housing.tab <code/housing.tab>`)
85
86.. literalinclude:: code/ensemble-forest.py
87  :lines: 7-
88
89Notice that our forest contains 50 trees. Learners are compared through
903-fold cross validation::
91
92    Classification: bupa.tab
93    Learner  CA     Brier  AUC
94    tree     0.586  0.829  0.575
95    forest   0.710  0.392  0.752
96    Regression: housing.tab
97    Learner  MSE    RSE    R2
98    tree     23.708  0.281  0.719
99    forest   11.988  0.142  0.858
100
101Perhaps the sole purpose of the following example is to show how to
102access the individual classifiers once they are assembled into the
103forest, and to show how we can assemble a tree learner to be used in
104random forests. In the following example the best feature for decision
105nodes is selected among three randomly chosen features, and maxDepth
106and minExamples are both set to 5.
107
108:download:`ensemble-forest2.py <code/ensemble-forest2.py>` (uses :download:`bupa.tab <code/bupa.tab>`)
109
110.. literalinclude:: code/ensemble-forest2.py
111  :lines: 7-
112
113Running the above code would report on sizes (number of nodes) of the tree
114in a constructed random forest.
115
116   
117Score Feature
118=============
119
120L. Breiman (2001) suggested the possibility of using random forests as a
121non-myopic measure of feature importance.
122
123The assessment of feature relevance with random forests is based on the
124idea that randomly changing the value of an important feature greatly
125affects instance's classification, while changing the value of an
126unimportant feature does not affect it much. Implemented algorithm
127accumulates feature scores over given number of trees. Importance of
128all features for a single tree are computed as: correctly classified
129OOB instances minus correctly classified OOB instances when the feature is
130randomly shuffled. The accumulated feature scores are divided by the
131number of used trees and multiplied by 100 before they are returned.
132
133.. autoclass:: Orange.ensemble.forest.ScoreFeature
134  :members:
135
136Computation of feature importance with random forests is rather slow and
137importances for all features need to be computes simultaneously. When it
138is called to compute a quality of certain feature, it computes qualities
139for all features in the dataset. When called again, it uses the stored
140results if the domain is still the same and the data table has not
141changed (this is done by checking the data table's version and is
142not foolproof; it will not detect if you change values of existing instances,
143but will notice adding and removing instances; see the page on
144:class:`Orange.data.Table` for details).
145
146:download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>` (uses :download:`iris.tab <code/iris.tab>`)
147
148.. literalinclude:: code/ensemble-forest-measure.py
149  :lines: 7-
150
151Corresponding output::
152
153    DATA:iris.tab
154
155    first: 3.91, second: 0.38
156
157    different random seed
158    first: 3.39, second: 0.46
159
160    All importances:
161       sepal length:   3.39
162        sepal width:   0.46
163       petal length:  30.15
164        petal width:  31.98
165
166References
167-----------
168* L Breiman. Bagging Predictors. `Technical report No. 421 \
169    <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of \
170    California, Berkeley, 1994.
171* Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm \
172    <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine \
173    Learning: Proceedings of the Thirteenth International Conference (ICML'96), 1996.
174* JR Quinlan. `Boosting, bagging, and C4.5 \
175    <http://www.rulequest.com/Personal/q.aaai96.ps>`_ . In Proc. of 13th \
176    National Conference on Artificial Intelligence (AAAI'96). pp. 725-730, 1996.
177* L Brieman. `Random Forests \
178    <http://www.springerlink.com/content/u0p06167n6173512/>`_.\
179    Machine Learning, 45, 5-32, 2001.
180* M Robnik-Sikonja. `Improving Random Forests \
181    <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In \
182    Proc. of European Conference on Machine Learning (ECML 2004),\
183    pp. 359-370, 2004.
184"""
185
186__all__ = ["bagging", "boosting", "forest"]
187__docformat__ = 'restructuredtext'
188import Orange.core as orange
Note: See TracBrowser for help on using the repository browser.