Changeset 10539:f81832a7af04 in orange


Timestamp:
03/15/12 17:16:00 (2 years ago)
Author:
blaz <blaz.zupan@…>
Branch:
default
Message:

Minor changes to documentation.

File:
1 edited

  • docs/reference/rst/Orange.ensemble.rst

    r9372 r10539  
##################################

.. index:: ensemble

`Ensembles <http://en.wikipedia.org/wiki/Ensemble_learning>`_ use
multiple models to improve prediction performance. The module
implements a number of popular approaches, including bagging,
boosting, stacking and random forests. Most of these are available
both for classification and regression, with the exception of
stacking, which in the present implementation supports classification
only.
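
All ensemble learners share the same wrapping pattern: they take a
base learner and return a new learner. A minimal sketch, assuming the
Orange 2.x API (constructor arguments such as ``t``, the number of
ensemble members, may differ across versions)::

    import Orange

    tree = Orange.classification.tree.TreeLearner()

    # wrap the base learner into ensemble learners
    bagged = Orange.ensemble.bagging.BaggedLearner(tree, t=10)
    boosted = Orange.ensemble.boosting.BoostedLearner(tree, t=10)

    # an ensemble learner is used like any other Orange learner
    data = Orange.data.Table("lymphography")
    classifier = bagged(data)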

*******
Bagging
*******

.. index:: bagging
.. index::
   single: ensemble; bagging

.. autoclass:: Orange.ensemble.bagging.BaggedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.bagging.BaggedClassifier
   :members:
   :show-inheritance:

********
Boosting
********

.. index:: boosting
.. index::
   single: ensemble; boosting

.. autoclass:: Orange.ensemble.boosting.BoostedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.boosting.BoostedClassifier
   :members:
   :show-inheritance:

Example
=======

Let us try boosting and bagging on the lymphography data set, using
TreeLearner with post-pruning as the base learner. For testing, we use
10-fold cross validation and observe classification accuracy.

:download:`ensemble.py <code/ensemble.py>`

.. literalinclude:: code/ensemble.py
   :lines: 7-
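
In case the linked file is not at hand, a sketch of such a script,
assuming the Orange 2.x API (the actual ``ensemble.py`` may differ)::

    import Orange

    lymphography = Orange.data.Table("lymphography")

    # a tree learner with m-estimate post-pruning as the base learner
    tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
    boosted = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
    bagged = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

    learners = [tree, boosted, bagged]
    results = Orange.evaluation.testing.cross_validation(learners,
                                                         lymphography, folds=10)

    print "Classification Accuracy:"
    for learner, ca in zip(learners, Orange.evaluation.scoring.CA(results)):
        print "%15s: %5.3f" % (learner.name, ca)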

Running this script, we may get something like::

    Classification Accuracy:
               tree: 0.764
       boosted tree: 0.770
        bagged tree: 0.790


*************
Random Forest
*************

.. index:: random forest
.. index::
   single: ensemble; random forest

.. autoclass:: Orange.ensemble.forest.RandomForestLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.forest.RandomForestClassifier
   :members:
   :show-inheritance:

Example
=======

The following script assembles a random forest learner and compares it
to a tree learner on the liver disorders (bupa) and housing data sets.
     88 
     89:download:`ensemble-forest.py <code/ensemble-forest.py>` 
     90 
     91.. literalinclude:: code/ensemble-forest.py 
     92  :lines: 7- 
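
For readers without the linked file, the classification part of such a
script might look as follows. This is a sketch assuming the Orange 2.x
evaluation API; the scoring names (``CA``, ``Brier_score``, ``AUC``)
follow Orange 2.5, and the regression part on housing would be
analogous, using ``MSE``, ``RSE`` and ``R2``::

    import Orange

    tree = Orange.classification.tree.TreeLearner(name="tree")
    forest = Orange.ensemble.forest.RandomForestLearner(trees=50, name="forest")
    learners = [tree, forest]

    bupa = Orange.data.Table("bupa")
    results = Orange.evaluation.testing.cross_validation(learners, bupa, folds=3)

    print "Classification: bupa.tab"
    print "Learner  CA     Brier  AUC"
    for i, learner in enumerate(learners):
        print "%-8s %5.3f  %5.3f  %5.3f" % (learner.name,
            Orange.evaluation.scoring.CA(results)[i],
            Orange.evaluation.scoring.Brier_score(results)[i],
            Orange.evaluation.scoring.AUC(results)[i])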

Notice that our forest contains 50 trees. Learners are compared through
3-fold cross validation::

    Classification: bupa.tab
    Learner  CA     Brier  AUC
    tree     0.586  0.829  0.575
    forest   0.710  0.392  0.752
    Regression: housing.tab
    Learner  MSE    RSE    R2
    tree     23.708  0.281  0.719
    forest   11.988  0.142  0.858

The following example shows how to access the individual classifiers
once they are assembled into the forest, and how to assemble a tree
learner to be used in random forests. The best feature for decision
nodes is selected among three randomly chosen features, and maxDepth
and minExamples are both set to 5.

:download:`ensemble-forest2.py <code/ensemble-forest2.py>`

.. literalinclude:: code/ensemble-forest2.py
   :lines: 7-

Running the above code reports the sizes (number of nodes) of the trees
in the constructed random forest.
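
In the absence of the linked file, a sketch of such a script, assuming
``RandomForestLearner`` accepts a ``base_learner`` and that the
constructed classifier exposes its trees through a ``classifiers``
attribute (``max_depth`` and ``min_instances`` stand in for the
``maxDepth`` and ``minExamples`` settings mentioned above)::

    import Orange

    bupa = Orange.data.Table("bupa")

    # the tree learner to be used inside the forest
    tree = Orange.classification.tree.TreeLearner()
    tree.max_depth = 5
    tree.min_instances = 5

    forest_learner = Orange.ensemble.forest.RandomForestLearner(
        base_learner=tree, trees=50, attributes=3)
    forest = forest_learner(bupa)

    # access the individual trees and report their sizes
    for classifier in forest.classifiers:
        print classifier.count_nodes(),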

Feature scoring
===============

L. Breiman (2001) suggested the possibility of using random forests as a
non-myopic measure of feature importance.

The assessment of feature relevance with random forests is based on the
idea that randomly changing the value of an important feature greatly
affects an instance's classification, while changing the value of an
unimportant feature does not affect it much. The implemented algorithm
accumulates feature scores over a given number of trees. The importance
of a feature for a single tree is computed as the number of correctly
classified out-of-bag (OOB) instances minus the number of correctly
classified OOB instances when the feature's values are randomly
shuffled. The accumulated feature scores are divided by the number of
used trees and multiplied by 100 before they are returned.
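
The idea can be illustrated with a simplified sketch: permutation
importance for a single tree, evaluated on the training data rather
than on the out-of-bag instances the actual implementation uses, so the
numbers will differ (the data-copying idiom assumes the Orange 2.5 data
API)::

    import random
    import Orange

    iris = Orange.data.Table("iris")
    tree = Orange.classification.tree.TreeLearner()(iris)

    def correct(classifier, data):
        return sum(classifier(d) == d.get_class() for d in data)

    base = correct(tree, iris)
    for feature in iris.domain.features:
        # copy the data and randomly shuffle the values of one feature
        shuffled = Orange.data.Table(iris.domain,
                                     [Orange.data.Instance(d) for d in iris])
        values = [d[feature] for d in shuffled]
        random.shuffle(values)
        for d, v in zip(shuffled, values):
            d[feature] = v
        print "%15s: %d" % (feature.name, base - correct(tree, shuffled))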

.. autoclass:: Orange.ensemble.forest.ScoreFeature
   :members:

Computation of feature importance with random forests is rather slow,
and importances for all features need to be computed
simultaneously. When the measure is called to compute the quality of a
certain feature, it computes qualities for all features in the
dataset. When called again, it uses the stored results if the domain
is still the same and the data table has not changed (this is done by
checking the data table's version and is not foolproof; it will not
detect changed values of existing instances, but will notice added and
removed instances; see the page on :class:`Orange.data.Table` for
details).
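
For example (a sketch; the call convention follows the usual
feature-scoring protocol of ``score(feature, data)``, and the ``trees``
argument is an assumption)::

    import Orange

    iris = Orange.data.Table("iris")
    score = Orange.ensemble.forest.ScoreFeature(trees=100)

    # builds the forest and computes scores for all features
    first = score(iris.domain.features[0], iris)
    # reuses the stored results; no new forest is built
    second = score(iris.domain.features[1], iris)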

:download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>`

.. literalinclude:: code/ensemble-forest-measure.py
   :lines: 7-

The output of the above script is::

    DATA:iris.tab

    first: 3.91, second: 0.38

    different random seed
    first: 3.39, second: 0.46

    All importances:
       sepal length:   3.39
        sepal width:   0.46
       petal length:  30.15
        petal width:  31.98

References
----------

* L Breiman. Bagging Predictors. `Technical report No. 421
  <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of
  California, Berkeley, 1994.
* Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm
  <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine
  Learning: Proceedings of the Thirteenth International Conference
  (ICML'96), 1996.
* JR Quinlan. `Boosting, bagging, and C4.5
  <http://www.rulequest.com/Personal/q.aaai96.ps>`_. In Proc. of 13th
  National Conference on Artificial Intelligence (AAAI'96),
  pp. 725-730, 1996.
* L Breiman. `Random Forests
  <http://www.springerlink.com/content/u0p06167n6173512/>`_. Machine
  Learning, 45, 5-32, 2001.
* M Robnik-Sikonja. `Improving Random Forests
  <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In
  Proc. of European Conference on Machine Learning (ECML 2004),
  pp. 359-370, 2004.
.. automodule:: Orange.ensemble