Timestamp:
03/15/12 17:15:32
Author:
blaz <blaz.zupan@…>
Branch:
default
Message:

Moved documentation to rst.

File:
1 edited

  • Orange/ensemble/__init__.py

    r9994 r10538  
    1 """ 
    2  
    3 .. index:: ensemble 
    4  
    5 Module Orange.ensemble implements Breiman's bagging and Random Forest,  
    6 and Freund and Schapire's boosting algorithms. 

*******
Bagging
*******

.. index:: bagging
.. index::
   single: ensemble; bagging

.. autoclass:: Orange.ensemble.bagging.BaggedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.bagging.BaggedClassifier
   :members:
   :show-inheritance:

********
Boosting
********

.. index:: boosting
.. index::
   single: ensemble; boosting

.. autoclass:: Orange.ensemble.boosting.BoostedLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.boosting.BoostedClassifier
   :members:
   :show-inheritance:

Example
=======

Let us try boosting and bagging on the Lymphography data set, using a
TreeLearner with post-pruning as the base learner. For testing we use
10-fold cross validation and observe classification accuracy.

:download:`ensemble.py <code/ensemble.py>`

.. literalinclude:: code/ensemble.py
   :lines: 7-

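The example script is only referenced from this page; the following is a
minimal sketch of what it might look like, assuming the Orange 2.x
(Python 2) API of the time (``TreeLearner`` with m-estimate post-pruning
via ``m_pruning``, ``Orange.evaluation.testing.cross_validation`` and
``Orange.evaluation.scoring.CA``)::

    import Orange

    # Base learner: a classification tree with post-pruning.
    tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
    boost = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
    bagg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

    lymphography = Orange.data.Table("lymphography")

    learners = [tree, boost, bagg]
    results = Orange.evaluation.testing.cross_validation(learners, lymphography,
                                                         folds=10)
    print "Classification Accuracy:"
    for learner, ca in zip(learners, Orange.evaluation.scoring.CA(results)):
        print "%15s: %5.3f" % (learner.name, ca)
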
Running this script, we may get something like::

    Classification Accuracy:
               tree: 0.764
       boosted tree: 0.770
        bagged tree: 0.790


*************
Random Forest
*************

.. index:: random forest
.. index::
   single: ensemble; random forest

.. autoclass:: Orange.ensemble.forest.RandomForestLearner
   :members:
   :show-inheritance:

.. autoclass:: Orange.ensemble.forest.RandomForestClassifier
   :members:
   :show-inheritance:

Example
=======

The following script assembles a random forest learner and compares it
to a tree learner on the liver disorders (bupa) and housing data sets.

:download:`ensemble-forest.py <code/ensemble-forest.py>`

.. literalinclude:: code/ensemble-forest.py
   :lines: 7-

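As above, the script itself is not shown here; a sketch of its
classification half, under the same API assumptions (``Brier_score`` and
``AUC`` from ``Orange.evaluation.scoring`` are assumed names; the
regression run on housing.tab would be analogous, with regression
scores)::

    import Orange

    bupa = Orange.data.Table("bupa")

    tree = Orange.classification.tree.TreeLearner(name="tree")
    forest = Orange.ensemble.forest.RandomForestLearner(trees=50, name="forest")

    learners = [tree, forest]
    results = Orange.evaluation.testing.cross_validation(learners, bupa, folds=3)
    print "Classification: bupa.tab"
    print "Learner  CA     Brier  AUC"
    for i, learner in enumerate(learners):
        print "%-8s %5.3f  %5.3f  %5.3f" % (
            learner.name,
            Orange.evaluation.scoring.CA(results)[i],
            Orange.evaluation.scoring.Brier_score(results)[i],
            Orange.evaluation.scoring.AUC(results)[i])
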
Notice that our forest contains 50 trees. Learners are compared through
3-fold cross validation::

    Classification: bupa.tab
    Learner  CA     Brier  AUC
    tree     0.586  0.829  0.575
    forest   0.710  0.392  0.752
    Regression: housing.tab
    Learner  MSE     RSE    R2
    tree     23.708  0.281  0.719
    forest   11.988  0.142  0.858

The following example shows how to access the individual classifiers
once they are assembled into the forest, and how to assemble a tree
learner to be used in random forests. The best feature for the decision
nodes is selected among three randomly chosen features, and maxDepth
and minExamples are both set to 5.

:download:`ensemble-forest2.py <code/ensemble-forest2.py>`

.. literalinclude:: code/ensemble-forest2.py
   :lines: 7-

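A sketch of that idea; the ``base_learner`` and ``attributes`` arguments
of ``RandomForestLearner``, the ``maxDepth``/``minExamples`` tree
attributes quoted above, and the ``classifiers``/``count_nodes``
accessors are all assumptions about the Orange 2.x API::

    import Orange

    bupa = Orange.data.Table("bupa")

    # Tree learner to be plugged into the forest; small trees only.
    tree = Orange.classification.tree.TreeLearner(maxDepth=5, minExamples=5)

    # The best split feature is chosen among 3 randomly drawn candidates.
    forest_learner = Orange.ensemble.forest.RandomForestLearner(
        base_learner=tree, trees=50, attributes=3)
    forest = forest_learner(bupa)

    # Access the individual trees and report their sizes.
    for classifier in forest.classifiers:
        print classifier.count_nodes(),
    print
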
Running the above code reports the sizes (number of nodes) of the trees
in the constructed random forest.


Score Feature
=============

L. Breiman (2001) suggested the possibility of using random forests as a
non-myopic measure of feature importance.

The assessment of feature relevance with random forests is based on the
idea that randomly changing the value of an important feature greatly
affects an instance's classification, while changing the value of an
unimportant feature does not affect it much. The implemented algorithm
accumulates feature scores over a given number of trees. The importance
of a feature for a single tree is computed as the number of correctly
classified out-of-bag (OOB) instances minus the number of correctly
classified OOB instances when the feature's values are randomly
shuffled. The accumulated feature scores are divided by the number of
trees used and multiplied by 100 before they are returned.
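
The following is plain-Python pseudocode of that procedure, not the
Orange implementation; ``trees``, ``oob_sets`` and the list-based
instance format are hypothetical stand-ins::

    import random

    def permutation_importance(trees, oob_sets, feature_index):
        # trees: classifiers callable on a feature list;
        # oob_sets: per-tree lists of (features, label) out-of-bag pairs.
        score = 0
        for tree, oob in zip(trees, oob_sets):
            correct = sum(1 for x, y in oob if tree(x) == y)
            # Shuffle the chosen feature's values across the OOB
            # instances and reclassify.
            column = [x[feature_index] for x, _ in oob]
            random.shuffle(column)
            shuffled = sum(
                1 for (x, y), v in zip(oob, column)
                if tree(x[:feature_index] + [v] + x[feature_index + 1:]) == y)
            score += correct - shuffled
        # Average over trees and scale by 100, as described above.
        return 100.0 * score / len(trees)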

.. autoclass:: Orange.ensemble.forest.ScoreFeature
   :members:

Computation of feature importance with random forests is rather slow, and
importances for all features need to be computed simultaneously. When the
scorer is called to compute the quality of a certain feature, it computes
qualities for all features in the dataset. When called again, it uses the
stored results if the domain is still the same and the data table has not
changed (this is done by checking the data table's version and is not
foolproof; it will not detect changed values of existing instances, but
will notice added and removed instances; see the page on
:class:`Orange.data.Table` for details).

:download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>`

.. literalinclude:: code/ensemble-forest-measure.py
   :lines: 7-
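
As a rough sketch of what such a script might do (the ``trees`` argument
and the ``measure(attribute, data)`` call follow the class documented
above)::

    import Orange

    iris = Orange.data.Table("iris")
    measure = Orange.ensemble.forest.ScoreFeature(trees=100)

    # Scoring any one feature computes (and caches) the scores of all
    # features in the domain.
    for attr in iris.domain.attributes:
        print "%15s: %6.2f" % (attr.name, measure(attr, iris))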

Corresponding output::

    DATA:iris.tab

    first: 3.91, second: 0.38

    different random seed
    first: 3.39, second: 0.46

    All importances:
       sepal length:   3.39
        sepal width:   0.46
       petal length:  30.15
        petal width:  31.98

References
----------

* L Breiman. Bagging Predictors. `Technical report No. 421
  <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of
  California, Berkeley, 1994.
* Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm
  <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine
  Learning: Proceedings of the Thirteenth International Conference
  (ICML'96), 1996.
* JR Quinlan. `Boosting, bagging, and C4.5
  <http://www.rulequest.com/Personal/q.aaai96.ps>`_. In Proc. of 13th
  National Conference on Artificial Intelligence (AAAI'96), pp. 725-730,
  1996.
* L Breiman. `Random Forests
  <http://www.springerlink.com/content/u0p06167n6173512/>`_. Machine
  Learning, 45, 5-32, 2001.
* M Robnik-Sikonja. `Improving Random Forests
  <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In Proc.
  of European Conference on Machine Learning (ECML 2004), pp. 359-370,
  2004.
    184 """ 
    185  
    186 __all__ = ["bagging", "boosting", "forest"] 
     1__all__ = ["bagging", "boosting", "forest", "stacking"] 
    1872__docformat__ = 'restructuredtext' 
    1883import Orange.core as orange 