Files: 2 added, 7 edited

Legend:

  '+'  added line
  '-'  removed line
  unprefixed lines are unchanged context
  • Orange/__init__.py (r10546 → r10549)

     _import("ensemble.boosting")
     _import("ensemble.forest")
    +_import("ensemble.stacking")

     _import("regression")
  • Orange/ensemble/__init__.py (r9994 → r10540)
    1 """ 
    2  
    3 .. index:: ensemble 
    4  
    5 Module Orange.ensemble implements Breiman's bagging and Random Forest,  
    6 and Freund and Schapire's boosting algorithms. 
    7  
    8  
    9 ******* 
    10 Bagging 
    11 ******* 
    12  
    13 .. index:: bagging 
    14 .. index:: 
    15    single: ensemble; ensemble 
    16  
    17 .. autoclass:: Orange.ensemble.bagging.BaggedLearner 
    18    :members: 
    19    :show-inheritance: 
    20  
    21 .. autoclass:: Orange.ensemble.bagging.BaggedClassifier 
    22    :members: 
    23    :show-inheritance: 
    24  
    25 ******** 
    26 Boosting 
    27 ******** 
    28  
    29 .. index:: boosting 
    30 .. index:: 
    31    single: ensemble; boosting 
    32  
    33  
    34 .. autoclass:: Orange.ensemble.boosting.BoostedLearner 
    35   :members: 
    36   :show-inheritance: 
    37  
    38 .. autoclass:: Orange.ensemble.boosting.BoostedClassifier 
    39    :members: 
    40    :show-inheritance: 
    41  
    42 Example 
    43 ======= 
    44 Let us try boosting and bagging on Lymphography data set and use TreeLearner 
    45 with post-pruning as a base learner. For testing, we use 10-fold cross 
    46 validation and observe classification accuracy. 
    47  
    48 :download:`ensemble.py <code/ensemble.py>` 
    49  
    50 .. literalinclude:: code/ensemble.py 
    51   :lines: 7- 
    52  
    53 Running this script, we may get something like:: 
    54  
    55     Classification Accuracy: 
    56                tree: 0.764 
    57        boosted tree: 0.770 
    58         bagged tree: 0.790 
    59  
    60  
    61 ************* 
    62 Random Forest 
    63 ************* 
    64  
    65 .. index:: random forest 
    66 .. index:: 
    67    single: ensemble; random forest 
    68     
    69 .. autoclass:: Orange.ensemble.forest.RandomForestLearner 
    70   :members: 
    71   :show-inheritance: 
    72  
    73 .. autoclass:: Orange.ensemble.forest.RandomForestClassifier 
    74   :members: 
    75   :show-inheritance: 
    76  
    77  
    78 Example 
    79 ======== 
    80  
    81 The following script assembles a random forest learner and compares it 
    82 to a tree learner on a liver disorder (bupa) and housing data sets. 
    83  
    84 :download:`ensemble-forest.py <code/ensemble-forest.py>` 
    85  
    86 .. literalinclude:: code/ensemble-forest.py 
    87   :lines: 7- 
    88  
    89 Notice that our forest contains 50 trees. Learners are compared through  
    90 3-fold cross validation:: 
    91  
    92     Classification: bupa.tab 
    93     Learner  CA     Brier  AUC 
    94     tree     0.586  0.829  0.575 
    95     forest   0.710  0.392  0.752 
    96     Regression: housing.tab 
    97     Learner  MSE    RSE    R2 
    98     tree     23.708  0.281  0.719 
    99     forest   11.988  0.142  0.858 
    100  
    101 Perhaps the sole purpose of the following example is to show how to 
    102 access the individual classifiers once they are assembled into the 
    103 forest, and to show how we can assemble a tree learner to be used in 
    104 random forests. In the following example the best feature for decision 
    105 nodes is selected among three randomly chosen features, and maxDepth 
    106 and minExamples are both set to 5. 
    107  
    108 :download:`ensemble-forest2.py <code/ensemble-forest2.py>` 
    109  
    110 .. literalinclude:: code/ensemble-forest2.py 
    111   :lines: 7- 
    112  
    113 Running the above code would report on sizes (number of nodes) of the tree 
    114 in a constructed random forest. 
    115  
    116      
    117 Score Feature 
    118 ============= 
    119  
    120 L. Breiman (2001) suggested the possibility of using random forests as a 
    121 non-myopic measure of feature importance. 
    122  
    123 The assessment of feature relevance with random forests is based on the 
    124 idea that randomly changing the value of an important feature greatly 
    125 affects instance's classification, while changing the value of an 
    126 unimportant feature does not affect it much. Implemented algorithm 
    127 accumulates feature scores over given number of trees. Importance of 
    128 all features for a single tree are computed as: correctly classified  
    129 OOB instances minus correctly classified OOB instances when the feature is 
    130 randomly shuffled. The accumulated feature scores are divided by the 
    131 number of used trees and multiplied by 100 before they are returned. 
    132  
    133 .. autoclass:: Orange.ensemble.forest.ScoreFeature 
    134   :members: 
    135  
    136 Computation of feature importance with random forests is rather slow and 
    137 importances for all features need to be computes simultaneously. When it  
    138 is called to compute a quality of certain feature, it computes qualities 
    139 for all features in the dataset. When called again, it uses the stored  
    140 results if the domain is still the same and the data table has not 
    141 changed (this is done by checking the data table's version and is 
    142 not foolproof; it will not detect if you change values of existing instances, 
    143 but will notice adding and removing instances; see the page on  
    144 :class:`Orange.data.Table` for details). 
    145  
    146 :download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>` 
    147  
    148 .. literalinclude:: code/ensemble-forest-measure.py 
    149   :lines: 7- 
    150  
    151 Corresponding output:: 
    152  
    153     DATA:iris.tab 
    154  
    155     first: 3.91, second: 0.38 
    156  
    157     different random seed 
    158     first: 3.39, second: 0.46 
    159  
    160     All importances: 
    161        sepal length:   3.39 
    162         sepal width:   0.46 
    163        petal length:  30.15 
    164         petal width:  31.98 
    165  
    166 References 
    167 ----------- 
    168 * L Breiman. Bagging Predictors. `Technical report No. 421 \ 
    169     <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of \ 
    170     California, Berkeley, 1994. 
    171 * Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm \ 
    172     <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine \ 
    173     Learning: Proceedings of the Thirteenth International Conference (ICML'96), 1996. 
    174 * JR Quinlan. `Boosting, bagging, and C4.5 \ 
    175     <http://www.rulequest.com/Personal/q.aaai96.ps>`_ . In Proc. of 13th \ 
    176     National Conference on Artificial Intelligence (AAAI'96). pp. 725-730, 1996.  
    177 * L Brieman. `Random Forests \ 
    178     <http://www.springerlink.com/content/u0p06167n6173512/>`_.\ 
    179     Machine Learning, 45, 5-32, 2001.  
    180 * M Robnik-Sikonja. `Improving Random Forests \ 
    181     <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In \ 
    182     Proc. of European Conference on Machine Learning (ECML 2004),\ 
    183     pp. 359-370, 2004. 
    184 """ 
    185  
    -__all__ = ["bagging", "boosting", "forest"]
    +__all__ = ["bagging", "boosting", "forest", "stacking"]
     __docformat__ = 'restructuredtext'
     import Orange.core as orange
  • Orange/ensemble/forest.py (r10530 → r10540)

             of completion of the learning progress.

    -    :param name: name of the learner.
    +    :param name: learner name.
         :type name: string
  • Orange/regression/lasso.py (r10314 → r10535)
    1 """\ 
    2 ############################ 
    3 Lasso regression (``lasso``) 
    4 ############################ 
    5  
    6 .. index:: regression 
    7  
    8 .. _`Lasso regression. Regression shrinkage and selection via the lasso`: 
    9     http://www-stat.stanford.edu/~tibs/lasso/lasso.pdf 
    10  
    11  
    12 `The Lasso <http://www-stat.stanford.edu/~tibs/lasso/lasso.pdf>`_ is a shrinkage 
    13 and selection method for linear regression. It minimizes the usual sum of squared 
    14 errors, with a bound on the sum of the absolute values of the coefficients.  
    15  
    16 To fit the regression parameters on housing data set use the following code: 
    17  
    18 .. literalinclude:: code/lasso-example.py 
    19    :lines: 7,9,10,11 
    20  
    21 .. autoclass:: LassoRegressionLearner 
    22     :members: 
    23  
    24 .. autoclass:: LassoRegression 
    25     :members: 
    26  
    27  
    28 .. autoclass:: LassoRegressionLearner 
    29     :members: 
    30  
    31 .. autoclass:: LassoRegression 
    32     :members: 
    33  
    34 Utility functions 
    35 ----------------- 
    36  
    37 .. autofunction:: center 
    38  
    39 .. autofunction:: get_bootstrap_sample 
    40  
    41 .. autofunction:: permute_responses 
    42  
    43  
    44 ======== 
    45 Examples 
    46 ======== 
    47  
    48 To predict values of the response for the first five instances 
    49 use the code 
    50  
    51 .. literalinclude:: code/lasso-example.py 
    52    :lines: 14,15 
    53  
    54 Output 
    55  
    56 :: 
    57  
    58     Actual: 24.00, predicted: 24.58  
    59     Actual: 21.60, predicted: 23.30  
    60     Actual: 34.70, predicted: 24.98  
    61     Actual: 33.40, predicted: 24.78  
    62     Actual: 36.20, predicted: 24.66  
    63  
    64 To see the fitted regression coefficients, print the model 
    65  
    66 .. literalinclude:: code/lasso-example.py 
    67    :lines: 17 
    68  
    69 The output 
    70  
    71 :: 
    72  
    73     Variable  Coeff Est  Std Error          p 
    74      Intercept     22.533 
    75           CRIM     -0.000      0.023      0.480       
    76          INDUS     -0.010      0.023      0.300       
    77             RM      1.303      0.994      0.000   *** 
    78            AGE     -0.002      0.000      0.320       
    79        PTRATIO     -0.191      0.209      0.050     . 
    80          LSTAT     -0.126      0.105      0.000   *** 
    81     Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1 
    82  
    83  
    84     For 7 variables the regression coefficient equals 0:  
    85     ZN 
    86     CHAS 
    87     NOX 
    88     DIS 
    89     RAD 
    90     TAX 
    91     B 
    92  
    93 shows that some of the regression coefficients are equal to 0.     
    94  
    95  
    96  
    97  
    98  
    99 """ 
    100  
     import Orange
     import numpy
  • docs/reference/rst/Orange.ensemble.rst (r9372 → r10540)

    The following documentation was added between the page title and the
    pre-existing closing directive (``.. automodule:: Orange.ensemble``):

    .. index:: ensemble

    `Ensembles <http://en.wikipedia.org/wiki/Ensemble_learning>`_ use
    multiple models to improve prediction performance. The module
    implements a number of popular approaches, including bagging,
    boosting, stacking and random forests. Most of these are available
    for both classification and regression, with the exception of
    stacking, which in the present implementation supports
    classification only.

    *******
    Bagging
    *******

    .. index:: bagging
    .. index::
       single: ensemble; bagging

    .. autoclass:: Orange.ensemble.bagging.BaggedLearner
       :members:
       :show-inheritance:

    .. autoclass:: Orange.ensemble.bagging.BaggedClassifier
       :members:
       :show-inheritance:

    ********
    Boosting
    ********

    .. index:: boosting
    .. index::
       single: ensemble; boosting

    .. autoclass:: Orange.ensemble.boosting.BoostedLearner
       :members:
       :show-inheritance:

    .. autoclass:: Orange.ensemble.boosting.BoostedClassifier
       :members:
       :show-inheritance:

    Example
    =======

    The following script fits classification models by boosting and
    bagging on the Lymphography data set, using a TreeLearner with
    post-pruning as the base learner. Classification accuracy of the
    methods is estimated by 10-fold cross validation
    (:download:`ensemble.py <code/ensemble.py>`):

    .. literalinclude:: code/ensemble.py
      :lines: 7-

    Running this script demonstrates some benefit of boosting and
    bagging over the baseline learner::

        Classification Accuracy:
                   tree: 0.764
           boosted tree: 0.770
            bagged tree: 0.790

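    A minimal sketch of such a comparison under the Orange 2.x API (the
    TreeLearner pruning parameter ``m_pruning`` is an assumption; the
    shipped ensemble.py may differ)::

        import Orange

        data = Orange.data.Table("lymphography")

        # base learner: a classification tree with m-estimate post-pruning
        tree = Orange.classification.tree.TreeLearner(m_pruning=2, name="tree")
        boost = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
        bagg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

        learners = [tree, boost, bagg]
        results = Orange.evaluation.testing.cross_validation(learners, data,
                                                             folds=10)
        print "Classification Accuracy:"
        for learner, ca in zip(learners, Orange.evaluation.scoring.CA(results)):
            print "%15s: %5.3f" % (learner.name, ca)
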
    ********
    Stacking
    ********

    .. index:: stacking
    .. index::
       single: ensemble; stacking

    .. autoclass:: Orange.ensemble.stacking.StackedClassificationLearner
       :members:
       :show-inheritance:

    .. autoclass:: Orange.ensemble.stacking.StackedClassifier
       :members:
       :show-inheritance:

    Example
    =======

    Stacking often produces classifiers that are more predictive than
    individual classifiers in the ensemble. This effect is illustrated by
    a script that combines four different classification algorithms
    (:download:`ensemble-stacking.py <code/ensemble-stacking.py>`):

    .. literalinclude:: code/ensemble-stacking.py
      :lines: 3-

    The benefits of stacking on this particular data set are
    substantial (numbers show classification accuracy)::

       stacking: 0.934
          bayes: 0.858
           tree: 0.688
             lr: 0.764
            knn: 0.830

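    A minimal sketch of stacking, assuming the constructor takes the list
    of base learners as its first argument (the signature, the data set,
    and the choice of base learners are all illustrative)::

        import Orange

        data = Orange.data.Table("vehicle")

        bayes = Orange.classification.bayes.NaiveLearner(name="bayes")
        tree = Orange.classification.tree.TreeLearner(name="tree")
        knn = Orange.classification.knn.kNNLearner(name="knn")
        base = [bayes, tree, knn]

        # a meta-learner combines the predictions of the base learners
        stacker = Orange.ensemble.stacking.StackedClassificationLearner(
            base, name="stacking")

        learners = [stacker] + base
        results = Orange.evaluation.testing.cross_validation(learners, data,
                                                             folds=10)
        for learner, ca in zip(learners, Orange.evaluation.scoring.CA(results)):
            print "%10s: %.3f" % (learner.name, ca)
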
    *************
    Random Forest
    *************

    .. index:: random forest
    .. index::
       single: ensemble; random forest

    .. autoclass:: Orange.ensemble.forest.RandomForestLearner
       :members:
       :show-inheritance:

    .. autoclass:: Orange.ensemble.forest.RandomForestClassifier
       :members:
       :show-inheritance:

    Example
    =======

    The following script assembles a random forest learner and compares
    it to a tree learner on the liver disorder (bupa) and housing data
    sets.

    :download:`ensemble-forest.py <code/ensemble-forest.py>`

    .. literalinclude:: code/ensemble-forest.py
      :lines: 7-

    Notice that our forest contains 50 trees. Learners are compared
    through 3-fold cross validation::

        Classification: bupa.tab
        Learner  CA      Brier  AUC
        tree     0.586   0.829  0.575
        forest   0.710   0.392  0.752
        Regression: housing.tab
        Learner  MSE     RSE    R2
        tree     23.708  0.281  0.719
        forest   11.988  0.142  0.858

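    The core of the comparison, sketched: the same pair of learners
    serves both tasks, since learners dispatch on the type of the class
    variable (the MSE scoring call is an assumption about the scoring
    module)::

        import Orange

        tree = Orange.classification.tree.TreeLearner(name="tree")
        forest = Orange.ensemble.forest.RandomForestLearner(trees=50,
                                                            name="forest")
        learners = [tree, forest]

        # classification on bupa
        bupa = Orange.data.Table("bupa")
        results = Orange.evaluation.testing.cross_validation(learners, bupa,
                                                             folds=3)
        for learner, ca in zip(learners, Orange.evaluation.scoring.CA(results)):
            print "%-8s CA: %.3f" % (learner.name, ca)

        # regression on housing, with the very same learner objects
        housing = Orange.data.Table("housing")
        results = Orange.evaluation.testing.cross_validation(learners, housing,
                                                             folds=3)
        for learner, mse in zip(learners, Orange.evaluation.scoring.MSE(results)):
            print "%-8s MSE: %.3f" % (learner.name, mse)
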
    The following example shows how to access the individual classifiers
    once they are assembled into the forest, and how to assemble a tree
    learner for use in random forests. The best feature for the decision
    nodes is selected among three randomly chosen features, and maxDepth
    and minExamples are both set to 5.

    :download:`ensemble-forest2.py <code/ensemble-forest2.py>`

    .. literalinclude:: code/ensemble-forest2.py
      :lines: 7-

    Running the above code reports the sizes (number of nodes) of the
    trees in the constructed random forest.

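    Schematically, and assuming that the forest classifier exposes its
    trees through a ``classifiers`` attribute and that the tree learner
    accepts the parameters named in the text (all names below are
    assumptions), the script does something like::

        import Orange

        data = Orange.data.Table("bupa")

        # a customized base learner: depth and node-size limits as above
        tree = Orange.classification.tree.TreeLearner()
        tree.maxDepth = 5       # parameter names as given in the text
        tree.minExamples = 5

        # 'attributes=3': the best split feature is chosen among three
        # randomly drawn candidates
        forest_learner = Orange.ensemble.forest.RandomForestLearner(
            trees=10, attributes=3, base_learner=tree)
        forest = forest_learner(data)

        for i, classifier in enumerate(forest.classifiers):
            print "tree %i: %i nodes" % (i, classifier.count_nodes())
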
    Feature scoring
    ===============

    L. Breiman (2001) suggested the possibility of using random forests
    as a non-myopic measure of feature importance.

    The assessment of feature relevance with random forests is based on
    the idea that randomly changing the value of an important feature
    greatly affects an instance's classification, while changing the
    value of an unimportant feature does not affect it much. The
    implemented algorithm accumulates feature scores over a given number
    of trees. The importance of a feature for a single tree is computed
    as the number of correctly classified out-of-bag (OOB) instances
    minus the number of correctly classified OOB instances when the
    feature's values are randomly shuffled. The accumulated feature
    scores are divided by the number of trees used and multiplied by 100
    before they are returned.

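    The per-tree score described above, rendered as a schematic Python
    function (an illustration of the formula, not the module's actual
    implementation)::

        import random
        import Orange

        def importance_for_tree(tree, feature, oob_instances):
            # correctly classified out-of-bag instances ...
            correct = sum(1 for inst in oob_instances
                          if tree(inst) == inst.get_class())
            # ... minus correctly classified OOB instances after the
            # feature's values are shuffled among those instances
            values = [inst[feature] for inst in oob_instances]
            random.shuffle(values)
            correct_shuffled = 0
            for inst, value in zip(oob_instances, values):
                shuffled = Orange.data.Instance(inst)  # work on a copy
                shuffled[feature] = value
                if tree(shuffled) == inst.get_class():
                    correct_shuffled += 1
            return correct - correct_shuffled
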
    .. autoclass:: Orange.ensemble.forest.ScoreFeature
      :members:

    Computation of feature importance with random forests is rather
    slow, and importances for all features need to be computed
    simultaneously. When the scorer is called to compute the quality of
    a certain feature, it computes the qualities of all features in the
    dataset. When called again, it uses the stored results if the domain
    is still the same and the data table has not changed (this is done
    by checking the data table's version and is not foolproof; it will
    not detect changed values of existing instances, but will notice
    added and removed instances; see the page on
    :class:`Orange.data.Table` for details).

    :download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>`

    .. literalinclude:: code/ensemble-forest-measure.py
      :lines: 7-

    The output of the above script is::

        DATA:iris.tab

        first: 3.91, second: 0.38

        different random seed
        first: 3.39, second: 0.46

        All importances:
           sepal length:   3.39
            sepal width:   0.46
           petal length:  30.15
            petal width:  31.98

    References
    ----------

    * L Breiman. Bagging Predictors. `Technical report No. 421
      <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University
      of California, Berkeley, 1994.
    * Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm
      <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine
      Learning: Proceedings of the Thirteenth International Conference
      (ICML'96), 1996.
    * JR Quinlan. `Boosting, bagging, and C4.5
      <http://www.rulequest.com/Personal/q.aaai96.ps>`_. In Proc. of
      13th National Conference on Artificial Intelligence (AAAI'96),
      pp. 725-730, 1996.
    * L Breiman. `Random Forests
      <http://www.springerlink.com/content/u0p06167n6173512/>`_. Machine
      Learning, 45, 5-32, 2001.
    * M Robnik-Sikonja. `Improving Random Forests
      <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In
      Proc. of European Conference on Machine Learning (ECML 2004),
      pp. 359-370, 2004.
  • docs/reference/rst/Orange.regression.lasso.rst (r9372 → r10536)

    A page title was added above the pre-existing
    ``.. automodule:: Orange.regression.lasso`` directive, followed by
    the documentation moved out of the module docstring:

    ############################
    Lasso regression (``lasso``)
    ############################

    .. automodule:: Orange.regression.lasso

    .. index:: regression

    .. _`Lasso regression. Regression shrinkage and selection via the lasso`:
        http://www-stat.stanford.edu/~tibs/lasso/lasso.pdf

    `The Lasso <http://www-stat.stanford.edu/~tibs/lasso/lasso.pdf>`_ is
    a shrinkage and selection method for linear regression. It minimizes
    the usual sum of squared errors, with a bound on the sum of the
    absolute values of the coefficients.

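    In the usual formulation (Tibshirani, 1996) the lasso estimate is

    .. math::

        \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
            \Big( y_i - \beta_0 - \sum_{j} \beta_j x_{ij} \Big)^2
        \quad \textrm{subject to} \quad \sum_{j} |\beta_j| \leq t,

    where :math:`t \geq 0` bounds the sum of the absolute values of the
    coefficients and thereby controls the amount of shrinkage.
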
    To fit the regression parameters on the housing data set, use the
    following code:

    .. literalinclude:: code/lasso-example.py
       :lines: 9,10,11

    .. autoclass:: LassoRegressionLearner
        :members:

    .. autoclass:: LassoRegression
        :members:

    Utility functions
    -----------------

    .. autofunction:: center

    .. autofunction:: get_bootstrap_sample

    .. autofunction:: permute_responses

    ========
    Examples
    ========

    To predict values of the response for the first five instances, use
    the following code:

    .. literalinclude:: code/lasso-example.py
       :lines: 14,15

    Output::

        Actual: 24.00, predicted: 24.58
        Actual: 21.60, predicted: 23.30
        Actual: 34.70, predicted: 24.98
        Actual: 33.40, predicted: 24.78
        Actual: 36.20, predicted: 24.66

    To see the fitted regression coefficients, print the model:

    .. literalinclude:: code/lasso-example.py
       :lines: 17

    The output::

        Variable  Coeff Est  Std Error          p
         Intercept     22.533
              CRIM     -0.000      0.023      0.480
             INDUS     -0.010      0.023      0.300
                RM      1.303      0.994      0.000   ***
               AGE     -0.002      0.000      0.320
           PTRATIO     -0.191      0.209      0.050     .
             LSTAT     -0.126      0.105      0.000   ***
        Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1

        For 7 variables the regression coefficient equals 0:
        ZN
        CHAS
        NOX
        DIS
        RAD
        TAX
        B

    Some of the regression coefficients are thus set to exactly 0, as
    the last part of the output shows.
  • docs/reference/rst/Orange.regression.rst (r10396 → r10537)

     ###########################

    -Orange uses the term `classification` to also denote the
    -regression. For instance, the dependent variable is called a `class
    -variable` even when it is continuous, and models are generally called
    -classifiers. A part of the reason is that classification and
    -regression rely on the same set of basic classes.
    -
    -Please see the documentation on :doc:`Orange.classification` for
    -information on how to fit models in general.
    -
    -Orange contains a number of regression models which are listed below.
    +Orange implements a set of methods for regression modeling, that is,
    +methods where the outcome (the dependent variable) is real-valued:

     .. toctree::
        :maxdepth: 1

    -   Orange.regression.mean
        Orange.regression.linear
        Orange.regression.lasso
        Orange.regression.earth
        Orange.regression.tree
    +   Orange.regression.mean

    +Notice that in this documentation and in the implementation, the
    +dependent variable is referred to as the `class variable`. See also
    +the documentation on :doc:`Orange.classification` for information on
    +how to fit models and use them for prediction.

    +*************************
    +Base class for regression
    +*************************
    +
    +All regression learners inherit from `BaseRegressionLearner`.

     .. automodule:: Orange.regression.base
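
    The rewritten introduction notes that models are fitted and used for
    prediction exactly as in classification; a minimal sketch (learner
    and data set chosen arbitrarily)::

        import Orange

        data = Orange.data.Table("housing")
        learner = Orange.regression.linear.LinearRegressionLearner()
        model = learner(data)   # fitting, exactly as for classification

        for instance in data[:3]:
            # the continuous dependent variable is accessed as the "class"
            print "actual %.2f, predicted %.2f" % (
                float(instance.get_class()), float(model(instance)))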