01/06/13 20:18:48 (16 months ago)
Miha Stajdohar <miha.stajdohar@…>
11059:83e86ea77981, 11060:340b8bf1cbb4
11057:3da1cf37de17 (diff), 11056:a68fd2fce444 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Merged with tutorial updates.

1 edited


  • docs/tutorial/rst/ensembles.rst

    r9994 r11058  
.. index:: ensembles
+ `Learning of ensembles <http://en.wikipedia.org/wiki/Ensemble_learning>`_ combines the predictions of separate models to gain accuracy. The models may come from different training data samples, or may use different learners on the same data set. Learners may also be diversified by changing their parameter sets.
+
+ In Orange, ensembles are simply wrappers around learners. They behave just like any other learner. Given the data, they return models that can predict the outcome for any data instance::
+
+    >>> import Orange
+    >>> data = Orange.data.Table("housing")
+    >>> tree = Orange.classification.tree.TreeLearner()
+    >>> btree = Orange.ensemble.bagging.BaggedLearner(tree)
+    >>> btree
+    BaggedLearner 'Bagging'
+    >>> btree(data)
+    BaggedClassifier 'Bagging'
+    >>> btree(data)(data[0])
+    <orange.Value 'MEDV'='24.6'>
+
+ The last line builds a predictor (``btree(data)``) and then uses it on the first data instance.
+
+ Most ensemble methods can wrap either classification or regression learners. Exceptions are task-specialized techniques such as boosting.
+ Bagging and Boosting
+ ====================
+
.. index::
   single: ensembles; bagging
+
+ `Bootstrap aggregating <http://en.wikipedia.org/wiki/Bootstrap_aggregating>`_, or bagging, samples the training data uniformly and with replacement to train different predictors. A majority vote (in classification) or the mean (in regression) then combines the independent predictions into a single one.
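Stripped of any library machinery, bagging itself is only a few lines. A minimal sketch in plain Python (the ``bag`` helper and the toy threshold "learner" below are illustrative assumptions, not the Orange API described in this tutorial):

```python
import random
from collections import Counter

def bag(train, fit, t=10, seed=42):
    """Train t models, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(train)
    models = [fit([train[rng.randrange(n)] for _ in range(n)])
              for _ in range(t)]

    def predict(x):
        # Combine the independent predictions by majority vote.
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict

# Toy base learner: thresholds a number at the sample mean.
def fit(sample):
    cut = sum(sample) / len(sample)
    return lambda x: "high" if x > cut else "low"

ensemble = bag([1, 2, 3, 10, 11, 12], fit, t=25)
print(ensemble(11), ensemble(2))   # -> high low
```

Each bootstrap sample shifts the learned threshold a little; the vote smooths those fluctuations out.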
.. index::
   single: ensembles; boosting
- Ensemble learners
- =================
+
+ In general, boosting is a technique that combines weak learners into a single strong learner. Orange implements `AdaBoost <http://en.wikipedia.org/wiki/AdaBoost>`_, which assigns weights to data instances according to the performance of the learner. AdaBoost uses these weights to iteratively resample the instances, focusing on those that are harder to classify. In the aggregation, AdaBoost emphasizes the individual classifiers that perform better on their training sets.
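The reweighting at AdaBoost's core fits in a short sketch. Below is a minimal discrete AdaBoost for ±1 labels with a hypothetical decision-stump weak learner (plain Python for illustration, not Orange's implementation):

```python
import math

def adaboost(xs, ys, fit, t=10):
    """Discrete AdaBoost: reweight instances, combine weak models by weighted vote."""
    n = len(xs)
    w = [1.0 / n] * n                       # start from uniform instance weights
    models = []
    for _ in range(t):
        h = fit(xs, ys, w)
        err = sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        if err >= 0.5:                      # no better than chance: stop
            break
        if err == 0:                        # perfect weak model: use it alone
            models = [(1.0, h)]
            break
        alpha = 0.5 * math.log((1 - err) / err)   # accurate models get more say
        models.append((alpha, h))
        # Misclassified instances gain weight, correctly classified ones lose it.
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in models) >= 0 else -1

def stump(xs, ys, w):
    """Weak learner: the single-threshold rule with the lowest weighted error."""
    _, cut, sign = min((sum(wi for wi, x, y in zip(w, xs, ys)
                            if (1 if x > c else -1) * s != y), c, s)
                       for c in xs for s in (1, -1))
    return lambda x: (1 if x > cut else -1) * sign

# The labels form an interval: no single stump fits them, three rounds do.
xs, ys = [1, 2, 3, 4, 5, 6], [-1, -1, 1, 1, -1, -1]
clf = adaboost(xs, ys, stump, t=3)
print([clf(x) for x in xs])   # -> [-1, -1, 1, 1, -1, -1]
```

No single threshold can separate the middle interval, but the weighted vote of three reweighted stumps can.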
- Building ensemble classifiers in Orange is simple and easy. Starting
- from learners/classifiers that can predict probabilities and, if
- needed, use example weights, ensembles are actually wrappers that
- aggregate predictions from a list of constructed classifiers. These
- wrappers behave exactly like other Orange learners/classifiers. We
- will first show how to use the module for bagging and boosting that
- is included in the Orange distribution (the :py:mod:`Orange.ensemble` module), and
- then, for a somewhat more advanced example, build our own ensemble
- learner. Using the module is easy: you define a learner and give it
- to a bagger or booster, which in turn returns a new (boosted or
- bagged) learner. Here is an example (:download:`ensemble3.py <code/ensemble3.py>`)::
+
+ The following script wraps a classification tree in a boosted and a bagged learner, and tests the three learners through cross-validation:
-    import orange, orngTest, orngStat, orngEnsemble
-    data = orange.ExampleTable("promoters")
-
-    majority = orange.MajorityLearner()
-    majority.name = "default"
-    knn = orange.kNNLearner(k=11)
-    knn.name = "k-NN (k=11)"
-
-    bagged_knn = orngEnsemble.BaggedLearner(knn, t=10)
-    bagged_knn.name = "bagged k-NN"
-    boosted_knn = orngEnsemble.BoostedLearner(knn, t=10)
-    boosted_knn.name = "boosted k-NN"
-
-    learners = [majority, knn, bagged_knn, boosted_knn]
-    results = orngTest.crossValidation(learners, data, folds=10)
-    print "        Learner   CA     Brier Score"
-    for i in range(len(learners)):
-        print ("%15s:  %5.3f  %5.3f") % (learners[i].name,
-            orngStat.CA(results)[i], orngStat.BrierScore(results)[i])
+ .. literalinclude:: code/ensemble-bagging.py
- Most of the code is used for defining and naming the objects that
- learn, and the last piece reports the evaluation results. Notice
- that bagging or boosting a learner takes only a single line of code
- (like ``bagged_knn = orngEnsemble.BaggedLearner(knn, t=10)``)!
- The parameter ``t`` in bagging and boosting refers to the number of
- classifiers that will be used for voting (or, if you prefer, the
- number of boosting/bagging iterations). Depending on your random
- generator, you may get something like::
+
+ The benefit of the two ensemble techniques, assessed in terms of the area under the ROC curve, is obvious::
-            Learner   CA     Brier Score
-            default:  0.473  0.501
-        k-NN (k=11):  0.859  0.240
-        bagged k-NN:  0.813  0.257
-       boosted k-NN:  0.830  0.244
+    tree: 0.83
+   boost: 0.90
+    bagg: 0.91
+ .. index::
+    single: ensembles; stacking
+
+ Suppose we partition a training set into a held-in and a held-out set. Assume our task is the prediction of y, either the probability of the target class in classification or a real value in regression. We are given a set of learners. We train them on the held-in set and obtain a vector of predictions on the held-out set. Each element of the vector corresponds to the prediction of an individual predictor. We can now learn how to combine these predictions into a target prediction, by training a new predictor on the data set of predictions and true values of y from the held-out set. This technique is called `stacked generalization <http://en.wikipedia.org/wiki/Ensemble_learning#Stacking>`_, or stacking for short. Instead of a single split into held-in and held-out data sets, the vectors of predictions are obtained through cross-validation.
+
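The held-in/held-out procedure can be sketched in plain Python. All names below (``stack``, the toy base learners, and the deliberately simple meta learner that just keeps the best-performing base column) are illustrative assumptions, not the Orange API:

```python
import random

def stack(data, base_fits, meta_fit, k=5, seed=0):
    """Stacked generalization with cross-validated meta-features."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    meta_rows = []
    for i in range(k):
        held_out = folds[i]
        held_in = [row for j, fold in enumerate(folds) if j != i for row in fold]
        models = [fit(held_in) for fit in base_fits]
        # One meta-instance per held-out row: base predictions plus the true y.
        meta_rows.extend(([m(x) for m in models], y) for x, y in held_out)
    meta = meta_fit(meta_rows)
    final = [fit(data) for fit in base_fits]   # base models refit on all data
    return lambda x: meta([m(x) for m in final])

# Toy base learners: a 1-nearest-neighbour model and a global mean.
def nn1(train):
    return lambda x: min(train, key=lambda r: abs(r[0] - x))[1]

def mean_model(train):
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

# Degenerate meta learner: keep the base column with the lowest squared error.
def pick_best(meta_rows):
    width = len(meta_rows[0][0])
    errs = [sum((p[i] - y) ** 2 for p, y in meta_rows) for i in range(width)]
    best = errs.index(min(errs))
    return lambda preds: preds[best]

data = [(x, 2 * x) for x in range(10)]
model = stack(data, [nn1, mean_model], pick_best)
print(model(4.2))   # nearest neighbour wins the meta-selection: 4 -> 8
```

A real meta learner would combine the prediction columns rather than select one, but the cross-validated construction of the meta-data set is the part that matters.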
+ Orange provides a wrapper for stacking that is given a set of base learners and a meta learner:
+
+ .. literalinclude:: code/ensemble-stacking.py
+    :lines: 3-
+ By default, the meta classifier is a naive Bayesian classifier. Changing this to logistic regression may be a good idea as well::
+
+     stack = Orange.ensemble.stacking.StackedClassificationLearner(base_learners, \
+                meta_learner=Orange.classification.logreg.LogRegLearner)
+ Stacking is often better than each of the base learners alone, as demonstrated by running our script::
+
+    stacking: 0.967
+       bayes: 0.933
+        tree: 0.836
+         knn: 0.947
+ Random Forests
+ ==============
+
+ .. index::
+    single: ensembles; random forests
+
+ A `random forest <http://en.wikipedia.org/wiki/Random_forest>`_ is an ensemble of tree predictors. The diversity of trees is achieved by randomizing the feature selection in node split criteria: instead of the single best feature, one is picked arbitrarily from a set of the best features. Another source of randomization is the bootstrap sample of data from which the trees are grown. Predictions of, usually, several hundred trees are aggregated by voting. Constructing so many trees may be computationally demanding. Orange uses a special tree inducer (Orange.classification.tree.SimpleTreeLearner, used by default) optimized for speed in random forest construction:
+
+ .. literalinclude:: code/ensemble-forest.py
+    :lines: 3-
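Both sources of randomization are visible in a small sketch. The depth-1 "trees" and helper names below are illustrative only (plain Python, unrelated to SimpleTreeLearner):

```python
import random
from collections import Counter

def forest(rows, labels, n_trees=25, m=2, seed=1):
    """Random forest sketch: bootstrap the rows, restrict each tree to a
    random subset of m features, aggregate by voting.  A common choice for
    m is the square root of the number of features."""
    rng = random.Random(seed)
    n, n_feat = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample
        feats = rng.sample(range(n_feat), m)           # random feature subset
        trees.append(grow_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]

def grow_stump(rows, labels, feats):
    """A depth-1 'tree': the most accurate split over the allowed features."""
    def score(f, cut):
        left = Counter(l for r, l in zip(rows, labels) if r[f] <= cut)
        right = Counter(l for r, l in zip(rows, labels) if r[f] > cut)
        return (max(left.values(), default=0) + max(right.values(), default=0),
                f, cut)
    _, f, cut = max(score(f, r[f]) for f in feats for r in rows)
    left = Counter(l for r, l in zip(rows, labels) if r[f] <= cut)
    right = Counter(l for r, l in zip(rows, labels) if r[f] > cut)
    lmaj = left.most_common(1)[0][0]
    rmaj = right.most_common(1)[0][0] if right else lmaj
    return lambda row: lmaj if row[f] <= cut else rmaj

# Toy data: the first two features carry the class, the third is noise.
rows = [(x, x + 0.5, (x * 7) % 10) for x in range(10)]
labels = ["a"] * 5 + ["b"] * 5
rf = forest(rows, labels)
print(rf((1.0, 1.5, 7.0)), rf((8.0, 8.5, 6.0)))
```

Individual trees occasionally latch onto the noise feature, but the vote across bootstrap samples and feature subsets recovers the signal.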
+ Random forests are often superior to other base classification or regression learners::
+
+    forest: 0.976
+     bayes: 0.935
+       knn: 0.952