Changeset 7369:ffb58994f963 in orange


Timestamp:
02/04/11 00:35:16
Author:
crt <crtomir.gorup@…>
Branch:
default
Convert:
47b91a4e192c401dbb503bd6d2f3a3299f5799d8
Message:

Orange.ensemble now uses orngTree; the documentation contains working examples.

Location:
orange
Files:
9 edited

Legend:

  Unmodified lines are shown plain, removed lines are prefixed with "-",
  added lines with "+"; "…" marks skipped lines between hunks.
  • orange/Orange/ensemble/__init__.py

    r7334 r7369  
      .. literalinclude:: code/ensemble.py
    +    :lines: 7-

      .. _lymphography.tab: code/lymphography.tab
    …
      Running this script, we may get something like::

    -    TODO, we have to wait for executable TreeLearner with m-prunning
    +    Classification Accuracy:
    +               tree: 0.764
    +       boosted tree: 0.770
    +        bagged tree: 0.790
    +

      ==================
    …

      .. literalinclude:: code/ensemble-forest.py
    +    :lines: 7-

      .. _buba.tab: code/buba.tab
    …
      brier score and area under ROC curve::

    -    WAIT FOR WORKING TREE
    +    Learner  CA     Brier  AUC
    +    tree     0.588  0.823  0.578
    +    forest   0.713  0.383  0.763

      Perhaps the sole purpose of the following example is to show how to access
    …

      .. literalinclude:: code/ensemble-forest2.py
    +    :lines: 7-

      .. _ensemble-forest2.py: code/ensemble-forest2.py
    …
      number of used trees and multiplied by 100 before they are returned.

    - .. autoclass:: Orange.ensemble.forest.MeasureAttribute
    + .. autoclass:: Orange.ensemble.forest.MeasureAttribute_randomForests
        :members:

    …

      .. literalinclude:: code/ensemble-forest-measure.py
    +    :lines: 7-

      .. _ensemble-forest-measure.py: code/ensemble-forest-measure.py
    …
      Corresponding output::

    -    WAITING FOR WORKING TREES
    +    first: 3.30, second: 0.57
    +
    +    different random seed
    +    first: 3.52, second: 0.64
    +
    +    All importances:
    +       sepal length:   3.52
    +        sepal width:   0.64
    +       petal length:  26.99
    +        petal width:  34.42

    …
          California, Berkeley, 1994.
      * Y Freund, RE Schapire. `Experiments with a New Boosting Algorithm \
    -     <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine\
    +     <http://citeseer.ist.psu.edu/freund96experiments.html>`_. Machine \
          Learning: Proceedings of the Thirteenth International Conference (ICML'96), 1996.
      * JR Quinlan. `Boosting, bagging, and C4.5 \
    -     <http://www.rulequest.com/Personal/q.aaai96.ps>`_ . In Proc. of 13th\
    +     <http://www.rulequest.com/Personal/q.aaai96.ps>`_ . In Proc. of 13th \
          National Conference on Artificial Intelligence (AAAI'96). pp. 725-730, 1996.
      * L Brieman. `Random Forests \
    -     <http://www.springerlink.com/content/u0p06167n6173512/>`_. \
    +     <http://www.springerlink.com/content/u0p06167n6173512/>`_.\
          Machine Learning, 45, 5-32, 2001.
      * M Robnik-Sikonja. `Improving Random Forests \
          <http://lkm.fri.uni-lj.si/rmarko/papers/robnik04-ecml.pdf>`_. In \
          Proc. of European Conference on Machine Learning (ECML 2004),\
    -     pp. 359-370, 2004. [PDF]
    -
    +     pp. 359-370, 2004.
      """
    …
      __docformat__ = 'restructuredtext'
      import Orange.core as orange
    -
    - class SplitConstructor_AttributeSubset(orange.TreeSplitConstructor):
    -     def __init__(self, scons, attributes, rand = None):
    -         import random
    -         self.scons = scons           # split constructor of original tree
    -         self.attributes = attributes # number of attributes to consider
    -         if rand:
    -             self.rand = rand             # a random generator
    -         else:
    -             self.rand = random.Random()
    -             self.rand.seed(0)
    -
    -     def __call__(self, gen, weightID, contingencies, apriori, candidates, clsfr):
    -         cand = [1]*self.attributes + [0]*(len(candidates) - self.attributes)
    -         self.rand.shuffle(cand)
    -         # instead with all attributes, we will invoke split constructor
    -         # only for the subset of a attributes
    -         t = self.scons(gen, weightID, contingencies, apriori, cand, clsfr)
    -         return t
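
    For orientation, here is a minimal sketch of what the documented code/ensemble.py
    example boils down to after this changeset (reconstructed from the ensemble.py hunks
    further down; the lymphography.tab loading line and the exact print format are not
    shown in the hunks and are assumed)::

        import Orange, orngTree
        import orngTest, orngStat

        # data set referenced by the documentation above (assumed file name)
        data = Orange.data.Table("lymphography.tab")

        # weak learner plus its boosted and bagged ensembles, as in the diff below
        tree = orngTree.TreeLearner(mForPruning=2, name="tree")
        bs = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
        bg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")

        learners = [tree, bs, bg]
        results = orngTest.crossValidation(learners, data, folds=3)
        print "Classification Accuracy:"
        for i in range(len(learners)):
            # print format is illustrative; the documented output lists one CA per learner
            print "%15s: %5.3f" % (learners[i].name, orngStat.CA(results)[i])
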
  • orange/Orange/ensemble/bagging.py

    r7286 r7369  
              :type t: int
              :param name: The name of the learner.
    -         :type name: string"""
    +         :type name: string
    +         :rtype: :class:`Orange.ensemble.bagging.BaggedClassifier` or
    +                 :class:`Orange.ensemble.bagging.BaggedLearner`
    +         """
              self.t = t
              self.name = name
              self.learner = learner

    -     def __call__(self, examples, weight=0):
    +     def __call__(self, instances, weight=0):
    +         """Learn from the given table of data instances.
    +
    +         :param instances: Data instances to learn from.
    +         :type instances: Orange.data.Table
    +         :param weight: Id of meta attribute with weights of instances
    +         :type weight: int
    +         :rtype: :class:`Orange.ensemble.bagging.BaggedClassifier`
    +         """
              r = random.Random()
              r.seed(0)

    -         n = len(examples)
    +         n = len(instances)
              classifiers = []
              for i in range(self.t):
    …
                  for i in range(n):
                      selection.append(r.randrange(n))
    -             examples = Orange.data.Table(examples)
    -             data = examples.getitems(selection)
    +             instances = Orange.data.Table(instances)
    +             data = instances.getitems(selection)
                  classifiers.append(self.learner(data, weight))
              return BaggedClassifier(classifiers = classifiers, name=self.name,\
    -                     classVar=examples.domain.classVar)
    +                     classVar=instances.domain.classVar)

      class BaggedClassifier(orange.Classifier):
    -     """Return classifier."""
          def __init__(self, **kwds):
              self.__dict__.update(kwds)
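
    The renamed __call__ parameter makes BaggedLearner read as learning from a table of
    instances. A minimal usage sketch under that API (the file name and the t value are
    illustrative)::

        import Orange, orngTree

        instances = Orange.data.Table("bupa.tab")
        # calling the learner with data returns a BaggedClassifier built from
        # t bootstrap samples of the instances
        bagger = Orange.ensemble.bagging.BaggedLearner(orngTree.TreeLearner(), t=10,
            name="bagged tree")
        classifier = bagger(instances)
        print classifier(instances[0])   # aggregated vote of the 10 bootstrap trees
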
  • orange/Orange/ensemble/boosting.py

    r7286 r7369  
              :type t: int
              :param name: The name of the learner.
    -         :type name: string"""
    +         :type name: string
    +         :rtype: :class:`Orange.ensemble.boosting.BoostedClassifier` or
    +                 :class:`Orange.ensemble.boosting.BoostedLearner`"""
              self.t = t
              self.name = name
    …

          def __call__(self, instances, origWeight = 0):
    +         """Learn from the given table of data instances.
    +
    +         :param instances: Data instances to learn from.
    +         :type instances: Orange.data.Table
    +         :param origWeight: Weight.
    +         :type origWeight: int
    +         :rtype: :class:`Orange.ensemble.boosting.BoostedClassifier`"""
              import math
              weight = orange.newmetaid()
    …
                      else:
                          corr.append(1)
    -             epsilon = epsilon / float(reduce(lambda x,y:x+y.getweight(weight), instances, 0))
    -             classifiers.append((classifier, epsilon and math.log((1-epsilon)/epsilon) or _inf))
    +             epsilon = epsilon / float(reduce(lambda x,y:x+y.getweight(weight),
    +                 instances, 0))
    +             classifiers.append((classifier, epsilon and math.log(
    +                 (1-epsilon)/epsilon) or _inf))
                  if epsilon==0 or epsilon >= 0.499:
                      if epsilon >= 0.499 and len(classifiers)>1:
                          del classifiers[-1]
                      instances.removeMetaAttribute(weight)
    -                 return BoostedClassifier(classifiers = classifiers, name=self.name, classVar=instances.domain.classVar)
    +                 return BoostedClassifier(classifiers = classifiers,
    +                     name=self.name, classVar=instances.domain.classVar)
                  beta = epsilon/(1-epsilon)
                  for e in range(n):
    …

              instances.removeMetaAttribute(weight)
    -         return BoostedClassifier(classifiers = classifiers, name=self.name, classVar=instances.domain.classVar)
    +         return BoostedClassifier(classifiers = classifiers, name=self.name,
    +             classVar=instances.domain.classVar)

      class BoostedClassifier(orange.Classifier):
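
    The rewrapped lines compute the usual AdaBoost.M1 quantities: the weighted error
    epsilon, the reweighting factor beta, and the vote log((1-epsilon)/epsilon) each weak
    classifier receives. A small standalone sketch of that arithmetic (the epsilon value
    is hypothetical)::

        import math

        epsilon = 0.2                               # hypothetical weighted error of one weak classifier
        beta = epsilon / (1 - epsilon)              # factor used when reweighting instances
        vote = math.log((1 - epsilon) / epsilon)    # weight of this classifier in the ensemble
        print "beta = %.3f, vote = %.3f" % (beta, vote)   # beta = 0.250, vote = 1.386
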
  • orange/Orange/ensemble/forest.py

    r7334 r7369  
      from math import sqrt, floor
    - import Orange.classification.tree
      import Orange.core as orange
      import Orange
    + import Orange.feature.scoring
    + import orngTree
    + import random
    +

      class RandomForestLearner(orange.Learner):
    …
                      (from 0.0 to 1.0) that gives estimates on learning progress.
              :param name: The name of the learner.
    -         :type name: string"""
    -         import random
    +         :type name: string
    +         :rtype: :class:`Orange.ensemble.forest.RandomForestClassifier` or
    +                 :class:`Orange.ensemble.forest.RandomForestLearner`"""
              self.trees = trees
              self.name = name
    …
              if not learner:
                  # tree learner assembled as suggested by Brieman (2001)
    -             smallTreeLearner = Orange.classification.tree.TreeLearner(
    +             smallTreeLearner = orngTree.TreeLearner(
                  storeNodeClassifier = 0, storeContingencies=0,
                  storeDistributions=1, minExamples=5).instance()
                  smallTreeLearner.split.discreteSplitConstructor.measure = \
                          smallTreeLearner.split.continuousSplitConstructor.measure =\
    -                         orange.MeasureAttribute_gini()
    +                         Orange.feature.scoring.Gini()
                  smallTreeLearner.split = SplitConstructor_AttributeSubset(\
                          smallTreeLearner.split, attributes, self.rand)
    …

          def __call__(self, examples, weight=0):
    +         """Learn from the given table of data instances.
    +
    +         :param instances: Data instances to learn from.
    +         :type instances: Orange.data.Table
    +         :param origWeight: Weight.
    +         :type origWeight: int
    +         :rtype: :class:`Orange.ensemble.forest.RandomForestClassifier`"""
              # if number of attributes for subset is not set, use square root
              if hasattr(self.learner.split, 'attributes') and\
    …
      ### MeasureAttribute_randomForests

    - class MeasureAttribute(orange.MeasureAttribute):
    -
    + class MeasureAttribute_randomForests(orange.MeasureAttribute):
          def __init__(self, learner=None, trees = 100, attributes=None, rand=None):
              """:param trees: Number of trees in the forest.
    …
              else:
                return set([])
    +
    + class SplitConstructor_AttributeSubset(orange.TreeSplitConstructor):
    +     def __init__(self, scons, attributes, rand = None):
    +         import random
    +         self.scons = scons           # split constructor of original tree
    +         self.attributes = attributes # number of attributes to consider
    +         if rand:
    +             self.rand = rand             # a random generator
    +         else:
    +             self.rand = random.Random()
    +             self.rand.seed(0)
    +
    +     def __call__(self, gen, weightID, contingencies, apriori, candidates, clsfr):
    +         cand = [1]*self.attributes + [0]*(len(candidates) - self.attributes)
    +         self.rand.shuffle(cand)
    +         # instead with all attributes, we will invoke split constructor
    +         # only for the subset of a attributes
    +         t = self.scons(gen, weightID, contingencies, apriori, cand, clsfr)
    +         return t
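
    SplitConstructor_AttributeSubset (moved here from __init__.py) restricts each split
    to a random subset of attributes by shuffling a 0/1 candidate mask before delegating
    to the wrapped split constructor. A toy illustration of the mask it builds::

        import random

        attributes = 3        # attributes to consider at each split
        n_candidates = 8      # attributes offered by the tree learner
        cand = [1] * attributes + [0] * (n_candidates - attributes)
        random.shuffle(cand)
        print cand   # e.g. [0, 1, 0, 0, 1, 0, 1, 0] -- only the marked attributes compete for this split
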
  • orange/doc/Orange/rst/code/ensemble-forest-measure.py

    r7334 r7369  
    + import Orange
      import random
    - import Orange

    - data = Orange.data.Table("iris.tab")
    + table = Orange.data.Table("iris.tab")

    - measure = Orange.ensemble.forest.MeasureAttribute(trees=100)
    + measure = Orange.ensemble.forest.MeasureAttribute_randomForests(trees=100)

      #call by attribute index
    - imp0 = measure(0, data)
    + imp0 = measure(0, table)
      #call by orange.Variable
    - imp1 = measure(data.domain.attributes[1], data)
    + imp1 = measure(table.domain.attributes[1], table)
      print "first: %0.2f, second: %0.2f\n" % (imp0, imp1)

      print "different random seed"
    - measure = Orange.ensemble.forest.MeasureAttribute(trees=100, rand=random.Random(10))
    + measure = Orange.ensemble.forest.MeasureAttribute_randomForests(trees=100,
    +         rand=random.Random(10))

    - imp0 = measure(0, data)
    - imp1 = measure(data.domain.attributes[1], data)
    + imp0 = measure(0, table)
    + imp1 = measure(table.domain.attributes[1], table)
      print "first: %0.2f, second: %0.2f\n" % (imp0, imp1)

      print "All importances:"
    - imps = measure.importances(data)
    + imps = measure.importances(table)
      for i,imp in enumerate(imps):
    -     print "%15s: %6.2f" % (data.domain.attributes[i].name, imp)
    +   print "%15s: %6.2f" % (table.domain.attributes[i].name, imp)
  • orange/doc/Orange/rst/code/ensemble-forest.py

    r7302 r7369  
    - import Orange
    + # Description: Demonstrates the use of random forests from Orange.ensemble.forest module
    + # Category:    classification, ensembles
    + # Classes:     RandomForestLearner
    + # Uses:        bupa.tab
    + # Referenced:  orngEnsemble.htm
    +
    + import Orange, orngTree

      data = Orange.data.Table('bupa.tab')
      forest = Orange.ensemble.forest.RandomForestLearner(trees=50, name="forest")
    - tree = Orange.classification.tree.TreeLearner(minExamples=2, mForPrunning=2,\
    + tree = orngTree.TreeLearner(minExamples=2, mForPrunning=2, \
                                  sameMajorityPruning=True, name='tree')
      learners = [tree, forest]

      import orngTest, orngStat
    - results = orngTest.crossValidation(learners, data, folds=10)
    + results = orngTest.crossValidation(learners, data, folds=3)
      print "Learner  CA     Brier  AUC"
      for i in range(len(learners)):
          print "%-8s %5.3f  %5.3f  %5.3f" % (learners[i].name, \
    -         orngStat.CA(results)[i],
    +         orngStat.CA(results)[i],
              orngStat.BrierScore(results)[i],
              orngStat.AUC(results)[i])
  • orange/doc/Orange/rst/code/ensemble-forest2.py

    r7302 r7369  
    - import Orange
    - import Orange.core as orange
    + # Description: Defines a tree learner (trunks of depth less than 5) and uses them in forest tree, prints out the number of nodes in each tree
    + # Category:    classification, ensembles
    + # Classes:     RandomForestLearner
    + # Uses:        bupa.tab
    + # Referenced:  orngEnsemble.htm
    +
    + import Orange, orngTree

      data = Orange.data.Table('bupa.tab')

    - tree = Orange.classification.tree.TreeLearner(storeNodeClassifier = 0,
    -     storeContingencies=0, storeDistributions=1, minExamples=5, ).instance()
    - gini = orange.MeasureAttribute_gini()
    + tree = orngTree.TreeLearner(storeNodeClassifier = 0, storeContingencies=0, \
    +   storeDistributions=1, minExamples=5, ).instance()
    + gini = Orange.feature.scoring.Gini()
      tree.split.discreteSplitConstructor.measure = \
        tree.split.continuousSplitConstructor.measure = gini
      tree.maxDepth = 5
    - tree.split = Orange.ensemble.SplitConstructor_AttributeSubset(tree.split, 3)
    + tree.split = Orange.ensemble.forest.SplitConstructor_AttributeSubset(tree.split, 3)

      forestLearner = Orange.ensemble.forest.RandomForestLearner(learner=tree, trees=50)
    …

      for c in forest.classifiers:
    -     print Orange.classification.tree.countNodes(c),
    +     print orngTree.countNodes(c),
      print
  • orange/doc/Orange/rst/code/ensemble.py

    r7284 r7369  
    - import Orange
    + # Description: Demonstrates the use of boosting and bagging from Orange.ensemble module
    + # Category:    classification, ensembles
    + # Classes:     BoostedLearner, BaggedLearner
    + # Uses:        lymphography.tab
    + # Referenced:  orngEnsemble.htm
    +
    + import Orange, orngTree
      import orngTest, orngStat

    - tree = Orange.classification.tree.TreeLearner(name="tree") # mForPruning=2
    + tree = orngTree.TreeLearner(mForPruning=2, name="tree")
      bs = Orange.ensemble.boosting.BoostedLearner(tree, name="boosted tree")
      bg = Orange.ensemble.bagging.BaggedLearner(tree, name="bagged tree")
    …

      learners = [tree, bs, bg]
    - results = orngTest.crossValidation(learners, data)
    + results = orngTest.crossValidation(learners, data, folds=3)
      print "Classification Accuracy:"
      for i in range(len(learners)):
  • orange/orngEnsemble.py

    r7334 r7369  
      from Orange.ensemble.bagging import *
      from Orange.ensemble.boosting import *
    - from Orange.ensemble.forest import RandomForestLearner
    - from Orange.ensemble.forest import RandomForestClassifier
    - from ORange.ensemble.forest import MeasureAttribute as MeasureAttribute_randomForests
    + from Orange.ensemble.forest import *
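
    Replacing the explicit (and previously misspelled "ORange") imports with a wildcard
    import keeps the legacy orngEnsemble module as a thin facade over the new package.
    Assuming it simply re-exports the forest module, as the line above suggests, old
    scripts can keep going through orngEnsemble, e.g.::

        import Orange
        import orngEnsemble

        data = Orange.data.Table("bupa.tab")
        forest = orngEnsemble.RandomForestLearner(trees=50, name="forest")(data)
        print forest(data[0])   # prediction of the trained random forest
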