Changeset 9037:fec5eae6525e in orange


Timestamp: 09/28/11 10:26:30 (3 years ago)
Author: markotoplak
Branch: default
Convert: 67739d6d20ffb9dfeef659bdd279be24f279d559
Message: Orange.ensemble.forest documentation updates.
File: 1 edited

--- orange/Orange/ensemble/forest.py (r8043)
+++ orange/Orange/ensemble/forest.py (r9037)
@@ -29,8 +29,8 @@
 class RandomForestLearner(orange.Learner):
     """
-    Just like bagging, classifiers in random forests are trained from bootstrap
-    samples of training data. Here, classifiers are trees. However, to increase
-    randomness, classifiers are built so that at each node the best feature is
-    chosen from a subset of features in the training set. We closely follow the
+    Just like in bagging, classifiers in random forests are trained from bootstrap
+    samples of training data. Here, the classifiers are trees. However, to increase
+    randomness, at each node of the tree the best feature is
+    chosen from a subset of features in the data. We closely follow the
     original algorithm (Brieman, 2001) both in implementation and parameter
     defaults.
     
@@ -38,20 +38,19 @@
     :param trees: number of trees in the forest.
     :type trees: int
-    :param attributes: number of features used in a randomly drawn
-            subset when searching for best feature to split the node in
-            tree growing (default: None, and if kept this way, this is
-            turned into square root of the number of features in the
-            training set, when this is presented to learner). Ignored
-            if :obj:`learner` is specified.
+    :param attributes: number of randomly drawn features among
+            which to select the best to split the nodes in tree
+            induction. The default, None, means the square root of
+            the number of features in the training data. Ignored if
+            :obj:`learner` is specified.
     :type attributes: int
-    :param base_learner: A base tree learner. If None (default),
+    :param base_learner: A base tree learner. The base learner will be
+        randomized with Random Forest's random
+        feature subset selection.  If None (default),
         :class:`~Orange.classification.tree.TreeLearner` with Gini index
         or MSE for attribute scoring will be used, and it will not split
-        nodes with less than 5 data instances. The base learner will be
-        randomized with Random Forest's random attribute subset selection.
+        nodes with less than 5 data instances.
     :type base_learner: None or :class:`Orange.classification.tree.TreeLearner`
-    :param rand: random generator used in bootstrap sampling. If None is
-        passed, then Python's Random from random library is used, with seed
-        initialized to 0.
+    :param rand: random generator used in bootstrap sampling. If None (default),
+        then ``random.Random(0)`` is used.
     :param learner: Tree induction learner. If None (default),
         the :obj:`~ScoreFeature.base_learner` will be used (and randomized). If
     
@@ -95,5 +94,5 @@
         Learn from the given table of data instances.
 
-        :param instances: data instances to learn from.
+        :param instances: learning data.
         :type instances: class:`Orange.data.Table`
         :param origWeight: weight.
     
@@ -125,16 +124,15 @@
 
         return RandomForestClassifier(classifiers = classifiers, name=self.name,\
-                    domain=instances.domain, classVar=instances.domain.classVar)
+                    domain=instances.domain, class_var=instances.domain.class_var)
 
 
 class RandomForestClassifier(orange.Classifier):
     """
-    Random forest classifier uses decision trees induced from bootstrapped
-    training set to vote on class of presented instance. Most frequent vote
-    is returned. However, in our implementation, if class probability is
-    requested from a classifier, this will return the averaged probabilities
-    from each of the trees.
-
-    When constructing the classifier manually, the following parameters can
+    Uses the trees induced by the :obj:`RandomForestLearner`. An input
+    instance is classified into the class with the most frequent vote.
+    However, this implementation returns the averaged probabilities from
+    each of the trees if class probability is requested.
+
+    When constructed manually, the following parameters have to
     be passed:
 
     
@@ -148,13 +146,13 @@
     :type domain: :class:`Orange.data.Domain`
 
-    :param classVar: the class feature.
-    :type classVar: :class:`Orange.data.variable.Variable`
-
-    """
-    def __init__(self, classifiers, name, domain, classVar, **kwds):
+    :param class_var: the class feature.
+    :type class_var: :class:`Orange.data.variable.Variable`
+
+    """
+    def __init__(self, classifiers, name, domain, class_var, **kwds):
         self.classifiers = classifiers
         self.name = name
         self.domain = domain
-        self.classVar = classVar
+        self.class_var = class_var
         self.__dict__.update(kwds)
 
     
@@ -179,10 +177,10 @@
             # voting for class probabilities
             if resultType == orange.GetProbabilities or resultType == orange.GetBoth:
-                prob = [0.] * len(self.domain.classVar.values)
+                prob = [0.] * len(self.domain.class_var.values)
                 for c in self.classifiers:
                     a = [x for x in c(instance, orange.GetProbabilities)]
                     prob = map(add, prob, a)
                 norm = sum(prob)
-                cprob = Orange.statistics.distribution.Discrete(self.classVar)
+                cprob = Orange.statistics.distribution.Discrete(self.class_var)
                 for i in range(len(prob)):
                     cprob[i] = prob[i]/norm
     
@@ -192,9 +190,9 @@
             # highest probability through probability voting
             if resultType == orange.GetValue or resultType == orange.GetBoth:
-                cfreq = [0] * len(self.domain.classVar.values)
+                cfreq = [0] * len(self.domain.class_var.values)
                 for c in self.classifiers:
                     cfreq[int(c(instance))] += 1
                 index = cfreq.index(max(cfreq))
-                cvalue = Orange.data.Value(self.domain.classVar, index)
+                cvalue = Orange.data.Value(self.domain.class_var, index)
 
             if resultType == orange.GetValue: return cvalue
     
@@ -218,5 +216,5 @@
             if resultType == orange.GetValue or resultType == orange.GetBoth:
                 values = [c(instance).value for c in self.classifiers]
-                cvalue = Orange.data.Value(self.domain.classVar, sum(values) / len(self.classifiers))
+                cvalue = Orange.data.Value(self.domain.class_var, sum(values) / len(self.classifiers))
 
             if resultType == orange.GetValue: return cvalue
     
@@ -225,5 +223,5 @@
 
     def __reduce__(self):
-        return type(self), (self.classifiers, self.name, self.domain, self.classVar), dict(self.__dict__)
+        return type(self), (self.classifiers, self.name, self.domain, self.class_var), dict(self.__dict__)
 
 ### MeasureAttribute_randomForests
     
@@ -233,20 +231,19 @@
     :param trees: number of trees in the forest.
     :type trees: int
-    :param attributes: number of features used in a randomly drawn
-            subset when searching for best feature to split the node in
-            tree growing (default: None, and if kept this way, this is
-            turned into square root of the number of features in the
-            training set, when this is presented to learner). Ignored
-            if :obj:`learner` is specified.
+    :param attributes: number of randomly drawn features among
+            which to select the best to split the nodes in tree
+            induction. The default, None, means the square root of
+            the number of features in the training data. Ignored if
+            :obj:`learner` is specified.
     :type attributes: int
-    :param base_learner: A base tree learner. If None (default),
+    :param base_learner: A base tree learner. The base learner will be
+        randomized with Random Forest's random
+        feature subset selection.  If None (default),
         :class:`~Orange.classification.tree.TreeLearner` with Gini index
         or MSE for attribute scoring will be used, and it will not split
-        nodes with less than 5 data instances. The base learner will be
-        randomized with Random Forest's random attribute subset selection.
+        nodes with less than 5 data instances.
     :type base_learner: None or :class:`Orange.classification.tree.TreeLearner`
-    :param rand: random generator used in bootstrap sampling. If None is
-        passed, then Python's Random from random library is used, with seed
-        initialized to 0.
+    :param rand: random generator used in bootstrap sampling. If None (default),
+        then ``random.Random(0)`` is used.
     :param learner: Tree induction learner. If None (default),
         the :obj:`~ScoreFeature.base_learner` will be used (and randomized). If
     
@@ -254,4 +251,5 @@
         with no additional transformations.
     :type learner: None or :class:`Orange.core.Learner`
+
     """
     def __init__(self, trees=100, attributes=None, rand=None, base_learner=None, learner=None):
     
@@ -402,5 +400,5 @@
 
         if  node.branchSelector:
-            j = attrnum[node.branchSelector.classVar.name]
+            j = attrnum[node.branchSelector.class_var.name]
             cs = set([])
             for i in range(len(node.branches)):
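
The docstrings touched by this changeset describe three behaviours: a default feature-subset size equal to the square root of the number of features, bootstrap sampling seeded with ``random.Random(0)``, and a classifier that returns the majority vote for values but averaged per-tree probabilities when probabilities are requested. The following is a minimal standalone sketch of those ideas in plain Python 3, not the Orange implementation. The function names are hypothetical, and truncating the square root to an integer is an assumption, since the docstring only says "square root".

```python
import math
import random


def default_subset_size(n_features):
    """Features tried per split when `attributes` is None.

    Assumption: the square root is truncated to an integer and at
    least one feature is always tried.
    """
    return max(1, int(math.sqrt(n_features)))


def bootstrap_indices(n_instances, rand=None):
    """Draw n_instances row indices with replacement.

    Mirrors the documented default generator, random.Random(0).
    """
    rand = rand if rand is not None else random.Random(0)
    return [rand.randrange(n_instances) for _ in range(n_instances)]


def majority_vote(predicted_indices, n_classes):
    """Return the class index predicted by the most trees.

    Ties go to the lowest index, as with cfreq.index(max(cfreq)).
    """
    cfreq = [0] * n_classes
    for index in predicted_indices:
        cfreq[index] += 1
    return cfreq.index(max(cfreq))


def averaged_probabilities(per_tree_probs):
    """Sum the per-tree class distributions and renormalise."""
    summed = [sum(column) for column in zip(*per_tree_probs)]
    norm = sum(summed)
    return [p / norm for p in summed]


if __name__ == "__main__":
    # Three hypothetical trees, two classes.
    per_tree = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
    votes = [probs.index(max(probs)) for probs in per_tree]
    print(majority_vote(votes, 2))           # class index chosen by voting
    print(averaged_probabilities(per_tree))  # averaged class distribution
```

In this example two of the three trees vote for class 1, so the vote returns class 1, while the averaged distribution puts more mass on class 0. This is the distinction the "However" in the revised ``RandomForestClassifier`` docstring draws.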