Changeset 7415:66548e91de0b in orange


Timestamp: 02/04/11 12:18:31 (3 years ago)
Author: matija <matija.polajnar@…>
Branch: default
Convert: 56a4046aa18eaaa12718a7c1b075bdb5c06a2b55
Message: Ensemble documentation fixes.
Location: orange/Orange/ensemble
Files: 4 edited
• orange/Orange/ensemble/__init__.py (r7397 → r7415)

 
 
-==================
+=======
 Bagging
-==================
+=======
 
 .. index:: ensemble bagging
…
    :show-inheritance:
 
-==================
+.. autoclass:: Orange.ensemble.bagging.BaggedClassifier
+   :members:
+   :show-inheritance:
+
+========
 Boosting
-==================
+========
 
 .. index:: ensemble boosting
+
 .. autoclass:: Orange.ensemble.boosting.BoostedLearner
   :members:
   :show-inheritance:
 
+.. autoclass:: Orange.ensemble.boosting.BoostedClassifier
+   :members:
+   :show-inheritance:
+
 Example
-========
+=======
 Let us try boosting and bagging on Lymphography data set and use TreeLearner
 with post-pruning as a base learner. For testing, we use 10-fold cross
…
 
 
-=======
+=============
 Random Forest
-=======
+=============
 
 .. index:: ensemble randomforest
 .. autoclass:: Orange.ensemble.forest.RandomForestLearner
+  :members:
+  :show-inheritance:
+
+.. autoclass:: Orange.ensemble.forest.RandomForestClassifier
   :members:
   :show-inheritance:
…
 the individual classifiers once they are assembled into the forest, and to
 show how we can assemble a tree learner to be used in random forests. The
-tree induction uses an feature subset split constructor, which we have
+tree induction uses the feature subset split constructor, which we have
 borrowed from :class:`Orange.ensemble` and from which we have requested the
 best feature for decision nodes to be selected from three randomly
…
 Assessing relevance of features with random forests is based on the
 idea that randomly changing the value of an important feature greatly
-affects example's classification while changing the value of an
-unimportant feature doen't affect it much. Implemented algorithm
-accumulates feature scores over given number of trees. Importances of
+affects instance's classification while changing the value of an
+unimportant feature does not affect it much. Implemented algorithm
+accumulates feature scores over given number of trees. Importance of
 all features for a single tree are computed as: correctly classified OOB
-examples minus correctly classified OOB examples when an feature is
+instances minus correctly classified OOB instances when the feature is
 randomly shuffled. The accumulated feature scores are divided by the
 number of used trees and multiplied by 100 before they are returned.
…
 
 Computation of feature importance with random forests is rather slow. Also,
-importances for all features need to be considered simultaneous. Since we
+importances for all features need to be considered simultaneously. Since we
 normally compute feature importance with random forests for all features in
 the dataset, MeasureAttribute_randomForests caches the results. When it
 is called to compute a quality of certain feature, it computes qualities
 for all features in the dataset. When called again, it uses the stored
-results if the domain is still the same and the example table has not
-changed (this is done by checking the example tables version and is
-not foolproof; it won't detect if you change values of existing examples,
-but will notice adding and removing examples; see the page on
+results if the domain is still the same and the data table has not
+changed (this is done by checking the data table's version and is
+not foolproof; it will not detect if you change values of existing instances,
+but will notice adding and removing instances; see the page on
 :class:`Orange.data.Table` for details).
 
-Caching will only have an effect if you use the same instance for all
+Caching will only have an effect if you use the same
+:class:`Orange.ensemble.forest.MeasureAttribute_randomForests` object for all
 features in the domain.
 
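The Example section touched by this diff combines boosting and bagging with a tree learner. As a quick illustration of the documented usage, here is a minimal Python sketch; the data file name and the plain tree learner are assumptions (the documented example uses TreeLearner with post-pruning and evaluates with 10-fold cross validation):

    import Orange

    # Load data and set up a base learner (assumed names; the documented
    # example uses a post-pruned TreeLearner on the lymphography data).
    data = Orange.data.Table("lymphography")
    tree = Orange.core.TreeLearner()

    # Wrapping a learner returns a new learner; passing instances as well
    # would directly return a trained classifier, as the docstrings state.
    bagged = Orange.ensemble.bagging.BaggedLearner(tree, t=10)
    boosted = Orange.ensemble.boosting.BoostedLearner(tree, t=10)

    bagged_classifier = bagged(data)
    boosted_classifier = boosted(data)

    # Classify the first instance with both ensembles.
    print bagged_classifier(data[0]), boosted_classifier(data[0])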
• orange/Orange/ensemble/bagging.py (r7397 → r7415)

 import Orange
 
-#######################################################################
-# Bagging
-
-#def BaggedLearner(learner=None, t=10, name='Bagging', examples=None):
-#    learner = BaggedLearnerClass(learner, t, name)
-#    if examples:
-#        return learner(examples)
-#    else:
-#        return learner
 
 class BaggedLearner(orange.Learner):
…
     BaggedLearner takes a learner and returns a bagged learner, which is
     essentially a wrapper around the learner passed as an argument. If
-    examples are passed in arguments, BaggedLearner returns a bagged
-    classifiers. Both learner and classifier then behave just like any
+    instances are passed in arguments, BaggedLearner returns a bagged
+    classifier. Both learner and classifier then behave just like any
     other learner and classifier in Orange.
 
-    Bagging, in essence, takes a training data and a learner, and builds t
-    classifiers each time presenting a learner a bootstrap sample from the
-    training data. When given a test example, classifiers vote on class,
-    and a bagged classifier returns a class with a highest number of votes.
+    Bagging, in essence, takes training data and a learner, and builds *t*
+    classifiers, each time presenting a learner a bootstrap sample from the
+    training data. When given a test instance, classifiers vote on class,
+    and a bagged classifier returns a class with the highest number of votes.
     As implemented in Orange, when class probabilities are requested, these
     are proportional to the number of votes for a particular class.
…
     :param learner: learner to be bagged.
     :type learner: :class:`Orange.core.Learner`
-    :param examples: if examples are passed to BaggedLearner, this returns
-        a BaggedClassifier, that is, creates t classifiers using learner
-        and a subset of examples, as appropriate for bagging.
-    :type examples: :class:`Orange.data.Table`
     :param t: number of bagged classifiers, that is, classifiers created
-        when examples are passed to bagged learner.
+        when instances are passed to bagged learner.
     :type t: int
-    :param name: name of the learner.
-    :type name: string
+    :param name: name of the resulting learner.
+    :type name: str
     :rtype: :class:`Orange.ensemble.bagging.BaggedClassifier` or
             :class:`Orange.ensemble.bagging.BaggedLearner`
     """
-    def __new__(cls, learner, examples=None, weightId=None, **kwargs):
+    def __new__(cls, learner, instances=None, weightId=None, **kwargs):
         self = orange.Learner.__new__(cls, **kwargs)
-        if examples is not None:
+        if instances is not None:
             self.__init__(self, learner, **kwargs)
-            return self.__call__(examples, weightId)
+            return self.__call__(instances, weightId)
         else:
             return self
…
 
     def __call__(self, instances, weight=0):
-        """Learn from the given table of data instances.
+        """
+        Learn from the given table of data instances.
 
         :param instances: data instances to learn from.
…
         :type weight: int
         :rtype: :class:`Orange.ensemble.bagging.BaggedClassifier`
+
         """
         r = random.Random()
…
 
 class BaggedClassifier(orange.Classifier):
-    def __init__(self, **kwds):
+    """
+    A classifier that uses a bagging technique. Usually the learner
+    (:class:`Orange.ensemble.bagging.BaggedLearner`) is used to construct the
+    classifier.
+
+    When constructing the classifier manually, the following parameters can
+    be passed:
+
+    :param classifiers: a list of boosted classifiers.
+    :type classifiers: list
+
+    :param name: name of the resulting classifier.
+    :type name: str
+
+    :param classVar: the class attribute.
+    :type classVar: :class:`Orange.data.feature.Feature`
+
+    """
+
+    def __init__(self, classifiers, name, classVar, **kwds):
+        self.classifiers = classifiers
+        self.name = name
+        self.classVar = classVar
         self.__dict__.update(kwds)
 
-    def __call__(self, example, resultType = orange.GetValue):
+    def __call__(self, instance, resultType = orange.GetValue):
+        """
+        :param instance: instance to be classified.
+        :type instance: :class:`Orange.data.Instance`
+
+        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
+              :class:`Orange.classification.Classifier.GetProbabilities` or
+              :class:`Orange.classification.Classifier.GetBoth`
+
+        :rtype: :class:`Orange.data.Value`,
+              :class:`Orange.statistics.Distribution` or a tuple with both
+        """
         if self.classVar.varType == Orange.data.Type.Discrete:
             freq = [0.] * len(self.classVar.values)
             for c in self.classifiers:
-                freq[int(c(example))] += 1
+                freq[int(c(instance))] += 1
             index = freq.index(max(freq))
             value = Orange.data.Value(self.classVar, index)
…
                 return (value, freq)
         elif self.classVar.varType ==Orange.data.Type.Continuous:
-            votes = [c(example, orange.GetBoth if resultType==\
+            votes = [c(instance, orange.GetBoth if resultType==\
                 orange.GetProbabilities else resultType) \
                 for c in self.classifiers]
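The BaggedLearner docstring above describes the procedure: draw a bootstrap sample for each of the *t* classifiers and predict by majority vote. The following self-contained sketch restates that logic in plain Python, independent of Orange's class hierarchy (all names are illustrative, not part of the module):

    import random

    def bag_train(learner, data, t=10, seed=0):
        """Train t classifiers, each on a bootstrap sample of the data."""
        rng = random.Random(seed)
        n = len(data)
        return [learner([data[rng.randrange(n)] for _ in range(n)])
                for _ in range(t)]

    def bag_classify(classifiers, instance):
        """Majority vote over the ensemble's predictions."""
        votes = {}
        for c in classifiers:
            label = c(instance)
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

Class probabilities, as the docstring notes, would simply be these vote counts normalized by *t*.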
• orange/Orange/ensemble/boosting.py (r7397 → r7415)

 _inf = 100000
 
-#def BoostedLearner(learner, examples=None, t=10, name='AdaBoost.M1'):
-#    learner = BoostedLearnerClass(learner, t, name)
-#    if examples:
-#        return learner(examples)
-#    else:
-#        return learner
-
 class BoostedLearner(orange.Learner):
     """
     Instead of drawing a series of bootstrap samples from the training set,
-    bootstrap maintains a weight for each instance. When classifier is
+    bootstrap maintains a weight for each instance. When a classifier is
     trained from the training set, the weights for misclassified instances
-    are increased. Just like in bagged learner, the class is decided based
+    are increased. Just like in a bagged learner, the class is decided based
     on voting of classifiers, but in boosting votes are weighted by accuracy
     obtained on training set.
…
     1996). From user's viewpoint, the use of the BoostedLearner is similar to
     that of BaggedLearner. The learner passed as an argument needs to deal
-    with example weights.
+    with instance weights.
 
-    :param learner: learner to be bagged.
+    :param learner: learner to be boosted.
     :type learner: :class:`Orange.core.Learner`
-    :param examples: ff examples are passed to BoostedLearner,
-        this returns a BoostedClassifier, that is, creates t
-        classifiers using learner and a subset of examples,
-        as appropriate for AdaBoost.M1 (default: None).
-    :type examples: :class:`Orange.data.Table`
-    :param t: number of boosted classifiers created from the example set.
+    :param t: number of boosted classifiers created from the instance set.
     :type t: int
-    :param name: name of the learner.
-    :type name: string
+    :param name: name of the resulting learner.
+    :type name: str
     :rtype: :class:`Orange.ensemble.boosting.BoostedClassifier` or
             :class:`Orange.ensemble.boosting.BoostedLearner`
     """
-    def __new__(cls, learner, examples=None, weightId=None, **kwargs):
+    def __new__(cls, learner, instances=None, weightId=None, **kwargs):
         self = orange.Learner.__new__(cls, **kwargs)
-        if examples is not None:
+        if instances is not None:
             self.__init__(self, learner, **kwargs)
-            return self.__call__(examples, weightId)
+            return self.__call__(instances, weightId)
         else:
             return self
…
 
     def __call__(self, instances, origWeight = 0):
-        """Learn from the given table of data instances.
+        """
+        Learn from the given table of data instances.
 
         :param instances: data instances to learn from.
…
         :param origWeight: weight.
         :type origWeight: int
-        :rtype: :class:`Orange.ensemble.boosting.BoostedClassifier`"""
+        :rtype: :class:`Orange.ensemble.boosting.BoostedClassifier`
+
+        """
         import math
         weight = orange.newmetaid()
…
 
 class BoostedClassifier(orange.Classifier):
-    def __init__(self, **kwds):
+    """
+    A classifier that uses a boosting technique. Usually the learner
+    (:class:`Orange.ensemble.boosting.BoostedLearner`) is used to construct the
+    classifier.
+
+    When constructing the classifier manually, the following parameters can
+    be passed:
+
+    :param classifiers: a list of boosted classifiers.
+    :type classifiers: list
+
+    :param name: name of the resulting classifier.
+    :type name: str
+
+    :param classVar: the class attribute.
+    :type classVar: :class:`Orange.data.feature.Feature`
+
+    """
+
+    def __init__(self, classifiers, name, classVar, **kwds):
+        self.classifiers = classifiers
+        self.name = name
+        self.classVar = classVar
         self.__dict__.update(kwds)
 
-    def __call__(self, example, resultType = orange.GetValue):
+    def __call__(self, instance, resultType = orange.GetValue):
+        """
+        :param instance: instance to be classified.
+        :type instance: :class:`Orange.data.Instance`
+
+        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
+              :class:`Orange.classification.Classifier.GetProbabilities` or
+              :class:`Orange.classification.Classifier.GetBoth`
+
+        :rtype: :class:`Orange.data.Value`,
+              :class:`Orange.statistics.Distribution` or a tuple with both
+        """
         votes = [0.] * len(self.classVar.values)
         for c, e in self.classifiers:
-            votes[int(c(example))] += e
+            votes[int(c(instance))] += e
         index = orngMisc.selectBestIndex(votes)
         # TODO
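The corrected docstring describes AdaBoost.M1 (Freund and Schapire, 1996): instance weights grow, relatively, for misclassified instances, and each classifier votes with a weight derived from its training error. A schematic of one boosting round under those standard rules (a standalone sketch, not this module's actual code; data is assumed to be a list of (instance, label) pairs and weights a list of floats summing to 1):

    import math

    def adaboost_m1_round(classifier, data, weights):
        """One AdaBoost.M1 round: weighted error -> vote weight and
        updated, renormalized instance weights."""
        err = sum(w for (x, y), w in zip(data, weights) if classifier(x) != y)
        if err == 0 or err >= 0.5:
            return None, weights          # degenerate round; boosting stops
        beta = err / (1 - err)
        # shrink the weights of correctly classified instances ...
        new_w = [w * beta if classifier(x) == y else w
                 for (x, y), w in zip(data, weights)]
        # ... and renormalize, which effectively boosts the misclassified
        total = sum(new_w)
        new_w = [w / total for w in new_w]
        return math.log(1 / beta), new_w  # log(1/beta) is the vote weight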
• orange/Orange/ensemble/forest.py (r7397 → r7415)

     """
     Just like bagging, classifiers in random forests are trained from bootstrap
-    samples of training data. Here, classifiers are trees, but to increase
-    randomness build in the way that at each node the best attribute is chosen
-    from a subset of attributes in the training set. We closely follows the
+    samples of training data. Here, classifiers are trees. However, to increase
+    randomness, classifiers are built so that at each node the best feature is
+    chosen from a subset of features in the training set. We closely follow the
     original algorithm (Brieman, 2001) both in implementation and parameter
     defaults.
-
-    .. note::
-        Random forest classifier uses decision trees induced from bootstrapped
-        training set to vote on class of presented example. Most frequent vote
-        is returned. However, in our implementation, if class probability is
-        requested from a classifier, this will return the averaged probabilities
-        from each of the trees.
-
-    :param examples: if these are passed, the call returns
-            RandomForestClassifier, that is, creates the required set of
-            decision trees, which, when presented with an examples, vote
-            for the predicted class.
-    :type examples: :class:`Orange.data.Table`
+
+    :param learner: although not required, one can use this argument
+            to pass one's own tree induction algorithm. If None is passed,
+            RandomForestLearner will use Orange's tree induction
+            algorithm such that induction nodes with less than 5
+            data instances will not be considered for (further) splitting.
+    :type learner: :class:`Orange.core.Learner`
     :param trees: number of trees in the forest.
     :type trees: int
-    :param learner: although not required, one can use this argument
-            to pass one's own tree induction algorithm. If None is passed
-            , RandomForestLearner will use Orange's tree induction
-            algorithm such that in induction nodes with less then 5
-            examples will not be considered for (further) splitting.
-    :type learner: :class:`Orange.core.Learner`
-    :param attributes: number of attributes used in a randomly drawn
-            subset when searching for best attribute to split the node
+    :param attributes: number of features used in a randomly drawn
+            subset when searching for best feature to split the node
            in tree growing (default: None, and if kept this way, this
-            is turned into square root of attributes in the training set,
-            when this is presented to learner).
+            is turned into square root of the number of features in the
+            training set, when this is presented to learner).
+    :type attributes: int
     :param rand: random generator used in bootstrap sampling.
-            If none is passed, then Python's Random from random library is
+            If None is passed, then Python's Random from random library is
             used, with seed initialized to 0.
     :type rand: function
…
     """
 
-    def __new__(cls, examples=None, weight = 0, **kwds):
+    def __new__(cls, instances=None, weight = 0, **kwds):
         self = orange.Learner.__new__(cls, **kwds)
-        if examples:
+        if instances:
             self.__init__(**kwds)
-            return self.__call__(examples, weight)
+            return self.__call__(instances, weight)
         else:
             return self
…
             self.learner = smallTreeLearner
 
-    def __call__(self, examples, weight=0):
-        """Learn from the given table of data instances.
+    def __call__(self, instances, weight=0):
+        """
+        Learn from the given table of data instances.
 
         :param instances: data instances to learn from.
…
         :param origWeight: weight.
         :type origWeight: int
-        :rtype: :class:`Orange.ensemble.forest.RandomForestClassifier`"""
-        # if number of attributes for subset is not set, use square root
+        :rtype: :class:`Orange.ensemble.forest.RandomForestClassifier`
+
+        """
+        # if number of features for subset is not set, use square root
         if hasattr(self.learner.split, 'attributes') and\
                     not self.learner.split.attributes:
             self.learner.split.attributes = int(sqrt(\
-                    len(examples.domain.attributes)))
+                    len(instances.domain.attributes)))
 
         self.rand.setstate(self.randstate) #when learning again, set the same state
 
-        n = len(examples)
+        n = len(instances)
         # build the forest
         classifiers = []
…
             for j in range(n):
                 selection.append(self.rand.randrange(n))
-            data = examples.getitems(selection)
+            data = instances.getitems(selection)
             # build the model from the bootstrap sample
             classifiers.append(self.learner(data))
…
 
         return RandomForestClassifier(classifiers = classifiers, name=self.name,\
-                    domain=examples.domain, classVar=examples.domain.classVar)
+                    domain=instances.domain, classVar=instances.domain.classVar)
 
 class RandomForestClassifier(orange.Classifier):
-    def __init__(self, **kwds):
+    """
+    Random forest classifier uses decision trees induced from bootstrapped
+    training set to vote on class of presented instance. Most frequent vote
+    is returned. However, in our implementation, if class probability is
+    requested from a classifier, this will return the averaged probabilities
+    from each of the trees.
+
+    When constructing the classifier manually, the following parameters can
+    be passed:
+
+    :param classifiers: a list of classifiers to be used.
+    :type classifiers: list
+
+    :param name: name of the resulting classifier.
+    :type name: str
+
+    :param domain: the domain of the learning set.
+    :type domain: :class:`Orange.data.Domain`
+
+    :param classVar: the class attribute.
+    :type classVar: :class:`Orange.data.feature.Feature`
+
+    """
+    def __init__(self, classifiers, name, domain, classVar, **kwds):
+        self.classifiers = classifiers
+        self.name = name
+        self.domain = domain
+        self.classVar = classVar
         self.__dict__.update(kwds)
 
-    def __call__(self, example, resultType = orange.GetValue):
+    def __call__(self, instance, resultType = orange.GetValue):
+        """
+        :param instance: instance to be classified.
+        :type instance: :class:`Orange.data.Instance`
+
+        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
+              :class:`Orange.classification.Classifier.GetProbabilities` or
+              :class:`Orange.classification.Classifier.GetBoth`
+
+        :rtype: :class:`Orange.data.Value`,
+              :class:`Orange.statistics.Distribution` or a tuple with both
+        """
         from operator import add
 
…
             cprob = [0.] * len(self.domain.classVar.values)
             for c in self.classifiers:
-                a = [x for x in c(example, orange.GetProbabilities)]
+                a = [x for x in c(instance, orange.GetProbabilities)]
                 cprob = map(add, cprob, a)
             norm = sum(cprob)
…
             cfreq = [0] * len(self.domain.classVar.values)
             for c in self.classifiers:
-                cfreq[int(c(example))] += 1
+                cfreq[int(c(instance))] += 1
             index = cfreq.index(max(cfreq))
             cvalue = Orange.data.Value(self.domain.classVar, index)
…
 
 class MeasureAttribute_randomForests(orange.MeasureAttribute):
-    """:param trees: number of trees in the forest.
-    :type trees: int
+    """
     :param learner: although not required, one can use this argument to pass
         one's own tree induction algorithm. If None is
         passed, :class:`Orange.ensemble.forest.MeasureAttribute` will
-        use Orange's tree induction algorithm such that in
-        induction nodes with less then 5 examples will not be
+        use Orange's tree induction algorithm such that
+        induction nodes with less than 5 data instances will not be
         considered for (further) splitting.
     :type learner: None or :class:`Orange.core.Learner`
-    :param attributes: number of attributes used in a randomly drawn
-        subset when searching for best attribute to split the node in tree
-        growing (default: None, and if kept this way, this is turned into
-        square root of attributes in example set).
+    :param trees: number of trees in the forest.
+    :type trees: int
+    :param attributes: number of features used in a randomly drawn
+            subset when searching for best feature to split the node
+            in tree growing (default: None, and if kept this way, this
+            is turned into square root of the number of features in the
+            training set, when this is presented to learner).
     :type attributes: int
     :param rand: random generator used in bootstrap sampling. If None is
         passed, then Python's Random from random library is used, with seed
-        initialized to 0."""
+        initialized to 0.
+    """
     def __init__(self, learner=None, trees = 100, attributes=None, rand=None):
 
         self.trees = trees
         self.learner = learner
-        self.bufexamples = None
+        self.bufinstances = None
         self.attributes = attributes
 
…
          self.rand.seed(0)
 
-    def __call__(self, a1, a2, a3=None):
-        """Return importance of a given attribute. Can be given by index,
-        name or as a Orange.data.feature.Feature."""
+    def __call__(self, feature, instances, apriorClass=None):
+        """
+        Return importance of a given feature.
+
+        :param feature: feature to evaluate (by index, name or
+            :class:`Orange.data.feature.Feature` object).
+        :type feature: int, str or :class:`Orange.data.feature.Feature`.
+
+        :param instances: data instances to use for importance evaluation.
+        :type instances: :class:`Orange.data.Table`
+
+        :param apriorClass: not used!
+
+        """
         attrNo = None
-        examples = None
-
-        if type(a1) == int: #by attr. index
-          attrNo, examples, apriorClass = a1, a2, a3
-        elif type(a1) == type("a"): #by attr. name
-          attrName, examples, apriorClass = a1, a2, a3
-          attrNo = examples.domain.index(attrName)
-        elif isinstance(a1, Orange.data.feature.Feature):
-          a1, examples, apriorClass = a1, a2, a3
-          atrs = [a for a in examples.domain.attributes]
-          attrNo = atrs.index(a1)
-        else:
-          contingency, classDistribution, apriorClass = a1, a2, a3
+
+        if type(feature) == int: #by attr. index
+          attrNo  = feature
+        elif type(feature) == type("a"): #by attr. name
+          attrName = feature
+          attrNo = instances.domain.index(attrName)
+        elif isinstance(feature, Orange.data.feature.Feature):
+          atrs = [a for a in instances.domain.attributes]
+          attrNo = atrs.index(feature)
+        else:
          raise Exception("MeasureAttribute_rf can not be called with (\
                contingency,classDistribution, apriorClass) as fuction arguments.")
 
-        self.buffer(examples)
+        self.buffer(instances)
 
         return self.avimp[attrNo]*100/self.trees
 
-    def importances(self, examples):
-        """Return importances of all attributes in dataset in a list.
-        Buffered."""
-        self.buffer(examples)
+    def importances(self, table):
+        """
+        Return importance of all features in the dataset as a list. The result
+        is buffered, so repeated calls on the same (unchanged) dataset are
+        computationally cheap.
+
+        :param table: dataset of which the features' importance needs to be
+            measured.
+        :type table: :class:`Orange.data.Table`
+
+        """
+        self.buffer(table)
 
         return [a*100/self.trees for a in self.avimp]
 
-    def buffer(self, examples):
-        """Recalcule importances if needed (new examples)."""
+    def buffer(self, instances):
+        """
+        Recalculate importance of features if needed (ie. if it has been
+        buffered for the given dataset yet).
+
+        :param table: dataset of which the features' importance needs to be
+            measured.
+        :type table: :class:`Orange.data.Table`
+
+        """
         recalculate = False
 
-        if examples != self.bufexamples:
+        if instances != self.bufinstances:
          recalculate = True
-        elif examples.version != self.bufexamples.version:
+        elif instances.version != self.bufinstances.version:
          recalculate = True
 
        if (recalculate):
-          self.bufexamples = examples
-          self.avimp = [0.0]*len(self.bufexamples.domain.attributes)
+          self.bufinstances = instances
+          self.avimp = [0.0]*len(self.bufinstances.domain.attributes)
          self.acu = 0
 
…
                    self.learner.split.attributes:
              self.learner.split.attributes = int(sqrt(\
-                            len(examples.domain.attributes)))
+                            len(instances.domain.attributes)))
 
-          self.importanceAcu(self.bufexamples, self.trees, self.avimp)
+          self.importanceAcu(self.bufinstances, self.trees, self.avimp)
 
-    def getOOB(self, examples, selection, nexamples):
+    def getOOB(self, instances, selection, nexamples):
        ooblist = filter(lambda x: x not in selection, range(nexamples))
-        return examples.getitems(ooblist)
+        return instances.getitems(ooblist)
 
     def numRight(self, oob, classifier):
-        """Return a number of examples which are classified correcty."""
+        """
+        Return a number of instances which are classified correctly.
+        """
        right = 0
        for el in oob:
…
 
     def numRightMix(self, oob, classifier, attr):
-        """Return a number of examples  which are classified
-        correctly even if an attribute is shuffled."""
+        """
+        Return a number of instances which are classified
+        correctly even if a feature is shuffled.
+        """
        n = len(oob)
 
…
        return right
 
-    def importanceAcu(self, examples, trees, avimp):
+    def importanceAcu(self, instances, trees, avimp):
         """Accumulate avimp by importances for a given number of trees."""
-        n = len(examples)
-
-        attrs = len(examples.domain.attributes)
+        n = len(instances)
+
+        attrs = len(instances.domain.attributes)
 
         attrnum = {}
-        for attr in range(len(examples.domain.attributes)):
-           attrnum[examples.domain.attributes[attr].name] = attr
+        for attr in range(len(instances.domain.attributes)):
+           attrnum[instances.domain.attributes[attr].name] = attr
 
         # build the forest
…
             for j in range(n):
                 selection.append(self.rand.randrange(n))
-            data = examples.getitems(selection)
+            data = instances.getitems(selection)
 
             # build the model from the bootstrap sample
…
 
             #prepare OOB data
-            oob = self.getOOB(examples, selection, n)
+            oob = self.getOOB(instances, selection, n)
 
             #right on unmixed
…
             presl = list(self.presentInTree(cla.tree, attrnum))
 
-            #randomize each attribute in data and test
+            #randomize each feature in data and test
             #only those on which there was a split
             for attr in presl:
                 #calculate number of right classifications
-                #if the values of this attribute are permutated randomly
+                #if the values of this features are permutated randomly
                 rightimp = self.numRightMix(oob, cla, attr)
                 avimp[attr] += (float(right-rightimp))/len(oob)
…
 
     def presentInTree(self, node, attrnum):
-        """Return attributes present in tree (attributes that split)."""
+        """Return features present in tree (features that split)."""
         if not node:
           return set([])
…
         import random
         self.scons = scons           # split constructor of original tree
-        self.attributes = attributes # number of attributes to consider
+        self.attributes = attributes # number of features to consider
         if rand:
             self.rand = rand             # a random generator
…
         cand = [1]*self.attributes + [0]*(len(candidates) - self.attributes)
         self.rand.shuffle(cand)
-        # instead with all attributes, we will invoke split constructor
-        # only for the subset of a attributes
+        # instead with all features, we will invoke split constructor
+        # only for the subset of a features
         t = self.scons(gen, weightID, contingencies, apriori, cand, clsfr)
         return t
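Taken together, the renamed methods implement the permutation importance described in the module documentation: for each tree, count correctly classified out-of-bag instances, shuffle one feature's values, count again, and accumulate the difference; the buffered scores are then divided by the number of trees and scaled by 100. A short usage sketch of the measure (the data file name is an assumption):

    import Orange

    data = Orange.data.Table("lymphography")
    measure = Orange.ensemble.forest.MeasureAttribute_randomForests(trees=100)

    # importances() computes and buffers scores for all features at once,
    # since the forest has to be grown either way.
    imps = measure.importances(data)
    for attr, imp in zip(data.domain.attributes, imps):
        print "%5.2f  %s" % (imp, attr.name)

    # A later call for a single feature reuses the buffered results
    # (provided the data table's version has not changed).
    print measure(0, data)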