Changeset 7560:345ad81b8405 in orange


Timestamp:
02/04/11 22:47:19
Author:
crt <crtomir.gorup@…>
Branch:
default
Convert:
ffc845fb4762fc76e6d0136b2ddbd4d36a125890
Message:

First version of Orange.misc.selection, Orange.ensemble.forest now uses Orange2.5 trees.

Location:
orange
Files:
1 added
10 edited

  • orange/Orange/ensemble/__init__.py

    r7450 r7560  
    77 
    88 
    9 ======= 
     9******* 
    1010Bagging 
    11 ======= 
     11******* 
    1212 
    1313.. index:: bagging 
     
    2323   :show-inheritance: 
    2424 
    25 ======== 
     25******** 
    2626Boosting 
    27 ======== 
     27******** 
    2828 
    2929.. index:: boosting 
     
    6262 
    6363 
    64 ============= 
     64************* 
    6565Random Forest 
    66 ============= 
     66************* 
    6767 
    6868.. index:: random forest 
     
    105105show how we can assemble a tree learner to be used in random forests. The  
    106106tree induction uses the feature subset split constructor, which we have  
    107 borrowed from :class:`Orange.ensemble` and from which we have requested the 
     107borrowed from :class:`Orange.ensemble.forest` and from which we have requested the 
    108108best feature for decision nodes to be selected from three randomly  
    109109chosen features. 
     
    119119in a constructed random forest. 
    120120 
    121 ================ 
    122 MeasureAttribute 
    123 ================ 
     121     
     122Score Feature 
     123============= 
    124124 
    125125L. Breiman (2001) suggested the possibility of using random forests as a 
    126 non-myopic measure of attribute importance. 
     126non-myopic measure of feature importance. 
    127127 
    128128Assessing relevance of features with random forests is based on the 
     
    136136number of used trees and multiplied by 100 before they are returned. 
    137137 
    138 .. autoclass:: Orange.ensemble.forest.MeasureAttribute_randomForests 
     138.. autoclass:: Orange.ensemble.forest.ScoreFeature 
    139139  :members: 
    140140 
     
    142142importances for all features need to be considered simultaneously. Since we 
    143143normally compute feature importance with random forests for all features in 
    144 the dataset, MeasureAttribute_randomForests caches the results. When it  
     144the dataset, ScoreFeature caches the results. When it  
    145145is called to compute a quality of certain feature, it computes qualities 
    146146for all features in the dataset. When called again, it uses the stored  
     
    152152 
    153153Caching will only have an effect if you use the same 
    154 :class:`Orange.ensemble.forest.MeasureAttribute_randomForests` object for all 
     154:class:`Orange.ensemble.forest.ScoreFeature` object for all 
    155155features in the domain. 
    156156 
     
    178178 
    179179References 
    180 ============ 
     180----------- 
    181181* L Breiman. Bagging Predictors. `Technical report No. 421 \ 
    182182    <http://www.stat.berkeley.edu/tech-reports/421.ps.Z>`_. University of \ 
  • orange/Orange/ensemble/bagging.py

    r7415 r7560  
    44import Orange.core as orange 
    55import Orange 
    6  
    76 
    87class BaggedLearner(orange.Learner): 
     
    5049        :param instances: data instances to learn from. 
    5150        :type instances: Orange.data.Table 
    52         :param weight: ID of meta attribute with weights of instances 
     51        :param weight: ID of meta feature with weights of instances 
    5352        :type weight: int 
    5453        :rtype: :class:`Orange.ensemble.bagging.BaggedClassifier` 
     
    8584    :type name: str 
    8685     
    87     :param classVar: the class attribute. 
     86    :param classVar: the class feature. 
    8887    :type classVar: :class:`Orange.data.feature.Feature` 
    8988 
  • orange/Orange/ensemble/boosting.py

    r7450 r7560  
    108108    :type name: str 
    109109     
    110     :param classVar: the class attribute. 
     110    :param classVar: the class feature. 
    111111    :type classVar: :class:`Orange.data.feature.Feature` 
    112112     
  • orange/Orange/ensemble/forest.py

    r7415 r7560  
    33import Orange 
    44import Orange.feature.scoring 
    5 import orngTree 
    65import random 
    76 
     
    6766        if not learner: 
    6867            # tree learner assembled as suggested by Breiman (2001) 
    69             smallTreeLearner = orngTree.TreeLearner( 
     68            smallTreeLearner = Orange.classification.tree.TreeLearner( 
    7069            storeNodeClassifier = 0, storeContingencies=0,  
    7170            storeDistributions=1, minExamples=5).instance() 
     
    8281         
    8382        :param instances: data instances to learn from. 
    84         :type instances: Orange.data.Table 
     83        :type instances: :class:`Orange.data.Table` 
    8584        :param origWeight: weight. 
    8685        :type origWeight: int 
    8786        :rtype: :class:`Orange.ensemble.forest.RandomForestClassifier` 
    88          
    8987        """ 
    9088        # if number of features for subset is not set, use square root 
     
    134132    :type domain: :class:`Orange.data.Domain` 
    135133     
    136     :param classVar: the class attribute. 
     134    :param classVar: the class feature. 
    137135    :type classVar: :class:`Orange.data.feature.Feature` 
    138136 
     
    185183### MeasureAttribute_randomForests 
    186184 
    187 class MeasureAttribute_randomForests(orange.MeasureAttribute): 
     185class ScoreFeature(orange.MeasureAttribute): 
    188186    """ 
    189187    :param learner: although not required, one can use this argument to pass 
  • orange/Orange/misc/__init__.py

    r7450 r7560  
    44 
    55Module Orange.misc contains common functions and classes which are used in other modules. 
    6  
    76 
    87================== 
     
    1413   single: misc; counters 
    1514 
     15.. automodule:: Orange.misc.counters 
     16  :members: 
    1617 
    1718================== 
    18 Renders 
     19Render 
    1920================== 
    2021 
    2122.. index:: misc 
    2223.. index:: 
    23    single: misc; Renders 
     24   single: misc; render 
     25 
     26.. automodule:: Orange.misc.render 
     27  :members: 
    2428 
    2529================== 
    26 Renders 
     30Selection 
    2731================== 
    2832 
     
    3135   single: misc; selection 
    3236 
     37Many machine learning techniques generate a set of different solutions or have to 
     38choose, as for instance in classification tree induction, between different 
     39attributes. The most trivial solution is to iterate through the candidates, 
     40compare them and remember the optimal one. The problem occurs, however, when 
     41there are multiple candidates that are equally good, and the naive approaches 
     42would select the first or the last one, depending upon the formulation of 
     43the if-statement. 
     44 
     45:class:`Orange.misc.selection` provides a class that makes a random choice 
     46in such cases. Each new candidate is compared with the currently optimal 
     47one; it replaces the optimal if it is better, while if they are equal, 
     48one is chosen at random. The number of competing optimal candidates is stored, 
     49so in this random choice the probability to select the new candidate (over the 
     50current one) is 1/w, where w is the current number of equal candidates, 
     51including the present one. One can easily verify that this gives equal 
     52chances to all candidates, independent of the order in which they are presented. 
     53 
     54.. automodule:: Orange.misc.selection 
     55  :members: 
     56 
     57Example 
     58-------- 
     59 
     60The following snippet loads the data set lymphography and prints out the 
     61attribute with the highest information gain. 
     62 
     63part of `misc-selection-bestonthefly.py`_ (uses `lymphography.tab`_) 
     64 
     65.. literalinclude:: code/misc-selection-bestonthefly.py 
     66  :lines: 7-16 
     67 
     68Our candidates are tuples of gain ratios and attributes, so we set 
     69:obj:`callCompareOn1st` to make the compare function compare the first element 
     70(gain ratios). We could achieve the same by initializing the object like this: 
     71 
     72part of `misc-selection-bestonthefly.py`_ (uses `lymphography.tab`_) 
     73 
     74.. literalinclude:: code/misc-selection-bestonthefly.py 
     75  :lines: 18-18 
     76 
     77 
     78The other way to do it is through indices. 
     79 
     80`misc-selection-bestonthefly.py`_ (uses `lymphography.tab`_) 
     81 
     82.. literalinclude:: code/misc-selection-bestonthefly.py 
     83  :lines: 25- 
     84 
     85.. _misc-selection-bestonthefly.py: code/misc-selection-bestonthefly.py 
     86.. _lymphography.tab: code/lymphography.tab 
     87 
     88Here we only give gain ratios to :obj:`bestOnTheFly`, so we don't have to specify a 
     89special compare operator. After checking all features we get the index of the  
     90optimal one by calling :obj:`winnerIndex`. 
    3391 
    3492""" 
     
    111169    def getstring(self): 
    112170        progchar = int(round(float(self.state) * (self.charwidth - 5) / 100.0)) 
    113         return self.title + "=" * (progchar) + ">" + " " * (self.charwidth - 5 - progchar) + "%3i" % int(round(self.state)) + "%" 
     171        return self.title + "=" * (progchar) + ">" + " " * (self.charwidth\ 
     172            - 5 - progchar) + "%3i" % int(round(self.state)) + "%" 
    114173 
    115174    def printline(self, string): 
  • orange/Orange/misc/selection.py

    r7450 r7560  
    22 
    33class BestOnTheFly: 
     4    """ 
     5    Finds the optimal object in a sequence of objects. The class is fed the 
     6    candidates one by one, and remembers the winner. It can thus be used by 
     7    methods that generate different solutions to a problem and need to 
     8    select the optimal one, but do not want to store them all. 
     9     
     10    :param compare: compare function. 
     11    :param seed: random seed. If not given, a seed of 0 is used to ensure\ 
     12    that the same experiment always gives the same results, despite\ 
     13    pseudo-randomness. 
     14    :type seed: int 
     15    :param callCompareOn1st: If set, :obj:`BestOnTheFly` will suppose\ 
     16    that the candidates are lists or tuples, and it will call compare\ 
     17    with the first element of the tuple. 
     18    :type callCompareOn1st: bool 
     19    """ 
     20     
    421    def __init__(self, compare=cmp, seed = 0, callCompareOn1st = False): 
    522        self.randomGenerator = random.Random(seed) 
     
    1128 
    1229    def candidate(self, x): 
     30        """Add new candidate. 
     31         
     32        :param x: new candidate. 
     33        :type x: object""" 
    1334        self.index += 1 
    1435        if not self.wins: 
     
    3657 
    3758    def winner(self): 
     59        """Return (currently) optimal object. This function can be called 
     60        any number of times, even when the candidates are still coming. 
     61         
     62        :rtype: object""" 
    3863        return self.best 
    3964 
    4065    def winnerIndex(self): 
     66        """Return the index of the optimal object within the sequence of 
     67        the candidates. 
     68         
     69        :rtype: int""" 
    4170        if self.best is not None: 
    4271            return self.bestIndex 
     
    4574 
    4675def selectBest(x, compare=cmp, seed = 0, callCompareOn1st = False): 
     76    """Return the optimal object from list x. The function is used if the candidates 
     77    are already in the list, so using the more complicated :obj:`BestOnTheFly` directly is 
     78    not needed. 
     79 
     80    To demonstrate the use of :obj:`BestOnTheFly` see the implementation of 
     81    :obj:`selectBest`:: 
     82     
     83      def selectBest(x, compare=cmp, seed = 0, callCompareOn1st = False): 
     84          bs=BestOnTheFly(compare, seed, callCompareOn1st) 
     85          for i in x: 
     86              bs.candidate(i) 
     87          return bs.winner() 
     88 
     89    :param x: list of existing candidates. 
     90    :type x: list 
     91    :param compare: compare function. 
     92    :param seed: random seed. If not given, a seed of 0 is used to ensure\ 
     93    that the same experiment always gives the same results, despite\ 
     94    pseudo-randomness. 
     95    :type seed: int 
     96    :param callCompareOn1st: If set, :obj:`BestOnTheFly` will suppose\ 
     97    that the candidates are lists or tuples, and it will call compare\ 
     98    with the first element of the tuple. 
     99    :type callCompareOn1st: bool 
     100    :rtype: object""" 
    47101    bs=BestOnTheFly(compare, seed, callCompareOn1st) 
    48102    for i in x: 
     
    51105 
    52106def selectBestIndex(x, compare=cmp, seed = 0, callCompareOn1st = False): 
     107    """Similar to :obj:`selectBest` except that it doesn't return the best object 
     108    but its index in the list x.""" 
    53109    bs=BestOnTheFly(compare, seed, callCompareOn1st) 
    54110    for i in x: 
     
    56112    return bs.winnerIndex() 
    57113 
    58 def compare2_firstBigger(x, y): 
     114# def compare2_firstBigger(x, y): 
     115def compareFirstBigger(x, y): 
    59116    """Function takes two lists and compares first elements. 
    60117     
    61     :param x: list of values 
     118    :param x: list of values. 
    62119    :type x: list 
    63     :param y: list of values 
     120    :param y: list of values. 
    64121    :type y: list 
    65122    :rtype:  cmp(x[0], y[0])""" 
    66123    return cmp(x[0], y[0]) 
    67124 
    68 def compare2_firstSmaller(x, y): 
     125#def compare2_firstSmaller(x, y): 
     126def compareFirstSmaller(x, y): 
    69127    """Function takes two lists and compares first elements. 
    70128     
    71     :param x: list of values 
     129    :param x: list of values. 
    72130    :type x: list 
    73     :param y: list of values 
     131    :param y: list of values. 
    74132    :type y: list 
    75133    :rtype:  -cmp(x[0], y[0])""" 
    76134    return -cmp(x[0], y[0]) 
    77135 
    78 def compare2_lastBigger(x, y): 
     136#     def compare2_lastBigger(x, y): 
     137def compareLastBigger(x, y): 
    79138    """Function takes two lists and compares last elements. 
    80139     
    81     :param x: list of values 
     140    :param x: list of values. 
    82141    :type x: list 
    83     :param y: list of values 
     142    :param y: list of values. 
    84143    :type y: list 
    85144    :rtype:  cmp(x[-1], y[-1])""" 
    86145    return cmp(x[-1], y[-1]) 
    87146 
    88 def compare2_lastSmaller(x, y): 
     147#    def compare2_lastSmaller(x, y): 
     148def compareLastSmaller(x, y): 
    89149    """Function takes two lists and compares last elements. 
    90150     
    91     :param x: list of values 
     151    :param x: list of values. 
    92152    :type x: list 
    93     :param y: list of values 
     153    :param y: list of values. 
    94154    :type y: list 
    95155    :rtype:  -cmp(x[-1], y[-1])""" 
    96156    return -cmp(x[-1], y[-1]) 
    97  
    98 def compare2_bigger(x, y): 
    99     """Function takes two numbers and compares first elements. 
     157     
     158#     def compare2_bigger(x, y): 
     159def compareBigger(x, y): 
     160    """Function takes and compares two numbers. 
    100161     
    101162    :param x: value. 
    102163    :type x: int 
    103     :param y: values. 
     164    :param y: value. 
    104165    :type y: int 
    105166    :rtype:  cmp(x, y)""" 
    106167    return cmp(x, y) 
    107  
    108 def compare2_smaller(x, y): 
    109     """Function takes two numbers and compares first elements. 
     168#     def compare2_smaller(x, y): 
     169def compareSmaller(x, y): 
     170    """Function takes and compares two numbers. 
    110171     
    111172    :param x: value. 
    112173    :type x: int 
    113     :param y: values. 
     174    :param y: value. 
    114175    :type y: int 
    115     :rtype:  cmp(x, y)""" 
     176    :rtype: -cmp(x, y)""" 
    116177    return -cmp(x, y) 
  • orange/doc/Orange/rst/Orange.ensemble.rst

    r7264 r7560  
    1 ================ 
     1################ 
    22Orange.ensemble 
    3 ================ 
     3################ 
    44 
    55.. automodule:: Orange.ensemble 
     6 
  • orange/doc/Orange/rst/Orange.misc.rst

    r7489 r7560  
    1 =========== 
     1########### 
    22Orange.misc 
    3 =========== 
     3########### 
    44 
    55.. automodule:: Orange.misc 
  • orange/doc/Orange/rst/code/ensemble-forest-measure.py

    r7399 r7560  
    1010table = Orange.data.Table("iris.tab") 
    1111 
    12 measure = Orange.ensemble.forest.MeasureAttribute_randomForests(trees=100) 
     12measure = Orange.ensemble.forest.ScoreFeature(trees=100) 
    1313 
    1414#call by attribute index 
     
    1919 
    2020print "different random seed" 
    21 measure = Orange.ensemble.forest.MeasureAttribute_randomForests(trees=100,  
     21measure = Orange.ensemble.forest.ScoreFeature(trees=100,  
    2222        rand=random.Random(10)) 
    2323 
  • orange/orngEnsemble.py

    r7369 r7560  
    22from Orange.ensemble.boosting import * 
    33from Orange.ensemble.forest import * 
     4from Orange.ensemble.forest import ScoreFeature as MeasureAttribute_randomForests 
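
The tie-breaking rule described in the new Orange.misc documentation (a candidate equal to the current best replaces it with probability 1/w, where w counts the tied candidates seen so far) can be sketched in a few lines. This is a minimal Python 3 illustration under stated assumptions, not the code in selection.py: the class TieBreakingBest, its snake_case methods and the helper pick_index are hypothetical names, and candidates are compared with plain > and == instead of a cmp-style function.

```python
import random
from collections import Counter


class TieBreakingBest:
    """Hypothetical illustration of 1/w tie-breaking; not Orange's BestOnTheFly."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)  # fixed seed keeps runs repeatable
        self.best = None
        self.best_index = None
        self.index = -1   # index of the most recent candidate
        self.ties = 0     # w: candidates seen so far that equal the best

    def candidate(self, x):
        self.index += 1
        if self.ties == 0 or x > self.best:
            # First candidate, or strictly better: it becomes the sole winner.
            self.best, self.best_index, self.ties = x, self.index, 1
        elif x == self.best:
            # Equal to the best: keep the new one with probability 1/w.
            self.ties += 1
            if self.rng.randrange(self.ties) == 0:
                self.best, self.best_index = x, self.index

    def winner(self):
        return self.best

    def winner_index(self):
        return self.best_index


def pick_index(scores, seed):
    """Feed all scores to a fresh picker and return the winning index."""
    picker = TieBreakingBest(seed)
    for s in scores:
        picker.candidate(s)
    return picker.winner_index()


# Three candidates share the top score (indices 1, 2 and 4).
scores = [0.2, 0.7, 0.7, 0.1, 0.7]
counts = Counter(pick_index(scores, seed) for seed in range(3000))
print(sorted(counts.items()))
```

Across many seeds, each of the three tied indices should be picked roughly a third of the time, whatever their position in the list. With a single fixed seed the choice is repeatable, which matches the stated reason for the default seed of 0.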