Changeset 8192:14ad491155c0 in orange


Timestamp:
08/17/11 12:49:22 (3 years ago)
Author:
markotoplak
Branch:
default
Convert:
831dc26c4e5fface4fad8d0c3afb96d6d8d7e958
Message:

Orange.classification.tree: joined documentation of TreeLearner and
_TreeLearner. Only the TreeLearner now exists in the documentation.

File:
1 edited

  • orange/Orange/classification/tree.py

    r8185 r8192  
    434434        predictions. This method simply chooses the most probable class. 
    435435 
    436 .. class:: _TreeLearner 
    437  
    438     The main learning object is :obj:`_TreeLearner`. It is basically a 
    439     skeleton into which the user must plug the components for particular 
    440     functions. This class is not meant to be used directly. You should 
    441     rather use :class:`TreeLearner`. 
    442  
    443     Components that govern the structure of the tree are 
    444     :obj:`split` (of type :obj:`SplitConstructor`), :obj:`stop` 
    445     (of type :obj:`StopCriteria`) and :obj:`example_splitter` (of type 
    446     :obj:`ExampleSplitter`). 
    447  
    448     .. attribute:: split 
    449  
    450         Object of type :obj:`SplitConstructor`. Default value, provided 
    451         by :obj:`_TreeLearner`, is :obj:`SplitConstructor_Combined` with 
    452         separate constructors for discrete and continuous attributes. 
    453         Discrete attributes are used as they are, while continuous 
    454         attributes are binarized. Gain ratio is used to select attributes.  A minimum 
    455         of two examples in a leaf is required for discrete and five 
    456         examples in a leaf for continuous attributes. 
    457  
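    The gain-ratio selection described above can be illustrated with a small,
    self-contained sketch (plain Python, not Orange's C++ SplitConstructor;
    the minimum-leaf-count and binarization rules are omitted here):

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a sequence of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(rows, attr):
        """Information gain of splitting on column `attr`, normalized by
        the entropy of the split itself (the 'split information')."""
        labels = [r[-1] for r in rows]
        branches = {}
        for r in rows:
            branches.setdefault(r[attr], []).append(r[-1])
        n = len(rows)
        gain = entropy(labels) - sum(
            len(b) / n * entropy(b) for b in branches.values())
        split_info = entropy([r[attr] for r in rows])
        return gain / split_info if split_info else 0.0

    # Toy rows: (outlook, windy, class); the selector picks the best column.
    rows = [("sunny", "yes", "no"), ("sunny", "no", "no"),
            ("rain", "yes", "no"), ("rain", "no", "yes"),
            ("overcast", "no", "yes"), ("overcast", "yes", "yes")]
    best = max([0, 1], key=lambda a: gain_ratio(rows, a))
    ```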
    458     .. attribute:: stop 
    459  
    460         Object of type :obj:`StopCriteria`. The default stopping 
    461         criterion stops induction when all examples in a node belong to 
    462         the same class. 
    463  
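    The default stopping rule is simple enough to sketch directly; the
    `min_examples` and `max_majority` parameters mirror the extra checks of
    :obj:`StopCriteria_common` documented further below (a pure-Python
    illustration, not Orange's code):

    ```python
    from collections import Counter

    def stop(labels, min_examples=0, max_majority=1.0):
        """True when induction should halt at this node. The default
        criterion stops only when every example has the same class."""
        if len(labels) <= min_examples:
            return True
        majority_share = Counter(labels).most_common(1)[0][1] / len(labels)
        return majority_share >= max_majority

    print(stop(["y", "y", "y"]))   # pure node: True
    print(stop(["y", "n", "n"]))   # mixed node, default criterion: False
    ```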
    464     .. attribute:: example_splitter 
    465  
    466         Object of type :obj:`ExampleSplitter`. The default splitter is 
    467         :obj:`ExampleSplitter_UnknownsAsSelector` that splits the learning 
    468         examples according to distributions given by the selector. 
    469  
    470     .. attribute:: contingency_computer 
    471      
    472         By default, this slot is left empty and ordinary contingency 
    473         matrices are computed for examples at each node. If need 
    474         arises, one can change the way the matrices are computed. This 
    475         can be used to change the way that unknown values are treated 
    476         when assessing qualities of attributes. As mentioned earlier, 
    477         the computed matrices can be used by split constructor and by 
    478         stopping criteria. On the other hand, they can be (and are) 
    479         ignored by some splitting constructors. 
    480  
    481     .. attribute:: node_learner 
    482  
    483         Induces a classifier from examples belonging to a 
    484         node. The same learner is used for internal nodes 
    485         and for leaves. The default :obj:`node_learner` is 
    486         :obj:`Orange.classification.majority.MajorityLearner`. 
    487  
    488     .. attribute:: descender 
    489  
    490         Descending component that the induced :obj:`TreeClassifier` will 
    491         use. Default descender is :obj:`Descender_UnknownMergeAsSelector` 
    492         which votes using the :obj:`branch_selector`'s distribution for 
    493         vote weights. 
    494  
    495     .. attribute:: max_depth 
    496  
    497         Gives maximal tree depth; 0 means that only the root is generated. 
    498         The default is 100 to prevent any infinite tree induction due to 
    499         mis-set stop criteria. If you are sure you need larger 
    500         trees, increase it. If you, on the other hand, want to lower 
    501         this hard limit, you can do so as well. 
    502  
    503     .. attribute:: store_distributions, store_contingencies, store_examples, store_node_classifier 
    504  
    505         Decides whether to store class distributions, contingencies and 
    506         examples in :obj:`Node`, and whether the :obj:`node_classifier` 
    507         should be built for internal nodes.  By default, distributions and 
    508         node classifiers are stored, while contingencies and examples are 
    509         not. You won't save any memory by not storing distributions but 
    510         storing contingencies, since the distributions actually point to the 
    511         same distribution that is stored in :obj:`contingency.classes`. 
    512  
    513     The :obj:`_TreeLearner` first sets the defaults for missing 
    514     components. Although stored in the actual :obj:`_TreeLearner`'s 
    515     fields, they are removed when the induction is finished. 
    516  
    517     Then it ensures that examples are stored in a table. This is 
    518     needed because the algorithm juggles with pointers to examples. If 
    519     examples are in a file or are fed through a filter, they are copied 
    520     to a table. Even if they are already in a table, they are copied if 
    521     :obj:`store_examples` is set. This is to assure that pointers remain 
    522     pointing to examples even if the user later changes the example 
    523     table. If they are in the table and the :obj:`store_examples` flag 
    524     is clear, we just use them as they are. This will obviously crash 
    525     in a multi-threaded system if one changes the table during the tree 
    526     induction. Well... don't do it. 
    527  
    528     Apriori class probabilities are computed. At this point we check 
    529     the sum of example weights; if it's zero, there are no examples and 
    530     we cannot proceed. A list of candidate attributes is set; in the 
    531     beginning, all attributes are candidates for the split criterion. 
    532  
    533     Now comes the recursive part of the :obj:`_TreeLearner`. Its arguments 
    534     are a set of examples, a weight meta-attribute ID (a tricky thing, 
    535     it can always be the same as the original or can change to accommodate 
    536     splitting of examples among branches), apriori class distribution 
    537     and a list of candidates (represented as a vector of Boolean values). 
    538  
    539     The contingency matrix is computed next. This happens even if the flag 
    540     :obj:`store_contingencies` is false.  If the :obj:`contingency_computer` 
    541     is given we use it, otherwise we construct just an ordinary 
    542     contingency matrix. 
    543  
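    An "ordinary" contingency matrix is just a table of class counts per
    attribute value; a minimal sketch of what the default computation
    produces (the row layout is assumed for illustration only):

    ```python
    from collections import defaultdict

    def contingency(rows, attr):
        """Class counts for each value of column `attr`; the node's class
        distribution is the sum of these counts over all values."""
        table = defaultdict(lambda: defaultdict(int))
        for r in rows:
            table[r[attr]][r[-1]] += 1
        return {value: dict(counts) for value, counts in table.items()}

    rows = [("sunny", "no"), ("sunny", "no"), ("rain", "yes")]
    print(contingency(rows, 0))   # {'sunny': {'no': 2}, 'rain': {'yes': 1}}
    ```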
    544     A :obj:`stop` is called to see whether it is worth continuing. If 
    545     not, a :obj:`node_classifier` is built and the :obj:`Node` is 
    546     returned. Otherwise, a :obj:`node_classifier` is only built if 
    547     :obj:`forceNodeClassifier` flag is set. 
    548  
    549     To get a :obj:`Node`'s :obj:`node_classifier`, the 
    550     :obj:`node_learner`'s :obj:`smart_learn` function is called 
    551     with the given examples, weight ID and the just computed 
    552     matrix. If the learner can use the matrix (and the default, 
    553     :obj:`Orange.classification.majority.MajorityLearner`, can), it won't 
    554     touch the examples. Thus, a choice of :obj:`contingency_computer` 
    555     will, in many cases, affect the :obj:`node_classifier`. The 
    556     :obj:`node_learner` can return no classifier; if so and 
    557     if the classifier would be needed for classification, 
    558     the :obj:`TreeClassifier`'s function returns DK or an empty 
    559     distribution. If you're writing your own tree classifier - pay 
    560     attention. 
    561  
    562     If the induction is to continue, a :obj:`split` component is called. 
    563     If it fails to return a branch selector, induction stops and the 
    564     :obj:`Node` is returned. 
    565  
    566     :obj:`_TreeLearner` then uses :obj:`ExampleSplitter` to divide the 
    567     examples as described above. 
    568  
    569     The contingency gets removed at this point if it is not to be 
    570     stored. Thus, the :obj:`split`, :obj:`stop` and :obj:`example_splitter` 
    571     can use the contingency matrices if they wish. 
    572  
    573     The :obj:`_TreeLearner` then recursively calls itself for each of 
    574     the non-empty subsets. If the splitter returns a list of weights, 
    575     a corresponding weight is used for each branch. Besides, the attribute 
    576     spent by the splitter (if any) is removed from the list of candidates 
    577     for the subtree. 
    578  
    579     A subset of examples is stored in its corresponding tree node, 
    580     if so requested. If not, the new weight attributes are removed 
    581     (if any were created). 
    582  
    583  
    584436Split constructors 
    585437===================== 
     
    628480    each branch. Just what we need for the :obj:`Node`.  It can return 
    629481    an empty list for the number of examples in branches; in this case, 
    630     the :obj:`_TreeLearner` will find the number itself after splitting 
     482    the :obj:`TreeLearner` will find the number itself after splitting 
    631483    the example set into subsets. However, if a split constructor can 
    632484    provide the numbers at no extra computational cost, it should do so. 
     
    845697        really need them. 
    846698 
    847  
    848  
    849699.. class:: StopCriteria_common 
    850700 
     
    864714        Example count is weighed. 
    865715 
    866 .. class:: StopCriteria_Python 
    867  
    868     Undocumented. 
    869716 
    870717Example Splitters 
     
    21061953class TreeLearner(Orange.core.Learner): 
    21071954    """ 
    2108     Assembles the classification or regression tree learner.  Essentially, 
    2109     :class:`TreeLearner` is a wrapper around :class:`_TreeLearner` and 
    2110     provides easier use of the latter.  It sets parameters for tree 
    2111     induction, that are controlled through the object's attributes. 
    2112     the object's attributes. If upon initialization :class:`TreeLearner` 
     1955    A classification or regression tree learner. 
     1956    If upon initialization :class:`TreeLearner` 
    21131957    is given a set of instances, then a :class:`TreeClassifier` object 
    2114     is built and returned instead. 
    2115  
    2116     Attributes can be also be set on initialization.  
     1958    is built and returned instead. Attributes can also be set on initialization.  
     1959 
     1960    **The tree building process** 
     1961 
     1962    #. The learning instances are stored in a table, 
     1963       because the algorithm works with pointers to instances. If instances 
     1964       are in a file or are fed through a filter, they are copied to a 
     1965       table. Even if they are already in a table, they are copied if 
     1966       :obj:`store_examples` is `True`. 
     1967    #. Apriori class probabilities are computed. If the sum 
     1968       of instance weights is zero, there are no instances so the process 
     1969       stops. A list of candidate attributes for the split is compiled; 
     1970       in the beginning, all attributes are candidates. 
     1971    #. The recursive part. Its 
     1972       arguments are a set of instances, a weight meta-attribute ID 
     1973       (it can be always the same as the original or can change to 
     1974       accommodate splitting of instances among branches), apriori class 
     1975       distribution and a list of candidates (represented as a vector 
     1976       of Boolean values). 
     1977    #. The contingency matrix is computed.   
     1978    #. A :obj:`stop` is called 
     1979       to see whether it is worth continuing. If not, a 
     1980       :obj:`node_classifier` is built and the :obj:`Node` is 
     1981       returned. Otherwise, a :obj:`node_classifier` is only built if 
     1982       :obj:`store_node_classifier` is `True`.  The :obj:`node_classifier` 
     1983       is built by calling :obj:`node_learner`'s :obj:`smart_learn` 
     1984       function with the given instances, weight ID and the just computed 
     1985       matrix. If the learner can use the matrix (and the default, 
     1986       :obj:`~Orange.classification.majority.MajorityLearner`, can), it 
     1987       won't touch the instances. Therefore a :obj:`contingency_computer` 
     1988       will, in many cases, affect the :obj:`node_classifier`. The 
     1989       :obj:`node_learner` can return no classifier; if so and 
     1990       if the classifier would be needed for classification, the 
     1991       :obj:`TreeClassifier`'s function returns DK or an empty 
     1992       distribution. 
     1993    #. If the induction is to continue, a :obj:`split` is called. 
     1994       If it fails to return a branch selector, induction stops and the 
     1995       :obj:`Node` is returned. 
     1996    #. Instances are divided (into child nodes) with :obj:`splitter`. 
     1997    #. The contingency gets removed if :obj:`store_contingencies` is 
     1998       `False`.  Thus, :obj:`split`, :obj:`stop` and :obj:`splitter` 
     1999       can use the contingency matrices. 
     2000    #. The object recursively calls itself (see step 3) for each of 
     2001       the non-empty subsets. If the splitter returns a list of weights, 
     2002       a corresponding weight is used for each branch. The attribute spent 
     2003       by the splitter (if any) is removed from the list of candidates 
     2004       for the subtree. 
     2005    #. A subset of instances is stored in its corresponding tree node, 
     2006       if :obj:`store_examples` is `True`. If not, the new weight 
     2007       attributes are removed (if any were created). 
     2008 
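    The steps above can be condensed into a short pure-Python sketch. The
    scoring rule, the data layout and all names are simplified stand-ins for
    the pluggable split/stop/splitter components, not Orange's implementation:

    ```python
    from collections import Counter

    def majority(labels):
        """Stand-in for the default node_learner (a majority classifier)."""
        return Counter(labels).most_common(1)[0][0]

    def induce(rows, candidates, depth=0, max_depth=100):
        """Recursive induction: stop check, node classifier, split,
        remove the spent attribute, recurse into non-empty subsets."""
        labels = [r[-1] for r in rows]
        node = {"classifier": majority(labels)}          # node_classifier
        if len(set(labels)) == 1 or not candidates or depth >= max_depth:
            return node                                  # stop criterion
        def errors(attr):                                # toy split score:
            groups = {}                                  # instances the branch
            for r in rows:                               # majorities misclassify
                groups.setdefault(r[attr], []).append(r[-1])
            return sum(len(g) - Counter(g).most_common(1)[0][1]
                       for g in groups.values())
        best = min(candidates, key=errors)               # branch selector
        branches = {}
        for r in rows:                                   # example splitter
            branches.setdefault(r[best], []).append(r)
        remaining = [c for c in candidates if c != best] # spent attribute
        node["split_on"] = best
        node["branches"] = {v: induce(sub, remaining, depth + 1, max_depth)
                            for v, sub in branches.items()}
        return node

    rows = [("sunny", "no"), ("sunny", "no"), ("rain", "yes"), ("rain", "yes")]
    tree = induce(rows, candidates=[0])
    ```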
     2009    **Attributes** 
    21172010 
    21182011    .. attribute:: node_learner 
     
    21212014        used for internal nodes and leaves. The default is 
    21222015        :obj:`Orange.classification.majority.MajorityLearner`. 
     2016 
     2017    .. attribute:: descender 
     2018 
     2019        Descending component that the induced :obj:`TreeClassifier` will 
     2020        use. Default descender is :obj:`Descender_UnknownMergeAsSelector` 
     2021        which votes using the :obj:`branch_selector`'s distribution for 
     2022        vote weights. 
    21232023 
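    The merging behaviour of :obj:`Descender_UnknownMergeAsSelector` can be
    sketched as follows; the node layout (`n`, `dist`, `branches`) is invented
    for the example and is not Orange's data structure:

    ```python
    def descend(node, instance, weight=1.0):
        """Walk the tree; on an unknown split value, recurse into every
        branch and merge the class votes, weighting each branch by its
        share of training instances (the branch selector's distribution)."""
        if "branches" not in node:              # leaf: weighted class votes
            return {c: weight * p for c, p in node["dist"].items()}
        value = instance[node["split_on"]]
        if value is not None:                   # known value: follow one branch
            return descend(node["branches"][value], instance, weight)
        total = sum(b["n"] for b in node["branches"].values())
        votes = {}
        for branch in node["branches"].values():  # unknown: merge all branches
            for c, v in descend(branch, instance,
                                weight * branch["n"] / total).items():
                votes[c] = votes.get(c, 0.0) + v
        return votes

    tree = {"split_on": 0,
            "branches": {"sunny": {"n": 3, "dist": {"no": 1.0}},
                         "rain":  {"n": 1, "dist": {"yes": 1.0}}}}
    print(descend(tree, (None,)))   # votes weighted 3:1 between the branches
    ```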
    21242024    **Split construction** 
     
    21482048     
    21492049        Measure for scoring of the attributes when deciding which of the 
    2150         attributes will be used for splitting of the example set in the node. 
     2050        attributes will be used for splitting of the instances in a node. 
    21512051        Can be either a :class:`Orange.feature.scoring.Measure` or one of 
    21522052        "infoGain" (:class:`Orange.feature.scoring.InfoGain`),  
     
    21652065        signature as :obj:`ExampleSplitter.__call__`. The default is 
    21662066        :class:`ExampleSplitter_UnknownsAsSelector` that splits the 
    2167         learning examples according to distributions given by the 
     2067        learning instances according to distributions given by the 
    21682068        selector. 
     2069 
     2070    .. attribute:: contingency_computer 
     2071     
     2072        By default, this slot is left empty and ordinary contingency 
     2073        matrices are computed for instances at each node. If need 
     2074        arises, one can change the way the matrices are computed, 
     2075        for example to change the treatment of unknown values. The 
     2076        computed matrices can be used by split constructor and by 
     2077        stopping criteria. 
    21692078 
    21702079    **Pruning** 
     
    22222131        induction algorithms.  When used, parameters  :obj:`max_majority` 
    22232132        and :obj:`min_examples` will not be  considered.  The default 
    2224         stopping criterion stops induction when all examples in a node 
     2133        stopping criterion stops induction when all instances in a node 
    22252134        belong to the same class. 
    22262135 
     
    22392148    **Record keeping** 
    22402149 
    2241     .. attribute:: store_distributions, store_contingencies, store_examples, store_node_classifier 
     2150    .. attribute:: store_distributions  
     2151     
     2152    .. attribute:: store_contingencies 
     2153     
     2154    .. attribute:: store_examples 
     2155     
     2156    .. attribute:: store_node_classifier 
    22422157 
    22432158        Determines whether to store class distributions, contingencies and 
    22442159        examples in :class:`Node`, and whether the :obj:`node_classifier` 
    2245         should be build for internal nodes.  You won't save any memory 
     2160        should be built for internal nodes.  No memory will be saved  
    22462161        by not storing distributions but storing contingencies, since 
    22472162        distributions actually point to the same distribution that is 
    22482163        stored in :obj:`contingency.classes`.  By default everything 
    2249         except :obj:`store_examples` is enabled. 
    2250      
     2164        except :obj:`store_examples` is enabled.  
     2165 
    22512166    """ 
    22522167    def __new__(cls, examples = None, weightID = 0, **argkw): 
     
    22942209          
    22952210        if self.splitter != None: 
    2296             bl.splitter = self.splitter 
     2211            bl.example_splitter = self.splitter 
    22972212 
    22982213        #post pruning 
     
    24102325 
    24112326        for a in ["store_distributions", "store_contingencies", "store_examples",  
    2412             "store_node_classifier", "node_learner", "max_depth"]: 
     2327            "store_node_classifier", "node_learner", "max_depth", "contingency_computer", "descender" ]: 
    24132328            if hasattr(self, a): 
    24142329                setattr(learner, a, getattr(self, a)) 