Changeset 8738:07e6b75c594f in orange


Timestamp: 08/23/11 12:28:12 (3 years ago)
Author: markotoplak
Branch: default
Convert: 9c8374af45e77a1eb9f5cb047118e8d62c5e32cc
Message: Orange.classification.tree documentation update.
File: 1 edited

  • orange/Orange/classification/tree.py

    r8258 r8738  
    342342===================== 
    343343 
    344 Split constructors that cannot handle features of particular type 
    345 (discrete, continuous) quitely skip them. Therefore use 
    346 a correct split constructor for your dataset. We suggest a 
    347 :obj:`SplitConstructor_Combined` that delegates features to specialized 
    348 split constructors. 
    349  
    350 The same components can be used either for inducing classification and 
    351 regression trees. The only component that needs to be chosen accordingly 
    352 is the 'measure' attribute for the :obj:`SplitConstructor_Score` class 
    353 (and derived classes). 
     344Split constructors find a suitable criterion for dividing the learning (and 
     345later testing) instances. Those that cannot handle a particular feature 
     346type (discrete, continuous) quietly skip such features. Therefore use a correct 
     347split constructor for your dataset, or :obj:`SplitConstructor_Combined` 
     348that delegates features to specialized split constructors. 
     349 
     350The same split constructors can be used both for classification and regression 
     351trees, if the 'measure' attribute for the :obj:`SplitConstructor_Score` 
     352class (and derived classes) is set accordingly. 
    354353 
    355354.. class:: SplitConstructor 
     
    565564================= 
    566565 
    567 Just like the :obj:`Descender` decides the branch for an 
    568 instance during classification, the :obj:`Splitter` 
    569 sorts the learning instances into branches. 
    570  
    571 :obj:`Splitter` is given a :obj:`Node` (from which  
    572 it can use different stuff, but most of splitters only use the  
    573 :obj:`branch_selector`), a set of instances to be divided, and  
    574 the weight ID. The result is a list of subsets of instances 
    575 and, optionally, a list of new weight ID's. 
    576  
    577 Most :obj:`Splitter` classes simply call the node's 
    578 :obj:`branch_selector` and assign instances to corresponding branches. When 
    579 the value is unknown they choose a particular branch or simply skip 
    580 the instance. 
    581  
    582 Some enhanced splitters can split instances. An instance (actually, a 
    583 pointer to it) is copied to more than one subset. To facilitate real 
    584 splitting, weights are needed. Each branch is assigned a weight ID (each 
    585 would usually have its own ID) and all instances that are in that branch 
    586 (either completely or partially) should have this meta attribute. If an 
    587 instance hasn't been split, it has only one additional attribute - with 
    588 weight ID corresponding to the subset to which it went. Example that 
    589 is split between, say, three subsets, has three new meta attributes, 
    590 one for each subset. ID's of weight meta attributes are returned by 
    591 the :obj:`Splitter` to be used at induction of the corresponding 
    592 subtrees. 
    593  
    594 Note that weights are used only when needed. When no splitting occured - 
    595 because the splitter is not able to do it or becauser there was no need 
     566Splitters sort learning instances into branches (the branches are selected 
     567with a :obj:`SplitConstructor`, while a :obj:`Descender` decides the 
     568branch for an instance during classification). 
     569 
     570Most splitters simply call :obj:`Node.branch_selector` and assign 
     571instances to the corresponding branches. When the value is unknown they choose a 
     572particular branch or simply skip the instance. 
     573 
     574 
     575Some enhanced splitters can split instances. An instance (actually, 
     576a pointer to it) is copied to more than one subset. To facilitate 
     577real splitting, weights are needed. Each branch has a weight ID (each 
     578would usually have its own ID) and all instances in that branch (either 
     579completely or partially) should have this meta attribute. If an instance 
     580hasn't been split, it has only one additional attribute - with weight 
     581ID corresponding to the subset to which it went. An instance that is split 
     582between, say, three subsets, has three new meta attributes, one for each 
     583subset. ID's of weight meta attributes returned by the :obj:`Splitter` 
     584are used for the induction of the corresponding subtrees. 
     585 
     586The weights are used only when needed. When no splitting occurred - 
     587because the splitter was unable to do it or because there was no need 
    596588for splitting - no weight ID's are returned. 
    597589 
     
    605597 
    606598    .. method:: __call__(node, instances[, weightID]) 
     599 
     600        :param node: a node. 
     601        :type node: :obj:`Node` 
     602        :param instances: a set of instances 
     603        :param weightID: weight ID.  
    607604         
    608605        Use the information in :obj:`node` (particularly the 
     
    613610        occurs and thus no weights are needed. 
    614611 
     612        Return a list of subsets of instances and, optionally, a list 
     613        of new weight ID's. 
    615614 
    616615.. class:: Splitter_IgnoreUnknowns 
     
    618617    Bases: :class:`Splitter` 
    619618 
    620     Simply ignores the instances for which no single branch can be 
    621     determined. 
     619    Ignores the instances for which no single branch can be determined. 
    622620 
    623621.. class:: Splitter_UnknownsToCommon 
     
    625623    Bases: :class:`Splitter` 
    626624 
    627     Places all such instances to a branch with the highest number of 
     625    Places all ambiguous instances into the branch with the highest number of 
    628626    instances. If there is more than one such branch, one is selected at 
    629627    random and then used for all instances. 
     
    633631    Bases: :class:`Splitter` 
    634632 
    635     Places instances with unknown value of the attribute into all branches. 
     633    Places instances with an unknown value of the feature into all branches. 
    636634 
    637635.. class:: Splitter_UnknownsToRandom 
     
    639637    Bases: :class:`Splitter` 
    640638 
    641     Selects a random branch for such instances. 
     639    Selects a random branch for ambiguous instances. 
    642640 
    643641.. class:: Splitter_UnknownsToBranch 
     
    645643    Bases: :class:`Splitter` 
    646644 
    647     Constructs an additional branch to contain all such instances.  
     645    Constructs an additional branch to contain all ambiguous instances.  
    648646    The branch's description is "unknown". 
    649647 
     
    652650    Bases: :class:`Splitter` 
    653651 
    654     Splits instances with unknown value of the attribute according to 
     652    Splits instances with unknown value of the feature according to 
    655653    proportions of instances in each branch. 
    656654 
     
    659657    Bases: :class:`Splitter` 
    660658 
    661     Splits instances with unknown value of the attribute according to 
    662     distribution proposed by selector (which is in most cases the same 
    663     as proportions of instances in branches). 
     659    Splits instances with unknown value of the feature according to 
     660    distribution proposed by selector (usually the same as proportions 
     661    of instances in branches). 
    664662 
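The weighted splitting described for the splitters above can be sketched in plain Python. This is only an illustration and does not use the Orange API: the function name ``split_by_branch_sizes``, the ``selector`` callable, and the ``(instance, weight)`` pairs are assumptions made for the example, whereas Orange stores the weights as meta attributes identified by weight ID's.

```python
def split_by_branch_sizes(instances, selector, n_branches):
    """Sort instances into branches; spread unknowns by branch proportions.

    `selector(instance)` is a hypothetical stand-in for a branch selector:
    it returns a branch index, or None when the value is unknown.
    Each returned subset holds (instance, weight) pairs.
    """
    subsets = [[] for _ in range(n_branches)]
    unknown = []
    for inst in instances:
        branch = selector(inst)
        if branch is None:
            unknown.append(inst)
        else:
            subsets[branch].append((inst, 1.0))

    known_total = sum(len(s) for s in subsets)
    if known_total == 0:
        return subsets  # nothing to take proportions from; unknowns are skipped

    # Proportions are computed once, from instances with known values only.
    proportions = [len(s) / known_total for s in subsets]
    for inst in unknown:
        for branch, p in enumerate(proportions):
            if p > 0:
                # The same instance object is referenced from several subsets.
                subsets[branch].append((inst, p))
    return subsets


data = [{"x": 0}, {"x": 0}, {"x": 0}, {"x": 1}, {"x": None}]
subsets = split_by_branch_sizes(data, lambda inst: inst["x"], 2)
```

In this sketch the instance with the unknown value ends up in both subsets, with its weight divided according to the number of known instances in each branch. Instances that were not split keep a weight of 1.
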
    665663Descenders 
    666664============================= 
    667665 
    668 This is a classifier's counterpart for :class:`Splitter`. It 
    669 decides the destiny of instances that need to be classified and cannot 
    670 be unambiguously put in a branch. 
    671  
     666 
     667Descenders decide where the instances that cannot be 
     668unambiguously put in a branch should be sorted (the branches are selected 
     669with a :obj:`SplitConstructor`, while a :obj:`Splitter` sorts instances 
     670during learning). 
    672671 
    673672.. class:: Descender 
    674673 
    675     An abstract base object for tree descenders. 
    676  
    677     It descends a given instance as far deep as possible, 
    678     according to the values of instance's attributes. The :obj:`Descender`: 
    679     calls the node's :obj:`Node.branch_selector` to get the branch index. If 
    680     it's a simple index, the corresponding branch is followed. If not, 
    681     it's up to descender to decide what to do, and that's where descenders 
    682     differ. A descender can choose a single branch (for instance, 
    683     the one that is the most recommended by the :obj:`Node.branch_selector`) 
    684     or it can let the branches vote. 
     674    An abstract base object for tree descenders. It descends a 
     675    given instance as deep as possible, according to the values 
     676    of the instance's features. The :obj:`Descender` calls the node's 
     677    :obj:`~Node.branch_selector` to get the branch index. If it's a 
     678    simple index, the corresponding branch is followed. If not, it's up 
     679    to the descender to decide what to do. A descender can choose a single 
     680    branch (for instance, the one that is the most recommended by the 
     681    :obj:`~Node.branch_selector`) or it can let the branches vote. 
    685682 
    686683    There are three possible outcomes of a descent: 
    687684 
    688     #. Descender reaches a leaf. This happens when 
    689        there were no unknown or out-of-range values, or when the descender 
    690        selected a single branch and continued the descend despite them. In 
    691        this case, the descender returns the reached :obj:`Node`. 
     685    #. The descender reaches a leaf. This happens when 
     686       there were no unknown or out-of-range values, or when the 
     687       descender selected a single branch and continued the descent 
     688       despite them. The descender returns the reached :obj:`Node`. 
    692689    #. Node's :obj:`~Node.branch_selector` returned a distribution and the 
    693690       :obj:`Descender` decided to stop the descent at this (internal) 
    694        node.  The descender returns the current :obj:`Node`. 
     691       node. It returns the current :obj:`Node`. 
    695692    #. Node's :obj:`~Node.branch_selector` returned a distribution and the 
    696693       :obj:`Node` wants to split the instance (i.e., to decide the class 
     
    708705        weights of votes for subtrees (a list of floats). 
    709706 
    710         :obj:`Descender`'s that never split instances always descend to a 
    711         leaf, but they differ in the treatment of instances with unknown 
     707        Descenders that never split instances always descend to a 
     708        leaf. They differ in the treatment of instances with unknown 
    712709        values (or, in general, instances for which a branch cannot be 
    713         determined at some node(s) the tree).  :obj:`Descender`'s that 
     710        determined at some node of the tree). Descenders that 
    714711        do split instances differ in returned vote weights. 
    715712 
     
    719716 
    720717    When instance cannot be classified into a single branch, the current 
    721     node is returned. Thus, the node's :obj:`NodeClassifier` will be used 
    722     to make a decision. It is your responsibility to see that even the 
    723     internal nodes have their :obj:`NodeClassifier` (i.e., don't disable 
    724     creating node classifier or manually remove them after the induction, 
    725     that's all) 
     718    node is returned. Thus, the node's :obj:`~Node.node_classifier` 
     719    will be used to make a decision. In such a case the internal nodes 
     720    need to have their :obj:`Node.node_classifier` (i.e., don't disable 
     721    creating node classifiers or manually remove them after the induction). 
    726722 
    727723.. class:: Descender_UnknownsToBranch 
     
    752748    Bases: :obj:`Descender` 
    753749 
    754     Makes the subtrees vote for the instance's class; the vote is weighted 
     750    The subtrees vote for the instance's class; the vote is weighted 
    755751    according to the sizes of the branches. 
    756752 
     
    759755    Bases: :obj:`Descender` 
    760756 
    761     Makes the subtrees vote for the instance's class; the vote is weighted 
     757    The subtrees vote for the instance's class; the vote is weighted 
    762758    according to the selector's proposal. 
    763759 
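The voting performed by the merging descenders can be illustrated with a small, self-contained sketch. It is not the Orange implementation: ``vote`` and ``branch_size_weights`` are hypothetical helpers, and class distributions are represented as plain dictionaries for the purpose of the example.

```python
def vote(subtree_distributions, weights):
    """Combine class distributions predicted by subtrees.

    Both arguments are hypothetical stand-ins: `subtree_distributions` is a
    list of {class_label: probability} dicts, one per branch, and `weights`
    is the list of vote weights a descender would return for those branches.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one branch must have a non-zero weight")
    combined = {}
    for dist, w in zip(subtree_distributions, weights):
        for label, p in dist.items():
            # Each subtree contributes in proportion to its normalized weight.
            combined[label] = combined.get(label, 0.0) + p * w / total
    return combined


def branch_size_weights(branch_sizes):
    """Weights proportional to the number of learning instances per branch."""
    return [float(n) for n in branch_sizes]


distributions = [{"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}]
merged = vote(distributions, branch_size_weights([3, 1]))
```

Weighting by a selector's proposal would differ only in where the weights come from: the distribution returned by the branch selector would be passed in place of the branch sizes.
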
     
    768764    pair: classification trees; pruning 
    769765 
    770 Tree pruners derived from :obj:`Pruner` can be given either a :obj:`Node` 
    771 (presumably, but not necessarily a root) or a :obj:`_TreeClassifier`. The 
    772 result is a new :obj:`Node` or a :obj:`_TreeClassifier` with a pruned 
    773 tree. The original tree remains intact. 
    774  
    775 The pruners construct only a shallow copy of a tree.  The pruned tree's 
    776 :obj:`Node` contain references to the same contingency matrices, node 
    777 classifiers, branch selectors, ...  as the original tree. Thus, you may 
    778 modify a pruned tree structure (manually cut it, add new nodes, replace 
    779 components) but modifying, for instance, some node's :obj:`~Node.node_classifier` 
    780 (a :obj:`~Node.node_classifier` itself, not a reference to it!) would modify 
    781 the node's :obj:`~Node.node_classifier` in the corresponding node of the 
    782 original tree. 
     766The pruners construct a shallow copy of a tree.  The pruned tree's 
     767:obj:`Node` contain references to the same contingency matrices, 
     768node classifiers, branch selectors, ...  as the original tree. Thus, 
     769you may modify a pruned tree structure (manually cut it, add new 
     770nodes, replace components) but modifying, for instance, some node's 
     771:obj:`~Node.node_classifier` (a :obj:`~Node.node_classifier` itself, not 
     772a reference to it!) would modify the node's :obj:`~Node.node_classifier` 
     773in the corresponding node of the original tree. 
    783774 
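The consequence of shallow copying can be shown with a minimal sketch. The ``Node`` class and ``shallow_prune`` function below are hypothetical stand-ins written for this example, not Orange's classes, and the "pruning" simply cuts the tree at a given depth.

```python
import copy


class Node:
    """Hypothetical stand-in for a tree node, not Orange's Node class."""

    def __init__(self, node_classifier, branches=None):
        self.node_classifier = node_classifier
        self.branches = branches or []


def shallow_prune(node, keep_depth):
    """Copy the tree structure down to keep_depth; share all components."""
    pruned = copy.copy(node)  # new Node object, same component references
    if keep_depth == 0:
        pruned.branches = []
    else:
        pruned.branches = [shallow_prune(b, keep_depth - 1)
                           for b in node.branches]
    return pruned


leaf = Node({"majority": "a"})
root = Node({"majority": "b"}, [leaf])
pruned = shallow_prune(root, 0)

# Structural change: only the pruned tree gets the new branch.
pruned.branches.append(Node({"majority": "c"}))
# Component change: the classifier object is shared with the original tree.
pruned.node_classifier["majority"] = "z"
```

After these two modifications the original ``root`` still has a single branch, but its ``node_classifier`` reflects the change made through the pruned copy.
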
    784775Pruners cannot construct a :obj:`~Node.node_classifier` nor merge 
     
    790781.. class:: Pruner 
    791782 
    792     This is an abstract base class which defines nothing useful, only 
    793     a pure virtual call operator. 
     783    An abstract base class for a tree pruner which defines nothing useful,  
     784    only a pure virtual call operator. 
    794785 
    795786    .. method:: __call__(tree) 
     787 
     788        :param tree: either 
     789            a :obj:`Node` (presumably, but not necessarily a root) or a 
     790            :obj:`_TreeClassifier` (the C++ version of the classifier, 
     791            saved in :obj:`TreeClassifier.base_classifier`). 
    796792 
    797793        Prunes a tree. The argument can be either a tree classifier or 
    798794        a tree node; the result is of the same type as the argument. 
     795        The original tree remains intact. 
    799796 
    800797.. class:: Pruner_SameMajority 
     
    804801    In Orange, a tree can have non-trivial subtrees (i.e. subtrees with 
    805802    more than one leaf) in which all the leaves have the same majority 
    806     class. (This is allowed because those leaves can still have different 
    807     distributions of classes and thus predict different probabilities.) 
     803    class. This is allowed because those leaves can still have different 
     804    distributions of classes and thus predict different probabilities. 
    808805    However, this can be undesired when we're only interested 
    809806    in the class prediction or a simple tree interpretation. The 
     
    811808    subtree in which all the nodes would have the same majority class. 
    812809 
    813     This pruner will only prune the nodes in which the node classifier  
    814     is of class :obj:`Orange.classification.ConstantClassifier`  
    815     (or from a derived class). 
    816  
    817     Note that the leaves with more than one majority class require some  
    818     special handling. The pruning goes backwards, from leaves to the root. 
     810    This pruner will only prune the nodes in which the node classifier 
     811    is a :obj:`~Orange.classification.majority.ConstantClassifier` 
     812    (or a derived class). 
     813 
     814    The leaves with more than one majority class require some special 
     815    handling. The pruning goes backwards, from leaves to the root. 
    819816    When siblings are compared, the algorithm checks whether they have 
    820817    (at least one) common majority class. If so, they can be pruned. 
     
    935932    the end of this section. 
    936933 
    937  
    938 Examples 
    939 -------- 
     934.. rubric:: Examples 
    940935 
    941936We shall build a small tree from the iris data set - we shall limit the 
     
    11841179:samp:`%A` the average? Doesn't a regression tree always predict the 
    11851180leaf average anyway? Not necessarily, the tree predicts whatever the 
    1186 :attr:`TreeClassifier.node_classifier` in a leaf returns.  
     1181:obj:`~Node.node_classifier` in a leaf returns.  
    11871182As :samp:`%V` uses the :obj:`Orange.data.variable.Continuous`' function 
    11881183for printing out the value, therefore the printed number has the same 