Changeset 9016:d84078593f5a in orange


Timestamp:
09/26/11 11:35:15
Author:
markotoplak
Branch:
default
Convert:
a1bacbce942da7117ad6154d436b36b6030b430a
Message:

Updates to Orange.classification.tree.

File:
1 edited

  • orange/Orange/classification/tree.py

    r8999 r9016  
    6060    .. attribute:: node_classifier 
    6161 
    62         A classifier (usually a :obj:`DefaultClassifier`) that can be used 
    63         to classify instances coming to the node. If the node is a leaf, 
    64         this is used to decide the final class (or class distribution) 
    65         of an instance. If it's an internal node, it is stored if 
    66         :obj:`Node`'s flag :obj:`store_node_classifier` is set. Since 
    67         the :obj:`node_classifier` is needed by :obj:`Descender` and 
    68         for pruning (see far below), this is the default behaviour; 
    69         space consumption of the default :obj:`DefaultClassifier` is 
    70         rather small. You should never disable this if you intend to 
    71         prune the tree later. 
    72  
    73     If the node is a leaf, the remaining fields are None. If it's 
    74     an internal node, there are several additional fields. The lists 
    75     :obj:`branches`, :obj:`branch_descriptions` and :obj:`branch_sizes` 
    76     are of the same length. 
     62        A classifier (usually a :obj:`DefaultClassifier`) for instances 
     63        coming to the node. If the node is a leaf, it chooses the class 
     64        (or class distribution) of an instance. If it's an internal node, 
     65        it is only stored if :obj:`TreeLearner.store_node_classifier` 
     66        is True. 
     67 
     68    If the node is an internal node, there are several additional 
     69    fields. The lists :obj:`branches`, :obj:`branch_descriptions` and 
     70    :obj:`branch_sizes` are of the same length. 
    7771 
    7872    .. attribute:: branches 
    7973 
    80         Stores a list of subtrees, given as :obj:`Node`.  An element 
    81         can be None; in this case the node is empty. 
     74        A list of subtrees, given as :obj:`Node`.  If an element 
     75        is None, the node is empty. 
    8276 
    8377    .. attribute:: branch_descriptions 
    8478 
    8579        A list with string descriptions for branches, constructed by 
    86         :obj:`SplitConstructor`. It can contain different kinds of 
    87         descriptions, but basically, expect things like 'red' or '>12.3'. 
     80        :obj:`SplitConstructor`. It can contain anything, 
     81        for example 'red' or '>12.3'. 
    8882 
    8983    .. attribute:: branch_sizes 
    9084 
    91         Gives a (weighted) number of training instances that went into 
    92         each branch. This can be used later, for instance, for modeling 
     85        A (weighted) number of training instances that went into 
     86        each branch. It can be used, for instance, for modeling 
    9387        probabilities when classifying instances with unknown values. 
    9488 
    9589    .. attribute:: branch_selector 
    9690 
    97         Gives a branch for each instance. The same object is used 
    98         during learning and classifying. The :obj:`branch_selector` 
    99         is of type :obj:`Orange.classification.Classifier`, since its job is 
    100         similar to that of a classifier: it gets an instance and 
    101         returns discrete :obj:`Orange.data.Value` in range :samp:`[0, 
    102         len(branches)-1]`.  When an instance cannot be classified to 
    103         any branch, the selector can return a :obj:`Orange.data.Value` 
    104         containing a special value (sVal) which should be a discrete 
    105         distribution (DiscDistribution). This should represent a 
    106         :obj:`branch_selector`'s opinion of how to divide the instance 
    107         between the branches. Whether the proposition will be used or not 
    108         depends upon the chosen :obj:`Splitter` (when learning) 
     91        A :obj:`~Orange.classification.Classifier` that returns a branch 
     92        for each instance: it gets an instance and returns discrete 
     93        :obj:`Orange.data.Value` in ``[0, len(branches)-1]``.  When an 
     94        instance cannot be classified to any branch, the selector can 
     95        return a discrete distribution, which proposes how to divide 
     96        the instance between the branches. Whether the proposition will 
     97        be used depends upon the chosen :obj:`Splitter` (when learning) 
    10998        or :obj:`Descender` (when classifying). 
    11099 
     
    113102        Return the number of nodes in the subtrees (including the node, 
    114103        excluding null-nodes). 
    115  
    116  
    117104 
    118105======== 
     
    120107======== 
    121108 
    122 For example, here's how to write your own stop function. The example 
    123 constructs and prints two trees. For the first one we define the 
    124 *defStop* function, which is used by default, and combine it with a 
    125 random function so that the stop criteria will also be met in 20% of the 
    126 cases when *defStop* is false. For the second tree the stopping criteria 
    127 is random. Note that in the second case lambda function still has three 
    128 parameters, since this is a necessary number of parameters for the stop 
    129 function (:obj:`StopCriteria`).  
    130  
    131 .. _tree3.py: code/tree3.py 
    132  
    133 .. literalinclude:: code/tree3.py 
    134    :lines: 8-23 
    135  
    136 The output is not shown here since the resulting trees are rather 
    137 big. 
    138  
    139109Tree Structure 
    140110============== 
    141111 
    142 To have something to work on, we'll take the data from lenses dataset and 
    143 build a tree using the default components: 
     112This example explores the structure of a tree built on the
     113lenses data set:
    144114 
    145115.. literalinclude:: code/treestructure.py 
    146116   :lines: 7-10 
    147117 
    148 How big is our tree? 
     118The next function counts the number of nodes in a tree: 
    149119 
    150120.. _lenses.tab: code/lenses.tab 
     
    154124   :lines: 12-21 
    155125 
    156 If node is None, we have a null-node; null nodes don't count, so we 
    157 return 0. Otherwise, the size is 1 (this node) plus the sizes of all 
    158 subtrees. The node is an internal node if it has a :obj:`branch_selector`; 
    159 it there's no selector, it's a leaf. Don't attempt to skip the if 
    160 statement: leaves don't have an empty list of branches, they don't have 
    161 a list of branches at all. 
    162  
    163     >>> treeSize(treeClassifier.tree) 
     126If node is None, we return 0. Otherwise, the size is 1 (this node) 
     127plus the sizes of all subtrees. We need to check if the node is an 
     128internal node (it has a :obj:`~Node.branch_selector`), as leaves don't have 
     129the :obj:`~Node.branches` attribute. 
     130 
     131    >>> tree_size(tree_classifier.tree) 
    164132    10 
    165133 
    166 Don't forget that this was only an excercise - :obj:`Node` has a built-in 
    167 method :obj:`Node.treeSize` that does exactly the same. 
     134This was only an exercise: a :obj:`Node` already has a built-in
     135method :func:`~Node.tree_size`.
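
The counting logic described above can be sketched without Orange. The
``Node`` class below is a hypothetical stand-in, not the Orange class: it
only assumes that internal nodes carry a ``branch_selector`` and a
``branches`` list, that leaves have neither, and that a null-node is
represented by None.

```python
class Node(object):
    """Hypothetical stand-in for a tree node (not the Orange class)."""

    def __init__(self, branch_selector=None, branches=None):
        self.branch_selector = branch_selector
        # Only internal nodes get a list of branches; leaves have none.
        if branch_selector is not None:
            self.branches = branches or []


def tree_size(node):
    """Count the nodes in a subtree, skipping null-nodes (None)."""
    if node is None:
        return 0
    size = 1  # this node
    # Check for a selector first: leaves have no ``branches`` to iterate.
    if getattr(node, "branch_selector", None) is not None:
        for branch in node.branches:
            size += tree_size(branch)
    return size


# A root with three branches: a leaf, a null-node and an internal node
# that has two leaves of its own.
inner = Node(branch_selector="s2", branches=[Node(), Node()])
root = Node(branch_selector="s1", branches=[Node(), None, inner])
```

Here ``tree_size(root)`` counts the root, one leaf, the inner node and its
two leaves, and ignores the null-node.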
    168136 
    169137Let us now write a script that prints out a tree. The recursive part of 
     
    328296    none (62.50%) 
    329297 
     298 
     299Redefining tree induction components 
     300==================================== 
     301 
     302This example shows how to use a custom stop function.  First,
     303``def_stop`` is defined as the default stop function. The first tree
     304has some added randomness: the induction will also stop in 20% of the
     305cases when ``def_stop`` returns False. The stopping criterion for the
     306second tree is completely random: it stops induction in 20% of cases.
     307Note that in the second case the lambda function still takes three
     308parameters, since the stop function (:obj:`StopCriteria`) is always
     309called with three arguments.
     310 
     311.. _tree3.py: code/tree3.py 
     312 
     313.. literalinclude:: code/tree3.py 
     314   :lines: 8-23 
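
The way the two stop functions are combined can be sketched in plain
Python, independently of Orange. Everything below is an assumption made
for illustration: ``def_stop`` is a hypothetical stand-in for the default
criterion (it is not :obj:`StopCriteria`), ``make_noisy_stop`` is a
made-up helper, and the three parameter names only mirror the
three-argument shape described above.

```python
import random


def def_stop(instances, weight_id, contingency):
    """Hypothetical stand-in for the default criterion: stop on tiny nodes."""
    return len(instances) < 2


def make_noisy_stop(base_stop, rng, probability=0.2):
    """Return a stop function that fires when ``base_stop`` does, and
    additionally with the given probability when it does not."""
    def stop(instances, weight_id, contingency):
        return (base_stop(instances, weight_id, contingency)
                or rng.random() < probability)
    return stop


rng = random.Random(0)  # seeded so that repeated runs behave the same

# First variant: the default criterion plus 20% random stops.
noisy_stop = make_noisy_stop(def_stop, rng)

# Second variant: purely random. It still takes three parameters,
# even though it ignores all of them.
random_stop = lambda instances, weight_id, contingency: rng.random() < 0.2
```

Either function could then be passed wherever a three-argument stop
callable is expected.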
    330315 
    331316================================= 
     
    20242009    .. attribute:: store_node_classifier 
    20252010 
    2026         Determines whether to store class distributions, contingencies and 
    2027         examples in :class:`Node`, and whether the :obj:`Node.node_classifier` 
    2028         should be build for internal nodes.  No memory will be saved  
    2029         by not storing distributions but storing contingencies, since 
    2030         distributions actually points to the same distribution that is 
    2031         stored in :obj:`contingency.classes`.  By default everything 
    2032         except :obj:`store_instances` is enabled.  
     2011        Determines whether to store class distributions,
     2012        contingencies and examples in :class:`Node`, and whether the
     2013        :obj:`Node.node_classifier` should also be built for internal
     2014        nodes (it is needed by the :obj:`Descender` or for pruning).
     2015        Storing contingencies but not distributions does not
     2016        save any memory, since distributions actually point to the
     2017        same distribution that is stored in :obj:`contingency.classes`.
     2018        By default everything except :obj:`store_instances` is enabled.
    20332019 
    20342020    """ 