Changeset 9023:e19d6e15d5c7 in orange

09/26/11 15:32:22 (3 years ago)

Orange.classification.Tree: edited the Examples section.

1 edited


  • orange/Orange/classification/

    r9016 r9023
  24   24   This page first describes the learner and the classifier, and then
  25       -defines the base classes (individual components) of the trees and the
       25  +the individual components of the trees and the
  26   26   tree-building process.
  34   34   .. class:: Node
  36       -    Classification trees are represented as a tree-like hierarchy of
       36  +    Classification trees are represented as a hierarchy of
  37   37       :obj:`Node` classes.
  66   66           is True.
  68       -    If the node is an internal node, there are several additional
  69       -    fields. The lists :obj:`branches`, :obj:`branch_descriptions` and
       68  +    Internal nodes have additional
       69  +    attributes. The lists :obj:`branches`, :obj:`branch_descriptions` and
  70   70       :obj:`branch_sizes` are of the same length.
  72   72       .. attribute:: branches
  74       -        A list of subtrees, given as :obj:`Node`.  If an element
  75       -        is None, the node is empty.
       74  +        A list of subtrees. Each element is a :obj:`Node` or None.
       75  +        If None, the node is empty.
  77   77       .. attribute:: branch_descriptions
  79       -        A list with string descriptions for branches, constructed by
       79  +        A list with string describing branches, which are constructed by
  80   80           :obj:`SplitConstructor`. It can contain anything,
  81   81           for example 'red' or '>12.3'.
  83   83       .. attribute:: branch_sizes
  85       -        A (weighted) number of training instances that went into
       85  +        A (weighted) number of training instances for
  86   86           each branch. It can be used, for instance, for modeling
  87   87           probabilities when classifying instances with unknown values.
  91   91           A :obj:`~Orange.classification.Classifier` that returns a branch
  92       -        for each instance: it gets an instance and returns discrete
       92  +        for each instance: it returns discrete
  93   93           :obj:`` in ``[0, len(branches)-1]``.  When an
  94       -        instance cannot be classified to any branch, the selector can
       94  +        instance cannot be classified unambiguously, the selector can
  95   95           return a discrete distribution, which proposes how to divide
  96   96           the instance between the branches. Whether the proposition will
  97       -        be used depends upon the chosen :obj:`Splitter` (when learning)
  98       -        or :obj:`Descender` (when classifying).
       97  +        be used depends upon the :obj:`Splitter` (for learning)
       98  +        or :obj:`Descender` (for classification).
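The selector contract described in this hunk can be illustrated outside Orange. The sketch below is a plain-Python stand-in, not Orange's API: the `branch_selector` function, the `tear_rate` feature, and the 50/50 split for unknown values are all illustrative assumptions.

```python
# Sketch of the branch_selector contract: given an instance, return a
# branch index, or a distribution over branches when the instance cannot
# be sent down a single branch. Illustrative only, not Orange's Classifier.
def branch_selector(instance, branches):
    value = instance.get("tear_rate")  # hypothetical split feature
    if value == "reduced":
        return 0                       # index in [0, len(branches)-1]
    if value == "normal":
        return 1
    # unknown value: propose dividing the instance between the branches
    return [1.0 / len(branches)] * len(branches)

print(branch_selector({"tear_rate": "reduced"}, ["a", "b"]))  # 0
print(branch_selector({}, ["a", "b"]))                        # [0.5, 0.5]
```

Whether such a returned distribution is honored is, as the documentation says, up to the splitter (during learning) or descender (during classification).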
 100  100       .. method:: tree_size()
 112       -This example explores the tree tructure of a tree build on the
      112  +This example works with the
 113  113   lenses data set:
 135  135   method :func:`~Node.tree_size`.
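The node structure and the :func:`~Node.tree_size` count can be mimicked with a minimal stand-in class; the `Node` attributes below mirror the documented interface, but this is a sketch, not Orange's implementation.

```python
class Node:
    """Minimal stand-in for Orange's tree Node (sketch, not the real class)."""
    def __init__(self, branch_selector=None, branches=None,
                 branch_descriptions=None, branch_sizes=None):
        self.branch_selector = branch_selector   # non-None for internal nodes
        self.branches = branches or []           # list of Node or None
        self.branch_descriptions = branch_descriptions or []
        self.branch_sizes = branch_sizes or []

    def tree_size(self):
        """Count this node plus all nodes in non-empty subtrees."""
        return 1 + sum(b.tree_size() for b in self.branches if b is not None)

# A toy tree: a root with one leaf branch and one empty (None) branch.
leaf = Node()
root = Node(branch_selector="toy", branches=[leaf, None],
            branch_descriptions=["yes", "no"], branch_sizes=[3, 0])
print(root.tree_size())  # 2: the root and the single non-empty leaf
```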
 137       -Let us now write a script that prints out a tree. The recursive part of
 138       -the function will get a node and its level.
      137  +Trees can be printed with a recursive function:
 140  139   .. literalinclude:: code/
 141  140      :lines: 26-41
 143       -Don't waste time on studying formatting tricks (\n's etc.), this is just
 144       -for nicer output. What matters is everything but the print statements.
      142  +The crux of the example is not in the formatting (\\n's etc.);
      143  +what matters is everything but the print statements.
 145  144   As first, we check whether the node is a null-node (a node to which no
 146  145   learning instances were classified). If this is so, we just print out
 149  148   After handling null nodes, remaining nodes are internal nodes and
 150       -leaves.  For internal nodes, we print a node description consisting
 151       -of the feature's name and distribution of classes. :obj:`Node`'s
 152       -branch description is, for all currently defined splits, an instance
 153       -of a class derived from :obj:`Orange.classification.Classifier`
 154       -(in fact, it is
 155       -a :obj:`orange.ClassifierFromVarFD`, but a :obj:`Orange.classification.Classifier`
 156       -would suffice), and its :obj:`class_var` points to the attribute we seek.
 157       -So we print its name. We will also assume that storing class distributions
 158       -has not been disabled and print them as well.  Then we iterate through
 159       -branches; for each we print a branch description and iteratively call the
 160       -:obj:`printTree0` with a level increased by 1 (to increase the indent).
 162       -Finally, if the node is a leaf, we print out the distribution of learning
      149  +leaves. For internal nodes, we print a node description consisting of
      150  +the feature's name and distribution of classes. :obj:`Node`'s branch
      151  +description is an instance of :obj:`~Orange.classification.Classifier`,
      152  +and its ``class_var`` is the feature whose name is printed.
      153  +Class distributions are printed as well (they are assumed to be strored).
      154  +Then we branch description for each branch and recursively call
      155  +:obj:`printTree0` with a level increased by 1 to increase the indent.
      157  +Finally, if the node is a leaf, we print the distribution of learning
 163  158   instances in the node and the class to which the instances in the node
 164       -would be classified. We again assume that the :obj:`~Node.node_classifier` is
 165       -the default one - a :obj:`DefaultClassifier`. A better print function
      159  +would be classified. We assume that the :obj:`~Node.node_classifier` is
      160  +a :obj:`DefaultClassifier`. A better print function
 166  161   should be aware of possible alternatives.
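The recursive printing logic described in this hunk can be sketched on a minimal stand-in node. The attribute names below mirror the documentation, but the class and the exact formatting are illustrative, not Orange's `printTree0`.

```python
# Sketch of the recursive tree printer: null nodes, internal nodes with
# branch descriptions, and leaves with a majority class. Illustrative only.
class Node:
    def __init__(self, feature=None, branches=(), descriptions=(),
                 distribution=(), majority=None):
        self.feature = feature              # split feature name (internal nodes)
        self.branches = list(branches)      # Node or None
        self.branch_descriptions = list(descriptions)
        self.distribution = list(distribution)
        self.majority = majority            # predicted class (leaves)

def print_tree0(node, level):
    indent = "   " * level
    if node is None:                        # null node: nothing was classified here
        print(indent + "<null node>")
        return
    if node.feature is not None:            # internal node: feature + distribution
        print(indent + "%s %s" % (node.feature, node.distribution))
        for desc, branch in zip(node.branch_descriptions, node.branches):
            print(indent + ": %s" % desc)   # branch description
            print_tree0(branch, level + 1)  # level + 1 increases the indent
    else:                                   # leaf: distribution + predicted class
        print(indent + "--> %s %s" % (node.majority, node.distribution))

leaf_a = Node(distribution=[12, 0], majority="none")
leaf_b = Node(distribution=[3, 5], majority="soft")
root = Node(feature="tear_rate", branches=[leaf_a, leaf_b],
            descriptions=["reduced", "normal"], distribution=[15, 5])
print_tree0(root, 0)
```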
 168       -Now, we just need to write a simple function to call our printTree0.
 169       -We could write something like...
 171       -::
 173       -    def printTree(x):
 174       -        printTree0(x.tree, 0)
 176       -... but we won't. Let us learn how to handle arguments of
 177       -different types. Let's write a function that will accept either a
 178       -:obj:`TreeClassifier` or a :obj:`Node`.
      163  +If the print-out function needs to accept either a
      164  +:obj:`TreeClassifier` or a :obj:`Node`, it can be written as follows:
 180  166   .. literalinclude:: code/
 181  167      :lines: 43-49
 183       -It's fairly straightforward: if :obj:`x` is of type derived from
 184       -:obj:`TreeClassifier`, we print :obj:`x.tree`; if it's :obj:`Node` we
 185       -just call :obj:`printTree0` with :obj:`x`. If it's of some other type,
 186       -we don't know how to handle it and thus raise an exception. The output::
 188       -    >>> printTree(treeClassifier)
      169  +It's fairly straightforward: if ``x`` is a
      170  +:obj:`TreeClassifier`, we print ``x.tree``; if it's :obj:`Node` we
      171  +just call ``printTree0`` with `x`. If it's of some other type,
      172  +we raise an exception. The output::
      174  +    >>> print_tree(tree_classifier)
 189  175       tear_rate (<15.000, 5.000, 4.000>)
 190  176       : reduced --> none (<12.000, 0.000, 0.000>)
 201  187             : hypermetrope --> none (<2.000, 0.000, 1.000>)
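The type-dispatching wrapper this hunk describes can be sketched with minimal stand-in classes; `TreeClassifier`, `Node`, and the placeholder `print_tree0` below are illustrative, not Orange's types.

```python
# Sketch of the dispatch: accept either a TreeClassifier or a Node,
# raise for anything else. Stand-in classes, not Orange's.
class Node:
    pass

class TreeClassifier:
    def __init__(self, tree):
        self.tree = tree          # root Node

def print_tree0(node, level):
    print("   " * level + "node")  # placeholder for the real recursive printer

def print_tree(x):
    if isinstance(x, TreeClassifier):
        print_tree0(x.tree, 0)    # unwrap the classifier's root node
    elif isinstance(x, Node):
        print_tree0(x, 0)
    else:
        raise TypeError("invalid parameter: expected TreeClassifier or Node")

print_tree(TreeClassifier(Node()))  # accepted
print_tree(Node())                  # accepted
```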
 203       -For a final exercise, let us write a simple pruning function. It will
 204       -be written entirely in Python, unrelated to any :obj:`Pruner`. It will
 205       -limit the maximal tree depth (the number of internal nodes on any path
      189  +We conclude the tree structure examples with a simple pruning
      190  +function, written entirely in Python and unrelated to any :obj:`Pruner`. It
      191  +limits the maximal tree depth (the number of internal nodes on any path
 206  192   down the tree) given as an argument.  For example, to get a two-level
 207       -tree, we would call cutTree(root, 2). The function will be recursive,
      193  +tree, call cut_tree(root, 2). The function ise recursive,
 208  194   with the second argument (level) decreasing at each call; when zero,
 209  195   the current node will be made a leaf:
 212  198      :lines: 54-62
 214       -There's nothing to prune at null-nodes or leaves, so we act only when
 215       -:obj:`node` and :obj:`node.branch_selector` are defined. If level is
 216       -not zero, we call the function for each branch. Otherwise, we clear the
      200  +The function acts only when
      201  +:obj:`node` and :obj:`node.branch_selector` are defined. If the level is
      202  +not zero, is recursively calls  the function for each branch. Otherwise, it clears the
 217  203   selector, branches and branch descriptions.
 226  212          : yes --> hard (<2.000, 0.000, 4.000>)
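The depth-limiting pruning this hunk describes can be sketched on a minimal stand-in node. The `Node` class and `cut_tree` below mirror the documented behavior (clear the selector, branches, and descriptions when the level reaches zero) but are illustrative, not the code in the referenced literalinclude.

```python
# Sketch of cut_tree: limit tree depth by turning internal nodes at the
# given level into leaves. Stand-in structure, not Orange's Node.
class Node:
    def __init__(self, branch_selector=None, branches=None,
                 branch_descriptions=None):
        self.branch_selector = branch_selector  # non-None for internal nodes
        self.branches = branches or []
        self.branch_descriptions = branch_descriptions or []

def cut_tree(node, level):
    # nothing to prune at null nodes or leaves
    if node is not None and node.branch_selector is not None:
        if level:
            for branch in node.branches:
                cut_tree(branch, level - 1)   # one internal level consumed
        else:
            # make this internal node a leaf
            node.branch_selector = None
            node.branches = []
            node.branch_descriptions = []

leaf = Node()
mid = Node("astigmatic", [leaf], ["yes"])
root = Node("tear_rate", [mid], ["normal"])
cut_tree(root, 1)            # keep one level of internal nodes
print(mid.branch_selector)   # None: 'mid' was made a leaf
print(root.branch_selector)  # 'tear_rate': the root survives
```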
 228       -Learning
 229       -========
 231       -You could just call :class:`TreeLearner` and let it fill the empty slots
 232       -with the default components. This section will teach you three things:
 233       -what are the missing components (and how to set the same components
 234       -yourself), how to use alternative components to get a different tree and,
 235       -finally, how to write a skeleton for tree induction in Python.
 237       -.. code/
      214  +Setting learning parameters
 239  217   Let us construct a :obj:`TreeLearner` to play with:
 244  222   There are three crucial components in learning: the
 245  223   :obj:`~TreeLearner.split` and :obj:`~TreeLearner.stop` criteria, and the
 246       -example :obj:`~TreeLearner.splitter` (there are some others, which become
 247       -important during classification; we'll talk about them later). They are
 248       -not defined; if you use the learner, the slots are filled temporarily
 249       -but later cleared again.
 251       -::
 253       -    >>> print learner.split
 254       -    None
 255       -    >>> learner(data)
 256       -    <TreeClassifier instance at 0x01F08760>
 257       -    >>> print learner.split
 258       -    None
 260       -Stopping criteria
 261       -=================
 263       -The stop is trivial. The default is set by
      224  +example :obj:`~TreeLearner.splitter`. The default ``stop`` is set with
 265  226       >>> learner.stop = Orange.classification.tree.StopCriteria_common()
 267       -We can now examine the default stopping parameters.
      228  +and the default stopping parameters are
 269  230       >>> print learner.stop.max_majority, learner.stop.min_examples
 270  231       1.0 0.0
 272       -Not very restrictive. This keeps splitting the instances until there's
 273       -nothing left to split or all the instances are in the same class. Let us
 274       -set the minimal subset that we allow to be split to five instances and
 275       -see what comes out.
      233  +The defaults keep splitting until there is
      234  +nothing left to split or all the instances are in the same class.
      235  +If the minimal subset that is allowed to be split further
      236  +is set to five instances, the resulting tree is smaller.
 277  238       >>> learner.stop.min_instances = 5.0
 288  249       |    |    prescription=myope: hard (100.00%)
 290       -OK, that's better. If we want an even smaller tree, we can also limit
 291       -the maximal proportion of majority class.
      251  +We can also limit the maximal proportion of majority class.
 293  253       >>> learner.stop.max_majority = 0.5
 295  255       >>> print tree.dump()
 296  256       none (62.50%)
 299  258   Redefining tree induction components
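The semantics of the two stopping parameters discussed above can be sketched as a plain predicate over class counts. The `should_stop` function below mirrors the documented meaning of `min_instances` and `max_majority` but is an illustration, not Orange's `StopCriteria_common` implementation.

```python
# Plain-Python sketch of the stopping logic: stop splitting when the
# subset is too small (min_instances) or pure enough (max_majority).
def should_stop(class_counts, min_instances=0.0, max_majority=1.0):
    n = sum(class_counts)
    if n == 0 or n < min_instances:
        return True                          # too few instances to split further
    if max(class_counts) / n >= max_majority:
        return True                          # majority class dominant enough
    return False

print(should_stop([10, 2]))                   # False: defaults barely restrict
print(should_stop([12, 0]))                   # True: node is pure
print(should_stop([2, 1], min_instances=5.0)) # True: fewer than 5 instances
print(should_stop([4, 4], max_majority=0.5))  # True: majority reaches 50%
```

With the defaults (`max_majority=1.0`, `min_instances=0.0`), splitting continues until nodes are pure or nothing is left to split, matching the behavior shown in the transcript above.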