Changeset 9023:e19d6e15d5c7 in orange


Timestamp:
09/26/11 15:32:22
Author:
markotoplak
Branch:
default
Convert:
4f1a36633c8a3ecc3f54651228be9bf341b0ec18
Message:

Orange.classification.Tree: edited the Examples section.

File:
1 edited

  • orange/Orange/classification/tree.py

    r9016 r9023  
    2323 
    2424This page first describes the learner and the classifier, and then 
    25 defines the base classes (individual components) of the trees and the 
     25the individual components of the trees and the 
    2626tree-building process. 
    2727 
     
    3434.. class:: Node 
    3535 
    36     Classification trees are represented as a tree-like hierarchy of 
     36    Classification trees are represented as a hierarchy of 
    3737    :obj:`Node` classes. 
    3838 
     
    6666        is True. 
    6767 
    68     If the node is an internal node, there are several additional 
    69     fields. The lists :obj:`branches`, :obj:`branch_descriptions` and 
     68    Internal nodes have additional 
     69    attributes. The lists :obj:`branches`, :obj:`branch_descriptions` and 
    7070    :obj:`branch_sizes` are of the same length. 
    7171 
    7272    .. attribute:: branches 
    7373 
    74         A list of subtrees, given as :obj:`Node`.  If an element 
    75         is None, the node is empty. 
     74        A list of subtrees. Each element is a :obj:`Node` or None. 
     75        If None, the node is empty. 
    7676 
    7777    .. attribute:: branch_descriptions 
    7878 
    79         A list with string descriptions for branches, constructed by 
      79        A list of strings describing branches, constructed by 
    8080        :obj:`SplitConstructor`. It can contain anything, 
    8181        for example 'red' or '>12.3'. 
     
    8383    .. attribute:: branch_sizes 
    8484 
    85         A (weighted) number of training instances that went into 
      85        The (weighted) number of training instances that went into 
    8686        each branch. It can be used, for instance, for modeling 
    8787        probabilities when classifying instances with unknown values. 
     
    9090 
    9191        A :obj:`~Orange.classification.Classifier` that returns a branch 
    92         for each instance: it gets an instance and returns discrete 
      92        for each instance: it returns a discrete 
    9393        :obj:`Orange.data.Value` in ``[0, len(branches)-1]``.  When an 
    94         instance cannot be classified to any branch, the selector can 
     94        instance cannot be classified unambiguously, the selector can 
    9595        return a discrete distribution, which proposes how to divide 
    9696        the instance between the branches. Whether the proposition will 
    97         be used depends upon the chosen :obj:`Splitter` (when learning) 
    98         or :obj:`Descender` (when classifying). 
     97        be used depends upon the :obj:`Splitter` (for learning) 
     98        or :obj:`Descender` (for classification). 
    9999 
    100100    .. method:: tree_size() 
     
    110110============== 
    111111 
    112 This example explores the tree tructure of a tree build on the 
     112This example works with the 
    113113lenses data set: 
    114114 
     
    135135method :func:`~Node.tree_size`. 
    136136 
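As a quick illustration (a sketch, not part of ``treestructure.py``; the
names ``lenses`` and ``tree_classifier`` mirror the script above), the root
node and the tree size could be inspected like this::

    import Orange

    lenses = Orange.data.Table("lenses")
    tree_classifier = Orange.classification.tree.TreeLearner(lenses)
    root = tree_classifier.tree      # the root Node
    print root.tree_size()           # number of nodes in the tree
    print root.branch_descriptions   # e.g. values of the split feature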
    137 Let us now write a script that prints out a tree. The recursive part of 
    138 the function will get a node and its level. 
     137Trees can be printed with a recursive function: 
    139138 
    140139.. literalinclude:: code/treestructure.py 
    141140   :lines: 26-41 
    142141 
    143 Don't waste time on studying formatting tricks (\n's etc.), this is just 
    144 for nicer output. What matters is everything but the print statements. 
     142The crux of the example is not in the formatting (\\n's etc.); 
     143what matters is everything but the print statements. 
    145144First, we check whether the node is a null-node (a node to which no 
    146145learning instances were classified). If so, we just print out 
     
    148147 
    149148After handling null nodes, remaining nodes are internal nodes and 
    150 leaves.  For internal nodes, we print a node description consisting 
    151 of the feature's name and distribution of classes. :obj:`Node`'s 
    152 branch description is, for all currently defined splits, an instance 
    153 of a class derived from :obj:`Orange.classification.Classifier`  
    154 (in fact, it is 
    155 a :obj:`orange.ClassifierFromVarFD`, but a :obj:`Orange.classification.Classifier` 
    156 would suffice), and its :obj:`class_var` points to the attribute we seek. 
    157 So we print its name. We will also assume that storing class distributions 
    158 has not been disabled and print them as well.  Then we iterate through 
    159 branches; for each we print a branch description and iteratively call the 
    160 :obj:`printTree0` with a level increased by 1 (to increase the indent). 
    161  
    162 Finally, if the node is a leaf, we print out the distribution of learning 
     149leaves. For internal nodes, we print a node description consisting of 
     150the feature's name and the distribution of classes. :obj:`Node`'s branch 
     151selector is an instance of :obj:`~Orange.classification.Classifier`, 
     152and its ``class_var`` is the feature whose name is printed. 
     153Class distributions are printed as well (they are assumed to be stored). 
     154Then we print the description of each branch and recursively call 
     155:obj:`printTree0` with the level increased by 1 to increase the indent. 
     156 
     157Finally, if the node is a leaf, we print the distribution of learning 
    163158instances in the node and the class to which the instances in the node 
    164 would be classified. We again assume that the :obj:`~Node.node_classifier` is 
    165 the default one - a :obj:`DefaultClassifier`. A better print function 
     159would be classified. We assume that the :obj:`~Node.node_classifier` is 
     160a :obj:`DefaultClassifier`. A better print function 
    166161should be aware of possible alternatives. 
    167162 
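For reference, here is a minimal sketch of such a recursive printer. It is
not the script itself; it assumes the :obj:`Node` attributes described
above, plus a ``distribution`` attribute holding the class distribution,
and its formatting differs from the output of ``code/treestructure.py``::

    def print_tree0(node, level):
        if not node:
            # null node: no learning instances were classified to it
            print " " * level + "<null node>"
            return
        if node.branch_selector:
            # internal node: split feature name and class distribution
            print " " * level + "%s (%s)" % (
                node.branch_selector.class_var.name, node.distribution)
            for branch, description in zip(node.branches,
                                           node.branch_descriptions):
                print " " * level + ": %s" % description
                print_tree0(branch, level + 1)
        else:
            # leaf: predicted class and class distribution
            print " " * level + "--> %s (%s)" % (
                node.node_classifier.default_value, node.distribution)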
    168 Now, we just need to write a simple function to call our printTree0. 
    169 We could write something like... 
    170  
    171 :: 
    172  
    173     def printTree(x): 
    174         printTree0(x.tree, 0) 
    175  
    176 ... but we won't. Let us learn how to handle arguments of 
    177 different types. Let's write a function that will accept either a 
    178 :obj:`TreeClassifier` or a :obj:`Node`. 
     163If the print-out function needs to accept either a 
     164:obj:`TreeClassifier` or a :obj:`Node`, it can be written as follows: 
    179165 
    180166.. literalinclude:: code/treestructure.py 
    181167   :lines: 43-49 
    182168 
    183 It's fairly straightforward: if :obj:`x` is of type derived from 
    184 :obj:`TreeClassifier`, we print :obj:`x.tree`; if it's :obj:`Node` we 
    185 just call :obj:`printTree0` with :obj:`x`. If it's of some other type, 
    186 we don't know how to handle it and thus raise an exception. The output:: 
    187  
    188     >>> printTree(treeClassifier) 
     169It's fairly straightforward: if ``x`` is a 
     170:obj:`TreeClassifier`, we print ``x.tree``; if it's a :obj:`Node` we 
     171just call ``printTree0`` with ``x``. If it's of some other type, 
     172we raise an exception. The output:: 
     173 
     174    >>> print_tree(tree_classifier) 
    189175    tear_rate (<15.000, 5.000, 4.000>) 
    190176    : reduced --> none (<12.000, 0.000, 0.000>) 
     
    201187          : hypermetrope --> none (<2.000, 0.000, 1.000>) 
    202188 
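A sketch of such a dispatching function, equivalent in spirit to the one in
``code/treestructure.py``::

    def print_tree(x):
        if isinstance(x, Orange.classification.tree.TreeClassifier):
            print_tree0(x.tree, 0)
        elif isinstance(x, Orange.classification.tree.Node):
            print_tree0(x, 0)
        else:
            raise TypeError("TreeClassifier or Node expected")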
    203 For a final exercise, let us write a simple pruning function. It will 
    204 be written entirely in Python, unrelated to any :obj:`Pruner`. It will 
    205 limit the maximal tree depth (the number of internal nodes on any path 
     189We conclude the tree structure examples with a simple pruning  
     190function, written entirely in Python and unrelated to any :obj:`Pruner`. It  
     191limits the maximal tree depth (the number of internal nodes on any path 
    206192down the tree) given as an argument.  For example, to get a two-level 
    207 tree, we would call cutTree(root, 2). The function will be recursive, 
     193tree, call cut_tree(root, 2). The function is recursive, 
    208194with the second argument (level) decreasing at each call; when zero, 
    209195the current node will be made a leaf: 
     
    212198   :lines: 54-62 
    213199 
    214 There's nothing to prune at null-nodes or leaves, so we act only when 
    215 :obj:`node` and :obj:`node.branch_selector` are defined. If level is 
    216 not zero, we call the function for each branch. Otherwise, we clear the 
     200The function acts only when 
     201:obj:`node` and :obj:`node.branch_selector` are defined. If the level is 
     202not zero, it recursively calls the function for each branch. Otherwise, it clears the 
    217203selector, branches and branch descriptions. 
    218204 
     
    226212       : yes --> hard (<2.000, 0.000, 4.000>) 
    227213 
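A sketch of the pruning function just described, mirroring
``code/treestructure.py`` and using the attribute names documented above::

    def cut_tree(node, level):
        # act only on internal nodes; null nodes and leaves are left alone
        if node and node.branch_selector:
            if level:
                for branch in node.branches:
                    cut_tree(branch, level - 1)
            else:
                # make the node a leaf: clear the split information
                node.branch_selector = None
                node.branches = None
                node.branch_descriptions = None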
    228 Learning 
    229 ======== 
    230  
    231 You could just call :class:`TreeLearner` and let it fill the empty slots 
    232 with the default components. This section will teach you three things: 
    233 what are the missing components (and how to set the same components 
    234 yourself), how to use alternative components to get a different tree and, 
    235 finally, how to write a skeleton for tree induction in Python. 
    236  
    237 .. _treelearner.py: code/treelearner.py 
     214Setting learning parameters 
     215=========================== 
    238216 
    239217Let us construct a :obj:`TreeLearner` to play with: 
     
    244222There are three crucial components in learning: the 
    245223:obj:`~TreeLearner.split` and :obj:`~TreeLearner.stop` criteria, and the 
    246 example :obj:`~TreeLearner.splitter` (there are some others, which become 
    247 important during classification; we'll talk about them later). They are 
    248 not defined; if you use the learner, the slots are filled temporarily 
    249 but later cleared again. 
    250  
    251 :: 
    252  
    253     >>> print learner.split 
    254     None 
    255     >>> learner(data) 
    256     <TreeClassifier instance at 0x01F08760> 
    257     >>> print learner.split 
    258     None 
    259  
    260 Stopping criteria 
    261 ================= 
    262  
    263 The stop is trivial. The default is set by 
     224example :obj:`~TreeLearner.splitter`. The default ``stop`` is set with 
    264225 
    265226    >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
    266227 
    267 We can now examine the default stopping parameters. 
     228and the default stopping parameters are 
    268229 
    269230    >>> print learner.stop.max_majority, learner.stop.min_instances 
    270231    1.0 0.0 
    271232 
    272 Not very restrictive. This keeps splitting the instances until there's 
    273 nothing left to split or all the instances are in the same class. Let us 
    274 set the minimal subset that we allow to be split to five instances and 
    275 see what comes out. 
     233The defaults keep splitting until there is 
     234nothing left to split or all the instances are in the same class. 
     235If the minimal subset that is allowed to be split further 
     236is set to five instances, the resulting tree is smaller. 
    276237 
    277238    >>> learner.stop.min_instances = 5.0 
     
    288249    |    |    prescription=myope: hard (100.00%) 
    289250 
    290 OK, that's better. If we want an even smaller tree, we can also limit 
    291 the maximal proportion of majority class. 
     251We can also limit the maximal proportion of the majority class. 
    292252 
    293253    >>> learner.stop.max_majority = 0.5 
     
    295255    >>> print tree.dump() 
    296256    none (62.50%) 
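Putting the above together, a minimal end-to-end sketch of tuning the
stopping criteria (assuming the lenses data set, as in the earlier
examples)::

    import Orange

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner()
    learner.stop = Orange.classification.tree.StopCriteria_common()
    learner.stop.min_instances = 5.0   # do not split subsets of fewer than 5
    learner.stop.max_majority = 0.5    # stop once a class reaches 50%
    tree = learner(data)
    print tree.dump()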
    297  
    298257 
    299258Redefining tree induction components 