Changeset 9082:411cf63e4a73 in orange


Timestamp:
10/09/11 00:36:31 (3 years ago)
Author:
markotoplak
Branch:
default
Convert:
cbe16966796b185336ed8de9b45a1ff07a6db776
Message:

Updates to Orange.classification.tree.

File:
1 edited

  • orange/Orange/classification/tree.py

    r9076 r9082  
    3434.. class:: Node 
    3535 
    36     Classification trees are a a hierarchy of :obj:`Node` classes. 
     36    Classification trees are a hierarchy of :obj:`Node` classes. 
    3737 
    3838    Node stores the instances belonging to the node, a branch selector, 
     
    103103============== 
    104104 
    105 This example works with the 
    106 lenses data set: 
     105This example works with the lenses data set: 
    107106 
    108107.. literalinclude:: code/treestructure.py 
    109108   :lines: 7-10 
    110109 
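The included lines are not reproduced in this changeset. A minimal sketch of
what they presumably contain, using the names that appear later in this
section (a sketch, not the actual treestructure.py code)::

    import Orange

    # load the lenses data and induce a classification tree
    data = Orange.data.Table("lenses")
    tree_classifier = Orange.classification.tree.TreeLearner(data)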
    111 The next function counts the number of nodes in a tree: 
     110The following function counts the number of nodes in a tree: 
    112111 
    113112.. _lenses.tab: code/lenses.tab 
     
    117116   :lines: 12-21 
    118117 
    119 If node is None, we return 0. Otherwise, the size is 1 (this node) 
    120 plus the sizes of all subtrees. We need to check if the node is an 
    121 internal node (it has a :obj:`~Node.branch_selector`), as leaves don't have 
    122 the :obj:`~Node.branches` attribute. 
     118If node is None, the function above returns 0. Otherwise, the size is 1
     119(this node) plus the sizes of all subtrees. The function needs to check
     120whether a node is internal (it has a :obj:`~Node.branch_selector`), as leaves
     121don't have the :obj:`~Node.branches` attribute.
    123122 
    124123    >>> tree_size(tree_classifier.tree) 
    125124    10 
    126125 
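The counting function itself is only pulled in via literalinclude; a sketch
consistent with the description above (recursion over :obj:`~Node.branches`,
with None counting as an empty subtree) might look like::

    def tree_size(node):
        if not node:
            return 0  # a None (null) node contributes nothing
        size = 1  # count this node
        if node.branch_selector:  # internal node: add all subtree sizes
            for branch in node.branches:
                size += tree_size(branch)
        return size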
    127 This was only an excercise - a :obj:`Node` already has a built-in 
    128 method :func:`~Node.tree_size`. 
    129  
    130 Trees can be printed with a recursive function: 
     126Note that a :obj:`Node` already has a built-in method 
     127:func:`~Node.tree_size`. 
     128 
     129Trees can be printed with a simple recursive function: 
    131130 
    132131.. literalinclude:: code/treestructure.py 
     
    134133 
    135134The crux of the example is not in the formatting (\\n's etc.); 
    136 what matters is everything but the print statements. 
    137 As first, we check whether the node is a null-node (a node to which no 
    138 learning instances were classified). If this is so, we just print out 
    139 "<null node>" and return. 
    140  
    141 After handling null nodes, remaining nodes are internal nodes and 
    142 leaves. For internal nodes, we print a node description consisting of 
    143 the feature's name and distribution of classes. :obj:`Node`'s branch 
    144 description is an instance of :obj:`~Orange.classification.Classifier`, 
    145 and its ``class_var`` is the feature whose name is printed. 
    146 Class distributions are printed as well (they are assumed to be strored).  
    147 Then we branch description for each branch and recursively call  
    148 :obj:`printTree0` with a level increased by 1 to increase the indent. 
    149  
    150 Finally, if the node is a leaf, we print the distribution of learning 
    151 instances in the node and the class to which the instances in the node 
    152 would be classified. We assume that the :obj:`~Node.node_classifier` is 
    153 a :obj:`DefaultClassifier`. A better print function 
    154 should be aware of possible alternatives. 
    155  
    156 The print-out function that accepts either a 
     135what matters is everything but the print statements. The code 
     136separately handles three node types: 
     137 
      138* For a null node (a node to which no learning instances were
      139  classified), it just prints "<null node>".
      140* For internal nodes, it prints a node description:
      141  the feature's name and distribution of classes. :obj:`Node`'s
      142  branch description is an :obj:`~Orange.classification.Classifier`,
      143  and its ``class_var`` is the feature whose name is printed. Class
      144  distributions are printed as well (they are assumed to be stored).
      145  For each branch, :obj:`printTree0` is called recursively with the
      146  level increased by 1 to enlarge the indent.
     147* If the node is a leaf, it prints the distribution of learning instances 
     148  in the node and the class to which the instances in the node would 
     149  be classified. We assume that the :obj:`~Node.node_classifier` is a 
     150  :obj:`DefaultClassifier`. A better print function should be aware of 
     151  possible alternatives. 
     152 
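A sketch of such a printing function, following the three cases above; the
``distribution`` and ``default_value`` attribute names are assumptions, not
confirmed by this changeset::

    def print_tree0(node, level):
        if not node:
            # null node: no learning instances were classified here
            print " " * level + "<null node>",
            return
        if node.branch_selector:
            # internal node: print feature name and class distribution,
            # then recurse into each branch with a larger indent
            node_desc = node.branch_selector.class_var.name
            print "\n" + " " * level + "%s (%s)" % (node_desc, node.distribution),
            for i, branch in enumerate(node.branches):
                print "\n" + " " * level + ": %s" % node.branch_descriptions[i],
                print_tree0(branch, level + 1)
        else:
            # leaf: print the predicted class and the stored distribution
            print "--> %s (%s)" % (node.node_classifier.default_value,
                                   node.distribution),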
     153The wrapper function that accepts either a 
    157154:obj:`TreeClassifier` or a :obj:`Node` can be written as follows: 
    158155 
     
    160157   :lines: 43-49 
    161158 
    162 It's fairly straightforward: if ``x`` is a 
    163 :obj:`TreeClassifier`, we print ``x.tree``; if it's :obj:`Node` we 
    164 just call ``printTree0`` with `x`. If it's of some other type, 
    165 we raise an exception. The output:: 
     159It's straightforward: if ``x`` is a
     160:obj:`TreeClassifier`, it prints ``x.tree``; if it's a :obj:`Node`, it
     161prints ``x``. If it's of some other type,
     162an exception is raised. The output::
    166163 
    167164    >>> print_tree(tree_classifier) 
     
    180177          : hypermetrope --> none (<2.000, 0.000, 1.000>) 
    181178 
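A sketch of the wrapper as described above, reusing the print_tree0 sketch::

    def print_tree(x):
        if isinstance(x, Orange.classification.tree.TreeClassifier):
            print_tree0(x.tree, 0)
        elif isinstance(x, Orange.classification.tree.Node):
            print_tree0(x, 0)
        else:
            raise TypeError("invalid parameter")
        print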
    182 The tree structure examples conclude with a simple pruning  
    183 function, written entirely in Python and unrelated to any :obj:`Pruner`. It  
    184 limits the maximal tree depth (the number of internal nodes on any path 
    185 down the tree) given as an argument.  For example, to get a two-level 
    186 tree, call cut_tree(root, 2). The function ise recursive, 
    187 with the second argument (level) decreasing at each call; when zero, 
    188 the current node will be made a leaf: 
     179The tree structure examples conclude with a simple pruning function, 
     180written entirely in Python and unrelated to any :class:`Pruner`. It limits 
     181the tree depth (the number of internal nodes on any path down the tree). 
     182For example, to get a two-level tree, call cut_tree(root, 2). The function 
     183is recursive, with the second argument (level) decreasing at each call; 
     184when zero, the current node will be made a leaf: 
    189185 
    190186.. literalinclude:: code/treestructure.py 
    191187   :lines: 54-62 
    192188 
    193 The function acts only when 
    194 :obj:`node` and :obj:`node.branch_selector` are defined. If the level is 
    195 not zero, is recursively calls  the function for each branch. Otherwise, it clears the 
    196 selector, branches and branch descriptions. 
     189The function acts only when :obj:`node` and :obj:`node.branch_selector`
     190are defined. If the level is not zero, it recursively calls the function
     191for each branch. Otherwise, it clears the selector, branches and branch
     192descriptions.
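A sketch of the pruning function consistent with that description (clearing
the selector, branches and branch descriptions turns an internal node into a
leaf)::

    def cut_tree(node, level):
        if node and node.branch_selector:  # act only on internal nodes
            if level:
                for branch in node.branches:
                    cut_tree(branch, level - 1)
            else:
                # depth exhausted: make this node a leaf
                node.branch_selector = None
                node.branches = None
                node.branch_descriptions = None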
    197193 
    198194    >>> cutTree(tree.tree, 2) 
     
    215211There are three crucial components in learning: the 
    216212:obj:`~TreeLearner.split` and :obj:`~TreeLearner.stop` criteria, and the 
    217 example :obj:`~TreeLearner.splitter`. The default ``stop`` is set with 
     213example :obj:`~TreeLearner.splitter`. The default ``stop`` is set with: 
    218214 
    219215    >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
    220216 
    221 and the default stopping parameters are 
     217The default stopping parameters are: 
    222218 
    223219    >>> print learner.stop.max_majority, learner.stop.min_examples 
    224220    1.0 0.0 
    225221 
    226 The defaults keep splitting until there is 
    227 nothing left to split or all the instances are in the same class. 
    228 If the minimal subset that is allowed to be split further 
    229 is set to five instances, the resulting tree is smaller. 
     222The defaults only stop splitting when no instances are left or all of 
     223them are in the same class. 
     224 
     225If the minimal subset that is allowed to be split further is set to five 
     226instances, the resulting tree is smaller. 
    230227 
    231228    >>> learner.stop.min_instances = 5.0 
     
    254251This example shows how to use a custom stop function.  First, the 
    255252``def_stop`` function defines the default stop function. The first tree 
    256 has some added randomness: the induction will also stop in 20% of the 
     253has some added randomness; the induction will also stop in 20% of the 
    257254cases when ``def_stop`` returns False. The stopping criteria for the 
    258255second tree is completely random: it stops induction in 20% of cases. 
    259256Note that in the second case lambda function still has three parameters, 
    260 since this is a necessary number of parameters for the stop function 
    261 (:obj:`StopCriteria`). 
      257even though it does not need any, since three parameters are required
      258by :obj:`~TreeLearner.stop`.
    262259 
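A sketch of the two stopping criteria just described; the exact code lives in
tree3.py, so the names below (``def_stop``, the use of :obj:`StopCriteria`)
are illustrative::

    from random import randint
    import Orange

    data = Orange.data.Table("lenses")
    def_stop = Orange.classification.tree.StopCriteria()

    learner = Orange.classification.tree.TreeLearner()
    # stop when the default criterion says so, or in 20% of other cases
    learner.stop = lambda instances, weight, contingency: \
        def_stop(instances, weight, contingency) or randint(1, 5) == 1
    tree1 = learner(data)

    # completely random: stop in 20% of cases; the lambda still takes
    # three parameters, as required of a stop function
    learner.stop = lambda instances, weight, contingency: randint(1, 5) == 1
    tree2 = learner(data)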
    263260.. _tree3.py: code/tree3.py 
     
    273270===================== 
    274271 
    275 Split constructor find a suitable criteria for dividing the learning (and 
    276 later testing) instances. Those that cannot handle a particular feature 
    277 type (discrete, continuous) quitely skip them. Therefore use a correct 
    278 split constructor for your dataset, or :obj:`SplitConstructor_Combined` 
    279 that delegates features to specialized split constructors. 
    280  
    281 The same split constructors can be both for classification and regression 
    282 trees, if the 'measure' attribute for the :obj:`SplitConstructor_Score` 
    283 class (and derived classes) is set accordingly. 
    284  
    285272.. class:: SplitConstructor 
    286273 
    287     Finds a suitable criteria for dividing the learning (and later 
    288     testing) instances.  
     274    Decides how to divide the learning instances.
    289275     
    290276    The :obj:`SplitConstructor` should use the domain contingency when 
    291     possible, both because it's faster and because the contingency 
     277    possible, both for speed and because the contingency 
    292278    matrices are not necessarily constructed by simply counting the 
    293279    instances. There are, however, cases when domain contingency does not 
    294280    suffice; for example if ReliefF is used to score features. 
    295281 
    296     :obj:`SplitConstructor` returns a classifier to be used as 
    297     :obj:`Node`'s :obj:`~Node.branch_selector`, a list of branch descriptions 
    298     a list with the number of instances that go into each branch 
    299     (if empty, the :obj:`TreeLearner` will find the number itself after 
    300     splitting the instances into subsets), a split quality (a number without 
    301     any fixed meaning except that higher numbers mean better splits). 
    302  
    303     If the constructed splitting criterion uses a feature in such 
    304     a way that the feature will be useless in the future and should not be 
    305     considered as a split criterion in any of the subtrees (the typical 
    306     case of this are discrete features that are used as-they-are, 
    307     without any binarization or subsetting), then it should report 
    308     the index of this feature. Some splits do not spend any features; 
    309     this is indicated by returning a negative index. 
    310  
    311     A :obj:`SplitConstructor` can veto the further tree induction 
    312     by returning no classifier. This can happen for many reasons. 
    313     A general one is related to number of instances in the branches. 
    314     :obj:`SplitConstructor` has a field :obj:`min_subset`, which sets 
    315     the minimal number of instances in a branch; null nodes 
    316     are allowed. If there is no split where this condition is met, 
    317     :obj:`SplitConstructor` stops the induction. 
      282    A :obj:`SplitConstructor` can veto further tree induction by
      283    returning no classifier. This is generally related to the number
      284    of instances in the branches. If there are no splits with more than
      285    :obj:`SplitConstructor.min_subset` instances in the branches (null
      286    nodes are allowed), the induction is stopped.
     287 
      288    Split constructors that cannot handle a particular feature type
      289    (discrete, continuous) quietly skip them. Therefore use a split
      290    constructor suited to the dataset, or :obj:`SplitConstructor_Combined`,
      291    which delegates features to specialized split constructors.
     292 
     293    The same split constructors can be used both for classification 
     294    and regression, if the 'measure' attribute for the 
     295    :obj:`SplitConstructor_Score` class (and derived classes) is set 
     296    accordingly. 
     297 
    318298 
    319299    .. attribute:: min_subset 
    320300 
    321         The minimal number of (weighted) in non-null leaves. 
    322  
    323     .. method:: __call__(instances, [ weightID, contingency, apriori_distribution, candidates, clsfr])  
    324  
    325         :param instances:  Examples can be given in any acceptable form 
    326             (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a 
    327             list of instances). 
      301        The minimal (weighted) number of instances in non-null leaves.
     302 
     303    .. method:: __call__(data, [ weightID, contingency, apriori_distribution, candidates, clsfr])  
     304 
      305        :param data: learning instances, in any acceptable form.
    328306        :param weightID: Optional; the default of 0 means that all 
    329307            instances have a weight of 1.0.  
     
    331309        :param apriori_distribution: apriori class probabilities. 
    332310        :type apriori_distribution: :obj:`Orange.statistics.distribution.Distribution` 
    333         :param candidates: The split constructor should consider only  
    334             the features in the candidate list (one boolean for each 
    335             feature). 
     311        :param candidates: only consider these  
     312            features (one boolean for each feature). 
    336313        :param clsfr: a node classifier (if it was constructed, that is,  
    337314            if :obj:`store_node_classifier` is True)  
    338315 
    339316        Construct a split. Return a tuple (:obj:`branch_selector`, 
    340         :obj:`branch_descriptions`, :obj:`subset_sizes`, :obj:`quality`, 
    341         :obj:`spent_feature`). :obj:`spent_feature` is -1 if no 
    342         feature is completely spent by the split criterion. If no 
    343         split is constructed, the :obj:`selector`, :obj:`branch_descriptions` 
    344         and :obj:`subset_sizes` are None, while :obj:`quality` is 0.0 and 
    345         :obj:`spent_feature` is -1.  
     317        :obj:`branch_descriptions` (a list), :obj:`subset_sizes` 
     318        (the number of instances for each branch, may also be 
     319        empty), :obj:`quality` (higher numbers mean better splits), 
     320        :obj:`spent_feature`). If no split is constructed, 
     321        the :obj:`selector`, :obj:`branch_descriptions` and 
     322        :obj:`subset_sizes` are None, while :obj:`quality` is 0.0 and 
     323        :obj:`spent_feature` is -1. 
     324 
     325        If the splitting criterion uses a feature in such a way that the 
     326        feature will be useless in the future and should not be considered 
     327        as a split criterion in any of the subtrees (the typical case of 
     328        this are discrete features that are used as-they-are, without 
     329        any binarization or subsetting), then it should return the 
     330        index of this feature. If no features are spent, 
     331        -1 is returned. 
    346332 
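The calling convention above can be illustrated with a pure-Python stand-in.
This is only a sketch of the described protocol, not a documented extension
point; ``MinSizeSplit`` and its threshold are made up for the example::

    class MinSizeSplit(object):
        """Veto splits on small data, otherwise delegate to another
        split constructor (illustrative only)."""

        def __init__(self, delegate, min_instances=10):
            self.delegate = delegate  # e.g. SplitConstructor_Combined()
            self.min_instances = min_instances

        def __call__(self, data, weight_id=0, contingency=None,
                     apriori_distribution=None, candidates=None, clsfr=None):
            if len(data) < self.min_instances:
                # veto: no selector, zero quality, no feature spent
                return None, None, None, 0.0, -1
            return self.delegate(data, weight_id, contingency,
                                 apriori_distribution, candidates, clsfr)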
    347333.. class:: SplitConstructor_Score 