Changeset 9115:13cace1b7abb in orange


Timestamp:
10/17/11 20:42:11
Author:
markotoplak
Branch:
default
Convert:
d4fb453a56f3c6bdbfccf84dd9514429de8a7064
Message:

Orange.classification.tree update

File:
1 edited

  • orange/Orange/classification/tree.py

    r9082 r9115  
    251251This example shows how to use a custom stop function.  First, the 
    252252``def_stop`` function defines the default stop function. The first tree 
    253 has some added randomness; the induction will also stop in 20% of the 
     253has some added randomness; the induction also stops in 20% of the 
    254254cases when ``def_stop`` returns False. The stopping criteria for the 
    255255second tree is completely random: it stops induction in 20% of cases. 
     
    272272.. class:: SplitConstructor 
    273273 
    274     Decide how to divide the learning instances. 
     274    Decides how to divide learning instances. 
    275275     
    276     The :obj:`SplitConstructor` should use the domain contingency when 
    277     possible, both for speed and because the contingency 
    278     matrices are not necessarily constructed by simply counting the 
    279     instances. There are, however, cases when domain contingency does not 
    280     suffice; for example if ReliefF is used to score features. 
    281  
    282     A :obj:`SplitConstructor` can veto further tree induction by 
    283     returning no classifier. This is generall related to number of 
    284     instances in the branches. If there are no splits with more than 
    285     :obj:`SplitConstructor.min_subset` instances in the branches (null 
    286     nodes are allowed), the induction is stopped. 
    287  
    288     Split constructors that cannot handle a particular feature type 
    289     (discrete, continuous) quietly skip them. Therefore use a correct 
    290     split constructor or :obj:`SplitConstructor_Combined`, which delegates 
    291     features to specialized split constructors. 
    292  
    293     The same split constructors can be used both for classification 
    294     and regression, if the 'measure' attribute for the 
    295     :obj:`SplitConstructor_Score` class (and derived classes) is set 
    296     accordingly. 
    297  
     276    The :obj:`SplitConstructor` should use the domain 
     277    contingency when possible, both for speed and adaptability 
     278    (:obj:`TreeLearner.contingency`). Sometimes domain contingency does 
     279    not suffice, for example if the ReliefF score is used. 
     280 
     281    A :obj:`SplitConstructor` can veto further tree induction by returning 
     282    no classifier. This is generally related to the number of learning 
     283    instances that would go in each branch. If there are no splits with 
     284    more than :obj:`SplitConstructor.min_subset` instances in the branches 
     285    (null nodes are allowed), the induction is stopped. 
     286 
     287    Split constructors that cannot handle a particular feature 
     288    type (discrete, continuous) quietly skip such features. When in doubt, use 
     289    :obj:`SplitConstructor_Combined`, which delegates features to 
     290    specialized split constructors. 
     291 
     292    The same split constructors can be used both for classification and 
     293    regression, if the chosen score (for :obj:`SplitConstructor_Score` 
     294    and derived classes) supports both. 
    298295 
    299296    .. attribute:: min_subset 
     
    323320        :obj:`spent_feature` is -1. 
    324321 
    325         If the splitting criterion uses a feature in such a way that the 
    326         feature will be useless in the future and should not be considered 
    327         as a split criterion in any of the subtrees (the typical case of 
    328         this are discrete features that are used as-they-are, without 
    329         any binarization or subsetting), then it should return the 
    330         index of this feature. If no features are spent, 
    331         -1 is returned. 
     322        If the chosen feature will be useless in the future and 
     323        should not be considered for splitting in any of the subtrees 
     324        (typically, when discrete features are used as-they-are, without 
     325        any binarization or subsetting), then it should return the index 
     326        of this feature as :obj:`spent_feature`. If no features are spent, 
     327        :obj:`spent_feature` is -1. 
    332328 
    333329.. class:: SplitConstructor_Score 
     
    335331    Bases: :class:`SplitConstructor` 
    336332 
    337     An abstract base class for split constructors that compare splits 
     333    An abstract base class that compares splits 
    338334    with a :class:`Orange.feature.scoring.Score`.  All split 
    339335    constructors except for :obj:`SplitConstructor_Combined` are derived 
     
    361357 
    362358    The constructed :obj:`branch_selector` is an instance of 
    363     :obj:`orange.ClassifierFromVarFD` that returns a value of the selected 
     359    :obj:`orange.ClassifierFromVarFD`, which returns a value of the selected 
    364360    feature. :obj:`branch_description` contains the feature's 
    365361    values. The feature is marked as spent (it cannot reappear in the 
     
    370366    Bases: :class:`SplitConstructor_Score` 
    371367 
    372     For each discrete feature it determines which binarization gives 
    373     the the highest score. In case of ties, a random feature is selected. 
     368    Finds the binarization with the highest score among all features. In 
     369    case of ties, a random feature is selected. 
    374370 
    375371    The constructed :obj:`branch_selector` is an instance 
    376     :obj:`orange.ClassifierFromVarFD` that returns a value of the selected 
    377     feature. Its :obj:`transformer` contains a :obj:`MapIntValue` 
    378     that maps values of the feature into a binary feature. Branch 
    379     descriptions are of form ``[<val1>, <val2>, ...<valn>]`` for branches 
    380     with more than one feature value. Branches with a single feature 
    381     value are described with that value. If the feature was binary, 
    382     it is spent and cannot be used in the node's subtrees. Otherwise 
    383     it is not spent. 
    384  
     372    of :obj:`orange.ClassifierFromVarFD`, which returns a value of the 
     373    selected feature. Its :obj:`transformer` contains a ``MapIntValue`` 
     374    that maps values of the feature into a binary feature. Branches 
     375    with a single feature value are described with that value and 
     376    branches with more than one are described with ``[<val1>, <val2>, 
     377    ...<valn>]``. Only binary features are marked as spent. 
    385378 
    386379.. class:: SplitConstructor_Threshold 
     
    388381    Bases: :class:`SplitConstructor_Score` 
    389382 
    390     The only split constructor for continuous features.  It divides the 
     383    The only split constructor for continuous features. It divides the 
    391384    range of feature values with a threshold that maximizes the split's 
    392385    quality. In case of ties, a random feature is selected.  The feature 
     
    409402    Bases: :class:`SplitConstructor` 
    410403 
    411     This split constructor uses different split constructors for 
    412     discrete and continuous features. Each split constructor is called 
    413     with features of appropriate type only. Both construct a candidate 
    414     for a split; the better of them is selected. 
    415  
    416     There is a problem when multiple candidates have the same score. Let 
    417     there be nine discrete features with the highest score; the split 
     404    Uses different split constructors for discrete and continuous 
     405    features. Each split constructor is called with appropriate 
     406    features. Both construct a candidate for a split; the better of them 
     407    is used. 
     408 
     409    The choice of the split is not probabilistically fair when 
     410    multiple candidates have the same score. For example, if there 
     411    are nine discrete features with the highest score, the split 
    418412    constructor for discrete features will select one of them. Now, 
    419     if there is a single continuous feature with the same score, 
     413    if there is also a single continuous feature with the same score, 
    420414    :obj:`SplitConstructor_Combined` would randomly select between the 
    421     proposed discrete feature and the continuous feature. It is not aware 
    422     of that the discrete has already competed with eight other discrete 
    423     features. So, the probability for selecting (each) discrete feature 
    424     would be 1/18 instead of 1/10. Although incorrect, we doubt that 
    425     this would affect the tree's performance. 
    426  
    427     The :obj:`branch_selector`, :obj:`branch_descriptions` and whether 
    428     the feature is spent is decided by the winning split constructor. 
     415    proposed discrete feature and the continuous feature, unaware that the 
     416    discrete feature has already competed with eight others. So, 
     417    the probability of selecting (each) discrete feature would be 
     418    1/18 instead of 1/10. Although incorrect, this should not affect 
     419    the performance. 
    429420 
    430421    .. attribute: discrete_split_constructor 
     
    438429        Split constructor for continuous features; it  
    439430        can be either :obj:`SplitConstructor_Threshold` or a  
    440         split constructor you programmed in Python. 
     431        custom split constructor. 
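    For illustration, a hedged sketch wiring the documented pieces together
    (component names as above; passing the constructor through a ``split``
    argument is an assumption of this sketch)::

        import Orange

        data = Orange.data.Table("iris")

        split = Orange.classification.tree.SplitConstructor_Combined()
        split.discrete_split_constructor = \
            Orange.classification.tree.SplitConstructor_ExhaustiveBinary()
        split.continuous_split_constructor = \
            Orange.classification.tree.SplitConstructor_Threshold()
        split.min_subset = 5  # veto splits with fewer instances per branch

        tree = Orange.classification.tree.TreeLearner(data, split=split)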
    441432 
    442433 
     
    457448 
    458449        Return True (stop) or False (continue the induction). 
    459         If contingencies are given, they are used for checking whether 
    460         classes but not for counting. Derived classes should use the 
    461         contingencies whenever possible. 
     450        Contingencies are not used for counting. Derived classes should 
     451        use the contingencies whenever possible. 
    462452 
    463453.. class:: StopCriteria_common 
    464454 
    465     Additional criteria for pre-pruning: the proportion of majority 
    466     class and the number of weighted instances. 
     455    Pre-pruning with additional criteria. 
    467456 
    468457    .. attribute:: max_majority 
    469458 
    470         Maximal proportion of majority class. When exceeded, 
     459        Maximum proportion of majority class. When exceeded, 
    471460        induction stops. 
    472461 
    473462    .. attribute:: min_instances 
    474463 
    475         Minimal number of instances in internal leaves. Subsets with less 
     464        Minimum number of instances for splitting. Subsets with fewer 
    476465        than :obj:`min_instances` instances are not split further. 
    477466        The sample count is weighted. 
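    A hedged sketch of pre-pruning with these attributes (assuming the
    learner accepts the criterion through a ``stop`` argument)::

        import Orange

        data = Orange.data.Table("iris")

        stop = Orange.classification.tree.StopCriteria_common()
        stop.max_majority = 0.9   # stop when 90% of instances share a class
        stop.min_instances = 10   # do not split subsets smaller than 10 (weighted)
        tree = Orange.classification.tree.TreeLearner(data, stop=stop)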
     
    481470================= 
    482471 
    483 Splitters sort learning instances info brances (the branches are selected 
     472Splitters sort learning instances into branches (the branches are selected 
    484473with a :obj:`SplitConstructor`, while a :obj:`Descender` decides the 
    485474branch for an instance during classification). 
    486475 
    487 Most splitters simply call :obj:`Node.branch_selector` and assign 
    488 instances to correspondingly. When the value is unknown they choose a 
    489 particular branch or simply skip the instance. 
    490  
    491 Some enhanced splitters can split instances. An instance (actually, 
    492 a pointer to it) is copied to more than one subset. To facilitate 
    493 real splitting, weights are needed. Each branch has a weight ID (each 
    494 would usually have its own ID) and all instances in that branch (either 
    495 completely or partially) should have this meta attribute. If an instance 
    496 hasn't been split, it has only one additional attribute - with weight 
    497 ID corresponding to the subset to which it went. Instance that is split 
     476Most splitters call :obj:`Node.branch_selector` and assign 
     477instances correspondingly. When the value is unknown they choose a 
     478particular branch or skip the instance. 
     479 
     480Some splitters can also split instances: a weighted instance is 
     481used in more than one subset. Each branch has a weight ID (usually, 
     482each branch has its own ID) and all instances in that branch should have this meta attribute. 
     483 
     484An instance that 
     485hasn't been split has only one additional attribute (weight 
     486ID corresponding to the subset to which it went). An instance that is split 
    498487between, say, three subsets, has three new meta attributes, one for each 
    499 subset. ID's of weight meta attributes returned by the :obj:`Splitter` 
    500 are used for the induction of the corresponding subtrees. 
    501  
    502 The weights are used only when needed. When no splitting occured - 
    503 because the splitter is was unable to do it or because there was no need 
    504 for splitting - no weight ID's are returned. 
     488subset. The weights are used only when needed; when there is no 
     489splitting, no weight IDs are returned. 
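For example, a hedged sketch that plugs in a splitter (assuming the learner
exposes a ``splitter`` component and that :obj:`Splitter_UnknownsToAll` is
the splitter, documented below, that copies instances with unknown values
into all branches)::

    import Orange

    data = Orange.data.Table("iris")
    learner = Orange.classification.tree.TreeLearner()
    learner.splitter = Orange.classification.tree.Splitter_UnknownsToAll()
    tree = learner(data)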
    505490 
    506491.. class:: Splitter 
    507492 
    508     An abstract base class for objects that split sets of instances 
    509     into subsets. The derived classes treat instances which cannot be 
    510     unambiguously placed into a single branch (usually due to unknown 
    511     value of the crucial attribute) differently. 
     493    An abstract base class that splits instances 
     494    into subsets. 
    512495 
    513496    .. method:: __call__(node, instances[, weightID]) 
     
    518501        :param weightID: weight ID.  
    519502         
    520         Use the information in :obj:`node` (particularly the 
    521         :obj:`branch_selector`) to split the given set of instances into 
    522         subsets.  Return a tuple with a list of instance generators and 
    523         a list of weights.  The list of weights is either an ordinary 
    524         python list of integers or a None when no splitting of instances 
    525         occurs and thus no weights are needed. 
    526  
    527         Return a list of subsets of instances and, optionally, a list 
    528         of new weight ID's. 
     503        Use the information in :obj:`Node` (particularly the 
     504        :obj:`~Node.branch_selector`) to split the given set of instances into 
     505        subsets.  Return a tuple with a list of instance subsets and 
     506        a list of weights.  The list of weights is either a 
     507        list of integers or None when no weights are added. 
    529508 
    530509.. class:: Splitter_IgnoreUnknowns 
     
    546525    Bases: :class:`Splitter` 
    547526 
    548     Places instances with an unknown value of the feature into all branches. 
     527    Splits instances with an unknown value of the feature into all branches. 
    549528 
    550529.. class:: Splitter_UnknownsToRandom 
     
    558537    Bases: :class:`Splitter` 
    559538 
    560     Constructs an additional branch to contain all ambiguous instances.  
     539    Constructs an additional branch for ambiguous instances.  
    561540    The branch's description is "unknown". 
    562541 
     
    579558============================= 
    580559 
    581 Descenders decide the where should the instances that cannot be 
    582 unambiguously put in a branch be sorted to (the branches are selected 
     560Descenders decide where the instances that cannot be unambiguously 
     561put in a single branch should go during classification (the branches are selected 
    583562with a :obj:`SplitConstructor`, while a :obj:`Splitter` sorts instances 
    584563during learning). 
     
    586565.. class:: Descender 
    587566 
    588     An abstract base object for tree descenders. It descends a 
    589     given instance as far deep as possible, according to the values 
     567    An abstract base tree descender. It descends 
     568    an instance as deep as possible, according to the values 
    590569    of instance's features. The :obj:`Descender`: calls the node's 
    591570    :obj:`~Node.branch_selector` to get the branch index. If it's a 
    592     simple index, the corresponding branch is followed. If not, it's up 
    593     to descender to decide what to do. A descender can choose a single 
     571    simple index, the corresponding branch is followed. If not, the 
     572    descender decides what to do. A descender can choose a single 
    594573    branch (for instance, the one that is the most recommended by the 
    595574    :obj:`~Node.branch_selector`) or it can let the branches vote. 
     
    600579       there were no unknown or out-of-range values, or when the 
    601580       descender selected a single branch and continued the descent 
    602        despite them. The descender returns the reached :obj:`Node`. 
    603     #. Node's :obj:`~Node.branch_selector` returned a distribution and the 
     581       despite them. The descender returns the :obj:`Node` it has reached. 
     582    #. Node's :obj:`~Node.branch_selector` returned a distribution and 
    604583       :obj:`Descender` decided to stop the descent at this (internal) 
    605584       node. It returns the current :obj:`Node`. 
    606585    #. Node's :obj:`~Node.branch_selector` returned a distribution and the 
    607586       :obj:`Node` wants to split the instance (i.e., to decide the class 
    608        by voting). It returns a :obj:`Node` and the vote-weights for the 
    609        branches.  The weights can correspond to the distribution returned 
    610        by node's :obj:`~Node.branch_selector`, to the number of learning 
    611        instances that were assigned to each branch, or to something else. 
     587       by voting). It returns a :obj:`Node` and the vote-weights for 
     588       the branches. The weights can correspond, for example, to the 
     589       distribution returned by the node's :obj:`~Node.branch_selector`, or to 
     590       the number of learning instances that were assigned to each branch. 
    612591 
    613592    .. method:: __call__(node, instance) 
    614593 
    615         Descends down the tree until it reaches a leaf or a node in 
    616         which a vote of subtrees is required. In both cases, a tuple 
    617         of two elements is returned; in the former, the tuple contains 
    618         the reached node and None, in the latter in contains a node and 
    619         weights of votes for subtrees (a list of floats). 
    620  
    621         Descenders that never split instances always descend to a 
    622         leaf. They differ in the treatment of instances with unknown 
    623         values (or, in general, instances for which a branch cannot be 
    624         determined at some node the tree). Descenders that 
    625         do split instances differ in returned vote weights. 
     594        Descends until it reaches a leaf or a node in 
     595        which a vote of subtrees is required. A tuple 
     596        of two elements is returned. If it reaches a leaf, the tuple contains 
     597        the leaf node and None. If not, it contains a node and 
     598        a list of floats (weights of votes). 
    626599 
    627600.. class:: Descender_UnknownToNode 
     
    631604    When an instance cannot be classified into a single branch, the current 
    632605    node is returned. Thus, the node's :obj:`~Node.node_classifier` 
    633     will be used to make a decision. In such case the internal nodes 
    634     need to have their :obj:`Node.node_classifier` (i.e., don't disable 
    635     creating node classifier or manually remove them after the induction). 
     606    will be used to make a decision. Therefore, internal nodes 
     607    need to have :obj:`Node.node_classifier` defined. 
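    A hedged usage sketch (assuming the learner accepts a ``descender``
    component, and that node classifiers are left enabled, as by default)::

        import Orange

        data = Orange.data.Table("iris")
        learner = Orange.classification.tree.TreeLearner()
        # fall back to the node's own classifier when an instance
        # cannot be classified into a single branch
        learner.descender = Orange.classification.tree.Descender_UnknownToNode()
        tree = learner(data)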
    636608 
    637609.. class:: Descender_UnknownToBranch 
     
    649621    Classifies instances with unknown values to the branch with the 
    650622    highest number of instances. If there is more than one such branch, 
    651     random branch is chosen for each instance that is to be classified. 
     623    a random branch is chosen for each instance. 
    652624 
    653625.. class:: Descender_UnknownToCommonSelector 
     
    678650    pair: classification trees; pruning 
    679651 
    680 The pruners construct a shallow copy of a tree.  The pruned tree's 
     652The pruners construct a shallow copy of a tree. The pruned tree's 
    681653:obj:`Node` objects contain references to the same contingency matrices, 
    682 node classifiers, branch selectors, ...  as the original tree. Thus, 
    683 you may modify a pruned tree structure (manually cut it, add new 
    684 nodes, replace components) but modifying, for instance, some node's 
    685 :obj:`~Node.node_classifier` (a :obj:`~Node.node_classifier` itself, not 
    686 a reference to it!) would modify the node's :obj:`~Node.node_classifier` 
    687 in the corresponding node of the original tree. 
    688  
    689 Pruners cannot construct a :obj:`~Node.node_classifier` nor merge 
    690 :obj:`~Node.node_classifier` of the pruned subtrees into classifiers for new 
    691 leaves. Thus, if you want to build a prunable tree, internal nodes 
    692 must have their :obj:`~Node.node_classifier` defined. Fortunately, this is 
    693 the default. 
     654node classifiers, branch selectors, ...  as the original tree. 
     655 
     656Pruners cannot construct a new :obj:`~Node.node_classifier`.  Thus, for 
     657pruning, internal nodes must have :obj:`~Node.node_classifier` defined 
     658(the default). 
    694659 
    695660.. class:: Pruner 
    696661 
    697     An abstract base class for a tree pruner which defines nothing useful,  
    698     only a pure virtual call operator. 
     662    An abstract base tree pruner. 
    699663 
    700664    .. method:: __call__(tree) 
    701665 
    702666        :param tree: either 
    703             a :obj:`Node` (presumably, but not necessarily a root) or a 
    704             :obj:`_TreeClassifier` (the C++ version of the classifier, 
     667            a :obj:`Node` or a :obj:`_TreeClassifier` (the C++ version of the classifier, 
    705668            saved in :obj:`TreeClassifier.base_classifier`). 
    706669 
    707         Prunes a tree. The argument can be either a tree classifier or 
    708         a tree node; the result is of the same type as the argument. 
     670        The resulting pruned tree is of the same type as the argument. 
    709671        The original tree remains intact. 
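        A hedged usage sketch (per the parameter description above, the
        low-level tree is taken from :obj:`TreeClassifier.base_classifier`)::

            import Orange

            data = Orange.data.Table("iris")
            tree = Orange.classification.tree.TreeLearner(data)

            pruner = Orange.classification.tree.Pruner_SameMajority()
            # the pruned tree is of the same type; the original stays intact
            pruned = pruner(tree.base_classifier)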
    710672 
     
    713675    Bases: :class:`Pruner` 
    714676 
    715     In Orange, a tree can have a non-trivial subtrees (i.e. subtrees with 
    716     more than one leaf) in which all the leaves have the same majority 
    717     class. This is allowed because those leaves can still have different 
    718     distributions of classes and thus predict different probabilities. 
    719     However, this can be undesired when we're only interested 
    720     in the class prediction or a simple tree interpretation. The 
    721     :obj:`Pruner_SameMajority` prunes the tree so that there is no 
    722     subtree in which all the nodes would have the same majority class. 
     677    A tree can have subtrees where all the leaves have 
     678    the same majority class. This is allowed because leaves can still 
     679    have different class distributions and thus predict different 
     680    probabilities.  The :obj:`Pruner_SameMajority` prunes the tree so 
     681    that there is no subtree in which all the nodes would have the same 
     682    majority class. 
    723683 
    724684    This pruner will only prune the nodes in which the node classifier 
     
    726686    (or a derived class). 
    727687 
    728     The leaves with more than one majority class require some special 
    729     handling. The pruning goes backwards, from leaves to the root. 
    730     When siblings are compared, the algorithm checks whether they have 
    731     (at least one) common majority class. If so, they can be pruned. 
     688    The pruning works from leaves to the root. 
     689    If siblings have (at least one) common majority class, they can be pruned. 
    732690 
    733691.. class:: Pruner_m 
     
    745703================= 
    746704 
    747 The tree printing functions are very flexible. They can print 
    748 out practically anything, from the number of instances, proportion 
    749 of instances of majority class in nodes and similar, to more complex 
    750 statistics like the proportion of instances in a particular class divided 
    751 by the proportion of instances of this class in a parent node. Users 
    752 may also pass their own functions to print certain elements. 
     705The tree printing functions are very flexible. They can print, for 
     706example, numbers of instances, proportions of majority class in nodes 
     707and similar, or more complex statistics like the proportion of instances 
     708in a particular class divided by the proportion of instances of this 
     709class in a parent node. Users may also pass their own functions to print 
     710certain elements. 
    753711 
    754712The easiest way to print the tree is to call :func:`TreeClassifier.dump` 
     
    788746number of instances in parent node. Precision formatting can be added, 
    789747e.g. ``%6.2NbP``. ``bA`` denotes division by the same quantity over the entire 
    790 data set, so ``%NbA`` will tell you the proportion of instaces (out 
     748data set, so ``%NbA`` will tell you the proportion of instances (out 
    791749of the entire training data set) that fell into that node. If division is 
    792750impossible since the parent node does not exist or some data is missing, 
     
    897855``bP`` is replaced with ``bA``. 
    898856 
    899 To print the 
    900 number of versicolors in each node, together with the proportion of 
    901 versicolors among the instances in this particular node and among all 
    902 versicolors use the following format string:: 
     857To print the number of versicolors in each node, together with the 
     858proportion of versicolors among the instances in this particular node 
     859and among all versicolors, use the following:: 
    903860 
    '%C="Iris-versicolor" (%^c="Iris-versicolor"% of node, %^CbA="Iris-versicolor"% of versicolors)' 
    905862 
    906 It gives the following output:: 
     863It gives:: 
    907864 
    908865    petal width<0.800: 0.000 (0% of node, 0% of versicolors) 
     
    940897    |    |    petal length>=4.850: [0.00, 0.00, 1.00] 
    941898 
    942 The most trivial format string for internal nodes is to for printing 
    943 the prediction at each node. ``.`` in the following example specifies 
     899The most trivial format string for internal nodes is for printing 
     900node predictions. ``.`` in the following example specifies 
    944901that the ``node_str`` should be the same as ``leaf_str``. 
    945902 
     
    960917    |    |    |    petal length>=4.850: Iris-virginica 
    961918 
    962 There appeared a node called *root* and the tree looks one level 
    963 deeper. This is needed to print out the data for that node to. 
     919A node *root* has appeared and the tree looks one level 
     920deeper. This is needed to also print the data for the tree root. 
    964921 
    965922To observe how the number 
    966 of virginicas decreases down the tree use:: 
     923of virginicas decreases down the tree, try:: 
    967924 
    968925    print tree.dump(leaf_str='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)', node_str='.') 
    969926 
    970 The interpretation: ``CbA="Iris-virginica"`` is  
     927Interpretation: ``CbA="Iris-virginica"`` is  
    971928the number of instances from virginica, divided by the total number 
    972929of instances in this class. Add ``^.1`` and the result will be 
    973930multiplied by 100 and printed with one decimal. The trailing ``%`` is printed 
    974 out. In parentheses we print the same thing except that we divide by 
    975 the instances in the parent node. Note the use of single quotes, so we 
    976 can use the double quotes inside the string to specify the class. 
     931out. In parentheses, the same quantity is divided by 
     932the instances in the parent node. The format string is in single quotes, so 
     933that double quotes inside it can specify the class. 
    977934 
    978935:: 
     
    991948because the root has no parent, it prints out a dot. 
    992949 
    993 For the final example with classification trees, we shall print the 
    994 distributions in nodes, the distribution compared to the parent and the 
    995 proportions compared to the parent.  In the leaves we shall also add 
    996 the predicted class:: 
     950The final example with classification trees prints the distributions in 
     951nodes, the distribution compared to the parent, the proportions compared 
     952to the parent and the predicted class in the leaves:: 
    997953 
    998954    >>> print tree.dump(leaf_str='"%V   %D %.2DbP %.2dbP"', node_str='"%D %.2DbP %.2dbP"') 
     
    1015971.. rubric:: Examples on regression trees 
    1016972 
    1017 The regression trees examples use a tree 
    1018 induced from the housing data set. Without other argumets,  
    1019 :meth:`TreeClassifier.dump` prints the following:: 
     973The regression tree examples use a tree induced from the housing data 
     974set. Without other arguments, :meth:`TreeClassifier.dump` prints the 
     975following:: 
    1020976 
    1021977    RM<6.941 
     
    10621018    |    |    |        [SE: 0.000]   21.9 [21.900-21.900] 
    10631019 
    1064 The predicted value (``%V``) and the average (``%A``) may 
    1065 differ becase a regression tree does not always predict the 
    1066 leaf average, but whatever the 
    1067 :obj:`~Node.node_classifier` in a leaf returns. 
    1068 As ``%V`` uses the :obj:`Orange.data.variable.Continuous`' function 
    1069 for printing out the value, the number has the same 
    1070 number of decimals as in the data file. 
     1020The predicted value (``%V``) and the average (``%A``) may differ because 
     1021a regression tree does not always predict the leaf average, but whatever 
     1022the :obj:`~Node.node_classifier` in a leaf returns. As ``%V`` uses the 
     1023formatting of :obj:`Orange.data.variable.Continuous` for printing the 
     1024value, the number has the same number of decimals as in the data file. 
    10711025 
    10721026Regression trees cannot print the distributions in the same way 
     
    11021056    >>> print tree.dump(leaf_str="%C![20,22] (%^cbP![20,22]%)", node_str=".") 
    11031057 
    1104 The format string  ``%c![20, 22]``  
    1105 denotes the proportion of instances (within the node) whose values 
    1106 are below 20 or above 22. ``%cbP![20, 22]`` derives  
    1107 same statistics computed on the parent. A ``^`` is added for  
    1108 percentages. 
     1058The format string ``%c![20, 22]`` denotes the proportion of instances 
     1059(within the node) whose values are below 20 or above 22. 
     1060``%cbP![20, 22]`` derives the same statistics computed on the parent. 
     1061A ``^`` is added for percentages. 
    11091062 
    11101063:: 
     
    11301083------------------------- 
    11311084 
    1132 :meth:`TreeClassifier.dump`'s argument :obj:`user_formats` 
    1133 can be used to print out some other information. 
    1134 :obj:`~TreeClassifier.dump.user_formats` should contain a list of tuples 
    1135 with a regular expression and a function to be called when that expression 
    1136 is found in the format string. Expressions from :obj:`user_formats` 
    1137 are checked before the built-in expressions discussed above. 
    1138  
    1139 The regular expression should describe a string like were used above, 
     1085:meth:`TreeClassifier.dump`'s argument :obj:`user_formats` can be used to 
     1086print other information.  :obj:`~TreeClassifier.dump.user_formats` should 
     1087contain a list of tuples with a regular expression and a function to be 
     1088called when that expression is found in the format string. Expressions 
     1089from :obj:`user_formats` are checked before the built-in expressions 
     1090discussed above. 
     1091 
     1092The regular expression should describe a string like those used above, 
    11401093for instance ``%.2DbP``. When a leaf or internal node 
    11411094is printed, the format string (:obj:`leaf_str` or :obj:`node_str`) 
     
    11771130        return insert_str(strg, mo, str(node.node_classifier.default_value)) 
    11781131 
    1179 It therefore takes the value predicted at the node 
     1132``replaceV`` takes the value predicted at the node 
    11801133(``node.node_classifier.default_value`` ), converts it to a string 
    11811134and passes it to :func:`insert_str`. 
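As a further hedged illustration, a user format that prints the number of
branches at a node (the callback signature ``(strg, mo, node, parent, tree)``
is inferred from the ``replaceV`` fragment above; the ``%B`` marker and the
helper wiring are inventions of this sketch)::

    import re
    import Orange
    from Orange.classification.tree import insert_str

    def replace_branches(strg, mo, node, parent, tree):
        # substitute the hypothetical "%B" marker with the branch count
        return insert_str(strg, mo, str(len(node.branches or [])))

    data = Orange.data.Table("iris")
    tree = Orange.classification.tree.TreeLearner(data)
    print tree.dump(leaf_str="%V", node_str="%V (%B)",
                    user_formats=[(re.compile("%B"), replace_branches)])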
     
    12361189=========================== 
    12371190 
    1238 C4.5 is incorporated in Orange because it is a standard benchmark in 
    1239 machine learning. The implementation uses the original C4.5 code, so the 
    1240 resulting tree is exactly like the one that would be build by standalone 
    1241 C4.5. The built tree is made accessible in Python. 
     1191C4.5 is incorporated in Orange as a standard benchmark in machine 
     1192learning. The implementation uses the original C4.5 code, so the resulting 
     1193tree is exactly like the one that would be built by standalone C4.5. The 
     1194built tree is then made accessible in Python. 
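A minimal hedged sketch (building :class:`C45Learner` requires the original
C4.5 sources compiled into Orange; the voting data set is an assumption)::

    import Orange

    data = Orange.data.Table("voting")
    c45 = Orange.classification.tree.C45Learner(data)
    print c45(data[0])  # classify the first instance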
    12421195 
    12431196:class:`C45Learner` and :class:`C45Classifier` behave 