Timestamp:
03/14/11 16:44:55
Author:
markotoplak
Branch:
default
Convert:
94f71abddc9e24ed8c303352eb9a23d82e5b2c79
Message:

Tree documentation. Renamed TreeLearnerBase to _TreeLearner.

File:
1 edited

  • orange/Orange/classification/tree.py

    r7737 r7738  
    191191.. _treelearner.py: code/treelearner.py 
    192192 
    193 Let us construct a :obj:`TreeLearner` to play with (treelearner.py`_, uses `lenses.tab`_): 
     193Let us construct a :obj:`TreeLearner` to play with (`treelearner.py`_, uses `lenses.tab`_): 
    194194 
    195195.. literalinclude:: code/treelearner.py 
    196196   :lines: 7-10 
    197197 
    198 There are three crucial components in learning: the split and stop 
    199 criteria, and the :obj:`exampleSplitter` YY (there are some others, 
     198There are three crucial components in learning: the :obj:`~TreeLearner.split` and 
     199:obj:`~TreeLearner.stop` 
     200criteria, and the example :obj:`~TreeLearner.splitter` (there are some others, 
    200201which become important during classification; we'll talk about them 
    201202later). They are not defined; if you use the learner, the slots are 
     
    212213 
    213214Stopping criteria 
    214 ----------------- 
     215================= 
    215216 
    216217The stop is trivial. The default is set by 
     
    250251 
    251252 
    252 Printing the Tree 
     253================================= 
     254Learner and Classifier Components 
     255================================= 
     256 
     257Classification trees are represented as a tree-like hierarchy of 
     258:obj:`Node` classes. 
     259 
     260Several classes described below are already functional and can 
     261(and mostly will) be used as they are: :obj:`Node`, 
     262:obj:`_TreeLearner` and :obj:`TreeClassifier`.  
     263Classes :obj:`SplitConstructor`, :obj:`StopCriteria`,  
     264:obj:`ExampleSplitter`, :obj:`Descender`  
     265are among the Orange (C++ implemented)  
     266classes that can be subtyped  
     267in Python. You can thus program your own components based on 
     268these classes. 
     269 
     270.. class:: Node 
     271 
     272    Node stores information about the learning examples belonging  
     273    to the node, a branch selector, a list of branches (if the node is  
     274    not a leaf) with their descriptions and strengths, and a classifier. 
     275 
     276    .. attribute:: distribution 
     277     
     278        Stores a distribution for learning examples belonging to the node. 
     279        Storing distributions can be disabled by setting the  
     280        :obj:`_TreeLearner`'s storeDistributions flag to false. 
     281 
     282    .. attribute:: contingency 
     283 
     284        Stores complete contingency matrices for the learning examples  
     285        belonging to the node. Storing contingencies can be enabled by  
     286        setting :obj:`_TreeLearner`'s :obj:`storeContingencies`  
     287        flag to true. Note that even when the flag is not  
     288        set, the contingencies get computed and stored to  
     289        :obj:`Node`, but are removed shortly afterwards.  
     290        The details are given in the  
     291        description of the :obj:`_TreeLearner` object. 
     292 
     293    .. attribute:: examples, weightID 
     294 
     295        Store a set of learning examples for the node and the 
     296        corresponding ID of the weight meta attribute. The root of the 
     297        tree stores a "master table" of examples, while other nodes' 
     298        :obj:`Orange.data.Table` contain reference to examples in 
     299        the root's :obj:`Orange.data.Table`. Examples are only stored 
     300        if a corresponding flag (:obj:`storeExamples`) has been 
     301        set while building the tree; to conserve the space, storing 
     302        is disabled by default. 
     303 
     304    .. attribute:: nodeClassifier 
     305 
     306        A classifier (usually, but not necessarily, a 
     307        :obj:`DefaultClassifier`) that can be used to classify 
     308        examples coming to the node. If the node is a leaf, this is 
     309        used to decide the final class (or class distribution) of an 
     310        example. If it's an internal node, it is stored if 
     311        :obj:`_TreeLearner`'s flag :obj:`storeNodeClassifier` 
     312        is set. Since the :obj:`nodeClassifier` is needed by 
     313        :obj:`Descender` and for pruning (see far below), 
     314        this is the default behaviour; space consumption of the default 
     315        :obj:`DefaultClassifier` is rather small. You should 
     316        never disable this if you intend to prune the tree later. 
     317 
     318    If the node is a leaf, the remaining fields are None.  
     319    If it's an internal node, there are several additional fields. 
     320 
     321    .. attribute:: branches 
     322 
     323        Stores a list of subtrees, given as :obj:`Node` objects. 
     324        An element can be None; in this case the node is empty. 
     325 
     326    .. attribute:: branchDescriptions 
     327 
     328        A list with string descriptions for branches, constructed by 
     329        :obj:`SplitConstructor`. It can contain different kinds 
     330        of descriptions, but basically, expect things like 'red' or '>12.3'. 
     331 
     332    .. attribute:: branchSizes 
     333 
     334        Gives a (weighted) number of training examples that went into 
     335        each branch. This can be used later, for instance, for 
     336        modeling probabilities when classifying examples with 
     337        unknown values. 
     338 
     339    .. attribute:: branchSelector 
     340 
     341        Gives a branch for each example. The same object is used during 
     342        learning and classifying. The :obj:`branchSelector` is of 
     343        type :obj:`orange.Classifier`, since its job is similar to that 
     344        of a classifier: it gets an example and returns discrete 
     345        :obj:`Orange.data.Value` in range :samp:`[0, len(branches)-1]`. 
     346        When an example cannot be classified to any branch, the selector 
     347        can return a :obj:`Orange.data.Value` containing a special value 
     348        (sVal) which should be a discrete distribution 
     349        (DiscDistribution). This should represent a 
     350        :obj:`branchSelector`'s opinion of how to divide the 
     351        example between the branches. Whether the proposition will be 
     352        used or not depends upon the chosen :obj:`ExampleSplitter` 
     353        (when learning) or :obj:`Descender` (when classifying). 
     354 
     355    The lists :obj:`branches`, :obj:`branchDescriptions` and 
     356    :obj:`branchSizes` are of the same length; all of them are 
     357    defined if the node is internal and none if it is a leaf. 
     358 
     359    .. method:: treeSize() 
     360         
     361        Return the number of nodes in the subtrees (including the 
     362        node, excluding null-nodes). 
     363 
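    Because these are ordinary Python-accessible slots, a tree can be
    inspected with a short recursive function. A minimal sketch
    (illustrative only; ``tree_classifier`` stands for a classifier built
    by :class:`TreeLearner`, whose root :obj:`Node` sits in its
    :obj:`tree` attribute, as described below)::

        def dump_node(node, level=0):
            indent = "    " * level
            if not node.branches:               # a leaf
                print indent + str(node.distribution)
                return
            for desc, branch in zip(node.branchDescriptions, node.branches):
                print indent + desc
                if branch is not None:          # empty branches are None
                    dump_node(branch, level + 1)

        # dump_node(tree_classifier.tree)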
     364 
     365.. class:: _TreeClassifier 
     366 
     367    Classifies examples according to a tree stored in :obj:`tree`. 
     368    Not meant to be used directly. The :class:`TreeLearner` class 
     369    constructs a :class:`TreeClassifier`.  
     370 
     371    .. attribute:: tree 
     372 
     373        The root of the tree, represented as a :class:`Node`. 
     374     
     375    Classification would be straightforward if there were no unknown  
     376    values or, in general, examples that cannot be placed into a  
     377    single branch. The response in such cases is determined by a 
     378    component :obj:`descender`. 
     379 
     380    :obj:`Descender` is an abstract object which is given an example 
     381    and whose basic job is to descend as far down the tree as possible, 
     382    according to the values of example's attributes. The 
     383    :obj:`Descender` calls the node's :obj:`branchSelector` to get  
     384    the branch index. If it's a simple index, the corresponding branch  
     385    is followed. If not, it's up to descender to decide what to do, and 
     386    that's where descenders differ. A :obj:`descender` can choose  
     387    a single branch (for instance, the one that is the most recommended  
     388    by the :obj:`branchSelector`) or it can let the branches vote. 
     389 
     390    In general there are three possible outcomes of a descent. 
     391 
     392    #. Descender reaches a leaf. This happens when nothing went wrong  
     393       (there are no unknown or out-of-range values in the example) or  
     394       when things went wrong, but the descender smoothed them by  
     395       selecting a single branch and continued the descent. In this 
     396       case, the descender returns the reached :obj:`Node`. 
     397    #. :obj:`branchSelector` returned a distribution and the  
     398       :obj:`Descender` decided to stop the descent at this  
     399       (internal) node.  Again, descender returns the current  
     400       :obj:`Node` and nothing else. 
     401    #. :obj:`branchSelector` returned a distribution and the  
     402       :obj:`Node` wants to split the example (i.e., to decide the  
     403       class by voting).  
     404 
     405    In the third case, the descender returns a :obj:`Node` and the vote weights for the branches.  
     406    The weights can correspond to the distribution returned by 
     407    :obj:`branchSelector`, to the number of learning examples that 
     408    were assigned to each branch, or to something else. 
     409 
     410    :obj:`TreeClassifier` uses the descender to descend from the root.  
     411    If it returns only a :obj:`Node` and no distribution, the  
     412    descent should stop; it does not matter whether it's a leaf (the 
     413    first case above) or an internal node (the second case). The node's 
     414    :obj:`nodeClassifier` is used to decide the class. If the descender 
     415    returns a :obj:`Node` and a distribution, the :obj:`TreeClassifier` 
     416    recursively calls itself for each of the subtrees and the  
     417    predictions are weighted as requested by the descender. 
     418 
     419    When voting, subtrees do not predict the class but probabilities  
     420    of classes. The predictions are multiplied by weights, summed and  
     421    the most probable class is returned. 
     422 
     423    .. method:: vote() 
     424     
     425        It gets a  
     426        :obj:`Node`, an :obj:`Orange.data.Instance` and a distribution of  
     427        vote weights. For each branch, it calls 
     428        :obj:`classDistribution`, multiplies the result by the branch's 
     429        vote weight and sums everything up. :obj:`vote` returns a normalized  
     430        distribution of predictions. 
     431 
     432    .. method:: classDistribution() 
     433 
     434        Gets 
     435        an additional parameter, a :obj:`Node` (default tree root). 
     436        :obj:`classDistribution` uses :obj:`descender`. If descender  
     437        reaches a leaf, it calls  
     438        :obj:`nodeClassifier`, otherwise it calls :obj:`vote`. 
     439 
     440        Thus, the :obj:`vote` and  
     441        :obj:`classDistribution` are written in a form of double  
     442        recursion. The recursive calls do not happen at each node of the  
     443        tree but only at nodes where a vote is needed (that is, at nodes  
     444        where the descender halts). 
     445 
     446    .. method:: __call__ 
     447 
     448        Calls the 
     449        descender. If it reaches a leaf, the class is predicted by the  
     450        leaf's :obj:`nodeClassifier`. Otherwise, it calls  
     451        :obj:`vote`. From now on, :obj:`vote` and  
     452        :obj:`classDistribution` interweave down the tree and return  
     453        a distribution of predictions. This method simply  
     454        chooses the most probable class. 
     455 
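    The double recursion can be sketched in pure Python (a simplified
    illustration of the logic above, not the actual C++ implementation;
    it assumes the classic :obj:`orange.GetProbabilities` calling flag
    and list-like distributions)::

        import orange

        def class_distribution(descender, node, example):
            node, vote_weights = descender(node, example)
            if vote_weights is None:
                # a leaf, or an internal node where the descent stopped:
                # the node's own classifier decides
                return list(node.nodeClassifier(example, orange.GetProbabilities))
            # otherwise the subtrees vote, weighted by vote_weights
            votes = None
            for weight, branch in zip(vote_weights, node.branches):
                if branch is None:
                    continue
                dist = class_distribution(descender, branch, example)
                scaled = [weight * p for p in dist]
                votes = scaled if votes is None else \
                    [a + b for a, b in zip(votes, scaled)]
            total = sum(votes)
            return [p / total for p in votes]
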
     456.. class:: _TreeLearner 
     457 
     458    The main learning object is :obj:`_TreeLearner`. It is basically  
     459    a skeleton into which the user must plug the components for particular  
     460    functions. This class is not meant to be used directly. You should 
     461    rather use :class:`TreeLearner`. 
     462 
     463    Components that govern the structure of the tree are :obj:`split` 
     464    (of type :obj:`SplitConstructor`), :obj:`stop` (of  
     465    type :obj:`StopCriteria`) and :obj:`splitter` 
     466    (of type :obj:`ExampleSplitter`). 
     467 
     468    .. attribute:: split 
     469 
     470        Object of type :obj:`SplitConstructor`. Default value,  
     471        provided by :obj:`_TreeLearner`, is :obj:`SplitConstructor_Combined` 
     472        with separate constructors for discrete and continuous attributes.  
     473        Discrete attributes are used as they are, while continuous attributes 
     474        are binarized. Gain ratio is used to select attributes. 
     475        A minimum of two examples in a leaf is required for discrete 
     476        attributes and five for continuous attributes. 
     477     
     478    .. attribute:: stop 
     479 
     480        Object of type :obj:`StopCriteria`. The default stopping 
     481        criterion stops induction when all examples in a node belong  
     482        to the same class. 
     483 
     484    .. attribute:: splitter 
     485 
     486        Object of type :obj:`ExampleSplitter`. The default splitter 
     487        is :obj:`ExampleSplitter_UnknownsAsSelector` that splits 
     488        the learning examples according to distributions given by the 
     489        selector. 
     490 
     491    .. attribute:: contingencyComputer 
     492     
     493        By default, this slot is left empty and ordinary contingency 
     494        matrices are computed for examples at each node. If need arises, 
     495        one can change the way the matrices are computed. This can be 
     496        used to change the way that unknown values are treated when 
     497        assessing qualities of attributes. As mentioned earlier, 
     498        the computed matrices can be used by split constructor and by 
     499        stopping criteria. On the other hand, they can be (and are) 
     500        ignored by some split constructors. 
     501 
     502    .. attribute:: nodeLearner 
     503 
     504        Induces a classifier from examples belonging to a node. The 
     505        same learner is used for internal nodes and for leaves. The 
     506        default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`. 
     507 
     508    .. attribute:: descender 
     509 
     510        Descending component that the induced :obj:`TreeClassifier` 
     511        will use. Default descender is  
     512        :obj:`Descender_UnknownMergeAsSelector` which votes using  
     513        the :obj:`branchSelector`'s distribution for vote weights. 
     514 
     515    .. attribute:: maxDepth 
     516 
     517        Gives maximal tree depth; 0 means that only root is generated.  
     518        The default is 100 to prevent any infinite tree induction due 
     519        to a misconfigured stop criterion. If you are sure you need 
     520        larger trees, increase it. If you, on the other hand, want 
     521        to lower this hard limit, you can do so as well. 
     522 
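    A sketch of setting these limits on the public :class:`TreeLearner`
    (attribute names follow this documentation; treat their exact
    spelling in your Orange version as an assumption)::

        import Orange

        data = Orange.data.Table("lenses")
        learner = Orange.classification.tree.TreeLearner()
        learner.maxDepth = 5            # never grow deeper than five levels
        learner.storeExamples = True    # keep example subsets in the nodes
        tree = learner(data)
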
     523    .. attribute:: storeDistributions, storeContingencies, storeExamples, storeNodeClassifier 
     524 
     525        Decides whether to store class distributions, contingencies  
     526        and examples in :obj:`Node`, and whether the  
     527        :obj:`nodeClassifier` should be built for internal nodes.  
     528        By default, distributions and node classifiers are stored,  
     529        while contingencies and examples are not. You won't save any  
     530        memory by not storing distributions but storing contingencies, 
     531        since the distribution actually points to the same object 
     532        that is stored in :obj:`contingency.classes`. 
     533 
     534    The :obj:`_TreeLearner` first sets the defaults for missing 
     535    components. Although stored in the actual :obj:`_TreeLearner`'s 
     536    fields, they are removed when the induction is finished. 
     537 
     538    Then it ensures that examples are stored in a table. This is needed 
     539    because the algorithm juggles with pointers to examples. If 
     540    examples are in a file or are fed through a filter, they are copied 
     541    to a table. Even if they are already in a table, they are copied 
     542    if :obj:`storeExamples` is set. This is to assure that pointers 
     543    remain pointing to examples even if the user later changes the 
     544    example table. If they are in the table and the :obj:`storeExamples` 
     545    flag is clear, we just use them as they are. This will obviously 
     546    crash in a multi-threaded system if one changes the table during 
     547    the tree induction. Well... don't do it. 
     548 
     549    Apriori class probabilities are computed. At this point we check 
     550    the sum of example weights; if it's zero, there are no examples and  
     551    we cannot proceed. A list of candidate attributes is set; in the 
     552    beginning, all attributes are candidates for the split criterion. 
     553 
     554    Now comes the recursive part of the :obj:`_TreeLearner`. Its arguments  
     555    are a set of examples, a weight meta-attribute ID (a tricky thing, 
     556    it can be always the same as the original or can change to  
     557        accommodate splitting of examples among branches), apriori class 
     558    distribution and a list of candidates (represented as a vector 
     559    of Boolean values). 
     560 
     561    The contingency matrix is computed next. This happens 
     562    even if the flag :obj:`storeContingencies` is false. 
     563    If the :obj:`contingencyComputer` is given we use it, 
     564    otherwise we construct just an ordinary contingency matrix. 
     565 
     566    A :obj:`stop` is called to see whether it is worth continuing. If  
     567    not, a :obj:`nodeClassifier` is built and the :obj:`Node` is  
     568    returned. Otherwise, a :obj:`nodeClassifier` is only built if  
     569    :obj:`forceNodeClassifier` flag is set. 
     570 
     571    To get a :obj:`Node`'s :obj:`nodeClassifier`, the  
     572    :obj:`nodeLearner`'s :obj:`smartLearn` function is called with  
     573    the given examples, weight ID and the just computed matrix. If  
     574    the learner can use the matrix (and the default,  
     575    :obj:`Orange.classification.majority.MajorityLearner`, can), it won't touch the examples. Thus, 
     576    a choice of :obj:`contingencyComputer` will, in many cases,  
     577    affect the :obj:`nodeClassifier`. The :obj:`nodeLearner` can 
     578    return no classifier; if so and if the classifier would be  
     579    needed for classification, the :obj:`TreeClassifier`'s function 
     580    returns DK (don't know) or an empty distribution. If you are writing 
     581    your own tree classifier, pay attention to this. 
     582 
     583    If the induction is to continue, a :obj:`split` component is called.  
     584    If it fails to return a branch selector, induction stops and the  
     585    :obj:`Node` is returned. 
     586 
     587    :obj:`_TreeLearner` then uses :obj:`ExampleSplitter` to divide  
     588    the examples as described above. 
     589 
     590    The contingency gets removed at this point if it is not to be  
     591    stored. Thus, the :obj:`split`, :obj:`stop` and  
     592    :obj:`exampleSplitter` can use the contingency matrices if they wish. 
     593 
     594    The :obj:`_TreeLearner` then recursively calls itself for each of  
     595    the non-empty subsets. If the splitter returns a list of weights,  
     596    a corresponding weight is used for each branch. Besides, the  
     597    attribute spent by the split constructor (if any) is removed from the  
     598    list of candidates for the subtree. 
     599 
     600    A subset of examples is stored in its corresponding tree node,  
     601    if so requested. If not, the new weight attributes are removed (if  
     602    any were created). 
     603 
     604 
     605Split constructors 
     606===================== 
     607 
     608Split construction is almost as exciting as waiting for a delayed flight. 
     609Boring, that is. Split constructors juggle  
     610with contingency matrices, with separate cases for discrete 
     611and continuous classes... Most split constructors work either for 
     612discrete or for continuous attributes. We suggest using  
     613a :obj:`SplitConstructor_Combined` that delegates  
     614attributes to specialized split constructors. 
     615 
     616Split constructors that cannot handle attributes of particular 
     617type (discrete, continuous) do not report an error or a warning but 
     618simply skip the attribute. It is your responsibility to use a correct 
     619split constructor for your dataset. (May we again suggest 
     620using :obj:`SplitConstructor_Combined`?) 
     621 
     622The same components can be used for inducing both classification and 
     623regression trees. The only component that needs to be chosen accordingly 
     624is the 'measure' attribute for the :obj:`SplitConstructor_Measure` 
     625class (and derived classes). 
     626 
     627.. class:: SplitConstructor 
     628 
     629    Finds a suitable criterion for dividing the learning (and later testing) 
     630    examples coming to the node. The data it gets is a set of examples 
     631    (and, optionally, an ID of weight meta-attribute), a domain 
     632    contingency computed from examples, apriori class probabilities, 
     633    a list of candidate attributes it should consider and a node 
     634    classifier (if it was constructed, that is, if  
     635    :obj:`storeNodeClassifier` is left true). 
     636 
     637    The :obj:`SplitConstructor` should use the domain contingency 
     638    when possible. The reasons are two-fold; one is that it's faster 
     639    and the other is that the contingency matrices are not necessarily 
     640    constructed by simply counting the examples. Why and how is 
     641    explained later. There are, however, cases when the domain contingency 
     642    does not suffice; for example, when ReliefF is used as a measure 
     643    of quality of attributes. In this case, there's no other way but 
     644    to use the examples and ignore the precomputed contingencies. 
     645 
     646    :obj:`SplitConstructor` returns most of the data we talked 
     647    about when describing the :obj:`Node`. It returns a classifier 
     648    to be used as :obj:`Node`'s :obj:`branchSelector`, a list of 
     649    branch descriptions and a list with the number of examples that 
     650    go into each branch. Just what we need for the :obj:`Node`. 
     651    It can return an empty list for the number of examples in branches; 
     652    in this case, the :obj:`_TreeLearner` will find the number itself 
     653    after splitting the example set into subsets. However, if a split 
     654    constructor can provide the numbers at no extra computational 
     655    cost, it should do so. 
     656 
     657    In addition, it returns the quality of the split: a number without 
     658    any fixed meaning except that higher numbers mean better splits. 
     659 
     660    If the constructed splitting criterion uses an attribute in such 
     661    a way that the attribute is 'completely spent' and should not be 
     662    considered as a split criterion in any of the subtrees (the 
     663    typical case of this are discrete attributes that are used 
     664    as-they-are, that is, without any binarization or subsetting), 
     665    then it should report the index of this attribute. Some splits 
     666    do not spend any attribute; this is indicated by returning a 
     667    negative index. 
     668 
     669    A :obj:`SplitConstructor` can veto the further tree induction 
     670    by returning no classifier. This can happen for many reasons. 
     671    A general one is related to the number of examples in the branches. 
     672    :obj:`SplitConstructor` has a field :obj:`minSubset`, 
     673    which sets the minimal number of examples in a branch; null nodes, 
     674    however, are allowed. If there is no split where this condition 
     675    is met, :obj:`SplitConstructor` stops the induction. 
     676 
     677    .. attribute:: minSubset 
     678 
     679        Sets the minimal number of examples in non-null leaves. As 
     680        always in Orange (where not specified otherwise), "number of  
     681        examples" refers to the weighted number of examples. 
     682     
     683    .. method:: __call__(examples[, weightID=0, apriori_distribution, candidates])  
     684 
     685        Construct a split. Returns a tuple (:obj:`branchSelector`, 
     686        :obj:`branchDescriptions`, :obj:`subsetSizes`, :obj:`quality`, 
     687        :obj:`spentAttribute`). :obj:`spentAttribute` is -1 if no 
     688        attribute is completely spent by the split criterion. If no 
     689        split is constructed, the :obj:`selector`, :obj:`branchDescriptions` 
     690        and :obj:`subsetSizes` are None, while :obj:`quality` is 0.0 and 
     691        :obj:`spentAttribute` is -1. 
     692 
     693        :param examples:  Examples can be given in any acceptable form 
     694            (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a 
     695            list of examples). 
     696        :param weightID: Optional; the default of 0 means that all 
     697            examples have a weight of 1.0.  
     698        :param apriori_distribution: Apriori class distribution; should 
     699            be of type :obj:`orange.Distribution`. 
     701        :param candidates: The split constructor should consider only  
     702            the attributes in the candidate list (one boolean for each 
     703            attribute). 
     704 
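    As a sketch, a split constructor can also be invoked as a standalone
    component (the exact calling convention and defaults are assumptions
    based on the signature above)::

        import Orange

        data = Orange.data.Table("lenses")
        split = Orange.classification.tree.SplitConstructor_Attribute()
        split.measure = Orange.feature.scoring.GainRatio()
        selector, descriptions, sizes, quality, spent = split(data)
        print descriptions, quality, spent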
     705 
     706.. class:: SplitConstructor_Measure 
     707 
     708    Bases: :class:`SplitConstructor` 
     709 
     710    An abstract base class for split constructors that employ  
     711    a :class:`Orange.feature.scoring.Measure` to assess the quality of a split.  
     712    All split constructors except for :obj:`SplitConstructor_Combined` 
     713    are derived from this class. 
     714 
     715    .. attribute:: measure 
     716 
     717        A component of type :class:`Orange.feature.scoring.Measure` used for 
     718        split evaluation. You must select a  
     719        :class:`Orange.feature.scoring.Measure` capable of handling your  
     720        class type  
     721        - for example, you cannot use :class:`Orange.feature.scoring.GainRatio` 
     722        for building regression trees or :class:`Orange.feature.scoring.MSE` 
     723        for classification trees. 
     724 
     725    .. attribute:: worstAcceptable 
     726 
     727        The lowest required split quality for a split to be acceptable. 
     728        Note that this value makes sense only in connection with a 
     729        :obj:`measure` component. Default is 0.0. 
     730 
     731.. class:: SplitConstructor_Attribute 
     732 
     733    Bases: :class:`SplitConstructor_Measure` 
     734 
     735    Attempts to use a discrete attribute as a split; each value of the  
     736    attribute corresponds to a branch in the tree. Attributes are 
     737    evaluated with the :obj:`measure` and the one with the 
     738    highest score is used for a split. If there is more than one 
     739    attribute with the highest score, one of them is selected at random. 
     740 
     741    The constructed :obj:`branchSelector` is an instance of  
     742    :obj:`orange.ClassifierFromVarFD` that returns a value of the  
     743    selected attribute. If the attribute is  
     744    :obj:`Orange.data.variable.Discrete`, 
     745    :obj:`branchDescriptions` are the attribute's values. The  
     746    attribute is marked as spent, so that it cannot reappear in the  
     747    node's subtrees. 
     748 
     749.. class:: SplitConstructor_ExhaustiveBinary 
     750 
     751    Bases: :class:`SplitConstructor_Measure` 
     752 
     753    Works on discrete attributes. For each attribute, it determines 
     754    which binarization of the attribute gives the split with the 
     755    highest score. If more than one split has the highest score, 
     756    one of them is selected at random. After trying all the attributes, 
     757    it returns one of those with the highest score. 
     758 
     759    The constructed :obj:`branchSelector` is again an instance 
     760    :obj:`orange.ClassifierFromVarFD` that returns a value of the 
     761    selected attribute. This time, however, its :obj:`transformer` 
     762    contains an instance of :obj:`MapIntValue` that maps the values 
     763    of the attribute into a binary attribute. Branch descriptions are 
     764    of form "[<val1>, <val2>, ...<valn>]" for branches corresponding to 
     765    more than one value of the attribute. Branches that correspond to 
     766    a single value of the attribute are described with this value. If  
     767    the attribute was originally binary, it is spent and cannot be  
     768    used in the node's subtrees. Otherwise, it can reappear in the  
     769    subtrees. 
     770 
     771 
     772.. class:: SplitConstructor_Threshold 
     773 
     774    Bases: :class:`SplitConstructor_Measure` 
     775 
     776    This is currently the only constructor for splits with continuous  
     777    attributes. It divides the range of the attribute's values with a threshold  
     778    that maximizes the split's quality. As always, if there is more than  
     779    one split with the highest score, a random threshold is selected.  
     780    The attribute that yields the highest-scoring binary split is returned. 
     781 
     782    The constructed :obj:`branchSelector` is again an instance of  
     783    :obj:`orange.ClassifierFromVarFD` with an attached  
     784    :obj:`transformer`. This time, :obj:`transformer` is of type  
     785    :obj:`orange.ThresholdDiscretizer`. The branch descriptions are  
     786    "<threshold" and ">=threshold". The attribute is not spent. 
     787 
     788.. class:: SplitConstructor_OneAgainstOthers 
     789     
     790    Bases: :class:`SplitConstructor_Measure` 
     791 
     792    Undocumented. 
     793 
     794.. class:: SplitConstructor_Combined 
     795 
     796    Bases: :class:`SplitConstructor` 
     797 
     798    This constructor delegates the task of finding the optimal split  
     799    to separate split constructors for discrete and for continuous 
     800    attributes. Each split constructor is called, given only attributes 
     801    of appropriate types as candidates. Both construct a candidate for 
     802    a split; the better of them is selected. 
     803 
     804    (Note that there is a problem when several candidates have the same 
     805    score. Suppose there are nine discrete attributes with the highest 
     806    score; the split constructor for discrete attributes will select 
     807    one of them. Now suppose that there is a single continuous 
     808    attribute with the same score. :obj:`SplitConstructor_Combined` 
     809    would randomly select between the proposed discrete attribute and  
     810    the continuous attribute, not aware of the fact that the discrete 
     811    attribute has already competed with eight others. The 
     812    probability of selecting (each) discrete attribute would thus be 1/18 
     813    instead of 1/10. Although not really correct, we doubt that this 
     814    would affect the tree's performance; many other machine learning 
     815    systems simply choose the first attribute with the highest score  
     816    anyway.) 
     817 
     818    The :obj:`branchSelector`, :obj:`branchDescriptions` and whether  
     819    the attribute is spent are decided by the winning split constructor. 
     820 
     821    .. attribute:: discreteSplitConstructor 
     822 
     823        Split constructor for discrete attributes; can be, for instance, 
     824        :obj:`SplitConstructor_Attribute` or  
     825        :obj:`SplitConstructor_ExhaustiveBinary`. 
     826 
     827    .. attribute:: continuousSplitConstructor 
     828 
     829        Split constructor for continuous attributes; at the moment, it  
     830        can be either :obj:`SplitConstructor_Threshold` or a  
     831        split constructor you programmed in Python. 
     832 
     838 
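    A sketch of composing the combined constructor from the specialized
    ones documented above (plugging it into :obj:`~TreeLearner.split` is
    an assumption based on this documentation)::

        from Orange.classification import tree

        combined = tree.SplitConstructor_Combined()
        combined.discreteSplitConstructor = tree.SplitConstructor_ExhaustiveBinary()
        combined.continuousSplitConstructor = tree.SplitConstructor_Threshold()

        learner = tree.TreeLearner()
        learner.split = combined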
     839 
     840StopCriteria and StopCriteria_common 
     841============================================ 
     842 
     843:obj:`StopCriteria` determines when to stop the induction of subtrees.  
     844 
     845.. class:: StopCriteria 
     846 
     847    Given a set of examples, weight ID and contingency matrices, decide 
     848    whether to continue the induction or not. The basic criterion checks 
     849    whether there are any examples and whether they belong to at least 
     850    two different classes (if the class is discrete). Derived components 
     851    check things like the number of examples and the proportion of 
     852    majority classes. 
     853 
     854    As opposed to :obj:`SplitConstructor` and similar basic classes, 
     855    :obj:`StopCriteria` is not an abstract but a fully functional 
     856    class that provides the basic stopping criteria. That is, the tree 
     857    induction stops when there is at most one example left; in this case, 
     858    it is not the weighted but the actual number of examples that counts. 
     859    Besides that, the induction stops when all examples are in the same 
     860    class (for discrete problems) or have the same value of the outcome 
     861    (for regression problems). 
     862 
     863    .. method:: __call__(examples[, weightID, domainContingencies]) 
     864 
     865        Decides whether to stop (true) or continue (false) the induction. 
     866        If contingencies are given, they are used for checking whether 
     867        the examples are in the same class (but not for counting the 
     868        examples). If contingencies are not given, :obj:`StopCriteria` 
     869        will work without them. Derived classes should use the 
     870        contingencies whenever they are available, and otherwise compute 
     871        them only when they really need them. 
     873 
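Since the criterion is a callable, a plain Python function with the same
signature can serve as a stop criterion. A sketch in the spirit of the
classic Orange examples (it extends the default criterion to also stop
on small subsets; passing a callable as :obj:`stop` is assumed to work
as in those examples)::

    import Orange

    base_stop = Orange.classification.tree.StopCriteria()

    def my_stop(examples, weightID, contingency):
        # stop where the basic criterion stops, or when few examples remain
        return base_stop(examples, weightID, contingency) or len(examples) < 5

    learner = Orange.classification.tree.TreeLearner(stop=my_stop)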
     874 
     875 
     876.. class:: StopCriteria_common 
     877 
     878    :obj:`StopCriteria_common` contains additional criteria for pre-pruning: 
     879    it checks the proportion of majority class and the number of weighted 
     880    examples. 
     881 
     882    .. attribute:: maxMajority 
     883 
     884        Maximal proportion of majority class. When this is exceeded,  
     885        induction stops. 
     886 
     887    .. attribute:: minExamples 
     888 
     889        Minimal number of examples in internal nodes. Subsets with fewer 
     890        than :obj:`minExamples` examples are not split any further. 
     891        The example count is weighted. 
     892 
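For instance (a sketch; attribute names as documented above)::

    import Orange

    stop = Orange.classification.tree.StopCriteria_common()
    stop.maxMajority = 0.8    # stop when one class reaches 80 %
    stop.minExamples = 5      # do not split subsets smaller than five
    learner = Orange.classification.tree.TreeLearner(stop=stop)
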
     893.. class:: StopCriteria_Python 
     894 
     895    Undocumented. 
     896 
     897Example Splitters 
     898================= 
     899 
     900Just like the :obj:`Descender` decides the branch for an 
     901example during classification, the :obj:`ExampleSplitter` 
     902sorts the learning examples into branches. 
     903 
     904:obj:`ExampleSplitter` is given a :obj:`Node` (from which  
     905it can use various information, but most splitters only use the  
     906:obj:`branchSelector`), a set of examples to be divided, and  
     907the weight ID. The result is a list of subsets of examples 
     908and, optionally, a list of new weight ID's. 
     909 
     910Most  
     911:obj:`ExampleSplitter` classes simply call the node's  
     912:obj:`branchSelector` and assign examples to corresponding  
     913branches. When the value is unknown they choose a particular  
     914branch or simply skip the example. 
     915 
     916Some enhanced splitters can split examples. An example (actually,  
     917a pointer to it) is copied to more than one subset. To facilitate 
     918real splitting, weights are needed. Each branch is assigned a 
     919weight ID (each would usually have its own ID) and all examples 
     920that are in that branch (either completely or partially) should 
     921have this meta attribute. If an example hasn't been split, it 
     922has only one additional attribute - with weight ID corresponding 
     923to the subset to which it went. An example that is split among, 
     924say, three subsets has three new meta attributes, one for each 
     925subset. ID's of weight meta attributes are returned by the 
     926:obj:`ExampleSplitter` to be used at induction of the 
     927corresponding subtrees. 
     928 
     929Note that weights are used only when needed. When no splitting 
     930occurred - because the splitter is not able to do it or because 
     931there was no need for splitting - no weight ID's are returned. 
     932 
     933 
     934.. class:: ExampleSplitter 
     935 
     936    An abstract base class for objects that split sets of examples into  
     937    subsets. The derived classes treat examples which 
     938    cannot be unambiguously placed into a single branch (usually due 
     939    to unknown value of the crucial attribute) differently. 
     940 
     941    .. method:: __call__(node, examples[, weightID]) 
     942         
     943        Use the information in :obj:`node` (particularly the  
     944        :obj:`branchSelector`) to split the given set of examples into subsets.  
     945        Return a tuple with a list of example generators and a list of weights.  
     946        The list of weights is either an ordinary Python list of integers,  
     947        or None when no splitting of examples occurs and thus no weights are  
     948        needed. 
     949 
     950 
     951.. class:: ExampleSplitter_IgnoreUnknowns 
     952 
     953    Bases: :class:`ExampleSplitter` 
     954 
     955    Simply ignores the examples for which no single branch can be determined. 
     956 
     957.. class:: ExampleSplitter_UnknownsToCommon 
     958 
     959    Bases: :class:`ExampleSplitter` 
     960 
     961    Places all such examples into the branch with the highest number of 
     962    examples. If there is more than one such branch, one is selected at 
     963    random and then used for all examples. 
     964 
     965.. class:: ExampleSplitter_UnknownsToAll 
     966 
     967    Bases: :class:`ExampleSplitter` 
     968 
     969    Places examples with unknown value of the attribute into all branches. 
     970 
     971.. class:: ExampleSplitter_UnknownsToRandom 
     972 
     973    Bases: :class:`ExampleSplitter` 
     974 
     975    Selects a random branch for such examples. 
     976 
     977.. class:: ExampleSplitter_UnknownsToBranch 
     978 
     979    Bases: :class:`ExampleSplitter` 
     980 
     981    Constructs an additional branch to contain all such examples.  
     982    The branch's description is "unknown". 
     983 
     984.. class:: ExampleSplitter_UnknownsAsBranchSizes 
     985 
     986    Bases: :class:`ExampleSplitter` 
     987 
     988    Splits examples with unknown value of the attribute according to  
     989    proportions of examples in each branch. 
     990 
     991.. class:: ExampleSplitter_UnknownsAsSelector 
     992 
     993    Bases: :class:`ExampleSplitter` 
     994 
     995    Splits examples with unknown value of the attribute according to  
     996    distribution proposed by selector (which is in most cases the same  
     997    as proportions of examples in branches). 
     998 
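Any of the splitters above is plugged into the learner in the same way
(a sketch; the slot name follows this documentation)::

    import Orange

    learner = Orange.classification.tree.TreeLearner()
    learner.splitter = Orange.classification.tree.ExampleSplitter_UnknownsToAll()
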
     999Descenders 
     1000============================= 
     1001 
     1002This is a classifier's counterpart for :class:`ExampleSplitter`. It  
     1003decides the destiny of examples that need to be classified and cannot 
     1004be unambiguously put in a branch. 
     1005 
     1006.. class:: Descender 
     1007 
     1008    An abstract base object for tree descenders. 
     1009 
     1010    .. method:: __call__(node, example) 
     1011 
     1012        Descends down the tree until it reaches a leaf or a node in  
     1013        which a vote of subtrees is required. In both cases, a tuple  
     1014        of two elements is returned; in the former, the tuple contains  
     1015        the reached node and None; in the latter it  
     1016        contains a node and weights of votes for subtrees (a list of floats). 
     1017 
     1018        :obj:`Descender`'s that never split examples always descend 
     1019        to a leaf, but they differ in the treatment of examples with 
     1020        unknown values (or, in general, examples for which a branch 
     1021        cannot be determined at some node(s) of the tree). 
     1022        :obj:`Descender`'s that do split examples differ in returned 
     1023        vote weights. 
     1024 
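A descender is chosen by assigning it to the learner's :obj:`descender`
slot (a sketch based on the attribute documented above)::

    import Orange

    learner = Orange.classification.tree.TreeLearner()
    learner.descender = Orange.classification.tree.Descender_UnknownsToCommonBranch()
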
     1025.. class:: Descender_UnknownsToNode 
     1026 
     1027    Bases: :obj:`Descender` 
     1028 
     1029    When an example cannot be classified into a single branch, the 
     1030    current node is returned. Thus, the node's :obj:`NodeClassifier` 
     1031    will be used to make a decision. It is your responsibility to see 
     1032    that even the internal nodes have their :obj:`NodeClassifier` 
     1033    (i.e., do not disable creating node classifiers, and do not remove 
     1034    them manually after the induction). 
     1035 
     1036.. class:: Descender_UnknownsToBranch 
     1037 
     1038    Bases: :obj:`Descender` 
     1039 
     1040    Classifies examples with unknown value to a special branch. This 
     1041    makes sense only if the tree itself was constructed with 
     1042    :obj:`ExampleSplitter_UnknownsToBranch`. 
     1043 
     1044.. class:: Descender_UnknownsToCommonBranch 
     1045 
     1046    Bases: :obj:`Descender` 
     1047 
     1048    Classifies examples with unknown values to the branch with the 
     1049    highest number of examples. If there is more than one such branch, 
     1050    a random branch is chosen for each example that is to be classified. 
     1051 
     1052.. class:: Descender_UnknownsToCommonSelector 
     1053 
     1054    Bases: :obj:`Descender` 
     1055 
     1056    Classifies examples with unknown values to the branch which received  
     1057    the highest recommendation by the selector. 
     1058 
     1059.. class:: Descender_MergeAsBranchSizes 
     1060 
     1061    Bases: :obj:`Descender` 
     1062 
     1063    Makes the subtrees vote for the example's class; the vote is 
     1064    weighted according to the sizes of the branches. 
     1065 
     1066.. class:: Descender_MergeAsSelector 
     1067 
     1068    Bases: :obj:`Descender` 
     1069 
     1070    Makes the subtrees vote for the example's class; the vote is  
     1071    weighted according to the selector's proposal. 
     1072 
     1073Pruning 
     1074======= 
     1075 
     1076.. index:: 
     1077    pair: classification trees; pruning 
     1078 
     1079Tree pruners derived from :obj:`Pruner` can be given either a 
     1080:obj:`Node` (presumably, but not necessarily a root) or a 
     1081:obj:`_TreeClassifier`. The result is a new :obj:`Node` 
     1082or a :obj:`_TreeClassifier` with a pruned tree. The original 
     1083tree remains intact. 
     1084 
     1085The pruners construct only a shallow copy of a tree. 
     1086The pruned tree's :obj:`Node` objects contain references to the same 
     1087contingency matrices, node classifiers, branch selectors, ... 
     1088as the original tree. Thus, you may modify a pruned tree structure 
     1089(manually cut it, add new nodes, replace components) but modifying, 
     1090for instance, some node's :obj:`nodeClassifier` (a 
     1091:obj:`nodeClassifier` itself, not a reference to it!) would modify 
     1092the node's :obj:`nodeClassifier` in the corresponding node of 
     1093the original tree. 
     1094 
     1095Pruners cannot construct a 
     1096:obj:`nodeClassifier` nor merge the node classifiers of the pruned 
     1097subtrees into classifiers for new leaves. Thus, if you want to build 
     1098a prunable tree, internal nodes must have their :obj:`nodeClassifier` 
     1099defined. Fortunately, this is the default. 
     1100 
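A sketch of pruning an induced tree (class names as documented below;
that the classifier exposes its root node as :obj:`tree` follows the
description above)::

    import Orange

    data = Orange.data.Table("lenses")
    tree = Orange.classification.tree.TreeLearner(data)

    pruner = Orange.classification.tree.Pruner_m()
    pruner.m = 2.0                # m for the m-estimate of error
    pruned = pruner(tree.tree)    # returns a pruned copy; original intact
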
     1101.. class:: Pruner 
     1102 
     1103    This is an abstract base class which defines nothing useful, only  
     1104    a pure virtual call operator. 
     1105 
     1106    .. method:: __call__(tree) 
     1107 
     1108        Prunes a tree. The argument can be either a tree classifier or  
     1109        a tree node; the result is of the same type as the argument. 
     1110 
     1111.. class:: Pruner_SameMajority 
     1112 
     1113    Bases: :class:`Pruner` 
     1114 
     1115    In Orange, a tree can have non-trivial subtrees (i.e. subtrees  
     1116    with more than one leaf) in which all the leaves have the same majority  
     1117    class. (This is allowed because those leaves can still have different 
     1118    distributions of classes and thus predict different probabilities.)  
     1119    However, this can be undesired when we're only interested in the  
     1120    class prediction or a simple tree interpretation. The  
     1121    :obj:`Pruner_SameMajority` prunes the tree so that there is no 
     1122    subtree in which all the nodes would have the same majority class. 
     1123 
     1124    This pruner will only prune the nodes in which the node classifier  
     1125    is of class :obj:`orange.DefaultClassifier` (or a derived class). 
     1126 
     1127    Note that the leaves with more than one majority class require some  
     1128    special handling. The pruning goes backwards, from leaves to the root.  
     1129    When siblings are compared, the algorithm checks whether they  
     1130    have (at least one) common majority class. If so, they can be pruned. 
     1131 
     1132.. class:: Pruner_m 
     1133 
     1134    Bases: :class:`Pruner` 
     1135 
     1136    Prunes a tree by comparing m-estimates of static and dynamic  
     1137    error as defined in (Bratko, 2002). 
     1138 
     1139    .. attribute:: m 
     1140 
     1141        Parameter m for m-estimation. 
     1142 
     1143Printing the tree 
    2531144================= 
    2541145 
     
    3571248 
    3581249Examples 
    359 ======== 
     1250-------- 
    3601251 
    3611252We shall build a small tree from the iris data set - we shall limit the 
     
    5251416    |    |    |    petal length>=4.850: 86.0% (95.6%) 
    5261417 
    527 See what's in the parentheses in the root node? If :meth:`TreeClassifier.dump` 
     1418See what's in the parentheses in the root node? If :meth:`~TreeClassifier.dump` 
    5281419cannot compute something (in this case it's because the root has no parent), 
    5291420it prints out a dot. You can also replace :samp:`=` by :samp:`!=` and it  
     
    6701561 
    6711562Defining Your Own Printout functions 
    672 ==================================== 
     1563------------------------------------ 
    6731564 
    6741565:meth:`TreeClassifier.dump`'s argument :obj:`userFormats` can be used to print out 
     
    7491640containing the regular expression and the callback function. 
    7501641 
    751  
    7521642We can now print out the iris tree: 
    7531643 
     
    7661656    |    |    petal length>=4.850: Iris-virginica 100% (104.55%) 
    7671657 
    768  
    7691658Plotting the Tree using Dot 
    770 =========================== 
     1659--------------------------- 
    7711660 
    7721661Suppose you saved the tree in a file "tree5.dot". You can then 
     
    7811670 
    7821671 
    783  
    784 Classification trees are represented as a tree-like hierarchy of 
    785 :obj:`Node` classes. 
    786  
    787 .. class:: Node 
    788  
    789     Node stores information about the learning examples belonging  
    790     to the node, a branch selector, a list of branches (if the node is  
    791     not a leaf) with their descriptions and strengths, and a classifier. 
    792  
    793     .. attribute:: distribution 
    794      
    795         Stores a distribution for learning examples belonging to the node. 
    796         Storing distributions can be disabled by setting the  
    797         :obj:`TreeLearnerBase`'s storeDistributions flag to false. 
    798  
    799     .. attribute:: contingency 
    800  
    801         Stores complete contingency matrices for the learning examples  
    802         belonging to the node. Storing contingencies can be enabled by  
    803         setting :obj:`TreeLearnerBase`'s :obj:`storeContingencies`  
    804         flag to true. Note that even when the flag is not  
    805         set, the contingencies get computed and stored to  
    806         :obj:`Node`, but are removed shortly afterwards.  
    807         The details are given in the  
    808         description of the :obj:`TreeLearnerBase` object. 
    809  
    810     .. attribute:: examples, weightID 
    811  
    812         Store a set of learning examples for the node and the 
    813         corresponding ID of /weight meta attribute. The root of the 
    814         tree stores a "master table" of examples, while other nodes' 
    815         :obj:`Orange.data.Table` contain reference to examples in 
    816         the root's :obj:`Orange.data.Table`. Examples are only stored 
    817         if a corresponding flag (:obj:`storeExamples`) has been 
    818         set while building the tree; to conserve the space, storing 
    819         is disabled by default. 
    820  
    821     .. attribute:: nodeClassifier 
    822  
    823         A classifier (usually, but not necessarily, a 
    824         :obj:`DefaultClassifier`) that can be used to classify 
    825         examples coming to the node. If the node is a leaf, this is 
    826         used to decide the final class (or class distribution) of an 
    827         example. If it's an internal node, it is stored if 
    828         :obj:`Node`'s flag :obj:`storeNodeClassifier` 
    829         is set. Since the :obj:`nodeClassifier` is needed by 
    830         :obj:`Descender` and for pruning (see far below), 
    831         this is the default behaviour; space consumption of the default 
    832         :obj:`DefaultClassifier` is rather small. You should 
    833         never disable this if you intend to prune the tree later. 
    834  
    835     If the node is a leaf, the remaining fields are None.  
    836     If it's an internal node, there are several additional fields. 
    837  
    838     .. attribute:: branches 
    839  
    840         Stores a list of subtrees, given as :obj:`Node`. 
    841         An element can be None; in this case the node is empty. 
    842  
    843     .. attribute:: branchDescriptions 
    844  
    845         A list with string descriptions for branches, constructed by 
    846         :obj:`SplitConstructor`. It can contain different kinds 
    847         of descriptions, but basically, expect things like 'red' or '>12.3'. 
    848  
    849     .. attribute:: branchSizes 
    850  
    851         Gives a (weighted) number of training examples that went into 
    852         each branch. This can be used later, for instance, for 
    853         modeling probabilities when classifying examples with 
    854         unknown values. 
    855  
    856     .. attribute:: branchSelector 
    857  
    858         Gives a branch for each example. The same object is used during 
    859         learning and classifying. The :obj:`branchSelector` is of 
    860         type :obj:`orange.Classifier`, since its job is similar to that 
    861         of a classifier: it gets an example and returns discrete 
    862         :obj:`Orange.data.Value` in range :samp:`[0, len(branches)-1]`. 
    863         When an example cannot be classified to any branch, the selector 
    864         can return a :obj:`Orange.data.Value` containing a special value 
    865         (sVal) which should be a discrete distribution 
    866         (DiscDistribution). This should represent a 
    867         :obj:`branchSelector`'s opinion of how to divide the 
    868         example between the branches. Whether the proposition will be 
    869         used or not depends upon the chosen :obj:`ExampleSplitter` 
    870         (when learning) or :obj:`Descender` (when classifying). 
    871  
    872     The lists :obj:`branches`, :obj:`branchDescriptions` and 
    873     :obj:`branchSizes` are of the same length; all of them are 
    874     defined if the node is internal and none if it is a leaf. 
    875  
    876     .. method:: treeSize() 
    877          
    878         Return the number of nodes in the subtrees (including the 
    879         node, excluding null-nodes). 
    880  
    881 ============ 
    882 Base classes 
    883 ============ 
    884  
    885 .. class:: _TreeClassifier 
    886  
    887     Classifies examples according to a tree stored in :obj:`tree`. 
    888     Not meant to be used directly. The :class:`TreeLearner` class 
    889     constructs :class:`TreeClassifier`. 
    890  
    891     .. attribute:: tree 
    892  
    893         The root of the tree, represented as a :class:`Node`. 
    894      
    895     Classification would be straightforward if there were no unknown  
    896     values or, in general, examples that cannot be placed into a  
    897     single branch. The response in such cases is determined by a 
    898     component :obj:`descender`. 
    899  
    900     :obj:`Descender` is an abstract object which is given an example 
    901     and whose basic job is to descend as far down the tree as possible, 
    902     according to the values of example's attributes. The 
    903     :obj:`Descender`: calls the node's :obj:`branchSelector` to get  
    904     the branch index. If it's a simple index, the corresponding branch  
    905     is followed. If not, it's up to descender to decide what to do, and 
    906     that's where descenders differ. A :obj:`descender` can choose  
    907     a single branch (for instance, the one that is the most recommended  
    908     by the :obj:`branchSelector`) or it can let the branches vote. 
    909  
    In general, there are three possible outcomes of a descent.

    #. The descender reaches a leaf. This happens when nothing went wrong
       (there are no unknown or out-of-range values in the example) or
       when things went wrong, but the descender smoothed them by
       selecting a single branch and continued the descent. In this
       case, the descender returns the reached :obj:`Node`.
    #. :obj:`branchSelector` returned a distribution and the
       :obj:`Descender` decided to stop the descent at this
       (internal) node. Again, the descender returns the current
       :obj:`Node` and nothing else.
    #. :obj:`branchSelector` returned a distribution and the
       :obj:`Descender` wants to split the example (i.e., to decide the
       class by voting). In this case, it returns a :obj:`Node` and
       the vote weights for the branches. The weights can correspond
       to the distribution returned by :obj:`branchSelector`, to the
       number of learning examples that were assigned to each branch,
       or to something else.
    929  
    930     :obj:`TreeClassifier` uses the descender to descend from the root.  
    931     If it returns only a :obj:`Node` and no distribution, the  
    descent should stop; it does not matter whether it's a leaf (the
    933     first case above) or an internal node (the second case). The node's 
    934     :obj:`nodeClassifier` is used to decide the class. If the descender 
    935     returns a :obj:`Node` and a distribution, the :obj:`TreeClassifier` 
    936     recursively calls itself for each of the subtrees and the  
    937     predictions are weighted as requested by the descender. 
    938  
    939     When voting, subtrees do not predict the class but probabilities  
    940     of classes. The predictions are multiplied by weights, summed and  
    941     the most probable class is returned. 
    942  
    943     .. method:: vote() 
    944  
    945     .. method:: classDistribution() 
    946  
    947     If you'd like to understand how the classification works in C++,  
    948     start reading at :obj:`TTreeClassifier::vote`. It gets a  
    :obj:`Node`, an :obj:`Orange.data.Instance` and a distribution of
    950     vote weights. For each node, it calls the  
    951     :obj:`TTreeClassifier::classDistribution` and then multiplies  
    952     and sums the distribution. :obj:`vote` returns a normalized  
    953     distribution of predictions. 
    954  
    955     A new overload of :obj:`TTreeClassifier::classDistribution` gets 
    956     an additional parameter, a :obj:`Node`. This is done  
    957     for the sake of recursion. The normal version of  
    :obj:`classDistribution` simply calls the overloaded version with the
    959     tree root as an additional parameter. :obj:`classDistribution`  
    960     uses :obj:`descender`. If descender reaches a leaf, it calls  
    961     :obj:`nodeClassifier`, otherwise it calls :obj:`vote`. 
    962  
    963     Thus, the :obj:`TreeClassifier`'s :obj:`vote` and  
    :obj:`classDistribution` are written in the form of double
    965     recursion. The recursive calls do not happen at each node of the  
    966     tree but only at nodes where a vote is needed (that is, at nodes  
    967     where the descender halts). 
    968  
    For predicting a class, :obj:`operator()` calls the
    970     descender. If it reaches a leaf, the class is predicted by the  
    971     leaf's :obj:`nodeClassifier`. Otherwise, it calls  
    :obj:`vote`. From there on, :obj:`vote` and
    973     :obj:`classDistribution` interweave down the tree and return  
    974     a distribution of predictions. :obj:`operator()` then simply  
    975     chooses the most probable class. 
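
From Python, the descender's work shows only through the results: the
classifier can return either the most probable class or a distribution
of class probabilities. A sketch, assuming the standard Orange
classifier call convention applies here::

    import Orange

    data = Orange.data.Table("lenses")
    tree = Orange.classification.tree.TreeLearner(data)

    example = data[0]
    print tree(example)   # the most probable class
    # class probabilities, merged by the descender where votes occur
    print tree(example, Orange.classification.Classifier.GetProbabilities)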
    976  
.. class:: _TreeLearner

    The main learning object is :obj:`_TreeLearner`. It is basically
    a skeleton into which the user must plug the components for particular
    functions. This class is not meant to be used directly. You should
    rather use :class:`TreeLearner`.
    983  
    Components that govern the structure of the tree are :obj:`split`
    (of type :obj:`SplitConstructor`), :obj:`stop` (of
    type :obj:`StopCriteria`) and :obj:`splitter`
    (of type :obj:`ExampleSplitter`).
    988  
    989     .. attribute:: split 
    990  
        Object of type :obj:`SplitConstructor`. The default value,
        provided by :obj:`_TreeLearner`, is :obj:`SplitConstructor_Combined`
        with separate constructors for discrete and continuous attributes.
        Discrete attributes are used as they are, while continuous
        attributes are binarized. Gain ratio is used to select attributes.
        A minimum of two examples in a leaf is required for discrete
        attributes and five examples in a leaf for continuous attributes.
    998      
    999     .. attribute:: stop 
    1000  
    1001         Object of type :obj:`StopCriteria`. The default stopping 
    1002         criterion stops induction when all examples in a node belong  
    1003         to the same class. 
    1004  
    1005     .. attribute:: splitter 
    1006  
    1007         Object of type :obj:`ExampleSplitter`. The default splitter 
    1008         is :obj:`ExampleSplitter_UnknownsAsSelector` that splits 
    1009         the learning examples according to distributions given by the 
    1010         selector. 
    1011  
    1012     .. attribute:: contingencyComputer 
    1013      
    1014         By default, this slot is left empty and ordinary contingency 
        matrices are computed for examples at each node. If the need arises,
    1016         one can change the way the matrices are computed. This can be 
    1017         used to change the way that unknown values are treated when 
    1018         assessing qualities of attributes. As mentioned earlier, 
    1019         the computed matrices can be used by split constructor and by 
    1020         stopping criteria. On the other hand, they can be (and are) 
        ignored by some split constructors.
    1022  
    1023     .. attribute:: nodeLearner 
    1024  
    1025         Induces a classifier from examples belonging to a node. The 
    1026         same learner is used for internal nodes and for leaves. The 
    1027         default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`. 
    1028  
    1029     .. attribute:: descender 
    1030  
        Descending component that the induced :obj:`TreeClassifier`
    1032         will use. Default descender is  
        :obj:`Descender_MergeAsSelector`, which votes using
    1034         the :obj:`branchSelector`'s distribution for vote weights. 
    1035  
    1036     .. attribute:: maxDepth 
    1037  
        Gives the maximal tree depth; 0 means that only the root is
        generated. The default is 100, to prevent infinite tree induction
        due to a mis-set stopping criterion. If you are sure you need
        deeper trees, increase it; if you, on the other hand, want
        to lower this hard limit, you can do so as well.
    1043  
    1044     .. attribute:: storeDistributions, storeContingencies, storeExamples, storeNodeClassifier 
    1045  
        These flags decide whether to store class distributions,
        contingencies and examples in the :obj:`Node`, and whether the
        :obj:`nodeClassifier` should be built for internal nodes.
        By default, distributions and node classifiers are stored,
        while contingencies and examples are not. You won't save any
        memory by storing contingencies but not distributions, since
        the distribution is the same object as the one stored in
        :obj:`contingency.classes`.
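
    These attributes are set like any other component. A sketch, assuming
    that the :class:`TreeLearner` wrapper forwards such settings to
    :obj:`_TreeLearner`::

        import Orange

        learner = Orange.classification.tree.TreeLearner()
        learner.maxDepth = 3             # cut the tree at depth 3
        learner.storeExamples = True     # keep learning examples in nodes
        learner.storeContingencies = False

        data = Orange.data.Table("lenses")
        tree = learner(data)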
    1054  
    The :obj:`_TreeLearner` first sets the defaults for missing
    components. Although stored in the actual :obj:`_TreeLearner`'s
    fields, they are removed when the induction is finished.
    1058  
    1059     Then it ensures that examples are stored in a table. This is needed 
    1060     because the algorithm juggles with pointers to examples. If 
    1061     examples are in a file or are fed through a filter, they are copied 
    1062     to a table. Even if they are already in a table, they are copied 
    if :obj:`storeExamples` is set. This is to ensure that pointers
    1064     remain pointing to examples even if the user later changes the 
    1065     example table. If they are in the table and the :obj:`storeExamples` 
    1066     flag is clear, we just use them as they are. This will obviously 
    1067     crash in a multi-threaded system if one changes the table during 
    1068     the tree induction. Well... don't do it. 
    1069  
    1070     Apriori class probabilities are computed. At this point we check 
    1071     the sum of example weights; if it's zero, there are no examples and  
    1072     we cannot proceed. A list of candidate attributes is set; in the 
    1073     beginning, all attributes are candidates for the split criterion. 
    1074  
    Now comes the recursive part of the :obj:`_TreeLearner`. Its arguments
    are a set of examples, a weight meta-attribute ID (a tricky thing,
    it can always be the same as the original or can change to
    accommodate splitting of examples among branches), apriori class
    distribution and a list of candidates (represented as a vector
    of Boolean values).
    1081  
    1082     The contingency matrix is computed next. This happens 
    1083     even if the flag :obj:`storeContingencies` is false. 
    1084     If the :obj:`contingencyComputer` is given we use it, 
    1085     otherwise we construct just an ordinary contingency matrix. 
    1086  
    The :obj:`stop` criterion is called to see whether it is worth
    continuing. If not, a :obj:`nodeClassifier` is built and the
    :obj:`Node` is returned. Otherwise, a :obj:`nodeClassifier` is
    built only if the :obj:`forceNodeClassifier` flag is set.
    1091  
    1092     To get a :obj:`Node`'s :obj:`nodeClassifier`, the  
    1093     :obj:`nodeLearner`'s :obj:`smartLearn` function is called with  
    1094     the given examples, weight ID and the just computed matrix. If  
    1095     the learner can use the matrix (and the default,  
    1096     :obj:`Orange.classification.majority.MajorityLearner`, can), it won't touch the examples. Thus, 
    1097     a choice of :obj:`contingencyComputer` will, in many cases,  
    1098     affect the :obj:`nodeClassifier`. The :obj:`nodeLearner` can 
    return no classifier; if so, and if the classifier is
    needed during classification, the :obj:`TreeClassifier`'s function
    returns a "don't know" value (DK) or an empty distribution. Pay
    attention to this if you are writing your own tree classifier.
    1103  
    1104     If the induction is to continue, a :obj:`split` component is called.  
    1105     If it fails to return a branch selector, induction stops and the  
    1106     :obj:`Node` is returned. 
    1107  
    :obj:`_TreeLearner` then uses the :obj:`ExampleSplitter` to divide
    1109     the examples as described above. 
    1110  
    1111     The contingency gets removed at this point if it is not to be  
    1112     stored. Thus, the :obj:`split`, :obj:`stop` and  
    :obj:`splitter` can use the contingency matrices if they wish.
    1114  
    The :obj:`_TreeLearner` then recursively calls itself for each of
    the non-empty subsets. If the splitter returns a list of weights,
    a corresponding weight is used for each branch. In addition, the
    attribute spent by the split constructor (if any) is removed from the
    1119     list of candidates for the subtree. 
    1120  
    1121     A subset of examples is stored in its corresponding tree node,  
    1122     if so requested. If not, the new weight attributes are removed (if  
    1123     any were created). 
    1124  
    1125 ======= 
    1126 Classes 
    1127 ======= 
    1128  
Several classes described above are already functional and can
(and mostly will) be used as they are: :obj:`Node`,
:obj:`_TreeLearner` and :obj:`TreeClassifier`. This section describes
the remaining classes.

Classes :obj:`SplitConstructor`, :obj:`StopCriteria`,
:obj:`ExampleSplitter`, :obj:`Descender`, :obj:`orange.Learner`
and :obj:`Classifier` are among the Orange classes that can be subtyped
in Python and have the call operator overloaded in such a way that it
is called back from C++ code. You can thus program your own components
for :obj:`_TreeLearner` and :obj:`TreeClassifier`. The detailed
information on how this is done, and what can go wrong, is given in a
separate page dedicated to callbacks to Python.
    1142  
    1143 SplitConstructors 
    1144 ===================== 
    1145  
    1146 Split construction is almost as exciting as waiting for a delayed flight. 
    1147 Boring, that is. Split constructors are programmed as spaghetti code 
    1148 that juggles with contingency matrices, with separate cases for discrete 
    1149 and continuous classes... Most split constructors work either for 
    1150 discrete or for continuous attributes. The suggested practice is 
    1151 to use a :obj:`SplitConstructor_Combined` that can handle 
    1152 both by simply delegating attributes to specialized split constructors. 
    1153  
Note: split constructors that cannot handle attributes of a particular
    1155 type (discrete, continuous) do not report an error or a warning but 
    1156 simply skip the attribute. It is your responsibility to use a correct 
    1157 split constructor for your dataset. (May we again suggest 
    1158 using :obj:`SplitConstructor_Combined`?) 
    1159  
The same components can be used for inducing both classification and
regression trees. The only component that needs to be chosen accordingly
    1162 is the 'measure' attribute for the :obj:`SplitConstructor_Measure` 
    1163 class (and derived classes). 
    1164  
    1165 .. class:: SplitConstructor 
    1166  
    Finds a suitable criterion for dividing the learning (and later testing)
    1168     examples coming to the node. The data it gets is a set of examples 
    1169     (and, optionally, an ID of weight meta-attribute), a domain 
    1170     contingency computed from examples, apriori class probabilities, 
    1171     a list of candidate attributes it should consider and a node 
    1172     classifier (if it was constructed, that is, if  
    1173     :obj:`storeNodeClassifier` is left true). 
    1174  
    The :obj:`SplitConstructor` should use the domain contingency
    when possible. The reasons are twofold; one is that it's faster
    and the other is that the contingency matrices are not necessarily
    constructed by simply counting the examples. Why and how is
    explained later. There are, however, cases when the domain contingency
    does not suffice; for example, when ReliefF is used as a measure
    of quality of attributes. In this case, there's no other way but
    to use the examples and ignore the precomputed contingencies.
    1183  
    1184     The split constructor should consider only the attributes in the 
    1185     candidate list (the list is a vector of booleans, one for each 
    1186     attribute). 
    1187  
    1188     :obj:`SplitConstructor` returns most of the data we talked 
    1189     about when describing the :obj:`Node`. It returns a classifier 
    1190     to be used as :obj:`Node`'s :obj:`branchSelector`, a list of 
    1191     branch descriptions and a list with the number of examples that 
    1192     go into each branch. Just what we need for the :obj:`Node`. 
    1193     It can return an empty list for the number of examples in branches; 
    in this case, the :obj:`_TreeLearner` will find the number itself
    after splitting the example set into subsets. However, if a split
    constructor can provide the numbers at no extra computational
    1197     cost, it should do so. 
    1198  
    In addition, it returns the quality of the split: a number without
    1200     any fixed meaning except that higher numbers mean better splits. 
    1201  
    1202     If the constructed splitting criterion uses an attribute in such 
    1203     a way that the attribute is 'completely spent' and should not be 
    1204     considered as a split criterion in any of the subtrees (the 
    1205     typical case of this are discrete attributes that are used 
    1206     as-they-are, that is, without any binarization or subsetting), 
    1207     then it should report the index of this attribute. Some splits 
    1208     do not spend any attribute; this is indicated by returning a 
    1209     negative index. 
    1210  
    1211     A :obj:`SplitConstructor` can veto the further tree induction 
    1212     by returning no classifier. This can happen for many reasons. 
    A general one is related to the number of examples in the branches.
    1214     :obj:`SplitConstructor` has a field :obj:`minSubset`, 
    1215     which sets the minimal number of examples in a branch; null nodes, 
    1216     however, are allowed. If there is no split where this condition 
    1217     is met, :obj:`SplitConstructor` stops the induction. 
    1218  
    1219     .. attribute:: minSubset 
    1220  
    1221         Sets the minimal number of examples in non-null leaves. As 
        always in Orange (unless specified otherwise), "number of
    1223         examples" refers to the weighted number of examples. 
    1224      
    .. method:: __call__(examples[, weightID=0, apriori_distribution, candidates])

        Construct a split. Returns a tuple (:obj:`branchSelector`,
        :obj:`branchDescriptions`, :obj:`subsetSizes`, :obj:`quality`,
        :obj:`spentAttribute`). :obj:`spentAttribute` is -1 if no
        attribute is completely spent by the split criterion. If no
        split is constructed, the :obj:`branchSelector`,
        :obj:`branchDescriptions` and :obj:`subsetSizes` are None,
        while :obj:`quality` is 0.0 and :obj:`spentAttribute` is -1.

        :param examples: Examples can be given in any acceptable form
            (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a
            list of examples).
        :param weightID: Optional; the default of 0 means that all
            examples have a weight of 1.0.
        :param apriori_distribution: Apriori class distribution; should
            be of type :obj:`orange.Distribution`.
        :param candidates: A Python list of objects which are interpreted
            as booleans, marking the attributes that may be considered for
            the split.
    1243  
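
To give a feel for the interface, here is a toy split constructor
written in Python. It is only a sketch: the callback signature and the
use of :obj:`orange.ClassifierFromVarFD` follow the descriptions on
this page but are otherwise assumptions::

    import orange
    import Orange

    class FirstCandidateSplit(Orange.classification.tree.SplitConstructor):
        """Split on the first candidate discrete attribute."""
        def __call__(self, examples, weightID, contingencies, apriori,
                     candidates, nodeClassifier=None):
            for position, candidate in enumerate(candidates):
                attr = examples.domain.attributes[position]
                if candidate and \
                        isinstance(attr, Orange.data.variable.Discrete):
                    # a branch selector returning the attribute's value
                    selector = orange.ClassifierFromVarFD(
                        position=position, domain=examples.domain,
                        classVar=attr)
                    # (selector, descriptions, sizes, quality, spent attr)
                    return selector, list(attr.values), None, 1.0, position
            # no suitable attribute: veto the split and stop the induction
            return None, None, None, 0.0, -1

An instance of such a class would then be plugged into the learner's
:obj:`split` slot.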
    1244  
    1245  
    1246 .. class:: SplitConstructor_Measure 
    1247  
    1248     Bases: :class:`SplitConstructor` 
    1249  
    An abstract base class for split constructors that employ
    a :obj:`Orange.feature.scoring.Measure` to assess the quality of a split.
    1252     At present, 
    1253     all split constructors except for :obj:`SplitConstructor_Combined` 
    1254     are derived from this class. 
    1255  
    1256     .. attribute:: measure 
    1257  
    1258         A component of type :obj:`Orange.feature.scoring.Measure` used for 
        evaluation of a split. Note that you must select a subclass of
        :obj:`Orange.feature.scoring.Measure` capable of handling your
        class type; you cannot use :obj:`Orange.feature.scoring.GainRatio`
        for building regression trees or :obj:`Orange.feature.scoring.MSE`
        for classification trees.
    1265  
    1266     .. attribute:: worstAcceptable 
    1267  
    1268         The lowest required split quality for a split to be acceptable. 
        Note that this value makes sense only in connection with a
    1270         :obj:`measure` component. Default is 0.0. 
    1271  
    1272 .. class:: SplitConstructor_Attribute 
    1273  
    1274     Bases: :class:`SplitConstructor_Measure` 
    1275  
    1276     Attempts to use a discrete attribute as a split; each value of the  
    1277     attribute corresponds to a branch in the tree. Attributes are 
    1278     evaluated with the :obj:`measure` and the one with the 
    1279     highest score is used for a split. If there is more than one 
    attribute with the highest score, one of them is selected at random.
    1281  
    1282     The constructed :obj:`branchSelector` is an instance of  
    1283     :obj:`orange.ClassifierFromVarFD` that returns a value of the  
    1284     selected attribute. If the attribute is  
    1285     :obj:`Orange.data.variable.Discrete`, 
    the :obj:`branchDescriptions` are the attribute's values. The
    1287     attribute is marked as spent, so that it cannot reappear in the  
    1288     node's subtrees. 
    1289  
    1290 .. class:: SplitConstructor_ExhaustiveBinary 
    1291  
    1292     Bases: :class:`SplitConstructor_Measure` 
    1293  
    1294     Works on discrete attributes. For each attribute, it determines 
    1295     which binarization of the attribute gives the split with the 
    1296     highest score. If more than one split has the highest score, 
    one of them is selected at random. After trying all the attributes,
    1298     it returns one of those with the highest score. 
    1299  
    The constructed :obj:`branchSelector` is again an instance of
    1301     :obj:`orange.ClassifierFromVarFD` that returns a value of the 
    1302     selected attribute. This time, however, its :obj:`transformer` 
    1303     contains an instance of :obj:`MapIntValue` that maps the values 
    1304     of the attribute into a binary attribute. Branch descriptions are 
    of the form "[<val1>, <val2>, ..., <valn>]" for branches corresponding to
    1306     more than one value of the attribute. Branches that correspond to 
    1307     a single value of the attribute are described with this value. If  
    1308     the attribute was originally binary, it is spent and cannot be  
    1309     used in the node's subtrees. Otherwise, it can reappear in the  
    1310     subtrees. 
    1311  
    1312  
    1313 .. class:: SplitConstructor_Threshold 
    1314  
    1315     Bases: :class:`SplitConstructor_Measure` 
    1316  
    1317     This is currently the only constructor for splits with continuous  
    attributes. It divides the range of attribute values with a threshold
    that maximizes the split's quality. As always, if there is more than
    one split with the highest score, a random threshold is selected.
    The attribute yielding the best binary split is returned.
    1322  
    1323     The constructed :obj:`branchSelector` is again an instance of  
    1324     :obj:`orange.ClassifierFromVarFD` with an attached  
    1325     :obj:`transformer`. This time, :obj:`transformer` is of type  
    1326     :obj:`orange.ThresholdDiscretizer`. The branch descriptions are  
    1327     "<threshold" and ">=threshold". The attribute is not spent. 
    1328  
    1329 .. class:: SplitConstructor_OneAgainstOthers 
    1330      
    1331     Bases: :class:`SplitConstructor_Measure` 
    1332  
    1333     Undocumented. 
    1334  
    1335 .. class:: SplitConstructor_Combined 
    1336  
    1337     Bases: :class:`SplitConstructor` 
    1338  
    1339     This constructor delegates the task of finding the optimal split  
    1340     to separate split constructors for discrete and for continuous 
    1341     attributes. Each split constructor is called, given only attributes 
    1342     of appropriate types as candidates. Both construct a candidate for 
    1343     a split; the better of them is selected. 
    1344  
    (Note that there is a problem when more candidates have the same
    score. Let there be nine discrete attributes with the highest
    score; the split constructor for discrete attributes will select
    one of them. Now, let us suppose that there is a single continuous
    attribute with the same score. :obj:`SplitConstructor_Combined`
    would randomly select between the proposed discrete attribute and
    the continuous attribute, not aware of the fact that the discrete
    attribute has already competed with eight other discrete attributes.
    The probability for selecting (each) discrete attribute would thus
    be 1/18 instead of 1/10. Although not really correct, we doubt that
    this would affect the tree's performance; many other machine learning
    systems simply choose the first attribute with the highest score
    anyway.)
    1358  
    1359     The :obj:`branchSelector`, :obj:`branchDescriptions` and whether  
    1360     the attribute is spent is decided by the winning split constructor. 
    1361  
    .. attribute:: discreteSplitConstructor

        Split constructor for discrete attributes; it can be, for instance,
        :obj:`SplitConstructor_Attribute` or
        :obj:`SplitConstructor_ExhaustiveBinary`.

    .. attribute:: continuousSplitConstructor

        Split constructor for continuous attributes; at the moment, it
        can be either :obj:`SplitConstructor_Threshold` or a
        split constructor you programmed in Python.
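
For illustration, a combined constructor resembling the default one
could be assembled by hand. This is a sketch; the class and attribute
names are as documented above, while passing the component to
:class:`TreeLearner` through the :obj:`split` slot is an assumption::

    import Orange

    split = Orange.classification.tree.SplitConstructor_Combined()
    split.discreteSplitConstructor = \
        Orange.classification.tree.SplitConstructor_ExhaustiveBinary()
    split.continuousSplitConstructor = \
        Orange.classification.tree.SplitConstructor_Threshold()
    # score candidate splits of both kinds with gain ratio
    split.discreteSplitConstructor.measure = \
        Orange.feature.scoring.GainRatio()
    split.continuousSplitConstructor.measure = \
        Orange.feature.scoring.GainRatio()

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner(split=split)
    tree = learner(data)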
    1379  
    1380  
    1381 StopCriteria and StopCriteria_common 
    1382 ============================================ 
    1383  
:obj:`StopCriteria` determines when to stop the induction of subtrees.
    1385  
    1386 .. class:: StopCriteria 
    1387  
    1388     Given a set of examples, weight ID and contingency matrices, decide 
    1389     whether to continue the induction or not. The basic criterion checks 
    1390     whether there are any examples and whether they belong to at least 
    1391     two different classes (if the class is discrete). Derived components 
    1392     check things like the number of examples and the proportion of 
    1393     majority classes. 
    1394  
    1395     As opposed to :obj:`SplitConstructor` and similar basic classes, 
    1396     :obj:`StopCriteria` is not an abstract but a fully functional 
    1397     class that provides the basic stopping criteria. That is, the tree 
    1398     induction stops when there is at most one example left; in this case, 
    1399     it is not the weighted but the actual number of examples that counts. 
    1400     Besides that, the induction stops when all examples are in the same 
    1401     class (for discrete problems) or have the same value of the outcome 
    1402     (for regression problems). 
    1403  
    .. method:: __call__(examples[, weightID, contingencies])

        Decides whether to stop (true) or continue (false) the induction.
        If contingencies are given, they are used for checking whether
        the examples are in the same class (but not for counting the
        examples); :obj:`StopCriteria` will work without them. Derived
        classes should use the contingencies whenever they are available,
        but compute them themselves only when they really need them.
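
For example, the default criterion can be wrapped in a Python function
to get random pre-pruning. The sketch below assumes that a callable can
be plugged into the learner's :obj:`stop` slot::

    import random
    import Orange

    defStop = Orange.classification.tree.StopCriteria()

    def randomStop(examples, weightID, contingency):
        # stop when the default criterion fires, and also at random,
        # in about a fifth of the remaining calls
        return defStop(examples, weightID, contingency) or \
            random.randint(1, 5) == 1

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner(stop=randomStop)
    tree = learner(data)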
    1414  
    1415  
    1416  
    1417 .. class:: StopCriteria_common 
    1418  
    :obj:`StopCriteria_common` contains additional criteria for pre-pruning:
    it checks the proportion of majority class and the number of weighted
    examples.

    .. attribute:: maxMajority

        Maximal proportion of majority class. When this is exceeded,
        induction stops.

    .. attribute:: minExamples

        Minimal number of examples in internal nodes. Subsets with fewer
        than :obj:`minExamples` examples are not split any further.
        The example count is weighted.
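
A sketch of typical use, again assuming that the component can be
passed to :class:`TreeLearner` through its :obj:`stop` slot::

    import Orange

    stop = Orange.classification.tree.StopCriteria_common()
    stop.maxMajority = 0.8   # stop when 80% of examples share a class
    stop.minExamples = 5.0   # do not split subsets smaller than five

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner(stop=stop)
    tree = learner(data)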
    1433  
    1434 .. class:: StopCriteria_Python 
    1435  
    1436     Undocumented. 
    1437  
ExampleSplitter and derived classes
========================================
    1440  
    1441 The :obj:`ExampleSplitter` sorts the learning examples into branches. 
    1442  
    1443 .. class:: ExampleSplitter 
    1444  
    1445     Just like the :obj:`Descender` decides the branch for an 
    1446     example during classification, the :obj:`ExampleSplitter` 
    1447     sorts the learning examples into branches. 
    1448  
    :obj:`ExampleSplitter` is given a :obj:`Node` (from which
    it can use various data, but most splitters use only the
    :obj:`branchSelector`), a set of examples to be divided, and
    the weight ID. The result is a list of subsets of examples
    and, optionally, a list of new weight ID's.
    1454  
    Subsets are usually stored as :obj:`ExamplePointerTable`'s. Most
    :obj:`ExampleSplitter`'s simply call the node's
    :obj:`branchSelector` and assign examples to the corresponding
    branches. When the value is unknown, they choose a particular
    branch or simply skip the example.
    1460  
    Some enhanced splitters can split examples: an example (actually,
    a pointer to it) is copied to more than one subset. To facilitate
    such splitting, weights are needed. Each branch is assigned a
    weight ID (each would usually have its own ID) and all examples
    that are in that branch (either completely or partially) should
    have this meta attribute. If an example hasn't been split, it
    has only one additional attribute, with the weight ID corresponding
    to the subset to which it went. An example that is split among,
    say, three subsets, has three new meta attributes, one for each
    subset. The ID's of the weight meta attributes are returned by the
    :obj:`ExampleSplitter` to be used at induction of the
    corresponding subtrees.
    1473  
    Note that weights are used only when needed. When no splitting
    occurred (because the splitter is not able to do it or because
    there was no need for splitting), no weight ID's are returned.
    1477  
    1478     An abstract base class for objects that split sets of examples into  
    1479     subsets. The derived classes differ in treatment of examples which 
    1480     cannot be unambiguously placed into a single branch (usually due 
    1481     to unknown value of the crucial attribute). 
    1482  
    1483     .. method:: __call__(node, examples[, weightID]) 
    1484          
    1485         Use the information in :obj:`node` (particularly the  
    1486         :obj:`branchSelector`) to split the given set of examples into subsets.  
    1487         Return a tuple with a list of example generators and a list of weights.  
        The list of weights is either an ordinary Python list of integers,
        or None when no splitting of examples occurs and thus no weights
        are needed.
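
As a sketch, a Python-implemented splitter could look like this; the
signature follows the description above, and the use of plain example
tables for the subsets (instead of :obj:`ExamplePointerTable`'s) is an
assumption::

    import Orange

    class SplitterUnknownsToFirst(Orange.classification.tree.ExampleSplitter):
        """Send examples that cannot be placed into the first branch."""
        def __call__(self, node, examples, weightID=0):
            subsets = [Orange.data.Table(examples.domain)
                       for branch in node.branches]
            for example in examples:
                branch = node.branchSelector(example)
                if branch.isSpecial():
                    subsets[0].append(example)   # unknown: branch 0
                else:
                    subsets[int(branch)].append(example)
            # no example was split in two, so no new weight IDs
            return subsets, None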
    1491  
    1492  
    1493 .. class:: ExampleSplitter_IgnoreUnknowns 
    1494  
    1495     Bases: :class:`ExampleSplitter` 
    1496  
    1497     Simply ignores the examples for which no single branch can be determined. 
    1498  
    1499 .. class:: ExampleSplitter_UnknownsToCommon 
    1500  
    1501     Bases: :class:`ExampleSplitter` 
    1502  
    Places all such examples into the branch with the highest number of
    1504     examples. If there is more than one such branch, one is selected at 
    1505     random and then used for all examples. 
    1506  
    1507 .. class:: ExampleSplitter_UnknownsToAll 
    1508  
    1509     Bases: :class:`ExampleSplitter` 
    1510  
    1511     Places examples with unknown value of the attribute into all branches. 
    1512  
    1513 .. class:: ExampleSplitter_UnknownsToRandom 
    1514  
    1515     Bases: :class:`ExampleSplitter` 
    1516  
    1517     Selects a random branch for such examples. 
    1518  
    1519 .. class:: ExampleSplitter_UnknownsToBranch 
    1520  
    1521     Bases: :class:`ExampleSplitter` 
    1522  
    1523     Constructs an additional branch to contain all such examples.  
    1524     The branch's description is "unknown". 
    1525  
    1526 .. class:: ExampleSplitter_UnknownsAsBranchSizes 
    1527  
    1528     Bases: :class:`ExampleSplitter` 
    1529  
    1530     Splits examples with unknown value of the attribute according to  
    1531     proportions of examples in each branch. 
    1532  
    1533 .. class:: ExampleSplitter_UnknownsAsSelector 
    1534  
    1535     Bases: :class:`ExampleSplitter` 
    1536  
    1537     Splits examples with unknown value of the attribute according to  
    1538     distribution proposed by selector (which is in most cases the same  
    1539     as proportions of examples in branches). 
    1540  
    1541 Descender and derived classes 
    1542 ================================= 
    1543  
This is the classifying counterpart of the :obj:`ExampleSplitter`: it
decides what happens to examples that need to be classified and cannot
be unambiguously put in a single branch. Its detailed function is given
in the description of classification with trees above.
    1548  
    1549 .. class:: Descender 
    1550  
    1551     An abstract base object for tree descenders. 
    1552  
    1553     .. method:: __call__(node, example) 
    1554  
        Descends down the tree until it reaches a leaf or a node in
        which a vote of subtrees is required. In both cases, a tuple
        of two elements is returned; in the former case it contains
        the reached node and None, in the latter case it contains a
        node and the weights of votes for subtrees (a list of floats).
    1560  
    1561         :obj:`Descender`'s that never split examples always descend 
    1562         to a leaf, but they differ in the treatment of examples with 
    1563         unknown values (or, in general, examples for which a branch 
        cannot be determined at some node(s) of the tree).
    1565         :obj:`Descender`'s that do split examples differ in returned 
    1566         vote weights. 
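
A sketch of a Python descender under the documented call convention;
always taking the first branch on unknown values is, of course, just a
toy policy::

    import Orange

    class DescendToFirst(Orange.classification.tree.Descender):
        """When no branch can be chosen, follow the first one."""
        def __call__(self, node, example):
            while node.branches:
                branch = node.branchSelector(example)
                index = 0 if branch.isSpecial() else int(branch)
                child = node.branches[index]
                if not child:
                    break   # a null node: stop at this internal node
                node = child
            return node, None   # never requests a vote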
    1567  
    1568 .. class:: Descender_UnknownsToNode 
    1569  
    1570     Bases: :obj:`Descender` 
    1571  
    When an example cannot be classified into a single branch, the
    current node is returned. Thus, the node's :obj:`nodeClassifier`
    will be used to make a decision. It is your responsibility to see
    that even the internal nodes have their :obj:`nodeClassifier`
    (i.e., do not disable creating node classifiers and do not remove
    them manually after the induction).
    1578  
    1579 .. class:: Descender_UnknownsToBranch 
    1580  
    1581     Bases: :obj:`Descender` 
    1582  
    1583     Classifies examples with unknown value to a special branch. This 
    1584     makes sense only if the tree itself was constructed with 
    1585     :obj:`ExampleSplitter_UnknownsToBranch`. 
    1586  
    1587 .. class:: Descender_UnknownsToCommonBranch 
    1588  
    1589     Bases: :obj:`Descender` 
    1590  
    1591     Classifies examples with unknown values to the branch with the 
    1592     highest number of examples. If there is more than one such branch, 
    a random branch is chosen for each example that is to be classified.
    1594  
    1595 .. class:: Descender_UnknownsToCommonSelector 
    1596  
    1597     Bases: :obj:`Descender` 
    1598  
    1599     Classifies examples with unknown values to the branch which received  
    1600     the highest recommendation by the selector. 
    1601  
    1602 .. class:: Descender_MergeAsBranchSizes 
    1603  
    1604     Bases: :obj:`Descender` 
    1605  
    1606     Makes the subtrees vote for the example's class; the vote is 
    1607     weighted according to the sizes of the branches. 
    1608  
    1609 .. class:: Descender_MergeAsSelector 
    1610  
    1611     Bases: :obj:`Descender` 
    1612  
    1613     Makes the subtrees vote for the example's class; the vote is  
    weighted according to the selector's proposal.
    1615  
    1616 Pruning 
    1617 ======= 
    1618  
    1619 .. index:: 
    1620     pair: classification trees; pruning 
    1621  
    1622 Tree pruners derived from :obj:`Pruner` can be given either a 
    1623 :obj:`Node` (presumably, but not necessarily a root) or a 
    1624 :obj:`_TreeClassifier`. The result is a new :obj:`Node` 
    1625 or a :obj:`_TreeClassifier` with a pruned tree. The original 
    1626 tree remains intact. 
    1627  
The pruners construct only a shallow copy of the tree.
The :obj:`Node`'s of the pruned tree contain references to the same
    1630 contingency matrices, node classifiers, branch selectors, ... 
    1631 as the original tree. Thus, you may modify a pruned tree structure 
    1632 (manually cut it, add new nodes, replace components) but modifying, 
    1633 for instance, some node's :obj:`nodeClassifier` (a 
    1634 :obj:`nodeClassifier` itself, not a reference to it!) would modify 
    1635 the node's :obj:`nodeClassifier` in the corresponding node of 
    1636 the original tree. 
    1637  
Pruners cannot construct a
:obj:`nodeClassifier` nor merge the :obj:`nodeClassifier`'s of the pruned
    1640 subtrees into classifiers for new leaves. Thus, if you want to build 
    1641 a prunable tree, internal nodes must have their :obj:`nodeClassifier` 
    1642 defined. Fortunately, this is the default. 
    1643  
    1644 .. class:: Pruner 
    1645  
    1646     This is an abstract base class which defines nothing useful, only  
    1647     a pure virtual call operator. 
    1648  
    1649     .. method:: __call__(tree) 
    1650  
    1651         Prunes a tree. The argument can be either a tree classifier or  
    1652         a tree node; the result is of the same type as the argument. 
    1653  
    1654 .. class:: Pruner_SameMajority 
    1655  
    1656     Bases: :class:`Pruner` 
    1657  
    In Orange, a tree can have non-trivial subtrees (i.e. subtrees
    1659     with more than one leaf) in which all the leaves have the same majority  
    1660     class. (This is allowed because those leaves can still have different 
    1661     distributions of classes and thus predict different probabilities.)  
    1662     However, this can be undesired when we're only interested in the  
    1663     class prediction or a simple tree interpretation. The  
    1664     :obj:`Pruner_SameMajority` prunes the tree so that there is no 
    1665     subtree in which all the nodes would have the same majority class. 
    1666  
    1667     This pruner will only prune the nodes in which the node classifier  
    1668     is of class :obj:`orange.DefaultClassifier` (or from a derived class). 
    1669  
    1670     Note that the leaves with more than one majority class require some  
    1671     special handling. The pruning goes backwards, from leaves to the root.  
    1672     When siblings are compared, the algorithm checks whether they  
    1673     have (at least one) common majority class. If so, they can be pruned. 
    1674  
    1675 .. class:: Pruner_m 
    1676  
    1677     Bases: :class:`Pruner` 
    1678  
    1679     Prunes a tree by comparing m-estimates of static and dynamic  
    1680     error as defined in (Bratko, 2002). 
    1681  
    1682     .. attribute:: m 
    1683  
    1684         Parameter m for m-estimation. 
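
A sketch of use, assuming that the induced classifier exposes its root
node as :obj:`tree`, as described above::

    import Orange

    data = Orange.data.Table("lenses")
    tree = Orange.classification.tree.TreeLearner(data)

    pruner = Orange.classification.tree.Pruner_m()
    pruner.m = 2.0
    prunedRoot = pruner(tree.tree)   # a new root; the original is intact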
    1685  
    1686 Undocumented 
    1687 ============ 
    1688  
    1689 .. class:: NodeList 
    1690  
    1691     Undocumented. 
    1692  
    1693 .. class:: C45NodeList 
    1694  
    1695     Undocumented. 
    16961672 
    16971673=========================== 
     
    18231799the arguments: it provides a function :obj:`C45Learner.commandLine` 
    18241800which is given a string and parses it the same way as C4.5 would 
    1825 parse its command line. XXXXXXXXXXX 
     1801parse its command line.  
    18261802 
    18271803.. class:: C45Classifier 
     
    19461922.. autofunction:: printTreeC45 
    19471923 
    1948 TODO 
    1949 ==== 
    1950  
    1951 This page does not provide examples for programming your own components,  
    1952 such as, for instance, a :obj:`SplitConstructor`. Those examples 
    1953 can be found on a page dedicated to callbacks to Python. 
    1954  
    19551924References 
    19561925========== 
     
    19681937 
    19691938from Orange.core import \ 
    1970      TreeLearner as TreeLearnerBase, \ 
     1939     TreeLearner as _TreeLearner, \ 
    19711940         TreeClassifier as _TreeClassifier, \ 
    19721941         C45Learner, \ 
     
    21022071    (from Orange's objects for induction of decision trees).  
    21032072    :class:`TreeLearner` is essentially a wrapper 
    2104     around :class:`TreeLearnerBase`, provided for easier use of the latter. 
     2073    around :class:`_TreeLearner`, provided for easier use of the latter. 
    21052074    It sets a number of parameters used in induction that 
    21062075    can also be set after the creation of the object, most often through 
     
    21572126 
    21582127        Set `m` and `k` to the given values if the :obj:`measure` is relief. 
     2128 
     2129    .. attribute:: splitter 
     2130 
     2131        Object of type :class:`ExampleSplitter`. The default splitter 
     2132        is :class:`ExampleSplitter_UnknownsAsSelector` that splits 
     2133        the learning examples according to distributions given by the 
     2134        selector. 
    21592135 
    21602136    **Pruning** 
     
    22532229        self.stop = None 
    22542230        self.measure = None 
     2231        self.splitter = None 
    22552232        self.__dict__.update(kw) 
    22562233       
     
    22702247            bl.split.discreteSplitConstructor.measure = measure 
    22712248          
     2249        if self.splitter != None: 
     2250            bl.splitter = self.splitter 
     2251 
     2252 
    22722253        #post pruning 
    22732254        tree = bl(examples, weight) 
     
    22822263        """ 
    22832264        DEPRECATED. Return a base learner - an object  
    2284         of :class:`TreeLearnerBase`.  
     2265        of :class:`_TreeLearner`.  
    22852266        This method is left for backwards compatibility. 
    22862267        """ 
     
    23622343 
    23632344    def _base_learner(self): 
    2364         learner = TreeLearnerBase() 
     2345        learner = _TreeLearner() 
    23652346 
    23662347        learner.split = self.build_split() 