source: orange/orange/Orange/classification/tree.py @ 7708:ef0356b8bff4

1"""
2
3.. index:: classification tree
4
5.. index::
6   single: classification; tree
7
8********************
9Classification trees
10********************
11
12This page describes the Orange trees. It first describes the basic
13components and procedures: it starts with the structure that represents
14the tree, then it defines how the tree is used for classification,
15how it is built and pruned. The order might seem strange,
16but the things are rather complex and this order is perhaps a
17bit easier to follow. After you have some idea about what the
18principal components do, we described the
19concrete classes that you can use as
20components for a tree learner.
21
22Classification trees are represented as a tree-like hierarchy of
23:obj:`Node` classes.
24
.. class:: Node

    Node stores information about the learning examples belonging
    to the node, a branch selector, a list of branches (if the node is
    not a leaf) with their descriptions and strengths, and a classifier.

    .. attribute:: distribution

        Stores a distribution for learning examples belonging to the node.
        Storing distributions can be disabled by setting the
        :obj:`TreeLearnerBase`'s :obj:`storeDistributions` flag to false.

    .. attribute:: contingency

        Stores complete contingency matrices for the learning examples
        belonging to the node. Storing contingencies can be enabled by
        setting :obj:`TreeLearnerBase`'s :obj:`storeContingencies`
        flag to true. Note that even when the flag is not
        set, the contingencies get computed and stored to
        :obj:`Node`, but are removed shortly afterwards.
        The details are given in the
        description of the :obj:`TreeLearnerBase` object.

    .. attribute:: examples, weightID

        Store a set of learning examples for the node and the
        corresponding ID of the weight meta attribute. The root of the
        tree stores a "master table" of examples, while other nodes'
        :obj:`Orange.data.Table` contain references to examples in
        the root's :obj:`Orange.data.Table`. Examples are only stored
        if a corresponding flag (:obj:`storeExamples`) has been
        set while building the tree; to conserve space, storing
        is disabled by default.

    .. attribute:: nodeClassifier

        A classifier (usually, but not necessarily, a
        :obj:`DefaultClassifier`) that can be used to classify
        examples coming to the node. If the node is a leaf, this is
        used to decide the final class (or class distribution) of an
        example. If it's an internal node, it is stored if
        :obj:`TreeLearnerBase`'s flag :obj:`storeNodeClassifier`
        is set. Since the :obj:`nodeClassifier` is needed by
        :obj:`Descender` and for pruning (see far below),
        this is the default behaviour; space consumption of the default
        :obj:`DefaultClassifier` is rather small. You should
        never disable this if you intend to prune the tree later.

    If the node is a leaf, the remaining fields are None.
    If it's an internal node, there are several additional fields.

    .. attribute:: branches

        Stores a list of subtrees, given as :obj:`Node`.
        An element can be None; in this case the node is empty.

    .. attribute:: branchDescriptions

        A list with string descriptions for branches, constructed by
        :obj:`SplitConstructor`. It can contain different kinds
        of descriptions, but basically, expect things like 'red' or '>12.3'.

    .. attribute:: branchSizes

        Gives a (weighted) number of training examples that went into
        each branch. This can be used later, for instance, for
        modeling probabilities when classifying examples with
        unknown values.

    .. attribute:: branchSelector

        Gives a branch for each example. The same object is used during
        learning and classifying. The :obj:`branchSelector` is of
        type :obj:`orange.Classifier`, since its job is similar to that
        of a classifier: it gets an example and returns a discrete
        :obj:`Orange.data.Value` in range :samp:`[0, len(branches)-1]`.
        When an example cannot be classified to any branch, the selector
        can return an :obj:`Orange.data.Value` containing a special value
        (:obj:`sVal`), which should be a discrete distribution
        (:obj:`DiscDistribution`). This should represent the
        :obj:`branchSelector`'s opinion of how to divide the
        example between the branches. Whether the proposition will be
        used or not depends upon the chosen :obj:`ExampleSplitter`
        (when learning) or :obj:`Descender` (when classifying).

    The lists :obj:`branches`, :obj:`branchDescriptions` and
    :obj:`branchSizes` are of the same length; all of them are
    defined if the node is internal and none of them if it is a leaf.

    .. method:: treeSize()

        Return the number of nodes in the subtrees (including the
        node, excluding null-nodes).

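For illustration, here is a quick way to poke at this structure (a
minimal sketch; it assumes the lenses data used in the examples below
and that the learner can be called directly on the data, as in those
examples):

::

    import Orange

    data = Orange.data.Table("lenses")
    treeClassifier = Orange.classification.tree.TreeLearner(data)
    root = treeClassifier.tree

    print root.distribution              # class distribution at the root
    if root.branchSelector:              # internal node; leaves have no selector
        print root.branchDescriptions    # e.g. ['reduced', 'normal']
        print root.branchSizes
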
==============
Classification
==============

.. class:: TreeClassifier

    Classifies examples according to a tree stored in :obj:`tree`.

    .. attribute:: tree

        The root of the tree, represented as a :class:`Node`.

    Classification would be straightforward if there were no unknown
    values or, in general, examples that cannot be placed into a
    single branch. The response in such cases is determined by a
    component :obj:`descender`.

    :obj:`Descender` is an abstract object which is given an example
    and whose basic job is to descend as far down the tree as possible,
    according to the values of the example's attributes. The
    :obj:`Descender` calls the node's :obj:`branchSelector` to get
    the branch index. If it's a simple index, the corresponding branch
    is followed. If not, it's up to the descender to decide what to do,
    and that's where descenders differ. A :obj:`descender` can choose
    a single branch (for instance, the one that is the most recommended
    by the :obj:`branchSelector`) or it can let the branches vote.

    In general there are three possible outcomes of a descent.

    #. The descender reaches a leaf. This happens when nothing went wrong
       (there are no unknown or out-of-range values in the example) or
       when things went wrong, but the descender smoothed them by
       selecting a single branch and continued the descent. In this
       case, the descender returns the reached :obj:`Node`.
    #. :obj:`branchSelector` returned a distribution and the
       :obj:`Descender` decided to stop the descent at this
       (internal) node.  Again, the descender returns the current
       :obj:`Node` and nothing else.
    #. :obj:`branchSelector` returned a distribution and the
       :obj:`Descender` decided to split the example (i.e., to decide
       the class by voting). In this case, it returns a :obj:`Node` and
       the vote-weights for the branches. The weights can correspond to
       the distribution returned by :obj:`branchSelector`, to the number
       of learning examples that were assigned to each branch, or to
       something else.

    :obj:`TreeClassifier` uses the descender to descend from the root.
    If it returns only a :obj:`Node` and no distribution, the
    descent should stop; it does not matter whether it's a leaf (the
    first case above) or an internal node (the second case). The node's
    :obj:`nodeClassifier` is used to decide the class. If the descender
    returns a :obj:`Node` and a distribution, the :obj:`TreeClassifier`
    recursively calls itself for each of the subtrees and the
    predictions are weighted as requested by the descender.

    When voting, subtrees do not predict the class but probabilities
    of classes. The predictions are multiplied by weights, summed and
    the most probable class is returned.

    .. method:: vote()

    .. method:: classDistribution()

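The voting arithmetic is simple enough to sketch in a few lines of
plain Python (an illustration of the scheme described above, not
Orange code):

::

    def vote(distributions, weights):
        # weighted sum of the per-branch class distributions
        total = [0.0] * len(distributions[0])
        for dist, weight in zip(distributions, weights):
            for i, p in enumerate(dist):
                total[i] += weight * p
        norm = sum(total)
        return [p / norm for p in total]    # normalized predictions

    # two branches with vote weights 0.7 and 0.3
    print vote([[0.9, 0.1], [0.2, 0.8]], [0.7, 0.3])    # [0.69, 0.31]
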
The rest of this section is only for those interested in the C++ code.
======================================================================

If you'd like to understand how the classification works in C++,
start reading at :obj:`TTreeClassifier::vote`. It gets a
:obj:`Node`, an :obj:`Orange.data.Instance` and a distribution of
vote weights. For each node, it calls the
:obj:`TTreeClassifier::classDistribution` and then multiplies
and sums the distribution. :obj:`vote` returns a normalized
distribution of predictions.

A new overload of :obj:`TTreeClassifier::classDistribution` gets
an additional parameter, a :obj:`Node`. This is done
for the sake of recursion. The normal version of
:obj:`classDistribution` simply calls the overloaded one with the
tree root as an additional parameter. :obj:`classDistribution`
uses :obj:`descender`. If the descender reaches a leaf, it calls
:obj:`nodeClassifier`, otherwise it calls :obj:`vote`.

Thus, the :obj:`TreeClassifier`'s :obj:`vote` and
:obj:`classDistribution` are written in a form of double
recursion. The recursive calls do not happen at each node of the
tree but only at nodes where a vote is needed (that is, at nodes
where the descender halts).

For predicting a class, :obj:`operator()` calls the
descender. If it reaches a leaf, the class is predicted by the
leaf's :obj:`nodeClassifier`. Otherwise, it calls
:obj:`vote`. From now on, :obj:`vote` and
:obj:`classDistribution` interweave down the tree and return
a distribution of predictions. :obj:`operator()` then simply
chooses the most probable class.

========
Learning
========

The main learning object is :obj:`TreeLearnerBase`. It is basically
a skeleton into which the user must plug the components for particular
functions. For easier use, defaults are provided.

Components that govern the structure of the tree are :obj:`split`
(of type :obj:`SplitConstructor`), :obj:`stop` (of
type :obj:`StopCriteria`) and :obj:`exampleSplitter`
(of type :obj:`ExampleSplitter`).

.. class:: SplitConstructor

    Finds a suitable criterion for dividing the learning (and later testing)
    examples coming to the node. The data it gets is a set of examples
    (and, optionally, an ID of the weight meta-attribute), a domain
    contingency computed from examples, apriori class probabilities,
    a list of candidate attributes it should consider and a node
    classifier (if it was constructed, that is, if
    :obj:`storeNodeClassifier` is left true).

    The :obj:`SplitConstructor` should use the domain contingency
    when possible. The reasons are two-fold; one is that it's faster
    and the other is that the contingency matrices are not necessarily
    constructed by simply counting the examples. Why and how is
    explained later. There are, however, cases when the domain
    contingency does not suffice; for example, when ReliefF is used as
    a measure of quality of attributes. In this case, there's no other
    way but to use the examples and ignore the precomputed contingencies.

    The split constructor should consider only the attributes in the
    candidate list (the list is a vector of booleans, one for each
    attribute).

    :obj:`SplitConstructor` returns most of the data we talked
    about when describing the :obj:`Node`. It returns a classifier
    to be used as :obj:`Node`'s :obj:`branchSelector`, a list of
    branch descriptions and a list with the number of examples that
    go into each branch. Just what we need for the :obj:`Node`.
    It can return an empty list for the number of examples in branches;
    in this case, the :obj:`TreeLearnerBase` will find the number itself
    after splitting the example set into subsets. However, if a split
    constructor can provide the numbers at no extra computational
    cost, it should do so.

    In addition, it returns the quality of the split: a number without
    any fixed meaning except that higher numbers mean better splits.

    If the constructed splitting criterion uses an attribute in such
    a way that the attribute is 'completely spent' and should not be
    considered as a split criterion in any of the subtrees (the
    typical case of this are discrete attributes that are used
    as-they-are, that is, without any binarization or subsetting),
    then it should report the index of this attribute. Some splits
    do not spend any attribute; this is indicated by returning a
    negative index.

    A :obj:`SplitConstructor` can veto the further tree induction
    by returning no classifier. This can happen for many reasons.
    A general one is related to the number of examples in the branches.
    :obj:`SplitConstructor` has a field :obj:`minSubset`,
    which sets the minimal number of examples in a branch; null nodes,
    however, are allowed. If there is no split where this condition
    is met, :obj:`SplitConstructor` stops the induction.

    .. attribute:: minSubset

        Sets the minimal number of examples in non-null leaves. As
        always in Orange (where not specified otherwise), "number of
        examples" refers to the weighted number of examples.

    .. method:: __call__(examples, [weightID=0, apriori_distribution, candidates])

        Construct a split. Returns a tuple (:obj:`branchSelector`,
        :obj:`branchDescriptions`, :obj:`subsetSizes`, :obj:`quality`,
        :obj:`spentAttribute`). :obj:`spentAttribute` is -1 if no
        attribute is completely spent by the split criterion. If no
        split is constructed, the :obj:`branchSelector`, :obj:`branchDescriptions`
        and :obj:`subsetSizes` are None, while :obj:`quality` is 0.0 and
        :obj:`spentAttribute` is -1.

        :param examples:  Examples can be given in any acceptable form
            (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a
            list of examples).
        :param weightID: Optional; the default of 0 means that all
            examples have a weight of 1.0.
        :param apriori_distribution: Apriori class probabilities, of type
            :obj:`orange.Distribution`.
        :param candidates: A Python list of objects which are interpreted
            as booleans, specifying which attributes to consider.

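Since these components can be subclassed in Python (see the section on
classes below), the contract just described can be illustrated with a
sketch. It is an illustration of the return values only, not a useful
split: it picks the first discrete candidate attribute instead of
scoring candidates with a measure, and it assumes the documented
:obj:`__call__` signature (the C++ callback may pass further
arguments, such as the domain contingency):

::

    import Orange
    import orange

    class FirstCandidateSplit(Orange.classification.tree.SplitConstructor):
        def __call__(self, examples, weightID=0, apriori=None, candidates=None):
            domain = examples.domain
            for i, attr in enumerate(domain.attributes):
                if candidates[i] and attr.varType == orange.VarTypes.Discrete:
                    # a selector that simply returns the attribute's value
                    selector = orange.ClassifierFromVarFD(
                        position=i, domain=domain, classVar=attr)
                    descriptions = list(attr.values)
                    # empty subset sizes: the learner will count them itself
                    return (selector, descriptions, [], 1.0, i)
            return (None, None, None, 0.0, -1)   # veto further induction
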
.. class:: StopCriteria

    Given a set of examples, weight ID and contingency matrices, decide
    whether to continue the induction or not. The basic criterion checks
    whether there are any examples and whether they belong to at least
    two different classes (if the class is discrete). Derived components
    check things like the number of examples and the proportion of
    majority classes.

    As opposed to :obj:`SplitConstructor` and similar basic classes,
    :obj:`StopCriteria` is not an abstract but a fully functional
    class that provides the basic stopping criteria. That is, the tree
    induction stops when there is at most one example left; in this case,
    it is not the weighted but the actual number of examples that counts.
    Besides that, the induction stops when all examples are in the same
    class (for discrete problems) or have the same value of the outcome
    (for regression problems).

    .. method:: __call__(examples[, weightID, contingencies])

        Decides whether to stop (true) or continue (false) the induction.
        If contingencies are given, they are used for checking whether
        the examples are in the same class (but not for counting the
        examples). Derived classes should use the contingencies whenever
        they are available; if the contingencies are not given,
        :obj:`StopCriteria` will work without them, and derived classes
        should compute them only when they really need them.

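A stopping criterion of your own is just another callable. The sketch
below stops the induction as soon as fewer than ten examples remain
(an illustration only, assuming the documented call convention; a
complete criterion would also keep the default checks described
above):

::

    import Orange

    class StopAtTen(Orange.classification.tree.StopCriteria):
        def __call__(self, examples, weightID=0, contingencies=None):
            # true means "stop"; a real criterion would also stop
            # when all examples belong to the same class
            return len(examples) < 10
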

.. class:: ExampleSplitter

    Just like the :obj:`Descender` decides the branch for an
    example during classification, the :obj:`ExampleSplitter`
    sorts the learning examples into branches.

    :obj:`ExampleSplitter` is given a :obj:`Node` (it can use any of
    its fields, though most splitters use only the
    :obj:`branchSelector`), a set of examples to be divided, and
    the weight ID. The result is a list of subsets of examples
    and, optionally, a list of new weight IDs.

    Subsets are usually stored as :obj:`ExamplePointerTable`'s. Most
    :obj:`ExampleSplitter`'s simply call the node's
    :obj:`branchSelector` and assign examples to the corresponding
    branches. When the value is unknown they choose a particular
    branch or simply skip the example.

    Some enhanced splitters can split examples. An example (actually,
    a pointer to it) is copied to more than one subset. To facilitate
    real splitting, weights are needed. Each branch is assigned a
    weight ID (each would usually have its own ID) and all examples
    that are in that branch (either completely or partially) should
    have this meta attribute. If an example hasn't been split, it
    has only one additional attribute, with the weight ID corresponding
    to the subset to which it went. An example that is split between,
    say, three subsets has three new meta attributes, one for each
    subset. The IDs of the weight meta attributes are returned by the
    :obj:`ExampleSplitter` to be used at the induction of the
    corresponding subtrees.

    Note that weights are used only when needed. When no splitting
    occurred - because the splitter is not able to do it or because
    there was no need for splitting - no weight IDs are returned.

    :obj:`ExampleSplitter` is an abstract base class for objects that
    split sets of examples into subsets. The derived classes differ in
    the treatment of examples which cannot be unambiguously placed into
    a single branch (usually due to an unknown value of the crucial
    attribute).

    .. method:: __call__(node, examples[, weightID])

        Use the information in :obj:`node` (particularly the
        :obj:`branchSelector`) to split the given set of examples into subsets.
        Return a tuple with a list of example generators and a list of weights.
        The list of weights is either an ordinary python list of integers or
        None when no splitting of examples occurs and thus no weights are
        needed.

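For illustration, a splitter that behaves like the
:obj:`ExampleSplitter_IgnoreUnknowns` described below might be
sketched like this (assumptions: the documented call convention, and
plain :obj:`ExampleTable`'s instead of :obj:`ExamplePointerTable`'s
for the subsets):

::

    import Orange
    import orange

    class IgnoringSplitter(Orange.classification.tree.ExampleSplitter):
        def __call__(self, node, examples, weightID=0):
            subsets = [orange.ExampleTable(examples.domain)
                       for branch in node.branches]
            for example in examples:
                branch = node.branchSelector(example)
                if not branch.isSpecial():       # a single branch was chosen
                    subsets[int(branch)].append(example)
            return subsets, None                 # no splitting, no new weights
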
.. class:: TreeLearnerBase

    TreeLearnerBase has a number of components.

    .. attribute:: split

        Object of type :obj:`SplitConstructor`. Default value,
        provided by :obj:`TreeLearnerBase`, is :obj:`SplitConstructor_Combined`
        with separate constructors for discrete and continuous attributes.
        Discrete attributes are used as they are, while continuous attributes
        are binarized. Gain ratio is used to select attributes.
        A minimum of two examples in a leaf is required for discrete
        attributes and five examples in a leaf for continuous attributes.

    .. attribute:: stop

        Object of type :obj:`StopCriteria`. The default stopping
        criterion stops induction when all examples in a node belong
        to the same class.

    .. attribute:: splitter

        Object of type :obj:`ExampleSplitter`. The default splitter
        is :obj:`ExampleSplitter_UnknownsAsSelector` that splits
        the learning examples according to distributions given by the
        selector.

    .. attribute:: contingencyComputer

        By default, this slot is left empty and ordinary contingency
        matrices are computed for examples at each node. If need arises,
        one can change the way the matrices are computed. This can be
        used to change the way that unknown values are treated when
        assessing qualities of attributes. As mentioned earlier,
        the computed matrices can be used by the split constructor and
        by the stopping criteria. On the other hand, they can be (and are)
        ignored by some splitting constructors.

    .. attribute:: nodeLearner

        Induces a classifier from examples belonging to a node. The
        same learner is used for internal nodes and for leaves. The
        default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`.

    .. attribute:: descender

        Descending component that the induced :obj:`TreeClassifier`
        will use. Default descender is
        :obj:`Descender_UnknownMergeAsSelector` which votes using
        the :obj:`branchSelector`'s distribution for vote weights.

    .. attribute:: maxDepth

        Gives maximal tree depth; 0 means that only the root is generated.
        The default is 100 to prevent any infinite tree induction due
        to a mis-set stopping criterion. If you are sure you need
        larger trees, increase it. If you, on the other hand, want
        to lower this hard limit, you can do so as well.

    .. attribute:: storeDistributions, storeContingencies, storeExamples, storeNodeClassifier

        Decides whether to store class distributions, contingencies
        and examples in :obj:`Node`, and whether the
        :obj:`nodeClassifier` should be built for internal nodes.
        By default, distributions and node classifiers are stored,
        while contingencies and examples are not. You won't save any
        memory by not storing distributions but storing contingencies,
        since the distribution actually points to the same distribution
        that is stored in :obj:`contingency.classes`.

    The :obj:`TreeLearnerBase` first sets the defaults for missing
    components. Although stored in the actual :obj:`TreeLearnerBase`'s
    fields, they are removed when the induction is finished.

    Then it ensures that examples are stored in a table. This is needed
    because the algorithm juggles with pointers to examples. If
    examples are in a file or are fed through a filter, they are copied
    to a table. Even if they are already in a table, they are copied
    if :obj:`storeExamples` is set. This is to ensure that pointers
    remain pointing to examples even if the user later changes the
    example table. If they are in the table and the :obj:`storeExamples`
    flag is clear, we just use them as they are. This will obviously
    crash in a multi-threaded system if one changes the table during
    the tree induction. Well... don't do it.

    Apriori class probabilities are computed. At this point we check
    the sum of example weights; if it's zero, there are no examples and
    we cannot proceed. A list of candidate attributes is set; in the
    beginning, all attributes are candidates for the split criterion.

    Now comes the recursive part of the :obj:`TreeLearnerBase`. Its arguments
    are a set of examples, a weight meta-attribute ID (a tricky thing,
    it can be always the same as the original or can change to
    accommodate splitting of examples among branches), apriori class
    distribution and a list of candidates (represented as a vector
    of Boolean values).

    The contingency matrix is computed next. This happens
    even if the flag :obj:`storeContingencies` is false.
    If the :obj:`contingencyComputer` is given we use it,
    otherwise we construct just an ordinary contingency matrix.

    A :obj:`stop` is called to see whether it's worth continuing. If
    not, a :obj:`nodeClassifier` is built and the :obj:`Node` is
    returned. Otherwise, a :obj:`nodeClassifier` is only built if the
    :obj:`forceNodeClassifier` flag is set.

    To get a :obj:`Node`'s :obj:`nodeClassifier`, the
    :obj:`nodeLearner`'s :obj:`smartLearn` function is called with
    the given examples, weight ID and the just computed matrix. If
    the learner can use the matrix (and the default,
    :obj:`Orange.classification.majority.MajorityLearner`, can), it won't touch the examples. Thus,
    a choice of :obj:`contingencyComputer` will, in many cases,
    affect the :obj:`nodeClassifier`. The :obj:`nodeLearner` can
    return no classifier; if so and if the classifier would be
    needed for classification, the :obj:`TreeClassifier`'s function
    returns DK (don't know) or an empty distribution. If you're writing
    your own tree classifier - pay attention.

    If the induction is to continue, a :obj:`split` component is called.
    If it fails to return a branch selector, induction stops and the
    :obj:`Node` is returned.

    :obj:`TreeLearnerBase` then uses :obj:`ExampleSplitter` to divide
    the examples as described above.

    The contingency gets removed at this point if it is not to be
    stored. Thus, the :obj:`split`, :obj:`stop` and
    :obj:`exampleSplitter` can use the contingency matrices if they wish.

    The :obj:`TreeLearnerBase` then recursively calls itself for each of
    the non-empty subsets. If the splitter returns a list of weights,
    a corresponding weight is used for each branch. Besides, the
    attribute spent by the splitter (if any) is removed from the
    list of candidates for the subtree.

    A subset of examples is stored in its corresponding tree node,
    if so requested. If not, the new weight attributes are removed (if
    any were created).

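The recursive part just described can be condensed into Python-like
pseudocode (a sketch of the control flow only; ``Node``,
``computeContingency`` and the exact component call signatures stand
in for the actual C++ internals):

::

    def induceSubtree(learner, examples, weightID, apriori, candidates, depth):
        contingency = computeContingency(learner, examples, weightID)
        node = Node()
        if learner.stop(examples, weightID, contingency) \
                or depth >= learner.maxDepth:
            node.nodeClassifier = learner.nodeLearner(examples, weightID)
            return node
        if learner.storeNodeClassifier:
            node.nodeClassifier = learner.nodeLearner(examples, weightID)
        selector, descriptions, sizes, quality, spent = learner.split(
            examples, weightID, apriori, candidates)
        if not selector:
            return node                       # the split constructor vetoed
        node.branchSelector = selector
        node.branchDescriptions = descriptions
        subsets, weights = learner.splitter(node, examples, weightID)
        if spent >= 0:                        # attribute completely spent
            candidates = candidates[:spent] + [0] + candidates[spent + 1:]
        node.branches = []
        for i, subset in enumerate(subsets):
            if subset:
                w = weights[i] if weights else weightID
                node.branches.append(induceSubtree(
                    learner, subset, w, apriori, candidates, depth + 1))
            else:
                node.branches.append(None)    # empty (null) branch
        return node
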
Pruning
=======

Tree pruners derived from :obj:`Pruner` can be given either a
:obj:`Node` (presumably, but not necessarily a root) or a
:obj:`TreeClassifier`. The result is a new, pruned :obj:`Node`
or a new :obj:`TreeClassifier` with a pruned tree. The original
tree remains intact.

Note however that pruners construct only a shallow copy of a tree.
The pruned tree's :obj:`Node`'s contain references to the same
contingency matrices, node classifiers, branch selectors, ...
as the original tree. Thus, you may modify a pruned tree structure
(manually cut it, add new nodes, replace components) but modifying,
for instance, some node's :obj:`nodeClassifier` (the
:obj:`nodeClassifier` itself, not a reference to it!) would modify
the node's :obj:`nodeClassifier` in the corresponding node of
the original tree.

Speaking of node classifiers - pruners cannot construct a
:obj:`nodeClassifier` nor merge the :obj:`nodeClassifier`'s of the
pruned subtrees into classifiers for new leaves. Thus, if you want to
build a prunable tree, internal nodes must have their :obj:`nodeClassifier`
defined. Fortunately, all you need to do is nothing; if you leave
the :obj:`TreeLearnerBase`'s flags as they are by default, the
:obj:`nodeClassifier`'s are created.

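In practice, pruning is a one-liner once a tree is induced. A minimal
sketch (assuming the lenses data used throughout this page):

::

    import Orange

    data = Orange.data.Table("lenses")
    tree = Orange.classification.tree.TreeLearner(data)
    prunedTree = Orange.classification.tree.Pruner_m(m=2.0)(tree)
    # `tree` is unchanged; `prunedTree` is a new classifier
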
=======
Classes
=======

Several classes described above are already functional and can
(and mostly will) be used as they are. Those classes are :obj:`Node`,
:obj:`TreeLearnerBase` and :obj:`TreeClassifier`. This section describes
the other classes.

Classes :obj:`SplitConstructor`, :obj:`StopCriteria`,
:obj:`ExampleSplitter`, :obj:`Descender`, :obj:`orange.Learner`
and :obj:`Classifier` are among the Orange classes that can be subtyped
in Python and have the call operator overloaded in such a way that it
can be called back from C++ code. You can thus program your own components
for :obj:`TreeLearnerBase` and :obj:`TreeClassifier`. The detailed
information on how this is done and what can go wrong is given in a
separate page, dedicated to callbacks to Python XXXXXXXXXX.

SplitConstructors
=====================

Split construction is almost as exciting as waiting for a delayed flight.
Boring, that is. Split constructors are programmed as spaghetti code
that juggles with contingency matrices, with separate cases for discrete
and continuous classes... Most split constructors work either for
discrete or for continuous attributes. The suggested practice is
to use a :obj:`SplitConstructor_Combined` that can handle
both by simply delegating attributes to specialized split constructors.

Note: split constructors that cannot handle attributes of a particular
type (discrete, continuous) do not report an error or a warning but
simply skip the attribute. It is your responsibility to use a correct
split constructor for your dataset. (May we again suggest
using :obj:`SplitConstructor_Combined`?)

The same components can be used for inducing both classification and
regression trees. The only component that needs to be chosen accordingly
is the 'measure' attribute for the :obj:`SplitConstructor_Measure`
class (and derived classes).

.. class:: SplitConstructor_Measure

    Bases: :class:`SplitConstructor`

    An abstract base class for split constructors that employ
    a :obj:`Orange.feature.scoring.Measure` to assess the quality of a split.
    At present,
    all split constructors except for :obj:`SplitConstructor_Combined`
    are derived from this class.

    .. attribute:: measure

        A component of type :obj:`Orange.feature.scoring.Measure` used for
        the evaluation of a split. Note that you must select a subclass of
        :obj:`Orange.feature.scoring.Measure` capable of handling your
        class type
        - you cannot use :obj:`Orange.feature.scoring.GainRatio`
        for building regression trees or :obj:`Orange.feature.scoring.MSE`
        for classification trees.

    .. attribute:: worstAcceptable

        The lowest required split quality for a split to be acceptable.
        Note that this value makes sense only in connection with a
        :obj:`measure` component. Default is 0.0.

.. class:: SplitConstructor_Attribute

    Bases: :class:`SplitConstructor_Measure`

    Attempts to use a discrete attribute as a split; each value of the
    attribute corresponds to a branch in the tree. Attributes are
    evaluated with the :obj:`measure` and the one with the
    highest score is used for a split. If there is more than one
    attribute with the highest score, one of them is selected at random.

    The constructed :obj:`branchSelector` is an instance of
    :obj:`orange.ClassifierFromVarFD` that returns a value of the
    selected attribute. If the attribute is
    :obj:`Orange.data.variable.Discrete`,
    the :obj:`branchDescriptions` are the attribute's values. The
    attribute is marked as spent, so that it cannot reappear in the
    node's subtrees.

.. class:: SplitConstructor_ExhaustiveBinary

    Bases: :class:`SplitConstructor_Measure`

    Works on discrete attributes. For each attribute, it determines
    which binarization of the attribute gives the split with the
    highest score. If more than one split has the highest score,
    one of them is selected at random. After trying all the attributes,
    it returns one of those with the highest score.

    The constructed :obj:`branchSelector` is again an instance of
    :obj:`orange.ClassifierFromVarFD` that returns a value of the
    selected attribute. This time, however, its :obj:`transformer`
    contains an instance of :obj:`MapIntValue` that maps the values
    of the attribute into a binary attribute. Branch descriptions are
    of the form "[<val1>, <val2>, ...<valn>]" for branches corresponding to
    more than one value of the attribute. Branches that correspond to
    a single value of the attribute are described with this value. If
    the attribute was originally binary, it is spent and cannot be
    used in the node's subtrees. Otherwise, it can reappear in the
    subtrees.


.. class:: SplitConstructor_Threshold

    Bases: :class:`SplitConstructor_Measure`

    This is currently the only constructor for splits with continuous
    attributes. It divides the range of attribute values with a threshold
    that maximizes the split's quality. As always, if there is more than
    one split with the highest score, a random threshold is selected.
    The attribute that yields the highest binary split is returned.

    The constructed :obj:`branchSelector` is again an instance of
    :obj:`orange.ClassifierFromVarFD` with an attached
    :obj:`transformer`. This time, :obj:`transformer` is of type
    :obj:`orange.ThresholdDiscretizer`. The branch descriptions are
    "<threshold" and ">=threshold". The attribute is not spent.

.. class:: SplitConstructor_OneAgainstOthers

    Bases: :class:`SplitConstructor_Measure`

    Undocumented.

.. class:: SplitConstructor_Combined

    Bases: :class:`SplitConstructor`

    This constructor delegates the task of finding the optimal split
    to separate split constructors for discrete and for continuous
    attributes. Each split constructor is called, given only attributes
    of the appropriate type as candidates. Both construct a candidate for
    a split; the better of them is selected.

    (Note that there is a problem when more candidates have the same
    score. Suppose there are nine discrete attributes with the highest
    score; the split constructor for discrete attributes will select
    one of them. Now, let us suppose that there is a single continuous
    attribute with the same score. :obj:`SplitConstructor_Combined`
    would randomly select between the proposed discrete attribute and
    the continuous attribute, not aware of the fact that the discrete
    attribute has already competed with eight other discrete attributes.
    So, the probability for selecting (each) discrete attribute would be
    1/18 instead of 1/10. Although not really correct, we doubt that this
    would affect the tree's performance; many other machine learning
    systems simply choose the first attribute with the highest score
    anyway.)

    The :obj:`branchSelector`, :obj:`branchDescriptions` and whether
    the attribute is spent is decided by the winning split constructor.

    .. attribute:: discreteSplitConstructor

        Split constructor for discrete attributes; can be, for instance,
        :obj:`SplitConstructor_Attribute` or
        :obj:`SplitConstructor_ExhaustiveBinary`.

    .. attribute:: continuousSplitConstructor

        Split constructor for continuous attributes; at the moment, it
        can be either :obj:`SplitConstructor_Threshold` or a
        split constructor you programmed in Python.

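Plugging a combined constructor into a learner is then just a matter
of assembling the pieces (a sketch; the component names follow the
attributes documented above):

::

    import Orange

    learner = Orange.classification.tree.TreeLearnerBase()
    split = Orange.classification.tree.SplitConstructor_Combined()
    split.discreteSplitConstructor = \
        Orange.classification.tree.SplitConstructor_ExhaustiveBinary()
    split.continuousSplitConstructor = \
        Orange.classification.tree.SplitConstructor_Threshold()
    learner.split = split
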

StopCriteria and StopCriteria_common
============================================

:obj:`StopCriteria` determines when to stop the induction of subtrees, as
described in detail in the description of the learning process. XXXXXXXXXX

.. class:: StopCriteria_common

    :obj:`StopCriteria_common` contains additional criteria for pre-pruning:
    it checks the proportion of majority class and the number of weighted
    examples.

    .. attribute:: maxMajor

        Maximal proportion of majority class. When this is exceeded,
        induction stops.

    .. attribute:: minExamples

        Minimal number of examples in internal nodes: subsets with fewer
        than :obj:`minExamples` examples are not split any further.
        The example count is weighted.

.. class:: StopCriteria_Python

    Undocumented.


Classes derived from ExampleSplitter
========================================

:obj:`ExampleSplitter` is the third crucial component of
:obj:`TreeLearnerBase`. Its function is described in
the description of the learning process. XXXXXXXXXX

.. class:: ExampleSplitter_IgnoreUnknowns

    Bases: :class:`ExampleSplitter`

    Simply ignores the examples for which no single branch can be determined.

.. class:: ExampleSplitter_UnknownsToCommon

    Bases: :class:`ExampleSplitter`

    Places all such examples into the branch with the highest number of
    examples. If there is more than one such branch, one is selected at
    random and then used for all examples.

.. class:: ExampleSplitter_UnknownsToAll

    Bases: :class:`ExampleSplitter`

    Places examples with an unknown value of the attribute into all branches.

.. class:: ExampleSplitter_UnknownsToRandom

    Bases: :class:`ExampleSplitter`

    Selects a random branch for such examples.

.. class:: ExampleSplitter_UnknownsToBranch

    Bases: :class:`ExampleSplitter`

    Constructs an additional branch to contain all such examples.
    The branch's description is "unknown".

.. class:: ExampleSplitter_UnknownsAsBranchSizes

    Bases: :class:`ExampleSplitter`

    Splits examples with an unknown value of the attribute according to
    the proportions of examples in each branch.

.. class:: ExampleSplitter_UnknownsAsSelector

    Bases: :class:`ExampleSplitter`

    Splits examples with an unknown value of the attribute according to
    the distribution proposed by the selector (which is in most cases the
    same as the proportions of examples in branches).


Descender and derived classes
=================================

This is a classifier's counterpart for :obj:`ExampleSplitter`. It
decides the destiny of examples that need to be classified and cannot
be unambiguously put in a branch. The detailed function of this class
is given in the description of classification with trees. XXXXXX

.. class:: Descender

    An abstract base object for tree descenders.

    .. method:: __call__(node, example)

        Descends down the tree until it reaches a leaf or a node in
        which a vote of subtrees is required. In both cases, a tuple
        of two elements is returned; in the former, the tuple contains
        the reached node and None, in the latter it
        contains a node and weights of votes for subtrees (a list of floats).

        :obj:`Descender`'s that never split examples always descend
        to a leaf, but they differ in the treatment of examples with
        unknown values (or, in general, examples for which a branch
        cannot be determined at some node(s) of the tree).
        :obj:`Descender`'s that do split examples differ in the
        returned vote weights.

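As with the other components, a descender can also be written in
Python. A sketch that always descends to a leaf, falling back to the
first non-null branch whenever the selector cannot decide (only an
illustration of the return convention documented above):

::

    import Orange

    class DescendToLeaf(Orange.classification.tree.Descender):
        def __call__(self, node, example):
            while node.branchSelector:
                branch = node.branchSelector(example)
                next = None
                if not branch.isSpecial():
                    next = node.branches[int(branch)]
                if next is None:               # unknown value or null branch
                    next = [b for b in node.branches if b][0]
                node = next
            return node, None                  # always a leaf, never a vote
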
.. class:: Descender_UnknownsToNode

    Bases: :obj:`Descender`

    When an example cannot be classified into a single branch, the
    current node is returned. Thus, the node's :obj:`NodeClassifier`
    will be used to make a decision. It is your responsibility to see
    that even the internal nodes have their :obj:`NodeClassifier`
    (i.e., don't disable creating node classifiers or manually remove
    them after the induction, that's all).

.. class:: Descender_UnknownsToBranch

    Bases: :obj:`Descender`

    Classifies examples with an unknown value to a special branch. This
    makes sense only if the tree itself was constructed with
    :obj:`ExampleSplitter_UnknownsToBranch`.

.. class:: Descender_UnknownsToCommonBranch

    Bases: :obj:`Descender`

    Classifies examples with unknown values to the branch with the
    highest number of examples. If there is more than one such branch,
    a random branch is chosen for each example that is to be classified.

.. class:: Descender_UnknownsToCommonSelector

    Bases: :obj:`Descender`

    Classifies examples with unknown values to the branch which received
    the highest recommendation by the selector.

.. class:: Descender_MergeAsBranchSizes

    Bases: :obj:`Descender`

    Makes the subtrees vote for the example's class; the vote is
    weighted according to the sizes of the branches.

.. class:: Descender_MergeAsSelector

    Bases: :obj:`Descender`

    Makes the subtrees vote for the example's class; the vote is
    weighted according to the selector's proposal.


Pruner and derived classes
==============================

.. index::
    pair: classification trees; pruning

Classes derived from :obj:`Pruner` prune the trees as
described in the section on pruning XXXXXXXX - make sure you read it
to understand what the pruners will do to your trees.

.. class:: Pruner

    This is an abstract base class which defines nothing useful, only
    a pure virtual call operator.

    .. method:: __call__(tree)

        Prunes a tree. The argument can be either a tree classifier or
        a tree node; the result is of the same type as the argument.

.. class:: Pruner_SameMajority

    Bases: :class:`Pruner`

    In Orange, a tree can have non-trivial subtrees (i.e. subtrees
    with more than one leaf) in which all the leaves have the same majority
    class. (This is allowed because those leaves can still have different
    distributions of classes and thus predict different probabilities.)
    However, this can be undesired when we're only interested in the
    class prediction or a simple tree interpretation. The
    :obj:`Pruner_SameMajority` prunes the tree so that there is no
    subtree in which all the nodes would have the same majority class.

    This pruner will only prune the nodes in which the node classifier
    is of class :obj:`orange.DefaultClassifier` (or from a derived class).

    Note that leaves with more than one majority class require some
    special handling. The pruning goes backwards, from leaves to the root.
    When siblings are compared, the algorithm checks whether they
    have (at least one) common majority class. If so, they can be pruned.

.. class:: Pruner_m

    Bases: :class:`Pruner`

    Prunes a tree by comparing m-estimates of static and dynamic
    error as defined in (Bratko, 2002).

    .. attribute:: m

        Parameter m for m-estimation.

========
Examples
========

This page does not provide examples for programming your own components,
such as, for instance, a :obj:`SplitConstructor`. Those examples
can be found on a page dedicated to callbacks to Python XXXXXXXX.

Tree Structure
==============

To have something to work on, we'll take the data from the lenses dataset
and build a tree using the default components (part of `treestructure.py`_, uses `lenses.tab`_):

.. literalinclude:: code/treestructure.py
   :lines: 7-10

How big is our tree (part of `treestructure.py`_, uses `lenses.tab`_)?

.. _lenses.tab: code/lenses.tab
.. _treestructure.py: code/treestructure.py

.. literalinclude:: code/treestructure.py
   :lines: 12-21

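The included snippet presumably looks much like this sketch,
reconstructed from the explanation that follows (the exact code lives
in `treestructure.py`_):

::

    def treeSize(node):
        if not node:
            return 0                       # a null node
        size = 1
        if node.branchSelector:            # an internal node
            for branch in node.branches:
                size += treeSize(branch)
        return size
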
If node is None, we have a null node; null nodes don't count,
so we return 0. Otherwise, the size is 1 (this node) plus the
sizes of all subtrees. The node is an internal node if it has a
:obj:`branchSelector`; if there's no selector, it's a leaf. Don't
attempt to skip the if statement: leaves don't have an empty list
of branches, they don't have a list of branches at all.

    >>> treeSize(treeClassifier.tree)
    10

Don't forget that this was only an exercise - :obj:`Node` has a
built-in method :obj:`Node.treeSize` that does exactly the same.

Let us now write a simple script that prints out a tree. The recursive
part of the function will get a node and its level (part of `treestructure.py`_, uses `lenses.tab`_).

.. literalinclude:: code/treestructure.py
   :lines: 26-41

Don't waste time on studying the formatting tricks (\n's etc.); they
are just for nicer output. What matters is everything but the print
statements. First, we check whether the node is a null node (a node to
which no learning examples were classified). If this is so, we just
print out "<null node>" and return.

After handling null nodes, the remaining nodes are internal nodes and leaves.
For internal nodes, we print a node description consisting of the
attribute's name and the distribution of classes. :obj:`Node`'s branch
selector is, for all currently defined splits, an instance of a
class derived from :obj:`orange.Classifier` (in fact, it is a
:obj:`orange.ClassifierFromVarFD`, but a :obj:`orange.Classifier` would
suffice), and its :obj:`classVar` XXXXX points to the attribute we seek.
So we print its name. We will also assume that storing class distributions
has not been disabled and print them as well. A more able function for
printing trees (like the one defined in XXXXXXXXXX) has an alternative
means to get the distribution, when this fails. Then we iterate
through the branches; for each we print a branch description and
recursively call :obj:`printTree0` with a level increased by 1 (to
increase the indent).

Finally, if the node is a leaf, we print out the distribution of
learning examples in the node and the class to which the examples in
the node would be classified. We again assume that the :obj:`nodeClassifier`
is the default one - a :obj:`DefaultClassifier`. A better print
function should be aware of possible alternatives.

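Again, a hedged reconstruction of the included `printTree0` (based on
the description above and on the sample output below; it assumes the
default classifier exposes the predicted class as :obj:`defaultValue`,
and the actual file may format things slightly differently):

::

    def printTree0(node, level):
        if not node:
            print " " * level + "<null node>"
            return
        if node.branchSelector:            # an internal node
            nodeDesc = node.branchSelector.classVar.name
            print "%s (%s)" % (nodeDesc, node.distribution)
            for i in range(len(node.branches)):
                print " " * level + ": %s" % node.branchDescriptions[i],
                printTree0(node.branches[i], level + 1)
        else:                              # a leaf
            majorClass = node.nodeClassifier.defaultValue
            print "--> %s (%s)" % (majorClass, node.distribution)
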
Now, we just need to write a simple function to call our printTree0.
We could write something like...

::

    def printTree(x):
        printTree0(x.tree, 0)

... but we won't. Let us learn how to handle arguments of different
types. Let's write a function that will accept either a :obj:`TreeClassifier`
or a :obj:`Node`; just like Pruners XXXXXX, remember? Part of `treestructure.py`_, uses `lenses.tab`_.

.. literalinclude:: code/treestructure.py
   :lines: 43-49

It's fairly straightforward: if :obj:`x` is of a type derived from
:obj:`TreeClassifier`, we print :obj:`x.tree`; if it's a
:obj:`Node` we just call :obj:`printTree0` with :obj:`x`. If it's
of some other type, we don't know how to handle it and thus raise
an exception. (Note that we could also use

::

    if type(x) == Orange.classification.tree.TreeClassifier:

but this would only work if :obj:`x` were of type
:obj:`TreeClassifier` and not of any derived type. The latter,
however, do not exist yet...)

    >>> printTree(treeClassifier)
    tear_rate (<15.000, 5.000, 4.000>)
    : reduced --> none (<12.000, 0.000, 0.000>)
    : normal
       astigmatic (<3.000, 5.000, 4.000>)
       : no
          age (<1.000, 5.000, 0.000>)
          : young --> soft (<0.000, 2.000, 0.000>)
          : pre-presbyopic --> soft (<0.000, 2.000, 0.000>)
          : presbyopic --> none (<1.000, 1.000, 0.000>)
       : yes
          prescription (<2.000, 0.000, 4.000>)
          : myope --> hard (<0.000, 0.000, 3.000>)
          : hypermetrope --> none (<2.000, 0.000, 1.000>)

For a final exercise, let us write a simple pruning procedure. It will
be written entirely in Python, unrelated to any :obj:`Pruner`. Our
procedure will limit the tree depth - the maximal depth (here defined
as the number of internal nodes on any path down the tree) shall be
given as an argument. For example, to get a two-level tree, we would
call cutTree(root, 2). The function will be recursive, with the second
argument (level) decreasing at each call; when zero, the current node
will be made a leaf (part of `treestructure.py`_, uses `lenses.tab`_):

.. literalinclude:: code/treestructure.py
   :lines: 54-62

There's nothing to prune at null nodes or leaves, so we act only when
:obj:`node` and :obj:`node.branchSelector` are defined. If level is
not zero, we call the function for each branch. Otherwise, we clear
the selector, branches and branch descriptions.

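A sketch of what the included `cutTree` presumably looks like
(reconstructed from the description above):

::

    def cutTree(node, level):
        if node and node.branchSelector:   # internal, non-null node
            if level:
                for branch in node.branches:
                    cutTree(branch, level - 1)
            else:                          # turn the node into a leaf
                node.branchSelector = None
                node.branches = None
                node.branchDescriptions = None
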
    >>> cutTree(tree.tree, 2)
    >>> printTree(tree)
    tear_rate (<15.000, 5.000, 4.000>)
    : reduced --> none (<12.000, 0.000, 0.000>)
    : normal
       astigmatic (<3.000, 5.000, 4.000>)
       : no --> soft (<1.000, 5.000, 0.000>)
       : yes --> hard (<2.000, 0.000, 4.000>)

Learning
========

You've already seen a simple example of using a :obj:`TreeLearnerBase`.
You can just call it and let it fill the empty slots with the default
components. This section will teach you three things: what the
missing components are (and how to set the same components yourself),
how to use alternative components to get a different tree and,
finally, how to write a skeleton for tree induction in Python.

Default components for TreeLearnerBase
======================================

Let us construct a :obj:`TreeLearnerBase` to play with.

.. _treelearner.py: code/treelearner.py

`treelearner.py`_, uses `lenses.tab`_:

.. literalinclude:: code/treelearner.py
   :lines: 7-10

There are three crucial components in learning: the split and stop
criteria, and the :obj:`exampleSplitter` (there are some others,
which become important during classification; we'll talk about them
later). They are not defined; if you use the learner, the slots are
filled temporarily but later cleared again.

::

    >>> print learner.split
    None
    >>> learner(data)
    <TreeClassifier instance at 0x01F08760>
    >>> print learner.split
    None

Stopping criteria
=================

The stop is trivial. The default is set by

::

    >>> learner.stop = Orange.classification.tree.StopCriteria_common()

Well, this is actually done in C++ and it uses a global component
that is constructed once and for all, but apart from that we did
effectively the same thing.

We can now examine the default stopping parameters.

    >>> print learner.stop.maxMajority, learner.stop.minExamples
    1.0 0.0

Not very restrictive. This keeps splitting the examples until
there's nothing left to split or all the examples are in the same
class. Let us set the minimal subset that we allow to be split to
five examples and see what comes out.

    >>> learner.stop.minExamples = 5.0
    >>> tree = learner(data)
    >>> printTree(tree)
    tear_rate (<15.000, 5.000, 4.000>)
    : reduced --> none (<12.000, 0.000, 0.000>)
    : normal
       astigmatic (<3.000, 5.000, 4.000>)
       : no
          age (<1.000, 5.000, 0.000>)
          : young --> soft (<0.000, 2.000, 0.000>)
          : pre-presbyopic --> soft (<0.000, 2.000, 0.000>)
          : presbyopic --> soft (<1.000, 1.000, 0.000>)
       : yes
          prescription (<2.000, 0.000, 4.000>)
          : myope --> hard (<0.000, 0.000, 3.000>)
          : hypermetrope --> none (<2.000, 0.000, 1.000>)

OK, that's better. If we want an even smaller tree, we can also limit
the maximal proportion of majority class.

    >>> learner.stop.maxMajority = 0.5
    >>> tree = learner(data)
    >>> printTree(tree)
    --> none (<15.000, 5.000, 4.000>)

Undocumented
============

.. class:: NodeList

    Undocumented.

.. class:: C45NodeList

    Undocumented.

===========================
C4.5 Classifier and Learner
===========================

As C4.5 is a standard benchmark in machine learning,
it is incorporated in Orange, although Orange has its own
implementation of decision trees.

The implementation uses the original Quinlan's code for learning so the
tree you get is exactly like the one that would be built by the
standalone C4.5. Upon return, however, the original tree is copied to Orange
components that contain exactly the same information plus what is needed
to make them visible from Python. To be sure that the algorithm behaves
just as the original, we use a dedicated class :class:`C45Node`
instead of reusing the components used by Orange's tree inducer
(i.e., :class:`Node`). This, however, could be done and probably
will be done in the future; we shall still retain :class:`C45Node`
but offer transformation to :class:`Node` so that routines
that work on Orange trees will also be usable for C45 trees.

:class:`C45Learner` and :class:`C45Classifier` behave
like any other Orange learner and classifier. Unlike most of Orange
learning algorithms, C4.5 does not accept weighted examples.

Building the C4.5 plug-in
=========================

We haven't been able to obtain the legal rights to distribute
C4.5 and therefore couldn't statically link it into Orange. Instead,
it's incorporated as a plug-in which you'll need to build yourself.
The procedure is trivial, but you'll need a C compiler. On Windows,
the scripts we provide work with MS Visual C and the files CL.EXE
and LINK.EXE must be on the PATH. On Linux you're equipped with
gcc. Mac OS X comes without gcc, but you can download it for
free from Apple.

Orange must be installed prior to building C4.5. (This is because
the build script will copy the created file next to Orange,
which it obviously can't if Orange isn't there yet.)

#. Download the
   `C4.5 (Release 8) sources <http://www.rulequest.com/Personal/c4.5r8.tar.gz>`_
   from the `Rule Quest's site <http://www.rulequest.com/>`_ and extract
   them into some temporary directory. The files will be modified in the
   further process, so don't use your copy of Quinlan's sources that you
   need for another purpose.
#. Download
   `buildC45.zip <http://orange.biolab.si/orange/download/buildC45.zip>`_
   and unzip its contents into the directory R8/Src of the Quinlan's
   stuff (it's the directory that contains, for instance, the file
   average.c).
#. Run buildC45.py, which will build the plug-in and put it next to
   orange.pyd (or orange.so on Linux/Mac).
#. Run python, type :samp:`import Orange` and
   create :samp:`Orange.classification.tree.C45Learner()`.
   If this fails, something went wrong; see the diagnostic messages from
   buildC45.py and read the paragraph below.
#. Finally, you can remove the Quinlan's stuff, along with everything
   created by buildC45.py.

If the process fails, here's what buildC45.py really does: it creates
.h files that wrap Quinlan's .i files and ensure that they are not
included twice. It modifies C4.5 sources to include .h's instead of
.i's. This step can hardly fail. Then follows the platform-dependent
step which compiles ensemble.c (which includes all the Quinlan's .c
files it needs) into c45.dll or c45.so and puts it next to Orange.
If this fails, but you do have a C compiler and linker, and you know
how to use them, you can compile the ensemble.c into a dynamic
library yourself. See the compile and link steps in buildC45.py,
if it helps. Anyway, after doing this, check that the built C4.5
gives the same results as the original.

1249.. class:: C45Learner
1250
1251    :class:`C45Learner`'s attributes have double names - those that
1252    you know from C4.5 command lines and the corresponding names of C4.5's
1253    internal variables. All defaults are set as in C4.5; if you change
1254    nothing, you are running C4.5.
1255
1256    .. attribute:: gainRatio (g)
1257       
        Determines whether to use information gain (false, default)
        or gain ratio for selection of attributes (true).
1260
1261    .. attribute:: batch (b)
1262
1263        Turn on batch mode (no windows, no iterations); this option is
1264        not documented in C4.5 manuals. It conflicts with "window",
1265        "increment" and "trials".
1266
1267    .. attribute:: subset (s)
1268       
        Enables subsetting (default: false, no subsetting).
1270 
1271    .. attribute:: probThresh (p)
1272
1273        Probabilistic threshold for continuous attributes (default: false).
1274
1275    .. attribute:: minObjs (m)
1276       
1277        Minimal number of objects (examples) in leaves (default: 2).
1278
1279    .. attribute:: window (w)
1280
        Initial window size (default: the maximum of 20% of the number of
        data objects and twice the square root of that number).
1283
1284    .. attribute:: increment (i)
1285
1286        The maximum number of objects that can be added to the window
1287        at each iteration (default: 20% of the initial window size).
1288
1289    .. attribute:: cf (c)
1290
        Pruning confidence level (default: 25%).
1292
1293    .. attribute:: trials (t)
1294
1295        Set the number of trials in iterative (i.e. non-batch) mode (default: 10).
1296
1297    .. attribute:: prune
1298       
        Return the pruned tree (not an original C4.5 option; default: true).
1300
1301
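For instance, a few of these options can be set through the attributes like
this (a sketch; the plug-in described above must be built first, and
:samp:`data` is any classification data table)::

    lrn = Orange.classification.tree.C45Learner()
    lrn.gainRatio = True
    lrn.minObjs = 5
    tree = lrn(data)
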
:class:`C45Learner` also offers another way of setting
the arguments: it provides the method :obj:`C45Learner.commandline`,
which is given a string and parses it the same way as C4.5 would
parse its command line.
1306
1307.. class:: C45Classifier
1308
1309    A faithful reimplementation of Quinlan's function from C4.5. The only
1310    difference (and the only reason it's been rewritten) is that it uses
1311    a tree composed of :class:`C45Node` instead of C4.5's
1312    original tree structure.
1313
1314    .. attribute:: tree
1315
1316        C4.5 tree stored as a tree of :obj:`C45Node`.
1317
1318
1319.. class:: C45Node
1320
1321    This class is a reimplementation of the corresponding *struct* from
1322    Quinlan's C4.5 code.
1323
1324    .. attribute:: nodeType
1325
1326        Type of the node:  :obj:`C45Node.Leaf` (0),
1327        :obj:`C45Node.Branch` (1), :obj:`C45Node.Cut` (2),
1328        :obj:`C45Node.Subset` (3). "Leaves" are leaves, "branches"
1329        split examples based on values of a discrete attribute,
        "cuts" cut them according to a threshold value of a continuous
        attribute, and "subsets" use discrete attributes but with subsetting,
        so that several values can go into the same branch.
1333
1334    .. attribute:: leaf
1335
1336        Value returned by that leaf. The field is defined for internal
1337        nodes as well.
1338
1339    .. attribute:: items
1340
1341        Number of (learning) examples in the node.
1342
1343    .. attribute:: classDist
1344
1345        Class distribution for the node (of type
1346        :obj:`orange.DiscDistribution`).
1347
1348    .. attribute:: tested
1349       
        The attribute used in the node's test. If the node is a leaf,
        :obj:`tested` is None; if the node is of type :obj:`Branch` or
        :obj:`Subset`, :obj:`tested` is a discrete attribute; and if the
        node is of type :obj:`Cut`, :obj:`tested` is a continuous attribute.
1354
1355    .. attribute:: cut
1356
1357        A threshold for continuous attributes, if node is of type :obj:`Cut`.
1358        Undefined otherwise.
1359
1360    .. attribute:: mapping
1361
        Mapping for nodes of type :obj:`Subset`. Element :samp:`mapping[i]`
        gives the index of the branch for examples whose value of
        :obj:`tested` is *i*. Here, *i* denotes the index of a value, not a
        :class:`Orange.data.Value`.
1365
1366    .. attribute:: branch
1367       
1368        A list of branches stemming from this node.
1369
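To illustrate how these attributes fit together, here is a small sketch
(the helper below is ours, not part of the module) that counts the leaves
of a :class:`C45Node` tree by recursive descent::

    def c45_count_leaves(node):
        # nodeType 0 is C45Node.Leaf; all other node types have branches
        if node.nodeType == 0:
            return 1
        return sum(c45_count_leaves(branch) for branch in node.branch)

Given a classifier :samp:`c45` built by :class:`C45Learner`, one would call
:samp:`c45_count_leaves(c45.tree)`.
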
1370Examples
1371========
1372
1373.. _tree_c45.py: code/tree_c45.py
1374.. _iris.tab: code/iris.tab
1375
1376The simplest way to use :class:`C45Learner` is to call it. This
1377script constructs the same learner as you would get by calling
1378the usual C4.5 (`tree_c45.py`_, uses `iris.tab`_):
1379
1380.. literalinclude:: code/tree_c45.py
1381   :lines: 7-14
1382
Arguments can be set by the usual mechanism (the two lines below do the
same, except that one uses the command-line symbols and the other the
internal variable names)
1386
1387::
1388
1389    tree = Orange.classification.tree.C45Learner(data, m=100)
1390    tree = Orange.classification.tree.C45Learner(data, minObjs=100)
1391
The way that veteran C4.5 users might prefer is through the
method :obj:`C45Learner.commandline`.
1394
1395::
1396
    lrn = Orange.classification.tree.C45Learner()
1398    lrn.commandline("-m 1 -s")
1399    tree = lrn(data)
1400
There's nothing special about using :obj:`C45Classifier` - it's
just like any other classifier. To demonstrate what the structure of
:class:`C45Node` looks like, we will show a script that prints
it out in the same format as C4.5 does.
1405
1406.. literalinclude:: code/tree_c45_printtree.py
1407
Leaves are the simplest. We just print out the value contained
in :samp:`node.leaf`. Since this is not a qualified value (i.e.,
:obj:`C45Node` does not know to which attribute it belongs), we need to
convert it to a string through :obj:`classVar`, which is passed as an
extra argument to the recursive part of printTree.
1413
1414For discrete splits without subsetting, we print out all attribute values
1415and recursively call the function for all branches. Continuous splits are
1416equally easy to handle.
1417
For discrete splits with subsetting, we iterate through the branches,
retrieve the corresponding values that go into each branch (into
:samp:`inset`), turn the values into strings and print them out, treating
separately the case when only a single value goes into the branch.
1422
Printing out the C4.5 Tree
==========================
1425
1426.. autofunction:: printTreeC45
1427
===============
orngTree module
===============
1431
1432.. autoclass:: TreeLearner
1433    :members:
1434
For a somewhat more complex example, here's how to write your own stop
function. The example itself is more amusing than useful. It constructs
and prints two trees. For the first one we define the *defStop*
function, which is used by default, and combine it with a random function
so that the stop criterion is also met in an additional 20% of the cases
when *defStop* is false. The second tree is built so that it
considers only the random function as the stopping criterion. Note that in
the second case the lambda function still has three parameters, since this
is the required number of parameters for a stop function (part
of `tree3.py`_, uses `iris.tab`_):
1445
1446.. _tree3.py: code/tree3.py
1447
1448.. literalinclude:: code/tree3.py
1449   :lines: 8-23
1450
1451The output is not shown here since the resulting trees are rather
1452big.
1453
1454Tree size
1455=========
1456
1457.. autofunction:: countNodes
1458
1459.. autofunction:: countLeaves
1460
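For instance, to measure the size of an induced tree (a sketch; the printed
counts depend on the induced tree)::

    import Orange

    data = Orange.data.Table("iris")
    tree = Orange.classification.tree.TreeLearner(data)
    print Orange.classification.tree.countNodes(tree), \
        Orange.classification.tree.countLeaves(tree)
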
1461Printing the Tree
1462=================
1463
The included printing functions can
print out practically anything you'd like to
know, from the number of examples, the proportion of examples of the
majority class in nodes and similar, to more complex statistics like the
proportion of examples in a particular class divided by the proportion
of examples of this class in a parent node. What is more, you can
define your own callback functions to be used for printing.
1471
1472
1473.. autofunction:: dumpTree
1474
1475.. autofunction:: printTree
1476
1477.. autofunction:: printTxt
1478
1479Before we go on: you can read all about the function and use it to its
1480full extent, or you can just call it, giving it the tree as the sole
1481argument and it will print out the usual textual representation of the
1482tree. If you're satisfied with that, you can stop here.
1483
The magic is in the format string. It is a string which is printed
out at every leaf or internal node, with certain format specifiers
replaced by data from the tree node. Specifiers are generally of the form
**%[^]<precision><quantity><divisor>**.
1488
**^** at the start indicates that the number should be multiplied by 100.
It's useful for printing proportions like percentages.
1491
1492**<precision>** is in the same format as in Python (or C) string
1493formatting. For instance, :samp:`%N` denotes the number of examples in the node,
1494hence :samp:`%6.2N` would mean output to two decimal digits and six places
1495altogether. If left out, a default format :samp:`5.3` is used, unless you
1496multiply the numbers by 100, in which case the default is :samp:`.0`
1497(no decimals, the number is rounded to the nearest integer).
1498
**<divisor>** tells what to divide the quantity in that node by.
:samp:`bP` means division by the same quantity in the parent node; for
instance, :samp:`%NbP` will give the number of examples in the node divided
by the number of examples in the parent node. You can also use precision
formatting, e.g. :samp:`%6.2NbP`. :samp:`bA` is division by the same
quantity over the entire data set, so :samp:`%NbA` will tell you the
proportion of examples (out of the entire training data set) that fell
into that node. If division is impossible, because the parent node does
not exist or some data is missing, a dot is printed out instead of the
quantity.
1508
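For instance, the following call (a sketch; :samp:`N`, described next, is
the number of examples, and complete examples are given below) prints at
each leaf the predicted value and the node's share of its parent's
examples, as a percentage::

    Orange.classification.tree.printTree(tree, leafStr="%V (%^NbP%)")
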
1509**<quantity>** is the only required element. It defines what to print.
1510For instance, :samp:`%N` would print out the number of examples in the node.
1511Possible quantities are
1512
1513:samp:`V`
1514    The value predicted at that node. You cannot define the precision
1515    or divisor here.
1516
1517:samp:`N`
1518    The number of examples in the node.
1519
1520:samp:`M`
1521    The number of examples in the majority class (that is, the class
1522    predicted by the node).
1523
1524:samp:`m`
1525    The proportion of examples in the majority class.
1526
1527:samp:`A`
1528    The average class for examples the node; this is available only for
1529    regression trees.
1530
1531:samp:`E`
1532    Standard error for class of examples in the node; available for
1533    regression trees.
1534
1535:samp:`I`
    Print out the confidence interval. The modifier is used as
    :samp:`%I(95)` or (more complicated) :samp:`%5.3I(95)bP`.
1538
1539:samp:`C`
1540    The number of examples in the given class. For classification trees,
1541    this modifier is used as, for instance in, :samp:`%5.3C="Iris-virginica"bP`
1542    - this will tell the number of examples of Iris-virginica by the
1543    number of examples this class in the parent node. If you are
1544    interested in examples that are *not* Iris-virginica, say
1545    :samp:`%5.3CbP!="Iris-virginica"`
1546
1547    For regression trees, you can use operators =, !=, <, <=, >, and >=,
1548    as in :samp:`%C<22` - add the precision and divisor if you will. You can also
1549    check the number of examples in a certain interval: :samp:`%C[20, 22]`
1550    will give you the number of examples between 20 and 22 (inclusive)
1551    and :samp:`%C(20, 22)` will give the number of such examples excluding the
1552    boundaries. You can of course mix the parentheses, e.g. :samp:`%C(20, 22]`.
1553    If you would like the examples outside the interval, add a :samp:`!`,
1554    like :samp:`%C!(20, 22]`.
1555 
1556:samp:`c`
1557    Same as above, except that it computes the proportion of the class
1558    instead of the number of examples.
1559
1560:samp:`D`
1561    Prints out the number of examples in each class. You can use both,
1562    precision (it is applied to each number in the distribution) or the
1563    divisor. This quantity cannot be computed for regression trees.
1564
1565:samp:`d`
1566    Same as above, except that it shows proportions of examples. This
1567    again doesn't work with regression trees.
1568
1569<user defined formats>
1570    You can add more, if you will. Instructions and examples are given at
1571    the end of this section.
1572
1573
1574Examples
1575========
1576
We shall build a small tree from the iris data set, limiting the
depth to three levels (part of `orngTree1.py`_, uses `iris.tab`_):
1579
1580.. literalinclude:: code/orngTree1.py
1581   :lines: 0-4
1582
1583.. _orngTree1.py: code/orngTree1.py
1584
1585The easiest way to call the function is to pass the tree as the only
1586argument::
1587
1588    >>> Orange.classification.tree.printTree(tree)
1589    petal width<0.800: Iris-setosa (100.00%)
1590    petal width>=0.800
1591    |    petal width<1.750
1592    |    |    petal length<5.350: Iris-versicolor (94.23%)
1593    |    |    petal length>=5.350: Iris-virginica (100.00%)
1594    |    petal width>=1.750
1595    |    |    petal length<4.850: Iris-virginica (66.67%)
1596    |    |    petal length>=4.850: Iris-virginica (100.00%)
1597
Let's now print out the predicted class at each node, and the number
of examples in the majority class along with the total number of examples
in the node::
1601
1602    >>> Orange.classification.tree.printTree(tree, leafStr="%V (%M out of %N)")
1603    petal width<0.800: Iris-setosa (50.000 out of 50.000)
1604    petal width>=0.800
1605    |    petal width<1.750
1606    |    |    petal length<5.350: Iris-versicolor (49.000 out of 52.000)
1607    |    |    petal length>=5.350: Iris-virginica (2.000 out of 2.000)
1608    |    petal width>=1.750
1609    |    |    petal length<4.850: Iris-virginica (2.000 out of 3.000)
1610    |    |    petal length>=4.850: Iris-virginica (43.000 out of 43.000)
1611
1612Would you like to know how the number of examples declines as
1613compared to the entire data set and to the parent node? We find
1614it with this::
1615
    >>> Orange.classification.tree.printTree(tree, leafStr="%V (%^MbA%, %^MbP%)")
1617    petal width<0.800: Iris-setosa (100%, 100%)
1618    petal width>=0.800
1619    |    petal width<1.750
1620    |    |    petal length<5.350: Iris-versicolor (98%, 100%)
1621    |    |    petal length>=5.350: Iris-virginica (4%, 40%)
1622    |    petal width>=1.750
1623    |    |    petal length<4.850: Iris-virginica (4%, 4%)
1624    |    |    petal length>=4.850: Iris-virginica (86%, 96%)
1625
1626Let us first read the format string. :samp:`%M` is the number of
1627examples in the majority class. We want it divided by the number of
1628all examples from this class on the entire data set, hence :samp:`%MbA`.
To have it multiplied by 100, we say :samp:`%^MbA`. The percent sign *after*
1630that is just printed out literally, just as the comma and parentheses
1631(see the output). The string for showing the proportion of this class
1632in the parent is the same except that we have :samp:`bP` instead
1633of :samp:`bA`.
1634
And now for the output: all examples of setosa go into the first node.
For versicolor, we have 98% in one node; the rest is certainly
not in the neighbouring node (petal length>=5.350), since all
versicolors from the node petal width<1.750 went to petal length<5.350
(we know this from the 100% in that line). Virginica is the
majority class in the three nodes that together contain 94% of this
class (4+4+86). The rest must have gone to the same node as versicolor.
1642
1643If you find this guesswork annoying - so do I. Let us print out the
1644number of versicolors in each node, together with the proportion of
1645versicolors among the examples in this particular node and among all
1646versicolors. So,
1647
1648::
1649
    '%C="Iris-versicolor" (%^c="Iris-versicolor"% of node, %^CbA="Iris-versicolor"% of versicolors)'
1651
1652gives the following output::
1653
1654    petal width<0.800: 0.000 (0% of node, 0% of versicolors)
1655    petal width>=0.800
1656    |    petal width<1.750
1657    |    |    petal length<5.350: 49.000 (94% of node, 98% of versicolors)
1658    |    |    petal length>=5.350: 0.000 (0% of node, 0% of versicolors)
1659    |    petal width>=1.750
1660    |    |    petal length<4.850: 1.000 (33% of node, 2% of versicolors)
1661    |    |    petal length>=4.850: 0.000 (0% of node, 0% of versicolors)
1662
1663Finally, we may want to print out the distributions, using a simple
1664string :samp:`%D`::
1665
1666    petal width<0.800: [50.000, 0.000, 0.000]
1667    petal width>=0.800
1668    |    petal width<1.750
1669    |    |    petal length<5.350: [0.000, 49.000, 3.000]
1670    |    |    petal length>=5.350: [0.000, 0.000, 2.000]
1671    |    petal width>=1.750
1672    |    |    petal length<4.850: [0.000, 1.000, 2.000]
1673    |    |    petal length>=4.850: [0.000, 0.000, 43.000]
1674
What is the order of numbers here? If you check
:samp:`data.domain.classVar.values`, you'll learn that the order is setosa,
versicolor, virginica; so in the node at petal length<5.350 we have 49
versicolors and 3 virginicae. To print out the proportions, we can use
:samp:`%.2d` - this gives us proportions within the node, rounded to
two decimals::
1681
1682    petal width<0.800: [1.00, 0.00, 0.00]
1683    petal width>=0.800
1684    |    petal width<1.750
1685    |    |    petal length<5.350: [0.00, 0.94, 0.06]
1686    |    |    petal length>=5.350: [0.00, 0.00, 1.00]
1687    |    petal width>=1.750
1688    |    |    petal length<4.850: [0.00, 0.33, 0.67]
1689    |    |    petal length>=4.850: [0.00, 0.00, 1.00]
1690
1691We haven't tried printing out any information for internal nodes.
1692To start with the most trivial case, we shall print the prediction
1693at each node.
1694
1695::
1696
1697    Orange.classification.tree.printTree(tree, leafStr="%V", nodeStr=".")
1698   
1699says that the nodeStr should be the same as leafStr (not very useful
1700here, since leafStr is trivial anyway).
1701
1702::
1703
1704    root: Iris-setosa
1705    |    petal width<0.800: Iris-setosa
1706    |    petal width>=0.800: Iris-versicolor
1707    |    |    petal width<1.750: Iris-versicolor
1708    |    |    |    petal length<5.350: Iris-versicolor
1709    |    |    |    petal length>=5.350: Iris-virginica
1710    |    |    petal width>=1.750: Iris-virginica
1711    |    |    |    petal length<4.850: Iris-virginica
1712    |    |    |    petal length>=4.850: Iris-virginica
1713
Note that the output is somewhat different now: another node
called *root* appeared and the tree looks one level deeper. This is
needed to print out the data for that node too.
1717
1718Now for something more complicated: let us observe how the number
1719of virginicas decreases down the tree::
1720
1721    Orange.classification.tree.printTree(tree, leafStr='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)', nodeStr='.')
1722
1723Let's first interpret the format string: :samp:`CbA="Iris-virginica"` is
1724the number of examples from class virginica, divided by the total number
of examples in this class. Add :samp:`^.1` and the result will be
multiplied by 100 and printed with one decimal. The trailing :samp:`%` is
printed out literally. In parentheses we print the same thing, except that
we divide by the examples in the parent node. Note the use of single
quotes, so that we can use double quotes inside the string when we specify
the class.
1730
1731::
1732
1733    root: 100.0% (.%)
1734    |    petal width<0.800: 0.0% (0.0%)
1735    |    petal width>=0.800: 100.0% (100.0%)
1736    |    |    petal width<1.750: 10.0% (10.0%)
1737    |    |    |    petal length<5.350: 6.0% (60.0%)
1738    |    |    |    petal length>=5.350: 4.0% (40.0%)
1739    |    |    petal width>=1.750: 90.0% (90.0%)
1740    |    |    |    petal length<4.850: 4.0% (4.4%)
1741    |    |    |    petal length>=4.850: 86.0% (95.6%)
1742
See what's in the parentheses in the root node? If :func:`printTree`
cannot compute something (in this case it's because the root has no parent),
it prints out a dot. You can also replace :samp:`=` by :samp:`!=` and it
will count all classes *except* virginica.
1747
For one final example with classification trees, we shall print the
distributions in the nodes, the distribution compared to the parent
and the proportions compared to the parent (the latter two are not
the same - think why). In the leaves we shall also add the predicted
class. So now we'll have to call the function like this.
1753
1754::
1755
    >>> Orange.classification.tree.printTree(tree, leafStr='"%V   %D %.2DbP %.2dbP"', nodeStr='"%D %.2DbP %.2dbP"')
1757    root: [50.000, 50.000, 50.000] . .
1758    |    petal width<0.800: [50.000, 0.000, 0.000] [1.00, 0.00, 0.00] [3.00, 0.00, 0.00]:
1759    |        Iris-setosa   [50.000, 0.000, 0.000] [1.00, 0.00, 0.00] [3.00, 0.00, 0.00]
1760    |    petal width>=0.800: [0.000, 50.000, 50.000] [0.00, 1.00, 1.00] [0.00, 1.50, 1.50]
1761    |    |    petal width<1.750: [0.000, 49.000, 5.000] [0.00, 0.98, 0.10] [0.00, 1.81, 0.19]
1762    |    |    |    petal length<5.350: [0.000, 49.000, 3.000] [0.00, 1.00, 0.60] [0.00, 1.04, 0.62]:
1763    |    |    |        Iris-versicolor   [0.000, 49.000, 3.000] [0.00, 1.00, 0.60] [0.00, 1.04, 0.62]
1764    |    |    |    petal length>=5.350: [0.000, 0.000, 2.000] [0.00, 0.00, 0.40] [0.00, 0.00, 10.80]:
1765    |    |    |        Iris-virginica   [0.000, 0.000, 2.000] [0.00, 0.00, 0.40] [0.00, 0.00, 10.80]
1766    |    |    petal width>=1.750: [0.000, 1.000, 45.000] [0.00, 0.02, 0.90] [0.00, 0.04, 1.96]
1767    |    |    |    petal length<4.850: [0.000, 1.000, 2.000] [0.00, 1.00, 0.04] [0.00, 15.33, 0.68]:
1768    |    |    |        Iris-virginica   [0.000, 1.000, 2.000] [0.00, 1.00, 0.04] [0.00, 15.33, 0.68]
1769    |    |    |    petal length>=4.850: [0.000, 0.000, 43.000] [0.00, 0.00, 0.96] [0.00, 0.00, 1.02]:
1770    |    |    |        Iris-virginica   [0.000, 0.000, 43.000] [0.00, 0.00, 0.96] [0.00, 0.00, 1.02]
1771
1772To explore the possibilities when printing regression trees, we are going
1773to induce a tree from the housing data set. Called with the tree as the
1774only argument, :func:`printTree` prints the tree like this::
1775
1776    RM<6.941
1777    |    LSTAT<14.400
1778    |    |    DIS<1.385: 45.6
1779    |    |    DIS>=1.385: 22.9
1780    |    LSTAT>=14.400
1781    |    |    CRIM<6.992: 17.1
1782    |    |    CRIM>=6.992: 12.0
1783    RM>=6.941
1784    |    RM<7.437
1785    |    |    CRIM<7.393: 33.3
1786    |    |    CRIM>=7.393: 14.4
1787    |    RM>=7.437
1788    |    |    TAX<534.500: 45.9
1789    |    |    TAX>=534.500: 21.9
1790
1791Let us add the standard error in both internal nodes and leaves, and the
179290% confidence intervals in the leaves::
1793
1794    >>> Orange.classification.tree.printTree(tree, leafStr="[SE: %E]\t %V %I(90)", nodeStr="[SE: %E]")
1795    root: [SE: 0.409]
1796    |    RM<6.941: [SE: 0.306]
1797    |    |    LSTAT<14.400: [SE: 0.320]
1798    |    |    |    DIS<1.385: [SE: 4.420]:
1799    |    |    |        [SE: 4.420]   45.6 [38.331-52.829]
1800    |    |    |    DIS>=1.385: [SE: 0.244]:
1801    |    |    |        [SE: 0.244]   22.9 [22.504-23.306]
1802    |    |    LSTAT>=14.400: [SE: 0.333]
1803    |    |    |    CRIM<6.992: [SE: 0.338]:
1804    |    |    |        [SE: 0.338]   17.1 [16.584-17.691]
1805    |    |    |    CRIM>=6.992: [SE: 0.448]:
1806    |    |    |        [SE: 0.448]   12.0 [11.243-12.714]
1807    |    RM>=6.941: [SE: 1.031]
1808    |    |    RM<7.437: [SE: 0.958]
1809    |    |    |    CRIM<7.393: [SE: 0.692]:
1810    |    |    |        [SE: 0.692]   33.3 [32.214-34.484]
1811    |    |    |    CRIM>=7.393: [SE: 2.157]:
1812    |    |    |        [SE: 2.157]   14.4 [10.862-17.938]
1813    |    |    RM>=7.437: [SE: 1.124]
1814    |    |    |    TAX<534.500: [SE: 0.817]:
1815    |    |    |        [SE: 0.817]   45.9 [44.556-47.237]
1816    |    |    |    TAX>=534.500: [SE: 0.000]:
1817    |    |    |        [SE: 0.000]   21.9 [21.900-21.900]
1818
What's the difference between :samp:`%V`, the predicted value, and
:samp:`%A`, the average? Doesn't a regression tree always predict the
leaf average anyway? Not necessarily: the tree predicts whatever the
:attr:`TreeClassifier.nodeClassifier` in a leaf returns.
:samp:`%V` uses the :obj:`Orange.data.variable.Continuous`'s function
for printing out the value, so the printed number has the same number
of decimals as in the data file.
1827
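For instance, to see the two quantities side by side on the housing tree,
one could call (a sketch; the output is omitted)::

    Orange.classification.tree.printTree(tree,
        leafStr="%V (avg %3.2A, SE %3.2E)")
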
1828Regression trees cannot print the distributions in the same way
1829as classification trees. They instead offer a set of operators for
1830observing the number of examples within a certain range. For instance,
1831let us check the number of examples with values below 22, and compare
1832this number with values in the parent nodes::
1833
1834    >>> Orange.classification.tree.printTree(tree, leafStr="%C<22 (%cbP<22)", nodeStr=".")
1835    root: 277.000 (.)
1836    |    RM<6.941: 273.000 (1.160)
1837    |    |    LSTAT<14.400: 107.000 (0.661)
1838    |    |    |    DIS<1.385: 0.000 (0.000)
1839    |    |    |    DIS>=1.385: 107.000 (1.020)
1840    |    |    LSTAT>=14.400: 166.000 (1.494)
1841    |    |    |    CRIM<6.992: 93.000 (0.971)
1842    |    |    |    CRIM>=6.992: 73.000 (1.040)
1843    |    RM>=6.941: 4.000 (0.096)
1844    |    |    RM<7.437: 3.000 (1.239)
1845    |    |    |    CRIM<7.393: 0.000 (0.000)
1846    |    |    |    CRIM>=7.393: 3.000 (15.333)
1847    |    |    RM>=7.437: 1.000 (0.633)
1848    |    |    |    TAX<534.500: 0.000 (0.000)
    |    |    |    TAX>=534.500: 1.000 (30.000)
1850
The last line, for instance, says that among the examples with tax
above 534, the number of examples with the class below 22 is 30 times
higher than the number of such examples in its parent node.
1854
For another exercise, let's count the same for all examples *outside*
the interval [20, 22] (given like this, the interval includes the bounds).
And let us print out the proportions as percentages.
1858
1859::
1860
1861    >>> Orange.classification.tree.printTree(tree, leafStr="%C![20,22] (%^cbP![20,22]%)", nodeStr=".")
1862
OK, let's observe the format string for one last time. :samp:`%c![20, 22]`
would be the proportion of examples (within the node) whose values are
below 20 or above 22. With :samp:`%cbP![20, 22]` we divide this by the same
statistic computed on the parent. Add a :samp:`^` and you have the
percentages.
1867
1868::
1869
1870    root: 439.000 (.%)
1871    |    RM<6.941: 364.000 (98%)
1872    |    |    LSTAT<14.400: 200.000 (93%)
1873    |    |    |    DIS<1.385: 5.000 (127%)
1874    |    |    |    DIS>=1.385: 195.000 (99%)
1875    |    |    LSTAT>=14.400: 164.000 (111%)
1876    |    |    |    CRIM<6.992: 91.000 (96%)
1877    |    |    |    CRIM>=6.992: 73.000 (105%)
1878    |    RM>=6.941: 75.000 (114%)
1879    |    |    RM<7.437: 46.000 (101%)
1880    |    |    |    CRIM<7.393: 43.000 (100%)
1881    |    |    |    CRIM>=7.393: 3.000 (100%)
1882    |    |    RM>=7.437: 29.000 (98%)
1883    |    |    |    TAX<534.500: 29.000 (103%)
1884    |    |    |    TAX>=534.500: 0.000 (0%)
1885
1886
1887Defining Your Own Printout functions
1888====================================
1889
1890:func:`dumpTree`'s argument :obj:`userFormats` can be used to print out
1891some other information in the leaves or nodes. If provided,
1892:obj:`userFormats` should contain a list of tuples with a regular expression
1893and a callback function to be called when that expression is found in the
1894format string. Expressions from :obj:`userFormats` are checked before
1895the built-in expressions discussed above, so you can override the built-ins
1896if you want to.
1897
1898The regular expression should describe a string like those we used above,
1899for instance the string :samp:`%.2DbP`. When a leaf or internal node
1900is printed out, the format string (:obj:`leafStr` or :obj:`nodeStr`)
1901is checked for these regular expressions and when the match is found,
1902the corresponding callback function is called.
1903
1904The callback function will get five arguments: the format string
1905(:obj:`leafStr` or :obj:`nodeStr`), the match object, the node which is
1906being printed, its parent (can be None) and the tree classifier.
1907The function should return the format string in which the part described
1908by the match object (that is, the part that is matched by the regular
1909expression) is replaced by whatever information your callback function
1910is supposed to give.
1911
The function can use several utility functions provided in the module.
1913
1914.. autofunction:: insertStr
1915
1916.. autofunction:: insertDot
1917
1918.. autofunction:: insertNum
1919
1920.. autofunction:: byWhom
1921
1922
1923There are also a few pieces of regular expression that you may want to reuse.
1924The two you are likely to use are:
1925
1926.. autodata:: fs
1927
1928.. autodata:: by
1929
1930For a trivial example, :samp:`%V` is implemented like this. There is the
1931following tuple in the list of built-in formats::
1932
1933    (re.compile("%V"), replaceV)
1934
1935:obj:`replaceV` is a function defined by::
1936
1937    def replaceV(strg, mo, node, parent, tree):
1938        return insertStr(strg, mo, str(node.nodeClassifier.defaultValue))
1939
It therefore takes the value predicted at the node
(:samp:`node.nodeClassifier.defaultValue`), converts it to a string
and passes it to *insertStr* to do the replacement.
1943
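To see what :func:`insertStr` does on its own, here is a small,
self-contained sketch::

    >>> import re
    >>> s = "value: %V!"
    >>> mo = re.search("%V", s)
    >>> Orange.classification.tree.insertStr(s, mo, "Iris-setosa")
    'value: Iris-setosa!'
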
1944A more complex regular expression is the one for the proportion of
1945majority class, defined as :samp:`"%"+fs+"M"+by`. It uses the
1946two partial expressions defined above.
1947
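To see the mechanics end-to-end before the more elaborate example below,
here is a minimal sketch of a custom format. The name :samp:`%ABS` and the
variable names are ours, and we assume, as in the example below, that
:func:`printTree` passes :obj:`userFormats` on to :func:`dumpTree`::

    import re
    import Orange

    def replaceABS(strg, mo, node, parent, tree):
        # insertNum obeys the ^ and precision groups captured by fs
        return Orange.classification.tree.insertNum(
            strg, mo, node.distribution.abs)

    absFormat = [(re.compile("%" + Orange.classification.tree.fs + "ABS"),
                  replaceABS)]
    Orange.classification.tree.printTree(
        tree, leafStr="%V: %.1ABS examples", userFormats=absFormat)

Since user formats are checked before the built-in ones, :samp:`%ABS` is
not swallowed by the built-in :samp:`%A`.
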
Let's say we'd like to print the classification margin for each node,
that is, the difference between the proportion of the largest and the
second largest class in the node (part of `orngTree2.py`_):
1951
1952.. _orngTree2.py: code/orngTree2.py
1953
1954.. literalinclude:: code/orngTree2.py
1955   :lines: 7-31
1956
We first defined getMargin, which gets the distribution and computes the
margin. The callback function, replaceB, computes the margin for the node.
If we need to divide the quantity by something (that is, if the :data:`by`
group is present), we call :func:`byWhom` to get the node by whose margin
this node's margin is to be divided. If this node (usually the parent)
does not exist or if its margin is zero, we call :func:`insertDot`
to insert a dot; otherwise we call :func:`insertNum`, which will insert
the number, obeying the format specified by the user. myFormat is a list
containing the regular expression and the callback function.
1966
1967
1968We can now print out the iris tree:
1969
1970.. literalinclude:: code/orngTree2.py
1971    :lines: 33
1972
1973And we get::
1974
1975    petal width<0.800: Iris-setosa 100% (100.00%)
1976    petal width>=0.800
1977    |    petal width<1.750
1978    |    |    petal length<5.350: Iris-versicolor 88% (108.57%)
1979    |    |    petal length>=5.350: Iris-virginica 100% (122.73%)
1980    |    petal width>=1.750
1981    |    |    petal length<4.850: Iris-virginica 33% (34.85%)
1982    |    |    petal length>=4.850: Iris-virginica 100% (104.55%)
1983
1984
1985Plotting the Tree using Dot
1986===========================
1987
1988.. autofunction:: printDot
1989
1990.. autofunction:: dotTree
1991
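For instance, a tree induced from the iris data might be saved to
tree5.dot like this (a sketch; see :func:`printDot` above for the exact
arguments)::

    import Orange

    data = Orange.data.Table("iris")
    tree = Orange.classification.tree.TreeLearner(data)
    Orange.classification.tree.printDot(tree, fileName="tree5.dot",
        leafStr="%V", nodeStr=".")
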
1992Suppose you saved the tree in a file "tree5.dot". You can then
1993print it out as a gif if you execute the following command line
1994
1995::
1996   
1997    dot -Tgif tree5.dot -otree5.gif
1998
GraphViz's dot has quite a few other output formats; check
its documentation to learn which.
2001
2002References
2003==========
2004
Bratko, I. (2002). `Prolog Programming for Artificial Intelligence`, Addison
Wesley.

Koutsofios, E., North, S.C. (1993). Drawing Graphs with dot. AT&T Bell
Laboratories, Murray Hill NJ, USA.

`Graphviz - open source graph drawing software <http://www.research.att.com/sw/tools/graphviz/>`_:
the home page of AT&T's dot and similar software packages.
2013
2014"""
2015
2016from Orange.core import \
2017     TreeLearner as TreeLearnerBase, \
2018         TreeClassifier as _TreeClassifier, \
2019         C45Learner, \
2020         C45Classifier, \
2021         C45TreeNode as C45Node, \
2022         C45TreeNodeList as C45NodeList, \
2023         TreeDescender as Descender, \
2024              TreeDescender_UnknownMergeAsBranchSizes as Descender_UnknownMergeAsBranchSizes, \
2025              TreeDescender_UnknownMergeAsSelector as Descender_UnknownMergeAsSelector, \
2026              TreeDescender_UnknownToBranch as Descender_UnknownToBranch, \
2027              TreeDescender_UnknownToCommonBranch as Descender_UnknownToCommonBranch, \
2028              TreeDescender_UnknownToCommonSelector as Descender_UnknownToCommonSelector, \
2029         TreeExampleSplitter as Splitter, \
2030              TreeExampleSplitter_IgnoreUnknowns as Splitter_IgnoreUnknowns, \
2031              TreeExampleSplitter_UnknownsAsBranchSizes as Splitter_UnknownsAsBranchSizes, \
2032              TreeExampleSplitter_UnknownsAsSelector as Splitter_UnknownsAsSelector, \
2033              TreeExampleSplitter_UnknownsToAll as Splitter_UnknownsToAll, \
2034              TreeExampleSplitter_UnknownsToBranch as Splitter_UnknownsToBranch, \
2035              TreeExampleSplitter_UnknownsToCommon as Splitter_UnknownsToCommon, \
2036              TreeExampleSplitter_UnknownsToRandom as Splitter_UnknownsToRandom, \
2037         TreeNode as Node, \
2038         TreeNodeList as NodeList, \
2039         TreePruner as Pruner, \
2040              TreePruner_SameMajority as Pruner_SameMajority, \
2041              TreePruner_m as Pruner_m, \
2042         TreeSplitConstructor as SplitConstructor, \
2043              TreeSplitConstructor_Combined as SplitConstructor_Combined, \
2044              TreeSplitConstructor_Measure as SplitConstructor_Measure, \
2045                   TreeSplitConstructor_Attribute as SplitConstructor_Feature, \
2046                   TreeSplitConstructor_ExhaustiveBinary as SplitConstructor_ExhaustiveBinary, \
2047                   TreeSplitConstructor_OneAgainstOthers as SplitConstructor_OneAgainstOthers, \
2048                   TreeSplitConstructor_Threshold as SplitConstructor_Threshold, \
2049         TreeStopCriteria as StopCriteria, \
2050              TreeStopCriteria_Python as StopCriteria_Python, \
2051              TreeStopCriteria_common as StopCriteria_common
2052
2053import Orange.core
2054import operator
2055import base64
2056import re
2057import Orange.data
2058import Orange.feature.scoring
2059import Orange.classification.tree
2060
2061def _c45_showBranch(node, classvar, lev, i):
2062    var = node.tested
2063    if node.nodeType == 1:
2064        print ("\n"+"|   "*lev + "%s = %s:") % (var.name, var.values[i]),
2065        _c45_printTree0(node.branch[i], classvar, lev+1)
2066    elif node.nodeType == 2:
2067        print ("\n"+"|   "*lev + "%s %s %.1f:") % (var.name, ["<=", ">"][i], node.cut),
2068        _c45_printTree0(node.branch[i], classvar, lev+1)
2069    else:
2070        inset = filter(lambda a:a[1]==i, enumerate(node.mapping))
2071        inset = [var.values[j[0]] for j in inset]
2072        if len(inset)==1:
2073            print ("\n"+"|   "*lev + "%s = %s:") % (var.name, inset[0]),
2074        else:
2075            print ("\n"+"|   "*lev + "%s in {%s}:") % (var.name, ", ".join(inset)),
2076        _c45_printTree0(node.branch[i], classvar, lev+1)
2077       
2078       
2079def _c45_printTree0(node, classvar, lev):
2080    var = node.tested
2081    if node.nodeType == 0:
2082        print "%s (%.1f)" % (classvar.values[int(node.leaf)], node.items),
2083    else:
2084        for i, branch in enumerate(node.branch):
2085            if not branch.nodeType:
2086                _c45_showBranch(node, classvar, lev, i)
2087        for i, branch in enumerate(node.branch):
2088            if branch.nodeType:
2089                _c45_showBranch(node, classvar, lev, i)
2090
2091def printTreeC45(tree):
2092    """
2093    Prints the tree given as an argument in the same form as Ross Quinlan's
2094    C4.5 program.
2095
2096    ::
2097
2098        import Orange
2099
2100        data = Orange.data.Table("voting")
2101        c45 = Orange.classification.tree.C45Learner(data)
2102        Orange.classification.tree.printTreeC45(c45)
2103
2104    will print out
2105
2106    ::
2107
2108        physician-fee-freeze = n: democrat (253.4)
2109        physician-fee-freeze = y:
2110        |   synfuels-corporation-cutback = n: republican (145.7)
2111        |   synfuels-corporation-cutback = y:
2112        |   |   mx-missile = y: democrat (6.0)
2113        |   |   mx-missile = n:
2114        |   |   |   adoption-of-the-budget-resolution = n: republican (22.6)
2115        |   |   |   adoption-of-the-budget-resolution = y:
2116        |   |   |   |   anti-satellite-test-ban = n: democrat (5.0)
2117        |   |   |   |   anti-satellite-test-ban = y: republican (2.2)
2118
2119
    If you run the standalone C4.5 on the same data (remember that Orange uses the original C4.5 code internally), it will print out
2121
2122    ::
2123
2124        physician-fee-freeze = n: democrat (253.4/5.9)
2125        physician-fee-freeze = y:
2126        |   synfuels-corporation-cutback = n: republican (145.7/6.2)
2127        |   synfuels-corporation-cutback = y:
2128        |   |   mx-missile = y: democrat (6.0/2.4)
2129        |   |   mx-missile = n:
2130        |   |   |   adoption-of-the-budget-resolution = n: republican (22.6/5.2)
2131        |   |   |   adoption-of-the-budget-resolution = y:
2132        |   |   |   |   anti-satellite-test-ban = n: democrat (5.0/1.2)
2133        |   |   |   |   anti-satellite-test-ban = y: republican (2.2/1.0)
2134
    which is reassuringly similar, except that C4.5 tested the tree on
    the learning data and has also printed out the number of errors
    in each node - something which :obj:`printTreeC45` obviously can't do
    (nor is there any need for it to).
2139
2140    """
2141    _c45_printTree0(tree.tree, tree.classVar, 0)
2142
2143
2144class TreeLearner(Orange.core.Learner):
2145    """
2146    Assembles the generic classification or regression tree learner
2147    (from Orange's objects for induction of decision trees).
2148    :class:`TreeLearner` is essentially a wrapper
2149    around :class:`TreeLearnerBase`, provided for easier use of the latter.
2150    It sets a number of parameters used in induction that
2151    can also be set after the creation of the object, most often through
2152    the object's attributes. If upon initialization
2153    :class:`TreeLearner` is given a set of examples, then an instance
    of :class:`TreeClassifier` is returned instead.

    The values of attributes can also be set in the constructor.
2157
2158    .. attribute:: nodeLearner
2159
2160        Induces a classifier from examples belonging to a node. The
2161        same learner is used for internal nodes and for leaves. The
2162        default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`.
2163
2164    **Split construction**
2165
2166    .. attribute:: split
2167       
2168        Defines a function that will be used in place of
2169        :obj:`SplitConstructor`.
2170        Useful when prototyping new tree induction
2171        algorithms. When this parameter is defined, other parameters that
2172        affect the procedures for growing of the tree are ignored. These
2173        include :obj:`binarization`, :obj:`measure`,
        :obj:`worstAcceptable` and :obj:`minSubset` (default:
        :class:`SplitConstructor_Combined`
2176        with separate constructors for discrete and continuous attributes.
2177        Discrete attributes are used as they are, while
2178        continuous attributes are binarized.
2179        Gain ratio is used to select attributes.
2180        A minimum of two examples in a leaf is required for
2181        discrete and five examples in a leaf for continuous attributes.)
2182
2183    .. attribute:: binarization
2184
        If 1, :class:`SplitConstructor_ExhaustiveBinary` is used.
        If 2, :class:`SplitConstructor_OneAgainstOthers` is used. If
        0, no binarization is used (:class:`SplitConstructor_Feature`).
        Default: 0.
2189
2190    .. attribute:: measure
2191   
        Measure for scoring the attributes when deciding which of them
        will be used to split the example set in the node.
        Can be either an attribute scoring object or one of
        "infoGain" (:class:`Orange.feature.scoring.InfoGain`),
        "gainRatio" (:class:`Orange.feature.scoring.GainRatio`),
        "gini" (:class:`Orange.feature.scoring.Gini`),
        "relief" (:class:`Orange.feature.scoring.Relief`),
        "retis" (:class:`Orange.feature.scoring.MSE`). Default: "gainRatio".
2200
2201    .. attribute:: reliefM, reliefK
2202
        Set `m` and `k` to the given values if the :obj:`measure` is relief.
2204
2205    **Pruning**
2206
2207    .. attribute:: worstAcceptable
2208
2209        Used in pre-pruning, sets the lowest required attribute
2210        score. If the score of the best attribute is below this margin, the
2211        tree at that node is not grown further (default: 0).
2212
2213        So, to allow splitting only when gainRatio (the default measure)
2214        is greater than 0.6, one should run the learner like this:
2215        :samp:`l = Orange.classification.tree.TreeLearner(data, worstAcceptable=0.6)`
2216
2217    .. attribute:: minSubset
2218
2219        Minimal number of examples in non-null leaves (default: 0).
2220
2221    .. attribute:: minExamples
2222
        Data subsets with less than :obj:`minExamples`
        examples are not split any further; that is, all leaves in the tree
        will contain at least that many examples (default: 0).
2226
2227    .. attribute:: maxDepth
2228
        Gives maximal tree depth; 0 means that only the root is generated.
2230        The default is 100.
2231
2232    .. attribute:: maxMajority
2233
        Induction stops when the proportion of the majority class in the
        node exceeds the value set by this parameter (default: 1.0). E.g.
2236        to stop the induction as soon as the majority class reaches 70%,
2237        you should say
2238        :samp:`tree2 = Orange.classification.tree.TreeLearner(data, maxMajority=0.7)`
2239
        This is an example of a tree on the iris data set, built with
        the above argument - the numbers show the majority class
        proportion at each node. You can find more details in the
        script tree2.py, which induces and prints this tree.
2245
2246        ::
2247
2248            root: 0.333
2249            |    petal width<0.800: 1.000
2250            |    petal width>=0.800: 0.500
2251            |    |    petal width<1.750: 0.907
2252            |    |    petal width>=1.750: 0.978
2253   
2254    .. attribute:: stop
2255
        Used for passing a function which is used in place of
        :class:`StopCriteria`. Useful when prototyping new
        tree induction algorithms. See the documentation on
        :class:`StopCriteria` for more info on this function.
        When used, the parameters :obj:`maxMajority` and :obj:`minExamples`
        will not be considered (default: None).
        The default stopping criterion stops induction when all examples
        in a node belong to the same class.
2263
2264    .. attribute:: mForPruning
2265
2266        If non-zero, invokes an error-based bottom-up post-pruning,
2267        where m-estimate is used to estimate class probabilities
2268        (default: 0).
2269
2270    .. attribute:: sameMajorityPruning
2271
2272        If true, invokes a bottom-up post-pruning by removing the
2273        subtrees of which all leaves classify to the same class
2274        (default: False).
2275
2276    **Record keeping**
2277
2278    .. attribute:: storeDistributions, storeContingencies, storeExamples, storeNodeClassifier
2279
        Determines whether to store class distributions, contingencies and
        examples in :class:`Node`, and whether the :obj:`nodeClassifier`
        should be built for internal nodes. By default everything except
        :obj:`storeExamples` is enabled. You won't save any memory by not
        storing distributions while storing contingencies, since the
        distribution actually points to the same object that is stored in
        :obj:`contingency.classes` (default: True except for
        storeExamples, which defaults to False).
2288   
2289    """
2290    def __new__(cls, examples = None, weightID = 0, **argkw):
2291        self = Orange.core.Learner.__new__(cls, **argkw)
2292        if examples:
2293            self.__init__(**argkw)
2294            return self.__call__(examples, weightID)
2295        else:
2296            return self
2297     
2298    def __init__(self, **kw):
2299        self.learner = None
2300        self.__dict__.update(kw)
2301     
2302    def __call__(self, examples, weight=0):
2303        """
2304        Return a classifier from the given examples.
2305        """
2306        bl = self._base_learner()
2307
2308        #build second part of the split
2309        if not hasattr(self, "split") and not hasattr(self, "measure"):
2310            if examples.domain.classVar.varType == Orange.data.Type.Discrete:
2311                measure = Orange.feature.scoring.GainRatio()
2312            else:
2313                measure = Orange.feature.scoring.MSE()
2314            bl.split.continuousSplitConstructor.measure = measure
2315            bl.split.discreteSplitConstructor.measure = measure
2316           
2317        tree = bl(examples, weight)
2318        if getattr(self, "sameMajorityPruning", 0):
2319            tree = Pruner_SameMajority(tree)
2320        if getattr(self, "mForPruning", 0):
2321            tree = Pruner_m(tree, m = self.mForPruning)
2322
2323        return TreeClassifier(baseClassifier=tree) 
2324
2325    def instance(self):
2326        """
2327        DEPRECATED. Return a base learner - an object
2328        of :class:`TreeLearnerBase`.
2329        This method is left for backwards compatibility.
2330        """
2331        return self._base_learner()
2332
2333    def _build_split(self, learner):
2334
2335        if hasattr(self, "split"):
2336            learner.split = self.split
2337        else:
2338            learner.split = SplitConstructor_Combined()
2339            learner.split.continuousSplitConstructor = \
2340                SplitConstructor_Threshold()
2341            binarization = getattr(self, "binarization", 0)
2342            if binarization == 1:
2343                learner.split.discreteSplitConstructor = \
2344                    SplitConstructor_ExhaustiveBinary()
2345            elif binarization == 2:
2346                learner.split.discreteSplitConstructor = \
2347                    SplitConstructor_OneAgainstOthers()
2348            else:
2349                learner.split.discreteSplitConstructor = \
2350                    SplitConstructor_Feature()
2351
2352            measures = {"infoGain": Orange.feature.scoring.InfoGain,
2353                "gainRatio": Orange.feature.scoring.GainRatio,
2354                "gini": Orange.feature.scoring.Gini,
2355                "relief": Orange.feature.scoring.Relief,
2356                "retis": Orange.feature.scoring.MSE
2357                }
2358
2359            measure = getattr(self, "measure", None)
2360            if isinstance(measure, str):
2361                measure = measures[measure]()
2362            if not measure:
2363                measure = Orange.feature.scoring.GainRatio()
2364
2365            measureIsRelief = isinstance(measure, Orange.feature.scoring.Relief)
2366            relM = getattr(self, "reliefM", None)
2367            if relM and measureIsRelief:
2368                measure.m = relM
2369           
2370            relK = getattr(self, "reliefK", None)
2371            if relK and measureIsRelief:
2372                measure.k = relK
2373
2374            learner.split.continuousSplitConstructor.measure = measure
2375            learner.split.discreteSplitConstructor.measure = measure
2376
2377            wa = getattr(self, "worstAcceptable", 0)
2378            if wa:
2379                learner.split.continuousSplitConstructor.worstAcceptable = wa
2380                learner.split.discreteSplitConstructor.worstAcceptable = wa
2381
2382            ms = getattr(self, "minSubset", 0)
2383            if ms:
2384                learner.split.continuousSplitConstructor.minSubset = ms
2385                learner.split.discreteSplitConstructor.minSubset = ms
2386
2387    def _base_learner(self):
2388        learner = TreeLearnerBase()
2389
2390        self._build_split(learner)
2391
2392        if hasattr(self, "stop"):
2393            learner.stop = self.stop
2394        else:
2395            learner.stop = Orange.classification.tree.StopCriteria_common()
2396            mm = getattr(self, "maxMajority", 1.0)
2397            if mm < 1.0:
2398                learner.stop.maxMajority = self.maxMajority
2399            me = getattr(self, "minExamples", 0)
2400            if me:
2401                learner.stop.minExamples = self.minExamples
2402
2403        for a in ["storeDistributions", "storeContingencies", "storeExamples", 
2404            "storeNodeClassifier", "nodeLearner", "maxDepth"]:
2405            if hasattr(self, a):
2406                setattr(learner, a, getattr(self, a))
2407
2408        return learner
2409
2410#counting
2411
2412def __countNodes(node):
2413    count = 0
2414    if node:
2415        count += 1
2416        if node.branches:
2417            for node in node.branches:
2418                count += __countNodes(node)
2419    return count
2420
2421def countNodes(tree):
2422    """
2423    Return the number of nodes of tree.
2424
2425    :param tree: The tree for which to count the nodes.
2426    :type tree: :class:`TreeClassifier`
2427    """
2428    return __countNodes(tree.tree if isinstance(tree, _TreeClassifier) or \
2429        isinstance(tree, TreeClassifier) else tree)
2430
2431
2432def __countLeaves(node):
2433    count = 0
2434    if node:
2435        if node.branches: # internal node
2436            for node in node.branches:
2437                count += __countLeaves(node)
2438        else:
2439            count += 1
2440    return count
2441
2442def countLeaves(tree):
2443    """
2444    Return the number of leaves in the tree.
2445
2446    :param tree: The tree for which to count the leaves.
2447    :type tree: :class:`TreeClassifier`
2448    """
2449    return __countLeaves(tree.tree if isinstance(tree, _TreeClassifier) or \
2450        isinstance(tree, TreeClassifier) else tree)
2451
2452# the following is for the output
2453
2454
2455fs = r"(?P<m100>\^?)(?P<fs>(\d*\.?\d*)?)"
2456""" Defines the multiplier by 100 (:samp:`^`) and the format
2457for the number of decimals (e.g. :samp:`5.3`). The corresponding
2458groups are named :samp:`m100` and :samp:`fs`. """
2459
2460by = r"(?P<by>(b(P|A)))?"
""" Defines bP or bA or nothing; the result is in the group named by. """
2462
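# Further building blocks for the format-string expressions: bysub allows
# a b or s prefix before P or A (used by %A and %E), opc and opd match
# comparisons against continuous and discrete class values, intrvl matches
# the optional argument of %I, and fromto matches intervals such as
# ![20, 22] used by %C and %c.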
2463bysub = r"((?P<bysub>b|s)(?P<by>P|A))?"
2464opc = r"(?P<op>=|<|>|(<=)|(>=)|(!=))(?P<num>\d*\.?\d+)"
2465opd = r'(?P<op>=|(!=))"(?P<cls>[^"]*)"'
2466intrvl = r'((\((?P<intp>\d+)%?\))|(\(0?\.(?P<intv>\d+)\))|)'
2467fromto = r"(?P<out>!?)(?P<lowin>\(|\[)(?P<lower>\d*\.?\d+)\s*,\s*(?P<upper>\d*\.?\d+)(?P<upin>\]|\))"
2468re_V = re.compile("%V")
2469re_N = re.compile("%"+fs+"N"+by)
2470re_M = re.compile("%"+fs+"M"+by)
2471re_m = re.compile("%"+fs+"m"+by)
2472re_Ccont = re.compile("%"+fs+"C"+by+opc)
2473re_Cdisc = re.compile("%"+fs+"C"+by+opd)
2474re_ccont = re.compile("%"+fs+"c"+by+opc)
2475re_cdisc = re.compile("%"+fs+"c"+by+opd)
2476re_Cconti = re.compile("%"+fs+"C"+by+fromto)
2477re_cconti = re.compile("%"+fs+"c"+by+fromto)
2478re_D = re.compile("%"+fs+"D"+by)
2479re_d = re.compile("%"+fs+"d"+by)
2480re_AE = re.compile("%"+fs+"(?P<AorE>A|E)"+bysub)
2481re_I = re.compile("%"+fs+"I"+intrvl)
2482
2483def insertStr(s, mo, sub):
2484    """ Replace the part of s which is covered by mo
2485    with the string sub. """
2486    return s[:mo.start()] + sub + s[mo.end():]
2487
2488
2489def insertDot(s, mo):
2490    """ Replace the part of s which is covered by mo
2491    with a dot.  You should use this when the
2492    function cannot compute the desired quantity; it is called, for instance,
2493    when it needs to divide by something in the parent, but the parent
2494    doesn't exist.
2495    """
2496    return s[:mo.start()] + "." + s[mo.end():]
2497
2498def insertNum(s, mo, n):
2499    """ Replace the part of s matched by mo with the number n,
2500    formatted as specified by the user, that is, it multiplies
2501    it by 100, if needed, and prints with the right number of
2502    places and decimals. It does so by checking the mo
2503    for a group named m100 (representing the :samp:`^` in the format string)
    and a group named fs representing the part giving the number of
    decimals (e.g. :samp:`5.3`).
2506    """
2507    grps = mo.groupdict()
2508    m100 = grps.get("m100", None)
2509    if m100:
2510        n *= 100
2511    fs = grps.get("fs") or (m100 and ".0" or "5.3")
2512    return s[:mo.start()] + ("%%%sf" % fs % n) + s[mo.end():]
2513
2514def byWhom(by, parent, tree):
    """ If by equals "bP", it returns parent; otherwise it returns
    :samp:`tree.tree`. This is used to find what to divide the quantity
    by when division is required.
2518    """
2519    if by=="bP":
2520        return parent
2521    else:
2522        return tree.tree
2523
2524def replaceV(strg, mo, node, parent, tree):
2525    return insertStr(strg, mo, str(node.nodeClassifier.defaultValue))
2526
2527def replaceN(strg, mo, node, parent, tree):
2528    by = mo.group("by")
2529    N = node.distribution.abs
2530    if by:
2531        whom = byWhom(by, parent, tree)
2532        if whom and whom.distribution:
2533            if whom.distribution.abs > 1e-30:
2534                N /= whom.distribution.abs
2535        else:
2536            return insertDot(strg, mo)
2537    return insertNum(strg, mo, N)
2538       
2539
2540def replaceM(strg, mo, node, parent, tree):
2541    by = mo.group("by")
2542    maj = int(node.nodeClassifier.defaultValue)
2543    N = node.distribution[maj]
2544    if by:
2545        whom = byWhom(by, parent, tree)
2546        if whom and whom.distribution:
2547            if whom.distribution[maj] > 1e-30:
2548                N /= whom.distribution[maj]
2549        else:
2550            return insertDot(strg, mo)
2551    return insertNum(strg, mo, N)
2552       
2553
2554def replacem(strg, mo, node, parent, tree):
2555    by = mo.group("by")
2556    maj = int(node.nodeClassifier.defaultValue)
2557    if node.distribution.abs > 1e-30:
2558        N = node.distribution[maj] / node.distribution.abs
        if by:
            whom = byWhom(by, parent, tree)
            if whom and whom.distribution:
2561                byN = whom.distribution[maj] / whom.distribution.abs
2562                if byN > 1e-30:
2563                    N /= byN
2564            else:
2565                return insertDot(strg, mo)
2566    else:
2567        N = 0.
2568    return insertNum(strg, mo, N)
2569
2570
def replaceCdisc(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Discrete:
        return insertDot(strg, mo)

    by, op, cls = mo.group("by", "op", "cls")
    N = node.distribution[cls]
    if op == "!=":
        N = node.distribution.abs - N
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            if whom.distribution[cls] > 1e-30:
                N /= whom.distribution[cls]
        else:
            return insertDot(strg, mo)
    return insertNum(strg, mo, N)


def replacecdisc(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Discrete:
        return insertDot(strg, mo)

    op, by, cls = mo.group("op", "by", "cls")
    N = node.distribution[cls]
    if node.distribution.abs > 1e-30:
        N /= node.distribution.abs
        if op == "!=":
            N = 1 - N
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            if whom.distribution[cls] > 1e-30:
                N /= whom.distribution[cls] / whom.distribution.abs
        else:
            return insertDot(strg, mo)
    return insertNum(strg, mo, N)

__opdict = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
            ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

def replaceCcont(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    by, op, num = mo.group("by", "op", "num")
    op = __opdict[op]
    num = float(num)
    N = sum([x[1] for x in node.distribution.items() if op(x[0], num)], 0.)
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            byN = sum([x[1] for x in whom.distribution.items() if op(x[0], num)], 0.)
            if byN > 1e-30:
                N /= byN
        else:
            return insertDot(strg, mo)

    return insertNum(strg, mo, N)


def replaceccont(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    by, op, num = mo.group("by", "op", "num")
    op = __opdict[op]
    num = float(num)
    N = sum([x[1] for x in node.distribution.items() if op(x[0], num)], 0.)
    if node.distribution.abs > 1e-30:
        N /= node.distribution.abs
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            byN = sum([x[1] for x in whom.distribution.items() if op(x[0], num)], 0.)
            if byN > 1e-30:
                N /= byN / whom.distribution.abs  # abs >= byN, so byN > 1e-30 implies abs > 1e-30
        else:
            return insertDot(strg, mo)
    return insertNum(strg, mo, N)

def extractInterval(mo, dist):
    out, lowin, lower, upper, upin = mo.group("out", "lowin", "lower", "upper", "upin")
    lower, upper = float(lower), float(upper)
    if out:
        lop = lowin == "(" and operator.le or operator.lt
        hop = upin == ")" and operator.ge or operator.gt
        return filter(lambda x: lop(x[0], lower) or hop(x[0], upper), dist.items())
    else:
        lop = lowin == "(" and operator.gt or operator.ge
        hop = upin == ")" and operator.lt or operator.le
        return filter(lambda x: lop(x[0], lower) and hop(x[0], upper), dist.items())

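# Illustrative reading (the interval spec is hypothetical): for a spec
# parsed as lowin="(", lower=20, upper=22, upin="]" on a distribution
# with values 20, 21, 22 and 23, the items for 21 and 22 are returned
# (20 is excluded by "(", 22 is included by "]"); when the "out" group
# is present, the complement is returned instead, here 20 and 23.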
def replaceCconti(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    by = mo.group("by")
    N = sum([x[1] for x in extractInterval(mo, node.distribution)])
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            byN = sum([x[1] for x in extractInterval(mo, whom.distribution)])
            if byN > 1e-30:
                N /= byN
        else:
            return insertDot(strg, mo)

    return insertNum(strg, mo, N)


def replacecconti(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    N = sum([x[1] for x in extractInterval(mo, node.distribution)])
    ab = node.distribution.abs
    if ab > 1e-30:
        N /= ab

    by = mo.group("by")
    if by:
        whom = byWhom(by, parent, tree)
        if whom and whom.distribution:
            byN = sum([x[1] for x in extractInterval(mo, whom.distribution)])
            if byN > 1e-30:
                N /= byN / whom.distribution.abs
        else:
            return insertDot(strg, mo)

    return insertNum(strg, mo, N)

def replaceD(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Discrete:
        return insertDot(strg, mo)

    fs, by, m100 = mo.group("fs", "by", "m100")
    dist = list(node.distribution)
    if by:
        whom = byWhom(by, parent, tree)
        if whom:
            for i, d in enumerate(whom.distribution):
                if d > 1e-30:
                    dist[i] /= d
        else:
            return insertDot(strg, mo)
    mul = m100 and 100 or 1
    fs = fs or (m100 and ".0" or "5.3")
    return insertStr(strg, mo, "[" + ", ".join(["%%%sf" % fs % (N * mul) for N in dist]) + "]")


def replaced(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Discrete:
        return insertDot(strg, mo)

    fs, by, m100 = mo.group("fs", "by", "m100")
    dist = list(node.distribution)
    ab = node.distribution.abs
    if ab > 1e-30:
        dist = [d / ab for d in dist]
    if by:
        whom = byWhom(by, parent, tree)
        if whom:
            for i, d in enumerate(whom.distribution):
                if d > 1e-30:
                    dist[i] /= d / whom.distribution.abs  # abs >= d, so d > 1e-30 implies abs > 1e-30
        else:
            return insertDot(strg, mo)
    mul = m100 and 100 or 1
    fs = fs or (m100 and ".0" or "5.3")
    return insertStr(strg, mo, "[" + ", ".join(["%%%sf" % fs % (N * mul) for N in dist]) + "]")


def replaceAE(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    AorE, bysub, by = mo.group("AorE", "bysub", "by")

    if AorE == "A":
        A = node.distribution.average()
    else:
        A = node.distribution.error()
    if by:
        whom = byWhom("b" + by, parent, tree)
        if whom:
            if AorE == "A":
                avg = whom.distribution.average()
            else:
                avg = whom.distribution.error()
            if bysub == "b":
                if avg > 1e-30:
                    A /= avg
            else:
                A -= avg
        else:
            return insertDot(strg, mo)
    return insertNum(strg, mo, A)

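# Z maps a confidence level to the two-sided critical value of the
# standard normal distribution (e.g. 95% of the mass lies within 1.96
# standard errors); replaceI below uses it to widen the standard error
# into a confidence interval around the average.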
Z = {0.75: 1.15, 0.80: 1.28, 0.85: 1.44, 0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def replaceI(strg, mo, node, parent, tree):
    if tree.classVar.varType != Orange.data.Type.Continuous:
        return insertDot(strg, mo)

    fs = mo.group("fs") or "5.3"
    intrvl = float(mo.group("intp") or mo.group("intv") or "95") / 100.
    mul = mo.group("m100") and 100 or 1

    if not Z.has_key(intrvl):
        raise SystemError, "Cannot compute %5.3f%% confidence intervals" % (intrvl * 100)

    av = node.distribution.average()
    il = node.distribution.error() * Z[intrvl]
    return insertStr(strg, mo, "[%%%sf-%%%sf]" % (fs, fs) % ((av - il) * mul, (av + il) * mul))

# This class is more a collection of functions, merged into a class so
# that they don't need to pass around too many arguments. It is
# constructed, used and discarded; it is not meant to store any information.
class __TreeDumper:
    defaultStringFormats = [(re_V, replaceV), (re_N, replaceN),
         (re_M, replaceM), (re_m, replacem),
         (re_Cdisc, replaceCdisc), (re_cdisc, replacecdisc),
         (re_Ccont, replaceCcont), (re_ccont, replaceccont),
         (re_Cconti, replaceCconti), (re_cconti, replacecconti),
         (re_D, replaceD), (re_d, replaced), (re_AE, replaceAE),
         (re_I, replaceI)]

    def __init__(self, leafStr, nodeStr, stringFormats, minExamples,
        maxDepth, simpleFirst, tree, **kw):
        self.stringFormats = stringFormats
        self.minExamples = minExamples
        self.maxDepth = maxDepth
        self.simpleFirst = simpleFirst
        self.tree = tree
        self.__dict__.update(kw)

        if leafStr:
            self.leafStr = leafStr
        else:
            if tree.classVar.varType == Orange.data.Type.Discrete:
                self.leafStr = "%V (%^.2m%)"
            else:
                self.leafStr = "%V"

        if nodeStr == ".":
            self.nodeStr = self.leafStr
        else:
            self.nodeStr = nodeStr


    def formatString(self, strg, node, parent):
        if hasattr(strg, "__call__"):
            return strg(node, parent, self.tree)

        if not node:
            return "<null node>"

        for rgx, replacer in self.stringFormats:
            if not node.distribution:
                strg = rgx.sub(".", strg)
            else:
                strt = 0
                while True:
                    mo = rgx.search(strg, strt)
                    if not mo:
                        break
                    strg = replacer(strg, mo, node, parent, self.tree)
                    # continue the search just past the replacement so a
                    # substituted dot cannot be matched again
                    strt = mo.start() + 1

        return strg


    def showBranch(self, node, parent, lev, i):
        bdes = node.branchDescriptions[i]
        bdes = node.branchSelector.classVar.name + \
            (bdes[0] not in "<=>" and "=" or "") + bdes
        if node.branches[i]:
            nodedes = self.nodeStr and ": " + \
                self.formatString(self.nodeStr, node.branches[i], node) or ""
        else:
            nodedes = "<null node>"
        return "|    " * lev + bdes + nodedes


    def dumpTree0(self, node, parent, lev):
        if node.branches:
            if node.distribution.abs < self.minExamples or \
                lev > self.maxDepth:
                return "|    " * lev + ". . .\n"

            res = ""
            if self.leafStr and self.nodeStr and self.leafStr != self.nodeStr:
                leafsep = "\n" + ("|    " * lev) + "    "
            else:
                leafsep = ""
            if self.simpleFirst:
                for i, branch in enumerate(node.branches):
                    if not branch or not branch.branches:
                        if self.leafStr == self.nodeStr:
                            res += "%s\n" % \
                                self.showBranch(node, parent, lev, i)
                        else:
                            res += "%s: %s\n" % \
                                (self.showBranch(node, parent, lev, i),
                                 leafsep +
                                 self.formatString(self.leafStr, branch, node))
            for i, branch in enumerate(node.branches):
                if branch and branch.branches:
                    res += "%s\n%s" % (self.showBranch(node, parent, lev, i),
                                       self.dumpTree0(branch, node, lev + 1))
                elif not self.simpleFirst:
                    if self.leafStr == self.nodeStr:
                        res += "%s\n" % self.showBranch(node, parent, lev, i)
                    else:
                        res += "%s: %s\n" % \
                            (self.showBranch(node, parent, lev, i),
                             leafsep +
                             self.formatString(self.leafStr, branch, node))
            return res
        else:
            return self.formatString(self.leafStr, node, parent)


    def dumpTree(self):
        if self.nodeStr:
            lev, res = 1, "root: %s\n" % \
                self.formatString(self.nodeStr, self.tree.tree, None)
            self.maxDepth += 1
        else:
            lev, res = 0, ""
        return res + self.dumpTree0(self.tree.tree, None, lev)


    def dotTree0(self, node, parent, internalName):
        if node.branches:
            if node.distribution.abs < self.minExamples or \
                len(internalName) - 1 > self.maxDepth:
                self.fle.write('%s [ shape="plaintext" label="..." ]\n' % \
                    _quoteName(internalName))
                return

            label = node.branchSelector.classVar.name
            if self.nodeStr:
                label += "\\n" + self.formatString(self.nodeStr, node, parent)
            self.fle.write('%s [ shape=%s label="%s"]\n' % \
                (_quoteName(internalName), self.nodeShape, label))

            for i, branch in enumerate(node.branches):
                if branch:
                    internalBranchName = internalName + chr(i + 65)
                    self.fle.write('%s -> %s [ label="%s" ]\n' % \
                        (_quoteName(internalName),
                         _quoteName(internalBranchName),
                         node.branchDescriptions[i]))
                    self.dotTree0(branch, node, internalBranchName)

        else:
            self.fle.write('%s [ shape=%s label="%s"]\n' % \
                (_quoteName(internalName), self.leafShape,
                self.formatString(self.leafStr, node, parent)))


    def dotTree(self, internalName="n"):
        self.fle.write("digraph G {\n")
        self.dotTree0(self.tree.tree, None, internalName)
        self.fle.write("}\n")

def _quoteName(x):
    # base64-encode the name so that arbitrary characters in it cannot
    # break the dot syntax, then wrap it in quotes
    return '"%s"' % (base64.b64encode(x))

def dumpTree(tree, leafStr="", nodeStr="", **argkw):
    """
    Return a string representation of a tree.

    :arg tree: The tree to dump to string.
    :type tree: :class:`TreeClassifier`
    :arg leafStr: The format string for printing the tree leaves. If
      left empty, "%V (%^.2m%)" will be used for classification trees
      and "%V" for regression trees.
    :type leafStr: string
    :arg nodeStr: The format string for printing out the internal nodes.
      If left empty (as it is by default), no data is printed out for
      internal nodes. If set to :samp:`"."`, the same string is
      used as for leaves.
    :type nodeStr: string
    :arg maxDepth: If set, it limits the depth to which the tree is
      printed out.
    :type maxDepth: integer
    :arg minExamples: If set, subtrees with fewer than the given
      number of examples are not printed.
    :type minExamples: integer
    :arg simpleFirst: If True (default), the branches consisting of a
      single node are printed before the branches with larger subtrees.
      If False, the branches are printed in order of appearance.
    :type simpleFirst: boolean
    :arg userFormats: A list of regular expressions and callback
      functions through which the user can print out additional
      information in the nodes.
    """
    return __TreeDumper(leafStr, nodeStr, argkw.get("userFormats", []) +
        __TreeDumper.defaultStringFormats, argkw.get("minExamples", 0),
        argkw.get("maxDepth", 1e10), argkw.get("simpleFirst", True),
        tree).dumpTree()

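# A minimal usage sketch (the "iris" data set and the resulting tree are
# illustrative; actual output depends on the data):
#
#     import Orange
#     iris = Orange.data.Table("iris")
#     tree = Orange.classification.tree.TreeLearner(iris)
#     print dumpTree(tree, leafStr="%V (%^.2m%)", nodeStr=".")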

def printTree(*a, **aa):
    """
    Print out the tree (call :func:`dumpTree` with the same
    arguments and print out the result).
    """
    print dumpTree(*a, **aa)

printTxt = printTree
""" An alias for :func:`printTree`. Left for compatibility. """

def printDot(tree, fileName, leafStr="", nodeStr="", leafShape="plaintext", nodeShape="plaintext", **argkw):
    """ Print the tree to a file in a format used by
    `GraphViz <http://www.graphviz.org>`_.
    Uses the same parameters as :func:`printTxt` defined above,
    plus two parameters which define the shapes used for internal
    nodes and leaves of the tree:

    :param leafShape: Shape of the outline around leaves of the tree.
        If "plaintext", no outline is used (default: "plaintext").
    :type leafShape: string
    :param nodeShape: Shape of the outline around internal nodes
        of the tree. If "plaintext", no outline is used (default: "plaintext").
    :type nodeShape: string

    Check `Polygon-based Nodes <http://www.graphviz.org/doc/info/shapes.html>`_
    for the various outlines supported by GraphViz.
    """
    fle = type(fileName) == str and file(fileName, "wt") or fileName

    __TreeDumper(leafStr, nodeStr, argkw.get("userFormats", []) +
        __TreeDumper.defaultStringFormats, argkw.get("minExamples", 0),
        argkw.get("maxDepth", 1e10), argkw.get("simpleFirst", True), tree,
        leafShape=leafShape, nodeShape=nodeShape, fle=fle).dotTree()

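# Illustrative use (file names are arbitrary): write the tree in dot
# format, then render it with the GraphViz command-line tool:
#
#     printDot(tree, "tree.dot", leafStr="%V", nodeShape="ellipse", leafShape="box")
#
# and, from a shell:  dot -Tgif tree.dot -o tree.gif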
class TreeClassifier(Orange.classification.Classifier):
    """
    Wraps :class:`Orange.core.TreeClassifier`.
    """

    def __init__(self, baseClassifier=None):
        if not baseClassifier:
            baseClassifier = _TreeClassifier()
        self.nativeClassifier = baseClassifier
        for k, v in self.nativeClassifier.__dict__.items():
            self.__dict__[k] = v

    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue,
                 *args, **kwdargs):
        """Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`
        :param result_type:
              :class:`Orange.classification.Classifier.GetValue` or
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.Distribution` or a tuple with both
        """
        return self.nativeClassifier(instance, result_type, *args, **kwdargs)

    def __setattr__(self, name, value):
        if name == "nativeClassifier":
            self.__dict__[name] = value
            return
        if name in self.nativeClassifier.__dict__:
            self.nativeClassifier.__dict__[name] = value
        self.__dict__[name] = value

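# Usage sketch (assumes `classifier` is a TreeClassifier and `data` an
# Orange.data.Table; both names are placeholders):
#
#     value = classifier(data[0])
#     value, probs = classifier(data[0],
#                               Orange.classification.Classifier.GetBoth)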
dotTree = printDot
""" An alias for :func:`printDot`. Left for compatibility. """