.. _Interactive Tree Builder:

Interactive Tree Builder
========================

.. image:: ../icons/InteractiveTreeBuilder.png

A widget for manual construction and/or editing of classification trees.

Signals
-------

Inputs:

   - Examples (ExampleTable)
      Learning examples.

   - Tree Learner (orange.TreeLearner)
      An optional tree learner to be used instead of the default tree learner.

Outputs:

   - Examples (orange.ExampleTable)
      Examples from the selected tree node.

   - Classifier (orange.TreeClassifier)
      The constructed tree.

   - Tree Learner (orange.Learner)
      A learner which always returns the same tree - the one constructed in the widget.

Signal :code:`Examples` sends data only if some tree node is selected and contains some examples.

Description
-----------

This widget is useful for teaching induction of classification trees, and also in practice, where a data miner and a domain expert can use it to construct a classification tree manually, helped by the whole of Orange's widgetry.

The widget is based on `Classification Tree Viewer <ClassificationTreeViewer.htm>`_. It is mostly the same (so you are encouraged to read the related documentation), except for the different input/output signals and the addition of a few buttons.

.. image:: images/InteractiveTreeBuilder.png
   :alt: Interactive Tree Builder widget

Button :obj:`Split` splits the selected tree node according to the criterion above the button. For instance, if we pressed Split in the above widget, the animals that don't give milk and have no feathers (the picture shows a tree for the zoo data set) would be split according to whether they are :code:`aquatic` or not. In the case of continuous attributes, a cut-off point needs to be specified as well.

If Split is used on a node which is not a leaf, the criterion at that node is replaced. If we, for instance, selected the :code:`<root>` node and pushed Split, the criterion :code:`milk` would be replaced with :code:`aquatic` and the nodes below (:code:`feathers`) would be removed.

Button :obj:`Cut` cuts the tree at the selected node. If we pushed Cut in the situation in the picture, nothing would happen, since the selected node (:code:`feathers=0`) is already a leaf. If we selected :code:`<root>` and pushed Cut, the entire tree would be cut off.

Cut is especially useful in combination with :obj:`Build`, which builds a subtree at the current node. So, if we pushed Build in the situation depicted above, a subtree would be built for the milkless, featherless animals, leaving the rest of the tree (that is, the existing two nodes) intact. If Build is pressed at a node which is not a leaf, the entire subtree at that node is replaced with an automatically induced tree.

Build uses some reasonable default parameters for tree learning (information gain ratio is used for attribute selection with a minimum of 2 examples per leaf, which gives an algorithm equivalent to Quinlan's C4.5). To gain more control over the tree construction arguments, use a `Classification Tree <ClassificationTree.htm>`_ or `C4.5 <C4.5.htm>`_ widget, set its parameters and connect it to the input of Interactive Tree Builder. The set parameters will then be used for the tree induction. (If you use C4.5, Quinlan's original algorithm, don't forget to check :obj:`Convert to orange tree structure`.)
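
An equivalent learner can also be put together in a short script. The following sketch uses the Orange 2.x scripting interface (argument names as in the :code:`orngTree` module); it approximates the defaults described above rather than reproducing the widget's internal code::

   import orange, orngTree

   data = orange.ExampleTable("zoo")
   # roughly the Build defaults: gain ratio for attribute selection,
   # at least two examples in each leaf
   learner = orngTree.TreeLearner(measure="gainRatio", minExamples=2)
   tree = learner(data)
   print orngTree.dumpTree(tree)

A learner constructed like this can likewise be connected to the widget's Tree Learner input to replace the defaults used by Build.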

The widget has several outputs. :obj:`Examples` gives, as in `Classification Tree Viewer <ClassificationTreeViewer.htm>`_, the list of examples from the selected node. This output can be used to observe the statistical properties or visualizations of various attributes for a specific node, based on which we can decide whether and how to split the examples.

Signal :obj:`Classifier` can be attached to another tree viewer. Using a Classification Tree Viewer is not really useful, as it will show the same picture as Interactive Tree Builder. We can, however, connect the more colorful `Classification Tree Graph <ClassificationTreeGraph.htm>`_.

The last output is :obj:`Tree Learner`. This is a tree learner which always gives the same tree - the one we constructed in this widget. It can be used to assess the tree's quality with the `Test Learners <../Evaluate/TestLearners.htm>`_ widget. This requires some caution, though: you should not test the tree on the same data you used to induce it. See the Examples section below for the correct procedure.
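
In a script, such a learner amounts to a class that ignores its training data and always returns a fixed classifier. A minimal sketch of the idea (a hypothetical illustration, not the widget's actual code)::

   import orange, orngTree

   class PrebuiltTreeLearner:
       """Ignore the training examples; always return the same tree."""
       def __init__(self, tree, name="prebuilt tree"):
           self.tree = tree
           self.name = name
       def __call__(self, examples, weight=0):
           return self.tree

   data = orange.ExampleTable("zoo")
   # build a tree once, then wrap it so that "learning" returns it unchanged
   manual = PrebuiltTreeLearner(orngTree.TreeLearner(data))

Testing such a learner is only meaningful on examples that were not used to construct the tree, as described below.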

Examples
--------

The first snapshot shows the typical "environment" of the Interactive Tree Builder.

.. image:: images/InteractiveTreeBuilder-SchemaInduction.png
   :alt: A schema with Interactive Tree Builder

The learning examples may come from a file. We also use a `Classification Tree <ClassificationTree.htm>`_ widget to be able to set the tree induction parameters for the parts of the tree we want to induce automatically.

On the right-hand side, we have the `Rank <../Data/Rank.htm>`_ widget, which assesses the quality of attributes through measures like information gain, gini index and others. Emulating the induction algorithm by selecting the attribute with the highest value for one of these measures should give the same results as using the Classification Tree widget instead of the Interactive Tree Builder. However, in manual construction we can (and should) also rely on the visualization widgets. One-dimensional visualizations like `Distributions <../Visualize/Distributions.htm>`_ give us an impression of the properties of a single attribute, while two- and more-dimensional visualizations like `Scatterplot <../Visualize/Scatterplot.htm>`_ and `Linear Projection <../Visualize/LinearProjection.htm>`_ provide a kind of lookahead by revealing useful combinations of attributes. We have also deployed the `Data Table <../Data/DataTable.htm>`_ widget, since seeing particular examples in a tree node may sometimes also help the expert.
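
The scores Rank computes can also be obtained in a few lines of script. A sketch with the Orange 2.x attribute measures (the zoo data set and gain ratio are chosen for illustration)::

   import orange

   data = orange.ExampleTable("zoo")
   gain = orange.MeasureAttribute_gainRatio()
   # score and print the attributes, best first
   for attr in sorted(data.domain.attributes, key=lambda a: -gain(a, data)):
       print attr.name, gain(attr, data)

Always splitting on the top-ranked attribute would mimic what Build does automatically; the point of the widget is that the expert may decide otherwise.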

Finally, we use the `Classification Tree Graph <ClassificationTreeGraph.htm>`_ to present the resulting tree in a fancy-looking picture.

As the widget name suggests, the tree construction should be interactive, making the best use of Orange's visualization techniques and the help of the domain expert. At the beginning, the widget presents a tree containing only the root. One way to proceed is to immediately click Build and then study the resulting tree. Data examples for various nodes can be presented and visualized to decide which parts of the tree make sense, which don't and would better be reconstructed manually, and which subtrees should be cut off. The other way is to start constructing the tree manually, adding the nodes according to the expert's knowledge and occasionally using the Build button to let Orange make a suggestion.

Although the expert's help will usually prevent overfitting the data, special care still needs to be taken when we are interested in the performance of the induced tree. Since the widely used cross-validation is for obvious reasons inapplicable when the model is constructed manually, we should split the data into training and testing sets prior to building the tree.

.. image:: images/InteractiveTreeBuilder-SchemaSampling.png
   :alt: A schema with Interactive Tree Builder

We have used the `Data Sampler <../Data/DataSampler.htm>`_ widget for splitting the data; in most cases we recommend using stratified random sampling with a sample size of 70% for training. These examples (denoted as "Examples" in the snapshot) are fed to the Interactive Tree Builder, where we employ Orange's armory to construct the tree as described above.
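
In a script, the same split could be made with Orange's random indices. A sketch (Orange 2.x API; the 70% proportion matches the recommendation above)::

   import orange

   data = orange.ExampleTable("zoo")
   # put 70% of the examples into the first set; stratification is
   # applied by default when the class distribution allows it
   indices = orange.MakeRandomIndices2(data, p0=0.70)
   train = data.select(indices, 0)
   test = data.select(indices, 1)

The train part corresponds to Data Sampler's Examples, and the test part to its Remaining Examples.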

The tricky part is connecting :code:`Test Learners`: Data Sampler's Examples should be used as Test Learners' Data, and Data Sampler's Remaining Examples are the Test Learners' Separate Test Data.

.. image:: images/InteractiveTreeBuilder-SchemaSampling-Wiring.png
   :alt: Connecting Data Sampler to Test Learners when using Interactive Tree Builder

In Test Learners, don't forget to set the Sampling type to :obj:`Test on test data`. Interactive Tree Builder should then give its Tree Learner to Test Learners. To compare the manually constructed tree with, say, an automatically constructed one and with a Naive Bayesian classifier, we can include these two in the schema.

Test Learners will now feed the training data (the 70% sample it gets from Data Sampler) to all three learning algorithms. While Naive Bayes and Classification Tree will actually learn, Interactive Tree Builder will ignore the training examples and return the manually built tree. All three models will then be tested on the remaining 30% of the examples.
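
The same comparison can be scripted with the :code:`orngTest` module. A sketch (Orange 2.x API, reusing the :code:`PrebuiltTreeLearner` class sketched earlier to stand in for the widget's fixed tree)::

   import orange, orngTree, orngTest, orngStat

   data = orange.ExampleTable("zoo")
   indices = orange.MakeRandomIndices2(data, p0=0.70)
   train, test = data.select(indices, 0), data.select(indices, 1)

   # in the widget, this tree would be built by hand instead
   manual = PrebuiltTreeLearner(orngTree.TreeLearner(train))
   auto = orngTree.TreeLearner(name="tree")
   bayes = orange.BayesLearner()
   bayes.name = "naive bayes"

   # train on the 70% sample, test on the remaining 30%
   results = orngTest.learnAndTestOnTestData([manual, auto, bayes], train, test)
   print orngStat.CA(results)

As in the schema, the manual learner ignores the training examples it is given and is therefore evaluated, like the other two models, on data it has never seen.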