source: orange/docs/widgets/rst/classify/interactivetreebuilder.rst @ 11359:8d54e79aa135

Revision 11359:8d54e79aa135, 8.2 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago

.. _Interactive Tree Builder:

Interactive Tree Builder
========================

.. image:: ../icons/InteractiveTreeBuilder.png

A widget for manual construction and/or editing of classification trees.

Signals
-------

Inputs:

   - Examples (ExampleTable)
      Learning examples.

   - Tree Learner (orange.TreeLearner)
      An optional tree learner to be used instead of the default tree learner.


Outputs:

   - Examples (orange.ExampleTable)
      Examples from the selected tree node.

   - Classifier (orange.TreeClassifier)
      The constructed tree.

   - Tree Learner (orange.Learner)
      A learner which always returns the same tree, the one constructed in
      the widget.


Signal :code:`Examples` sends data only if some tree node is selected and
contains some examples.

Description
-----------

This widget is useful for teaching the induction of classification trees,
and also in practice, where a data miner and a domain expert can use it to
construct a classification tree manually, helped by Orange's entire widgetry.

The widget is based on :ref:`Classification Tree Viewer`. It is mostly the
same (so you are encouraged to read the related documentation), except for
the different input/output signals and the addition of a few buttons.

.. image:: images/InteractiveTreeBuilder.png
   :alt: Interactive Tree Builder widget

Button :obj:`Split` splits the selected tree node according to the criterion
above the button. For instance, if we pressed Split in the above widget,
the animals that don't give milk and have no feathers (the picture shows
a tree for the zoo data set) would be split according to whether they are
:code:`aquatic` or not. In the case of continuous attributes, a cut-off point
needs to be specified as well.

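The effect of Split can be sketched in plain Python. The helper below is a
hypothetical illustration of the partitioning, not the widget's actual code:

```python
def split_examples(examples, attribute, cut_off=None):
    """Partition examples the way the Split button does.

    For a discrete attribute, examples are grouped by the attribute's
    value; for a continuous attribute, a cut_off threshold divides them
    into two branches (<= cut_off and > cut_off).
    """
    branches = {}
    for ex in examples:
        value = ex[attribute]
        if cut_off is None:          # discrete attribute
            key = value
        else:                        # continuous attribute
            key = "<=%s" % cut_off if value <= cut_off else ">%s" % cut_off
        branches.setdefault(key, []).append(ex)
    return branches

# A few zoo-like examples: milkless, featherless animals.
animals = [
    {"name": "frog",     "aquatic": 1, "legs": 4},
    {"name": "tortoise", "aquatic": 0, "legs": 4},
    {"name": "pitviper", "aquatic": 0, "legs": 0},
]
by_aquatic = split_examples(animals, "aquatic")
```

Splitting on the discrete :code:`aquatic` gives one branch per value, while
splitting on a continuous attribute such as :code:`legs` with a cut-off
(e.g. :code:`cut_off=2`) gives exactly two branches.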
If Split is used on a node which is not a leaf, the criterion at that node
is replaced. If we, for instance, selected the :code:`<root>` node and pushed
Split, the criterion :code:`milk` would be replaced with :code:`aquatic`
and the nodes below (:code:`feathers`) would be removed.

Button :obj:`Cut` cuts the tree at the selected node. If we pushed Cut
in the situation in the picture, nothing would happen, since the selected
node (:code:`feathers=0`) is already a leaf. If we selected :code:`<root>`
and pushed Cut, the entire tree would be cut off.

Cut is especially useful in combination with :code:`Build`, which builds
a subtree at the current node. So, if we pushed Build in the situation
depicted above, a subtree would be built for the milkless, featherless
animals, leaving the rest of the tree (that is, the existing two nodes)
intact. If Build is pressed at a node which is not a leaf, the entire subtree
at that node is replaced with an automatically induced tree.

Build uses some reasonable default parameters for tree learning (information
gain ratio is used for attribute selection, with a minimum of 2 examples per
leaf, which gives an algorithm equivalent to Quinlan's C4.5). To gain more
control over the tree construction arguments, use a :ref:`Classification Tree`
widget or a :ref:`C4.5` widget, set its parameters, and connect it to the
input of Interactive Tree Builder. The set parameters will then be used for
the tree induction. (If you use C4.5, Quinlan's original algorithm,
don't forget to check :obj:`Convert to orange tree structure`.)

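The attribute-selection measure mentioned above, information gain ratio, can
be sketched in plain Python. This is a minimal illustration of the formula,
not Orange's implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, attribute, class_attr):
    """Information gain of `attribute`, normalized by its split information."""
    labels = [ex[class_attr] for ex in examples]
    n = len(examples)
    remainder, split_info = 0.0, 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[class_attr] for ex in examples if ex[attribute] == value]
        p = len(subset) / n
        remainder += p * entropy(subset)   # expected entropy after the split
        split_info -= p * log2(p)          # entropy of the split itself
    gain = entropy(labels) - remainder
    return gain / split_info if split_info else 0.0

# A perfectly informative attribute has gain ratio 1 on this toy data.
toy = [{"milk": 1, "type": "mammal"}, {"milk": 1, "type": "mammal"},
       {"milk": 0, "type": "other"}, {"milk": 0, "type": "other"}]
ratio = gain_ratio(toy, "milk", "type")
```

Normalizing by the split information is what distinguishes gain ratio from
plain information gain; it penalizes attributes with many values.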
The widget has several outputs. :obj:`Examples` gives, as in
:ref:`Classification Tree Viewer`, the list of examples from the selected node.
This output can be used to observe the statistical properties or
visualizations of various attributes for a specific node, based on which
we can decide whether and how to split the examples.

Signal :obj:`Classifier` can be attached to another tree viewer.
Using a :ref:`Classification Tree Viewer` is not really useful, as it would
show the same picture as Interactive Tree Builder. We can, however, connect
the more colorful :ref:`Classification Tree Graph`.

The last output is :obj:`Tree Learner`. This is a tree learner which always
returns the same tree, the one we constructed in this widget. It can be used
to assess the tree's quality with the :ref:`Test Learners` widget. This
requires some caution, though: you should not test the tree on the same
data you used to induce it. See the Examples section below for the correct
procedure.

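This Tree Learner output behaves like a learner whose training step is a
no-op. A minimal sketch of such a "constant learner" wrapper (hypothetical
names, not the widget's actual code):

```python
class ConstantLearner:
    """A learner that ignores the training data and always returns the
    same, pre-built classifier, mimicking the widget's Tree Learner output."""

    def __init__(self, classifier):
        self.classifier = classifier

    def __call__(self, examples):
        # A normal learner would induce a model from `examples`;
        # this one just hands back the manually constructed tree.
        return self.classifier

# Usage: any callable model will do for the sketch.
manual_tree = lambda example: "mammal" if example.get("milk") else "other"
learner = ConstantLearner(manual_tree)
model = learner([{"milk": 1}, {"milk": 0}])   # training data is ignored
```

Wrapping the fixed tree as a learner is what lets evaluation widgets treat it
uniformly alongside real learning algorithms.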
Examples
--------

The first snapshot shows the typical "environment" of the Interactive
Tree Builder.

.. image:: images/InteractiveTreeBuilder-SchemaInduction.png
   :alt: A schema with Interactive Tree Builder

The learning examples may come from a file. We also use a
:ref:`Classification Tree` widget to be able to set the tree induction
parameters for the parts of the tree we want to induce automatically.

On the right-hand side, we have the :ref:`Rank` widget, which assesses the
quality of attributes through measures like information gain, Gini index
and others. Emulating the induction algorithm by selecting the attributes
having the highest value for one of these measures should give the same
results as using the Classification Tree widget instead of the Interactive
Tree Builder. However, in manual construction we can (and should) also rely
on the visualization widgets. One-dimensional visualizations like
:ref:`Distributions` give us an impression of the properties of a single
attribute, while two- and more-dimensional visualizations like
:ref:`Scatter Plot` and :ref:`Linear Projection` give us a kind of
lookahead by telling us about useful combinations of attributes. We
have also deployed the :ref:`Data Table` widget, since seeing particular
examples in a tree node may also sometimes help the expert.

Finally, we use the :ref:`Classification Tree Graph` to present the resulting
tree in a fancy-looking picture.

As the widget name suggests, the tree construction should be interactive,
making the best use of Orange's visualization techniques and of the
expert's help. At the beginning, the widget presents a tree
containing only the root. One way to proceed is to immediately click
Build and then study the resulting tree. Data examples for various nodes
can be presented and visualized to decide which parts of the tree make sense,
which don't and should rather be reconstructed manually, and which subtrees
should be cut off. The other way is to start constructing the tree
manually, adding the nodes according to the expert's knowledge, and
occasionally using the Build button to let Orange make a suggestion.

Although the expert's help will usually prevent overfitting the data,
special care still needs to be taken when we are interested in knowing
the performance of the induced tree. Since the widely used cross-validation
is, for obvious reasons, inapplicable when the model is constructed
manually, we should split the data into training and testing sets prior
to building the tree.

.. image:: images/InteractiveTreeBuilder-SchemaSampling.png
   :alt: A schema with Interactive Tree Builder

We have used the :ref:`Data Sampler` widget for splitting the data; in most
cases we recommend using stratified random sampling with a sample size
of 70% for training. These examples (denoted as "Examples" in the snapshot)
are fed to the Interactive Tree Builder, where we employ Orange's armory
to construct the tree as described above.

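The 70/30 stratified split performed by Data Sampler can be sketched in plain
Python (a minimal illustration with a hypothetical helper, not the widget's
code):

```python
import random

def stratified_split(examples, class_attr, train_fraction=0.7, seed=42):
    """Split examples into training and test sets so that each class keeps
    (roughly) the same proportion in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for ex in examples:
        by_class.setdefault(ex[class_attr], []).append(ex)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # don't disturb the caller's lists
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test

# 10 mammals and 10 birds -> 7 of each for training, 3 of each for testing.
zoo = [{"type": "mammal"}] * 10 + [{"type": "bird"}] * 10
train, test = stratified_split(zoo, "type")
```

Sampling within each class separately is what makes the split stratified:
both parts preserve the original class distribution.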
The tricky part is connecting the :ref:`Test Learners` widget: Data Sampler's
Examples should be used as Test Learners' Data, and Data Sampler's
Remaining Examples as the Test Learners' Separate Test Data.

.. image:: images/InteractiveTreeBuilder-SchemaSampling-Wiring.png
   :alt: Connecting Data Sampler to Test Learners when using Interactive
         Tree Builder

In Test Learners, don't forget to set the Sampling type to
:obj:`Test on test data`. Interactive Tree Builder should then give its
Tree Learner to Test Learners. To compare the manually constructed tree
with, say, an automatically constructed one and with a Naive Bayesian
classifier, we can include these two in the schema.

Test Learners will now feed the training data (the 70% sample it gets from
Data Sampler) to all three learning algorithms. While Naive Bayes and
Classification Tree will actually learn, Interactive Tree Builder will
ignore the training examples and return the manually built tree.
All three models will then be tested on the remaining 30% of the examples.