source: orange/docs/widgets/rst/data/rank.rst @ 11359:8d54e79aa135

Revision 11359:8d54e79aa135, 6.0 KB, checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago


.. _Rank:

Rank
====
.. image:: ../icons/Rank.png
A widget for ranking the attributes and selecting attribute subsets.
Signals
-------

Inputs:
   - Examples (ExampleTable)
      Input data set.
Outputs:
   - Reduced Example Table (ExampleTable)
      Data set described by the selected attributes.
   - ExampleTable Attributes (ExampleTable)
      Data set in which each example corresponds to an attribute from the
      original set, and the attributes correspond to the selected
      attribute evaluation measures.
Description
-----------

This widget computes a set of measures for estimating the quality/usefulness
of attributes: ReliefF, information gain, gain ratio and the Gini index.
Besides presenting this information, it also allows the user to select a
subset of attributes, or it can automatically select the specified number of
best-ranked attributes.
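As an illustration of what these measures compute, here is a minimal
plain-Python sketch of information gain, gain ratio and the Gini index for a
discrete attribute. This is not the widget's implementation; the function and
variable names are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Reduction of class entropy after splitting by the attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalised by the attribute's own entropy."""
    split = entropy(values)
    return info_gain(values, labels) / split if split > 0 else 0.0

def gini_index(values, labels):
    """Decrease of Gini impurity after splitting by the attribute."""
    def gini(g):
        n = len(g)
        return 1 - sum((c / n) ** 2 for c in Counter(g).values())
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups.values())
```

For an attribute that perfectly predicts a binary class all three measures
reach their maximum; for an attribute independent of the class they are
(close to) zero.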
.. image:: images/Rank.png
The right-hand side of the widget presents the computed quality of the
attributes. The first line shows the attribute name and the second the
number of its values (or a "C" if the attribute is continuous). The
remaining columns show different measures of quality.
The user can select which measures are computed and presented.
:obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours`
taken into account and the number of randomly chosen reference :obj:`Examples`.
The former should be higher if there is a lot of noise; the latter generally
makes the computation less reliable if set too low, while higher values
make it slow.
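The role of the two parameters can be seen in a much simplified Relief-style
sketch. This is illustrative only, not the widget's ReliefF: `n_ref` and `k`
stand in for :obj:`Examples` and :obj:`Neighbours`, and only discrete
attributes are handled.

```python
import random

def relief_scores(X, y, n_ref=20, k=5, seed=0):
    """Very simplified Relief: reward attributes that differ on nearby
    examples of another class and agree on nearby examples of the same
    class. X is a list of rows of discrete values, y the class labels."""
    rng = random.Random(seed)
    n_attrs = len(X[0])
    weights = [0.0] * n_attrs

    def diff(a, r, s):                      # 0/1 difference on attribute a
        return 0.0 if X[r][a] == X[s][a] else 1.0

    def distance(r, s):                     # Hamming distance between rows
        return sum(diff(a, r, s) for a in range(n_attrs))

    for _ in range(n_ref):                  # randomly chosen reference examples
        r = rng.randrange(len(X))
        hits = sorted((i for i in range(len(X)) if i != r and y[i] == y[r]),
                      key=lambda i: distance(r, i))[:k]
        misses = sorted((i for i in range(len(X)) if y[i] != y[r]),
                        key=lambda i: distance(r, i))[:k]
        for a in range(n_attrs):
            if hits:                        # same class: differing is penalised
                weights[a] -= sum(diff(a, r, h) for h in hits) / (n_ref * len(hits))
            if misses:                      # other class: differing is rewarded
                weights[a] += sum(diff(a, r, s) for s in misses) / (n_ref * len(misses))
    return weights
```

More reference examples make the estimate more stable but slower to compute;
more neighbours smooth the estimate, which helps with noisy data.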
The order in which the attributes are presented can be set either in the
list below the measures or by clicking the table's column headers. Attributes
can also be sorted by a measure not printed in the table.
Measures that cannot handle continuous attributes (the impurity
measures: information gain, gain ratio and the Gini index) are run on
discretized attributes. For the sake of simplicity, continuous attributes
are always split into intervals with (approximately) equal numbers of
examples; the user can set the number of :obj:`Intervals`.
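Equal-frequency splitting of this kind can be sketched as follows (an
illustrative helper, not the widget's discretization code):

```python
def equal_frequency_cuts(values, n_intervals):
    """Cut points splitting a continuous attribute into n_intervals
    groups with (approximately) equal numbers of examples."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, n_intervals):
        j = i * n // n_intervals            # boundary index in the sorted values
        cuts.append((ordered[j - 1] + ordered[j]) / 2)
    return cuts

def discretize(values, cuts):
    """Replace each value by the index of the interval it falls into."""
    return [sum(v > c for c in cuts) for v in values]
```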
It is also possible to set the number of decimals
(:obj:`No. of decimals`) in the printout. Setting this number too high may
exaggerate the accuracy of the computation; many decimals are only
useful when the computed numbers are really small.
The widget outputs two example tables. The one whose corresponding signal
is named :code:`ExampleTable Attributes` looks much like the table
shown in the Rank widget, except that the second column is split into two
columns, one giving the attribute type (D for discrete and C for continuous),
and the other giving the number of distinct values if the attribute is
discrete and undefined if it is continuous.
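Conceptually, this output is a transposed summary table with one row per
attribute. A minimal sketch (the function name, layout and sample values are
invented for the example):

```python
def attribute_table(names, types, n_values, scores):
    """One row per attribute: name, type ('D'/'C'), the number of distinct
    values for discrete attributes (None, i.e. undefined, for continuous
    ones), followed by the computed quality scores."""
    rows = []
    for i, name in enumerate(names):
        count = n_values[i] if types[i] == "D" else None
        rows.append([name, types[i], count] + list(scores[i]))
    return rows
```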
The second, more interesting table has the same examples as the original,
but with a subset of the attributes. To select or unselect attributes, click
the corresponding rows in the table. This way, the widget can be used for
manual selection of attributes. Something similar can also be done with
the :ref:`Select Attributes` widget, except that the Rank widget can select
attributes according to their quality, while Select Attributes offers more
in terms of changing the order of attributes, picking another class
attribute and similar.
The widget can also be used to automatically select a feature subset.
If :obj:`Best ranked` is selected in the :obj:`Select Attributes` box, the
widget will output a data set in which examples are described by the
specified number of best-ranked attributes. The data set is changed
whenever the order of attributes changes for any reason (a different
measure is selected for sorting, or the ReliefF or discretization settings
are changed).
The first two options in the :obj:`Select Attributes` box can be used to
clear the selection (:obj:`None`) or to select all attributes (:obj:`All`).
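The :obj:`Best ranked` option amounts to keeping the highest-scoring columns
of the data. A minimal sketch, with rows as plain lists and an invented
function name:

```python
def select_best_ranked(rows, scores, k):
    """Keep the k columns with the highest scores, preserving the
    original column order of the data."""
    top = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)[:k]
    top.sort()                              # restore original column order
    return [[row[a] for a in top] for row in rows]
```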
The :obj:`Commit` button sends the data set with the selected attributes.
If :obj:`Commit automatically` is set, the data set is sent on any change.
Examples
--------

A typical use of the widget is to put it immediately after the :ref:`File`
widget to reduce the attribute set. The snapshot below shows this as part of
a somewhat more complicated schema.
.. image:: images/Rank-after-file-Schema.png
The examples in the file are put through :ref:`Data Sampler`, which splits the
data set into two subsets: one containing 70% of the examples (signal
:code:`Classified Examples`) will be used for training a
:ref:`Naive Bayes <Naive Bayes>` classifier, and the other 30% (signal
:code:`Remaining Classified Examples`) for testing. Attribute subset selection
based on information gain was performed on the training set only, and the five
most informative attributes were selected for learning. A data set with all
other attributes removed (signal :code:`Reduced Example Table`) is fed into
:ref:`Test Learners`. The Test Learners widget also gets the
:code:`Remaining Classified Examples` to use them as test examples (don't
forget to set :code:`Test on Test Data` in that widget!).
To verify how the subset selection affects the classifier's performance, we
added another :ref:`Test Learners` widget, but connected it to the
:code:`Data Sampler` so that the two subsets emitted by the latter are used
for training and testing without any feature subset selection.
Running this schema on the heart disease data set shows a considerable
improvement in all respects on the reduced attribute subset.
In another, much simpler example, we connected a
:ref:`Classification Tree Viewer` to the Rank widget to observe different
attribute quality measures at different nodes. This gives some picture of
how important the selection of the measure is in tree construction: the more
the measures agree about attribute ranking, the less crucial the choice of
measure is.
.. image:: images/Rank-Tree.png
A variation of the above is using the Rank widget after the
:ref:`Interactive Tree Builder`: the sorted attributes may help us in deciding
the attribute to use at a certain node.
.. image:: images/Rank-ITree.png