source: orange/docs/widgets/rst/data/rank.rst @ 11050:e3c4699ca155

.. _Rank:

Rank
====

.. image:: ../icons/Rank.png

A widget for ranking the attributes and selecting attribute subsets.

Signals
-------

Inputs:


   - Examples (ExampleTable)
      Input data set.


Outputs:


   - Reduced Example Table (ExampleTable)
      Data set described by the selected attributes.

   - ExampleTable Attributes (ExampleTable)
      Data set in which each example corresponds to an attribute from the original set, and the attributes correspond to the selected attribute evaluation measures.

Description
-----------

This widget computes a set of measures for evaluating the quality/usefulness of attributes: ReliefF, information gain, gain ratio and gini index. Besides providing this information, it also allows the user to select a subset of attributes, or it can automatically select the specified number of best-ranked attributes.

.. image:: images/Rank.png

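For reference, the same measures can also be computed with Orange's scripting interface. The following is a minimal sketch, assuming the Orange 2.x :code:`Orange.feature.scoring` module and the bundled voting data set (these names are assumptions about the scripting API, not something the widget itself requires)::

   import Orange

   data = Orange.data.Table("voting")

   # the same measures the widget offers
   measures = [("ReliefF", Orange.feature.scoring.Relief()),
               ("Inf. gain", Orange.feature.scoring.InfoGain()),
               ("Gain ratio", Orange.feature.scoring.GainRatio()),
               ("Gini", Orange.feature.scoring.Gini())]

   for attr in data.domain.attributes:
       scores = ["%s=%.3f" % (name, m(attr, data)) for name, m in measures]
       print attr.name, " ".join(scores)
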
The right-hand side of the widget presents the computed quality of the attributes. The first column shows the attribute name and the second the number of its values (or a "C" if the attribute is continuous). The remaining columns show the different measures of quality.

The user can select which measures are computed and presented. :obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours` taken into account and the number of randomly chosen reference :obj:`Examples`. The former should be higher if there is a lot of noise; the latter generally makes the computation less reliable if set too low, while higher values make it slow.

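In scripting terms, these two settings roughly correspond to the :code:`k` and :code:`m` arguments of the ReliefF measure; a short sketch under the same Orange 2.x assumptions as above::

   import Orange

   data = Orange.data.Table("voting")

   # k ~ the "Neighbours" setting, m ~ the number of reference "Examples"
   relief = Orange.feature.scoring.Relief(k=20, m=50)
   for attr in data.domain.attributes:
       print attr.name, "%.3f" % relief(attr, data)
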
The order in which the attributes are presented can be set either in the list below the measures or by clicking the table's column headers. Attributes can also be sorted by a measure not printed in the table.

Measures that cannot handle continuous attributes (the impurity measures - information gain, gain ratio and gini index) are run on discretized attributes. For the sake of simplicity we always split the continuous attributes into intervals with (approximately) equal numbers of examples, but the user can set the number of :obj:`Intervals`.

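A rough scripting equivalent, assuming the :code:`EqualFreq` discretizer from the Orange 2.x :code:`Orange.feature.discretization` module (the widget performs a comparable discretization internally)::

   import Orange

   data = Orange.data.Table("heart_disease")
   gain = Orange.feature.scoring.InfoGain()

   # impurity measures cannot score continuous attributes directly, so split each
   # continuous attribute into intervals with roughly equal numbers of examples
   equal_freq = Orange.feature.discretization.EqualFreq(n=4)
   for attr in data.domain.attributes:
       if isinstance(attr, Orange.feature.Continuous):
           attr = equal_freq(attr, data)
       print attr.name, "%.3f" % gain(attr, data)
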
It is also possible to set the number of decimals (:obj:`No. of decimals`) in the printout. Setting the number too high may exaggerate the accuracy of the computation; many decimals are only useful when the computed numbers are very small.

The widget outputs two example tables. The one whose corresponding signal is named :code:`ExampleTable Attributes` looks much like the table shown in the Rank widget, except that the second column is split into two: one gives the attribute type (D for discrete and C for continuous), and the other gives the number of distinct values if the attribute is discrete, and is undefined if it is continuous.

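The content of this table corresponds roughly to what the following sketch prints; it is only an illustration, again assuming the Orange 2.x scripting interface, while the widget itself emits a proper ExampleTable::

   import Orange

   data = Orange.data.Table("heart_disease")
   relief = Orange.feature.scoring.Relief(k=20, m=50)

   # one row per attribute: name, type (D/C), number of values, score
   for attr in data.domain.attributes:
       if isinstance(attr, Orange.feature.Discrete):
           a_type, n_values = "D", str(len(attr.values))
       else:
           a_type, n_values = "C", "?"
       print attr.name, a_type, n_values, "%.3f" % relief(attr, data)
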
The second, more interesting table has the same examples as the original, but with a subset of the attributes. To select or unselect attributes, click the corresponding rows in the table. This way, the widget can be used for manual selection of attributes. Something similar can also be done with a `Select Attributes <SelectAttributes.htm>`_ widget, except that the Rank widget can be used for selecting the attributes according to their quality, while Select Attributes offers more in terms of changing the order of attributes, picking another class attribute and so on.

The widget can also be used to automatically select a feature subset. If :obj:`Best ranked` is selected in the :obj:`Select attributes` box, the widget outputs a data set in which the examples are described by the specified number of best-ranked attributes. The data set changes whenever the order of attributes changes for any reason (a different measure is selected for sorting, the ReliefF or discretization settings are changed, and so on).

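The scripting analogue of :obj:`Best ranked` is to sort the attributes by the chosen measure and keep the top of the list; a minimal sketch under the same Orange 2.x assumptions::

   import Orange

   data = Orange.data.Table("voting")
   gain = Orange.feature.scoring.InfoGain()

   # rank by information gain and keep the five best attributes
   ranked = sorted(data.domain.attributes, key=lambda a: gain(a, data), reverse=True)
   best = ranked[:5]

   # data set described only by the selected attributes (plus the class)
   domain = Orange.data.Domain(best + [data.domain.class_var])
   reduced = Orange.data.Table(domain, data)
   print [a.name for a in reduced.domain.attributes]
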
The first two options in the :obj:`Select Attributes` box can be used to clear the selection (:obj:`None`) or to select all attributes (:obj:`All`).

The :obj:`Commit` button sends the data set with the selected attributes. If :obj:`Send automatically` is set, the data set is committed on every change.


Examples
--------

One typical use of the widget is to put it immediately after the `File widget <File.htm>`_ to reduce the attribute set. The snapshot below shows this as part of a somewhat more complicated schema.

.. image:: images/Rank-after-file-Schema.png

The examples in the file are put through `Data Sampler <DataSampler.htm>`_, which splits the data set into two subsets: one, containing 70% of the examples (signal :code:`Classified Examples`), is used for training a `naive Bayesian classifier <../Classify/NaiveBayes.htm>`_, and the other 30% (signal :code:`Remaining Classified Examples`) for testing. Attribute subset selection based on information gain is performed on the training set only, and the five most informative attributes are selected for learning. A data set with all other attributes removed (signal :code:`Reduced Example Table`) is fed into :code:`Test Learners`. The Test Learners widget also gets the :code:`Remaining Classified Examples` to use as test examples (don't forget to set :code:`Test on Test Data` in that widget!).

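A rough scripting counterpart of this schema is sketched below. It assumes the Orange 2.x modules :code:`Orange.data.sample`, :code:`Orange.feature.scoring` and :code:`Orange.classification.bayes`, and it ranks with ReliefF rather than information gain only because ReliefF can score the continuous attributes of this data set without prior discretization::

   import Orange

   data = Orange.data.Table("heart_disease")

   # 70:30 split, as done by Data Sampler in the schema
   indices = Orange.data.sample.SubsetIndices2(p0=0.7)(data)
   train, test = data.select(indices, 0), data.select(indices, 1)

   # keep the five best-ranked attributes, scored on the training set only
   relief = Orange.feature.scoring.Relief(k=20, m=50)
   best = sorted(train.domain.attributes,
                 key=lambda a: relief(a, train), reverse=True)[:5]
   reduced_train = Orange.data.Table(
       Orange.data.Domain(best + [train.domain.class_var]), train)

   # train naive Bayes on the reduced training set, test on the held-out 30%
   classifier = Orange.classification.bayes.NaiveLearner(reduced_train)
   correct = sum(classifier(ex) == ex.get_class() for ex in test)
   print "CA: %.3f" % (float(correct) / len(test))
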
To verify how the subset selection affects the classifier's performance, we added another :code:`Test Learners` widget, but connected it to the :code:`Data Sampler` so that the two subsets emitted by the latter are used for training and testing without any feature subset selection.

Running this schema on the heart disease data set shows quite a considerable improvement in all respects on the reduced attribute subset.

In another, much simpler example, we connected a `Tree Viewer <../Classify/ClassificationTreeGraph.htm>`_ to the Rank widget to observe different attribute quality measures at different nodes. This can give us some idea of how important the choice of measure is in tree construction: the more the measures agree on the attribute ranking, the less crucial the measure selection is.

.. image:: images/Rank-Tree.png

A variation of the above is to use the Rank widget after the `Interactive tree builder <../Classify/InteractiveTreeBuilder.htm>`_: the sorted attributes may help us decide which attribute to use at a certain node.

.. image:: images/Rank-ITree.png