Changeset 11795:7d7ee77fd99b in orange


Timestamp:
12/06/13 06:57:17 (5 months ago)
Author:
blaz <blaz.zupan@…>
Branch:
default
Message:

Updated documentation for Rank widget.

Location:
docs/widgets/rst/data
Files:
5 added
3 deleted
2 edited

  • docs/widgets/rst/data/rank.rst

    r11778 r11795  
    66.. image:: ../../../../Orange/OrangeWidgets/Data/icons/Rank.svg 
    77 
    8 A widget for ranking the attributes and selecting attribute subsets. 
     8Ranking of attributes in classification or regression data sets. 
    99 
    1010Signals 
     
    1313Inputs: 
    1414 
    15  
    16    - Examples (ExampleTable) 
     15   - Data 
    1716      Input data set. 
    18  
    1917 
    2018Outputs: 
    2119 
    22  
    23    - Reduced Example Table (ExampleTable) 
    24       Data set described by the selected attributes. 
    25  
    26    - ExampleTable Attributes (ExampleTable) 
    27       Data set in which each example corresponds to an attribute from the 
    28       original set, and the attributes correspond to the selected 
    29       attribute evaluation measures. 
    30  
     20   - Reduced Data 
     21      Data set with the selected attributes. 
    3122 
    3223Description 
    3324----------- 
    3425 
    35 This widget computes a set of measures for evaluating the quality/usefulness 
    36 of attributes: ReliefF, information gain, gain ratio and gini index. 
    37 Besides providing this information, it also allows user to select a subset 
    38 of attributes or it can automatically select the specified number of 
    39 best-ranked attributes. 
     26Rank widget considers class-labeled data sets (classification or regression) 
     27and scores the attributes according to their correlation with the 
     28class. 
    4029 
    41 .. image:: images/Rank.png 
     30.. image:: images/Rank-stamped.png 
    4231 
    43 The right-hand side of the widget presents the computed quality of the 
    44 attributes. The first line shows the attribute name and the second the 
    45 number of its values (or a "C" if the attribute is continuous). Remaining 
    46 columns show different measures of quality. 
     321. Attributes (rows) and their scores by different scoring methods 
     33   (columns). 
     34#. Scoring techniques and their (optional) parameters. 
     35#. For scoring techniques that require discrete attributes, this is the number 
     36   of intervals into which continuous attributes are discretized. 
     37#. Number of decimals used in reporting the score. 
     38#. Toggles the bar-based visualisation of the feature scores. 
     39#. Adds a score table to the current report. 
    4740 
    48 The user is able to select the measures (s)he wants computed and presented. 
    49 :obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours` 
    50 taken into account and the number of randomly chosen reference :obj:`Examples`. 
    51 The former should be higher if there is a lot of noise; the latter generally 
    52 makes the computation less reliable if set too low, while higher values 
    53 make it slow. 
     41Example: Attribute Ranking and Selection 
     42---------------------------------------- 
    5443 
    55 The order in which the attributes are presented can be set either in the 
    56 list below the measures or by clicking the table's column headers. Attributes 
    57 can also be sorted by a measure not printed in the table. 
     44Below we have used the Rank widget immediately after the :ref:`File` 
     45widget to reduce the set of data attributes and include only the most 
     46informative ones: 
    5847 
    59 Measures that cannot handle continuous attributes (impurity 
    60 measures - information gain, gain ratio and gini index) are run on 
    61 discretized attributes. For sake of simplicity we always split the 
    62 continuous attributes in intervals with (approximately) equal number of 
    63 examples, but the user can set the number of :obj:`Intervals`. 
     48.. image:: images/Rank-Select-Schema.png 
    6449 
    65 It is also possible to set the number of decimals 
    66 (:obj:`No. of decimals`) in the print out. Using a number too high may 
    67 exaggerate the accuracy of the computation; many decimals may only be 
    68 useful when the computed numbers are really small. 
     50Notice how the widget outputs a data set that includes only the best-scored 
     51attributes: 
    6952 
    70 The widget outputs two example tables. The one, whose corresponding signal 
    71 is named :obj:`ExampleTable Attributes` looks pretty much like the one 
    72 shown in the Rank widget, except that the second column is split into two 
    73 columns, one giving the attribute type (D for discrete and C for continuous), 
    74 and the other giving the number of distinct values if the attribute is 
    75 discrete and undefined if it's continuous. 
     53.. image:: images/Rank-Select-Widgets.png 
    7654 
    77 The second, more interesting table has the same examples as the original, 
    78 but with a subset of the attributes. To select/unselect attributes, click 
    79 the corresponding rows in the table. This way, the widget can be used for 
    80 manual selection of attributes. Something similar can also be done with 
    81 a :ref:`Select Attributes` widget, except that the Rank widget can be used 
    82 for selecting the attributes according to their quality, while Select 
    83 Attributes offers more in terms of changing the order of attributes, 
    84 picking another class attribute and similar. 
     55Example: Feature Subset Selection for Machine Learning 
     56------------------------------------------------------ 
    8557 
    86 The widget can also be used to automatically select a feature subset. 
    87 If :obj:`Best ranked` is selected in box :obj:`Select Attributes`, the 
    88 widget will output a data set where examples are described by the 
    89 specified number of best ranked attributes. The data set is changed 
    90 whenever the order of attributes is changed for any reason (different 
    91 measure is selected for sorting, ReliefF or discretization settings are 
    92 changed...) 
     58The following example is a bit more complicated. In the workflow below we 
     59first split the data into a training and a test set. In the upper branch 
     60the training data passes through the Rank widget to select the most 
     61informative attributes, while in the lower branch there is no feature 
     62selection. The feature-selected and the original data sets are each passed to 
     63their own :ref:`Test Learners` widget, which develops a 
     64:ref:`Naive Bayes <Naive Bayes>` classifier and scores it on a test set. 
    9365 
    94 The first two options in :obj:`Select Attributes` box can be used to 
    95 clear the selection (:obj:`None`) or to select all attributes (:obj:`All`). 
     66.. image:: images/Rank-and-Test.png 
    9667 
    97 Button :obj:`Commit` sends the data set with the selected attributes. 
    98 If :obj:`Commit automatically` is set, the data set is committed on any change. 
    99  
    100  
    101 Examples 
    102 -------- 
    103  
    104 One typical use of the widget is to put it immediately after the :ref:`File` 
    105 widget to reduce the attribute set. The snapshot below shows this as a part of 
    106 a bit more complicated schema. 
    107  
    108 .. image:: images/Rank-after-file-Schema.png 
    109  
    110 The examples in the file are put through :ref:`Data Sampler`, which splits the 
    111 data set into two subsets: one, containing 70% of examples (signal 
    112 :obj:`Classified Examples`) will be used for training a 
    113 :ref:`Naive Bayes <Naive Bayes>` classifier, and the other 30% (signal 
    114 :obj:`Remaining Classified Examples`) for testing. Attribute subset selection 
    115 based on information gain was performed on the training set only, and five most 
    116 informative attributes were selected for learning. A data set with all other 
    117 attributes removed (signal :obj:`Reduced Example Table`) is fed into 
    118 :ref:`Test Learners`. The Test Learners widget also gets the 
    119 :obj:`Remaining Classified Examples` to use them as test examples (don't 
    120 forget to set :obj:`Test on Test Data` in that widget!). 
    121  
    122 To verify how the subset selection affects the classifier's performance, we 
    123 added another :ref:`Test Learners`, but connected it to the 
    124 :ref:`Data Sampler` so that the two subsets emitted by the latter are used 
    125 for training and testing without any feature subset selection. 
    126  
    127 Running this schema on the heart disease data set shows considerable 
    128 improvements in all respects on the reduced attribute subset. 
    129  
    130 In another, way simpler example, we connected a 
    131 :ref:`Classification Tree Viewer` to the Rank widget to observe different 
    132 attribute quality measures at different nodes. This can give us some picture 
    133 about how important is the selection of measure in tree construction: the more 
    134 the measures agree about attribute ranking, the less crucial is the measure 
    135 selection. 
    136  
    137 .. image:: images/Rank-Tree.png 
    138  
    139 A variation of the above is using the Rank widget after the 
    140 :ref:`Interactive Tree Builder`: the sorted attributes may help us in deciding 
    141 the attribute to use at a certain node. 
    142  
    143 .. image:: images/Rank-ITree.png 
     68For data sets with many features, feature selection as shown above would 
     69often improve the predictive accuracy of a naive Bayesian classifier. 
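
The ranking step that the documentation above describes (score every attribute against the class, then keep the best-scored ones) can be illustrated with a minimal sketch. This is not the widget's implementation and does not use the Orange API; it is plain Python computing information gain, one of the measures named in the earlier revision, on a tiny made-up data set. The function names and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(values, labels):
    """Class entropy minus the expected entropy after splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Return (attribute name, score) pairs sorted from best to worst."""
    scores = {name: info_gain(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two discrete attributes and a binary class.
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy": ["yes", "no", "yes", "no"],
}
labels = ["no", "no", "yes", "yes"]

ranking = rank_attributes(columns, labels)
# Keep only the single best-ranked attribute, analogous to a reduced data set.
best = [name for name, _ in ranking[:1]]
print(ranking, best)
```

Here "outlook" separates the classes perfectly and ranks first, while "windy" carries no information about the class. Continuous attributes would first have to be discretized, as the widget description notes.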