Changeset 11795:7d7ee77fd99b in orange
- 12/06/13 06:57:17 (3 months ago)
- 5 files added
- 3 files deleted
- 2 files edited
Changes from r11778 to r11795 (``-`` removed, ``+`` added)::

    @@ -6,5 +6,5 @@
     .. image:: ../../../../Orange/OrangeWidgets/Data/icons/Rank.svg

    -A widget for ranking the attributes and selecting attribute subsets.
    +sets.

     Signals
    @@ -13,131 +13,57 @@
     Inputs:

    -
    -- Examples (ExampleTable)
    +- Data
         Input data set.
    -

     Outputs:

    -
    -- Reduced Example Table (ExampleTable)
    -    Data set which include described by selected attributes.
    -
    -- ExampleTable Attributes (ExampleTable)
    -    Data set in where each example corresponds to an attribute from the
    -    original set, and the attributes correspond one of the selected
    -    attribute evaluation measures.
    -
    +- Reduced Data
    +    Data set which selected attributes.

     Description
     -----------

    -This widget computes a set of measures for evaluating the quality/usefulness
    -of attributes: ReliefF, information gain, gain ratio and gini index.
    -Besides providing this information, it also allows user to select a subset
    -of attributes or it can automatically select the specified number of
    -best-ranked attributes.
    +Rank widget considers class-labeled data sets (classification or regression)
    +and scores the attributes according to their correlation with the
    +class.

    -.. image:: images/Rank .png
    +.. image:: images/Rank.png

    -The right-hand side of the widget presents the computed quality of the
    -attributes. The first line shows the attribute name and the second the
    -number of its values (or a "C", if the attribute is continuous. Remaining
    -columns show different measures of quality.
    +1. Attributes (rows) and their scores by different scoring methods
    +   (columns).
    +#. Scoring techniques and their (optional) parameters.
    +#. For scoring techniques that require discrete attributes this is the number
    +   of intervals to which continues attributes will be discretized to.
    +#. Number of decimals used in reporting the score.
    +#. Toggles the bar-based visualisation of the feature scores.
    +#. Adds a score table to the current report.

    -The user is able to select the measures (s)he wants computed and presented.
    -:obj:`ReliefF` requires setting two arguments: the number of :obj:`Neighbours`
    -taken into account and the number of randomly chosen reference :obj:`Examples`.
    -The former should be higher if there is a lot of noise; the latter generally
    -makes the computation less reliable if set too low, while higher values
    -make it slow.
    +Example: Attribute Ranking and Selection
    +----------------------------------------

    -The order in which the attributes are presented can be set either in the
    -list below the measures or by clicking the table's column headers. Attributes
    -can also be sorted by a measure not printed in the table.
    +Below we have used immediately after the :ref:`File`
    +widget to reduce the set of data attribute and include only the most
    +informative one:

    -Measures that cannot handle continuous attributes (impurity
    -measures - information gain, gain ratio and gini index) are run on
    -discretized attributes. For sake of simplicity we always split the
    -continuous attributes in intervals with (approximately) equal number of
    -examples, but the user can set the number of :obj:`Intervals`.
    +.. image:: images/Rank-Select-Schema.png

    -It is also possible to set the number of decimals
    -(:obj:`No. of decimals`) in the print out. Using a number to high may
    -exaggerate the accuracy of the computation; many decimals may only be
    -useful when the computed numbers are really small.
    +Notice how the widget outputs a data set that includes only the best-scored
    +attributes:

    -The widget outputs two example tables. The one, whose corresponding signal
    -is named :obj:`ExampleTable Attributes` looks pretty much like the one
    -shown in the Rank widget, except that the second column is split into two
    -columns, one giving the attribute type (D for discrete and C for continuous),
    -and the other giving the number of distinct values if the attribute is
    -discrete and undefined if it's continuous.
    +.. image:: images/Rank-Select-Widgets.png

    -The second, more interesting table has the same examples as the original,
    -but with a subset of the attributes. To select/unselect attributes, click
    -the corresponding rows in the table. This way, the widget can be used for
    -manual selection of attributes. Something similar can also be done with
    -a :ref:`Select Attributes` widget, except that the Rank widget can be used
    -for selecting the attributes according to their quality, while Select
    -Attributes offers more in terms of changing the order of attributes,
    -picking another class attribute and similar.
    +Example: Feature Subset Selection for Machine Learning
    +------------------------------------------------------

    -The widget can also be used to automatically select a feature subset.
    -If :obj:`Best ranked` is selected in box :obj:`Select Attributes`, the
    -widget will output a data set where examples are described by the
    -specified number of best ranked attributes. The data set is changed
    -whenever the order of attributes is changed for any reason (different
    -measure is selected for sorting, ReliefF or discretization settings are
    -changed...)
    +Following is a bit more complicated example. In the workflow below we
    +first split the data into training and test set. In the upper branch
    +the training data passes through the Rank widget to select the most
    +informative attributes, while in the lower branch there is no feature
    +selection. Both feature selected and original data sets are passed to
    +its own :ref:`Test Learners` widget, which develops a
    +:ref:`Naive Bayes <Naive Bayes>` classifier and scores it on a test set.

    -The first two options in :obj:`Select Attributes` box can be used to
    -clear the selection (:obj:`None`) or to select all attributes (:obj:`All`).
    +.. image:: images/Rank-and-Test.png

    -Button :obj:`Commit` sends the data set with the selected attributes.
    -If :obj:`Commit automatically` is set, the data set is committed on any change.
    -
    -
    -Examples
    ---------
    -
    -On typical use of the widget is to put it immediately after the :ref:`File`
    -widget to reduce the attribute set. The snapshot below shows this as a part of
    -a bit more complicated schema.
    -
    -.. image:: images/Rank-after-file-Schema.png
    -
    -The examples in the file are put through ref:`Data Sampler` which split the
    -data set into two subsets: one, containing 70% of examples (signal
    -:obj:`Classified Examples`) will be used for training a
    -:ref:`Naive Bayes <Naive Bayes>` classifier, and the other 30% (signal
    -:obj:`Remaining Classified Examples`) for testing. Attribute subset selection
    -based on information gain was performed on the training set only, and five most
    -informative attributes were selected for learning. A data set with all other
    -attributes removed (signal :obj:`Reduced Example Table`) is fed into
    -:ref:`Test Learners`. Test Learners widgets also gets the
    -:obj:`Remaining Classified Examples` to use them as test examples (don't
    -forget to set :obj:`Test on Test Data` in that widget!).
    -
    -To verify how the subset selection affects the classifier's performance, we
    -added another :ref:`Test Learners`, but connected it to the
    -:ref:`Data Sampler` so that the two subsets emitted by the latter are used
    -for training and testing without any feature subset selection.
    -
    -Running this schema on the heart disease data set shows quite a considerable
    -improvements in all respects on the reduced attribute subset.
    -
    -In another, way simpler example, we connected a
    -:ref:`Classification Tree Viewer` to the Rank widget to observe different
    -attribute quality measures at different nodes. This can give us some picture
    -about how important is the selection of measure in tree construction: the more
    -the measures agree about attribute ranking, the less crucial is the measure
    -selection.
    -
    -.. image:: images/Rank-Tree.png
    -
    -A variation of the above is using the Rank widget after the
    -:ref:`Interactive Tree Builder`: the sorted attributes may help us in deciding
    -the attribute to use at a certain node.
    -
    -.. image:: images/Rank-ITree.png
    +For data sets with many features and naive Bayesian classifier feature
    +selection, as shown above, would often yield a better predictive accuracy.
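The removed description says that the impurity measures (information gain, gain ratio and gini index) are run on continuous attributes split into intervals with an (approximately) equal number of examples, and that the widget can keep a given number of best-ranked attributes. The following is a minimal plain-Python sketch of those ideas, assuming a discrete class and attributes given as lists of values. It is not Orange's implementation: all function names (``equal_frequency_cuts``, ``discretize``, ``info_gain``, ``gain_ratio``, ``gini_gain``, ``best_ranked``) are hypothetical, and reading "gini index" as the reduction of Gini impurity is an assumption.

```python
import bisect
import math
from collections import Counter


def equal_frequency_cuts(values, n_intervals):
    """Cut points splitting continuous values into n_intervals intervals
    with (approximately) equal numbers of examples."""
    s = sorted(values)
    n = len(s)
    cuts = []
    for k in range(1, n_intervals):
        c = s[k * n // n_intervals]
        # skip cut points that would create an empty or duplicate interval
        if c > s[0] and (not cuts or c > cuts[-1]):
            cuts.append(c)
    return cuts


def discretize(values, cuts):
    """Index of the interval of each value (a value equal to a cut goes up)."""
    return [bisect.bisect_right(cuts, v) for v in values]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _groups(attr, labels):
    """Class labels grouped by the value of a (discrete) attribute."""
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    return list(groups.values())


def info_gain(attr, labels):
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in _groups(attr, labels))


def gain_ratio(attr, labels):
    split_info = entropy(attr)  # entropy of the attribute's own value distribution
    return info_gain(attr, labels) / split_info if split_info > 0 else 0.0


def gini_gain(attr, labels):
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in _groups(attr, labels))


def best_ranked(columns, labels, score, k):
    """Names of the k highest-scoring attributes (ties keep input order)."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    temperature = [30.0, 28.0, 12.0, 10.0]
    columns = {
        "outlook": ["sunny", "sunny", "rain", "rain"],
        "windy": ["t", "f", "t", "f"],
        # continuous attribute, discretized into 2 intervals before scoring
        "temp": discretize(temperature, equal_frequency_cuts(temperature, 2)),
    }
    for name, col in columns.items():
        print(f"{name:8s} IG={info_gain(col, labels):.3f} "
              f"GR={gain_ratio(col, labels):.3f} Gini={gini_gain(col, labels):.3f}")
    print(best_ranked(columns, labels, info_gain, 2))  # ['outlook', 'temp']
```

Sorting by another measure only means passing a different ``score`` function, and the number-of-decimals setting corresponds to formatting such as ``:.3f``. In the train/test workflow of the new example, the ranking would be computed on the training split only, and the same attribute names then kept in the test split.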
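The removed text also describes the two ReliefF parameters: the number of neighbours taken into account and the number of randomly chosen reference examples. Below is a minimal sketch of the usual ReliefF weight update for numeric attributes and a discrete class, written to show what those two parameters control; it is not Orange's code. The function name ``relieff`` and its parameters ``n_neighbours``, ``n_reference`` and ``seed`` are hypothetical, and scaling each attribute difference by the attribute's range, sampling reference examples without replacement, and averaging over the neighbours actually found are assumptions of this sketch.

```python
import random
from collections import Counter


def relieff(X, y, n_neighbours=5, n_reference=100, seed=0):
    """Sketch of ReliefF weights for numeric attributes and a discrete class.

    X: list of rows (lists of numbers); y: class label of each row.
    Returns one weight per attribute; higher means more relevant.
    """
    n, n_attr = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_attr)]
    hi = [max(row[a] for row in X) for a in range(n_attr)]

    def diff(a, r1, r2):
        # difference on attribute a, scaled to [0, 1] by the attribute's range
        span = hi[a] - lo[a]
        return abs(r1[a] - r2[a]) / span if span else 0.0

    def distance(r1, r2):
        return sum(diff(a, r1, r2) for a in range(n_attr))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = min(n_reference, n)
    rng = random.Random(seed)
    weights = [0.0] * n_attr

    # m randomly chosen reference examples (without replacement in this sketch)
    for i in rng.sample(range(n), m):
        ref, ref_class = X[i], y[i]
        for c in prior:
            # the n_neighbours nearest examples of class c, excluding the reference
            near = sorted((j for j in range(n) if j != i and y[j] == c),
                          key=lambda j: distance(ref, X[j]))[:n_neighbours]
            if not near:
                continue
            # nearest hits lower the weight; nearest misses raise it,
            # weighted by the prior of the miss class
            factor = -1.0 if c == ref_class else prior[c] / (1.0 - prior[ref_class])
            for a in range(n_attr):
                avg_diff = sum(diff(a, ref, X[j]) for j in near) / len(near)
                weights[a] += factor * avg_diff / m
    return weights


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.5], [0.8, 0.0], [0.9, 1.0], [1.0, 0.5]]
    y = ["a", "a", "a", "b", "b", "b"]
    # attribute 0 separates the classes, attribute 1 does not
    print(relieff(X, y, n_neighbours=2, n_reference=6))
```

Raising ``n_neighbours`` averages each update over more hits and misses, which matches the old advice to use more neighbours on noisy data; raising ``n_reference`` makes the estimate more stable at the cost of more distance computations.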