source: orange/docs/reference/rst/Orange.feature.selection.rst @ 10172:2ab492979b00

.. py:currentmodule:: Orange.feature.selection

#########################
Selection (``selection``)
#########################

.. index:: feature selection

.. index::
   single: feature; feature selection

The feature selection module provides several functions for selecting features based on their scores. A typical example is the function :obj:`select_best_n`, which returns the best n features:

    .. literalinclude:: code/selection-best3.py
        :lines: 7-

    The script outputs::

        Best 3 features:
        physician-fee-freeze
        el-salvador-aid
        synfuels-corporation-cutback
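
    A minimal sketch of the idiom the included script follows is shown
    below. The ``voting`` data set and the use of
    :obj:`Orange.feature.scoring.score_all` to produce the (feature, score)
    list are assumptions made for illustration; consult
    ``selection-best3.py`` for the authoritative code. ::

        import Orange

        # Load a data set and score every feature (Relief, as in the class
        # signatures below, is assumed as the measure).
        data = Orange.data.Table("voting")
        scores = Orange.feature.scoring.score_all(
            data, Orange.feature.scoring.Relief(k=20, m=50))

        # Keep only the three best-scored features (plus the class).
        reduced = Orange.feature.selection.select_best_n(data, scores, 3)
        print("Best 3 features:")
        for feature in reduced.domain.attributes:
            print(feature.name)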

The module also includes a learner that incorporates feature subset
selection.

--------------------------------------
Functions for feature subset selection
--------------------------------------

.. automethod:: Orange.feature.selection.best_n

.. automethod:: Orange.feature.selection.above_threshold

.. automethod:: Orange.feature.selection.select_best_n

.. automethod:: Orange.feature.selection.select_above_threshold

.. automethod:: Orange.feature.selection.select_relief(data, measure=Orange.feature.scoring.Relief(k=20, m=10), margin=0)

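The first two functions operate on a list of (feature, score) tuples, such
as the one produced by :obj:`Orange.feature.scoring.score_all`, and return
only the selected features, while the ``select_*`` variants also take the
data and return a new, reduced data table. The following sketch illustrates
the difference; the ``voting`` data set and the threshold value are
assumptions made for illustration only. ::

    import Orange

    data = Orange.data.Table("voting")
    scores = Orange.feature.scoring.score_all(data)

    # best_n / above_threshold inspect the score list only.
    top_features = Orange.feature.selection.best_n(scores, 3)
    print(top_features)

    # select_best_n also takes the data and returns a new data table
    # restricted to the chosen features (the threshold 0.01 is arbitrary).
    top_data = Orange.feature.selection.select_best_n(data, scores, 3)
    good_data = Orange.feature.selection.select_above_threshold(data, scores, 0.01)
    print(len(top_data.domain.attributes))
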
--------------------------------------
Learning with feature subset selection
--------------------------------------

.. autoclass:: Orange.feature.selection.FilteredLearner(base_learner, filter=FilterAboveThreshold(), name=filtered)
   :members:

.. autoclass:: Orange.feature.selection.FilteredClassifier
   :members:


--------------------------------------
Class wrappers for selection functions
--------------------------------------

.. autoclass:: Orange.feature.selection.FilterAboveThreshold(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), threshold=0.0)
   :members:

.. autoclass:: Orange.feature.selection.FilterBestN(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), n=5)
   :members:

.. autoclass:: Orange.feature.selection.FilterRelief(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), margin=0)
   :members:
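
These wrappers can be constructed with their parameters first and applied to
data later, which is what makes them convenient as filters for
:obj:`FilteredLearner`. A minimal sketch, with the ``voting`` data set
assumed for illustration::

    import Orange

    data = Orange.data.Table("voting")

    # Construct the filter with its parameters only; no data yet.
    keep_three = Orange.feature.selection.FilterBestN(n=3)

    # Applying the filter to data returns a new, reduced data table.
    reduced = keep_three(data)
    print([feature.name for feature in reduced.domain.attributes])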



.. rubric:: Examples

The following script defines a new naive Bayes classifier that selects the
five best features from the data set before learning. The new classifier is
wrapped in a special class (see the `Building your own learner
<../ofb/c_pythonlearner.htm>`_ lesson in `Orange for Beginners
<../ofb/default.htm>`_). The script compares this filtered learner with one
that uses the complete set of features.

:download:`selection-bayes.py <code/selection-bayes.py>`

.. literalinclude:: code/selection-bayes.py
    :lines: 7-

Interestingly, and somewhat expectedly, feature subset selection
helps. This is the output we get::

    Learner      CA
    Naive Bayes  0.903
    with FSS     0.940

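The essential idea of such a wrapped learner, stripped down to a few lines,
is sketched below. It uses a plain function rather than the learner/classifier
class pair of the actual script, and the function and variable names are
invented for illustration; see :download:`selection-bayes.py <code/selection-bayes.py>`
for the authoritative version. ::

    import Orange

    def bayes_with_fss(data, n=5):
        # Score all features, keep the n best, and train naive Bayes
        # on the reduced data.
        scores = Orange.feature.scoring.score_all(data)
        reduced = Orange.feature.selection.select_best_n(data, scores, n)
        return Orange.classification.bayes.NaiveLearner(reduced)

    data = Orange.data.Table("voting")
    classifier = bayes_with_fss(data)
    print(classifier(data[0]))
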
We can do all of the above by wrapping the learner in a
:obj:`FilteredLearner`, thus creating an object that is assembled from a
data filter and a base learner. When given a data table, this learner uses
the filter to construct a new data set and the base learner to construct a
corresponding classifier. Attribute filters should be objects like
:obj:`FilterAboveThreshold` or :obj:`FilterBestN` that are initialized with
their arguments and later applied to data, returning a new, reduced data
set.
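
For instance, a filtered learner might be assembled and used like this; the
``voting`` data set and the naive Bayes base learner are assumptions for
illustration, and the actual experiment is in the script included below. ::

    import Orange

    data = Orange.data.Table("voting")

    # Assemble a learner from a base learner and an attribute filter.
    base = Orange.classification.bayes.NaiveLearner()
    fss_learner = Orange.feature.selection.FilteredLearner(
        base, filter=Orange.feature.selection.FilterBestN(n=1), name="filtered")

    # Calling the learner filters the data first, then trains the base learner.
    classifier = fss_learner(data)
    print(classifier(data[0]))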

The following code fragment replaces the bulk of the code from the previous
example and compares the naive Bayesian classifier to the same classifier
when only the single most important attribute is used.

:download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 13-16

Now, let us retain three features (change the code in
:download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`
accordingly) and observe how many times each attribute was used. Remember,
10-fold cross-validation constructs ten instances of each classifier, and
each time we run :obj:`FilteredLearner` a different set of features may be
selected. ``orngEval.CrossValidation`` stores the classifiers in the
``results`` variable, and :obj:`FilteredLearner` returns a classifier that
can tell which features it used (how convenient!), so the code to do all
this is quite short.

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 25-
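
The key mechanism is that the classifier returned by :obj:`FilteredLearner`
is trained on the reduced domain, so the selected features can be read off
it. A minimal sketch outside of cross-validation follows; reading the
features from the classifier's domain is an assumption about the accessor,
and the actual per-fold counting is done in the fragment included above. ::

    import Orange

    data = Orange.data.Table("voting")
    fss_learner = Orange.feature.selection.FilteredLearner(
        Orange.classification.bayes.NaiveLearner(),
        filter=Orange.feature.selection.FilterBestN(n=3), name="filtered")
    classifier = fss_learner(data)

    # The features the filter kept for this particular training set
    # (assumes the filtered classifier exposes its reduced domain).
    print([feature.name for feature in classifier.domain.attributes])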

Running :download:`selection-filtered-learner.py <code/selection-filtered-learner.py>` with three features selected each
time a learner is run gives the following result::

    Learner      CA
    bayes        0.903
    filtered     0.956

    Number of times features were used in cross-validation:
     3 x el-salvador-aid
     6 x synfuels-corporation-cutback
     7 x adoption-of-the-budget-resolution
    10 x physician-fee-freeze
     4 x crime

Experiment for yourself: if only one attribute is retained for the
classifier, which attribute is selected most frequently over the ten
cross-validation folds?

==========
References
==========

* K. Kira and L. Rendell. A practical approach to feature selection. In
  D. Sleeman and P. Edwards, editors, Proc. 9th Int'l Conf. on Machine
  Learning, pages 249-256, Aberdeen, 1992. Morgan Kaufmann Publishers.

* I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF.
  In F. Bergadano and L. De Raedt, editors, Proc. European Conf. on Machine
  Learning (ECML-94), pages 171-182. Springer-Verlag, 1994.

* R. Kohavi and G. John. Wrappers for feature subset selection. Artificial
  Intelligence, 97 (1-2), pages 273-324, 1997.