source: orange/docs/reference/rst/Orange.feature.selection.rst @ 10708:43138a3b5624

.. py:currentmodule:: Orange.feature.selection

#########################
Selection (``selection``)
#########################

.. index:: feature selection

.. index::
   single: feature; feature selection

The feature selection module contains several utility functions for selecting features based on their
scores, normally obtained in classification or regression problems. A typical example is the function
:obj:`select`, which returns a subset of the highest-scored features:

.. literalinclude:: code/selection-best3.py
    :lines: 7-

The script outputs::

    Best 3 features:
    physician-fee-freeze
    el-salvador-aid
    synfuels-corporation-cutback

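The complete script is included above; as a rough sketch of the same idea
(assuming, as in the script, that :obj:`Orange.feature.scoring.score_all`
returns a list of (feature name, score) pairs, scored with Relief by default,
and that :obj:`top_rated` takes that list and the number of features to keep),
the best features can also be listed interactively::

    import Orange

    voting = Orange.data.Table("voting")
    # score all features; Relief is assumed to be the default measure
    scores = Orange.feature.scoring.score_all(voting)
    # names of the three highest-scored features
    for name in Orange.feature.selection.top_rated(scores, 3):
        print(name)
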
The module also includes a learner that incorporates feature subset
selection.

--------------------------------------
Functions for feature subset selection
--------------------------------------

.. automethod:: Orange.feature.selection.top_rated

.. automethod:: Orange.feature.selection.above_threshold

.. automethod:: Orange.feature.selection.select

.. automethod:: Orange.feature.selection.select_above_threshold

.. automethod:: Orange.feature.selection.select_relief(data, measure=Orange.feature.scoring.Relief(k=20, m=10), margin=0)

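A minimal sketch of how these functions might be combined (assuming, as above,
that the scores come from :obj:`Orange.feature.scoring.score_all` and that
:obj:`select` accepts the data, the score list and the number of features to
retain; check the signatures above for the exact arguments)::

    import Orange

    voting = Orange.data.Table("voting")
    scores = Orange.feature.scoring.score_all(voting)

    # data set reduced to the five highest-scored features (plus the class)
    best5 = Orange.feature.selection.select(voting, scores, 5)

    # data set with only the features whose Relief score exceeds the margin
    pruned = Orange.feature.selection.select_relief(voting,
        measure=Orange.feature.scoring.Relief(k=20, m=10), margin=0.05)
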
--------------------------------------
Learning with feature subset selection
--------------------------------------

.. autoclass:: Orange.feature.selection.FilteredLearner(base_learner, filter=FilterAboveThreshold(), name=filtered)
   :members:

.. autoclass:: Orange.feature.selection.FilteredClassifier
   :members:

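A short sketch of how a filtered learner could be put together from the pieces
above (using the signature shown for :obj:`FilteredLearner`; the naive Bayes
learner and the filter settings are just illustrative choices)::

    import Orange

    voting = Orange.data.Table("voting")

    # wrap a base learner so that learning happens on a reduced feature set
    nb = Orange.classification.bayes.NaiveLearner()
    learner = Orange.feature.selection.FilteredLearner(nb,
        filter=Orange.feature.selection.FilterBestN(n=5), name="filtered")

    classifier = learner(voting)   # filters the data, then trains naive Bayes
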

--------------------------------------
Class wrappers for selection functions
--------------------------------------

.. autoclass:: Orange.feature.selection.FilterAboveThreshold(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), threshold=0.0)
   :members:

Below are a few examples of how this class can be used::

    >>> filter = Orange.feature.selection.FilterAboveThreshold(threshold=.15)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1, \
        measure=Orange.feature.scoring.Gini())

.. autoclass:: Orange.feature.selection.FilterBestN(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), n=5)
   :members:

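Analogously to :obj:`FilterAboveThreshold`, the filter can either be applied to
data directly or constructed first and called later (a sketch based on the
signature above; ``data`` stands for any classification data table)::

    >>> filter = Orange.feature.selection.FilterBestN(n=3)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterBestN(data, n=3)
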
.. autoclass:: Orange.feature.selection.FilterRelief(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), margin=0)
   :members:


.. rubric:: Examples

The following script defines a new naive Bayes classifier that
selects the five best features from the data set before learning.
The new classifier is wrapped up in a special class (see the
`Building your own learner <../ofb/c_pythonlearner.htm>`_ lesson in
`Orange for Beginners <../ofb/default.htm>`_). The script compares this
filtered learner with one that uses the complete set of features.

:download:`selection-bayes.py <code/selection-bayes.py>`

.. literalinclude:: code/selection-bayes.py
    :lines: 7-

Interestingly, and somewhat expectedly, feature subset selection
helps. This is the output that we get::

    Learner      CA
    Naive Bayes  0.903
    with FSS     0.940

We can do all of the above by wrapping the learner using
:obj:`FilteredLearner`, thus creating an object that is assembled
from a data filter and a base learner. When given a data table, this
learner uses the attribute filter to construct a new data set and the
base learner to construct a corresponding classifier. The attribute
filter should be an object like :obj:`FilterAboveThreshold` or
:obj:`FilterBestN` that can be initialized with its arguments and later
presented with data, returning a new, reduced data set.

The following code fragment replaces the bulk of the code from the
previous example, and compares the naive Bayesian classifier to the
same classifier when only the single most important attribute is
used.

:download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 13-16

Now, let's decide to retain three features (change the code in
:download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`
accordingly!) and observe how many times each attribute was used.
Remember, 10-fold cross-validation constructs ten instances of each
classifier, and each time we run :obj:`FilteredLearner` a different
set of features may be selected. ``orngEval.CrossValidation`` stores
the classifiers in the ``results`` variable, and :obj:`FilteredLearner`
returns a classifier that can tell which features it used (how
convenient!), so the code to do all this is quite short.

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 25-

Running :download:`selection-filtered-learner.py <code/selection-filtered-learner.py>` with three features selected each
time a learner is run gives the following result::

    Learner      CA
    bayes        0.903
    filtered     0.956

    Number of times features were used in cross-validation:
     3 x el-salvador-aid
     6 x synfuels-corporation-cutback
     7 x adoption-of-the-budget-resolution
    10 x physician-fee-freeze
     4 x crime

Experiment yourself: if only one attribute is retained for the
classifier, which attribute is selected most frequently over the ten
cross-validation folds?

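A sketch of the change needed for this experiment, where ``nb`` is the base
naive Bayes learner from the fragment above (keep a single feature by setting
``n=1`` in the wrapped learner's filter)::

    learner = Orange.feature.selection.FilteredLearner(nb,
        filter=Orange.feature.selection.FilterBestN(n=1), name="filtered")
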
==========
References
==========

* K. Kira and L. Rendell. A practical approach to feature selection. In
  D. Sleeman and P. Edwards, editors, Proc. 9th Int'l Conf. on Machine
  Learning, pages 249-256, Aberdeen, 1992. Morgan Kaufmann Publishers.

* I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF.
  In F. Bergadano and L. De Raedt, editors, Proc. European Conf. on Machine
  Learning (ECML-94), pages 171-182. Springer-Verlag, 1994.

* R. Kohavi, G. John: Wrappers for Feature Subset Selection, Artificial
  Intelligence, 97 (1-2), pages 273-324, 1997.