.. py:currentmodule:: Orange.feature.selection

#########################
Selection (``selection``)
#########################

.. index:: feature selection

.. index::
   single: feature; feature selection

The feature selection module contains several utility functions for selecting
features based on their scores, normally obtained in classification or
regression problems. A typical example is the function :obj:`select`, which
returns a subset of the highest-scored features:

.. literalinclude:: code/selection-best3.py
    :lines: 7-

The script outputs::

    Best 3 features:
    physician-fee-freeze
    el-salvador-aid
    synfuels-corporation-cutback

The module also includes a learner that incorporates feature subset
selection.


.. versionadded:: 2.7.1
   `select`, `select_above_threshold` and `select_relief` now preserve
   the domain's meta attributes and `class_vars`.
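
For instance, selecting from a data set that carries meta attributes should
keep them in the reduced domain. A minimal sketch, assuming the ``zoo`` data
set shipped with Orange (it has a ``name`` meta attribute) and the default
measure used by :obj:`Orange.feature.scoring.score_all`::

    import Orange

    zoo = Orange.data.Table("zoo")
    scores = Orange.feature.scoring.score_all(zoo)
    reduced = Orange.feature.selection.select(zoo, scores, 5)

    print [f.name for f in reduced.domain.features]  # the five best features
    print reduced.domain.class_var                   # class variable is kept
    print reduced.domain.getmetas()                  # meta attributes are kept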

--------------------------------------
Functions for feature subset selection
--------------------------------------

.. automethod:: Orange.feature.selection.top_rated

.. automethod:: Orange.feature.selection.above_threshold

.. automethod:: Orange.feature.selection.select

.. automethod:: Orange.feature.selection.select_above_threshold

.. automethod:: Orange.feature.selection.select_relief(data, measure=Orange.feature.scoring.Relief(k=20, m=10), margin=0)

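Most of these functions work on a list of (feature name, score) tuples, such
as the one returned by :obj:`Orange.feature.scoring.score_all`. A short
sketch of their use (the argument order follows the signatures listed above
and should be read as an approximation, not an authoritative reference)::

    import Orange

    voting = Orange.data.Table("voting")
    scores = Orange.feature.scoring.score_all(voting)

    # names of the three top-rated features
    print Orange.feature.selection.top_rated(scores, 3)

    # names of the features scoring above 0.1
    print Orange.feature.selection.above_threshold(scores, 0.1)

    # a new data table keeping only the features scoring above 0.1
    reduced = Orange.feature.selection.select_above_threshold(voting, scores, 0.1)
    print len(reduced.domain.features)
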
--------------------------------------
Learning with feature subset selection
--------------------------------------

.. autoclass:: Orange.feature.selection.FilteredLearner(base_learner, filter=FilterAboveThreshold(), name=filtered)
   :members:

.. autoclass:: Orange.feature.selection.FilteredClassifier
   :members:


--------------------------------------
Class wrappers for selection functions
--------------------------------------

.. autoclass:: Orange.feature.selection.FilterAboveThreshold(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), threshold=0.0)
   :members:

Below are a few examples of how this class can be used::

    >>> filter = Orange.feature.selection.FilterAboveThreshold(threshold=.15)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1, \
        measure=Orange.feature.scoring.Gini())

.. autoclass:: Orange.feature.selection.FilterBestN(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), n=5)
   :members:

.. autoclass:: Orange.feature.selection.FilterRelief(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), margin=0)
   :members:

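:class:`FilterBestN` and :class:`FilterRelief` can be used in the same two
ways, either pre-constructed and then applied to data, or called with the
data directly. A brief sketch along the same lines (the keyword arguments
follow the signatures listed above and are meant only as an approximation)::

    >>> filter = Orange.feature.selection.FilterBestN(n=3)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterRelief(data, margin=0.05)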


.. rubric:: Examples

The following script defines a new Naive Bayes classifier that
selects the five best features from the data set before learning.
The new classifier is wrapped up in a special class (see the
:doc:`/tutorial/rst/python-learners` lesson in
:doc:`/tutorial/rst/index`). The script compares this filtered learner with
one that uses the complete set of features.

:download:`selection-bayes.py<code/selection-bayes.py>`

.. literalinclude:: code/selection-bayes.py
    :lines: 7-

Interestingly, and somewhat expectedly, feature subset selection
helps. This is the output that we get::

    Learner      CA
    Naive Bayes  0.903
    with FSS     0.940

We can do all of the above by wrapping the learner using
:class:`~Orange.feature.selection.FilteredLearner`, thus
creating an object that is assembled from a data filter and a base learner.
When given a data table, this learner uses the attribute filter to construct
a new data set and the base learner to construct a corresponding
classifier. Attribute filters should be of a type like
:class:`~Orange.feature.selection.FilterAboveThreshold` or
:class:`~Orange.feature.selection.FilterBestN` that can be initialized with
arguments and later presented with data, returning a new, reduced data
set.

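For illustration, a filtered learner that keeps only the single best feature
could be assembled roughly as follows (a sketch; the constructor arguments
follow the class signatures documented above)::

    nb = Orange.classification.bayes.NaiveLearner()
    nb.name = "bayes"
    fl = Orange.feature.selection.FilteredLearner(nb,
        filter=Orange.feature.selection.FilterBestN(n=1), name="filtered")
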
The following code fragment replaces the bulk of the code
from the previous example and compares the naive Bayesian classifier to the
same classifier when only the single most important attribute is
used.

:download:`selection-filtered-learner.py<code/selection-filtered-learner.py>`

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 13-16

Now, let's decide to retain three features and observe how many times
each attribute was used. Remember, 10-fold cross validation constructs
ten instances of each classifier, and each time we run
:class:`~.FilteredLearner` a different set of features may be
selected. ``Orange.evaluation.testing.cross_validation`` stores the
classifiers in the ``results`` variable, and :class:`~.FilteredLearner`
returns a classifier that can tell which features it used, so the code
to do all this is quite short.

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 25-

Running :download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`
with three features selected each time a learner is run gives the
following result::

    Learner      CA
    bayes        0.903
    filtered     0.956

    Number of times features were used in cross-validation:
     3 x el-salvador-aid
     6 x synfuels-corporation-cutback
     7 x adoption-of-the-budget-resolution
    10 x physician-fee-freeze
     4 x crime


==========
References
==========

* K. Kira and L. Rendell. A practical approach to feature selection. In
  D. Sleeman and P. Edwards, editors, Proc. 9th Int'l Conf. on Machine
  Learning, pages 249-256, Aberdeen, 1992. Morgan Kaufmann Publishers.

* I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF.
  In F. Bergadano and L. De Raedt, editors, Proc. European Conf. on Machine
  Learning (ECML-94), pages 171-182. Springer-Verlag, 1994.

* R. Kohavi and G. John. Wrappers for feature subset selection. Artificial
  Intelligence, 97(1-2), pages 273-324, 1997.