.. py:currentmodule:: Orange.feature.selection

#########################
Selection (``selection``)
#########################

.. index:: feature selection

.. index::
   single: feature; feature selection

The feature selection module contains several utility functions for selecting
features based on their scores, which are normally obtained in classification or
regression problems. A typical example is the function :obj:`select`, which
returns a subset of the highest-scored features:

.. literalinclude:: code/selection-best3.py
    :lines: 7-

The script outputs::

    Best 3 features:
    physician-fee-freeze
    el-salvador-aid
    synfuels-corporation-cutback

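A rough sketch of what such a script typically does is shown below. It is not
the script included above, only an illustration of the pattern, assuming the
``voting`` data set shipped with Orange and
:obj:`Orange.feature.scoring.score_all` for computing the scores::

    import Orange

    voting = Orange.data.Table("voting")

    # Score every feature with ReliefF and keep the three highest-scored names.
    scores = Orange.feature.scoring.score_all(
        voting, Orange.feature.scoring.Relief(k=20, m=50))
    best = Orange.feature.selection.top_rated(scores, 3)

    print("Best 3 features:")
    for name in best:
        print(name)
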
The module also includes a learner that incorporates feature subset
selection.

--------------------------------------
Functions for feature subset selection
--------------------------------------

.. automethod:: Orange.feature.selection.top_rated

.. automethod:: Orange.feature.selection.above_threshold

.. automethod:: Orange.feature.selection.select

.. automethod:: Orange.feature.selection.select_above_threshold

.. automethod:: Orange.feature.selection.select_relief(data, measure=Orange.feature.scoring.Relief(k=20, m=10), margin=0)

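As a rough illustration of how these functions fit together (a sketch, assuming
the ``voting`` data set and :obj:`Orange.feature.scoring.score_all` for the
scores; the threshold value is arbitrary)::

    import Orange

    voting = Orange.data.Table("voting")
    scores = Orange.feature.scoring.score_all(
        voting, Orange.feature.scoring.Relief(k=20, m=50))

    # New data table with only the five highest-scored features (plus the class).
    best5 = Orange.feature.selection.select(voting, scores, 5)

    # New data table with all features whose score exceeds 0.05.
    above = Orange.feature.selection.select_above_threshold(voting, scores, 0.05)

    print("Features kept by select: %d" % len(best5.domain.attributes))
    print("Features kept by select_above_threshold: %d" % len(above.domain.attributes))
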
--------------------------------------
Learning with feature subset selection
--------------------------------------

.. autoclass:: Orange.feature.selection.FilteredLearner(base_learner, filter=FilterAboveThreshold(), name=filtered)
   :members:

.. autoclass:: Orange.feature.selection.FilteredClassifier
   :members:

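A rough usage sketch (assuming the ``voting`` data set; the filter shown here is
the :class:`FilterAboveThreshold` wrapper documented below, with an arbitrary
threshold)::

    import Orange

    voting = Orange.data.Table("voting")

    nb = Orange.classification.bayes.NaiveLearner()
    # Keep only features scoring above the threshold, then train naive Bayes.
    flearner = Orange.feature.selection.FilteredLearner(
        nb,
        filter=Orange.feature.selection.FilterAboveThreshold(threshold=0.01),
        name="filtered")
    classifier = flearner(voting)
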

--------------------------------------
Class wrappers for selection functions
--------------------------------------

.. autoclass:: Orange.feature.selection.FilterAboveThreshold(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), threshold=0.0)
   :members:

Below are a few examples of how this class can be used::

    >>> filter = Orange.feature.selection.FilterAboveThreshold(threshold=.15)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1)
    >>> new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1,
    ...     measure=Orange.feature.scoring.Gini())

.. autoclass:: Orange.feature.selection.FilterBestN(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), n=5)
   :members:

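Its use mirrors that of :class:`FilterAboveThreshold` shown above (a sketch)::

    >>> filter = Orange.feature.selection.FilterBestN(n=3)
    >>> new_data = filter(data)
    >>> new_data = Orange.feature.selection.FilterBestN(data, n=3)
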
.. autoclass:: Orange.feature.selection.FilterRelief(data=None, measure=Orange.feature.scoring.Relief(k=20, m=50), margin=0)
   :members:

.. rubric:: Examples

The following script defines a new naive Bayes classifier that selects the five
best features from the data set before learning. The new classifier is wrapped
in a special class (see the :doc:`/tutorial/rst/python-learners` lesson in
:doc:`/tutorial/rst/index`). The script compares this filtered learner with one
that uses the complete set of features.

:download:`selection-bayes.py <code/selection-bayes.py>`

.. literalinclude:: code/selection-bayes.py
    :lines: 7-

Interestingly, and somewhat expectedly, feature subset selection helps. This is
the output that we get::

    Learner      CA
    Naive Bayes  0.903
    with FSS     0.940

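The gist of such a wrapper might look roughly like this. This is a simplified
sketch, not the actual :download:`selection-bayes.py <code/selection-bayes.py>`
script; the class name and its parameters are illustrative only, and it leans
on :obj:`Orange.feature.scoring.score_all` and :obj:`select` for the actual
selection::

    import Orange

    class NaiveBayesFSS(object):
        """Naive Bayes preceded by selection of the n best-scored features."""

        def __init__(self, name="Naive Bayes with FSS", n=5):
            self.name = name
            self.n = n

        def __call__(self, data, weight=None):
            # Score all features, keep the n best, train on the reduced data.
            scores = Orange.feature.scoring.score_all(
                data, Orange.feature.scoring.Relief(k=20, m=50))
            reduced = Orange.feature.selection.select(data, scores, self.n)
            return Orange.classification.bayes.NaiveLearner(reduced)
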
We can do all of the above by wrapping the learner in
:class:`~Orange.feature.selection.FilteredLearner`, thus creating an object
that is assembled from a data filter and a base learner. When given a data
table, this learner uses the filter to construct a new data set and the base
learner to construct a corresponding classifier. The filters should be classes
such as :class:`~Orange.feature.selection.FilterAboveThreshold` or
:class:`~Orange.feature.selection.FilterBestN` that can be initialized with
arguments and later presented with data, returning a new, reduced data set.

The following code fragment replaces the bulk of the code from the previous
example, and compares the naive Bayesian classifier to the same classifier when
only the single most important attribute is used.

:download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 13-16

Now, let's decide to retain three features and observe how many times each
attribute was used. Remember, 10-fold cross-validation constructs ten instances
of each classifier, and each time we run :class:`~.FilteredLearner` a different
set of features may be selected. ``Orange.evaluation.testing.cross_validation``
stores the classifiers in the ``results`` variable, and :class:`~.FilteredLearner`
returns a classifier that can tell which features it used, so the code to do all
this is quite short.

.. literalinclude:: code/selection-filtered-learner.py
    :lines: 25-

Running :download:`selection-filtered-learner.py <code/selection-filtered-learner.py>`
with three features selected each time a learner is run gives the
following result::

    Learner      CA
    bayes        0.903
    filtered     0.956

    Number of times features were used in cross-validation:
     3 x el-salvador-aid
     6 x synfuels-corporation-cutback
     7 x adoption-of-the-budget-resolution
    10 x physician-fee-freeze
     4 x crime

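For reference, the counting part of such a script might look roughly like the
sketch below. It assumes ``results`` was produced by
``Orange.evaluation.testing.cross_validation`` with stored classifiers, that the
filtered learner is the second learner in the list, and that the filtered
classifier exposes the features it used through an ``atts()`` method, as in the
old ``orngFSS`` module::

    print("Number of times features were used in cross-validation:")
    counts = {}
    for fold in results.classifiers:
        # fold[0] is plain naive Bayes; fold[1] is the filtered learner's classifier.
        for feature in fold[1].atts():
            counts[feature.name] = counts.get(feature.name, 0) + 1
    for name, count in counts.items():
        print("%2d x %s" % (count, name))
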

==========
References
==========

* K. Kira and L. Rendell. A practical approach to feature selection. In
  D. Sleeman and P. Edwards, editors, Proc. 9th Int'l Conf. on Machine
  Learning, pages 249-256, Aberdeen, 1992. Morgan Kaufmann Publishers.

* I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF.
  In F. Bergadano and L. De Raedt, editors, Proc. European Conf. on Machine
  Learning (ECML-94), pages 171-182. Springer-Verlag, 1994.

* R. Kohavi, G. John: Wrappers for Feature Subset Selection, Artificial
  Intelligence, 97 (1-2), pages 273-324, 1997.