orngFSS: Orange Feature Subset Selection Module
Module orngFSS implements several functions that support, or may help design, feature subset selection for classification problems. The guiding idea is that some machine learning methods may perform better if they learn only from a selected subset of the "best" features. orngFSS mostly implements filter approaches, i.e., approaches where attribute scores are estimated prior to modelling, that is, without knowing which machine learning method will be used to construct the predictive model.
Functions
- attMeasure(data[, measure])
- Assesses the quality (score) of attributes using the given scoring function (measure) on a data set data, which should contain a discrete class. Returns a sorted list of tuples (attribute name, score). measure is an attribute quality measure, which should be derived from orange.MeasureAttribute and defaults to orange.MeasureAttribute_relief(k=20, m=50).
- bestNAtts(scores, N)
- Returns the list of names of the N highest ranked attributes from the scores list. The list of attribute measures (scores) is of the type returned by the function attMeasure.
- attsAboveThreshold(scores[, threshold])
- Returns the list of names of attributes that are listed in the list scores and have their score above threshold. The default value for threshold is 0.0.
- selectBestNAtts(data, scores, N)
- Constructs and returns a new data set that includes the class and only the N best attributes from the list scores. data is the original data set.
- selectAttsAboveThresh(data, scores[, threshold])
- Constructs and returns a new data set that includes the class and those attributes from the list returned by the function attMeasure whose score is above or equal to the specified threshold. data is the original data set. The parameter threshold is optional and defaults to 0.0.
- filterRelieff(data[, measure[, margin]])
- Takes the data set data and an attribute scoring measure measure. Repeatedly estimates the attribute scores and removes the worst attribute if its score is lower than margin. Stops when no attribute score is below this margin. The default for measure is orange.MeasureAttribute_relief(k=20, m=50), and margin defaults to 0.0. Notice that this filter procedure was originally designed for measures such as Relief, which are context dependent, i.e., the removal of an attribute may change the scores of the remaining attributes. Hence the need to re-estimate the scores every time an attribute is removed.
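The selection logic described in this list can be tried out without Orange. The following is a minimal sketch in plain Python, not the orngFSS implementation: the function names, the toy scoring function, and the choice of a strict comparison for the threshold are all assumptions made for illustration. It shows top-N selection, threshold selection, and the re-estimate-and-remove loop used for context-dependent measures.

```python
# Illustrative stand-ins only; these are NOT the orngFSS implementations.

def best_n_atts(scores, n):
    """Names of the n highest-scored attributes from (name, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def atts_above_threshold(scores, threshold=0.0):
    """Names of attributes scoring strictly above threshold (an assumption)."""
    return [name for name, score in scores if score > threshold]


def iterative_filter(attributes, score_fn, margin=0.0):
    """Repeatedly drop the worst attribute while its score is below margin.

    score_fn is a hypothetical callable that returns {name: score} for the
    attributes currently kept. It is re-run after every removal because the
    scores may depend on which attributes remain.
    """
    kept = list(attributes)
    while kept:
        scores = score_fn(kept)
        worst = min(kept, key=lambda name: scores[name])
        if scores[worst] >= margin:
            break
        kept.remove(worst)
    return kept


def toy_score(kept):
    """Made-up context-dependent score: "b" looks useful only next to "c"."""
    base = {"a": 0.4, "b": 0.1, "c": -0.2, "d": 0.3}
    scores = {name: base[name] for name in kept}
    if "b" in kept and "c" not in kept:
        scores["b"] = -0.1  # the score of b drops once c has been removed
    return scores


scores = [("a", 0.4), ("b", 0.1), ("c", -0.2), ("d", 0.3)]
top_two = best_n_atts(scores, 2)
positive = atts_above_threshold(scores)
survivors = iterative_filter(["a", "b", "c", "d"], toy_score)
print(top_two, positive, survivors)
```

With these made-up scores, one-shot thresholding keeps "b", while the iterative procedure removes it after "c" is gone, which is the reason for re-estimating scores after each removal.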
Classes
- FilterAttsAboveThresh([measure[, threshold]])
- This is simply a wrapper around the function selectAttsAboveThresh. It lets you create an object that stores the filter's parameters and can later be called with the data to return a data set that includes only the selected attributes. measure is a function that returns a list of pairs (attribute name, score), and it defaults to orange.MeasureAttribute_relief(k=20, m=50). The default threshold is 0.0. Some examples of how to use this class are:
    filter = orngFSS.FilterAttsAboveThresh(threshold=.15)
    new_data = filter(data)
    new_data = orngFSS.FilterAttsAboveThresh(data)
    new_data = orngFSS.FilterAttsAboveThresh(data, threshold=.1)
    new_data = orngFSS.FilterAttsAboveThresh(data, threshold=.1, measure=orange.MeasureAttribute_gini())
- FilterBestNAtts([measure[, n]])
- Similarly to FilterAttsAboveThresh, this is a wrapper around the function selectBestNAtts. The measure and the number of attributes to retain are optional (the latter defaults to 5).
- FilterRelieff([measure[, margin]])
- Similarly to FilterBestNAtts, this is a wrapper around the function filterRelieff. measure and margin are optional arguments, where measure defaults to orange.MeasureAttribute_relief(k=20, m=50) and margin to 0.0.
- FilteredLearner([baseLearner[, examples[, filter[, name]]]])
- Wraps a baseLearner using a data filter, and returns the corresponding learner. When such a learner is presented with a data set, the data is first filtered and then passed to baseLearner. FilteredLearner comes in handy when one wants to test the scheme of feature-subset-selection-and-learning by some repetitive evaluation method, e.g., cross validation. filter defaults to orngFSS.FilterAttsAboveThresh with default arguments. Here is an example of how to set up such a learner (build a wrapper around a naive Bayesian learner) and use it on a data set:
    nb = orange.BayesLearner()
    learner = orngFSS.FilteredLearner(nb, filter=orngFSS.FilterBestNAtts(n=5), name='filtered')
    classifier = learner(data)
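To make the filter-then-learn pattern behind FilteredLearner concrete, here is a small sketch in plain Python that does not use Orange. Everything in it is hypothetical: the class FilteredLearnerSketch, the toy filter and base learner, the data layout (a list of (feature dict, label) pairs), and the atts attribute on the returned classifier. Unlike the orngFSS filters, which return a reduced data set, the toy filter here returns attribute names, which keeps the sketch short.

```python
from collections import Counter


def project(row, kept):
    """Restrict a feature dict to the kept attribute names."""
    return {name: row[name] for name in kept}


def table_learner(data):
    """Toy base learner: majority label per observed feature combination."""
    overall = Counter(label for _, label in data).most_common(1)[0][0]
    table = {}
    for row, label in data:
        key = tuple(sorted(row.items()))
        table.setdefault(key, Counter())[label] += 1

    def classify(row):
        votes = table.get(tuple(sorted(row.items())))
        return votes.most_common(1)[0][0] if votes else overall

    return classify


def keep_agreeing_attribute(data):
    """Toy filter: keep the one attribute whose value most often equals the label."""
    names = sorted(data[0][0])
    return [max(names, key=lambda n: sum(row[n] == label for row, label in data))]


class FilteredLearnerSketch:
    """Hypothetical stand-in: filter the data first, then call the base learner."""

    def __init__(self, base_learner, filter_fn, name="filtered"):
        self.base_learner = base_learner
        self.filter_fn = filter_fn
        self.name = name

    def __call__(self, data):
        kept = self.filter_fn(data)
        reduced = [(project(row, kept), label) for row, label in data]
        model = self.base_learner(reduced)

        def classify(row):
            # New rows must be reduced to the same attributes before prediction.
            return model(project(row, kept))

        classify.atts = list(kept)  # lets callers see which attributes were used
        return classify


data = [
    ({"x": "y", "noise": "n"}, "y"),
    ({"x": "n", "noise": "n"}, "n"),
    ({"x": "y", "noise": "y"}, "y"),
    ({"x": "n", "noise": "y"}, "n"),
]
learner = FilteredLearnerSketch(table_learner, keep_agreeing_attribute)
classifier = learner(data)
print(classifier.atts, classifier({"x": "y", "noise": "n"}))
```

Because the filter runs inside the learner call, an evaluation method that calls the learner once per training split repeats the attribute selection on every split, so the selection never sees the test data.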
Examples
Score Estimation
Let us start with a simple script that reads the data, uses orngFSS.attMeasure to derive attribute scores, and prints the scores of the three best-scored attributes. The same scoring function is then used to report (only) on the three best-scored attributes.
fss1.py (uses voting.tab)
The script should output something like:
Different Score Measures
The following script reports on gain ratio and relief attribute scores. Notice that for our data set the ranks of the attributes match rather well!
fss2.py (uses voting.tab)
Filter Approach for Machine Learning
Attribute scoring has at least two potential uses. One is informative (or descriptive): the data analyst can use attribute scoring to find "good" attributes and those that are irrelevant for the given classification task. The other use is in improving the performance of machine learning by learning only from a data set that includes the most informative features. This so-called filter approach can boost the performance of the learner in terms of predictive accuracy, speed of induction, and simplicity of the resulting models.
The following script defines a new classifier that is based on naive Bayes and, prior to learning, selects the five best attributes from the data set. The new classifier is wrapped up in a special class (see the Building your own learner lesson in Orange for Beginners). The script compares this filtered learner with a naive Bayes that uses the complete set of attributes.
fss3.py (uses voting.tab)
Interestingly, though somewhat expected, feature subset selection helps. This is the output that we get:
... And a Much Simpler One
Although the script above is perhaps educational, we can do all of this more simply by wrapping the learner using FilteredLearner, thus creating an object that is assembled from a data filter and a base learner. When given the data, this learner uses the attribute filter to construct a new data set and the base learner to construct a corresponding classifier. Attribute filters should be objects such as orngFSS.FilterAttsAboveThresh or orngFSS.FilterBestNAtts, which can be initialized with arguments and later presented with data, returning a new, reduced data set.
The following code fragment essentially replaces the bulk of the code from the previous example, and compares the naive Bayesian classifier to the same classifier when only the single most important attribute is used:
from fss4.py (uses voting.tab)
Now, let's decide to retain three attributes (change the code in fss4.py accordingly!), but observe how many times each attribute was used. Remember, 10-fold cross validation constructs ten instances of each classifier, and each time we run FilteredLearner a different set of attributes may be selected. orngEval.CrossValidation stores the classifiers in the results variable, and FilteredLearner returns a classifier that can tell which attributes it used (how convenient!), so the code to do all this is quite short:
from fss4.py (uses voting.tab)
Running fss4.py with three attributes selected each time a learner is run gives the following result:
Experiment yourself to see which attribute is most frequently selected over the ten cross-validation folds when only one attribute is retained for the classifier!
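The idea of counting how often each attribute is selected across folds can also be sketched without Orange. The snippet below is an illustration under stated assumptions, not code from fss4.py: the function names, the simple index-modulo fold split, the toy selection rule, and the six made-up data rows are all hypothetical.

```python
from collections import Counter


def attribute_usage(data, k, select_atts):
    """Count how often each attribute is selected across k training splits.

    Row i belongs to the test part of fold i % k; selection is run only on
    the remaining (training) rows, as it would be inside a filtered learner.
    """
    counts = Counter()
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        counts.update(select_atts(train))
    return counts


def select_best_one(train):
    """Toy selection: keep the attribute whose value most often equals the label."""
    names = sorted(train[0][0])
    return [max(names, key=lambda n: sum(row[n] == label for row, label in train))]


# Made-up data: (feature dict, label) pairs with two binary attributes.
data = [
    ({"a1": 1, "a2": 0}, 0),
    ({"a1": 1, "a2": 0}, 1),
    ({"a1": 1, "a2": 1}, 1),
    ({"a1": 0, "a2": 1}, 1),
    ({"a1": 0, "a2": 0}, 0),
    ({"a1": 1, "a2": 1}, 1),
]
usage = attribute_usage(data, 3, select_best_one)
for name, times in sorted(usage.items()):
    print("%d x %s" % (times, name))
```

Even on this tiny data set the selected attribute differs between folds, which is why the per-fold counts are worth inspecting.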
References
K. Kira and L. Rendell. A practical approach to feature selection. In D. Sleeman and P. Edwards, editors, Proc. 9th Int'l Conf. on Machine Learning, pages 249-256, Aberdeen, 1992. Morgan Kaufmann Publishers.
I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In F. Bergadano and L. De Raedt, editors, Proc. European Conf. on Machine Learning (ECML-94), pages 171-182. Springer-Verlag, 1994.
R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), pages 273-324, 1997.
