Timestamp: 02/05/12 23:52:01 (2 years ago)
Author: Miha Stajdohar <miha.stajdohar@…>
Branch: default
rebase_source: b5f888ffbfe612a2a2797cba1eba2810f35c322f
Message: To Orange25.
File: 1 edited
  • orange/Orange/feature/selection.py

--- orange/Orange/feature/selection.py (r9349)
+++ orange/Orange/feature/selection.py (r9645)
 .. index:: feature selection

-.. index::
+.. index::
    single: feature; feature selection

-Some machine learning methods may perform better if they learn only from a
-selected subset of "best" features.
-
-The performance of some machine learning method can be improved by learning
-only from a selected subset of data, which includes the most informative or
-"best" features. This so-called filter approaches can boost the performance
-of learner both in terms of predictive accuracy, speed-up induction, and
-simplicity of resulting models. Feature scores are estimated prior to the
-modelling, that is, without knowing of which machine learning method will be
+Some machine learning methods perform better if they learn only from a
+selected subset of the most informative or "best" features.
+
+This so-called filter approach can boost the performance of a learner in
+terms of predictive accuracy, speed of induction, and simplicity of the
+resulting models. Feature scores are estimated before modeling, without
+knowing which machine learning method will be
 used to construct a predictive model.

-:download:`selection-best3.py <code/selection-best3.py>` (uses :download:`voting.tab <code/voting.tab>`):
+:download:`Example script <code/selection-best3.py>`:

 .. literalinclude:: code/selection-best3.py
…
     synfuels-corporation-cutback

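In short, the example scores all features and keeps only the best ones; a minimal
sketch of those steps with the functions documented below (voting.tab is the data
set used by the example, the rest is illustrative and not the bundled
selection-best3.py)::

    import Orange
    from Orange.feature import scoring, selection

    data = Orange.data.Table("voting")
    # score every feature with ReliefF; the returned list is sorted best-first
    scores = scoring.score_all(data, scoring.Relief(k=20, m=50))
    # keep only the five best features (plus the class) in a new table
    best = selection.select_best_n(data, scores, 5)
    print [feature.name for feature in best.domain.attributes]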
-.. automethod:: Orange.feature.selection.FilterAttsAboveThresh
-
-.. autoclass:: Orange.feature.selection.FilterAttsAboveThresh_Class
+.. autoclass:: Orange.feature.selection.FilterAboveThreshold
    :members:

-.. automethod:: Orange.feature.selection.FilterBestNAtts
-
-.. autoclass:: Orange.feature.selection.FilterBestNAtts_Class
+.. autoclass:: Orange.feature.selection.FilterBestN
    :members:

-.. automethod:: Orange.feature.selection.FilterRelief
-
-.. autoclass:: Orange.feature.selection.FilterRelief_Class
+.. autoclass:: Orange.feature.selection.FilterRelief
    :members:

…
    :members:

-These functions support in the design of feature subset selection for
+These functions support the design of feature subset selection for
 classification problems.

-.. automethod:: Orange.feature.selection.bestNAtts
-
-.. automethod:: Orange.feature.selection.attsAboveThreshold
-
-.. automethod:: Orange.feature.selection.selectBestNAtts
-
-.. automethod:: Orange.feature.selection.selectAttsAboveThresh
-
-.. automethod:: Orange.feature.selection.filterRelieff
+.. automethod:: Orange.feature.selection.best_n
+
+.. automethod:: Orange.feature.selection.above_threshold
+
+.. automethod:: Orange.feature.selection.select_best_n
+
+.. automethod:: Orange.feature.selection.select_above_threshold
+
+.. automethod:: Orange.feature.selection.select_relief

 .. rubric:: Examples

-Following is a script that defines a new classifier that is based
-on naive Bayes and prior to learning selects five best features from
-the data set. The new classifier is wrapped-up in a special class (see
+The following script defines a new naive Bayes classifier that
+selects the five best features from the data set before learning.
+The new classifier is wrapped up in a special class (see
 <a href="../ofb/c_pythonlearner.htm">Building your own learner</a>
 lesson in <a href="../ofb/default.htm">Orange for Beginners</a>). The
-script compares this filtered learner naive Bayes that uses a complete
+script compares this filtered learner with one that uses a complete
 set of features.

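A sketch of such a wrapped learner built from this module (assuming, as in the
orngFSS-style wrapper defined at the end of the module, that FilteredLearner
accepts ``filter`` and ``name`` keyword arguments; the bundled script may differ
in details)::

    import Orange
    from Orange.feature import selection

    data = Orange.data.Table("voting")
    nb = Orange.classification.bayes.NaiveLearner()
    # naive Bayes that first keeps only the five best-scored features
    fl = selection.FilteredLearner(nb, filter=selection.FilterBestN(n=5),
                                   name="filtered naive Bayes")
    classifiers = [nb(data), fl(data)]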
     
…
 from Orange.feature.scoring import score_all

-# from orngFSS
-def bestNAtts(scores, N):
+def best_n(scores, N):
     """Return the best N features (without scores) from the list returned
     by :obj:`Orange.feature.scoring.score_all`.
…
     return map(lambda x:x[0], scores[:N])

-def attsAboveThreshold(scores, threshold=0.0):
+bestNAtts = best_n
+
+def above_threshold(scores, threshold=0.0):
     """Return features (without scores) from the list returned by
     :obj:`Orange.feature.scoring.score_all` with score above or
…
     return map(lambda x:x[0], pairs)

-def selectBestNAtts(data, scores, N):
+attsAboveThreshold = above_threshold
+
+
+def select_best_n(data, scores, N):
     """Construct and return a new set of examples that includes a
     class and only N best features from a list scores.
…

     """
-    return data.select(bestNAtts(scores, N)+[data.domain.classVar.name])
-
-
-def selectAttsAboveThresh(data, scores, threshold=0.0):
+    return data.select(best_n(scores, N) + [data.domain.classVar.name])
+
+selectBestNAtts = select_best_n
+
+
+def select_above_threshold(data, scores, threshold=0.0):
     """Construct and return a new set of examples that includes a class and
     features from the list returned by
…

     """
-    return data.select(attsAboveThreshold(scores, threshold)+[data.domain.classVar.name])
-
-def filterRelieff(data, measure=orange.MeasureAttribute_relief(k=20, m=50), margin=0):
+    return data.select(above_threshold(scores, threshold) + [data.domain.classVar.name])
+
+selectAttsAboveThresh = select_above_threshold
+
+
+def select_relief(data, measure=orange.MeasureAttribute_relief(k=20, m=50), margin=0):
     """Take the data set and use an attribute measure to remove the worst
     scored attribute (those below the margin). Repeats, until no attribute has
…
     """
     measl = score_all(data, measure)
-    while len(data.domain.attributes)>0 and measl[-1][1]<margin:
-        data = selectBestNAtts(data, measl, len(data.domain.attributes)-1)
+    while len(data.domain.attributes) > 0 and measl[-1][1] < margin:
+        data = select_best_n(data, measl, len(data.domain.attributes) - 1)
 #        print 'remaining ', len(data.domain.attributes)
         measl = score_all(data, measure)
     return data

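In other words, select_relief repeatedly re-scores the data and drops the single
worst-scored attribute as long as that score falls below the margin. A minimal
illustrative call (the data set and margin value are examples only)::

    import Orange
    from Orange.feature import scoring, selection

    data = Orange.data.Table("voting")
    # drop attributes until every remaining ReliefF score is at least 0.01
    reduced = selection.select_relief(data,
        measure=scoring.Relief(k=20, m=50), margin=0.01)
    print len(data.domain.attributes), "->", len(reduced.domain.attributes)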
-##############################################################################
-# wrappers
-
-def FilterAttsAboveThresh(data=None, **kwds):
-    filter = apply(FilterAttsAboveThresh_Class, (), kwds)
-    if data:
-        return filter(data)
-    else:
-        return filter
-
-class FilterAttsAboveThresh_Class:
-    """Stores filter's parameters and can be later called with the data to
-    return the data table with only selected features.
-
-    This class is used in the function :obj:`selectAttsAboveThresh`.
-
-    :param measure: an attribute measure (derived from
-      :obj:`Orange.feature.scoring.Measure`). Defaults to
-      :obj:`Orange.feature.scoring.Relief` for k=20 and m=50.
+filterRelieff = select_relief
+
+
+class FilterAboveThreshold(object):
+    """Stores filter parameters and can be later called with the data to
+    return the data table with only the selected features.
+
+    This class uses the function :obj:`select_above_threshold`.
+
+    :param measure: an attribute measure (derived from
+      :obj:`Orange.feature.scoring.Measure`). Defaults to
+      :obj:`Orange.feature.scoring.Relief` for k=20 and m=50.
     :param threshold: score threshold for attribute selection. Defaults to 0.
     :type threshold: float
-
+
     Some examples of how to use this class are::

-        filter = Orange.feature.selection.FilterAttsAboveThresh(threshold=.15)
+        filter = Orange.feature.selection.FilterAboveThreshold(threshold=.15)
         new_data = filter(data)
-        new_data = Orange.feature.selection.FilterAttsAboveThresh(data)
-        new_data = Orange.feature.selection.FilterAttsAboveThresh(data, threshold=.1)
-        new_data = Orange.feature.selection.FilterAttsAboveThresh(data, threshold=.1,
+        new_data = Orange.feature.selection.FilterAboveThreshold(data)
+        new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1)
+        new_data = Orange.feature.selection.FilterAboveThreshold(data, threshold=.1,
                    measure=Orange.feature.scoring.Gini())

     """
-    def __init__(self, measure=orange.MeasureAttribute_relief(k=20, m=50),
-               threshold=0.0):
+    def __new__(cls, data=None,
+                measure=orange.MeasureAttribute_relief(k=20, m=50),
+                threshold=0.0):
+
+        if data is None:
+            self = object.__new__(cls, measure=measure, threshold=threshold)
+            return self
+        else:
+            self = cls(measure=measure, threshold=threshold)
+            return self(data)
+
+    def __init__(self, measure=orange.MeasureAttribute_relief(k=20, m=50),
+                 threshold=0.0):
+
         self.measure = measure
         self.threshold = threshold
…
     def __call__(self, data):
         """Take data and return features with scores above given threshold.
-
+
         :param data: a data table
         :type data: Orange.data.table
…
         """
         ma = score_all(data, self.measure)
-        return selectAttsAboveThresh(data, ma, self.threshold)
-
-def FilterBestNAtts(data=None, **kwds):
-    """Similarly to :obj:`FilterAttsAboveThresh`, wrap around class
-    :obj:`FilterBestNAtts_Class`.
-
-    :param measure: an attribute measure (derived from
-      :obj:`Orange.feature.scoring.Measure`). Defaults to
-      :obj:`Orange.feature.scoring.Relief` for k=20 and m=50.
+        return select_above_threshold(data, ma, self.threshold)
+
+FilterAttsAboveThresh = FilterAboveThreshold
+FilterAttsAboveThresh_Class = FilterAboveThreshold
+
+
+class FilterBestN(object):
+    """Stores filter parameters and can be later called with the data to
+    return the data table with only the selected features.
+
+    :param measure: an attribute measure (derived from
+      :obj:`Orange.feature.scoring.Measure`). Defaults to
+      :obj:`Orange.feature.scoring.Relief` for k=20 and m=50.
     :param n: number of best features to return. Defaults to 5.
     :type n: int

     """
-    filter = apply(FilterBestNAtts_Class, (), kwds)
-    if data: return filter(data)
-    else: return filter
-
-class FilterBestNAtts_Class:
+    def __new__(cls, data=None,
+                measure=orange.MeasureAttribute_relief(k=20, m=50),
+                n=5):
+
+        if data is None:
+            self = object.__new__(cls, measure=measure, n=n)
+            return self
+        else:
+            self = cls(measure=measure, n=n)
+            return self(data)
+
     def __init__(self, measure=orange.MeasureAttribute_relief(k=20, m=50), n=5):
         self.measure = measure
         self.n = n
+
     def __call__(self, data):
         ma = score_all(data, self.measure)
         self.n = min(self.n, len(data.domain.attributes))
-        return selectBestNAtts(data, ma, self.n)
-
-def FilterRelief(data=None, **kwds):
+        return select_best_n(data, ma, self.n)
+
+FilterBestNAtts = FilterBestN
+FilterBestNAtts_Class = FilterBestN
+
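FilterBestN follows the same calling convention as FilterAboveThreshold above; a
brief sketch (the data set and n are illustrative)::

    import Orange
    from Orange.feature import selection

    data = Orange.data.Table("voting")
    filter = selection.FilterBestN(n=3)
    new_data = filter(data)                        # keep the 3 best-scored features
    new_data = selection.FilterBestN(data, n=3)    # or score and filter in one call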
+class FilterRelief(object):
     """Similarly to :obj:`FilterBestNAtts`, wrap around class
     :obj:`FilterRelief_Class`.
…
     :type margin: float

-    """
-    filter = apply(FilterRelief_Class, (), kwds)
-    if data:
-        return filter(data)
-    else:
-        return filter
-
-class FilterRelief_Class:
+    """
+    def __new__(cls, data=None,
+                measure=orange.MeasureAttribute_relief(k=20, m=50),
+                margin=0):
+
+        if data is None:
+            self = object.__new__(cls, measure=measure, margin=margin)
+            return self
+        else:
+            self = cls(measure=measure, margin=margin)
+            return self(data)
+
     def __init__(self, measure=orange.MeasureAttribute_relief(k=20, m=50), margin=0):
         self.measure = measure
         self.margin = margin
+
     def __call__(self, data):
-        return filterRelieff(data, self.measure, self.margin)
+        return select_relief(data, self.measure, self.margin)
+
+FilterRelief_Class = FilterRelief

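FilterRelief works the same way, parameterised by a margin instead of n or a
threshold; a brief sketch (the margin value is illustrative)::

    import Orange
    from Orange.feature import selection

    data = Orange.data.Table("voting")
    relief_filter = selection.FilterRelief(margin=0.01)   # reusable filter object
    new_data = relief_filter(data)
    new_data = selection.FilterRelief(data, margin=0.01)  # or filter immediately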
 ##############################################################################
 # wrapped learner

-def FilteredLearner(baseLearner, examples = None, weight = None, **kwds):
+
+def FilteredLearner(baseLearner, examples=None, weight=None, **kwds):
     """Return the corresponding learner that wraps
     :obj:`Orange.classification.baseLearner` and a data selection method.
…
         fdata = self.filter(data)
         model = self.baseLearner(fdata, weight)
-        return FilteredClassifier(classifier = model, domain = model.domain)
+        return FilteredClassifier(classifier=model, domain=model.domain)

 class FilteredClassifier:
     def __init__(self, **kwds):
         self.__dict__.update(kwds)
-    def __call__(self, example, resultType = orange.GetValue):
+    def __call__(self, example, resultType=orange.GetValue):
         return self.classifier(example, resultType)
     def atts(self):
-        return self.domain.attributes
+        return self.domain.attributes