Changeset 8119:63e9307e92b5 in orange


Timestamp: 07/27/11 23:50:13 (3 years ago)
Author: markotoplak
Branch: default
Convert: 9834e57816ce4805e3f97791f7c1127c6eba237c
Message: Work on Orange.feature.scoring
Location: orange
Files: 1 added, 9 edited

  • orange/Orange/feature/scoring.py

    r8115 r8119  
    1212prediction task. 
    1313 
    14 The following example computes feature scores with uses :obj:`measure_domain` 
    15 and prints out the three best features. 
     14The following example computes feature scores, both with 
     15:obj:`score_all` and by scoring each feature individually, and prints out  
     16the best three features.  
    1617 
    1718.. _scoring-all.py: code/scoring-all.py 
     
    2324The output:: 
    2425 
    25     Feature scores for best three features: 
     26    Feature scores for best three features (with score_all): 
    2627    0.613 physician-fee-freeze 
    27     0.255 adoption-of-the-budget-resolution 
     28    0.255 el-salvador-aid 
    2829    0.228 synfuels-corporation-cutback 
     30 
     31    Feature scores for best three features (scored individually): 
     32    0.613 physician-fee-freeze 
     33    0.255 el-salvador-aid 
     34    0.228 synfuels-corporation-cutback 
     35 
    2936 
    3037============ 
     
    3239============ 
    3340 
    34 Orange implements several methods for scoring relevance of features to 
    35 the class. All are subclasses of :obj:`Measure`. The most common compute 
    36 statistics on conditional distributions of class values given the feature 
    37 values; these are derived from :obj:`MeasureFromProbabilities`. 
     41Implemented methods for scoring the relevance of features to the class
     42are subclasses of :obj:`Measure`. Those that compute statistics on 
     43conditional distributions of class values given the feature values are 
     44derived from :obj:`MeasureFromProbabilities`. 
    3845 
    3946.. class:: Measure 
     
    4249    features it can handle and the required data. 
    4350 
     51    **Capabilities** 
     52 
    4453    .. attribute:: handles_discrete 
    4554     
     
    5261    .. attribute:: computes_thresholds 
    5362     
    54         Indicated whether the measure implements the :obj:`threshold_function`. 
     63        Indicates whether the measure implements the :obj:`threshold_function`. 
     64 
     65    **Input specification** 
    5566 
    5667    .. attribute:: needs 
    5768     
    58         The kind of data needed. Either 
    59  
    60         * :obj:`NeedsGenerator`; an instance generator (as, for example, 
    61           Relief), 
    62  
    63         * :obj:`NeedsDomainContingency;` needs 
    64           :obj:`Orange.statistics.contingency.Domain`, 
    65  
    66         * :obj:`NeedsContingency_Class`; needs the contingency 
    67           (:obj:`Orange.statistics.contingency.VarClass`), feature 
    68           distribution and the apriori class distribution (as most 
    69           measures). 
     69        The type of data needed: :obj:`NeedsGenerator`, :obj:`NeedsDomainContingency`, 
     70        or :obj:`NeedsContingency_Class`. 
     71 
     72    .. attribute:: NeedsGenerator 
     73 
     74        Constant. Indicates that the measure needs an instance generator on the input (as, for example, the
     75        :obj:`Relief` measure). 
     76 
     77    .. attribute:: NeedsDomainContingency 
     78 
     79        Constant. Indicates that the measure needs :obj:`Orange.statistics.contingency.Domain`. 
     80 
     81    .. attribute:: NeedsContingency_Class 
     82 
     83        Constant. Indicates that the measure needs the contingency
     84        (:obj:`Orange.statistics.contingency.VarClass`), feature
     85        distribution and the apriori class distribution (as most
     86        measures do).
     87 
     88    **Treatment of unknown values** 
    7089 
    7190    .. attribute:: unknowns_treatment 
     
    95114        Constant. Unknown values are treated as a separate value. 
    96115 
    97  
    98     .. method:: __call__(attribute, examples[, apriori_class_distribution][, weightID]) 
    99     .. method:: __call__(attribute, domain_contingency[, apriori_class_distribution]) 
    100     .. method:: __call__(contingency, class_distribution[, apriori_class_distribution]) 
    101  
    102         :param attribute: the choosen feature, either as a descriptor,  
     116    **Methods** 
     117 
     118    .. method:: __call__(attribute, instances[, apriori_class_distribution][, weightID]) 
     119 
     120        :param attribute: the chosen feature, either as a descriptor,  
    103121          index, or a name. 
    104122        :type attribute: :class:`Orange.data.variable.Variable` or int or string 
    105         :param examples: data. 
    106         :type examples: `Orange.data.Table` 
     123        :param instances: data. 
     124        :type instances: `Orange.data.Table` 
     125        :param weightID: id for meta-feature with weight. 
     126 
     127        Abstract. All measures need to support `__call__` with these
     128        parameters; the details are described below.
     129 
     130    .. method:: __call__(attribute, domain_contingency[, apriori_class_distribution]) 
     131 
     132        :param attribute: the chosen feature, either as a descriptor,  
     133          index, or a name. 
     134        :type attribute: :class:`Orange.data.variable.Variable` or int or string 
     135        :param domain_contingency: contingency matrices for all features.
     136        :type domain_contingency: :obj:`Orange.statistics.contingency.Domain` 
     137 
     138        Abstract. Described below. 
     139         
     140    .. method:: __call__(contingency, class_distribution[, apriori_class_distribution]) 
     141 
     142        :param contingency: contingency of the feature and the class.
     143        :type contingency: :obj:`Orange.statistics.contingency.VarClass` 
     144        :param class_distribution: distribution of the class 
     145          variable. If :obj:`unknowns_treatment` is :obj:`IgnoreUnknowns`, 
     146          it should be computed on instances where the feature value is
     147          defined. Otherwise, the class distribution should be the overall
     148          class distribution. 
     149        :type class_distribution:  
     150          :obj:`Orange.statistics.distribution.Distribution` 
    107151        :param apriori_class_distribution: Optional and most often 
    108152          ignored. Useful if the measure makes any probability estimates 
    109153          based on apriori class probabilities (such as the m-estimate). 
    110         :param weightID: id for meta-feature with weight. 
    111         :param domain_contingency: Not sure. 
    112         :type domain_contingency: :obj:`Orange.statistics.contingency.Domain` 
    113         :param distribution: Not sure. 
    114         :type distribution: :obj:`Orange.statistics.distribution.Distribution` 
    115  
    116154        :return: Feature score - the higher the value, the better the feature. 
    117155          If the quality cannot be measured, return :obj:`Measure.Rejected`. 
    118156        :rtype: float or :obj:`Measure.Rejected`. 
    119157 
    120         Abstract.  
    121         
    122         All measures need to support the first form, with the data on 
    123         the input. 
     158        Abstract. 
     159 
     160        Different forms of `__call__` enable optimization.  For instance, 
     161        if the contingency matrix has already been computed, you can speed
     162        up the computation by passing it to the measure (if it supports 
     163        that form - most do). Otherwise the measure will have to compute the 
     164        contingency itself. 
    124165 
    125166        Not all classes will accept all kinds of arguments. :obj:`Relief`, 
    126         for instance, cannot be computed from contingencies 
    127         alone. Besides, the feature and the class need to be of the 
    128         correct type for a particular measure. 
    129  
    130         Different forms of the call enable optimization.  For instance, 
    131         if contingency matrix has already been computed, you can speed 
    132         ab the computation by passing it to the measure (if it supports 
    133         that form - most do). Otherwise the measurea will compute the 
    134         contingency itself. 
    135  
    136         Data is given either as examples, contingency tables or distributions 
    137         for all attributes. In the latter form, what is given as 
    138         the class distribution depends upon what you do with unknown 
    139         values (if there are any).  If :obj:`unknowns_treatment` is 
    140         :obj:`IgnoreUnknowns`, the class distribution should be computed 
    141         on examples for which the feature value is defined. Otherwise, 
    142         class distribution should be the overall class distribution. 
    143  
     167        for instance, only supports the form with instances on the input. 
     168 
     169        The code sample below shows the use of :obj:`GainRatio` with 
     170        different call types. 
     171 
     172        .. literalinclude:: code/scoring-calls.py 
     173            :lines: 7- 
    144174 
    145175    .. method:: threshold_function(attribute, examples[, weightID]) 
     
    156186    .. method:: best_threshold 
    157187 
    158          
     188        Return the best threshold for binarization. Parameters? 
     189 
    159190 
    160191    The script below shows different ways to assess the quality of astigmatic, 
     
    201232 
    202233.. class:: MeasureFromProbabilities 
     234 
     235    Bases: :obj:`Measure` 
    203236 
    204237    Abstract base class for feature quality measures that can be 
     
    353386======================= 
    354387 
    355 :obj:`Relief` (described for classification) can be also used for regression. 
     388:obj:`Relief` can also be used for regression.
    356389 
    357390.. index::  
     
    389422.. autofunction:: Orange.feature.scoring.merge_values 
    390423 
    391 .. autofunction:: Orange.feature.scoring.measure_domain 
     424.. autofunction:: Orange.feature.scoring.score_all 
    392425 
    393426========== 
     
    620653###### 
    621654# from orngFSS 
    622 def measure_domain(data, measure=Relief(k=20, m=50)): 
     655def score_all(data, measure=Relief(k=20, m=50)): 
    623656    """Assess the quality of features using the given measure and return 
    624657    a sorted list of tuples (feature name, measure). 
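The docstring describes `score_all` as scoring every feature with one measure and returning a sorted list of (feature name, score) tuples. The sketch below mirrors that idea in plain Python, without Orange, under stated assumptions: data is a dict of feature columns plus a list of class values, information gain stands in for the `Relief(k=20, m=50)` default, and `contingency`, `entropy` and `info_gain` are hypothetical helpers, not Orange API. Building the contingency first and handing it to the measure follows the idea behind the contingency form of `Measure.__call__`; skipping unknown (`None`) values loosely corresponds to `IgnoreUnknowns`.

```python
from collections import Counter, defaultdict
from math import log2


def contingency(values, classes):
    """Count class values for each feature value (hypothetical helper).

    Instances whose feature value is unknown (None) are skipped, so the
    class distribution derived from this table covers only instances
    where the feature is defined, similar in spirit to IgnoreUnknowns.
    """
    table = defaultdict(Counter)
    for value, cls in zip(values, classes):
        if value is not None:
            table[value][cls] += 1
    return table


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class frequencies."""
    n = sum(counts.values())
    return -sum(k / n * log2(k / n) for k in counts.values() if k)


def info_gain(cont):
    """Information gain of the class given the feature, from a contingency."""
    class_dist = Counter()
    for row in cont.values():
        class_dist.update(row)
    n = sum(class_dist.values())
    conditional = sum(sum(row.values()) / n * entropy(row) for row in cont.values())
    return entropy(class_dist) - conditional


def score_all(columns, classes, measure=info_gain):
    """Score every feature and return (name, score) pairs, best first."""
    scores = [(name, measure(contingency(values, classes)))
              for name, values in columns.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data: feature "a" separates the classes perfectly, "b" not at all.
columns = {"a": ["y", "y", "n", "n"], "b": ["y", "n", "y", "n"]}
classes = ["1", "1", "0", "0"]
print(score_all(columns, classes))
```

Any measure that accepts a contingency can be passed as `measure`, which is the point of separating the contingency step from the scoring step.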
  • orange/Orange/feature/selection.py

    r8115 r8119  
    168168import Orange.core as orange 
    169169 
    170 from Orange.feature.scoring import measure_domain 
     170from Orange.feature.scoring import score_all 
    171171 
    172172# from orngFSS 
    173173def bestNAtts(scores, N): 
    174174    """Return the best N features (without scores) from the list returned 
    175     by :obj:`Orange.feature.scoring.measure_domain`. 
     175    by :obj:`Orange.feature.scoring.score_all`. 
    176176     
    177177    :param scores: a list such as returned by  
    178       :obj:`Orange.feature.scoring.measure_domain` 
     178      :obj:`Orange.feature.scoring.score_all` 
    179179    :type scores: list 
    180180    :param N: number of best features to select.  
     
    187187def attsAboveThreshold(scores, threshold=0.0): 
    188188    """Return features (without scores) from the list returned by 
    189     :obj:`Orange.feature.scoring.measure_domain` with score above or 
     189    :obj:`Orange.feature.scoring.score_all` with score above or 
    190190    equal to a specified threshold. 
    191191     
    192192    :param scores: a list such as one returned by 
    193       :obj:`Orange.feature.scoring.measure_domain` 
     193      :obj:`Orange.feature.scoring.score_all` 
    194194    :type scores: list 
    195195    :param threshold: score threshold for attribute selection. Defaults to 0. 
     
    208208    :type data: Orange.data.table 
    209209    :param scores: a list such as one returned by  
    210       :obj:`Orange.feature.scoring.measure_domain` 
     210      :obj:`Orange.feature.scoring.score_all` 
    211211    :type scores: list 
    212212    :param N: number of features to select 
     
    221221    """Construct and return a new set of examples that includes a class and  
    222222    features from the list returned by  
    223     :obj:`Orange.feature.scoring.measure_domain` that have the score above or  
     223    :obj:`Orange.feature.scoring.score_all` that have the score above or  
    224224    equal to a specified threshold. 
    225225     
     
    227227    :type data: Orange.data.table 
    228228    :param scores: a list such as one returned by 
    229       :obj:`Orange.feature.scoring.measure_domain`     
     229      :obj:`Orange.feature.scoring.score_all`     
    230230    :type scores: list 
    231231    :param threshold: score threshold for attribute selection. Defaults to 0. 
     
    256256     
    257257    """ 
    258     measl = measure_domain(data, measure) 
     258    measl = score_all(data, measure) 
    259259    while len(data.domain.attributes)>0 and measl[-1][1]<margin: 
    260260        data = selectBestNAtts(data, measl, len(data.domain.attributes)-1) 
    261261#        print 'remaining ', len(data.domain.attributes) 
    262         measl = measure_domain(data, measure) 
     262        measl = score_all(data, measure) 
    263263    return data 
    264264 
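The loop above repeatedly removes the lowest-scored feature and rescores the remaining ones until the worst score reaches `margin`. Below is a plain-Python sketch of the same control flow that works on feature names instead of an Orange table; `filter_by_margin`, `fixed_scores` and the score values are hypothetical. Rescoring after every removal matters for Relief, whose scores depend on which other features take part in the nearest-neighbour distance computation.

```python
def filter_by_margin(features, score_fn, margin):
    """Drop the worst-scored feature and rescore until every remaining
    feature scores at least `margin`, or no features are left.

    score_fn(features) must return (name, score) pairs sorted best first,
    like score_all.
    """
    features = list(features)
    scores = score_fn(features)
    while features and scores[-1][1] < margin:
        features.remove(scores[-1][0])  # remove the currently worst feature
        scores = score_fn(features)
    return features


# Hypothetical fixed scores; with Relief, scores would change after each removal.
FIXED = {"a": 0.5, "b": 0.1, "c": -0.2}


def fixed_scores(features):
    return sorted(((f, FIXED[f]) for f in features), key=lambda s: s[1], reverse=True)


print(filter_by_margin(["a", "b", "c"], fixed_scores, margin=0.0))
```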
     
    307307 
    308308        """ 
    309         ma = measure_domain(data, self.measure) 
     309        ma = score_all(data, self.measure) 
    310310        return selectAttsAboveThresh(data, ma, self.threshold) 
    311311 
     
    330330        self.n = n 
    331331    def __call__(self, data): 
    332         ma = measure_domain(data, self.measure) 
     332        ma = score_all(data, self.measure) 
    333333        self.n = min(self.n, len(data.domain.attributes)) 
    334334        return selectBestNAtts(data, ma, self.n) 
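`bestNAtts` and `attsAboveThreshold` only need the (feature name, score) list returned by `score_all`. The functions below are a hedged plain-Python reading of their docstrings, not the actual implementations (which are elided in this diff); the snake_case names are hypothetical. The first relies on the list already being sorted best first; the second keeps features whose score is above or equal to the threshold.

```python
def best_n_atts(scores, n):
    """Names of the n best features from a best-first (name, score) list."""
    return [name for name, _ in scores[:n]]


def atts_above_threshold(scores, threshold=0.0):
    """Names of features whose score is greater than or equal to threshold."""
    return [name for name, score in scores if score >= threshold]


# Scores copied from the scoring-all.py output shown in this changeset.
scores = [("physician-fee-freeze", 0.613),
          ("el-salvador-aid", 0.255),
          ("synfuels-corporation-cutback", 0.228)]
print(best_n_atts(scores, 2))
print(atts_above_threshold(scores, 0.25))
```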
  • orange/doc/Orange/rst/code/scoring-all.py

    r8115 r8119  
    22# Category:    feature scoring 
    33# Uses:        voting 
    4 # Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.att_measure, Orange.features.scoring.GainRatio 
     4# Referenced:  Orange.feature.scoring 
     5# Classes:     Orange.feature.scoring.score_all, Orange.feature.scoring.Relief 
    66 
    77import Orange 
    88table = Orange.data.Table("voting") 
    99 
    10 print 'Feature scores for best three features:' 
    11 ma = Orange.feature.scoring.att_measure(table) 
    12 for m in ma[:3]: 
    13     print "%5.3f %s" % (m[1], m[0]) 
     10def print_best_3(ma): 
     11    for m in ma[:3]: 
     12        print "%5.3f %s" % (m[1], m[0]) 
     13 
     14print 'Feature scores for best three features (with score_all):' 
     15ma = Orange.feature.scoring.score_all(table) 
     16print_best_3(ma) 
     17 
     18print 
     19 
     20print 'Feature scores for best three features (scored individually):' 
     21meas = Orange.feature.scoring.Relief(k=20, m=50) 
     22mr = [ (a.name, meas(a, table)) for a in table.domain.attributes ] 
     23mr.sort(key=lambda x: -x[1]) #sort decreasingly by the score 
     24print_best_3(mr) 
     25 
     26 
     27 
     28 
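The script uses Python 2 `print` statements. As a sketch only, its sorting and formatting steps can be written for Python 3 as below; Orange is left out, and the input is the three scores from the documented output rather than freshly computed ones. `format_best_3` is a hypothetical name, and `sorted(..., reverse=True)` plays the role of `mr.sort(key=lambda x: -x[1])`.

```python
def format_best_3(scored):
    """Format the three highest-scoring (name, score) pairs as '%5.3f name'.

    Unlike print_best_3, this sorts its input, so it works both for the
    already sorted score_all result and for an unsorted list.
    """
    best = sorted(scored, key=lambda s: s[1], reverse=True)[:3]
    return ["%5.3f %s" % (score, name) for name, score in best]


# Scores taken from the documented output, listed in arbitrary order.
scored = [("synfuels-corporation-cutback", 0.228),
          ("physician-fee-freeze", 0.613),
          ("el-salvador-aid", 0.255)]
for line in format_best_3(scored):
    print(line)
```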
  • orange/doc/Orange/rst/code/scoring-diff-measures.py

    r8115 r8119  
    33# Uses:        measure 
    44# Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.measure_domain, Orange.features.scoring.Info, Orange.features.scoring.GainRatio, Orange.features.scoring.Gini, Orange.features.scoring.Relevance, Orange.features.scoring.Cost, Orange.features.scoring.Relief 
      5# Classes:     Orange.feature.scoring.Info, Orange.feature.scoring.GainRatio, Orange.feature.scoring.Gini, Orange.feature.scoring.Relevance, Orange.feature.scoring.Cost, Orange.feature.scoring.Relief
    66 
    77import Orange 
  • orange/doc/Orange/rst/code/scoring-relief-gainRatio.py

    r8115 r8119  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.measure_domain, Orange.features.scoring.GainRatio 
      5# Classes:     Orange.feature.scoring.score_all, Orange.feature.scoring.GainRatio
    66 
    77import Orange 
     
    99 
    1010print 'Relief GainRt Feature' 
    11 ma_def = Orange.feature.scoring.measure_domain(table) 
     11ma_def = Orange.feature.scoring.score_all(table) 
    1212gr = Orange.feature.scoring.GainRatio() 
    13 ma_gr  = Orange.feature.scoring.measure_domain(table, gr) 
     13ma_gr  = Orange.feature.scoring.score_all(table, gr) 
    1414for i in range(5): 
    1515    print "%5.3f  %5.3f  %s" % (ma_def[i][1], ma_gr[i][1], ma_def[i][0]) 
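Because `score_all` returns each list sorted by its own measure, `ma_gr[i]` is not guaranteed to describe the same feature as `ma_def[i]`, so a printed row can put the GainRatio score of one feature next to the Relief score and name of another. One possible fix, sketched in plain Python with hypothetical scores and a hypothetical `pair_scores` helper, looks the second score up by feature name; in the script itself this would amount to indexing `dict(ma_gr)` with `ma_def[i][0]`.

```python
def pair_scores(first, second, k=5):
    """Rows (first score, second score, name) for the k best features of `first`.

    Both arguments are best-first (name, score) lists. The second score is
    looked up by feature name, not by list position, because the two lists
    are sorted by different measures. Feature names are assumed unique.
    """
    second_by_name = dict(second)
    return [(score, second_by_name[name], name) for name, score in first[:k]]


# Hypothetical scores: the two measures rank x and y differently.
relief = [("x", 0.9), ("y", 0.5)]
gain_ratio = [("y", 0.8), ("x", 0.1)]
for row in pair_scores(relief, gain_ratio, k=2):
    print("%5.3f  %5.3f  %s" % row)
```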
  • orange/doc/Orange/rst/code/selection-bayes.py

    r8115 r8119  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#selection 
    5 # Classes:     Orange.feature.scoring.measure_domain, Orange.feature.selection.bestNAtts 
     5# Classes:     Orange.feature.scoring.score_all, Orange.feature.selection.bestNAtts 
    66 
    77import Orange 
     
    2121       
    2222    def __call__(self, table, weight=None): 
    23         ma = Orange.feature.scoring.measure_domain(table) 
     23        ma = Orange.feature.scoring.score_all(table) 
    2424        filtered = Orange.feature.selection.selectBestNAtts(table, ma, self.N) 
    2525        model = Orange.classification.bayes.NaiveLearner(filtered) 
  • orange/doc/Orange/rst/code/selection-best3.py

    r8115 r8119  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#selection 
    5 # Classes:     Orange.feature.scoring.measure_domain, Orange.feature.selection.bestNAtts 
     5# Classes:     Orange.feature.scoring.score_all, Orange.feature.selection.bestNAtts 
    66 
    77import Orange 
     
    99 
    1010n = 3 
    11 ma = Orange.feature.scoring.measure_domain(table) 
     11ma = Orange.feature.scoring.score_all(table) 
    1212best = Orange.feature.selection.bestNAtts(ma, n) 
    1313print 'Best %d features:' % n 
  • orange/fixes/fix_changed_names.py

    r8059 r8119  
    6565           "orange.MeasureAttribute_MSE": "Orange.feature.scoring.MSE", 
    6666 
    67            "orngFSS.attMeasure": "Orange.feature.scoring.attMeasure", 
     67           "orngFSS.attMeasure": "Orange.feature.scoring.score_all", 
    6868           "orngFSS.bestNAtts": "Orange.feature.selection.bestNAtts", 
    6969           "orngFSS.attsAbovethreshold": "Orange.feature.selection.attsAbovethreshold", 
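The entry changed here maps the old `orngFSS.attMeasure` name to `Orange.feature.scoring.score_all`. To illustrate what a table of old and new dotted names is for, here is a deliberately simple regex-based renamer. It is an assumption-laden sketch, not a claim about how `fix_changed_names.py` applies the table: it substitutes text, so it would also rewrite matching names inside strings and comments. `CHANGED_NAMES` holds two entries copied from the diff and `rename` is a hypothetical helper.

```python
import re

# Two entries copied from fix_changed_names.py; the real table is longer.
CHANGED_NAMES = {
    "orngFSS.attMeasure": "Orange.feature.scoring.score_all",
    "orngFSS.bestNAtts": "Orange.feature.selection.bestNAtts",
}


def rename(source, mapping=CHANGED_NAMES):
    """Replace whole dotted names in `source` using `mapping`, in one pass."""
    # Longest names first, so a name is never shadowed by one of its prefixes.
    alternatives = "|".join(re.escape(old) for old in sorted(mapping, key=len, reverse=True))
    # Do not match inside a longer dotted name or identifier.
    pattern = re.compile(r"(?<![\w.])(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


print(rename("ma = orngFSS.attMeasure(data)"))
```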
  • orange/orngFSS.py

    r8115 r8119  
    77attsAbovethreshold = attsAboveThreshold 
    88 
    9 attMeasure = measure_domain 
     9attMeasure = score_all 