Changeset 8115:d812719712a1 in orange


Timestamp:
07/26/11 16:11:24
Author:
markotoplak
Branch:
default
Convert:
be840e95e5250d95513394f7278b48dcaf7096bf
Message:

Updates to feature scoring documentation. Also changed some names to underscore_separated.

Location:
orange
Files:
1 deleted
14 edited

  • orange/Orange/feature/__init__.py

    r8112 r8115  
    11""" 
    2  
    3 .. index:: feature 
    4  
    5 This module provides functionality for feature scoring, selection,  
    6 discretization, continuzation, imputation, construction and feature 
    7 interaction analysis. 
    8  
     2Feature scoring, selection, discretization, continuization, imputation, 
     3construction and feature interaction analysis. 
    94""" 
    105 
  • orange/Orange/feature/scoring.py

    r8112 r8115  
    1212prediction task. 
    1313 
    14 The following example uses :obj:`attMeasure` to derive feature scores 
     14The following example computes feature scores using :obj:`measure_domain` 
    1515and prints out the three best features. 
    1616 
    1717.. _scoring-all.py: code/scoring-all.py 
    1818.. _voting.tab: code/voting.tab 
    19  
    20 `scoring-all.py`_ (uses `voting.tab`_): 
    2119 
    2220.. literalinclude:: code/scoring-all.py 
     
    6159 
    6260        * :obj:`NeedsGenerator`; an instance generator (as, for example, 
    63           Relief) 
    64  
    65         * :obj:`NeedsDomainContingency; needs 
    66           :obj:`Orange.statistics.contingency.Domain`,a 
     61          Relief), 
     62 
     63        * :obj:`NeedsDomainContingency`; needs 
     64          :obj:`Orange.statistics.contingency.Domain`, 
    6765 
    6866        * :obj:`NeedsContingency_Class`; needs the contingency 
     
    7472 
    7573        Not defined in :obj:`Measure` but defined in 
    76         classes that are able to treat unknown values. Possible values: 
    77          
    78         * ignored (:obj:`Measure.IgnoreUnknowns`); 
    79           examples for which the feature value is unknown are removed, 
    80  
    81         * punished (:obj:`Measure.ReduceByUnknown`); the feature quality is 
    82           reduced by the proportion of unknown values. For impurity measures 
    83           the impurity decreases only where the value is defined and stays  
    84           the same otherwise, 
    85  
    86         * imputed (:obj:`Measure.UnknownsToCommon`); undefined values are 
    87           replaced by the most common value, 
    88  
    89         * treated as a separate value (:obj:`Measure.UnknownsAsValue`). 
     74        classes that are able to treat unknown values. Either 
     75        :obj:`IgnoreUnknowns`, :obj:`ReduceByUnknown`, 
     76        :obj:`UnknownsToCommon`, or :obj:`UnknownsAsValue`. 
     77 
     78    .. attribute:: IgnoreUnknowns 
     79 
     80        Constant. Examples for which the feature value is unknown are removed. 
     81 
     82    .. attribute:: ReduceByUnknown 
     83 
     84        Constant. Features with unknown values are  
     85        punished. The feature quality is reduced by the proportion of 
     86        unknown values. For impurity measures the impurity decreases 
     87        only where the value is defined and stays the same otherwise. 
     88 
     89    .. attribute:: UnknownsToCommon 
     90 
     91        Constant. Undefined values are replaced by the most common value. 
     92 
     93    .. attribute:: UnknownsAsValue 
     94 
     95        Constant. Unknown values are treated as a separate value. 
     96 
    9097 
    9198    .. method:: __call__(attribute, examples[, apriori_class_distribution][, weightID]) 
     
    107114        :type distribution: :obj:`Orange.statistics.distribution.Distribution` 
    108115 
    109  
    110         Abstract. Return a float: the higher the value, the better the feature 
    111         If the quality cannot be measured, return :obj:`Measure.Rejected`.  
    112  
    113         All measures need to support the first form, with the data on the input. 
    114  
    115         Not all classes will accept all kinds of arguments. Relief, for instance, 
    116         cannot be computed from contingencies alone. Besides, the feature and 
    117         the class need to be of the correct type for a particular measure. 
     116        :return: Feature score - the higher the value, the better the feature. 
     117          If the quality cannot be measured, return :obj:`Measure.Rejected`. 
     118        :rtype: float or :obj:`Measure.Rejected`. 
     119 
     120        Abstract.  
     121        
     122        All measures need to support the first form, with the data on 
     123        the input. 
     124 
     125        Not all classes will accept all kinds of arguments. :obj:`Relief`, 
     126        for instance, cannot be computed from contingencies 
     127        alone. Besides, the feature and the class need to be of the 
     128        correct type for a particular measure. 
    118129 
    119130        Different forms of the call enable optimization.  For instance, 
     
    123134        contingency itself. 
    124135 
    125         Data is given either as examples (and, optionally, id for 
    126         meta-feature with weight), contingency tables or distributions 
     136        Data is given either as examples, contingency tables or distributions 
    127137        for all attributes. In the latter form, what is given as 
    128138        the class distribution depends upon what you do with unknown 
     
    135145    .. method:: threshold_function(attribute, examples[, weightID]) 
    136146     
    137         Abstract. Assess different binarizations of the continuous feature 
     147        Abstract.  
     148         
     149        Assess different binarizations of the continuous feature 
    138150        :obj:`attribute`.  Return a list of tuples, where the first 
    139151        element is a threshold (between two existing values), the second 
     
    147159 
    148160    The script below shows different ways to assess the quality of astigmatic, 
    149     tear rate and the first feature (whichever it is) in the dataset lenses. 
     161    tear rate and the first feature in the dataset lenses. 
    150162 
    151163    .. literalinclude:: code/scoring-info-lenses.py 
     
    159171        0.548794984818 
    160172 
    161     You shouldn't use this shortcut with ReliefF, though; see the explanation 
    162     in the section on ReliefF. 
    163  
    164     XXXXXXXX It is also possible to assess the quality of features that do not exist 
    165     in the features. For instance, you can assess the quality of discretized 
    166     features without constructing a new domain and dataset that would include 
    167     them. 
    168  
    169     `scoring-info-iris.py`_ (uses `iris.tab`_): 
     173    You shouldn't use this shortcut with :obj:`Relief`; the :obj:`Relief` documentation explains why. 
     174 
     175    It is also possible to score features that are not  
     176    in the domain. For instance, you can score discretized 
     177    features on the fly: 
    170178 
    171179    .. literalinclude:: code/scoring-info-iris.py 
    172180        :lines: 7-11 
    173181 
    174     The quality of the new feature d1 is assessed on data, which does not 
    175     include the new feature at all. (Note that ReliefF won't do that since 
    176     it would be too slow. ReliefF requires the feature to be present in the 
    177     dataset.) 
    178  
    179     Finally, you can compute the quality of meta-features. The following 
    180     script adds a meta-feature to an example table, initializes it to random 
    181     values and measures its information gain. 
    182  
    183     `scoring-info-lenses.py`_ (uses `lenses.tab`_): 
    184  
    185     .. literalinclude:: code/scoring-info-lenses.py 
    186         :lines: 54- 
     182    Note that this is not possible with :obj:`Relief`, as it would be too slow. 
    187183 
    188184    To show the computation of thresholds, we shall use the Iris data set. 
     
    210206    from having to compute the contingency matrix by defining the first two 
    211207    forms of call operator. (Well, that's not something you need to know if 
    212     you only work in Python.) Additional feature of this class is that you can 
    213     set probability estimators. If none are given, probabilities and 
    214     conditional probabilities of classes are estimated by relative frequencies. 
     208    you only work in Python.) 
    215209 
    216210    .. attribute:: unknowns_treatment 
    217211      
    218         Defines what to do with unknown values. See the possibilities described above. 
     212        See :obj:`Measure.unknowns_treatment`. 
    219213 
    220214    .. attribute:: estimator_constructor 
    221215    .. attribute:: conditional_estimator_constructor 
    222216     
    223         The classes that are used to estimate unconditional and conditional 
    224         probabilities of classes, respectively. You can set this to, for instance,  
    225         :obj:`ProbabilityEstimatorConstructor_m` and  
    226         :obj:`ConditionalProbabilityEstimatorConstructor_ByRows` 
    227         (with estimator constructor again set to  
     217        The classes that are used to estimate unconditional and 
     218        conditional probabilities of classes, respectively. You can set 
     219        this to, for instance, :obj:`ProbabilityEstimatorConstructor_m` 
     220        and :obj:`ConditionalProbabilityEstimatorConstructor_ByRows` 
     221        (with estimator constructor again set to 
    228222        :obj:`ProbabilityEstimatorConstructor_m`), respectively. 
     223        Both default to relative frequencies. 
    229224 
    230225=========================== 
     
    232227=========================== 
    233228 
    234 This script scores features with gain ratio and relief. 
    235  
    236 `scoring-relief-gainRatio.py`_ (uses `voting.tab`_): 
     229This script uses :obj:`GainRatio` and :obj:`Relief`. 
    237230 
    238231.. literalinclude:: code/scoring-relief-gainRatio.py 
    239232    :lines: 7- 
    240233 
    241 Notice that on this data the ranks of features match rather well:: 
     234Notice that on this data the ranks of features match:: 
    242235     
    243236    Relief GainRt Feature 
     
    248241    0.166  0.345  adoption-of-the-budget-resolution 
    249242 
    250 See  `scoring-info-lenses.py`_, `scoring-info-iris.py`_, 
    251 `scoring-diff-measures.py`_ and `scoring-regression.py`_ 
    252 for examples on their use. 
    253  
    254 Found in Orange: 
    255 'MeasureAttribute_IM', 'MeasureAttribute_chiSquare', 'MeasureAttribute_gainRatioA', 'MeasureAttribute_logOddsRatio', 'MeasureAttribute_splitGain' 
     243Undocumented: MeasureAttribute_IM, MeasureAttribute_chiSquare, MeasureAttribute_gainRatioA, MeasureAttribute_logOddsRatio, MeasureAttribute_splitGain. 
    256244 
    257245.. index::  
     
    331319    .. attribute:: check_cached_data 
    332320     
    333         Check if the cached data changed. Defaults to True. Best left alone. 
    334  
    335     ReliefF is slow since it needs to find k nearest 
    336     neighbours for each of m reference examples. 
    337     Since we normally compute ReliefF for all features in the dataset, 
    338     :obj:`Relief` caches the results. When it is called to compute a quality of 
    339     certain feature, it computes qualities for all features in the dataset. 
    340     When called again, it uses the stored results if the data has not changeddomain 
    341     is still the same and the example table has not changed. Checking is done by 
    342     comparing the data table version :obj:`Orange.data.Table` for details) and then 
    343     computing a checksum of the data and comparing it with the previous checksum. 
    344     The latter can take some time on large tables, so you may want to disable it 
    345     by setting `checkCachedData` to :obj:`False`. In most cases it will do no harm, 
    346     except when the data is changed in such a way that it passed unnoticed by the  
    347     version' control, in which cases the computed ReliefFs can be false. Hence: 
    348     disable it if you know that the data does not change or if you know what kind 
    349     of changes are detected by the version control. 
    350  
    351     Caching will only have an effect if you use the same instance for all 
    352     features in the domain. So, don't do this:: 
     321        Check whether the cached data has changed, using a data 
     322        checksum. Slow on large tables. Defaults to True. Disable it 
     323        if you know that the data will not change. 
     324 
     325    ReliefF is slow since it needs to find k nearest neighbours for each 
     326    of m reference examples.  As we normally compute ReliefF for all 
     327    features in the dataset, :obj:`Relief` caches the results. When called 
     328    to score a certain feature, it computes all feature scores. 
     329    When called again, it uses the stored results if the domain and the 
     330    data table have not changed (data table version and the data checksum 
     331    are compared). Caching will only work if you use the same instance. 
     332    So, don't do this:: 
    353333 
    354334        for attr in data.domain.attributes: 
    355335            print Orange.feature.scoring.Relief(attr, data) 
    356336 
    357     In this script, cached data dies together with the instance of :obj:`Relief`, 
    358     which is constructed and destructed for each feature separately. It's way 
    359     faster to go like this:: 
     337    Instead, reuse one instance:: 
    360338 
    361339        meas = Orange.feature.scoring.Relief() 
     
    363341            print meas(attr, data) 
    364342 
    365     When called for the first time, meas will compute ReliefF for all features 
    366     and the subsequent calls simply return the stored data. 
    367  
    368343    Class :obj:`Relief` works on discrete and continuous classes and thus  
    369344    implements functionality of algorithms ReliefF and RReliefF. 
    370345 
    371346    .. note:: 
    372        ReliefF can also compute the threshold function, that is, the feature 
     347       Relief can also compute the threshold function, that is, the feature 
    373348       quality at different thresholds for binarization. 
    374349 
     
    399374============ 
    400375 
    401 .. autoclass:: Orange.feature.scoring.OrderAttributesByMeasure 
     376.. autoclass:: Orange.feature.scoring.OrderAttributes 
    402377   :members: 
    403378 
    404 .. automethod:: Orange.feature.scoring.MeasureAttribute_Distance 
    405  
    406 .. autoclass:: Orange.feature.scoring.MeasureAttribute_DistanceClass 
     379.. autofunction:: Orange.feature.scoring.Distance 
     380 
     381.. autoclass:: Orange.feature.scoring.DistanceClass 
    407382   :members: 
    408383    
    409 .. automethod:: Orange.feature.scoring.MeasureAttribute_MDL 
    410  
    411 .. autoclass:: Orange.feature.scoring.MeasureAttribute_MDLClass 
     384.. autofunction:: Orange.feature.scoring.MDL 
     385 
     386.. autoclass:: Orange.feature.scoring.MDLClass 
    412387   :members: 
    413388 
    414 .. automethod:: Orange.feature.scoring.mergeAttrValues 
    415  
    416 .. automethod:: Orange.feature.scoring.attMeasure 
     389.. autofunction:: Orange.feature.scoring.merge_values 
     390 
     391.. autofunction:: Orange.feature.scoring.measure_domain 
    417392 
    418393========== 
     
    438413 
    439414import Orange.core as orange 
     415import Orange.misc 
    440416 
    441417from orange import MeasureAttribute as Measure 
     
    452428###### 
    453429# from orngEvalAttr.py 
    454 class OrderAttributesByMeasure: 
    455     """Construct an instance that orders features by their scores. 
     430class OrderAttributes: 
     431    """Orders features by their scores. 
    456432     
    457433    .. attribute::  measure 
     
    483459        return [x[0] for x in measured] 
    484460 
    485 def MeasureAttribute_Distance(attr=None, data=None): 
    486     """Instantiate :obj:`MeasureAttribute_DistanceClass` and use it to return 
     461def Distance(attr=None, data=None): 
     462    """Instantiate :obj:`DistanceClass` and use it to return 
    487463    the score of a given feature on given data. 
    488464     
     
    494470     
    495471    """ 
    496     m = MeasureAttribute_DistanceClass() 
     472    m = DistanceClass() 
    497473    if attr != None and data != None: 
    498474        return m(attr, data) 
     
    500476        return m 
    501477 
    502 class MeasureAttribute_DistanceClass(orange.MeasureAttribute): 
     478class DistanceClass(Measure): 
    503479    """The 1-D feature distance measure described in Kononenko.""" 
    504480 
    505     def __call__(self, attr, data, aprioriDist=None, weightID=None): 
     481    @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) 
     482    def __call__(self, attr, data, apriori_dist=None, weightID=None): 
    506483        """Take :obj:`Orange.data.table` data table and score the given  
    507484        :obj:`Orange.data.variable`. 
     
    513490        :type data: Orange.data.table 
    514491 
    515         :param aprioriDist:  
    516         :type aprioriDist: 
     492        :param apriori_dist:  
     493        :type apriori_dist: 
    517494         
    518495        :param weightID: meta feature used to weight individual data instances 
     
    535512            return 0 
    536513 
    537 def MeasureAttribute_MDL(attr=None, data=None): 
    538     """Instantiate :obj:`MeasureAttribute_MDLClass` and use it n given data to 
     514def MDL(attr=None, data=None): 
     515    """Instantiate :obj:`MDLClass` and use it on given data to 
    539516    return the feature's score.""" 
    540     m = MeasureAttribute_MDLClass() 
     517    m = MDLClass() 
    541518    if attr != None and data != None: 
    542519        return m(attr, data) 
     
    544521        return m 
    545522 
    546 class MeasureAttribute_MDLClass(orange.MeasureAttribute): 
     523class MDLClass(Measure): 
    547524    """Score feature based on the minimum description length principle.""" 
    548525 
    549     def __call__(self, attr, data, aprioriDist=None, weightID=None): 
     526    @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) 
     527    def __call__(self, attr, data, apriori_dist=None, weightID=None): 
    550528        """Take :obj:`Orange.data.table` data table and score the given  
    551529        :obj:`Orange.data.variable`. 
     
    557535        :type data: Orange.data.table 
    558536 
    559         :param aprioriDist:  
    560         :type aprioriDist: 
     537        :param apriori_dist:  
     538        :type apriori_dist: 
    561539         
    562540        :param weightID: meta feature used to weight individual data instances 
     
    598576    return ret 
    599577 
    600 def mergeAttrValues(data, attrList, attrMeasure, removeUnusedValues = 1): 
     578 
     579@Orange.misc.deprecated_keywords({"attrList": "attr_list", "attrMeasure": "attr_measure", "removeUnusedValues": "remove_unused_values"}) 
     580def merge_values(data, attr_list, attr_measure, remove_unused_values = 1): 
    601581    import orngCI 
    602     #data = data.select([data.domain[attr] for attr in attrList] + [data.domain.classVar]) 
    603     newData = data.select(attrList + [data.domain.classVar]) 
    604     newAttr = orngCI.FeatureByCartesianProduct(newData, attrList)[0] 
     582    #data = data.select([data.domain[attr] for attr in attr_list] + [data.domain.classVar]) 
     583    newData = data.select(attr_list + [data.domain.class_var]) 
     584    newAttr = orngCI.FeatureByCartesianProduct(newData, attr_list)[0] 
    605585    dist = orange.Distribution(newAttr, newData) 
    606586    activeValues = [] 
    607587    for i in range(len(newAttr.values)): 
    608588        if dist[newAttr.values[i]] > 0: activeValues.append(i) 
    609     currScore = attrMeasure(newAttr, newData) 
     589    currScore = attr_measure(newAttr, newData) 
    610590    while 1: 
    611591        bestScore, bestMerge = currScore, None 
    612592        for i1, ind1 in enumerate(activeValues): 
    613             oldInd1 = newAttr.getValueFrom.lookupTable[ind1] 
     593            oldInd1 = newAttr.get_value_from.lookupTable[ind1] 
    614594            for ind2 in activeValues[:i1]: 
    615                 newAttr.getValueFrom.lookupTable[ind1] = ind2 
    616                 score = attrMeasure(newAttr, newData) 
     595                newAttr.get_value_from.lookupTable[ind1] = ind2 
     596                score = attr_measure(newAttr, newData) 
    617597                if score >= bestScore: 
    618598                    bestScore, bestMerge = score, (ind1, ind2) 
    619                 newAttr.getValueFrom.lookupTable[ind1] = oldInd1 
     599                newAttr.get_value_from.lookupTable[ind1] = oldInd1 
    620600 
    621601        if bestMerge: 
    622602            ind1, ind2 = bestMerge 
    623603            currScore = bestScore 
    624             for i, l in enumerate(newAttr.getValueFrom.lookupTable): 
     604            for i, l in enumerate(newAttr.get_value_from.lookupTable): 
    625605                if not l.isSpecial() and int(l) == ind1: 
    626                     newAttr.getValueFrom.lookupTable[i] = ind2 
     606                    newAttr.get_value_from.lookupTable[i] = ind2 
    627607            newAttr.values[ind2] = newAttr.values[ind2] + "+" + newAttr.values[ind1] 
    628608            del activeValues[activeValues.index(ind1)] 
     
    630610            break 
    631611 
    632     if not removeUnusedValues: 
     612    if not remove_unused_values: 
    633613        return newAttr 
    634614 
    635615    reducedAttr = orange.EnumVariable(newAttr.name, values = [newAttr.values[i] for i in activeValues]) 
    636     reducedAttr.getValueFrom = newAttr.getValueFrom 
    637     reducedAttr.getValueFrom.classVar = reducedAttr 
     616    reducedAttr.get_value_from = newAttr.get_value_from 
     617    reducedAttr.get_value_from.class_var = reducedAttr 
    638618    return reducedAttr 
    639619 
    640620###### 
    641621# from orngFSS 
    642 def attMeasure(data, measure=Relief(k=20, m=50)): 
     622def measure_domain(data, measure=Relief(k=20, m=50)): 
    643623    """Assess the quality of features using the given measure and return 
    644624    a sorted list of tuples (feature name, measure). 
     
    647627    :type data: :obj:`Orange.data.table` 
    648628    :param measure:  feature scoring function. Derived from 
    649       :obj:`Orange.feature.scoring.Measure`. Defaults to Defaults to  
     629      :obj:`Orange.feature.scoring.Measure`. Defaults to  
    650630      :obj:`Orange.feature.scoring.Relief` with k=20 and m=50. 
    651631    :type measure: :obj:`Orange.feature.scoring.Measure`  
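The caching discipline described for :obj:`Relief` above (score all features once, reuse the stored results while the domain and the data checksum are unchanged, reuse the same instance) can be sketched in plain Python. This is only an illustration of the scheme, not Orange's implementation; `CachingScorer` and `toy_score_all` are made-up names, and the toy scoring function just measures agreement with the class column.

```python
import hashlib

class CachingScorer:
    """Sketch of the Relief caching scheme: score all features once per
    dataset, serve later calls from the cache while the data checksum
    matches. Not Orange's code; score_all is a stand-in callback."""

    def __init__(self, score_all):
        self.score_all = score_all   # function: dataset -> {name: score}
        self._checksum = None
        self._scores = None

    def _digest(self, dataset):
        # cheap stand-in for Orange's data checksum
        return hashlib.md5(repr(sorted(dataset.items())).encode()).hexdigest()

    def __call__(self, feature, dataset):
        digest = self._digest(dataset)
        if digest != self._checksum:     # data changed: recompute everything
            self._scores = self.score_all(dataset)
            self._checksum = digest
        return self._scores[feature]     # otherwise served from the cache

# toy "score all features" function: fraction of rows where the feature
# value equals the class label (purely illustrative)
def toy_score_all(dataset):
    labels = dataset["class"]
    return {name: sum(v == c for v, c in zip(vals, labels)) / float(len(labels))
            for name, vals in dataset.items() if name != "class"}

data = {"a": [0, 1, 1], "b": [1, 1, 0], "class": [0, 1, 1]}
meas = CachingScorer(toy_score_all)
print(meas("a", data))   # first call computes scores for all features
print(meas("b", data))   # second call reuses the cached scores
```

As in the Relief example above, the point is that the cache only helps when a single scorer instance is reused across features.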
  • orange/Orange/feature/selection.py

    r8112 r8115  
    168168import Orange.core as orange 
    169169 
    170 from Orange.feature.scoring import attMeasure 
     170from Orange.feature.scoring import measure_domain 
    171171 
    172172# from orngFSS 
    173173def bestNAtts(scores, N): 
    174174    """Return the best N features (without scores) from the list returned 
    175     by function :obj:`Orange.feature.scoring.attMeasure`. 
    176      
    177     :param scores: a list such as one returned by  
    178       :obj:`Orange.feature.scoring.attMeasure` 
     175    by :obj:`Orange.feature.scoring.measure_domain`. 
     176     
     177    :param scores: a list such as returned by  
     178      :obj:`Orange.feature.scoring.measure_domain` 
    179179    :type scores: list 
    180180    :param N: number of best features to select.  
     
    187187def attsAboveThreshold(scores, threshold=0.0): 
    188188    """Return features (without scores) from the list returned by 
    189     :obj:`Orange.feature.scoring.attMeasure` with score above or 
     189    :obj:`Orange.feature.scoring.measure_domain` with score above or 
    190190    equal to a specified threshold. 
    191191     
    192192    :param scores: a list such as one returned by 
    193       :obj:`Orange.feature.scoring.attMeasure` 
     193      :obj:`Orange.feature.scoring.measure_domain` 
    194194    :type scores: list 
    195195    :param threshold: score threshold for attribute selection. Defaults to 0. 
     
    208208    :type data: Orange.data.table 
    209209    :param scores: a list such as one returned by  
    210       :obj:`Orange.feature.scoring.attMeasure` 
     210      :obj:`Orange.feature.scoring.measure_domain` 
    211211    :type scores: list 
    212212    :param N: number of features to select 
     
    221221    """Construct and return a new set of examples that includes a class and  
    222222    features from the list returned by  
    223     :obj:`Orange.feature.scoring.attMeasure` that have the score above or  
     223    :obj:`Orange.feature.scoring.measure_domain` that have the score above or  
    224224    equal to a specified threshold. 
    225225     
     
    227227    :type data: Orange.data.table 
    228228    :param scores: a list such as one returned by 
    229       :obj:`Orange.feature.scoring.attMeasure`     
     229      :obj:`Orange.feature.scoring.measure_domain`     
    230230    :type scores: list 
    231231    :param threshold: score threshold for attribute selection. Defaults to 0. 
     
    256256     
    257257    """ 
    258     measl = attMeasure(data, measure) 
     258    measl = measure_domain(data, measure) 
    259259    while len(data.domain.attributes)>0 and measl[-1][1]<margin: 
    260260        data = selectBestNAtts(data, measl, len(data.domain.attributes)-1) 
    261261#        print 'remaining ', len(data.domain.attributes) 
    262         measl = attMeasure(data, measure) 
     262        measl = measure_domain(data, measure) 
    263263    return data 
    264264 
     
    307307 
    308308        """ 
    309         ma = attMeasure(data, self.measure) 
     309        ma = measure_domain(data, self.measure) 
    310310        return selectAttsAboveThresh(data, ma, self.threshold) 
    311311 
     
    330330        self.n = n 
    331331    def __call__(self, data): 
    332         ma = attMeasure(data, self.measure) 
     332        ma = measure_domain(data, self.measure) 
    333333        self.n = min(self.n, len(data.domain.attributes)) 
    334334        return selectBestNAtts(data, ma, self.n) 
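The selection helpers in selection.py all consume the sorted (feature name, score) list that :obj:`measure_domain` returns. A minimal plain-Python sketch of the two basic filters (best N, above threshold) looks like this; `best_n` and `above_threshold` are hypothetical names, not the module's own functions, and the scores are invented for illustration.

```python
def best_n(scores, n):
    # scores is a list of (name, score) tuples, sorted descending by
    # score, as measure_domain returns them; keep the n best names
    return [name for name, _ in scores[:n]]

def above_threshold(scores, threshold=0.0):
    # keep names whose score is at or above the threshold
    return [name for name, score in scores if score >= threshold]

scores = [("physician-fee-freeze", 0.301),
          ("el-salvador-aid", 0.255),
          ("synfuels", 0.052),
          ("water-project", -0.004)]
print(best_n(scores, 2))          # the two top-scored feature names
print(above_threshold(scores))    # drops the negative-scored feature
```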
  • orange/doc/Orange/rst/code/scoring-all.py

    r8042 r8115  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.attMeasure, Orange.features.scoring.GainRatio 
     5# Classes:     Orange.feature.scoring.measure_domain, Orange.features.scoring.GainRatio 
    66 
    77import Orange 
     
    99 
    1010print 'Feature scores for best three features:' 
    11 ma = Orange.feature.scoring.attMeasure(table) 
     11ma = Orange.feature.scoring.measure_domain(table) 
    1212for m in ma[:3]: 
    1313    print "%5.3f %s" % (m[1], m[0]) 
  • orange/doc/Orange/rst/code/scoring-diff-measures.py

    r8042 r8115  
    33# Uses:        measure 
    44# Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.attMeasure, Orange.features.scoring.Info, Orange.features.scoring.GainRatio, Orange.features.scoring.Gini, Orange.features.scoring.Relevance, Orange.features.scoring.Cost, Orange.features.scoring.Relief 
     5# Classes:     Orange.feature.scoring.measure_domain, Orange.features.scoring.Info, Orange.features.scoring.GainRatio, Orange.features.scoring.Gini, Orange.features.scoring.Relevance, Orange.features.scoring.Cost, Orange.features.scoring.Relief 
    66 
    77import Orange 
     
    2424    print fstr % (("- no unknowns:",) + tuple([meas(i, table) for i in range(attrs)])) 
    2525 
    26     meas.unknownsTreatment = meas.IgnoreUnknowns 
     26    meas.unknowns_treatment = meas.IgnoreUnknowns 
    2727    print fstr % (("- ignore unknowns:",) + tuple([meas(i, table2) for i in range(attrs)])) 
    2828 
    29     meas.unknownsTreatment = meas.ReduceByUnknowns 
     29    meas.unknowns_treatment = meas.ReduceByUnknowns 
    3030    print fstr % (("- reduce unknowns:",) + tuple([meas(i, table2) for i in range(attrs)])) 
    3131 
    32     meas.unknownsTreatment = meas.UnknownsToCommon 
     32    meas.unknowns_treatment = meas.UnknownsToCommon 
    3333    print fstr % (("- unknowns to common:",) + tuple([meas(i, table2) for i in range(attrs)])) 
    3434 
    35     meas.unknownsTreatment = meas.UnknownsAsValue 
     35    meas.unknowns_treatment = meas.UnknownsAsValue 
    3636    print fstr % (("- unknowns as value:",) + tuple([meas(i, table2) for i in range(attrs)])) 
    3737    print 
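The script above cycles through the `unknowns_treatment` constants. To make the difference concrete, here is a toy illustration of what three of the treatments do to a feature column (`None` marking an unknown). The `prepare` function is invented for this sketch; note that `ReduceByUnknown` is omitted because it is a score penalty (quality reduced by the proportion of unknowns), not a column transformation.

```python
from collections import Counter

def prepare(values, treatment):
    """Toy illustration of unknowns_treatment options; None marks an
    unknown value. This shows what happens to the column only, not how
    each measure weighs the result."""
    known = [v for v in values if v is not None]
    if treatment == "IgnoreUnknowns":        # drop rows with unknowns
        return known
    if treatment == "UnknownsToCommon":      # impute the most common value
        common = Counter(known).most_common(1)[0][0]
        return [common if v is None else v for v in values]
    if treatment == "UnknownsAsValue":       # unknown becomes its own value
        return ["?" if v is None else v for v in values]
    raise ValueError(treatment)

col = ["y", None, "n", "y", None]
print(prepare(col, "IgnoreUnknowns"))    # ['y', 'n', 'y']
print(prepare(col, "UnknownsToCommon"))  # ['y', 'y', 'n', 'y', 'y']
print(prepare(col, "UnknownsAsValue"))   # ['y', '?', 'n', 'y', '?']
```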
  • orange/doc/Orange/rst/code/scoring-info-iris.py

    r8042 r8115  
    1212 
    1313meas = Orange.feature.scoring.Relief() 
    14 for t in meas.thresholdFunction("petal length", table): 
     14for t in meas.threshold_function("petal length", table): 
    1515    print "%5.3f: %5.3f" % t 
    1616 
    17 thresh, score, distr = meas.bestThreshold("petal length", table) 
     17thresh, score, distr = meas.best_threshold("petal length", table) 
    1818print "\nBest threshold: %5.3f (score %5.3f)" % (thresh, score) 
  • orange/doc/Orange/rst/code/scoring-info-lenses.py

    r8042 r8115  
    55# Classes:     Orange.feature.scoring.Measure, Orange.features.scoring.Info 
    66 
    7 import Orange 
    8 import random 
     7import Orange, random 
     8 
    99table = Orange.data.Table("lenses") 
    1010 
     
    1414print "Information gain of 'astigmatic': %6.4f" % meas(astigm, table) 
    1515 
    16 classdistr = Orange.data.value.Distribution(table.domain.classVar, table) 
    17 cont = Orange.probability.distributions.ContingencyAttrClass("tear_rate", table) 
     16classdistr = Orange.statistics.distribution.Distribution(table.domain.class_var, table) 
     17cont = Orange.statistics.contingency.VarClass("tear_rate", table) 
    1818print "Information gain of 'tear_rate': %6.4f" % meas(cont, classdistr) 
    1919 
    20 dcont = Orange.probability.distributions.DomainContingency(table) 
     20dcont = Orange.statistics.contingency.Domain(table) 
    2121print "Information gain of the first attribute: %6.4f" % meas(0, dcont) 
    2222print 
     
    3838print 
    3939 
    40 dcont = Orange.probability.distributions.DomainContingency(table) 
     40dcont = Orange.statistics.contingency.Domain(table) 
    4141print "Computing information gain from DomainContingency" 
    4242print fstr % (("- by attribute number:",) + tuple([meas(i, dcont) for i in range(attrs)])) 
     
    4646 
    4747print "Computing information gain from DomainContingency" 
    48 cdist = Orange.data.value.Distribution(table.domain.classVar, table) 
    49 print fstr % (("- by attribute number:",) + tuple([meas(Orange.probability.distributions.ContingencyAttrClass(i, table), cdist) for i in range(attrs)])) 
    50 print fstr % (("- by attribute name:",) + tuple([meas(Orange.probability.distributions.ContingencyAttrClass(i, table), cdist) for i in names])) 
    51 print fstr % (("- by attribute descriptor:",) + tuple([meas(Orange.probability.distributions.ContingencyAttrClass(i, table), cdist) for i in table.domain.attributes])) 
     48cdist = Orange.statistics.distribution.Distribution(table.domain.class_var, table) 
     49print fstr % (("- by attribute number:",) + tuple([meas(Orange.statistics.contingency.VarClass(i, table), cdist) for i in range(attrs)])) 
     50print fstr % (("- by attribute name:",) + tuple([meas(Orange.statistics.contingency.VarClass(i, table), cdist) for i in names])) 
     51print fstr % (("- by attribute descriptor:",) + tuple([meas(Orange.statistics.contingency.VarClass(i, table), cdist) for i in table.domain.attributes])) 
    5252print 
    5353 
    5454values = ["v%i" % i for i in range(len(table.domain[2].values)*len(table.domain[3].values))] 
    5555cartesian = Orange.data.variable.Discrete("cart", values = values) 
    56 cartesian.getValueFrom = Orange.classification.lookup.ClassifierByLookupTable(cartesian, table.domain[2], table.domain[3], values) 
     56cartesian.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(cartesian, table.domain[2], table.domain[3], values) 
    5757 
    5858print "Information gain of Cartesian product of %s and %s: %6.4f" % (table.domain[2].name, table.domain[3].name, meas(cartesian, table)) 
    5959 
    6060mid = Orange.core.newmetaid() 
    61 table.domain.addmeta(mid, Orange.data.variable.Discrete(values = ["v0", "v1"])) 
    62 table.addMetaAttribute(mid) 
     61table.domain.add_meta(mid, Orange.data.variable.Discrete(values = ["v0", "v1"])) 
     62table.add_meta_attribute(mid) 
    6363 
    6464rg = random.Random() 
    6565rg.seed(0) 
    6666for ex in table: 
    67     ex[mid] = Orange.data.value.Value(rg.randint(0, 1)) 
     67    ex[mid] = Orange.data.Value(rg.randint(0, 1)) 
    6868 
    6969print "Information gain for a random meta attribute: %6.4f" % meas(mid, table) 
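The hunks above exercise Orange's `InfoGain` measure against tables, contingencies, and meta attributes. As a reference point, here is a minimal pure-Python sketch of the quantity being computed, H(C) − H(C | A), independent of the Orange API; the function and variable names are illustrative, not part of Orange:

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = float(len(labels))
    return -sum((c / n) * log(c / n, 2) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Information gain of an attribute: H(C) - H(C | A)."""
    n = float(len(labels))
    groups = {}
    # Partition the class labels by the attribute's value.
    for a, c in zip(attr_values, labels):
        groups.setdefault(a, []).append(c)
    # Expected entropy of the class within each attribute group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# A perfectly informative attribute recovers the full class entropy;
# a constant attribute scores 0.
print("%.4f" % info_gain(["y", "y", "n", "n"], ["+", "+", "-", "-"]))
```

This mirrors what the examples above obtain from `meas(attribute, table)`, just without Orange's contingency machinery or unknown-value handling.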
  • orange/doc/Orange/rst/code/scoring-regression.py

    r8042 r8115  
    55# Classes:     Orange.feature.scoring.MSE 
    66 
    7 import Orange 
    8 import random 
     7import Orange, random 
     8 
    99data = Orange.data.Table("measure-c") 
    1010 
     
    2424    print fstr % (("- no unknowns:",) + tuple([meas(i, data) for i in range(attrs)])) 
    2525 
    26     meas.unknownsTreatment = meas.IgnoreUnknowns 
     26    meas.unknowns_treatment = meas.IgnoreUnknowns 
    2727    print fstr % (("- ignore unknowns:",) + tuple([meas(i, data2) for i in range(attrs)])) 
    2828 
    29     meas.unknownsTreatment = meas.ReduceByUnknowns 
     29    meas.unknowns_treatment = meas.ReduceByUnknowns 
    3030    print fstr % (("- reduce unknowns:",) + tuple([meas(i, data2) for i in range(attrs)])) 
    3131 
    32     meas.unknownsTreatment = meas.UnknownsToCommon 
     32    meas.unknowns_treatment = meas.UnknownsToCommon 
    3333    print fstr % (("- unknowns to common:",) + tuple([meas(i, data2) for i in range(attrs)])) 
    3434    print 
  • orange/doc/Orange/rst/code/scoring-relief-caching.py

    r7510 r8115  
    11# Description: Shows why ReliefF needs to check the cached neighbours 
    2 # Category:    feature scoring 
     2# Category:    statistics 
     3# Classes:     MeasureAttribute_relief 
    34# Uses:        iris 
    4 # Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.Relief 
     5# Referenced:  MeasureAttribute.htm 
    66 
    77import orange 
     
    99 
    1010r1 = orange.MeasureAttribute_relief() 
    11 r2 = orange.MeasureAttribute_relief(checkCachedData = False) 
     11r2 = orange.MeasureAttribute_relief(check_cached_data = False) 
    1212 
    1313print "%.3f\t%.3f" % (r1(0, data), r2(0, data)) 
  • orange/doc/Orange/rst/code/scoring-relief-gainRatio.py

    r8042 r8115  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#scoring 
    5 # Classes:     Orange.feature.scoring.attMeasure, Orange.features.scoring.GainRatio 
 5# Classes:     Orange.feature.scoring.measure_domain, Orange.feature.scoring.GainRatio 
    66 
    77import Orange 
     
    99 
    1010print 'Relief GainRt Feature' 
    11 ma_def = Orange.feature.scoring.attMeasure(table) 
     11ma_def = Orange.feature.scoring.measure_domain(table) 
    1212gr = Orange.feature.scoring.GainRatio() 
    13 ma_gr  = Orange.feature.scoring.attMeasure(table, gr) 
     13ma_gr  = Orange.feature.scoring.measure_domain(table, gr) 
    1414for i in range(5): 
    1515    print "%5.3f  %5.3f  %s" % (ma_def[i][1], ma_gr[i][1], ma_def[i][0]) 
  • orange/doc/Orange/rst/code/selection-bayes.py

    r8042 r8115  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#selection 
    5 # Classes:     Orange.feature.scoring.attMeasure, Orange.feature.selection.bestNAtts 
     5# Classes:     Orange.feature.scoring.measure_domain, Orange.feature.selection.bestNAtts 
    66 
    77import Orange 
    8 import orngTest, orngEval 
     8 
    99 
    1010class BayesFSS(object): 
     
    2121       
    2222    def __call__(self, table, weight=None): 
    23         ma = Orange.feature.scoring.attMeasure(table) 
     23        ma = Orange.feature.scoring.measure_domain(table) 
    2424        filtered = Orange.feature.selection.selectBestNAtts(table, ma, self.N) 
    2525        model = Orange.classification.bayes.NaiveLearner(filtered) 
     
    3333        return self.classifier(example, resultType) 
    3434 
     35 
    3536# test above wraper on a data set 
    36 import orngStat, orngTest 
    3737table = Orange.data.Table("voting") 
    3838learners = (Orange.classification.bayes.NaiveLearner(name='Naive Bayes'), 
    3939            BayesFSS(name="with FSS")) 
    40 results = orngTest.crossValidation(learners, table) 
     40results = Orange.evaluation.testing.cross_validation(learners, table) 
    4141 
    4242# output the results 
    4343print "Learner      CA" 
    4444for i in range(len(learners)): 
    45     print "%-12s %5.3f" % (learners[i].name, orngStat.CA(results)[i]) 
     45    print "%-12s %5.3f" % (learners[i].name, Orange.evaluation.scoring.CA(results)[i]) 
  • orange/doc/Orange/rst/code/selection-best3.py

    r7319 r8115  
    33# Uses:        voting 
    44# Referenced:  Orange.feature.html#selection 
    5 # Classes:     Orange.feature.scoring.attMeasure, Orange.feature.selection.bestNAtts 
     5# Classes:     Orange.feature.scoring.measure_domain, Orange.feature.selection.bestNAtts 
    66 
    77import Orange 
     
    99 
    1010n = 3 
    11 ma = Orange.feature.scoring.attMeasure(table) 
     11ma = Orange.feature.scoring.measure_domain(table) 
    1212best = Orange.feature.selection.bestNAtts(ma, n) 
    1313print 'Best %d features:' % n 
  • orange/orngEvalAttr.py

    r8111 r8115  
    11from Orange.feature.scoring import * 
     2 
     3mergeAttrValues = merge_values 
     4 
     5MeasureAttribute_MDL = MDL 
     6MeasureAttribute_MDLClass = MDLClass 
     7 
     8MeasureAttribute_Distance = Distance 
     9MeasureAttribute_DistanceClass = DistanceClass 
     10 
     11OrderAttributesByMeasure = OrderAttributes 
     12 
     13 
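The aliases added above keep old camelCase names (e.g. `MeasureAttribute_MDL`) working after the rename to underscore_separated names. The modules in this changeset alias silently; a sketch of the same backward-compatibility pattern, with an optional deprecation warning added for illustration (all names here are hypothetical stand-ins), looks like this:

```python
import warnings

def measure_domain(table, measure=None):
    """New underscore_separated entry point (toy body for illustration)."""
    return sorted(table)

def _deprecated_alias(new_func, old_name):
    """Wrap a renamed function so the old camelCase name still works."""
    def wrapper(*args, **kwargs):
        warnings.warn("%s is deprecated; use %s" % (old_name, new_func.__name__),
                      DeprecationWarning, stacklevel=2)
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

# Old scripts that call attMeasure(...) keep working:
attMeasure = _deprecated_alias(measure_domain, "attMeasure")
```

A plain `attMeasure = measure_domain` assignment, as in the actual hunks, is the lighter-weight choice; the wrapper variant additionally tells callers which name to migrate to.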
  • orange/orngFSS.py

    r8042 r8115  
    66#This was in the old module 
    77attsAbovethreshold = attsAboveThreshold 
     8 
     9attMeasure = measure_domain 
Note: See TracChangeset for help on using the changeset viewer.