Changeset 7761:8ac2e36f9363 in orange


Timestamp:
03/17/11 11:49:54 (3 years ago)
Author:
lanz <lan.zagar@…>
Branch:
default
Convert:
b466673e74b364acc0c26b3965c6894ce29d9929
Message:

Documentation and code refactoring for the lookup module.

Location:
orange
Files:
2 edited

  • orange/Orange/classification/lookup.py

    r7754 r7761  
    3434 
    3535    >>> for i in range(5): 
    36     ...     print data2.randomexample() 
    37     ['1', '1', 'yes', '4', 'no', '1'] 
    38     ['3', '3', 'yes', '2', 'no', '1'] 
    39     ['2', '1', 'no', '4', 'no', '0'] 
    40     ['2', '1', 'no', '1', 'yes', '1'] 
    41     ['1', '1', 'yes', '3', 'no', '1'] 
     36    ...     print table2.randomexample() 
     37    ['3', '2', 'no', '2', 'no', '0'] 
     38    ['2', '2', 'yes', '2', 'no', '1'] 
     39    ['1', '2', 'no', '2', 'no', '0'] 
     40    ['2', '3', 'no', '1', 'yes', '1'] 
     41    ['1', '3', 'no', '1', 'yes', '1'] 
    4242 
    4343The first :obj:`ClassifierByLookupTable` takes values of features a 
     
    8686 
    8787 
    88 .. py:class:: ClassifierByLookupTable(classVar, variable1[, variable2[, variable3]] [, lookupTable[, distributions]]) 
    89      
    90     A general constructor that, based on the number of attribute 
     88.. py:class:: ClassifierByLookupTable(class_var, variable1[, variable2[, variable3]] [, lookup_table[, distributions]]) 
     89     
     90    A general constructor that, based on the number of feature 
    9191    descriptors, constructs one of the three classes discussed. 
    92     If lookupTable and distributions are omitted, constructor also 
    93     initializes lookupTable and distributions to two lists of the 
     92    If lookup_table and distributions are omitted, constructor also 
     93    initializes lookup_table and distributions to two lists of the 
    9494    right sizes, but their elements are don't knows and empty 
    9595    distributions. If they are given, they must be of correct size. 
     
    113113        ClassifierByLookupTable3 (all three). 
    114114 
    115     .. attribute:: lookupTable (read only) 
     115    .. attribute:: lookup_table (read only) 
    116116         
    117117        A list of values (ValueList), one for each possible combination of 
     
    130130    .. attribute:: distributions (read only) 
    131131         
    132         Similar to :obj:`lookupTable`, but is of type DistributionList 
     132        Similar to :obj:`lookup_table`, but is of type DistributionList 
    133133        and stores a distribution for each combination of values.  
    134134 
     
    139139        it to make predictions when one or more feature values are unknown. 
    140140        ClassifierByLookupTable1 doesn't need it since this case is covered 
    141         by an additional element in lookupTable and distributions, 
     141        by an additional element in lookup_table and distributions, 
    142142        as told above. 
    143143         
    144144    .. method:: getindex(example) 
    145145     
    146         Returns an index into lookupTable or distributions. The formula 
     146        Returns an index into lookup_table or distributions. The formula 
    147147        depends upon the type of the classifier. If value\ *i* is 
    148148        int(example[variable\ *i*]), then the corresponding formulae are 
    149149 
    150150        ClassifierByLookupTable1: 
    151             index = value1, or len(lookupTable)-1 if value is unknown 
     151            index = value1, or len(lookup_table)-1 if value is unknown 
    152152        ClassifierByLookupTable2: 
    153153            index = value1*noOfValues1 + value2, or -1 if any value is unknown  
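The index formulae above amount to a mixed-radix (row-major) encoding of the feature values. The following is a minimal Python sketch of that idea, not Orange's implementation. The name `lookup_index` and its arguments are hypothetical, and unknown values are modeled as None. The sketch multiplies the running index by the number of values of the next feature, which is an assumption, since the docstring writes noOfValues1. For Monk 1, where a and b both have three values, the two readings coincide and match the ab indices in the output listed in this file.

```python
def lookup_index(values, sizes):
    """Return a row-major index for zero-based `values`.

    `sizes[i]` is the number of possible values of feature i.
    Returns -1 if any value is unknown (None), as described for the
    two- and three-feature classifiers.
    """
    if any(v is None for v in values):
        return -1
    index = 0
    for value, size in zip(values, sizes):
        index = index * size + value
    return index


# a='3', b='2' become zero-based (2, 1); both features have 3 values.
print(lookup_index([2, 1], [3, 3]))     # 7
print(lookup_index([0, 2], [3, 3]))     # 2
print(lookup_index([1, None], [3, 3]))  # -1
```

The single-feature classifier is documented differently: an unknown value maps to the last element of the table rather than to -1, so it is left out of this sketch.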
     
    164164        Output:: 
    165165         
    166             ['1', '1', '2', '2', '4', '1', '1']: ab 0, e1 3 
    167             ['3', '3', '1', '2', '2', '1', '1']: ab 8, e1 1 
    168             ['2', '1', '2', '3', '4', '2', '0']: ab 3, e1 3 
    169             ['2', '1', '1', '2', '1', '1', '1']: ab 3, e1 0 
    170             ['1', '1', '1', '2', '3', '1', '1']: ab 0, e1 2  
    171  
    172  
    173  
    174 .. py:class:: ClassifierByLookupTable1(classVar, variable1 [, lookupTable, distributions]) 
     166            ['3', '2', '1', '2', '2', '1', '0']: ab 7, e1 1  
     167            ['2', '2', '1', '2', '2', '1', '1']: ab 4, e1 1  
     168            ['1', '2', '1', '2', '2', '2', '0']: ab 1, e1 1  
     169            ['2', '3', '2', '3', '1', '1', '1']: ab 5, e1 0  
     170            ['1', '3', '2', '2', '1', '1', '1']: ab 2, e1 0  
     171 
     172 
     173 
     174.. py:class:: ClassifierByLookupTable1(class_var, variable1 [, lookup_table, distributions]) 
    175175     
    176176    Uses a single feature for lookup. See 
    177177    :obj:`ClassifierByLookupTable` for more details. 
    178178 
    179 .. py:class:: ClassifierByLookupTable2(classVar, variable1, variable2, [, lookupTable[, distributions]]) 
     179.. py:class:: ClassifierByLookupTable2(class_var, variable1, variable2, [, lookup_table[, distributions]]) 
    180180     
    181181    Uses two features for lookup. See 
    182182    :obj:`ClassifierByLookupTable` for more details. 
    183183         
    184 .. py:class:: ClassifierByLookupTable3(classVar, variable1, variable2, variable3, [, lookupTable[, distributions]]) 
     184.. py:class:: ClassifierByLookupTable3(class_var, variable1, variable2, variable3, [, lookup_table[, distributions]]) 
    185185     
    186186    Uses three features for lookup. See 
     
    194194   single: classification; data table 
    195195 
    196 :obj:`ClassifierByExampleTable` is used in similar contexts as 
     196:obj:`ClassifierByDataTable` is used in similar contexts as 
    197197:obj:`ClassifierByLookupTable`. If you write, for instance, a 
    198198constructive induction algorithm, it is recommended that the values 
    199199of the new feature are computed either by one of classifiers by lookup 
    200 table or by ClassifierByExampleTable, depending on the number of bound 
     200table or by ClassifierByDataTable, depending on the number of bound 
    201201features. 
    202202 
    203 .. py:class:: ClassifierByExampleTable 
    204  
    205     :obj:`ClassifierByExampleTable` is the alternative to 
     203.. py:class:: ClassifierByDataTable 
     204 
     205    :obj:`ClassifierByDataTable` is the alternative to 
    206206    :obj:`ClassifierByLookupTable`. It is to be used when the 
    207207    classification is based on more than three features. Instead of having 
     
    210210     
    211211 
    212     .. attribute:: sortedExamples 
     212    .. attribute:: sorted_examples 
    213213         
    214214        A :obj:`Orange.data.Table` with sorted data instances for lookup. 
     
    228228         
    229229        A tuple with features in the domain. This field is here so that 
    230         :obj:`ClassifierByExampleTable` appears more similar to 
     230        :obj:`ClassifierByDataTable` appears more similar to 
    231231        :obj:`ClassifierByLookupTable`. If a constructive induction 
    232232        algorithm returns the result in one of these classifiers, and you 
     
    234234        regardless of the class you actually got. 
    235235 
    236     There are no specific methods for ClassifierByExampleTable. 
     236    There are no specific methods for ClassifierByDataTable. 
    237237    Since this is a classifier, it can be called. When the instance to be 
    238238    classified includes unknown values, :obj:`classifierForUnknown` will be 
     
    243243.. py:class:: LookupLearner 
    244244     
    245     Although :obj:`ClassifierByExampleTable` is not really a classifier in 
     245    Although :obj:`ClassifierByDataTable` is not really a classifier in 
    246246    the sense that you will use it to classify instances, but is rather a 
    247247    function for computation of intermediate values, it has an associated 
    248248    learner, :obj:`LookupLearner`. The learner's task is, basically, to 
    249     construct a Table for :obj:`sortedExamples`. It sorts them, merges them 
     249    construct a Table for :obj:`sorted_examples`. It sorts them, merges them 
    250250    and, of course, regards instance weights in the process as well. 
    251251     
     
    261261In data_s, we have prepared a table in which instances are described 
    262262only by a, b, e and the class. Learner constructs a 
    263 ClassifierByExampleTable and stores instances from data_s into its 
    264 sortedExamples. Instances are merged so that there are no duplicates. 
    265  
    266     >>> print len(data_s) 
    267     432 
    268     >>> print len(abe2.sortedExamples) 
     263ClassifierByDataTable and stores instances from data_s into its 
     264sorted_examples. Instances are merged so that there are no duplicates. 
     265 
     266    >>> print len(table_s) 
     267    556 
     268    >>> print len(abe.sorted_examples) 
    269269    36 
    270     >>> for i in abe2.sortedExamples[:5]: 
     270    >>> for i in abe.sorted_examples[:10]: 
    271271    ...     print i 
    272272    ['1', '1', '1', '1'] 
     
    288288and none in the other. 
    289289 
    290     >>> for i in abe2.sortedExamples[:10]: 
     290    >>> for i in abe.sorted_examples[:10]: 
    291291    ...     print i, i.getclass().svalue 
    292292    ['1', '1', '1', '1'] <0.000, 12.000> 
     
    301301    ['1', '3', '2', '0'] <12.000, 0.000> 
    302302 
    303 ClassifierByExampleTable will usually be used by getValueFrom. So, we 
     303ClassifierByDataTable will usually be used by getValueFrom. So, we 
    304304would probably continue this by constructing a new feature and put the 
    305305classifier into its getValueFrom. 
    306306 
    307     >>> y2 = orange.EnumVariable("y2", values = ["0", "1"]) 
     307    >>> y2 = Orange.data.variable.Discrete("y2", values = ["0", "1"]) 
    308308    >>> y2.getValueFrom = abe 
    309309 
    310310There's something disturbing here. Although abe determines the value of 
    311 y2, abe.classVar is still y. Orange doesn't bother (the whole example 
     311y2, abe.class_var is still y. Orange doesn't bother (the whole example 
    312312is artificial - you will seldom pack the entire data set in an 
    313 ClassifierByExampleTable...), so shouldn't you. But still, for the sake 
     313ClassifierByDataTable...), so shouldn't you. But still, for the sake 
    314314of hygiene, you can conclude by 
    315315 
    316     >>> abe.classVar = y2 
     316    >>> abe.class_var = y2 
    317317 
    318318The whole story can be greatly simplified. LookupLearner can also be 
     
    320320the new class variable and the features that should be used for 
    321321classification. This saves us from constructing data_s and reassigning 
    322 the classVar. It doesn't set the getValueFrom, though. 
     322the class_var. It doesn't set the getValueFrom, though. 
    323323 
    324324part of `lookup-table.py`_ (uses: `monks-1.tab`_):: 
     
    335335alternative call arguments, it offers an easy way to observe feature 
    336336interactions. For this purpose, we shall omit e, and construct a 
    337 ClassifierByExampleTable from a and b only (part of `lookup-table.py`_; uses: `monks-1.tab`_): 
     337ClassifierByDataTable from a and b only (part of `lookup-table.py`_; uses: `monks-1.tab`_): 
    338338 
    339339.. literalinclude:: code/lookup-table.py 
     
    357357 
    358358 
     359Utility functions 
     360================= 
     361 
     362 
     363There are several functions for working with classifiers that use a stored 
     364data table for making predictions. There are four such classifiers; the most 
     365general stores an :class:`Orange.data.Table` and the other three are 
     366specialized and optimized for cases where the domain contains only one, two or 
     367three features (besides the class variable). 
     368 
     369.. function:: lookup_from_bound(classVar, bound) 
     370 
     371    This function constructs an appropriate lookup classifier for one, two or 
     372    three features. If there are more, it returns None. The resulting 
     373    classifier is of type :obj:`ClassifierByLookupTable`, 
     374    :obj:`ClassifierByLookupTable2` or :obj:`ClassifierByLookupTable3`, with 
     375    classVar and bound set set as given. 
     376 
     377    If, for instance, table contains a data set Monk 1 and you would like to 
     378    construct a new feature from features a and b, you can call this function 
     379    as follows. 
     380     
     381        >>> newvar = Orange.data.variable.Discrete() 
     382        >>> bound = [table.domain[name] for name in ["a", "b"]] 
     383        >>> lookup = lookup_from_bound(newvar, bound) 
     384        >>> print lookup.lookup_table 
     385        <?, ?, ?, ?, ?, ?, ?, ?, ?> 
     386 
     387    Function lookup_from_bound initializes neither newVar nor
     388    the lookup table... 
     389 
     390.. function:: lookup_from_function(classVar, bound, function) 
     391 
     392    ... and that's exactly where lookup_from_function differs from 
     393    :obj:`lookup_from_bound`. lookup_from_function first calls 
     394    lookup_from_bound and then uses the function to initialize the lookup 
     395    table. The other difference between this and the previous function is that 
     396    lookup_from_function also accepts bound sets with more than three 
     397    features. In this case, it constructs a :obj:`ClassifierByDataTable`.
     398 
     399    The function gets the values of features as integer indices and should 
     400    return an integer index of the "class value". The class value must be 
     401    properly initialized. 
     402 
     403    For exercise, let us construct a new feature called a=b whose value will 
     404    be "yes" when a and b are equal and "no" when they are not. We will then 
     405    add the feature to the data set. 
     406     
     407        >>> bound = [table.domain[name] for name in ["a", "b"]] 
     408        >>> newVar = Orange.data.variable.Discrete("a=b", values=["no", "yes"]) 
     409        >>> lookup = lookup_from_function(newVar, bound, lambda x: x[0]==x[1]) 
     410        >>> newVar.getValueFrom = lookup 
     411        >>> import orngCI 
     412        >>> table2 = orngCI.addAnAttribute(newVar, table) 
     413        >>> for i in table2[:30]: 
     414        ...     print i
     415        ['1', '1', '1', '1', '1', '1', 'yes', '1'] 
     416        ['1', '1', '1', '1', '1', '2', 'yes', '1'] 
     417        ['1', '1', '1', '1', '2', '1', 'yes', '1'] 
     418        ['1', '1', '1', '1', '2', '2', 'yes', '1'] 
     419        ... 
     420        ['2', '1', '2', '3', '4', '1', 'no', '0'] 
     421        ['2', '1', '2', '3', '4', '2', 'no', '0'] 
     422        ['2', '2', '1', '1', '1', '1', 'yes', '1'] 
     423        ['2', '2', '1', '1', '1', '2', 'yes', '1'] 
     424        ... 
     425 
     426    The feature was inserted with use of orngCI.addAnAttribute. By setting 
     427    newVar.getValueFrom to lookup we state that when converting domains 
     428    (either when needed by addAnAttribute or at some other place), lookup 
     429    should be used to compute newVar's value. (A bit off topic, but 
     430    important: you should never call getValueFrom directly, but always call 
     431    it through computeValue.) 
     432 
     433.. function:: lookup_from_data(examples [, weight]) 
     434 
     435    This function takes a set of examples (e.g. :obj:`Orange.data.Table`) 
     436    and turns it into a classifier. If there are one, two or three features and 
     437    no ambiguous examples (examples are ambiguous if they have same values of 
     438    features but with different class values), it will construct an appropriate 
     439    :obj:`ClassifierByLookupTable`. Otherwise, it will return a
     440    :obj:`ClassifierByDataTable`. 
     441     
     442        >>> lookup = lookup_from_data(table) 
     443        >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?']) 
     444        >>> lookup(test_instance) 
     445        <orange.Value 'y'='0'> 
     446     
     447.. function:: dump_lookup_function(func) 
     448 
     449    dump_lookup_function returns a string with a lookup function in 
     450    tab-delimited format. Argument func can be any of the above-mentioned 
     451    classifiers or a feature whose getValueFrom points to one of such 
     452    classifiers. 
     453 
     454    For instance, if lookup is constructed as in the example for
     455    lookup_from_function, you can print it out by:: 
     456     
     457        >>> print dump_lookup_function(lookup) 
     458        a      b      a=b 
     459        ------ ------ ------ 
     460        1      1      yes 
     461        1      2      no 
     462        1      3      no 
     463        2      1      no 
     464        2      2      yes 
     465        2      3      no 
     466        3      1      no 
     467        3      2      no 
     468        3      3      yes 
     469 
     470 
    359471.. _lookup-lookup.py: code/lookup-lookup.py 
    360472.. _lookup-table.py: code/lookup-table.py 
     
    377489        raise TypeError, "no bound attributes" 
    378490    elif len(bound) <= 3: 
    379         return apply([ClassifierByLookupTable, ClassifierByLookupTable2, 
    380                       ClassifierByLookupTable3][len(bound) - 1], 
    381                      [attribute] + list(bound)) 
     491        return [ClassifierByLookupTable, ClassifierByLookupTable2, 
     492                ClassifierByLookupTable3][len(bound) - 1](attribute, *list(bound)) 
    382493    else: 
    383494        return None 
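The hunk above replaces `apply(...)` with a direct call that unpacks the bound features. The pattern can be tried in isolation with the sketch below. The three `make_lookup*` functions are hypothetical stand-ins for the Orange classifier classes, and `pick_lookup` is an illustrative name rather than part of the module.

```python
def make_lookup1(class_var, v1):
    return ("lookup1", class_var, v1)


def make_lookup2(class_var, v1, v2):
    return ("lookup2", class_var, v1, v2)


def make_lookup3(class_var, v1, v2, v3):
    return ("lookup3", class_var, v1, v2, v3)


def pick_lookup(class_var, bound):
    """Choose a constructor by the number of bound features and call it."""
    if not bound:
        raise TypeError("no bound attributes")
    if len(bound) <= 3:
        constructors = [make_lookup1, make_lookup2, make_lookup3]
        # Star-unpacking passes each bound feature as its own argument,
        # which is what apply(f, [class_var] + list(bound)) used to do.
        return constructors[len(bound) - 1](class_var, *bound)
    return None


print(pick_lookup("y", ["a", "b"]))  # ('lookup2', 'y', 'a', 'b')
print(pick_lookup("y", ["a", "b", "c", "d"]))  # None
```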
     
    385496     
    386497def lookup_from_function(attribute, bound, function): 
    387     """Constructs ClassifierByExampleTable or ClassifierByLookupTable 
     498    """Constructs ClassifierByDataTable or ClassifierByLookupTable 
    388499    mirroring the given function 
    389500     
     
    391502    lookup = lookup_from_bound(attribute, bound) 
    392503    if lookup: 
    393         lookup.lookupTable = [Orange.data.Value(attribute, function(attributes)) 
     504        lookup.lookup_table = [Orange.data.Value(attribute, function(attributes)) 
    394505                              for attributes in Orange.misc.counters.LimitedCounter( 
    395506                                  [len(attr.values) for attr in bound])] 
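The comprehension above fills the table by calling the function once per combination of feature value indices. Here is a rough standalone sketch of the same idea, using `itertools.product` in place of `Orange.misc.counters.LimitedCounter`. It assumes the counter enumerates combinations with the last feature varying fastest, which matches the order of rows in the dump_lookup_function output in this file. The helper `table_from_function` is hypothetical.

```python
from itertools import product


def table_from_function(sizes, function, class_values):
    """Build a lookup table by evaluating `function` on every combination.

    `sizes` holds the number of values of each bound feature, and
    `function` returns an index (or bool) into `class_values`.
    """
    combos = product(*(range(n) for n in sizes))
    return [class_values[int(function(combo))] for combo in combos]


# The a=b feature from the documentation: "yes" when both indices are equal.
table = table_from_function([3, 3], lambda x: x[0] == x[1], ["no", "yes"])
print(table)
# ['yes', 'no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes']
```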
     
    406517def lookup_from_data(examples, weight = 0, learnerForUnknown = None): 
    407518    if len(examples.domain.attributes) <= 3: 
    408         lookup = lookup_from_bound(examples.domain.classVar, 
     519        lookup = lookup_from_bound(examples.domain.class_var, 
    409520                                 examples.domain.attributes) 
    410         lookupTable = lookup.lookupTable 
     521        lookup_table = lookup.lookup_table 
    411522        for example in examples: 
    412523            ind = lookup.getindex(example) 
    413             if not lookupTable[ind].isSpecial() and (lookupTable[ind] <> 
     524            if not lookup_table[ind].isSpecial() and (lookup_table[ind] <> 
    414525                                                     example.getclass()): 
    415526                break 
    416             lookupTable[ind] = example.getclass() 
     527            lookup_table[ind] = example.getclass() 
    417528        else: 
    418529            return lookup 
    419530 
    420531        # there are ambiguities; a backup plan is 
    421         # ClassifierByExampleTable, let it deal with them 
     532        # ClassifierByDataTable, let it deal with them 
    422533        return LookupLearner(examples, weight, 
    423534                             learnerForUnknown=learnerForUnknown) 
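lookup_from_data relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without `break`, that is, when no two examples disagree on the class for the same table slot. A minimal sketch of that control flow follows. It uses plain (index, label) pairs instead of Orange examples, and the name `fill_table` is hypothetical. Returning None stands in for the fallback to LookupLearner.

```python
def fill_table(rows, size):
    """Fill a lookup table from (index, label) pairs.

    Returns the table if the data are unambiguous, or None if two rows
    map to the same index with different labels.
    """
    table = [None] * size
    for index, label in rows:
        if table[index] is not None and table[index] != label:
            break  # ambiguity found; skip the else branch
        table[index] = label
    else:
        return table
    return None


print(fill_table([(0, "1"), (1, "0"), (0, "1")], 3))  # ['1', '0', None]
print(fill_table([(0, "1"), (0, "0")], 3))            # None
```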
     
    436547 
    437548    outp = "" 
    438     if isinstance(func, ClassifierByExampleTable): 
     549    if isinstance(func, ClassifierByDataTable): 
    439550    # XXX This needs some polishing :-) 
    440         for i in func.sortedExamples: 
     551        for i in func.sorted_examples: 
    441552            outp += "%s\n" % i 
    442553    else: 
     
    444555        for a in boundset: 
    445556            outp += "%s\t" % a.name 
    446         outp += "%s\n" % func.classVar.name 
     557        outp += "%s\n" % func.class_var.name 
    447558        outp += "------\t" * (len(boundset)+1) + "\n" 
    448559         
     
    458569                else: 
    459570                    outp += "?\t", 
    460             outp += "%s\n" % func.classVar.values[int(func.lookupTable[lc])] 
     571            outp += "%s\n" % func.class_var.values[int(func.lookup_table[lc])] 
    461572            lc += 1 
    462573    return outp 
  • orange/fixes/fix_orange_imports.py

    r7631 r7761  
    4949           "orngWrap": "Orange.optimization", 
    5050           "orngClustering": "Orange.clustering", 
     51           "orngLookup": "Orange.classification.lookup", 
    5152           } 
    5253 