Changeset 10068:011be94478b7 in orange


Timestamp: 02/08/12 12:25:59
Author: Lan Zagar <lan.zagar@…>
Branch: default
rebase_source: cf5435ab9142ed211d6c37e375de1cb4f71d5236
Message: Improvements to the lookup module.

File: 1 edited

  • Orange/classification/lookup.py

--- Orange/classification/lookup.py (r9994)
+++ Orange/classification/lookup.py (r10068)
@@ -19,10 +19,11 @@
 
 A natural habitat for these classifiers is feature construction:
-they usually reside in :obj:`~Orange.feature.Descriptor.get_value_from` fields of constructed
+they usually reside in :obj:`~Orange.feature.Descriptor.get_value_from`
+fields of constructed
 features to facilitate their automatic computation. For instance,
 the following script shows how to translate the `monks-1.tab` data set
 features into a more useful subset that will only include the features
-``a``, ``b``, ``e``, and features that will tell whether ``a`` and ``b`` are equal and
-whether ``e`` is 1 (don't bother about the details, they follow later;
+``a``, ``b``, ``e``, and features that will tell whether ``a`` and ``b``
+are equal and whether ``e`` is 1 (part of
 :download:`lookup-lookup.py <code/lookup-lookup.py>`):
 
     
@@ -53,5 +54,5 @@
 and tells that ``e1`` should be unknown then as well.
 
-Note that you don't need :obj:`ClassifierByLookupTable` for this.
+Note that :obj:`ClassifierByLookupTable` is not needed for this.
 The new feature ``e1`` could be computed with a callback to Python,
 for instance::
     
@@ -67,7 +68,7 @@
 
 Although the above example used :obj:`ClassifierByLookupTable` as if it
-was a concrete class, ClassifierByLookupTable is actually
-abstract. Calling its constructor is a typical Orange trick: what you
-get, is never ClassifierByLookupTable, but either
+was a concrete class, :obj:`ClassifierByLookupTable` is actually
+abstract. Calling its constructor is a typical Orange trick: it does not
+return an instance of :obj:`ClassifierByLookupTable`, but either
 :obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or
 :obj:`ClassifierByLookupTable3`. As their names tell, the first
     
@@ -79,9 +80,8 @@
 of the table that corresponds to the combination of feature values.
 
-These classifiers are built to be fast, not safe. If you, for instance,
-change the number of values for one of the features, Orange will
-most probably crash. To protect you somewhat, many of these classes'
-features are read-only and can only be set when the object is
-constructed.
+These classifiers are built to be fast, not safe. For instance, if the number
+of values for one of the features is changed, Orange will most probably crash.
+To alleviate this, many of these classes' features are read-only and can only
+be set when the object is constructed.
 
 
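The text describes ClassifierByLookupTable1/2/3 as returning the element "of the table that corresponds to the combination of feature values". As a minimal sketch of how such a flat table can be indexed, the code below assumes row-major order (the first bound feature varies slowest) and 0-based value indices. `lookup_index` is a hypothetical helper, not part of Orange. The sketch also shows why changing a feature's number of values after construction is unsafe: the multipliers change while the table keeps its old length.

```python
from typing import Optional, Sequence


def lookup_index(value_indices: Sequence[Optional[int]],
                 n_values: Sequence[int]) -> Optional[int]:
    """Return the position of a value combination in a flat lookup table.

    Assumes row-major order: the first feature varies slowest and the last
    one fastest. An unknown value (None) has no cell, so None is returned.
    """
    if len(value_indices) != len(n_values):
        raise ValueError("expected one value index per bound feature")
    index = 0
    for value, n in zip(value_indices, n_values):
        if value is None:
            return None
        if not 0 <= value < n:
            raise IndexError("value index %d out of range 0..%d" % (value, n - 1))
        # Shift the partial index by this feature's number of values.
        index = index * n + value
    return index


# Two features with three values each (like a and b in monks-1) give
# 3 * 3 = 9 cells, matching the nine "?" entries in the lookup_from_bound example.
print(lookup_index([0, 2], [3, 3]))  # 2
print(lookup_index([2, 0], [3, 3]))  # 6
```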
     
@@ -114,5 +114,5 @@
     .. attribute:: lookup_table (read only)
 
-        A list of values (:obj:`Orange.core.ValueList`), one for each possible
+        A list of values, one for each possible
         combination of features. For ClassifierByLookupTable1, there is an
         additional element that is returned when the feature's value is
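For the single-feature case, the documentation mentions an additional table element that is returned when the feature's value is unknown. The sketch below applies that layout to the introduction's `e1` feature ("is `e` equal to 1"). The values of `e` ("1" to "4", as in monks-1), the "yes"/"no" results, and the names `E_VALUES`, `E1_TABLE` and `e1_from_e` are all assumptions for illustration. The trailing element is `None`, so `e1` is unknown whenever `e` is unknown.

```python
from typing import List, Optional

# Assumed values of feature e (monks-1 uses "1".."4").
E_VALUES: List[str] = ["1", "2", "3", "4"]

# One entry per value of e, plus a trailing entry returned when e itself
# is unknown; that entry is None, i.e. e1 is unknown as well.
E1_TABLE: List[Optional[str]] = ["yes", "no", "no", "no", None]


def e1_from_e(e: Optional[str]) -> Optional[str]:
    """Look up e1 for a value of e; None stands for an unknown value."""
    if e is None:
        return E1_TABLE[len(E_VALUES)]  # the extra "unknown" slot
    return E1_TABLE[E_VALUES.index(e)]


print(e1_from_e("1"), e1_from_e("3"), e1_from_e(None))  # yes no None
```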
     
@@ -123,13 +123,11 @@
         variable2.
 
-        The list is read-only in the sense that you cannot assign a new
-        list to this field. You can, however, change its elements. Don't
-        change its size, though.
+        The attribute is read-only - a new list cannot be assigned to it.
+        Its elements, however, can be changed. Don't change its size.
 
     .. attribute:: distributions (read only)
 
-        Similar to :obj:`lookup_table`, but is of type
-        :obj:`Orange.core.DistributionList` and stores a distribution
-        for each combination of values.
+        Similar to :obj:`lookup_table`, but it stores a distribution for
+        each combination of values.
 
     .. attribute:: data_description
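The revised description of `lookup_table` says that the attribute cannot be reassigned but its elements can still be changed. The sketch below gives a plain-Python analogy of that contract using a property without a setter. It assumes nothing about Orange's C++ internals, and `ReadOnlyTable` is a hypothetical class. The same contract explains why this changeset's rewrite of `lookup_from_function` fills `lookup.lookup_table` element by element instead of assigning a new list.

```python
class ReadOnlyTable:
    """Hypothetical holder whose lookup_table cannot be rebound but whose
    elements can be modified in place."""

    def __init__(self, size):
        self._table = [None] * size

    @property
    def lookup_table(self):
        # No setter is defined, so "obj.lookup_table = ..." raises
        # AttributeError; the returned list itself is still mutable.
        return self._table


holder = ReadOnlyTable(9)
holder.lookup_table[4] = "yes"        # allowed: changes one element
try:
    holder.lookup_table = ["no"] * 9  # not allowed: rebinds the attribute
    rebinding_failed = False
except AttributeError:
    rebinding_failed = True
print(holder.lookup_table[4], rebinding_failed)  # yes True
```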
     
@@ -218,10 +216,10 @@
         classes), they are merged into a single instance. Regardless of
         merging, class values in this table are distributed: their svalue
-        contains a :obj:`Distribution`.
-
-    .. attribute:: classifierForUnknown
+        contains a :obj:`~Orange.statistics.distribution.Distribution`.
+
+    .. attribute:: classifier_for_unknown
 
         This classifier is used to classify instances which were not found
-        in the table. If classifierForUnknown is not set, don't know's are
+        in the table. If classifier_for_unknown is not set, don't know's are
         returned.
 
     
@@ -237,5 +235,5 @@
     There are no specific methods for ClassifierByDataTable.
     Since this is a classifier, it can be called. When the instance to be
-    classified includes unknown values, :obj:`classifierForUnknown` will be
+    classified includes unknown values, :obj:`classifier_for_unknown` will be
     used if it is defined.
 
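ClassifierByDataTable defers to `classifier_for_unknown` for instances that are not in its table or that contain unknown values, and returns "don't know" when no such classifier is set. The sketch below models that control flow with a plain dictionary. The dictionary-based table, `None` as the unknown value, and the name `classify` are assumptions of the sketch, not Orange's API.

```python
from typing import Callable, Dict, Optional, Tuple

Instance = Tuple[Optional[str], ...]


def classify(table: Dict[Instance, str], instance: Instance,
             classifier_for_unknown: Optional[Callable[[Instance], str]] = None
             ) -> Optional[str]:
    """Return the stored class, else the fallback's answer, else None."""
    has_unknown = any(value is None for value in instance)
    if not has_unknown and instance in table:
        return table[instance]
    if classifier_for_unknown is not None:
        return classifier_for_unknown(instance)
    return None  # "don't know"


table = {("1", "1"): "1", ("1", "3"): "0"}
print(classify(table, ("1", "3")))                  # 0
print(classify(table, ("2", "3")))                  # None
print(classify(table, (None, "1"), lambda x: "1"))  # 1
```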
     
@@ -248,6 +246,7 @@
     function for computation of intermediate values, it has an associated
     learner, :obj:`LookupLearner`. The learner's task is, basically, to
-    construct a Table for :obj:`sorted_examples`. It sorts them, merges them
-    and, of course, regards instance weights in the process as well.
+    construct a table for :obj:`ClassifierByDataTable.sorted_examples`.
+    It sorts them, merges them
+    and regards instance weights in the process as well.
 
     If data instances are provided to the constructor, the learning algorithm
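LookupLearner sorts the instances, merges duplicates and takes instance weights into account. This is also why the class values in `sorted_examples` hold distributions such as `<0.000, 12.000>`. The sketch below illustrates that merging step. It assumes instances are given as (feature values, class value, weight) triples, and `merge_instances` is a hypothetical helper. The real learner works on Orange tables.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Row = Tuple[Tuple[str, ...], str, float]  # (feature values, class value, weight)


def merge_instances(rows: Iterable[Row], class_values: List[str]
                    ) -> List[Tuple[Tuple[str, ...], List[float]]]:
    """Merge rows with identical feature values into one entry whose class
    is a weighted distribution over class_values; return them sorted."""
    position = {value: i for i, value in enumerate(class_values)}
    merged: DefaultDict[Tuple[str, ...], List[float]] = defaultdict(
        lambda: [0.0] * len(class_values))
    for features, class_value, weight in rows:
        merged[features][position[class_value]] += weight
    return sorted(merged.items())


rows = [(("1", "1"), "1", 1.0),
        (("1", "3"), "0", 1.0),
        (("1", "1"), "1", 2.0),
        (("1", "3"), "1", 0.5)]
for features, distribution in merge_instances(rows, ["0", "1"]):
    print(list(features), distribution)
# ['1', '1'] [0.0, 3.0]
# ['1', '3'] [1.0, 0.5]
```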
     
@@ -260,8 +259,9 @@
 
 
-In data_s, we have prepared a table in which instances are described
-only by a, b, e and the class. Learner constructs a
-ClassifierByDataTable and stores instances from data_s into its
-sorted_examples. Instances are merged so that there are no duplicates.
+In `table_s`, we have prepared a table in which instances are described
+only by `a`, `b`, `e` and the class. The learner constructs a
+:obj:`ClassifierByDataTable` and stores instances from `table_s` into its
+:obj:`~ClassifierByDataTable.sorted_examples`. Instances are merged so that
+there are no duplicates.
 
     >>> print len(table_s)
     
@@ -290,5 +290,5 @@
 
     >>> for i in abe.sorted_examples[:10]:
-    ...     print i, i.getclass().svalue
+    ...     print i, i.get_class().svalue
     ['1', '1', '1', '1'] <0.000, 12.000>
     ['1', '1', '2', '1'] <0.000, 12.000>
     
@@ -302,24 +302,25 @@
     ['1', '3', '2', '0'] <12.000, 0.000>
 
-ClassifierByDataTable will usually be used by :obj:`Orange.feature.Descriptor.get_value_from`. So, we
+:obj:`ClassifierByDataTable` will usually be used by
+:obj:`~Orange.feature.Descriptor.get_value_from`. So, we
 would probably continue this by constructing a new feature and put the
-classifier into its :obj:`Orange.feature.Descriptor.get_value_from`.
+classifier into its :obj:`~Orange.feature.Descriptor.get_value_from`.
 
     >>> y2 = Orange.feature.Discrete("y2", values = ["0", "1"])
     >>> y2.get_value_from = abe
 
-There's something disturbing here. Although abe determines the value of
-y2, abe.class_var is still y. Orange doesn't bother (the whole example
-is artificial - you will seldom pack the entire data set in an
-ClassifierByDataTable...), so shouldn't you. But still, for the sake
-of hygiene, you can conclude by
+Although `abe` determines the value of `y2`, `abe.class_var` is still `y`.
+Orange doesn't mind (the whole example is artificial - the entire data set
+will seldom be packed in an :obj:`ClassifierByDataTable`), but this can still
+be solved by
 
     >>> abe.class_var = y2
 
-The whole story can be greatly simplified. LookupLearner can also be
+The whole story can be greatly simplified. :obj:`LookupLearner` can also be
 called differently than other learners. Besides instances, you can pass
 the new class variable and the features that should be used for
-classification. This saves us from constructing data_s and reassigning
-the class_var. It doesn't set the :obj:`Orange.feature.Descriptor.get_value_from`, though.
+classification. This saves us from constructing table_s and reassigning
+the :obj:`~Orange.data.Domain.class_var`. It doesn't set the
+:obj:`~Orange.feature.Descriptor.get_value_from`, though.
 
 part of :download:`lookup-table.py <code/lookup-table.py>`::
     
@@ -333,8 +334,9 @@
     abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table)
 
-Let us, for the end, show another use of LookupLearner. With the
+Let us, for the end, show another use of :obj:`LookupLearner`. With the
 alternative call arguments, it offers an easy way to observe feature
-interactions. For this purpose, we shall omit e, and construct a
-ClassifierByDataTable from a and b only (part of :download:`lookup-table.py <code/lookup-table.py>`):
+interactions. For this purpose, we shall omit `e`, and construct a
+:obj:`ClassifierByDataTable` from `a` and `b` only (part of
+:download:`lookup-table.py <code/lookup-table.py>`):
 
 .. literalinclude:: code/lookup-table.py
     
@@ -342,5 +344,5 @@
 
 The script's output show how the classes are distributed for different
-values of a and b::
+values of `a` and `b`::
 
     ['1', '1', '1'] <0.000, 48.000>
     
@@ -354,5 +356,5 @@
     ['3', '3', '1'] <0.000, 48.000>
 
-For instance, when a is '1' and b is '3', the majority class is '0',
+For instance, when `a` is '1' and `b` is '3', the majority class is '0',
 and the class distribution is 36:12 in favor of '0'.
 
     
@@ -364,9 +366,9 @@
 There are several functions for working with classifiers that use a stored
 data table for making predictions. There are four such classifiers; the most
-general stores an :class:`Orange.data.Table` and the other three are
+general stores a :class:`~Orange.data.Table` and the other three are
 specialized and optimized for cases where the domain contains only one, two or
 three features (besides the class variable).
 
-.. function:: lookup_from_bound(classVar, bound)
+.. function:: lookup_from_bound(class_var, bound)
 
     This function constructs an appropriate lookup classifier for one, two or
     
@@ -374,26 +376,25 @@
     classifier is of type :obj:`ClassifierByLookupTable`,
     :obj:`ClassifierByLookupTable2` or :obj:`ClassifierByLookupTable3`, with
-    classVar and bound set set as given.
-
-    If, for instance, table contains a data set Monk 1 and you would like to
-    construct a new feature from features a and b, you can call this function
-    as follows.
-
-        >>> newvar = Orange.feature.Discrete()
+    `class_var` and bound set set as given.
+
+    For example, using the data set `monks-1.tab`, to construct a new feature
+    from features `a` and `b`, this function can be called as follows.
+
+        >>> new_var = Orange.feature.Discrete()
         >>> bound = [table.domain[name] for name in ["a", "b"]]
-        >>> lookup = lookup_from_bound(newvar, bound)
+        >>> lookup = Orange.classification.lookup.lookup_from_bound(new_var, bound)
         >>> print lookup.lookup_table
         <?, ?, ?, ?, ?, ?, ?, ?, ?>
 
-    Function lookup_from_bound does not initialize neither newVar nor
+    Function `lookup_from_bound` does not initialize neither `new_var` nor
     the lookup table...
 
-.. function:: lookup_from_function(classVar, bound, function)
-
-    ... and that's exactly where lookup_from_function differs from
-    :obj:`lookup_from_bound`. lookup_from_function first calls
-    lookup_from_bound and then uses the function to initialize the lookup
-    table. The other difference between this and the previous function is that
-    lookup_from_function also accepts bound sets with more than three
+.. function:: lookup_from_function(class_var, bound, function)
+
+    ... and that's exactly where `lookup_from_function` differs from
+    :obj:`lookup_from_bound`. `lookup_from_function` first calls
+    :obj:`lookup_from_bound` and then uses the function to initialize the
+    lookup table. The other difference between this and the previous function
+    is that `lookup_from_function` also accepts bound sets with more than three
     features. In this case, it construct a :obj:`ClassifierByDataTable`.
 
     
@@ -402,37 +403,36 @@
     properly initialized.
 
-    For exercise, let us construct a new feature called a=b whose value will
-    be "yes" when a and b are equal and "no" when they are not. We will then
+    For exercise, let us construct a new feature called `a=b` whose value will
+    be "yes" when `a` and `b` are equal and "no" when they are not. We will then
     add the feature to the data set.
 
         >>> bound = [table.domain[name] for name in ["a", "b"]]
-        >>> newVar = Orange.feature.Discrete("a=b", values=["no", "yes"])
-        >>> lookup = lookup_from_function(newVar, bound, lambda x: x[0] == x[1])
-        >>> newVar.get_value_from = lookup
+        >>> new_var = Orange.feature.Discrete("a=b", values=["no", "yes"])
+        >>> lookup = Orange.classification.lookup.lookup_from_function(new_var, bound, lambda x: x[0] == x[1])
+        >>> new_var.get_value_from = lookup
         >>> import orngCI
-        >>> table2 = orngCI.addAnAttribute(newVar, table)
+        >>> table2 = orngCI.addAnAttribute(new_var, table)
         >>> for i in table2[:30]:
             ... print i
-        ['1', '1', '1', '1', '1', '1', 'yes', '1']
-        ['1', '1', '1', '1', '1', '2', 'yes', '1']
-        ['1', '1', '1', '1', '2', '1', 'yes', '1']
-        ['1', '1', '1', '1', '2', '2', 'yes', '1']
+        ['1', '1', '1', '1', '3', '1', 'yes', '1']
+        ['1', '1', '1', '1', '3', '2', 'yes', '1']
+        ['1', '1', '1', '3', '2', '1', 'yes', '1']
         ...
-        ['2', '1', '2', '3', '4', '1', 'no', '0']
-        ['2', '1', '2', '3', '4', '2', 'no', '0']
-        ['2', '2', '1', '1', '1', '1', 'yes', '1']
-        ['2', '2', '1', '1', '1', '2', 'yes', '1']
+        ['1', '2', '1', '1', '1', '2', 'no', '1']
+        ['1', '2', '1', '1', '2', '1', 'no', '0']
+        ['1', '2', '1', '1', '3', '1', 'no', '0']
         ...
 
-    The feature was inserted with use of orngCI.addAnAttribute. By setting
-    newVar.get_value_from to lookup we state that when converting domains
-    (either when needed by addAnAttribute or at some other place), lookup
-    should be used to compute newVar's value. (A bit off topic, but
-    important: you should never call :obj:`Orange.feature.Descriptor.get_value_from` directly, but always call
-    it through computeValue.)
+    The feature was inserted with use of `orngCI.addAnAttribute`. By setting
+    `new_var.get_value_from` to `lookup` we state that when converting domains
+    (either when needed by `addAnAttribute` or at some other place), `lookup`
+    should be used to compute `new_var`'s value. (A bit off topic, but
+    important: you should never call
+    :obj:`~Orange.feature.Descriptor.get_value_from` directly, but always
+    through :obj:`~Orange.feature.Descriptor.compute_value`.)
 
 .. function:: lookup_from_data(examples [, weight])
 
-    This function takes a set of examples (e.g. :obj:`Orange.data.Table`)
+    This function takes a set of data instances (e.g. :obj:`Orange.data.Table`)
     and turns it into a classifier. If there are one, two or three features and
     no ambiguous examples (examples are ambiguous if they have same values of
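`lookup_from_function` fills the lookup table by calling the given function once for every combination of values of the bound features. The sketch below shows that enumeration for the `a=b` example. It uses `itertools.product`, whose last position varies fastest, in place of Orange's `LimitedCounter`, under the assumption that both enumerate combinations in the table's order. `table_from_function` is a hypothetical name. The function returns the strings "yes"/"no" directly instead of the boolean produced by the documentation's lambda.

```python
from itertools import product
from typing import Callable, List, Sequence


def table_from_function(value_lists: Sequence[Sequence[str]],
                        function: Callable[[tuple], str]) -> List[str]:
    """Evaluate function on every combination of feature values, with the
    last feature varying fastest (row-major order)."""
    return [function(combination) for combination in product(*value_lists)]


a_values = ["1", "2", "3"]
b_values = ["1", "2", "3"]
a_eq_b = table_from_function([a_values, b_values],
                             lambda x: "yes" if x[0] == x[1] else "no")
print(a_eq_b)
# ['yes', 'no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes']
```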
     
@@ -441,5 +441,5 @@
     :obj:`ClassifierByDataTable`.
 
-        >>> lookup = lookup_from_data(table)
+        >>> lookup = Orange.classification.lookup.lookup_from_data(table)
         >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?'])
         >>> lookup(test_instance)
     
@@ -448,11 +448,12 @@
 .. function:: dump_lookup_function(func)
 
-    dump_lookup_function returns a string with a lookup function in
-    tab-delimited format. Argument func can be any of the above-mentioned
-    classifiers or a feature whose :obj:`Orange.feature.Descriptor.get_value_from` points to one of such
+    `dump_lookup_function` returns a string with a lookup function in
+    tab-delimited format. Argument `func` can be any of the above-mentioned
+    classifiers or a feature whose
+    :obj:`~Orange.feature.Descriptor.get_value_from` points to one of such
     classifiers.
 
-    For instance, if lookup is such as constructed in the example for
-    lookup_from_function, you can print it out by::
+    For instance, if `lookup` is such as constructed in the example for
+    `lookup_from_function`, it can be printed by::
 
         >>> print dump_lookup_function(lookup)
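`dump_lookup_function` is documented as returning the lookup function in tab-delimited format, but its exact output is not visible in this changeset. The sketch below illustrates one possible tab-delimited dump: a header row with the feature and class names, followed by one row per value combination in row-major order. Both this layout and the name `dump_table` are assumptions, not Orange's actual output.

```python
from itertools import product
from typing import Sequence


def dump_table(feature_names: Sequence[str],
               value_lists: Sequence[Sequence[str]],
               class_name: str, table: Sequence[str]) -> str:
    """Render a row-major lookup table as tab-separated text."""
    lines = ["\t".join(list(feature_names) + [class_name])]
    for combination, result in zip(product(*value_lists), table):
        lines.append("\t".join(list(combination) + [result]))
    return "\n".join(lines)


text = dump_table(["a", "b"], [["1", "2"], ["1", "2"]], "a=b",
                  ["yes", "no", "no", "yes"])
print(text)
```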
     
@@ -471,4 +472,5 @@
 """
 
+from Orange.misc import deprecated_keywords
 import Orange.data
 from Orange.core import \
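This changeset renames the `attribute` parameter of `lookup_from_bound` and `lookup_from_function` to `class_var` and wraps both functions in `deprecated_keywords`, presumably so that callers still passing the old keyword keep working. The sketch below shows how such a keyword-renaming decorator can be written. `renamed_keywords` and `describe` are hypothetical, and `Orange.misc.deprecated_keywords` may differ in details such as the warning it emits.

```python
import functools
import warnings


def renamed_keywords(mapping):
    """Hypothetical decorator: accept deprecated keyword names from mapping
    (old -> new), emit a DeprecationWarning and pass them on under the new name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in mapping.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError("got values for both %r and %r" % (old, new))
                    warnings.warn("%r is deprecated, use %r" % (old, new),
                                  DeprecationWarning, stacklevel=2)
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_keywords({"attribute": "class_var"})
def describe(class_var, bound):
    """Toy stand-in for lookup_from_bound: just reports its arguments."""
    return "%s <- %s" % (class_var, ", ".join(bound))


print(describe("y2", ["a", "b"]))                  # y2 <- a, b
print(describe(attribute="y2", bound=["a", "b"]))  # y2 <- a, b (old keyword)
```

Positional calls are untouched by the decorator; only the keyword spelling changes.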
     
@@ -481,47 +483,49 @@
 
 
-def lookup_from_bound(attribute, bound):
+@deprecated_keywords({"attribute":"class_var"})
+def lookup_from_bound(class_var, bound):
     if not len(bound):
         raise TypeError, "no bound attributes"
     elif len(bound) <= 3:
         return [ClassifierByLookupTable, ClassifierByLookupTable2,
-                ClassifierByLookupTable3][len(bound) - 1](attribute, *list(bound))
+                ClassifierByLookupTable3][len(bound) - 1](class_var, *list(bound))
     else:
         return None
 
 
-def lookup_from_function(attribute, bound, function):
-    """Constructs ClassifierByDataTable or ClassifierByLookupTable
-    mirroring the given function
-
+@deprecated_keywords({"attribute":"class_var"})
+def lookup_from_function(class_var, bound, function):
     """
-    lookup = lookup_from_bound(attribute, bound)
+    Constructs ClassifierByDataTable or ClassifierByLookupTable
+    mirroring the given function.
+
+    """
+    lookup = lookup_from_bound(class_var, bound)
     if lookup:
-        lookup.lookup_table = [Orange.data.Value(attribute, function(attributes))
-                              for attributes in Orange.misc.counters.LimitedCounter(
-                                  [len(attr.values) for attr in bound])]
+        for i, attrs in enumerate(Orange.misc.counters.LimitedCounter(
+                    [len(var.values) for var in bound])):
+            lookup.lookup_table[i] = Orange.data.Value(class_var, function(attrs))
         return lookup
     else:
-        examples = Orange.data.Table(Orange.data.Domain(bound, attribute))
-        for attributes in Orange.misc.counters.LimitedCounter([len(attr.values)
-                                                   for attr in dom.attributes]):
-            examples.append(Orange.data.Example(dom, attributes +
-                                                [function(attributes)]))
-        return LookupLearner(examples)
+        dom = Orange.data.Domain(bound, class_var)
+        data = Orange.data.Table(dom)
+        for attrs in Orange.misc.counters.LimitedCounter(
+                    [len(var.values) for var in dom.features]):
+            data.append(Orange.data.Example(dom, attrs + [function(attrs)]))
+        return LookupLearner(data)
 
 
-from Orange.misc import deprecated_keywords
 @deprecated_keywords({"learnerForUnknown":"learner_for_unknown"})
 def lookup_from_data(examples, weight=0, learner_for_unknown=None):
-    if len(examples.domain.attributes) <= 3:
+    if len(examples.domain.features) <= 3:
         lookup = lookup_from_bound(examples.domain.class_var,
-                                 examples.domain.attributes)
+                                 examples.domain.features)
         lookup_table = lookup.lookup_table
         for example in examples:
-            ind = lookup.getindex(example)
-            if not lookup_table[ind].isSpecial() and (lookup_table[ind] !=
-                                                     example.getclass()):
+            ind = lookup.get_index(example)
+            if not lookup_table[ind].is_special() and (lookup_table[ind] !=
+                                                     example.get_class()):
                 break
-            lookup_table[ind] = example.getclass()
+            lookup_table[ind] = example.get_class()
         else:
             return lookup
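The rewritten `lookup_from_data` walks the instances and stores each class in the cell given by `get_index`. It relies on Python's `for`/`else`: the `else` branch (`return lookup`) runs only when the loop finishes without `break`, that is, when no two instances with the same feature values disagree on the class. The sketch below reproduces that logic on plain lists. It assumes row-major indexing and uses `None` for unset cells, and `fill_lookup_table` is a hypothetical name. What the real function does after a conflict is cut off in this view (the module documentation indicates a ClassifierByDataTable is used in that case), so the sketch simply returns `None`.

```python
from typing import List, Optional, Sequence, Tuple


def fill_lookup_table(rows: Sequence[Tuple[Sequence[int], str]],
                      n_values: Sequence[int]) -> Optional[List[Optional[str]]]:
    """Store each row's class in its (row-major) cell; give up on a conflict."""
    size = 1
    for n in n_values:
        size *= n
    table: List[Optional[str]] = [None] * size  # None plays the role of "?"
    for value_indices, class_value in rows:
        index = 0
        for value, n in zip(value_indices, n_values):
            index = index * n + value  # first feature varies slowest (assumed)
        if table[index] is not None and table[index] != class_value:
            break  # ambiguous: same feature values, different classes
        table[index] = class_value
    else:
        return table  # runs only if the loop never hit "break"
    return None  # conflict found; this sketch simply reports failure


print(fill_lookup_table([([0, 0], "yes"), ([0, 1], "no"), ([0, 0], "yes")], [2, 2]))
# ['yes', 'no', None, None]
print(fill_lookup_table([([0, 0], "yes"), ([0, 0], "no")], [2, 2]))
# None
```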