Timestamp:
02/24/12 00:18:24 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Children:
10356:bf0b9ef8974f, 10368:28f5cab86b85
Message:

Dedostoyevskied documentation on lookup learners, changed the order of classifiers in the index

File:
1 edited

  • docs/reference/rst/Orange.classification.lookup.rst

    r9372 r10347  
    1 .. automodule:: Orange.classification.lookup 
     1.. py:currentmodule:: Orange.classification.lookup 
     2 
     3.. index:: classification; lookup 
     4 
     5******************************* 
     6Lookup classifiers (``lookup``) 
     7******************************* 
     8 
     9Lookup classifiers predict classes by looking into stored lists of 
     10cases. There are two kinds of such classifiers in Orange. The simpler 
     11and faster :obj:`ClassifierByLookupTable` uses up to three discrete 
     12features and has a stored mapping from values of those features to the 
     13class value. The more complex classifiers store an 
     14:obj:`Orange.data.Table` and predict the class by matching the 
     15instance to instances in the table. 
     16 
     17.. index:: 
     18   single: feature construction; lookup classifiers 
     19 
     20A natural habitat for these classifiers is feature construction: they 
     21usually reside in :obj:`~Orange.feature.Descriptor.get_value_from` 
     22fields of constructed features to facilitate their automatic 
     23computation. For instance, the following script shows how to translate 
     24the ``monks-1.tab`` data set features into a more useful subset that 
     25will only include the features ``a``, ``b``, ``e``, and features that 
     26will tell whether ``a`` and ``b`` are equal and whether ``e`` is 1 
     27(part of :download:`lookup-lookup.py <code/lookup-lookup.py>`): 
     28 
     29.. 
     30    .. literalinclude:: code/lookup-lookup.py 
     31        :lines: 7-21 
     32 
     33.. testcode:: 
     34 
     35    import Orange 
     36 
     37    monks = Orange.data.Table("monks-1") 
     38 
     39    a, b, e = monks.domain["a"], monks.domain["b"], monks.domain["e"] 
     40 
     41    ab = Orange.feature.Discrete("a==b", values = ["no", "yes"]) 
     42    ab.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(ab, a, b, 
     43                        ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
     44 
     45    e1 = Orange.feature.Discrete("e==1", values = ["no", "yes"]) 
     46    e1.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(e1, e, 
     47                        ["yes", "no", "no", "no", "?"]) 
     48 
     49    monks2 = monks.select([a, b, ab, e, e1, monks.domain.class_var]) 
     50     
     51We can check the correctness of the script by printing out several 
     52random examples from ``monks2``. 
     53 
     54    >>> for i in range(5): 
     55    ...     print monks2.randomexample() 
     56    ['3', '2', 'no', '2', 'no', '0'] 
     57    ['2', '2', 'yes', '2', 'no', '1'] 
     58    ['1', '2', 'no', '2', 'no', '0'] 
     59    ['2', '3', 'no', '1', 'yes', '1'] 
     60    ['1', '3', 'no', '1', 'yes', '1'] 
     61 
     62The first :obj:`ClassifierByLookupTable` takes values of features ``a`` 
     63and ``b`` and computes the value of ``ab`` according to the rule in the 
     64given table. The first three values correspond to ``a=1`` and ``b=1,2,3``; 
     65for the first combination, the value of ``ab`` should be "yes"; for the other 
     66two, ``a`` and ``b`` are different. The next triplet corresponds to ``a=2``; 
     67here, the middle value is "yes", and so on. 
     68 
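The order of the nine entries can be checked without Orange. The following is a minimal plain-Python sketch, not part of ``lookup-lookup.py``; the names ``values`` and ``ab_table`` are illustrative, and the three values ``1``, ``2``, ``3`` are taken from the description above.

```python
from itertools import product

# Values of a and b, as described in the text above.
values = ["1", "2", "3"]

# product() varies the first element slowest, so the pairs come out as
# (1, 1), (1, 2), (1, 3), (2, 1), ... which is the order of the table.
ab_table = ["yes" if a == b else "no" for a, b in product(values, values)]
```

The resulting list has "yes" at positions 0, 4 and 8, the three combinations in which ``a`` equals ``b``.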
     69The second lookup is simpler: since it involves only a single feature, 
     70the list is a simple one-to-one mapping from the four-valued ``e`` to the 
     71two-valued ``e1``. The last value in the list is returned when ``e`` is unknown 
     72and specifies that ``e1`` should then be unknown as well. 
     73 
     74Note that :obj:`ClassifierByLookupTable` is not needed for this. 
     75The new feature ``e1`` could be computed with a callback to Python, 
     76for instance:: 
     77 
     78    e1.get_value_from = lambda ex, rw: Orange.data.Value(e1, ex["e"] == "1") 
     79 
     80 
     81Classifiers by lookup table 
     82=========================== 
     83 
     84.. index:: 
     85   single: classification; lookup table 
     86 
     87Although the above example used :obj:`ClassifierByLookupTable` as if 
     88it was a concrete class, :obj:`ClassifierByLookupTable` is actually 
     89abstract. Calling its constructor does not return an instance of 
     90:obj:`ClassifierByLookupTable`, but either 
     91:obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or 
     92:obj:`ClassifierByLookupTable3`, that take one (``e``, above), two 
     93(like ``a`` and ``b``) or three features, respectively. Class 
     94predictions for each combination of feature values are stored in a 
     95(one-dimensional) table. To classify an instance, the classifier 
     96computes an index of the element of the table that corresponds to the 
     97combination of feature values. 
     98 
     99These classifiers are built to be fast, not safe. For instance, if the 
     100number of values for one of the features is changed, Orange will most 
     101probably crash.  To alleviate this, many of these classes' attributes 
     102are read-only and can only be set when the object is constructed. 
     103 
     104 
     105.. py:class:: ClassifierByLookupTable(class_var, variable1[, variable2[, variable3]] [, lookup_table[, distributions]]) 
     106     
     107    A general constructor that, based on the number of feature 
     108    descriptors, constructs one of the three classes discussed. If 
     109    :obj:`lookup_table` and :obj:`distributions` are omitted, the 
     110    constructor also initializes them to two lists of the right sizes, 
     111    but their elements are missing values and empty distributions. If 
     112    they are given, they must be of correct size. 
     113     
     114    .. attribute:: variable1[, variable2[, variable3]] (read only) 
     115         
     116        The feature(s) that the classifier uses for classification. 
     117        :obj:`ClassifierByLookupTable1` only has :obj:`variable1`, 
     118        :obj:`ClassifierByLookupTable2` also has :obj:`variable2` and 
     119        :obj:`ClassifierByLookupTable3` has all three. 
     120 
     121    .. attribute:: variables (read only) 
     122         
     123        The above variables, returned as a tuple. 
     124 
     125    .. attribute:: no_of_values1[, no_of_values2[, no_of_values3]] (read only) 
     126         
     127        The number of values for :obj:`variable1`, :obj:`variable2` 
     128        and :obj:`variable3`. This is stored here to make the 
     129        classifier faster. These attributes are defined only for 
     130        :obj:`ClassifierByLookupTable2` (the first two) and 
     131        :obj:`ClassifierByLookupTable3` (all three). 
     132 
     133    .. attribute:: lookup_table (read only) 
     134         
     135        A list of values, one for each possible combination of 
     136        features. For :obj:`ClassifierByLookupTable1`, there is an 
     137        additional element that is returned when the feature's value 
     138        is unknown. Values are ordered by values of features, with 
     139        :obj:`variable1` being the most important. For instance, for 
     140        two three-valued features, the elements of :obj:`lookup_table` 
     141        correspond to combinations (1, 1), (1, 2), (1, 3), (2, 1), (2, 
     142        2), (2, 3), (3, 1), (3, 2), (3, 3). 
     143         
     144        The attribute is read-only; it cannot be assigned a new list, 
     145        but the existing list can be changed. Changing its size will 
     146        most likely crash Orange. 
     147 
     148    .. attribute:: distributions (read only) 
     149         
     150        Similar to :obj:`lookup_table`, but storing a distribution for 
     151        each combination of values.  
     152 
     153    .. attribute:: data_description 
     154         
     155        An object of type :obj:`EFMDataDescription`, defined only for 
     156        :obj:`ClassifierByLookupTable2` and 
     157        :obj:`ClassifierByLookupTable3`. They use it to make 
     158        predictions when one or more feature values are missing. 
     159        :obj:`ClassifierByLookupTable1` does not need it since this 
     160        case is covered by an additional element in 
     161        :obj:`lookup_table` and :obj:`distributions`, as described 
     162        above. 
     163         
     164    .. method:: get_index(inst) 
     165     
     166        Returns the index of the element in :obj:`lookup_table` and 
     167        :obj:`distributions` that corresponds to the given data 
     168        instance ``inst``. The formula depends upon the type of the 
     169        classifier. If value\ *i* is int(inst[variable\ *i*]), then 
     170        the corresponding formulae are 
     171 
     172        ``ClassifierByLookupTable1``: 
     173            index = value1, or len(lookup_table) - 1 if value of :obj:`variable1` is missing 
     174 
     175        ``ClassifierByLookupTable2``: 
     176            index = value1 * no_of_values2 + value2, or -1 if ``value1`` or ``value2`` is missing 
     177 
     178        ``ClassifierByLookupTable3``: 
     179            index = (value1 * no_of_values2 + value2) * no_of_values3 + value3, or -1 if any value is missing 
     180 
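The index computation can be illustrated in plain Python. This is only a sketch under assumptions: it follows the ordering stated for :obj:`lookup_table` (``variable1`` varies slowest), the function name ``lookup_index`` is hypothetical, and a missing value is represented by ``None`` rather than by an Orange value.

```python
def lookup_index(values, sizes):
    """Return the position in a flat lookup table (hypothetical helper).

    values -- integer indices of the feature values, None if missing
    sizes  -- number of values of each feature, in the same order
    """
    if any(v is None for v in values):
        # One feature: the extra last element of the table handles the
        # missing value. Two or three features: -1 signals "missing".
        return sizes[0] if len(values) == 1 else -1
    index = 0
    for value, size in zip(values, sizes):
        # The first feature ends up multiplied by the sizes of all the
        # features that follow it, so it varies slowest.
        index = index * size + value
    return index
```

For two three-valued features, ``lookup_index((1, 1), (3, 3))`` gives 4, the fifth element in the ordering listed above, which is the combination (2, 2).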
     181.. py:class:: ClassifierByLookupTable1(class_var, variable1[, lookup_table[, distributions]]) 
     182     
     183    Uses a single feature for lookup. See 
     184    :obj:`ClassifierByLookupTable` for more details. 
     185 
     186.. py:class:: ClassifierByLookupTable2(class_var, variable1, variable2[, lookup_table[, distributions]]) 
     187     
     188    Uses two features for lookup. See 
     189    :obj:`ClassifierByLookupTable` for more details. 
     190         
     191.. py:class:: ClassifierByLookupTable3(class_var, variable1, variable2, variable3[, lookup_table[, distributions]]) 
     192     
     193    Uses three features for lookup. See 
     194    :obj:`ClassifierByLookupTable` for more details. 
     195 
     196 
     197Classifier by data table 
     198======================== 
     199 
     200.. index:: 
     201   single: classification; data table 
     202 
     203:obj:`ClassifierByDataTable` is used in contexts similar to those of 
     204:obj:`ClassifierByLookupTable`. The class is much slower, so :obj:`ClassifierByLookupTable` is recommended when there are fewer than four features. 
     205 
     206.. py:class:: ClassifierByDataTable 
     207 
     208    :obj:`ClassifierByDataTable` is the alternative to 
     209    :obj:`ClassifierByLookupTable` for more than three features. 
     210    Instead of having a lookup table, it stores the data in 
     211    :obj:`Orange.data.Table` that is optimized for faster access. 
     212     
     213    .. attribute:: sorted_examples 
     214         
     215        An :obj:`Orange.data.Table` with sorted data instances for 
     216        lookup.  If there were multiple instances with the same 
     217        feature values (but possibly different classes) in the 
     218        original data, they can be merged into a single 
     219        instance. Regardless of merging, class values in this table 
     220        are distributed: their ``svalue`` contains a 
     221        :obj:`~Orange.statistics.distribution.Distribution`. 
     222 
     223    .. attribute:: classifier_for_unknown 
     224         
     225        The classifier for instances that are not found in the 
     226        table. If not set, :obj:`ClassifierByDataTable` returns a 
     227        missing value for such instances. 
     228 
     229    .. attribute:: variables (read only) 
     230         
     231        A tuple with features in the domain. Equal to 
     232        :obj:`domain.features`, but here for similarity with 
     233        :obj:`ClassifierByLookupTable`. 
     234 
     235 
     236 
     237.. py:class:: LookupLearner 
     238     
     239    A learner that constructs a table for 
     240    :obj:`ClassifierByDataTable.sorted_examples`. It sorts the data 
     241    instances and merges those with the same feature values. 
     242     
     243    The constructor returns an instance of :obj:`LookupLearner`, 
     244    unless the data is provided, in which case it returns a 
     245    :obj:`ClassifierByDataTable`. 
     246 
     247    :obj:`LookupLearner` also supports a different call signature than 
     248    other learners. Besides instances, it accepts a new class 
     249    variable and the features that should be used for 
     250    classification.  
     251 
     252Part of :download:`lookup-table.py <code/lookup-table.py>`: 
     253 
     254.. 
     255    .. literalinclude:: code/lookup-table.py 
     256        :lines: 7-13 
     257 
     258.. testcode:: 
     259         
     260    import Orange 
     261 
     262    table = Orange.data.Table("monks-1") 
     263    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
     264 
     265    table_s = table.select([a, b, e, table.domain.class_var]) 
     266    abe = Orange.classification.lookup.LookupLearner(table_s) 
     267 
     268 
     269In ``table_s``, we have prepared a table in which instances are described 
     270only by ``a``, ``b``, ``e`` and the class. The learner constructs a 
     271:obj:`ClassifierByDataTable` and stores instances from ``table_s`` into its 
     272:obj:`~ClassifierByDataTable.sorted_examples`. Instances are merged so that 
     273there are no duplicates. 
     274 
     275    >>> print len(table_s) 
     276    556 
     277    >>> print len(abe.sorted_examples) 
     278    36 
     279    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
     280    ...     print i 
     281    ['1', '1', '1', '1'] 
     282    ['1', '1', '2', '1'] 
     283    ['1', '1', '3', '1'] 
     284    ['1', '1', '4', '1'] 
     285    ['1', '2', '1', '1'] 
     286    ['1', '2', '2', '0'] 
     287    ['1', '2', '3', '0'] 
     288    ['1', '2', '4', '0'] 
     289    ['1', '3', '1', '1'] 
     290    ['1', '3', '2', '0'] 
     291 
     292Each instance's class value also stores the distribution of classes 
     293for all instances that were merged into it. In our case, the three 
     294features suffice to unambiguously determine the classes and, since 
     295instances cover the entire space, all distributions have 12 
     296instances in one of the classes and none in the other. 
     297 
     298    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
     299    ...     print i, i.get_class().svalue 
     300    ['1', '1', '1', '1'] <0.000, 12.000> 
     301    ['1', '1', '2', '1'] <0.000, 12.000> 
     302    ['1', '1', '3', '1'] <0.000, 12.000> 
     303    ['1', '1', '4', '1'] <0.000, 12.000> 
     304    ['1', '2', '1', '1'] <0.000, 12.000> 
     305    ['1', '2', '2', '0'] <12.000, 0.000> 
     306    ['1', '2', '3', '0'] <12.000, 0.000> 
     307    ['1', '2', '4', '0'] <12.000, 0.000> 
     308    ['1', '3', '1', '1'] <0.000, 12.000> 
     309    ['1', '3', '2', '0'] <12.000, 0.000> 
     310 
     311A typical use of :obj:`ClassifierByDataTable` is to construct a new 
     312feature and put the classifier into its 
     313:obj:`~Orange.feature.Descriptor.get_value_from`. 
     314 
     315    >>> y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
     316    >>> y2.get_value_from = abe 
     317 
     318Although ``abe`` determines the value of ``y2``, ``abe.class_var`` is 
     319still ``y``.  Orange does not complain about the mismatch. 
     320 
     321Using the specific :obj:`LookupLearner`'s call signature can save us 
     322from constructing ``table_s`` and reassigning the 
     323:obj:`~Orange.data.Domain.class_var`, but it still does not set the 
     324:obj:`~Orange.feature.Descriptor.get_value_from`. 
     325 
     326Part of :download:`lookup-table.py <code/lookup-table.py>`:: 
     327 
     328    import Orange 
     329 
     330    table = Orange.data.Table("monks-1") 
     331    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
     332 
     333    y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
     334    abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table) 
     335 
     336For the final example, :obj:`LookupLearner`'s alternative call 
     337arguments offer an easy way to observe feature interactions. For this 
     338purpose, we shall omit ``e``, and construct a 
     339:obj:`ClassifierByDataTable` from ``a`` and ``b`` only (part of 
     340:download:`lookup-table.py <code/lookup-table.py>`): 
     341 
     342.. literalinclude:: code/lookup-table.py 
     343    :lines: 32-35 
     344 
     345The script's output shows how the classes are distributed for different 
     346values of ``a`` and ``b``:: 
     347 
     348    ['1', '1', '1'] <0.000, 48.000> 
     349    ['1', '2', '0'] <36.000, 12.000> 
     350    ['1', '3', '0'] <36.000, 12.000> 
     351    ['2', '1', '0'] <36.000, 12.000> 
     352    ['2', '2', '1'] <0.000, 48.000> 
     353    ['2', '3', '0'] <36.000, 12.000> 
     354    ['3', '1', '0'] <36.000, 12.000> 
     355    ['3', '2', '0'] <36.000, 12.000> 
     356    ['3', '3', '1'] <0.000, 48.000> 
     357 
     358For instance, when ``a`` is '1' and ``b`` is '3', the majority class is '0', 
     359and the class distribution is 36:12 in favor of '0'. 
     360 
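The merging that produces such distributions can be sketched in plain Python. This is an illustration only, not Orange's implementation: the helper names ``class_distributions`` and ``majority_class`` are hypothetical, and the toy rows are invented for the example rather than taken from ``monks-1``.

```python
from collections import Counter, defaultdict

def class_distributions(rows):
    """Group rows by their feature values and count classes per group.

    Each row is a tuple whose last element is the class.
    """
    merged = defaultdict(Counter)
    for row in rows:
        features, cls = tuple(row[:-1]), row[-1]
        merged[features][cls] += 1
    return dict(merged)

def majority_class(distribution):
    """Return the most frequent class in a Counter of class counts."""
    return distribution.most_common(1)[0][0]

# Toy rows (a, b, class); invented for illustration, not monks-1 data.
toy = [("1", "1", "1"), ("1", "3", "0"), ("1", "3", "0"),
       ("1", "3", "1"), ("1", "1", "1")]
dist = class_distributions(toy)
```

Here ``dist[("1", "3")]`` counts two instances of class '0' and one of class '1', so ``majority_class`` returns '0' for that combination.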
     361 
     362Utility functions 
     363================= 
     364 
     365 
     366There are several functions related to the above classes. 
     367 
     368.. function:: lookup_from_function(class_var, bound, function) 
     369 
     370    Construct a :obj:`ClassifierByLookupTable` or 
     371    :obj:`ClassifierByDataTable` with the given bound variables and 
     372    then use the function to initialize the lookup table. 
     373 
     374    The function is given the values of features as integer indices and 
     375    must return an integer index of the ``class_var``'s value. 
     376 
     377    The following example constructs a new feature called ``a=b`` 
     378    whose value will be "yes" when ``a`` and ``b`` are equal and "no" 
     379    when they are not. We will then add the feature to the data set. 
     380     
     381        >>> bound = [table.domain[name] for name in ["a", "b"]] 
     382        >>> new_var = Orange.feature.Discrete("a=b", values=["no", "yes"]) 
     383        >>> lookup = Orange.classification.lookup.lookup_from_function(new_var, bound, lambda x: x[0] == x[1]) 
     384        >>> new_var.get_value_from = lookup 
     385        >>> import orngCI 
     386        >>> table2 = orngCI.addAnAttribute(new_var, table) 
     387        >>> for i in table2[:30]: 
     388        ...     print i 
     389        ['1', '1', '1', '1', '3', '1', 'yes', '1'] 
     390        ['1', '1', '1', '1', '3', '2', 'yes', '1'] 
     391        ['1', '1', '1', '3', '2', '1', 'yes', '1'] 
     392        ... 
     393        ['1', '2', '1', '1', '1', '2', 'no', '1'] 
     394        ['1', '2', '1', '1', '2', '1', 'no', '0'] 
     395        ['1', '2', '1', '1', '3', '1', 'no', '0'] 
     396        ... 
     397 
     398    The feature was inserted using ``orngCI.addAnAttribute``. By setting 
     399    ``new_var.get_value_from`` to ``lookup`` we state that when converting domains 
     400    (either when needed by ``addAnAttribute`` or at some other place), ``lookup`` 
     401    should be used to compute ``new_var``'s value. 
     402 
     403.. function:: lookup_from_data(examples [, weight]) 
     404 
     405    Take a set of data instances (e.g. :obj:`Orange.data.Table`) and 
     406    turn it into a classifier. If there are one, two or three features 
     407    and no ambiguous data instances (i.e. no instances with same 
     408    feature values and different classes), it will construct an 
     409    appropriate :obj:`ClassifierByLookupTable`. Otherwise, it will 
     410    return a :obj:`ClassifierByDataTable`. 
     411     
     412        >>> lookup = Orange.classification.lookup.lookup_from_data(table) 
     413        >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?']) 
     414        >>> lookup(test_instance) 
     415        <orange.Value 'y'='0'> 
     416     
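The choice between the two kinds of classifiers, as described for ``lookup_from_data``, can be sketched in plain Python. The function ``choose_lookup_kind`` is hypothetical and only returns the name of the class that the description says would be constructed; it does not build any Orange object.

```python
def choose_lookup_kind(rows):
    """Name the classifier kind suggested by the rule described above.

    Each row is a tuple of feature values followed by the class.
    """
    n_features = len(rows[0]) - 1
    first_class = {}
    ambiguous = False
    for row in rows:
        key, cls = tuple(row[:-1]), row[-1]
        # Remember the first class seen for these feature values; a
        # different class later makes the data ambiguous.
        if first_class.setdefault(key, cls) != cls:
            ambiguous = True
            break
    if 1 <= n_features <= 3 and not ambiguous:
        return "ClassifierByLookupTable%d" % n_features
    return "ClassifierByDataTable"
```

With two features and consistent classes the sketch returns ``ClassifierByLookupTable2``; a single conflicting pair of rows, or a fourth feature, switches it to ``ClassifierByDataTable``.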
     417.. function:: dump_lookup_function(func) 
     418 
     419    Returns a string with a lookup function. Argument ``func`` can be 
     420    any of the above-mentioned classifiers or a feature whose 
     421    :obj:`~Orange.feature.Descriptor.get_value_from` contains one of 
     422    such classifiers. 
     423 
     424    For instance, if ``lookup`` is constructed as in the example for 
     425    ``lookup_from_function``, it can be printed by:: 
     426     
     427        >>> print dump_lookup_function(lookup) 
     428        a      b      a=b 
     429        ------ ------ ------ 
     430        1      1      yes 
     431        1      2      no 
     432        1      3      no 
     433        2      1      no 
     434        2      2      yes 
     435        2      3      no 
     436        3      1      no 
     437        3      2      no 
     438        3      3      yes 
     439 
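A table in this layout can be produced with a few lines of plain Python. The sketch below is not the implementation of ``dump_lookup_function``; the helper ``dump_table`` is hypothetical, and the fixed column width of six characters is an assumption read off the sample output above.

```python
from itertools import product

def dump_table(names, value_lists, class_name, rule, width=6):
    """Format every combination of feature values with its class.

    rule -- function mapping a tuple of feature values to a class label
    """
    def fmt(cells):
        # Left-justify each cell and drop trailing padding.
        return " ".join(c.ljust(width) for c in cells).rstrip()

    lines = [fmt(list(names) + [class_name]),
             " ".join(["-" * width] * (len(names) + 1))]
    for combo in product(*value_lists):
        lines.append(fmt(list(combo) + [rule(combo)]))
    return "\n".join(lines)

text = dump_table(["a", "b"], [["1", "2", "3"]] * 2, "a=b",
                  lambda v: "yes" if v[0] == v[1] else "no")
```

Printing ``text`` gives a header, a line of dashes and nine rows, one for each combination of ``a`` and ``b``.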