source: orange/docs/reference/rst/Orange.classification.lookup.rst @ 10347:566f380f0bc5

Revision 10347:566f380f0bc5, 17.1 KB checked in by janezd <janez.demsar@…>, 2 years ago (diff)

Dedostoyevskied documentation on lookup learners, changed the order of classifiers in the index

.. py:currentmodule:: Orange.classification.lookup

.. index:: classification; lookup

*******************************
Lookup classifiers (``lookup``)
*******************************

Lookup classifiers predict classes by looking into stored lists of
cases. There are two kinds of such classifiers in Orange. The simpler
and faster :obj:`ClassifierByLookupTable` uses up to three discrete
features and has a stored mapping from values of those features to the
class value. The more complex classifiers store an
:obj:`Orange.data.Table` and predict the class by matching the
instance to instances in the table.

.. index::
   single: feature construction; lookup classifiers

A natural habitat for these classifiers is feature construction: they
usually reside in :obj:`~Orange.feature.Descriptor.get_value_from`
fields of constructed features to facilitate their automatic
computation. For instance, the following script shows how to translate
the ``monks-1.tab`` data set features into a more useful subset that
will only include the features ``a``, ``b``, ``e``, and features that
will tell whether ``a`` and ``b`` are equal and whether ``e`` is 1
(part of :download:`lookup-lookup.py <code/lookup-lookup.py>`):

..
    .. literalinclude:: code/lookup-lookup.py
        :lines: 7-21

.. testcode::

    import Orange

    monks = Orange.data.Table("monks-1")

    a, b, e = monks.domain["a"], monks.domain["b"], monks.domain["e"]

    ab = Orange.feature.Discrete("a==b", values = ["no", "yes"])
    ab.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(ab, a, b,
                        ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"])

    e1 = Orange.feature.Discrete("e==1", values = ["no", "yes"])
    e1.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(e1, e,
                        ["yes", "no", "no", "no", "?"])

    monks2 = monks.select([a, b, ab, e, e1, monks.domain.class_var])

We can check the correctness of the script by printing out several
random examples from ``monks2``.

    >>> for i in range(5):
    ...     print monks2.randomexample()
    ['3', '2', 'no', '2', 'no', '0']
    ['2', '2', 'yes', '2', 'no', '1']
    ['1', '2', 'no', '2', 'no', '0']
    ['2', '3', 'no', '1', 'yes', '1']
    ['1', '3', 'no', '1', 'yes', '1']

The first :obj:`ClassifierByLookupTable` takes the values of features ``a``
and ``b`` and computes the value of ``ab`` according to the given table.
The first three values correspond to ``a=1`` and ``b=1,2,3``;
for the first combination, the value of ``ab`` should be "yes"; for the other
two, ``a`` and ``b`` are different. The next triplet corresponds to ``a=2``;
here, the middle value is "yes", and so on.

The second lookup is simpler: since it involves only a single feature,
the list is a simple one-to-one mapping from the four-valued ``e`` to the
two-valued ``e1``. The last value in the list is returned when ``e`` is unknown
and tells that ``e1`` should be unknown then as well.
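
The layout of both tables can be mimicked with plain Python lists. The
following is only a sketch that does not use Orange: the names
``AB_TABLE``, ``E1_TABLE``, ``lookup_ab`` and ``lookup_e1`` are made up
for illustration, values are given as zero-based indices, and ``None``
stands for an unknown value.

.. code-block:: python

    # Entries are grouped in triplets: a selects the triplet, b the position in it.
    AB_TABLE = ["yes", "no", "no",   # a=1, b=1,2,3
                "no", "yes", "no",   # a=2, b=1,2,3
                "no", "no", "yes"]   # a=3, b=1,2,3

    # Four entries for e=1..4 plus a final entry used when e is unknown.
    E1_TABLE = ["yes", "no", "no", "no", "?"]


    def lookup_ab(a, b, n_b=3):
        """Return the entry for the zero-based value indices a and b."""
        return AB_TABLE[a * n_b + b]


    def lookup_e1(e):
        """Return the entry for e, or the last entry if e is unknown."""
        return E1_TABLE[-1] if e is None else E1_TABLE[e]


    print(lookup_ab(1, 1))   # a=2, b=2: prints yes
    print(lookup_e1(None))   # unknown e: prints ?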

Note that :obj:`ClassifierByLookupTable` is not strictly needed for this.
A feature equivalent to ``e1``, called ``e2`` below, could be computed with
a callback to Python, for instance::

    e2 = Orange.feature.Discrete("e==1", values = ["no", "yes"])
    e2.get_value_from = lambda ex, rw: Orange.data.Value(e2, ex["e"] == "1")


Classifiers by lookup table
===========================

.. index::
   single: classification; lookup table

Although the above example used :obj:`ClassifierByLookupTable` as if
it were a concrete class, :obj:`ClassifierByLookupTable` is actually
abstract. Calling its constructor does not return an instance of
:obj:`ClassifierByLookupTable`, but an instance of either
:obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or
:obj:`ClassifierByLookupTable3`, which take one (``e``, above), two
(like ``a`` and ``b``) or three features, respectively. Class
predictions for each combination of feature values are stored in a
(one-dimensional) table. To classify an instance, the classifier
computes the index of the table element that corresponds to the
combination of feature values.

These classifiers are built to be fast, not safe. For instance, if the
number of values for one of the features is changed, Orange will most
probably crash. To alleviate this, many of these classes' attributes
are read-only and can only be set when the object is constructed.


.. py:class:: ClassifierByLookupTable(class_var, variable1[, variable2[, variable3]] [, lookup_table[, distributions]])

    A general constructor that, based on the number of feature
    descriptors, constructs one of the three classes discussed. If
    :obj:`lookup_table` and :obj:`distributions` are omitted, the
    constructor also initializes them to two lists of the right sizes,
    but their elements are missing values and empty distributions. If
    they are given, they must be of the correct size.

    .. attribute:: variable1[, variable2[, variable3]] (read only)

        The feature(s) that the classifier uses for classification.
        :obj:`ClassifierByLookupTable1` only has :obj:`variable1`,
        :obj:`ClassifierByLookupTable2` also has :obj:`variable2` and
        :obj:`ClassifierByLookupTable3` has all three.

    .. attribute:: variables (read only)

        The above variables, returned as a tuple.

    .. attribute:: no_of_values1[, no_of_values2[, no_of_values3]] (read only)

        The number of values for :obj:`variable1`, :obj:`variable2`
        and :obj:`variable3`. This is stored here to make the
        classifier faster. These attributes are defined only for
        :obj:`ClassifierByLookupTable2` (the first two) and
        :obj:`ClassifierByLookupTable3` (all three).

    .. attribute:: lookup_table (read only)

        A list of values, one for each possible combination of
        features. For :obj:`ClassifierByLookupTable1`, there is an
        additional element that is returned when the feature's value
        is unknown. Values are ordered by values of features, with
        :obj:`variable1` being the most significant. For instance, for
        two three-valued features, the elements of :obj:`lookup_table`
        correspond to combinations (1, 1), (1, 2), (1, 3), (2, 1), (2,
        2), (2, 3), (3, 1), (3, 2), (3, 3).

        The attribute is read-only; it cannot be assigned a new list,
        but the existing list can be changed. Changing its size will
        most likely crash Orange.

    .. attribute:: distributions (read only)

        Similar to :obj:`lookup_table`, but storing a distribution for
        each combination of values.

    .. attribute:: data_description

        An object of type :obj:`EFMDataDescription`, defined only for
        :obj:`ClassifierByLookupTable2` and
        :obj:`ClassifierByLookupTable3`. They use it to make
        predictions when one or more feature values are missing.
        :obj:`ClassifierByLookupTable1` does not need it since this
        case is covered by an additional element in
        :obj:`lookup_table` and :obj:`distributions`, as described
        above.

    .. method:: get_index(inst)

        Returns the index in :obj:`lookup_table` and
        :obj:`distributions` that corresponds to the given data
        instance ``inst``. The formula depends upon the type of the
        classifier. If value\ *i* is int(inst[variable\ *i*]), then
        the corresponding formulae are

        ``ClassifierByLookupTable1``:
            index = value1, or len(lookup_table) - 1 if value of :obj:`variable1` is missing

        ``ClassifierByLookupTable2``:
            index = value1 * no_of_values2 + value2, or -1 if ``value1`` or ``value2`` is missing

        ``ClassifierByLookupTable3``:
            index = (value1 * no_of_values2 + value2) * no_of_values3 + value3, or -1 if any value is missing
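
        The index computation can be sketched in plain Python. This is
        an illustration, not Orange's implementation:
        ``get_index_sketch`` is a hypothetical helper and ``None``
        stands for a missing value. The sketch assumes the row-major
        layout described for :obj:`lookup_table`, in which each step
        multiplies by the number of values of the next feature; this
        is what keeps indices unique when the features have different
        numbers of values.

        .. code-block:: python

            def get_index_sketch(values, sizes, table_len=None):
                """Compute a row-major index for one, two or three feature values.

                values: zero-based value indices, None for a missing value
                sizes: number of values of each feature, in the same order
                table_len: length of the lookup table; used only for one feature
                """
                if len(values) == 1:
                    # The single-feature table keeps an extra last slot for unknowns.
                    return table_len - 1 if values[0] is None else values[0]
                if any(v is None for v in values):
                    return -1
                index = 0
                for value, size in zip(values, sizes):
                    index = index * size + value
                return index


            print(get_index_sketch([1, 2], [3, 3]))      # combination (2, 3): prints 5
            print(get_index_sketch([None], [4], 5))      # unknown value: prints 4

        The size of the first feature never enters the product because
        ``index`` starts at zero; it would matter only for range checks.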

.. py:class:: ClassifierByLookupTable1(class_var, variable1[, lookup_table[, distributions]])

    Uses a single feature for lookup. See
    :obj:`ClassifierByLookupTable` for more details.

.. py:class:: ClassifierByLookupTable2(class_var, variable1, variable2[, lookup_table[, distributions]])

    Uses two features for lookup. See
    :obj:`ClassifierByLookupTable` for more details.

.. py:class:: ClassifierByLookupTable3(class_var, variable1, variable2, variable3[, lookup_table[, distributions]])

    Uses three features for lookup. See
    :obj:`ClassifierByLookupTable` for more details.


Classifier by data table
========================

.. index::
   single: classification; data table

:obj:`ClassifierByDataTable` is used in similar contexts as
:obj:`ClassifierByLookupTable`. The class is much slower, so it is
recommended to use :obj:`ClassifierByLookupTable` if the number of
features is less than four.

.. py:class:: ClassifierByDataTable

    :obj:`ClassifierByDataTable` is the alternative to
    :obj:`ClassifierByLookupTable` for more than three features.
    Instead of having a lookup table, it stores the data in an
    :obj:`Orange.data.Table` that is optimized for faster access.

    .. attribute:: sorted_examples

        An :obj:`Orange.data.Table` with sorted data instances for
        lookup.  If there were multiple instances with the same
        feature values (but possibly different classes) in the
        original data, they can be merged into a single
        instance. Regardless of merging, class values in this table
        are distributed: their ``svalue`` contains a
        :obj:`~Orange.statistics.distribution.Distribution`.

    .. attribute:: classifier_for_unknown

        The classifier for instances that are not found in the
        table. If not set, :obj:`ClassifierByDataTable` returns
        a missing value for such instances.

    .. attribute:: variables (read only)

        A tuple with features in the domain. Equal to
        :obj:`domain.features`, but here for similarity with
        :obj:`ClassifierByLookupTable`.
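
    The interplay between the stored instances and
    :obj:`classifier_for_unknown` can be sketched without Orange. The
    class ``TableLookupSketch`` below is hypothetical and uses a
    dictionary instead of a sorted table; ``None`` stands for a
    missing value.

    .. code-block:: python

        class TableLookupSketch(object):
            """Look up stored feature combinations; a stand-in, not Orange code."""

            def __init__(self, rows, classifier_for_unknown=None):
                # rows: iterable of (feature_tuple, class_value) pairs
                self.rows = dict(rows)
                self.classifier_for_unknown = classifier_for_unknown

            def __call__(self, features):
                if features in self.rows:
                    return self.rows[features]
                if self.classifier_for_unknown is not None:
                    # Delegate combinations that are not in the table.
                    return self.classifier_for_unknown(features)
                return None  # stands for a missing value


        known = TableLookupSketch([(("1", "1"), "1"), (("1", "2"), "0")])
        with_default = TableLookupSketch([(("1", "1"), "1")], lambda features: "0")

        print(known(("1", "2")))          # stored combination: prints 0
        print(known(("3", "3")))          # not stored, no fallback: prints None
        print(with_default(("3", "3")))   # not stored, fallback used: prints 0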

.. py:class:: LookupLearner

    A learner that constructs a table for
    :obj:`ClassifierByDataTable.sorted_examples`. It sorts the data
    instances and merges those with the same feature values.

    The constructor returns an instance of :obj:`LookupLearner`,
    unless the data is provided, in which case it returns a
    :obj:`ClassifierByDataTable`.

    :obj:`LookupLearner` also supports a different call signature than
    other learners. Besides instances, it accepts a new class
    variable and the features that should be used for
    classification.

Part of :download:`lookup-table.py <code/lookup-table.py>`:

..
    .. literalinclude:: code/lookup-table.py
        :lines: 7-13

.. testcode::

    import Orange

    table = Orange.data.Table("monks-1")
    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"]

    table_s = table.select([a, b, e, table.domain.class_var])
    abe = Orange.classification.lookup.LookupLearner(table_s)

In ``table_s``, we have prepared a table in which instances are described
only by ``a``, ``b``, ``e`` and the class. The learner constructs a
:obj:`ClassifierByDataTable` and stores instances from ``table_s`` into its
:obj:`~ClassifierByDataTable.sorted_examples`. Instances are merged so that
there are no duplicates.

    >>> print len(table_s)
    556
    >>> print len(abe.sorted_examples)
    36
    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP
    ...     print i
    ['1', '1', '1', '1']
    ['1', '1', '2', '1']
    ['1', '1', '3', '1']
    ['1', '1', '4', '1']
    ['1', '2', '1', '1']
    ['1', '2', '2', '0']
    ['1', '2', '3', '0']
    ['1', '2', '4', '0']
    ['1', '3', '1', '1']
    ['1', '3', '2', '0']

Each instance's class value also stores the distribution of classes
for all instances that were merged into it. In our case, the three
features suffice to unambiguously determine the classes and, since
instances cover the entire space, all distributions have 12
instances in one of the classes and none in the other.

    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP
    ...     print i, i.get_class().svalue
    ['1', '1', '1', '1'] <0.000, 12.000>
    ['1', '1', '2', '1'] <0.000, 12.000>
    ['1', '1', '3', '1'] <0.000, 12.000>
    ['1', '1', '4', '1'] <0.000, 12.000>
    ['1', '2', '1', '1'] <0.000, 12.000>
    ['1', '2', '2', '0'] <12.000, 0.000>
    ['1', '2', '3', '0'] <12.000, 0.000>
    ['1', '2', '4', '0'] <12.000, 0.000>
    ['1', '3', '1', '1'] <0.000, 12.000>
    ['1', '3', '2', '0'] <12.000, 0.000>
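
The merging step can be illustrated with a short plain-Python sketch.
It is not the learner's actual code: ``merge_instances`` is a
hypothetical helper, instances are given as ``(features, class_index)``
pairs, and ties between classes are resolved in favor of the lowest
class index, which is an assumption of the sketch.

.. code-block:: python

    from collections import defaultdict


    def merge_instances(rows, n_classes=2):
        """Merge rows with equal feature values and count their classes.

        rows: iterable of (feature_tuple, class_index) pairs
        Returns a list of (feature_tuple, majority_class, distribution),
        sorted by feature values.
        """
        counts = defaultdict(lambda: [0] * n_classes)
        for features, cls in rows:
            counts[features][cls] += 1
        merged = []
        for features in sorted(counts):
            distribution = counts[features]
            # Ties go to the lowest class index; an assumption of this sketch.
            majority = distribution.index(max(distribution))
            merged.append((features, majority, distribution))
        return merged


    rows = [(("1", "1"), 1)] * 3 + [(("1", "2"), 0)] * 2 + [(("1", "2"), 1)]
    for features, majority, distribution in merge_instances(rows):
        print("%s %d %s" % (list(features), majority, distribution))

Six input rows collapse into two merged instances; the second one keeps
the distribution ``[2, 1]`` and gets the majority class 0.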

A typical use of :obj:`ClassifierByDataTable` is to construct a new
feature and put the classifier into its
:obj:`~Orange.feature.Descriptor.get_value_from`.

    >>> y2 = Orange.feature.Discrete("y2", values = ["0", "1"])
    >>> y2.get_value_from = abe

Although ``abe`` determines the value of ``y2``, ``abe.class_var`` is
still ``y``.  Orange does not complain about the mismatch.

Using :obj:`LookupLearner`'s specific call signature can save us
from constructing ``table_s`` and reassigning the
:obj:`~Orange.data.Domain.class_var`, but it still does not set the
:obj:`~Orange.feature.Descriptor.get_value_from`.

Part of :download:`lookup-table.py <code/lookup-table.py>`::

    import Orange

    table = Orange.data.Table("monks-1")
    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"]

    y2 = Orange.feature.Discrete("y2", values = ["0", "1"])
    abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table)

For the final example, :obj:`LookupLearner`'s alternative call
signature offers an easy way to observe feature interactions. For this
purpose, we shall omit ``e`` and construct a
:obj:`ClassifierByDataTable` from ``a`` and ``b`` only (part of
:download:`lookup-table.py <code/lookup-table.py>`):

.. literalinclude:: code/lookup-table.py
    :lines: 32-35

The script's output shows how the classes are distributed for different
values of ``a`` and ``b``::

    ['1', '1', '1'] <0.000, 48.000>
    ['1', '2', '0'] <36.000, 12.000>
    ['1', '3', '0'] <36.000, 12.000>
    ['2', '1', '0'] <36.000, 12.000>
    ['2', '2', '1'] <0.000, 48.000>
    ['2', '3', '0'] <36.000, 12.000>
    ['3', '1', '0'] <36.000, 12.000>
    ['3', '2', '0'] <36.000, 12.000>
    ['3', '3', '1'] <0.000, 48.000>

For instance, when ``a`` is '1' and ``b`` is '3', the majority class is '0',
and the class distribution is 36:12 in favor of '0'.


Utility functions
=================

There are several functions related to the above classes.

.. function:: lookup_from_function(class_var, bound, function)

    Construct a :obj:`ClassifierByLookupTable` or
    :obj:`ClassifierByDataTable` with the given bound variables and
    then use the function to initialize the lookup table.

    The function is given the values of features as integer indices and
    must return an integer index of the ``class_var``'s value.

    The following example constructs a new feature called ``a=b``
    whose value will be "yes" when ``a`` and ``b`` are equal and "no"
    when they are not. We will then add the feature to the data set.

        >>> bound = [table.domain[name] for name in ["a", "b"]]
        >>> new_var = Orange.feature.Discrete("a=b", values=["no", "yes"])
        >>> lookup = Orange.classification.lookup.lookup_from_function(new_var, bound, lambda x: x[0] == x[1])
        >>> new_var.get_value_from = lookup
        >>> import orngCI
        >>> table2 = orngCI.addAnAttribute(new_var, table)
        >>> for i in table2[:30]:
        ...     print i
        ['1', '1', '1', '1', '3', '1', 'yes', '1']
        ['1', '1', '1', '1', '3', '2', 'yes', '1']
        ['1', '1', '1', '3', '2', '1', 'yes', '1']
        ...
        ['1', '2', '1', '1', '1', '2', 'no', '1']
        ['1', '2', '1', '1', '2', '1', 'no', '0']
        ['1', '2', '1', '1', '3', '1', 'no', '0']
        ...

    The feature was inserted using ``orngCI.addAnAttribute``. By setting
    ``new_var.get_value_from`` to ``lookup`` we state that when converting domains
    (either when needed by ``addAnAttribute`` or at some other place), ``lookup``
    should be used to compute ``new_var``'s value.
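
    How a function can fill a lookup table is easy to sketch in plain
    Python. The helper ``table_from_function`` is hypothetical and
    only mirrors the idea: the function is called once for every
    combination of value indices, in row-major order, and its result
    is stored as the index of the class value.

    .. code-block:: python

        from itertools import product


        def table_from_function(sizes, function):
            """Call function on every combination of value indices.

            sizes: number of values of each bound feature
            Returns a list of class value indices in row-major order.
            """
            combinations = product(*(range(n) for n in sizes))
            return [int(function(combo)) for combo in combinations]


        class_values = ["no", "yes"]
        indices = table_from_function([3, 3], lambda x: x[0] == x[1])
        print([class_values[i] for i in indices])

    For two three-valued features this gives "yes" at positions 0, 4
    and 8 and "no" elsewhere, the same list that was written out by
    hand for ``a==b`` at the beginning of this page.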

.. function:: lookup_from_data(examples [, weight])

    Take a set of data instances (e.g. :obj:`Orange.data.Table`) and
    turn it into a classifier. If there are one, two or three features
    and no ambiguous data instances (i.e. no instances with the same
    feature values and different classes), it will construct an
    appropriate :obj:`ClassifierByLookupTable`. Otherwise, it will
    return a :obj:`ClassifierByDataTable`.

        >>> lookup = Orange.classification.lookup.lookup_from_data(table)
        >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?'])
        >>> lookup(test_instance)
        <orange.Value 'y'='0'>
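
    The selection rule described above can be written down as a small
    plain-Python sketch. Both helpers are hypothetical and only
    restate the rule from the text; they do not show how Orange
    implements it. Instances are given as ``(features, class_value)``
    pairs.

    .. code-block:: python

        def has_ambiguous_rows(rows):
            """Return True if two rows share feature values but differ in class."""
            seen = {}
            for features, cls in rows:
                # setdefault stores the first class seen for these feature values.
                if seen.setdefault(features, cls) != cls:
                    return True
            return False


        def choose_representation(n_features, rows):
            """Name the kind of classifier the rule in the text would pick."""
            if 1 <= n_features <= 3 and not has_ambiguous_rows(rows):
                return "lookup table"
            return "data table"


        clean = [(("1", "1"), "1"), (("1", "2"), "0")]
        noisy = clean + [(("1", "2"), "1")]
        print(choose_representation(2, clean))   # prints lookup table
        print(choose_representation(2, noisy))   # prints data table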

.. function:: dump_lookup_function(func)

    Returns a string with the lookup function in tabular form. Argument
    ``func`` can be any of the above-mentioned classifiers or a feature
    whose :obj:`~Orange.feature.Descriptor.get_value_from` contains
    such a classifier.

    For instance, if ``lookup`` is constructed as in the example for
    ``lookup_from_function``, it can be printed by::

        >>> print dump_lookup_function(lookup)
        a      b      a=b
        ------ ------ ------
        1      1      yes
        1      2      no
        1      3      no
        2      1      no
        2      2      yes
        2      3      no
        3      1      no
        3      2      no
        3      3      yes