source: orange/orange/Orange/classification/lookup.py @ 7754:9714af1924c6

Revision 7754:9714af1924c6, 18.1 KB checked in by markotoplak, 3 years ago (diff)

Fixed some links in lookup documentation.

Line 
1"""
2
3.. index:: classification; lookup
4
5******************
6Lookup classifiers
7******************
8
9Lookup classifiers predict classes by looking into stored lists of
10cases. There are two kinds of such classifiers in Orange. The simpler
11and fastest :obj:`ClassifierByLookupTable` use up to three discrete
12features and have a stored mapping from values of those features to
13class value. The more complex classifiers store a
14:obj:`Orange.data.Table` and predict the class by matching the instance
15to instances in the table.
16
17.. index::
18   single: feature construction; lookup classifiers
19
20A natural habitat for these classifiers is feature construction:
21they usually reside in :obj:`getValueFrom` fields of constructed
22features to facilitate their automatic computation. For instance,
23the following script shows how to translate the `monks-1.tab`_ dataset
24features into a more useful subset that will only include the features
25a, b, e, and features that will tell whether a and b are equal and
26whether e is 1 (don't bother about the details, they follow later;
27`lookup-lookup.py`_, uses: `monks-1.tab`_):
28
29.. literalinclude:: code/lookup-lookup.py
30    :lines: 7-21
31   
32We can check the correctness of the script by printing out several
33random examples from data2.
34
35    >>> for i in range(5):
36    ...     print data2.randomexample()
37    ['1', '1', 'yes', '4', 'no', '1']
38    ['3', '3', 'yes', '2', 'no', '1']
39    ['2', '1', 'no', '4', 'no', '0']
40    ['2', '1', 'no', '1', 'yes', '1']
41    ['1', '1', 'yes', '3', 'no', '1']
42
43The first :obj:`ClassifierByLookupTable` takes values of features a
44and b and computes the value of ab according to the rule given in the
45given table. The first three values correspond to a=1 and b=1, 2, 3;
46for the first combination, value of ab should be "yes", for the other
47two a and b are different. The next triplet correspond to a=2;
48here, the middle value is "yes"...
49
50The second lookup is simpler: since it involves only a single feature,
51the list is a simple one-to-one mapping from the four-valued e to the
52two-valued e1. The last value in the list is returned when e is unknown
53and tells that e1 should be unknown then as well.
54
55Note that you don't need :obj:`ClassifierByLookupTable` for this.
56The new feature e1 could be computed with a callback to Python,
57for instance::
58
59    e2.getValueFrom = lambda ex, rw: orange.Value(e2, ex["e"]=="1")
60
61
62Classifiers by lookup table
63===========================
64
65.. index::
66   single: classification; lookup table
67
68Although the above example used :obj:`ClassifierByLookupTable` as if it
69was a concrete class, ClassifierByLookupTable is actually
70abstract. Calling its constructor is a typical Orange trick: what you
71get, is never ClassifierByLookupTable, but either
72:obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or
73:obj:`ClassifierByLookupTable3`. As their names tell, the first
74classifies using a single feature (so that's what we had for e1),
75the second uses a pair of features (and has been constructed for ab
76above), and the third uses three features. Class predictions for each
77combination of feature values are stored in a (one dimensional) table.
78To classify an instance, the classifier computes an index of the element
79of the table that corresponds to the combination of feature values.
80
81These classifiers are built to be fast, not safe. If you, for instance,
82change the number of values for one of the features, Orange will
83most probably crash. To protect you somewhat, many of these classes'
84features are read-only and can only be set when the object is
85constructed.
86
87
88.. py:class:: ClassifierByLookupTable(classVar, variable1[, variable2[, variable3]] [, lookupTable[, distributions]])
89   
90    A general constructor that, based on the number of attribute
91    descriptors, constructs one of the three classes discussed.
92    If lookupTable and distributions are omitted, constructor also
93    initializes lookupTable and distributions to two lists of the
94    right sizes, but their elements are don't knows and empty
95    distributions. If they are given, they must be of correct size.
96   
97    .. attribute:: variable1[, variable2[, variable3]](read only)
98       
99        The feature(s) that the classifier uses for classification.
100        ClassifierByLookupTable1 only has variable1,
101        ClassifierByLookupTable2 also has variable2 and
102        ClassifierByLookupTable3 has all three.
103
104    .. attribute:: variables (read only)
105       
106        The above variables, returned as a tuple.
107
108    .. attribute:: noOfValues1, noOfValues2[, noOfValues3] (read only)
109       
110        The number of values for variable1, variable2 and variable3.
111        This is stored here to make the classifier faster. Those features
112        are defined only for ClassifierByLookupTable2 (the first two) and
113        ClassifierByLookupTable3 (all three).
114
115    .. attribute:: lookupTable (read only)
116       
117        A list of values (ValueList), one for each possible combination of
118        features. For ClassifierByLookupTable1, there is an additional
119        element that is returned when the feature's value is unknown.
120        Values are ordered by values of features, with variable1 being the
121        most important. In case of two three valued features, the list
122        order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3,
123        where the first digit corresponds to variable1 and the second to
124        variable2.
125       
126        The list is read-only in the sense that you cannot assign a new
127        list to this field. You can, however, change its elements. Don't
128        change its size, though.
129
130    .. attribute:: distributions (read only)
131       
132        Similar to :obj:`lookupTable`, but is of type DistributionList
133        and stores a distribution for each combination of values.
134
135    .. attribute:: dataDescription
136       
137        An object of type EFMDataDescription, defined only for
138        ClassifierByLookupTable2 and ClassifierByLookupTable3. They use
139        it to make predictions when one or more feature values are unknown.
140        ClassifierByLookupTable1 doesn't need it since this case is covered
141        by an additional element in lookupTable and distributions,
142        as told above.
143       
144    .. method:: getindex(example)
145   
146        Returns an index into lookupTable or distributions. The formula
147        depends upon the type of the classifier. If value\ *i* is
148        int(example[variable\ *i*]), then the corresponding formulae are
149
150        ClassifierByLookupTable1:
151            index = value1, or len(lookupTable)-1 if value is unknown
152        ClassifierByLookupTable2:
153            index = value1*noOfValues1 + value2, or -1 if any value is unknown
154        ClassifierByLookupTable3:
155            index = (value1*noOfValues1 + value2) * noOfValues2 + value3, or -1 if any value is unknown
156
157        Let's see some indices for randomly chosen examples from the original table.
158       
159        part of `lookup-lookup.py`_ (uses: `monks-1.tab`_):
160
161        .. literalinclude:: code/lookup-lookup.py
162            :lines: 26-29
163       
164        Output::
165       
166            ['1', '1', '2', '2', '4', '1', '1']: ab 0, e1 3
167            ['3', '3', '1', '2', '2', '1', '1']: ab 8, e1 1
168            ['2', '1', '2', '3', '4', '2', '0']: ab 3, e1 3
169            ['2', '1', '1', '2', '1', '1', '1']: ab 3, e1 0
170            ['1', '1', '1', '2', '3', '1', '1']: ab 0, e1 2
171
172
173
174.. py:class:: ClassifierByLookupTable1(classVar, variable1 [, lookupTable, distributions])
175   
176    Uses a single feature for lookup. See
177    :obj:`ClassifierByLookupTable` for more details.
178
179.. py:class:: ClassifierByLookupTable2(classVar, variable1, variable2, [, lookupTable[, distributions]])
180   
181    Uses two features for lookup. See
182    :obj:`ClassifierByLookupTable` for more details.
183       
184.. py:class:: ClassifierByLookupTable3(classVar, variable1, variable2, variable3, [, lookupTable[, distributions]])
185   
186    Uses three features for lookup. See
187    :obj:`ClassifierByLookupTable` for more details.
188
189
190Classifier by data table
191========================
192
193.. index::
194   single: classification; data table
195
196:obj:`ClassifierByExampleTable` is used in similar contexts as
197:obj:`ClassifierByLookupTable`. If you write, for instance, a
198constructive induction algorithm, it is recommended that the values
199of the new feature are computed either by one of classifiers by lookup
200table or by ClassifierByExampleTable, depending on the number of bound
201features.
202
203.. py:class:: ClassifierByExampleTable
204
205    :obj:`ClassifierByExampleTable` is the alternative to
206    :obj:`ClassifierByLookupTable`. It is to be used when the
207    classification is based on more than three features. Instead of having
208    a lookup table, it stores an :obj:`Orange.data.Table`, which is
209    optimized for a faster access.
210   
211
212    .. attribute:: sortedExamples
213       
214        A :obj:`Orange.data.Table` with sorted data instances for lookup.
215        Instances in the table can be merged; if there were multiple
216        instances with the same feature values (but possibly different
217        classes), they are merged into a single instance. Regardless of
218        merging, class values in this table are distributed: their svalue
219        contains a :obj:`Distribution`.
220
221    .. attribute:: classifierForUnknown
222       
223        This classifier is used to classify instances which were not found
224        in the table. If classifierForUnknown is not set, don't know's are
225        returned.
226
227    .. attribute:: variables (read only)
228       
229        A tuple with features in the domain. This field is here so that
230        :obj:`ClassifierByExampleTable` appears more similar to
231        :obj:`ClassifierByLookupTable`. If a constructive induction
232        algorithm returns the result in one of these classifiers, and you
233        would like to check which features are used, you can use variables
234        regardless of the class you actually got.
235
236    There are no specific methods for ClassifierByExampleTable.
237    Since this is a classifier, it can be called. When the instance to be
238    classified includes unknown values, :obj:`classifierForUnknown` will be
239    used if it is defined.
240
241
242
243.. py:class:: LookupLearner
244   
245    Although :obj:`ClassifierByExampleTable` is not really a classifier in
246    the sense that you will use it to classify instances, but is rather a
247    function for computation of intermediate values, it has an associated
248    learner, :obj:`LookupLearner`. The learner's task is, basically, to
249    construct a Table for :obj:`sortedExamples`. It sorts them, merges them
250    and, of course, regards instance weights in the process as well.
251   
252    If data instances are provided to the constructor, the learning algorithm
253    is called and the resulting classifier is returned instead of the learner.
254
255part of `lookup-table.py`_ (uses: `monks-1.tab`_):
256
257.. literalinclude:: code/lookup-table.py
258    :lines: 7-13
259
260
261In data_s, we have prepared a table in which instances are described
262only by a, b, e and the class. Learner constructs a
263ClassifierByExampleTable and stores instances from data_s into its
264sortedExamples. Instances are merged so that there are no duplicates.
265
266    >>> print len(data_s)
267    432
268    >>> print len(abe2.sortedExamples)
269    36
270    >>> for i in abe2.sortedExamples[:5]:
271    ...     print i
272    ['1', '1', '1', '1']
273    ['1', '1', '2', '1']
274    ['1', '1', '3', '1']
275    ['1', '1', '4', '1']
276    ['1', '2', '1', '1']
277    ['1', '2', '2', '0']
278    ['1', '2', '3', '0']
279    ['1', '2', '4', '0']
280    ['1', '3', '1', '1']
281    ['1', '3', '2', '0']
282
283Well, there's a bit more here than meets the eye: each instance's class
284value also stores the distribution of classes for all instances that
285were merged into it. In our case, the three features suffice to
286unambiguously determine the classes and, since instances covered the
287entire space, all distributions have 12 instances in one of the class
288and none in the other.
289
290    >>> for i in abe2.sortedExamples[:10]:
291    ...     print i, i.getclass().svalue
292    ['1', '1', '1', '1'] <0.000, 12.000>
293    ['1', '1', '2', '1'] <0.000, 12.000>
294    ['1', '1', '3', '1'] <0.000, 12.000>
295    ['1', '1', '4', '1'] <0.000, 12.000>
296    ['1', '2', '1', '1'] <0.000, 12.000>
297    ['1', '2', '2', '0'] <12.000, 0.000>
298    ['1', '2', '3', '0'] <12.000, 0.000>
299    ['1', '2', '4', '0'] <12.000, 0.000>
300    ['1', '3', '1', '1'] <0.000, 12.000>
301    ['1', '3', '2', '0'] <12.000, 0.000>
302
303ClassifierByExampleTable will usually be used by getValueFrom. So, we
304would probably continue this by constructing a new feature and put the
305classifier into its getValueFrom.
306
307    >>> y2 = orange.EnumVariable("y2", values = ["0", "1"])
308    >>> y2.getValueFrom = abe
309
310There's something disturbing here. Although abe determines the value of
311y2, abe.classVar is still y. Orange doesn't bother (the whole example
312is artificial - you will seldom pack the entire data set in an
313ClassifierByExampleTable...), so shouldn't you. But still, for the sake
314of hygiene, you can conclude by
315
316    >>> abe.classVar = y2
317
318The whole story can be greatly simplified. LookupLearner can also be
319called differently than other learners. Besides instances, you can pass
320the new class variable and the features that should be used for
321classification. This saves us from constructing data_s and reassigning
322the classVar. It doesn't set the getValueFrom, though.
323
324part of `lookup-table.py`_ (uses: `monks-1.tab`_)::
325
326    import Orange
327
328    table = Orange.data.Table("monks-1")
329    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"]
330
331    y2 = Orange.data.variable.Discrete("y2", values = ["0", "1"])
332    abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table)
333
334Let us, for the end, show another use of LookupLearner. With the
335alternative call arguments, it offers an easy way to observe feature
336interactions. For this purpose, we shall omit e, and construct a
337ClassifierByExampleTable from a and b only (part of `lookup-table.py`_; uses: `monks-1.tab`_):
338
339.. literalinclude:: code/lookup-table.py
340    :lines: 32-35
341
342The script's output show how the classes are distributed for different
343values of a and b::
344
345    ['1', '1', '1'] <0.000, 48.000>
346    ['1', '2', '0'] <36.000, 12.000>
347    ['1', '3', '0'] <36.000, 12.000>
348    ['2', '1', '0'] <36.000, 12.000>
349    ['2', '2', '1'] <0.000, 48.000>
350    ['2', '3', '0'] <36.000, 12.000>
351    ['3', '1', '0'] <36.000, 12.000>
352    ['3', '2', '0'] <36.000, 12.000>
353    ['3', '3', '1'] <0.000, 48.000>
354
355For instance, when a is '1' and b is '3', the majority class is '0',
356and the class distribution is 36:12 in favor of '0'.
357
358
359.. _lookup-lookup.py: code/lookup-lookup.py
360.. _lookup-table.py: code/lookup-table.py
361.. _monks-1.tab: code/monks-1.tab
362
363"""
364
365import Orange.data
366from Orange.core import \
367        LookupLearner, \
368         ClassifierByLookupTable, \
369              ClassifierByLookupTable1, \
370              ClassifierByLookupTable2, \
371              ClassifierByLookupTable3, \
372              ClassifierByExampleTable as ClassifierByDataTable
373
374
375def lookup_from_bound(attribute, bound):
376    if not len(bound):
377        raise TypeError, "no bound attributes"
378    elif len(bound) <= 3:
379        return apply([ClassifierByLookupTable, ClassifierByLookupTable2,
380                      ClassifierByLookupTable3][len(bound) - 1],
381                     [attribute] + list(bound))
382    else:
383        return None
384
385   
386def lookup_from_function(attribute, bound, function):
387    """Constructs ClassifierByExampleTable or ClassifierByLookupTable
388    mirroring the given function
389   
390    """
391    lookup = lookup_from_bound(attribute, bound)
392    if lookup:
393        lookup.lookupTable = [Orange.data.Value(attribute, function(attributes))
394                              for attributes in Orange.misc.counters.LimitedCounter(
395                                  [len(attr.values) for attr in bound])]
396        return lookup
397    else:
398        examples = Orange.data.Table(Orange.data.Domain(bound, attribute))
399        for attributes in Orange.misc.counters.LimitedCounter([len(attr.values)
400                                                   for attr in dom.attributes]):
401            examples.append(Orange.data.Example(dom, attributes +
402                                                [function(attributes)]))
403        return LookupLearner(examples)
404     
405
406def lookup_from_data(examples, weight = 0, learnerForUnknown = None):
407    if len(examples.domain.attributes) <= 3:
408        lookup = lookup_from_bound(examples.domain.classVar,
409                                 examples.domain.attributes)
410        lookupTable = lookup.lookupTable
411        for example in examples:
412            ind = lookup.getindex(example)
413            if not lookupTable[ind].isSpecial() and (lookupTable[ind] <>
414                                                     example.getclass()):
415                break
416            lookupTable[ind] = example.getclass()
417        else:
418            return lookup
419
420        # there are ambiguities; a backup plan is
421        # ClassifierByExampleTable, let it deal with them
422        return LookupLearner(examples, weight,
423                             learnerForUnknown=learnerForUnknown)
424
425    else:
426        return LookupLearner(examples, weight,
427                             learnerForUnknown=learnerForUnknown)
428       
429       
430def dump_lookup_function(func):
431    if isinstance(func, Orange.data.variable.Variable):
432        if not func.getValueFrom:
433            raise TypeError, "attribute '%s' does not have an associated function" % func.name
434        else:
435            func = func.getValueFrom
436
437    outp = ""
438    if isinstance(func, ClassifierByExampleTable):
439    # XXX This needs some polishing :-)
440        for i in func.sortedExamples:
441            outp += "%s\n" % i
442    else:
443        boundset = func.boundset()
444        for a in boundset:
445            outp += "%s\t" % a.name
446        outp += "%s\n" % func.classVar.name
447        outp += "------\t" * (len(boundset)+1) + "\n"
448       
449        lc = 0
450        if len(boundset)==1:
451            cnt = Orange.misc.counters.LimitedCounter([len(x.values)+1 for x in boundset])
452        else:
453            cnt = Orange.misc.counters.LimitedCounter([len(x.values) for x in boundset])
454        for ex in cnt:
455            for i in range(len(ex)):
456                if ex[i]<len(boundset[i].values):
457                    outp += "%s\t" % boundset[i].values[ex[i]]
458                else:
459                    outp += "?\t",
460            outp += "%s\n" % func.classVar.values[int(func.lookupTable[lc])]
461            lc += 1
462    return outp
Note: See TracBrowser for help on using the repository browser.