source: orange/orange/doc/reference/lookup.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 16.6 KB checked in by Mitar <Mitar@…>, 4 years ago

Made XPM version of the icon 32x32.

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<h1>Lookup Classifiers</h1>
<index name="classifiers+lookup classification">

<P>Lookup classifiers predict classes by looking into stored lists of cases. There are two kinds of such classifiers in Orange. The simpler and faster <CODE><INDEX name="classes/ClassifierByLookupTable">ClassifierByLookupTable</CODE> uses up to three discrete attributes and stores a mapping from the values of those attributes to the class value. The more complex classifier stores an <CODE>ExampleTable</CODE> and predicts the class by matching the example to examples in the table.</P>

<P>The natural habitat of these classifiers is feature construction: they usually reside in the <CODE>getValueFrom</CODE> fields of constructed attributes, where they compute the attribute values automatically. For instance, the following script translates the Monk 1 dataset into a more useful form that includes only the attributes <CODE>a</CODE>, <CODE>b</CODE> and <CODE>e</CODE>, plus two new attributes that tell whether <CODE>a</CODE> and <CODE>b</CODE> are equal and whether <CODE>e</CODE> is 1 (don't worry about the details, they follow later).</P>

<p class="header"><a href="ClassifierByLookupTable.py">part of ClassifierByLookupTable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code>import orange

data = orange.ExampleTable("monk1")
a, b, e = data.domain["a"], data.domain["b"], data.domain["e"]

# "a==b": one entry per (a, b) pair, in the order 1-1, 1-2, 1-3, 2-1, ...
ab = orange.EnumVariable("a==b", values = ["no", "yes"])
ab.getValueFrom = orange.ClassifierByLookupTable(ab, a, b, \
  ["yes", "no", "no",   "no", "yes", "no",   "no", "no", "yes"])

# "e==1": one entry per value of e, plus a last entry for unknown e
e1 = orange.EnumVariable("e==1", values = ["no", "yes"])
e1.getValueFrom = orange.ClassifierByLookupTable(e1, e, \
  ["yes", "no", "no", "no", "?"])

data2 = data.select([a, b, ab, e, e1, data.domain.classVar])
</XMP>

<P>We can check the correctness of the script by printing out several random examples from <CODE>data2</CODE>.</p>

<XMP class=code>>>> for i in range(5):
...     print data2.randomexample()
['1', '1', 'yes', '4', 'no', '1']
['3', '3', 'yes', '2', 'no', '1']
['2', '1', 'no', '4', 'no', '0']
['2', '1', 'no', '1', 'yes', '1']
['1', '1', 'yes', '3', 'no', '1']
</XMP>

<P>The first <CODE>ClassifierByLookupTable</CODE> takes the values of attributes <CODE>a</CODE> and <CODE>b</CODE> and computes the value of <CODE>ab</CODE> according to the rule given in the table. The first three values correspond to <CODE>a</CODE>=1 and <CODE>b</CODE>=1, 2, 3; for the first combination, the value of <CODE>ab</CODE> should be "yes", while for the other two <CODE>a</CODE> and <CODE>b</CODE> are different. The next triplet corresponds to <CODE>a</CODE>=2; here, the middle value is "yes", and so on.</P>

<P>The second lookup is simpler: since it involves only a single attribute, the list is a simple one-to-one mapping from the four-valued <CODE>e</CODE> to the two-valued <CODE>e1</CODE>. The last value in the list is returned when <CODE>e</CODE> is unknown and tells that <CODE>e1</CODE> should then be unknown as well.</P>
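
<P>The order of the elements in such lists is easy to get wrong when typing them by hand. The following is a minimal sketch in plain Python that generates both lists from the rules themselves, with the first attribute changing slowest. It does not use Orange; the value lists and the helper <CODE>yes_no</CODE> are assumptions made for illustration, typed in to match the attributes of monk1.</P>

```python
from itertools import product

# Value lists typed in by hand (an assumption); in Orange they would
# come from the attribute descriptors a, b and e.
a_values = ["1", "2", "3"]
b_values = ["1", "2", "3"]
e_values = ["1", "2", "3", "4"]

def yes_no(flag):
    """Hypothetical helper: map a truth value to the strings used above."""
    return "yes" if flag else "no"

# Two attributes: one entry per (a, b) pair; product() varies a slowest,
# which gives the order 1-1, 1-2, 1-3, 2-1, ...
ab_table = [yes_no(va == vb) for va, vb in product(a_values, b_values)]

# One attribute: one entry per value of e, plus a last entry for unknown e
e1_table = [yes_no(ve == "1") for ve in e_values] + ["?"]

print(ab_table)
print(e1_table)
```

<P>Lists built this way could then be passed as the last argument of the constructors in the script above.</P>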

<P>Note that you don't need <CODE>ClassifierByLookupTable</CODE> for this. An equivalent attribute, here called <CODE>e2</CODE>, could be computed with a callback to Python, for instance:</P>

<XMP class=code>e2 = orange.EnumVariable("e==1", values = ["no", "yes"])
e2.getValueFrom = lambda ex, rw: orange.Value(e2, ex["e"]=="1")
</XMP>

<P>While functionally the same, using classifiers by lookup table is faster.</P>

<HR>

<H2>Classifiers by Lookup Table</H2>

<P>Although the above example used <CODE>ClassifierByLookupTable</CODE> as if it were a concrete class, <CODE>ClassifierByLookupTable</CODE> is actually abstract. Calling its constructor is a typical Orange trick: what you get is never a <CODE>ClassifierByLookupTable</CODE>, but one of <CODE><INDEX name="classes/ClassifierByLookupTable1">ClassifierByLookupTable1</CODE>, <CODE><INDEX name="classes/ClassifierByLookupTable2">ClassifierByLookupTable2</CODE> or <CODE><INDEX name="classes/ClassifierByLookupTable3">ClassifierByLookupTable3</CODE>. As their names tell, the first classifies using a single attribute (so that's what we had for <CODE>e1</CODE>), the second uses a pair of attributes (and has been constructed for <CODE>ab</CODE> above), and the third uses three attributes. Class predictions for each combination of attribute values are stored in a (one-dimensional) table.
To classify an example, the classifier computes the index of the table element that corresponds to the example's combination of attribute values.</P>

<P>These classifiers are built to be fast, not safe. If you, for instance, change the number of values for one of the attributes, Orange will most probably crash. To protect you somewhat, many of these classes' attributes are read-only and can only be set when the object is constructed.</P>

<P class=section>Attributes</P>
<DL class=attributes>

<DT>variable1[, variable2[, variable3]] <SPAN class=normalfont>(read only)</SPAN></DT>
<DD>The attribute(s) that the classifier uses for classification. <CODE>ClassifierByLookupTable1</CODE> only has <CODE>variable1</CODE>, <CODE>ClassifierByLookupTable2</CODE> also has <CODE>variable2</CODE> and <CODE>ClassifierByLookupTable3</CODE> has all three.</DD>

<DT>variables <SPAN class=normalfont>(read only)</SPAN></DT>
<DD>The above variables, returned as a tuple.</DD>

<DT>noOfValues1, noOfValues2[, noOfValues3] <SPAN class=normalfont>(read only)</SPAN></DT>
<DD>The number of values for <CODE>variable1</CODE>, <CODE>variable2</CODE> and <CODE>variable3</CODE>, stored here to make the classifier faster. These attributes are defined only for <CODE>ClassifierByLookupTable2</CODE> (the first two) and <CODE>ClassifierByLookupTable3</CODE> (all three).</DD>

<DT>lookupTable <SPAN class=normalfont>(read only)</SPAN></DT>
<DD><P>A list of values (<CODE>ValueList</CODE>), one for each possible combination of attribute values. For <CODE>ClassifierByLookupTable1</CODE>, there is an additional element that is returned when the attribute's value is unknown. Values are ordered by the values of attributes, with <CODE>variable1</CODE> being the most significant. In the case of two three-valued attributes, the list order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, where the first digit corresponds to <CODE>variable1</CODE> and the second to <CODE>variable2</CODE>.</P>

<P>The list is read-only in the sense that you cannot assign a new list to this field. You can, however, change its elements. Don't change its size, though.</P></DD>

<DT>distributions <SPAN class=normalfont>(read only)</SPAN></DT>
<DD>Similar to <CODE>lookupTable</CODE>, but is of type <CODE>DistributionList</CODE> and stores a distribution for each combination of values.</DD>

<DT>dataDescription</DT>
<DD>An object of type <CODE>EFMDataDescription</CODE>, defined only for <CODE>ClassifierByLookupTable2</CODE> and <CODE>ClassifierByLookupTable3</CODE>. They use it to make predictions when one or more attribute values are unknown. <CODE>ClassifierByLookupTable1</CODE> doesn't need it since this case is covered by the additional element in <CODE>lookupTable</CODE> and <CODE>distributions</CODE>, as described above.</DD>
</DL>


<P class=section>Methods</P>
<dl class=attributes>
<DT>ClassifierByLookupTable(classVar, variable1[, variable2[, variable3]] [, lookupTable[, distributions]])</DT>
<DD>A general constructor that, based on the number of attribute descriptors, constructs one of the three classes discussed. If <CODE>lookupTable</CODE> and <CODE>distributions</CODE> are omitted, the constructor initializes them to two lists of the right sizes whose elements are unknown values ("don't knows") and empty distributions. If they are given, they must be of the correct size.</DD>

<DT>ClassifierByLookupTable1(classVar, variable1 [, lookupTable, distributions])<BR>
ClassifierByLookupTable2(classVar, variable1, variable2 [, lookupTable[, distributions]])<BR>
ClassifierByLookupTable3(classVar, variable1, variable2, variable3 [, lookupTable[, distributions]])</dt>
<DD>Class-specific constructors that you can call instead of the general constructor. The number of attributes must match the constructor called.</DD>


<DT>getindex(example)</DT>
<DD>Returns an index into <CODE>lookupTable</CODE> or <CODE>distributions</CODE>. The formula depends upon the type of the classifier. If value<EM>i</EM> is <CODE>int(example[variable<EM>i</EM>])</CODE>, then the corresponding formulae are

<DL>
<DT><CODE>ClassifierByLookupTable1</CODE>:</DT>
<DD><CODE>index = value1</CODE>, or <CODE>len(lookupTable)-1</CODE> if the value is unknown</DD>

<DT><CODE>ClassifierByLookupTable2</CODE>:</DT>
<DD><CODE>index = value1*noOfValues2 + value2</CODE>, or -1 if any value is unknown</DD>

<DT><CODE>ClassifierByLookupTable3</CODE>:</DT>
<DD><CODE>index = (value1*noOfValues2 + value2) * noOfValues3 + value3</CODE>, or -1 if any value is unknown</DD>
</DL>
<P style="margin-top: 12pt">Let's see some indices for randomly chosen examples from the original table.</P>
<p class="header"><a href="ClassifierByLookupTable.py">part of ClassifierByLookupTable.py (continued from above)</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code>>>> for i in range(5):
...     ex = data.randomexample()
...     print "%s: ab %i, e1 %i " % (ex, \
...         ab.getValueFrom.getindex(ex), \
...         e1.getValueFrom.getindex(ex))
['1', '1', '2', '2', '4', '1', '1']: ab 0, e1 3
['3', '3', '1', '2', '2', '1', '1']: ab 8, e1 1
['2', '1', '2', '3', '4', '2', '0']: ab 3, e1 3
['2', '1', '1', '2', '1', '1', '1']: ab 3, e1 0
['1', '1', '1', '2', '3', '1', '1']: ab 0, e1 2
</XMP>
</DD>
</DL>
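
<P>To make the index computation concrete, here is a small sketch in plain Python. It is not Orange's implementation: the function names are made up for illustration, unknown values are represented by <CODE>None</CODE>, and the table is assumed to be ordered as described for <CODE>lookupTable</CODE>, with <CODE>variable1</CODE> the most significant. Value indices are zero-based, so the attribute value "1" has index 0.</P>

```python
def index1(value1, lookup_table_len):
    """One attribute: the value index itself; unknown maps to the last element."""
    if value1 is None:
        return lookup_table_len - 1
    return value1

def index2(value1, value2, no_of_values2):
    """Two attributes, value1 most significant; -1 if any value is unknown."""
    if value1 is None or value2 is None:
        return -1
    return value1 * no_of_values2 + value2

def index3(value1, value2, value3, no_of_values2, no_of_values3):
    """Three attributes, value1 most significant; -1 if any value is unknown."""
    if value1 is None or value2 is None or value3 is None:
        return -1
    return (value1 * no_of_values2 + value2) * no_of_values3 + value3

# monk1: a and b have three values each, e has four (plus the "unknown" slot)
print(index2(0, 0, 3))   # a=1, b=1 -> 0
print(index2(2, 2, 3))   # a=3, b=3 -> 8
print(index2(1, 0, 3))   # a=2, b=1 -> 3
print(index1(3, 5))      # e=4 -> 3
print(index1(None, 5))   # e unknown -> 4, the last element
```

<P>The first four results agree with the indices printed for the random examples above. Since <CODE>a</CODE> and <CODE>b</CODE> both have three values, this example cannot tell which attribute's number of values acts as the multiplier; the sketch uses that of the less significant attribute because that is what the described ordering requires.</P>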


<H2>Classifier by ExampleTable</H2>

<P><CODE><INDEX name="classes/ClassifierByExampleTable">ClassifierByExampleTable</CODE> is the alternative to <CODE>ClassifierByLookupTable</CODE>. It is to be used when the classification is based on more than three attributes. Instead of having a lookup table, it stores an <CODE>ExampleTable</CODE>, which is optimized for faster access.</P>

<P>This class is used in similar contexts as <CODE>ClassifierByLookupTable</CODE>. If you write, for instance, a constructive induction algorithm, it is recommended that the values of the new attribute be computed either by one of the classifiers by lookup table or by <CODE>ClassifierByExampleTable</CODE>, depending on the number of bound attributes.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>sortedExamples</DT>
<DD>An <CODE>ExampleTable</CODE> with sorted examples for lookup. Examples in the table can be merged; if there were multiple examples with the same attribute values (but possibly different classes), they are merged into a single example. Regardless of merging, class values in this table are distributed: their <CODE>svalue</CODE> contains a <CODE>Distribution</CODE>.</DD>

<DT>classifierForUnknown</DT>
<DD>This classifier is used to classify examples that were not found in the table. If <CODE>classifierForUnknown</CODE> is not set, unknown values ("don't knows") are returned.</DD>

<DT>variables <SPAN class=normalfont>(read only)</SPAN></DT>
<DD>A tuple with attributes in the domain. This field is here so that <CODE>ClassifierByExampleTable</CODE> appears more similar to <CODE>ClassifierByLookupTable</CODE>. If a constructive induction algorithm returns the result in one of these classifiers, and you would like to check which attributes are used, you can use <CODE>variables</CODE> regardless of the class you actually got.</DD>
</DL>

<P>There are no specific methods for <CODE>ClassifierByExampleTable</CODE>. Since this is a classifier, it can be called. When the example to be classified includes unknown values, <CODE>classifierForUnknown</CODE> will be used if it is defined.</P>
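
<P>The behaviour described above can be imitated in a few lines of plain Python. This is only a sketch under stated assumptions, not Orange's class: <CODE>TableLookup</CODE> and its argument names are hypothetical, it uses a dictionary where Orange keeps a sorted <CODE>ExampleTable</CODE>, unknown values are represented by <CODE>None</CODE>, and "?" stands for a don't know.</P>

```python
class TableLookup(object):
    """Hypothetical stand-in for a table-backed classifier (not Orange's class)."""

    def __init__(self, rows, classifier_for_unknown=None):
        # rows maps a tuple of attribute values to a class value
        self.rows = dict(rows)
        self.classifier_for_unknown = classifier_for_unknown

    def __call__(self, values):
        key = tuple(values)
        # A row matches only if all values are known and the combination is stored
        if None not in key and key in self.rows:
            return self.rows[key]
        # Otherwise defer to the fallback classifier, if there is one
        if self.classifier_for_unknown is not None:
            return self.classifier_for_unknown(values)
        return "?"

rows = {("1", "1", "1"): "1", ("1", "2", "2"): "0"}
plain = TableLookup(rows)
with_fallback = TableLookup(rows, classifier_for_unknown=lambda values: "0")

print(plain(["1", "2", "2"]))            # stored in the table
print(plain(["3", "3", "3"]))            # not stored, no fallback
print(with_fallback(["1", None, "1"]))   # unknown value, the fallback answers
```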

<P>Although <CODE>ClassifierByExampleTable</CODE> is not really a classifier in the sense that you would use it to classify examples, but rather a function for computing intermediate values, it has an associated learner, <CODE>LookupLearner</CODE>. The learner's task is, basically, to construct an <CODE>ExampleTable</CODE> for <CODE>sortedExamples</CODE>. It sorts the examples, merges them and, of course, takes example weights into account in the process.</P>

<p class="header"><a href="ClassifierByExampleTable.py">part of ClassifierByExampleTable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code>import orange

data = orange.ExampleTable("monk1")
a, b, e = data.domain["a"], data.domain["b"], data.domain["e"]

data_s = data.select([a, b, e, data.domain.classVar])
abe = orange.LookupLearner(data_s)
</XMP>

<P>In <CODE>data_s</CODE>, we have prepared a table in which examples are described only by <CODE>a</CODE>, <CODE>b</CODE>, <CODE>e</CODE> and the class. The learner constructs a <CODE>ClassifierByExampleTable</CODE> and stores examples from <CODE>data_s</CODE> in its <CODE>sortedExamples</CODE>. Examples are merged so that there are no duplicates.</P>

<XMP class=code>>>> print len(data_s)
432
>>> print len(abe.sortedExamples)
36
>>> for i in abe.sortedExamples[:10]:
...     print i
['1', '1', '1', '1']
['1', '1', '2', '1']
['1', '1', '3', '1']
['1', '1', '4', '1']
['1', '2', '1', '1']
['1', '2', '2', '0']
['1', '2', '3', '0']
['1', '2', '4', '0']
['1', '3', '1', '1']
['1', '3', '2', '0']
</XMP>

<P>Well, there's a bit more here than meets the eye: each example's class value also stores the distribution of classes for all examples that were merged into it. In our case, the three attributes suffice to unambiguously determine the class and, since the examples cover the entire attribute space, every distribution has 12 examples in one class and none in the other.</P>

<XMP class=code>>>> for i in abe.sortedExamples[:10]:
...     print i, i.getclass().svalue
['1', '1', '1', '1'] <0.000, 12.000>
['1', '1', '2', '1'] <0.000, 12.000>
['1', '1', '3', '1'] <0.000, 12.000>
['1', '1', '4', '1'] <0.000, 12.000>
['1', '2', '1', '1'] <0.000, 12.000>
['1', '2', '2', '0'] <12.000, 0.000>
['1', '2', '3', '0'] <12.000, 0.000>
['1', '2', '4', '0'] <12.000, 0.000>
['1', '3', '1', '1'] <0.000, 12.000>
['1', '3', '2', '0'] <12.000, 0.000>
</XMP>
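
<P>The sorting and merging can be pictured with a short sketch in plain Python. It is an illustration only, not the code of <CODE>LookupLearner</CODE>: the function <CODE>merge_examples</CODE> and the triple format are hypothetical, and the sketch assumes that a merged example reports its most frequent class (ties are resolved arbitrarily, in favour of the class listed first).</P>

```python
from collections import defaultdict

def merge_examples(examples, class_values):
    """Merge (attribute_values, class_value, weight) triples.

    Returns a list of (attribute_values, majority_class, distribution),
    sorted by attribute values; distribution[i] is the total weight of
    examples with class class_values[i].
    """
    merged = defaultdict(lambda: [0.0] * len(class_values))
    for attribute_values, class_value, weight in examples:
        slot = class_values.index(class_value)
        merged[tuple(attribute_values)][slot] += weight

    result = []
    for key in sorted(merged):
        distribution = merged[key]
        # Assumption: the merged example takes the most frequent class
        majority = class_values[distribution.index(max(distribution))]
        result.append((list(key), majority, distribution))
    return result

examples = [
    (["1", "2"], "0", 1.0),
    (["1", "1"], "1", 1.0),
    (["1", "2"], "1", 1.0),
    (["1", "2"], "0", 2.0),   # a weight of 2 counts as two examples
    (["1", "1"], "1", 1.0),
]
merged = merge_examples(examples, ["0", "1"])
for values, majority, distribution in merged:
    print(values, majority, distribution)
```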

<P><CODE>ClassifierByExampleTable</CODE> will usually be used through <CODE>getValueFrom</CODE>. So, we would probably continue by constructing a new attribute and putting the classifier into its <CODE>getValueFrom</CODE>.</P>

<XMP class=code>>>> y2 = orange.EnumVariable("y2", values = ["0", "1"])
>>> y2.getValueFrom = abe
</XMP>

<P>There's something disturbing here. Although <CODE>abe</CODE> determines the value of <CODE>y2</CODE>, <CODE>abe.classVar</CODE> is still <CODE>y</CODE>. Orange doesn't mind (the whole example is artificial: you will seldom pack an entire dataset into a <CODE>ClassifierByExampleTable</CODE>), and neither should you. But still, for the sake of hygiene, you can conclude with</P>

<XMP class=code>>>> abe.classVar = y2
</XMP>

<P>The whole story can be greatly simplified. <CODE>LookupLearner</CODE> can also be called differently from other learners. Besides examples, you can pass the new class attribute and the attributes that should be used for classification. This saves us from constructing <CODE>data_s</CODE> and reassigning the <CODE>classVar</CODE>. It doesn't set <CODE>getValueFrom</CODE>, though.</P>

<p class="header"><a href="ClassifierByExampleTable.py">part of ClassifierByExampleTable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code>import orange
data = orange.ExampleTable("monk1")
a, b, e = data.domain["a"], data.domain["b"], data.domain["e"]

y2 = orange.EnumVariable("y2", values = ["0", "1"])
abe2 = orange.LookupLearner(y2, [a, b, e], data)
</XMP>

<P>Finally, let us show another use of <CODE>LookupLearner</CODE>. With the alternative call arguments, it offers an easy way to observe attribute interactions. For this purpose, we shall omit <CODE>e</CODE> and construct a <CODE>ClassifierByExampleTable</CODE> from <CODE>a</CODE> and <CODE>b</CODE> only.</P>

<p class="header"><a href="ClassifierByExampleTable.py">part of ClassifierByExampleTable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code>y2 = orange.EnumVariable("y2", values = ["0", "1"])
abe2 = orange.LookupLearner(y2, [a, b], data)
for i in abe2.sortedExamples:
    print i, i.getclass().svalue
</XMP>

<P>The script's output shows how the classes are distributed for different values of <CODE>a</CODE> and <CODE>b</CODE>.</P>

<XMP class=code>['1', '1', '1'] <0.000, 48.000>
['1', '2', '0'] <36.000, 12.000>
['1', '3', '0'] <36.000, 12.000>
['2', '1', '0'] <36.000, 12.000>
['2', '2', '1'] <0.000, 48.000>
['2', '3', '0'] <36.000, 12.000>
['3', '1', '0'] <36.000, 12.000>
['3', '2', '0'] <36.000, 12.000>
['3', '3', '1'] <0.000, 48.000>
</XMP>

<P>For instance, when <CODE>a</CODE> is '1' and <CODE>b</CODE> is '3', the majority class is '0', and the class distribution is 36:12 in favor of '0'.</P>
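
<P>Where do the numbers 48, 36 and 12 come from? The sketch below recomputes them in plain Python, without Orange. It rests on two assumptions that this page does not state: that the class in Monk 1 is 1 exactly when <CODE>a</CODE> equals <CODE>b</CODE> or <CODE>e</CODE> is 1 (the usual definition of the MONK-1 problem), and that monk1.tab contains every combination of the six attribute values exactly once, which is consistent with its 432 examples.</P>

```python
from itertools import product
from collections import defaultdict

# Number of values of the attributes a, b, c, d, e, f (assumed, see above)
value_counts = [3, 3, 2, 3, 4, 2]
ranges = [range(1, n + 1) for n in value_counts]

# counts[(a, b)] = [number of class "0" examples, number of class "1" examples]
counts = defaultdict(lambda: [0, 0])
for a, b, c, d, e, f in product(*ranges):
    y = 1 if (a == b or e == 1) else 0   # assumed MONK-1 target concept
    counts[(a, b)][y] += 1

for a, b in sorted(counts):
    zeros, ones = counts[(a, b)]
    majority = "1" if ones > zeros else "0"
    print(a, b, majority, zeros, ones)
```

<P>Each pair of values of <CODE>a</CODE> and <CODE>b</CODE> covers 2*3*4*2 = 48 examples. When the two differ, only the 12 examples with <CODE>e</CODE> equal to 1 belong to class '1', which gives the 36:12 split.</P>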

</BODY></HTML>