<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>
<h1>orngCI: Orange Constructive Induction Module</h1>
<index name="modules+constructive induction (CI)">
<index name="classifiers+function decomposition">
<index name="feature construction">

<p>Module orngCI implements classes and functions that support
constructive induction (CI). It includes minimal-complexity function
decomposition and a generalized minimal-error function
decomposition. Both can be used as feature induction algorithms or as
a standalone learning algorithm, HINT (Zupan, 1997; Zupan et al,
1998). In addition, the module provides an interface to a generalized
version of Kramer's method (Kramer, 1994) and several other CI-related
algorithms.</P>

<hr>

<H2>Constructive Induction in Orange</H2>

<P>Constructive induction takes a subset of attributes, which we call
<EM>bound attributes</EM>, and creates a new attribute whose values are
computed from the values of the bound attributes. Orange supports
so-called data-driven constructive induction, where learning examples
are used to decide how the new attribute is computed from the bound
ones.</P>

<P>Feature constructors (orange objects that construct features) are
thus given examples, a list of bound attributes, and, optionally, an
id of the weight attribute. They construct a new attribute and return
a tuple with the new feature and a value that estimates its
quality.</p>

<h3>Constructing a Constructor</h3>

<p>Complex feature constructors, such as those for minimal-error
decomposition and for Kramer's algorithm, show the flexibility of
Orange's component-based approach. On the other hand, the hierarchy of
components that needs to be put together to construct a workable
feature inducer is too complex even for an experienced user. The main
mission of <CODE>orngCI</CODE> is thus to help the user by
"flattening" the hierarchy. Defaults are provided for all the
components, so the user only needs to give the specific parts of the
hierarchy.</p>

<p>For instance, let <code>fim</code> be an instance of
<code>orange.FeatureByIM</code>, a feature constructor for minimal-error
function decomposition. Its most crucial parameter, <em>m</em>, is
"hidden" in
<CODE>fim.clustersFromIM.columnAssessor.m</CODE>. Parameter <em>m</em>
is an attribute of an m-estimate based column assessor component of
the column quality assessor based algorithm for column clustering. To
save the user from having to remember this, <CODE>orngCI</CODE> offers
its own class <CODE><INDEX name="classes/FeatureByIM (in orngCI)">FeatureByIM</CODE>. Now, if <CODE>fim</CODE> is an
instance of <CODE>orngCI.FeatureByIM</CODE>, the user only needs to set
<CODE>fim.m</CODE>. Before the actual feature construction takes
place, <CODE>orngCI.FeatureByIM</CODE> constructs an appropriate
<CODE>orange.FeatureByIM</CODE> and sets its
<CODE>clustersFromIM.columnAssessor.m</CODE> to
<code>fim.m</code>. Constructing the structure and putting the
user-supplied components at the right places is called
<em>deflattening</em>.</P>
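
<p>As a sketch of the idea (assuming the class is constructed with keyword arguments, as in the examples later in this document), the flattened interface lets you write:</p>

<xmp class="code">import orange, orngCI

# flattened: orngCI does the deflattening for us
fim = orngCI.FeatureByIM(m=2.0)

# behind the scenes, orngCI translates this into something like
#   instance.clustersFromIM.columnAssessor.m = 2.0
# on the pure orange.FeatureByIM object it constructs
</xmp>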

<P>A general principle is that if you define a component, you need to
define its subcomponents as well. The default
<code>columnAssessor</code> is
<code>orange.ColumnAssessor_m</code>. If you leave it alone, all its
subcomponents are set as well. However, if you set
<CODE>columnAssessor</CODE> to
<CODE>orange.ColumnAssessor_Measure</CODE>, it is your responsibility
to set its attribute "measure", either by specifying it as a "measure"
argument to <CODE>FeatureByIM</CODE> or by setting it directly in
<CODE>columnAssessor.measure</CODE>.</P>
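
<p>For example, both of the following set the measure; this is a minimal sketch assuming the keyword-argument style used throughout this document:</p>

<xmp class="code">import orange, orngCI

# pass the measure as an argument ...
fim1 = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Measure(),
                          measure=orange.MeasureAttribute_info())

# ... or set it directly on the component
ca = orange.ColumnAssessor_Measure()
ca.measure = orange.MeasureAttribute_info()
fim2 = orngCI.FeatureByIM(columnAssessor=ca)
</xmp>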


<P><B>The rest of this section is intended for advanced users.</B></P>

<p>The function that performs deflattening is always called <code>createInstance</code>.</p>

<dl>
<DT><B>createInstance()</B></DT>
<DD>Returns an instance of pure <CODE>orange</CODE> classes that performs the feature construction. The instance is also stored in <CODE>self</CODE>'s field <CODE>instance</CODE>.</DD>
</DL>

<p>Deflattening is initiated by the call operator. The call operator
is given examples, bound attributes and, optionally, the id of the
weight attribute, and returns a constructed attribute and an
assessment of its quality. To do its job, the call operator needs an
instance of a corresponding Orange feature constructor. Constructing
such instances at each invocation of the call would be too costly,
therefore a kind of caching is implemented. When the call operator is
invoked for the first time, <CODE>createInstance</CODE> is called to
deflatten the structure and the result is stored in
<CODE>self.instance</CODE>. In further calls, the stored
<CODE>self.instance</CODE> is used again. However, when any of the class's
attributes that affect the feature construction (such as
<CODE>fim.m</CODE> in the above example) are modified,
<CODE>self.instance</CODE> is set to <CODE>None</CODE> and a new
instance will be constructed at the next call.</P>
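
<p>Conceptually, the caching behaves as in this sketch (the repr string is illustrative, and we assume <code>data</code> holds Monk 1 examples, as in the examples below):</p>

<xmp class="code">>>> fim = orngCI.FeatureByIM(m=2.0)
>>> attr, quality = fim(data, ["a", "b"])  # first call builds the instance
>>> fim.instance                           # the cached orange.FeatureByIM
<FeatureByIM instance at 0x...>
>>> fim.m = 5.0                            # touching m invalidates the cache
>>> print fim.instance
None
</xmp>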

<P>Now, if you want to, say, tune <CODE>fim.m</CODE>, you can simply change
it; this causes reinstantiation of the corresponding <CODE>orange</CODE>
classes.</P>

<xmp class="code">bound = data.domain[:2]
fim = orngCI.FeatureByIM()
attrs = []
for m in [0.1, 0.5, 1.0, 2.0]:
  fim.m = m
  attr, quality = fim(data, bound)
  attrs.append(attr)
</XMP>

<P>This will construct features from the first two attributes of the
domain, using different values of <em>m</em>. (It would be a blasphemy not to
publish a shorter way to write the above code fragment: the last four
lines can be replaced by <CODE>attrs = [fim(data, bound)[0] for fim.m
in [0.1, 0.5, 1.0, 2.0]]</CODE>.) However, at each call to
<CODE>fim</CODE>, <CODE>createInstance</CODE> is called and a new set
of components is constructed, since setting <CODE>m</CODE> resets
them. To make the program faster, you should replace</P>

<xmp class="code">fim.m = m
</xmp>

<P>by</P>

<xmp class="code">fim.instance.clustersFromIM.columnAssessor.m = m
</xmp>
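
<p>Putting this together, the tuning loop from above can be rewritten as follows. This is a sketch that assumes <code>fim.instance</code> has already been built, either by a first call or by an explicit <code>createInstance()</code>:</p>

<xmp class="code">fim = orngCI.FeatureByIM()
fim.createInstance()
attrs = []
for m in [0.1, 0.5, 1.0, 2.0]:
  # set m on the cached instance directly, so it is not rebuilt
  fim.instance.clustersFromIM.columnAssessor.m = m
  attr, quality = fim(data, bound)
  attrs.append(attr)
</xmp>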


<h3>Constructing a Feature</h3>

<p>To construct a feature, you need to call a constructor. The call
operator expects two or three arguments: examples, bound attributes
and, optionally, the id of a weight meta attribute. It returns a tuple
containing the attribute and its quality estimate. This is the same
for all feature constructors in <code>orngCI</code>, although some may
ignore some arguments or return useless quality assessments.</p>

<p>The method of quality estimation depends upon the algorithm used -
it can be a decrease of error, a decrease of impurity, etc. Qualities
returned by different feature constructors are not comparable. The
quality can be either negative or positive; in any case, higher
numbers mean better attributes. If the underlying measure is such that
lower numbers mean better attributes, the returned qualities are
given a negative sign.</P>

<P>The famous Monk 1 data set (Thrun et al, 1991) has the goal concept <EM>y := a=b or e=1</EM>. When joining attributes 'a' and 'b', the new attribute should express equality of a and b. Let us see if the minimal-complexity function decomposition can find the appropriate function.</P>

<xmp class="code">>>> import orange, orngCI
>>> data = orange.ExampleTable("monk1")
>>> ab, quality = orngCI.FeatureByMinComplexity(data, ["a", "b"])
>>> ab.values
<1-1+2-2+3-3, 1-2+1-3+2-1+2-3+3-1+3-2>
>>> quality
-2.0
</xmp>

<P>Note that we specified the bound attributes by name. This is not the only way: you can also give attribute descriptors, or even their indices in the domain. Thus, knowing that attributes "a" and "b" are the first two attributes, the above call could be replaced by</P>

<xmp class="code">>>> ab, quality = orngCI.FeatureByMinComplexity(data, data.domain[:2])
</xmp>

<P>or even</P>

<xmp class="code">>>> ab, quality = orngCI.FeatureByMinComplexity(data, [0, 1])
</xmp>

<P>or by any combination of names, descriptors and indices.</P>
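
<p>For instance, mixing a name and an index should work just as well:</p>

<xmp class="code">>>> ab, quality = orngCI.FeatureByMinComplexity(data, ["a", 1])
</xmp>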

<P>Now for the results. The values of the new attribute show to which values of the bound attributes they correspond. The new attribute is binary; its first value is named '1-1+2-2+3-3' and the other '1-2+1-3+2-1+2-3+3-1+3-2'. It is obvious that this is the correct feature. Its quality is -2; minimal-complexity function decomposition prefers attributes with fewer values, so the higher the number of values, the more negative (and thus lower) the quality.</P>

<P>It makes sense to rename the values of the constructed feature. In our case, we shall call the first value "eq" and the second "diff".</P>

<xmp class="code">>>> ab.values = ["eq", "diff"]
</xmp>

<P>If any of the bound attributes has values that include '-' or '+', the constructed attribute's values will be named "c1", "c2", "c3"...</P>


<h3>Using the Constructed Features</h3>

<P>Constructed features can be added to the domain and used in further processing of the data, such as learning. If that is all you want, you do not need to read this section any further: if you have followed the above example, you can conclude it by</P>

<xmp class="code">>>> data2 = orngCI.addAnAttribute(ab, data)
</xmp>

<P>This adds the attribute to the domain and returns a new example table.</p>
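
<p>From here on, <code>data2</code> is an ordinary example table, so a learner can be run on it directly; a minimal sketch:</p>

<xmp class="code">>>> learner = orange.BayesLearner()
>>> classifier = learner(data2)
</xmp>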

<p>The feature knows how to compute itself from the values of the attributes it is constructed from. The usual Orange mechanism is used for this: the new feature has a <CODE>getValueFrom</CODE> pointing to a classifier that computes the feature's value. In short, when some method has an example and needs the value of a particular attribute that is not in the example's domain, it can check whether the attribute has a <code>getValueFrom</code> defined. If it does, the method calls it, passing the example as an argument. The <code>getValueFrom</code> either computes the attribute's value or returns <em>don't know</em>.</P>

<xmp class="code">>>> ab.getValueFrom
<ClassifierByLookupTable2 instance at 0x0197BEF0>
>>> print ab.getValueFrom.lookupTable
<eq, diff, diff, diff, eq, diff, diff, diff, eq>
>>> ab.getValueFrom.variable1
EnumVariable 'a'
>>> ab.getValueFrom.variable2
EnumVariable 'b'
</XMP>

<p>For most features constructed by methods in <code>orngCI</code>, <code>getValueFrom</code> will contain some variant of <code>orange.ClassifierByLookupTable</code>. In the above case, it is a classifier by lookup table based on two attributes. Values of the new attribute are read from its <code>lookupTable</code>. Each entry in the table corresponds to a combination of values of the bound attributes; the first corresponds to (1, 1), the second to (1, 2), then (1, 3), (2, 1), (2, 2)..., where the first digit is for <CODE>variable1</CODE> and the second for <CODE>variable2</CODE>. Value "eq" occurs at the first, the fifth and the ninth elements of the table, which correspond to (1, 1), (2, 2) and (3, 3). This is exactly what we expected for the Monk 1 dataset.</p>
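
<p>The indexing scheme can be checked manually. The sketch below computes the index for a single example, assuming that <code>int(example[attr])</code> returns the zero-based index of the attribute's value:</p>

<xmp class="code">>>> example = data[0]
>>> b = ab.getValueFrom.variable2
>>> index = int(example["a"]) * len(b.values) + int(example["b"])
>>> ab.getValueFrom.lookupTable[index]
</xmp>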

<p>If you need to manually correct the feature, you can change the <CODE>lookupTable</CODE>:</P>

<xmp class="code">>>> ab.getValueFrom.lookupTable[0] = "diff"
</XMP>

<P>This works for most cases, but it can fail when you start to mess with the probability distributions of the new feature. A complete description of classifiers by lookup table is, however, out of the scope of this text.</P>

<P>Let us show an example of what <CODE>getValueFrom</CODE> can do. We shall compute the class distributions for different values of the new attribute:</P>

<xmp class="code">>>> c = orange.ContingencyAttrClass(ab, data)
>>> for i in c:
...   i
<0.000, 144.000>
<216.000, 72.000>
</XMP>

<P>This works even though the new attribute <CODE>ab</CODE> is not in the domain. Its values are computed on the fly, while counting, and are not stored anywhere. The results say that for the first value ("eq") all examples are in the second class ("1"), while when the values of a and b are different, 216 examples (exactly 3/4) are in the first class and the other 72 in the second. Similarly, you could assess the quality of the new attribute using, for instance, information gain. Note that the automatic computation won't always work; it is disabled as a precaution in cases where the algorithm would require many recomputations of the value on one and the same example. ReliefF, for instance, is one such measure. In such cases, you add the new attribute to a domain and compute a new dataset in an <CODE>ExampleTable</CODE>, as shown above.</P>
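
<p>A sketch of such an assessment, again relying on the on-the-fly computation of <code>ab</code>'s values:</p>

<xmp class="code">>>> gain = orange.MeasureAttribute_info(ab, data)
</xmp>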

<h3>Feature Inducers as Redundancy Removers</h3>

<P>Feature inducers can be used to remove (or merge) redundant attribute values. In Monk 1, the attribute "e" has four values, but the goal concept only asks whether the value is "1" or not. Thus, values "2", "3" and "4" are equivalent. If minimal-complexity decomposition is called to construct a new attribute from "e", it will easily remove the redundancies:</P>

<xmp class="code">>>> new_e, quality = orngCI.FeatureByMinComplexity(data, ["e"])
>>> new_e.values
<1, 2+3+4>
</xmp>

<P>The new attribute is binary, the first value corresponding to "e=1" and the other to the remaining values of "e".</P>

<p>We can check the <code>lookupTable</code> again:</p>

<xmp class="code">>>> print new_e.getValueFrom.lookupTable
<1, 2+3+4, 2+3+4, 2+3+4, 2+3+4>
</xmp>
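
<p>A natural next step is to replace "e" with the merged attribute, using <code>replaceWithInduced</code>, described among the auxiliary functions below:</p>

<xmp class="code">>>> data2 = orngCI.replaceWithInduced(new_e, data)
</xmp>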

<h2>Single Feature Construction</h2>

<H3>FeatureByMinComplexity</H3>

<P><INDEX name="classes/FeatureByMinComplexity (in orngCI)">FeatureByMinComplexity constructs features using a modified algorithm of Ashenhurst and Curtis, as described by Zupan (1997, 1998). It has two attributes.</P>

<DL>
<DT>colorIG</DT><DD>defines the component that colors the graph. Since only one class (<CODE>orange.ColorIG_MCF</CODE>) is available, there is no point in changing this (more accurately, there is nothing to change it to).</DD>

<DT>completion</DT><DD>defines what to do with combinations of values of bound attributes that did not occur in the data or for which the corresponding graph node had no connections to any other node. The default value of <CODE>completion</CODE> is <CODE>orngCI.FeatureByMinComplexity.CompletionByBayes</CODE>; the values are determined using a Bayesian classifier. The alternatives are <CODE>orngCI.FeatureByMinComplexity.CompletionByDefault</CODE>, where the most frequent value is used for imputation, and <CODE>orngCI.FeatureByMinComplexity.NoCompletion</CODE>, where the values are left undefined (see the sketch after this list).</DD>
</DL>
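
<p>A minimal sketch of setting <code>completion</code>, assuming it is passed as a keyword argument like the other options in this document:</p>

<xmp class="code">>>> ab, quality = orngCI.FeatureByMinComplexity(data, ["a", "b"],
...     completion=orngCI.FeatureByMinComplexity.NoCompletion)
</xmp>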


<H3>FeatureByIM</H3>

<P>Class <INDEX name="classes/FeatureByIM (in orngCI)">FeatureByIM implements and extends Zupan's (1997, 1998) method for minimal-error decomposition. It is analogous to minimal-complexity decomposition: while minimal-complexity decomposition supposes that the dataset is noise-free, minimal-error decomposition can also handle noisy data.</p>

<p>The algorithm builds an incompatibility matrix (also called a partition matrix) and merges its columns until a stopping criterion is reached. In Zupan's method, columns are compared by how the merge would affect the average m-error estimate over the elements of the matrix. Merging is stopped when the error stops decreasing. In Orange, different similarity measures and stopping criteria can be used.</P>

<P>Since <CODE>FeatureByIM</CODE> is analogous to <CODE>FeatureByMinComplexity</CODE>, <CODE>orngCI.FeatureByMinError</CODE> is provided as an alias for <CODE>orngCI.FeatureByIM</CODE>. No class with this name appears in <CODE>orange</CODE>, though.</P>

<P><CODE>orngCI.FeatureByIM</CODE> has a number of settable components. Their actual positions after deflattening are given in parentheses.</P>

<DL>
<DT>IMconstructor (instance.IMconstructor)</DT>
<DD>is the component used for construction of the incompatibility matrix. If left at the default value of <CODE>None</CODE>, <CODE>orange.IMBySorting</CODE> will be used.</DD>

<DT>clustersFromIM (instance.clustersFromIM)</DT>
<DD>is the method that defines how the columns are merged. The only available class at the moment is <CODE>ClustersFromIMByAssessor</CODE>, which uses a <CODE>columnAssessor</CODE> to evaluate distances between the columns and merges the most similar columns until the <CODE>stopCriterion</CODE> says it's enough.</DD>

<DT>stopCriterion (instance.clustersFromIM.stopCriterion)</DT>
<DD>decides when to stop the merging. The default depends upon other parameters. The default default (see below) is <CODE>orange.StopIMClusteringByAssessor_noProfit</CODE>, which stops the merging when the gain is negative or when it is smaller than the prescribed proportion. Alternatives are <CODE>orange.StopIMClusteringByAssessor_binary</CODE>, which stops merging when only two columns (future values) are left, and <CODE>orange.StopIMClusteringByAssessor_n</CODE>, which stops at <CODE>n</CODE> columns.</DD>

<DT>minProfitProportion (instance.clustersFromIM.stopCriterion.minProfitProportion)</DT>
<DD>sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality by less than 5%. If you use this with m-error estimation, for instance, the merging would stop when no merge would decrease the error by at least 5% (see the sketch after this list). The default <CODE>minProfitProportion</CODE> is 0.</DD>

<DT>n (instance.clustersFromIM.stopCriterion.n)</DT>
<DD>sets the number of values for the new feature. If <CODE>n</CODE> is specified, the default <CODE>stopCriterion</CODE> is <CODE>orange.StopIMClusteringByAssessor_n</CODE>.</DD>

<DT>binary</DT>
<DD>tells that we want to induce binary features. <CODE>orange.StopIMClusteringByAssessor_binary</CODE> is used as the stopping criterion.</DD>

<DT>columnAssessor (instance.clustersFromIM.columnAssessor)</DT>
<DD>is the component used to assess the qualities of the matrix's elements and columns, and the profits of column merging.
This component is independent of the stopping criteria. The default default is <CODE>orange.ColumnAssessor_m</CODE>, which uses m-error estimation. Alternatives are
<UL>
<LI><CODE>orange.ColumnAssessor_Laplace</CODE>, which minimizes the Laplace estimate of the error of matrix elements;</LI>
<LI><CODE>orange.ColumnAssessor_Measure</CODE>, which optimizes an attribute quality measure (such as information gain); this would usually be used in conjunction with <CODE>orange.StopIMClusteringByAssessor_binary</CODE> or <CODE>orange.StopIMClusteringByAssessor_n</CODE>;</LI>
<LI><CODE>orange.ColumnAssessor_Kramer</CODE>, which optimizes Kramer's impurity measure. It is defined for binary classes only. Since the profit is non-positive, the default stopping criterion is changed to <CODE>orange.StopIMClusteringByAssessor_binary</CODE>. You may replace it by <CODE>orange.StopIMClusteringByAssessor_n</CODE>.</LI>
<LI><CODE>orange.ColumnAssessor_Relief</CODE>, which uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.</LI>
</UL>
</DD>

<DT>m (instance.clustersFromIM.columnAssessor.m)</DT>
<DD>gives <EM>m</EM> for the m-error estimate.</DD>

<DT>measure (instance.clustersFromIM.columnAssessor.measure)</DT>
<DD>is the measure to be used when <CODE>columnAssessor</CODE> is of type <CODE>orange.ColumnAssessor_Measure</CODE>. If this option is given, the default <CODE>columnAssessor</CODE> is <CODE>orange.ColumnAssessor_Measure</CODE>.</DD>

<DT>completion</DT><DD>defines what to do with combinations of values of bound attributes that were not present in the data. The default value of <CODE>completion</CODE> is <CODE>orngCI.FeatureByIM.CompletionByBayes</CODE>; the values are determined using a Bayesian classifier. The alternatives are <CODE>orngCI.FeatureByIM.CompletionByDefault</CODE>, where the most frequent value is used for imputation, and <CODE>orngCI.FeatureByIM.NoCompletion</CODE>, where the values are left undefined.</DD>
</DL>
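
<p>As a sketch, the <code>minProfitProportion</code> option mentioned above would be used like the other keyword options:</p>

<xmp class="code">>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"], m=1.0,
...     minProfitProportion=0.05)
</xmp>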

<P>You should take care to combine the arguments sensibly. If you, for instance, combine the column assessor based on Kramer's impurity measure with the ordinary "no profit" stop criterion (by setting both explicitly), you will probably get features that are simply Cartesian products of the bound attributes, since Kramer's impurity measure is non-decreasing. <CODE>orngCI</CODE> does not warn about nonsensical settings.</P>

<P>As an example, let us check how minimal-error decomposition performs on Monk 1. We shall check its performance with different values of <EM>m</EM>, and for each we shall print out the values of the constructed feature.</P>

<xmp class="code">>>> import orange, orngCI
>>> data = orange.ExampleTable("monk1")
>>> for m in [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]:
...     ab, quality = orngCI.FeatureByIM(data, ["a", "b"], m=m)
...     print "m=%4.1f: %5.3f %s" % (m, quality, ab.values)
m= 0.0: 0.000 <1-1+3-3+2-2, 3-1+3-2+1-2+2-3+1-3+2-1>
m= 0.1: 0.087 <2-2+3-3+1-1, 2-3+3-1+2-1+1-2+3-2+1-3>
m= 0.2: 0.152 <1-1+2-2+3-3, 1-3+3-2+2-1+2-3+3-1+1-2>
m= 0.5: 0.300 <2-3+3-1+1-2+3-2+1-3+2-1+3-3+2-2+1-1>
m= 1.0: 0.333 <1-1+3-3+2-2, 1-2+3-2+2-1+1-3+3-1+2-3>
m= 2.0: 0.500 <2-3+3-1+3-2+1-2+2-1+1-3+3-3+1-1+2-2>
m= 5.0: 0.375 <2-1+3-1+3-2+1-2+2-3+1-3+3-3+1-1+2-2>
m=10.0: 0.186 <1-1+3-3+2-2+1-3+1-2+3-2+3-1+2-1+2-3>
</xmp>

<P>Results can vary due to random decisions in the procedure. In general, lower <EM>m</EM>'s give correct features. This is not surprising: through <EM>m</EM> the user may inform the algorithm about the level of noise in the data. With higher <EM>m</EM>'s, the algorithm will tend to oversimplify the feature; in our case, it will merge more than needed.</P>

<P>Let us see whether Kramer's impurity measure can compete with this.</P>

<xmp class="code">>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"],
      columnAssessor=orange.ColumnAssessor_Kramer())
>>> print "%5.3f %s" % (quality, ab.values)
6.917 <2-2+2-3+1-3+3-3+1-2+1-1+3-1+2-1, 3-2>
</xmp>

<P>It cannot. But what if we allow it to build a four-valued attribute?</P>

<xmp class="code">>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"],
      columnAssessor=orange.ColumnAssessor_Kramer(), n=4)
>>> print "%5.3f %s" % (quality, ab.values)
2.917 <1-1+3-2+2-1+3-3+1-2+2-2, 1-3, 2-3, 3-1>
</xmp>

<P>Not much better. Note that these examples are not relevant tests of the methods. Minimal-error based decomposition is designed for noisy domains, and <EM>m</EM> is usually fitted by internal cross-validation. Kramer's impurity measure was designed for another feature construction method, not for comparing columns of incompatibility matrices.</P>

<P>As a more interesting test, let us see what happens if we construct an attribute with optimization of information gain.</P>

<xmp class="code">>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"], measure=orange.MeasureAttribute_info())
>>> print "%5.3f %s" % (quality, ab.values)
0.000 <1-1+3-3+2-2, 2-3+3-1+1-3+3-2+1-2+2-1>
</xmp>

<P>Perfect. We could add "binary=1" to the call, but it is obviously not needed. Values 1-1, 2-2, and 3-3 can be joined without increasing the entropy, and the other values can be joined likewise. (Note that this does not say anything about the actual information gain of the new attribute. <CODE>orange.MeasureAttribute_info()</CODE> estimates entropies in the elements of the incompatibility matrix and observes the changes in entropy with potential merging. This is not the standard information gain of an attribute but rather a form of non-myopic information gain measure.)</P>


<H3>FeatureByKramer</H3>

<p><code><INDEX name="classes/FeatureByKramer (in orngCI)">FeatureByKramer</code> implements Kramer's algorithm for feature construction (Kramer, 1994). Basically, it computes a distribution of classes for each combination of values of the bound attributes and merges them, just like the minimal-error decomposition merges columns of the incompatibility matrix. In the case of Monk 1, it extracts a 0:36 distribution for combinations 1-1, 2-2 and 3-3, and 36:12 for the other values. It is straightforward for any clustering algorithm to see that the former three values belong to one group and the latter six to another.</p>

<p>This method is somewhat similar to minimal-error decomposition. If you sum all rows of the incompatibility matrix into a single row, you get a list of distributions corresponding to combinations of values of the bound attributes - and that is exactly the data that Kramer's algorithm operates on. The remaining differences between the algorithms are only in the choice of components; while minimal-error estimation uses the change of the m-error estimate as a measure of distance between two values of the new attribute, Kramer's algorithm uses a special impurity measure (which can be, as Kramer himself states, replaced by any other impurity measure or even by some measure of similarity between two distributions).</p>

<p>Due to the similarity between the two methods, the structure of their components is similar. The data structure that corresponds to <CODE>IM</CODE> (the incompatibility matrix) is called <CODE>ExampleDist</CODE>. There is no component analogous to <CODE>IMconstructor</CODE>.</p>

<DL>
<DT>clustersFromDistributions (instance.clustersFromDistributions)</DT>
<DD>is the method that defines how the values are merged. The only available class at the moment is <CODE>ClustersFromDistributionsByAssessor</CODE>, which uses a <CODE>distributionAssessor</CODE> to evaluate differences between the distributions and merges the most similar values until the <CODE>stopCriterion</CODE> says it's enough.</DD>

<DT>distributionAssessor (instance.clustersFromDistributions.distributionAssessor)</DT>
<DD>is the component used to assess the individual and average qualities of distributions, and the profits from their merging.
This component is independent of the stopping criteria. The default is <CODE>orange.DistributionAssessor_Kramer</CODE>, which uses Kramer's measure of impurity. Alternatives are
<UL>
<LI><CODE>orange.DistributionAssessor_Laplace</CODE>, which minimizes the Laplace estimate of the error of distributions;</LI>
<LI><CODE>orange.DistributionAssessor_Measure</CODE>, which optimizes an attribute quality measure (such as information gain);</LI>
<LI><CODE>orange.DistributionAssessor_m</CODE>, which assesses the quality of distributions using the m-estimate of error;</LI>
<LI><CODE>orange.DistributionAssessor_Relief</CODE>, which uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.</LI>
</UL>
Note that this default is different from the one in the analogous <code>orngCI.FeatureByIM</code> class. This also influences the default <code>stopCriterion</code> (see below).</DD>

<DT>stopCriterion (instance.clustersFromDistributions.stopCriterion)</DT>
<DD>decides when to stop the merging. The default depends upon other parameters. The rules for the default are the same as for <code>orngCI.FeatureByIM</code>: when no <code>n</code>, <code>minProfitProportion</code> or <code>binary</code> is given, the <code>stopCriterion</code> depends upon the <code>distributionAssessor</code>. When Kramer's assessor is used, the default is <CODE>orange.StopDistributionClustering_binary</CODE>, which stops when only two values are left. Otherwise, <CODE>orange.StopDistributionClustering_noProfit</CODE> is used, which stops when the gain is negative or when it is smaller than the prescribed proportion. The third alternative is <CODE>orange.StopDistributionClustering_n</CODE>, which stops at <CODE>n</CODE> values.</DD>

<DT>minProfitProportion (instance.clustersFromDistributions.stopCriterion.minProfitProportion)</DT>
<DD>sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality by less than 5%. If you use this with m-error estimation, for instance, the merging would stop when no merge would decrease the error by at least 5%. The default <CODE>minProfitProportion</CODE> is 0.</DD>

<DT>n (instance.clustersFromDistributions.stopCriterion.n)</DT>
<DD>sets the number of values for the new feature. If <CODE>n</CODE> is specified, the default <CODE>stopCriterion</CODE> is <CODE>orange.StopDistributionClustering_n</CODE>.</DD>

<DT>binary</DT>
<DD>tells that we want to induce binary features. <CODE>orange.StopDistributionClustering_binary</CODE> is used as the stopping criterion.</DD>

<DT>m (instance.clustersFromDistributions.distributionAssessor.m)</DT>
<DD>gives <EM>m</EM> for the m-error estimate.</DD>

<DT>measure (instance.clustersFromDistributions.distributionAssessor.measure)</DT>
<DD>is the measure to be used when <CODE>distributionAssessor</CODE> is of type <CODE>orange.DistributionAssessor_Measure</CODE>. If this option is given, the default <CODE>distributionAssessor</CODE> is <CODE>orange.DistributionAssessor_Measure</CODE>.</DD>

<DT>completion</DT><DD>defines what to do with combinations of values of bound attributes that were not present in the data. The default value of <CODE>completion</CODE> is <CODE>orngCI.FeatureByKramer.CompletionByBayes</CODE>; the values are determined using a Bayesian classifier. The alternatives are <CODE>orngCI.FeatureByKramer.CompletionByDefault</CODE>, where the most frequent value is used for imputation, and <CODE>orngCI.FeatureByKramer.NoCompletion</CODE>, where the values are left undefined.</DD>
</DL>


<p>The usage of the class is essentially the same as that of <code>orngCI.FeatureByIM</code>, except for the different defaults for <code>distributionAssessor</code> and <code>stopCriterion</code>. This invokes the default Kramer's algorithm, which uses Kramer's measure of impurity for distribution quality assessment and induces binary concepts:</p>

<xmp class="code">>>> ab, quality = orngCI.FeatureByKramer(data, ["a", "b"])
>>> print "%5.3f %s" % (quality, ab.values)
-0.125 <2-1+2-3+1-2+3-1+1-3+3-2, 2-2+3-3+1-1>
</xmp>

<p>Setting <em>m</em> will replace the distribution assessor by <code>orange.DistributionAssessor_m</code> and, consequently, <code>orange.StopDistributionClustering_noProfit</code> will be used as the stop criterion:</p>

<xmp class="code">>>> for m in [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]:
...    ab, quality = orngCI.FeatureByKramer(data, ["a", "b"], m=m)
...    print "m=%4.1f: %5.3f %s" % (m, quality, ab.values)
...
m= 0.0: 1.458 <2-1+3-2+3-1+2-3+1-2+1-3, 2-2+3-3+1-1>
m= 0.1: 1.457 <1-2+1-3+2-1+3-2+3-1+2-3, 2-2+3-3+1-1>
m= 0.2: 1.455 <1-1+3-3+2-2, 2-1+3-1+1-2+2-3+3-2+1-3>
m= 0.5: 1.451 <2-1+1-3+3-1+1-2+3-2+2-3, 2-2+3-3+1-1>
m= 1.0: 1.444 <2-2+3-3+1-1, 1-3+2-3+1-2+3-2+2-1+3-1>
m= 2.0: 1.430 <1-1+2-2+3-3, 1-3+2-3+2-1+3-2+3-1+1-2>
m= 5.0: 1.391 <1-1+3-3+2-2, 2-1+1-2+3-1+2-3+3-2+1-3>
m=10.0: 1.334 <1-1+2-2+3-3, 3-1+1-2+2-1+2-3+3-2+1-3>
</xmp>

<h3>FeatureByRandomMerge</h3>

<p><code><INDEX name="classes/FeatureByRandomMerge (in orngCI)">FeatureByRandomMerge</code> constructs features with the given number of values. The grouping of combinations of bound values is random, and the estimated quality of the feature is a random value between 0 and 100, inclusive. The class is useful for comparisons - if a "real" method of constructive induction does not outperform this one, it is of no use.</p>

<DL>
<DT>n</dt><dd>Sets the number of values for the new feature.</dd></dl>

<p>As an example, we shall induce three-valued random features:</p>

<xmp class="code">>>> for i in range(5):
...    ab, quality = orngCI.FeatureByRandomMerge(data, ["a", "b"], n=3)
...    print "%5.3f %s" % (quality, ab.values)
...
53.000 <2-2+2-3, 1-2+1-3+2-1+3-2+3-3, 1-1+3-1>
19.000 <3-2+3-3, 1-2+1-3+2-2+3-1, 1-1+2-1+2-3>
2.000 <2-1+2-3+3-1, 2-2, 1-1+1-2+1-3+3-2+3-3>
100.000 <1-3+2-3+3-1+3-3, 1-2+2-1+3-2, 1-1+2-2>
86.000 <1-3+2-2+3-1+3-2, 1-1+2-3+3-3, 1-2+2-1>
</xmp>

<h3>FeatureByCartesianProduct</h3>

<p><code><INDEX name="classes/FeatureByCartesianProduct (in orngCI)">FeatureByCartesianProduct</code> constructs an attribute which represents the Cartesian product of the bound attributes. Its quality is estimated by a given attribute quality estimator of type <code>orange.MeasureAttribute</code>. If none is given, all qualities are 0.</p>

<DL>
<DT>measure</dt><dd>The measure of attribute quality.</dd></dl>

<p>Let us construct a Cartesian product attribute and assess its quality using the gain ratio:</p>

<xmp class="code">>>> ab, quality = orngCI.FeatureByCartesianProduct(data, ["a", "b"],
...               measure=orange.MeasureAttribute_gainRatio())
>>> print "%5.3f %s" % (quality, ab.values)
0.145 <1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3>
</xmp>


<h2>Multiple Feature Construction</h2>

<h3>FeatureGenerator</h3>

<p><code>orngCI.<INDEX name="classes/FeatureGenerator (in orngCI)">FeatureGenerator</code> is a class that constructs a list of features using the given feature construction algorithm. Besides that, you can also specify a subset generator - an object that generates bound sets.</p>

<dl>
<dt>featureInducer</dt><dd>The object that builds features from the given examples and sets of bound attributes. You can use any of the above feature inducers for this purpose. This attribute needs to be specified; no default is provided.</dd>

<dt>subsetGenerator</dt><dd>Generates bound sets from the given attributes. The default generator is <code>orange.SubsetsGenerator_constSize(2)</code>, which returns all pairs of attributes. Orange provides several alternatives; the most useful is <code>orange.SubsetsGenerator_minMaxSize(min, max)</code>, which generates all subsets within the specified minimal and maximal number of elements (see the sketch after this list). You can also define your own generators in Python, if you like.</dd>
</DL>
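
<p>A sketch of plugging in a different subset generator; we assume <code>SubsetsGenerator_minMaxSize</code> takes the two sizes as constructor arguments:</p>

<xmp class="code">>>> fk = orngCI.FeatureByKramer()
>>> ssgen = orange.SubsetsGenerator_minMaxSize(2, 3)
>>> features = orngCI.FeatureGenerator(data, featureInducer=fk,
...     subsetGenerator=ssgen)
</xmp>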

<p>The core of the class's code is really trivial. The class first checks whether the two fields are specified and provides the default for <code>subsetGenerator</code> if needed. Then it executes:</p>

<xmp class="code">return [self.featureInducer(data, bound, weightID) for bound in ssgen]
</xmp>

<p>We will use the class to induce new features from all attribute pairs in Monk 1 using Kramer's algorithm.</p>

<xmp class="code">>>> fk = orngCI.FeatureByKramer()
>>> features = orngCI.FeatureGenerator(data, featureInducer = fk)
>>> for ab, quality in features:
...   names = [x.name for x in ab.getValueFrom.boundset()]
...   print "%s: %5.3f" % (names, quality)
['a', 'b']: -0.125
['a', 'c']: -0.250
['a', 'd']: -0.250
['a', 'e']: -0.167
['a', 'f']: -0.250
    ...
</xmp>

<h3>StructureInducer</h3>

<P><CODE><INDEX name="classes/StructureInducer (in orngCI)">StructureInducer</CODE> induces a hierarchy of attributes as proposed by Zupan et al (1998). At each step, it induces attributes from subsets of the current attribute set. If any attributes are found, one of them is selected and inserted into the domain, and its bound attributes are removed. This is repeated until there is only one attribute left, or until the selected attribute includes all the remaining attributes in its bound set (<EM>i.e.</EM>, if new attributes are induced from pairs of existing attributes, the induction stops when there are only two attributes left). The result of calling <CODE>StructureInducer</CODE> is a function that computes the class attribute from the remaining attributes. Using the functions <CODE>boundset()</CODE> and <CODE>getValueFrom</CODE>, it is possible to descend through the hierarchy, as the sketch below shows.</P>
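
<p>A minimal sketch of such a descent, essentially a bare-bones version of <code>printHierarchy</code> described below; original attributes are recognized by having no <code>getValueFrom</code>:</p>

<xmp class="code">def walk(attr, depth=0):
  # print the attribute and the number of its values, then recurse
  print "  " * depth + "%s/%i" % (attr.name, len(attr.values))
  if attr.getValueFrom:
    for bound in attr.getValueFrom.boundset():
      walk(bound, depth + 1)
</xmp>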

<DL>
<DT>featureInducer</DT>
<DD>Any feature inducer described above (such as <CODE>FeatureByMinError</CODE> or <CODE>FeatureByKramer</CODE>), or another Python function or class with the same behaviour.</DD>

<DT>subsetsGenerator</DT>
<DD>A subset generator for bound attributes. <CODE>StructureInducer</CODE> constructs a <CODE>FeatureGenerator</CODE> and initializes it with <CODE>featureInducer</CODE> and <CODE>subsetsGenerator</CODE>.</DD>

<DT>redundancyRemover</DT>
<DD>Any class or function that gets examples and a weight, and returns examples with redundancies removed. An example of such a class is <CODE>orngCI.AttributeRedundanciesRemover</CODE>. If given, it is called prior to structure induction (but not during induction).</DD>

<DT>keepDuplicates</DT>
<DD>During structure induction, as attributes are merged, more and more examples become equal. To speed up the induction, such examples are replaced with a single example with a higher weight. If the feature inducer can correctly handle weights (or if example duplicates do not change the induced features, as, for instance, in minimal-complexity decomposition), this does not change the outcome. If you, however, want to keep the duplicated examples, set <CODE>keepDuplicates</CODE> to 1.</DD>
</DL>

<P>This example shows how to induce a structure using Kramer's method for feature construction, where <CODE>data</CODE> stores examples for dataset Monk 1.</P>

<xmp class="code">>>> inducer = orngCI.FeatureByKramer()
>>> root = orngCI.StructureInducer(data, featureInducer=inducer)
>>> orngCI.printHierarchy(root)
y/2 <0, 1>
      d/3 <1, 2, 3>
      c4/2 <c0, c1>
            f/2 <1, 2>
            c3/2 <1-c0+2-c0, 1-c1+2-c1>
                  c/2 <1, 2>
                  c2/2 <c0, c1>
                        e/4 <1, 2, 3, 4>
                        c1/2 <3-1+1-3+2-1+1-2+3-2+2-3, 2-2+3-3+1-1>
                              a/3 <1, 2, 3>
                              b/3 <1, 2, 3>
</xmp>

<h2>Learning</h2>

<h3>HINT</h3>
<index name="classifiers+HINT">

<P>Class <CODE><INDEX name="classes/HINT (in orngCI)">HINT</CODE> can be used as a <CODE>Learner</CODE>.</P>

<P>Basically, <CODE>HINT</CODE> is just a wrapper around <CODE>StructureInducer</CODE>. The user can specify the type of decomposition (minimal-complexity, minimal-error) and the size of bound sets. Alternatively, <CODE>HINT</CODE> can use defaults. When minimal-error decomposition is used, <CODE>HINT</CODE> fits the parameter <EM>m</EM> for error estimation by performing a 5-fold cross validation to find the value that gives the maximal classification accuracy.</P>

<DL>
<DT>type</DT>
<DD>Sets the type of decomposition. It can be <CODE>"complexity"</CODE>, <CODE>"error"</CODE>, or <CODE>"auto"</CODE>. This, basically, sets the <CODE>StructureInducer</CODE>'s <CODE>featureInducer</CODE> to <CODE>FeatureByMinComplexity</CODE> or <CODE>FeatureByMinError</CODE>. When <CODE>type</CODE> is <CODE>"auto"</CODE>, <CODE>HINT</CODE> tries to determine whether the domain contains noise by observing whether there are examples with the same values of attributes but different classes. If there are no such examples, minimal-complexity decomposition is used; if there are, it switches to minimal-error decomposition.</DD>

<DT>boundSize</DT>
<DD>Defines the size of bound sets. It can be either an integer (usually 2 or 3) or a tuple with the minimal and maximal bound set size. If the bound size is not given, bound sets are pairs of attributes (see the sketch after this list).</DD>
</DL>
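
<p>A minimal sketch of using <CODE>HINT</CODE> as a learner on Monk 1:</p>

<xmp class="code">>>> hint = orngCI.HINT(type="complexity", boundSize=2)
>>> classifier = hint(data)
>>> orngCI.printHierarchy(classifier)
</xmp>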

<h2>Auxiliary Functions</h2>

<H3>RemoveAttributeRedundancies</H3>

<P><INDEX name="classes/RemoveAttributeRedundancies (in orngCI)">RemoveAttributeRedundancies is a class for removing (merging) redundant attribute values. The class works as proposed by Zupan (1997). It first ranks the attributes according to some measure of quality (ReliefF, by default). It then induces a new attribute from each original attribute, starting with the worst ranked. If the induced attribute is the same as the original, nothing happens. If some values were merged, the original attribute is replaced with the induced one. If all values are merged into one, the attribute is redundant and is removed.</P>

<DL>
<DT>m</DT>
<DD>The parameter for the m-error estimate.</DD>

<DT>inducer</DT>
<DD>The inducer for constructing attributes from the original. If omitted, <CODE>FeatureByMinError</CODE> is used if <CODE>m</CODE> is given, and <CODE>FeatureByMinComplexity</CODE> if it is not (see the sketch after this list).</DD>
</DL>
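
<p>A sketch of the intended use, assuming the class is called like the <code>redundancyRemover</code> component of <code>StructureInducer</code>, with examples and, optionally, a weight id:</p>

<xmp class="code">>>> remover = orngCI.RemoveAttributeRedundancies(m=2.0)
>>> data2 = remover(data)
</xmp>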

<h3>addAttribute(attribute, table)</h3>

<P>This function adds an attribute to the domain. A new domain (and a new <CODE>ExampleTable</CODE>) is constructed; the original is left intact. Values for the new attribute are computed through its <CODE>getValueFrom</CODE> function.</P>

<h3>replaceWithInduced(attribute, table)</h3>

<P>Similar to <CODE>addAttribute</CODE>, but also removes the bound attributes from the domain.</P>


<h3>printHierarchy(attribute | classifier)</h3>

<P>Prints out a hierarchy starting with the given attribute or classifier. In the latter case, the classifier must support the <CODE>boundset</CODE> method; the classifiers returned by <CODE>StructureInducer</CODE> and <CODE>HINT</CODE> have this method.</P>

<P>An example of the use of this function is given in the section on <CODE>StructureInducer</CODE> above.</P>

<h3>dotHierarchy(file, attribute | classifier)</h3>

<P>Writes a hierarchy starting with the given attribute or classifier to a file for the tree-drawing program Dot from the GraphViz package. If a classifier is given, it must support the <CODE>boundset</CODE> method; the classifiers returned by <CODE>StructureInducer</CODE> and <CODE>HINT</CODE> have this method. The file can be given as a string, in which case a new file is created, or as an opened file, in which case the tree is appended to the file.</P>
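
<p>A minimal sketch, writing the structure induced earlier to a file for Dot; "hierarchy.dot" is an arbitrary file name:</p>

<xmp class="code">>>> orngCI.dotHierarchy("hierarchy.dot", root)
</xmp>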

<h2>References</h2>

<p>S. Kramer (1994). CN2-MCI: A Two-Step Method for Constructive Induction.
<EM>Proc. ML-COLT '94 Workshop on Constructive Induction and Change
of Representation</EM>. (<a href = "http://home.comcast.net/~tom.fawcett/public_html/CICR/kramer.ps.gz">http://home.comcast.net/~tom.fawcett/public_html/CICR/kramer.ps.gz</a>)
</p>

<P>S. B. Thrun et al. (1991). <EM>A performance comparison of different learning algorithms</EM>. Carnegie Mellon University, CMU-CS-91-197.</P>

<P>B. Zupan (1997). <EM>Machine learning based on function decomposition</EM>. PhD thesis, University of Ljubljana.</P>

<P>B. Zupan, M. Bohanec, J. Demsar and I. Bratko (1998). Feature transformation by function decomposition. <em>IEEE Intelligent Systems and Their Applications</em>, vol. 13, pp. 38-43.</P>

</body> </html>