source: orange/Orange/doc/reference/callbacks.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 44.2 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<index name="callbacks to Python">
<index name="subtyping Orange's classes in Python">
<h1>Subtyping Orange classes in Python</h1>

<p>This page describes how to subtype Orange's classes in Python and make the overloaded methods callable from C++ components.</p>

<p style="margin-bottom: 0mm">Since Orange only has an interface to Python but is otherwise independent of it, subtyping might sometimes not work as you would expect. If you subtype an Orange class in Python and overload some methods, the C++ code (say, if you use your instance as a component of some Orange object) will call the C++ method and not the one you provided in Python.</p>

<P>The exceptions are <CODE>Filter</CODE>, <CODE>Learner</CODE>, <CODE>Classifier</CODE>, <CODE>TreeSplitConstructor</CODE>, <CODE>TreeStopCriteria</CODE>, <CODE>TreeExampleSplitter</CODE>, <CODE>TreeDescender</CODE>, <CODE>MeasureAttribute</CODE>, <CODE>TransformValue</CODE>, <CODE>ExamplesDistance</CODE> and <CODE>ExamplesDistance_Constructor</CODE>. If you subtype one of these classes and overload its call operator, your operator will get called from the C++ code. If you subclass any other class or overload any other method, the C++ code won't know about it.</P>

<P>If your subclass is called only from Python code and never from C++, you can subclass anything you want and it will work as it should.</P>

<P>If you are satisfied with that, you can <a href="#examples">skip to the examples</a> on this page to learn how to subclass what you need. If you wonder why it has to be like this, read on.</P>


<hr>

<H2>General Problem with Subtyping</H2>

<p><a name="footnoteref1"></a>Orange was first conceived as a C++ library of machine learning components. It was only after several years of development that Python was first used as a glue language. But even after being interfaced to Python, Orange still maintains its independence. It would, in principle, be possible to export Orange's components as, for example, COM objects that wouldn't require Python to run<SUP><a href="#footnote1">1</a></SUP>. Orange components are not aware that they are being called from Python. Even more, they are not aware that they are being exposed to Python.</p>

<p>This becomes important when subtyping the components. Let's say we derive a Python class <code>MyDomain</code> from <code>Domain</code> and redefine the call operator, which is used to convert an example from another domain. Our operator uses the original <code>Domain</code> to convert the example but afterwards sets the class to unknown.</p>

<p class="header"><a href="cb-mydomain.py">cb-mydomain.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class MyDomain(orange.Domain):
  def __call__(self, example):
    ex = orange.Domain.__call__(self, example)
    ex.setclass("?")
    return ex

md = MyDomain(data.domain)
</xmp>

<p><a name="footnoteref2"></a>Subtyping built-in classes in Python is technically complex. When you call <code>MyDomain</code>, the <code>Domain</code>'s constructor is called. It constructs a C++ instance of <code>Domain</code>, but when returning it to Python, it marks it as an instance of <code>MyDomain</code><SUP><a href="#footnote2">2</a></SUP>. So <code>md</code>'s memory representation is the same as that of an ordinary instance of <code>orange.Domain</code>, but its type, as known to Python, is <code>MyDomain</code>. When the Python interpreter calls <code>md</code>, it treats it as a <code>MyDomain</code> and correctly calls the method we defined above.</P>

<xmp class="code">>>> print md(data[0])
['psby', 'hyper', 'y', 'normal', '?']
</xmp>

<p>Not so with C++. The C++ code knows nothing about Python wrappers, types or overloaded methods. For C++, <code>md</code> is an ordinary instance of <CODE>Domain</CODE>. So what happens when C++ code tries to call it? To check this, we will convert an example in another way: we'll call <code>Example</code>'s constructor. It accepts different arguments; if you provide a domain and an existing example, a new example is constructed by converting the existing one into the specified domain. The domain is called internally to perform the actual conversion.</p>

<xmp class="code">>>> print orange.Example(md, data[0])
['psby', 'hyper', 'y', 'normal', 'no']
</xmp>

<p><a name="footnoteref3"></a>The class is still 'no', not unknown.<SUP><a href="#footnote3">3</a></SUP></p>
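<p>This behaviour is not peculiar to Orange: Python's own built-in types show the same pattern, since C-level code does not consult Python-level overrides. A minimal illustration with <code>dict</code> (just an analogy, unrelated to Orange): subscripting dispatches to the overloaded <code>__getitem__</code>, while the C-implemented <code>dict.get</code> bypasses it.</p>

```python
class MyDict(dict):
    def __getitem__(self, key):
        # Python-level override
        return "intercepted"

d = MyDict(a=1)
print(d["a"])      # the interpreter dispatches to the override: 'intercepted'
print(d.get("a"))  # C code uses the original implementation: 1
```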

<p>The obvious solution to the problem would be to make Orange's components "Python aware". This would be extremely slow: at each call, even at calls of <code>operator[]</code>, for example, the C++ code would have to check whether the corresponding operator has been overloaded in Python.</P>

<p>The solution we preferred is efficient, yet limited. The following fragment is a fully functional filter, derived from <code>orange.Filter</code>.</p>

<xmp class="code">class FilterYoung(orange.Filter):
    def __call__(self, ex):
        return ex["age"]=="young"
</xmp>

<p>You can, for instance, use it as an argument for the <code>ExampleTable.filter</code> method:</p>

<xmp class="code">>>> fy = FilterYoung()
>>> for e in data.filter(fy):
>>>   print e
['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'normal', 'soft']
['young', 'myope', 'yes', 'reduced', 'none']
['young', 'myope', 'yes', 'normal', 'hard']
['young', 'hypermetrope', 'no', 'reduced', 'none']
['young', 'hypermetrope', 'no', 'normal', 'soft']
['young', 'hypermetrope', 'yes', 'reduced', 'none']
['young', 'hypermetrope', 'yes', 'normal', 'hard']
</xmp>

<p><a name="footnoteref4"></a><code>orange.Filter</code> is an abstract class. You cannot construct it, e.g. by calling <code>Filter()</code>. But when <code>Filter</code>'s constructor is called to construct an instance of <code>FilterYoung</code>, such as <code>fy</code>, it constructs an instance of a special callback class, <code>Filter_Python</code> (although Orange doesn't admit it - when returning it to Python it says it's a <code>FilterYoung</code>, and that its base class is <code>Filter</code><SUP><a href="#footnote4">4</a></SUP>).</p>

<xmp class="code">>>> type(FilterYoung())
<class '__main__.FilterYoung'>
>>> type(FilterYoung()).__base__
<type 'Filter'>
</xmp>

<p><code>Filter_Python</code>'s call operator (written in C++ and thus seen and respected by C++ code) calls back the overloaded <code>__call__</code> written in Python.</p>

<p>If it works for <code>orange.Filter</code>, why does it fail for <code>orange.Domain</code>? Because it hasn't been programmed. Only the call operators of a few classes can do this: <code>Filter</code>, <code>Learner</code>, <code>Classifier</code>, <code>TreeSplitConstructor</code>, <code>TreeStopCriteria</code>, <code>TreeExampleSplitter</code>, <code>TreeDescender</code> and <code>MeasureAttribute</code>. Why only those? A rough estimate is that making all the methods of the existing 300 Orange classes overloadable would inflate Orange's source code to three times its size! On the other hand, the chosen classes are those that are most likely to be overloaded. For others, you can either find another solution or ask us to make them overloadable as well. Adding this functionality to a single method of a single class is a small undertaking.</P>

<p>You might be tempted to overload the call operator of a class that is derived from <code>Filter</code>, for instance</p>

<xmp class="code">class MyFilter(orange.Filter_index):
   ...
</xmp>

<p>This will fail just as it did for <code>Domain</code>. The class <code>Filter_Python</code> is derived from <code>Filter</code> and can only inherit its functionality. In the previous example, <code>Filter_Python</code> was a hidden class between <code>FilterYoung</code> and <code>orange.Filter</code>. In this example, you would need a class between <code>MyFilter</code> and <code>orange.Filter_index</code>, and this role obviously cannot be played by <code>Filter_Python</code>.</P>

<p>How to do it then? The simplest way is to wrap a <code>Filter_index</code>, like this</p>

<xmp class="code">class MyFilter(orange.Filter):
   def __init__(self):
      self.subfilter = orange.Filter_index()
      ...

   def __call__(self):
     ... here you can call self.subfilter
         or do some of your stuff ...
</xmp>

<p><code>MyFilter</code> is now derived from <code>orange.Filter</code> but still has its own copy of <code>Filter_index</code>. Not pretty, but it works.</p>

<p>There's another nice way to construct your own filters (and, in general, other overloadable components). You don't need to derive a new class; you can seemingly construct an instance of an abstract class, giving a callback function as an argument.</p>

<xmp class="code">filt = orange.Filter(lambda ex:ex["age"]=="young")
for e in data.filter(filt):
    print e
</xmp>

<p><code>Filter</code>'s constructor is called directly, as if to construct an instance of <code>Filter</code>, which it would usually refuse (since the class is abstract). But when given a function as an argument (here we used a lambda function, but you can, of course, use ordinary functions as well), it constructs a <code>Filter_Python</code> and stores the given function in its dictionary. When the <code>Filter_Python</code> is called, it calls the function passed as an argument to its constructor.</p>
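<p>The mechanism can be sketched in pure Python (hypothetical names; Orange's actual implementation is in C++): the abstract class's constructor, when handed a callable, silently builds an instance of a hidden wrapper class whose call operator delegates to the stored function.</p>

```python
class Filter:
    """Sketch of an 'abstract' class whose constructor accepts a callback."""
    def __new__(cls, func=None):
        if cls is Filter:
            if func is None:
                raise TypeError("Filter is abstract; pass a callback function")
            # build the hidden wrapper instead of an abstract instance
            inst = object.__new__(FilterPython)
            inst.func = func
            return inst
        return object.__new__(cls)

    def __call__(self, example):
        raise NotImplementedError

class FilterPython(Filter):
    """Analogue of the hidden Filter_Python callback class."""
    def __call__(self, example):
        return self.func(example)

filt = Filter(lambda ex: ex["age"] == "young")
print(filt({"age": "young"}))  # True
print(filt({"age": "old"}))    # False
```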

<p>There's another twist. Sometimes you don't need to wrap the function into a class at all. You can, for example, construct a tree's <CODE>nodeLearner</CODE> on the fly. <CODE>nodeLearner</CODE> should be an instance of <CODE>Learner</CODE>, but you can assign it a callback function.</p>

<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.nodeLearner = lambda gen, weightID: orange.MajorityLearner(gen, weightID)
</xmp>

<p>This snippet replaces the <code>nodeLearner</code> with a learner that calls <code>orange.MajorityLearner</code>. The example is artificial; the <code>MajorityLearner</code> can be given directly, with <code>treeLearner.nodeLearner = orange.MajorityLearner</code>, and besides, this is the default anyway. But how can we assign a Python function to <code>nodeLearner</code>, which can only hold a pure C++ <CODE>Learner</CODE> object, not a Python function? And how can we then even expect <code>TreeLearner</code> (written in C++) to call it? Checking what the above snippet actually stored in <CODE>treeLearner.nodeLearner</CODE> is revealing.</p>

<xmp class="code">>>> treeLearner.nodeLearner
<Learner instance at 0x019B1930>
</xmp>

<p>When Orange assigns a value to a component's attribute like <code>treeLearner.nodeLearner</code>, it tries to convert the given argument to the correct class, a <code>Learner</code> in this case. If the user actually gives an instance of (something derived from) <code>Learner</code>, that's great. Otherwise, Orange calls the <code>Learner</code>'s constructor with what was given as an argument. If the constructor can use this to construct an object, it's OK. The above assignment is thus equivalent to</p>

<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.nodeLearner = orange.Learner(lambda gen, weightID: orange.MajorityLearner(gen, weightID))
</xmp>

<p>which, as we know from the last example with <code>Filter</code>, works as intended.</P>
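<p>The conversion-on-assignment step can be sketched in plain Python (hypothetical names, not Orange's actual code): a component's attribute setter checks the type of the assigned value and, if it isn't of the expected class, passes it to that class's constructor.</p>

```python
class Learner:
    """Sketch of a class whose constructor wraps a plain callback."""
    def __init__(self, func=None):
        self.func = func
    def __call__(self, gen, weightID=0):
        return self.func(gen, weightID)

class TreeLearner:
    """Sketch of a component that coerces assigned attributes."""
    def __setattr__(self, name, value):
        if name == "nodeLearner" and not isinstance(value, Learner):
            # not a Learner: call Learner's constructor with the value
            value = Learner(value)
        object.__setattr__(self, name, value)

treeLearner = TreeLearner()
treeLearner.nodeLearner = lambda gen, weightID: len(gen)  # a plain function
print(isinstance(treeLearner.nodeLearner, Learner))  # True
print(treeLearner.nodeLearner(["e1", "e2", "e3"]))   # 3
```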

<p>You might have used this feature before without knowing it. Have you ever constructed an <code>EnumVariable</code> and assigned some values to its field <code>values</code>? Field <code>values</code> is stored as an <code>orange.StringList</code>, a pure C++ <CODE>vector&lt;string&gt;</CODE>, not the Python list of strings which you have provided. The same thing as with <CODE>nodeLearner</CODE> happens here: since you tried to assign a Python list instead of a <CODE>StringList</CODE>, <CODE>StringList</CODE>'s constructor was called with the Python list as an argument.</p>

<p>A final piece of advice for using derived classes: don't wonder too much about how it works. Just be happy that it does. Use it, try things that you think might work, but be sure to check (a simple <code>print</code> here and there will suffice) that your call operators are actually called. That's all you need to care about.</p>

<H2>Calling Inherited Methods</H2>

<p>All classes for which you can (really) overload the call operator are abstract. The only exception is <CODE>TreeStopCriteria</CODE>, so this is the only class for which calling the inherited call operator makes sense. The examples section shows how to do it.</p>

<p>For all other classes, calling the inherited method is an error. Similarly, forgetting to define the call operator but trying to use it leads to a call of the inherited operator - an error again.</p>

<A name="examples"></a>
<H2>Examples</H2>

<p>The examples below suppose that you have loaded the 'lenses' data into a variable <code>data</code>.</p>

<p>The examples are somewhat simplified. For instance, many classes below will silently assume that the attribute they deal with is discrete. This is to make the code clearer.</p>

<H3>Filter</H3>
<index name="classes/Filter">

<p>We've already shown how to derive filters. A filter is a simple object that decides whether a given example is "acceptable" or not. The class below accepts examples in which the value of "age" is "young".</p>

<xmp class="code">class FilterYoung(orange.Filter):
    def __call__(self, ex):
        return ex["age"]=="young"
</xmp>

<p>A <code>Filter</code> can be used, for instance, for selecting examples from an example table.</p>

<xmp class="code">>>> fy = FilterYoung()
>>> for e in data.filter(fy):
...   print e
['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'normal', 'soft']
['young', 'myope', 'yes', 'reduced', 'none']
['young', 'myope', 'yes', 'normal', 'hard']
['young', 'hypermetrope', 'no', 'reduced', 'none']
['young', 'hypermetrope', 'no', 'normal', 'soft']
['young', 'hypermetrope', 'yes', 'reduced', 'none']
['young', 'hypermetrope', 'yes', 'normal', 'hard']
</xmp>

<p>Note two things. First, you don't need to write your own filter to select examples based on attribute values; you'd get the same effect with</p>

<xmp class="code">>>> for e in data.select(age="young"):
...   print e
</xmp>

<p>Second, you don't need to derive a class from a filter when a function would suffice. You can write either</p>

<xmp class="code">>>> def f(ex):
...   return ex["age"]=="young"
>>> for e in data.filter(orange.Filter(f)):
...    print e
</xmp>

<p>or, for cases as simple as this, squeeze the whole function into a lambda function</p>

<xmp class="code">>>> for e in data.filter(orange.Filter(lambda ex:ex["age"]=="young")):
...    print e
</xmp>


<H3>Classifier</H3>
<index name="classes/Classifier">

<p>A "classifier" in Orange has a rather non-standard meaning. A classifier is an object with a call operator that gets an example and returns a value, a distribution of values or both - the return type is regulated by an optional second argument. Besides the standard use of classifiers as class predictors, this also covers predictors in regression, objects used in constructive induction (which use some of an example's attributes to compute the value of a new attribute), and others.</p>

<p>For this tutorial, we will define a classifier that can be used for simple constructive induction. Its constructor will accept two attributes and construct a new attribute as the Cartesian product of the two. The new attribute's name and the names of its values will be constructed from the names of the original attributes and their values. The call operator will return the value of the new attribute that corresponds to the values the two attributes have on the given example.</p>

<p class="header"><a href="cb-classifier.py">cb-classifier.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class CartesianClassifier(orange.Classifier):
  def __init__(self, var1, var2):
    self.var1, self.var2 = var1, var2
    self.noValues2 = len(var2.values)
    self.classVar = orange.EnumVariable("%sx%s" % (var1.name, var2.name))
    self.classVar.values = ["%s-%s" % (v1, v2) \
                            for v1 in var1.values for v2 in var2.values]

  def __call__(self, ex, what = orange.Classifier.GetValue):
    val = ex[self.var1] * self.noValues2 + ex[self.var2]
    if what == orange.Classifier.GetValue:
      return orange.Value(self.classVar, val)
    probs = orange.DiscDistribution(self.classVar)
    probs[val] = 1.0
    if what == orange.Classifier.GetProbabilities:
      return probs
    else:
      return (orange.Value(self.classVar, val), probs)
</xmp>

<p>No surprises in the constructor, except for a trick used to construct <code>classVar.values</code>.</p>

<p>In the call operator, the first line uses an implicit conversion of values to integers. When <code>ex[self.var1]</code>, which is of type <code>orange.Value</code>, is multiplied by <code>noValues2</code>, which is an integer, the former is converted to an integer. The same happens at the addition.</p>
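<p>The index arithmetic can be checked by hand: for attributes with <code>n1</code> and <code>n2</code> values, the value pair <code>(v1, v2)</code> lands at position <code>v1*n2 + v2</code> in the list of combined values, exactly the order in which the constructor's list comprehension generates them. A quick standalone check (plain Python, no Orange needed):</p>

```python
n1, n2 = 4, 2  # e.g. numbers of values of the two attributes

# combined values in the order the constructor's comprehension builds them
values = ["%s-%s" % (v1, v2) for v1 in range(n1) for v2 in range(n2)]

# the pair (v1, v2) is found at index v1*n2 + v2
for v1 in range(n1):
    for v2 in range(n2):
        assert values[v1 * n2 + v2] == "%s-%s" % (v1, v2)
print(values[1 * n2 + 1])  # '1-1'
```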

<p><code>val</code> is the index of the value to be returned. What follows is the usual procedure for constructing a correct return type for a classifier - you will often do something very similar in your classifiers.</p>

<p class="header"><a href="cb-classifier.py">cb-classifier.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">>>> tt = CartesianClassifier(data.domain[0], data.domain[1])
>>> for i in range(6):
...     print "%s --> %s" % (data[i], tt(data[i]))
...
['young', 'myope', 'no', 'reduced', 'none'] --> young-myope
['young', 'myope', 'no', 'normal', 'soft'] --> young-myope
['young', 'myope', 'yes', 'reduced', 'none'] --> young-myope
['young', 'myope', 'yes', 'normal', 'hard'] --> young-myope
['young', 'hypermetrope', 'no', 'reduced', 'none'] --> young-hypermetrope
['young', 'hypermetrope', 'no', 'normal', 'soft'] --> young-hypermetrope
</xmp>


<H3>Learner</H3>
<index name="classes/Learner">

<p><code>ClassifierByLookupTable</code> is a classifier whose predictions are based on the value of a single attribute. It contains a simple table named <code>lookupTable</code> for conversion from attribute values to class predictions. The last element of the table is the value that is returned when the attribute value is unknown or out of range. Similarly, <code>distributions</code> is a list of distributions, used when <code>ClassifierByLookupTable</code> is asked to predict a distribution.</p>

<p>Let us write a learner which chooses an attribute using a specified measure of quality and constructs a <code>ClassifierByLookupTable</code> that uses this single attribute for making predictions.</p>

<p class="header"><a href="cb-learner.py">cb-learner.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class OneAttributeLearner(orange.Learner):
  def __init__(self, measure):
    self.measure = measure

  def __call__(self, gen, weightID=0):
    selectBest = orngMisc.BestOnTheFly()
    for attr in gen.domain.attributes:
      selectBest.candidate(self.measure(attr, gen, None, weightID))
    bestAttr = gen.domain.attributes[selectBest.winnerIndex()]
    classifier = orange.ClassifierByLookupTable(gen.domain.classVar, bestAttr)

    contingency = orange.ContingencyAttrClass(bestAttr, gen, weightID)
    for i in range(len(contingency)):
      classifier.lookupTable[i] = contingency[i].modus()
      classifier.distributions[i] = contingency[i]
    classifier.lookupTable[-1] = contingency.innerDistribution.modus()
    classifier.distributions[-1] = contingency.innerDistribution
    for d in classifier.distributions:
      d.normalize()

    return classifier
</xmp>

<p>The constructor stores the measure to be used for choosing the attribute. The call operator assesses the qualities of the attributes and feeds them to <code>orngMisc.BestOnTheFly</code>. This is a simple class with a method <code>candidate</code> to which we feed some objects, and <code>winnerIndex</code>, which tells the index of the greatest of the "candidates" (there's also a method <code>winner</code> that returns the winner itself, but we cannot use it here). The benefit of using <code>BestOnTheFly</code> is that it is fair: if there is more than one winner, it returns a random one, not the first or the last (however, if you call <code>winnerIndex</code> repeatedly without adding any (winning) candidates, it will keep returning the same winner).</p>
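<p>In case you are curious how such fair selection can work in a single pass, here is a rough pure-Python sketch of the idea (not Orange's actual <code>orngMisc</code> code): each time a candidate ties with the current best, it takes over the winning slot with probability 1/(number of ties so far), which makes every tied candidate equally likely to win.</p>

```python
import random

class BestOnTheFly:
    """Sketch of single-pass maximum selection with fair tie-breaking."""
    def __init__(self):
        self.best = None
        self.winner_index = -1
        self.ties = 0
        self.count = 0

    def candidate(self, value):
        index = self.count
        self.count += 1
        if self.best is None or value > self.best:
            self.best, self.winner_index, self.ties = value, index, 1
        elif value == self.best:
            # reservoir-style tie-breaking: each tied candidate
            # ends up as the winner with equal probability
            self.ties += 1
            if random.random() < 1.0 / self.ties:
                self.winner_index = index

    def winnerIndex(self):
        return self.winner_index

sb = BestOnTheFly()
for q in [0.3, 0.7, 0.7, 0.1]:
    sb.candidate(q)
print(sb.winnerIndex())  # 1 or 2, each with probability 1/2
```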

<p>The chosen attribute is stored in <code>bestAttr</code>. A <code>ClassifierByLookupTable</code> is constructed next.</p>

<p>We then need to fill the <code>lookupTable</code> and <code>distributions</code>. For this, we construct a contingency matrix of type <code>ContingencyAttrClass</code> that has the given attribute as the outer and the class as the inner attribute. Thus, <code>contingency[i]</code> gives the distribution of classes for the i-th value of the attribute. We then iterate through the contingency to find the most probable class for each value of the attribute (obtained as the modus of the distribution). When predicting class probabilities, our classifier will return normalized distributions.</p>
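<p>With plain Python data structures, the filling of the lookup table can be sketched like this (made-up counts standing in for the contingency, which Orange computes from the examples):</p>

```python
from collections import Counter

# contingency[i]: class counts for the i-th value of the chosen attribute
# (hypothetical numbers, standing in for orange.ContingencyAttrClass)
contingency = [
    Counter({"none": 12}),                       # value 'reduced'
    Counter({"soft": 5, "hard": 4, "none": 3}),  # value 'normal'
]
inner = sum(contingency, Counter())  # overall (inner) class distribution

# most probable class per attribute value, plus a fallback for unknowns
lookupTable = [c.most_common(1)[0][0] for c in contingency]
lookupTable.append(inner.most_common(1)[0][0])

print(lookupTable)  # ['none', 'soft', 'none']
```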

<p>When the value of the attribute is unknown or out of range, the classifier will return the most probable class and the apriori class distribution; the latter can be found as the inner distribution of the contingency.</p>

<p class="header"><a href="cb-learner.py">cb-learner.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">>>> oal = OneAttributeLearner(orange.MeasureAttribute_gainRatio())
>>> c = oal(data)
>>> c.variable
EnumVariable 'tear_rate'
>>> c.variable.values
<reduced, normal>
>>> print c.lookupTable
<none, soft, none>
>>> print c.distributions
<<1.000, 0.000, 0.000>, <0.250, 0.417, 0.333>, <0.625, 0.208, 0.167>>
</xmp>

<p>When trained on the 'lenses' data, our learner chose the attribute 'tear_rate'. When its value is 'reduced', the predicted class is 'none' and the distribution shows that the classifier is quite sure about it (100%). When the value is 'normal', the predicted class is 'soft', but with much less certainty (42%). When the value is unknown or out of range (for example, if the user adds some values), the classifier will predict the class 'none' with 62.5% certainty.</p>


<H3>ExamplesDistance and ExamplesDistance_Constructor</H3>

<P><code>ExamplesDistance_Constructor</code> receives four arguments: an example generator and a weights meta id, domain distributions (of type <code>DomainDistributions</code>) and basic attribute statistics (an instance of <code>DomainBasicAttrStat</code>). The latter two can be <code>None</code>; you should write your code so that it computes them from the examples itself if they are needed. The function should return an instance of <code>ExamplesDistance</code>.</P>

<P><code>ExamplesDistance</code> gets two examples and should return a number representing the distance between them.</P>


<H3>MeasureAttribute</H3>
<index name="classes/MeasureAttribute">

<p><code>MeasureAttribute</code> is slightly more complex since it can be given different sets of parameters. The class defines the way it will be called by setting the "needs" field (see the documentation on <a href="MeasureAttribute.htm">attribute evaluation</a> for more details). (<b>Note: this has been changed from the mess we had in the past. Any existing code should still work, or will only need to be simplified if it does not.</b>)</p>

<DL class=attributes>
<DT>__call__(attributeNumber, domainContingency, aprioriProbabilities)</DT>
<DD>These arguments are sent if <code>needs</code> is set to <code>orange.MeasureAttribute.DomainContingency</code>. The data from which the attribute is to be evaluated is given by the contingencies of all attributes in the dataset. The <CODE>attributeNumber</CODE> tells which of those attributes the function needs to evaluate. Finally, there are the apriori class probabilities, if the method can make use of them; the third argument can sometimes be <CODE>None</CODE>.</DD>

<DT>__call__(contingencyMatrix, classDistribution, aprioriProbabilities)</DT>
<DD>In this form, which is used if <code>needs</code> equals <code>orange.MeasureAttribute.Contingency_Class</code>, you are given a class distribution and the contingency matrix for the attribute that is to be evaluated. In the context of decision tree induction, this is the class distribution in a node and the class distributions in the branches if this attribute is chosen. The third argument again gives the apriori class distribution, and can sometimes be <CODE>None</CODE> if the apriori distribution is unknown.</DD>

<DT>__call__(attribute, examples, aprioriProbabilities, weightID)</DT>
<DD>This form is used if <code>needs</code> is <code>orange.MeasureAttribute.Generator</code>. The attribute can be given as an instance of <code>int</code> or of <code>Variable</code> - you might want to check the argument type before using it.</DD>
</DL>

<P>In all cases, the method must return a real number representing the quality of the attribute; higher numbers mean better attributes. If in your measure of quality higher values mean worse attributes, you can negate or invert the number.</p>

<p>As an example, we will write a measure that is based on the cardinality of attributes. It will also have a flag by which the user decides whether attributes with higher or with lower cardinalities are preferred.</p>

<p class="header"><a href="cb-measureattribute.py">cb-measureattribute.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class MeasureAttribute_Cardinality(orange.MeasureAttribute):
  def __init__(self, moreIsBetter = 1):
    self.moreIsBetter = moreIsBetter

  def __call__(self, a1, a2, a3):
    if type(a1) == int:
      attrNo, domainContingency, aprioriClass = a1, a2, a3
      q = len(domainContingency[attrNo])
    else:
      contingency, classDistribution, aprioriClass = a1, a2, a3
      q = len(contingency)

    if self.moreIsBetter:
      return q
    else:
      return -q
</xmp>

<p>Alternatively, we can write the measure in the form of a function, but without the flag. To make it shorter, we will skip the fancy renaming of parameters.</p>

<p class="header"><a href="cb-measureattribute.py">cb-measureattribute.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">def measure_cardinality(a1, a2, a3):
  if type(a1) == int:
    return len(a2[a1])
  else:
    return len(a1)
</xmp>

<p>To test the class and the function, we shall induce a decision tree using the specified measure.</p>

<p class="header"><a href="cb-measureattribute.py">cb-measureattribute.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.split = orange.TreeSplitConstructor_Attribute()
treeLearner.split.measure = MeasureAttribute_Cardinality(1)
tree = treeLearner(data)
orngTree.printModel(tree)
</xmp>

<p>There are three two-valued attributes and one three-valued attribute. If we set <code>moreIsBetter</code> to 1, as above, the attribute in the root of the tree will be the three-valued <code>age</code>, while the attributes for the rest of the tree are chosen at random. If we set it to 0, the attribute <code>age</code> is used only when the values of all the remaining attributes have been checked.</p>

<p>To use the function <code>measure_cardinality</code> we don't need to wrap it into anything. If we simply set</p>

<xmp class="code">treeLearner.split = orange.TreeSplitConstructor_Attribute()
treeLearner.split.measure = measure_cardinality
</xmp>

<p>the function is automatically wrapped.</p>


<H3>TransformValue</H3>
<index name="classes/TransformValue">

<P><A href="TransformValue.htm"><CODE>TransformValue</CODE></a> is a simple class whose call operator gets a <CODE>Value</CODE> and returns another (or the same) <CODE>Value</CODE>. An example of its use is given in the page about <a href="classifierFromVar.htm">classifiers from attribute</a>.</P>

<H3>TreeSplitConstructor</H3>
<index name="classes/TreeSplitConstructor">

<p>The usual tree split constructors choose an attribute on which the split is based and construct a <code>ClassifierFromVarFD</code> to return the chosen attribute's value. But split constructors are capable of much more. To demonstrate this, we will write a split constructor that constructs a split based on the values of two attributes, joined in a Cartesian product. We will utilize the <code>CartesianClassifier</code> that we've already written above.</p>

<p class="header"><a href="cb-splitconstructor.py">cb-splitconstructor.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class SplitConstructor_CartesianMeasure(orange.TreeSplitConstructor):
  def __init__(self, measure):
    self.measure = measure

  def __call__(self, gen, weightID, contingencies, apriori, candidates):
    attributes = gen.domain.attributes
    selectBest = orngMisc.BestOnTheFly(orngMisc.compare2_firstBigger)
    for var1, var2 in orange.SubsetsGenerator_constSize(2, attributes):
      if candidates[attributes.index(var1)] and candidates[attributes.index(var2)]:
        cc = CartesianClassifier(var1, var2)
        cc.classVar.getValueFrom = cc
        meas = self.measure(cc.classVar, gen)
        selectBest.candidate((meas, cc))

    if not selectBest.best:
      return None

    bestMeas, bestSelector = selectBest.winner()
    return (bestSelector, bestSelector.classVar.values, None, bestMeas)
</xmp>

<p>We again use the class <code>BestOnTheFly</code> from the <code>orngMisc</code> module. This time we need to give it a compare function that compares the first element of a tuple, <code>orngMisc.compare2_firstBigger</code>, since we will feed it tuples of (quality of the split, selector). The best selector and its quality are retrieved by the method <code>winner</code>.</p>

<p>The class <code>orange.SubsetsGenerator_constSize</code> is used to generate pairs of attributes. For each pair, we check that both attributes are among the candidates.</p>

430<p>Now comes the tricky business. We construct a <code>CartesianClassifier</code> to compute a Cartesian product of the two attributes. <code>CartesianClassifier</code>'s constructor prepares a new attribute, which is stored in its <code>classVar</code>. The quality of the split needs to be determined as the quality of this attribute, as measured by <code>self.measure</code> - "<code>meas = self.measure(cc.classVar, gen)</code>" does the job. But the problem is that given examples (<code>gen</code>) do not have the attribute <code>cc.classVar</code>.</p>

<p>Many of Orange's methods (though not all) behave like this: when asked to do something with an attribute that does not exist in the given domain, they try to compute its value from the attributes that are available. More precisely, the attribute needs to have a pointer to a classifier that is able to compute its value. In our case, we set the <code>cc.classVar</code>'s field <code>getValueFrom</code> to <code>cc</code>.</p>

<p>When <code>self.measure</code> notices that the attribute <code>cc.classVar</code> does not exist in the domain <code>gen.domain</code>, it will use <code>cc.classVar.getValueFrom</code> to compute its values on the fly.</p>

<p>If you don't understand the last few paragraphs, here is a short summary: using some magic, we construct a classifier that can be used as a split criterion (node's <code>branchSelector</code>), assess its quality and show it to <code>selectBest</code>. More about this is written in the documentation on <a href="Variable.htm">attribute descriptors</a>.</p>
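<p>Stripped of Orange's classes, <code>getValueFrom</code> is simply lazy computation of a derived attribute: when a value is not stored with the example, a stored classifier computes it. A minimal sketch of the pattern, with all names hypothetical and examples represented as dictionaries:</p>

```python
class DerivedAttribute:
    """An attribute that can compute its own value when it is not stored
    (a stand-in for the getValueFrom mechanism)."""
    def __init__(self, name, getValueFrom=None):
        self.name = name
        self.getValueFrom = getValueFrom   # callable: example -> value

    def value(self, example):
        # use the stored value if the example has one ...
        if self.name in example:
            return example[self.name]
        # ... otherwise compute it on the fly
        return self.getValueFrom(example)

# a Cartesian product of two attributes, in the spirit of CartesianClassifier
cartesian = DerivedAttribute(
    "prescription x astigmatic",
    getValueFrom=lambda ex: (ex["prescription"], ex["astigmatic"]))

example = {"prescription": "myope", "astigmatic": "yes"}
# cartesian.value(example) evaluates to ("myope", "yes")
```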

<p>When the loop ends, we return <code>None</code> if no splits were found (possibly because there were not enough candidates). Otherwise, we retrieve the winning quality and selector, and return an appropriate tuple consisting of</p>
<UL>
<LI>the <CODE>branchSelector</CODE>; a classifier that returns a value computed from the two used attributes;
<LI>descriptions of branches; values of the constructed attribute fit this purpose well;
<LI>numbers of examples in branches; we don't have these available, so we'll let the <CODE>TreeLearner</CODE> find them itself;
<LI>quality of the split; a measure will do;
<LI>the index of the spent attribute; we've spent two, but can return only a single number, so we act as if we've spent none - we simply omit the index.
</UL>

<p>The code can be tested with the following script.</p>

<p class="header"><a href="cb-splitconstructor.py">cb-splitconstructor.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.split = SplitConstructor_CartesianMeasure(orange.MeasureAttribute_gainRatio())
tree = treeLearner(data)
orngTree.printTxt(tree)
</xmp>

<H4>TreeStopCriteria</H4>
<index name="classes/TreeStopCriteria">

<p><code>TreeStopCriteria</code> is a simple class. Its arguments are examples, the id of the meta-attribute with weights (or 0 if the examples are not weighted) and a <code>DomainContingency</code>. The induction stops if <code>TreeStopCriteria</code> returns 1 (or anything representing "true" in Python). The class is peculiar in being the only non-abstract class whose call operator can be (really) overloaded. It is thus possible to call the inherited call operator - in fact, you should do so.</p>

<p>For a brief example, let us write a stopping criterion that calls the common stop criteria, but additionally stops the induction at random in 20% of cases.</p>

<p class="header">part of <a href="cb-stopcriteria.py">cb-stopcriteria.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">from random import randint

defStop = orange.TreeStopCriteria()
treeLearner = orange.TreeLearner()
treeLearner.stop = lambda e, w, c: defStop(e, w, c) or randint(1, 5)==1
</xmp>

<p>We've stored a default stop criterion in <code>defStop</code> to avoid constructing it at each call of our function. The whole stopping criterion is hidden in the lambda function, which stops when the default says so or when a random number between 1 and 5 equals 1.</p>
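<p>The test <code>randint(1, 5)==1</code> succeeds with probability 1/5, so the induction is additionally stopped in roughly 20% of the calls. A quick standalone check of that probability (illustrative only, not part of the Orange scripts):</p>

```python
from random import randint, seed

seed(42)                    # fixed seed so the experiment is repeatable
trials = 100000
stops = sum(1 for _ in range(trials) if randint(1, 5) == 1)
fraction = stops / float(trials)
# fraction comes out close to 0.2
```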

<p>To demonstrate a call of the inherited call operator, let us do the same thing by deriving a new class.</p>

<p class="header">part of <a href="cb-stopcriteria.py">cb-stopcriteria.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class StoppingCriterion_random(orange.TreeStopCriteria):
  def __call__(self, gen, weightID, contingency):
    return orange.TreeStopCriteria.__call__(self, gen, weightID, contingency) \
           or randint(1, 5)==1

treeLearner.stop = StoppingCriterion_random()
</xmp>


<H4>TreeExampleSplitter</H4>
<index name="classes/TreeExampleSplitter">

<p>The example splitter's task is to split a list of examples (usually an <code>ExamplePointerTable</code> or <code>ExampleTable</code>) into subsets and return them as an <code>ExampleGeneratorList</code>. The arguments it gets are a <code>TreeNode</code> (it will need at least the <code>branchSelector</code>; some splitters also use <code>branchSizes</code>), a list of examples and the id of the meta-attribute with example weights (or 0, if they are not weighted).</p>

<p>If some examples are split among the branches so that only a part of an example belongs to each branch, the splitter should construct new weight meta-attributes and fill them with example weights. A list of weight IDs should then be returned in a tuple together with the <code>ExampleGeneratorList</code>. The exact mechanics of this are given on the page describing tree induction.</p>
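<p>Stripped of Orange's classes, a splitter that divides an example with an unknown value equally among the branches might look like this sketch, with examples represented as (values, weight) pairs; all names are hypothetical:</p>

```python
def split_examples(examples, attr_index, n_branches):
    """Distribute (values, weight) pairs into n_branches subsets; examples
    with an unknown value ('?') for the split attribute go to every branch
    with a proportionally reduced weight."""
    subsets = [[] for _ in range(n_branches)]
    for values, weight in examples:
        v = values[attr_index]
        if v == "?":
            # the example belongs partly to each branch
            for subset in subsets:
                subset.append((values, weight / float(n_branches)))
        else:
            subsets[v].append((values, weight))
    return subsets

examples = [((0, "a"), 1.0), ((1, "b"), 1.0), (("?", "c"), 1.0)]
subsets = split_examples(examples, 0, 2)
# each branch gets one whole example plus half of the unknown one;
# the total weight (3.0) is preserved
```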


<H4>TreeDescender</H4>
<index name="classes/TreeDescender">

<p>Descenders are about the trickiest components of Orange trees. They get two arguments, a starting node (not necessarily the tree root) and an example, and return a tuple with the finishing node and, optionally, a discrete distribution.</p>

<p>If there's no distribution, the <code>TreeClassifier</code> (which usually calls the descender) will use the returned node to classify the example. Thus the node's <code>nodeClassifier</code> will probably need to be defined (unless you've patched <code>TreeClassifier</code> or written your own version of it).</p>

<p>If a distribution is returned, the branches below the returned node will vote on the example's class, and the distribution represents the weights of the votes for the individual branches. Voting will require additional calls of the descender, but that's something the <code>TreeClassifier</code> needs to worry about.</p>

<p>The descender's real job is to decide what should happen when the descent halts because a branch for the example cannot be determined. It can either return the node (so it will be used to classify the example without looking any further), silently decide for some branch, or request a vote.</p>

<p>A general descender looks like this:</p>
<xmp class="code">class Descender(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        < do something >
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>

<p>Descenders descend until they reach a node with no <CODE>branchSelector</CODE> - a leaf. They call each node's <CODE>branchSelector</CODE> to find the branch to follow. If the value is defined, they check whether the node below is a null-node; if it is, they act as if the current node were a leaf.</p>

<p>Descenders differ in what they do when the branch index is unknown or out of range.</p>
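<p>The descent loop itself does not depend on Orange; here it is restated on a toy tree built from a small <code>Node</code> class (the names are illustrative, not Orange's):</p>

```python
class Node:
    def __init__(self, label, selector=None, branches=None):
        self.label = label          # class label predicted at this node
        self.selector = selector    # attribute tested here; None for a leaf
        self.branches = branches or []

def descend(node, example):
    """Follow branches until a leaf, or until the branch is undetermined."""
    while node.selector is not None:
        branch = example.get(node.selector)   # None plays the role of '?'
        if branch is None or branch >= len(node.branches):
            break                             # cannot decide: stop here
        node = node.branches[branch]
    return node

leaf_none, leaf_soft = Node("none"), Node("soft")
root = Node("?", selector="tear_rate", branches=[leaf_none, leaf_soft])

# a fully specified example reaches a leaf; one with the value
# missing halts at the root, as in the template above
```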

<p>In this section, we will suppose that the tree has already been induced (using, say, the default settings for <code>TreeLearner</code>) and stored in a <code>TreeClassifier</code> <code>tree</code>.</p>

<xmp class="code">>>> tree = orange.TreeLearner(data)
>>> orngTree.printTxt(tree)
tear_rate=reduced: none (100.00%)
tear_rate=normal
|    astigmatic=no
|    |    age=young: soft (100.00%)
|    |    age=pre-presbyopic: soft (100.00%)
|    |    age=presbyopic: none (50.00%)
|    astigmatic=yes
|    |    prescription=myope: hard (100.00%)
|    |    prescription=hypermetrope: none (66.67%)
</xmp>


<H5>Continuing the descent</H5>

<p>For the first exercise, we will implement a descender that decides for a random branch when the descent stops. The decision will be entirely random; it will ignore any probabilities that might be computed from <code>branchSizes</code> or from the values of the example's other attributes.</p>

<p class="header">part of <a href="cb-descender.py">cb-descender.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">from random import randint

class Descender_RandomBranch(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        branch = randint(0, len(node.branches)-1)
        print "Descender decides for ", branch
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>

<p>Everything goes according to the above template. When the <code>branchSelector</code> does not return a (valid) branch, we select a random branch (and print it out for debugging purposes).</p>

<p>To see how it works, we'll take the fourth example from the table and remove the value of the attribute needed at the root of the tree.</p>

<p class="header">part of <a href="cb-descender.py">cb-descender.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">>>> ex = orange.Example(data.domain, list(data[3]))
>>> ex[tree.tree.branchSelector.classVar] = "?"
>>> print ex
['young', 'myope', 'yes', '?', 'hard']
</xmp>

<p>We'll now tell the classifier to use our descender and classify the example; we'll call the classifier three times.</p>

<p class="header">part of <a href="cb-descender.py">cb-descender.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">>>> tree.descender = Descender_RandomBranch()
>>> for i in range(3):
...    print tree(ex)
Descender decides for  1
hard
Descender decides for  1
hard
Descender decides for  0
none
</xmp>

<p>When the descender decides for the second branch (branch 1), astigmatism and age are checked and the example is classified as "hard". When the descender takes the first branch (branch 0), the classifier returns "none".</p>

<H5>Voting</H5>

<p>Our next descender will request a vote. It will, however, disregard any known probabilities and assign random weights to the branches.</p>

<p class="header">part of <a href="cb-descender.py">cb-descender.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class Descender_RandomVote(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        votes = orange.DiscDistribution([randint(0, 100) for i in node.branches])
        votes.normalize()
        print "Weights:", votes
        return node, votes
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>

<p>In the first interesting line, we construct a discrete distribution with random integers between 0 and 100, one for each branch. We normalize it and return the current node together with the weights of the votes. It's as simple as that.</p>
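<p>Normalization merely rescales the weights so that they sum to one. Without Orange's <code>DiscDistribution</code>, the same computation reads as follows (a sketch with two fixed "random" weights):</p>

```python
raw = [34, 66]                 # e.g. one random integer weight per branch
total = float(sum(raw))
votes = [w / total for w in raw]
# votes is now [0.34, 0.66] and sums to one, like a normalized distribution
```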

<p>We'll check the descender on the same example as above.</p>

<xmp class="code">>>> tree.descender = Descender_RandomVote()
>>> print tree(ex, orange.GetProbabilities)
Decisions by random voting
Weights: <0.338, 0.662>
<0.338, 0.000, 0.662>
</xmp>

<p>The "Weights" line gives the weights of the branches - 0.338 for the first one and 0.662 for the second - which is reflected in the final answer.</p>
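<p>Given the branch weights, voting combines the class distributions of the subtrees as a weighted sum. A minimal sketch, with the numbers chosen to mirror the output above (the per-branch distributions are assumptions for illustration):</p>

```python
def weighted_vote(weights, distributions):
    """Combine per-branch class distributions using the given weights."""
    n_classes = len(distributions[0])
    return [sum(w * dist[i] for w, dist in zip(weights, distributions))
            for i in range(n_classes)]

weights = [0.338, 0.662]
# assume branch 0 predicts the first class with certainty
# and branch 1 the third class
branch_dists = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
final = weighted_vote(weights, branch_dists)
# final is [0.338, 0.0, 0.662]
```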

<H5>A Reporting Descender</H5>

<p>As the last example, here's a handy descender that prints out the descriptions of branches along the way. When the <code>branchSelector</code> does not return a (valid) branch, it simply returns the current node, as if it were a leaf (you can change this if you want to).</p>

<p class="header">part of <a href="cb-descender.py">cb-descender.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<xmp class="code">class Descender_Report(orange.TreeDescender):
  def __call__(self, node, example):
    print "Descent: root ",
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        break
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      print ">> %s = %s" % (
           node.branchSelector.classVar.name,
           node.branchDescriptions[int(branch)]),
      node = nextNode
    print
    return node
</xmp>

<p>We'll test it on the second example from the table (without removing any values).</p>

<xmp class="code">>>> tree.descender = Descender_Report()
>>> print "Classifying example", data[1]
Classifying example ['young', 'myope', 'no', 'normal', 'soft']
>>> print "----> %s" % tree(data[1])
Descent: root  >> tear_rate = normal >> astigmatic = no >> age = young ----> soft
</xmp>

<hr>

<FONT SIZE=-2>
<SUP><a href="#footnoteref1">1</a></SUP><a name="footnote1"></a>
Why "in principle"? The main reason is that the Python-to-Orange interface is so big that no one, at least not the principal authors of Orange, is ready to program and maintain another such interface. The other reason is that we've committed a small sin regarding independence: at a certain point we stopped developing our own garbage collection system and now use Python's instead. Gaining independence from Python would mean rewriting the garbage collection, which is something we'd prefer not to have to do.</FONT><P>

<FONT SIZE=-2>
<SUP><a href="#footnoteref2">2</a></SUP><a name="footnote2"></a>
For those who are familiar with C++ terms, but not with Python's API: each Python object has a pointer to a type description - the object that you get when calling Python's built-in function <code>type()</code>. The type description gives the name of the type, the memory size of its instances, several flags and pointers to the functions that the object provides. This is somewhat similar to a C++ table of virtual methods, except that here the methods are defined in advance and have fixed positions. For example, the <code>tp_compare</code> pointer points to the function that should be called when this object is to be compared to another; if it is <code>NULL</code>, the object does not support comparison.
</FONT><P>

<FONT SIZE=-2><a name="footnote3"></a>
<SUP><a href="#footnoteref3">3</a></SUP><code>md</code> has two tables of virtual methods (<code>vfptr</code>), one for Python and one for C++. When Python calls it, it uses the <code>__call__</code> you defined; when C++ calls it, it calls the function defined in C++.
</FONT><P>

<FONT SIZE=-2><SUP><a href="#footnoteref4">4</a></SUP><a name="footnote4"></a>
There's a case when the intermediate class is revealed, <code>TreeStopCriteria_Python</code>; this is needed because the class <code>TreeStopCriteria</code> is not abstract, but we won't discuss the details here.
</FONT>
</body>
</html>