source: orange/Orange/doc/reference/callbacks.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 44.2 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
<index name="callbacks to Python">
<index name="subtyping Orange's classes in Python">
<h1>Subtyping Orange classes in Python</h1>
<p>This page describes how to subtype Orange's classes in Python and make the overloaded methods callable from C++ components.</p>
<p style="margin-bottom: 0mm">Since Orange has only an interface to Python but is otherwise independent from it, subtyping might sometimes not function as you would expect. If you subtype an Orange class in Python and overload some methods, the C++ code (say if you use your instance as a component for some Orange object) will call the C++ method and not the one you provided in Python.</p>
<P>Exceptions to that are: <CODE>Filter</CODE>, <CODE>Learner</CODE>, <CODE>Classifier</CODE>, <CODE>TreeSplitConstructor</CODE>, <CODE>TreeStopCriteria</CODE>, <CODE>TreeExampleSplitter</CODE>, <CODE>TreeDescender</CODE>, <CODE>MeasureAttribute</CODE>, <CODE>TransformValue</CODE>, <CODE>ExamplesDistance</CODE> and <CODE>ExamplesDistance_Constructor</CODE>. If you subtype one of these classes and overload its call operator, your operator will get called from the C++ code. If you subclass any other class or overload any other method, the C++ code won't know about it.</P>
<P>If your subclass won't get called from C++ but only from Python code, you can subclass anything you want and it will function as it should.</P>
<P>If you are satisfied with that, you can <a href="#examples">skip to the examples</a> on this page to learn how to subclass what you need. If you wonder why it has to be like this, read on.</P>
<H2>General Problem with Subtyping</H2>
<p><a name="footnoteref1"></a>Orange was first conceived as a C++ library of machine learning components. It was only after several years of development that Python was first used as a glue language. But even after being interfaced to Python, Orange still maintains its independence. It would be, in principle, possible to export Orange's components as, for example, COM objects that wouldn't require Python to run<SUP><a href="#footnote1">1</a></SUP>. Orange components are not aware that they are being called from Python. Moreover, they are not aware that they're being exposed to Python.</p>
<p>This becomes important when subtyping the components. Let's say you derived a Python class <code>MyDomain</code> from <code>Domain</code>. We would like to redefine the call operator, which is used to convert an example from another domain. Our operator uses the original <code>Domain</code> to convert the example but afterwards sets the class to unknown.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">class MyDomain(orange.Domain):
  def __call__(self, example):
    ex = orange.Domain.__call__(self, example)
    ex.setclass("?")
    return ex

md = MyDomain(data.domain)
</xmp>
<p><a name="footnoteref2"></a>Subtyping built-in classes in Python is technically complex. When you call <code>MyDomain</code>, the <code>Domain</code>'s constructor is called. It constructs a C++ instance of <code>Domain</code>, but when returning it to Python, it marks it as an instance of <code>MyDomain</code><SUP><a href="#footnote2">2</a></SUP>. So, the <code>md</code>'s memory representation is the same as that of ordinary instances of <code>orange.Domain</code>, but its type, as known to Python, is <code>MyDomain</code>. So when the Python interpreter calls <code>md</code>, it will treat it as <code>MyDomain</code> and correctly call the method we defined above.</P>
<xmp class="code">>>> print md(data[0])
['psby', 'hyper', 'y', 'normal', '?']
</xmp>
<p>Not so with C++. The C++ code knows nothing about Python wrappers, types, overloaded methods. For C++, <code>md</code> is an ordinary instance of <CODE>Domain</CODE>. So, what happens when C++ code tries to call it? To check this, we will convert an example in another way. We'll call the <code>Example</code>'s constructor. There are different arguments that can be given; if you provide a domain and an existing example, a new example will be constructed by converting the existing one into the specified domain. The domain will be called internally to perform the actual conversion.</p>
<xmp class="code">>>> print orange.Example(md, data[0])
['psby', 'hyper', 'y', 'normal', 'no']
</xmp>
<p><a name="footnoteref3"></a>The class is still 'no', not unknown.<SUP><a href="#footnote3">3</a></SUP></p>
<p>The obvious solution to the problem would be to make Orange's components "Python aware". This would be extremely slow: at each call, even at calls of <code>operator[]</code>, for example, C++ code would have to check whether the corresponding operator has been overloaded in Python.</P>
<p>The solution we preferred is efficient, yet limited. The following fragment is a fully functional filter, derived from <code>orange.Filter</code>.</p>
<xmp class="code">class FilterYoung(orange.Filter):
    def __call__(self, ex):
        return ex["age"]=="young"
</xmp>
<p>You can, for instance, use it as an argument for the <code>ExampleTable.filter</code> method:</p>
<xmp class="code">>>> fy = FilterYoung()
>>> for e in data.filter(fy):
...   print e
['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'normal', 'soft']
['young', 'myope', 'yes', 'reduced', 'none']
['young', 'myope', 'yes', 'normal', 'hard']
['young', 'hypermetrope', 'no', 'reduced', 'none']
['young', 'hypermetrope', 'no', 'normal', 'soft']
['young', 'hypermetrope', 'yes', 'reduced', 'none']
['young', 'hypermetrope', 'yes', 'normal', 'hard']
</xmp>
<p><a name="footnoteref4"></a><code>orange.Filter</code> is an abstract class. You cannot construct it, e.g. by calling <code>Filter()</code>. But when the <code>Filter</code>'s constructor is called to construct an instance of <code>FilterYoung</code>, such as <code>fy</code>, it constructs an instance of a special callback class, <code>Filter_Python</code> (although Orange doesn't admit it - when returning it to Python it says it's a <code>FilterYoung</code>, and that its base class is <code>Filter</code><SUP><a href="#footnote4">4</a></SUP>).</p>
<xmp class="code">>>> type(FilterYoung())
<class '__main__.FilterYoung'>
>>> type(FilterYoung()).__base__
<type 'Filter'>
</xmp>
<p><code>Filter_Python</code>'s call operator (written in C++ and thus seen and respected in C++ code) calls back the overloaded <code>__call__</code> written in Python.</p>
<p>If it works for <code>orange.Filter</code>, why does it fail on <code>orange.Domain</code>? It has not been programmed. Only call operators of a few classes can do this: <code>Filter</code>, <code>Learner</code>, <code>Classifier</code>, <code>TreeSplitConstructor</code>, <code>TreeStopCriteria</code>, <code>TreeExampleSplitter</code>, <code>TreeDescender</code>, <code>MeasureAttribute</code>, <code>TransformValue</code>, <code>ExamplesDistance</code> and <code>ExamplesDistance_Constructor</code>. Why only those? A rough estimate is that making all the methods of the existing 300 Orange classes overloadable would triple the size of Orange's source code. On the other hand, the chosen classes are those that are most likely to be overloaded. For others, you can either find another solution or ask us to make them overloadable as well. Adding this functionality to a single method of a single class is a small undertaking.</P>
<p>You might be tempted to overload a call operator of a class that is derived from <code>Filter</code>. For instance:</p>
<xmp class="code">class MyFilter(orange.Filter_index):
   ...
</xmp>
<p>This will fail just as for <code>Domain</code>. The class <code>Filter_Python</code> is derived from <code>Filter</code> and can only inherit its functionality. In the previous example, <code>Filter_Python</code> was a hidden class between <code>FilterYoung</code> and <code>orange.Filter</code>. In this example, you would need a class between <code>MyFilter</code> and <code>orange.Filter_index</code>, and this role obviously cannot be played by <code>Filter_Python</code>.</P>
<p>How to do it then? The simplest way is to wrap a <code>Filter_index</code>, like this:</p>
<xmp class="code">class MyFilter(orange.Filter):
    def __init__(self):
        self.subfilter = orange.Filter_index()
        ...

    def __call__(self, ex):
        # here you can call self.subfilter(ex)
        # or do some processing of your own
        return self.subfilter(ex)
</xmp>
<p><code>MyFilter</code> is now derived from <code>orange.Filter</code> but still has its own copy of <code>Filter_index</code>. Not pretty, but it works.</p>
<p>There's another nice way to construct your own filters (and, in general, other overloadable components). Instead of deriving a new class, you seemingly construct an instance of the abstract class, giving a callback function as an argument.</p>
<xmp class="code">filt = orange.Filter(lambda ex:ex["age"]=="young")
for e in data.filter(filt):
    print e
</xmp>
<p>The <code>Filter</code>'s constructor is called directly, as if to construct an instance of <code>Filter</code>, which it would usually refuse (since the class is abstract). But when given a function as an argument (here we used a lambda function, but you can, of course, use ordinary functions as well), it constructs a <code>Filter_Python</code> and stores the given function in its dictionary. When <code>Filter_Python</code> is called, it calls the function passed as an argument to its constructor.</p>
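<p>The mechanism described above can be imitated in plain Python. The following is only a sketch and not Orange's implementation: the stand-in classes <code>Filter</code> and <code>FilterPython</code> are hypothetical, written in Python 3, and do not import Orange. Constructing the abstract class with a callable quietly yields an instance of the hidden callback class, which keeps the function in its dictionary and forwards calls to it.</p>

```python
class Filter:
    """Stand-in for an abstract component (hypothetical, not Orange's Filter)."""

    def __new__(cls, func=None):
        target = cls
        if cls is Filter:
            # The abstract class itself can only be "constructed" from a callable,
            # in which case the hidden callback class is instantiated instead.
            if not callable(func):
                raise TypeError("Filter is abstract; subclass it or pass a callable")
            target = FilterPython
        return object.__new__(target)

    def __init__(self, func=None):
        if func is not None:
            # The callback is stored in the instance dictionary.
            self.__dict__["callback"] = func

    def __call__(self, example):
        raise NotImplementedError("abstract call operator")


class FilterPython(Filter):
    """Hidden class whose call operator forwards to the stored function."""

    def __call__(self, example):
        return bool(self.callback(example))


filt = Filter(lambda ex: ex["age"] == "young")
rows = [{"age": "young"}, {"age": "presbyopic"}]
selected = [ex for ex in rows if filt(ex)]
```

<p>In this sketch <code>filt</code> is reported as a <code>Filter</code> by <code>isinstance</code>, although its actual type is the callback class; Orange hides even that detail.</p>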
<p>There's another twist. Sometimes you don't need to wrap the function into a class at all. You can, for example, construct a tree's <CODE>nodeLearner</CODE> on the fly. <CODE>nodeLearner</CODE> should be derived from <CODE>Learner</CODE>, but you can assign it a callback function.</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.nodeLearner = lambda gen, weightID: orange.MajorityLearner(gen, weightID)
</xmp>
<p>This replaces <code>nodeLearner</code> with a learner that calls <code>orange.MajorityLearner</code>. The example is artificial: <code>MajorityLearner</code> can be given directly, with <code>treeLearner.nodeLearner = orange.MajorityLearner</code>, and it is the default anyway. Still, how can we assign a Python function to <code>nodeLearner</code>, which can only hold a pure C++ <CODE>Learner</CODE> object and not a Python function? And how can we then expect <code>TreeLearner</code> (written in C++) to call it? Checking what the above snippet actually stored to <CODE>treeLearner.nodeLearner</CODE> is revealing.</p>
<xmp class="code">>>> treeLearner.nodeLearner
<Learner instance at 0x019B1930>
</xmp>
<p>When Orange assigns a value to a component's attribute like <code>treeLearner.nodeLearner</code>, it tries to convert the given argument to the correct class, a <code>Learner</code> in this case. If the user actually gives an instance of (something derived from) <code>Learner</code>, that's great. Otherwise, Orange will call the <code>Learner</code>'s constructor with what was given as an argument. If the constructor can use this to construct an object, it's OK. The above assignment is thus equivalent to</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.nodeLearner = orange.Learner(lambda gen, weightID: orange.MajorityLearner(gen, weightID))
</xmp>
<p>which, as we know from the last example with <code>Filter</code>, works as intended.</P>
<p>You might have used this feature before without knowing it. Have you ever constructed an <code>EnumVariable</code> and assigned some values to its field <code>values</code>? The field <code>values</code> is stored as <code>orange.StringList</code>, a pure C++ <CODE>vector&lt;string&gt;</CODE>, not as the Python list of strings that you provided. The same thing as with <CODE>nodeLearner</CODE> happens here: since you tried to assign a Python list instead of a <CODE>StringList</CODE>, the <CODE>StringList</CODE>'s constructor was called with the Python list as an argument.</p>
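<p>The conversion on assignment can be pictured with a small Python descriptor. This is a sketch under assumptions, not Orange's code: <code>Coerced</code>, and the <code>StringList</code> and <code>EnumVariable</code> classes below, are hypothetical Python 3 stand-ins that only mimic the behaviour described above. A value of the wrong type is passed to the constructor of the expected type, while a value of the right type is stored as it is.</p>

```python
class StringList(list):
    """Stand-in for a typed container (hypothetical, not Orange's StringList)."""

    def __init__(self, items=()):
        super().__init__(str(item) for item in items)


class Coerced:
    """Descriptor that converts assigned values to `kind` through its constructor."""

    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        if not isinstance(value, self.kind):
            # Same effect as calling the constructor by hand before assigning.
            value = self.kind(value)
        setattr(obj, self.slot, value)


class EnumVariable:
    values = Coerced(StringList)

    def __init__(self, name):
        self.name = name
        self.values = []


var = EnumVariable("age")
var.values = ["young", "presbyopic"]   # a plain list goes in ...
stored_type = type(var.values)         # ... and a StringList is what is stored
```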
<p>A final piece of advice for using derived classes: don't wonder too much about how it works. Just be happy that it does. Use it, try things that you think might work, but be sure to check (a simple <code>print</code> here and there will suffice) that your call operators are actually called. That's all you need to care about.</p>
<H2>Calling Inherited Methods</H2>
<p>All classes for which you can (really) overload the call operator are abstract. The only exception is <CODE>TreeStopCriteria</CODE>, so this is the only class for which calling the inherited call operator makes sense. The examples section shows how to do it.</p>
<p>For all other classes, calling the inherited method is an error. Similarly, forgetting to define the call operator but trying to use it leads to a call of the inherited operator, which is again an error.</p>
<A name="examples"></a>
<p>The examples below suppose that you have loaded the 'lenses' data into a variable 'data'.</p>
<p>The examples are somewhat simplified. For instance, many classes below will silently assume that the attribute they deal with is discrete. This is to make the code clearer.</p>
<index name="classes/Filter">
<p>We've already shown how to derive filters. A filter is a simple object that decides whether a given example is "acceptable" or not. The class below accepts examples for which the value of "age" is "young".</p>
<xmp class="code">class FilterYoung(orange.Filter):
    def __call__(self, ex):
        return ex["age"]=="young"
</xmp>
<p><code>Filter</code> can be used, for instance, for selecting examples from an example table.</p>
<xmp class="code">>>> fy = FilterYoung()
>>> for e in data.filter(fy):
...   print e
['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'normal', 'soft']
['young', 'myope', 'yes', 'reduced', 'none']
['young', 'myope', 'yes', 'normal', 'hard']
['young', 'hypermetrope', 'no', 'reduced', 'none']
['young', 'hypermetrope', 'no', 'normal', 'soft']
['young', 'hypermetrope', 'yes', 'reduced', 'none']
['young', 'hypermetrope', 'yes', 'normal', 'hard']
</xmp>
<p>Note two things. First, you don't need to write your own filters to select examples based on values. You'd get the same effect by</p>
<xmp class="code">>>> for e in data.select(age="young"):
...   print e
</xmp>
<p>Second, you don't need to derive a class from a filter when a function would suffice. You can write either</p>
<xmp class="code">>>> def f(ex):
...   return ex["age"]=="young"
>>> for e in data.filter(orange.Filter(f)):
...    print e
</xmp>
<p>or, for cases as simple as this, squeeze the whole function into a lambda function</p>
<xmp class="code">>>> for e in data.filter(orange.Filter(lambda ex:ex["age"]=="young")):
...    print e
</xmp>
<index name="classes/Classifier">
<p>A "classifier" in Orange has a rather non-standard meaning. A classifier is an object with a call operator that gets an example and returns a value, a distribution of values or both - the return type is regulated by an optional second argument. Besides the standard use of classifiers - "class predictors" - this also covers predictors in regression, objects used in constructive induction (which use some of the example's attributes to compute a value of a new attribute), and others.</p>
<p>For this tutorial, we will define a classifier that can be used for simple constructive induction. Its constructor will accept two attributes and construct a new attribute as the Cartesian product of the two. Its name and the names of its values will be constructed from pairs of names for the original attributes. The call operator will return a value of the new attribute that corresponds to the values that the two attributes have on the example.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">class CartesianClassifier(orange.Classifier):
  def __init__(self, var1, var2):
    self.var1, self.var2 = var1, var2
    self.noValues2 = len(var2.values)
    self.classVar = orange.EnumVariable("%sx%s" % (var1.name, var2.name))
    self.classVar.values = ["%s-%s" % (v1, v2) \
                            for v1 in var1.values for v2 in var2.values]

  def __call__(self, ex, what = orange.Classifier.GetValue):
    val = ex[self.var1] * self.noValues2 + ex[self.var2]
    if what == orange.Classifier.GetValue:
      return orange.Value(self.classVar, val)
    probs = orange.DiscDistribution(self.classVar)
    probs[val] = 1.0
    if what == orange.Classifier.GetProbabilities:
      return probs
    else:
      return (orange.Value(self.classVar, val), probs)
</xmp>
<p>No surprises in the constructor, except for a trick for construction of <code>classVar.values</code>.</p>
<p>In the call operator, the first line uses an implicit conversion of values to integers. When <code>ex[self.var1]</code>, which is of type <code>orange.Value</code>, is multiplied by <code>noValues2</code>, which is an integer, the former is converted to an integer. The same happens at addition.</p>
<p><code>val</code> is the index of the value to be returned. What follows is the usual procedure for constructing a correct return type for a classifier - you will often do something very similar in your classifiers.</p>
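<p>The index arithmetic is easier to trust when checked in isolation. The sketch below is plain Python 3 without Orange; the value lists are illustrative and <code>product_index</code> is a hypothetical helper. It shows that <code>i1 * n2 + i2</code> addresses exactly the entry that the nested list comprehension produces for the pair of values.</p>

```python
values1 = ["young", "pre-presbyopic", "presbyopic"]   # illustrative values
values2 = ["myope", "hypermetrope"]

# Same ordering as the list comprehension used for classVar.values:
# the first attribute varies slowest, the second fastest.
product_values = ["%s-%s" % (v1, v2) for v1 in values1 for v2 in values2]


def product_index(i1, i2, n2=len(values2)):
    """Index of the pair (i1, i2) in the flattened Cartesian product."""
    return i1 * n2 + i2


# Every pair of indices maps to the matching combined name.
all_match = all(
    product_values[product_index(i1, i2)] == "%s-%s" % (v1, v2)
    for i1, v1 in enumerate(values1)
    for i2, v2 in enumerate(values2)
)
```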
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">>>> tt = CartesianClassifier(data.domain[0], data.domain[1])
>>> for i in range(6):
...     print "%s --> %s" % (data[i], tt(data[i]))
...
['young', 'myope', 'no', 'reduced', 'none'] --> young-myope
['young', 'myope', 'no', 'normal', 'soft'] --> young-myope
['young', 'myope', 'yes', 'reduced', 'none'] --> young-myope
['young', 'myope', 'yes', 'normal', 'hard'] --> young-myope
['young', 'hypermetrope', 'no', 'reduced', 'none'] --> young-hypermetrope
['young', 'hypermetrope', 'no', 'normal', 'soft'] --> young-hypermetrope
</xmp>
<index name="classes/Learner">
<p><code>ClassifierByLookupTable</code> is a classifier whose predictions are based on the value of a single attribute. It contains a simple table named <code>lookupTable</code> for conversion from attribute value to class prediction. The last element of the table is the value that is returned when the attribute value is unknown or out of range. Similarly, <code>distributions</code> is a list of distributions, used when <code>ClassifierByLookupTable</code> is used to predict a distribution.</p>
<p>Let us write a learner which chooses an attribute using a specified measure of quality and constructs a <code>ClassifierByLookupTable</code> that would use this single attribute for making predictions.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">import orange, orngMisc

class OneAttributeLearner(orange.Learner):
  def __init__(self, measure):
    self.measure = measure

  def __call__(self, gen, weightID=0):
    selectBest = orngMisc.BestOnTheFly()
    for attr in gen.domain.attributes:
      selectBest.candidate(self.measure(attr, gen, None, weightID))
    bestAttr = gen.domain.attributes[selectBest.winnerIndex()]
    classifier = orange.ClassifierByLookupTable(gen.domain.classVar, bestAttr)

    contingency = orange.ContingencyAttrClass(bestAttr, gen, weightID)
    for i in range(len(contingency)):
      classifier.lookupTable[i] = contingency[i].modus()
      classifier.distributions[i] = contingency[i]
    classifier.lookupTable[-1] = contingency.innerDistribution.modus()
    classifier.distributions[-1] = contingency.innerDistribution
    for d in classifier.distributions:
      d.normalize()

    return classifier
</xmp>
<p>The constructor stores the measure to be used for choosing the attribute. The call operator assesses the qualities of attributes and feeds them to <code>orngMisc.BestOnTheFly</code>. This is a simple class with a method <code>candidate</code>, to which we feed some objects, and <code>winnerIndex</code>, which tells the index of the greatest of the "candidates" (there's also a method <code>winner</code> that returns the winner itself, but we cannot use it here). The benefit of using <code>BestOnTheFly</code> is that it is fair: if there is more than one winner, it will return a random winner and not the first or the last (however, if you call <code>winnerIndex</code> repeatedly without adding any (winning) candidates, it will keep returning the same winner).</p>
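<p>One common way to get such fair tie-breaking in a single pass is to replace the current winner with probability 1/k when the k-th tied candidate arrives. The sketch below illustrates that idea in plain Python 3. The class <code>BestSoFar</code> and the helper <code>pick_winner</code> are hypothetical, and this is not claimed to be how <code>orngMisc.BestOnTheFly</code> is implemented.</p>

```python
import random


class BestSoFar:
    """Hypothetical helper: tracks the greatest candidate, breaking ties at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.best = None
        self.best_index = -1
        self.index = -1
        self.ties = 0

    def candidate(self, value):
        self.index += 1
        if self.best_index < 0 or value > self.best:
            # A strictly better candidate always takes over.
            self.best, self.best_index, self.ties = value, self.index, 1
        elif value == self.best:
            # The k-th tied candidate replaces the winner with probability 1/k,
            # which leaves every tied candidate equally likely to win.
            self.ties += 1
            if self.rng.randrange(self.ties) == 0:
                self.best_index = self.index

    def winner_index(self):
        # No randomness here: repeated calls return the same index.
        return self.best_index


def pick_winner(values, seed):
    chooser = BestSoFar(seed)
    for value in values:
        chooser.candidate(value)
    return chooser.winner_index()
```

<p>Because the random choice is made when a candidate is added and not when the winner is queried, asking for the winner twice gives the same answer, which matches the behaviour noted above.</p>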
<p>The chosen attribute is stored in <code>bestAttr</code>. A <code>ClassifierByLookupTable</code> is constructed next.</p>
<p>We then need to fill the <code>lookupTable</code> and <code>distributions</code>. For this, we construct a contingency matrix of type <code>ContingencyAttrClass</code> that has the given attribute as the outer and the class as the inner attribute. Thus, <code>contingency[i]</code> gives the distribution of classes for the i-th value of the attribute. We then iterate through the contingency to find the most probable class for each value of the attribute (obtained as the modus of the distribution). When predicting probabilities of classes, our classifier will return normalized distributions.</p>
<p>When the value of the attribute is unknown or out of range, it will return the most probable class and the apriori class distribution; this can be found as the inner distribution of the contingency.</p>
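<p>The table-filling step can be reproduced without Orange to see what ends up in each slot. The sketch below is plain Python 3 with made-up data; <code>build_lookup</code> is a hypothetical helper that uses <code>collections.Counter</code> in place of a contingency matrix. For each attribute value it stores the most frequent class and the normalized class distribution, and it appends one extra entry, computed over all examples, for unknown or out-of-range values.</p>

```python
from collections import Counter, defaultdict


def build_lookup(pairs, attr_values):
    """pairs: iterable of (attribute value, class) tuples."""
    per_value = defaultdict(Counter)   # plays the role of the contingency
    overall = Counter()                # plays the role of the inner distribution
    for attr_value, cls in pairs:
        per_value[attr_value][cls] += 1
        overall[cls] += 1

    def mode(counts):
        # Ties are resolved in favour of the class seen first.
        return max(counts, key=counts.get) if counts else None

    def normalized(counts):
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()} if total else {}

    # One row per attribute value, plus a final row for unknown values.
    rows = [per_value[v] for v in attr_values] + [overall]
    return [mode(r) for r in rows], [normalized(r) for r in rows]


pairs = [("reduced", "none")] * 3 + [("normal", "soft")] * 2 + [("normal", "hard")]
lookup, distributions = build_lookup(pairs, ["reduced", "normal"])
```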
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">>>> oal = OneAttributeLearner(orange.MeasureAttribute_gainRatio())
>>> c = oal(data)
>>> c.variable
EnumVariable 'tear_rate'
>>> c.variable.values
<reduced, normal>
>>> print c.lookupTable
<none, soft, none>
>>> print c.distributions
<<1.000, 0.000, 0.000>, <0.250, 0.417, 0.333>, <0.625, 0.208, 0.167>>
</xmp>
<p>When trained on the 'lenses' data, our learner chose the attribute 'tear_rate'. When its value is 'reduced', the predicted class is 'none' and the distribution shows that the classifier is pretty sure about it (100%). When the value is 'normal', the predicted class is 'soft' but with much less certainty (42%). When the value is unknown or out of range (for example, if the user adds some values), the classifier will predict class 'none' with 62.5% certainty.</p>
<H3>ExamplesDistance and ExamplesDistance_Constructor</H3>
<P><code>ExamplesDistance_Constructor</code> receives four arguments: an example generator, the meta id of weights, domain distributions (of type <code>DomainDistributions</code>) and basic attribute statistics (an instance of <code>DomainBasicAttrStat</code>). The latter two can be <code>None</code>; you should write your code so that it computes them itself from the examples if they are needed. The function should return an instance of <code>ExamplesDistance</code>.</P>
<P><code>ExamplesDistance</code> gets two examples and should return a number representing the distance between them.</P>
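<p>The division of labour between the two can be sketched as a function that returns a function. The code below is plain Python 3 and does not use Orange: <code>hamming_constructor</code> is a hypothetical stand-in that accepts the four arguments described above (ignoring the last two, which may be <code>None</code>) and returns a distance callable over examples given as plain lists. How such callables are wrapped for use inside Orange is not shown here.</p>

```python
def hamming_constructor(examples, weight_id=0, distributions=None, statistics=None):
    """Prepare a distance function from the data (hypothetical stand-in)."""
    # Anything that depends on the data is computed once, at construction time.
    n_attributes = len(examples[0]) if len(examples) else 0

    def distance(ex1, ex2):
        """Fraction of attributes on which the two examples differ."""
        if n_attributes == 0:
            return 0.0
        differing = sum(1 for a, b in zip(ex1, ex2) if a != b)
        return differing / n_attributes

    return distance


examples = [
    ["young", "myope", "no"],
    ["young", "hypermetrope", "yes"],
]
dist = hamming_constructor(examples)
d01 = dist(examples[0], examples[1])   # two of the three attributes differ
```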
<index name="classes/MeasureAttribute">
<p><code>MeasureAttribute</code> is slightly more complex since it can be given different sets of parameters. The class defines the way it will be called by setting the "needs" field (see documentation on <a href="MeasureAttribute.htm">attribute evaluation</a> for more details). (<b>Note: this has been changed from the mess we had in the past. Any existing code should still work; if it does not, it will need to be simplified.</b>)</p>
<DL class=attributes>
<DT>__call__(attributeNumber, domainContingency, aprioriProbabilities)</DT>
<DD>These arguments are sent if <code>needs</code> is set to <code>orange.MeasureAttribute.DomainContingency</code>. The data from which the attribute is to be evaluated is given by the contingency of all attributes in the dataset. The <CODE>attributeNumber</CODE> tells which of those attributes the function needs to evaluate. Finally, there are apriori class probabilities, if the method can make use of them; the third argument can sometimes be <CODE>None</CODE>.</DD>

<DT>__call__(contingencyMatrix, classDistribution, aprioriProbabilities)</DT>
<DD>In this form, which is used if <code>needs</code> equals <code>orange.MeasureAttribute.Contingency_Class</code>, you are given the contingency matrix for the attribute that is to be evaluated and a class distribution. In the context of decision tree induction, this is the class distribution in a node and the class distributions in branches if this attribute is chosen. The third argument again gives the apriori class distribution, and can sometimes be <CODE>None</CODE> if the apriori distribution is unknown.</DD>

<DT>__call__(attribute, examples, aprioriProbabilities, weightID)</DT>
<DD>This form is used if <code>needs</code> is <code>orange.MeasureAttribute.Generator</code>. The attribute can be given as an instance of <code>int</code> or of <code>Variable</code> - you might want to check the argument type before using it.</DD>
</DL>
<P>In all cases, the method must return a real number representing the quality of the attribute; higher numbers mean better attributes. If in your measure of quality higher values mean worse attributes, you can either negate or invert the number.</p>
<p>As an example, we will write a measure that is based on the cardinality of attributes. It will also have a flag by which the user decides whether attributes with higher or with lower cardinalities are preferred.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">class MeasureAttribute_Cardinality(orange.MeasureAttribute):
  def __init__(self, moreIsBetter = 1):
    self.moreIsBetter = moreIsBetter

  def __call__(self, a1, a2, a3):
    if type(a1) == int:
      attrNo, domainContingency, apriorClass = a1, a2, a3
      q = len(domainContingency[attrNo])
    else:
      contingency, classDistribution, apriorClass = a1, a2, a3
      q = len(contingency)

    if self.moreIsBetter:
      return q
    else:
      return -q
</xmp>
<p>Alternatively, we can write the measure in the form of a function, but without the flag. To make it shorter, we will skip the fancy renaming of parameters.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">def measure_cardinality(a1, a2, a3):
  if type(a1) == int:
    return len(a2[a1])
  else:
    return len(a1)
</xmp>
<p>To test the class and the function we shall induce a decision tree using the specified measure.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.split = orange.TreeSplitConstructor_Attribute()
treeLearner.split.measure = MeasureAttribute_Cardinality(1)
tree = treeLearner(data)
</xmp>
<p>There are three two-valued attributes and one three-valued attribute. If we set <code>moreIsBetter</code> to 1, as above, the attribute at the root of the tree would be the three-valued <code>age</code> while the attributes for the rest of the tree are chosen at random. If we set it to 0, the attribute <code>age</code> is used only when the values of all remaining attributes have been checked.</p>
<p>To use the function <code>measure_cardinality</code> we don't need to wrap it into anything. If we simply set</p>
<xmp class="code">treeLearner.split = orange.TreeSplitConstructor_Attribute()
treeLearner.split.measure = measure_cardinality
</xmp>
<p>the function is automatically wrapped.</p>
<index name="classes/TransformValue">
<P><A href="TransformValue.htm"><CODE>TransformValue</CODE></a> is a simple class whose call operator gets a <CODE>Value</CODE> and returns another (or the same) <CODE>Value</CODE>. An example of its use is given in the page about <a href="classifierFromVar.htm">classifiers from attribute</a>.</P>
<index name="classes/TreeSplitConstructor">
<p>The usual tree split constructor chooses an attribute on which the split is based and constructs a <code>ClassifierFromVarFD</code> to return the chosen attribute's value. Split constructors are capable of much more. To demonstrate this, we will write a split constructor that constructs a split based on values of two attributes, joined in a Cartesian product. We will utilize the <code>CartesianClassifier</code> that we've already written above.</p>
<p class="header"><a href=""></a>
(uses <a href=""></a>)</p>
<xmp class="code">class SplitConstructor_CartesianMeasure(orange.TreeSplitConstructor):
  def __init__(self, measure):
    self.measure = measure

  def __call__(self, gen, weightID, contingencies, apriori, candidates):
    # use the domain of the examples passed in, not a global table
    attributes = gen.domain.attributes
    selectBest = orngMisc.BestOnTheFly(orngMisc.compare2_firstBigger)
    for var1, var2 in orange.SubsetsGenerator_constSize(2, attributes):
      if candidates[attributes.index(var1)] and candidates[attributes.index(var2)]:
        cc = CartesianClassifier(var1, var2)
        cc.classVar.getValueFrom = cc
        meas = self.measure(cc.classVar, gen)
        selectBest.candidate((meas, cc))

    if not selectBest.best:
      return None

    bestMeas, bestSelector = selectBest.winner()
    return (bestSelector, bestSelector.classVar.values, None, bestMeas)
</xmp>
<p>We again use the class <code>BestOnTheFly</code> from the <code>orngMisc</code> module. This time we need to add a compare function that will compare the first element of the tuple, <code>orngMisc.compare2_firstBigger</code>, since we will feed it with tuples (a quality of the split, selector). The best selector and its quality are retrieved by the method <code>winner</code>.</p>
<p>Class <code>orange.SubsetsGenerator_constSize</code> is used to generate pairs of attributes. For each pair, we check that both attributes are among the candidates.</p>
<p>Now comes the tricky business. We construct a <code>CartesianClassifier</code> to compute a Cartesian product of the two attributes. <code>CartesianClassifier</code>'s constructor prepares a new attribute, which is stored in its <code>classVar</code>. The quality of the split needs to be determined as the quality of this attribute, as measured by <code>self.measure</code> - "<code>meas = self.measure(cc.classVar, gen)</code>" does the job. But the problem is that the given examples (<code>gen</code>) do not have the attribute <code>cc.classVar</code>.</p>
<p>Many, though not all, of Orange's methods act like this: when asked to do something with an attribute that does not exist in the given domain, they try to compute its value from the attributes that are available. More precisely, the attribute needs to have a pointer to a classifier that is able to compute its value. In our case, we set the <code>cc.classVar</code>'s field <code>getValueFrom</code> to <code>cc</code>.</p>
<p>When <code>self.measure</code> notices that the attribute <code>cc.classVar</code> does not exist in the domain <code>gen.domain</code>, it will use <code>cc.classVar.getValueFrom</code> to compute its values on the fly.</p>
<p>If you don't understand the last few paragraphs, here is a short summary: using some magic, we construct a classifier that can be used as a split criterion (node's <code>branchSelector</code>), assess its quality and show it to <code>selectBest</code>. More about this is written in the documentation on <a href="Variable.htm">attribute descriptors</a>.</p>
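<p>The "magic" amounts to a fallback when a value is looked up. The sketch below shows the idea in plain Python 3 with dictionaries standing in for examples; the <code>Variable</code> class, its <code>get_value_from</code> field and the function <code>value_of</code> are hypothetical simplifications, not Orange's API. If the example has no value for the variable, the variable's own classifier is asked to compute one from the values that are present.</p>

```python
class Variable:
    """Hypothetical attribute descriptor with an optional value-computing callable."""

    def __init__(self, name, get_value_from=None):
        self.name = name
        self.get_value_from = get_value_from


def value_of(example, variable):
    """Return the variable's value, computing it when the example lacks it."""
    if variable.name in example:
        return example[variable.name]
    if variable.get_value_from is None:
        raise KeyError(variable.name)
    # The variable is not part of the example: compute it on the fly.
    return variable.get_value_from(example)


age = Variable("age")
product = Variable(
    "agexprescription",
    get_value_from=lambda ex: "%s-%s" % (ex["age"], ex["prescription"]),
)

example = {"age": "young", "prescription": "myope"}
stored = value_of(example, age)         # read directly from the example
computed = value_of(example, product)   # derived from the other two values
```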
<p>When the loop ends, we return <code>None</code> if no splits were found (possibly because there were not enough candidates). Otherwise, we retrieve the winning quality and selector, and return an appropriate tuple consisting of</p>
<UL>
<LI>the <CODE>branchSelector</CODE>; a classifier that returns a value computed from the two used attributes;
<LI>descriptions of branches; values of the constructed attribute fit this purpose well;
<LI>numbers of examples in branches; we don't have this available so we'll let the <CODE>TreeLearner</CODE> find it itself;
<LI>quality of the split; a measure will do;
<LI>the index of the spent attribute; we've spent two, but can return only a single number, so we act as if we've spent none - we simply omit the index.
</UL>
<p>The code can be tested with the following script.</p>
449<p class="header"><a href=""></a>
450(uses <a href=""></a>)</p>
<xmp class="code">treeLearner = orange.TreeLearner()
treeLearner.split = SplitConstructor_CartesianMeasure(orange.MeasureAttribute_gainRatio())
tree = treeLearner(data)
</xmp>
458<index name="classes/TreeStopCriteria">
<p><code>TreeStopCriteria</code> is a simple class. Its arguments are the examples, the id of the meta-attribute with weights (or 0 if the examples are not weighted) and a <code>DomainContingency</code>. The induction stops if <code>TreeStopCriteria</code> returns 1 (or anything that Python treats as true). The class is peculiar in being the only non-abstract class whose call operator can (really) be overloaded. It is therefore possible to call the inherited call operator. Moreover, you should do so.</p>
<p>For a brief example, let us write a stop criterion that calls the common stop criteria and, in addition, stops the induction randomly in 20% of cases.</p>
464<p class="header">part of <a href=""></a>
465(uses <a href=""></a>)</p>
<xmp class="code">from random import randint

defStop = orange.TreeStopCriteria()
treeLearner = orange.TreeLearner()
treeLearner.stop = lambda e, w, c: defStop(e, w, c) or randint(1, 5)==1
</xmp>
473<p>We've defined a default stop criterion <code>defStop</code> to avoid constructing it at each call of our function. The whole stopping criterion is hidden in the lambda function which stops when the default says so or when the random number between 1 and 5 equals 1.</p>
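<p>The same composition can be tried outside Orange. The following is a minimal sketch in plain Python 3; <code>default_stop</code> and <code>make_random_stop</code> are hypothetical names, and the rule used by <code>default_stop</code> is only a placeholder for whatever the real default criterion checks.</p>

```python
import random


def default_stop(examples, weight_id, contingency):
    # Hypothetical stand-in for the default criterion: stop when fewer
    # than two examples remain. Orange's own rules are richer than this.
    return len(examples) < 2


def make_random_stop(base_stop, rng, one_in=5):
    """Wrap base_stop so that it additionally stops in 1 of one_in calls."""
    def stop(examples, weight_id, contingency):
        return (base_stop(examples, weight_id, contingency)
                or rng.randint(1, one_in) == 1)
    return stop


rng = random.Random(0)  # seeded only to make the sketch repeatable
stop = make_random_stop(default_stop, rng)

# The default criterion still fires on its own ...
always_stops = stop(["only one example"], 0, None)

# ... and roughly a fifth of the remaining calls stop at random.
calls = 10000
random_stops = sum(stop(["a", "b", "c"], 0, None) for _ in range(calls))
stop_rate = random_stops / calls
```

<p>Building the wrapper once, outside the function that is called for every node, mirrors the reason given above for constructing <code>defStop</code> in advance.</p>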
<p>To demonstrate a call to the inherited call operator, let us do the same thing by deriving a new class.</p>
477<p class="header">part of <a href=""></a>
478(uses <a href=""></a>)</p>
<xmp class="code">class StoppingCriterion_random(orange.TreeStopCriteria):
  def __call__(self, gen, weightID, contingency):
    return orange.TreeStopCriteria.__call__(self, gen, weightID, contingency) \
           or randint(1, 5)==1

treeLearner.stop = StoppingCriterion_random()
</xmp>
490<index name="classes/TreeExampleSplitter">
<p>An example splitter's task is to split a list of examples (usually an <code>ExamplePointerTable</code> or <code>ExampleTable</code>) into subsets and return them as an <code>ExampleGeneratorList</code>. The arguments it gets are a <code>TreeNode</code> (it will need at least <code>branchSelector</code>; some splitters also use <code>branchSizes</code>), a list of examples and the id of the meta-attribute with example weights (or 0 if they are not weighted).</p>
<p>If some examples are split among the branches so that only part of an example belongs to a branch, the splitter should construct new weight meta-attributes and fill them with example weights. A list of weight IDs should be returned in a tuple with the <code>ExampleGenerator</code> list. The exact mechanics of this are given on the page describing the tree induction.</p>
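<p>To make the idea of partial membership concrete, here is a sketch in plain Python 3 that is independent of Orange. It keeps weights in parallel lists instead of meta-attributes, and it divides an example whose branch is unknown among all branches in proportion to <code>branch_sizes</code>; both choices, and the name <code>split_with_weights</code>, are assumptions of this sketch rather than a description of Orange's splitters.</p>

```python
def split_with_weights(examples, weights, select_branch, branch_sizes):
    """Return one (examples, weights) pair per branch.

    select_branch(example) gives a branch index, or None when the branch
    cannot be determined for that example.
    """
    total = float(sum(branch_sizes))
    subsets = [([], []) for _ in branch_sizes]
    for example, weight in zip(examples, weights):
        branch = select_branch(example)
        if branch is None:
            # Unknown branch: a fraction of the example goes to every branch.
            for (branch_examples, branch_weights), size in zip(subsets, branch_sizes):
                branch_examples.append(example)
                branch_weights.append(weight * size / total)
        else:
            branch_examples, branch_weights = subsets[branch]
            branch_examples.append(example)
            branch_weights.append(weight)
    return subsets


examples = [{"x": 0}, {"x": 1}, {"x": None}]
subsets = split_with_weights(
    examples,
    weights=[1.0, 1.0, 1.0],
    select_branch=lambda ex: ex["x"],
    branch_sizes=[3, 1],
)
```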
498<index name="classes/TreeDescender">
500<p>Descenders are about the trickiest components of Orange trees. They get two arguments, a starting node (not necessarily a tree root) and an example, and return a tuple with the finishing node and, optionally, a discrete distribution.</p>
<p>If there's no distribution, the <code>TreeClassifier</code> (which usually calls the descender) will use the returned node to classify the example. Thus the node's <code>nodeClassifier</code> will probably need to be defined (unless you've patched a <code>TreeClassifier</code> or written your own version of it).</p>
<p>If the distribution is returned, branches below the returned node will vote on the class, and the distribution represents the weights of votes for individual branches. Voting will require additional calls of the descender, but that's something that a <code>TreeClassifier</code> needs to worry about.</p>
<p>The descender's real job is to decide what should happen when the descent halts because a branch for an example cannot be determined. It can either return the node (so it will be used to classify the example without looking any further), silently decide for some branch, or request a vote.</p>
<p>A general descender looks like this:</p>
<xmp class="code">class Descender_RandomBranch(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        < do something >
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>
522<p>Descenders descend until they reach a node with no <CODE>branchSelector</CODE> - a leaf. They call each node's <CODE>branchSelector</CODE> to find the branch to follow. If the value is defined, they check whether the node below is a null-node. If this is so, they act as if the current node is a leaf.</p>
524<p>Descenders differ in what they do when the branch index is unknown or out of range.</p>
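<p>The template can be exercised without Orange by replacing the tree with a few plain Python 3 objects. In the sketch below, <code>Node</code>, <code>descend</code>, <code>stop_here</code> and <code>take_first</code> are hypothetical names; the policy passed as <code>on_unknown</code> stands for the "do something" step of the template.</p>

```python
class Node:
    """Hypothetical stand-in for a tree node (not Orange's TreeNode)."""

    def __init__(self, label, select_branch=None, branches=()):
        self.label = label
        self.select_branch = select_branch  # None marks a leaf
        self.branches = list(branches)      # child nodes; None is a null node


def descend(node, example, on_unknown):
    """Follow branches; ask on_unknown when no valid branch can be chosen."""
    while node.select_branch is not None:
        branch = node.select_branch(example)
        if branch is None or not 0 <= branch < len(node.branches):
            branch = on_unknown(node, example)
            if branch is None:   # the policy says: stop at this node
                break
        next_node = node.branches[branch]
        if next_node is None:    # null node: treat the current node as a leaf
            break
        node = next_node
    return node


def stop_here(node, example):
    """Policy: use the current node as if it were a leaf."""
    return None


def take_first(node, example):
    """Policy: silently decide for the first branch."""
    return 0


VALUES = ["reduced", "normal"]


def by_tear_rate(example):
    """Return the branch index for tear_rate, or None if it is unknown."""
    value = example.get("tear_rate")
    return VALUES.index(value) if value in VALUES else None


root = Node("root", select_branch=by_tear_rate,
            branches=[Node("none"), Node("hard")])
```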
526<p>In this section, we will suppose that the tree has already been induced (using, say, default settings for <code>TreeLearner</code>) and stored in a <code>TreeClassifier</code> <code>tree</code>.</p>
<xmp class="code">>>> tree = orange.TreeLearner(data)
>>> orngTree.printTxt(tree)
tear_rate=reduced: none (100.00%)
tear_rate=normal
|    astigmatic=no
|    |    age=young: soft (100.00%)
|    |    age=pre-presbyopic: soft (100.00%)
|    |    age=presbyopic: none (50.00%)
|    astigmatic=yes
|    |    prescription=myope: hard (100.00%)
|    |    prescription=hypermetrope: none (66.67%)
</xmp>
542<H5>Continuing the descent</H5>
<p>For the first exercise, we will implement a descender that picks a random branch when the descent stops. The decision is purely random; it ignores any probabilities that might be computed from <code>branchSizes</code> or from the values of the example's other attributes.</p>
546<p class="header">part of <a href=""></a>
547(uses <a href=""></a>)</p>
<xmp class="code">class Descender_RandomBranch(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        # no valid branch: pick one at random and report the choice
        branch = randint(0, len(node.branches)-1)
        print "Descender decides for ", branch
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>
563<p>Everything goes according to the above template. When the <code>branchSelector</code> does not return a (valid) branch, we select a random branch (and print it out for debugging purposes).</p>
<p>To see how it works, we'll take the fourth example from the table (<code>data[3]</code>) and remove the value of the attribute needed at the root of the tree.</p>
567<p class="header">part of <a href=""></a>
568(uses <a href=""></a>)</p>
<xmp class="code">>>> ex = orange.Example(data.domain, list(data[3]))
>>> ex[tree.tree.branchSelector.classVar] = "?"
>>> print ex
['young', 'myope', 'yes', '?', 'hard']
</xmp>
<p>We'll now tell the classifier to use our descender and classify the example; we'll call the classifier three times.</p>
577<p class="header">part of <a href=""></a>
578(uses <a href=""></a>)</p>
<xmp class="code">>>> tree.descender = Descender_RandomBranch()
>>> for i in range(3):
...    print tree(ex)
Descender decides for  1
hard
Descender decides for  1
hard
Descender decides for  0
none
</xmp>
<p>When the descender decides for the second branch (branch 1), astigmatism and prescription are checked and the example is classified as "hard". When the descender takes the first branch (0), the classifier returns "none".</p>
594<p>Our next descender will request a vote. It will, however, disregard any known probabilities and assign random weights to the branches.</p>
596<p class="header">part of <a href=""></a>
597(uses <a href=""></a>)</p>
<xmp class="code">class Descender_RandomVote(orange.TreeDescender):
  def __call__(self, node, example):
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        votes = orange.DiscDistribution([randint(0, 100) for i in node.branches])
        votes.normalize()
        print "Weights:", votes
        return node, votes
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      node = nextNode
    return node
</xmp>
614<p>In the first interesting line we construct a discrete distribution with random integers between 0 and 100, one for each branch. We normalize it and return the current node and the weights of votes. It's as simple as that.</p>
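<p>The weight construction on its own looks as follows in plain Python 3, without <code>orange.DiscDistribution</code>. The function name <code>random_vote_weights</code> is hypothetical, and the handling of the case where every draw is zero is a choice made for this sketch.</p>

```python
import random


def random_vote_weights(n_branches, rng):
    """Draw one random integer per branch and scale the weights to sum to 1."""
    raw = [rng.randint(0, 100) for _ in range(n_branches)]
    total = sum(raw)
    if total == 0:
        # All draws were zero; fall back to equal weights (an assumption
        # of this sketch, the text does not cover this case).
        return [1.0 / n_branches] * n_branches
    return [value / total for value in raw]


weights = random_vote_weights(2, random.Random(1))
```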
616<p>We'll check the descender on the same example as above.</p>
<xmp class="code">>>> tree.descender = Descender_RandomVote()
>>> print tree(ex, orange.GetProbabilities)
Decisions by random voting
Weights: <0.338, 0.662>
<0.338, 0.000, 0.662>
</xmp>
<p>The first output line gives the weights of the branches, 0.338 for the first one and 0.662 for the second, which is reflected in the final answer.</p>
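<p>The arithmetic behind the final answer can be reproduced with a short sketch in plain Python 3. It assumes that each branch answers with a class distribution and that the classes are ordered none, soft, hard, which is consistent with the output shown but is not stated on this page; <code>combine_votes</code> is a hypothetical helper, not part of Orange.</p>

```python
def combine_votes(branch_weights, branch_distributions):
    """Weighted sum of the class distributions returned by the branches."""
    n_classes = len(branch_distributions[0])
    return [
        sum(weight * dist[c]
            for weight, dist in zip(branch_weights, branch_distributions))
        for c in range(n_classes)
    ]


# Assumed class order: none, soft, hard.
reduced_branch = [1.0, 0.0, 0.0]   # the first branch answers "none"
normal_branch = [0.0, 0.0, 1.0]    # the second branch ends in "hard"

answer = combine_votes([0.338, 0.662], [reduced_branch, normal_branch])
```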
627<H5>A Reporting Descender</H5>
<p>As the last example, here's a handy descender that prints out the descriptions of branches on the way. When <code>branchSelector</code> does not return a (valid) branch, it simply returns the current node, as if it were a leaf (you can change this if you want to).</p>
631<p class="header">part of <a href=""></a>
632(uses <a href=""></a>)</p>
<xmp class="code">class Descender_Report(orange.TreeDescender):
  def __call__(self, node, example):
    print "Descent: root ",
    while node.branchSelector:
      branch = node.branchSelector(example)
      if branch.isSpecial() or int(branch)>=len(node.branches):
        break
      nextNode = node.branches[int(branch)]
      if not nextNode:
        break
      print ">> %s = %s" % (
           node.branchSelector.classVar.name,
           node.branchDescriptions[int(branch)]),
      node = nextNode
    print
    return node
</xmp>
<p>We'll test it on the second example from the table (without removing any values).</p>

<xmp class="code">>>> tree.descender = Descender_Report()
>>> print "Classifying example", data[1]
Classifying example ['young', 'myope', 'no', 'normal', 'soft']
>>> print "----> %s" % tree(data[1])
Descent: root  >> tear_rate = normal >> astigmatic = no >> age = young ----> soft
</xmp>
662<FONT SIZE=-2>
663<SUP><a href="#footnoteref1">1</a></SUP><a name="footnote1"></a>
Why "in principle"? The main reason is that the Python-to-Orange interface is so big that no one, at least not the principal authors of Orange, is ready to program and maintain another such interface. The other reason is that we've committed a small sin regarding independence; at a certain point we stopped developing our own garbage collection system and now use Python's instead. Gaining independence from Python would mean rewriting the garbage collection, which is something we'd prefer not to have to do.</FONT><P>
666<FONT SIZE=-2>
667<SUP><a href="#footnoteref2">2</a></SUP><a name="footnote2"></a>
For those who are familiar with C++ terms, but not with Python's API: each Python object has a pointer to a type description, the object that you get when calling Python's built-in function <code>type()</code>. The type description gives the name of the type, the memory size of its instances, several flags and pointers to functions that the object provides. This is somewhat similar to a C++ list of virtual methods, except that here the methods are defined in advance and have fixed positions. For example, the <code>tp_cmp</code> pointer points to the function that should be called when this object is to be compared to another; if it is <code>NULL</code>, the object does not support comparison.
671<FONT SIZE=-2><a name="footnote3"></a>
<SUP><a href="#footnoteref3">3</a></SUP><code>md</code> has two tables of virtual methods (<code>vfptr</code>), one for Python and one for C++. When Python calls it, it uses the <code>__call__</code> you defined; when C++ calls it, it calls the function defined in C++.
675<FONT SIZE=-2><SUP><a href="#footnoteref4">4</a></SUP><a name="footnote4"></a>
676There's a case when the intermediate class is revealed, a <code>TreeStopCriteria_Python</code>; this is needed because the class <code>TreeStopCriteria</code> is not abstract, but we won't discuss the details here.