source: orange/Orange/doc/reference/peculiarities.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 13.3 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

Line 
1<html><HEAD>
2<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
3<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
4</HEAD>
5<body>
6
7<h1>Orange Peculiarities</h1>
8
9<p>Orange has several features that will seem peculiar to an experienced Python programmer. Orange has been conceived as a machine library system that, "incidentally", has an interface to Python, and is tailored to suit the machine learning community. The other reason for differences with typical Python modules is that we have not always followed novelties in Python, but we have added several extensions of our own.</p>
10
11<hr>
12
13<h2>Schizophrenic Constructors</h2>
14
15<p>In ordinary Python classes, constructors of class <code>A</code> construct instances of class <code>A</code>. It is hard to imagine otherwise. One of Python's PEPs, however, briefly mentions that it is possible for a built-in class to return an instance of another type. Orange classes are implemented as built-in classes and exploit this possibility to offer a shortcut for construction of certain objects.</p>
16
17<p>The typical example for this are learners and classifiers. Basically, learners are objects that can be called with learning examples as an argument, and return a classifier. Classifiers, in turn, are object that can be called with an example and predict its class (or probabilities for the classes). For instance, if <code>data</code> contains learning examples, we can construct a Bayesian learner and classifier like this:<p>
18
19<xmp class="code">learner = orange.BayesLearner()
20classifier = learner(data)
21</xmp>
22
23<p>Everything is normal here. <code>orange.BayesLearner()</code> returns an instance of <code>orange.BayesLearner</code>. Many users would, however, prefer to do to it like this:</p>
24
25<xmp class="code">classifier = orange.BayesLearner()(data)
26</xmp>
27
28<p>The learner is constructed and immediately called. In order to avoid extra parentheses, <code>orange.BayesLearner</code>'s constructor can accept data, call learning and return a classifier:</code>
29
30<xmp class="code">classifier = orange.BayesLearner(data)
31</xmp>
32
33<p>The same can be done with most classes whose main purpose is construction of another class through the call operator (in the same way as <code>orange.BayesLearner</code>'s main function is construction of <code>orange.BayesClassifier</code>). If the constructor is passed the arguments that would be passed to the call, the constructor will execute the call itself and returned the corresponding instance.</p>
34
35<p>Is there a way to program similar constructors in Python? Yes, to some extent. The trick is to have a function instead of a class. Here's a skeleton of the Python code that would simulate the <code>BayesLearner</code>'s behaviour.</p>
36
37<xmp class="code">def BayesLearner(data = None, weightId = 0):
38  learner = BayesLearnerClass()
39  if data:
40    return learner(data, weightId)
41  else:
42    return learner
43
44class BayesLearnerClass:
45  def __init__(self):
46    # here comes initialization
47
48  def __call__(self, data, weightId=0):
49    # here comes learning, in the end, you can
50    # return an instance of BayesClassifierClass
51
52class BayesClassifierClass:
53    # ... and so on
54</xmp>
55
56<p>The difference between this and what Orange offers is that <code>type(BayesLearner)</code> is <code>&lt;type 'function'&gt;</code>, while for Orange classes <code>type(orange.BayesLearner)</code> is not a function but a class type.
57
58
59
60<h2>Passing Attributes to Constructors</h2>
61
62<p>Another peculiarity of constructors is that they accept values of attributes to be set. Essentially, everything that is passed as a keyword argument is copied to the instances dictionary, just as if constructors were defined like this.</p>
63
64<xmp class="code">class MyClass:
65  def __init__(self, <positional arguments>, **args):
66    self.__dict__.update(args)
67    <...>
68</xmp>
69
70<p>This will significantly reduce the length of your scripts. Instead of</p>
71
72<xmp class="code">treelearner = orange.TreeLearner()
73treelearner.storeExamples = 1
74tree = treelearner(data)
75</xmp>
76
77<p>you can - using this trick and the one from the previous section, simply say</p>
78
79<xmp class="code">tree = orange.TreeLearner(data, storeExamples = 1)
80</xmp>
81
82<p>The constructor will construct an <code>orange.TreeLearner</code>, set its attribute <code>storeExamples</code>, call it with the <code>data</code> and return the resulting tree classifier.</p>
83
84<P><B>Important note:</B> Previously, Orange also accepted key arguments to calls, which were stored as attribute values by call operators. This was removed for several reasons. Most importantly, statements like <CODE>filtered_data = filter(data, negate=1)</CODE> were sometimes understood as if negate was set to 1 only for the duration of the call, while in fact it was stored and remained in effect for the subsequent calls. The other reason is that Python now uses this syntax for naming positional arguments (this wasn't so when we began developing Orange).</P>
85
86<P>Passing attribute values in calls was seldom used, so we felt we can eliminate this functionality without major inconvenience. Passing attribute values now raises an exception, telling that the call doesn't support keyword arguments; if you happen to see this in your code, please move setting the attribute above the call.</P>
87
88<P>There are a two cases where this functionality was handy: setting 'negate' in filters and specifying probabilities or number of folds in descendants of <CODE>MakeRandomIndices</CODE>. In these two places we therefore retained it, but setting the attribute is now effective only for the duration of the call. This is the only normal expected behaviour; there may be some who would have problems because of this incompatibility (we hope to avoid it by properly advertising it), but we believe that many more would have problems if we kept the counterintuitive version that we had before.
89
90Passing attribute values to constructors is not ambiguous and is very handy, so it will remain supported.
91
92
93<h2>Type Checking for Built-in Attributes</h2>
94
95<p>Instances of Orange's classes have two kinds of attributes. Orange's classes are written in C++, and most of their fields are exported to Python where they are seen as instances' attributes. These attributes are "built-in" and are stored as native C++ objects (as opposed to the attributes added by user, which are a matter of the following section). Thus, <code>orange.Filter_index</code>'s attribute <code>value</code>, which poses as an integer in Python, is stored as an ordinary C++ <code>int</code>, and <code>orange.Variable</code>'s <code>name</code> is internally an STL's <code>string</code>.</p>
96
97<p>Python has no type-checking. If a Python class expects certain attribute to be of certain type, this is not checked when the attribute value is set. Potential errors are discovered only when the attribute is used. Not so in Orange. Since built-in attributes are stored as C++ structures, they need to be converted while set (or read). Passing a value of wrong type is discovered immediately. If <code>data</code> is an <code>orange.ExampleTable</code>, setting its <code>domain</code> to a wrong type will yield an error.</p>
98
99<xmp class="code">>>> data.domain=2
100Traceback (most recent call last):
101  File "<interactive input>", line 1, in ?
102TypeError: invalid parameter type for 'ExampleTable.domain', (expected 'Domain', got 'int')
103</xmp>
104
105
106<h2>Warnings While Setting Attributes</h2>
107
108<p>Python classes do not have a fixed set of attributes. You can create new attributes by assignment.</p>
109
110<xmp class="code">>>> class A:
111...   pass
112...
113>>> a = A()
114>>> a.x = 12
115</xmp>
116
117<P>The ability to add attributes as needed is a really neat feature of Python, which enables extensibility and reusability unparalleled by most (if not all) other object-oriented languages. Orange supports this as well and it can be put to good use, for instance, in classification trees, where a testing procedure can use additional attributes to count the number of misclassifications for each node.</P>
118
119<P>As described in the previous section, Orange's classes are written in C++, and a rather complex machinery is needed to support adding new attributes. Allowing this, however, can cause nasty bugs in Python scripts for Orange. Orange's classes have many attributes with long names and it is quite easy to misspell them. For instance, if you want a Bayesian classifier to use m-estimate for probability, you need to set its <code>unconditionalEstimatorConstructor</code>. Misspell it and you will create a new attribute with your, misspelled variation of the original name. The learner will not see it and will use the default value instead.</p>
120
121<p>Due to many such errors and relatively little need for the user-defined attributes, Orange will trigger a warnings when such attributes are created.</p>
122
123<xmp class="code">>>> b=orange.BayesLearner()
124>>> b.g=12
125__main__:1: AttributeWarning: 'g' is not a builtin attribute of 'BayesLearner'
126</xmp>
127
128<p>This, on the other hand, can be annoying when setting a new attribute is not a mistake. There are two ways to avoid the unwanted warnings. One is to utilize standard Python's warning filters.</p>
129
130<xmp class="code">warnings.filterwarnings("ignore", "", orange.AttributeWarning)
131</xmp>
132
133<p>This is rather dangerous practice; it is much safer to disable warnings that occur in particular module and/or with a particular orange class or module. The following disables warnings in <CODE>orng</CODE> modules.</p>
134
135<xmp class="code">warnings.filterwarnings("ignore", "", orange.AttributeWarning, "orng.*")
136</xmp>
137
138<P>The complete documentation on module <code>warning</code> is included in Python's distribution.</p>
139
140<P>The best way to use new attributes but without disabling the otherwise beneficial warnings is not to set them the usual way but through a function <CODE>setattr</CODE>.</P>
141
142<XMP class="code">>>> b.setattr("i", 15)
143</XMP>
144
145<P>This has the exact same effect as <CODE>b.i = 15</CODE>, only without warnings.</P>
146
147<h2>Orange Lists</h2>
148
149<p>Orange has many objects that behave like lists, but aren't ordinary lists. The most widely seen objects of this kind are <code>orange.ExampleTable</code>, <code>orange.Example</code> and <code>orange.Domain</code>. There is nothing really special about them, but we have seen that many users tend to have problems with them. At certain points, they are reluctant to do typical list-like operations of them. For example, to iterate through the attributes of the <code>data.domain</code>, many write</p>
150
151<xmp class="code">for i in range(len(data.domain)):
152  attr = data.domain[i]
153  ...
154</xmp>
155
156<p>while it is obviously more elegant to use <code>data.domain</code> as any other list</p>
157
158<xmp class="code">for attr in data.domain:
159  ...
160</xmp>
161
162<p>Similarly, to print out the first ten examples of an <code>orange.ExampleTable</code>, you can use slicing.</p>
163
164<xmp class="code">for example in data[:10]:
165  print example
166</xmp>
167
168<p>On the other hand, these are not ordinary Python lists and do not necessarily support all operations supported by Python lists. Sorting, for instance, is typically unsupported as it makes no sense or would violate object's consistency.</p>
169
170<p>There are two efficient ways to convert an Orange list to an ordinary Python list. One is to instruct Python to create a list from your object:</p>
171
172<xmp class="code">listofexamples = list(data)
173</xmp>
174
175<p>This is equivalent to</p>
176
177<xmp class="code">listofexamples = [example for example in data]
178</xmp>
179
180<p>or (if you are not familiar with list comprehensions)</p>
181
182<xmp class="code">listofexmaples = []
183for example in data:
184  listofexamples.append(example)
185</xmp>
186
187<p>The other is to use the method <code>native</code>. This method is not offered by all list-like object (but is also implemented for some non list-like objects). Method <code>native</code> can have an argument specifying the level of conversion. The <code>orange.ExampleTable</code>'s method <code>native</code> can return a list of instances of <code>orange.Example</code> (as <code>list</code>, described above). If so requested, examples themselves can be converted to lists of values. Values, in turn, can be <code>orange.Value</code> or converted to Python integers and strings. Arguments to method <code>native</code> are specific upon the class.</p>
188
189
190<h2>A Few Things You Might Miss</h2>
191
192<h3>Pickle and Deep-copy</h3>
193<index name="pickling Orange objects">
194<index name="deep-copying Orange objects">
195
196Orange classes cannot be pickled (yet). This also prevents deep-copying, which works through pickling. Shallow copying can sometimes be achieved by calling constructor with the object that is to be copied as an argument. Deep-copying is, by our experience and by experience of other users, seldom needed and can be circumvented. Pickling, in contrast, would be very useful.
197
198<h3>Named Arguments</h3>
199
200<p>Orange does not support (yet) named arguments to functions. In Python, function arguments can be given by position or by keywords.</p>
201
202<xmp class="code">>>> def f(a, b):
203...   return a-b
204...
205>>> f(7, 5)
2062
207>>> f(b=5, a=7)
2082
209</xmp>
210
211<p>Orange functions do not accept the second way of passing arguments. The main reason for this is that we started to write interface between Orange and Python when keyword passing of arguments was not supported in Python yet. Supporting it now would require a lot of effort on our side. Since Orange's function generally have only a few arguments, keyword-style calling is not really needed, so solving this problem is rather at the bottom of the priority list.</p>
212
213</body> </html>
Note: See TracBrowser for help on using the repository browser.