source: orange/Orange/doc/reference/Example.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 19.9 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print></LINK>
</HEAD>

<BODY>
<h1>Example</h1>
<index name="classes/Example">

<P><CODE>orange.Example</CODE> holds a single example: a list of attribute values, together with some auxiliary data. In Python, an example looks like a list and resembles an ordinary Python list as closely as possible. There are, however, differences: each example corresponds to a domain, so the number of attributes ("list elements") and their types are always as the domain prescribes.</P>

<p class=section>Attributes</P>

<DL class=attributes>
<DT>domain <span class=normalfont>(read-only)</SPAN></DT><DD>Each example corresponds to a <A href="Domain.htm">domain</A>. This field is set at construction time and cannot be modified.</DD>

<DT>name</DT>
<DD>An example can be assigned a name. Orange's methods do not use it; it is provided for use in user interfaces.</DD>
</DL>

<P>Unlike other Python and Orange objects, examples cannot be assigned arbitrary attributes (here, "attribute" is meant in the Python sense of an attribute of the object, not an attribute of the data). For instance, if <CODE>ex</CODE> is an example, <CODE>ex.xxx=12</CODE> yields an error.</P>


<H2>Construction</H2>

<P>To construct a new example, you first need a <A href="Domain.htm">domain description</A>. You can construct one yourself or load it from a file (which usually also contains some examples). For the sake of simplicity, we shall load the domain of the "lenses" dataset.</P>

<p class="header">part of <a href="example.py">example.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<XMP class="code">>>> import orange
>>> data = orange.ExampleTable("lenses")
>>> domain = data.domain
>>> for attr in domain:
...    print attr.name, attr.values
...
age <young, pre-presbyopic, presbyopic>
prescription <myope, hypermetrope>
astigmatic <no, yes>
tear_rate <reduced, normal>
lenses <none, soft, hard>
</XMP>

<DL class=attributes>
<DT>orange.Example(domain)</DT>
<DD>This basic constructor creates an example with all values unknown. Setting them is one of the subjects of this page.</DD>

<DT>orange.Example(domain, list-of-values)</DT>
<DD>This constructs an initialized example. The list can contain anything that can be converted to a value (see the documentation on <A href="Value.htm"><CODE>orange.Value</CODE></A>) and must have the appropriate length, with one value for each attribute.

<XMP class="code">>>> ex = orange.Example(domain, ["young", "myope", "yes", "reduced", "soft"])
>>> print ex
['young', 'myope', 'yes', 'reduced', 'soft']
>>> ex = orange.Example(domain, ["young", 0, 1,
...                     orange.Value(domain[3], "reduced"), "soft"])
>>> print ex
['young', 'myope', 'yes', 'reduced', 'soft']
</XMP>

<P>The first example was constructed from values given as strings, which is what you will usually do; continuous values can, naturally, be given as numbers (or as strings, if you prefer). The second example shows the alternatives: the second and the third values are given by indices, and for the fourth we constructed an <CODE>orange.Value</CODE> ourselves (which <CODE>orange</CODE> would have done for us automatically had we just passed a string).
</DD>

<DT>orange.Example(example)</DT>
<DD>This clones an example: the new example is an exact copy of the original.</DD>

<DT>orange.Example(domain, example)</DT>
<DD>This form of the constructor converts an example from another domain.

<XMP class="code">>>> reduced_dom = orange.Domain(["age", "lenses"], domain)
>>> reduced_ex = orange.Example(reduced_dom, ex)
>>> print reduced_ex
['young', 'soft']
</XMP>

<P>If <CODE>domain</CODE> is the same as the original example's domain, this constructor is equivalent to the previous one.</p>
</DD>

<DT>orange.Example(domain, list-of-examples)</DT>
<DD>This constructor is similar to the converting constructor, <CODE>orange.Example(domain, example)</CODE>, which fills the example with values obtained from another example, except that it takes the values from multiple examples. The needed values are sought among the ordinary attributes and the registered meta-attributes of the corresponding domains. Meta-attributes that appear in the given examples but do not appear in the new example, either as ordinary or as meta-attributes, are copied as well.</P>

<P>We shall demonstrate the function on the datasets <a href="merge1.tab">merge1.tab</a> and <A href="merge2.tab">merge2.tab</A>; the first has attributes <CODE>a1</CODE> and <CODE>a2</CODE> and meta-attributes <CODE>m1</CODE> and <CODE>m2</CODE>, while the second has attributes <CODE>a1</CODE> and <CODE>a3</CODE> and meta-attributes <CODE>m1</CODE> and <CODE>m3</CODE>.

<p class="header"><a href="example_merge.py">example_merge.py</a>  (uses <a href="merge1.tab">merge1.tab</a>, <A href="merge2.tab">merge2.tab</A>)</p>
<XMP class="code">import orange

data1 = orange.ExampleTable("merge1")
data2 = orange.ExampleTable("merge2", use = data1.domain)

a1, a2 = data1.domain.attributes
a1, a3 = data2.domain.attributes

# getmetas() maps meta-ids to descriptors; collect the (id, descriptor)
# pairs by name, since the order of dictionary items is arbitrary
metas = dict([(var.name, (mid, var)) for mid, var in data1.domain.getmetas().items()])
m1, m2 = metas["m1"], metas["m2"]

n1 = orange.FloatVariable("n1")
n2 = orange.FloatVariable("n2")

newdomain = orange.Domain([a1, a3, m1[1], n1])
newdomain.addmeta(m2[0], m2[1])    # m2 keeps its id from data1's domain
newdomain.addmeta(orange.newmetaid(), a2)
newdomain.addmeta(orange.newmetaid(), n2)

merge = orange.Example(newdomain, [data1[0], data2[0]])
print "First example: ", data1[0]
print "Second example: ", data2[0]
print "Merge: ", merge
</XMP>

<P>The <CODE>newdomain</CODE> consists of several attributes from <CODE>data1</CODE> and <CODE>data2</CODE>: <CODE>a1</CODE>, <CODE>a3</CODE> and <CODE>m1</CODE> are ordinary, and <CODE>m2</CODE> and <CODE>a2</CODE> are meta-attributes. Variables <CODE>m1</CODE> and <CODE>m2</CODE> are tuples of a meta-id and a descriptor (<A href="Variable.htm">Variable</A>). For this reason, <CODE>orange.Domain</CODE> is initialized with the descriptor <CODE>m1[1]</CODE>, while <CODE>m2</CODE> is added as a meta-attribute using both <CODE>m2[0]</CODE> and <CODE>m2[1]</CODE>, so that it has the same id in both domains. For the meta-attribute <CODE>a2</CODE>, which was originally ordinary, we obtain a new id.</P>

<P>In addition, <CODE>newdomain</CODE> has two new attributes, <CODE>n1</CODE> and <CODE>n2</CODE>, the first ordinary and the second a meta-attribute.</P>

<XMP class=code>First example:  [1, 2], {"m1":3, "m2":4}
Second example:  [1, 2.5], {"m1":3, "m3":4.5}
Merge:  [1, 2.5, 3, ?], {"a2":2, "m2":4, -5:4.50, "n2":?}
</XMP>

<P>Since attributes <CODE>a1</CODE> and <CODE>m1</CODE> appear in the domains of both original examples, the new example can only be constructed if their values match. They indeed do, and the merged example has values for all the attributes that could be found in the original examples (ordinary attributes <CODE>a1</CODE>, <CODE>a3</CODE> and <CODE>m1</CODE>, and meta-attributes <CODE>a2</CODE> and <CODE>m2</CODE>). In addition, it got the value of the meta-attribute <CODE>m3</CODE> from the second example; since <CODE>m3</CODE> is not registered with the new domain, it is identified only by its id (<CODE>-5</CODE> in the output above). The values of the two new attributes are left undefined.</P>
</DL>
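<P>The merging rules above can be illustrated without Orange. The following is a minimal pure-Python sketch, not Orange's implementation, under several assumptions: each source example is a pair of dictionaries (ordinary values and meta values) keyed by attribute name, <CODE>None</CODE> stands for an undefined value, unregistered meta values are keyed by name rather than by id, and an undefined value is taken not to conflict with a defined one. The function <CODE>merge_examples</CODE> is hypothetical.</P>

<XMP class="code">UNKNOWN = None  # stands in for Orange's undefined value, "?"


def merge_examples(new_domain, sources):
    """Hypothetical sketch of merging several examples into one.

    new_domain: names of all attributes of the new example
                (ordinary ones and registered meta-attributes)
    sources:    list of (values, metas) pairs of dicts, name -> value
    """
    merged = {}
    for name in new_domain:
        # collect the defined values of this attribute from all sources
        known = []
        for values, metas in sources:
            for part in (values, metas):
                if part.get(name, UNKNOWN) is not UNKNOWN:
                    known.append(part[name])
        # an attribute shared by several sources must have matching values
        for value in known[1:]:
            if value != known[0]:
                raise ValueError("conflicting values for %r" % name)
        merged[name] = known[0] if known else UNKNOWN
    # meta values that the new domain does not know about are copied too
    for values, metas in sources:
        for name, value in metas.items():
            if name not in merged:
                merged[name] = value
    return merged


first = ({"a1": 1, "a2": 2}, {"m1": 3, "m2": 4})
second = ({"a1": 1, "a3": 2.5}, {"m1": 3, "m3": 4.5})
new_domain = ["a1", "a3", "m1", "n1", "m2", "a2", "n2"]
merged = merge_examples(new_domain, [first, second])
print(sorted(merged.items()))
</XMP>

<P>With the values of the two examples shown above, <CODE>merged</CODE> gets a1, a3, m1, a2 and m2 from the sources, m3 copied from the second example, and no values for n1 and n2, mirroring the output of example_merge.py.</P>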


<H2>Basic methods</H2>

<P>Examples behave like lists in certain respects. You can address their values, and you can use <CODE>for</CODE> loops to iterate through an example's values (<CODE>for value in example: ...</CODE>). You can query an example's length, which equals the number of attributes, including the class attribute. You cannot, however, change the "length" of an example by inserting or removing attributes: the number and types of attributes are defined by the <CODE>domain</CODE>, which cannot be changed once the example is constructed. Finally, you can convert an example to an ordinary Python list.</P>

<P>Examples can be indexed by integer indices, attribute descriptors or attribute names. Since "age" is the first attribute in the lenses dataset, the statements below are equivalent.</P>

<XMP class="code">>>> age = data.domain["age"]
>>> example = data[0]
>>> print example[0]
young
>>> print example[age]
young
>>> print example["age"]
young
</XMP>
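<P>The list-like behaviour described above can be sketched in plain Python. The class <CODE>ListLikeExample</CODE> below is hypothetical and only mimics the documented behaviour under simplifying assumptions: the domain is reduced to a list of attribute names, values are plain strings, and indexing by attribute descriptors is left out. It shows indexing by position or by name, a fixed length, iteration, and conversion to an ordinary list.</P>

<XMP class="code">class ListLikeExample(object):
    """Hypothetical sketch of an example's list-like behaviour.

    The attribute names play the role of the domain: they fix the number
    and order of the values, which can be read and changed, but not
    inserted or removed (there is no append, insert or __delitem__).
    """

    def __init__(self, attribute_names, values):
        if len(values) != len(attribute_names):
            raise ValueError("expected exactly one value per attribute")
        self._names = list(attribute_names)
        self._values = list(values)

    def _position(self, key):
        # an integer is a position; anything else is looked up as a name
        if isinstance(key, int):
            if key < 0:
                raise IndexError("negative indices are reserved for meta values")
            return key
        return self._names.index(key)

    def __getitem__(self, key):
        return self._values[self._position(key)]

    def __setitem__(self, key, value):
        self._values[self._position(key)] = value

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def as_list(self):
        # a new, ordinary Python list; changing it leaves the example intact
        return list(self._values)


names = ["age", "prescription", "astigmatic", "tear_rate", "lenses"]
example = ListLikeExample(names, ["young", "myope", "no", "reduced", "none"])
print(example[0])
print(example["age"])
print(len(example))
</XMP>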

<P>An example's values can be modified. We shall increase the age (and reset it to 0 if it would become larger than 2).</P>

<XMP class="code">>>> example = data[0]
>>> print data[0]
['young', 'myope', 'no', 'reduced', 'none']
>>> example[age] = (int(example[age])+1) % 3
>>> print data[0]
['pre-presbyopic', 'myope', 'no', 'reduced', 'none']
</XMP>

<P>The lesson learned along the way is that <CODE>example = data[0]</CODE> does not give a fresh copy of the example but a reference to the first example in <CODE>data</CODE>. If you need a fresh copy, you have to clone the example, as explained above.</P>
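<P>The same aliasing happens with ordinary Python lists, which makes it easy to reproduce without Orange. In this sketch, plain lists of strings stand in for the table and its examples (an assumption; real examples are <CODE>orange.Example</CODE> objects), the age is advanced through a hypothetical list <CODE>AGE_VALUES</CODE> with the same wrap-around modulo 3, and <CODE>list(...)</CODE> plays the role of cloning with <CODE>orange.Example(data[0])</CODE>.</P>

<XMP class="code">AGE_VALUES = ["young", "pre-presbyopic", "presbyopic"]

# a "table" holding one "example"; both are plain lists here
data = [["young", "myope", "no", "reduced", "none"]]

example = data[0]                  # a reference to the first example, not a copy
index = AGE_VALUES.index(example[0])
example[0] = AGE_VALUES[(index + 1) % len(AGE_VALUES)]  # index 2 would wrap to 0
print(data[0])                     # the table sees the change

fresh = list(data[0])              # an independent copy, like cloning
fresh[0] = "presbyopic"
print(data[0])                     # unchanged by modifying the copy
</XMP>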

<P>The last value in the example is the class value. Do not access it as <CODE>example[-1]</CODE>, since negative indices are reserved for meta values; use <CODE>getclass</CODE> and <CODE>setclass</CODE> instead.</P>

<P class=section>Methods</P>
<DL class=attributes>
<DT>getclass()</DT>
<DD>Returns the example's class as <CODE>Value</CODE>.</DD>

<DT>setclass(value)</DT>
<DD>Sets the example's class. The argument can be a <CODE>Value</CODE>, a number or a string.</DD>

<DT>setvalue(value)</DT>
<DD>The argument <CODE>value</CODE> should be a qualified <CODE>orange.Value</CODE>, that is, its field <CODE>variable</CODE> must be defined, and <CODE>value.variable</CODE> must be one of the attributes in the example's domain (either ordinary or a registered meta-attribute). The function sets the value of that attribute to the given value; it is equivalent to <CODE>self[value.variable] = value</CODE>.</P>

<P>This function makes it easy to assign prescribed values to examples; see the section on meta values for an example.</P>
</DD>

<DT>native([nativity])</DT>
<DD>Converts the example into an ordinary Python list. If the optional argument is 1 (default), the list will contain objects of type <CODE>Value</CODE>; if it is 0, the list will contain native Python objects (strings for discrete and numbers for continuous attribute values).</DD>

<DT>compatible(other_example, ignoreClass = 0)</DT>
<DD>Returns true if the two examples are compatible, that is, if they have equal values of all attributes whose values are known in both examples. If the optional second argument is true, class values are ignored, so two examples can be compatible even if they differ in their class values.</DD>
</DL>
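<P>The rule used by <CODE>compatible</CODE> can be stated compactly in code. The function below is a hypothetical pure-Python sketch of that rule, not Orange's implementation; it assumes both examples are given as lists of values from the same domain, with <CODE>None</CODE> standing for an unknown value and the class value last.</P>

<XMP class="code">UNKNOWN = None  # stands in for an unknown value, "?"


def compatible(values1, values2, ignore_class=False):
    """Hypothetical sketch of the compatibility test.

    Two examples are compatible if they agree on every attribute
    whose value is known in both; the class value is the last one.
    """
    if ignore_class:
        values1, values2 = values1[:-1], values2[:-1]
    for v1, v2 in zip(values1, values2):
        if v1 is not UNKNOWN and v2 is not UNKNOWN and v1 != v2:
            return False
    return True


ex1 = ["young", "myope", UNKNOWN, "reduced", "none"]
ex2 = ["young", UNKNOWN, "no", "reduced", "soft"]
print(compatible(ex1, ex2))                     # False: the class values differ
print(compatible(ex1, ex2, ignore_class=True))  # True: the rest agrees
</XMP>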

<H2>Hashing</H2>

<P>The hash value of an example (accessible through Python's built-in function <CODE>hash</CODE>; see the Python documentation) is computed using CRC32. To some extent, it can also be used as a random number (this is done, for instance, by <a href="RandomClassifier.htm"><CODE>RandomClassifier</CODE></A>).</P>
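<P>As an illustration of a CRC32-based hash, the sketch below uses Python's <CODE>zlib.crc32</CODE>. How Orange serializes an example's values before computing CRC32 is not described here, so joining the values' string forms with tabs is an assumption, and the hypothetical <CODE>example_hash</CODE> will not reproduce Orange's actual hash values. The equally hypothetical <CODE>pseudo_random_choice</CODE> shows how such a hash can serve as a deterministic "random" number: the same example always yields the same choice.</P>

<XMP class="code">import zlib


def example_hash(values):
    """Hypothetical CRC32-based hash of a list of values.

    The serialization (tab-joined strings) is an assumption; Orange's
    own serialization is not reproduced.
    """
    text = "\t".join(str(value) for value in values)
    # mask to 32 bits so the result is the same non-negative number
    # under both Python 2 and Python 3
    return zlib.crc32(text.encode("utf-8")) & 0xffffffff


def pseudo_random_choice(values, n):
    """Deterministically pick one of n alternatives from the hash."""
    return example_hash(values) % n


ex = ["young", "myope", "no", "reduced", "none"]
print(example_hash(ex))
print(pseudo_random_choice(ex, 3))
</XMP>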

<index name="meta attributes">
<H2>Meta Values</H2>

<P>Examples in Orange are described by a fixed number of values of fixed types, defined by the domain descriptor. There is, however, a way to attach additional attributes to examples. Such attributes (we call them meta-attributes) are not used for learning, but can carry additional information, such as a patient's name or the number of times the example was misclassified during some test procedure. The most common additional information is the example's weight. To make things even more complex, we have already encountered problems in which each example needed more than one weight.</P>

<P>In contrast to ordinary attributes, examples from the same domain (or even the same <CODE>ExampleTable</CODE>) can have different numbers of meta values. Ordinary attributes are addressed by position (e.g. <CODE>example[0]</CODE> is the first and <CODE>example[4]</CODE> the fifth value in the example). Meta-attributes are addressed by ids; ids are actually negative integers, but you should think of them as "keys". An example can have any number of meta values with distinct ids. The domain descriptor can know about them, but does not need to.</P>

<P>Ids are "created" by the function <CODE>orange.newmetaid()</CODE>. (The function uses a very elaborate procedure for generating unique negative integers; the procedure might reveal itself only to the brightest, if they make a few calls to the function and carefully observe the returned values.) So, to assign meta values to examples, you first obtain an id from <CODE>orange.newmetaid()</CODE>; afterwards, you can use it on any examples you want.</P>
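<P>The bookkeeping can be pictured with plain Python data structures: ordinary values in a list addressed by position, meta values in a dictionary keyed by negative ids. The function <CODE>new_meta_id</CODE> below is a hypothetical stand-in for <CODE>orange.newmetaid()</CODE>; it only guarantees unique negative integers and makes no claim about the numbers Orange actually returns. The sketch uses one id on several examples and lets two examples of the same domain carry different numbers of meta values.</P>

<XMP class="code">import itertools

# hypothetical stand-in for orange.newmetaid(): yields -1, -2, -3, ...
_next_id = itertools.count(-1, -1)


def new_meta_id():
    return next(_next_id)


# each example: (ordinary values addressed by position,
#                meta values addressed by id)
examples = [
    (["young", "myope", "no", "reduced", "none"], {}),
    (["presbyopic", "myope", "yes", "normal", "hard"], {}),
]

weight_id = new_meta_id()
for values, metas in examples:
    metas[weight_id] = 1.0                  # the same id works on any example

note_id = new_meta_id()
examples[1][1][note_id] = "double-checked"  # only the second example gets it

for values, metas in examples:
    print(len(metas))
</XMP>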

<P>If there is a particular <a href="Variable.htm">attribute</a> associated with the meta value, you can also pass the attribute as an argument to <code>orange.newmetaid</code>. If the attribute has already been registered with some id, that id can be reused. Doing so is recommended, but not necessary.</P>

<P>Meta values can also be loaded from files in tab-delimited or Excel format. In this case, you only need to know the names of the corresponding meta-attributes; ids and the rest are taken care of while loading the data. See the documentation on <a href="fileformats.htm">file formats</A>.</P>

<P>Most often, you will use an id for assigning weights: you assign a number to each example (it can be greater or smaller than one; most algorithms even tolerate negative weights) and pass the id to the learning algorithm. Let us do this with random weights.</P>

<p class="header"><a href="example2.py">example2.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<XMP class="code">>>> import orange, random
>>> random.seed(0)
>>> data = orange.ExampleTable("lenses")
>>> id = orange.newmetaid()
>>> for example in data:
...     example[id] = random.random()
...
>>> print data[0]
['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84}
</XMP>

<P>The example now consists of two parts: the ordinary attributes, which resemble a list since they are addressed by position (e.g. the first value is "young"), and the meta values, which are more like a dictionary, where the id (-2) is a key and 0.84 is a value (of type <CODE>orange.Value</CODE>, like all values in an <CODE>Example</CODE>).

<P>To make a learner aware of the weights, you only need to pass the id as an additional argument. For instance, to train a Bayesian classifier on our randomly weighted examples, you would call</P>

<XMP class="code">>>> bayes = orange.BayesLearner(data, id)
</XMP>

<P>Many other functions accept weights in a similar fashion.</P>

<XMP class="code">>>> print orange.getClassDistribution(data)
<15.000, 5.000, 4.000>
>>> print orange.getClassDistribution(data, id)
<9.691, 3.232, 1.969>
</XMP>
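<P>What a weighted class distribution computes can be shown in a few lines of plain Python: each example contributes its weight, instead of 1, to the count of its class. The function <CODE>class_distribution</CODE> below is a hypothetical sketch of that idea, not Orange's <CODE>getClassDistribution</CODE>; it assumes the class values and weights are already extracted into lists, and treats a missing weight list like a weight id of 0, that is, every example has weight 1.0.</P>

<XMP class="code">def class_distribution(classes, weights=None):
    """Sum the weights of examples per class value (hypothetical sketch).

    classes: list of class values, one per example
    weights: list of weights, or None, in which case every example
             counts as 1.0 (the analogue of a weight id of 0)
    """
    if weights is None:
        weights = [1.0] * len(classes)
    distribution = {}
    for cls, weight in zip(classes, weights):
        distribution[cls] = distribution.get(cls, 0.0) + weight
    return distribution


classes = ["none", "soft", "none", "hard"]
print(class_distribution(classes))
print(class_distribution(classes, [0.5, 2.0, 0.25, 1.0]))
</XMP>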

<P>This system also easily accommodates examples that have different weights for different procedures in the same experimental setup: each kind of weight simply gets its own id.</P>

<P>As mentioned in the documentation on <A href="Domain.htm"><CODE>orange.Domain</CODE></A>, you can enhance the output by registering an attribute descriptor for the meta-attribute with id -2 in the example's domain.</P>

<XMP class="code">>>> w = orange.FloatVariable("w")
>>> data.domain.addmeta(id, w)
</XMP>

<P>The meta-attribute can now be indexed just like any other attribute:</P>

<XMP class="code">>>> print data[0][id]
0.844422
>>> print data[0][w]
0.844422
>>> print data[0]["w"]
0.844422
</XMP>

<P>A more important consequence of registering an attribute with the domain is that it enables automatic value conversion.</P>

<P>Let us add a nominal meta-attribute, which will tell whether the example has been double-checked by the domain expert. The attribute's values will be "yes" and "no".</P>

<p class="header">part of <a href="example3.py">example3.py</a>
(uses <a href="lenses.tab">lenses.tab</a>)</p>
<XMP class="code">>>> ok = orange.EnumVariable("ok?", values=["no", "yes"])
>>> ok_id = orange.newmetaid()
>>> data[0].setmeta(ok_id, "yes")
Traceback (most recent call last):
  File "C:\PROGRA~1\python22\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    exec codeObject in __main__.__dict__
  File "D:\ai\OrangeTest\reference-misc\example3.py", line 23, in ?
    data[0].setmeta(ok_id, "yes")
TypeError: cannot convert 'yes' to a value of an unknown attribute
</XMP>

<P>This cannot work, since we have not told Orange that <CODE>ok_id</CODE> corresponds to the attribute <CODE>ok</CODE>, so it cannot convert the string "yes" to an <CODE>orange.Value</CODE>. You have to perform the conversion manually.</P>

<XMP class="code">>>> data[0][ok_id] = orange.Value(ok, "yes")
</XMP>

<P>However, if you register the meta-attribute with the <A href="Domain.htm">domain descriptor</A>, Orange can find the descriptor and perform the conversion itself.</P>

<XMP class="code">>>> data.domain.addmeta(ok_id, ok)
>>> data[0][ok_id] = "yes"
</XMP>
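<P>The role of registration can be mimicked in plain Python. In the sketch below, <CODE>EnumAttribute</CODE> and <CODE>set_meta</CODE> are hypothetical helpers, not Orange's API: the domain's registered meta-attributes are a dictionary from meta-id to attribute, a discrete value is stored as the index of its string (an assumption made for simplicity), and the id -3 is arbitrary. A string can only be converted once the id is registered, and after registration the id, the attribute and its name all lead to the same meta value.</P>

<XMP class="code">class EnumAttribute(object):
    """Hypothetical minimal discrete attribute: a name and its values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)


def set_meta(metas, registered, key, value):
    """Store a meta value, converting a string via a registered attribute.

    metas:      the example's meta values, meta-id -> value
    registered: the domain's registered meta-attributes, meta-id -> attribute
    key:        a meta-id, an attribute, or an attribute's name
    """
    meta_id, attribute = None, None
    for mid, attr in registered.items():
        if key is attr or key == attr.name or key == mid:
            meta_id, attribute = mid, attr
    if meta_id is None:
        if not isinstance(key, int):
            raise KeyError(key)   # names and attributes must be registered
        meta_id = key             # an unregistered id is still a valid key
    if isinstance(value, str):
        if attribute is None:
            raise TypeError("cannot convert %r to a value "
                            "of an unknown attribute" % value)
        value = attribute.values.index(value)
    metas[meta_id] = value


ok = EnumAttribute("ok?", ["no", "yes"])
ok_id = -3          # arbitrary; in Orange it would come from orange.newmetaid()
registered = {}     # nothing registered yet
metas = {}

try:
    set_meta(metas, registered, ok_id, "yes")
    converted = True
except TypeError:
    converted = False   # the analogue of the error shown above

registered[ok_id] = ok  # the analogue of data.domain.addmeta(ok_id, ok)
for key in (ok_id, ok, "ok?"):
    set_meta(metas, registered, key, "yes")
print(metas)
</XMP>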

<P>As before, you can index the example by the id <CODE>ok_id</CODE>, the attribute descriptor <CODE>ok</CODE> or the attribute's name <CODE>"ok?"</CODE>.</P>

<XMP class="code">>>> data[0][ok_id] = "yes"
>>> data[0][ok] = "yes"
>>> data[0]["ok?"] = "yes"
</XMP>

<P>It is even possible to use the meta-attribute with the <CODE>setvalue</CODE> function.</P>

<XMP class="code">>>> import random
>>> no_yes = [orange.Value(ok, "no"), orange.Value(ok, "yes")]
>>> for example in data:
...     example.setvalue(no_yes[random.randint(0, 1)])
...
</XMP>


<P class=section>Methods</P>
<DL class=attributes>
<DT>setmeta(value), getmeta()</DT>
<DD>Obsolete functions for setting and getting meta values.</DD>

<dt>getmetas([key-type]), getmetas(optional, [key-type])</dt>
<dd>Returns the example's meta values as a dictionary. The key type can be <code>int</code> (default), <code>str</code> or <code>orange.Variable</code>, and determines whether the keys in the dictionary will be meta-ids, attribute names or attribute descriptors. In the latter two cases, the function only returns the meta values that are registered in the domain (no descriptors or names are associated with the other values). In all cases, the dictionary contains only a copy of the values: changing the dictionary does not affect the example's meta values.</p>

<p>The argument <code>optional</code> tells the method to return only the optional or only the non-optional meta-attributes: only the attributes whose optional flag equals the given value are returned. If the argument is absent, both kinds of attributes are returned.</p>

<P>The code below prints the dictionary four times: with the default key type and with each of the three key types given explicitly.</P>

<p class="header">part of <a href="basket.py">basket.py</a>
(uses <a href="inquisition2.basket">inquisition2.basket</a>)</p>
<XMP class="code">import orange

data = orange.ExampleTable("inquisition2")

example = data[4]
print example.getmetas()
print example.getmetas(int)
print example.getmetas(str)
print example.getmetas(orange.Variable)
</XMP>
</dd>

<DT>hasmeta(id | attribute-descriptor | name)</DT>
<DD>Returns True if the example has a value of the given meta-attribute.</DD>

<DT>removemeta(id | attribute-descriptor | name)</DT>
<DD>Removes a meta value. To use an attribute descriptor or a name, the corresponding attribute must be registered.</DD>

<index name="example weights">

<DT>getweight(id | attribute-descriptor | name | None)</DT>
<DD>Returns the value of the specified meta-attribute. The value must be continuous; otherwise, an exception is raised. If the argument is zero or <CODE>None</CODE>, the function returns 1.0 (since a weight id of 0 normally means that the examples are not weighted).</P>

<P>If you are writing your own learner, you should always use this function to retrieve an example's weight: most functions in Orange that optionally accept weights interpret a weight id of 0 as "no weights", and this function takes care of that. In particular, never attempt to do this:
<XMP class=code>>>> weight = example[id]
</XMP>
<P>If the examples are not weighted, <CODE>id</CODE> will be zero and you will get the value of the first attribute instead.</P>
</DD>

<DT>setweight((id | attribute-descriptor | name) [, weight])</DT>
<DD>Sets a weight. If <CODE>id</CODE> is zero or <CODE>None</CODE>, nothing happens. <CODE>weight</CODE> must be a number; if omitted, the weight is set to 1.0.</DD>

<DT>removeweight(id | attribute-descriptor | name)</DT>
<DD>A simplified equivalent of <CODE>removemeta</CODE>. It does exactly the same thing, except that it accepts only an integer <CODE>id</CODE>. If <CODE>id</CODE> is zero or <CODE>None</CODE>, this function does nothing.</DD>
</DL>
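<P>The special treatment of a zero or <CODE>None</CODE> weight id can be summarized in code. The functions <CODE>get_weight</CODE> and <CODE>set_weight</CODE> below are hypothetical pure-Python sketches of the rules described for <CODE>getweight</CODE> and <CODE>setweight</CODE>, operating on a plain dictionary of meta values; Orange's own error types and its handling of a missing meta value are not assumed. The last lines show why indexing with a zero id is a mistake.</P>

<XMP class="code">def get_weight(metas, weight_id):
    """Sketch of the getweight() rule: id 0 or None means 'not weighted'."""
    if not weight_id:                      # 0 or None
        return 1.0
    value = metas[weight_id]               # a missing value raises KeyError here
    if not isinstance(value, (int, float)):
        raise TypeError("a weight must be continuous")
    return float(value)


def set_weight(metas, weight_id, weight=1.0):
    """Sketch of the setweight() rule: id 0 or None is silently ignored."""
    if not weight_id:
        return
    metas[weight_id] = float(weight)


values = ["young", "myope", "no", "reduced", "none"]
metas = {}
set_weight(metas, -2, 0.5)
set_weight(metas, 0, 3.0)                  # no effect: examples are unweighted
print(get_weight(metas, -2))
print(get_weight(metas, 0))

# the mistake warned against above: with an id of 0, plain indexing
# returns the first ordinary value instead of a weight
weight_id = 0
print(values[weight_id])
</XMP>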

</BODY>
</html>