source: orange/Orange/doc/reference/Variable.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 28.8 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print></LINK>
</HEAD>

<BODY>
<h1>Attribute Descriptors</h1>
<index name="attributes">
<index name="attribute types">

<P>Attribute descriptors are stored in objects derived from type
<CODE>orange.Variable</CODE>. Their role is to identify the
attributes: two attributes in Orange are the same if they have the
same descriptor, not merely the same name. Descriptors also store
symbolic names for attributes and their symbolic values. Another
important feature of <CODE>orange.Variable</CODE> is that it can define
a method by which an attribute value is computed from other
attributes; this is used in, for instance, discretization.</P>
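<P>The difference between descriptor identity and attribute name can be sketched in plain Python, without Orange. The <code>Descriptor</code> class below is a hypothetical stand-in for an attribute descriptor, not Orange's <code>Variable</code>; it only illustrates that two objects with equal names are still two different attributes.</P>

```python
class Descriptor:
    """Hypothetical stand-in for an attribute descriptor (not Orange's class)."""
    def __init__(self, name):
        self.name = name

age_a = Descriptor("age")
age_b = Descriptor("age")      # same name, but a different descriptor object

same_by_name = age_a.name == age_b.name   # the names match
same_attribute = age_a is age_b           # the descriptors do not
alias = age_a                             # a stored reference is the same attribute
```

<P>This is why scripts that keep references to their descriptors rarely need to look them up by name.</P>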

<P>Variables can be constructed the usual way, through constructors, or by calling the functions <code>orange.Variable.getExisting</code> or <code>orange.Variable.make</code>. These functions search through the existing variables to find one with the same name, type and, for discrete attributes, values. If they succeed, the existing descriptor (an instance of <code>Variable</code>) is returned. If none is found, <code>orange.Variable.getExisting</code> returns <code>None</code>, while <code>orange.Variable.make</code> creates a new variable. These two functions make it possible for same-named attributes to be the same attribute. This is needed for loading the data, while typical user-written scripts seldom need it, since they can store and reuse descriptors themselves. The functions are described <a href="#getExisting">later on</a>.</P>


<H2>Variable</H2>

<P><CODE>orange.<INDEX name="classes/Variable">Variable</CODE> is a base class for attribute descriptors.</P>

<p class=section>Attributes</P>

<DL class=attributes>
<DT>name</DT>
<DD>
Each attribute has a name. An empty string is a perfectly legal name
that can and should be used for temporary or purely internal
attributes. Two attributes can have the same name: Orange does
not distinguish attributes by name except when communicating with the
user (when the user wants to see the value of attribute 'age', the name
is obviously used) or when loading the data (see the explanation
in <A href="fileformats.htm">Supported File Formats</A>).
However, if two attributes with the same name appear in the same
domain and indexing by name is used, the results of user queries are
unpredictable. In general, avoid giving the same name to
different attributes.</DD>

<DT>varType</DT>
<DD><CODE>varType</CODE> is an integer describing the attribute
type. As for <CODE>orange.Value</CODE>'s <CODE>varType</CODE>, it
can be <code>orange.VarTypes.Discrete</code> (1),
<code>orange.VarTypes.Continuous</code> (2) or
<code>orange.VarTypes.Other</code>.</DD>

<DT>getValueFrom</DT>
<DD><P>When an attribute is derived from other attributes, e.g. through
discretization, binarization or some form of constructive
induction, <CODE>getValueFrom</CODE> points to a "function" that
computes the value of the attribute from the values of other
attributes. The function is actually an
<CODE>orange.Classifier</CODE>: its input is an
<CODE>orange.Example</CODE> whose values are used to compute the
value of the derived attribute, and its result is the computed
value. This usually happens behind the scenes, without your
involvement. Moreover, you should <B>never call
<CODE>getValueFrom</CODE> directly; call it through the
method <CODE>computeValue</CODE>, which has safeguards
against deadlocks.</B></P>

<P>Although <CODE>getValueFrom</CODE> is always of type
<CODE>orange.Classifier</CODE>, you can set it to an ordinary
Python function or callable class. Orange will automatically wrap
it into an <CODE>orange.Classifier</CODE>, as described in <A
href="callbacks.htm">Subtyping Orange classes in Python</A>.</P>

<P>See the corresponding example below.</P></DD>

<DT>ordered</DT>
<DD>A flag telling whether the attribute values are ordered. At
the moment, no method actually treats ordinal attributes
differently from nominal ones, so this flag is reserved for future
use.</DD>

<DT>distributed</DT>
<DD>A flag that tells whether the values of this attribute are distributions. As with the flag <CODE>ordered</CODE>, no methods treat such attributes in any special manner, so this flag is also reserved for future use.</DD>

<DT>sourceVariable</DT>
<DD>Another attribute for potential future use: if
<CODE>getValueFrom</CODE> computes the attribute value from a
single attribute, that attribute can be (but is not necessarily)
stored in <CODE>sourceVariable</CODE>. As this is only used in a
rather obscure place you won't run into, there's no harm in
never setting <CODE>sourceVariable</CODE>.</DD>

<DT>randomGenerator</DT>
<DD>Local random number generator used by the method <CODE>randomvalue</CODE>.</DD>

<dt>defaultMetaId</dt>
<dd>A proposed meta id to be used with that variable. By default it is set to 0; when the attribute is first <a href="Domain.htm#meta-attributes">registered with any domain</a>. It does not mean that the attribute should always have this same meta id. <code>defaultMetaId</code> is, for instance, used by the data loader for the <a href="tabdelimited.htm">tab-delimited file format</a>, or by the function <code>newmetaid</code>, if the variable is passed as an argument.</dd>
</DL>

<p class=section>Methods</P>

<DL class=attributes>
<DT>&lt;constructors&gt;</DT>
<DD>Constructors for classes derived from
<CODE>orange.Variable</CODE> (which is itself abstract) can be
given the usual keyword arguments. In addition, the attribute name
can be given directly. That is, an attribute descriptor for the
continuous attribute "age" can be constructed by calling
<CODE>orange.FloatVariable("age")</CODE> or, equivalently,
<CODE>orange.FloatVariable(name="age")</CODE>.</DD>

<DT>&lt;call&gt;</DT>
<DD>Calling a descriptor converts symbolic, integer
or any other applicable native Python types into
<CODE>orange.Value</CODE> objects for this attribute. Calling
<CODE>var(val)</CODE> is equivalent to <CODE>orange.Value(var,
val)</CODE>; see <A href="Value.htm#construction">construction of
values</A>.</DD>

<DT>&lt;iteration&gt;</DT>
<DD>Attribute descriptors can be used in <CODE>for</CODE> loops.
So <CODE>for val in var</CODE> iterates through all values
of attribute <CODE>var</CODE>, when possible.</DD>

<DT>randomvalue()</DT>
<DD><CODE>randomvalue</CODE> returns a random value for the attribute, when possible. This function uses <CODE>randomGenerator</CODE>; if none has been assigned yet, a new one is constructed with the initial seed 0 and stored for future use.</DD>

<DT>computeValue(example)</DT>
<DD>Calls <CODE>getValueFrom</CODE> through a mechanism that prevents deadlocks caused by circular calls.</DD>
</DL>
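<P>One way such a guard against circular calls could work is a simple re-entrancy flag. The sketch below is plain Python and is only an assumption about the general technique; <code>DerivedVariable</code>, <code>get_value_from</code> and <code>compute_value</code> are hypothetical names, and Orange's actual mechanism may differ.</P>

```python
class DerivedVariable:
    """Toy descriptor with a guarded value computation (hypothetical names)."""
    def __init__(self, name, get_value_from=None):
        self.name = name
        self.get_value_from = get_value_from
        self._computing = False   # True while a computation is in progress

    def compute_value(self, example):
        if self.get_value_from is None:
            raise ValueError("no get_value_from defined for %r" % self.name)
        if self._computing:
            # Re-entered while already computing: the definition is circular.
            raise RuntimeError("circular computation of %r" % self.name)
        self._computing = True
        try:
            return self.get_value_from(example)
        finally:
            self._computing = False   # always release the guard

# A well-behaved derived attribute
e2 = DerivedVariable("e2", lambda ex: "1" if ex["e"] == "1" else "not 1")

# Two attributes defined in terms of each other
x = DerivedVariable("x")
y = DerivedVariable("y")
x.get_value_from = lambda ex: y.compute_value(ex)
y.get_value_from = lambda ex: x.compute_value(ex)
```

<P>Calling <code>x.compute_value({})</code> here raises an error instead of recursing forever, which is the kind of protection that calling the function directly would bypass.</P>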

<A name="EnumVariable"></a>
<H2>EnumVariable</H2>

<P><CODE><INDEX name="classes/EnumVariable">EnumVariable</CODE> is a descriptor for nominal and
ordinal attributes. It defines two additional attributes,
<CODE>values</CODE> and <CODE>baseValue</CODE>, and one additional
method, <CODE>addValue</CODE>. Iterating and returning random values is supported.</P>

<p class=section>Attributes</p>

<DL class=attributes>
<DT>values</DT>
<DD><P>A list with symbolic names for the attribute's values. Values of
attributes of type <CODE>EnumVariable</CODE> are stored as
integers referring to this list. Therefore, modifying this list
instantly changes the names of values of examples, as they are
printed out or referred to by the user. The size of the list is also
used to indicate the number of possible values for this
attribute; changing the size, especially <B>shrinking the list,
can have disastrous effects and is therefore not
recommended</B>. Also, do not add values to the list by calling its <code>append</code> or <code>extend</code> method: call <code>EnumVariable.addValue</code>, described below.</P>

<P>It is also assumed that <CODE>values</CODE> is always defined
(but can be empty), so you should never set <CODE>values</CODE>
to <CODE>None</CODE>.</P></dd>

<DT>baseValue</DT>
<DD>Sets the base value for the attribute. This can be, for
instance, a "normal" value, such as "no complications" as opposed
to the abnormal "low blood pressure" and "excessive bleeding". The
base value can be (and is) used by certain statistics and,
potentially, learning algorithms. <CODE>baseValue</CODE> is an
integer that is to be interpreted as an index into
<CODE>values</CODE>. The absence of a base value ("sex" can be
either "female" or "male", without an obvious base value) is
indicated by <CODE>-1</CODE>.</DD> </DL>

<p class=section>Methods</p>

<DL class=attributes>
<DT>addValue(string)</DT>
<dd>Adds a value to <code>values</code>. Always call this function instead of appending to <code>values</code>.</dd>
</dl>
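<P>Why editing <code>values</code> is delicate can be seen in a small plain-Python model, which does not use Orange. It assumes only what is stated above, namely that examples store integer indices into the list of names; the helper <code>names</code> is hypothetical.</P>

```python
values = ["small", "medium", "big"]   # symbolic names, as in EnumVariable.values
stored = [0, 2, 1, 2]                 # examples store indices, not strings

def names(stored, values):
    """Translate stored indices back into symbolic names."""
    return [values[i] for i in stored]

before = names(stored, values)
values[2] = "large"                   # renaming in place changes every printout
after = names(stored, values)

shrunk = values[:2]                   # shrinking leaves index 2 dangling
try:
    names(stored, shrunk)
    dangling = False
except IndexError:
    dangling = True
```

<P>In this model a shrunken list fails loudly; the text above warns that in Orange the effects can be disastrous.</P>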

<A name="FloatVariable"></a>
<H2>FloatVariable</H2>


<P><CODE><INDEX name="classes/FloatVariable">FloatVariable</CODE> is a descriptor for continuous
attributes.</P>

<DL class=attributes>
<DT>startValue, endValue, stepValue</DT>
<DD>The range of the attribute, used for returning random values and for iteration. You can leave
the three values at their defaults (<CODE>-1</CODE>, which is
interpreted as undefined) if you don't need random values or
iteration. (I can't recall ever using them...)</DD>

<DT>numberOfDecimals</DT>
<DD>The number of decimals used when the value is printed, converted to a string or saved to a file.</DD>

<DT>scientificFormat</DT>
<DD>If <CODE>True</CODE>, the value is printed in scientific format whenever it would have more than 5 digits. In this case, <CODE>numberOfDecimals</CODE> is ignored.</DD>

<DT>adjustDecimals</DT>
<DD>Tells Orange to monitor the number of decimals when the value is converted from a string (either by setting the attribute values, <I>e.g.</I> <CODE>example[0]="3.14"</CODE>, or when reading from a file). The value 0 means that the number of decimals should not be adjusted, while 1 and 2 mean that adjustments are on, with 2 denoting that no values have been converted yet.</DD>
</DL>

<P>By default, adjustment of the number of decimals goes as follows. If the attribute was constructed when examples were read from a file, it will be printed with the largest number of decimals encountered in the file. If scientific notation occurs in the file, <CODE>scientificFormat</CODE> will be set to <CODE>True</CODE> and scientific format will be used for values too large or too small.</P>

<P>If the attribute is created in a script, it will have, by default, three decimal places. This can be changed either by setting the attribute value from a string (<I>e.g.</I> <CODE>example[0]="3.14"</CODE>, but not <CODE>example[0]=3.14</CODE>) or by manually setting <CODE>numberOfDecimals</CODE> (<I>e.g.</I> <CODE>attr.numberOfDecimals=1</CODE>).</P>
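<P>The rule of using the largest number of decimals seen among the converted strings can be sketched as follows. This is plain Python under simplifying assumptions: the functions <code>decimals_in</code> and <code>infer_number_of_decimals</code> are hypothetical, and scientific notation is deliberately not handled.</P>

```python
def decimals_in(text):
    """Number of digits after the decimal point in a plain decimal string."""
    text = text.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])

def infer_number_of_decimals(column):
    """Largest number of decimals among the strings in a column."""
    return max((decimals_in(s) for s in column), default=0)

column = ["3.14", "2.5", "10", "0.125"]
n = infer_number_of_decimals(column)

# Print every value with the inferred number of decimals
formatted = ["%.*f" % (n, float(s)) for s in column]
```

<P>Note that the inference works on strings: a float such as <code>3.14</code> carries no record of how many decimals were typed, which matches the remark that <code>example[0]=3.14</code> does not adjust the number of decimals.</P>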

<a name="StringVariable"></a>
<H2>StringVariable</H2>

<P><CODE><INDEX name="classes/StringVariable">StringVariable</CODE> describes attributes that contain
strings. No method can use them for learning; some will complain
and others will silently ignore them when they encounter them. They
can, however, be useful for meta-attributes; if examples in a
dataset have unique ids, the most efficient way to retain them
is to read them as meta-attributes. In general, never use
discrete attributes with many (say, more than 50) values. Such
attributes are probably of no use for learning and should be
stored as string attributes.</P>

<P>There's a short and simple example which makes use of
<CODE>StringVariable</CODE> near the end of the page about <a
href="Domain.htm"><CODE>Domain</CODE></A>.</P>

<P>When converting strings into values and back, empty strings are treated differently than usual. For other types, an empty string can be used as a synonym for a question mark ("don't know"), while <code>StringVariable</code> will take an empty string as an empty string, except when loading from or saving to a file. Empty strings in files are interpreted as "don't know". You can, however, enclose the string in double quotes; these are removed when the string is loaded. Therefore, to give an empty string in a file, put it in double quotes, <code>""</code>.</P>
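<P>The file-loading rule for string attributes can be summarized in a few lines of plain Python. This is an illustrative sketch, not Orange's loader: <code>parse_string_cell</code> and the <code>UNKNOWN</code> marker are hypothetical names.</P>

```python
UNKNOWN = None   # hypothetical marker for "don't know"

def parse_string_cell(cell):
    """Interpret one field read from a file for a string attribute."""
    if cell == "":
        return UNKNOWN                 # empty field: the value is unknown
    if len(cell) >= 2 and cell[0] == '"' and cell[-1] == '"':
        return cell[1:-1]              # enclosing quotes are removed on loading
    return cell

unknown = parse_string_cell("")
empty = parse_string_cell('""')        # the way to write a real empty string
quoted = parse_string_cell('"abc"')
plain = parse_string_cell("abc")
```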


<H2>PythonVariable</H2>

<P><CODE><INDEX name="classes/PythonVariable">PythonVariable</CODE> is a base class for descriptors defined in Python. Fully functional in itself, <CODE>PythonVariable</CODE> can already be used as a descriptor for attributes that contain arbitrary Python values. Since this is an advanced topic, <CODE>PythonVariable</CODE>s are described on <A href="PythonVariable.htm">a separate page</A>.</P>

<a name="getValueFrom"></a>
<H2>Using getValueFrom</H2>

<P>Monk 1 is a well-known dataset with the target concept <CODE>y :=
a==b or e==1</CODE>. It does not hurt, and may even
help, to replace the four-valued attribute <CODE>e</CODE> with
a binary attribute having values <CODE>1</CODE> and <CODE>not
1</CODE>. The new attribute shall be computed from the old one on
the fly.</P>

<p class="header">part of <a href="variable.py">variable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">import orange
data = orange.ExampleTable("monk1")

# New binary attribute; "not 1" comes first so that its index is 0
e2 = orange.EnumVariable("e2", values = ["not 1", "1"])

# Computes the value of e2 from the original attribute e
def checkE(example, returnWhat):
    if example["e"]=="1":
        return orange.Value(e2, "1")
    else:
        return orange.Value(e2, "not 1")

e2.getValueFrom = checkE
</XMP>

<P>Our new attribute is named <CODE>e2</CODE>; we define it with a
descriptor of type <CODE>orange.EnumVariable</CODE>, with an
appropriate name and values <CODE>not 1</CODE> and <CODE>1</CODE> (we chose this order so that the index of <CODE>not 1</CODE> is 0, which can be, if needed, interpreted as <CODE>False</CODE>).</P>

<P><CODE>checkE</CODE> is a function that is passed an example
and another argument we don't care about. If the example's attribute
<CODE>e</CODE> equals <CODE>1</CODE>, the function returns value
<CODE>1</CODE>, otherwise it returns <CODE>not 1</CODE>. Both are returned as values of attribute
<CODE>e2</CODE>, not as plain strings. Finally, we tell <CODE>e2</CODE> to use
<CODE>checkE</CODE> to compute its value when needed, by
assigning <CODE>checkE</CODE> to <CODE>getValueFrom</CODE>.</P>

<P>In most circumstances, the value of <CODE>e2</CODE> can be computed on the fly: we can pretend that the attribute exists in <CODE>data</CODE>, although it doesn't (but can be computed from it). For instance, we can observe the conditional distribution of classes with regard to <CODE>e2</CODE>.</P>

<XMP class="code">>>> dist = orange.Distribution(e2, data)
>>> print dist
<324.000, 108.000>
>>>
>>> cont = orange.ContingencyAttrClass(e2, data)
>>> print "Class distribution when e=1:", cont["1"]
Class distribution when e=1: <0.000, 108.000>
>>> print "Class distribution when e<>1:", cont["not 1"]
Class distribution when e<>1: <216.000, 108.000>
</XMP>

<P><CODE>orange.Distribution</CODE> is called to compute the distribution for <CODE>e2</CODE> in <CODE>data</CODE>. When it notices that <CODE>data.domain</CODE> does not contain <CODE>e2</CODE>, it checks whether <CODE>e2</CODE>'s <CODE>getValueFrom</CODE> is defined and, seeing that it is, uses it to get <CODE>e2</CODE>'s values.</P>

<P>We describe the technical details to make you aware that automatic recomputation requires some effort on the side of <CODE>orange.ContingencyAttrClass</CODE>. Some methods will not do that for you, because it would be too complex or too time-consuming. An example of such a situation is constructive induction by function decomposition; making incompatibility matrices with attributes computed on the fly would be slow and impractical, so attempting it would yield an error. In such cases, you can simply convert the entire example table to a new domain that also includes the new attribute.</P>

<p class="header">part of <a href="variable.py">variable.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">newDomain = orange.Domain([data.domain["a"], data.domain["b"],
                           e2, data.domain.classVar])
newData = orange.ExampleTable(newDomain, data)
</XMP>

<P>Automatic computation is useful when the data is split into training and testing examples. Training examples can be modified by adding, removing and transforming attributes (in a typical setup, continuous attributes are discretized prior to learning, so the original attributes are replaced by new ones), while testing examples are left as they are. When they are classified, the classifier automatically converts the testing examples into the new domain, which includes recomputation of the transformed attributes. With our toy script, we can split the data, use it for learning and then test the classification of unmodified test examples.</P>

<p class="header"><a href="variable2.py">variable2.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">import orange, orngTree

data = orange.ExampleTable("monk1")

# Split the data: 70% for training, 30% for testing
indices = orange.MakeRandomIndices2(data, p0=0.7)
trainData = data.select(indices, 0)
testData = data.select(indices, 1)

# e2 is computed from e; the comparison yields False/True, i.e. index 0/1
e2 = orange.EnumVariable("e2", values = ["not 1", "1"])
e2.getValueFrom = lambda example, returnWhat: orange.Value(e2, example["e"]=="1")

# Only the training examples are converted to the new domain
newDomain = orange.Domain([data.domain["a"], data.domain["b"], e2, data.domain.classVar])
newTrain = orange.ExampleTable(newDomain, trainData)

tree = orange.TreeLearner(newTrain)

orngTree.printTxt(tree)

# Test examples are still in the original domain
for ex in testData[:10]:
    print ex.getclass(), tree(ex)
</XMP>

<P>First, note that we have rewritten the above example,
replacing the <CODE>checkE</CODE> function with a simpler
<CODE>lambda</CODE> function, which exploits the fact that
Python's <CODE>False</CODE> and <CODE>True</CODE> equal 0 and 1.
We have split <CODE>data</CODE> into <CODE>trainData</CODE>
and <CODE>testData</CODE>, with 70% and 30% of the examples,
respectively. After constructing a new domain, we translate only
the training examples and induce a decision tree. The printout shows
that it first splits the examples by the attribute <CODE>e2</CODE>
and then, if <CODE>e2</CODE> is not 1, it (implicitly) checks the
equality of <CODE>a</CODE> and <CODE>b</CODE>. In the
<CODE>for</CODE> loop, examples from <CODE>testData</CODE>, which
does not have the attribute <CODE>e2</CODE>, are correctly classified.
This is done in the same way for all classifiers: the classifier
stores the domain description for the learning examples (or, to
be more precise, a domain in which the model is described). Prior
to classification, examples from other domains are converted to
the stored domain. In our case, examples from
<CODE>testData</CODE> are converted to <CODE>newDomain</CODE>,
and the given lambda function is used to compute the value of
<CODE>e2</CODE> from <CODE>e</CODE>.</P>
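<P>The conversion step described above can be modelled in plain Python. The sketch is a toy under stated assumptions, not Orange's implementation: <code>Attr</code> and <code>convert</code> are hypothetical, examples are dictionaries keyed by descriptor, and domains are plain lists of descriptors.</P>

```python
class Attr:
    """Toy descriptor: a name plus an optional function computing its value."""
    def __init__(self, name, get_value_from=None):
        self.name = name
        self.get_value_from = get_value_from

def convert(example, source_domain, target_domain):
    """Translate `example` (a dict keyed by descriptor) into `target_domain`."""
    converted = {}
    for attr in target_domain:
        if attr in source_domain:
            converted[attr] = example[attr]                  # same descriptor: copy
        elif attr.get_value_from is not None:
            converted[attr] = attr.get_value_from(example)   # derived: compute
        else:
            converted[attr] = None                           # value unknown
    return converted

a, b, e = Attr("a"), Attr("b"), Attr("e")
e2_values = ["not 1", "1"]
# The comparison yields False or True, which index the list as 0 or 1
e2 = Attr("e2", lambda ex: e2_values[ex[e] == "1"])

test_domain = [a, b, e]       # domain of the unmodified test examples
stored_domain = [a, b, e2]    # domain stored with the model

converted = convert({a: "1", b: "2", e: "1"}, test_domain, stored_domain)
other = convert({a: "1", b: "1", e: "3"}, test_domain, stored_domain)
```

<P>Attributes are matched by descriptor, not by name, so <code>e2</code> is computed while <code>a</code> and <code>b</code> are simply copied.</P>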

<P>What to do if an attribute can be computed from different
domains, using different procedures? Can there be more than one
function to be tried? Why is there only one
<CODE>getValueFrom</CODE>, not a list of them? Although we are
fairly advanced Orange users, we have never run into a situation where
we needed this (obviously; if we had needed it, we'd have done something
about it :). If you do, however, need to specify more than one
function for attribute value computation, you can define a Python
class that stores a list of functions and calls them in an
appropriate manner. Then assign an object of this class to
<CODE>getValueFrom</CODE>. And tell us about your case, and we
shall rethink our position.</P>
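<P>A callable class of the kind suggested above might look like the following sketch. It is plain Python and entirely hypothetical: <code>FirstApplicable</code>, the two helper functions and the attribute name <code>e_is_one</code> are invented for illustration, and "appropriate manner" is taken here to mean trying the functions in order until one succeeds.</P>

```python
class FirstApplicable:
    """Callable that tries several functions until one yields a value."""
    def __init__(self, functions):
        self.functions = list(functions)

    def __call__(self, example, return_what=None):
        for function in self.functions:
            try:
                value = function(example)
            except KeyError:
                continue          # the example lacks a needed attribute
            if value is not None:
                return value
        return None               # no function applied: value unknown

def from_e(example):
    return "1" if example["e"] == "1" else "not 1"

def from_flag(example):           # hypothetical second source of the value
    return "1" if example["e_is_one"] else "not 1"

compute_e2 = FirstApplicable([from_e, from_flag])
```

<P>The second parameter mirrors the <code>returnWhat</code> argument of the functions shown earlier and is ignored in the same way.</P>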


<a name="getExisting"></a>
<h2>Advanced: Reuse of Descriptors</h2>
<index name="getExisting">
<index name="Variable.getExisting">
<index name="make">
<index name="Variable.make">

<P>There are situations in which an attribute descriptor needs to be reused, yet no reference to it is available. Typically, the user loads some training examples, trains a classifier and then loads a separate test set. For the classifier to recognize the attributes in the second data set, the descriptors, not just the names, need to be the same. This problem was first solved by requiring the user to explicitly provide the "original" <a href="Domain.htm"><code>Domain</code></a>, which mystified too many users, so later on Orange used <a href="DomainDepot.htm">domain depots</a>, where it looked for suitable domains to reuse without any user intervention. This worked, with a few nasty exceptions, until Orange started to (tend to) support pickling: as unpickling always created new attributes, unpickled classifiers (or data or any other object storing references to descriptors) were useless.</P>

<P>Orange now maintains a list of all existing Variables and can check it before constructing new variables. This is done while loading the data, will be used for unpickling and can be used explicitly by the user. Creating variables directly, with constructors (<code>EnumVariable()</code> etc.), always constructs brand new variables.</P>

<P>The search is based on four arguments: the attribute's name, type, ordered values and unordered values. As for the latter two, the values can be explicitly ordered by the user, e.g. in the second line of a tab-delimited file, for instance to order sizes as small-medium-big.</P>

<P>The search for existing variables can end with one of the following statuses. (Note: use the symbolic constants, not the integers given in parentheses; we may introduce a new status, <code>ExtraValues</code>, between <code>OK</code> and <code>MissingValues</code>. You can, however, count on the order of statuses staying the same.)</P>

<ul>
<li><code>orange.<index>Variable.MakeStatus.NotFound (4)</index></code>: no attribute with that name and type exists.</li>

<li><code>orange.<index>Variable.MakeStatus.Incompatible (3)</index></code>: there are one or more attributes with a matching name and type, but their lists of values are incompatible with the prescribed ordered values. For example, if the existing variable already has values ["a", "b"] and the new one wants ["b", "a"], reuse is not possible. The existing list can, however, be extended by the new values, so searching for ["a", "b", "c"] would succeed. So would the search for ["a"], since the extra existing value does not matter. The formal rule is thus that the values are compatible if <code>existing_values[:len(ordered_values)] == ordered_values[:len(existing_values)]</code>.</li>

<li><code>orange.<index>Variable.MakeStatus.NoRecognizedValues (2)</index></code>:
there is a matching attribute, yet it has none of the values that the new attribute will have (this is obviously possible only if the new attribute has no prescribed ordered values). For instance, we search for an attribute "sex" with values "male" and "female", while there is an attribute of the same name with values "M" and "F" (or, well, "no" and "yes" :). Reuse of this attribute is possible, though this should probably be a new attribute, since it obviously comes from a different data set. If we do decide on reuse, the old attribute will get some unneeded new values and the new one will inherit some from the old.</li>

<li><code>orange.<index>Variable.MakeStatus.MissingValues (1)</index></code>: there is a matching attribute with some of the values that the new one requires, but some values are missing. This situation is neither uncommon nor suspicious: with separate training and testing data sets there may be attribute values which occur in one set but not in the other.</li>

<li><code>orange.<index>Variable.MakeStatus.OK (0)</index></code>: there is an attribute which contains all the prescribed values in the correct order. The existing attribute may have some extra values, though.</li>
</ul>
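<P>The statuses and the compatibility rule above can be restated as a small plain-Python model. This is a sketch of the rules as described in the list, not Orange's code: the enum only mirrors the documented names and numbers, and <code>compatible</code> and <code>status_for</code> are hypothetical helpers that judge a single existing list of values (<code>NotFound</code> applies when there is no candidate at all, so it is never returned here).</P>

```python
from enum import IntEnum

class MakeStatus(IntEnum):
    """Mirror of the documented status codes (model only)."""
    OK = 0
    MissingValues = 1
    NoRecognizedValues = 2
    Incompatible = 3
    NotFound = 4

def compatible(existing_values, ordered_values):
    """The formal rule quoted above: the shorter list is a prefix of the longer."""
    n, m = len(ordered_values), len(existing_values)
    return existing_values[:n] == ordered_values[:m]

def status_for(existing_values, ordered_values=(), unordered_values=()):
    """Status of one existing attribute with respect to the requested values."""
    existing = list(existing_values)
    ordered = list(ordered_values)
    if not compatible(existing, ordered):
        return MakeStatus.Incompatible
    wanted = set(ordered) | set(unordered_values)
    if wanted and not (wanted & set(existing)):
        return MakeStatus.NoRecognizedValues   # nothing in common
    if wanted - set(existing):
        return MakeStatus.MissingValues        # some, but not all, present
    return MakeStatus.OK
```

<P>For instance, <code>status_for(["a", "b"], ["a"], ["c"])</code> gives <code>MissingValues</code> in this model, because <code>c</code> is not yet among the existing values.</P>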

<P>Continuous attributes can obviously have only two statuses, <code>NotFound</code> or <code>OK</code>.</P>

<P>When loading the data using <a href="fileformats.htm"><code>orange.ExampleTable</code></a>, Orange takes the safest approach and, by default, reuses everything that is compatible, that is, up to and including <code>NoRecognizedValues</code>. Unintended reuse would be obvious from the attribute having too many values, which the user can notice and fix. More on that in the page on <a href="fileformats.htm">loading data</a>.</P>

<P>There are two functions for reusing attributes instead of creating new ones.</P>

<DL class=attributes>
<DT>Variable.make(name, type[, ordered-values, unordered-values, createNewOn])</DT>
<DD><P>The <code>type</code> should be one of the types in <code>orange.VarTypes</code>, e.g., <code>orange.VarTypes.Discrete</code>. Values can be given with any iterable type (list, set...). The optional <code>createNewOn</code> specifies the status at which a new attribute is created. The status must be at most <code>Incompatible</code>, since incompatible (or non-existing) attributes cannot be reused. If it is set lower, for instance to <code>MissingValues</code>, a new attribute will be created even if there exists an attribute which is only missing some values. If set to <code>OK</code>, the function will always create a new attribute.</P>

<P>The function returns a tuple containing an attribute descriptor and the status of the best matching attribute. So, if <code>createNewOn</code> was set to <code>MissingValues</code>, and there exists an attribute whose status is, say, <code>NoRecognizedValues</code>, a new attribute would be created, while the second element of the tuple would contain <code>NoRecognizedValues</code>. If, on the other hand, there exists an attribute which is perfectly OK, its descriptor is returned and the returned status is <code>OK</code>. The function returns no indicator of whether the returned descriptor is reused or not. This can, however, be read from the status code: if it is smaller than the specified <code>createNewOn</code>, the attribute is reused, otherwise we got a new descriptor.</P>

<P>The exception to the rule is when <code>createNewOn</code> is <code>OK</code>. In this case, the function does not search through the existing attributes and cannot know the statuses, so the returned status is always <code>OK</code>.</P></DD>

<dt>Variable.getExisting(name, type[, ordered-values, unordered-values, createNewOn])</DT>
<dd>This function is essentially the same as <code>make</code>, except that it does not construct a new attribute but returns <code>None</code> instead.</dd>
</DL>
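<P>Reading reuse off the returned status can be written down as a one-line test. The sketch below is plain Python; the enum only mirrors the documented numbers, <code>is_reused</code> is a hypothetical helper, and the default of <code>Incompatible</code> for <code>createNewOn</code> is an assumption inferred from the statement that everything up to <code>NoRecognizedValues</code> is reused by default.</P>

```python
from enum import IntEnum

class MakeStatus(IntEnum):
    """Mirror of the documented status codes (model only)."""
    OK = 0
    MissingValues = 1
    NoRecognizedValues = 2
    Incompatible = 3
    NotFound = 4

def is_reused(status, create_new_on=MakeStatus.Incompatible):
    """True if a descriptor returned with `status` is an existing one."""
    # With create_new_on == OK the returned status is always OK,
    # and OK < OK is False, so the result is correctly "not reused".
    return status < create_new_on

default_missing = is_reused(MakeStatus.MissingValues)
default_unrecognized = is_reused(MakeStatus.NoRecognizedValues)
strict_unrecognized = is_reused(MakeStatus.NoRecognizedValues,
                                MakeStatus.NoRecognizedValues)
never_reuse = is_reused(MakeStatus.OK, MakeStatus.OK)
```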

<P>Here are a few examples for <code>Variable.make</code>; <code>getExisting</code> works similarly. These examples give the shown results if executed only once (in a Python session) and in this order.</P>

<p class="header">part of <a href="variableReuse.py">variableReuse.py</a>
</p>
<XMP class="code">>>> v1, s = orange.Variable.make("a", orange.VarTypes.Discrete, ["a", "b"])
>>> print s, v1.values
4 <a, b>
</XMP>

<P>No surprises here: a new variable is created and the status is <code>NotFound</code>.</P>

<XMP class="code">>>> v2, s = orange.Variable.make("a", orange.VarTypes.Discrete, ["a"], ["c"])
>>> print s, v2 is v1, v1.values
1 True <a, b, c>
</XMP>

<P>The status is 1 (<code>MissingValues</code>), yet the variable is reused (<code>v2 is v1</code> is <code>True</code>). <code>v1</code> gets a new value, <code>c</code>, which was given as an unordered value. It does not matter that the new variable does not need value <code>b</code>.</P>

<XMP class="code">>>> v3, s = orange.Variable.make("a", orange.VarTypes.Discrete, ["a", "b", "c", "d"])
>>> print s, v3 is v1, v1.values
1 True <a, b, c, d>
</XMP>

<P>This is similar to the previous case, except that the new value, <code>d</code>, is now among the ordered values.</P>

<XMP class="code">>>> v4, s = orange.Variable.make("a", orange.VarTypes.Discrete, ["b"])
>>> print s, v4 is v1, v1.values, v4.values
3 False <a, b, c, d> <b>
</xmp>

<P>The new attribute needs to have <Code>b</Code> as the first value, so it is incompatible with the existing attribute. The status is thus 3 (<Code>Incompatible</Code>); the two attributes are not the same and have different lists of values.</P>

<XMP class="code">>>> v5, s = orange.Variable.make("a", orange.VarTypes.Discrete, None, ["c", "a"])
>>> print s, v5 is v1, v1.values, v5.values
0 True <a, b, c, d> <a, b, c, d>
</XMP>

<P>The new attribute has values <code>c</code> and <code>a</code>, but does not care about their order, so the existing attribute is <code>OK</code>.</P>

<XMP class="code">>>> v6, s = orange.Variable.make("a", orange.VarTypes.Discrete, None, ["e"])
>>> print s, v6 is v1, v1.values, v6.values
2 True <a, b, c, d, e> <a, b, c, d, e>
</xmp>

<P>The new attribute has different values than the existing one (the status is 2, <code>NoRecognizedValues</code>), but the existing one is reused nevertheless. Note that we gave <code>e</code> in the list of unordered values. Had it been among the ordered values, the reuse would have failed.</P>

<XMP class="code">>>> v7, s = orange.Variable.make("a", orange.VarTypes.Discrete, None,
        ["f"], orange.Variable.MakeStatus.NoRecognizedValues)
>>> print s, v7 is v1, v1.values, v7.values
2 False <a, b, c, d, e> <f>
</xmp>

<P>This is the same as before, except that we prohibited reuse when there are no recognized values. Hence a new attribute is created, though the returned status is the same as before.</P>

<XMP class="code">>>> v8, s = orange.Variable.make("a", orange.VarTypes.Discrete,
      ["a", "b", "c", "d", "e"], None, orange.Variable.MakeStatus.OK)
>>> print s, v8 is v1, v1.values, v8.values
0 False <a, b, c, d, e> <a, b, c, d, e>
</xmp>

<P>Finally, this is a perfect match, but any reuse is prohibited, so a new attribute is created.</P>

</BODY>
</html>