source: orange/orange/doc/reference/classifierFromVar.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 7.8 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

<html> <HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
</HEAD> <body>

<index name="classifiers/from a single attribute">
<h1>Classifier from Attribute</h1>

<P>Classifiers from attribute predict the class based on the value of a single attribute. Although they can be used for making predictions, their more important role in Orange is to compute attribute values rather than class values. For instance, when a continuous attribute is discretized and replaced by a discrete attribute, a classifier from attribute computes the new attribute's value automatically whenever it is needed. Similarly, a classifier from attribute usually decides which branch to follow when an example is classified by a decision tree.</P>

<P>There are two classifiers from attribute: the simpler <CODE>ClassifierFromVarFD</CODE> assumes that the example comes from a fixed domain, while the safer <CODE>ClassifierFromVar</CODE> does not. You should primarily use the latter, especially since its caching scheme makes it practically as fast as the former.</P>

<P>Both classifiers can be given a transformer that modifies the value. In discretization, for instance, the transformer is responsible for computing the discrete interval that corresponds to a continuous value of the original attribute.</P>
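
<P>As a rough illustration of what such a transformer does in discretization, the sketch below maps a continuous value to the index of its interval. It is plain Python and does not use Orange; the helper name <CODE>make_interval_transformer</CODE> and the cut-off points are assumptions made up for the example.</P>

```python
import bisect


def make_interval_transformer(cutoffs):
    """Return a function mapping a continuous value to an interval index.

    Hypothetical helper, not part of Orange. With cutoffs [c0, c1, ...],
    index 0 means value <= c0, index 1 means c0 < value <= c1, and so on.
    """
    cutoffs = sorted(cutoffs)

    def transformer(value):
        # bisect_left counts the cut-offs that are strictly below the value
        return bisect.bisect_left(cutoffs, value)

    return transformer


to_interval = make_interval_transformer([2.5, 5.0])
print([to_interval(v) for v in (1.0, 2.5, 3.7, 9.9)])
```

<P>The returned index plays the same role as the value indices used by the transformers described on this page.</P>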

<H2>ClassifierFromVar</H2>
<index name="classes/ClassifierFromVar">

<P class=section>Attributes</P>
<DL class=attributes>
<DT>whichVar</DT>
<DD>The descriptor of the attribute whose value is to be returned.</DD>

<DT>transformer</DT>
<DD>The transformer for the value. It should be a class derived from <CODE>TransformValue</CODE>, but you can also use a callback function.</DD>

<DT>distributionForUnknown</DT>
<DD>The distribution that is returned when the value of <CODE>whichVar</CODE> is undefined.</DD>
</DL>

<P>When given an <CODE>example</CODE>, <CODE>ClassifierFromVar</CODE> returns <CODE>transformer(example[whichVar])</CODE>. <CODE>whichVar</CODE> can be an ordinary attribute, a meta attribute, or an attribute that is not defined for the example but has a <CODE>getValueFrom</CODE> that can be used to compute the value. If none of these yields a value, or if the value found is unknown, a <CODE>Value</CODE> of subtype <CODE>Distribution</CODE> containing <CODE>distributionForUnknown</CODE> is returned.</P>
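
<P>The lookup order described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not Orange code: examples are dicts, <CODE>None</CODE> stands for an unknown value, and the function <CODE>classify_from_var</CODE> with all its parameter names is hypothetical.</P>

```python
def classify_from_var(example, which_var, transformer=None,
                      distribution_for_unknown=None, get_value_from=None):
    """Mimic the described lookup order using plain dicts (not Orange code).

    `example` maps attribute names to values, None marks an unknown value,
    and `get_value_from` is an optional function that computes the
    attribute from the example when it is not stored there.
    """
    if which_var in example:
        value = example[which_var]
    elif get_value_from is not None:
        value = get_value_from(example)
    else:
        value = None

    if value is None:
        # Nothing usable was found: fall back to the stored distribution
        return distribution_for_unknown
    return transformer(value) if transformer is not None else value


def to_e1(index):
    # Index 0 stays 0; every other index becomes 1
    return 0 if index == 0 else 1


fallback = {"1": 0.25, "not 1": 0.75}  # made-up distribution
print(classify_from_var({"e": 0}, "e", to_e1, fallback))
print(classify_from_var({"e": 2}, "e", to_e1, fallback))
print(classify_from_var({"e": None}, "e", to_e1, fallback))
print(classify_from_var({"a": 1}, "e", to_e1, fallback))
```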

<P>The class stores the domain version of the last example and the position of the attribute in that domain. If consecutive examples come from the same domain (which is usually the case), <CODE>ClassifierFromVar</CODE> is just two simple <CODE>if</CODE>s slower than <CODE>ClassifierFromVarFD</CODE>.</P>
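
<P>The idea behind this caching can be sketched as follows. The classes <CODE>Domain</CODE> and <CODE>CachedPositionLookup</CODE> are toy stand-ins invented for this illustration; the actual implementation in Orange is in C++ and may differ in detail.</P>

```python
class Domain:
    """Toy domain: an ordered list of attribute names plus a version number."""
    _next_version = 0

    def __init__(self, names):
        self.names = list(names)
        Domain._next_version += 1
        self.version = Domain._next_version


class CachedPositionLookup:
    """Remember where an attribute sits in the most recently seen domain."""

    def __init__(self, which_var):
        self.which_var = which_var
        self.last_version = None
        self.position = None
        self.searches = 0  # counts slow lookups, for demonstration only

    def value(self, domain, values):
        if domain.version != self.last_version:
            # Slow path: a different domain, so search for the attribute again
            self.searches += 1
            self.position = domain.names.index(self.which_var)
            self.last_version = domain.version
        # Fast path: reuse the cached position
        return values[self.position]


d1 = Domain(["a", "b", "e"])
d2 = Domain(["e", "a"])
lookup = CachedPositionLookup("e")
print(lookup.value(d1, [1, 2, 3]), lookup.value(d1, [4, 5, 6]), lookup.searches)
print(lookup.value(d2, [7, 8]), lookup.searches)
```

<P>As long as consecutive examples share a domain, only the cheap version comparison is repeated; the search runs again only when the domain changes.</P>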

<P>As you might have guessed, the crucial component here is the <CODE>transformer</CODE>. For the sake of demonstration, let us load the Monk 1 dataset and construct an attribute <CODE>e1</CODE> that has the value "1" when <CODE>e</CODE> is "1", and "not 1" when <CODE>e</CODE> is anything else. There are many ways to do this, and the same problem is covered in several places in the Orange documentation. Although the way presented here is not the simplest, it serves to demonstrate how <CODE>ClassifierFromVar</CODE> works.</P>

<p class="header">part of <a href="classifierFromVar.py">classifierFromVar.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
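<XMP class=code>import orange

# Load the Monk 1 dataset and fetch the descriptor of attribute "e"
data = orange.ExampleTable("monk1")
e = data.domain["e"]

# New discrete attribute; value indices are 0 ("1") and 1 ("not 1")
e1 = orange.EnumVariable("e1", values = ["1", "not 1"])

# The transformer works with value indices, not strings:
# index 0 of e is "1"; every other index maps to "not 1"
def eTransformer(value):
    if int(value) == 0:
        return 0
    else:
        return 1

# Compute e1 from e through the transformer
e1.getValueFrom = orange.ClassifierFromVar()
e1.getValueFrom.whichVar = e
e1.getValueFrom.transformer = eTransformer

data2 = data.select(["a", "b", "e", e1, "y"])
for i in data2:
    print i
</XMP>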

<P>First, you might have noticed that <CODE>transformer</CODE>, an attribute of the pure C++ object <CODE>ClassifierFromVar</CODE>, has been assigned a Python function. As you can learn by reading <A href="callbacks.htm">the documentation on callback functions</A>, the function is automatically wrapped into a C++ class that converts the arguments to Python and the result back. (Not that you need to know about it. Just use it and be happy that it works.)</P>

<P>The problem here is that <CODE>eTransformer</CODE> doesn't get the nice instances of <CODE>orange.Value</CODE> that you are used to. You cannot compare the value to a string: the function cannot begin with "<CODE>if value == "1"</CODE>", since the <CODE>value</CODE> has no associated attribute descriptor that would "understand" the string "1". Instead, you need to use integer indices. Since the values of <CODE>e</CODE> are "1", "2", "3", "4", index 0 corresponds to value "1". The same goes for the returned values: the values of <CODE>e1</CODE> are "1" and "not 1", in this order, so returning 0 means "1" and returning 1 means "not 1".</P>
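
<P>The index mapping can be checked without Orange. The following plain-Python sketch lists the values of both attributes in domain order and applies the same index logic; the names <CODE>e_values</CODE>, <CODE>e1_values</CODE> and <CODE>e_transformer</CODE> are introduced only for this illustration.</P>

```python
e_values = ["1", "2", "3", "4"]   # values of e, in domain order
e1_values = ["1", "not 1"]        # values of e1, in domain order


def e_transformer(index):
    """Map an index into e_values to an index into e1_values."""
    return 0 if index == 0 else 1


for index, name in enumerate(e_values):
    print(name, "->", e1_values[e_transformer(index)])
```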

<P>Having written the transformer, the rest is trivial: we assign a <CODE>ClassifierFromVar</CODE> to the new attribute's <CODE>getValueFrom</CODE>, and set its <CODE>whichVar</CODE> to <CODE>e</CODE> and its <CODE>transformer</CODE> to <CODE>eTransformer</CODE>.</P>

<P>To check the results, we construct a new example table containing only attributes <CODE>a</CODE>, <CODE>b</CODE> and <CODE>e</CODE>, the new attribute <CODE>e1</CODE> and the class attribute. During example conversion, the value of <CODE>e1</CODE> is computed by calling <CODE>ClassifierFromVar</CODE>, and the overall effect is that for each example <CODE>ex</CODE>, <CODE>e1</CODE> has the value <CODE>eTransformer(ex[e])</CODE>.</P>


<H2>ClassifierFromVarFD</H2>
<index name="classes/ClassifierFromVarFD">

<P><CODE>ClassifierFromVarFD</CODE> is very similar to <CODE>ClassifierFromVar</CODE>, except that the attribute is not given as a descriptor (like <CODE>whichVar</CODE>) but as an index. The index can be either the position of the attribute in the domain or a meta-id. Given that <CODE>ClassifierFromVarFD</CODE> is practically no faster than <CODE>ClassifierFromVar</CODE> (and may in the future even be merged with the latter), you should seldom need to use this class.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>domain <SPAN class=normalfont>(inherited from <CODE>ClassifierFD</CODE>)</SPAN></DT>
<DD>The domain on which the classifier operates.</DD>

<DT>position</DT>
<DD>The position of the attribute in the domain or its meta-id.</DD>

<DT>transformer</DT>
<DD>The transformer for the value.</DD>

<DT>distributionForUnknown</DT>
<DD>The distribution that is returned when the value of the attribute at <CODE>position</CODE> is undefined.</DD>
</DL>

<P>When an example is passed to <CODE>ClassifierFromVarFD</CODE>, the classifier first checks that the example is from the correct <CODE>domain</CODE> and raises an exception if it is not. If the domain is OK, the corresponding attribute value is retrieved, transformed and returned.</P>
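
<P>A minimal plain-Python sketch of this check-then-index behaviour follows. The class <CODE>FixedDomainLookup</CODE> is hypothetical, a tuple of names stands in for a domain, and the exception type is an arbitrary choice for the illustration.</P>

```python
class FixedDomainLookup:
    """Toy fixed-domain lookup: refuses examples from any other domain."""

    def __init__(self, domain, position, transformer=None):
        self.domain = domain
        self.position = position
        self.transformer = transformer

    def __call__(self, example_domain, values):
        # Only the one stored domain object is accepted
        if example_domain is not self.domain:
            raise ValueError("example is not from the expected domain")
        value = values[self.position]
        return self.transformer(value) if self.transformer else value


domain = ("a", "b", "e")  # any object can stand in for a domain here
lookup = FixedDomainLookup(domain, domain.index("e"),
                           lambda index: 0 if index == 0 else 1)
print(lookup(domain, [2, 1, 0]))
try:
    lookup(("e", "a"), [0, 2])
except ValueError as err:
    print("rejected:", err)
```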

<P><CODE>ClassifierFromVarFD</CODE>'s twin brother, <CODE>ClassifierFromVar</CODE>, can also handle attributes that are neither in the example's domain nor among its meta-attributes, but can be computed from the example through their <CODE>getValueFrom</CODE>. Since <CODE>ClassifierFromVarFD</CODE> stores only an index and not an attribute descriptor, it cannot offer this functionality.</P>

<P>To rewrite the above script to use <CODE>ClassifierFromVarFD</CODE>, we need to set the domain and assign the index of <CODE>e</CODE> to <CODE>position</CODE> (the equivalent of setting <CODE>whichVar</CODE> in <CODE>ClassifierFromVar</CODE>). The initialization of <CODE>ClassifierFromVarFD</CODE> thus goes like this:</P>

<p class="header">part of <a href="classifierFromVar.py">classifierFromVar.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class=code># Fixed-domain variant: the attribute is identified by its index in the domain
e1.getValueFrom = orange.ClassifierFromVarFD()
e1.getValueFrom.domain = data.domain
e1.getValueFrom.position = data.domain.attributes.index(e)
e1.getValueFrom.transformer = eTransformer
</XMP>

</BODY>
</HTML>