source: orange/Orange/doc/extend-widgets/channels.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 10.9 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<HTML>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<title>Orange Widgets: Channels and Tokens</title>
</HEAD>
<BODY>

<H1>Channels and Tokens</H1>

<p>So far, our data sampler widget has been rather simple and linear
with respect to channels: it was designed to receive a token from one
widget and send an output token to another widget, as in the
example schema below:</p>

<img src="schemawithdatasamplerB.png">

<p>There is quite a bit more to channels and the management of tokens, and
in this section we will cover most of what you need to know to build more
complex widgets.</p>

<h2>Multi-Input Channels</h2>

<p>First, I do not like the name, but I can't come up with anything better. In
essence, the basic idea of "multi-input" channels is that they can
be connected to several output channels at once. That is, if a
widget supports such a channel, several widgets can feed their input
to that widget simultaneously.</p>

<p>Say we want to build a widget that takes a data set and tests
various predictive modelling techniques on it. The widget has to have an
input data channel, and we already know how to deal with this from our <a
href="settings.htm">previous</a> lesson. What is different is that we
also want to connect any number of widgets that define learners to our
testing widget, as in the schema below, where three different
learners are used:</p>

<img src="learningcurve.png">

<p>Here we will take a look at how to define the channels for a learning
curve widget, and how to manage its input tokens. But before we do,
just in brief: a learning curve is a way to test a machine learning
algorithm by seeing how its performance depends on the size of the
training set. For this, one can draw a smaller subset of the data,
learn the classifier, and test it on the remaining data. To do this in
a fair way (Salzberg, 1997), we perform k-fold cross-validation but use
only a proportion of the data for training. The output of the widget
should then look something like:</p>

<img src="learningcurve-output.png">
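
<p>The evaluation scheme described above can be sketched in plain
Python. This is not the <code>orngTest</code> code the widget actually
calls: the names <code>learning_curve</code> and
<code>majority_learner</code>, the data format (a list of
<code>(features, label)</code> pairs) and the use of classification
accuracy as the only score are assumptions made for illustration.</p>

<xmp class="code">import random
from collections import Counter

def learning_curve(data, learn, proportions, folds=10, seed=0):
    """Hypothetical sketch: k-fold cross-validation in which only a
    proportion p of each training set is used for learning.
    Returns a list of (p, classification accuracy) pairs."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    # assign examples to folds round-robin over the shuffled order
    fold_of = dict((idx, pos % folds) for pos, idx in enumerate(order))
    curve = []
    for p in proportions:
        correct = total = 0
        for k in range(folds):
            # training examples keep the shuffled order, so the first n
            # of them form a random subset of the training folds
            train = [data[i] for i in order if fold_of[i] != k]
            test = [data[i] for i in order if fold_of[i] == k]
            n = max(1, int(round(p * len(train))))
            classifier = learn(train[:n])
            correct += sum(1 for x, y in test if classifier(x) == y)
            total += len(test)
        curve.append((p, float(correct) / total))
    return curve

def majority_learner(train):
    """Toy learner: always predicts the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label
</xmp>

<p>For instance, <code>learning_curve(data, majority_learner, [0.1, 0.5, 1.0], folds=5)</code>
returns one accuracy per proportion; any function with the same
signature as <code>majority_learner</code> can be plugged in as the
learner.</p>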

<p>Now back to channels and tokens. The input channels of our
widget are defined by:</p>

<xmp class="code">self.inputs = [("Data", ExampleTable, self.dataset),
               ("Learner", orange.Learner, self.learner, 0)]
</xmp>

<p>Notice that everything is pretty much the same as with the
widgets from previous lessons, the only difference being "0" as the
last value in the tuple that defines the <code>Learner</code>
channel. This "0" says that this is a multi-input channel. If it were
"1" (the default), the widget could receive input from only one
widget.</p>
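
<p>To make the role of this flag concrete, here is a toy dispatcher in
plain Python. It is not Orange's signal manager; the class name
<code>ToySignalManager</code> and its <code>send</code> method are
hypothetical. It only assumes channel tuples shaped like the ones above,
<code>(name, type, handler)</code> with an optional fourth value that
defaults to 1 (single input), and shows how a multi-input channel could
pass the sender's id along with the token.</p>

<xmp class="code">class ToySignalManager(object):
    """Hypothetical dispatcher, not Orange's real signal manager."""

    def __init__(self, inputs):
        self.channels = {}
        for spec in inputs:
            name, _type, handler = spec[:3]
            single = spec[3] if len(spec) > 3 else 1  # 1 is the default
            self.channels[name] = (handler, single)

    def send(self, channel, token, sender_id):
        handler, single = self.channels[channel]
        if single:
            handler(token)             # single-input channel: token only
        else:
            handler(token, sender_id)  # multi-input channel: token and sender id

# usage: one handler per channel, recording what it receives
received = []

def on_data(data):
    received.append(("data", data))

def on_learner(learner, id=None):
    received.append(("learner", learner, id))

manager = ToySignalManager([("Data", list, on_data),
                            ("Learner", object, on_learner, 0)])
manager.send("Data", [1, 2, 3], sender_id="File")
manager.send("Learner", "naive bayes", sender_id="Naive Bayes widget")
# received now holds ("data", [1, 2, 3]) and
# ("learner", "naive bayes", "Naive Bayes widget")
</xmp>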

<p>How does the widget know which widget a token came from?
In Orange, tokens are sent around with the id of the widget that is
sending them (essentially, a pointer to the corresponding widget
object), and declaring a multi-input channel only tells Orange to pass
this id along with the token; the token and the id are the two
arguments with which the receiving function is called. For our
<code>Learner</code> channel the receiving function is
<code>learner</code>, which looks like the following:</p>

<xmp class="code">def learner(self, learner, id=None):
    ids = [x[0] for x in self.learners]
    if learner is None: # remove a learner and corresponding results
        if id not in ids:
            return # no such learner, removed before
        indx = ids.index(id)
        if self.data: # scores and curves are only kept while there is data
            for i in range(self.steps):
                self.curves[i].remove(indx)
            del self.scores[indx]
        del self.learners[indx]
        self.setTable()
    else:
        if id in ids: # update (already seen a learner from this source)
            indx = ids.index(id)
            self.learners[indx] = (id, learner)
            if self.data:
                curve = self.getLearningCurve([learner])
                score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                self.scores[indx] = score
                for i in range(self.steps):
                    self.curves[i].add(curve[i], 0, replace=indx)
        else: # add new learner
            self.learners.append((id, learner))
            if self.data:
                curve = self.getLearningCurve([learner])
                score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                self.scores.append(score)
                if len(self.curves):
                    for i in range(self.steps):
                        self.curves[i].add(curve[i], 0)
                else:
                    self.curves = curve
    if len(self.learners):
        self.infob.setText("%d learners on input." % len(self.learners))
    else:
        self.infob.setText("No learners.")
    self.commitBtn.setEnabled(len(self.learners) > 0)
    if self.data:
        self.setTable()
</xmp>

<p>OK, this looks like one long and complicated function. But be
patient! The learning curve widget is not the simplest widget there
is, so there's some extra code in the function above to manage the
information it handles in the appropriate way. To understand the
signals, though, you only need to understand the following. We store
the learners (objects that learn from data) in the list
<code>self.learners</code>. The list contains tuples with the id of the
widget that has sent the learner, and the learner itself. We could
store such information in a dictionary as well, but for this
particular widget the order of learners is important, and we thought
that a list is a more appropriate structure.</p>

<p>The function above first checks if the learner sent is empty
(<code>None</code>). Remember that sending an empty learner
essentially means that the link with the sending widget was removed,
hence we need to remove such a learner from our list. If a non-empty
learner was sent, then it is either a new learner (say, from a widget
we have just linked to our learning curve widget), or an updated
version of a previously sent learner. If the latter is the case, then
its id is already in our learners list, and we need to replace the
previous information on that learner. If a new learner was sent, the
case is somewhat simpler, and we just add this learner and its
learning curve to the corresponding variables that hold this
information.</p>
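
<p>Stripped of the curve and score updates, the bookkeeping described
above reduces to the following sketch. The class
<code>LearnerRegistry</code> and its <code>receive</code> method are
hypothetical names; only the rules come from the text: a
<code>None</code> token removes the sender's learner, a known sender id
updates its learner in place, and a new sender id appends to the list,
which keeps learners in their order of arrival.</p>

<xmp class="code">class LearnerRegistry(object):
    """Hypothetical helper mirroring the list handling in learner()."""

    def __init__(self):
        self.learners = []  # (sender_id, learner) tuples, in arrival order

    def receive(self, learner, sender_id):
        ids = [sid for sid, _ in self.learners]
        if learner is None:                 # the link was removed
            if sender_id in ids:
                del self.learners[ids.index(sender_id)]
        elif sender_id in ids:              # update from a known sender
            self.learners[ids.index(sender_id)] = (sender_id, learner)
        else:                               # learner from a new sender
            self.learners.append((sender_id, learner))

    def names(self):
        return [learner for _, learner in self.learners]

registry = LearnerRegistry()
registry.receive("bayes", 1)
registry.receive("tree", 2)
registry.receive("knn", 3)
registry.receive("pruned tree", 2)  # update keeps the position
registry.receive(None, 1)           # removal
registry.receive(None, 99)          # unknown sender: nothing to remove
# registry.names() == ["pruned tree", "knn"]
</xmp>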

<p>The function <code>learner</code> shown above is the most
complicated function in our learning curve widget. In fact, the rest
of the widget does some simple GUI management, and calls learning
curve routines from <a
href="/doc/modules/orngTest.htm">orngTest</a> and performance
scoring functions from <a
href="/doc/modules/orngStat.htm">orngStat</a>. I rather like
the ease with which new scoring functions are added to the widget, since
all that is needed is to extend the list</p>

<xmp class="code">self.scoring = [("Classification Accuracy", orngStat.CA),
                ("AUC", orngStat.AUC),
                ("BrierScore", orngStat.BrierScore),
                ("Information Score", orngStat.IS),
                ("Sensitivity", orngStat.sens),
                ("Specificity", orngStat.spec)]
</xmp>

<p>which is defined in the initialization part of the widget. Another
useful trick in this widget is that evaluation (k-fold
cross-validation) is carried out just once for a given learner, data
set and set of evaluation parameters, and scores are then derived from
the class probability estimates obtained from the evaluation
procedure. This means that switching from one scoring function to
another (and displaying the result in the table) takes only a split
second. To see the rest of the widget, check out <a
href="OWLearningCurveA.py">its code</a>.</p>
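
<p>The idea of evaluating once and re-scoring cheaply can be sketched
as a small cache. Everything here is a hypothetical stand-in: the
stored evaluation results are simplified to a list of <code>(true
label, class probabilities)</code> pairs, <code>ScoreCache</code> is
an invented name, and the two scoring functions only mimic entries of
<code>self.scoring</code> (the Brier score here is the plain multi-class
mean of squared differences, whose normalization may differ from
<code>orngStat.BrierScore</code>).</p>

<xmp class="code">def classification_accuracy(results):
    """Share of examples whose most probable class is the true one."""
    hits = sum(1 for true, probs in results
               if max(probs, key=probs.get) == true)
    return float(hits) / len(results)

def brier_score(results):
    """Mean squared difference between predicted and actual class indicators."""
    total = 0.0
    for true, probs in results:
        total += sum((p - (label == true)) ** 2 for label, p in probs.items())
    return total / len(results)

scoring = [("Classification Accuracy", classification_accuracy),
           ("Brier Score", brier_score)]

class ScoreCache(object):
    """Runs the expensive evaluation once per learner, then re-scores."""

    def __init__(self, evaluate, scoring):
        self.evaluate = evaluate  # expensive step, e.g. cross-validation
        self.scoring = scoring    # list of (name, function) pairs
        self.results = {}         # learner -> stored evaluation results

    def score(self, learner, scoring_index):
        if learner not in self.results:
            self.results[learner] = self.evaluate(learner)
        _name, function = self.scoring[scoring_index]
        return function(self.results[learner])

calls = []

def fake_evaluate(learner):
    """Stand-in for cross-validation; records each call."""
    calls.append(learner)
    return [("a", {"a": 0.8, "b": 0.2}),
            ("b", {"a": 0.6, "b": 0.4}),
            ("b", {"a": 0.1, "b": 0.9})]

cache = ScoreCache(fake_evaluate, scoring)
ca = cache.score("bayes", 0)  # 2 of 3 examples classified correctly
bs = cache.score("bayes", 1)  # switching the score does not re-evaluate
# calls == ["bayes"]
</xmp>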

<h2>Using Several Output Channels</h2>

<p>There is nothing new here; we just need a widget that has
several output channels of the same type to illustrate the idea of
default channels in the next section. For this purpose, we will modify
our sampling widget from previous lessons so that it sends the
sampled data to one channel, and all other data to another
channel. The corresponding channel definition of this widget
is:</p>

<xmp class="code">self.outputs = [("Sampled Data", ExampleTable), ("Other Data", ExampleTable)]
</xmp>

<p>We used this in the third incarnation of the <a
href="OWDataSamplerC.py">data sampler widget</a>, where essentially the
only other changes to the code are in the <code>selection</code> and
<code>commit</code> functions:</p>

<xmp class="code">def selection(self):
    # random indices: 0 for the sampled proportion, 1 for the rest
    indices = orange.MakeRandomIndices2(p0=self.proportion / 100.)
    ind = indices(self.dataset)
    self.sample = self.dataset.select(ind, 0)    # examples with index 0
    self.otherdata = self.dataset.select(ind, 1) # examples with index 1
    self.infob.setText('%d sampled instances' % len(self.sample))

def commit(self):
    # send each part of the data to its own output channel
    self.send("Sampled Data", self.sample)
    self.send("Other Data", self.otherdata)
</xmp>
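
<p>Outside Orange, the same split can be sketched in plain Python. The
function <code>split_sample</code> is a hypothetical stand-in for the
<code>MakeRandomIndices2</code> and <code>select</code> calls above,
not their implementation: it marks roughly <code>proportion</code>
percent of the examples with index 0 and the rest with index 1, then
returns the two groups, each in its original order.</p>

<xmp class="code">import random

def split_sample(data, proportion, seed=None):
    """Hypothetical stand-in for selection(): returns (sampled, other)."""
    rng = random.Random(seed)
    n_sampled = int(round(len(data) * proportion / 100.0))
    # index 0 marks a sampled example, index 1 an example from the rest
    indices = [0] * n_sampled + [1] * (len(data) - n_sampled)
    rng.shuffle(indices)
    sampled = [x for x, i in zip(data, indices) if i == 0]
    other = [x for x, i in zip(data, indices) if i == 1]
    return sampled, other

sample, otherdata = split_sample(list(range(100)), 30, seed=1)
# len(sample) == 30, len(otherdata) == 70, and together they cover all data
</xmp>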

<p>If a widget that has multiple channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. The
channel listed first in <code>self.outputs</code> is connected by
default. Hence, if we have just connected the Data Sampler
(C) widget to a Data Table widget as in the schema below:</p>

<img src="datasampler-totable.png">

<p>we get the following window, asking the user
which channels to connect:</p>

<img src="datasampler-channelquerry.png">
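
<p>Why the dialog appears can be illustrated with a toy compatibility
check. This is an assumption about the logic, not Orange Canvas code:
channels are simplified to <code>(name, type)</code> pairs,
<code>possible_links</code> is a hypothetical helper, and the Data
Table input channel name used below is made up. With two output
channels of the same type, there are two equally valid links, so the
choice is left to the user.</p>

<xmp class="code">def possible_links(outputs, inputs):
    """Hypothetical helper: all (output, input) name pairs whose types match.

    An output can feed an input of the same or a more general type."""
    return [(out_name, in_name)
            for out_name, out_type in outputs
            for in_name, in_type in inputs
            if issubclass(out_type, in_type)]

class ExampleTable(list):
    """Stand-in for orange.ExampleTable."""

sampler_outputs = [("Sampled Data", ExampleTable), ("Other Data", ExampleTable)]
table_inputs = [("Data", ExampleTable)]  # hypothetical Data Table input

links = possible_links(sampler_outputs, table_inputs)
needs_dialog = len(links) > 1
# links == [("Sampled Data", "Data"), ("Other Data", "Data")]; needs_dialog is True
</xmp>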

<h2>Default Channels (When Using Input Channels of the Same Type)</h2>

<p>Now, let's say we want to extend our learning curve widget so
that it does the learning the same way as it used to, but can,
provided that such a data set is given, test the learners (always) on
the same, external data set. That is, besides the training data set,
we need another channel of the same type, used for the test data
set. Notice, however, that most often we will only provide the
training data set, so we would not like to be bothered (in Orange
Canvas) with a dialog asking which channel to connect to, as the
training data set channel will be the default one.</p>

<p>When listing input channels of the same type, the non-default
channels have a special fifth value in the channel specification
tuple. So for our new <a href="OWLearningCurveB.py">learning curve</a>
widget, the channel specification is:</p>

<xmp class="code">self.inputs = [("Train Data", ExampleTable, self.trainset),
               ("Test Data", ExampleTable, self.testset, 1, 1),
               ("Learner", orange.Learner, self.learner, 0)]
</xmp>

<p>That is, the <code>Test Data</code> channel is a single-token
channel (the fourth value, "1") which is not a default one (the fifth
value, "1", indicates that this is a so-called minor channel). To test
how this works, connect a File widget to the learning curve widget
and, well, nothing will really happen:</p>

<img src="file-to-learningcurveb.png">

<p>That is, no window asking which channels to connect will open. To
find out which channels got connected, double-click on the green link
between the two widgets:</p>

<img src="file-to-learningcurveb-channels.png">
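
<p>The rule from this section can be summarized in a small sketch. It
assumes, following the text, that a missing fourth value means a
single-token channel and a missing fifth value means a default
(non-minor) channel; <code>parse_inputs</code>,
<code>default_inputs</code> and the stand-in classes are hypothetical,
and the real canvas logic may be more involved. When exactly one
default channel accepts a token type, there is nothing to ask the
user.</p>

<xmp class="code">from collections import namedtuple

InputChannel = namedtuple("InputChannel", "name type handler single minor")

def parse_inputs(specs):
    """Normalize (name, type, handler[, single[, minor]]) tuples."""
    channels = []
    for spec in specs:
        name, type_, handler = spec[:3]
        single = spec[3] if len(spec) > 3 else 1  # assumed default: single token
        minor = spec[4] if len(spec) > 4 else 0   # assumed default: not minor
        channels.append(InputChannel(name, type_, handler, single, minor))
    return channels

def default_inputs(channels, token_type):
    """Names of non-minor channels that accept tokens of token_type."""
    return [c.name for c in channels
            if issubclass(token_type, c.type) and not c.minor]

class ExampleTable(list):
    """Stand-in for orange.ExampleTable."""

class Learner(object):
    """Stand-in for orange.Learner."""

def ignore(*args):
    """Placeholder handler."""

inputs = parse_inputs([("Train Data", ExampleTable, ignore),
                       ("Test Data", ExampleTable, ignore, 1, 1),
                       ("Learner", Learner, ignore, 0)])
candidates = default_inputs(inputs, ExampleTable)
# candidates == ["Train Data"]: one default channel, so no dialog is needed
</xmp>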

</body>
</html>