source: orange/Orange/doc/extend-widgets/channels.htm @ 10997:e82a939c44ce

Revision 10997:e82a939c44ce, 11.2 KB checked in by Ales Erjavec <ales.erjavec@…>, 18 months ago (diff)

Updated channel specification flags documentation in 'extend-widgets'.

<HTML>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<title>Orange Widgets: Channels and Tokens</title>
</HEAD>
<BODY>

<H1>Channels and Tokens</H1>

<p>With respect to channels, our data sampler widget was rather simple
and linear: it was designed to receive a token from one widget and
send an output token to another widget, just like in the example
schema below:</p>

<img src="schemawithdatasamplerB.png">

<p>There is quite a bit more to channels and the management of tokens.
This section gives an overview of most of what you need to know to
build more complex widgets.</p>

<h2>Multi-Input Channels</h2>

<p>First, I do not like the name, but I can't come up with anything
better. In essence, a "multi-input" channel is one that can be
connected to several output channels at once. That is, if a widget
supports such a channel, several widgets can feed their output to that
widget simultaneously.</p>

<p>Say we want to build a widget that takes a data set and tests
various predictive modelling techniques on it. The widget has to have
an input data channel, and we know how to deal with this from our <a
href="settings.htm">previous</a> lesson. What is different is that we
want to connect any number of widgets that define learners to our
testing widget, just like in the schema below, where three different
learners are used:</p>

<img src="learningcurve.png">

<p>Here we will take a look at how to define the channels for a
learning curve widget and how to manage its input tokens. But first,
briefly: a learning curve shows how the performance of a machine
learning algorithm depends on the size of the training set. To obtain
it, one can draw a smaller subset of the data, learn a classifier on
it, and test it on the remaining data. To do this in a just way
(by Salzberg, 1997), we perform k-fold cross validation but use only a
proportion of the data for training. The output of the widget should
then look something like:</p>

<img src="learningcurve-output.png">

<p>Now back to channels and tokens. The input channels for our
widget are defined by:</p>

<xmp class="code">self.inputs = [("Data", ExampleTable, self.dataset),
               ("Learner", orange.Learner, self.learner, Multiple + Default)]
</xmp>

<p>Notice that everything is pretty much the same as it was with
widgets from previous lessons, the only difference being
<code>Multiple + Default</code> as the last value in the tuple that defines
the <code>Learner</code> channel. <code>Multiple + Default</code> says
that this is a multi-input channel and that it is the default input for its
type. If the flag were left unspecified, the default value of
<code>Single + NonDefault</code> would be used. That would mean that the
channel can receive input from only one widget and is not the default input
channel for its type (more on default channels later).</p>
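
<p>To make the flag arithmetic concrete, here is a minimal stand-alone
sketch of how such a channel tuple could be interpreted. It does not use
Orange at all: the numeric flag values and the helper
<code>describe_input</code> are assumptions made for illustration only,
and channel types are written as plain strings.</p>

```python
# Illustrative flag values (assumed); the real constants are defined by Orange.
Single, Multiple, Default, NonDefault = 2, 4, 8, 16


def describe_input(spec):
    """Return (name, is_multiple, is_default) for an input channel tuple.

    Hypothetical helper, not part of Orange.
    """
    name = spec[0]
    # The flag is the optional fourth element; fall back to Single + NonDefault.
    flags = spec[3] if len(spec) > 3 else Single + NonDefault
    return name, bool(flags & Multiple), bool(flags & Default)


# Handlers are replaced by None, types by strings, to keep the sketch self-contained.
inputs = [("Data", "ExampleTable", None),
          ("Learner", "Learner", None, Multiple + Default)]
described = [describe_input(spec) for spec in inputs]
```

<p>Because each flag occupies its own bit in this sketch, adding the
flags and OR-ing them give the same result.</p>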

<p>How does the widget know which widget a token came from? In Orange,
tokens are sent around together with the id of the widget that is
sending them (essentially, a pointer to the corresponding widget
object). Declaring a channel as multi-input tells Orange to pass both
the token and the id of the sending widget as the two arguments of the
receiving function. For our <code>Learner</code> channel the receiving
function is <code>learner</code>, and it looks like this:</p>

<xmp class="code">def learner(self, learner, id=None):
    ids = [x[0] for x in self.learners]
    if not learner: # remove a learner and corresponding results
        if not ids.count(id):
            return # no such learner, removed before
        indx = ids.index(id)
        for i in range(self.steps):
            self.curves[i].remove(indx)
        del self.scores[indx]
        del self.learners[indx]
        self.setTable()
    else:
        if ids.count(id): # update (already seen a learner from this source)
            indx = ids.index(id)
            self.learners[indx] = (id, learner)
            if self.data:
                curve = self.getLearningCurve([learner])
                score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                self.scores[indx] = score
                for i in range(self.steps):
                    self.curves[i].add(curve[i], 0, replace=indx)
        else: # add new learner
            self.learners.append((id, learner))
            if self.data:
                curve = self.getLearningCurve([learner])
                score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                self.scores.append(score)
                if len(self.curves):
                    for i in range(self.steps):
                        self.curves[i].add(curve[i], 0)
                else:
                    self.curves = curve
    if len(self.learners):
        self.infob.setText("%d learners on input." % len(self.learners))
    else:
        self.infob.setText("No learners.")
    self.commitBtn.setEnabled(len(self.learners))
    if self.data:
        self.setTable()
</xmp>

<p>OK, this looks like one long and complicated function. But be
patient! Learning curve is not the simplest widget there is, so
the function above contains some extra code to manage the
information it handles. To understand the signals, though, you only
need to understand the following. We store the learners (objects that
learn from data) in the list <code>self.learners</code>. The list
contains tuples with the id of the widget that sent the learner and
the learner itself. We could store this information in a dictionary
as well, but for this particular widget the order of learners is
important, so we decided that a list is the more appropriate
structure.</p>

<p>The function above first checks whether the learner sent is empty
(<code>None</code>). Remember that sending an empty learner
essentially means that the link with the sending widget was removed,
hence we need to remove that learner from our list. If a non-empty
learner was sent, then it is either a new learner (say, from a widget
we have just linked to our learning curve widget) or an updated
version of a previously sent learner. In the latter case, its id is
already in the learners list, and we need to replace the previous
information on that learner. If a new learner was sent, the case is
somewhat simpler: we just add the learner and its learning curve to
the corresponding variables that hold this information.</p>
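
<p>Stripped of everything specific to learning curves, the three cases
described above (remove, update, add) can be sketched as follows. This is
a simplified illustration rather than the widget's actual code: the class
<code>LearnerInputs</code> and its method <code>receive</code> are
hypothetical names, and strings stand in for learner objects.</p>

```python
class LearnerInputs(object):
    """Hypothetical stand-in for the bookkeeping part of a multi-input handler."""

    def __init__(self):
        # List of (sender_id, learner) tuples, kept in arrival order.
        self.learners = []

    def receive(self, learner, sender_id=None):
        ids = [entry[0] for entry in self.learners]
        if learner is None:
            # Empty token: the link was removed, so forget that sender.
            if sender_id in ids:
                del self.learners[ids.index(sender_id)]
        elif sender_id in ids:
            # Known sender: replace its learner, keeping its position.
            self.learners[ids.index(sender_id)] = (sender_id, learner)
        else:
            # New sender: append at the end.
            self.learners.append((sender_id, learner))


inputs = LearnerInputs()
inputs.receive("bayes", sender_id=1)
inputs.receive("tree", sender_id=2)
inputs.receive("pruned tree", sender_id=2)  # update from an already known sender
inputs.receive(None, sender_id=1)           # link to sender 1 removed
```

<p>After these four calls only the updated learner from sender 2 remains
in the list.</p>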

<p>The function that handles the learners, as shown above, is
the most complicated function in our learning curve widget. In fact,
the rest of the widget does some simple GUI management and calls
learning curve routines from <a
href="/doc/modules/orngTest.htm">orngTest</a> and performance
scoring functions from <a
href="/doc/modules/orngStat.htm">orngStat</a>. I rather like
the ease with which new scoring functions are added to the widget, since
all that is needed is to extend the list</p>

<xmp class="code">self.scoring = [("Classification Accuracy", orngStat.CA),
                ("AUC", orngStat.AUC),
                ("BrierScore", orngStat.BrierScore),
                ("Information Score", orngStat.IS),
                ("Sensitivity", orngStat.sens),
                ("Specificity", orngStat.spec)]
</xmp>

<p>which is defined in the initialization part of the widget. The
other useful trick in this widget is that evaluation (k-fold cross
validation) is carried out just once for a given learner, data set and
set of evaluation parameters, and the scores are then derived from the
class probability estimates obtained from the evaluation procedure.
This means that switching from one scoring function to another
(and displaying the result in the table) takes only a fraction of a
second. To see the rest of the widget, check out <a
href="OWLearningCurveA.py">its code</a>.</p>
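
<p>The idea of evaluating once and scoring many times can be sketched
without Orange. In the illustration below the cached results are a plain
list of (actual, predicted) pairs and the two scoring functions are
simple stand-ins; none of these names come from <code>orngStat</code>.</p>

```python
def accuracy(results):
    """Proportion of (actual, predicted) pairs that agree."""
    correct = sum(1 for actual, predicted in results if actual == predicted)
    return correct / float(len(results))


def error_rate(results):
    return 1.0 - accuracy(results)


# A list of (label, function) pairs; adding a score means adding one entry.
scoring = [("Classification Accuracy", accuracy),
           ("Error Rate", error_rate)]

# Pretend this came out of one expensive cross-validation run.
cached_results = [(1, 1), (0, 1), (1, 1), (0, 0)]


def score(cached, scoring_index):
    """Apply the selected scoring function to results that are already computed."""
    name, function = scoring[scoring_index]
    return name, function(cached)
```

<p>Switching the selected index only re-runs the cheap scoring function;
the cached results are left untouched.</p>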

<h2>Using Several Output Channels</h2>

<p>There's nothing new here, except that we need a widget with
several output channels of the same type to illustrate the idea of
default channels in the next section. For this purpose, we will modify
the sampling widget defined in previous lessons so that it sends
the sampled data to one channel and all other data to
another channel. The corresponding channel definition of this widget
is:</p>

<xmp class="code">self.outputs = [("Sampled Data", ExampleTable), ("Other Data", ExampleTable)]
</xmp>

<p>We used this in the third incarnation of the <a
href="OWDataSamplerC.py">data sampler widget</a>. Essentially the
only other change in the code is in the <code>selection</code> and
<code>commit</code> functions:</p>

<xmp class="code">def selection(self):
    indices = orange.MakeRandomIndices2(p0=self.proportion / 100.)
    ind = indices(self.dataset)
    self.sample = self.dataset.select(ind, 0)
    self.otherdata = self.dataset.select(ind, 1)
    self.infob.setText('%d sampled instances' % len(self.sample))

def commit(self):
    self.send("Sampled Data", self.sample)
    self.send("Other Data", self.otherdata)
</xmp>
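
<p>For readers without Orange at hand, the split performed in
<code>selection</code> can be imitated on a plain list. This is only a
rough sketch of the idea: the function <code>split_by_indices</code> is
hypothetical, and the way it draws the 0/1 indices is an assumption that
merely resembles what <code>MakeRandomIndices2</code> and
<code>select</code> do.</p>

```python
import random


def split_by_indices(data, proportion, seed=0):
    """Split data into (sample, other) using a shuffled list of 0/1 indices."""
    rng = random.Random(seed)
    n_sampled = int(round(len(data) * proportion))
    # 0 marks a sampled row, 1 marks the rest, mirroring select(ind, 0) / select(ind, 1).
    indices = [0] * n_sampled + [1] * (len(data) - n_sampled)
    rng.shuffle(indices)
    sample = [row for row, ind in zip(data, indices) if ind == 0]
    other = [row for row, ind in zip(data, indices) if ind == 1]
    return sample, other


sample, other = split_by_indices(list(range(10)), 0.3)
```

<p>The two lists are disjoint and together contain every row, which is
what makes it sensible to send them to two separate output channels.</p>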

<p>If a widget that has multiple channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. The
channel listed first in <code>self.outputs</code> is connected by
default. Hence, if we have just connected the Data Sampler
(C) widget to a Data Table widget, as in the schema below:</p>

<img src="datasampler-totable.png">

<p>we would get the following window asking which channels to
connect:</p>

<img src="datasampler-channelquerry.png">

<h2>Default Channels (When Using Input Channels of the Same Type)</h2>

<p>Now, let's say we want to extend our learning curve widget so
that it does the learning the same way as before but can,
provided that such a data set is given, always test the
learners on the same, external data set. That is, besides the
training data set, we need another channel of the same type, used
for the test data set. Notice, however, that most often we will only
provide the training data set, so we would not like to be bothered (in
Orange Canvas) with a dialog asking which channel to connect to; the
training data set channel should be the default one.</p>

<p>When listing several input channels of the same type, the default
channel is marked with a special flag in the channel specification list. So for
our new <a href="OWLearningCurveB.py">learning curve</a> widget, the
channel specification is:</p>

<xmp class="code">self.inputs = [("Train Data", ExampleTable, self.trainset, Default),
               ("Test Data", ExampleTable, self.testset),
               ("Learner", orange.Learner, self.learner, Multiple)]
</xmp>

<p>That is, the <code>Train Data</code> channel is a single-token
channel that is marked as the default one (the fourth element of its tuple).
Note that the flags can be added (or OR-ed) together, so
<code>Default + Multiple</code> is a valid flag.
To test how this works, connect a File widget to a learning curve widget and
nothing much will happen:</p>

<img src="file-to-learningcurveb.png">

<p>That is, no window asking which channels
to connect will open. To find out which channels got connected,
double click on the green link between the two widgets:</p>

<img src="file-to-learningcurveb-channels.png">
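
<p>The effect of the <code>Default</code> flag can be illustrated with a
small stand-alone sketch. It is not the logic Orange Canvas actually
uses: the function <code>pick_input</code>, the numeric flag values and
the string type names are all assumptions made for the example.</p>

```python
# Illustrative flag values (assumed); the real constants are defined by Orange.
Single, Multiple, Default, NonDefault = 2, 4, 8, 16


def pick_input(inputs, token_type):
    """Return the name of the channel to connect, or None if the user must choose."""
    candidates = [spec for spec in inputs if spec[1] == token_type]
    if len(candidates) == 1:
        return candidates[0][0]
    # A channel tuple without a fourth element counts as NonDefault.
    defaults = [spec for spec in candidates
                if len(spec) > 3 and spec[3] & Default]
    if len(defaults) == 1:
        return defaults[0][0]
    return None


# Handlers are replaced by None, types by strings, to keep the sketch self-contained.
inputs = [("Train Data", "ExampleTable", None, Default),
          ("Test Data", "ExampleTable", None),
          ("Learner", "Learner", None, Multiple)]
```

<p>With this specification a data token is routed to
<code>Train Data</code> without any question, while two data channels
with no default among them would leave the choice to the user.</p>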

</body>
</html>