source: orange/docs/extend-widgets/rst/channels.rst @ 11439:2a63a9963207

Revision 11439:2a63a9963207, 11.2 KB checked in by Ales Erjavec <ales.erjavec@…>, 12 months ago (diff)

Small fixes to widget development documentation.

###################
Channels and Tokens
###################

Our data sampler widget was, as far as channels are concerned, rather simple
and linear: it was designed to receive a token from one widget and send an
output token to another widget, just like in the example schema below:

.. image:: schemawithdatasamplerB.png

There is quite a bit more to channels and the management of tokens. This
section gives an overview of most of what you need to know to build more
complex widgets.

********************
Multi-Input Channels
********************

First, I do not like the name, but I can't come up with anything better. In
essence, the basic idea of a "multi-input" channel is that it can be
connected to several output channels at once. That is, if a widget supports
such a channel, several widgets can feed their output to that widget
simultaneously.

Say we want to build a widget that takes a data set and tests
various predictive modelling techniques on it. The widget has to have an
input data channel, which we know how to deal with from our
:doc:`previous <settings>` lesson. In addition, and this is the new part,
we want to connect any number of widgets that define learners to our
testing widget, just like in the schema below, where three different
learners are used:

.. image:: learningcurve.png

Here we take a look at how to define the channels for a learning
curve widget and how to manage its input tokens. But first, briefly:
a learning curve shows how the performance of a machine learning
algorithm depends on the size of the training set. To obtain it, one can
draw a smaller subset of the data, train the classifier on it, and test it
on the remaining data. To do this in a fair way (Salzberg, 1997), we perform
k-fold cross-validation but use only a proportion of the data for
training. The output of the widget should then look something
like:

.. image:: learningcurve-output.png

Now back to channels and tokens. The input channels for our
widget are defined by::

    self.inputs = [("Data", ExampleTable, self.dataset),
                   ("Learner", orange.Learner, self.learner, Multiple + Default)]

Notice that everything is pretty much the same as it was with
widgets from previous lessons, the only difference being
``Multiple + Default`` as the last value in the tuple that defines
the :obj:`Learner` channel. ``Multiple + Default`` says
that this is a multi-input channel and that it is the default input for its
type. If the flag were left unspecified, the default value
``Single + NonDefault`` would be used. That would mean that the
channel can receive input from only one widget and that it is not the
default input channel for its type (more on default channels later).

.. note:: The :obj:`Default` flag is used here for illustration. Since *Learner*
          is the only channel of type :class:`orange.Learner`,
          it is also the default.

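How the flags combine can be sketched in plain Python. The numeric flag values below are assumptions chosen for illustration (any distinct powers of two behave the same way), and ``parse_input`` is a hypothetical helper, not part of Orange:

```python
# Assumed flag values: distinct bits, so they can be added or OR-ed together.
Single, Multiple, Default, NonDefault = 2, 4, 8, 16


def parse_input(spec):
    """Expand a 3- or 4-element input tuple into a dictionary.

    A missing flag element falls back to Single + NonDefault.
    """
    name, type_, handler = spec[:3]
    flags = spec[3] if len(spec) > 3 else Single + NonDefault
    return {
        "name": name,
        "type": type_,
        "handler": handler,
        "multiple": bool(flags & Multiple),
        "default": bool(flags & Default),
    }


# Placeholder types and handlers stand in for the real Orange objects.
data = parse_input(("Data", list, None))
learner = parse_input(("Learner", object, None, Multiple + Default))
print(data["multiple"], data["default"])        # False False
print(learner["multiple"], learner["default"])  # True True
```

Because each flag occupies its own bit, adding two different flags gives the same result as OR-ing them.
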
How does the widget know which widget a token came from?
In Orange, tokens are sent around together with the id of the widget that is
sending them. Declaring a channel as multi-input tells Orange to
pass both the token and the id of the sending widget to the receiving
function. For our :obj:`Learner`
channel the receiving function is :obj:`learner`, which looks
like this::

    def learner(self, learner, id=None):
        ids = [x[0] for x in self.learners]
        if not learner: # remove a learner and corresponding results
            if not ids.count(id):
                return # no such learner, removed before
            indx = ids.index(id)
            for i in range(self.steps):
                self.curves[i].remove(indx)
            del self.scores[indx]
            del self.learners[indx]
            self.setTable()
        else:
            if ids.count(id): # update (already seen a learner from this source)
                indx = ids.index(id)
                self.learners[indx] = (id, learner)
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores[indx] = score
                    for i in range(self.steps):
                        self.curves[i].add(curve[i], 0, replace=indx)
            else: # add new learner
                self.learners.append((id, learner))
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores.append(score)
                    if len(self.curves):
                        for i in range(self.steps):
                            self.curves[i].add(curve[i], 0)
                    else:
                        self.curves = curve
        if len(self.learners):
            self.infob.setText("%d learners on input." % len(self.learners))
        else:
            self.infob.setText("No learners.")
        self.commitBtn.setEnabled(len(self.learners))
        if self.data:
            self.setTable()

OK, this looks like one long and complicated function. But be
patient! Learning curve is not the simplest widget there is, so
there is some extra code in the function above to manage the
information it handles in the appropriate way. To understand the
signals, though, you only need to understand the following. We store
the learners (objects that learn from data) in the list
:obj:`self.learners`. The list contains tuples with the id of the
widget that has sent the learner, and the learner itself. We could
store such information in a dictionary as well, but for this
particular widget the order of learners is important, so we thought
that a list was the more appropriate structure.

The function above first checks if the learner sent is empty
(:obj:`None`). Remember that sending an empty learner
essentially means that the link with the sending widget was removed,
hence we need to remove that learner from our list. If a non-empty
learner was sent, then it is either a new learner (say, from a widget
we have just linked to our learning curve widget) or an updated
version of a previously sent learner. In the latter case,
its id is already in the learners list, and we
need to replace the previous information on that learner. If a new learner
was sent, the case is somewhat simpler: we just add this learner
and its learning curve to the corresponding variables that hold this
information.

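Stripped of the curve and score bookkeeping, the three cases described above (remove, update, add) can be sketched as follows. ``LearnerInputs`` is a hypothetical stand-in class, and plain strings play the role of learners; this is an illustration of the id handling only, not the widget's actual code:

```python
class LearnerInputs(object):
    """Keeps (id, learner) pairs in the order in which they arrived."""

    def __init__(self):
        self.learners = []

    def learner(self, learner, id=None):
        ids = [x[0] for x in self.learners]
        if learner is None:
            # Link removed: forget the learner from this source, if known.
            if id in ids:
                del self.learners[ids.index(id)]
        elif id in ids:
            # Update: same source, new learner; keep its position in the list.
            self.learners[ids.index(id)] = (id, learner)
        else:
            # New source: append at the end.
            self.learners.append((id, learner))


w = LearnerInputs()
w.learner("bayes", id=1)
w.learner("tree", id=2)
w.learner("tree-pruned", id=2)   # update from widget 2
w.learner(None, id=1)            # widget 1 disconnected
print(w.learners)                # [(2, 'tree-pruned')]
```

A list keeps the arrival order of the learners, which is why the lookup goes through a separate list of ids rather than a dictionary.
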
The :obj:`learner` function shown above is
the most complicated function in our learning curve widget. In fact,
the rest of the widget does some simple GUI management and calls
learning curve routines from the testing module and performance
scoring functions from the statistics module. I rather like
the ease with which new scoring functions can be added to the widget, since
all that is needed is to extend the list::

    self.scoring = [("Classification Accuracy", orngStat.CA),
                    ("AUC", orngStat.AUC),
                    ("BrierScore", orngStat.BrierScore),
                    ("Information Score", orngStat.IS),
                    ("Sensitivity", orngStat.sens),
                    ("Specificity", orngStat.spec)]

which is defined in the initialization part of the widget. The
other useful trick in this widget is that evaluation (k-fold
cross-validation) is carried out just once for a given learner, data set and
set of evaluation parameters, and scores are then derived from the class
probability estimates obtained from the evaluation procedure. This
essentially means that switching from one scoring function to another
(and displaying the result in the table) takes only a fraction of a
second. To see the rest of the widget, check out
:download:`its code <OWLearningCurveA.py>`.

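The "evaluate once, score many times" idea can be sketched without Orange. Everything below is hypothetical: ``accuracy`` and ``error_rate`` stand in for the ``orngStat`` functions, ``evaluate`` stands in for the cross-validation step, and the result pairs are made up:

```python
def accuracy(results):
    """Fraction of (actual, predicted) pairs that match."""
    return sum(1 for a, p in results if a == p) / float(len(results))


def error_rate(results):
    return 1.0 - accuracy(results)


# Adding a new score only means adding a (name, function) pair here.
scoring = [("Classification Accuracy", accuracy),
           ("Error Rate", error_rate)]

evaluations = 0


def evaluate():
    """Stand-in for the expensive cross-validation step."""
    global evaluations
    evaluations += 1
    return [(1, 1), (0, 1), (1, 1), (0, 0)]   # made-up (actual, predicted)


cached = evaluate()              # carried out once
for name, score in scoring:      # switching scores reuses the cached results
    print("%s: %.2f" % (name, score(cached)))
print(evaluations)               # 1
```
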
*****************************
Using Several Output Channels
*****************************

There is nothing new here, only that we need a widget that has
several output channels of the same type to illustrate the idea of
default channels in the next section. For this purpose, we will modify
our sampling widget as defined in previous lessons so that it
sends the sampled data to one channel and all the remaining data to
another channel. The corresponding channel definition of this widget
is::

    self.outputs = [("Sampled Data", ExampleTable), ("Other Data", ExampleTable)]

We used this in the third incarnation of the :download:`data sampler widget <OWDataSamplerC.py>`,
where essentially the only other change in the code is in the :obj:`selection` and
:obj:`commit` functions::

    def selection(self):
        indices = orange.MakeRandomIndices2(p0=self.proportion / 100.)
        ind = indices(self.dataset)
        self.sample = self.dataset.select(ind, 0)
        self.otherdata = self.dataset.select(ind, 1)
        self.infob.setText('%d sampled instances' % len(self.sample))

    def commit(self):
        self.send("Sampled Data", self.sample)
        self.send("Other Data", self.otherdata)

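To see what the two ``select`` calls accomplish, here is a plain-Python sketch of the same split. ``make_random_indices2`` and ``select`` are hypothetical stand-ins written for this example; they only roughly mimic what the Orange calls above are used for:

```python
import random


def make_random_indices2(n, p0, rng):
    """Return a shuffled list of n zeros and ones; a proportion p0 are zeros."""
    n0 = int(round(p0 * n))
    indices = [0] * n0 + [1] * (n - n0)
    rng.shuffle(indices)
    return indices


def select(data, indices, value):
    """Keep the items whose index equals value."""
    return [d for d, i in zip(data, indices) if i == value]


dataset = list(range(10))
ind = make_random_indices2(len(dataset), 0.3, random.Random(0))
sample = select(dataset, ind, 0)       # would go to "Sampled Data"
otherdata = select(dataset, ind, 1)    # would go to "Other Data"
print(len(sample), len(otherdata))     # 3 7
```

The same index list drives both selections, so every instance ends up in exactly one of the two outputs.
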
If a widget that has multiple channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. The
channel mentioned first in :obj:`self.outputs` is connected by
default. Hence, if we have just connected the Data Sampler
(C) widget to a Data Table widget, as in the schema below:

.. image:: datasampler-totable.png

we would get the following window asking the user
which channels to connect:

.. image:: datasampler-channelquerry.png

*************************************************************
Default Channels (When Using Input Channels of the Same Type)
*************************************************************

Now, let's say we want to extend our learning curve widget so
that it does the learning the same way as it used to, but can,
provided that such a data set is defined, test the
learners (always) on the same, external data set. That is, besides the
training data set, we need another channel of the same type, but used
for the test data set. Notice, however, that most often we will only
provide the training data set, so we would not like to be bothered (in
Orange Canvas) with the dialog asking which channel to connect to, as the
training data set channel will be the default one.

When listing input channels of the same type, the default
channel has a special flag in the channel specification list. So for
our new :download:`learning curve <OWLearningCurveB.py>` widget, the
channel specification is::

    self.inputs = [("Train Data", ExampleTable, self.trainset, Default),
                   ("Test Data", ExampleTable, self.testset),
                   ("Learner", orange.Learner, self.learner, Multiple)]

That is, the :obj:`Train Data` channel is a single-token
channel which is the default one (fourth element of the tuple). Note that
the flags can be added (or OR-ed) together, so ``Default + Multiple`` is a
valid flag. To test how this works, connect a File widget to a learning
curve widget. Nothing special will happen:

.. image:: file-to-learningcurveb.png

That is, no window asking which channels
to connect will open. To find out which channels got connected,
double-click on the green link between the two widgets:

.. image:: file-to-learningcurveb-channels.png
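
The effect of the ``Default`` flag can be illustrated with a small sketch. This is only a guess at a selection rule that matches the behaviour described above, not Orange Canvas's actual logic; the flag values, the placeholder classes and ``pick_input`` are all assumptions made for this example:

```python
Single, Multiple, Default, NonDefault = 2, 4, 8, 16   # assumed values


class ExampleTable(object):
    """Placeholder for the Orange data table type."""


class Learner(object):
    """Placeholder for the Orange learner type."""


def pick_input(inputs, token_type):
    """Return the name of the input channel to connect, or None if ambiguous."""
    candidates = [spec for spec in inputs if issubclass(token_type, spec[1])]
    if len(candidates) == 1:
        return candidates[0][0]
    defaults = [spec for spec in candidates
                if len(spec) > 3 and spec[3] & Default]
    if len(defaults) == 1:
        return defaults[0][0]
    return None   # no unique choice: the user would have to decide


inputs = [("Train Data", ExampleTable, None, Default),
          ("Test Data", ExampleTable, None),
          ("Learner", Learner, None, Multiple)]
print(pick_input(inputs, ExampleTable))   # Train Data
print(pick_input(inputs, Learner))        # Learner
```

Without the ``Default`` flag on ``Train Data``, the two ``ExampleTable`` inputs would be indistinguishable and the sketch would return ``None``, which corresponds to the case where the user is asked to choose.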