source: orange/docs/extend-widgets/rst/channels.rst @ 11050:e3c4699ca155

Revision 11050:e3c4699ca155, 11.0 KB checked in by Miha Stajdohar <miha.stajdohar@…>, 16 months ago (diff)

Widget docs From HTML to Sphinx.

###################
Channels and Tokens
###################

Our data sampler widget was, as far as channels are concerned, rather
simple and linear: the widget was designed to receive a token from one
widget and send an output token to another widget, just like in the
example schema below:

.. image:: schemawithdatasamplerB.png

There is quite a bit more to channels and the management of tokens, and
in this section we will go over most of what you need to know to build
more complex widgets.

********************
Multi-Input Channels
********************

First, I do not like the name, but I can't come up with anything better. In
essence, a "multi-input" channel is an input channel that can be
connected to several output channels. That is, if a
widget supports such a channel, several widgets can feed their output
to that widget simultaneously.

Say we want to build a widget that takes a data set and tests
various predictive modelling techniques on it. The widget has to have an
input data channel, and we know how to deal with that from our
:doc:`previous <settings>` lesson. What is different this time is that we
want to connect any number of widgets that define learners to our
testing widget, just like in the schema below, where three different
learners are used:

.. image:: learningcurve.png

Here we will take a look at how we define the channels for a learning
curve widget, and how we manage its input tokens. But before we do,
a brief explanation: a learning curve shows how the performance of a
machine learning algorithm depends on the size of the training set.
To obtain it, one can draw a smaller subset of the data, learn the
classifier on it, and test it on the remaining data. To do this in a
fair way (following Salzberg, 1997), we perform k-fold cross validation
but use only a proportion of the data for training. The output of the
widget should then look something like:

.. image:: learningcurve-output.png

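The sampling scheme described above (k-fold cross validation in which only a proportion of each training fold is used) can be sketched in plain Python. This is a rough illustration and not the routine the widget actually calls; the function ``learning_curve_splits`` and all of its parameters are hypothetical names made up for this example.

```python
import random

def learning_curve_splits(n, folds=10, proportions=(0.25, 0.5, 1.0), seed=0):
    """Yield (proportion, fold, train indices, test indices) tuples.

    Hypothetical helper: for every fold, the test part stays fixed and
    only a random proportion of the remaining instances is used for training.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    for fold in range(folds):
        test = order[fold::folds]          # every folds-th instance
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        for p in proportions:
            size = max(1, int(round(p * len(train))))
            yield p, fold, rng.sample(train, size), test

for p, fold, train, test in learning_curve_splits(20, folds=4):
    if fold == 0:
        print("%3d%% of training data: %d train, %d test"
              % (p * 100, len(train), len(test)))
```

Each point on the learning curve then corresponds to one proportion, with the score averaged over the folds.
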
Now back to channels and tokens. The input channels for our
widget are defined by::

    self.inputs = [("Data", ExampleTable, self.dataset),
                   ("Learner", orange.Learner, self.learner, Multiple + Default)]

Notice that everything is pretty much the same as it was with
widgets from previous lessons, the only difference being
:obj:`Multiple + Default` as the last value in the tuple that defines
the :obj:`Learner` channel. :obj:`Multiple + Default` says
that this is a multi-input channel and that it is the default input for
its type. If the flag were left unspecified, the default value of
:obj:`Single + NonDefault` would be used. That would mean that the
widget can receive input from only one widget on this channel, and that
the channel is not the default input channel for its type (more on
default channels later).

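One way to picture how such flags combine is as bit masks. The sketch below is only an illustration: the numeric values and the helpers ``normalize`` and ``describe`` are assumptions made for this example, and the real constants are the ones Orange defines.

```python
# Illustrative bit values only (assumed for this sketch);
# the real constants are defined by Orange itself.
Single, Multiple, Default, NonDefault = 2, 4, 8, 16

def normalize(flags=0):
    """Fall back to Single and NonDefault for whatever was left unspecified."""
    if not flags & (Single | Multiple):
        flags |= Single
    if not flags & (Default | NonDefault):
        flags |= NonDefault
    return flags

def describe(flags=0):
    """Return a readable description of a (hypothetical) channel flag."""
    flags = normalize(flags)
    cardinality = "multi-input" if flags & Multiple else "single-input"
    default = "default" if flags & Default else "non-default"
    return "%s, %s" % (cardinality, default)

print(describe(Multiple + Default))  # multi-input, default
print(describe())                    # single-input, non-default
print(describe(Default))             # single-input, default
```

Because every flag occupies its own bit, adding two different flags gives the same result as OR-ing them.
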
How does the widget know which widget a token came from?
In Orange, tokens are sent around together with the id of the widget
that is sending the token (essentially, a pointer to the corresponding
widget object). Declaring a channel as multi-input tells Orange to
pass the id of the sending widget along with the token; these are the
two arguments with which the receiving function is called. For our
:obj:`Learner` channel the receiving function is :obj:`learner`, which
looks like this::

    def learner(self, learner, id=None):
        ids = [x[0] for x in self.learners]
        if not learner: # remove a learner and corresponding results
            if not ids.count(id):
                return # no such learner, removed before
            indx = ids.index(id)
            for i in range(self.steps):
                self.curves[i].remove(indx)
            del self.scores[indx]
            del self.learners[indx]
            self.setTable()
        else:
            if ids.count(id): # update (already seen a learner from this source)
                indx = ids.index(id)
                self.learners[indx] = (id, learner)
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores[indx] = score
                    for i in range(self.steps):
                        self.curves[i].add(curve[i], 0, replace=indx)
            else: # add new learner
                self.learners.append((id, learner))
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores.append(score)
                    if len(self.curves):
                        for i in range(self.steps):
                            self.curves[i].add(curve[i], 0)
                    else:
                        self.curves = curve
        if len(self.learners):
            self.infob.setText("%d learners on input." % len(self.learners))
        else:
            self.infob.setText("No learners.")
        self.commitBtn.setEnabled(len(self.learners))
        if self.data:
            self.setTable()

OK, this looks like one long and complicated function. But be
patient! Learning curve is not the simplest widget there is, so
there is some extra code in the function above to manage the
information it handles in the appropriate way. To understand the
signals, though, you only need to understand the following. We store
the learners (objects that learn from data) in the list
:obj:`self.learners`. The list contains tuples with the id of the
widget that has sent the learner, and the learner itself. We could
store this information in a dictionary as well, but for this
particular widget the order of learners is important, so we thought
that a list was the more appropriate structure.

The function above first checks if the learner sent is empty
(:obj:`None`). Remember that sending an empty learner
essentially means that the link with the sending widget was removed,
hence we need to remove that learner from our list. If a non-empty
learner was sent, then it is either a new learner (say, from a widget
we have just linked to our learning curve widget), or an updated
version of a previously sent learner. If the latter is the case, then
its id is already in the learners list, and we
need to replace the previous information on that learner. If a new learner
was sent, the case is somewhat simpler, and we just add this learner
and its learning curve to the corresponding variables that hold this
information.

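Stripped of everything that deals with curves, scores and the GUI, the bookkeeping described above comes down to three cases: remove, update, add. The following is a minimal sketch of just that logic; the class ``LearnerInputs`` is a hypothetical stand-in for the widget, and plain strings stand in for learner objects.

```python
class LearnerInputs(object):
    """Hypothetical stand-in for the part of the widget that tracks learners."""

    def __init__(self):
        # list of (sender id, learner) tuples, kept in order of arrival
        self.learners = []

    def learner(self, learner, id=None):
        ids = [x[0] for x in self.learners]
        if learner is None:
            # empty token: the link was removed, so forget that sender
            if id in ids:
                del self.learners[ids.index(id)]
        elif id in ids:
            # known sender: replace its learner, keeping its position
            self.learners[ids.index(id)] = (id, learner)
        else:
            # new sender: append to the end
            self.learners.append((id, learner))

inputs = LearnerInputs()
inputs.learner("naive bayes", id=1)
inputs.learner("tree", id=2)
inputs.learner("pruned tree", id=2)   # update from the same sender
inputs.learner(None, id=1)            # link with sender 1 removed
print(inputs.learners)                # [(2, 'pruned tree')]
```

Keeping a list rather than a dictionary preserves the order in which the learners were connected, which is what the widget relies on.
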
The function that handles :obj:`learners` as shown above is
the most complicated function in our learning curve widget. In fact,
the rest of the widget does some simple GUI management, and calls
learning curve routines from the testing module and performance
scoring functions from :obj:`orngStat`. I rather like
the ease with which new scoring functions are added to the widget, since
all that is needed is to extend the list ::

    self.scoring = [("Classification Accuracy", orngStat.CA),
                    ("AUC", orngStat.AUC),
                    ("BrierScore", orngStat.BrierScore),
                    ("Information Score", orngStat.IS),
                    ("Sensitivity", orngStat.sens),
                    ("Specificity", orngStat.spec)]

which is defined in the initialization part of the widget. The
other useful trick in this widget is that evaluation (k-fold cross
validation) is carried out just once for a given learner, data set and
set of evaluation parameters, and the scores are then derived from the
class probability estimates obtained from the evaluation procedure. This
essentially means that switching from one scoring function to another
(and displaying the result in the table) takes only a fraction of a
second. To see the rest of the widget, check out `its code <OWLearningCurveA.py>`_.

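The "evaluate once, score many times" idea can be illustrated without Orange. In the sketch below the stored evaluation results are simply pairs of the true class and the predicted class probabilities; the numbers are invented, and the two scoring functions are simplified stand-ins written for this example, not the ``orngStat`` implementations.

```python
# Cached evaluation results: (true class, predicted class probabilities).
# Toy numbers invented for illustration.
results = [
    (0, [0.9, 0.1]),
    (1, [0.4, 0.6]),
    (1, [0.7, 0.3]),
    (0, [0.6, 0.4]),
]

def classification_accuracy(results):
    """Share of instances whose most probable class is the true one."""
    correct = sum(1 for actual, probs in results
                  if probs.index(max(probs)) == actual)
    return correct / float(len(results))

def brier_score(results):
    """Mean squared difference between probabilities and the 0/1 truth."""
    total = 0.0
    for actual, probs in results:
        total += sum((p - (1.0 if c == actual else 0.0)) ** 2
                     for c, p in enumerate(probs))
    return total / len(results)

# Adding a scoring function means adding one (name, function) pair.
scoring = [("Classification Accuracy", classification_accuracy),
           ("Brier Score", brier_score)]

for name, score in scoring:
    print("%s: %.3f" % (name, score(results)))
```

Since every score is computed from the same stored results, switching between scoring functions never requires re-running the cross validation.
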
*****************************
Using Several Output Channels
*****************************

There is nothing new here, only that we need a widget that has
several output channels of the same type to illustrate the idea of
default channels in the next section. For this purpose, we will modify
our sampling widget as defined in previous lessons so that it
sends the sampled data to one channel, and all the other data to
another channel. The corresponding channel definition of this widget
is::

    self.outputs = [("Sampled Data", ExampleTable), ("Other Data", ExampleTable)]

We used this in the third incarnation of the `data sampler widget <OWDataSamplerC.py>`_,
where essentially the only other change in the code is in the :obj:`selection` and
:obj:`commit` functions::

    def selection(self):
        indices = orange.MakeRandomIndices2(p0=self.proportion / 100.)
        ind = indices(self.dataset)
        self.sample = self.dataset.select(ind, 0)
        self.otherdata = self.dataset.select(ind, 1)
        self.infob.setText('%d sampled instances' % len(self.sample))

    def commit(self):
        self.send("Sampled Data", self.sample)
        self.send("Other Data", self.otherdata)

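To see what the two ``select`` calls above achieve, here is a rough plain-Python imitation of the same split. The helpers ``make_random_indices2`` and ``select`` are hypothetical and only mimic the behaviour used in the widget: every instance gets index 0 or 1, and each output channel receives the instances with one of the two indices.

```python
import random

def make_random_indices2(n, p0, seed=0):
    """Return n indices: 0 for about p0 * n instances, 1 for the rest."""
    n0 = int(round(p0 * n))
    indices = [0] * n0 + [1] * (n - n0)
    random.Random(seed).shuffle(indices)
    return indices

def select(data, indices, value):
    """Keep the instances whose index equals the given value."""
    return [row for row, index in zip(data, indices) if index == value]

dataset = list(range(10))                 # stand-in for a data table
ind = make_random_indices2(len(dataset), p0=0.3)
sample = select(dataset, ind, 0)          # would go to "Sampled Data"
otherdata = select(dataset, ind, 1)       # would go to "Other Data"
print(len(sample), len(otherdata))
```

The two outputs are disjoint and together contain every instance of the input data exactly once.
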
If a widget that has multiple channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. The
channel listed first in :obj:`self.outputs` is connected by
default. Hence, if we have just connected the Data Sampler
(C) widget to a Data Table widget, as in the schema below:

.. image:: datasampler-totable.png

we would get the following window asking the user
which channels to connect:

.. image:: datasampler-channelquerry.png

*************************************************************
Default Channels (When Using Input Channels of the Same Type)
*************************************************************

Now, let's say we want to extend our learning curve widget so
that it does the learning the same way as it used to, but can,
provided that such a data set is given, always test the
learners on the same, external data set. That is, besides the
channel for the training data set, we need another channel of the same
type, used for the test data set. Notice, however, that most often we will only
provide the training data set, so we would not like to be bothered (in
Orange Canvas) with a dialog asking which channel to connect to; the
training data set channel will be the default one.

When listing several input channels of the same type, the default
channel is marked with a special flag in the channel specification list. So for
our new `learning curve <OWLearningCurveB.py>`_ widget, the
channel specification is::

    self.inputs = [("Train Data", ExampleTable, self.trainset, Default),
                   ("Test Data", ExampleTable, self.testset),
                   ("Learner", orange.Learner, self.learner, Multiple)]

That is, the :obj:`Train Data` channel is a single-token
channel which is the default one (the fourth element of its tuple). Note that the flags can
be added (or OR-ed) together, so :obj:`Default + Multiple` is a valid flag.
To test how this works, connect a File widget to a learning curve widget
and... nothing will really happen:

.. image:: file-to-learningcurveb.png

That is, no window asking which channels
to connect will open. To find out which channels got connected,
double click on the green link between the two widgets:

.. image:: file-to-learningcurveb-channels.png