.. Source: orange/docs/extend-widgets/rst/channels.rst @ 11593:6edc44eb9655
   Revision 11593:6edc44eb9655, checked in by Ales Erjavec <ales.erjavec@…>:
   "Updated Widget development tutorial."

###################
Channels and Tokens
###################

Our data sampler widget was, as far as channels go, rather simple
and linear: it was designed to receive a token from one
widget and send an output token to another widget, just as in the
example schema below:

.. image:: schemawithdatasamplerB.png

There is quite a bit more to channels and the management of tokens, and
in this section we will go over most of what you need to know to build
more complex widgets.

********************
Multi-Input Channels
********************

In essence, a "multi-input" channel is one that can be connected to
several output channels at once. That is, if a widget supports such a
channel, several widgets can feed their output to that widget
simultaneously.

Say we want to build a widget that takes a data set and tests
various predictive modeling techniques on it. The widget has to have an
input data channel, and this we know how to deal with from our
:doc:`previous <settings>` lesson. But, somewhat differently, we
want to connect any number of widgets which define learners to our
testing widget, just as in the schema below, where three different
learners are used:

.. image:: learningcurve.png

Here we will look at how we define the channels for a learning
curve widget and how we manage its input tokens. But first,
briefly: a learning curve is something you can use to test
a machine learning algorithm, to see how its performance
depends on the size of the training set. For this, one can draw a
smaller subset of the data, train the classifier on it, and test it on the
remaining data. To do this in a fair way (Salzberg, 1997), we perform
k-fold cross-validation but use only a proportion of the data for
training. The output of the widget should then look something
like this:

.. image:: learningcurve-output.png

Now back to channels and tokens. The input channels for our
widget are defined by::

    self.inputs = [("Data", Orange.data.Table, self.dataset),
                   ("Learner", Orange.classification.Learner,
                    self.learner, Multiple + Default)]

Notice that everything is pretty much the same as it was with
widgets from previous lessons, the only difference being
``Multiple + Default`` as the last value in the tuple that defines
the :obj:`Learner` channel. ``Multiple + Default`` says
that this is a multi-input channel and is the default input for its type.
If the flag were left unspecified, the default value
``Single + NonDefault`` would be used. That would mean that the
widget can receive input from only one widget and that the channel is not
the default input channel for its type (more on default channels later).

.. note::
   The :obj:`Default` flag is used here for illustration. Since *"Learner"*
   is the only channel for the :class:`Orange.classification.Learner`
   type, it is also the default.

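How such flags combine can be sketched in plain Python. The numeric
values below are assumptions made for this illustration only (the real
constants are defined by Orange and may differ), and ``describe`` is a
hypothetical helper, not part of the widget API.

```python
# Hypothetical bit values, chosen only for illustration; the real constants
# are defined by Orange and may differ.
Single, Multiple = 1, 2
NonDefault, Default = 4, 8


def describe(flags=Single + NonDefault):
    """Return (is_multiple, is_default) for a combined channel flag."""
    return bool(flags & Multiple), bool(flags & Default)


# Each flag occupies its own bit, so adding and OR-ing give the same result.
multi_default = describe(Multiple + Default)
unspecified = describe()  # falls back to Single + NonDefault
```

With distinct bits per flag, ``Multiple + Default`` and
``Multiple | Default`` describe the same channel.
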
How does the widget know which widget a token came from?
In Orange, tokens are sent around together with the id of the widget that
sends them, and declaring a channel as multi-input simply tells Orange to
pass both the token and the id of the sending widget as the two arguments
of the receiving function. For our *"Learner"*
channel the receiving function is :func:`learner`, which looks
like this::

    def learner(self, learner, id=None):
        ids = [x[0] for x in self.learners]
        if not learner: # remove a learner and corresponding results
            if not ids.count(id):
                return # no such learner, removed before
            indx = ids.index(id)
            for i in range(self.steps):
                self.curves[i].remove(indx)
            del self.scores[indx]
            del self.learners[indx]
            self.setTable()
        else:
            if ids.count(id): # update (already seen a learner from this source)
                indx = ids.index(id)
                self.learners[indx] = (id, learner)
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores[indx] = score
                    for i in range(self.steps):
                        self.curves[i].add(curve[i], 0, replace=indx)
            else: # add new learner
                self.learners.append((id, learner))
                if self.data:
                    curve = self.getLearningCurve([learner])
                    score = [self.scoring[self.scoringF][1](x)[0] for x in curve]
                    self.scores.append(score)
                    if len(self.curves):
                        for i in range(self.steps):
                            self.curves[i].add(curve[i], 0)
                    else:
                        self.curves = curve
        if len(self.learners):
            self.infob.setText("%d learners on input." % len(self.learners))
        else:
            self.infob.setText("No learners.")
        self.commitBtn.setEnabled(len(self.learners))
        if self.data:
            self.setTable()

OK, this looks like one long and complicated function. But be
patient! Learning curve is not the simplest widget there is, so
there is some extra code in the function above to manage the
information it handles in the appropriate way. To understand the
signals, though, you only need to understand the following. We store
the learners (objects that learn from data) in the list
:obj:`self.learners`. The list contains tuples with the id of the
widget that sent the learner, and the learner itself. We could
store this information in a dictionary as well, but for this
particular widget the order of learners is important, so we decided
that a list is the more appropriate structure.

The function above first checks whether the learner sent is empty
(:obj:`None`). Remember that sending an empty learner
essentially means that the link with the sending widget was removed,
hence we need to remove that learner from our list. If a non-empty
learner was sent, then it is either a new learner (say, from a widget
we have just linked to our learning curve widget) or an updated
version of a previously sent learner. In the latter case,
its id is already in the learners list, and we
need to replace the previous information on that learner. If a new learner
was sent, the case is somewhat simpler: we just add this learner
and its learning curve to the corresponding variables that hold this
information.

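Stripped of the curve and score handling, the bookkeeping described
above comes down to three cases: remove, update, and add. The following
is a minimal sketch of just that part. ``LearnerRegistry`` and
``receive`` are hypothetical names introduced for this illustration, and
the sketch tests for :obj:`None` explicitly where the widget uses
``if not learner``.

```python
class LearnerRegistry(object):
    """Hypothetical helper: ordered (id, learner) bookkeeping for a
    multi-input channel, without any GUI or evaluation code."""

    def __init__(self):
        # List of (sender id, learner) tuples, kept in arrival order.
        self.learners = []

    def receive(self, learner, id=None):
        """Handle one token; return "removed", "updated", "added" or "ignored"."""
        ids = [x[0] for x in self.learners]
        if learner is None:
            # The link was removed: forget the learner from that sender.
            if id not in ids:
                return "ignored"  # no such learner, removed before
            del self.learners[ids.index(id)]
            return "removed"
        if id in ids:
            # Already seen a learner from this sender: replace it in place.
            self.learners[ids.index(id)] = (id, learner)
            return "updated"
        self.learners.append((id, learner))
        return "added"
```

Because an update replaces the tuple at its existing index, the order of
learners stays stable, which is the reason a list was preferred over a
dictionary.
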
The function that handles :obj:`learners` as shown above is
the most complicated function in our learning curve widget. In fact,
the rest of the widget does some simple GUI management, and calls
learning curve routines from the testing module and performance
scoring functions from the scoring module. I rather like
the ease with which new scoring functions are added to the widget, since
all that is needed is to extend the list::

    self.scoring = [("Classification Accuracy", Orange.evaluation.scoring.CA),
                    ("AUC", Orange.evaluation.scoring.AUC),
                    ("BrierScore", Orange.evaluation.scoring.Brier_score),
                    ("Information Score", Orange.evaluation.scoring.IS),
                    ("Sensitivity", Orange.evaluation.scoring.Sensitivity),
                    ("Specificity", Orange.evaluation.scoring.Specificity)]

which is defined in the initialization part of the widget. The
other useful trick in this widget is that evaluation (k-fold cross
validation) is carried out just once for a given learner, data set and
set of evaluation parameters, and scores are then derived from the class
probability estimates obtained from the evaluation procedure. This
essentially means that switching from one scoring function to another
(and displaying the result in the table) takes only a fraction of a
second. To see the rest of the widget, check out
:download:`its code <OWLearningCurveA.py>`.

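The "evaluate once, score many times" idea can be sketched without
Orange. Everything below is an assumption made for illustration: the
stored results are invented numbers, and the two scoring functions are
simplified stand-ins, not the implementations in
``Orange.evaluation.scoring``.

```python
def classification_accuracy(results):
    """Fraction of cases where the most probable class is the true class."""
    hits = sum(1 for actual, probs in results
               if max(range(len(probs)), key=probs.__getitem__) == actual)
    return hits / float(len(results))


def brier_score(results):
    """Mean, over cases, of the summed squared differences between the
    predicted class probabilities and the 0/1 truth."""
    total = 0.0
    for actual, probs in results:
        total += sum((p - (1.0 if c == actual else 0.0)) ** 2
                     for c, p in enumerate(probs))
    return total / len(results)


# Stored once, as if produced by a single (expensive) cross-validation run:
# each entry is (true class index, predicted class probabilities).
stored_results = [(0, [0.8, 0.2]), (1, [0.4, 0.6]),
                  (1, [0.7, 0.3]), (0, [0.9, 0.1])]

# Same shape as the widget's list: (name, scoring function) pairs.
scoring = [("Classification Accuracy", classification_accuracy),
           ("Brier Score", brier_score)]

# Switching the scoring function only re-reads the stored results.
scores = dict((name, score(stored_results)) for name, score in scoring)
```

Adding another score then amounts to appending one more
``(name, function)`` pair to the list; no re-evaluation is needed.
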
*****************************
Using Several Output Channels
*****************************

There is nothing new here; we only need a widget that has
several output channels of the same type to illustrate the idea of
default channels in the next section. For this purpose, we will modify
our sampling widget from the previous lessons so that it
sends the sampled data to one channel and all the remaining data to
another channel. The corresponding channel definition of this widget
is::

    self.outputs = [("Sampled Data", Orange.data.Table),
                    ("Other Data", Orange.data.Table)]

We used this in the third incarnation of the :download:`data sampler widget <OWDataSamplerC.py>`;
essentially the only other changes in the code are in the :func:`selection` and
:func:`commit` functions::

    def selection(self):
        indices = Orange.data.sample.SubsetIndices2(p0=self.proportion / 100.)
        ind = indices(self.dataset)
        self.sample = self.dataset.select(ind, 0)
        self.otherdata = self.dataset.select(ind, 1)
        self.infob.setText('%d sampled instances' % len(self.sample))

    def commit(self):
        self.send("Sampled Data", self.sample)
        self.send("Other Data", self.otherdata)

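The split performed in :func:`selection` can be mimicked with the
standard library alone. This is a sketch under the assumption, suggested
by the code above, that the index list holds a 0 for every sampled
instance and a 1 for every other instance. ``subset_indices`` and
``select`` are hypothetical stand-ins, not Orange functions.

```python
import random


def subset_indices(n, p0, seed=0):
    """Return a shuffled list of n zeros and ones with a proportion p0 of zeros."""
    n_zeros = int(round(n * p0))
    indices = [0] * n_zeros + [1] * (n - n_zeros)
    random.Random(seed).shuffle(indices)
    return indices


def select(data, indices, value):
    """Keep the items whose index entry equals value."""
    return [item for item, ind in zip(data, indices) if ind == value]


dataset = list(range(10))
ind = subset_indices(len(dataset), p0=0.3)
sample = select(dataset, ind, 0)     # would go to "Sampled Data"
otherdata = select(dataset, ind, 1)  # would go to "Other Data"
```

Since both outputs are derived from the same index list, every instance
ends up on exactly one of the two channels.
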
If a widget that has several channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. Hence,
if we have just connected the *Data Sampler (C)* widget to a Data Table
widget in the schema below:

.. image:: datasampler-totable.png

we get the following window, asking the user
which channels to connect:

.. image:: datasampler-channelquerry.png

*************************************************************
Default Channels (When Using Input Channels of the Same Type)
*************************************************************

Now, let's say we want to extend our learning curve widget so
that it does the learning the same way as it used to, but can,
provided that such a data set is given, always test the
learners on the same, external data set. That is, besides the
training data set, we need another channel of the same type, but used
for the test data set. Notice, however, that most often we will only
provide the training data set, so we would not like to be bothered (in
Orange Canvas) with the dialog asking which channel to connect to, as the
training data set channel will be the default one.

When a widget lists several input channels of the same type, the default
channel carries a special flag in the channel specification list. So for
our new :download:`learning curve <OWLearningCurveB.py>` widget, the
channel specification is::

    self.inputs = [("Train Data", Orange.data.Table, self.trainset, Default),
                   ("Test Data", Orange.data.Table, self.testset),
                   ("Learner", Orange.classification.Learner, self.learner, Multiple)]

That is, the :obj:`Train Data` channel is a single-token
channel which is the default one (the fourth element of its tuple). Note that
the flags can be added (or OR-ed) together, so ``Default + Multiple`` is a valid flag.
To test how this works, connect a File widget to the learning curve widget and,
well, nothing will really happen:

.. image:: file-to-learningcurveb.png

That is, no window asking which channels to connect will
open, as the default *"Train Data"* channel was selected.
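
One way to picture the effect of the default flag is the following
sketch. It is not Orange Canvas's actual connection logic:
``pick_input``, the placeholder ``Table`` and ``Learner`` classes, and
the numeric flag values are all assumptions made for this illustration.

```python
# Hypothetical bit values, for illustration only.
Multiple, Default = 2, 8


class Table(object):
    """Placeholder for a data table type."""


class Learner(object):
    """Placeholder for a learner type."""


def pick_input(inputs, token_type):
    """Return the name of the channel to connect, or None if the user must choose."""
    candidates = [inp for inp in inputs if inp[1] is token_type]
    if len(candidates) == 1:
        return candidates[0][0]  # only one channel accepts this type
    defaults = [inp for inp in candidates if len(inp) > 3 and inp[3] & Default]
    if len(defaults) == 1:
        return defaults[0][0]    # the flagged default wins, no dialog needed
    return None                  # ambiguous: a dialog would be required


# Same layout as the channel specification: (name, type, handler, flags).
inputs = [("Train Data", Table, None, Default),
          ("Test Data", Table, None),
          ("Learner", Learner, None, Multiple)]

chosen = pick_input(inputs, Table)
```

In this sketch a table token resolves to *"Train Data"* without asking,
while removing the ``Default`` flag would leave the choice ambiguous.
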