<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>
<h1>orngTest: Orange Module for Sampling and Testing</h1>
<index name="sampling techniques">

<p><CODE>orngTest</CODE> is an Orange module for testing learning
algorithms. It includes functions for data sampling and splitting, and
for testing learners. It implements cross-validation, leave-one-out,
random sampling and learning curves. All functions return results in
the same form: an instance of <CODE>ExperimentResults</CODE>,
described at the end of this page, or, in the case of learning curves,
a list of <CODE>ExperimentResults</CODE>. These objects can be passed
to the statistical functions for model evaluation (classification
accuracy, Brier score, ROC analysis...) available in
module <a href="orngStat.htm"><CODE>orngStat</CODE></a>.</p>

<P>Your scripts will thus basically conduct experiments using
functions in <CODE>orngTest</CODE>, covered on this page, and then
evaluate the results by functions in <A
href="orngStat.htm"><CODE>orngStat</CODE></a>. For those interested in
writing their own statistical measures of the quality of models,
descriptions of <CODE>TestedExample</CODE> and
<CODE>ExperimentResults</CODE> are available at the end of this
page.</P>

<P><B>An important change over previous versions of Orange:</B> Orange
has been "de-randomized". Running the same script twice will generally
give the same results unless special care is taken to randomize
it. This is the opposite of previous versions, where special care was
needed to make experiments repeatable. See the arguments
<CODE>randseed</CODE> and <CODE>randomGenerator</CODE> for an
explanation.</P>

<P>Example scripts in this section assume that the data is loaded and
a list of learning algorithms is prepared.</P>

<p class=header><a href="test.py">part of test.py</a>
(uses <a href="voting.tab">voting.tab</a>)</p>
<XMP class=code>import orange, orngTest, orngStat

data = orange.ExampleTable("voting")

bayes = orange.BayesLearner(name = "bayes")
tree = orange.TreeLearner(name = "tree")
majority = orange.MajorityLearner(name = "default")
learners = [bayes, tree, majority]

names = [x.name for x in learners]
</XMP>

<P>After testing is done, classification accuracies can be computed
and printed by the following function (the function uses the list
<CODE>names</CODE> constructed above).</P>

<XMP class=code>def printResults(res):
    CAs = orngStat.CA(res, reportSE=1)
    for i in range(len(names)):
        print "%s: %5.3f+-%4.3f" % (names[i], CAs[i][0], 1.96*CAs[i][1]),
    print
</XMP>


<h2>Common Arguments</H2>

<P>Many functions in this module use a set of common arguments, which we define here.</P>

<DL class=attributes>
<DT>learners</DT>
<DD>A list of learning algorithms. These can be either pure Orange
objects (such as <CODE>orange.BayesLearner</CODE>) or classes or
functions written in pure Python (anything that can be called with
the same arguments as Orange's learners, returns comparable results,
and performs a similar function).
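<P>For illustration, here is a minimal sketch of a pure-Python
learner: a callable that takes examples (and an optional weight ID)
and returns a classifier. The class name and the wrapped learner are
our own choices for the example, not part of the module.</P>
<XMP class=code>class FilteredLearner:
    # a pure-Python learner: any callable accepting (examples, weightID)
    # and returning a classifier can appear in the 'learners' list
    def __init__(self, baseLearner, name="filtered"):
        self.baseLearner = baseLearner
        self.name = name

    def __call__(self, examples, weight=0):
        # a real learner would preprocess 'examples' here;
        # this sketch simply delegates to the wrapped learner
        return self.baseLearner(examples, weight)

pyLearner = FilteredLearner(orange.BayesLearner(), name = "filtered bayes")
# pyLearner can now be placed in a 'learners' list like any Orange learner
</XMP>
</DD>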

<DT>examples, learnset, testset</DT>
<DD>Examples, given as an <CODE>ExampleTable</CODE> (some functions
need an undivided set of examples while others need examples that are
already split into two sets). If examples are weighted, pass them as a
tuple (<CODE>examples</CODE>, <CODE>weightID</CODE>). Weights are
respected by learning and testing, but not by sampling: when selecting
10% of examples, this means 10% by number, not by weight. There is
also no guarantee that the sums of example weights will be (even
roughly) equal across folds in cross validation.</DD>

<DT>strat</DT>
<DD>Tells whether to stratify the random selections. Its default value
is <CODE>orange.StratifiedIfPossible</CODE>, which stratifies
selections if the class attribute is discrete and has no unknown
values.</DD>

<DT>randseed <SPAN class=normalfont>(obsolete: </SPAN>indicesrandseed<span class=normalfont>),</span> randomGenerator</DT>
<DD>Random seed (<CODE>randseed</CODE>) or random generator
(<CODE>randomGenerator</CODE>) for random selection of examples. If
omitted, a random seed of 0 is used and the same test will always
select the same examples from the example set. There are several
slightly different ways to randomize it:</P>
<P>
<UL>

<LI>Set <CODE>randomGenerator</CODE> to
<CODE>orange.globalRandom</CODE>. The function's selection will then
depend upon Orange's global random generator, which is reset (with
random seed 0) when Orange is imported. The script's output will
therefore depend upon what you did after Orange was first imported in
the current Python session.
<XMP class=code>res = orngTest.proportionTest(learners, data, 0.7, randomGenerator = orange.globalRandom)
</XMP></LI>

<LI>Construct a new <CODE>orange.RandomGenerator</CODE> and use it in
various places and times. The code below, for instance, will produce
different results in each iteration, but overall the same results each
time it's run.
<XMP class=code>myRandom = orange.RandomGenerator(42)   # any fixed seed will do
for i in range(3):
    res = orngTest.proportionTest(learners, data, 0.7, randomGenerator = myRandom)
    printResults(res)
</XMP></LI>

<LI>Set the random seed (argument <CODE>randseed</CODE>) to a random
number provided by Python. Python has a global random generator that
is reset when Python is loaded, using the current system time for a
seed. With this, results will in general be different each time the
script is run.
<XMP class=code>import random
for i in range(3):
    res = orngTest.proportionTest(learners, data, 0.7, randseed = random.randint(0, 100))
    printResults(res)
</XMP>

The <CODE>random</CODE> module also provides random generators as
objects, so that you can have independent local random generators in
case you need them.</LI> </UL> </DD>

<DT>pps</DT>
<DD>A list of preprocessors. It consists of tuples <CODE>(c,
preprocessor)</CODE>, where <CODE>c</CODE> determines whether the
preprocessor will be applied to the learning set ("L"), to the test
set ("T") or to both ("B"). The latter is applied first, while the
example set is still undivided. The "L" and "T" preprocessors are
applied to the separated subsets. Preprocessing of testing examples is
allowed only in experimental procedures that do not report the
<CODE>TestedExample</CODE>s in the same order as the examples in the
original set. The second item in the tuple, <CODE>preprocessor</CODE>,
can be either a pure Orange or a pure Python preprocessor, that is, any
function or callable class that accepts a table of examples and a weight,
and returns a preprocessed table and weight.</P>
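<P>A pure Python preprocessor is thus just a callable. Below is a
minimal sketch (the function name and the filtering criterion are our
own choices for illustration): it drops examples with an unknown class
and returns the new table along with the unchanged weight ID.</P>
<XMP class=code>def dropUnknownClasses(examples, weight=0):
    # a pure-Python preprocessor: accepts a table and a weight ID,
    # returns a (new table, weight ID) pair
    kept = [ex for ex in examples if not ex.getclass().isSpecial()]
    return orange.ExampleTable(examples.domain, kept), weight

res = orngTest.proportionTest(learners, data, 0.7, pps = [("L", dropUnknownClasses)])
</XMP>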

<P>This example demonstrates the devastating effect of 100% class
noise on learning:</P>
<XMP class="code">classnoise = orange.Preprocessor_addClassNoise(proportion=1.0)
res = orngTest.proportionTest(learners, data, 0.7, 100, pps = [("L", classnoise)])
</XMP>
</DD>

<DT>proportions</DT>
<DD>Gives the proportions of learning examples at which the tests are to be made, where applicable. The default is [0.1, 0.2, ..., 1.0].</DD>

<DT>storeClassifiers <SPAN class=normalfont>(keyword argument)</SPAN></DT>
<DD>If this flag is set, the testing procedure will store the constructed classifiers. For each iteration of the test (e.g. for each fold in cross validation, or for each left-out example in leave-one-out), the list of classifiers is appended to the <CODE>ExperimentResults</CODE>' field <CODE>classifiers</CODE>.

<P>The script below makes 100 repetitions of the 70:30 test and stores the classifiers it induces.</P>

<XMP class=code>res = orngTest.proportionTest(learners, data, 0.7, 100, storeClassifiers = 1)
</XMP>

<P>After this, <CODE>res.classifiers</CODE> is a list of 100 items, and each item is a list with three classifiers (one per learner).</P>
</DD>

<DT>verbose <SPAN class=normalfont>(keyword argument)</SPAN></DT>
<DD>Several functions can report their progress if you add a keyword argument <CODE>verbose=1</CODE>.</DD>
</DL>


<H2>Sampling and Testing Functions</H2>


<DL class=attributes>
<DT>proportionTest(learners, data, learnProp, times = 10,
strat = ..., pps = [])</DT>
<DD>Splits the data with <CODE>learnProp</CODE> of examples in the learning set and the rest in the testing set. The test is repeated a given number of <CODE>times</CODE> (default 10). The division is stratified by default. The function also accepts keyword arguments for randomization and storing classifiers.</P>

<P>100 repetitions of the so-called 70:30 test, in which 70% of examples are used for training and 30% for testing, are done by
<XMP class=code>res = orngTest.proportionTest(learners, data, 0.7, 100)
</XMP>
<P>Note that Python allows naming the arguments; instead of "<CODE>100</CODE>" you can use "<CODE>times = 100</CODE>" to increase clarity (not so with keyword arguments, such as <CODE>storeClassifiers</CODE>, <CODE>randseed</CODE> or <CODE>verbose</CODE>, which must always be given with a name, as shown in the examples above).</DD>

<DT><INDEX>leaveOneOut</INDEX>(learners, examples, pps = [])</DT>
<DD>Performs a leave-one-out experiment with the given list of
learners and examples. This is equivalent to performing
len(examples)-fold cross validation. The function accepts additional
keyword arguments for preprocessing, storing classifiers and verbose
output.
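<P>For instance, a minimal usage sketch, reusing <CODE>data</CODE> and
<CODE>printResults</CODE> from the examples above:</P>
<XMP class=code>res = orngTest.leaveOneOut(learners, data)
printResults(res)
</XMP>
</DD>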

<DT><INDEX>crossValidation</INDEX>(learners, examples, folds = 10, strat = ..., pps = [])</DT>
<DD>Performs cross validation with the given number of folds.
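<P>For example, a ten-fold cross validation (again a minimal sketch
using the data and helper function defined above):</P>
<XMP class=code>res = orngTest.crossValidation(learners, data, folds = 10)
printResults(res)
</XMP>
</DD>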


<DT>testWithIndices(learners, examples, weight, indices, indicesrandseed="*", pps=None)</DT>
<DD>Performs a cross-validation-like test. The difference is that the
caller provides the indices (each index tells the fold to which the
corresponding example belongs), which need not divide the examples
into folds of (approximately) equal sizes. In fact, function
<CODE>crossValidation</CODE> is written as a single call to
<CODE>testWithIndices</CODE>.</P>

<P><CODE>testWithIndices</CODE> takes care that the <CODE>TestedExample</CODE>s are in the same order as the corresponding examples in the original set. Preprocessing of testing examples is thus not allowed. The computed results can be saved to files, or loaded from them, if you add a keyword argument <CODE>cache = 1</CODE>. In this case, you also have to specify the random seed which was used to compute the indices (argument <CODE>indicesrandseed</CODE>); if you don't, there will be no caching.</P>

<P>You can request progress reports with a keyword argument <CODE>verbose = 1</CODE>.</P>
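<P>A minimal sketch following the signature above (the 0 passed for
<CODE>weight</CODE> means unweighted examples; the indices are
prepared with <CODE>orange.MakeRandomIndicesCV</CODE>):</P>
<XMP class=code>indices = orange.MakeRandomIndicesCV(data, folds = 5)
res = orngTest.testWithIndices(learners, data, 0, indices)
printResults(res)
</XMP>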
</DD>


<DT>learningCurveN(learners, examples, folds = 10, strat = ..., proportions = ..., pps=[])</DT>
<DD>A simpler interface to the function learningCurve (see
below). Instead of methods for preparing indices, it simply takes a
number of folds and a flag telling whether we want stratified
cross-validation or not. This function does not return a single
<CODE>ExperimentResults</CODE> but a list of them, one for each
proportion.</P>

<XMP class=code>prop = [0.2, 0.4, 0.6, 0.8, 1.0]
res = orngTest.learningCurveN(learners, data, folds = 5, proportions = prop)
for i, p in enumerate(prop):
    print "%5.3f:" % p,
    printResults(res[i])
</XMP>

<P>The function basically prepares a random generator and example selectors (<CODE>cv</CODE> and <CODE>pick</CODE>, see below) and calls <CODE>learningCurve</CODE>.</P>

</DD>


<DT><INDEX>learningCurve</INDEX>(learners, examples, cv = None, pick = None,
proportions = ..., pps=[])</DT>

<DD>Computes learning curves using a procedure recommended by Salzberg
(1997). It first prepares the data subsets (folds). For each
proportion, it performs cross-validation, but takes only the given
proportion of examples for learning.</P>

<P>Arguments <CODE>cv</CODE> and <CODE>pick</CODE> give the methods
for preparing indices for cross-validation and for random selection of
learning examples. If they are not given,
<CODE>orange.MakeRandomIndicesCV</CODE> and
<CODE>orange.MakeRandomIndices2</CODE> are used; both will be
stratified and the cross-validation will be 10-fold. Proportions is a
list of proportions of learning examples.</P>
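<P>A minimal sketch with custom selectors (the number of folds and the
proportions are arbitrary choices for the example):</P>
<XMP class=code>cv = orange.MakeRandomIndicesCV(folds = 5)
pick = orange.MakeRandomIndices2()
prop = [0.5, 1.0]
res = orngTest.learningCurve(learners, data, cv = cv, pick = pick, proportions = prop)
for i, p in enumerate(prop):
    print "%5.3f:" % p,
    printResults(res[i])
</XMP>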

<P>The function can save time by loading existing experimental data
for any tests that were already conducted and saved. The computed
results are also stored for later use. You can enable this by adding a
keyword argument <CODE>cache=1</CODE>. Another keyword argument deals
with progress reporting: if you add <CODE>verbose=1</CODE>, the
function will print the proportion and the fold number.</p> </DD>


<DT>learningCurveWithTestData(learners, learnset, testset, times = 10, proportions = ..., strat = ..., pps=[])</DT>

<DD>This function is suitable for computing a learning curve on
datasets where the learning and testing examples are split in
advance. For each proportion of learning examples, it randomly selects
the requested number of learning examples, builds the models and tests
them on the entire <CODE>testset</CODE>. The whole test is repeated
the given number of <CODE>times</CODE> for each proportion. The result
is a list of <CODE>ExperimentResults</CODE>, one for each
proportion.</P>

<P>In the following script, examples are pre-divided into training and testing sets. Learning curves are computed in which 20, 40, 60, 80 and 100 percent of the examples in the former set are used for learning, and the latter set is used for testing. The random selection of the given proportion of the learning set is repeated five times.
<XMP class=code>indices = orange.MakeRandomIndices2(data, p0 = 0.7)
train = data.select(indices, 0)
test = data.select(indices, 1)

res = orngTest.learningCurveWithTestData(
         learners, train, test, times = 5, proportions = prop)
for i, p in enumerate(prop):
    print "%5.3f:" % p,
    printResults(res[i])
</XMP>
</P>
</DD>


<DT>learnAndTestOnTestData(learners, learnset, testset, testResults=None, iterationNumber=0, pps=[])</DT>
<DD>This function performs no sampling on its own: two separate datasets need to be passed, one for training and the other for testing. The function preprocesses the data, induces the model and tests it. The order of filters is peculiar, but it makes sense when compared to other methods that support preprocessing of testing examples. The function first applies the preprocessors marked "B" (both sets), and only then the preprocessors that need to process only one of the sets.</P>

<P>You can pass an already initialized <CODE>ExperimentResults</CODE> (argument <CODE>testResults</CODE>) and an iteration number (<CODE>iterationNumber</CODE>). Results of the test will be appended with the given iteration number. This is because <CODE>learnAndTestOnTestData</CODE> gets called by other functions, like <CODE>proportionTest</CODE> and <CODE>learningCurveWithTestData</CODE>. If you omit the parameters, a new <CODE>ExperimentResults</CODE> will be created.</P>
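<P>A minimal usage sketch, reusing the train/test split from the example above:</P>
<XMP class=code>res = orngTest.learnAndTestOnTestData(learners, train, test)
printResults(res)
</XMP>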
</DD>


<DT>learnAndTestOnLearnData(learners, learnset, testResults=None, iterationNumber=0, pps=[])</DT>
<DD>This function is similar to the above, except that it learns and tests on the same data. It first applies the "B" preprocessors to the whole data, and afterwards any "L" or "T" preprocessors to the data in its respective roles. Then it induces the model from the learning data and tests it on the testing data.</P>

<P>As with <CODE>learnAndTestOnTestData</CODE>, you can pass an already initialized <CODE>ExperimentResults</CODE> (argument <CODE>testResults</CODE>) and an iteration number to the function. In this case, results of the test will be appended with the given iteration number.</P>
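<P>Testing on the training data gives optimistic accuracy estimates; a quick sketch:</P>
<XMP class=code>res = orngTest.learnAndTestOnLearnData(learners, data)
printResults(res)
</XMP>
</DD>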


<DT>testOnData(classifiers, testset, testResults=None, iterationNumber=0)</DT>
<DD>This function gets a list of classifiers, not learners like the other functions in this module. It classifies each testing example with each classifier. You can pass an existing <CODE>ExperimentResults</CODE> and an iteration number, as with <CODE>learnAndTestOnTestData</CODE> (which actually calls <CODE>testOnData</CODE>). If you don't, a new <CODE>ExperimentResults</CODE> will be created.
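<P>A minimal sketch: induce the classifiers by hand on the training part of the split from the earlier example, then test them:</P>
<XMP class=code>classifiers = [learner(train) for learner in learners]
res = orngTest.testOnData(classifiers, test)
printResults(res)
</XMP>
</DD>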

</DL>


<H2>Classes</H2>

<p>Knowing the classes <CODE>TestedExample</CODE>, which stores the results of testing for a single test example, and <CODE>ExperimentResults</CODE>, which stores a list of <CODE>TestedExample</CODE>s along with some other data on the experimental procedures and classifiers used, is important if you would like to write your own measures of the quality of models, compatible with the sampling infrastructure provided by Orange. If not, you can skip the remainder of this page.</p>
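<P>As a taste of what such a measure looks like, here is a minimal
sketch of a hand-written classification accuracy (a simplified
stand-in for <CODE>orngStat.CA</CODE>; it ignores weights and standard
errors), relying only on the attributes documented below:</P>
<XMP class=code>def simpleCA(res):
    # one counter per learner
    correct = [0.0] * res.numberOfLearners
    for te in res.results:
        for i in range(res.numberOfLearners):
            if te.classes[i] == te.actualClass:
                correct[i] += 1
    return [c / len(res.results) for c in correct]
</XMP>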

<H3>TestedExample</H3>

<p><INDEX name="classes/TestedExample (in orngTest)">TestedExample stores the predictions of different classifiers for a single testing example.</p>

<p class=section>Attributes</p>
<DL class=attributes>
<DT>classes</DT>
<DD>A list of predictions of type <CODE>Value</CODE>, one for each classifier.</DD>

<DT>probabilities</DT>
<DD>A list of probabilities of classes, one for each classifier.</DD>

<DT>iterationNumber</DT>
<DD>The iteration number (e.g. fold) in which the <CODE>TestedExample</CODE> was created/tested.</DD>

<DT>actualClass</DT>
<DD>The correct class of the example.</DD>

<DT>weight</DT>
<DD>The example's weight. Even if the example set was not weighted, this attribute is present and equals 1.0.</DD>
</DL>


<p class=section>Methods</p>
<DL class=attributes>
<DT>__init__(iterationNumber = None, actualClass = None, n = 0)</DT>
<DD>Constructs and initializes a new <CODE>TestedExample</CODE>.</DD>

<DT>addResult(aclass, aprob)</DT>
<DD>Appends a new result (class and probability prediction by a single classifier) to the <CODE>classes</CODE> and <CODE>probabilities</CODE> fields.</DD>

<DT>setResult(i, aclass, aprob)</DT>
<DD>Sets the result of the i-th classifier to the given values.</DD>
</DL>


<H3>ExperimentResults</H3>

<p><INDEX name="classes/ExperimentResults (in orngTest)">ExperimentResults stores the results of one or more repetitions of some test (cross
validation, repeated sampling...) under the same circumstances.</p>

<p class=section>Attributes</p>
<DL class=attributes>
<DT>results</DT>
<DD>A list of instances of <CODE>TestedExample</CODE>, one for each example in the dataset.</DD>

<DT>classifiers</DT>
<DD>A list of classifiers, one element for each repetition (e.g. fold). Each element is a list of classifiers, one for each learner. This field is used only if storing is enabled by <code>storeClassifiers = 1</code>.</DD>

<DT>numberOfIterations</DT>
<DD>Number of iterations. This can be the number of folds (in cross validation) or the number of repetitions of some test. <CODE>TestedExample</CODE>'s attribute iterationNumber should be in range <CODE>[0, numberOfIterations-1]</CODE>.</DD>

<DT>numberOfLearners</DT>
<DD>Number of learners. The lengths of the lists <CODE>classes</CODE> and <CODE>probabilities</CODE> in each <CODE>TestedExample</CODE> should equal <CODE>numberOfLearners</CODE>.</DD>

<DT>loaded</DT>
<DD>If the experimental method supports caching and there are no obstacles to caching (such as unknown random seeds), this is a list of boolean values. Each element corresponds to a classifier and tells whether the experimental results for that classifier were computed or loaded from the cache.</DD>

<DT>weights</DT>
<DD>A flag telling whether the results are weighted. If <CODE>false</CODE>, weights are still present in the <CODE>TestedExample</CODE>s, but they are all 1.0. Clear this flag if your experimental procedure ran on weighted testing examples but you would like to ignore the weights in statistics.</DD>
</DL>

<p class=section>Methods</p>

<DL class=attributes>
<DT>__init__(iterations, learners, weights)</DT>
<DD>Initializes the object and sets the number of iterations, the learners and the flag telling whether the <CODE>TestedExample</CODE>s will be weighted.</DD>

<DT>saveToFiles(lrn, filename), loadFromFiles(lrn, filename)</DT>
<DD>Saves and loads testing results. <CODE>lrn</CODE> is a list of learners and <CODE>filename</CODE> is a template for the filename. The attribute <CODE>loaded</CODE> is initialized so that it contains 1's for the learners whose data was loaded and 0's for the learners which need to be tested. The function returns 1 if all the files were found and loaded, and 0 otherwise.</p>

<p>The data is saved in a separate file for each classifier. The file is a binary pickle file containing a list of tuples<BR>
<code>((x.actualClass, x.iterationNumber), (x.classes[i], x.probabilities[i]))</code><BR> where <CODE>x</CODE> is a <CODE>TestedExample</CODE> and <CODE>i</CODE> is the index of a learner.</P>

<p>The file resides in the directory <CODE>./cache</CODE>. Its name
consists of a template, given by the caller. The filename should contain
a <CODE>%s</CODE>, which is replaced by the <CODE>name</CODE>,
<CODE>shortDescription</CODE>, <CODE>description</CODE>,
<CODE>func_doc</CODE> or <CODE>func_name</CODE> (in that order)
attribute of the learner (this gets extracted by
<CODE>orngMisc.getobjectname</CODE>). If a learner has none of these
attributes, its class name is used.</P>

<p>The filename should include enough data to make sure that it indeed
contains the right experimental results. The function
<CODE>learningCurve</CODE>, for example, forms the name of the file
from the string "<CODE>{learningCurve}</CODE>", the proportion of
learning examples, the random seeds for cross-validation and learning
set selection, a list of preprocessors' names and a checksum for the
examples. Of course you can outsmart this, but it should suffice in
most cases.</p>
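<P>A minimal usage sketch (the template string is an arbitrary choice
for the example; each <CODE>%s</CODE> is replaced by a learner's name
as described above):</P>
<XMP class=code>res = orngTest.crossValidation(learners, data)
res.saveToFiles(learners, "cv-voting-%s.res")   # files are written under ./cache
</XMP>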

<DT>remove(i)</DT>
<DD>Removes the results for the <code>i</code>-th learner.</DD>

<DT>add(results, index, replace=-1)</DT>

<DD>Appends the results of the <code>index</code>-th learner from
<code>results</code>, or uses them to replace the results of the
learner with the index <code>replace</code> if <code>replace</code> is
a valid index. It assumes that <code>results</code> came from an
evaluation on the same data set using the same testing technique (the
same number of iterations).</DD>

</DL>

<hr>

<H2>References</H2>

<p>Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid
and a recommended approach. <EM>Data Mining and Knowledge Discovery
1</EM>, pages 317-328.</P>

</BODY>
</html>