<body>

<h1>orngStat: Orange Statistics for Predictors</h1>
<index name="modules/performance of classifiers">
<index name="modules/performance of regressors">
<index name="classifiers/accuracy of">
<index name="regression/evaluation of">

<P>This module contains various measures of quality for classification
and regression. Most functions require an argument named
<code>res</code>, an instance of <code><INDEX
name="classes/ExperimentResults (in
orngTest)">ExperimentResults</code> as computed by functions from <a
href="orngTest.htm">orngTest</a>, which contains predictions
obtained through cross-validation, leave-one-out, testing on training
data or testing on separate test examples.</P>

<h2>Classification</h2>

<P>To prepare some data for the examples on this page, we shall load the
voting data set (the problem of predicting a congressman's party
(republican, democrat) based on a selection of votes) and evaluate a
naive Bayesian learner, classification trees and a majority classifier
using cross-validation. For examples requiring a multivalued class
problem, we shall do the same with the vehicle data set (telling
whether a vehicle described by the features extracted from a picture
is a van, a bus, or an Opel or Saab car).</P>

<xmp class="code">import orange, orngTest, orngTree

# Three learners to compare: naive Bayes, a classification tree and a majority baseline
learners = [orange.BayesLearner(name="bayes"),
            orngTree.TreeLearner(name="tree"),
            orange.MajorityLearner(name="majrty")]

# Binary class problem
voting = orange.ExampleTable("voting")
res = orngTest.crossValidation(learners, voting)

# Multivalued class problem
vehicle = orange.ExampleTable("vehicle")
resVeh = orngTest.crossValidation(learners, vehicle)
</xmp>

<P>If examples are weighted, weights are taken into account. This can
be disabled by giving <code>unweighted=1</code> as a keyword
argument. Another way of disabling weights is to clear the
<code>ExperimentResults</code>' flag <code>weights</code>.</P>


<H3>General Measures of Quality</H3>

<DL class="attributes">

<DT>CA(res, reportSE=False)
<index name="performance scores+classification accuracy"></DT>

<DD>
<P>Computes classification accuracy, i.e. the proportion of matches
between predicted and actual class. The function returns a list of
classification accuracies of all classifiers tested. If
<code>reportSE</code> is set to true, the list will contain tuples
with accuracies and standard errors.</P>

<P>If results are from multiple repetitions of experiments (like those
returned by orngTest.crossValidation or orngTest.proportionTest), the
standard error (SE) is estimated from the deviation of classification
accuracy across folds (SD), as SE = SD/sqrt(N), where N is the number of
repetitions (e.g. the number of folds).</P>

<P>If results are from a single repetition, we assume independence of
examples and treat the classification accuracy as distributed
according to the binomial distribution. This can be approximated by the
normal distribution, so we report the SE of sqrt(CA*(1-CA)/N), where CA is
classification accuracy and N is the number of test examples.</P>

<P>Instead of <code>ExperimentResults</code>, this function can be
given a list of confusion matrices (see below). Standard errors are in
this case estimated using the latter method.</P>
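<P>The two standard error estimates described above can be sketched in plain Python, without Orange. The function names and the per-fold accuracies are made up for illustration, and the use of the sample standard deviation (N-1 in the denominator) is an assumption of this sketch, not a statement about what <code>CA</code> does internally.</P>

```python
import math

def se_from_folds(fold_accuracies):
    """SE = SD / sqrt(N), where SD is the deviation of accuracy across folds."""
    n = len(fold_accuracies)
    mean = sum(fold_accuracies) / float(n)
    # Assumption: sample standard deviation, i.e. N-1 in the denominator
    sd = math.sqrt(sum((a - mean) ** 2 for a in fold_accuracies) / float(n - 1))
    return sd / math.sqrt(n)

def se_binomial(ca, n_examples):
    """SE = sqrt(CA*(1-CA)/N) for results from a single repetition."""
    return math.sqrt(ca * (1.0 - ca) / n_examples)

fold_cas = [0.90, 0.92, 0.88, 0.91, 0.89]  # made-up per-fold accuracies
print("SE across folds: %.4f" % se_from_folds(fold_cas))
print("binomial SE:     %.4f" % se_binomial(0.9, 400))
```

<P>Note that the first estimate shrinks with the number of folds, while the second shrinks with the number of test examples.</P>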

</DD>

<DT>AP(res, reportSE=False)
<index name="performance scores+average probability of the correct class"></DT>
<DD>Computes the average probability assigned to the correct class.</DD>

<DT>BrierScore(res, reportSE=False)
<index name="performance scores+Brier score"></DT>
<DD>Computes the Brier score, defined as the average (over test
examples) of sum<SUB>x</SUB>(t(x)-p(x))<SUP>2</SUP>, where x is a
class, t(x) is 1 for the correct class and 0 for the others, and p(x)
is the probability that the classifier assigned to the class
x.</DD>

<DT>IS(res, apriori=None, reportSE=False)
<index name="performance scores+information score"></DT>
<DD>Computes the information score as defined by Kononenko and Bratko
(1991). The argument <code>apriori</code> gives the a priori class
distribution; if it is omitted, the class distribution is computed
from the actual classes of examples in <code>res</code>.</DD>

</DL>
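<P>To make the definitions of AP and the Brier score concrete, here is a minimal sketch in plain Python. It works on a hand-made list of predicted class distributions rather than on <code>ExperimentResults</code>; the function names and the data are hypothetical, and weights are ignored.</P>

```python
def average_probability(probs, actual):
    """Average probability assigned to the correct class."""
    return sum(p[c] for p, c in zip(probs, actual)) / float(len(actual))

def brier_score(probs, actual):
    """Average over examples of sum_x (t(x) - p(x))^2."""
    total = 0.0
    for p, c in zip(probs, actual):
        # t(x) is 1 for the correct class and 0 for the others
        total += sum((float(x == c) - px) ** 2 for x, px in enumerate(p))
    return total / len(actual)

# Made-up predictions for three examples of a two-class problem
probs = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
actual = [0, 1, 1]  # indices of the correct classes

print("AP:    %.3f" % average_probability(probs, actual))
print("Brier: %.3f" % brier_score(probs, actual))
```

<P>A higher AP is better, while for the Brier score lower is better; a classifier that always puts probability 1 on the correct class gets a Brier score of 0.</P>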

<P>So, let's compute all this and print it out.</P>

<xmp class="code">import orngStat

# Each function returns a list with one score per learner
CAs = orngStat.CA(res)
APs = orngStat.AP(res)
Briers = orngStat.BrierScore(res)
ISs = orngStat.IS(res)

print
print "method\tCA\tAP\tBrier\tIS"
for l in range(len(learners)):
    print "%s\t%5.3f\t%5.3f\t%5.3f\t%6.3f" % (learners[l].name, CAs[l], APs[l], Briers[l], ISs[l])
</xmp>

<P>The output should look like this.</P>
<xmp class="code">method  CA      AP      Brier    IS
bayes   0.903   0.902   0.175    0.759
tree    0.846   0.845   0.286    0.641
majrty  0.614   0.526   0.474   -0.000
</xmp>

<P>Script <a href="statExamples.py">statExamples.py</a> contains another example that also prints out the standard errors.</P>

<H3>Confusion Matrix</H3>
<index name="performance scores+confusion matrix">

<DL class="attributes">

<DT>confusionMatrices(res, classIndex=-1, {cutoff})</DT>

<DD><P>This function can compute two different forms of confusion
matrix: one in which a certain class is marked as positive and the
other(s) negative, and another in which no class is singled out. The
way to specify what we want is somewhat confusing due to backward
compatibility issues.</P>

<br>

<p><b>A positive-negative confusion matrix</b> is computed (a) if the
class is binary, unless the <code>classIndex</code> argument is -2, or (b) if
the class is multivalued and <code>classIndex</code> is
non-negative. The argument <code>classIndex</code> then tells which class
is positive. In case (a), <code>classIndex</code> may be omitted; the
first class is then negative and the second is positive, unless the
<code>baseClass</code> attribute of the results object has a
non-negative value. In that case, <code>baseClass</code> is the index
of the target class. The <code>baseClass</code> attribute of the results
object should be set manually. The result of the function is a list
of instances of class <code>ConfusionMatrix</code>, containing the
(weighted) number of true positives (<code>TP</code>), false negatives
(<code>FN</code>), false positives (<code>FP</code>) and true
negatives (<code>TN</code>).</P>

<br>

<P>We can also add the keyword argument <code>cutoff</code>
(e.g. <code>confusionMatrices(results, cutoff=0.3)</code>); if we do,
<code>confusionMatrices</code> will disregard the classifiers' class
predictions and observe the predicted probabilities, and consider the
prediction "positive" if the predicted probability of the positive
class is higher than the cutoff.</P>
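<P>The effect of the cutoff can be illustrated with a few lines of plain Python. This is only a sketch of the thresholding rule described above: the function name and the probabilities are made up, and example weights are ignored.</P>

```python
def confusion_counts(pos_probs, is_positive, cutoff=0.5):
    """Count TP, FP, FN, TN when 'positive' means P(positive) > cutoff."""
    tp = fp = fn = tn = 0
    for p, actual in zip(pos_probs, is_positive):
        predicted = p > cutoff
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Made-up predicted probabilities of the positive class and the true labels
pos_probs = [0.9, 0.6, 0.4, 0.25, 0.1]
is_positive = [True, True, True, False, False]

print("cutoff 0.5: TP=%i FP=%i FN=%i TN=%i" % confusion_counts(pos_probs, is_positive, 0.5))
print("cutoff 0.2: TP=%i FP=%i FN=%i TN=%i" % confusion_counts(pos_probs, is_positive, 0.2))
```

<P>Lowering the cutoff can only move examples from the "predicted negative" to the "predicted positive" side, so TP and FP can grow while FN and TN can shrink.</P>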

<br>

<P>The example below shows how changing the cutoff threshold from the
default 0.5 to 0.2 affects the confusion matrices for the naive Bayesian
classifier.</P>

<xmp class="code"># Default threshold; [0] selects the first learner, naive Bayes
cm = orngStat.confusionMatrices(res)[0]
print "Confusion matrix for naive Bayes:"
print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

# Lower threshold: predict "positive" when P(positive) > 0.2
cm = orngStat.confusionMatrices(res, cutoff=0.2)[0]
print "Confusion matrix for naive Bayes:"
print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
</xmp>

<P>The output,
<xmp>Confusion matrix for naive Bayes:
TP: 238, FP: 13, FN: 29.0, TN: 155
Confusion matrix for naive Bayes:
TP: 239, FP: 18, FN: 28.0, TN: 150</xmp>
shows that the number of true positives increases (and hence the number of false negatives decreases) by only a single example, while five examples that were originally true negatives become false positives due to the lower threshold.</P>

<br>

<P>To see how good the classifiers are at detecting vans in the vehicle data set, we would compute the matrix like this:
<xmp class="code">cm = orngStat.confusionMatrices(resVeh, vehicle.domain.classVar.values.index("van"))</xmp>
and get results like these
<xmp>TP: 189, FP: 241, FN: 10.0, TN: 406</xmp>
while the same for class "opel" would give
<xmp>TP: 86, FP: 112, FN: 126.0, TN: 522</xmp>
The main difference is that there are only a few false negatives for the van, meaning that the classifier seldom misses it (if it says it's not a van, it's almost certainly not a van). Not so for the Opel car, where the classifier missed 126 of them and correctly detected only 86.
</P>

<br>
<P><b>A general confusion matrix</b> is computed (a) in the case of a binary class, when <code>classIndex</code> is set to -2, or (b) when we have a multivalued class and the caller doesn't specify the <code>classIndex</code> of the positive class. When called in this manner, the function cannot use the argument <code>cutoff</code>.</P>

<P>The function then returns a three-dimensional matrix, where the element <em>A[learner][actualClass][predictedClass]</em> gives the number of examples belonging to 'actualClass' for which the 'learner' predicted 'predictedClass'. We shall compute and print out the matrix for the naive Bayesian classifier.
<xmp class="code">cm = orngStat.confusionMatrices(resVeh)[0]
classes = vehicle.domain.classVar.values
# Header row with class names, then one row per actual class
print "\t"+"\t".join(classes)
for className, classConfusions in zip(classes, cm):
    print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))
</xmp></P>

<P><small>Sorry for the language, but it's time you learned to talk
dirty in Python, too. <code>"\t".join(classes)</code> will join the
strings from the list <code>classes</code> by putting tabulators between
them. <code>zip</code> merges two lists, element by element, hence it
will create a list of tuples containing a class name from
<code>classes</code> and a list telling how many examples from this
class were classified into each possible class. Finally, the format
string consists of a <code>%s</code> for the class name and one
tabulator and <code>%i</code> for each class. The data we provide for
this format string is <code>(className, )</code> (a tuple containing
the class name), plus the misclassification list converted to a
tuple.</small></P>

<br>

<P>So, here's what this nice piece of code gives:</P>
<xmp>        bus     van     saab    opel
bus     56      95      21      46
van     6       189     4       0
saab    3       75      73      66
opel    4       71      51      86
</xmp>

<p>Vans are clearly simple: 189 vans were classified as vans (we know
this already, we've printed it out above), and the 10 misclassified
pictures were classified as buses (6) and Saab cars (4). For buses and
Saabs, more examples were misclassified as vans than were classified
correctly, and 71 Opels were labelled as vans as well. The classifier
is obviously quite biased towards vans.</P>
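<P>The general matrix and the positive-negative matrices are two views of the same counts. As a sketch in plain Python (the helper name <code>one_vs_rest</code> is made up and is not part of <code>orngStat</code>), the TP, FP, FN and TN for a singled-out class can be recovered from the matrix printed above:</P>

```python
classes = ["bus", "van", "saab", "opel"]
# Rows are actual classes, columns are predicted classes (copied from the output above)
matrix = [[56, 95, 21, 46],
          [6, 189, 4, 0],
          [3, 75, 73, 66],
          [4, 71, 51, 86]]

def one_vs_rest(matrix, k):
    """Collapse a general confusion matrix into (TP, FP, FN, TN) for class k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted something else
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn

for name in ("van", "opel"):
    counts = one_vs_rest(matrix, classes.index(name))
    print("%s: TP: %i, FP: %i, FN: %i, TN: %i" % ((name, ) + counts))
```

<P>The numbers agree with the positive-negative matrices for "van" and "opel" reported earlier.</P>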


</DD>

<dt>sens(confm), spec(confm), PPV(confm), NPV(confm), precision(confm), recall(confm), F1(confm), Falpha(confm, alpha=2.0), MCC(confm)
<index name="performance scores+sensitivity">
<index name="performance scores+specificity">
<index name="performance scores+positive prediction value">
<index name="performance scores+negative prediction value">
<index name="performance scores+precision">
<index name="performance scores+recall">
<index name="performance scores+">
<index name="performance scores+F-measure">
<index name="performance scores+F1">
<index name="performance scores+Falpha"></dt>

<dd><p>With the confusion matrix defined in terms of positive and
negative classes, you can also compute the <a
href="http://en.wikipedia.org/wiki/Sensitivity_(tests)">sensitivity</a>
[TP/(TP+FN)], <a
href="http://en.wikipedia.org/wiki/Specificity_%28tests%29">specificity</a>
[TN/(TN+FP)], <a
href="http://en.wikipedia.org/wiki/Positive_predictive_value">positive
predictive value</a> [TP/(TP+FP)] and <a
href="http://en.wikipedia.org/wiki/Negative_predictive_value">negative
predictive value</a> [TN/(TN+FN)]. In information retrieval, positive
predictive value is called precision (the ratio of the number of
relevant records retrieved to the total number of irrelevant and
relevant records retrieved), and sensitivity is called <a
href="http://en.wikipedia.org/wiki/Information_retrieval">recall</a>
(the ratio of the number of relevant records retrieved to the total
number of relevant records in the database). The <a
href="http://en.wikipedia.org/wiki/Harmonic_mean">harmonic mean</a> of
precision and recall is called the <a
href="http://en.wikipedia.org/wiki/F-measure">F-measure</a>; depending
on the relative weight given to precision and recall, it is
implemented as <code>F1</code> [2*precision*recall/(precision+recall)]
or, for the general case, <code>Falpha</code>
[(1+alpha)*precision*recall / (alpha*precision + recall)]. The <a
href="http://en.wikipedia.org/wiki/Matthews_correlation_coefficient">Matthews
correlation coefficient</a> is in essence a correlation coefficient
between the observed and predicted binary classifications; it returns
a value between -1 and +1. A coefficient of +1 represents a perfect
prediction, 0 a prediction no better than random and -1 an inverse
prediction.</p> <br>

<p>If the argument <code>confm</code> is a single confusion matrix, a
single result (a number) is returned. If <code>confm</code> is a list
of confusion matrices, a list of scores is returned, one for each
confusion matrix.</p>

<br>

<P>Note that weights are taken into account when computing the matrix, so
these functions don't check the 'unweighted' keyword argument.</p>

<P>Let us print out sensitivities and specificities of our classifiers.</P>

<xmp class="code">cm = orngStat.confusionMatrices(res)
print
print "method\tsens\tspec"
for l in range(len(learners)):
    print "%s\t%5.3f\t%5.3f" % (learners[l].name, orngStat.sens(cm[l]), orngStat.spec(cm[l]))
</xmp>
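<P>For readers who want to check the formulas by hand, the following plain Python sketch computes the same scores from four counts. The function name is hypothetical, the counts are those reported for naive Bayes in the cutoff example, and the MCC expression is the standard textbook one, which we assume (but have not verified) matches <code>orngStat.MCC</code>.</P>

```python
import math

def binary_scores(tp, fp, fn, tn, alpha=1.0):
    """Scores derived from a positive-negative confusion matrix."""
    sens = tp / float(tp + fn)   # sensitivity, also called recall
    spec = tn / float(tn + fp)   # specificity
    ppv = tp / float(tp + fp)    # positive predictive value, also called precision
    npv = tn / float(tn + fn)    # negative predictive value
    # F-measure as given in the text; alpha=1 yields F1
    falpha = (1 + alpha) * ppv * sens / (alpha * ppv + sens)
    # Matthews correlation coefficient (standard definition)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sens": sens, "spec": spec, "PPV": ppv, "NPV": npv,
            "Falpha": falpha, "MCC": mcc}

s = binary_scores(238, 13, 29, 155)
for name in ("sens", "spec", "PPV", "NPV", "Falpha", "MCC"):
    print("%-6s %5.3f" % (name, s[name]))
```

<P>The sketch does not guard against empty rows or columns of the matrix (for instance a classifier that never predicts the positive class), where some of the ratios are undefined.</P>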
</dd>

</DL>


<H3>ROC Analysis</H3>
<index name="performance scores+ROC analysis">
<index name="performance scores+AUC">

<P>Receiver Operating Characteristic (ROC) analysis was initially developed
for binary problems and there is no consensus on how to apply
it to multi-class problems, nor do we know for sure how to do ROC
analysis after cross validation and similar multiple sampling
techniques. If you are interested in the area under the curve, the
function <code>AUC</code> will deal with those problems as
specifically described below.</P>

<DL class="attributes">
<DT>AUC(res, method = AUC.ByWeightedPairs)</DT>
<dd>Returns the area under the ROC curve (AUC) given a set of experimental results. For multivalued class problems, it will compute an average of partial AUCs, as specified by the argument <code>method</code>:

  <dl>
  <dt><code>AUC.ByWeightedPairs</code> (or <code>0</code>)</dt>
  <dd>Computes AUC for each pair of classes (ignoring examples of all other classes) and averages the results, weighting them by the number of pairs of examples from these two classes (i.e. by the product of probabilities of the two classes). AUC computed in this way still behaves as a concordance index, i.e., it gives the probability that two randomly chosen examples from different classes will be correctly recognized (this is of course true only if the classifier <em>knows</em> from which two classes the examples came).</dd>

  <dt><code>AUC.ByPairs</code> (or <code>1</code>)</dt>
  <dd>As above, except that the average over class pairs is not weighted. This AUC is, like the binary one, independent of class distributions, but it is not related to the concordance index any more.</dd>

  <dt><code>AUC.WeightedOneAgainstAll</code> (or <code>2</code>)</dt>
  <dd>For each class, it computes AUC for this class against all others (that is, treating other classes as one class). The AUCs are then averaged, weighted by the class probabilities. This is related to a concordance index in which we test the classifier's (average) capability for distinguishing the examples from a specified class from those that come from other classes. Unlike the binary AUC, the measure is not independent of class distributions.</dd>

  <dt><code>AUC.OneAgainstAll</code> (or <code>3</code>)</dt>
  <dd>As above, except that the average is not weighted.</dd>
  </dl>
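<P>The pairwise averaging can be sketched in plain Python as follows. This is a simplified illustration, not the <code>orngStat</code> implementation: the function names and data are made up, examples of a pair of classes are ranked by the predicted probability of the first class of the pair (an assumption of this sketch), and folds and example weights are ignored.</P>

```python
from itertools import combinations

def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive is scored above a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))

def auc_by_pairs(probs, actual, n_classes, weighted=True):
    """Average of pairwise AUCs, optionally weighted by the number of example pairs."""
    total = 0.0
    weight_sum = 0.0
    for i, j in combinations(range(n_classes), 2):
        # Keep only examples of classes i and j; score them by P(class i)
        pos = [p[i] for p, c in zip(probs, actual) if c == i]
        neg = [p[i] for p, c in zip(probs, actual) if c == j]
        if not pos or not neg:
            continue  # skip pairs involving an empty class
        w = len(pos) * len(neg) if weighted else 1
        total += w * binary_auc(pos, neg)
        weight_sum += w
    return total / weight_sum

# Made-up class distributions for five examples of a three-class problem
probs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1],
         [0.4, 0.5, 0.1], [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
actual = [0, 0, 1, 1, 2]

print("by pairs, weighted: %.3f" % auc_by_pairs(probs, actual, 3, weighted=True))
print("by pairs:           %.3f" % auc_by_pairs(probs, actual, 3, weighted=False))
```

<P>The two averages differ whenever the class pairs have different numbers of examples and different partial AUCs, which is why the weighted variant depends on the class distribution.</P>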

<P>In case of <em>multiple folds</em> (for instance if the data comes from cross validation), the computation goes like this. When computing the partial AUCs for individual pairs of classes or singled-out classes, AUC is computed for each fold separately and then averaged (ignoring the number of examples in each fold; it's just a simple average). However, if a certain fold doesn't contain any examples of a certain class (from the pair), the partial AUC is computed treating the results as if they came from a single fold. This is not really correct since the class probabilities from different folds are not necessarily comparable, yet since this will most often occur in leave-one-out experiments, comparability shouldn't be a problem.</P>

<p>Computing and printing out the AUCs looks just like printing out classification accuracies (except that we call AUC instead of CA, of course):</p>
<xmp class="code">AUCs = orngStat.AUC(res)
for l in range(len(learners)):
    print "%10s: %5.3f" % (learners[l].name, AUCs[l])
</xmp>

<P>For vehicle, you can run exactly the same code; it will compute AUCs for all pairs of classes and return the average weighted by probabilities of pairs. Or, you can specify the averaging method yourself, like this:</P>
<xmp class="code">AUCs = orngStat.AUC(resVeh, orngStat.AUC.WeightedOneAgainstAll)</xmp>
<P>The following snippet tries out all four. (We don't claim that this is how the function needs to be used; it's better to stay with the default.)</P>
<xmp class="code">methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
print " " *25 + "  \tbayes\ttree\tmajority"
for i in range(4):
    # The averaging method can be given by its numeric code, 0 to 3
    AUCs = orngStat.AUC(resVeh, i)
    print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
</xmp>
<P>The output looks like this:</P>
<xmp>                            bayes   tree    majority
       by pairs, weighted:  0.789   0.871   0.500
                 by pairs:  0.791   0.872   0.500
    one vs. all, weighted:  0.783   0.800   0.500
              one vs. all:  0.783   0.800   0.500
</xmp>
</dd>

<dt>AUC_single(res, classIndex)</dt>
<dd>Computes AUC where the class given by <code>classIndex</code> is singled out, and all other classes are treated as a single class. To find how good our classifiers are at distinguishing between vans and other vehicles, call the function like this
<xmp class="code">orngStat.AUC_single(resVeh, classIndex = vehicle.domain.classVar.values.index("van"))</xmp></dd>

<dt>AUC_pair(res, classIndex1, classIndex2)</dt>
<dd>Computes AUC between a pair of classes, ignoring examples from all other classes.</dd>

<dt>AUC_matrix(res)</dt>
<dd>Computes a (lower diagonal) matrix with AUCs for all pairs of classes. If there are empty classes, the corresponding elements in the matrix are -1. Remember the beautiful(?) code for printing out the confusion matrix? Here it strikes again:
<xmp class="code">classes = vehicle.domain.classVar.values
AUCmatrix = orngStat.AUC_matrix(resVeh)[0]
# Lower triangle only: columns are all classes but the last, rows all but the first
print "\t"+"\t".join(classes[:-1])
for className, AUCrow in zip(classes[1:], AUCmatrix[1:]):
    print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow))
</xmp>
</dd>
</DL>


<P>The remaining functions, which plot the curves and statistically
compare them, require that the results come from a test with a single
iteration, and they always compare one chosen class against all
others. If you have cross validation results, you can either use <a
href="#splitbyiterations"><code>splitByIterations</code></a> to split
the results by folds, call the function for each fold separately and
then sum the results up however you see fit, or you can set the
<code>ExperimentResults</code>' attribute
<code>numberOfIterations</code> to 1, to cheat the function - at your
own responsibility for the statistical correctness. Regarding
multi-class problems, if you don't choose a specific class,
<code>orngStat</code> will use the class attribute's
<CODE>baseValue</CODE> at the time when results were computed. If
<code>baseValue</code> was not given at that time, 1 (that is, the
second class) is used as default.</P>

<P>We shall use the following code to prepare suitable experimental results:</P>
<xmp class="code"># Random 60:40 split of the voting data into training and test examples
ri2 = orange.MakeRandomIndices2(voting, 0.6)
train = voting.selectref(ri2, 0)
test = voting.selectref(ri2, 1)
# A single train/test iteration, as required by the functions below
res1 = orngTest.learnAndTestOnTestData(learners, train, test)
</xmp>

<dl class="attributes"> <DT>AUCWilcoxon(res, classIndex=1)</DT>
<DD><P>Computes the area under ROC (AUC) and its standard error using
Wilcoxon's approach proposed by Hanley and McNeil (1982). If
<code>classIndex</code> is not specified, the class with index 1 (the
second class) is used as "the positive" and others are negative. The
result is a list of tuples (AUC, standard error).</P>
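<P>To give a feeling for how the standard error depends on the AUC and on the number of examples, here is the well-known closed-form approximation from the Hanley and McNeil paper, written in plain Python. Whether <code>AUCWilcoxon</code> uses this approximation or estimates the variance directly from the data is not stated here, so treat the sketch as an illustration only; the function name is made up.</P>

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC estimated from n_pos positive
    and n_neg negative examples."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

for n in (25, 100, 400):
    print("AUC=0.80, %3i positives and %3i negatives: SE=%.4f"
          % (n, n, hanley_mcneil_se(0.8, n, n)))
```

<P>The error shrinks as the test set grows, and it vanishes for a perfect AUC of 1.</P>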

<P>To compute the AUCs with the corresponding standard errors for
our experimental results, simply call
<xmp class="code">orngStat.AUCWilcoxon(res1)
</xmp></P>
</DD>

<DT>compare2AUCs(res, learner1, learner2, classIndex=1)</DT>

<DD><P>Compares ROC curves of learning algorithms with indices
<code>learner1</code> and <code>learner2</code>. The function returns
three tuples: the first two contain the areas under ROC and standard errors
for the two learners, and the third is the difference of the areas and its
standard error: ((AUC1, SE1), (AUC2, SE2), (AUC1-AUC2,
SE(AUC1)+SE(AUC2)-2*COVAR)).</P>

<P><B>This function is broken at the moment: it returns some numbers,
but they're wrong.</B></P>
</DD>

<DT>computeROC(res, classIndex=1)</DT>

<DD><P>Computes a ROC curve as a list of (x, y) tuples, where x is
1-specificity and y is sensitivity.</P></DD>

<DT>computeCDT(res, classIndex=1), ROCsFromCDT(cdt, {print})</DT>

<DD><P><em>These two functions are obsolete and shouldn't be called. Use <code>AUC</code> instead.</em></P></DD>

<DT><B>AROC(res, classIndex=1), AROCFromCDT(res, {print}), compare2AROCs(res, learner1, learner2, classIndex=1)</B></DT>
<DD><em>These are all deprecated, too. Instead, use AUCWilcoxon (for AROC),
AUC (for AROCFromCDT), and compare2AUCs (for compare2AROCs).</em></dd>

</DL>

<H3>Comparison of Algorithms</H3>

<DL>
<DT><B>McNemar(res)</B>
<index name="performance scores+McNemar test"></DT>
<DD>Computes a triangular matrix with McNemar statistics for each pair
of classifiers. The statistic follows a chi-square
distribution with one degree of freedom; the critical value for 5%
significance is around 3.84.</DD>
<DT><B>McNemarOfTwo(res, learner1, learner2)</B></DT>
<DD>Computes the McNemar statistic for a pair of
classifiers, specified by the indices learner1 and learner2.</DD>
</DL>
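<P>The McNemar statistic itself is easy to compute by hand. The plain Python sketch below uses the common form with a continuity correction; that <code>orngStat</code> applies the same correction is an assumption, and the function name and the data are made up.</P>

```python
def mcnemar_statistic(correct1, correct2):
    """McNemar statistic from per-example correctness of two classifiers."""
    # Only the examples on which the two classifiers disagree matter
    n01 = sum(1 for a, b in zip(correct1, correct2) if a and not b)
    n10 = sum(1 for a, b in zip(correct1, correct2) if b and not a)
    if n01 + n10 == 0:
        return 0.0
    # Continuity-corrected form (an assumption of this sketch)
    return (abs(n01 - n10) - 1) ** 2 / float(n01 + n10)

# Made-up results on 50 test examples: the first classifier alone is right
# on 15 of them, the second alone on 5, and both are right on the other 30
correct1 = [True] * 15 + [False] * 5 + [True] * 30
correct2 = [False] * 15 + [True] * 5 + [True] * 30

stat = mcnemar_statistic(correct1, correct2)
print("McNemar statistic: %.2f (significant at 5%% if above 3.84)" % stat)
```

<P>Examples on which both classifiers are right (or both wrong) do not influence the statistic at all.</P>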

<H2>Regression</H2>

<p>Several alternative measures, as given below, can be used to
evaluate the success of numeric prediction:</p>

<img src="orngStat-regression.png">

<dl class="attributes">

<dt>MSE(res)
<index name="performance scores+mean-squared error"></dt>
<dd>Computes mean-squared error.</dd>

<dt>RMSE(res)
<index name="performance scores+root mean-squared error"></dt>
<dd>Computes root mean-squared error.</dd>

<dt>MAE(res)
<index name="performance scores+mean absolute error"></dt>
<dd>Computes mean absolute error.</dd>

<dt>RSE(res)
<index name="performance scores+relative squared error"></dt>
<dd>Computes relative squared error.</dd>

<dt>RRSE(res)
<index name="performance scores+root relative squared error"></dt>
<dd>Computes root relative squared error.</dd>

<dt>RAE(res)
<index name="performance scores+relative absolute error"></dt>
<dd>Computes relative absolute error.</dd>

<dt>R2(res)
<index name="performance scores+R-squared"></dt>
<dd>Computes the coefficient of determination, R-squared.</dd>

</dl>
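<p>The measures listed above can be written out in a few lines of plain Python. The sketch uses the usual textbook definitions, in which the relative measures compare the predictor with one that always predicts the mean of the actual values; we assume these agree with the formulas in the figure. The function name and the data are made up, and folds and weights are ignored.</p>

```python
import math

def regression_scores(actual, predicted):
    """Textbook versions of the regression measures listed above."""
    n = float(len(actual))
    mean = sum(actual) / n
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    # Errors of the reference predictor that always predicts the mean
    sq_dev = sum((a - mean) ** 2 for a in actual)
    abs_dev = sum(abs(a - mean) for a in actual)
    mse = sq_err / n
    rse = sq_err / sq_dev
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": abs_err / n,
            "RSE": rse, "RRSE": math.sqrt(rse), "RAE": abs_err / abs_dev,
            "R2": 1.0 - rse}

actual = [1.0, 2.0, 3.0, 4.0]      # made-up true values
predicted = [1.5, 2.0, 2.5, 4.0]   # made-up predictions

scores = regression_scores(actual, predicted)
for name in ("MSE", "RMSE", "MAE", "RSE", "RRSE", "RAE", "R2"):
    print("%-5s %6.3f" % (name, scores[name]))
```

<p>With these definitions RRSE is the square root of RSE and R2 equals 1 - RSE, so a predictor that is no better than the mean gets RSE close to 1 and R2 close to 0.</p>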

<p>The following code uses the above measures to score
several regression methods.</p>

<xmp class="code">import orange
import orngRegression as r
import orngTree
import orngStat, orngTest

data = orange.ExampleTable("housing")

# definition of regressors
lr = r.LinearRegressionLearner(name="lr")
rt = orngTree.TreeLearner(measure="retis", mForPruning=2,
                          minExamples=20, name="rt")
maj = orange.MajorityLearner(name="maj")
knn = orange.kNNLearner(k=10, name="knn")

learners = [maj, rt, knn, lr]

# cross validation, selection of scores, report of results
results = orngTest.crossValidation(learners, data, folds=3)
scores = [("MSE", orngStat.MSE),   ("RMSE", orngStat.RMSE),
          ("MAE", orngStat.MAE),   ("RSE", orngStat.RSE),
          ("RRSE", orngStat.RRSE), ("RAE", orngStat.RAE),
          ("R2", orngStat.R2)]

print "Learner   " + "".join(["%-8s" % s[0] for s in scores])
for i in range(len(learners)):
    # one row per learner, one column per score
    print "%-8s " % learners[i].name + \
        "".join(["%7.3f " % s[1](results)[i] for s in scores])
</xmp>

<p>The code above produces the following output:</p>

<xmp class="code">Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
maj       84.585   9.197   6.653   1.002   1.001   1.001  -0.002
rt        40.015   6.326   4.592   0.474   0.688   0.691   0.526
knn       21.248   4.610   2.870   0.252   0.502   0.432   0.748
lr        24.092   4.908   3.425   0.285   0.534   0.515   0.715
</xmp>

<H2>Plotting Functions</H2>

<DL>
<DT><B>graph_ranks(filename, avranks, names, cd=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, cdmethod=None)</B></DT>
<DD>
Draws a CD graph, which is used to display the differences in methods'
performance. See Janez Demsar, Statistical Comparisons of Classifiers over
Multiple Data Sets, Journal of Machine Learning Research, 7(Jan):1--30, 2006.

<p>Needs matplotlib to work.</p>

<dl>
<dt>filename</dt><dd>Output file name (with extension). Formats supported by matplotlib can be used.</dd>
<dt>avranks</dt><dd>List of average methods' ranks.</dd>
<dt>names</dt><dd>List of methods' names.</dd>
<dt>cd</dt><dd>Critical difference. Used for marking methods whose difference is not statistically significant.</dd>
<dt>lowv</dt><dd>The lowest shown rank; if None, use 1.</dd>
<dt>highv</dt><dd>The highest shown rank; if None, use len(avranks).</dd>
<dt>width</dt><dd>Width of the drawn figure in inches, default 6 inches.</dd>
<dt>textspace</dt><dd>Space on figure sides left for the description of methods, default 1 inch.</dd>
<dt>reverse</dt><dd>If True, the lowest rank is on the right. Default: False.</dd>
<dt>cdmethod</dt><dd>None by default. It can be an index of an element in avranks or names, which specifies the method that should be marked with an interval. If specified, the interval is marked only around that method. This option is meant to be used with the Bonferroni-Dunn test.</dd>

</dl>

<xmp class="code">import orange, orngStat

names = ["first", "third", "second", "fourth"]
avranks = [1.9, 3.2, 2.8, 3.3]
cd = orngStat.compute_CD(avranks, 30)  # tested on 30 datasets
orngStat.graph_ranks("statExamples-graph_ranks1.png", avranks, names,
    cd=cd, width=6, textspace=1.5)
</xmp>

<p>The code above produces the following graph:<br>
<img src="statExamples-graph_ranks1.png"></p>


</DD>
</DL>

<DL>
<DT><B>compute_CD(avranks, N, alpha=&quot;0.05&quot;, type=&quot;nemenyi&quot;)</B></DT>
<DD>Returns the critical difference for the Nemenyi or Bonferroni-Dunn test according to the given alpha (either alpha=&quot;0.05&quot; or alpha=&quot;0.1&quot;) for the average ranks and the number of tested data sets N. Type can be either &quot;nemenyi&quot; for the two-tailed Nemenyi test or &quot;bonferroni-dunn&quot; for the Bonferroni-Dunn test.
</DD>
</DL>
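<p>The critical difference follows the formula from Demsar's paper, CD = q<sub>alpha</sub> * sqrt(k(k+1)/(6N)), where k is the number of compared methods. A plain Python sketch for the Nemenyi test at alpha=0.05 is shown below; the function name is made up, and the small table of critical values is copied from that paper for up to five methods only.</p>

```python
import math

# Critical values q_0.05 for the two-tailed Nemenyi test, indexed by the
# number of compared methods k (from Demsar, 2006; first entries only)
Q_NEMENYI_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}

def critical_difference(k, n_datasets, q_alpha):
    """CD = q_alpha * sqrt(k*(k+1) / (6*N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Four methods compared on 30 data sets
cd = critical_difference(4, 30, Q_NEMENYI_005[4])
print("critical difference: %.3f" % cd)
```

<p>Two methods whose average ranks differ by less than the critical difference cannot be declared significantly different at the chosen alpha.</p>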

<DL>
<DT><B>compute_friedman(avranks, N)</B></DT>
<DD>Returns a tuple composed of (Friedman statistic, degrees of freedom)
and (Iman statistic - F-distribution, degrees of freedom) given the average ranks and the number of tested data sets N.
</DD>
</DL>
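<p>Both statistics can be computed directly from the average ranks using the formulas from Demsar's paper. The plain Python sketch below mirrors the description above, but the exact layout of the value returned by <code>compute_friedman</code> is an assumption, and the function name and the ranks are made up. Note that valid average ranks of k methods must sum to k(k+1)/2.</p>

```python
def friedman_statistics(avranks, n_datasets):
    """Friedman chi-square and the Iman-Davenport F statistic from average ranks."""
    k = len(avranks)
    n = n_datasets
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r ** 2 for r in avranks) - k * (k + 1) ** 2 / 4.0)
    # Iman and Davenport's less conservative variant, F-distributed
    f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return (chi2, k - 1), (f, (k - 1, (k - 1) * (n - 1)))

# Made-up average ranks of four methods over 10 data sets (they sum to 10)
(chi2, chi2_df), (f, f_df) = friedman_statistics([1.5, 2.5, 2.5, 3.5], 10)
print("Friedman chi2 = %.2f with %i degrees of freedom" % (chi2, chi2_df))
print("Iman-Davenport F = %.2f with (%i, %i) degrees of freedom" % ((f, ) + f_df))
```

<p>The statistics are then compared with the critical values of the chi-square and F distributions with the given degrees of freedom.</p>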


<H2>Utility Functions</H2>

<DL>
<DT><B>splitByIterations(res)</B></DT>
<DD>Splits the ExperimentResults of a multiple-iteration test into a list
of ExperimentResults, one for each iteration.</DD>
</DL>