source: orange/orange/Orange/evaluation/scoring.py @ 9550:bd71f96b33d5

1"""
2############################
3Method scoring (``scoring``)
4############################
5
6.. index: scoring
7
8This module contains various measures of quality for classification and
9regression. Most functions require an argument named :obj:`res`, an instance of
10:class:`Orange.evaluation.testing.ExperimentResults` as computed by
functions from :mod:`Orange.evaluation.testing`, which contains
predictions obtained through cross-validation,
leave-one-out, testing on training data or on a separate test set.
14
15==============
16Classification
17==============
18
19To prepare some data for examples on this page, we shall load the voting data
20set (problem of predicting the congressman's party (republican, democrat)
based on a selection of votes) and evaluate the naive Bayesian learner,
classification trees and the majority classifier using cross-validation.
For examples requiring a multivalued class problem, we shall do the same
with the vehicle data set (determining whether a vehicle, described by features
extracted from a picture, is a van, a bus, or an Opel or Saab car).
26
A basic cross-validation example is shown in the following part of
28(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):
29
30.. literalinclude:: code/statExample0.py
31
32If instances are weighted, weights are taken into account. This can be
33disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
disabling weights is to clear the weights flag of the
:class:`Orange.evaluation.testing.ExperimentResults` object.
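
For instance, a hypothetical call that ignores the instance weights (assuming
:obj:`res` holds the results computed by the script above) might look like this::

    # classification accuracy computed while ignoring any instance weights
    CAs = Orange.evaluation.scoring.CA(res, unweighted=1)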
36
37General Measures of Quality
38===========================
39
40.. autofunction:: CA
41
42.. autofunction:: AP
43
44.. autofunction:: Brier_score
45
46.. autofunction:: IS
47
48So, let's compute all this in part of
49(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`) and print it out:
50
51.. literalinclude:: code/statExample1.py
52   :lines: 13-
53
54The output should look like this::
55
56    method  CA      AP      Brier    IS
57    bayes   0.903   0.902   0.175    0.759
58    tree    0.846   0.845   0.286    0.641
59    majrty  0.614   0.526   0.474   -0.000
60
61Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out
62the standard errors.
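
As a sketch (not taken from the script), requesting the standard errors amounts
to passing :obj:`reportSE`; each score then becomes an (accuracy, SE) pair::

    # classification accuracies with standard errors: a list of (CA, SE) tuples
    CAs = Orange.evaluation.scoring.CA(res, reportSE=True)
    for learner, (ca, se) in zip(learners, CAs):
        print "%s: %5.3f +- %5.3f" % (learner.name, ca, se)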
63
64Confusion Matrix
65================
66
67.. autofunction:: confusion_matrices
68
69   **A positive-negative confusion matrix** is computed (a) if the class is
70   binary unless :obj:`classIndex` argument is -2, (b) if the class is
71   multivalued and the :obj:`classIndex` is non-negative. Argument
72   :obj:`classIndex` then tells which class is positive. In case (a),
73   :obj:`classIndex` may be omitted; the first class
74   is then negative and the second is positive, unless the :obj:`baseClass`
   attribute in the object with results has a non-negative value. In that case,
   :obj:`baseClass` is the index of the target class. The :obj:`baseClass`
   attribute of the results object should be set manually. The result of the
   function is a list of instances of class :class:`ConfusionMatrix`,
79   containing the (weighted) number of true positives (TP), false
80   negatives (FN), false positives (FP) and true negatives (TN).
81   
82   We can also add the keyword argument :obj:`cutoff`
   (e.g. confusion_matrices(results, cutoff=0.3)); if we do, :obj:`confusion_matrices`
84   will disregard the classifiers' class predictions and observe the predicted
85   probabilities, and consider the prediction "positive" if the predicted
86   probability of the positive class is higher than the :obj:`cutoff`.
87
88   The example (part of :download:`statExamples.py <code/statExamples.py>`) below shows how setting the
   cut-off threshold from the default 0.5 to 0.2 affects the confusion matrices
   for the naive Bayesian classifier::
91   
92       cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
93       print "Confusion matrix for naive Bayes:"
94       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
95       
96       cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
97       print "Confusion matrix for naive Bayes:"
98       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
99
100   The output::
101   
102       Confusion matrix for naive Bayes:
103       TP: 238, FP: 13, FN: 29.0, TN: 155
104       Confusion matrix for naive Bayes:
105       TP: 239, FP: 18, FN: 28.0, TN: 150
106   
107   shows that the number of true positives increases (and hence the number of
108   false negatives decreases) by only a single instance, while five instances
109   that were originally true negatives become false positives due to the
110   lower threshold.
111   
   To observe how well the classifiers detect vans in the vehicle
113   data set, we would compute the matrix like this::
114   
      cm = Orange.evaluation.scoring.confusion_matrices(resVeh,
          vehicle.domain.classVar.values.index("van"))
117   
118   and get the results like these::
119   
120       TP: 189, FP: 241, FN: 10.0, TN: 406
121   
122   while the same for class "opel" would give::
123   
124       TP: 86, FP: 112, FN: 126.0, TN: 522
125       
126   The main difference is that there are only a few false negatives for the
127   van, meaning that the classifier seldom misses it (if it says it's not a
128   van, it's almost certainly not a van). Not so for the Opel car, where the
129   classifier missed 126 of them and correctly detected only 86.
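
   The sensitivity measure described below makes the same point; a small
   sketch (not from the script, reusing :obj:`resVeh` and :obj:`vehicle` from
   the setup above)::

       cm_van = Orange.evaluation.scoring.confusion_matrices(resVeh,
           vehicle.domain.classVar.values.index("van"))[0]
       # sensitivity = TP/(TP+FN); high for vans since false negatives are rare
       print "sensitivity for van: %5.3f" % Orange.evaluation.scoring.sens(cm_van)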
130   
131   **General confusion matrix** is computed (a) in case of a binary class,
   when :obj:`classIndex` is set to -2, (b) when we have a multivalued class and
133   the caller doesn't specify the :obj:`classIndex` of the positive class.
134   When called in this manner, the function cannot use the argument
135   :obj:`cutoff`.
136   
137   The function then returns a three-dimensional matrix, where the element
138   A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
139   gives the number of instances belonging to 'actual_class' for which the
140   'learner' predicted 'predictedClass'. We shall compute and print out
141   the matrix for naive Bayesian classifier.
142   
143   Here we see another example from :download:`statExamples.py <code/statExamples.py>`::
144   
145       cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
146       classes = vehicle.domain.classVar.values
147       print "\t"+"\t".join(classes)
148       for className, classConfusions in zip(classes, cm):
149           print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))
150   
151   So, here's what this nice piece of code gives::
152   
153              bus   van  saab opel
154       bus     56   95   21   46
155       van     6    189  4    0
156       saab    3    75   73   66
157       opel    4    71   51   86
158       
   Vans are clearly simple: 189 vans were classified as vans (we know this
160   already, we've printed it out above), and the 10 misclassified pictures
161   were classified as buses (6) and Saab cars (4). In all other classes,
162   there were more instances misclassified as vans than correctly classified
   instances. The classifier is obviously quite biased towards vans.
164   
165   .. method:: sens(confm)
166   .. method:: spec(confm)
167   .. method:: PPV(confm)
168   .. method:: NPV(confm)
169   .. method:: precision(confm)
170   .. method:: recall(confm)
   .. method:: F1(confm)
   .. method:: Falpha(confm, alpha=1.0)
   .. method:: MCC(confm)
174
175   With the confusion matrix defined in terms of positive and negative
176   classes, you can also compute the
177   `sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
178   [TP/(TP+FN)], `specificity \
179<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
180   [TN/(TN+FP)], `positive predictive value \
181<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
182   [TP/(TP+FP)] and `negative predictive value \
183<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
184   In information retrieval, positive predictive value is called precision
185   (the ratio of the number of relevant records retrieved to the total number
186   of irrelevant and relevant records retrieved), and sensitivity is called
187   `recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
188   (the ratio of the number of relevant records retrieved to the total number
189   of relevant records in the database). The
190   `harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
191   and recall is called an
   `F-measure <http://en.wikipedia.org/wiki/F-measure>`_; depending on the
   relative weight given to precision and recall, it is implemented
   as F1 [2*precision*recall/(precision+recall)] or, in the general case,
   as Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
   The `Matthews correlation coefficient \
<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
   is in essence a correlation coefficient between
199   the observed and predicted binary classifications; it returns a value
200   between -1 and +1. A coefficient of +1 represents a perfect prediction,
201   0 an average random prediction and -1 an inverse prediction.
202   
203   If the argument :obj:`confm` is a single confusion matrix, a single
204   result (a number) is returned. If confm is a list of confusion matrices,
205   a list of scores is returned, one for each confusion matrix.
206   
207   Note that weights are taken into account when computing the matrix, so
208   these functions don't check the 'weighted' keyword argument.
209   
210   Let us print out sensitivities and specificities of our classifiers in
211   part of :download:`statExamples.py <code/statExamples.py>`::
212   
213       cm = Orange.evaluation.scoring.confusion_matrices(res)
214       print
215       print "method\tsens\tspec"
216       for l in range(len(learners)):
217           print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))
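
   Analogously, a quick sketch (not from the script) for some of the remaining
   measures, reusing the :obj:`cm` list from the snippet above::

       print "method\tF1\tMCC"
       for l in range(len(learners)):
           print "%s\t%5.3f\t%5.3f" % (learners[l].name,
               Orange.evaluation.scoring.F1(cm[l]),
               Orange.evaluation.scoring.MCC(cm[l]))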
218   
219ROC Analysis
220============
221
222`Receiver Operating Characteristic \
223<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
(ROC) analysis was initially developed for
binary (two-class) problems, and there is no consensus on how to apply it to
multi-class problems, nor is it entirely clear how to do ROC analysis after
cross-validation and similar multiple-sampling techniques. If you are
interested in the area under the curve, the function AUC deals with these
problems as described below.
230
231.. autofunction:: AUC
232   
233   .. attribute:: AUC.ByWeightedPairs (or 0)
234     
235      Computes AUC for each pair of classes (ignoring instances of all other
236      classes) and averages the results, weighting them by the number of
237      pairs of instances from these two classes (e.g. by the product of
238      probabilities of the two classes). AUC computed in this way still
      behaves as a concordance index, i.e., it gives the probability that two
240      randomly chosen instances from different classes will be correctly
241      recognized (this is of course true only if the classifier knows
242      from which two classes the instances came).
243   
244   .. attribute:: AUC.ByPairs (or 1)
245   
      Similar to the above, except that the average over class pairs is not
      weighted. This AUC is, like the binary AUC, independent of class
      distributions, but it is no longer related to the concordance index.
249     
250   .. attribute:: AUC.WeightedOneAgainstAll (or 2)
251     
252      For each class, it computes AUC for this class against all others (that
      is, treating the other classes as one class). The AUCs are then averaged,
      weighted by the class probabilities. This is related to a concordance index
      in which we test the classifier's (average) capability of distinguishing the
256      instances from a specified class from those that come from other classes.
257      Unlike the binary AUC, the measure is not independent of class
258      distributions.
259     
260   .. attribute:: AUC.OneAgainstAll (or 3)
261   
262      As above, except that the average is not weighted.
263   
264   In case of multiple folds (for instance if the data comes from cross
265   validation), the computation goes like this. When computing the partial
266   AUCs for individual pairs of classes or singled-out classes, AUC is
267   computed for each fold separately and then averaged (ignoring the number
268   of instances in each fold, it's just a simple average). However, if a
269   certain fold doesn't contain any instances of a certain class (from the
   pair), the partial AUC is computed treating the results as if they came
   from a single fold. This is not really correct since the class
   probabilities from different folds are not necessarily comparable;
   however, as this will most often occur in leave-one-out experiments,
   comparability shouldn't be a problem.
275   
   Computing and printing out the AUCs looks just like printing out
277   classification accuracies (except that we call AUC instead of
278   CA, of course)::
279   
280       AUCs = Orange.evaluation.scoring.AUC(res)
281       for l in range(len(learners)):
282           print "%10s: %5.3f" % (learners[l].name, AUCs[l])
283           
284   For vehicle, you can run exactly this same code; it will compute AUCs
285   for all pairs of classes and return the average weighted by probabilities
286   of pairs. Or, you can specify the averaging method yourself, like this::
287   
288       AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
289   
290   The following snippet tries out all four. (We don't claim that this is
291   how the function needs to be used; it's better to stay with the default.)::
292   
293       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
294       print " " *25 + "  \tbayes\ttree\tmajority"
295       for i in range(4):
296           AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
297           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
298   
299   As you can see from the output::
300   
301                                   bayes   tree    majority
302              by pairs, weighted:  0.789   0.871   0.500
303                        by pairs:  0.791   0.872   0.500
304           one vs. all, weighted:  0.783   0.800   0.500
305                     one vs. all:  0.783   0.800   0.500
306
307.. autofunction:: AUC_single
308
309.. autofunction:: AUC_pair
310
311.. autofunction:: AUC_matrix
312
313The remaining functions, which plot the curves and statistically compare
314them, require that the results come from a test with a single iteration,
315and they always compare one chosen class against all others. If you have
316cross validation results, you can either use split_by_iterations to split the
317results by folds, call the function for each fold separately and then sum
318the results up however you see fit, or you can set the ExperimentResults'
attribute number_of_iterations to 1 to cheat the function, at your own
risk regarding the statistical correctness. For multi-class
problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
attribute's baseValue at the time when results were computed. If baseValue
was not given at that time, 1 (that is, the second class) is used as the default.
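
As a sketch (not part of the script), the per-fold route for cross-validation
results could look like this, with the averaging of the per-fold results left
to you::

    # split cross-validation results into one ExperimentResults object per fold
    folds = Orange.evaluation.scoring.split_by_iterations(res)
    # score each fold separately, e.g. with AUCWilcoxon
    fold_AUCs = [Orange.evaluation.scoring.AUCWilcoxon(fold) for fold in folds]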
324
325We shall use the following code to prepare suitable experimental results::
326
327    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
328    train = voting.selectref(ri2, 0)
329    test = voting.selectref(ri2, 1)
330    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
331
332
333.. autofunction:: AUCWilcoxon
334
335.. autofunction:: compute_ROC
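
For orientation, a minimal sketch of calling compute_ROC on the single-iteration
results prepared above (:obj:`res1`)::

    # one curve per learner; each point on a curve is (1-specificity, sensitivity)
    curves = Orange.evaluation.scoring.compute_ROC(res1)
    for learner, curve in zip(learners, curves):
        print "%s: %i points on the ROC curve" % (learner.name, len(curve))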
336
337Comparison of Algorithms
338------------------------
339
340.. autofunction:: McNemar
341
342.. autofunction:: McNemar_of_two
343
344==========
345Regression
346==========
347
General Measures of Quality
===========================
350
351Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:
353
354.. image:: files/statRegression.png
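
In symbols, these correspond to the following standard definitions (a reference
sketch; :math:`y_i` is the actual value, :math:`\\hat{y}_i` the prediction and
:math:`\\bar{y}` the mean of the actual values over the :math:`N` test instances):

.. math::

   \\mathrm{MSE} = \\frac{1}{N}\\sum_i (y_i - \\hat{y}_i)^2, \\qquad
   \\mathrm{RMSE} = \\sqrt{\\mathrm{MSE}}, \\qquad
   \\mathrm{MAE} = \\frac{1}{N}\\sum_i |y_i - \\hat{y}_i|

   \\mathrm{RSE} = \\frac{\\sum_i (y_i - \\hat{y}_i)^2}{\\sum_i (y_i - \\bar{y})^2}, \\qquad
   \\mathrm{RRSE} = \\sqrt{\\mathrm{RSE}}, \\qquad
   \\mathrm{RAE} = \\frac{\\sum_i |y_i - \\hat{y}_i|}{\\sum_i |y_i - \\bar{y}|}, \\qquad
   R^2 = 1 - \\mathrm{RSE}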
355
356.. autofunction:: MSE
357
358.. autofunction:: RMSE
359
360.. autofunction:: MAE
361
362.. autofunction:: RSE
363
364.. autofunction:: RRSE
365
366.. autofunction:: RAE
367
368.. autofunction:: R2
369
370The following code (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to
371score several regression methods.
372
373.. literalinclude:: code/statExamplesRegression.py
374
375The code above produces the following output::
376
377    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
378    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
379    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
380    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
381    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715
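
As a quick sanity check consistent with the definitions above, RMSE is the
square root of MSE (for maj, sqrt(84.585) is approximately 9.197), RRSE is the
square root of RSE, and R2 equals 1 - RSE.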
382   
==================
Plotting Functions
==================
386
387.. autofunction:: graph_ranks
388
The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:
390
391.. literalinclude:: code/statExamplesGraphRanks.py
392
The code produces the following graph:
394
395.. image:: files/statExamplesGraphRanks1.png
396
397.. autofunction:: compute_CD
398
399.. autofunction:: compute_friedman
400
401=================
402Utility Functions
403=================
404
405.. autofunction:: split_by_iterations
406
407=====================================
408Scoring for multilabel classification
409=====================================
410
Multi-label classification requires different metrics than those used in traditional single-label
classification. This module presents the various metrics that have been proposed in the literature.
Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
414:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier
415and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.
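
In this notation, the definitions commonly used in the cited literature (see the
references below) are, as a sketch:

.. math::

   \\mathrm{HammingLoss} = \\frac{1}{|D|}\\sum_{i=1}^{|D|} \\frac{|Y_i \\triangle Z_i|}{|L|}, \\qquad
   \\mathrm{Accuracy} = \\frac{1}{|D|}\\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|}

   \\mathrm{Precision} = \\frac{1}{|D|}\\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Z_i|}, \\qquad
   \\mathrm{Recall} = \\frac{1}{|D|}\\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i|}

where :math:`\\triangle` denotes the symmetric difference of two sets and
:math:`L` is the set of all labels.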
416
417.. autofunction:: mlc_hamming_loss
418.. autofunction:: mlc_accuracy
419.. autofunction:: mlc_precision
420.. autofunction:: mlc_recall
421
422So, let's compute all this and print it out (part of
423:download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses
424:download:`emotions.tab <code/emotions.tab>`):
425
426.. literalinclude:: code/mlc-evaluate.py
427   :lines: 1-15
428
429The output should look like this::
430
431    loss= [0.9375]
432    accuracy= [0.875]
433    precision= [1.0]
434    recall= [0.875]
435
436References
437==========
438
Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.
448
449"""
450
451import statc, operator, math
452from operator import add
453import numpy
454
455import Orange
456
457
458#### Private stuff
459
460def log2(x):
461    """Calculate logarithm in base 2."""
462    return math.log(x)/math.log(2)
463
464def check_non_zero(x):
465    """Throw Value Error when x = 0.0."""
466    if x==0.0:
467        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
468
469def gettotweight(res):
470    """Sum all the weights"""
471    totweight = reduce(lambda x, y: x+y.weight, res.results, 0)
472    if totweight==0.0:
473        raise ValueError, "Cannot compute the score: sum of weights is 0.0."
474    return totweight
475
476def gettotsize(res):
477    """ Get number of result instances """
478    if len(res.results):
479        return len(res.results)
480    else:
481        raise ValueError, "Cannot compute the score: no examples."
482
483
484def split_by_iterations(res):
485    """ Splits ExperimentResults of multiple iteratation test into a list
486    of ExperimentResults, one for each iteration.
487    """
488    if res.number_of_iterations < 2:
489        return [res]
490       
491    ress = [Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded)
492            for i in range(res.number_of_iterations)]
493    for te in res.results:
494        ress[te.iteration_number].results.append(te)
495    return ress
496
497def split_by_classifiers(res):
498    """ Splites an instance of :obj:`ExperimentResults` into a list of
499    :obj:`ExperimentResults`, one for each classifier.
500    """
501    split_res = []
502    for i in range(len(res.classifierNames)):
503        r = Orange.evaluation.testing.ExperimentResults(res.numberOfIterations,
504                    [res.classifierNames[i]], res.classValues,
505                    weights=res.weights, baseClass=res.baseClass,
506                    classifiers=[res.classifiers[i]] if res.classifiers else [])
507        r.results = []
508        for te in res.results:
509            r.results.append(Orange.evaluation.testing.TestedExample(te.iterationNumber,
510                                te.actualClass, n=1, weight=te.weight))
511            r.results[-1].classes = [te.classes[i]]
512            r.results[-1].probabilities = [te.probabilities[i]]
513        split_res.append(r)
514    return split_res
515
516
517def class_probabilities_from_res(res, **argkw):
518    """Calculate class probabilities"""
519    probs = [0.0] * len(res.class_values)
520    if argkw.get("unweighted", 0) or not res.weights:
521        for tex in res.results:
522            probs[int(tex.actual_class)] += 1.0
523        totweight = gettotsize(res)
524    else:
525        totweight = 0.0
526        for tex in res.results:
527            probs[tex.actual_class] += tex.weight
528            totweight += tex.weight
529        check_non_zero(totweight)
530    return [prob/totweight for prob in probs]
531
532
533def statistics_by_folds(stats, foldN, reportSE, iterationIsOuter):
534    # remove empty folds, turn the matrix so that learner is outer
535    if iterationIsOuter:
536        if not stats:
537            raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
538        number_of_learners = len(stats[0])
539        stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN))
540        stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)]
541    else:
542        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats]
543
544    if not stats:
545        raise ValueError, "Cannot compute the score: no classifiers"
546    if not stats[0]:
547        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
548   
549    if reportSE:
550        return [(statc.mean(x), statc.sterr(x)) for x in stats]
551    else:
552        return [statc.mean(x) for x in stats]
553   
554def ME(res, **argkw):
555    MEs = [0.0]*res.number_of_learners
556
557    if argkw.get("unweighted", 0) or not res.weights:
558        for tex in res.results:
559            MEs = map(lambda res, cls, ac = float(tex.actual_class):
560                      res + abs(float(cls) - ac), MEs, tex.classes)
561        totweight = gettotsize(res)
562    else:
563        for tex in res.results:
564            MEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
565                       res + tw*abs(float(cls) - ac), MEs, tex.classes)
566        totweight = gettotweight(res)
567
568    return [x/totweight for x in MEs]
569
570MAE = ME
571
572#########################################################################
573# PERFORMANCE MEASURES:
574# Scores for evaluation of numeric predictions
575
576def check_argkw(dct, lst):
577    """check_argkw(dct, lst) -> returns true if any items have non-zero value in dct"""
578    return reduce(lambda x,y: x or y, [dct.get(k, 0) for k in lst])
579
580def regression_error(res, **argkw):
581    """regression_error(res) -> regression error (default: MSE)"""
582    if argkw.get("SE", 0) and res.number_of_iterations > 1:
583        # computes the scores for each iteration, then averages
584        scores = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
585        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
586            norm = [0.0] * res.number_of_iterations
587
588        nIter = [0]*res.number_of_iterations       # counts examples in each iteration
589        a = [0]*res.number_of_iterations           # average class in each iteration
590        for tex in res.results:
591            nIter[tex.iteration_number] += 1
592            a[tex.iteration_number] += float(tex.actual_class)
593        a = [a[i]/nIter[i] for i in range(res.number_of_iterations)]
594
595        if argkw.get("unweighted", 0) or not res.weights:
            # iterate across test cases
597            for tex in res.results:
598                ai = float(tex.actual_class)
                # (instances were already counted into nIter in the first pass above)
600
601                # compute normalization, if required
602                if argkw.get("norm-abs", 0):
603                    norm[tex.iteration_number] += abs(ai - a[tex.iteration_number])
604                elif argkw.get("norm-sqr", 0):
605                    norm[tex.iteration_number] += (ai - a[tex.iteration_number])**2
606
                # iterate across results of different regressors
608                for i, cls in enumerate(tex.classes):
609                    if argkw.get("abs", 0):
610                        scores[i][tex.iteration_number] += abs(float(cls) - ai)
611                    else:
612                        scores[i][tex.iteration_number] += (float(cls) - ai)**2
        else: # weighted case
614            raise NotImplementedError, "weighted error scores with SE not implemented yet"
615
616        if argkw.get("norm-abs") or argkw.get("norm-sqr"):
617            scores = [[x/n for x, n in zip(y, norm)] for y in scores]
618        else:
619            scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores]
620
621        if argkw.get("R2"):
622            scores = [[1.0 - x for x in y] for y in scores]
623
624        if argkw.get("sqrt", 0):
625            scores = [[math.sqrt(x) for x in y] for y in scores]
626
627        return [(statc.mean(x), statc.std(x)) for x in scores]
628       
629    else: # single iteration (testing on a single test set)
630        scores = [0.0] * res.number_of_learners
631        norm = 0.0
632
633        if argkw.get("unweighted", 0) or not res.weights:
634            a = sum([tex.actual_class for tex in res.results]) \
635                / len(res.results)
636            for tex in res.results:
637                if argkw.get("abs", 0):
638                    scores = map(lambda res, cls, ac = float(tex.actual_class):
639                                 res + abs(float(cls) - ac), scores, tex.classes)
640                else:
641                    scores = map(lambda res, cls, ac = float(tex.actual_class):
642                                 res + (float(cls) - ac)**2, scores, tex.classes)
643
644                if argkw.get("norm-abs", 0):
645                    norm += abs(tex.actual_class - a)
646                elif argkw.get("norm-sqr", 0):
647                    norm += (tex.actual_class - a)**2
648            totweight = gettotsize(res)
649        else:
            # weighted case (note: only the squared-error variant is handled here)
            for tex in res.results:
                scores = map(lambda res, cls, ac = float(tex.actual_class),
                             tw = tex.weight:
                             res + tw * (float(cls) - ac)**2, scores, tex.classes)
655            totweight = gettotweight(res)
656
657        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
658            scores = [s/norm for s in scores]
659        else: # normalize by number of instances (or sum of weights)
660            scores = [s/totweight for s in scores]
661
662        if argkw.get("R2"):
663            scores = [1.0 - s for s in scores]
664
665        if argkw.get("sqrt", 0):
666            scores = [math.sqrt(x) for x in scores]
667
668        return scores
669
670def MSE(res, **argkw):
671    """ Computes mean-squared error. """
672    return regression_error(res, **argkw)
673   
674def RMSE(res, **argkw):
675    """ Computes root mean-squared error. """
676    argkw.setdefault("sqrt", True)
677    return regression_error(res, **argkw)
678
679def MAE(res, **argkw):
680    """ Computes mean absolute error. """
681    argkw.setdefault("abs", True)
682    return regression_error(res, **argkw)
683
684def RSE(res, **argkw):
685    """ Computes relative squared error. """
686    argkw.setdefault("norm-sqr", True)
687    return regression_error(res, **argkw)
688
689def RRSE(res, **argkw):
690    """ Computes relative squared error. """
691    argkw.setdefault("norm-sqr", True)
692    argkw.setdefault("sqrt", True)
693    return regression_error(res, **argkw)
694
695def RAE(res, **argkw):
696    """ Computes relative absolute error. """
697    argkw.setdefault("abs", True)
698    argkw.setdefault("norm-abs", True)
699    return regression_error(res, **argkw)
700
701def R2(res, **argkw):
702    """ Computes the coefficient of determination, R-squared. """
703    argkw.setdefault("norm-sqr", True)
704    argkw.setdefault("R2", True)
705    return regression_error(res, **argkw)
706
707def MSE_old(res, **argkw):
708    """MSE(res) -> mean-squared error"""
709    if argkw.get("SE", 0) and res.number_of_iterations > 1:
710        MSEs = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
711        nIter = [0]*res.number_of_iterations
712        if argkw.get("unweighted", 0) or not res.weights:
713            for tex in res.results:
714                ac = float(tex.actual_class)
715                nIter[tex.iteration_number] += 1
716                for i, cls in enumerate(tex.classes):
717                    MSEs[i][tex.iteration_number] += (float(cls) - ac)**2
718        else:
719            raise ValueError, "weighted RMSE with SE not implemented yet"
720        MSEs = [[x/ni for x, ni in zip(y, nIter)] for y in MSEs]
721        if argkw.get("sqrt", 0):
722            MSEs = [[math.sqrt(x) for x in y] for y in MSEs]
723        return [(statc.mean(x), statc.std(x)) for x in MSEs]
724       
725    else:
726        MSEs = [0.0]*res.number_of_learners
727        if argkw.get("unweighted", 0) or not res.weights:
728            for tex in res.results:
729                MSEs = map(lambda res, cls, ac = float(tex.actual_class):
730                           res + (float(cls) - ac)**2, MSEs, tex.classes)
731            totweight = gettotsize(res)
732        else:
733            for tex in res.results:
734                MSEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
735                           res + tw * (float(cls) - ac)**2, MSEs, tex.classes)
736            totweight = gettotweight(res)
737
738        if argkw.get("sqrt", 0):
739            MSEs = [math.sqrt(x) for x in MSEs]
740        return [x/totweight for x in MSEs]
741
742def RMSE_old(res, **argkw):
743    """RMSE(res) -> root mean-squared error"""
744    argkw.setdefault("sqrt", 1)
745    return MSE_old(res, **argkw)
746
747#########################################################################
748# PERFORMANCE MEASURES:
749# Scores for evaluation of classifiers
750
751def CA(res, reportSE = False, **argkw):
752    """ Computes classification accuracy, i.e. percentage of matches between
753    predicted and actual class. The function returns a list of classification
754    accuracies of all classifiers tested. If reportSE is set to true, the list
755    will contain tuples with accuracies and standard errors.
756   
757    If results are from multiple repetitions of experiments (like those
758    returned by Orange.evaluation.testing.crossValidation or
759    Orange.evaluation.testing.proportionTest) the
    standard error (SE) is estimated from the deviation of classification
    accuracy across folds (SD), as SE = SD/sqrt(N), where N is the number
    of repetitions (e.g. the number of folds).
763   
    If results are from a single repetition, we assume independence of
    instances and treat the classification accuracy as distributed according
    to a binomial distribution. This can be approximated by a normal distribution,
    so we report the SE as sqrt(CA*(1-CA)/N), where CA is the classification
    accuracy and N is the number of test instances.
769   
770    Instead of ExperimentResults, this function can be given a list of
771    confusion matrices (see below). Standard errors are in this case
772    estimated using the latter method.
773    """
774    if res.number_of_iterations==1:
775        if type(res)==ConfusionMatrix:
            div = res.TP+res.FN+res.FP+res.TN
            check_non_zero(div)
            ca = [(res.TP+res.TN)/div]
779        else:
780            CAs = [0.0]*res.number_of_learners
781            if argkw.get("unweighted", 0) or not res.weights:
782                totweight = gettotsize(res)
783                for tex in res.results:
784                    CAs = map(lambda res, cls: res+(cls==tex.actual_class), CAs, tex.classes)
785            else:
786                totweight = 0.
787                for tex in res.results:
788                    CAs = map(lambda res, cls: res+(cls==tex.actual_class and tex.weight), CAs, tex.classes)
789                    totweight += tex.weight
790            check_non_zero(totweight)
791            ca = [x/totweight for x in CAs]
792           
793        if reportSE:
            return [(x, math.sqrt(x*(1-x)/totweight)) for x in ca]
795        else:
796            return ca
797       
798    else:
799        CAsByFold = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
800        foldN = [0.0]*res.number_of_iterations
801
802        if argkw.get("unweighted", 0) or not res.weights:
803            for tex in res.results:
804                for lrn in range(res.number_of_learners):
805                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class)
806                foldN[tex.iteration_number] += 1
807        else:
808            for tex in res.results:
809                for lrn in range(res.number_of_learners):
810                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and tex.weight
811                foldN[tex.iteration_number] += tex.weight
812
813        return statistics_by_folds(CAsByFold, foldN, reportSE, False)
814
815
816# Obsolete, but kept for compatibility
817def CA_se(res, **argkw):
818    return CA(res, True, **argkw)
819
820
821def AP(res, reportSE = False, **argkw):
822    """ Computes the average probability assigned to the correct class. """
823    if res.number_of_iterations == 1:
824        APs=[0.0]*res.number_of_learners
825        if argkw.get("unweighted", 0) or not res.weights:
826            for tex in res.results:
827                APs = map(lambda res, probs: res + probs[tex.actual_class], APs, tex.probabilities)
828            totweight = gettotsize(res)
829        else:
830            totweight = 0.
831            for tex in res.results:
832                APs = map(lambda res, probs: res + probs[tex.actual_class]*tex.weight, APs, tex.probabilities)
833                totweight += tex.weight
834        check_non_zero(totweight)
835        return [AP/totweight for AP in APs]
836
837    APsByFold = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
838    foldN = [0.0] * res.number_of_iterations
839    if argkw.get("unweighted", 0) or not res.weights:
840        for tex in res.results:
841            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class], APsByFold[tex.iteration_number], tex.probabilities)
842            foldN[tex.iteration_number] += 1
843    else:
844        for tex in res.results:
845            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class] * tex.weight, APsByFold[tex.iteration_number], tex.probabilities)
846            foldN[tex.iteration_number] += tex.weight
847
848    return statistics_by_folds(APsByFold, foldN, reportSE, True)
849
850
851def Brier_score(res, reportSE = False, **argkw):
852    """ Computes the Brier's score, defined as the average (over test examples)
853    of sumx(t(x)-p(x))2, where x is a class, t(x) is 1 for the correct class
854    and 0 for the others, and p(x) is the probability that the classifier
855    assigned to the class x
856    """
857    # Computes an average (over examples) of sum_x(t(x) - p(x))^2, where
858    #    x is class,
859    #    t(x) is 0 for 'wrong' and 1 for 'correct' class
860    #    p(x) is predicted probabilty.
861    # There's a trick: since t(x) is zero for all classes but the
862    # correct one (c), we compute the sum as sum_x(p(x)^2) - 2*p(c) + 1
863    # Since +1 is there for each example, it adds 1 to the average
864    # We skip the +1 inside the sum and add it just at the end of the function
865    # We take max(result, 0) to avoid -0.0000x due to rounding errors
866
867    if res.number_of_iterations == 1:
868        MSEs=[0.0]*res.number_of_learners
869        if argkw.get("unweighted", 0) or not res.weights:
870            totweight = 0.0
871            for tex in res.results:
872                MSEs = map(lambda res, probs:
873                           res + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities)
874                totweight += tex.weight
875        else:
876            for tex in res.results:
877                MSEs = map(lambda res, probs:
878                           res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities)
879            totweight = gettotweight(res)
880        check_non_zero(totweight)
881        if reportSE:
882            return [(max(x/totweight+1.0, 0), 0) for x in MSEs]  ## change this, not zero!!!
883        else:
884            return [max(x/totweight+1.0, 0) for x in MSEs]
885
886    BSs = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
887    foldN = [0.] * res.number_of_iterations
888
889    if argkw.get("unweighted", 0) or not res.weights:
890        for tex in res.results:
891            BSs[tex.iteration_number] = map(lambda rr, probs:
892                       rr + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities)
893            foldN[tex.iteration_number] += 1
894    else:
895        for tex in res.results:
896            BSs[tex.iteration_number] = map(lambda res, probs:
897                       res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities)
898            foldN[tex.iteration_number] += tex.weight
899
900    stats = statistics_by_folds(BSs, foldN, reportSE, True)
901    if reportSE:
902        return [(x+1.0, y) for x, y in stats]
903    else:
904        return [x+1.0 for x in stats]
905
906def BSS(res, **argkw):
907    return [1-x/2 for x in apply(Brier_score, (res, ), argkw)]
908
909def IS_ex(Pc, P):
910    "Pc aposterior probability, P aprior"
911    if (Pc>=P):
912        return -log2(P)+log2(Pc)
913    else:
914        return -(-log2(1-P)+log2(1-Pc))
915   
916def IS(res, apriori=None, reportSE = False, **argkw):
917    """ Computes the information score as defined by
918    `Kononenko and Bratko (1991) \
919    <http://www.springerlink.com/content/g5p7473160476612/>`_.
920    Argument :obj:`apriori` gives the apriori class
921    distribution; if it is omitted, the class distribution is computed from
922    the actual classes of examples in :obj:`res`.
923    """
924    if not apriori:
925        apriori = class_probabilities_from_res(res)
926
927    if res.number_of_iterations==1:
928        ISs = [0.0]*res.number_of_learners
929        if argkw.get("unweighted", 0) or not res.weights:
930            for tex in res.results:
931              for i in range(len(tex.probabilities)):
932                    cls = tex.actual_class
933                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls])
934            totweight = gettotsize(res)
935        else:
936            for tex in res.results:
937              for i in range(len(tex.probabilities)):
938                    cls = tex.actual_class
939                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
940            totweight = gettotweight(res)
941        if reportSE:
942            return [(IS/totweight,0) for IS in ISs]
943        else:
944            return [IS/totweight for IS in ISs]
945
946       
947    ISs = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
948    foldN = [0.] * res.number_of_iterations
949
950    # compute info scores for each fold   
951    if argkw.get("unweighted", 0) or not res.weights:
952        for tex in res.results:
953            for i in range(len(tex.probabilities)):
954                cls = tex.actual_class
955                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls])
956            foldN[tex.iteration_number] += 1
957    else:
958        for tex in res.results:
959            for i in range(len(tex.probabilities)):
960                cls = tex.actual_class
961                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
962            foldN[tex.iteration_number] += tex.weight
963
964    return statistics_by_folds(ISs, foldN, reportSE, False)
965
966
967def Friedman(res, statistics, **argkw):
968    sums = None
969    for ri in split_by_iterations(res):
970        ranks = statc.rankdata(apply(statistics, (ri,), argkw))
971        if sums:
972            sums = sums and [ranks[i]+sums[i] for i in range(k)]
973        else:
974            sums = ranks
975            k = len(sums)
976    N = res.number_of_iterations
977    k = len(sums)
978    T = sum([x*x for x in sums])
979    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
980    return F, statc.chisqprob(F, k-1)
981   
982
983def Wilcoxon(res, statistics, **argkw):
984    res1, res2 = [], []
985    for ri in split_by_iterations(res):
986        stats = apply(statistics, (ri,), argkw)
987        if (len(stats) != 2):
988            raise TypeError, "Wilcoxon compares two classifiers, no more, no less"
989        res1.append(stats[0])
990        res2.append(stats[1])
991    return statc.wilcoxont(res1, res2)
992
993def rank_difference(res, statistics, **argkw):
994    if not res.results:
995        raise TypeError, "no experiments"
996
997    k = len(res.results[0].classes)
998    if (k<2):
999        raise TypeError, "nothing to compare (less than two classifiers given)"
1000    if k==2:
1001        return apply(Wilcoxon, (res, statistics), argkw)
1002    else:
1003        return apply(Friedman, (res, statistics), argkw)
1004   
1005class ConfusionMatrix:
1006    """ Class ConfusionMatrix stores data about false and true
1007    predictions compared to real class. It stores the number of
1008    True Negatives, False Positive, False Negatives and True Positives.
1009    """
1010    def __init__(self):
1011        self.TP = self.FN = self.FP = self.TN = 0.0
1012
1013    def addTFPosNeg(self, predictedPositive, isPositive, weight = 1.0):
1014        if predictedPositive:
1015            if isPositive:
1016                self.TP += weight
1017            else:
1018                self.FP += weight
1019        else:
1020            if isPositive:
1021                self.FN += weight
1022            else:
1023                self.TN += weight
1024
1025
1026def confusion_matrices(res, classIndex=-1, **argkw):
1027    """ This function can compute two different forms of confusion matrix:
1028    one in which a certain class is marked as positive and the other(s)
1029    negative, and another in which no class is singled out. The way to
1030    specify what we want is somewhat confusing due to backward
1031    compatibility issues.
1032    """
1033    tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)]
1034   
1035    if classIndex<0:
1036        numberOfClasses = len(res.class_values)
1037        if classIndex < -1 or numberOfClasses > 2:
1038            cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)]
1039            if argkw.get("unweighted", 0) or not res.weights:
1040                for tex in res.results:
1041                    trueClass = int(tex.actual_class)
1042                    for li, pred in enumerate(tex.classes):
1043                        predClass = int(pred)
1044                        if predClass < numberOfClasses:
1045                            cm[li][trueClass][predClass] += 1
1046            else:
                for tex in res.results:
                    trueClass = int(tex.actual_class)
                    for li, pred in enumerate(tex.classes):
1050                        predClass = int(pred)
1051                        if predClass < numberOfClasses:
1052                            cm[li][trueClass][predClass] += tex.weight
1053            return cm
1054           
1055        elif res.baseClass>=0:
1056            classIndex = res.baseClass
1057        else:
1058            classIndex = 1
1059           
1060    cutoff = argkw.get("cutoff")
1061    if cutoff:
1062        if argkw.get("unweighted", 0) or not res.weights:
1063            for lr in res.results:
1064                isPositive=(lr.actual_class==classIndex)
1065                for i in range(res.number_of_learners):
1066                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive)
1067        else:
1068            for lr in res.results:
1069                isPositive=(lr.actual_class==classIndex)
1070                for i in range(res.number_of_learners):
1071                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight)
1072    else:
1073        if argkw.get("unweighted", 0) or not res.weights:
1074            for lr in res.results:
1075                isPositive=(lr.actual_class==classIndex)
1076                for i in range(res.number_of_learners):
1077                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive)
1078        else:
1079            for lr in res.results:
1080                isPositive=(lr.actual_class==classIndex)
1081                for i in range(res.number_of_learners):
1082                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight)
1083    return tfpns
1084
1085
1086# obsolete (renamed)
1087compute_confusion_matrices = confusion_matrices
1088
1089
1090def confusion_chi_square(confusionMatrix):
1091    dim = len(confusionMatrix)
1092    rowPriors = [sum(r) for r in confusionMatrix]
1093    colPriors = [sum([r[i] for r in confusionMatrix]) for i in range(dim)]
1094    total = sum(rowPriors)
1095    rowPriors = [r/total for r in rowPriors]
1096    colPriors = [r/total for r in colPriors]
1097    ss = 0
1098    for ri, row in enumerate(confusionMatrix):
1099        for ci, o in enumerate(row):
1100            e = total * rowPriors[ri] * colPriors[ci]
1101            if not e:
1102                return -1, -1, -1
1103            ss += (o-e)**2 / e
1104    df = (dim-1)**2
1105    return ss, df, statc.chisqprob(ss, df)
1106       
1107   
1108def sens(confm):
1109    """Return sensitivity (recall rate) over the given confusion matrix."""
1110    if type(confm) == list:
1111        return [sens(cm) for cm in confm]
1112    else:
1113        tot = confm.TP+confm.FN
1114        if tot < 1e-6:
1115            import warnings
1116            warnings.warn("Can't compute sensitivity: one or both classes have no instances")
1117            return -1
1118
1119        return confm.TP/tot
1120
1121def recall(confm):
1122    """Return recall rate (sensitivity) over the given confusion matrix."""
1123    return sens(confm)
1124
1125
1126def spec(confm):
1127    """Return specificity over the given confusion matrix."""
1128    if type(confm) == list:
1129        return [spec(cm) for cm in confm]
1130    else:
1131        tot = confm.FP+confm.TN
1132        if tot < 1e-6:
1133            import warnings
1134            warnings.warn("Can't compute specificity: one or both classes have no instances")
1135            return -1
1136        return confm.TN/tot
1137 
1138
1139def PPV(confm):
1140    """Return positive predictive value (precision rate) over the given confusion matrix."""
1141    if type(confm) == list:
1142        return [PPV(cm) for cm in confm]
1143    else:
1144        tot = confm.TP+confm.FP
1145        if tot < 1e-6:
1146            import warnings
1147            warnings.warn("Can't compute PPV: one or both classes have no instances")
1148            return -1
1149        return confm.TP/tot
1150
1151
1152def precision(confm):
1153    """Return precision rate (positive predictive value) over the given confusion matrix."""
1154    return PPV(confm)
1155
1156
1157def NPV(confm):
1158    """Return negative predictive value over the given confusion matrix."""
1159    if type(confm) == list:
1160        return [NPV(cm) for cm in confm]
1161    else:
1162        tot = confm.FN+confm.TN
1163        if tot < 1e-6:
1164            import warnings
1165            warnings.warn("Can't compute NPV: one or both classes have no instances")
1166            return -1
1167        return confm.TN/tot
1168
1169def F1(confm):
1170    """Return F1 score (harmonic mean of precision and recall) over the given confusion matrix."""
1171    if type(confm) == list:
1172        return [F1(cm) for cm in confm]
1173    else:
1174        p = precision(confm)
1175        r = recall(confm)
1176        if p + r > 0:
1177            return 2. * p * r / (p + r)
1178        else:
1179            import warnings
1180            warnings.warn("Can't compute F1: P + R is zero or not defined")
1181            return -1
1182
1183def Falpha(confm, alpha=1.0):
1184    """Return the alpha-mean of precision and recall over the given confusion matrix."""
1185    if type(confm) == list:
1186        return [Falpha(cm, alpha=alpha) for cm in confm]
1187    else:
1188        p = precision(confm)
1189        r = recall(confm)
1190        return (1. + alpha) * p * r / (alpha * p + r)
1191   
1192def MCC(confm):
1193    '''
    Return the Matthews correlation coefficient over the given confusion matrix.
1195
1196    MCC is calculated as follows:
1197    MCC = (TP*TN - FP*FN) / sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) )
1198   
1199    [1] Matthews, B.W., Comparison of the predicted and observed secondary
1200    structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451
1201
1202    code by Boris Gorelik
1203    '''
1204    if type(confm) == list:
1205        return [MCC(cm) for cm in confm]
1206    else:
1207        truePositive = confm.TP
1208        trueNegative = confm.TN
1209        falsePositive = confm.FP
1210        falseNegative = confm.FN
1211         
1212        try:   
1213            r = (((truePositive * trueNegative) - (falsePositive * falseNegative))/ 
1214                math.sqrt(  (truePositive + falsePositive)  * 
1215                ( truePositive + falseNegative ) * 
1216                ( trueNegative + falsePositive ) * 
1217                ( trueNegative + falseNegative ) )
1218                )
1219        except ZeroDivisionError:
            # Zero division occurs when one of the sums in the denominator
            # is zero, e.g. when the problem contains only one
            # type of class.
1223            import warnings
1224            warnings.warn("Can't compute MCC: TP or TN is zero or not defined")
1225            r = None
1226
1227    return r
1228
1229def scotts_pi(confm, bIsListOfMatrices=True):
1230   """Compute Scott's Pi for measuring inter-rater agreement for nominal data
1231
1232   http://en.wikipedia.org/wiki/Scott%27s_Pi
1233   Scott's Pi is a statistic for measuring inter-rater reliability for nominal
1234   raters.
1235
1236   @param confm: confusion matrix, or list of confusion matrices. To obtain
1237                           non-binary confusion matrix, call
1238                           Orange.evaluation.scoring.compute_confusion_matrices and set the
1239                           classIndex parameter to -2.
1240   @param bIsListOfMatrices: specifies whether confm is list of matrices.
1241                           This function needs to operate on non-binary
1242                           confusion matrices, which are represented by python
1243                           lists, therefore one needs a way to distinguish
1244                           between a single matrix and list of matrices
1245   """
1246
1247   if bIsListOfMatrices:
1248       try:
1249           return [scotts_pi(cm, bIsListOfMatrices=False) for cm in confm]
1250       except TypeError:
1251           # Nevermind the parameter, maybe this is a "conventional" binary
1252           # confusion matrix and bIsListOfMatrices was specified by mistake
           return scotts_pi(confm, bIsListOfMatrices=False)
1254   else:
1255       if isinstance(confm, ConfusionMatrix):
1256           confm = numpy.array( [[confm.TP, confm.FN],
1257                   [confm.FP, confm.TN]], dtype=float)
1258       else:
1259           confm = numpy.array(confm, dtype=float)
1260
1261       marginalSumOfRows = numpy.sum(confm, axis=0)
1262       marginalSumOfColumns = numpy.sum(confm, axis=1)
1263       jointProportion = (marginalSumOfColumns + marginalSumOfRows)/ \
1264                           (2.0 * numpy.sum(confm, axis=None))
1265       # In the eq. above, 2.0 is what the Wikipedia page calls
1266       # the number of annotators. Here we have two annotators:
1267       # the observed (true) labels (annotations) and the predicted by
1268       # the learners.
1269
1270       prExpected = numpy.sum(jointProportion ** 2, axis=None)
1271       prActual = numpy.sum(numpy.diag(confm), axis=None)/numpy.sum(confm, axis=None)
1272
1273       ret = (prActual - prExpected) / (1.0 - prExpected)
1274       return ret
1275
1276def AUCWilcoxon(res, classIndex=-1, **argkw):
1277    """ Computes the area under ROC (AUC) and its standard error using
    Wilcoxon's approach proposed by Hanley and McNeil (1982). If
1279    :obj:`classIndex` is not specified, the first class is used as
1280    "the positive" and others are negative. The result is a list of
1281    tuples (aROC, standard error).
1282    """
1283    import corn
1284    useweights = res.weights and not argkw.get("unweighted", 0)
1285    problists, tots = corn.computeROCCumulative(res, classIndex, useweights)
1286
1287    results=[]
1288
1289    totPos, totNeg = tots[1], tots[0]
1290    N = totPos + totNeg
1291    for plist in problists:
1292        highPos, lowNeg = totPos, 0.0
1293        W, Q1, Q2 = 0.0, 0.0, 0.0
1294        for prob in plist:
1295            thisPos, thisNeg = prob[1][1], prob[1][0]
1296            highPos -= thisPos
1297            W += thisNeg * (highPos + thisPos/2.)
1298            Q2 += thisPos * (lowNeg**2  + lowNeg*thisNeg  + thisNeg**2 /3.)
1299            Q1 += thisNeg * (highPos**2 + highPos*thisPos + thisPos**2 /3.)
1300
1301            lowNeg += thisNeg
1302
1303        W  /= (totPos*totNeg)
1304        Q1 /= (totNeg*totPos**2)
1305        Q2 /= (totPos*totNeg**2)
1306
1307        SE = math.sqrt( (W*(1-W) + (totPos-1)*(Q1-W**2) + (totNeg-1)*(Q2-W**2)) / (totPos*totNeg) )
1308        results.append((W, SE))
1309    return results
1310
1311AROC = AUCWilcoxon # for backward compatibility, AROC is obsolete
1312
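# Illustrative usage sketch, not part of the original module. It assumes the
# usual Orange 2.x names (Orange.data.Table, Orange.classification.bayes and
# Orange.classification.majority, Orange.evaluation.testing.cross_validation)
# and the bundled voting data set.
def _example_auc_wilcoxon():
    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.majority.MajorityLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)
    for name, (auc, se) in zip(res.classifier_names, AUCWilcoxon(res)):
        print "%-8s AUC = %5.3f +- %5.3f" % (name, auc, se)
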
1313def compare_2_AUCs(res, lrn1, lrn2, classIndex=-1, **argkw):
1314    import corn
1315    return corn.compare2ROCs(res, lrn1, lrn2, classIndex, res.weights and not argkw.get("unweighted"))
1316
1317compare_2_AROCs = compare_2_AUCs # for backward compatibility, compare_2_AROCs is obsolete
1318
1319   
1320def compute_ROC(res, classIndex=-1):
1321    """ Computes a ROC curve as a list of (x, y) tuples, where x is
1322    1-specificity and y is sensitivity.
1323    """
1324    import corn
1325    problists, tots = corn.computeROCCumulative(res, classIndex)
1326
1327    results = []
1328    totPos, totNeg = tots[1], tots[0]
1329
1330    for plist in problists:
1331        curve=[(1., 1.)]
1332        TP, TN = totPos, 0.0
1333        FN, FP = 0., totNeg
1334        for prob in plist:
1335            thisPos, thisNeg = prob[1][1], prob[1][0]
1336            # thisPos moves from TP to FN
1337            TP -= thisPos
1338            FN += thisPos
1339            # thisNeg moves from FP to TN
1340            TN += thisNeg
1341            FP -= thisNeg
1342
1343            sens = TP/(TP+FN)
1344            spec = TN/(FP+TN)
1345            curve.append((1-spec, sens))
1346        results.append(curve)
1347
1348    return results   
1349
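# Illustrative sketch, not part of the original module: dumping the (x, y)
# points returned by compute_ROC to tab-separated files, one per learner.
# The file-name pattern is hypothetical.
def _example_dump_roc(res):
    for i, curve in enumerate(compute_ROC(res)):
        out = open("roc_learner_%i.tab" % i, "wt")
        for one_minus_spec, sens in curve:
            out.write("%5.3f\t%5.3f\n" % (one_minus_spec, sens))
        out.close()
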
1350## TC's implementation of algorithms, taken from:
1351## T Fawcett: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, submitted to KDD Journal.
1352def ROC_slope((P1x, P1y, P1fscore), (P2x, P2y, P2fscore)):
1353    if (P1x == P2x):
1354        return 1e300
1355    return (P1y - P2y) / (P1x - P2x)
1356
1357def ROC_add_point(P, R, keepConcavities=1):
1358    if keepConcavities:
1359        R.append(P)
1360    else:
1361        while (1):
1362            if len(R) < 2:
1363                R.append(P)
1364                return R
1365            else:
1366                T = R.pop()
1367                T2 = R[-1]
1368                if ROC_slope(T2, T) > ROC_slope(T, P):
1369                    R.append(T)
1370                    R.append(P)
1371                    return R
1372    return R
1373
1374def TC_compute_ROC(res, classIndex=-1, keepConcavities=1):
1375    import corn
1376    problists, tots = corn.computeROCCumulative(res, classIndex)
1377
1378    results = []
1379    P, N = tots[1], tots[0]
1380
1381    for plist in problists:
1382        ## corn returns the list sorted by increasing score; we need it sorted by decreasing score
1383        plist.reverse()
1384        TP = 0.0
1385        FP = 0.0
1386        curve=[]
1387        fPrev = 10e300 # "infinity" score at 0.0, 0.0
1388        for prob in plist:
1389            f = prob[0]
1390            if f <> fPrev:
1391                if P:
1392                    tpr = TP/P
1393                else:
1394                    tpr = 0.0
1395                if N:
1396                    fpr = FP/N
1397                else:
1398                    fpr = 0.0
1399                curve = ROC_add_point((fpr, tpr, fPrev), curve, keepConcavities)
1400                fPrev = f
1401            thisPos, thisNeg = prob[1][1], prob[1][0]
1402            TP += thisPos
1403            FP += thisNeg
1404        if P:
1405            tpr = TP/P
1406        else:
1407            tpr = 0.0
1408        if N:
1409            fpr = FP/N
1410        else:
1411            fpr = 0.0
1412        curve = ROC_add_point((fpr, tpr, f), curve, keepConcavities) ## ugly
1413        results.append(curve)
1414
1415    return results
1416
1417## returns a list of points at the intersection of the tangential iso-performance line and the given ROC curve
1418## for given values of FPcost, FNcost and pval
1419def TC_best_thresholds_on_ROC_curve(FPcost, FNcost, pval, curve):
1420    m = (FPcost*(1.0 - pval)) / (FNcost*pval)
1421
1422    ## put the iso-performance line in point (0.0, 1.0)
1423    x0, y0 = (0.0, 1.0)
1424    x1, y1 = (1.0, 1.0 + m)
1425    d01 = math.sqrt((x1 - x0)*(x1 - x0) + (y1 - y0)*(y1 - y0))
1426
1427    ## calculate and find the closest point to the line
1428    firstp = 1
1429    mind = 0.0
1430    a = (x0*y1 - x1*y0)
1431    closestPoints = []
1432    for (x, y, fscore) in curve:
1433        d = ((y0 - y1)*x + (x1 - x0)*y + a) / d01
1434        d = abs(d)
1435        if firstp or d < mind:
1436            mind, firstp = d, 0
1437            closestPoints = [(x, y, fscore)]
1438        else:
1439            if abs(d - mind) <= 0.0001: ## close enough
1440                closestPoints.append( (x, y, fscore) )
1441    return closestPoints         
1442
1443def frange(start, end=None, inc=None):
1444    "A range function, that does accept float increments..."
1445
1446    if end == None:
1447        end = start + 0.0
1448        start = 0.0
1449
1450    if inc == None or inc == 0:
1451        inc = 1.0
1452
1453    L = [start]
1454    while 1:
1455        next = start + len(L) * inc
1456        if inc > 0 and next >= end:
1457            L.append(end)
1458            break
1459        elif inc < 0 and next <= end:
1460            L.append(end)
1461            break
1462        L.append(next)
1463       
1464    return L
1465
1466## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1467## where ROCcurvesX is a set of ROC curves,
1468## where a single ROC curve is a list of (FP, TP, threshold) points
1469##
1470## for each (sub)set of input ROC curves
1471## returns the average ROC curve and an array of (vertical) standard deviations
1472def TC_vertical_average_ROC(ROCcurves, samples = 10):
1473    def INTERPOLATE((P1x, P1y, P1fscore), (P2x, P2y, P2fscore), X):
1474        if (P1x == P2x) or ((X > P1x) and (X > P2x)) or ((X < P1x) and (X < P2x)):
1475            raise ValueError, "assumptions for interpolation are not met: P1 = %f,%f P2 = %f,%f X = %f" % (P1x, P1y, P2x, P2y, X)
1476        dx = float(P2x) - float(P1x)
1477        dy = float(P2y) - float(P1y)
1478        m = dy/dx
1479        return P1y + m*(X - P1x)
1480
1481    def TP_FOR_FP(FPsample, ROC, npts):
1482        i = 0
1483        while i < npts - 1:
1484            (fp, _, _) = ROC[i + 1]
1485            if (fp <= FPsample):
1486                i += 1
1487            else:
1488                break
1489        (fp, tp, _) = ROC[i]
1490        if fp == FPsample:
1491            return tp
1492        elif fp < FPsample and i + 1 < len(ROC):
1493            return INTERPOLATE(ROC[i], ROC[i+1], FPsample)
1494        elif fp < FPsample and i + 1 == len(ROC): # return the last
1495            return ROC[i][1]
1496        raise ValueError, "cannot compute: TP_FOR_FP in TC_vertical_average_ROC"
1497        #return 0.0
1498
1499    average = []
1500    stdev = []
1501    for ROCS in ROCcurves:
1502        npts = []
1503        for c in ROCS:
1504            npts.append(len(c))
1505        nrocs = len(ROCS)
1506
1507        TPavg = []
1508        TPstd = []
1509        for FPsample in frange(0.0, 1.0, 1.0/samples):
1510            TPsum = []
1511            for i in range(nrocs):
1512                TPsum.append( TP_FOR_FP(FPsample, ROCS[i], npts[i]) ) ##TPsum = TPsum + TP_FOR_FP(FPsample, ROCS[i], npts[i])
1513            TPavg.append( (FPsample, statc.mean(TPsum)) )
1514            if len(TPsum) > 1:
1515                stdv = statc.std(TPsum)
1516            else:
1517                stdv = 0.0
1518            TPstd.append( stdv )
1519
1520        average.append(TPavg)
1521        stdev.append(TPstd)
1522
1523    return (average, stdev)
1524
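# Illustrative sketch, not part of the original module: vertical averaging of
# per-fold ROC curves. The curves produced by TC_compute_ROC on each fold are
# regrouped by learner, which gives the [ROCcurves1, ..., ROCcurvesN] layout
# expected by TC_vertical_average_ROC.
def _example_vertical_average(res, samples=10):
    folds = split_by_iterations(res)
    per_fold = [TC_compute_ROC(fold) for fold in folds]  # [fold][learner] -> curve
    grouped = [[per_fold[f][l] for f in range(len(folds))]
               for l in range(res.number_of_learners)]
    return TC_vertical_average_ROC(grouped, samples=samples)
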
1525## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1526## where ROCcurvesX is a set of ROC curves,
1527## where a single ROC curve is a list of (FP, TP, threshold) points
1528##
1529## for each (sub)set of input ROC curves
1530## returns the average ROC curve, an array of vertical standard deviations and an array of horizontal standard deviations
1531def TC_threshold_average_ROC(ROCcurves, samples = 10):
1532    def POINT_AT_THRESH(ROC, npts, thresh):
1533        i = 0
1534        while i < npts - 1:
1535            (px, py, pfscore) = ROC[i]
1536            if (pfscore > thresh):
1537                i += 1
1538            else:
1539                break
1540        return ROC[i]
1541
1542    average = []
1543    stdevV = []
1544    stdevH = []
1545    for ROCS in ROCcurves:
1546        npts = []
1547        for c in ROCS:
1548            npts.append(len(c))
1549        nrocs = len(ROCS)
1550
1551        T = []
1552        for c in ROCS:
1553            for (px, py, pfscore) in c:
1554##                try:
1555##                    T.index(pfscore)
1556##                except:
1557                T.append(pfscore)
1558        T.sort()
1559        T.reverse() ## ugly
1560
1561        TPavg = []
1562        TPstdV = []
1563        TPstdH = []
1564        for tidx in frange(0, (len(T) - 1.0), float(len(T))/samples):
1565            FPsum = []
1566            TPsum = []
1567            for i in range(nrocs):
1568                (fp, tp, _) = POINT_AT_THRESH(ROCS[i], npts[i], T[int(tidx)])
1569                FPsum.append(fp)
1570                TPsum.append(tp)
1571            TPavg.append( (statc.mean(FPsum), statc.mean(TPsum)) )
1572            ## vertical standard deviation
1573            if len(TPsum) > 1:
1574                stdv = statc.std(TPsum)
1575            else:
1576                stdv = 0.0
1577            TPstdV.append( stdv )
1578            ## horizontal standard deviation
1579            if len(FPsum) > 1:
1580                stdh = statc.std(FPsum)
1581            else:
1582                stdh = 0.0
1583            TPstdH.append( stdh )
1584
1585        average.append(TPavg)
1586        stdevV.append(TPstdV)
1587        stdevH.append(TPstdH)
1588
1589    return (average, stdevV, stdevH)
1590
1591## Calibration Curve
1592## returns an array of (curve, yesClassRugPoints, noClassRugPoints) elements, where:
1593##  - curve is an array of points (x, y) on the calibration curve
1594##  - yesClassRugPoints is an array of (x, 1) points
1595##  - noClassRugPoints is an array of (x, 0) points
1596def compute_calibration_curve(res, classIndex=-1):
1597    import corn
1598    ## merge multiple iterations into one
1599    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded)
1600    for te in res.results:
1601        mres.results.append( te )
1602
1603    problists, tots = corn.computeROCCumulative(mres, classIndex)
1604
1605    results = []
1606    P, N = tots[1], tots[0]
1607
1608    bins = 10 ## divide interval between 0.0 and 1.0 into N bins
1609
1610    for plist in problists:
1611        yesClassRugPoints = [] 
1612        noClassRugPoints = []
1613
1614        yesBinsVals = [0] * bins
1615        noBinsVals = [0] * bins
1616        for (f, (thisNeg, thisPos)) in plist:
1617            yesClassRugPoints.append( (f, thisPos) ) #1.0
1618            noClassRugPoints.append( (f, thisNeg) ) #1.0
1619
1620            index = int(f * bins )
1621            index = min(index, bins - 1) ## just in case for value 1.0
1622            yesBinsVals[index] += thisPos
1623            noBinsVals[index] += thisNeg
1624
1625        curve = []
1626        for cn in range(bins):
1627            f = float(cn * 1.0 / bins) + (1.0 / 2.0 / bins)
1628            yesVal = yesBinsVals[cn]
1629            noVal = noBinsVals[cn]
1630            allVal = yesVal + noVal
1631            if allVal == 0.0: continue
1632            y = float(yesVal)/float(allVal)
1633            curve.append( (f,  y) )
1634
1635        ## smooth the curve
1636        maxnPoints = 100
1637        if len(curve) >= 3:
1638#            loessCurve = statc.loess(curve, -3, 0.6)
1639            loessCurve = statc.loess(curve, maxnPoints, 0.5, 3)
1640        else:
1641            loessCurve = curve
1642        clen = len(loessCurve)
1643        if clen > maxnPoints:
1644            df = clen / maxnPoints
1645            if df < 1: df = 1
1646            curve = [loessCurve[i]  for i in range(0, clen, df)]
1647        else:
1648            curve = loessCurve
1649        curve = [(c)[:2] for c in curve] ## remove the third value (variance of epsilon?) that suddenly appeared in the output of the statc.loess function
1650        results.append((curve, yesClassRugPoints, noClassRugPoints))
1651
1652    return results
1653
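# Illustrative sketch, not part of the original module: printing the smoothed
# calibration curve of the first learner together with the sizes of its rug
# point sets.
def _example_calibration(res):
    curve, yes_rug, no_rug = compute_calibration_curve(res)[0]
    print "%i positive and %i negative rug points" % (len(yes_rug), len(no_rug))
    for predicted_prob, observed_freq in curve:
        print "%5.3f\t%5.3f" % (predicted_prob, observed_freq)
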
1654
1655## Lift Curve
1656## returns an array of curve elements, where:
1657##  - curve is an array of points ((TP+FP)/(P + N), TP, (threshold, FP/N)) on the lift curve
1658def compute_lift_curve(res, classIndex=-1):
1659    import corn
1660    ## merge multiple iterations into one
1661    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded)
1662    for te in res.results:
1663        mres.results.append( te )
1664
1665    problists, tots = corn.computeROCCumulative(mres, classIndex)
1666
1667    results = []
1668    P, N = tots[1], tots[0]
1669    for plist in problists:
1670        ## corn returns the list sorted by increasing score; we need it sorted by decreasing score
1671        plist.reverse()
1672        TP = 0.0
1673        FP = 0.0
1674        curve = [(0.0, 0.0, (10e300, 0.0))]
1675        for (f, (thisNeg, thisPos)) in plist:
1676            TP += thisPos
1677            FP += thisNeg
1678            curve.append( ((TP+FP)/(P + N), TP, (f, FP/(N or 1))) )
1679        results.append(curve)
1680
1681    return P, N, results
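
# Illustrative sketch, not part of the original module: printing lift-curve
# points as the fraction of instances selected versus the number of true
# positives captured at each threshold.
def _example_lift(res):
    P, N, curves = compute_lift_curve(res)
    for i, curve in enumerate(curves):
        print "learner %i" % i
        for selected_fraction, tp, (threshold, fpr) in curve:
            print "%5.3f\t%6.1f\t(threshold %g)" % (selected_fraction, tp, threshold)
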
1682###
1683
1684class CDT:
1685  """ Stores number of concordant (C), discordant (D) and tied (T) pairs (used for AUC) """
1686  def __init__(self, C=0.0, D=0.0, T=0.0):
1687    self.C, self.D, self.T = C, D, T
1688   
1689def is_CDT_empty(cdt):
1690    return cdt.C + cdt.D + cdt.T < 1e-20
1691
1692
1693def compute_CDT(res, classIndex=-1, **argkw):
1694    """Obsolete, don't use"""
1695    import corn
1696    if classIndex<0:
1697        if res.baseClass>=0:
1698            classIndex = res.baseClass
1699        else:
1700            classIndex = 1
1701           
1702    useweights = res.weights and not argkw.get("unweighted", 0)
1703    weightByClasses = argkw.get("weightByClasses", True)
1704
1705    if (res.number_of_iterations>1):
1706        CDTs = [CDT() for i in range(res.number_of_learners)]
1707        iterationExperiments = split_by_iterations(res)
1708        for exp in iterationExperiments:
1709            expCDTs = corn.computeCDT(exp, classIndex, useweights)
1710            for i in range(len(CDTs)):
1711                CDTs[i].C += expCDTs[i].C
1712                CDTs[i].D += expCDTs[i].D
1713                CDTs[i].T += expCDTs[i].T
1714        for i in range(res.number_of_learners):
1715            if is_CDT_empty(CDTs[i]):
1716                return corn.computeCDT(res, classIndex, useweights)
1717       
1718        return CDTs
1719    else:
1720        return corn.computeCDT(res, classIndex, useweights)
1721
1722## THIS FUNCTION IS OBSOLETE AND ITS AVERAGING OVER FOLDS IS QUESTIONABLE
1723## DON'T USE IT
1724def ROCs_from_CDT(cdt, **argkw):
1725    """Obsolete, don't use"""
1726    if type(cdt) == list:
1727        return [ROCs_from_CDT(c) for c in cdt]
1728
1729    C, D, T = cdt.C, cdt.D, cdt.T
1730    N = C+D+T
1731    if N < 1e-6:
1732        import warnings
1733        warnings.warn("Can't compute AUC: one or both classes have no instances")
1734        return (-1,)*8
1735    if N < 2:
1736        import warnings
1737        warnings.warn("Can't compute AUC: one or both classes have too few examples")
1738
1739    som = (C-D)/N
1740    c = 0.5*(1+som)
1741 
1742    if (C+D):
1743        res = (C/N*100, D/N*100, T/N*100, N, som, (C-D)/(C+D), (C-D)/(N*(N-1)/2), 0.5*(1+som))
1744    else:
1745        res = (C/N*100, D/N*100, T/N*100, N, som, -1.0, (C-D)/(N*(N-1)/2), 0.5*(1+som))
1746
1747    if argkw.get("print"):
1748        print "Concordant  = %5.1f       Somers' D = %1.3f" % (res[0], res[4])
1749        print "Discordant  = %5.1f       Gamma     = %1.3f" % (res[1], res[5]>0 and res[5] or "N/A")
1750        print "Tied        = %5.1f       Tau-a     = %1.3f" % (res[2], res[6])
1751        print " %6d pairs             c         = %1.3f"    % (res[3], res[7])
1752
1753    return res
1754
1755AROC_from_CDT = ROCs_from_CDT  # for backward compatibility, AROC_from_CDT is obsolete
1756
1757
1758
1759# computes AUC using a specified 'cdtComputer' function
1760# It tries to compute AUCs from 'ite' (examples from a single iteration) and,
1761# if C+D+T=0, from 'all_ite' (entire test set). In the former case, the AUCs
1762# are divided by 'divideByIfIte'. Additional flag is returned which is True in
1763# the former case, or False in the latter.
1764def AUC_x(cdtComputer, ite, all_ite, divideByIfIte, computerArgs):
1765    cdts = cdtComputer(*(ite, ) + computerArgs)
1766    if not is_CDT_empty(cdts[0]):
1767        return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divideByIfIte for cdt in cdts], True
1768       
1769    if all_ite:
1770        cdts = cdtComputer(*(all_ite, ) + computerArgs)
1771        if not is_CDT_empty(cdts[0]):
1772            return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T) for cdt in cdts], False
1773
1774    return False, False
1775
1776   
1777# computes AUC between classes i and j as if there were no other classes
1778def AUC_ij(ite, classIndex1, classIndex2, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1779    import corn
1780    return AUC_x(corn.computeCDTPair, ite, all_ite, divideByIfIte, (classIndex1, classIndex2, useWeights))
1781
1782
1783# computes AUC between class i and the other classes (treating them as the same class)
1784def AUC_i(ite, classIndex, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1785    import corn
1786    return AUC_x(corn.computeCDT, ite, all_ite, divideByIfIte, (classIndex, useWeights))
1787   
1788
1789# computes the average AUC over folds using an "AUCcomputer" (AUC_i or AUC_ij)
1790# it returns the sum of what the computer returns, unless at some fold the
1791# computer had to resort to computing over all folds, or even that failed;
1792# in those cases the result is returned immediately
1793def AUC_iterations(AUCcomputer, iterations, computerArgs):
1794    subsum_aucs = [0.] * iterations[0].number_of_learners
1795    for ite in iterations:
1796        aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs)
1797        if not aucs:
1798            return None
1799        if not foldsUsed:
1800            return aucs
1801        subsum_aucs = map(add, subsum_aucs, aucs)
1802    return subsum_aucs
1803
1804
1805# AUC for binary classification problems
1806def AUC_binary(res, useWeights = True):
1807    if res.number_of_iterations > 1:
1808        return AUC_iterations(AUC_i, split_by_iterations(res), (-1, useWeights, res, res.number_of_iterations))
1809    else:
1810        return AUC_i(res, -1, useWeights)[0]
1811
1812# AUC for multiclass problems
1813def AUC_multi(res, useWeights = True, method = 0):
1814    numberOfClasses = len(res.class_values)
1815   
1816    if res.number_of_iterations > 1:
1817        iterations = split_by_iterations(res)
1818        all_ite = res
1819    else:
1820        iterations = [res]
1821        all_ite = None
1822   
1823    # by pairs
1824    sum_aucs = [0.] * res.number_of_learners
1825    usefulClassPairs = 0.
1826
1827    if method in [0, 2]:
1828        prob = class_probabilities_from_res(res)
1829       
1830    if method <= 1:
1831        for classIndex1 in range(numberOfClasses):
1832            for classIndex2 in range(classIndex1):
1833                subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1834                if subsum_aucs:
1835                    if method == 0:
1836                        p_ij = prob[classIndex1] * prob[classIndex2]
1837                        subsum_aucs = [x * p_ij  for x in subsum_aucs]
1838                        usefulClassPairs += p_ij
1839                    else:
1840                        usefulClassPairs += 1
1841                    sum_aucs = map(add, sum_aucs, subsum_aucs)
1842    else:
1843        for classIndex in range(numberOfClasses):
1844            subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.number_of_iterations))
1845            if subsum_aucs:
1846                if method == 0:
1847                    p_i = prob[classIndex]
1848                    subsum_aucs = [x * p_i  for x in subsum_aucs]
1849                    usefulClassPairs += p_i
1850                else:
1851                    usefulClassPairs += 1
1852                sum_aucs = map(add, sum_aucs, subsum_aucs)
1853                   
1854    if usefulClassPairs > 0:
1855        sum_aucs = [x/usefulClassPairs for x in sum_aucs]
1856
1857    return sum_aucs
1858
1859def AUC():
1860    pass
1861
1862AUC.ByWeightedPairs = 0 # set on the dummy AUC above so it can serve as the default 'method' of the real AUC below
1863
1864# Computes AUC, possibly for multiple classes (the averaging method can be specified)
1865# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1866def AUC(res, method = AUC.ByWeightedPairs, useWeights = True):
1867    """ Returns the area under ROC curve (AUC) given a set of experimental
1868    results. For multivalued class problems, it will compute some sort of
1869    average, as specified by the argument method.
1870    """
1871    if len(res.class_values) < 2:
1872        raise ValueError("Cannot compute AUC on a single-class problem")
1873    elif len(res.class_values) == 2:
1874        return AUC_binary(res, useWeights)
1875    else:
1876        return AUC_multi(res, useWeights, method)
1877
1878AUC.ByWeightedPairs = 0
1879AUC.ByPairs = 1
1880AUC.WeightedOneAgainstAll = 2
1881AUC.OneAgainstAll = 3
1882
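# Illustrative usage sketch, not part of the original module. It assumes that
# cross-validation results on a multi-class data set (e.g. vehicle) are already
# available in res, e.g. from Orange.evaluation.testing.cross_validation.
def _example_auc_methods(res):
    print "AUC, weighted pairs:    %5.3f" % AUC(res)[0]
    print "AUC, unweighted pairs:  %5.3f" % AUC(res, method=AUC.ByPairs)[0]
    print "AUC, one against all:   %5.3f" % AUC(res, method=AUC.OneAgainstAll)[0]
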
1883
1884# Computes AUC; in a multivalued class problem, AUC is computed as one against all
1885# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1886def AUC_single(res, classIndex = -1, useWeights = True):
1887    """ Computes AUC where the class given classIndex is singled out, and
1888    all other classes are treated as a single class. To find how good our
1889    classifiers are at distinguishing between vans and other vehicles, call
1890    the function like this::
1891   
1892        Orange.evaluation.scoring.AUC_single(resVeh, \
1893classIndex = vehicle.domain.classVar.values.index("van"))
1894    """
1895    if classIndex<0:
1896        if res.baseClass>=0:
1897            classIndex = res.baseClass
1898        else:
1899            classIndex = 1
1900
1901    if res.number_of_iterations > 1:
1902        return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.number_of_iterations))
1903    else:
1904        return AUC_i( res, classIndex, useWeights)[0]
1905
1906# Computes AUC for a pair of classes (as if there were no other classes)
1907# Results over folds are averages; if some folds have examples from one class only, the folds are merged
1908def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
1909    """ Computes AUC between a pair of instances, ignoring instances from all
1910    other classes.
1911    """
1912    if res.number_of_iterations > 1:
1913        return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.number_of_iterations))
1914    else:
1915        return AUC_ij(res, classIndex1, classIndex2, useWeights)
1916 
1917
1918# AUC for multiclass problems
1919def AUC_matrix(res, useWeights = True):
1920    """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes.
1921    If there are empty classes, the corresponding elements in the matrix
1922    are -1. Remember the beautiful(?) code for printing out the confusion
1923    matrix? Here it strikes again::
1924   
1925        classes = vehicle.domain.classVar.values
1926        AUCmatrix = Orange.evaluation.scoring.AUC_matrix(resVeh)[0]
1927        print "\t"+"\t".join(classes[:-1])
1928        for className, AUCrow in zip(classes[1:], AUCmatrix[1:]):
1929            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow))
1930    """
1931    numberOfClasses = len(res.class_values)
1932    number_of_learners = res.number_of_learners
1933   
1934    if res.number_of_iterations > 1:
1935        iterations, all_ite = split_by_iterations(res), res
1936    else:
1937        iterations, all_ite = [res], None
1938   
1939    aucs = [[[] for i in range(numberOfClasses)] for i in range(number_of_learners)]
1940    prob = class_probabilities_from_res(res)
1941       
1942    for classIndex1 in range(numberOfClasses):
1943        for classIndex2 in range(classIndex1):
1944            pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1945            if pair_aucs:
1946                for lrn in range(number_of_learners):
1947                    aucs[lrn][classIndex1].append(pair_aucs[lrn])
1948            else:
1949                for lrn in range(number_of_learners):
1950                    aucs[lrn][classIndex1].append(-1)
1951    return aucs
1952               
1953
1954def McNemar(res, **argkw):
1955    """ Computes a triangular matrix with McNemar statistics for each pair of
1956    classifiers. The statistic is distributed as chi-square with one degree of
1957    freedom; the critical value for 5% significance is approximately 3.84.
1958    """
1959    nLearners = res.number_of_learners
1960    mcm = []
1961    for i in range(nLearners):
1962       mcm.append([0.0]*res.number_of_learners)
1963
1964    if not res.weights or argkw.get("unweighted"):
1965        for i in res.results:
1966            actual = i.actual_class
1967            classes = i.classes
1968            for l1 in range(nLearners):
1969                for l2 in range(l1, nLearners):
1970                    if classes[l1]==actual:
1971                        if classes[l2]!=actual:
1972                            mcm[l1][l2] += 1
1973                    elif classes[l2]==actual:
1974                        mcm[l2][l1] += 1
1975    else:
1976        for i in res.results:
1977            actual = i.actual_class
1978            classes = i.classes
1979            for l1 in range(nLearners):
1980                for l2 in range(l1, nLearners):
1981                    if classes[l1]==actual:
1982                        if classes[l2]!=actual:
1983                            mcm[l1][l2] += i.weight
1984                    elif classes[l2]==actual:
1985                        mcm[l2][l1] += i.weight
1986
1987    for l1 in range(nLearners):
1988        for l2 in range(l1, nLearners):
1989            su=mcm[l1][l2] + mcm[l2][l1]
1990            if su:
1991                mcm[l2][l1] = (abs(mcm[l1][l2]-mcm[l2][l1])-1)**2 / su
1992            else:
1993                mcm[l2][l1] = 0
1994
1995    for l1 in range(nLearners):
1996        mcm[l1]=mcm[l1][:l1]
1997
1998    return mcm
1999
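# Illustrative sketch, not part of the original module: computing the McNemar
# matrix and flagging pairs whose statistic exceeds the chi-square critical
# value of 3.84 (5% significance, one degree of freedom).
def _example_mcnemar(res):
    mcm = McNemar(res)
    for i in range(1, len(mcm)):
        for j in range(i):
            mark = ""
            if mcm[i][j] > 3.84:
                mark = " (significant at 5%)"
            print "learners %i vs %i: %5.3f%s" % (i, j, mcm[i][j], mark)
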
2000
2001def McNemar_of_two(res, lrn1, lrn2, **argkw):
2002    """ McNemar_of_two computes the McNemar statistic for a pair of classifiers,
2003    specified by the indices lrn1 and lrn2.
2004    """
2005    tf = ft = 0.0
2006    if not res.weights or argkw.get("unweighted"):
2007        for i in res.results:
2008            actual = i.actual_class
2009            if i.classes[lrn1] == actual:
2010                if i.classes[lrn2] != actual:
2011                    tf += 1.0
2012            elif i.classes[lrn2] == actual:
2013                ft += 1.0
2014    else:
2015        for i in res.results:
2016            actual = i.actual_class
2017            if i.classes[lrn1] == actual:
2018                if i.classes[lrn2] != actual:
2019                    tf += i.weight
2020            elif i.classes[lrn2] == actual:
2021                ft += i.weight
2022
2023    su = tf + ft
2024    if su:
2025        return (abs(tf-ft)-1)**2 / su
2026    else:
2027        return 0
2028
2029
2030def Friedman(res, stat=CA):
2031    """ Compares classifiers by Friedman test, treating folds as different examles.
2032        Returns F, p and average ranks
2033    """
2034    res_split = split_by_iterations(res)
2035    res = [stat(r) for r in res_split]
2036   
2037    N = len(res)
2038    k = len(res[0])
2039    sums = [0.0]*k
2040    for r in res:
2041        ranks = [k-x+1 for x in statc.rankdata(r)]
2042        if stat==Brier_score: # reverse ranks for Brier_score (lower better)
2043            ranks = [k+1-x for x in ranks]
2044        sums = [ranks[i]+sums[i] for i in range(k)]
2045
2046    T = sum([x*x for x in sums])
2047    sums = [x/N for x in sums]
2048
2049    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
2050
2051    return F, statc.chisqprob(F, k-1), sums
2052
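# Illustrative sketch, not part of the original module: the Friedman test over
# folds followed, if significant, by pairwise Wilcoxon comparisons using the
# average ranks it returns.
def _example_friedman(res):
    F, p, avgranks = Friedman(res, stat=CA)
    print "Friedman F = %5.3f, p = %5.3f" % (F, p)
    print "average ranks:", avgranks
    if p < 0.05:
        print "pairwise Wilcoxon:", Wilcoxon_pairs(res, avgranks, stat=CA)
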
2053
2054def Wilcoxon_pairs(res, avgranks, stat=CA):
2055    """ Returns a triangular matrix, where element[i][j] stores significance of difference
2056        between i-th and j-th classifier, as computed by Wilcoxon test. The element is positive
2057        if i-th is better than j-th, negative if it is worse, and 1 if they are equal.
2058        Arguments to function are ExperimentResults, average ranks (as returned by Friedman)
2059        and, optionally, a statistics; greater values should mean better results.append
2060    """
2061    res_split = split_by_iterations(res)
2062    res = [stat(r) for r in res_split]
2063
2064    k = len(res[0])
2065    bt = []
2066    for m1 in range(k):
2067        nl = []
2068        for m2 in range(m1+1, k):
2069            t, p = statc.wilcoxont([r[m1] for r in res], [r[m2] for r in res])
2070            if avgranks[m1]<avgranks[m2]:
2071                nl.append(p)
2072            elif avgranks[m2]<avgranks[m1]:
2073                nl.append(-p)
2074            else:
2075                nl.append(1)
2076        bt.append(nl)
2077    return bt
2078
2079
2080def plot_learning_curve_learners(file, allResults, proportions, learners, noConfidence=0):
2081    plot_learning_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], noConfidence)
2082   
2083def plot_learning_curve(file, allResults, proportions, legend, noConfidence=0):
2084    import types
2085    fopened=0
2086    if (type(file)==types.StringType):
2087        file=open(file, "wt")
2088        fopened=1
2089       
2090    file.write("set yrange [0:1]\n")
2091    file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2092    file.write("set multiplot\n\n")
2093    CAs = [CA_dev(x) for x in allResults]
2094
2095    file.write("plot \\\n")
2096    for i in range(len(legend)-1):
2097        if not noConfidence:
2098            file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (i+1))
2099        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2100    if not noConfidence:
2101        file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (len(legend)))
2102    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[-1], len(legend)))
2103
2104    for i in range(len(legend)):
2105        if not noConfidence:
2106            for p in range(len(proportions)):
2107                file.write("%f\t%f\t%f\n" % (proportions[p], CAs[p][i][0], 1.96*CAs[p][i][1]))
2108            file.write("e\n\n")
2109
2110        for p in range(len(proportions)):
2111            file.write("%f\t%f\n" % (proportions[p], CAs[p][i][0]))
2112        file.write("e\n\n")
2113
2114    if fopened:
2115        file.close()
2116
2117
2118def print_single_ROC_curve_coordinates(file, curve):
2119    import types
2120    fopened=0
2121    if (type(file)==types.StringType):
2122        file=open(file, "wt")
2123        fopened=1
2124
2125    for coord in curve:
2126        file.write("%5.3f\t%5.3f\n" % tuple(coord))
2127
2128    if fopened:
2129        file.close()
2130
2131
2132def plot_ROC_learners(file, curves, learners):
2133    plot_ROC(file, curves, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))])
2134   
2135def plot_ROC(file, curves, legend):
2136    import types
2137    fopened=0
2138    if (type(file)==types.StringType):
2139        file=open(file, "wt")
2140        fopened=1
2141
2142    file.write("set yrange [0:1]\n")
2143    file.write("set xrange [0:1]\n")
2144    file.write("set multiplot\n\n")
2145
2146    file.write("plot \\\n")
2147    for leg in legend:
2148        file.write("'-' title '%s' with lines,\\\n" % leg)
2149    file.write("'-' title '' with lines\n")
2150
2151    for curve in curves:
2152        for coord in curve:
2153            file.write("%5.3f\t%5.3f\n" % tuple(coord))
2154        file.write("e\n\n")
2155
2156    file.write("1.0\t1.0\n0.0\t0.0e\n\n")         
2157
2158    if fopened:
2159        file.close()
2160
2161
2162
2163def plot_McNemar_curve_learners(file, allResults, proportions, learners, reference=-1):
2164    plot_McNemar_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference)
2165
2166def plot_McNemar_curve(file, allResults, proportions, legend, reference=-1):
2167    if reference<0:
2168        reference=len(legend)-1
2169       
2170    import types
2171    fopened=0
2172    if (type(file)==types.StringType):
2173        file=open(file, "wt")
2174        fopened=1
2175       
2176    #file.write("set yrange [0:1]\n")
2177    #file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2178    file.write("set multiplot\n\n")
2179    file.write("plot \\\n")
2180    tmap=range(reference)+range(reference+1, len(legend))
2181    for i in tmap[:-1]:
2182        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2183    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[tmap[-1]], tmap[-1]))
2184    file.write("\n")
2185
2186    for i in tmap:
2187        for p in range(len(proportions)):
2188            file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(allResults[p], i, reference)))
2189        file.write("e\n\n")
2190
2191    if fopened:
2192        file.close()
2193
2194default_point_types=("{$\\circ$}", "{$\\diamond$}", "{$+$}", "{$\\times$}", "{$|$}")+tuple([chr(x) for x in range(97, 122)])
2195default_line_types=("\\setsolid", "\\setdashpattern <4pt, 2pt>", "\\setdashpattern <8pt, 2pt>", "\\setdashes", "\\setdots")
2196
2197def learning_curve_learners_to_PiCTeX(file, allResults, proportions, **options):
2198    return apply(learning_curve_to_PiCTeX, (file, allResults, proportions), options)
2199   
2200def learning_curve_to_PiCTeX(file, allResults, proportions, **options):
2201    import types
2202    fopened=0
2203    if (type(file)==types.StringType):
2204        file=open(file, "wt")
2205        fopened=1
2206
2207    nexamples=len(allResults[0].results)
2208    CAs = [CA_dev(x) for x in allResults]
2209
2210    graphsize=float(options.get("graphsize", 10.0)) #cm
2211    difprop=proportions[-1]-proportions[0]
2212    ntestexamples=nexamples*proportions[-1]
2213    xunit=graphsize/ntestexamples
2214
2215    yshift=float(options.get("yshift", -ntestexamples/20.))
2216   
2217    pointtypes=options.get("pointtypes", default_point_types)
2218    linetypes=options.get("linetypes", default_line_types)
2219
2220    if options.has_key("numberedx"):
2221        numberedx=options["numberedx"]
2222        if type(numberedx)==types.IntType:
2223            if numberedx>0:
2224                numberedx=[nexamples*proportions[int(i/float(numberedx)*len(proportions))] for i in range(numberedx)]+[proportions[-1]*nexamples]
2225            elif numberedx<0:
2226                numberedx = -numberedx
2227                newn=[]
2228                for i in range(numberedx+1):
2229                    wanted=proportions[0]+float(i)/numberedx*difprop
2230                    best=(10, 0)
2231                    for t in proportions:
2232                        td=abs(wanted-t)
2233                        if td<best[0]:
2234                            best=(td, t)
2235                    if not best[1] in newn:
2236                        newn.append(best[1])
2237                newn.sort()
2238                numberedx=[nexamples*x for x in newn]
2239        elif type(numberedx[0])==types.FloatType:
2240            numberedx=[nexamples*x for x in numberedx]
2241    else:
2242        numberedx=[nexamples*x for x in proportions]
2243
2244    file.write("\\mbox{\n")
2245    file.write("\\beginpicture\n")
2246    file.write("\\setcoordinatesystem units <%10.8fcm, %5.3fcm>\n\n" % (xunit, graphsize))
2247    file.write("\\setplotarea x from %5.3f to %5.3f, y from 0 to 1\n" % (0, ntestexamples))
2248    file.write("\\axis bottom invisible\n")# label {#examples}\n")
2249    file.write("      ticks short at %s /\n" % reduce(lambda x,y:x+" "+y, ["%i"%(x*nexamples+0.5) for x in proportions]))
2250    if numberedx:
2251        file.write("            long numbered at %s /\n" % reduce(lambda x,y:x+y, ["%i " % int(x+0.5) for x in numberedx]))
2252    file.write("  /\n")
2253    file.write("\\axis left invisible\n")# label {classification accuracy}\n")
2254    file.write("      shiftedto y=%5.3f\n" % yshift)
2255    file.write("      ticks short from 0.0 to 1.0 by 0.05\n")
2256    file.write("            long numbered from 0.0 to 1.0 by 0.25\n")
2257    file.write("  /\n")
2258    if options.has_key("default"):
2259        file.write("\\setdashpattern<1pt, 1pt>\n")
2260        file.write("\\plot %5.3f %5.3f %5.3f %5.3f /\n" % (0., options["default"], ntestexamples, options["default"]))
2261   
2262    for i in range(len(CAs[0])):
2263        coordinates=reduce(lambda x,y:x+" "+y, ["%i %5.3f" % (proportions[p]*nexamples, CAs[p][i][0]) for p in range(len(proportions))])
2264        if linetypes:
2265            file.write("%s\n" % linetypes[i])
2266            file.write("\\plot %s /\n" % coordinates)
2267        if pointtypes:
2268            file.write("\\multiput %s at %s /\n" % (pointtypes[i], coordinates))
2269
2270    file.write("\\endpicture\n")
2271    file.write("}\n")
2272    if fopened:
2273        file.close()
2274    file.close()
2275    del file
2276
2277def legend_learners_to_PiCTeX(file, learners, **options):
2278  return apply(legend_to_PiCTeX, (file, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))]), options)
2279   
2280def legend_to_PiCTeX(file, legend, **options):
2281    import types
2282    fopened=0
2283    if (type(file)==types.StringType):
2284        file=open(file, "wt")
2285        fopened=1
2286
2287    pointtypes=options.get("pointtypes", default_point_types)
2288    linetypes=options.get("linetypes", default_line_types)
2289
2290    file.write("\\mbox{\n")
2291    file.write("\\beginpicture\n")
2292    file.write("\\setcoordinatesystem units <5cm, 1pt>\n\n")
2293    file.write("\\setplotarea x from 0.000 to %5.3f, y from 0 to 12\n" % len(legend))
2294
2295    for i in range(len(legend)):
2296        if linetypes:
2297            file.write("%s\n" % linetypes[i])
2298            file.write("\\plot %5.3f 6 %5.3f 6 /\n" % (i, i+0.2))
2299        if pointtypes:
2300            file.write("\\put {%s} at %5.3f 6\n" % (pointtypes[i], i+0.1))
2301        file.write("\\put {%s} [lb] at %5.3f 0\n" % (legend[i], i+0.25))
2302
2303    file.write("\\endpicture\n")
2304    file.write("}\n")
2305    if fopened:
2306        file.close()
2307    file.close()
2308    del file
2309
2310
2311def compute_friedman(avranks, N):
2312    """ Returns a tuple composed of (friedman statistic, degrees of freedom)
2313    and (Iman statistic - F-distribution, degrees of freedoma) given average
2314    ranks and a number of tested data sets N.
2315    """
2316
2317    k = len(avranks)
2318
2319    def friedman(N, k, ranks):
2320        return 12*N*(sum([rank**2.0 for rank in ranks]) - (k*(k+1)*(k+1)/4.0) )/(k*(k+1))
2321
2322    def iman(fried, N, k):
2323        return (N-1)*fried/(N*(k-1) - fried)
2324
2325    f = friedman(N, k, avranks)
2326    im = iman(f, N, k)
2327    fdistdof = (k-1, (k-1)*(N-1))
2328
2329    return (f, k-1), (im, fdistdof)
2330
2331def compute_CD(avranks, N, alpha="0.05", type="nemenyi"):
2332    """ Returns critical difference for Nemenyi or Bonferroni-Dunn test
2333    according to given alpha (either alpha="0.05" or alpha="0.1") for average
2334    ranks and number of tested data sets N. Type can be either "nemenyi" for
2335    for Nemenyi two tailed test or "bonferroni-dunn" for Bonferroni-Dunn test.
2336    """
2337
2338    k = len(avranks)
2339   
2340    d = {}
2341
2342    #can be computed in R as qtukey(0.95, n, Inf)**0.5
2343    #for (x in c(2:20)) print(qtukey(0.95, x, Inf)/(2**0.5)
2344    d[("nemenyi", "0.05")] = [0, 0, 1.959964, 2.343701, 2.569032, 2.727774, 2.849705, 2.94832, 3.030879, 3.101730, 3.163684, 3.218654, 3.268004, 3.312739, 3.353618, 3.39123, 3.426041, 3.458425, 3.488685, 3.517073, 3.543799 ]
2345    d[("nemenyi", "0.1")] = [0, 0, 1.644854, 2.052293, 2.291341, 2.459516, 2.588521, 2.692732, 2.779884, 2.854606, 2.919889, 2.977768, 3.029694, 3.076733, 3.119693, 3.159199, 3.195743, 3.229723, 3.261461, 3.291224, 3.319233 ]
2346
2347    d[("bonferroni-dunn", "0.05")] =  [0, 0, 1.960, 2.241, 2.394, 2.498, 2.576, 2.638, 2.690, 2.724, 2.773 ]
2348    d[("bonferroni-dunn", "0.1")] = [0, 0, 1.645, 1.960, 2.128, 2.241, 2.326, 2.394, 2.450, 2.498, 2.539 ]
2349
2350    q = d[(type, alpha)]
2351
2352    cd = q[k]*(k*(k+1)/(6.0*N))**0.5
2353
2354    return cd
2355 
2356
2357def graph_ranks(filename, avranks, names, cd=None, cdmethod=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, **kwargs):
2358    """
2359    Draws a CD graph, which is used to display the differences in methods'
2360    performance. See Janez Demsar, Statistical Comparisons of Classifiers over
2361    Multiple Data Sets, Journal of Machine Learning Research, 7(Jan):1--30,
2362    2006.
2363
2364    Needs matplotlib to work.
2365
2366    :param filename: Output file name (with extension). Formats supported
2367                     by matplotlib can be used.
2368    :param avranks: List of average methods' ranks.
2369    :param names: List of methods' names.
2370
2371    :param cd: Critical difference. Used for marking methods whose
2372               difference is not statistically significant.
2373    :param lowv: The lowest shown rank, if None, use 1.
2374    :param highv: The highest shown rank, if None, use len(avranks).
2375    :param width: Width of the drawn figure in inches, default 6 in.
2376    :param textspace: Space on figure sides left for the description
2377                      of methods, default 1 in.
2378    :param reverse:  If True, the lowest rank is on the right. Default: False.
2379    :param cdmethod: None by default. It can be an index of an element in
2380                     avranks or names, specifying the method that should be
2381                     marked with an interval.
2382    """
2383
2384    width = float(width)
2385    textspace = float(textspace)
2386
2387    def nth(l,n):
2388        """
2389        Returns the n-th element of each item in a list of lists.
2390        """
2391        n = lloc(l,n)
2392        return [ a[n] for a in l ]
2393
2394    def lloc(l,n):
2395        """
2396        List location in list of list structure.
2397        Enables the use of negative locations:
2398        -1 is the last element, -2 second last...
2399        """
2400        if n < 0:
2401            return len(l[0])+n
2402        else:
2403            return n
2404
2405    def mxrange(lr):
2406        """
2407        Multiple xranges. Can be used to traverse matrices.
2408        This function is very slow due to the unknown number of
2409        parameters.
2410
2411        >>> mxrange([3,5])
2412        [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
2413
2414        >>> mxrange([[3,5,1],[9,0,-3]])
2415        [(3, 9), (3, 6), (3, 3), (4, 9), (4, 6), (4, 3)]
2416
2417        """
2418        if len(lr) == 0:
2419            yield ()
2420        else:
2421            #it can work with single numbers
2422            index = lr[0]
2423            if type(1) == type(index):
2424                index = [ index ]
2425            for a in range(*index):
2426                for b in mxrange(lr[1:]):
2427                    yield tuple([a] + list(b))
2428
2429    try:
2430        from matplotlib.figure import Figure
2431        from matplotlib.patches import Polygon
2432        from matplotlib.backends.backend_agg import FigureCanvasAgg
2433    except:
2434        import sys
2435        print >> sys.stderr, "Function requires matplotlib. Please install it."
2436        return
2437
2438    def print_figure(fig, *args, **kwargs):
2439        canvas = FigureCanvasAgg(fig)
2440        canvas.print_figure(*args, **kwargs)
2441
2442    sums = avranks
2443
2444    tempsort =  sorted([ (a,i) for i,a in  enumerate(sums) ], reverse=reverse)
2445    ssums = nth(tempsort, 0)
2446    sortidx = nth(tempsort, 1)
2447    nnames = [ names[x] for x in sortidx ]
2448   
2449    if lowv == None:
2450        lowv = min(1, int(math.floor(min(ssums))))
2451    if highv == None:
2452        highv = max(len(avranks), int(math.ceil(max(ssums))))
2453
2454    cline = 0.4
2455
2456    k = len(sums)
2457
2458    lines = None
2459    sums = sorted(sums)
2460
2461    linesblank = 0
2462    scalewidth = width - 2*textspace
2463
2464    def rankpos(rank):
2465        if not reverse:
2466            a = rank - lowv
2467        else:
2468            a = highv - rank
2469        return textspace+scalewidth/(highv-lowv)*a
2470
2471    distanceh = 0.25
2472
2473    if cd and cdmethod == None:
2474   
2475        #get pairs of non significant methods
2476
2477        def get_lines(sums, hsd):
2478
2479            #get all pairs
2480            lsums = len(sums)
2481            allpairs = [ (i,j) for i,j in mxrange([[lsums], [lsums]]) if j > i ]
2482            #remove not significant
2483            notSig = [ (i,j) for i,j in allpairs if abs(sums[i]-sums[j]) <= hsd ]
2484            #keep only longest
2485           
2486            def no_longer((i,j), notSig):
2487                for i1,j1 in notSig:
2488                    if (i1 <= i and j1 > j) or (i1 < i and j1 >= j):
2489                        return False
2490                return True
2491
2492            longest = [ (i,j) for i,j in notSig if no_longer((i,j),notSig) ]
2493           
2494            return longest
2495
2496        lines = get_lines(ssums, cd)
2497        linesblank = 0.2 + 0.2 + (len(lines)-1)*0.1
2498
2499        #add scale
2500        distanceh = 0.25
2501        cline += distanceh
2502
2503    #calculate the needed height of the image
2504    minnotsignificant = max(2*0.2, linesblank)
2505    height = cline + ((k+1)/2)*0.2 + minnotsignificant
2506
2507    fig = Figure(figsize=(width, height))
2508    ax = fig.add_axes([0,0,1,1]) #reverse y axis
2509    ax.set_axis_off()
2510
2511    hf = 1./height # height factor
2512    wf = 1./width
2513
2514    def hfl(l): 
2515        return [ a*hf for a in l ]
2516
2517    def wfl(l): 
2518        return [ a*wf for a in l ]
2519
2520    """
2521    Upper left corner is (0,0).
2522    """
2523
2524    ax.plot([0,1], [0,1], c="w")
2525    ax.set_xlim(0, 1)
2526    ax.set_ylim(1, 0)
2527
2528    def line(l, color='k', **kwargs):
2529        """
2530        Input is a list of pairs of points.
2531        """
2532        ax.plot(wfl(nth(l,0)), hfl(nth(l,1)), color=color, **kwargs)
2533
2534    def text(x, y, s, *args, **kwargs):
2535        ax.text(wf*x, hf*y, s, *args, **kwargs)
2536
2537    line([(textspace, cline), (width-textspace, cline)], linewidth=0.7)
2538   
2539    bigtick = 0.1
2540    smalltick = 0.05
2541
2542
2543    import numpy
2544
2545    for a in list(numpy.arange(lowv, highv, 0.5)) + [highv]:
2546        tick = smalltick
2547        if a == int(a): tick = bigtick
2548        line([(rankpos(a), cline-tick/2),(rankpos(a), cline)], linewidth=0.7)
2549
2550    for a in range(lowv, highv+1):
2551        text(rankpos(a), cline-tick/2-0.05, str(a), ha="center", va="bottom")
2552
2553    k = len(ssums)
2554
2555    for i in range((k+1)/2):
2556        chei = cline+ minnotsignificant + (i)*0.2
2557        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace-0.1, chei)], linewidth=0.7)
2558        text(textspace-0.2, chei, nnames[i], ha="right", va="center")
2559
2560    for i in range((k+1)/2, k):
2561        chei = cline + minnotsignificant + (k-i-1)*0.2
2562        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace+scalewidth+0.1, chei)], linewidth=0.7)
2563        text(textspace+scalewidth+0.2, chei, nnames[i], ha="left", va="center")
2564
2565    if cd and cdmethod == None:
2566
2567        #upper scale
2568        if not reverse:
2569            begin, end = rankpos(lowv), rankpos(lowv+cd)
2570        else:
2571            begin, end = rankpos(highv), rankpos(highv - cd)
2572           
2573        line([(begin, distanceh), (end, distanceh)], linewidth=0.7)
2574        line([(begin, distanceh + bigtick/2), (begin, distanceh - bigtick/2)], linewidth=0.7)
2575        line([(end, distanceh + bigtick/2), (end, distanceh - bigtick/2)], linewidth=0.7)
2576        text((begin+end)/2, distanceh - 0.05, "CD", ha="center", va="bottom")
2577
2578        #non significance lines   
2579        def draw_lines(lines, side=0.05, height=0.1):
2580            start = cline + 0.2
2581            for l,r in lines: 
2582                line([(rankpos(ssums[l])-side, start), (rankpos(ssums[r])+side, start)], linewidth=2.5) 
2583                start += height
2584
2585        draw_lines(lines)
2586
2587    elif cd:
2588        begin = rankpos(avranks[cdmethod]-cd)
2589        end = rankpos(avranks[cdmethod]+cd)
2590        line([(begin, cline), (end, cline)], linewidth=2.5) 
2591        line([(begin, cline + bigtick/2), (begin, cline - bigtick/2)], linewidth=2.5)
2592        line([(end, cline + bigtick/2), (end, cline - bigtick/2)], linewidth=2.5)
2593 
2594    print_figure(fig, filename, **kwargs)
2595
2596def mlc_hamming_loss(res):
2597    """
2598    Schapire and Singer (2000) presented Hamming Loss, which is defined as:
2599
2600    :math:`HammingLoss(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\vartriangle Z_i|}{|L|}`
2601    """
2602    losses = [0]*res.number_of_learners
2603    label_num = len(res.labels)
2604    example_num = gettotsize(res)
2605   
2606    for e in res.results:
2607        aclass = e.actual_class
2608        for i, labels in enumerate(e.classes):
2609            labels = map(int, labels)
2610            if len(labels) <> len(aclass):
2611                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2612            for j in range(label_num):
2613                if labels[j] != aclass[j]:
2614                    losses[i] += 1
2615           
2616    return [float(x)/(label_num*example_num) for x in losses]
2617
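# Illustrative sketch, not part of the original module: the Hamming loss
# formula above checked by hand on a single hypothetical instance with four
# labels, independently of ExperimentResults.
def _example_hamming_by_hand():
    actual    = [1, 0, 1, 1]
    predicted = [1, 1, 0, 1]
    # the symmetric difference counts labels on which the two vectors disagree
    differing = len([1 for a, p in zip(actual, predicted) if a != p])
    return float(differing) / len(actual)  # 2 disagreements / 4 labels = 0.5
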
2618def mlc_accuracy(res, forgiveness_rate = 1.0):
2619    """
2620    Godbole & Sarawagi, 2004 uses the metrics accuracy, precision, recall as follows:
2621     
2622    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|}`
2623   
2624    Boutell et al. (2004) give a more generalized version using a parameter :math:`\\alpha \\ge 0`,
2625    called forgiveness rate:
2626   
2627    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} (\\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|})^{\\alpha}`
2628    """
2629    accuracies = [0.0]*res.number_of_learners
2630    label_num = len(res.labels)
2631    example_num = gettotsize(res)
2632   
2633    for e in res.results:
2634        aclass = e.actual_class
2635        for i, labels in enumerate(e.classes):
2636            labels = map(int, labels)
2637            if len(labels) <> len(aclass):
2638                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2639           
2640            intersection = 0.0
2641            union = 0.0
2642            for real, pred in zip(labels, aclass):
2643                if real and pred:
2644                    intersection = intersection+1
2645                if real or pred:
2646                    union = union+1
2647
2648            if union != 0:
2649                accuracies[i] = accuracies[i] + intersection/union
2650           
2651    return [math.pow(x/example_num,forgiveness_rate) for x in accuracies]
2652
2653def mlc_precision(res):
2654    """
2655    :math:`Precision(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Z_i|}`
2656    """
2657    precisions = [0.0]*res.number_of_learners
2658    label_num = len(res.labels)
2659    example_num = gettotsize(res)
2660   
2661    for e in res.results:
2662        aclass = e.actual_class
2663        for i, labels in enumerate(e.classes):
2664            labels = map(int, labels)
2665            if len(labels) <> len(aclass):
2666                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2667           
2668            intersection = 0.0
2669            predicted = 0.0
2670            for real, pred in zip(labels, aclass):
2671                if real and pred:
2672                    intersection = intersection+1
2673                if real:
2674                    predicted = predicted + 1
2675            if predicted <> 0:
2676                precisions[i] = precisions[i] + intersection/predicted
2677           
2678    return [x/example_num for x in precisions]
2679
2680def mlc_recall(res):
2681    """
2682    :math:`Recall(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i|}`
2683    """
2684    recalls = [0.0]*res.number_of_learners
2685    label_num = len(res.labels)
2686    example_num = gettotsize(res)
2687   
2688    for e in res.results:
2689        aclass = e.actual_class
2690        for i, labels in enumerate(e.classes):
2691            labels = map(int, labels)
2692            if len(labels) <> len(aclass):
2693                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2694           
2695            intersection = 0.0
2696            actual = 0.0
2697            for real, pred in zip(labels, aclass):
2698                if real and pred:
2699                    intersection = intersection+1
2700                if pred:
2701                    actual = actual + 1
2702            if actual <> 0:
2703                recalls[i] = recalls[i] + intersection/actual
2704           
2705    return [x/example_num for x in recalls]
2706
2707#def mlc_ranking_loss(res):
2708#    pass
2709#
2710#def mlc_average_precision(res):
2711#    pass
2712#
2713#def mlc_hierarchical_loss(res):
2714#    pass
2715
2716#########################################################################################
2717if __name__ == "__main__":
2718    avranks =  [3.143, 2.000, 2.893, 1.964]
2719    names = ["prva", "druga", "tretja", "cetrta" ]
2720    cd = compute_CD(avranks, 14)
2721    #cd = compute_CD(avranks, 10, type="bonferroni-dunn")
2722    print cd
2723
2724    print compute_friedman(avranks, 14)
2725
2726    #graph_ranks("test.eps", avranks, names, cd=cd, cdmethod=0, width=6, textspace=1.5)