source: orange/Orange/evaluation/scoring.py @ 9725:6c16952df555

Revision 9725:6c16952df555, 100.3 KB checked in by anze <anze.staric@…>, 2 years ago

Updated imports of c extensions.

1"""
2############################
3Method scoring (``scoring``)
4############################
5
6.. index: scoring
7
8This module contains various measures of quality for classification and
9regression. Most functions require an argument named :obj:`res`, an instance of
10:class:`Orange.evaluation.testing.ExperimentResults` as computed by
11functions from :mod:`Orange.evaluation.testing` and which contains
12predictions obtained through cross-validation,
13leave one-out, testing on training data or test set instances.
14
15==============
16Classification
17==============
18
19To prepare some data for examples on this page, we shall load the voting data
20set (problem of predicting the congressman's party (republican, democrat)
21based on a selection of votes) and evaluate a naive Bayesian learner,
22classification trees and a majority classifier using cross-validation.
23For examples requiring a multivalued class problem, we shall do the same
24with the vehicle data set (telling whether a vehicle described by the features
25extracted from a picture is a van, bus, or Opel or Saab car).
26
27A basic cross-validation example is shown in the following part of
28:download:`statExamples.py <code/statExamples.py>` (uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):
29
30.. literalinclude:: code/statExample0.py
31
32If instances are weighted, weights are taken into account. This can be
33disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
34disabling weights is to clear the :obj:`weights` flag of the
35:class:`Orange.evaluation.testing.ExperimentResults` object.
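
A minimal sketch of both options (assuming :obj:`res` holds the cross-validation
results from the example above)::

    CAs = Orange.evaluation.scoring.CA(res, unweighted=1)  # ignore instance weights
    # or, equivalently, clear the flag on the results object
    res.weights = False
    CAs = Orange.evaluation.scoring.CA(res)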
36
37General Measures of Quality
38===========================
39
40.. autofunction:: CA
41
42.. autofunction:: AP
43
44.. autofunction:: Brier_score
45
46.. autofunction:: IS
47
48So, let's compute all of these measures and print them out in part of
49:download:`statExamples.py <code/statExamples.py>` (uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):
50
51.. literalinclude:: code/statExample1.py
52   :lines: 13-
53
54The output should look like this::
55
56    method  CA      AP      Brier    IS
57    bayes   0.903   0.902   0.175    0.759
58    tree    0.846   0.845   0.286    0.641
59    majrty  0.614   0.526   0.474   -0.000
60
61Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out
62the standard errors.
63
64Confusion Matrix
65================
66
67.. autofunction:: confusion_matrices
68
69   **A positive-negative confusion matrix** is computed (a) if the class is
70   binary, unless the :obj:`classIndex` argument is -2, or (b) if the class is
71   multivalued and :obj:`classIndex` is non-negative. The argument
72   :obj:`classIndex` then tells which class is positive. In case (a),
73   :obj:`classIndex` may be omitted; the first class
74   is then negative and the second is positive, unless the :obj:`baseClass`
75   attribute of the results object has a non-negative value. In that case,
76   :obj:`baseClass` is the index of the target (positive) class; this
77   attribute has to be set manually. The result of the
78   function is a list of instances of class :class:`ConfusionMatrix`,
79   containing the (weighted) number of true positives (TP), false
80   negatives (FN), false positives (FP) and true negatives (TN).
81   
82   We can also add the keyword argument :obj:`cutoff`
83   (e.g. ``confusion_matrices(results, cutoff=0.3)``); if we do, :obj:`confusion_matrices`
84   will disregard the classifiers' class predictions and observe the predicted
85   probabilities, and consider the prediction "positive" if the predicted
86   probability of the positive class is higher than the :obj:`cutoff`.
87
88   The example (part of :download:`statExamples.py <code/statExamples.py>`) below shows how lowering the
89   cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices
90   for the naive Bayesian classifier::
91   
92       cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
93       print "Confusion matrix for naive Bayes:"
94       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
95       
96       cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
97       print "Confusion matrix for naive Bayes:"
98       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
99
100   The output::
101   
102       Confusion matrix for naive Bayes:
103       TP: 238, FP: 13, FN: 29.0, TN: 155
104       Confusion matrix for naive Bayes:
105       TP: 239, FP: 18, FN: 28.0, TN: 150
106   
107   shows that the number of true positives increases (and hence the number of
108   false negatives decreases) by only a single instance, while five instances
109   that were originally true negatives become false positives due to the
110   lower threshold.
111   
112   To observe how good the classifiers are at detecting vans in the vehicle
113   data set, we would compute the matrix like this::
114   
115      cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
116          vehicle.domain.classVar.values.index("van"))
117   
118   and get the results like these::
119   
120       TP: 189, FP: 241, FN: 10.0, TN: 406
121   
122   while the same for class "opel" would give::
123   
124       TP: 86, FP: 112, FN: 126.0, TN: 522
125       
126   The main difference is that there are only a few false negatives for the
127   van, meaning that the classifier seldom misses it (if it says it's not a
128   van, it's almost certainly not a van). Not so for the Opel car, where the
129   classifier missed 126 of them and correctly detected only 86.
130   
131   **A general confusion matrix** is computed (a) in case of a binary class,
132   when :obj:`classIndex` is set to -2, or (b) when the class is multivalued and
133   the caller doesn't specify the :obj:`classIndex` of the positive class.
134   When called in this manner, the function cannot use the argument
135   :obj:`cutoff`.
136   
137   The function then returns a three-dimensional matrix, where the element
138   A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
139   gives the number of instances belonging to 'actual_class' for which the
140   'learner' predicted 'predictedClass'. We shall compute and print out
141   the matrix for naive Bayesian classifier.
142   
143   Here we see another example from :download:`statExamples.py <code/statExamples.py>`::
144   
145       cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
146       classes = vehicle.domain.classVar.values
147       print "\t"+"\t".join(classes)
148       for className, classConfusions in zip(classes, cm):
149           print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))
150   
151   So, here's what this nice piece of code gives::
152   
153              bus   van  saab opel
154       bus     56   95   21   46
155       van     6    189  4    0
156       saab    3    75   73   66
157       opel    4    71   51   86
158       
159   Vans are clearly simple: 189 vans were classified as vans (we know this
160   already, we've printed it out above), and the 10 misclassified pictures
161   were classified as buses (6) and Saab cars (4). In all other classes,
162   there were more instances misclassified as vans than correctly classified
163   instances. The classifier is obviously quite biased towards vans.
164   
165   .. method:: sens(confm)
166   .. method:: spec(confm)
167   .. method:: PPV(confm)
168   .. method:: NPV(confm)
169   .. method:: precision(confm)
170   .. method:: recall(confm)
171   .. method:: F1(confm)
172   .. method:: Falpha(confm, alpha=1.0)
173   .. method:: MCC(confm)
174
175   With the confusion matrix defined in terms of positive and negative
176   classes, you can also compute the
177   `sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
178   [TP/(TP+FN)], `specificity \
179<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
180   [TN/(TN+FP)], `positive predictive value \
181<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
182   [TP/(TP+FP)] and `negative predictive value \
183<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
184   In information retrieval, positive predictive value is called precision
185   (the ratio of the number of relevant records retrieved to the total number
186   of irrelevant and relevant records retrieved), and sensitivity is called
187   `recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
188   (the ratio of the number of relevant records retrieved to the total number
189   of relevant records in the database). The
190   `harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
191   and recall is called an
192   `F-measure <http://en.wikipedia.org/wiki/F-measure>`_, which, depending
193   on the weighting between precision and recall, is implemented
194   as F1 [2*precision*recall/(precision+recall)] or, in the general case, as
195   Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
196   The `Matthews correlation coefficient \
197<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
198   is in essence a correlation coefficient between
199   the observed and predicted binary classifications; it returns a value
200   between -1 and +1. A coefficient of +1 represents a perfect prediction,
201   0 an average random prediction and -1 an inverse prediction.
202   
203   If the argument :obj:`confm` is a single confusion matrix, a single
204   result (a number) is returned. If confm is a list of confusion matrices,
205   a list of scores is returned, one for each confusion matrix.
206   
207   Note that weights are taken into account when computing the matrix, so
208   these functions don't check the 'weighted' keyword argument.
209   
210   Let us print out sensitivities and specificities of our classifiers in
211   part of :download:`statExamples.py <code/statExamples.py>`::
212   
213       cm = Orange.evaluation.scoring.confusion_matrices(res)
214       print
215       print "method\tsens\tspec"
216       for l in range(len(learners)):
217           print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))
218   
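   These scores can also be computed from a single confusion matrix; a small
   sketch, continuing the voting example above::

       cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
       print "precision: %5.3f  recall: %5.3f" % (Orange.evaluation.scoring.precision(cm), Orange.evaluation.scoring.recall(cm))
       print "F1: %5.3f  MCC: %5.3f" % (Orange.evaluation.scoring.F1(cm), Orange.evaluation.scoring.MCC(cm))
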
219ROC Analysis
220============
221
222`Receiver Operating Characteristic \
223<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
224(ROC) analysis was initially developed for
225binary (two-class) problems, and there is no consensus on how to apply it to
226multi-class problems, nor do we know for sure how to do ROC analysis after
227cross-validation and similar multiple-sampling techniques. If you are
228interested in the area under the curve, the function AUC deals with these
229problems as described below.
230
231.. autofunction:: AUC
232   
233   .. attribute:: AUC.ByWeightedPairs (or 0)
234     
235      Computes AUC for each pair of classes (ignoring instances of all other
236      classes) and averages the results, weighting them by the number of
237      pairs of instances from these two classes (i.e. by the product of
238      probabilities of the two classes). AUC computed in this way still
239      behaves as a concordance index, i.e., it gives the probability that two
240      randomly chosen instances from different classes will be correctly
241      recognized (this is of course true only if the classifier knows
242      from which two classes the instances came).
243   
244   .. attribute:: AUC.ByPairs (or 1)
245   
246      Similar to the above, except that the average over class pairs is not
247      weighted. This AUC is, like the binary AUC, independent of class
248      distributions, but it is no longer related to the concordance index.
249     
250   .. attribute:: AUC.WeightedOneAgainstAll (or 2)
251     
252      For each class, it computes AUC for this class against all others (that
253      is, treating the other classes as one class). The AUCs are then averaged,
254      weighted by the class probabilities. This is related to the concordance index
255      in which we test the classifier's (average) capability of distinguishing
256      instances of a specified class from those that come from other classes.
257      Unlike the binary AUC, the measure is not independent of class
258      distributions.
259     
260   .. attribute:: AUC.OneAgainstAll (or 3)
261   
262      As above, except that the average is not weighted.
263   
264   In case of multiple folds (for instance if the data comes from cross
265   validation), the computation goes like this. When computing the partial
266   AUCs for individual pairs of classes or singled-out classes, AUC is
267   computed for each fold separately and then averaged (ignoring the number
268   of instances in each fold; it's just a simple average). However, if a
269   certain fold doesn't contain any instances of a certain class (from the
270   pair), the partial AUC is computed treating the results as if they came
271   from a single fold. This is not really correct since the class
272   probabilities from different folds are not necessarily comparable,
273   yet since this will most often occur in leave-one-out experiments,
274   comparability shouldn't be a problem.
275   
276   Computing and printing out the AUCs looks just like printing out
277   classification accuracies (except that we call AUC instead of
278   CA, of course)::
279   
280       AUCs = Orange.evaluation.scoring.AUC(res)
281       for l in range(len(learners)):
282           print "%10s: %5.3f" % (learners[l].name, AUCs[l])
283           
284   For vehicle, you can run exactly this same code; it will compute AUCs
285   for all pairs of classes and return the average weighted by probabilities
286   of pairs. Or, you can specify the averaging method yourself, like this::
287   
288       AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
289   
290   The following snippet tries out all four. (We don't claim that this is
291   how the function needs to be used; it's better to stay with the default.)::
292   
293       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
294       print " " *25 + "  \tbayes\ttree\tmajority"
295       for i in range(4):
296           AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
297           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
298   
299   As you can see from the output::
300   
301                                   bayes   tree    majority
302              by pairs, weighted:  0.789   0.871   0.500
303                        by pairs:  0.791   0.872   0.500
304           one vs. all, weighted:  0.783   0.800   0.500
305                     one vs. all:  0.783   0.800   0.500
306
307.. autofunction:: AUC_single
308
309.. autofunction:: AUC_pair
310
311.. autofunction:: AUC_matrix
312
313The remaining functions, which plot the curves and statistically compare
314them, require that the results come from a test with a single iteration,
315and they always compare one chosen class against all others. If you have
316cross-validation results, you can either use split_by_iterations to split the
317results by folds, call the function for each fold separately and then combine
318the results however you see fit (a sketch follows below), or you can set the ExperimentResults'
319attribute number_of_iterations to 1 to cheat the function - at your own
320responsibility for the statistical correctness. Regarding multi-class
321problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
322attribute's baseValue at the time when the results were computed. If baseValue
323was not given at that time, 1 (that is, the second class) is used as the default.
324
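A sketch of the first, per-fold option, averaging Wilcoxon AUC estimates over the
folds (how the per-fold results are combined is our choice here, not something
prescribed by the module)::

    folds = Orange.evaluation.scoring.split_by_iterations(res)
    perFold = [Orange.evaluation.scoring.AUCWilcoxon(f) for f in folds]
    # average the AUC of the first learner over the folds
    print sum(f[0][0] for f in perFold) / len(perFold)
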
325We shall use the following code to prepare suitable experimental results::
326
327    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
328    train = voting.selectref(ri2, 0)
329    test = voting.selectref(ri2, 1)
330    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
331
332
333.. autofunction:: AUCWilcoxon
334
335.. autofunction:: compute_ROC
336
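For instance, the ROC curve of the first learner on the train/test split prepared
above can be obtained and inspected like this (a minimal sketch)::

    curve = Orange.evaluation.scoring.compute_ROC(res1)[0]
    for fpr, tpr in curve[:5]:
        print "1-specificity: %5.3f  sensitivity: %5.3f" % (fpr, tpr)
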
337Comparison of Algorithms
338------------------------
339
340.. autofunction:: McNemar
341
342.. autofunction:: McNemar_of_two
343
344==========
345Regression
346==========
347
348General Measure of Quality
349==========================
350
351Several alternative measures, as given below, can be used to evaluate
352the success of numeric prediction:
353
354.. image:: files/statRegression.png
355
356.. autofunction:: MSE
357
358.. autofunction:: RMSE
359
360.. autofunction:: MAE
361
362.. autofunction:: RSE
363
364.. autofunction:: RRSE
365
366.. autofunction:: RAE
367
368.. autofunction:: R2
369
370The following code (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to
371score several regression methods.
372
373.. literalinclude:: code/statExamplesRegression.py
374
375The code above produces the following output::
376
377    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
378    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
379    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
380    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
381    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715
382   
383==================
384Plotting functions
385==================
386
387.. autofunction:: graph_ranks
388
389The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:
390
391.. literalinclude:: code/statExamplesGraphRanks.py
392
393The code produces the following graph:
394
395.. image:: files/statExamplesGraphRanks1.png
396
397.. autofunction:: compute_CD
398
399.. autofunction:: compute_friedman
400
401=================
402Utility Functions
403=================
404
405.. autofunction:: split_by_iterations
406
407=====================================
408Scoring for multilabel classification
409=====================================
410
411Multi-label classification requires different metrics than those used in traditional single-label
412classification. This module presents the various metrics that have been proposed in the literature.
413Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
414:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier
415and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.
416
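A tiny worked example of this notation, computed by hand with plain Python sets
and the standard definition of the Hamming loss (the functions below take an
:obj:`ExperimentResults` object instead)::

    L = ["l1", "l2", "l3", "l4"]                  # the label set
    Y = [set(["l1", "l2"]), set(["l3"])]          # true label sets Y_i
    Z = [set(["l1"]), set(["l3", "l4"])]          # predicted label sets Z_i = H(x_i)
    # Hamming loss: size of the symmetric difference, averaged over examples and labels
    print sum(len(y ^ z) for y, z in zip(Y, Z)) / float(len(Y) * len(L))   # 0.25
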
417.. autofunction:: mlc_hamming_loss
418.. autofunction:: mlc_accuracy
419.. autofunction:: mlc_precision
420.. autofunction:: mlc_recall
421
422So, let's compute all this and print it out (part of
423:download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses
424:download:`emotions.tab <code/emotions.tab>`):
425
426.. literalinclude:: code/mlc-evaluate.py
427   :lines: 1-15
428
429The output should look like this::
430
431    loss= [0.9375]
432    accuracy= [0.875]
433    precision= [1.0]
434    recall= [0.875]
435
436References
437==========
438
439Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
440Pattern Recognition, vol.37, no.9, pp:1757-71
441
442Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
443Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
444(PAKDD 2004)
445 
446Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
447Machine Learning, vol.39, no.2/3, pp:135-68.
448
449"""
450
451import operator, math
452from operator import add
453import numpy
454
455import Orange
456from Orange import statc
457
458
459#### Private stuff
460
461def log2(x):
462    """Calculate logarithm in base 2."""
463    return math.log(x)/math.log(2)
464
465def check_non_zero(x):
466    """Throw Value Error when x = 0.0."""
467    if x==0.0:
468        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
469
470def gettotweight(res):
471    """Sum all the weights"""
472    totweight = reduce(lambda x, y: x+y.weight, res.results, 0)
473    if totweight==0.0:
474        raise ValueError, "Cannot compute the score: sum of weights is 0.0."
475    return totweight
476
477def gettotsize(res):
478    """ Get number of result instances """
479    if len(res.results):
480        return len(res.results)
481    else:
482        raise ValueError, "Cannot compute the score: no examples."
483
484
485def split_by_iterations(res):
486    """ Splits ExperimentResults of multiple iteratation test into a list
487    of ExperimentResults, one for each iteration.
488    """
489    if res.number_of_iterations < 2:
490        return [res]
491       
492    ress = [Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
493            for i in range(res.number_of_iterations)]
494    for te in res.results:
495        ress[te.iteration_number].results.append(te)
496    return ress
497
498def split_by_classifiers(res):
499    """ Splites an instance of :obj:`ExperimentResults` into a list of
500    :obj:`ExperimentResults`, one for each classifier.
501    """
502    split_res = []
503    for i in range(len(res.classifierNames)):
504        r = Orange.evaluation.testing.ExperimentResults(res.numberOfIterations,
505                    [res.classifierNames[i]], res.classValues,
506                    weights=res.weights, baseClass=res.baseClass,
507                    classifiers=[res.classifiers[i]] if res.classifiers else [],
508                    test_type = res.test_type, labels = res.labels)
509        r.results = []
510        for te in res.results:
511            r.results.append(Orange.evaluation.testing.TestedExample(te.iterationNumber,
512                                te.actualClass, n=1, weight=te.weight))
513            r.results[-1].classes = [te.classes[i]]
514            r.results[-1].probabilities = [te.probabilities[i]]
515        split_res.append(r)
516    return split_res
517
518
519def class_probabilities_from_res(res, **argkw):
520    """Calculate class probabilities"""
521    probs = [0.0] * len(res.class_values)
522    if argkw.get("unweighted", 0) or not res.weights:
523        for tex in res.results:
524            probs[int(tex.actual_class)] += 1.0
525        totweight = gettotsize(res)
526    else:
527        totweight = 0.0
528        for tex in res.results:
529            probs[tex.actual_class] += tex.weight
530            totweight += tex.weight
531        check_non_zero(totweight)
532    return [prob/totweight for prob in probs]
533
534
535def statistics_by_folds(stats, foldN, reportSE, iterationIsOuter):
536    # remove empty folds, turn the matrix so that learner is outer
537    if iterationIsOuter:
538        if not stats:
539            raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
540        number_of_learners = len(stats[0])
541        stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN))
542        stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)]
543    else:
544        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats]
545
546    if not stats:
547        raise ValueError, "Cannot compute the score: no classifiers"
548    if not stats[0]:
549        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
550   
551    if reportSE:
552        return [(statc.mean(x), statc.sterr(x)) for x in stats]
553    else:
554        return [statc.mean(x) for x in stats]
555   
556def ME(res, **argkw):
557    MEs = [0.0]*res.number_of_learners
558
559    if argkw.get("unweighted", 0) or not res.weights:
560        for tex in res.results:
561            MEs = map(lambda res, cls, ac = float(tex.actual_class):
562                      res + abs(float(cls) - ac), MEs, tex.classes)
563        totweight = gettotsize(res)
564    else:
565        for tex in res.results:
566            MEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
567                       res + tw*abs(float(cls) - ac), MEs, tex.classes)
568        totweight = gettotweight(res)
569
570    return [x/totweight for x in MEs]
571
572MAE = ME
573
574#########################################################################
575# PERFORMANCE MEASURES:
576# Scores for evaluation of numeric predictions
577
578def check_argkw(dct, lst):
579    """check_argkw(dct, lst) -> returns true if any items have non-zero value in dct"""
580    return reduce(lambda x,y: x or y, [dct.get(k, 0) for k in lst])
581
582def regression_error(res, **argkw):
583    """regression_error(res) -> regression error (default: MSE)"""
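    # Keyword flags recognized here (set by the MSE, RMSE, MAE, RSE, RRSE, RAE
    # and R2 wrappers below):
    #   abs                  - accumulate absolute instead of squared errors
    #   norm-abs / norm-sqr  - normalize by the error of predicting the mean actual value
    #   R2                   - report 1 - (normalized squared error)
    #   sqrt                 - take the square root of the result
    #   SE                   - with multiple iterations, return (mean, std) over folds
    #   unweighted           - ignore instance weights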
584    if argkw.get("SE", 0) and res.number_of_iterations > 1:
585        # computes the scores for each iteration, then averages
586        scores = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
587        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
588            norm = [0.0] * res.number_of_iterations
589
590        nIter = [0]*res.number_of_iterations       # counts examples in each iteration
591        a = [0]*res.number_of_iterations           # average class in each iteration
592        for tex in res.results:
593            nIter[tex.iteration_number] += 1
594            a[tex.iteration_number] += float(tex.actual_class)
595        a = [a[i]/nIter[i] for i in range(res.number_of_iterations)]
596
597        if argkw.get("unweighted", 0) or not res.weights:
598            # iterate across test cases
599            for tex in res.results:
600                ai = float(tex.actual_class)
601                # note: nIter was already filled in the loop above (do not count twice)
602
603                # compute normalization, if required
604                if argkw.get("norm-abs", 0):
605                    norm[tex.iteration_number] += abs(ai - a[tex.iteration_number])
606                elif argkw.get("norm-sqr", 0):
607                    norm[tex.iteration_number] += (ai - a[tex.iteration_number])**2
608
609                # iterate across results of different regressors
610                for i, cls in enumerate(tex.classes):
611                    if argkw.get("abs", 0):
612                        scores[i][tex.iteration_number] += abs(float(cls) - ai)
613                    else:
614                        scores[i][tex.iteration_number] += (float(cls) - ai)**2
615        else: # weighted case
616            raise NotImplementedError, "weighted error scores with SE not implemented yet"
617
618        if argkw.get("norm-abs") or argkw.get("norm-sqr"):
619            scores = [[x/n for x, n in zip(y, norm)] for y in scores]
620        else:
621            scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores]
622
623        if argkw.get("R2"):
624            scores = [[1.0 - x for x in y] for y in scores]
625
626        if argkw.get("sqrt", 0):
627            scores = [[math.sqrt(x) for x in y] for y in scores]
628
629        return [(statc.mean(x), statc.std(x)) for x in scores]
630       
631    else: # single iteration (testing on a single test set)
632        scores = [0.0] * res.number_of_learners
633        norm = 0.0
634
635        if argkw.get("unweighted", 0) or not res.weights:
636            a = sum([tex.actual_class for tex in res.results]) \
637                / len(res.results)
638            for tex in res.results:
639                if argkw.get("abs", 0):
640                    scores = map(lambda res, cls, ac = float(tex.actual_class):
641                                 res + abs(float(cls) - ac), scores, tex.classes)
642                else:
643                    scores = map(lambda res, cls, ac = float(tex.actual_class):
644                                 res + (float(cls) - ac)**2, scores, tex.classes)
645
646                if argkw.get("norm-abs", 0):
647                    norm += abs(tex.actual_class - a)
648                elif argkw.get("norm-sqr", 0):
649                    norm += (tex.actual_class - a)**2
650            totweight = gettotsize(res)
651        else:
652            # UNFINISHED: weighted case; only squared errors, "norm-abs"/"norm-sqr" are not handled
653            for tex in res.results:
654                scores = map(lambda res, cls, ac = float(tex.actual_class),
655                           tw = tex.weight:
656                           res + tw * (float(cls) - ac)**2, scores, tex.classes)
657            totweight = gettotweight(res)
658
659        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
660            scores = [s/norm for s in scores]
661        else: # normalize by number of instances (or sum of weights)
662            scores = [s/totweight for s in scores]
663
664        if argkw.get("R2"):
665            scores = [1.0 - s for s in scores]
666
667        if argkw.get("sqrt", 0):
668            scores = [math.sqrt(x) for x in scores]
669
670        return scores
671
672def MSE(res, **argkw):
673    """ Computes mean-squared error. """
674    return regression_error(res, **argkw)
675   
676def RMSE(res, **argkw):
677    """ Computes root mean-squared error. """
678    argkw.setdefault("sqrt", True)
679    return regression_error(res, **argkw)
680
681def MAE(res, **argkw):
682    """ Computes mean absolute error. """
683    argkw.setdefault("abs", True)
684    return regression_error(res, **argkw)
685
686def RSE(res, **argkw):
687    """ Computes relative squared error. """
688    argkw.setdefault("norm-sqr", True)
689    return regression_error(res, **argkw)
690
691def RRSE(res, **argkw):
692    """ Computes relative squared error. """
693    argkw.setdefault("norm-sqr", True)
694    argkw.setdefault("sqrt", True)
695    return regression_error(res, **argkw)
696
697def RAE(res, **argkw):
698    """ Computes relative absolute error. """
699    argkw.setdefault("abs", True)
700    argkw.setdefault("norm-abs", True)
701    return regression_error(res, **argkw)
702
703def R2(res, **argkw):
704    """ Computes the coefficient of determination, R-squared. """
705    argkw.setdefault("norm-sqr", True)
706    argkw.setdefault("R2", True)
707    return regression_error(res, **argkw)
708
709def MSE_old(res, **argkw):
710    """MSE(res) -> mean-squared error"""
711    if argkw.get("SE", 0) and res.number_of_iterations > 1:
712        MSEs = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
713        nIter = [0]*res.number_of_iterations
714        if argkw.get("unweighted", 0) or not res.weights:
715            for tex in res.results:
716                ac = float(tex.actual_class)
717                nIter[tex.iteration_number] += 1
718                for i, cls in enumerate(tex.classes):
719                    MSEs[i][tex.iteration_number] += (float(cls) - ac)**2
720        else:
721            raise ValueError, "weighted RMSE with SE not implemented yet"
722        MSEs = [[x/ni for x, ni in zip(y, nIter)] for y in MSEs]
723        if argkw.get("sqrt", 0):
724            MSEs = [[math.sqrt(x) for x in y] for y in MSEs]
725        return [(statc.mean(x), statc.std(x)) for x in MSEs]
726       
727    else:
728        MSEs = [0.0]*res.number_of_learners
729        if argkw.get("unweighted", 0) or not res.weights:
730            for tex in res.results:
731                MSEs = map(lambda res, cls, ac = float(tex.actual_class):
732                           res + (float(cls) - ac)**2, MSEs, tex.classes)
733            totweight = gettotsize(res)
734        else:
735            for tex in res.results:
736                MSEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
737                           res + tw * (float(cls) - ac)**2, MSEs, tex.classes)
738            totweight = gettotweight(res)
739
740        if argkw.get("sqrt", 0):
741            MSEs = [math.sqrt(x) for x in MSEs]
742        return [x/totweight for x in MSEs]
743
744def RMSE_old(res, **argkw):
745    """RMSE(res) -> root mean-squared error"""
746    argkw.setdefault("sqrt", 1)
747    return MSE_old(res, **argkw)
748
749#########################################################################
750# PERFORMANCE MEASURES:
751# Scores for evaluation of classifiers
752
753def CA(res, reportSE = False, **argkw):
754    """ Computes classification accuracy, i.e. percentage of matches between
755    predicted and actual class. The function returns a list of classification
756    accuracies of all classifiers tested. If reportSE is set to true, the list
757    will contain tuples with accuracies and standard errors.
758   
759    If results are from multiple repetitions of experiments (like those
760    returned by Orange.evaluation.testing.crossValidation or
761    Orange.evaluation.testing.proportionTest) the
762    standard error (SE) is estimated from the deviation of classification
763    accuracy across folds (SD), as SE = SD/sqrt(N), where N is the number
764    of repetitions (e.g. number of folds).
765   
766    If results are from a single repetition, we assume independence of
767    instances and treat the classification accuracy as distributed according
768    to binomial distribution. This can be approximated by normal distribution,
769    so we report the SE of sqrt(CA*(1-CA)/N), where CA is classification
770    accuracy and N is number of test instances.
771   
772    Instead of ExperimentResults, this function can be given a list of
773    confusion matrices (see below). Standard errors are in this case
774    estimated using the latter method.
775    """
776    if res.number_of_iterations==1:
777        if type(res)==ConfusionMatrix:
778            div = res.TP+res.FN+res.FP+res.TN
779            check_non_zero(div)
780            ca = [(res.TP+res.TN)/div]
781        else:
782            CAs = [0.0]*res.number_of_learners
783            if argkw.get("unweighted", 0) or not res.weights:
784                totweight = gettotsize(res)
785                for tex in res.results:
786                    CAs = map(lambda res, cls: res+(cls==tex.actual_class), CAs, tex.classes)
787            else:
788                totweight = 0.
789                for tex in res.results:
790                    CAs = map(lambda res, cls: res+(cls==tex.actual_class and tex.weight), CAs, tex.classes)
791                    totweight += tex.weight
792            check_non_zero(totweight)
793            ca = [x/totweight for x in CAs]
794           
795        if reportSE:
796            return [(x, math.sqrt(x*(1-x)/totweight)) for x in ca]
797        else:
798            return ca
799       
800    else:
801        CAsByFold = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
802        foldN = [0.0]*res.number_of_iterations
803
804        if argkw.get("unweighted", 0) or not res.weights:
805            for tex in res.results:
806                for lrn in range(res.number_of_learners):
807                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class)
808                foldN[tex.iteration_number] += 1
809        else:
810            for tex in res.results:
811                for lrn in range(res.number_of_learners):
812                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and tex.weight
813                foldN[tex.iteration_number] += tex.weight
814
815        return statistics_by_folds(CAsByFold, foldN, reportSE, False)
816
817
818# Obsolete, but kept for compatibility
819def CA_se(res, **argkw):
820    return CA(res, True, **argkw)
821
822
823def AP(res, reportSE = False, **argkw):
824    """ Computes the average probability assigned to the correct class. """
825    if res.number_of_iterations == 1:
826        APs=[0.0]*res.number_of_learners
827        if argkw.get("unweighted", 0) or not res.weights:
828            for tex in res.results:
829                APs = map(lambda res, probs: res + probs[tex.actual_class], APs, tex.probabilities)
830            totweight = gettotsize(res)
831        else:
832            totweight = 0.
833            for tex in res.results:
834                APs = map(lambda res, probs: res + probs[tex.actual_class]*tex.weight, APs, tex.probabilities)
835                totweight += tex.weight
836        check_non_zero(totweight)
837        return [AP/totweight for AP in APs]
838
839    APsByFold = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
840    foldN = [0.0] * res.number_of_iterations
841    if argkw.get("unweighted", 0) or not res.weights:
842        for tex in res.results:
843            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class], APsByFold[tex.iteration_number], tex.probabilities)
844            foldN[tex.iteration_number] += 1
845    else:
846        for tex in res.results:
847            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class] * tex.weight, APsByFold[tex.iteration_number], tex.probabilities)
848            foldN[tex.iteration_number] += tex.weight
849
850    return statistics_by_folds(APsByFold, foldN, reportSE, True)
851
852
853def Brier_score(res, reportSE = False, **argkw):
854    """ Computes the Brier's score, defined as the average (over test examples)
855    of sumx(t(x)-p(x))2, where x is a class, t(x) is 1 for the correct class
856    and 0 for the others, and p(x) is the probability that the classifier
857    assigned to the class x
858    """
859    # Computes an average (over examples) of sum_x(t(x) - p(x))^2, where
860    #    x is class,
861    #    t(x) is 0 for 'wrong' and 1 for 'correct' class
862    #    p(x) is predicted probabilty.
863    # There's a trick: since t(x) is zero for all classes but the
864    # correct one (c), we compute the sum as sum_x(p(x)^2) - 2*p(c) + 1
865    # Since +1 is there for each example, it adds 1 to the average
866    # We skip the +1 inside the sum and add it just at the end of the function
867    # We take max(result, 0) to avoid -0.0000x due to rounding errors
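    # Worked example: with predicted probabilities p = (0.7, 0.2, 0.1) and correct class 0,
    #   sum_x (t(x) - p(x))^2 = 0.3^2 + 0.2^2 + 0.1^2 = 0.14, and indeed
    #   sum_x p(x)^2 - 2*p(0) + 1 = 0.54 - 1.4 + 1 = 0.14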
868
869    if res.number_of_iterations == 1:
870        MSEs=[0.0]*res.number_of_learners
871        if argkw.get("unweighted", 0) or not res.weights:
872            totweight = 0.0
873            for tex in res.results:
874                MSEs = map(lambda res, probs:
875                           res + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities)
876                totweight += 1  # unweighted: count examples, not their weights
877        else:
878            for tex in res.results:
879                MSEs = map(lambda res, probs:
880                           res + tex.weight*(reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class]), MSEs, tex.probabilities)
881            totweight = gettotweight(res)
882        check_non_zero(totweight)
883        if reportSE:
884            return [(max(x/totweight+1.0, 0), 0) for x in MSEs]  ## change this, not zero!!!
885        else:
886            return [max(x/totweight+1.0, 0) for x in MSEs]
887
888    BSs = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
889    foldN = [0.] * res.number_of_iterations
890
891    if argkw.get("unweighted", 0) or not res.weights:
892        for tex in res.results:
893            BSs[tex.iteration_number] = map(lambda rr, probs:
894                       rr + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities)
895            foldN[tex.iteration_number] += 1
896    else:
897        for tex in res.results:
898            BSs[tex.iteration_number] = map(lambda res, probs:
899                       res + tex.weight*(reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class]), BSs[tex.iteration_number], tex.probabilities)
900            foldN[tex.iteration_number] += tex.weight
901
902    stats = statistics_by_folds(BSs, foldN, reportSE, True)
903    if reportSE:
904        return [(x+1.0, y) for x, y in stats]
905    else:
906        return [x+1.0 for x in stats]
907
908def BSS(res, **argkw):
909    return [1-x/2 for x in apply(Brier_score, (res, ), argkw)]
910
911def IS_ex(Pc, P):
912    "Pc aposterior probability, P aprior"
913    if (Pc>=P):
914        return -log2(P)+log2(Pc)
915    else:
916        return -(-log2(1-P)+log2(1-Pc))
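    # Example: with prior P = 0.5 and posterior Pc = 0.8 the information score is
    # -log2(0.5) + log2(0.8) = 1.0 - 0.322 = 0.678 bits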
917   
918def IS(res, apriori=None, reportSE = False, **argkw):
919    """ Computes the information score as defined by
920    `Kononenko and Bratko (1991) \
921    <http://www.springerlink.com/content/g5p7473160476612/>`_.
922    Argument :obj:`apriori` gives the apriori class
923    distribution; if it is omitted, the class distribution is computed from
924    the actual classes of examples in :obj:`res`.
925    """
926    if not apriori:
927        apriori = class_probabilities_from_res(res)
928
929    if res.number_of_iterations==1:
930        ISs = [0.0]*res.number_of_learners
931        if argkw.get("unweighted", 0) or not res.weights:
932            for tex in res.results:
933                for i in range(len(tex.probabilities)):
934                    cls = tex.actual_class
935                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls])
936            totweight = gettotsize(res)
937        else:
938            for tex in res.results:
939                for i in range(len(tex.probabilities)):
940                    cls = tex.actual_class
941                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
942            totweight = gettotweight(res)
943        if reportSE:
944            return [(IS/totweight,0) for IS in ISs]
945        else:
946            return [IS/totweight for IS in ISs]
947
948       
949    ISs = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
950    foldN = [0.] * res.number_of_iterations
951
952    # compute info scores for each fold   
953    if argkw.get("unweighted", 0) or not res.weights:
954        for tex in res.results:
955            for i in range(len(tex.probabilities)):
956                cls = tex.actual_class
957                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls])
958            foldN[tex.iteration_number] += 1
959    else:
960        for tex in res.results:
961            for i in range(len(tex.probabilities)):
962                cls = tex.actual_class
963                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
964            foldN[tex.iteration_number] += tex.weight
965
966    return statistics_by_folds(ISs, foldN, reportSE, False)
967
968
969def Friedman(res, statistics, **argkw):
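    # Friedman statistic over N iterations (folds) and k learners:
    #   F = 12/(N*k*(k+1)) * sum_j R_j**2 - 3*N*(k+1),
    # where R_j is the rank sum of learner j; F is compared against the
    # chi-square distribution with k-1 degrees of freedom.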
970    sums = None
971    for ri in split_by_iterations(res):
972        ranks = statc.rankdata(apply(statistics, (ri,), argkw))
973        if sums:
974            sums = sums and [ranks[i]+sums[i] for i in range(k)]
975        else:
976            sums = ranks
977            k = len(sums)
978    N = res.number_of_iterations
979    k = len(sums)
980    T = sum([x*x for x in sums])
981    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
982    return F, statc.chisqprob(F, k-1)
983   
984
985def Wilcoxon(res, statistics, **argkw):
986    res1, res2 = [], []
987    for ri in split_by_iterations(res):
988        stats = apply(statistics, (ri,), argkw)
989        if (len(stats) != 2):
990            raise TypeError, "Wilcoxon compares two classifiers, no more, no less"
991        res1.append(stats[0])
992        res2.append(stats[1])
993    return statc.wilcoxont(res1, res2)
994
995def rank_difference(res, statistics, **argkw):
996    if not res.results:
997        raise TypeError, "no experiments"
998
999    k = len(res.results[0].classes)
1000    if (k<2):
1001        raise TypeError, "nothing to compare (less than two classifiers given)"
1002    if k==2:
1003        return apply(Wilcoxon, (res, statistics), argkw)
1004    else:
1005        return apply(Friedman, (res, statistics), argkw)
1006   
1007class ConfusionMatrix:
1008    """ Class ConfusionMatrix stores data about false and true
1009    predictions compared to the real class. It stores the number of
1010    true negatives (TN), false positives (FP), false negatives (FN) and true positives (TP).
1011    """
1012    def __init__(self):
1013        self.TP = self.FN = self.FP = self.TN = 0.0
1014
1015    def addTFPosNeg(self, predictedPositive, isPositive, weight = 1.0):
1016        if predictedPositive:
1017            if isPositive:
1018                self.TP += weight
1019            else:
1020                self.FP += weight
1021        else:
1022            if isPositive:
1023                self.FN += weight
1024            else:
1025                self.TN += weight
1026
1027
1028def confusion_matrices(res, classIndex=-1, **argkw):
1029    """ This function can compute two different forms of confusion matrix:
1030    one in which a certain class is marked as positive and the other(s)
1031    negative, and another in which no class is singled out. The way to
1032    specify what we want is somewhat confusing due to backward
1033    compatibility issues.
1034    """
1035    tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)]
1036   
1037    if classIndex<0:
1038        numberOfClasses = len(res.class_values)
1039        if classIndex < -1 or numberOfClasses > 2:
1040            cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)]
1041            if argkw.get("unweighted", 0) or not res.weights:
1042                for tex in res.results:
1043                    trueClass = int(tex.actual_class)
1044                    for li, pred in enumerate(tex.classes):
1045                        predClass = int(pred)
1046                        if predClass < numberOfClasses:
1047                            cm[li][trueClass][predClass] += 1
1048            else:
1049                for tex in res.results:
1050                    trueClass = int(tex.actual_class)
1051                    for li, pred in enumerate(tex.classes):
1052                        predClass = int(pred)
1053                        if predClass < numberOfClasses:
1054                            cm[li][trueClass][predClass] += tex.weight
1055            return cm
1056           
1057        elif res.baseClass>=0:
1058            classIndex = res.baseClass
1059        else:
1060            classIndex = 1
1061           
1062    cutoff = argkw.get("cutoff")
1063    if cutoff:
1064        if argkw.get("unweighted", 0) or not res.weights:
1065            for lr in res.results:
1066                isPositive=(lr.actual_class==classIndex)
1067                for i in range(res.number_of_learners):
1068                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive)
1069        else:
1070            for lr in res.results:
1071                isPositive=(lr.actual_class==classIndex)
1072                for i in range(res.number_of_learners):
1073                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight)
1074    else:
1075        if argkw.get("unweighted", 0) or not res.weights:
1076            for lr in res.results:
1077                isPositive=(lr.actual_class==classIndex)
1078                for i in range(res.number_of_learners):
1079                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive)
1080        else:
1081            for lr in res.results:
1082                isPositive=(lr.actual_class==classIndex)
1083                for i in range(res.number_of_learners):
1084                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight)
1085    return tfpns
1086
1087
1088# obsolete (renamed)
1089compute_confusion_matrices = confusion_matrices
1090
1091
1092def confusion_chi_square(confusionMatrix):
1093    dim = len(confusionMatrix)
1094    rowPriors = [sum(r) for r in confusionMatrix]
1095    colPriors = [sum([r[i] for r in confusionMatrix]) for i in range(dim)]
1096    total = sum(rowPriors)
1097    rowPriors = [r/total for r in rowPriors]
1098    colPriors = [r/total for r in colPriors]
1099    ss = 0
1100    for ri, row in enumerate(confusionMatrix):
1101        for ci, o in enumerate(row):
1102            e = total * rowPriors[ri] * colPriors[ci]
1103            if not e:
1104                return -1, -1, -1
1105            ss += (o-e)**2 / e
1106    df = (dim-1)**2
1107    return ss, df, statc.chisqprob(ss, df)
1108       
1109   
1110def sens(confm):
1111    """Return sensitivity (recall rate) over the given confusion matrix."""
1112    if type(confm) == list:
1113        return [sens(cm) for cm in confm]
1114    else:
1115        tot = confm.TP+confm.FN
1116        if tot < 1e-6:
1117            import warnings
1118            warnings.warn("Can't compute sensitivity: one or both classes have no instances")
1119            return -1
1120
1121        return confm.TP/tot
1122
1123def recall(confm):
1124    """Return recall rate (sensitivity) over the given confusion matrix."""
1125    return sens(confm)
1126
1127
1128def spec(confm):
1129    """Return specificity over the given confusion matrix."""
1130    if type(confm) == list:
1131        return [spec(cm) for cm in confm]
1132    else:
1133        tot = confm.FP+confm.TN
1134        if tot < 1e-6:
1135            import warnings
1136            warnings.warn("Can't compute specificity: one or both classes have no instances")
1137            return -1
1138        return confm.TN/tot
1139 
1140
1141def PPV(confm):
1142    """Return positive predictive value (precision rate) over the given confusion matrix."""
1143    if type(confm) == list:
1144        return [PPV(cm) for cm in confm]
1145    else:
1146        tot = confm.TP+confm.FP
1147        if tot < 1e-6:
1148            import warnings
1149            warnings.warn("Can't compute PPV: one or both classes have no instances")
1150            return -1
1151        return confm.TP/tot
1152
1153
1154def precision(confm):
1155    """Return precision rate (positive predictive value) over the given confusion matrix."""
1156    return PPV(confm)
1157
1158
1159def NPV(confm):
1160    """Return negative predictive value over the given confusion matrix."""
1161    if type(confm) == list:
1162        return [NPV(cm) for cm in confm]
1163    else:
1164        tot = confm.FN+confm.TN
1165        if tot < 1e-6:
1166            import warnings
1167            warnings.warn("Can't compute NPV: one or both classes have no instances")
1168            return -1
1169        return confm.TN/tot
1170
1171def F1(confm):
1172    """Return F1 score (harmonic mean of precision and recall) over the given confusion matrix."""
1173    if type(confm) == list:
1174        return [F1(cm) for cm in confm]
1175    else:
1176        p = precision(confm)
1177        r = recall(confm)
1178        if p + r > 0:
1179            return 2. * p * r / (p + r)
1180        else:
1181            import warnings
1182            warnings.warn("Can't compute F1: P + R is zero or not defined")
1183            return -1
1184
1185def Falpha(confm, alpha=1.0):
1186    """Return the alpha-mean of precision and recall over the given confusion matrix."""
1187    if type(confm) == list:
1188        return [Falpha(cm, alpha=alpha) for cm in confm]
1189    else:
1190        p = precision(confm)
1191        r = recall(confm)
1192        return (1. + alpha) * p * r / (alpha * p + r)
1193   
1194def MCC(confm):
1195    '''
1196    Return the Matthews correlation coefficient over the given confusion matrix.
1197
1198    MCC is calculated as follows:
1199    MCC = (TP*TN - FP*FN) / sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) )
1200   
1201    [1] Matthews, B.W., Comparison of the predicted and observed secondary
1202    structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451
1203
1204    code by Boris Gorelik
1205    '''
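    # Example: the voting confusion matrix printed above (TP=238, FP=13, FN=29, TN=155)
    # gives MCC = (238*155 - 13*29) / sqrt(251*267*168*184) ~= 0.80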
1206    if type(confm) == list:
1207        return [MCC(cm) for cm in confm]
1208    else:
1209        truePositive = confm.TP
1210        trueNegative = confm.TN
1211        falsePositive = confm.FP
1212        falseNegative = confm.FN
1213         
1214        try:   
1215            r = (((truePositive * trueNegative) - (falsePositive * falseNegative))/ 
1216                math.sqrt(  (truePositive + falsePositive)  * 
1217                ( truePositive + falseNegative ) * 
1218                ( trueNegative + falsePositive ) * 
1219                ( trueNegative + falseNegative ) )
1220                )
1221        except ZeroDivisionError:
1222            # Zero division occurs when one of the four marginal sums
1223            # (TP+FP, TP+FN, TN+FP, TN+FN) is zero, e.g. when the data or
1224            # the predictions contain only one class.
1225            import warnings
1226            warnings.warn("Can't compute MCC: TP or TN is zero or not defined")
1227            r = None
1228
1229    return r
1230
1231def scotts_pi(confm, bIsListOfMatrices=True):
1232   """Compute Scott's Pi for measuring inter-rater agreement for nominal data
1233
1234   http://en.wikipedia.org/wiki/Scott%27s_Pi
1235   Scott's Pi is a statistic for measuring inter-rater reliability for
1236   nominal data.
1237
1238   @param confm: confusion matrix, or list of confusion matrices. To obtain
1239                           non-binary confusion matrix, call
1240                           Orange.evaluation.scoring.compute_confusion_matrices and set the
1241                           classIndex parameter to -2.
1242   @param bIsListOfMatrices: specifies whether confm is list of matrices.
1243                           This function needs to operate on non-binary
1244                           confusion matrices, which are represented by python
1245                           lists, therefore one needs a way to distinguish
1246                           between a single matrix and list of matrices
1247   """
1248
1249   if bIsListOfMatrices:
1250       try:
1251           return [scotts_pi(cm, bIsListOfMatrices=False) for cm in confm]
1252       except TypeError:
1253           # Nevermind the parameter, maybe this is a "conventional" binary
1254           # confusion matrix and bIsListOfMatrices was specified by mistake
1255           return scotts_pi(confm, bIsListOfMatrices=False)
1256   else:
1257       if isinstance(confm, ConfusionMatrix):
1258           confm = numpy.array( [[confm.TP, confm.FN],
1259                   [confm.FP, confm.TN]], dtype=float)
1260       else:
1261           confm = numpy.array(confm, dtype=float)
1262
1263       marginalSumOfRows = numpy.sum(confm, axis=0)
1264       marginalSumOfColumns = numpy.sum(confm, axis=1)
1265       jointProportion = (marginalSumOfColumns + marginalSumOfRows)/ \
1266                           (2.0 * numpy.sum(confm, axis=None))
1267       # In the eq. above, 2.0 is what the Wikipedia page calls
1268       # the number of annotators. Here we have two annotators:
1269       # the observed (true) labels (annotations) and the predicted by
1270       # the learners.
1271
1272       prExpected = numpy.sum(jointProportion ** 2, axis=None)
1273       prActual = numpy.sum(numpy.diag(confm), axis=None)/numpy.sum(confm, axis=None)
1274
1275       ret = (prActual - prExpected) / (1.0 - prExpected)
1276       return ret
1277
1278def AUCWilcoxon(res, classIndex=-1, **argkw):
1279    """ Computes the area under ROC (AUC) and its standard error using
1280    Wilcoxon's approach proposed by Hanley and McNeil (1982). If
1281    :obj:`classIndex` is not specified, the first class is used as
1282    "the positive" and others are negative. The result is a list of
1283    tuples (aROC, standard error).
1284    """
1285    import corn
1286    useweights = res.weights and not argkw.get("unweighted", 0)
1287    problists, tots = corn.computeROCCumulative(res, classIndex, useweights)
1288
1289    results=[]
1290
1291    totPos, totNeg = tots[1], tots[0]
1292    N = totPos + totNeg
1293    for plist in problists:
1294        highPos, lowNeg = totPos, 0.0
1295        W, Q1, Q2 = 0.0, 0.0, 0.0
1296        for prob in plist:
1297            thisPos, thisNeg = prob[1][1], prob[1][0]
1298            highPos -= thisPos
1299            W += thisNeg * (highPos + thisPos/2.)
1300            Q2 += thisPos * (lowNeg**2  + lowNeg*thisNeg  + thisNeg**2 /3.)
1301            Q1 += thisNeg * (highPos**2 + highPos*thisPos + thisPos**2 /3.)
1302
1303            lowNeg += thisNeg
1304
1305        W  /= (totPos*totNeg)
1306        Q1 /= (totNeg*totPos**2)
1307        Q2 /= (totPos*totNeg**2)
1308
1309        SE = math.sqrt( (W*(1-W) + (totPos-1)*(Q1-W**2) + (totNeg-1)*(Q2-W**2)) / (totPos*totNeg) )
1310        results.append((W, SE))
1311    return results
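# Usage sketch (hedged; `res` is an ExperimentResults as described in the module docstring):
#
#     for auc, se in AUCWilcoxon(res):
#         print "AUC = %5.3f +- %5.3f" % (auc, se)   # one (AUC, SE) pair per learner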
1312
1313AROC = AUCWilcoxon # for backward compatibility, AROC is obsolete
1314
1315def compare_2_AUCs(res, lrn1, lrn2, classIndex=-1, **argkw):
1316    import corn
1317    return corn.compare2ROCs(res, lrn1, lrn2, classIndex, res.weights and not argkw.get("unweighted"))
1318
1319compare_2_AROCs = compare_2_AUCs # for backward compatibility, compare_2_AROCs is obsolete
1320
1321   
1322def compute_ROC(res, classIndex=-1):
1323    """ Computes a ROC curve as a list of (x, y) tuples, where x is
1324    1-specificity and y is sensitivity.
1325    """
1326    import corn
1327    problists, tots = corn.computeROCCumulative(res, classIndex)
1328
1329    results = []
1330    totPos, totNeg = tots[1], tots[0]
1331
1332    for plist in problists:
1333        curve=[(1., 1.)]
1334        TP, TN = totPos, 0.0
1335        FN, FP = 0., totNeg
1336        for prob in plist:
1337            thisPos, thisNeg = prob[1][1], prob[1][0]
1338            # thisPos moves from TP to FN
1339            TP -= thisPos
1340            FN += thisPos
1341            # thisNeg moves from FP to TN
1342            TN += thisNeg
1343            FP -= thisNeg
1344
1345            sens = TP/(TP+FN)
1346            spec = TN/(FP+TN)
1347            curve.append((1-spec, sens))
1348        results.append(curve)
1349
1350    return results   
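# Usage sketch (hedged; `res` as in the module docstring): plot-ready coordinates for
# the first learner's ROC curve:
#
#     curves = compute_ROC(res)    # one list of (1-specificity, sensitivity) points per learner
#     xs = [x for x, y in curves[0]]
#     ys = [y for x, y in curves[0]]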
1351
1352## TC's implementation of algorithms, taken from:
1353## T Fawcett: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, submitted to KDD Journal.
1354def ROC_slope((P1x, P1y, P1fscore), (P2x, P2y, P2fscore)):
1355    if (P1x == P2x):
1356        return 1e300
1357    return (P1y - P2y) / (P1x - P2x)
1358
1359def ROC_add_point(P, R, keepConcavities=1):
1360    if keepConcavities:
1361        R.append(P)
1362    else:
1363        while (1):
1364            if len(R) < 2:
1365                R.append(P)
1366                return R
1367            else:
1368                T = R.pop()
1369                T2 = R[-1]
1370                if ROC_slope(T2, T) > ROC_slope(T, P):
1371                    R.append(T)
1372                    R.append(P)
1373                    return R
1374    return R
1375
1376def TC_compute_ROC(res, classIndex=-1, keepConcavities=1):
1377    import corn
1378    problists, tots = corn.computeROCCumulative(res, classIndex)
1379
1380    results = []
1381    P, N = tots[1], tots[0]
1382
1383    for plist in problists:
1384        ## corn returns the list ordered by increasing score; we need it by decreasing score
1385        plist.reverse()
1386        TP = 0.0
1387        FP = 0.0
1388        curve=[]
1389        fPrev = 10e300 # "infinity" score at 0.0, 0.0
1390        for prob in plist:
1391            f = prob[0]
1392            if f <> fPrev:
1393                if P:
1394                    tpr = TP/P
1395                else:
1396                    tpr = 0.0
1397                if N:
1398                    fpr = FP/N
1399                else:
1400                    fpr = 0.0
1401                curve = ROC_add_point((fpr, tpr, fPrev), curve, keepConcavities)
1402                fPrev = f
1403            thisPos, thisNeg = prob[1][1], prob[1][0]
1404            TP += thisPos
1405            FP += thisNeg
1406        if P:
1407            tpr = TP/P
1408        else:
1409            tpr = 0.0
1410        if N:
1411            fpr = FP/N
1412        else:
1413            fpr = 0.0
1414        curve = ROC_add_point((fpr, tpr, f), curve, keepConcavities) ## ugly
1415        results.append(curve)
1416
1417    return results
1418
1419## returns a list of points at the intersection of the tangential iso-performance line and the given ROC curve
1420## for given values of FPcost, FNcost and pval
1421def TC_best_thresholds_on_ROC_curve(FPcost, FNcost, pval, curve):
1422    m = (FPcost*(1.0 - pval)) / (FNcost*pval)
1423
1424    ## put the iso-performance line in point (0.0, 1.0)
1425    x0, y0 = (0.0, 1.0)
1426    x1, y1 = (1.0, 1.0 + m)
1427    d01 = math.sqrt((x1 - x0)*(x1 - x0) + (y1 - y0)*(y1 - y0))
1428
1429    ## calculate and find the closest point to the line
1430    firstp = 1
1431    mind = 0.0
1432    a = (x0*y1 - x1*y0)
1433    closestPoints = []
1434    for (x, y, fscore) in curve:
1435        d = ((y0 - y1)*x + (x1 - x0)*y + a) / d01
1436        d = abs(d)
1437        if firstp or d < mind:
1438            mind, firstp = d, 0
1439            closestPoints = [(x, y, fscore)]
1440        else:
1441            if abs(d - mind) <= 0.0001: ## close enough
1442                closestPoints.append( (x, y, fscore) )
1443    return closestPoints         
1444
1445def frange(start, end=None, inc=None):
1446    "A range function, that does accept float increments..."
1447
1448    if end == None:
1449        end = start + 0.0
1450        start = 0.0
1451
1452    if inc == None or inc == 0:
1453        inc = 1.0
1454
1455    L = [start]
1456    while 1:
1457        next = start + len(L) * inc
1458        if inc > 0 and next >= end:
1459            L.append(end)
1460            break
1461        elif inc < 0 and next <= end:
1462            L.append(end)
1463            break
1464        L.append(next)
1465       
1466    return L
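# For example (hedged illustration of the helper above):
#
#     frange(1.0)             # -> [0.0, 1.0]
#     frange(0.0, 1.0, 0.25)  # -> [0.0, 0.25, 0.5, 0.75, 1.0]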
1467
1468## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1469## where ROCcurvesX is a set of ROC curves,
1470## where a (one) ROC curve is a set of (FP, TP) points
1471##
1472## for each (sub)set of input ROC curves
1473## returns the average ROC curve and an array of (vertical) standard deviations
1474def TC_vertical_average_ROC(ROCcurves, samples = 10):
1475    def INTERPOLATE((P1x, P1y, P1fscore), (P2x, P2y, P2fscore), X):
1476        if (P1x == P2x) or ((X > P1x) and (X > P2x)) or ((X < P1x) and (X < P2x)):
1477            raise ValueError, "assumptions for interpolation are not met: P1 = %f,%f P2 = %f,%f X = %f" % (P1x, P1y, P2x, P2y, X)
1478        dx = float(P2x) - float(P1x)
1479        dy = float(P2y) - float(P1y)
1480        m = dy/dx
1481        return P1y + m*(X - P1x)
1482
1483    def TP_FOR_FP(FPsample, ROC, npts):
1484        i = 0
1485        while i < npts - 1:
1486            (fp, _, _) = ROC[i + 1]
1487            if (fp <= FPsample):
1488                i += 1
1489            else:
1490                break
1491        (fp, tp, _) = ROC[i]
1492        if fp == FPsample:
1493            return tp
1494        elif fp < FPsample and i + 1 < len(ROC):
1495            return INTERPOLATE(ROC[i], ROC[i+1], FPsample)
1496        elif fp < FPsample and i + 1 == len(ROC): # return the last
1497            return ROC[i][1]
1498        raise ValueError, "cannot compute: TP_FOR_FP in TC_vertical_average_ROC"
1499        #return 0.0
1500
1501    average = []
1502    stdev = []
1503    for ROCS in ROCcurves:
1504        npts = []
1505        for c in ROCS:
1506            npts.append(len(c))
1507        nrocs = len(ROCS)
1508
1509        TPavg = []
1510        TPstd = []
1511        for FPsample in frange(0.0, 1.0, 1.0/samples):
1512            TPsum = []
1513            for i in range(nrocs):
1514                TPsum.append( TP_FOR_FP(FPsample, ROCS[i], npts[i]) ) ##TPsum = TPsum + TP_FOR_FP(FPsample, ROCS[i], npts[i])
1515            TPavg.append( (FPsample, statc.mean(TPsum)) )
1516            if len(TPsum) > 1:
1517                stdv = statc.std(TPsum)
1518            else:
1519                stdv = 0.0
1520            TPstd.append( stdv )
1521
1522        average.append(TPavg)
1523        stdev.append(TPstd)
1524
1525    return (average, stdev)
1526
1527## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1528## where ROCcurvesX is a set of ROC curves,
1529## where a (one) ROC curve is a set of (FP, TP) points
1530##
1531## for each (sub)set of input ROC curves
1532## returns the average ROC curve, an array of vertical standard deviations and an array of horizontal standard deviations
1533def TC_threshold_average_ROC(ROCcurves, samples = 10):
1534    def POINT_AT_THRESH(ROC, npts, thresh):
1535        i = 0
1536        while i < npts - 1:
1537            (px, py, pfscore) = ROC[i]
1538            if (pfscore > thresh):
1539                i += 1
1540            else:
1541                break
1542        return ROC[i]
1543
1544    average = []
1545    stdevV = []
1546    stdevH = []
1547    for ROCS in ROCcurves:
1548        npts = []
1549        for c in ROCS:
1550            npts.append(len(c))
1551        nrocs = len(ROCS)
1552
1553        T = []
1554        for c in ROCS:
1555            for (px, py, pfscore) in c:
1556##                try:
1557##                    T.index(pfscore)
1558##                except:
1559                T.append(pfscore)
1560        T.sort()
1561        T.reverse() ## ugly
1562
1563        TPavg = []
1564        TPstdV = []
1565        TPstdH = []
1566        for tidx in frange(0, (len(T) - 1.0), float(len(T))/samples):
1567            FPsum = []
1568            TPsum = []
1569            for i in range(nrocs):
1570                (fp, tp, _) = POINT_AT_THRESH(ROCS[i], npts[i], T[int(tidx)])
1571                FPsum.append(fp)
1572                TPsum.append(tp)
1573            TPavg.append( (statc.mean(FPsum), statc.mean(TPsum)) )
1574            ## vertical standard deviation
1575            if len(TPsum) > 1:
1576                stdv = statc.std(TPsum)
1577            else:
1578                stdv = 0.0
1579            TPstdV.append( stdv )
1580            ## horizontal standard deviation
1581            if len(FPsum) > 1:
1582                stdh = statc.std(FPsum)
1583            else:
1584                stdh = 0.0
1585            TPstdH.append( stdh )
1586
1587        average.append(TPavg)
1588        stdevV.append(TPstdV)
1589        stdevH.append(TPstdH)
1590
1591    return (average, stdevV, stdevH)
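# Usage sketch (hedged; `res` as in the module docstring): average each learner's
# per-fold ROC curves, both vertically and by threshold:
#
#     folds = split_by_iterations(res)
#     per_learner = [[TC_compute_ROC(fold)[l] for fold in folds]
#                    for l in range(res.number_of_learners)]
#     avg, std_v = TC_vertical_average_ROC(per_learner, samples=10)
#     avg, std_v, std_h = TC_threshold_average_ROC(per_learner, samples=10)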
1592
1593## Calibration Curve
1594## returns an array of (curve, yesClassRugPoints, noClassRugPoints) elements, where:
1595##  - curve is an array of points (x, y) on the calibration curve
1596##  - yesClassRugPoints is an array of (x, 1) points
1597##  - noClassRugPoints is an array of (x, 0) points
1598def compute_calibration_curve(res, classIndex=-1):
1599    import corn
1600    ## merge multiple iterations into one
1601    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
1602    for te in res.results:
1603        mres.results.append( te )
1604
1605    problists, tots = corn.computeROCCumulative(mres, classIndex)
1606
1607    results = []
1608    P, N = tots[1], tots[0]
1609
1610    bins = 10 ## divide interval between 0.0 and 1.0 into N bins
1611
1612    for plist in problists:
1613        yesClassRugPoints = [] 
1614        noClassRugPoints = []
1615
1616        yesBinsVals = [0] * bins
1617        noBinsVals = [0] * bins
1618        for (f, (thisNeg, thisPos)) in plist:
1619            yesClassRugPoints.append( (f, thisPos) ) #1.0
1620            noClassRugPoints.append( (f, thisNeg) ) #1.0
1621
1622            index = int(f * bins )
1623            index = min(index, bins - 1) ## just in case for value 1.0
1624            yesBinsVals[index] += thisPos
1625            noBinsVals[index] += thisNeg
1626
1627        curve = []
1628        for cn in range(bins):
1629            f = float(cn * 1.0 / bins) + (1.0 / 2.0 / bins)
1630            yesVal = yesBinsVals[cn]
1631            noVal = noBinsVals[cn]
1632            allVal = yesVal + noVal
1633            if allVal == 0.0: continue
1634            y = float(yesVal)/float(allVal)
1635            curve.append( (f,  y) )
1636
1637        ## smooth the curve
1638        maxnPoints = 100
1639        if len(curve) >= 3:
1640#            loessCurve = statc.loess(curve, -3, 0.6)
1641            loessCurve = statc.loess(curve, maxnPoints, 0.5, 3)
1642        else:
1643            loessCurve = curve
1644        clen = len(loessCurve)
1645        if clen > maxnPoints:
1646            df = clen / maxnPoints
1647            if df < 1: df = 1
1648            curve = [loessCurve[i]  for i in range(0, clen, df)]
1649        else:
1650            curve = loessCurve
1651        curve = [(c)[:2] for c in curve] ## remove the third value (variance of epsilon?) that suddenly appeared in the output of the statc.loess function
1652        results.append((curve, yesClassRugPoints, noClassRugPoints))
1653
1654    return results
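# Usage sketch (hedged; `res` as in the module docstring):
#
#     curve, yes_rug, no_rug = compute_calibration_curve(res)[0]   # first learner
#     # `curve` holds (predicted probability, observed frequency of the positive class) pairs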
1655
1656
1657## Lift Curve
1658## returns an array of curve elements, where:
1659##  - curve is an array of points ((TP+FP)/(P + N), TP, (threshold, FP/N)) on the Lift Curve
1660def compute_lift_curve(res, classIndex=-1):
1661    import corn
1662    ## merge multiple iterations into one
1663    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
1664    for te in res.results:
1665        mres.results.append( te )
1666
1667    problists, tots = corn.computeROCCumulative(mres, classIndex)
1668
1669    results = []
1670    P, N = tots[1], tots[0]
1671    for plist in problists:
1672        ## corn returns the list ordered by increasing score; we need it by decreasing score
1673        plist.reverse()
1674        TP = 0.0
1675        FP = 0.0
1676        curve = [(0.0, 0.0, (10e300, 0.0))]
1677        for (f, (thisNeg, thisPos)) in plist:
1678            TP += thisPos
1679            FP += thisNeg
1680            curve.append( ((TP+FP)/(P + N), TP, (f, FP/(N or 1))) )
1681        results.append(curve)
1682
1683    return P, N, results
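# Usage sketch (hedged; `res` as in the module docstring):
#
#     P, N, curves = compute_lift_curve(res)
#     for portion, tp, (threshold, fp_rate) in curves[0]:   # first learner's curve
#         print "%5.3f  %8.1f  %5.3f" % (portion, tp, fp_rate)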
1684###
1685
1686class CDT:
1687  """ Stores number of concordant (C), discordant (D) and tied (T) pairs (used for AUC) """
1688  def __init__(self, C=0.0, D=0.0, T=0.0):
1689    self.C, self.D, self.T = C, D, T
1690   
1691def is_CDT_empty(cdt):
1692    return cdt.C + cdt.D + cdt.T < 1e-20
1693
1694
1695def compute_CDT(res, classIndex=-1, **argkw):
1696    """Obsolete, don't use"""
1697    import corn
1698    if classIndex<0:
1699        if res.baseClass>=0:
1700            classIndex = res.baseClass
1701        else:
1702            classIndex = 1
1703           
1704    useweights = res.weights and not argkw.get("unweighted", 0)
1705    weightByClasses = argkw.get("weightByClasses", True)
1706
1707    if (res.number_of_iterations>1):
1708        CDTs = [CDT() for i in range(res.number_of_learners)]
1709        iterationExperiments = split_by_iterations(res)
1710        for exp in iterationExperiments:
1711            expCDTs = corn.computeCDT(exp, classIndex, useweights)
1712            for i in range(len(CDTs)):
1713                CDTs[i].C += expCDTs[i].C
1714                CDTs[i].D += expCDTs[i].D
1715                CDTs[i].T += expCDTs[i].T
1716        for i in range(res.number_of_learners):
1717            if is_CDT_empty(CDTs[0]):
1718                return corn.computeCDT(res, classIndex, useweights)
1719       
1720        return CDTs
1721    else:
1722        return corn.computeCDT(res, classIndex, useweights)
1723
1724## THIS FUNCTION IS OBSOLETE AND ITS AVERAGING OVER FOLDS IS QUESTIONABLE
1725## DON'T USE IT
1726def ROCs_from_CDT(cdt, **argkw):
1727    """Obsolete, don't use"""
1728    if type(cdt) == list:
1729        return [ROCs_from_CDT(c) for c in cdt]
1730
1731    C, D, T = cdt.C, cdt.D, cdt.T
1732    N = C+D+T
1733    if N < 1e-6:
1734        import warnings
1735        warnings.warn("Can't compute AUC: one or both classes have no instances")
1736        return (-1,)*8
1737    if N < 2:
1738        import warnings
1739        warnings.warn("Can't compute AUC: one or both classes have too few examples")
1740
1741    som = (C-D)/N
1742    c = 0.5*(1+som)
1743 
1744    if (C+D):
1745        res = (C/N*100, D/N*100, T/N*100, N, som, (C-D)/(C+D), (C-D)/(N*(N-1)/2), 0.5*(1+som))
1746    else:
1747        res = (C/N*100, D/N*100, T/N*100, N, som, -1.0, (C-D)/(N*(N-1)/2), 0.5*(1+som))
1748
1749    if argkw.get("print"):
1750        print "Concordant  = %5.1f       Somers' D = %1.3f" % (res[0], res[4])
1751        print "Discordant  = %5.1f       Gamma     = %1.3f" % (res[1], res[5]>0 and res[5] or "N/A")
1752        print "Tied        = %5.1f       Tau-a     = %1.3f" % (res[2], res[6])
1753        print " %6d pairs             c         = %1.3f"    % (res[3], res[7])
1754
1755    return res
1756
1757AROC_from_CDT = ROCs_from_CDT  # for backward compatibility, AROC_from_CDT is obsolete
1758
1759
1760
1761# computes AUC using a specified 'cdtComputer' function
1762# It tries to compute AUCs from 'ite' (examples from a single iteration) and,
1763# if C+D+T=0, from 'all_ite' (entire test set). In the former case, the AUCs
1764# are divided by 'divideByIfIte'. An additional flag is returned, which is True in
1765# the former case and False in the latter.
1766def AUC_x(cdtComputer, ite, all_ite, divideByIfIte, computerArgs):
1767    cdts = cdtComputer(*(ite, ) + computerArgs)
1768    if not is_CDT_empty(cdts[0]):
1769        return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divideByIfIte for cdt in cdts], True
1770       
1771    if all_ite:
1772        cdts = cdtComputer(*(all_ite, ) + computerArgs)
1773        if not is_CDT_empty(cdts[0]):
1774            return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T) for cdt in cdts], False
1775
1776    return False, False
1777
1778   
1779# computes AUC between classes i and j as if there were no other classes
1780def AUC_ij(ite, classIndex1, classIndex2, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1781    import corn
1782    return AUC_x(corn.computeCDTPair, ite, all_ite, divideByIfIte, (classIndex1, classIndex2, useWeights))
1783
1784
1785# computes AUC between class i and the other classes (treating them as the same class)
1786def AUC_i(ite, classIndex, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1787    import corn
1788    return AUC_x(corn.computeCDT, ite, all_ite, divideByIfIte, (classIndex, useWeights))
1789   
1790
1791# computes the average AUC over folds using an "AUCcomputer" (AUC_i or AUC_ij);
1792# it returns the sum of what the computer returns, unless at some fold the
1793# computer has to resort to computing over all folds, or even that fails;
1794# in those cases the result is returned immediately
1795def AUC_iterations(AUCcomputer, iterations, computerArgs):
1796    subsum_aucs = [0.] * iterations[0].number_of_learners
1797    for ite in iterations:
1798        aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs)
1799        if not aucs:
1800            return None
1801        if not foldsUsed:
1802            return aucs
1803        subsum_aucs = map(add, subsum_aucs, aucs)
1804    return subsum_aucs
1805
1806
1807# AUC for binary classification problems
1808def AUC_binary(res, useWeights = True):
1809    if res.number_of_iterations > 1:
1810        return AUC_iterations(AUC_i, split_by_iterations(res), (-1, useWeights, res, res.number_of_iterations))
1811    else:
1812        return AUC_i(res, -1, useWeights)[0]
1813
1814# AUC for multiclass problems
1815def AUC_multi(res, useWeights = True, method = 0):
1816    numberOfClasses = len(res.class_values)
1817   
1818    if res.number_of_iterations > 1:
1819        iterations = split_by_iterations(res)
1820        all_ite = res
1821    else:
1822        iterations = [res]
1823        all_ite = None
1824   
1825    # by pairs
1826    sum_aucs = [0.] * res.number_of_learners
1827    usefulClassPairs = 0.
1828
1829    if method in [0, 2]:
1830        prob = class_probabilities_from_res(res)
1831       
1832    if method <= 1:
1833        for classIndex1 in range(numberOfClasses):
1834            for classIndex2 in range(classIndex1):
1835                subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1836                if subsum_aucs:
1837                    if method == 0:
1838                        p_ij = prob[classIndex1] * prob[classIndex2]
1839                        subsum_aucs = [x * p_ij  for x in subsum_aucs]
1840                        usefulClassPairs += p_ij
1841                    else:
1842                        usefulClassPairs += 1
1843                    sum_aucs = map(add, sum_aucs, subsum_aucs)
1844    else:
1845        for classIndex in range(numberOfClasses):
1846            subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.number_of_iterations))
1847            if subsum_aucs:
1848                if method == 0:
1849                    p_i = prob[classIndex]
1850                    subsum_aucs = [x * p_i  for x in subsum_aucs]
1851                    usefulClassPairs += p_i
1852                else:
1853                    usefulClassPairs += 1
1854                sum_aucs = map(add, sum_aucs, subsum_aucs)
1855                   
1856    if usefulClassPairs > 0:
1857        sum_aucs = [x/usefulClassPairs for x in sum_aucs]
1858
1859    return sum_aucs
1860
1861def AUC():
1862    pass
1863
1864AUC.ByWeightedPairs = 0
1865
1866# Computes AUC, possibly for multiple classes (the averaging method can be specified)
1867# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1868def AUC(res, method = AUC.ByWeightedPairs, useWeights = True):
1869    """ Returns the area under ROC curve (AUC) given a set of experimental
1870    results. For multivalued class problems, it computes an average over class
1871    pairs or over one-against-all comparisons, as selected by the argument method.
1872    """
1873    if len(res.class_values) < 2:
1874        raise ValueError("Cannot compute AUC on a single-class problem")
1875    elif len(res.class_values) == 2:
1876        return AUC_binary(res, useWeights)
1877    else:
1878        return AUC_multi(res, useWeights, method)
1879
1880AUC.ByWeightedPairs = 0
1881AUC.ByPairs = 1
1882AUC.WeightedOneAgainstAll = 2
1883AUC.OneAgainstAll = 3
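# Usage sketch (hedged; resVeh is a multiclass ExperimentResults as in the examples
# mentioned in the docstrings below):
#
#     aucs = AUC(resVeh)                             # weighted pairwise average (default)
#     aucs = AUC(resVeh, method=AUC.OneAgainstAll)   # unweighted one-against-all average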
1884
1885
1886# Computes AUC; in multivalued class problem, AUC is computed as one against all
1887# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1888def AUC_single(res, classIndex = -1, useWeights = True):
1889    """ Computes AUC where the class given classIndex is singled out, and
1890    all other classes are treated as a single class. To find how good our
1891    classifiers are in distinguishing between vans and other vehicle, call
1892    the function like this::
1893   
1894        Orange.evaluation.scoring.AUC_single(resVeh, \
1895classIndex = vehicle.domain.classVar.values.index("van"))
1896    """
1897    if classIndex<0:
1898        if res.baseClass>=0:
1899            classIndex = res.baseClass
1900        else:
1901            classIndex = 1
1902
1903    if res.number_of_iterations > 1:
1904        return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.number_of_iterations))
1905    else:
1906        return AUC_i( res, classIndex, useWeights)[0]
1907
1908# Computes AUC for a pair of classes (as if there were no other classes)
1909# Results over folds are averages; if some folds have examples from one class only, the folds are merged
1910def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
1911    """ Computes AUC between a pair of instances, ignoring instances from all
1912    other classes.
1913    """
1914    if res.number_of_iterations > 1:
1915        return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.number_of_iterations))
1916    else:
1917        return AUC_ij(res, classIndex1, classIndex2, useWeights)
1918 
1919
1920# AUC for multiclass problems
1921def AUC_matrix(res, useWeights = True):
1922    """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes.
1923    If there are empty classes, the corresponding elements in the matrix
1924    are -1. Remember the beautiful(?) code for printing out the confusion
1925    matrix? Here it strikes again::
1926   
1927        classes = vehicle.domain.classVar.values
1928        AUCmatrix = Orange.evaluation.scoring.AUC_matrix(resVeh)[0]
1929        print "\t"+"\t".join(classes[:-1])
1930        for className, AUCrow in zip(classes[1:], AUCmatrix[1:]):
1931            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow))
1932    """
1933    numberOfClasses = len(res.class_values)
1934    number_of_learners = res.number_of_learners
1935   
1936    if res.number_of_iterations > 1:
1937        iterations, all_ite = split_by_iterations(res), res
1938    else:
1939        iterations, all_ite = [res], None
1940   
1941    aucs = [[[] for i in range(numberOfClasses)] for i in range(number_of_learners)]
1942    prob = class_probabilities_from_res(res)
1943       
1944    for classIndex1 in range(numberOfClasses):
1945        for classIndex2 in range(classIndex1):
1946            pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1947            if pair_aucs:
1948                for lrn in range(number_of_learners):
1949                    aucs[lrn][classIndex1].append(pair_aucs[lrn])
1950            else:
1951                for lrn in range(number_of_learners):
1952                    aucs[lrn][classIndex1].append(-1)
1953    return aucs
1954               
1955
1956def McNemar(res, **argkw):
1957    """ Computes a triangular matrix with McNemar statistics for each pair of
1958    classifiers. The statistic has a chi-square distribution with one degree of
1959    freedom; the critical value for 5% significance is approximately 3.84.
1960    """
1961    nLearners = res.number_of_learners
1962    mcm = []
1963    for i in range(nLearners):
1964       mcm.append([0.0]*res.number_of_learners)
1965
1966    if not res.weights or argkw.get("unweighted"):
1967        for i in res.results:
1968            actual = i.actual_class
1969            classes = i.classes
1970            for l1 in range(nLearners):
1971                for l2 in range(l1, nLearners):
1972                    if classes[l1]==actual:
1973                        if classes[l2]!=actual:
1974                            mcm[l1][l2] += 1
1975                    elif classes[l2]==actual:
1976                        mcm[l2][l1] += 1
1977    else:
1978        for i in res.results:
1979            actual = i.actual_class
1980            classes = i.classes
1981            for l1 in range(nLearners):
1982                for l2 in range(l1, nLearners):
1983                    if classes[l1]==actual:
1984                        if classes[l2]!=actual:
1985                            mcm[l1][l2] += i.weight
1986                    elif classes[l2]==actual:
1987                        mcm[l2][l1] += i.weight
1988
1989    for l1 in range(nLearners):
1990        for l2 in range(l1, nLearners):
1991            su=mcm[l1][l2] + mcm[l2][l1]
1992            if su:
1993                mcm[l2][l1] = (abs(mcm[l1][l2]-mcm[l2][l1])-1)**2 / su
1994            else:
1995                mcm[l2][l1] = 0
1996
1997    for l1 in range(nLearners):
1998        mcm[l1]=mcm[l1][:l1]
1999
2000    return mcm
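# Usage sketch (hedged; `res` as in the module docstring):
#
#     mcm = McNemar(res)
#     # mcm[i][j] (for j < i) is the McNemar statistic comparing learners i and j;
#     # values above ~3.84 indicate a significant difference at the 5% level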
2001
2002
2003def McNemar_of_two(res, lrn1, lrn2, **argkw):
2004    """ McNemar_of_two computes the McNemar statistic for a pair of classifiers,
2005    specified by the indices lrn1 and lrn2.
2006    """
2007    tf = ft = 0.0
2008    if not res.weights or argkw.get("unweighted"):
2009        for i in res.results:
2010            actual=i.actual_class
2011            if i.classes[lrn1]==actual:
2012                if i.classes[lrn2]!=actual:
2013                    tf += 1.0
2014            elif i.classes[lrn2]==actual:
2015                    ft += 1.0
2016    else:
2017        for i in res.results:
2018            actual=i.actual_class
2019            if i.classes[lrn1]==actual:
2020                if i.classes[lrn2]!=actual:
2021                    tf += i.weight
2022            elif i.classes[lrn2]==actual:
2023                    ft += i.weight
2024
2025    su = tf + ft
2026    if su:
2027        return (abs(tf-ft)-1)**2 / su
2028    else:
2029        return 0
2030
2031
2032def Friedman(res, stat=CA):
2033    """ Compares classifiers by Friedman test, treating folds as different examles.
2034        Returns F, p and average ranks
2035    """
2036    res_split = split_by_iterations(res)
2037    res = [stat(r) for r in res_split]
2038   
2039    N = len(res)
2040    k = len(res[0])
2041    sums = [0.0]*k
2042    for r in res:
2043        ranks = [k-x+1 for x in statc.rankdata(r)]
2044        if stat==Brier_score: # reverse ranks for Brier_score (lower better)
2045            ranks = [k+1-x for x in ranks]
2046        sums = [ranks[i]+sums[i] for i in range(k)]
2047
2048    T = sum([x*x for x in sums])
2049    sums = [x/N for x in sums]
2050
2051    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
2052
2053    return F, statc.chisqprob(F, k-1), sums
2054
2055
2056def Wilcoxon_pairs(res, avgranks, stat=CA):
2057    """ Returns a triangular matrix, where element[i][j] stores significance of difference
2058        between i-th and j-th classifier, as computed by Wilcoxon test. The element is positive
2059        if i-th is better than j-th, negative if it is worse, and 1 if they are equal.
2060        Arguments to the function are ExperimentResults, average ranks (as returned by Friedman)
2061        and, optionally, a statistic; greater values should mean better results.
2062    """
2063    res_split = split_by_iterations(res)
2064    res = [stat(r) for r in res_split]
2065
2066    k = len(res[0])
2067    bt = []
2068    for m1 in range(k):
2069        nl = []
2070        for m2 in range(m1+1, k):
2071            t, p = statc.wilcoxont([r[m1] for r in res], [r[m2] for r in res])
2072            if avgranks[m1]<avgranks[m2]:
2073                nl.append(p)
2074            elif avgranks[m2]<avgranks[m1]:
2075                nl.append(-p)
2076            else:
2077                nl.append(1)
2078        bt.append(nl)
2079    return bt
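# Usage sketch (hedged; `res` as in the module docstring): Friedman test over folds,
# followed by pairwise Wilcoxon comparisons when the overall difference is significant:
#
#     F, p, avgranks = Friedman(res, stat=CA)
#     if p < 0.05:
#         pairwise = Wilcoxon_pairs(res, avgranks, stat=CA)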
2080
2081
2082def plot_learning_curve_learners(file, allResults, proportions, learners, noConfidence=0):
2083    plot_learning_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], noConfidence)
2084   
2085def plot_learning_curve(file, allResults, proportions, legend, noConfidence=0):
2086    import types
2087    fopened=0
2088    if (type(file)==types.StringType):
2089        file=open(file, "wt")
2090        fopened=1
2091       
2092    file.write("set yrange [0:1]\n")
2093    file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2094    file.write("set multiplot\n\n")
2095    CAs = [CA_dev(x) for x in allResults]
2096
2097    file.write("plot \\\n")
2098    for i in range(len(legend)-1):
2099        if not noConfidence:
2100            file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (i+1))
2101        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2102    if not noConfidence:
2103        file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (len(legend)))
2104    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[-1], len(legend)))
2105
2106    for i in range(len(legend)):
2107        if not noConfidence:
2108            for p in range(len(proportions)):
2109                file.write("%f\t%f\t%f\n" % (proportions[p], CAs[p][i][0], 1.96*CAs[p][i][1]))
2110            file.write("e\n\n")
2111
2112        for p in range(len(proportions)):
2113            file.write("%f\t%f\n" % (proportions[p], CAs[p][i][0]))
2114        file.write("e\n\n")
2115
2116    if fopened:
2117        file.close()
2118
2119
2120def print_single_ROC_curve_coordinates(file, curve):
2121    import types
2122    fopened=0
2123    if (type(file)==types.StringType):
2124        file=open(file, "wt")
2125        fopened=1
2126
2127    for coord in curve:
2128        file.write("%5.3f\t%5.3f\n" % tuple(coord))
2129
2130    if fopened:
2131        file.close()
2132
2133
2134def plot_ROC_learners(file, curves, learners):
2135    plot_ROC(file, curves, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))])
2136   
2137def plot_ROC(file, curves, legend):
2138    import types
2139    fopened=0
2140    if (type(file)==types.StringType):
2141        file=open(file, "wt")
2142        fopened=1
2143
2144    file.write("set yrange [0:1]\n")
2145    file.write("set xrange [0:1]\n")
2146    file.write("set multiplot\n\n")
2147
2148    file.write("plot \\\n")
2149    for leg in legend:
2150        file.write("'-' title '%s' with lines,\\\n" % leg)
2151    file.write("'-' title '' with lines\n")
2152
2153    for curve in curves:
2154        for coord in curve:
2155            file.write("%5.3f\t%5.3f\n" % tuple(coord))
2156        file.write("e\n\n")
2157
2158    file.write("1.0\t1.0\n0.0\t0.0e\n\n")         
2159
2160    if fopened:
2161        file.close()
2162
2163
2164
2165def plot_McNemar_curve_learners(file, allResults, proportions, learners, reference=-1):
2166    plot_McNemar_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference)
2167
2168def plot_McNemar_curve(file, allResults, proportions, legend, reference=-1):
2169    if reference<0:
2170        reference=len(legend)-1
2171       
2172    import types
2173    fopened=0
2174    if (type(file)==types.StringType):
2175        file=open(file, "wt")
2176        fopened=1
2177       
2178    #file.write("set yrange [0:1]\n")
2179    #file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2180    file.write("set multiplot\n\n")
2181    file.write("plot \\\n")
2182    tmap=range(reference)+range(reference+1, len(legend))
2183    for i in tmap[:-1]:
2184        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2185    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[tmap[-1]], tmap[-1]))
2186    file.write("\n")
2187
2188    for i in tmap:
2189        for p in range(len(proportions)):
2190            file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(allResults[p], i, reference)))
2191        file.write("e\n\n")
2192
2193    if fopened:
2194        file.close()
2195
2196default_point_types=("{$\\circ$}", "{$\\diamond$}", "{$+$}", "{$\\times$}", "{$|$}")+tuple([chr(x) for x in range(97, 122)])
2197default_line_types=("\\setsolid", "\\setdashpattern <4pt, 2pt>", "\\setdashpattern <8pt, 2pt>", "\\setdashes", "\\setdots")
2198
2199def learning_curve_learners_to_PiCTeX(file, allResults, proportions, **options):
2200    return apply(learning_curve_to_PiCTeX, (file, allResults, proportions), options)
2201   
2202def learning_curve_to_PiCTeX(file, allResults, proportions, **options):
2203    import types
2204    fopened=0
2205    if (type(file)==types.StringType):
2206        file=open(file, "wt")
2207        fopened=1
2208
2209    nexamples=len(allResults[0].results)
2210    CAs = [CA_dev(x) for x in allResults]
2211
2212    graphsize=float(options.get("graphsize", 10.0)) #cm
2213    difprop=proportions[-1]-proportions[0]
2214    ntestexamples=nexamples*proportions[-1]
2215    xunit=graphsize/ntestexamples
2216
2217    yshift=float(options.get("yshift", -ntestexamples/20.))
2218   
2219    pointtypes=options.get("pointtypes", default_point_types)
2220    linetypes=options.get("linetypes", default_line_types)
2221
2222    if options.has_key("numberedx"):
2223        numberedx=options["numberedx"]
2224        if type(numberedx)==types.IntType:
2225            if numberedx>0:
2226                numberedx=[nexamples*proportions[int(i/float(numberedx)*len(proportions))] for i in range(numberedx)]+[proportions[-1]*nexamples]
2227            elif numberedx<0:
2228                numberedx = -numberedx
2229                newn=[]
2230                for i in range(numberedx+1):
2231                    wanted=proportions[0]+float(i)/numberedx*difprop
2232                    best=(10, 0)
2233                    for t in proportions:
2234                        td=abs(wanted-t)
2235                        if td<best[0]:
2236                            best=(td, t)
2237                    if not best[1] in newn:
2238                        newn.append(best[1])
2239                newn.sort()
2240                numberedx=[nexamples*x for x in newn]
2241        elif type(numberedx[0])==types.FloatType:
2242            numberedx=[nexamples*x for x in numberedx]
2243    else:
2244        numberedx=[nexamples*x for x in proportions]
2245
2246    file.write("\\mbox{\n")
2247    file.write("\\beginpicture\n")
2248    file.write("\\setcoordinatesystem units <%10.8fcm, %5.3fcm>\n\n" % (xunit, graphsize))
2249    file.write("\\setplotarea x from %5.3f to %5.3f, y from 0 to 1\n" % (0, ntestexamples))
2250    file.write("\\axis bottom invisible\n")# label {#examples}\n")
2251    file.write("      ticks short at %s /\n" % reduce(lambda x,y:x+" "+y, ["%i"%(x*nexamples+0.5) for x in proportions]))
2252    if numberedx:
2253        file.write("            long numbered at %s /\n" % reduce(lambda x,y:x+y, ["%i " % int(x+0.5) for x in numberedx]))
2254    file.write("  /\n")
2255    file.write("\\axis left invisible\n")# label {classification accuracy}\n")
2256    file.write("      shiftedto y=%5.3f\n" % yshift)
2257    file.write("      ticks short from 0.0 to 1.0 by 0.05\n")
2258    file.write("            long numbered from 0.0 to 1.0 by 0.25\n")
2259    file.write("  /\n")
2260    if options.has_key("default"):
2261        file.write("\\setdashpattern<1pt, 1pt>\n")
2262        file.write("\\plot %5.3f %5.3f %5.3f %5.3f /\n" % (0., options["default"], ntestexamples, options["default"]))
2263   
2264    for i in range(len(CAs[0])):
2265        coordinates=reduce(lambda x,y:x+" "+y, ["%i %5.3f" % (proportions[p]*nexamples, CAs[p][i][0]) for p in range(len(proportions))])
2266        if linetypes:
2267            file.write("%s\n" % linetypes[i])
2268            file.write("\\plot %s /\n" % coordinates)
2269        if pointtypes:
2270            file.write("\\multiput %s at %s /\n" % (pointtypes[i], coordinates))
2271
2272    file.write("\\endpicture\n")
2273    file.write("}\n")
2274    if fopened:
2275        file.close()
2276    file.close()
2277    del file
2278
2279def legend_learners_to_PiCTeX(file, learners, **options):
2280  return apply(legend_to_PiCTeX, (file, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))]), options)
2281   
2282def legend_to_PiCTeX(file, legend, **options):
2283    import types
2284    fopened=0
2285    if (type(file)==types.StringType):
2286        file=open(file, "wt")
2287        fopened=1
2288
2289    pointtypes=options.get("pointtypes", default_point_types)
2290    linetypes=options.get("linetypes", default_line_types)
2291
2292    file.write("\\mbox{\n")
2293    file.write("\\beginpicture\n")
2294    file.write("\\setcoordinatesystem units <5cm, 1pt>\n\n")
2295    file.write("\\setplotarea x from 0.000 to %5.3f, y from 0 to 12\n" % len(legend))
2296
2297    for i in range(len(legend)):
2298        if linetypes:
2299            file.write("%s\n" % linetypes[i])
2300            file.write("\\plot %5.3f 6 %5.3f 6 /\n" % (i, i+0.2))
2301        if pointtypes:
2302            file.write("\\put {%s} at %5.3f 6\n" % (pointtypes[i], i+0.1))
2303        file.write("\\put {%s} [lb] at %5.3f 0\n" % (legend[i], i+0.25))
2304
2305    file.write("\\endpicture\n")
2306    file.write("}\n")
2307    if fopened:
2308        file.close()
2309    file.close()
2310    del file
2311
2312
2313def compute_friedman(avranks, N):
2314    """ Returns a tuple composed of (friedman statistic, degrees of freedom)
2315    and (Iman statistic - F-distribution, degrees of freedoma) given average
2316    ranks and a number of tested data sets N.
2317    """
2318
2319    k = len(avranks)
2320
2321    def friedman(N, k, ranks):
2322        return 12*N*(sum([rank**2.0 for rank in ranks]) - (k*(k+1)*(k+1)/4.0) )/(k*(k+1))
2323
2324    def iman(fried, N, k):
2325        return (N-1)*fried/(N*(k-1) - fried)
2326
2327    f = friedman(N, k, avranks)
2328    im = iman(f, N, k)
2329    fdistdof = (k-1, (k-1)*(N-1))
2330
2331    return (f, k-1), (im, fdistdof)
2332
2333def compute_CD(avranks, N, alpha="0.05", type="nemenyi"):
2334    """ Returns critical difference for Nemenyi or Bonferroni-Dunn test
2335    according to given alpha (either alpha="0.05" or alpha="0.1") for average
2336    ranks and number of tested data sets N. Type can be either "nemenyi" for
2337    the Nemenyi two-tailed test or "bonferroni-dunn" for the Bonferroni-Dunn test.
2338    """
2339
2340    k = len(avranks)
2341   
2342    d = {}
2343
2344    #can be computed in R as qtukey(0.95, n, Inf)**0.5
2345    #for (x in c(2:20)) print(qtukey(0.95, x, Inf)/(2**0.5)
2346    d[("nemenyi", "0.05")] = [0, 0, 1.959964, 2.343701, 2.569032, 2.727774, 2.849705, 2.94832, 3.030879, 3.101730, 3.163684, 3.218654, 3.268004, 3.312739, 3.353618, 3.39123, 3.426041, 3.458425, 3.488685, 3.517073, 3.543799 ]
2347    d[("nemenyi", "0.1")] = [0, 0, 1.644854, 2.052293, 2.291341, 2.459516, 2.588521, 2.692732, 2.779884, 2.854606, 2.919889, 2.977768, 3.029694, 3.076733, 3.119693, 3.159199, 3.195743, 3.229723, 3.261461, 3.291224, 3.319233 ]
2348
2349    d[("bonferroni-dunn", "0.05")] =  [0, 0, 1.960, 2.241, 2.394, 2.498, 2.576, 2.638, 2.690, 2.724, 2.773 ]
2350    d[("bonferroni-dunn", "0.1")] = [0, 0, 1.645, 1.960, 2.128, 2.241, 2.326, 2.394, 2.450, 2.498, 2.539 ]
2351
2352    q = d[(type, alpha)]
2353
2354    cd = q[k]*(k*(k+1)/(6.0*N))**0.5
2355
2356    return cd
2357 
2358
2359def graph_ranks(filename, avranks, names, cd=None, cdmethod=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, **kwargs):
2360    """
2361    Draws a CD graph, which is used to display the differences in methods'
2362    performance.
2363    See Janez Demsar, Statistical Comparisons of Classifiers over
2364    Multiple Data Sets, Journal of Machine Learning Research, 7(Jan):1--30, 2006.
2365
2366    Needs matplotlib to work.
2367
2368    :param filename: Output file name (with extension). Formats supported
2369                     by matplotlib can be used.
2370    :param avranks: List of average methods' ranks.
2371    :param names: List of methods' names.
2372
2373    :param cd: Critical difference. Used for marking methods whose
2374               difference is not statistically significant.
2375    :param lowv: The lowest shown rank, if None, use 1.
2376    :param highv: The highest shown rank, if None, use len(avranks).
2377    :param width: Width of the drawn figure in inches, default 6 in.
2378    :param textspace: Space on figure sides left for the description
2379                      of methods, default 1 in.
2380    :param reverse:  If True, the lowest rank is on the right. Default: False.
2381    :param cdmethod: None by default. It can be an index of an element in avranks
2382                     or names, specifying the method that should be
2383                     marked with an interval.
2384    """
2385
2386    width = float(width)
2387    textspace = float(textspace)
2388
2389    def nth(l,n):
2390        """
2391        Returns the n-th element of each item in a list of lists.
2392        """
2393        n = lloc(l,n)
2394        return [ a[n] for a in l ]
2395
2396    def lloc(l,n):
2397        """
2398        List location in list of list structure.
2399        Enable the use of negative locations:
2400        -1 is the last element, -2 second last...
2401        """
2402        if n < 0:
2403            return len(l[0])+n
2404        else:
2405            return n
2406
2407    def mxrange(lr):
2408        """
2409        Multiple xranges. Can be used to traverse matrices.
2410        This function is very slow due to the unknown number of
2411        parameters.
2412
2413        >>> list(mxrange([2,3]))
2414        [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
2415
2416        >>> list(mxrange([[3,5,1],[9,0,-3]]))
2417        [(3, 9), (3, 6), (3, 3), (4, 9), (4, 6), (4, 3)]
2418
2419        """
2420        if len(lr) == 0:
2421            yield ()
2422        else:
2423            #it can work with single numbers
2424            index = lr[0]
2425            if type(1) == type(index):
2426                index = [ index ]
2427            for a in range(*index):
2428                for b in mxrange(lr[1:]):
2429                    yield tuple([a] + list(b))
2430
2431    try:
2432        from matplotlib.figure import Figure
2433        from matplotlib.patches import Polygon
2434        from matplotlib.backends.backend_agg import FigureCanvasAgg
2435    except:
2436        import sys
2437        print >> sys.stderr, "Function requires matplotlib. Please install it."
2438        return
2439
2440    def print_figure(fig, *args, **kwargs):
2441        canvas = FigureCanvasAgg(fig)
2442        canvas.print_figure(*args, **kwargs)
2443
2444    sums = avranks
2445
2446    tempsort =  sorted([ (a,i) for i,a in  enumerate(sums) ], reverse=reverse)
2447    ssums = nth(tempsort, 0)
2448    sortidx = nth(tempsort, 1)
2449    nnames = [ names[x] for x in sortidx ]
2450   
2451    if lowv == None:
2452        lowv = min(1, int(math.floor(min(ssums))))
2453    if highv == None:
2454        highv = max(len(avranks), int(math.ceil(max(ssums))))
2455
2456    cline = 0.4
2457
2458    k = len(sums)
2459
2460    lines = None
2461    sums = sorted(sums)
2462
2463    linesblank = 0
2464    scalewidth = width - 2*textspace
2465
2466    def rankpos(rank):
2467        if not reverse:
2468            a = rank - lowv
2469        else:
2470            a = highv - rank
2471        return textspace+scalewidth/(highv-lowv)*a
2472
2473    distanceh = 0.25
2474
2475    if cd and cdmethod == None:
2476   
2477        #get pairs of non significant methods
2478
2479        def get_lines(sums, hsd):
2480
2481            #get all pairs
2482            lsums = len(sums)
2483            allpairs = [ (i,j) for i,j in mxrange([[lsums], [lsums]]) if j > i ]
2484            #remove not significant
2485            notSig = [ (i,j) for i,j in allpairs if abs(sums[i]-sums[j]) <= hsd ]
2486            #keep only longest
2487           
2488            def no_longer((i,j), notSig):
2489                for i1,j1 in notSig:
2490                    if (i1 <= i and j1 > j) or (i1 < i and j1 >= j):
2491                        return False
2492                return True
2493
2494            longest = [ (i,j) for i,j in notSig if no_longer((i,j),notSig) ]
2495           
2496            return longest
2497
2498        lines = get_lines(ssums, cd)
2499        linesblank = 0.2 + 0.2 + (len(lines)-1)*0.1
2500
2501        #add scale
2502        distanceh = 0.25
2503        cline += distanceh
2504
2505    # calculate the height needed for the image
2506    minnotsignificant = max(2*0.2, linesblank)
2507    height = cline + ((k+1)/2)*0.2 + minnotsignificant
2508
2509    fig = Figure(figsize=(width, height))
2510    ax = fig.add_axes([0,0,1,1]) #reverse y axis
2511    ax.set_axis_off()
2512
2513    hf = 1./height # height factor
2514    wf = 1./width
2515
2516    def hfl(l): 
2517        return [ a*hf for a in l ]
2518
2519    def wfl(l): 
2520        return [ a*wf for a in l ]
2521
2522    """
2523    Upper left corner is (0,0).
2524    """
2525
2526    ax.plot([0,1], [0,1], c="w")
2527    ax.set_xlim(0, 1)
2528    ax.set_ylim(1, 0)
2529
2530    def line(l, color='k', **kwargs):
2531        """
2532        Input is a list of pairs of points.
2533        """
2534        ax.plot(wfl(nth(l,0)), hfl(nth(l,1)), color=color, **kwargs)
2535
2536    def text(x, y, s, *args, **kwargs):
2537        ax.text(wf*x, hf*y, s, *args, **kwargs)
2538
2539    line([(textspace, cline), (width-textspace, cline)], linewidth=0.7)
2540   
2541    bigtick = 0.1
2542    smalltick = 0.05
2543
2544
2545    import numpy
2546
2547    for a in list(numpy.arange(lowv, highv, 0.5)) + [highv]:
2548        tick = smalltick
2549        if a == int(a): tick = bigtick
2550        line([(rankpos(a), cline-tick/2),(rankpos(a), cline)], linewidth=0.7)
2551
2552    for a in range(lowv, highv+1):
2553        text(rankpos(a), cline-tick/2-0.05, str(a), ha="center", va="bottom")
2554
2555    k = len(ssums)
2556
2557    for i in range((k+1)/2):
2558        chei = cline+ minnotsignificant + (i)*0.2
2559        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace-0.1, chei)], linewidth=0.7)
2560        text(textspace-0.2, chei, nnames[i], ha="right", va="center")
2561
2562    for i in range((k+1)/2, k):
2563        chei = cline + minnotsignificant + (k-i-1)*0.2
2564        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace+scalewidth+0.1, chei)], linewidth=0.7)
2565        text(textspace+scalewidth+0.2, chei, nnames[i], ha="left", va="center")
2566
2567    if cd and cdmethod == None:
2568
2569        #upper scale
2570        if not reverse:
2571            begin, end = rankpos(lowv), rankpos(lowv+cd)
2572        else:
2573            begin, end = rankpos(highv), rankpos(highv - cd)
2574           
2575        line([(begin, distanceh), (end, distanceh)], linewidth=0.7)
2576        line([(begin, distanceh + bigtick/2), (begin, distanceh - bigtick/2)], linewidth=0.7)
2577        line([(end, distanceh + bigtick/2), (end, distanceh - bigtick/2)], linewidth=0.7)
2578        text((begin+end)/2, distanceh - 0.05, "CD", ha="center", va="bottom")
2579
2580        #non significance lines   
2581        def draw_lines(lines, side=0.05, height=0.1):
2582            start = cline + 0.2
2583            for l,r in lines: 
2584                line([(rankpos(ssums[l])-side, start), (rankpos(ssums[r])+side, start)], linewidth=2.5) 
2585                start += height
2586
2587        draw_lines(lines)
2588
2589    elif cd:
2590        begin = rankpos(avranks[cdmethod]-cd)
2591        end = rankpos(avranks[cdmethod]+cd)
2592        line([(begin, cline), (end, cline)], linewidth=2.5) 
2593        line([(begin, cline + bigtick/2), (begin, cline - bigtick/2)], linewidth=2.5)
2594        line([(end, cline + bigtick/2), (end, cline - bigtick/2)], linewidth=2.5)
2595 
2596    print_figure(fig, filename, **kwargs)
2597
2598def mlc_hamming_loss(res):
2599    """
2600    Schapire and Singer (2000) presented Hamming Loss, which is defined as:
2601   
2602    :math:`HammingLoss(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{Y_i \\vartriangle Z_i}{|L|}`
2603    """
2604    losses = [0]*res.number_of_learners
2605    label_num = len(res.labels)
2606    example_num = gettotsize(res)
2607
2608    for e in res.results:
2609        aclass = e.actual_class
2610        for i, labels in enumerate(e.classes):
2611            labels = map(int, labels)
2612            if len(labels) <> len(aclass):
2613                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2614            for j in range(label_num):
2615                if labels[j] != aclass[j]:
2616                    losses[i] += 1
2617           
2618    return [float(x)/(label_num*example_num) for x in losses]
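# Worked example (hedged, with made-up numbers): with two labels and one learner, if
# the actual label vectors are [1, 0] and [1, 1] and the predictions are [1, 1] and
# [0, 1], there are two label mismatches over 2 examples * 2 labels, so the Hamming
# loss is 2 / 4 = 0.5.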
2619
2620def mlc_accuracy(res, forgiveness_rate = 1.0):
2621    """
2622    Godbole & Sarawagi (2004) use the metrics accuracy, precision and recall as follows:
2623     
2624    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|}`
2625   
2626    Boutell et al. (2004) give a more generalized version using a parameter :math:`\\alpha \\ge 0`,
2627    called forgiveness rate:
2628   
2629    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} (\\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|})^{\\alpha}`
2630    """
2631    accuracies = [0.0]*res.number_of_learners
2632    label_num = len(res.labels)
2633    example_num = gettotsize(res)
2634   
2635    for e in res.results:
2636        aclass = e.actual_class
2637        for i, labels in enumerate(e.classes):
2638            labels = map(int, labels)
2639            if len(labels) <> len(aclass):
2640                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2641           
2642            intersection = 0.0
2643            union = 0.0
2644            for real, pred in zip(labels, aclass):
2645                if real and pred:
2646                    intersection = intersection+1
2647                if real or pred:
2648                    union = union+1
2649
2650            if union != 0:
2651                accuracies[i] = accuracies[i] + intersection/union
2652           
2653    return [math.pow(x/example_num,forgiveness_rate) for x in accuracies]
2654
2655def mlc_precision(res):
2656    """
2657    :math:`Precision(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Z_i|}`
2658    """
2659    precisions = [0.0]*res.number_of_learners
2660    label_num = len(res.labels)
2661    example_num = gettotsize(res)
2662   
2663    for e in res.results:
2664        aclass = e.actual_class
2665        for i, labels in enumerate(e.classes):
2666            labels = map(int, labels)
2667            if len(labels) <> len(aclass):
2668                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2669           
2670            intersection = 0.0
2671            predicted = 0.0
2672            for real, pred in zip(labels, aclass):
2673                if real and pred:
2674                    intersection = intersection+1
2675                if real:
2676                    predicted = predicted + 1
2677            if predicted <> 0:
2678                precisions[i] = precisions[i] + intersection/predicted
2679           
2680    return [x/example_num for x in precisions]
2681
2682def mlc_recall(res):
2683    """
2684    :math:`Recall(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i|}`
2685    """
2686    recalls = [0.0]*res.number_of_learners
2687    label_num = len(res.labels)
2688    example_num = gettotsize(res)
2689   
2690    for e in res.results:
2691        aclass = e.actual_class
2692        for i, labels in enumerate(e.classes):
2693            labels = map(int, labels)
2694            if len(labels) <> len(aclass):
2695                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2696           
2697            intersection = 0.0
2698            actual = 0.0
2699            for real, pred in zip(labels, aclass):
2700                if real and pred:
2701                    intersection = intersection+1
2702                if pred:
2703                    actual = actual + 1
2704            if actual <> 0:
2705                recalls[i] = recalls[i] + intersection/actual
2706           
2707    return [x/example_num for x in recalls]
2708
2709#def mlc_ranking_loss(res):
2710#    pass
2711#
2712#def mlc_average_precision(res):
2713#    pass
2714#
2715#def mlc_hierarchical_loss(res):
2716#    pass
2717
2718#########################################################################################
2719if __name__ == "__main__":
2720    avranks =  [3.143, 2.000, 2.893, 1.964]
2721    names = ["prva", "druga", "tretja", "cetrta" ]
2722    cd = compute_CD(avranks, 14)
2723    #cd = compute_CD(avranks, 10, type="bonferroni-dunn")
2724    print cd
2725
2726    print compute_friedman(avranks, 14)
2727
2728    #graph_ranks("test.eps", avranks, names, cd=cd, cdmethod=0, width=6, textspace=1.5)