source: orange/orange/evaluation/scoring.py @ 9669:165371b04b4a

1"""
2############################
3Method scoring (``scoring``)
4############################
5
6.. index: scoring
7
8This module contains various measures of quality for classification and
9regression. Most functions require an argument named :obj:`res`, an instance of
10:class:`Orange.evaluation.testing.ExperimentResults` as computed by
11functions from :mod:`Orange.evaluation.testing` and which contains
12predictions obtained through cross-validation,
13leave-one-out, testing on training data or on a separate test set.
14
15==============
16Classification
17==============
18
19To prepare some data for examples on this page, we shall load the voting data
20set (problem of predicting the congressman's party (republican, democrat)
21based on a selection of votes) and evaluate a naive Bayesian learner, a
22classification tree and a majority classifier using cross-validation.
23For examples requiring a multivalued class problem, we shall do the same
24with the vehicle data set (telling whether a vehicle described by the features
25extracted from a picture is a van, bus, or Opel or Saab car).
26
27A basic cross-validation example is shown in the following part of
28(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):
29
30.. literalinclude:: code/statExample0.py
31
32If instances are weighted, weights are taken into account. This can be
33disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
34disabling weights is to clear the
35:class:`Orange.evaluation.testing.ExperimentResults`' :obj:`weights` flag.
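
For instance, a minimal sketch of ignoring the weights when computing
classification accuracy (:obj:`CA` is described below; :obj:`res` holds the
cross-validation results from the example above)::

    CAs = Orange.evaluation.scoring.CA(res, unweighted=1)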
36
37General Measures of Quality
38===========================
39
40.. autofunction:: CA
41
42.. autofunction:: AP
43
44.. autofunction:: Brier_score
45
46.. autofunction:: IS
47
48So, let's compute all this in part of
49(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`) and print it out:
50
51.. literalinclude:: code/statExample1.py
52   :lines: 13-
53
54The output should look like this::
55
56    method  CA      AP      Brier    IS
57    bayes   0.903   0.902   0.175    0.759
58    tree    0.846   0.845   0.286    0.641
59    majrty  0.614   0.526   0.474   -0.000
60
61Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out
62the standard errors.
63
64Confusion Matrix
65================
66
67.. autofunction:: confusion_matrices
68
69   **A positive-negative confusion matrix** is computed (a) if the class is
70   binary unless :obj:`classIndex` argument is -2, (b) if the class is
71   multivalued and the :obj:`classIndex` is non-negative. Argument
72   :obj:`classIndex` then tells which class is positive. In case (a),
73   :obj:`classIndex` may be omitted; the first class
74   is then negative and the second is positive, unless the :obj:`baseClass`
75   attribute in the object with results has a non-negative value. In that case,
76   :obj:`baseClass` is the index of the target class. :obj:`baseClass`
77   attribute of results object should be set manually. The result of a
78   function is a list of instances of class :class:`ConfusionMatrix`,
79   containing the (weighted) number of true positives (TP), false
80   negatives (FN), false positives (FP) and true negatives (TN).
81   
82   We can also add the keyword argument :obj:`cutoff`
83   (e.g. confusion_matrices(results, cutoff=0.3)); if we do, :obj:`confusion_matrices`
84   will disregard the classifiers' class predictions and observe the predicted
85   probabilities, and consider the prediction "positive" if the predicted
86   probability of the positive class is higher than the :obj:`cutoff`.
87
88   The example (part of :download:`statExamples.py <code/statExamples.py>`) below shows how setting the
89   cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices
90   for the naive Bayesian classifier::
91   
92       cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
93       print "Confusion matrix for naive Bayes:"
94       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
95       
96       cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
97       print "Confusion matrix for naive Bayes:"
98       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
99
100   The output::
101   
102       Confusion matrix for naive Bayes:
103       TP: 238, FP: 13, FN: 29.0, TN: 155
104       Confusion matrix for naive Bayes:
105       TP: 239, FP: 18, FN: 28.0, TN: 150
106   
107   shows that the number of true positives increases (and hence the number of
108   false negatives decreases) by only a single instance, while five instances
109   that were originally true negatives become false positives due to the
110   lower threshold.
111   
112   To observe how good the classifiers are at detecting vans in the vehicle
113   data set, we would compute the matrix like this::
114   
115      cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
116vehicle.domain.classVar.values.index("van"))
117   
118   and get the results like these::
119   
120       TP: 189, FP: 241, FN: 10.0, TN: 406
121   
122   while the same for class "opel" would give::
123   
124       TP: 86, FP: 112, FN: 126.0, TN: 522
125       
126   The main difference is that there are only a few false negatives for the
127   van, meaning that the classifier seldom misses it (if it says it's not a
128   van, it's almost certainly not a van). Not so for the Opel car, where the
129   classifier missed 126 of them and correctly detected only 86.
130   
131   A **general confusion matrix** is computed (a) in the case of a binary class,
132   when :obj:`classIndex` is set to -2, (b) when the class is multivalued and
133   the caller doesn't specify the :obj:`classIndex` of the positive class.
134   When called in this manner, the function cannot use the argument
135   :obj:`cutoff`.
136   
137   The function then returns a three-dimensional matrix, where the element
138   A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
139   gives the number of instances belonging to 'actual_class' for which the
140   'learner' predicted 'predictedClass'. We shall compute and print out
141   the matrix for the naive Bayesian classifier.
142   
143   Here we see another example from :download:`statExamples.py <code/statExamples.py>`::
144   
145       cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
146       classes = vehicle.domain.classVar.values
147       print "\t"+"\t".join(classes)
148       for className, classConfusions in zip(classes, cm):
149           print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))
150   
151   So, here's what this nice piece of code gives::
152   
153              bus   van  saab opel
154       bus     56   95   21   46
155       van     6    189  4    0
156       saab    3    75   73   66
157       opel    4    71   51   86
158       
159   Vans are clearly simple: 189 vans were classified as vans (we know this
160   already, we've printed it out above), and the 10 misclassified pictures
161   were classified as buses (6) and Saab cars (4). In all other classes,
162   there were more instances misclassified as vans than correctly classified
163   instances. The classifier is obviously quite biased towards vans.
164   
165   .. method:: sens(confm)
166   .. method:: spec(confm)
167   .. method:: PPV(confm)
168   .. method:: NPV(confm)
169   .. method:: precision(confm)
170   .. method:: recall(confm)
171   .. method:: F1(confm)
172   .. method:: Falpha(confm, alpha=1.0)
173   .. method:: MCC(conf)
174
175   With the confusion matrix defined in terms of positive and negative
176   classes, you can also compute the
177   `sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
178   [TP/(TP+FN)], `specificity \
179<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
180   [TN/(TN+FP)], `positive predictive value \
181<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
182   [TP/(TP+FP)] and `negative predictive value \
183<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
184   In information retrieval, positive predictive value is called precision
185   (the ratio of the number of relevant records retrieved to the total number
186   of irrelevant and relevant records retrieved), and sensitivity is called
187   `recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
188   (the ratio of the number of relevant records retrieved to the total number
189   of relevant records in the database). The
190   `harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
191   and recall is called an
192   `F-measure <http://en.wikipedia.org/wiki/F-measure>`_; depending on the
193   relative weight given to precision and recall, it is implemented as
194   F1 [2*precision*recall/(precision+recall)] or, in the general case, as
195   Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
196   The `Matthews correlation coefficient \
197<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
198   is in essence a correlation coefficient between
199   the observed and predicted binary classifications; it returns a value
200   between -1 and +1. A coefficient of +1 represents a perfect prediction,
201   0 an average random prediction and -1 an inverse prediction.
202   
203   If the argument :obj:`confm` is a single confusion matrix, a single
204   result (a number) is returned. If confm is a list of confusion matrices,
205   a list of scores is returned, one for each confusion matrix.
206   
207   Note that weights are taken into account when computing the matrix, so
208   these functions don't accept the 'unweighted' keyword argument.
209   
210   Let us print out sensitivities and specificities of our classifiers in
211   part of :download:`statExamples.py <code/statExamples.py>`::
212   
213       cm = Orange.evaluation.scoring.confusion_matrices(res)
214       print
215       print "method\tsens\tspec"
216       for l in range(len(learners)):
217           print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))
218   
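   In the same way we can, for instance, print the F1 score and the Matthews
   correlation coefficient (a small sketch reusing :obj:`cm` and :obj:`learners`
   from the example above)::

       for l in range(len(learners)):
           print "%s\tF1: %5.3f\tMCC: %5.3f" % (learners[l].name,
               Orange.evaluation.scoring.F1(cm[l]),
               Orange.evaluation.scoring.MCC(cm[l]))
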
219ROC Analysis
220============
221
222`Receiver Operating Characteristic \
223<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
224(ROC) analysis was initially developed for
225binary (two-class) problems, and there is no consensus on how to apply it in
226multi-class problems, nor do we know for sure how to do ROC analysis after
227cross validation and similar multiple sampling techniques. If you are
228interested in the area under the curve, the function AUC deals with these
229problems as described below.
230
231.. autofunction:: AUC
232   
233   .. attribute:: AUC.ByWeightedPairs (or 0)
234     
235      Computes AUC for each pair of classes (ignoring instances of all other
236      classes) and averages the results, weighting them by the number of
237      pairs of instances from these two classes (e.g. by the product of
238      probabilities of the two classes). AUC computed in this way still
239      behaves as a concordance index, i.e., it gives the probability that two
240      randomly chosen instances from different classes will be correctly
241      recognized (this is of course true only if the classifier knows
242      from which two classes the instances came).
243   
244   .. attribute:: AUC.ByPairs (or 1)
245   
246      Similar to the above, except that the average over class pairs is not
247      weighted. This AUC is, like the binary one, independent of class
248      distributions, but it is no longer related to the concordance index.
249     
250   .. attribute:: AUC.WeightedOneAgainstAll (or 2)
251     
252      For each class, it computes AUC for this class against all others (that
253      is, treating other classes as one class). The AUCs are then averaged,
254      weighted by the class probabilities. This is related to the concordance
255      index in which we test the classifier's (average) ability to distinguish
256      the instances of a specified class from those that come from other classes.
257      Unlike the binary AUC, the measure is not independent of class
258      distributions.
259     
260   .. attribute:: AUC.OneAgainstAll (or 3)
261   
262      As above, except that the average is not weighted.
263   
264   In case of multiple folds (for instance if the data comes from cross
265   validation), the computation goes like this. When computing the partial
266   AUCs for individual pairs of classes or singled-out classes, AUC is
267   computed for each fold separately and then averaged (ignoring the number
268   of instances in each fold, it's just a simple average). However, if a
269   certain fold doesn't contain any instances of a certain class (from the
270   pair), the partial AUC is computed treating the results as if they came
271   from a single fold. This is not really correct since the class
272   probabilities from different folds are not necessarily comparable,
273   yet since this will most often occur in leave-one-out experiments,
274   comparability shouldn't be a problem.
275   
276   Computing and printing out the AUCs looks just like printing out
277   classification accuracies (except that we call AUC instead of
278   CA, of course)::
279   
280       AUCs = Orange.evaluation.scoring.AUC(res)
281       for l in range(len(learners)):
282           print "%10s: %5.3f" % (learners[l].name, AUCs[l])
283           
284   For vehicle, you can run exactly this same code; it will compute AUCs
285   for all pairs of classes and return the average weighted by probabilities
286   of pairs. Or, you can specify the averaging method yourself, like this::
287   
288       AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
289   
290   The following snippet tries out all four. (We don't claim that this is
291   how the function needs to be used; it's better to stay with the default.)::
292   
293       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
294       print " " *25 + "  \tbayes\ttree\tmajority"
295       for i in range(4):
296           AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
297           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
298   
299   As you can see from the output::
300   
301                                   bayes   tree    majority
302              by pairs, weighted:  0.789   0.871   0.500
303                        by pairs:  0.791   0.872   0.500
304           one vs. all, weighted:  0.783   0.800   0.500
305                     one vs. all:  0.783   0.800   0.500
306
307.. autofunction:: AUC_single
308
309.. autofunction:: AUC_pair
310
311.. autofunction:: AUC_matrix
312
313The remaining functions, which plot the curves and statistically compare
314them, require that the results come from a test with a single iteration,
315and they always compare one chosen class against all others. If you have
316cross validation results, you can either use split_by_iterations to split the
317results by folds, call the function for each fold separately and then sum
318the results up however you see fit, or you can set the ExperimentResults'
319attribute number_of_iterations to 1, to cheat the function - at your own
320risk regarding statistical correctness. In multi-class
321problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
322attribute's baseValue at the time when results were computed. If baseValue
323was not given at that time, 1 (that is, the second class) is used as the default.
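
A minimal sketch of the first approach (assuming :obj:`res` holds
cross-validation results as above; how you aggregate the per-fold scores is up
to you)::

    folds = Orange.evaluation.scoring.split_by_iterations(res)
    aucs_by_fold = [Orange.evaluation.scoring.AUCWilcoxon(fold) for fold in folds]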
324
325We shall use the following code to prepare suitable experimental results::
326
327    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
328    train = voting.selectref(ri2, 0)
329    test = voting.selectref(ri2, 1)
330    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
331
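For a quick look at these results we can, for example, call :obj:`AUCWilcoxon`
(documented below), which returns a list of (AUC, standard error) pairs, one
per learner; a minimal sketch::

    for l, (auc, se) in enumerate(Orange.evaluation.scoring.AUCWilcoxon(res1)):
        print "%10s: %5.3f +- %5.3f" % (learners[l].name, auc, se)
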
332
333.. autofunction:: AUCWilcoxon
334
335.. autofunction:: compute_ROC
336
337Comparison of Algorithms
338------------------------
339
340.. autofunction:: McNemar
341
342.. autofunction:: McNemar_of_two
343
344==========
345Regression
346==========
347
348General Measure of Quality
349==========================
350
351Several alternative measures, as given below, can be used to evaluate
352the success of numeric prediction:
353
354.. image:: files/statRegression.png
355
356.. autofunction:: MSE
357
358.. autofunction:: RMSE
359
360.. autofunction:: MAE
361
362.. autofunction:: RSE
363
364.. autofunction:: RRSE
365
366.. autofunction:: RAE
367
368.. autofunction:: R2
369
370The following code (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to
371score several regression methods.
372
373.. literalinclude:: code/statExamplesRegression.py
374
375The code above produces the following output::
376
377    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
378    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
379    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
380    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
381    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715
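
If you need just one of these scores, any of the functions above can be called
directly; a minimal sketch (assuming :obj:`res` holds regression results
obtained with one of the functions from :mod:`Orange.evaluation.testing`)::

    rmse_scores = Orange.evaluation.scoring.RMSE(res)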
382   
383==================
384Plotting functions
385==================
386
387.. autofunction:: graph_ranks
388
389The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:
390
391.. literalinclude:: code/statExamplesGraphRanks.py
392
393The code produces the following graph:
394
395.. image:: files/statExamplesGraphRanks1.png
396
397.. autofunction:: compute_CD
398
399.. autofunction:: compute_friedman
400
401=================
402Utility Functions
403=================
404
405.. autofunction:: split_by_iterations
406
407=====================================
408Scoring for multilabel classification
409=====================================
410
411Multi-label classification requires different metrics than those used in traditional single-label
412classification. This module presents the various metrics that have been proposed in the literature.
413Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
414:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier
415and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.
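
For instance, the Hamming loss is commonly defined in the literature as
:math:`\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\triangle Z_i|}{|L|}`, where
:math:`\\triangle` denotes the symmetric difference between the predicted and
the true label sets.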
416
417.. autofunction:: mlc_hamming_loss
418.. autofunction:: mlc_accuracy
419.. autofunction:: mlc_precision
420.. autofunction:: mlc_recall
421
422So, let's compute all this and print it out (part of
423:download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses
424:download:`emotions.tab <code/emotions.tab>`):
425
426.. literalinclude:: code/mlc-evaluate.py
427   :lines: 1-15
428
429The output should look like this::
430
431    loss= [0.9375]
432    accuracy= [0.875]
433    precision= [1.0]
434    recall= [0.875]
435
436References
437==========
438
439Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
440Pattern Recognition, vol.37, no.9, pp:1757-71
441
442Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
443Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
444(PAKDD 2004)
445 
446Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
447Machine Learning, vol.39, no.2/3, pp:135-68.
448
449"""
450
451import statc, operator, math
452from operator import add
453import numpy
454
455import Orange
456
457
458#### Private stuff
459
460def log2(x):
461    """Calculate logarithm in base 2."""
462    return math.log(x)/math.log(2)
463
464def check_non_zero(x):
465    """Raise ValueError when x = 0.0."""
466    if x==0.0:
467        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
468
469def gettotweight(res):
470    """Sum all the weights"""
471    totweight = reduce(lambda x, y: x+y.weight, res.results, 0)
472    if totweight==0.0:
473        raise ValueError, "Cannot compute the score: sum of weights is 0.0."
474    return totweight
475
476def gettotsize(res):
477    """ Get number of result instances """
478    if len(res.results):
479        return len(res.results)
480    else:
481        raise ValueError, "Cannot compute the score: no examples."
482
483
484def split_by_iterations(res):
485    """ Splits ExperimentResults of a multiple-iteration test into a list
486    of ExperimentResults, one for each iteration.
487    """
488    if res.number_of_iterations < 2:
489        return [res]
490       
491    ress = [Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
492            for i in range(res.number_of_iterations)]
493    for te in res.results:
494        ress[te.iteration_number].results.append(te)
495    return ress
496
497def split_by_classifiers(res):
498    """ Splits an instance of :obj:`ExperimentResults` into a list of
499    :obj:`ExperimentResults`, one for each classifier.
500    """
501    split_res = []
502    for i in range(len(res.classifierNames)):
503        r = Orange.evaluation.testing.ExperimentResults(res.numberOfIterations,
504                    [res.classifierNames[i]], res.classValues,
505                    weights=res.weights, baseClass=res.baseClass,
506                    classifiers=[res.classifiers[i]] if res.classifiers else [],
507                    test_type = res.test_type, labels = res.labels)
508        r.results = []
509        for te in res.results:
510            r.results.append(Orange.evaluation.testing.TestedExample(te.iterationNumber,
511                                te.actualClass, n=1, weight=te.weight))
512            r.results[-1].classes = [te.classes[i]]
513            r.results[-1].probabilities = [te.probabilities[i]]
514        split_res.append(r)
515    return split_res
516
517
518def class_probabilities_from_res(res, **argkw):
519    """Calculate class probabilities"""
520    probs = [0.0] * len(res.class_values)
521    if argkw.get("unweighted", 0) or not res.weights:
522        for tex in res.results:
523            probs[int(tex.actual_class)] += 1.0
524        totweight = gettotsize(res)
525    else:
526        totweight = 0.0
527        for tex in res.results:
528            probs[tex.actual_class] += tex.weight
529            totweight += tex.weight
530        check_non_zero(totweight)
531    return [prob/totweight for prob in probs]
532
533
534def statistics_by_folds(stats, foldN, reportSE, iterationIsOuter):
535    # remove empty folds, turn the matrix so that learner is outer
536    if iterationIsOuter:
537        if not stats:
538            raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
539        number_of_learners = len(stats[0])
540        stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN))
541        stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)]
542    else:
543        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats]
544
545    if not stats:
546        raise ValueError, "Cannot compute the score: no classifiers"
547    if not stats[0]:
548        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
549   
550    if reportSE:
551        return [(statc.mean(x), statc.sterr(x)) for x in stats]
552    else:
553        return [statc.mean(x) for x in stats]
554   
555def ME(res, **argkw):
556    MEs = [0.0]*res.number_of_learners
557
558    if argkw.get("unweighted", 0) or not res.weights:
559        for tex in res.results:
560            MEs = map(lambda res, cls, ac = float(tex.actual_class):
561                      res + abs(float(cls) - ac), MEs, tex.classes)
562        totweight = gettotsize(res)
563    else:
564        for tex in res.results:
565            MEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
566                       res + tw*abs(float(cls) - ac), MEs, tex.classes)
567        totweight = gettotweight(res)
568
569    return [x/totweight for x in MEs]
570
571MAE = ME
572
573#########################################################################
574# PERFORMANCE MEASURES:
575# Scores for evaluation of numeric predictions
576
577def check_argkw(dct, lst):
578    """check_argkw(dct, lst) -> returns true if any items have non-zero value in dct"""
579    return reduce(lambda x,y: x or y, [dct.get(k, 0) for k in lst])
580
581def regression_error(res, **argkw):
582    """regression_error(res) -> regression error (default: MSE)"""
583    if argkw.get("SE", 0) and res.number_of_iterations > 1:
584        # computes the scores for each iteration, then averages
585        scores = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
586        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
587            norm = [0.0] * res.number_of_iterations
588
589        nIter = [0]*res.number_of_iterations       # counts examples in each iteration
590        a = [0]*res.number_of_iterations           # average class in each iteration
591        for tex in res.results:
592            nIter[tex.iteration_number] += 1
593            a[tex.iteration_number] += float(tex.actual_class)
594        a = [a[i]/nIter[i] for i in range(res.number_of_iterations)]
595
596        if argkw.get("unweighted", 0) or not res.weights:
597            # iterate across test cases
598            for tex in res.results:
599                ai = float(tex.actual_class)
600                # nIter was already counted in the first pass above
601
602                # compute normalization, if required
603                if argkw.get("norm-abs", 0):
604                    norm[tex.iteration_number] += abs(ai - a[tex.iteration_number])
605                elif argkw.get("norm-sqr", 0):
606                    norm[tex.iteration_number] += (ai - a[tex.iteration_number])**2
607
608                # iterate across results of different regressors
609                for i, cls in enumerate(tex.classes):
610                    if argkw.get("abs", 0):
611                        scores[i][tex.iteration_number] += abs(float(cls) - ai)
612                    else:
613                        scores[i][tex.iteration_number] += (float(cls) - ai)**2
614        else: # weighted
615            raise NotImplementedError, "weighted error scores with SE not implemented yet"
616
617        if argkw.get("norm-abs") or argkw.get("norm-sqr"):
618            scores = [[x/n for x, n in zip(y, norm)] for y in scores]
619        else:
620            scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores]
621
622        if argkw.get("R2"):
623            scores = [[1.0 - x for x in y] for y in scores]
624
625        if argkw.get("sqrt", 0):
626            scores = [[math.sqrt(x) for x in y] for y in scores]
627
628        return [(statc.mean(x), statc.std(x)) for x in scores]
629       
630    else: # single iteration (testing on a single test set)
631        scores = [0.0] * res.number_of_learners
632        norm = 0.0
633
634        if argkw.get("unweighted", 0) or not res.weights:
635            a = sum([tex.actual_class for tex in res.results]) \
636                / len(res.results)
637            for tex in res.results:
638                if argkw.get("abs", 0):
639                    scores = map(lambda res, cls, ac = float(tex.actual_class):
640                                 res + abs(float(cls) - ac), scores, tex.classes)
641                else:
642                    scores = map(lambda res, cls, ac = float(tex.actual_class):
643                                 res + (float(cls) - ac)**2, scores, tex.classes)
644
645                if argkw.get("norm-abs", 0):
646                    norm += abs(tex.actual_class - a)
647                elif argkw.get("norm-sqr", 0):
648                    norm += (tex.actual_class - a)**2
649            totweight = gettotsize(res)
650        else:
651            # UNFINISHED: the weighted branch handles only the squared error (no abs/norm options)
652            for tex in res.results:
653                scores = map(lambda res, cls, ac = float(tex.actual_class),
654                           tw = tex.weight:
655                           res + tw * (float(cls) - ac)**2, scores, tex.classes)
656            totweight = gettotweight(res)
657
658        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
659            scores = [s/norm for s in scores]
660        else: # normalize by number of instances (or sum of weights)
661            scores = [s/totweight for s in scores]
662
663        if argkw.get("R2"):
664            scores = [1.0 - s for s in scores]
665
666        if argkw.get("sqrt", 0):
667            scores = [math.sqrt(x) for x in scores]
668
669        return scores
670
671def MSE(res, **argkw):
672    """ Computes mean-squared error. """
673    return regression_error(res, **argkw)
674   
675def RMSE(res, **argkw):
676    """ Computes root mean-squared error. """
677    argkw.setdefault("sqrt", True)
678    return regression_error(res, **argkw)
679
680def MAE(res, **argkw):
681    """ Computes mean absolute error. """
682    argkw.setdefault("abs", True)
683    return regression_error(res, **argkw)
684
685def RSE(res, **argkw):
686    """ Computes relative squared error. """
687    argkw.setdefault("norm-sqr", True)
688    return regression_error(res, **argkw)
689
690def RRSE(res, **argkw):
691    """ Computes root relative squared error. """
692    argkw.setdefault("norm-sqr", True)
693    argkw.setdefault("sqrt", True)
694    return regression_error(res, **argkw)
695
696def RAE(res, **argkw):
697    """ Computes relative absolute error. """
698    argkw.setdefault("abs", True)
699    argkw.setdefault("norm-abs", True)
700    return regression_error(res, **argkw)
701
702def R2(res, **argkw):
703    """ Computes the coefficient of determination, R-squared. """
704    argkw.setdefault("norm-sqr", True)
705    argkw.setdefault("R2", True)
706    return regression_error(res, **argkw)
707
708def MSE_old(res, **argkw):
709    """MSE(res) -> mean-squared error"""
710    if argkw.get("SE", 0) and res.number_of_iterations > 1:
711        MSEs = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)]
712        nIter = [0]*res.number_of_iterations
713        if argkw.get("unweighted", 0) or not res.weights:
714            for tex in res.results:
715                ac = float(tex.actual_class)
716                nIter[tex.iteration_number] += 1
717                for i, cls in enumerate(tex.classes):
718                    MSEs[i][tex.iteration_number] += (float(cls) - ac)**2
719        else:
720            raise ValueError, "weighted RMSE with SE not implemented yet"
721        MSEs = [[x/ni for x, ni in zip(y, nIter)] for y in MSEs]
722        if argkw.get("sqrt", 0):
723            MSEs = [[math.sqrt(x) for x in y] for y in MSEs]
724        return [(statc.mean(x), statc.std(x)) for x in MSEs]
725       
726    else:
727        MSEs = [0.0]*res.number_of_learners
728        if argkw.get("unweighted", 0) or not res.weights:
729            for tex in res.results:
730                MSEs = map(lambda res, cls, ac = float(tex.actual_class):
731                           res + (float(cls) - ac)**2, MSEs, tex.classes)
732            totweight = gettotsize(res)
733        else:
734            for tex in res.results:
735                MSEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight:
736                           res + tw * (float(cls) - ac)**2, MSEs, tex.classes)
737            totweight = gettotweight(res)
738
739        if argkw.get("sqrt", 0):
740            MSEs = [math.sqrt(x) for x in MSEs]
741        return [x/totweight for x in MSEs]
742
743def RMSE_old(res, **argkw):
744    """RMSE(res) -> root mean-squared error"""
745    argkw.setdefault("sqrt", 1)
746    return MSE_old(res, **argkw)
747
748#########################################################################
749# PERFORMANCE MEASURES:
750# Scores for evaluation of classifiers
751
752def CA(res, reportSE = False, **argkw):
753    """ Computes classification accuracy, i.e. the proportion of matches between
754    predicted and actual classes. The function returns a list of classification
755    accuracies of all classifiers tested. If reportSE is set to true, the list
756    will contain tuples with accuracies and standard errors.
757   
758    If results are from multiple repetitions of experiments (like those
759    returned by Orange.evaluation.testing.crossValidation or
760    Orange.evaluation.testing.proportionTest) the
761    standard error (SE) is estimated from deviation of classification
762    accuracy across folds (SD), as SE = SD/sqrt(N), where N is the number
763    of repetitions (e.g. number of folds).
764   
765    If results are from a single repetition, we assume independence of
766    instances and treat the classification accuracy as distributed according
767    to a binomial distribution. This can be approximated by a normal distribution,
768    so we report the SE of sqrt(CA*(1-CA)/N), where CA is classification
769    accuracy and N is number of test instances.
770   
771    Instead of ExperimentResults, this function can be given a list of
772    confusion matrices (see below). Standard errors are in this case
773    estimated using the latter method.
774    """
775    if res.number_of_iterations==1:
776        if type(res)==ConfusionMatrix:
777            div = totweight = res.TP+res.FN+res.FP+res.TN
778            check_non_zero(div)
779            ca = [(res.TP+res.TN)/div]
780        else:
781            CAs = [0.0]*res.number_of_learners
782            if argkw.get("unweighted", 0) or not res.weights:
783                totweight = gettotsize(res)
784                for tex in res.results:
785                    CAs = map(lambda res, cls: res+(cls==tex.actual_class), CAs, tex.classes)
786            else:
787                totweight = 0.
788                for tex in res.results:
789                    CAs = map(lambda res, cls: res+(cls==tex.actual_class and tex.weight), CAs, tex.classes)
790                    totweight += tex.weight
791            check_non_zero(totweight)
792            ca = [x/totweight for x in CAs]
793           
794        if reportSE:
795            return [(x, math.sqrt(x*(1-x)/totweight)) for x in ca]
796        else:
797            return ca
798       
799    else:
800        CAsByFold = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
801        foldN = [0.0]*res.number_of_iterations
802
803        if argkw.get("unweighted", 0) or not res.weights:
804            for tex in res.results:
805                for lrn in range(res.number_of_learners):
806                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class)
807                foldN[tex.iteration_number] += 1
808        else:
809            for tex in res.results:
810                for lrn in range(res.number_of_learners):
811                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and tex.weight
812                foldN[tex.iteration_number] += tex.weight
813
814        return statistics_by_folds(CAsByFold, foldN, reportSE, False)
815
816
817# Obsolete, but kept for compatibility
818def CA_se(res, **argkw):
819    return CA(res, True, **argkw)
820
821
822def AP(res, reportSE = False, **argkw):
823    """ Computes the average probability assigned to the correct class. """
824    if res.number_of_iterations == 1:
825        APs=[0.0]*res.number_of_learners
826        if argkw.get("unweighted", 0) or not res.weights:
827            for tex in res.results:
828                APs = map(lambda res, probs: res + probs[tex.actual_class], APs, tex.probabilities)
829            totweight = gettotsize(res)
830        else:
831            totweight = 0.
832            for tex in res.results:
833                APs = map(lambda res, probs: res + probs[tex.actual_class]*tex.weight, APs, tex.probabilities)
834                totweight += tex.weight
835        check_non_zero(totweight)
836        return [AP/totweight for AP in APs]
837
838    APsByFold = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
839    foldN = [0.0] * res.number_of_iterations
840    if argkw.get("unweighted", 0) or not res.weights:
841        for tex in res.results:
842            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class], APsByFold[tex.iteration_number], tex.probabilities)
843            foldN[tex.iteration_number] += 1
844    else:
845        for tex in res.results:
846            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class] * tex.weight, APsByFold[tex.iteration_number], tex.probabilities)
847            foldN[tex.iteration_number] += tex.weight
848
849    return statistics_by_folds(APsByFold, foldN, reportSE, True)
850
851
852def Brier_score(res, reportSE = False, **argkw):
853    """ Computes the Brier score, defined as the average (over test examples)
854    of sum_x(t(x) - p(x))^2, where x is a class, t(x) is 1 for the correct class
855    and 0 for the others, and p(x) is the probability that the classifier
856    assigned to the class x.
857    """
858    # Computes an average (over examples) of sum_x(t(x) - p(x))^2, where
859    #    x is class,
860    #    t(x) is 0 for 'wrong' and 1 for 'correct' class
861    #    p(x) is predicted probabilty.
862    # There's a trick: since t(x) is zero for all classes but the
863    # correct one (c), we compute the sum as sum_x(p(x)^2) - 2*p(c) + 1
864    # Since +1 is there for each example, it adds 1 to the average
865    # We skip the +1 inside the sum and add it just at the end of the function
866    # We take max(result, 0) to avoid -0.0000x due to rounding errors
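    # Worked check of the shortcut (illustrative numbers): with probabilities
    # p = (0.7, 0.2, 0.1) and the first class correct,
    #   sum_x(t(x)-p(x))^2 = 0.3^2 + 0.2^2 + 0.1^2 = 0.14,
    # while sum_x(p(x)^2) - 2*p(c) + 1 = 0.54 - 1.4 + 1 = 0.14 as well.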
867
868    if res.number_of_iterations == 1:
869        MSEs=[0.0]*res.number_of_learners
870        if argkw.get("unweighted", 0) or not res.weights:
871            totweight = gettotsize(res)
872            for tex in res.results:
873                MSEs = map(lambda res, probs:
874                           res + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities)
875                # unweighted: each example counts as 1, so totweight is the number of examples
876        else:
877            for tex in res.results:
878                MSEs = map(lambda res, probs:
879                           res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities)
880            totweight = gettotweight(res)
881        check_non_zero(totweight)
882        if reportSE:
883            return [(max(x/totweight+1.0, 0), 0) for x in MSEs]  ## change this, not zero!!!
884        else:
885            return [max(x/totweight+1.0, 0) for x in MSEs]
886
887    BSs = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)]
888    foldN = [0.] * res.number_of_iterations
889
890    if argkw.get("unweighted", 0) or not res.weights:
891        for tex in res.results:
892            BSs[tex.iteration_number] = map(lambda rr, probs:
893                       rr + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities)
894            foldN[tex.iteration_number] += 1
895    else:
896        for tex in res.results:
897            BSs[tex.iteration_number] = map(lambda res, probs:
898                       res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities)
899            foldN[tex.iteration_number] += tex.weight
900
901    stats = statistics_by_folds(BSs, foldN, reportSE, True)
902    if reportSE:
903        return [(x+1.0, y) for x, y in stats]
904    else:
905        return [x+1.0 for x in stats]
906
907def BSS(res, **argkw):
908    return [1-x/2 for x in apply(Brier_score, (res, ), argkw)]
909
910def IS_ex(Pc, P):
911    "Pc is the posterior probability, P the prior probability."
912    if (Pc>=P):
913        return -log2(P)+log2(Pc)
914    else:
915        return -(-log2(1-P)+log2(1-Pc))
916   
917def IS(res, apriori=None, reportSE = False, **argkw):
918    """ Computes the information score as defined by
919    `Kononenko and Bratko (1991) \
920    <http://www.springerlink.com/content/g5p7473160476612/>`_.
921    Argument :obj:`apriori` gives the prior class
922    distribution; if it is omitted, the class distribution is computed from
923    the actual classes of examples in :obj:`res`.
924    """
925    if not apriori:
926        apriori = class_probabilities_from_res(res)
927
928    if res.number_of_iterations==1:
929        ISs = [0.0]*res.number_of_learners
930        if argkw.get("unweighted", 0) or not res.weights:
931            for tex in res.results:
932              for i in range(len(tex.probabilities)):
933                    cls = tex.actual_class
934                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls])
935            totweight = gettotsize(res)
936        else:
937            for tex in res.results:
938              for i in range(len(tex.probabilities)):
939                    cls = tex.actual_class
940                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
941            totweight = gettotweight(res)
942        if reportSE:
943            return [(IS/totweight,0) for IS in ISs]
944        else:
945            return [IS/totweight for IS in ISs]
946
947       
948    ISs = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)]
949    foldN = [0.] * res.number_of_iterations
950
951    # compute info scores for each fold   
952    if argkw.get("unweighted", 0) or not res.weights:
953        for tex in res.results:
954            for i in range(len(tex.probabilities)):
955                cls = tex.actual_class
956                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls])
957            foldN[tex.iteration_number] += 1
958    else:
959        for tex in res.results:
960            for i in range(len(tex.probabilities)):
961                cls = tex.actual_class
962                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
963            foldN[tex.iteration_number] += tex.weight
964
965    return statistics_by_folds(ISs, foldN, reportSE, False)
966
967
968def Friedman(res, statistics, **argkw):
969    sums = None
970    for ri in split_by_iterations(res):
971        ranks = statc.rankdata(apply(statistics, (ri,), argkw))
972        if sums:
973            sums = sums and [ranks[i]+sums[i] for i in range(k)]
974        else:
975            sums = ranks
976            k = len(sums)
977    N = res.number_of_iterations
978    k = len(sums)
979    T = sum([x*x for x in sums])
980    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
981    return F, statc.chisqprob(F, k-1)
982   
983
984def Wilcoxon(res, statistics, **argkw):
985    res1, res2 = [], []
986    for ri in split_by_iterations(res):
987        stats = apply(statistics, (ri,), argkw)
988        if (len(stats) != 2):
989            raise TypeError, "Wilcoxon compares two classifiers, no more, no less"
990        res1.append(stats[0])
991        res2.append(stats[1])
992    return statc.wilcoxont(res1, res2)
993
994def rank_difference(res, statistics, **argkw):
995    if not res.results:
996        raise TypeError, "no experiments"
997
998    k = len(res.results[0].classes)
999    if (k<2):
1000        raise TypeError, "nothing to compare (less than two classifiers given)"
1001    if k==2:
1002        return apply(Wilcoxon, (res, statistics), argkw)
1003    else:
1004        return apply(Friedman, (res, statistics), argkw)
1005   
1006class ConfusionMatrix:
1007    """ Class ConfusionMatrix stores data about false and true
1008    predictions compared to the real class. It stores the numbers of
1009    True Negatives, False Positives, False Negatives and True Positives.
1010    """
1011    def __init__(self):
1012        self.TP = self.FN = self.FP = self.TN = 0.0
1013
1014    def addTFPosNeg(self, predictedPositive, isPositive, weight = 1.0):
1015        if predictedPositive:
1016            if isPositive:
1017                self.TP += weight
1018            else:
1019                self.FP += weight
1020        else:
1021            if isPositive:
1022                self.FN += weight
1023            else:
1024                self.TN += weight
1025
1026
1027def confusion_matrices(res, classIndex=-1, **argkw):
1028    """ This function can compute two different forms of confusion matrix:
1029    one in which a certain class is marked as positive and the other(s)
1030    negative, and another in which no class is singled out. The way to
1031    specify what we want is somewhat confusing due to backward
1032    compatibility issues.
1033    """
1034    tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)]
1035   
1036    if classIndex<0:
1037        numberOfClasses = len(res.class_values)
1038        if classIndex < -1 or numberOfClasses > 2:
1039            cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)]
1040            if argkw.get("unweighted", 0) or not res.weights:
1041                for tex in res.results:
1042                    trueClass = int(tex.actual_class)
1043                    for li, pred in enumerate(tex.classes):
1044                        predClass = int(pred)
1045                        if predClass < numberOfClasses:
1046                            cm[li][trueClass][predClass] += 1
1047            else:
1048                for tex in res.results:
1049                    trueClass = int(tex.actual_class)
1050                    for li, pred in enumerate(tex.classes):
1051                        predClass = int(pred)
1052                        if predClass < numberOfClasses:
1053                            cm[li][trueClass][predClass] += tex.weight
1054            return cm
1055           
1056        elif res.baseClass>=0:
1057            classIndex = res.baseClass
1058        else:
1059            classIndex = 1
1060           
1061    cutoff = argkw.get("cutoff")
1062    if cutoff:
1063        if argkw.get("unweighted", 0) or not res.weights:
1064            for lr in res.results:
1065                isPositive=(lr.actual_class==classIndex)
1066                for i in range(res.number_of_learners):
1067                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive)
1068        else:
1069            for lr in res.results:
1070                isPositive=(lr.actual_class==classIndex)
1071                for i in range(res.number_of_learners):
1072                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight)
1073    else:
1074        if argkw.get("unweighted", 0) or not res.weights:
1075            for lr in res.results:
1076                isPositive=(lr.actual_class==classIndex)
1077                for i in range(res.number_of_learners):
1078                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive)
1079        else:
1080            for lr in res.results:
1081                isPositive=(lr.actual_class==classIndex)
1082                for i in range(res.number_of_learners):
1083                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight)
1084    return tfpns
1085
1086
1087# obsolete (renamed)
1088compute_confusion_matrices = confusion_matrices
1089
1090
1091def confusion_chi_square(confusionMatrix):
1092    dim = len(confusionMatrix)
1093    rowPriors = [sum(r) for r in confusionMatrix]
1094    colPriors = [sum([r[i] for r in confusionMatrix]) for i in range(dim)]
1095    total = sum(rowPriors)
1096    rowPriors = [r/total for r in rowPriors]
1097    colPriors = [r/total for r in colPriors]
1098    ss = 0
1099    for ri, row in enumerate(confusionMatrix):
1100        for ci, o in enumerate(row):
1101            e = total * rowPriors[ri] * colPriors[ci]
1102            if not e:
1103                return -1, -1, -1
1104            ss += (o-e)**2 / e
1105    df = (dim-1)**2
1106    return ss, df, statc.chisqprob(ss, df)
1107       
1108   
1109def sens(confm):
1110    """Return sensitivity (recall rate) over the given confusion matrix."""
1111    if type(confm) == list:
1112        return [sens(cm) for cm in confm]
1113    else:
1114        tot = confm.TP+confm.FN
1115        if tot < 1e-6:
1116            import warnings
1117            warnings.warn("Can't compute sensitivity: one or both classes have no instances")
1118            return -1
1119
1120        return confm.TP/tot
1121
1122def recall(confm):
1123    """Return recall rate (sensitivity) over the given confusion matrix."""
1124    return sens(confm)
1125
1126
1127def spec(confm):
1128    """Return specificity over the given confusion matrix."""
1129    if type(confm) == list:
1130        return [spec(cm) for cm in confm]
1131    else:
1132        tot = confm.FP+confm.TN
1133        if tot < 1e-6:
1134            import warnings
1135            warnings.warn("Can't compute specificity: one or both classes have no instances")
1136            return -1
1137        return confm.TN/tot
1138 
1139
1140def PPV(confm):
1141    """Return positive predictive value (precision rate) over the given confusion matrix."""
1142    if type(confm) == list:
1143        return [PPV(cm) for cm in confm]
1144    else:
1145        tot = confm.TP+confm.FP
1146        if tot < 1e-6:
1147            import warnings
1148            warnings.warn("Can't compute PPV: one or both classes have no instances")
1149            return -1
1150        return confm.TP/tot
1151
1152
1153def precision(confm):
1154    """Return precision rate (positive predictive value) over the given confusion matrix."""
1155    return PPV(confm)
1156
1157
1158def NPV(confm):
1159    """Return negative predictive value over the given confusion matrix."""
1160    if type(confm) == list:
1161        return [NPV(cm) for cm in confm]
1162    else:
1163        tot = confm.FN+confm.TN
1164        if tot < 1e-6:
1165            import warnings
1166            warnings.warn("Can't compute NPV: one or both classes have no instances")
1167            return -1
1168        return confm.TN/tot
1169
1170def F1(confm):
1171    """Return F1 score (harmonic mean of precision and recall) over the given confusion matrix."""
1172    if type(confm) == list:
1173        return [F1(cm) for cm in confm]
1174    else:
1175        p = precision(confm)
1176        r = recall(confm)
1177        if p + r > 0:
1178            return 2. * p * r / (p + r)
1179        else:
1180            import warnings
1181            warnings.warn("Can't compute F1: P + R is zero or not defined")
1182            return -1
1183
1184def Falpha(confm, alpha=1.0):
1185    """Return the alpha-mean of precision and recall over the given confusion matrix."""
1186    if type(confm) == list:
1187        return [Falpha(cm, alpha=alpha) for cm in confm]
1188    else:
1189        p = precision(confm)
1190        r = recall(confm)
1191        return (1. + alpha) * p * r / (alpha * p + r)
1192   
1193def MCC(confm):
1194    '''
1195    Return the Matthews correlation coefficient over the given confusion matrix.
1196
1197    MCC is calculated as follows:
1198    MCC = (TP*TN - FP*FN) / sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) )
1199   
1200    [1] Matthews, B.W., Comparison of the predicted and observed secondary
1201    structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451
1202
1203    code by Boris Gorelik
1204    '''
1205    if type(confm) == list:
1206        return [MCC(cm) for cm in confm]
1207    else:
1208        truePositive = confm.TP
1209        trueNegative = confm.TN
1210        falsePositive = confm.FP
1211        falseNegative = confm.FN
1212         
1213        try:   
1214            r = (((truePositive * trueNegative) - (falsePositive * falseNegative))/ 
1215                math.sqrt(  (truePositive + falsePositive)  * 
1216                ( truePositive + falseNegative ) * 
1217                ( trueNegative + falsePositive ) * 
1218                ( trueNegative + falseNegative ) )
1219                )
1220        except ZeroDivisionError:
1221            # Zero division occurs when one of the marginal sums
1222            # (TP+FP, TP+FN, TN+FP, TN+FN) is zero, i.e. the data or
1223            # the predictions contain only one class.
1224            import warnings
1225            warnings.warn("Can't compute MCC: TP or TN is zero or not defined")
1226            r = None
1227
1228    return r
1229
1230def scotts_pi(confm, bIsListOfMatrices=True):
1231   """Compute Scott's Pi for measuring inter-rater agreement for nominal data
1232
1233   http://en.wikipedia.org/wiki/Scott%27s_Pi
1234   Scott's Pi is a statistic for measuring inter-rater reliability for nominal
1235   raters.
1236
1237   @param confm: confusion matrix, or list of confusion matrices. To obtain
1238                           non-binary confusion matrix, call
1239                           Orange.evaluation.scoring.compute_confusion_matrices and set the
1240                           classIndex parameter to -2.
1241   @param bIsListOfMatrices: specifies whether confm is list of matrices.
1242                           This function needs to operate on non-binary
1243                           confusion matrices, which are represented by python
1244                           lists, therefore one needs a way to distinguish
1245                           between a single matrix and list of matrices
1246   """
1247
1248   if bIsListOfMatrices:
1249       try:
1250           return [scotts_pi(cm, bIsListOfMatrices=False) for cm in confm]
1251       except TypeError:
1252           # Nevermind the parameter, maybe this is a "conventional" binary
1253           # confusion matrix and bIsListOfMatrices was specified by mistake
1254           return scotts_pi(confm, bIsListOfMatrices=False)
1255   else:
1256       if isinstance(confm, ConfusionMatrix):
1257           confm = numpy.array( [[confm.TP, confm.FN],
1258                   [confm.FP, confm.TN]], dtype=float)
1259       else:
1260           confm = numpy.array(confm, dtype=float)
1261
1262       marginalSumOfRows = numpy.sum(confm, axis=0)
1263       marginalSumOfColumns = numpy.sum(confm, axis=1)
1264       jointProportion = (marginalSumOfColumns + marginalSumOfRows)/ \
1265                           (2.0 * numpy.sum(confm, axis=None))
1266       # In the eq. above, 2.0 is what the Wikipedia page calls
1267       # the number of annotators. Here we have two annotators:
1268       # the observed (true) labels (annotations) and the predicted by
1269       # the learners.
1270
1271       prExpected = numpy.sum(jointProportion ** 2, axis=None)
1272       prActual = numpy.sum(numpy.diag(confm), axis=None)/numpy.sum(confm, axis=None)
1273
1274       ret = (prActual - prExpected) / (1.0 - prExpected)
1275       return ret
1276
1277def AUCWilcoxon(res, classIndex=-1, **argkw):
1278    """ Computes the area under ROC (AUC) and its standard error using
1279    Wilcoxon's approach proposed by Hanley and McNeil (1982). If
1280    :obj:`classIndex` is not specified, the first class is used as
1281    "the positive" and others are negative. The result is a list of
1282    tuples (aROC, standard error).
1283    """
1284    import corn
1285    useweights = res.weights and not argkw.get("unweighted", 0)
1286    problists, tots = corn.computeROCCumulative(res, classIndex, useweights)
1287
1288    results=[]
1289
1290    totPos, totNeg = tots[1], tots[0]
1291    N = totPos + totNeg
1292    for plist in problists:
1293        highPos, lowNeg = totPos, 0.0
1294        W, Q1, Q2 = 0.0, 0.0, 0.0
1295        for prob in plist:
1296            thisPos, thisNeg = prob[1][1], prob[1][0]
1297            highPos -= thisPos
1298            W += thisNeg * (highPos + thisPos/2.)
1299            Q2 += thisPos * (lowNeg**2  + lowNeg*thisNeg  + thisNeg**2 /3.)
1300            Q1 += thisNeg * (highPos**2 + highPos*thisPos + thisPos**2 /3.)
1301
1302            lowNeg += thisNeg
1303
1304        W  /= (totPos*totNeg)
1305        Q1 /= (totNeg*totPos**2)
1306        Q2 /= (totPos*totNeg**2)
1307
1308        SE = math.sqrt( (W*(1-W) + (totPos-1)*(Q1-W**2) + (totNeg-1)*(Q2-W**2)) / (totPos*totNeg) )
1309        results.append((W, SE))
1310    return results
1311
1312AROC = AUCWilcoxon # for backward compatibility, AROC is obsolete
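
# Hedged usage sketch (illustrative only). The data set and learner names
# follow the documentation examples and are assumptions of this sketch.
def _example_AUCWilcoxon():
    """Sketch: AUC with standard errors for cross-validated learners."""
    import Orange
    voting = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.majority.MajorityLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, voting, folds=10)
    # one (AUC, standard error) pair per learner
    return AUCWilcoxon(res)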
1313
1314def compare_2_AUCs(res, lrn1, lrn2, classIndex=-1, **argkw):
1315    import corn
1316    return corn.compare2ROCs(res, lrn1, lrn2, classIndex, res.weights and not argkw.get("unweighted"))
1317
1318compare_2_AROCs = compare_2_AUCs # for backward compatibility, compare_2_AROCs is obsolete
1319
1320   
1321def compute_ROC(res, classIndex=-1):
1322    """ Computes a ROC curve as a list of (x, y) tuples, where x is
1323    1-specificity and y is sensitivity.
1324    """
1325    import corn
1326    problists, tots = corn.computeROCCumulative(res, classIndex)
1327
1328    results = []
1329    totPos, totNeg = tots[1], tots[0]
1330
1331    for plist in problists:
1332        curve=[(1., 1.)]
1333        TP, TN = totPos, 0.0
1334        FN, FP = 0., totNeg
1335        for prob in plist:
1336            thisPos, thisNeg = prob[1][1], prob[1][0]
1337            # thisPos examples move from TP to FN
1338            TP -= thisPos
1339            FN += thisPos
1340            # thisNeg examples move from FP to TN
1341            TN += thisNeg
1342            FP -= thisNeg
1343
1344            sens = TP/(TP+FN)
1345            spec = TN/(FP+TN)
1346            curve.append((1-spec, sens))
1347        results.append(curve)
1348
1349    return results   
1350
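# Hedged sketch (illustrative only): compute_ROC returns one curve per learner;
# each curve is a list of (1-specificity, sensitivity) points starting at (1, 1).
def _example_compute_ROC(res):
    """Sketch: print the ROC points of the first learner; res is assumed to be
    an ExperimentResults object."""
    curves = compute_ROC(res)
    for fpr, tpr in curves[0]:
        print "%.3f\t%.3f" % (fpr, tpr)
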
1351## TC's implementation of algorithms, taken from:
1352## T Fawcett: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, submitted to KDD Journal.
1353def ROC_slope((P1x, P1y, P1fscore), (P2x, P2y, P2fscore)):
1354    if (P1x == P2x):
1355        return 1e300
1356    return (P1y - P2y) / (P1x - P2x)
1357
1358def ROC_add_point(P, R, keepConcavities=1):
1359    if keepConcavities:
1360        R.append(P)
1361    else:
1362        while (1):
1363            if len(R) < 2:
1364                R.append(P)
1365                return R
1366            else:
1367                T = R.pop()
1368                T2 = R[-1]
1369                if ROC_slope(T2, T) > ROC_slope(T, P):
1370                    R.append(T)
1371                    R.append(P)
1372                    return R
1373    return R
1374
1375def TC_compute_ROC(res, classIndex=-1, keepConcavities=1):
1376    import corn
1377    problists, tots = corn.computeROCCumulative(res, classIndex)
1378
1379    results = []
1380    P, N = tots[1], tots[0]
1381
1382    for plist in problists:
1383        ## corn returns the list ordered by increasing score; we need decreasing order
1384        plist.reverse()
1385        TP = 0.0
1386        FP = 0.0
1387        curve=[]
1388        fPrev = 10e300 # "infinity" score at 0.0, 0.0
1389        for prob in plist:
1390            f = prob[0]
1391            if f != fPrev:
1392                if P:
1393                    tpr = TP/P
1394                else:
1395                    tpr = 0.0
1396                if N:
1397                    fpr = FP/N
1398                else:
1399                    fpr = 0.0
1400                curve = ROC_add_point((fpr, tpr, fPrev), curve, keepConcavities)
1401                fPrev = f
1402            thisPos, thisNeg = prob[1][1], prob[1][0]
1403            TP += thisPos
1404            FP += thisNeg
1405        if P:
1406            tpr = TP/P
1407        else:
1408            tpr = 0.0
1409        if N:
1410            fpr = FP/N
1411        else:
1412            fpr = 0.0
1413        curve = ROC_add_point((fpr, tpr, f), curve, keepConcavities) ## ugly
1414        results.append(curve)
1415
1416    return results
1417
1418## returns a list of points at the intersection of the tangential iso-performance line and the given ROC curve
1419## for given values of FPcost, FNcost and pval
1420def TC_best_thresholds_on_ROC_curve(FPcost, FNcost, pval, curve):
1421    m = (FPcost*(1.0 - pval)) / (FNcost*pval)
1422
1423    ## put the iso-performance line in point (0.0, 1.0)
1424    x0, y0 = (0.0, 1.0)
1425    x1, y1 = (1.0, 1.0 + m)
1426    d01 = math.sqrt((x1 - x0)*(x1 - x0) + (y1 - y0)*(y1 - y0))
1427
1428    ## calculate and find the closest point to the line
1429    firstp = 1
1430    mind = 0.0
1431    a = (x0*y1 - x1*y0)
1432    closestPoints = []
1433    for (x, y, fscore) in curve:
1434        d = ((y0 - y1)*x + (x1 - x0)*y + a) / d01
1435        d = abs(d)
1436        if firstp or d < mind:
1437            mind, firstp = d, 0
1438            closestPoints = [(x, y, fscore)]
1439        else:
1440            if abs(d - mind) <= 0.0001: ## close enough
1441                closestPoints.append( (x, y, fscore) )
1442    return closestPoints         
1443
1444def frange(start, end=None, inc=None):
1445    "A range function that accepts float increments."
1446
1447    if end == None:
1448        end = start + 0.0
1449        start = 0.0
1450
1451    if inc == None or inc == 0:
1452        inc = 1.0
1453
1454    L = [start]
1455    while 1:
1456        next = start + len(L) * inc
1457        if inc > 0 and next >= end:
1458            L.append(end)
1459            break
1460        elif inc < 0 and next <= end:
1461            L.append(end)
1462            break
1463        L.append(next)
1464       
1465    return L
1466
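# Worked example (added for illustration): frange includes both end points.
def _example_frange():
    assert frange(0.0, 1.0, 0.25) == [0.0, 0.25, 0.5, 0.75, 1.0]
    assert frange(1.0) == [0.0, 1.0]  # start defaults to 0.0, inc to 1.0
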
1467## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1468## where ROCcurvesX is a set of ROC curves,
1469## where a (one) ROC curve is a set of (FP, TP) points
1470##
1471## for each (sub)set of input ROC curves
1472## returns the average ROC curve and an array of (vertical) standard deviations
1473def TC_vertical_average_ROC(ROCcurves, samples = 10):
1474    def INTERPOLATE((P1x, P1y, P1fscore), (P2x, P2y, P2fscore), X):
1475        if (P1x == P2x) or ((X > P1x) and (X > P2x)) or ((X < P1x) and (X < P2x)):
1476            raise ValueError, "assumptions for interpolation are not met: P1 = %f,%f P2 = %f,%f X = %f" % (P1x, P1y, P2x, P2y, X)
1477        dx = float(P2x) - float(P1x)
1478        dy = float(P2y) - float(P1y)
1479        m = dy/dx
1480        return P1y + m*(X - P1x)
1481
1482    def TP_FOR_FP(FPsample, ROC, npts):
1483        i = 0
1484        while i < npts - 1:
1485            (fp, _, _) = ROC[i + 1]
1486            if (fp <= FPsample):
1487                i += 1
1488            else:
1489                break
1490        (fp, tp, _) = ROC[i]
1491        if fp == FPsample:
1492            return tp
1493        elif fp < FPsample and i + 1 < len(ROC):
1494            return INTERPOLATE(ROC[i], ROC[i+1], FPsample)
1495        elif fp < FPsample and i + 1 == len(ROC): # return the last
1496            return ROC[i][1]
1497        raise ValueError, "cannot compute: TP_FOR_FP in TC_vertical_average_ROC"
1498        #return 0.0
1499
1500    average = []
1501    stdev = []
1502    for ROCS in ROCcurves:
1503        npts = []
1504        for c in ROCS:
1505            npts.append(len(c))
1506        nrocs = len(ROCS)
1507
1508        TPavg = []
1509        TPstd = []
1510        for FPsample in frange(0.0, 1.0, 1.0/samples):
1511            TPsum = []
1512            for i in range(nrocs):
1513                TPsum.append( TP_FOR_FP(FPsample, ROCS[i], npts[i]) ) ##TPsum = TPsum + TP_FOR_FP(FPsample, ROCS[i], npts[i])
1514            TPavg.append( (FPsample, statc.mean(TPsum)) )
1515            if len(TPsum) > 1:
1516                stdv = statc.std(TPsum)
1517            else:
1518                stdv = 0.0
1519            TPstd.append( stdv )
1520
1521        average.append(TPavg)
1522        stdev.append(TPstd)
1523
1524    return (average, stdev)
1525
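# Hedged sketch (illustrative only): vertical averaging expects, for each
# learner, the list of its per-fold ROC curves. One way to build that input is
# to run TC_compute_ROC separately on every fold.
def _example_vertical_average(res):
    """Sketch: res is assumed to be an ExperimentResults from cross-validation."""
    folds = split_by_iterations(res)                 # one ExperimentResults per fold
    per_fold = [TC_compute_ROC(f) for f in folds]    # per_fold[fold][learner] -> curve
    per_learner = [[fold_curves[l] for fold_curves in per_fold]
                   for l in range(res.number_of_learners)]
    return TC_vertical_average_ROC(per_learner, samples=10)
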
1526## input ROCcurves are of form [ROCcurves1, ROCcurves2, ... ROCcurvesN],
1527## where ROCcurvesX is a set of ROC curves,
1528## where a (one) ROC curve is a set of (FP, TP) points
1529##
1530## for each (sub)set of input ROC curves
1531## returns the average ROC curve, an array of vertical standard deviations and an array of horizontal standard deviations
1532def TC_threshold_average_ROC(ROCcurves, samples = 10):
1533    def POINT_AT_THRESH(ROC, npts, thresh):
1534        i = 0
1535        while i < npts - 1:
1536            (px, py, pfscore) = ROC[i]
1537            if (pfscore > thresh):
1538                i += 1
1539            else:
1540                break
1541        return ROC[i]
1542
1543    average = []
1544    stdevV = []
1545    stdevH = []
1546    for ROCS in ROCcurves:
1547        npts = []
1548        for c in ROCS:
1549            npts.append(len(c))
1550        nrocs = len(ROCS)
1551
1552        T = []
1553        for c in ROCS:
1554            for (px, py, pfscore) in c:
1555##                try:
1556##                    T.index(pfscore)
1557##                except:
1558                T.append(pfscore)
1559        T.sort()
1560        T.reverse() ## ugly
1561
1562        TPavg = []
1563        TPstdV = []
1564        TPstdH = []
1565        for tidx in frange(0, (len(T) - 1.0), float(len(T))/samples):
1566            FPsum = []
1567            TPsum = []
1568            for i in range(nrocs):
1569                (fp, tp, _) = POINT_AT_THRESH(ROCS[i], npts[i], T[int(tidx)])
1570                FPsum.append(fp)
1571                TPsum.append(tp)
1572            TPavg.append( (statc.mean(FPsum), statc.mean(TPsum)) )
1573            ## vertical standard deviation
1574            if len(TPsum) > 1:
1575                stdv = statc.std(TPsum)
1576            else:
1577                stdv = 0.0
1578            TPstdV.append( stdv )
1579            ## horizontal standard deviation
1580            if len(FPsum) > 1:
1581                stdh = statc.std(FPsum)
1582            else:
1583                stdh = 0.0
1584            TPstdH.append( stdh )
1585
1586        average.append(TPavg)
1587        stdevV.append(TPstdV)
1588        stdevH.append(TPstdH)
1589
1590    return (average, stdevV, stdevH)
1591
1592## Calibration Curve
1593## returns an array of (curve, yesClassRugPoints, noClassRugPoints) elements, where:
1594##  - curve is an array of points (x, y) on the calibration curve
1595##  - yesClassRugPoints is an array of (x, 1) points
1596##  - noClassRugPoints is an array of (x, 0) points
1597def compute_calibration_curve(res, classIndex=-1):
1598    import corn
1599    ## merge multiple iterations into one
1600    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
1601    for te in res.results:
1602        mres.results.append( te )
1603
1604    problists, tots = corn.computeROCCumulative(mres, classIndex)
1605
1606    results = []
1607    P, N = tots[1], tots[0]
1608
1609    bins = 10 ## divide interval between 0.0 and 1.0 into N bins
1610
1611    for plist in problists:
1612        yesClassRugPoints = [] 
1613        noClassRugPoints = []
1614
1615        yesBinsVals = [0] * bins
1616        noBinsVals = [0] * bins
1617        for (f, (thisNeg, thisPos)) in plist:
1618            yesClassRugPoints.append( (f, thisPos) ) #1.0
1619            noClassRugPoints.append( (f, thisNeg) ) #1.0
1620
1621            index = int(f * bins )
1622            index = min(index, bins - 1) ## just in case for value 1.0
1623            yesBinsVals[index] += thisPos
1624            noBinsVals[index] += thisNeg
1625
1626        curve = []
1627        for cn in range(bins):
1628            f = float(cn * 1.0 / bins) + (1.0 / 2.0 / bins)
1629            yesVal = yesBinsVals[cn]
1630            noVal = noBinsVals[cn]
1631            allVal = yesVal + noVal
1632            if allVal == 0.0: continue
1633            y = float(yesVal)/float(allVal)
1634            curve.append( (f,  y) )
1635
1636        ## smooth the curve
1637        maxnPoints = 100
1638        if len(curve) >= 3:
1639#            loessCurve = statc.loess(curve, -3, 0.6)
1640            loessCurve = statc.loess(curve, maxnPoints, 0.5, 3)
1641        else:
1642            loessCurve = curve
1643        clen = len(loessCurve)
1644        if clen > maxnPoints:
1645            df = clen / maxnPoints
1646            if df < 1: df = 1
1647            curve = [loessCurve[i]  for i in range(0, clen, df)]
1648        else:
1649            curve = loessCurve
1650        curve = [(c)[:2] for c in curve] ## remove the third value (variance of epsilon?) that suddenly appeared in the output of the statc.loess function
1651        results.append((curve, yesClassRugPoints, noClassRugPoints))
1652
1653    return results
1654
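# Hedged sketch (illustrative only): each element returned by
# compute_calibration_curve is (curve, yesClassRugPoints, noClassRugPoints);
# a well calibrated classifier keeps the curve close to the diagonal y = x.
def _example_calibration(res):
    """Sketch: res is assumed to be an ExperimentResults object."""
    curve, yes_rug, no_rug = compute_calibration_curve(res)[0]
    for p, frac in curve:
        print "predicted %.2f  observed %.2f" % (p, frac)
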
1655
1656## Lift Curve
1657## returns an array of curve elements, where:
1658##  - curve is an array of points ((TP+FP)/(P + N), TP, (threshold, FP/N)) on the lift curve
1659def compute_lift_curve(res, classIndex=-1):
1660    import corn
1661    ## merge multiple iterations into one
1662    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels)
1663    for te in res.results:
1664        mres.results.append( te )
1665
1666    problists, tots = corn.computeROCCumulative(mres, classIndex)
1667
1668    results = []
1669    P, N = tots[1], tots[0]
1670    for plist in problists:
1671        ## corn returns the list ordered by increasing score; we need decreasing order
1672        plist.reverse()
1673        TP = 0.0
1674        FP = 0.0
1675        curve = [(0.0, 0.0, (10e300, 0.0))]
1676        for (f, (thisNeg, thisPos)) in plist:
1677            TP += thisPos
1678            FP += thisNeg
1679            curve.append( ((TP+FP)/(P + N), TP, (f, FP/(N or 1))) )
1680        results.append(curve)
1681
1682    return P, N, results
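
# Hedged sketch (illustrative only): compute_lift_curve returns the total
# number of positive (P) and negative (N) instances together with one curve
# per learner; each point is ((TP+FP)/(P+N), TP, (threshold, FP/N)).
def _example_lift(res):
    """Sketch: res is assumed to be an ExperimentResults object."""
    P, N, curves = compute_lift_curve(res)
    for rate, TP, (threshold, fpr) in curves[0]:
        print "%.3f of instances -> %.1f true positives (threshold %.3f)" % (rate, TP, threshold)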
1683###
1684
1685class CDT:
1686  """ Stores number of concordant (C), discordant (D) and tied (T) pairs (used for AUC) """
1687  def __init__(self, C=0.0, D=0.0, T=0.0):
1688    self.C, self.D, self.T = C, D, T
1689   
1690def is_CDT_empty(cdt):
1691    return cdt.C + cdt.D + cdt.T < 1e-20
1692
1693
1694def compute_CDT(res, classIndex=-1, **argkw):
1695    """Obsolete, don't use"""
1696    import corn
1697    if classIndex<0:
1698        if res.baseClass>=0:
1699            classIndex = res.baseClass
1700        else:
1701            classIndex = 1
1702           
1703    useweights = res.weights and not argkw.get("unweighted", 0)
1704    weightByClasses = argkw.get("weightByClasses", True)
1705
1706    if (res.number_of_iterations>1):
1707        CDTs = [CDT() for i in range(res.number_of_learners)]
1708        iterationExperiments = split_by_iterations(res)
1709        for exp in iterationExperiments:
1710            expCDTs = corn.computeCDT(exp, classIndex, useweights)
1711            for i in range(len(CDTs)):
1712                CDTs[i].C += expCDTs[i].C
1713                CDTs[i].D += expCDTs[i].D
1714                CDTs[i].T += expCDTs[i].T
1715        for i in range(res.number_of_learners):
1716            if is_CDT_empty(CDTs[i]):
1717                return corn.computeCDT(res, classIndex, useweights)
1718       
1719        return CDTs
1720    else:
1721        return corn.computeCDT(res, classIndex, useweights)
1722
1723## THIS FUNCTION IS OBSOLETE AND ITS AVERAGING OVER FOLDS IS QUESTIONABLE
1724## DON'T USE IT
1725def ROCs_from_CDT(cdt, **argkw):
1726    """Obsolete, don't use"""
1727    if type(cdt) == list:
1728        return [ROCs_from_CDT(c) for c in cdt]
1729
1730    C, D, T = cdt.C, cdt.D, cdt.T
1731    N = C+D+T
1732    if N < 1e-6:
1733        import warnings
1734        warnings.warn("Can't compute AUC: one or both classes have no instances")
1735        return (-1,)*8
1736    if N < 2:
1737        import warnings
1738        warnings.warn("Can't compute AUC: one or both classes have too few examples")
1739
1740    som = (C-D)/N
1741    c = 0.5*(1+som)
1742 
1743    if (C+D):
1744        res = (C/N*100, D/N*100, T/N*100, N, som, (C-D)/(C+D), (C-D)/(N*(N-1)/2), 0.5*(1+som))
1745    else:
1746        res = (C/N*100, D/N*100, T/N*100, N, som, -1.0, (C-D)/(N*(N-1)/2), 0.5*(1+som))
1747
1748    if argkw.get("print"):
1749        print "Concordant  = %5.1f       Somers' D = %1.3f" % (res[0], res[4])
1750        print "Discordant  = %5.1f       Gamma     = %s" % (res[1], "%1.3f" % res[5] if res[5] > 0 else "N/A")
1751        print "Tied        = %5.1f       Tau-a     = %1.3f" % (res[2], res[6])
1752        print " %6d pairs             c         = %1.3f"    % (res[3], res[7])
1753
1754    return res
1755
1756AROC_from_CDT = ROCs_from_CDT  # for backward compatibility, AROC_from_CDT is obsolete
1757
1758
1759
1760# computes AUC using a specified 'cdtComputer' function
1761# It tries to compute AUCs from 'ite' (examples from a single iteration) and,
1762# if C+D+T=0, from 'all_ite' (entire test set). In the former case, the AUCs
1763# are divided by 'divideByIfIte'. Additional flag is returned which is True in
1764# the former case, or False in the latter.
1765def AUC_x(cdtComputer, ite, all_ite, divideByIfIte, computerArgs):
1766    cdts = cdtComputer(*(ite, ) + computerArgs)
1767    if not is_CDT_empty(cdts[0]):
1768        return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divideByIfIte for cdt in cdts], True
1769       
1770    if all_ite:
1771        cdts = cdtComputer(*(all_ite, ) + computerArgs)
1772        if not is_CDT_empty(cdts[0]):
1773            return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T) for cdt in cdts], False
1774
1775    return False, False
1776
1777   
1778# computes AUC between classes i and j as if there were no other classes
1779def AUC_ij(ite, classIndex1, classIndex2, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1780    import corn
1781    return AUC_x(corn.computeCDTPair, ite, all_ite, divideByIfIte, (classIndex1, classIndex2, useWeights))
1782
1783
1784# computes AUC between class i and the other classes (treating them as the same class)
1785def AUC_i(ite, classIndex, useWeights = True, all_ite = None, divideByIfIte = 1.0):
1786    import corn
1787    return AUC_x(corn.computeCDT, ite, all_ite, divideByIfIte, (classIndex, useWeights))
1788   
1789
1790# computes the average AUC over folds using an "AUCcomputer" (AUC_i or AUC_ij);
1791# it returns the sum of what the computer returns, unless at some fold the
1792# computer has to fall back to computing over all folds, or even that fails;
1793# in those cases the result is returned immediately
1794def AUC_iterations(AUCcomputer, iterations, computerArgs):
1795    subsum_aucs = [0.] * iterations[0].number_of_learners
1796    for ite in iterations:
1797        aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs)
1798        if not aucs:
1799            return None
1800        if not foldsUsed:
1801            return aucs
1802        subsum_aucs = map(add, subsum_aucs, aucs)
1803    return subsum_aucs
1804
1805
1806# AUC for binary classification problems
1807def AUC_binary(res, useWeights = True):
1808    if res.number_of_iterations > 1:
1809        return AUC_iterations(AUC_i, split_by_iterations(res), (-1, useWeights, res, res.number_of_iterations))
1810    else:
1811        return AUC_i(res, -1, useWeights)[0]
1812
1813# AUC for multiclass problems
1814def AUC_multi(res, useWeights = True, method = 0):
1815    numberOfClasses = len(res.class_values)
1816   
1817    if res.number_of_iterations > 1:
1818        iterations = split_by_iterations(res)
1819        all_ite = res
1820    else:
1821        iterations = [res]
1822        all_ite = None
1823   
1824    # by pairs
1825    sum_aucs = [0.] * res.number_of_learners
1826    usefulClassPairs = 0.
1827
1828    if method in [0, 2]:
1829        prob = class_probabilities_from_res(res)
1830       
1831    if method <= 1:
1832        for classIndex1 in range(numberOfClasses):
1833            for classIndex2 in range(classIndex1):
1834                subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1835                if subsum_aucs:
1836                    if method == 0:
1837                        p_ij = prob[classIndex1] * prob[classIndex2]
1838                        subsum_aucs = [x * p_ij  for x in subsum_aucs]
1839                        usefulClassPairs += p_ij
1840                    else:
1841                        usefulClassPairs += 1
1842                    sum_aucs = map(add, sum_aucs, subsum_aucs)
1843    else:
1844        for classIndex in range(numberOfClasses):
1845            subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.number_of_iterations))
1846            if subsum_aucs:
1847                if method == 0:
1848                    p_i = prob[classIndex]
1849                    subsum_aucs = [x * p_i  for x in subsum_aucs]
1850                    usefulClassPairs += p_i
1851                else:
1852                    usefulClassPairs += 1
1853                sum_aucs = map(add, sum_aucs, subsum_aucs)
1854                   
1855    if usefulClassPairs > 0:
1856        sum_aucs = [x/usefulClassPairs for x in sum_aucs]
1857
1858    return sum_aucs
1859
1860def AUC():  # placeholder, so that AUC.ByWeightedPairs (used as a default argument below) exists before the real AUC is defined
1861    pass
1862
1863AUC.ByWeightedPairs = 0
1864
1865# Computes AUC, possibly for multiple classes (the averaging method can be specified)
1866# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1867def AUC(res, method = AUC.ByWeightedPairs, useWeights = True):
1868    """ Returns the area under ROC curve (AUC) given a set of experimental
1869    results. For multivalued class problems, it computes an average AUC over
1870    class pairs or one-against-all comparisons, as specified by the argument method.
1871    """
1872    if len(res.class_values) < 2:
1873        raise ValueError("Cannot compute AUC on a single-class problem")
1874    elif len(res.class_values) == 2:
1875        return AUC_binary(res, useWeights)
1876    else:
1877        return AUC_multi(res, useWeights, method)
1878
1879AUC.ByWeightedPairs = 0
1880AUC.ByPairs = 1
1881AUC.WeightedOneAgainstAll = 2
1882AUC.OneAgainstAll = 3
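
# Hedged usage sketch (illustrative only). The vehicle data set and learner
# names follow the documentation examples above and are assumptions here.
def _example_AUC_multiclass():
    """Sketch: different multi-class averaging methods for AUC."""
    import Orange
    vehicle = Orange.data.Table("vehicle")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.majority.MajorityLearner()]
    resVeh = Orange.evaluation.testing.cross_validation(learners, vehicle, folds=10)
    return (AUC(resVeh),                       # default: AUC.ByWeightedPairs
            AUC(resVeh, method=AUC.ByPairs),
            AUC(resVeh, method=AUC.OneAgainstAll))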
1883
1884
1885# Computes AUC; in multivalued class problems, AUC is computed as one-against-all
1886# Results over folds are averages; if some folds contain examples from one class only, the folds are merged
1887def AUC_single(res, classIndex = -1, useWeights = True):
1888    """ Computes AUC where the class given classIndex is singled out, and
1889    all other classes are treated as a single class. To find how good our
1890    classifiers are in distinguishing between vans and other vehicles, call
1891    the function like this::
1892   
1893        Orange.evaluation.scoring.AUC_single(resVeh, \
1894classIndex = vehicle.domain.classVar.values.index("van"))
1895    """
1896    if classIndex<0:
1897        if res.baseClass>=0:
1898            classIndex = res.baseClass
1899        else:
1900            classIndex = 1
1901
1902    if res.number_of_iterations > 1:
1903        return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.number_of_iterations))
1904    else:
1905        return AUC_i( res, classIndex, useWeights)[0]
1906
1907# Computes AUC for a pair of classes (as if there were no other classes)
1908# Results over folds are averages; if some folds have examples from one class only, the folds are merged
1909def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
1910    """ Computes AUC between a pair of classes, ignoring instances from all
1911    other classes.
1912    """
1913    if res.number_of_iterations > 1:
1914        return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.number_of_iterations))
1915    else:
1916        return AUC_ij(res, classIndex1, classIndex2, useWeights)
1917 
1918
1919# AUC for multiclass problems
1920def AUC_matrix(res, useWeights = True):
1921    """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes.
1922    If there are empty classes, the corresponding elements in the matrix
1923    are -1. Remember the beautiful(?) code for printing out the confusion
1924    matrix? Here it strikes again::
1925   
1926        classes = vehicle.domain.classVar.values
1927        AUCmatrix = Orange.evaluation.scoring.AUC_matrix(resVeh)[0]
1928        print "\t"+"\t".join(classes[:-1])
1929        for className, AUCrow in zip(classes[1:], AUCmatrix[1:]):
1930            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow))
1931    """
1932    numberOfClasses = len(res.class_values)
1933    number_of_learners = res.number_of_learners
1934   
1935    if res.number_of_iterations > 1:
1936        iterations, all_ite = split_by_iterations(res), res
1937    else:
1938        iterations, all_ite = [res], None
1939   
1940    aucs = [[[] for i in range(numberOfClasses)] for i in range(number_of_learners)]
1941    prob = class_probabilities_from_res(res)
1942       
1943    for classIndex1 in range(numberOfClasses):
1944        for classIndex2 in range(classIndex1):
1945            pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations))
1946            if pair_aucs:
1947                for lrn in range(number_of_learners):
1948                    aucs[lrn][classIndex1].append(pair_aucs[lrn])
1949            else:
1950                for lrn in range(number_of_learners):
1951                    aucs[lrn][classIndex1].append(-1)
1952    return aucs
1953               
1954
1955def McNemar(res, **argkw):
1956    """ Computes a triangular matrix with McNemar statistics for each pair of
1957    classifiers. The statistic follows a chi-squared distribution with
1958    one degree of freedom; critical value for 5% significance is around 3.84.
1959    """
1960    nLearners = res.number_of_learners
1961    mcm = []
1962    for i in range(nLearners):
1963       mcm.append([0.0]*res.number_of_learners)
1964
1965    if not res.weights or argkw.get("unweighted"):
1966        for i in res.results:
1967            actual = i.actual_class
1968            classes = i.classes
1969            for l1 in range(nLearners):
1970                for l2 in range(l1, nLearners):
1971                    if classes[l1]==actual:
1972                        if classes[l2]!=actual:
1973                            mcm[l1][l2] += 1
1974                    elif classes[l2]==actual:
1975                        mcm[l2][l1] += 1
1976    else:
1977        for i in res.results:
1978            actual = i.actual_class
1979            classes = i.classes
1980            for l1 in range(nLearners):
1981                for l2 in range(l1, nLearners):
1982                    if classes[l1]==actual:
1983                        if classes[l2]!=actual:
1984                            mcm[l1][l2] += i.weight
1985                    elif classes[l2]==actual:
1986                        mcm[l2][l1] += i.weight
1987
1988    for l1 in range(nLearners):
1989        for l2 in range(l1, nLearners):
1990            su=mcm[l1][l2] + mcm[l2][l1]
1991            if su:
1992                mcm[l2][l1] = (abs(mcm[l1][l2]-mcm[l2][l1])-1)**2 / su
1993            else:
1994                mcm[l2][l1] = 0
1995
1996    for l1 in range(nLearners):
1997        mcm[l1]=mcm[l1][:l1]
1998
1999    return mcm
2000
2001
2002def McNemar_of_two(res, lrn1, lrn2, **argkw):
2003    """ McNemar_of_two computes the McNemar statistic for a pair of classifiers,
2004    specified by the indices lrn1 and lrn2.
2005    """
2006    tf = ft = 0.0
2007    if not res.weights or argkw.get("unweighted"):
2008        for i in res.results:
2009            actual=i.actual_class
2010            if i.classes[lrn1]==actual:
2011                if i.classes[lrn2]!=actual:
2012                    tf += 1.0
2013            elif i.classes[lrn2]==actual:
2014                    ft += 1.0
2015    else:
2016        for i in res.results:
2017            actual=i.actual_class
2018            if i.classes[lrn1]==actual:
2019                if i.classes[lrn2]!=actual:
2020                    tf += i.weight
2021            elif i.classes[lrn2]==actual:
2022                    ft += i.weight
2023
2024    su = tf + ft
2025    if su:
2026        return (abs(tf-ft)-1)**2 / su
2027    else:
2028        return 0
2029
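# Hedged usage sketch (illustrative only): comparing two cross-validated
# learners; the critical value at the 5% level is about 3.84.
def _example_McNemar(res):
    """Sketch: res is assumed to hold results for at least two learners."""
    chi2 = McNemar_of_two(res, 0, 1)
    if chi2 > 3.84:
        print "learners 0 and 1 differ significantly (chi^2 = %.3f)" % chi2
    full_matrix = McNemar(res)   # triangular matrix for all pairs
    return full_matrix
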
2030
2031def Friedman(res, stat=CA):
2032    """ Compares classifiers by the Friedman test, treating folds as different examples.
2033        Returns F, p and average ranks
2034    """
2035    res_split = split_by_iterations(res)
2036    res = [stat(r) for r in res_split]
2037   
2038    N = len(res)
2039    k = len(res[0])
2040    sums = [0.0]*k
2041    for r in res:
2042        ranks = [k-x+1 for x in statc.rankdata(r)]
2043        if stat==Brier_score: # reverse ranks for Brier_score (lower better)
2044            ranks = [k+1-x for x in ranks]
2045        sums = [ranks[i]+sums[i] for i in range(k)]
2046
2047    T = sum([x*x for x in sums])
2048    sums = [x/N for x in sums]
2049
2050    F = 12.0 / (N*k*(k+1)) * T  - 3 * N * (k+1)
2051
2052    return F, statc.chisqprob(F, k-1), sums
2053
2054
2055def Wilcoxon_pairs(res, avgranks, stat=CA):
2056    """ Returns a triangular matrix, where element[i][j] stores the significance of the difference
2057        between the i-th and j-th classifier, as computed by the Wilcoxon test. The element is positive
2058        if the i-th is better than the j-th, negative if it is worse, and 1 if they are equal.
2059        Arguments to the function are ExperimentResults, average ranks (as returned by Friedman)
2060        and, optionally, a statistic; greater values should mean better results.
2061    """
2062    res_split = split_by_iterations(res)
2063    res = [stat(r) for r in res_split]
2064
2065    k = len(res[0])
2066    bt = []
2067    for m1 in range(k):
2068        nl = []
2069        for m2 in range(m1+1, k):
2070            t, p = statc.wilcoxont([r[m1] for r in res], [r[m2] for r in res])
2071            if avgranks[m1]<avgranks[m2]:
2072                nl.append(p)
2073            elif avgranks[m2]<avgranks[m1]:
2074                nl.append(-p)
2075            else:
2076                nl.append(1)
2077        bt.append(nl)
2078    return bt
2079
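# Hedged usage sketch (illustrative only): a Friedman test followed by
# pairwise Wilcoxon comparisons on the same cross-validation results.
def _example_Friedman_Wilcoxon(res):
    """Sketch: res is assumed to be an ExperimentResults from cross-validation."""
    F, p, ranks = Friedman(res, stat=CA)
    print "Friedman F = %.3f, p = %.3f, average ranks: %s" % (F, p, ranks)
    if p < 0.05:
        pairs = Wilcoxon_pairs(res, ranks, stat=CA)
        print pairs
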
2080
2081def plot_learning_curve_learners(file, allResults, proportions, learners, noConfidence=0):
2082    plot_learning_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], noConfidence)
2083   
2084def plot_learning_curve(file, allResults, proportions, legend, noConfidence=0):
2085    import types
2086    fopened=0
2087    if (type(file)==types.StringType):
2088        file=open(file, "wt")
2089        fopened=1
2090       
2091    file.write("set yrange [0:1]\n")
2092    file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2093    file.write("set multiplot\n\n")
2094    CAs = [CA_dev(x) for x in allResults]
2095
2096    file.write("plot \\\n")
2097    for i in range(len(legend)-1):
2098        if not noConfidence:
2099            file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (i+1))
2100        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2101    if not noConfidence:
2102        file.write("'-' title '' with yerrorbars pointtype %i,\\\n" % (len(legend)))
2103    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[-1], len(legend)))
2104
2105    for i in range(len(legend)):
2106        if not noConfidence:
2107            for p in range(len(proportions)):
2108                file.write("%f\t%f\t%f\n" % (proportions[p], CAs[p][i][0], 1.96*CAs[p][i][1]))
2109            file.write("e\n\n")
2110
2111        for p in range(len(proportions)):
2112            file.write("%f\t%f\n" % (proportions[p], CAs[p][i][0]))
2113        file.write("e\n\n")
2114
2115    if fopened:
2116        file.close()
2117
2118
2119def print_single_ROC_curve_coordinates(file, curve):
2120    import types
2121    fopened=0
2122    if (type(file)==types.StringType):
2123        file=open(file, "wt")
2124        fopened=1
2125
2126    for coord in curve:
2127        file.write("%5.3f\t%5.3f\n" % tuple(coord))
2128
2129    if fopened:
2130        file.close()
2131
2132
2133def plot_ROC_learners(file, curves, learners):
2134    plot_ROC(file, curves, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))])
2135   
2136def plot_ROC(file, curves, legend):
2137    import types
2138    fopened=0
2139    if (type(file)==types.StringType):
2140        file=open(file, "wt")
2141        fopened=1
2142
2143    file.write("set yrange [0:1]\n")
2144    file.write("set xrange [0:1]\n")
2145    file.write("set multiplot\n\n")
2146
2147    file.write("plot \\\n")
2148    for leg in legend:
2149        file.write("'-' title '%s' with lines,\\\n" % leg)
2150    file.write("'-' title '' with lines\n")
2151
2152    for curve in curves:
2153        for coord in curve:
2154            file.write("%5.3f\t%5.3f\n" % tuple(coord))
2155        file.write("e\n\n")
2156
2157    file.write("1.0\t1.0\n0.0\t0.0\ne\n\n")
2158
2159    if fopened:
2160        file.close()
2161
2162
2163
2164def plot_McNemar_curve_learners(file, allResults, proportions, learners, reference=-1):
2165    plot_McNemar_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference)
2166
2167def plot_McNemar_curve(file, allResults, proportions, legend, reference=-1):
2168    if reference<0:
2169        reference=len(legend)-1
2170       
2171    import types
2172    fopened=0
2173    if (type(file)==types.StringType):
2174        file=open(file, "wt")
2175        fopened=1
2176       
2177    #file.write("set yrange [0:1]\n")
2178    #file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[-1]))
2179    file.write("set multiplot\n\n")
2180    file.write("plot \\\n")
2181    tmap=range(reference)+range(reference+1, len(legend))
2182    for i in tmap[:-1]:
2183        file.write("'-' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1))
2184    file.write("'-' title '%s' with linespoints pointtype %i\n" % (legend[tmap[-1]], tmap[-1]))
2185    file.write("\n")
2186
2187    for i in tmap:
2188        for p in range(len(proportions)):
2189            file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(allResults[p], i, reference)))
2190        file.write("e\n\n")
2191
2192    if fopened:
2193        file.close()
2194
2195default_point_types=("{$\\circ$}", "{$\\diamond$}", "{$+$}", "{$\\times$}", "{$|$}")+tuple([chr(x) for x in range(97, 122)])
2196default_line_types=("\\setsolid", "\\setdashpattern <4pt, 2pt>", "\\setdashpattern <8pt, 2pt>", "\\setdashes", "\\setdots")
2197
2198def learning_curve_learners_to_PiCTeX(file, allResults, proportions, **options):
2199    return apply(learning_curve_to_PiCTeX, (file, allResults, proportions), options)
2200   
2201def learning_curve_to_PiCTeX(file, allResults, proportions, **options):
2202    import types
2203    fopened=0
2204    if (type(file)==types.StringType):
2205        file=open(file, "wt")
2206        fopened=1
2207
2208    nexamples=len(allResults[0].results)
2209    CAs = [CA_dev(x) for x in allResults]
2210
2211    graphsize=float(options.get("graphsize", 10.0)) #cm
2212    difprop=proportions[-1]-proportions[0]
2213    ntestexamples=nexamples*proportions[-1]
2214    xunit=graphsize/ntestexamples
2215
2216    yshift=float(options.get("yshift", -ntestexamples/20.))
2217   
2218    pointtypes=options.get("pointtypes", default_point_types)
2219    linetypes=options.get("linetypes", default_line_types)
2220
2221    if options.has_key("numberedx"):
2222        numberedx=options["numberedx"]
2223        if type(numberedx)==types.IntType:
2224            if numberedx>0:
2225                numberedx=[nexamples*proportions[int(i/float(numberedx)*len(proportions))] for i in range(numberedx)]+[proportions[-1]*nexamples]
2226            elif numberedx<0:
2227                numberedx = -numberedx
2228                newn=[]
2229                for i in range(numberedx+1):
2230                    wanted=proportions[0]+float(i)/numberedx*difprop
2231                    best=(10, 0)
2232                    for t in proportions:
2233                        td=abs(wanted-t)
2234                        if td<best[0]:
2235                            best=(td, t)
2236                    if not best[1] in newn:
2237                        newn.append(best[1])
2238                newn.sort()
2239                numberedx=[nexamples*x for x in newn]
2240        elif type(numberedx[0])==types.FloatType:
2241            numberedx=[nexamples*x for x in numberedx]
2242    else:
2243        numberedx=[nexamples*x for x in proportions]
2244
2245    file.write("\\mbox{\n")
2246    file.write("\\beginpicture\n")
2247    file.write("\\setcoordinatesystem units <%10.8fcm, %5.3fcm>\n\n" % (xunit, graphsize))
2248    file.write("\\setplotarea x from %5.3f to %5.3f, y from 0 to 1\n" % (0, ntestexamples))
2249    file.write("\\axis bottom invisible\n")  # label {#examples}
2250    file.write("      ticks short at %s /\n" % reduce(lambda x,y:x+" "+y, ["%i"%(x*nexamples+0.5) for x in proportions]))
2251    if numberedx:
2252        file.write("            long numbered at %s /\n" % reduce(lambda x,y:x+y, ["%i " % int(x+0.5) for x in numberedx]))
2253    file.write("  /\n")
2254    file.write("\\axis left invisible\n")  # label {classification accuracy}
2255    file.write("      shiftedto y=%5.3f\n" % yshift)
2256    file.write("      ticks short from 0.0 to 1.0 by 0.05\n")
2257    file.write("            long numbered from 0.0 to 1.0 by 0.25\n")
2258    file.write("  /\n")
2259    if options.has_key("default"):
2260        file.write("\\setdashpattern<1pt, 1pt>\n")
2261        file.write("\\plot %5.3f %5.3f %5.3f %5.3f /\n" % (0., options["default"], ntestexamples, options["default"]))
2262   
2263    for i in range(len(CAs[0])):
2264        coordinates=reduce(lambda x,y:x+" "+y, ["%i %5.3f" % (proportions[p]*nexamples, CAs[p][i][0]) for p in range(len(proportions))])
2265        if linetypes:
2266            file.write("%s\n" % linetypes[i])
2267            file.write("\\plot %s /\n" % coordinates)
2268        if pointtypes:
2269            file.write("\\multiput %s at %s /\n" % (pointtypes[i], coordinates))
2270
2271    file.write("\\endpicture\n")
2272    file.write("}\n")
2273    if fopened:
2274        file.close()
2277
2278def legend_learners_to_PiCTeX(file, learners, **options):
2279  return apply(legend_to_PiCTeX, (file, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))]), options)
2280   
2281def legend_to_PiCTeX(file, legend, **options):
2282    import types
2283    fopened=0
2284    if (type(file)==types.StringType):
2285        file=open(file, "wt")
2286        fopened=1
2287
2288    pointtypes=options.get("pointtypes", default_point_types)
2289    linetypes=options.get("linetypes", default_line_types)
2290
2291    file.write("\\mbox{\n")
2292    file.write("\\beginpicture\n")
2293    file.write("\\setcoordinatesystem units <5cm, 1pt>\n\n")
2294    file.write("\\setplotarea x from 0.000 to %5.3f, y from 0 to 12\n" % len(legend))
2295
2296    for i in range(len(legend)):
2297        if linetypes:
2298            file.write("%s\n" % linetypes[i])
2299            file.write("\\plot %5.3f 6 %5.3f 6 /\n" % (i, i+0.2))
2300        if pointtypes:
2301            file.write("\\put {%s} at %5.3f 6\n" % (pointtypes[i], i+0.1))
2302        file.write("\\put {%s} [lb] at %5.3f 0\n" % (legend[i], i+0.25))
2303
2304    file.write("\\endpicture\n")
2305    file.write("}\n")
2306    if fopened:
2307        file.close()
2310
2311
2312def compute_friedman(avranks, N):
2313    """ Returns a tuple composed of (Friedman statistic, degrees of freedom)
2314    and (Iman statistic, which follows the F-distribution, degrees of freedom),
2315    given average ranks and the number of tested data sets N.
2316    """
2317
2318    k = len(avranks)
2319
2320    def friedman(N, k, ranks):
2321        return 12*N*(sum([rank**2.0 for rank in ranks]) - (k*(k+1)*(k+1)/4.0) )/(k*(k+1))
2322
2323    def iman(fried, N, k):
2324        return (N-1)*fried/(N*(k-1) - fried)
2325
2326    f = friedman(N, k, avranks)
2327    im = iman(f, N, k)
2328    fdistdof = (k-1, (k-1)*(N-1))
2329
2330    return (f, k-1), (im, fdistdof)
2331
2332def compute_CD(avranks, N, alpha="0.05", type="nemenyi"):
2333    """ Returns critical difference for Nemenyi or Bonferroni-Dunn test
2334    according to given alpha (either alpha="0.05" or alpha="0.1") for average
2335    ranks and number of tested data sets N. Type can be either "nemenyi" for
2336    the Nemenyi two-tailed test or "bonferroni-dunn" for the Bonferroni-Dunn test.
2337    """
2338
2339    k = len(avranks)
2340   
2341    d = {}
2342
2343    #can be computed in R as qtukey(0.95, n, Inf)**0.5
2344    #for (x in c(2:20)) print(qtukey(0.95, x, Inf)/(2**0.5)
2345    d[("nemenyi", "0.05")] = [0, 0, 1.959964, 2.343701, 2.569032, 2.727774, 2.849705, 2.94832, 3.030879, 3.101730, 3.163684, 3.218654, 3.268004, 3.312739, 3.353618, 3.39123, 3.426041, 3.458425, 3.488685, 3.517073, 3.543799 ]
2346    d[("nemenyi", "0.1")] = [0, 0, 1.644854, 2.052293, 2.291341, 2.459516, 2.588521, 2.692732, 2.779884, 2.854606, 2.919889, 2.977768, 3.029694, 3.076733, 3.119693, 3.159199, 3.195743, 3.229723, 3.261461, 3.291224, 3.319233 ]
2347
2348    d[("bonferroni-dunn", "0.05")] =  [0, 0, 1.960, 2.241, 2.394, 2.498, 2.576, 2.638, 2.690, 2.724, 2.773 ]
2349    d[("bonferroni-dunn", "0.1")] = [0, 0, 1.645, 1.960, 2.128, 2.241, 2.326, 2.394, 2.450, 2.498, 2.539 ]
2350
2351    q = d[(type, alpha)]
2352
2353    cd = q[k]*(k*(k+1)/(6.0*N))**0.5
2354
2355    return cd
2356 
2357
2358def graph_ranks(filename, avranks, names, cd=None, cdmethod=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, **kwargs):
2359    """
2360    Draws a CD graph, which is used to display  the differences in methods'
2361    performance.
2362    See Janez Demsar, Statistical Comparisons of Classifiers over
2363    Multiple Data Sets, Journal of Machine Learning Research 7(Jan):1--30, 2006.
2364
2365    Needs matplotlib to work.
2366
2367    :param filename: Output file name (with extension). Formats supported
2368                     by matplotlib can be used.
2369    :param avranks: List of average methods' ranks.
2370    :param names: List of methods' names.
2371
2372    :param cd: Critical difference. Used for marking methods whose
2373               difference is not statistically significant.
2374    :param lowv: The lowest shown rank, if None, use 1.
2375    :param highv: The highest shown rank, if None, use len(avranks).
2376    :param width: Width of the drawn figure in inches, default 6 in.
2377    :param textspace: Space on figure sides left for the description
2378                      of methods, default 1 in.
2379    :param reverse:  If True, the lowest rank is on the right. Default: False.
2380    :param cdmethod: None by default. It can be an index of an element in avranks
2381                     or names which specifies the method that should be
2382                     marked with an interval.
2383    """
2384
2385    width = float(width)
2386    textspace = float(textspace)
2387
2388    def nth(l,n):
2389        """
2390        Returns the n-th element of each item in a list of lists.
2391        """
2392        n = lloc(l,n)
2393        return [ a[n] for a in l ]
2394
2395    def lloc(l,n):
2396        """
2397        List location in list of list structure.
2398        Enable the use of negative locations:
2399        -1 is the last element, -2 second last...
2400        """
2401        if n < 0:
2402            return len(l[0])+n
2403        else:
2404            return n
2405
2406    def mxrange(lr):
2407        """
2408        Multiple xranges. Can be used to traverse matrices.
2409        This function is slow because the number of ranges is
2410        not known in advance.
2411
2412        >>> mxrange([3,5])
2413        [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
2414
2415        >>> mxrange([[3,5,1],[9,0,-3]])
2416        [(3, 9), (3, 6), (3, 3), (4, 9), (4, 6), (4, 3)]
2417
2418        """
2419        if len(lr) == 0:
2420            yield ()
2421        else:
2422            #it can work with single numbers
2423            index = lr[0]
2424            if type(1) == type(index):
2425                index = [ index ]
2426            for a in range(*index):
2427                for b in mxrange(lr[1:]):
2428                    yield tuple([a] + list(b))
2429
2430    try:
2431        from matplotlib.figure import Figure
2432        from matplotlib.patches import Polygon
2433        from matplotlib.backends.backend_agg import FigureCanvasAgg
2434    except ImportError:
2435        import sys
2436        print >> sys.stderr, "Function requires matplotlib. Please install it."
2437        return
2438
2439    def print_figure(fig, *args, **kwargs):
2440        canvas = FigureCanvasAgg(fig)
2441        canvas.print_figure(*args, **kwargs)
2442
2443    sums = avranks
2444
2445    tempsort =  sorted([ (a,i) for i,a in  enumerate(sums) ], reverse=reverse)
2446    ssums = nth(tempsort, 0)
2447    sortidx = nth(tempsort, 1)
2448    nnames = [ names[x] for x in sortidx ]
2449   
2450    if lowv == None:
2451        lowv = min(1, int(math.floor(min(ssums))))
2452    if highv == None:
2453        highv = max(len(avranks), int(math.ceil(max(ssums))))
2454
2455    cline = 0.4
2456
2457    k = len(sums)
2458
2459    lines = None
2460    sums = sorted(sums)
2461
2462    linesblank = 0
2463    scalewidth = width - 2*textspace
2464
2465    def rankpos(rank):
2466        if not reverse:
2467            a = rank - lowv
2468        else:
2469            a = highv - rank
2470        return textspace+scalewidth/(highv-lowv)*a
2471
2472    distanceh = 0.25
2473
2474    if cd and cdmethod == None:
2475   
2476        #get pairs of non significant methods
2477
2478        def get_lines(sums, hsd):
2479
2480            #get all pairs
2481            lsums = len(sums)
2482            allpairs = [ (i,j) for i,j in mxrange([[lsums], [lsums]]) if j > i ]
2483            #remove not significant
2484            notSig = [ (i,j) for i,j in allpairs if abs(sums[i]-sums[j]) <= hsd ]
2485            #keep only longest
2486           
2487            def no_longer((i,j), notSig):
2488                for i1,j1 in notSig:
2489                    if (i1 <= i and j1 > j) or (i1 < i and j1 >= j):
2490                        return False
2491                return True
2492
2493            longest = [ (i,j) for i,j in notSig if no_longer((i,j),notSig) ]
2494           
2495            return longest
2496
2497        lines = get_lines(ssums, cd)
2498        linesblank = 0.2 + 0.2 + (len(lines)-1)*0.1
2499
2500        #add scale
2501        distanceh = 0.25
2502        cline += distanceh
2503
2504    # calculate the height needed for the image
2505    minnotsignificant = max(2*0.2, linesblank)
2506    height = cline + ((k+1)/2)*0.2 + minnotsignificant
2507
2508    fig = Figure(figsize=(width, height))
2509    ax = fig.add_axes([0,0,1,1]) #reverse y axis
2510    ax.set_axis_off()
2511
2512    hf = 1./height # height factor
2513    wf = 1./width
2514
2515    def hfl(l): 
2516        return [ a*hf for a in l ]
2517
2518    def wfl(l): 
2519        return [ a*wf for a in l ]
2520
2521    """
2522    Upper left corner is (0,0).
2523    """
2524
2525    ax.plot([0,1], [0,1], c="w")
2526    ax.set_xlim(0, 1)
2527    ax.set_ylim(1, 0)
2528
2529    def line(l, color='k', **kwargs):
2530        """
2531        Input is a list of pairs of points.
2532        """
2533        ax.plot(wfl(nth(l,0)), hfl(nth(l,1)), color=color, **kwargs)
2534
2535    def text(x, y, s, *args, **kwargs):
2536        ax.text(wf*x, hf*y, s, *args, **kwargs)
2537
2538    line([(textspace, cline), (width-textspace, cline)], linewidth=0.7)
2539   
2540    bigtick = 0.1
2541    smalltick = 0.05
2542
2543
2544    import numpy
2545
2546    for a in list(numpy.arange(lowv, highv, 0.5)) + [highv]:
2547        tick = smalltick
2548        if a == int(a): tick = bigtick
2549        line([(rankpos(a), cline-tick/2),(rankpos(a), cline)], linewidth=0.7)
2550
2551    for a in range(lowv, highv+1):
2552        text(rankpos(a), cline-tick/2-0.05, str(a), ha="center", va="bottom")
2553
2554    k = len(ssums)
2555
2556    for i in range((k+1)/2):
2557        chei = cline+ minnotsignificant + (i)*0.2
2558        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace-0.1, chei)], linewidth=0.7)
2559        text(textspace-0.2, chei, nnames[i], ha="right", va="center")
2560
2561    for i in range((k+1)/2, k):
2562        chei = cline + minnotsignificant + (k-i-1)*0.2
2563        line([(rankpos(ssums[i]), cline), (rankpos(ssums[i]), chei), (textspace+scalewidth+0.1, chei)], linewidth=0.7)
2564        text(textspace+scalewidth+0.2, chei, nnames[i], ha="left", va="center")
2565
2566    if cd and cdmethod == None:
2567
2568        #upper scale
2569        if not reverse:
2570            begin, end = rankpos(lowv), rankpos(lowv+cd)
2571        else:
2572            begin, end = rankpos(highv), rankpos(highv - cd)
2573           
2574        line([(begin, distanceh), (end, distanceh)], linewidth=0.7)
2575        line([(begin, distanceh + bigtick/2), (begin, distanceh - bigtick/2)], linewidth=0.7)
2576        line([(end, distanceh + bigtick/2), (end, distanceh - bigtick/2)], linewidth=0.7)
2577        text((begin+end)/2, distanceh - 0.05, "CD", ha="center", va="bottom")
2578
2579        #non significance lines   
2580        def draw_lines(lines, side=0.05, height=0.1):
2581            start = cline + 0.2
2582            for l,r in lines: 
2583                line([(rankpos(ssums[l])-side, start), (rankpos(ssums[r])+side, start)], linewidth=2.5) 
2584                start += height
2585
2586        draw_lines(lines)
2587
2588    elif cd:
2589        begin = rankpos(avranks[cdmethod]-cd)
2590        end = rankpos(avranks[cdmethod]+cd)
2591        line([(begin, cline), (end, cline)], linewidth=2.5) 
2592        line([(begin, cline + bigtick/2), (begin, cline - bigtick/2)], linewidth=2.5)
2593        line([(end, cline + bigtick/2), (end, cline - bigtick/2)], linewidth=2.5)
2594 
2595    print_figure(fig, filename, **kwargs)
2596
2597def mlc_hamming_loss(res):
2598    """
2599    Schapire and Singer (2000) presented Hamming Loss, which is defined as:
2600   
2601    :math:`HammingLoss(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\vartriangle Z_i|}{|L|}`
2602    """
2603    losses = [0]*res.number_of_learners
2604    label_num = len(res.labels)
2605    example_num = gettotsize(res)
2606
2607    for e in res.results:
2608        aclass = e.actual_class
2609        for i, labels in enumerate(e.classes):
2610            labels = map(int, labels)
2611            if len(labels) != len(aclass):
2612                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2613            for j in range(label_num):
2614                if labels[j] != aclass[j]:
2615                    losses[i] += 1
2616           
2617    return [float(x)/(label_num*example_num) for x in losses]
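
# Worked example (comment only, added for illustration): with actual labels
# [1, 0, 1] and predicted labels [1, 1, 0], the symmetric difference contains
# two labels, so the per-instance Hamming loss is 2/3; the reported score is
# this quantity averaged over all instances (and computed per learner).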
2618
2619def mlc_accuracy(res, forgiveness_rate = 1.0):
2620    """
2621    Godbole & Sarawagi (2004) define the metrics accuracy, precision and recall as follows:
2622     
2623    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|}`
2624   
2625    Boutell et al. (2004) give a more generalized version using a parameter :math:`\\alpha \\ge 0`,
2626    called forgiveness rate:
2627   
2628    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} (\\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|})^{\\alpha}`
2629    """
2630    accuracies = [0.0]*res.number_of_learners
2631    label_num = len(res.labels)
2632    example_num = gettotsize(res)
2633   
2634    for e in res.results:
2635        aclass = e.actual_class
2636        for i, labels in enumerate(e.classes):
2637            labels = map(int, labels)
2638            if len(labels) != len(aclass):
2639                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2640           
2641            intersection = 0.0
2642            union = 0.0
2643            for real, pred in zip(labels, aclass):
2644                if real and pred:
2645                    intersection = intersection+1
2646                if real or pred:
2647                    union = union+1
2648
2649            if union != 0:
2650                accuracies[i] = accuracies[i] + intersection/union
2651           
2652    return [math.pow(x/example_num,forgiveness_rate) for x in accuracies]
2653
2654def mlc_precision(res):
2655    """
2656    :math:`Precision(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Z_i|}`
2657    """
2658    precisions = [0.0]*res.number_of_learners
2659    label_num = len(res.labels)
2660    example_num = gettotsize(res)
2661   
2662    for e in res.results:
2663        aclass = e.actual_class
2664        for i, labels in enumerate(e.classes):
2665            labels = map(int, labels)
2666            if len(labels) != len(aclass):
2667                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2668           
2669            intersection = 0.0
2670            predicted = 0.0
2671            for pred_lab, act_lab in zip(labels, aclass):  # labels hold predictions, aclass the actual labels
2672                if pred_lab and act_lab:
2673                    intersection = intersection+1
2674                if pred_lab:
2675                    predicted = predicted + 1
2676            if predicted != 0:
2677                precisions[i] = precisions[i] + intersection/predicted
2678           
2679    return [x/example_num for x in precisions]
2680
2681def mlc_recall(res):
2682    """
2683    :math:`Recall(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i|}`
2684    """
2685    recalls = [0.0]*res.number_of_learners
2686    label_num = len(res.labels)
2687    example_num = gettotsize(res)
2688   
2689    for e in res.results:
2690        aclass = e.actual_class
2691        for i, labels in enumerate(e.classes):
2692            labels = map(int, labels)
2693            if len(labels) != len(aclass):
2694                raise ValueError, "The dimensions of the classified output and the actual class array do not match."
2695           
2696            intersection = 0.0
2697            actual = 0.0
2698            for pred_lab, act_lab in zip(labels, aclass):  # labels hold predictions, aclass the actual labels
2699                if pred_lab and act_lab:
2700                    intersection = intersection+1
2701                if act_lab:
2702                    actual = actual + 1
2703            if actual != 0:
2704                recalls[i] = recalls[i] + intersection/actual
2705           
2706    return [x/example_num for x in recalls]
2707
2708#def mlc_ranking_loss(res):
2709#    pass
2710#
2711#def mlc_average_precision(res):
2712#    pass
2713#
2714#def mlc_hierarchical_loss(res):
2715#    pass
2716
2717#########################################################################################
2718if __name__ == "__main__":
2719    avranks =  [3.143, 2.000, 2.893, 1.964]
2720    names = ["first", "second", "third", "fourth"]
2721    cd = compute_CD(avranks, 14)
2722    #cd = compute_CD(avranks, 10, type="bonferroni-dunn")
2723    print cd
2724
2725    print compute_friedman(avranks, 14)
2726
2727    #graph_ranks("test.eps", avranks, names, cd=cd, cdmethod=0, width=6, textspace=1.5)