Changeset 9892:3b220d15fb39 in orange
 Timestamp:
 02/07/12 11:04:25
 Branch:
 default
 rebase_source:
 fd358bf360bc24d5c7ae3104b540b946f9cf6f41
 Files:
 2 edited

Orange/evaluation/scoring.py
r9725 r9892

The module docstring (removed from the source file in this changeset):

"""
############################
Method scoring (``scoring``)
############################

.. index: scoring

This module contains various measures of quality for classification and
regression. Most functions require an argument named :obj:`res`, an instance of
:class:`Orange.evaluation.testing.ExperimentResults` as computed by
functions from :mod:`Orange.evaluation.testing` and which contains
predictions obtained through cross-validation,
leave-one-out, testing on training data or test set instances.

==============
Classification
==============

To prepare some data for examples on this page, we shall load the voting data
set (problem of predicting the congressman's party (republican, democrat)
based on a selection of votes) and evaluate naive Bayesian learner,
classification trees and majority classifier using cross-validation.
For examples requiring a multivalued class problem, we shall do the same
with the vehicle data set (telling whether a vehicle described by the features
extracted from a picture is a van, bus, or Opel or Saab car).

A basic cross-validation example is shown in the following part of
(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):

.. literalinclude:: code/statExample0.py

If instances are weighted, weights are taken into account. This can be
disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
disabling weights is to clear the
:class:`Orange.evaluation.testing.ExperimentResults`' flag weights.

General Measures of Quality
===========================

.. autofunction:: CA

.. autofunction:: AP

.. autofunction:: Brier_score

.. autofunction:: IS

So, let's compute all this in part of
(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`) and print it out:

.. literalinclude:: code/statExample1.py
    :lines: 13-

The output should look like this::

    method  CA      AP      Brier   IS
    bayes   0.903   0.902   0.175   0.759
    tree    0.846   0.845   0.286   0.641
    majrty  0.614   0.526   0.474   0.000

Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out
the standard errors.

Confusion Matrix
================

.. autofunction:: confusion_matrices

**A positive-negative confusion matrix** is computed (a) if the class is
binary unless :obj:`classIndex` argument is -2, (b) if the class is
multivalued and the :obj:`classIndex` is non-negative. Argument
:obj:`classIndex` then tells which class is positive. In case (a),
:obj:`classIndex` may be omitted; the first class
is then negative and the second is positive, unless the :obj:`baseClass`
attribute in the object with results has non-negative value. In that case,
:obj:`baseClass` is an index of the target class. The :obj:`baseClass`
attribute of the results object should be set manually. The result of the
function is a list of instances of class :class:`ConfusionMatrix`,
containing the (weighted) number of true positives (TP), false
negatives (FN), false positives (FP) and true negatives (TN).

We can also add the keyword argument :obj:`cutoff`
(e.g. confusion_matrices(results, cutoff=0.3)); if we do, :obj:`confusion_matrices`
will disregard the classifiers' class predictions and observe the predicted
probabilities, and consider the prediction "positive" if the predicted
probability of the positive class is higher than the :obj:`cutoff`.

The example (part of :download:`statExamples.py <code/statExamples.py>`) below shows how setting the
cut-off threshold from the default 0.5 to 0.2 affects the confusion matrices
for naive Bayesian classifier::

    cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

    cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

The output::

    Confusion matrix for naive Bayes:
    TP: 238, FP: 13, FN: 29.0, TN: 155
    Confusion matrix for naive Bayes:
    TP: 239, FP: 18, FN: 28.0, TN: 150

shows that the number of true positives increases (and hence the number of
false negatives decreases) by only a single instance, while five instances
that were originally true negatives become false positives due to the
lower threshold.

To observe how good the classifiers are at detecting vans in the vehicle
data set, we would compute the matrix like this::

    cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
        vehicle.domain.classVar.values.index("van"))

and get results like these::

    TP: 189, FP: 241, FN: 10.0, TN: 406

while the same for class "opel" would give::

    TP: 86, FP: 112, FN: 126.0, TN: 522

The main difference is that there are only a few false negatives for the
van, meaning that the classifier seldom misses it (if it says it's not a
van, it's almost certainly not a van).
Not so for the Opel car, where the
classifier missed 126 of them and correctly detected only 86.

**General confusion matrix** is computed (a) in case of a binary class,
when :obj:`classIndex` is set to -2, (b) when we have a multivalued class and
the caller doesn't specify the :obj:`classIndex` of the positive class.
When called in this manner, the function cannot use the argument
:obj:`cutoff`.

The function then returns a three-dimensional matrix, where the element
A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
gives the number of instances belonging to 'actual_class' for which the
'learner' predicted 'predictedClass'. We shall compute and print out
the matrix for naive Bayesian classifier.

Here we see another example from :download:`statExamples.py <code/statExamples.py>`::

    cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
    classes = vehicle.domain.classVar.values
    print "\t"+"\t".join(classes)
    for className, classConfusions in zip(classes, cm):
        print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))

So, here's what this nice piece of code gives::

            bus   van   saab  opel
    bus      56    95    21    46
    van       6   189     4     0
    saab      3    75    73    66
    opel      4    71    51    86

Vans are clearly simple: 189 vans were classified as vans (we know this
already, we've printed it out above), and the 10 misclassified pictures
were classified as buses (6) and Saab cars (4). In all other classes,
there were more instances misclassified as vans than correctly classified
instances. The classifier is obviously quite biased towards vans.

.. method:: sens(confm)
.. method:: spec(confm)
.. method:: PPV(confm)
.. method:: NPV(confm)
.. method:: precision(confm)
.. method:: recall(confm)
.. method:: F2(confm)
.. method:: Falpha(confm, alpha=2.0)
.. method:: MCC(conf)

With the confusion matrix defined in terms of positive and negative
classes, you can also compute the
`sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
[TP/(TP+FN)], `specificity \
<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
[TN/(TN+FP)], `positive predictive value \
<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
[TP/(TP+FP)] and `negative predictive value \
<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
In information retrieval, positive predictive value is called precision
(the ratio of the number of relevant records retrieved to the total number
of irrelevant and relevant records retrieved), and sensitivity is called
`recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
(the ratio of the number of relevant records retrieved to the total number
of relevant records in the database). The
`harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
and recall is called an
`F-measure <http://en.wikipedia.org/wiki/F-measure>`_, where, depending
on the ratio of the weight between precision and recall, it is implemented
as F1 [2*precision*recall/(precision+recall)] or, for a general case,
F-alpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
The `Matthews correlation coefficient \
<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
is in essence a correlation coefficient between
the observed and predicted binary classifications; it returns a value
between -1 and +1. A coefficient of +1 represents a perfect prediction,
0 an average random prediction and -1 an inverse prediction.

If the argument :obj:`confm` is a single confusion matrix, a single
result (a number) is returned.
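The quoted formulas can be written out directly from the TP/FP/FN/TN counts. The helper names below are illustrative only (Orange's own functions take a :class:`ConfusionMatrix`, not four counts):

```python
import math

# Hedged sketch of the formulas above, on plain counts.
def sensitivity(tp, fp, fn, tn):
    return tp / float(tp + fn)          # a.k.a. recall

def specificity(tp, fp, fn, tn):
    return tn / float(tn + fp)

def ppv(tp, fp, fn, tn):
    return tp / float(tp + fp)          # a.k.a. precision

def npv(tp, fp, fn, tn):
    return tn / float(tn + fn)

def f_alpha(tp, fp, fn, tn, alpha=1.0):
    # alpha=1 gives the usual F1 = 2*P*R/(P+R)
    p, r = ppv(tp, fp, fn, tn), sensitivity(tp, fp, fn, tn)
    return (1 + alpha) * p * r / (alpha * p + r)

def mcc(tp, fp, fn, tn):
    # +1 perfect, 0 random, -1 inverse prediction
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom
```

For instance, plugging in the naive Bayes counts printed earlier (TP=238, FP=13, FN=29, TN=155) gives a sensitivity of 238/267, about 0.89.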
If confm is a list of confusion matrices,
a list of scores is returned, one for each confusion matrix.

Note that weights are taken into account when computing the matrix, so
these functions don't check the 'weighted' keyword argument.

Let us print out sensitivities and specificities of our classifiers in
part of :download:`statExamples.py <code/statExamples.py>`::

    cm = Orange.evaluation.scoring.confusion_matrices(res)
    print
    print "method\tsens\tspec"
    for l in range(len(learners)):
        print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))

ROC Analysis
============

`Receiver Operating Characteristic \
<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
(ROC) analysis was initially developed for
binary-like problems and there is no consensus on how to apply it to
multi-class problems, nor do we know for sure how to do ROC analysis after
cross-validation and similar multiple sampling techniques. If you are
interested in the area under the curve, function AUC will deal with those
problems as specifically described below.

.. autofunction:: AUC

.. attribute:: AUC.ByWeightedPairs (or 0)

    Computes AUC for each pair of classes (ignoring instances of all other
    classes) and averages the results, weighting them by the number of
    pairs of instances from these two classes (e.g. by the product of
    probabilities of the two classes). AUC computed in this way still
    behaves as a concordance index, e.g., gives the probability that two
    randomly chosen instances from different classes will be correctly
    recognized (this is of course true only if the classifier knows
    from which two classes the instances came).

.. attribute:: AUC.ByPairs (or 1)

    Similar as above, except that the average over class pairs is not
    weighted.
    This AUC is, like the binary one, independent of class
    distributions, but it is not related to the concordance index any more.

.. attribute:: AUC.WeightedOneAgainstAll (or 2)

    For each class, it computes AUC for this class against all others (that
    is, treating other classes as one class). The AUCs are then averaged by
    the class probabilities. This is related to the concordance index in
    which we test the classifier's (average) capability of distinguishing
    the instances from a specified class from those that come from other
    classes. Unlike the binary AUC, the measure is not independent of class
    distributions.

.. attribute:: AUC.OneAgainstAll (or 3)

    As above, except that the average is not weighted.

In case of multiple folds (for instance if the data comes from
cross-validation), the computation goes like this. When computing the partial
AUCs for individual pairs of classes or singled-out classes, AUC is
computed for each fold separately and then averaged (ignoring the number
of instances in each fold, it's just a simple average). However, if a
certain fold doesn't contain any instances of a certain class (from the
pair), the partial AUC is computed treating the results as if they came
from a single fold. This is not really correct since the class
probabilities from different folds are not necessarily comparable;
yet, as this will most often occur in leave-one-out experiments,
comparability shouldn't be a problem.

Computing and printing out the AUCs looks just like printing out
classification accuracies (except that we call AUC instead of
CA, of course)::

    AUCs = Orange.evaluation.scoring.AUC(res)
    for l in range(len(learners)):
        print "%10s: %5.3f" % (learners[l].name, AUCs[l])

For vehicle, you can run exactly this same code; it will compute AUCs
for all pairs of classes and return the average weighted by probabilities
of pairs.
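The pairwise averaging idea behind :obj:`AUC.ByPairs` can be sketched as follows. `auc_binary` and `auc_by_pairs` are hypothetical names for this example (not Orange's API), and the binary AUC is computed in its concordance-index form: the probability that a random positive instance outscores a random negative one, with ties counting one half.

```python
from itertools import combinations

def auc_binary(scores_pos, scores_neg):
    # Concordance-index view of AUC: fraction of (positive, negative)
    # pairs ranked correctly, ties counted as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def auc_by_pairs(scores, labels, classes):
    """Unweighted average of pairwise AUCs (the AUC.ByPairs idea).
    scores[c][i]: predicted probability of class c for instance i."""
    aucs = []
    for c1, c2 in combinations(classes, 2):
        # restrict to instances of the two classes, rank them by P(c1)
        pos = [scores[c1][i] for i, l in enumerate(labels) if l == c1]
        neg = [scores[c1][i] for i, l in enumerate(labels) if l == c2]
        aucs.append(auc_binary(pos, neg))
    return sum(aucs) / len(aucs)
```

Weighting each pairwise AUC by the product of the two class probabilities instead of taking the plain mean would give the :obj:`AUC.ByWeightedPairs` variant.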
Or, you can specify the averaging method yourself, like this::

    AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)

The following snippet tries out all four. (We don't claim that this is
how the function needs to be used; it's better to stay with the default.)::

    methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
    print " " * 25 + " \tbayes\ttree\tmajority"
    for i in range(4):
        AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
        print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))

As you can see from the output::

                               bayes   tree    majority
           by pairs, weighted: 0.789   0.871   0.500
                     by pairs: 0.791   0.872   0.500
        one vs. all, weighted: 0.783   0.800   0.500
                  one vs. all: 0.783   0.800   0.500

.. autofunction:: AUC_single

.. autofunction:: AUC_pair

.. autofunction:: AUC_matrix

The remaining functions, which plot the curves and statistically compare
them, require that the results come from a test with a single iteration,
and they always compare one chosen class against all others. If you have
cross-validation results, you can either use split_by_iterations to split the
results by folds, call the function for each fold separately and then sum
the results up however you see fit, or you can set the ExperimentResults'
attribute number_of_iterations to 1, to cheat the function, at your own
responsibility for the statistical correctness. Regarding multi-class
problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
attribute's baseValue at the time when results were computed.
If baseValue
was not given at that time, 1 (that is, the second class) is used as default.

We shall use the following code to prepare suitable experimental results::

    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
    train = voting.selectref(ri2, 0)
    test = voting.selectref(ri2, 1)
    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)


.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

==========
Regression
==========

General Measure of Quality
==========================

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to
score several regression methods.

.. literalinclude:: code/statExamplesRegression.py

The code above produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman

=================
Utility Functions
=================

.. autofunction:: split_by_iterations

=====================================
Scoring for multilabel classification
=====================================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label
examples :math:`(x_i, Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`.
Let :math:`H` be a multi-label classifier and :math:`Z_i=H(x_i)` be the set
of labels predicted by :math:`H` for example :math:`x_i`.

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

So, let's compute all this and print it out (part of
:download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses
:download:`emotions.tab <code/emotions.tab>`):

.. literalinclude:: code/mlc-evaluate.py
    :lines: 1-15

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

References
==========

Boutell, M. R., Luo, J., Shen, X. & Brown, C. M. (2004), 'Learning multi-label
scene classification', Pattern Recognition, vol. 37, no. 9, pp. 1757-71.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled
Classification', paper presented to Proceedings of the 8th Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD 2004).

Schapire, R. E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for
text categorization', Machine Learning, vol. 39, no. 2/3, pp. 135-68.

"""

The code changes (camelCase keyword arguments renamed to snake_case via
@deprecated_keywords):

 import operator, math
 from operator import add
...
 import Orange
 from Orange import statc
 from Orange.misc import deprecated_keywords

 #### Private stuff
...

-def statistics_by_folds(stats, foldN, reportSE, iterationIsOuter):
+@deprecated_keywords({
+    "foldN": "fold_n",
+    "reportSE": "report_se",
+    "iterationIsOuter": "iteration_is_outer"})
+def statistics_by_folds(stats, fold_n, report_se, iteration_is_outer):
     # remove empty folds, turn the matrix so that learner is outer
-    if iterationIsOuter:
+    if iteration_is_outer:
         if not stats:
             raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."
         number_of_learners = len(stats[0])
-        stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN))
+        stats = filter(lambda (x, fN): fN>0.0, zip(stats,fold_n))
         stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)]
     else:
-        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats]
+        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, fold_n))] for lrnD in stats]

     if not stats:
...
         raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0."

-    if reportSE:
+    if report_se:
         return [(statc.mean(x), statc.sterr(x)) for x in stats]
     else:
...

 # Scores for evaluation of classifiers

-def CA(res, reportSE = False, **argkw):
+@deprecated_keywords({"reportSE": "report_se"})
+def CA(res, report_se = False, **argkw):
     """ Computes classification accuracy, i.e. percentage of matches between
     predicted and actual class. The function returns a list of classification
...
     ca = [x/totweight for x in CAs]

-    if reportSE:
+    if report_se:
         return [(x, x*(1-x)/math.sqrt(totweight)) for x in ca]
     else:
...
             foldN[tex.iteration_number] += tex.weight

-        return statistics_by_folds(CAsByFold, foldN, reportSE, False)
+        return statistics_by_folds(CAsByFold, foldN, report_se, False)

...
     return CA(res, True, **argkw)

-
-def AP(res, reportSE = False, **argkw):
+@deprecated_keywords({"reportSE": "report_se"})
+def AP(res, report_se = False, **argkw):
     """ Computes the average probability assigned to the correct class. """
     if res.number_of_iterations == 1:
...
             foldN[tex.iteration_number] += tex.weight

-    return statistics_by_folds(APsByFold, foldN, reportSE, True)
-
-
-def Brier_score(res, reportSE = False, **argkw):
+    return statistics_by_folds(APsByFold, foldN, report_se, True)
+
+
+@deprecated_keywords({"reportSE": "report_se"})
+def Brier_score(res, report_se = False, **argkw):
     """ Computes the Brier score, defined as the average (over test examples)
     of sum_x(t(x)-p(x))^2, where x is a class, t(x) is 1 for the correct class
...
         totweight = gettotweight(res)
         check_non_zero(totweight)
-        if reportSE:
+        if report_se:
             return [(max(x/totweight+1.0, 0), 0) for x in MSEs] ## change this, not zero!!!
         else:
...
             foldN[tex.iteration_number] += tex.weight

-    stats = statistics_by_folds(BSs, foldN, reportSE, True)
-    if reportSE:
+    stats = statistics_by_folds(BSs, foldN, report_se, True)
+    if report_se:
         return [(x+1.0, y) for x, y in stats]
     else:
...
     else:
         return (-log2(1-P)+log2(1-Pc))

-def IS(res, apriori=None, reportSE = False, **argkw):
+
+@deprecated_keywords({"reportSE": "report_se"})
+def IS(res, apriori=None, report_se = False, **argkw):
     """ Computes the information score as defined by
     `Kononenko and Bratko (1991) \
...
                 ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight
         totweight = gettotweight(res)
-        if reportSE:
+        if report_se:
             return [(IS/totweight,0) for IS in ISs]
         else:
...
             foldN[tex.iteration_number] += tex.weight

-    return statistics_by_folds(ISs, foldN, reportSE, False)
+    return statistics_by_folds(ISs, foldN, report_se, False)

...

-def confusion_matrices(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def confusion_matrices(res, class_index=1, **argkw):
     """ This function can compute two different forms of confusion matrix:
     one in which a certain class is marked as positive and the other(s)
...
     tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)]

-    if classIndex<0:
+    if class_index<0:
         numberOfClasses = len(res.class_values)
-        if classIndex < -1 or numberOfClasses > 2:
+        if class_index < -1 or numberOfClasses > 2:
             cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)]
             if argkw.get("unweighted", 0) or not res.weights:
...
         elif res.baseClass>=0:
-            classIndex = res.baseClass
-        else:
-            classIndex = 1
+            class_index = res.baseClass
+        else:
+            class_index = 1

     cutoff = argkw.get("cutoff")
...
         if argkw.get("unweighted", 0) or not res.weights:
             for lr in res.results:
-                isPositive=(lr.actual_class==classIndex)
+                isPositive=(lr.actual_class==class_index)
                 for i in range(res.number_of_learners):
-                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive)
+                    tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive)
         else:
             for lr in res.results:
-                isPositive=(lr.actual_class==classIndex)
+                isPositive=(lr.actual_class==class_index)
                 for i in range(res.number_of_learners):
-                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight)
+                    tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive, lr.weight)
     else:
         if argkw.get("unweighted", 0) or not res.weights:
             for lr in res.results:
-                isPositive=(lr.actual_class==classIndex)
+                isPositive=(lr.actual_class==class_index)
                 for i in range(res.number_of_learners):
-                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive)
+                    tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive)
         else:
             for lr in res.results:
-                isPositive=(lr.actual_class==classIndex)
+                isPositive=(lr.actual_class==class_index)
                 for i in range(res.number_of_learners):
-                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight)
+                    tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive, lr.weight)
     return tfpns

...

-def confusion_chi_square(confusionMatrix):
-    dim = len(confusionMatrix)
-    rowPriors = [sum(r) for r in confusionMatrix]
-    colPriors = [sum([r[i] for r in confusionMatrix]) for i in range(dim)]
+@deprecated_keywords({"confusionMatrix": "confusion_matrix"})
+def confusion_chi_square(confusion_matrix):
+    dim = len(confusion_matrix)
+    rowPriors = [sum(r) for r in confusion_matrix]
+    colPriors = [sum([r[i] for r in confusion_matrix]) for i in range(dim)]
     total = sum(rowPriors)
     rowPriors = [r/total for r in rowPriors]
     colPriors = [r/total for r in colPriors]
     ss = 0
-    for ri, row in enumerate(confusionMatrix):
+    for ri, row in enumerate(confusion_matrix):
         for ci, o in enumerate(row):
             e = total * rowPriors[ri] * colPriors[ci]
...
     return r

-def scotts_pi(confm, bIsListOfMatrices=True):
+
+@deprecated_keywords({"bIsListOfMatrices": "b_is_list_of_matrices"})
+def scotts_pi(confm, b_is_list_of_matrices=True):
     """Compute Scott's Pi for measuring inter-rater agreement for nominal data

...
        Orange.evaluation.scoring.compute_confusion_matrices and set the
        classIndex parameter to -2.
-    @param bIsListOfMatrices: specifies whether confm is list of matrices.
+    @param b_is_list_of_matrices: specifies whether confm is list of matrices.
        This function needs to operate on non-binary
        confusion matrices, which are represented by python
...
     """

-    if bIsListOfMatrices:
+    if b_is_list_of_matrices:
         try:
-            return [scotts_pi(cm, bIsListOfMatrices=False) for cm in confm]
+            return [scotts_pi(cm, b_is_list_of_matrices=False) for cm in confm]
         except TypeError:
             # Nevermind the parameter, maybe this is a "conventional" binary
...
     return ret

-def AUCWilcoxon(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def AUCWilcoxon(res, class_index=1, **argkw):
     """ Computes the area under ROC (AUC) and its standard error using
     Wilcoxon's approach proposed by Hanley and McNeil (1982). If
...
     import corn
     useweights = res.weights and not argkw.get("unweighted", 0)
-    problists, tots = corn.computeROCCumulative(res, classIndex, useweights)
+    problists, tots = corn.computeROCCumulative(res, class_index, useweights)

     results=[]
...
 AROC = AUCWilcoxon # for backward compatibility, AROC is obsolete

-def compare_2_AUCs(res, lrn1, lrn2, classIndex=1, **argkw):
+
+@deprecated_keywords({"classIndex": "class_index"})
+def compare_2_AUCs(res, lrn1, lrn2, class_index=1, **argkw):
     import corn
-    return corn.compare2ROCs(res, lrn1, lrn2, classIndex, res.weights and not argkw.get("unweighted"))
+    return corn.compare2ROCs(res, lrn1, lrn2, class_index, res.weights and not argkw.get("unweighted"))

 compare_2_AROCs = compare_2_AUCs # for backward compatibility, compare_2_AROCs is obsolete

-
-def compute_ROC(res, classIndex=1):
+
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_ROC(res, class_index=1):
     """ Computes a ROC curve as a list of (x, y) tuples, where x is
     1-specificity and y is sensitivity.
     """
     import corn
-    problists, tots = corn.computeROCCumulative(res, classIndex)
+    problists, tots = corn.computeROCCumulative(res, class_index)

     results = []
...
     return (P1y - P2y) / (P1x - P2x)

-def ROC_add_point(P, R, keepConcavities=1):
-    if keepConcavities:
+
+@deprecated_keywords({"keepConcavities": "keep_concavities"})
+def ROC_add_point(P, R, keep_concavities=1):
+    if keep_concavities:
         R.append(P)
...
     return R

-def TC_compute_ROC(res, classIndex=1, keepConcavities=1):
+
+@deprecated_keywords({"classIndex": "class_index",
+                      "keepConcavities": "keep_concavities"})
+def TC_compute_ROC(res, class_index=1, keep_concavities=1):
     import corn
-    problists, tots = corn.computeROCCumulative(res, classIndex)
+    problists, tots = corn.computeROCCumulative(res, class_index)

     results = []
...
             else:
                 fpr = 0.0
-            curve = ROC_add_point((fpr, tpr, fPrev), curve, keepConcavities)
+            curve = ROC_add_point((fpr, tpr, fPrev), curve, keep_concavities)
             fPrev = f
             thisPos, thisNeg = prob[1][1], prob[1][0]
...
         else:
             fpr = 0.0
-        curve = ROC_add_point((fpr, tpr, f), curve, keepConcavities) ## ugly
+        curve = ROC_add_point((fpr, tpr, f), curve, keep_concavities) ## ugly
         results.append(curve)

...
 ## for each (sub)set of input ROC curves
 ## returns the average ROC curve and an array of (vertical) standard deviations
-def TC_vertical_average_ROC(ROCcurves, samples = 10):
+@deprecated_keywords({"ROCcurves": "roc_curves"})
+def TC_vertical_average_ROC(roc_curves, samples = 10):
     def INTERPOLATE((P1x, P1y, P1fscore), (P2x, P2y, P2fscore), X):
         if (P1x == P2x) or ((X > P1x) and (X > P2x)) or ((X < P1x) and (X < P2x)):
...
     average = []
     stdev = []
-    for ROCS in ROCcurves:
+    for ROCS in roc_curves:
         npts = []
         for c in ROCS:
...
 ## for each (sub)set of input ROC curves
 ## returns the average ROC curve, an array of vertical standard deviations and an array of horizontal standard deviations
-def TC_threshold_average_ROC(ROCcurves, samples = 10):
+@deprecated_keywords({"ROCcurves": "roc_curves"})
+def TC_threshold_average_ROC(roc_curves, samples = 10):
     def POINT_AT_THRESH(ROC, npts, thresh):
         i = 0
...
     stdevV = []
     stdevH = []
-    for ROCS in ROCcurves:
+    for ROCS in roc_curves:
         npts = []
         for c in ROCS:
...
 ##  - yesClassRugPoints is an array of (x, 1) points
 ##  - noClassRugPoints is an array of (x, 0) points
-def compute_calibration_curve(res, classIndex=1):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_calibration_curve(res, class_index=1):
     import corn
     ## merge multiple iterations into one
...
         mres.results.append( te )

-    problists, tots = corn.computeROCCumulative(mres, classIndex)
+    problists, tots = corn.computeROCCumulative(mres, class_index)

     results = []
...
 ## returns an array of curve elements, where:
 ##  - curve is an array of points ((TP+FP)/(P + N), TP/P, (th, FP/N)) on the Lift Curve
-def compute_lift_curve(res, classIndex=1):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_lift_curve(res, class_index=1):
     import corn
     ## merge multiple iterations into one
...
         mres.results.append( te )

-    problists, tots = corn.computeROCCumulative(mres, classIndex)
+    problists, tots = corn.computeROCCumulative(mres, class_index)

     results = []
...

-def compute_CDT(res, classIndex=1, **argkw):
+@deprecated_keywords({"classIndex": "class_index"})
+def compute_CDT(res, class_index=1, **argkw):
     """Obsolete, don't use"""
     import corn
-    if classIndex<0:
+    if class_index<0:
         if res.baseClass>=0:
-            classIndex = res.baseClass
-        else:
-            classIndex = 1
+            class_index = res.baseClass
+        else:
+            class_index = 1

     useweights = res.weights and not argkw.get("unweighted", 0)
...
         iterationExperiments = split_by_iterations(res)
         for exp in iterationExperiments:
-            expCDTs = corn.computeCDT(exp, classIndex, useweights)
+            expCDTs = corn.computeCDT(exp, class_index, useweights)
             for i in range(len(CDTs)):
                 CDTs[i].C += expCDTs[i].C
...
         for i in range(res.number_of_learners):
             if is_CDT_empty(CDTs[0]):
-                return corn.computeCDT(res, classIndex, useweights)
+                return corn.computeCDT(res, class_index, useweights)

         return CDTs
     else:
-        return corn.computeCDT(res, classIndex, useweights)
+        return corn.computeCDT(res, class_index, useweights)

 ## THIS FUNCTION IS OBSOLETE AND ITS AVERAGING OVER FOLDS IS QUESTIONABLE
...
 # are divided by 'divideByIfIte'. Additional flag is returned which is True in
 # the former case, or False in the latter.
-def AUC_x(cdtComputer, ite, all_ite, divideByIfIte, computerArgs):
-    cdts = cdtComputer(*(ite, ) + computerArgs)
+@deprecated_keywords({"divideByIfIte": "divide_by_if_ite",
+                      "computerArgs": "computer_args"})
+def AUC_x(cdtComputer, ite, all_ite, divide_by_if_ite, computer_args):
+    cdts = cdtComputer(*(ite, ) + computer_args)
     if not is_CDT_empty(cdts[0]):
-        return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divideByIfIte for cdt in cdts], True
+        return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T)/divide_by_if_ite for cdt in cdts], True

     if all_ite:
-        cdts = cdtComputer(*(all_ite, ) + computerArgs)
+        cdts = cdtComputer(*(all_ite, ) + computer_args)
         if not is_CDT_empty(cdts[0]):
             return [(cdt.C+cdt.T/2)/(cdt.C+cdt.D+cdt.T) for cdt in cdts], False
...

 # computes AUC between classes i and j as if there were no other classes
-def AUC_ij(ite, classIndex1, classIndex2, useWeights = True, all_ite = None, divideByIfIte = 1.0):
+@deprecated_keywords({"classIndex1": "class_index1",
+                      "classIndex2": "class_index2",
+                      "useWeights": "use_weights",
+                      "divideByIfIte": "divide_by_if_ite"})
+def AUC_ij(ite, class_index1, class_index2, use_weights = True, all_ite = None, divide_by_if_ite = 1.0):
     import corn
-    return AUC_x(corn.computeCDTPair, ite, all_ite, divideByIfIte, (classIndex1, classIndex2, useWeights))
+    return AUC_x(corn.computeCDTPair, ite, all_ite, divide_by_if_ite, (class_index1, class_index2, use_weights))


 # computes AUC between class i and the other classes (treating them as the same class)
-def AUC_i(ite, classIndex, useWeights = True, all_ite = None, divideByIfIte = 1.0):
+@deprecated_keywords({"classIndex": "class_index",
+                      "useWeights": "use_weights",
+                      "divideByIfIte": "divide_by_if_ite"})
+def AUC_i(ite, class_index, use_weights = True, all_ite = None, divide_by_if_ite = 1.0):
     import
corn 1788 return AUC_x(corn.computeCDT, ite, all_ite, divide ByIfIte, (classIndex, useWeights))1789 1373 return AUC_x(corn.computeCDT, ite, all_ite, divide_by_if_ite, (class_index, use_weights)) 1374 1790 1375 1791 1376 # computes the average AUC over folds using a "AUCcomputer" (AUC_i or AUC_ij) … … 1793 1378 # fold the computer has to resort to computing over all folds or even this failed; 1794 1379 # in these cases the result is returned immediately 1795 def AUC_iterations(AUCcomputer, iterations, computerArgs): 1380 1381 @deprecated_keywords({"AUCcomputer": "auc_computer", 1382 "computerArgs": "computer_args"}) 1383 def AUC_iterations(auc_computer, iterations, computer_args): 1796 1384 subsum_aucs = [0.] * iterations[0].number_of_learners 1797 1385 for ite in iterations: 1798 aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs)1386 aucs, foldsUsed = auc_computer(*(ite, ) + computer_args) 1799 1387 if not aucs: 1800 1388 return None … … 1806 1394 1807 1395 # AUC for binary classification problems 1808 def AUC_binary(res, useWeights = True): 1396 @deprecated_keywords({"useWeights": "use_weights"}) 1397 def AUC_binary(res, use_weights = True): 1809 1398 if res.number_of_iterations > 1: 1810 return AUC_iterations(AUC_i, split_by_iterations(res), (1, use Weights, res, res.number_of_iterations))1811 else: 1812 return AUC_i(res, 1, use Weights)[0]1399 return AUC_iterations(AUC_i, split_by_iterations(res), (1, use_weights, res, res.number_of_iterations)) 1400 else: 1401 return AUC_i(res, 1, use_weights)[0] 1813 1402 1814 1403 # AUC for multiclass problems 1815 def AUC_multi(res, useWeights = True, method = 0): 1404 @deprecated_keywords({"useWeights": "use_weights"}) 1405 def AUC_multi(res, use_weights = True, method = 0): 1816 1406 numberOfClasses = len(res.class_values) 1817 1407 … … 1833 1423 for classIndex1 in range(numberOfClasses): 1834 1424 for classIndex2 in range(classIndex1): 1835 subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, 
use Weights, all_ite, res.number_of_iterations))1425 subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, use_weights, all_ite, res.number_of_iterations)) 1836 1426 if subsum_aucs: 1837 1427 if method == 0: … … 1844 1434 else: 1845 1435 for classIndex in range(numberOfClasses): 1846 subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, use Weights, all_ite, res.number_of_iterations))1436 subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, use_weights, all_ite, res.number_of_iterations)) 1847 1437 if subsum_aucs: 1848 1438 if method == 0: … … 1866 1456 # Computes AUC, possibly for multiple classes (the averaging method can be specified) 1867 1457 # Results over folds are averages; if some folds examples from one class only, the folds are merged 1868 def AUC(res, method = AUC.ByWeightedPairs, useWeights = True): 1458 @deprecated_keywords({"useWeights": "use_weights"}) 1459 def AUC(res, method = AUC.ByWeightedPairs, use_weights = True): 1869 1460 """ Returns the area under ROC curve (AUC) given a set of experimental 1870 1461 results. 
For multivalued class problems, it will compute some sort of … … 1874 1465 raise ValueError("Cannot compute AUC on a singleclass problem") 1875 1466 elif len(res.class_values) == 2: 1876 return AUC_binary(res, use Weights)1877 else: 1878 return AUC_multi(res, use Weights, method)1467 return AUC_binary(res, use_weights) 1468 else: 1469 return AUC_multi(res, use_weights, method) 1879 1470 1880 1471 AUC.ByWeightedPairs = 0 … … 1886 1477 # Computes AUC; in multivalued class problem, AUC is computed as one against all 1887 1478 # Results over folds are averages; if some folds examples from one class only, the folds are merged 1888 def AUC_single(res, classIndex = 1, useWeights = True): 1479 @deprecated_keywords({"classIndex": "class_index", 1480 "useWeights": "use_weights"}) 1481 def AUC_single(res, class_index = 1, use_weights = True): 1889 1482 """ Computes AUC where the class given classIndex is singled out, and 1890 1483 all other classes are treated as a single class. To find how good our … … 1895 1488 classIndex = vehicle.domain.classVar.values.index("van")) 1896 1489 """ 1897 if class Index<0:1490 if class_index<0: 1898 1491 if res.baseClass>=0: 1899 class Index = res.baseClass1900 else: 1901 class Index = 11492 class_index = res.baseClass 1493 else: 1494 class_index = 1 1902 1495 1903 1496 if res.number_of_iterations > 1: 1904 return AUC_iterations(AUC_i, split_by_iterations(res), (class Index, useWeights, res, res.number_of_iterations))1905 else: 1906 return AUC_i( res, class Index, useWeights)[0]1497 return AUC_iterations(AUC_i, split_by_iterations(res), (class_index, use_weights, res, res.number_of_iterations)) 1498 else: 1499 return AUC_i( res, class_index, use_weights)[0] 1907 1500 1908 1501 # Computes AUC for a pair of classes (as if there were no other classes) 1909 1502 # Results over folds are averages; if some folds have examples from one class only, the folds are merged 1910 def AUC_pair(res, classIndex1, classIndex2, useWeights = True): 1503 
@deprecated_keywords({"classIndex1": "class_index1", 1504 "classIndex2": "class_index2", 1505 "useWeights": "use_weights"}) 1506 def AUC_pair(res, class_index1, class_index2, use_weights = True): 1911 1507 """ Computes AUC between a pair of instances, ignoring instances from all 1912 1508 other classes. 1913 1509 """ 1914 1510 if res.number_of_iterations > 1: 1915 return AUC_iterations(AUC_ij, split_by_iterations(res), (class Index1, classIndex2, useWeights, res, res.number_of_iterations))1916 else: 1917 return AUC_ij(res, class Index1, classIndex2, useWeights)1511 return AUC_iterations(AUC_ij, split_by_iterations(res), (class_index1, class_index2, use_weights, res, res.number_of_iterations)) 1512 else: 1513 return AUC_ij(res, class_index1, class_index2, use_weights) 1918 1514 1919 1515 1920 1516 # AUC for multiclass problems 1921 def AUC_matrix(res, useWeights = True): 1517 @deprecated_keywords({"useWeights": "use_weights"}) 1518 def AUC_matrix(res, use_weights = True): 1922 1519 """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes. 
1923 1520 If there are empty classes, the corresponding elements in the matrix … … 1944 1541 for classIndex1 in range(numberOfClasses): 1945 1542 for classIndex2 in range(classIndex1): 1946 pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, use Weights, all_ite, res.number_of_iterations))1543 pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, use_weights, all_ite, res.number_of_iterations)) 1947 1544 if pair_aucs: 1948 1545 for lrn in range(number_of_learners): … … 2080 1677 2081 1678 2082 def plot_learning_curve_learners(file, allResults, proportions, learners, noConfidence=0): 2083 plot_learning_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], noConfidence) 2084 2085 def plot_learning_curve(file, allResults, proportions, legend, noConfidence=0): 1679 @deprecated_keywords({"allResults": "all_results", 1680 "noConfidence": "no_confidence"}) 1681 def plot_learning_curve_learners(file, all_results, proportions, learners, no_confidence=0): 1682 plot_learning_curve(file, all_results, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], no_confidence) 1683 1684 1685 @deprecated_keywords({"allResults": "all_results", 1686 "noConfidence": "no_confidence"}) 1687 def plot_learning_curve(file, all_results, proportions, legend, no_confidence=0): 2086 1688 import types 2087 1689 fopened=0 2088 if (type(file)==types.StringType):1690 if type(file)==types.StringType: 2089 1691 file=open(file, "wt") 2090 1692 fopened=1 … … 2093 1695 file.write("set xrange [%f:%f]\n" % (proportions[0], proportions[1])) 2094 1696 file.write("set multiplot\n\n") 2095 CAs = [CA_dev(x) for x in all Results]1697 CAs = [CA_dev(x) for x in all_results] 2096 1698 2097 1699 file.write("plot \\\n") 2098 1700 for i in range(len(legend)1): 2099 if not no Confidence:1701 if not no_confidence: 2100 1702 file.write("'' title '' with 
yerrorbars pointtype %i,\\\n" % (i+1)) 2101 1703 file.write("'' title '%s' with linespoints pointtype %i,\\\n" % (legend[i], i+1)) 2102 if not no Confidence:1704 if not no_confidence: 2103 1705 file.write("'' title '' with yerrorbars pointtype %i,\\\n" % (len(legend))) 2104 1706 file.write("'' title '%s' with linespoints pointtype %i\n" % (legend[1], len(legend))) 2105 1707 2106 1708 for i in range(len(legend)): 2107 if not no Confidence:1709 if not no_confidence: 2108 1710 for p in range(len(proportions)): 2109 1711 file.write("%f\t%f\t%f\n" % (proportions[p], CAs[p][i][0], 1.96*CAs[p][i][1])) … … 2162 1764 2163 1765 2164 2165 def plot_McNemar_curve_learners(file, allResults, proportions, learners, reference=1): 2166 plot_McNemar_curve(file, allResults, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference) 2167 2168 def plot_McNemar_curve(file, allResults, proportions, legend, reference=1): 1766 @deprecated_keywords({"allResults": "all_results"}) 1767 def plot_McNemar_curve_learners(file, all_results, proportions, learners, reference=1): 1768 plot_McNemar_curve(file, all_results, proportions, [Orange.misc.getobjectname(learners[i], "Learner %i" % i) for i in range(len(learners))], reference) 1769 1770 1771 @deprecated_keywords({"allResults": "all_results"}) 1772 def plot_McNemar_curve(file, all_results, proportions, legend, reference=1): 2169 1773 if reference<0: 2170 1774 reference=len(legend)1 … … 2188 1792 for i in tmap: 2189 1793 for p in range(len(proportions)): 2190 file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(all Results[p], i, reference)))1794 file.write("%f\t%f\n" % (proportions[p], McNemar_of_two(all_results[p], i, reference))) 2191 1795 file.write("e\n\n") 2192 1796 … … 2197 1801 default_line_types=("\\setsolid", "\\setdashpattern <4pt, 2pt>", "\\setdashpattern <8pt, 2pt>", "\\setdashes", "\\setdots") 2198 1802 2199 def learning_curve_learners_to_PiCTeX(file, allResults, proportions, 
**options): 2200 return apply(learning_curve_to_PiCTeX, (file, allResults, proportions), options) 2201 2202 def learning_curve_to_PiCTeX(file, allResults, proportions, **options): 1803 @deprecated_keywords({"allResults": "all_results"}) 1804 def learning_curve_learners_to_PiCTeX(file, all_results, proportions, **options): 1805 return apply(learning_curve_to_PiCTeX, (file, all_results, proportions), options) 1806 1807 1808 @deprecated_keywords({"allResults": "all_results"}) 1809 def learning_curve_to_PiCTeX(file, all_results, proportions, **options): 2203 1810 import types 2204 1811 fopened=0 … … 2207 1814 fopened=1 2208 1815 2209 nexamples=len(all Results[0].results)2210 CAs = [CA_dev(x) for x in all Results]1816 nexamples=len(all_results[0].results) 1817 CAs = [CA_dev(x) for x in all_results] 2211 1818 2212 1819 graphsize=float(options.get("graphsize", 10.0)) #cm 
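The whole changeset revolves around ``@deprecated_keywords``, which forwards calls made with the old camelCase keyword names to the new snake_case parameters. A minimal sketch of how such a decorator can work (an illustration only, not the actual ``Orange.misc`` implementation; ``compute_something`` is a hypothetical example function):

```python
import warnings
from functools import wraps

def deprecated_keywords(mapping):
    """Decorator factory: rename deprecated keyword arguments.

    mapping -- dict of {old_name: new_name}; a call that uses an old
    name is forwarded under the new name, with a DeprecationWarning.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in mapping.items():
                if old in kwargs:
                    warnings.warn("%s is deprecated, use %s instead" % (old, new),
                                  DeprecationWarning, stacklevel=2)
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated_keywords({"classIndex": "class_index"})
def compute_something(res, class_index=1):
    return class_index

# both spellings reach the same parameter
assert compute_something(None, classIndex=3) == 3
assert compute_something(None, class_index=2) == 2
```

This design lets every renamed function in the changeset keep accepting old call sites while the public signature moves to snake_case.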
docs/reference/rst/Orange.evaluation.scoring.rst
r9372 r9892

.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index: scoring

This module contains various measures of quality for classification and
regression. Most functions require an argument named :obj:`res`, an instance
of :class:`Orange.evaluation.testing.ExperimentResults` as computed by
functions from :mod:`Orange.evaluation.testing`, which contains predictions
obtained through cross-validation, leave-one-out, testing on training data,
or test set instances.

==============
Classification
==============

To prepare some data for the examples on this page, we shall load the voting
data set (the problem of predicting a congressman's party, republican or
democrat, based on a selection of votes) and evaluate the naive Bayesian
learner, classification trees and the majority classifier using
cross-validation. For examples requiring a multi-valued class problem, we
shall do the same with the vehicle data set (telling whether a vehicle,
described by features extracted from a picture, is a van, a bus, or an Opel
or Saab car).

A basic cross-validation example is shown in the following part of
:download:`statExamples.py <code/statExamples.py>` (uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):

.. literalinclude:: code/statExample0.py

If instances are weighted, the weights are taken into account. This can be
disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
disabling weights is to clear the
:class:`Orange.evaluation.testing.ExperimentResults`' flag weights.

General Measures of Quality
===========================

.. autofunction:: CA

.. autofunction:: AP

.. autofunction:: Brier_score

.. autofunction:: IS

So, let's compute all this in part of
:download:`statExamples.py <code/statExamples.py>` (uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`) and print it out:

.. literalinclude:: code/statExample1.py
   :lines: 1-3

The output should look like this::

    method  CA      AP      Brier   IS
    bayes   0.903   0.902   0.175   0.759
    tree    0.846   0.845   0.286   0.641
    majorty 0.614   0.526   0.474   0.000

Script :download:`statExamples.py <code/statExamples.py>` contains another
example that also prints out the standard errors.

Confusion Matrix
================

.. autofunction:: confusion_matrices

**A positive-negative confusion matrix** is computed (a) if the class is
binary, unless the :obj:`classIndex` argument is -2, or (b) if the class is
multi-valued and :obj:`classIndex` is non-negative. The argument
:obj:`classIndex` then tells which class is positive. In case (a),
:obj:`classIndex` may be omitted; the first class
is then negative and the second is positive, unless the :obj:`baseClass`
attribute in the object with results has a non-negative value. In that case,
:obj:`baseClass` is an index of the target class. The :obj:`baseClass`
attribute of the results object should be set manually. The result of the
function is a list of instances of class :class:`ConfusionMatrix`,
containing the (weighted) number of true positives (TP), false
negatives (FN), false positives (FP) and true negatives (TN).

We can also add the keyword argument :obj:`cutoff`
(e.g. ``confusion_matrices(results, cutoff=0.3)``); if we do,
:obj:`confusion_matrices` will disregard the classifiers' class predictions
and observe the predicted probabilities instead, considering the prediction
"positive" if the predicted probability of the positive class is higher
than the :obj:`cutoff`.
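The cutoff behaviour described above can be sketched in plain Python. This is not Orange code; ``positive_probs`` and ``actual`` are hypothetical stand-ins for the predicted probabilities and true classes stored in the results object:

```python
def confusion_from_probs(positive_probs, actual, cutoff=0.5):
    """Build TP/FP/FN/TN counts by thresholding the predicted
    probability of the positive class at `cutoff`."""
    TP = FP = FN = TN = 0
    for p, a in zip(positive_probs, actual):
        predicted_positive = p > cutoff
        if predicted_positive and a == 1:
            TP += 1
        elif predicted_positive and a == 0:
            FP += 1
        elif not predicted_positive and a == 1:
            FN += 1
        else:
            TN += 1
    return TP, FP, FN, TN

probs  = [0.9, 0.4, 0.3, 0.1]
actual = [1,   1,   0,   0]
# lowering the cutoff turns a missed positive into a TP,
# but also turns a true negative into a false positive
print(confusion_from_probs(probs, actual, cutoff=0.5))  # (1, 0, 1, 2)
print(confusion_from_probs(probs, actual, cutoff=0.2))  # (2, 1, 0, 1)
```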
The example (part of :download:`statExamples.py <code/statExamples.py>`)
below shows how setting the cut-off threshold from the default 0.5 to 0.2
affects the confusion matrices for the naive Bayesian classifier::

    cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

    cm = Orange.evaluation.scoring.confusion_matrices(res, cutoff=0.2)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

The output::

    Confusion matrix for naive Bayes:
    TP: 238, FP: 13, FN: 29.0, TN: 155
    Confusion matrix for naive Bayes:
    TP: 239, FP: 18, FN: 28.0, TN: 150

shows that the number of true positives increases (and hence the number of
false negatives decreases) by only a single instance, while five instances
that were originally true negatives become false positives due to the
lower threshold.

To observe how good the classifiers are at detecting vans in the vehicle
data set, we would compute the matrix like this::

    cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
        vehicle.domain.classVar.values.index("van"))

and get results like these::

    TP: 189, FP: 241, FN: 10.0, TN: 406

while the same for the class "opel" would give::

    TP: 86, FP: 112, FN: 126.0, TN: 522

The main difference is that there are only a few false negatives for the
van, meaning that the classifier seldom misses it (if it says it's not a
van, it's almost certainly not a van). Not so for the Opel car, where the
classifier missed 126 of them and correctly detected only 86.

**A general confusion matrix** is computed (a) in case of a binary class,
when :obj:`classIndex` is set to -2, or (b) when we have a multi-valued
class and the caller doesn't specify the :obj:`classIndex` of the positive
class. When called in this manner, the function cannot use the argument
:obj:`cutoff`.

The function then returns a three-dimensional matrix, where the element
A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`]
gives the number of instances belonging to 'actual_class' for which the
'learner' predicted 'predictedClass'. We shall compute and print out
the matrix for the naive Bayesian classifier.

Here we see another example from :download:`statExamples.py <code/statExamples.py>`::

    cm = Orange.evaluation.scoring.confusion_matrices(resVeh)[0]
    classes = vehicle.domain.classVar.values
    print "\t"+"\t".join(classes)
    for className, classConfusions in zip(classes, cm):
        print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))

So, here's what this nice piece of code gives::

            bus     van     saab    opel
    bus     56      95      21      46
    van     6       189     4       0
    saab    3       75      73      66
    opel    4       71      51      86

Vans are clearly simple: 189 vans were classified as vans (we know this
already, we've printed it out above), and the 10 misclassified pictures
were classified as buses (6) and Saab cars (4). In all other classes,
there were more instances misclassified as vans than correctly classified
instances. The classifier is obviously quite biased towards vans.

.. method:: sens(confm)
.. method:: spec(confm)
.. method:: PPV(confm)
.. method:: NPV(confm)
.. method:: precision(confm)
.. method:: recall(confm)
.. method:: F2(confm)
.. method:: Falpha(confm, alpha=2.0)
.. method:: MCC(confm)

With the confusion matrix defined in terms of positive and negative
classes, you can also compute the
`sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
[TP/(TP+FN)], `specificity \
<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
[TN/(TN+FP)], `positive predictive value \
<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
[TP/(TP+FP)] and `negative predictive value \
<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
In information retrieval, positive predictive value is called precision
(the ratio of the number of relevant records retrieved to the total number
of irrelevant and relevant records retrieved), and sensitivity is called
`recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
(the ratio of the number of relevant records retrieved to the total number
of relevant records in the database). The
`harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
and recall is called the
`F-measure <http://en.wikipedia.org/wiki/F-measure>`_, which, depending
on the weight given to precision relative to recall, is implemented as
F1 [2*precision*recall/(precision+recall)] or, for the general case,
Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].
The `Matthews correlation coefficient \
<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
is in essence a correlation coefficient between
the observed and predicted binary classifications; it returns a value
between -1 and +1. A coefficient of +1 represents a perfect prediction,
0 an average random prediction, and -1 an inverse prediction.

If the argument :obj:`confm` is a single confusion matrix, a single
result (a number) is returned. If confm is a list of confusion matrices,
a list of scores is returned, one for each confusion matrix.
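The bracketed formulas above can be written out directly. The helper below is a hypothetical illustration working on raw TP/FP/FN/TN counts, not part of ``Orange.evaluation.scoring`` (which takes :class:`ConfusionMatrix` objects instead):

```python
import math

def derived_stats(TP, FP, FN, TN, alpha=1.0):
    """Scores derived from a positive-negative confusion matrix,
    following the formulas quoted in the text above."""
    sens = TP / float(TP + FN)   # sensitivity, a.k.a. recall
    spec = TN / float(TN + FP)   # specificity
    ppv = TP / float(TP + FP)    # positive predictive value, a.k.a. precision
    npv = TN / float(TN + FN)    # negative predictive value
    # F-alpha; with alpha=1 this reduces to F1 = 2*p*r/(p+r)
    f_alpha = (1 + alpha) * ppv * sens / (alpha * ppv + sens)
    # Matthews correlation coefficient, in [-1, +1]
    mcc = (TP * TN - FP * FN) / math.sqrt(
        float(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return sens, spec, ppv, npv, f_alpha, mcc

# a symmetric toy matrix: the first five scores all come out 2/3, MCC is 1/3
print(derived_stats(2, 1, 1, 2))
```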
Note that weights are taken into account when computing the matrix, so
these functions don't check the 'weighted' keyword argument.

Let us print out the sensitivities and specificities of our classifiers in
part of :download:`statExamples.py <code/statExamples.py>`::

    cm = Orange.evaluation.scoring.confusion_matrices(res)
    print
    print "method\tsens\tspec"
    for l in range(len(learners)):
        print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))

ROC Analysis
============

`Receiver Operating Characteristic \
<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
(ROC) analysis was initially developed for
binary-like problems, and there is no consensus on how to apply it to
multi-class problems, nor do we know for sure how to do ROC analysis after
cross-validation and similar multiple-sampling techniques. If you are
interested in the area under the curve, the function AUC will deal with those
problems as specifically described below.

.. autofunction:: AUC

.. attribute:: AUC.ByWeightedPairs (or 0)

   Computes AUC for each pair of classes (ignoring instances of all other
   classes) and averages the results, weighting them by the number of
   pairs of instances from these two classes (e.g. by the product of
   probabilities of the two classes). AUC computed in this way still
   behaves as a concordance index, i.e., it gives the probability that two
   randomly chosen instances from different classes will be correctly
   recognized (this is of course true only if the classifier knows
   from which two classes the instances came).

.. attribute:: AUC.ByPairs (or 1)

   Similar to the above, except that the average over class pairs is not
   weighted. This AUC is, like the binary one, independent of class
   distributions, but it is no longer related to the concordance index.

.. attribute:: AUC.WeightedOneAgainstAll (or 2)

   For each class, it computes AUC for this class against all others (that
   is, treating the other classes as one class). The AUCs are then averaged
   by the class probabilities. This is related to the concordance index in
   which we test the classifier's (average) capability of distinguishing
   instances from a specified class from those that come from other classes.
   Unlike the binary AUC, the measure is not independent of class
   distributions.

.. attribute:: AUC.OneAgainstAll (or 3)

   As above, except that the average is not weighted.

In case of multiple folds (for instance, if the data comes from
cross-validation), the computation goes like this. When computing the partial
AUCs for individual pairs of classes or singled-out classes, AUC is
computed for each fold separately and then averaged (ignoring the number
of instances in each fold; it's just a simple average). However, if a
certain fold doesn't contain any instances of a certain class (from the
pair), the partial AUC is computed treating the results as if they came
from a single fold. This is not really correct, since the class
probabilities from different folds are not necessarily comparable; yet,
since this will most often occur in leave-one-out experiments,
comparability shouldn't be a problem.

Computing and printing out the AUCs looks just like printing out
classification accuracies (except that we call AUC instead of
CA, of course)::

    AUCs = Orange.evaluation.scoring.AUC(res)
    for l in range(len(learners)):
        print "%10s: %5.3f" % (learners[l].name, AUCs[l])

For vehicle, you can run exactly this same code; it will compute AUCs
for all pairs of classes and return the average weighted by probabilities
of pairs. Or, you can specify the averaging method yourself, like this::

    AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)

The following snippet tries out all four. (We don't claim that this is
how the function needs to be used; it's better to stay with the default.)::

    methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
    print " " * 25 + " \tbayes\ttree\tmajority"
    for i in range(4):
        AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
        print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))

As you can see from the output::

                                bayes   tree    majority
           by pairs, weighted:  0.789   0.871   0.500
                     by pairs:  0.791   0.872   0.500
        one vs. all, weighted:  0.783   0.800   0.500
                  one vs. all:  0.783   0.800   0.500

.. autofunction:: AUC_single

.. autofunction:: AUC_pair

.. autofunction:: AUC_matrix

The remaining functions, which plot the curves and statistically compare
them, require that the results come from a test with a single iteration,
and they always compare one chosen class against all others. If you have
cross-validation results, you can either use split_by_iterations to split
the results by folds, call the function for each fold separately and then
sum the results up however you see fit, or you can set the
ExperimentResults' attribute number_of_iterations to 1, to cheat the
function - at your own responsibility for the statistical correctness.
Regarding multi-class problems, if you don't choose a specific class,
Orange.evaluation.scoring will use the class attribute's baseValue at the
time when the results were computed. If baseValue was not given at that
time, 1 (that is, the second class) is used as the default.
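AUC's interpretation as a concordance index can be sketched directly: over all positive-negative pairs of instances, count how often the positive one receives the higher score, with ties contributing one half. This mirrors the (C + T/2)/(C + D + T) computation over concordant, discordant and tied pairs used internally by the scoring module; the function name and inputs here are hypothetical:

```python
def auc_concordance(positive_scores, negative_scores):
    """AUC as the probability that a randomly chosen positive
    instance is ranked above a randomly chosen negative one,
    i.e. (C + T/2) / (C + D + T) over all cross-class pairs."""
    concordant = ties = 0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                concordant += 1
            elif p == n:
                ties += 1
    total = len(positive_scores) * len(negative_scores)
    return (concordant + ties / 2.0) / total

# 8 of the 9 cross-class pairs are ranked correctly: AUC = 8/9
print(auc_concordance([0.9, 0.8, 0.4], [0.7, 0.3, 0.3]))
```

This quadratic-time sketch is fine for illustration; production implementations sort the scores once instead of comparing all pairs.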
We shall use the following code to prepare suitable experimental results::

    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
    train = voting.selectref(ri2, 0)
    test = voting.selectref(ri2, 1)
    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)


.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

Comparison of Algorithms
------------------------

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

==========
Regression
==========

General Measure of Quality
==========================

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py <code/statExamples.py>`) uses
most of the above measures to score several regression methods.

.. literalinclude:: code/statExamplesRegression.py

The code above produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman

=================
Utility Functions
=================

.. autofunction:: split_by_iterations

======================================
Scoring for multi-label classification
======================================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label
examples :math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \subseteq L`.
Let :math:`H` be a multi-label classifier and :math:`Z_i=H(x_i)` be the
set of labels predicted by :math:`H` for example :math:`x_i`.

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

So, let's compute all this and print it out (part of
:download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses
:download:`emotions.tab <code/emotions.tab>`):

.. literalinclude:: code/mlc-evaluate.py
   :lines: 1-15

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label
scene classification', Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled
Classification', in Proceedings of the 8th Pacific-Asia Conference on
Knowledge Discovery and Data Mining (PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for
text categorization', Machine Learning, vol. 39, no. 2/3, pp. 135-168.