Changeset 7464:7327feaec329 in orange
 Timestamp:
 02/04/11 16:13:54 (3 years ago)
 Branch:
 default
 Convert:
 189b44b77d59dcc83f2603d96b5024114547ff30
 File:

 1 edited

orange/Orange/evaluation/scoring.py
r7396 → r7464

.. index: scoring

This module contains various measures of quality for classification and
regression. Most functions require an argument named res, an instance of
:class:`Orange.evaluation.testing.ExperimentResults` as computed by
functions from orngTest, which contains predictions obtained through
cross-validation, leave-one-out, testing on training data or on test set
examples.

==============
Classification
==============

To prepare some data for the examples on this page, we shall load the voting
data set (the problem of predicting a congressman's party, republican or
democrat, from a selection of votes) and evaluate the naive Bayesian
learner, classification trees and the majority classifier using
cross-validation. For examples requiring a multi-valued class problem, we
shall do the same with the vehicle data set (telling whether a vehicle,
described by features extracted from a picture, is a van, a bus, or an Opel
or Saab car).

A basic cross-validation example is shown in the following part of
`statExamples.py`_ (uses `voting.tab`_ and `vehicle.tab`_):

.. literalinclude:: code/statExample0.py

.. _voting.tab: code/voting.tab
.. _vehicle.tab: code/vehicle.tab
.. _statExamples.py: code/statExamples.py

If instances are weighted, weights are taken into account. This can be
disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
disabling weights is to clear the weights flag of the
:class:`Orange.evaluation.testing.ExperimentResults` object.

General Measures of Quality
===========================

.. autofunction:: CA

.. autofunction:: AP

.. autofunction:: BrierScore

.. autofunction:: IS

So, let's compute all this in part of `statExamples.py`_ and print it out:

.. literalinclude:: code/statExample1.py

The output should look like this::

    method  CA      AP      Brier   IS
    bayes   0.903   0.902   0.175   0.759
    tree    0.846   0.845   0.286   0.641
    majrty  0.614   0.526   0.474   0.000

Script `statExamples.py`_ contains another example that also prints out
the standard errors.
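
For a quick illustration of both options (a minimal sketch in the spirit of
the scripts above, not itself part of `statExamples.py`_; it reuses the res
and learners objects prepared there)::

    CAs = orngStat.CA(res, reportSE=True)
    for learner, (ca, se) in zip(learners, CAs):
        print "%s: %5.3f +- %5.3f" % (learner.name, ca, se)

    # the same, but explicitly ignoring instance weights
    CAs = orngStat.CA(res, reportSE=True, unweighted=1)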

Confusion Matrix
================

.. autofunction:: confusionMatrices

**A positive-negative confusion matrix** is computed (a) if the class is
binary, unless the classIndex argument is -2, or (b) if the class is
multi-valued and classIndex is non-negative. The argument classIndex then
tells which class is positive. In case (a), classIndex may be omitted; the
first class is then negative and the second is positive, unless the
baseClass attribute of the object with results has a non-negative value.
In that case, baseClass is the index of the target class (the baseClass
attribute of the results object has to be set manually). The result of the
function is a list of instances of class ConfusionMatrix, containing the
(weighted) number of true positives (TP), false negatives (FN), false
positives (FP) and true negatives (TN).

We can also add the keyword argument cutoff
(e.g. confusionMatrices(results, cutoff=0.3)); if we do, confusionMatrices
will disregard the classifiers' class predictions and observe the predicted
probabilities instead, considering a prediction "positive" if the predicted
probability of the positive class is higher than the cutoff.

The example (part of `statExamples.py`_) below shows how lowering the
cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices
for the naive Bayesian classifier::

    cm = orngStat.confusionMatrices(res)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

    cm = orngStat.confusionMatrices(res, cutoff=0.2)[0]
    print "Confusion matrix for naive Bayes:"
    print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)

The output::

    Confusion matrix for naive Bayes:
    TP: 238, FP: 13, FN: 29.0, TN: 155
    Confusion matrix for naive Bayes:
    TP: 239, FP: 18, FN: 28.0, TN: 150

shows that the number of true positives increases (and hence the number of
false negatives decreases) by only a single example, while five examples
that were originally true negatives become false positives due to the
lower threshold.

To observe how good the classifiers are at detecting vans in the vehicle
data set, we would compute the matrix like this::

    cm = orngStat.confusionMatrices(resVeh,
        vehicle.domain.classVar.values.index("van"))

and get results like these::

    TP: 189, FP: 241, FN: 10.0, TN: 406

while the same for class "opel" would give::

    TP: 86, FP: 112, FN: 126.0, TN: 522

The main difference is that there are only a few false negatives for the
van, meaning that the classifier seldom misses it (if it says it's not a
van, it's almost certainly not a van). Not so for the Opel car, where the
classifier missed 126 of them and correctly detected only 86.

**A general confusion matrix** is computed (a) in the case of a binary
class, when :obj:`classIndex` is set to -2, or (b) when we have a
multi-valued class and the caller doesn't specify the :obj:`classIndex`
of the positive class. When called in this manner, the function cannot
use the argument :obj:`cutoff`.

The function then returns a three-dimensional matrix, where the element
A[:obj:`learner`][:obj:`actualClass`][:obj:`predictedClass`]
gives the number of examples belonging to 'actualClass' for which the
'learner' predicted 'predictedClass'. We shall compute and print out
the matrix for the naive Bayesian classifier.

Here we see another example from `statExamples.py`_::

    cm = orngStat.confusionMatrices(resVeh)[0]
    classes = vehicle.domain.classVar.values
    print "\t"+"\t".join(classes)
    for className, classConfusions in zip(classes, cm):
        print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions))

So, here's what this nice piece of code gives::

          bus  van  saab  opel
    bus    56   95    21    46
    van     6  189     4     0
    saab    3   75    73    66
    opel    4   71    51    86

Vans are clearly simple: 189 vans were classified as vans (we know this
already, we've printed it out above), and the 10 misclassified pictures
were classified as buses (6) and Saab cars (4). In all other classes,
there were more examples misclassified as vans than correctly classified
examples. The classifier is obviously quite biased towards vans.
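
As an aside (a sketch, not part of `statExamples.py`_; it assumes the cm and
classes objects from the snippet above), each row of this matrix can be
turned into the per-class accuracy, i.e. the proportion of examples of a
class that were predicted correctly::

    for className, classConfusions in zip(classes, cm):
        correct = classConfusions[classes.index(className)]
        print "%s: %5.3f" % (className, correct / float(sum(classConfusions)))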

.. method:: sens(confm)
.. method:: spec(confm)
.. method:: PPV(confm)
.. method:: NPV(confm)
.. method:: precision(confm)
.. method:: recall(confm)
.. method:: F1(confm)
.. method:: Falpha(confm, alpha=2.0)
.. method:: MCC(confm)

With the confusion matrix defined in terms of positive and negative
classes, you can also compute the
`sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_
[TP/(TP+FN)], `specificity
<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_
[TN/(TN+FP)], `positive predictive value
<http://en.wikipedia.org/wiki/Positive_predictive_value>`_
[TP/(TP+FP)] and `negative predictive value
<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].
In information retrieval, positive predictive value is called precision
(the ratio of the number of relevant records retrieved to the total number
of irrelevant and relevant records retrieved), and sensitivity is called
`recall <http://en.wikipedia.org/wiki/Information_retrieval>`_
(the ratio of the number of relevant records retrieved to the total number
of relevant records in the database). The
`harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision
and recall is called an
`F-measure <http://en.wikipedia.org/wiki/F-measure>`_; depending on the
weighting of precision and recall, it is implemented as F1
[2*precision*recall/(precision+recall)] or, for the general case, Falpha
[(1+alpha)*precision*recall / (alpha*precision + recall)].
The `Matthews correlation coefficient
<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
is in essence a correlation coefficient between
the observed and predicted binary classifications; it returns a value
between -1 and +1. A coefficient of +1 represents a perfect prediction,
0 an average random prediction and -1 an inverse prediction.

If the argument :obj:`confm` is a single confusion matrix, a single
result (a number) is returned. If confm is a list of confusion matrices,
a list of scores is returned, one for each confusion matrix.

Note that weights are taken into account when computing the matrix, so
these functions don't check the 'weighted' keyword argument.

Let us print out the sensitivities and specificities of our classifiers,
as in part of `statExamples.py`_::

    cm = orngStat.confusionMatrices(res)
    print
    print "method\tsens\tspec"
    for l in range(len(learners)):
        print "%s\t%5.3f\t%5.3f" % (learners[l].name, orngStat.sens(cm[l]), orngStat.spec(cm[l]))
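
The other scores listed above can be printed out in the same fashion
(again a sketch, not part of `statExamples.py`_)::

    print "method\tprec\trecall\tF1\tMCC"
    for l in range(len(learners)):
        print "%s\t%5.3f\t%5.3f\t%5.3f\t%5.3f" % (learners[l].name,
            orngStat.precision(cm[l]), orngStat.recall(cm[l]),
            orngStat.F1(cm[l]), orngStat.MCC(cm[l]))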

ROC Analysis
============

`Receiver Operating Characteristic
<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
(ROC) analysis was initially developed for binary class problems, and there
is no consensus on how to apply it to multi-class problems, nor do we know
for sure how to do ROC analysis after cross-validation and similar multiple
sampling techniques. If you are interested in the area under the curve,
function AUC will deal with those problems as specifically described below.

.. autofunction:: AUC

.. attribute:: AUC.ByWeightedPairs (or 0)

    Computes AUC for each pair of classes (ignoring examples of all other
    classes) and averages the results, weighting them by the number of
    pairs of examples from these two classes (e.g. by the product of
    probabilities of the two classes). AUC computed in this way still
    behaves as a concordance index, e.g., gives the probability that two
    randomly chosen examples from different classes will be correctly
    recognized (this is of course true only if the classifier knows
    from which two classes the examples came).

.. attribute:: AUC.ByPairs (or 1)

    Similar to the above, except that the average over class pairs is not
    weighted. This AUC is, like the binary version, independent of class
    distributions, but it is no longer related to the concordance index.

.. attribute:: AUC.WeightedOneAgainstAll (or 2)

    For each class, it computes the AUC of this class against all others
    (that is, treating the other classes as a single class). The AUCs are
    then averaged by the class probabilities. This is related to a
    concordance index in which we test the classifier's (average)
    capability of distinguishing the examples from a specified class from
    those that come from other classes. Unlike the binary AUC, this
    measure is not independent of class distributions.

.. attribute:: AUC.OneAgainstAll (or 3)

    As above, except that the average is not weighted.

In the case of multiple folds (for instance, if the data comes from cross
validation), the computation goes like this. When computing the partial
AUCs for individual pairs of classes or singled-out classes, AUC is
computed for each fold separately and then averaged (ignoring the number
of examples in each fold; it's just a simple average). However, if a
certain fold doesn't contain any examples of a certain class (from the
pair), the partial AUC is computed treating the results as if they came
from a single fold. This is not really correct, since the class
probabilities from different folds are not necessarily comparable; yet as
this will most often occur in leave-one-out experiments, comparability
shouldn't be a problem.

Computing and printing out the AUCs looks just like printing out
classification accuracies (except that we call AUC instead of
CA, of course)::

    AUCs = orngStat.AUC(res)
    for l in range(len(learners)):
        print "%10s: %5.3f" % (learners[l].name, AUCs[l])

For vehicle, you can run exactly the same code; it will compute AUCs
for all pairs of classes and return the average weighted by probabilities
of pairs. Or, you can specify the averaging method yourself, like this::

    AUCs = orngStat.AUC(resVeh, orngStat.AUC.WeightedOneAgainstAll)

The following snippet tries out all four. (We don't claim that this is
how the function needs to be used; it's better to stay with the default.)::

    methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
    print " " * 25 + "  \tbayes\ttree\tmajority"
    for i in range(4):
        AUCs = orngStat.AUC(resVeh, i)
        print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))

As you can see from the output::

                                bayes   tree    majority
        by pairs, weighted:     0.789   0.871   0.500
        by pairs:               0.791   0.872   0.500
        one vs. all, weighted:  0.783   0.800   0.500
        one vs. all:            0.783   0.800   0.500

.. autofunction:: AUC_single

.. autofunction:: AUC_pair

.. autofunction:: AUC_matrix
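
AUC_pair is not demonstrated in `statExamples.py`_; a minimal sketch of its
use on the vehicle results (assuming resVeh and vehicle from above, and
that it, too, returns one score per learner) might look like this::

    values = vehicle.domain.classVar.values
    vanVsBus = orngStat.AUC_pair(resVeh, values.index("van"), values.index("bus"))
    print "van vs. bus AUC, naive Bayes: %5.3f" % vanVsBus[0]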

The remaining functions, which plot the curves and statistically compare
them, require that the results come from a test with a single iteration,
and they always compare one chosen class against all others. If you have
cross-validation results, you can either use splitByIterations to split the
results by folds, call the function for each fold separately and then sum
the results up however you see fit, or you can set the ExperimentResults'
attribute numberOfIterations to 1, to cheat the function, at your own
responsibility for the statistical correctness. Regarding multi-class
problems, if you don't choose a specific class, orngStat will use the class
attribute's baseValue at the time when the results were computed. If
baseValue was not given at that time, 1 (that is, the second class) is used
as the default.

We shall use the following code to prepare suitable experimental results::

    ri2 = orange.MakeRandomIndices2(voting, 0.6)
    train = voting.selectref(ri2, 0)
    test = voting.selectref(ri2, 1)
    res1 = orngTest.learnAndTestOnTestData(learners, train, test)

.. autofunction:: AUCWilcoxon

.. autofunction:: computeROC
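
If all you have are cross-validation results, the per-fold route described
above might look like this (a sketch using the res object from the
beginning of this page; AUCWilcoxon returns one (aROC, SE) tuple per
learner)::

    for i, foldRes in enumerate(orngStat.splitByIterations(res)):
        aroc, se = orngStat.AUCWilcoxon(foldRes)[0]
        print "fold %i: %5.3f +- %5.3f" % (i, aroc, se)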

Comparison of Algorithms
------------------------

.. autofunction:: McNemar

.. autofunction:: McNemarOfTwo
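
For instance (a sketch, again with res and learners from above), the first
two classifiers can be compared like this; values above about 3.84 suggest
a significant difference at the 5% level::

    print "bayes vs. tree: %5.3f" % orngStat.McNemarOfTwo(res, 0, 1)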

==========
Regression
==========

General Measures of Quality
===========================

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (`statExamplesRegression.py`_) uses most of the above
measures to score several regression methods.

.. literalinclude:: code/statExamplesRegression.py

.. _statExamplesRegression.py: code/statExamplesRegression.py

The code above produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585   9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015   6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248   4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092   4.908   3.425   0.285   0.534   0.515   0.715
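
The individual calls follow the same pattern as in classification (a
sketch; resReg here stands for any regression ExperimentResults, e.g.
obtained from orngTest.crossValidation with regression learners)::

    print "MSE: ", orngStat.MSE(resReg)
    print "RMSE:", orngStat.RMSE(resReg)
    print "R2:  ", orngStat.R2(resReg)

Each call returns a list with one score per learner.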

==================
Plotting Functions
==================

.. autofunction:: graph_ranks

The following script (`statExamplesGraphRanks.py`_) shows how to plot a
graph:

.. literalinclude:: code/statExamplesGraphRanks.py

.. _statExamplesGraphRanks.py: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman
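
The essential part of such a script boils down to a few lines (a sketch
with made-up average ranks over N=30 data sets; the output file name is
arbitrary)::

    names = ["first", "third", "second", "fourth"]
    avranks = [1.9, 3.2, 2.8, 3.3]
    cd = orngStat.compute_CD(avranks, 30)  # Nemenyi CD at alpha = 0.05
    orngStat.graph_ranks("ranks.png", avranks, names, cd=cd,
                         width=6, textspace=1.5)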
924 """ 657 925 if not apriori: 658 926 apriori = classProbabilitiesFromRes(res) … … 754 1022 755 1023 def confusionMatrices(res, classIndex=1, **argkw): 1024 """ This function can compute two different forms of confusion matrix: 1025 one in which a certain class is marked as positive and the other(s) 1026 negative, and another in which no class is singled out. The way to 1027 specify what we want is somewhat confusing due to backward 1028 compatibility issues. 1029 """ 756 1030 tfpns = [ConfusionMatrix() for i in range(res.numberOfLearners)] 757 1031 … … 998 1272 999 1273 def AUCWilcoxon(res, classIndex=1, **argkw): 1274 """ Computes the area under ROC (AUC) and its standard error using 1275 Wilcoxon's approach proposed by Hanley and McNeal (1982). If classIndex 1276 is not specified, the first class is used as "the positive" and others 1277 are negative. The result is a list of tuples (aROC, standard error). 1278 """ 1000 1279 import corn 1001 1280 useweights = res.weights and not argkw.get("unweighted", 0) … … 1036 1315 1037 1316 def computeROC(res, classIndex=1): 1317 """ Computes a ROC curve as a list of (x, y) tuples, where x is 1318 1specificity and y is sensitivity. 1319 """ 1038 1320 import corn 1039 1321 problists, tots = corn.computeROCCumulative(res, classIndex) … … 1574 1856 return sum_aucs 1575 1857 1858 def AUC(): 1859 pass 1860 1861 AUC.ByWeightedPairs = 0 1576 1862 1577 1863 # Computes AUC, possibly for multiple classes (the averaging method can be specified) 1578 1864 # Results over folds are averages; if some folds examples from one class only, the folds are merged 1579 def AUC(res, method = 0, useWeights = True): 1865 def AUC(res, method = AUC.ByWeightedPairs, useWeights = True): 1866 """ Returns the area under ROC curve (AUC) given a set of experimental 1867 results. For multivalued class problems, it will compute some sort of 1868 average, as specified by the argument method. 1869 """ 1580 1870 if len(res.classValues) < 2: 1581 1871 raise ValueError("Cannot compute AUC on a singleclass problem") … … 1594 1884 # Results over folds are averages; if some folds examples from one class only, the folds are merged 1595 1885 def AUC_single(res, classIndex = 1, useWeights = True): 1886 """ Computes AUC where the class given classIndex is singled out, and 1887 all other classes are treated as a single class. To find how good our 1888 classifiers are in distinguishing between vans and other vehicle, call 1889 the function like this:: 1890 1891 orngStat.AUC_single(resVeh, \ 1892 classIndex = vehicle.domain.classVar.values.index("van")) 1893 """ 1596 1894 if classIndex<0: 1597 1895 if res.baseClass>=0: … … 1608 1906 # Results over folds are averages; if some folds have examples from one class only, the folds are merged 1609 1907 def AUC_pair(res, classIndex1, classIndex2, useWeights = True): 1908 """ Computes AUC between a pair of examples, ignoring examples from all 1909 other classes. 1910 """ 1610 1911 if res.numberOfIterations > 1: 1611 1912 return AUC_iterations(AUC_ij, splitByIterations(res), (classIndex1, classIndex2, useWeights, res, res.numberOfIterations)) … … 1616 1917 # AUC for multiclass problems 1617 1918 def AUC_matrix(res, useWeights = True): 1919 """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes. 1920 If there are empty classes, the corresponding elements in the matrix 1921 are 1. Remember the beautiful(?) code for printing out the confusion 1922 matrix? 

…

    return MSE_old(res, **argkw)

The changeset also deletes a second, duplicated copy of the numeric
prediction scores (checkArgkw, regressionError, MSE through R2, MSE_old
and RMSE_old), which previously read::

    #########################################################################
    # PERFORMANCE MEASURES:
    # Scores for evaluation of numeric predictions

    def checkArgkw(dct, lst):
        """checkArgkw(dct, lst) -> returns true if any items have non-zero value in dct"""
        return reduce(lambda x, y: x or y, [dct.get(k, 0) for k in lst])

    def regressionError(res, **argkw):
        """regressionError(res) -> regression error (default: MSE)"""
        if argkw.get("SE", 0) and res.numberOfIterations > 1:
            # computes the scores for each iteration, then averages
            scores = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)]
            if argkw.get("normabs", 0) or argkw.get("normsqr", 0):
                norm = [0.0] * res.numberOfIterations

            nIter = [0]*res.numberOfIterations  # counts examples in each iteration
            a = [0]*res.numberOfIterations      # average class in each iteration
            for tex in res.results:
                nIter[tex.iterationNumber] += 1
                a[tex.iterationNumber] += float(tex.actualClass)
            a = [a[i]/nIter[i] for i in range(res.numberOfIterations)]

            if argkw.get("unweighted", 0) or not res.weights:
                # iterate across test cases
                for tex in res.results:
                    ai = float(tex.actualClass)
                    nIter[tex.iterationNumber] += 1

                    # compute normalization, if required
                    if argkw.get("normabs", 0):
                        norm[tex.iterationNumber] += abs(ai - a[tex.iterationNumber])
                    elif argkw.get("normsqr", 0):
                        norm[tex.iterationNumber] += (ai - a[tex.iterationNumber])**2

                    # iterate across results of different regressors
                    for i, cls in enumerate(tex.classes):
                        if argkw.get("abs", 0):
                            scores[i][tex.iterationNumber] += abs(float(cls) - ai)
                        else:
                            scores[i][tex.iterationNumber] += (float(cls) - ai)**2
            else: # unweighted <> 0
                raise NotImplementedError, "weighted error scores with SE not implemented yet"

            if argkw.get("normabs") or argkw.get("normsqr"):
                scores = [[x/n for x, n in zip(y, norm)] for y in scores]
            else:
                scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores]

            if argkw.get("R2"):
                scores = [[1.0 - x for x in y] for y in scores]

            if argkw.get("sqrt", 0):
                scores = [[math.sqrt(x) for x in y] for y in scores]

            return [(statc.mean(x), statc.std(x)) for x in scores]

        else: # single iteration (testing on a single test set)
            scores = [0.0] * res.numberOfLearners
            norm = 0.0

            if argkw.get("unweighted", 0) or not res.weights:
                a = sum([tex.actualClass for tex in res.results]) \
                    / len(res.results)
                for tex in res.results:
                    if argkw.get("abs", 0):
                        scores = map(lambda res, cls, ac = float(tex.actualClass):
                                     res + abs(float(cls) - ac), scores, tex.classes)
                    else:
                        scores = map(lambda res, cls, ac = float(tex.actualClass):
                                     res + (float(cls) - ac)**2, scores, tex.classes)

                    if argkw.get("normabs", 0):
                        norm += abs(tex.actualClass - a)
                    elif argkw.get("normsqr", 0):
                        norm += (tex.actualClass - a)**2
                totweight = gettotsize(res)
            else:
                # UNFINISHED
                for tex in res.results:
                    MSEs = map(lambda res, cls, ac = float(tex.actualClass),
                               tw = tex.weight:
                               res + tw * (float(cls) - ac)**2, MSEs, tex.classes)
                totweight = gettotweight(res)

            if argkw.get("normabs", 0) or argkw.get("normsqr", 0):
                scores = [s/norm for s in scores]
            else: # normalize by the number of instances (or sum of weights)
                scores = [s/totweight for s in scores]

            if argkw.get("R2"):
                scores = [1.0 - s for s in scores]

            if argkw.get("sqrt", 0):
                scores = [math.sqrt(x) for x in scores]

            return scores

    def MSE(res, **argkw):
        """MSE(res) -> mean-squared error"""
        return regressionError(res, **argkw)

    def RMSE(res, **argkw):
        """RMSE(res) -> root mean-squared error"""
        argkw.setdefault("sqrt", True)
        return regressionError(res, **argkw)

    def MAE(res, **argkw):
        """MAE(res) -> mean absolute error"""
        argkw.setdefault("abs", True)
        return regressionError(res, **argkw)

    def RSE(res, **argkw):
        """RSE(res) -> relative squared error"""
        argkw.setdefault("normsqr", True)
        return regressionError(res, **argkw)

    def RRSE(res, **argkw):
        """RRSE(res) -> root relative squared error"""
        argkw.setdefault("normsqr", True)
        argkw.setdefault("sqrt", True)
        return regressionError(res, **argkw)

    def RAE(res, **argkw):
        """RAE(res) -> relative absolute error"""
        argkw.setdefault("abs", True)
        argkw.setdefault("normabs", True)
        return regressionError(res, **argkw)

    def R2(res, **argkw):
        """R2(res) -> R-squared"""
        argkw.setdefault("normsqr", True)
        argkw.setdefault("R2", True)
        return regressionError(res, **argkw)

    def MSE_old(res, **argkw):
        """MSE(res) -> mean-squared error"""
        if argkw.get("SE", 0) and res.numberOfIterations > 1:
            MSEs = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)]
            nIter = [0]*res.numberOfIterations
            if argkw.get("unweighted", 0) or not res.weights:
                for tex in res.results:
                    ac = float(tex.actualClass)
                    nIter[tex.iterationNumber] += 1
                    for i, cls in enumerate(tex.classes):
                        MSEs[i][tex.iterationNumber] += (float(cls) - ac)**2
            else:
                raise ValueError, "weighted RMSE with SE not implemented yet"
            MSEs = [[x/ni for x, ni in zip(y, nIter)] for y in MSEs]
            if argkw.get("sqrt", 0):
                MSEs = [[math.sqrt(x) for x in y] for y in MSEs]
            return [(statc.mean(x), statc.std(x)) for x in MSEs]

        else:
            MSEs = [0.0]*res.numberOfLearners
            if argkw.get("unweighted", 0) or not res.weights:
                for tex in res.results:
                    MSEs = map(lambda res, cls, ac = float(tex.actualClass):
                               res + (float(cls) - ac)**2, MSEs, tex.classes)
                totweight = gettotsize(res)
            else:
                for tex in res.results:
                    MSEs = map(lambda res, cls, ac = float(tex.actualClass), tw = tex.weight:
                               res + tw * (float(cls) - ac)**2, MSEs, tex.classes)
                totweight = gettotweight(res)

            if argkw.get("sqrt", 0):
                MSEs = [math.sqrt(x) for x in MSEs]
            return [x/totweight for x in MSEs]

    def RMSE_old(res, **argkw):
        """RMSE(res) -> root mean-squared error"""
        argkw.setdefault("sqrt", 1)
        return MSE_old(res, **argkw)

…

#########################################################################
# PERFORMANCE MEASURES:

…

def CA(res, reportSE = False, **argkw):
    """ Computes classification accuracy, i.e. the percentage of matches
    between predicted and actual classes. The function returns a list of
    classification accuracies of all classifiers tested. If reportSE is
    set to true, the list will contain tuples with accuracies and
    standard errors.

    If the results are from multiple repetitions of experiments (like
    those returned by orngTest.crossValidation or orngTest.proportionTest),
    the standard error (SE) is estimated from the deviation of
    classification accuracy across folds (SD), as SE = SD/sqrt(N), where
    N is the number of repetitions (e.g. the number of folds).

    If the results are from a single repetition, we assume independence
    of examples and treat the classification accuracy as distributed
    according to the binomial distribution. This can be approximated by
    a normal distribution, so we report the SE of sqrt(CA*(1-CA)/N),
    where CA is the classification accuracy and N is the number of test
    examples.

    Instead of ExperimentResults, this function can be given a list of
    confusion matrices (see below). Standard errors are in this case
    estimated using the latter method.
    """
    if res.numberOfIterations==1:
        if type(res)==ConfusionMatrix:

…

def AP(res, reportSE = False, **argkw):
    """ Computes the average probability assigned to the correct class. """
    if res.numberOfIterations == 1:
        APs=[0.0]*res.numberOfLearners

…

def BrierScore(res, reportSE = False, **argkw):
    """ Computes the Brier score, defined as the average (over test
    examples) of sum_x((t(x) - p(x))**2), where x is a class, t(x) is 1
    for the correct class and 0 for the others, and p(x) is the
    probability that the classifier assigned to the class x.
    """

…

def IS(res, apriori=None, reportSE = False, **argkw):
    """ Computes the information score as defined by
    `Kononenko and Bratko (1991) \
    <http://www.springerlink.com/content/g5p7473160476612/>`_.
    The argument apriori gives the apriori class distribution; if it is
    omitted, the class distribution is computed from the actual classes
    of examples in res.
    """
    if not apriori:
        apriori = classProbabilitiesFromRes(res)

…

def confusionMatrices(res, classIndex=-1, **argkw):
    """ This function can compute two different forms of confusion matrix:
    one in which a certain class is marked as positive and the other(s)
    negative, and another in which no class is singled out. The way to
    specify what we want is somewhat confusing due to backward
    compatibility issues.
    """
    tfpns = [ConfusionMatrix() for i in range(res.numberOfLearners)]

…

def AUCWilcoxon(res, classIndex=-1, **argkw):
    """ Computes the area under ROC (AUC) and its standard error using
    Wilcoxon's approach as proposed by Hanley and McNeil (1982). If
    classIndex is not specified, the first class is used as "the
    positive" and the others are negative. The result is a list of
    tuples (aROC, standard error).
    """
    import corn
    useweights = res.weights and not argkw.get("unweighted", 0)

…

def computeROC(res, classIndex=-1):
    """ Computes a ROC curve as a list of (x, y) tuples, where x is
    1-specificity and y is sensitivity.
    """
    import corn
    problists, tots = corn.computeROCCumulative(res, classIndex)
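
# A quick sketch (assuming, as in the documentation above, the
# single-iteration results res1, and that computeROC returns one curve
# per learner): the first learner's curve can be printed point by point:
#
#     for x, y in computeROC(res1)[0]:
#         print "%5.3f\t%5.3f" % (x, y)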

…

    return sum_aucs

def AUC():
    pass

AUC.ByWeightedPairs = 0

# Computes AUC, possibly for multiple classes (the averaging method can be specified)
# Results over folds are averages; if some folds have examples from one class only, the folds are merged
def AUC(res, method = AUC.ByWeightedPairs, useWeights = True):
    """ Returns the area under ROC curve (AUC) given a set of experimental
    results. For multi-valued class problems, it will compute some sort of
    average, as specified by the argument method.
    """
    if len(res.classValues) < 2:
        raise ValueError("Cannot compute AUC on a single-class problem")

…

# Results over folds are averages; if some folds have examples from one class only, the folds are merged
def AUC_single(res, classIndex = -1, useWeights = True):
    """ Computes AUC where the class with the given classIndex is singled
    out and all other classes are treated as a single class. To find how
    good our classifiers are at distinguishing between vans and other
    vehicles, call the function like this::

        orngStat.AUC_single(resVeh,
            classIndex = vehicle.domain.classVar.values.index("van"))
    """
    if classIndex<0:
        if res.baseClass>=0:

…

# Results over folds are averages; if some folds have examples from one class only, the folds are merged
def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
    """ Computes AUC between a pair of classes, ignoring examples from all
    other classes.
    """
    if res.numberOfIterations > 1:
        return AUC_iterations(AUC_ij, splitByIterations(res), (classIndex1, classIndex2, useWeights, res, res.numberOfIterations))

…

# AUC for multi-class problems
def AUC_matrix(res, useWeights = True):
    """ Computes a (lower diagonal) matrix with AUCs for all pairs of
    classes. If there are empty classes, the corresponding elements in
    the matrix are -1. Remember the beautiful(?) code for printing out
    the confusion matrix? Here it strikes again::

        classes = vehicle.domain.classVar.values
        AUCmatrix = orngStat.AUC_matrix(resVeh)[0]
        print "\t"+"\t".join(classes[:-1])
        for className, AUCrow in zip(classes[1:], AUCmatrix[1:]):
            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow))
    """
    numberOfClasses = len(res.classValues)
    numberOfLearners = res.numberOfLearners

…

def McNemar(res, **argkw):
    """ Computes a triangular matrix with McNemar statistics for each pair
    of classifiers. The statistic is distributed by the chi-square
    distribution with one degree of freedom; the critical value for 5%
    significance is around 3.84.
    """
    nLearners = res.numberOfLearners
    mcm = []

…

def McNemarOfTwo(res, lrn1, lrn2):
    """ McNemarOfTwo computes a McNemar statistic for a pair of
    classifiers, specified by the indices lrn1 and lrn2.
    """
    tf = ft = 0.0
    if not res.weights or argkw.get("unweighted"):

…

def compute_friedman(avranks, N):
    """ Returns a tuple composed of (Friedman statistic, degrees of
    freedom) and (Iman statistic - F-distribution, degrees of freedom),
    given the average ranks and the number N of tested data sets.
    """

…

def compute_CD(avranks, N, alpha="0.05", type="nemenyi"):
    """ Returns the critical difference for the Nemenyi or Bonferroni-Dunn
    test according to the given alpha (either alpha="0.05" or
    alpha="0.1"), for the average ranks and the number of tested data
    sets N. The type can be either "nemenyi" for the Nemenyi two-tailed
    test or "bonferroni-dunn" for the Bonferroni-Dunn test.
    """

…

    Needs matplotlib to work.

    :param filename: Output file name (with extension). Formats supported
        by matplotlib can be used.
    :param avranks: List of average methods' ranks.
    :param names: List of methods' names.
    :param cd: Critical difference. Used for marking methods whose
        difference is not statistically significant.
    :param lowv: The lowest shown rank; if None, use 1.
    :param highv: The highest shown rank; if None, use len(avranks).
    :param width: Width of the drawn figure in inches, default 6 in.
    :param textspace: Space on figure sides left for the description
        of methods, default 1 in.
    :param reverse: If True, the lowest rank is on the right.
        Default: False.
    :param cdmethod: None by default. It can be an index of an element
        in avranks or names which specifies the method that should be
        marked with an interval.
    """