Changeset 7582:e2114f229e5e in orange
 Timestamp:
 02/04/11 23:52:33 (3 years ago)
 Branch:
 default
 Convert:
 e5d3c728f7a39f86dd2744ce4413a2cc8cc1d168
File:
 1 edited
Legend:
 unmodified (no marker)
 + added
 - removed

orange/Orange/evaluation/scoring.py
--- orange/Orange/evaluation/scoring.py (r7548)
+++ orange/Orange/evaluation/scoring.py (r7582)

@@ -4,9 +4,9 @@
 
 This module contains various measures of quality for classification and
-regression. Most functions require an argument named res, an instance of
+regression. Most functions require an argument named :obj:`res`, an instance of
 :class:`Orange.evaluation.testing.ExperimentResults` as computed by
-functions from Orange.evaluation.testing and which contains predictions
-obtained through
-cross-validation, leave-one-out, testing on training data or test set examples.
+functions from :mod:`Orange.evaluation.testing` and which contains
+predictions obtained through cross-validation,
+leave-one-out, testing on training data or test set instances.
 
 ==============
@@ -59,5 +59,5 @@
 The output should look like this::
 
-    method  CA      AP      BrierIS
+    method  CA      AP      Brier   IS
     bayes   0.903   0.902   0.175   0.759
     tree    0.846   0.845   0.286   0.641
@@ -75,19 +75,20 @@
 
 **A positive-negative confusion matrix** is computed (a) if the class is
-binary unless classIndex argument is -2, (b) if the class is multivalued
-and the classIndex is non-negative. Argument classIndex then tells which
-class is positive. In case (a), classIndex may be omited; the first class
-is then negative and the second is positive, unless the baseClass attribute
-in the object with results has non-negative value. In that case, baseClass
-is an index of the traget class. baseClass attribute of results object
-should be set manually. The result of a function is a list of instances
-of class ConfusionMatrix, containing the (weighted) number of true
-positives (TP), false negatives (FN), false positives (FP) and true
-negatives (TN).
-
-We can also add the keyword argument cutoff
-(e.g. confusionMatrices(results, cutoff=0.3); if we do, confusionMatrices
+binary unless :obj:`classIndex` argument is -2, (b) if the class is
+multivalued and the :obj:`classIndex` is non-negative. Argument
+:obj:`classIndex` then tells which class is positive. In case (a),
+:obj:`classIndex` may be omitted; the first class
+is then negative and the second is positive, unless the :obj:`baseClass`
+attribute in the object with results has non-negative value. In that case,
+:obj:`baseClass` is an index of the target class. :obj:`baseClass`
+attribute of results object should be set manually. The result of a
+function is a list of instances of class :class:`ConfusionMatrix`,
+containing the (weighted) number of true positives (TP), false
+negatives (FN), false positives (FP) and true negatives (TN).
+
+We can also add the keyword argument :obj:`cutoff`
+(e.g. confusionMatrices(results, cutoff=0.3); if we do, :obj:`confusionMatrices`
 will disregard the classifiers' class predictions and observe the predicted
 probabilities, and consider the prediction "positive" if the predicted
-probability of the positive class is higher than the cutoff.
+probability of the positive class is higher than the :obj:`cutoff`.
 
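The :obj:`cutoff` behaviour described in the hunk above is plain
thresholding of the predicted probability of the positive class. A minimal
sketch in plain Python (independent of the Orange API; the function and
variable names are illustrative, not part of the module)::

    def confusion_from_probs(probs, actual, cutoff=0.5):
        # Count TP, FP, FN and TN given predicted probabilities of the
        # positive class and actual labels (1 = positive, 0 = negative).
        TP = FP = FN = TN = 0
        for p, a in zip(probs, actual):
            positive = p > cutoff  # "positive" iff probability exceeds cutoff
            if positive and a == 1:
                TP += 1
            elif positive and a == 0:
                FP += 1
            elif not positive and a == 1:
                FN += 1
            else:
                TN += 1
        return TP, FP, FN, TN

    # Lowering the cutoff can only move instances from the predicted-negative
    # to the predicted-positive side: TP and FP may grow, FN and TN shrink.
    print(confusion_from_probs([0.9, 0.4, 0.25, 0.1], [1, 1, 0, 0], cutoff=0.3))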
@@ -94,1 +95,1 @@
 The example (part of `statExamples.py`_) below shows how setting the
@@ -96,9 +97,9 @@
 for naive Bayesian classifier::
 
-    cm = orngStat.confusionMatrices(res)[0]
+    cm = Orange.evaluation.scoring.confusionMatrices(res)[0]
     print "Confusion matrix for naive Bayes:"
     print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
 
-    cm = orngStat.confusionMatrices(res, cutoff=0.2)[0]
+    cm = Orange.evaluation.scoring.confusionMatrices(res, cutoff=0.2)[0]
     print "Confusion matrix for naive Bayes:"
     print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN)
@@ -114,5 +115,5 @@
 
 shows that the number of true positives increases (and hence the number of
-false negatives decreases) by only a single example, while five examples
+false negatives decreases) by only a single instance, while five instances
 that were originally true negatives become false positives due to the
 lower threshold.
@@ -121,4 +122,4 @@
 data set, we would compute the matrix like this::
 
-    cm = orngStat.confusionMatrices(resVeh, \
+    cm = Orange.evaluation.scoring.confusionMatrices(resVeh, \
               vehicle.domain.classVar.values.index("van"))
@@ -145,5 +146,5 @@
 The function then returns a three-dimensional matrix, where the element
 A[:obj:`learner`][:obj:`actualClass`][:obj:`predictedClass`]
-gives the number of examples belonging to 'actualClass' for which the
+gives the number of instances belonging to 'actualClass' for which the
 'learner' predicted 'predictedClass'. We shall compute and print out
 the matrix for naive Bayesian classifier.
@@ -151,5 +152,5 @@
 Here we see another example from `statExamples.py`_::
 
-    cm = orngStat.confusionMatrices(resVeh)[0]
+    cm = Orange.evaluation.scoring.confusionMatrices(resVeh)[0]
     classes = vehicle.domain.classVar.values
     print "\t"+"\t".join(classes)
@@ -170,4 +171,4 @@
 already, we've printed it out above), and the 10 misclassified pictures
 were classified as buses (6) and Saab cars (4). In all other classes,
-there were more examples misclassified as vans than correctly classified
-examples. The classifier is obviously quite biased to vans.
+there were more instances misclassified as vans than correctly classified
+instances. The classifier is obviously quite biased to vans.
@@ -175,1 +176,1 @@
 .. method:: sens(confm)
@@ -221,9 +222,9 @@
 part of `statExamples.py`_::
 
-    cm = orngStat.confusionMatrices(res)
+    cm = Orange.evaluation.scoring.confusionMatrices(res)
     print
     print "method\tsens\tspec"
    for l in range(len(learners)):
-        print "%s\t%5.3f\t%5.3f" % (learners[l].name, orngStat.sens(cm[l]), orngStat.spec(cm[l]))
+        print "%s\t%5.3f\t%5.3f" % (learners[l].name, Orange.evaluation.scoring.sens(cm[l]), Orange.evaluation.scoring.spec(cm[l]))
 
 .. _statExamples.py: code/statExamples.py
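The sensitivity and specificity printed by the snippet above are simple
ratios of confusion-matrix fields. A sketch using the standard definitions
(the TP/FP/FN/TN field names follow :class:`ConfusionMatrix` as documented
above; the helper functions themselves are illustrative)::

    def sensitivity(cm):
        # Sensitivity (true positive rate, recall): TP / (TP + FN)
        return cm.TP / float(cm.TP + cm.FN)

    def specificity(cm):
        # Specificity (true negative rate): TN / (TN + FP)
        return cm.TN / float(cm.TN + cm.FP)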
@@ -245,12 +246,12 @@
 .. attribute:: AUC.ByWeightedPairs (or 0)
 
-    Computes AUC for each pair of classes (ignoring examples of all other
+    Computes AUC for each pair of classes (ignoring instances of all other
     classes) and averages the results, weighting them by the number of
-    pairs of examples from these two classes (e.g. by the product of
+    pairs of instances from these two classes (e.g. by the product of
     probabilities of the two classes). AUC computed in this way still
     behaves as concordance index, e.g., gives the probability that two
-    randomly chosen examples from different classes will be correctly
+    randomly chosen instances from different classes will be correctly
     recognized (this is of course true only if the classifier knows
-    from which two classes the examples came).
+    from which two classes the instances came).
 
 .. attribute:: AUC.ByPairs (or 1)
@@ -266,5 +267,5 @@
     the class probabilities. This is related to concordance index in which
     we test the classifier's (average) capability for distinguishing the
-    examples from a specified class from those that come from other classes.
+    instances from a specified class from those that come from other classes.
     Unlike the binary AUC, the measure is not independent of class
     distributions.
@@ -274,10 +275,10 @@
     As above, except that the average is not weighted.
 
-In case of :obj:`multiple folds`(for instance if the data comes from cross
+In case of multiple folds (for instance if the data comes from cross
 validation), the computation goes like this. When computing the partial
 AUCs for individual pairs of classes or singled-out classes, AUC is
 computed for each fold separately and then averaged (ignoring the number
-of examples in each fold, it's just a simple average). However, if a
-certain fold doesn't contain any examples of a certain class (from the
+of instances in each fold, it's just a simple average). However, if a
+certain fold doesn't contain any instances of a certain class (from the
 pair), the partial AUC is computed treating the results as if they came
 from a single fold. This is not really correct since the class
@@ -290,5 +291,5 @@
 CA, of course)::
 
-    AUCs = orngStat.AUC(res)
+    AUCs = Orange.evaluation.scoring.AUC(res)
    for l in range(len(learners)):
        print "%10s: %5.3f" % (learners[l].name, AUCs[l])
@@ -298,5 +299,5 @@
 of pairs. Or, you can specify the averaging method yourself, like this::
 
-    AUCs = orngStat.AUC(resVeh, orngStat.AUC.WeightedOneAgainstAll)
+    AUCs = Orange.evaluation.scoring.AUC(resVeh, orngStat.AUC.WeightedOneAgainstAll)
 
 The following snippet tries out all four. (We don't claim that this is
@@ -306,4 +307,4 @@
     print " " *25 + " \tbayes\ttree\tmajority"
     for i in range(4):
-        AUCs = orngStat.AUC(resVeh, i)
+        AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
         print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
@@ -337,5 +338,5 @@
 We shall use the following code to prepare suitable experimental results::
 
-    ri2 = orange.MakeRandomIndices2(voting, 0.6)
+    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
     train = voting.selectref(ri2, 0)
     test = voting.selectref(ri2, 1)
@@ -718,8 +719,8 @@
 
     If results are from a single repetition, we assume independency of
-    examples and treat the classification accuracy as distributed according
+    instances and treat the classification accuracy as distributed according
     to binomial distribution. This can be approximated by normal distribution,
     so we report the SE of sqrt(CA*(1-CA)/N), where CA is classification
-    accuracy and N is number of test examples.
+    accuracy and N is number of test instances.
 
     Instead of ExperimentResults, this function can be given a list of
@@ -873,7 +874,7 @@
     `Kononenko and Bratko (1991) \
     <http://www.springerlink.com/content/g5p7473160476612/>`_.
-    Argument 'apriori' gives the apriori class
+    Argument :obj:`apriori` gives the apriori class
     distribution; if it is omitted, the class distribution is computed from
-    the actual classes of examples in res.
+    the actual classes of examples in :obj:`res`.
     """
     if not apriori:
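The expression sqrt(CA*(1-CA)/N) quoted in one of the hunks above is the
normal approximation to the binomial standard error of classification
accuracy. As a quick sketch (an illustrative helper, not part of the
module)::

    from math import sqrt

    def ca_standard_error(ca, n):
        # Binomial SE of classification accuracy measured on n
        # independent test instances (normal approximation).
        return sqrt(ca * (1.0 - ca) / n)

    print(ca_standard_error(0.9, 100))  # ~0.03 for 90% accuracy on 100 instances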
@@ -1227,7 +1228,8 @@
 def AUCWilcoxon(res, classIndex=-1, **argkw):
     """ Computes the area under ROC (AUC) and its standard error using
-    Wilcoxon's approach proposed by Hanley and McNeal (1982). If classIndex
-    is not specified, the first class is used as "the positive" and others
-    are negative. The result is a list of tuples (aROC, standard error).
+    Wilcoxon's approach proposed by Hanley and McNeal (1982). If
+    :obj:`classIndex` is not specified, the first class is used as
+    "the positive" and others are negative. The result is a list of
+    tuples (aROC, standard error).
     """
     import corn
@@ -1843,5 +1845,5 @@
     the function like this::
 
-        orngStat.AUC_single(resVeh, \
+        Orange.evaluation.scoring.AUC_single(resVeh, \
             classIndex = vehicle.domain.classVar.values.index("van"))
     """
@@ -1860,5 +1862,5 @@
 # Results over folds are averages; if some folds have examples from one class only, the folds are merged
 def AUC_pair(res, classIndex1, classIndex2, useWeights = True):
-    """ Computes AUC between a pair of examples, ignoring examples from all
+    """ Computes AUC between a pair of instances, ignoring instances from all
     other classes.
     """
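For reference, the pair-counting (Mann-Whitney) view of AUC that
:obj:`AUCWilcoxon` is based on, together with the Hanley-McNeil standard
error, can be sketched in plain Python. The module itself delegates this
work to the :obj:`corn` C extension; the version below is illustrative
only, counting correctly ranked positive-negative pairs with ties worth
one half::

    from math import sqrt

    def auc_wilcoxon(scores, labels):
        # scores: predicted probabilities of the positive class
        # labels: 1 for positive instances, 0 for negative ones
        pos = [s for s, l in zip(scores, labels) if l == 1]
        neg = [s for s, l in zip(scores, labels) if l == 0]
        n1, n2 = len(pos), len(neg)
        # Mann-Whitney statistic: correctly ranked pairs; ties count 0.5.
        u = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
        auc = u / float(n1 * n2)
        # Standard error after Hanley & McNeil (1982).
        q1 = auc / (2.0 - auc)
        q2 = 2.0 * auc * auc / (1.0 + auc)
        se = sqrt((auc * (1.0 - auc) + (n1 - 1) * (q1 - auc ** 2) +
                   (n2 - 1) * (q2 - auc ** 2)) / float(n1 * n2))
        return auc, se

    print(auc_wilcoxon([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]))  # (1.0, 0.0)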