Changeset 9999:3b8b4cc606c0 in orange for docs/reference/rst/Orange.evaluation.scoring.rst
 Timestamp:
 02/07/12 19:04:58 (2 years ago)
 Branch:
 default
 Children:
 10000:d65550cf0356, 10003:1631c2f30f11
 rebase_source:
 395f47303359a995a9ab21929a90119d9a55de6f
 File:

 1 edited
Legend:
 unmodified (' ')
 added ('+')
 removed ('-')
docs/reference/rst/Orange.evaluation.scoring.rst
--- r9904
+++ r9999

 .. index: scoring
 
-This module contains various measures of quality for classification and
-regression. Most functions require an argument named :obj:`res`, an instance of
-:class:`Orange.evaluation.testing.ExperimentResults` as computed by
-functions from :mod:`Orange.evaluation.testing` and which contains
-predictions obtained through cross-validation,
-leave-one-out, testing on training data or test set instances.
+Scoring plays and integral role in evaluation of any prediction model. Orange
+implements various scores for evaluation of classification,
+regression and multi-label models. Most of the methods needs to be called
+with an instance of :obj:`ExperimentResults`.
+
+.. literalinclude:: code/statExample0.py
 
 ==============
 Classification
 ==============
+
+Many scores for evaluation of classification models can be computed solely
+from the confusion matrix constructed manually with the
+:obj:`confusion_matrices` function. If class variable has more than two
+values, the index of the value to calculate the confusion matrix for should
+be passed as well.
+
+Calibration scores
+==================
+
+.. autofunction:: CA
+.. autofunction:: sens
+.. autofunction:: spec
+.. autofunction:: PPV
+.. autofunction:: NPV
+.. autofunction:: precision
+.. autofunction:: recall
+.. autofunction:: F1
+.. autofunction:: Falpha
+.. autofunction:: MCC
+.. autofunction:: AP
+.. autofunction:: IS
+.. autofunction::
+
+Discriminatory scores
+=====================
+
+.. autofunction:: Brier_score
+
+.. autofunction:: AUC
+
+.. attribute:: AUC.ByWeightedPairs (or 0)
+
+    Computes AUC for each pair of classes (ignoring instances of all other
+    classes) and averages the results, weighting them by the number of
+    pairs of instances from these two classes (e.g. by the product of
+    probabilities of the two classes). AUC computed in this way still
+    behaves as concordance index, e.g., gives the probability that two
+    randomly chosen instances from different classes will be correctly
+    recognized (this is of course true only if the classifier knows
+    from which two classes the instances came).
+
+.. attribute:: AUC.ByPairs (or 1)
+
+    Similar as above, except that the average over class pairs is not
+    weighted. This AUC is, like the binary, independent of class
+    distributions, but it is not related to concordance index any more.
+
+.. attribute:: AUC.WeightedOneAgainstAll (or 2)
+
+    For each class, it computes AUC for this class against all others (that
+    is, treating other classes as one class). The AUCs are then averaged by
+    the class probabilities. This is related to concordance index in which
+    we test the classifier's (average) capability for distinguishing the
+    instances from a specified class from those that come from other classes.
+    Unlike the binary AUC, the measure is not independent of class
+    distributions.
+
+.. attribute:: AUC.OneAgainstAll (or 3)
+
+    As above, except that the average is not weighted.
+
+In case of multiple folds (for instance if the data comes from cross
+validation), the computation goes like this. When computing the partial
+AUCs for individual pairs of classes or singled-out classes, AUC is
+computed for each fold separately and then averaged (ignoring the number
+of instances in each fold, it's just a simple average). However, if a
+certain fold doesn't contain any instances of a certain class (from the
+pair), the partial AUC is computed treating the results as if they came
+from a single fold. This is not really correct since the class
+probabilities from different folds are not necessarily comparable, yet
+as this will most often occur in leave-one-out experiments,
+comparability shouldn't be a problem.
+
+Computing and printing out the AUC's looks just like printing out
+classification accuracies (except that we call AUC instead of
+CA, of course)::
+
+    AUCs = Orange.evaluation.scoring.AUC(res)
+    for l in range(len(learners)):
+        print "%10s: %5.3f" % (learners[l].name, AUCs[l])
+
+For vehicle, you can run exactly this same code; it will compute AUCs
+for all pairs of classes and return the average weighted by probabilities
+of pairs. Or, you can specify the averaging method yourself, like this::
+
+    AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
+
+The following snippet tries out all four. (We don't claim that this is
+how the function needs to be used; it's better to stay with the default.)::
+
+    methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
+    print " " * 25 + "\tbayes\ttree\tmajority"
+    for i in range(4):
+        AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
+        print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
+
+As you can see from the output::
+
+                               bayes   tree    majority
+       by pairs, weighted:     0.789   0.871   0.500
+                 by pairs:     0.791   0.872   0.500
+    one vs. all, weighted:     0.783   0.800   0.500
+              one vs. all:     0.783   0.800   0.500
+
+.. autofunction:: AUC_single
+
+.. autofunction:: AUC_pair
+
+.. autofunction:: AUC_matrix
+
+The remaining functions, which plot the curves and statistically compare
+them, require that the results come from a test with a single iteration,
+and they always compare one chosen class against all others. If you have
+cross-validation results, you can either use split_by_iterations to split
+the results by folds, call the function for each fold separately and then
+sum the results up however you see fit, or you can set the
+ExperimentResults' attribute number_of_iterations to 1, to cheat the
+function - at your own responsibility for the statistical correctness.
+Regarding multi-class problems, if you don't choose a specific class,
+Orange.evaluation.scoring will use the class attribute's baseValue at the
+time when results were computed. If baseValue was not given at that time,
+1 (that is, the second class) is used as default.
+
+We shall use the following code to prepare suitable experimental results::
+
+    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
+    train = voting.selectref(ri2, 0)
+    test = voting.selectref(ri2, 1)
+    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
+
+.. autofunction:: AUCWilcoxon
+
+.. autofunction:: compute_ROC
+
+.. autofunction:: confusion_matrices
+
+.. autoclass:: ConfusionMatrix
+
 
 To prepare some data for examples on this page, we shall load the voting data
…
 (:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`):
 
-.. literalinclude:: code/statExample0.py
-
 If instances are weighted, weights are taken into account. This can be
 disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of
…
 ===========================
 
-.. autofunction:: CA
-
-.. autofunction:: AP
-
-.. autofunction:: Brier_score
-
-.. autofunction:: IS
 
 So, let's compute all this in part of
…
     bayes     0.903  0.902  0.175  0.759
     tree      0.846  0.845  0.286  0.641
-    majorty   0.614  0.526  0.474  0.000
+    majority  0.614  0.526  0.474  0.000
 
 Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out
…
 instances. The classifier is obviously quite biased to vans.
 
-.. method:: sens(confm)
-.. method:: spec(confm)
-.. method:: PPV(confm)
-.. method:: NPV(confm)
-.. method:: precision(confm)
-.. method:: recall(confm)
-.. method:: F2(confm)
-.. method:: Falpha(confm, alpha=2.0)
-.. method:: MCC(conf)
 
 With the confusion matrix defined in terms of positive and negative
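Editor's note: the calibration scores mentioned in this changeset (sens, spec, PPV, NPV, precision, recall, F1, MCC, CA) are all simple functions of a 2x2 confusion matrix. The sketch below illustrates those definitions in plain Python 3; the function name `scores_from_confusion` is hypothetical and is not part of Orange's API.

```python
import math

def scores_from_confusion(tp, fp, fn, tn):
    """Binary classification scores from confusion-matrix counts.

    tp/fp/fn/tn: true/false positive/negative counts.
    """
    sens = tp / (tp + fn)            # sensitivity = recall = TPR
    spec = tn / (tn + fp)            # specificity = TNR
    ppv = tp / (tp + fp)             # positive predictive value = precision
    npv = tn / (tn + fn)             # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens)   # harmonic mean of precision/recall
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    ca = (tp + tn) / (tp + fp + fn + tn)  # classification accuracy
    return {"CA": ca, "sens": sens, "spec": spec, "PPV": ppv,
            "NPV": npv, "F1": f1, "MCC": mcc}

print(scores_from_confusion(tp=40, fp=10, fn=5, tn=45))
```

The Orange functions documented above compute these from an `ExperimentResults`-derived confusion matrix rather than raw counts, but the arithmetic is the same.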
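Editor's note: the changeset describes two of the multi-class AUC averaging modes (one-against-all, optionally weighted by class probabilities). The following is a minimal sketch of that averaging, not Orange's implementation; the helper names `auc_binary` and `auc_one_against_all` are assumptions for illustration.

```python
def auc_binary(pos_scores, neg_scores):
    """Concordance index: probability that a randomly chosen positive
    instance outscores a randomly chosen negative one (ties count 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def auc_one_against_all(labels, prob_matrix, weighted=True):
    """Average per-class one-vs-rest AUC.

    labels: true class label per instance.
    prob_matrix: per-instance predicted class probabilities, columns in
    sorted class order. weighted=True mimics WeightedOneAgainstAll,
    weighted=False mimics OneAgainstAll.
    """
    classes = sorted(set(labels))
    aucs, weights = [], []
    for ci, c in enumerate(classes):
        pos = [p[ci] for l, p in zip(labels, prob_matrix) if l == c]
        neg = [p[ci] for l, p in zip(labels, prob_matrix) if l != c]
        aucs.append(auc_binary(pos, neg))
        # weight each class AUC by the class's empirical probability
        weights.append(len(pos) / len(labels) if weighted else 1.0)
    return sum(a * w for a, w in zip(aucs, weights)) / sum(weights)
```

The pairwise modes (ByWeightedPairs, ByPairs) differ only in restricting each `auc_binary` call to instances of the two classes being compared and averaging over class pairs.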
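Editor's note: the paragraph on multiple folds says AUC is computed per fold and averaged with a simple unweighted mean, falling back to pooling all folds together when some fold lacks instances of a class. A sketch of that policy, under an assumed data layout (a list of per-fold positive/negative score lists; `auc_over_folds` is a hypothetical name):

```python
def auc_over_folds(folds, auc_fn):
    """Fold-wise AUC as described in the docs.

    folds: list of (pos_scores, neg_scores) tuples, one per fold.
    auc_fn: a binary AUC function taking (pos_scores, neg_scores).
    """
    complete = [(p, n) for p, n in folds if p and n]
    if len(complete) < len(folds):
        # Some fold misses one of the classes: pool all folds and
        # compute a single AUC, as if the results came from one fold.
        pos = [s for p, _ in folds for s in p]
        neg = [s for _, n in folds for s in n]
        return auc_fn(pos, neg)
    # Otherwise: unweighted mean of per-fold AUCs, ignoring fold sizes.
    return sum(auc_fn(p, n) for p, n in complete) / len(complete)
```

As the changed text itself warns, pooling mixes probabilities across folds that need not be calibrated identically; it is tolerated mainly because the missing-class case arises chiefly in leave-one-out testing.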