Timestamp:
02/07/12 19:04:58
Author:
anze <anze.staric@…>
Branch:
default
Children:
10000:d65550cf0356, 10003:1631c2f30f11
rebase_source:
395f47303359a995a9ab21929a90119d9a55de6f
Message:

Improved documentation.

File:
1 edited

  • docs/reference/rst/Orange.evaluation.scoring.rst

    r9904 r9999  
    77.. index: scoring 
    88 
    9 This module contains various measures of quality for classification and 
    10 regression. Most functions require an argument named :obj:`res`, an instance of 
    11 :class:`Orange.evaluation.testing.ExperimentResults` as computed by 
    12 functions from :mod:`Orange.evaluation.testing` and which contains 
    13 predictions obtained through cross-validation, 
    14 leave one-out, testing on training data or test set instances. 
     9Scoring plays an integral role in the evaluation of any prediction model. 
      10Orange implements various scores for the evaluation of classification, 
      11regression and multi-label models. Most of the methods need to be called 
      12with an instance of :obj:`ExperimentResults`. 
     13 
     14.. literalinclude:: code/statExample0.py 
    1515 
    1616============== 
    1717Classification 
    1818============== 
     19 
      20Many scores for the evaluation of classification models can be computed 
      21solely from the confusion matrix, constructed manually with the 
      22:obj:`confusion_matrices` function. If the class variable has more than two 
      23values, the index of the value for which to compute the confusion matrix 
      24should be passed as well. 
     25 
     26Calibration scores 
     27================== 
     28 
     29.. autofunction:: CA 
     30.. autofunction:: sens 
     31.. autofunction:: spec 
     32.. autofunction:: PPV 
     33.. autofunction:: NPV 
     34.. autofunction:: precision 
     35.. autofunction:: recall 
     36.. autofunction:: F1 
     37.. autofunction:: Falpha 
     38.. autofunction:: MCC 
     39.. autofunction:: AP 
     40.. autofunction:: IS 
     41.. autofunction:: 
     42 
     43Discriminatory scores 
     44===================== 
     45 
     46.. autofunction:: Brier_score 
     47 
     48.. autofunction:: AUC 
     49 
      50   .. attribute:: AUC.ByWeightedPairs (or 0) 
      51 
      52      Computes AUC for each pair of classes (ignoring instances of all other 
      53      classes) and averages the results, weighting them by the number of 
      54      pairs of instances from these two classes (i.e. by the product of 
      55      probabilities of the two classes). AUC computed in this way still 
      56      behaves as a concordance index, i.e., it gives the probability that two 
      57      randomly chosen instances from different classes will be correctly 
      58      recognized (this is of course true only if the classifier knows 
      59      from which two classes the instances came). 
     60 
     61   .. attribute:: AUC.ByPairs (or 1) 
     62 
      63      Similar to the above, except that the average over class pairs is not 
      64      weighted. This AUC is, like the binary AUC, independent of class 
      65      distributions, but it is no longer related to the concordance index. 
     66 
     67   .. attribute:: AUC.WeightedOneAgainstAll (or 2) 
     68 
      69      For each class, it computes AUC for this class against all others (that 
      70      is, treating all other classes as one class). The AUCs are then averaged, 
      71      weighted by the class probabilities. This is related to a concordance index 
      72      in which we test the classifier's (average) capability of distinguishing 
      73      instances of a specified class from those of other classes. 
     74      Unlike the binary AUC, the measure is not independent of class 
     75      distributions. 
     76 
     77   .. attribute:: AUC.OneAgainstAll (or 3) 
     78 
     79      As above, except that the average is not weighted. 
     80 
      81   In case of multiple folds (for instance if the data comes from cross 
      82   validation), the computation goes like this. When computing the partial 
      83   AUCs for individual pairs of classes or singled-out classes, AUC is 
      84   computed for each fold separately and then averaged (ignoring the number 
      85   of instances in each fold; it is a simple, unweighted average). However, if 
      86   a certain fold doesn't contain any instances of a certain class (from the 
      87   pair), the partial AUC is computed treating the results as if they came 
      88   from a single fold. This is not entirely correct, since the class 
      89   probabilities from different folds are not necessarily comparable; yet, 
      90   as this situation will most often occur in leave-one-out experiments, 
      91   comparability shouldn't be a problem. 
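
   For illustration, a minimal sketch of this per-fold averaging, assuming
   ``res`` holds cross-validation results and that split_by_iterations
   behaves as described further down this page::

       # compute AUC within each fold, then average the folds per learner
       folds = Orange.evaluation.scoring.split_by_iterations(res)
       fold_AUCs = [Orange.evaluation.scoring.AUC(fold) for fold in folds]
       avgs = [sum(a[l] for a in fold_AUCs) / len(fold_AUCs)
               for l in range(len(learners))]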
     92 
      93   Computing and printing out the AUCs looks just like printing out 
      94   classification accuracies (except that we call AUC instead of 
      95   CA, of course):: 
     96 
     97       AUCs = Orange.evaluation.scoring.AUC(res) 
     98       for l in range(len(learners)): 
     99           print "%10s: %5.3f" % (learners[l].name, AUCs[l]) 
     100 
      101   For the vehicle data, you can run exactly the same code; it will compute 
      102   AUCs for all pairs of classes and return the average weighted by the 
      103   probabilities of pairs. Or, you can specify the averaging method yourself:: 
     104 
     105       AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll) 
     106 
      107   The following snippet tries out all four methods. (We don't claim that this 
      108   is how the function needs to be used; it's better to stay with the default.):: 
     109 
     110       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"] 
     111       print " " *25 + "  \tbayes\ttree\tmajority" 
     112       for i in range(4): 
     113           AUCs = Orange.evaluation.scoring.AUC(resVeh, i) 
     114           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs)) 
     115 
     116   As you can see from the output:: 
     117 
     118                                   bayes   tree    majority 
     119              by pairs, weighted:  0.789   0.871   0.500 
     120                        by pairs:  0.791   0.872   0.500 
     121           one vs. all, weighted:  0.783   0.800   0.500 
     122                     one vs. all:  0.783   0.800   0.500 
     123 
     124.. autofunction:: AUC_single 
     125 
     126.. autofunction:: AUC_pair 
     127 
     128.. autofunction:: AUC_matrix 
     129 
     130The remaining functions, which plot the curves and statistically compare 
     131them, require that the results come from a test with a single iteration, 
     132and they always compare one chosen class against all others. If you have 
     133cross-validation results, you can either use split_by_iterations to split the 
     134results by folds, call the function for each fold separately and then combine 
     135the results however you see fit, or you can set the ExperimentResults' 
     136attribute number_of_iterations to 1 to cheat the function - at your own 
     137responsibility for the statistical correctness. Regarding multi-class 
     138problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class 
     139attribute's baseValue at the time when the results were computed. If baseValue 
     140was not given at that time, 1 (that is, the second class) is used as the default. 
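
A sketch of these two options (the names follow the description above; the
return format of the scoring functions is an assumption)::

    # option 1: score each fold separately, then combine as you see fit
    for fold in Orange.evaluation.scoring.split_by_iterations(res):
        print Orange.evaluation.scoring.AUCWilcoxon(fold)

    # option 2: pretend the results come from a single iteration,
    # at your own statistical responsibility
    res.number_of_iterations = 1
    print Orange.evaluation.scoring.AUCWilcoxon(res)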
     141 
     142We shall use the following code to prepare suitable experimental results:: 
     143 
     144    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6) 
     145    train = voting.selectref(ri2, 0) 
     146    test = voting.selectref(ri2, 1) 
     147    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test) 
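
With ``res1`` prepared, the functions documented below can be applied to it
directly; a brief example (the exact return shapes are assumptions)::

    # AUC estimated via the Wilcoxon test, with its standard error
    print Orange.evaluation.scoring.AUCWilcoxon(res1)
    # points of the ROC curve computed from res1
    curves = Orange.evaluation.scoring.compute_ROC(res1)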
     148 
     149 
     150.. autofunction:: AUCWilcoxon 
     151 
     152.. autofunction:: compute_ROC 
     153 
     154 
     155.. autofunction:: confusion_matrices 
     156 
     157.. autoclass:: ConfusionMatrix 
     158 
    19159 
    20160To prepare some data for examples on this page, we shall load the voting data 
     
    29169(:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`): 
    30170 
    31 .. literalinclude:: code/statExample0.py 
    32  
    33171If instances are weighted, weights are taken into account. This can be 
    34172disabled by giving :obj:`unweighted=1` as a keyword argument. Another way of 
     
    39177=========================== 
    40178 
    41 .. autofunction:: CA 
    42  
    43 .. autofunction:: AP 
    44  
    45 .. autofunction:: Brier_score 
    46  
    47 .. autofunction:: IS 
     179 
     180 
     181 
    48182 
    49183So, let's compute all this in part of 
     
    58192    bayes   0.903   0.902   0.175    0.759 
    59193    tree    0.846   0.845   0.286    0.641 
    60     majorty  0.614   0.526   0.474   -0.000 
     194    majority  0.614   0.526   0.474   -0.000 
    61195 
    62196Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out 
     
    163297   instances. The classifier is obviously quite biased to vans. 
    164298 
    165    .. method:: sens(confm) 
    166    .. method:: spec(confm) 
    167    .. method:: PPV(confm) 
    168    .. method:: NPV(confm) 
    169    .. method:: precision(confm) 
    170    .. method:: recall(confm) 
    171    .. method:: F2(confm) 
    172    .. method:: Falpha(confm, alpha=2.0) 
    173    .. method:: MCC(conf) 
     299 
    174300 
    175301   With the confusion matrix defined in terms of positive and negative 