Changeset 10230:fe33086bba2e in orange
 Timestamp:
 02/14/12 15:30:13
 Branch:
 default
 rebase_source:
 f414611a26e2fdce2a579c27b4ff39b8550f7417
 Location:
 docs/reference/rst
 Files:

 1 added
 1 edited

docs/reference/rst/Orange.evaluation.scoring.rst
--- docs/reference/rst/Orange.evaluation.scoring.rst (r10204)
+++ docs/reference/rst/Orange.evaluation.scoring.rst (r10230)
@@ -12,5 +12,5 @@
 with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.
 
-.. literalinclude:: code/statExample0.py
+.. literalinclude:: code/scoringexample.py
 
 ==============
@@ -56,71 +56,7 @@
     matrix
 
-In case of multiple folds (for instance if the data comes from cross
-validation), the computation goes like this. When computing the partial
-AUCs for individual pairs of classes or singled-out classes, AUC is
-computed for each fold separately and then averaged (ignoring the number
-of instances in each fold, it's just a simple average). However, if a
-certain fold doesn't contain any instances of a certain class (from the
-pair), the partial AUC is computed treating the results as if they came
-from a single fold. This is not really correct since the class
-probabilities from different folds are not necessarily comparable,
-yet as this will most often occur in leave-one-out experiments,
-comparability shouldn't be a problem.
-
-Computing and printing out the AUCs looks just like printing out
-classification accuracies (except that we call AUC instead of
-CA, of course)::
-
-    AUCs = Orange.evaluation.scoring.AUC(res)
-    for l in range(len(learners)):
-        print "%10s: %5.3f" % (learners[l].name, AUCs[l])
-
-For vehicle, you can run exactly this same code; it will compute AUCs
-for all pairs of classes and return the average weighted by probabilities
-of pairs. Or, you can specify the averaging method yourself, like this::
-
-    AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll)
-
-The following snippet tries out all four. (We don't claim that this is
-how the function needs to be used; it's better to stay with the default.)::
-
-    methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"]
-    print " " * 25 + " \tbayes\ttree\tmajority"
-    for i in range(4):
-        AUCs = Orange.evaluation.scoring.AUC(resVeh, i)
-        print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs))
-
-As you can see from the output::
-
-                              bayes   tree    majority
-       by pairs, weighted:    0.789   0.871   0.500
-                 by pairs:    0.791   0.872   0.500
-    one vs. all, weighted:    0.783   0.800   0.500
-              one vs. all:    0.783   0.800   0.500
-
-The remaining functions, which plot the curves and statistically compare
-them, require that the results come from a test with a single iteration,
-and they always compare one chosen class against all others. If you have
-cross-validation results, you can either use split_by_iterations to split the
-results by folds, call the function for each fold separately and then sum
-the results up however you see fit, or you can set the ExperimentResults'
-attribute number_of_iterations to 1 to cheat the function, at your own
-responsibility for the statistical correctness. Regarding multi-class
-problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
-attribute's baseValue at the time when results were computed. If baseValue
-was not given at that time, 1 (that is, the second class) is used as default.
-
-We shall use the following code to prepare suitable experimental results::
-
-    ri2 = Orange.core.MakeRandomIndices2(voting, 0.6)
-    train = voting.selectref(ri2, 0)
-    test = voting.selectref(ri2, 1)
-    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test)
-
-
 .. autofunction:: AUCWilcoxon
 
 .. autofunction:: compute_ROC
-
 
 .. autofunction:: confusion_matrices
@@ -130,5 +66,5 @@
 
 Comparison of Algorithms
-
+========================
 
 .. autofunction:: McNemar
@@ -201,9 +137,11 @@
 =====================================
 
-Multi-label classification requries different metrics than those used in traditional single-label
-classification. This module presents the various methrics that have been proposed in the literature.
-Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
-:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier
-and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.
+Multi-label classification requires different metrics than those used in
+traditional single-label classification. This module presents the various
+metrics that have been proposed in the literature. Let :math:`D` be a
+multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
+:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H`
+be a multi-label classifier and :math:`Z_i=H(x_i)` be the set of labels
+predicted by :math:`H` for example :math:`x_i`.
 
 .. autofunction:: mlc_hamming_loss
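The passage removed in the second hunk describes two ways of scoring
cross-validation results: calling AUC on the combined results, or splitting
them with split_by_iterations and scoring each fold separately. A minimal
sketch of the per-fold workflow, assuming the Orange 2.x API documented here;
the data set, learner, and fold count are illustrative, not part of the
changeset::

    import Orange

    # Illustrative setup; any discrete classification data set would do.
    voting = Orange.data.Table("voting")
    bayes = Orange.classification.bayes.NaiveLearner()
    bayes.name = "bayes"
    learners = [bayes]

    # Cross-validation returns ExperimentResults spanning several folds.
    res = Orange.evaluation.testing.cross_validation(learners, voting, folds=5)

    # Split into one single-iteration ExperimentResults per fold, score
    # each fold on its own, then average the per-fold AUCs per learner.
    folds = Orange.evaluation.scoring.split_by_iterations(res)
    per_fold = [Orange.evaluation.scoring.AUC(fold) for fold in folds]
    for l in range(len(learners)):
        mean_auc = sum(aucs[l] for aucs in per_fold) / len(per_fold)
        print "%10s: %5.3f" % (learners[l].name, mean_auc)

Averaging the per-fold AUCs this way is the simple unweighted mean the removed
text describes, and the single-iteration results produced by
split_by_iterations are also what AUCWilcoxon and compute_ROC expect.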
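The rewrapped multi-label paragraph defines :math:`Y_i` (true labels),
:math:`Z_i` (predicted labels), :math:`H` (the classifier), and :math:`L`
(the label set), but the formula itself lies outside the diff context. For
reference, the usual textbook definition of Hamming loss in this notation,
an assumption about what mlc_hamming_loss computes rather than something
shown in the changeset, is

.. math::

    \mathrm{HammingLoss}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \, \triangle \, Z_i|}{|L|}

where :math:`\triangle` denotes the symmetric difference of the two label
sets, so the score is the fraction of misclassified example-label pairs;
lower is better, with 0 for a perfect classifier.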