.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index:: scoring

Scoring plays an integral role in the evaluation of any prediction model.
Orange implements various scores for the evaluation of classification,
regression and multi-label models. Most of the methods need to be called
with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.

.. literalinclude:: code/scoring-example.py

==============
Classification
==============

Calibration scores
==================
Many scores for the evaluation of classification models measure whether the
model assigns the correct class value to the test instances. Many of these
scores can be computed solely from the confusion matrix constructed manually
with the :obj:`confusion_matrices` function. If the class variable has more
than two values, the index of the value for which the confusion matrix is
computed should be passed as well.

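For instance, confusion matrices can be computed from cross-validation results
and passed to the individual scores. A minimal sketch, assuming the
``voting`` data set shipped with Orange and that the scoring functions below
accept a single confusion matrix as shown::

    import Orange

    # Cross-validate a learner on a binary classification data set.
    data = Orange.data.Table("voting")
    bayes = Orange.classification.bayes.NaiveLearner(name="bayes")
    res = Orange.evaluation.testing.cross_validation([bayes], data, folds=5)

    # One confusion matrix per learner; with a binary class no index is needed.
    cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
    print "CA:          %.3f" % Orange.evaluation.scoring.CA(res)[0]
    print "Sensitivity: %.3f" % Orange.evaluation.scoring.Sensitivity(cm)
    print "Specificity: %.3f" % Orange.evaluation.scoring.Specificity(cm)
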
.. autofunction:: CA
.. autofunction:: Sensitivity
.. autofunction:: Specificity
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: Precision
.. autofunction:: Recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square

Discriminatory scores
=====================
Scores that measure how well the prediction model separates instances of
different classes are called discriminatory scores.

.. autofunction:: Brier_score

.. autoclass:: AUC
    :members: by_weighted_pairs, by_pairs,
              weighted_one_against_all, one_against_all, single_class, pair

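A minimal usage sketch (assuming the ``voting`` data set shipped with
Orange; an uninformative classifier scores around 0.5)::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(name="bayes"),
                Orange.classification.majority.MajorityLearner(name="majority")]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # AUC returns one score per learner, in the same order as the learners.
    for learner, auc in zip(learners, Orange.evaluation.scoring.AUC(res)):
        print "%-8s %.3f" % (learner.name, auc)
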
.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

.. autofunction:: confusion_matrices

.. autoclass:: ConfusionMatrix

Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

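A minimal sketch of a pairwise comparison, assuming that
:obj:`McNemar_of_two` takes the experiment results together with the indices
of the two learners and returns the test statistic::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(name="bayes"),
                Orange.classification.tree.TreeLearner(name="tree")]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # McNemar statistic for the disagreements between learners 0 and 1
    # (assumed signature; see the generated documentation above).
    print "McNemar: %.3f" % Orange.evaluation.scoring.McNemar_of_two(res, 0, 1)
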
==========
Regression
==========

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py <code/statExamples.py>`) uses
most of the above measures to score several regression methods:

.. literalinclude:: code/statExamples.py

The code produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

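For example, the critical difference for a Nemenyi post-hoc test can be
computed from the average ranks and the number of data sets (a minimal
sketch, assuming four methods compared on 30 data sets)::

    import Orange

    avranks = [1.9, 3.2, 2.8, 3.3]  # average rank of each method
    cd = Orange.evaluation.scoring.compute_CD(avranks, 30)  # 30 data sets
    print "CD = %.3f" % cd
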
.. autofunction:: compute_friedman

=================
Utility Functions
=================

.. autofunction:: split_by_iterations

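For instance, per-fold accuracies can be obtained by splitting
cross-validation results into one result set per iteration (a minimal
sketch, assuming the ``voting`` data set shipped with Orange)::

    import Orange

    data = Orange.data.Table("voting")
    bayes = Orange.classification.bayes.NaiveLearner()
    res = Orange.evaluation.testing.cross_validation([bayes], data, folds=5)

    # split_by_iterations returns one ExperimentResults object per fold.
    for i, fold in enumerate(Orange.evaluation.scoring.split_by_iterations(res)):
        print "fold %d: CA = %.3f" % (i + 1, Orange.evaluation.scoring.CA(fold)[0])
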
.. _mt-scoring:

============
Multi-target
============

:doc:`Multi-target <Orange.multitarget>` classifiers predict values for
multiple target classes. They can be used with standard
:obj:`~Orange.evaluation.testing` procedures (e.g.
:obj:`~Orange.evaluation.testing.Evaluation.cross_validation`), but require
special scoring functions to compute a single score from the obtained
:obj:`~Orange.evaluation.testing.ExperimentResults`.
Since different targets can vary in importance depending on the experiment,
some methods accept options to indicate this, e.g. weights or customized
distance functions. These can also be used for normalization when target
values are not on the same scale.

.. autofunction:: mt_flattened_score
.. autofunction:: mt_average_score

The whole procedure of evaluating multi-target methods and computing
the scores (RMSE errors) is shown in the following example
(:download:`mt-evaluate.py <code/mt-evaluate.py>`). Because we consider
the first target to be more important and the last one less so, we
indicate this with appropriate weights.

.. literalinclude:: code/mt-evaluate.py

Which outputs::

    Weighted RMSE scores:
        Majority    0.8228
          MTTree    0.3949
             PLS    0.3021
           Earth    0.2880

==========================
Multi-label classification
==========================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \subseteq L`. Let :math:`H`
be a multi-label classifier and :math:`Z_i=H(x_i)` be the set of labels
predicted by :math:`H` for example :math:`x_i`.

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

The following script demonstrates the use of those evaluation measures:

.. literalinclude:: code/mlc-evaluate.py

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.