source: orange/docs/reference/rst/Orange.evaluation.scoring.rst @ 10784:01b5e2a07b00

.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index:: scoring

Scoring plays an integral role in the evaluation of any prediction model.
Orange implements various scores for the evaluation of classification,
regression and multi-label models. Most of the methods need to be called
with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.

.. literalinclude:: code/scoring-example.py

==============
Classification
==============

Calibration scores
==================
Many scores for the evaluation of classification models measure whether the
model assigns the correct class value to the test instances. Many of these
scores can be computed solely from the confusion matrix, constructed manually
with the :obj:`confusion_matrices` function. If the class variable has more
than two values, the index of the value for which the confusion matrix is
computed should be passed as well.

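A minimal sketch of building such a confusion matrix and scoring it is shown
below; the data set, learner and ``class_index`` are arbitrary choices made
for this illustration::

    import Orange

    # Cross-validate a single learner on a data set with a three-valued class.
    data = Orange.data.Table("iris")
    learners = [Orange.classification.bayes.NaiveLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # With more than two class values, pass the index of the target value.
    cm = Orange.evaluation.scoring.confusion_matrices(res, class_index=1)[0]

    print "CA:         ", Orange.evaluation.scoring.CA(res)[0]
    print "Sensitivity:", Orange.evaluation.scoring.Sensitivity(cm)
    print "Specificity:", Orange.evaluation.scoring.Specificity(cm)
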
.. autofunction:: CA
.. autofunction:: Sensitivity
.. autofunction:: Specificity
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: Precision
.. autofunction:: Recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square

Discriminatory scores
=====================
Scores that measure how well the prediction model can separate instances of
different classes are called discriminatory scores.

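A short sketch of computing such scores on cross-validation results follows;
the data set and learners are arbitrary choices made for this illustration::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.knn.kNNLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # Each function returns one score per learner, in the order given above.
    print "Brier score:", Orange.evaluation.scoring.Brier_score(res)
    print "AUC:        ", Orange.evaluation.scoring.AUC(res)
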
.. autofunction:: Brier_score

.. autofunction:: AUC
.. autofunction:: AUC_for_single_class
.. autofunction:: AUC_matrix
.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

.. autofunction:: confusion_matrices

.. autoclass:: ConfusionMatrix


Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

==========
Regression
==========

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py <code/statExamples.py>`) uses
most of the above measures to score several regression methods, producing the
following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

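For a single learner, these scores can be obtained directly from
cross-validation results; the learner and data set in this sketch are
arbitrary choices made for illustration::

    import Orange

    data = Orange.data.Table("housing")
    learners = [Orange.regression.linear.LinearRegressionLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    scoring = Orange.evaluation.scoring
    for name, score in [("MSE", scoring.MSE), ("RMSE", scoring.RMSE),
                        ("MAE", scoring.MAE), ("R2", scoring.R2)]:
        # Each scoring function returns a list with one value per learner.
        print name, score(res)[0]
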
==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman

=================
Utility Functions
=================

.. autofunction:: split_by_iterations


.. _mt-scoring:

============
Multi-target
============

:doc:`Multi-target <Orange.multitarget>` classifiers predict values for
multiple target classes. They can be used with standard
:obj:`~Orange.evaluation.testing` procedures (e.g.
:obj:`~Orange.evaluation.testing.Evaluation.cross_validation`), but require
special scoring functions to compute a single score from the obtained
:obj:`~Orange.evaluation.testing.ExperimentResults`.
Since different targets can vary in importance depending on the experiment,
some methods have options to indicate this, e.g. through weights or customized
distance functions. These can also be used for normalization when target
values are not on the same scale.

.. autofunction:: mt_flattened_score
.. autofunction:: mt_average_score

The whole procedure of evaluating multi-target methods and computing
the scores (RMSE errors) is shown in the following example
(:download:`mt-evaluate.py <code/mt-evaluate.py>`). Because we consider
the first target to be more important and the last one less so, we
indicate this with appropriate weights.

.. literalinclude:: code/mt-evaluate.py

The script outputs::

    Weighted RMSE scores:
        Majority    0.8228
          MTTree    0.3949
             PLS    0.3021
           Earth    0.2880

Two more accuracy measures, based on the article by Zaragoza et al. (2011),
are applicable to discrete classes:

Global accuracy (accuracy per example) over the d-dimensional class variable:

.. autofunction:: mt_global_accuracy

Mean accuracy (accuracy per class or per label) over d class variables:

.. autofunction:: mt_mean_accuracy

References
==========

Zaragoza, J.H., Sucar, L.E., Morales, E.F., Bielza, C., Larranaga, P. (2011). 'Bayesian Chain Classifiers for Multidimensional Classification', Proc. of the International Joint Conference on Artificial Intelligence (IJCAI-2011), pp. 2192-2197.

==========================
Multi-label classification
==========================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
:math:`(x_i, Y_i)`, :math:`i = 1..|D|`, :math:`Y_i \subseteq L`. Let :math:`H`
be a multi-label classifier and :math:`Z_i = H(x_i)` be the set of labels
predicted by :math:`H` for example :math:`x_i`.

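For orientation, the definitions most commonly used in the literature
(Godbole & Sarawagi, 2004; Schapire & Singer, 2000) read as follows in the
notation above; the functions below may differ in details, so consult their
docstrings for the exact variants implemented:

.. math::

    \mathrm{HammingLoss}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \,\triangle\, Z_i|}{|L|}

.. math::

    \mathrm{Accuracy}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|}, \quad
    \mathrm{Precision}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Z_i|}, \quad
    \mathrm{Recall}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i|}

where :math:`\triangle` denotes the symmetric difference of two sets.
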
.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

The following script demonstrates the use of these evaluation measures:

.. literalinclude:: code/mlc-evaluate.py

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification',
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.