.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index:: scoring

Scoring plays an integral role in the evaluation of any prediction model.
Orange implements various scores for the evaluation of classification,
regression and multi-label models. Most of the methods need to be called
with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.

.. literalinclude:: code/scoring-example.py

==============
Classification
==============

Calibration scores
==================
Most scores for evaluating classification models measure whether the model
assigns the correct class value to the test instances. Many of these scores
can be computed solely from the confusion matrix, constructed manually with
the :obj:`confusion_matrices` function. If the class variable has more than
two values, the index of the value for which the confusion matrix is computed
should be passed as well.
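
For illustration, a minimal sketch of this workflow (the data set, learner and
the ``class_index`` keyword are chosen for the example; the function
signatures below give the exact interface)::

    import Orange

    data = Orange.data.Table("iris")
    learners = [Orange.classification.bayes.NaiveLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # The class variable has three values, so we pass the index of the
    # value ("Iris-setosa") for which the confusion matrix is constructed.
    cm = Orange.evaluation.scoring.confusion_matrices(res, class_index=0)[0]
    print "sens: %.3f  spec: %.3f" % (
        Orange.evaluation.scoring.sens(cm),
        Orange.evaluation.scoring.spec(cm))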

.. autoclass:: CA
.. autofunction:: sens
.. autofunction:: spec
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: precision
.. autofunction:: recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square

Discriminatory scores
=====================
Scores that measure how well a prediction model can separate instances of
different classes are called discriminatory scores.

.. autofunction:: Brier_score

.. autoclass:: AUC
    :members: by_weighted_pairs, by_pairs,
              weighted_one_against_all, one_against_all, single_class, pair,
              matrix
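
As a minimal sketch, AUC is computed from test results like any other score
and returns one value per learner (the data set and learners here are
illustrative)::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.majority.MajorityLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)

    # One AUC per learner; "voting" has a binary class, so no averaging
    # over class pairs is needed.
    for name, auc in zip(["bayes", "majority"],
                         Orange.evaluation.scoring.AUC(res)):
        print "%8s: %.3f" % (name, auc)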

.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

.. autofunction:: confusion_matrices

.. autoclass:: ConfusionMatrix


Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

==========
Regression
==========

Several alternative measures, given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2
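
To make the definitions shown in the figure concrete, here is an illustrative
re-implementation of these scores in plain NumPy; this is a sketch of the
underlying formulas, not the Orange API itself::

    import numpy as np

    def regression_scores(y, y_hat):
        # Illustrative only; Orange computes these from ExperimentResults.
        y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
        err = y - y_hat
        dev = y - y.mean()
        mse = np.mean(err ** 2)           # mean squared error
        rmse = np.sqrt(mse)               # root mean squared error
        mae = np.mean(np.abs(err))        # mean absolute error
        rse = mse / np.mean(dev ** 2)     # relative squared error
        rrse = np.sqrt(rse)               # root relative squared error
        rae = mae / np.mean(np.abs(dev))  # relative absolute error
        r2 = 1 - rse                      # coefficient of determination
        return mse, rmse, mae, rse, rrse, rae, r2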

The following code (:download:`statExamples.py <code/statExamples.py>`) uses
most of the above measures to score several regression methods.

.. literalinclude:: code/statExamples.py

The code above produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman
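
A small sketch tying these functions to :obj:`graph_ranks` (the average ranks
and the number of data sets are invented for illustration)::

    import Orange

    names = ["first", "third", "second", "fourth"]
    avranks = [1.9, 3.2, 2.8, 3.3]   # average ranks of four methods
    N = 30                           # number of data sets the ranks come from

    # Friedman test statistics for the ranks (see the docstring above
    # for the exact return value).
    print Orange.evaluation.scoring.compute_friedman(avranks, N)

    # Critical difference for the Nemenyi post-hoc test at alpha = 0.05.
    cd = Orange.evaluation.scoring.compute_CD(avranks, N)
    Orange.evaluation.scoring.graph_ranks("ranks.png", avranks, names,
                                          cd=cd, width=6, textspace=1.5)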

=================
Utility Functions
=================

.. autofunction:: split_by_iterations


.. _mt-scoring:

============
Multi-target
============

:doc:`Multi-target <Orange.multitarget>` classifiers predict values for
multiple target classes. They can be used with standard
:obj:`~Orange.evaluation.testing` procedures (e.g.
:obj:`~Orange.evaluation.testing.Evaluation.cross_validation`), but require
special scoring functions to compute a single score from the obtained
:obj:`~Orange.evaluation.testing.ExperimentResults`.
Since different targets can vary in importance depending on the experiment,
some methods allow this to be indicated, e.g. through weights or customized
distance functions. These can also be used for normalization when the target
values are not on the same scale.

.. autofunction:: mt_flattened_score
.. autofunction:: mt_average_score
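
For example, a per-target score such as RMSE can be averaged over the
targets, optionally with weights. A minimal sketch (the data set and learner
follow the :doc:`multitarget <Orange.multitarget>` documentation; the weights
are invented)::

    import Orange

    data = Orange.data.Table("multitarget-synthetic")
    majority = Orange.multitarget.MultitargetLearner(
        Orange.classification.majority.MajorityLearner())
    res = Orange.evaluation.testing.cross_validation([majority], data)

    # Weighted average of per-target RMSE scores; the first target
    # counts the most, the last one the least.
    weights = [5, 2, 2, 1]
    print Orange.evaluation.scoring.mt_average_score(
        res, Orange.evaluation.scoring.RMSE, weights=weights)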

The whole procedure of evaluating multi-target methods and computing
the scores (RMSE errors) is shown in the following example
(:download:`mt-evaluate.py <code/mt-evaluate.py>`). Because we consider the
first target more important and the last one less so, we indicate this with
appropriate weights.

.. literalinclude:: code/mt-evaluate.py

Which outputs::

    Weighted RMSE scores:
        Majority    0.8228
          MTTree    0.3949
             PLS    0.3021
           Earth    0.2880

==========================
Multi-label classification
==========================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H`
be a multi-label classifier and :math:`Z_i=H(x_i)` be the set of labels
predicted by :math:`H` for example :math:`x_i`.
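
For orientation, the definitions commonly used in the literature (Schapire &
Singer, 2000; Godbole & Sarawagi, 2004; see the references below) read, in
this notation:

.. math::

    \mathrm{HammingLoss} = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \,\triangle\, Z_i|}{|L|}

.. math::

    \mathrm{Accuracy} = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|}, \qquad
    \mathrm{Precision} = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Z_i|}, \qquad
    \mathrm{Recall} = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i|}

where :math:`\\triangle` denotes the symmetric difference of two sets; the
functions below document the exact variants they implement.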

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

The following script demonstrates the use of those evaluation measures:

.. literalinclude:: code/mlc-evaluate.py

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.