.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index: scoring

Scoring plays an integral role in the evaluation of any prediction model. Orange
implements various scores for the evaluation of classification,
regression and multi-label models. Most of the methods need to be called
with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.

.. literalinclude:: code/scoring-example.py

==============
Classification
==============

Calibration scores
==================
Many scores for the evaluation of classification models measure whether the
model assigns the correct class value to the test instances. Most of these
scores can be computed solely from the confusion matrix, which can be constructed
with the :obj:`confusion_matrices` function. If the class variable has more than
two values, the index of the value for which to calculate the confusion matrix
should be passed as well.

.. autoclass:: CA
.. autofunction:: sens
.. autofunction:: spec
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: precision
.. autofunction:: recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square

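For readers without an :obj:`~Orange.evaluation.testing.ExperimentResults` at hand, the relationship between these scores and the cells of a binary confusion matrix can be sketched in plain Python. The counts below are hypothetical, and the computations are illustrative stand-ins for the module's own functions:

```python
import math

# Hypothetical cell counts of a binary confusion matrix:
# TP = true positives, FP = false positives,
# FN = false negatives, TN = true negatives.
TP, FP, FN, TN = 40, 10, 5, 45

ca   = (TP + TN) / float(TP + FP + FN + TN)  # classification accuracy (CA)
sens = TP / float(TP + FN)                   # sensitivity, also called recall
spec = TN / float(TN + FP)                   # specificity
ppv  = TP / float(TP + FP)                   # positive predictive value (precision)
npv  = TN / float(TN + FN)                   # negative predictive value
f1   = 2 * ppv * sens / (ppv + sens)         # harmonic mean of precision and recall
mcc  = (TP * TN - FP * FN) / math.sqrt(      # Matthews correlation coefficient
    float((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))

print("CA=%.3f sens=%.3f spec=%.3f PPV=%.3f NPV=%.3f F1=%.3f MCC=%.3f"
      % (ca, sens, spec, ppv, npv, f1, mcc))
```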
Discriminatory scores
=====================
Scores that measure how well the prediction model can separate instances of
different classes are called discriminatory scores.

.. autofunction:: Brier_score

.. autoclass:: AUC
    :members: by_weighted_pairs, by_pairs,
              weighted_one_against_all, one_against_all, single_class, pair,
              matrix

.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

.. autofunction:: confusion_matrices

.. autoclass:: ConfusionMatrix

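The pair-counting view of AUC that underlies :obj:`AUCWilcoxon` can be sketched directly: for a binary problem, AUC equals the probability that a randomly chosen positive instance receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The scores and labels below are made up for illustration:

```python
def auc_wilcoxon(scores, labels):
    """Pair-counting (Wilcoxon) estimate of AUC for binary labels 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = 0.0
    for p in pos:
        for n in neg:
            if p > n:          # positive ranked above negative
                pairs += 1.0
            elif p == n:       # ties contribute one half
                pairs += 0.5
    return pairs / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(auc_wilcoxon(scores, labels))  # 0.8125
```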
Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

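The statistic behind McNemar's test can be sketched in a few lines. Only the discordant counts matter: the number of instances misclassified by the first classifier but not the second, and vice versa. This uses the usual continuity-corrected form of the statistic (chi-square distributed with one degree of freedom); the counts are hypothetical:

```python
def mcnemar(e01, e10):
    """McNemar chi-square statistic with continuity correction.

    e01 -- instances misclassified by classifier A but not B
    e10 -- instances misclassified by classifier B but not A
    """
    return (abs(e01 - e10) - 1.0) ** 2 / (e01 + e10)

# (|12 - 4| - 1)^2 / 16 = 49 / 16
print(mcnemar(12, 4))  # 3.0625
```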
==========
Regression
==========

Several alternative measures, given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The following code (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to
score several regression methods:

.. literalinclude:: code/statExamples.py

The code produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002
    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526
    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748
    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715

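Under the usual definitions (with the relative measures taken against the mean predictor), the regression scores listed above can be sketched in plain Python. The toy values are made up; this is an illustration, not the module's implementation:

```python
import math

def regression_scores(actual, predicted):
    n = len(actual)
    mean = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # squared errors
    sae = sum(abs(a - p) for a, p in zip(actual, predicted))    # absolute errors
    sst = sum((a - mean) ** 2 for a in actual)  # squared error of mean predictor
    sat = sum(abs(a - mean) for a in actual)    # absolute error of mean predictor
    return {
        "MSE":  sse / n,
        "RMSE": math.sqrt(sse / n),
        "MAE":  sae / n,
        "RSE":  sse / sst,             # relative squared error
        "RRSE": math.sqrt(sse / sst),  # root relative squared error
        "RAE":  sae / sat,             # relative absolute error
        "R2":   1.0 - sse / sst,       # coefficient of determination
    }

s = regression_scores([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 8.5])
print(" ".join("%s=%.3f" % (k, s[k]) for k in ("MSE", "RMSE", "MAE", "RAE", "R2")))
```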
==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman

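The Friedman statistic compares :math:`k` methods over :math:`N` data sets via the average rank :math:`R_j` of each method. A minimal sketch of the usual formulation follows; the average ranks below are hypothetical, and this is not the module's own implementation:

```python
def friedman_chi2(avg_ranks, N):
    """Friedman chi-square statistic for k methods ranked over N data sets.

    avg_ranks -- average rank R_j of each method across the data sets
    """
    k = len(avg_ranks)
    return 12.0 * N / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)

# three methods ranked over 10 data sets
print(friedman_chi2([1.8, 2.0, 2.2], 10))
```

Large values of the statistic indicate that the methods' average ranks differ more than expected by chance.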
=================
Utility Functions
=================

.. autofunction:: split_by_iterations


.. _mt-scoring:

============
Multi-target
============

:doc:`Multi-target <Orange.multitarget>` classifiers predict values for
multiple target classes. They can be used with standard
:obj:`~Orange.evaluation.testing` procedures (e.g.
:obj:`~Orange.evaluation.testing.Evaluation.cross_validation`), but require special
scoring functions to compute a single score from the obtained
:obj:`~Orange.evaluation.testing.ExperimentResults`.

.. autofunction:: mt_flattened_score
.. autofunction:: mt_average_score

The whole procedure of evaluating multi-target methods and computing the scores
(RMSE errors) is shown in the following example (:download:`mt-evaluate.py <code/mt-evaluate.py>`):

.. literalinclude:: code/mt-evaluate.py

Which outputs::

    Weighted RMSE scores:
        Majority    0.8228
          MTTree    0.3949
             PLS    0.3021
           Earth    0.2880

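The two aggregation strategies differ in where the averaging happens: flattening pools the (actual, predicted) pairs of every target into one list before scoring, while averaging scores each target separately and then averages the per-target scores. A hedged sketch with RMSE as the underlying score, on made-up two-target data:

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# rows are instances, columns are the two targets
actual    = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0)]
predicted = [(1.5, 11.0), (2.0, 10.0), (2.5, 12.0)]

# flattened: one long list of target values, scored once
flat_a = [v for row in actual for v in row]
flat_p = [v for row in predicted for v in row]
flattened = rmse(flat_a, flat_p)

# averaged: score each target column separately, then take the mean
columns_a = zip(*actual)
columns_p = zip(*predicted)
averaged = sum(rmse(a, p) for a, p in zip(columns_a, columns_p)) / 2.0

print("flattened=%.4f averaged=%.4f" % (flattened, averaged))
```

The two values generally differ when the targets have different error magnitudes, as here.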
==========================
Multi-label classification
==========================

Multi-label classification requires different metrics than those used in
traditional single-label classification. This module presents the various
metrics that have been proposed in the literature. Let :math:`D` be a
multi-label evaluation data set, consisting of :math:`|D|` multi-label examples
:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \subseteq L`. Let :math:`H`
be a multi-label classifier and :math:`Z_i=H(x_i)` be the set of labels
predicted by :math:`H` for example :math:`x_i`.

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

The following script demonstrates the use of these evaluation measures:

.. literalinclude:: code/mlc-evaluate.py

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]

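With labels represented as Python sets, the example-based measures above can be sketched directly from the definitions of :math:`Y_i` and :math:`Z_i` (following the formulation of Godbole & Sarawagi, 2004). The label sets below are made up, and the function is an illustrative stand-in for the module's own:

```python
def mlc_scores(Y, Z, n_labels):
    """Example-based multi-label measures over true sets Y and predicted sets Z."""
    D = len(Y)
    # Hamming loss: fraction of label slots where Y_i and Z_i disagree
    hamming = sum(len(y ^ z) for y, z in zip(Y, Z)) / float(n_labels * D)
    # accuracy: Jaccard overlap of Y_i and Z_i, averaged over examples
    accuracy = sum(len(y & z) / float(len(y | z)) for y, z in zip(Y, Z)) / D
    # precision: correctly predicted labels among those predicted
    precision = sum(len(y & z) / float(len(z)) for y, z in zip(Y, Z)) / D
    # recall: correctly predicted labels among the true ones
    recall = sum(len(y & z) / float(len(y)) for y, z in zip(Y, Z)) / D
    return hamming, accuracy, precision, recall

Y = [{"a", "b"}, {"a"}, {"b", "c"}]       # true label sets
Z = [{"a"},      {"a"}, {"a", "b", "c"}]  # predicted label sets
print(mlc_scores(Y, Z, n_labels=3))
```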
References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.