
.. automodule:: Orange.evaluation.scoring

############################
Method scoring (``scoring``)
############################

.. index:: scoring

Scoring plays an integral role in the evaluation of any prediction model.
Orange implements various scores for the evaluation of classification,
regression and multi-label models. Most of the methods need to be called
with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.

.. literalinclude:: code/scoring-example.py

==============
Classification
==============

Calibration scores
==================
Many scores for evaluating classification models measure whether the model
assigns the correct class value to the test instances. Many of these scores
can be computed solely from the confusion matrix, constructed manually with
the :obj:`confusion_matrices` function. If the class variable has more than
two values, the index of the value for which to compute the confusion matrix
should be passed as well.

.. autoclass:: CA
.. autofunction:: sens
.. autofunction:: spec
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: precision
.. autofunction:: recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square

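For illustration, the sketch below computes a confusion matrix from
cross-validation results and passes it to several of the scores listed
above. It assumes the Orange 2.x API under Python 2 and the bundled
``voting`` data set; the choice of learner is arbitrary::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # CA is computed directly from the experiment results (one score per
    # learner); the threshold scores below take a confusion matrix instead.
    print "CA:          %.3f" % Orange.evaluation.scoring.CA(res)[0]

    cm = Orange.evaluation.scoring.confusion_matrices(res)[0]
    print "Sensitivity: %.3f" % Orange.evaluation.scoring.sens(cm)
    print "Specificity: %.3f" % Orange.evaluation.scoring.spec(cm)
    print "F1:          %.3f" % Orange.evaluation.scoring.F1(cm)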

Discriminatory scores
=====================
Scores that measure how well the prediction model can separate instances
with different classes are called discriminatory scores.

.. autofunction:: Brier_score

.. autoclass:: AUC
   :members: by_weighted_pairs, by_pairs,
      weighted_one_against_all, one_against_all, single_class, pair,
      matrix

.. autofunction:: AUCWilcoxon

.. autofunction:: compute_ROC

.. autofunction:: confusion_matrices

.. autoclass:: ConfusionMatrix

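As a sketch of typical usage (same assumptions as above: Orange 2.x under
Python 2 and the bundled ``voting`` data set), both :obj:`AUC` and
:obj:`Brier_score` return one value per learner::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.tree.TreeLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # One score per learner, in the order the learners were given.
    aucs = Orange.evaluation.scoring.AUC(res)
    briers = Orange.evaluation.scoring.Brier_score(res)
    for name, auc, brier in zip(["bayes", "tree"], aucs, briers):
        print "%-6s AUC: %.3f  Brier: %.3f" % (name, auc, brier)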

Comparison of Algorithms
========================

.. autofunction:: McNemar

.. autofunction:: McNemar_of_two

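A minimal sketch of comparing two classifiers this way (Orange 2.x under
Python 2; the data set and learners are illustrative, and
:obj:`McNemar_of_two` is assumed to take the results followed by the indices
of the two learners to compare)::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.tree.TreeLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # Chi-square statistic of McNemar's test for learners 0 and 1.
    chi2 = Orange.evaluation.scoring.McNemar_of_two(res, 0, 1)
    print "McNemar chi-square: %.3f" % chi2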

==========
Regression
==========

Several alternative measures, as given below, can be used to evaluate
the success of numeric prediction:

.. image:: files/statRegression.png

.. autofunction:: MSE

.. autofunction:: RMSE

.. autofunction:: MAE

.. autofunction:: RSE

.. autofunction:: RRSE

.. autofunction:: RAE

.. autofunction:: R2

The script :download:`statExamples.py <code/statExamples.py>` uses most of
the above measures to score several regression methods. It produces the
following output::

    Learner   MSE     RMSE   MAE    RSE    RRSE   RAE     R2
    maj       84.585  9.197  6.653  1.002  1.001  1.001  -0.002
    rt        40.015  6.326  4.592  0.474  0.688  0.691   0.526
    knn       21.248  4.610  2.870  0.252  0.502  0.432   0.748
    lr        24.092  4.908  3.425  0.285  0.534  0.515   0.715

==================
Plotting functions
==================

.. autofunction:: graph_ranks

The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot a graph:

.. literalinclude:: code/statExamplesGraphRanks.py

The code produces the following graph:

.. image:: files/statExamplesGraphRanks1.png

.. autofunction:: compute_CD

.. autofunction:: compute_friedman

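The critical difference shown in such plots can also be computed on its own.
A small sketch with made-up average ranks (four methods ranked across 30
data sets; the Nemenyi test at alpha 0.05 is assumed to be the default)::

    import Orange

    # Made-up average ranks of four methods across 30 data sets.
    avranks = [1.9, 3.2, 2.8, 3.3]
    cd = Orange.evaluation.scoring.compute_CD(avranks, 30)
    print "Critical difference: %.3f" % cd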

=================
Utility Functions
=================

.. autofunction:: split_by_iterations

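For example, per-fold scores can be obtained by splitting cross-validation
results into one :obj:`~Orange.evaluation.testing.ExperimentResults` per
iteration; a sketch under the same Orange 2.x assumptions as the examples
above::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)

    # One ExperimentResults object per cross-validation fold.
    folds = Orange.evaluation.scoring.split_by_iterations(res)
    for i, fold_res in enumerate(folds):
        print "fold %d: CA %.3f" % (i, Orange.evaluation.scoring.CA(fold_res)[0])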

.. _mt-scoring:

============
Multi-target
============

:doc:`Multi-target <Orange.multitarget>` classifiers predict values for
multiple target classes. They can be used with standard
:obj:`~Orange.evaluation.testing` procedures (e.g.
:obj:`~Orange.evaluation.testing.Evaluation.cross_validation`), but require
special scoring functions to compute a single score from the obtained
:obj:`~Orange.evaluation.testing.ExperimentResults`.

.. autofunction:: mt_flattened_score
.. autofunction:: mt_average_score

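As a sketch of the intended call pattern, a single-target score such as
:obj:`RMSE` is passed as the scoring function. The snippet assumes Orange
2.x under Python 2, the bundled ``multitarget-synthetic`` data set and a
majority learner wrapped by :obj:`Orange.multitarget.MultitargetLearner`::

    import Orange

    data = Orange.data.Table("multitarget-synthetic")
    learners = [Orange.multitarget.MultitargetLearner(
        Orange.classification.majority.MajorityLearner())]
    res = Orange.evaluation.testing.cross_validation(learners, data)

    # Average the per-target RMSE values into a single score per learner.
    scores = Orange.evaluation.scoring.mt_average_score(
        res, Orange.evaluation.scoring.RMSE)
    print "Average RMSE: %.4f" % scores[0]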

The whole procedure of evaluating multi-target methods and computing the scores
(RMSE errors) is shown in the following example (:download:`mt-evaluate.py <code/mt-evaluate.py>`):

.. literalinclude:: code/mt-evaluate.py

Which outputs::

    Weighted RMSE scores:
        Majority  0.8228
        MTTree    0.3949
        PLS       0.3021
        Earth     0.2880

159 | ========================== |

160 | Multi-label classification |

161 | ========================== |

162 | |

163 | Multi-label classification requires different metrics than those used in |

164 | traditional single-label classification. This module presents the various |

165 | metrics that have been proposed in the literature. Let :math:`D` be a |

166 | multi-label evaluation data set, conisting of :math:`|D|` multi-label examples |

167 | :math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` |

168 | be a multi-label classifier and :math:`Z_i=H(x_i)` be the set of labels |

169 | predicted by :math:`H` for example :math:`x_i`. |

.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall

The following script demonstrates the use of these evaluation measures:

.. literalinclude:: code/mlc-evaluate.py

The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]


References
==========

Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification',
Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.

Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification',
in Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD 2004).

Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization',
Machine Learning, vol. 39, no. 2/3, pp. 135-168.
