.. source: orange/docs/reference/rst/Orange.classification.logreg.rst @ 9818:2ec8ecdb81e5
.. Revision 9818:2ec8ecdb81e5, 9.1 KB, checked in by Matija Polajnar <matija.polajnar@…>, 2 years ago

.. automodule:: Orange.classification.logreg

.. index: logistic regression
.. index:
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_
is a statistical classification method that fits data to a logistic
function. Orange's implementation of the algorithm can handle various
anomalies in features, such as constant variables and singularities, that
could make direct fitting of logistic regression almost impossible.
Stepwise logistic regression, which iteratively selects the most
informative features, is also supported.
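
The logistic function maps any real-valued combination of features into a
probability between 0 and 1. The following stdlib-only sketch evaluates a
hypothetical one-feature model; the coefficients ``beta0`` and ``beta1``
are invented for illustration and do not come from any real model::

    import math

    def logistic(t):
        """The logistic (sigmoid) function: maps any real t into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-t))

    # Hypothetical one-feature model, for illustration only.
    beta0, beta1 = -1.5, 0.8
    for x in (0.0, 2.0, 5.0):
        print("x=%.1f  P(class=1)=%.3f" % (x, logistic(beta0 + beta1 * x)))

Fitting amounts to choosing the coefficients so that these probabilities
match the observed classes as closely as possible.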


.. autoclass:: LogRegLearner
   :members:

.. class :: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    regression coefficients and their significances, and uses them to
    predict classes and class probabilities.


    .. attribute :: beta

        Estimated regression coefficients.

    .. attribute :: beta_se

        Estimated standard errors of the regression coefficients.

    .. attribute :: wald_Z

        Wald Z statistics for the beta coefficients, computed as
        beta/beta_se.

    .. attribute :: P

        List of P-values for the beta coefficients, that is, the
        probabilities, under the null hypothesis that a coefficient is
        0.0, of observing a value at least as extreme as the estimated
        one. Each probability is computed from the squared Wald Z
        statistic, which follows a chi-square distribution with one
        degree of freedom.
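
        This computation can be reproduced with the standard library
        alone, since the chi-square tail probability for one degree of
        freedom equals ``erfc(|z| / sqrt(2))``. The coefficient and
        standard error below are invented for illustration::

            import math

            def wald_p_value(beta, beta_se):
                """Two-sided P-value for H0: beta = 0, via the squared
                Wald Z statistic (chi-square with 1 degree of freedom)."""
                z = beta / beta_se
                return math.erfc(abs(z) / math.sqrt(2.0))

            # Invented numbers, for illustration only.
            print("P = %.4g" % wald_p_value(0.86, 0.16))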


    .. attribute :: likelihood

        The probability of the sample (i.e. the learning examples)
        observed on the basis of the derived model, as a function of the
        regression parameters.

    .. attribute :: fit_status

        Tells how the model fitting ended: either regularly
        (:obj:`LogRegFitter.OK`), or it was interrupted because one of
        the beta coefficients escaped towards infinity
        (:obj:`LogRegFitter.Infinity`), or because the values did not
        converge (:obj:`LogRegFitter.Divergence`). The value indicates
        the classifier's reliability; the classifier itself is usable in
        either case.


    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
            :class:`~Orange.classification.Classifier.GetProbabilities` or
            :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
            :class:`~Orange.statistics.distribution.Distribution`, or a
            tuple with both



.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic fitters.
    It defines the form of the call operator and the constants denoting
    the (un)success of fitting:


    .. attribute:: OK

        The fitter converged to the optimal fit.

    .. attribute:: Infinity

        The fitter failed because one or more beta coefficients escaped
        towards infinity.

    .. attribute:: Divergence

        The beta coefficients failed to converge, but none escaped
        towards infinity.

    .. attribute:: Constant

        A constant attribute caused the matrix to be singular.

    .. attribute:: Singularity

        The matrix is singular.



    .. method:: __call__(examples, weight_id)

        Performs the fitting. There are two possible outcomes: either
        the fitter found a set of beta coefficients (although possibly
        with difficulties), or the fitting failed altogether. The two
        cases return different results.

        `(status, beta, beta_se, likelihood)`
            The fitter managed to fit the model. The first element of
            the tuple, status, reports any problems that occurred; it
            can be :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In
            the latter two cases the returned values may still be useful
            for making predictions, but it is recommended that you
            inspect the coefficients and their errors before deciding
            whether to use the model.

        `(status, attribute)`
            The fitter failed, and the returned attribute is responsible
            for the failure. The type of failure is reported in status,
            which can be either :obj:`Constant` or :obj:`Singularity`.

        The proper way of calling the fitter is to expect and handle all
        the situations described above. For instance, if fitter is an
        instance of some fitter and examples contains a set of suitable
        examples, a script should look like this::

            res = fitter(examples)
            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
                status, beta, beta_se, likelihood = res
                # proceed by doing something with what you got
            else:
                status, attr = res
                # remove the attribute, or complain to the user, or ...



.. class :: LogRegFitter_Cholesky

    The sole fitter available at the moment. It is a C++ translation of
    `Alan Miller's logistic regression code
    <http://users.bigpond.net.au/amiller/>`_, which uses the
    Newton-Raphson algorithm to iteratively minimize the least squares
    error computed from the learning examples.
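
The shape of the algorithm can be illustrated with a stdlib-only toy
fitter for a single feature plus intercept. This is only a sketch of the
Newton-Raphson iteration (each step solves a small weighted
least-squares system); it has none of the divergence and singularity
checks of the real fitter, and the data are invented::

    import math

    def fit_logistic(xs, ys, n_iter=20):
        """Newton-Raphson fit of P(y=1|x) = 1/(1 + exp(-(b0 + b1*x)))."""
        b0 = b1 = 0.0
        for _ in range(n_iter):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for x, y in zip(xs, ys):
                p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
                g0 += y - p              # gradient of the log-likelihood
                g1 += (y - p) * x
                w = p * (1.0 - p)        # weight of this example
                h00 += w                 # entries of the (negative) Hessian
                h01 += w * x
                h11 += w * x * x
            det = h00 * h11 - h01 * h01
            b0 += (h11 * g0 - h01 * g1) / det   # Newton step (2x2 solve)
            b1 += (h00 * g1 - h01 * g0) / det
        return b0, b1

    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
    ys = [0,   0,   0,   1,   0,   1,   1,   1]
    b0, b1 = fit_logistic(xs, ys)
    print("b0=%.2f  b1=%.2f" % (b0, b1))

On this toy data the fitted slope is positive, since the positive class
dominates at large x.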



.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

.. autofunction:: dump


Examples
--------


The first example shows a very simple induction of a logistic regression
classifier (:download:`logreg-run.py <code/logreg-run.py>`).

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute      beta  st. error     wald Z          P OR=exp(beta)

        Intercept     -1.23       0.08     -15.15      -0.00
     status=first      0.86       0.16       5.39       0.00       2.36
    status=second     -0.16       0.18      -0.91       0.36       0.85
     status=third     -0.92       0.15      -6.12       0.00       0.40
        age=child      1.06       0.25       4.30       0.00       2.89
       sex=female      2.42       0.14      17.04       0.00      11.25
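
The last column is simply the exponentiated coefficient: exp(beta) is
the factor by which the odds of the positive class are multiplied when
the corresponding indicator attribute is set. For example, for
status=first::

    import math

    beta = 0.86                          # status=first, from the table above
    print("OR = %.2f" % math.exp(beta))  # prints: OR = 2.36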


The next example shows how to handle singularities in data sets
(:download:`logreg-singularities.py <code/logreg-singularities.py>`).

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute      beta  st. error     wald Z          P OR=exp(beta)

                               Intercept      6.62      -0.00       -inf       0.00
                                     age     -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt     -0.00       0.00       -inf       0.00       1.00
                           education-num     -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced      4.29       0.00        inf       0.00      72.62
            marital-status=Never-married      3.79       0.00        inf       0.00      44.45
                marital-status=Separated      3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed      3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent      3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse      4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support     -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier would return an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

We can see that the attribute workclass is causing the singularity.
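
The reason a constant attribute is fatal: its column in the design
matrix is a multiple of the intercept column, so the matrix the fitter
must invert is singular. A stdlib-only sketch with toy numbers::

    intercept = [1.0, 1.0, 1.0]
    constant  = [5.0, 5.0, 5.0]   # a constant attribute

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # X^T X for the design matrix X = [intercept | constant]
    xtx = [[dot(intercept, intercept), dot(intercept, constant)],
           [dot(constant, intercept),  dot(constant, constant)]]

    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    print(det)   # prints: 0.0

A zero determinant means the normal equations solved during fitting
have no unique solution, which is exactly the condition the
:obj:`Constant` status reports.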


The example below shows how stepwise logistic regression can improve
classification performance
(:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8
