# source: orange/docs/reference/rst/Orange.statistics.estimate.rst @ 10246:11b418321f79

Revision 10246:11b418321f79, 14.1 KB checked in by janezd <janez.demsar@…>, 2 years ago |
---|

Line | |
---|---|

1 | .. automodule:: Orange.statistics.estimate |

2 | |

3 | .. index:: Probability Estimation |

4 | |

5 | ======================================= |

6 | Probability Estimation (``estimate``) |

7 | ======================================= |

8 | |

9 | Probability estimators compute probabilities of values of the class variable. |

10 | They come in two flavours: |

11 | |

12 | #. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a |

13 | class) and |

14 | |

15 | #. for conditional probabilities (:math:`p(C=c|V=v)`, |

16 | where :math:`v` is a feature value). |

17 | |

18 | A duality much like the one between learners and classifiers exists between |

19 | probability estimator constructors and probability estimators: when a |

20 | probability estimator constructor is called with data, it constructs a |

21 | probability estimator that can then be called with a value of the class variable |

22 | to obtain the probability of that value. This duality is mainly needed to |

23 | enable probability estimation for continuous variables, |

24 | where it is not possible to generate a list of probabilities of all possible |

25 | values in advance. |
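
The constructor/estimator duality described above can be illustrated with a minimal plain-Python sketch. It does not use Orange; the class names ``RelativeFrequencyConstructor`` and ``FrequencyEstimator`` are hypothetical stand-ins, and a plain list of class values stands in for the data.

```python
from collections import Counter


class FrequencyEstimator:
    """Hypothetical estimator: called with a class value, returns p(C=c)."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        if value is None:
            # No value given: return a copy of all precomputed probabilities.
            return dict(self.probabilities)
        # Values never seen in the data get probability 0 in this sketch.
        return self.probabilities.get(value, 0.0)


class RelativeFrequencyConstructor:
    """Hypothetical constructor: called with data, returns an estimator."""

    def __call__(self, class_values):
        counts = Counter(class_values)
        total = sum(counts.values())
        return FrequencyEstimator({c: n / total for c, n in counts.items()})


constructor = RelativeFrequencyConstructor()
estimator = constructor(["yes", "yes", "no", "yes"])  # "learning" step
print(estimator("yes"))  # probability of a single class value
print(estimator())       # probabilities of all class values
```

The split mirrors learners and classifiers: the constructor sees the data once, and the returned estimator answers queries afterwards.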

26 | |

27 | First, probability estimation constructors for common probability estimation |

28 | techniques are enumerated. Base classes, knowledge of which is needed to |

29 | develop new techniques, are described later in this document. |

30 | |

31 | Probability Estimation Constructors |

32 | =================================== |

33 | |

34 | .. class:: RelativeFrequency |

35 | |

36 | Bases: :class:`EstimatorConstructor` |

37 | |

38 | Compute distribution using relative frequencies of classes. |

39 | |

40 | :rtype: :class:`EstimatorFromDistribution` |

41 | |

42 | .. class:: Laplace |

43 | |

44 | Bases: :class:`EstimatorConstructor` |

45 | |

46 | Use Laplace estimation to compute distribution from frequencies of classes: |

47 | |

48 | .. math:: |

49 | |

50 | p(c) = \frac{N_c+1}{N+n} |

51 | |

52 | where :math:`N_c` is the number of occurrences of an event (e.g. the number of |

53 | instances in class :math:`c`), :math:`N` is the total number of events (instances) |

54 | and :math:`n` is the number of different events (classes). |

55 | |

56 | :rtype: :class:`EstimatorFromDistribution` |
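
As a worked illustration of the Laplace formula above, the following sketch computes the estimate from a plain list of class values. It is not Orange's implementation; the function name ``laplace_estimate`` and its arguments are assumptions made for this example.

```python
from collections import Counter


def laplace_estimate(class_values, classes):
    """Return {class: (Nc + 1) / (N + n)} for every class in `classes`."""
    counts = Counter(class_values)
    total = len(class_values)   # N: total number of instances
    n_classes = len(classes)    # n: number of different classes
    return {c: (counts[c] + 1) / (total + n_classes) for c in classes}


# Class "c" never occurs in the data, yet it gets a non-zero probability.
probs = laplace_estimate(["a", "a", "a", "b"], classes=["a", "b", "c"])
for cls in sorted(probs):
    print(cls, probs[cls])
```

With three occurrences of ``a``, one of ``b`` and none of ``c``, the estimates are 4/7, 2/7 and 1/7, which still sum to 1.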

57 | |

58 | .. class:: M |

59 | |

60 | Bases: :class:`EstimatorConstructor` |

61 | |

62 | .. method:: __init__(m) |

63 | |

64 | :param m: Parameter for m-estimation. |

65 | :type m: int |

66 | |

67 | Use m-estimation to compute distribution from frequencies of classes: |

68 | |

69 | .. math:: |

70 | |

71 | p(c) = \frac{N_c + m \cdot ap(c)}{N+m} |

72 | |

73 | where :math:`N_c` is the number of occurrences of an event (e.g. the number of |

74 | instances in class :math:`c`), :math:`N` is the total number of events (instances) |

75 | and :math:`ap(c)` is the prior probability of event (class) :math:`c`. |

76 | |

77 | :rtype: :class:`EstimatorFromDistribution` |
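
The m-estimate formula above can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, not Orange's code; the function name ``m_estimate`` and its parameter names are hypothetical.

```python
def m_estimate(count, total, prior, m):
    """Return (Nc + m * ap(c)) / (N + m) for a single class."""
    return (count + m * prior) / (total + m)


# Class observed 3 times among 4 instances, with prior probability 0.5.
print(m_estimate(3, 4, prior=0.5, m=0))  # m = 0 gives the relative frequency
print(m_estimate(3, 4, prior=0.5, m=2))  # larger m pulls towards the prior

# With a uniform prior 1/n and m = n, the formula reduces to the
# Laplace estimate (Nc + 1) / (N + n); here n = 3.
print(m_estimate(3, 4, prior=1 / 3, m=3))
```

The parameter ``m`` controls how strongly the prior is weighted against the observed frequencies.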

78 | |

79 | .. class:: Kernel |

80 | |

81 | Bases: :class:`EstimatorConstructor` |

82 | |

83 | .. method:: __init__(min_impact, smoothing, n_points) |

84 | |

85 | :param min_impact: A requested minimal weight of a point (default: |

86 | 0.01); points with lower weights won't be taken into account. |

87 | :type min_impact: float |

88 | |

89 | :param smoothing: Smoothing factor (default: 1.144). |

90 | :type smoothing: float |

91 | |

92 | :param n_points: Number of points for the interpolating curve. If |

93 | negative, say -3 (default), 3 points will be inserted between each |

94 | two adjacent data points. |

95 | :type n_points: int |

96 | |

97 | Compute probabilities for a continuous variable at a certain number of points |

98 | using Gaussian kernels. The resulting point-wise continuous distribution is |

99 | stored as :class:`~Orange.statistics.distribution.Continuous`. |

100 | |

101 | Probabilities are always computed at all points that |

102 | are present in the data (i.e. the existing values of the continuous |

103 | feature). If :obj:`n_points` is positive and greater than the |

104 | number of existing data points, additional points are inserted |

105 | between the existing points to achieve the required number of |

106 | points. An approximately equal number of new points is inserted between |

107 | each pair of adjacent existing points. If :obj:`n_points` is |

108 | negative, its absolute value determines the number of points to be added |

109 | between each two data points. |

110 | |

111 | :rtype: :class:`EstimatorFromDistribution` |
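
The point placement for a negative :obj:`n_points` and the Gaussian kernel evaluation can be sketched as follows. This is an illustration under stated assumptions, not Orange's algorithm: the helper names are hypothetical, the bandwidth is passed in directly rather than derived from ``smoothing``, and ``min_impact`` is ignored.

```python
import math


def evaluation_points(values, n_points=-3):
    """Existing values plus |n_points| evenly spaced points in every gap."""
    if n_points >= 0:
        raise ValueError("this sketch covers only the negative n_points case")
    per_gap = -n_points
    xs = sorted(set(values))
    points = []
    for left, right in zip(xs, xs[1:]):
        points.append(left)
        step = (right - left) / (per_gap + 1)
        points.extend(left + step * i for i in range(1, per_gap + 1))
    points.append(xs[-1])
    return points


def gaussian_kernel_density(x, values, bandwidth):
    """Average of Gaussian kernels centred on the data values."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm


data = [1.0, 2.0, 4.0]
grid = evaluation_points(data, n_points=-3)
curve = [(x, gaussian_kernel_density(x, data, bandwidth=0.5)) for x in grid]
print(grid)
```

The resulting list of (value, density) pairs plays the role of the point-wise continuous distribution mentioned above.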

112 | |

113 | .. class:: Loess |

114 | |

115 | Bases: :class:`EstimatorConstructor` |

116 | |

117 | .. method:: __init__(window_proportion, n_points) |

118 | |

119 | :param window_proportion: A proportion of points in a window. |

120 | :type window_proportion: float |

121 | |

122 | :param n_points: Number of points for the interpolating curve. If |

123 | negative, say -3 (default), 3 points will be inserted between each |

124 | two adjacent data points. |

125 | :type n_points: int |

126 | |

127 | Prepare a probability estimator that computes probability at point ``x`` |

128 | as weighted local regression of probabilities for points in the window |

129 | around this point. |

130 | |

131 | The window contains a prescribed proportion of original data points. The |

132 | window is as symmetric as possible in the sense that the leftmost point in |

133 | the window is approximately as far from ``x`` as the rightmost. The |

134 | number of points to the left of ``x`` might thus differ from the number |

135 | of points to the right. |

136 | |

137 | Points are weighted by a tri-cube weight function; the weight of a point |

138 | at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is |

139 | :math:`(x-x')/h` and :math:`h` is the distance to the farther |

140 | of the two window edge points. |

141 | |

142 | :rtype: :class:`EstimatorFromDistribution` |
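
The weight function above is easy to verify numerically. The sketch below is plain Python with a hypothetical function name; returning zero for points at or beyond the window edge is an assumption, since the text only defines the weight inside the window.

```python
def loess_weight(x, x_prime, h):
    """Weight (1 - |t|^3)^3 of a point at x_prime, with t = (x - x_prime) / h."""
    t = (x - x_prime) / h
    if abs(t) >= 1:
        # Assumption: points at or beyond the window edge get zero weight.
        return 0.0
    return (1 - abs(t) ** 3) ** 3


print(loess_weight(0.0, 0.0, h=1.0))  # point at x itself: full weight
print(loess_weight(0.0, 0.5, h=1.0))  # halfway to the window edge
print(loess_weight(0.0, 1.0, h=1.0))  # on the window edge
```

The weight falls smoothly from 1 at ``x`` to 0 at distance ``h``, so nearby points dominate the local regression.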

143 | |

144 | |

145 | .. class:: ConditionalLoess |

146 | |

147 | Bases: :class:`ConditionalEstimatorConstructor` |

148 | |

149 | .. method:: __init__(window_proportion, n_points) |

150 | |

151 | :param window_proportion: A proportion of points in a window. |

152 | :type window_proportion: float |

153 | |

154 | :param n_points: Number of points for the interpolating curve. If |

155 | negative, say -3 (default), 3 points will be inserted between each |

156 | two adjacent data points. |

157 | :type n_points: int |

158 | |

159 | Construct a conditional probability estimator, in other aspects |

160 | similar to the one constructed by :class:`Loess`. |

161 | |

162 | :rtype: :class:`ConditionalEstimatorFromDistribution` |

163 | |

164 | |

165 | Base Classes |

166 | ============= |

167 | |

168 | All probability estimators are derived from two base classes: one for |

169 | unconditional and the other for conditional probability estimation. The same |

170 | is true for probability estimator constructors. |

171 | |

172 | .. class:: EstimatorConstructor |

173 | |

174 | Constructor of an unconditional probability estimator. |

175 | |

176 | .. method:: __call__([distribution[, prior]], [instances[, weight_id]]) |

177 | |

178 | :param distribution: input distribution. |

179 | :type distribution: :class:`~Orange.statistics.distribution.Distribution` |

180 | |

181 | :param prior: prior distribution. |

182 | :type prior: :class:`~Orange.statistics.distribution.Distribution` |

183 | |

184 | :param instances: input data. |

185 | :type instances: :class:`Orange.data.Table` |

186 | |

187 | :param weight_id: ID of the weight attribute. |

188 | :type weight_id: int |

189 | |

190 | If a distribution is given, it can be followed by a prior class |

191 | distribution. Similarly, instances can be followed by |

192 | the ID of the meta attribute with instance weights. (Hint: to pass a |

193 | prior distribution and instances, but no distribution, |

194 | just pass :obj:`None` for the distribution.) When both |

195 | distribution and instances are given, it is up to the constructor to |

196 | decide what to use. |

197 | |

198 | .. class:: Estimator |

199 | |

200 | .. attribute:: supports_discrete |

201 | |

202 | Tells whether the estimator can handle discrete attributes. |

203 | |

204 | .. attribute:: supports_continuous |

205 | |

206 | Tells whether the estimator can handle continuous attributes. |

207 | |

208 | .. method:: __call__([value]) |

209 | |

210 | If value is given, return the probability of the value. |

211 | |

212 | :rtype: float |

213 | |

214 | If the value is omitted, an attempt is made |

215 | to return a distribution of probabilities for all values. |

216 | |

217 | :rtype: :class:`~Orange.statistics.distribution.Distribution` |

218 | (usually :class:`~Orange.statistics.distribution.Discrete` for |

219 | discrete and :class:`~Orange.statistics.distribution.Continuous` |

220 | for continuous) or :obj:`NoneType` |

221 | |

222 | .. class:: ConditionalEstimatorConstructor |

223 | |

224 | Constructor of a conditional probability estimator. |

225 | |

226 | .. method:: __call__([table[, prior]], [instances[, weight_id]]) |

227 | |

228 | :param table: input contingency table. |

229 | :type table: :class:`Orange.statistics.contingency.Table` |

230 | |

231 | :param prior: prior distribution. |

232 | :type prior: :class:`~Orange.statistics.distribution.Distribution` |

233 | |

234 | :param instances: input data. |

235 | :type instances: :class:`Orange.data.Table` |

236 | |

237 | :param weight_id: ID of the weight attribute. |

238 | :type weight_id: int |

239 | |

240 | If a contingency table is given, it can be followed by a prior class |

241 | distribution. Similarly, instances can be followed by |

242 | the ID of the meta attribute with instance weights. (Hint: to pass a |

243 | prior distribution and instances, but no table, |

244 | just pass :obj:`None` for the table.) When both the |

245 | table and instances are given, it is up to the constructor to |

246 | decide what to use. |

247 | |

248 | .. class:: ConditionalEstimator |

249 | |

250 | As a counterpart of :class:`Estimator`, this estimator can return |

251 | conditional probabilities. |

252 | |

253 | .. method:: __call__([[value,] condition_value]) |

254 | |

255 | When given two values, it returns the probability :math:`p(value|condition)`. |

256 | |

257 | :rtype: float |

258 | |

259 | When given only one value, it is interpreted as condition; the estimator |

260 | attempts to return a distribution of conditional probabilities for all |

261 | values. |

262 | |

263 | :rtype: :class:`~Orange.statistics.distribution.Distribution` |

264 | (usually :class:`~Orange.statistics.distribution.Discrete` for |

265 | discrete and :class:`~Orange.statistics.distribution.Continuous` |

266 | for continuous) or :obj:`NoneType` |

267 | |

268 | When called without arguments, it returns a |

269 | matrix containing probabilities :math:`p(value|condition)` for each |

270 | possible :math:`value` and :math:`condition` (a contingency table); |

271 | condition is used as outer |

272 | variable. |

273 | |

274 | :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType` |

275 | |

276 | If the estimator cannot return precomputed distributions and/or |

277 | contingencies, it returns :obj:`None`. |

278 | |

279 | Common Components |

280 | ================= |

281 | |

282 | .. class:: EstimatorFromDistribution |

283 | |

284 | Bases: :class:`Estimator` |

285 | |

286 | Probability estimator constructors that compute probabilities for all |

287 | values in advance return this estimator with calculated |

288 | quantities in the :obj:`probabilities` attribute. |

289 | |

290 | .. attribute:: probabilities |

291 | |

292 | A precomputed list of probabilities. |

293 | |

294 | .. method:: __call__([value]) |

295 | |

296 | If value is given, return the probability of the value. For discrete |

297 | variables, every value has an entry in the :obj:`probabilities` |

298 | attribute. For continuous variables, a linear interpolation between |

299 | two nearest points is used to compute the probability. |

300 | |

301 | :rtype: float |

302 | |

303 | If the value is omitted, a copy of :obj:`probabilities` is returned. |

304 | |

305 | :rtype: :class:`~Orange.statistics.distribution.Distribution` |

306 | (usually :class:`~Orange.statistics.distribution.Discrete` for |

307 | discrete and :class:`~Orange.statistics.distribution.Continuous` |

308 | for continuous). |
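
For continuous variables, the lookup described above amounts to linear interpolation between the two nearest precomputed points. The following is a minimal sketch, not Orange's code: the function name ``interpolate`` and the list-of-pairs representation are hypothetical, and returning 0 outside the covered range is an assumption.

```python
from bisect import bisect_left


def interpolate(points, x):
    """points: list of (value, probability) pairs sorted by value."""
    xs = [value for value, _ in points]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return points[i][1]          # x is one of the precomputed points
    if i == 0 or i == len(xs):
        return 0.0                   # assumption: zero outside the range
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


points = [(1.0, 0.2), (2.0, 0.4), (4.0, 0.0)]
print(interpolate(points, 2.0))  # exact hit on a stored point
print(interpolate(points, 1.5))  # halfway between the first two points
print(interpolate(points, 3.0))  # halfway between the last two points
```

For discrete variables no interpolation is needed, because every value has its own entry.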

309 | |

310 | .. class:: ConditionalEstimatorFromDistribution |

311 | |

312 | Bases: :class:`ConditionalEstimator` |

313 | |

314 | Probability estimator constructors that compute the whole |

315 | contingency table (:class:`Orange.statistics.contingency.Table`) of |

316 | conditional probabilities in advance |

317 | return this estimator with the table in the :obj:`probabilities` attribute. |

318 | |

319 | .. attribute:: probabilities |

320 | |

321 | A precomputed contingency table. |

322 | |

323 | .. method:: __call__([[value,] condition_value]) |

324 | |

325 | For a detailed description of the handling of different combinations of |

326 | parameters, see the inherited :obj:`ConditionalEstimator.__call__`. |

327 | For behaviour with continuous variable distributions, |

328 | see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`. |

329 | |

330 | .. class:: ConditionalByRows |

331 | |

332 | Bases: :class:`ConditionalEstimatorConstructor` |

333 | |

334 | .. attribute:: estimator_constructor |

335 | |

336 | An unconditional probability estimator constructor. |

337 | |

338 | Computes a conditional probability estimator using |

339 | an unconditional probability estimator constructor. The result |

340 | can be of type :class:`ConditionalEstimatorFromDistribution` |

341 | or :class:`ConditionalEstimatorByRows`, depending on the type of |

342 | constructor. |

343 | |

344 | .. method:: __call__([table[, prior]], [instances[, weight_id]], estimator) |

345 | |

346 | :param table: input contingency table. |

347 | :type table: :class:`Orange.statistics.contingency.Table` |

348 | |

349 | :param prior: prior distribution. |

350 | :type prior: :class:`~Orange.statistics.distribution.Distribution` |

351 | |

352 | :param instances: input data. |

353 | :type instances: :class:`Orange.data.Table` |

354 | |

355 | :param weight_id: ID of the weight attribute. |

356 | :type weight_id: int |

357 | |

358 | :param estimator: unconditional probability estimator constructor. |

359 | :type estimator: :class:`EstimatorConstructor` |

360 | |

361 | Compute the contingency matrix if it has not been computed already. Then |

362 | call :obj:`estimator_constructor` for each value of the condition attribute. |

363 | If all constructed estimators can return a distribution of probabilities |

364 | for all classes (usually either all or none can), the |

365 | :class:`~Orange.statistics.distribution.Distribution` instances are put |

366 | in a contingency table |

367 | and :class:`ConditionalEstimatorFromDistribution` |

368 | is constructed and returned. If the constructed estimators are |

369 | not capable of returning a distribution of probabilities, |

370 | a :class:`ConditionalEstimatorByRows` is constructed and the |

371 | estimators are stored in its :obj:`estimator_list`. |

372 | |

373 | :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows` |
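
The row-by-row construction described above can be sketched in plain Python. This is only an illustration of the idea: the function names are hypothetical, a list of (condition, class) pairs stands in for the contingency table, and the fallback to :class:`ConditionalEstimatorByRows` is not modelled.

```python
from collections import Counter, defaultdict


def relative_frequency(class_values):
    """Unconditional constructor stand-in: returns {class: probability}."""
    counts = Counter(class_values)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def conditional_by_rows(pairs, estimator_constructor):
    """Apply an unconditional constructor to each condition's class values."""
    rows = defaultdict(list)
    for condition, class_value in pairs:
        rows[condition].append(class_value)
    # One unconditional estimate per condition value (one per table row).
    return {cond: estimator_constructor(vals) for cond, vals in rows.items()}


pairs = [("sunny", "yes"), ("sunny", "no"), ("rainy", "no"), ("rainy", "no")]
table = conditional_by_rows(pairs, relative_frequency)
print(table["sunny"]["yes"])  # p(yes | sunny)
print(table["rainy"]["no"])   # p(no | rainy)
```

Any unconditional constructor with the same call shape could be passed in place of ``relative_frequency``.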

374 | |

375 | .. class:: ConditionalEstimatorByRows |

376 | |

377 | Bases: :class:`ConditionalEstimator` |

378 | |

379 | A conditional probability estimator that uses a series |

380 | of estimators, one for each possible condition, |

381 | stored in its :obj:`estimator_list` attribute. |

382 | |

383 | .. attribute:: estimator_list |

384 | |

385 | A list of estimators; one for each value of :obj:`condition`. |

386 | |

387 | .. method:: __call__([[value,] condition_value]) |

388 | |

389 | Uses estimators from :obj:`estimator_list`, |

390 | depending on the given ``condition_value``. |

391 | For a detailed description of the handling of different combinations of |

392 | parameters, see the inherited :obj:`ConditionalEstimator.__call__`. |

393 |
