source: orange/docs/reference/rst/Orange.statistics.estimate.rst @ 11335:33ee32ba99fe

Revision 11335:33ee32ba99fe, 14.1 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago (diff)

Fixed param type documentation in 'Orange.statistics.estimate'.

.. automodule:: Orange.statistics.estimate

.. index:: Probability Estimation

=======================================
Probability Estimation (``estimate``)
=======================================

Probability estimators compute probabilities of values of the class variable.
They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
   class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`,
   where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between
probability estimator constructors and probability estimators: when a
probability estimator constructor is called with data, it constructs a
probability estimator that can then be called with a value of the class
variable to obtain the probability of that value. This duality is mainly
needed to enable probability estimation for continuous variables,
where it is not possible to generate a list of probabilities of all possible
values in advance.

Probability estimator constructors for common probability estimation
techniques are listed first. The base classes, which you need to know in order
to develop new techniques, are described later in this document.
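
The constructor/estimator duality can be illustrated with a minimal,
library-independent sketch. The names ``RelativeFrequencySketch`` and
``EstimatorSketch`` are hypothetical and are not part of
``Orange.statistics.estimate``; plain Python lists and dictionaries stand in
for Orange data tables and distributions.

```python
from collections import Counter


class EstimatorSketch:
    """Hypothetical estimator: maps a class value to its probability."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        # Without a value, return a copy of all precomputed probabilities.
        if value is None:
            return dict(self.probabilities)
        # Assumption of this sketch: unseen values get probability 0.
        return self.probabilities.get(value, 0.0)


class RelativeFrequencySketch:
    """Hypothetical constructor: called with data, returns an estimator."""

    def __call__(self, class_values):
        counts = Counter(class_values)
        total = sum(counts.values())
        return EstimatorSketch({c: n / total for c, n in counts.items()})


constructor = RelativeFrequencySketch()
estimator = constructor(["yes", "yes", "no", "yes"])  # "learning" step
print(estimator("yes"))  # 0.75
```

The two-step call mirrors the learner/classifier pattern: the constructor
sees the data once, and the returned estimator can then be queried repeatedly.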

Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute distribution using relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. the number
    of instances in class :math:`c`), :math:`N` is the total number of events
    (instances) and :math:`n` is the number of different events (classes).

    :rtype: :class:`EstimatorFromDistribution`
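
As a worked illustration of the Laplace formula, the following sketch computes
the estimate directly from a list of class labels. The function
``laplace_estimate`` is a hypothetical helper written for this example; it
does not use or reproduce the Orange implementation.

```python
from collections import Counter


def laplace_estimate(class_values, classes):
    """Return {class: (count + 1) / (N + n)} for every class in `classes`."""
    counts = Counter(class_values)
    total = len(class_values)      # N: total number of instances
    n_classes = len(classes)       # n: number of different classes
    return {c: (counts[c] + 1) / (total + n_classes) for c in classes}


# Three instances of "a", one of "b", none of "c".
probs = laplace_estimate(["a", "a", "a", "b"], classes=["a", "b", "c"])
for name in ("a", "b", "c"):
    print(name, probs[name])
```

Here class ``c`` never occurs in the data, yet it receives the non-zero
probability 1/7; avoiding zero estimates for unseen classes is the usual
motivation for Laplace estimation.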

.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c+m \cdot ap(c)}{N+m}

    where :math:`N_c` is the number of occurrences of an event (e.g. the number
    of instances in class :math:`c`), :math:`N` is the total number of events
    (instances) and :math:`ap(c)` is the prior probability of event (class)
    :math:`c`.

    :rtype: :class:`EstimatorFromDistribution`
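
The effect of the parameter ``m`` can be seen in a small numeric sketch of the
formula above. The function ``m_estimate`` and its argument names are
hypothetical and only restate the formula; they are not the Orange API.

```python
def m_estimate(count, total, prior, m):
    """m-estimate of a class probability.

    count -- occurrences of the class, total -- number of instances,
    prior -- prior probability of the class, m -- the m parameter.
    """
    return (count + m * prior) / (total + m)


# 3 of 4 instances belong to the class; the prior probability is 0.5.
for m in (0, 2, 100):
    print(m, m_estimate(3, 4, 0.5, m))
```

With ``m = 0`` the estimate equals the relative frequency (0.75); as ``m``
grows, the estimate moves towards the prior (0.5).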

.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default:
            0.01); points with lower weights won't be taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Compute probabilities of a continuous variable at a certain number of
    points using Gaussian kernels. The resulting point-wise continuous
    distribution is stored as
    :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that
    are present in the data (i.e. the existing values of the continuous
    feature). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. Approximately the same number of new points is inserted between
    each pair of adjacent existing points. If :obj:`n_points` is
    negative, its absolute value determines the number of points to be added
    between each two data points.

    :rtype: :class:`EstimatorFromDistribution`
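
The handling of a negative ``n_points`` can be sketched as follows. The
function ``evaluation_points`` is hypothetical, and the even spacing of the
inserted points is an assumption of this sketch: the text above states only
how many points are added, not where they are placed.

```python
def evaluation_points(data_points, n_points=-3):
    """Return data points plus |n_points| points between each adjacent pair.

    Only the negative-n_points case is covered; inserted points are spaced
    evenly, which is an assumption of this sketch.
    """
    if n_points >= 0:
        raise ValueError("this sketch only covers negative n_points")
    xs = sorted(set(data_points))
    if not xs:
        return []
    k = -n_points
    points = []
    for left, right in zip(xs, xs[1:]):
        points.append(left)
        step = (right - left) / (k + 1)
        points.extend(left + step * i for i in range(1, k + 1))
    points.append(xs[-1])
    return points


print(evaluation_points([0.0, 4.0, 6.0]))
# [0.0, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0]
```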

.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Prepare a probability estimator that computes probability at point ``x``
    as weighted local regression of probabilities for points in the window
    around this point.

    The window contains a prescribed proportion of original data points. The
    window is as symmetric as possible in the sense that the leftmost point in
    the window is approximately as far from ``x`` as the rightmost. The
    number of points to the left of ``x`` might thus differ from the number
    of points to the right.

    Points are weighted by the tricube weight function; the weight of a point
    at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is
    :math:`(x-x')/h` and :math:`h` is the distance to the farther
    of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`
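
The weight function used by the local regression can be written out as a
short sketch. ``loess_weight`` is a hypothetical name, and returning zero for
points at or beyond the window edge is an assumption of this example; inside
the window the expression is exactly the formula given above.

```python
def loess_weight(x, x_prime, h):
    """Weight (1 - |t|^3)^3 of the point x_prime when estimating at x.

    h is the distance from x to the farther of the two window edge points.
    """
    if h <= 0:
        raise ValueError("h must be positive")
    t = abs((x - x_prime) / h)
    if t >= 1:
        return 0.0  # assumption: no influence at or beyond the window edge
    return (1 - t ** 3) ** 3


print(loess_weight(0.0, 0.0, 1.0))  # 1.0
print(loess_weight(0.0, 0.5, 1.0))  # 0.669921875
print(loess_weight(0.0, 1.0, 1.0))  # 0.0
```

The weight is 1 at ``x`` itself and falls smoothly to 0 at the farther window
edge, so nearby points dominate the local fit.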

.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Construct a conditional probability estimator that is otherwise
    similar to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`

Base Classes
============

All probability estimators are derived from two base classes: one for
unconditional and the other for conditional probability estimation. The same
is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, prior]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If ``distribution`` is given, it can be followed by the prior class
        distribution. Similarly, ``instances`` can be followed by the ID of
        the meta attribute with instance weights. (Hint: to pass a prior
        distribution and instances, but no distribution, pass :obj:`None` in
        place of the distribution.) When both distribution and instances are
        given, it is up to the constructor to decide which to use.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If value is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made
        to return a distribution of probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, prior]], [instances[, weight_id]])

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If ``table`` is given, it can be followed by the prior class
        distribution. Similarly, ``instances`` can be followed by the ID of
        the meta attribute with instance weights. (Hint: to pass a prior
        distribution and instances, but no table, pass :obj:`None` in place
        of the table.) When both table and instances are given, it is up to
        the constructor to decide which to use.

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability
        :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as the condition; the
        estimator attempts to return a distribution of conditional
        probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

        When called without arguments, it returns a
        matrix containing probabilities :math:`p(value|condition)` for each
        possible :math:`value` and :math:`condition` (a contingency table);
        the condition is used as the outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns :obj:`None`.
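
The three calling conventions described above can be summarised in a small
sketch. ``ConditionalEstimatorSketch`` is a hypothetical class; a nested
dictionary indexed first by condition (the outer variable) and then by value
stands in for the contingency table.

```python
class ConditionalEstimatorSketch:
    """Hypothetical conditional estimator over a precomputed table."""

    def __init__(self, table):
        # table[condition][value] holds p(value | condition)
        self.table = table

    def __call__(self, *args):
        if len(args) == 2:           # (value, condition) -> one probability
            value, condition = args
            return self.table[condition][value]
        if len(args) == 1:           # (condition,) -> distribution over values
            (condition,) = args
            return dict(self.table[condition])
        if not args:                 # () -> the whole table
            return {c: dict(row) for c, row in self.table.items()}
        raise TypeError("expected at most two arguments")


est = ConditionalEstimatorSketch({
    "sunny": {"yes": 0.25, "no": 0.75},
    "rainy": {"yes": 1.0, "no": 0.0},
})
print(est("no", "sunny"))  # 0.75
print(est("rainy"))        # {'yes': 1.0, 'no': 0.0}
```

Note the argument order: the value comes first and the condition last, so a
single argument is always read as the condition.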

Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all
    values in advance return this estimator with calculated
    quantities in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    .. method:: __call__([value])

        If value is given, return the probability of the value. For discrete
        variables, every value has an entry in the :obj:`probabilities`
        attribute. For continuous variables, a linear interpolation between
        two nearest points is used to compute the probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous).
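
For continuous variables, the lookup by linear interpolation between the two
nearest precomputed points might look like the sketch below. The function
``interpolate`` is hypothetical, the precomputed distribution is represented
as a sorted list of ``(x, probability)`` pairs, and returning zero outside
the covered range is an assumption of this example rather than documented
behaviour.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Linearly interpolate the probability at x.

    points -- list of (x, probability) pairs sorted by x.
    """
    xs = [px for px, _ in points]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return points[i][1]          # exact hit on a precomputed point
    if i == 0 or i == len(xs):
        return 0.0                   # assumption: zero outside the range
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


curve = [(0.0, 0.25), (2.0, 0.75), (4.0, 0.25)]
print(interpolate(curve, 1.0))  # 0.5
print(interpolate(curve, 2.0))  # 0.75
```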

.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole
    contingency table (:class:`Orange.statistics.contingency.Table`) of
    conditional probabilities in advance
    return this estimator with the table in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For detailed description of handling of different combinations of
        parameters, see the inherited :obj:`ConditionalEstimator.__call__`.
        For behaviour with continuous variable distributions,
        see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    Computes a conditional probability estimator using
    an unconditional probability estimator constructor. The result
    can be of type :class:`ConditionalEstimatorFromDistribution`
    or :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    .. method:: __call__([table[, prior]], [instances[, weight_id]], estimator)

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already.
        Then call :obj:`estimator_constructor` for each value of the condition
        attribute. If all constructed estimators can return a distribution of
        probabilities for all classes (usually either all or none can), the
        :class:`~Orange.statistics.distribution.Distribution` instances are put
        in a contingency table
        and a :class:`ConditionalEstimatorFromDistribution`
        is constructed and returned. If the constructed estimators are
        not capable of returning a distribution of probabilities,
        a :class:`ConditionalEstimatorByRows` is constructed and the
        estimators are stored in its :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`
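
The row-by-row construction described above can be sketched without Orange as
follows. All names here (``relative_frequency``, ``conditional_by_rows``, and
the ``"from_distribution"`` / ``"by_rows"`` labels) are hypothetical, and a
dictionary mapping each condition value to the class labels observed under it
stands in for the contingency matrix.

```python
from collections import Counter


def relative_frequency(class_values):
    """Hypothetical unconditional constructor returning an estimator."""
    counts = Counter(class_values)
    total = sum(counts.values())
    probabilities = {c: n / total for c, n in counts.items()}

    def estimator(value=None):
        if value is None:
            return dict(probabilities)   # distribution over all classes
        return probabilities.get(value, 0.0)

    return estimator


def conditional_by_rows(rows, estimator_constructor):
    """Build one unconditional estimator per condition value.

    rows -- {condition value: list of class labels seen with it}.
    """
    estimators = {c: estimator_constructor(vals) for c, vals in rows.items()}
    distributions = {c: est() for c, est in estimators.items()}
    if all(d is not None for d in distributions.values()):
        # Every estimator can list its probabilities: keep only the table.
        return "from_distribution", distributions
    # Otherwise keep the estimators themselves, one per condition.
    return "by_rows", estimators


rows = {"sunny": ["yes", "no", "no", "no"], "rainy": ["yes", "yes"]}
kind, result = conditional_by_rows(rows, relative_frequency)
print(kind, result["sunny"]["no"])  # from_distribution 0.75
```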

.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that itself uses a series
    of estimators, one for each possible condition,
    stored in its :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`,
        depending on the given ``condition_value``.
        For detailed description of handling of different combinations of
        parameters, see the inherited :obj:`ConditionalEstimator.__call__`.