source: orange/Orange/statistics/estimate.py @ 10101:5318f864a842

Revision 10101:5318f864a842, 15.2 KB checked in by Matija Polajnar <matija.polajnar@…>, 2 years ago (diff)

Rewrite Orange.statistics.estimate documentation.

"""
.. index:: Probability Estimation

=======================================
Probability Estimation (``estimate``)
=======================================

Probability estimators compute probabilities of values of the class variable.
They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
   class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`,
   where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between
probability estimator constructors and probability estimators: when a
probability estimator constructor is called with data, it constructs a
probability estimator that can then be called with a value of the class
variable to obtain the probability of that value. This duality is mainly
needed to enable probability estimation for continuous variables,
where it is not possible to generate a list of probabilities of all possible
values in advance.

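The constructor/estimator duality can be illustrated with a minimal
pure-Python sketch. The class names ``RelativeFrequencyConstructor`` and
``DictEstimator`` are hypothetical stand-ins invented for this example; they
are not part of Orange and only mimic the calling convention described above.

```python
from collections import Counter


class DictEstimator(object):
    # Hypothetical estimator: calling it with a value returns the
    # probability of that value; calling it without arguments returns
    # a copy of all precomputed probabilities.
    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        if value is None:
            return dict(self.probabilities)
        return self.probabilities.get(value, 0.0)


class RelativeFrequencyConstructor(object):
    # Hypothetical constructor: calling it with data builds an estimator.
    def __call__(self, values):
        counts = Counter(values)
        total = float(sum(counts.values()))
        return DictEstimator(dict((v, n / total) for v, n in counts.items()))


constructor = RelativeFrequencyConstructor()
estimator = constructor(["yes", "yes", "no", "yes"])
p_yes = estimator("yes")  # 3 of the 4 observations
p_no = estimator("no")    # 1 of the 4 observations
```

The constructor sees the data once, while the returned estimator can be
queried repeatedly for individual values.
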
First, probability estimation constructors for common probability estimation
techniques are enumerated. Base classes, knowledge of which is needed to
develop new techniques, are described later in this document.

Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute distribution using relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute the distribution from frequencies of
    classes:

    .. math::

        p(c) = \\frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class c), :math:`N` is the total number of events
    (instances) and :math:`n` is the number of different events (classes).

    :rtype: :class:`EstimatorFromDistribution`

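As a worked illustration of the Laplace formula, the following sketch applies
it to a plain dictionary of class counts. The function name
``laplace_estimate`` and the dictionary representation are assumptions made
for this example and do not reflect how Orange stores distributions.

```python
def laplace_estimate(counts):
    # counts maps each class c to its number of occurrences.
    total = sum(counts.values())   # N: total number of instances
    n_classes = len(counts)        # n: number of different classes
    return dict((c, (n_c + 1.0) / (total + n_classes))
                for c, n_c in counts.items())


laplace_probs = laplace_estimate({"a": 3, "b": 0, "c": 1})
```

With these counts, :math:`N=4` and :math:`n=3`, so class ``b``, which never
occurs in the data, still receives a non-zero probability of 1/7.
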
.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute the distribution from frequencies of classes:

    .. math::

        p(c) = \\frac{N_c+m \\cdot ap(c)}{N+m}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class c), :math:`N` is the total number of events
    (instances) and :math:`ap(c)` is the prior probability of event (class) c.

    :rtype: :class:`EstimatorFromDistribution`

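The m-estimate formula can be checked by hand with a short sketch. The
function ``m_estimate`` and its dictionary arguments are hypothetical and
serve only to illustrate the arithmetic.

```python
def m_estimate(counts, prior, m):
    # counts maps each class c to its number of occurrences; prior maps
    # each class c to its prior probability ap(c).
    total = sum(counts.values())  # N: total number of instances
    return dict((c, (n_c + m * prior[c]) / float(total + m))
                for c, n_c in counts.items())


counts = {"a": 3, "b": 1}
uniform_prior = {"a": 0.5, "b": 0.5}
m_probs = m_estimate(counts, uniform_prior, m=2)
unsmoothed = m_estimate(counts, uniform_prior, m=0)
```

With ``m=0`` the formula reduces to relative frequencies; larger values of
``m`` pull the estimate towards the prior.
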
.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default:
            0.01); points with lower weights won't be taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Compute probabilities for a continuous variable at a certain number of
    points using Gaussian kernels. The resulting point-wise continuous
    distribution is stored as
    :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that
    are present in the data (i.e. the existing values of the continuous
    feature). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. Approximately the same number of new points is inserted between
    each pair of adjacent existing points. If :obj:`n_points` is
    negative, its absolute value determines the number of points to be added
    between each two adjacent data points.

    :rtype: :class:`EstimatorFromDistribution`

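The effect of a negative ``n_points`` can be sketched as follows. The helper
``insert_points`` is hypothetical, and the even spacing of the inserted points
is an assumption of this sketch; the text above only states how many points
are added, not where they are placed.

```python
def insert_points(data_points, n_points):
    # Covers only the negative n_points case: add abs(n_points) new
    # points between each two adjacent distinct data points.
    if n_points >= 0:
        raise ValueError("this sketch handles negative n_points only")
    per_gap = -n_points
    points = sorted(set(data_points))
    if not points:
        return []
    grid = []
    for left, right in zip(points, points[1:]):
        grid.append(left)
        step = (right - left) / float(per_gap + 1)
        for i in range(1, per_gap + 1):
            grid.append(left + i * step)  # assumed: evenly spaced
    grid.append(points[-1])
    return grid


grid = insert_points([0.0, 4.0, 6.0], -3)
```

Three data points with the default of -3 give two gaps of three new points
each, so the curve is evaluated at nine points in total.
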
.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Prepare a probability estimator that computes probability at point ``x``
    as weighted local regression of probabilities for points in the window
    around this point.

    The window contains a prescribed proportion of original data points. The
    window is as symmetric as possible in the sense that the leftmost point in
    the window is approximately as far from ``x`` as the rightmost. The
    number of points to the left of ``x`` might thus differ from the number
    of points to the right.

    Points are weighted by the tricube weight function; the weight of a point
    at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is
    :math:`(x-x')/h` and :math:`h` is the distance to the farther
    of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`

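The weight function :math:`(1-|t|^3)^3` used by :class:`Loess` can be written
out as a small sketch. The function name ``loess_weight`` is hypothetical, and
returning zero for points outside the window is an assumption of this sketch
rather than something stated above.

```python
def loess_weight(x, x_prime, h):
    # t is the distance between x and x_prime, scaled by h, the distance
    # from x to the farther of the two window edge points.
    t = (x - x_prime) / float(h)
    if abs(t) >= 1.0:
        return 0.0  # assumed: points outside the window get no weight
    return (1.0 - abs(t) ** 3) ** 3


w_center = loess_weight(2.0, 2.0, 4.0)  # t = 0
w_half = loess_weight(2.0, 4.0, 4.0)    # t = -0.5
w_edge = loess_weight(2.0, 6.0, 4.0)    # t = -1
```

The weight falls smoothly from 1 at ``x`` itself to 0 at the window edge.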

.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Construct a conditional probability estimator, in other aspects
    similar to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`


Base Classes
============

All probability estimators are derived from two base classes: one for
unconditional and the other for conditional probability estimation. The same
is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, apriori]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If a distribution is given, it can be followed by the prior class
        distribution. Similarly, instances can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no distribution,
        just pass :obj:`None` in place of the distribution.) When both
        distribution and instances are given, it is up to the constructor to
        decide which to use.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If value is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made
        to return a distribution of probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]])

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If a contingency table is given, it can be followed by the prior
        class distribution. Similarly, instances can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no contingency table,
        just pass :obj:`None` in place of the table.) When both
        the table and instances are given, it is up to the constructor to
        decide which to use.

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability
        :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as the condition; the
        estimator attempts to return a distribution of conditional
        probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

        When called without arguments, it returns a
        matrix containing probabilities :math:`p(value|condition)` for each
        possible :math:`value` and :math:`condition` (a contingency table);
        the condition is used as the outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns :obj:`None`.

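The three calling modes of a conditional estimator can be mimicked in plain
Python. The class ``TableConditionalEstimator`` is a hypothetical stand-in
that stores :math:`p(value|condition)` in nested dictionaries with the
condition as the outer key; it is not the Orange implementation.

```python
class TableConditionalEstimator(object):
    # probabilities[condition][value] holds p(value|condition).
    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, *args):
        if len(args) == 2:
            # Two values: the probability of value given condition.
            value, condition = args
            return self.probabilities[condition][value]
        if len(args) == 1:
            # One value: it is the condition; return the whole row.
            return dict(self.probabilities[args[0]])
        if not args:
            # No arguments: return a copy of the whole table.
            return dict((c, dict(row))
                        for c, row in self.probabilities.items())
        raise TypeError("expected at most two arguments")


cond_estimator = TableConditionalEstimator({
    "sunny": {"play": 0.8, "stay": 0.2},
    "rainy": {"play": 0.3, "stay": 0.7},
})
p_play_given_rainy = cond_estimator("play", "rainy")
row_sunny = cond_estimator("sunny")
whole_table = cond_estimator()
```

Note the argument order: the value comes first and the condition second,
matching the signature ``__call__([[value,] condition_value])``.
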
Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all
    values in advance return this estimator with calculated
    quantities in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    .. method:: __call__([value])

        If value is given, return the probability of the value. For discrete
        variables, every value has an entry in the :obj:`probabilities`
        attribute. For continuous variables, linear interpolation between
        the two nearest points is used to compute the probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous).

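Linear interpolation between the two nearest precomputed points, as used for
continuous variables, can be sketched as follows. The function
``interpolate`` and its list arguments are hypothetical, and clamping to the
end points for values outside the precomputed range is an assumption of this
sketch.

```python
from bisect import bisect_left


def interpolate(points, probabilities, x):
    # points must be sorted in ascending order; probabilities[i] is the
    # precomputed probability at points[i].
    if x <= points[0]:
        return probabilities[0]   # assumed: clamp below the range
    if x >= points[-1]:
        return probabilities[-1]  # assumed: clamp above the range
    i = bisect_left(points, x)    # points[i - 1] < x <= points[i]
    x0, x1 = points[i - 1], points[i]
    p0, p1 = probabilities[i - 1], probabilities[i]
    return p0 + (p1 - p0) * (x - x0) / float(x1 - x0)


xs = [0.0, 2.0, 4.0]
ps = [0.1, 0.5, 0.3]
p_at_1 = interpolate(xs, ps, 1.0)  # halfway between 0.1 and 0.5
p_at_3 = interpolate(xs, ps, 3.0)  # halfway between 0.5 and 0.3
```

A value that coincides with a precomputed point simply receives the stored
probability of that point.
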
.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole
    contingency table (:class:`Orange.statistics.contingency.Table`) of
    conditional probabilities in advance
    return this estimator with the table in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.
        For behaviour with continuous variable distributions,
        see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    Computes a conditional probability estimator using
    an unconditional probability estimator constructor. The result
    can be of type :class:`ConditionalEstimatorFromDistribution`
    or :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]], estimator)

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already.
        Then call :obj:`estimator_constructor` for each value of the
        condition attribute.
        If all constructed estimators can return a distribution of
        probabilities for all classes (usually either all or none can), the
        :class:`~Orange.statistics.distribution.Distribution` instances are put
        in a contingency table
        and a :class:`ConditionalEstimatorFromDistribution`
        is constructed and returned. If the constructed estimators are
        not capable of returning a distribution of probabilities,
        a :class:`ConditionalEstimatorByRows` is constructed and the
        estimators are stored in its :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`

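The row-by-row procedure described for :class:`ConditionalByRows` can be
outlined in plain Python. All names below (``relative_frequency``,
``opaque_constructor``, ``conditional_by_rows``) are hypothetical, the
contingency table is modelled as nested dictionaries, and the returned tag
merely indicates which of the two result types would be built; none of this
is the Orange implementation.

```python
def relative_frequency(counts):
    # Stand-in unconditional constructor whose estimators can return a
    # full distribution when called without a value.
    total = float(sum(counts.values()))
    probs = dict((v, n / total) for v, n in counts.items())

    def estimator(value=None):
        return dict(probs) if value is None else probs.get(value, 0.0)
    return estimator


def opaque_constructor(counts):
    # Stand-in constructor whose estimators cannot return a distribution.
    inner = relative_frequency(counts)

    def estimator(value=None):
        return None if value is None else inner(value)
    return estimator


def conditional_by_rows(contingency, estimator_constructor):
    # One unconditional estimator per value of the condition attribute.
    estimators = dict((cond, estimator_constructor(row))
                      for cond, row in contingency.items())
    rows = dict((cond, est()) for cond, est in estimators.items())
    if all(row is not None for row in rows.values()):
        return "from_distribution", rows  # precomputed table
    return "by_rows", estimators          # keep the estimators instead


contingency = {"sunny": {"yes": 1, "no": 3}, "rainy": {"yes": 2, "no": 2}}
kind_a, table = conditional_by_rows(contingency, relative_frequency)
kind_b, per_row = conditional_by_rows(contingency, opaque_constructor)
```

In the second call the estimators cannot produce distributions, so they are
kept and queried individually for each condition.
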
.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that uses a series
    of estimators, one for each possible condition,
    stored in its :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators, one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`,
        depending on the given ``condition_value``.
        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.

"""

import Orange
from Orange.core import ProbabilityEstimator as Estimator
from Orange.core import ProbabilityEstimator_FromDistribution as EstimatorFromDistribution
from Orange.core import ProbabilityEstimatorConstructor as EstimatorConstructor
from Orange.core import ProbabilityEstimatorConstructor_Laplace as Laplace
from Orange.core import ProbabilityEstimatorConstructor_kernel as Kernel
from Orange.core import ProbabilityEstimatorConstructor_loess as Loess
from Orange.core import ProbabilityEstimatorConstructor_m as M
from Orange.core import ProbabilityEstimatorConstructor_relative as RelativeFrequency
from Orange.core import ConditionalProbabilityEstimator as ConditionalEstimator
from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution
from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor as ConditionalEstimatorConstructor
from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess