source: orange/docs/reference/rst/Orange.statistics.estimate.rst @ 11337:02feeae55f5f

Revision 11337:02feeae55f5f, 15.7 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago (diff)

Added some clarification and a code example to the estimator documentation.

.. automodule:: Orange.statistics.estimate

.. index:: Probability Estimation

=====================================
Probability Estimation (``estimate``)
=====================================

Probability estimators compute probabilities of values of the class variable.
They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
   class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`,
   where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between
probability estimator constructors and probability estimators: when a
probability estimator constructor is called with data, it constructs a
probability estimator that can then be called with a value of the class
variable to obtain the probability of that value. This duality is mainly
needed to enable probability estimation for continuous variables,
where it is not possible to generate a list of probabilities of all possible
values in advance.
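
The constructor/estimator duality can be illustrated without Orange. The
following is a minimal sketch in plain Python; ``RelativeFrequencyConstructor``
and ``FrequencyEstimator`` are hypothetical names chosen for this illustration
and are not the classes documented on this page.

```python
from collections import Counter


class FrequencyEstimator(object):
    """Hypothetical estimator: callable that returns stored probabilities."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        # Without an argument, return a copy of the whole distribution.
        if value is None:
            return dict(self.probabilities)
        # With an argument, return the probability of that value.
        return self.probabilities.get(value, 0.0)


class RelativeFrequencyConstructor(object):
    """Hypothetical constructor: called with data, returns an estimator."""

    def __call__(self, values):
        counts = Counter(values)
        total = float(sum(counts.values()))
        probabilities = dict((v, n / total) for v, n in counts.items())
        return FrequencyEstimator(probabilities)


constructor = RelativeFrequencyConstructor()
estimator = constructor(["a", "a", "b", "c"])  # step 1: build from data
p_a = estimator("a")                           # step 2: query a value
```

The constructor plays the role of a learner and the estimator the role of a
classifier: the first call consumes data, the second answers queries.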

First, probability estimator constructors for common probability estimation
techniques are enumerated. Base classes, knowledge of which is needed to
develop new techniques, are described later in this document.

Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute distribution using relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. the number
    of instances in class c), :math:`N` is the total number of events
    (instances) and :math:`n` is the number of different events (classes).

    :rtype: :class:`EstimatorFromDistribution`
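
To make the difference between relative frequencies and the Laplace estimate
concrete, here is a small standalone sketch of both formulas. It works on a
plain list of class counts rather than on Orange distributions, and the
function names are assumptions made for this example.

```python
def relative_frequency(counts):
    """p(c) = N_c / N for each class count N_c."""
    total = float(sum(counts))
    return [c / total for c in counts]


def laplace(counts):
    """p(c) = (N_c + 1) / (N + n), with n the number of classes."""
    total = float(sum(counts))
    n = len(counts)
    return [(c + 1) / (total + n) for c in counts]


counts = [3, 0, 1]               # occurrences of three classes
rel = relative_frequency(counts)
lap = laplace(counts)
```

The class that never occurs gets probability zero under relative frequencies
but a small positive probability under the Laplace estimate.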

.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c + m \cdot ap(c)}{N+m}

    where :math:`N_c` is the number of occurrences of an event (e.g. the number
    of instances in class c), :math:`N` is the total number of events
    (instances) and :math:`ap(c)` is the prior probability of event (class) c.

    :rtype: :class:`EstimatorFromDistribution`
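
The m-estimate formula can be checked by hand with a short sketch. The code
below is plain Python operating on lists of counts and prior probabilities; it
does not call Orange, and ``m_estimate`` is a hypothetical helper name.

```python
def m_estimate(counts, prior, m):
    """p(c) = (N_c + m * ap(c)) / (N + m) for each class."""
    total = float(sum(counts))
    return [(c + m * p) / (total + m) for c, p in zip(counts, prior)]


counts = [3, 0, 1]
uniform_prior = [1 / 3.0] * 3

# m = 0 gives plain relative frequencies.
no_smoothing = m_estimate(counts, uniform_prior, m=0)

# With a uniform prior and m equal to the number of classes,
# the m-estimate coincides with the Laplace estimate.
like_laplace = m_estimate(counts, uniform_prior, m=3)
```

Larger values of ``m`` pull the estimate further towards the prior.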

.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default:
            0.01); points with lower weights won't be taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Compute probabilities for a continuous variable at a certain number of
    points using Gaussian kernels. The resulting point-wise continuous
    distribution is stored as
    :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that
    are present in the data (i.e. the existing values of the continuous
    feature). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. Approximately the same number of new points is inserted between
    each pair of adjacent existing points. If :obj:`n_points` is
    negative, its absolute value determines the number of points to be added
    between each two data points.

    :rtype: :class:`EstimatorFromDistribution`
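
The two ideas described above, choosing the evaluation points and summing
Gaussian kernels, can be sketched as follows. This is an illustration only:
the equal spacing of inserted points, the ``bandwidth`` parameter and both
function names are assumptions, and the way Orange derives the kernel width
from ``smoothing`` or applies ``min_impact`` is not modelled here.

```python
import math


def evaluation_points(data, n_points=-3):
    """Distinct data values plus |n_points| points inside every gap.

    Only the negative-n_points case is sketched; the inserted points are
    assumed to be equally spaced.
    """
    xs = sorted(set(data))
    if n_points >= 0:
        return xs
    k = -n_points
    grid = []
    for left, right in zip(xs, xs[1:]):
        grid.append(left)
        step = (right - left) / float(k + 1)
        grid.extend(left + step * i for i in range(1, k + 1))
    grid.append(xs[-1])
    return grid


def gaussian_kernel_density(data, x, bandwidth):
    """Standard Gaussian kernel density estimate at point x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
               for xi in data) / norm


data = [1.0, 2.0, 4.0]
grid = evaluation_points(data, n_points=-3)
curve = [(x, gaussian_kernel_density(data, x, bandwidth=0.5)) for x in grid]
```

The resulting list of ``(x, density)`` pairs plays the role of the point-wise
continuous distribution mentioned above.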

.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Prepare a probability estimator that computes probability at point ``x``
    as weighted local regression of probabilities for points in the window
    around this point.

    The window contains a prescribed proportion of original data points. The
    window is as symmetric as possible in the sense that the leftmost point in
    the window is approximately as far from ``x`` as the rightmost. The
    number of points to the left of ``x`` might thus differ from the number
    of points to the right.

    Points are weighted by the tricube weight function; the weight of a point
    at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is
    :math:`(x-x')/h` and :math:`h` is the distance to the farther
    of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`
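
The weighting scheme can be written down directly from the formula. The sketch
below computes only the weights for a given window; it does not perform the
local regression itself, and the function names are hypothetical.

```python
def tricube_weight(x, x_prime, h):
    """Weight (1 - |t|^3)^3 with t = (x - x') / h; zero outside the window."""
    t = abs(x - x_prime) / float(h)
    if t >= 1.0:
        return 0.0
    return (1.0 - t ** 3) ** 3


def window_weights(x, window_points):
    """Weights of all window points relative to the query point x."""
    # h is the distance from x to the farther of the window edge points.
    h = max(abs(x - p) for p in window_points)
    if h == 0:
        # Degenerate window: every point coincides with x.
        return [1.0 for _ in window_points]
    return [tricube_weight(x, p, h) for p in window_points]


weights = window_weights(2.0, [1.0, 1.5, 2.0, 2.5, 4.0])
```

A point at ``x`` itself gets weight 1, and the farthest edge point gets
weight 0, so nearby points dominate the local fit.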


.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between each
            pair of adjacent data points.
        :type n_points: int

    Construct a conditional probability estimator, in other aspects
    similar to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`


Base classes
============

All probability estimators are derived from two base classes: one for
unconditional and the other for conditional probability estimation. The same
is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, prior]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If distribution is given, it can be followed by the prior class
        distribution. Similarly, instances can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no distribution,
        just pass :obj:`None` for the distribution.) When both
        distribution and instances are given, it is up to the constructor to
        decide which to use.

        .. note:: The `instances` and `weight_id` arguments are at the moment
            only used by :class:`ConditionalByRows`. The rest of the builtin
            constructors require that `distribution` is given.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If value is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made
        to return a distribution of probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, prior]], [instances[, weight_id]])

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If table is given, it can be followed by the prior class
        distribution. Similarly, instances can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no table,
        just pass :obj:`None` for the table.) When both
        table and instances are given, it is up to the constructor to
        decide which to use.

        .. note:: The `instances` and `weight_id` arguments are at the moment
            only used by :class:`ConditionalByRows`. The rest of the builtin
            constructors require that `table` is given.

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability
        :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as the condition; the
        estimator attempts to return a distribution of conditional
        probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

        When called without arguments, it returns a
        matrix containing probabilities :math:`p(value|condition)` for each
        possible :math:`value` and :math:`condition` (a contingency table);
        the condition is used as the outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns :obj:`None`.
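
The three calling conventions described above can be mimicked with a small
table-backed class. This is a sketch in plain Python under the assumption that
the conditional probabilities are stored in a nested dictionary;
``TableEstimator`` is a hypothetical name and not part of Orange.

```python
class TableEstimator(object):
    """Hypothetical conditional estimator backed by a nested dictionary.

    ``table[condition][value]`` holds p(value | condition), so the
    condition acts as the outer variable.
    """

    def __init__(self, table):
        self.table = table

    def __call__(self, *args):
        if len(args) == 2:
            # (value, condition) -> a single probability
            value, condition = args
            return self.table[condition][value]
        if len(args) == 1:
            # (condition,) -> distribution over all values
            return dict(self.table[args[0]])
        if not args:
            # () -> a copy of the whole table
            return dict((c, dict(row)) for c, row in self.table.items())
        raise TypeError("expected at most two arguments")


table = {
    "short": {"setosa": 0.9, "versicolor": 0.1},
    "long": {"setosa": 0.2, "versicolor": 0.8},
}
est = TableEstimator(table)
single = est("setosa", "short")
row = est("long")
whole = est()
```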

Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all
    values in advance return this estimator with calculated
    quantities in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    .. method:: __call__([value])

        If value is given, return the probability of the value. For discrete
        variables, every value has an entry in the :obj:`probabilities`
        attribute. For continuous variables, a linear interpolation between
        the two nearest points is used to compute the probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous).
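
The interpolation step for continuous variables can be sketched as follows.
The code assumes the precomputed distribution is available as a sorted list of
``(x, probability)`` pairs; the function name and the choice to return ``0.0``
outside the covered range are assumptions made for this example, not
documented behaviour.

```python
import bisect


def interpolate(points, x):
    """Linearly interpolate the probability at x from sorted (x, p) pairs."""
    xs = [px for px, _ in points]
    i = bisect.bisect_left(xs, x)
    # Exact hit on a stored point.
    if i < len(xs) and xs[i] == x:
        return points[i][1]
    # Assumption: values outside the stored range get probability 0.
    if i == 0 or i == len(xs):
        return 0.0
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


points = [(1.0, 0.1), (2.0, 0.3), (4.0, 0.2)]
halfway = interpolate(points, 1.5)   # midway between 0.1 and 0.3
stored = interpolate(points, 2.0)    # value stored at x = 2.0
```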

.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole
    contingency table (:class:`Orange.statistics.contingency.Table`) of
    conditional probabilities in advance
    return this estimator with the table in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.
        For behaviour with continuous variable distributions,
        see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    Computes a conditional probability estimator using
    an unconditional probability estimator constructor. The result
    can be of type :class:`ConditionalEstimatorFromDistribution`
    or :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    .. method:: __call__([table[, prior]], [instances[, weight_id]], estimator)

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already.
        Then call :obj:`estimator_constructor` for each value of the condition
        attribute. If all constructed estimators can return a distribution of
        probabilities for all classes (usually either all or none can), the
        :class:`~Orange.statistics.distribution.Distribution` instances are put
        in a contingency table
        and a :class:`ConditionalEstimatorFromDistribution`
        is constructed and returned. If the constructed estimators are
        not capable of returning a distribution of probabilities,
        a :class:`ConditionalEstimatorByRows` is constructed and the
        estimators are stored in its :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`
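
The row-by-row construction can be sketched in a few lines. The example below
is a simplified illustration in plain Python: the contingency matrix is
assumed to be a dictionary mapping each condition value to a list of class
counts, and ``laplace_constructor`` and ``conditional_by_rows`` are
hypothetical helpers, not Orange functions.

```python
def laplace_constructor(counts):
    """Hypothetical unconditional constructor using the Laplace estimate."""
    total = float(sum(counts))
    n = len(counts)
    probs = [(c + 1) / (total + n) for c in counts]

    def estimator(class_index=None):
        # No argument: whole distribution; otherwise a single probability.
        if class_index is None:
            return list(probs)
        return probs[class_index]

    return estimator


def conditional_by_rows(contingency, estimator_constructor):
    """Build one unconditional estimator per condition value (per row)."""
    return dict((condition, estimator_constructor(counts))
                for condition, counts in contingency.items())


# Rows: condition values; columns: counts of three classes.
contingency = {"short": [8, 1, 1], "long": [0, 4, 6]}
estimators = conditional_by_rows(contingency, laplace_constructor)
p = estimators["short"](0)       # p(class 0 | condition = "short")
```

Each row of the contingency matrix is treated as an independent class
distribution, which is why any unconditional constructor can be plugged in.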

.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that itself uses a series
    of estimators, one for each possible condition,
    stored in its :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`,
        depending on the given `condition_value`.
        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.


Example
=======

    >>> import Orange
    >>> iris = Orange.data.Table("iris")
    >>>
    >>> # discrete class distribution
    >>> iris_dist = Orange.statistics.distribution.Distribution("iris", iris)
    >>> # m estimate constructor
    >>> mest_constructor = Orange.statistics.estimate.M(m=10)
    >>>
    >>> # create the estimator
    >>> mest = mest_constructor(iris_dist)
    >>> print "%.2f" % mest(iris[0]['iris'])
    0.33
    >>> # petal length (continuous) distribution
    >>> plength_dist = Orange.statistics.distribution.Distribution("petal length", iris)
    >>> plength_dist.normalize()
    >>>
    >>> # loess constructor
    >>> loess_est_constructor = Orange.statistics.estimate.Loess()
    >>>
    >>> # create the loess estimator
    >>> loess_est = loess_est_constructor(plength_dist)
    >>>
    >>> print "%.2f" % loess_est(iris[0]['petal length'])
    0.04
    >>> # contingency matrix for the conditional estimator
    >>> contingency = Orange.statistics.contingency.VarClass('petal length', iris)
    >>> conditional_loess_constructor = Orange.statistics.estimate.ConditionalLoess()
    >>>
    >>> cloess_est = conditional_loess_constructor(contingency)
    >>> print cloess_est(iris[0]['petal length'])
    <0.980, 0.008, 0.012>