# source:orange/docs/reference/rst/Orange.statistics.estimate.rst@11337:02feeae55f5f

Revision 11337:02feeae55f5f, 15.7 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago

Added some clarification and a code example to the estimator documentation.

.. automodule:: Orange.statistics.estimate

.. index:: Probability Estimation

=====================================
Probability Estimation (``estimate``)
=====================================

Probability estimators compute probabilities of values of the class variable.
They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
   class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`,
   where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between
probability estimator constructors and probability estimators: when a
probability estimator constructor is called with data, it constructs a
probability estimator that can then be called with a value of the class
variable to obtain the probability of that value. This duality is mainly
needed to enable probability estimation for continuous variables,
where it is not possible to generate a list of probabilities of all possible
values.

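The constructor/estimator duality described above can be sketched in plain
Python. This is only an illustration and does not use Orange: the class names
``RelativeFrequencyConstructor`` and ``FrequencyEstimator`` are hypothetical,
and the estimator returns a plain ``dict`` rather than a distribution object.

```python
from collections import Counter


class FrequencyEstimator(object):
    """Hypothetical estimator: holds precomputed probabilities."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        # Without a value, return a copy of all probabilities;
        # with a value, return the probability of that value.
        if value is None:
            return dict(self.probabilities)
        return self.probabilities.get(value, 0.0)


class RelativeFrequencyConstructor(object):
    """Hypothetical constructor: called with data, returns an estimator."""

    def __call__(self, class_values):
        counts = Counter(class_values)
        total = float(sum(counts.values()))
        probabilities = dict((c, n / total) for c, n in counts.items())
        return FrequencyEstimator(probabilities)


# Step 1: the constructor is called with data and builds an estimator.
estimator = RelativeFrequencyConstructor()(["a", "a", "b", "c"])
# Step 2: the estimator is called with a class value.
p_a = estimator("a")
```

The two-step call mirrors how a learner produces a classifier: the data is
consumed once, and the resulting object answers queries.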

First, probability estimator constructors for common probability estimation
techniques are enumerated. Base classes, knowledge of which is needed to
develop new techniques, are described later in this document.

Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute distribution using relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class :math:`c`), :math:`N` is the total number of
    events (instances) and :math:`n` is the number of different events
    (classes).

    :rtype: :class:`EstimatorFromDistribution`

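As a worked illustration of the Laplace formula, here is a minimal sketch in
plain Python. The helper ``laplace_estimate`` and its ``dict`` of class counts
are assumptions made for the example and are not part of Orange.

```python
def laplace_estimate(counts):
    """Return Laplace-smoothed probabilities for a dict of class counts."""
    total = sum(counts.values())   # N: total number of instances
    n_classes = len(counts)        # n: number of different classes
    return dict((c, (n_c + 1.0) / (total + n_classes))
                for c, n_c in counts.items())


# Three instances of "yes" and none of "no": N = 3, n = 2.
probs = laplace_estimate({"yes": 3, "no": 0})
```

With these counts the class that never occurs still gets a non-zero
probability, (0 + 1) / (3 + 2), which is the point of the correction.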

.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \frac{N_c + m \cdot ap(c)}{N+m}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class :math:`c`), :math:`N` is the total number of
    events (instances) and :math:`ap(c)` is the prior probability of event
    (class) :math:`c`.

    :rtype: :class:`EstimatorFromDistribution`

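The m-estimate formula can be checked with a small plain-Python sketch. The
function ``m_estimate`` and the ``dict`` arguments are hypothetical and only
stand in for the distributions that Orange passes around.

```python
def m_estimate(counts, prior, m):
    """Return m-estimates for a dict of class counts and a dict of priors."""
    total = sum(counts.values())   # N: total number of instances
    return dict((c, (n_c + m * prior[c]) / float(total + m))
                for c, n_c in counts.items())


counts = {"yes": 3, "no": 1}
prior = {"yes": 0.5, "no": 0.5}

smoothed = m_estimate(counts, prior, m=2)    # pulled towards the prior
unsmoothed = m_estimate(counts, prior, m=0)  # plain relative frequencies
```

Setting ``m`` to 0 reduces the formula to relative frequencies, while larger
values of ``m`` move the estimate closer to the prior.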

.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default:
            0.01); points with lower weights won't be taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Compute probabilities for a continuous variable at a certain number of
    points using Gaussian kernels. The resulting point-wise continuous
    distribution is stored as
    :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that
    are present in the data (i.e. the existing values of the continuous
    feature). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. An approximately equal number of new points is inserted between
    each pair of adjacent existing points. If :obj:`n_points` is
    negative, its absolute value determines the number of points to be added
    between each two data points.

    :rtype: :class:`EstimatorFromDistribution`

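The effect of a negative ``n_points`` can be sketched as follows. The helper
``grid_for_negative_n_points`` is hypothetical, and the even spacing of the
inserted points is an assumption of this sketch; the text above only states how
many points are added between neighbours.

```python
def grid_for_negative_n_points(data_points, n_points):
    """Build the evaluation grid for a negative n_points (sketch only)."""
    if n_points >= 0:
        raise ValueError("this sketch only covers negative n_points")
    per_gap = -n_points                # points to add between neighbours
    xs = sorted(set(data_points))      # existing values are always kept
    if not xs:
        return []
    grid = []
    for left, right in zip(xs, xs[1:]):
        grid.append(left)
        # Assumption: inserted points are evenly spaced within the gap.
        step = (right - left) / float(per_gap + 1)
        grid.extend(left + step * i for i in range(1, per_gap + 1))
    grid.append(xs[-1])
    return grid


grid = grid_for_negative_n_points([1.0, 2.0, 4.0], -3)
```

Three existing values with two gaps and three inserted points per gap give a
grid of nine points.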

.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Prepare a probability estimator that computes the probability at point
    :math:`x` as a weighted local regression of probabilities for points in
    the window around this point.

    The window contains a prescribed proportion of original data points. The
    window is as symmetric as possible in the sense that the leftmost point in
    the window is approximately as far from :math:`x` as the rightmost. The
    number of points to the left of :math:`x` might thus differ from the
    number of points to the right.

    Points are weighted by the tri-cube weight function; the weight of a point
    at :math:`x'` is :math:`(1-|t|^3)^3`, where :math:`t` is
    :math:`(x-x')/h` and :math:`h` is the distance to the farther
    of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`


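The weight function :math:`(1-|t|^3)^3` used by the loess estimator is easy to
evaluate directly. The function name ``loess_weight`` is hypothetical, and
giving zero weight to points at or beyond distance :math:`h` is an assumption
of this sketch rather than something stated above.

```python
def loess_weight(x, x_prime, h):
    """Weight of the point at x_prime when estimating at x, window size h."""
    t = (x - x_prime) / float(h)
    if abs(t) >= 1.0:
        # Assumption: points outside the window do not contribute.
        return 0.0
    return (1.0 - abs(t) ** 3) ** 3


w_center = loess_weight(2.0, 2.0, 1.0)   # point exactly at x
w_half = loess_weight(2.0, 2.5, 1.0)     # point halfway to the window edge
w_edge = loess_weight(2.0, 3.0, 1.0)     # point on the window edge
```

The weight falls smoothly from 1 at the query point to 0 at the window edge,
so nearby points dominate the local regression.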

.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points will be inserted between
            each pair of adjacent data points.
        :type n_points: int

    Construct a conditional probability estimator, in other aspects
    similar to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`


Base classes
============

All probability estimators are derived from two base classes: one for
unconditional and the other for conditional probability estimation. The same
is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, prior]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If ``distribution`` is given, it can be followed by the prior class
        distribution. Similarly, ``instances`` can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no distribution,
        just pass :obj:`None` for the latter.) When both
        ``distribution`` and ``instances`` are given, it is up to the
        constructor to decide what to use.

        .. note:: The ``instances`` and ``weight_id`` arguments are at the
            moment only used by :class:`ConditionalByRows`. The rest of the
            builtin constructors require that ``distribution`` is given.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If ``value`` is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made
        to return a distribution of probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, prior]], [instances[, weight_id]])

        :param table: input distribution.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If ``table`` is given, it can be followed by the prior class
        distribution. Similarly, ``instances`` can be followed by
        the ID of the meta attribute with instance weights. (Hint: to pass a
        prior distribution and instances, but no table,
        just pass :obj:`None` for the latter.) When both
        ``table`` and ``instances`` are given, it is up to the constructor to
        decide what to use.

        .. note:: The ``instances`` and ``weight_id`` arguments are at the
            moment only used by :class:`ConditionalByRows`. The rest of the
            builtin constructors require that ``table`` is given.

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability
        :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as the condition; the
        estimator attempts to return a distribution of conditional
        probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

        When called without arguments, it returns a
        matrix containing probabilities :math:`p(value|condition)` for each
        possible :math:`value` and :math:`condition` (a contingency table);
        the condition is used as the outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns :obj:`None`.

Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all
    values in advance return this estimator with calculated
    quantities in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    .. method:: __call__([value])

        If ``value`` is given, return the probability of the value. For
        discrete variables, every value has an entry in the
        :obj:`probabilities` attribute. For continuous variables, a linear
        interpolation between the two nearest points is used to compute the
        probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous).

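The linear interpolation used for continuous variables can be sketched in
plain Python. The function ``interpolate_probability`` and its two parallel
lists are hypothetical; returning the boundary probability for values outside
the stored range is an assumption of this sketch.

```python
from bisect import bisect_left


def interpolate_probability(points, probabilities, value):
    """Interpolate linearly between the two stored points nearest to value.

    points must be sorted in ascending order and match probabilities.
    """
    # Assumption: values outside the stored range get the boundary value.
    if value <= points[0]:
        return probabilities[0]
    if value >= points[-1]:
        return probabilities[-1]
    i = bisect_left(points, value)          # points[i-1] < value <= points[i]
    x0, x1 = points[i - 1], points[i]
    p0, p1 = probabilities[i - 1], probabilities[i]
    return p0 + (p1 - p0) * (value - x0) / float(x1 - x0)


points = [1.0, 2.0, 4.0]
probabilities = [0.1, 0.3, 0.2]
p_mid = interpolate_probability(points, probabilities, 3.0)
```

Here 3.0 lies halfway between the stored points 2.0 and 4.0, so the result is
halfway between their probabilities.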

.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole
    contingency table (:class:`Orange.statistics.contingency.Table`) of
    conditional probabilities in advance
    return this estimator with the table in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.
        For behaviour with continuous variable distributions,
        see the unconditional counterpart
        :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    Computes a conditional probability estimator using
    an unconditional probability estimator constructor. The result
    can be of type :class:`ConditionalEstimatorFromDistribution`
    or :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    .. method:: __call__([table[, prior]], [instances[, weight_id]], estimator)

        :param table: input distribution.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param prior: prior distribution.
        :type prior: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already.
        Then call :obj:`estimator_constructor` for each value of the condition
        attribute. If all constructed estimators can return a distribution of
        probabilities for all classes (usually either all or none can), the
        :class:`~Orange.statistics.distribution.Distribution` instances are
        put in a contingency table
        and a :class:`ConditionalEstimatorFromDistribution`
        is constructed and returned. If the constructed estimators are
        not capable of returning a distribution of probabilities,
        a :class:`ConditionalEstimatorByRows` is constructed and the
        estimators are stored in its :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`

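The row-by-row procedure described for :class:`ConditionalByRows` can be
outlined in plain Python. Everything here is a hypothetical stand-in: the
contingency table is a ``dict`` of ``dict`` counts, ``laplace_constructor``
plays the role of the unconditional estimator constructor, and the returned
tag replaces the two Orange result classes.

```python
def laplace_constructor(counts):
    """Hypothetical unconditional constructor using Laplace estimation."""
    total = sum(counts.values())
    n_classes = len(counts)
    probs = dict((c, (k + 1.0) / (total + n_classes))
                 for c, k in counts.items())

    def estimator(value=None):
        # Without a value, return the whole distribution.
        return dict(probs) if value is None else probs[value]
    return estimator


def conditional_by_rows(contingency, estimator_constructor):
    """Build one unconditional estimator per condition value."""
    estimators = dict((cond, estimator_constructor(row))
                      for cond, row in contingency.items())
    distributions = dict((cond, est()) for cond, est in estimators.items())
    if all(d is not None for d in distributions.values()):
        # Every estimator could return a distribution: keep a table.
        return "from_distribution", distributions
    # Otherwise keep the estimators themselves, one per condition.
    return "by_rows", estimators


contingency = {"sunny": {"yes": 1, "no": 3}, "rainy": {"yes": 2, "no": 0}}
kind, table = conditional_by_rows(contingency, laplace_constructor)
p_yes_given_rainy = table["rainy"]["yes"]
```

Because the Laplace stand-in can always return a full distribution, the sketch
takes the first branch and stores one distribution per condition value.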

.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that itself uses a series
    of estimators, one for each possible condition,
    stored in its :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`,
        depending on the given ``condition_value``.
        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.


Example
=======

    >>> import Orange
    >>> iris = Orange.data.Table("iris")
    >>>
    >>> # discrete class distribution
    >>> iris_dist = Orange.statistics.distribution.Distribution("iris", iris)
    >>> # m estimate constructor
    >>> mest_constructor = Orange.statistics.estimate.M(m=10)
    >>>
    >>> # create the estimator
    >>> mest = mest_constructor(iris_dist)
    >>> print "%.2f" % mest(iris[0]['iris'])
    0.33
    >>> # petal length (continuous) distribution
    >>> plength_dist = Orange.statistics.distribution.Distribution("petal length", iris)
    >>> plength_dist.normalize()
    >>>
    >>> # loess constructor
    >>> loess_est_constructor = Orange.statistics.estimate.Loess()
    >>>
    >>> # create the loess estimator
    >>> loess_est = loess_est_constructor(plength_dist)
    >>>
    >>> print "%.2f" % loess_est(iris[0]['petal length'])
    0.04
    >>> # contingency matrix for the conditional estimator
    >>> contingency = Orange.statistics.contingency.VarClass('petal length', iris)
    >>> conditional_loess_constructor = Orange.statistics.estimate.ConditionalLoess()
    >>>
    >>> cloess_est = conditional_loess_constructor(contingency)
    >>> print cloess_est(iris[0]['petal length'])
    <0.980, 0.008, 0.012>