source: orange/orange/Orange/statistics/estimate.py @ 9135:22a4efea7940

Revision 9135:22a4efea7940, 13.2 KB checked in by markotoplak, 2 years ago

converted Probability estimation. Fixes #759.

"""
=======================================
Probability Estimation (``estimate``)
=======================================

Probability estimators compute value probabilities.

There are two branches of probability estimators:

#. estimators of unconditional probabilities and

#. estimators of conditional probabilities.

In naive Bayesian classification, for instance, the first kind computes
p(C) and the second p(C|v), where C is a class and v is a feature value.

Since probability estimation is usually based on data, the setup follows
the usual Orange pattern. Just as a learner is used to construct a
classifier, an estimator constructor is used to construct a probability
estimator.

This page is divided into three sections. The first describes the basic
classes; the second contains classes that are abstract or only support
the "real" estimators, which you would seldom use directly. The last
section contains the estimators and constructors that you would use most
often. If you are not interested in the details, skip the first two
sections.

Basic classes
=============

Four basic abstract classes serve as roots of the hierarchy:
:class:`Estimator`, :class:`EstimatorConstructor`,
:class:`ConditionalEstimator` and
:class:`ConditionalEstimatorConstructor`.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If value is given, return the probability of the value (as
        float). When the value is omitted, the object attempts to return
        a distribution of probabilities for all values (as
        :class:`~Orange.statistics.distribution.Distribution`). The
        result can be :class:`~Orange.statistics.distribution.Discrete`
        for discrete features, :class:`~Orange.statistics.distribution.Continuous`
        for continuous features, or an instance of some other class derived
        from :class:`~Orange.statistics.distribution.Distribution`. Note
        that returning a continuous distribution does make sense:
        although probabilities are stored point-wise (as something
        similar to a Python dictionary, where keys are attribute values
        and items are probabilities),
        :class:`~Orange.statistics.distribution.Distribution` can compute
        probabilities between the recorded values by interpolation.

        The estimator does not necessarily support returning
        precomputed probabilities in the form of
        :class:`~Orange.statistics.distribution.Distribution`; in this
        case, it simply returns None.

.. class:: EstimatorConstructor

    This is an abstract class; derived classes define call operators
    that return different probability estimators. The class is
    call-constructible (i.e., if called with appropriate parameters,
    the constructor returns a probability estimator, not a probability
    estimator constructor).

    The call operator can accept an already computed distribution of
    classes, a list of examples, or both.

    .. method:: __call__([distribution[, apriori]], [examples[, weightID]])

        If distribution is given, it can be followed by the a priori
        class distribution. Similarly, examples can be followed by the
        ID of the meta attribute holding example weights. (Hint: if you
        want to pass examples and an a priori distribution but don't
        have the distribution ready, just pass None for distribution.)
        When both distribution and examples are given, it is up to the
        constructor to decide which to use.
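
    The following pure-Python sketch illustrates the constructor/estimator
    pattern without using Orange. The classes
    ``RelativeFrequencyEstimator`` and ``RelativeFrequencyConstructor``
    are hypothetical stand-ins that mimic the calling conventions
    described above, using relative frequencies. The sketch assumes the
    examples are given simply as a list of class values.

    .. code-block:: python

        from collections import Counter


        class RelativeFrequencyEstimator(object):
            # Hypothetical estimator holding a value -> probability map.
            def __init__(self, probabilities):
                self.probabilities = probabilities

            def __call__(self, value=None):
                if value is None:
                    # No value: return a copy of the whole distribution.
                    return dict(self.probabilities)
                # Value given: return its probability as a float.
                return self.probabilities.get(value, 0.0)


        class RelativeFrequencyConstructor(object):
            # Hypothetical constructor: calling it returns an estimator.
            def __call__(self, distribution=None, apriori=None, examples=None):
                # apriori is accepted for symmetry but ignored here.
                if distribution is None:
                    # Count class values to obtain the class distribution.
                    distribution = Counter(examples)
                total = float(sum(distribution.values()))
                return RelativeFrequencyEstimator(
                    dict((v, n / total) for v, n in distribution.items()))

    For instance, calling the constructor with
    ``examples=['yes', 'yes', 'no', 'yes']`` returns an estimator that
    gives 0.75 for ``'yes'`` and the whole distribution when called
    without arguments.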

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[Value,] ConditionValue])

        When given two values, it returns the probability
        p(Value|Condition) (as float). When given only one value, it is
        interpreted as the condition; the estimator returns a
        :class:`~Orange.statistics.distribution.Distribution` with
        probabilities p(v|Condition) for each possible value v. When
        called without arguments, it returns a
        :class:`Orange.statistics.contingency.Table` matrix containing
        probabilities p(v|c) for each possible value and condition; the
        condition is used as the outer variable.

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns None.
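
    A minimal pure-Python sketch of the three calling forms follows. It
    assumes the contingency is a plain dictionary that maps each
    condition value to a dictionary of value counts.
    ``ConditionalFrequencyEstimator`` is a hypothetical class, not part
    of Orange, and it uses relative frequencies.

    .. code-block:: python

        class ConditionalFrequencyEstimator(object):
            # Hypothetical: contingency maps condition -> {value: count}.
            def __init__(self, contingency):
                self.table = {}
                for condition, counts in contingency.items():
                    total = float(sum(counts.values()))
                    self.table[condition] = dict(
                        (v, n / total) for v, n in counts.items())

            def __call__(self, *args):
                if len(args) == 2:
                    # (value, condition) -> p(value|condition) as a float
                    value, condition = args
                    return self.table[condition].get(value, 0.0)
                if len(args) == 1:
                    # (condition,) -> distribution of p(v|condition)
                    return dict(self.table[args[0]])
                # no arguments -> whole table, condition as the outer key
                return dict((c, dict(d)) for c, d in self.table.items())

    Given ``{'sunny': {'yes': 1, 'no': 3}}``, calling the estimator with
    ``'no', 'sunny'`` gives 0.75. Calling it with ``'sunny'`` alone gives
    ``{'yes': 0.25, 'no': 0.75}``.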

.. class:: ConditionalEstimatorConstructor

    A counterpart of :class:`EstimatorConstructor`. It has similar
    arguments, except that the first argument is not a
    :class:`~Orange.statistics.distribution.Distribution` but a
    :class:`Orange.statistics.contingency.Table`.


Abstract and supporting classes
===============================

Several abstract classes simplify the implementation of the actual
probability estimation classes.

.. class:: EstimatorFromDistribution

    .. attribute:: probabilities

        A precomputed list of probabilities.

    Many estimator constructors compute probabilities of classes from
    frequencies of classes or from a list of examples. Probabilities are
    stored as a :class:`~Orange.statistics.distribution.Distribution`,
    and an :class:`EstimatorFromDistribution` is returned. This is done
    for estimators that use relative frequencies, Laplace's estimation
    or m-estimation, and even for estimators that compute continuous
    distributions.

    When asked for the probability of a certain value, the estimator
    returns the corresponding element of :obj:`probabilities`. Note
    that when the distribution is continuous, linear interpolation
    between two points is used to compute the probability. When asked
    for a complete distribution, it returns a copy of
    :obj:`probabilities`.
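
    Linear interpolation between stored points can be sketched in pure
    Python as below. The sketch assumes the point-wise distribution is a
    sorted list of ``(value, probability)`` pairs. The function name
    ``interpolate`` is hypothetical. Returning 0.0 outside the recorded
    range is an assumption, not documented Orange behaviour.

    .. code-block:: python

        import bisect


        def interpolate(points, x):
            # points: sorted (value, probability) pairs stored point-wise.
            xs = [p[0] for p in points]
            i = bisect.bisect_left(xs, x)
            if i < len(xs) and xs[i] == x:
                return points[i][1]  # exact hit on a stored point
            if i == 0 or i == len(xs):
                return 0.0  # assumption: x lies outside the recorded range
            (x0, p0), (x1, p1) = points[i - 1], points[i]
            # Straight line between the two neighbouring stored points.
            return p0 + (p1 - p0) * (x - x0) / float(x1 - x0)

    For ``[(0.0, 0.1), (1.0, 0.3), (3.0, 0.2)]``, the value at 2.0 lies
    halfway between 0.3 and 0.2, that is, approximately 0.25.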

.. class:: ConditionalEstimatorFromDistribution

    .. attribute:: probabilities

        A precomputed list of probabilities.

    This counterpart of :class:`EstimatorFromDistribution` stores
    conditional probabilities in a :class:`Orange.statistics.contingency.Table`.

.. class:: ConditionalEstimatorByRows

    .. attribute:: estimator_list

        A list of estimators; one for each value of
        :obj:`Condition`.

    This conditional probability estimator has different estimators for
    different values of the conditional attribute. For instance, when
    used for computing p(c|A) in a naive Bayesian classifier, it would
    have an estimator for each possible value of attribute A. This does
    not mean that the estimators were constructed by different
    constructors, i.e. using different probability estimation methods.
    This class is normally used when we only have a probability
    estimator constructor for unconditional probabilities but need to
    construct a conditional probability estimator. The constructor is
    used to construct estimators for subsets of the original example
    set, and the resulting estimators are stored in
    :class:`ConditionalEstimatorByRows`.

.. class:: ConditionalByRows

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    This class computes a conditional probability estimator using an
    unconditional probability estimator constructor. The result can be
    of type :class:`ConditionalEstimatorFromDistribution` or
    :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    The class first computes the contingency matrix if it hasn't been
    computed already. Then it calls :obj:`estimator_constructor` for
    each value of the condition attribute. If all constructed estimators
    can return a distribution of probabilities for all classes (usually
    either all or none can), the
    :class:`~Orange.statistics.distribution.Distribution` objects are
    put into a contingency, and a
    :class:`ConditionalEstimatorFromDistribution` is constructed and
    returned. If the constructed estimators cannot return a distribution
    of probabilities, a :class:`ConditionalEstimatorByRows` is
    constructed and the estimators are stored in its
    :obj:`estimator_list`.
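
    The decision described above can be sketched in pure Python. This is
    an illustration under assumptions, not Orange's implementation:

    * examples are ``(condition, class)`` pairs;
    * the contingency is a nested dictionary;
    * ``conditional_by_rows`` is a hypothetical function;
    * the unconditional constructor is any callable that takes a
      ``{class: count}`` dictionary and returns an estimator whose
      no-argument call yields a dictionary or None.

    .. code-block:: python

        from collections import defaultdict


        def conditional_by_rows(examples, estimator_constructor):
            # Build the contingency: condition -> {class: count}.
            contingency = defaultdict(lambda: defaultdict(int))
            for condition, cls in examples:
                contingency[condition][cls] += 1
            # One unconditional estimator per value of the condition.
            estimators = dict((cond, estimator_constructor(dict(counts)))
                              for cond, counts in contingency.items())
            distributions = dict((cond, est())
                                 for cond, est in estimators.items())
            if all(d is not None for d in distributions.values()):
                # Every row yields a distribution: keep only the table.
                return 'from_distribution', distributions
            # Otherwise keep the estimators themselves, one per row.
            return 'by_rows', estimators

    The returned tag stands in for the choice between
    :class:`ConditionalEstimatorFromDistribution` and
    :class:`ConditionalEstimatorByRows`.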

Concrete probability estimators and constructors
================================================

.. class:: RelativeFrequency

    Computes relative frequencies of classes, puts them into a
    :class:`~Orange.statistics.distribution.Distribution` and returns
    it as an :class:`EstimatorFromDistribution`.

.. class:: Laplace

    Uses Laplace estimation to compute probabilities from frequencies
    of classes.

    .. math::

        p(c) = (N_c + 1) / (N + n)

    where :math:`N_c` is the number of occurrences of an event (e.g.
    the number of examples in class c), :math:`N` is the total number
    of events (examples) and :math:`n` is the number of different events
    (classes).

    The resulting estimator is again of type
    :class:`EstimatorFromDistribution`.
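
    The following is a worked sketch of the formula in pure Python.
    ``laplace`` is a hypothetical helper that takes a dictionary of class
    counts. Classes that never occur must still appear with count 0 so
    that they are included in n.

    .. code-block:: python

        def laplace(counts):
            # counts: {class: N_c}, including classes with zero count.
            N = sum(counts.values())  # total number of examples
            n = len(counts)           # number of different classes
            return dict((c, (Nc + 1.0) / (N + n)) for c, Nc in counts.items())

    With three examples of class ``yes`` and none of class ``no``,
    ``laplace({'yes': 3, 'no': 0})`` gives p(yes) = 4/5 = 0.8 and
    p(no) = 1/5 = 0.2. Relative frequencies would instead assign
    probability 0 to ``no``.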

.. class:: M

    .. attribute:: m

        Parameter for m-estimation.

    Uses m-estimation to compute probabilities from frequencies of
    classes.

    .. math::

        p(c) = (N_c + m p_a(c)) / (N + m)

    where :math:`N_c` is the number of occurrences of an event (e.g.
    the number of examples in class c), :math:`N` is the total number
    of events (examples) and :math:`p_a(c)` is the a priori probability
    of event (class) c.

    The resulting estimator is of type :class:`EstimatorFromDistribution`.
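
    The same formula appears below as a short pure-Python sketch.
    ``m_estimate`` is a hypothetical helper. The sketch assumes the
    class counts and the a priori probabilities are given as
    dictionaries.

    .. code-block:: python

        def m_estimate(counts, apriori, m):
            # counts: {class: count}; apriori: {class: a priori probability}.
            N = sum(counts.values())
            return dict((c, (counts.get(c, 0) + m * apriori[c]) / float(N + m))
                        for c in apriori)

    The parameter m sets how strongly the estimate is pulled towards the
    a priori distribution. With m = 0 the estimate reduces to relative
    frequencies. With counts ``{'yes': 3, 'no': 1}``, a uniform prior and
    m = 2, it gives p(yes) = (3 + 1) / (4 + 2), about 0.667.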

.. class:: Kernel

    .. attribute:: min_impact

        A requested minimal weight of a point (default: 0.01); points
        with lower weights won't be taken into account.

    .. attribute:: smoothing

        Smoothing factor (default: 1.144).

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of
        adjacent data points.

    Useful for continuous distributions, this constructor computes
    probabilities for a certain number of points using Gaussian
    kernels. The resulting point-wise continuous distribution is stored
    as :class:`~Orange.statistics.distribution.Continuous` and returned
    in :class:`EstimatorFromDistribution`.

    The points at which probabilities are computed are determined as
    follows. Probabilities are always computed at all points that are
    present in the data (i.e. the existing values of the continuous
    attribute). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. Approximately the same number of new points is inserted
    between each pair of adjacent existing points.
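
    Below is a pure-Python sketch of both steps, under stated
    assumptions. The helper names are hypothetical. The estimate is a
    generic Gaussian kernel density with an explicitly given
    ``bandwidth``. The sketch does not reproduce how Orange derives the
    kernel width from :obj:`smoothing` or how it applies
    :obj:`min_impact`.

    .. code-block:: python

        import math


        def evaluation_points(values, n_points=-3):
            # Start from the distinct values present in the (non-empty) data.
            xs = sorted(set(values))
            if n_points < 0:
                per_gap = -n_points  # e.g. -3: three new points in every gap
            elif n_points > len(xs) > 1:
                # Spread the missing points roughly evenly over the gaps.
                per_gap = int(round((n_points - len(xs)) / float(len(xs) - 1)))
            else:
                per_gap = 0
            points = []
            for lo, hi in zip(xs, xs[1:]):
                step = (hi - lo) / float(per_gap + 1)
                points.extend(lo + k * step for k in range(per_gap + 1))
            points.append(xs[-1])
            return points


        def gaussian_kernel_density(values, x, bandwidth):
            # Average of Gaussian bumps centred at the observed values.
            norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
            return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                              for v in values)

    For example, ``evaluation_points([0, 4])`` yields the points 0, 1, 2,
    3 and 4. The density can then be evaluated at each of these points
    to form the point-wise curve.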

.. class:: Loess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of
        adjacent data points.

    This method of probability estimation is similar to
    :class:`Kernel`. Both return a curve computed at a certain number
    of points, and the points are determined by the same procedure.
    They differ, however, in the method for estimating the probabilities.

    To estimate the probability at point ``x``, :class:`Loess` examines a
    window containing a prescribed proportion of the original data
    points. The window is as symmetric as possible. The number of points
    to the left of ``x`` might differ from the number to the right, but
    the leftmost point is approximately as far from ``x`` as the
    rightmost. Let us denote the width of the window, i.e. the distance
    to the farther of the two edge points, by ``h``.

    Points are weighted by the tricube weight function; the weight of a
    point at ``x'`` is :math:`(1-|t|^3)^3`, where ``t`` is
    :math:`(x-x')/h`.

    The probability at point ``x`` is then computed as a weighted local
    regression of the probabilities for the points in the window.
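
    A generic LOESS step at a single point can be sketched in pure
    Python as follows. The sketch makes these assumptions:

    * the per-point values to be smoothed (``ys``) are already known;
    * the window is taken to be the points nearest to ``x``;
    * when all weights vanish, it falls back to a plain average.

    ``loess_at`` is a hypothetical name.

    .. code-block:: python

        def loess_at(xs, ys, x, window_proportion=0.5):
            # xs: data points; ys: the value at each point to be smoothed.
            n = len(xs)
            k = min(n, max(2, int(round(window_proportion * n))))
            # Window: the k points nearest to x; h is the farther edge.
            nearest = sorted(range(n), key=lambda i: abs(xs[i] - x))[:k]
            h = float(max(abs(xs[i] - x) for i in nearest) or 1.0)
            # Tricube weights (1 - |t|^3)^3 with t = (x - x') / h.
            w = dict((i, (1 - abs((x - xs[i]) / h) ** 3) ** 3) for i in nearest)
            sw = sum(w.values())
            if sw == 0:
                # Every window point sits on the edge: use a plain average.
                return sum(ys[i] for i in nearest) / float(len(nearest))
            # Weighted least-squares line through the window, evaluated at x.
            mx = sum(w[i] * xs[i] for i in nearest) / sw
            my = sum(w[i] * ys[i] for i in nearest) / sw
            sxx = sum(w[i] * (xs[i] - mx) ** 2 for i in nearest)
            sxy = sum(w[i] * (xs[i] - mx) * (ys[i] - my) for i in nearest)
            b = sxy / sxx if sxx > 0 else 0.0
            return my + b * (x - mx)

    Because the fit is a weighted straight line rather than a weighted
    average, values lying exactly on a line are reproduced exactly. This
    also reduces the bias that plain kernel averaging shows near the
    ends of the data.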

.. class:: ConditionalLoess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of
        adjacent data points.

    Constructs an estimator similar to :class:`Loess`, except that it
    computes conditional probabilities. The result is of type
    :class:`ConditionalEstimatorFromDistribution`.
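
    As a rough illustration of class probabilities conditioned on a
    continuous attribute, the following pure-Python sketch uses the same
    kind of window and tricube weights as :class:`Loess`. To stay short,
    it replaces the local regression with a weighted local average of
    class indicators (a local-constant fit). This is a simplified
    variant, not Orange's algorithm. ``conditional_loess`` is a
    hypothetical name.

    .. code-block:: python

        def conditional_loess(xs, classes, x, window_proportion=0.5):
            # xs: attribute values; classes: class label of each example.
            n = len(xs)
            k = min(n, max(2, int(round(window_proportion * n))))
            nearest = sorted(range(n), key=lambda i: abs(xs[i] - x))[:k]
            h = float(max(abs(xs[i] - x) for i in nearest) or 1.0)
            # Tricube weights; if all are zero, weight the window equally.
            w = [(1 - abs((x - xs[i]) / h) ** 3) ** 3 for i in nearest]
            if sum(w) == 0:
                w = [1.0] * len(nearest)
            total = sum(w)
            probs = {}
            for wi, i in zip(w, nearest):
                # Weighted share of each class in the window: p(c|x).
                probs[classes[i]] = probs.get(classes[i], 0.0) + wi / total
            return probs

    The local-constant fit keeps every estimate between 0 and 1 and makes
    the probabilities for all classes sum to 1. A local linear fit does
    not guarantee either property.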

"""

import Orange
from Orange.core import ProbabilityEstimator as Estimator
from Orange.core import ProbabilityEstimator_FromDistribution as EstimatorFromDistribution
from Orange.core import ProbabilityEstimatorConstructor as EstimatorConstructor
from Orange.core import ProbabilityEstimatorConstructor_Laplace as Laplace
from Orange.core import ProbabilityEstimatorConstructor_kernel as Kernel
from Orange.core import ProbabilityEstimatorConstructor_loess as Loess
from Orange.core import ProbabilityEstimatorConstructor_m as M
from Orange.core import ProbabilityEstimatorConstructor_relative as RelativeFrequency
from Orange.core import ConditionalProbabilityEstimator as ConditionalEstimator
from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution
from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess