# source:orange/orange/statistics/estimate.py@9669:165371b04b4a

Revision 9669:165371b04b4a, 13.2 KB checked in by anze <anze.staric@…>, 2 years ago

Moved content of Orange dir to package dir

1"""
2.. index:: Probability Estimation
3
4=======================================
5Probability Estimation (``estimate``)
6=======================================
7
8Probability estimators are compute value probabilities.
9
10There are two branches of probability estimators:
11
12#. for unconditional and
13
14#. for conditional probabilities.
15
16For naive Bayesian classification the first compute p(C)
17and the second p(C|v), where C is a class and v is a feature value.
18
19Since probability estimation is usually based on the data, the whole
20setup is done in orange way. As for learning, where you use a learner
21to construct a classifier, in probability estimation there are estimator
22constructors whose purpose is to construct probability estimators.
23
24This page is divided into three sections. The first describes the basic
25classes, the second contains classes that are abstract or only support
26"real" estimators - you would seldom use these directly. The last section
27contains estimators and constructors that you would most often use. If
28you are not interested details, skip the first two sections.
29
Basic classes
=============

Four basic abstract classes serve as roots of the hierarchy:
:class:`Estimator`, :class:`EstimatorConstructor`,
:class:`ConditionalEstimator` and
:class:`ConditionalEstimatorConstructor`.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If a value is given, return the probability of the value
        (as float). When the value is omitted, the object attempts
        to return a distribution of probabilities for all values (as
        :class:`~Orange.statistics.distribution.Distribution`). The
        result can be :class:`~Orange.statistics.distribution.Discrete`
        for discrete, :class:`~Orange.statistics.distribution.Continuous`
        for continuous features or an instance of some other class derived
        from :class:`~Orange.statistics.distribution.Distribution`. Note
        that it does make sense to return a continuous
        distribution. Although probabilities are stored
        point-wise (as something similar to a Python dictionary, where
        keys are attribute values and items are probabilities),
        :class:`~Orange.statistics.distribution.Distribution` can compute
        probabilities between the recorded values by interpolation.

        The estimator does not necessarily support
        returning precomputed probabilities in the form of
        :class:`~Orange.statistics.distribution.Distribution`; in this
        case, it simply returns None.

.. class:: EstimatorConstructor

    This is an abstract class; derived classes define call operators
    that return different probability estimators. The class is
    call-constructible (i.e., if called with appropriate parameters,
    the constructor returns a probability estimator, not a probability
    estimator constructor).

    The call operator can accept an already computed distribution of
    classes, a list of examples, or both.

    .. method:: __call__([distribution[, apriori]], [examples[,weightID]])

        If distribution is given, it can be followed by the a priori class
        distribution. Similarly, examples can be followed by
        the ID of the meta attribute with example weights. (Hint: if you
        want to pass examples and an a priori distribution but don't have
        the distribution ready, just pass None for distribution.) When both
        distribution and examples are given, it is up to the constructor to
        decide which to use.

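The constructor/estimator split mirrors the learner/classifier split. Below is a minimal pure-Python (Python 3) sketch of the pattern, not Orange code: the names ``EstimatorSketch`` and ``RelativeFrequencySketch`` are hypothetical, distributions are plain dicts mapping values to frequencies, and the ``apriori`` and ``weightID`` arguments are left out.

```python
from collections import Counter


class EstimatorSketch:
    # Hypothetical estimator: point-wise probabilities kept in a dict.
    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        if value is None:
            # No value given: return a copy of the whole distribution.
            return dict(self.probabilities)
        # Value given: return its probability (0.0 if never observed).
        return self.probabilities.get(value, 0.0)


class RelativeFrequencySketch:
    # Hypothetical constructor built on relative frequencies.
    def __new__(cls, distribution=None, examples=None):
        self = super().__new__(cls)
        if distribution is None and examples is None:
            # No data: behave as an ordinary constructor object.
            return self
        # Data given: "call-construct" and return an estimator directly.
        return self(distribution, examples)

    def __call__(self, distribution=None, examples=None):
        # If both are given, this sketch uses the distribution.
        if distribution is None:
            if examples is None:
                raise ValueError("need a distribution or examples")
            distribution = Counter(examples)
        total = sum(distribution.values())
        return EstimatorSketch(
            {value: n / total for value, n in distribution.items()})


constructor = RelativeFrequencySketch()
estimator = constructor(examples=["yes", "no", "yes", "yes"])
print(estimator("yes"))  # 0.75
print(RelativeFrequencySketch({"yes": 3, "no": 1})("no"))  # 0.25
```

Calling ``RelativeFrequencySketch()`` without data yields a constructor object, while calling it with data yields an estimator straight away, which is what call-construction means here.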

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[Value,] ConditionValue])

        When given two values, it returns the probability
        p(Value|Condition) (as float). When given only one value,
        it is interpreted as the condition; the estimator returns a
        :class:`~Orange.statistics.distribution.Distribution` with
        probabilities p(v|Condition) for each possible value v. When
        called without arguments, it returns an :class:`Orange.statistics.contingency.Table`
        matrix containing probabilities p(v|c) for each possible value
        and condition; the condition is used as the outer variable.

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns None.

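To make the three calling forms concrete, here is a small pure-Python (Python 3) sketch. It is not Orange code: ``ConditionalEstimatorSketch`` and ``from_pairs`` are hypothetical names, nested dicts stand in for :class:`Orange.statistics.contingency.Table`, and the probabilities are plain relative frequencies.

```python
from collections import Counter, defaultdict


class ConditionalEstimatorSketch:
    # Hypothetical: table[condition][value] holds p(value | condition);
    # the condition is the outer key, as in the contingency above.
    def __init__(self, table):
        self.table = table

    def __call__(self, *args):
        if len(args) == 2:
            value, condition = args
            return self.table[condition].get(value, 0.0)  # p(value|condition)
        if len(args) == 1:
            (condition,) = args
            return dict(self.table[condition])  # p(v|condition) for every v
        if not args:
            # Whole matrix, one row per condition value.
            return {c: dict(row) for c, row in self.table.items()}
        raise TypeError("expected at most two arguments")


def from_pairs(pairs):
    # Build the table from (value, condition) observations by
    # counting and normalizing each row (relative frequencies).
    counts = defaultdict(Counter)
    for value, condition in pairs:
        counts[condition][value] += 1
    table = {}
    for condition, row in counts.items():
        total = sum(row.values())
        table[condition] = {v: n / total for v, n in row.items()}
    return ConditionalEstimatorSketch(table)


est = from_pairs([("yes", "sunny"), ("no", "sunny"),
                  ("yes", "rainy"), ("yes", "rainy")])
print(est("yes", "sunny"))  # 0.5
print(est("rainy"))         # {'yes': 1.0}
```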
.. class:: ConditionalEstimatorConstructor

    A counterpart of :class:`EstimatorConstructor`. It takes
    similar arguments, except that the first argument is not a
    :class:`~Orange.statistics.distribution.Distribution` but a
    :class:`Orange.statistics.contingency.Table`.


Abstract and supporting classes
===============================

There are several abstract classes that simplify the actual classes
for probability estimation.

.. class:: EstimatorFromDistribution

    .. attribute:: probabilities

        A precomputed list of probabilities.

    There are many estimator constructors that compute
    probabilities of classes from frequencies of classes
    or from a list of examples. Probabilities are stored as
    :class:`~Orange.statistics.distribution.Distribution`, and
    :class:`EstimatorFromDistribution` is returned. This is done for
    estimators that use relative frequencies, Laplace's estimation,
    m-estimation and even estimators that compute continuous
    distributions.

    When asked for the probability of a single value, the estimator
    returns the corresponding element of :obj:`probabilities`. Note that
    when the distribution is continuous, linear interpolation between two
    points is used to compute the probability. When asked for a complete
    distribution, it returns a copy of :obj:`probabilities`.

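The lookup and linear interpolation just described can be sketched in plain Python (Python 3). This is an illustration under stated assumptions, not Orange's implementation: the class name ``ContinuousFromDistributionSketch`` is hypothetical, the stored probabilities are a dict mapping points to probabilities, and returning 0.0 outside the recorded range is an assumption this page does not specify.

```python
from bisect import bisect_left


class ContinuousFromDistributionSketch:
    # Hypothetical estimator over a point-wise continuous distribution.
    def __init__(self, probabilities):
        self.probabilities = dict(probabilities)  # point -> probability
        self.xs = sorted(self.probabilities)

    def __call__(self, x=None):
        if x is None:
            # Complete distribution requested: return a copy.
            return dict(self.probabilities)
        if x in self.probabilities:
            return self.probabilities[x]  # a recorded point
        if x < self.xs[0] or x > self.xs[-1]:
            return 0.0  # assumption: nothing outside the recorded range
        # Linear interpolation between the two neighboring points.
        i = bisect_left(self.xs, x)
        x0, x1 = self.xs[i - 1], self.xs[i]
        p0, p1 = self.probabilities[x0], self.probabilities[x1]
        return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


est = ContinuousFromDistributionSketch({0.0: 0.25, 2.0: 0.75, 4.0: 0.25})
print(est(1.0))  # 0.5, halfway between 0.25 and 0.75
```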
.. class:: ConditionalEstimatorFromDistribution

    .. attribute:: probabilities

        A precomputed list of probabilities.

    This counterpart of :class:`EstimatorFromDistribution` stores
    conditional probabilities in an :class:`Orange.statistics.contingency.Table`.

.. class:: ConditionalEstimatorByRows

    .. attribute:: estimator_list

        A list of estimators; one for each value of
        :obj:`Condition`.

    This conditional probability estimator has different estimators for
    different values of the conditional attribute. For instance, when used
    for computing p(c|A) in a naive Bayesian classifier, it would have
    an estimator for each possible value of attribute A. This does not
    mean that the estimators were constructed by different constructors,
    i.e. using different probability estimation methods. This class is
    normally used when we only have a probability estimator constructor
    for unconditional probabilities but need to construct a conditional
    probability estimator; the constructor is used to construct estimators
    for subsets of the original example set, and the resulting estimators
    are stored in :class:`ConditionalEstimatorByRows`.

.. class:: ConditionalByRows

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    This class computes a conditional probability estimator using
    an unconditional probability estimator constructor. The result
    can be of type :class:`ConditionalEstimatorFromDistribution`
    or :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    The class first computes the contingency matrix if it hasn't been
    computed already. Then it calls :obj:`estimator_constructor`
    for each value of the condition attribute. If all constructed
    estimators can return a distribution of probabilities
    for all classes (usually either all or none can), the
    :class:`~Orange.statistics.distribution.Distribution` objects are put in
    a contingency, and :class:`ConditionalEstimatorFromDistribution`
    is constructed and returned. If the constructed estimators are
    not capable of returning a distribution of probabilities,
    a :class:`ConditionalEstimatorByRows` is constructed and the
    estimators are stored in its :obj:`estimator_list`.

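The decision just described can be sketched in plain Python (Python 3). This is a hypothetical illustration, not Orange's API: ``conditional_by_rows_sketch`` and ``relative_frequency`` are made-up names, observations are (value, condition) pairs instead of Orange examples, an estimator is any callable that returns a dict (or None) when called without arguments, and the two possible outcomes are returned as tagged tuples instead of the two Orange classes.

```python
from collections import Counter, defaultdict


def relative_frequency(values):
    # Hypothetical unconditional constructor: returns an estimator that
    # gives p(value), or the whole distribution when called with no value.
    counts = Counter(values)
    probs = {v: n / len(values) for v, n in counts.items()}

    def estimator(value=None):
        return dict(probs) if value is None else probs.get(value, 0.0)
    return estimator


def conditional_by_rows_sketch(pairs, estimator_constructor):
    # Split (value, condition) observations into one row per condition.
    rows = defaultdict(list)
    for value, condition in pairs:
        rows[condition].append(value)
    # Call the unconditional constructor once for each condition value.
    estimators = {c: estimator_constructor(vals) for c, vals in rows.items()}
    distributions = {c: est() for c, est in estimators.items()}
    if all(d is not None for d in distributions.values()):
        # Every estimator returned a distribution: keep them as one table
        # (the role of ConditionalEstimatorFromDistribution).
        return "from_distribution", distributions
    # Otherwise keep the estimators themselves, one per condition value
    # (the role of ConditionalEstimatorByRows and its estimator_list).
    return "by_rows", estimators


kind, table = conditional_by_rows_sketch(
    [("yes", "sunny"), ("no", "sunny"), ("yes", "rainy")], relative_frequency)
print(kind)   # from_distribution
print(table)  # {'sunny': {'yes': 0.5, 'no': 0.5}, 'rainy': {'yes': 1.0}}
```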

Concrete probability estimators and constructors
================================================

.. class:: RelativeFrequency

    Computes relative frequencies of classes, stores them in a Distribution
    and returns it as :class:`EstimatorFromDistribution`.

.. class:: Laplace

    Uses Laplace estimation to compute probabilities from frequencies
    of classes.

    .. math::

        p(c) = (N_c + 1) / (N + n)

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of examples in class c), :math:`N` is the total number of events
    (examples) and :math:`n` is the number of different events (classes).

    The resulting estimator is again of type
    :class:`EstimatorFromDistribution`.

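A worked example of the formula: with class counts 3, 1 and 0 (so N = 4 and n = 3), Laplace estimation gives 4/7, 2/7 and 1/7, whereas relative frequencies would give the third class probability 0. The function ``laplace`` below is a hypothetical plain-Python illustration of the formula, not the Orange class.

```python
def laplace(counts):
    # counts maps every possible class to its frequency N_c, so n is
    # len(counts); classes that never occurred must be listed with 0.
    n_total = sum(counts.values())  # N
    n_classes = len(counts)         # n
    return {c: (n_c + 1) / (n_total + n_classes) for c, n_c in counts.items()}


p = laplace({"a": 3, "b": 1, "c": 0})
# N = 4, n = 3: p(a) = 4/7, p(b) = 2/7, p(c) = 1/7
print(p)
```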
.. class:: M

    .. attribute:: m

        Parameter for m-estimation.

    Uses m-estimation to compute probabilities from frequencies of
    classes.

    .. math::

        p(c) = (N_c + m p_a(c)) / (N + m)

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of examples in class c), :math:`N` is the total number of events
    (examples) and :math:`p_a(c)` is the a priori probability of event
    (class) c.

    The resulting estimator is of type :class:`EstimatorFromDistribution`.

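The m-estimate formula can be checked numerically with a small hypothetical helper, ``m_estimate``, written in plain Python as an illustration (not the Orange class). Here the class counts and the a priori distribution are passed as dicts.

```python
def m_estimate(counts, apriori, m):
    # counts: class -> N_c (missing classes count as 0);
    # apriori: class -> p_a(c), which also defines the set of classes.
    n_total = sum(counts.values())  # N
    return {c: (counts.get(c, 0) + m * p_a) / (n_total + m)
            for c, p_a in apriori.items()}


p = m_estimate({"a": 3, "b": 1}, {"a": 0.5, "b": 0.5}, m=2)
# (3 + 2*0.5) / (4 + 2) = 4/6 and (1 + 2*0.5) / 6 = 2/6
print(p)
```

With a uniform a priori distribution over n classes and m = n, the term m p_a(c) equals 1 and the formula reduces to Laplace estimation.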
.. class:: Kernel

    .. attribute:: min_impact

        A requested minimal weight of a point (default: 0.01); points
        with lower weights won't be taken into account.

    .. attribute:: smoothing

        Smoothing factor (default: 1.144).

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of adjacent
        data points.

    Useful for continuous distributions, this constructor computes
    probabilities for a certain number of points using Gaussian
    kernels. The resulting point-wise continuous distribution is stored
    as :class:`~Orange.statistics.distribution.Continuous` and returned
    in :class:`EstimatorFromDistribution`.

    The points at which probabilities are computed are determined
    as follows. Probabilities are always computed at all points that
    are present in the data (i.e. the existing values of the continuous
    attribute). If :obj:`n_points` is positive and greater than the
    number of existing data points, additional points are inserted
    between the existing points to achieve the required number of
    points. An approximately equal number of new points is inserted
    between each pair of adjacent existing points.

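The point-selection rule and a Gaussian-kernel estimate can be sketched in plain Python (Python 3). The names ``evaluation_points`` and ``gaussian_kernel_density`` are hypothetical, and this is not Orange's implementation. The sketch takes an explicit ``bandwidth`` because this page does not say how :obj:`smoothing` is turned into a kernel width. It also ignores :obj:`min_impact` and assumes non-empty data.

```python
import math


def evaluation_points(data, n_points=-3):
    # Existing data values are always included; new points are spread
    # evenly over each gap between adjacent existing values.
    xs = sorted(set(data))  # assumes data is non-empty
    if n_points < 0:
        per_gap = -n_points  # e.g. -3: three new points in every gap
    elif n_points > len(xs) > 1:
        # Roughly the same number of new points in each gap.
        per_gap = round((n_points - len(xs)) / (len(xs) - 1))
    else:
        per_gap = 0
    points = []
    for left, right in zip(xs, xs[1:]):
        step = (right - left) / (per_gap + 1)
        points.extend(left + i * step for i in range(per_gap + 1))
    points.append(xs[-1])
    return points


def gaussian_kernel_density(data, points, bandwidth):
    # Average of Gaussian bumps of width `bandwidth` centred on the data.
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return {x: norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
                          for d in data)
            for x in points}


pts = evaluation_points([0.0, 1.0, 2.0], n_points=-1)
print(pts)  # [0.0, 0.5, 1.0, 1.5, 2.0]
density = gaussian_kernel_density([0.0, 1.0, 2.0], pts, bandwidth=0.5)
```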
.. class:: Loess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of adjacent
        data points.

    This method of probability estimation is similar to
    :class:`Kernel`. They both return a curve computed at a certain number
    of points, and the points are determined by the same procedure. They
    differ, however, in the method for estimating the probabilities.

    To estimate the probability at point ``x``, :class:`Loess` examines a
    window containing a prescribed proportion of the original data points.
    The window is as symmetric as possible; the number of points to the left
    of ``x`` might differ from the number to the right, but the leftmost
    point is approximately as far from ``x`` as the rightmost. Let us
    denote the width of the window, i.e. the distance to the farther
    of the two edge points, by ``h``.

    Points are weighted by the tricube weight function; the weight of a
    point at ``x'`` is :math:`(1-|t|^3)^3`, where ``t`` is
    :math:`(x-x')/h`.

    The probability at point ``x`` is then computed as a weighted local
    regression of probabilities for points in the window.

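The tricube weighting and the local fit can be sketched in plain Python (Python 3). This is an illustration under assumptions, not Orange's code. ``tricube`` and ``loess_at`` are hypothetical names, and the window is taken as the nearest data points. The local regression is assumed to be linear (degree 1). The values being smoothed are passed as generic ``ys``, because this page does not spell out which probabilities are regressed.

```python
import math


def tricube(t):
    # Weight (1 - |t|^3)^3 inside the window, zero at and beyond its edge.
    t = abs(t)
    return (1 - t ** 3) ** 3 if t < 1 else 0.0


def loess_at(x, xs, ys, window_proportion=0.5):
    # Window: the k data points nearest to x, so the window extends
    # roughly equally far to the left and to the right of x.
    n = len(xs)
    k = min(n, max(3, math.ceil(window_proportion * n)))
    window = sorted(range(n), key=lambda i: abs(xs[i] - x))[:k]
    h = max(abs(xs[i] - x) for i in window)  # distance to the farther edge
    if h == 0:
        return sum(ys[i] for i in window) / k
    w = [tricube((x - xs[i]) / h) for i in window]
    sw = sum(w)
    if sw == 0:
        return sum(ys[i] for i in window) / k
    # Weighted least-squares line through the window, evaluated at x.
    mx = sum(wi * xs[i] for wi, i in zip(w, window)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, window)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, window))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, window))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x - mx)


xs = list(range(10))
ys = [2 * v + 1 for v in xs]   # points lying exactly on a line
print(loess_at(4.5, xs, ys))   # about 10.0: a local line reproduces it
```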
.. class:: ConditionalLoess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3
        (default), 3 points will be inserted between each pair of adjacent
        data points.

    Constructs an estimator similar to :class:`Loess`, except that
    it computes conditional probabilities. The result is of type
    :class:`ConditionalEstimatorFromDistribution`.

"""

import Orange
from Orange.core import ProbabilityEstimator as Estimator
from Orange.core import ProbabilityEstimator_FromDistribution as EstimatorFromDistribution
from Orange.core import ProbabilityEstimatorConstructor as EstimatorConstructor
from Orange.core import ProbabilityEstimatorConstructor_Laplace as Laplace
from Orange.core import ProbabilityEstimatorConstructor_kernel as Kernel
from Orange.core import ProbabilityEstimatorConstructor_loess as Loess
from Orange.core import ProbabilityEstimatorConstructor_m as M
from Orange.core import ProbabilityEstimatorConstructor_relative as RelativeFrequency
from Orange.core import ConditionalProbabilityEstimator as ConditionalEstimator
from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution
from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess