# Changeset 10101:5318f864a842 in orange

Timestamp: 02/08/12 18:14:51 (2 years ago)

Branch: default

rebase_source: 6f8a6086772a2376a8f6ac7ce2e0f8156dfa1bcf

Message: Rewrite Orange.statistics.estimate documentation.

File: 1 edited

r9671

=======================================

Probability estimators compute value probabilities.

There are two branches of probability estimators:

#. for unconditional and

#. for conditional probabilities.

For naive Bayesian classification the first computes p(C) and the second p(C|v), where C is a class and v is a feature value.

Since probability estimation is usually based on the data, the whole setup is done in the Orange way. As for learning, where you use a learner to construct a classifier, in probability estimation there are estimator constructors whose purpose is to construct probability estimators.

This page is divided into three sections. The first describes the basic classes, the second contains classes that are abstract or only support "real" estimators - you would seldom use these directly. The last section contains estimators and constructors that you would most often use. If you are not interested in details, skip the first two sections.

Basic classes

Probability estimators compute probabilities of values of the class variable. They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`, where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between probability estimator constructors and probability estimators: when a probability estimator constructor is called with data, it constructs a probability estimator that can then be called with a value of the class variable to obtain the probability of that value. This duality is mainly needed to enable probability estimation for continuous variables, where it is not possible to generate a list of probabilities of all possible values in advance.

First, probability estimation constructors for common probability estimation techniques are enumerated. Base classes, knowledge of which is needed to develop new techniques, are described later in this document.
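The constructor/estimator duality described above can be illustrated with a minimal plain-Python sketch. It does not use the Orange API: the class names ``RelativeFrequencyConstructor`` and ``RelativeFrequencyEstimator`` are hypothetical, and the data is a bare list of class values rather than an Orange table. It only shows the calling pattern: a constructor called with data returns an estimator, and the estimator is then called with a value (or with nothing, to get the whole distribution).

```python
from collections import Counter


class RelativeFrequencyEstimator:
    """Hypothetical estimator: maps a class value to its probability."""

    def __init__(self, probabilities):
        self.probabilities = probabilities

    def __call__(self, value=None):
        # Without a value, return a copy of the whole precomputed distribution.
        if value is None:
            return dict(self.probabilities)
        # With a value, return its probability (0.0 for unseen values).
        return self.probabilities.get(value, 0.0)


class RelativeFrequencyConstructor:
    """Hypothetical constructor: called with data, returns an estimator."""

    def __call__(self, class_values):
        counts = Counter(class_values)
        total = sum(counts.values())
        return RelativeFrequencyEstimator(
            {value: count / total for value, count in counts.items()}
        )


constructor = RelativeFrequencyConstructor()
estimator = constructor(["yes", "yes", "no", "yes"])
p_yes = estimator("yes")      # probability of a single class value
distribution = estimator()    # probabilities of all class values
```

Keeping the two roles separate means the same constructor object can be reused on different data sets, which mirrors the learner/classifier split mentioned above.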
Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute distribution using relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \\frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. number of instances in class c), :math:`N` is the total number of events (instances) and :math:`n` is the number of different events (classes).

    :rtype: :class:`EstimatorFromDistribution`

.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute distribution from frequencies of classes:

    .. math::

        p(c) = \\frac{N_c+m*ap(c)}{N+m}

    where :math:`N_c` is the number of occurrences of an event (e.g. number of instances in class c), :math:`N` is the total number of events (instances) and :math:`ap(c)` is the prior probability of event (class) c.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default: 0.01); points with lower weights won't be taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.
        :type n_points: int

    Compute probabilities for a continuous variable at a certain number of points using Gaussian kernels. The resulting point-wise continuous distribution is stored as :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that are present in the data (i.e. the existing values of the continuous feature).
    If :obj:`n_points` is positive and greater than the number of existing data points, additional points are inserted between the existing points to achieve the required number of points. An approximately equal number of new points is inserted between each pair of adjacent existing points. If :obj:`n_points` is negative, its absolute value determines the number of points to be added between each two data points.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.
        :type n_points: int

    Prepare a probability estimator that computes probability at point x as weighted local regression of probabilities for points in the window around this point.

    The window contains a prescribed proportion of original data points. The window is as symmetric as possible in the sense that the leftmost point in the window is approximately as far from x as the rightmost. The number of points to the left of x might thus differ from the number of points to the right.

    Points are weighted by the tricube weight function; the weight of a point at x' is :math:`(1-|t|^3)^3`, where :math:`t` is :math:`(x-x')/h` and :math:`h` is the distance to the farther of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.
        :type n_points: int

    Construct a conditional probability estimator, in other aspects similar to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`.

Base classes
=============

Four basic abstract classes serve as roots of the hierarchy: :class:`Estimator`, :class:`EstimatorConstructor`, :class:`ConditionalEstimator` and :class:`ConditionalEstimatorConstructor`.

All probability estimators are derived from two base classes: one for unconditional and the other for conditional probability estimation. The same is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, apriori]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If distribution is given, it can be followed by the prior class distribution. Similarly, instances can be followed by the ID of the meta attribute with instance weights. (Hint: to pass a prior distribution and instances, but no distribution, pass :obj:`None` in place of the distribution.) When both distribution and instances are given, it is up to the constructor to decide what to use.

.. class:: Estimator

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If value is given, return the probability of the value (as float). When the value is omitted, the object attempts to return a distribution of probabilities for all values (as :class:`~Orange.statistics.distribution.Distribution`).
        The result can be :class:`~Orange.statistics.distribution.Discrete` for discrete, :class:`~Orange.statistics.distribution.Continuous` for continuous features or an instance of some other class derived from :class:`~Orange.statistics.distribution.Distribution`. Note that it indeed makes sense to return a continuous distribution. Although probabilities are stored point-wise (as something similar to a Python dictionary, where keys are attribute values and items are probabilities), :class:`~Orange.statistics.distribution.Distribution` can compute probabilities between the recorded values by interpolation. The estimator does not necessarily support returning precomputed probabilities in the form of :class:`~Orange.statistics.distribution.Distribution`; in this case, it simply returns None.

.. class:: EstimatorConstructor

    This is an abstract class; derived classes define call operators that return different probability estimators. The class is call-constructible (i.e., if called with appropriate parameters, the constructor returns a probability estimator, not a probability estimator constructor). The call operator can accept an already computed distribution of classes or a list of examples or both.

    .. method:: __call__([distribution[, apriori]], [examples[,weightID]])

        If distribution is given, it can be followed by the a priori class distribution. Similarly, examples can be followed by the ID of the meta attribute with example weights. (Hint: if you want to have examples and a priori distribution, but don't have distribution ready, just pass None for distribution.) When both distribution and examples are given, it is up to the constructor to decide what to use.

        If value is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made to return a distribution of probabilities for all values.
        :rtype: :class:`~Orange.statistics.distribution.Distribution` (usually :class:`~Orange.statistics.distribution.Discrete` for discrete and :class:`~Orange.statistics.distribution.Continuous` for continuous) or :obj:`NoneType`

.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]])

        :param table: input distribution.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If table is given, it can be followed by the prior class distribution. Similarly, instances can be followed by the ID of the meta attribute with instance weights. (Hint: to pass a prior distribution and instances, but no table, pass :obj:`None` in place of the table.) When both table and instances are given, it is up to the constructor to decide what to use.

.. class:: ConditionalEstimator

    conditional probabilities.

    .. method:: __call__([[Value,] ConditionValue])

        When given two values, it returns the probability p(Value|Condition) (as float). When given only one value, it is interpreted as condition; the estimator returns a :class:`~Orange.statistics.distribution.Distribution` with probabilities p(v|Condition) for each possible value v. When called without arguments, it returns a :class:`Orange.statistics.contingency.Table` matrix containing probabilities p(v|c) for each possible value and condition; condition is used as outer variable.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as condition; the estimator attempts to return a distribution of conditional probabilities for all values.
        :rtype: :class:`~Orange.statistics.distribution.Distribution` (usually :class:`~Orange.statistics.distribution.Discrete` for discrete and :class:`~Orange.statistics.distribution.Continuous` for continuous) or :obj:`NoneType`

        When called without arguments, it returns a matrix containing probabilities :math:`p(value|condition)` for each possible :math:`value` and :math:`condition` (a contingency table); condition is used as outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or contingencies, it returns :obj:`None`.

.. class:: ConditionalEstimatorConstructor

    A counterpart of :class:`EstimatorConstructor`. It has similar arguments, except that the first argument is not a :class:`~Orange.statistics.distribution.Distribution` but :class:`Orange.statistics.contingency.Table`.

Abstract and supporting classes
===============================

There are several abstract classes that simplify the actual classes for probability estimation.

Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all values in advance return this estimator with calculated quantities in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    There are many estimator constructors that compute probabilities of classes from frequencies of classes or from a list of examples. Probabilities are stored as :class:`~Orange.statistics.distribution.Distribution`, and :class:`EstimatorFromDistribution` is returned. This is done for estimators that use relative frequencies, Laplace's estimation, m-estimation and even estimators that compute continuous distributions.

    When asked about the probability of a certain value, the estimator returns the corresponding element of :obj:`probabilities`.
    Note that when distribution is continuous, linear interpolation between two points is used to compute the probability. When asked for a complete distribution, it returns a copy of :obj:`probabilities`.

    .. method:: __call__([value])

        If value is given, return the probability of the value. For discrete variables, every value has an entry in the :obj:`probabilities` attribute. For continuous variables, a linear interpolation between two nearest points is used to compute the probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution` (usually :class:`~Orange.statistics.distribution.Discrete` for discrete and :class:`~Orange.statistics.distribution.Continuous` for continuous).

.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole contingency table (:class:`Orange.statistics.contingency.Table`) of conditional probabilities in advance return this estimator with the table in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    This counterpart of :class:`EstimatorFromDistribution` stores conditional probabilities in :class:`Orange.statistics.contingency.Table`.

.. class:: ConditionalEstimatorByRows

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`Condition`.

    This conditional probability estimator has different estimators for different values of conditional attribute. For instance, when used for computing p(c|A) in naive Bayesian classifier, it would have an estimator for each possible value of attribute A. This does not mean that the estimators were constructed by different constructors, i.e. using different probability estimation methods.
    This class is normally used when we only have a probability estimator constructor for unconditional probabilities but need to construct a conditional probability estimator; the constructor is used to construct estimators for subsets of the original example set and the resulting estimators are stored in :class:`ConditionalEstimatorByRows`.

    A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For detailed description of handling of different combinations of parameters, see the inherited :obj:`ConditionalEstimator.__call__`. For behaviour with continuous variable distributions, see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    This class computes a conditional probability estimator using an unconditional probability estimator constructor. The result can be of type :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`, depending on the type of constructor.

    The class first computes the contingency matrix if it hasn't been computed already. Then it calls :obj:`estimator_constructor` for each value of the condition attribute. If all constructed estimators can return a distribution of probabilities for all classes (usually either all or none can), the :class:`~Orange.statistics.distribution.Distribution` instances are put in a contingency, and :class:`ConditionalEstimatorFromDistribution` is constructed and returned. If constructed estimators are not capable of returning a distribution of probabilities, a :class:`ConditionalEstimatorByRows` is constructed and the estimators are stored in its :obj:`estimator_list`.

Concrete probability estimators and constructors
================================================

.. class:: RelativeFrequency

    Computes relative frequencies of classes, puts them into a Distribution and returns it as :class:`EstimatorFromDistribution`.

.. class:: Laplace

    Uses Laplace estimation to compute probabilities from frequencies of classes.

    .. math::

        p(c) = (N_c+1) / (N+n)

    where :math:`N_c` is the number of occurrences of an event (e.g. number of examples in class c), N is the total number of events (examples) and n is the number of different events (classes).

    The resulting estimator is again of type :class:`EstimatorFromDistribution`.

.. class:: M

    .. attribute:: m

        Parameter for m-estimation

    Uses m-estimation to compute probabilities from frequencies of classes.

    .. math::

        p(c) = (N_c+m*ap(c)) / (N+m)

    where :math:`N_c` is the number of occurrences of an event (e.g. number of examples in class c), N is the total number of events (examples) and ap(c) is the a priori probability of event (class) c.

    The resulting estimator is of type :class:`EstimatorFromDistribution`.

.. class:: Kernel

    .. attribute:: min_impact

        A requested minimal weight of a point (default: 0.01); points with lower weights won't be taken into account.

    .. attribute:: smoothing

        Smoothing factor (default: 1.144)

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.

    Useful for continuous distributions, this constructor computes probabilities for a certain number of points using Gaussian kernels. The resulting point-wise continuous distribution is stored as :class:`~Orange.statistics.distribution.Continuous` and returned in :class:`EstimatorFromDistribution`.

    The points at which probabilities are computed are determined like this. Probabilities are always computed at all points that are present in the data (i.e. the existing values of the continuous attribute). If :obj:`n_points` is positive and greater than the number of existing data points, additional points are inserted between the existing points to achieve the required number of points. An approximately equal number of new points is inserted between each pair of adjacent existing points.

.. class:: Loess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.

    This method of probability estimation is similar to :class:`Kernel`. They both return a curve computed at a certain number of points and the points are determined by the same procedure. They differ, however, in the method for estimating the probabilities.

    To estimate probability at point x, :class:`Loess` examines a window containing a prescribed proportion of original data points. The window is as symmetric as possible; the number of points to the left of x might differ from the number to the right, but the leftmost point is approximately as far from x as the rightmost. Let us denote the width of the window, i.e. the distance to the farther of the two edge points, by h. Points are weighted by the tricube weight function; the weight of a point at x' is :math:`(1-|t|^3)^3`, where t is :math:`(x-x')/h`.

    Probability at point x is then computed as weighted local regression of probabilities for points in the window.

.. class:: ConditionalLoess

    .. attribute:: window_proportion

        A proportion of points in a window.

    .. attribute:: n_points

        Number of points for the interpolating curve. If negative, say -3 (default), 3 points will be inserted between each two data points.

    Constructs a similar estimator as :class:`Loess`, except that it computes conditional probabilities. The result is of type :class:`ConditionalEstimatorFromDistribution`.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]], estimator)

        :param table: input distribution.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already. Then call :obj:`estimator_constructor` for each value of the condition attribute. If all constructed estimators can return a distribution of probabilities for all classes (usually either all or none can), the :class:`~Orange.statistics.distribution.Distribution` instances are put in a contingency table and :class:`ConditionalEstimatorFromDistribution` is constructed and returned. If constructed estimators are not capable of returning a distribution of probabilities, a :class:`ConditionalEstimatorByRows` is constructed and the estimators are stored in its :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`

.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that itself uses a series of estimators, one for each possible condition, stored in its :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`, depending on given condition_value. For detailed description of handling of different combinations of parameters, see the inherited :obj:`ConditionalEstimator.__call__`.

"""

from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution
from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor as ConditionalEstimatorConstructor
from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows
from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess
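The by-rows construction documented in the docstring (one unconditional estimator per condition value, kept in an estimator list) can be sketched in plain Python. This is only an illustration under stated assumptions, not the Orange implementation: ``laplace_constructor`` and ``conditional_by_rows`` are hypothetical helpers, the data is a list of ``(condition, class)`` pairs instead of an Orange table, and the per-row estimator uses the Laplace formula p(c) = (N_c + 1) / (N + n) from the docstring.

```python
from collections import Counter, defaultdict


def laplace_constructor(class_values, n_classes):
    """Hypothetical constructor: returns p(c) = (N_c + 1) / (N + n)."""
    counts = Counter(class_values)
    total = len(class_values)

    def estimator(value):
        # Counter returns 0 for class values never seen in this row.
        return (counts[value] + 1) / (total + n_classes)

    return estimator


def conditional_by_rows(pairs, classes, constructor=laplace_constructor):
    """Build one unconditional estimator per condition value."""
    rows = defaultdict(list)
    for condition, value in pairs:
        rows[condition].append(value)

    # Plays the role of estimator_list: one estimator per condition value.
    estimator_list = {
        condition: constructor(values, len(classes))
        for condition, values in rows.items()
    }

    def conditional_estimator(value, condition):
        # Dispatch to the estimator built for this condition value.
        return estimator_list[condition](value)

    return conditional_estimator


data = [("sunny", "yes"), ("sunny", "no"), ("sunny", "yes"), ("rainy", "no")]
p = conditional_by_rows(data, classes=["yes", "no"])
p_yes_given_sunny = p("yes", "sunny")   # (2 + 1) / (3 + 2)
p_yes_given_rainy = p("yes", "rainy")   # (0 + 1) / (1 + 2)
```

The sketch always keeps the per-row estimators; the documented class additionally merges them into a single contingency table when every estimator can return a full distribution.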