Changeset 9135:22a4efea7940 in orange

10/21/11 15:22:55 (2 years ago)

converted Probability estimation. Fixes #759.

3 edited


  • orange/Orange/statistics/

    r8863 r9135  
     3Probability Estimation (``estimate``) 
     6Probability estimators compute probabilities of values.
     8There are two branches of probability estimators: 
     10#. for unconditional and 
     12#. for conditional probabilities. 
     14For naive Bayesian classification, the first computes p(C)
     15and the second p(C|v), where C is a class and v is a feature value. 
     17Since probability estimation is usually based on the data, the whole 
     18setup is done the Orange way. As in learning, where you use a learner
     19to construct a classifier, in probability estimation there are estimator 
     20constructors whose purpose is to construct probability estimators. 
     22This page is divided into three sections. The first describes the basic 
     23classes, the second contains classes that are abstract or only support 
     24"real" estimators - you would seldom use these directly. The last section 
     25contains estimators and constructors that you would most often use. If 
     26you are not interested in details, skip the first two sections.
     28Basic classes 
     31Four basic abstract classes serve as roots of the hierarchy: 
     32:class:`Estimator`, :class:`EstimatorConstructor`, 
     33:class:`ConditionalEstimator` and
     34:class:`ConditionalEstimatorConstructor`.
     36.. class:: Estimator 
     38    .. attribute:: supports_discrete 
     40        Tells whether the estimator can handle discrete attributes. 
     42    .. attribute:: supports_continuous 
     44        Tells whether the estimator can handle continuous attributes. 
     46    .. method:: __call__([value]) 
     48        If value is given, return the probability of the value
     49        (as float).  When the value is omitted, the object attempts 
     50        to return a distribution of probabilities for all values (as 
     51        :class:`~Orange.statistics.distribution.Distribution`). The 
     52        result can be :class:`~Orange.statistics.distribution.Discrete` 
     53        for discrete, :class:`~Orange.statistics.distribution.Continuous` 
     54        for continuous features or an instance of some other class derived 
     55        from :class:`~Orange.statistics.distribution.Distribution`. Note 
     56        that it indeed makes sense to return continuous 
     57        distribution. Although probabilities are stored 
     58        point-wise (similar to a Python dictionary, where
     59        keys are attribute values and items are probabilities),
     60        :class:`~Orange.statistics.distribution.Distribution` can compute 
     61        probabilities between the recorded values by interpolation. 
     63        The estimator does not necessarily support 
     64        returning precomputed probabilities in the form of
     65        :class:`~Orange.statistics.distribution.Distribution`; in this 
     66        case, it simply returns None. 
     68.. class:: EstimatorConstructor 
     70    This is an abstract class; derived classes define call operators 
     71    that return different probability estimators. The class is 
     72    call-constructible (i.e., if called with appropriate parameters, 
     73    the constructor returns a probability estimator, not a probability 
     74    estimator constructor). 
     76    The call operator can accept an already computed distribution of 
     77    classes or a list of examples or both. 
     79    .. method:: __call__([distribution[, apriori]], [examples[,weightID]]) 
     81        If distribution is given, it can be followed by an a priori class
     82        distribution. Similarly, examples can be followed by
     83        the ID of the meta attribute with example weights. (Hint: if you
     84        want to have examples and a priori distribution, but don't have
     85        distribution ready, just pass None for distribution.) When both
     86        distribution and examples are given, it is up to the constructor to
     87        decide what to use.
     90.. class:: ConditionalEstimator 
     92    As a counterpart of :class:`Estimator`, this estimator can return 
     93    conditional probabilities. 
     95    .. method:: __call__([[Value,] ConditionValue]) 
     97        When given two values, it returns the probability
     98        p(Value|Condition) (as float). When given only one value,
     99        it is interpreted as condition; the estimator returns a 
     100        :class:`~Orange.statistics.distribution.Distribution` with 
     101        probabilities p(v|Condition) for each possible value v. When 
     102        called without arguments, it returns a :class:`Orange.statistics.contingency.Table` 
     103        matrix containing probabilities p(v|c) for each possible value 
     104        and condition; condition is used as outer variable. 
     106        If the estimator cannot return precomputed distributions and/or
     107        contingencies, it returns None. 
     109.. class:: ConditionalEstimatorConstructor 
     111    A counterpart of :class:`EstimatorConstructor`. It has 
     112    similar arguments, except that the first argument is not a 
     113    :class:`~Orange.statistics.distribution.Distribution` but 
     114    :class:`Orange.statistics.contingency.Table`. 
     117Abstract and supporting classes  
     120    There are several abstract classes that simplify the actual classes 
     121    for probability estimation. 
     123.. class:: EstimatorFromDistribution 
     125    .. attribute:: probabilities 
     127        A precomputed list of probabilities. 
     129    There are many estimator constructors that compute 
     130    probabilities of classes from frequencies of classes 
     131    or from a list of examples. Probabilities are stored as
     132    :class:`~Orange.statistics.distribution.Distribution`, and 
     133    :class:`EstimatorFromDistribution` is returned. This is done for 
     134    estimators that use relative frequencies, Laplace's estimation, 
     135    m-estimation and even estimators that compute continuous 
     136    distributions. 
     138    When asked about the probability of a certain value, the estimator
     139    returns the corresponding element of :obj:`probabilities`. Note that
     140    when the distribution is continuous, linear interpolation between two
     141    points is used to compute the probability. When asked for a complete
     142    distribution, it returns a copy of :obj:`probabilities`.
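
The linear interpolation mentioned above can be illustrated in plain Python. This is a minimal sketch that does not use Orange; the function name ``interpolate`` and the sample points are made up for illustration, and returning 0.0 outside the recorded range is an assumption, not documented behaviour.

```python
from bisect import bisect_left

def interpolate(points, x):
    """Interpolate a probability at x from sorted (value, probability) pairs."""
    xs = [p[0] for p in points]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return points[i][1]            # x is one of the recorded values
    if i == 0 or i == len(xs):
        return 0.0                     # assumption: nothing known outside the range
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    # Straight line between the two neighbouring recorded points.
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

points = [(1.0, 0.25), (2.0, 0.75)]
midpoint = interpolate(points, 1.5)
```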
     144.. class:: ConditionalEstimatorFromDistribution 
     146    .. attribute:: probabilities 
     148        A precomputed list of probabilities 
     150    This counterpart of :class:`EstimatorFromDistribution` stores 
     151    conditional probabilities in :class:`Orange.statistics.contingency.Table`. 
     153.. class:: ConditionalEstimatorByRows 
     155    .. attribute:: estimator_list 
     157        A list of estimators; one for each value of 
     158        :obj:`Condition`. 
     160    This conditional probability estimator has different estimators for 
     161    different values of conditional attribute. For instance, when used 
     162    for computing p(c|A) in naive Bayesian classifier, it would have 
     163    an estimator for each possible value of attribute A. This does not 
     164    mean that the estimators were constructed by different constructors, 
     165    i.e. using different probability estimation methods. This class is 
     166    normally used when we only have a probability estimator constructor 
     167    for unconditional probabilities but need to construct a conditional 
     168    probability estimator; the constructor is used to construct estimators 
     169    for subsets of original example set and the resulting estimators 
     170    are stored in :class:`ConditionalEstimatorByRows`. 
     172.. class:: ConditionalByRows 
     174    .. attribute:: estimator_constructor 
     176        An unconditional probability estimator constructor. 
     178    This class computes a conditional probability estimator using 
     179    an unconditional probability estimator constructor. The result 
     180    can be of type :class:`ConditionalEstimatorFromDistribution` 
     181    or :class:`ConditionalEstimatorByRows`, depending on the type of 
     182    constructor. 
     184    The class first computes contingency matrix if it hasn't been 
     185    computed already. Then it calls :obj:`estimator_constructor` 
     186    for each value of condition attribute. If all constructed 
     187    estimators can return distribution of probabilities 
     188    for all classes (usually either all or none can), the 
     189    :class:`~Orange.statistics.distribution.Distribution` are put in 
     190    a contingency, and :class:`ConditionalEstimatorFromDistribution` 
     191    is constructed and returned. If constructed estimators are 
     192    not capable of returning distribution of probabilities, 
     193    a :class:`ConditionalEstimatorByRows` is constructed and the 
     194    estimators are stored in its :obj:`estimator_list`. 
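
The row-by-row procedure described above can be sketched in plain Python. This is only an illustration of the idea, not Orange's implementation; ``relative_frequency``, ``conditional_by_rows`` and the sample data are hypothetical, and plain dictionaries stand in for distributions and contingency tables.

```python
from collections import Counter, defaultdict

def relative_frequency(values):
    """Hypothetical unconditional constructor: returns value -> probability."""
    counts = Counter(values)
    total = float(sum(counts.values()))
    return {v: c / total for v, c in counts.items()}

def conditional_by_rows(pairs, constructor):
    """Build one unconditional estimator per value of the condition."""
    rows = defaultdict(list)
    for condition, value in pairs:
        rows[condition].append(value)   # group values by condition
    return {cond: constructor(vals) for cond, vals in rows.items()}

pairs = [("sunny", "no"), ("sunny", "no"), ("sunny", "yes"), ("rainy", "yes")]
estimators = conditional_by_rows(pairs, relative_frequency)
```

Here ``estimators["sunny"]`` plays the role of the estimator for p(value | sunny).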
     197Concrete probability estimators and constructors 
     200.. class:: RelativeFrequency 
     202    Computes relative frequencies of classes, puts them into a Distribution
     203    and returns it as :class:`EstimatorFromDistribution`. 
     205.. class:: Laplace 
     207    Uses Laplace estimation to compute probabilities from frequencies 
     208    of classes. 
     210    .. math:: 
     212        p(c) = (N_c + 1) / (N + n)
     214    where :math:`N_c` is the number of occurrences of an event (e.g. number of examples
     215    in class c), :math:`N` is the total number of events (examples) and :math:`n` is
     216    the number of different events (classes).
     218    The resulting estimator is again of type 
     219    :class:`EstimatorFromDistribution`. 
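
The Laplace formula can be checked by hand with a few lines of plain Python. This is a minimal sketch that does not use Orange; the function name ``laplace_estimate`` and the class counts are made up for illustration.

```python
def laplace_estimate(counts):
    """Return Laplace-corrected probabilities for a dict of class counts."""
    total = sum(counts.values())        # N: total number of examples
    n_classes = len(counts)             # n: number of different classes
    return {c: (nc + 1.0) / (total + n_classes) for c, nc in counts.items()}

# A class that never occurs still gets a non-zero probability.
laplace_probs = laplace_estimate({"yes": 3, "no": 0})
```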
     221.. class:: M 
     223    .. attribute:: m 
     225        Parameter for m-estimation 
     227    Uses m-estimation to compute probabilities from frequencies of 
     228    classes. 
     230    .. math:: 
     232        p(c) = (N_c + m p_a(c)) / (N + m)
     234    where :math:`N_c` is the number of occurrences of an event (e.g. number of examples
     235    in class c), :math:`N` is the total number of events (examples) and :math:`p_a(c)`
     236    is the a priori probability of event (class) c.
     238    The resulting estimator is of type :class:`EstimatorFromDistribution`. 
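
The m-estimate can likewise be worked through in plain Python. This sketch does not use Orange; ``m_estimate``, the counts and the uniform prior are assumptions chosen for illustration.

```python
def m_estimate(counts, prior, m=2.0):
    """Return m-estimates for a dict of class counts and a priori probabilities."""
    total = sum(counts.values())        # N: total number of examples
    return {c: (nc + m * prior[c]) / (total + m) for c, nc in counts.items()}

counts = {"yes": 3, "no": 1}
prior = {"yes": 0.5, "no": 0.5}
m_probs = m_estimate(counts, prior, m=2.0)
# With m = 0 the prior has no influence and the result is the relative frequency.
freq_probs = m_estimate(counts, prior, m=0.0)
```

Larger values of ``m`` pull the estimate towards the a priori probabilities.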
     240.. class:: Kernel 
     242    .. attribute:: min_impact 
     244        A requested minimal weight of a point (default: 0.01); points 
     245        with lower weights won't be taken into account. 
     247    .. attribute:: smoothing 
     249        Smoothing factor (default: 1.144) 
     251    .. attribute:: n_points 
     253        Number of points for the interpolating curve. If negative, say -3 
     254        (default), 3 points will be inserted between each pair of data points.
     256    Useful for continuous distributions, this constructor computes 
     257    probabilities for certain number of points using Gaussian 
     258    kernels. The resulting point-wise continuous distribution is stored 
     259    as :class:`~Orange.statistics.distribution.Continuous` and returned 
     260    in :class:`EstimatorFromDistribution`. 
     262    The points at which probabilities are computed are determined 
     263    as follows. Probabilities are always computed at all points that
     264    are present in the data (i.e. the existing values of the continuous 
     265    attribute). If :obj:`n_points` is positive and greater than the 
     266    number of existing data points, additional points are inserted 
     267    between the existing points to achieve the required number of 
     268    points. An approximately equal number of new points is inserted
     269    between each pair of adjacent existing points.
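
The case of a negative :obj:`n_points` (a fixed number of points inserted between each pair of data points) can be sketched as follows. This is plain Python, not Orange code; ``insert_points`` is a hypothetical helper, and the even spacing of the new points is an assumption that the text above does not confirm.

```python
def insert_points(values, n_between):
    """Return the sorted distinct values with n_between new points in each gap."""
    values = sorted(set(values))
    out = []
    for a, b in zip(values, values[1:]):
        out.append(a)
        step = (b - a) / (n_between + 1.0)   # assumption: evenly spaced points
        out.extend(a + step * k for k in range(1, n_between + 1))
    out.append(values[-1])
    return out

grid = insert_points([0.0, 4.0, 6.0], 3)
```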
     271.. class:: Loess 
     273    .. attribute:: window_proportion 
     275        A proportion of points in a window. 
     277    .. attribute:: n_points 
     279        Number of points for the interpolating curve. If negative, say -3 
     280        (default), 3 points will be inserted between each pair of data points.
     282    This method of probability estimation is similar to 
     283    :class:`Kernel`. They both return a curve computed at certain number 
     284    of points and the points are determined by the same procedure. They 
     285    differ, however, in the method for estimating the probabilities.
     287    To estimate probability at point ``x``, :class:`Loess` examines a 
     288    window containing a prescribed proportion of original data points. The 
     289    window is as symmetric as possible; the number of points to the left
     290    of ``x`` might differ from the number to the right, but the leftmost
     291    point is approximately as far from ``x`` as the rightmost. Let us
     292    denote the width of the window, i.e. the distance to the farther
     293    of the two edge points, by ``h``. 
     295    Points are weighted by the tri-cube weight function; the weight of a point
     296    at ``x'`` is :math:`(1-|t|^3)^3`, where ``t`` is
     297    :math:`(x-x')/h`.
     299    Probability at point ``x`` is then computed as weighted local 
     300    regression of probabilities for points in the window. 
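
The weighting and the local regression step can be sketched in plain Python. This is not Orange's implementation; ``tricube`` and ``local_linear`` are hypothetical names, a weighted straight-line fit is assumed as the form of local regression, and the window is simply given by ``h`` instead of being derived from a proportion of points.

```python
def tricube(x, xp, h):
    """Weight (1 - |t|^3)^3 of a point at xp, with t = (x - xp) / h."""
    t = abs(x - xp) / h
    return (1.0 - t ** 3) ** 3 if t < 1.0 else 0.0

def local_linear(points, x, h):
    """Weighted least-squares line through (value, probability) pairs, evaluated at x."""
    w = [tricube(x, xp, h) for xp, _ in points]
    sw = sum(w)
    mx = sum(wi * xp for wi, (xp, _) in zip(w, points)) / sw   # weighted mean of values
    my = sum(wi * yp for wi, (_, yp) in zip(w, points)) / sw   # weighted mean of probabilities
    sxx = sum(wi * (xp - mx) ** 2 for wi, (xp, _) in zip(w, points))
    sxy = sum(wi * (xp - mx) * (yp - my) for wi, (xp, yp) in zip(w, points))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x - mx)

# Probabilities lying on a straight line are reproduced by the fit.
points = [(0.0, 0.1), (1.0, 0.3), (2.0, 0.5)]
estimate = local_linear(points, 0.5, 2.0)
```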
     302.. class:: ConditionalLoess 
     304    .. attribute:: window_proportion 
     306        A proportion of points in a window. 
     308    .. attribute:: n_points 
     310        Number of points for the interpolating curve. If negative, say -3 
     311        (default), 3 points will be inserted between each pair of data points.
     313    Constructs an estimator similar to :class:`Loess`, except that
     314    it computes conditional probabilities. The result is of type
     315    :class:`ConditionalEstimatorFromDistribution`. 
    1319import Orange 
    2320from Orange.core import ProbabilityEstimator as Estimator 
    8326from Orange.core import ProbabilityEstimatorConstructor_m as M 
    9327from Orange.core import ProbabilityEstimatorConstructor_relative as RelativeFrequency 
     328from Orange.core import ConditionalProbabilityEstimator as ConditionalEstimator 
     329from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution 
     330from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows 
     331from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows 
     332from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess 
  • orange/doc/Orange/rst/Orange.statistics.estimate.rst

    r8863 r9135  
    11.. automodule:: Orange.statistics.estimate 
    3 ====================== 
    4 Probability estimation 
    5 ====================== 
    7 .. class:: Estimator(*args, **kwds) 
    8 .. class:: EstimatorFromDistribution(*args, **kwds) 
    9 .. class:: EstimatorConstructor(*args, **kwds) 
    10 .. class:: Laplace(*args, **kwds) 
    11 .. class:: Kernel(*args, **kwds) 
    12 .. class:: Loess(*args, **kwds) 
    13 .. class:: M(*args, **kwds) 
    14 .. class:: RelativeFrequency(*args, **kwds) 
  • orange/fixes/

    r9006 r9135  
    5555           "orange.ContingencyClassAttr": "Orange.statistics.contingency.ClassVar", 
    5656           "orange.DomainContingency": "Orange.statistics.contingency.Domain", 
     57           "orange.Contingency": "Orange.statistics.contingency.Table", 
    5859           "orange.MeasureAttribute": "Orange.feature.scoring.Score",  
    502503           "orange.ProbabilityEstimatorConstructor_m": "Orange.statistics.estimate.M", 
    503504           "orange.ProbabilityEstimatorConstructor_relative": "Orange.statistics.estimate.RelativeFrequency", 
     505           "orange.ConditionalProbabilityEstimator": "Orange.statistics.estimate.ConditionalEstimator", 
     506           "orange.ConditionalProbabilityEstimator_FromDistribution": "Orange.statistics.estimate.ConditionalEstimatorFromDistribution", 
     507           "orange.ConditionalProbabilityEstimator_ByRows": "Orange.statistics.estimate.ConditionalEstimatorByRows", 
     508           "orange.ConditionalProbabilityEstimatorConstructor_ByRows": "Orange.statistics.estimate.ConditionalByRows", 
     509           "orange.ConditionalProbabilityEstimatorConstructor_loess": "Orange.statistics.estimate.ConditionalLoess", 
    504511           } 