Changeset 9135:22a4efea7940 in orange


Timestamp:
10/21/11 15:22:55 (2 years ago)
Author:
markotoplak
Branch:
default
Convert:
80066a0169a034a6129ca6aed520b4c37b7d07c6
Message:

converted Probability estimation. Fixes #759.

Location:
orange
Files:
3 edited

  • orange/Orange/statistics/estimate.py

    r8863 r9135  
     1""" 
     2======================================= 
     3Probability Estimation (``estimate``) 
     4======================================= 
     5 
     6Probability estimators compute the probabilities of values. 
     7 
     8There are two branches of probability estimators: 
     9 
     10#. for unconditional and 
     11 
     12#. for conditional probabilities. 
     13 
     14In naive Bayesian classification, the first computes p(C) 
     15and the second p(C|v), where C is a class and v is a feature value. 
     16 
     17Since probability estimation is usually based on data, the setup 
     18follows the usual Orange pattern: just as a learner constructs a 
     19classifier, an estimator constructor constructs a probability 
     20estimator. 
     21 
     22This page is divided into three sections. The first describes the basic 
     23classes, the second contains classes that are abstract or only support 
     24"real" estimators - you would seldom use these directly. The last section 
     25contains estimators and constructors that you would most often use. If 
     26you are not interested in the details, skip the first two sections. 
     27 
     28Basic classes 
     29============= 
     30 
     31Four basic abstract classes serve as roots of the hierarchy: 
     32:class:`Estimator`, :class:`EstimatorConstructor`, 
     33:class:`ConditionalEstimator` and 
     34:class:`ConditionalEstimatorConstructor`. 
     35 
     36.. class:: Estimator 
     37 
     38    .. attribute:: supports_discrete 
     39     
     40        Tells whether the estimator can handle discrete attributes. 
     41         
     42    .. attribute:: supports_continuous 
     43         
     44        Tells whether the estimator can handle continuous attributes. 
     45 
     46    .. method:: __call__([value]) 
     47 
     48        If ``value`` is given, return the probability of the value 
     49        (as float).  When the value is omitted, the object attempts 
     50        to return a distribution of probabilities for all values (as 
     51        :class:`~Orange.statistics.distribution.Distribution`). The 
     52        result can be :class:`~Orange.statistics.distribution.Discrete` 
     53        for discrete, :class:`~Orange.statistics.distribution.Continuous` 
     54        for continuous features or an instance of some other class derived 
     55        from :class:`~Orange.statistics.distribution.Distribution`. Note 
     56        that returning a continuous distribution does make sense: 
     57        although probabilities are stored point-wise (similar to 
     58        a Python dictionary, where keys are attribute values and 
     59        items are probabilities), 
     60        :class:`~Orange.statistics.distribution.Distribution` can compute 
     61        probabilities between the recorded values by interpolation. 
     62 
     63        An estimator does not necessarily support returning 
     64        precomputed probabilities in the form of 
     65        :class:`~Orange.statistics.distribution.Distribution`; in that 
     66        case, it simply returns ``None``. 
     67 
     68.. class:: EstimatorConstructor 
     69 
     70    This is an abstract class; derived classes define call operators 
     71    that return different probability estimators. The class is 
     72    call-constructible (i.e., if called with appropriate parameters, 
     73    the constructor returns a probability estimator, not a probability 
     74    estimator constructor). 
     75 
     76    The call operator can accept an already computed distribution of 
     77    classes or a list of examples or both. 
     78 
     79    .. method:: __call__([distribution[, apriori]], [examples[,weightID]]) 
     80 
     81        If ``distribution`` is given, it can be followed by an a priori 
     82        class distribution. Similarly, ``examples`` can be followed by 
     83        the ID of the meta attribute with example weights. (Hint: if you 
     84        want to pass examples and an a priori distribution, but don't have 
     85        a distribution ready, just pass ``None`` for ``distribution``.) When 
     86        both distribution and examples are given, it is up to the 
     87        constructor to decide which to use. 
     88 
     89 
     90.. class:: ConditionalEstimator 
     91 
     92    As a counterpart of :class:`Estimator`, this estimator can return 
     93    conditional probabilities. 
     94 
     95    .. method:: __call__([[Value,] ConditionValue]) 
     96 
     97        When given two values, it returns the probability 
     98        p(Value|Condition) (as float). When given only one value, 
     99        it is interpreted as the condition; the estimator returns a 
     100        :class:`~Orange.statistics.distribution.Distribution` with 
     101        probabilities p(v|Condition) for each possible value v. When 
     102        called without arguments, it returns a :class:`Orange.statistics.contingency.Table` 
     103        matrix containing probabilities p(v|c) for each possible value 
     104        and condition; the condition is used as the outer variable. 
     105 
     106        If the estimator cannot return precomputed distributions and/or 
     107        contingencies, it returns ``None``. 
     108 
     109.. class:: ConditionalEstimatorConstructor 
     110 
     111    A counterpart of :class:`EstimatorConstructor`. It has 
     112    similar arguments, except that the first argument is not a 
     113    :class:`~Orange.statistics.distribution.Distribution` but 
     114    :class:`Orange.statistics.contingency.Table`. 
     115 
     116 
     117Abstract and supporting classes  
     118=============================== 
     119 
     120Several abstract classes simplify the implementation of the concrete 
     121classes for probability estimation. 
     122 
     123.. class:: EstimatorFromDistribution 
     124 
     125    .. attribute:: probabilities 
     126 
     127        A precomputed list of probabilities. 
     128 
     129    Many estimator constructors compute 
     130    probabilities of classes from frequencies of classes 
     131    or from a list of examples. Probabilities are stored as 
     132    :class:`~Orange.statistics.distribution.Distribution`, and an 
     133    :class:`EstimatorFromDistribution` is returned. This is done for 
     134    estimators that use relative frequencies, Laplace's estimation, 
     135    m-estimation and even estimators that compute continuous 
     136    distributions. 
     137 
     138    When asked about the probability of a certain value, the estimator 
     139    returns the corresponding element of :obj:`probabilities`. Note that 
     140    when the distribution is continuous, linear interpolation between two 
     141    points is used to compute the probability. When asked for a complete 
     142    distribution, it returns a copy of :obj:`probabilities`. 
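The point-wise lookup with linear interpolation described for continuous distributions can be sketched in plain Python. This is only an illustration, not the Orange implementation: the function name ``interpolate``, the list-of-pairs representation and the choice to return zero outside the recorded range are all assumptions.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Return the probability at x from sorted (value, probability) pairs."""
    xs = [value for value, _ in points]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return points[i][1]  # x is one of the recorded values
    if i == 0 or i == len(xs):
        return 0.0  # assumption: zero outside the recorded range
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    # Linear interpolation between the two neighbouring recorded points.
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


# Hypothetical point-wise distribution.
points = [(1.0, 0.2), (2.0, 0.6), (4.0, 0.2)]
print(interpolate(points, 2.0))
print(interpolate(points, 3.0))
```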
     143 
     144.. class:: ConditionalEstimatorFromDistribution 
     145 
     146    .. attribute:: probabilities 
     147 
     148        A precomputed list of probabilities. 
     149 
     150    This counterpart of :class:`EstimatorFromDistribution` stores 
     151    conditional probabilities in :class:`Orange.statistics.contingency.Table`. 
     152 
     153.. class:: ConditionalEstimatorByRows 
     154 
     155    .. attribute:: estimator_list 
     156 
     157        A list of estimators; one for each value of 
     158        :obj:`Condition`. 
     159 
     160    This conditional probability estimator has different estimators for 
     161    different values of the condition attribute. For instance, when used 
     162    for computing p(c|A) in a naive Bayesian classifier, it would have 
     163    an estimator for each possible value of attribute A. This does not 
     164    mean that the estimators were constructed by different constructors, 
     165    i.e. using different probability estimation methods. This class is 
     166    normally used when we only have a probability estimator constructor 
     167    for unconditional probabilities but need to construct a conditional 
     168    probability estimator; the constructor is used to construct estimators 
     169    for subsets of the original example set and the resulting estimators 
     170    are stored in :class:`ConditionalEstimatorByRows`. 
     171 
     172.. class:: ConditionalByRows 
     173 
     174    .. attribute:: estimator_constructor 
     175 
     176        An unconditional probability estimator constructor. 
     177 
     178    This class computes a conditional probability estimator using 
     179    an unconditional probability estimator constructor. The result 
     180    can be of type :class:`ConditionalEstimatorFromDistribution` 
     181    or :class:`ConditionalEstimatorByRows`, depending on the type of 
     182    constructor. 
     183 
     184    The class first computes the contingency matrix if it hasn't been 
     185    computed already. Then it calls :obj:`estimator_constructor` 
     186    for each value of the condition attribute. If all constructed 
     187    estimators can return a distribution of probabilities 
     188    for all classes (usually either all or none can), the 
     189    :class:`~Orange.statistics.distribution.Distribution` objects are put in 
     190    a contingency, and a :class:`ConditionalEstimatorFromDistribution` 
     191    is constructed and returned. If the constructed estimators are 
     192    not capable of returning a distribution of probabilities, 
     193    a :class:`ConditionalEstimatorByRows` is constructed and the 
     194    estimators are stored in its :obj:`estimator_list`. 
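The row-by-row idea can be illustrated with a small sketch that does not use Orange at all. The nested dictionary standing in for a contingency table and the names ``relative_frequency`` and ``conditional_by_rows`` are hypothetical; the sketch only shows how an unconditional constructor is applied once per condition value.

```python
def relative_frequency(counts):
    """Unconditional 'constructor': turn class counts into probabilities."""
    total = float(sum(counts.values()))
    return {cls: count / total for cls, count in counts.items()}


def conditional_by_rows(contingency, estimator_constructor):
    """Apply an unconditional constructor to each row (condition value)."""
    return {condition: estimator_constructor(row)
            for condition, row in contingency.items()}


# Hypothetical contingency: outer key is the condition, inner keys are classes.
contingency = {
    "sunny": {"yes": 2, "no": 3},
    "rainy": {"yes": 3, "no": 1},
}
conditional = conditional_by_rows(contingency, relative_frequency)
print(conditional["sunny"]["yes"])  # p(yes | sunny)
```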
     195 
     196 
     197Concrete probability estimators and constructors 
     198================================================ 
     199 
     200.. class:: RelativeFrequency 
     201 
     202    Computes relative frequencies of classes, puts them into a Distribution 
     203    and returns it as an :class:`EstimatorFromDistribution`. 
     204 
     205.. class:: Laplace 
     206 
     207    Uses Laplace estimation to compute probabilities from frequencies 
     208    of classes. 
     209 
     210    .. math:: 
     211 
     212        p(c) = \frac{N_c + 1}{N + n} 
     213 
     214    where :math:`N_c` is the number of occurrences of an event (e.g. the number of examples 
     215    in class c), :math:`N` is the total number of events (examples) and :math:`n` is 
     216    the number of different events (classes). 
     217 
     218    The resulting estimator is again of type 
     219    :class:`EstimatorFromDistribution`. 
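As a worked illustration of the Laplace formula, the following plain-Python sketch computes the estimate from a dictionary of class counts. The function name ``laplace`` and the dictionary input are assumptions made for the example; this is not the Orange class of the same name.

```python
def laplace(counts):
    """Laplace estimate p(c) = (Nc + 1) / (N + n) for each class c."""
    n_total = sum(counts.values())  # N: total number of examples
    n_classes = len(counts)         # n: number of different classes
    return {cls: (count + 1.0) / (n_total + n_classes)
            for cls, count in counts.items()}


# Hypothetical class counts; class "b" was never observed.
counts = {"a": 3, "b": 0, "c": 1}
probabilities = laplace(counts)
print(probabilities)
```

Note that the unobserved class still receives a non-zero probability, which is the purpose of the correction.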
     220 
     221.. class:: M 
     222 
     223    .. attribute:: m 
     224 
     225        Parameter for m-estimation 
     226 
     227    Uses m-estimation to compute probabilities from frequencies of 
     228    classes. 
     229 
     230    .. math:: 
     231 
     232        p(c) = \frac{N_c + m \cdot \mathrm{ap}(c)}{N + m} 
     233 
     234    where :math:`N_c` is the number of occurrences of an event (e.g. the number of examples 
     235    in class c), :math:`N` is the total number of events (examples) and :math:`\mathrm{ap}(c)` 
     236    is the a priori probability of event (class) c. 
     237 
     238    The resulting estimator is of type :class:`EstimatorFromDistribution`. 
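The m-estimate can be worked through the same way. The sketch below is plain Python with hypothetical names (``m_estimate``, ``apriori``); it only evaluates the formula above and makes no claim about how the Orange class is implemented.

```python
def m_estimate(counts, apriori, m=2.0):
    """m-estimate p(c) = (Nc + m * ap(c)) / (N + m) for each class c."""
    n_total = sum(counts.values())  # N: total number of examples
    return {cls: (count + m * apriori[cls]) / (n_total + m)
            for cls, count in counts.items()}


# Hypothetical counts and a uniform a priori distribution.
counts = {"yes": 3, "no": 1}
apriori = {"yes": 0.5, "no": 0.5}
smoothed = m_estimate(counts, apriori, m=2.0)
unsmoothed = m_estimate(counts, apriori, m=0.0)
print(smoothed, unsmoothed)
```

With ``m=0`` the estimate reduces to the relative frequency, and larger values of ``m`` pull the estimate toward the a priori distribution.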
     239 
     240.. class:: Kernel 
     241 
     242    .. attribute:: min_impact 
     243 
     244        A requested minimal weight of a point (default: 0.01); points 
     245        with lower weights won't be taken into account. 
     246 
     247    .. attribute:: smoothing 
     248 
     249        Smoothing factor (default: 1.144) 
     250 
     251    .. attribute:: n_points 
     252 
     253        Number of points for the interpolating curve. If negative, say -3 
     254        (default), 3 points will be inserted between each pair of adjacent data points. 
     255 
     256    Useful for continuous distributions, this constructor computes 
     257    probabilities for certain number of points using Gaussian 
     258    kernels. The resulting point-wise continuous distribution is stored 
     259    as :class:`~Orange.statistics.distribution.Continuous` and returned 
     260    in :class:`EstimatorFromDistribution`. 
     261 
     262    The points at which probabilities are computed are determined 
     263    as follows. Probabilities are always computed at all points that 
     264    are present in the data (i.e. the existing values of the continuous 
     265    attribute). If :obj:`n_points` is positive and greater than the 
     266    number of existing data points, additional points are inserted 
     267    between the existing points to achieve the required number of 
     268    points. An approximately equal number of new points is inserted 
     269    between each pair of adjacent existing points. 
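For the negative ``n_points`` case, the selection of evaluation points can be sketched as follows. This is an illustration in plain Python; the helper name ``insert_points`` is hypothetical, and the even spacing of the inserted points is an assumption, since the text does not say how they are placed.

```python
def insert_points(values, n_between):
    """Return the data points plus n_between new points in every gap."""
    xs = sorted(set(values))
    if not xs:
        return []
    result = []
    for left, right in zip(xs, xs[1:]):
        result.append(left)
        # Assumption: inserted points are evenly spaced within the gap.
        step = (right - left) / (n_between + 1.0)
        result.extend(left + step * i for i in range(1, n_between + 1))
    result.append(xs[-1])
    return result


# n_points = -3 corresponds to three inserted points per gap.
grid = insert_points([0.0, 4.0, 6.0], 3)
print(grid)
```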
     270 
     271.. class:: Loess 
     272 
     273    .. attribute:: window_proportion 
     274 
     275        A proportion of points in a window. 
     276 
     277    .. attribute:: n_points 
     278 
     279        Number of points for the interpolating curve. If negative, say -3 
     280        (default), 3 points will be inserted between each pair of adjacent data points. 
     281 
     282    This method of probability estimation is similar to 
     283    :class:`Kernel`. They both return a curve computed at certain number 
     284    of points and the points are determined by the same procedure. They 
     285    differ, however, in the method for estimating the probabilities. 
     286 
     287    To estimate probability at point ``x``, :class:`Loess` examines a 
     288    window containing a prescribed proportion of original data points. The 
     289    window is as symmetric as possible; the number of points to the left 
     290    of ``x`` might differ from the number to the right, but the leftmost 
     291    point is approximately as far from ``x`` as the rightmost. Let us 
     292    denote the width of the window, i.e. the distance to the farther 
     293    of the two edge points, by ``h``. 
     294 
     295    Points are weighted by the tri-cube weight function; the weight of a point 
     296    at ``x'`` is :math:`(1-|t|^3)^3`, where ``t`` is 
     297    :math:`(x-x')/h`. 
     298 
     299    Probability at point ``x`` is then computed as weighted local 
     300    regression of probabilities for points in the window. 
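The weighting and the local regression step can be sketched in plain Python. The weight function follows the formula above; the weighted straight-line fit is an assumption about what "weighted local regression" means here, and the names ``weight`` and ``loess_at`` are hypothetical, not part of Orange.

```python
def weight(x, x_prime, h):
    """Weight (1 - |t|^3)^3 with t = (x - x') / h, zero outside the window."""
    t = abs(x - x_prime) / h
    return (1.0 - t ** 3) ** 3 if t < 1.0 else 0.0


def loess_at(x, xs, ys, h):
    """Weighted linear fit around x, evaluated at x (assumed local model)."""
    ws = [weight(x, xi, h) for xi in xs]
    sw = sum(ws)
    if sw == 0.0:
        raise ValueError("no points inside the window")
    mean_x = sum(w * xi for w, xi in zip(ws, xs)) / sw
    mean_y = sum(w * yi for w, yi in zip(ws, ys)) / sw
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(ws, xs))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return mean_y + slope * (x - mean_x)


# Hypothetical data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * xi + 1.0 for xi in xs]
print(loess_at(2.5, xs, ys, h=2.5))
```

Because the sample data are exactly linear, the local fit reproduces the line; on real data the result depends on the window width ``h``.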
     301 
     302.. class:: ConditionalLoess 
     303 
     304    .. attribute:: window_proportion 
     305 
     306        A proportion of points in a window. 
     307 
     308    .. attribute:: n_points 
     309 
     310        Number of points for the interpolating curve. If negative, say -3 
     311        (default), 3 points will be inserted between each pair of adjacent data points. 
     312 
     313    Constructs an estimator similar to :class:`Loess`, except that 
     314    it computes conditional probabilities. The result is of type 
     315    :class:`ConditionalEstimatorFromDistribution`. 
     316 
     317""" 
     318 
    1319import Orange 
    2320from Orange.core import ProbabilityEstimator as Estimator 
     
    8326from Orange.core import ProbabilityEstimatorConstructor_m as M 
    9327from Orange.core import ProbabilityEstimatorConstructor_relative as RelativeFrequency 
     328from Orange.core import ConditionalProbabilityEstimator as ConditionalEstimator 
     329from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution 
     330from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows 
     331from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows 
     332from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess 
  • orange/doc/Orange/rst/Orange.statistics.estimate.rst

    r8863 r9135  
    11.. automodule:: Orange.statistics.estimate 
    22 
    3 ====================== 
    4 Probability estimation 
    5 ====================== 
    6  
    7 .. class:: Estimator(*args, **kwds) 
    8 .. class:: EstimatorFromDistribution(*args, **kwds) 
    9 .. class:: EstimatorConstructor(*args, **kwds) 
    10 .. class:: Laplace(*args, **kwds) 
    11 .. class:: Kernel(*args, **kwds) 
    12 .. class:: Loess(*args, **kwds) 
    13 .. class:: M(*args, **kwds) 
    14 .. class:: RelativeFrequency(*args, **kwds) 
  • orange/fixes/fix_changed_names.py

    r9006 r9135  
    5555           "orange.ContingencyClassAttr": "Orange.statistics.contingency.ClassVar", 
    5656           "orange.DomainContingency": "Orange.statistics.contingency.Domain", 
     57           "orange.Contingency": "Orange.statistics.contingency.Table", 
    5758           
    5859           "orange.MeasureAttribute": "Orange.feature.scoring.Score",  
     
    502503           "orange.ProbabilityEstimatorConstructor_m": "Orange.statistics.estimate.M", 
    503504           "orange.ProbabilityEstimatorConstructor_relative": "Orange.statistics.estimate.RelativeFrequency", 
     505           "orange.ConditionalProbabilityEstimator": "Orange.statistics.estimate.ConditionalEstimator", 
     506           "orange.ConditionalProbabilityEstimator_FromDistribution": "Orange.statistics.estimate.ConditionalEstimatorFromDistribution", 
     507           "orange.ConditionalProbabilityEstimator_ByRows": "Orange.statistics.estimate.ConditionalEstimatorByRows", 
     508           "orange.ConditionalProbabilityEstimatorConstructor_ByRows": "Orange.statistics.estimate.ConditionalByRows", 
     509           "orange.ConditionalProbabilityEstimatorConstructor_loess": "Orange.statistics.estimate.ConditionalLoess", 
     510 
    504511           } 
    505512 