Timestamp: 02/08/12 18:17:05
Author: Matija Polajnar <matija.polajnar@…>
Branch: default
rebase_source: e1c57397b9f546f4ad8f3ccb9e05cb89ad67e639
Message: Move Orange.statistics.estimate documentation to rst. Closes #1070.
File: 1 edited

Orange/statistics/estimate.py (r10101 to r10102)

"""
.. index:: Probability Estimation

=======================================
Probability Estimation (``estimate``)
=======================================

Probability estimators compute probabilities of values of the class variable.
They come in two flavours:

#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
   class) and

#. for conditional probabilities (:math:`p(C=c|V=v)`,
   where :math:`v` is a feature value).

A duality much like the one between learners and classifiers exists between
probability estimator constructors and probability estimators: when a
probability estimator constructor is called with data, it constructs a
probability estimator that can then be called with a value of the class
variable to obtain the probability of that value. This duality is mainly
needed to enable probability estimation for continuous variables, where it is
not possible to generate a list of probabilities of all possible values in
advance.

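The duality can be illustrated with a minimal sketch in plain Python 3 that
does not use Orange. The classes ``FrequencyEstimatorConstructor`` and
``FrequencyEstimator`` are hypothetical; they only mimic the calling
convention described in this document, using relative frequencies as
:class:`RelativeFrequency` does: the constructor is called with data and
returns an estimator, which is then called with an optional value.

```python
from collections import Counter


class FrequencyEstimator:
    # Hypothetical estimator: answers p(C=c) from precomputed class counts.
    def __init__(self, counts):
        self.counts = counts
        self.total = sum(counts.values())

    def __call__(self, value=None):
        if value is None:
            # No value given: return probabilities of all values at once.
            return {c: n / self.total for c, n in self.counts.items()}
        return self.counts.get(value, 0) / self.total


class FrequencyEstimatorConstructor:
    # Hypothetical constructor: called with data, returns an estimator.
    def __call__(self, class_values):
        return FrequencyEstimator(Counter(class_values))


constructor = FrequencyEstimatorConstructor()
estimator = constructor(['yes', 'yes', 'no', 'yes'])
print(estimator('yes'))  # 0.75
print(estimator())       # {'yes': 0.75, 'no': 0.25}
```

The data is examined only once, inside the constructor; the estimator keeps
the precomputed state and answers queries on demand, which is what makes the
same split workable for continuous variables.
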
First, probability estimator constructors for common probability estimation
techniques are enumerated. Base classes, knowledge of which is needed to
develop new techniques, are described later in this document.

Probability Estimation Constructors
===================================

.. class:: RelativeFrequency

    Bases: :class:`EstimatorConstructor`

    Compute the distribution from relative frequencies of classes.

    :rtype: :class:`EstimatorFromDistribution`

.. class:: Laplace

    Bases: :class:`EstimatorConstructor`

    Use Laplace estimation to compute the distribution from frequencies of
    classes:

    .. math::

        p(c) = \\frac{N_c+1}{N+n}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class :math:`c`), :math:`N` is the total number of
    events (instances) and :math:`n` is the number of different events
    (classes).

    :rtype: :class:`EstimatorFromDistribution`

.. class:: M

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(m)

        :param m: Parameter for m-estimation.
        :type m: int

    Use m-estimation to compute the distribution from frequencies of classes:

    .. math::

        p(c) = \\frac{N_c + m \\cdot p_a(c)}{N + m}

    where :math:`N_c` is the number of occurrences of an event (e.g. the
    number of instances in class :math:`c`), :math:`N` is the total number of
    events (instances) and :math:`p_a(c)` is the prior (a priori) probability
    of event (class) :math:`c`.

    :rtype: :class:`EstimatorFromDistribution`

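The three estimates above differ only in how they smooth the class counts.
The following sketch in plain Python 3 implements the formulas directly,
using ``fractions.Fraction`` to keep the results exact. The function names
are hypothetical and not part of Orange; ``counts`` is assumed to list every
class, including those with zero occurrences, so that ``len(counts)`` is the
number of classes.

```python
from fractions import Fraction


def relative_frequency(counts, c):
    # p(c) = N_c / N
    return Fraction(counts[c], sum(counts.values()))


def laplace(counts, c):
    # p(c) = (N_c + 1) / (N + n), with n = number of classes
    return Fraction(counts[c] + 1, sum(counts.values()) + len(counts))


def m_estimate(counts, c, m, prior):
    # p(c) = (N_c + m * prior(c)) / (N + m)
    return (counts[c] + m * prior[c]) / (sum(counts.values()) + m)


counts = {'a': 6, 'b': 3, 'c': 1}              # N = 10, n = 3
uniform = {k: Fraction(1, 3) for k in counts}  # uniform prior

print(relative_frequency(counts, 'c'))         # 1/10
print(laplace(counts, 'c'))                    # 2/13
print(m_estimate(counts, 'c', 3, uniform))     # 2/13, same as Laplace
print(m_estimate(counts, 'c', 0, uniform))     # 1/10, same as relative frequency
```

With a uniform prior (every class has prior probability 1/n) and m = n, the
m-estimate coincides with the Laplace estimate, while m = 0 gives plain
relative frequencies; larger values of m pull the estimate further towards
the prior.
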
.. class:: Kernel

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(min_impact, smoothing, n_points)

        :param min_impact: A requested minimal weight of a point (default:
            0.01); points with lower weights are not taken into account.
        :type min_impact: float

        :param smoothing: Smoothing factor (default: 1.144).
        :type smoothing: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points are inserted between each
            pair of adjacent data points.
        :type n_points: int

    Compute probabilities of a continuous variable at a given number of
    points using Gaussian kernels. The resulting point-wise continuous
    distribution is stored as
    :class:`~Orange.statistics.distribution.Continuous`.

    Probabilities are always computed at all points that are present in the
    data (i.e. the existing values of the continuous feature). If
    :obj:`n_points` is positive and greater than the number of existing data
    points, additional points are inserted between the existing points to
    achieve the required number of points. Approximately the same number of
    new points is inserted between each pair of adjacent existing points. If
    :obj:`n_points` is negative, its absolute value determines the number of
    points to be added between each two adjacent data points.

    :rtype: :class:`EstimatorFromDistribution`

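The placement of evaluation points and the Gaussian kernel estimate can be
sketched in plain Python 3. This is an illustration under stated assumptions,
not Orange's implementation: the helper names are hypothetical, inserted
points are spaced evenly within each gap, the kernel bandwidth is passed in
directly because this document does not say how :obj:`smoothing` is turned
into a bandwidth, and :obj:`min_impact` is ignored.

```python
import math


def evaluation_points(values, n_points=-3):
    # Hypothetical helper: the observed values plus inserted points,
    # following the n_points rule described above.
    xs = sorted(set(values))
    gaps = len(xs) - 1
    if n_points < 0:
        # Insert abs(n_points) points into every gap.
        per_gap = [-n_points] * gaps
    else:
        # Spread the missing points as evenly as possible over the gaps.
        extra = max(n_points - len(xs), 0)
        per_gap = [extra // gaps + (1 if i < extra % gaps else 0)
                   for i in range(gaps)]
    points = []
    for i, k in enumerate(per_gap):
        left, right = xs[i], xs[i + 1]
        step = (right - left) / (k + 1)
        points.append(left)
        points.extend(left + j * step for j in range(1, k + 1))
    points.append(xs[-1])
    return points


def gaussian_kernel_density(values, x, bandwidth):
    # Average of Gaussian kernels (standard deviation = bandwidth)
    # centred at each observed value.
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                      for v in values)


data = [1.0, 2.0, 4.0]
grid = evaluation_points(data, n_points=-1)
print(grid)  # [1.0, 1.5, 2.0, 3.0, 4.0]
density = [gaussian_kernel_density(data, x, bandwidth=0.5) for x in grid]
```

With ``n_points=-1`` one point is inserted in the middle of each gap; a
positive ``n_points`` not larger than the number of distinct values leaves
just the observed values, since those are always kept.
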
.. class:: Loess

    Bases: :class:`EstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points are inserted between each
            pair of adjacent data points.
        :type n_points: int

    Prepare a probability estimator that computes the probability at point
    ``x`` as a weighted local regression of probabilities for points in the
    window around this point.

    The window contains a prescribed proportion of the original data points.
    The window is as symmetric as possible in the sense that the leftmost
    point in the window is approximately as far from ``x`` as the rightmost.
    The number of points to the left of ``x`` might thus differ from the
    number of points to the right.

    Points are weighted by the tricube weight function: the weight of a point
    at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is :math:`(x-x')/h` and
    :math:`h` is the distance to the farther of the two window edge points.

    :rtype: :class:`EstimatorFromDistribution`

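The window and the weights can be sketched in plain Python 3. The helper
names are hypothetical, the window is approximated by the
``ceil(window_proportion * N)`` observed values nearest to ``x`` (which keeps
its two edges roughly equidistant from ``x``), and the weighted regression
that follows in Loess is omitted.

```python
import math


def loess_window(values, x, window_proportion):
    # Hypothetical helper: the ceil(window_proportion * N) values nearest
    # to x (at least one), returned in ascending order.
    k = max(1, math.ceil(window_proportion * len(values)))
    nearest = sorted(values, key=lambda v: abs(v - x))[:k]
    return sorted(nearest)


def tricube_weights(window, x):
    # h is the distance from x to the farther of the two window edges.
    h = max(abs(x - window[0]), abs(x - window[-1]))
    if h == 0:
        return [1.0] * len(window)  # degenerate window: all points at x
    return [(1 - abs((x - v) / h) ** 3) ** 3 for v in window]


data = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
window = loess_window(data, 2.0, 0.5)
print(window)                        # [1.0, 2.0, 3.0]
print(tricube_weights(window, 2.0))  # [0.0, 1.0, 0.0]
```

Both window edges here lie at distance h from ``x``, so their weight is
zero, exactly as the formula implies; points closer to ``x`` receive weights
approaching 1.
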
.. class:: ConditionalLoess

    Bases: :class:`ConditionalEstimatorConstructor`

    .. method:: __init__(window_proportion, n_points)

        :param window_proportion: A proportion of points in a window.
        :type window_proportion: float

        :param n_points: Number of points for the interpolating curve. If
            negative, say -3 (default), 3 points are inserted between each
            pair of adjacent data points.
        :type n_points: int

    Construct a conditional probability estimator, in other respects similar
    to the one constructed by :class:`Loess`.

    :rtype: :class:`ConditionalEstimatorFromDistribution`


Base Classes
============

All probability estimators are derived from two base classes: one for
unconditional and the other for conditional probability estimation. The same
is true for probability estimator constructors.

.. class:: EstimatorConstructor

    Constructor of an unconditional probability estimator.

    .. method:: __call__([distribution[, apriori]], [instances[, weight_id]])

        :param distribution: input distribution.
        :type distribution: :class:`~Orange.statistics.distribution.Distribution`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If a distribution is given, it can be followed by the prior class
        distribution. Similarly, instances can be followed by the ID of the
        meta attribute with instance weights. (Hint: to pass a prior
        distribution and instances, but no distribution, just pass
        :obj:`None` in place of the distribution.) When both a distribution
        and instances are given, it is up to the constructor to decide which
        to use.

.. class:: Estimator

    Unconditional probability estimator.

    .. attribute:: supports_discrete

        Tells whether the estimator can handle discrete attributes.

    .. attribute:: supports_continuous

        Tells whether the estimator can handle continuous attributes.

    .. method:: __call__([value])

        If a value is given, return the probability of the value.

        :rtype: float

        If the value is omitted, an attempt is made to return a distribution
        of probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

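The optional-argument convention of :obj:`EstimatorConstructor.__call__` can
be mimicked in plain Python 3. Everything here is a hypothetical illustration
rather than Orange code: a distribution is a dict mapping class values to
(weighted) counts, each instance is a dict with a ``'class'`` key and
optionally a weight stored under ``weight_id``, the prior is accepted but
ignored, and the sketch arbitrarily prefers an explicit distribution when
both are given.

```python
class DictEstimator:
    # Hypothetical estimator over a dict {class value: weighted count}.
    supports_discrete = True
    supports_continuous = False

    def __init__(self, distribution):
        self.distribution = distribution
        self.total = float(sum(distribution.values()))

    def __call__(self, value=None):
        if value is None:
            # No value: return probabilities of all values.
            return {c: n / self.total for c, n in self.distribution.items()}
        return self.distribution.get(value, 0) / self.total


class DictEstimatorConstructor:
    # Hypothetical constructor mirroring the optional-argument convention.
    def __call__(self, distribution=None, apriori=None,
                 instances=None, weight_id=None):
        if distribution is None:
            if instances is None:
                raise ValueError('need a distribution or instances')
            # Build the distribution from instances, honouring weights;
            # instances without a weight count as 1.
            distribution = {}
            for inst in instances:
                c = inst['class']
                w = inst.get(weight_id, 1.0) if weight_id is not None else 1.0
                distribution[c] = distribution.get(c, 0) + w
        # apriori is accepted for compatibility but ignored in this sketch.
        return DictEstimator(distribution)


data = [{'class': 'a', 'w': 2.0}, {'class': 'b', 'w': 1.0}, {'class': 'a'}]
estimator = DictEstimatorConstructor()(None, None, data, 'w')
print(estimator('a'))  # 0.75
```

Passing ``None`` as the first argument, as in the hint above, lets the prior
and the instances be given positionally without a distribution.
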
.. class:: ConditionalEstimatorConstructor

    Constructor of a conditional probability estimator.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]])

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        If a contingency table is given, it can be followed by the prior
        class distribution. Similarly, instances can be followed by the ID of
        the meta attribute with instance weights. (Hint: to pass a prior
        distribution and instances, but no contingency table, just pass
        :obj:`None` in place of the table.) When both a contingency table and
        instances are given, it is up to the constructor to decide which to
        use.

.. class:: ConditionalEstimator

    As a counterpart of :class:`Estimator`, this estimator can return
    conditional probabilities.

    .. method:: __call__([[value,] condition_value])

        When given two values, it returns the probability
        :math:`p(value|condition)`.

        :rtype: float

        When given only one value, it is interpreted as the condition; the
        estimator attempts to return a distribution of conditional
        probabilities for all values.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous) or :obj:`NoneType`

        When called without arguments, it returns a matrix containing
        probabilities :math:`p(value|condition)` for each possible
        :math:`value` and :math:`condition` (a contingency table); the
        condition is used as the outer variable.

        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`

        If the estimator cannot return precomputed distributions and/or
        contingencies, it returns :obj:`None`.

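The three calling forms can be mimicked in plain Python 3 with a nested dict
standing in for the contingency table, using the condition as the outer key
as described above. The class ``NestedDictConditionalEstimator`` is
hypothetical and does not use Orange; it assumes the table already holds
conditional probabilities.

```python
class NestedDictConditionalEstimator:
    # Hypothetical: table[condition][value] holds p(value | condition).
    def __init__(self, table):
        self.table = table

    def __call__(self, *args):
        if len(args) == 2:
            value, condition = args
            return self.table[condition][value]   # a single probability
        if len(args) == 1:
            return dict(self.table[args[0]])      # distribution for one condition
        if not args:
            # The whole table, condition as the outer key (copied).
            return {c: dict(row) for c, row in self.table.items()}
        raise TypeError('expected at most two arguments')


estimator = NestedDictConditionalEstimator({
    'sunny': {'yes': 0.25, 'no': 0.75},
    'rainy': {'yes': 0.5, 'no': 0.5},
})
print(estimator('yes', 'sunny'))  # 0.25
print(estimator('rainy'))         # {'yes': 0.5, 'no': 0.5}
```

Returning copies rather than the stored dicts keeps callers from modifying
the precomputed table.
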
Common Components
=================

.. class:: EstimatorFromDistribution

    Bases: :class:`Estimator`

    Probability estimator constructors that compute probabilities for all
    values in advance return this estimator, with the computed probabilities
    stored in the :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed list of probabilities.

    .. method:: __call__([value])

        If a value is given, return the probability of the value. For
        discrete variables, every value has an entry in the
        :obj:`probabilities` attribute. For continuous variables, a linear
        interpolation between the two nearest points is used to compute the
        probability.

        :rtype: float

        If the value is omitted, a copy of :obj:`probabilities` is returned.

        :rtype: :class:`~Orange.statistics.distribution.Distribution`
            (usually :class:`~Orange.statistics.distribution.Discrete` for
            discrete and :class:`~Orange.statistics.distribution.Continuous`
            for continuous)

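For continuous variables, the interpolation step can be sketched in plain
Python 3. The function name and the representation of :obj:`probabilities`
as a sorted list of ``(point, probability)`` pairs are assumptions made for
illustration; this document does not say how values outside the range of
stored points are handled, so the sketch simply clamps to the nearest end
point.

```python
from bisect import bisect_left


def interpolate(points, x):
    # points: sorted list of (x_i, p_i) pairs; linear interpolation at x.
    xs = [px for px, _ in points]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return points[i][1]      # exact hit on a stored point
    if i == 0:
        return points[0][1]      # left of the range: clamp (assumption)
    if i == len(xs):
        return points[-1][1]     # right of the range: clamp (assumption)
    (x0, p0), (x1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


curve = [(0.0, 0.0), (1.0, 0.25), (3.0, 0.75)]
print(interpolate(curve, 2.0))  # 0.5
```
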
.. class:: ConditionalEstimatorFromDistribution

    Bases: :class:`ConditionalEstimator`

    Probability estimator constructors that compute the whole contingency
    table (:class:`Orange.statistics.contingency.Table`) of conditional
    probabilities in advance return this estimator with the table in the
    :obj:`probabilities` attribute.

    .. attribute:: probabilities

        A precomputed contingency table.

    .. method:: __call__([[value,] condition_value])

        For a detailed description of how different combinations of
        parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`. For the behaviour with
        distributions of continuous variables, see the unconditional
        counterpart :obj:`EstimatorFromDistribution.__call__`.

.. class:: ConditionalByRows

    Bases: :class:`ConditionalEstimatorConstructor`

    .. attribute:: estimator_constructor

        An unconditional probability estimator constructor.

    Computes a conditional probability estimator using an unconditional
    probability estimator constructor. The result can be of type
    :class:`ConditionalEstimatorFromDistribution` or
    :class:`ConditionalEstimatorByRows`, depending on the type of
    constructor.

    .. method:: __call__([table[, apriori]], [instances[, weight_id]], estimator)

        :param table: input contingency table.
        :type table: :class:`Orange.statistics.contingency.Table`

        :param apriori: prior distribution.
        :type apriori: :class:`~Orange.statistics.distribution.Distribution`

        :param instances: input data.
        :type instances: :class:`Orange.data.Table`

        :param weight_id: ID of the weight attribute.
        :type weight_id: int

        :param estimator: unconditional probability estimator constructor.
        :type estimator: :class:`EstimatorConstructor`

        Compute the contingency matrix if it has not been computed already.
        Then call :obj:`estimator_constructor` for each value of the
        condition attribute. If all constructed estimators can return a
        distribution of probabilities for all classes (usually either all or
        none can), the :class:`~Orange.statistics.distribution.Distribution`
        instances are put in a contingency table, and a
        :class:`ConditionalEstimatorFromDistribution` is constructed and
        returned. If the constructed estimators are not capable of returning
        distributions of probabilities, a :class:`ConditionalEstimatorByRows`
        is constructed and the estimators are stored in its
        :obj:`estimator_list`.

        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`

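The decision described above can be sketched in plain Python 3. All names
are hypothetical and Orange is not used: the contingency table is assumed to
be already computed and represented as a dict mapping each condition value
to a dict of class counts, the constructor is any callable that turns such a
row into an estimator, and the two tagged return values stand in for
:class:`ConditionalEstimatorFromDistribution` and
:class:`ConditionalEstimatorByRows`.

```python
def conditional_by_rows(contingency, constructor):
    # One unconditional estimator per value of the condition attribute.
    estimators = {cond: constructor(row) for cond, row in contingency.items()}
    # Ask every estimator for its full distribution (None if it cannot).
    distributions = {cond: est() for cond, est in estimators.items()}
    if all(d is not None for d in distributions.values()):
        # Counterpart of ConditionalEstimatorFromDistribution: a table.
        return 'from_distribution', distributions
    # Counterpart of ConditionalEstimatorByRows: keep the estimators.
    return 'by_rows', estimators


def relative_frequency_constructor(row):
    # Hypothetical constructor returning an estimator callable like Estimator.
    total = sum(row.values())

    def estimator(value=None):
        if value is None:
            return {c: n / total for c, n in row.items()}
        return row[value] / total
    return estimator


table = {'sunny': {'yes': 1, 'no': 3}, 'rainy': {'yes': 2, 'no': 2}}
kind, probs = conditional_by_rows(table, relative_frequency_constructor)
print(kind)            # from_distribution
print(probs['sunny'])  # {'yes': 0.25, 'no': 0.75}
```
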
.. class:: ConditionalEstimatorByRows

    Bases: :class:`ConditionalEstimator`

    A conditional probability estimator that itself uses a series of
    estimators, one for each possible condition, stored in its
    :obj:`estimator_list` attribute.

    .. attribute:: estimator_list

        A list of estimators; one for each value of :obj:`condition`.

    .. method:: __call__([[value,] condition_value])

        Uses estimators from :obj:`estimator_list`, depending on the given
        ``condition_value``. For a detailed description of how different
        combinations of parameters are handled, see the inherited
        :obj:`ConditionalEstimator.__call__`.

"""

import Orange
from Orange.core import ProbabilityEstimator as Estimator