Timestamp:
02/25/12 22:42:47 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Moved documentation about statistics.distribution to rst

File:
1 edited

  • Orange/statistics/distribution.py

    r9927 r10372  
    1 """ 
    2 .. index:: Distributions 
    3  
    4 ============= 
    5 Distributions 
    6 ============= 
    7  
    8 :obj:`Distribution` and derived classes store empirical 
    9 distributions of discrete and continuous variables. 
    10  
    11 .. class:: Distribution 
    12  
    13     This class can 
    14     store absolute or relative frequencies. It provides a convenience constructor 
    15     which constructs instances of derived classes. :: 
    16  
    17         >>> import Orange 
    18         >>> data = Orange.data.Table("adult_sample") 
    19         >>> disc = Orange.statistics.distribution.Distribution("workclass", data) 
    20         >>> print disc 
    21         <685.000, 72.000, 28.000, 29.000, 59.000, 43.000, 2.000> 
    22         >>> print type(disc) 
    23         <type 'DiscDistribution'> 
    24  
    25     The resulting distribution is of type :obj:`DiscDistribution` since variable
    26     `workclass` is discrete. The printed numbers are counts of examples that have a particular
    27     attribute value. ::
    28  
    29         >>> workclass = data.domain["workclass"] 
    30         >>> for i in range(len(workclass.values)): 
    31         ...     print "%20s: %5.3f" % (workclass.values[i], disc[i]) 
    32                  Private: 685.000 
    33         Self-emp-not-inc: 72.000 
    34             Self-emp-inc: 28.000 
    35              Federal-gov: 29.000 
    36                Local-gov: 59.000 
    37                State-gov: 43.000 
    38              Without-pay: 2.000 
    39             Never-worked: 0.000 
    40  
    41     Distributions resemble dictionaries, supporting indexing by instances of
    42     :obj:`Orange.data.Value`, integers or floats (depending on the distribution 
    43     type), and symbolic names (if :obj:`variable` is defined). 
    44  
    45     For instance, the number of examples with `workclass="Private"` can be
    46     obtained in three ways::
    47
    48         print "Private: ", disc["Private"]
    49         print "Private: ", disc[0]
    50         print "Private: ", disc[Orange.data.Value(workclass, "Private")]
    51  
    52     Elements cannot be removed from distributions. 
    53  
    54     The length of a distribution equals the number of possible values (for
    55     discrete distributions, if :obj:`variable` is set), the value with the
    56     highest index encountered (if the distribution is discrete and
    57     :obj:`variable` is :obj:`None`), or the number of different values
    58     encountered (for continuous distributions).
    59  
    60     .. attribute:: variable 
    61  
    62         Variable to which the distribution applies; may be :obj:`None` if not 
    63         applicable. 
    64  
    65     .. attribute:: unknowns 
    66  
    67         The number of instances for which the value of the variable was 
    68         undefined. 
    69  
    70     .. attribute:: abs 
    71  
    72         Sum of all elements in the distribution. Usually it equals either 
    73         :obj:`cases` if the instance stores absolute frequencies or 1 if the 
    74         stored frequencies are relative, e.g. after calling :obj:`normalize`. 
    75  
    76     .. attribute:: cases 
    77  
    78         The number of instances from which the distribution is computed, 
    79         excluding those on which the value was undefined. If instances were 
    80         weighted, this is the sum of weights. 
    81  
    82     .. attribute:: normalized 
    83  
    84         :obj:`True` if distribution is normalized. 
    85  
    86     .. attribute:: random_generator 
    87  
    88         A pseudo-random number generator (an instance of :obj:`Orange.misc.Random`) used by method :obj:`random`.
    89  
    90     .. method:: __init__(variable[, data[, weightId=0]]) 
    91  
    92         Construct either :obj:`Discrete` or :obj:`Continuous`,
    93         depending on the variable type. If the variable is the only argument, it
    94         must be an instance of :obj:`Orange.feature.Descriptor`. In that case,
    95         an empty distribution is constructed. If data is given as well, the
    96         variable can also be specified by name or index in the
    97         domain. The constructor then computes the distribution of the specified
    98         variable on the given data. If instances are weighted, the id of the
    99         meta-attribute with weights can be passed as the third argument.
    100  
    101         If variable is given by descriptor, it doesn't need to exist in the 
    102         domain, but it must be computable from given instances. For example, the 
    103         variable can be a discretized version of a variable from data. 
    104  
    105     .. method:: keys() 
    106  
    107         Return a list of possible values (if the distribution is discrete and
    108         :obj:`variable` is set) or a list of encountered values otherwise.
    109  
    110     .. method:: values() 
    111  
    112         Return a list of frequencies of the values returned by :obj:`keys`.
    113  
    114     .. method:: items() 
    115  
    116         Return a list of pairs of elements of the above lists. 
    117  
    118     .. method:: native() 
    119  
    120         Return the distribution as a list (for discrete distributions) or as a
    121         dictionary (for continuous distributions).
    122  
    123     .. method:: add(value[, weight=1]) 
    124  
    125         Increase the count of the element corresponding to ``value`` by 
    126         ``weight``. 
    127  
    128         :param value: Value 
    129         :type value: :obj:`Orange.data.Value`, string (if :obj:`variable` is set), :obj:`int` for discrete distributions or :obj:`float` for continuous distributions 
    130         :param weight: Weight to be added to the count for ``value`` 
    131         :type weight: float 
    132  
    133     .. method:: normalize() 
    134  
    135         Divide the counts by their sum, set :obj:`normalized` to :obj:`True` and
    136         :obj:`abs` to 1. Attributes :obj:`cases` and :obj:`unknowns` are
    137         unchanged. This changes absolute frequencies into relative ones.
    138  
    139     .. method:: modus() 
    140  
    141         Return the most common value. If there are multiple such values, one is 
    142         chosen at random, although the chosen value will always be the same for 
    143         the same distribution. 
    144  
    145     .. method:: random() 
    146  
    147         Return a random value based on the stored empirical probability 
    148         distribution. For continuous distributions, this will always be one of 
    149         the values which actually appeared (e.g. one of the values from 
    150         :obj:`keys`). 
    151  
    152         The method uses :obj:`random_generator`. If none has been constructed or 
    153         assigned yet, a new one is constructed and stored for further use. 
    154  
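The effect of :obj:`normalize` on the stored frequencies can be illustrated in plain Python. This is only a sketch that does not use Orange: the helper name ``normalize`` and the choice to raise an error for an empty distribution are assumptions made for the illustration, not a description of the library's behaviour.

```python
def normalize(counts):
    """Turn absolute counts into relative frequencies that sum to 1."""
    total = float(sum(counts))
    if total == 0:
        # Assumption for this sketch: refuse to normalize an empty distribution.
        raise ValueError("cannot normalize an empty distribution")
    return [c / total for c in counts]

# Counts of `workclass` values, taken from the example above.
counts = [685.0, 72.0, 28.0, 29.0, 59.0, 43.0, 2.0, 0.0]
cases = sum(counts)            # sum of the absolute frequencies (918.0)
relative = normalize(counts)   # relative frequencies; their sum is 1
```

Before normalization the sum of the elements equals the number of cases; afterwards it equals 1, while the number of cases itself is left untouched.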
    155  
    156 .. class:: Discrete 
    157  
    158     Stores a discrete distribution of values. The class differs from its parent 
    159     class in having a few additional constructors. 
    160  
    161     .. method:: __init__(variable) 
    162  
    163         Construct an instance of :obj:`Discrete` and set the variable 
    164         attribute. 
    165  
    166         :param variable: A discrete variable 
    167         :type variable: Orange.feature.Discrete 
    168  
    169     .. method:: __init__(frequencies) 
    170  
    171         Construct an instance and initialize the frequencies from the list, but 
    172         leave `Distribution.variable` empty. 
    173  
    174         :param frequencies: A list of frequencies 
    175         :type frequencies: list 
    176  
    177         Distribution constructed in this way can be used, for instance, to 
    178         generate random numbers from a given discrete distribution:: 
    179  
    180             disc = Orange.statistics.distribution.Discrete([0.5, 0.3, 0.2]) 
    181             for i in range(20): 
    182                 print disc.random(), 
    183  
    184         This prints out approximately ten 0's, six 1's and four 2's. The values
    185         can be named by assigning a variable:: 
    186  
    187             v = Orange.feature.Discrete(values=["red", "green", "blue"])
    188             disc.variable = v 
    189  
    190     .. method:: __init__(distribution) 
    191  
    192         Copy constructor; makes a shallow copy of the given distribution.
    193  
    194         :param distribution: An existing discrete distribution 
    195         :type distribution: Discrete 
    196  
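Drawing random values from a list of frequencies, as in the ``[0.5, 0.3, 0.2]`` example, can be sketched in plain Python without Orange. The function ``sample_index``, the fixed seed and the cumulative-sum approach are assumptions chosen for the illustration; the library's own generator and algorithm may differ.

```python
import random
from collections import Counter

def sample_index(frequencies, rng):
    """Draw an index with probability proportional to its frequency."""
    threshold = rng.random() * sum(frequencies)
    cumulative = 0.0
    for index, freq in enumerate(frequencies):
        cumulative += freq
        if threshold < cumulative:
            return index
    return len(frequencies) - 1  # guard against floating-point round-off

rng = random.Random(0)  # arbitrary fixed seed so that reruns give the same draws
frequencies = [0.5, 0.3, 0.2]
draws = [sample_index(frequencies, rng) for _ in range(20)]
tally = Counter(draws)                    # how often each index was drawn
expected = [20 * f for f in frequencies]  # 10, 6 and 4 draws on average
```

With only 20 draws the tally will usually deviate from the expected counts by a few units, which is why the text above says "approximately".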
    197  
    198 .. class:: Continuous 
    199  
    200     Stores a continuous distribution, that is, a dictionary-like structure with 
    201     values and their frequencies. 
    202  
    203     .. method:: __init__(variable) 
    204  
    205         Construct an instance of :obj:`Continuous` and set the variable
    206         attribute.
    207  
    208         :param variable: A continuous variable 
    209         :type variable: Orange.feature.Continuous 
    210  
    211     .. method:: __init__(frequencies) 
    212  
    213         Construct an instance of :obj:`Continuous` and initialize it from 
    214         the given dictionary with frequencies, whose keys and values must be integers. 
    215  
    216         :param frequencies: Values and their corresponding frequencies 
    217         :type frequencies: dict 
    218  
    219     .. method:: __init__(distribution) 
    220  
    221         Copy constructor; makes a shallow copy of the given distribution.
    222  
    223         :param distribution: An existing continuous distribution 
    224         :type distribution: Continuous 
    225  
    226     .. method:: average() 
    227  
    228         Return the average value. Note that the average can also be
    229         computed using simpler and faster classes from module
    230         :obj:`Orange.statistics.basic`.
    231  
    232     .. method:: var() 
    233  
    234         Return the variance of distribution. 
    235  
    236     .. method:: dev() 
    237  
    238         Return the standard deviation. 
    239  
    240     .. method:: error() 
    241  
    242         Return the standard error. 
    243  
    244     .. method:: percentile(p) 
    245  
    246         Return the value at the `p`-th percentile. 
    247  
    248         :param p: The percentile, must be between 0 and 100 
    249         :type p: float 
    250         :rtype: float 
    251  
    252         For example, if `d_age` is a continuous distribution, the quartiles can
    253         be printed by::
    254
    255             print "Quartiles: %5.3f - %5.3f - %5.3f" % (
    256                  d_age.percentile(25), d_age.percentile(50), d_age.percentile(75))
    257  
    258     .. method:: density(x)
    259  
    260         Return the probability density at `x`. If the value is not in 
    261         :obj:`Distribution.keys`, it is interpolated. 
    262  
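The statistics of a continuous distribution can be reproduced from a plain dictionary of values and their frequencies. The sketch below does not use Orange and rests on two assumptions: the variance divides by the number of cases rather than by one less, and the percentile uses the nearest-rank rule without interpolation. The library may use different conventions, and the data are made up.

```python
import math

def summarize(freqs):
    """Return (average, variance, deviation, error) of a value -> frequency dict."""
    n = float(sum(freqs.values()))
    average = sum(value * f for value, f in freqs.items()) / n
    # Assumption: population variance (divide by n, not by n - 1).
    variance = sum(f * (value - average) ** 2 for value, f in freqs.items()) / n
    deviation = math.sqrt(variance)
    error = deviation / math.sqrt(n)  # standard error of the mean
    return average, variance, deviation, error

def percentile(freqs, p):
    """Smallest observed value whose cumulative frequency reaches p percent."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    target = sum(freqs.values()) * p / 100.0
    cumulative = 0.0
    for value in sorted(freqs):
        cumulative += freqs[value]
        if cumulative >= target:
            return value
    return max(freqs)  # only reached through floating-point round-off

ages = {20.0: 2, 30.0: 5, 40.0: 2, 50.0: 1}  # made-up value -> frequency data
average, variance, deviation, error = summarize(ages)
quartiles = [percentile(ages, p) for p in (25, 50, 75)]
```

For the ten made-up cases the average is 32 and the variance is 76; the quartiles under the nearest-rank rule are 30, 30 and 40.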
    263  
    264 .. class:: Gaussian 
    265  
    266     A class imitating :obj:`Continuous` by returning the statistics and
    267     densities for a Gaussian distribution. The class is meant only as a
    268     convenient substitute in code which expects an instance of
    269     :obj:`Distribution`. For general use, the Python module :obj:`random`
    270     provides a comprehensive set of functions for various random distributions.
    271  
    272     .. attribute:: mean 
    273  
    274         The mean value parameter of the Gaussian distribution.
    275
    276     .. attribute:: sigma
    277
    278         The standard deviation of the distribution.
    279  
    280     .. attribute:: abs 
    281  
    282         The simulated number of instances; in effect, the Gaussian distribution
    283         density, as returned by method :obj:`density`, is multiplied by
    284         :obj:`abs`.
    285  
    286     .. method:: __init__([mean=0, sigma=1]) 
    287  
    288         Construct an instance, set :obj:`mean` and :obj:`sigma` to the given 
    289         values and :obj:`abs` to 1. 
    290  
    291     .. method:: __init__(distribution) 
    292  
    293         Construct a distribution which approximates the given distribution,
    294         which must be either :obj:`Continuous`, in which case its
    295         average and deviation will be used for mean and sigma, or an existing
    296         :obj:`Gaussian`, which will be copied. Attribute :obj:`abs`
    297         is set to the given distribution's ``abs``.
    298  
    299     .. method:: average() 
    300  
    301         Return :obj:`mean`. 
    302  
    303     .. method:: dev() 
    304  
    305         Return :obj:`sigma`. 
    306  
    307     .. method:: var() 
    308  
    309         Return square of :obj:`sigma`. 
    310  
    311     .. method:: density(x) 
    312  
    313         Return the density at point ``x``, that is, the Gaussian distribution 
    314         density multiplied by :obj:`abs`. 
    315  
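The scaled density described for :obj:`Gaussian` is the usual normal density multiplied by the simulated number of instances. A minimal plain-Python sketch follows; the function name ``gaussian_density`` and the parameter name ``abs_`` are hypothetical and chosen only for this illustration.

```python
import math

def gaussian_density(x, mean=0.0, sigma=1.0, abs_=1.0):
    """Normal density at x, scaled by abs_ (the simulated number of instances)."""
    z = (x - mean) / sigma
    return abs_ * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

peak = gaussian_density(0.0)                # standard normal at its mean, about 0.3989
scaled = gaussian_density(0.0, abs_=100.0)  # same point with 100 simulated instances
```

With the scale left at 1 the function is an ordinary probability density; a larger value scales the curve so that it is comparable with absolute frequencies.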
    316  
    317 Class distributions 
    318 =================== 
    319  
    320 There is a convenience function for computing empirical class distributions from 
    321 data. 
    322  
    323 .. function:: getClassDistribution(data[, weightID=0]) 
    324  
    325     Return a class distribution for the given data. 
    326  
    327     :param data: A set of instances. 
    328     :type data: Orange.data.Table 
    329     :param weightID: An id for meta attribute with weights of instances 
    330     :type weightID: int 
    331     :rtype: :obj:`Discrete` or :obj:`Continuous`, depending on the class type 
    332  
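What a weighted class distribution contains can be sketched in plain Python, without Orange. The function ``class_distribution`` and its arguments are hypothetical; it only illustrates that each instance contributes its weight, or 1 when no weights are given, to the count of its class.

```python
from collections import defaultdict

def class_distribution(labels, weights=None):
    """Sum the weights of instances per class label (weight 1 when unweighted)."""
    if weights is None:
        weights = [1.0] * len(labels)
    dist = defaultdict(float)
    for label, weight in zip(labels, weights):
        dist[label] += weight
    return dict(dist)

unweighted = class_distribution(["a", "b", "a"])
weighted = class_distribution(["a", "b", "a"], weights=[0.5, 2.0, 1.5])
```
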
    333 Distributions of all variables 
    334 ============================== 
    335  
    336 Distributions of all variables can be computed and stored in 
    337 :obj:`Domain`. The list-like object can be indexed by variable 
    338 indices in the domain, as well as by variables and their names. 
    339  
    340 .. class:: Domain 
    341  
    342     .. method:: __init__(data[, weightID=0]) 
    343  
    344         Construct an instance with distributions of all discrete and continuous 
    345         variables from the given data. 
    346  
    347         :param data: A set of instances.
    348         :type data: Orange.data.Table
    349         :param weightID: An id for meta attribute with weights of instances
    350         :type weightID: int
    351  
    352 The script below computes distributions for all attributes in the data and 
    353 prints out distributions for discrete and averages for continuous attributes. :: 
    354  
    355     dist = Orange.statistics.distribution.Domain(data) 
    356  
    357     for d in dist: 
    358         if d.variable.var_type == Orange.feature.Type.Discrete: 
    359              print "%30s: %s" % (d.variable.name, d) 
    360         else: 
    361              print "%30s: avg. %5.3f" % (d.variable.name, d.average()) 
    362  
    363 The distribution for, say, attribute `age` can be obtained by its index and also 
    364 by its name:: 
    365  
    366     dist_age = dist["age"] 
    367  
    368 """ 
    369  
    370  
    371 1 from Orange.core import Distribution
    372 2 from Orange.core import DiscDistribution as Discrete