Changeset 7659:0f8fb9f7cd72 in orange


Timestamp:
02/12/11 16:48:28 (3 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Convert:
e506a7c6637fb66a50ecee74e30404c736604673
Message:

Reorganized Orange.statistics into submodules

Location:
orange
Files:
1 added
1 deleted
6 edited

  • orange/Orange/__init__.py

    r7595 r7659  
    1919import statistics 
    2020import statistics.estimate 
    21 import statistics.distributions 
     21import statistics.contingency 
     22import statistics.distribution 
     23import statistics.basic 
    2224import statistics.evd 
    2325 
  • orange/Orange/distances/__init__.py

    r7604 r7659  
    3838.. class:: ExamplesDistanceConstructor 
    3939 
    40     .. method:: __call__([instances, weightID][, DomainDistributions][, DomainBasicAttrStat]) 
     40    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
    4141 
    4242        Constructs an instance of ExamplesDistance. 
    4343        Not all the data needs to be given. Most measures can be constructed 
    44         from DomainBasicAttrStat; if it is not given, they can help themselves 
    45         either by instances or DomainDistributions. 
     44        from basic_var_stat; if it is not given, they can help themselves 
     45        either by instances or distributions. 
    4646        Some (e.g. ExamplesDistance_Hamming) even do not need any arguments. 
    4747 
     
    118118    .. method:: distributions  
    119119 
    120         An object of type DomainDistributions that holds the distributions 
    121         for all discrete features. This is needed to compute distances between 
    122         known and unknown values. 
     120        An object of type 
     121        :obj:`Orange.statistics.distribution.Distribution` that holds 
     122        the distributions for all discrete features used for 
     123        computation of distances between known and unknown values. 
    123124 
    124125    .. method:: bothSpecialDist 
  • orange/Orange/feature/scoring.py

    r7552 r7659  
    8181    .. attribute:: needs 
    8282     
    83     Tells what kind of data the measure needs. This can be either  
    84     :obj:`NeedsGenerator`, :obj:`NeedsDomainContingency`,  
    85     :obj:`NeedsContingency_Class`. The first need an instance generator 
    86     (Relief is an example of such measure), the second can compute the quality 
    87     from :obj:`Orange.statistics.distributions.DomainContingency` and the 
    88     latter only needs the contingency 
    89     (:obj:`Orange.statistics.distributions.ContingencyAttrClass`) the  
    90     feature distribution and the apriori class distribution. Most measures 
    91     only need the latter. 
     83    Tells what kind of data the measure needs. This can be either 
     84    :obj:`NeedsGenerator`, :obj:`NeedsDomainContingency`, 
     85    :obj:`NeedsContingency_Class`. The first needs an instance generator (Relief 
     86    is an example of such a measure), the second can compute the quality from 
     87    :obj:`Orange.statistics.contingency.Domain` and the latter only needs the 
     88    contingency (:obj:`Orange.statistics.contingency.VarClass`), the feature 
     89    distribution and the apriori class distribution. Most measures only need the 
     90    latter. 
    9291 
    9392    Several (but not all) measures can treat unknown feature values in 
     
    150149          feature in the domain, though. 
    151150           
    152         Data is given either as examples (and, optionally, id for  
    153         meta-feature with weight), domain contingency 
    154         (:obj:`Orange.statistics.distributions.DomainContingency`) (a list of 
    155         contingencies) or distribution (:obj:`Orange.statistics.distributions`) 
    156         matrix and :obj:`Orange.statistics.distributions.Distribution`. If  
    157         you use the latter form, what you should give as the class distribution 
    158         depends upon what you do with unknown values (if there are any). 
    159         If :obj:`unknownsTreatment` is :obj:`IgnoreUnknowns`, the class 
    160         distribution should be computed on examples for which the feature 
    161         value is defined. Otherwise, class distribution should be the overall 
    162         class distribution. 
     151        Data is given either as examples (and, optionally, id for meta-feature 
     152        with weight), contingency tables 
     153        (:obj:`Orange.statistics.contingency.Domain`) or distributions 
     154        (:obj:`Orange.statistics.distribution.Distribution`) for all 
     155        attributes. In the latter form, what is given as the class distribution 
     156        depends upon what you do with unknown values (if there are any).  If 
     157        :obj:`unknownsTreatment` is :obj:`IgnoreUnknowns`, the class 
     158        distribution should be computed on examples for which the feature value 
     159        is defined. Otherwise, class distribution should be the overall class 
     160        distribution. 
    163161 
    164162        The optional argument with apriori class distribution is 
  • orange/Orange/statistics/__init__.py

    r7530 r7659  
     1import distributions, estimate, evd 
  • orange/Orange/statistics/distributions.py

    r7628 r7659  
    11""" 
    2  
    3 Orange has several classes for computing and storing basic statistics about 
    4 features, distributions and contingencies. 
    5  
    6      
    7 ========================================= 
    8 Basic statistics for continuous variables 
    9 ========================================= 
    10  
    11 There are two simple classes for computing basic statistics 
    12 for continuous features, such as their minimal and maximal value 
    13 or average: :class:`BasicStatistics` holds the statistics for a single variable 
    14 and :class:`DomainBasicStatistics` behaves like a list of instances of 
    15 the above class for all variables in the domain. 
    16  
    17 .. class:: BasicStatistics 
    18  
    19     ``BasicStatistics`` computes and stores minimal, maximal, average and 
    20     standard deviation of a variable. It does not include the median or any 
    21     other statistics that cannot be computed on the fly, without remembering the 
    22     data; such statistics can be obtained using :obj:`ContDistribution`. 
    23  
    24     Instances of this class are seldom constructed manually; they are more often 
    25     returned by :obj:`DomainBasicStatistics` described below. 
    26  
    27     .. attribute:: variable 
    28      
    29         The variable to which the data applies. 
    30  
    31     .. attribute:: min 
    32  
    33         Minimal value encountered 
    34  
    35     .. attribute:: max 
    36  
    37         Maximal value encountered 
    38  
    39     .. attribute:: avg 
    40  
    41         Average value 
    42  
    43     .. attribute:: dev 
    44  
    45         Standard deviation 
    46  
    47     .. attribute:: n 
    48  
    49         Number of instances for which the value was defined. 
    50         If instances were weighted, :obj:`n` holds the sum of weights 
    51          
    52     .. attribute:: sum 
    53  
    54         Weighted sum of values 
    55  
    56     .. attribute:: sum2 
    57  
    58         Weighted sum of squared values 
    59  
    60     .. 
    61         .. attribute:: holdRecomputation 
    62      
    63             Holds recomputation of the average and standard deviation. 
    64  
    65     .. method:: add(value[, weight=1]) 
    66      
    67         Add a value to the statistics: adjust :obj:`min` and :obj:`max` if 
    68         necessary, increase :obj:`n` and recompute :obj:`sum`, :obj:`sum2`, 
    69         :obj:`avg` and :obj:`dev`. 
    70  
    71         :param value: Value to be added to the statistics 
    72         :type value: float 
    73         :param weight: Weight assigned to the value 
    74         :type weight: float 
    75  
    76     .. 
    77         .. method:: recompute() 
    78  
    79             Recompute the average and deviation. 
    80  
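The bookkeeping described for ``add`` can be sketched in plain Python. This is only an illustration, not Orange's implementation: the class name ``RunningStats`` is hypothetical, and the sketch assumes the population form of the standard deviation (dividing by the sum of weights), which the documentation above does not specify.

```python
import math


class RunningStats(object):
    """Hypothetical stand-in for BasicStatistics (not Orange's code)."""

    def __init__(self):
        self.min = None
        self.max = None
        self.n = 0.0      # sum of weights of the values seen so far
        self.sum = 0.0    # weighted sum of values
        self.sum2 = 0.0   # weighted sum of squared values
        self.avg = 0.0
        self.dev = 0.0

    def add(self, value, weight=1.0):
        # Adjust the extremes if necessary.
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value
        # Update the accumulators; the data itself is not remembered.
        self.n += weight
        self.sum += weight * value
        self.sum2 += weight * value * value
        self.recompute()

    def recompute(self):
        # Assumption: population variance, E[x^2] - E[x]^2.
        if self.n > 0:
            self.avg = self.sum / self.n
            variance = self.sum2 / self.n - self.avg ** 2
            self.dev = math.sqrt(max(variance, 0.0))


stats = RunningStats()
for v in [1.0, 2.0, 3.0, 4.0]:
    stats.add(v)

weighted = RunningStats()
weighted.add(2.0, weight=2.0)
weighted.add(5.0, weight=1.0)
```

Because only ``n``, ``sum`` and ``sum2`` are kept, the average and deviation can be refreshed after every call, while statistics such as the median remain out of reach.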
    81 .. class:: DomainBasicStatistics 
    82  
    83     ``DomainBasicStatistics`` behaves like an ordinary list, except that its 
    84     elements can also be indexed by variable names or descriptors. 
    85  
    86     .. method:: __init__(data[, weight=None]) 
    87  
    88         Compute the statistics for all continuous features in the data, and put 
    89         :obj:`None` to the places corresponding to variables of other types. 
    90  
    91         :param data: A table of instances 
    92         :type data: Orange.data.Table 
    93         :param weight: The id of the meta-attribute with weights 
    94         :type weight: `int` or none 
    95          
    96     .. method:: purge() 
    97      
    98         Remove the :obj:`None`'s corresponding to non-continuous features; this 
    99         truncates the list, so the indices no longer correspond to indices of 
    100         variables in the domain. 
    101      
    102     part of `distributions-basic-stat.py`_ (uses iris.tab) 
    103      
    104     .. literalinclude:: code/distributions-basic-stat.py 
    105         :lines: 1-10 
    106  
    107     Output:: 
    108  
    109              feature   min   max   avg 
    110         sepal length 4.300 7.900 5.843 
    111          sepal width 2.000 4.400 3.054 
    112         petal length 1.000 6.900 3.759 
    113          petal width 0.100 2.500 1.199 
    114  
    115  
    116     part of `distributions-basic-stat`_ (uses iris.tab) 
    117      
    118     .. literalinclude:: code/distributions-basic-stat.py 
    119         :lines: 11- 
    120  
    121     Output:: 
    122  
    123         5.84333467484  
    124  
    125 .. _distributions-basic-stat: code/distributions-basic-stat.py 
    126 .. _distributions-basic-stat.py: code/distributions-basic-stat.py 
    127  
    128  
    129 ================================ 
    130 Distributions of variable values 
    131 ================================ 
    1322 
    1333Class :obj:`Distribution` and derived classes are used for storing empirical 
     
    279149 
    280150 
    281 .. class:: DiscDistribution 
     151.. class:: Discrete 
    282152 
    283153    Stores a discrete distribution of values. The class differs from its parent 
     
    286156    .. method:: __init__(variable) 
    287157 
    288         Construct an instance of :obj:`DiscDistribution` and set the variable 
     158        Construct an instance of :obj:`Discrete` and set the variable 
    289159        attribute. 
    290160 
     
    303173    generate random numbers from a given discrete distribution:: 
    304174 
    305             disc = orange.DiscDistribution([0.5, 0.3, 0.2]) 
     175            disc = Orange.statistics.distribution.Discrete([0.5, 0.3, 0.2]) 
    306176            for i in range(20): 
    307177                print disc.random(), 
     
    318188 
    319189        :param distribution: An existing discrete distribution 
    320         :type distribution: DiscDistribution 
    321  
    322  
    323 .. class:: ContDistribution 
     190        :type distribution: Discrete 
     191 
     192 
     193.. class:: Continuous 
    324194 
    325195    Stores a continuous distribution, that is, a dictionary-like structure with 
     
    336206    .. method:: __init__(frequencies) 
    337207 
    338         Construct an instance of :obj:`ContDistribution` and initialize it from 
     208        Construct an instance of :obj:`Continuous` and initialize it from 
    339209        the given dictionary with frequencies, whose keys and values must be integers. 
    340210 
     
    347217 
    348218        :param distribution: An existing continuous distribution 
    349         :type distribution: ContDistribution 
     219        :type distribution: Continuous 
    350220 
    351221    .. method:: average() 
    352222 
    353         Return the average value. Note that the average can also be computed 
    354         using a simpler and faster class 
    355         :obj:`Orange.statistics.distributions.BasicStatistics`. 
     223        Return the average value. Note that the average can also be 
     224        computed using simpler and faster classes from module 
     225        :obj:`Orange.statistics.basic`. 
    356226 
    357227    .. method:: var() 
     
    387257 
    388258 
    389 .. class:: GaussianDistribution 
    390  
    391     A class imitating :obj:`ContDistribution` by returning the statistics and 
     259.. class:: Gaussian 
     260 
     261    A class imitating :obj:`Continuous` by returning the statistics and 
    392262    densities for Gaussian distribution. The class is not meant only for a 
    393263    convenient substitution for code which expects an instance of 
     
    417287 
    418288        Construct a distribution which approximates the given distribution, 
    419         which must be either :obj:`ContDistribution`, in which case its 
     289        which must be either :obj:`Continuous`, in which case its 
    420290    average and deviation will be used for mean and sigma, or an existing 
    421291        :obj:`GaussianDistribution`, which will be copied. Attribute :obj:`abs` 
     
    454324    :param weightID: An id for meta attribute with weights of instances 
    455325    :type weightID: int 
    456     :rtype: :obj:`DiscDistribution` or :obj:`ContDistribution`, depending on the class type 
     326    :rtype: :obj:`Discrete` or :obj:`Continuous`, depending on the class type 
    457327 
    458328Distributions of all variables 
     
    460330 
    461331Distributions of all variables can be computed and stored in 
    462 :obj:`DomainDistributions`. The list-like object can be indexed by variable 
     332:obj:`Domain`. The list-like object can be indexed by variable 
    463333indices in the domain, as well as by variables and their names. 
    464334 
    465 .. class:: DomainDistributions 
     335.. class:: Domain 
    466336 
    467337    .. method:: __init__(data[, weightID=0]) 
     
    478348prints out distributions for discrete and averages for continuous attributes. :: 
    479349 
    480     dist = orange.DomainDistributions(data) 
     350    dist = Orange.statistics.distribution.Domain(data) 
    481351 
    482352        for d in dist: 
     
    484354                 print "%30s: %s" % (d.variable.name, d) 
    485355        else: 
     356                 
    486357                 print "%30s: avg. %5.3f" % (d.variable.name, d.average()) 
    487358 
     
    491362    dist_age = dist["age"] 
    492363 
    493 ================== 
    494 Contingency matrix 
    495 ================== 
    496  
    497 Contingency matrix contains conditional distributions. Unless explicitly 
    498 'normalized', they contain absolute frequencies, that is, the number of 
    499 instances with a particular combination of two variables' values. If they are 
    500 normalized by dividing each cell by the row sum, they represent conditional 
    501 probabilities of the column variable (here denoted as ``innerVariable``) 
    502 conditioned by the row variable (``outerVariable``). 
    503  
    504 Contingency matrices are usually constructed for discrete variables. Matrices 
    505 for continuous variables have certain limitations described in a :ref:`separate 
    506 section <contcont>`. 
    507  
    508 The example below loads the monks-1 data set and prints out the conditional 
    509 class distribution given the value of `e`. 
    510  
    511 .. _distributions-contingency: code/distributions-contingency.py 
    512  
    513 part of `distributions-contingency`_ (uses monks-1.tab) 
    514  
    515 .. literalinclude:: code/distributions-contingency.py 
    516     :lines: 1-8 
    517  
    518 This code prints out:: 
    519  
    520     1 <0.000, 108.000> 
    521     2 <72.000, 36.000> 
    522     3 <72.000, 36.000> 
    523     4 <72.000, 36.000>  
    524  
    525 Contingencies behave like lists of distributions (in this case, class 
    526 distributions) indexed by values (of `e`, in this 
    527 example). Distributions are, in turn indexed by values (class values, 
    528 here). The variable `e` from the above example is called the outer 
    529 variable, and the class is the inner. This can also be reversed. It is 
    530 also possible to use features for both, outer and inner variable, so 
    531 the matrix shows distributions of one variable's values given the 
    532 value of another.  There is a corresponding hierarchy of classes: 
    533 :obj:`Contingency` is a base class for :obj:`ContingencyVarVar` (both 
    534 variables are attributes) and :obj:`ContingencyClass` (one variable is 
    535 the class).  The latter is the base class for 
    536 :obj:`ContingencyVarClass` and :obj:`ContingencyClassVar`. 
    537  
    538 The most commonly used of the above classes is :obj:`ContingencyVarClass` which 
    539 can compute and store conditional probabilities of classes given the feature value. 
    540  
    541 Contingency matrices 
    542 ==================== 
    543  
    544 .. class:: Contingency 
    545  
    546     Provides a base class for storing and manipulating contingency 
    547     matrices. Although it is not abstract, it is seldom used directly but rather 
    548     through more convenient derived classes described below. 
    549  
    550     .. attribute:: outerVariable 
    551  
    552        Outer variable (:class:`Orange.data.feature.Feature`) whose values are 
    553        used as the first, outer index. 
    554  
    555     .. attribute:: innerVariable 
    556  
    557        Inner variable (:class:`Orange.data.feature.Feature`), whose values are 
    558        used as the second, inner index. 
    559   
    560     .. attribute:: outerDistribution 
    561  
    562         The marginal distribution (:class:`Distribution`) of the outer variable. 
    563  
    564     .. attribute:: innerDistribution 
    565  
    566         The marginal distribution (:class:`Distribution`) of the inner variable. 
    567          
    568     .. attribute:: innerDistributionUnknown 
    569  
    570         The distribution (:class:`Distribution`) of the inner variable for 
    571         instances for which the outer variable was undefined. This is the 
    572         difference between the ``innerDistribution`` and (unconditional) 
    573         distribution of inner variable. 
    574        
    575     .. attribute:: varType 
    576  
    577         The type of the outer variable (:obj:`Orange.data.Type`, usually 
    578         :obj:`Orange.data.feature.Discrete` or 
    579         :obj:`Orange.data.feature.Continuous`); equals 
    580         ``outerVariable.varType`` and ``outerDistribution.varType``. 
    581  
    582     .. method:: __init__(outer_variable, inner_variable) 
    583       
    584         Construct an instance of ``Contingency`` for the given pair of 
    585         variables. 
    586       
    587         :param outer_variable: Descriptor of the outer variable 
    588         :type outer_variable: Orange.data.feature.Feature 
    589         :param inner_variable: Descriptor of the inner variable 
    590         :type inner_variable: Orange.data.feature.Feature 
    591          
    592     .. method:: add(outer_value, inner_value[, weight=1]) 
    593      
    594         Add an element to the contingency matrix by adding ``weight`` to the 
    595         corresponding cell. 
    596  
    597         :param outer_value: The value for the outer variable 
    598         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    599         :param inner_value: The value for the inner variable 
    600         :type inner_value: int, float, string or :obj:`Orange.data.Value` 
    601         :param weight: Instance weight 
    602         :type weight: float 
    603  
    604     .. method:: normalize() 
    605  
    606         Normalize all distributions (rows) in the matrix to sum to ``1``:: 
    607          
    608             >>> cont.normalize() 
    609             >>> for val, dist in cont.items(): 
    610                    print val, dist 
    611  
    612         Output: :: 
    613  
    614             1 <0.000, 1.000> 
    615             2 <0.667, 0.333> 
    616             3 <0.667, 0.333> 
    617             4 <0.667, 0.333> 
    618  
    619         .. note:: 
    620         
    621             This method does not change the ``innerDistribution`` or 
    622             ``outerDistribution``. 
    623          
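As a rough illustration of what row normalization does, the following sketch uses plain dictionaries and lists rather than Orange objects. The helper ``normalize_rows`` is hypothetical; the counts are the monks-1 figures printed earlier in this section. Leaving all-zero rows unchanged is an assumption of the sketch, not documented behaviour.

```python
def normalize_rows(cont):
    """Return a copy in which every row (distribution) sums to 1.

    Rows whose sum is zero are copied unchanged (an assumption of this
    sketch) to avoid division by zero.
    """
    result = {}
    for key, row in cont.items():
        total = float(sum(row))
        if total:
            result[key] = [x / total for x in row]
        else:
            result[key] = list(row)
    return result


# Absolute frequencies: outer index is the value of e, inner index is the class.
cont = {
    "1": [0.0, 108.0],
    "2": [72.0, 36.0],
    "3": [72.0, 36.0],
    "4": [72.0, 36.0],
}

normalized = normalize_rows(cont)
for val in sorted(normalized):
    print("%s <%s>" % (val, ", ".join("%.3f" % p for p in normalized[val])))
```

Each printed row matches the normalized output shown above, for instance ``2 <0.667, 0.333>``, and the original ``cont`` is left untouched.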
    624     With respect to indexing, contingency matrix is a cross between dictionary 
    625     and a list. It supports standard dictionary methods ``keys``, ``values`` and 
    626     ``items``. :: 
    627  
    628         >>> print cont.keys() 
    629         ['1', '2', '3', '4'] 
    630         >>> print cont.values() 
    631         [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>] 
    632         >>> print cont.items() 
    633         [('1', <0.000, 108.000>), ('2', <72.000, 36.000>), 
    634         ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]  
    635  
    636     Although keys returned by the above functions are strings, contingency can 
    637     be indexed by anything that can be converted into values of the outer 
    638     variable: strings, numbers or instances of ``Orange.data.Value``. :: 
    639  
    640         >>> print cont[0] 
    641         <0.000, 108.000> 
    642         >>> print cont["1"] 
    643         <0.000, 108.000> 
    644         >>> print cont[orange.Value(data.domain["e"], "1")]  
    645  
    646     The length of ``Contingency`` equals the number of values of the outer 
    647     variable. However, iterating through contingency 
    648     does not return keys, as with dictionaries, but distributions. :: 
    649  
    650         >>> for i in cont: 
    651         ...     print i 
    652         <0.000, 108.000> 
    653         <72.000, 36.000> 
    654         <72.000, 36.000> 
    655         <72.000, 36.000> 
    657  
    658  
    659 .. class:: ContingencyClass 
    660  
    661     An abstract base class for contingency matrices that contain the class, 
    662     either as the inner or the outer variable. 
    663  
    664     .. attribute:: classVar (read only) 
    665      
    666         The class attribute descriptor; always equal to either 
    667         :obj:`Contingency.innerVariable` or :obj:`Contingency.outerVariable`. 
    668  
    669     .. attribute:: variable 
    670      
    671         Variable; always equal to either innerVariable or outerVariable 
    672  
    673     .. method:: add_attrclass(variable_value, class_value[, weight=1]) 
    674  
    675         Add an element to contingency by increasing the corresponding count. The 
    676         difference between this and :obj:`Contingency.add` is that the variable 
    677         value is always the first argument and class value the second, 
    678         regardless of which one is inner and which one is outer. 
    679  
    680         :param variable_value: Variable value 
    681         :type variable_value: int, float, string or :obj:`Orange.data.Value` 
    682         :param class_value: Class value 
    683         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    684         :param weight: Instance weight 
    685         :type weight: float 
    686  
    687  
    688 .. class:: ContingencyVarClass 
    689  
    690     A class derived from :obj:`ContingencyClass` in which the variable is 
    691     used as :obj:`Contingency.outerVariable` and class as the 
    692     :obj:`Contingency.innerVariable`. This form is suitable for 
    693     computation of conditional class probabilities given the variable value. 
    694      
    695     Calling :obj:`ContingencyVarClass.add_attrclass(v, c)` is equivalent to 
    696     :obj:`Contingency.add(v, c)`. Like :obj:`Contingency`, 
    697     :obj:`ContingencyVarClass` can compute contingency from instances. 
    698  
    699     .. method:: __init__(feature, class_variable) 
    700  
    701         Construct an instance of :obj:`ContingencyVarClass` for the given pair of 
    702         variables. Inherited from :obj:`Contingency`. 
    703  
    704         :param feature: Outer variable 
    705         :type feature: Orange.data.feature.Feature 
    706         :param class_variable: Class variable; used as ``innerVariable`` 
    707         :type class_variable: Orange.data.feature.Feature 
    708          
    709     .. method:: __init__(feature, data[, weightId]) 
    710  
    711         Compute the contingency from data. 
    712  
    713         :param feature: Outer variable 
    714         :type feature: Orange.data.feature.Feature 
    715         :param data: A set of instances 
    716         :type data: Orange.data.Table 
    717         :param weightId: meta attribute with weights of instances 
    718         :type weightId: int 
    719  
    720     .. method:: p_class(value) 
    721  
    722         Return the probability distribution of classes given the value of the 
    723         variable. 
    724  
    725         :param value: The value of the variable 
    726         :type value: int, float, string or :obj:`Orange.data.Value` 
    727         :rtype: Orange.statistics.distribution.Distribution 
    728  
    729  
    730     .. method:: p_class(value, class_value) 
    731  
    732         Returns the conditional probability of the class_value given the 
    733         feature value, p(class_value|value) (note the order of arguments!) 
    734          
    735         :param value: The value of the variable 
    736         :type value: int, float, string or :obj:`Orange.data.Value` 
    737         :param class_value: The class value 
    738         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    739         :rtype: float 
    740  
    741     .. _distributions-contingency3.py: code/distributions-contingency3.py 
    742  
    743     part of `distributions-contingency3.py`_ (uses monks-1.tab) 
    744  
    745     .. literalinclude:: code/distributions-contingency3.py 
    746         :lines: 1-25 
    747  
    748     The inner and the outer variable and their relations to the class are 
    749     as follows:: 
    750  
    751         Inner variable:  y 
    752         Outer variable:  e 
    753      
    754         Class variable:  y 
    755         Feature:         e 
    756  
    757     Distributions are normalized, and probabilities are elements from the 
    758     normalized distributions. Knowing that the target concept is 
    759     y := (e=1) or (a=b), distributions are as expected: when e equals 1, class 1 
    760     has a 100% probability, while for the rest, probability is one third, which 
    761     agrees with a probability that two three-valued independent features 
    762     have the same value. :: 
    763  
    764         Distributions: 
    765           p(.|1) = <0.000, 1.000> 
    766           p(.|2) = <0.662, 0.338> 
    767           p(.|3) = <0.659, 0.341> 
    768           p(.|4) = <0.669, 0.331> 
    769      
    770         Probabilities of class '1' 
    771           p(1|1) = 1.000 
    772           p(1|2) = 0.338 
    773           p(1|3) = 0.341 
    774           p(1|4) = 0.331 
    775      
    776         Distributions from a matrix computed manually: 
    777           p(.|1) = <0.000, 1.000> 
    778           p(.|2) = <0.662, 0.338> 
    779           p(.|3) = <0.659, 0.341> 
    780           p(.|4) = <0.669, 0.331> 
    781  
    782  
    783 .. class:: ContingencyClassVar 
    784  
    785     :obj:`ContingencyClassVar` is similar to :obj:`ContingencyVarClass` except 
    786     that the class is outside and the variable is inside. This form of 
    787     contingency matrix is suitable for computing conditional probabilities of 
    788     variable given the class. All methods get the two arguments in the same 
    789     order as :obj:`ContingencyVarClass`. 
    790  
    791     .. method:: __init__(feature, class_variable) 
    792  
    793         Construct an instance of :obj:`ContingencyClassVar` for the given pair of 
    794         variables. Inherited from :obj:`Contingency`, except for the reversed 
    795         order of arguments. 
    796  
    797         :param feature: Inner variable 
    798         :type feature: Orange.data.feature.Feature 
    799         :param class_variable: Class variable 
    800         :type class_variable: Orange.data.feature.Feature 
    801          
    802     .. method:: __init__(feature, data[, weightId]) 
    803  
    804         Compute contingency from the data. 
    805  
    806         :param feature: Descriptor of the inner variable 
    807         :type feature: Orange.data.feature.Feature 
    808         :param data: A set of instances 
    809         :type data: Orange.data.Table 
    810         :param weightId: meta attribute with weights of instances 
    811         :type weightId: int 
    812  
    813     .. method:: p_attr(class_value) 
    814  
    815         Return the probability distribution of variable given the class. 
    816  
    817         :param class_value: The class value 
    818         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    819         :rtype: Orange.statistics.distribution.Distribution 
    820  
    821     .. method:: p_attr(value, class_value) 
    822  
    823         Returns the conditional probability of the value given the 
    824         class, p(value|class_value). 
    825         Equivalent to `self[class][value]`, except for normalization. 
    826  
    827         :param value: Value of the variable 
    828         :type value: int, float, string or :obj:`Orange.data.Value` 
    829         :param class_value: Class value 
    830         :type class_value: int, float, string or :obj:`Orange.data.Value` 
    831         :rtype: float 
    832  
    833     .. _distributions-contingency4.py: code/distributions-contingency4.py 
    834      
    835     part of the output from `distributions-contingency4.py`_ (uses monks-1.tab) 
    836      
    837     The role of the feature and the class are reversed compared to 
    838     :obj:`ContingencyVarClass`:: 
    839      
    840         Inner variable:  e 
    841         Outer variable:  y 
    842      
    843         Class variable:  y 
    844         Feature:         e 
    845      
    846     Distributions given the class can be printed out by calling :meth:`p_attr`. 
    847      
    848     part of `distributions-contingency4.py`_ (uses monks-1.tab) 
    849      
    850     .. literalinclude:: code/distributions-contingency4.py 
    851         :lines: 31- 
    852      
    853     will print:: 
    854         p(.|0) = <0.000, 0.333, 0.333, 0.333> 
    855         p(.|1) = <0.500, 0.167, 0.167, 0.167> 
    856      
    857     If the class value is '0', the attribute `e` cannot be `1` (the first 
    858     value), while distribution across other values is uniform.  If the class 
    859     value is `1`, `e` is `1` for exactly half of instances, and distribution of 
    860     other values is again uniform. 
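The two distributions above can be reproduced from the absolute frequencies printed earlier for monks-1. The sketch below is not Orange code: ``swap_roles`` and ``p_attr`` are hypothetical helpers on plain lists, shown only to make the reversal of inner and outer variable concrete.

```python
# Counts indexed by value of e (outer) and then by class (inner),
# as printed by the first contingency example in this section.
var_class = {
    "1": [0.0, 108.0],
    "2": [72.0, 36.0],
    "3": [72.0, 36.0],
    "4": [72.0, 36.0],
}


def swap_roles(cont, n_classes):
    """Transpose the table so that the class becomes the outer index."""
    values = sorted(cont)
    return [[cont[v][c] for v in values] for c in range(n_classes)]


def p_attr(class_var, class_index):
    """Distribution of the feature given the class (one normalized row)."""
    row = class_var[class_index]
    total = float(sum(row))
    return [x / total for x in row]


class_var = swap_roles(var_class, 2)
for c in range(2):
    probs = p_attr(class_var, c)
    print("p(.|%d) = <%s>" % (c, ", ".join("%.3f" % p for p in probs)))
```

The same counts therefore serve both views; only the direction of normalization changes.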
    861  
    862 .. class:: ContingencyVarVar 
    863  
    864     Contingency matrices in which none of the variables is the class.  The class 
    865     is derived from :obj:`Contingency`, and adds an additional constructor and 
    866     method for getting conditional probabilities. 
    867  
    868     .. method:: ContingencyVarVar(outer_variable, inner_variable) 
    869  
    870         Inherited from :obj:`Contingency`. 
    871  
    872     .. method:: __init__(outer_variable, inner_variable, data[, weightId]) 
    873  
    874         Compute the contingency from the given instances. 
    875  
    876         :param outer_variable: Outer variable 
    877         :type outer_variable: Orange.data.feature.Feature 
    878         :param inner_variable: Inner variable 
    879         :type inner_variable: Orange.data.feature.Feature 
    880         :param data: A set of instances 
    881         :type data: Orange.data.Table 
    882         :param weightId: meta attribute with weights of instances 
    883         :type weightId: int 
    884  
    885     .. method:: p_attr(outer_value) 
    886  
    887         Return the probability distribution of the inner variable given the 
    888         outer variable value. 
    889  
    890         :param outer_value: The value of the outer variable 
    891         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    892         :rtype: Orange.statistics.distribution.Distribution 
    893   
    894     .. method:: p_attr(outer_value, inner_value) 
    895  
    896         Return the conditional probability of the inner_value 
    897         given the outer_value. 
    898  
    899         :param outer_value: The value of the outer variable 
    900         :type outer_value: int, float, string or :obj:`Orange.data.Value` 
    901         :param inner_value: The value of the inner variable 
    902         :type inner_value: int, float, string or :obj:`Orange.data.Value` 
    903         :rtype: float 
    904  
    905     The following example investigates which material is used for 
    906     bridges of different lengths. 
    907      
    908     .. _distributions-contingency5: code/distributions-contingency5.py 
    909      
    910     part of `distributions-contingency5`_ (uses bridges.tab) 
    911      
    912     .. literalinclude:: code/distributions-contingency5.py 
    913         :lines: 1-19 
    914  
    915     Short bridges are mostly made of wood or iron, while long bridges (and 
    916     most of the medium-sized ones) are made of steel:: 
    917      
    918         SHORT: 
    919            WOOD (56%) 
    920            IRON (44%) 
    921      
    922         MEDIUM: 
    923            WOOD (9%) 
    924            IRON (11%) 
    925            STEEL (79%) 
    926      
    927         LONG: 
    928            STEEL (100%) 
    929      
    930     Like all other contingency matrices, this one can also be computed "manually". 
    931      
    932     .. literalinclude:: code/distributions-contingency5.py 
    933         :lines: 20- 
    934  
    935  
    936 Contingencies for entire domain 
    937 =============================== 
    938  
    939 A list of contingencies, either :obj:`ContingencyVarClass` or 
    940 :obj:`ContingencyClassVar`. 
    941  
    942 .. class:: DomainContingency 
    943  
    944     .. method:: __init__(data[, weightId=0, classOuter=0|1]) 
    945  
    946         Compute a list of contingencies. 
    947  
    948         :param data: A set of instances 
    949         :type data: Orange.data.Table 
    950         :param weightId: meta attribute with weights of instances 
    951         :type weightId: int 
    952         :param classOuter: `True`, if class is the outer variable 
    953         :type classOuter: bool 
    954  
    955         .. note:: 
    956          
    957             ``classIsOuter`` cannot be given as positional argument, 
    958             but needs to be passed by keyword. 
    959  
    960     .. attribute:: classIsOuter (read only) 
    961  
    962         Tells whether the class is the outer or the inner variable. 
    963  
    964     .. attribute:: classes 
    965  
    966         Contains the distribution of class values on the entire dataset. 
    967  
    968     .. method:: normalize() 
    969  
    970         Call normalize for all contingencies. 
    971  
    972     The following script prints the contingencies for features 
    973     "a", "b" and "e" for the dataset Monk 1. 
    974      
    975     .. _distributions-contingency8: code/distributions-contingency8.py 
    976      
    977     part of `distributions-contingency8`_ (uses monks-1.tab) 
    978      
    979     .. literalinclude:: code/distributions-contingency8.py 
    980         :lines: 1-11 
    981  
    982     The contingencies are of type :obj:`ContingencyVarClass` and give 
    983     the conditional distributions of classes, given the value of the variable. 
    984      
    985     .. _distributions-contingency8: code/distributions-contingency8.py 
    986      
    987     part of `distributions-contingency8`_ (uses monks-1.tab) 
    988      
    989     .. literalinclude:: code/distributions-contingency8.py 
    990         :lines: 13-  
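To illustrate the idea of one contingency per feature, here is a minimal sketch in plain Python. It does not use Orange; the function names, the dictionary layout and the toy rows are assumptions made for illustration only. Each feature gets a table indexed by feature value (outer) and class value (inner), and normalization turns each row of counts into a conditional distribution of the class:

```python
from collections import Counter, defaultdict

def domain_contingency(rows, feature_names, class_name):
    """Return {feature: {feature_value: Counter(class_value -> count)}}."""
    result = {}
    for name in feature_names:
        table = defaultdict(Counter)
        for row in rows:
            table[row[name]][row[class_name]] += 1
        result[name] = table
    return result

def normalize(table):
    """Turn each row of counts into relative frequencies that sum to 1."""
    normalized = {}
    for outer_value, row in table.items():
        total = float(sum(row.values()))
        normalized[outer_value] = dict((k, v / total) for k, v in row.items())
    return normalized

# Hypothetical toy rows with two features and a class "y".
rows = [
    {"a": "1", "b": "1", "y": "1"},
    {"a": "1", "b": "2", "y": "0"},
    {"a": "2", "b": "2", "y": "1"},
    {"a": "2", "b": "1", "y": "0"},
    {"a": "1", "b": "1", "y": "1"},
]

cont = domain_contingency(rows, ["a", "b"], "y")
p_class_given_a = normalize(cont["a"])
```

In this sketch ``p_class_given_a["1"]`` plays the role of the conditional class distribution for ``a = 1``.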
    991  
    992  
    993 .. _contcont: 
    994  
    995 Contingencies for continuous variables 
    996 ====================================== 
    997  
    998 If the outer variable is continuous, the index must be one of the values that 
    999 exist in the contingency matrix. Using other values raises an exception. 
    1000  
    1001     .. _distributions-contingency6: code/distributions-contingency6.py 
    1002      
    1003     part of `distributions-contingency6`_ (uses monks-1.tab) 
    1004      
    1005     .. literalinclude:: code/distributions-contingency6.py 
    1006         :lines: 1-5,18,19 
    1007  
    1008 Since even rounding can be a problem, the only safe way to get the key is to 
    1009 take it from the contingencies' ``keys``. 
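The rounding problem is a general property of floating-point keys and can be shown without Orange. The following sketch uses an ordinary dictionary as a stand-in for a contingency with a continuous outer variable; the ``lookup`` helper and its tolerance are hypothetical, not part of the library:

```python
# Hypothetical contingency keyed by float values computed from data.
contingency = {0.1 + 0.2: {"0": 3, "1": 1}, 1.5: {"0": 0, "1": 2}}

def lookup(table, wanted, tolerance=1e-9):
    """Return the row whose stored key lies within `tolerance` of `wanted`."""
    key = min(table.keys(), key=lambda k: abs(k - wanted))
    if abs(key - wanted) > tolerance:
        raise KeyError(wanted)
    return table[key]

# The literal 0.3 is not the stored key, because 0.1 + 0.2 is not exactly 0.3.
direct_hit = 0.3 in contingency

# Taking the key from the table's own keys avoids the mismatch.
row = lookup(contingency, 0.3)
```

A value that is far from every stored key still raises ``KeyError`` in this sketch, in line with the behaviour described above for values that are not in the matrix.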
    1010  
    1011 Contingencies with discrete outer variable and continuous inner variables are 
    1012 more useful, since methods :obj:`ContingencyClassVar.p_attr` and  
    1013 :obj:`ContingencyVarClass.p_class` use the primitive density estimation 
    1014 provided by :obj:`Orange.statistics.distribution.Distribution`. 
    1015  
    1016 For example, :obj:`ContingencyClassVar` on the iris dataset can return the 
    1017 probability of the sepal length 5.5 for different classes. 
    1018  
    1019     .. _distributions-contingency7: code/distributions-contingency7.py 
    1020      
    1021     part of `distributions-contingency7`_ (uses iris.tab) 
    1022      
    1023     .. literalinclude:: code/distributions-contingency7.py 
    1024  
    1025 The script outputs:: 
    1026  
    1027     Estimated frequencies for e=5.5 
    1028       f(5.5|Iris-setosa) = 2.000 
    1029       f(5.5|Iris-versicolor) = 5.000 
    1030       f(5.5|Iris-virginica) = 1.000 
    1031  
    1032364""" 
    1033365 
    1034366 
    1035  
    1036 from Orange.core import \ 
    1037      DomainContingency, \ 
    1038      DomainDistributions, \ 
    1039      DistributionList, \ 
    1040      ComputeDomainContingency, \ 
    1041      Contingency 
    1042  
    1043 from Orange.core import BasicAttrStat as BasicStatistics 
    1044 from Orange.core import DomainBasicAttrStat as DomainBasicStatistics 
    1045 from Orange.core import ContingencyAttrAttr as ContingencyVarVar 
    1046 from Orange.core import ContingencyClass as ContingencyClass 
    1047 from Orange.core import ContingencyAttrClass as ContingencyVarClass 
    1048 from Orange.core import ContingencyClassAttr as ContingencyClassVar 
     367from Orange.core import Distribution 
     368from Orange.core import DiscDistribution as Discrete 
     369from Orange.core import ContDistribution as Continuous 
     370from Orange.core import GaussianDistribution as Gaussian 
     371 
     372from Orange.core import DomainDistributions as Domain 
  • orange/doc/Orange/rst/index.rst

    r7608 r7659  
    2222   Orange.regression 
    2323    
    24    Orange.statistics.distributions 
     24   Orange.statistics 
    2525   Orange.ensemble 
    2626 