Changeset 7614:330049b0117c in orange


Timestamp:
02/06/11 19:39:05 (3 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Convert:
510fb1eadfbe18dab98b0828b84f899b6da86e49
Message:

Not through with checking yet - came to ContingencyVarClass

File:
1 edited

  • orange/Orange/statistics/distributions.py

    r7574 r7614  
    120120 
    121121 
    122 ================== 
    123122Contingency Matrix 
    124123================== 
    125124 
    126 Contingency matrix contains conditional distributions. They can work for both, 
    127 discrete and continuous variables; although examples on this page will mostly 
    128 use discrete ones, similar code could be run for continuous variables. 
     125Contingency matrix contains conditional distributions. When initialized, they 
     126will typically contain absolute frequencies, that is, the number of instances 
     127with a particular combination of two variables' values. If they are normalized 
     128by dividing each cell by the row sum, they represent conditional probabilities 
     129of the column variable (here denoted as ``innerVariable``) conditioned by the 
     130row variable (``outerVariable``).  
     131 
     132Contingencies work with both discrete and continuous variables. 
    129133 
    130134.. _distributions-contingency: code/distributions-contingency.py 
     
    156160.. class:: Orange.statistics.distribution.Contingency 
    157161 
    158     .. attribute:: outerVariable (`Orange.data.feature.Feature`_)  
    159  
    160        Descriptor of the outer variable. 
    161  
    162 .. _`Orange.data.feature.Feature`: :obj:`Orange.data.feature.Feature` 
    163  
    164     .. attribute:: innerVariable (:class:`Orange.data.feature.Feature`) 
    165  
    166         Descriptor of the inner variable. 
    167  
     162    .. attribute:: outerVariable 
     163 
     164       Descriptor (:class:`Orange.data.feature.Feature`) of the outer variable. 
     165 
     166    .. attribute:: innerVariable 
     167 
     168        Descriptor (:class:`Orange.data.feature.Feature`) of the inner variable. 
     169  
    168170    .. attribute:: outerDistribution 
    169171 
    170         The distribution (`of the outer feature's values - sums of rows. 
    171         In the above case, distribution of ``e`` is 
    172         <108.000, 108.000, 108.000, 108.000> 
     172        The marginal distribution (:class:`Distribution`) of the outer variable. 
    173173 
    174174    .. attribute:: innerDistribution 
    175175 
    176         The distribution of the inner feature. 
    177         In the above case, it is the class distribution 
    178         which is <216.000, 216.000<.  
    179  
     176        The marginal distribution (:class:`Distribution`) of the inner variable. 
     177         
    180178    .. attribute:: innerDistributionUnknown 
    181179 
    182         The distribution of the inner feature for the 
    183         instances where the outer feature was unknown. 
    184         This is the difference between the innerDistribution 
    185         and the sum of all distributions in the matrix. 
     180        The distribution (:class:`Distribution`) of the inner variable for  
     181        instances for which the outer variable was undefined. 
     182        This is the difference between the ``innerDistribution`` 
     183        and the unconditional distribution of the inner variable. 
    186184       
    187185    .. attribute:: varType 
    188186 
    189         The varType for the outer feature (discrete, continuous...); 
    190         varType equals outerVariable.varType and outerDistribution.varType. 
    191  
    192 Contingency matrix is a cross between dictionary and a list. 
    193 It supports standard dictionary methods keys, values and items.:: 
    194  
    195     >> print cont.keys() 
    196     ['1', '2', '3', '4'] 
    197     >>> print cont.values() 
    198     [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>] 
    199     >>> print cont.items() 
    200     [('1', <0.000, 108.000>), ('2', <72.000, 36.000>), 
    201     ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]  
    202  
    203 Although keys returned by the above functions are strings, 
    204 you can index the contingency with anything that converts into values 
    205 of the outer feature - strings, numbers or instances of Value.:: 
    206  
    207     >>> print cont[0] 
    208     <0.000, 108.000> 
    209     >>> print cont["1"] 
    210     <0.000, 108.000> 
    211     >>> print cont[Orange.data.Value(data.domain["e"], "1")]  
    212  
    213 Naturally, the length of Contingency equals the number of values of the outer 
    214 feature. The only weird thing is that iterating through contingency 
    215 (by using a for loop, for instance) doesn't return keys, as with dictionaries, 
    216 but dictionary values.:: 
    217  
    218     >>> for i in cont: 
    219         ... print i 
    220     <0.000, 108.000> 
    221     <72.000, 36.000> 
    222     <72.000, 36.000> 
    223     <72.000, 36.000> 
    224     <72.000, 36.000>  
    225  
    226 If cont behaved like a normal dictionary, the above script would print out strings from '0' to '3'. 
    227  
    228  
    229 Other methods 
    230  
    231 .. class:: Orange.statistics.distributions.Contingency 
    232  
    233     .. method:: add(outer_value, inner_value[, weight]) 
    234  
    235        Adds an element to the contingency matrix. 
     187        The type of the outer feature (:obj:`Orange.data.Type`, usually 
     188        :obj:`Orange.data.feature.Discrete` or  
     189        :obj:`Orange.data.feature.Continuous`). ``varType`` equals ``outerVariable.varType`` and ``outerDistribution.varType``. 
     190 
     191    .. method:: __init__(outerVariable, innerVariable) 
     192      
     193        :param outerVariable: Descriptor of the outer variable 
     194        :type outerVariable: Orange.data.feature.Feature 
     195        :param innerVariable: Descriptor of the inner variable 
     196        :type innerVariable: Orange.data.feature.Feature 
     197         
     198        Construct an instance of ``Contingency``. 
     199      
     200    .. method:: add(outer_value, inner_value[, weight=1]) 
     201     
     202        :param outer_value: The value for the outer variable 
     203        :type outer_value: int, float, string or :obj:`Orange.data.Value` 
     204        :param inner_value: The value for the inner variable 
     205        :type inner_value: int, float, string or :obj:`Orange.data.Value` 
     206        :param weight: Instance weight 
     207        :type weight: float 
     208 
     209        Add an element to the contingency matrix by adding 
     210        ``weight`` to the corresponding cell. 
    236211 
    237212    .. method:: normalize() 
    238213 
    239 Normalizes all distributions (rows) in the contingency to sum to 1. 
    240 It doesn't change the innerDistribution or outerDistribution.:: 
    241  
    242     >>> cont.normalize() 
    243     >>> for val, dist in cont.items(): 
    244            print val, dist 
    245  
    246 This outputs: :: 
    247  
    248     1 <0.000, 1.000> 
    249     2 <0.667, 0.333> 
    250     3 <0.667, 0.333> 
    251     4 <0.667, 0.333> 
    252  
    253 .. _distributions-contingency2: code/distributions-contingency2.py 
    254  
    255 part of `distributions-contingency2`_ (uses monks-1.tab) 
    256  
    257 .. literalinclude:: code/distributions-contingency2.py 
    258  
    259 The "reproduction" is not perfect. We didn't care about unknown values 
    260 and haven't computed innerDistribution and outerDistribution. 
    261 The better way to do it is by using the method add, so that the loop becomes: :: 
    262  
    263     for ins in table: 
    264         cont.add(ins["e"], ins.getclass())  
    265  
    266 It's not only simpler, but also correctly handles unknown values 
    267 and updates innerDistribution and outerDistribution.  
     214        Normalize all distributions (rows) in the contingency to sum to ``1``:: 
     215         
     216            >>> cont.normalize() 
     217            >>> for val, dist in cont.items(): 
     218        ...     print val, dist 
     219 
     220        Output: :: 
     221 
     222            1 <0.000, 1.000> 
     223            2 <0.667, 0.333> 
     224            3 <0.667, 0.333> 
     225            4 <0.667, 0.333> 
     226 
     227        .. note:: 
     228        
     229            This method doesn't change the ``innerDistribution`` or 
     230            ``outerDistribution``. 
     231         
     232    With respect to indexing, contingency matrix is a cross between dictionary 
     233    and a list. It supports standard dictionary methods ``keys``, ``values`` and 
     234    ``items``.:: 
     235 
     236        >>> print cont.keys() 
     237        ['1', '2', '3', '4'] 
     238        >>> print cont.values() 
     239        [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>] 
     240        >>> print cont.items() 
     241        [('1', <0.000, 108.000>), ('2', <72.000, 36.000>), 
     242        ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]  
     243 
     244    Although keys returned by the above functions are strings, contingency 
     245    can be indexed with anything that converts into values 
     246    of the outer variable: strings, numbers or instances of ``Orange.data.Value``.:: 
     247 
     248        >>> print cont[0] 
     249        <0.000, 108.000> 
     250        >>> print cont["1"] 
     251        <0.000, 108.000> 
     252        >>> print cont[orange.Value(data.domain["e"], "1")]  
     253 
     254    The length of ``Contingency`` equals the number of values of the outer 
     255    variable. However, iterating through contingency 
     256    doesn't return keys, as with dictionaries, but distributions.:: 
     257 
     258        >>> for i in cont: 
     259        ...     print i 
     260        <0.000, 108.000> 
     261        <72.000, 36.000> 
     262        <72.000, 36.000> 
     263        <72.000, 36.000> 
     264        <72.000, 36.000>  
     265 
    268266 
    269267.. class:: Orange.statistics.distribution.ContingencyClass 
    270268 
    271     ContingencyClass is an abstract base class for contingency matrices 
     269    ``ContingencyClass`` is an abstract base class for contingency matrices 
    272270    that contain the class, either as the inner or the outer 
    273     feature. If offers a function for making filing the contingency clearer. 
    274  
    275     After reading through the rest of this page you might ask yourself 
    276     why do we need to separate the classes ContingencyAttrClass, 
    277     ContingencyClassAttr and ContingencyAttrAttr, 
    278     given that the underlying matrix is the same. This is to avoid confusion 
    279     about what is in the inner and the outer variable. 
    280     Contingency matrices are most often used to compute probabilities of conditional 
    281     classes or features. By separating the classes and giving them specialized 
    282     methods for computing the probabilities that are most suitable to compute 
    283     from a particular class, the user (ie, you or the method that gets passed 
    284     the matrix) is relieved from checking what kind of matrix it got, that is, 
    285     where is the class and where's the feature. 
    286  
    287  
     271    variable. 
    288272 
    289273    .. attribute:: classVar (read only) 
    290274     
    291275        The class attribute descriptor. 
    292         This is always equal either to innerVariable or outerVariable 
     276        This is always equal either to :obj:`Contingency.innerVariable` or 
     277        ``outerVariable``. 
    293278 
    294279    .. attribute:: variable 
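
The behaviour the revised docstring describes for ``add`` and ``normalize`` can be sketched in plain Python. This is an illustration only, not Orange's implementation: the class ``ContingencySketch``, its attribute names, and the inner (class) labels ``"0"`` and ``"1"`` are assumptions made for the example. The cell counts are the ones shown in the docstring's sample output.

```python
from collections import defaultdict


class ContingencySketch:
    """Toy stand-in for a contingency matrix (hypothetical, not Orange's class)."""

    def __init__(self):
        # rows[outer][inner] holds the weighted count for one cell.
        self.rows = defaultdict(lambda: defaultdict(float))
        # Marginal distributions, updated by add() and untouched by normalize().
        self.outer_distribution = defaultdict(float)
        self.inner_distribution = defaultdict(float)

    def add(self, outer_value, inner_value, weight=1.0):
        """Add ``weight`` to the cell and to both marginal distributions."""
        self.rows[outer_value][inner_value] += weight
        self.outer_distribution[outer_value] += weight
        self.inner_distribution[inner_value] += weight

    def normalize(self):
        """Divide each cell by its row sum so that every row sums to 1."""
        for row in self.rows.values():
            total = sum(row.values())
            if total > 0:
                for inner_value in row:
                    row[inner_value] /= total


cont = ContingencySketch()
# Cell counts copied from the docstring example; inner labels are assumed.
counts = {"1": (0, 108), "2": (72, 36), "3": (72, 36), "4": (72, 36)}
for outer, (n0, n1) in counts.items():
    cont.add(outer, "0", n0)
    cont.add(outer, "1", n1)

cont.normalize()
for outer in sorted(cont.rows):
    row = cont.rows[outer]
    print(outer, "<%.3f, %.3f>" % (row["0"], row["1"]))
```

After ``normalize()`` each row holds the conditional distribution of the inner variable given the outer value, while the two marginal distributions still hold the absolute frequencies, which mirrors the note in the docstring.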