source: orange/orange/Orange/statistics/contingency.py @ 7671:65b33dc81393

Revision 7671:65b33dc81393, 18.6 KB checked in by janezd <janez.demsar@…>, 3 years ago (diff)

Added the missing modules

"""
Contingency tables contain conditional distributions. Unless explicitly
'normalized', they contain absolute frequencies, that is, the number of
instances with a particular combination of two variables' values. If they are
normalized by dividing each cell by the row sum, they represent conditional
probabilities of the column variable (here denoted as ``innerVariable``)
conditioned by the row variable (``outerVariable``).

Contingency matrices are usually constructed for discrete variables. Matrices
for continuous variables have certain limitations described in a :ref:`separate
section <contcont>`.

The example below loads the monks-1 data set and prints out the conditional
class distribution given the value of `e`.

.. _statistics-contingency: code/statistics-contingency.py

part of `statistics-contingency`_ (uses monks-1.tab)

.. literalinclude:: code/statistics-contingency.py
    :lines: 1-7

This code prints out::

    1 <0.000, 108.000>
    2 <72.000, 36.000>
    3 <72.000, 36.000>
    4 <72.000, 36.000>

Contingencies behave like lists of distributions (in this case, class
distributions) indexed by values (of `e`, in this
example). Distributions are, in turn, indexed by values (class values,
here). The variable `e` from the above example is called the outer
variable, and the class is the inner. This can also be reversed. It is
also possible to use features for both the outer and the inner variable, so
the table shows distributions of one variable's values given the
value of another.  There is a corresponding hierarchy of classes:
:obj:`Table` is a base class for :obj:`VarVar` (both
variables are attributes) and :obj:`Class` (one variable is
the class).  The latter is the base class for
:obj:`VarClass` and :obj:`ClassVar`.

The most commonly used of the above classes is :obj:`VarClass`, which
can compute and store conditional probabilities of classes given the feature value.

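As a rough illustration of what such a table holds, the following plain-Python
sketch counts value pairs and then divides each row by its sum. It does not use
Orange; the helper names ``build_contingency`` and ``normalize_rows`` are
hypothetical, and the weighted pairs are chosen only to reproduce the counts
printed above.

```python
from collections import defaultdict


def build_contingency(pairs):
    # Accumulate the weight of each (outer value, inner value) pair.
    table = defaultdict(lambda: defaultdict(float))
    for outer, inner, weight in pairs:
        table[outer][inner] += weight
    return {outer: dict(row) for outer, row in table.items()}


def normalize_rows(table):
    # Divide each row by its sum, giving p(inner | outer); empty rows stay unchanged.
    result = {}
    for outer, row in table.items():
        total = sum(row.values())
        result[outer] = {k: (v / total if total else v) for k, v in row.items()}
    return result


# Weighted pairs reproducing the counts shown above for `e` (outer) and the class (inner).
pairs = [("1", "1", 108.0)]
for e in ("2", "3", "4"):
    pairs += [(e, "0", 72.0), (e, "1", 36.0)]

counts = build_contingency(pairs)
probs = normalize_rows(counts)
print("%.3f %.3f" % (probs["2"]["0"], probs["2"]["1"]))
```

Keeping the raw counts and the normalized copy separate mirrors the distinction
the text draws between absolute frequencies and conditional probabilities.
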
Contingency matrices
====================

.. class:: Table

    Provides a base class for storing and manipulating contingency
    matrices. Although it is not abstract, it is seldom used directly but rather
    through more convenient derived classes described below.

    .. attribute:: outerVariable

       Outer variable (:class:`Orange.data.variable.Variable`) whose values are
       used as the first, outer index.

    .. attribute:: innerVariable

       Inner variable (:class:`Orange.data.variable.Variable`) whose values are
       used as the second, inner index.

    .. attribute:: outerDistribution

        The marginal distribution (:class:`Distribution`) of the outer variable.

    .. attribute:: innerDistribution

        The marginal distribution (:class:`Distribution`) of the inner variable.

    .. attribute:: innerDistributionUnknown

        The distribution (:class:`distribution.Distribution`) of the inner variable for
        instances for which the outer variable was undefined. This is the
        difference between the ``innerDistribution`` and the (unconditional)
        distribution of the inner variable.

    .. attribute:: varType

        The type of the outer variable (:obj:`Orange.data.Type`, usually
        :obj:`Orange.data.variable.Discrete` or
        :obj:`Orange.data.variable.Continuous`); equals
        ``outerVariable.varType`` and ``outerDistribution.varType``.

    .. method:: __init__(outer_variable, inner_variable)

        Construct an instance of contingency table for the given pair of
        variables.

        :param outer_variable: Descriptor of the outer variable
        :type outer_variable: Orange.data.variable.Variable
        :param inner_variable: Descriptor of the inner variable
        :type inner_variable: Orange.data.variable.Variable

    .. method:: add(outer_value, inner_value[, weight=1])

        Add an element to the contingency table by adding ``weight`` to the
        corresponding cell.

        :param outer_value: The value for the outer variable
        :type outer_value: int, float, string or :obj:`Orange.data.Value`
        :param inner_value: The value for the inner variable
        :type inner_value: int, float, string or :obj:`Orange.data.Value`
        :param weight: Instance weight
        :type weight: float

    .. method:: normalize()

        Normalize all distributions (rows) in the table to sum to ``1``::

            >>> cont.normalize()
            >>> for val, dist in cont.items():
            ...     print val, dist

        Output: ::

            1 <0.000, 1.000>
            2 <0.667, 0.333>
            3 <0.667, 0.333>
            4 <0.667, 0.333>

        .. note::

            This method does not change the ``innerDistribution`` or
            ``outerDistribution``.

    With respect to indexing, a contingency table is a cross between a dictionary
    and a list. It supports the standard dictionary methods ``keys``, ``values`` and
    ``items``. ::

        >>> print cont.keys()
        ['1', '2', '3', '4']
        >>> print cont.values()
        [<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>]
        >>> print cont.items()
        [('1', <0.000, 108.000>), ('2', <72.000, 36.000>),
        ('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]

    Although keys returned by the above functions are strings, a contingency can
    be indexed by anything that can be converted into values of the outer
    variable: strings, numbers or instances of ``Orange.data.Value``. ::

        >>> print cont[0]
        <0.000, 108.000>
        >>> print cont["1"]
        <0.000, 108.000>
        >>> print cont[orange.Value(data.domain["e"], "1")]

    The length of the table equals the number of values of the outer
    variable. However, iterating through a contingency
    does not return keys, as with dictionaries, but distributions. ::

        >>> for i in cont:
        ...     print i
        <0.000, 108.000>
        <72.000, 36.000>
        <72.000, 36.000>
        <72.000, 36.000>


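The dictionary/list hybrid behaviour described for :obj:`Table` can be imitated
in a few lines of plain Python. The class below is only a hypothetical stand-in
(``MiniContingency`` is not part of Orange, and rows are plain lists rather than
distributions); it is meant to show how positional and by-name indexing can
coexist while iteration yields rows instead of keys.

```python
class MiniContingency(object):
    # Hypothetical stand-in: rows are kept in the order of the outer variable's values.

    def __init__(self, outer_values, rows):
        self._values = list(outer_values)
        self._rows = list(rows)

    def _position(self, key):
        # Accept either a position (int) or the name of a value (str).
        if isinstance(key, int):
            return key
        return self._values.index(key)

    def __getitem__(self, key):
        return self._rows[self._position(key)]

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        # Unlike a dict, iteration yields the rows, not the keys.
        return iter(self._rows)

    def keys(self):
        return list(self._values)

    def values(self):
        return list(self._rows)

    def items(self):
        return list(zip(self._values, self._rows))


cont = MiniContingency(["1", "2", "3", "4"],
                       [[0.0, 108.0], [72.0, 36.0], [72.0, 36.0], [72.0, 36.0]])
assert cont[0] == cont["1"] == [0.0, 108.0]
assert len(cont) == 4
assert [row for row in cont] == cont.values()
```
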
.. class:: Class

    An abstract base class for contingency matrices that contain the class,
    either as the inner or the outer variable.

    .. attribute:: classVar (read only)

        The class attribute descriptor; always equal to either
        :obj:`Table.innerVariable` or :obj:`Table.outerVariable`.

    .. attribute:: variable

        Variable; always equal to either ``innerVariable`` or ``outerVariable``.

    .. method:: add_var_class(variable_value, class_value[, weight=1])

        Add an element to contingency by increasing the corresponding count. The
        difference between this and :obj:`Table.add` is that the variable
        value is always the first argument and class value the second,
        regardless of which one is inner and which one is outer.

        :param variable_value: Variable value
        :type variable_value: int, float, string or :obj:`Orange.data.Value`
        :param class_value: Class value
        :type class_value: int, float, string or :obj:`Orange.data.Value`
        :param weight: Instance weight
        :type weight: float


.. class:: VarClass

    A class derived from :obj:`Class` in which the variable is
    used as :obj:`Table.outerVariable` and the class as
    :obj:`Table.innerVariable`. This form is suitable for
    computing conditional class probabilities given the variable value.

    Calling :obj:`VarClass.add_var_class(v, c)` is equivalent to
    :obj:`Table.add(v, c)`. Like :obj:`Table`,
    :obj:`VarClass` can compute contingency from instances.

    .. method:: __init__(feature, class_variable)

        Construct an instance of :obj:`VarClass` for the given pair of
        variables. Inherited from :obj:`Table`.

        :param feature: Outer variable
        :type feature: Orange.data.variable.Variable
        :param class_variable: Class variable; used as ``innerVariable``
        :type class_variable: Orange.data.variable.Variable

    .. method:: __init__(feature, data[, weightId])

        Compute the contingency table from data.

        :param feature: Outer variable
        :type feature: Orange.data.variable.Variable
        :param data: A set of instances
        :type data: Orange.data.Table
        :param weightId: meta attribute with weights of instances
        :type weightId: int

    .. method:: p_class(value)

        Return the probability distribution of classes given the value of the
        variable.

        :param value: The value of the variable
        :type value: int, float, string or :obj:`Orange.data.Value`
        :rtype: Orange.statistics.distribution.Distribution


    .. method:: p_class(value, class_value)

        Return the conditional probability of the class_value given the
        feature value, p(class_value|value) (note the order of arguments!)

        :param value: The value of the variable
        :type value: int, float, string or :obj:`Orange.data.Value`
        :param class_value: The class value
        :type class_value: int, float, string or :obj:`Orange.data.Value`
        :rtype: float

    .. _statistics-contingency3.py: code/statistics-contingency3.py

    part of `statistics-contingency3.py`_ (uses monks-1.tab)

    .. literalinclude:: code/statistics-contingency3.py
        :lines: 1-23

    The inner and the outer variable and their relations to the class are
    as follows::

        Inner variable:  y
        Outer variable:  e

        Class variable:  y
        Feature:         e

    Distributions are normalized, and probabilities are elements from the
    normalized distributions. Knowing that the target concept is
    y := (e=1) or (a=b), distributions are as expected: when e equals 1, class 1
    has a 100% probability, while for the rest, the probability is about one
    third, which agrees with the probability that two three-valued independent
    features have the same value. ::

        Distributions:
          p(.|1) = <0.000, 1.000>
          p(.|2) = <0.662, 0.338>
          p(.|3) = <0.659, 0.341>
          p(.|4) = <0.669, 0.331>

        Probabilities of class '1'
          p(1|1) = 1.000
          p(1|2) = 0.338
          p(1|3) = 0.341
          p(1|4) = 0.331

        Distributions from a matrix computed manually:
          p(.|1) = <0.000, 1.000>
          p(.|2) = <0.662, 0.338>
          p(.|3) = <0.659, 0.341>
          p(.|4) = <0.669, 0.331>


.. class:: ClassVar

    :obj:`ClassVar` is similar to :obj:`VarClass` except
    that the class is outside and the variable is inside. This form of
    contingency table is suitable for computing conditional probabilities of
    the variable given the class. All methods get the two arguments in the same
    order as in :obj:`VarClass`.

    .. method:: __init__(feature, class_variable)

        Construct an instance of :obj:`ClassVar` for the given pair of
        variables. Inherited from :obj:`Table`, except for the reversed
        order of arguments.

        :param feature: Inner variable
        :type feature: Orange.data.variable.Variable
        :param class_variable: Class variable
        :type class_variable: Orange.data.variable.Variable

    .. method:: __init__(feature, data[, weightId])

        Compute contingency table from the data.

        :param feature: Descriptor of the inner variable
        :type feature: Orange.data.variable.Variable
        :param data: A set of instances
        :type data: Orange.data.Table
        :param weightId: meta attribute with weights of instances
        :type weightId: int

    .. method:: p_attr(class_value)

        Return the probability distribution of the variable given the class.

        :param class_value: The class value
        :type class_value: int, float, string or :obj:`Orange.data.Value`
        :rtype: Orange.statistics.distribution.Distribution

    .. method:: p_attr(value, class_value)

        Return the conditional probability of the value given the
        class, p(value|class_value).

        :param value: Value of the variable
        :type value: int, float, string or :obj:`Orange.data.Value`
        :param class_value: Class value
        :type class_value: int, float, string or :obj:`Orange.data.Value`
        :rtype: float

    .. _statistics-contingency4.py: code/statistics-contingency4.py

    .. literalinclude:: code/statistics-contingency4.py
        :lines: 1-27

    part of the output from `statistics-contingency4.py`_ (uses monks-1.tab)

    The roles of the feature and the class are reversed compared to
    :obj:`VarClass`::

        Inner variable:  e
        Outer variable:  y

        Class variable:  y
        Feature:         e

    Distributions given the class can be printed out by calling :meth:`p_attr`.

    part of `statistics-contingency4.py`_ (uses monks-1.tab)

    .. literalinclude:: code/statistics-contingency4.py
        :lines: 30-31

    will print::

        p(.|0) = <0.000, 0.333, 0.333, 0.333>
        p(.|1) = <0.500, 0.167, 0.167, 0.167>

    If the class value is `0`, the attribute `e` cannot be `1` (the first
    value), while the distribution across other values is uniform.  If the class
    value is `1`, `e` is `1` for exactly half of the instances, and the
    distribution of other values is again uniform.

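The relation between the two orientations can be sketched in plain Python.
Nothing below is Orange API: ``to_class_var`` and this stand-alone ``p_attr``
are hypothetical helpers, and the counts are those of the first monks-1 example
(rows are values of `e`, columns are classes 0 and 1). Under these assumptions
the results agree with the two distributions printed above.

```python
# Counts from the first example: rows are values of `e`, columns are classes 0 and 1.
var_class = {
    "1": [0.0, 108.0],
    "2": [72.0, 36.0],
    "3": [72.0, 36.0],
    "4": [72.0, 36.0],
}
values = ["1", "2", "3", "4"]


def to_class_var(table, values, n_classes):
    # Swap the roles: one row per class, one column per feature value.
    return [[table[v][c] for v in values] for c in range(n_classes)]


def p_attr(class_var, class_index):
    # Normalize the row of the given class to obtain p(value | class).
    row = class_var[class_index]
    total = sum(row)
    return [x / total for x in row]


class_var = to_class_var(var_class, values, 2)
for c in range(2):
    print("p(.|%d) = <%s>" % (c, ", ".join("%.3f" % p for p in p_attr(class_var, c))))
```
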
.. class:: VarVar

    Contingency table in which neither variable is the class.  The class
    is derived from :obj:`Table`, and adds an additional constructor and
    a method for getting conditional probabilities.

    .. method:: __init__(outer_variable, inner_variable)

        Inherited from :obj:`Table`.

    .. method:: __init__(outer_variable, inner_variable, data[, weightId])

        Compute the contingency from the given instances.

        :param outer_variable: Outer variable
        :type outer_variable: Orange.data.variable.Variable
        :param inner_variable: Inner variable
        :type inner_variable: Orange.data.variable.Variable
        :param data: A set of instances
        :type data: Orange.data.Table
        :param weightId: meta attribute with weights of instances
        :type weightId: int

    .. method:: p_attr(outer_value)

        Return the probability distribution of the inner variable given the
        outer variable value.

        :param outer_value: The value of the outer variable
        :type outer_value: int, float, string or :obj:`Orange.data.Value`
        :rtype: Orange.statistics.distribution.Distribution

    .. method:: p_attr(outer_value, inner_value)

        Return the conditional probability of the inner_value
        given the outer_value.

        :param outer_value: The value of the outer variable
        :type outer_value: int, float, string or :obj:`Orange.data.Value`
        :param inner_value: The value of the inner variable
        :type inner_value: int, float, string or :obj:`Orange.data.Value`
        :rtype: float

    The following example investigates which material is used for
    bridges of different lengths.

    .. _statistics-contingency5.py: code/statistics-contingency5.py

    part of `statistics-contingency5.py`_ (uses bridges.tab)

    .. literalinclude:: code/statistics-contingency5.py
        :lines: 1-17

    Short bridges are mostly wooden or iron, and the longer (and most of the
    medium-sized) ones are made from steel::

        SHORT:
           WOOD (56%)
           IRON (44%)

        MEDIUM:
           WOOD (9%)
           IRON (11%)
           STEEL (79%)

        LONG:
           STEEL (100%)

    Like all other contingency tables, this one can also be computed "manually".

    .. literalinclude:: code/statistics-contingency5.py
        :lines: 18-


Contingencies for entire domain
===============================

A list of contingency tables, either :obj:`VarClass` or
:obj:`ClassVar`.

.. class:: Domain

    .. method:: __init__(data[, weightId=0, classIsOuter=0|1])

        Compute a list of contingency tables.

        :param data: A set of instances
        :type data: Orange.data.Table
        :param weightId: meta attribute with weights of instances
        :type weightId: int
        :param classIsOuter: `True`, if class is the outer variable
        :type classIsOuter: bool

        .. note::

            ``classIsOuter`` cannot be given as a positional argument,
            but needs to be passed by keyword.

    .. attribute:: classIsOuter (read only)

        Tells whether the class is the outer or the inner variable.

    .. attribute:: classes

        Contains the distribution of class values on the entire dataset.

    .. method:: normalize()

        Call normalize for all contingencies.

    The following script prints the contingency tables for features
    "a", "b" and "e" for the dataset Monk 1.

    .. _statistics-contingency8: code/statistics-contingency8.py

    part of `statistics-contingency8`_ (uses monks-1.tab)

    .. literalinclude:: code/statistics-contingency8.py
        :lines: 9

    Contingency tables of type :obj:`VarClass` give
    the conditional distributions of classes, given the value of the variable.

    part of `statistics-contingency8`_ (uses monks-1.tab)

    .. literalinclude:: code/statistics-contingency8.py
        :lines: 12-


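Conceptually, a domain contingency is just one table per feature, each paired
with the class. The sketch below illustrates that idea in plain Python under
loud assumptions: ``domain_contingency`` and its ``class_is_outer`` flag are
hypothetical names (not the Orange constructor), instances are plain dicts, and
the five rows are made up rather than taken from monks-1.

```python
def domain_contingency(rows, feature_names, class_name, class_is_outer=False):
    # Return {feature name: nested dict of counts}, one table per feature.
    tables = {}
    for name in feature_names:
        table = {}
        for row in rows:
            if class_is_outer:
                outer, inner = row[class_name], row[name]
            else:
                outer, inner = row[name], row[class_name]
            inner_counts = table.setdefault(outer, {})
            inner_counts[inner] = inner_counts.get(inner, 0) + 1
        tables[name] = table
    return tables


# A tiny made-up data set (not monks-1), just to exercise the function.
rows = [
    {"a": "1", "b": "1", "y": "1"},
    {"a": "1", "b": "2", "y": "0"},
    {"a": "2", "b": "2", "y": "1"},
    {"a": "2", "b": "1", "y": "0"},
    {"a": "1", "b": "1", "y": "1"},
]

by_feature = domain_contingency(rows, ["a", "b"], "y")
by_class = domain_contingency(rows, ["a", "b"], "y", class_is_outer=True)
print(by_feature["a"])
print(by_class["a"])
```

With ``class_is_outer`` left at its default, each table is indexed first by the
feature value, which corresponds to the :obj:`VarClass` orientation; setting it
corresponds to :obj:`ClassVar`.
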
.. _contcont:

Contingency tables for continuous variables
===========================================

If the outer variable is continuous, the index must be one of the
values that do exist in the contingency table; other values raise an
exception:

.. _statistics-contingency6: code/statistics-contingency6.py

part of `statistics-contingency6`_ (uses monks-1.tab)

.. literalinclude:: code/statistics-contingency6.py
    :lines: 1-4,17-

Since even rounding can be a problem, the only safe way to get the key
is to take it from the contingency's ``keys``.

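The rounding problem is easy to reproduce with an ordinary dictionary keyed by
floats. The sketch below is not Orange code; the table contents are made up, and
``nearest_key`` is a hypothetical workaround rather than anything the library is
known to provide.

```python
# Hypothetical table keyed by a continuous outer variable; each row holds class counts.
table = {0.1 + 0.2: [3.0, 1.0], 0.5: [0.0, 2.0]}

# The literal 0.3 looks like the first key but is a different float.
print(0.3 in table)

# Safe: take the key from the table itself.
key = sorted(table.keys())[0]
print(table[key])


def nearest_key(table, value):
    # A possible workaround (an assumption): pick the closest existing key.
    return min(table, key=lambda k: abs(k - value))


print(table[nearest_key(table, 0.3)])
```
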
Contingency tables with a discrete outer variable and a continuous inner variable
are more useful, since the methods :obj:`VarClass.p_class`
and :obj:`ClassVar.p_attr` use the primitive density estimation
provided by :obj:`Orange.statistics.distribution.Distribution`.

For example, :obj:`ClassVar` on the iris dataset can return the
probability of the sepal length 5.5 for different classes:

.. _statistics-contingency7: code/statistics-contingency7.py

part of `statistics-contingency7`_ (uses iris.tab)

.. literalinclude:: code/statistics-contingency7.py

The script outputs::

    Estimated frequencies for e=5.5
      f(5.5|Iris-setosa) = 2.000
      f(5.5|Iris-versicolor) = 5.000
      f(5.5|Iris-virginica) = 1.000

"""

from Orange.core import Contingency as Table
from Orange.core import ContingencyAttrAttr as VarVar
from Orange.core import ContingencyClass as Class
from Orange.core import ContingencyAttrClass as VarClass
from Orange.core import ContingencyClassAttr as ClassVar

from Orange.core import DomainContingency as Domain