# Changeset 7570:1af826def4fa in orange

Timestamp: 02/04/11 23:11:37 (3 years ago)

Branch: default

Convert: e878347f62165a08e3e7fc28098c135a809d3a72

Message: No more warnings in Orange.statistics.distributions!

File: 1 edited

• ## orange/Orange/statistics/distributions.py

r7542

Orange has several classes for computing and storing basic statistics about
features, distributions and contingencies.

========================================
Basic Statistics for Continuous Features

The are two simple classes for computing basic statistics
for continuous features, such as their minimal and maximal value
or average: :class:`BasicAttrStat` holds the statistics for a single
feature and :class:`DomainBasicAttrStat` is a container storing a list of instances of
or average: :class:`BasicStatistics` holds the statistics for a single
feature and :class:`DomainBasicStatistics` is a container storing a list of instances of
the above class for all features in the domain.

.. class:: BasicAttrStat

    `BasicAttrStat` computes on-the fly statistics.

.. class:: BasicStatistics

    `DomainBasicStatistics` computes on-the fly statistics.

    .. attribute:: variable

        Number of instances for which the value was defined
        (and used in the statistics). If instances were weighted,
        `n` is the sum of weights of those instances.
        ``n`` is the sum of weights of those instances.

    .. attribute:: sum, sum2

        Holds recomputation of the average and standard deviation.

    .. method:: add(value[, weight=1.0])

        Adds a value to the statistics. Both arguments should be numbers.

    .. method:: add(value[, weight=1])

        :param value: Value to be added to the statistics
        :type value: float
        :param weight: Weight assigned to the value
        :type weight: float

        Adds a value to the statistics.

    ..

    :obj:`sum` and its square to :obj:`sum2`. The weight is added to :obj:`n`.

    The statistics does not include the median or any other statistics that can be computed on the fly, without remembering the data. Quantiles can be computed
    by :obj:`ContDistribution`. !!!TODO

    Instances of this class are seldom constructed manually; they are more often
    returned by :obj:`DomainBasicAttrStat` described below.

.. class:: DomainBasicAttrStat

    :param data: A table of instances
    :type data: Orange.data.Table
    :param weight: The id of the meta-attribute with weights
    :type weight: `int` or none

    Constructor computes the statistics for all continuous features in the
    give data, and puts `None` to the places corresponding to other types of
    features.

    returned by :obj:`DomainBasicStatistics` described below.

.. class:: DomainBasicStatistics

    ``DomainBasicStatistics`` behaves like a ordinary list, except that its
    elements can also be indexed by feature descriptors or feature names.

    .. method:: __init__(data[, weight=None])

        :param data: A table of instances
        :type data: Orange.data.Table
        :param weight: The id of the meta-attribute with weights
        :type weight: `int` or none

        Constructor computes the statistics for all continuous features in the
        give data, and puts `None` to the places corresponding to other types of
        features.

    .. method:: purge()

        Removes the `None`'s corresponding to non-continuous features.

    `DomainBasicAttrStat` behaves like a ordinary list, except that its
    elements can also be indexed by feature descriptors or feature names.

    .. _distributions-basic-stat: code/distributions-basic-stat.py

    part of `distributions-basic-stat`_ (uses monks-1.tab)

        Removes the ``None``'s corresponding to non-continuous features.

    part of `distributions-basic-stat.py`_ (uses monks-1.tab)

    .. literalinclude:: code/distributions-basic-stat.py
        :lines: 1-10

    This code prints out::

             feature   min   max   avg
        sepal length 4.300 7.900 5.843
         sepal width 2.000 4.400 3.054
        petal length 1.000 6.900 3.759
         petal width 0.100 2.500 1.199

    .. _distributions-basic-stat: code/distributions-basic-stat.py

    Output::

             feature   min   max   avg
        sepal length 4.300 7.900 5.843
         sepal width 2.000 4.400 3.054
        petal length 1.000 6.900 3.759
         petal width 0.100 2.500 1.199

    part of `distributions-basic-stat`_ (uses iris.tab)

    .. literalinclude:: code/distributions-basic-stat.py
        :lines: 11-

    This code prints out::
    Output::

        5.84333467484

    .. _distributions-basic-stat: code/distributions-basic-stat.py
    .. _distributions-basic-stat.py: code/distributions-basic-stat.py

.. _distributions-contingency: code/distributions-contingency.py

part of `distributions-contingency`_ (uses monks-1.tab)

is also possible to use features for both, outer and inner variable, so the
matrix shows distributions of one variable's values given the value of another.

There is a corresponding hierarchy of classes for handling hierarchies: :obj:`Contingency` is a base class for :obj:`ContingencyAttrAttr` (and :obj:`ContingencyClass`; the latter is

There is a hierarchy of classes with contingencies::

    Contingency::
        ContingencyClass
            ContingencyClassAttr
            ContingencyAttrClass
        ContingencyAttrAttr

The base object is Contingency. Derived from it is ContingencyClass in which one of the feaure is class; ContingencyClass is a base for two classes, ContingencyAttrClass and ContingencyClassAttr, the former having class as the inner and the latter as the outer feature. Class ContingencyAttrAttr is derived directly from Contingency and represents contingency matrices in which none of the feature is the class.

The most common used of the above classes is ContingencyAttrClass which resembles conditional probabilities of classes given the feature value.

Here's what all contingency matrices share in common.

There is a corresponding hierarchy of classes for handling hierarchies: :obj:`Contingency` is a base class for :obj:`ContingencyVarVar` (both variables are attribtes) and :obj:`ContingencyClass` (one variable is the class). The latter is the base class for :obj:`ContingencyVarClass` and :obj:`ContingencyClassVar`.

The most commonly used of the above classes is :obj:`ContingencyVarClass` which can compute and store conditional probabilities of classes given the feature value.

.. class:: Orange.statistics.distribution.Contingency

    The base class is, once for a change, not abstract.
    Its constructor expects two feature descriptors, the first one for the outer and the second for the inner feature. It initializes empty distributions and it's up to you to fill them. This is, for instance, how to manually reproduce results of the script at the top of the page.

    .. attribute:: outerVariable

        The outer feature descriptor. In the above case, it is e.

    .. attribute:: innerVariable

        The inner feature descriptor. In the above case, it is the class

    .. attribute:: outerVariable (`Orange.data.feature.Feature`_)

        Descriptor of the outer variable.

    .. _`Orange.data.feature.Feature`: :obj:`Orange.data.feature.Feature`

    .. attribute:: innerVariable (:class:`Orange.data.feature.Feature`)

        Descriptor of the inner variable.

    .. attribute:: outerDistribution

        The distribution of the outer featue's values - sums of rows. In the above case, distribution of e is
        The distribution (`of the outer feature's values - sums of rows. In the above case, distribution of ``e`` is
        <108.000, 108.000, 108.000, 108.000>

.. _distributions-contingency2: code/distributions-contingency2.py

part of `distributions-contingency2`_ (uses monks-1.tab)

.. _distributions-contingency3.py: code/distributions-contingency3.py

part of `distributions-contingency3.py`_ (uses monks-1.tab)

If the class value is '0', than attribute e cannot be '1' (the first value),
If the class value is '0', then attribute e cannot be '1' (the first value),
but can be anything else, with equal probabilities of 0.333. If the class value is '1', e is '1' in exactly half of examples

    This constructor is exactly the same as that of Contingency.

    .. method:: ContingencyAttrAttr(outer_variable, inner_variable, instances[, weightID])
    .. method:: ContingencyAttrAttr(outer_variable, inner_variable,  instances[, weightID])

        Computes the contingency from the given instances.

.. _distributions-contingency5: code/distributions-contingency5.py

part of `distributions-contingency5`_ (uses bridges.tab)

=================
====================================
Contingencies with Continuous Values
=================
====================================

What happens if one or both features are continuous?

.. _distributions-contingency6: code/distributions-contingency6.py

part of `distributions-contingency6`_ (uses monks-1.tab)

.. _distributions-contingency7: code/distributions-contingency7.py

part of `distributions-contingency7`_ (uses iris.tab)

=================
========================================
Computing Contingencies for All Features
=================
========================================

Computing contingency matrices requires iteration through instances.

.. _distributions-contingency8: code/distributions-contingency8.py

part of `distributions-contingency8`_ (uses monks-1.tab)

.. _distributions-contingency8: code/distributions-contingency8.py

part of `distributions-contingency8`_ (uses monks-1.tab)
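
The docstring in this changeset describes ``add`` as accumulating each value into ``sum``, its square into ``sum2``, and the weight into ``n``. The sketch below illustrates that bookkeeping in plain Python. It is not Orange's implementation: the class name ``RunningStats`` is hypothetical, and weighting both the value and its square, as well as using the population standard deviation, are assumptions made for illustration.

```python
import math


class RunningStats:
    """Hypothetical illustration of accumulate-then-derive statistics; not Orange code."""

    def __init__(self):
        self.n = 0.0      # sum of weights of the values seen so far
        self.sum = 0.0    # weighted sum of values
        self.sum2 = 0.0   # weighted sum of squared values
        self.min = None
        self.max = None

    def add(self, value, weight=1.0):
        # Track the extremes, then accumulate the weighted value and its square.
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value
        self.sum += weight * value
        self.sum2 += weight * value * value
        self.n += weight

    @property
    def avg(self):
        return self.sum / self.n if self.n else 0.0

    @property
    def dev(self):
        # Population standard deviation derived from the accumulated sums
        # (assumption: the real class may use a different estimator).
        if not self.n:
            return 0.0
        variance = self.sum2 / self.n - self.avg ** 2
        return math.sqrt(max(variance, 0.0))
```

Because only three running totals are kept, the average and deviation are available at any time without storing the data, which is also why a median cannot be obtained this way.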

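
The contingency text in this changeset talks about an outer and an inner variable, row sums as the outer distribution, and conditional probabilities of the inner value given the outer value. A minimal plain-Python sketch of those ideas follows. The function names are hypothetical and do not correspond to the Orange classes named in the diff; the input is assumed to be an iterable of ``(outer, inner)`` value pairs.

```python
from collections import Counter, defaultdict


def contingency(pairs):
    """Count inner values per outer value from (outer, inner) pairs."""
    table = defaultdict(Counter)
    for outer, inner in pairs:
        table[outer][inner] += 1
    return table


def outer_distribution(table):
    # Row sums: how often each outer value occurs.
    return {outer: sum(row.values()) for outer, row in table.items()}


def inner_distribution(table):
    # Column sums: how often each inner value occurs.
    totals = Counter()
    for row in table.values():
        totals.update(row)
    return dict(totals)


def p_inner_given_outer(table, outer):
    # Normalise one row to get conditional probabilities of the inner value.
    row = table[outer]
    total = sum(row.values())
    return {inner: count / total for inner, count in row.items()}
```

With made-up pairs such as ``[("a", "0"), ("a", "1"), ("a", "1"), ("b", "0")]``, the row for ``"a"`` normalises to probabilities of one third and two thirds.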