Basic Statistics for Continuous Features (basic)
There are two simple classes for computing basic statistics for continuous features, such as their minimal and maximal values or their average: Orange.statistics.basic.Variable holds the statistics for a single variable, and Orange.statistics.basic.Domain behaves like a list of instances of that class, one for each variable in the domain.
- class Orange.statistics.basic.Variable
Computes and stores the minimum, maximum, average and standard deviation of a variable. It includes only statistics that can be computed on the fly, without remembering the data, so it does not provide the median or similar statistics; those can be obtained using classes from the module Orange.statistics.distribution.
Instances of this class are seldom constructed manually; they are more often returned by Domain described below.
- variable
The variable to which the data applies.
- min
Minimal value encountered.
- max
Maximal value encountered.
- avg
Average value.
- dev
Standard deviation.
- n
Number of instances for which the value was defined. If instances were weighted, n holds the sum of weights.
- sum
Weighted sum of values.
- sum2
Weighted sum of squared values.
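The attributes above are enough to derive the average and the standard deviation in a single pass over the data. The following is a minimal sketch of that idea, not Orange's actual implementation: the class name RunningStats and its add method are hypothetical, and the use of the population form of the standard deviation is an assumption, since the text above does not say which form dev uses.

```python
import math


class RunningStats(object):
    """Hypothetical single-pass accumulator (not part of Orange)."""

    def __init__(self):
        self.n = 0.0      # sum of weights of the defined values
        self.sum = 0.0    # weighted sum of values
        self.sum2 = 0.0   # weighted sum of squared values
        self.min = None   # smallest value seen so far
        self.max = None   # largest value seen so far

    def add(self, value, weight=1.0):
        # Undefined values (represented here as None) are skipped.
        if value is None:
            return
        self.n += weight
        self.sum += weight * value
        self.sum2 += weight * value * value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def avg(self):
        # Assumes at least one defined value has been added.
        return self.sum / self.n

    @property
    def dev(self):
        # Assumption: population standard deviation, sqrt(E[x^2] - E[x]^2).
        variance = self.sum2 / self.n - self.avg ** 2
        return math.sqrt(max(variance, 0.0))


stats = RunningStats()
for value, weight in [(1.0, 1.0), (2.0, 1.0), (None, 1.0), (3.0, 2.0)]:
    stats.add(value, weight)
```

Because only running totals are kept, no values need to be stored, which is also why a statistic such as the median cannot be obtained this way.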
- class Orange.statistics.basic.Domain
statistics.basic.Domain behaves like an ordinary list, except that its elements can also be indexed by variable names or descriptors.
- __init__(data[, weight=None])
Compute the statistics for all continuous variables in the data, and put None in the places corresponding to variables of other types.
Parameters: - data (Orange.data.Table) – A table of instances
- weight (int or None) – The id of the meta-attribute with weights
- purge()
Remove the None entries corresponding to non-continuous features; this truncates the list, so its indices no longer correspond to the indices of variables in the domain.
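To illustrate how purging shifts the indices, here is a small self-contained sketch. It does not use Orange: StatsList is a hypothetical stand-in for a list that can also be indexed by name, and the variable names and averages are made up for the example.

```python
class StatsList(list):
    """Hypothetical list that can also be indexed by variable name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Look up the entry by name, skipping None placeholders.
            for item in self:
                if item is not None and item["name"] == key:
                    return item
            raise KeyError(key)
        return list.__getitem__(self, key)

    def purge(self):
        # Drop the None placeholders in place; later entries move forward.
        self[:] = [item for item in self if item is not None]


# Made-up domain: "age" and "height" are continuous, the middle
# variable is discrete and therefore gets a None placeholder.
stats = StatsList([
    {"name": "age", "avg": 41.5},
    None,
    {"name": "height", "avg": 172.0},
])

index_before = stats.index(stats["height"])   # position matches the domain
stats.purge()
index_after = stats.index(stats["height"])    # position has shifted
```

After purging, positional indices can no longer be used interchangeably with positions in the domain, while lookup by name keeps working.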
part of distributions-basic-stat.py
import Orange

iris = Orange.data.Table("iris.tab")
bas = Orange.statistics.basic.Domain(iris)

print "%20s %5s %5s %5s" % ("feature", "min", "max", "avg")
for a in bas:
    if a:
        print "%20s %5.3f %5.3f %5.3f" % (a.variable.name, a.min, a.max, a.avg)
Output:
             feature   min   max   avg
        sepal length 4.300 7.900 5.843
         sepal width 2.000 4.400 3.054
        petal length 1.000 6.900 3.759
         petal width 0.100 2.500 1.199
part of distributions-basic-stat.py
print bas["sepal length"].avg
Output:
5.84333467484