source: orange/Orange/doc/reference/basicstat.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 4.6 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<h1>Basic Statistics for Continuous Attributes</h1>

<P>Orange contains two simple classes for computing <INDEX>basic statistics</INDEX> for continuous attributes, such as their minimum, maximum and average values: <code>BasicAttrStat</code> holds the statistics for a single attribute, and <code>DomainBasicAttrStat</code> holds the statistics for all attributes in the domain.</P>

<hr>

<h2>BasicAttrStat</h2>
<index name="classes/BasicAttrStat">

<p class=section>Attributes</p>

<dl class=attributes>
<dt>variable</dt>
<dd>The descriptor of the attribute to which the statistics apply.</dd>
<dt>min, max</dt>
<dd>Minimum and maximum attribute values encountered in the data.</dd>
<dt>avg, dev</dt>
<dd>Average value and deviation.</dd>
<dt>n</dt>
<dd>Number of examples for which the value was defined (and used in the statistics). If the examples were weighted, <CODE>n</CODE> is the sum of the weights of those examples.</dd>
<dt>sum, sum2</dt>
<dd>Weighted sum of values and weighted sum of squared values of this attribute.</dd>
<dt>holdRecomputation</dt>
<dd>If true, the average and deviation are not recomputed after each added value; they are updated only when <code>recompute</code> is called.</dd>
</dl>

<P class=section>Methods</P>

<dl class=attributes>
<dt>add(value[, weight])</dt>
<dd>Adds a value to the statistics. Both arguments should be numbers; <CODE>weight</CODE> is optional and defaults to 1.0.</dd>
<dt>recompute()</dt>
<dd>Recomputes the average and deviation.</dd>
</dl>

<p>You will most likely not construct this class yourself, but instead call <CODE>DomainBasicAttrStat</CODE> to compute statistics for all continuous attributes in the data set.</p>

<P>Nevertheless, here is how the class works. Values are fed into <code>add</code>; this is usually done by <CODE>DomainBasicAttrStat</CODE>, but you can also traverse the examples and feed the values from Python. For each value, <code>add</code> checks and, if necessary, adjusts <code>min</code> and <code>max</code>, adds the weighted value to <code>sum</code> and the weighted square of the value to <code>sum2</code>, and adds the weight to <code>n</code>. If <code>holdRecomputation</code> is <CODE>false</CODE>, it also recomputes the average and the deviation; if <CODE>true</CODE>, this is postponed until <code>recompute</code> is called. Postponing recomputation makes sense when using the class from C++. From Python, the recomputation takes far less time than the interpreter's own overhead, so you can leave <code>holdRecomputation</code> set to <CODE>false</CODE>.</P>
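<p>The bookkeeping described above can be sketched in plain Python. This is only an illustrative model, not Orange's implementation: the class name <code>RunningStat</code> and the attribute <code>hold_recomputation</code> are hypothetical, and the deviation formula (the square root of the weighted mean of squares minus the squared mean) is an assumption, since the text does not say which formula <code>BasicAttrStat</code> uses.</p>

```python
import math


class RunningStat(object):
    """Hypothetical pure-Python model of the accumulator described above."""

    def __init__(self, hold_recomputation=False):
        self.min = self.max = None
        self.sum = self.sum2 = 0.0   # weighted sum and weighted sum of squares
        self.n = 0.0                 # sum of weights
        self.avg = self.dev = 0.0
        self.hold_recomputation = hold_recomputation

    def add(self, value, weight=1.0):
        # Adjust the extremes, then update the running sums.
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.sum += weight * value
        self.sum2 += weight * value * value
        self.n += weight
        if not self.hold_recomputation:
            self.recompute()

    def recompute(self):
        # Assumption: population-style deviation, sqrt(E[x^2] - E[x]^2).
        if self.n > 0:
            self.avg = self.sum / self.n
            variance = self.sum2 / self.n - self.avg ** 2
            self.dev = math.sqrt(max(variance, 0.0))  # guard against rounding


# Postpone recomputation while feeding values, then recompute once.
stat = RunningStat(hold_recomputation=True)
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stat.add(value)
stat.recompute()  # avg and dev are only updated here
```

<p>Note that nothing but a handful of running totals is stored, which is why the values themselves never need to be kept.</p>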

<p>The statistics do not include the median or, more generally, any
quantiles. That is because the class only collects statistics that can
be computed on the fly, without storing the data. If you need
quantiles, you will need to construct a
<a href="distributions.htm"><code>ContDistribution</code></a>.</p>
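<p>To illustrate why quantiles need the data itself, here is a minimal sketch in plain Python. The helper <code>quantile</code> is hypothetical (it is not part of Orange) and uses the simple nearest-rank definition for unweighted values. The two samples share the same minimum, maximum and average, yet their medians differ.</p>

```python
import math


def quantile(values, q):
    """Nearest-rank quantile of an unweighted sample (hypothetical helper)."""
    ordered = sorted(values)          # needs every value, not just running sums
    if not ordered:
        raise ValueError("quantile of an empty sample is undefined")
    # Smallest rank whose cumulative share of the sample reaches q.
    rank = int(math.ceil(q * len(ordered)))
    return ordered[max(rank, 1) - 1]


sample_a = [1, 2, 3, 4, 10]
sample_b = [1, 1, 4, 4, 10]

median_a = quantile(sample_a, 0.5)
median_b = quantile(sample_b, 0.5)
```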

<h2>DomainBasicAttrStat</h2>
<index name="classes/DomainBasicAttrStat">

<p><code>DomainBasicAttrStat</code> behaves like a list of <code>BasicAttrStat</code> objects, except for a few details.</p>

<P class=section>Methods</P>
<DL class=attributes>
<DT>&lt;constructor&gt;</DT>
<DD>The constructor expects an example generator; if the examples are weighted, the second (otherwise optional) argument should give the id of the meta-attribute with weights.

<p class=header style="margin-bottom: 12pt">part of <a href="basicattrstat.py">basicattrstat.py</a>
(uses <a href="iris.tab">iris.tab</a>)</p>
<xmp class=code>import orange

# Load the iris data set and compute statistics for all its attributes
data = orange.ExampleTable("iris")
bas = orange.DomainBasicAttrStat(data)

print "%20s  %5s  %5s  %5s" % ("attribute", "min", "max", "avg")
for a in bas:
    # entries for discrete attributes are None, so skip them
    if a:
        print "%20s  %5.3f  %5.3f  %5.3f" % (
            a.variable.name, a.min, a.max, a.avg)
</xmp>

<p>This will print:</p>

<xmp class="code">           attribute    min    max    avg
        sepal length  4.300  7.900  5.843
         sepal width  2.000  4.400  3.054
        petal length  1.000  6.900  3.759
         petal width  0.100  2.500  1.199
</xmp>
</DD>

<DT>purge()</DT>
<DD>The <code>if a</code> test in the script above is needed because these statistics cannot be computed for discrete attributes, which are therefore represented by <code>None</code>. The <CODE>purge</CODE> method gets rid of them by removing the <CODE>None</CODE> entries from the list.</DD>

<DT>&lt;list-like operations&gt;</DT>
<DD><CODE>DomainBasicAttrStat</CODE> behaves like an ordinary list, except that its elements can also be indexed by attribute descriptors or attribute names.

<xmp class="code">>>> print bas["sepal length"].avg
5.84333467484
</xmp>
</DD>
</DL>
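<p>The list-like behaviour described above can be modelled roughly in plain Python. Everything in this sketch is a hypothetical stand-in: <code>Variable</code>, <code>Stat</code> and <code>StatList</code> are not Orange classes, and how Orange resolves names internally is not covered here. The sketch only shows the idea of a list whose entries can be looked up by position, by attribute name or by descriptor, and from which <code>None</code> placeholders can be purged.</p>

```python
from collections import namedtuple

# Hypothetical stand-ins for an attribute descriptor and a statistics object.
Variable = namedtuple("Variable", "name")
Stat = namedtuple("Stat", "variable avg")


class StatList(list):
    """Rough model of a list that can also be indexed by name or descriptor."""

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return list.__getitem__(self, key)   # ordinary list indexing
        name = key if isinstance(key, str) else key.name
        for stat in self:
            if stat is not None and stat.variable.name == name:
                return stat
        raise KeyError(name)

    def purge(self):
        # Drop the None placeholders left for discrete attributes, in place.
        self[:] = [stat for stat in self if stat is not None]


stats = StatList([
    Stat(Variable("sepal length"), 5.843),
    Stat(Variable("petal width"), 1.199),
    None,                                  # placeholder for a discrete attribute
])
stats.purge()
by_name = stats["petal width"]
by_descriptor = stats[Variable("sepal length")]
```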

<P>If you need more statistics, see the information on <a href="distributions.htm">distributions</a>.</P>
</body>
</html>