source: orange/Orange/doc/modules/orngOutlier.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 2.8 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="../style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<index name="outlier detection">
<h1>orngOutlier: module for detecting outliers</h1>

<p>This page describes a class for detecting outliers.</p>
<p>The class first calculates, for each example, its average distance to the other examples in the given data. It then calculates Z-scores of these average distances. A Z-score above zero denotes an example that is farther from the other examples than average.</p>
<p>Outliers can be detected directly on examples or on a precomputed distance matrix. The number of nearest neighbours used for averaging distances can also be set. The default is 0, which means that all examples are used when calculating average distances.</p>
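<p>The procedure described above can be sketched in plain Python, without Orange. This is only an illustration, not the module's code: the helper names <CODE>manhattan</CODE>, <CODE>average_distances</CODE> and <CODE>z_scores</CODE> and the toy data are hypothetical, and two details are assumptions that the module may handle differently: an example's zero distance to itself is left out of the average, and the Z-score uses the sample standard deviation (n - 1 in the denominator).</p>

```python
import math

def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_distances(points, knn=0, distance=manhattan):
    """Average distance from each point to its knn nearest other points.

    knn=0 means that all other points are used.
    Assumption: a point's distance to itself is excluded.
    """
    averages = []
    for i, p in enumerate(points):
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        if knn > 0:
            dists = dists[:knn]
        averages.append(sum(dists) / float(len(dists)))
    return averages

def z_scores(values):
    """Z-score of each value: (value - mean) / standard deviation.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n - 1)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]

# Hypothetical toy data: four clustered points and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = z_scores(average_distances(points))
```

<p>In this toy data the last point is far from the other four, so it is the only one with a Z-score above zero. Passing <CODE>knn=3</CODE> to <CODE>average_distances</CODE> would average over the three nearest neighbours only.</p>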

<hr>

<H2>OutlierDetection</H2>
<index name="classes/OutlierDetection">

<P class=section>Methods</P>
<DL class=attributes><DT>setExamples(examples, distance)</DT>
<DD>Sets the examples on which outlier detection will be performed. <CODE>distance</CODE> is an object capable of calculating the distance between examples. If omitted, Manhattan distance is used.</DD>
</DL>
<DL class=attributes><DT>setDistanceMatrix(orange.SymMatrix)</DT>
<DD>Sets the distance matrix on which outlier detection will be performed.</DD>
</DL>
<DL class=attributes><DT>setKNN(neighbours)</DT>
<DD>Sets the number of nearest neighbours considered when determining outliers.</DD>
</DL>
<DL class=attributes><DT>distanceMatrix()</DT>
<DD>Returns the distance matrix of the dataset.</DD>
</DL>
<DL class=attributes><DT>zValues()</DT>
<DD>Returns a list of Z-values of the average distances from each example to the others. The N-th number in the list is the Z-value of the N-th example.</DD>
</DL>

<H2>Examples</H2>

<p>The following example prints a list of Z-values of the examples in the <CODE>bridges</CODE> dataset.</p>
<p class="header"><a href="outlier1.py">outlier1.py</a>
(uses <a href="bridges.tab">bridges.tab</a>)</p>
<XMP class=code>import orange, orngOutlier

data = orange.ExampleTable("bridges")
outlierDet = orngOutlier.OutlierDetection()
outlierDet.setExamples(data)
print outlierDet.zValues()
</XMP>

<p>The following example prints the 5 examples with the highest Z-scores. Euclidean distance is used as the distance measure, and the average distance is calculated over the 3 nearest neighbours.</p>
<p class="header"><a href="outlier2.py">outlier2.py</a>
(uses <a href="bridges.tab">bridges.tab</a>)</p>
<XMP class=code>import orange, orngOutlier

data = orange.ExampleTable("bridges")
outlierDet = orngOutlier.OutlierDetection()
outlierDet.setExamples(data, orange.ExamplesDistanceConstructor_Euclidean(data))
outlierDet.setKNN(3)
zValues = outlierDet.zValues()
# Sort a copy so that zValues keeps the original example order.
sortedZ = sorted(zValues)
# Print the examples whose Z-score exceeds the sixth-highest value,
# that is, the five highest-scoring examples when there are no ties.
for i, el in enumerate(zValues):
    if el > sortedZ[-6]: print data[i], "Z-score: %5.3f" % el
</XMP>
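<p>Comparing against the sixth-highest Z-score prints fewer than five examples when the values around that threshold are tied. A possible alternative, sketched here in plain Python, selects exactly <CODE>k</CODE> indices with the standard <CODE>heapq.nlargest</CODE> function. The helper <CODE>top_k_indices</CODE> and the sample scores are hypothetical; in the script above the list would come from <CODE>outlierDet.zValues()</CODE>.</p>

```python
import heapq

def top_k_indices(z_values, k=5):
    """Indices of the k largest Z-scores, highest first."""
    return heapq.nlargest(k, range(len(z_values)), key=lambda i: z_values[i])

# Hypothetical Z-scores standing in for outlierDet.zValues().
z_values = [0.2, 1.7, -0.4, 1.7, 3.1, -1.0, 0.9]
top = top_k_indices(z_values, k=3)
```

<p>Each returned index can then be used to look up both the example (<CODE>data[i]</CODE>) and its Z-score.</p>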