source: orange/Orange/doc/reference/FindNearest.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 6.9 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

Line 
1<html>
2<HEAD>
3<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
4<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
5</HEAD>
6
7<BODY>
8<index name="k nearest neighbours">
9<index name="nearest neighbours">
10<h1>Classes for Finding Nearest Neighbours</h1>
11
12<P>Orange provides classes for finding the nearest neighbours of the given reference example. While we might add some smarter classes in future, we now have only two - abstract classes that defines the general behaviour of neighbour searching classes, and classes that implement brute force search.</P>
13
14<P>As usually in Orange, there is a pair of classes: a class that does the work (<CODE>FindNearest</CODE>) and a class that constructs it ("learning" - getting the examples and arranging them in an appropriate data structure that allows for searching) (<CODE>FindNearestConstructor</CODE>). These two classes are abstract, the classes you can actually use are <CODE>FindNearest_BruteForce</CODE> and <CODE>FindNearestConstructor_BruteForce</CODE>.</P>
15
16<hr>
17
18<H2>FindNearest</H2>
19<index name="classes/FindNearest">
20
21<P><CODE>FindNearest</CODE> returns nearest neighbours of a given example. The class is abstract and it is up to derived classes to decide how the examples are stored and searched. Note that the class even does not have a component for measuring distances between examples; derived classes can either store such a component or store examples in such a way that the component is unnecessary.</P>
22
23<P class=section>Attributes</P>
24<DL class=attributes>
25<DT>distanceID</DT>
26<DD>Id of a meta attribute that should store the distance from the reference example; if 0 (default), distances are not stored.</DD>
27
28<DT>includeSame</DT>
29<DD>Tells whether to include the examples that are same as the reference; default is <CODE>true</CODE>.</DD>
30</DL>
31
32<P class=section>Methods</P>
33<DL class=attributes>
34<DT>__call__(example, k[, needsClass])</DT>
35<DD><P>Returns an ordered list of <CODE>k</CODE> nearest neighbours of <CODE>example</CODE>. If examples are weighted, the length of the list is not necessarily <CODE>k</CODE>, but the total weight of the returned examples will be as close to <CODE>k</CODE> as possible. (If there are enough examples, the total weight will be at least <CODE>k</CODE>.)</P>
36
37<P>If <CODE>k</CODE> is zero, all examples are returned, but sorted by their distances to <CODE>example</CODE>.</P>
38
39<P>If the third argument, <code>needsClass</code> is given and true, <Code>FindNearest</Code> will ignore all examples with missing classes. If the domain is classless, this flag has no effect.</P>
40</DD></DL>
41
42
43<H2>FindNearestConstructor</H2>
44<index name="classes/FindNearestConstructor">
45
46<P>The job of <CODE>FindNearestConstructor</CODE> is to construct an instance of <CODE>FindNearest</CODE>.
47
48<P class=section>Attributes</P>
49<DL class=attributes>
50<DT>distanceConstructor</DT>
51<DD>A component of class <CODE>ExamplesDistanceConstructor</CODE> that "learns" to measure distances between examples. Learning can be, for examples, storing the ranges of continuous attributes or the number of value of a discrete attribute (see the page about <a href="ExamplesDistance.htm">measuring distances</a> for more information). The result of learning is an instance of <CODE>ExamplesDistance</CODE> that should be used for measuring distances between examples.</DD>
52
53<DT>includeSame</DT>
54<DD>Tells whether to include the examples that are same as the reference; default is <CODE>true</CODE>.</DD>
55</DL>
56
57<P class=section>Methods</P>
58<DL class=attributes>
59<DT>__call__(examples, weightID, distanceID)</DT>
60<DD>Constructs an instance of <CODE>FindNearest</CODE> that would return neighbours of a given <CODE>example</CODE>, obeying <CODE>weightID</CODE> when counting them (also, some measures of distance might consider weights as well) and store the distances in a meta attribute with ID <CODE>distanceID</CODE>.</DD>
61</DL>
62
63
64<H2>FindNearest_BruteForce</H2>
65<index name="classes/FindNearest_BruteForce">
66
67A class for brute force search for nearest neighbours. It stores a table of examples (it's its own copy of examples, not only <CODE>ExampleTable</CODE> with references to another <CODE>ExampleTable</CODE>). When asked for neighbours, it measures distances to all examples, stores them in a heap and returns the first <CODE>k</CODE> as an <CODE>ExampleTable</CODE> with references to examples stored in <CODE>FindNearest_BruteForce</CODE>'s field <CODE>examples</CODE>).
68
69
70<P class=section>Attributes</P>
71<DL class=attributes>
72<DT>distance</DT>
73<DD>a component that measures distances between examples</DD>
74
75<DT>examples</DT>
76<DD>a stored list of examples</DD>
77
78<DT>weighID</DT>
79<DD>ID of meta attribute with weight</DD>
80</DL>
81
82<H2>FindNearestConstructor_BruteForce</H2>
83<index name="classes/FindNearestConstructor_BruteForce">
84
85<P>A class that constructs <CODE>FindNearest_BruteForce</CODE>. It calls the inherited <CODE>distanceConstructor</CODE> and then passes the constructed distance measure, among with <CODE>examples</CODE>, <CODE>weightID</CODE> and <CODE>distanceID</CODE>, to the just constructed instance of <CODE>FindNearest_BruteForce</CODE>.</P>
86
87<P>If there are more examples with the same distance fighting for the last places, the tie is resolved by randomly picking the appropriate number of examples. A local random generator is constructed and initiated by a constant computed from the reference example. The effect of this is that same random neighbours will be chosen for the example each time <CODE>FindNearest_BruteForce</CODE> is called.</P>
88
89<P>The following script shows how to find the five nearest neighbours of the first example in the lenses dataset.</P>
90
91<p class="header"><a href="examplesdistance.py">examplesdistance.py</a>
92(uses <a href="lenses.tab">lenses.tab</a>)</p>
93<XMP class=code>import orange
94
95data = orange.ExampleTable("lenses")
96
97nnc = orange.FindNearestConstructor_BruteForce()
98nnc.distanceConstructor = orange.ExamplesDistanceConstructor_Euclidean()
99
100did = orange.newmetaid()
101nn = nnc(data, 0, did)
102
103print "*** Reference example: ", data[0]
104for ex in nn(data[0], 5):
105    print ex
106</XMP>
107
108<P>First, we prepare a constructor <CODE>nnc</CODE>, and tell it that we want to use Euclidean distance. The we obtain a new id for meta-attribute and call the constructor. The resulting object <CODE>nn</CODE> is trained to find neighbours of examples in lenses dataset. Finally, in the for loop, we call <CODE>nn</CODE> to find the five nearest neighbours of <CODE>data[0]</CODE> and print them out.</P>
109
110<XMP class=code>*** Reference example:  ['young', 'myope', 'no', 'reduced', 'none']
111['young', 'myope', 'no', 'reduced', 'none'], {-2:0.00}
112['young', 'myope', 'yes', 'reduced', 'none'], {-2:1.00}
113['young', 'hypermetrope', 'no', 'reduced', 'none'], {-2:1.00}
114['pre-presbyopic', 'myope', 'no', 'reduced', 'none'], {-2:1.00}
115['presbyopic', 'myope', 'no', 'reduced', 'none'], {-2:1.00}
116</XMP>
117
118<P>All returned examples here have a new meta-attribute with id -2 that shows the distance from the reference example.</P> 
Note: See TracBrowser for help on using the repository browser.