source: orange/Orange/doc/reference/kNNLearner.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 5.8 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

Line 
1<html>
2<HEAD>
3<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
4<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
5</HEAD>
6
7<BODY>
8<index name="classifiers+k nearest neighbours">
9<index name="k nearest neighbours">
10<h1>k Nearest Neighbours</h1>
11
12<p>kNN is one of the simplest learning techniques - the learner only needs to store the examples, while the classifier does its work by observing the most similar examples of the example to be classified.</p>
13
14<p>Classes for kNN learner and classifier are strongly related to classes for <a href="ExamplesDistance.htm">measuring distances</a>, described on a separate page.</p>
15
16<hr>
17
18<H2>kNNLearner</H2>
19<index name="classes/kNNLearner">
20
21<P class=section>Attributes</P>
22<DL class=attributes>
23<DT>k</DT>
24<DD>Number of neighbours. If set to 0 (which is also the default value), the square root of the number of examples is used. <b>Changed:</b> the default used to be 1.</DD>
25
26<DT>rankWeight</DT>
27<DD>Enables weighting by ranks (default: <CODE>true</CODE>)</DD>
28
29<DT>distanceConstructor</DT>
30<DD>A component that constructs the object for measuring distances between examples.</DD>
31</DL>
32
33<P><CODE>kNNLearner</CODE> first constructs an object for measuring distances between examples. <CODE>distanceConstructor</CODE> is used if given; otherwise, Euclidean metrics will be used. <CODE>kNNLearner</CODE> then constructs an instance of <a href="FindNearest.htm#FindNearest_BruteForce"><CODE>FindNearest_BruteForce</CODE></a>. Together with ID of meta attribute with weights of examples, <CODE>k</CODE> and <CODE>rankWeight</CODE>, it is passed to a <CODE>kNNClassifier</CODE>.</P>
34
35<H2>kNNClassifier</H2>
36<index name="classes/kNNClassifier">
37
38<P class=section>Attributes</P>
39<DL class=attributes>
40<DT>findNearest</DT>
41<DD>A component that finds nearest neighbours of a given example.</DD>
42
43<DT>k</DT>
44<DD>Number of neighbours.  If set to 0 (which is also the default value), the square root of the number of examples is used. <b>Changed:</b> the default used to be 1.</DD>
45
46<DT>rankWeight</DT>
47<DD>Enables weighting by ranks (default: <CODE>true</CODE>).</DD>
48
49<DT>weighID</DT>
50<DD>ID of meta attribute with weights of examples</DD>
51
52<DT>nExamples</DT>
53<DD>The number of learning examples. It is used to compute the number of neighbours if k is zero.</DD>
54</DL>
55
56<P>When called to classify an example, the classifier first calls <CODE>findNearest</CODE> to retrieve a list with <CODE>k</CODE> nearest neighbours. The component <CODE>findNearest</CODE> has a stored table of examples (those that have been passed to the learner) together with their weights. If examples are weighted (non-zero <CODE>weightID</CODE>), weights are considered when counting the neighbours.</P>
57
58<P>If <CODE>findNearest</CODE> returns only one neighbour (this is the case if <CODE>k</CODE>=1), <CODE>kNNClassifier</CODE> returns the neighbour's class.</P>
59
60<P>Otherwise, the retrieved neighbours vote about the class prediction (or probability of classes). Voting has double weights. As first, if examples are weighted, their weights are respected. Secondly, nearer neighbours have greater impact on the prediction; weight of example is computed as exp(-<EM>t</EM><SUP>2</SUP>/<EM>s</EM><SUP>2</SUP>), where the meaning of <EM>t</EM> depends on the setting of <CODE>rankWeight</CODE>.</P>
61<UL>
62<LI>if <CODE>rankWeight</CODE> is <CODE>false</CODE>, <EM>t</EM> is a distance from the example being classified;</LI>
63<LI>if <CODE>rankWeight</CODE> is <CODE>true</CODE>, neighbours are ordered and <EM>t</EM> is the position of the neighbour on the list (a rank)</LI>.
64</UL>
65<P>In both cases, <EM>s</EM> is chosen so that the impact of the farthest example is 0.001.</P>
66
67<P>Weighting gives the classifier certain insensitivity to the number of neighbours used, making it possible to use large <CODE>k</CODE>'s.</P>
68
69<P>The classifier can treat continuous and discrete attributes, and can even distinguish between ordinal and nominal attributes. See information on <a href="ExamplesDistance.htm">distance measuring</a> for details.</P>
70
71
72<p>We will test the learner on 'iris' dataset. We shall split it onto train (80%) and test (20%) sets, learn on training examples and test on five randomly selected test examples.</p>
73
74<p class="header">part of <a href="knnlearner.py">knnlearner.py</a>
75(uses <a href="iris.tab">iris.tab</a>)</p>
76<xmp class="code">>>> import orange, orngTest, orngStat
77>>> data = orange.ExampleTable("iris")
78>>>
79>>> rndind = orange.MakeRandomIndices2(data, p0=0.8)
80>>> train = data.select(rndind, 0)
81>>> test = data.select(rndind, 1)
82>>>
83>>> knn = orange.kNNLearner(train, k=10)
84>>> for i in range(5):
85...     example = test.randomexample()
86...     print example.getclass(), knn(example)
87...
88Iris-setosa Iris-setosa
89Iris-versicolor Iris-versicolor
90Iris-versicolor Iris-versicolor
91Iris-setosa Iris-setosa
92Iris-setosa Iris-setosa
93</xmp>
94
95<P>The secret of kNN's success is that the examples in iris dataset appear in three well separated clusters. The classifier's accuraccy will remain excellent even with very large or small number of neighbours.</p>
96
97<p>As many experiments have shown, a selection of examples distance measure does not have a greater and predictable effect on the performance of kNN classifiers. So there is not much point in changing the default. If you decide to do so, you need to set the <code>distanceConstructor</code> to an instance of one of the classes for <a href="ExamplesDistance.htm">distance measuring</a>.</p>
98
99<xmp class="code">>>> knn = orange.kNNLearner()
100>>> knn.k = 10
101>>> knn.distanceConstructor = orange.ExamplesDistanceConstructor_Hamming()
102>>> knn = knn(train)
103>>> for i in range(5):
104...     example = test.randomexample()
105...     print example.getclass(), knn(example)
106...
107Iris-virginica Iris-versicolor
108Iris-setosa Iris-setosa
109Iris-versicolor Iris-versicolor
110Iris-setosa Iris-setosa
111Iris-setosa Iris-setosa
112</xmp>
113
114<p>Still perfect. I've told you.</p> 
Note: See TracBrowser for help on using the repository browser.