source: orange/orange/Orange/classification/knn.py @ 9349:fa13a2c52fcd

Revision 9349:fa13a2c52fcd, 11.1 KB checked in by mitar, 2 years ago (diff)

Changed way of linking to code in documentation.

Line 
1"""
2.. index: k-nearest neighbors (kNN)
3.. index:
4   single: classification; k-nearest neighbors (kNN)
5   
6*****************************
7k-nearest neighbors (``knn``)
8*****************************
9
10This module includes implementation of the `nearest neighbors
11algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ and classes
12for finding the nearest instances according to chosen distance metrics.
13
14k-nearest neighbor algorithm
15============================
16
17The nearest neighbors algorithm is one of the most basic,
18`lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms.
19The learner only needs to store the instances of training data, while the classifier
20does all the work by searching this list for the instances most similar to
21the data instance being classified:
22
23.. literalinclude:: code/knnExample0.py
24
25.. class:: kNNLearner(k, distanceConstructor, weightID)
26
27    :param instances: table of instances
28    :type instances: Orange.data.Table
29   
30    :param k: number of nearest neighbors used in classification
31    :type k: int
32   
33    :param weightID: id of meta attribute with instance weights
34    :type weightID: int
35   
36    :rtype: :class:`kNNLearner`
37   
38    .. method:: __call__(instances)
39       
40        Return instance of :class:`kNNClassifier` that learns from the
41        :obj:`instances`.
42       
43        :param instances: table of instances
44        :type instances: Orange.data.Table
45       
46        :rtype: :class:`kNNClassifier`
47
48
49    .. attribute:: k
50   
51        Number of neighbors. If set to 0 (which is also the default value),
52        the square root of the number of instances is used.
53   
54    .. attribute:: rank_weight
55   
56        Enables weighting by ranks (default: :obj:`true`)
57   
58    .. attribute:: distance_constructor
59   
60        component that constructs the object for measuring distances between
61        instances.
62
63kNNLearner first constructs an object for measuring distances between
64instances. distance_constructor is used if given; otherwise, Euclidean
65metrics will be used. :class:`kNNLearner` then constructs an instance of
66:class:`FindNearest_BruteForce`. Together with the ID of the meta feature with
67weights of instances, :attr:`kNNLearner.k` and :attr:`kNNLearner.rank_weight`,
68it is passed to a :class:`kNNClassifier`.
69
70.. class:: kNNClassifier(domain, weightID, k, FindNearest, rankWeight, \
71nExamples)
72
73    .. method:: __call__(instance)
74   
75        :param instance: given instance to be classified
76        :type instance: Orange.data.Instance
77       
78        :param return_type: return value and probabilities, only value or only
79                            probabilities
80        :type return_type: Orange.classification.Classifier.GetBoth,
81                           Orange.classification.Classifier.GetValue,
82                           Orange.classification.Classifier.GetProbilities
83       
84        :rtype: :class:`Orange.data.Value`,
85                :class:`Orange.statistics.distribution`, or a tuple with both
86       
87    .. method:: find_nearest(instance)
88   
89    A component which finds the nearest neighbors of a given instance.
90       
91    :param instance: given instance
92    :type instance: Orange.data.Instance
93       
94    :rtype: :class:`Orange.data.Instance`
95   
96   
97    .. attribute:: k
98   
99        Number of neighbors. If set to 0 (which is also the default value),
100        the square root of the number of examples is used.
101   
102    .. attribute:: rank_weight
103   
104        Enables weighting by rank (default: :obj:`true`).
105   
106    .. attribute:: weight_ID
107   
108        ID of meta attribute with weights of examples
109   
110    .. attribute:: n_examples
111   
112        The number of learning instances. It is used to compute the number of
113        neighbors if the value of :attr:`kNNClassifier.k` is zero.
114
115When called to classify an instance, the classifier first calls
116:meth:`kNNClassifier.find_nearest`
117to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The
118component :meth:`kNNClassifier.find_nearest` has
119a stored table of instances (those that have been passed to the learner)
120together with their weights. If instances are weighted (non-zero
121:obj:`weight_ID`), weights are considered when counting the neighbors.
122
123If :meth:`kNNClassifier.find_nearest` returns only one neighbor
124(this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
125neighbor's class.
126
127Otherwise, the retrieved neighbors vote about the class prediction
128(or probability of classes). Voting has double weights. As first, if
129instances are weighted, their weights are respected. Secondly, nearer
130neighbors have a greater impact on the prediction; the weight of instance
131is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends
132on the setting of :obj:`rank_weight`.
133
134* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the
135  instance being classified
136* if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t`
137  is the position of the neighbor on the list (a rank)
138
139
140In both cases, :obj:`s` is chosen so that the impact of the farthest instance
141is 0.001.
142
143Weighting gives the classifier a certain insensitivity to the number of
144neighbors used, making it possible to use large :obj:`k`'s.
145
146The classifier can treat continuous and discrete features, and can even
147distinguish between ordinal and nominal features. See information on
148distance measuring for details.
149
150Examples
151--------
152
153The learner will be tested on an 'iris' data set. The data will be split
154into training (80%) and testing (20%) instances. We will use the former
155for "training" the classifier and test it on five testing instances
156randomly selected from a part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
157
158.. literalinclude:: code/knnExample1.py
159
160The output of this code is::
161   
162    Iris-setosa Iris-setosa
163    Iris-versicolor Iris-versicolor
164    Iris-versicolor Iris-versicolor
165    Iris-setosa Iris-setosa
166    Iris-setosa Iris-setosa
167
168The secret to kNN's success is that the instances in the iris data set appear in
169three well separated clusters. The classifier's accuracy will remain
170excellent even with a very large or very small number of neighbors.
171
172As many experiments have shown, a selection of instances of distance measures
173has neither a greater nor more predictable effect on the performance of kNN
174classifiers. Therefore there is not much point in changing the default. If you
175decide to do so, the distance_constructor must be set to an instance
176of one of the classes for distance measuring. This can be seen in the following
177part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
178
179.. literalinclude:: code/knnExample2.py
180
181The output of this code is::
182
183    Iris-virginica Iris-versicolor
184    Iris-setosa Iris-setosa
185    Iris-versicolor Iris-versicolor
186    Iris-setosa Iris-setosa
187    Iris-setosa Iris-setosa
188
189The result is still perfect.
190
191.. index: fnn
192
193
194Finding nearest neighbors
195=========================
196
197Orange provides classes for finding the nearest neighbors of a given
198reference instance. While we might add some smarter classes in the future, we
199now have only two - abstract classes that define the general behavior of
200neighbor searching classes, and classes that implement brute force search.
201
202As is the norm in Orange, there are a pair of classes: a class that does the work
203(:class:`FindNearest`) and a class that constructs it ("learning" - getting the
204instances and arranging them in an appropriate data structure that allows for
205searching) (:class:`FindNearestConstructor`).
206
207.. class:: FindNearest
208
209    A class for a brute force search for nearest neighbors. It stores a table
210    of instances (it's its own copy of instances, not only Orange.data.Table
211    with references to another Orange.data.Table). When asked for neighbors,
212    it measures distances to all instances, stores them in a heap and returns
213    the first k as an Orange.data.Table with references to instances stored in
214    FindNearest's field instances).
215   
216    .. attribute:: distance
217   
218        a component that measures the distance between examples
219   
220    .. attribute:: examples
221   
222        a stored list of instances
223   
224    .. attribute:: weight_ID
225   
226        ID of meta attribute with weight
227   
228    .. method:: __call__(instance, n)
229   
230    :param instance: given instance
231    :type instance: Orange.data.Instance
232   
233    :param n: number of neighbors
234    :type n: int
235   
236    :rtype: list of :obj:`Orange.data.Instance`
237   
238.. class:: FindNearestConstructor()
239
240   
241    A class that constructs FindNearest. It calls the inherited
242    distance_constructor, which constructs a distance measure.
243    The distance measure, along with the instances weight_ID and
244    distance_ID, is then passed to the just constructed instance
245    of FindNearest_BruteForce.
246
247    If there are more instances with the same distance fighting for the last
248    places, the tie is resolved by randomly picking the appropriate number of
249    instances. A local random generator is constructed and initiated by a
250    constant computed from the reference instance. The effect of this is that
251    the same random neighbors will be chosen for the instance each time
252    FindNearest_BruteForce
253    is called.
254   
255    .. attribute:: distance_constructor
256   
257        A component of class ExamplesDistanceConstructor that "learns" to
258        measure distances between instances. Learning can mean, for instances,
259        storing the ranges of continuous features or the number of values of
260        a discrete feature (see the page about measuring distances for more
261        information). The result of learning is an instance of
262        ExamplesDistance that should be used for measuring distances
263        between instances.
264   
265    .. attribute:: include_same
266   
267        Tells whether or not to include the examples that are same as the reference;
268        the default is true.
269   
270    .. method:: __call__(table, weightID, distanceID)
271   
272        Constructs an instance of FindNearest that would return neighbors of
273        a given instance, obeying weight_ID when counting them (also, some
274        measures of distance might consider weights as well) and storing the
275        distances in a meta attribute with ID distance_ID.
276   
277        :param table: table of instances
278        :type table: Orange.data.Table
279       
280        :param weight_ID: id of meta attribute with weights of instances
281        :type weight_ID: int
282       
283        :param distance_ID: id of meta attribute that will save distances
284        :type distance_ID: int
285       
286        :rtype: :class:`FindNearest`
287
288Examples
289--------
290
291The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`, uses :download:`lenses.tab <code/lenses.tab>`)
292shows how to find the five nearest neighbors of the first instance
293in the lenses dataset.
294
295.. literalinclude:: code/knnInstanceDistance.py
296
297"""
298
299from Orange.core import \
300            kNNLearner, \
301            FindNearest_BruteForce as FindNearest, \
302            FindNearestConstructor_BruteForce as FindNearestConstructor, \
303            kNNClassifier, \
304            P2NN
305            #FindNearest
306            #FindNearestConstructor
Note: See TracBrowser for help on using the repository browser.