.. Source: orange/docs/reference/rst/Orange.classification.knn.rst @ 11787:e71e4ba3ead7
   Revision 11787:e71e4ba3ead7, checked in by Ales Erjavec <ales.erjavec@…>:
   knn documentation fix.

.. py:currentmodule:: Orange.classification.knn

.. index:: k-nearest neighbors (kNN)
.. index::
   single: classification; k-nearest neighbors (kNN)

*****************************
k-nearest neighbors (``knn``)
*****************************

The `nearest neighbors algorithm
<http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ is one
of the most basic, `lazy
<http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning
algorithms.  The learner only stores the training data, and the
classifier makes predictions based on the instances most similar to
the data instance being classified:

.. literalinclude:: code/knnExample0.py

.. class:: kNNLearner(k, distance_constructor, weight_id)

    Lazy classifier that stores instances from the training set. Constructor
    parameters set the corresponding attributes.

    .. attribute:: k

        Number of nearest neighbors used in classification. If 0
        (default), the square root of the number of instances is
        used.

    .. attribute:: distance_constructor

        Component that constructs the object for measuring distances between
        instances. Defaults to :class:`~Orange.distance.Euclidean`.

    .. attribute:: weight_id

        Id of meta attribute with instance weights.

    .. attribute:: rank_weight

        If ``True`` (default), neighbors are weighted according to
        their order and not their (normalized) distances to the
        instance that is being classified.

    .. method:: __call__(data)

        Return a :class:`~kNNClassifier`. Learning consists of
        constructing a distance measure and passing it to the
        classifier along with :obj:`data` and attributes (:obj:`k`,
        :obj:`rank_weight` and :obj:`weight_id`).

        :param data: training instances
        :type data: :class:`~Orange.data.Table`


.. class:: kNNClassifier(domain, weight_id, k, find_nearest, rank_weight, n_examples)

    .. method:: __call__(instance, return_type)

        :param instance: given instance to be classified
        :type instance: Orange.data.Instance

        :param return_type: return value and probabilities, only value or only
                            probabilities
        :type return_type: :obj:`~Orange.classification.Classifier.GetBoth`,
                           :obj:`~Orange.classification.Classifier.GetValue`,
                           :obj:`~Orange.classification.Classifier.GetProbabilities`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both

    .. attribute:: find_nearest

        A callable component that finds the nearest :obj:`k` neighbors
        of the given instance.

        :param instance: given instance
        :type instance: :class:`~Orange.data.Instance`
        :rtype: :class:`Orange.data.Table`

    .. attribute:: k

        Number of neighbors. If set to 0 (which is also the default value),
        the square root of the number of examples is used.

    .. attribute:: weight_id

        Id of meta attribute with instance weights.

    .. attribute:: rank_weight

        If ``True`` (default), neighbors are weighted according to
        their order and not their (normalized) distances to the
        instance that is being classified.

    .. attribute:: n_examples

        The number of learning instances, used to compute the number of
        neighbors if the value of :attr:`kNNClassifier.k` is zero.

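As a rough illustration of how the number of neighbors follows from
:attr:`kNNClassifier.k` and :attr:`kNNClassifier.n_examples`, here is a
minimal sketch in plain Python. It is not Orange's code: the helper name
``effective_k`` is hypothetical, and truncating the square root to an integer
(with a minimum of one neighbor) is an assumption, since the rounding rule is
not documented here.

.. code-block:: python

    import math

    def effective_k(k, n_examples):
        """Return k, or the square root of n_examples when k is 0."""
        if k:
            return k
        # Assumption: truncate to an integer and use at least one neighbor.
        return max(1, int(math.sqrt(n_examples)))

    print(effective_k(0, 120))  # default k on 120 training instances
    print(effective_k(5, 120))  # an explicit k is used unchanged
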
When called to classify instance ``inst``, the classifier first calls
:obj:`kNNClassifier.find_nearest(inst)` to retrieve a list of the
:attr:`kNNClassifier.k` nearest neighbors. The component
:meth:`kNNClassifier.find_nearest` has a stored table of training
instances together with their weights. If instances are weighted
(non-zero :obj:`weight_id`), weights are considered when counting the
neighbors.

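One way to picture neighbor counting with instance weights is sketched below
in plain Python. This is an illustration only, not Orange's implementation:
the function ``take_by_weight`` is hypothetical, and including in full the
neighbor whose weight crosses the threshold is an assumption.

.. code-block:: python

    def take_by_weight(sorted_neighbors, k):
        """Collect neighbors until their total weight reaches k.

        sorted_neighbors: list of (distance, weight) pairs, nearest first.
        """
        selected, total = [], 0.0
        for distance, weight in sorted_neighbors:
            if total >= k:
                break
            # Assumption: the neighbor that crosses the threshold is kept whole.
            selected.append((distance, weight))
            total += weight
        return selected

    neighbors = [(0.1, 2.0), (0.2, 0.5), (0.4, 1.0), (0.9, 3.0)]
    print(take_by_weight(neighbors, 3))
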
If :meth:`kNNClassifier.find_nearest` returns only one neighbor (this
is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
neighbor's class.

Otherwise, the retrieved neighbors vote for the class prediction or
probability of classes. A neighbor's vote can be weighted by a product of
two weights: the weight of the training instance, if weights are given, and
a weight that reflects the distance from ``inst``. Nearer neighbors have a
greater impact on the prediction: the weight is computed as
:math:`\exp(-t^2 / s^2)`, where the meaning of :obj:`t` depends on the
setting of :obj:`rank_weight`.

* if :obj:`rank_weight` is :obj:`False`, :obj:`t` is the distance from the
  instance being classified
* if :obj:`rank_weight` is :obj:`True`, neighbors are ordered and :obj:`t`
  is the position of the neighbor on the list (a rank)

In both cases, :obj:`s` is chosen so that the weight of the farthest
instance is 0.001.

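The weighting scheme can be sketched in plain Python as follows. This is an
illustration under stated assumptions, not Orange's implementation: the
function names are hypothetical, ranks are assumed to start at 1, and the
handling of neighbors at distance zero is a guess. The value of ``s`` follows
from solving :math:`\exp(-t_{max}^2 / s^2) = 0.001`.

.. code-block:: python

    import math
    from collections import defaultdict

    FARTHEST_WEIGHT = 0.001  # weight assigned to the farthest neighbor

    def neighbor_weights(distances, rank_weight=True):
        """distances: distances of the neighbors, sorted nearest first."""
        if rank_weight:
            # Assumption: ranks are 1-based positions in the sorted list.
            t = [float(rank) for rank in range(1, len(distances) + 1)]
        else:
            t = list(distances)
        t_max = t[-1]
        if t_max == 0:
            return [1.0] * len(t)  # all neighbors coincide with the instance
        # Solve exp(-t_max^2 / s^2) = 0.001 for s^2.
        s2 = -t_max ** 2 / math.log(FARTHEST_WEIGHT)
        return [math.exp(-ti ** 2 / s2) for ti in t]

    def vote(neighbors, rank_weight=True):
        """neighbors: (distance, class_label, instance_weight), nearest first."""
        weights = neighbor_weights([d for d, _, _ in neighbors], rank_weight)
        scores = defaultdict(float)
        for (_, label, instance_weight), w in zip(neighbors, weights):
            # A vote is the product of the instance weight and distance weight.
            scores[label] += instance_weight * w
        total = sum(scores.values())
        return {label: score / total for label, score in scores.items()}

    neighbors = [(0.1, "a", 1.0), (0.2, "b", 1.0), (0.5, "b", 1.0)]
    print(vote(neighbors, rank_weight=False))
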
Weighting gives the classifier a certain insensitivity to the number of
neighbors used, making it possible to use large values of :obj:`k`.

The classifier can use continuous and discrete features, and can even
distinguish between ordinal and nominal features. See the documentation on
distance measuring (:mod:`Orange.distance`) for details.

Examples
--------

The learner will be tested on the 'iris' data set. The data will be split
into training (80%) and testing (20%) instances. We will use the former
for "training" the classifier and test it on five instances
randomly selected from the latter (:download:`knnlearner.py <code/knnlearner.py>`):

.. literalinclude:: code/knnExample1.py

The output of this code is::

    Iris-setosa Iris-setosa
    Iris-versicolor Iris-versicolor
    Iris-versicolor Iris-versicolor
    Iris-setosa Iris-setosa
    Iris-setosa Iris-setosa

The choice of metric usually does not have a great impact on the performance
of kNN classifiers, so the default should work fine. To change it,
:obj:`distance_constructor` must be set to an instance of one of the classes
for distance measuring.

.. literalinclude:: code/knnExample2.py
    :lines: 4-7

.. index:: fnn


Finding nearest neighbors
-------------------------

Orange provides classes for finding the nearest neighbors of a given
reference instance.

As usual in Orange, there are two classes: one that does the work
(:class:`FindNearest`) and another that constructs the former from
data (:class:`FindNearestConstructor`).

.. class:: FindNearest

    Brute force search for nearest neighbors in the stored data table.

    .. attribute:: distance

        An instance of :obj:`Orange.distance.Distance` used for
        computing distances between data instances.

    .. attribute:: instances

        Stored data table.

    .. attribute:: weight_ID

        ID of meta attribute with weight. If present (non-null), the
        class does not return ``k`` instances but a set of instances
        with a total weight of ``k``.

    .. attribute:: distance_ID

        The ID of the meta attribute that will be added to the found
        neighbors to store the distances between the returned
        data instances and the reference. If zero, the distances are
        not stored.

    .. method:: __call__(instance, k)

        Return a data table with ``k`` nearest neighbors of
        ``instance``.  Any ties for the last place(s) are resolved by
        randomly picking the appropriate number of instances. A local
        random generator is constructed and seeded by a constant
        computed from :obj:`instance`, so the same random neighbors
        are always returned for the same instance.

        :param instance: given instance
        :type instance: Orange.data.Instance

        :param k: number of neighbors
        :type k: int

        :rtype: :obj:`Orange.data.Table`

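The brute force search with reproducible tie-breaking described for
:class:`FindNearest` can be sketched in plain Python. This is an illustration,
not Orange's implementation: instances are plain tuples of numbers, the
function names are hypothetical, ``k`` is assumed to be at least 1, and seeding
the local generator from the textual form of the reference instance is an
assumption about how the constant might be computed.

.. code-block:: python

    import random

    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def find_nearest(instances, reference, k, distance=euclidean):
        """Return the k instances nearest to reference (brute force)."""
        ranked = sorted(((distance(inst, reference), inst) for inst in instances),
                        key=lambda pair: pair[0])
        if k >= len(ranked):
            return [inst for _, inst in ranked]
        cutoff = ranked[k - 1][0]  # distance of the k-th neighbor
        closer = [inst for d, inst in ranked if d < cutoff]
        tied = [inst for d, inst in ranked if d == cutoff]
        # A local generator seeded from the reference instance breaks ties
        # the same way on every call (the seeding rule is an assumption).
        rng = random.Random(repr(tuple(reference)))
        return closer + rng.sample(tied, k - len(closer))

    points = [(0, 1), (1, 0), (0, -1), (-1, 0), (3, 3)]
    # Four points are tied at distance 1; the same two are picked every time.
    print(find_nearest(points, (0, 0), 2))
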
.. class:: FindNearestConstructor()

    A class that constructs :obj:`FindNearest` and initializes it with a
    distance metric, constructed by :obj:`distance_constructor`.

    .. attribute:: distance_constructor

        An instance of :obj:`Orange.distance.DistanceConstructor` that
        "learns" to measure distances between instances. Learning can
        mean, for example, storing the ranges of continuous features
        or the number of values of a discrete feature. The result of
        learning is an instance of :obj:`Orange.distance.Distance` that is
        used for measuring distances between instances.

    .. attribute:: include_same

        Indicates whether to include the instances that are the same as
        the reference; default is ``True``.

    .. method:: __call__(data, weight_ID, distance_ID)

        Constructs an instance of :obj:`FindNearest` for the given
        data. Arguments :obj:`weight_ID` and :obj:`distance_ID` are copied to the new object.

        :param data: table of instances
        :type data: Orange.data.Table

        :param weight_ID: id of meta attribute with weights of instances
        :type weight_ID: int

        :param distance_ID: id of meta attribute that will store distances
        :type distance_ID: int

        :rtype: :obj:`FindNearest`

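The idea of a distance constructor that "learns" from data can be sketched in
plain Python. The two classes below are hypothetical stand-ins, not Orange
classes: the constructor stores the range of each continuous feature, and the
resulting distance object scales every difference by that range, which is one
plausible form of the learning mentioned above.

.. code-block:: python

    class NormalizedEuclidean(object):
        """Distance that scales each continuous feature by its learned range."""

        def __init__(self, spans):
            self.spans = spans

        def __call__(self, a, b):
            total = 0.0
            for x, y, span in zip(a, b, self.spans):
                # A constant feature (zero range) contributes nothing.
                diff = (x - y) / span if span else 0.0
                total += diff ** 2
            return total ** 0.5

    class NormalizedEuclideanConstructor(object):
        """'Learns' feature ranges from data and returns a distance object."""

        def __call__(self, data):
            columns = list(zip(*data))
            spans = [max(col) - min(col) for col in columns]
            return NormalizedEuclidean(spans)

    data = [(1.0, 100.0), (2.0, 300.0), (3.0, 200.0)]
    distance = NormalizedEuclideanConstructor()(data)
    print(distance(data[0], data[1]))
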
Examples
--------

The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`)
shows how to find the five nearest neighbors of the first instance
in the lenses dataset.

.. literalinclude:: code/knnInstanceDistance.py


.. automodule:: Orange.classification.knn