source: orange/orange/Orange/classification/knn.py @ 9667:4a1214fdd5aa

Revision 9667:4a1214fdd5aa, 10.5 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Updated links to GetValue, ...

Line 
1"""
2.. index: k-nearest neighbors (kNN)
3.. index:
4   single: classification; k-nearest neighbors (kNN)
5   
6*****************************
7k-nearest neighbors (``knn``)
8*****************************
9
10The `nearest neighbors
11algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ is one of the most basic,
12`lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms.
13The learner only needs to store the instances of training data, while the classifier
14does all the work by searching this list for the instances most similar to
15the data instance being classified:
16
17.. literalinclude:: code/knnExample0.py
18
19.. class:: kNNLearner(k, distance_constructor, weight_id)
20
21    Lazy classifier that stores instances from the training set. Constructor
22    parameters set the corresponding attributes.
23
24    .. attribute:: k
25
26        number of nearest neighbors used in classification. If set to 0
27        (default), the square root of the numbers of instances is used.
28
29    .. attribute:: distance_constructor
30
31        component that constructs the object for measuring distances between
32        instances. Defaults to :class:`~Orange.distance.instances.EuclideanConstructor`.
33
34    .. attribute:: weight_id
35   
36        id of meta attribute with instance weights
37
38    .. attribute:: rank_weight
39
40        Enables weighting by ranks (default: :obj:`true`)
41
42    .. method:: __call__(instances)
43
44        Return a learned :class:`~kNNClassifier`. Learning consists of
45        constructing a distance measure and passing it to the classifier
46        along with :obj:`instances` and all attributes.
47
48        :param instances: training instances
49        :type instances: :class:`~Orange.data.Table`
50
51
52.. class:: kNNClassifier(domain, weight_id, k, find_nearest, rank_weight, n_examples)
53
54    .. method:: __call__(instance, return_type)
55
56        :param instance: given instance to be classified
57        :type instance: Orange.data.Instance
58       
59        :param return_type: return value and probabilities, only value or only
60                            probabilities
61        :type return_type: :obj:`~Orange.classification.Classifier.GetBoth`,
62                           :obj:`~Orange.classification.Classifier.GetValue`,
63                           :obj:`~Orange.classification.Classifier.GetProbabilities`
64       
65        :rtype: :class:`~Orange.data.Value`,
66              :class:`~Orange.statistics.distribution.Distribution` or a
67              tuple with both
68       
69    .. method:: find_nearest(instance)
70   
71    A component which finds the nearest neighbors of a given instance.
72       
73    :param instance: given instance
74    :type instance: :class:`~Orange.data.Instance`
75       
76    :rtype: :class:`Orange.data.Instance`
77   
78   
79    .. attribute:: k
80   
81        Number of neighbors. If set to 0 (which is also the default value),
82        the square root of the number of examples is used.
83   
84    .. attribute:: rank_weight
85   
86        Enables weighting by rank (default: :obj:`true`).
87   
88    .. attribute:: weight_id
89   
90        ID of meta attribute with weights of examples
91   
92    .. attribute:: n_examples
93   
94        The number of learning instances. It is used to compute the number of
95        neighbors if the value of :attr:`kNNClassifier.k` is zero.
96
97When called to classify an instance, the classifier first calls
98:meth:`kNNClassifier.find_nearest`
99to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The
100component :meth:`kNNClassifier.find_nearest` has
101a stored table of instances (those that have been passed to the learner)
102together with their weights. If instances are weighted (non-zero
103:obj:`weight_ID`), weights are considered when counting the neighbors.
104
105If :meth:`kNNClassifier.find_nearest` returns only one neighbor
106(this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
107neighbor's class.
108
109Otherwise, the retrieved neighbors vote about the class prediction
110(or probability of classes). Voting has double weights. As first, if
111instances are weighted, their weights are respected. Secondly, nearer
112neighbors have a greater impact on the prediction; the weight of instance
113is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends
114on the setting of :obj:`rank_weight`.
115
116* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the
117  instance being classified
118* if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t`
119  is the position of the neighbor on the list (a rank)
120
121
122In both cases, :obj:`s` is chosen so that the impact of the farthest instance
123is 0.001.
124
125Weighting gives the classifier a certain insensitivity to the number of
126neighbors used, making it possible to use large :obj:`k`'s.
127
128The classifier can treat continuous and discrete features, and can even
129distinguish between ordinal and nominal features. See information on
130distance measuring for details.
131
132Examples
133--------
134
135The learner will be tested on an 'iris' data set. The data will be split
136into training (80%) and testing (20%) instances. We will use the former
137for "training" the classifier and test it on five testing instances
138randomly selected from a part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
139
140.. literalinclude:: code/knnExample1.py
141
142The output of this code is::
143   
144    Iris-setosa Iris-setosa
145    Iris-versicolor Iris-versicolor
146    Iris-versicolor Iris-versicolor
147    Iris-setosa Iris-setosa
148    Iris-setosa Iris-setosa
149
150The secret to kNN's success is that the instances in the iris data set appear in
151three well separated clusters. The classifier's accuracy will remain
152excellent even with a very large or very small number of neighbors.
153
154As many experiments have shown, a selection of instances of distance measures
155has neither a greater nor more predictable effect on the performance of kNN
156classifiers. Therefore there is not much point in changing the default. If you
157decide to do so, the distance_constructor must be set to an instance
158of one of the classes for distance measuring. This can be seen in the following
159part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
160
161.. literalinclude:: code/knnExample2.py
162
163The output of this code is::
164
165    Iris-virginica Iris-versicolor
166    Iris-setosa Iris-setosa
167    Iris-versicolor Iris-versicolor
168    Iris-setosa Iris-setosa
169    Iris-setosa Iris-setosa
170
171The result is still perfect.
172
173.. index: fnn
174
175
176Finding nearest neighbors
177-------------------------
178
179Orange provides classes for finding the nearest neighbors of a given
180reference instance. While we might add some smarter classes in the future, we
181now have only two - abstract classes that define the general behavior of
182neighbor searching classes, and classes that implement brute force search.
183
184As is the norm in Orange, there are a pair of classes: a class that does the work
185(:class:`FindNearest`) and a class that constructs it ("learning" - getting the
186instances and arranging them in an appropriate data structure that allows for
187searching) (:class:`FindNearestConstructor`).
188
189.. class:: FindNearest
190
191    A class for a brute force search for nearest neighbors. It stores a table
192    of instances (it's its own copy of instances, not only Orange.data.Table
193    with references to another Orange.data.Table). When asked for neighbors,
194    it measures distances to all instances, stores them in a heap and returns
195    the first k as an Orange.data.Table with references to instances stored in
196    FindNearest's field instances).
197   
198    .. attribute:: distance
199   
200        a component that measures the distance between examples
201   
202    .. attribute:: examples
203   
204        a stored list of instances
205   
206    .. attribute:: weight_ID
207   
208        ID of meta attribute with weight
209   
210    .. method:: __call__(instance, n)
211   
212    :param instance: given instance
213    :type instance: Orange.data.Instance
214   
215    :param n: number of neighbors
216    :type n: int
217   
218    :rtype: list of :obj:`Orange.data.Instance`
219   
220.. class:: FindNearestConstructor()
221
222   
223    A class that constructs FindNearest. It calls the inherited
224    distance_constructor, which constructs a distance measure.
225    The distance measure, along with the instances weight_ID and
226    distance_ID, is then passed to the just constructed instance
227    of FindNearest_BruteForce.
228
229    If there are more instances with the same distance fighting for the last
230    places, the tie is resolved by randomly picking the appropriate number of
231    instances. A local random generator is constructed and initiated by a
232    constant computed from the reference instance. The effect of this is that
233    the same random neighbors will be chosen for the instance each time
234    FindNearest_BruteForce
235    is called.
236   
237    .. attribute:: distance_constructor
238   
239        A component of class ExamplesDistanceConstructor that "learns" to
240        measure distances between instances. Learning can mean, for instances,
241        storing the ranges of continuous features or the number of values of
242        a discrete feature (see the page about measuring distances for more
243        information). The result of learning is an instance of
244        ExamplesDistance that should be used for measuring distances
245        between instances.
246   
247    .. attribute:: include_same
248   
249        Tells whether or not to include the examples that are same as the reference;
250        the default is true.
251   
252    .. method:: __call__(table, weightID, distanceID)
253   
254        Constructs an instance of FindNearest that would return neighbors of
255        a given instance, obeying weight_ID when counting them (also, some
256        measures of distance might consider weights as well) and storing the
257        distances in a meta attribute with ID distance_ID.
258   
259        :param table: table of instances
260        :type table: Orange.data.Table
261       
262        :param weight_ID: id of meta attribute with weights of instances
263        :type weight_ID: int
264       
265        :param distance_ID: id of meta attribute that will save distances
266        :type distance_ID: int
267       
268        :rtype: :class:`FindNearest`
269
270Examples
271--------
272
273The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`, uses :download:`lenses.tab <code/lenses.tab>`)
274shows how to find the five nearest neighbors of the first instance
275in the lenses dataset.
276
277.. literalinclude:: code/knnInstanceDistance.py
278
279"""
280
281from Orange.core import \
282            kNNLearner, \
283            FindNearest_BruteForce as FindNearest, \
284            FindNearestConstructor_BruteForce as FindNearestConstructor, \
285            kNNClassifier, \
286            P2NN
287            #FindNearest
288            #FindNearestConstructor
Note: See TracBrowser for help on using the repository browser.