source: orange/orange/Orange/classification/knn.py @ 9665:50d563eac2b2

Revision 9665:50d563eac2b2, 10.5 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Updated documentation.

Line 
1"""
2.. index: k-nearest neighbors (kNN)
3.. index:
4   single: classification; k-nearest neighbors (kNN)
5   
6*****************************
7k-nearest neighbors (``knn``)
8*****************************
9
10The `nearest neighbors
11algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ is one of the most basic,
12`lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms.
13The learner only needs to store the instances of training data, while the classifier
14does all the work by searching this list for the instances most similar to
15the data instance being classified:
16
17
18.. literalinclude:: code/knnExample0.py
19
20.. class:: kNNLearner(k, distance_constructor, weight_id)
21
22    Lazy classifier that stores instances from the training set. Constructor
23    parameters set the corresponding attributes.
24
25    .. attribute:: k
26
27        number of nearest neighbors used in classification. If set to 0
28        (default), the square root of the numbers of instances is used.
29
30    .. attribute:: distance_constructor
31
32        component that constructs the object for measuring distances between
33        instances. Defaults to :class:`~Orange.distance.instances.EuclideanConstructor`.
34
35    .. attribute:: weight_id
36   
37        id of meta attribute with instance weights
38
39    .. attribute:: rank_weight
40
41        Enables weighting by ranks (default: :obj:`true`)
42
43    .. method:: __call__(instances)
44
45        Return a learned :class:`~kNNClassifier`. Learning consists of
46        constructing a distance measure and passing it to the classifier
47        along with :obj:`instances` and all attributes.
48
49        :param instances: training instances
50        :type instances: :class:`~Orange.data.Table`
51
52
53.. class:: kNNClassifier(domain, weight_id, k, find_nearest, rank_weight, \
54n_examples)
55
56    .. method:: __call__(instance)
57   
58        :param instance: given instance to be classified
59        :type instance: Orange.data.Instance
60       
61        :param return_type: return value and probabilities, only value or only
62                            probabilities
63        :type return_type: :class:`Orange.classification.Classifier.GetBoth`,
64                           :class:`Orange.classification.Classifier.GetValue`,
65                           :class:`Orange.classification.Classifier.GetProbabilities`
66       
67        :rtype: :class:`~Orange.data.Value`,
68              :class:`~Orange.statistics.distribution.Distribution` or a
69              tuple with both
70       
71    .. method:: find_nearest(instance)
72   
73    A component which finds the nearest neighbors of a given instance.
74       
75    :param instance: given instance
76    :type instance: :class:`~Orange.data.Instance`
77       
78    :rtype: :class:`Orange.data.Instance`
79   
80   
81    .. attribute:: k
82   
83        Number of neighbors. If set to 0 (which is also the default value),
84        the square root of the number of examples is used.
85   
86    .. attribute:: rank_weight
87   
88        Enables weighting by rank (default: :obj:`true`).
89   
90    .. attribute:: weight_id
91   
92        ID of meta attribute with weights of examples
93   
94    .. attribute:: n_examples
95   
96        The number of learning instances. It is used to compute the number of
97        neighbors if the value of :attr:`kNNClassifier.k` is zero.
98
99When called to classify an instance, the classifier first calls
100:meth:`kNNClassifier.find_nearest`
101to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The
102component :meth:`kNNClassifier.find_nearest` has
103a stored table of instances (those that have been passed to the learner)
104together with their weights. If instances are weighted (non-zero
105:obj:`weight_ID`), weights are considered when counting the neighbors.
106
107If :meth:`kNNClassifier.find_nearest` returns only one neighbor
108(this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
109neighbor's class.
110
111Otherwise, the retrieved neighbors vote about the class prediction
112(or probability of classes). Voting has double weights. As first, if
113instances are weighted, their weights are respected. Secondly, nearer
114neighbors have a greater impact on the prediction; the weight of instance
115is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends
116on the setting of :obj:`rank_weight`.
117
118* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the
119  instance being classified
120* if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t`
121  is the position of the neighbor on the list (a rank)
122
123
124In both cases, :obj:`s` is chosen so that the impact of the farthest instance
125is 0.001.
126
127Weighting gives the classifier a certain insensitivity to the number of
128neighbors used, making it possible to use large :obj:`k`'s.
129
130The classifier can treat continuous and discrete features, and can even
131distinguish between ordinal and nominal features. See information on
132distance measuring for details.
133
134Examples
135--------
136
137The learner will be tested on an 'iris' data set. The data will be split
138into training (80%) and testing (20%) instances. We will use the former
139for "training" the classifier and test it on five testing instances
140randomly selected from a part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
141
142.. literalinclude:: code/knnExample1.py
143
144The output of this code is::
145   
146    Iris-setosa Iris-setosa
147    Iris-versicolor Iris-versicolor
148    Iris-versicolor Iris-versicolor
149    Iris-setosa Iris-setosa
150    Iris-setosa Iris-setosa
151
152The secret to kNN's success is that the instances in the iris data set appear in
153three well separated clusters. The classifier's accuracy will remain
154excellent even with a very large or very small number of neighbors.
155
156As many experiments have shown, a selection of instances of distance measures
157has neither a greater nor more predictable effect on the performance of kNN
158classifiers. Therefore there is not much point in changing the default. If you
159decide to do so, the distance_constructor must be set to an instance
160of one of the classes for distance measuring. This can be seen in the following
161part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`):
162
163.. literalinclude:: code/knnExample2.py
164
165The output of this code is::
166
167    Iris-virginica Iris-versicolor
168    Iris-setosa Iris-setosa
169    Iris-versicolor Iris-versicolor
170    Iris-setosa Iris-setosa
171    Iris-setosa Iris-setosa
172
173The result is still perfect.
174
175.. index: fnn
176
177
178Finding nearest neighbors
179-------------------------
180
181Orange provides classes for finding the nearest neighbors of a given
182reference instance. While we might add some smarter classes in the future, we
183now have only two - abstract classes that define the general behavior of
184neighbor searching classes, and classes that implement brute force search.
185
186As is the norm in Orange, there are a pair of classes: a class that does the work
187(:class:`FindNearest`) and a class that constructs it ("learning" - getting the
188instances and arranging them in an appropriate data structure that allows for
189searching) (:class:`FindNearestConstructor`).
190
191.. class:: FindNearest
192
193    A class for a brute force search for nearest neighbors. It stores a table
194    of instances (it's its own copy of instances, not only Orange.data.Table
195    with references to another Orange.data.Table). When asked for neighbors,
196    it measures distances to all instances, stores them in a heap and returns
197    the first k as an Orange.data.Table with references to instances stored in
198    FindNearest's field instances).
199   
200    .. attribute:: distance
201   
202        a component that measures the distance between examples
203   
204    .. attribute:: examples
205   
206        a stored list of instances
207   
208    .. attribute:: weight_ID
209   
210        ID of meta attribute with weight
211   
212    .. method:: __call__(instance, n)
213   
214    :param instance: given instance
215    :type instance: Orange.data.Instance
216   
217    :param n: number of neighbors
218    :type n: int
219   
220    :rtype: list of :obj:`Orange.data.Instance`
221   
222.. class:: FindNearestConstructor()
223
224   
225    A class that constructs FindNearest. It calls the inherited
226    distance_constructor, which constructs a distance measure.
227    The distance measure, along with the instances weight_ID and
228    distance_ID, is then passed to the just constructed instance
229    of FindNearest_BruteForce.
230
231    If there are more instances with the same distance fighting for the last
232    places, the tie is resolved by randomly picking the appropriate number of
233    instances. A local random generator is constructed and initiated by a
234    constant computed from the reference instance. The effect of this is that
235    the same random neighbors will be chosen for the instance each time
236    FindNearest_BruteForce
237    is called.
238   
239    .. attribute:: distance_constructor
240   
241        A component of class ExamplesDistanceConstructor that "learns" to
242        measure distances between instances. Learning can mean, for instances,
243        storing the ranges of continuous features or the number of values of
244        a discrete feature (see the page about measuring distances for more
245        information). The result of learning is an instance of
246        ExamplesDistance that should be used for measuring distances
247        between instances.
248   
249    .. attribute:: include_same
250   
251        Tells whether or not to include the examples that are same as the reference;
252        the default is true.
253   
254    .. method:: __call__(table, weightID, distanceID)
255   
256        Constructs an instance of FindNearest that would return neighbors of
257        a given instance, obeying weight_ID when counting them (also, some
258        measures of distance might consider weights as well) and storing the
259        distances in a meta attribute with ID distance_ID.
260   
261        :param table: table of instances
262        :type table: Orange.data.Table
263       
264        :param weight_ID: id of meta attribute with weights of instances
265        :type weight_ID: int
266       
267        :param distance_ID: id of meta attribute that will save distances
268        :type distance_ID: int
269       
270        :rtype: :class:`FindNearest`
271
272Examples
273--------
274
275The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`, uses :download:`lenses.tab <code/lenses.tab>`)
276shows how to find the five nearest neighbors of the first instance
277in the lenses dataset.
278
279.. literalinclude:: code/knnInstanceDistance.py
280
281"""
282
283from Orange.core import \
284            kNNLearner, \
285            FindNearest_BruteForce as FindNearest, \
286            FindNearestConstructor_BruteForce as FindNearestConstructor, \
287            kNNClassifier, \
288            P2NN
289            #FindNearest
290            #FindNearestConstructor
Note: See TracBrowser for help on using the repository browser.