source: orange/orange/Orange/classification/knn.py @ 7775:10dc848dc2e5

1"""
2.. index: k-nearest neighbors (kNN)
3.. index:
4   single: classification; k-nearest neighbors (kNN)
5   
6*******************
7k-nearest neighbors
8*******************
9
The module includes an implementation of the `nearest neighbors
algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ and classes
for finding the nearest instances according to a chosen distance metric.

k-nearest neighbor algorithm
============================

The nearest neighbors algorithm is one of the most basic,
`lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms.
The learner only needs to store the training data instances, while the classifier
does all the work by searching this list for the instances most similar to
the data instance being classified:

.. literalinclude:: code/knnExample0.py

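The included script is kept with the documentation; a minimal sketch of the
same idea, assuming the bundled `iris.tab`_ data set and that the learner
accepts the data and ``k`` directly, might look like::

    import Orange

    # load the data and build a kNN classifier in a single call
    data = Orange.data.Table("iris")
    classifier = Orange.classification.knn.kNNLearner(data, k=5)

    # the learner only stored the instances; the classifier searches them
    # for the most similar ones when asked to classify
    print classifier(data[0])
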
.. class:: kNNLearner(k, distanceConstructor, weightID)

    :param instances: table of instances
    :type instances: Orange.data.Table

    :param k: number of nearest neighbours used in classification
    :type k: int

    :param weightID: id of meta attribute with instance weights
    :type weightID: int

    :rtype: :class:`kNNLearner`

    .. method:: __call__(instances)

        Return an instance of :class:`kNNClassifier` that learns from the
        :obj:`instances`.

        :param instances: table of instances
        :type instances: Orange.data.Table

        :rtype: :class:`kNNClassifier`


    .. attribute:: k

        Number of neighbors. If set to 0 (which is also the default value),
        the square root of the number of instances is used.

    .. attribute:: rank_weight

        Enables weighting by ranks (default: :obj:`true`).

    .. attribute:: distance_constructor

        Component that constructs the object for measuring distances between
        instances.

:class:`kNNLearner` first constructs an object for measuring distances between
instances. :obj:`distance_constructor` is used if given; otherwise, the Euclidean
metric is used. :class:`kNNLearner` then constructs an instance of
:class:`FindNearest_BruteForce`. Together with the ID of the meta feature holding
instance weights, :attr:`kNNLearner.k` and :attr:`kNNLearner.rank_weight`,
it is passed to a :class:`kNNClassifier`.

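For illustration, a hedged sketch of configuring the learner before it is
called; the :obj:`Orange.core.ExamplesDistanceConstructor_Hamming` class used
here is an assumption, and the available distance constructors may differ
between versions::

    import Orange

    data = Orange.data.Table("iris")

    knn = Orange.classification.knn.kNNLearner()
    knn.k = 10                   # use ten neighbours
    knn.rank_weight = False      # weight neighbours by distance, not by rank
    # replace the default Euclidean metric (assumed class name)
    knn.distance_constructor = Orange.core.ExamplesDistanceConstructor_Hamming()

    classifier = knn(data)       # constructs a kNNClassifier
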
.. class:: kNNClassifier(domain, weightID, k, FindNearest, rankWeight, nExamples)

    .. method:: __call__(instance, return_type)

        :param instance: given instance to be classified
        :type instance: Orange.data.Instance

        :param return_type: return value and probabilities, only value or only
                            probabilities
        :type return_type: Orange.classification.Classifier.GetBoth,
                           Orange.classification.Classifier.GetValue,
                           Orange.classification.Classifier.GetProbabilities

        :rtype: :class:`Orange.data.Value`,
                :class:`Orange.statistics.distribution`, or a tuple with both

    .. method:: find_nearest(instance)

        A component that finds the nearest neighbours of a given instance.

        :param instance: given instance
        :type instance: Orange.data.Instance

        :rtype: list of :obj:`Orange.data.Instance`


    .. attribute:: k

        Number of neighbors. If set to 0 (which is also the default value),
        the square root of the number of instances is used.

    .. attribute:: rank_weight

        Enables weighting by ranks (default: :obj:`true`).

    .. attribute:: weight_ID

        ID of the meta attribute with instance weights.

    .. attribute:: n_examples

        The number of learning instances. It is used to compute the number of
        neighbours if :attr:`kNNClassifier.k` is zero.

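A classifier built by :class:`kNNLearner` can return the predicted value,
the class distribution, or both. A brief sketch, assuming ``classifier`` and
``data`` from the sketches above::

    # value only (the default), probabilities only, or both
    value = classifier(data[0])
    probabilities = classifier(data[0],
        Orange.classification.Classifier.GetProbabilities)
    value, probabilities = classifier(data[0],
        Orange.classification.Classifier.GetBoth)

    print value, probabilities
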
When called to classify an instance, the classifier first calls
:meth:`kNNClassifier.find_nearest`
to retrieve a list with :attr:`kNNClassifier.k` nearest neighbors. The
component :meth:`kNNClassifier.find_nearest` has
a stored table of instances (those that have been passed to the learner)
together with their weights. If instances are weighted (non-zero
:obj:`weight_ID`), weights are considered when counting the neighbors.

If :meth:`kNNClassifier.find_nearest` returns only one neighbor
(this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
neighbour's class.

Otherwise, the retrieved neighbours vote on the class prediction
(or the probability of classes). Votes are weighted in two ways. First, if
instances are weighted, their weights are respected. Second, nearer
neighbours have a greater impact on the prediction; the weight of an instance
is computed as exp(-t\ :sup:`2`/s\ :sup:`2`), where the meaning of t depends
on the setting of :obj:`rank_weight`:

* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the
  instance being classified
* if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t`
  is the position of the neighbor on the list (a rank)

In both cases, :obj:`s` is chosen so that the impact of the farthest instance
is 0.001.
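
As a small numeric illustration (a sketch, not code from the module), the
weights for a list of neighbour distances could be computed as follows::

    import math

    # distances (or ranks) of the retrieved neighbours, nearest first
    t = [0.5, 1.0, 1.5, 2.0]

    # choose s so that the farthest neighbour gets weight 0.001:
    # exp(-max(t)**2 / s**2) = 0.001  =>  s**2 = max(t)**2 / ln(1000)
    s2 = max(t) ** 2 / math.log(1000)

    weights = [math.exp(-ti ** 2 / s2) for ti in t]
    # the nearest neighbour dominates; the farthest contributes exactly 0.001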

Weighting gives the classifier a certain insensitivity to the number of
neighbors used, making it possible to use large values of :obj:`k`.

The classifier can handle continuous and discrete features, and can even
distinguish between ordinal and nominal features. See the information on
distance measuring for details.

Examples
--------

We will test the learner on the 'iris' data set. We split it into training
(80%) and test (20%) sets, learn on the training instances and test on five
randomly selected test instances, as in the first part of
(`knnlearner.py`_, uses `iris.tab`_):

.. literalinclude:: code/knnExample1.py

The output of this code is::

    Iris-setosa Iris-setosa
    Iris-versicolor Iris-versicolor
    Iris-versicolor Iris-versicolor
    Iris-setosa Iris-setosa
    Iris-setosa Iris-setosa

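The script itself is included from the documentation; a rough stand-in using
plain Python sampling (the split below is illustrative and not the actual
`knnlearner.py`_ code) might read::

    import random
    import Orange

    data = Orange.data.Table("iris")
    indices = range(len(data))
    random.shuffle(indices)
    cut = int(0.8 * len(data))

    # build training (80%) and testing (20%) tables over the same domain
    train = Orange.data.Table(data.domain, [data[i] for i in indices[:cut]])
    test = Orange.data.Table(data.domain, [data[i] for i in indices[cut:]])

    classifier = Orange.classification.knn.kNNLearner(train, k=10)
    for instance in test[:5]:
        print instance.getclass(), classifier(instance)
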
The secret of kNN's success is that the instances in the iris data set appear in
three well-separated clusters. The classifier's accuracy will remain
excellent even with a very large or very small number of neighbors.

As many experiments have shown, the choice of the instance distance measure
does not have a large or predictable effect on the performance of kNN
classifiers. So there is not much point in changing the default. If you
decide to do so, you need to set :obj:`distance_constructor` to an instance
of one of the classes for distance measuring. This can be seen in the following
part of (`knnlearner.py`_, uses `iris.tab`_):

.. literalinclude:: code/knnExample2.py

The output of this code is::

    Iris-virginica Iris-versicolor
    Iris-setosa Iris-setosa
    Iris-versicolor Iris-versicolor
    Iris-setosa Iris-setosa
    Iris-setosa Iris-setosa

The accuracy remains high, with only the first of the five test instances misclassified.

.. _iris.tab: code/iris.tab
.. _knnlearner.py: code/knnlearner.py

.. index: fnn


Finding nearest neighbors
=========================

Orange provides classes for finding the nearest neighbors of a given
reference instance. While smarter classes may be added in the future, there are
currently only two kinds: abstract classes that define the general behavior of
neighbor-searching classes, and classes that implement brute force search.

As usual in Orange, there is a pair of classes: a class that does the work
(:class:`FindNearest`) and a class that constructs it (:class:`FindNearestConstructor`);
here "learning" means getting the instances and arranging them in an appropriate
data structure that allows for searching.

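A brief sketch of this pattern, assuming the lenses data set and the
:obj:`Orange.core.newmetaid` helper for allocating a meta attribute id
(a complete script is included in the Examples section below)::

    import Orange

    data = Orange.data.Table("lenses")

    # "learning": build a FindNearest object over the stored instances
    constructor = Orange.classification.knn.FindNearestConstructor()
    distance_id = Orange.core.newmetaid()   # meta id for storing distances
    find_nearest = constructor(data, 0, distance_id)

    # "work": ask for the five nearest neighbours of the first instance;
    # each neighbour carries its distance in the distance_id meta attribute
    for neighbour in find_nearest(data[0], 5):
        print neighbour, neighbour[distance_id]
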
.. class:: FindNearest

    A class for brute force search for nearest neighbours. It stores a table
    of instances (its own copy of the instances, not merely an Orange.data.Table
    with references to another Orange.data.Table). When asked for neighbours,
    it measures the distances to all instances, stores them in a heap and returns
    the first k as an Orange.data.Table with references to the instances stored
    in FindNearest's :obj:`examples` field.

    .. attribute:: distance

        A component that measures the distance between instances.

    .. attribute:: examples

        A stored list of instances.

    .. attribute:: weight_ID

        ID of the meta attribute with instance weights.

    .. method:: __call__(instance, n)

        :param instance: given instance
        :type instance: Orange.data.Instance

        :param n: number of neighbours
        :type n: int

        :rtype: list of :obj:`Orange.data.Instance`

.. class:: FindNearestConstructor()

    A class that constructs FindNearest. It calls the inherited
    distance_constructor and then passes the constructed distance measure,
    along with the instances, weight_ID and distance_ID, to the newly constructed
    instance of FindNearest_BruteForce.

    If several instances at the same distance compete for the last
    places, the tie is resolved by randomly picking the appropriate number of
    instances. A local random generator is constructed and seeded by a
    constant computed from the reference instance. The effect of this is that the
    same random neighbours will be chosen for the instance each time
    FindNearest_BruteForce
    is called.

    .. attribute:: distance_constructor

        A component of class ExamplesDistanceConstructor that "learns" to
        measure distances between instances. Learning can mean, for instance,
        storing the ranges of continuous features or the number of values of
        a discrete feature (see the page about measuring distances for more
        information). The result of learning is an instance of
        ExamplesDistance that should be used for measuring distances
        between instances.

    .. attribute:: include_same

        Tells whether to include the instances that are the same as the reference;
        the default is true.

    .. method:: __call__(table, weightID, distanceID)

        Constructs an instance of FindNearest that returns neighbours of
        a given instance, obeying weight_ID when counting them (some
        distance measures might consider weights as well) and storing the
        distances in a meta attribute with ID distance_ID.

        :param table: table of instances
        :type table: Orange.data.Table

        :param weight_ID: id of meta attribute with weights of instances
        :type weight_ID: int

        :param distance_ID: id of meta attribute that will save distances
        :type distance_ID: int

        :rtype: :class:`FindNearest`

Examples
--------

The following script (`knnInstanceDistance.py`_, uses `lenses.tab`_)
shows how to find the five nearest neighbours of the first instance
in the lenses dataset.

.. literalinclude:: code/knnInstanceDistance.py

.. _lenses.tab: code/lenses.tab
.. _knnInstanceDistance.py: code/knnInstanceDistance.py

"""

# Re-export the implementations from Orange.core; the brute force neighbour
# searchers are exposed here under shorter names.
from Orange.core import \
            kNNLearner, \
            FindNearest_BruteForce as FindNearest, \
            FindNearestConstructor_BruteForce as FindNearestConstructor, \
            kNNClassifier, \
            P2NN
            #FindNearest
            #FindNearestConstructor