.. _k-Nearest Neighbours:

k-Nearest Neighbours Learner
============================

.. image:: ../../../../Orange/OrangeWidgets/Classify/icons/kNearestNeighbours.svg

k-Nearest Neighbours (kNN) learner

Signals
-------

Inputs:

   - Examples (ExampleTable)
      A table with training examples

Outputs:

   - Learner
      The kNN learning algorithm with settings as specified in the dialog.

   - KNN Classifier
      Trained classifier (a subtype of Classifier).

Signal :obj:`KNN Classifier` sends data only if the learning data (signal
:obj:`Examples`) is present.

Description
-----------

This widget provides a graphical interface to the k-Nearest Neighbours
classifier.

Like all classification widgets, it provides a learner and a classifier on
its output. The learner is a learning algorithm with the settings specified
by the user; it can be fed into widgets for testing learners, for instance
:ref:`Test Learners`. The classifier is a kNN Classifier (a subtype of a
general classifier), built from the training examples on the input. If no
examples are given, there is no classifier on the output.
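
The same learner/classifier pair is also available from scripting. A minimal
sketch, assuming the Orange 2.x API (:obj:`Orange.classification.knn.kNNLearner`)::

    import Orange

    data = Orange.data.Table("iris")

    # k is the number of neighbours, as in the widget's dialog
    learner = Orange.classification.knn.kNNLearner(k=10)
    learner.name = "kNN"

    classifier = learner(data)     # training examples -> kNN Classifier
    print(classifier(data[0]))     # classify the first training example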

.. image:: images/k-NearestNeighbours.png
   :alt: k-Nearest Neighbours Widget

The learner can be given a name under which it will appear in, say,
:ref:`Test Learners`. The default name is "kNN".

Then, you can set the :obj:`Number of neighbours`. Neighbours are weighted
by their proximity to the example being classified, so there's no harm in
using ten or twenty examples as neighbours. Weights use a Gaussian kernel,
scaled so that the last (most distant) neighbour has a weight of 0.001. If
you check :obj:`Weighting by ranks, not distances`, the weighting formula
uses the rank of the neighbour instead of its distance to the reference
example.
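
The scaling can be reconstructed from the stated requirement: choose sigma so
that the weight exp(-d_k^2 / sigma^2) of the k-th (last) neighbour equals
0.001, i.e. sigma^2 = d_k^2 / ln(1000). A minimal sketch of distance-based
weighting (an illustration, not the widget's exact code; for rank-based
weighting, substitute the ranks 1..k for the distances)::

    import math

    def gaussian_weights(distances):
        """Weights for neighbour distances sorted in ascending order;
        the last (most distant) neighbour gets weight 0.001."""
        d_k = distances[-1]
        sigma2 = d_k ** 2 / math.log(1000.0)   # exp(-d_k**2/sigma2) == 0.001
        return [math.exp(-d ** 2 / sigma2) for d in distances]

    print(gaussian_weights([0.5, 1.0, 2.0]))   # -> [0.649..., 0.177..., 0.001]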

The :obj:`Metrics` you can use are Euclidean, Hamming (the number of
attributes in which the two examples differ; not suitable for continuous
attributes), Manhattan (the sum of absolute differences over all attributes)
and Maximal (the largest absolute difference over all attributes).
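
A minimal sketch of the four metrics for two equal-length attribute vectors
(the widget's own distance classes also handle normalization and unknown
values)::

    def euclidean(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def manhattan(x, y):
        return sum(abs(a - b) for a, b in zip(x, y))

    def maximal(x, y):
        return max(abs(a - b) for a, b in zip(x, y))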

If you check :obj:`Normalize continuous attributes`, their values will be
divided by their span (on the training data). This ensures that all
continuous attributes have equal impact, independent of their original scale.
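
For instance, a sketch of such span normalization (an illustration; the
widget computes the spans internally from the training data)::

    def spans(rows):
        """Span (max - min) of each attribute, over the training rows."""
        return [max(c) - min(c) for c in zip(*rows)]

    def normalize(row, attribute_spans):
        # Guard against zero-span (constant) attributes.
        return [v / s if s else 0.0 for v, s in zip(row, attribute_spans)]

    train = [[1.0, 100.0], [3.0, 300.0], [2.0, 200.0]]
    s = spans(train)                   # [2.0, 200.0]
    print(normalize([3.0, 300.0], s))  # [1.5, 1.5]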

If you use Euclidean distance, leave :obj:`Ignore unknown values`
unchecked. The corresponding class for measuring distances will compute
the distributions of attribute values and return statistically valid
distance estimates.

If you use other metrics and have missing values in the data, imputation
may be the best option, since the other measures have no special treatment
of unknowns. If you do not impute, you can check
:obj:`Ignore unknown values`, which treats all missing values as wildcards
(so they are equivalent to any other attribute value). If you leave it
unchecked, "don't cares" are treated as wildcards, while "don't knows" are
treated as different from all values.
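
A toy illustration of the wildcard treatment in a Hamming-style count of
differing attributes (a simplification assuming a single kind of unknown,
marked ``None``; the widget itself distinguishes "don't care" from
"don't know")::

    def differs(a, b, unknown_is_wildcard):
        # A wildcard matches anything, so it never counts as a difference.
        if a is None or b is None:
            return 0 if unknown_is_wildcard else 1
        return int(a != b)

    def hamming(x, y, unknown_is_wildcard=True):
        return sum(differs(a, b, unknown_is_wildcard) for a, b in zip(x, y))

    x, y = ["red", None, "small"], ["red", "round", "large"]
    print(hamming(x, y, True))    # 1: the unknown matches "round"
    print(hamming(x, y, False))   # 2: the unknown differs from "round"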

When you change one or more settings, you need to push :obj:`Apply`, which
will put the new learner on the output and, if the training examples are
given, construct a new classifier and output it as well.

Examples
--------

This schema compares the results of k-Nearest Neighbours with a default
classifier, which always predicts the majority class.

.. image:: images/Majority-Knn-SchemaLearner.png
   :alt: k-Nearest Neighbours Classifier