.. _k-Nearest Neighbours:

k-Nearest Neighbours Learner
============================

.. image:: ../icons/k-NearestNeighbours.png

k-Nearest Neighbours (kNN) learner

Signals
-------

Inputs:

   - Examples (ExampleTable)
      A table with training examples

Outputs:

   - Learner
      The kNN learning algorithm with settings as specified in the dialog.

   - KNN Classifier
      Trained classifier (a subtype of Classifier)

Signal :code:`KNN Classifier` sends data only if the learning data (signal :code:`Examples`) is present.

Description
-----------

This widget provides a graphical interface to the k-Nearest Neighbours classifier.

Like all widgets for classification, it provides a learner and a classifier on its output. The learner is a learning algorithm with settings as specified by the user; it can be fed into widgets for testing learners, for instance :code:`Test Learners`. The classifier is a kNN classifier (a subtype of the general classifier), built from the training examples on the input. If no examples are given, there is no classifier on the output.
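
The same learner/classifier pair can also be constructed programmatically. A minimal sketch, assuming the Orange 2.x scripting API (:code:`Orange.classification.knn.kNNLearner`; these are the scripting library's names, not part of the widget itself)::

   import Orange

   data = Orange.data.Table("iris")                      # training examples
   learner = Orange.classification.knn.kNNLearner(k=10)  # the Learner output
   learner.name = "kNN"                                  # name shown in, e.g., Test Learners
   classifier = learner(data)                            # the KNN Classifier output
   print(classifier(data[0]))                            # classify a single example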

.. image:: images/k-NearestNeighbours.png
   :alt: k-Nearest Neighbours Widget

The learner can be given a name under which it will appear in, say, :code:`Test Learners`. The default name is "kNN".

You can then set the :obj:`Number of neighbours`. Neighbours are weighted by their proximity to the example being classified, so there is no harm in using ten or twenty examples as neighbours. Weights follow a Gaussian kernel, scaled so that the last neighbour has a weight of 0.001. If you check :obj:`Weighting by ranks, not distances`, the weighting formula uses the rank of the neighbour instead of its distance to the reference example.
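
The kernel can be illustrated with a short sketch (an illustration of the scheme described above with a hypothetical helper, not Orange's exact implementation): the decay parameter is chosen so that the most distant of the k neighbours ends up with weight 0.001::

   import math

   def gaussian_weights(distances):
       # Pick t so that exp(-t * d_max**2) == 0.001 for the
       # farthest neighbour; nearer neighbours weigh more.
       d_max = max(distances)
       t = math.log(1000.0) / (d_max ** 2) if d_max else 0.0
       return [math.exp(-t * d * d) for d in distances]

   print(gaussian_weights([0.5, 1.0, 2.0]))  # -> [~0.649, ~0.178, 0.001]

With :obj:`Weighting by ranks, not distances` checked, the same formula would be applied to the neighbours' ranks (1, 2, ..., k) instead of their distances.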

The :obj:`Metrics` you can use are Euclidean, Hamming (the number of attributes in which the two examples differ; not suitable for continuous attributes), Manhattan (the sum of absolute differences for all attributes) and Maximal (the maximal difference between attributes).
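
For concreteness, the four metrics on plain attribute vectors look roughly as follows (hypothetical helpers, not Orange's distance classes)::

   def euclidean(x, y):
       return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

   def hamming(x, y):
       return sum(a != b for a, b in zip(x, y))   # number of differing attributes

   def manhattan(x, y):
       return sum(abs(a - b) for a, b in zip(x, y))

   def maximal(x, y):
       return max(abs(a - b) for a, b in zip(x, y))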

If you check :obj:`Normalize continuous attributes`, their values will be divided by their span (on the training data). This ensures that all continuous attributes have equal impact, independent of their original scale.
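
Normalization amounts to the following sketch; note that the span comes from the training data and is then applied to any example::

   def span_scaler(training_column):
       span = max(training_column) - min(training_column)
       # Guard against constant attributes (zero span).
       return (lambda v: v / span) if span else (lambda v: 0.0)

   scale = span_scaler([1.0, 3.0, 5.0])  # span = 4.0
   print(scale(2.0))                     # 0.5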

If you use Euclidean distance, leave :obj:`Ignore unknown values` unchecked. The corresponding class for measuring distances will compute the distributions of attribute values and return statistically valid distance estimates.

If you use other metrics and have missing values in the data, imputation may be the best way to go, since the other measures have no special treatment of unknowns. If you do not impute, you can check :obj:`Ignore unknown values`, which treats all missing values as wildcards (so they are equivalent to any other attribute value). If you leave it unchecked, "don't cares" are treated as wildcards, while "don't knows" count as different from all values.
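
For the Hamming metric, for instance, the wildcard treatment could look like this (a hypothetical helper; :code:`None` stands for a missing value)::

   def hamming_ignore_unknowns(x, y):
       # A missing value matches anything, so it contributes 0.
       return sum(a != b for a, b in zip(x, y)
                  if a is not None and b is not None)

   print(hamming_ignore_unknowns([1, None, 3], [1, 2, 4]))  # -> 1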

When you change one or more settings, you need to push :obj:`Apply`, which puts the new learner on the output and, if training examples are given, constructs a new classifier and outputs it as well.


Examples
--------

This schema compares the results of k-Nearest Neighbours with the default classifier, which always predicts the majority class.

.. image:: images/Majority-Knn-SchemaLearner.png
   :alt: k-Nearest Neighbours Classifier