Changeset 7295:52400425e0dc in orange

02/03/11 11:07:03 (3 years ago)
mocnik <mocnik@…>

-adding code examples for knn module
-adding code and documentation for knn module

5 added
1 edited


  • orange/Orange/classification/

    r7191 r7295  
    1 from orange import \ 
    2               kNNLearner, \ 
    3          FindNearest, \ 
    4               FindNearest_BruteForce, \ 
    5          FindNearestConstructor, \ 
    6               FindNearestConstructor_BruteForce, \ 
    7          kNNClassifier, \ 
    8          P2NN 
     2.. index: knn 
     4Module :mod:`orange.classification.knn` includes classes for classification based on nearest neighbours 
     5algorithm and also classes for finding instances near the given one.  
     8k Nearest Neighbours 
     11kNN is one of the simplest learning techniques - the learner only needs to  
     12store the instances, while the classifier does its work by observing the most  
     13similar instances of the instance to be classified. 
     15.. class:: kNNLearner(k, distanceConstructor, weightID) 
     17    :param instances: table of instances 
     18    :type instances: 
     20    :param k: number of nearest neighbours used in classification 
     21    :type k: int 
     23    :param weightID: id of meta attribute with instance weights 
     24    :type weightID: int 
     26    :rtype: :class:`kNNLearner` 
     28    .. method:: __call__(instances) 
     30        Return learned kNNClassifier 
     32        :param instances: table of instances 
     33        :type instances: 
     35        :rtype: :class:`kNNClassifier` 
     38    .. attribute:: k 
     40    Number of neighbours. If set to 0 (which is also the default value), the  
     41    square root of the number of instances is used. 
     43    .. attribute:: rankWeight 
     45    Enables weighting by ranks (default: :obj:`true`) 
     47    .. attribute:: distanceConstructor 
     49    A component that constructs the object for measuring distances between  
     50    instances. 
     52kNNLearner first constructs an object for measuring distances between  
     53instances. distanceConstructor is used if given; otherwise, Euclidean  
     54metrics will be used. :class:`kNNLearner` then constructs an instance of  
     55:class:`FindNearest_BruteForce`. Together with ID of meta feature with  
     56weights of instances, :obj:`k` and :obj:`rankWeight`, it is passed to a :class:`kNNClassifier`. 
     58.. class:: kNNClassifier(domain, weightID, k, FindNearest, rankWeight, nExamples) 
     60    .. method:: __call__(instance) 
     62        :param instance: given instance to be classified 
     63        :type instance: 
     65        :param return_type: return value and probabilities, only value or only probabilities 
     66        :type return_type: Orange.classifier.getBoth, Orange.classifier.getValue, Orange.classifier.getProbilities 
     68        :rtype: :class:``, :class:`Orange.statistics.distribution`, or a tuple with both 
     70    .. method:: findNearest(instance) 
     72    A component that finds nearest neighbours of a given instance. 
     74    :param instance: given instance 
     75    :type instance: 
     77    :rtype: :class:`` 
     80    .. attribute:: k 
     82    Number of neighbours. If set to 0 (which is also the default value),  
     83    the square root of the number of examples is used. 
     85    .. attribute:: rankWeight 
     87    Enables weighting by ranks (default: :obj:`true`). 
     89    .. attribute:: weightID 
     91    ID of meta attribute with weights of examples 
     93    .. attribute:: nExamples 
     95    The number of learning instances. It is used to compute the number of  
     96    neighbours if :obj:`k` is zero. 
     98When called to classify an instance, the classifier first calls  
     100to retrieve a list with :obj:`k` nearest neighbours. The component  
     101:meth:`kNNClassifier.findNearest` has  
     102a stored table of instances (those that have been passed to the learner)  
     103together with their weights. If instances are weighted (non-zero  
     104:obj:`weightID`), weights are considered when counting the neighbours. 
     106If :meth:`kNNClassifier.findNearest` returns only one neighbour  
     107(this is the case if k=1), :class:`kNNClassifier` returns the neighbour's 
     110Otherwise, the retrieved neighbours vote about the class prediction 
     111(or probability of classes). Voting has double weights. As first, if 
     112instances are weighted, their weights are respected. Secondly, nearer 
     113neighbours have greater impact on the prediction; weight of instance 
     114is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends 
     115on the setting of :obj:`rankWeight`. 
     117* if :obj:`rankWeight` is false, t is a distance from the instance being 
     118  classified 
     119* if :obj:`rankWeight` is true, neighbours are ordered and t is the position 
     120  of the neighbour on the list (a rank) 
     123In both cases, s is chosen so that the impact of the farthest instance is 
     126Weighting gives the classifier certain insensitivity to the number of 
     127neighbours used, making it possible to use large k's. 
     129The classifier can treat continuous and discrete features, and can even 
     130distinguish between ordinal and nominal features. See information on 
     131distance measuring for details. 
     136We will test the learner on 'iris' dataset. We shall split it onto train 
     137(80%) and test (20%) sets, learn on training instances and test on five 
     138randomly selected test instances. 
     140part of (``_, uses ``_) 
     142.. literalinclude:: code/ 
     144The output of this code is::  
     146    Iris-setosa Iris-setosa 
     147    Iris-versicolor Iris-versicolor 
     148    Iris-versicolor Iris-versicolor 
     149    Iris-setosa Iris-setosa 
     150    Iris-setosa Iris-setosa 
     152The secret of kNN's success is that the instances in iris dataset appear in 
     153three well separated clusters. The classifier's accuraccy will remain 
     154excellent even with very large or small number of neighbours. 
     156As many experiments have shown, a selection of instances distance measure 
     157does not have a greater and predictable effect on the performance of kNN 
     158classifiers. So there is not much point in changing the default. If you 
     159decide to do so, you need to set the distanceConstructor to an instance 
     160of one of the classes for distance measuring. 
     162.. literalinclude:: code/ 
     164The output of this code is:: 
     166    Iris-virginica Iris-versicolor 
     167    Iris-setosa Iris-setosa 
     168    Iris-versicolor Iris-versicolor 
     169    Iris-setosa Iris-setosa 
     170    Iris-setosa Iris-setosa 
     172The result is still perfect. 
     174.. code/ 
     175.. code/ 
     177.. index: fnn 
     180Finding Nearest Neighbours 
     183Orange provides classes for finding the nearest neighbours of the given 
     184reference instance. While we might add some smarter classes in future, we 
     185now have only two - abstract classes that defines the general behaviour of 
     186neighbour searching classes, and classes that implement brute force search. 
     188As usually in Orange, there is a pair of classes: a class that does the work 
     189(:class:`FindNearest`) and a class that constructs it ("learning" - getting the 
     190instances and arranging them in an appropriate data structure that allows for 
     191searching) (:class:`FindNearestConstructor`). 
     193.. class:: FindNearest 
     195    A class for brute force search for nearest neighbours. It stores a table of 
     196    instances (it's its own copy of instances, not only with 
     197    references to another When asked for neighbours, it 
     198    measures distances to all instances, stores them in a heap and returns the 
     199    first k as an with references to instances stored in 
     200    FindNearest's field instances). 
     202    .. attribute:: distance 
     204    a component that measures distance between examples 
     206    .. attribute:: examples 
     208    a stored list of instances 
     210    .. attribute:: weightID 
     212    ID of meta attribute with weight 
     214    .. method:: __call__(instance, n) 
     216    :param instance: given instance 
     217    :type instance: 
     219    :param n: number of neighbours 
     220    :type n: int 
     222    :rtype: list( 
     224.. class:: FindNearestConstructor() 
     226    A class that constructs FindNearest. It calls the inherited  
     227    distanceConstructor and then passes the constructed distance measure, 
     228    among with instances, weightIDand distanceID, to the just constructed 
     229    instance of FindNearest_BruteForce. 
     231    If there are more instances with the same distance fighting for the last 
     232    places, the tie is resolved by randomly picking the appropriate number of 
     233    instances. A local random generator is constructed and initiated by a constant 
     234    computed from the reference instance. The effect of this is that same random 
     235    neighbours will be chosen for the instance each time FindNearest_BruteForce 
     236    is called. 
     238    .. attribute:: distanceConstructor 
     240    A component of class ExamplesDistanceConstructor that "learns" to measure 
     241    distances between instances. Learning can be, for instances, storing the 
     242    ranges of continuous features or the number of value of a discrete feature 
     243    (see the page about measuring distances for more information). The result of 
     244    learning is an instance of ExamplesDistance that should be used for measuring 
     245    distances between instances. 
     247    .. attribute:: includeSame 
     249    Tells whether to include the examples that are same as the reference; default is true. 
     251    .. method:: __call__(table, weightID, distanceID) 
     253        Constructs an instance of FindNearest that would return neighbours of a 
     254        given instance, obeying weightID when counting them (also, some measures 
     255        of distance might consider weights as well) and store the distances in a 
     256        meta attribute with ID distanceID. 
     258        :param table: table of instances 
     259        :type table: 
     261        :param weightID: id of meta attribute with weights of instances 
     262        :type weightID: int 
     264        :param distanceID: id of meta attribute that will save distances 
     265        :type distanceID: int 
     267        :rtype: :class:`FindNearest` 
     272The following script shows how to find the five nearest neighbours of the 
     273first instance in the lenses dataset. 
     275(``_, uses ``_) 
     277.. literalinclude:: code/ 
     279.. code/ 
     280.. code/ 
     284from Orange.core import \ 
     285            kNNLearner, \ 
     286            FindNearest_BruteForce as FindNearest, \ 
     287            FindNearestConstructor_BruteForce as FindNearestConstructor, \ 
     288            kNNClassifier, \ 
     289            P2NN 
     290            #FindNearest 
     291            #FindNearestConstructor 
Note: See TracChangeset for help on using the changeset viewer.