Changeset 7295:52400425e0dc in orange


Ignore:
Timestamp:
02/03/11 11:07:03 (3 years ago)
Author:
mocnik <mocnik@…>
Branch:
default
Convert:
3a1863febfe2e38b8270feb6daf02a38ef5a5f80
Message:

-adding code examples for knn module
-adding code and documentation for knn module

Location:
orange
Files:
5 added
1 edited

Legend:

Unmodified
Added
Removed
  • orange/Orange/classification/knn.py

    r7191 r7295  
    1 from orange import \ 
    2               kNNLearner, \ 
    3          FindNearest, \ 
    4               FindNearest_BruteForce, \ 
    5          FindNearestConstructor, \ 
    6               FindNearestConstructor_BruteForce, \ 
    7          kNNClassifier, \ 
    8          P2NN 
     1""" 
     2.. index: knn 
     3 
     4Module :mod:`orange.classification.knn` includes classes for classification based on nearest neighbours 
     5algorithm and also classes for finding instances near the given one.  
     6 
     7==================== 
     8k Nearest Neighbours 
     9==================== 
     10 
     11kNN is one of the simplest learning techniques - the learner only needs to  
     12store the instances, while the classifier does its work by observing the most  
     13similar instances of the instance to be classified. 
     14 
     15.. class:: kNNLearner(k, distanceConstructor, weightID) 
     16 
     17    :param instances: table of instances 
     18    :type instances: Orange.data.Table 
     19     
     20    :param k: number of nearest neighbours used in classification 
     21    :type k: int 
     22     
     23    :param weightID: id of meta attribute with instance weights 
     24    :type weightID: int 
     25     
     26    :rtype: :class:`kNNLearner` 
     27     
     28    .. method:: __call__(instances) 
     29         
     30        Return learned kNNClassifier 
     31         
     32        :param instances: table of instances 
     33        :type instances: Orange.data.Table 
     34         
     35        :rtype: :class:`kNNClassifier` 
     36 
     37 
     38    .. attribute:: k 
     39     
     40    Number of neighbours. If set to 0 (which is also the default value), the  
     41    square root of the number of instances is used. 
     42     
     43    .. attribute:: rankWeight 
     44     
     45    Enables weighting by ranks (default: :obj:`true`) 
     46     
     47    .. attribute:: distanceConstructor 
     48     
     49    A component that constructs the object for measuring distances between  
     50    instances. 
     51 
     52kNNLearner first constructs an object for measuring distances between  
     53instances. distanceConstructor is used if given; otherwise, Euclidean  
     54metrics will be used. :class:`kNNLearner` then constructs an instance of  
     55:class:`FindNearest_BruteForce`. Together with ID of meta feature with  
     56weights of instances, :obj:`k` and :obj:`rankWeight`, it is passed to a :class:`kNNClassifier`. 
     57 
     58.. class:: kNNClassifier(domain, weightID, k, FindNearest, rankWeight, nExamples) 
     59 
     60    .. method:: __call__(instance) 
     61     
     62        :param instance: given instance to be classified 
     63        :type instance: Orange.data.Instance 
     64         
     65        :param return_type: return value and probabilities, only value or only probabilities 
     66        :type return_type: Orange.classifier.getBoth, Orange.classifier.getValue, Orange.classifier.getProbilities 
     67         
     68        :rtype: :class:`Orange.data.Value`, :class:`Orange.statistics.distribution`, or a tuple with both 
     69         
     70    .. method:: findNearest(instance) 
     71     
     72    A component that finds nearest neighbours of a given instance. 
     73         
     74    :param instance: given instance 
     75    :type instance: Orange.data.instance 
     76         
     77    :rtype: :class:`Orange.data.instance` 
     78     
     79     
     80    .. attribute:: k 
     81     
     82    Number of neighbours. If set to 0 (which is also the default value),  
     83    the square root of the number of examples is used. 
     84     
     85    .. attribute:: rankWeight 
     86     
     87    Enables weighting by ranks (default: :obj:`true`). 
     88     
     89    .. attribute:: weightID 
     90     
     91    ID of meta attribute with weights of examples 
     92     
     93    .. attribute:: nExamples 
     94     
     95    The number of learning instances. It is used to compute the number of  
     96    neighbours if :obj:`k` is zero. 
     97 
     98When called to classify an instance, the classifier first calls  
     99:meth:`kNNClassifier.findNearest`  
     100to retrieve a list with :obj:`k` nearest neighbours. The component  
     101:meth:`kNNClassifier.findNearest` has  
     102a stored table of instances (those that have been passed to the learner)  
     103together with their weights. If instances are weighted (non-zero  
     104:obj:`weightID`), weights are considered when counting the neighbours. 
     105 
     106If :meth:`kNNClassifier.findNearest` returns only one neighbour  
     107(this is the case if k=1), :class:`kNNClassifier` returns the neighbour's 
     108class. 
     109 
     110Otherwise, the retrieved neighbours vote about the class prediction 
     111(or probability of classes). Voting has double weights. As first, if 
     112instances are weighted, their weights are respected. Secondly, nearer 
     113neighbours have greater impact on the prediction; weight of instance 
     114is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends 
     115on the setting of :obj:`rankWeight`. 
     116 
     117* if :obj:`rankWeight` is false, t is a distance from the instance being 
     118  classified 
     119* if :obj:`rankWeight` is true, neighbours are ordered and t is the position 
     120  of the neighbour on the list (a rank) 
     121 
     122 
     123In both cases, s is chosen so that the impact of the farthest instance is 
     1240.001. 
     125 
     126Weighting gives the classifier certain insensitivity to the number of 
     127neighbours used, making it possible to use large k's. 
     128 
     129The classifier can treat continuous and discrete features, and can even 
     130distinguish between ordinal and nominal features. See information on 
     131distance measuring for details. 
     132 
     133Examples 
     134======== 
     135 
     136We will test the learner on 'iris' dataset. We shall split it onto train 
     137(80%) and test (20%) sets, learn on training instances and test on five 
     138randomly selected test instances. 
     139 
     140part of (`knnlearner.py`_, uses `iris.tab`_) 
     141 
     142.. literalinclude:: code/knnExample1.py 
     143 
     144The output of this code is::  
     145     
     146    Iris-setosa Iris-setosa 
     147    Iris-versicolor Iris-versicolor 
     148    Iris-versicolor Iris-versicolor 
     149    Iris-setosa Iris-setosa 
     150    Iris-setosa Iris-setosa 
     151 
     152The secret of kNN's success is that the instances in iris dataset appear in 
     153three well separated clusters. The classifier's accuraccy will remain 
     154excellent even with very large or small number of neighbours. 
     155 
     156As many experiments have shown, a selection of instances distance measure 
     157does not have a greater and predictable effect on the performance of kNN 
     158classifiers. So there is not much point in changing the default. If you 
     159decide to do so, you need to set the distanceConstructor to an instance 
     160of one of the classes for distance measuring. 
     161 
     162.. literalinclude:: code/knnExample2.py 
     163 
     164The output of this code is:: 
     165 
     166    Iris-virginica Iris-versicolor 
     167    Iris-setosa Iris-setosa 
     168    Iris-versicolor Iris-versicolor 
     169    Iris-setosa Iris-setosa 
     170    Iris-setosa Iris-setosa 
     171 
     172The result is still perfect. 
     173 
     174.. _iris.tab: code/iris.tab 
     175.. _knnlearner.py: code/knnlearner.py 
     176 
     177.. index: fnn 
     178 
     179========================== 
     180Finding Nearest Neighbours 
     181========================== 
     182 
     183Orange provides classes for finding the nearest neighbours of the given 
     184reference instance. While we might add some smarter classes in future, we 
     185now have only two - abstract classes that defines the general behaviour of 
     186neighbour searching classes, and classes that implement brute force search. 
     187 
     188As usually in Orange, there is a pair of classes: a class that does the work 
     189(:class:`FindNearest`) and a class that constructs it ("learning" - getting the 
     190instances and arranging them in an appropriate data structure that allows for 
     191searching) (:class:`FindNearestConstructor`). 
     192 
     193.. class:: FindNearest 
     194 
     195    A class for brute force search for nearest neighbours. It stores a table of 
     196    instances (it's its own copy of instances, not only Orange.data.Table with 
     197    references to another Orange.data.Table). When asked for neighbours, it 
     198    measures distances to all instances, stores them in a heap and returns the 
     199    first k as an Orange.data.Table with references to instances stored in 
     200    FindNearest's field instances). 
     201     
     202    .. attribute:: distance 
     203     
     204    a component that measures distance between examples 
     205     
     206    .. attribute:: examples 
     207     
     208    a stored list of instances 
     209     
     210    .. attribute:: weightID 
     211     
     212    ID of meta attribute with weight 
     213     
     214    .. method:: __call__(instance, n) 
     215     
     216    :param instance: given instance 
     217    :type instance: Orange.data.instance 
     218     
     219    :param n: number of neighbours 
     220    :type n: int 
     221     
     222    :rtype: list(Orange.data.instance) 
     223     
     224.. class:: FindNearestConstructor() 
     225 
     226    A class that constructs FindNearest. It calls the inherited  
     227    distanceConstructor and then passes the constructed distance measure, 
     228    among with instances, weightIDand distanceID, to the just constructed 
     229    instance of FindNearest_BruteForce. 
     230     
     231    If there are more instances with the same distance fighting for the last 
     232    places, the tie is resolved by randomly picking the appropriate number of 
     233    instances. A local random generator is constructed and initiated by a constant 
     234    computed from the reference instance. The effect of this is that same random 
     235    neighbours will be chosen for the instance each time FindNearest_BruteForce 
     236    is called. 
     237     
     238    .. attribute:: distanceConstructor 
     239     
     240    A component of class ExamplesDistanceConstructor that "learns" to measure 
     241    distances between instances. Learning can be, for instances, storing the 
     242    ranges of continuous features or the number of value of a discrete feature 
     243    (see the page about measuring distances for more information). The result of 
     244    learning is an instance of ExamplesDistance that should be used for measuring 
     245    distances between instances. 
     246     
     247    .. attribute:: includeSame 
     248     
     249    Tells whether to include the examples that are same as the reference; default is true. 
     250     
     251    .. method:: __call__(table, weightID, distanceID) 
     252     
     253        Constructs an instance of FindNearest that would return neighbours of a 
     254        given instance, obeying weightID when counting them (also, some measures 
     255        of distance might consider weights as well) and store the distances in a 
     256        meta attribute with ID distanceID. 
     257     
     258        :param table: table of instances 
     259        :type table: Orange.data.Table 
     260         
     261        :param weightID: id of meta attribute with weights of instances 
     262        :type weightID: int 
     263         
     264        :param distanceID: id of meta attribute that will save distances 
     265        :type distanceID: int 
     266         
     267        :rtype: :class:`FindNearest` 
     268 
     269Example 
     270======= 
     271 
     272The following script shows how to find the five nearest neighbours of the 
     273first instance in the lenses dataset. 
     274 
     275(`knnInstanceDistance.py`_, uses `lenses.tab`_) 
     276 
     277.. literalinclude:: code/knnInstanceDistance.py 
     278 
     279.. _lenses.tab: code/lenses.tab 
     280.. _knnInstanceDistance.py: code/knnInstanceDistance.py 
     281 
     282""" 
     283 
     284from Orange.core import \ 
     285            kNNLearner, \ 
     286            FindNearest_BruteForce as FindNearest, \ 
     287            FindNearestConstructor_BruteForce as FindNearestConstructor, \ 
     288            kNNClassifier, \ 
     289            P2NN 
     290            #FindNearest 
     291            #FindNearestConstructor 
Note: See TracChangeset for help on using the changeset viewer.