Changeset 7926:8695214d4fe0 in orange


Timestamp: 05/22/11 11:15:21 (3 years ago)
Author:    janezd <janez.demsar@…>
Branch:    default
Convert:   40135c93164a4d8e1a3fe9b298ea8721d16832f6
Message:   Josh's corrections of documentation
File:      1 edited

  • orange/Orange/classification/knn.py

--- orange/Orange/classification/knn.py (r7775)
+++ orange/Orange/classification/knn.py (r7926)
 *******************
 
-The module includes implementation of `nearest neighbors 
+The module includes implementation of the `nearest neighbors 
 algorithm <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_ and classes
-for finding nearest instances according to chosen distance metrics.
+for finding the nearest instances according to chosen distance metrics.
 
 k-nearest neighbor algorithm
 ============================
 
-Nearest neighbors algorithm is one of most basic, 
+The nearest neighbors algorithm is one of the most basic, 
 `lazy <http://en.wikipedia.org/wiki/Lazy_learning>`_ machine learning algorithms.
-The learner only needs to store the training data instances, while the classifier
-does all the work by searching this list for the most similar instances for 
+The learner only needs to store the instances of training data, while the classifier
+does all the work by searching this list for the instances most similar to 
 the data instance being classified:
 
     
     :type instances: Orange.data.Table
     
-    :param k: number of nearest neighbours used in classification
+    :param k: number of nearest neighbors used in classification
     :type k: int
     
     
 instances. distance_constructor is used if given; otherwise, Euclidean 
 metrics will be used. :class:`kNNLearner` then constructs an instance of 
-:class:`FindNearest_BruteForce`. Together with ID of meta feature with 
+:class:`FindNearest_BruteForce`. Together with the ID of the meta feature with 
 weights of instances, :attr:`kNNLearner.k` and :attr:`kNNLearner.rank_weight`,
 it is passed to a :class:`kNNClassifier`.
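
A minimal usage sketch of the flow described in this hunk, using only names documented in this module (the module path follows the file being edited); treat it as an illustration of the Orange 2.x calling convention rather than code from the changeset::

    # Sketch only: builds a kNNClassifier through kNNLearner as described above.
    import Orange

    data = Orange.data.Table("iris")                     # training instances
    learner = Orange.classification.knn.kNNLearner(k=5)  # k is the documented parameter
    classifier = learner(data)                           # a kNNClassifier
    print(classifier(data[0]))                           # class prediction for the first instance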
     
     .. method:: find_nearest(instance)
     
-    A component that finds nearest neighbors of a given instance.
+    A component which finds the nearest neighbors of a given instance.
         
     :param instance: given instance
     
     .. attribute:: rank_weight
     
-        Enables weighting by ranks (default: :obj:`true`).
+        Enables weighting by rank (default: :obj:`true`).
     
     .. attribute:: weight_ID
     
     
         The number of learning instances. It is used to compute the number of 
-        neighbours if :attr:`kNNClassifier.k` is zero.
+        neighbors if the value of :attr:`kNNClassifier.k` is zero.
 
 When called to classify an instance, the classifier first calls 
     
 If :meth:`kNNClassifier.find_nearest` returns only one neighbor 
 (this is the case if :obj:`k=1`), :class:`kNNClassifier` returns the
-neighbour's class.
-
-Otherwise, the retrieved neighbours vote about the class prediction
+neighbor's class.
+
+Otherwise, the retrieved neighbors vote about the class prediction
 (or probability of classes). Voting has double weights. As first, if
 instances are weighted, their weights are respected. Secondly, nearer
-neighbours have greater impact on the prediction; weight of instance
+neighbors have a greater impact on the prediction; the weight of instance
 is computed as exp(-t:sup:`2`/s:sup:`2`), where the meaning of t depends
 on the setting of :obj:`rank_weight`.
 
-* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is a distance from the
+* if :obj:`rank_weight` is :obj:`false`, :obj:`t` is the distance from the
   instance being classified
 * if :obj:`rank_weight` is :obj:`true`, neighbors are ordered and :obj:`t`
     
 is 0.001.
 
-Weighting gives the classifier certain insensitivity to the number of
+Weighting gives the classifier a certain insensitivity to the number of
 neighbors used, making it possible to use large :obj:`k`'s.
 
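The weighting can be read directly as a formula. A small plain-Python illustration (not Orange code); s stands for the scaling constant whose definition falls in the lines omitted from this excerpt::

    import math

    def neighbour_weight(t, s):
        # w = exp(-t^2 / s^2): t is the distance to the reference instance when
        # rank_weight is false, or the neighbour's rank when rank_weight is true;
        # s is the scaling constant defined in the lines not shown here.
        return math.exp(-t ** 2 / s ** 2)

    # With rank weighting, nearer neighbours (smaller rank) count more:
    print([round(neighbour_weight(rank, s=5.0), 3) for rank in range(1, 6)])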
     
 --------
 
-We will test the learner on 'iris' data set. We shall split it onto train
-(80%) and test (20%) sets, learn on training instances and test on five
-randomly selected test instances, in part of 
-(`knnlearner.py`_, uses `iris.tab`_):
+The learner will be tested on the 'iris' data set. The data will be split
+into training (80%) and testing (20%) instances. We will use the former for
+"training" the classifier and test it on five randomly selected testing
+instances, in this part of (`knnlearner.py`_, uses `iris.tab`_):
 
 .. literalinclude:: code/knnExample1.py
     
     Iris-setosa Iris-setosa
 
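The included script is not part of this changeset. A rough, hypothetical sketch of such a test, with the 80/20 split done in plain Python; the real knnExample1.py may use Orange's own sampling utilities and differ in detail::

    import random
    import Orange

    iris = Orange.data.Table("iris")
    instances = list(iris)
    random.shuffle(instances)
    cut = int(0.8 * len(instances))

    # Assumption: a Table can be built from a domain plus a list of instances.
    train = Orange.data.Table(iris.domain, instances[:cut])
    test = instances[cut:]

    classifier = Orange.classification.knn.kNNLearner(train, k=10)
    for instance in random.sample(test, 5):
        # actual class vs. prediction, as in the output above
        # (get_class may be spelled getclass in older builds)
        print("%-15s %s" % (instance.get_class(), classifier(instance)))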
-The secret of kNN's success is that the instances in iris data set appear in
+The secret to kNN's success is that the instances in the iris data set appear in
 three well separated clusters. The classifier's accuracy will remain
-excellent even with very large or small number of neighbors.
-
-As many experiments have shown, a selection of instances distance measure
-does not have a greater and predictable effect on the performance of kNN
-classifiers. So there is not much point in changing the default. If you
-decide to do so, you need to set the distance_constructor to an instance
+excellent even with a very large or very small number of neighbors.
+
+As many experiments have shown, the choice of distance measure has neither
+a large nor a predictable effect on the performance of kNN classifiers.
+Therefore there is not much point in changing the default. If you decide
+to do so, the distance_constructor must be set to an instance
 of one of the classes for distance measuring. This can be seen in the following
 part of (`knnlearner.py`_, uses `iris.tab`_):
     
 =========================
 
-Orange provides classes for finding the nearest neighbors of the given
-reference instance. While we might add some smarter classes in future, we
-now have only two - abstract classes that defines the general behavior of
+Orange provides classes for finding the nearest neighbors of a given
+reference instance. While we might add some smarter classes in the future, we
+now have only two - abstract classes that define the general behavior of
 neighbor searching classes, and classes that implement brute force search.
 
-As usually in Orange, there is a pair of classes: a class that does the work
+As is the norm in Orange, there is a pair of classes: a class that does the work
 (:class:`FindNearest`) and a class that constructs it ("learning" - getting the
 instances and arranging them in an appropriate data structure that allows for
     
 .. class:: FindNearest
 
-    A class for brute force search for nearest neighbours. It stores a table 
+    A class for a brute force search for nearest neighbors. It stores a table 
     of instances (it's its own copy of instances, not only Orange.data.Table
-    with references to another Orange.data.Table). When asked for neighbours,
+    with references to another Orange.data.Table). When asked for neighbors,
     it measures distances to all instances, stores them in a heap and returns 
     the first k as an Orange.data.Table with references to instances stored in
     
     .. attribute:: distance
     
-        a component that measures distance between examples
+        a component that measures the distance between examples
     
     .. attribute:: examples
     
     :type instance: Orange.data.Instance
     
-    :param n: number of neighbours
+    :param n: number of neighbors
     :type n: int
     
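The brute-force behaviour described for FindNearest (measure the distance to every stored instance, keep the results in a heap, return the first k) can be illustrated in plain Python, independently of Orange::

    import heapq

    def find_nearest_brute_force(instances, distance, reference, k):
        # Return the k stored instances closest to `reference` under `distance`.
        return heapq.nsmallest(k, instances, key=lambda inst: distance(reference, inst))

    # Example with points in the plane and Euclidean distance.
    euclidean = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    points = [(0, 0), (1, 1), (5, 5), (0.5, 0.2), (3, 4)]
    print(find_nearest_brute_force(points, euclidean, (0, 0), 3))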
     
 .. class:: FindNearestConstructor()
 
-    A class that constructs FindNearest. It calls the inherited 
-    distance_constructor and then passes the constructed distance measure,
-    among with instances, weight_ID and distance_ID, to the just constructed
-    instance of FindNearest_BruteForce.
-    
+    
+    A class that constructs FindNearest. It calls the inherited
+    distance_constructor, which constructs a distance measure.
+    The distance measure, along with the instances, weight_ID and
+    distance_ID, is then passed to the just constructed instance
+    of FindNearest_BruteForce.
+
     If there are more instances with the same distance fighting for the last
     places, the tie is resolved by randomly picking the appropriate number of
     instances. A local random generator is constructed and initiated by a
     constant computed from the reference instance. The effect of this is that
-    same random neighbours will be chosen for the instance each time
+    the same random neighbors will be chosen for the instance each time
     FindNearest_BruteForce
     is called.
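
The tie-breaking scheme described above (a local random generator seeded with a constant derived from the reference instance, so repeated queries agree) can likewise be sketched in plain Python, outside of Orange::

    import random

    def break_ties(tied_candidates, reference, n_needed):
        # Seed a local generator from a constant derived from the reference
        # instance, so the same reference always selects the same candidates.
        rng = random.Random(hash(tuple(reference)))
        return rng.sample(tied_candidates, n_needed)

    tied = ["a", "b", "c", "d"]
    assert break_ties(tied, (1.0, 2.0), 2) == break_ties(tied, (1.0, 2.0), 2)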
     
     
         A component of class ExamplesDistanceConstructor that "learns" to
-        measure distances between instances. Learning can be, for instances,
-        storing the ranges of continuous features or the number of value of
+        measure distances between instances. Learning can mean, for instance,
+        storing the ranges of continuous features or the number of values of
         a discrete feature (see the page about measuring distances for more
         information). The result of learning is an instance of 
     
     .. attribute:: include_same
     
-        Tells whether to include the examples that are same as the reference;
-        default is true.
+        Tells whether or not to include the examples that are the same as the reference;
+        the default is true.
     
     .. method:: __call__(table, weightID, distanceID)
     
-        Constructs an instance of FindNearest that would return neighbours of
+        Constructs an instance of FindNearest that would return neighbors of
         a given instance, obeying weight_ID when counting them (also, some 
-        measures of distance might consider weights as well) and store the 
+        measures of distance might consider weights as well) and storing the 
         distances in a meta attribute with ID distance_ID.
 
     
 
 The following script (`knnInstanceDistance.py`_, uses `lenses.tab`_) 
-shows how to find the five nearest neighbours of the first instance
+shows how to find the five nearest neighbors of the first instance
 in the lenses dataset.
 
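That script is not included in this changeset. A sketch assembled from the signatures documented above (FindNearestConstructor's __call__(table, weightID, distanceID) and FindNearest's (instance, n) call); the concrete constructor class name and the meta-attribute IDs are assumptions::

    import Orange

    lenses = Orange.data.Table("lenses")

    # Assumption: the concrete constructor follows the naming pattern described
    # above; a distance_constructor (an ExamplesDistanceConstructor) could be
    # assigned to it first, but the concrete class name is not shown in this file.
    constructor = Orange.classification.knn.FindNearestConstructor_BruteForce()

    WEIGHT_ID = 0      # assumption: 0 means "no instance weights"
    DISTANCE_ID = -42  # placeholder; a fresh meta-attribute ID would normally be allocated

    find_nearest = constructor(lenses, WEIGHT_ID, DISTANCE_ID)
    for neighbour in find_nearest(lenses[0], 5):
        print(neighbour)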