Timestamp:
02/05/12 23:04:46 (2 years ago)
Author:
anze <anze.staric@…>
Branch:
default
rebase_source:
cc88516d07302e9201ec67de38e99cb60ee1b6a6
Message:

Moved instance distance documentation to rst file.

File:
1 edited

  • orange/Orange/distance/instances.py

    r9125 r9639  
    1 """ 
    2  
    3 ########################### 
    4 Distances between Instances 
    5 ########################### 
    6  
    7 This page describes classes that implement different metrics for measuring
    8 distances (dissimilarities) between instances.
    9  
    10 Most (although not all) measures of distance between instances require
    11 some "learning" - adjusting the measure to the data. For instance, when
    12 the dataset contains continuous features, the distances between continuous
    13 values should be normalized, e.g. by dividing the distance by the range
    14 of possible values or by some interquartile distance, to ensure that all
    15 features have, in principle, similar impacts.
    16  
    17 Different measures of distance thus appear in pairs - a class that measures 
    18 the distance and a class that constructs it based on the data. The abstract 
    19 classes representing such a pair are `ExamplesDistance` and 
    20 `ExamplesDistanceConstructor`. 
    21  
    22 Since most measures work on normalized distances between corresponding 
    23 features, there is an abstract intermediate class 
    24 `ExamplesDistance_Normalized` that takes care of normalizing. 
    25 The remaining classes correspond to different ways of defining the distances, 
    26 such as Manhattan or Euclidean distance. 
    27  
    28 Unknown values are treated correctly only by Euclidean and Relief distance.
    29 For other measures of distance, the distance between an unknown and a known
    30 value, or between two unknown values, is always 0.5.
    31  
    32 .. class:: ExamplesDistance 
    33  
    34     .. method:: __call__(instance1, instance2) 
    35  
    36         Returns the distance between the given instances as a floating point number.
    37  
    38 .. class:: ExamplesDistanceConstructor 
    39  
    40     .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
    41  
    42         Constructs an instance of ExamplesDistance.
    43         Not all the data needs to be given. Most measures can be constructed
    44         from basic_var_stat; if it is not given, they can compute what they
    45         need from either instances or distributions.
    46         Some (e.g. ExamplesDistance_Hamming) do not need any arguments at all.
    47  
    48 .. class:: ExamplesDistance_Normalized 
    49  
    50     This abstract class provides a function which is given two instances 
    51     and returns a list of normalized distances between values of their 
    52     features. Many distance measuring classes need such a function and are 
    53     therefore derived from this class.
    54  
    55     .. attribute:: normalizers 
    56      
    57         A precomputed list of normalizing factors for feature values.
    58
    59         - If a factor is positive, differences in the feature's values
    60           are multiplied by it; for continuous features the factor
    61           is 1/(max_value-min_value) and for ordinal features
    62           the factor is 1/number_of_values. If either (or both) of the
    63           values is unknown, the distance is 0.5.
    64         - If a factor is -1, the feature is nominal; the distance
    65           between two values is 0 if they are the same (or at least
    66           one is unknown) and 1 if they are different.
    67         - If a factor is 0, the feature is ignored.
    68  
    69     .. attribute:: bases, averages, variances 
    70  
    71         The minimal values, averages and variances
    72         (continuous features only).
    73  
    74     .. attribute:: domainVersion 
    75  
    76         Stores a domain version for which the normalizers were computed. 
    77         The domain version is increased each time a domain description is 
    78         changed (i.e. features are added or removed); this is used for a quick 
    79         check that the user is not attempting to measure distances between 
    80         instances that do not correspond to normalizers. 
    81         Since domains are practically immutable (especially from Python),
    82         you don't need to care about this anyway.
    83  
    84     .. method:: attributeDistances(instance1, instance2) 
    85  
    86         Returns a list of floats representing distances between pairs of 
    87         feature values of the two instances. 
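The normalizer rules listed for ``ExamplesDistance_Normalized`` can be sketched in plain Python as follows. This is only an illustration, not the library's implementation: the function name ``attribute_distances``, the use of ``None`` for unknown values, and reporting an ignored feature as ``0.0`` are all assumptions of the sketch.

```python
def attribute_distances(values1, values2, normalizers):
    """Hypothetical stand-in for attributeDistances (not Orange's code).

    Unknown values are represented by None (an assumption of this sketch).
    """
    dists = []
    for v1, v2, factor in zip(values1, values2, normalizers):
        if factor == 0:
            # Factor 0: the feature is ignored (reported here as 0.0).
            dists.append(0.0)
        elif factor == -1:
            # Factor -1: nominal feature; 0 if equal or any value unknown.
            if v1 is None or v2 is None or v1 == v2:
                dists.append(0.0)
            else:
                dists.append(1.0)
        elif v1 is None or v2 is None:
            # Positive factor with an unknown value: fixed distance 0.5.
            dists.append(0.5)
        else:
            # Positive factor: scale the raw difference.
            dists.append(abs(v1 - v2) * factor)
    return dists


# One continuous feature with range 4, one nominal, one ignored.
example = attribute_distances([1.0, "a", 7], [3.0, "b", 3], [0.25, -1, 0])
```

With these inputs, the continuous feature contributes ``2.0 * 0.25``, the nominal feature contributes ``1.0`` because the values differ, and the third feature is ignored.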
    88  
    89  
    90 .. class:: Hamming, HammingConstructor 
    91  
    92     Hamming distance between two instances is defined as the number of 
    93     features in which the two instances differ. Note that this measure 
    94     is not really appropriate for instances that contain continuous features. 
    95  
    96  
    97 .. class:: Maximal, MaximalConstructor 
    98  
    99     The maximal distance between two instances is defined as the largest
    100     distance between corresponding feature values. If dist is the result of
    101     ExamplesDistance_Normalized.attributeDistances,
    102     then Maximal returns max(dist).
    103  
    104  
    105 .. class:: Manhattan, ManhattanConstructor 
    106  
    107     Manhattan distance between two instances is the sum of absolute values
    108     of distances between pairs of features, e.g. ``sum(abs(x) for x in dist)``,
    109     where dist is the result of ExamplesDistance_Normalized.attributeDistances.
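As a rough illustration of how a list of per-feature distances is combined into a single number, the sketch below implements the maximal, Manhattan and Euclidean aggregations in plain Python. The function names are hypothetical, ``dist`` is assumed to be a list of floats such as the one returned by ``attributeDistances``, and the Euclidean variant shown here ignores the special treatment of unknown values.

```python
from math import sqrt


def maximal(dist):
    # Largest per-feature distance.
    return max(dist)


def manhattan(dist):
    # Sum of absolute per-feature distances.
    return sum(abs(x) for x in dist)


def euclidean(dist):
    # Square root of the sum of squared per-feature distances.
    return sqrt(sum(x * x for x in dist))


dist = [0.0, 3.0, 4.0]
results = (maximal(dist), manhattan(dist), euclidean(dist))
```

For ``dist = [0.0, 3.0, 4.0]`` the three aggregations give 4.0, 7.0 and 5.0 respectively.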
    110  
    111 .. class:: Euclidean, EuclideanConstructor 
    112  
    113  
    114     Euclidean distance is the square root of the sum of squared per-feature distances,
    115     i.e. ``sqrt(sum(x*x for x in dist))``, where dist is the result of
    116     ExamplesDistance_Normalized.attributeDistances.
    117  
    118     .. attribute:: distributions
    119  
    120         An object of type 
    121         :obj:`Orange.statistics.distribution.Distribution` that holds 
    122         the distributions for all discrete features used for 
    123         computation of distances between known and unknown values. 
    124  
    125     .. attribute:: bothSpecialDist
    126  
    127         A list containing the distance between two unknown values for each 
    128         discrete feature. 
    129  
    130     This measure of distance deals with unknown values by computing the
    131     expected square of distance based on the distribution obtained from the
    132     "training" data. The squared distance between
    133
    134         - a known and an unknown continuous value equals the squared distance
    135           between the known value and the average, plus the variance;
    136         - two unknown continuous values equals double the variance;
    137         - a known and an unknown discrete value equals the probability
    138           that the unknown value differs from the known one
    139           (i.e., 1 - probability of the known value);
    140         - two unknown discrete values equals the probability that two
    141           randomly chosen values are different, which can be computed as
    142           1 - sum of squares of probabilities.
    143  
    144     Continuous cases can be handled by averages and variances inherited from
    145     ExamplesDistance_Normalized. The data for discrete cases are stored in
    146     distributions (used for unknown vs. known value) and in bothSpecialDist
    147     (the precomputed distance between two unknown values).
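The expected squared distances described for the Euclidean measure can be written out as a small sketch. It is not the library's code: the function names are hypothetical, unknown values are represented by ``None``, the discrete distribution is assumed to be a plain dict mapping each value to its probability, and normalization of continuous differences is left out for brevity.

```python
def sq_dist_continuous(v1, v2, average, variance):
    """Expected squared distance for one continuous feature (sketch)."""
    if v1 is None and v2 is None:
        # Two unknown values: double the variance.
        return 2.0 * variance
    if v1 is None or v2 is None:
        # One unknown value: squared distance to the average, plus variance.
        known = v2 if v1 is None else v1
        return (known - average) ** 2 + variance
    return (v1 - v2) ** 2


def sq_dist_discrete(v1, v2, probs):
    """Expected squared distance for one discrete feature (sketch).

    probs maps each value to its probability (assumed format).
    """
    if v1 is None and v2 is None:
        # Probability that two randomly chosen values differ.
        return 1.0 - sum(p * p for p in probs.values())
    if v1 is None or v2 is None:
        # Probability that the unknown value differs from the known one.
        known = v2 if v1 is None else v1
        return 1.0 - probs[known]
    return 0.0 if v1 == v2 else 1.0
```

For example, with an average of 1.0 and a variance of 0.5, a known value of 3.0 against an unknown value gives ``(3.0 - 1.0) ** 2 + 0.5``. In this sketch the result of the two-unknowns branch of ``sq_dist_discrete`` plays the role of the value that would be precomputed per feature.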
    148  
    149 .. class:: Relief, ReliefConstructor 
    150  
    151     Relief is similar to Manhattan distance, but incorporates a more
    152     correct treatment of undefined values, which is used by the ReliefF measure.
    153
    154     This class is derived directly from ExamplesDistance, not from ExamplesDistance_Normalized.
    155
    156  
    157 .. autoclass:: PearsonR 
    158     :members: 
    159  
    160 .. autoclass:: SpearmanR 
    161     :members: 
    162  
    163 .. autoclass:: PearsonRConstructor 
    164     :members: 
    165  
    166 .. autoclass:: SpearmanRConstructor 
    167     :members:     
    168  
    169  
    170 """ 
    171  
    172 1 import Orange
    173 2