Timestamp:
02/05/12 23:04:46 (2 years ago)
Author:
anze <anze.staric@…>
Branch:
default
rebase_source:
cc88516d07302e9201ec67de38e99cb60ee1b6a6
Message:

Moved instance distance documentation to rst file.

File:
1 edited

  • docs/reference/rst/Orange.distance.instances.rst

    r9372 r9639  
     1.. automodule:: Orange.distance.instances 
     2 
    13######################### 
    24Instances (``instances``) 
    35######################### 
    46 
    5 .. automodule:: Orange.distance.instances 
     7########################### 
     8Distances between Instances 
     9########################### 
     10 
     11This page describes classes that implement different metrics for measuring
     12distances (dissimilarities) between instances.
     13 
     14Most (although not all) measures of distance between instances require
     15some "learning", that is, adjusting the measure to the data. For instance, when
     16the dataset contains continuous features, the distances between continuous
     17values should be normalized, e.g. by dividing the distance by the range
     18of possible values or by some interquartile distance, to ensure that all
     19features have, in principle, similar impacts.
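
A minimal sketch of range normalization in plain Python, assuming a continuous feature given as a list of floats. The function name ``range_normalizer`` is hypothetical and is not part of Orange.

```python
def range_normalizer(values):
    """Return 1 / (max - min) for a continuous feature, or 0.0 if it is constant."""
    lo, hi = min(values), max(values)
    return 1.0 / (hi - lo) if hi > lo else 0.0


# Hypothetical training values of one continuous feature.
heights = [150.0, 160.0, 200.0]
factor = range_normalizer(heights)

# The raw difference is scaled so that the largest possible difference is 1.
normalized_diff = abs(heights[0] - heights[2]) * factor
```

With this scaling, every continuous feature contributes a value between 0 and 1, regardless of its original units.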
     20 
     21Different measures of distance thus appear in pairs - a class that measures 
     22the distance and a class that constructs it based on the data. The abstract 
     23classes representing such a pair are `ExamplesDistance` and 
     24`ExamplesDistanceConstructor`. 
     25 
     26Since most measures work on normalized distances between corresponding 
     27features, there is an abstract intermediate class 
     28`ExamplesDistance_Normalized` that takes care of normalizing. 
     29The remaining classes correspond to different ways of defining the distances, 
     30such as Manhattan or Euclidean distance. 
     31 
     32Unknown values are treated correctly only by Euclidean and Relief distance.
     33For other measures of distance, the distance between an unknown and a known value,
     34or between two unknown values, is always 0.5.
     35 
     36.. class:: ExamplesDistance 
     37 
     38    .. method:: __call__(instance1, instance2) 
     39 
     40        Returns the distance between the given instances as a floating point number.
     41 
     42.. class:: ExamplesDistanceConstructor 
     43 
     44    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
     45 
     46        Constructs an instance of ExamplesDistance. 
     47        Not all the data needs to be given. Most measures can be constructed
     48        from basic_var_stat; if it is not given, they can compute what they need
     49        from either instances or distributions.
     50        Some (e.g. ExamplesDistance_Hamming) do not need any arguments at all.
     51 
     52.. class:: ExamplesDistance_Normalized 
     53 
     54    This abstract class provides a function which is given two instances 
     55    and returns a list of normalized distances between values of their 
     56    features. Many distance measuring classes need such a function and are 
     57    therefore derived from this class.
     58 
     59    .. attribute:: normalizers 
     60 
     61        A precomputed list of normalizing factors for feature values.
     62 
     63        - If a factor is positive, differences in the feature's values
     64          are multiplied by it; for continuous features the factor
     65          would be 1/(max_value-min_value) and for ordinal features
     66          the factor is 1/number_of_values. If either (or both) of
     67          the values is unknown, the distance is 0.5.
     68        - If a factor is -1, the feature is nominal; the distance
     69          between two values is 0 if they are the same (or at least
     70          one is unknown) and 1 if they are different.
     71        - If a factor is 0, the feature is ignored. 
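
The three cases above can be sketched in plain Python. This is an illustration only, not Orange's implementation: the function name ``attribute_distances_sketch``, the use of ``None`` for unknown values, and reporting an ignored feature as 0.0 are all assumptions of this sketch.

```python
def attribute_distances_sketch(inst1, inst2, normalizers):
    """Per-feature distances following the normalizer conventions described above."""
    dists = []
    for v1, v2, factor in zip(inst1, inst2, normalizers):
        if factor == 0:
            # Ignored feature: assumed here to contribute nothing.
            dists.append(0.0)
        elif factor == -1:
            # Nominal feature: 0 if equal or if either value is unknown, else 1.
            same = v1 is None or v2 is None or v1 == v2
            dists.append(0.0 if same else 1.0)
        elif v1 is None or v2 is None:
            # Continuous or ordinal feature with an unknown value.
            dists.append(0.5)
        else:
            # Positive factor: scale the raw difference.
            dists.append(abs(v1 - v2) * factor)
    return dists


dist = attribute_distances_sketch(
    [1.0, "red", 7, None],
    [3.0, "blue", 9, 2.0],
    [0.25, -1, 0, 0.5],
)
```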
     72 
     73    .. attribute:: bases, averages, variances 
     74 
     75        The minimal values, averages and variances 
     76        (continuous features only).
     77 
     78    .. attribute:: domainVersion 
     79 
     80        Stores a domain version for which the normalizers were computed. 
     81        The domain version is increased each time a domain description is 
     82        changed (i.e. features are added or removed); this is used for a quick 
     83        check that the user is not attempting to measure distances between 
     84        instances that do not correspond to normalizers. 
     85        Since domains are practically immutable (especially from Python),
     86        you normally do not need to care about this.
     87 
     88    .. method:: attributeDistances(instance1, instance2) 
     89 
     90        Returns a list of floats representing distances between pairs of 
     91        feature values of the two instances. 
     92 
     93 
     94.. class:: Hamming, HammingConstructor 
     95 
     96    Hamming distance between two instances is defined as the number of 
     97    features in which the two instances differ. Note that this measure 
     98    is not really appropriate for instances that contain continuous features. 
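
A minimal sketch of the Hamming distance in plain Python, assuming instances are given as equal-length sequences of feature values. The function name ``hamming_sketch`` is hypothetical.

```python
def hamming_sketch(inst1, inst2):
    """Count the features in which the two instances differ."""
    return sum(1 for a, b in zip(inst1, inst2) if a != b)


discrete_distance = hamming_sketch(["a", "b", "c"], ["a", "x", "c"])

# Continuous values that are almost equal still count as a full difference,
# which is why the measure suits discrete features better.
continuous_distance = hamming_sketch([1.0, 2.0], [1.0001, 2.0])
```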
     99 
     100 
     101.. class:: Maximal, MaximalConstructor 
     102 
     103    The maximal distance between two instances is defined as the largest
     104    distance between two corresponding feature values. If dist is the result of
     105    ExamplesDistance_Normalized.attributeDistances,
     106    then Maximal returns max(dist).
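
As a small illustration, assuming ``dist`` is a plain list of per-feature distances (the values below are made up):

```python
# Hypothetical per-feature distances between two instances.
dist = [0.5, 1.0, 0.0]

# The maximal distance is the largest per-feature distance.
maximal = max(dist)
```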
     107 
     108 
     109.. class:: Manhattan, ManhattanConstructor 
     110 
     111    Manhattan distance between two instances is the sum of absolute values
     112    of distances between pairs of features, i.e. ``sum(abs(x) for x in dist)``,
     113    where dist is the result of ExamplesDistance_Normalized.attributeDistances.
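
A minimal sketch of this aggregation in current Python, assuming ``dist`` is a plain list of per-feature distances. The function name ``manhattan_sketch`` is hypothetical.

```python
def manhattan_sketch(dist):
    """Sum of absolute per-feature distances."""
    return sum(abs(x) for x in dist)


manhattan = manhattan_sketch([0.5, 0.25, 0.0])
```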
     114 
     115.. class:: Euclidean, EuclideanConstructor 
     116 
     117 
     118    Euclidean distance is the square root of the sum of squared per-feature distances,
     119    i.e. ``sqrt(sum(x*x for x in dist))``, where dist is the result of
     120    ExamplesDistance_Normalized.attributeDistances.
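
A minimal sketch of this aggregation in current Python, assuming ``dist`` is a plain list of per-feature distances with no unknown values. The function name ``euclidean_sketch`` is hypothetical.

```python
import math


def euclidean_sketch(dist):
    """Square root of the sum of squared per-feature distances."""
    return math.sqrt(sum(x * x for x in dist))


euclidean = euclidean_sketch([0.5, 0.5, 0.5, 0.5])
```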
     121 
     122    .. method:: distributions 
     123 
     124        An object of type 
     125        :obj:`Orange.statistics.distribution.Distribution` that holds 
     126        the distributions for all discrete features used for 
     127        computation of distances between known and unknown values. 
     128 
     129    .. method:: bothSpecialDist 
     130 
     131        A list containing the distance between two unknown values for each 
     132        discrete feature. 
     133 
     134    This measure of distance deals with unknown values by computing the 
     135    expected square of distance based on the distribution obtained from the 
     136    "training" data. The squared distance between
     137 
     138        - a known and an unknown continuous value equals the squared distance
     139          between the known value and the average, plus the variance;
     140        - two unknown continuous values equals twice the variance;
     141        - a known and an unknown discrete value equals the probability
     142          that the unknown value differs from the known one
     143          (i.e., 1 - probability of the known value);
     144        - two unknown discrete values equals the probability that two
     145          randomly chosen values are different, which can be computed as
     146          1 - sum of squares of probabilities.
     147 
     148    Continuous cases can be handled by averages and variances inherited from
     149    ExamplesDistance_Normalized. The data for discrete cases are stored in
     150    distributions (used for unknown vs. known value) and in bothSpecialDist
     151    (the precomputed distance between two unknown values).
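
The rules for unknown values can be sketched in plain Python. This is an illustration under stated assumptions, not Orange's code: ``None`` stands for an unknown value, normalization of continuous values is left out, ``probs`` is a plain dict mapping each discrete value to its probability in the training data, and both function names are hypothetical.

```python
def sq_dist_continuous(v1, v2, average, variance):
    """Expected squared distance between two continuous values."""
    if v1 is None and v2 is None:
        return 2.0 * variance
    if v1 is None or v2 is None:
        known = v2 if v1 is None else v1
        return (known - average) ** 2 + variance
    return (v1 - v2) ** 2


def sq_dist_discrete(v1, v2, probs):
    """Expected squared distance between two discrete values."""
    if v1 is None and v2 is None:
        # Probability that two randomly chosen values differ.
        return 1.0 - sum(p * p for p in probs.values())
    if v1 is None or v2 is None:
        known = v2 if v1 is None else v1
        # Probability that the unknown value differs from the known one.
        return 1.0 - probs[known]
    return 0.0 if v1 == v2 else 1.0


probs = {"a": 0.75, "b": 0.25}
both_unknown = sq_dist_discrete(None, None, probs)
one_unknown = sq_dist_discrete("a", None, probs)
```

The value for two unknown discrete values depends only on the distribution, so it can be computed once per feature and stored.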
     152 
     153.. class:: Relief, ReliefConstructor 
     154 
     155    Relief is similar to Manhattan distance, but incorporates a more
     156    correct treatment of undefined values, as used by the ReliefF measure.
     157 
     158    This class is derived directly from ExamplesDistance, not from ExamplesDistance_Normalized.
     159 
     160 
     161.. autoclass:: PearsonR 
     162    :members: 
     163 
     164.. autoclass:: SpearmanR 
     165    :members: 
     166 
     167.. autoclass:: PearsonRConstructor 
     168    :members: 
     169 
     170.. autoclass:: SpearmanRConstructor 
     171    :members: 