Timestamp:
02/06/12 09:42:50 (2 years ago)
Author:
markotoplak
Branch:
default
Children:
9664:6638cc93015a, 9719:782cfec5fe88
rebase_source:
a78d1d75a2cac951721701298920fd6faa82902a
Message:

Orange.distance.instances -> Orange.distance

File:
1 edited

  • docs/reference/rst/Orange.distance.rst

    r9372 r9663  
    1 ######################### 
     1.. automodule:: Orange.distance 
     2 
     3########################################## 
    24Distance (``distance``) 
    3 ######################### 
     5########################################## 
    46 
    5 .. toctree:: 
     7This page describes classes implementing different metrics for measuring 
     8distances (dissimilarities) between instances. 
    69 
    7   Orange.distance.instances 
     10Most (although not all) measures of distance between instances require 
     11some "learning": adjusting the measure to the data. For instance, when 
     12the dataset contains continuous features, the distances between continuous 
     13values should be normalized, e.g. by dividing the distance by the range 
     14of possible values or by some interquartile distance, to ensure that all 
     15features have, in principle, similar impacts. 
    816 
     17Different measures of distance thus appear in pairs - a class that measures 
     18the distance and a class that constructs it based on the data. The abstract 
     19classes representing such a pair are `ExamplesDistance` and 
     20`ExamplesDistanceConstructor`. 
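
The pairing described above can be illustrated with a minimal, self-contained sketch. It does not use Orange; the class names ``RangeManhattan`` and ``RangeManhattanConstructor`` are hypothetical, and instances are assumed to be plain tuples of numbers.

```python
class RangeManhattan:
    """Hypothetical distance object holding per-feature ranges learned from data."""

    def __init__(self, ranges):
        self.ranges = ranges

    def __call__(self, instance1, instance2):
        # Sum of range-normalized absolute differences; features with a
        # zero range (constant features) contribute nothing.
        return sum(abs(a - b) / r
                   for a, b, r in zip(instance1, instance2, self.ranges)
                   if r)


class RangeManhattanConstructor:
    """Hypothetical constructor: "learns" the ranges and returns the measure."""

    def __call__(self, instances):
        columns = list(zip(*instances))
        ranges = [max(column) - min(column) for column in columns]
        return RangeManhattan(ranges)


data = [(1.0, 10.0), (3.0, 30.0), (5.0, 20.0)]
measure = RangeManhattanConstructor()(data)   # construct from the data
d = measure(data[0], data[1])                 # then measure
```

Here the constructor is called once on the data, and the resulting object can then be called on any pair of instances.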
     21 
     22Since most measures work on normalized distances between corresponding 
     23features, there is an abstract intermediate class 
     24`ExamplesDistance_Normalized` that takes care of normalizing. 
     25The remaining classes correspond to different ways of defining the distances, 
     26such as Manhattan or Euclidean distance. 
     27 
     28Unknown values are treated correctly only by Euclidean and Relief distance. 
     29For other measures of distance, the distance between an unknown and a known value, or between 
     30two unknown values, is always 0.5. 
     31 
     32.. class:: ExamplesDistance 
     33 
     34    .. method:: __call__(instance1, instance2) 
     35 
     36        Returns the distance between the given instances as a floating-point number. 
     37 
     38.. class:: ExamplesDistanceConstructor 
     39 
     40    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
     41 
     42        Constructs an instance of ExamplesDistance. 
     43        Not all the data needs to be given. Most measures can be constructed 
     44        from basic_var_stat; if it is not given, they can compute what they need 
     45        from either instances or distributions. 
     46        Some (e.g. ExamplesDistance_Hamming) do not need any arguments at all. 
     47 
     48.. class:: ExamplesDistance_Normalized 
     49 
     50    This abstract class provides a method that, given two instances, 
     51    returns a list of normalized distances between the values of their 
     52    features. Many distance measuring classes need such a method and are 
     53    therefore derived from this class. 
     54 
     55    .. attribute:: normalizers 
     56 
     57        A precomputed list of normalizing factors for feature values. 
     58 
     59        - If a factor is positive, differences in the feature's values 
     60          are multiplied by it; for continuous features the factor 
     61          is 1/(max_value-min_value) and for ordinal features 
     62          the factor is 1/number_of_values. If either (or both) of 
     63          the values is unknown, the distance is 0.5. 
     64        - If a factor is -1, the feature is nominal; the distance 
     65          between two values is 0 if they are the same (or at least 
     66          one is unknown) and 1 if they are different. 
     67        - If a factor is 0, the feature is ignored. 
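
The three normalizer rules above can be sketched in plain Python. This is an illustration, not Orange's implementation: the function name ``attribute_distances`` is hypothetical, ``None`` stands in for an unknown value, and an ignored feature is assumed to contribute a distance of 0.

```python
def attribute_distances(instance1, instance2, normalizers):
    """Per-feature distances following the normalizer rules (illustrative only)."""
    dists = []
    for v1, v2, factor in zip(instance1, instance2, normalizers):
        unknown = v1 is None or v2 is None
        if factor == 0:
            # Ignored feature (assumption: reported as zero distance).
            dists.append(0.0)
        elif factor == -1:
            # Nominal feature: 0 if same or any value unknown, else 1.
            dists.append(0.0 if unknown or v1 == v2 else 1.0)
        elif unknown:
            # Continuous or ordinal feature with an unknown value.
            dists.append(0.5)
        else:
            # Continuous or ordinal feature: scaled absolute difference.
            dists.append(abs(v1 - v2) * factor)
    return dists


normalizers = [0.25, -1, 0, 0.1]
a = (1.0, "red", 7, None)
b = (3.0, "blue", 9, 4.0)
dist = attribute_distances(a, b, normalizers)
```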
     68 
     69    .. attribute:: bases, averages, variances 
     70 
     71        The minimal values, averages and variances 
     72        (continuous features only). 
     73 
     74    .. attribute:: domainVersion 
     75 
     76        Stores a domain version for which the normalizers were computed. 
     77        The domain version is increased each time a domain description is 
     78        changed (i.e. features are added or removed); this is used for a quick 
     79        check that the user is not attempting to measure distances between 
     80        instances that do not correspond to normalizers. 
     81        Since domains are practically immutable (especially from Python), 
     82        you rarely need to care about this. 
     83 
     84    .. method:: attributeDistances(instance1, instance2) 
     85 
     86        Returns a list of floats representing distances between pairs of 
     87        feature values of the two instances. 
     88 
     89 
     90.. class:: HammingConstructor 
     91.. class:: Hamming 
     92 
     93    Hamming distance between two instances is defined as the number of 
     94    features in which the two instances differ. Note that this measure 
     95    is not really appropriate for instances that contain continuous features. 
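
As a minimal sketch of the definition above (plain Python, not the Orange class; the function name ``hamming`` is hypothetical and instances are assumed to be equal-length tuples):

```python
def hamming(instance1, instance2):
    """Count the features in which the two instances differ."""
    return sum(1 for a, b in zip(instance1, instance2) if a != b)


h = hamming(("sunny", "hot", "high"), ("sunny", "mild", "normal"))
```

With continuous features, two values that are nearly equal still count as a full difference, which is why the measure suits discrete features better.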
     96 
     97 
     98.. class:: MaximalConstructor 
     99.. class:: Maximal 
     100 
     101    The maximal distance between two instances is defined as the largest 
     102    distance between two corresponding feature values. If dist is the result of 
     103    ExamplesDistance_Normalized.attributeDistances, 
     104    then Maximal returns max(dist). 
     105 
     106 
     107.. class:: ManhattanConstructor 
     108.. class:: Manhattan 
     109 
     110    Manhattan distance between two instances is the sum of absolute values 
     111    of distances between pairs of feature values, i.e. ``sum(abs(x) for x in dist)``, 
     112    where dist is the result of ExamplesDistance_Normalized.attributeDistances. 
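
A short worked example of the Maximal and Manhattan aggregations described above. The list ``dist`` is made up for illustration and stands in for the output of ``attributeDistances``; no Orange code is involved.

```python
# Hypothetical per-feature normalized distances for one pair of instances.
dist = [0.25, 0.5, 0.0, 1.0]

maximal = max(dist)                    # largest per-feature distance
manhattan = sum(abs(x) for x in dist)  # sum of absolute per-feature distances
```

Maximal is driven by the single most different feature, while Manhattan accumulates the differences over all features.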
     113 
     114.. class:: EuclideanConstructor 
     115.. class:: Euclidean 
     116 
     117    Euclidean distance is the square root of the sum of squared per-feature distances, 
     118    i.e. ``sqrt(sum(x*x for x in dist))``, where dist is the result of 
     119    ExamplesDistance_Normalized.attributeDistances. 
     120 
     121    .. attribute:: distributions 
     122 
     123        An object of type 
     124        :obj:`~Orange.statistics.distribution.Distribution` that holds 
     125        the distributions for all discrete features used for 
     126        computation of distances between known and unknown values. 
     127 
     128    .. attribute:: bothSpecialDist 
     129 
     130        A list containing the distance between two unknown values for each 
     131        discrete feature. 
     132 
     133    This measure of distance deals with unknown values by computing the 
     134    expected square of distance based on the distribution obtained from the 
     135    "training" data. The squared distance between 
     136 
     137        - a known and an unknown continuous value equals the squared distance 
     138          between the known value and the average, plus the variance; 
     139        - two unknown continuous values equals twice the variance; 
     140        - a known and an unknown discrete value equals the probability 
     141          that the unknown value differs from the known one 
     142          (i.e., 1 - probability of the known value); 
     143        - two unknown discrete values equals the probability that two 
     144          randomly chosen values are different, which can be computed as 
     145          1 - sum of squares of probabilities. 
     146 
     147    Continuous cases are handled using the averages and variances inherited from 
     148    ExamplesDistance_Normalized. The data for discrete cases are stored in 
     149    distributions (used for unknown vs. known value) and in bothSpecialDist 
     150    (the precomputed distance between two unknown values). 
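
The expected squared distances listed above can be sketched as follows. This is an illustration under stated assumptions rather than Orange's code: the function names are hypothetical, ``None`` marks an unknown value, normalization of continuous values is left out, and a discrete distribution is assumed to be a dict mapping each value to its relative frequency.

```python
def sq_dist_continuous(v1, v2, average, variance):
    """Expected squared distance for one continuous feature."""
    if v1 is None and v2 is None:
        return 2.0 * variance                      # both unknown
    if v1 is None or v2 is None:
        known = v2 if v1 is None else v1
        return (known - average) ** 2 + variance   # one unknown
    return (v1 - v2) ** 2                          # both known


def sq_dist_discrete(v1, v2, probabilities):
    """Expected squared distance for one discrete feature."""
    if v1 is None and v2 is None:
        # Probability that two randomly drawn values differ.
        return 1.0 - sum(p * p for p in probabilities.values())
    if v1 is None or v2 is None:
        known = v2 if v1 is None else v1
        return 1.0 - probabilities[known]          # P(unknown != known)
    return 0.0 if v1 == v2 else 1.0                # both known


probs = {"a": 0.75, "b": 0.25}
one_unknown_cont = sq_dist_continuous(None, 3.0, average=1.0, variance=2.0)
both_unknown_cont = sq_dist_continuous(None, None, average=1.0, variance=2.0)
one_unknown_disc = sq_dist_discrete("a", None, probs)
both_unknown_disc = sq_dist_discrete(None, None, probs)
```

The both-unknown discrete case depends only on the distribution, so it can be computed once per feature, which matches the role of the precomputed list described above.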
     151 
     152.. class:: ReliefConstructor 
     153.. class:: Relief 
     154 
     155    Relief is similar to Manhattan distance, but incorporates a more 
     156    correct treatment of undefined values, as used by the ReliefF measure. 
     157 
     158    This class is derived directly from ExamplesDistance, not from ExamplesDistance_Normalized. 
     159 
     160 
     161.. autoclass:: PearsonR 
     162    :members: 
     163 
     164.. autoclass:: SpearmanR 
     165    :members: 
     166 
     167.. autoclass:: PearsonRConstructor 
     168    :members: 
     169 
     170.. autoclass:: SpearmanRConstructor 
     171    :members: 