source: orange/docs/reference/rst/Orange.distance.rst @ 9720:a01e00b751d1

Revision 9720:a01e00b751d1, 5.9 KB checked in by markotoplak, 2 years ago

Orange.distance work.

.. py:currentmodule:: Orange.distance

.. automodule:: Orange.distance

##########################################
Distance (``distance``)
##########################################

Distance measures typically have to be adjusted to the data. For instance,
when the data set contains continuous features, the distances between
continuous values should be normalized to ensure that all features have
similar impacts, e.g. by dividing the distance by the range.

Distance measures thus appear in pairs: a class that measures
the distance (:obj:`Distance`) and a class that constructs it based on the
data (:obj:`DistanceConstructor`).

Since most measures work on normalized distances between corresponding
features, an abstract class :obj:`DistanceNormalized` takes care of
normalizing.

Unknown values are treated correctly only by Euclidean and Relief
distance. For other measures, a distance between unknown and known or
between two unknown values is always 0.5.

.. class:: Distance

    .. method:: __call__(instance1, instance2)

        Return a distance between the given instances (as a floating point number).

.. class:: DistanceConstructor

    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat])

        Construct a :obj:`Distance`. Not all the data needs to be
        given. Most measures can be constructed from ``basic_var_stat``;
        if it is not given, they can fall back on either ``instances``
        or ``distributions``. Some do not need any arguments.
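
The constructor/measure pairing described above can be sketched in plain
Python. The classes ``RangeScaledManhattan`` and
``RangeScaledManhattanConstructor`` are hypothetical stand-ins invented for
this illustration, not part of Orange, and instances are assumed to be plain
tuples of continuous values:

```python
# Hypothetical stand-ins for a Distance / DistanceConstructor pair.
# These are NOT Orange classes; they only illustrate the pattern.


class RangeScaledManhattan:
    """Measures distances using scaling factors fixed at construction time."""

    def __init__(self, factors):
        self.factors = factors

    def __call__(self, instance1, instance2):
        return sum(f * abs(a - b)
                   for f, a, b in zip(self.factors, instance1, instance2))


class RangeScaledManhattanConstructor:
    """Inspects the data once and returns a measure adjusted to it."""

    def __call__(self, instances):
        factors = []
        for column in zip(*instances):
            span = max(column) - min(column)
            # Assumption: a constant feature is simply ignored (factor 0).
            factors.append(1.0 / span if span else 0.0)
        return RangeScaledManhattan(factors)


data = [(1.0, 100.0), (2.0, 300.0), (3.0, 200.0)]
measure = RangeScaledManhattanConstructor()(data)
d = measure(data[0], data[1])
```

Dividing by the range keeps the second feature, whose raw values are two
orders of magnitude larger, from dominating the result.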

.. class:: DistanceNormalized

    This abstract class provides a function which is given two instances
    and returns a list of normalized distances between values of their
    features. Many distance measuring classes need such a function and are
    therefore derived from this class.

    .. attribute:: normalizers

        A precomputed list of normalizing factors for feature values.

        - If a factor is positive, differences in the feature's values
          are multiplied by it; for continuous features the factor
          is 1/(max_value-min_value) and for ordinal features
          the factor is 1/number_of_values. If either (or both) of
          the values are unknown, the distance is 0.5.
        - If a factor is -1, the feature is nominal; the distance
          between two values is 0 if they are the same (or at least
          one is unknown) and 1 if they are different.
        - If a factor is 0, the feature is ignored.

    .. attribute:: bases, averages, variances

        The minimal values, averages and variances
        (continuous features only).

    .. attribute:: domain_version

        The domain version increases each time a domain description is
        changed (i.e. features are added or removed); it is used to check
        that the user is not attempting to measure distances between
        instances that do not correspond to normalizers.

    .. method:: attribute_distances(instance1, instance2)

        Return a list of floats representing distances between pairs of
        feature values of the two instances.
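
The three kinds of normalizing factors can be illustrated with a small
stand-alone function. This is only a sketch of the rules listed under
``normalizers``, not Orange's implementation: plain Python values stand in
for instances, ``None`` stands in for an unknown value, and reporting an
ignored feature as 0.0 (rather than omitting it) is an assumption:

```python
def attribute_distances(values1, values2, normalizers):
    """Per-feature distances following the factor rules described above."""
    distances = []
    for a, b, factor in zip(values1, values2, normalizers):
        unknown = a is None or b is None
        if factor == 0:
            # Ignored feature (assumption: reported as 0.0, not dropped).
            distances.append(0.0)
        elif factor == -1:
            # Nominal feature: 0 if equal or at least one unknown, else 1.
            distances.append(0.0 if unknown or a == b else 1.0)
        elif unknown:
            # Continuous or ordinal feature with an unknown value.
            distances.append(0.5)
        else:
            distances.append(abs(a - b) * factor)
    return distances


# Features: continuous with range 10, nominal, ignored, continuous with range 4.
normalizers = [0.1, -1, 0, 0.25]
dist = attribute_distances([2.0, "red", 7, None],
                           [6.0, "blue", 9, 1.0],
                           normalizers)
```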

.. class:: HammingConstructor
.. class:: Hamming

    Hamming distance between two instances is defined as the number of
    features in which the two instances differ. Note that this measure
    is not really appropriate for instances that contain continuous features.
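
A minimal sketch of the definition, using plain tuples in place of Orange
instances (the function name ``hamming`` is chosen for this illustration
only). The last call shows why continuous features are a poor fit: any
difference, however small, counts as a full mismatch:

```python
def hamming(values1, values2):
    """Count the features in which the two value tuples differ."""
    return sum(1 for a, b in zip(values1, values2) if a != b)


d_discrete = hamming(("red", "small", "round"), ("red", "large", "square"))
d_continuous = hamming((1.0,), (1.0001,))
```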

.. class:: MaximalConstructor
.. class:: Maximal

    The maximal distance between two instances is defined as the maximal
    distance between two feature values. If ``dist`` is the result of
    :obj:`DistanceNormalized.attribute_distances`,
    then :obj:`Maximal` returns ``max(dist)``.

.. class:: ManhattanConstructor
.. class:: Manhattan

    Manhattan distance between two instances is the sum of absolute values
    of distances between pairs of features, i.e. ``sum(abs(x) for x in dist)``,
    where ``dist`` is the result of :obj:`DistanceNormalized.attribute_distances`.
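
To make the difference between the maximal and Manhattan aggregations
concrete, the following sketch applies both to the same list of per-feature
distances. The list ``dist`` is made-up example data standing in for the
result of ``attribute_distances``:

```python
# Made-up per-feature distances, standing in for attribute_distances output.
dist = [0.4, 1.0, 0.0, 0.5]

maximal = max(dist)                     # the single largest per-feature distance
manhattan = sum(abs(x) for x in dist)   # all per-feature distances added up
```

The maximal distance is determined by one feature only, while the Manhattan
distance grows with every feature in which the instances differ.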

.. class:: EuclideanConstructor
.. class:: Euclidean

    Euclidean distance is the square root of the sum of squared per-feature
    distances, i.e. ``sqrt(sum(x*x for x in dist))``, where ``dist`` is the
    result of :obj:`DistanceNormalized.attribute_distances`.

    .. attribute:: distributions

        An object of type
        :obj:`~Orange.statistics.distribution.Distribution` that holds
        the distributions for all discrete features used for
        computation of distances between known and unknown values.

    .. attribute:: bothSpecialDist

        A list containing the distance between two unknown values for each
        discrete feature.

    This measure of distance deals with unknown values by computing the
    expected square of distance based on the distribution obtained from the
    "training" data. The squared distance between

        - a known and an unknown continuous value equals the squared distance
          between the known value and the average, plus the variance;
        - two unknown continuous values equals twice the variance;
        - a known and an unknown discrete value equals the probability
          that the unknown value differs from the known one
          (i.e., 1 - probability of the known value);
        - two unknown discrete values equals the probability that two
          randomly chosen values differ, which can be computed as
          1 - sum of squares of probabilities.

    Continuous cases are handled by the averages and variances inherited from
    :obj:`DistanceNormalized`. The data for discrete cases are stored in
    ``distributions`` (used for unknown vs. known value) and in
    ``bothSpecialDist`` (the precomputed distance between two unknown values).
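
The four rules for unknown values can be written out as a short sketch. It
is not Orange's implementation: the helper names are invented here, ``None``
stands in for an unknown value, ``probabilities`` is a plain dict of relative
frequencies, and continuous values, averages and variances are assumed to be
expressed on the same (already normalized) scale:

```python
from math import sqrt


def sq_dist_continuous(a, b, average, variance):
    """Expected squared distance for one continuous feature."""
    if a is None and b is None:
        return 2.0 * variance
    if a is None or b is None:
        known = b if a is None else a
        return (known - average) ** 2 + variance
    return (a - b) ** 2


def sq_dist_discrete(a, b, probabilities):
    """Expected squared distance for one discrete feature."""
    if a is None and b is None:
        # Probability that two randomly chosen values differ.
        return 1.0 - sum(p * p for p in probabilities.values())
    if a is None or b is None:
        known = b if a is None else a
        return 1.0 - probabilities[known]
    return 0.0 if a == b else 1.0


# Made-up statistics for one continuous and one discrete feature.
average, variance = 0.5, 0.04
probabilities = {"red": 0.5, "green": 0.3, "blue": 0.2}

known_vs_unknown = sq_dist_continuous(0.8, None, average, variance)
both_unknown = sq_dist_discrete(None, None, probabilities)
euclidean = sqrt(known_vs_unknown + both_unknown)
```

The continuous rules follow from treating an unknown value as a random draw
with the given average and variance; the discrete rules treat it as a random
draw from the value distribution.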

.. class:: ReliefConstructor
.. class:: Relief

    Relief is similar to Manhattan distance, but incorporates a more
    correct treatment of undefined values, which is used by the ReliefF measure.

    This class is derived directly from :obj:`Distance`, not from
    :obj:`DistanceNormalized`.


.. autoclass:: PearsonR
    :members:

.. autoclass:: SpearmanR
    :members:

.. autoclass:: PearsonRConstructor
    :members:

.. autoclass:: SpearmanRConstructor
    :members: