.. Source: orange/docs/reference/rst/Orange.distance.rst @ 9719:782cfec5fe88
   Revision 9719:782cfec5fe88, 6.1 KB, checked in by markotoplak, 2 years ago.
   Commit message: "Orange.distance renames."

.. py:currentmodule:: Orange.distance

.. automodule:: Orange.distance

##########################################
Distance (``distance``)
##########################################

Distance measures typically have to be adjusted to the data. For instance,
when the data set contains continuous features, the distances between
continuous values should be normalized so that all features have a
similar impact, e.g. by dividing the distance by the range of the feature.

Distance measures thus appear in pairs: a class that measures
the distance (:obj:`Distance`) and a class that constructs it based on the
data (:obj:`DistanceConstructor`).

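The pairing can be sketched without the library. In the minimal example
below, the names ``RangeManhattanConstructor`` and ``RangeManhattan`` are
hypothetical and not part of Orange, and instances are assumed to be plain
lists of continuous values.

.. code-block:: python

    class RangeManhattan(object):
        """Hypothetical measure: Manhattan distance over range-normalized values."""

        def __init__(self, factors):
            # One multiplicative factor per feature, supplied by the constructor.
            self.factors = factors

        def __call__(self, instance1, instance2):
            return sum(abs(a - b) * f
                       for a, b, f in zip(instance1, instance2, self.factors))


    class RangeManhattanConstructor(object):
        """Hypothetical constructor: derives the factors from the data."""

        def __call__(self, instances):
            factors = []
            for column in zip(*instances):
                value_range = max(column) - min(column)
                # A constant feature gets factor 0, so it is ignored.
                factors.append(1.0 / value_range if value_range else 0.0)
            return RangeManhattan(factors)


    data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
    measure = RangeManhattanConstructor()(data)
    distance = measure(data[0], data[2])

Because each difference is divided by the range of its feature, both
features contribute equally to ``distance`` even though their raw scales
differ.
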
Since most measures work on normalized distances between corresponding
features, an abstract class, :obj:`DistanceNormalized`, takes care of
normalizing.

Unknown values are treated correctly only by the Euclidean and Relief
distances. For other measures, the distance between an unknown and a known
value, or between two unknown values, is always 0.5.

.. class:: Distance

    .. method:: __call__(instance1, instance2)

        Return the distance between the given instances (as a floating point number).

.. class:: DistanceConstructor

    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat])

        Construct a :obj:`Distance`. Not all of the data needs to be
        given. Most measures can be constructed from ``basic_var_stat``;
        if it is not given, they can be constructed from either ``instances``
        or ``distributions``. Some measures do not need any arguments.

.. class:: DistanceNormalized

    This abstract class provides a function which is given two instances
    and returns a list of normalized distances between values of their
    features. Many distance measuring classes need such a function and are
    therefore derived from this class.

    .. attribute:: normalizers

        A precomputed list of normalizing factors for feature values.

        - If a factor is positive, differences in the feature's values
          are multiplied by it; for continuous features the factor
          is 1/(max_value-min_value) and for ordinal features
          the factor is 1/number_of_values. If either (or both) of the
          values are unknown, the distance is 0.5.
        - If a factor is -1, the feature is nominal; the distance
          between two values is 0 if they are the same (or at least
          one is unknown) and 1 if they are different.
        - If a factor is 0, the feature is ignored.

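        The rules above can be sketched in plain Python. The helper
        ``feature_distance`` is hypothetical; representing unknown values as
        ``None`` and taking the absolute difference are assumptions of this
        sketch, not a description of Orange's implementation.

        .. code-block:: python

            def feature_distance(factor, value1, value2):
                """Distance between two feature values under the factor rules.

                ``None`` stands for an unknown value (an assumption of this sketch).
                """
                if factor == 0:
                    return 0.0  # the feature is ignored
                unknown = value1 is None or value2 is None
                if factor == -1:  # nominal feature
                    return 0.0 if unknown or value1 == value2 else 1.0
                if unknown:  # continuous or ordinal feature
                    return 0.5
                return abs(value1 - value2) * factor


            # One continuous feature on [0, 10], one nominal, one ignored.
            normalizers = [1.0 / (10.0 - 0.0), -1, 0]
            first = [2.0, "red", 7]
            second = [7.0, "blue", 9]
            dist = [feature_distance(f, a, b)
                    for f, a, b in zip(normalizers, first, second)]
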
    .. attribute:: bases, averages, variances

        The minimal values, averages and variances
        (continuous features only).

    .. attribute:: domainVersion

        Stores the domain version for which the normalizers were computed.
        The domain version is increased each time a domain description is
        changed (i.e. features are added or removed); this is used for a quick
        check that the user is not attempting to measure distances between
        instances that do not correspond to the normalizers.
        Since domains are practically immutable (especially from Python),
        you do not need to care about this.

    .. method:: attributeDistances(instance1, instance2)

        Returns a list of floats representing distances between pairs of
        feature values of the two instances.


.. class:: HammingConstructor
.. class:: Hamming

    Hamming distance between two instances is defined as the number of
    features in which the two instances differ. Note that this measure
    is not really appropriate for instances that contain continuous features.


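    As a minimal sketch of the definition, assuming instances are plain
    lists of discrete values with no unknowns (the function ``hamming`` is
    hypothetical and is not the library class):

    .. code-block:: python

        def hamming(instance1, instance2):
            """Count the features in which the two instances differ."""
            return sum(1 for a, b in zip(instance1, instance2) if a != b)


        distance = hamming(["red", "small", "round"],
                           ["red", "large", "square"])
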
.. class:: MaximalConstructor
.. class:: Maximal

    The maximal distance between two instances is defined as the largest
    distance between two feature values. If ``dist`` is the result of
    :obj:`DistanceNormalized.attributeDistances`,
    then Maximal returns ``max(dist)``.


.. class:: ManhattanConstructor
.. class:: Manhattan

    Manhattan distance between two instances is the sum of absolute values
    of distances between pairs of features, i.e. ``sum(abs(x) for x in dist)``,
    where ``dist`` is the result of :obj:`DistanceNormalized.attributeDistances`.

.. class:: EuclideanConstructor
.. class:: Euclidean

    Euclidean distance is the square root of the sum of squared per-feature
    distances, i.e. ``sqrt(sum(x*x for x in dist))``, where ``dist`` is the
    result of :obj:`DistanceNormalized.attributeDistances`.

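    For comparison, the aggregations given for Maximal, Manhattan and
    Euclidean can be applied to the same list of per-feature distances.
    Here ``dist`` is a hand-written example list, not the output of an
    actual library call.

    .. code-block:: python

        from math import sqrt

        # Example per-feature distances (made up for illustration).
        dist = [0.5, 1.0, 0.0]

        maximal = max(dist)
        manhattan = sum(abs(x) for x in dist)
        euclidean = sqrt(sum(x * x for x in dist))
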
    .. attribute:: distributions

        An object of type
        :obj:`~Orange.statistics.distribution.Distribution` that holds
        the distributions for all discrete features used for
        computation of distances between known and unknown values.

    .. attribute:: bothSpecialDist

        A list containing the distance between two unknown values for each
        discrete feature.

    This measure of distance deals with unknown values by computing the
    expected square of distance based on the distribution obtained from the
    "training" data. The squared distance between

    - a known and an unknown continuous value equals the squared distance
      between the known value and the average, plus the variance;
    - two unknown continuous values equals double the variance;
    - a known and an unknown discrete value equals the probability
      that the unknown value differs from the known one
      (i.e., 1 - probability of the known value);
    - two unknown discrete values equals the probability that two
      randomly chosen values are different, which can be computed as
      1 - sum of squares of probabilities.

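    The four cases can be written out as small functions. This is a sketch
    of the formulas only; the function names are hypothetical, and the
    distribution of a discrete feature is assumed to be a plain dictionary
    mapping each value to its probability.

    .. code-block:: python

        def sq_known_unknown_continuous(value, average, variance):
            """Expected squared distance between a known and an unknown value."""
            return (value - average) ** 2 + variance


        def sq_both_unknown_continuous(variance):
            """Expected squared distance between two unknown continuous values."""
            return 2.0 * variance


        def sq_known_unknown_discrete(value, probabilities):
            """Probability that the unknown value differs from the known one."""
            return 1.0 - probabilities[value]


        def sq_both_unknown_discrete(probabilities):
            """Probability that two randomly chosen values are different."""
            return 1.0 - sum(p * p for p in probabilities.values())


        # Example distribution of a three-valued discrete feature.
        probabilities = {"a": 0.5, "b": 0.25, "c": 0.25}
        known_vs_unknown = sq_known_unknown_discrete("b", probabilities)
        both_unknown = sq_both_unknown_discrete(probabilities)
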
    Continuous cases can be handled by the averages and variances inherited from
    :obj:`DistanceNormalized`. The data for discrete cases are stored in
    ``distributions`` (used for unknown vs. known value) and in ``bothSpecialDist``
    (the precomputed distance between two unknown values).

.. class:: ReliefConstructor
.. class:: Relief

    Relief is similar to Manhattan distance, but incorporates a more
    correct treatment of undefined values, which is used by the ReliefF measure.

    This class is derived directly from :obj:`Distance`, not from
    :obj:`DistanceNormalized`.


.. autoclass:: PearsonR
    :members:

.. autoclass:: SpearmanR
    :members:

.. autoclass:: PearsonRConstructor
    :members:

.. autoclass:: SpearmanRConstructor
    :members: