.. Source: orange/docs/reference/rst/Orange.distance.rst, revision 9819:e11e2ff31f47 (checked in by anze).

.. py:currentmodule:: Orange.distance

##########################################
Distance (``distance``)
##########################################

Distance measures typically have to be adjusted to the data. For instance,
when the data set contains continuous features, the distances between
continuous values should be normalized so that all features have
similar impacts, e.g. by dividing the distance by the range.

Distance measures thus appear in pairs: a class that measures
the distance (:obj:`Distance`) and a class that constructs it based on the
data (:obj:`DistanceConstructor`).

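The pairing of a constructor and a measure can be illustrated with a minimal
plain-Python sketch. The class names ``RangeNormalizedManhattan`` and
``RangeNormalizedManhattanConstructor`` are hypothetical and are not part of
Orange; instances are assumed to be plain tuples of continuous values.

```python
class RangeNormalizedManhattan:
    """Hypothetical measure: stores the per-feature ranges it was built with."""

    def __init__(self, ranges):
        self.ranges = ranges

    def __call__(self, instance1, instance2):
        # Divide each absolute difference by the feature's range so that
        # all features contribute on a comparable scale.
        return sum(
            abs(a - b) / r
            for a, b, r in zip(instance1, instance2, self.ranges)
            if r > 0  # skip constant features to avoid division by zero
        )


class RangeNormalizedManhattanConstructor:
    """Hypothetical constructor: inspects the data and returns a measure."""

    def __call__(self, instances):
        columns = list(zip(*instances))
        ranges = [max(col) - min(col) for col in columns]
        return RangeNormalizedManhattan(ranges)


data = [(1.0, 100.0), (2.0, 300.0), (3.0, 500.0)]
measure = RangeNormalizedManhattanConstructor()(data)
far = measure(data[0], data[2])
near = measure(data[0], data[1])
```

Without the normalization, the second feature would dominate the distance
simply because its values are larger.
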
Since most measures work on normalized distances between corresponding
features, an abstract class :obj:`DistanceNormalized` takes care of
normalizing.

Unknown values are treated correctly only by Euclidean and Relief
distance. For other measures, the distance between an unknown and a known
value, or between two unknown values, is always 0.5.

.. autofunction:: distance_matrix

.. class:: Distance

    .. method:: __call__(instance1, instance2)

        Return the distance between the given instances (as a floating point number).

.. class:: DistanceConstructor

    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat])

        Construct a :obj:`Distance`. Not all arguments are required.
        Most measures can be constructed from ``basic_var_stat``; if it is
        not given, ``instances`` or ``distributions`` can be used.

.. class:: DistanceNormalized

    An abstract class that provides normalization.

    .. attribute:: normalizers

        A precomputed list of normalizing factors for feature values. They are:

        - ``1/(max_value-min_value)`` for continuous and ``1/number_of_values``
          for ordinal features. These factors multiply the differences
          between feature values. If either value is unknown, the
          distance is 0.5.
        - ``-1`` for nominal features; the distance
          between two values is 0 if they are the same (or at least one is
          unknown) and 1 if they are different.
        - ``0`` for ignored features.

    .. attribute:: bases, averages, variances

        The minimal values, averages and variances
        (continuous features only).

    .. attribute:: domain_version

        The domain version changes each time a domain description is
        changed (i.e. features are added or removed).

    .. method:: feature_distances(instance1, instance2)

        Return a list of floats representing normalized distances between
        pairs of feature values of the two instances.

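The way the normalizing factors drive the per-feature distances can be
sketched in plain Python. This is an illustration of the rules listed for
``normalizers``, not Orange's implementation: the stand-alone function below
is hypothetical, unknown values are represented by ``None``, and an ignored
feature is assumed to contribute a distance of 0.

```python
def feature_distances(instance1, instance2, normalizers):
    """Per-feature distances following the normalizer codes described above."""
    dist = []
    for a, b, norm in zip(instance1, instance2, normalizers):
        unknown = a is None or b is None
        if norm == 0:
            # Ignored feature (assumed to contribute nothing).
            dist.append(0.0)
        elif norm == -1:
            # Nominal feature: 0 if equal or if either value is unknown.
            dist.append(0.0 if unknown or a == b else 1.0)
        elif unknown:
            # Continuous or ordinal feature with an unknown value.
            dist.append(0.5)
        else:
            # Continuous or ordinal feature: scale the difference.
            dist.append(abs(a - b) * norm)
    return dist


# One continuous feature with range 0..8, one nominal, one ignored.
normalizers = [1.0 / (8.0 - 0.0), -1, 0]
known = feature_distances((2.0, "red", 7), (6.0, "blue", 9), normalizers)
with_unknown = feature_distances((None, None, 7), (6.0, "blue", 9), normalizers)
```
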
.. class:: Hamming
.. class:: HammingDistance

    The number of features in which the two instances differ. This measure
    is not appropriate for instances that contain continuous features.

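As a rough illustration of the definition, the count of differing features
can be written in a few lines of plain Python. The function ``hamming`` is a
hypothetical stand-in that works on tuples of discrete values and ignores
the treatment of unknown values.

```python
def hamming(instance1, instance2):
    """Count the positions at which the two instances differ."""
    return sum(1 for a, b in zip(instance1, instance2) if a != b)


d = hamming(("red", "small", "round"), ("red", "large", "square"))
```

With continuous features, almost any two values differ, so the count would
carry little information; this is why the measure suits discrete data only.
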
.. class:: Maximal
.. class:: MaximalDistance

    The maximal distance
    between two feature values. If ``dist`` is the result of
    :obj:`~DistanceNormalized.feature_distances`,
    then :class:`Maximal` returns ``max(dist)``.

.. class:: Manhattan
.. class:: ManhattanDistance

    The sum of absolute values
    of distances between pairs of features, i.e. ``sum(abs(x) for x in dist)``,
    where ``dist`` is the result of :obj:`~DistanceNormalized.feature_distances`.

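The two aggregations above differ only in how they combine the per-feature
distances. A small sketch, assuming ``dist`` is a hand-written list standing
in for the output of ``feature_distances``:

```python
# Hypothetical per-feature distances for one pair of instances.
dist = [0.25, 1.0, 0.0, 0.5]

maximal = max(dist)                      # largest single-feature distance
manhattan = sum(abs(x) for x in dist)    # sum over all features
```

The maximal distance reflects only the most dissimilar feature, while the
Manhattan distance grows with every feature that differs.
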
.. class:: Euclidean
.. class:: EuclideanDistance

    The square root of the sum of squared per-feature distances,
    i.e. ``sqrt(sum(x*x for x in dist))``, where ``dist`` is the result of
    :obj:`~DistanceNormalized.feature_distances`.

    .. attribute:: distributions

        A :obj:`~Orange.statistics.distribution.Distribution` containing
        the distributions for all discrete features used for
        computation of distances between known and unknown values.

    .. attribute:: both_special_dist

        A list containing the distance between two unknown values for each
        discrete feature.

    Unknown values are handled by computing the
    expected square of distance based on the distribution from the
    "training" data. The squared distance between

        - a known and an unknown continuous value equals the squared distance
          between the known value and the average, plus the variance;
        - two unknown continuous values equals double the variance;
        - a known and an unknown discrete value equals the probability
          that the unknown value differs from the known one
          (i.e., 1 - probability of the known value);
        - two unknown discrete values equals the probability that two
          randomly chosen values differ, which can be computed as
          1 - sum of squares of probabilities.

    Continuous cases are handled as inherited from
    :class:`DistanceNormalized`. The data for discrete cases are
    stored in distributions (used for unknown vs. known value) and
    in :obj:`both_special_dist` (the precomputed distance between two
    unknown values).

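The expected squared distances for unknown values can be worked through in
plain Python. This is a sketch of the four rules above, not Orange's code:
the helper names are hypothetical, unknown values are represented by
``None``, continuous values are assumed to be already normalized, and the
discrete distribution is assumed to be a dict mapping values to
probabilities.

```python
from math import sqrt


def sq_dist_continuous(a, b, average, variance):
    """Expected squared distance for one continuous feature."""
    if a is None and b is None:
        return 2 * variance
    if a is None or b is None:
        known = b if a is None else a
        return (known - average) ** 2 + variance
    return (a - b) ** 2


def sq_dist_discrete(a, b, probs):
    """Expected squared distance for one discrete feature."""
    if a is None and b is None:
        # Probability that two randomly chosen values differ.
        return 1 - sum(p * p for p in probs.values())
    if a is None or b is None:
        known = b if a is None else a
        return 1 - probs[known]
    return 0.0 if a == b else 1.0


probs = {"a": 0.5, "b": 0.25, "c": 0.25}

one_unknown_cont = sq_dist_continuous(1.0, None, average=0.5, variance=0.25)
two_unknown_cont = sq_dist_continuous(None, None, average=0.5, variance=0.25)
one_unknown_disc = sq_dist_discrete("a", None, probs)
two_unknown_disc = sq_dist_discrete(None, None, probs)

# Euclidean distance over one continuous and one discrete feature,
# each with one unknown value.
distance = sqrt(one_unknown_cont + one_unknown_disc)
```
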
.. class:: Relief
.. class:: ReliefDistance

    Relief is similar to Manhattan distance, but incorporates the
    treatment of undefined values used by the ReliefF measure.

    This class is derived directly from :obj:`Distance`.


.. autoclass:: PearsonR
    :members:

.. autoclass:: PearsonRDistance
    :members:

.. autoclass:: SpearmanR
    :members:

.. autoclass:: SpearmanRDistance
    :members: