Files:
4 edited

  • Orange/clustering/hierarchical.py

    r9724 r9752  
    451451.. autofunction:: pruned 
    452452.. autofunction:: cluster_depths 
    453 .. autofunction:: instance_distance_matrix 
    454453.. autofunction:: feature_distance_matrix 
    455454.. autofunction:: joining_cluster 
     
    15091508    return depths 
    15101509 
    1511  
    1512 def instance_distance_matrix(data, 
    1513             distance_constructor=Orange.distance.Euclidean, 
    1514             progress_callback=None): 
    1515     """ A helper function that computes an :class:`Orange.core.SymMatrix` of all 
    1516     pairwise distances between instances in `data`. 
    1517      
    1518     :param data: A data table 
    1519     :type data: :class:`Orange.data.Table` 
    1520      
    1521     :param distance_constructor: An DistanceConstructor instance. 
    1522     :type distance_constructor: :class:`Orange.distance.DistanceConstructor` 
    1523      
    1524     :param progress_callback: A function (taking one argument) to use for 
    1525         reporting the on the progress. 
    1526     :type progress_callback: function 
    1527      
    1528     :rtype: :class:`Orange.core.SymMatrix` 
    1529      
    1530     """ 
    1531     matrix = orange.SymMatrix(len(data)) 
    1532     dist = distance_constructor(data) 
    1533      
    1534     iter_count = matrix.dim * (matrix.dim - 1) / 2 
    1535     milestones = progress_bar_milestones(iter_count, 100) 
    1536      
    1537     for count, ((i, ex1), (j, ex2)) in enumerate(_pairs(enumerate(data))): 
    1538         matrix[i, j] = dist(ex1, ex2) 
    1539         if progress_callback and count in milestones: 
    1540             progress_callback(100.0 * count / iter_count) 
    1541              
    1542     return matrix  
    1543  
     1510instance_distance_matrix = Orange.distance.distance_matrix 
    15441511 
    15451512def feature_distance_matrix(data, distance=None, progress_callback=None): 
  • Orange/distance/__init__.py

    r9725 r9753  
    11import Orange 
    2  
    3 #%s/ExamplesDistanceConstructor/DistanceConstructor/gc 
    4 #%s/ExamplesDistance_Normalized/DistanceNormalized/gc 
    5 #ExampleDistance -> Distance 
    6 #Hamming -> HammingDistance 
    7 #DTW -> DTWDistance 
    8 #Euclidean -> EuclideanDistance 
    9 #Manhattan -> ... 
    10 #Maximal -> ... 
    11 #Relief -> .. 
    12 #DTWConstructor 
    13 #EuclideanConstructor 
    14 #HammingConstructor 
    15 #ManhattanConstructor 
    16 #MaximalConstructor 
    17 #ReliefConstructor 
    18 #PearsonRConstructor -> PearsonR 
    19 #PearsonR -> PearsonRDistance 
    20 #SpearmanRConstructor -> SpearmanR 
    21 #SpearmanR -> SpearmanRDistance 
    22 #MahalanobisConstructor ->  Mahalanobis 
    23 #Mahalanobis -> MahalanobisDistance 
    242 
    253from Orange.core import \ 
     
    4220    ExamplesDistanceConstructor_Relief as Relief 
    4321 
     22<<<<<<< local 
    4423from Orange import statc 
     24======= 
     25from Orange.misc import progress_bar_milestones 
     26 
     27import statc 
     28>>>>>>> other 
    4529import numpy 
    4630from numpy import linalg 
     
    260244            return 1.0 
    261245     
    262      
    263 def distance_matrix(data, distance_constructor, progress_callback=None): 
    264     """ A helper function that computes an obj:`Orange.core.SymMatrix` of all 
     246def _pairs(seq, same = False): 
     247    """ Return all pairs from elements of `seq`. 
     248    """ 
     249    seq = list(seq) 
     250    same = 0 if same else 1 
     251    for i in range(len(seq)): 
     252        for j in range(i + same, len(seq)): 
     253            yield seq[i], seq[j] 
     254    
     255def distance_matrix(data, distance_constructor=Euclidean, progress_callback=None): 
     256    """ A helper function that computes an :obj:`Orange.data.SymMatrix` of all 
    265257    pairwise distances between instances in `data`. 
    266258     
     
    268260    :type data: :obj:`Orange.data.Table` 
    269261     
    270     :param distance_constructor: An DistanceConstructor instance. 
     262    :param distance_constructor: An DistanceConstructor instance (defaults to :obj:`Euclidean`). 
    271263    :type distance_constructor: :obj:`Orange.distances.DistanceConstructor` 
    272      
    273     """ 
    274     from Orange.misc import progressBarMilestones as progress_milestones 
    275     matrix = Orange.core.SymMatrix(len(data)) 
     264 
     265    :param progress_callback: A function (taking one argument) to use for 
     266        reporting the on the progress. 
     267    :type progress_callback: function 
     268     
     269    :rtype: :class:`Orange.data.SymMatrix` 
     270     
     271    """ 
     272    matrix = Orange.data.SymMatrix(len(data)) 
    276273    dist = distance_constructor(data) 
    277      
    278     msize = len(data)*(len(data) - 1)/2 
    279     milestones = progress_milestones(msize, 100) 
    280     count = 0 
    281     for i in range(len(data)): 
    282         for j in range(i + 1, len(data)): 
    283             matrix[i, j] = dist(data[i], data[j]) 
     274 
     275    iter_count = matrix.dim * (matrix.dim - 1) / 2 
     276    milestones = progress_bar_milestones(iter_count, 100) 
     277     
     278    for count, ((i, ex1), (j, ex2)) in enumerate(_pairs(enumerate(data))): 
     279        matrix[i, j] = dist(ex1, ex2) 
     280        if progress_callback and count in milestones: 
     281            progress_callback(100.0 * count / iter_count) 
    284282             
    285             if progress_callback and count in milestones: 
    286                 progress_callback(100.0 * count / msize) 
    287             count += 1 
    288              
    289     return matrix 
     283    return matrix  
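
The rewritten loop above visits every unordered pair of rows exactly once through `_pairs` and reports progress only at precomputed milestone counts. The following is a minimal standalone sketch of that pattern in plain Python 3, not the Orange implementation. It assumes a nested list in place of `Orange.data.SymMatrix`, a plain `euclidean` function in place of a distance constructor, and a hypothetical `milestones` helper that only approximates `progress_bar_milestones`, whose real code is not part of this changeset.

```python
import math


def pairs(seq, same=False):
    """Yield all unordered pairs from seq; also yield (x, x) when same is True."""
    seq = list(seq)
    offset = 0 if same else 1
    for i in range(len(seq)):
        for j in range(i + offset, len(seq)):
            yield seq[i], seq[j]


def milestones(iter_count, steps=100):
    # Hypothetical stand-in for progress_bar_milestones: the iteration
    # counts at which a progress report should be emitted.
    return set(int(k * iter_count / steps) for k in range(steps))


def euclidean(a, b):
    # Plain Euclidean distance between two equal-length numeric rows.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(data, distance=euclidean, progress_callback=None):
    """Return a symmetric nested list of pairwise distances between rows."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    iter_count = n * (n - 1) // 2
    marks = milestones(iter_count)
    for count, ((i, a), (j, b)) in enumerate(pairs(enumerate(data))):
        # Fill both halves, since a nested list does not mirror entries itself.
        matrix[i][j] = matrix[j][i] = distance(a, b)
        if progress_callback and count in marks:
            progress_callback(100.0 * count / iter_count)
    return matrix
```

Calling `distance_matrix(rows, progress_callback=print)` on a small list of numeric rows prints percentages as the pairs are processed. With fewer than two rows the loop body never runs, so the division by `iter_count` is never reached.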
  • docs/reference/rst/Orange.distance.rst

    r9720 r9752  
    11.. py:currentmodule:: Orange.distance 
    2  
    3 .. automodule:: Orange.distance 
    42 
    53########################################## 
     
    2422between two unknown values is always 0.5. 
    2523 
     24.. autofunction:: distance_matrix 
     25 
    2626.. class:: Distance 
    2727 
     
    3434    .. method:: __call__([instances, weightID][, distributions][, basic_var_stat]) 
    3535 
    36         Constructs an :obj:`Distance`.  Not all the data needs to be 
    37         given. Most measures can be constructed from basic_var_stat; 
    38         if it is not given, they can help themselves either by instances 
    39         or distributions. Some do not need any arguments. 
     36        Constructs an :obj:`Distance`. Not all arguments are required. 
     37        Most measures can be constructed from basic_var_stat; if it is 
     38        not given, instances or distributions can be used. 
    4039 
    4140.. class:: DistanceNormalized 
    4241 
    43     This abstract class provides a function which is given two instances 
    44     and returns a list of normalized distances between values of their 
    45     features. Many distance measuring classes need such a function and are 
    46     therefore derived from this class 
     42    An abstract class that provides normalization. 
    4743 
    4844    .. attribute:: normalizers 
    4945 
    50         A precomputed list of normalizing factors for feature values 
     46        A precomputed list of normalizing factors for feature values. They are: 
    5147 
    52         - If a factor positive, differences in feature's values 
    53           are multiplied by it; for continuous features the factor 
    54           would be 1/(max_value-min_value) and for ordinal features 
    55           the factor is 1/number_of_values. If either (or both) of 
    56           features are unknown, the distance is 0.5 
    57         - If a factor is -1, the feature is nominal; the distance 
    58           between two values is 0 if they are same (or at least 
    59           one is unknown) and 1 if they are different. 
    60         - If a factor is 0, the feature is ignored. 
     48        - 1/(max_value-min_value) for continuous and 1/number_of_values 
     49          for ordinal features. 
     50          If either feature is unknown, the distance is 0.5. Such factors 
     51          are used to multiply differences in feature's values. 
     52        - ``-1`` for nominal features; the distance 
     53          between two values is 0 if they are same (or at least one is 
     54          unknown) and 1 if they are different. 
     55        - ``0`` for ignored features. 
    6156 
    6257    .. attribute:: bases, averages, variances 
    6358 
    6459        The minimal values, averages and variances 
    65         (continuous features only) 
     60        (continuous features only). 
    6661 
    6762    .. attribute:: domain_version 
    6863 
    69         The domain version increases each time a domain description is 
    70         changed (i.e. features are added or removed); this checks  
    71         that the user is not attempting to measure distances between 
    72         instances that do not correspond to normalizers. 
     64        The domain version changes each time a domain description is 
     65        changed (i.e. features are added or removed). 
    7366 
    74     .. method:: attribute_distances(instance1, instance2) 
     67    .. method:: feature_distances(instance1, instance2) 
    7568 
    76         Return a list of floats representing distances between pairs of 
    77         feature values of the two instances. 
     69        Return a list of floats representing normalized distances between 
     70        pairs of feature values of the two instances. 
    7871 
    79 .. class:: HammingConstructor 
    8072.. class:: Hamming 
     73.. class:: HammingDistance 
    8174 
    82     Hamming distance between two instances is defined as the number of 
    83     features in which the two instances differ. Note that this measure 
    84     is not really appropriate for instances that contain continuous features. 
     75    The number of features in which the two instances differ. This measure 
     76    is not appropriate for instances that contain continuous features. 
    8577 
    86 .. class:: MaximalConstructor 
    8778.. class:: Maximal 
     79.. class:: MaximalDistance 
    8880 
    89     The maximal between two instances is defined as the maximal distance 
     81    The maximal distance 
    9082    between two feature values. If dist is the result of 
    91     DistanceNormalized.attribute_distances, 
    92     then Maximal returns max(dist). 
     83    ~:obj:`DistanceNormalized.feature_distances`, 
     84    then :class:`Maximal` returns ``max(dist)``. 
    9385 
    94 .. class:: ManhattanConstructor 
    9586.. class:: Manhattan 
     87.. class:: ManhattanDistance 
    9688 
    97     Manhattan distance between two instances is a sum of absolute values 
     89    The sum of absolute values 
    9890    of distances between pairs of features, e.g. ``sum(abs(x) for x in dist)`` 
    99     where dist is the result of ExamplesDistance_Normalized.attributeDistances. 
     91    where dist is the result of ~:obj:`DistanceNormalized.feature_distances`. 
    10092 
    101 .. class:: EuclideanConstructor 
    10293.. class:: Euclidean 
     94.. class:: EuclideanDistance 
    10395 
    104     Euclidean distance is a square root of sum of squared per-feature distances, 
     96    The square root of sum of squared per-feature distances, 
    10597    i.e. ``sqrt(sum(x*x for x in dist))``, where dist is the result of 
    106     ExamplesDistance_Normalized.attributeDistances. 
     98    ~:obj:`DistanceNormalized.feature_distances`. 
    10799 
    108100    .. method:: distributions 
    109101 
    110         An object of type 
    111         :obj:`~Orange.statistics.distribution.Distribution` that holds 
     102        A :obj:`~Orange.statistics.distribution.Distribution` containing 
    112103        the distributions for all discrete features used for 
    113104        computation of distances between known and unknown values. 
    114105 
    115     .. method:: bothSpecialDist 
     106    .. method:: both_special_dist 
    116107 
    117108        A list containing the distance between two unknown values for each 
    118109        discrete feature. 
    119110 
    120     This measure of distance deals with unknown values by computing the 
    121     expected square of distance based on the distribution obtained from the 
     111    Unknown values are handled by computing the 
     112    expected square of distance based on the distribution from the 
    122113    "training" data. Squared distance between 
    123114 
    124         - A known and unknown continuous attribute equals squared distance 
    125           between the known and the average, plus variance 
    126         - Two unknown continuous attributes equals double variance 
    127         - A known and unknown discrete attribute equals the probability 
    128           that the unknown attribute has different value than the known 
    129           (i.e., 1 - probability of the known value) 
    130         - Two unknown discrete attributes equals the probability that two 
     115        - A known and unknown continuous feature equals squared distance 
     116          between the known and the average, plus variance. 
     117        - Two unknown continuous features equals double variance. 
     118        - A known and unknown discrete feature equals the probability 
     119          that the unknown feature has different value than the known 
     120          (i.e., 1 - probability of the known value). 
     121        - Two unknown discrete features equals the probability that two 
    131122          random chosen values are equal, which can be computed as 
    132123          1 - sum of squares of probabilities. 
    133124 
    134     Continuous cases can be handled by averages and variances inherited from 
    135     ExamplesDistance_normalized. The data for discrete cases are stored in 
    136     distributions (used for unknown vs. known value) and in bothSpecial 
    137     (the precomputed distance between two unknown values). 
     125    Continuous cases are handled as inherited from 
     126    :class:`DistanceNormalized`. The data for discrete cases are 
     127    stored in distributions (used for unknown vs. known value) and 
     128    in :obj:`both_special_dist` (the precomputed distance between two 
     129    unknown values). 
    138130 
    139 .. class:: ReliefConstructor 
    140131.. class:: Relief 
     132.. class:: ReliefDistance 
    141133 
    142     Relief is similar to Manhattan distance, but incorporates a more 
    143     correct treatment of undefined values, which is used by ReliefF measure. 
     134    Relief is similar to Manhattan distance, but incorporates the 
     135    treatment of undefined values, which is used by ReliefF measure. 
    144136 
    145 This class is derived directly from ExamplesDistance, not from ExamplesDistance_Normalized. 
     137    This class is derived directly from :obj:`Distance`. 
    146138 
    147139 
     
    149141    :members: 
    150142 
    151 .. autoclass:: SpearmanR 
     143.. autoclass:: PearsonRDistance 
    152144    :members: 
    153145 
    154 .. autoclass:: PearsonRConstructor 
     146.. autoclass:: SpearmanR 
    155147    :members: 
    156148 
    157149.. autoclass:: SpearmanRConstructor 
    158150    :members: 
     151 
     152 
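
The rules for unknown values in the `Euclidean` section above can be sketched per feature as follows. This is an illustrative reading of the documented rules only, not Orange's code: the function names are hypothetical, `None` stands for an unknown value, and normalization of continuous values is left out. Note that `1 - sum of squares of probabilities` is the probability that two randomly chosen values differ.

```python
def expected_sq_distance_continuous(x, y, average, variance):
    """Expected squared distance for one continuous feature (None = unknown)."""
    if x is None and y is None:
        # Two unknown values: double variance.
        return 2.0 * variance
    if x is None or y is None:
        # One unknown value: squared distance to the average, plus variance.
        known = y if x is None else x
        return (known - average) ** 2 + variance
    return (x - y) ** 2


def expected_sq_distance_discrete(x, y, probabilities):
    """Expected squared distance for one discrete feature (None = unknown).

    probabilities maps each value to its relative frequency in the
    "training" data.
    """
    if x is None and y is None:
        # Two unknown values: 1 - sum of squares of probabilities.
        return 1.0 - sum(p * p for p in probabilities.values())
    if x is None or y is None:
        # One unknown value: 1 - probability of the known value.
        known = y if x is None else x
        return 1.0 - probabilities[known]
    return 0.0 if x == y else 1.0
```

A full distance would sum these per-feature terms and take the square root, as in ``sqrt(sum(x*x for x in dist))``.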
  • source/orange/_aliases.txt

    r9651 r9752  
    6666addmetas add_metas 
    6767removemeta remove_meta 
     68 
     69ExamplesDistance_Normalized 
     70feature_distances attribute_distances 