Changeset 10066:f2e874c04356 in orange


Ignore:
Timestamp:
02/08/12 11:21:52 (2 years ago)
Author:
markotoplak
Branch:
default
Children:
10067:ade7a1b97bd0, 10076:6ced46705b1d
rebase_source:
b4466525d3993c507c678c27ddf3a8499b90b76f
Message:

Work on HierarchicalClustering.

Files:
2 edited

Legend:

Unmodified
Added
Removed
  • Orange/clustering/hierarchical.py

    r10046 r10066  
    1212case O(n3)).   
    1313 
    14 The distance should contain no negative elements. This limitation is due 
    15 to implementation details of the algorithm and helpt the algorithm to 
    16 run a bit faste. The elements on the diagonal (representing the element's 
    17 distance from itself) are ignored. 
     14The distance matrix has to contain no negative elements, as this helps 
     15the algorithm to run faster. The elements on the diagonal are ignored. 
    1816 
    1917Basic functionality 
     
    6260 
    6361 
    64 .. class:: HierarchicalCluster 
    65  
    66     Represents a node in the clustering tree, as returned by 
    67     :obj:`HierarchicalClustering` 
    68  
    69     .. attribute:: branches 
    70      
    71         A list of sub-clusters (:class:`HierarchicalCluster` instances). If this 
    72         is a leaf node this attribute is `None` 
    73          
    74     .. attribute:: left 
    75      
    76         The left sub-cluster (defined only if there are only two branches). 
    77         Same as ``branches[0]``. 
    78          
    79     .. attribute:: right 
    80      
    81         The right sub-cluster (defined only if there are only two branches). 
    82         Same as ``branches[1]``. 
    83          
    84     .. attribute:: height 
    85      
    86         Height of the cluster (distance between the sub-clusters). 
    87          
    88     .. attribute:: mapping 
    89      
    90         A list of indices to the original distance matrix. It is the same 
    91         for all clusters in the hierarchy - it simply represents the indices 
    92         ordered according to the clustering. 
    93      
    94     .. attribute:: mapping.objects 
    95  
    96         A sequence describing objects - an :obj:`Orange.data.Table`, a 
    97         list of instance, a list of features (when clustering features), 
    98         or even a string of the same length as the number of elements. 
    99  
    100     .. attribute:: first 
    101     .. attribute:: last 
    102      
    103         ``first`` and ``last`` are indices into the elements of ``mapping`` that 
    104         belong to that cluster. 
    105  
    106     .. method:: __len__() 
    107      
    108         Asking for the length of the cluster gives the number of the objects 
    109         belonging to it. This equals ``last - first``. 
    110      
    111     .. method:: __getitem__(index) 
    112      
    113         By indexing the cluster we address its elements; these are either  
    114         indices or objects (you'll understand this after seeing the examples). 
    115         For instance cluster[2] gives the third element of the cluster, and 
    116         list(cluster) will return the cluster elements as a list. The cluster 
    117         elements are read-only. To actually modify them, you'll have to go 
    118         through mapping, as described below. This is intentionally complicated 
    119         to discourage a naive user from doing what he does not understand. 
    120      
    121     .. method:: swap() 
    122      
    123         Swaps the ``left`` and the ``right`` subcluster; obviously this will 
    124         report an error when the cluster has more than two subclusters. This  
    125         function changes the mapping and first and last of all clusters below 
    126         this one and thus needs O(len(cluster)) time. 
    127          
    128     .. method:: permute(permutation) 
    129      
    130         Permutes the subclusters. Permutation gives the order in which the 
    131         subclusters will be arranged. As for swap, this function changes the 
    132         mapping and first and last of all clusters below this one.  
    133      
    13462     
    13563Example 1 - Toy matrix 
     
    232160---------------------------------- 
    233161 
    234 The most common things to cluster are certainly examples. To show how to 
    235 this is done, we shall now load the Iris data set, initialize a distance 
    236 matrix with the distances measure by :class:`Euclidean` 
     162Clustering of examples is shown on the Iris data set, initialize a distance 
     163matrix with the :class:`Euclidean` distance measure 
    237164and cluster it with average linkage. Since we don't need the matrix, 
    238 we shall let the clustering overwrite it (not that it's needed for 
    239 such a small data set as Iris). 
     165we shall let the clustering overwrite it. 
    240166 
    241167.. literalinclude:: code/hierarchical-example-2.py 
    242168    :lines: 1-15 
    243169 
    244 Note that we haven't forgotten to set the ``matrix.objects``. We did it 
    245 through ``matrix.setattr`` to avoid the warning. Let us now prune the 
    246 clustering using the function we've written above, and print out the 
    247 clusters. 
    248      
     170Let us now prune the clustering using the function we've written above, 
     171and print out the clusters. 
     172 
    249173.. literalinclude:: code/hierarchical-example-2.py 
    250174    :lines: 16-20 
     
    281205However, instead of list of lists, it will return a list of tables. 
    282206 
    283 Exploring hierarchical clusters 
    284 --------------------------------------------------------- 
    285  
     207Cluster analysis 
     208----------------- 
     209 
     210.. autofunction:: cluster_to_list 
     211.. autofunction:: top_clusters 
     212.. autofunction:: top_cluster_membership 
     213.. autofunction:: order_leaves 
     214.. autofunction:: postorder 
     215.. autofunction:: preorder 
     216.. autofunction:: prune 
     217.. autofunction:: pruned 
     218.. autofunction:: clone 
     219.. autofunction:: cluster_depths 
     220.. autofunction:: cophenetic_distances 
     221.. autofunction:: cophenetic_correlation 
     222.. autofunction:: joining_cluster 
     223 
     224 
     225HierarchicalCluster hierarchy 
     226----------------------------- 
     227 
     228Results of clustering are stored in a hierarchy of 
     229:obj:`HierarchicalCluster` objects. 
     230 
     231.. class:: HierarchicalCluster 
     232 
     233    A node in the clustering tree, as returned by 
     234    :obj:`HierarchicalClustering`. 
     235 
     236    .. attribute:: branches 
     237     
     238        A list of sub-clusters (:class:`HierarchicalCluster` instances). If this 
     239        is a leaf node this attribute is `None` 
     240         
     241    .. attribute:: left 
     242     
     243        The left sub-cluster (defined only if there are only two branches). 
     244        Same as ``branches[0]``. 
     245         
     246    .. attribute:: right 
     247     
     248        The right sub-cluster (defined only if there are only two branches). 
     249        Same as ``branches[1]``. 
     250         
     251    .. attribute:: height 
     252     
     253        Height of the cluster (distance between the sub-clusters). 
     254         
     255    .. attribute:: mapping 
     256     
     257        A list of indices to the original distance matrix. It is the same 
     258        for all clusters in the hierarchy - it simply represents the indices 
     259        ordered according to the clustering. 
     260     
     261    .. attribute:: mapping.objects 
     262 
     263        A sequence describing objects - an :obj:`Orange.data.Table`, a 
     264        list of instance, a list of features (when clustering features), 
     265        or even a string of the same length as the number of elements. 
     266 
     267    .. attribute:: first 
     268    .. attribute:: last 
     269     
     270        ``first`` and ``last`` are indices into the elements of ``mapping`` that 
     271        belong to that cluster. 
     272 
     273    .. method:: __len__() 
     274     
     275        Asking for the length of the cluster gives the number of the objects 
     276        belonging to it. This equals ``last - first``. 
     277     
     278    .. method:: __getitem__(index) 
     279     
     280        By indexing the cluster we address its elements; these are either  
     281        indices or objects. 
     282        For instance cluster[2] gives the third element of the cluster, and 
     283        list(cluster) will return the cluster elements as a list. The cluster 
     284        elements are read-only. 
     285     
     286    .. method:: swap() 
     287     
     288        Swaps the ``left`` and the ``right`` subcluster; it will 
     289        report an error when the cluster has more than two subclusters. This  
     290        function changes the mapping and first and last of all clusters below 
     291        this one and thus needs O(len(cluster)) time. 
     292         
     293    .. method:: permute(permutation) 
     294     
     295        Permutes the subclusters. Permutation gives the order in which the 
     296        subclusters will be arranged. As for swap, this function changes the 
     297        mapping and first and last of all clusters below this one.  
     298     
    286299To demonstrate how the data in clusters is stored, we shall continue with 
    287300the clustering we got in the first example. :: 
     
    310323``cluster.mapping[cluster.first:cluster.last]``. ``cluster[i]`` therefore 
    311324returns ``cluster.mapping[cluster.first+i]`` or, if objects are specified, 
    312 ``cluster.objects[cluster.mapping[cluster.first+i]]``. Space consumption 
    313 is minimal since all clusters share the same objects ``mapping`` and 
    314 ``objects``. 
     325``cluster.objects[cluster.mapping[cluster.first+i]]``. 
    315326 
    316327Subclusters are ordered so that ``cluster.left.last`` always equals 
     
    325336is moved by the same offset. 
    326337 
    327 The hierarchy of objects that represent a clustering is open, everything is 
    328 accessible from Python. You can write your own clustering algorithms that 
    329 build this same structure, or you can use Orange's clustering and then do to 
    330 the structure anything you want. For instance prune it, as we have shown 
    331 earlier. However, it is easy to do things wrong: shuffle the mapping, for 
    332 instance, and forget to adjust the ``first`` and ``last`` pointers. Orange 
    333 does some checking for the internal consistency, but you are surely smarter 
    334 and can find a way to crash it. For instance, just create a cycle in the 
    335 structure, call ``swap`` for some cluster above the cycle and you're there. 
    336 But don't blame it on me then. 
     338When building the structure of :obj:`HierarchicalCluster`\s yourself. 
     339However, it is easy to do things wrong: shuffle the mapping, for 
     340instance, and forget to adjust the ``first`` and ``last`` pointers. 
     341 
     342Output 
     343-------------- 
     344 
     345.. autofunction:: dendrogram_layout 
     346.. autofunction:: dendrogram_draw 
    337347 
    338348 
     
    341351 
    342352.. autofunction:: clustering_features 
    343 .. autofunction:: cluster_to_list 
    344 .. autofunction:: top_clusters 
    345 .. autofunction:: top_cluster_membership 
    346 .. autofunction:: order_leaves 
    347  
    348 .. autofunction:: postorder 
    349 .. autofunction:: preorder 
    350 .. autofunction:: dendrogram_layout 
    351 .. autofunction:: dendrogram_draw 
    352 .. autofunction:: clone 
    353 .. autofunction:: prune 
    354 .. autofunction:: pruned 
    355 .. autofunction:: cluster_depths 
    356353.. autofunction:: feature_distance_matrix 
    357 .. autofunction:: joining_cluster 
    358 .. autofunction:: cophenetic_distances 
    359 .. autofunction:: cophenetic_correlation 
    360354 
    361355""" 
     
    438432        reporting the on the progress. 
    439433    :type progress_callback: function 
    440      
     434 
    441435    :rtype: :class:`HierarchicalCluster` 
    442436     
  • docs/reference/rst/code/hierarchical-example-2.py

    r10008 r10066  
    33iris = Orange.data.Table("iris") 
    44matrix = Orange.misc.SymMatrix(len(iris)) 
    5 matrix.setattr("objects", iris) 
    65distance = Orange.distance.Euclidean(iris) 
    76for i1, instance1 in enumerate(iris): 
     
    1312clustering.overwrite_matrix = 1 
    1413root = clustering(matrix) 
     14root.mapping.objects = iris 
    1515 
    1616def prune(cluster, togo): 
Note: See TracChangeset for help on using the changeset viewer.