# Changeset 10106:643f0754d673 in orange

Timestamp:
02/08/12 18:40:14
Branch:
default
rebase_source:
a51d842a2c0ecf4c862a05d451fffa52d98e44bf
Message:

Hierarchical clustering documentation updates. New documentation scripts. Fixed a showcase "pruned".

Files:
1 added
3 edited

## Orange/clustering/hierarchical.py

r10066

The distance matrix must contain no negative elements, as this helps the algorithm to run faster. The elements on the diagonal are ignored.

.. rubric:: Example

The following example shows clustering of the Iris data: the distance matrix is computed with the :class:`Orange.distance.Euclidean` distance measure and clustered with average linkage.

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 1-12

Data instances belonging to the top-most four clusters can be printed out with:

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 14-19

It could be more informative to print out the class distributions for each cluster.

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 21-26

Here is the output. ::

    Iris-setosa:   0  Iris-versicolor:  50  Iris-virginica:  17
    Iris-setosa:  49  Iris-versicolor:   0  Iris-virginica:   0
    Iris-setosa:   0  Iris-versicolor:   0  Iris-virginica:  33
    Iris-setosa:   1  Iris-versicolor:   0  Iris-virginica:   0

The same results could also be obtained with:

.. literalinclude:: code/hierarchical-example-3.py
   :lines: 1-7

Basic functionality
-------------------

Example 1 - Toy matrix
----------------------

Let us construct a simple distance matrix and run clustering on it.

.. literalinclude:: code/hierarchical-example.py
   :lines: 1-14

``root`` is the root of the cluster hierarchy. We can print it using a simple recursive function.

.. literalinclude:: code/hierarchical-example.py
   :lines: 16-20

Our clustering, printed by calling ``printClustering(root)``, looks like ::

    (((04)((57)(89)))((1(26))3))

The elements are separated into two groups, the first containing elements 0, 4, 5, 7, 8, 9, and the second 1, 2, 6, 3. The difference between them equals ``root.height``, 9.0 in our case. The first cluster is further divided into 0 and 4 in one subcluster, and 5, 7, 8, 9 in the other.

It is easy to print out the cluster's objects. Here's what's in the left subcluster of root. ::

    >>> for el in root.left:
    ...     print el,
    0 4 5 7 8 9

Let us write a better function for printing out the clustering: instead of printing out the first (and supposedly the only) element of a cluster, ``cluster[0]``, we shall print it out as a tuple.

.. literalinclude:: code/hierarchical-example.py
   :lines: 22-26

We can add object descriptions into the clustering with

.. literalinclude:: code/hierarchical-example.py
   :lines: 31

As before, let us print out the elements of the first left cluster::

    >>> for el in root.left:
    ...     print el,
    Ann Eve Fred Hue Ivy Jon

If objects are given, the cluster's elements, as obtained by indexing (e.g. ``root.left[2]``) or by iteration, as in the above case, are not indices but the elements we clustered. If we put an ``ExampleTable`` into ``objects``, ``root.left[-1]`` will be the last example of the first left cluster.

Now for swapping and permutations. ::

    >>> printClustering(root)
    ((('Ann''Eve')(('Fred''Hue')('Ivy''Jon')))(('Bob'('Curt''Greg'))'Danny'))
    >>> root.left.swap()
    >>> printClustering(root)
    (((('Fred''Hue')('Ivy''Jon'))('Ann''Eve'))(('Bob'('Curt''Greg'))'Danny'))
    >>> root.permute([1, 0])
    >>> printClustering(root)
    ((('Bob'('Curt''Greg'))'Danny')((('Fred''Hue')('Ivy''Jon'))('Ann''Eve')))

Calling ``root.left.swap`` reversed the order of subclusters of ``root.left``, and ``root.permute([1, 0])`` (which is equivalent to ``root.swap`` - there aren't many possible permutations of two elements) reverses the order of ``root.left`` and ``root.right``.

Let us write a function for cluster pruning.

.. literalinclude:: code/hierarchical-example.py
   :lines: 33-39

We shall use ``printClustering2`` here, since we can have multiple elements in a leaf of the clustering hierarchy. ::

    >>> prune(root, 9)
    >>> print printClustering2(root)
    ((('Bob', 'Curt', 'Greg')('Danny',))(('Fred', 'Hue', 'Ivy', 'Jon')('Ann', 'Eve')))

We've ended up with four clusters. Need a list of clusters? Here's the function.

.. literalinclude:: code/hierarchical-example.py
   :lines: 41-51

The function returns a list of lists, in our case ``[['Bob', 'Curt', 'Greg'], ['Danny'], ['Fred', 'Hue', 'Ivy', 'Jon'], ['Ann', 'Eve']]``. If there were no ``objects``, the list would contain indices instead of names.

Example 2 - Clustering of examples
----------------------------------

Clustering of examples is shown on the Iris data set: we initialize a distance matrix with the :class:`Euclidean` distance measure and cluster it with average linkage. Since we don't need the matrix afterwards, we let the clustering overwrite it.

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 1-15

Let us now prune the clustering using the function we've written above, and print out the clusters.

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 16-20

Since the printout is pretty long, it might be more informative to just print out the class distributions for each cluster.

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 22-26

Here's what it shows. ::

    Iris-setosa:  49    Iris-versicolor:   0    Iris-virginica:   0
    Iris-setosa:   1    Iris-versicolor:   0    Iris-virginica:   0
    Iris-setosa:   0    Iris-versicolor:  50    Iris-virginica:  17
    Iris-setosa:   0    Iris-versicolor:   0    Iris-virginica:  33

Note something else: ``listOfClusters`` does not return a list of :class:`Orange.data.Table`, but a list of lists of instances. Therefore, in the above script, ``cluster`` is a list of examples, not a ``Table``, but it gets converted to one automatically when the function is called. Most Orange functions will do this for you. You can, for instance, call a learning algorithm, passing a cluster as an argument; it won't mind. If you, however, want a list of tables, you can easily convert the list by

.. literalinclude:: code/hierarchical-example-2.py
   :lines: 28

Finally, if you are dealing with examples, you may want to take the function ``listOfClusters`` and replace ``alist.append(list(cluster))`` with ``alist.append(Orange.data.Table(cluster))``. This function is less general: it will fail if the objects are not of type :class:`Orange.data.Instance`. However, instead of a list of lists, it will return a list of tables.

Cluster analysis
----------------

``objects`` can be a list of instances, a list of features (when clustering features), or even a string of the same length as the number of elements. If objects are given, the cluster's elements, as obtained by indexing or iteration, are not indices but the corresponding objects. If we put an :obj:`Orange.data.Table` into ``objects``, ``root.left[-1]`` is the last instance of the first left cluster.

``permute`` sets the order in which the subclusters will be arranged. As with ``swap``, this function changes the ``mapping`` and the ``first`` and ``last`` of all clusters below this one.

To demonstrate how the data in clusters is stored, we shall continue with the clustering we got in the first example. ::

    >>> del root.mapping.objects
    >>> print printClustering(root)
    (((1(26))3)(((57)(89))(04)))
    >>> print root.mapping
    <1, 2, 6, 3, 5, 7, 8, 9, 0, 4>
    >>> print root.left.first
    0
    >>> print root.left.last
    4
    >>> print root.left.mapping[root.left.first:root.left.last]
    <1, 2, 6, 3>
    >>> print root.left.left.first
    0
    >>> print root.left.left.last
    3

We removed ``objects`` just to see more clearly what is going on. ``mapping`` is an ordered list of indices into the rows/columns of the distance matrix (and, at the same time, indices into ``objects``, if they exist). Each cluster's fields ``first`` and ``last`` are indices into ``mapping``, so the cluster's elements are actually ``cluster.mapping[cluster.first:cluster.last]``. ``cluster[i]`` therefore returns ``cluster.mapping[cluster.first+i]`` or, if objects are specified, ``cluster.objects[cluster.mapping[cluster.first+i]]``.
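The indexing contract described above can be sketched in plain Python. The ``Cluster`` class below is a hypothetical stand-in that mimics the ``mapping``/``first``/``last`` bookkeeping, not Orange's actual implementation.

```python
class Cluster:
    """Toy model of a hierarchical cluster's element storage."""

    def __init__(self, mapping, first, last, objects=None):
        self.mapping = mapping    # shared, ordered list of matrix row indices
        self.first = first        # slice bounds into mapping
        self.last = last
        self.objects = objects    # optional clustered objects

    def __getitem__(self, i):
        # cluster[i] -> mapping[first + i], or the corresponding object
        idx = self.mapping[self.first + i]
        return self.objects[idx] if self.objects is not None else idx

mapping = [1, 2, 6, 3, 5, 7, 8, 9, 0, 4]
left = Cluster(mapping, first=0, last=4)
print(left[0], left[3])   # raw indices into the distance matrix: 1 3

names = ["Ann", "Bob", "Curt", "Danny", "Eve",
         "Fred", "Greg", "Hue", "Ivy", "Jon"]
left.objects = names
print(left[0])            # mapping[0] == 1 -> names[1] == "Bob"
```

With ``objects`` set, iteration and indexing transparently yield the clustered objects instead of matrix indices, exactly as described for ``root.left[-1]`` above.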
Subclusters are ordered so that ``cluster.branches[i].last`` always equals ``cluster.branches[i+1].first``. Swapping and permutation do three things: change the order of elements in ``branches``, permute the corresponding regions in :obj:`~HierarchicalCluster.mapping`, and adjust the ``first`` and ``last`` for all the clusters below. For the latter, when subclusters of a cluster are permuted, the entire subtree starting at ``cluster.branches[i]`` is moved by the same offset.

It is possible to build the structure of :obj:`HierarchicalCluster`\s yourself. However, it is easy to do things wrong: shuffle the ``mapping``, for instance, and forget to adjust the ``first`` and ``last`` pointers for all the clusters below.

.. rubric:: An example

The following example constructs a simple distance matrix and runs clustering on it.

.. literalinclude:: code/hierarchical-example.py
   :lines: 1-16

``root`` is the root of the cluster hierarchy. We can print it with a simple recursive function.

.. literalinclude:: code/hierarchical-example.py
   :lines: 18-22

The clustering looks like ::

    >>> print print_clustering(root)
    (((0 4) ((5 7) (8 9))) ((1 (2 6)) 3))

The elements form two groups, the first with elements 0, 4, 5, 7, 8, 9, and the second with 1, 2, 6, 3. The difference between them equals ::

    >>> print root.height
    9.0

The first cluster is further divided into 0 and 4 in one subcluster, and 5, 7, 8, 9 in the other. The following code prints the left subcluster of root. ::

    >>> for el in root.left:
    ...     print el,
    0 4 5 7 8 9

Instead of printing out the first (and supposedly the only) element of a cluster, ``cluster[0]``, we shall print it out as a tuple.

.. literalinclude:: code/hierarchical-example.py
   :lines: 24-28

Object descriptions can be added with

.. literalinclude:: code/hierarchical-example.py
   :lines: 28-29

As before, let us print out the elements of the first left cluster::

    >>> for el in root.left:
    ...     print el,
    Ann Eve Fred Hue Ivy Jon

Calling ``root.left.swap`` reverses the order of subclusters of ``root.left``::

    >>> print_clustering(root)
    (((Ann Eve) ((Fred Hue) (Ivy Jon))) ((Bob (Curt Greg)) Danny))
    >>> root.left.swap()
    >>> print_clustering(root)
    ((((Fred Hue) (Ivy Jon)) (Ann Eve)) ((Bob (Curt Greg)) Danny))

Let us write a function for cluster pruning.

.. literalinclude:: code/hierarchical-example.py
   :lines: 42-48

Here we need a function that can print leaves with multiple elements.

.. literalinclude:: code/hierarchical-example.py
   :lines: 50-54

Four clusters remain. ::

    >>> prune(root, 5)
    >>> print print_clustering2(root)
    (('Bob', 'Curt', 'Greg', 'Danny') ((('Fred', 'Hue') ('Ivy', 'Jon')) ('Ann', 'Eve')))

The following function returns a list of lists.

.. literalinclude:: code/hierarchical-example.py
   :lines: 59-69

In our case ::

    >>> list_of_clusters(root)
    [['Bob', 'Curt', 'Greg'], ['Danny'], ['Fred', 'Hue', 'Ivy', 'Jon'], ['Ann', 'Eve']]

If :obj:`~HierarchicalCluster.mapping.objects` were not defined, the list would contain indices instead of names. ::

    >>> root.mapping.objects = None
    >>> print list_of_clusters(root)
    [[1, 2, 6], [3], [5, 7, 8, 9], [0, 4]]

Output
------

.. autofunction:: dendrogram_layout

.. autofunction:: dendrogram_draw

Utility Functions
-----------------

:type progress_callback: function

.. note:: The ordering is done inplace.

A class for drawing dendrograms.

.. note:: The ``dendrogram_draw`` function is a more convenient interface to the functionality provided by this class.

.. function:: pruned(cluster, level=None, height=None, condition=None)

    Return a new pruned clustering instance.
    It uses :obj:`clone` to create a copy of the `cluster` instance.

    :param cluster: Cluster to prune.
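The relationship between destructive pruning and a non-destructive ``pruned`` can be sketched without Orange: deep-copy the hierarchy, then prune the copy by height. The ``Cluster`` class and ``prune`` helper here are toy stand-ins, not Orange's API.

```python
import copy

class Cluster:
    """Toy hierarchical cluster: a join height and optional subclusters."""

    def __init__(self, height=0.0, branches=None):
        self.height = height
        self.branches = branches

def prune(cluster, height):
    """Destructively collapse all subclusters joined below `height`."""
    if cluster.branches:
        if cluster.height < height:
            cluster.branches = None   # collapse into a leaf
        else:
            for branch in cluster.branches:
                prune(branch, height)

def pruned(cluster, height):
    """Return a pruned copy; the original hierarchy stays untouched."""
    clone = copy.deepcopy(cluster)
    prune(clone, height)
    return clone

# ((leaf leaf) leaf): the inner join is at height 2.0, the root at 9.0.
inner = Cluster(2.0, [Cluster(), Cluster()])
root = Cluster(9.0, [inner, Cluster()])

new_root = pruned(root, 5)
print(inner.branches is not None)              # True: original intact
print(new_root.branches[0].branches is None)   # True: copy was pruned
```

The deep copy stands in for the :obj:`clone` step mentioned above; the cut itself is the same height test used by the in-place ``prune``.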
## docs/reference/rst/code/hierarchical-example-2.py

r10066

    import Orange

    iris = Orange.data.Table("iris")
    matrix = Orange.distance.distance_matrix(iris, Orange.distance.Euclidean)

    clustering = Orange.clustering.hierarchical.HierarchicalClustering()
    clustering.linkage = clustering.Average
    clustering.overwrite_matrix = 1
    root = clustering(matrix)
    root.mapping.objects = iris

    topmost = Orange.clustering.hierarchical.top_clusters(root, 4)

    for n, cluster in enumerate(topmost):
        print "\n\n Cluster %i \n" % n
        for instance in cluster:
            print instance

    for cluster in topmost:
        dist = Orange.statistics.distribution.Distribution(iris.domain.class_var,
            [ex for ex in cluster])
        for e, d in enumerate(dist):
            print "%s: %3.0f " % (iris.domain.class_var.values[e], d),
        print
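This changeset replaces a manual pairwise loop with a single ``distance_matrix`` call. The construction it performs can be sketched without Orange; the ``euclidean`` helper and the nested-list matrix below are stand-ins for Orange's distance measure and ``SymMatrix``, not its API.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Fill both triangles so the matrix is symmetric, like SymMatrix.
n = len(data)
matrix = [[0.0] * n for _ in range(n)]
for i1 in range(n):
    for i2 in range(i1 + 1, n):
        d = euclidean(data[i1], data[i2])
        matrix[i1][i2] = matrix[i2][i1] = d

print(matrix[0][1])   # 5.0
print(matrix[0][2])   # 10.0
```

The diagonal stays zero, matching the note above that diagonal elements are ignored by the clustering algorithm.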
## docs/reference/rst/code/hierarchical-example.py

r9906

    import Orange

    m = [[],
         [3],
         [13, 9, 14, 15, 7, 8, 4, 6],
         [12, 10, 11, 15, 2, 5, 7, 3, 1]]
    matrix = Orange.misc.SymMatrix(m)
    root = Orange.clustering.hierarchical.HierarchicalClustering(matrix,
        linkage=Orange.clustering.hierarchical.HierarchicalClustering.Average)

    def print_clustering(cluster):
        if cluster.branches:
            return "(%s %s)" % (print_clustering(cluster.left), print_clustering(cluster.right))
        else:
            return str(cluster[0])

    print print_clustering(root)

    print root.height

    root.mapping.objects = ["Ann", "Bob", "Curt", "Danny", "Eve", "Fred", "Greg", "Hue", "Ivy", "Jon"]

    print print_clustering(root)

    for el in root.left:
        print el

    print print_clustering(root)
    root.left.swap()
    print print_clustering(root)
    root.permute([1, 0])
    print print_clustering(root)

    def prune(cluster, h):
        if cluster.branches:
            if cluster.height < h:
                cluster.branches = None
            else:
                for branch in cluster.branches:
                    prune(branch, h)

    def print_clustering2(cluster):
        if cluster.branches:
            return "(%s %s)" % (print_clustering2(cluster.left), print_clustering2(cluster.right))
        else:
            return str(tuple(cluster))

    prune(root, 5)
    print print_clustering2(root)

    def list_of_clusters0(cluster, alist):
        if not cluster.branches:
            alist.append(list(cluster))
        else:
            for branch in cluster.branches:
                list_of_clusters0(branch, alist)

    def list_of_clusters(root):
        l = []
        list_of_clusters0(root, l)
        return l

    print list_of_clusters(root)

    root.mapping.objects = None
    print list_of_clusters(root)

    print root.left.last
    print root.right.first
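The recursive parenthesised printing used throughout the script can be run without Orange on a minimal, hypothetical cluster class with ``left``/``right`` branches:

```python
class Cluster:
    """Toy binary cluster: either a single element or two subclusters."""

    def __init__(self, element=None, left=None, right=None):
        self.element = element
        self.left, self.right = left, right
        self.branches = [left, right] if left and right else None

    def __getitem__(self, i):
        # Leaves expose their single element, as in the script's cluster[0].
        return self.element

def print_clustering(cluster):
    # Inner nodes render as "(left right)", leaves as their single element.
    if cluster.branches:
        return "(%s %s)" % (print_clustering(cluster.left),
                            print_clustering(cluster.right))
    else:
        return str(cluster[0])

root = Cluster(left=Cluster(left=Cluster(0), right=Cluster(4)),
               right=Cluster(3))
print(print_clustering(root))   # ((0 4) 3)
```

Run on the real ``HierarchicalClustering`` result, the same recursion produces the ``(((0 4) ((5 7) (8 9))) ((1 (2 6)) 3))`` output shown in the documentation above.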