# Changes in [9910:f3496f4c2145:9911:b89a8cf5f401] in orange

Files:
8 edited

• ## Orange/clustering/hierarchical.py

 r9752 — replace inline code examples with ``literalinclude`` directives; rename ``Orange.core.SymMatrix`` to ``Orange.misc.SymMatrix``:

```diff
 :param matrix: A distance matrix to perform the clustering on.
-:type matrix: :class:`Orange.core.SymMatrix`
+:type matrix: :class:`Orange.misc.SymMatrix`

 Let us construct a simple distance matrix and run clustering on it.

-::
-
-    import Orange
-    from Orange.clustering import hierarchical
-    m = [[],
-         [ 3],
-         [ 2,  4],
-         [17,  5,  4],
-         [ 2,  8,  3,  8],
-         [ 7,  5, 10, 11,  2],
-         [ 8,  4,  1,  5, 11, 13],
-         [ 4,  7, 12,  8, 10,  1,  5],
-         [13,  9, 14, 15,  7,  8,  4,  6],
-         [12, 10, 11, 15,  2,  5,  7,  3,  1]]
-    matrix = Orange.core.SymMatrix(m)
-    root = hierarchical.HierarchicalClustering(matrix,
-        linkage=hierarchical.HierarchicalClustering.Average)
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 1-14

 Root is a root of the cluster hierarchy. We can print using a simple
 recursive function.

-::
-
-    def printClustering(cluster):
-        if cluster.branches:
-            return "(%s%s)" % (printClustering(cluster.left),
-                               printClustering(cluster.right))
-        else:
-            return str(cluster[0])
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 16-20

 The output is not exactly nice, but it will have to do. Our clustering,
 […] supposedly the only) element of cluster, cluster[0], we shall print
 it out as a tuple.

-::
-
-    def printClustering2(cluster):
-        if cluster.branches:
-            return "(%s%s)" % (printClustering2(cluster.left),
-                               printClustering2(cluster.right))
-        else:
-            return str(tuple(cluster))
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 22-26

 The distance matrix could have been given a list of objects. We could,
 for instance, put

-::
-
-    matrix.objects = ["Ann", "Bob", "Curt", "Danny", "Eve",
-                      "Fred", "Greg", "Hue", "Ivy", "Jon"]
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 28-29

 above calling the HierarchicalClustering. If we've forgotten to store the
 objects into matrix prior to clustering, nothing is lost. We can add it
 into clustering later, by

-::
-
-    root.mapping.objects = ["Ann", "Bob", "Curt", "Danny", "Eve",
-                            "Fred", "Greg", "Hue", "Ivy", "Jon"]
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 31

 So, what do these "objects" do?
```
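The docstring above builds an average-linkage hierarchy over a lower-triangular distance matrix and prints it with a recursive function. A plain-Python sketch of the same idea, written so it runs without Orange installed — the names `Node`, `average_linkage` and `print_clustering` are illustrative, not part of the Orange API:

```python
# Orange-free sketch: naive average-linkage agglomeration over a
# lower-triangular matrix m, where m[i][j] = distance(i, j) for j < i,
# followed by the same recursive bracket-printing as printClustering.

class Node:
    def __init__(self, items, left=None, right=None):
        self.items = items          # leaf indices covered by this cluster
        self.left, self.right = left, right

def average_linkage(m):
    dist = lambda i, j: m[max(i, j)][min(i, j)]
    clusters = [Node([i]) for i in range(len(m))]

    def d(a, b):                    # average pairwise distance between clusters
        pairs = [(i, j) for i in a.items for j in b.items]
        return sum(dist(i, j) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:        # repeatedly merge the closest pair
        x, y = min(((a, b) for a in clusters for b in clusters if a is not b),
                   key=lambda p: d(*p))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(Node(x.items + y.items, x, y))
    return clusters[0]

def print_clustering(c):
    if c.left:                      # internal node: bracket the two branches
        return "(%s%s)" % (print_clustering(c.left), print_clustering(c.right))
    return str(c.items[0])
```

Feeding it the ten-point matrix from the docstring yields a fully parenthesised tree over indices 0–9; ties in the minimum may be broken differently than Orange does, so the exact bracketing can differ.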
```diff
 Call printClustering(root) again and you'll […] of ``root.left`` and
 ``root.right``.

-Let us write function for cluster pruning.
-
-::
-
-    def prune(cluster, togo):
-        if cluster.branches:
-            if togo < 0:
-                cluster.branches = None
-            else:
-                for branch in cluster.branches:
-                    prune(branch, togo - cluster.height)
+Let us write function for cluster pruning.
+
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 33-39

 We shall use ``printClustering2`` here, since we can have multiple
 elements […] We've ended up with four clusters. Need a list of clusters?

-Here's the function.
-
-::
-
-    def listOfClusters0(cluster, alist):
-        if not cluster.branches:
-            alist.append(list(cluster))
-        else:
-            for branch in cluster.branches:
-                listOfClusters0(branch, alist)
-
-    def listOfClusters(root):
-        l = []
-        listOfClusters0(root, l)
-        return l
+Here's the function.
+
+.. literalinclude:: code/hierarchical-example.py
+    :lines: 41-51

 The function returns a list of lists, in our case […] and cluster it with
 average linkage. Since we don't need the matrix, we shall let the
 clustering overwrite it (not that it's needed for

-such a small data set as Iris).
-
-::
-
-    import Orange
-    from Orange.clustering import hierarchical
-
-    data = Orange.data.Table("iris")
-    matrix = Orange.core.SymMatrix(len(data))
-    matrix.setattr("objects", data)
-    distance = Orange.distance.Euclidean(data)
-    for i1, instance1 in enumerate(data):
-        for i2 in range(i1 + 1, len(data)):
-            matrix[i1, i2] = distance(instance1, data[i2])
-
-    clustering = hierarchical.HierarchicalClustering()
-    clustering.linkage = clustering.Average
-    clustering.overwrite_matrix = 1
-    root = clustering(matrix)
+such a small data set as Iris).
+
+.. literalinclude:: code/hierarchical-example-2.py
+    :lines: 1-15

 Note that we haven't forgotten to set the ``matrix.objects``. We did it
 through ``matrix.setattr`` to avoid the warning. Let us now prune the
 clustering using the function we've written above, and print out the
```
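The `prune`/`listOfClusters` pair in the hunk above cuts the tree once the accumulated height budget runs out, then gathers the leaf clusters. The same logic, re-done Orange-free with plain dicts standing in for `HierarchicalCluster` (the `"height"` key plays the role of `cluster.height` — an illustrative model, not the real class):

```python
# Prune a cluster tree in place: once the remaining budget `togo` drops
# below zero, cut the branches so the node becomes a leaf cluster.

def prune(cluster, togo):
    if cluster.get("branches"):
        if togo < 0:
            cluster["branches"] = None          # cut here
        else:
            for branch in cluster["branches"]:
                prune(branch, togo - cluster["height"])

# Collect the item lists of all leaf clusters, left to right.
def list_of_clusters(root):
    out = []
    def walk(c):
        if not c.get("branches"):
            out.append(c["items"])
        else:
            for b in c["branches"]:
                walk(b)
    walk(root)
    return out
```

On a root of height 2.0 pruned with `prune(root, 1.4)`, the budget goes negative one level down, so its two children become the final clusters.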
```diff
-clusters.
-
-::
-
-    prune(root, 1.4)
-    for n, cluster in enumerate(listOfClusters(root)):
-        print "\n\n Cluster %i \n" % n
-        for instance in cluster:
-            print instance
+clusters.
+
+.. literalinclude:: code/hierarchical-example-2.py
+    :lines: 16-20

 Since the printout is pretty long, it might be more informative to just

-print out the class distributions for each cluster.
-
-::
-
-    for cluster in listOfClusters(root):
-        dist = Orange.core.get_class_distribution(cluster)
-        for e, d in enumerate(dist):
-            print "%s: %3.0f " % (data.domain.class_var.values[e], d),
-        print
+print out the class distributions for each cluster.
+
+.. literalinclude:: code/hierarchical-example-2.py
+    :lines: 22-26

 Here's what it shows. :: […]

 instance, call a learning algorithms, passing a cluster as an argument.
 It won't mind. If you, however, want to have a list of table, you can

-easily convert the list by
-
-::
-
-    tables = [Orange.data.Table(cluster) for cluster in listOfClusters(root)]
+easily convert the list by
+
+.. literalinclude:: code/hierarchical-example-2.py
+    :lines: 28

 Finally, if you are dealing with examples, you may want to take the
 function […]

     """
     distance = distance_constructor(data)
-    matrix = orange.SymMatrix(len(data))
+    matrix = Orange.misc.SymMatrix(len(data))
     for i in range(len(data)):
         for j in range(i+1):

     """
-    matrix = orange.SymMatrix(len(data.domain.attributes))
+    matrix = Orange.misc.SymMatrix(len(data.domain.attributes))
     for a1 in range(len(data.domain.attributes)):
         for a2 in range(a1):

     :type tree: :class:`HierarchicalCluster`
     :param matrix: SymMatrix that was used to compute the clustering.
-    :type matrix: :class:`Orange.core.SymMatrix`
+    :type matrix: :class:`Orange.misc.SymMatrix`
     :param progress_callback: Function used to report on progress.
     :type progress_callback: function

     :type tree: :class:`HierarchicalCluster`
     :param matrix: SymMatrix that was used to compute the clustering.
-    :type matrix: :class:`Orange.core.SymMatrix`
+    :type matrix: :class:`Orange.misc.SymMatrix`
     :param progress_callback: Function used to report on progress.
```
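The distribution-printing hunk above counts class labels inside each cluster via `get_class_distribution`. An Orange-free sketch of the same step, with `collections.Counter` standing in for the distribution object and clusters modelled as plain lists of `(features, label)` pairs:

```python
# Count class labels per cluster. `clusters` is a list of lists of
# (features, label) pairs; the result is one Counter per cluster, which,
# like a distribution, returns 0 for classes absent from the cluster.
from collections import Counter

def class_distributions(clusters):
    return [Counter(label for _features, label in cluster)
            for cluster in clusters]
```

Printing `"%s: %3.0f" % (cls, dist[cls])` for each class then reproduces the per-cluster summary the docstring describes.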
```diff
     :type progress_callback: function

 def feature_distance_matrix(data, distance=None, progress_callback=None):
-    """ A helper function that computes an :class:`Orange.core.SymMatrix` of
+    """ A helper function that computes an :class:`Orange.misc.SymMatrix` of
     all pairwise distances between features in `data`.

     :type progress_callback: function
-    :rtype: :class:`Orange.core.SymMatrix`
+    :rtype: :class:`Orange.misc.SymMatrix`
     """
     attributes = data.domain.attributes
-    matrix = orange.SymMatrix(len(attributes))
+    matrix = Orange.misc.SymMatrix(len(attributes))
     iter_count = matrix.dim * (matrix.dim - 1) / 2
     milestones = progress_bar_milestones(iter_count, 100)

     :type cluster: :class:`HierarchicalCluster`
-    :rtype: :class:`Orange.core.SymMatrix`
+    :rtype: :class:`Orange.misc.SymMatrix`
     """
     mapping = cluster.mapping
-    matrix = Orange.core.SymMatrix(len(mapping))
+    matrix = Orange.misc.SymMatrix(len(mapping))
     for cluster in postorder(cluster):
         if cluster.branches:

 if __name__ == "__main__":
     data = orange.ExampleTable("doc//datasets//brown-selected.tab")
 #    data = orange.ExampleTable("doc//datasets//iris.tab")
     root = hierarchicalClustering(data, order=True) #, linkage=orange.HierarchicalClustering.Single)
     attr_root = hierarchicalClustering_attributes(data, order=True)
 #    print root
 #    d = DendrogramPlotPylab(root, data=data, labels=[str(ex.getclass()) for ex in data],
 #        dendrogram_width=0.4, heatmap_width=0.3, params={}, cmap=None)
 #    d.plot(show=True, filename="graph.png")
     dendrogram_draw("graph.eps", root, attr_tree=attr_root, data=data,
         labels=[str(e.getclass()) for e in data], tree_height=50, #width=500, height=500,
         cluster_colors={root.right: (255, 0, 0), root.right.right: (0, 255, 0)},
         color_palette=ColorPalette([(255, 0, 0), (0, 0, 0), (0, 255, 0)],
             gamma=0.5, overflow=(255, 255, 255), underflow=(255, 255, 255))) #, minv=-0.5, maxv=0.5)
```
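The `feature_distance_matrix` hunks above fill only the lower triangle (`for a2 in range(a1)`), which is exactly what a `SymMatrix` stores. A standalone sketch of that loop using nested lists instead of `SymMatrix` — `columns` and `distance` here are hypothetical stand-ins for the feature columns and the distance constructor:

```python
# Fill a symmetric matrix of pairwise feature distances, iterating only
# over the strict lower triangle as the Orange helper does, then mirroring
# each value so the result reads the same from either index order.

def feature_distance_matrix(columns, distance):
    n = len(columns)
    matrix = [[0.0] * n for _ in range(n)]
    for a1 in range(n):
        for a2 in range(a1):                     # strictly below the diagonal
            d = distance(columns[a1], columns[a2])
            matrix[a1][a2] = matrix[a2][a1] = d  # keep it symmetric
    return matrix
```

This computes each of the n·(n−1)/2 distances once, matching the `iter_count` used for the progress milestones in the hunk.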
• ## Orange/feature/discretization.py

 r9878

```
    Discretization, \
        Preprocessor_discretize

def entropyDiscretization_wrapper(data):
```
• ## docs/reference/rst/Orange.data.rst

 r9896

```
    Orange.data.sample
    Orange.data.formats
    Orange.data.discretization
```
• ## docs/reference/rst/Orange.evaluation.scoring.rst

 r9892 — join lines split with backslash continuations (the link targets were stripped by the page scrape and are left as-is):

```diff
 data set, we would compute the matrix like this::

-    cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
-        vehicle.domain.classVar.values.index("van"))
+    cm = Orange.evaluation.scoring.confusion_matrices(resVeh,
+        vehicle.domain.classVar.values.index("van"))

 and get the results like these::

 classes, you can also compute the `sensitivity `_
-[TP/(TP+FN)], `specificity \
-`_ [TN/(TN+FP)], `positive predictive value \
-`_ [TP/(TP+FP)] and `negative predictive value \
-`_ [TN/(TN+FN)].
+[TP/(TP+FN)], `specificity `_ [TN/(TN+FP)], `positive predictive value `_
+[TP/(TP+FP)] and `negative predictive value `_ [TN/(TN+FN)].

 In information retrieval, positive predictive value is called precision
 (the ratio of the number of relevant records retrieved to the total number
 […] as F1 [2*precision*recall/(precision+recall)] or, for a general case,
 Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].

-The `Matthews correlation coefficient \
-`_
+The `Matthews correlation coefficient `_

 in essence a correlation coefficient between the observed and predicted
 binary classifications; it returns a value
```
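The bracketed formulas in the hunk above spell out the whole family of scores derived from a 2×2 confusion matrix. As a hedged sketch (this is plain arithmetic, not the `Orange.evaluation.scoring` API; the function name is illustrative):

```python
# Compute sensitivity, specificity, PPV, NPV, F1 and the Matthews
# correlation coefficient from the four confusion-matrix counts,
# following the formulas quoted in the documentation text.
import math

def binary_scores(tp, fp, tn, fn):
    sens = tp / (tp + fn)            # sensitivity / recall
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value / precision
    npv = tn / (tn + fn)             # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sens, "specificity": spec, "PPV": ppv,
            "NPV": npv, "F1": f1, "MCC": mcc}
```

For instance, with TP=9, FN=1, TN=8, FP=2 this gives sensitivity 0.9, specificity 0.8, and F1 = 2·TP/(2·TP+FP+FN) = 6/7.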
• ## docs/reference/rst/Orange.feature.discretization.rst

 r9863

```
value according to the rule found by discretization. In this respect, the
discretization behaves similar to :class:`Orange.classification.Learner`.

Utility functions
=================

Some functions and classes that can be used for categorization of
continuous features. Besides several general classes that can help in this
task, we also provide a function that may help in entropy-based
discretization (Fayyad & Irani), and a wrapper around classes for
categorization that can be used for learning.

.. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class

.. autoclass:: DiscretizeTable

.. rubric:: Example

FIXME. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in
Orange for Beginners tutorial shows the use of DiscretizedLearner. Other
discretization classes from core Orange are listed in chapter on
`categorization <../ofb/o_categorization.htm>`_ of the same tutorial.

Discretization Algorithms
```
• ## docs/reference/rst/Orange.feature.imputation.rst

 r9890

```diff
 capable of handling unknown values.

-Learners with imputer as a component
-====================================
+Imputer as a component
+======================

 Learners that cannot handle missing values should provide a slot
 […]
 :obj:`~Orange.classification.logreg.LogRegLearner` will pass them to
 :obj:`~Orange.classification.logreg.LogRegLearner.imputer_constructor` to
-get an imputer and used it to impute the missing values in the learning data.
-Imputed data is then used by the actual learning algorithm. Also, when a
+get an imputer and use it to impute the missing values in the learning data.
+Imputed data is then used by the actual learning algorithm. When a
 classifier :obj:`~Orange.classification.logreg.LogRegClassifier` is
 constructed, the imputer is stored in its attribute
-:obj:`~Orange.classification.logreg.LogRegClassifier.imputer`. At
-classification, the same imputer is used for imputation of missing values
+:obj:`~Orange.classification.logreg.LogRegClassifier.imputer`. During
+classification the same imputer is used for imputation of missing values
 in (testing) examples.

 it is recommended to use imputation according to the described procedure.

-The choice of which imputer to use depends on the problem domain. In this
-example we want to impute the minimal value of each feature.
+The choice of the imputer depends on the problem domain. In this example
+the minimal value of each feature is imputed:

 .. literalinclude:: code/imputation-logreg.py

-.. note:: Note that just one instance of
+Just one instance of
 :obj:`~Orange.classification.logreg.LogRegLearner` is constructed and then
 used twice in each fold. Once it is given the original instances as they
 […] testing.

-Wrapper for learning algorithms
-===============================
+Wrappers for learning
+=====================

 In a learning/classification process, imputation is needed on two occasions.
-Before learning, the imputer needs to process the training examples.
+Before learning, the imputer needs to process the training instances.
 Afterwards, the imputer is called for each instance to be classified.

 For example, in cross validation, imputation should be done on training folds
 […] simply skips the corresponding attributes in the formula, while
 classification/regression trees have components for handling the missing
-values in various ways. If for any reason you want to use these algorithms
-to run on imputed data, you can use this wrapper.
+values in various ways.
+
+A wrapper is provided for learning algorithms that require imputed data.

 .. class:: ImputeLearner
```
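The imputation example in the hunk above fills each missing value with the feature's minimum, fitted on the training data and reused at classification time. An Orange-free sketch of that two-phase scheme, with `None` marking unknowns (function names are illustrative, not the imputation API):

```python
# Phase 1 (learning): compute per-column minima over the training rows,
# ignoring unknown (None) entries. Phase 2 (classification): fill each
# unknown in a row with the stored minimum for its column.

def fit_min_imputer(rows):
    cols = zip(*rows)
    return [min(v for v in col if v is not None) for col in cols]

def impute(row, minima):
    return [m if v is None else v for v, m in zip(row, minima)]
```

Fitting once and reusing `minima` for every row mirrors how the learner stores its imputer in the classifier and applies the same one to testing examples.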
• ## source/orange/_aliases.txt

 r9752

```
ExamplesDistance_Normalized
feature_distances attribute_distances

TransformValue
sub_transformer subtransformer

ImputerConstructor
impute_class imputeClass
```
• ## source/orange/discretize.hpp

 r9863

```diff
 __REGISTER_CLASS
-int maxNumberOfIntervals; //P maximal number of intervals; default = 0 (no limits)
+int maxNumberOfIntervals; //P(+n) maximal number of intervals; default = 0 (no limits)
 bool forceAttribute; //P minimal number of intervals; default = 0 (no limits)
```