# Changes in [9919:8a2a770ef3af:9921:324a93fddcb0] in orange

Files:
7 deleted
24 edited
• ## .hgignore

r9881

    source/orangeom/lib_vectors_auto.txt
    # Ignore build and dist dir, created by setup.py build or setup.py bdist_* .
    # Ignore files created by setup.py.
    build
    dist
    MANIFEST
    Orange.egg-info
    # Ignore dot files.
    # Built documentation.
    docs/reference/html
    docs/*/html
    # Images generated by tests.
    # Files generated by tests.
    Orange/testing/regression/*/*.changed.txt
    Orange/testing/regression/*/*.crash.txt
    Orange/testing/regression/*/*.new.txt
    Orange/doc/modules/*.png
    docs/reference/rst/code/*.png
    Orange/doc/modules/tree1.dot
    Orange/doc/reference/del2.tab
    Orange/doc/reference/undefined-saved-dc-dk.tab
    Orange/doc/reference/undefined-saved-na.tab
    Orange/testing/regression/results_orange25/unusedValues.py.txt
    docs/reference/rst/code/iris.testsave.arff
    docs/tutorial/rst/code/adult_sample_sampled.tab
    docs/tutorial/rst/code/tree.dot
• ## Orange/clustering/hierarchical.py

r9752

    :param matrix: A distance matrix to perform the clustering on.
    -:type matrix: :class:`Orange.core.SymMatrix`
    +:type matrix: :class:`Orange.misc.SymMatrix`

    Let us construct a simple distance matrix and run clustering on it. ::

    -    import Orange
    -    from Orange.clustering import hierarchical
    -    m = [[],
    -         [ 3],
    -         [ 2,  4],
    -         [17,  5,  4],
    -         [ 2,  8,  3,  8],
    -         [ 7,  5, 10, 11,  2],
    -         [ 8,  4,  1,  5, 11, 13],
    -         [ 4,  7, 12,  8, 10,  1,  5],
    -         [13,  9, 14, 15,  7,  8,  4,  6],
    -         [12, 10, 11, 15,  2,  5,  7,  3,  1]]
    -    matrix = Orange.core.SymMatrix(m)
    -    root = hierarchical.HierarchicalClustering(matrix,
    -        linkage=hierarchical.HierarchicalClustering.Average)
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 1-14

    Root is a root of the cluster hierarchy. We can print it using a simple
    recursive function. ::

    -    def printClustering(cluster):
    -        if cluster.branches:
    -            return "(%s%s)" % (printClustering(cluster.left),
    -                               printClustering(cluster.right))
    -        else:
    -            return str(cluster[0])
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 16-20

    The output is not exactly nice, but it will have to do. Our clustering,
    [...] supposedly the only) element of cluster, cluster[0], we shall print
    it out as a tuple. ::

    -    def printClustering2(cluster):
    -        if cluster.branches:
    -            return "(%s%s)" % (printClustering2(cluster.left),
    -                               printClustering2(cluster.right))
    -        else:
    -            return str(tuple(cluster))
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 22-26

    The distance matrix could have been given a list of objects. We could, for
    instance, put ::

    -    matrix.objects = ["Ann", "Bob", "Curt", "Danny", "Eve",
    -                      "Fred", "Greg", "Hue", "Ivy", "Jon"]
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 28-29

    above calling the HierarchicalClustering. If we've forgotten to store the
    objects into matrix prior to clustering, nothing is lost. We can add it
    into clustering later, by ::

    -    root.mapping.objects = ["Ann", "Bob", "Curt", "Danny", "Eve",
    -                            "Fred", "Greg", "Hue", "Ivy", "Jon"]
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 31

    So, what do these "objects" do? Call printClustering(root) again and
    you'll [...] of ``root.left`` and ``root.right``.

    Let us write a function for cluster pruning. ::

    -    def prune(cluster, togo):
    -        if cluster.branches:
    -            if togo<0:
    -                cluster.branches = None
    -            else:
    -                for branch in cluster.branches:
    -                    prune(branch, togo-cluster.height)
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 33-39

    We shall use ``printClustering2`` here, since we can have multiple
    elements [...] We've ended up with four clusters. Need a list of clusters?
    Here's the function. ::

    -    def listOfClusters0(cluster, alist):
    -        if not cluster.branches:
    -            alist.append(list(cluster))
    -        else:
    -            for branch in cluster.branches:
    -                listOfClusters0(branch, alist)
    -
    -    def listOfClusters(root):
    -        l = []
    -        listOfClusters0(root, l)
    -        return l
    +.. literalinclude:: code/hierarchical-example.py
    +    :lines: 41-51

    The function returns a list of lists, in our case [...] and cluster it
    with average linkage. Since we don't need the matrix, we shall let the
    clustering overwrite it (not that it's needed for such a small data set
    as Iris). ::

    -    import Orange
    -    from Orange.clustering import hierarchical
    -    data = Orange.data.Table("iris")
    -    matrix = Orange.core.SymMatrix(len(data))
    -    matrix.setattr("objects", data)
    -    distance = Orange.distance.Euclidean(data)
    -    for i1, instance1 in enumerate(data):
    -        for i2 in range(i1+1, len(data)):
    -            matrix[i1, i2] = distance(instance1, data[i2])
    -    clustering = hierarchical.HierarchicalClustering()
    -    clustering.linkage = clustering.Average
    -    clustering.overwrite_matrix = 1
    -    root = clustering(matrix)
    +.. literalinclude:: code/hierarchical-example-2.py
    +    :lines: 1-15

    Note that we haven't forgotten to set the ``matrix.objects``. We did it
    through ``matrix.setattr`` to avoid the warning. Let us now prune the
    clustering using the function we've written above, and print out the
    clusters. ::

    -    prune(root, 1.4)
    -    for n, cluster in enumerate(listOfClusters(root)):
    -        print "\n\n Cluster %i \n" % n
    -        for instance in cluster:
    -            print instance
    +.. literalinclude:: code/hierarchical-example-2.py
    +    :lines: 16-20

    Since the printout is pretty long, it might be more informative to just
    print out the class distributions for each cluster. ::

    -    for cluster in listOfClusters(root):
    -        dist = Orange.core.get_class_distribution(cluster)
    -        for e, d in enumerate(dist):
    -            print "%s: %3.0f " % (data.domain.class_var.values[e], d),
    -        print
    +.. literalinclude:: code/hierarchical-example-2.py
    +    :lines: 22-26

    Here's what it shows. ::

    [...] instance, call a learning algorithm, passing a cluster as an
    argument. It won't mind. If you, however, want to have a list of tables,
    you can easily convert the list by ::

    -    tables = [Orange.data.Table(cluster) for cluster in listOfClusters(root)]
    +.. literalinclude:: code/hierarchical-example-2.py
    +    :lines: 28

    Finally, if you are dealing with examples, you may want to take the
    function [...]

    """
    distance = distance_constructor(data)
    -matrix = orange.SymMatrix(len(data))
    +matrix = Orange.misc.SymMatrix(len(data))
    for i in range(len(data)):
        for j in range(i+1):

    """
    -matrix = orange.SymMatrix(len(data.domain.attributes))
    +matrix = Orange.misc.SymMatrix(len(data.domain.attributes))
    for a1 in range(len(data.domain.attributes)):
        for a2 in range(a1):

    :type tree: :class:`HierarchicalCluster`
    :param matrix: SymMatrix that was used to compute the clustering.
    -:type matrix: :class:`Orange.core.SymMatrix`
    +:type matrix: :class:`Orange.misc.SymMatrix`
    :param progress_callback: Function used to report on progress.
    :type progress_callback: function

    :type tree: :class:`HierarchicalCluster`
    :param matrix: SymMatrix that was used to compute the clustering.
    -:type matrix: :class:`Orange.core.SymMatrix`
    +:type matrix: :class:`Orange.misc.SymMatrix`
    :param progress_callback: Function used to report on progress.
    :type progress_callback: function

    def feature_distance_matrix(data, distance=None, progress_callback=None):
    -    """ A helper function that computes an :class:`Orange.core.SymMatrix` of
    +    """ A helper function that computes an :class:`Orange.misc.SymMatrix` of
        all pairwise distances between features in `data`.
        :type progress_callback: function
    -    :rtype: :class:`Orange.core.SymMatrix`
    +    :rtype: :class:`Orange.misc.SymMatrix`
        """
        attributes = data.domain.attributes
    -    matrix = orange.SymMatrix(len(attributes))
    +    matrix = Orange.misc.SymMatrix(len(attributes))
        iter_count = matrix.dim * (matrix.dim - 1) / 2
        milestones = progress_bar_milestones(iter_count, 100)

    :type cluster: :class:`HierarchicalCluster`
    -:rtype: :class:`Orange.core.SymMatrix`
    +:rtype: :class:`Orange.misc.SymMatrix`
    """
    mapping = cluster.mapping
    -matrix = Orange.core.SymMatrix(len(mapping))
    +matrix = Orange.misc.SymMatrix(len(mapping))
    for cluster in postorder(cluster):
        if cluster.branches:

    if __name__=="__main__":
        data = orange.ExampleTable("doc//datasets//brown-selected.tab")
    #    data = orange.ExampleTable("doc//datasets//iris.tab")
        root = hierarchicalClustering(data, order=True) #, linkage=orange.HierarchicalClustering.Single)
        attr_root = hierarchicalClustering_attributes(data, order=True)
    #    print root
    #    d = DendrogramPlotPylab(root, data=data, labels=[str(ex.getclass()) for ex in data],
    #        dendrogram_width=0.4, heatmap_width=0.3, params={}, cmap=None)
    #    d.plot(show=True, filename="graph.png")
        dendrogram_draw("graph.eps", root, attr_tree=attr_root, data=data,
            labels=[str(e.getclass()) for e in data], tree_height=50, #width=500, height=500,
            cluster_colors={root.right:(255,0,0), root.right.right:(0,255,0)},
            color_palette=ColorPalette([(255, 0, 0), (0, 0, 0), (0, 255, 0)],
                gamma=0.5, overflow=(255, 255, 255), underflow=(255, 255, 255))) #, minv=-0.5, maxv=0.5)
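The hierarchical.py docstring hunks above all revolve around the same small recursive helpers (printClustering, prune, listOfClusters). As a self-contained sketch of that recursion, independent of Orange, here a cluster is modelled as a plain tuple `(left, right)` for internal nodes and a list of items for leaves (Orange's `HierarchicalCluster` exposes the same shape via `.branches`/`.left`/`.right`); the tree and names are illustrative only:

```python
def print_clustering(cluster):
    """Render the hierarchy as nested parentheses, like printClustering2."""
    if isinstance(cluster, tuple):  # internal node with two branches
        left, right = cluster
        return "(%s%s)" % (print_clustering(left), print_clustering(right))
    return str(tuple(cluster))      # leaf: print all its elements as a tuple

def list_of_clusters(cluster, acc=None):
    """Collect the leaf clusters into a flat list, like listOfClusters."""
    if acc is None:
        acc = []
    if isinstance(cluster, tuple):
        for branch in cluster:
            list_of_clusters(branch, acc)
    else:
        acc.append(list(cluster))
    return acc

# A toy hierarchy over a few of the names used in the example.
root = ((["Ann", "Bob"], ["Curt"]), (["Danny"], ["Eve", "Fred"]))
print(print_clustering(root))
print(list_of_clusters(root))
```

The same traversal pattern, with a `togo` budget decremented by each node's height, gives the `prune` function from the hunk above.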
• ## Orange/distance/__init__.py

r9805

    def distance_matrix(data, distance_constructor=Euclidean, progress_callback=None):
    -    """ A helper function that computes an :obj:`Orange.data.SymMatrix` of all
    +    """ A helper function that computes an :obj:`Orange.misc.SymMatrix` of all
        pairwise distances between instances in `data`.
        :type progress_callback: function
    -    :rtype: :class:`Orange.data.SymMatrix`
    -    """
    -    matrix = Orange.data.SymMatrix(len(data))
    +    :rtype: :class:`Orange.misc.SymMatrix`
    +    """
    +    matrix = Orange.misc.SymMatrix(len(data))
        dist = distance_constructor(data)
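The `distance_matrix` helper touched above fills a symmetric matrix with all pairwise distances, storing only the lower triangle. A minimal pure-Python sketch of the same loop pattern (hypothetical names; a nested lower-triangular list stands in for `Orange.misc.SymMatrix`):

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(data, distance=euclidean):
    """Lower-triangular pairwise distance matrix, mirroring the
    matrix[i, j] (j < i) fill loop in the helper above."""
    matrix = [[0.0] * (i + 1) for i in range(len(data))]
    for i in range(len(data)):
        for j in range(i):           # only the lower triangle is stored
            matrix[i][j] = distance(data[i], data[j])
    return matrix

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
m = distance_matrix(points)
print(m[1][0], m[2][0], m[2][1])  # 5.0 10.0 5.0
```

Storing only the lower triangle halves the memory for what is by definition a symmetric matrix, which is the point of the SymMatrix type.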
• ## Orange/evaluation/testing.py

r9697

        """Appends a new result (class and probability prediction by a single
        classifier) to the classes and probabilities field."""
    -    if type(aclass)==list:
    -        self.classes.append(aclass)
    -        self.probabilities.append(aprob)
    -    if type(aclass)==int:
    -        self.classes.append(int(aclass))
    -        self.probabilities.append(list(aprob))
    -    elif type(aclass.value)==float:
    -        self.classes.append(float(aclass))
    -        self.probabilities.append(aprob)
    -    else:
    -        self.classes.append(int(aclass))
    -        self.probabilities.append(list(aprob))
    +    self.classes.append(aclass)
    +    self.probabilities.append(aprob)

    def set_result(self, i, aclass, aprob):
• ## Orange/feature/discretization.py

r9878

        Discretization, \
        Preprocessor_discretize

    def entropyDiscretization_wrapper(data):
• ## Orange/fixes/fix_changed_names.py

r9918

        "orange.newmetaid": "Orange.feature.new_meta_id",
    -    "orange.SymMatrix": "Orange.data.SymMatrix",
    +    "orange.SymMatrix": "Orange.misc.SymMatrix",
        "orange.GetValue": "Orange.classification:Classifier.GetValue",
        "orange.GetProbabilities": "Orange.classification:Classifier.GetProbabilities",
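fix_changed_names.py drives 2to3-style fixers from a flat old-name to new-name table like the hunk above. The table-driven idea can be sketched without lib2to3 as a longest-prefix rewrite of a dotted name (an illustration only; the real fixer rewrites parse trees, not strings, and `rename` is a hypothetical helper):

```python
# A toy dotted-name rewriter driven by a mapping like the one in
# fix_changed_names.py.  Both entries below appear in the changeset.
MAPPING = {
    "orange.SymMatrix": "Orange.misc.SymMatrix",
    "orange.newmetaid": "Orange.feature.new_meta_id",
}

def rename(dotted):
    """Rewrite `dotted` using the longest matching prefix in MAPPING."""
    parts = dotted.split(".")
    for n in range(len(parts), 0, -1):      # try the longest prefix first
        prefix = ".".join(parts[:n])
        if prefix in MAPPING:
            return ".".join([MAPPING[prefix]] + parts[n:])
    return dotted

print(rename("orange.SymMatrix"))            # Orange.misc.SymMatrix
print(rename("orange.SymMatrix.Symmetric"))  # Orange.misc.SymMatrix.Symmetric
```

Longest-prefix matching matters because attribute accesses like `orange.SymMatrix.Symmetric` (seen in the network hunks below) must follow the renamed class.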
• ## Orange/network/deprecated.py

r9671

        :param matrix: number of objects in a matrix must match the number
            of vertices in a network.
    -    :type matrix: Orange.core.SymMatrix
    +    :type matrix: Orange.misc.SymMatrix
        :param lower: lower distance bound.
        :type lower: float

        self.mdsStep = 0
        self.stopMDS = 0
    -    self.vertexDistance.matrixType = Orange.core.SymMatrix.Symmetric
    +    self.vertexDistance.matrixType = Orange.misc.SymMatrix.Symmetric
        self.diag_coors = math.sqrt((min(self.graph.coors[0]) - \
            max(self.graph.coors[0]))**2 + \
• ## Orange/network/network.py

r9671

        self.mdsStep = 0
        self.stopMDS = 0
    -    self.items_matrix.matrixType = Orange.core.SymMatrix.Symmetric
    +    self.items_matrix.matrixType = Orange.misc.SymMatrix.Symmetric
        self.diag_coors = math.sqrt((min(self.coors[0]) - \
            max(self.coors[0])) ** 2 + \
• ## Orange/preprocess/outliers.py

r9765

        other distance measures
        """
    -    self.distmatrix = Orange.core.SymMatrix(len(self.examples)) #FIXME
    +    self.distmatrix = Orange.misc.SymMatrix(len(self.examples)) #FIXME
        for i in range(len(self.examples)):
            for j in range(i + 1):
• ## Orange/projection/linear.py

r9880

        if distances:
            if n_valid != len(valid_data):
    -        classes = Orange.core.SymMatrix(n_valid)
    +        classes = Orange.misc.SymMatrix(n_valid)
            r = 0
            for ro, vr in enumerate(valid_data):
• ## Orange/projection/mds.py

r9725

    :param distances: original dissimilarity - a distance matrix to operate on.
    -:type distances: :class:`Orange.core.SymMatrix`
    +:type distances: :class:`Orange.misc.SymMatrix`
    :param dim: dimension of the projected space.

    .. attribute:: distances

    -    An :class:`Orange.core.SymMatrix` containing the distances that we
    +    An :class:`Orange.misc.SymMatrix` containing the distances that we
        want to achieve (lsmt changes these).

    .. attribute:: projected_distances

    -    An :class:`Orange.core.SymMatrix` containing the distances between
    +    An :class:`Orange.misc.SymMatrix` containing the distances between
        projected points.

    .. attribute:: original_distances

    -    An :class:`Orange.core.SymMatrix` containing the original distances
    +    An :class:`Orange.misc.SymMatrix` containing the original distances
        between points.

    .. attribute:: stress

    -    An :class:`Orange.core.SymMatrix` holding the stress.
    +    An :class:`Orange.misc.SymMatrix` holding the stress.

    .. attribute:: dim

    def __init__(self, distances=None, dim=2, **kwargs):
        self.mds=orangemds.MDS(distances, dim, **kwargs)
    -    self.original_distances=Orange.core.SymMatrix([m for m in self.distances])
    +    self.original_distances=Orange.misc.SymMatrix([m for m in self.distances])

    def __getattr__(self, name):
• ## Orange/testing/unit/tests/test_hclustering.py

r9724

        [13,  9, 14, 15,  7,  8,  4,  6],
        [12, 10, 11, 15,  2,  5,  7,  3,  1]]
    -    self.matrix = Orange.core.SymMatrix(m)
    +    self.matrix = Orange.misc.SymMatrix(m)
        self.matrix.setattr("objects", ["Ann", "Bob", "Curt", "Danny", "Eve",
            "Fred", "Greg", "Hue", "Ivy", "Jon"])
        self.cluster = hier.HierarchicalClustering(self.matrix)
• ## Orange/testing/unit/tests/test_refactoring.py

r9862

    """
    import sys, os
    import unittest

    def rhasattr(obj, name):
        """ Recursive hasattr. """
        while "." in name:
            first, name = name.split(".", 1)

    def rgetattr(obj, name):
        """ Recursive getattr """
        while "." in name:
            first, name = name.split(".", 1)

    def import_package(name):
        """ Import a package and return it. """
        mod = __import__(name)
        if "." in name:

            self.assertTrue(rhasattr(old_mod, old_name),
                            "{0} is missing".format(old))
            self.assertTrue(rhasattr(new_mod, new_name),
                            "{0} is missing".format(new))

    def test_import_mapping(self):

    if __name__ == "__main__":
        unittest.main()
• ## docs/extend-widgets/rst/conf.py

r9402

    # add these directories to sys.path here. If the directory is relative to the
    # documentation root, use os.path.abspath to make it absolute, like shown here.
    #sys.path.append(os.path.abspath('.'))
    +sys.path.append(os.path.abspath('../../../orange'))
    +import Orange

    # -- General configuration -----------------------------------------------------
• ## docs/extend-widgets/rst/index.rst

r9402

    ##########################

    Contents:

    .. toctree::
       :maxdepth: 3

       OrangeWidgets.plot

    ****************

    * :ref:`genindex`
    * :ref:`modindex`
    * :ref:`search`
• ## docs/reference/rst/Orange.data.rst

r9896

        Orange.data.sample
        Orange.data.formats
        Orange.data.discretization
• ## docs/reference/rst/Orange.evaluation.scoring.rst

r9892

    data set, we would compute the matrix like this::

    -    cm = Orange.evaluation.scoring.confusion_matrices(resVeh, \
    -        vehicle.domain.classVar.values.index("van"))
    +    cm = Orange.evaluation.scoring.confusion_matrices(resVeh,
    +        vehicle.domain.classVar.values.index("van"))

    and get the results like these::

    classes, you can also compute the `sensitivity `_ [TP/(TP+FN)],
    -`specificity \
    -`_ [TN/(TN+FP)], `positive predictive value \
    -`_ [TP/(TP+FP)] and `negative predictive value \
    -`_ [TN/(TN+FN)].
    +`specificity `_ [TN/(TN+FP)], `positive predictive value `_ [TP/(TP+FP)]
    +and `negative predictive value `_ [TN/(TN+FN)].

    In information retrieval, positive predictive value is called precision
    (the ratio of the number of relevant records retrieved to the total number
    as F1 [2*precision*recall/(precision+recall)] or, for a general case,
    Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)].

    -The `Matthews correlation coefficient \
    -`_
    +The `Matthews correlation coefficient `_

    in essence a correlation coefficient between the observed and predicted
    binary classifications; it returns a value
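The bracketed formulas in the scoring hunk above (TP/(TP+FN) and friends) are easy to check numerically. A small sketch, assuming a binary confusion matrix given as plain counts rather than Orange's ConfusionMatrix object (`scores` is a hypothetical helper name):

```python
import math

def scores(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and MCC from binary counts,
    following the formulas quoted in the scoring documentation."""
    sens = tp / (tp + fn)   # sensitivity (recall)
    spec = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, ppv, npv, mcc

sens, spec, ppv, npv, mcc = scores(tp=40, fp=10, fn=20, tn=30)
print(spec, ppv, npv)  # 0.75 0.8 0.6
```

With precision and recall in hand, F1 from the same passage is just `2 * ppv * sens / (ppv + sens)`.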
• ## docs/reference/rst/Orange.feature.discretization.rst

r9863

    value according to the rule found by discretization. In this respect, the
    discretization behaves similar to :class:`Orange.classification.Learner`.

    Utility functions
    =================

    Some functions and classes that can be used for categorization of
    continuous features. Besides several general classes that can help in this
    task, we also provide a function that may help in entropy-based
    discretization (Fayyad & Irani), and a wrapper around classes for
    categorization that can be used for learning.

    .. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class

    .. autoclass:: DiscretizeTable

    .. rubric:: Example

    FIXME. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in
    Orange for Beginners tutorial shows the use of DiscretizedLearner. Other
    discretization classes from core Orange are listed in chapter on
    `categorization <../ofb/o_categorization.htm>`_ of the same tutorial.

    Discretization Algorithms
• ## docs/reference/rst/Orange.feature.imputation.rst

r9890

    capable of handling unknown values.

    -Learners with imputer as a component
    -====================================
    +Imputer as a component
    +======================

    Learners that cannot handle missing values should provide a slot
    :obj:`~Orange.classification.logreg.LogRegLearner` will pass them to
    :obj:`~Orange.classification.logreg.LogRegLearner.imputer_constructor` to get
    -an imputer and used it to impute the missing values in the learning data.
    -Imputed data is then used by the actual learning algorithm. Also, when a
    +an imputer and use it to impute the missing values in the learning data.
    +Imputed data is then used by the actual learning algorithm. When a
    classifier :obj:`~Orange.classification.logreg.LogRegClassifier` is
    constructed, the imputer is stored in its attribute
    -:obj:`~Orange.classification.logreg.LogRegClassifier.imputer`. At
    -classification, the same imputer is used for imputation of missing values
    +:obj:`~Orange.classification.logreg.LogRegClassifier.imputer`. During
    +classification the same imputer is used for imputation of missing values
    in (testing) examples.

    it is recommended to use imputation according to the described procedure.

    -The choice of which imputer to use depends on the problem domain. In this
    -example we want to impute the minimal value of each feature.
    +The choice of the imputer depends on the problem domain. In this example
    +the minimal value of each feature is imputed:

    .. literalinclude:: code/imputation-logreg.py

    -.. note:: Note that just one instance of
    +Just one instance of
    :obj:`~Orange.classification.logreg.LogRegLearner` is constructed and then
    used twice in each fold. Once it is given the original instances as they
    testing.

    -Wrapper for learning algorithms
    -===============================
    +Wrappers for learning
    +=====================

    In a learning/classification process, imputation is needed on two occasions.
    -Before learning, the imputer needs to process the training examples.
    +Before learning, the imputer needs to process the training instances.
    Afterwards, the imputer is called for each instance to be classified.

    For example, in cross validation, imputation should be done on training folds
    simply skips the corresponding attributes in the formula, while
    classification/regression trees have components for handling the missing
    -values in various ways. If for any reason you want to use these algorithms to
    -run on imputed data, you can use this wrapper.
    +values in various ways.

    +A wrapper is provided for learning algorithms that require imputed data.

    .. class:: ImputeLearner
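The imputation wrapper described in the hunk above does two things: impute the training data before learning, and impute each incoming instance before classification. Assuming a minimal learner interface (all names here are hypothetical, with no Orange dependency), the control flow can be sketched as:

```python
class MinimalImputer:
    """Imputes each feature's minimum training value for None entries,
    like the minimal-value imputation used in the example above."""
    def __init__(self, data):
        cols = zip(*data)
        self.minima = [min(v for v in col if v is not None) for col in cols]
    def __call__(self, instance):
        return [self.minima[i] if v is None else v
                for i, v in enumerate(instance)]

class ImputeLearner:
    """Wrapper: impute training data, train, and wrap the classifier so
    test instances are imputed too (the two occasions described above)."""
    def __init__(self, learner, imputer_constructor=MinimalImputer):
        self.learner = learner
        self.imputer_constructor = imputer_constructor
    def __call__(self, data):
        imputer = self.imputer_constructor(data)
        classifier = self.learner([imputer(row) for row in data])
        return lambda instance: classifier(imputer(instance))

# A trivial stand-in "learner": threshold on the first feature's mean.
def mean_threshold_learner(data):
    threshold = sum(row[0] for row in data) / len(data)
    return lambda instance: instance[0] >= threshold

train = [[1.0, 2.0], [None, 3.0], [5.0, None]]
model = ImputeLearner(mean_threshold_learner)(train)
print(model([None, 4.0]))  # first feature imputed to 1.0 before classifying
```

Building the imputer inside `__call__` is what makes the wrapper safe for cross validation: each training fold gets its own imputer, so no statistics leak from the test fold.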
• ## docs/reference/rst/index.rst

r9897

       Orange.misc
       OrangeWidgets.plot

    ****************

    Index and search
• ## install-scripts/createSnapshot.btm

r9730

    rem # update source(s) to revision HEAD
    cdd %TMPDIR
    hg pull --update

    rem # build core
• ## install-scripts/updateAndCall.btm

r9730

    REM hg clone https://bitbucket.org/biolab/orange snapshot
    cdd snapshot
    -hg pull --update
    +hg pull
    +hg update

    REM hg clone https://bitbucket.org/biolab/orange-addon-bioinformatics Bioinformatics
    cdd Bioinformatics
    -hg pull --update
    +hg pull
    +hg update
    cdd e:\orange\scripts\snapshot

    REM hg clone https://bitbucket.org/biolab/orange-addon-text Text
    cdd Text
    -hg pull --update
    +hg pull
    +hg update
    cdd e:\orange\scripts

    -copy /r snapshot\install-scripts\* .
    +copy /q snapshot\install-scripts\* .
    #svn update -N
    #svn export http://orange.biolab.si/svn/orange/trunk/orange/doc/LICENSES license.txt
    #svn
    call callCreateSnapshot.btm
    shutdown -s
• ## source/orange/_aliases.txt

r9919

    feature_distances attribute_distances
    TransformValue
    sub_transformer subtransformer
    ImputerConstructor
    impute_class imputeClass
    Variable
    retrieve get_existing
• ## source/orange/discretize.hpp

r9863

        __REGISTER_CLASS
    -    int maxNumberOfIntervals; //P maximal number of intervals; default = 0 (no limits)
    +    int maxNumberOfIntervals; //P(+n) maximal number of intervals; default = 0 (no limits)
        bool forceAttribute; //P minimal number of intervals; default = 0 (no limits)