# Changeset 9483:a0f1e6b2fcdf in orange

Timestamp:
08/16/11 14:21:56
Branch:
default
Message:

Merge trunk into MLC branch.

Location:
orange
Files:
3 deleted
96 edited

• ## orange/Orange/__init__.py

 r9474 _import("misc.render") _import("misc.selection") _import("misc.r") #_import("misc.r")
• ## orange/Orange/classification/bayes.py

 r9450 """ .. index:: naive Bayes classifier .. index:: single: classification; naive Bayes classifier ********************************** Naive Bayes classifier (bayes) ********************************** The most primitive Bayesian classifier is :obj:NaiveLearner. Naive Bayes classification algorithm _ estimates conditional probabilities from training data and uses them for classification of new data instances. The algorithm learns very fast if all features in the training data set are discrete. If a number of features are continues, though, the algorithm runs slower due to time spent to estimate continuous conditional distributions. The following example demonstrates a straightforward invocation of this algorithm (bayes-run.py_, uses titanic.tab_): .. literalinclude:: code/bayes-run.py :lines: 7- .. index:: Naive Bayesian Learner .. autoclass:: Orange.classification.bayes.NaiveLearner :members: :show-inheritance: .. autoclass:: Orange.classification.bayes.NaiveClassifier :members: :show-inheritance: Examples ======== :obj:NaiveLearner can estimate probabilities using relative frequencies or m-estimate (bayes-mestimate.py_, uses lenses.tab_): .. literalinclude:: code/bayes-mestimate.py :lines: 7- Observing conditional probabilities in an m-estimate based classifier shows a shift towards the second class - as compared to probabilities above, where relative frequencies were used. Note that the change in error estimation did not have any effect on apriori probabilities (bayes-thresholdAdjustment.py_, uses adult-sample.tab_): .. literalinclude:: code/bayes-thresholdAdjustment.py :lines: 7- Setting adjustThreshold parameter can sometimes improve the results. Those are the classification accuracies of 10-fold cross-validation of a normal naive bayesian classifier, and one with an adjusted threshold:: [0.7901746265516516, 0.8280138859667578] Probabilities for continuous features are estimated with \ :class:ProbabilityEstimatorConstructor_loess. (bayes-plot-iris.py_, uses iris.tab_): .. literalinclude:: code/bayes-plot-iris.py :lines: 4- .. image:: code/bayes-iris.png :scale: 50 % If petal lengths are shorter, the most probable class is "setosa". Irises with middle petal lengths belong to "versicolor", while longer petal lengths indicate for "virginica". Critical values where the decision would change are at about 5.4 and 6.3. .. _bayes-run.py: code/bayes-run.py .. _bayes-thresholdAdjustment.py: code/bayes-thresholdAdjustment.py .. _bayes-mestimate.py: code/bayes-mestimate.py .. _bayes-plot-iris.py: code/bayes-plot-iris.py .. _adult-sample.tab: code/adult-sample.tab .. _iris.tab: code/iris.tab .. _titanic.tab: code/iris.tab .. _lenses.tab: code/lenses.tab Implementation details ====================== The following two classes are implemented in C++ (*bayes.cpp*). They are not intended to be used directly. Here we provide implementation details for those interested. Orange.core.BayesLearner ------------------------ Fields estimatorConstructor, conditionalEstimatorConstructor and conditionalEstimatorConstructorContinuous are empty (None) by default. If estimatorConstructor is left undefined, p(C) will be estimated by relative frequencies of examples (see ProbabilityEstimatorConstructor_relative). When conditionalEstimatorConstructor is left undefined, it will use the same constructor as for estimating unconditional probabilities (estimatorConstructor is used as an estimator in ConditionalProbabilityEstimatorConstructor_ByRows). That is, by default, both will use relative frequencies. 
But when estimatorConstructor is set to, for instance, estimate probabilities by m-estimate with m=2.0, the same estimator will be used for estimation of conditional probabilities, too. P(c|vi) for continuous attributes are, by default, estimated with loess (a variant of locally weighted linear regression), using ConditionalProbabilityEstimatorConstructor_loess. The learner first constructs an estimator for p(C). It tries to get a precomputed distribution of probabilities; if the estimator is capable of returning it, the distribution is stored in the classifier's field distribution and the just constructed estimator is disposed. Otherwise, the estimator is stored in the classifier's field estimator, while the distribution is left empty. The same is then done for conditional probabilities. Different constructors are used for discrete and continuous attributes. If the constructed estimator can return all conditional probabilities in form of Contingency, the contingency is stored and the estimator disposed. If not, the estimator is stored. If there are no contingencies when the learning is finished, the resulting classifier's conditionalDistributions is None. Alternatively, if all probabilities are stored as contingencies, the conditionalEstimators fields is None. Field normalizePredictions is copied to the resulting classifier. Orange.core.BayesClassifier --------------------------- Class NaiveClassifier represents a naive bayesian classifier. Probability of class C, knowing that values of features :math:F_1, F_2, ..., F_n are :math:v_1, v_2, ..., v_n, is computed as :math:p(C|v_1, v_2, ..., v_n) = \ p(C) \\cdot \\frac{p(C|v_1)}{p(C)} \\cdot \\frac{p(C|v_2)}{p(C)} \\cdot ... \ \\cdot \\frac{p(C|v_n)}{p(C)}. Note that when relative frequencies are used to estimate probabilities, the more usual formula (with factors of form :math:\\frac{p(v_i|C)}{p(v_i)}) and the above formula are exactly equivalent (without any additional assumptions of independency, as one could think at a first glance). The difference becomes important when using other ways to estimate probabilities, like, for instance, m-estimate. In this case, the above formula is much more appropriate. When computing the formula, probabilities p(C) are read from distribution, which is of type Distribution, and stores a (normalized) probability of each class. When distribution is None, BayesClassifier calls estimator to assess the probability. The former method is faster and is actually used by all existing methods of probability estimation. The latter is more flexible. Conditional probabilities are computed similarly. Field conditionalDistribution is of type DomainContingency which is basically a list of instances of Contingency, one for each attribute; the outer variable of the contingency is the attribute and the inner is the class. Contingency can be seen as a list of normalized probability distributions. For attributes for which there is no contingency in conditionalDistribution a corresponding estimator in conditionalEstimators is used. The estimator is given the attribute value and returns distributions of classes. If neither, nor pre-computed contingency nor conditional estimator exist, the attribute is ignored without issuing any warning. The attribute is also ignored if its value is undefined; this cannot be overriden by estimators. Any field (distribution, estimator, conditionalDistributions, conditionalEstimators) can be None. 
For instance, BayesLearner normally constructs a classifier which has either distribution or estimator defined. While it is not an error to have both, only distribution will be used in that case. As for the other two fields, they can be both defined and used complementarily; the elements which are missing in one are defined in the other. However, if there is no need for estimators, BayesLearner will not construct an empty list; it will not construct a list at all, but leave the field conditionalEstimators empty. If you only need probabilities of individual class call BayesClassifier's method p(class, example) to compute the probability of this class only. Note that this probability will not be normalized and will thus, in general, not equal the probability returned by the call operator. """ import Orange from Orange.core import BayesClassifier as _BayesClassifier .. :param adjustTreshold: sets the corresponding attribute :type adjustTreshold: boolean :param adjust_threshold: sets the corresponding attribute :type adjust_threshold: boolean :param m: sets the :obj:estimatorConstructor to :class:orange.ProbabilityEstimatorConstructor_m with specified m :type m: integer :param estimatorConstructor: sets the corresponding attribute :type estimatorConstructor: orange.ProbabilityEstimatorConstructor :param conditionalEstimatorConstructor: sets the corresponding attribute :type conditionalEstimatorConstructor: :param estimator_constructor: sets the corresponding attribute :type estimator_constructor: orange.ProbabilityEstimatorConstructor :param conditional_estimator_constructor: sets the corresponding attribute :type conditional_estimator_constructor: :class:orange.ConditionalProbabilityEstimatorConstructor :param conditionalEstimatorConstructorContinuous: sets the corresponding :param conditional_estimator_constructor_continuous: sets the corresponding attribute :type conditionalEstimatorConstructorContinuous: :type conditional_estimator_constructor_continuous: :class:orange.ConditionalProbabilityEstimatorConstructor Constructor parameters set the corresponding attributes. .. attribute:: adjustTreshold .. attribute:: adjust_threshold If set and the class is binary, the classifier's This attribute is ignored if you also set estimatorConstructor. .. attribute:: estimatorConstructor .. attribute:: estimator_constructor Probability estimator constructor for Setting this attribute disables the above described attribute m. .. attribute:: conditionalEstimatorConstructor .. attribute:: conditional_estimator_constructor Probability estimator constructor the estimator for prior probabilities will be used. .. attribute:: conditionalEstimatorConstructorContinuous .. attribute:: conditional_estimator_constructor_continuous Probability estimator constructor for conditional probabilities for "conditionalEstimatorConstructorContinuous":"conditional_estimator_constructor_continuous", "weightID": "weight_id" }, in_place=False)(NaiveLearner) }, in_place=True)(NaiveLearner) :class:Orange.core.BayesClassifier that does the actual classification. :param baseClassifier: an :class:Orange.core.BayesLearner to wrap. If :param base_classifier: an :class:Orange.core.BayesLearner to wrap. If not set, a new :class:Orange.core.BayesLearner is created. :type baseClassifier: :class:Orange.core.BayesLearner :type base_classifier: :class:Orange.core.BayesLearner .. attribute:: distribution An object that returns a probability of class p(C) for a given class C. .. attribute:: conditionalDistributions .. 
attribute:: conditional_distributions A list of conditional probabilities. .. attribute:: conditionalEstimators .. attribute:: conditional_estimators A list of estimators for conditional probabilities. .. attribute:: adjustThreshold .. attribute:: adjust_threshold For binary classes, this tells the learner to """ def __init__(self, baseClassifier=None): if not baseClassifier: baseClassifier = _BayesClassifier() self.nativeBayesClassifier = baseClassifier for k, v in self.nativeBayesClassifier.__dict__.items(): def __init__(self, base_classifier=None): if not base_classifier: base_classifier = _BayesClassifier() self.native_bayes_classifier = base_classifier for k, v in self.native_bayes_classifier.__dict__.items(): self.__dict__[k] = v :class:Orange.statistics.Distribution or a tuple with both """ return self.nativeBayesClassifier(instance, result_type, *args, **kwdargs) return self.native_bayes_classifier(instance, result_type, *args, **kwdargs) def __setattr__(self, name, value): if name == "nativeBayesClassifier": if name == "native_bayes_classifier": self.__dict__[name] = value return if name in self.nativeBayesClassifier.__dict__: self.nativeBayesClassifier.__dict__[name] = value if name in self.native_bayes_classifier.__dict__: self.native_bayes_classifier.__dict__[name] = value self.__dict__[name] = value """ return self.nativeBayesClassifier.p(class_, instance) return self.native_bayes_classifier.p(class_, instance) def __str__(self): """return classifier in human friendly format.""" nValues=len(self.classVar.values) frmtStr=' %10.3f'*nValues classes=" "*20+ ((' %10s'*nValues) % tuple([i[:10] for i in self.classVar.values])) """Return classifier in human friendly format.""" nvalues=len(self.class_var.values) frmtStr=' %10.3f'*nvalues classes=" "*20+ ((' %10s'*nvalues) % tuple([i[:10] for i in self.class_var.values])) return "\n".join([ ("%20s" % i.variable.values[v][:20]) + (frmtStr % tuple(i[v])) for v in xrange(len(i.variable.values)))] ) for i in self.conditionalDistributions])]) ) for i in self.conditional_distributions if i.variable.var_type == Orange.data.variable.Discrete])])
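Read together, the docstring sketches the basic usage; a minimal, hedged example of the renamed API (the dataset name and class value are illustrative)::

    import Orange

    data = Orange.data.Table("adult_sample")   # assumed local dataset
    # m-estimation, as in bayes-mestimate.py
    mest = Orange.classification.bayes.NaiveLearner(m=2.0)
    # threshold adjustment, as in bayes-thresholdAdjustment.py
    adjusted = Orange.classification.bayes.NaiveLearner(adjust_threshold=True)

    classifier = mest(data)
    # p(class, instance) returns the unnormalized probability of one class
    print classifier.p(data.domain.class_var(">50K"), data[0])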
• ## orange/Orange/classification/majority.py

 r9450 Docstring fixes for MajorityLearner and ConstantClassifier: "without setting any features" becomes "without setting any parameters" ("Nevertheless, it has two."), and the attributes are renamed to snake_case: estimatorConstructor → estimator_constructor (an estimator constructor used for estimating the class probabilities), aprioriDistribution → apriori_distribution (the apriori class distribution passed to the estimator), defaultVal → default_val (the value the classifier returns) and defaultDistribution → default_distribution (the class probabilities it returns). The ConstantClassifier constructor can be called without arguments, with value (for default_val) and variable (for class_var); if value is given and is of type Orange.data.Value (alternatives are an integer index of a discrete value or a continuous value), its variable field either initializes class_var when no variable argument is given, or is checked against the variable argument when one is.
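The constructor behaviour described above, sketched under the assumption that Orange.classification.ConstantClassifier is the documented class (the variable definition is illustrative)::

    import Orange

    class_var = Orange.data.variable.Discrete("y", values=["no", "yes"])
    value = Orange.data.Value(class_var, "yes")

    # class_var is taken from value.variable when no variable is given;
    # passing both would check them against each other
    clf = Orange.classification.ConstantClassifier(value)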
• ## orange/Orange/classification/tree.py

 r9450 The tree documentation walkthrough moves to snake_case. Counting nodes: if node is None we have a null node, which counts 0; otherwise the size is 1 (this node) plus the sizes of all subtrees. A node is internal if it has a :obj:`branch_selector` (formerly branchSelector); if there is no selector, it is a leaf (do not attempt to shortcut this check: leaves do not merely have an empty list of branches). The branch selector is, for all currently defined splits, an instance of a class derived from :obj:`Orange.classification.Classifier` (in fact an :obj:`orange.ClassifierFromVarFD`, but a Classifier would suffice), and its :obj:`class_var` names the attribute tested at the node; the printing example therefore reads the attribute name from it, prints the stored class distribution (assuming storing was not disabled), and at leaves prints the distribution of learning examples together with the class predicted by the :obj:`node_classifier`, assumed to be the default :obj:`DefaultClassifier`. The pruning example acts only where :obj:`node` and :obj:`node.branch_selector` are defined: at non-zero levels it recurses into each branch, at level zero it clears the selector, branches and branch descriptions. The stopping-criteria walkthrough becomes::

    >>> print learner.stop.max_majority, learner.stop.min_examples
    1.0 0.0
    >>> learner.stop.min_examples = 5.0
    >>> tree = learner(data)
    >>> print tree.dump()
    ...
    >>> learner.stop.max_majority = 0.5
    >>> tree = learner(data)
    >>> print tree.dump()

 :class:`Node` attributes are renamed accordingly: distribution (storing can be disabled via the :obj:`_TreeLearner`'s store_distributions flag); contingency (enabled via store_contingencies; contingencies are computed during induction regardless and removed shortly after unless the flag is set); examples (references into the root's :obj:`Orange.data.Table`, stored only when store_examples is set, off by default to conserve space); node_classifier (stored for internal nodes when store_node_classifier is set, the default, since :obj:`Descender` and the pruners need it; it can be None, in which case the node is empty); branch_descriptions (strings like 'red' or '>12.3'); branch_sizes (weighted counts of training examples per branch, also usable as probabilities when classifying examples with unknown values); and branch_selector, of type :obj:`Orange.classification.Classifier` since its job resembles a classifier's: given an example it returns a discrete :obj:`Orange.data.Value` indexing a branch, or a value carrying a discrete distribution (DiscDistribution) expressing its opinion of how to divide the example between the branches; whether that proposal is used depends on the chosen :obj:`ExampleSplitter` (learning) or :obj:`Descender` (classifying). The lists branches, branch_descriptions and branch_sizes have equal length and are defined exactly when the node is internal. The method treeSize() becomes tree_size(), returning the number of nodes in the subtree including the node itself. A :obj:`Descender` descends as far down the tree as possible according to the example's attribute values: it calls the node's branch_selector and, on a simple index, follows that branch; on a distribution, descenders differ — one may choose a single branch (for instance the most recommended) or let the branches vote. Three outcomes are possible: the descent reaches a leaf and the descender returns that :obj:`Node`; branch_selector returned a distribution and the descender stops at this internal node, again returning just the :obj:`Node`; or the node is to split the example (decide the class by voting), in which case the descender returns the :obj:`Node` together with vote weights for the branches, which may correspond to the selector's distribution, to the number of learning examples assigned to each branch, or to something else.
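The counting walkthrough translates directly; a pure-Python sketch using the renamed attributes (not Orange's built-in tree_size)::

    def tree_size(node):
        # null nodes don't count
        if node is None:
            return 0
        size = 1                          # this node
        if node.branch_selector:          # internal node: add all subtrees
            for branch in node.branches:
                size += tree_size(branch)
        return size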
 r9450 Classification: when the descender stops (at a leaf or an internal node alike), the reached node's :obj:`node_classifier` decides the class; when it returns a node and a distribution of vote weights, :obj:`TreeClassifier` recursively calls itself for each subtree and combines the predictions. :obj:`vote` calls :obj:`class_distribution` for each branch, multiplies by the weights, sums, and returns a normalized distribution of predictions. :obj:`class_distribution` (formerly classDistribution) takes an optional :obj:`Node` (default: the root) and uses the :obj:`descender`: at a leaf it calls :obj:`node_classifier`, otherwise :obj:`vote`, so the two methods interweave down the tree in a double recursion that only branches at nodes where a vote is needed. The call operator does the same and finally picks the most probable class. On the learner side, the components governing tree structure are :obj:`split` (a :obj:`SplitConstructor`), :obj:`stop` (a :obj:`StopCriteria`) and :obj:`example_splitter` (an :obj:`ExampleSplitter`); by default attributes are binarized, gain ratio selects them, and a leaf requires at least two examples for discrete and five for continuous attributes. Further renamed attributes: :obj:`example_splitter` (the default splits examples according to the distributions given by the selector), :obj:`contingency_computer` (empty by default, so ordinary contingencies are computed; ignored by some split constructors), :obj:`node_learner` (default :obj:`Orange.classification.majority.MajorityLearner`, used for internal nodes and leaves alike), :obj:`descender` (default :obj:`Descender_UnknownMergeAsSelector`, which votes using the :obj:`branch_selector`'s distribution as vote weights), :obj:`max_depth` (0 generates only the root; if you need a hard limit, set one) and the flags store_distributions, store_contingencies, store_examples and store_node_classifier, which decide what each :obj:`Node` stores and whether the :obj:`node_classifier` is built for internal nodes (by default, distributions and node classifiers are stored, contingencies and examples are not). During induction, examples coming from a file or a filter are copied into a table; examples already in a table are copied only when :obj:`store_examples` is set, so that stored pointers keep pointing at the examples even if the user later changes the table (with the flag clear they are used as they are, which would obviously crash a multi-threaded program that mutates the table mid-induction). The contingency matrix is computed next, even when :obj:`store_contingencies` is false, using :obj:`contingency_computer` when given. :obj:`stop` is then called to see whether it is worth continuing; if not, a :obj:`node_classifier` is built and the :obj:`Node` returned; otherwise one is built only when the :obj:`forceNodeClassifier` flag is set. A node's :obj:`node_classifier` comes from the :obj:`node_learner`'s :obj:`smart_learn`, called with the examples, weight ID and the just-computed contingency; a learner that can use the matrix (the default MajorityLearner can) never touches the examples, so the choice of :obj:`contingency_computer` will in many cases affect the :obj:`node_classifier`. The learner may also return no classifier; if one is later needed for classification, the :obj:`TreeClassifier` returns DK or an empty distribution. The contingency is dropped at this point unless it is to be stored, so :obj:`split`, :obj:`stop` and :obj:`example_splitter` may still use it. The :obj:`SplitConstructor` receives the examples, weight ID, contingency, apriori class probabilities, the candidate attributes and the node classifier (if :obj:`store_node_classifier` is left true) and returns most of what a :obj:`Node` needs: a classifier to serve as :obj:`branch_selector`, a list of branch descriptions and the number of examples per branch; it can also refuse to split by returning no classifier, most commonly because of the number of examples in the branches.
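The vote/class_distribution interplay can be sketched in pure Python (hypothetical node objects; distributions are plain lists here, and the descender is assumed to return (node, vote_weights) with weights None when it stopped)::

    def class_distribution(node, instance, descender):
        reached, vote_weights = descender(node, instance)
        if vote_weights is None:              # leaf, or descender chose a node
            return reached.node_classifier(instance)
        return vote(reached, instance, descender, vote_weights)

    def vote(node, instance, descender, weights):
        dist = None
        for w, branch in zip(weights, node.branches):
            if branch is None:                # null branches contribute nothing
                continue
            sub = class_distribution(branch, instance, descender)
            if dist is None:
                dist = [0.0] * len(sub)
            for i, p in enumerate(sub):
                dist[i] += w * p
        total = sum(dist) or 1.0
        return [p / total for p in dist]      # normalized, per the docstring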
 r9450 :obj:`SplitConstructor` fields go snake_case: :obj:`min_subset` sets the minimal number of examples in a branch (null nodes, however, are allowed); if no split meets the condition, induction stops. Its __call__(examples, [weightID=0, apriori_distribution, candidates]) constructs a split and returns a tuple (:obj:`branch_selector`, :obj:`branch_descriptions`, :obj:`subsetSizes`, :obj:`quality`, :obj:`spentAttribute`), where :obj:`spentAttribute` is -1 when no attribute is completely spent by the split criterion; if no split is constructed, the selector, descriptions and subset sizes are None while quality is 0.0 and spentAttribute is -1. apriori_distribution should be an :obj:`Orange.statistics.distribution.Distribution` and candidates a Python list of objects interpreted as booleans, restricting which attributes the constructor considers. :obj:`worst_acceptable` is the lowest split quality still accepted; among ties with the highest score, one is selected at random. :obj:`SplitConstructor_Feature` builds a :obj:`branch_selector` that is an :obj:`orange.ClassifierFromVarFD` returning the selected attribute's value; for a :obj:`Orange.data.variable.Discrete` attribute the :obj:`branch_descriptions` are the attribute's values, and the attribute is marked spent so it cannot reappear in the node's subtrees. :obj:`SplitConstructor_ExhaustiveBinary` returns one of the highest-scoring attributes via the same kind of ClassifierFromVarFD, this time with an attached :obj:`transformer`. :obj:`SplitConstructor_Threshold` returns the attribute yielding the best binary split; its :obj:`transformer` is an :obj:`Orange.feature.discretization.ThresholdDiscretizer`, the branch descriptions are "<threshold" and ">=threshold", and the attribute is not spent. :obj:`SplitConstructor_OneAgainstOthers` simply takes the first attribute with the highest score. In :obj:`SplitConstructor_Combined`, the winning split constructor decides the :obj:`branch_selector`, :obj:`branch_descriptions` and whether the attribute is spent; its parts are :obj:`discrete_split_constructor` (which can be, e.g., :obj:`SplitConstructor_ExhaustiveBinary`) and :obj:`continuous_split_constructor` (which may also be a split constructor you programmed in Python). :obj:`StopCriteria_common` gains :obj:`max_majority` (the maximal proportion of the majority class; induction stops when exceeded) and :obj:`min_examples` (subsets with fewer weighted examples than this are not split further). An :obj:`ExampleSplitter` is given a :obj:`Node` (it may use anything in it, though most splitters only use the :obj:`branch_selector`), the set of examples to divide and the weight ID, and returns a list of example subsets plus, optionally, a list of new weight IDs; most implementations simply call the node's :obj:`branch_selector` and assign each example to the corresponding branch, choosing a particular branch or simply skipping the example when its value is unknown, as sketched below. Pruned trees share components (classifiers, branch selectors, ...) with the original: you may modify the pruned structure (cut it manually, add nodes, replace components), but modifying, say, a node's :obj:`node_classifier` in place (the classifier itself, not a reference to it) would change the corresponding node of the original tree. Pruners can construct neither a :obj:`node_classifier` nor merge those of pruned subtrees into classifiers for new leaves, so a prunable tree needs node classifiers on its internal nodes; fortunately that is the default. This pruner only prunes nodes whose node classifier is an :obj:`Orange.classification.ConstantClassifier` (or a derived class). Leaves with more than one majority class need some care when printing::

    >>> print tree.dump(leaf_str="%V (%M out of %N)")
    petal width<0.800: Iris-setosa (50.000 out of 50.000)
    petal width>=0.800
    ...
    >>> print tree.dump(leaf_str="%V (%^MbA%, %^MbP%)")
    petal width<0.800: Iris-setosa (100%, 100%)
    petal width>=0.800
    ...

 What is the order of numbers here? Checking :samp:`data.domain.class_var.values` shows the order setosa, versicolor, virginica; so in the node at petal length<5.350 there are 49 versicolors and 3 virginicae.
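What most splitters do, per the description above, as a pure-Python sketch (hypothetical node and example objects, following the isSpecial() convention used elsewhere in this changeset)::

    def split_examples(node, examples):
        subsets = [[] for _ in node.branches]
        for ex in examples:
            value = node.branch_selector(ex)
            if value.isSpecial():        # unknown value: this splitter skips it
                continue
            subsets[int(value)].append(ex)
        return subsets, None             # no new weight IDs in this sketch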
 r9450 The format-string examples switch to the new keyword names: tree.dump(leaf_str="%V", node_str=".") (the "." says node_str should equal leaf_str — not very useful here, since leaf_str is trivial anyway); the shrinking share of virginicas down the tree is printed with leaf_str='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)' and node_str='.'; absolute and relative distributions with leaf_str='"%V   %D %.2DbP %.2dbP"' and node_str='"%D %.2DbP %.2dbP"' (root: [50.000, 50.000, 50.000] ...); standard errors and 90% confidence intervals with leaf_str="[SE: %E]\t %V %I(90)" and node_str="[SE: %E]". Is :samp:`%A` the average — doesn't a regression tree always predict the leaf average? Not necessarily: the tree predicts whatever the :attr:`TreeClassifier.node_classifier` in a leaf returns, and :samp:`%V` prints the value via the :obj:`Orange.data.variable.Continuous` formatting, so the printed number keeps the same precision. Counts below a threshold are compared with parent nodes via leaf_str="%C<22 (%cbP<22)" and node_str="." (root: 277.000 (.) ...), and intervals via leaf_str="%C![20,22] (%^cbP![20,22]%)" with node_str=".". The :meth:`TreeClassifier.dump` argument :obj:`user_formats` (formerly userFormats) can print other information in leaves or nodes: it is a list of (regular expression, callback) tuples, checked before the built-in expressions discussed above, so built-ins can be overridden. The expressions describe strings like :samp:`%.2DbP`; when a leaf or internal node is printed, the format string (:obj:`leaf_str` or :obj:`node_str`) is scanned for them and, on a match, the callback is invoked with five arguments — the format string, the match object, the node being printed, its parent (possibly None) and the tree classifier — and returns the format string with the matched part replaced, as in::

    def replaceV(strg, mo, node, parent, tree):
        return insertStr(strg, mo, str(node.node_classifier.default_value))

 which takes the value predicted at the node (:samp:`node.node_classifier.default_value`), converts it to a string and passes it to *insertStr* to do the replacement. In the wrapper around Quinlan's C4.5 code, :obj:`C45Node` gains :obj:`node_type` (:obj:`C45Node.Leaf` is 0), :obj:`items` (the weighted number of learning examples in the node) and :obj:`class_dist` (the node's class distribution, an :obj:`Orange.statistics.distribution.Discrete`). The value tested in :samp:`node.leaf` is unqualified (a :obj:`C45Node` does not know which attribute it belongs to), so it is converted to a string through :obj:`class_var`, passed as an extra argument to the recursive printer; printTree returns _c45_printTree0(self.tree, self.class_var, 0). The branch printer is updated to the new names::

    def _c45_showBranch(node, classvar, lev, i):
        var = node.tested
        str_ = ""
        if node.node_type == 1:
            str_ += "\n" + "|   "*lev + "%s = %s:" % (var.name, var.values[i])
            str_ += _c45_printTree0(node.branch[i], classvar, lev+1)
        elif node.node_type == 2:
            str_ += "\n" + "|   "*lev + "%s %s %.1f:" % (var.name, ["<=", ">"][i], node.cut)
            str_ += _c45_printTree0(node.branch[i], classvar, lev+1)
        return str_

 Leaves (node_type == 0) print as "%s (%.1f)" from classvar.values[int(node.leaf)] and node.items; branches with leaf subtrees are printed before the others, and _printTreeC45 starts the recursion with tree.class_var. The :class:`TreeLearner` wrapper (whose attributes can also be set in the constructor) is renamed in step: :obj:`node_learner` induces a classifier from the examples belonging to a node, the same learner for internal nodes and leaves, defaulting to :obj:`Orange.classification.majority.MajorityLearner`; the split options :obj:`binarization`, :obj:`measure`, :obj:`worst_acceptable` and :obj:`min_subset` configure the default :class:`SplitConstructor_Combined` with separate constructors for discrete and continuous attributes (measure "gainRatio" by default, "retis"/:class:`Orange.feature.scoring.MSE` for regression); :obj:`relief_m` and :obj:`relief_k` set m and k when the measure is Relief; and :obj:`worst_acceptable`, used in pre-pruning, sets the lowest required attribute score — the tree is not grown further at a node whose best attribute scores below it (default: 0).
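A hedged user_formats example modelled on replaceV (the %N2 expression and the reuse of the module's insertStr helper are illustrative)::

    import re

    def replace_count(strg, mo, node, parent, tree):
        # print the (weighted) number of examples in the node
        return insertStr(strg, mo, "%.1f" % node.distribution.abs)

    my_formats = [(re.compile("%N2"), replace_count)]
    print tree.dump(leaf_str="%V %N2", user_formats=my_formats)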
 r9450 So, to allow splitting only when gain ratio (the default measure) exceeds 0.6, set :samp:`worst_acceptable=0.6`. The remaining options: :obj:`min_subset`, the minimal number of examples in non-null leaves (default: 0); :obj:`min_examples`, below which data subsets are not split further, so every leaf in the tree holds at least that many weighted examples (default: 0); :obj:`max_depth`, where 0 generates only the root (default: 100); and :obj:`max_majority`, which stops induction once the proportion of the majority class in a node exceeds it (default: 1.0). To stop as soon as the majority class reaches 70%, use :samp:`max_majority=0.7`, as in the example below; the printed numbers show the majority-class proportion at each node (the script tree2.py induces such a tree). A user-supplied :obj:`stop` function replaces the built-in stopping criterion; see the documentation of :class:`StopCriteria` for its signature. When used, :obj:`max_majority` and :obj:`min_examples` are not considered (default: None; the default criterion stops induction when all examples share one class). Record keeping: :obj:`store_distributions`, :obj:`store_contingencies`, :obj:`store_examples` and :obj:`store_node_classifier` determine what is stored in each :class:`Node` and whether a :obj:`node_classifier` is built for internal nodes. Everything except :obj:`store_examples` is enabled by default; disabling distributions while storing contingencies saves no memory, since the distribution is the very one stored in :obj:`contingency.classes`.
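The 70% stopping rule from the text, as a short hedged example (the dataset and leaf format are illustrative)::

    import Orange

    data = Orange.data.Table("iris")
    learner = Orange.classification.tree.TreeLearner(max_majority=0.7)
    tree = learner(data)
    print tree.dump(leaf_str="%V (%^MbP%)")   # majority-class proportions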
""" if not self._handset_split and not self.measure: measure = fscoring.GainRatio() \ if examples.domain.classVar.varType == Orange.data.Type.Discrete \ if examples.domain.class_var.var_type == Orange.data.Type.Discrete \ else fscoring.MSE() bl.split.continuousSplitConstructor.measure = measure bl.split.discreteSplitConstructor.measure = measure bl.split.continuous_split_constructor.measure = measure bl.split.discrete_split_constructor.measure = measure if self.splitter != None: """ split = SplitConstructor_Combined() split.continuousSplitConstructor = \ split.continuous_split_constructor = \ SplitConstructor_Threshold() binarization = getattr(self, "binarization", 0) if binarization == 1: split.discreteSplitConstructor = \ split.discrete_split_constructor = \ SplitConstructor_ExhaustiveBinary() elif binarization == 2: split.discreteSplitConstructor = \ split.discrete_split_constructor = \ SplitConstructor_OneAgainstOthers() else: split.discreteSplitConstructor = \ split.discrete_split_constructor = \ SplitConstructor_Feature() measureIsRelief = isinstance(measure, fscoring.Relief) relM = getattr(self, "reliefM", None) relM = getattr(self, "relief_m", None) if relM and measureIsRelief: measure.m = relM relK = getattr(self, "reliefK", None) relK = getattr(self, "relief_k", None) if relK and measureIsRelief: measure.k = relK split.continuousSplitConstructor.measure = measure split.discreteSplitConstructor.measure = measure wa = getattr(self, "worstAcceptable", 0) split.continuous_split_constructor.measure = measure split.discrete_split_constructor.measure = measure wa = getattr(self, "worst_acceptable", 0) if wa: split.continuousSplitConstructor.worstAcceptable = wa split.discreteSplitConstructor.worstAcceptable = wa ms = getattr(self, "minSubset", 0) split.continuous_split_constructor.worst_acceptable = wa split.discrete_split_constructor.worst_acceptable = wa ms = getattr(self, "min_subset", 0) if ms: split.continuousSplitConstructor.minSubset = ms split.discreteSplitConstructor.minSubset = ms split.continuous_split_constructor.min_subset = ms split.discrete_split_constructor.min_subset = ms return split """ stop = Orange.classification.tree.StopCriteria_common() mm = getattr(self, "maxMajority", 1.0) mm = getattr(self, "max_majority", 1.0) if mm < 1.0: stop.maxMajority = self.maxMajority me = getattr(self, "minExamples", 0) stop.max_majority = self.max_majority me = getattr(self, "min_examples", 0) if me: stop.minExamples = self.minExamples stop.min_examples = self.min_examples return stop learner.stop = self.stop for a in ["storeDistributions", "storeContingencies", "storeExamples", "storeNodeClassifier", "nodeLearner", "maxDepth"]: for a in ["store_distributions", "store_contingencies", "store_examples", "store_node_classifier", "node_learner", "max_depth"]: if hasattr(self, a): setattr(learner, a, getattr(self, a)) _built_fn = { "split": [ _build_split, [ "binarization", "measure", "reliefM", "reliefK", "worstAcceptable", "minSubset" ] ], \ "stop": [ _build_stop, ["maxMajority", "minExamples" ] ] "split": [ _build_split, [ "binarization", "measure", "relief_m", "relief_k", "worst_acceptable", "min_subset" ] ], \ "stop": [ _build_stop, ["max_majority", "min_examples" ] ] } TreeLearner = Orange.misc.deprecated_members({ "mForPruning": "m_pruning", "sameMajorityPruning": "same_majority_pruning" "sameMajorityPruning": "same_majority_pruning", "reliefM": "relief_m", "reliefK": "relief_k", "storeDistributions": "store_distributions", "storeContingencies": "store_contingencies", 
"storeExamples": "store_examples", "storeNodeClassifier": "store_node_classifier", "worstAcceptable": "worst_acceptable", "minSubset": "min_subset", "maxMajority": "max_majority", "minExamples": "min_examples", "maxDepth": "max_depth", "nodeLearner": "node_learner" }, wrap_methods=[])(TreeLearner) def replaceV(strg, mo, node, parent, tree): return insertStr(strg, mo, str(node.nodeClassifier.defaultValue)) return insertStr(strg, mo, str(node.node_classifier.default_value)) def replaceN(strg, mo, node, parent, tree): def replaceM(strg, mo, node, parent, tree): by = mo.group("by") maj = int(node.nodeClassifier.defaultValue) maj = int(node.node_classifier.default_value) N = node.distribution[maj] if by: def replacem(strg, mo, node, parent, tree): by = mo.group("by") maj = int(node.nodeClassifier.defaultValue) maj = int(node.node_classifier.default_value) if node.distribution.abs > 1e-30: N = node.distribution[maj] / node.distribution.abs def replaceCdisc(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Discrete: if tree.class_var.var_type != Orange.data.Type.Discrete: return insertDot(strg, mo) def replacecdisc(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Discrete: if tree.class_var.var_type != Orange.data.Type.Discrete: return insertDot(strg, mo) def replaceCcont(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) def replaceccont(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) def replaceCconti(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) def replacecconti(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) def replaceD(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Discrete: if tree.class_var.var_type != Orange.data.Type.Discrete: return insertDot(strg, mo) def replaced(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Discrete: if tree.class_var.var_type != Orange.data.Type.Discrete: return insertDot(strg, mo) def replaceAE(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) def replaceI(strg, mo, node, parent, tree): if tree.classVar.varType != Orange.data.Type.Continuous: if tree.class_var.var_type != Orange.data.Type.Continuous: return insertDot(strg, mo) self.leafStr = leafStr else: if self.node().node_classifier.classVar.varType == \ if self.node().node_classifier.class_var.var_type == \ Orange.data.Type.Discrete: self.leafStr = "%V (%^.2m%)" def showBranch(self, node, parent, lev, i): bdes = node.branchDescriptions[i] bdes = node.branchSelector.classVar.name + \ bdes = node.branch_descriptions[i] bdes = node.branch_selector.class_var.name + \ (bdes[0] not in "<=>" and "=" or "") + bdes if node.branches[i]: return label = node.branchSelector.classVar.name label = node.branch_selector.class_var.name if self.nodeStr: label += "\\n" + self.formatString(self.nodeStr, node, parent) (_quoteName(internalName), _quoteName(internalBranchName), node.branchDescriptions[i])) 
 r9450 _TreeDumper's dot output likewise uses node.branch_descriptions, and __str__ simply returns self.dump(). dump() and dot() get explicit keyword arguments with deprecation shims::

    @Orange.misc.deprecated_keywords({"fileName": "file_name",
        "leafStr": "leaf_str", "nodeStr": "node_str",
        "userFormats": "user_formats", "minExamples": "min_examples",
        "maxDepth": "max_depth", "simpleFirst": "simple_first"})
    def dump(self, leaf_str="", node_str="",
             user_formats=[], min_examples=0, max_depth=1e10,
             simple_first=True):

 dump returns a string representation of the tree: :obj:`leaf_str` is the format string for leaves (if empty, "%V (%^.2m%)" is used for classification trees and "%V" for regression trees); :obj:`node_str` formats internal nodes (empty, the default, prints nothing for them; :samp:`.` reuses the leaf string); :obj:`max_depth` limits the printed depth; :obj:`min_examples` hides subtrees with fewer examples; :obj:`simple_first` (default True) prints branches with a single node before branches with larger subtrees, otherwise branches appear in order; and :obj:`user_formats` supplies the regular-expression callbacks described above. The body forwards these directly: _TreeDumper(leaf_str, node_str, user_formats + _TreeDumper.defaultStringFormats, min_examples, max_depth, simple_first, self).dumpTree(). dot(fileName, ...) is decorated the same way (plus leafShape → leaf_shape, nodeShape → node_shape) and prints the tree to a file in the format used by GraphViz: :obj:`leaf_shape` and :obj:`node_shape` set the outline shapes around leaves and internal nodes, with "plaintext" (the default for both) meaning no outline; see GraphViz's polygon-based nodes for the available shapes. fileName may be a path or an open file (fle = type(fileName) == str and open(fileName, "wt") or fileName), after which the dumper is invoked with leafShape=leaf_shape, nodeShape=node_shape and fle=fle. A count_nodes method follows.
• ## orange/Orange/data/io.py

 r9450 The loader now builds variables and tables through the Orange 2.5 namespace (previously orange.Variable.make, orange.VarTypes.*, orange.Domain, orange.ExampleTable and orange.Example)::

    attributeLoadStatus = {}
    def make_float(name):
        attr, s = Orange.data.variable.make(name, Orange.data.Type.Continuous,
                                            [], [], create_on_new)
        attributeLoadStatus[attr] = s
        return attr
    def make_disc(name, unordered):
        attr, s = Orange.data.variable.make(name, Orange.data.Type.Discrete,
                                            [], unordered, create_on_new)
        attributeLoadStatus[attr] = s
        return attr
    ...
    attributeLoadStatus = [attributeLoadStatus[attr] for attr in attributes] + \
                          [attributeLoadStatus[classVar]]
    domain = Orange.data.Domain(attributes, classVar)
    table = Orange.data.Table([Orange.data.Instance(domain,
                [ex.get(attr, attr("?")) for attr in attributes] + [c])
                for ex, c in zip(values, classes)])
    table.attribute_load_status = attributeLoadStatus
    return table
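The make() contract used above, sketched with a hedged status constant (MakeStatus.Incompatible is an assumption)::

    import Orange
    from Orange.data import variable

    create_on_new = variable.Variable.MakeStatus.Incompatible   # assumed constant
    attr, status = variable.make("petal length", Orange.data.Type.Continuous,
                                 [], [], create_on_new)
    print attr.name, status   # status reports whether an existing variable was reused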
• ## orange/Orange/distance/instances.py

 r8059 res = diff * self.icm * diff.transpose() return res[0,0]**0.5 class PearsonRAbsoluteConstructor(PearsonRConstructor): """ Construct an instance of PearsonRAbsolute example distance estimator. """ def __call__(self, data): indxs = [i for i, a in enumerate(data.domain.attributes) \ if a.varType==Orange.data.Type.Continuous] return PearsonRAbsolute(domain=data.domain, indxs=indxs) class PearsonRAbsolute(PearsonR): """ An example distance estimator using the absolute value of the Pearson correlation coefficient. """ def __call__(self, e1, e2): """ Return absolute Pearson's dissimilarity between e1 and e2, i.e. .. math:: (1 - |r|)/2 where r is Pearson's correlation coefficient. """ X1 = []; X2 = [] for i in self.indxs: if not(e1[i].isSpecial() or e2[i].isSpecial()): X1.append(float(e1[i])) X2.append(float(e2[i])) if not X1: return 1.0 try: return (1.0 - abs(statc.pearsonr(X1, X2)[0])) except: return 1.0 class SpearmanRAbsoluteConstructor(SpearmanRConstructor): """ Construct an instance of SpearmanRAbsolute example distance estimator. """ def __call__(self, data): indxs = [i for i, a in enumerate(data.domain.attributes) \ if a.varType==Orange.data.Type.Continuous] return SpearmanRAbsolute(domain=data.domain, indxs=indxs) class SpearmanRAbsolute(SpearmanR): def __call__(self, e1, e2): """ Return absolute Spearman's dissimilarity between e1 and e2, i.e. .. math:: (1 - |r|)/2 where r is Spearman's correlation coefficient. """ X1 = []; X2 = [] for i in self.indxs: if not(e1[i].isSpecial() or e2[i].isSpecial()): X1.append(float(e1[i])) X2.append(float(e2[i])) if not X1: return 1.0 try: return (1.0 - abs(statc.spearmanr(X1, X2)[0])) except: return 1.0 def distance_matrix(data, distance_constructor, progress_callback=None): """ A helper function that computes an :obj:Orange.core.SymMatrix of all pairwise distances between instances in data. :param data: A data table :type data: :obj:Orange.data.Table :param distance_constructor: An ExamplesDistance_Constructor instance. :type distance_constructor: :obj:Orange.distance.ExampleDistConstructor """ from Orange.misc import progressBarMilestones as progress_milestones matrix = Orange.core.SymMatrix(len(data)) dist = distance_constructor(data) msize = len(data)*(len(data) - 1)/2 milestones = progress_milestones(msize, 100) count = 0 for i in range(len(data)): for j in range(i + 1, len(data)): matrix[i, j] = dist(data[i], data[j]) if progress_callback and count in milestones: progress_callback(100.0 * count / msize) count += 1 return matrix
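A minimal usage sketch for the distance_matrix helper above, assuming the iris data set::

    import Orange
    from Orange.distance.instances import PearsonRAbsoluteConstructor, \
        distance_matrix

    data = Orange.data.Table("iris")
    # symmetric matrix of (1 - |r|)/2 dissimilarities between instances
    matrix = distance_matrix(data, PearsonRAbsoluteConstructor())
    print matrix[0, 1]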
• ## orange/Orange/evaluation/scoring.py

 r9475 for te in res.results: ress[te.iteration_number].results.append(te) return ress return ress def split_by_classifiers(res): """ Splits an instance of :obj:ExperimentResults into a list of :obj:ExperimentResults, one for each classifier. """ split_res = [] for i in range(len(res.classifierNames)): r = Orange.evaluation.testing.ExperimentResults(res.numberOfIterations, [res.classifierNames[i]], res.classValues, weights=res.weights, baseClass=res.baseClass, classifiers=[res.classifiers[i]] if res.classifiers else []) r.results = [] for te in res.results: r.results.append(Orange.evaluation.testing.TestedExample(te.iterationNumber, te.actualClass, n=1, weight=te.weight)) r.results[-1].classes = [te.classes[i]] r.results[-1].probabilities = [te.probabilities[i]] split_res.append(r) return split_res
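A minimal usage sketch for split_by_classifiers, assuming Orange.evaluation.testing provides a cross_validation helper with this signature (learner names are illustrative)::

    import Orange
    from Orange.evaluation import testing, scoring

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(name="bayes"),
                Orange.classification.majority.MajorityLearner(name="default")]
    res = testing.cross_validation(learners, data, folds=5)
    # one ExperimentResults per classifier
    for one in scoring.split_by_classifiers(res):
        print one.classifierNames[0], scoring.CA(one)[0]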
• ## orange/Orange/feature/__init__.py

 r8042 """ .. index:: feature This module provides functionality for feature scoring, selection, discretization, continuization, imputation, construction and feature interaction analysis. ======= Scoring ======= .. automodule:: Orange.feature.scoring ========= Selection ========= .. automodule:: Orange.feature.selection ============== Discretization ============== .. automodule:: Orange.feature.discretization ============== Continuization ============== .. index:: continuization .. automodule:: Orange.feature.continuization ========== Imputation ========== .. automodule:: Orange.feature.imputation Feature scoring, selection, discretization, continuization, imputation, construction and feature interaction analysis. """
• ## orange/Orange/feature/continuization.py

 r8042 """ ################################### Continuization (continuization) ################################### """ from Orange.core import DomainContinuizer
• ## orange/Orange/feature/discretization.py

 r8042 """ ################################### Discretization (discretization) ################################### .. index:: discretization
• ## orange/Orange/feature/imputation.py

 r8042 """ ########################### Imputation (imputation) ########################### .. index:: imputation
• ## orange/Orange/feature/scoring.py

 r8042 """ ##################### Scoring (scoring) ##################### .. index:: feature scoring single: feature; feature scoring Feature scoring is used in feature subset selection for classification problems. The goal is to find "good" features that are relevant for the given classification task. Here is a simple script that reads the data, uses :obj:attMeasure to derive feature scores and prints them for the three best scored features. The same scoring function is then used to report (only) on the three best scored features. Feature scoring is the assessment of the usefulness of the feature for prediction of the dependent (class) variable. To compute the information gain of feature "tear_rate" in the Lenses data set (loaded into data) use: >>> meas = Orange.feature.scoring.InfoGain() >>> print meas("tear_rate", data) 0.548794925213 Apart from information gain you could also use other scoring methods; see :ref:classification and :ref:regression. Various ways to call them are described on :ref:callingscore. It is possible to construct the object and use it on-the-fly:: >>> print Orange.feature.scoring.InfoGain("tear_rate", data) 0.548794925213 But constructing new instances for each feature is slow for scoring methods that use caching, such as :obj:Relief. Scoring features that are not in the domain is also possible. For instance, discretized features can be scored without producing a data table in advance (slow with :obj:Relief): .. literalinclude:: code/scoring-info-iris.py :lines: 7-11 The following example computes feature scores, both with :obj:score_all and by scoring each feature individually, and prints out the best three features. .. _scoring-all.py: code/scoring-all.py .. _voting.tab: code/voting.tab scoring-all.py_ (uses voting.tab_): .. literalinclude:: code/scoring-all.py :lines: 7- The script should output this:: Feature scores for best three features: The output:: Feature scores for best three features (with score_all): 0.613 physician-fee-freeze 0.255 adoption-of-the-budget-resolution 0.255 el-salvador-aid 0.228 synfuels-corporation-cutback .. autoclass:: Orange.feature.scoring.OrderAttributesByMeasure :members: .. automethod:: Orange.feature.scoring.MeasureAttribute_Distance .. autoclass:: Orange.feature.scoring.MeasureAttribute_DistanceClass :members: .. automethod:: Orange.feature.scoring.MeasureAttribute_MDL .. autoclass:: Orange.feature.scoring.MeasureAttribute_MDLClass :members: .. automethod:: Orange.feature.scoring.mergeAttrValues .. automethod:: Orange.feature.scoring.attMeasure ============ Base Classes ============ There are a number of different measures for assessing the relevance of features with respect to how much information they contain about the corresponding class. These procedures are also known as feature scoring. Orange implements several methods that all stem from :obj:Orange.feature.scoring.Measure. The most common ones compute certain statistics on conditional distributions of class values given the feature values; in Orange, these are derived from :obj:Orange.feature.scoring.MeasureAttributeFromProbabilities. .. class:: Measure This is the base class for a wide range of classes that measure the quality of features. The class itself is, naturally, abstract. Its fields merely describe what kinds of features it can handle and what kind of data it requires. .. attribute:: handlesDiscrete Tells whether the measure can handle discrete features. .. attribute:: handlesContinuous Tells whether the measure can handle continuous features. ..
attribute:: computesThresholds Tells whether the measure implements the :obj:thresholdFunction. .. attribute:: needs Tells what kind of data the measure needs. This can be either :obj:NeedsGenerator, :obj:NeedsDomainContingency, :obj:NeedsContingency_Class. The first needs an instance generator (Relief is an example of such a measure), the second can compute the quality from :obj:Orange.statistics.contingency.Domain and the latter only needs the contingency (:obj:Orange.statistics.contingency.VarClass), the feature distribution and the apriori class distribution. Most measures only need the latter. Several (but not all) measures can treat unknown feature values in different ways, depending on field :obj:unknownsTreatment (this field is not defined in :obj:Measure but in many derived classes). Undefined values can be: * ignored (:obj:Measure.IgnoreUnknowns); this has the same effect as if the examples for which the feature value is unknown were removed. * punished (:obj:Measure.ReduceByUnknown); the feature quality is reduced by the proportion of unknown values. In impurity measures, this can be interpreted as if the impurity is decreased only on examples for which the value is defined and stays the same for the others, and the feature quality is the average impurity decrease. * imputed (:obj:Measure.UnknownsToCommon); here, undefined values are replaced by the most common feature value. If you want a more clever imputation, you should do it in advance. * treated as a separate value (:obj:MeasureAttribute.UnknownsAsValue). The default treatment is :obj:ReduceByUnknown, which is optimal in most cases and does not make additional presumptions (as, for instance, :obj:UnknownsToCommon, which supposes that missing values are not, for instance, results of measurements that were not performed due to information extracted from the other features). Use other treatments if you know that they make better sense on your data. The only method supported by all measures is the call operator to which we pass the data and get the number representing the quality of the feature. The number does not have any absolute meaning and can vary widely for different feature measures. The only common characteristic is that the higher the value, the better the feature. If the feature is so bad that its quality cannot be measured, the measure returns :obj:Measure.Rejected. None of the measures described here do so. There are different sets of arguments that the call operator can accept. Not all classes will accept all kinds of arguments. Relief, for instance, cannot be computed from contingencies alone. Besides, the feature and the class need to be of the correct type for a particular measure. There are three call operators just to make your life simpler and faster. When working with the data, your method might have already computed, for instance, the contingency matrix. If so and if the quality measure you use is OK with that (as most measures are), you can pass the contingency matrix and the measure will compute much faster. If, on the other hand, you only have examples and haven't computed any statistics on them, you can pass examples (and, optionally, an id for a meta-feature with weights) and the measure will compute the contingency itself, if needed. .. method:: __call__(attribute, examples[, apriori class distribution][, weightID]) .. method:: __call__(attribute, domain contingency[, apriori class distribution]) ..
method:: __call__(contingency, class distribution[, apriori class distribution]) :param attribute: gives the feature whose quality is to be assessed. This can be either a descriptor, an index into domain or a name. In the first form, if the feature is given by descriptor, it doesn't need to be in the domain. It needs to be computable from the features in the domain, though. Data is given either as examples (and, optionally, an id for a meta-feature with weight), contingency tables (:obj:Orange.statistics.contingency.Domain) or distributions (:obj:Orange.statistics.distribution.Distribution) for all attributes. In the latter form, what is given as the class distribution depends upon what you do with unknown values (if there are any).  If :obj:unknownsTreatment is :obj:IgnoreUnknowns, the class distribution should be computed on examples for which the feature value is defined. Otherwise, the class distribution should be the overall class distribution. The optional argument with the apriori class distribution is most often ignored. It comes in handy if the measure makes any probability estimates based on apriori class probabilities (such as the m-estimate). .. method:: thresholdFunction(attribute, examples[, weightID]) This function computes the qualities for different binarizations of the continuous feature :obj:attribute. The feature should of course be continuous. The result of a function is a list of tuples, where the first element represents a threshold (all splits in the middle between two existing feature values), the second is the measured quality for a corresponding binary feature and the last one is the distribution which gives the number of examples below and above the threshold. The last element, though, may be missing; generally, if the particular measure can get the distribution without any computational burden, it will do so and the caller can use it. If not, the caller needs to compute it itself. The script below shows different ways to assess the quality of astigmatic, tear rate and the first feature (whichever it is) in the dataset lenses. .. literalinclude:: code/scoring-info-lenses.py :lines: 7-21 As for many other classes in Orange, you can construct the object and use it on-the-fly. For instance, to measure the quality of feature "tear_rate", you could write simply:: >>> print Orange.feature.scoring.Info("tear_rate", data) 0.548794984818 You shouldn't use this shortcut with ReliefF, though; see the explanation in the section on ReliefF. It is also possible to assess the quality of features that do not exist in the domain. For instance, you can assess the quality of discretized features without constructing a new domain and dataset that would include them. scoring-info-iris.py_ (uses iris.tab_): .. literalinclude:: code/scoring-info-iris.py :lines: 7-11 The quality of the new feature d1 is assessed on data, which does not include the new feature at all. (Note that ReliefF won't do that since it would be too slow. ReliefF requires the feature to be present in the dataset.) Finally, you can compute the quality of meta-features. The following script adds a meta-feature to an example table, initializes it to random values and measures its information gain. scoring-info-lenses.py_ (uses lenses.tab_): .. literalinclude:: code/scoring-info-lenses.py :lines: 54- To show the computation of thresholds, we shall use the Iris data set. scoring-info-iris.py_ (uses iris.tab_): ..
literalinclude:: code/scoring-info-iris.py :lines: 7-15 If we hadn't constructed the feature in advance, we could write Orange.feature.scoring.Relief().thresholdFunction("petal length", data). This is not recommended for ReliefF, since it may be a lot slower. The script below finds and prints out the best threshold for binarization of a feature, that is, the threshold with which the resulting binary feature will have the optimal ReliefF (or any other measure):: thresh, score, distr = meas.bestThreshold("petal length", data) print "Best threshold: %5.3f (score %5.3f)" % (thresh, score) .. class:: MeasureAttributeFromProbabilities This is the abstract base class for feature quality measures that can be computed from contingency matrices only. It relieves the derived classes from having to compute the contingency matrix by defining the first two forms of the call operator. (Well, that's not something you need to know if you only work in Python.) An additional feature of this class is that you can set probability estimators. If none are given, probabilities and conditional probabilities of classes are estimated by relative frequencies. .. attribute:: unknownsTreatment Defines what to do with unknown values. See the possibilities described above. .. attribute:: estimatorConstructor .. attribute:: conditionalEstimatorConstructor The classes that are used to estimate unconditional and conditional probabilities of classes, respectively. You can set this to, for instance, :obj:ProbabilityEstimatorConstructor_m and :obj:ConditionalProbabilityEstimatorConstructor_ByRows (with the estimator constructor again set to :obj:ProbabilityEstimatorConstructor_m), respectively. Feature scores for best three features (scored individually): 0.613 physician-fee-freeze 0.255 el-salvador-aid 0.228 synfuels-corporation-cutback .. comment:: The next script uses :obj:GainRatio and :obj:Relief. .. literalinclude:: code/scoring-relief-gainRatio.py :lines: 7- Notice that on this data the ranks of features match:: Relief GainRt Feature 0.613  0.752  physician-fee-freeze 0.255  0.444  el-salvador-aid 0.228  0.414  synfuels-corporation-cutback 0.189  0.382  crime 0.166  0.345  adoption-of-the-budget-resolution .. _callingscore: ======================= Calling scoring methods ======================= To score a feature use :obj:Score.__call__. There are different function signatures, which enable optimization. For instance, if the contingency matrix has already been computed, you can speed up the computation by passing it to the scoring method (if it supports that form - most do). Otherwise the scoring method will have to compute the contingency itself. Not all classes accept all kinds of arguments. :obj:Relief, for instance, only supports the form with instances on the input. .. method:: Score.__call__(attribute, instances[, apriori_class_distribution][, weightID]) :param attribute: the chosen feature, either as a descriptor, index, or a name. :type attribute: :class:Orange.data.variable.Variable or int or string :param instances: data. :type instances: Orange.data.Table :param weightID: id for meta-feature with weight. All scoring methods need to support these parameters. .. method:: Score.__call__(attribute, domain_contingency[, apriori_class_distribution]) :param attribute: the chosen feature, either as a descriptor, index, or a name. :type attribute: :class:Orange.data.variable.Variable or int or string :param domain_contingency: :type domain_contingency: :obj:Orange.statistics.contingency.Domain ..
method:: Score.__call__(contingency, class_distribution[, apriori_class_distribution]) :param contingency: :type contingency: :obj:Orange.statistics.contingency.VarClass :param class_distribution: distribution of the class variable. If :obj:unknowns_treatment is :obj:IgnoreUnknowns, it should be computed on instances where the feature value is defined. Otherwise, the class distribution should be the overall class distribution. :type class_distribution: :obj:Orange.statistics.distribution.Distribution :param apriori_class_distribution: Optional and most often ignored. Useful if the scoring method makes any probability estimates based on apriori class probabilities (such as the m-estimate). :return: Feature score - the higher the value, the better the feature. If the quality cannot be scored, return :obj:Score.Rejected. :rtype: float or :obj:Score.Rejected. The code below scores the same feature with :obj:GainRatio in different ways. .. literalinclude:: code/scoring-calls.py :lines: 7- .. _classification: =========================== Measures for Classification Feature scoring in classification problems =========================== This script scores features with gain ratio and relief. scoring-relief-gainRatio.py_ (uses voting.tab_): .. literalinclude:: code/scoring-relief-gainRatio.py :lines: 7- Notice that on this data the ranks of features match rather well:: Relief GainRt Feature 0.613  0.752  physician-fee-freeze 0.255  0.444  el-salvador-aid 0.228  0.414  synfuels-corporation-cutback 0.189  0.382  crime 0.166  0.345  adoption-of-the-budget-resolution The following section describes the feature quality measures suitable for discrete features and outcomes. See  scoring-info-lenses.py_, scoring-info-iris.py_, scoring-diff-measures.py_ and scoring-regression.py_ for more examples on their use. .. Undocumented: MeasureAttribute_IM, MeasureAttribute_chiSquare, MeasureAttribute_gainRatioA, MeasureAttribute_logOddsRatio, MeasureAttribute_splitGain. .. index:: .. class:: InfoGain The most popular measure, information gain :obj:Info measures the expected decrease of the entropy. Information gain - the expected decrease of entropy. See the page on Wikipedia _. .. index:: .. class:: GainRatio Gain ratio :obj:GainRatio was introduced by Quinlan in order to avoid overestimation of multi-valued features. It is computed as information gain divided by the entropy of the feature's value. (It has been shown, however, that such a measure still overestimates the features with multiple values.) Information gain ratio - information gain divided by the entropy of the feature's value. Introduced in [Quinlan1986]_ in order to avoid overestimation of multi-valued features. It has been shown, however, that it still overestimates features with multiple values. See Wikipedia _. .. index:: .. class:: Gini Gini index :obj:Gini was first introduced by Breiman and can be interpreted as the probability that two randomly chosen examples will have different classes. Gini index is the probability that two randomly chosen instances will have different classes. See Gini coefficient on Wikipedia _. .. index:: .. class:: Relevance Relevance of features :obj:Relevance is a measure that discriminates between features on the basis of their potential value in the formation of decision rules. The potential value for decision rules. .. index:: ..
class:: Cost Evaluates features based on the "saving" achieved by knowing the value of Evaluates features based on the cost decrease achieved by knowing the value of feature, according to the specified cost matrix. .. attribute:: cost Cost matrix, see :obj:Orange.classification.CostMatrix for details. If the cost of predicting the first class for an example that is actually in Cost matrix, see :obj:Orange.classification.CostMatrix for details. If the cost of predicting the first class of an instance that is actually in the second is 5, and the cost of the opposite error is 1, then an appropriate measure can be constructed and used for feature 3 as follows:: score can be constructed as follows:: >>> meas = Orange.feature.scoring.Cost() 0.083333350718021393 This tells us that knowing the value of feature 3 would decrease the classification cost by approximately 0.083 per example. Knowing the value of feature 3 would decrease the classification cost by approximately 0.083 per instance. .. comment:: opposite error - is this term correct? TODO .. index:: .. class:: Relief ReliefF :obj:Relief was first developed by Kira and Rendell and then substantially generalized and improved by Kononenko. It measures the usefulness of features based on their ability to distinguish between very similar examples belonging to different classes. Assesses features' ability to distinguish between very similar instances from different classes. This scoring method was first developed by Kira and Rendell and then improved by Kononenko. The class :obj:Relief works on discrete and continuous classes and thus implements ReliefF and RReliefF. .. attribute:: k Number of neighbours for each example. Default is 5. Number of neighbours for each instance. Default is 5. .. attribute:: m Number of reference examples. Default is 100. Set to -1 to take all the examples. .. attribute:: checkCachedData A flag best left alone unless you know what you are doing. Computation of ReliefF is rather slow since it needs to find k nearest neighbours for each of m reference examples (or all examples, if m is set to -1). Since we normally compute ReliefF for all features in the dataset, :obj:Relief caches the results. When it is called to compute the quality of a certain feature, it computes qualities for all features in the dataset. When called again, it uses the stored results if the data has not changed (the domain is still the same and the example table has not changed). Checking is done by comparing the data table version (see :obj:Orange.data.Table for details) and then computing a checksum of the data and comparing it with the previous checksum. The latter can take some time on large tables, so you may want to disable it by setting checkCachedData to :obj:False. In most cases it will do no harm, except when the data is changed in such a way that it passes unnoticed by the version control, in which case the computed ReliefFs can be false. Hence: disable it if you know that the data does not change or if you know what kind of changes are detected by the version control. Caching will only have an effect if you use the same instance for all features in the domain. So, don't do this:: for attr in data.domain.attributes: print Orange.feature.scoring.Relief(attr, data) In this script, cached data dies together with the instance of :obj:Relief, which is constructed and destructed for each feature separately.
It is much faster to do it like this:: meas = Orange.feature.scoring.Relief() for attr in table.domain.attributes: print meas(attr, data) When called for the first time, meas will compute ReliefF for all features and the subsequent calls simply return the stored data. Class :obj:Relief works on discrete and continuous classes and thus implements the functionality of the algorithms ReliefF and RReliefF. .. note:: ReliefF can also compute the threshold function, that is, the feature quality at different thresholds for binarization. Finally, here is an example which shows what can happen if you disable the computation of checksums:: table = Orange.data.Table("iris") r1 = Orange.feature.scoring.Relief() r2 = Orange.feature.scoring.Relief(checkCachedData = False) print "%.3f\\t%.3f" % (r1(0, table), r2(0, table)) for ex in table: ex[0] = 0 print "%.3f\\t%.3f" % (r1(0, table), r2(0, table)) The first print prints the same number, 0.321, twice. Then we zero out the first feature. r1 notices it and returns -1 as its ReliefF, while r2 does not and returns the same number, 0.321, which is now wrong. Number of reference instances. Default is 100. Set to -1 to take all the instances. .. attribute:: check_cached_data Check if the cached data is changed with a data checksum. Slow on large tables.  Defaults to :obj:True. Disable it if you know that the data will not change. ReliefF is slow since it needs to find k nearest neighbours for each of m reference instances. As we normally compute ReliefF for all features in the dataset, :obj:Relief caches the results for all features, when called to score a certain feature.  When called again, it uses the stored results if the domain and the data table have not changed (data table version and the data checksum are compared). Caching will only work if you use the same object. Constructing new instances of :obj:Relief for each feature, like this:: for attr in data.domain.attributes: print Orange.feature.scoring.Relief(attr, data) runs much slower than reusing the same instance:: meas = Orange.feature.scoring.Relief() for attr in table.domain.attributes: print meas(attr, data) .. note:: Relief can also compute the threshold function, that is, the feature quality at different thresholds for binarization. .. autoclass:: Orange.feature.scoring.Distance .. autoclass:: Orange.feature.scoring.MDL .. _regression: ======================= Measures for Regression Feature scoring in regression problems ======================= Except for ReliefF, the only feature quality measure available for regression problems is based on a mean square error. You can also use :obj:Relief for regression. .. index:: .. class:: MSE Implements the mean square error measure. .. attribute:: unknownsTreatment Tells what to do with unknown feature values. See the description at the top of this page. Implements the mean square error score. .. attribute:: unknowns_treatment What to do with unknown values. See :obj:Score.unknowns_treatment. .. attribute:: m Parameter for the m-estimate of error. Default is 0 (no m-estimate). ========== References ========== * Igor Kononenko, Matjaz Kukar: Machine Learning and Data Mining, Parameter for the m-estimate of error. Default is 0 (no m-estimate). ============ Base Classes ============ Implemented methods for scoring relevances of features to the class are subclasses of :obj:Score. Those that compute statistics on conditional distributions of class values given the feature values are derived from :obj:ScoreFromProbabilities. .. class:: Score Abstract base class for feature scoring.
Its attributes describe which features it can handle and the required data. **Capabilities** .. attribute:: handles_discrete Indicates whether the scoring method can handle discrete features. .. attribute:: handles_continuous Indicates whether the scoring method can handle continuous features. .. attribute:: computes_thresholds Indicates whether the scoring method implements the :obj:threshold_function. **Input specification** .. attribute:: needs The type of data needed: :obj:Generator, :obj:DomainContingency, or :obj:Contingency_Class. .. attribute:: Generator Constant. Indicates that the scoring method needs an instance generator on the input (as, for example, :obj:Relief). .. attribute:: DomainContingency Constant. Indicates that the scoring method needs :obj:Orange.statistics.contingency.Domain. .. attribute:: Contingency_Class Constant. Indicates that the scoring method needs the contingency (:obj:Orange.statistics.contingency.VarClass), the feature distribution and the apriori class distribution (as most scoring methods). **Treatment of unknown values** .. attribute:: unknowns_treatment Not defined in :obj:Score but defined in classes that are able to treat unknown values. Either :obj:IgnoreUnknowns, :obj:ReduceByUnknown, :obj:UnknownsToCommon, or :obj:UnknownsAsValue. .. attribute:: IgnoreUnknowns Constant. Instances for which the feature value is unknown are removed. .. attribute:: ReduceByUnknown Constant. Features with unknown values are punished. The feature quality is reduced by the proportion of unknown values. For impurity scores the impurity decreases only where the value is defined and stays the same otherwise. .. attribute:: UnknownsToCommon Constant. Undefined values are replaced by the most common value. .. attribute:: UnknownsAsValue Constant. Unknown values are treated as a separate value. **Methods** .. method:: __call__ Abstract. See :ref:callingscore. .. method:: threshold_function(attribute, instances[, weightID]) Abstract. Assess different binarizations of the continuous feature :obj:attribute.  Return a list of tuples, where the first element is a threshold (between two existing values), the second is the quality of the corresponding binary feature, and the last the distribution of instances below and above the threshold. The last element is optional. To show the computation of thresholds, we shall use the Iris data set (part of scoring-info-iris.py_, uses iris.tab_): .. literalinclude:: code/scoring-info-iris.py :lines: 13-15 .. method:: best_threshold(attribute, instances) Return the best threshold for binarization, that is, the threshold with which the resulting binary feature will have the optimal score. The script below prints out the best threshold for binarization of a feature. ReliefF is used for scoring: (part of scoring-info-iris.py_, uses iris.tab_): .. literalinclude:: code/scoring-info-iris.py :lines: 17-18 .. class:: ScoreFromProbabilities Bases: :obj:Score Abstract base class for feature scoring methods that can be computed from contingency matrices only. It relieves the derived classes from having to compute the contingency matrix by defining the first two forms of the call operator. (Well, that's not something you need to know if you only work in Python.) .. attribute:: unknowns_treatment See :obj:Score.unknowns_treatment. .. attribute:: estimator_constructor .. attribute:: conditional_estimator_constructor The classes that are used to estimate unconditional and conditional probabilities of classes, respectively.
You can set this to, for instance, :obj:ProbabilityEstimatorConstructor_m and :obj:ConditionalProbabilityEstimatorConstructor_ByRows (with estimator constructor again set to :obj:ProbabilityEstimatorConstructor_m), respectively. Both default to relative frequencies. ============ Other ============ .. autoclass:: Orange.feature.scoring.OrderAttributes :members: .. autofunction:: Orange.feature.scoring.merge_values .. autofunction:: Orange.feature.scoring.score_all .. comment .. rubric:: References .. [Kononenko2007] Igor Kononenko, Matjaz Kukar: Machine Learning and Data Mining, Woodhead Publishing, 2007. .. [Quinlan1986] J R Quinlan: Induction of Decision Trees, Machine Learning, 1986. .. [Breiman1984] L Breiman et al: Classification and Regression Trees, Chapman and Hall, 1984. .. [Kononenko1995] I Kononenko: On biases in estimating multi-valued attributes, International Joint Conference on Artificial Intelligence, 1995. .. _iris.tab: code/iris.tab import Orange.core as orange from orange import MeasureAttribute as Measure import Orange.misc from orange import MeasureAttribute as Score from orange import MeasureAttributeFromProbabilities as ScoreFromProbabilities from orange import MeasureAttribute_info as InfoGain from orange import MeasureAttribute_gainRatio as GainRatio from orange import MeasureAttribute_MSE as MSE ###### # from orngEvalAttr.py class OrderAttributesByMeasure: """Construct an instance that orders features by their scores. :param measure: a feature measure, derived from :obj:Orange.feature.scoring.Measure. class OrderAttributes: """Orders features by their scores. .. attribute::  score A scoring method derived from :obj:~Orange.feature.scoring.Score. If :obj:None, :obj:Relief with m=5 and k=10 will be used. """ def __init__(self, measure=None): self.measure = measure def __init__(self, score=None): self.score = score def __call__(self, data, weight): """Take :obj:Orange.data.table data table and an instance of :obj:Orange.feature.scoring.Measure to score and order features. """Score and order all features. :param data: a data table used to score features :type data: Orange.data.table :param weight: meta feature that stores weights of individual data instances :type weight: Orange.data.variable :type data: Orange.data.Table :param weight: meta attribute that stores weights of instances :type weight: Orange.data.variable.Variable """ if self.measure: measure = self.measure if self.score: measure = self.score else: measure = Relief(m=5,k=10) measure = Relief(m=5, k=10) measured = [(attr, measure(attr, data, None, weight)) for attr in data.domain.attributes] return [x[0] for x in measured] def MeasureAttribute_Distance(attr = None, data = None): """Instantiate :obj:MeasureAttribute_DistanceClass and use it to return the score of a given feature on given data. :param attr: feature to score :type attr: Orange.data.variable :param data: data table used for feature scoring :type data: Orange.data.table OrderAttributes = Orange.misc.deprecated_members({ "measure": "score", }, wrap_methods=[])(OrderAttributes) class Distance(Score): """The :math:1-D distance is defined as information gain divided by joint entropy :math:H_{CA} (:math:C is the class variable and :math:A the feature): .. 
math:: 1-D(C,A) = \\frac{\\mathrm{Gain}(A)}{H_{CA}} """ m = MeasureAttribute_DistanceClass() if attr != None and data != None: return m(attr, data) else: return m class MeasureAttribute_DistanceClass(orange.MeasureAttribute): """Implement the 1-D feature distance measure described in Kononenko.""" def __call__(self, attr, data, aprioriDist = None, weightID = None): """Take :obj:Orange.data.table data table and score the given :obj:Orange.data.variable. @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) def __new__(cls, attr=None, data=None, apriori_dist=None, weightID=None): self = Score.__new__(cls) if attr != None and data != None: #self.__init__(**argkw) return self.__call__(attr, data, apriori_dist, weightID) else: return self @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) def __call__(self, attr, data, apriori_dist=None, weightID=None): """Score the given feature. :param attr: feature to score :type attr: Orange.data.variable :type attr: Orange.data.variable.Variable :param data: a data table used to score features :type data: Orange.data.table :param aprioriDist: :type aprioriDist: :param apriori_dist: :type apriori_dist: :param weightID: meta feature used to weight individual data instances :type weightID: Orange.data.variable :type weightID: Orange.data.variable.Variable """ return 0 def MeasureAttribute_MDL(attr = None, data = None): """Instantiate :obj:MeasureAttribute_MDLClass and use it on given data to return the feature's score.""" m = MeasureAttribute_MDLClass() if attr != None and data != None: return m(attr, data) else: return m class MeasureAttribute_MDLClass(orange.MeasureAttribute): """Score a feature based on the minimum description length principle.""" def __call__(self, attr, data, aprioriDist = None, weightID = None): """Take :obj:Orange.data.table data table and score the given :obj:Orange.data.variable. class MDL(Score): """Minimum description length principle [Kononenko1995]_. Let :math:n be the number of instances, :math:n_0 the number of classes, and :math:n_{cj} the number of instances with feature value :math:j and class value :math:c. Then the MDL score for feature A is .. math:: \\mathrm{MDL}(A) = \\frac{1}{n} \\Bigg[ \\log\\binom{n}{n_{1.},\\cdots,n_{n_0 .}} - \\sum_j \\log \\binom{n_{.j}}{n_{1j},\\cdots,n_{n_0 j}} \\\\ + \\log \\binom{n+n_0-1}{n_0-1} - \\sum_j \\log \\binom{n_{.j}+n_0-1}{n_0-1} \\Bigg] """ @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) def __new__(cls, attr=None, data=None, apriori_dist=None, weightID=None): self = Score.__new__(cls) if attr != None and data != None: #self.__init__(**argkw) return self.__call__(attr, data, apriori_dist, weightID) else: return self @Orange.misc.deprecated_keywords({"aprioriDist": "apriori_dist"}) def __call__(self, attr, data, apriori_dist=None, weightID=None): """Score the given feature.
:param attr: feature to score :type attr: Orange.data.variable :type attr: Orange.data.variable.Variable :param data: a data table used to score the feature :type data: Orange.data.table :param aprioriDist: :type aprioriDist: :param apriori_dist: :type apriori_dist: :param weightID: meta feature used to weight individual data instances :type weightID: Orange.data.variable :type weightID: Orange.data.variable.Variable """ return ret def mergeAttrValues(data, attrList, attrMeasure, removeUnusedValues = 1): @Orange.misc.deprecated_keywords({"attrList": "attr_list", "attrMeasure": "attr_score", "removeUnusedValues": "remove_unused_values"}) def merge_values(data, attr_list, attr_score, remove_unused_values = 1): import orngCI #data = data.select([data.domain[attr] for attr in attrList] + [data.domain.classVar]) newData = data.select(attrList + [data.domain.classVar]) newAttr = orngCI.FeatureByCartesianProduct(newData, attrList)[0] #data = data.select([data.domain[attr] for attr in attr_list] + [data.domain.classVar]) newData = data.select(attr_list + [data.domain.class_var]) newAttr = orngCI.FeatureByCartesianProduct(newData, attr_list)[0] dist = orange.Distribution(newAttr, newData) activeValues = [] for i in range(len(newAttr.values)): if dist[newAttr.values[i]] > 0: activeValues.append(i) currScore = attrMeasure(newAttr, newData) currScore = attr_score(newAttr, newData) while 1: bestScore, bestMerge = currScore, None for i1, ind1 in enumerate(activeValues): oldInd1 = newAttr.getValueFrom.lookupTable[ind1] oldInd1 = newAttr.get_value_from.lookupTable[ind1] for ind2 in activeValues[:i1]: newAttr.getValueFrom.lookupTable[ind1] = ind2 score = attrMeasure(newAttr, newData) newAttr.get_value_from.lookupTable[ind1] = ind2 score = attr_score(newAttr, newData) if score >= bestScore: bestScore, bestMerge = score, (ind1, ind2) newAttr.getValueFrom.lookupTable[ind1] = oldInd1 newAttr.get_value_from.lookupTable[ind1] = oldInd1 if bestMerge: ind1, ind2 = bestMerge currScore = bestScore for i, l in enumerate(newAttr.getValueFrom.lookupTable): for i, l in enumerate(newAttr.get_value_from.lookupTable): if not l.isSpecial() and int(l) == ind1: newAttr.getValueFrom.lookupTable[i] = ind2 newAttr.get_value_from.lookupTable[i] = ind2 newAttr.values[ind2] = newAttr.values[ind2] + "+" + newAttr.values[ind1] del activeValues[activeValues.index(ind1)] break if not removeUnusedValues: if not remove_unused_values: return newAttr reducedAttr = orange.EnumVariable(newAttr.name, values = [newAttr.values[i] for i in activeValues]) reducedAttr.getValueFrom = newAttr.getValueFrom reducedAttr.getValueFrom.classVar = reducedAttr reducedAttr.get_value_from = newAttr.get_value_from reducedAttr.get_value_from.class_var = reducedAttr return reducedAttr ###### # from orngFSS def attMeasure(data, measure=Relief(k=20, m=50)): @Orange.misc.deprecated_keywords({"measure": "score"}) def score_all(data, score=Relief(k=20, m=50)): """Assess the quality of features using the given measure and return a sorted list of tuples (feature name, measure). :param data: data table should include a discrete class. :type data: :obj:Orange.data.table :param measure:  feature scoring function. Derived from :obj:Orange.feature.scoring.Measure. Defaults to Defaults to :obj:Orange.feature.scoring.Relief with k=20 and m=50. :type measure: :obj:Orange.feature.scoring.Measure :rtype: :obj:list a sorted list of tuples (feature name, score) :type data: :obj:Orange.data.Table :param score:  feature scoring function. 
Derived from :obj:~Orange.feature.scoring.Score. Defaults to :obj:~Orange.feature.scoring.Relief with k=20 and m=50. :type score: :obj:~Orange.feature.scoring.Score :rtype: :obj:list; a sorted list of tuples (feature name, score) """ measl=[] for i in data.domain.attributes: measl.append((i.name, score(i, data))) measl.sort(lambda x,y:cmp(y[1], x[1])) #  for i in measl: #    print "%25s, %6.3f" % (i[0], i[1]) return measl
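A minimal usage sketch for score_all as defined above, assuming the voting data set::

    import Orange
    from Orange.feature import scoring

    data = Orange.data.Table("voting")
    # sorted (feature name, score) pairs, best first;
    # the default scorer is Relief(k=20, m=50)
    for name, score in scoring.score_all(data)[:3]:
        print "%5.3f %s" % (score, name)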
• ## orange/Orange/feature/selection.py

 r8042 """ ######################### Selection (selection) ######################### .. index:: feature selection import Orange.core as orange from Orange.feature.scoring import attMeasure from Orange.feature.scoring import score_all # from orngFSS def bestNAtts(scores, N): """Return the best N features (without scores) from the list returned by function :obj:Orange.feature.scoring.attMeasure. :param scores: a list such as one returned by :obj:Orange.feature.scoring.attMeasure by :obj:Orange.feature.scoring.score_all. :param scores: a list such as returned by :obj:Orange.feature.scoring.score_all :type scores: list :param N: number of best features to select. def attsAboveThreshold(scores, threshold=0.0): """Return features (without scores) from the list returned by :obj:Orange.feature.scoring.attMeasure with score above or :obj:Orange.feature.scoring.score_all with score above or equal to a specified threshold. :param scores: a list such as one returned by :obj:Orange.feature.scoring.attMeasure :obj:Orange.feature.scoring.score_all :type scores: list :param threshold: score threshold for attribute selection. Defaults to 0. :type data: Orange.data.table :param scores: a list such as one returned by :obj:Orange.feature.scoring.attMeasure :obj:Orange.feature.scoring.score_all :type scores: list :param N: number of features to select """Construct and return a new set of examples that includes a class and features from the list returned by :obj:Orange.feature.scoring.attMeasure that have the score above or :obj:Orange.feature.scoring.score_all that have the score above or equal to a specified threshold. :type data: Orange.data.table :param scores: a list such as one returned by :obj:Orange.feature.scoring.attMeasure :obj:Orange.feature.scoring.score_all :type scores: list :param threshold: score threshold for attribute selection. Defaults to 0. """ measl = attMeasure(data, measure) measl = score_all(data, measure) while len(data.domain.attributes)>0 and measl[-1][1]
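A minimal sketch chaining score_all with the selection helpers above, assuming the voting data set::

    import Orange
    from Orange.feature.scoring import score_all
    from Orange.feature.selection import bestNAtts, attsAboveThreshold

    data = Orange.data.Table("voting")
    scores = score_all(data)
    print bestNAtts(scores, 3)             # names of the three best features
    print attsAboveThreshold(scores, 0.3)  # names of features scoring >= 0.3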
• ## orange/Orange/misc/__init__.py

 r9450 return decorating_function #from Orange.misc.render import contextmanager from contextlib import contextmanager @contextmanager def member_set(obj, name, val): """ A context manager that sets member name on obj to val and restores the previous value on exit. """ old_val = getattr(obj, name, val) setattr(obj, name, val) yield setattr(obj, name, old_val) class recursion_limit(object):
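A minimal usage sketch for the member_set context manager defined above (the object and attribute are illustrative)::

    from Orange.misc import member_set

    class Options(object):
        verbose = False

    opts = Options()
    with member_set(opts, "verbose", True):
        print opts.verbose  # True inside the block
    print opts.verbose      # previous value restored on exit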

 r9450 for name in os.listdir(dir): addOnDir = os.path.join(dir, name) if not os.path.isdir(addOnDir): if not os.path.isdir(addOnDir) or name.startswith("."): continue try:
• ## orange/Orange/misc/environ.py

 r9450 dataset_install_dir: Directory with example data sets. network_install_dir: Directory with example networks. add_ons_dir: doc_install_dir = os.path.join(install_dir, "doc") dataset_install_dir = os.path.join(install_dir, "doc", "datasets") network_install_dir = os.path.join(install_dir, "doc", "networks") canvas_install_dir = os.path.join(install_dir, "OrangeCanvas") _ALL_DIR_OPTIONS = ["install_dir", "canvas_install_dir", "widget_install_dir", "icons_install_dir", "doc_install_dir", "dataset_install_dir", "add_ons_dir", "add_ons_dir_user", "doc_install_dir", "dataset_install_dir", "network_install_dir", "add_ons_dir", "add_ons_dir_user", "application_dir", "output_dir", "default_reports_dir", "orange_settings_dir", "canvas_settings_dir",
• ## orange/Orange/misc/testing.py

 r8042 CLASSIFICATION_DATASETS = ["iris", "brown-selected", "lenses", "monks-1"] REGRESSION_DATASETS = ["housing", "auto-mpg"] REGRESSION_DATASETS = ["housing", "auto-mpg", "servo"] CLASSLES_DATASETS =  ["water-treatment"] ALL_DATASETS  = CLASSIFICATION_DATASETS + REGRESSION_DATASETS + CLASSLES_DATASETS for ex in test: if classifier(ex, orange.GetValue) != classifier_clone(ex, orange.GetValue): print classifier(ex, orange.GetBoth) , classifier_clone(ex, orange.GetBoth) print classifier(ex, orange.GetValue) , classifier_clone(ex, orange.GetValue) self.assertEqual(classifier(ex, orange.GetValue), classifier_clone(ex, orange.GetValue), "Pickled and original classifier return a different value!") self.assertTrue(all(classifier(ex, orange.GetValue) == classifier_clone(ex, orange.GetValue) for ex in test)) if isinstance(dataset.domain.class_var, Orange.data.variable.Continuous): self.assertAlmostEqual(classifier(ex, orange.GetValue).native(), classifier_clone(ex, orange.GetValue).native(), dataset.domain.class_var.number_of_decimals + 3, "Pickled and original classifier return a different value!") else: self.assertEqual(classifier(ex, orange.GetValue), classifier_clone(ex, orange.GetValue), "Pickled and original classifier return a different value!") class MeasureAttributeTestCase(DataTestCase): """ MEASURE must be defined in the subclass """ def setUp(self): self.measure = self.MEASURE @test_on_data scores = [] for attr in data.domain.attributes: score = self.MEASURE(attr, data) score = self.measure(attr, data) #            self.assertTrue(score >= 0.0) scores.append(score) """ import cPickle s = cPickle.dumps(self.MEASURE) s = cPickle.dumps(self.measure) measure = cPickle.loads(s) # TODO: make sure measure computes the same scores as measure """ PREPROCESSOR = None def setUp(self): self.preprocessor = self.PREPROCESSOR @test_on_data """ Test preprocessor on dataset """ newdata = self.PREPROCESSOR(dataset) newdata = self.preprocessor(dataset) def test_pickle(self): """ Test preprocessor pickling """ if isinstance(self.PREPROCESSOR, type): prep = self.PREPROCESSOR() # Test the default constructed if isinstance(self.preprocessor, type): prep = self.preprocessor() # Test the default constructed s = cPickle.dumps(prep) prep = cPickle.loads(s) s = cPickle.dumps(self.PREPROCESSOR) s = cPickle.dumps(self.preprocessor) prep = cPickle.loads(s) from Orange.distance.instances import distance_matrix from Orange.misc import member_set class DistanceTestCase(DataTestCase): """ Test orange.ExamplesDistance/Constructor """ DISTANCE_CONSTRUCTOR = None def setUp(self): self.distance_constructor = self.DISTANCE_CONSTRUCTOR @test_on_data def test_distance_on(self, dataset): import numpy indices = orange.MakeRandomIndices2(dataset, min(20, len(dataset))) dataset = dataset.select(indices, 0) with member_set(self.distance_constructor, "ignore_class", True): mat = distance_matrix(dataset, self.distance_constructor) m = numpy.array(list(mat)) self.assertTrue((m >= 0.0).all()) if dataset.domain.class_var: with member_set(self.distance_constructor, "ignore_class", False): mat = distance_matrix(dataset, self.distance_constructor) m1 = numpy.array(list(mat)) self.assertTrue((m1 != m).all() or dataset, "%r does not seem to respect the 'ignore_class' flag") def test_case_script(path): """ Return a TestCase instance from a script in path.
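A minimal sketch of how DistanceTestCase above is meant to be subclassed; the choice of constructor is an assumption, but PearsonRConstructor is defined in Orange.distance.instances::

    import unittest
    import Orange
    from Orange.misc import testing
    from Orange.distance import instances

    class TestPearsonR(testing.DistanceTestCase):
        # the constructor to exercise on the bundled data sets
        DISTANCE_CONSTRUCTOR = instances.PearsonRConstructor()

    if __name__ == "__main__":
        unittest.main()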
• ## orange/Orange/network/network.py

 r9450 import numpy import networkx as nx import Orange import orangeom import readwrite from networkx import algorithms from networkx.classes import function class MdsTypeClass(): MdsType = MdsTypeClass() def _get_doc(doc): def __init__(self, data=None, name='', **attr): nx.Graph.__init__(self, data, name, **attr) nx.Graph.__init__(self, data, name=name, **attr) BaseGraph.__init__(self) def subgraph(self, nbunch): G = nx.Graph.subgraph(self, nbunch) items = self.items().get_items(G.nodes()) G = G.to_orange_network() G.set_items(items) return G # TODO: _links __doc__ += _get_doc(nx.Graph.__doc__) def __init__(self, data=None, name='', **attr): nx.DiGraph.__init__(self, data, name, **attr) nx.DiGraph.__init__(self, data, name=name, **attr) BaseGraph.__init__(self) def __init__(self, data=None, name='', **attr): nx.MultiGraph.__init__(self, data, name, **attr) nx.MultiGraph.__init__(self, data, name=name, **attr) BaseGraph.__init__(self) def __init__(self, data=None, name='', **attr): nx.MultiDiGraph.__init__(self, data, name, **attr) nx.MultiDiGraph.__init__(self, data, name=name, **attr) BaseGraph.__init__(self) ########################################################################## def map_to_graph(self, graph): nodes = sorted(graph.nodes()) return dict((v, (self.coors[0][i], self.coors[1][i])) for i,v in \ enumerate(nodes)) class NxView(object): """Network View """ def __init__(self, **attr): self._network = None self._nx_explorer = None def set_nx_explorer(self, _nx_explorer): self._nx_explorer = _nx_explorer def init_network(self, graph): return graph def nodes_selected(self): pass
• ## orange/Orange/preprocess/__init__.py

 r9450 __reduce__ = _orange__reduce__ def __init__(self, zeroBased=True, multinomialTreatment=orange.DomainContinuizer.NValues, normalizeContinuous=False, **kwargs): def __init__(self, zeroBased=True, multinomialTreatment=orange.DomainContinuizer.NValues, continuousTreatment=orange.DomainContinuizer.Leave, classTreatment=orange.DomainContinuizer.Ignore, **kwargs): self.zeroBased = zeroBased self.multinomialTreatment = multinomialTreatment self.normalizeContinuous = normalizeContinuous self.continuousTreatment = continuousTreatment self.classTreatment = classTreatment def __call__(self, data, weightId=0): continuizer = orange.DomainContinuizer(zeroBased=self.zeroBased, multinomialTreatment=self.multinomialTreatment, normalizeContinuous=self.normalizeContinuous, classTreatment=orange.DomainContinuizer.Ignore) continuousTreatment=self.continuousTreatment, classTreatment=self.classTreatment) c_domain = continuizer(data, weightId) return data.translate(c_domain)
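A minimal sketch of the continuization performed by the wrapper above, using the treatment constants from this changeset (the data set is illustrative)::

    import orange  # the C++ core module used in this file

    data = orange.ExampleTable("lenses")
    continuizer = orange.DomainContinuizer(zeroBased=True,
        multinomialTreatment=orange.DomainContinuizer.NValues,
        continuousTreatment=orange.DomainContinuizer.Leave,
        classTreatment=orange.DomainContinuizer.Ignore)
    c_domain = continuizer(data, 0)
    print data.translate(c_domain)[0]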
• ## orange/Orange/projection/pca.py

 r9450 principal components. This transformation is defined in such a way that the first component has as high a variance as possible. If data instances are provided to the constructor, the learning algorithm is called and the resulting classifier is returned instead of the learner. def __call__(self, dataset): """ Perform a PCA analysis on a dataset and return a classifier that maps data Perform a PCA analysis on a dataset and return a classifier that maps data into the principal component subspace. """ def __call__(self, dataset): if type(dataset) != Orange.data.Table: dataset = Orange.data.Table(self.input_domain, [dataset]) dataset = Orange.data.Table([dataset]) X = dataset.to_numpy_MA("a")[0] #          for i, a in enumerate(self.input_domain.attributes) #          ]) ]) if len(self.pc_domain) <= 16 else \ ]) if len(self.pc_domain) <= ncomponents else \ "\n".join([ "PCA SUMMARY", else: plt.show()
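A minimal usage sketch; the learner class name Pca is an assumption, and only the call protocol (data passed to the constructor returns the fitted classifier) is taken from the code above::

    import Orange
    from Orange.projection import pca

    data = Orange.data.Table("iris")
    # providing data to the constructor runs the learner and returns
    # the classifier instead of the learner object (Pca name assumed)
    classifier = pca.Pca(data)
    print classifier(data[0])  # instance mapped into the PC subspace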

• ## orange/OrangeCanvas/orngCanvasItems.py

Signals:<br>" for (outSignal, inSignal) in self.getSignals(): string += "     - " + outSignal + " --> " + inSignal + "<br>" string = string[:-4] self.setToolTip(string) # print the text with the signals self.caption = "\n".join([outSignal for (outSignal, inSignal) in self.getSignals()]) self.captionItem.setHtml("<center>%s</center>" % self.caption.replace("\n", "<br>")) self.updatePainterPath() if self.inWidget and self.outWidget: status = self.getEnabled() == 0 and " (Disabled)" or "" string = "<b>" + self.outWidget.caption + " --> " + self.inWidget.caption + "</b>" + status + "<hr>Signals:<br>" for (outSignal, inSignal) in self.getSignals(): string += "     - " + outSignal + " --> " + inSignal + "<br>" string = string[:-4] self.setToolTip(string) # print the text with the signals self.caption = "\n".join([outSignal for (outSignal, inSignal) in self.getSignals()]) self.captionItem.setHtml("<center>%s</center>" % self.caption.replace("\n", "<br>")) self.updatePainterPath() def hoverEnterEvent(self, event): self.setFlags(QGraphicsItem.ItemIsSelectable)# | QGraphicsItem.ItemIsMovable) if qVersion() >= "4.6" and self.canvasDlg.settings["enableCanvasDropShadows"]: effect = QGraphicsDropShadowEffect() effect.setOffset(QPointF(1.1, 3.1)) effect.setBlurRadius(7) self.setGraphicsEffect(effect) #        if qVersion() >= "4.6" and self.canvasDlg.settings["enableCanvasDropShadows"]: #            effect = QGraphicsDropShadowEffect() #            effect.setOffset(QPointF(1.1, 3.1)) #            effect.setBlurRadius(7) #            self.setGraphicsEffect(effect) #            self.prepareGeometryChange() if scene is not None: rect = QRectF(QPointF(0, 0), self.widgetSize).adjusted(-11, -6, 11, 6)#.adjusted(-100, -100, 100, 100) #(-10-width, -4, +10+width, +25) rect.setTop(rect.top() - 20 - 21) ## Room for progress bar and warning, error, info icons if _graphicsEffect(self): textRect = self.captionItem.boundingRect() ## Should work without this but for some reason if using graphics effects the text gets clipped textRect.moveTo(self.captionItem.pos()) return rect.united(textRect) else: return rect #        if _graphicsEffect(self): #            textRect = self.captionItem.boundingRect() ## Should work without this but for some reason if using graphics effects the text gets clipped #            textRect.moveTo(self.captionItem.pos()) return rect # is mouse position inside the left signal channel
• ## orange/OrangeCanvas/orngDlgs.py

 r8042 generalBox = OWGUI.widgetBox(GeneralTab, "General Options") self.snapToGridCB = OWGUI.checkBox(generalBox, self.settings, "snapToGrid", "Snap widgets to grid", debuggingEnabled = 0) self.enableCanvasDropShadowsCB = OWGUI.checkBox(generalBox, self.settings, "enableCanvasDropShadows", "Enable drop shadows in canvas", debuggingEnabled = 0) #        self.enableCanvasDropShadowsCB = OWGUI.checkBox(generalBox, self.settings, "enableCanvasDropShadows", "Enable drop shadows in canvas", debuggingEnabled = 0) self.writeLogFileCB  = OWGUI.checkBox(generalBox, self.settings, "writeLogFile", "Save content of the Output window to a log file", debuggingEnabled = 0) self.showSignalNamesCB = OWGUI.checkBox(generalBox, self.settings, "showSignalNames", "Show signal names between widgets", debuggingEnabled = 0)
• ## orange/OrangeCanvas/orngDoc.py

 r8052 else: ce.ignore() return QWidget.closeEvent(self, ce) return with self.signalManager.freeze(): while widget.inLines != []: self.removeLine1(widget.inLines[0]) while widget.outLines != []:  self.removeLine1(widget.outLines[0]) #with self.signalManager.freeze(): while widget.inLines != []: self.removeLine1(widget.inLines[0]) while widget.outLines != []:  self.removeLine1(widget.outLines[0]) self.signalManager.removeWidget(widget.instance) self.signalManager.removeWidget(widget.instance) self.widgets.remove(widget)
• ## orange/OrangeCanvas/orngView.py

 r8042
    def resetLineSignals(self):
        if self.selectedLine:
-            self.doc.resetActiveSignals(self.selectedLine.outWidget, self.selectedLine.inWidget, enabled = self.doc.signalManager.getLinkEnabled(self.selectedLine.outWidget.instance, self.selectedLine.inWidget.instance))
-            self.selectedLine.inWidget.updateTooltip()
-            self.selectedLine.outWidget.updateTooltip()
+            outWidget, inWidget = self.selectedLine.outWidget, self.selectedLine.inWidget
+            self.doc.resetActiveSignals(outWidget, inWidget, enabled = self.doc.signalManager.getLinkEnabled(outWidget.instance, inWidget.instance))
+            inWidget.updateTooltip()
+            outWidget.updateTooltip()
            self.selectedLine.updateTooltip()

            QMessageBox.information(self, "Orange Canvas", "Please wait until Orange finishes processing signals.")
            return
-        self.doc.resetActiveSignals(activeItem.outWidget, activeItem.inWidget, enabled = self.doc.signalManager.getLinkEnabled(activeItem.outWidget.instance, activeItem.inWidget.instance))
-        activeItem.inWidget.updateTooltip()
-        activeItem.outWidget.updateTooltip()
+        inWidget, outWidget = activeItem.inWidget, activeItem.outWidget
+        self.doc.resetActiveSignals(outWidget, inWidget, enabled = self.doc.signalManager.getLinkEnabled(outWidget.instance, inWidget.instance))
+        inWidget.updateTooltip()
+        outWidget.updateTooltip()
        activeItem.updateTooltip()
• ## orange/OrangeWidgets/Classify/OWCN2RulesViewer.py

 r9450
            self.tableView.horizontalHeader().setSectionHidden(i, not visible)
            anyVisible = anyVisible or visible

-        self.reportButton.setEnabled(anyVisible)
+        # the report button is not available when not running inside the canvas
+        if hasattr(self, "reportButton"):
+            self.reportButton.setEnabled(anyVisible)
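The guard reflects that Orange widgets can also run as standalone scripts, in which case canvas-only members such as reportButton are never created. The same pattern as a standalone sketch (set_report_enabled is hypothetical):

    # Sketch of the guard: only touch canvas-provided attributes if they exist.
    def set_report_enabled(widget, any_visible):
        if hasattr(widget, "reportButton"):
            widget.reportButton.setEnabled(any_visible)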
• ## orange/OrangeWidgets/Data/OWSave.py

 r8042
            startfile = self.recentFiles[0]
        else:
-            startfile = "."
+            startfile = os.path.expanduser("~/")

#        filename, selectedFilter = QFileDialog.getSaveFileNameAndFilter(self, 'Save Orange Data File', startfile,
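The change makes the save dialog open in the user's home directory rather than the process's working directory when there is no recent file. A minimal sketch of that behaviour (PyQt4 assumed; ask_save_filename is a hypothetical helper):

    import os
    from PyQt4.QtGui import QFileDialog

    def ask_save_filename(parent, recent_files):
        # Prefer the most recent file; otherwise start in the home directory,
        # which is more predictable than "." when launched from a shortcut.
        startfile = recent_files[0] if recent_files else os.path.expanduser("~/")
        return QFileDialog.getSaveFileName(parent, "Save Orange Data File", startfile)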
• ## orange/OrangeWidgets/Evaluate/OWTestLearners.py

 r9482
        self.show = show
        self.cmBased = cmBased

+def dispatch(score_desc, res, cm):
+    """ Dispatch the call to the orngStat method named by the score descriptor. """
+    return eval("orngStat." + score_desc.f)
+
 class OWTestLearners(OWWidget):

            res = orngTest.learnAndTestOnTestData(learners, self.data, self.testdata, storeExamples = True, callback=pb.advance)
            pb.finish()
+        if self.isclassification():
+            cm = orngStat.computeConfusionMatrices(res, classIndex = self.targetClass)
+        else:
+            cm = None

        if self.preprocessor: # Unwrap learners

        self.error(range(len(self.stat)))
        scores = []
        for i, s in enumerate(self.stat):
-            try:
-                scores.append(eval("orngStat." + s.f))
-            except Exception, ex:
-                self.error(i, "An error occurred while evaluating orngStat." + s.f + " on %s due to %s" % (" ".join([l.name for l in learners]), ex.message))
-                scores.append([None] * len(self.learners))
+            if s.cmBased:
+                try:
+#                    scores.append(eval("orngStat." + s.f))
+                    scores.append(dispatch(s, res, cm))
+                except Exception, ex:
+                    self.error(i, "An error occurred while evaluating orngStat." + s.f + " on %s due to %s" % (" ".join([l.name for l in learners]), ex.message))
+                    scores.append([None] * len(self.learners))
+            else:
+                scores_one = []
+                for res_one in orngStat.split_by_classifiers(res):
+                    try:
+#                        scores_one.append(eval("orngStat." + s.f)[0])
+                        scores_one.extend(dispatch(s, res_one, cm))
+                    except Exception, ex:
+                        self.error(i, "An error occurred while evaluating orngStat." + s.f + " on %s due to %s" % (res.classifierNames[0], ex.message))
+                        scores_one.append(None)
+                scores.append(scores_one)

-        for (i, l) in enumerate(learners):
+        for i, l in enumerate(learners):
            self.learners[l.id].scores = [s[i] if s else None for s in scores]
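The new `dispatch` helper funnels every score computation through one place, but it still uses `eval` because each descriptor's `f` holds a full expression string that is evaluated against the local `res` and `cm`. An eval-free alternative, sketched here under the assumption that the descriptor instead stored a bare function name (`name` is hypothetical; it is not what the widget actually uses):

    import orngStat

    def dispatch_by_name(score_desc, res, cm):
        # Hypothetical: assumes score_desc.name is a plain orngStat function
        # name such as "CA" or "AUC", unlike the real score_desc.f, which is
        # a full expression string and therefore requires eval().
        scorer = getattr(orngStat, score_desc.name)
        return scorer(cm) if score_desc.cmBased else scorer(res)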
• ## orange/OrangeWidgets/OWGUI.py

 r9450
        self.initStyleOption(option, index)
-        textRect = style.subElementRect(QStyle.SE_ItemViewItemText, option)
-        if not textRect.isValid():
-            textRect = option.rect
-        margin = style.pixelMetric(QStyle.PM_FocusFrameHMargin, option) + 1
-        textRect = textRect.adjusted(margin, 0, -margin, 0)
        text = self.displayText(index.data(Qt.DisplayRole), QLocale.system())
+        textRect = style.subElementRect(QStyle.SE_ItemViewItemText, option)
+        if not textRect.isValid():
+            textRect = option.rect
+        margin = style.pixelMetric(QStyle.PM_FocusFrameHMargin, option) + 1
+        textRect = textRect.adjusted(margin, 0, -margin, 0)
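The hunk computes where an item view actually draws its text. The same computation as a self-contained sketch inside a delegate (PyQt4 assumed; the delegate class name is hypothetical):

    from PyQt4.QtGui import QApplication, QStyle, QStyledItemDelegate

    class TextRectDelegate(QStyledItemDelegate):
        def textRect(self, option):
            # Ask the style where item text goes; fall back to the whole cell,
            # then trim the focus-frame margin on both sides, as above.
            style = QApplication.style()
            rect = style.subElementRect(QStyle.SE_ItemViewItemText, option)
            if not rect.isValid():
                rect = option.rect
            margin = style.pixelMetric(QStyle.PM_FocusFrameHMargin, option) + 1
            return rect.adjusted(margin, 0, -margin, 0)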
• ## orange/OrangeWidgets/Unsupervised/OWNetClustering.py

 r8305 """ Network ClusteringOrange widget for community detection in networksicons/Network.pngMiha Stajdohar (miha.stajdohar(@at@)gmail.com)7440 THIS WIDGET IS OBSOLETE; USE OWNxClustering.py """
• ## orange/OrangeWidgets/Unsupervised/OWNetExplorer.py

 r9450 """ Net ExplorerOrange widget for network exploration.icons/Network.pngMiha Stajdohar (miha.stajdohar(@at@)gmail.com)7420 THIS WIDGET IS OBSOLETE; USE OWNxExplorer.py """ import orange
• ## orange/OrangeWidgets/Unsupervised/OWNetworkFile.py

 r8305 """ Network FileReads data from a graf file (Pajek networks (.net) files and GML network files).icons/NetworkFile.pngMiha Stajdohar (miha.stajdohar(@at@)gmail.com)7410 THIS WIDGET IS OBSOLETE; USE OWNxFile.py """
• ## orange/OrangeWidgets/Unsupervised/OWNetworkFromDistances.py

 r8305 """ Network from DistancesCostructs Graph object by connecting nodes from ExampleTable where distance between them is between given threshold.icons/NetworkFromDistances.pngMiha Stajdohar (miha.stajdohar(@at@)gmail.com)7430 THIS WIDGET IS OBSOLETE; USE OWNxFromDistances.py """