
orngCI: Orange Constructive Induction Module

Module orngCI implements classes and functions that support constructive induction (CI). It includes minimal-complexity function decomposition and a generalized minimal-error function decomposition. Both can be used as feature induction algorithms or as the standalone learning algorithm HINT (Zupan, 1997; Zupan et al., 1998). In addition, the module provides an interface to a generalized Kramer's method (Kramer, 1994) and several other CI-related algorithms.


Constructive Induction in Orange

Constructive induction takes a subset of attributes, which we call bound attributes, and creates a new attribute whose values are computed from the values of the bound attributes. Orange supports so-called data-driven constructive induction, where learning examples are used to decide how the new attribute is computed from the bound ones.

Feature constructors (orange objects that construct features) are thus given examples, a list of bound attributes, and, optionally, an id of the weight attribute. They construct a new attribute and return a tuple with the new feature and a value that estimates its quality.
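For illustration, here is a minimal sketch of this calling convention; it uses the monk1 dataset and the bound attributes "a" and "b" that appear in the examples later in this text.

import orange, orngCI

data = orange.ExampleTable("monk1")           # the dataset used later in this text
inducer = orngCI.FeatureByMinComplexity()     # any feature constructor from this module
# examples, bound attributes and, optionally, a weight id go in;
# a (new attribute, quality estimate) tuple comes out
new_attr, quality = inducer(data, ["a", "b"])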

Constructing a Constructor

Complex feature constructors, such as those for minimal-error decomposition and for Kramer's algorithm, show the flexibility of Orange's component-based approach. On the other hand, the hierarchy of components that need to be put together to construct a workable feature inducer is too complex even for an experienced user. The main mission of orngCI is thus to help the user by "flattening" the hierarchy. Defaults are provided for all the components, so the user only needs to give the specific parts of the hierarchy.

For instance, let fim be an instance of orange.FeatureByIM, a feature constructor for minimal-error function decomposition. Its most crucial parameter, m, is "hidden" in fim.clustersFromIM.columnAssessor.m. Parameter m is an attribute of the m-estimate based column assessor, which is itself a component of the assessor-based column clustering algorithm. To save the user from having to remember this, orngCI offers its own class FeatureByIM. Now, if fim is an instance of orngCI.FeatureByIM, the user only needs to set fim.m. Before the actual feature construction takes place, orngCI.FeatureByIM constructs an appropriate orange.FeatureByIM and sets its clustersFromIM.columnAssessor.m to fim.m. Constructing a structure and putting the user-supplied components at the right places is called deflattening.
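The following sketch contrasts the two ways of setting m. The second half is only a rough illustration of what deflattening assembles; the exact set of components and defaults that orngCI fills in may differ.

import orange, orngCI

# the convenient, "flattened" way
fim = orngCI.FeatureByIM()
fim.m = 2.0

# roughly what deflattening assembles behind the scenes (a sketch)
pure = orange.FeatureByIM()
pure.clustersFromIM = orange.ClustersFromIMByAssessor()
pure.clustersFromIM.columnAssessor = orange.ColumnAssessor_m()
pure.clustersFromIM.columnAssessor.m = 2.0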

A general principle is that if you define a component, you also need to define its subcomponents. The default columnAssessor is orange.ColumnAssessor_m; if you leave it alone, all its subcomponents are set as well. However, if you set columnAssessor to orange.ColumnAssessor_Measure, it is your responsibility to set its attribute "measure", either by specifying it as a "measure" argument to FeatureByIM or by setting it directly in columnAssessor.measure.
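For example, a sketch of both ways of supplying the measure (assuming data is already loaded):

# replacing the default column assessor means supplying its subcomponent, too
fim = orngCI.FeatureByIM()
fim.columnAssessor = orange.ColumnAssessor_Measure()
fim.measure = orange.MeasureAttribute_info()     # the subcomponent you must now supply
attr, quality = fim(data, ["a", "b"])

# ... or set the subcomponent directly on the component
ca = orange.ColumnAssessor_Measure()
ca.measure = orange.MeasureAttribute_info()
fim = orngCI.FeatureByIM()
fim.columnAssessor = ca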

The rest of this section is intended for advanced users.

The function that performs deflattening is always called createInstance.

createInstance()
Returns an instance of pure orange classes that perform the feature construction. The instance is also stored in self's field instance.

Deflattening is initiated by the call operator. The call operator is given examples, bound attributes and, optionally, the id of the weight attribute, and returns a constructed attribute and an assessment of its quality. To do its job, the call operator needs an instance of the corresponding Orange feature constructor. Constructing such an instance at each invocation would be too costly, therefore a kind of caching is implemented. When the call operator is invoked for the first time, createInstance is called to deflatten the structure and the result is stored in self.instance. Further calls reuse the stored self.instance. However, when any of the class's attributes that affect the feature construction (such as fim.m in the above example) are modified, self.instance is set to None and a new instance is constructed at the next call.
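A small sketch of this behaviour (assuming data is already loaded):

fim = orngCI.FeatureByIM()
attr, quality = fim(data, data.domain[:2])   # first call: createInstance() is invoked
print fim.instance is not None               # the deflattened structure is now cached
fim.m = 2.0                                  # changing an attribute that affects construction ...
print fim.instance is None                   # ... discards the cached instance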

Now, if you want to, say, tune fim.m, you can simply change it, at the cost of reinstantiating the corresponding orange classes.

bound = data.domain[:2]
fim = orngCI.FeatureByIM()
attrs = []
for m in [0.1, 0.5, 1.0, 2.0]:
    fim.m = m
    attr, quality = fim(data, bound)
    attrs.append(attr)

This will construct features from the first two attributes of the domain, using different values of m. (It would be a blasphemy not to publish a shorter way to write the above code fragment: the last four lines can be replaced by attrs = [fim(data, bound)[0] for fim.m in [0.1, 0.5, 1.0, 2.0]]). However, at each call to fim, createInstance is called and a new set of components is constructed, since setting m resets them. To make the program faster, you should replace

fim.m = m

by

fim.instance.clustersFromIM.columnAssessor.m = m
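
Putting this together, here is a sketch of the faster tuning loop; it assumes that fim.instance can be built up front with createInstance, as described above, so that the component can be modified directly without invalidating the cached instance.

bound = data.domain[:2]
fim = orngCI.FeatureByIM()
fim.m = 0.1
fim.createInstance()                 # deflatten once, before the loop
attrs = []
for m in [0.1, 0.5, 1.0, 2.0]:
    # modify the component directly, so fim.instance is not reset
    fim.instance.clustersFromIM.columnAssessor.m = m
    attr, quality = fim(data, bound)
    attrs.append(attr)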

Constructing a Feature

To construct a feature, you need to call a constructor. The call operator expects two or three arguments: examples, bound attributes and, optionally, an id of a weight meta attribute. It returns a tuple containing the attribute and its quality estimate. This is the same for all feature constructors in orngCI, although some may ignore some arguments or return useless quality assessments.

The method of quality estimation depends upon the algorithm used - it can be a decrease of error, an increase of impurity, etc. Qualities returned by different feature constructors are therefore not comparable. The quality can be either negative or positive; in any case, higher numbers mean better attributes. If the underlying measure is one where lower numbers mean better attributes (an error, for instance), it is negated, so the reported quality has a negative sign.

The famous Monk 1 data set (Thrun et al, 1991) has the goal concept y := a=b or e=1. When joining attributes 'a' and 'b', the new attribute should express equality of a and b. Let us see if the minimal-complexity function decomposition can find the appropriate function.

>>> import orange, orngCI
>>> data = orange.ExampleTable("monk1")
>>> ab, quality = orngCI.FeatureByMinComplexity(data, ["a", "b"])
>>> ab.values
<1-1+2-2+3-3, 1-2+1-3+2-1+2-3+3-1+3-2>
>>> quality
-2.0

Note that we specified the bound attributes by names. This is not the only way. If you want, you can give attribute descriptors, or even their indices in the domain. Thus, knowing that attributes "a" and "b" are the first two attributes, the above call could be replaced by

>>> ab, quality = orngCI.FeatureByMinComplexity(data, data.domain[:2])

or even

>>> ab, quality = orngCI.FeatureByMinComplexity(data, [0, 1])

or by any combination of names, descriptors and indices.

Now for the results. Values of the new attribute show to which values of the bound attributes they correspond. The new attribute is binary; its first value is named '1-1+2-2+3-3' and the other '1-2+1-3+2-1+2-3+3-1+3-2'. It is obvious that this is the correct feature. Its quality is -2; minimal-complexity function decomposition prefers attributes with fewer values, so the higher the number of values, the more negative (and thus lower) the quality.

It makes sense to rename the values of the constructed feature. In our case, we shall call the first value "eq" and the second "diff".

>>> ab.values = ["eq", "diff"]

If any of the bound attributes has values whose names include '-' or '+', the constructed attribute's values will be named "c1", "c2", "c3"...

Using the Constructed Features

Constructed features can be added to the domain and used in further processing of the data, such as, for instance, learning. If that is what you want, you do not need to read this section any further: if you have followed the above example, you can conclude it by

>>> data2 = orngCI.addAnAttribute(ab, data)

This adds the attribute to the domain and returns a new example table.

The feature knows how to compute itself from the values of the attributes it is constructed from. The usual Orange mechanism is used for this: the new feature has a getValueFrom pointing to a classifier that computes the feature's value. In short, when some method has an example and needs a value of a particular attribute, but this attribute is not in the example's domain, it can check whether the attribute has a getValueFrom defined. If it has, the method calls it, passing the example as an argument. The getValueFrom either computes the attribute's value or returns don't know.

>>> ab.getValueFrom
<ClassifierByLookupTable2 instance at 0x0197BEF0>
>>> print ab.getValueFrom.lookupTable
<eq, diff, diff, diff, eq, diff, diff, diff, eq>
>>> ab.getValueFrom.variable1
EnumVariable 'a'
>>> ab.getValueFrom.variable2
EnumVariable 'b'

For most features constructed by methods in orngCI, getValueFrom will contain some variant of orange.ClassifierByLookupTable. In the above case, it is a classifier by lookup table based on two attributes. Values of the new attribute are read from its lookupTable. Each entry in the table corresponds to a combination of values of the bound attributes; the first corresponds to (1, 1), the second to (1, 2), then (1, 3), (2, 1), (2, 2)..., where the first digit is for variable1 and the second for variable2. Value "eq" occurs at the first, the fifth and the ninth elements of the table, which correspond to (1, 1), (2, 2) and (3, 3). This is exactly what we expected for the Monk 1 dataset.
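The indexing rule can be written down explicitly; the helper below is only for illustration and is not part of orngCI.

# zero-based value indices of variable1 and variable2 give the position in lookupTable
def lookup_index(i_a, i_b, n_b=3):
    return i_a * n_b + i_b

print ab.getValueFrom.lookupTable[lookup_index(1, 1)]   # a=2, b=2 -> eq (fifth entry)
print ab.getValueFrom.lookupTable[lookup_index(0, 2)]   # a=1, b=3 -> diff (third entry)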

If you need to manually correct the feature, you can change the lookupTable:

>>> ab.getValueFrom.lookupTable[0]="diff"

This would work for most cases, but it would fail when you start to mess with probability distributions of the new feature. The complete description of classifiers by lookup table is, however, out of scope of this text.

Let us show an example of what getValueFrom can do. We shall compute the class distributions for different values of the new attribute:

>>> c = orange.ContingencyAttrClass(ab, data)
>>> for i in c:
...     i
<0.000, 144.000>
<216.000, 72.000>

This works even though the new attribute ab is not in the domain. Its values are computed on the fly, while counting, and are not stored anywhere. The results say that for the first value ("eq"), all examples are in the second class ("1"), while when the values of a and b are different, 216 examples (exactly 3/4) are in the first class and the remaining 72 in the second. Similarly, you could assess the quality of the new attribute by using information gain, for instance. Note that the automatic computation won't always work. This is a precaution for cases in which an algorithm would require many recomputations of the value on one and the same example; ReliefF, for instance, is one such measure. In such cases, add the new attribute to the domain and compute a new ExampleTable, as shown above.
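For instance, a sketch measuring the new attribute's quality: information gain works directly through getValueFrom, while ReliefF, as noted, requires the attribute to be added to an ExampleTable first (addAnAttribute is the function from the earlier example).

# information gain can use getValueFrom on the fly
print orange.MeasureAttribute_info(ab, data)

# ReliefF needs the attribute to be physically present in the data
data2 = orngCI.addAnAttribute(ab, data)
print orange.MeasureAttribute_relief(ab, data2)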

Feature Inducers as Redundancy Removers

Feature inducers can be used to remove (or merge) redundant attribute values. In Monk 1, the attribute "e" has four values, but the goal concept only asks whether the value is "1" or not. Thus, values "2", "3" and "4" are equivalent. If minimal complexity decomposition is called to construct a new attribute from "e", it will easily remove the redundancies:

>>> new_e, quality = orngCI.FeatureByMinComplexity(data, ["e"])
>>> new_e.values
<1, 2+3+4>

The new attribute is binary, the first value corresponding to "e=1" and the other to other values of "e".

We can check the lookupTable again.

>>> print new_e.getValueFrom.lookupTable
<1, 2+3+4, 2+3+4, 2+3+4, 2+3+4>

Single Feature Construction

FeatureByMinComplexity

FeatureByMinComplexity constructs features using a modified algorithm of Ashenhurst and Curtis, as described by Zupan (1997, 1998). It has two attributes.

colorIG
defines the component that colors the graph. Since only one class (orange.ColorIG_MCF) is available, there is no point in changing this (more accurately, there is nothing to change it to).
completion
defines what to do with combinations of values of bound attributes that did not occur in the data or for which the corresponding graph node had no connections to any other node. The default value of completion is orngCI.FeatureByMinComplexity.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives are orngCI.FeatureByMinComplexity.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByMinComplexity.NoCompletion, where the values are left undefined.
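
For example, a sketch that leaves the unseen combinations undefined instead of imputing them:

fmc = orngCI.FeatureByMinComplexity()
fmc.completion = orngCI.FeatureByMinComplexity.NoCompletion
ab, quality = fmc(data, ["a", "b"])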

FeatureByIM

Class FeatureByIM implements and extends Zupan's (1997, 1998) method for minimal-error decomposition. It is analogous to minimal-complexity decomposition. While minimal-complexity decomposition supposes that the dataset is noise-free, minimal-error decomposition can also handle noisy data.

The algorithm builds an incompatibility matrix (also called partition matrix) and merges the columns until a stopping criterion is reached. In Zupan's method, columns are compared by how the merge would affect the average m-error estimate over elements of the matrix. Merging is stopped when the error stops decreasing. In Orange, different similarity measures and stopping criteria can be used.

Since FeatureByIM is analogous to FeatureByMinComplexity, orngCI.FeatureByMinError is provided as an alias for orngCI.FeatureByIM. No class with this name appears in orange, though.

orngCI.FeatureByIM has a number of settable components. Their actual positions after deflattening are given in parentheses.

IMconstructor (instance.IMconstructor)
is the component used for construction of the incompatibility matrix. If left at the default value of None, orange.IMBySorting will be used.
clustersFromIM (instance.clustersFromIM)
is the method that defines how the columns are merged. The only available class at the moment is ClustersFromIMByAssessor which uses a columnAssessor to evaluate distances between the columns and merges the most similar columns until the stopCriterion says it's enough.
stopCriterion (instance.clustersFromIM.stopCriterion)
decides when to stop the merging. The default depends upon other parameters. The default default (see below) is orange.StopIMClusteringByAssessor_noProfit, which stops the merging when the gain is negative or when it is smaller than the prescribed proportion. Alternatives are orange.StopIMClusteringByAssessor_binary that stops merging when only two columns (future values) are left, and orange.StopIMClusteringByAssessor_n which stops at n columns.
minProfitProportion (instance.clustersFromIM.stopCriterion.minProfitProportion)
sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality by less than 5%. If you use this with m-error estimation, for instance, the merging stops when no merge would decrease the error by at least 5%. The default minProfitProportion is 0.
n (instance.clustersFromIM.stopCriterion.n)
sets the number of values for the new feature. If n is specified, the default stopCriterion is orange.StopIMClusteringByAssessor_n.
binary
tells that we want to induce binary features. orange.StopIMClusteringByAssessor_binary is used as stopping criterion.
columnAssessor (instance.clustersFromIM.columnAssessor)
is the component used to assess the quality of the matrix elements and columns, and the profit of merging columns. This component is independent of the stopping criteria. The default default is orange.ColumnAssessor_m, which uses m-error estimation. Alternatives are
  • orange.ColumnAssessor_Laplace minimizes Laplace estimation of error of matrix elements;
  • orange.ColumnAssessor_Measure which optimizes an attribute quality measure (such as information gain); this would be usually used in conjunction with orange.StopIMClusteringByAssessor_binary or orange.StopIMClusteringByAssessor_n;
  • orange.ColumnAssessor_Kramer which optimizes Kramer's impurity measure. It is defined for binary classes only. Since the profit is non-positive, the default stopping criterion is changed to orange.StopIMClusteringByAssessor_binary. You may replace it by orange.StopIMClusteringByAssessor_n.
  • orange.ColumnAssessor_Relief uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.
m (instance.clustersFromIM.columnAssessor.m)
gives m for m-error estimate.
measure (instance.clustersFromIM.columnAssessor.measure)
is the measure to be used when columnAssessor is of type orange.ColumnAssessor_Measure. If this option is given, the default columnAssessor is orange.ColumnAssessor_Measure.
completion
defines what to do with combinations of values of bound attributes that were not present in the data. The default value of completion is orngCI.FeatureByIM.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives are orngCI.FeatureByIM.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByIM.NoCompletion, where the values are left undefined.

You should take care to combine the arguments sensibly. If you, for instance, combine the column assessor based on Kramer's impurity measure with the ordinary "no profit" stop criterion (by setting both explicitly), you will probably get features that are simply Cartesian products of the bound attributes, since Kramer's impurity measure is non-decreasing. orngCI does not warn about nonsensical settings.

As an example, let us check how minimal-error decomposition performs on Monk 1. We shall check its performance with different values of m, and for each we shall print out the values of the constructed feature.

>>> import orange, orngCI
>>> data = orange.ExampleTable("monk1")
>>> for m in [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]:
...     ab, quality = orngCI.FeatureByIM(data, ["a", "b"], m=m)
...     print "m=%4.1f: %5.3f %s" % (m, quality, ab.values)
m= 0.0: 0.000 <1-1+3-3+2-2, 3-1+3-2+1-2+2-3+1-3+2-1>
m= 0.1: 0.087 <2-2+3-3+1-1, 2-3+3-1+2-1+1-2+3-2+1-3>
m= 0.2: 0.152 <1-1+2-2+3-3, 1-3+3-2+2-1+2-3+3-1+1-2>
m= 0.5: 0.300 <2-3+3-1+1-2+3-2+1-3+2-1+3-3+2-2+1-1>
m= 1.0: 0.333 <1-1+3-3+2-2, 1-2+3-2+2-1+1-3+3-1+2-3>
m= 2.0: 0.500 <2-3+3-1+3-2+1-2+2-1+1-3+3-3+1-1+2-2>
m= 5.0: 0.375 <2-1+3-1+3-2+1-2+2-3+1-3+3-3+1-1+2-2>
m=10.0: 0.186 <1-1+3-3+2-2+1-3+1-2+3-2+3-1+2-1+2-3>

Results can vary due to random decisions in the procedure. In general, lower m's give correct features. This is not surprising: through m the user may inform the algorithm about the level of noise in the data. With higher m's, the algorithm tends to oversimplify the feature; in our case, it merges more values than needed.

Let us see whether Kramer's impurity measure can compete with this.

>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"], columnAssessor=orange.ColumnAssessor_Kramer())
>>> print "%5.3f %s" % (quality, ab.values)
6.917 <2-2+2-3+1-3+3-3+1-2+1-1+3-1+2-1, 3-2>

It cannot. But what if we allow it to build a four-valued attribute?

>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"], columnAssessor=orange.ColumnAssessor_Kramer(), n=4)
>>> print "%5.3f %s" % (quality, ab.values)
2.917 <1-1+3-2+2-1+3-3+1-2+2-2, 1-3, 2-3, 3-1>

Not much better. Note that these examples are not relevant tests of the methods. Minimal-error based decomposition is designed for noisy domains, and m is usually fitted with internal cross validation. Kramer's impurity measure was designed for another feature construction method, not for comparing columns of incompatibility matrices.

As a more interesting test, let us see what happens if we construct an attribute with optimization of information gain.

>>> ab, quality = orngCI.FeatureByIM(data, ["a", "b"], measure=orange.MeasureAttribute_info())
>>> print "%5.3f %s" % (quality, ab.values)
0.000 <1-1+3-3+2-2, 2-3+3-1+1-3+3-2+1-2+2-1>

Perfect. We could add "binary=1" to the call, but it is obviously not needed. Values 1-1, 2-2, and 3-3 can be joined without increasing the entropy, and the other values can be joined likewise. (Note that this does not say anything about the actual information gain of the new attribute. orange.MeasureAttribute_info() estimates entropies in the elements of the incompatibility matrix and observes the changes in entropy with potential merging. This is not the standard information gain of an attribute but rather a form of non-myopic information gain measure.)

FeatureByKramer

FeatureByKramer implements Kramer's algorithm for feature construction (Kramer, 1994).

Basically, it computes a distribution of classes for each combination of values of the bound attributes and merges these distributions, just as minimal-error decomposition merges columns of the incompatibility matrix. In the case of Monk 1, it extracts a 0:48 class distribution for the combinations 1-1, 2-2 and 3-3, and 36:12 for the other combinations. It is straightforward for any clustering algorithm to see that the former three combinations belong to one group and the latter six to another.

This method is somewhat similar to minimal-error decomposition. If you sum all rows of the incompatibility matrix into a single row, you get a list of distributions corresponding to combinations of values of the bound attributes - and that is exactly the data that Kramer's algorithm operates on. The remaining differences between the algorithms are only in the choice of components; while minimal-error estimation uses a change of the m-error estimate as a measure of distance between two values of the new attribute, Kramer's algorithm uses a special impurity measure (which can be, as Kramer himself states, replaced by any other impurity measure or even by some measure of similarity between two distributions).

Due to the similarity between the two methods, the structure of their components is similar. The data structure that corresponds to IM (the incompatibility matrix) is called ExampleDist. There is no component analogous to IMconstructor.

clustersFromDistributions (instance.clustersFromDistributions)
is the method that defines how the values are merged. The only available class at the moment is ClustersFromDistributionsByAssessor which uses a distributionAssessor to evaluate differences between the distributions and merges the most similar values until the stopCriterion says it's enough.
distributionAssessor (instance.clustersFromDistributions.distributionAssessor)
is the component used to assess the individual and average qualities of distributions, and profits from their merging. This component is independent of stopping criteria. The default is orange.DistributionAssessor_Kramer, which uses Kramer's measure of impurity. Alternatives are
  • orange.DistributionAssessor_Laplace minimizes the Laplace estimate of the error of distributions;
  • orange.DistributionAssessor_Measure which optimizes an attribute quality measure (such as information gain);
  • orange.DistributionAssessor_m which assesses the quality of distributions using the m-estimate of error;
  • orange.DistributionAssessor_Relief uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.
Note that this default is different from the one in the analogous orngCI.FeatureByIM class. This also influences the default stopCriterion (see below).
stopCriterion (instance.clustersFromDistributions.stopCriterion)
decides when to stop the merging. The default depends upon other parameters. Rules for default are the same as for orngCI.FeatureByIM: when no n, minProfitProportion or binary are given, the stopCriterion depends upon columnAssessor. When Kramer's assessor is used, default is orange.StopDistributionClustering_binary, which stops when only two values are left. Otherwise, orange.StopDistributionClustering_noProfit that stops when the gain is negative or when it is smaller than the prescribed proportion is used. The third alternative is orange.StopDistributionClustering_n that stops at n values.
minProfitProportion (instance.clustersFromDistributions.stopCriterion.minProfitProportion)
sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality by less than 5%. If you use this with m-error estimation, for instance, the merging stops when no merge would decrease the error by at least 5%. The default minProfitProportion is 0.
n (instance.clustersFromDistributions.stopCriterion.n)
sets the number of values for the new feature. If n is specified, the default stopCriterion is orange.StopDistributionClustering_n.
binary
tells that we want to induce binary features. orange.StopDistributionClustering_binary is used as the stopping criterion.
m (instance.clustersFromDistributions.distributionAssessor.m)
gives m for m-error estimate.
measure (instance.clustersFromDistributions.distributionAssessor.measure)
is the measure to be used when distributionAssessor is of type orange.DistributionAssessor_Measure. If this option is given, the default distributionAssessor is orange.DistributionAssessor_Measure.
completion
defines what to do with combinations of values of bound attributes that were not present in the data. The default value of completion is orngCI.FeatureByKramer.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives are orngCI.FeatureByKramer.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByKramer.NoCompletion, where the values are left undefined.

The class usage is essentially the same as that of orngCI.FeatureByIM, except for the different defaults for distributionAssessor and stopCriterion. The following invokes the default Kramer's algorithm, which uses Kramer's measure of impurity for distribution quality assessment and induces binary concepts:

>>> ab, quality = orngCI.FeatureByKramer(data, ["a", "b"])
>>> print "%5.3f %s" % (quality, ab.values)
-0.125 <2-1+2-3+1-2+3-1+1-3+3-2, 2-2+3-3+1-1>

Setting m will replace the distribution estimator by orange.DistributionAssessor_m and, consequentially, orange.StopDistributionClustering_noProfit will be used as the stop criterion:

>>> for m in [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]:
...     ab, quality = orngCI.FeatureByKramer(data, ["a", "b"], m=m)
...     print "m=%4.1f: %5.3f %s" % (m, quality, ab.values)
...
m= 0.0: 1.458 <2-1+3-2+3-1+2-3+1-2+1-3, 2-2+3-3+1-1>
m= 0.1: 1.457 <1-2+1-3+2-1+3-2+3-1+2-3, 2-2+3-3+1-1>
m= 0.2: 1.455 <1-1+3-3+2-2, 2-1+3-1+1-2+2-3+3-2+1-3>
m= 0.5: 1.451 <2-1+1-3+3-1+1-2+3-2+2-3, 2-2+3-3+1-1>
m= 1.0: 1.444 <2-2+3-3+1-1, 1-3+2-3+1-2+3-2+2-1+3-1>
m= 2.0: 1.430 <1-1+2-2+3-3, 1-3+2-3+2-1+3-2+3-1+1-2>
m= 5.0: 1.391 <1-1+3-3+2-2, 2-1+1-2+3-1+2-3+3-2+1-3>
m=10.0: 1.334 <1-1+2-2+3-3, 3-1+1-2+2-1+2-3+3-2+1-3>

FeatureByRandomMerge

FeatureByRandomMerge constructs features with the given number of values. Grouping of combinations of bound values is random and the estimated quality of the feature is a random value between 0 and 100, inclusive. The class is useful for comparisons - if a "real" method of constructive induction does not outperform this, it is of no use.

n
Sets the number of values for the new feature

As an example, we shall induce 3-valued random features:

>>> for i in range(5):
...     ab, quality = orngCI.FeatureByRandom(data, ["a", "b"], n=3)
...     print "%5.3f %s" % (quality, ab.values)
...
53.000 <2-2+2-3, 1-2+1-3+2-1+3-2+3-3, 1-1+3-1>
19.000 <3-2+3-3, 1-2+1-3+2-2+3-1, 1-1+2-1+2-3>
2.000 <2-1+2-3+3-1, 2-2, 1-1+1-2+1-3+3-2+3-3>
100.000 <1-3+2-3+3-1+3-3, 1-2+2-1+3-2, 1-1+2-2>
86.000 <1-3+2-2+3-1+3-2, 1-1+2-3+3-3, 1-2+2-1>

FeatureByCartesianProduct

FeatureByCartesianProduct constructs an attribute that represents the Cartesian product of the bound attributes. Its quality is estimated by a given attribute quality estimator of type orange.MeasureAttribute. If none is given, all qualities are 0.

measure
The measure of attribute quality.

Let us construct a Cartesian product attribute and assess its quality using gain ratio:

>>> ab, quality = orngCI.FeatureByCartesianProduct(data, ["a", "b"],
...     measure=orange.MeasureAttribute_gainRatio())
>>> print "%5.3f %s" % (quality, ab.values)
0.145 <1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3>

Multiple Feature Construction

FeatureGenerator

orngCI.FeatureGenerator is a class that constructs a list of features using the given feature construction algorithm. In addition, you can specify a subset generator - an object that generates bound sets.

featureInducer
The object that builds features from given examples and sets of bound attributes. You can use any of the above feature inducers for this purpose. This attribute needs to be specified; no default is provided.
subsetGenerator
Generates bound sets from the given attributes. The default generator is orange.SubsetsGenerator_constSize(2), which returns all pairs of attributes. Orange provides several alternatives, the most useful being orange.SubsetsGenerator_minMaxSize(min, max), which generates all subsets within the specified minimal and maximal number of elements. You can define your own generators in Python, if you like.

The core of the class's code is really trivial. The class first checks whether the two fields are specified and provides the default for subsetGenerator if needed. Then it executes:

return [self.featureInducer(data, bound, weightID) for bound in ssgen]

We will use the class to induce new features from all attribute pairs in Monk 1 using Kramer's algorithm.

>>> fk = orngCI.FeatureByKramer()
>>> features = orngCI.FeatureGenerator(data, featureInducer=fk)
>>> for ab, quality in features:
...     names = [x.name for x in ab.getValueFrom.boundset()]
...     print "%s: %5.3f" % (names, quality)
['a', 'b']: -0.125
['a', 'c']: -0.250
['a', 'd']: -0.250
['a', 'e']: -0.167
['a', 'f']: -0.250
...
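The example above uses the default pair generator. The following sketch asks for bound sets of two or three attributes instead; the attribute names min and max follow the description of SubsetsGenerator_minMaxSize above.

fk = orngCI.FeatureByKramer()
ssgen = orange.SubsetsGenerator_minMaxSize()
ssgen.min, ssgen.max = 2, 3        # bound sets of two or three attributes
features = orngCI.FeatureGenerator(data, featureInducer=fk, subsetGenerator=ssgen)
print len(features)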

StructureInducer

StructureInducer induces a hierarchy of attributes as proposed by Zupan et al. (1998). At each step, it induces new attributes from subsets of the current attribute set. If any attributes are found, one of them is selected, inserted into the domain, and its bound attributes are removed. This is repeated until there is only one attribute left or until the selected attribute includes all remaining attributes in its bound set (if new attributes are induced from pairs of existing attributes, for instance, the induction stops when only two attributes are left). The result of calling StructureInducer is a classifier that computes the class attribute from the remaining attributes. Using boundset() and getValueFrom, it is possible to descend through the hierarchy.

featureInducer
Any feature inducer described above (such as FeatureByMinError or FeatureByKramer) or another Python function or class with the same behaviour.
subsetsGenerator
A subset generator for bound attributes. StructureInducer constructs a FeatureGenerator and initializes it with featureInducer and subsetsGenerator.
redundancyRemover
Any class or function that gets examples and weight, and returns examples with redundancies removed. An example of such class is orngCI.AttributeRedundanciesRemover. If given, it is called prior to structure induction (but not during induction).
keepDuplicates
During structure induction, as attributes are merged, more and more examples become the same. To speed up the induction, such examples are replaced with a single example with a higher weight. If the feature inducer handles weights correctly (or if example duplicates do not change the induced features, as is the case for minimal-complexity decomposition), this does not change the outcome. If you nevertheless want to keep the duplicated examples, set keepDuplicates to 1.

This example shows how to induce a structure using Kramer's method for feature construction, where data stores examples for dataset Monk 1.

>>> inducer = orngCI.FeatureByKramer()
>>> root = orngCI.StructureInducer(data, featureInducer=inducer)
>>> orngCI.printHierarchy(root)
y/2 <0, 1>
    d/3 <1, 2, 3>
    c4/2 <c0, c1>
        f/2 <1, 2>
        c3/2 <1-c0+2-c0, 1-c1+2-c1>
            c/2 <1, 2>
            c2/2 <c0, c1>
                e/4 <1, 2, 3, 4>
                c1/2 <3-1+1-3+2-1+1-2+3-2+2-3, 2-2+3-3+1-1>
                    a/3 <1, 2, 3>
                    b/3 <1, 2, 3>

Learning

HINT

Class HINT can be used as a Learner.

Basically, HINT is just a wrapper around StructureInducer. The user can specify the type of decomposition (minimal complexity, minimal error) and the size of bound sets. Alternatively, HINT can use defaults. When minimal error decomposition is used, HINT fits parameter m for error estimation by performing a 5-fold cross validation to find the value that gives the maximal classification accuracy.

type
Sets the type of decomposition. It can be "complexity", "error", or "auto". This, basically, sets the StructureInducer's featureInducer to FeatureByMinError or FeatureByMinComplexity. When type is "auto", HINT tries to determine whether the domain contains noise or not by observing whether there are examples with the same values of attributes but different classes. If there are no such examples, minimal-complexity is used; if there are, it switches to minimal-error.
boundSize
Defines the size of bound sets. It can be either an integer (usually 2 or 3) or a tuple with minimal and maximal bound set size. If bound size is not given, bound sets are pairs of attributes.
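
A minimal sketch of HINT used as a learner, following the usual Orange convention that calling a learner with examples returns a classifier; the attribute names type and boundSize are those documented above.

import orange, orngCI

data = orange.ExampleTable("monk1")
hint = orngCI.HINT()
hint.type = "auto"        # let HINT choose between the two decompositions
hint.boundSize = 2        # induce features from pairs of attributes
classifier = hint(data)

# the induced hierarchy can be inspected as in the StructureInducer example
orngCI.printHierarchy(classifier)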

Auxiliary Functions

RemoveAttributeRedundancies

RemoveAttributeRedundancies is a class for removing (merging) redundant attribute values. The class works as proposed by Zupan (1997). First it ranks the attributes according to some measure of quality (ReliefF, by default). Then it induces a new attribute from each attribute, starting with the worst ranked. If the induced attribute is the same as the original, nothing happens. If some values were merged, the original attribute is replaced with the induced one. If all values are merged into one, the attribute is redundant and is removed.

m
The parameter for m-error estimate
inducer
Inducer for constructing attributes from the original. If omitted, FeatureByMinError is used if m is given, and FeatureByMinComplexity if it is not.
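
A sketch of typical use, assuming the call convention stated for redundancy removers above (examples in, examples with redundancies removed out):

remover = orngCI.RemoveAttributeRedundancies()
remover.m = 2.0                    # with m given, FeatureByMinError is used
reduced = remover(data)            # examples with redundant values merged or removed
print len(reduced.domain.attributes), "attributes left"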

addAttribute(attribute, table)

This function adds an attribute to the domain. A new domain (and a new ExampleTable) is constructed; the original is left intact. Values for the new attribute are computed through its getValueFrom function.

replaceWithInduced(attribute, table)

Similar to addAttribute, but also removes the bound attributes from the domain.
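
For the feature ab constructed earlier from "a" and "b", a sketch:

data3 = orngCI.replaceWithInduced(ab, data)
print data3.domain      # 'a' and 'b' are gone, ab is in their place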

printHierarchy(attribute | classifier)

Prints out a hierarchy starting with a given attribute or a classifier. In the latter case, the classifier must support the boundset method; classifiers returned by StructureInducer and HINT have this method.

An example of how to use this function is given in the StructureInducer section above.

dotHierarchy(file, attribute | classifier)

Writes a hierarchy starting with a given attribute or a classifier to a file for the tree-drawing program Dot from the GraphViz package. If a classifier is given, it must support the boundset method; classifiers returned by StructureInducer and HINT have this method. The file can be given as a string, in which case a new file is created, or as an opened file, in which case the hierarchy is appended to it.
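
For example, a sketch using the root classifier from the StructureInducer example above (the file name is arbitrary):

# write the hierarchy into a file readable by Dot
orngCI.dotHierarchy("hierarchy.dot", root)

# or append it to an already opened file
f = open("hierarchy.dot", "a")
orngCI.dotHierarchy(f, root)
f.close()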

References

S. Kramer (1994). CN2-MCI: A Two-Step Method for Constructive Induction. Proc. ML-COLT '94 Workshop on Constructive Induction and Change of Representation. (http://www.hpl.hp.com/personal/Tom_Fawcett/CICR/kramer.ps.gz)

S. B. Thrun et al. (1991). The MONK's problems: A performance comparison of different learning algorithms. Technical report CMU-CS-91-197, Carnegie Mellon University.

B. Zupan (1997). Machine learning based on function decomposition. PhD thesis, University of Ljubljana.

B. Zupan, M. Bohanec, J. Demsar and I. Bratko (1998). Feature transformation by function decomposition. IEEE Intelligent Systems and Their Applications, vol. 13, pp. 38-43.