orngCI: Orange Constructive Induction Module
Module orngCI implements classes and function that support constructive induction (CI). It includes minimal-complexity function decomposition and a much generalized minimal-error function decomposition. Both can be used as feature induction algorithms or as a standalone learning algorithm HINT (Zupan, 1997; Zupan et al, 1998). In addition, the module provides an interface to generalized Kramer's method (Kramer, 1994) and serveral other CI related algorithms.
Constructive Induction in Orange
Constructive induction takes a subset of attributes, which we call bound attributes and creates a new attribute whose values are computed from values of the bound attributes. Orange supports the so-called data-driven constructive induction, where learning examples are used to decide how the new attribute is computed from the bound ones.
Feature constructors (orange objects that construct features) are thus given examples, a list of bound attributes, and, optionally, an id of the weight attribute. They construct a new attribute and return a tuple with the new feature and a value that estimates its quality.
Constructing a Constructor
Complex feature constructors, such as those for minimal-error
decomposition and for Kramer's algorithm show the flexibility of
Orange's component based approach. On the other hand, the hierarchy of
components that need to be put together to construct a workable
feature inducer is too complex even for an experienced user. The main
mission of orngCI is thus to help the user by
"flattening" the hierarchy. Defaults are provided for all the
components so the user only needs to give the specific parts of the
hierarchy.
For instance, let fim be an instance of
orange.FeatureIM, a feature constructor for minimal error
function decomposition. Its most crucial parameter, m, is
"hidden" in
fim.clustersFromIM.columnAssessor.m. Parameter m
is an attribute of an m-estimate based column assessor component of
the column quality assessor based algorithm for column clustering. To
save the user from having to remember this, orngCI offers
its own class . Now, if fim is an
instance of orngCI.FeatureByIM the user needs to set
fim.m. Before the actual feature construction takes
place, orngCI.FeatureByIM constructs an appropriate
orange.FeatureByIM and sets its
clustersFromIM.columnAssessor.m to
fim.m. Constructing a structure and putting the
user-supplied components at the right places is called
deflattening.
A general principle is that if you define a component, you also
need to define its subcomponents as well. The default
columnAssessor is
orange.ColumnAssessor_m. If you leave it alone, all its
subcomponents are set as well. However, if you set
columnAssessor to
orange.ColumnAssessor_Measure, it is your responsibility
to set its attribute "measure", either by specifying it as a "measure"
argument to FeatureByIM or by setting it directly in
columnAssessor.measure.
The rest of this section is intended for advanced users.
The function that performs deflattening is always called createInstance.
- createInstance()
- Returns an instance of pure
orangeclasses that perform the feature construction. The instance is also stored inself's fieldinstance.
Deflattening is initiated by the call operator. The call operator
is given examples, bound attributes and, optionally, the id of the
weight attribute, and returns a constructed attribute and an
assessment of its quality. To do its job, the call operator needs an
instance of a corresponding Orange feature constructor. Constructing
such instances at each invocation of call would be too costly,
therefore a kind of caching is implemented. When the call operator is
invoked for the first time, createInstance is called to
deflatten the structure and the result is stored in
self.instance. In further calls, the stored
self.instance is used again. However, when any of classes
attributes that affect the feature construction (such as
fim.m, see above example) are modified,
self.instance is set to None and a new
instance will be constructed at next call.
Now, if you want to, say, tune fim.m, you can change
it and cause reinstantiations of corresponding orange
classes.
This will construct features from the first two attributes of the
domain, using different values of m. (It would be a blasphemy not to
publish a shorted way to write the above code fragment: the last four
lines can be replaced by attrs = [fim(data, bound)[0] for fim.m
in [0.1, 0.5, 1.0, 2.0]]). However, at each call to
fim, createInstances is called and a new set
of components is constructed since setting m resets
them. To make the program faster, you should replace
by
Constructing a Feature
To construct a feature, you need to call a constructor. The call
operator expects two or three arguments: examples, bound-attributes
and, optionally an id of a weight meta attribute. It returns a tuple
containing the attribute and its quality estimation. This is the same
for all feature constructors in orngCI, although some may
ignore some arguments or return useless quality assessments.
The method of quality estimation depends upon the algorithm used - it can be a decrease of error, an increase of impurity, ets. Qualities returned by different feature constructors are not comparable. The quality can be either negative or positive; in any case, higher numbers mean better attributes. If the measure of quality is negated (ie. low numbers meaning better attributes), they will have a negative sign.
The famous Monk 1 data set (Thrun et al, 1991) has the goal concept y := a=b or e=1. When joining attributes 'a' and 'b', the new attribute should express equality of a and b. Let us see if the minimal-complexity function decomposition can find the appropriate function.
Note that we specified the bound attributes by names. This is not the only way. If you want, you can give attribute descriptors, or even their indices in the domain. Thus, knowing that attributes "a" and "b" are the first two attributes, the above call could be replaced by
or even
or by any combination of names, descriptors and indices.
Now for the results. Values of the new attribute show to which values of the bound attributes they correspond. The new attribute is binary, its first value is named '1-1+2-2+3-3' and the other is '1-2+1-3+2-1+2-3+3-1+3-2'. It is obvious that this is the correct feature. Its quality is -2; minimal-complexity function decomposition prefers attributes with less values, so higher the number of values, more negative (and thus lower) the quality.
It makes sense to rename the values of the constructed feature. In our case, we shall call the first value "eq" and the second would be "diff".
If any of bound attributes have values that include '-' or '+', the constructed attribute's values will be named "c1", "c2", "c3"...
Using the Constructed Features
Constructed features can be added to the domain and used in further processing of the data, such as, for instance, learning. If that is what you want, you do not need to read this section any further: if you have followed the above example, you can conclude it by
This adds the attribute to the domain and returns a new example table.
The feature knows how to compute itself from the values of the attributes it is constructed from. The usual Orange mechanism is used for this: the new feature has a getValueFrom pointing to a classifier that computes the feature's value. In short, when some method has an example and needs a value of a particular attribute, but this attribute is not in the example's domain, it can check whether the attribute has a getValueFrom defined. If it has, the method calls it, passing the example as an argument. The getValueFrom either computes the attribute's value or returns don't know.
For most of features constructed by methods in orngCI, getValueFrom will contain some variant of orange.ClassifierByLookupTable. In above case, it is a classifier by lookup table based on two attributes. Values of the new attribute are read from its lookupTable. Each entry in the table corresponds to a combination of value of bound attributes; the first corresponds to (1, 1), the second to (1, 2), then to (1, 3), (2, 1), (2,2)..., where the first digit is for variable1 and the second for variable2. Value "eq" occurs at the first, the fifth and the ninth elements of the table, which correspond to (1, 1), (2, 2) and (3, 3). This is exactly what we expected for Monk 1 dataset.
If you need to manually correct the feature, you can change the lookupTable:
This would work for most cases, but it would fail when you start to mess with probability distributions of the new feature. The complete description of classifiers by lookup table is, however, out of scope of this text.
Let us show an example of what getValueFrom can do. We shall compute the class distributions for different values of the new attribute:
Works even though the new attribute, ab is not in domain. Its values are computed on the fly, while counting, and are not stored anywhere. What the results say is that for the first value ("eq"), all examples are in the second class ("1"), while when values of a and b are different, 216 examples (exactly 3/4) are in the first class and the other 72 in the second. Similarly, you could assess the quality of the new attribute by using information gain, for instance. Note that the automatic computation won't always work. This is done as a precaution in cases when the algorithm would require many recomputations of the value on one and the same example. ReliefF, for instance, is one such measure. In such cases, you add the new attribute to a domain and compute a new dataset in an ExampleTable, as shown above.
Feature Inducers as Redundancy Removers
Feature inducers can be used to remove (or merge) redundant attribute values. In Monk 1, the attribute "e" has four values, but the goal concept only asks whether the value is "1" or not. Thus, values "2", "3" and "4" are equivalent. If minimal complexity decomposition is called to construct a new attribute from "e", it will easily remove the redundancies:
We can check the lookupTable again.
Single Feature Construction
FeatureByMinComplexity
- colorIG
- defines the component that colors the graph. Since only one class (
orange.ColorIG_MCF) is available, there is no point in changing this (more accurately, there is nothing to change it to). - completion
- defines what to do with combinations of values of bound attributes that did not occur in the data or for which the corresponding graph node had no connections to any other node. The default value of
completionisorngCI.FeatureByMinComplexity.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives areorngCI.FeatureByMinComplexity.CompletionByDefault, where the most frequent value is used for imputation, andorngCI.FeatureByMinComplexity.NoCompletion, where the values are left undefined.
FeatureByIM
Class
The algorithm builds an incompatibility matrix (also called partition matrix) and merges the columns until a stopping criterion is reached. In Zupan's method, columns are compared by how the merge would affect the average m-error estimate over elements of the matrix. Merging is stopped when the error stops decreasing. In Orange, different similarity measures and stopping criteria can be used.
Since the FeatureByIM is analogous for FeatureByMinComplexity , orngCI.FeatureByMinError is provided as an alias for orngCI.FeatureByIM. No class with this name appears in orange, though.
orngCI.FeatureByIM has a number of settable components. Their actual positions after deflattening are given in parentheses.
- IMconstructor (instance.IMconstructor)
- is a component used for construction of incompatibility matrix. If left at default value of
None,orange.IMBySortingwill be used. - clustersFromIM (instance.clustersFromIM)
- is the method that defines how the columns are merged. The only available class at the moment is
ClustersFromIMByAssessorwhich uses acolumnAssessorto evaluate distances between the columns and merges the most similar columns until thestopCriterionsays it's enough. - stopCriterion (instance.clustersFromIM.stopCriterion)
- decides when to stop the merging. The default depends upon other parameters. The default default (see below) is
orange.StopIMClusteringByAssessor_noProfit, which stops the merging when the gain is negative or when it is smaller than the prescribed proportion. Alternatives areorange.StopIMClusteringByAssessor_binarythat stops merging when only two columns (future values) are left, andorange.StopIMClusteringByAssessor_nwhich stops atncolumns. - minProfitProportion (instance.clustersFromIM.stopCriterion.minProfitProportion)
- sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality for less than 5%. If you use this with m-error estimation, for instance, the merging would stop when no merge would decrease the error by at least 5%. The default
minProfitProportionis 0. - n (instance.clustersFromIM.stopCriterion.n)
- sets the number of values for the new feature. If
nis specified, the defaultstopCriterionisorange.StopIMClusteringFromByAssessor_n. - binary
- tells that we want to induce binary features.
orange.StopIMClusteringByAssessor_binaryis used as stopping criterion. - columnAssessor (instance.clustersFromIM.columnAssessor)
- is the component used to assess the qualities of matrices' elements, its columns and profits by column merging.
This component is independent of stopping criteria. The default default is
orange.ColumnAssessor_m, which uses m-error estimation. Alternatives are-
orange.ColumnAssessor_Laplaceminimizes Laplace estimation of error of matrix elements; orange.ColumnAssessor_Measurewhich optimizes an attribute quality measure (such as information gain); this would be usually used in conjunction withorange.StopIMClusteringByAssessor_binaryororange.StopIMClusteringByAssessor_n;-
orange.ColumnAssessor_Kramerwhich optimizes Kramer's impurity measure. It is defined for binary classes only. Since the profit is non-positive, the default stopping criteria is changed toorange.StopIMClusteringByAssessor_binary. You may replace it byorange.StopIMClusteringByAssessor_n. -
orange.ColumnAssessor_Reliefuses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.
-
- m (instance.clustersFromIM.columnAssessor.m)
- gives m for m-error estimate.
- measure (instance.clustersFromIM.columnAssessor.measure)
- is the measure to be used when
columnAssessoris of typeorange.ColumnAssessor_Measure. If this option is given, the defaultcolumnAssessorisorange.ColumnAssessor_Measure. - completion
- defines what to do with combinations of values of bound attributes that were not present in the data. The default value of
completionisorngCI.FeatureByIM.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives areorngCI.FeatureByIM.CompletionByDefault, where the most frequent value is used for imputation, andorngCI.FeatureByIM.NoCompletion, where the values are left undefined.
You should pay attention to make sensible combinations of the arguments. If you, for instance, combine column assessor based on Kramer's impurity measure with the ordinary "no profit" stop criterion (by setting both explicitly), you will probably get features that will be simply Cartesian products of bound attributes, since Kramer's impurity measure is non-decreasing. orngCI does not warning about nonsense settings.
As an example, let us check how minimal-error decomposition performs on Monk 1. We shall check its performance with different values of m, and for each we shall print out the values of the constructed feature.
Results can vary due to random decisions in the procedure. In general, lower m's give correct features. This is not surprising: through m the user may inform the algorithm about the level of noise in the data. With higher m's, the algorithm will tend to oversimplify the feature, in our case, it will merge more than needed.
Let us see whether Kramer's impurity measure can compete with this.
It cannot. But what if we allow it to build a four-valued attribute?
As a more interesting test, let us see what happens if we construct an attribute with optimization of information gain.
orange.MeasureAttribute_info() estimates entropies in the elements of the incompatibility matrix and observes the changes in entropy with potential merging. This is not the standard information gain of an attribute but rather a form of non-myopic information gain measure.)
FeatureByKramer
implements Kramer's algorithm for feature construction (Kramer, 1994).
Basically, it computes a distribution of classes for each combination of values of bound attributes and merges them, just like the minimal-error decomposition merges columns of incompatibility matrix - and hopefully. In case of Monk 1, it extract a 0:36 distribution for combinations 1-1, 2-2 and 3-3, and 36:12 for other values. It is straightforward for any clustering algorithm to see that the former three values belong to one and the latter three to another group.
This method is somewhat similar to minimal-error decomposition. If you sum all rows of incompability matrix to a single row, you get a list of distributions corresponding to combinations of value of bound attributes. And that's exactly the data that Kramer's algorithm operates on. The remaining differences between the algorithms are only in the choice of components; while minimal-error estimation uses a change of m-error estimate as a measure of distance between two values of the new attribute, Kramer's algorithm uses a special impurity measure (which can be, as Kramer himself states, replaced by any other impurity measure or even by some measure of similarity between two distributions).
Due to the similarity between the two methods, the structure of their components is similar. The data structure that corresponds to IM (incompatilibity matrix) is called ExampleDist. There is no component analogous to IMconstructor.
- clustersFromDistributions (instance.clustersFromDistributions)
- is the method that defines how the values are merged. The only available class at the moment is
ClustersFromDistributionsByAssessorwhich uses adistributionAssessorto evaluate differences between the distributions and merges the most similar values until thestopCriterionsays it's enough. - distributionAssessor (instance.clustersFromDistributions.distributionAssessor)
- is the component used to assess the individual and average qualities of distributions, and profits from their merging.
This component is independent of stopping criteria. The default is
orange.DistributionAssessor_Kramer, which uses Kramer's measure of impurity. Alternatives are-
orange.ColumnAssessor_Laplaceminimizes Laplace estimation of error of distributions; -
orange.DistributionAssessor_Measurewhich optimizes an attribute quality measure (such as information gain); -
orange.ColumnAssessor_mwhich assesses the quality of distributions using m-estimate for error. -
orange.DistributionAssessor_Reliefuses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.
orngCI.FeatureByIMclass. This also influences the defaultstopCriterion(see below). -
- stopCriterion (instance.clustersFromDistributions.stopCriterion)
- decides when to stop the merging. The default depends upon other parameters. Rules for default are the same as for
orngCI.FeatureByIM: when non,minProfitProportionorbinaryare given, thestopCriteriondepends uponcolumnAssessor. When Kramer's assessor is used, default isorange.StopDistributionClustering_binary, which stops when only two values are left. Otherwise,orange.StopDistributionClustering_noProfitthat stops when the gain is negative or when it is smaller than the prescribed proportion is used. The third alternative isorange.StopDistributionClustering_nthat stops atnvalues. - minProfitProportion (instance.clustersFromDistributions.stopCriterion.minProfitProportion)
- sets the minimal profit for merging to continue. If you set it, for example, to 0.05, the merging will stop when the best merge would improve the current quality for less than 5%. If you use this with m-error estimation, for instance, the merging would stop when no merge would decrease the error by at least 5%. The default
minProfitProporionis 0. - n (instance.clustersFromDistributions.stopCriterion.n)
- sets the number of values for the new feature. If
nis specified, the defaultstopCriterionisorange.StopDistributionClusteringByAssessor_n. - binary
- tells that we want to induce binary features.
orange.StopIMClusteringByAssessor_binaryis used as stopping criterion. - m (instance.clustersFromDistributions.distributionAssessor.m)
- gives m for m-error estimate.
- measure (instance.clustersFromDistribution.distributionAssessor.measure)
- is the measure to be used when
distributionAssessoris of typeorange.DistributionAssessor_Measure. If this option is given, the defaultdistributionAssessorisorange.DistributionAssessor_Measure. - completion
- defines what to do with combinations of values of bound attributes that were not present in the data. The default value of
completionisorngCI.FeatureByKramer.CompletionByBayes; the values are determined using Bayesian classifier. The alternatives areorngCI.FeatureByKramer.CompletionByDefault, where the most frequent value is used for imputation, andorngCI.FeatureByKramer.NoCompletion, where the values are left undefined.
The class usage is essentially the same as that of orngCI.FeatureByIM except for different defaults for distributionAssessor and stopCriterion. This invokes the default Kramer's algorithm that uses Kramer's measure of impurity for distribution quality assessment and induces binary concepts:
Setting m will replace the distribution estimator by orange.DistributionAssessor_m and, consequentially, orange.StopDistributionClustering_noProfit will be used as the stop criterion:
FeatureByRandomMerge
constructs features with the given number of values. Grouping of combinations of bound values is random and the estimated quality of the feature is a random value between 0 and 100, inclusive. The class is useful for comparisons - if a "real" method of constructive induction does not outperform this, it is of no use.
- n
- Sets the number of values for the new feature
As an example, we shall induce 30-valued random features:
FeatureByCartesianProduct
constructs an attribute which presents a Cartesian product of bound attributes. Its quality is estimated by a given attribute quality estimator of type orange.MeasureAttribute. If none is given, all qualities are 0.
- measure
- The measure of attribute quality.
Let us construct a cartesian product-attribute and assess its quality using information gain:
Multiple Feature Construction
FeatureGenerator
orngCI. is a class that constructs a list of features using the given feature construction algorithm. Besides, you can also specify a subset generator - an object that generates bound sets.
- featureInducer
- The object that builds features from given examples and sets of bound attributes. You can use any of the above feature inducers for this purpose. This attribute needs to be specified; no default is provided.
- subsetGenerator
- Generates bound sets from the given attributes. The default generator is
orange.SubsetsGenerator_constSize(2), which returns all pairs of attributes. Orange provides several alternatives, the most useful iforange.SubsetsGenerator_minMaxSize(min, max)which generates all subsets within the specified minimal and maximal number of elements. You can define your own generators in Python, if you like.
The core of the classes' code is really trivial. The class first checks whether the two fields are specified and provides the default for subsetGenerator if needed. Then it executes:
We will user the class to induce new features from all attribute pairs in Monk 1 using Kramer's algorithm.
StructureInducer
induces a hierarchy of attributes as proposed by Zupan et al (1998). At each step, it induces attributes from subset of the current attribute set. If any attributes are found, one of them is selected, inserted into domain and its bound attributes are removed. This is the repeated until there is only one attribute left or the selected attribute includes all attributes as bound set (i.e. if new attributes are induced from pairs of existing attributes, this would stop the induction when there are only two attributes left. The result of calling StructureInducer is a function that computes the class attribute from the remaining attributes. Through using functions boundset() and getValueFrom it is possible to descend down through hierarchy.
- featureInducer
- Any feature inducer described above (such as
FeatureByMinErrororFeatureByKramer) or another Python function or class with the same behaviour. - subsetsGenerator
- A subset generator for bound attributes.
StructureInducerconstructs aFeatureGeneratorand initializes it withfeatureInducerandsubsetsGenerator. - redundancyRemover
- Any class or function that gets examples and weight, and returns examples with redundancies removed. An example of such class is
orngCI.AttributeRedundanciesRemover. If given, it is called prior to structure induction (but not during induction). - keepDuplicates
- During structure induction, as attributes are merged, more and more examples become the same. To speed up the induction, such examples are replaced with a single example with a higher weight. If feature inducer can correctly handle weights (or if example duplicates do not change the induced features, as for instance, in minimal-complexity decomposition), this does not change the outcome. If you, however, want to keep the duplicated examples, set
keepDuplicatesto 1).
This example shows how to induce a structure using Kramer's method for feature construction, where data stores examples for dataset Monk 1.
Learning
HINT
Class can be used as a Learner.
Basically, HINT is just a wrapper around StructureInducer. The user can specify the type of decomposition (minimal complexity, minimal error) and the size of bound sets. Alternatively, HINT can use defaults. When minimal error decomposition is used, HINT fits parameter m for error estimation by performing a 5-fold cross validation to find the value that gives the maximal classification accuracy.
- type
- Sets the type of decomposition. It can be
"complexity","error", or"auto". This, basically, sets theStructureInducer'sfeatureInducertoFeatureByMinErrororFeatureByMinComplexity. Whentypeis"auto",HINTtries to determine whether the domain contains noise or not by observing whether there are examples with the same values of attributes but different classes. If there are no such examples, minimal-complexity is used; if there are, it switches to minimal-error. - boundSize
- Defines the size of bound sets. It can be either an integer (usually 2 or 3) or a tuple with minimal and maximal bound set size. If bound size is not given, bound sets are pairs of attributes.
Auxiliary Functions
RemoveAttributeRedundancies
- m
- The parameter for m-error estimate
- inducer
- Inducer for constructing attributes from the original. If omitted,
FeatureByMinErroris used ifmis given, andFeatureByMinComplexityif it is not.
addAttribute(attribute, table)
This function adds an attribute to the domain. A new domain (and a new ExampleTable is constructed; the original is left intact. Values for the new attribute are computed through its valueFrom function.
replaceWithInduced(attribute, table)
Similar to addAttribute, but also removes the bound attributes from the domain.
printHierarchy(attribute | classifier)
Prints out a hierarchy starting with a given attribute or a classifier. In the latter case, the classifier must support boundset method; classifiers returned by InduceStructure and HINT do have this method.
You've seen an example of how to use this function a while ago.
dotHierarchy(file, attribute | classifier
Writes a hierarchy starting with a given attribute or a classifier to a file for tree drawing program Dot from package GraphViz. If the classifier is given, it must support boundset method; classifiers returned by InduceStructure and HINT do have this method. File can be given as a string, in which case a new file is created, or as an opened file, in which case the tree is appended to the file.
References
S. Kramer. CN2-MCI (1994): A two-Step Method for Constructive Induction. Proc. ML-COLT '94 Workshop on Constructive Induction and Change of Representation. (http://www.hpl.hp.com/personal/Tom_Fawcett/CICR/kramer.ps.gz)
S. B. Thrun et al. (1991): A performance comparison of different learning algorithms. Carnegie Mellon University CMU-CS-91-197.
B. Zupan (1997). Machine learning based on function decomposition. PhD Thesis at University of Ljubljana.
B. Zupan, M. Bohanec, J. Demsar and I. Bratko (1998). Feature transformation by function decomposition. IEEE intell. syst. their appl., vol. 13, pp. 38-43
