Orange Forum: Discretizing a test set with intervals from the training set

Post by jens.auer » Thu Jan 19, 2006 11:48

Hi,

I want to discretize a data set with the entropy-based method as a preprocessing step for my other methods. Is there a built-in way to 1) discretize a training set and 2) use the intervals from this discretization to discretize the test set?

Post by Blaz » Wed Feb 22, 2006 7:33

Sure, and it's very simple. The principle is that of converting between data domains (a domain holds the information on the types of attributes). When a data set is discretized, the new attributes "remember" how they were computed from the original ones, so if you take this domain and convert some other data set to it, that data set is discretized with the same intervals. Here is some example code.

Code:
import orange
data = orange.ExampleTable("iris")

# split the data into a learning set (60% of the examples) and a test set
ind = orange.MakeRandomIndices2(data, p0=0.6)
learn = data.select(ind, 0)
test = data.select(ind, 1)

# discretize the learning set only, then use its new domain
# to discretize the test set with the same intervals
learnD = orange.Preprocessor_discretize(learn, method=orange.EntropyDiscretization())
testD = orange.ExampleTable(learnD.domain, test)

print "Test set, original:"
for i in range(3):
    print test[i]

print "Test set, discretized:"
for i in range(3):
    print testD[i]
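
For readers who want to see the principle without Orange, here is a minimal plain-Python sketch. It is not Orange's implementation: the function names `fit_cut_points` and `discretize` are made up for illustration, and it uses simple equal-width intervals instead of entropy-based ones. The point is only that the cut points are learned from the training values once and then reused unchanged on the test values.

```python
from bisect import bisect_right

def fit_cut_points(values, n_bins=3):
    """Learn equal-width cut points from the training values only.

    Hypothetical helper; stands in for the entropy-based method."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(n_bins)
    return [lo + width * k for k in range(1, n_bins)]

def discretize(values, cut_points):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cut_points, v) for v in values]

train = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
test = [3.0, 6.5, 11.0]

cuts = fit_cut_points(train)          # learned once, on the training set
train_bins = discretize(train, cuts)
test_bins = discretize(test, cuts)    # same intervals reused on the test set
```

Test values that fall outside the training range (3.0 and 11.0 here) simply land in the first or last interval, because the cut points are never recomputed from the test set.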


Note that Orange uses the same mechanism when classifiers make predictions. A classifier induced from the learning set stores the domain of that data. When it is given an example to classify, the example is first converted to the learning domain (to match what the classifier expects) and only then passed to the classifier itself.
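
As a rough illustration of that idea, and not Orange's actual code, the sketch below shows a toy classifier that keeps the cut points it was trained with and converts every raw example before classifying it. The class name `StoredDomainClassifier`, the cut points and the sample values are all assumptions made up for this example.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

class StoredDomainClassifier(object):
    """Toy majority-per-interval classifier that remembers its cut points."""

    def __init__(self, cut_points):
        self.cut_points = cut_points   # plays the role of the stored domain
        self.majority = {}             # interval index -> most frequent class

    def _convert(self, value):
        # convert a raw value to the interval index used during learning
        return bisect_right(self.cut_points, value)

    def fit(self, values, labels):
        counts = defaultdict(Counter)
        for v, y in zip(values, labels):
            counts[self._convert(v)][y] += 1
        self.majority = dict((b, c.most_common(1)[0][0])
                             for b, c in counts.items())
        return self

    def __call__(self, value):
        # the raw example is converted first, then classified
        return self.majority.get(self._convert(value))

values = [1.4, 1.5, 4.0, 4.5, 5.5, 6.0]
labels = ["setosa", "setosa", "versicolor", "versicolor",
          "virginica", "virginica"]
clf = StoredDomainClassifier([2.5, 4.9]).fit(values, labels)
prediction = clf(4.2)   # a raw, undiscretized value is accepted
```

The caller never discretizes anything by hand; the conversion travels with the classifier, which is the behaviour described above.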

