## Discretization test set with intervals from training set

Hi,

I want to discretize a dataset with the entropy-based method as a preprocessing step for my other methods. Is there a built-in way to 1) discretize a training set and 2) use the intervals from this discretization to discretize the test set?

Sure, and it's very simple. The principle is domain conversion, where the domain holds the information on the types of attributes. When a data set is discretized, the new attributes "remember" how this was done, so if you ask for some other data set to be converted to this domain, it is discretized in the same way. Here is the example code.

```python
import orange

data = orange.ExampleTable("iris")

# split the data into learning and test sets
ind = orange.MakeRandomIndices2(data, p0=6)
learn = data.select(ind, 0)
test = data.select(ind, 1)

# discretize the learning set, then use its new domain
# to discretize the test set
learnD = orange.Preprocessor_discretize(learn, method=orange.EntropyDiscretization())
testD = orange.ExampleTable(learnD.domain, test)

print "Test set, original:"
for i in range(3):
    print test[i]

print "Test set, discretized:"
for i in range(3):
    print testD[i]
```
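To make the "remembering" idea concrete without relying on Orange, here is a minimal plain-Python sketch. It is not Orange's implementation: `best_cut` finds only a single entropy-minimizing threshold (not the full recursive entropy-based method), and `DiscretizedAttribute` and the sample values are hypothetical names and data made up for illustration.

```python
from bisect import bisect_right
from math import log


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = float(len(labels))
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / total * log(c / total, 2) for c in counts.values())


def best_cut(values, labels):
    """Return the one threshold with the lowest weighted class entropy,
    or None when all values are equal."""
    pairs = sorted(zip(values, labels))
    best_score, best_threshold = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if best_score is None or score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_threshold


class DiscretizedAttribute(object):
    """Hypothetical stand-in for an attribute that remembers its cut points."""

    def __init__(self, cut_points):
        self.cut_points = sorted(cut_points)

    def __call__(self, value):
        # Index of the interval the value falls into.
        return bisect_right(self.cut_points, value)


# Learn the cut point on the training values only ...
train_values = [1.0, 1.2, 1.4, 4.5, 4.7, 5.1]
train_labels = ["a", "a", "a", "b", "b", "b"]
attribute = DiscretizedAttribute([best_cut(train_values, train_labels)])

# ... then reuse the same intervals for unseen test values.
test_values = [0.9, 3.0, 6.0]
test_bins = [attribute(v) for v in test_values]
print(test_bins)
```

The test values never influence the threshold; they are only looked up against the cut points stored at learning time, which is the role the learning domain plays in the Orange code above.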

Note that Orange uses the same mechanism when classifiers make predictions. When a classifier is induced from the learning set, it stores the domain of the data. When it is given an example to classify, the example is first converted to the learning domain (to match what the classifier needs) and then passed to the classifier itself.
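The convert-then-classify pattern can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Orange's code: `Domain`, `LookupClassifier`, and `induce` are hypothetical names, the classifier is a toy majority-per-interval lookup, and the threshold 2.45 is an illustrative value rather than one computed by Orange.

```python
from collections import Counter, defaultdict


class Domain(object):
    """Hypothetical stand-in for a learned domain: one converter per attribute."""

    def __init__(self, converters):
        self.converters = converters  # attribute name -> function(raw value)

    def convert(self, example):
        return dict((name, convert(example[name]))
                    for name, convert in self.converters.items())


class LookupClassifier(object):
    """Toy classifier that predicts the majority class seen for each interval."""

    def __init__(self, domain, attribute, table, default):
        self.domain = domain        # stored at learning time
        self.attribute = attribute
        self.table = table          # interval -> majority class
        self.default = default      # used for intervals never seen in training

    def __call__(self, raw_example):
        # Convert to the learning domain first, then classify.
        converted = self.domain.convert(raw_example)
        return self.table.get(converted[self.attribute], self.default)


def induce(domain, attribute, examples, labels):
    """Count class votes per interval and keep the domain in the classifier."""
    votes = defaultdict(Counter)
    for example, label in zip(examples, labels):
        votes[domain.convert(example)[attribute]][label] += 1
    table = dict((interval, counter.most_common(1)[0][0])
                 for interval, counter in votes.items())
    default = Counter(labels).most_common(1)[0][0]
    return LookupClassifier(domain, attribute, table, default)


# Illustrative threshold and data, assumed for this sketch.
domain = Domain({"petal length": lambda v: "<=2.45" if v <= 2.45 else ">2.45"})
train = [{"petal length": 1.4}, {"petal length": 1.3},
         {"petal length": 4.7}, {"petal length": 5.1}]
classes = ["setosa", "setosa", "versicolor", "versicolor"]
classifier = induce(domain, "petal length", train, classes)

# The caller passes a raw, continuous example; the classifier converts it itself.
prediction = classifier({"petal length": 1.5})
print(prediction)
```

Because the classifier carries its domain with it, the caller never has to discretize test examples by hand before asking for a prediction.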