## Utilities (utils)

### Value transformers

Value transformers perform simple transformations of values. Discretization, for instance, creates a transformer that converts continuous values into discrete ones, while continuizers do the opposite. Classification trees use transformers for binarization, in which values of discrete attributes are converted into binary ones.

These objects are most often constructed by other classes and only seldom manually. See information on Data discretization (discretization) and Continuization (continuization).

class TransformValue

The abstract root of the hierarchy of transformers, which provides the call operator and chaining of transformers.

subtransformer

The transformation that takes place prior to this. This way, transformations can be chained.
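The chaining behaviour can be pictured with a small pure-Python sketch. It does not use Orange: the classes `Transformer`, `Scale` and `Shift` are hypothetical stand-ins, and the only thing taken from the description above is that the subtransformer is applied before the transformer that holds it.

```python
class Transformer(object):
    """Hypothetical stand-in for TransformValue (not the Orange class)."""

    def __init__(self, subtransformer=None):
        self.subtransformer = subtransformer

    def transform(self, value):
        # Subclasses override this with their own transformation.
        return value

    def __call__(self, value):
        # The chained transformation runs first, then this one.
        if self.subtransformer is not None:
            value = self.subtransformer(value)
        return self.transform(value)


class Scale(Transformer):
    """Illustrative transformer: multiplies by a factor."""

    def __init__(self, factor, subtransformer=None):
        Transformer.__init__(self, subtransformer)
        self.factor = factor

    def transform(self, value):
        return value * self.factor


class Shift(Transformer):
    """Illustrative transformer: adds an offset."""

    def __init__(self, offset, subtransformer=None):
        Transformer.__init__(self, subtransformer)
        self.offset = offset

    def transform(self, value):
        return value + self.offset


# Scale is the subtransformer, so it runs before Shift: (3 * 2) + 1
chained = Shift(1, subtransformer=Scale(2))
result = chained(3)
```

Swapping the two (`Scale(2, subtransformer=Shift(1))`) would compute `(3 + 1) * 2` instead, which is why the order of the chain matters.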

class Ordinal2Continuous

Converts ordinal values to continuous. For example, the values small, medium, large, extra large (if given in that order) would be, by default, converted to 0.0, 1.0, 2.0 and 3.0. It is possible to add a factor by which the values are multiplied. If the factor for the above case were 0.3333, the values would be converted to 0, 0.3333, 0.6666 and 0.9999.

factor

The factor by which the values are multiplied.

import Orange

lenses = Orange.data.Table("lenses")
age = lenses.domain["age"]

age_c = Orange.feature.Continuous("age_c")
age_c.getValueFrom = Orange.classification.ClassifierFromVar(whichVar = age)
age_c.getValueFrom.transformer = Orange.data.utils.Ordinal2Continuous()

age_cn = Orange.feature.Continuous("age_cn")
age_cn.getValueFrom = Orange.classification.ClassifierFromVar(whichVar = age)
age_cn.getValueFrom.transformer = Orange.data.utils.Ordinal2Continuous()
age_cn.getValueFrom.transformer.factor = 0.5

newDomain = Orange.data.Domain([age, age_c, age_cn], lenses.domain.classVar)
newData = Orange.data.Table(newDomain, lenses)

The values of attribute age (young, pre-presbyopic and presbyopic) are transformed to 0.0, 1.0 and 2.0 in age_c and to 0, 0.5 and 1 in age_cn.
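The arithmetic behind these numbers can be checked without Orange. The sketch below assumes only what is described above, namely that each value is replaced by its integer index multiplied by the factor; the function name `ordinal_to_continuous` is hypothetical.

```python
def ordinal_to_continuous(index, factor=1.0):
    """Map the integer index of an ordinal value to index * factor."""
    return index * factor


age_values = ["young", "pre-presbyopic", "presbyopic"]

# Default factor of 1.0, as for age_c.
age_c = [ordinal_to_continuous(i) for i in range(len(age_values))]

# Factor of 0.5, as for age_cn.
age_cn = [ordinal_to_continuous(i, factor=0.5) for i in range(len(age_values))]
```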

class Discrete2Continuous

Converts a discrete value to a continuous so that some chosen value is converted to 1.0 and all others to 0.0 or -1.0, depending on the settings.

value

The value that is converted to 1.0; others are converted to 0.0 or -1.0, depending on zero_based. Value needs to be specified by an integer index.

zero_based

Decides whether the other values will be transformed to 0.0 (True, default) or -1.0 (False). When False, undefined values are transformed to 0.0; otherwise, undefined values yield an error.

invert

If True (default is False), the transformations are reversed: the selected value becomes 0.0 (or -1.0) and all others become 1.0.
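The interplay of the three attributes can be summarized in a short pure-Python sketch. It is not Orange code: the function `discrete_to_continuous` is hypothetical, `None` stands in for an undefined value, and the handling of undefined values under `invert` (left at 0.0) is an assumption, since the description above does not cover that case.

```python
def discrete_to_continuous(index, value, zero_based=True, invert=False):
    """Sketch of the documented rules; index and value are integer indices."""
    if index is None:
        # Undefined input: an error when zero_based, otherwise 0.0.
        if zero_based:
            raise ValueError("undefined value with zero_based=True")
        return 0.0  # assumption: invert does not change this
    low = 0.0 if zero_based else -1.0
    selected = (index == value)
    if invert:
        selected = not selected
    return 1.0 if selected else low


# Value with index 1 is the selected one.
plain = [discrete_to_continuous(i, value=1) for i in range(3)]
signed = [discrete_to_continuous(i, value=1, zero_based=False) for i in range(3)]
inverted = [discrete_to_continuous(i, value=1, invert=True) for i in range(3)]
```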

The following script loads the Monks 1 data set and constructs a new attribute e1 that will indicate whether e is 1 or not.

import Orange

monks = Orange.data.Table("monks-1")

e1 = Orange.feature.Continuous("e=1")
e1.getValueFrom = Orange.classification.ClassifierFromVar(whichVar=monks.domain["e"])
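e1.getValueFrom.transformer = Orange.data.utils.Discrete2Continuous()
# Assumed completion: select the value "1" of e by its integer index.
e1.getValueFrom.transformer.value = int(Orange.data.Value(monks.domain["e"], "1"))

class NormalizeContinuous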

Normalizes continuous values by subtracting the average and dividing the difference by half of the span.

average

The value that is subtracted from the original.

span

The divisor; the difference is divided by half of this value.
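Taking the description above literally, the computation is `(x - average) / (span / 2)`. The following pure-Python sketch illustrates that formula only; the function name `normalize_continuous` is hypothetical, and the sketch makes no claim about how Orange implements the class internally.

```python
def normalize_continuous(x, average, span):
    """Subtract the average, then divide by half of the span."""
    return (x - average) / (span / 2.0)


# With average 5.0 and span 4.0, the inputs 3.0, 5.0 and 7.0
# lie one half-span below, at, and one half-span above the average.
normalized = [normalize_continuous(x, average=5.0, span=4.0)
              for x in (3.0, 5.0, 7.0)]
```

If `average` is the midpoint of the observed range and `span` is the full width of that range, the results fall between -1.0 and 1.0.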

The following script “normalizes” all attributes in the Iris data set by subtracting the average value and dividing by half of the deviation.

import Orange

iris = Orange.data.Table("iris")
# Assumed setup (missing from this excerpt): per-attribute statistics
# and the list that collects the new attributes.
domstat = Orange.statistics.basic.Domain(iris)
newattrs = []

for attr in iris.domain.features:
    attr_c = Orange.feature.Continuous(attr.name + "_n")
    attr_c.getValueFrom = Orange.classification.ClassifierFromVar(whichVar=attr)
    transformer = Orange.data.utils.NormalizeContinuous()
    attr_c.getValueFrom.transformer = transformer
    transformer.average = domstat[attr].avg
    transformer.span = domstat[attr].dev
    newattrs.append(attr_c)

newDomain = Orange.data.Domain(newattrs, iris.domain.classVar)
newData = Orange.data.Table(newDomain, iris)
for ex in newData[:5]:
    print ex

class MapIntValue

A discrete-to-discrete transformer that changes values according to the given mapping. MapIntValue is used for binarization in decision trees.

mapping

A mapping that determines the new value: v = mapping[v]. Undefined values remain undefined. The elements of the mapping are integer indices of values.

The following script maps the values of age in the lenses data set: ‘young’ stays ‘young’, while ‘pre-presbyopic’ and ‘presbyopic’ both become ‘old’.

import Orange

lenses = Orange.data.Table("lenses")
age = lenses.domain["age"]

age_b = Orange.feature.Discrete("age_b", values = ['young', 'old'])
age_b.getValueFrom = Orange.classification.ClassifierFromVar(whichVar = age)
age_b.getValueFrom.transformer = Orange.data.utils.MapIntValue()
age_b.getValueFrom.transformer.mapping = [0, 1, 1]

newDomain = Orange.data.Domain([age_b, age], lenses.domain.classVar)
newData = Orange.data.Table(newDomain, lenses)

The mapping specifies that the 0th value of age maps to the 0th value of age_b, and that the 1st and 2nd values of age both map to the 1st value of age_b.
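The index lookup can be traced by hand with a pure-Python sketch that does not depend on Orange. The function `map_int_value` is hypothetical, and `None` is used here to represent an undefined value, which stays undefined as described for the mapping attribute.

```python
def map_int_value(index, mapping):
    """Return mapping[index]; undefined (None) stays undefined."""
    if index is None:
        return None
    return mapping[index]


age_values = ["young", "pre-presbyopic", "presbyopic"]
age_b_values = ["young", "old"]
mapping = [0, 1, 1]

# Translate each original value to the name of its mapped value.
mapped = [age_b_values[map_int_value(i, mapping)]
          for i in range(len(age_values))]
```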