.. py:currentmodule:: Orange.core


###################################
Continuization (``continuization``)
###################################


Continuization refers to the transformation of discrete (binary or
multinomial) variables into continuous ones. The class described below
operates on the entire domain; the documentation in
:file:`Orange.core.transformvalue.rst` explains how to treat each
variable separately.


.. class:: DomainContinuizer


Given a domain or a data table, returns a new domain that contains only
continuous attributes. Some options are available only if data is
provided.


The attributes are treated according to their type:

* continuous variables can be normalized or left unchanged;

* discrete attributes with fewer than two possible values are removed;

* binary variables are transformed into 0.0/1.0 or -1.0/1.0
  indicator variables;

* multinomial variables are treated according to the flag
  ``multinomial_treatment``.


The typical use of the class is as follows::

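    # Create a continuizer that uses the lowest value as the base.
    continuizer = orange.DomainContinuizer()
    continuizer.multinomial_treatment = continuizer.LowestIsBase
    # Compute the continuized domain and convert the data to it.
    domain0 = continuizer(data)
    data0 = data.translate(domain0)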


.. attribute:: zero_based


Determines the value used as the "low" value of the variable. When a
binary variable is transformed into a continuous one, or when a
multivalued variable is transformed into multiple variables, the
transformed variables can have either the values 0.0 and 1.0 (default,
``zero_based`` is ``True``) or -1.0 and 1.0 (``zero_based`` is
``False``). The following text assumes the default case.
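
The effect of ``zero_based`` on a binary variable can be illustrated
with a small pure-Python sketch. The function ``binary_indicator`` is
hypothetical and not part of Orange; it also assumes that the first
value (index 0) is the one mapped to the "low" value::

    def binary_indicator(value_index, zero_based=True):
        """Map the index (0 or 1) of a binary value to a float."""
        # The "low" value is 0.0 by default and -1.0 otherwise.
        low = 0.0 if zero_based else -1.0
        # Assumption: index 1 maps to the "high" value 1.0.
        return 1.0 if value_index == 1 else low

    default_low = binary_indicator(0)                      # 0.0
    symmetric_low = binary_indicator(0, zero_based=False)  # -1.0
    high = binary_indicator(1, zero_based=False)           # 1.0

Only the "low" value changes with the flag; the "high" value is 1.0 in
both cases.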


.. attribute:: multinomial_treatment


Decides the treatment of multinomial variables. Let N be the
number of the variable's values.


DomainContinuizer.NValues
    The variable is replaced by N indicator variables, each
    corresponding to one value of the original variable. In other
    words, for each value of the original attribute, only the
    corresponding new attribute has the value 1 and all others
    are 0.

    Note that these variables are not independent, so they cannot be
    used (directly) in, for instance, linear or logistic regression.

    For example, the data set "bridges" has the feature "RIVER" with
    values "M", "A", "O" and "Y", in that order. Its value for
    the 15th row is "M". Continuization replaces the variable
    with variables "RIVER=M", "RIVER=A", "RIVER=O" and
    "RIVER=Y". For the 15th row, the first has value 1 and
    the others are 0.


DomainContinuizer.LowestIsBase
    Similar to the above, except that it creates only N-1
    variables. The missing indicator belongs to the lowest value:
    when the original variable has the lowest value, all indicators
    are 0.

    If the variable descriptor has ``base_value`` defined, the
    specified value is used as the base instead of the lowest one.

    Continuizing the variable "RIVER" gives results similar to those
    above, except that "RIVER=M" is omitted; all three
    variables are zero for the 15th data instance.


DomainContinuizer.FrequentIsBase
    Like above, except that the most frequent value is used as the
    base (this can again be overridden by setting the descriptor's
    ``base_value``). If there are multiple most frequent values, the
    one with the lowest index is used. The frequency of values is
    extracted from data, so this option cannot be used if the
    constructor is given only a domain.

    The variable "RIVER" would be continuized similarly to above,
    except that "RIVER=A", the most frequent value, is omitted.


DomainContinuizer.Ignore
    Multivalued variables are omitted.


DomainContinuizer.ReportError
    Raises an error if there are any multinomial variables in the data.


DomainContinuizer.AsOrdinal
    Multivalued variables are treated as ordinal and replaced by a
    continuous variable holding the index of the value, e.g. 0, 1, 2, 3...


DomainContinuizer.AsNormalizedOrdinal
    As above, except that the resulting continuous values are scaled
    to the range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a
    five-valued variable.
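
The multinomial treatments described above can be mimicked in plain
Python to make the resulting values concrete. This is only a sketch, not
Orange's implementation: the helpers ``indicators``, ``most_frequent``
and ``as_ordinal`` are hypothetical, and the ``column`` of values is
made up for illustration rather than taken from the "bridges" data::

    from collections import Counter


    def indicators(value, values, base=None, zero_based=True):
        """Indicator values for ``value``; the ``base`` value gets none."""
        low = 0.0 if zero_based else -1.0
        return [1.0 if v == value else low for v in values if v != base]


    def most_frequent(column, values):
        """Most frequent value; ties go to the lowest index."""
        counts = Counter(column)
        return max(values, key=lambda v: (counts[v], -values.index(v)))


    def as_ordinal(value, values, normalized=False):
        """Index of ``value``, optionally scaled to the range 0 to 1."""
        index = float(values.index(value))
        return index / (len(values) - 1) if normalized else index


    values = ["M", "A", "O", "Y"]            # value order of "RIVER"
    column = ["M", "A", "A", "O", "A", "Y"]  # made-up data

    # NValues: one indicator per value.
    n_values = indicators("M", values)  # [1.0, 0.0, 0.0, 0.0]
    # LowestIsBase: the first value has no indicator.
    lowest = indicators("M", values, base=values[0])  # [0.0, 0.0, 0.0]
    # FrequentIsBase: the most frequent value ("A" here) has no indicator.
    base = most_frequent(column, values)
    frequent = indicators("M", values, base=base)  # [1.0, 0.0, 0.0]
    # AsOrdinal and AsNormalizedOrdinal.
    ordinal = as_ordinal("O", values)  # 2.0
    normalized = as_ordinal("O", values, normalized=True)

In this sketch the base value is passed in explicitly, which also covers
the case where the descriptor's ``base_value`` overrides the default
choice.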


.. attribute:: normalize_continuous


If ``False`` (default), continuous variables are left unchanged. If
``True``, they are replaced with normalized values by subtracting
the average value and dividing by the deviation. Statistics are
computed from the data, so the constructor must be given data, not
just a domain.
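
The normalization described above can be sketched in plain Python as
follows. The function ``normalize`` is hypothetical and not part of
Orange, and the text does not say which estimate of the deviation is
used, so the population standard deviation here is an assumption::

    import math


    def normalize(column):
        """Subtract the mean and divide by the standard deviation."""
        n = float(len(column))
        mean = sum(column) / n
        # Assumption: population standard deviation (divide by n).
        deviation = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        return [(x - mean) / deviation for x in column]


    normalized_column = normalize([2.0, 4.0, 6.0])

The result has zero mean. A column in which all values are equal has a
deviation of zero, so this sketch would fail with a division by zero for
such a column.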


.. attribute:: class_treatment


Determines the treatment of a discrete class attribute. Continuous
class attributes are always left unchanged.


DomainContinuizer.Ignore
    The class attribute is copied as is. Note that this differs
    from the meaning of this value in ``multinomial_treatment``, where
    it denotes omitting the attribute.


DomainContinuizer.AsOrdinal, DomainContinuizer.AsNormalizedOrdinal
    If the class is multinomial, it is treated as ordinal, in the
    same manner as described above. Binary classes are
    transformed to 0.0/1.0 attributes.

