source: orange/docs/reference/rst/Orange.data.continuization.rst @ 9941:3580c7e699e8

Revision 9941:3580c7e699e8, 4.4 KB checked in by janezd <janez.demsar@…>, 2 years ago

Added documentation about continuization, finished TransformValue

.. py:currentmodule:: Orange.core

###################################
Continuization (``continuization``)
###################################

Continuization refers to the transformation of discrete (binary or
multinomial) variables into continuous ones. The class described below
operates on the entire domain; the documentation in
:file:`Orange.core.transformvalue.rst` explains how to treat each
variable separately.

.. class:: DomainContinuizer

    Returns a new domain containing only continuous attributes, given a
    domain or a data table. Some options are available only if data is
    provided.

    The attributes are treated according to their type:

    * continuous variables can be normalized or left unchanged;

    * discrete variables with fewer than two possible values are removed;

    * binary variables are transformed into 0.0/1.0 or -1.0/1.0
      indicator variables;

    * multinomial variables are treated according to the flag
      ``multinomial_treatment``.

    .. attribute:: zero_based

        Determines the value used as the "low" value of the variable. When
        a binary variable is transformed into a continuous one, or when a
        multivalued variable is transformed into multiple variables, the
        transformed variables can have either the values 0.0 and 1.0
        (default, ``zero_based`` is ``True``) or -1.0 and 1.0
        (``zero_based`` is ``False``). The following text assumes the
        default case.

    .. attribute:: multinomial_treatment

        Decides the treatment of multinomial variables. Let N be the
        number of the variable's values.

        DomainContinuizer.NValues
            The variable is replaced by N indicator variables, each
            corresponding to one value of the original variable. In other
            words, for each value of the original attribute, only the
            corresponding new attribute will have a value of 1 and the
            others will be 0.

            Note that these variables are not independent, so they cannot be
            used (directly) in, for instance, linear or logistic regression.

        DomainContinuizer.LowestIsBase
            Similar to the above, except that it creates only N-1
            variables. The missing indicator belongs to the lowest value:
            when the original variable has the lowest value, all indicators
            are 0.

            If the variable descriptor has ``base_value`` defined, the
            specified value is used as the base instead of the lowest one.

        DomainContinuizer.FrequentIsBase
            Like above, except that the most frequent value is used as the
            base (this can again be overridden by setting the descriptor's
            ``base_value``). If there are multiple most frequent values, the
            one with the lowest index is used. The frequency of values is
            extracted from data, so this option cannot be used if the
            constructor is given only a domain.

        DomainContinuizer.Ignore
            Multivalued variables are omitted.

        DomainContinuizer.ReportError
            Raise an error if there are any multinomial variables in the data.

        DomainContinuizer.AsOrdinal
            Multivalued variables are treated as ordinal and replaced by a
            continuous variable holding the value's index, e.g. 0, 1, 2, 3...

        DomainContinuizer.AsNormalizedOrdinal
            As above, except that the resulting continuous value will be in
            the range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a
            five-valued variable.

    .. attribute:: normalize_continuous

        If ``False`` (default), continuous variables are left unchanged. If
        ``True``, they are replaced with normalized values by subtracting
        the average value and dividing by the deviation. Statistics are
        computed from the data, so the constructor must be given data, not
        just a domain.

    .. attribute:: class_treatment

        Determines the treatment of a discrete class attribute. Continuous
        class attributes are always left unchanged.

        DomainContinuizer.Ignore
            The class attribute is copied as is. Note that this differs
            from the meaning of this value for ``multinomial_treatment``,
            where it denotes omitting the attribute.

        DomainContinuizer.AsOrdinal, DomainContinuizer.AsNormalizedOrdinal
            If the class is multinomial, it is treated as ordinal, in the
            same manner as described above. Binary classes are
            transformed to 0.0/1.0 attributes.
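
The encodings described above can be illustrated with a small plain-Python
sketch. It does not call or reproduce Orange's implementation: the helper
names (``indicators``, ``most_frequent``, ``as_ordinal``, ``normalize``) are
hypothetical, values are represented by their integer index, and "deviation"
is assumed to mean the population standard deviation.

```python
# Plain-Python illustration of the encodings described above.
# Not Orange code: every function name here is hypothetical.
import math


def indicators(index, n_values, base=None, zero_based=True):
    """Encode a value index as a list of indicator variables.

    base=None mimics NValues (N indicators). An integer base mimics
    LowestIsBase / FrequentIsBase: N-1 indicators, all "low" for the base.
    """
    low = 0.0 if zero_based else -1.0
    return [1.0 if i == index else low
            for i in range(n_values) if i != base]


def most_frequent(indices):
    """Index of the most frequent value; ties go to the lowest index."""
    counts = {}
    for i in indices:
        counts[i] = counts.get(i, 0) + 1
    return min(counts, key=lambda i: (-counts[i], i))


def as_ordinal(index, n_values, normalized=False):
    """Mimic AsOrdinal, or AsNormalizedOrdinal when normalized=True.

    Assumes n_values >= 2 (variables with fewer values are removed).
    """
    if normalized:
        return index / (n_values - 1)
    return float(index)


def normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    dev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / dev for v in values]


# A three-valued variable whose stored value has index 1.
print(indicators(1, 3))                    # NValues-style encoding
print(indicators(1, 3, base=0))            # LowestIsBase-style encoding
print(indicators(0, 3, base=0))            # the base value itself
print(indicators(1, 3, zero_based=False))  # -1.0/1.0 variant
print(as_ordinal(2, 5, normalized=True))   # third of five values
```

For a FrequentIsBase-style encoding in this sketch, the base would be chosen
as ``most_frequent(column)`` before calling ``indicators``. This needs the
data column rather than just the variable description, which matches the
requirement that the constructor be given data.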