source: orange/docs/reference/rst/Orange.data.continuization.rst @ 10780:bf55f2fffd32

Revision 10780:bf55f2fffd32, 5.4 KB checked in by Lan Zagar <lan.zagar@…>, 2 years ago

Two-fived a doc code snippet.

.. py:currentmodule:: Orange.data.continuization

###################################
Continuization (``continuization``)
###################################

Continuization refers to the transformation of discrete (binary or
multinomial) variables into continuous ones. The class described below
operates on the entire domain; the documentation in
:file:`Orange.core.transformvalue.rst` explains how to treat each
variable separately.

.. class:: DomainContinuizer

    Given a domain or a data table, returns a new domain containing only
    continuous attributes. Some options are available only if a data
    table is provided.

    The attributes are treated according to their type:

    * continuous variables are normalized or left unchanged, depending
      on ``normalize_continuous``;

    * discrete variables with fewer than two possible values are removed;

    * binary variables are transformed into 0.0/1.0 or -1.0/1.0
      indicator variables, depending on ``zero_based``;

    * multinomial variables are treated according to the flag
      ``multinomial_treatment``.

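    The typical use of the class is as follows::

        import Orange

        data = Orange.data.Table("bridges")
        continuizer = Orange.data.continuization.DomainContinuizer()
        # Use N-1 indicators per multinomial variable, lowest value as base.
        continuizer.multinomial_treatment = continuizer.LowestIsBase
        domain0 = continuizer(data)  # the new, all-continuous domain
        data0 = data.translate(domain0)  # the data converted to that domain
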
    .. attribute:: zero_based

        Determines the value used as the "low" value of the variable. When
        binary variables are transformed into continuous ones, or when a
        multivalued variable is transformed into multiple indicator
        variables, the transformed variables have either values 0.0 and
        1.0 (default, ``zero_based`` is ``True``) or -1.0 and 1.0
        (``zero_based`` is ``False``). The following text assumes the
        default case.
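
        The rule above can be sketched in plain Python. The helper
        ``binary_indicator`` is hypothetical and not part of Orange; the
        value treated as "high" is passed in explicitly, because the text
        above does not say which of the two values maps to 1.0.

        .. code-block:: python

            def binary_indicator(value, high_value, zero_based=True):
                """Hypothetical helper: encode one discrete value.

                Returns 1.0 for high_value and the "low" value otherwise:
                0.0 when zero_based is True, -1.0 when it is False.
                """
                low = 0.0 if zero_based else -1.0
                return 1.0 if value == high_value else low


            # Toy binary variable with values "no" and "yes".
            print([binary_indicator(v, "yes") for v in ["no", "yes"]])
            # -> [0.0, 1.0]
            print([binary_indicator(v, "yes", zero_based=False) for v in ["no", "yes"]])
            # -> [-1.0, 1.0]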

    .. attribute:: multinomial_treatment

       Decides the treatment of multinomial variables. Let N be the
       number of the variable's values.

       DomainContinuizer.NValues
           The variable is replaced by N indicator variables, each
           corresponding to one value of the original variable. In other
           words, for each value of the original variable, only the
           corresponding new variable has a value of 1 and all others
           are 0.

           Note that these variables are not independent (for every
           instance they sum to 1), so they cannot be used (directly) in,
           for instance, linear or logistic regression.

           For example, the data set "bridges" has a feature "RIVER" with
           values "M", "A", "O" and "Y", in that order. Its value for
           the 15th row is "M". Continuization replaces the variable
           with variables "RIVER=M", "RIVER=A", "RIVER=O" and
           "RIVER=Y". For the 15th row, the first has value 1 and the
           others are 0.

       DomainContinuizer.LowestIsBase
           Similar to the above, except that it creates only N-1
           variables. The missing indicator belongs to the lowest value:
           when the original variable has the lowest value, all
           indicators are 0.

           If the variable descriptor has ``base_value`` defined, the
           specified value is used as the base instead of the lowest one.

           Continuizing the variable "RIVER" gives results similar to
           those above, except that "RIVER=M" is omitted; all three
           remaining variables are zero for the 15th data instance.

       DomainContinuizer.FrequentIsBase
           Like the above, except that the most frequent value is used as
           the base (this can again be overridden by setting the
           descriptor's ``base_value``). If there are multiple most
           frequent values, the one with the lowest index is used. The
           frequencies of values are computed from the data, so this
           option cannot be used if the constructor is given only a
           domain.

           Variable "RIVER" would be continuized similarly to the above,
           except that it omits "RIVER=A", which is the most frequent
           value.

       DomainContinuizer.Ignore
           Multivalued variables are omitted.

       DomainContinuizer.ReportError
           An error is raised if there are any multinomial variables in
           the data.

       DomainContinuizer.AsOrdinal
           Multivalued variables are treated as ordinal and replaced by a
           continuous variable whose value is the index of the original
           value, e.g. 0, 1, 2, 3...

       DomainContinuizer.AsNormalizedOrdinal
           As above, except that the resulting continuous value lies in
           the range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a
           five-valued variable.
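
       The treatments above can be summarized in a short, self-contained
       Python sketch that works on plain lists rather than Orange objects.
       The string constants, ``choose_base`` and ``continuize`` are
       hypothetical stand-ins, not Orange's API; indicators use the
       default 0.0/1.0 coding, and the toy column is made up rather than
       taken from the "bridges" data.

       .. code-block:: python

           from collections import Counter

           # Hypothetical string stand-ins for the DomainContinuizer constants.
           NVALUES, LOWEST_IS_BASE, FREQUENT_IS_BASE = (
               "NValues", "LowestIsBase", "FrequentIsBase")
           IGNORE, REPORT_ERROR = "Ignore", "ReportError"
           AS_ORDINAL, AS_NORMALIZED_ORDINAL = "AsOrdinal", "AsNormalizedOrdinal"


           def choose_base(values, column, treatment, base_value=None):
               """Return the index of the value that gets no indicator."""
               if base_value is not None:
                   # The descriptor's base_value overrides both strategies.
                   return values.index(base_value)
               if treatment == LOWEST_IS_BASE:
                   return 0
               # FrequentIsBase: highest count wins; ties go to the lowest index.
               counts = Counter(column)
               return max(range(len(values)), key=lambda i: (counts[values[i]], -i))


           def continuize(name, values, column, treatment, base_value=None):
               """Return a list of (new_name, new_column) pairs for one variable."""
               if treatment == IGNORE:
                   return []
               if treatment == REPORT_ERROR:
                   raise ValueError("multinomial variable %r" % name)
               if treatment in (AS_ORDINAL, AS_NORMALIZED_ORDINAL):
                   # Only the normalized variant divides by N - 1.
                   scale = len(values) - 1 if treatment == AS_NORMALIZED_ORDINAL else 1
                   return [(name, [float(values.index(v)) / scale for v in column])]
               kept = list(range(len(values)))
               if treatment != NVALUES:
                   kept.remove(choose_base(values, column, treatment, base_value))
               # One 0.0/1.0 indicator per kept value (default zero_based case).
               return [("%s=%s" % (name, values[i]),
                        [1.0 if v == values[i] else 0.0 for v in column])
                       for i in kept]


           # Toy data, not the real "bridges" column.
           river = ["M", "A", "O", "Y"]
           column = ["M", "A", "A", "O", "Y"]
           for treatment in (NVALUES, LOWEST_IS_BASE, FREQUENT_IS_BASE):
               names = [new_name for new_name, _ in
                        continuize("RIVER", river, column, treatment)]
               print("%s: %s" % (treatment, names))

       In this toy column "A" is the most frequent value, so
       ``FrequentIsBase`` drops "RIVER=A", while ``LowestIsBase`` drops
       "RIVER=M", matching the descriptions above.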

    .. attribute:: normalize_continuous

        If ``False`` (default), continuous variables are left unchanged.
        If ``True``, they are replaced with normalized values by
        subtracting the average value and dividing by the deviation.
        The statistics are computed from the data, so the constructor
        must be given data, not just a domain.
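
        A minimal sketch of this normalization on a plain list of numbers
        follows. The function ``normalize`` is hypothetical, and it
        assumes that "deviation" means the population standard deviation;
        the text above does not say which estimator Orange uses.

        .. code-block:: python

            import math


            def normalize(column):
                """Subtract the mean and divide by the standard deviation.

                Assumes the population standard deviation (divide by n).
                A constant column has zero deviation and raises
                ZeroDivisionError in this sketch.
                """
                n = len(column)
                mean = sum(column) / float(n)
                dev = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
                return [(x - mean) / dev for x in column]


            z = normalize([1.0, 2.0, 3.0, 4.0, 5.0])
            print(z[2])  # the mean itself maps to 0.0

        Both the mean and the deviation come from the column values, which
        is why this option needs data rather than only a domain.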

    .. attribute:: class_treatment

        Determines the treatment of a discrete class attribute. Continuous
        class attributes are always left unchanged.

        DomainContinuizer.Ignore
           The class attribute is copied as is. Note that this differs
           from the meaning of this value in ``multinomial_treatment``,
           where it denotes omitting the attribute.

        DomainContinuizer.AsOrdinal, DomainContinuizer.AsNormalizedOrdinal
           If the class is multinomial, it is treated as ordinal, in the
           same manner as described above. Binary classes are
           transformed to 0.0/1.0 attributes.
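
        The class treatment described above can be sketched in plain
        Python. The function ``continuize_class`` is hypothetical and works
        on plain lists; it assumes that a value's index in ``values``
        determines its ordinal code, so for a binary class the second
        value becomes 1.0.

        .. code-block:: python

            def continuize_class(values, column, treatment):
                """Hypothetical sketch of class_treatment for a discrete class."""
                if treatment == "Ignore":
                    # The class is copied as is rather than omitted.
                    return list(column)
                if treatment not in ("AsOrdinal", "AsNormalizedOrdinal"):
                    raise ValueError("not covered by this sketch: %r" % treatment)
                # Binary classes give 0.0/1.0 under both ordinal treatments,
                # because N - 1 == 1 for them.
                scale = len(values) - 1 if treatment == "AsNormalizedOrdinal" else 1
                return [float(values.index(v)) / scale for v in column]


            print(continuize_class(["no", "yes"], ["yes", "no"], "AsOrdinal"))
            # -> [1.0, 0.0]
            print(continuize_class(["lo", "mid", "hi"], ["hi", "lo", "mid"],
                                   "AsNormalizedOrdinal"))
            # -> [1.0, 0.0, 0.5]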