source: orange/docs/reference/rst/Orange.data.continuization.rst @ 9966:74af350d804f

Revision 9966:74af350d804f, 5.4 KB checked in by janezd <janez.demsar@…>, 2 years ago

Line | |
---|---|

1 | .. py:currentmodule:: Orange.core |

2 | |

3 | ################################### |

4 | Continuization (``continuization``) |

5 | ################################### |

6 | |

7 | Continuization refers to the transformation of discrete (binary or |

8 | multinomial) variables into continuous ones. The class described below |

9 | operates on the entire domain; the documentation in |

10 | :file:`Orange.core.transformvalue.rst` explains how to treat each |

11 | variable separately. |

12 | |

13 | .. class:: DomainContinuizer |

14 | |

15 | Returns a new domain containing only continuous attributes given a |

16 | domain or data table. Some options are available only if the data is |

17 | provided. |

18 | |

19 | The attributes are treated according to their type: |

20 | |

21 | * continuous variables can be normalized or left unchanged; |

22 | |

23 | * discrete attributes with fewer than two possible values are removed; |

24 | |

25 | * binary variables are transformed into 0.0/1.0 or -1.0/1.0 |

26 | indicator variables; |

27 | |

28 | * multinomial variables are treated according to the flag |

29 | ``multinomial_treatment``. |

30 | |

31 | The typical use of the class is as follows:: |

32 | |

33 | continuizer = orange.DomainContinuizer() |

34 | continuizer.multinomial_treatment = continuizer.LowestIsBase |

35 | domain0 = continuizer(data) |

36 | data0 = data.translate(domain0) |

37 | |

38 | .. attribute:: zero_based |

39 | |

40 | Determines the value used as the "low" value of the variable. When a |

41 | binary variable is transformed into a continuous one, or when a multivalued |

42 | variable is transformed into multiple variables, the transformed |

43 | variable can either have values 0.0 and 1.0 (default, ``zero_based`` |

44 | is ``True``) or -1.0 and 1.0 (``zero_based`` is ``False``). The |

45 | following text assumes the default case. |
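The effect of ``zero_based`` on a binary variable can be sketched in plain Python. This is only an illustration of the description above, not Orange's implementation: ``binary_indicator`` is a hypothetical helper, and it assumes that the first value of the variable (index 0) is the "low" one.

```python
def binary_indicator(value_index, zero_based=True):
    """Map the index (0 or 1) of a binary variable's value to a float.

    Hypothetical helper for illustration only; it is not part of Orange.
    Assumption: index 0 is the "low" value, index 1 the "high" value.
    """
    if value_index not in (0, 1):
        raise ValueError("a binary variable has value indices 0 and 1")
    # Only the "low" value depends on zero_based; the "high" value is 1.0.
    low = 0.0 if zero_based else -1.0
    return 1.0 if value_index == 1 else low


low_default = binary_indicator(0)                    # zero_based=True
low_signed = binary_indicator(0, zero_based=False)   # zero_based=False
high_signed = binary_indicator(1, zero_based=False)  # high value, either setting
```

Only the low value changes between the two settings, so switching ``zero_based`` never alters which instances are marked as "high".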

46 | |

47 | .. attribute:: multinomial_treatment |

48 | |

49 | Decides the treatment of multinomial variables. Let N be the |

50 | number of the variable's values. |

51 | |

52 | DomainContinuizer.NValues |

53 | |

54 | The variable is replaced by N indicator variables, each |

55 | corresponding to one value of the original variable. In other |

56 | words, for each value of the original attribute, only the |

57 | corresponding new attribute will have a value of 1 and others |

58 | will be zero. |

59 | |

60 | Note that these variables are not independent, so they cannot be |

61 | used (directly) in, for instance, linear or logistic regression. |

62 | |

63 | For example, data set "bridges" has feature "RIVER" with |

64 | values "M", "A", "O" and "Y", in that order. Its value for |

65 | the 15th row is "M". Continuization replaces the variable |

66 | with variables "RIVER=M", "RIVER=A", "RIVER=O" and |

67 | "RIVER=Y". For the 15th row, the first has value 1 and |

68 | others are 0. |

69 | |

70 | DomainContinuizer.LowestIsBase |

71 | Similar to the above except that it creates only N-1 |

72 | variables. The missing indicator belongs to the lowest value: |

73 | when the original variable has the lowest value all indicators |

74 | are 0. |

75 | |

76 | If the variable descriptor has the ``base_value`` defined, the |

77 | specified value is used as base instead of the lowest one. |

78 | |

79 | Continuizing the variable "RIVER" gives similar results as |

80 | above except that it would omit "RIVER=M"; all three |

81 | variables would be zero for the 15th data instance. |

82 | |

83 | DomainContinuizer.FrequentIsBase |

84 | Like above, except that the most frequent value is used as the |

85 | base (this can again be overridden by setting the descriptor's |

86 | ``base_value``). If there are multiple most frequent values, the |

87 | one with the lowest index is used. The frequency of values is |

88 | extracted from data, so this option cannot be used if the constructor |

89 | is given only a domain. |

90 | |

91 | Variable "RIVER" would be continuized similarly to above |

92 | except that it omits "RIVER=A", which is the most frequent value. |

93 | |

94 | DomainContinuizer.Ignore |

95 | Multivalued variables are omitted. |

96 | |

97 | DomainContinuizer.ReportError |

98 | Raise an error if there are any multinomial variables in the data. |

99 | |

100 | DomainContinuizer.AsOrdinal |

101 | Multivalued variables are treated as ordinal and replaced by |

102 | continuous variables holding the value's index, e.g. 0, 1, 2, 3... |

103 | |

104 | DomainContinuizer.AsNormalizedOrdinal |

105 | As above, except that the resulting continuous value lies in the |

106 | range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a five-valued |

107 | variable. |
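The indicator-based and ordinal treatments listed above can be illustrated with a short plain-Python sketch. It does not use Orange and is not its implementation: ``indicator_columns``, ``most_frequent`` and ``ordinal`` are hypothetical helpers, and the ``column`` of values is made up so that "A" is the most frequent value, as stated for "RIVER" above.

```python
def indicator_columns(name, values, value, base=None):
    """Return (column name, 0.0/1.0) pairs encoding one value.

    values: the variable's values in order, e.g. ["M", "A", "O", "Y"]
    value:  the value of the instance being encoded
    base:   value whose indicator is omitted; None keeps all N indicators
    """
    return [("%s=%s" % (name, v), 1.0 if v == value else 0.0)
            for v in values if v != base]


def most_frequent(values, column):
    """Most frequent value in column; ties go to the lowest index."""
    counts = [column.count(v) for v in values]
    # list.index returns the first maximum, i.e. the lowest value index.
    return values[counts.index(max(counts))]


def ordinal(values, value, normalized=False):
    """Index of the value, optionally scaled to the range from 0 to 1."""
    index = values.index(value)
    return index / float(len(values) - 1) if normalized else float(index)


river = ["M", "A", "O", "Y"]
column = ["A", "M", "A", "O", "A", "M"]  # made-up data, "A" most frequent

# NValues: four indicators, only "RIVER=M" is 1.0 for the value "M".
n_values = indicator_columns("RIVER", river, "M")
# LowestIsBase: "RIVER=M" is omitted, so all three indicators are 0.0.
lowest_is_base = indicator_columns("RIVER", river, "M", base=river[0])
# FrequentIsBase: "RIVER=A" is omitted instead.
frequent_is_base = indicator_columns(
    "RIVER", river, "M", base=most_frequent(river, column))
# AsOrdinal and AsNormalizedOrdinal for the value "O" (index 2 of 4 values).
as_ordinal = ordinal(river, "O")
as_normalized = ordinal(list("abcde"), "c", normalized=True)
```

With a base value, an instance that has the base value is encoded as all zeros, which removes the linear dependence among the N indicators mentioned under ``NValues``.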

108 | |

109 | .. attribute:: normalize_continuous |

110 | |

111 | If ``False`` (default), continuous variables are left unchanged. If |

112 | ``True``, they are replaced with normalized values by subtracting |

113 | the average value and dividing by the standard deviation. Statistics are |

114 | computed from the data, so the constructor must be given data, not just |

115 | a domain. |
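The normalization described above amounts to the usual standardization of a column. The following plain-Python sketch assumes that "deviation" means the standard deviation and, as a further assumption, uses the population form (dividing by n); Orange may compute it differently. ``normalize`` is a hypothetical helper, not an Orange function.

```python
import math


def normalize(column):
    """Subtract the mean and divide by the standard deviation.

    Assumption: population standard deviation (divide by n). A constant
    column has zero deviation and would raise ZeroDivisionError here.
    """
    n = len(column)
    mean = sum(column) / float(n)
    dev = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / dev for x in column]


# Made-up column with mean 5.0 and standard deviation 2.0.
normalized = normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Because the mean and deviation come from the column itself, this step needs actual data, which matches the requirement that a domain alone is not enough.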

116 | |

117 | .. attribute:: class_treatment |

118 | |

119 | Determines the treatment of a discrete class attribute. Continuous |

120 | class attributes are always left unchanged. |

121 | |

122 | DomainContinuizer.Ignore |

123 | The class attribute is copied as is. Note that this is different |

124 | from the meaning of this value in ``multinomial_treatment``, where |

125 | it denotes omitting the attribute. |

126 | |

127 | DomainContinuizer.AsOrdinal, DomainContinuizer.AsNormalizedOrdinal |

128 | If the class is multinomial, it is treated as ordinal, in the |

129 | same manner as described above. Binary classes are |

130 | transformed to 0.0/1.0 attributes. |
