source: orange/orange/doc/reference/TransformValue.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 24.7 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

Made XPM version of the icon 32x32.

1<html> <HEAD>
2<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
3<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
4</HEAD> <body>
5
6<h1>Value Transformers</h1>
7
8<P>Class <CODE><INDEX name="classes/TransformValue">TransformValue</CODE> is a base class for a hierarchy of classes used throughout Orange for simple transformation of values. Discretization, for instance, creates
9a transformer that converts continuous values into discrete,
10while continuizers do the opposite. Classification trees use transformers for binarization where values of discrete attributes are converted into binary.</P>
11
12<P>Transformers are most commonly used in conjunction with <a href="classifierFromVar.htm">Classifiers from Attribute</a>. It is also possible to subtype this class in Python.</P>
13
14<P>Although these classes can occasionally come in very handy, you will mostly encounter them when they are created by other methods, such as discretization.</P>
15
16
17<H2>Transforming Individual Attributes</H2>
18
19<H3>TransformValue</H3>
20
21<P><CODE>TransformValue</CODE> is the abstract root of the hierarchy, itself derived from <CODE>Orange</CODE>. When called with a <A href="Value.htm"><CODE>Value</CODE></A> as an argument, it returns the transformed value.</P>
22
23<P>See <a href="classifierFromVar.htm">Classifiers from Attribute</a> for an example of how to derive new Python classes from <CODE>TransformValue</CODE>.</P>
24
25<P class=section>Attributes</P>
26<DL class=attributes>
27<DT>subTransformer</DT>
28<DD>Specifies the transformation that takes place prior to this. This way, transformations can be chained, although this will seldom be needed.</DD>
29</DL>
30
31
32<H3>Ordinal2Continuous</H3>
33<index name="converting discrete to continuous">
34
35<P><CODE><INDEX name="classes/Ordinal2Continuous">Ordinal2Continuous</CODE> converts ordinal values to equidistant continuous.
36A four-valued attribute with, say, values 'small', 'medium', 'large' and 'extra large' would be converted to 0.0, 1.0, 2.0 and 3.0. You can also specify a factor by which the values are multiplied. If the factor for the above attribute is set to 1/3 (or, in general, to 1 divided by the number of values minus one), the new continuous attribute will have values from 0.0 to 1.0.</P>
37
38<P class=section>Attributes</P>
39<DL class=attributes>
40<DT>factor</DT>
41<DD>The factor by which the values are multiplied.</DD>
42</DL>
43
44<p class="header">part of <a href="transformvalue-o2c.py">transformvalues-o2c.py</a>
45(uses <a href="lenses.tab">lenses.tab</a>)</p>
46<xmp class="code">import orange
47
48data = orange.ExampleTable("lenses")
49
50age = data.domain["age"]
51
52age_c = orange.FloatVariable("age_c")
53age_c.getValueFrom = orange.ClassifierFromVar(whichVar = age)
54age_c.getValueFrom.transformer = orange.Ordinal2Continuous()
55
56newDomain = orange.Domain([age, age_c], data.domain.classVar)
57newData = orange.ExampleTable(newDomain, data)
58</XMP>
59
60<P>The values of attribute 'age' ('young', 'pre-presbyopic' and 'presbyopic') are in the new domain transformed to 0.0, 1.0 and 2.0. If we additionally set <CODE>age_c.getValueFrom.transformer.factor</CODE> to 0.5, the new values will be 0.0, 0.5 and 1.0.</P>
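<P>The arithmetic behind these numbers can be sketched in plain Python. This is not Orange code; the function name <CODE>ordinal_to_continuous</CODE> is hypothetical and only mirrors the description above: the index of the value is multiplied by the factor.</P>

```python
# Plain-Python sketch of the arithmetic described above (not the Orange API).
def ordinal_to_continuous(index, factor=1.0):
    """Map the integer index of an ordinal value to a float, scaled by factor."""
    return index * factor

values = ["young", "pre-presbyopic", "presbyopic"]

# Default factor: indices become 0.0, 1.0, 2.0.
plain = [ordinal_to_continuous(i) for i in range(len(values))]

# Factor 0.5, which for three values equals 1 / (number of values - 1),
# so the results span the range from 0.0 to 1.0.
scaled = [ordinal_to_continuous(i, 0.5) for i in range(len(values))]
```

<P>With three values, <CODE>plain</CODE> holds 0.0, 1.0 and 2.0, while <CODE>scaled</CODE> holds 0.0, 0.5 and 1.0.</P>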
61
62
63
64<H3>Discrete2Continuous</H3>
65
66<P><code><INDEX name="classes/Discrete2Continuous">Discrete2Continuous</code> converts a discrete value to a continuous so that some designated value is converted to 1.0 and all others to 0.0 or -1.0, depending on the settings.</P>
67
68<P class=section>Attributes</P>
69<DL class=attributes>
70<DT>value</DT>
71<DD>The value that is converted to 1.0; others are converted to 0.0 or -1.0. The value needs to be specified by an integer index, not a <CODE>Value</CODE>.</DD>
72
73<DT>zeroBased</DT>
74<DD>Decides whether the other values will be transformed to 0.0 (<CODE>True</CODE>, default) or -1.0 (<CODE>False</CODE>). When <CODE>False</CODE>, undefined values are transformed to 0.0. When <CODE>True</CODE>, undefined values yield an error.</DD>
75
76<DT>invert</DT>
77<DD>If <CODE>True</CODE> (default is <CODE>False</CODE>), the transformations are reversed - the selected <CODE>value</CODE> becomes 0.0 (or -1.0) and others 1.0.</DD>
78</DL>
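<P>How the three attributes interact can be sketched in plain Python. The function below is hypothetical and is not part of Orange; it only restates the rules listed above, with <CODE>None</CODE> standing in for an undefined value.</P>

```python
# Plain-Python sketch of the rules listed above (not the Orange API).
def discrete_to_continuous(index, value, zero_based=True, invert=False):
    """index is the integer index of the discrete value, or None if undefined."""
    low = 0.0 if zero_based else -1.0
    if index is None:
        if zero_based:
            # With a 0.0/1.0 coding there is no neutral number left for "unknown".
            raise ValueError("undefined value cannot be transformed")
        return 0.0  # midway between -1.0 and 1.0
    hit = (index == value)
    if invert:
        hit = not hit
    return 1.0 if hit else low

# Indices 0..3 of a four-valued attribute, testing against the value with index 0.
default = [discrete_to_continuous(i, 0) for i in range(4)]
not_zero_based = [discrete_to_continuous(i, 0, zero_based=False) for i in range(4)]
inverted = [discrete_to_continuous(i, 0, zero_based=False, invert=True) for i in range(4)]
```

<P>Here <CODE>default</CODE> is 1.0 followed by three 0.0's, <CODE>not_zero_based</CODE> is 1.0 followed by three -1.0's, and <CODE>inverted</CODE> is -1.0 followed by three 1.0's.</P>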
79
80<P>The following example loads the Monk 1 dataset and prepares various transformations for attribute "e".</P>
81
82<p class="header">part of <a href="transformvalue-d2c.py">transformvalues-d2c.py</a>
83(uses <a href="monk1.tab">monk1.tab</a>)</p>
84<xmp class="code">import orange
85
86data = orange.ExampleTable("monk1")
87
88e = data.domain["e"]
89
90e1 = orange.FloatVariable("e=1")
91e1.getValueFrom = orange.ClassifierFromVar(whichVar = e)
92e1.getValueFrom.transformer = orange.Discrete2Continuous()
93e1.getValueFrom.transformer.value = int(orange.Value(e, "1"))
94</XMP>
95
96<P>We first construct a new continuous attribute <CODE>e1</CODE>, and set its <CODE>getValueFrom</CODE> to a newly constructed classifier that will extract the value of <CODE>e</CODE> from any example it is given. Then we tell the classifier to transform the extracted value using a <CODE>Discrete2Continuous</CODE> transformation. The transformation's <CODE>value</CODE> is set to the index of <CODE>e</CODE>'s value "1"; one way to do it is to construct a <CODE>Value</CODE> of attribute <CODE>e</CODE> and cast it to an integer, which gives the index of that value.</P>
97
98<P>To demonstrate the use of various flags, we constructed two more attributes in a similar manner. Both are based on <CODE>e</CODE> and both check whether <CODE>e</CODE>'s value is "1", except that the transformation of the new attribute <CODE>e10</CODE> will not be zero based, and the transformation of <CODE>e01</CODE> will, in addition, be inverted:</P>
99
100<p class="header">part of <a href="transformvalue-d2c.py">transformvalues-d2c.py</a></p>
101<xmp class="code">(...)
102e10.getValueFrom.transformer.zeroBased = False
103(...)
104e01.getValueFrom.transformer.zeroBased = False
105e01.getValueFrom.transformer.invert = True
106</XMP>
107
108<P>Finally, we shall construct a new domain that will only have the original <CODE>e</CODE> and its transformations, and the class. We shall convert the entire table to that domain and print out the first ten examples.</P>
109
110<p class="header">part of <a href="transformvalue-d2c.py">transformvalues-d2c.py</a></p>
111<xmp class="code">newDomain = orange.Domain([e, e1, e10, e01], data.domain.classVar)
112newData = orange.ExampleTable(newDomain, data)
113for ex in newData[:10]:
114    print ex
115</xmp>
116
117<P>Here's the script's output.</P>
118
119<XMP class=code>['1', 1.000, 1.000, -1.000, '1']
120['1', 1.000, 1.000, -1.000, '1']
121['2', 0.000, -1.000, 1.000, '1']
122['2', 0.000, -1.000, 1.000, '1']
123['3', 0.000, -1.000, 1.000, '1']
124['3', 0.000, -1.000, 1.000, '1']
125['4', 0.000, -1.000, 1.000, '1']
126['4', 0.000, -1.000, 1.000, '1']
127['1', 1.000, 1.000, -1.000, '1']
128['1', 1.000, 1.000, -1.000, '1']
129</XMP>
130
131<P>The difference between the second and the third attribute is that where the second has zeros, the third has -1's. The last attribute (before the class) is a reversed version of the third.</P>
132
133<P>You can, of course, "divide" a single attribute into a number of continuous attributes. The original attribute <CODE>e</CODE> has four possible values; let's create four new attributes, each corresponding to one of <CODE>e</CODE>'s values.</P>
134
135<p class="header">part of <a href="transformvalue-d2c.py">transformvalues-d2c.py</a>
136(uses <a href="monk1.tab">monk1.tab</a>)</p>
137<XMP class="code">attributes = [e]
138for v in e.values:
139    newattr = orange.FloatVariable("e=%s" % v)
140    newattr.getValueFrom = orange.ClassifierFromVar(whichVar = e)
141    newattr.getValueFrom.transformer = orange.Discrete2Continuous()
142    newattr.getValueFrom.transformer.value = int(orange.Value(e, v))
143    attributes.append(newattr)
144</XMP>
145
146<P>The output of this script is</P>
147<XMP class="code">['1', 1.000, 0.000, 0.000, 0.000, '1']
148['1', 1.000, 0.000, 0.000, 0.000, '1']
149['2', 0.000, 1.000, 0.000, 0.000, '1']
150['2', 0.000, 1.000, 0.000, 0.000, '1']
151['3', 0.000, 0.000, 1.000, 0.000, '1']
152['3', 0.000, 0.000, 1.000, 0.000, '1']
153['4', 0.000, 0.000, 0.000, 1.000, '1']
154['4', 0.000, 0.000, 0.000, 1.000, '1']
155['1', 1.000, 0.000, 0.000, 0.000, '1']
156['1', 1.000, 0.000, 0.000, 0.000, '1']
157</XMP>
158
159
160<H3>NormalizeContinuous</H3>
161<index name="normalization of values">
162
163<P>Transformer <CODE><INDEX name="classes/NormalizeContinuous">NormalizeContinuous</CODE> takes a continuous value and keeps it continuous, but subtracts the <CODE>average</CODE> and divides the difference by <CODE>span</CODE>: <CODE>v' = (v-average) / span</CODE>.</P>
164
165<P class=section>Attributes</P>
166<DL class=attributes>
167<DT>average</DT>
168<DD>The value that is subtracted from the original.</DD>
169
170<DT>span</DT>
171<DD>The divisor</DD>
172</DL>
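<P>As a quick check of the formula, here is a plain-Python sketch. It is not Orange code; the function name and the sample numbers are made up for illustration, and the choice of <CODE>span</CODE> is left entirely to the caller, as it is with the transformer itself.</P>

```python
# Plain-Python sketch of v' = (v - average) / span (not the Orange API).
def normalize_continuous(v, average, span):
    return (v - average) / span

# Hypothetical column of values, for illustration only.
column = [4.0, 5.0, 6.0, 9.0]
average = sum(column) / len(column)   # 6.0
span = 2.5                            # arbitrary divisor chosen for the example

normalized = [normalize_continuous(v, average, span) for v in column]
```

<P>The value equal to the average maps to 0.0, and every other value is expressed as a multiple of <CODE>span</CODE> away from it.</P>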
173
174<P>The following script "normalizes" all attributes in the Iris dataset by subtracting the average value and dividing by half of the deviation.</P>
175
176<p class="header">part of <a href="transformvalue-nc.py">transformvalues-nc.py</a>
177(uses <a href="iris.tab">iris.tab</a>)</p>
178<XMP class="code">import orange
179
180data = orange.ExampleTable("iris")
181
182domstat = orange.DomainBasicAttrStat(data)
183newattrs = []
184for attr in data.domain.attributes:
185    attr_c = orange.FloatVariable(attr.name+"_n")
186    attr_c.getValueFrom = orange.ClassifierFromVar(whichVar = attr)
187    transformer = orange.NormalizeContinuous()
188    attr_c.getValueFrom.transformer = transformer
189    transformer.average = domstat[attr].avg
190    transformer.span = domstat[attr].dev/2
191    newattrs.append(attr_c)
192
193newDomain = orange.Domain(data.domain.attributes + newattrs, data.domain.classVar)
194newData = orange.ExampleTable(newDomain, data)
195</XMP>
196
197
198<H3>MapIntValue</H3>
199
200<P><CODE><INDEX name="classes/MapIntValue">MapIntValue</CODE> is a discrete-to-discrete transformer that changes values according to the given mapping. MapIntValue is used for binarization in decision trees.</P>
201
202<P class=section>Attributes</P>
203<DL class=attributes>
204<DT>mapping</DT>
205<DD>Mapping that determines the new value: <CODE>v' = mapping[v]</CODE>. Undefined values remain undefined. Mapping is indexed by integer indices and contains integer indices of values.</DD>
206</DL>
207
208<P>The following script transforms the value of 'age' in dataset lenses from 'young' to 'young', and from 'pre-presbyopic' and 'presbyopic' to 'old'.</P>
209
210
211<p class="header">part of <a href="transformvalue-miv.py">transformvalues-miv.py</a>
212(uses <a href="lenses.tab">lenses.tab</a>)</p>
213<XMP class="code">age = data.domain["age"]
214
215age_b = orange.EnumVariable("age_c", values = ['young', 'old'])
216age_b.getValueFrom = orange.ClassifierFromVar(whichVar = age)
217age_b.getValueFrom.transformer = orange.MapIntValue()
218age_b.getValueFrom.transformer.mapping = [0, 1, 1]
219</XMP>
220
221<P>The mapping says that the 0th value of <CODE>age</CODE> maps to the 0th value of <CODE>age_b</CODE>, while the 1st and 2nd map to the 1st value of <CODE>age_b</CODE>.</P>
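<P>The lookup itself is simple enough to sketch in plain Python. The helper below is hypothetical and not part of Orange; <CODE>None</CODE> stands in for an undefined value, which stays undefined.</P>

```python
# Plain-Python sketch of v' = mapping[v] on value indices (not the Orange API).
def map_int_value(index, mapping):
    if index is None:
        return None  # undefined values remain undefined
    return mapping[index]

age_values = ["young", "pre-presbyopic", "presbyopic"]
age_b_values = ["young", "old"]
mapping = [0, 1, 1]

# Translate every value of age into the corresponding value of age_b.
translated = [age_b_values[map_int_value(i, mapping)]
              for i in range(len(age_values))]
```

<P>The list <CODE>translated</CODE> reads 'young', 'old', 'old', one entry per original value of <CODE>age</CODE>.</P>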
222
223
224<H2>Transforming Domains and Datasets</H2>
225
226<P>In the example on the use of <CODE>NormalizeContinuous</CODE> we have already seen how to transform all attributes of some dataset and prepare the corresponding new dataset. This operation is rather common, so it makes sense to have a few classes for accomplishing this task. Such a class is inevitably less flexible than per-attribute transformations, since no specific options can be set for individual attributes. For instance, <CODE>DomainContinuizer</CODE>, which will be introduced below, can be told how to treat multinomial attributes, but the same treatment then applies to all such attributes. If some of your attributes need specific treatment, you will have to program individual treatments yourself, in a manner similar to what we showed while introducing <CODE>NormalizeContinuous</CODE>.</P>
227
228<H3>DomainContinuizer</H3>
229<P><code><INDEX name="classes/DomainContinuizer">DomainContinuizer</code> is a class that, given a domain or a set of examples, returns a new domain containing only continuous attributes. If examples are given, the original continuous attributes can be normalized, while for discrete attributes it is possible to use the most frequent value as the base. The attributes are treated according to their type:
230<UL>
231<LI>continuous attributes are normalized if requested, and copied if not;</LI>
232<LI>discrete attributes with fewer than two possible values are omitted;</LI>
233<LI>binary (<I>ie</I> two-valued discrete) attributes are transformed via <CODE>Discrete2Continuous</CODE> into 0.0/1.0 or -1.0/1.0 continuous attributes;</LI>
234<LI>multinomial discrete attributes are treated according to the flag <CODE>multinomialTreatment</CODE>.</LI>
235</UL>
236</P>
237
238<P>The class attribute is handled separately, as determined by <CODE>classTreatment</CODE>.</P>
239
240<P class=section>Attributes</P>
241<DL class=attributes>
242<DT>zeroBased</DT>
243<DD>This flag corresponds to the <CODE>zeroBased</CODE> flag of class <CODE>Discrete2Continuous</CODE> and determines the value used as the "low" value of the attribute. When a binary attribute is transformed into a continuous one, or when a multivalued attribute is transformed into multiple attributes, the transformed attribute can have either values 0.0 and 1.0 (default, <CODE>zeroBased=True</CODE>) or -1.0 and 1.0. In the following text, we will assume that <CODE>zeroBased</CODE> is <CODE>True</CODE> and use 0.0.</DD>
244
245<DT>multinomialTreatment</DT> <DD>decides the treatment of multinomial attributes. Let <CODE>N</CODE> be the number of the attribute's values.
246  <DL class=attributes>
247  <DT>DomainContinuizer.LowestIsBase</DT>
248  <DD>The attribute is replaced by N-1 attributes. If the attribute has the lowest
249      value (0), all N-1 attributes are zero. If not, the attribute corresponding to
250      the actual attribute's value (the first of the attributes corresponds to
251      value 1, the second to 2...) will be 1.0 and the others will be 0.0. For attributes
252      that have <CODE>baseValue</CODE> set, the specified value is used as the base instead
253      of the lowest one.</DD>
254
255  <DT>DomainContinuizer.FrequentIsBase</DT>
256  <DD>The attribute is treated in the same fashion as above, except that the most
257      frequent value, not the lowest, is used as the base. If several values
258      share first place, the lowest of them is used. For this option to work, the
259      continuized domain needs to be constructed from a dataset, not a domain (which
260      doesn't give information on value frequencies). Again, if the attribute has
261      <CODE>baseValue</CODE> set, the specified value is used instead of the most
262      frequent.</DD>
263
264  <DT>DomainContinuizer.NValues</DT>
265  <DD>The attribute is replaced by N attributes. If you plan to use the newly
266      constructed domain in statistical modelling, make sure that the method is
267      immune to dependent attributes. An exception to that are binary attributes
268      which are still replaced by a single attribute.</DD>
269
270  <DT>DomainContinuizer.Ignore</DT>
271  <DD>Multivalued attributes are omitted.</DD>
272
273  <DT>DomainContinuizer.ReportError</DT>
274  <DD>If a multivalued attribute is encountered, an error is raised.</DD>
275
276  <DT>DomainContinuizer.AsOrdinal</DT>
277  <DD>Multivalued attributes are treated as ordinal, <I>ie</I> replaced by a continuous
278      attribute with the values' index (see <CODE>Ordinal2Continuous</CODE>).</DD>
279
280  <DT>DomainContinuizer.AsNormalizedOrdinal</DT>
281  <DD>As above, except that the resulting continuous value will be from range 0 to
282      1.</DD>
283  </DL>
284</DD>
285
286<DT>normalizeContinuous</DT>
287<DD>If <CODE>True</CODE> (not by default), continuous attributes are "normalized": the average value is subtracted from them and the difference is divided by the deviation. This is only possible when the continuizer is given the data, not only the domain.</DD>
288
289<DT>classTreatment</DT>
290<DD>Determines what happens with the class attribute if it is discrete.
291  <DL class=attributes>
292  <DT>DomainContinuizer.Ignore</DT>
293  <DD>Class attribute is copied as is. Note that this is different from the meaning of
294      this value at <CODE>multinomialTreatment</CODE> where it denotes omitting the
295      attribute.</DD>
296
297  <DT>DomainContinuizer.AsOrdinal, DomainContinuizer.AsNormalizedOrdinal</DT>
298  <DD>If class is multinomial, it is treated as ordinal, in the same manner as
299      described above. Binary classes are transformed to 0.0/1.0 attributes.</DD>
300  </DL>
301  It is not possible to normalize the continuous class with
302  <CODE>DomainContinuizer</CODE>.
303</DD>
304</DL>
305
306<P>Let us first examine the effect of <CODE>multinomialTreatment</CODE> on attributes from dataset "bridges". To be able to follow the transformations, we shall first print out a description of domain and the 15th example in the dataset.</P>
307
308<p class="header">part of <a href="transformvalue-domain.py">transformvalues-domain.py</a>
309(uses <a href="bridges.tab">bridges.tab</a>)</p>
310<XMP class="code">def printExample(ex):
311    for val in ex:
312        print "%20s: %s" % (val.variable.name, val)
313
314data = orange.ExampleTable("bridges")
315
316for attr in data.domain:
317    if attr.varType == orange.VarTypes.Continuous:
318        print "%20s: continuous" % attr.name
319    else:
320        print "%20s: %s" % (attr.name, attr.values)
321
322print
323print "Original 15th example:"
324printExample(data[15])
325</xmp>
326
327<P>We'll show the output in a moment. Let us now use the lowest values as the bases and continuize the attributes.</P>
328
329<p class="header">part of <a href="transformvalue-domain.py">transformvalues-domain.py</a></p>
330<XMP class="code">continuizer = orange.DomainContinuizer()
331
332continuizer.multinomialTreatment = continuizer.LowestIsBase
333domain0 = continuizer(data)
334data0 = data.translate(domain0)
335printExample(data0[15])
336</XMP>
337
338<P>Here's what we get; to the right of it, we've added the original example and the domain description, so that we can see what happens.</P>
339
340<TABLE>
341<TR><TD>
342<XMP CLASS=CODE>         RIVER=A: 0.000
343         RIVER=O: 0.000
344         RIVER=Y: 0.000
345         ERECTED: 1863
346PURPOSE=AQUEDUCT: 0.000
347      PURPOSE=RR: 1.000
348    PURPOSE=WALK: 0.000
349          LENGTH: 1000
350           LANES: 2
351       CLEAR-G=G: 0.000
352     T-OR-D=DECK: 0.000
353   MATERIAL=IRON: 1.000
354  MATERIAL=STEEL: 0.000
355     SPAN=MEDIUM: 1.000
356       SPAN=LONG: 0.000
357       REL-L=S-F: ?
358         REL-L=F: ?
359     TYPE=SUSPEN: 0.000
360   TYPE=SIMPLE-T: 1.000
361       TYPE=ARCH: 0.000
362   TYPE=CANTILEV: 0.000
363        TYPE=NIL: 0.000
364     TYPE=CONT-T: 0.000
365</XMP>
366</TD>
367<TD>
368<XMP CLASS=CODE>   RIVER: M
369 ERECTED: 1863
370 PURPOSE: RR
371  LENGTH: 1000
372   LANES: 2
373 CLEAR-G: N
374  T-OR-D: THROUGH
375MATERIAL: IRON
376    SPAN: MEDIUM
377   REL-L: ?
378   TYPE: SIMPLE-T
379</XMP>
380</TD>
381<TD>
382<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
383 ERECTED: continuous
384 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
385  LENGTH: continuous
386   LANES: continuous
387 CLEAR-G: <N, G>
388  T-OR-D: <THROUGH, DECK>
389MATERIAL: <WOOD, IRON, STEEL>
390    SPAN: <SHORT, MEDIUM, LONG>
391   REL-L: <S, S-F, F>
392    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
393</XMP>
394</TD>
395</TR>
396</TABLE>
397
398<P>The first, four-valued attribute River is replaced by three attributes corresponding to values "A", "O" and "Y". For the 15th example, River is "M", so all three attributes are 0.0. The continuous year is left intact. Of the three attributes that describe the purpose of the bridge, "PURPOSE=RR" is 1.0 since this is a railroad bridge. The value of the three-valued "REL-L" is undefined in the original example, so the corresponding two attributes in the new domain are undefined as well.</P>
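<P>The way one discrete attribute expands into indicator attributes around a base value can be sketched in plain Python. The helper <CODE>continuize</CODE> is hypothetical and is not how Orange implements the continuizer; it only reproduces the rows discussed above, with <CODE>None</CODE> standing in for an undefined value.</P>

```python
# Plain-Python sketch of indicator attributes around a base value
# (not the Orange API).
def continuize(name, values, actual, base=0):
    """Return (column name, cell) pairs for every value except the base."""
    columns = []
    for index, value in enumerate(values):
        if index == base:
            continue  # the base value gets no column of its own
        if actual is None:
            cell = None  # undefined stays undefined in every new column
        else:
            cell = 1.0 if value == actual else 0.0
        columns.append(("%s=%s" % (name, value), cell))
    return columns

river = continuize("RIVER", ["M", "A", "O", "Y"], "M")
purpose = continuize("PURPOSE", ["HIGHWAY", "AQUEDUCT", "RR", "WALK"], "RR")
rel_l = continuize("REL-L", ["S", "S-F", "F"], None)
```

<P>Passing a different <CODE>base</CODE> index drops a different column, which is all that distinguishes the other base-value treatments in this sketch.</P>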
399
400<P>In the next test, we replaced <CODE>continuizer.LowestIsBase</CODE> by <CODE>continuizer.FrequentIsBase</CODE>, instructing Orange to use the most frequent values for base values.</P>
401
402<TABLE>
403<TR><TD>
404<XMP CLASS=CODE>         RIVER=M: 1.000
405         RIVER=O: 0.000
406         RIVER=Y: 0.000
407         ERECTED: 1863
408PURPOSE=AQUEDUCT: 0.000
409      PURPOSE=RR: 1.000
410    PURPOSE=WALK: 0.000
411          LENGTH: 1000
412           LANES: 2
413       CLEAR-G=N: 1.000
414     T-OR-D=DECK: 0.000
415   MATERIAL=WOOD: 0.000
416   MATERIAL=IRON: 1.000
417      SPAN=SHORT: 0.000
418       SPAN=LONG: 0.000
419         REL-L=S: ?
420       REL-L=S-F: ?
421       TYPE=WOOD: 0.000
422     TYPE=SUSPEN: 0.000
423       TYPE=ARCH: 0.000
424   TYPE=CANTILEV: 0.000
425        TYPE=NIL: 0.000
426     TYPE=CONT-T: 0.000
427</XMP>
428</TD>
429<TD>
430<XMP CLASS=CODE>   RIVER: M
431 ERECTED: 1863
432 PURPOSE: RR
433  LENGTH: 1000
434   LANES: 2
435 CLEAR-G: N
436  T-OR-D: THROUGH
437MATERIAL: IRON
438    SPAN: MEDIUM
439   REL-L: ?
440   TYPE: SIMPLE-T
441</XMP>
442</TD>
443<TD>
444<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
445 ERECTED: continuous
446 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
447  LENGTH: continuous
448   LANES: continuous
449 CLEAR-G: <N, G>
450  T-OR-D: <THROUGH, DECK>
451MATERIAL: <WOOD, IRON, STEEL>
452    SPAN: <SHORT, MEDIUM, LONG>
453   REL-L: <S, S-F, F>
454    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
455</XMP>
456</TD>
457</TR>
458</TABLE>
459
460<P>Comparing the outputs, we notice that for the first attribute, "A" is chosen as the base value instead of "M", so the three new attributes tell whether the bridge is over "M", "O" or "Y". As for Purpose, nothing changes since highway bridges are the most frequent. The base value also changes for the binary Clear-G, since G is more frequent than N.</P>
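<P>Choosing the base by frequency, with ties going to the lowest value, can be sketched in plain Python. The helper and the sample column below are hypothetical; the counts are invented for illustration and are not the actual frequencies in the bridges dataset.</P>

```python
# Plain-Python sketch of picking the most frequent value as the base
# (not the Orange API).
def most_frequent_index(column, values):
    counts = [0] * len(values)
    for v in column:
        if v is not None:          # undefined values are not counted
            counts[values.index(v)] += 1
    # list.index returns the first maximum, i.e. the lowest value on a tie.
    return counts.index(max(counts))

river_values = ["M", "A", "O", "Y"]

# Invented sample column: "A" occurs three times, "M" twice.
sample = ["A", "M", "A", "O", None, "M", "A"]
base = most_frequent_index(sample, river_values)

# On a tie between "M" and "A", the lower index wins.
tie_base = most_frequent_index(["A", "M"], river_values)
```

<P>For the invented sample, <CODE>base</CODE> is 1 (value "A"), while the tie resolves to index 0 (value "M").</P>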
461
462<P>The next alternative is <CODE>continuizer.NValues</CODE>, which turns N-valued attributes into N attributes, except for N==2, where we still get a single binary attribute, using the lowest value as the base.</P>
463
464<TABLE>
465<TR><TD>
466<XMP CLASS=CODE>         RIVER=M: 1.000
467         RIVER=A: 0.000
468         RIVER=O: 0.000
469         RIVER=Y: 0.000
470         ERECTED: 1863
471 PURPOSE=HIGHWAY: 0.000
472PURPOSE=AQUEDUCT: 0.000
473      PURPOSE=RR: 1.000
474    PURPOSE=WALK: 0.000
475          LENGTH: 1000
476           LANES: 2
477       CLEAR-G=G: 0.000
478     T-OR-D=DECK: 0.000
479   MATERIAL=WOOD: 0.000
480   MATERIAL=IRON: 1.000
481  MATERIAL=STEEL: 0.000
482      SPAN=SHORT: 0.000
483     SPAN=MEDIUM: 1.000
484       SPAN=LONG: 0.000
485         REL-L=S: ?
486       REL-L=S-F: ?
487         REL-L=F: ?
488       TYPE=WOOD: 0.000
489     TYPE=SUSPEN: 0.000
490   TYPE=SIMPLE-T: 1.000
491       TYPE=ARCH: 0.000
492   TYPE=CANTILEV: 0.000
493        TYPE=NIL: 0.000
494     TYPE=CONT-T: 0.000
495</XMP>
496</TD>
497<TD>
498<XMP CLASS=CODE>   RIVER: M
499 ERECTED: 1863
500 PURPOSE: RR
501  LENGTH: 1000
502   LANES: 2
503 CLEAR-G: N
504  T-OR-D: THROUGH
505MATERIAL: IRON
506    SPAN: MEDIUM
507   REL-L: ?
508   TYPE: SIMPLE-T
509</XMP>
510</TD>
511<TD>
512<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
513 ERECTED: continuous
514 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
515  LENGTH: continuous
516   LANES: continuous
517 CLEAR-G: <N, G>
518  T-OR-D: <THROUGH, DECK>
519MATERIAL: <WOOD, IRON, STEEL>
520    SPAN: <SHORT, MEDIUM, LONG>
521   REL-L: <S, S-F, F>
522    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
523</XMP>
524</TD>
525</TR>
526</TABLE>
527
528<P>The least exciting case is <CODE>continuizer.Ignore</CODE>, which omits the multinomial attributes, leaving only the continuous attributes and the continuized binary ones.</P>
529
530<TABLE>
531<TR><TD>
532<XMP CLASS=CODE>         ERECTED: 1863
533          LENGTH: 1000
534           LANES: 2
535       CLEAR-G=G: 0.000
536     T-OR-D=DECK: 0.000
537</XMP>
538</TD>
539<TD>
540<XMP CLASS=CODE>   RIVER: M
541 ERECTED: 1863
542 PURPOSE: RR
543  LENGTH: 1000
544   LANES: 2
545 CLEAR-G: N
546  T-OR-D: THROUGH
547MATERIAL: IRON
548    SPAN: MEDIUM
549   REL-L: ?
550   TYPE: SIMPLE-T
551</XMP>
552</TD>
553<TD>
554<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
555 ERECTED: continuous
556 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
557  LENGTH: continuous
558   LANES: continuous
559 CLEAR-G: <N, G>
560  T-OR-D: <THROUGH, DECK>
561MATERIAL: <WOOD, IRON, STEEL>
562    SPAN: <SHORT, MEDIUM, LONG>
563   REL-L: <S, S-F, F>
564    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
565</XMP>
566</TD>
567</TR>
568</TABLE>
569
570<P>The last two variations retain the number of attributes, but turn them into continuous. <CODE>continuizer.AsOrdinal</CODE> looks like this.</P>
571
572<TABLE>
573<TR><TD>
574<XMP CLASS=CODE>   C_RIVER: 0.000
575   ERECTED: 1863
576 C_PURPOSE: 2.000
577    LENGTH: 1000
578     LANES: 2
579 C_CLEAR-G: 0.000
580  C_T-OR-D: 0.000
581C_MATERIAL: 1.000
582    C_SPAN: 1.000
583   C_REL-L: ?
584    C_TYPE: 2.000
585</XMP>
586</TD>
587<TD>
588<XMP CLASS=CODE>   RIVER: M
589 ERECTED: 1863
590 PURPOSE: RR
591  LENGTH: 1000
592   LANES: 2
593 CLEAR-G: N
594  T-OR-D: THROUGH
595MATERIAL: IRON
596    SPAN: MEDIUM
597   REL-L: ?
598   TYPE: SIMPLE-T
599</XMP>
600</TD>
601<TD>
602<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
603 ERECTED: continuous
604 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
605  LENGTH: continuous
606   LANES: continuous
607 CLEAR-G: <N, G>
608  T-OR-D: <THROUGH, DECK>
609MATERIAL: <WOOD, IRON, STEEL>
610    SPAN: <SHORT, MEDIUM, LONG>
611   REL-L: <S, S-F, F>
612    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
613</XMP>
614</TD>
615</TR>
616</TABLE>
617
618<P>For instance, the value of C_Purpose is 2.000 since "RR" is the value of Purpose with index 2 (counting from 0). Finally, <CODE>continuizer.AsNormalizedOrdinal</CODE> normalizes the new continuous attributes to the range 0.0 - 1.0.</P>
619
620<TABLE>
621<TR><TD>
622<XMP CLASS=CODE>   C_RIVER: 0.000
623   ERECTED: 1863
624 C_PURPOSE: 0.667
625    LENGTH: 1000
626     LANES: 2
627 C_CLEAR-G: 0.000
628  C_T-OR-D: 0.000
629C_MATERIAL: 0.500
630    C_SPAN: 0.500
631   C_REL-L: ?
632    C_TYPE: 0.333
633</XMP>
634</TD>
635<TD>
636<XMP CLASS=CODE>   RIVER: M
637 ERECTED: 1863
638 PURPOSE: RR
639  LENGTH: 1000
640   LANES: 2
641 CLEAR-G: N
642  T-OR-D: THROUGH
643MATERIAL: IRON
644    SPAN: MEDIUM
645   REL-L: ?
646   TYPE: SIMPLE-T
647</XMP>
648</TD>
649<TD>
650<XMP CLASS=CODE>   RIVER: <M, A, O, Y>
651 ERECTED: continuous
652 PURPOSE: <HIGHWAY, AQUEDUCT, RR, WALK>
653  LENGTH: continuous
654   LANES: continuous
655 CLEAR-G: <N, G>
656  T-OR-D: <THROUGH, DECK>
657MATERIAL: <WOOD, IRON, STEEL>
658    SPAN: <SHORT, MEDIUM, LONG>
659   REL-L: <S, S-F, F>
660    TYPE: <WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, NIL, CONT-T>
661</XMP>
662</TD>
663</TR>
664</TABLE>
665
666<P>Values of Purpose now transform to 0.000, 0.333, 0.667 and 1.000; for railroad bridges, the corresponding value is 0.667.</P>
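<P>These numbers are consistent with dividing the index of the value by the number of values minus one. A plain-Python sketch of that arithmetic follows; the function name is hypothetical and this is not Orange code.</P>

```python
# Plain-Python sketch of scaling a value index to the range 0..1
# (not the Orange API).
def as_normalized_ordinal(index, n_values):
    return index / float(n_values - 1)

purpose_values = ["HIGHWAY", "AQUEDUCT", "RR", "WALK"]
purpose_scaled = [round(as_normalized_ordinal(i, len(purpose_values)), 3)
                  for i in range(len(purpose_values))]

# TYPE has seven values; SIMPLE-T has index 2.
type_scaled = round(as_normalized_ordinal(2, 7), 3)
```

<P>Rounded to three decimals, <CODE>purpose_scaled</CODE> is 0.0, 0.333, 0.667 and 1.0, and <CODE>type_scaled</CODE> is 0.333, matching C_TYPE in the output above.</P>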
667
668
669</BODY>
670</HTML> 