source: orange/Orange/doc/reference/discretization.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 23.5 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

1<HTML>
2<HEAD>
3<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
4<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
5</HEAD>
6
7<BODY>
8
9<index name="preprocessing+discretization">
10<H1>Discretization</H1>
11
12<HR>
13
<P>Example-based automatic discretization is in essence similar to learning: given a set of examples, a discretization method proposes a list of suitable intervals into which to cut the attribute's values. For this reason, Orange's structures for discretization resemble its structures for learning. Objects derived from <CODE>orange.Discretization</CODE> play the role of a "learner" that, upon observing the examples, constructs an <CODE>orange.Discretizer</CODE>, whose role is to convert continuous values into discrete ones according to the rule found by <CODE>Discretization</CODE>.</P>
15
<P>The Orange core supports several discretization methods; here is a list of the methods with their corresponding classes.</P>
17
64
<DL>
<DT><B>Equi-distant discretization (<CODE>EquiDistDiscretization</CODE>,
<CODE>EquiDistDiscretizer</CODE>)</B></DT>
<DD>The range of the attribute's values is split into a prescribed number of equal-sized intervals.<P></DD>

<DT><B>Quantile-based discretization
(<CODE>EquiNDiscretization</CODE>,
<CODE>IntervalDiscretizer</CODE>)</B></DT>
<DD>The range is split into intervals containing an equal number of examples.<P></DD>

<DT><B>Entropy-based discretization (<CODE>EntropyDiscretization</CODE>,
<CODE>IntervalDiscretizer</CODE>)</B></DT>
<DD>Developed by Fayyad and Irani, this method balances the entropy within intervals against the MDL (minimum description length) of the discretization.<P></DD>

<DT><B>Bi-modal discretization (<CODE>BiModalDiscretization</CODE>,
<CODE>BiModalDiscretizer</CODE>/<CODE>IntervalDiscretizer</CODE>)</B></DT>
<DD>Two cut-off points are set to maximize the difference between the distribution in the middle interval and the distributions outside it.<P></DD>

<DT><B>Fixed discretization (<CODE>FixedDiscretization</CODE>,
<CODE>IntervalDiscretizer</CODE>)</B></DT>
<DD>Discretization with user-prescribed cut-off points.</DD>
</DL>
87
88<HR>
89
90<H2>General Schema</H2>
91
<P>Instances of classes derived from <CODE>orange.<INDEX name="classes/Discretization">Discretization</INDEX></CODE> define a single method: the call operator. As is often allowed in Orange, the arguments for the call can also be passed directly to the constructor.</P>
93
94<DL class=attributes>
95<DT>__call__(attribute, examples[, weightID])</DT>
<DD>Given a continuous <CODE>attribute</CODE>, a set of <CODE>examples</CODE> and, optionally, the ID of the attribute holding example weights, this function returns a discretized attribute. The argument <CODE>attribute</CODE> can be a descriptor, an index or the name of the attribute.</DD>
97</DL>
98
99<P>Here's an example.</P>
100
101<p class="header"><a href="discretization.py">part of discretization.py</a>
102(uses <a href="iris.tab">iris.tab</a>)</p>
<XMP class="code">
import orange
data = orange.ExampleTable("iris")

# Passing the call arguments to the constructor returns the discretized attribute.
sep_w = orange.EntropyDiscretization("sepal width", data)

# New table: original attribute, its discretized version and the class.
data2 = data.select([data.domain["sepal width"], sep_w, data.domain.classVar])

for ex in data2[:10]:
    print ex
</XMP>
114
115<P>The discretized attribute <CODE>sep_w</CODE> is constructed with a call to <CODE>EntropyDiscretization</CODE> (instead of constructing it and calling it afterwards, we passed the arguments for calling to the constructor, as is often allowed in Orange). We then constructed a new <CODE>ExampleTable</CODE> with attributes "sepal width" (the original continuous attribute), <CODE>sep_w</CODE> and the class attribute. Script output is:</p>
116
<XMP class=code>
[3.500000, '>3.30', 'Iris-setosa']
[3.000000, '(2.90, 3.30]', 'Iris-setosa']
[3.200000, '(2.90, 3.30]', 'Iris-setosa']
[3.100000, '(2.90, 3.30]', 'Iris-setosa']
[3.600000, '>3.30', 'Iris-setosa']
[3.900000, '>3.30', 'Iris-setosa']
[3.400000, '>3.30', 'Iris-setosa']
[3.400000, '>3.30', 'Iris-setosa']
[2.900000, '<2.90', 'Iris-setosa']
[3.100000, '(2.90, 3.30]', 'Iris-setosa']
</XMP>
129
130<P><CODE>EntropyDiscretization</CODE> named the new attribute's values by the interval range (it also named the attribute as "D_sepal width"). The new attribute's values get computed automatically when they are needed.</P>
131
<P>As those who have read about <A href="Variable.htm#getValueFrom"><CODE>Variable</CODE></A> know, the answer to "How does this work?" is hidden in the attribute's field <CODE>getValueFrom</CODE>. This short interactive session reveals the secret.</P>
133
<XMP class=code>
>>> sep_w
EnumVariable 'D_sepal width'
>>> sep_w.getValueFrom
<ClassifierFromVar instance at 0x01BA7DC0>
>>> sep_w.getValueFrom.whichVar
FloatVariable 'sepal width'
>>> sep_w.getValueFrom.transformer
<IntervalDiscretizer instance at 0x01BA2100>
>>> sep_w.getValueFrom.transformer.points
<2.90000009537, 3.29999995232>
</XMP>
146
<P>So, the <CODE>select</CODE> statement in the above example converted all examples from <CODE>data</CODE> to the new domain. Since the new domain includes the attribute <CODE>sep_w</CODE>, which is not present in the original, <CODE>sep_w</CODE>'s values are computed on the fly. For each example in <CODE>data</CODE>, <CODE>sep_w.getValueFrom</CODE> is called to compute <CODE>sep_w</CODE>'s value (if you ever need to do this yourself, do not call <CODE>getValueFrom</CODE> directly; call <CODE>computeValue</CODE> instead). <CODE>sep_w.getValueFrom</CODE> looks up the value of "sepal width" in the original example. This original, continuous sepal width is passed to the <CODE>transformer</CODE>, which determines the interval from its field <CODE>points</CODE>. The transformer returns the discrete value, which is in turn returned by <CODE>getValueFrom</CODE> and stored in the new example.</P>
148
<P>You don't need to understand this mechanism exactly. It is important to know, however, that there are two classes of objects for discretization. Those derived from <CODE>Discretizer</CODE> (such as <CODE>IntervalDiscretizer</CODE>, which we have seen above) are used as transformers that translate continuous values into discrete ones. Discretization algorithms are derived from <CODE>Discretization</CODE>. Their job is to construct a <CODE>Discretizer</CODE> and return a new variable with the discretizer stored in <CODE>getValueFrom.transformer</CODE>.</P>
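<P>The division of labour between a discretization algorithm, the discretizer it constructs and the derived attribute can be mimicked in a few lines of plain Python. The sketch below does not use Orange; the names <CODE>ThresholdTransformer</CODE>, <CODE>DerivedVariable</CODE> and <CODE>mean_split_discretization</CODE> are hypothetical, and splitting at the mean is an arbitrary rule chosen only to keep the example short.</P>

```python
class ThresholdTransformer:
    """Plays the role of a Discretizer: turns one continuous value into an index."""

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, value):
        # Values below or equal to the threshold go to interval 0, others to 1.
        return 0 if value <= self.threshold else 1


class DerivedVariable:
    """Plays the role of the new attribute: it knows how to compute its own value."""

    def __init__(self, name, source, transformer):
        self.name = name
        self.source = source            # name of the original attribute
        self.transformer = transformer  # the discretizer to apply

    def compute_value(self, example):
        # Look up the original value, then let the transformer discretize it.
        return self.transformer(example[self.source])


def mean_split_discretization(attribute, examples):
    """Plays the role of a Discretization: observes examples, builds a discretizer."""
    values = [ex[attribute] for ex in examples]
    threshold = sum(values) / len(values)
    return DerivedVariable("D_" + attribute, attribute, ThresholdTransformer(threshold))


examples = [{"x": 1.0}, {"x": 2.0}, {"x": 6.0}]
d_x = mean_split_discretization("x", examples)
print(d_x.name, [d_x.compute_value(ex) for ex in examples])
```

<P>Nothing is stored in advance: each discrete value is computed from the original example only when it is asked for, which is the behaviour described above.</P>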
150
151<HR>
152
153<H2>Discretizers</H2>
154
<P>Different discretizers support different methods for converting continuous values into discrete ones. The most general is <CODE>IntervalDiscretizer</CODE>, which is also used by most discretization methods. Two other discretizers, <CODE>EquiDistDiscretizer</CODE> and <CODE>ThresholdDiscretizer</CODE>, could easily be replaced by <CODE>IntervalDiscretizer</CODE> but are used for speed and simplicity. The fourth discretizer, <CODE>BiModalDiscretizer</CODE>, is specialized for discretizations induced by <CODE>BiModalDiscretization</CODE>.</P>
156
157<P>All discretizers support a handy method for construction of a new attribute from an existing one.</P>
158
159<P class=section>Methods</P>
160<DL class=attributes>
161<DT>constructVariable(attribute)</DT>
<DD>Constructs a new attribute descriptor; the new attribute is a discretized version of <CODE>attribute</CODE>. The new attribute's name equals <CODE>attribute.name</CODE> prefixed by "D_", and its symbolic values are discretizer-specific. The above example shows what comes out from <CODE>IntervalDiscretizer</CODE>. Discretization algorithms actually first construct a discretizer and then call its <CODE>constructVariable</CODE> to construct an attribute descriptor.</DD>
163</DL>
164
<P>An example of how this method is used is shown in the following section about <CODE>IntervalDiscretizer</CODE>.</P>
166
167
168<H3>IntervalDiscretizer</H3>
169<index name="classes/IntervalDiscretizer">
170
171<P><CODE>IntervalDiscretizer</CODE> is the most common discretizer. It made its first appearance in the example about general discretization schema and you will see more of it later. It has a single interesting attribute.</p>
172
173<P class=section>Attributes</P>
174<DL class=attributes>
175<DT>points</DT>
176<DD>Cut-off points. All values below or equal to the first point belong to the first interval, those between the first and the second (including those equal to the second) go to the second interval and so forth to the last interval which covers all values greater than the last element in <CODE>points</CODE>. The number of intervals is thus <CODE>len(points)+1</CODE>.</DD>
177</DL>
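<P>The rule for <CODE>points</CODE> amounts to a binary search in a sorted list. The following plain-Python sketch (it does not use Orange, and <CODE>interval_index</CODE> is a hypothetical helper) shows how the boundaries are treated, assuming the cut-off points are sorted in increasing order.</P>

```python
from bisect import bisect_left


def interval_index(points, value):
    """Index of the interval for value, given sorted cut-off points.

    bisect_left counts the points strictly smaller than value, so a value
    equal to a cut-off point stays in the lower interval.
    """
    return bisect_left(points, value)


points = [3.0, 5.0]
for value in (2.5, 3.0, 4.9, 5.0, 5.1):
    print(value, interval_index(points, value))
```

<P>With two cut-off points there are three possible indices, 0 to 2, in line with <CODE>len(points)+1</CODE> intervals.</P>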
178
179<P>Let us manually construct an interval discretizer with cut-off points at 3.0 and 5.0. We shall use the discretizer to construct a discretized sepal length.</P>
180
181<p class="header"><a href="discretization.py">part of discretization.py</a>
182(uses <a href="iris.tab">iris.tab</a>)</p>
<XMP class=code>idisc = orange.IntervalDiscretizer(points = [3.0, 5.0])
sep_l = idisc.constructVariable(data.domain["sepal length"])
data2 = data.select([data.domain["sepal length"], sep_l, data.domain.classVar])
</XMP>
187
188<P>That's all. First five examples of <CODE>data2</CODE> are now</P>
<XMP class=code>[5.100000, '>5.00', 'Iris-setosa']
[4.900000, '(3.00, 5.00]', 'Iris-setosa']
[4.700000, '(3.00, 5.00]', 'Iris-setosa']
[4.600000, '(3.00, 5.00]', 'Iris-setosa']
[5.000000, '(3.00, 5.00]', 'Iris-setosa']
</XMP>
195
<P>Can you use the same discretizer for more than one attribute? Yes, as long as the attributes share the same cut-off points, of course. Simply call <CODE>constructVariable</CODE> for each continuous attribute.</P>
197
198<p class="header"><a href="discretization.py">part of discretization.py</a>
199(uses <a href="iris.tab">iris.tab</a>)</p>
<XMP class=code>idisc = orange.IntervalDiscretizer(points = [3.0, 5.0])
newattrs = [idisc.constructVariable(attr) for attr in data.domain.attributes]
data2 = data.select(newattrs + [data.domain.classVar])
</XMP>
204
<P>Each attribute now has its own <CODE>ClassifierFromVar</CODE> in its <CODE>getValueFrom</CODE>, but all use the same <CODE>IntervalDiscretizer</CODE>, <CODE>idisc</CODE>. Changing an element of its <CODE>points</CODE> affects all attributes.</P>

<P><B>Do not change the length of <CODE>points</CODE> if the
discretizer is used by any attribute.</B> The length of
<CODE>points</CODE> must stay consistent with the number of values of the
attribute, which is determined by the length of the attribute's field
<CODE>values</CODE>. Therefore, if <CODE>attr</CODE> is a discretized
attribute, then <CODE>len(attr.values)</CODE> must equal
<CODE>len(attr.getValueFrom.transformer.points)+1</CODE>. It always
does, unless you deliberately change it. If the sizes don't match,
Orange will probably crash, and it will be entirely your fault.</P>
216
217<H3>EquiDistDiscretizer</H3>
218<index name="classes/EquiDistDiscretizer">
219
<P><CODE>EquiDistDiscretizer</CODE> is a bit faster but more rigid than <CODE>IntervalDiscretizer</CODE>: it uses intervals of fixed width.</P>
221
222<P class=section>Attributes</p>
223<DL class=attributes>
224<DT>firstCut</DT>
225<DD>The first cut-off point.</DD>
226<DT>step</DT>
227<DD>Width of intervals.</DD>
228<DT>numberOfIntervals</DT>
229<DD>Number of intervals.</DD>
230<dt>points (read-only)</dt>
<dd>The cut-off points; this is not a real attribute, although it behaves like one. Reading it constructs a list of cut-off points and returns it, but changing the list doesn't affect the discretizer, since it is a separate list. This attribute exists only to give <code>EquiDistDiscretizer</code> the same interface as <code>IntervalDiscretizer</code>.</dd>
232</DL>
233
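<P>All values below <CODE>firstCut</CODE> belong to the first interval (including possible values smaller than <CODE>firstVal</CODE>). Otherwise, value <CODE>val</CODE>'s interval is <CODE>floor((val-firstVal)/step)</CODE>. If this turns out to be greater than or equal to <CODE>numberOfIntervals</CODE>, it is decreased to <CODE>numberOfIntervals-1</CODE>. Note that <CODE>firstVal</CODE> is not among the attributes listed above; judging by the interval labels printed in the section on <CODE>EquiDistDiscretization</CODE>, it appears to correspond to <CODE>firstCut-step</CODE>.</P>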
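<P>A plain-Python sketch of this mapping follows. It does not use Orange, and it rests on an assumption: the reference point of the formula is taken to be <CODE>firstCut-step</CODE>, so that values in <CODE>[firstCut, firstCut+step)</CODE> land in interval 1. The function names are hypothetical.</P>

```python
from math import floor


def equi_dist_index(value, first_cut, step, number_of_intervals):
    """Interval index for fixed-width intervals (assumed reading of the rule)."""
    if value < first_cut:
        return 0  # everything below the first cut-off point
    index = 1 + floor((value - first_cut) / step)
    # Values beyond the last cut-off point are folded into the last interval.
    return min(index, number_of_intervals - 1)


def equi_dist_points(first_cut, step, number_of_intervals):
    """Cut-off points implied by the three parameters (a fresh list each time)."""
    return [first_cut + i * step for i in range(number_of_intervals - 1)]


print(equi_dist_points(2.0, 1.0, 5))
print([equi_dist_index(v, 2.0, 1.0, 5) for v in (1.5, 2.0, 3.7, 5.0, 9.0)])
```

<P>Because the index is computed arithmetically, no search through a list of points is needed, which is presumably where the speed advantage over <CODE>IntervalDiscretizer</CODE> comes from.</P>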
235
<P>This discretizer is returned by <CODE>EquiDistDiscretization</CODE>; you can see an example in the corresponding section. You can also construct an <CODE>EquiDistDiscretizer</CODE> manually and call its <CODE>constructVariable</CODE>, just as already shown for the <CODE>IntervalDiscretizer</CODE>.</P>
237
238
239<H3>ThresholdDiscretizer</H3>
240<index name="classes/ThresholdDiscretizer">
241
242<P>Threshold discretizer converts continuous values into binary by comparing them with a threshold. This discretizer is actually not used by any discretization method, but you can use it for manual discretization. Orange needs this discretizer for binarization of continuous attributes in decision trees.</P>
243
244<P class=section>Attributes</P>
<DL class=attributes>
246<DT>threshold</DT>
247<DD>Threshold; values below or equal to the threshold belong to the first interval and those that are greater go to the second.</DD>
248</DL>
249
250
251<H3>BiModalDiscretizer</H3>
252<index name="classes/BiModalDiscretizer">
253
<P>This is the first discretizer that cannot be replaced by <CODE>IntervalDiscretizer</CODE>. It has two cut-off points, and values are discretized according to whether or not they belong to the middle region (which includes the lower but not the upper boundary). The discretizer is returned by <CODE>BiModalDiscretization</CODE> if its field <CODE>splitInTwo</CODE> is <CODE>true</CODE> (the default); see the example there.</P>
255
256<P class=section>Attributes</P>
257<DL class=attributes>
258<DT>low, high</DT>
259<DD>Lower and upper boundary of the interval. The lower is included in the interval and the upper is not.</DD>
260</DL>
261
262
263<HR>
264
265<H2>Discretization Algorithms</H2>
266
267<H3>Discretization with Intervals of Equal Size</H3>
268<index name="discretization/intervals with same width">
269<index name="classes/EquiDistDiscretization">
270
271<P><CODE>EquiDistDiscretization</CODE> discretizes the attribute by cutting it into the prescribed number of intervals of equal width. The examples are needed to determine the span of attribute values. The interval between the smallest and the largest is then cut into equal parts.</P>
272
273<P class=section>Attributes</P>
274<DL class=attributes>
275<DT>numberOfIntervals</DT>
276<DD>Number of intervals into which the attribute is to be discretized. Default value is 4.</DD>
277</DL>
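<P>How the cut-off points follow from the observed span can be sketched in plain Python. The sketch does not use Orange; <CODE>equal_width_parameters</CODE> is a hypothetical helper, and the three input values are made up so that they span 4.3 to 7.9.</P>

```python
def equal_width_parameters(values, number_of_intervals=4):
    """Return (first_cut, step) for equal-width intervals over the span of values."""
    lowest, highest = min(values), max(values)
    step = (highest - lowest) / number_of_intervals
    # The first interval ends one step above the smallest observed value.
    first_cut = lowest + step
    return first_cut, step


first_cut, step = equal_width_parameters([4.3, 5.8, 7.9], number_of_intervals=6)
print("first cut %.3f, step %.3f" % (first_cut, step))
```

<P>The sketch makes no provision for the degenerate case in which all values are equal and the step becomes zero.</P>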
278
<P>As an example, we shall discretize all attributes of the Iris dataset into 6 intervals. We shall construct an <CODE>ExampleTable</CODE> with discretized attributes and print a description of the attributes.</P>
280
<XMP class=code>disc = orange.EquiDistDiscretization(numberOfIntervals = 6)
newattrs = [disc(attr, data) for attr in data.domain.attributes]
data2 = data.select(newattrs + [data.domain.classVar])

for attr in newattrs:
    print "%s: %s" % (attr.name, attr.values)
</XMP>
288
289<P>Script's answer is</P>
<XMP class=code>D_sepal length: <<4.90, [4.90, 5.50), [5.50, 6.10), [6.10, 6.70), [6.70, 7.30), >7.30>
D_sepal width: <<2.40, [2.40, 2.80), [2.80, 3.20), [3.20, 3.60), [3.60, 4.00), >4.00>
D_petal length: <<1.98, [1.98, 2.96), [2.96, 3.94), [3.94, 4.92), [4.92, 5.90), >5.90>
D_petal width: <<0.50, [0.50, 0.90), [0.90, 1.30), [1.30, 1.70), [1.70, 2.10), >2.10>
</XMP>
295
<P>Is there a better way for a script to find the interval boundaries than
parsing the symbolic values? Sure: they are stored in the discretizer, which is, as usual, available as <CODE>attr.getValueFrom.transformer</CODE>.</P>
298
299<P>Compare the following with the values above.</P>
<XMP class=code>>>> for attr in newattrs:
...    print "%s: first interval at %5.3f, step %5.3f" % \
...    (attr.name, attr.getValueFrom.transformer.firstCut, \
...    attr.getValueFrom.transformer.step)
D_sepal length: first interval at 4.900, step 0.600
D_sepal width: first interval at 2.400, step 0.400
D_petal length: first interval at 1.980, step 0.980
D_petal width: first interval at 0.500, step 0.400
</XMP>
309
<P>Like all discretizers, <CODE>EquiDistDiscretizer</CODE> also has the method <CODE>constructVariable</CODE>. The following example discretizes all attributes into intervals of width 1, using the settings passed to the constructor.</P>
311
<XMP class=code>edisc = orange.EquiDistDiscretizer(firstVal = 2.0, step = 1.0, numberOfIntervals = 5)
newattrs = [edisc.constructVariable(attr) for attr in data.domain.attributes]
data2 = data.select(newattrs + [data.domain.classVar])
for ex in data2[:10]:
    print ex
</XMP>
318
319
320<H3>Discretization with Intervals Containing (Approximately) Equal Number of Examples</H3>
321<index name="discretization/quantiles-based">
322<index name="classes/EquiNDiscretization">
323
<P><CODE>EquiNDiscretization</CODE> discretizes the attribute by cutting it into the prescribed number of intervals so that each of them contains approximately the same number of examples. The examples are needed for this discretization, too.</P>
325
326<P class=section>Attributes</P>
327<DL class=attributes>
328<DT>numberOfIntervals</DT>
329<DD>Number of intervals into which the attribute is to be discretized. Default value is 4.</DD>
330</DL>
331
<P>This discretization is used in the same way as the one above, except that we use <code>EquiNDiscretization</code> instead of <code>EquiDistDiscretization</code>. The resulting discretizer is an <code>IntervalDiscretizer</code>; hence it has <code>points</code> instead of <code>firstCut</code>/<code>step</code>/<code>numberOfIntervals</code>.</P>
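<P>The idea behind quantile-based cut-off points can be sketched in plain Python. This is not Orange's implementation: <CODE>equal_frequency_points</CODE> is a hypothetical helper, and both placing each cut midway between two neighbouring values and skipping cuts that would fall between tied values are assumptions made for the sketch.</P>

```python
def equal_frequency_points(values, number_of_intervals=4):
    """Cut-off points that put roughly the same number of values in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i in range(1, number_of_intervals):
        position = i * n // number_of_intervals  # index of first value in interval i
        if position == 0:
            continue  # fewer values than intervals
        left, right = ordered[position - 1], ordered[position]
        if left == right:
            continue  # tied values cannot be separated by a cut-off point
        cut = (left + right) / 2.0
        if not points or cut > points[-1]:
            points.append(cut)
    return points


print(equal_frequency_points([8, 1, 7, 2, 6, 3, 5, 4], number_of_intervals=4))
```

<P>When many values are tied, fewer cut-off points than requested come out, which is one reason the intervals contain only approximately equal numbers of examples.</P>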
333
334<H3>Entropy-based Discretization</H3>
335<index name="discretization/entropy-based (Fayyad-Irani)">
336<index name="Fayyad-Irani discretization">
337<index name="classes/EntropyDiscretization">
338
339<P>Fayyad-Irani's discretization method works without a predefined number of intervals. Instead, it recursively splits intervals at the cut-off point that minimizes the entropy, until the entropy decrease is smaller than the increase of MDL induced by the new point.</P>
340
341<P>An interesting thing about this discretization technique is that an attribute can be discretized into a single interval, if no suitable cut-off points are found. If this is the case, the attribute is rendered useless and can be removed. This discretization can therefore also serve for feature subset selection.</P>
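<P>The recursive procedure can be sketched in plain Python. The code below does not use Orange and is not its implementation; it follows the published Fayyad-Irani stopping criterion, in which a split of N examples is accepted only if the information gain exceeds (log2(N-1) + delta)/N. All function names are hypothetical, and placing the cut midway between neighbouring values is an assumption.</P>

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_split(pairs):
    """Return (index, weighted entropy) of the best boundary in sorted pairs, or None."""
    n = len(pairs)
    labels = [label for _, label in pairs]
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal values
        e = (i * entropy(labels[:i]) + (n - i) * entropy(labels[i:])) / n
        if best is None or e < best[1]:
            best = (i, e)
    return best


def mdl_accepts(labels, i, split_entropy):
    """Fayyad-Irani test: is the entropy decrease worth the cost of a new point?"""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    gain = entropy(labels) - split_entropy
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (log2(n - 1) + delta) / n


def mdl_points(pairs):
    """Cut-off points for a list of (value, class) pairs."""
    pairs = sorted(pairs)
    found = best_split(pairs)
    if found is None:
        return []
    i, split_entropy = found
    if not mdl_accepts([label for _, label in pairs], i, split_entropy):
        return []
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return mdl_points(pairs[:i]) + [cut] + mdl_points(pairs[i:])


separable = [(float(v), "a" if v <= 10 else "b") for v in range(1, 21)]
noisy = [(float(v), "ab"[v % 2]) for v in range(1, 9)]
print(mdl_points(separable), mdl_points(noisy))
```

<P>The second data set, with alternating classes, yields no cut-off points at all; this is the single-interval case in which the attribute can be discarded.</P>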
342
343<P class=section>Attributes</P>
344<DL class=attributes>
345<DT>forceAttribute</DT>
346<DD>Forces the algorithm to induce at least one cut-off point, even when its information gain is lower than MDL (default: <CODE>false</CODE>).</DD>
347</DL>
348
349<p class="header"><a href="discretization.py">part of discretization.py</a>
350(uses <a href="iris.tab">iris.tab</a>)</p>
<XMP class="code">entro = orange.EntropyDiscretization()
for attr in data.domain.attributes:
    disc = entro(attr, data)
    print "%s: %s" % (attr.name, disc.getValueFrom.transformer.points)
</XMP>
356
<P>The output shows that all attributes are discretized into three intervals:</P>
<XMP class=code>sepal length: <5.5, 6.09999990463>
sepal width: <2.90000009537, 3.29999995232>
petal length: <1.89999997616, 4.69999980927>
petal width: <0.600000023842, 1.70000004768>
</XMP>
363
364
365<H3>Bi-Modal Discretization</H3>
366<index name="discretization/bi-modal">
367<index name="classes/BiModalDiscretization">
368
<P><CODE>BiModalDiscretization</CODE> sets two cut-off points so that the class distribution of examples in between is as different from the overall distribution as possible. The difference is measured by the chi-square statistic. All possible cut-off points are tried, so the discretization runs in O(n<SUP>2</SUP>).</P>
370
371<P>This discretization method is especially suitable for the attributes in which the middle region corresponds to normal and the outer regions to abnormal values of the attribute. Depending on the nature of the attribute, we can treat the lower and higher values separately, thus discretizing the attribute into three intervals, or together, in a binary attribute whose values correspond to normal and abnormal.</P>
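<P>The exhaustive search can be sketched in plain Python. This is not Orange's implementation: the function names are hypothetical, the chi-square statistic is computed here on the table of class counts inside versus outside the middle interval (an assumption about what exactly is compared), and cuts are placed midway between neighbouring values. For clarity the sketch recounts the classes for every pair of cut-off points, which costs more than the O(n<SUP>2</SUP>) achievable by updating the counts incrementally.</P>

```python
from collections import Counter


def chi_square(inside, outside):
    """Chi-square statistic of the 2 x k table of class counts."""
    n_in, n_out = sum(inside.values()), sum(outside.values())
    total = n_in + n_out
    stat = 0.0
    for cls in set(inside) | set(outside):
        column = inside[cls] + outside[cls]
        for observed, row in ((inside[cls], n_in), (outside[cls], n_out)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def bimodal_cuts(pairs):
    """Return (statistic, low, high) for the best middle interval, or None."""
    pairs = sorted(pairs)
    n = len(pairs)
    # A cut is possible only where two neighbouring values differ.
    bounds = [i for i in range(1, n) if pairs[i - 1][0] != pairs[i][0]]
    best = None
    for a, lo in enumerate(bounds):
        for hi in bounds[a + 1:]:
            inside = Counter(label for _, label in pairs[lo:hi])
            outside = Counter(label for _, label in pairs[:lo] + pairs[hi:])
            stat = chi_square(inside, outside)
            if best is None or stat > best[0]:
                low = (pairs[lo - 1][0] + pairs[lo][0]) / 2.0
                high = (pairs[hi - 1][0] + pairs[hi][0]) / 2.0
                best = (stat, low, high)
    return best


# Made-up data: the "yes" class occupies the middle of the value range.
toy = [(float(v), "yes" if 4 <= v <= 6 else "no") for v in range(1, 10)]
print(bimodal_cuts(toy))
```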
372
373<P class=section>Attributes</P>
374<DL class=attributes>
375<DT>splitInTwo</DT>
<DD>Decides whether the resulting attribute should have three or two values. If <CODE>true</CODE> (default), we have three intervals and the discretizer is of type <CODE>BiModalDiscretizer</CODE>. If <CODE>false</CODE>, the result is the ordinary <CODE>IntervalDiscretizer</CODE>.</DD>
377</DL>
378
<P>The Iris dataset has a three-valued class attribute; the classes are setosa, virginica and versicolor. As the picture below shows, sepal lengths of versicolors lie between the lengths of setosas and virginicas (the picture itself is drawn using LOESS probability estimation; see the documentation on the <a href="BayesLearner.htm">naive Bayesian learner</a>).</P>
380
381<CENTER><IMG src="bayes-iris.gif"></center>
382
383<P>If we merge classes setosa and virginica into one, we can observe whether the bi-modal discretization would correctly recognize the interval in which versicolors dominate.</P>
384
<XMP class=code>newclass = orange.EnumVariable("is versicolor", values = ["no", "yes"])
newclass.getValueFrom = lambda ex, w: ex["iris"]=="Iris-versicolor"
newdomain = orange.Domain(data.domain.attributes, newclass)
data_v = orange.ExampleTable(newdomain, data)
</XMP>
390
391<P>In this script, we have constructed a new class attribute which tells whether an iris is versicolor or not. We have told how this attribute's value is computed from the original class value with a simple lambda function. Finally, we have constructed a new domain and converted the examples. Now for discretization.</P>
392
<XMP class=code>bimod = orange.BiModalDiscretization()
for attr in data_v.domain.attributes:
    disc = bimod(attr, data_v)
    print "%s: (%5.3f, %5.3f]" % (attr.name, \
          disc.getValueFrom.transformer.low, \
          disc.getValueFrom.transformer.high)
</XMP>
399
400<P>Script prints out the middle intervals:</P>
<XMP class=code>
Bi-Modal discretization
sepal length: (5.400, 6.200]
sepal width: (2.000, 2.900]
petal length: (1.900, 4.700]
petal width: (0.600, 1.600]
</XMP>
408
409<P>Judging by the graph, the cut-off points for "sepal length" make sense.</P>
410
411
412<!--  I NEED TO THINK WHETHER THIS IS OBSOLETE OR NOT.
413      ONE ARGUMENT AGAINST IS THAT WE DON'T WANT TO HAVE
414      TOO MANY WAYS TO DO THE SAME THING. IT'S NOT IN
415      THE SPIRIT OF PYTHON.
416      (LET THE CLASS BE, BUT NOT EXPORTED?)
417
418<H2>Discretization of Entire Dataset</H2>
419
420<P>There are numerous ways for discretizing entire dataset. We have done it routinely in the above examples. There is a <a href="preprocessing.htm">preprocessor</a> for that. There's module <a href="../modules/orngDisc.htm">orngDisc</a>. The class that is most closely associated with discretization classes is <CODE>DomainDiscretization</CODE>.</P>
421
422<P><CODE>DomainDiscretization</CODE> is an object whom we set a discretization algorithm as a property. <CODE>DomainDiscretization</CODE> can be called with examples (and weights, optionally) as an argument, and returns a discretized domain. What we gain in comparison with we did above and what some other methods do, is that <CODE>DomainDiscretization</CODE> can speed-up the process by optimization tricks that depend on the type of discretization.</P>
423
424<P class=section>Attributes</P>
425<DL class=attributes>
426<DT>discretization</DT>
427<DD>A discretization method; use one of the above classes, such as <CODE>EntropyDiscretization</CODE> or <CODE>EquiDistDiscretization</CODE>.</DD>
428</DL>
429
430-->
431
</BODY>
</HTML>