# source:orange/Orange/doc/modules/orngMisc.htm@9671:a7b056375472

Revision 9671:a7b056375472, 11.2 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

Line
4<body>
5
6<h1>orngMisc: Miscellaneous Utility Functions</h1>
7
8<P>This module contains miscellaneous useful functions for counting, criteria-based selections and similar.</P>
9
10<h2>Counters</h2>
11<index name="modules+counters">
12<index name="subsets generators">
13
14<P><code>orngMisc</code> contains a bunch of classes that generate sequences of various kinds.</P>
15
16<H3>BooleanCounter</H3>
17<index name="classes/BooleanCounter (in orngMisc)">
18
19<P>A class which represents a boolean counter. The constructor is given the number of bits and during the iteration the counter returns a list of that length with 0 and 1's in it.</P>
20
21<P>One way to use the counter is within a for-loop:</P>
22
23<xmp class="code">>>> for t in orngMisc.BooleanCounter(3):
24...     print t
25...
26[0, 0, 0]
27[0, 0, 1]
28[0, 1, 0]
29[0, 1, 1]
30[1, 0, 0]
31[1, 0, 1]
32[1, 1, 0]
33[1, 1, 1]
34</xmp>
35
36<P>You can also call it manually.</P>
37
38<xmp class="code">>>> o = orngMisc.BooleanCounter(3)
39>>> o.next()
40[0, 0, 0]
41>>> o.next()
42[0, 0, 1]
43>>> o.next()
44[0, 1, 0]
45</xmp>
46
47<P>The current state (the last result of a call to <code>next</code>) is also stored in the attribute <code>state</code>.</P>
48
49
50<h3>LimitedCounter</h3>
51<index name="classes/LimitedCounter (in orngMisc)">
52
53<P>This class is similar to <code>BooleanCounter</code> except that the digits do not count from 0 to 1, but to the limits that are specified individually for each digit.</P>
54
55<xmp class="code">>>> for t in orngMisc.LimitedCounter([3, 5, 2]):
56...     print t
57...
58[0, 0, 0]
59[0, 0, 1]
60[0, 1, 0]
61[0, 1, 1]
62[0, 2, 0]
63[0, 2, 1]
64[0, 3, 0]
65[0, 3, 1]
66[0, 4, 0]
67[0, 4, 1]
68[1, 0, 0]
69[1, 0, 1]
70[1, 1, 0]
71[1, 1, 1]
72   (etc.)
73</xmp>
74
75<P>The class can be used in a for-loop or by manually calling <code>next</code>, just as <code>BooleanCounter</code> above.</P>
76
77<h3>NondecreasingCounter</h3>
78<index name="classes/NondecreasingCounter (in orngMisc)">
79
80<P>Nondecreasing counter generates all non-decreasing integer sequences in which no numbers are skipped, that is, if <em>n</em> is in the sequence, the sequence also includes all numbers between 0 and <em>n</em>. For instance, [0, 0, 1, 0] is illegal since it decreases, and [0, 0, 2, 2] is illegal since it has 2 without having 1 first. Or, with an example</P>
81
82<xmp class="code">>>> for t in orngMisc.NondecreasingCounter(4):
83...     print t
84...
85[0, 0, 0, 0]
86[0, 0, 0, 1]
87[0, 0, 1, 1]
88[0, 0, 1, 2]
89[0, 1, 1, 1]
90[0, 1, 1, 2]
91[0, 1, 2, 2]
92[0, 1, 2, 3]
93</xmp>
94
95
96<h3>CannonicFuncCounter</h3>
97<index name="classes/CannonicFuncCounter (in orngMisc)">
98
99<P>Returns all sequences of a given length where no numbers are skipped (see above) and none of the generated sequence is equal to another if only the labels are changed. For instance, [0, 2, 2, 1] and [1, 0, 0, 2] are considered equivalent: if we take the former and replace 0 by 1, 2 by 0 and 1 by 2 we get the second list.</P>
100
101<P>The sequences generated are equivalent to all possible functions from a set of cardinality of the sequences length.</P>
102
103<xmp class="code">>>> for t in orngMisc.CanonicFuncCounter(4):
104...     print t
105...
106[0, 0, 0, 0]
107[0, 0, 0, 1]
108[0, 0, 1, 0]
109[0, 0, 1, 1]
110[0, 0, 1, 2]
111[0, 1, 0, 0]
112[0, 1, 0, 1]
113[0, 1, 0, 2]
114[0, 1, 1, 0]
115[0, 1, 1, 1]
116[0, 1, 1, 2]
117[0, 1, 2, 0]
118[0, 1, 2, 1]
119[0, 1, 2, 2]
120[0, 1, 2, 3]
121</xmp>
122
123<h2>Selection of the "optimal" object</h2>
124
125<P>Many machine learning techniques generate a set different solutions or have to choose, as for instance in classification tree induction, between different attributes. The most trivial solution is to iterate through the candidates, compare them and remember the optimal one. The problem occurs, however, when there are multiple candidates that are equally good, and the naive approaches would select the first or the last one, depending upon the formulation of the if-statement.</P>
126
127<P><code>orngMisc</code> provides a class that makes a random choice in such cases. Each new candidate is compared with the currently optimal one; it replaces the optimal if it is better, while if they are equal, one is chosen by random. The number of competing optimal candidates is stored, so in this random choice the probability to select the new candidate (over the current one) is <em>1/w</em>, where <code>w</code> is the current number of equal candidates, including the present one. One can easily verify that this gives equal chances to all candidates, independent of the order in which they are presented.</P>
128
129<h3>BestOnTheFly</h3>
130<index name="classes/BestOnTheFly (in orngMisc)">
131
132<p>A class which finds the optimal object in a sequence of objects. The class is fed the candidates one by one, and remembers the "winner". It can thus be used by methods that generate different "solutions" to a problem and need to select the optimal one, but do not want to store them all.</p>
133
134<p class=section>Methods</P>
135<DL class=attributes>
136<dt>__init__(compare = cmp, seed = 0, callCompareOn1st = False)</dt>
137<dd>Constructor can be given a compare function and a random seed; both arguments are optional. The default function for comparison supposes that the objects can be compared by the usual comparison operators. If the seed is not given, a seed of 0 is used to ensure that the same experiment always gives the same results, despite pseudo-randomness. If <code>callCompareOn1st</code> is set, <code>BestOnTheFly</code> will suppose that the candidates are lists are tuples, and it will call <code>compare</code> with the first element of the tuple.</dd>
138
139<dt>candidate(x)</dt>
140<dd>Candidates are fed to a function <code>candidate</code>, which accepts a single argument (the candidate).</dd>
141
142<dt>winner()</dt>
143<dd>Returns the (currently) optimal object. This function can be called any number of times, even when the candidates are still coming - it doesn't "collapse" anything.</dd>
144
145<dt>winnerIndex()</dt>
146<dd>Returns the index of the optimal object within the sequence of the candidates.</dd>
147</DL>
148
149<h3>selectBest(l, compare = cmp, seed = 0, callCompareOn1st = False)</h3>
150
151<P>Function <code>selectBest</code> (it is a separate function, not a method of <code>BestOnTheFly</code>) returns the optimal object from list <code>l</code>. The function can be used if the candidates are already in the list, so using the more complicated <code>BestOnTheFly</code> directly is not needed.</P>
152
153<P>To demonstrate the use of <code>BestOnTheFly</code> and present <code>selectBest</code>: here's how <code>selectBest</code> is implemented.</P>
154
155<xmp class="code">def selectBest(x, compare=cmp, seed = 0, callCompareOn1st = False):
156    bs=BestOnTheFly(compare, seed, callCompareOn1st)
157    for i in x:
158        bs.candidate(i)
159    return bs.winner()
160</xmp>
161
162<h3>selectBestIndex(l, compare=cmp, seed=0, callCompareOn1st = False)</h3>
163
164<P>Similar to <code>selectBest</code> except that it doesn't return the best object but its index in the list <code>l</code>.</P>
165
166<h3>compare2_bigger, compare2_smaller</h3>
167
168<P>Those two functions can be used as compare operators for the above class and functions. The first returns the largest and the second the smallest objects (the function are equal to <code>cmp</code> and <code>-cmp</code>, respectively).</P>
169
170<h3>compare2_firstBigger, compare2_firstSmaller, compare2_lastBigger, compare2_lastSmaller</h3>
171
172<P>These function assume that the objects being compared are lists (or tuples). To compare objects <code>x</code> and <code>y</code> they compare <code>x[0]</code> with <code>y[0]</code> (the first two functions) or <code>x[-1]</code> with <code>y[-1]</code> (the other two).</P>
173
174<h3>Example</h3>
175
176<P>The following snippet loads the data set lymphography and prints out the attribute with the highest information gain.</P>
177
179<xmp class="code">import orange, orngMisc
180
181data = orange.ExampleTable("lymphography")
182
183findBest = orngMisc.BestOnTheFly(callCompareOn1st = True)
184
185for attr in data.domain.attributes:
186    findBest.candidate((orange.MeasureAttribute_gainRatio(attr, data), attr))
187
188print findBest.winner()
189</xmp>
190
191<P>Our candidates are tuples gain ratios and attributes, so we set <code>callCompareOn1st</code> to make the compare function compare the first element (gain ratios). We could achieve the same by initializing the object like this:</P>
192<xmp class="code">findBest = orngMisc.BestOnTheFly(orngMisc.compare2_firstBigger)</xmp>
193
194<P>The other way to do it is through indices.</P>
195
197<xmp class="code">findBest = orngMisc.BestOnTheFly()
198
199for attr in data.domain.attributes:
200    findBest.candidate(orange.MeasureAttribute_gainRatio(attr, data))
201
202bestIndex = findBest.winnerIndex()
203print data.domain[bestIndex],", ", findBest.winner()
204</xmp>
205
206<P>Here we only give gain ratios to <code>bestOnTheFly</code>, so we don't have to specify a special compare operator. After checking all attributes we get the index of the optimal one by calling <code>winnerIndex</code>.</P>
207
208
209<h2>Other functions</h2>
210
211<h3>frange(start, stop, step)</h3>
212
213<P>This function is equivalent to the built-in <code>range</code>, except that <code>start</code>, <code>stop</code> and <code>step</code> can be continuous (<code>float</code>) and that it returns the values between <code>start</code> and <code>stop</code>, <em>including <code>stop</code></em> (e.g., <code>frange(0, 1, 0.5)</code> returns <code>[0.0, 0.5, 1.0]</code>.</P>
214
215<P>Default arguments are somewhat peculiar.
216<ul>
217<li><code>frange()</code> is equivalent to <code>frange(0, 1, 0.1)</code></li>
218<li><code>frange(x)</code> is equivalent to <code>frange(x, 1, x)</code>, for instance, <code>frange(0.25)</code> gives <code>[0.25, 0.5, 0.75, 1.0]</code></li>
219<li><code>frange(x, y)</code> is equivalent to <code>frange(0, x, y)</code>, therefore <code>frange(3, .5)</code> returns <code>[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]</code></li>
220</ul>
221</P>
222
223<h3>printVerbose(text[, verb])</h3>
224
225<p>Prints out the text <code>text</code> if <code>verb</code> is <code>True</code>. The second argument can be omitted - this is generally recommnded; in this case, the text is printed if <code>orngMisc.verbose</code> is <code>True</code>.</p>
226
227<p>This function supports a general "verbose mode" for the application. Functions have some verbose info to print out, should do so through <code>orngMisc.printVerbose</code>.</p>
228
229<P>Most modules sadly ignore this function and have their individual "verbose" arguments.</P>
230
231<H3>getObjectName(x, default="")</H3>
232
233<P>Returns a suitable name of object <code>x</code> by first checking whether it has attribute <code>name</code>, <code>shortDescription</code>, <code>description</code>, <code>func_doc</code> or <code>func_name</code>, in that order. If it has none of this, but is a class instance, it returns the class name. If none of these succeeds, it returns the value of argument <code>default</code>.</P>
234
235<P>The function is used primarily for getting the name of a learner or classifier.</P>
236
237<H3>demangleExamples(x)</H3>
238
239<P>A function used by several functions (mostly in <code>orngTest</code>) that can accept either an examples generator or a tuple (example generator, weight) as an argument. <code>demangleExamples</code> accepts any of these two as an argument and returns a tuple (with the weight meta id set to 0 if non was given).
240</P>
241
242</body>
243</html>
Note: See TracBrowser for help on using the repository browser.