source: orange/Orange/doc/reference/random.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 5.9 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved orange to Orange (part 2)

<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
<h1>Randomness in Orange</h1>
<index name="randomness in Orange">
<P>There are two general situations in which Orange uses random numbers: those the user knows about and those the user does not. The first kind occurs in the usual places: picking random examples, adding noise, and similar. Classes that make such decisions can either use a global random number generator or their own generator, if the user assigns them one. We'll talk about that later, since it's rather routine. We'll start with the more complex case: random decisions that the user is not aware of.</P>
<h2>Random decisions behind your back</h2>
<p>When Orange builds a decision tree, it evaluates the candidate attributes and chooses the best one. But what if two or more are tied for first place? Most machine learning systems ignore the problem and always take the first candidate, which is unfair and has a strange side effect: the induced model and, consequently, its accuracy depend on the order of the attributes, which they should not.</p>
<p>This is not an isolated problem. Another instance is a classifier that has to choose between two equally probable classes when there is no additional information (such as classification costs) to help make the prediction. Others are selecting random reference examples when computing ReliefF, or returning the mode of a distribution in which two or more values compete.</p>
<p>The old solution was to make a random selection in such cases: take a random class (out of the most probable ones, of course), a random attribute, random examples... Although theoretically correct, this comes at a price: the only way to ensure that experiments are repeatable is to set the seed of the global random generator, which is not good practice in component-based systems.</p>
<p>What Orange does now is more cunning. When, for instance, choosing between <em>n</em> equally probable classes, Orange computes something like a hash value from the example to be classified. The remainder of dividing this value by <em>n</em> is then used to select the class. Thus, the class will appear random, but will always be the same for the same example.</p>
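<p>The idea can be sketched in plain Python. This is only an illustration, not Orange's actual code: the function <code>pick_class</code> and the choice of CRC32 as the "something like a hash value" are assumptions made for the example.</p>

```python
import zlib


def pick_class(example, tied_classes):
    """Pick one of several equally probable classes, always the same one
    for the same example (hypothetical helper, not part of Orange)."""
    # crc32 of a text form of the example is stable across runs,
    # unlike Python's built-in hash() for strings.
    digest = zlib.crc32(repr(tuple(example)).encode("utf-8"))
    # The remainder of the division by n selects the class.
    return tied_classes[digest % len(tied_classes)]


example = ("sunny", 21.5, "weak")
tied = ["yes", "no"]
first = pick_class(example, tied)
second = pick_class(example, tied)
```

<p>Both calls return the same class, so repeating an experiment gives the same predictions without touching any global generator.</p>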
<p>A similar trick is used elsewhere. To choose among tied attributes when building a tree, Orange simply divides the number of learning examples at that node by the number of candidate attributes and again uses the remainder as the index.</p>
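<p>A minimal sketch of that rule, assuming the tied candidates are kept in a list (the name <code>pick_attribute</code> and the attribute names are made up for the illustration):</p>

```python
def pick_attribute(tied_candidates, n_examples):
    """Choose among equally good attributes using the number of
    learning examples at the node (hypothetical helper)."""
    return tied_candidates[n_examples % len(tied_candidates)]


tied_attributes = ["age", "income", "height"]
# 47 examples at the node, 3 candidates: 47 % 3 == 2
chosen = pick_attribute(tied_attributes, 47)
```

<p>The choice depends only on the data at the node, so it is the same on every run, yet it does not systematically favour the first attribute.</p>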
<p>When more random numbers are needed, for instance for selecting <em>m</em> random reference examples when computing ReliefF, the number of examples is used as the seed for a temporary random generator.</p>
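<p>The same approach can be imitated with Python's standard <code>random</code> module. The sketch below is an assumption about how such a selection might look, not a copy of Orange's ReliefF code; <code>pick_reference_examples</code> is a hypothetical name.</p>

```python
import random


def pick_reference_examples(examples, m):
    """Select m reference examples with a temporary generator that is
    seeded by the number of examples (hypothetical helper)."""
    temporary = random.Random(len(examples))
    return temporary.sample(examples, m)


data = list(range(100))
first = pick_reference_examples(data, 5)
second = pick_reference_examples(data, 5)
```

<p>Because the temporary generator is created anew and seeded from the data each time, the two selections are identical and no shared generator state is disturbed.</p>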
<p>To conclude: Orange will sometimes make decisions that look random. It does so because this is fairer than what most machine learning systems do, which is to pick the first (or the last) candidate. But whatever decision is taken, it will be the same if you run the program twice. The message is thus: be aware that this is happening, but don't worry about it.</p>
<h2>Random number generators</h2>
<p>Classes that are intended to do random things, like adding noise or picking random examples, (a) don't hide it and (b) can be controlled. They can be assigned a random generator, an instance of <code><INDEX name="classes/RandomGenerator">RandomGenerator</code>. Several components can share the same generator. This is even encouraged when the components together form an experimental setup: setting the random seed of such a generator then affects all components of the setup, which is desirable.</p>
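<p>The effect of sharing one generator can be illustrated without Orange. In the sketch below, <code>NoiseAdder</code>, <code>Subsampler</code> and <code>run_setup</code> are hypothetical stand-ins for Orange components, and Python's <code>random.Random</code> plays the role of the shared generator.</p>

```python
import random


class NoiseAdder:
    """Adds uniform noise to values (hypothetical component)."""

    def __init__(self, random_generator):
        self.random_generator = random_generator

    def __call__(self, values):
        return [v + self.random_generator.uniform(-0.1, 0.1) for v in values]


class Subsampler:
    """Picks k values at random (hypothetical component)."""

    def __init__(self, random_generator):
        self.random_generator = random_generator

    def __call__(self, values, k):
        return self.random_generator.sample(values, k)


def run_setup(seed):
    # One generator is shared by both components, so a single seed
    # controls the whole experimental setup.
    shared = random.Random(seed)
    noisy = NoiseAdder(shared)([1.0, 2.0, 3.0, 4.0])
    return Subsampler(shared)(noisy, 2)


first = run_setup(42)
second = run_setup(42)
```

<p>Running the setup twice with the same seed gives identical results, although two different components drew numbers from the generator.</p>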
<p>On the other hand, you don't need to assign random generators if you don't want to (not even to make your experiments repeatable, which was necessary in previous versions of Orange). Leave things as they are, and the components will create the random generators they need and initialize them with seed 0.</p>
<p>Classes for creating random indices for making random subsets of examples have a special property: they can be assigned either a random generator or a seed for a random generator. See the documentation on <a href="RandomIndices.htm">random indices</a> for details.</p>
<p>Orange uses a state-of-the-art Mersenne Twister random generator with a period of 2<sup>19937</sup>-1 (which is close to saying that it never repeats).</p>
<P class=section>Attributes</P>
<dl class=attributes>
<dd>A seed that is used for initialization of the generator when <CODE>reset</CODE> is called.</dd>
<dd>The number of times the generator was called after initialization/reset.</dd>
<p>The <code>initseed</code> can be passed as an argument to the constructor, passed as an argument to the method <code>reset</code>, or set directly.</p>
<p>There are two methods (besides the constructor). The generator can be reset via the method <code>reset</code>. An optional argument can be given, which is stored (and immediately used) as the new <code>initseed</code>; if it is omitted, the existing value of <code>initseed</code> is used.</p>
<p>When called, the generator returns a signed random integer. With an optional integer argument <code>n</code>, it returns integers from 0 to <code>n-1</code>.</p>
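<p>The interface described above can be mimicked in pure Python to show how the pieces fit together. This is a sketch under assumptions, not Orange's implementation: the class name <code>SketchGenerator</code>, the counter name <code>calls</code> and the 32-bit range of the signed integer are all invented for the illustration; only <code>initseed</code>, <code>reset</code> and the default seed 0 come from the text.</p>

```python
import random


class SketchGenerator:
    """Pure-Python imitation of the described interface (hypothetical)."""

    def __init__(self, initseed=0):
        self.initseed = initseed
        self.reset()

    def reset(self, initseed=None):
        # An optional argument is stored as the new initseed and used at once.
        if initseed is not None:
            self.initseed = initseed
        self._rng = random.Random(self.initseed)
        self.calls = 0  # number of calls since initialization/reset

    def __call__(self, n=None):
        self.calls += 1
        if n is None:
            # Signed integer; the 32-bit width is an assumption.
            return self._rng.getrandbits(32) - 2**31
        return self._rng.randrange(n)  # integer from 0 to n-1


gen = SketchGenerator(7)
first_run = [gen(10) for _ in range(5)]
gen.reset()  # reuse the stored initseed
second_run = [gen(10) for _ in range(5)]
```

<p>After <code>reset</code>, the generator replays the same sequence, which is what makes a seeded experiment repeatable.</p>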
<p>You will seldom use the generator in your Python scripts, since Python itself provides an excellent random generator (from version 2.3 on, it uses exactly the same algorithm as Orange). Moreover, Python's random number generator uses the current system time for a seed and therefore gives less predictable numbers. You can use a number from Python's generator as a "truly" random seed for Orange's generator.</p>
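<p>A sketch of drawing such a seed follows. To keep it runnable without Orange, <code>random.Random</code> stands in for the Orange generator; in a real script the drawn seed would be passed to the generator's constructor or to <code>reset</code> instead. The upper bound of 2<sup>31</sup> on the seed is an assumption.</p>

```python
import random

# Draw a seed from Python's own generator, which seeds itself from the system.
seed = random.randrange(2**31)
print("seed used:", seed)  # record it, so the run can be repeated later

# Stand-in for an Orange random generator initialized with that seed.
generator = random.Random(seed)
draws = [generator.randrange(100) for _ in range(3)]

# Reusing the recorded seed reproduces the same numbers.
replay = random.Random(seed)
replayed = [replay.randrange(100) for _ in range(3)]
```

<p>Recording the seed keeps the benefit of an unpredictable start while still allowing the experiment to be rerun exactly.</p>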
<P>Usually, you will construct a random generator to be used by a particular object (or by several of them, since objects can share the same generator).</P>
<P>Two examples of using random number generators are in the documentation on <a href="RandomIndices.htm">random indices</a> and <A href="ExampleTable.htm">example tables</A>.</P>