source: orange/orange/doc/reference/SubsetsGenerator.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 6.8 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<h1>Generators for Subsets of Attributes</h1>
<index name="subsets of attributes">

<P>Subset generators are classes that generate subsets of a given set of attributes. They were originally designed to generate bound sets for function decomposition, although they can also be used for other purposes.</P>

<hr>

<H2>SubsetsGenerator</H2>

<P><CODE><INDEX name="classes/SubsetsGenerator">SubsetsGenerator</CODE> is an abstract class that defines the behaviour of derived classes.</P>

<H3>Using Subsets Generators</H3>

<P>Let <CODE>sgen</CODE> be a generator that constructs pairs of attributes from the Monk 1 domain (the section below describes how to create such a generator). You can use it in <CODE>for</CODE> statements</P>

<xmp class="code">>>> for attrs in sgen:
...     print attrs
...
(EnumVariable 'a', EnumVariable 'b')
(EnumVariable 'a', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'd')
     ...
</xmp>

<P>or in list comprehensions</P>

<XMP class="code">subsets = [attrs for attrs in sgen]
</XMP>

<P><SMALL>There is another way of using subset generators: reset the generator by calling <CODE>reset</CODE>, then obtain attribute subsets by calling <CODE>next</CODE> until it returns <CODE>None</CODE>. This is provided for compatibility with older versions of Orange and is described here only to make old code easier to understand. Don't use it.</SMALL></P>

<H3>Initializing the Generator</H3>

<P>Before iterating through subsets, the generator needs to be given a set of attributes. You can specify the set at construction time, set it through the <CODE>varList</CODE> attribute, or specify it in the <CODE>for</CODE> statement. So, to construct the above generator, one can write</P>

<XMP class="code">sgen = orange.SubsetsGenerator_constSize(data.domain.attributes)
</XMP>

<P>or</P>

<XMP class="code">sgen = orange.SubsetsGenerator_constSize()
sgen.varList = data.domain.attributes
</XMP>

<P>The third, somewhat ugly, alternative is to provide the attribute set at call time.</P>

<XMP class="code">sgen = orange.SubsetsGenerator_constSize()
for attrs in sgen(data.domain.attributes):
  print attrs
</XMP>

<small>Why is it ugly? Syntactically it is a call, and it is implemented as one, but all it does is store the attribute set and return the generator itself; the code is equivalent to <code>def __call__(self, v): self.varList = v; return self</code>. Why is it here at all? For compatibility with code written before Python had the iterator protocol, and because it can come in handy.</small>
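
<P>The three ways of supplying the attribute set can be imitated in plain Python. The sketch below is not Orange's implementation: <CODE>PairGenerator</CODE> is a hypothetical stand-in built on <CODE>itertools.combinations</CODE>, and plain strings take the place of attribute descriptors. It only illustrates why calling the generator amounts to setting <CODE>varList</CODE> and then iterating.</P>

```python
from itertools import combinations


class PairGenerator(object):
    """Hypothetical stand-in for a subsets generator; not part of Orange."""

    def __init__(self, varList=None, B=2):
        self.varList = varList  # the attribute set to draw subsets from
        self.B = B              # subset size

    def __call__(self, varList):
        # "Calling" the generator only stores the attribute set and returns
        # the generator itself, so the result can be used in a for statement.
        self.varList = varList
        return self

    def __iter__(self):
        # Every iteration starts afresh from the current varList.
        return iter(combinations(self.varList, self.B))


attributes = ["a", "b", "c"]  # plain strings stand in for attribute descriptors

# 1. attribute set given at construction time
at_construction = list(PairGenerator(attributes))

# 2. attribute set assigned through varList
via_attribute = PairGenerator()
via_attribute.varList = attributes

# 3. attribute set given at call time
at_call_time = list(PairGenerator()(attributes))

print(at_construction)
```

<P>All three variants yield the same pairs, which is the sense in which the call form is only a convenience.</P>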

<H2>SubsetsGenerator_constSize</H2>

<P><CODE><INDEX name="classes/SubsetsGenerator_constSize">SubsetsGenerator_constSize</CODE> returns subsets of a predefined size.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>B</DT>
<DD>Subset size. The default is 2.</DD>
</DL>

<P>Here is an example.</P>

<p class="header">part of <a href="subsetsgenerators.py">subsetsgenerators.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">gen1 = orange.SubsetsGenerator_constSize(data.domain.attributes, B=3)
for attrs in gen1:
  print attrs
</XMP>

<P>The output begins with</P>

<XMP class="code">(EnumVariable 'a', EnumVariable 'b', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'd')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'd')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'd', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'd', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'e', EnumVariable 'f')
(EnumVariable 'b', EnumVariable 'c', EnumVariable 'd')
...
</XMP>
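
<P>The listed subsets appear in the same order that <CODE>itertools.combinations</CODE> produces for the attribute names <CODE>a</CODE> to <CODE>f</CODE>. The sketch below reproduces that order with plain strings; it does not use Orange, and the assumption that the rest of the sequence follows the same order is based only on the excerpt shown.</P>

```python
from itertools import combinations

# Names of the six attributes that appear in the listing.
attributes = ["a", "b", "c", "d", "e", "f"]

# All subsets of size 3, in lexicographic order of attribute position.
triples = list(combinations(attributes, 3))

for subset in triples[:4]:
    print(subset)

# The number of subsets of size 3 drawn from 6 attributes.
print(len(triples))
```

<P>With six attributes there are 20 subsets of size 3, so exhaustive enumeration is cheap here but grows quickly with the number of attributes and with <CODE>B</CODE>.</P>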

<P>More often, however, the generator will be constructed in advance and then used to construct subsets of some given attribute set.</P>

<p class="header">part of <a href="subsetsgenerators.py">subsetsgenerators.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">def f(gen, data):
  for attrs in gen(data.domain.attributes):
    print attrs

gen = orange.SubsetsGenerator_constSize(B=3)
f(gen, data)
</XMP>

<H2>SubsetsGenerator_minMaxSize</H2>

<P><CODE><INDEX name="classes/SubsetsGenerator_minMaxSize">SubsetsGenerator_minMaxSize</CODE> returns subsets of sizes within given limits. Subsets are sorted by increasing cardinality.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>min, max</DT>
<DD>Minimal and maximal subset size. Defaults are 2 and 3.</DD>
</DL>


<p class="header">part of <a href="subsetsgenerators.py">subsetsgenerators.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<XMP class="code">gen4 = orange.SubsetsGenerator_minMaxSize(min=1, max=3)
for attrs in gen4(data.domain.attributes):
  print attrs
</XMP>

<P>The output begins with</P>
<XMP class=code>(EnumVariable 'a',)
(EnumVariable 'b',)
(EnumVariable 'c',)
(EnumVariable 'd',)
(EnumVariable 'e',)
(EnumVariable 'f',)
(EnumVariable 'a', EnumVariable 'b')
(EnumVariable 'a', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'd')
</XMP>
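
<P>Sorting by increasing cardinality can be pictured as chaining fixed-size enumerations, one per size. The function <CODE>subsets_by_size</CODE> below is a hypothetical plain-Python sketch of that idea, not Orange code, and again uses strings in place of attribute descriptors.</P>

```python
from itertools import chain, combinations


def subsets_by_size(attributes, min_size, max_size):
    """Yield all subsets with min_size..max_size elements, smallest first."""
    sizes = range(min_size, max_size + 1)
    return chain.from_iterable(combinations(attributes, size) for size in sizes)


attributes = ["a", "b", "c", "d", "e", "f"]
subsets = list(subsets_by_size(attributes, 1, 3))

# Singletons come first, then pairs, then triples.
for subset in subsets[:9]:
    print(subset)
```

<P>For six attributes and sizes 1 to 3 this gives 6 + 15 + 20 = 41 subsets.</P>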

<H2>SubsetsGenerator_constant</H2>

<p><code><INDEX name="classes/SubsetsGenerator_constant">SubsetsGenerator_constant</code> "generates" a single subset, prescribed by the user.</p>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>constant</DT>
<DD>The one and only subset returned by the generator.</DD>
</DL>

<P>The generator below always returns the subset containing the first three attributes.</p>

<p class="header">part of <a href="subsetsgenerators.py">subsetsgenerators.py</a>
(uses <a href="monk1.tab">monk1.tab</a>)</p>
<xmp class="code">gen5 = orange.SubsetsGenerator_constant()
gen5.constant = data.domain[:3]
for attrs in gen5(data.domain.attributes):
  print attrs
</xmp>

<p>Why the hell would you need such a generator? There are objects that require a subsets generator as a component. In function decomposition, for instance, subsets generators are used to construct a list of candidate subsets. Supplying a constant generator is a way to force such an object to consider a single prescribed subset.</p>
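
<P>The role of a constant generator as a plug-in component can be sketched as follows. Both <CODE>ConstantGenerator</CODE> and <CODE>candidate_subsets</CODE> are hypothetical names invented for this illustration; in particular, ignoring the attribute set passed at call time is an assumption of the sketch, not a documented property of the Orange class.</P>

```python
class ConstantGenerator(object):
    """Hypothetical generator that proposes exactly one fixed subset."""

    def __init__(self, constant=None):
        self.constant = constant

    def __call__(self, varList):
        # Assumption: the attribute set is accepted only so that this object
        # can be used wherever another generator is expected; it is ignored.
        return self

    def __iter__(self):
        yield tuple(self.constant)


def candidate_subsets(generator, attributes):
    """Hypothetical consumer that asks its generator component for candidates."""
    return list(generator(attributes))


attributes = ["a", "b", "c", "d", "e", "f"]
candidates = candidate_subsets(ConstantGenerator(attributes[:3]), attributes)
print(candidates)
```

<P>The consumer never needs to know that only one candidate exists; it iterates as it would over any other generator.</P>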

<H2>SubsetsGenerator_withRestrictions</H2>
<index name="classes/SubsetsGenerator_withRestrictions">

<p>This is the most complex subsets generator. It uses another generator (one of those described above), stored in the field <code>subGenerator</code>, to generate subsets, and filters out all subsets that do not comply with the restrictions.</p>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>subGenerator</DT>
<DD>A generator that "proposes" subsets.</DD>

<DT>required</DT>
<DD>A list of attributes, all of which must be included in a subset.</DD>

<DT>forbidden</DT>
<DD>A list of attributes that must not appear in a subset.</DD>

<DT>forbiddenSubSubsets</DT>
<DD>Combinations of attributes that must not appear together in a subset (that is, a subset is invalid if it contains one of the subsubsets in this list).</DD>
</DL>
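
<P>The three kinds of restriction can be expressed as set tests applied to each proposed subset. The function <CODE>restricted_subsets</CODE> below is a hypothetical plain-Python sketch of the filtering rule as described above; it is not the Orange implementation, and its parameter names merely mirror the attributes <CODE>required</CODE>, <CODE>forbidden</CODE> and <CODE>forbiddenSubSubsets</CODE>.</P>

```python
from itertools import combinations


def restricted_subsets(proposed, required=(), forbidden=(),
                       forbidden_sub_subsets=()):
    """Yield only those proposed subsets that satisfy all restrictions."""
    required = set(required)
    forbidden = set(forbidden)
    banned = [set(combo) for combo in forbidden_sub_subsets]
    for subset in proposed:
        members = set(subset)
        if not required <= members:
            continue  # some required attribute is missing
        if members & forbidden:
            continue  # a forbidden attribute is present
        if any(combo <= members for combo in banned):
            continue  # a forbidden combination is fully contained
        yield subset


attributes = ["a", "b", "c", "d", "e", "f"]
proposed = combinations(attributes, 3)  # plays the role of subGenerator

kept = list(restricted_subsets(proposed,
                               required=["a"],
                               forbidden=["f"],
                               forbidden_sub_subsets=[("b", "c")]))
for subset in kept:
    print(subset)
```

<P>Of the 20 proposed triples, only those that contain <CODE>a</CODE>, avoid <CODE>f</CODE>, and do not contain both <CODE>b</CODE> and <CODE>c</CODE> survive.</P>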