source: orange/orange/doc/ofb/assoc.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 7.2 KB checked in by Mitar <Mitar@…>, 4 years ago (diff)

<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>

<p class="Path">
Prev: <a href="regression.htm">Regression</a>,
Next: <a href="other.htm">Other Techniques for Orange Scripting</a>,
Up: <a href="default.htm">On Tutorial 'Orange for Beginners'</a>
</p>

<index name="association rules"><H1>Association Rules</H1>

<index name="association rules">
<p>Association rules are fun to do in Orange. One reason for this is
Python, together with an implementation that lets a list of
association rules behave just like any other Python list. That is, you
can select parts of the list, remove rules, and even add them
(yes, append() works on Orange association rules!).</p>

<p>But let's start with the basics. For association rules, Orange
implements the APRIORI algorithm (see Agrawal et al., Fast discovery of
association rules, a chapter in Advances in Knowledge Discovery and
Data Mining, 1996); more precisely, it includes an optimized version of
the algorithm that works on tabular data. For a number of reasons (but
mostly for convenience), association rules should be constructed and
managed through the interface provided by the <a
href="../modules/orngAssoc.htm">orngAssoc module</a>. As implemented
in Orange, the procedure for constructing association rules does not
handle continuous attributes, so make sure that your data is
categorized. Also, class variables are treated just like attributes.
For the examples in this tutorial, we will use data from the data set <a
href="imports-85.tab">imports-85.tab</a>, which surveys different
types of cars and lists their characteristics. We will use only the
first ten attributes from this data set and categorize each continuous
variable into three equally populated intervals. This is done by the
following part of the code:</p>
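
<XMP class=code>import orange, orngAssoc
# categorize continuous attributes into three equally populated intervals
data = orange.ExampleTable("imports-85")
data = orange.Preprocessor_discretize(data,
  method=orange.EquiNDiscretization(numberOfIntervals=3))
data = data.select(range(10))  # keep only the first ten attributes
</XMP>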
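<p>The idea behind equally populated intervals can be sketched in plain Python. This is only an illustration under our own assumptions: <code>equal_frequency_cuts</code> and <code>discretize</code> are hypothetical helpers, not part of Orange, and Orange's <code>EquiNDiscretization</code> may place cut points or treat ties differently. The sketch sorts the observed values, puts a cut point halfway between the neighbours that split them into <code>n</code> groups of roughly equal size, and then maps each value to the index of its interval.</p>

<xmp class=code>from bisect import bisect_left

def equal_frequency_cuts(values, n=3):
    """Return n - 1 cut points that split values into n groups of
    (roughly) equal size; needs at least n values."""
    ordered = sorted(values)
    if len(ordered) < n:
        raise ValueError("need at least %d values" % n)
    cuts = []
    for i in range(1, n):
        k = i * len(ordered) // n
        # place the cut halfway between the two neighbouring values
        cuts.append((ordered[k - 1] + ordered[k]) / 2.0)
    return cuts

def discretize(value, cuts):
    """Index of the interval that contains value (0 .. len(cuts));
    a value equal to a cut point goes to the lower interval."""
    return bisect_left(cuts, value)

# made-up values of a continuous attribute
values = [48, 68, 70, 88, 97, 110, 116, 145, 262]
cuts = equal_frequency_cuts(values)
print(cuts)                                   # [79.0, 113.0]
print([discretize(v, cuts) for v in values])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
</xmp>

<p>Each of the three intervals receives three of the nine values. With many tied values the intervals can come out unequal, because equal values always fall into the same interval.</p>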

<p>Now to our examples. The first one uses the data set constructed by
the above script and shows how to build a list of association rules with
support of at least 0.4. Next, we sort the rules by support (breaking
ties by confidence), print out the first five, delete the first three
rules and repeat the printout. The script that does this is:</p>

<p class="header">part of <a href="assoc1.py">assoc1.py</a>  (uses <a href=
"imports-85.tab">imports-85.tab</a>) </p>
<xmp class=code>minSupport = 0.4
rules = orange.AssociationRulesInducer(data, support=minSupport)

print "%i rules with support higher than or equal to %5.3f found." % (len(rules), minSupport)

# sort by support; rules with equal support are ordered by confidence
orngAssoc.sort(rules, ["support", "confidence"])

orngAssoc.printRules(rules[:5], ["support", "confidence"])
print

# rules are deleted just like elements of a Python list
del rules[:3]
orngAssoc.printRules(rules[:5], ["support", "confidence"])
print
</xmp>

<p>The output of this script is:</p>

<xmp class="code">87 rules with support higher than or equal to 0.400 found.

supp    conf    rule
0.888   0.984   engine-location=front -> fuel-type=gas
0.888   0.901   fuel-type=gas -> engine-location=front
0.805   0.982   engine-location=front -> aspiration=std
0.805   0.817   aspiration=std -> engine-location=front
0.785   0.958   fuel-type=gas -> aspiration=std

supp    conf    rule
0.805   0.982   engine-location=front -> aspiration=std
0.805   0.817   aspiration=std -> engine-location=front
0.785   0.958   fuel-type=gas -> aspiration=std
0.771   0.981   fuel-type=gas aspiration=std -> engine-location=front
0.771   0.958   aspiration=std engine-location=front -> fuel-type=gas
</xmp>
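<p>Where do such rules come from? APRIORI first finds <em>frequent itemsets</em>, sets of attribute values that occur together in at least the required fraction of examples, and then splits each of them into rules whose confidence is the itemset's support divided by the support of the rule's left side. The plain-Python sketch below illustrates this level-wise search on made-up transactions. It is not Orange's optimized implementation, and the names <code>frequent_itemsets</code> and <code>association_rules</code> are our own. With tabular data, each example would play the role of a transaction whose items are strings such as <code>"fuel-type=gas"</code>.</p>

<xmp class=code>from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Map every itemset (a frozenset) with support >= min_support to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = float(len(transactions))

    def support(itemset):
        # fraction of transactions that contain all items of itemset
        return sum(1 for t in transactions if itemset <= t) / n

    # level 1: frequent single items
    level = {}
    for item in set().union(*transactions):
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level:
        # join pairs of frequent (k-1)-itemsets into k-itemsets and drop
        # any candidate that has an infrequent (k-1)-subset
        candidates = set()
        for a, b in combinations(list(level), 2):
            c = a | b
            if len(c) == k and all(frozenset(sub) in level
                                   for sub in combinations(c, k - 1)):
                candidates.add(c)
        # keep the candidates that reach the minimal support
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent

def association_rules(frequent, min_confidence=0.0):
    """Yield (left, right, support, confidence) for each split of a frequent itemset."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(itemset, r):
                left = frozenset(left)
                conf = supp / frequent[left]
                if conf >= min_confidence:
                    yield left, itemset - left, supp, conf

baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for left, right, supp, conf in association_rules(frequent_itemsets(baskets, 0.5), 0.7):
    print("%s -> %s  supp=%.3f conf=%.3f"
          % (",".join(sorted(left)), ",".join(sorted(right)), supp, conf))
</xmp>

<p>The pruning step relies on the property that makes APRIORI efficient: every subset of a frequent itemset is itself frequent, so a candidate with an infrequent subset can be discarded without counting it. The same property guarantees that <code>frequent[left]</code> is always available when computing confidence.</p>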

<p>Notice that when printing out the rules with
<code>orngAssoc.printRules</code>, the user can specify which rule
evaluation measures are printed. Choose any of
<code>['support', 'confidence', 'lift', 'leverage', 'strength',
'coverage']</code>.</p>
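<p>All of these measures can be computed from four counts: the number of examples matching both sides of a rule, matching its left side, matching its right side, and the total number of examples. The sketch below uses the usual definitions of support, confidence, coverage, lift and leverage. For strength it assumes the support of the right side divided by the support of the left side, which we believe is Orange's definition but which is an assumption here. The function <code>rule_measures</code> is hypothetical and not part of Orange.</p>

<xmp class=code>def rule_measures(n_both, n_left, n_right, n_total):
    """Evaluation measures for a rule left -> right, from example counts."""
    p_both = n_both / float(n_total)    # examples matching both sides
    p_left = n_left / float(n_total)    # examples matching the left side
    p_right = n_right / float(n_total)  # examples matching the right side
    return {
        "support": p_both,
        "confidence": p_both / p_left,
        "coverage": p_left,
        "strength": p_right / p_left,   # assumed definition, see above
        "lift": p_both / (p_left * p_right),
        "leverage": p_both - p_left * p_right,
    }

# made-up counts: 100 examples, 50 match the left side,
# 80 match the right side and 40 match both
for name, value in sorted(rule_measures(40, 50, 80, 100).items()):
    print("%-10s %.3f" % (name, value))
</xmp>

<p>Here lift is 1 and leverage is 0: the two sides occur together exactly as often as they would if they were independent, even though the confidence is a respectable 0.8. This is why the second example below filters on lift as well as confidence.</p>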

<p>The second example uses the same data set, but first prints out the
five most confident rules. Then it shows a rather more advanced type of
filtering: every rule has attributes that record its support,
confidence, etc., and these can be used when constructing your own
filter functions. The one in our example uses <code>confidence</code>
and <code>lift</code>. [If you have just started with Python: lambda is
a compact way to specify a simple function without a def
statement. As a function, it has its own namespace, but it can still
read variables defined in the surrounding code, which is how the
minimal confidence and lift, <code>conf</code> and <code>lift</code>,
reach it in our example.] Here goes the code:</p>

<p class="header">part of <a href="assoc2.py">assoc2.py</a>  (uses <a href=
"imports-85.tab">imports-85.tab</a>) </p>
<xmp class=code>rules = orange.AssociationRulesInducer(data, support=0.4)

n = 5
print "%i most confident rules:" % n
orngAssoc.sort(rules, ["confidence"])
orngAssoc.printRules(rules[0:n], ['confidence','support','lift'])

conf = 0.8
lift = 1.1
print "\nRules with confidence>%5.3f and lift>%5.3f" % (conf, lift)
# keep only the rules that pass both thresholds
rulesC = rules.filter(lambda x: x.confidence>conf and x.lift>lift)
orngAssoc.sort(rulesC, ['confidence'])
orngAssoc.printRules(rulesC, ['confidence','support','lift'])
</xmp>

<p>Just one rule with the requested confidence and lift is found in our rule set:</p>

<xmp class="code">5 most confident rules:
conf    supp    lift    rule
1.000   0.478   1.015   fuel-type=gas aspiration=std drive-wheels=fwd -> engine-location=front
1.000   0.429   1.015   fuel-type=gas aspiration=std num-of-doors=four -> engine-location=front
1.000   0.507   1.015   aspiration=std drive-wheels=fwd -> engine-location=front
1.000   0.449   1.015   aspiration=std num-of-doors=four -> engine-location=front
1.000   0.541   1.015   fuel-type=gas drive-wheels=fwd -> engine-location=front

Rules with confidence>0.800 and lift>1.100
conf    supp    lift    rule
0.898   0.429   1.116   fuel-type=gas num-of-doors=four -> aspiration=std engine-location=front
</xmp>
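<p>The same filter-and-sort pattern can be tried out without Orange. In the sketch below, <code>Rule</code> is a hypothetical stand-in (a named tuple holding only the fields we need), Python's built-in <code>filter</code> plays the role of <code>rules.filter</code>, and the hypothetical <code>sort_rules</code> mimics what the outputs above suggest <code>orngAssoc.sort</code> does: order rules from best to worst by the first measure and break ties with the following ones. The rule values are made up.</p>

<xmp class=code>from collections import namedtuple

# hypothetical stand-in for an Orange rule, holding only what we need
Rule = namedtuple("Rule", "text support confidence lift")

def sort_rules(rules, measures):
    """Sort in place, best first: by measures[0], ties broken by measures[1], ..."""
    rules.sort(key=lambda r: tuple(getattr(r, m) for m in measures), reverse=True)

rules = [
    Rule("a -> b", 0.50, 0.90, 1.05),
    Rule("b -> a", 0.50, 0.70, 1.05),
    Rule("c -> d", 0.45, 0.95, 1.20),
    Rule("d -> c", 0.45, 0.85, 1.12),
]

conf = 0.8
lift = 1.1
# the lambda reads conf and lift from the surrounding code
selected = list(filter(lambda x: x.confidence > conf and x.lift > lift, rules))
sort_rules(selected, ["confidence"])
print([r.text for r in selected])   # ['c -> d', 'd -> c']

sort_rules(rules, ["support", "confidence"])
print([r.text for r in rules])      # ['a -> b', 'b -> a', 'c -> d', 'd -> c']
</xmp>

<p>Sorting on a tuple of measures with <code>reverse=True</code> gives exactly the "support first, then confidence" ordering seen in the first example's output.</p>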

<p>Finally, for our third example, we introduce cloning. Cloning helps if you need to work with different rule subsets that stem from a common rule set created from some data (admittedly, cloning is of little use in our example, but it can be very useful otherwise). So, we use cloning to make a copy of the set of rules, sort the copy first by support and then by confidence, and print out the few best rules. We have also lowered the required minimal support, just to see how many rules we obtain this way.</p>

<p class="header">part of <a href="assoc3.py">assoc3.py</a>  (uses <a href=
"imports-85.tab">imports-85.tab</a>) </p>
<xmp class=code>minSupport = 0.2
rules = orange.AssociationRulesInducer(data, support=minSupport)
print "%i rules with support higher than or equal to %5.3f found.\n" % (len(rules), minSupport)

# sort a copy, so that the original rule set keeps its order
rules2 = rules.clone()
orngAssoc.sort(rules2, ["support", "confidence"])

n = 5
print "Best %i rules:" % n
orngAssoc.printRules(rules2[:n], ["support", "confidence"])
</xmp>

<p>The output of this script is:</p>

<xmp class="code">828 rules with support higher than or equal to 0.200 found.

Best 5 rules:
supp    conf    rule
0.888   0.984   engine-location=front -> fuel-type=gas
0.888   0.901   fuel-type=gas -> engine-location=front
0.805   0.982   engine-location=front -> aspiration=std
0.805   0.817   aspiration=std -> engine-location=front
0.785   0.958   fuel-type=gas -> aspiration=std
</xmp>
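<p>The point of working on a copy is that sorting or trimming it leaves the original rule set untouched. With ordinary Python lists the same effect can be sketched as follows; this is a plain-Python analogy with made-up rules, not a description of how Orange's <code>clone()</code> is implemented. Here <code>list(...)</code> and <code>sorted(...)</code> build new lists that still refer to the same rule objects (shallow copies), which is enough because we only reorder and slice the lists and never modify the rules themselves.</p>

<xmp class=code># each rule: (text, support, confidence); made-up values
rules = [
    ("a -> b", 0.60, 0.75),
    ("c -> d", 0.40, 0.95),
    ("b -> a", 0.60, 0.80),
]

# a new list holding the same rule objects, sorted by confidence
by_confidence = list(rules)
by_confidence.sort(key=lambda r: r[2], reverse=True)

# sorted() also returns a new list: best two by support, then confidence
best_by_support = sorted(rules, key=lambda r: (r[1], r[2]), reverse=True)[:2]

print([r[0] for r in rules])            # original order is unchanged
print([r[0] for r in by_confidence])    # ['c -> d', 'b -> a', 'a -> b']
print([r[0] for r in best_by_support])  # ['b -> a', 'a -> b']
</xmp>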
</body></html>