source: orange/docs/tutorial/rst/association-rules.rst @ 11028:009ba5a75e30

Revision 11028:009ba5a75e30, 5.2 KB checked in by Miha Stajdohar <miha.stajdohar@…>, 17 months ago (diff)

Added a common documentation index.

.. index:: association rules

Association rules
=================

Association rules are fun to do in Orange. One reason for this is
Python, and in particular an implementation that allows a list of
association rules to behave just like any list in Python. That is, you
can select parts of the list, remove rules, and even add them
(yes, ``append()`` works on Orange association rules!).

For association rules, Orange straightforwardly implements the APRIORI
algorithm (see Agrawal et al., Fast discovery of association rules, a
chapter in Advances in knowledge discovery and data mining, 1996).
Orange also includes an optimized version of the algorithm that works on
tabular data. For a number of reasons (but mostly for convenience),
association rules should be constructed and managed through the
interface provided by :py:mod:`Orange.associate`. As implemented in Orange,
the association rule construction procedure does not handle continuous
attributes, so make sure that your data is discretized. Also, class
variables are treated just like attributes.

For the examples in this
tutorial, we will use the data set :download:`imports-85.tab <code/imports-85.tab>`, which
surveys different types of cars and lists their characteristics. We
will use only the first ten attributes from this data set and discretize
them so that three equally populated intervals are created for each
continuous variable. This is done by the following part of
the code::

   # orngAssoc is used by the examples that follow.
   import orange, orngAssoc

   # Load the data and discretize each continuous attribute into
   # three equally populated intervals.
   data = orange.ExampleTable("imports-85")
   data = orange.Preprocessor_discretize(data,
     method=orange.EquiNDiscretization(numberOfIntervals=3))
   # Keep only the first ten attributes.
   data = data.select(range(10))

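
To make "three equally populated intervals" concrete, here is a minimal
plain-Python sketch of equal-frequency discretization. It does not use
Orange and is not how ``EquiNDiscretization`` is implemented; the function
names and the toy values are assumptions made up for illustration::

   def equal_frequency_cutoffs(values, n_intervals=3):
       """Return n_intervals - 1 cut-off points that split the sorted
       values into roughly equally populated intervals."""
       ordered = sorted(values)
       n = len(ordered)
       cutoffs = []
       for k in range(1, n_intervals):
           idx = k * n // n_intervals
           # Place the cut-off halfway between two neighbouring values.
           cutoffs.append((ordered[idx - 1] + ordered[idx]) / 2.0)
       return cutoffs


   def discretize(value, cutoffs):
       """Return the index of the interval that value falls into."""
       for i, cut in enumerate(cutoffs):
           if value <= cut:
               return i
       return len(cutoffs)


   # Hypothetical toy values, not taken from imports-85.tab.
   horsepower = [48, 52, 60, 68, 70, 82, 95, 110, 160]
   cutoffs = equal_frequency_cutoffs(horsepower)
   bins = [discretize(v, cutoffs) for v in horsepower]

With nine distinct values, each of the three intervals receives three of
them. Repeated values can make the intervals only roughly equal in size.
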
Now, to our examples. The first one uses the data set constructed with
the above script and shows how to build a list of association rules that
have a support of at least 0.4. Next, we select the first
five rules, print them out, delete the first three rules, and repeat the
printout. The script that does this is (part of :download:`assoc1.py <code/assoc1.py>`, uses
:download:`imports-85.tab <code/imports-85.tab>`)::

   minSupport = 0.4
   rules = orange.AssociationRulesInducer(data, support=minSupport)

   print "%i rules with support higher than or equal to %5.3f found." % (len(rules), minSupport)

   # Sort the rules by support, then by confidence.
   orngAssoc.sort(rules, ["support", "confidence"])

   orngAssoc.printRules(rules[:5], ["support", "confidence"])
   print

   # Rule lists support ordinary list operations such as slice deletion.
   del rules[:3]
   orngAssoc.printRules(rules[:5], ["support", "confidence"])
   print

The output of this script is::

   87 rules with support higher than or equal to 0.400 found.

   supp    conf    rule
   0.888   0.984   engine-location=front -> fuel-type=gas
   0.888   0.901   fuel-type=gas -> engine-location=front
   0.805   0.982   engine-location=front -> aspiration=std
   0.805   0.817   aspiration=std -> engine-location=front
   0.785   0.958   fuel-type=gas -> aspiration=std

   supp    conf    rule
   0.805   0.982   engine-location=front -> aspiration=std
   0.805   0.817   aspiration=std -> engine-location=front
   0.785   0.958   fuel-type=gas -> aspiration=std
   0.771   0.981   fuel-type=gas aspiration=std -> engine-location=front
   0.771   0.958   aspiration=std engine-location=front -> fuel-type=gas

Notice that when printing out the rules, the user can specify which
rule evaluation measures are to be printed. Choose any of
``['support', 'confidence', 'lift', 'leverage', 'strength',
'coverage']``.

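
As a rough guide to what some of these measures mean, the sketch below
computes them for a single rule over a handful of made-up transactions.
It uses the common textbook definitions and plain Python sets rather than
Orange; the function name and the data are assumptions for illustration,
and ``strength`` is left out because its definition in Orange is not given
in this tutorial::

   def rule_measures(transactions, left, right):
       """Compute common measures for the rule left -> right.

       Assumes at least one transaction contains each side of the rule.
       """
       left, right = set(left), set(right)
       n = float(len(transactions))
       n_left = sum(1 for t in transactions if left <= t)
       n_right = sum(1 for t in transactions if right <= t)
       n_both = sum(1 for t in transactions if (left | right) <= t)
       support = n_both / n              # share of rows matching both sides
       coverage = n_left / n             # share of rows matching the left side
       confidence = n_both / n_left      # support / coverage
       lift = confidence / (n_right / n)
       leverage = support - coverage * (n_right / n)
       return {"support": support, "coverage": coverage,
               "confidence": confidence, "lift": lift,
               "leverage": leverage}


   # Hypothetical toy transactions, not taken from imports-85.tab.
   transactions = [
       set(["fuel-type=gas", "aspiration=std"]),
       set(["fuel-type=gas", "aspiration=std"]),
       set(["fuel-type=gas", "aspiration=turbo"]),
       set(["fuel-type=diesel", "aspiration=std"]),
   ]
   m = rule_measures(transactions, ["fuel-type=gas"], ["aspiration=std"])

A lift below 1, as in this toy case, means the right-hand side is less
frequent among rows matching the left-hand side than in the data overall.
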
The second example uses the same data set, but first prints out the five
most confident rules. Then, it shows a rather advanced type of
filtering: every rule has attributes that record its support,
confidence, and so on. These may be used when constructing your own filter
functions. The one in our example uses ``confidence`` and ``lift``.

.. note::
   If you have just started with Python: ``lambda`` is a compact way to
   specify a simple function without using a ``def`` statement. As a
   function, it has its own local name space; the lambda in our example
   reads the minimal confidence and lift from the enclosing scope, and
   they could equally be passed as function arguments.

Here goes the code (part of :download:`assoc2.py <code/assoc2.py>`)::

   rules = orange.AssociationRulesInducer(data, support=0.4)

   n = 5
   print "%i most confident rules:" % (n)
   orngAssoc.sort(rules, ["confidence"])
   orngAssoc.printRules(rules[0:n], ['confidence', 'support', 'lift'])

   # Keep only the rules that exceed both thresholds.
   conf = 0.8; lift = 1.1
   print "\nRules with confidence>%5.3f and lift>%5.3f" % (conf, lift)
   rulesC = rules.filter(lambda x: x.confidence > conf and x.lift > lift)
   orngAssoc.sort(rulesC, ['confidence'])
   orngAssoc.printRules(rulesC, ['confidence', 'support', 'lift'])

Just one rule with the requested confidence and lift is found in our rule set::

   5 most confident rules:
   conf    supp    lift    rule
   1.000   0.478   1.015   fuel-type=gas aspiration=std drive-wheels=fwd -> engine-location=front
   1.000   0.429   1.015   fuel-type=gas aspiration=std num-of-doors=four -> engine-location=front
   1.000   0.507   1.015   aspiration=std drive-wheels=fwd -> engine-location=front
   1.000   0.449   1.015   aspiration=std num-of-doors=four -> engine-location=front
   1.000   0.541   1.015   fuel-type=gas drive-wheels=fwd -> engine-location=front

   Rules with confidence>0.800 and lift>1.100
   conf    supp    lift    rule
   0.898   0.429   1.116   fuel-type=gas num-of-doors=four -> aspiration=std engine-location=front

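
The same filter-then-sort pattern can be tried without Orange on any list
of records. The sketch below is only an illustration: the ``Rule`` tuple,
the ``select`` helper, and all the numbers are made up and are not part of
Orange or of the data set used above::

   from collections import namedtuple

   # Hypothetical stand-in for a rule object with a few measures attached.
   Rule = namedtuple("Rule", ["text", "support", "confidence", "lift"])

   rules = [
       Rule("a -> b", 0.50, 0.90, 1.20),
       Rule("b -> a", 0.50, 0.70, 1.20),
       Rule("a -> c", 0.45, 0.95, 1.05),
       Rule("c -> d", 0.42, 0.85, 1.30),
   ]

   min_conf = 0.8
   min_lift = 1.1


   def select(rules, predicate):
       """Return the rules for which predicate(rule) is true."""
       return [r for r in rules if predicate(r)]


   # The lambda reads min_conf and min_lift from the enclosing scope.
   strong = select(rules,
                   lambda r: r.confidence > min_conf and r.lift > min_lift)
   # Most confident rules first.
   strong.sort(key=lambda r: r.confidence, reverse=True)
   texts = [r.text for r in strong]

Here two of the four toy rules pass both thresholds. Changing ``min_conf``
or ``min_lift`` and rerunning the last three statements shows how the
selection reacts.
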