Changeset 7235:f579c6c0fc54 in orange

02/02/11 20:36:19 (3 years ago)
Gregor Rot <gregor.rot@…>
1 edited


  • orange/Orange/associate/

    r6848 r7235  
     6Orange provides two algorithms for induction of association rules. One is the basic Agrawal's algorithm with dynamic induction of supported itemsets and rules that is designed specifically for datasets with a large number of different items. This is, however, not really suitable for attribute-based machine learning problems, which are at the primary focus of Orange. We have thus adapted the original algorithm to be more efficient for the latter type of data, and to induce the rules in which, for contrast to Agrawal's rules, both sides don't only contain attributes (like "bread, butter -> jam") but also their values ("bread = wheat, butter = yes -> jam = plum"). As a further variation, the algorithm can be limited to search only for classification rules in which the sole attribute to appear on the right side of the rule is the class attribute. 
     8It is also possible to extract item sets instead of association rules. These are often more interesting than the rules themselves. 
     10Besides association rule inducer, Orange also provides a rather simplified method for classification by association rules. 
     13Association rules inducer with Agrawal's algorithm 
     16The class that induces rules by Agrawal's algorithm, accepts the data examples of two forms. The first is the standard form in which each examples is described by values of a fixed list of attributes, defined in domain. The algorithm, however, disregards the attribute values but only checks whether the value is defined or not. The rule shown above, "bread, butter -> jam" actually means that if "bread" and "butter" are defined, then "jam" is defined as well. It is expected that most of values will be undefined - if this is not so, you need to use the other association rules inducer, described in the next chapter. 
     18Since the usual representation of examples described above is rather unsuitable for sparse examples, AssociationRulesSparseInducer can also use examples represented a bit differently. Sparse examples have no fixed attributes - the examples' domain is empty, there are neither ordinary nor class attributes. All values assigned to example are given as meta-attributes. All meta-attributes need, however, be `registered with the domain descriptor <>`_. If you have data of this kind, the most suitable format for it is the `basket format <>`_. 
     20In both cases, the examples are first translated into an internal AssociationRulesSparseInducer's internal format for sparse datasets. The algorithm first dynamically builds all itemsets (sets of attributes) that have at least the prescribed support. Each of these is then used to derive rules with requested confidence. 
     22If examples were given in the sparse form, so are the left and right side of the induced rules. If examples were given in the standard form, so are the examples in association rules. 
     24.. class:: Orange.associate.AssociationRulesSparseInducer 
     26    .. attribute:: support 
     27    Minimal support for the rule. 
     29    .. attribute:: confidence 
     30    Minimal confidence for the rule. 
     32    .. attribute:: storeExamples 
     33    Tells the inducer to store the examples covered by each rule and those confirming it. 
     35    .. attribute:: maxItemSets 
     36    The maximal number of itemsets. 
     38The last attribute deserves some explanation. The algorithm's running time (and its memory consumption) depends on the minimal support; the lower the requested support, the more eligible itemsets will be found. There is no general rule for knowing the itemset in advance (generally, value should be around 0.3, but this depends upon the number of different items, the diversity of examples...) so it's very easy to set the limit too low. In this case, the algorithm can induce hundreds of thousands of itemsets until it runs out of memory. To prevent this, it will stop inducing itemsets and report an error when the prescribed maximum maxItemSets is exceeded. In this case, you should increase the required support. On the other hand, you can (reasonably) increase the maxItemSets to as high as you computer is able to handle. 
     41Examples for AssociationRulesSparseInducer 
     44We shall test the rule inducer on a dataset consisting of a brief description of Spanish Inquisition, given by Palin et al: 
     46    NOBODY expects the Spanish Inquisition! Our chief weapon is surprise...surprise and fear...fear and surprise.... Our two weapons are fear and surprise...and ruthless efficiency.... Our *three* weapons are fear, surprise, and ruthless efficiency...and an almost fanatical devotion to the Pope.... Our *four* *Amongst* our weapons.... Amongst our weaponry...are such elements as fear, surprise.... I'll come in again. 
     48    NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope, and nice red uniforms - Oh damn! 
     50The text needs to be cleaned of punctuation marks and capital letters at beginnings of the sentences, each sentence needs to be put in a new line and commas need to be inserted between the words. 
     52inquisition.basket :: 
     54    nobody, expects, the, Spanish, Inquisition 
     55    our, chief, weapon, is, surprise, surprise, and, fear,fear, and, surprise 
     56    our, two, weapons, are, fear, and, surprise, and, ruthless, efficiency 
     57    our, three, weapons, are, fear, surprise, and, ruthless, efficiency, and, an, almost, fanatical, devotion, to, the, Pope 
     58    our, four, no 
     59    amongst, our, weapons 
     60    amongst, our, weaponry, are, such, elements, as, fear, surprise 
     61    I'll, come, in, again 
     62    nobody, expects, the, Spanish, Inquisition 
     63    amongst, our, weaponry, are, such, diverse, elements, as, fear, surprise, ruthless, efficiency, an, almost, fanatical, devotion, to, the, Pope, and, nice, red, uniforms, oh damn 
     65Inducing the rules is trivial. 
     66 (uses inquisition.basket) :: 
     69    import Orange 
     70    data ="inquisition") 
     72    rules = Orange.associate.AssociationRulesSparseInducer(data, support = 0.5) 
     74    print "%5s   %5s" % ("supp", "conf") 
     75    for r in rules: 
     76        print "%5.3f   %5.3f   %s" % (, r.confidence, r) 
     78The induced rules are surprisingly fear-full. :: 
     80    0.500   1.000   fear -> surprise 
     81    0.500   1.000   surprise -> fear 
     82    0.500   1.000   fear -> surprise our 
     83    0.500   1.000   fear surprise -> our 
     84    0.500   1.000   fear our -> surprise 
     85    0.500   1.000   surprise -> fear our 
     86    0.500   1.000   surprise our -> fear 
     87    0.500   0.714   our -> fear surprise 
     88    0.500   1.000   fear -> our 
     89    0.500   0.714   our -> fear 
     90    0.500   1.000   surprise -> our 
     91    0.500   0.714   our -> surprise 
     93If examples are weighted, weight can be passed as an additional argument to call operator. 
     95To get only a list of supported item sets, one should call the method getItemsets. The result is a list whose elements are tuples with two elements. The first is a tuple with indices of attributes in the item set. Sparse examples are usually represented with meta attributes, so this indices will be negative. The second element is  a list of indices supporting the item set, that is, containing all the items in the set. If storeExamples is False, the second element is None. 
     96 (uses inquisition.basket) :: 
     99    inducer = Orange.associate.AssociationRulesSparseInducer(support = 0.5, storeExamples = True) 
     100    itemsets = inducer.getItemsets(data) 
     102Now itemsets is a list of itemsets along with the examples supporting them since we set storeExamples to True. :: 
     104    >>> itemsets[5] 
     105    ((-11, -7), [1, 2, 3, 6, 9]) 
     106    >>> [data.domain[i].name for i in itemsets[5][0]] 
     107    ['surprise', 'our']    
     109The sixth itemset contains attributes with indices -11 and -7, that is, the words "surprise" and "our". The examples supporting it are those with indices 1,2, 3, 6 and 9. 
     111This way of representing the itemsets is not very programmer-friendly, but it is much more memory efficient than and faster to work with than using objects like Variable and Example. 
     114Association rules for non-sparse examples 
     117The other algorithm for association rules provided by Orange, AssociationRulesInducer is optimized for non-sparse examples in the usual Orange form. Each example is described by values of a fixed set of attributes. Unknown values are ignored, while values of attributes are not (as opposite to the above-described algorithm for sparse rules). In addition, the algorithm can be directed to search only for classification rules, in which the only attribute on the right-hand side is the class attribute. 
     119.. class:: Orange.associate.AssociationRulesInducer 
     121    .. attribute:: support 
     122    Minimal support for the rule. 
     124    .. attribute:: confidence 
     125    Minimal confidence for the rule. 
     127    .. attribute:: classificationRules 
     128    If 1 (default is 0), the algorithm constructs classification rules instead of general association rules. 
     130    .. attribute:: storeExamples 
     131    Tells the inducer to store the examples covered by each rule and those confirming it 
     133    .. attribute:: maxItemSets 
     134    The maximal number of itemsets. 
     136Meaning of all attributes (except the new one, classificationRules) is the same as for AssociationRulesSparseInducer. See the description of maxItemSet there. 
     137 (uses :: 
     140    import orange 
     142    data = orange.ExampleTable("lenses") 
     144    print "Association rules" 
     145    rules = orange.AssociationRulesInducer(data, supp = 0.5) 
     146    for r in rules: 
     147        print "%5.3f  %5.3f  %s" % (, r.confidence, r) 
     149The found rules are :: 
     151    0.333  0.533  lenses=none -> prescription=hypermetrope 
     152    0.333  0.667  prescription=hypermetrope -> lenses=none 
     153    0.333  0.533  lenses=none -> astigmatic=yes 
     154    0.333  0.667  astigmatic=yes -> lenses=none 
     155    0.500  0.800  lenses=none -> tear_rate=reduced 
     156    0.500  1.000  tear_rate=reduced -> lenses=none 
     158To limit the algorithm to classification rules, set classificationRules to 1. :: 
     160    import orange 
     162    data = orange.ExampleTable("inquisition") 
     163    rules = orange.AssociationRulesSparseInducer(data, 
     164                support = 0.5, storeExamples = True) 
     166    print "%5s   %5s" % ("supp", "conf") 
     167    for r in rules: 
     168        print "%5.3f   %5.3f   %s" % (, r.confidence, r) 
     170The found rules are, naturally, a subset of the above rules. :: 
     172    0.333  0.667  prescription=hypermetrope -> lenses=none 
     173    0.333  0.667  astigmatic=yes -> lenses=none 
     174    0.500  1.000  tear_rate=reduced -> lenses=none 
     176AssociationRulesInducer can also work with weighted examples; the ID of weight attribute should be passed as an additional argument in a call. 
     178Itemsets are induced in a similar fashion as for sparse data, except that the first element of the tuple, the item set, is represented not by indices of attributes, as before, but with tuples (attribute-index, value-index). :: 
     180    inducer = orange.AssociationRulesInducer(support = 0.3, storeExamples = True) 
     181    itemsets = inducer.getItemsets(data) 
     182    print itemsets[8] 
     184This prints out :: 
     186    (((2, 1), (4, 0)), [2, 6, 10, 14, 15, 18, 22, 23]) 
     188meaning that the ninth itemset contains the second value of the third attribute, (2, 1), and the first value of the fifth, (4, 0). 
     191Association rule 
     194Both classes for induction of association rules return the induced rules in AssociationRules which is basically a list of instances of AssociationRule. 
     196.. class:: Orange.associate.AssociationRules 
     198    .. attribute:: left, right 
     199    The left and the right side of the rule. Both are given as Example. In rules created by AssociationRulesSparseInducer from examples that contain all values as meta-values, left and right are examples in the same form. Otherwise, values in left that do not appear in the rule are don't care, and value in right are don't know. Both can, however, be tested by isSpecial (see documentation on  `Value <>`_). 
     201    .. attribute:: nLeft, nRight 
     202    The number of attributes (ie defined values) on the left and on the right side of the rule. 
     204    .. attribute:: nAppliesLeft, nAppliesRight, nAppliesBoth 
     205    The number of (learning) examples that conform to the left, the right and to both sides of the rule. 
     207    .. attribute:: nExamples 
     208    The total number of learning examples. 
     210    .. attribute:: support 
     211    nAppliesBoth/nExamples. 
     213    .. attribute:: confidence 
     214    nAppliesBoth/nAppliesLeft. 
     216    .. attribute:: coverage 
     217    nAppliesLeft/nExamples. 
     219    .. attribute:: strength 
     220    nAppliesRight/nAppliesLeft. 
     222    .. attribute:: lift 
     223    nExamples * nAppliesBoth / (nAppliesLeft * nAppliesRight). 
     225    .. attribute:: leverage 
     226    (nAppliesBoth * nExamples - nAppliesLeft * nAppliesRight). 
     228    .. attribute:: examples, matchLeft, matchBoth 
     229    If storeExamples was True during induction, examples contains a copy of the example table used to induce the rules. Attributes matchLeft and matchBoth are lists of integers, representing the indices of examples which match the left-hand side of the rule and both sides, respectively.     
     231    .. method:: AssociationRule(left, right, nAppliesLeft, nAppliesRight, nAppliesBoth, nExamples) 
     232    Constructs an association rule and computes all measures listed above. 
     234    .. method:: AssociationRule(left, right, support, confidence]]) 
     235    Construct association rule and sets its support and confidence. If you intend to pass such a rule to someone that expects more things to be set, you should set the manually - AssociationRules's constructor cannot compute anything from these two arguments. 
     237    .. method:: AssociationRule(rule) 
     238    Given an association rules as the argument, constructor copies the rule into a new rule. 
     240    .. method:: appliesLeft(example), appliesRight(example), appliesBoth(example) 
     241    Tells whether the example fits into the left, right and both sides of the rule, respectively. If the rule is represented by sparse examples, the given example must be sparse as well. 
     243Association rule inducers do not store evidence about which example supports which rule (although this is available during induction, the information is discarded afterwards). Let us write a function that find the examples that confirm the rule (ie fit both sides of it) and those that contradict it (fit the left-hand side but not the right). :: 
     245    import orange 
     247    data = orange.ExampleTable("lenses") 
     249    rules = orange.AssociationRulesInducer(data, supp = 0.3) 
     250    rule = rules[0] 
     252    print 
     253    print "Rule: ", rule 
     254    print 
     256    print "Supporting examples:" 
     257    for example in data: 
     258        if rule.appliesBoth(example): 
     259            print example 
     260    print 
     262    print "Contradicting examples:" 
     263    for example in data: 
     264        if rule.appliesLeft(example) and not rule.appliesRight(example): 
     265            print example 
     266    print 
     268The latter printouts get simpler and (way!) faster if we instruct the inducer to store the examples. We can then do, for instance, this. :: 
     270    print "Match left: " 
     271    print "\\n".join(str(rule.examples[i]) for i in rule.matchLeft) 
     272    print "\\nMatch both: " 
     273    print "\\n".join(str(rule.examples[i]) for i in rule.matchBoth) 
     275The "contradicting" examples are then those whose indices are find in matchLeft but not in matchBoth. The memory friendlier and the faster ways to compute this are as follows. :: 
     277    >>> [x for x in rule.matchLeft if not x in rule.matchBoth] 
     278    [0, 2, 8, 10, 16, 17, 18] 
     279    >>> set(rule.matchLeft) - set(rule.matchBoth) 
     280    set([0, 2, 8, 10, 16, 17, 18]) 
    1284from orange import \ 
    2      AssociationRule, \ 
     285    AssociationRule, \ 
    3286    AssociationRules, \ 
    4287    AssociationRulesInducer, \ 
Note: See TracChangeset for help on using the changeset viewer.