Changeset 9405:ef07f0737499 in orange


Timestamp: 12/21/11 12:40:06
Author: markotoplak
Branch: default
Convert: ff72dc8b7da813c6ba9c0afa5703c46a89e85b82
Message: Some quick fixes to Associate documentation. References #733.

File: 1 edited

  • orange/Orange/associate/__init__.py

--- orange/Orange/associate/__init__.py	(r9349)
+++ orange/Orange/associate/__init__.py	(r9405)
@@ -9,12 +9,10 @@
 itemsets and rules that is designed specifically for datasets with a
 large number of different items. This is, however, not really suitable
-for feature-based machine learning problems, which are at the primary focus
-of Orange. We have thus adapted the original algorithm to be more efficient
-for the latter type of data, and to induce the rules in which, for contrast
-to Agrawal's rules, both sides don't only contain features
+for feature-based machine learning problems.
+We have adapted the original algorithm for efficiency
+with the latter type of data, and to induce the rules where,
+both sides don't only contain features
 (like "bread, butter -> jam") but also their values
-("bread = wheat, butter = yes -> jam = plum"). As a further variation, the
-algorithm can be limited to search only for classification rules in which
-the sole feature to appear on the right side of the rule is the class feature.
+("bread = wheat, butter = yes -> jam = plum").
 
 It is also possible to extract item sets instead of association rules. These
@@ -35,20 +33,13 @@
 actually means that if "bread" and "butter" are defined, then "jam" is defined
 as well. It is expected that most of values will be undefined - if this is not
-so, you need to use the other association rules inducer, described in the
-chapter :ref:`non-sparse-examples`.
-
-Since the usual representation of examples described above is rather unsuitable
-for sparse examples, AssociationRulesSparseInducer can also use examples
-represented a bit differently. Sparse examples have no fixed
-features - the examples' domain is empty, there are neither ordinary nor class
-features. All values assigned to example are given as meta-attributes.
-All meta-attributes need, however, be `registered with the domain descriptor
-<http://orange.biolab.si/doc/reference/Domain.htm#meta-attributes>`_.
-If you have data of this kind, the most suitable format for it is the
-`basket format <http://orange.biolab.si/doc/reference/fileformats.htm#basket>`_.
-
-In both cases, the examples are first translated into an internal
-AssociationRulesSparseInducer's internal format for sparse datasets. The
-algorithm first dynamically builds all itemsets (sets of features) that have
+so, use the :class:`~AssociationRulesInducer`.
+
+:class:`AssociationRulesSparseInducer` can also use sparse data.
+Sparse examples have no fixed
+features - the domain is empty. All values assigned to example are given as meta attributes.
+All meta attributes need to be registered with the :obj:`~Orange.data.Domain`.
+The most suitable format fot this kind of data it is the basket format.
+
+The algorithm first dynamically builds all itemsets (sets of features) that have
 at least the prescribed support. Each of these is then used to derive rules
 with requested confidence.
@@ -58,37 +49,45 @@
 the examples in association rules.
 
-.. class:: Orange.associate.AssociationRulesSparseInducer
+.. class:: AssociationRulesSparseInducer
 
     .. attribute:: support
 
-    Minimal support for the rule.
-
+        Minimal support for the rule.
+
     .. attribute:: confidence
 
-    Minimal confidence for the rule.
-
-    .. attribute:: storeExamples
-
-    Tells the inducer to store the examples covered by each rule and
-    those confirming it.
-
-    .. attribute:: maxItemSets
-
-    The maximal number of itemsets.
-
-.. _maxItemSets:
-
-The maxItemSets attribute deserves some explanation. The algorithm's
-running time (and its memory consumption) depends on the minimal support;
-the lower the requested support, the more eligible itemsets will be found.
-There is no general rule for knowing the itemset in advance (generally, value
-should be around 0.3, but this depends upon the number of different items, the
-diversity of examples...) so it's very easy to set the limit too low. In this
-case, the algorithm can induce hundreds of thousands of itemsets until it
-runs out of memory. To prevent this, it will stop inducing itemsets and
-report an error when the prescribed maximum maxItemSets is exceeded.
-In this case, you should increase the required support. On the other hand,
-you can (reasonably) increase the maxItemSets to as high as you computer is
-able to handle.
+        Minimal confidence for the rule.
+
+    .. attribute:: store_examples
+
+        Store the examples covered by each rule and
+        those confirming it.
+
+    .. attribute:: max_item_sets
+
+        The maximal number of itemsets. The algorithm's
+        running time (and its memory consumption) depends on the minimal support;
+        the lower the requested support, the more eligible itemsets will be found.
+        There is no general rule for setting support - perhaps it
+        should be around 0.3, but this depends on the data set.
+        If the supoort was set too low, the algorithm could run out of memory.
+        Therefore, Orange limits the number of generated rules to
+        :obj:`max_item_sets`. If Orange reports, that the prescribed
+        :obj:`max_item_sets` was exceeded, increase the requered support
+        or alternatively, increase :obj:`max_item_sets` to as high as you computer
+        can handle.
+
+    .. method:: __call__(data, weight_id)
+
+        Induce rules from the data set.
+
+
+    .. method:: get_itemsets(data)
+
+        Returns a list of pairs. The first element of a pair is a tuple with
+        indices of features in the item set (negative for sparse data).
+        The second element is a list of indices supporting the item set, that is,
+        all the items in the set. If :obj:`store_examples` is False, the second
+        element is None.
 
 We shall test the rule inducer on a dataset consisting of a brief description
@@ -131,20 +130,12 @@
     0.500   0.714   our -> surprise
 
-If examples are weighted, weight can be passed as an additional argument to
-call operator.
-
 To get only a list of supported item sets, one should call the method
-getItemsets. The result is a list whose elements are tuples with two elements.
-The first is a tuple with indices of features in the item set. Sparse examples
-are usually represented with meta-attributes, so this indices will be negative.
-The second element is  a list of indices supporting the item set, that is,
-containing all the items in the set. If storeExamples is False, the second
-element is None. ::
-
-    inducer = Orange.associate.AssociationRulesSparseInducer(support = 0.5, storeExamples = True)
-    itemsets = inducer.getItemsets(data)
+get_itemsets::
+
+    inducer = Orange.associate.AssociationRulesSparseInducer(support = 0.5, store_examples = True)
+    itemsets = inducer.get_itemsets(data)
 
 Now itemsets is a list of itemsets along with the examples supporting them
-since we set storeExamples to True. ::
+since we set store_examples to True. ::
 
     >>> itemsets[5]
@@ -157,52 +148,58 @@
 indices 1,2, 3, 6 and 9.
 
-This way of representing the itemsets is not very programmer-friendly, but
-it is much more memory efficient than and faster to work with than using
-objects like Variable and Example.
+This way of representing the itemsets is memory efficient and faster than using
+objects like :obj:`~Orange.data.variable.Variable` and :obj:`~Orange.data.Instance`.
 
 .. _non-sparse-examples:
 
 ===================
-Non-sparse examples
+Non-sparse data
 ===================
 
-The other algorithm for association rules provided by Orange
-(AssociationRulesInducer) is optimized for non-sparse examples in the usual
-Orange form. Each example is described by values of a fixed set of features.
+:class:`AssociationRulesInducer` works with non-sparse data.
 Unknown values are ignored, while values of features are not (as opposite to
-the above-described algorithm for sparse rules). In addition, the algorithm
+the algorithm for sparse rules). In addition, the algorithm
 can be directed to search only for classification rules, in which the only
-feature on the right-hand side is the class Feature.
-
-.. class:: Orange.associate.AssociationRulesInducer(float asupp, float aconf)
-
-    :param asupp: a k-means clustering object.
-    :type km: :class:`KMeans`
+feature on the right-hand side is the class variable.
+
+.. class:: AssociationRulesInducer
+
+    All attributes can be set with the constructor.
 
     .. attribute:: support
 
-    Minimal support for the rule.
+       Minimal support for the rule.
 
     .. attribute:: confidence
 
-    Minimal confidence for the rule.
-
-    .. attribute:: classificationRules
-
-    If 1 (default is 0), the algorithm constructs classification rules instead
-    of general association rules.
-
-    .. attribute:: storeExamples
-
-    Tells the inducer to store the examples covered by each rule and those
-    confirming it
-
-    .. attribute:: maxItemSets
-
-    The maximal number of itemsets.
-
-Meaning of all attributes (except the new one, classificationRules) is the
-same as for AssociationRulesSparseInducer. See the description of
-:ref:`maxItemSets <maxItemSets>` there. The example uses :download:`lenses.tab <code/lenses.tab>`::
+        Minimal confidence for the rule.
+
+    .. attribute:: classification_rules
+
+        If True (default is False), the classification rules are constructed instead
+        of general association rules.
+
+    .. attribute:: store_examples
+
+        Store the examples covered by each rule and those
+        confirming it
+
+    .. attribute:: max_item_sets
+
+        The maximal number of itemsets.
+
+    .. method:: __call__(data, weight_id)
+
+        Induce rules from the data set.
+
+    .. method:: get_itemsets(data)
+
+        Returns a list of pairs. The first element of a pair is a tuple with
+        indices of features in the item set (negative for sparse data).
+        The second element is a list of indices supporting the item set, that is,
+        all the items in the set. If :obj:`store_examples` is False, the second
+        element is None.
+
+The example::
 
     import Orange
@@ -211,5 +208,5 @@
 
     print "Association rules"
-    rules = Orange.associate.AssociationRulesInducer(data, supp = 0.5)
+    rules = Orange.associate.AssociationRulesInducer(data, support = 0.5)
     for r in rules:
         print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r)
@@ -227,5 +224,5 @@
 
     print "\\nClassification rules"
-    rules = orange.AssociationRulesInducer(data, supp = 0.3, classificationRules = 1)
+    rules = orange.AssociationRulesInducer(data, support = 0.3, classificationRules = 1)
     for r in rules:
         print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r)
@@ -237,13 +234,10 @@
     0.500  1.000  tear_rate=reduced -> lenses=none
 
-AssociationRulesInducer can also work with weighted examples; the ID of weight
-feature should be passed as an additional argument in a call.
-
 Itemsets are induced in a similar fashion as for sparse data, except that the
 first element of the tuple, the item set, is represented not by indices of
 features, as before, but with tuples (feature-index, value-index): ::
 
-    inducer = Orange.associate.AssociationRulesInducer(support = 0.3, storeExamples = True)
-    itemsets = inducer.getItemsets(data)
+    inducer = Orange.associate.AssociationRulesInducer(support = 0.3, store_examples = True)
+    itemsets = inducer.get_itemsets(data)
     print itemsets[8]
 
@@ -255,74 +249,19 @@
 (2, 1), and the first value of the fifth (4, 0).
 
-=================
-AssociationRules
-=================
-
-Both classes for induction of association rules return the induced rules in
-AssociationRules which is basically a list of instances of AssociationRule.
-
-.. class:: Orange.associate.AssociationRules
-
-    .. attribute:: left, right
-
-        The left and the right side of the rule. Both are given as Example.
-        In rules created by AssociationRulesSparseInducer from examples that
-        contain all values as meta-values, left and right are examples in the
-        same form. Otherwise, values in left that do not appear in the rule
-        are "don't care", and value in right are "don't know". Both can,
-        however, be tested by isSpecial (see documentation on
-        `Value <http://orange.biolab.si/doc/reference/Value.htm>`_).
-
-    .. attribute:: nLeft, nRight
-
-        The number of features (i.e. defined values) on the left and on the
-        right side of the rule.
-
-    .. attribute:: nAppliesLeft, nAppliesRight, nAppliesBoth
-
-        The number of (learning) examples that conform to the left, the right
-        and to both sides of the rule.
-
-    .. attribute:: nExamples
-
-        The total number of learning examples.
-
-    .. attribute:: support
-
-        nAppliesBoth/nExamples.
-
-    .. attribute:: confidence
-
-        nAppliesBoth/nAppliesLeft.
-
-    .. attribute:: coverage
-
-        nAppliesLeft/nExamples.
-
-    .. attribute:: strength
-
-        nAppliesRight/nAppliesLeft.
-
-    .. attribute:: lift
-
-        nExamples * nAppliesBoth / (nAppliesLeft * nAppliesRight).
-
-    .. attribute:: leverage
-
-        (nAppliesBoth * nExamples - nAppliesLeft * nAppliesRight).
-
-    .. attribute:: examples, matchLeft, matchBoth
-
-        If storeExamples was True during induction, examples contains a copy
-        of the example table used to induce the rules. Attributes matchLeft
-        and matchBoth are lists of integers, representing the indices of
-        examples which match the left-hand side of the rule and both sides,
-        respectively.
-
-    .. method:: AssociationRule(left, right, nAppliesLeft, nAppliesRight, nAppliesBoth, nExamples)
+=======================
+Representation of rules
+=======================
+
+An :class:`AssociationRule` represents a rule. In Orange, methods for
+induction of association rules return the induced rules in
+:class:`AssociationRules`, which is basically a list of :class:`AssociationRule` instances.
+
+.. class:: AssociationRule
+
+    .. method:: __init__(left, right, n_applies_left, n_applies_right, n_applies_both, n_examples)
 
         Constructs an association rule and computes all measures listed above.
 
-    .. method:: AssociationRule(left, right, support, confidence)
+    .. method:: __init__(left, right, support, confidence)
 
         Construct association rule and sets its support and confidence. If
@@ -331,14 +270,69 @@
         from arguments support and confidence.
 
-    .. method:: AssociationRule(rule)
+    .. method:: __init__(rule)
 
         Given an association rule as the argument, constructor copies of the
         rule.
-
-    .. method:: appliesLeft(example)
-
-    .. method:: appliesRight(example)
-
-    .. method:: appliesBoth(example)
+
+    .. attribute:: left, right
+
+        The left and the right side of the rule. Both are given as :class:`Orange.data.Instance`.
+        In rules created by :class:`AssociationRulesSparseInducer` from examples that
+        contain all values as meta-values, left and right are examples in the
+        same form. Otherwise, values in left that do not appear in the rule
+        are "don't care", and value in right are "don't know". Both can,
+        however, be tested by :meth:`~Orange.data.Value.is_special`.
+
+    .. attribute:: n_left, n_right
+
+        The number of features (i.e. defined values) on the left and on the
+        right side of the rule.
+
+    .. attribute:: n_applies_left, n_applies_right, n_applies_both
+
+        The number of (learning) examples that conform to the left, the right
+        and to both sides of the rule.
+
+    .. attribute:: n_examples
+
+        The total number of learning examples.
+
+    .. attribute:: support
+
+        nAppliesBoth/nExamples.
+
+    .. attribute:: confidence
+
+        n_applies_both/n_applies_left.
+
+    .. attribute:: coverage
+
+        n_applies_left/n_examples.
+
+    .. attribute:: strength
+
+        n_applies_right/n_applies_left.
+
+    .. attribute:: lift
+
+        n_examples * n_applies_both / (n_applies_left * n_applies_right).
+
+    .. attribute:: leverage
+
+        (n_Applies_both * n_examples - n_applies_left * n_applies_right).
+
+    .. attribute:: examples, match_left, match_both
+
+        If store_examples was True during induction, examples contains a copy
+        of the example table used to induce the rules. Attributes match_left
+        and match_both are lists of integers, representing the indices of
+        examples which match the left-hand side of the rule and both sides,
+        respectively.
+
+    .. method:: applies_left(example)
+
+    .. method:: applies_right(example)
+
+    .. method:: applies_both(example)
 
         Tells whether the example fits into the left, right or both sides of
@@ -347,8 +341,7 @@
 
 Association rule inducers do not store evidence about which example supports
-which rule (although this information is available during induction its
-discarded afterwards). Let us write a function that finds the examples that
+which rule. Let us write a function that finds the examples that
 confirm the rule (fit both sides of it) and those that contradict it (fit the
-left-hand side but not the right). The example uses the :download:`lenses.tab <code/lenses.tab>`::
+left-hand side but not the right). The example::
 
     import Orange
@@ -371,5 +364,5 @@
     print "Contradicting examples:"
     for example in data:
-        if rule.appliesLeft(example) and not rule.appliesRight(example):
+        if rule.applies_left(example) and not rule.applies_right(example):
             print example
     print
@@ -379,18 +372,21 @@
 
     print "Match left: "
-    print "\\n".join(str(rule.examples[i]) for i in rule.matchLeft)
+    print "\\n".join(str(rule.examples[i]) for i in rule.match_left)
     print "\\nMatch both: "
-    print "\\n".join(str(rule.examples[i]) for i in rule.matchBoth)
+    print "\\n".join(str(rule.examples[i]) for i in rule.match_both)
 
 The "contradicting" examples are then those whose indices are found in
-matchLeft but not in matchBoth. The memory friendlier and the faster way
+match_left but not in match_both. The memory friendlier and the faster way
 to compute this is as follows: ::
 
-    >>> [x for x in rule.matchLeft if not x in rule.matchBoth]
+    >>> [x for x in rule.match_left if not x in rule.match_both]
     [0, 2, 8, 10, 16, 17, 18]
-    >>> set(rule.matchLeft) - set(rule.matchBoth)
+    >>> set(rule.match_left) - set(rule.match_both)
     set([0, 2, 8, 10, 16, 17, 18])
 
 """
+
+
+
 
 from orange import \
@@ -401,2 +397,4 @@
     ItemsetNodeProxy, \
     ItemsetsSparseInducer
+
+
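
The `max_item_sets` text added in r9405 says that lower support yields more eligible itemsets and that Orange stops with an error once the limit is exceeded. The sketch below is a toy, pure-Python level-wise (Apriori-style) search that imitates this behaviour and returns pairs shaped like those described for `get_itemsets`. It is not Orange's implementation. The name `frequent_itemsets` is hypothetical. Items are plain strings rather than the negative meta-attribute indices Orange uses for sparse data. The `RuntimeError` only stands in for whatever error Orange actually reports.

```python
from itertools import combinations


def frequent_itemsets(baskets, support, max_item_sets=10000, store_examples=True):
    """Toy level-wise (Apriori-style) search; NOT Orange's implementation.

    baskets: list of sets of mutually comparable items (e.g. strings).
    Returns (itemset, supporting_indices) pairs: itemset is a sorted tuple of
    items, supporting_indices lists the baskets containing all of them
    (None when store_examples is False).
    """
    min_count = support * len(baskets)

    # Level 1: single items and the indices of the baskets that contain them.
    level = []
    for item in sorted(set().union(*baskets)):
        cover = [i for i, basket in enumerate(baskets) if item in basket]
        if len(cover) >= min_count:
            level.append(((item,), cover))
    found = list(level)

    while level:
        # Guard analogous to max_item_sets: give up instead of exhausting memory.
        if len(found) > max_item_sets:
            raise RuntimeError("more than %d itemsets; increase support" % max_item_sets)
        next_level = []
        for (a, cover_a), (b, cover_b) in combinations(level, 2):
            if a[:-1] != b[:-1]:
                continue  # join only itemsets that differ in their last item
            # Baskets supporting a + b are exactly those supporting both a and b.
            cover = sorted(set(cover_a) & set(cover_b))
            if len(cover) >= min_count:
                next_level.append((a + (b[-1],), cover))
        next_level.sort(key=lambda pair: pair[0])
        found.extend(next_level)
        level = next_level

    if not store_examples:
        found = [(itemset, None) for itemset, _ in found]
    return found


baskets = [{"bread", "butter", "jam"}, {"bread", "butter"}, {"bread", "jam"}, {"butter"}]
for itemset, cover in frequent_itemsets(baskets, support=0.5):
    print(itemset, cover)
```

Lowering `support` or raising `max_item_sets` in this toy shows the same trade-off the docstring describes: more itemsets survive each level, and the guard trips sooner.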
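
The docstring states that each itemset with at least the prescribed support "is then used to derive rules with requested confidence". Below is a minimal sketch of that step. It assumes the support counts of all frequent itemsets are already known. Every non-empty proper subset of an itemset is tried as the left-hand side, and confidence is computed as n_applies_both / n_applies_left, the formula the docstring gives for `confidence`. The function `derive_rules` and its arguments are hypothetical. Orange's inducer may enumerate rules differently.

```python
from itertools import combinations


def derive_rules(itemset_counts, n_examples, min_confidence):
    """Toy rule generation from frequent itemsets; NOT Orange's implementation.

    itemset_counts maps frozenset(items) -> number of supporting examples and
    must also contain every subset of each itemset (true for the output of a
    complete frequent-itemset search, since subsets of frequent itemsets are
    frequent). Returns sorted (left, right, support, confidence) tuples.
    """
    rules = []
    for itemset, n_both in itemset_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        items = sorted(itemset)
        for k in range(1, len(items)):
            for left in combinations(items, k):
                # confidence = n_applies_both / n_applies_left
                confidence = float(n_both) / itemset_counts[frozenset(left)]
                if confidence >= min_confidence:
                    right = tuple(x for x in items if x not in left)
                    rules.append((left, right, float(n_both) / n_examples, confidence))
    return sorted(rules)


counts = {
    frozenset(["bread"]): 3, frozenset(["butter"]): 3, frozenset(["jam"]): 2,
    frozenset(["bread", "butter"]): 2, frozenset(["bread", "jam"]): 2,
}
for left, right, supp, conf in derive_rules(counts, n_examples=4, min_confidence=0.7):
    print("%5.3f  %5.3f  %s -> %s" % (supp, conf, ", ".join(left), ", ".join(right)))
# prints: 0.500  1.000  jam -> bread
```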
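
For non-sparse data, the docstring represents an item set as a tuple of (feature-index, value-index) pairs and states that unknown values are ignored. The sketch below decodes such pairs into readable `name=value` text and checks whether a row supports an itemset. `DOMAIN` is a hypothetical stand-in for an Orange domain, loosely modeled on lenses.tab. Its value order is assumed, so the decoded names are illustrative only and are not a reproduction of `itemsets[8]` from the docstring. `items_of`, `describe_itemset` and `supports` are hypothetical helpers.

```python
# Hypothetical domain loosely modeled on lenses.tab; the value order is an assumption.
DOMAIN = [
    ("age", ["young", "pre-presbyopic", "presbyopic"]),
    ("prescription", ["myope", "hypermetrope"]),
    ("astigmatic", ["no", "yes"]),
    ("tear_rate", ["reduced", "normal"]),
    ("lenses", ["none", "soft", "hard"]),
]


def items_of(row):
    """Turn a row of value indices into a set of (feature_index, value_index) items.

    Unknown values (None) are skipped, mirroring the statement that the
    non-sparse inducer ignores unknown values.
    """
    return set((f, v) for f, v in enumerate(row) if v is not None)


def describe_itemset(itemset, domain=DOMAIN):
    """Render ((feature_index, value_index), ...) as 'name=value' text."""
    return ", ".join("%s=%s" % (domain[f][0], domain[f][1][v]) for f, v in itemset)


def supports(itemset, row):
    """True if the row contains every (feature_index, value_index) item."""
    return set(itemset) <= items_of(row)


print(describe_itemset(((2, 1), (4, 0))))  # astigmatic=yes, lenses=none
```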
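
Both revisions of the `AssociationRule` documentation define the rule measures through four counts. The r9405 text still writes support as `nAppliesBoth/nExamples` and leverage with `n_Applies_both`. Both evidently mean n_applies_both in the new naming. The sketch below transcribes those formulas as listed. `rule_measures` is a hypothetical helper, not part of Orange. Leverage is kept exactly as written, without normalization.

```python
def rule_measures(n_applies_left, n_applies_right, n_applies_both, n_examples):
    """Rule measures, transcribed from the formulas in the AssociationRule docstring."""
    n_l, n_r, n_b, n = [float(x) for x in
                        (n_applies_left, n_applies_right, n_applies_both, n_examples)]
    return {
        "support": n_b / n,
        "confidence": n_b / n_l,
        "coverage": n_l / n,
        "strength": n_r / n_l,
        "lift": n * n_b / (n_l * n_r),
        # As listed in the docstring; dividing by n ** 2 gives P(L,R) - P(L) P(R).
        "leverage": n_b * n - n_l * n_r,
    }


# Hypothetical counts: 24 examples, 12 match the left side, 15 the right side,
# and all 12 that match the left side also match the right side.
m = rule_measures(12, 15, 12, 24)
print(sorted(m.items()))
```

Dividing the listed leverage by the squared number of examples gives the usual P(L,R) - P(L)P(R) form (here 108 / 576 = 0.1875). The docstring does not say whether Orange applies that normalization.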
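
The `applies_left`, `applies_right` and `applies_both` methods decide whether an example fits a side of a rule. Features that do not appear in the rule are "don't care". The sketch below is a rough, Orange-independent version of the docstring's confirming/contradicting loop, with rules and examples modeled as plain dicts. `applies` and `split_by_rule` are hypothetical names. Treating an unknown value as a non-match is an assumption, not documented Orange behaviour.

```python
def applies(side, example):
    """True if every feature=value condition of one rule side holds in example.

    side: dict feature -> value; features not in the dict are "don't care".
    example: dict feature -> value. A missing or unknown (None) value never
    equals a rule value, so it counts as a non-match (an assumption here).
    """
    return all(example.get(f) == v for f, v in side.items())


def split_by_rule(left, right, data):
    """Return (confirming, contradicting) examples for the rule left -> right."""
    confirming, contradicting = [], []
    for example in data:
        if applies(left, example):
            if applies(right, example):
                confirming.append(example)
            else:
                contradicting.append(example)
    return confirming, contradicting


# Hypothetical rule and rows, echoing "bread = wheat, butter = yes -> jam = plum".
left = {"bread": "wheat", "butter": "yes"}
right = {"jam": "plum"}
data = [
    {"bread": "wheat", "butter": "yes", "jam": "plum"},
    {"bread": "wheat", "butter": "yes", "jam": "apricot"},
    {"bread": "rye", "butter": "yes", "jam": "plum"},
    {"bread": "wheat", "butter": None, "jam": "plum"},
]
confirming, contradicting = split_by_rule(left, right, data)
print(len(confirming), len(contradicting))
```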
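
The last docstring example computes the contradicting examples from `match_left` and `match_both` in two ways: a list comprehension and a set difference. The list comprehension keeps the original order but tests membership against a list, which scans it on every test. The variant below also keeps the order but builds a set once. `contradicting` is a hypothetical helper, and the index lists are made up for illustration.

```python
def contradicting(match_left, match_both):
    """Indices that match the left-hand side but not both sides, in order.

    match_both is converted to a set once, so each membership test is O(1)
    on average instead of a scan of the list.
    """
    both = set(match_both)
    return [i for i in match_left if i not in both]


# Hypothetical index lists shaped like rule.match_left / rule.match_both.
match_left = [0, 1, 2, 5, 8, 9]
match_both = [1, 5, 9]
print(contradicting(match_left, match_both))  # [0, 2, 8]
```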