Files:
566 added
377 deleted
99 edited

  • Orange/OrangeWidgets/Data/OWDataDomain.py

    r9671 r9996  
    629629                    mid = original_metas[meta] 
    630630                else: 
    631                     mid = Orange.data.new_meta_id() 
     631                    mid = Orange.feature.Descriptor.new_meta_id() 
    632632                domain.addmeta(mid, meta) 
    633633            newdata = Orange.data.Table(domain, self.data) 
  • Orange/OrangeWidgets/Data/OWPurgeDomain.py

    r9671 r9997  
    134134            for attr in self.data.domain.attributes: 
    135135                if attr.varType == orange.VarTypes.Continuous: 
    136                     if orange.RemoveRedundantOneValue.hasAtLeastTwoValues(self.data, attr): 
     136                    if orange.RemoveRedundantOneValue.has_at_least_two_values(self.data, attr): 
    137137                        newattrs.append(attr) 
    138138                    else: 
  • Orange/OrangeWidgets/Prototypes/OWCorrelations.py

    r9671 r9996  
    115115    domain = Orange.data.Domain(attrs, None) 
    116116    row_name = variable.String("Row name") 
    117     domain.addmeta(Orange.data.new_meta_id(), row_name) 
     117    domain.addmeta(Orange.feature.Descriptor.new_meta_id(), row_name) 
    118118     
    119119    table = Orange.data.Table(domain, [list(r) for r in matrix]) 
     
    445445                 
    446446                domain = Orange.data.Domain([pearson, spearman], None) 
    447                 domain.addmeta(Orange.data.new_meta_id(), row_name) 
     447                domain.addmeta(Orange.feature.Descriptor.new_meta_id(), row_name) 
    448448                table = Orange.data.Table(domain, self.target_correlations) 
    449449                for inst, name in zip(table, self.var_names): 
  • Orange/__init__.py

    r9929 r9986  
    1919_import("data.io") 
    2020_import("data.sample") 
     21_import("data.utils") 
     22_import("data.discretization") 
    2123 
    2224_import("network") 
  • Orange/associate/__init__.py

    r9919 r9988  
    1 """ 
    2 ============================== 
    3 Induction of association rules 
    4 ============================== 
    5  
    6 Orange provides two algorithms for induction of 
    7 `association rules <http://en.wikipedia.org/wiki/Association_rule_learning>`_. 
    8 One is the basic Agrawal's algorithm with dynamic induction of supported 
    9 itemsets and rules that is designed specifically for datasets with a 
    10 large number of different items. This is, however, not really suitable 
    11 for feature-based machine learning problems. 
    12 We have adapted the original algorithm for efficiency 
    13 with the latter type of data, and to induce rules in which 
    14 both sides contain not only features 
    15 (like "bread, butter -> jam") but also their values 
    16 ("bread = wheat, butter = yes -> jam = plum"). 
    17  
    18 It is also possible to extract item sets instead of association rules. These 
    19 are often more interesting than the rules themselves. 
    20  
    21 Besides the association rule inducers, Orange also provides a rather simplified 
    22 method for classification by association rules. 
    23  
    24 =================== 
    25 Agrawal's algorithm 
    26 =================== 
    27  
    28 The class that induces rules by Agrawal's algorithm accepts data examples 
    29 in two forms. The first is the standard form in which each example is 
    30 described by values of a fixed list of features (defined in domain). 
    31 The algorithm, however, disregards the feature values and only checks whether 
    32 the value is defined or not. The rule shown above ("bread, butter -> jam") 
    33 actually means that if "bread" and "butter" are defined, then "jam" is defined 
    34 as well. It is expected that most of the values will be undefined - if this is not 
    35 so, use the :class:`~AssociationRulesInducer`. 
    36  
    37 :class:`AssociationRulesSparseInducer` can also use sparse data.  
    38 Sparse examples have no fixed 
    39 features - the domain is empty. All values assigned to an example are given as meta attributes. 
    40 All meta attributes need to be registered with the :obj:`~Orange.data.Domain`. 
    41 The most suitable format for this kind of data is the basket format. 
    42  
    43 The algorithm first dynamically builds all itemsets (sets of features) that have 
    44 at least the prescribed support. Each of these is then used to derive rules 
    45 with requested confidence. 
    46  
    47 If examples were given in the sparse form, so are the left and right side 
    48 of the induced rules. If examples were given in the standard form, so are 
    49 the examples in association rules. 
    50  
    51 .. class:: AssociationRulesSparseInducer 
    52  
    53     .. attribute:: support 
    54      
    55         Minimal support for the rule. 
    56          
    57     .. attribute:: confidence 
    58      
    59         Minimal confidence for the rule. 
    60          
    61     .. attribute:: store_examples 
    62      
    63         Store the examples covered by each rule and 
    64         those confirming it. 
    65          
    66     .. attribute:: max_item_sets 
    67      
    68         The maximal number of itemsets. The algorithm's 
    69         running time (and its memory consumption) depends on the minimal support; 
    70         the lower the requested support, the more eligible itemsets will be found. 
    71         There is no general rule for setting support - perhaps it  
    72         should be around 0.3, but this depends on the data set. 
    73         If the support is set too low, the algorithm could run out of memory. 
    74         Therefore, Orange limits the number of generated rules to 
    75         :obj:`max_item_sets`. If Orange reports that the prescribed 
    76         :obj:`max_item_sets` was exceeded, increase the required support 
    77         or, alternatively, increase :obj:`max_item_sets` to as high as your computer 
    78         can handle. 
    79  
    80     .. method:: __call__(data, weight_id) 
    81  
    82         Induce rules from the data set. 
    83  
    84  
    85     .. method:: get_itemsets(data) 
    86  
    87         Returns a list of pairs. The first element of a pair is a tuple with  
    88         indices of features in the item set (negative for sparse data).  
    89         The second element is a list of indices of examples supporting the item set, 
    90         that is, containing all its items. If :obj:`store_examples` is False, the second 
    91         element is None. 
    92  
    93 We shall test the rule inducer on a dataset consisting of a brief description 
    94 of Spanish Inquisition, given by Palin et al: 
    95  
    96     NOBODY expects the Spanish Inquisition! Our chief weapon is surprise...surprise and fear...fear and surprise.... Our two weapons are fear and surprise...and ruthless efficiency.... Our *three* weapons are fear, surprise, and ruthless efficiency...and an almost fanatical devotion to the Pope.... Our *four*...no... *Amongst* our weapons.... Amongst our weaponry...are such elements as fear, surprise.... I'll come in again. 
    97  
    98     NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope, and nice red uniforms - Oh damn! 
    99      
    100 The text needs to be cleaned of punctuation marks and capital letters at beginnings of the sentences, each sentence needs to be put in a new line and commas need to be inserted between the words. 
    101  
    102 Data example (:download:`inquisition.basket <code/inquisition.basket>`): 
    103  
    104 .. literalinclude:: code/inquisition.basket 
    105     
    106 Inducing the rules is trivial (uses :download:`inquisition.basket <code/inquisition.basket>`):: 
    107  
    108     import Orange 
    109     data = Orange.data.Table("inquisition") 
    110  
    111     rules = Orange.associate.AssociationRulesSparseInducer(data, support = 0.5) 
    112  
    113     print "%5s   %5s" % ("supp", "conf") 
    114     for r in rules: 
    115         print "%5.3f   %5.3f   %s" % (r.support, r.confidence, r) 
    116  
    117 The induced rules are surprisingly fear-full: :: 
    118  
    119     0.500   1.000   fear -> surprise 
    120     0.500   1.000   surprise -> fear 
    121     0.500   1.000   fear -> surprise our 
    122     0.500   1.000   fear surprise -> our 
    123     0.500   1.000   fear our -> surprise 
    124     0.500   1.000   surprise -> fear our 
    125     0.500   1.000   surprise our -> fear 
    126     0.500   0.714   our -> fear surprise 
    127     0.500   1.000   fear -> our 
    128     0.500   0.714   our -> fear 
    129     0.500   1.000   surprise -> our 
    130     0.500   0.714   our -> surprise 
    131  
    132 To get only a list of supported item sets, one should call the method 
    133 get_itemsets:: 
    134  
    135     inducer = Orange.associate.AssociationRulesSparseInducer(support = 0.5, store_examples = True) 
    136     itemsets = inducer.get_itemsets(data) 
    137      
    138 Now itemsets is a list of itemsets along with the examples supporting them 
    139 since we set store_examples to True. :: 
    140  
    141     >>> itemsets[5] 
    142     ((-11, -7), [1, 2, 3, 6, 9]) 
    143     >>> [data.domain[i].name for i in itemsets[5][0]] 
    144     ['surprise', 'our']    
    145      
    146 The sixth itemset contains features with indices -11 and -7, that is, the 
    147 words "surprise" and "our". The examples supporting it are those with 
    148 indices 1, 2, 3, 6 and 9. 
    149  
    150 This way of representing the itemsets is memory efficient and faster than using 
    151 objects like :obj:`~Orange.feature.Descriptor` and :obj:`~Orange.data.Instance`. 
    152  
    153 .. _non-sparse-examples: 
    154  
    155 =================== 
    156 Non-sparse data 
    157 =================== 
    158  
    159 :class:`AssociationRulesInducer` works with non-sparse data. 
    160 Unknown values are ignored, while values of features are not (as opposed to 
    161 the algorithm for sparse rules). In addition, the algorithm 
    162 can be directed to search only for classification rules, in which the only 
    163 feature on the right-hand side is the class variable. 
    164  
    165 .. class:: AssociationRulesInducer 
    166  
    167     All attributes can be set with the constructor.  
    168  
    169     .. attribute:: support 
    170      
    171        Minimal support for the rule. 
    172      
    173     .. attribute:: confidence 
    174      
    175         Minimal confidence for the rule. 
    176      
    177     .. attribute:: classification_rules 
    178      
    179         If True (default is False), the classification rules are constructed instead 
    180         of general association rules. 
    181  
    182     .. attribute:: store_examples 
    183      
    184         Store the examples covered by each rule and those 
    185         confirming it. 
    186          
    187     .. attribute:: max_item_sets 
    188      
    189         The maximal number of itemsets. 
    190  
    191     .. method:: __call__(data, weight_id) 
    192  
    193         Induce rules from the data set. 
    194  
    195     .. method:: get_itemsets(data) 
    196  
    197         Returns a list of pairs. The first element of a pair is a tuple with  
    198         indices of features in the item set (negative for sparse data).  
    199         The second element is a list of indices of examples supporting the item set, 
    200         that is, containing all its items. If :obj:`store_examples` is False, the second 
    201         element is None. 
    202  
    203 The example:: 
    204  
    205     import Orange 
    206  
    207     data = Orange.data.Table("lenses") 
    208  
    209     print "Association rules" 
    210     rules = Orange.associate.AssociationRulesInducer(data, support = 0.3) 
    211     for r in rules: 
    212         print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r) 
    213          
    214 The found rules are: :: 
    215  
    216     0.333  0.533  lenses=none -> prescription=hypermetrope 
    217     0.333  0.667  prescription=hypermetrope -> lenses=none 
    218     0.333  0.533  lenses=none -> astigmatic=yes 
    219     0.333  0.667  astigmatic=yes -> lenses=none 
    220     0.500  0.800  lenses=none -> tear_rate=reduced 
    221     0.500  1.000  tear_rate=reduced -> lenses=none 
    222      
    223 To limit the algorithm to classification rules, set classification_rules to True: :: 
    224  
    225     print "\\nClassification rules" 
    226     rules = Orange.associate.AssociationRulesInducer(data, support = 0.3, classification_rules = True) 
    227     for r in rules: 
    228         print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r) 
    229  
    230 The found rules are, naturally, a subset of the above rules: :: 
    231  
    232     0.333  0.667  prescription=hypermetrope -> lenses=none 
    233     0.333  0.667  astigmatic=yes -> lenses=none 
    234     0.500  1.000  tear_rate=reduced -> lenses=none 
    235      
    236 Itemsets are induced in a similar fashion as for sparse data, except that the 
    237 first element of the tuple, the item set, is represented not by indices of 
    238 features, as before, but with tuples (feature-index, value-index): :: 
    239  
    240     inducer = Orange.associate.AssociationRulesInducer(support = 0.3, store_examples = True) 
    241     itemsets = inducer.get_itemsets(data) 
    242     print itemsets[8] 
    243      
    244 This prints out :: 
    245  
    246     (((2, 1), (4, 0)), [2, 6, 10, 14, 15, 18, 22, 23]) 
    247      
    248 meaning that the ninth itemset contains the second value of the third feature 
    249 (2, 1), and the first value of the fifth (4, 0). 
    250  
    251 ======================= 
    252 Representation of rules 
    253 ======================= 
    254  
    255 An :class:`AssociationRule` represents a rule. In Orange, methods for  
    256 induction of association rules return the induced rules in 
    257 :class:`AssociationRules`, which is basically a list of :class:`AssociationRule` instances. 
    258  
    259 .. class:: AssociationRule 
    260  
    261     .. method:: __init__(left, right, n_applies_left, n_applies_right, n_applies_both, n_examples) 
    262      
    263         Constructs an association rule and computes all measures listed below. 
    264      
    265     .. method:: __init__(left, right, support, confidence) 
    266      
    267         Constructs an association rule and sets its support and confidence. If 
    268         you intend to pass on such a rule, you should set the other attributes 
    269         manually - AssociationRule's constructor cannot compute anything 
    270         from the arguments support and confidence. 
    271      
    272     .. method:: __init__(rule) 
    273      
    274         Given an association rule as the argument, the constructor copies the 
    275         rule. 
    276   
    277     .. attribute:: left, right 
    278      
    279         The left and the right side of the rule. Both are given as :class:`Orange.data.Instance`. 
    280         In rules created by :class:`AssociationRulesSparseInducer` from examples that 
    281         contain all values as meta-values, left and right are examples in the 
    282         same form. Otherwise, values in left that do not appear in the rule 
    283         are "don't care", and values in right are "don't know". Both can, 
    284         however, be tested by :meth:`~Orange.data.Value.is_special`. 
    285      
    286     .. attribute:: n_left, n_right 
    287      
    288         The number of features (i.e. defined values) on the left and on the 
    289         right side of the rule. 
    290      
    291     .. attribute:: n_applies_left, n_applies_right, n_applies_both 
    292      
    293         The number of (learning) examples that conform to the left, the right 
    294         and to both sides of the rule. 
    295      
    296     .. attribute:: n_examples 
    297      
    298         The total number of learning examples. 
    299      
    300     .. attribute:: support 
    301      
    302         n_applies_both/n_examples. 
    303  
    304     .. attribute:: confidence 
    305      
    306         n_applies_both/n_applies_left. 
    307      
    308     .. attribute:: coverage 
    309      
    310         n_applies_left/n_examples. 
    311  
    312     .. attribute:: strength 
    313      
    314         n_applies_right/n_applies_left. 
    315      
    316     .. attribute:: lift 
    317      
    318         n_examples * n_applies_both / (n_applies_left * n_applies_right). 
    319      
    320     .. attribute:: leverage 
    321      
    322         (n_applies_both * n_examples - n_applies_left * n_applies_right). 
    323      
    324     .. attribute:: examples, match_left, match_both 
    325      
    326         If store_examples was True during induction, examples contains a copy 
    327         of the example table used to induce the rules. Attributes match_left 
    328         and match_both are lists of integers, representing the indices of 
    329         examples which match the left-hand side of the rule and both sides, 
    330         respectively. 
    331     
    332     .. method:: applies_left(example) 
    333      
    334     .. method:: applies_right(example) 
    335      
    336     .. method:: applies_both(example) 
    337      
    338         Tells whether the example fits into the left, right or both sides of 
    339         the rule, respectively. If the rule is represented by sparse examples, 
    340         the given example must be sparse as well. 
    341      
    342 By default, association rule inducers do not store evidence about which example supports 
    343 which rule. Let us write a function that finds the examples that 
    344 confirm the rule (fit both sides of it) and those that contradict it (fit the 
    345 left-hand side but not the right). The example:: 
    346  
    347     import Orange 
    348  
    349     data = Orange.data.Table("lenses") 
    350  
    351     rules = Orange.associate.AssociationRulesInducer(data, support = 0.3) 
    352     rule = rules[0] 
    353  
    354     print 
    355     print "Rule: ", rule 
    356     print 
    357  
    358     print "Supporting examples:" 
    359     for example in data: 
    360         if rule.applies_both(example): 
    361             print example 
    362     print 
    363  
    364     print "Contradicting examples:" 
    365     for example in data: 
    366         if rule.applies_left(example) and not rule.applies_right(example): 
    367             print example 
    368     print 
    369  
    370 The latter printouts get simpler and faster if we instruct the inducer to 
    371 store the examples. We can then do, for instance, this: :: 
    372  
    373     print "Match left: " 
    374     print "\\n".join(str(rule.examples[i]) for i in rule.match_left) 
    375     print "\\nMatch both: " 
    376     print "\\n".join(str(rule.examples[i]) for i in rule.match_both) 
    377  
    378 The "contradicting" examples are then those whose indices are found in 
    379 match_left but not in match_both. A more memory-friendly and faster way 
    380 to compute this is as follows: :: 
    381  
    382     >>> [x for x in rule.match_left if not x in rule.match_both] 
    383     [0, 2, 8, 10, 16, 17, 18] 
    384     >>> set(rule.match_left) - set(rule.match_both) 
    385     set([0, 2, 8, 10, 16, 17, 18]) 
    386  
    387 =============== 
    388 Utilities 
    389 =============== 
    390  
    391 .. autofunction:: print_rules 
    392  
    393 .. autofunction:: sort 
    394  
    395 """ 
    396  
    3971from orange import \ 
    3982    AssociationRule, \ 
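
Note on the measures documented for AssociationRule in the docstring above: they can all be derived from four counts. The following is a minimal sketch in plain Python, not Orange code. The helper name `rule_measures` is hypothetical, and the counts in the usage lines are assumptions back-computed from the printed output for "tear_rate=reduced -> lenses=none", assuming a 24-example table.

```python
def rule_measures(n_applies_left, n_applies_right, n_applies_both, n_examples):
    """Compute the measures listed in the AssociationRule docstring from raw counts.

    Hypothetical helper for illustration; it is not part of Orange.
    """
    n_left = float(n_applies_left)
    n_right = float(n_applies_right)
    n_both = float(n_applies_both)
    n = float(n_examples)
    return {
        "support": n_both / n,
        "confidence": n_both / n_left,
        "coverage": n_left / n,
        "strength": n_right / n_left,
        "lift": n * n_both / (n_left * n_right),
        # Unnormalized, exactly as the formula is written in the docstring.
        "leverage": n_both * n - n_left * n_right,
    }


# Assumed counts: 12 examples match the left side, 15 match the right side,
# 12 match both, out of 24 examples in total.
measures = rule_measures(12, 15, 12, 24)
print("%5.3f  %5.3f" % (measures["support"], measures["confidence"]))
```

With these assumed counts the sketch prints a support of 0.500 and a confidence of 1.000, matching the rule as printed in the docstring.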
  • Orange/classification/knn.py

    r9724 r9994  
    136136into training (80%) and testing (20%) instances. We will use the former  
    137137for "training" the classifier and test it on five testing instances  
    138 randomly selected from a part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     138randomly selected from a part of (:download:`knnlearner.py <code/knnlearner.py>`): 
    139139 
    140140.. literalinclude:: code/knnExample1.py 
     
    157157decide to do so, the distance_constructor must be set to an instance 
    158158of one of the classes for distance measuring. This can be seen in the following 
    159 part of (:download:`knnlearner.py <code/knnlearner.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     159part of (:download:`knnlearner.py <code/knnlearner.py>`): 
    160160 
    161161.. literalinclude:: code/knnExample2.py 
     
    271271-------- 
    272272 
    273 The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`, uses :download:`lenses.tab <code/lenses.tab>`) 
     273The following script (:download:`knnInstanceDistance.py <code/knnInstanceDistance.py>`) 
    274274shows how to find the five nearest neighbors of the first instance 
    275275in the lenses dataset. 
  • Orange/classification/logreg.py

    r9936 r9959  
    188188        self.__dict__.update(kwds) 
    189189 
    190     def __call__(self, instance, resultType = Orange.classification.Classifier.GetValue): 
    191         # classification not implemented yet. For now its use is only to provide regression coefficients and its statistics 
    192         pass 
     190    def __call__(self, instance, result_type = Orange.classification.Classifier.GetValue): 
     191        # classification not implemented yet. For now its use is only to 
     192        # provide regression coefficients and its statistics 
     193        raise NotImplementedError 
    193194     
    194195 
    195196class LogRegLearnerGetPriors(object): 
    196     def __new__(cls, instances=None, weightID=0, **argkw): 
     197    def __new__(cls, instances=None, weight_id=0, **argkw): 
    197198        self = object.__new__(cls) 
    198199        if instances: 
    199200            self.__init__(**argkw) 
    200             return self.__call__(instances, weightID) 
     201            return self.__call__(instances, weight_id) 
    201202        else: 
    202203            return self 
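
Note on the `__new__` pattern in this hunk: constructing the learner without data returns the learner itself, while constructing it with data trains immediately and returns the result of `__call__`. The sketch below reproduces only that dispatch in plain Python. `MeanLearner` and its `rounding` option are hypothetical names used for illustration and do not exist in Orange.

```python
class MeanLearner(object):
    """Hypothetical learner that 'learns' the mean of a list of numbers."""

    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = object.__new__(cls)
        if instances is not None:
            # Data was given: initialize by hand, then return the call result.
            # The result is not a MeanLearner, so Python skips __init__.
            self.__init__(**kwargs)
            return self(instances, weight_id)
        # No data: return the learner; Python then calls __init__ itself.
        return self

    def __init__(self, instances=None, weight_id=0, **kwargs):
        self.rounding = kwargs.get("rounding", 2)

    def __call__(self, instances, weight_id=0):
        mean = sum(instances) / float(len(instances))
        return round(mean, self.rounding)


learner = MeanLearner(rounding=1)   # a learner, to be called with data later
direct = MeanLearner([1, 2, 3])     # trained in one step
print(type(learner).__name__, learner([1, 2, 3]), direct)
```

Renaming `weightID` to `weight_id` in such a signature affects callers that pass the argument by keyword; positional callers are unaffected.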
  • Orange/classification/lookup.py

    r9919 r9994  
    2121they usually reside in :obj:`~Orange.feature.Descriptor.get_value_from` fields of constructed 
    2222features to facilitate their automatic computation. For instance, 
    23 the following script shows how to translate the :download:`monks-1.tab <code/monks-1.tab>` data set 
     23the following script shows how to translate the `monks-1.tab` data set 
    2424features into a more useful subset that will only include the features 
    2525``a``, ``b``, ``e``, and features that will tell whether ``a`` and ``b`` are equal and 
    2626whether ``e`` is 1 (don't bother about the details, they follow later;  
    27 :download:`lookup-lookup.py <code/lookup-lookup.py>`, uses: :download:`monks-1.tab <code/monks-1.tab>`): 
     27:download:`lookup-lookup.py <code/lookup-lookup.py>`): 
    2828 
    2929.. literalinclude:: code/lookup-lookup.py 
     
    158158        Let's see some indices for randomly chosen examples from the original table. 
    159159         
    160         part of :download:`lookup-lookup.py <code/lookup-lookup.py>` (uses: :download:`monks-1.tab <code/monks-1.tab>`): 
     160        part of :download:`lookup-lookup.py <code/lookup-lookup.py>`: 
    161161 
    162162        .. literalinclude:: code/lookup-lookup.py 
     
    254254    is called and the resulting classifier is returned instead of the learner. 
    255255 
    256 part of :download:`lookup-table.py <code/lookup-table.py>` (uses: :download:`monks-1.tab <code/monks-1.tab>`): 
     256part of :download:`lookup-table.py <code/lookup-table.py>`: 
    257257 
    258258.. literalinclude:: code/lookup-table.py 
     
    323323the class_var. It doesn't set the :obj:`Orange.feature.Descriptor.get_value_from`, though. 
    324324 
    325 part of :download:`lookup-table.py <code/lookup-table.py>` (uses: :download:`monks-1.tab <code/monks-1.tab>`):: 
     325part of :download:`lookup-table.py <code/lookup-table.py>`:: 
    326326 
    327327    import Orange 
     
    336336alternative call arguments, it offers an easy way to observe feature 
    337337interactions. For this purpose, we shall omit e, and construct a 
    338 ClassifierByDataTable from a and b only (part of :download:`lookup-table.py <code/lookup-table.py>`; uses: :download:`monks-1.tab <code/monks-1.tab>`): 
     338ClassifierByDataTable from a and b only (part of :download:`lookup-table.py <code/lookup-table.py>`): 
    339339 
    340340.. literalinclude:: code/lookup-table.py 
     
    511511       
    512512 
    513 def lookup_from_data(examples, weight=0, learnerForUnknown=None): 
     513from Orange.misc import deprecated_keywords 
     514@deprecated_keywords({"learnerForUnknown":"learner_for_unknown"}) 
     515def lookup_from_data(examples, weight=0, learner_for_unknown=None): 
    514516    if len(examples.domain.attributes) <= 3: 
    515517        lookup = lookup_from_bound(examples.domain.class_var, 
     
    528530        # ClassifierByDataTable, let it deal with them 
    529531        return LookupLearner(examples, weight, 
    530                              learnerForUnknown=learnerForUnknown) 
     532                             learner_for_unknown=learner_for_unknown) 
    531533 
    532534    else: 
    533535        return LookupLearner(examples, weight, 
    534                              learnerForUnknown=learnerForUnknown) 
     536                             learner_for_unknown=learner_for_unknown) 
    535537         
    536538         
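
Note on the `deprecated_keywords` decorator used in this hunk: its purpose is to keep callers that still pass `learnerForUnknown` working after the rename. The real implementation lives in `Orange.misc` and is not shown in this changeset, so the sketch below is a hypothetical stand-in (`deprecated_keywords_sketch`) that only illustrates how such a keyword mapping could work.

```python
import functools
import warnings


def deprecated_keywords_sketch(name_map):
    """Rename old keyword arguments to new ones before calling the function.

    Hypothetical stand-in for illustration; the real
    Orange.misc.deprecated_keywords may behave differently.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in name_map.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError("got both %r and %r" % (old, new))
                    warnings.warn("%r is deprecated; use %r" % (old, new),
                                  DeprecationWarning, stacklevel=2)
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorate


@deprecated_keywords_sketch({"learnerForUnknown": "learner_for_unknown"})
def lookup_from_data(examples, weight=0, learner_for_unknown=None):
    # Toy body: just report which learner the caller passed.
    return learner_for_unknown
```

Both `lookup_from_data(data, learnerForUnknown=x)` and `lookup_from_data(data, learner_for_unknown=x)` reach the function with the new keyword name; passing both at once is rejected in this sketch.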
  • Orange/classification/majority.py

    r9671 r9994  
    6262This "learning algorithm" will most often be used as a baseline, 
    6363that is, to determine if some other learning algorithm provides 
    64 any information about the class (:download:`majority-classification.py <code/majority-classification.py>`, 
    65 uses: :download:`monks-1.tab <code/monks-1.tab>`): 
     64any information about the class (:download:`majority-classification.py <code/majority-classification.py>`): 
    6665 
    6766.. literalinclude:: code/majority-classification.py 
  • Orange/classification/rules.py

    r9936 r9994  
    3232Usage is consistent with typical learner usage in Orange: 
    3333 
    34 :download:`rules-cn2.py <code/rules-cn2.py>` (uses :download:`titanic.tab <code/titanic.tab>`) 
     34:download:`rules-cn2.py <code/rules-cn2.py>` 
    3535 
    3636.. literalinclude:: code/rules-cn2.py 
     
    155155in description of classes that follows it: 
    156156 
    157 part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`) 
     157part of :download:`rules-customized.py <code/rules-customized.py>` 
    158158 
    159159.. literalinclude:: code/rules-customized.py 
     
    181181different beam width. This is simply written as: 
    182182 
    183 part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`) 
     183part of :download:`rules-customized.py <code/rules-customized.py>` 
    184184 
    185185.. literalinclude:: code/rules-customized.py 
  • Orange/classification/wrappers.py

    r9671 r9961  
    55import Orange.evaluation.scoring 
    66 
     7from Orange.misc import deprecated_members 
     8 
    79class StepwiseLearner(Orange.core.Learner): 
    8   def __new__(cls, data=None, weightId=None, **kwargs): 
     10  def __new__(cls, data=None, weight_id=None, **kwargs): 
    911      self = Orange.core.Learner.__new__(cls, **kwargs) 
    1012      if data is not None: 
    1113          self.__init__(**kwargs) 
    12           return self(data, weightId) 
     14          return self(data, weight_id) 
    1315      else: 
    1416          return self 
    1517       
    1618  def __init__(self, **kwds): 
    17     self.removeThreshold = 0.3 
    18     self.addThreshold = 0.2 
     19    self.remove_threshold = 0.3 
     20    self.add_threshold = 0.2 
    1921    self.stat, self.statsign = scoring.CA, 1 
    20     self.__dict__.update(kwds) 
     22    for name, val in kwds.items(): 
     23        setattr(self, name, val) 
    2124 
    22   def __call__(self, examples, weightID = 0, **kwds): 
     25  def __call__(self, data, weight_id = 0, **kwds): 
    2326    import Orange.evaluation.testing, Orange.evaluation.scoring, statc 
    2427     
    2528    self.__dict__.update(kwds) 
    2629 
    27     if self.removeThreshold < self.addThreshold: 
    28         raise ValueError("'removeThreshold' should be larger or equal to 'addThreshold'") 
     30    if self.remove_threshold < self.add_threshold: 
     31        raise ValueError("'remove_threshold' should be larger or equal to 'add_threshold'") 
    2932 
    30     classVar = examples.domain.classVar 
     33    classVar = data.domain.classVar 
    3134     
    32     indices = Orange.core.MakeRandomIndicesCV(examples, folds = getattr(self, "folds", 10)) 
     35    indices = Orange.core.MakeRandomIndicesCV(data, folds = getattr(self, "folds", 10)) 
    3336    domain = Orange.data.Domain([], classVar) 
    3437 
    35     res = Orange.evaluation.testing.test_with_indices([self.learner], Orange.data.Table(domain, examples), indices) 
     38    res = Orange.evaluation.testing.test_with_indices([self.learner], Orange.data.Table(domain, data), indices) 
    3639     
    3740    oldStat = self.stat(res)[0] 
    38     oldStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.splitByIterations(res)] 
     41    oldStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.split_by_iterations(res)] 
    3942    print ".", oldStat, domain 
    4043    stop = False 
     
    4548            for attr in domain.attributes: 
    4649                newdomain = Orange.data.Domain(filter(lambda x: x!=attr, domain.attributes), classVar) 
    47                 res = Orange.evaluation.testing.test_with_indices([self.learner], (Orange.data.Table(newdomain, examples), weightID), indices) 
     50                res = Orange.evaluation.testing.test_with_indices([self.learner], (Orange.data.Table(newdomain, data), weight_id), indices) 
    4851                 
    4952                newStat = self.stat(res)[0] 
    50                 newStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.splitByIterations(res)]  
     53                newStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.split_by_iterations(res)]  
    5154                print "-", newStat, newdomain 
    5255                ## If stat has increased (ie newStat is better than bestStat) 
     
    5457                    if cmp(newStat, oldStat) == self.statsign: 
    5558                        bestStat, bestStats, bestAttr = newStat, newStats, attr 
    56                     elif statc.wilcoxont(oldStats, newStats)[1] > self.removeThreshold: 
     59                    elif statc.wilcoxont(oldStats, newStats)[1] > self.remove_threshold: 
    5760                            bestStat, bestStats, bestAttr = newStat, newStats, attr 
    5861            if bestStat: 
     
    6366 
    6467        bestStat, bestAttr = oldStat, None 
    65         for attr in examples.domain.attributes: 
     68        for attr in data.domain.attributes: 
    6669            if not attr in domain.attributes: 
    6770                newdomain = Orange.data.Domain(domain.attributes + [attr], classVar) 
    68                 res = Orange.evaluation.testing.test_with_indices([self.learner], (Orange.data.Table(newdomain, examples), weightID), indices) 
     71                res = Orange.evaluation.testing.test_with_indices([self.learner], (Orange.data.Table(newdomain, data), weight_id), indices) 
    6972                 
    7073                newStat = self.stat(res)[0] 
    71                 newStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.splitByIterations(res)]  
     74                newStats = [self.stat(x)[0] for x in Orange.evaluation.scoring.split_by_iterations(res)]  
    7275                print "+", newStat, newdomain 
    7376 
    7477                ## If stat has increased (ie newStat is better than bestStat) 
    75                 if cmp(newStat, bestStat) == self.statsign and statc.wilcoxont(oldStats, newStats)[1] < self.addThreshold: 
     78                if cmp(newStat, bestStat) == self.statsign and statc.wilcoxont(oldStats, newStats)[1] < self.add_threshold: 
    7679                    bestStat, bestStats, bestAttr = newStat, newStats, attr 
    7780        if bestAttr: 
     
    8184            print "added", bestAttr.name 
    8285 
    83     return self.learner(Orange.data.Table(domain, examples), weightID) 
     86    return self.learner(Orange.data.Table(domain, data), weight_id) 
    8487 
     88StepwiseLearner = deprecated_members( 
     89                    {"removeThreshold": "remove_threshold", 
     90                     "addThreshold": "add_threshold"}, 
     91                    )(StepwiseLearner) 
  • Orange/clustering/kmeans.py

    r9725 r9994  
    1616 
    1717The following code runs k-means clustering and prints out the cluster indexes 
    18 for the last 10 data instances (:download:`kmeans-run.py <code/kmeans-run.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     18for the last 10 data instances (:download:`kmeans-run.py <code/kmeans-run.py>`): 
    1919 
    2020.. literalinclude:: code/kmeans-run.py 
     
    2929o be computed at each iteration we have to set :obj:`minscorechange`, but we can 
    3030leave it at 0 or even set it to a negative value, which allows the score to deteriorate 
    31 by some amount (:download:`kmeans-run-callback.py <code/kmeans-run-callback.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     31by some amount (:download:`kmeans-run-callback.py <code/kmeans-run-callback.py>`): 
    3232 
    3333.. literalinclude:: code/kmeans-run-callback.py 
     
    4444    Iteration: 8, changes: 0, score: 9.8624 
    4545 
    46 Call-back above is used for reporting of the progress, but may as well call a function that plots a selection data projection with corresponding centroid at a given step of the clustering. This is exactly what we did with the following script (:download:`kmeans-trace.py <code/kmeans-trace.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     46The callback above reports progress, but it could just as well call a function that plots a selected data projection with the corresponding centroids at a given step of the clustering. This is exactly what we did with the following script (:download:`kmeans-trace.py <code/kmeans-trace.py>`):
    4747 
    4848.. literalinclude:: code/kmeans-trace.py 
     
    8282and finds more optimal centroids. The following code compares three different  
    8383initialization methods (random, diversity-based and hierarchical clustering-based)  
    84 in terms of how fast they converge (:download:`kmeans-cmp-init.py <code/kmeans-cmp-init.py>`, uses :download:`iris.tab <code/iris.tab>`, 
    85 :download:`housing.tab <code/housing.tab>`, :download:`vehicle.tab <code/vehicle.tab>`): 
     84in terms of how fast they converge (:download:`kmeans-cmp-init.py <code/kmeans-cmp-init.py>`): 
    8685 
    8786.. literalinclude:: code/kmeans-cmp-init.py 
     
    9695 
    9796The following code computes the silhouette score for k=2..7 and plots a  
    98 silhuette plot for k=3 (:download:`kmeans-silhouette.py <code/kmeans-silhouette.py>`, uses :download:`iris.tab <code/iris.tab>`): 
     97silhouette plot for k=3 (:download:`kmeans-silhouette.py <code/kmeans-silhouette.py>`):
    9998 
    10099.. literalinclude:: code/kmeans-silhouette.py 
     
    176175score_distance_to_centroids.minimize = True 
    177176 
    178 def score_conditionalEntropy(km): 
     177def score_conditional_entropy(km): 
    179178    """UNIMPLEMENTED cluster quality measured by conditional entropy""" 
    180     pass 
    181  
    182 def score_withinClusterDistance(km): 
     179    raise NotImplementedError
     180 
     181def score_within_cluster_distance(km): 
    183182    """UNIMPLEMENTED weighted average within-cluster pairwise distance""" 
    184     pass 
    185  
    186 score_withinClusterDistance.minimize = True 
    187  
    188 def score_betweenClusterDistance(km): 
     183    raise NotImplementedError
     184 
     185score_within_cluster_distance.minimize = True 
     186 
     187def score_between_cluster_distance(km): 
    189188    """Sum of distances from elements to 'nearest miss' centroids""" 
    190189    return sum(min(km.distance(c, d) for j,c in enumerate(km.centroids) if j!=km.clusters[i]) for i,d in enumerate(km.data)) 
     190 
     191from Orange.misc import deprecated_function_name 
     192score_betweenClusterDistance = deprecated_function_name(score_between_cluster_distance) 
    191193 
    192194def score_silhouette(km, index=None): 
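
Illustration (not part of the changeset): the rename above keeps the old camelCase name alive through `deprecated_function_name`. The sketch below shows one way such an alias could work, assuming it only needs to warn and forward the call. `deprecated_alias`, `score_between` and `scoreBetween` are hypothetical names for this example and are not the `Orange.misc` implementation.

```python
import functools
import warnings


def deprecated_alias(func, old_name):
    """Return a wrapper that warns, then forwards to ``func``.

    Hypothetical helper for illustration only.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated; use %s instead" % (old_name, func.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper


def score_between(values):
    """Stand-in for a renamed scoring function."""
    return sum(values)


# The old name stays callable but emits a DeprecationWarning on each call.
scoreBetween = deprecated_alias(score_between, "scoreBetween")
```

Callers of the old name keep working and see a warning, while new code can use the underscore name directly.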
  • Orange/clustering/mixture.py

    r9919 r9976  
    277277    """ Computes the gaussian mixture model from an Orange data-set. 
    278278    """ 
    279     def __new__(cls, data=None, weightId=None, **kwargs): 
     279    def __new__(cls, data=None, weight_id=None, **kwargs): 
    280280        self = object.__new__(cls) 
    281281        if data is not None: 
    282282            self.__init__(**kwargs) 
    283             return self.__call__(data, weightId) 
     283            return self.__call__(data, weight_id) 
    284284        else: 
    285285            return self 
     
    289289        self.init_function = init_function 
    290290         
    291     def __call__(self, data, weightId=None): 
     291    def __call__(self, data, weight_id=None): 
    292292        from Orange.preprocess import Preprocessor_impute, DomainContinuizer 
    293293#        data = Preprocessor_impute(data) 
  • Orange/data/sample.py

    r9697 r9994  
    265265Let us construct a list of indices that would assign half of examples 
    266266to the first set and a quarter to the second and third (part of 
    267 :download:`randomindicesn.py <code/randomindicesn.py>`, uses :download:`lenses.tab <code/lenses.tab>`): 
     267:download:`randomindicesn.py <code/randomindicesn.py>`): 
    268268 
    269269.. literalinclude:: code/randomindicesn.py 
     
    292292indices for 10 examples for 5-fold cross validation. For the latter, 
    293293we shall only pass the number of examples, which, of course, prevents 
    294 the stratification. Part of :download:`randomindicescv.py <code/randomindicescv.py>`, uses :download:`lenses.tab <code/lenses.tab>`): 
     294the stratification. Part of :download:`randomindicescv.py <code/randomindicescv.py>`): 
    295295 
    296296.. literalinclude:: code/randomindicescv.py 
  • Orange/data/utils.py

    r9936 r9986  
    1 """\ 
    2 ************************** 
    3 Data Utilities (``utils``) 
    4 ************************** 
    5  
    6 Common operations on :class:`Orange.data.Table`. 
    7  
    8 """ 
    91#from __future__ import absolute_import 
     2 
     3from Orange.core import TransformValue, \ 
     4    Ordinal2Continuous, \ 
     5    Discrete2Continuous, \ 
     6    NormalizeContinuous, \ 
     7    MapIntValue 
     8 
    109 
    1110import random 
  • Orange/ensemble/__init__.py

    r9671 r9994  
    4646validation and observe classification accuracy. 
    4747 
    48 :download:`ensemble.py <code/ensemble.py>` (uses :download:`lymphography.tab <code/lymphography.tab>`) 
     48:download:`ensemble.py <code/ensemble.py>` 
    4949 
    5050.. literalinclude:: code/ensemble.py 
     
    8282to a tree learner on a liver disorder (bupa) and housing data sets. 
    8383 
    84 :download:`ensemble-forest.py <code/ensemble-forest.py>` (uses :download:`bupa.tab <code/bupa.tab>`, :download:`housing.tab <code/housing.tab>`) 
     84:download:`ensemble-forest.py <code/ensemble-forest.py>` 
    8585 
    8686.. literalinclude:: code/ensemble-forest.py 
     
    106106and minExamples are both set to 5. 
    107107 
    108 :download:`ensemble-forest2.py <code/ensemble-forest2.py>` (uses :download:`bupa.tab <code/bupa.tab>`) 
     108:download:`ensemble-forest2.py <code/ensemble-forest2.py>` 
    109109 
    110110.. literalinclude:: code/ensemble-forest2.py 
     
    144144:class:`Orange.data.Table` for details). 
    145145 
    146 :download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>` (uses :download:`iris.tab <code/iris.tab>`) 
     146:download:`ensemble-forest-measure.py <code/ensemble-forest-measure.py>` 
    147147 
    148148.. literalinclude:: code/ensemble-forest-measure.py 
  • Orange/evaluation/scoring.py

    r9892 r10005  
    44 
    55import Orange 
    6 from Orange import statc 
     6from Orange import statc, corn 
    77from Orange.misc import deprecated_keywords 
     8from Orange.evaluation import testing 
    89 
    910#### Private stuff 
     
    126127MAE = ME 
    127128 
     129 
     130class ConfusionMatrix: 
     131    """ 
     132    Classification result summary 
     133 
     134    .. attribute:: TP 
     135 
     136        True Positive predictions 
     137 
     138    .. attribute:: TN 
     139 
     140        True Negative predictions 
     141 
     142    .. attribute:: FP 
     143 
     144        False Positive predictions 
     145 
     146    .. attribute:: FN 
     147 
     148        False Negative predictions 
     149    """ 
     150    def __init__(self): 
     151        self.TP = self.FN = self.FP = self.TN = 0.0 
     152 
     153    @deprecated_keywords({"predictedPositive": "predicted_positive", 
     154                          "isPositive": "is_positive"}) 
     155    def addTFPosNeg(self, predicted_positive, is_positive, weight = 1.0): 
     156        """ 
     157        Update confusion matrix with result of a single classification 
     158 
     159        :param predicted_positive: positive class value was predicted 
     160        :param is_positive: correct class value is positive 
     161        :param weight: weight of the selected instance 
     162         """ 
     163        if predicted_positive: 
     164            if is_positive: 
     165                self.TP += weight 
     166            else: 
     167                self.FP += weight 
     168        else: 
     169            if is_positive: 
     170                self.FN += weight 
     171            else: 
     172                self.TN += weight 
     173 
     174 
    128175######################################################################### 
    129176# PERFORMANCE MEASURES: 
     
    305352# Scores for evaluation of classifiers 
    306353 
    307 @deprecated_keywords({"reportSE": "report_se"}) 
    308 def CA(res, report_se = False, **argkw): 
    309     """ Computes classification accuracy, i.e. percentage of matches between 
    310     predicted and actual class. The function returns a list of classification 
    311     accuracies of all classifiers tested. If reportSE is set to true, the list 
    312     will contain tuples with accuracies and standard errors. 
    313      
    314     If results are from multiple repetitions of experiments (like those 
    315     returned by Orange.evaluation.testing.crossValidation or 
    316     Orange.evaluation.testing.proportionTest) the 
    317     standard error (SE) is estimated from deviation of classification 
    318     accuracy accross folds (SD), as SE = SD/sqrt(N), where N is number 
    319     of repetitions (e.g. number of folds). 
    320      
    321     If results are from a single repetition, we assume independency of 
    322     instances and treat the classification accuracy as distributed according 
    323     to binomial distribution. This can be approximated by normal distribution, 
    324     so we report the SE of sqrt(CA*(1-CA)/N), where CA is classification 
    325     accuracy and N is number of test instances. 
    326      
    327     Instead of ExperimentResults, this function can be given a list of 
    328     confusion matrices (see below). Standard errors are in this case 
    329     estimated using the latter method. 
    330     """ 
    331     if res.number_of_iterations==1: 
    332         if type(res)==ConfusionMatrix: 
    333             div = nm.TP+nm.FN+nm.FP+nm.TN 
    334             check_non_zero(div) 
    335             ca = [(nm.TP+nm.TN)/div] 
    336         else: 
    337             CAs = [0.0]*res.number_of_learners 
    338             if argkw.get("unweighted", 0) or not res.weights: 
    339                 totweight = gettotsize(res) 
    340                 for tex in res.results: 
    341                     CAs = map(lambda res, cls: res+(cls==tex.actual_class), CAs, tex.classes) 
    342             else: 
    343                 totweight = 0. 
    344                 for tex in res.results: 
    345                     CAs = map(lambda res, cls: res+(cls==tex.actual_class and tex.weight), CAs, tex.classes) 
    346                     totweight += tex.weight 
    347             check_non_zero(totweight) 
    348             ca = [x/totweight for x in CAs] 
    349              
     354class CAClass(object): 
     355    CONFUSION_MATRIX = 0 
     356    CONFUSION_MATRIX_LIST = 1 
     357    CLASSIFICATION = 2 
     358    CROSS_VALIDATION = 3 
     359 
     360    @deprecated_keywords({"reportSE": "report_se"}) 
     361    def __call__(self, test_results, report_se = False, unweighted=False): 
     362        """Return percentage of matches between predicted and actual class. 
     363 
     364        :param test_results: :obj:`~Orange.evaluation.testing.ExperimentResults` 
     365                             or :obj:`ConfusionMatrix`. 
     366        :param report_se: include standard error in result. 
     367        :rtype: list of scores, one for each learner. 
     368 
     369        Standard errors are estimated from deviation of CAs across folds (if 
     370        test_results were produced by cross_validation) or approximated under 
     371        the assumption of normal distribution otherwise. 
     372        """ 
     373        input_type = self.get_input_type(test_results) 
     374        if input_type == self.CONFUSION_MATRIX: 
     375            return self.from_confusion_matrix(test_results, report_se) 
     376        elif input_type == self.CONFUSION_MATRIX_LIST: 
     377            return self.from_confusion_matrix_list(test_results, report_se) 
     378        elif input_type == self.CLASSIFICATION: 
     379            return self.from_classification_results( 
     380                                        test_results, report_se, unweighted) 
     381        elif input_type == self.CROSS_VALIDATION: 
     382            return self.from_crossvalidation_results( 
     383                                        test_results, report_se, unweighted) 
     384 
     385    def from_confusion_matrix(self, cm, report_se): 
     386        all_predictions = cm.TP+cm.FN+cm.FP+cm.TN 
     387        check_non_zero(all_predictions) 
     388        ca = (cm.TP+cm.TN)/all_predictions 
     389 
     390        if report_se: 
     391            return ca, math.sqrt(ca*(1-ca)/all_predictions)
     392        else: 
     393            return ca 
     394 
     395    def from_confusion_matrix_list(self, confusion_matrices, report_se): 
     396        return [self.from_confusion_matrix(cm, report_se) for cm in confusion_matrices]
     397 
     398    def from_classification_results(self, test_results, report_se, unweighted): 
     399        CAs = [0.0]*test_results.number_of_learners 
     400        totweight = 0. 
     401        for tex in test_results.results: 
     402            w = 1. if unweighted else tex.weight 
     403            CAs = map(lambda res, cls: res+(cls==tex.actual_class and w), CAs, tex.classes) 
     404            totweight += w 
     405        check_non_zero(totweight) 
     406        ca = [x/totweight for x in CAs] 
     407 
    350408        if report_se: 
    351409            return [(x, math.sqrt(x*(1-x)/totweight)) for x in ca]
    352410        else: 
    353411            return ca 
    354          
    355     else: 
    356         CAsByFold = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)] 
    357         foldN = [0.0]*res.number_of_iterations 
    358  
    359         if argkw.get("unweighted", 0) or not res.weights: 
    360             for tex in res.results: 
    361                 for lrn in range(res.number_of_learners): 
    362                     CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) 
    363                 foldN[tex.iteration_number] += 1 
    364         else: 
    365             for tex in res.results: 
    366                 for lrn in range(res.number_of_learners): 
    367                     CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and tex.weight 
    368                 foldN[tex.iteration_number] += tex.weight 
     412 
     413    def from_crossvalidation_results(self, test_results, report_se, unweighted): 
     414        CAsByFold = [[0.0]*test_results.number_of_iterations for i in range(test_results.number_of_learners)] 
     415        foldN = [0.0]*test_results.number_of_iterations 
     416 
     417        for tex in test_results.results: 
     418            w = 1. if unweighted else tex.weight 
     419            for lrn in range(test_results.number_of_learners): 
     420                CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and w 
     421            foldN[tex.iteration_number] += w 
    369422 
    370423        return statistics_by_folds(CAsByFold, foldN, report_se, False) 
    371424 
    372  
    373 # Obsolete, but kept for compatibility 
    374 def CA_se(res, **argkw): 
    375     return CA(res, True, **argkw) 
     425    def get_input_type(self, test_results): 
     426        if isinstance(test_results, ConfusionMatrix): 
     427            return self.CONFUSION_MATRIX 
     428        elif isinstance(test_results, testing.ExperimentResults): 
     429            if test_results.number_of_iterations == 1: 
     430                return self.CLASSIFICATION 
     431            else: 
     432                return self.CROSS_VALIDATION 
     433        elif isinstance(test_results, list): 
     434            return self.CONFUSION_MATRIX_LIST 
     435 
     436 
     437 
     438CA = CAClass() 
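
Illustration (not part of the changeset): the removed `CA` docstring states that, for a single repetition, the standard error is `sqrt(CA*(1-CA)/N)`, where `N` is the number of test instances. The sketch below works that formula through for one confusion matrix. `accuracy_with_se` is a hypothetical standalone function that takes plain counts rather than a `ConfusionMatrix`.

```python
import math


def accuracy_with_se(tp, fn, fp, tn):
    """Return (CA, SE) using the binomial approximation SE = sqrt(CA*(1-CA)/N)."""
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    ca = (tp + tn) / float(n)
    se = math.sqrt(ca * (1.0 - ca) / n)
    return ca, se


# 80 of 100 instances classified correctly.
ca, se = accuracy_with_se(tp=40, fn=10, fp=10, tn=40)
```

With these counts the accuracy is 0.8 and the standard error is sqrt(0.16/100) = 0.04.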
    376439 
    377440@deprecated_keywords({"reportSE": "report_se"}) 
     
    562625    else: 
    563626        return apply(Friedman, (res, statistics), argkw) 
    564      
    565 class ConfusionMatrix: 
    566     """ Class ConfusionMatrix stores data about false and true 
    567     predictions compared to real class. It stores the number of 
    568     True Negatives, False Positive, False Negatives and True Positives. 
    569     """ 
    570     def __init__(self): 
    571         self.TP = self.FN = self.FP = self.TN = 0.0 
    572  
    573     def addTFPosNeg(self, predictedPositive, isPositive, weight = 1.0): 
    574         if predictedPositive: 
    575             if isPositive: 
    576                 self.TP += weight 
    577             else: 
    578                 self.FP += weight 
    579         else: 
    580             if isPositive: 
    581                 self.FN += weight 
    582             else: 
    583                 self.TN += weight 
    584  
    585  
    586 @deprecated_keywords({"classIndex": "class_index"}) 
    587 def confusion_matrices(res, class_index=-1, **argkw): 
    588     """ This function can compute two different forms of confusion matrix: 
    589     one in which a certain class is marked as positive and the other(s) 
    590     negative, and another in which no class is singled out. The way to 
    591     specify what we want is somewhat confusing due to backward 
    592     compatibility issues. 
    593     """ 
    594     tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)] 
     627 
     628 
     629@deprecated_keywords({"res": "test_results", 
     630                      "classIndex": "class_index"}) 
     631def confusion_matrices(test_results, class_index=-1, 
     632                       unweighted=False, cutoff=.5): 
     633    """ 
     634    Return confusion matrices for test_results. 
     635 
     636    :param test_results: test results 
     637    :param class_index: index of class value for which the confusion matrices 
     638                        are to be computed. 
     639    :param unweighted: ignore instance weights. 
     640    :param cutoff: probability cutoff above which the positive class is predicted
     641 
     642    :rtype: list of :obj:`ConfusionMatrix` 
     643    """ 
     644    tfpns = [ConfusionMatrix() for i in range(test_results.number_of_learners)] 
    595645     
    596646    if class_index<0: 
    597         numberOfClasses = len(res.class_values) 
     647        numberOfClasses = len(test_results.class_values) 
    598648        if class_index < -1 or numberOfClasses > 2: 
    599             cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)] 
    600             if argkw.get("unweighted", 0) or not res.weights: 
    601                 for tex in res.results: 
     649            cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(test_results.number_of_learners)] 
     650            if unweighted or not test_results.weights: 
     651                for tex in test_results.results: 
    602652                    trueClass = int(tex.actual_class) 
    603653                    for li, pred in enumerate(tex.classes): 
     
    606656                            cm[li][trueClass][predClass] += 1 
    607657            else: 
    608                 for tex in enumerate(res.results): 
     658                for tex in test_results.results:
    609659                    trueClass = int(tex.actual_class) 
    610660                    for li, pred in enumerate(tex.classes):
     
    614664            return cm 
    615665             
    616         elif res.baseClass>=0: 
    617             class_index = res.baseClass 
     666        elif test_results.baseClass>=0: 
     667            class_index = test_results.baseClass 
    618668        else: 
    619669            class_index = 1 
    620              
    621     cutoff = argkw.get("cutoff") 
    622     if cutoff: 
    623         if argkw.get("unweighted", 0) or not res.weights: 
    624             for lr in res.results: 
     670 
     671    if cutoff != .5: 
     672        if unweighted or not test_results.weights: 
     673            for lr in test_results.results: 
    625674                isPositive=(lr.actual_class==class_index) 
    626                 for i in range(res.number_of_learners): 
     675                for i in range(test_results.number_of_learners): 
    627676                    tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive) 
    628677        else: 
    629             for lr in res.results: 
     678            for lr in test_results.results: 
    630679                isPositive=(lr.actual_class==class_index) 
    631                 for i in range(res.number_of_learners): 
     680                for i in range(test_results.number_of_learners): 
    632681                    tfpns[i].addTFPosNeg(lr.probabilities[i][class_index]>cutoff, isPositive, lr.weight) 
    633682    else: 
    634         if argkw.get("unweighted", 0) or not res.weights: 
    635             for lr in res.results: 
     683        if unweighted or not test_results.weights: 
     684            for lr in test_results.results: 
    636685                isPositive=(lr.actual_class==class_index) 
    637                 for i in range(res.number_of_learners): 
     686                for i in range(test_results.number_of_learners): 
    638687                    tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive) 
    639688        else: 
    640             for lr in res.results: 
     689            for lr in test_results.results: 
    641690                isPositive=(lr.actual_class==class_index) 
    642                 for i in range(res.number_of_learners): 
     691                for i in range(test_results.number_of_learners): 
    643692                    tfpns[i].addTFPosNeg(lr.classes[i]==class_index, isPositive, lr.weight) 
    644693    return tfpns 
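
Illustration (not part of the changeset): the cutoff branch above marks an instance as predicted positive when the probability of the chosen class exceeds `cutoff`, then adds its weight to one of the four cells. A minimal standalone sketch of that tally for a single classifier follows. `confusion_counts` and its dictionary result are assumptions made for this example, not Orange API.

```python
def confusion_counts(probabilities, actual, cutoff=0.5, weights=None):
    """Tally weighted TP/FP/FN/TN for one binary classifier.

    probabilities: predicted probability of the positive class per instance
    actual: booleans, True where the instance really is positive
    """
    if weights is None:
        weights = [1.0] * len(actual)
    counts = {"TP": 0.0, "FP": 0.0, "FN": 0.0, "TN": 0.0}
    for p, is_positive, w in zip(probabilities, actual, weights):
        predicted_positive = p > cutoff
        if predicted_positive:
            counts["TP" if is_positive else "FP"] += w
        else:
            counts["FN" if is_positive else "TN"] += w
    return counts


counts = confusion_counts([0.9, 0.6, 0.4, 0.2], [True, False, True, False])
strict = confusion_counts([0.9, 0.6, 0.4, 0.2], [True, False, True, False],
                          cutoff=0.7)
```

Raising the cutoff from 0.5 to 0.7 moves the second instance from a false positive to a true negative, which is the trade-off the `cutoff` parameter exposes.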
     
    651700@deprecated_keywords({"confusionMatrix": "confusion_matrix"}) 
    652701def confusion_chi_square(confusion_matrix): 
     702    """ 
     703    Return chi square statistic of the confusion matrix 
     704    (higher value indicates that prediction is not by chance). 
     705    """ 
     706    if isinstance(confusion_matrix, ConfusionMatrix) or \ 
     707       not isinstance(confusion_matrix[1], list): 
     708        return _confusion_chi_square(confusion_matrix) 
     709    else: 
     710        return map(_confusion_chi_square, confusion_matrix) 
     711 
     712def _confusion_chi_square(confusion_matrix): 
     713    if isinstance(confusion_matrix, ConfusionMatrix): 
     714        c = confusion_matrix 
     715        confusion_matrix = [[c.TP, c.FN], [c.FP, c.TN]] 
    653716    dim = len(confusion_matrix) 
    654717    rowPriors = [sum(r) for r in confusion_matrix] 
    655     colPriors = [sum([r[i] for r in confusion_matrix]) for i in range(dim)] 
     718    colPriors = [sum(r[i] for r in confusion_matrix) for i in range(dim)] 
    656719    total = sum(rowPriors) 
    657720    rowPriors = [r/total for r in rowPriors] 
     
    666729    df = (dim-1)**2 
    667730    return ss, df, statc.chisqprob(ss, df) 
    668          
    669      
    670 def sens(confm): 
    671     """Return sensitivity (recall rate) over the given confusion matrix.""" 
    672     if type(confm) == list: 
    673         return [sens(cm) for cm in confm] 
    674     else: 
    675         tot = confm.TP+confm.FN 
     731 
     732@deprecated_keywords({"confm": "confusion_matrix"}) 
     733def sens(confusion_matrix): 
     734    """ 
     735    Return `sensitivity <http://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_ 
     736    (proportion of actual positives which are correctly identified as such). 
     737    """ 
     738    if type(confusion_matrix) == list: 
     739        return [sens(cm) for cm in confusion_matrix] 
     740    else: 
     741        tot = confusion_matrix.TP+confusion_matrix.FN 
    676742        if tot < 1e-6: 
    677743            import warnings 
     
    679745            return -1 
    680746 
    681         return confm.TP/tot 
    682  
    683 def recall(confm): 
    684     """Return recall rate (sensitivity) over the given confusion matrix.""" 
    685     return sens(confm) 
    686  
    687  
    688 def spec(confm): 
    689     """Return specificity over the given confusion matrix.""" 
    690     if type(confm) == list: 
    691         return [spec(cm) for cm in confm] 
    692     else: 
    693         tot = confm.FP+confm.TN 
     747        return confusion_matrix.TP/tot 
     748 
     749 
     750@deprecated_keywords({"confm": "confusion_matrix"}) 
     751def recall(confusion_matrix): 
     752    """ 
     753    Return `recall <http://en.wikipedia.org/wiki/Precision_and_recall>`_ 
     754    (fraction of relevant instances that are retrieved). 
     755    """ 
     756    return sens(confusion_matrix) 
     757 
     758 
     759@deprecated_keywords({"confm": "confusion_matrix"}) 
     760def spec(confusion_matrix): 
     761    """ 
     762    Return `specificity <http://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_ 
     763    (proportion of negatives which are correctly identified). 
     764    """ 
     765    if type(confusion_matrix) == list: 
     766        return [spec(cm) for cm in confusion_matrix] 
     767    else: 
     768        tot = confusion_matrix.FP+confusion_matrix.TN 
    694769        if tot < 1e-6: 
    695770            import warnings 
    696771            warnings.warn("Can't compute specificity: one or both classes have no instances") 
    697772            return -1 
    698         return confm.TN/tot 
    699    
    700  
    701 def PPV(confm): 
    702     """Return positive predictive value (precision rate) over the given confusion matrix.""" 
    703     if type(confm) == list: 
    704         return [PPV(cm) for cm in confm] 
    705     else: 
    706         tot = confm.TP+confm.FP 
     773        return confusion_matrix.TN/tot 
     774 
     775 
     776@deprecated_keywords({"confm": "confusion_matrix"}) 
     777def PPV(confusion_matrix): 
     778    """ 
     779    Return `positive predictive value <http://en.wikipedia.org/wiki/Positive_predictive_value>`_ 
     780    (proportion of subjects with positive test results who are correctly diagnosed).""" 
     781    if type(confusion_matrix) == list: 
     782        return [PPV(cm) for cm in confusion_matrix] 
     783    else: 
     784        tot = confusion_matrix.TP+confusion_matrix.FP 
    707785        if tot < 1e-6: 
    708786            import warnings 
    709787            warnings.warn("Can't compute PPV: one or both classes have no instances") 
    710788            return -1 
    711         return confm.TP/tot 
    712  
    713  
    714 def precision(confm): 
    715     """Return precision rate (positive predictive value) over the given confusion matrix.""" 
    716     return PPV(confm) 
    717  
    718  
    719 def NPV(confm): 
    720     """Return negative predictive value over the given confusion matrix.""" 
    721     if type(confm) == list: 
    722         return [NPV(cm) for cm in confm] 
    723     else: 
    724         tot = confm.FN+confm.TN 
     789        return confusion_matrix.TP/tot 
     790 
     791 
     792@deprecated_keywords({"confm": "confusion_matrix"}) 
     793def precision(confusion_matrix): 
     794    """ 
     795    Return `precision <http://en.wikipedia.org/wiki/Precision_and_recall>`_ 
     796    (retrieved instances that are relevant). 
     797    """ 
     798    return PPV(confusion_matrix) 
     799 
     800@deprecated_keywords({"confm": "confusion_matrix"}) 
     801def NPV(confusion_matrix): 
     802    """Return `negative predictive value <http://en.wikipedia.org/wiki/Negative_predictive_value>`_ 
     803     (proportion of subjects with a negative test result who are correctly 
     804     diagnosed). 
     805     """ 
     806    if type(confusion_matrix) == list: 
     807        return [NPV(cm) for cm in confusion_matrix] 
     808    else: 
     809        tot = confusion_matrix.FN+confusion_matrix.TN 
    725810        if tot < 1e-6: 
    726811            import warnings 
    727812            warnings.warn("Can't compute NPV: one or both classes have no instances") 
    728813            return -1 
    729         return confm.TN/tot 
    730  
    731 def F1(confm): 
    732     """Return F1 score (harmonic mean of precision and recall) over the given confusion matrix.""" 
    733     if type(confm) == list: 
    734         return [F1(cm) for cm in confm] 
    735     else: 
    736         p = precision(confm) 
    737         r = recall(confm) 
     814        return confusion_matrix.TN/tot 
     815 
     816@deprecated_keywords({"confm": "confusion_matrix"}) 
     817def F1(confusion_matrix): 
     818    """Return `F1 score <http://en.wikipedia.org/wiki/F1_score>`_ 
     819    (harmonic mean of precision and recall).""" 
     820    if type(confusion_matrix) == list: 
     821        return [F1(cm) for cm in confusion_matrix] 
     822    else: 
     823        p = precision(confusion_matrix) 
     824        r = recall(confusion_matrix) 
    738825        if p + r > 0: 
    739826            return 2. * p * r / (p + r) 
     
    743830            return -1 
    744831 
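The hunks above rename `confm` to `confusion_matrix` without changing the arithmetic. As a reading aid, here is a minimal, self-contained sketch of the same formulas. The `Counts` tuple and the lowercase function names are hypothetical stand-ins, not Orange's API, and `recall` is assumed to be TP / (TP + FN) since its body is not part of this diff.

```python
from collections import namedtuple
import warnings

# Hypothetical stand-in for a binary confusion matrix; only the four
# counts used by the functions in this diff are modelled.
Counts = namedtuple("Counts", "TP FP FN TN")


def _ratio(numerator, total, name):
    """Return numerator/total, or -1 with a warning when total is ~0."""
    if total < 1e-6:
        warnings.warn("Can't compute %s: one or both classes have no instances" % name)
        return -1
    return numerator / total


def ppv(cm):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return _ratio(cm.TP, cm.TP + cm.FP, "PPV")


def npv(cm):
    """Negative predictive value: TN / (TN + FN)."""
    return _ratio(cm.TN, cm.FN + cm.TN, "NPV")


def recall(cm):
    """Recall (sensitivity): TP / (TP + FN). Assumed, not shown in the diff."""
    return _ratio(cm.TP, cm.TP + cm.FN, "recall")


def f1(cm):
    """Harmonic mean of precision and recall, or -1 if undefined."""
    p, r = ppv(cm), recall(cm)
    if p + r > 0:
        return 2.0 * p * r / (p + r)
    return -1


cm = Counts(TP=40, FP=10, FN=20, TN=30)
print(ppv(cm), npv(cm), f1(cm))
```

The -1 sentinel mirrors the behaviour visible in the diff; callers have to check for it explicitly because it is also a valid float.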
    745 def Falpha(confm, alpha=1.0): 
     832 
     833@deprecated_keywords({"confm": "confusion_matrix"}) 
     834def Falpha(confusion_matrix, alpha=1.0): 
    746835    """Return the alpha-mean of precision and recall over the given confusion matrix.""" 
    747     if type(confm) == list: 
    748         return [Falpha(cm, alpha=alpha) for cm in confm] 
    749     else: 
    750         p = precision(confm) 
    751         r = recall(confm) 
     836    if type(confusion_matrix) == list: 
     837        return [Falpha(cm, alpha=alpha) for cm in confusion_matrix] 
     838    else: 
     839        p = precision(confusion_matrix) 
     840        r = recall(confusion_matrix) 
    752841        return (1. + alpha) * p * r / (alpha * p + r) 
    753      
    754 def MCC(confm): 
    755     ''' 
    756     Return Matthews correlation coefficient over the given confusion matrix. 
    757  
    758     MCC is calculated as follows: 
    759     MCC = (TP*TN - FP*FN) / sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) ) 
    760      
    761     [1] Matthews, B.W., Comparison of the predicted and observed secondary  
    762     structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451 
    763  
    764     code by Boris Gorelik 
    765     ''' 
    766     if type(confm) == list: 
    767         return [MCC(cm) for cm in confm] 
    768     else: 
    769         truePositive = confm.TP 
    770         trueNegative = confm.TN 
    771         falsePositive = confm.FP 
    772         falseNegative = confm.FN  
     842 
     843 
     844@deprecated_keywords({"confm": "confusion_matrix"}) 
     845def MCC(confusion_matrix): 
     846    """ 
     847    Return `Matthews correlation coefficient <http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_ 
     848    (correlation coefficient between the observed and predicted binary classifications) 
     849    """ 
     850    # code by Boris Gorelik 
     851    if type(confusion_matrix) == list: 
     852        return [MCC(cm) for cm in confusion_matrix] 
     853    else: 
     854        truePositive = confusion_matrix.TP 
     855        trueNegative = confusion_matrix.TN 
     856        falsePositive = confusion_matrix.FP 
     857        falseNegative = confusion_matrix.FN 
    773858           
    774859        try:    
     
    791876 
    792877@deprecated_keywords({"bIsListOfMatrices": "b_is_list_of_matrices"}) 
    793 def scotts_pi(confm, b_is_list_of_matrices=True): 
     878def scotts_pi(confusion_matrix, b_is_list_of_matrices=True): 
    794879   """Compute Scott's Pi for measuring inter-rater agreement for nominal data 
    795880 
     
    798883   raters. 
    799884 
    800    @param confm: confusion matrix, or list of confusion matrices. To obtain 
     885   @param confusion_matrix: confusion matrix, or list of confusion matrices. To obtain 
    801886                           non-binary confusion matrix, call 
    802887                           Orange.evaluation.scoring.compute_confusion_matrices and set the 
     
    811896   if b_is_list_of_matrices: 
    812897       try: 
    813            return [scotts_pi(cm, b_is_list_of_matrices=False) for cm in confm] 
     898           return [scotts_pi(cm, b_is_list_of_matrices=False) for cm in confusion_matrix] 
    814899       except TypeError: 
    815900           # Nevermind the parameter, maybe this is a "conventional" binary 
    816901           # confusion matrix and bIsListOfMatrices was specified by mistake 
    817            return scottsPiSingle(confm, bIsListOfMatrices=False) 
     902           return scotts_pi(confusion_matrix, b_is_list_of_matrices=False) 
    818903   else: 
    819        if isinstance(confm, ConfusionMatrix): 
    820            confm = numpy.array( [[confm.TP, confm.FN], 
    821                    [confm.FP, confm.TN]], dtype=float) 
     904       if isinstance(confusion_matrix, ConfusionMatrix): 
     905           confusion_matrix = numpy.array( [[confusion_matrix.TP, confusion_matrix.FN], 
     906                   [confusion_matrix.FP, confusion_matrix.TN]], dtype=float) 
    822907       else: 
    823            confm = numpy.array(confm, dtype=float) 
    824  
    825        marginalSumOfRows = numpy.sum(confm, axis=0) 
    826        marginalSumOfColumns = numpy.sum(confm, axis=1) 
     908           confusion_matrix = numpy.array(confusion_matrix, dtype=float) 
     909 
     910       marginalSumOfRows = numpy.sum(confusion_matrix, axis=0) 
     911       marginalSumOfColumns = numpy.sum(confusion_matrix, axis=1) 
    827912       jointProportion = (marginalSumOfColumns + marginalSumOfRows)/ \ 
    828                            (2.0 * numpy.sum(confm, axis=None)) 
     913                           (2.0 * numpy.sum(confusion_matrix, axis=None)) 
    829914       # In the eq. above, 2.0 is what the Wikipedia page calls 
    830915       # the number of annotators. Here we have two annotators: 
     
    833918 
    834919       prExpected = numpy.sum(jointProportion ** 2, axis=None) 
    835        prActual = numpy.sum(numpy.diag(confm), axis=None)/numpy.sum(confm, axis=None) 
     920       prActual = numpy.sum(numpy.diag(confusion_matrix), axis=None)/numpy.sum(confusion_matrix, axis=None) 
    836921 
    837922       ret = (prActual - prExpected) / (1.0 - prExpected) 
     
    846931    tuples (aROC, standard error). 
    847932    """ 
    848     import corn 
    849933    useweights = res.weights and not argkw.get("unweighted", 0) 
    850934    problists, tots = corn.computeROCCumulative(res, class_index, useweights) 
     
    879963@deprecated_keywords({"classIndex": "class_index"}) 
    880964def compare_2_AUCs(res, lrn1, lrn2, class_index=-1, **argkw): 
    881     import corn 
    882965    return corn.compare2ROCs(res, lrn1, lrn2, class_index, res.weights and not argkw.get("unweighted")) 
    883966 
     
    890973    1-specificity and y is sensitivity. 
    891974    """ 
    892     import corn 
    893975    problists, tots = corn.computeROCCumulative(res, class_index) 
    894976 
     
    9261008@deprecated_keywords({"keepConcavities": "keep_concavities"}) 
    9271009def ROC_add_point(P, R, keep_concavities=1): 
    928     if keepConcavities: 
     1010    if keep_concavities: 
    9291011        R.append(P) 
    9301012    else: 
     
    9461028                      "keepConcavities": "keep_concavities"}) 
    9471029def TC_compute_ROC(res, class_index=-1, keep_concavities=1): 
    948     import corn 
    9491030    problists, tots = corn.computeROCCumulative(res, class_index) 
    9501031 
     
    11711252@deprecated_keywords({"classIndex": "class_index"}) 
    11721253def compute_calibration_curve(res, class_index=-1): 
    1173     import corn 
    11741254    ## merge multiple iterations into one 
    11751255    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels) 
     
    12341314@deprecated_keywords({"classIndex": "class_index"}) 
    12351315def compute_lift_curve(res, class_index=-1): 
    1236     import corn 
    12371316    ## merge multiple iterations into one 
    12381317    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifier_names, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded, test_type=res.test_type, labels=res.labels) 
     
    12711350def compute_CDT(res, class_index=-1, **argkw): 
    12721351    """Obsolete, don't use""" 
    1273     import corn 
    12741352    if class_index<0: 
    12751353        if res.baseClass>=0: 
     
    13611439                      "divideByIfIte": "divide_by_if_ite"}) 
    13621440def AUC_ij(ite, class_index1, class_index2, use_weights = True, all_ite = None, divide_by_if_ite = 1.0): 
    1363     import corn 
    13641441    return AUC_x(corn.computeCDTPair, ite, all_ite, divide_by_if_ite, (class_index1, class_index2, use_weights)) 
    13651442 
     
    13691446                      "useWeights": "use_weights", 
    13701447                      "divideByIfIte": "divide_by_if_ite"}) 
    1371 def AUC_i(ite, class_index, use_weights = True, all_ite = None, divide_by_if_ite = 1.0): 
    1372     import corn 
     1448def AUC_i(ite, class_index, use_weights = True, all_ite = None, 
     1449          divide_by_if_ite = 1.0): 
    13731450    return AUC_x(corn.computeCDT, ite, all_ite, divide_by_if_ite, (class_index, use_weights)) 
    13741451 
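Most hunks in this file rely on `deprecated_keywords` to keep old keyword names such as `confm` or `classIndex` working after the rename. The decorator's implementation is not part of this changeset, so the following is only a sketch of how such a decorator can be written. Whether Orange's version emits a warning or rejects mixed old and new names is an assumption here, and `total` is a toy function used for illustration.

```python
import functools
import warnings


def deprecated_keywords(name_map):
    """Return a decorator that maps old keyword names to their new names.

    Sketch only: the warning and the TypeError on duplicates are
    assumptions, not confirmed behaviour of Orange's decorator.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in name_map.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError("got both %r and %r" % (old, new))
                    warnings.warn("%r is deprecated; use %r" % (old, new),
                                  DeprecationWarning, stacklevel=2)
                    # Forward the value under the new keyword name.
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_keywords({"confm": "confusion_matrix"})
def total(confusion_matrix):
    """Toy function standing in for PPV, NPV, F1 and the like."""
    return sum(confusion_matrix)


print(total(confusion_matrix=[1, 2, 3]))  # new keyword
print(total(confm=[1, 2, 3]))             # old keyword still accepted
```

Positional calls are unaffected, which is why renaming the parameter inside the function body is safe for existing callers.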
  • Orange/feature/discretization.py

    r9927 r9944  
    9393 
    9494        from Orange.feature import discretization 
    95         bayes = Orange.classification.bayes.NaiveBayesLearner() 
     95        bayes = Orange.classification.bayes.Learner() 
    9696        disc = orange.Preprocessor_discretize(method=discretization.EquiNDiscretization(numberOfIntervals=10)) 
    9797        dBayes = discretization.DiscretizedLearner(bayes, name='disc bayes') 
     
    127127  def __call__(self, example, resultType = orange.GetValue): 
    128128    return self.classifier(example, resultType) 
    129  
    130 class DiscretizeTable(object): 
    131     """Discretizes all continuous features of the data table. 
    132  
    133     :param data: data to discretize. 
    134     :type data: :class:`Orange.data.Table` 
    135  
    136     :param features: data features to discretize. None (default) to discretize all features. 
    137     :type features: list of :class:`Orange.feature.Descriptor` 
    138  
    139     :param method: feature discretization method. 
    140     :type method: :class:`Discretization` 
    141     """ 
    142     def __new__(cls, data=None, features=None, discretize_class=False, method=EqualFreq(n=3)): 
    143         if data is None: 
    144             self = object.__new__(cls) 
    145             return self 
    146         else: 
    147             self = cls(features=features, discretize_class=discretize_class, method=method) 
    148             return self(data) 
    149  
    150     def __init__(self, features=None, discretize_class=False, method=EqualFreq(n=3)): 
    151         self.features = features 
    152         self.discretize_class = discretize_class 
    153         self.method = method 
    154  
    155     def __call__(self, data): 
    156         pp = Preprocessor_discretize(attributes=self.features, discretizeClass=self.discretize_class) 
    157         pp.method = self.method 
    158         return pp(data) 
    159  
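The removed `DiscretizeTable` class uses a `__new__` override so that the class can be called either as a constructor (no data) or as a one-step function (data given). The idiom can be sketched without Orange as follows; `Scaler` and `factor` are hypothetical names chosen for illustration.

```python
class Scaler(object):
    """Hypothetical transformer: multiplies every number by `factor`."""

    def __new__(cls, data=None, factor=2):
        if data is None:
            # No data: behave like a normal constructor and let
            # Python call __init__ on the new instance.
            return object.__new__(cls)
        # Data given: build a configured instance and apply it at once.
        # The result is not an instance of cls, so __init__ is skipped.
        self = cls(factor=factor)
        return self(data)

    def __init__(self, factor=2):
        self.factor = factor

    def __call__(self, data):
        return [x * self.factor for x in data]


scaler = Scaler(factor=3)          # two-step use: construct, then call
print(scaler([1, 2]))
print(Scaler([1, 2], factor=3))    # one-step use: returns the result directly
```

The trade-off of this design is that `Scaler(data)` does not return a `Scaler`, which can surprise readers and tools that expect a constructor call to yield an instance.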
  • Orange/feature/scoring.py

    r9919 r9988  
    1 """ 
    2 ##################### 
    3 Scoring (``scoring``) 
    4 ##################### 
    5  
    6 .. index:: feature scoring 
    7  
    8 .. index::  
    9    single: feature; feature scoring 
    10  
    11 A feature score is an assessment of the usefulness of the feature for 
    12 prediction of the dependent (class) variable. 
    13  
    14 To compute the information gain of feature "tear_rate" in the Lenses data set (loaded into ``data``) use: 
    15  
    16     >>> meas = Orange.feature.scoring.InfoGain() 
    17     >>> print meas("tear_rate", data) 
    18     0.548794925213 
    19  
    20 Other scoring methods are listed in :ref:`classification` and 
    21 :ref:`regression`. Various ways to call them are described on 
    22 :ref:`callingscore`. 
    23  
    24 Instead of first constructing the scoring object (e.g. ``InfoGain``) and 
    25 then using it, it is usually more convenient to do both in a single step:: 
    26  
    27     >>> print Orange.feature.scoring.InfoGain("tear_rate", data) 
    28     0.548794925213 
    29  
    30 This approach is much slower for Relief, which can efficiently compute scores 
    31 for all features in parallel. 
    32  
    33 It is also possible to score features that do not appear in the data 
    34 but can be computed from it. A typical case is discretized features: 
    35  
    36 .. literalinclude:: code/scoring-info-iris.py 
    37     :lines: 7-11 
    38  
    39 The following example computes feature scores, both with 
    40 :obj:`score_all` and by scoring each feature individually, and prints out  
    41 the best three features.  
    42  
    43 .. literalinclude:: code/scoring-all.py 
    44     :lines: 7- 
    45  
    46 The output:: 
    47  
    48     Feature scores for best three features (with score_all): 
    49     0.613 physician-fee-freeze 
    50     0.255 el-salvador-aid 
    51     0.228 synfuels-corporation-cutback 
    52  
    53     Feature scores for best three features (scored individually): 
    54     0.613 physician-fee-freeze 
    55     0.255 el-salvador-aid 
    56     0.228 synfuels-corporation-cutback 
    57  
    58 .. comment 
    59     The next script uses :obj:`GainRatio` and :obj:`Relief`. 
    60  
    61     .. literalinclude:: code/scoring-relief-gainRatio.py 
    62         :lines: 7- 
    63  
    64     Notice that on this data the ranks of features match:: 
    65          
    66         Relief GainRt Feature 
    67         0.613  0.752  physician-fee-freeze 
    68         0.255  0.444  el-salvador-aid 
    69         0.228  0.414  synfuels-corporation-cutback 
    70         0.189  0.382  crime 
    71         0.166  0.345  adoption-of-the-budget-resolution 
    72  
    73  
    74 .. _callingscore: 
    75  
    76 ======================= 
    77 Calling scoring methods 
    78 ======================= 
    79  
    80 To score a feature use :obj:`Score.__call__`. There are different 
    81 function signatures, which enable optimization. For instance, 
    82 most scoring methods first compute contingency tables from the 
    83 data. If these are already computed, they can be passed to the scorer 
    84 instead of the data. 
    85  
    86 Not all classes accept all kinds of arguments. :obj:`Relief`, 
    87 for instance, only supports the form with instances on the input. 
    88  
    89 .. method:: Score.__call__(attribute, data[, apriori_class_distribution][, weightID]) 
    90  
    91     :param attribute: the chosen feature, either as a descriptor,  
    92       index, or a name. 
    93     :type attribute: :class:`Orange.feature.Descriptor` or int or string 
    94     :param data: data. 
    95     :type data: `Orange.data.Table` 
    96     :param weightID: id for meta-feature with weight. 
    97  
    98     All scoring methods support the first signature. 
    99  
    100 .. method:: Score.__call__(attribute, domain_contingency[, apriori_class_distribution]) 
    101  
    102     :param attribute: the chosen feature, either as a descriptor,  
    103       index, or a name. 
    104     :type attribute: :class:`Orange.feature.Descriptor` or int or string 
    105     :param domain_contingency:  
    106     :type domain_contingency: :obj:`Orange.statistics.contingency.Domain` 
    107  
    108 .. method:: Score.__call__(contingency, class_distribution[, apriori_class_distribution]) 
    109  
    110     :param contingency: 
    111     :type contingency: :obj:`Orange.statistics.contingency.VarClass` 
    112     :param class_distribution: distribution of the class 
    113       variable. If :obj:`unknowns_treatment` is :obj:`IgnoreUnknowns`, 
    114       it should be computed on instances where feature value is 
    115       defined. Otherwise, class distribution should be the overall 
    116       class distribution. 
    117     :type class_distribution:  
    118       :obj:`Orange.statistics.distribution.Distribution` 
    119     :param apriori_class_distribution: Optional and most often 
    120       ignored. Useful if the scoring method makes any probability estimates 
    121       based on apriori class probabilities (such as the m-estimate). 
    122     :return: Feature score - the higher the value, the better the feature. 
    123       If the quality cannot be scored, return :obj:`Score.Rejected`. 
    124     :rtype: float or :obj:`Score.Rejected`. 
    125  
    126 The code below scores the same feature with :obj:`GainRatio`  
    127 using different calls. 
    128  
    129 .. literalinclude:: code/scoring-calls.py 
    130     :lines: 7- 
    131  
    132 .. _classification: 
    133  
    134 ========================================== 
    135 Feature scoring in classification problems 
    136 ========================================== 
    137  
    138 .. Undocumented: MeasureAttribute_IM, MeasureAttribute_chiSquare, MeasureAttribute_gainRatioA, MeasureAttribute_logOddsRatio, MeasureAttribute_splitGain. 
    139  
    140 .. index::  
    141    single: feature scoring; information gain 
    142  
    143 .. class:: InfoGain 
    144  
    145     Information gain; the expected decrease of entropy. See `page on wikipedia 
    146     <http://en.wikipedia.org/wiki/Information_gain_ratio>`_. 
    147  
    148 .. index::  
    149    single: feature scoring; gain ratio 
    150  
    151 .. class:: GainRatio 
    152  
    153     Information gain ratio; information gain divided by the entropy of the feature's 
    154     value. Introduced in [Quinlan1986]_ in order to avoid overestimation 
    155     of multi-valued features. It has been shown, however, that it still 
    156     overestimates features with multiple values. See `Wikipedia 
    157     <http://en.wikipedia.org/wiki/Information_gain_ratio>`_. 
    158  
    159 .. index::  
    160    single: feature scoring; gini index 
    161  
    162 .. class:: Gini 
    163  
    164     Gini index is the probability that two randomly chosen instances will have different 
    165     classes. See `Gini coefficient on Wikipedia <http://en.wikipedia.org/wiki/Gini_coefficient>`_. 
    166  
    167 .. index::  
    168    single: feature scoring; relevance 
    169  
    170 .. class:: Relevance 
    171  
    172     The potential value for decision rules. 
    173  
    174 .. index::  
    175    single: feature scoring; cost 
    176  
    177 .. class:: Cost 
    178  
    179     Evaluates features based on the cost decrease achieved by knowing the value of 
    180     feature, according to the specified cost matrix. 
    181  
    182     .. attribute:: cost 
    183       
    184         Cost matrix, see :obj:`Orange.classification.CostMatrix` for details. 
    185  
    186     If the cost of predicting the first class of an instance that is actually in 
    187     the second is 5, and the cost of the opposite error is 1, then an appropriate 
    188     score can be constructed as follows:: 
    189  
    190  
    191         >>> meas = Orange.feature.scoring.Cost() 
    192         >>> meas.cost = ((0, 5), (1, 0)) 
    193         >>> meas(3, data) 
    194         0.083333350718021393 
    195  
    196     Knowing the value of feature 3 would decrease the 
    197     classification cost by approximately 0.083 per instance. 
    198  
    199     .. comment   opposite error - is this term correct? TODO 
    200  
    201 .. index::  
    202    single: feature scoring; ReliefF 
    203  
    204 .. class:: Relief 
    205  
    206     Assesses features' ability to distinguish between very similar 
    207     instances from different classes. This scoring method was first 
    208     developed by Kira and Rendell and then improved by  Kononenko. The 
    209     class :obj:`Relief` works on discrete and continuous classes and 
    210     thus implements ReliefF and RReliefF. 
    211  
    212     ReliefF is slow since it needs to find k nearest neighbours for 
    213     each of m reference instances. As we normally compute ReliefF for 
    214     all features in the dataset, :obj:`Relief` caches the results for 
    215     all features, when called to score a certain feature.  When called 
    216     again, it uses the stored results if the domain and the data table 
    217     have not changed (data table version and the data checksum are 
    218     compared). Caching will only work if you use the same object.  
    219     Constructing new instances of :obj:`Relief` for each feature, 
    220     like this:: 
    221  
    222         for attr in data.domain.attributes: 
    223             print Orange.feature.scoring.Relief(attr, data) 
    224  
    225     runs much slower than reusing the same instance:: 
    226  
    227         meas = Orange.feature.scoring.Relief() 
    228         for attr in data.domain.attributes: 
    229             print meas(attr, data) 
    230  
    231  
    232     .. attribute:: k 
    233      
    234        Number of neighbours for each instance. Default is 5. 
    235  
    236     .. attribute:: m 
    237      
    238         Number of reference instances. Default is 100. When -1, all 
    239         instances are used as reference. 
    240  
    241     .. attribute:: check_cached_data 
    242      
    243         Check if the cached data is changed, which may be slow on large 
    244         tables.  Defaults to :obj:`True`, but should be disabled when it 
    245         is certain that the data will not change while the scorer is used. 
    246  
    247 .. autoclass:: Orange.feature.scoring.Distance 
    248     
    249 .. autoclass:: Orange.feature.scoring.MDL 
    250  
    251 .. _regression: 
    252  
    253 ====================================== 
    254 Feature scoring in regression problems 
    255 ====================================== 
    256  
    257 .. class:: Relief 
    258  
    259     Relief is used for regression in the same way as for 
    260     classification (see :class:`Relief` in classification 
    261     problems). 
    262  
    263 .. index::  
    264    single: feature scoring; mean square error 
    265  
    266 .. class:: MSE 
    267  
    268     Implements the mean square error score. 
    269  
    270     .. attribute:: unknowns_treatment 
    271      
    272         What to do with unknown values. See :obj:`Score.unknowns_treatment`. 
    273  
    274     .. attribute:: m 
    275      
    276         Parameter for m-estimate of error. Default is 0 (no m-estimate). 
    277  
    278  
    279  
    280 ============ 
    281 Base Classes 
    282 ============ 
    283  
    284 Implemented methods for scoring relevances of features are subclasses 
    285 of :obj:`Score`. Those that compute statistics on conditional 
    286 distributions of class values given the feature values are derived from 
    287 :obj:`ScoreFromProbabilities`. 
    288  
    289 .. class:: Score 
    290  
    291     Abstract base class for feature scoring. Its attributes describe which 
    292     types of features it can handle and which kind of data it requires. 
    293  
    294     **Capabilities** 
    295  
    296     .. attribute:: handles_discrete 
    297      
    298         Indicates whether the scoring method can handle discrete features. 
    299  
    300     .. attribute:: handles_continuous 
    301      
    302         Indicates whether the scoring method can handle continuous features. 
    303  
    304     .. attribute:: computes_thresholds 
    305      
    306         Indicates whether the scoring method implements the :obj:`threshold_function`. 
    307  
    308     **Input specification** 
    309  
    310     .. attribute:: needs 
    311      
    312         The type of data needed, indicated by one of the constants 
    313         below. Classes which use :obj:`DomainContingency` will also handle 
    314         generators. Those based on :obj:`Contingency_Class` will be able 
    315         to take generators and domain contingencies. 
    316  
    317         .. attribute:: Generator 
    318  
    319             Constant. Indicates that the scoring method needs an instance 
    320             generator on the input as, for example, :obj:`Relief`. 
    321  
    322         .. attribute:: DomainContingency 
    323  
    324             Constant. Indicates that the scoring method needs 
    325             :obj:`Orange.statistics.contingency.Domain`. 
    326  
    327         .. attribute:: Contingency_Class 
    328  
    329             Constant. Indicates that the scoring method needs the contingency 
    330             (:obj:`Orange.statistics.contingency.VarClass`), feature 
    331             distribution and the apriori class distribution (as most 
    332             scoring methods). 
    333  
    334     **Treatment of unknown values** 
    335  
    336     .. attribute:: unknowns_treatment 
    337  
    338         Defined in classes that are able to treat unknown values. It 
    339         should be set to one of the values below. 
    340  
    341         .. attribute:: IgnoreUnknowns 
    342  
    343             Constant. Instances for which the feature value is unknown are removed. 
    344  
    345         .. attribute:: ReduceByUnknown 
    346  
    347             Constant. Features with unknown values are  
    348             punished. The feature quality is reduced by the proportion of 
    349             unknown values. For impurity scores the impurity decreases 
    350             only where the value is defined and stays the same otherwise. 
    351  
    352         .. attribute:: UnknownsToCommon 
    353  
    354             Constant. Undefined values are replaced by the most common value. 
    355  
    356         .. attribute:: UnknownsAsValue 
    357  
    358             Constant. Unknown values are treated as a separate value. 
    359  
    360     **Methods** 
    361  
    362     .. method:: __call__ 
    363  
    364         Abstract. See :ref:`callingscore`. 
    365  
    366     .. method:: threshold_function(attribute, instances[, weightID]) 
    367      
    368         Abstract.  
    369          
    370         Assess different binarizations of the continuous feature 
    371         :obj:`attribute`.  Return a list of tuples. The first element 
    372         is a threshold (between two existing values), the second is 
    373         the quality of the corresponding binary feature, and the third 
    374         the distribution of instances below and above the threshold. 
    375         Not all scorers return the third element. 
    376  
    377         To show the computation of thresholds, we shall use the Iris 
    378         data set: 
    379  
    380         .. literalinclude:: code/scoring-info-iris.py 
    381             :lines: 13-16 
    382  
    383     .. method:: best_threshold(attribute, instances) 
    384  
    385         Return the best threshold for binarization, that is, the threshold 
    386         with which the resulting binary feature will have the optimal 
    387         score. 
    388  
    389         The script below prints out the best threshold for 
    390         binarization of a feature. ReliefF is used for scoring: 
    391  
    392         .. literalinclude:: code/scoring-info-iris.py 
    393             :lines: 18-19 
    394  
    395 .. class:: ScoreFromProbabilities 
    396  
    397     Bases: :obj:`Score` 
    398  
    399     Abstract base class for feature scoring methods that can be 
    400     computed from contingency matrices. 
    401  
    402     .. attribute:: estimator_constructor 
    403     .. attribute:: conditional_estimator_constructor 
    404      
    405         The classes that are used to estimate unconditional 
    406         and conditional probabilities of classes, respectively. 
    407         Defaults use relative frequencies; possible alternatives are, 
    408         for instance, :obj:`ProbabilityEstimatorConstructor_m` and 
    409         :obj:`ConditionalProbabilityEstimatorConstructor_ByRows` 
    410         (with estimator constructor again set to 
    411         :obj:`ProbabilityEstimatorConstructor_m`), respectively. 
    412  
    413 ============ 
    414 Other 
    415 ============ 
    416  
    417 .. autoclass:: Orange.feature.scoring.OrderAttributes 
    418    :members: 
    419  
    420 .. autofunction:: Orange.feature.scoring.score_all 
    421  
    422 .. rubric:: Bibliography 
    423  
    424 .. [Kononenko2007] Igor Kononenko, Matjaz Kukar: Machine Learning and Data Mining,  
    425   Woodhead Publishing, 2007. 
    426  
    427 .. [Quinlan1986] J R Quinlan: Induction of Decision Trees, Machine Learning, 1986. 
    428  
    429 .. [Breiman1984] L Breiman et al: Classification and Regression Trees, Chapman and Hall, 1984. 
    430  
    431 .. [Kononenko1995] I Kononenko: On biases in estimating multi-valued attributes, International Joint Conference on Artificial Intelligence, 1995. 
    432  
    433 """ 
    434  
    4351import Orange.core as orange 
    4362import Orange.misc 
     
    44511from orange import MeasureAttribute_relief as Relief 
    44612from orange import MeasureAttribute_MSE as MSE 
    447  
    44813 
    44914###### 
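The docstring removed above describes `InfoGain` as the expected decrease of entropy and `GainRatio` as information gain divided by the entropy of the feature's values. A minimal sketch of both definitions on plain Python lists follows. It is not Orange's implementation: it ignores unknown values, instance weights and continuous features, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(values, classes):
    """Class entropy minus the expected class entropy after splitting on values."""
    n = len(classes)
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - remainder


def gain_ratio(values, classes):
    """Information gain divided by the entropy of the feature's values."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # single-valued feature: nothing to split on
    return info_gain(values, classes) / split_info


classes = ["x", "x", "y", "y"]
two_valued = ["a", "a", "b", "b"]    # separates the classes with 2 values
four_valued = ["a", "b", "c", "d"]   # separates the classes with 4 values
print(info_gain(two_valued, classes), gain_ratio(two_valued, classes))
print(info_gain(four_valued, classes), gain_ratio(four_valued, classes))
```

Both toy features reach the same information gain, but the four-valued one gets a lower gain ratio, which is the correction for multi-valued features that the docstring attributes to [Quinlan1986].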
  • Orange/fixes/fix_changed_names.py

    r9942 r9991  
    236236           "orange.MajorityLearner":"Orange.classification.majority.MajorityLearner", 
    237237           "orange.DefaultClassifier":"Orange.classification.ConstantClassifier", 
     238 
     239           "orngSQL.SQLReader": "Orange.data.sql.SQLReader", 
     240           "orngSQL.SQLWriter": "Orange.data.sql.SQLWriter", 
    238241 
    239242           "orange.LookupLearner":"Orange.classification.lookup.LookupLearner", 
     
    586589           "orange.RandomGenerator": "Orange.misc.Random", 
    587590 
     591           "orange.TransformValue": "Orange.data.utils.TransformValue", 
     592           "orange.Ordinal2Continuous": "Orange.data.utils.Ordinal2Continuous", 
     593           "orange.Discrete2Continuous": "Orange.data.utils.Discrete2Continuous", 
     594           "orange.NormalizeContinuous": "Orange.data.utils.NormalizeContinuous", 
     595           "orange.MapIntValue": "Orange.data.utils.MapIntValue", 
     596 
    588597           } 
    589598 
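The mapping extended above pairs old dotted names with their new locations. How the fixer consumes the mapping is not shown in this changeset, so the sketch below only illustrates the idea with a plain regular expression over source text. `rename_dotted_names` is a hypothetical helper, and unlike a syntax-aware fixer it also rewrites matches inside strings and comments and does not add imports.

```python
import re

# A few entries copied from the mapping in this changeset.
RENAMES = {
    "orange.TransformValue": "Orange.data.utils.TransformValue",
    "orange.MapIntValue": "Orange.data.utils.MapIntValue",
    "orngSQL.SQLReader": "Orange.data.sql.SQLReader",
}


def rename_dotted_names(source, renames=RENAMES):
    """Replace whole dotted names in source text according to renames."""
    # Longest names first, so a longer name is never shadowed by a prefix.
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.])(" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )
    return pattern.sub(lambda m: renames[m.group(1)], source)


print(rename_dotted_names("reader = orngSQL.SQLReader()"))
```

The lookbehind and lookahead keep the pattern from touching longer identifiers that merely contain an old name, such as `myorange.MapIntValue` or `orange.MapIntValueX`.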
  • Orange/fixes/fix_orange_imports.py

    r9818 r9991  
    5858           "orngLinProj": "Orange.projection.linear", 
    5959           "orngEnviron": "Orange.misc.environ", 
     60           "orngSQL": "Orange.data.sql" 
    6061           } 
    6162 
  • Orange/misc/selection.py

    r9775 r9994  
    3232feature with the highest information gain. 
    3333 
    34 part of :download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` (uses :download:`lymphography.tab <code/lymphography.tab>`) 
     34part of :download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` 
    3535 
    3636.. literalinclude:: code/misc-selection-bestonthefly.py 
     
    4242like this: 
    4343 
    44 part of :download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` (uses :download:`lymphography.tab <code/lymphography.tab>`) 
     44part of :download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` 
    4545 
    4646.. literalinclude:: code/misc-selection-bestonthefly.py 
     
    5050The other way to do it is through indices. 
    5151 
    52 :download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` (uses :download:`lymphography.tab <code/lymphography.tab>`) 
     52:download:`misc-selection-bestonthefly.py <code/misc-selection-bestonthefly.py>` 
    5353 
    5454.. literalinclude:: code/misc-selection-bestonthefly.py 
  • Orange/multilabel/br.py

    r9671 r9994  
    4545 
    4646The following example demonstrates a straightforward invocation of 
    47 this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`, uses 
    48 :download:`emotions.tab <code/emotions.tab>`): 
     47this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`): 
    4948 
    5049.. literalinclude:: code/mlc-classify.py 
  • Orange/multilabel/brknn.py

    r9671 r9994  
    3030 
    3131The following example demonstrates a straightforward invocation of 
    32 this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`, uses 
    33 :download:`emotions.tab <code/emotions.tab>`): 
     32this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`): 
    3433 
    3534.. literalinclude:: code/mlc-classify.py 
  • Orange/multilabel/lp.py

    r9922 r9994  
    3434 
    3535The following example demonstrates a straightforward invocation of 
    36 this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`, uses 
    37 :download:`emotions.tab <code/emotions.tab>`): 
     36this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`): 
    3837 
    3938.. literalinclude:: code/mlc-classify.py 
  • Orange/multilabel/mlknn.py

    r9671 r9994  
    3636 
    3737The following example demonstrates a straightforward invocation of 
    38 this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`, uses 
    39 :download:`emotions.tab <code/emotions.tab>`): 
     38this algorithm (:download:`mlc-classify.py <code/mlc-classify.py>`): 
    4039 
    4140.. literalinclude:: code/mlc-classify.py 
  • Orange/multilabel/mulan.py

    r9927 r9994  
    4040 
    4141if __name__=="__main__": 
    42     table = trans_mulan_data("../../doc/datasets/emotions.xml","../../doc/datasets/emotions.arff") 
     42    table = trans_mulan_data("../doc/datasets/emotions.xml","../doc/datasets/emotions.arff") 
    4343     
    4444    for i in range(10): 
    4545        print table[i] 
    4646     
    47     table.save("emotions.tab") 
     47    table.save("/tmp/emotions.tab") 
  • Orange/multitarget/__init__.py

    r9671 r9994  
    2424:download:`generate_multitarget.py <code/generate_multitarget.py>`) to show 
    2525some basic functionalities (part of 
    26 :download:`multitarget.py <code/multitarget.py>`, uses 
    27 :download:`multitarget-synthetic.tab <code/multitarget-synthetic.tab>`). 
     26:download:`multitarget.py <code/multitarget.py>`). 
    2827 
    2928.. literalinclude:: code/multitarget.py 
  • Orange/multitarget/tree.py

    r9922 r9994  
    2424The following example demonstrates how to build a prediction model with 
    2525MultitargetTreeLearner and use it to predict (multiple) class values for 
    26 a given instance (:download:`multitarget.py <code/multitarget.py>`, 
    27 uses :download:`test-pls.tab <code/test-pls.tab>`): 
     26a given instance (:download:`multitarget.py <code/multitarget.py>`): 
    2827 
    2928.. literalinclude:: code/multitarget.py 
  • Orange/network/__init__.py

    r9671 r9994  
    2323Pajek (.net) or GML file format. 
    2424 
    25 :download:`network-read-nx.py <code/network-read-nx.py>` (uses: :download:`K5.net <code/K5.net>`): 
     25:download:`network-read-nx.py <code/network-read-nx.py>`: 
    2626 
    2727.. literalinclude:: code/network-read.py 
  • Orange/network/deprecated.py

    r9922 r9994  
    2929Pajek (.net) or GML file format. 
    3030 
    31 :download:`network-read.py <code/network-read.py>` (uses: :download:`K5.net <code/K5.net>`): 
     31:download:`network-read.py <code/network-read.py>`: 
    3232 
    3333.. literalinclude:: code/network-read.py 
  • Orange/projection/correspondence.py

    r9671 r9994  
    2121 
    2222Data table given below represents smoking habits of different employees 
    23 in a company (computed from :download:`smokers_ct.tab <code/smokers_ct.tab>`). 
     23in a company (computed from `smokers_ct.tab`). 
    2424 
    2525    ================  ====  =====  ======  =====  ========== 
     
    5656 
    5757So lets load the data, compute the contingency and do the analysis 
    58 (:download:`correspondence.py <code/correspondence.py>`, uses :download:`smokers_ct.tab <code/smokers_ct.tab>`):: 
     58(:download:`correspondence.py <code/correspondence.py>`):: 
    5959     
    6060    from Orange.projection import correspondence 
  • Orange/projection/mds.py

    r9916 r9994  
    5555(not included with orange, http://matplotlib.sourceforge.net/). 
    5656 
    57 Example (:download:`mds-scatterplot.py <code/mds-scatterplot.py>`, uses :download:`iris.tab <code/iris.tab>`) 
     57Example (:download:`mds-scatterplot.py <code/mds-scatterplot.py>`) 
    5858 
    5959.. literalinclude:: code/mds-scatterplot.py 
     
    7676time. 
    7777 
    78 Example (:download:`mds-advanced.py <code/mds-advanced.py>`, uses :download:`iris.tab <code/iris.tab>`) 
     78Example (:download:`mds-advanced.py <code/mds-advanced.py>`) 
    7979 
    8080.. literalinclude:: code/mds-advanced.py 
  • Orange/projection/som.py

    r9671 r9994  
    8282 
    8383Class :obj:`Map` stores the self-organizing map composed of :obj:`Node` objects. The code below 
    84 (:download:`som-node.py <code/som-node.py>`, uses :download:`iris.tab <code/iris.tab>`) shows an example how to access the information stored in the  
     84(:download:`som-node.py <code/som-node.py>`) shows an example how to access the information stored in the 
    8585node of the map: 
    8686 
     
    9898======== 
    9999 
    100 The following code  (:download:`som-mapping.py <code/som-mapping.py>`, uses :download:`iris.tab <code/iris.tab>`) infers self-organizing map from Iris data set. The map is rather small, and consists  
     100The following code  (:download:`som-mapping.py <code/som-mapping.py>`) infers self-organizing map from Iris data set. The map is rather small, and consists 
    101101of only 9 cells. We optimize the network, and then report how many data instances were mapped 
    102102into each cell. The second part of the code reports on data instances from one of the corner cells: 
  • Orange/regression/mean.py

    r9671 r9994  
    2626Here's a simple example. 
    2727 
    28 :download:`mean-regression.py <code/mean-regression.py>` (uses: :download:`housing.tab <code/housing.tab>`): 
     28:download:`mean-regression.py <code/mean-regression.py>`: 
    2929 
    3030.. literalinclude:: code/mean-regression.py 
  • Orange/regression/tree.py

    r9671 r9994  
    1212but uses a different set of functions to evaluate node splitting and stop 
    1313criteria. Usage of regression trees is straightforward as demonstrated on the 
    14 following example (:download:`regression-tree-run.py <code/regression-tree-run.py>`, uses :download:`servo.tab <code/servo.tab>`): 
     14following example (:download:`regression-tree-run.py <code/regression-tree-run.py>`): 
    1515 
    1616.. literalinclude:: code/regression-tree-run.py 
  • Orange/statistics/basic.py

    r9671 r9994  
    9797        variables in the domain. 
    9898     
    99     part of :download:`distributions-basic-stat.py <code/distributions-basic-stat.py>` (uses :download:`monks-1.tab <code/monks-1.tab>`) 
     99    part of :download:`distributions-basic-stat.py <code/distributions-basic-stat.py>` 
    100100     
    101101    .. literalinclude:: code/distributions-basic-stat.py 
     
    111111 
    112112 
    113     part of :download:`distributions-basic-stat.py <code/distributions-basic-stat.py>` (uses :download:`iris.tab <code/iris.tab>`) 
     113    part of :download:`distributions-basic-stat.py <code/distributions-basic-stat.py>` 
    114114     
    115115    .. literalinclude:: code/distributions-basic-stat.py 
  • Orange/testing/regression/results_reference/knnlearner.py.txt

    r9689 r10016  
     1Testing using euclidean distance 
    12Iris-setosa Iris-setosa 
    23Iris-versicolor Iris-versicolor 
     
    67 
    78 
    8  
     9Testing using hamming distance 
    910Iris-virginica Iris-virginica 
    1011Iris-setosa Iris-setosa 
  • Orange/testing/regression/results_reference/randomindicescv.py.txt

    r9689 r10016  
     1Indices for ordinary 10-fold CV 
    12<1, 1, 3, 8, 8, 3, 2, 7, 5, 0, 1, 5, 2, 9, 4, 7, 4, 9, 3, 6, 0, 2, 0, 6> 
     3Indices for 5 folds on 10 examples 
    24<3, 0, 1, 0, 3, 2, 4, 4, 1, 2> 
  • Orange/testing/regression/results_reference/treelearner.py.txt

    r9689 r10016  
    1 None 
    2 None 
    311.0 0.0 
    42 
    53 
    64Tree with minExamples = 5.0 
     5tear_rate=reduced: none (100.00%) 
     6tear_rate=normal 
     7|    astigmatic=no 
     8|    |    age=pre-presbyopic: soft (100.00%) 
     9|    |    age=presbyopic: none (50.00%) 
     10|    |    age=young: soft (100.00%) 
     11|    astigmatic=yes 
     12|    |    prescription=hypermetrope: none (66.67%) 
     13|    |    prescription=myope: hard (100.00%) 
    714 
    8 tear_rate (<15.000, 4.000, 5.000>)  
    9 : normal  
    10    astigmatic (<3.000, 4.000, 5.000>)  
    11    : no  
    12       age (<1.000, 0.000, 5.000>)  
    13       : pre-presbyopic --> soft (<0.000, 0.000, 2.000>)   
    14       : presbyopic --> none (<1.000, 0.000, 1.000>)   
    15       : young --> soft (<0.000, 0.000, 2.000>)   
    16    : yes  
    17       prescription (<2.000, 4.000, 0.000>)  
    18       : hypermetrope --> none (<2.000, 1.000, 0.000>)   
    19       : myope --> hard (<0.000, 3.000, 0.000>)   
    20 : reduced --> none (<12.000, 0.000, 0.000>)   
     15 
    2116 
    2217Tree with maxMajority = 0.5 
    23 --> none (<15.000, 4.000, 5.000>)  
     18none (62.50%) 
  • Orange/testing/regression/results_reference/treestructure.py.txt

    r9689 r10016  
    1 Tree size: 10 
     1Tree size: 15 
    22 
    33 
    44Unpruned tree 
     5tear_rate=reduced: none (100.00%) 
     6tear_rate=normal 
     7|    astigmatic=no 
     8|    |    age=pre-presbyopic: soft (100.00%) 
     9|    |    age=young: soft (100.00%) 
     10|    |    age=presbyopic 
     11|    |    |    prescription=hypermetrope: soft (100.00%) 
     12|    |    |    prescription=myope: none (100.00%) 
     13|    astigmatic=yes 
     14|    |    prescription=myope: hard (100.00%) 
     15|    |    prescription=hypermetrope 
     16|    |    |    age=pre-presbyopic: none (100.00%) 
     17|    |    |    age=presbyopic: none (100.00%) 
     18|    |    |    age=young: hard (100.00%) 
    519 
    6 tear_rate (<15.000, 4.000, 5.000>)  
    7 : normal  
    8    astigmatic (<3.000, 4.000, 5.000>)  
    9    : no  
    10       age (<1.000, 0.000, 5.000>)  
    11       : pre-presbyopic --> soft (<0.000, 0.000, 2.000>)   
    12       : presbyopic --> none (<1.000, 0.000, 1.000>)   
    13       : young --> soft (<0.000, 0.000, 2.000>)   
    14    : yes  
    15       prescription (<2.000, 4.000, 0.000>)  
    16       : hypermetrope --> none (<2.000, 1.000, 0.000>)   
    17       : myope --> hard (<0.000, 3.000, 0.000>)   
    18 : reduced --> none (<12.000, 0.000, 0.000>)   
     20 
    1921 
    2022Pruned tree 
     23tear_rate=reduced: none (100.00%) 
     24tear_rate=normal 
     25|    astigmatic=no: soft (83.33%) 
     26|    astigmatic=yes: hard (66.67%) 
    2127 
    22 tear_rate (<15.000, 4.000, 5.000>)  
    23 : normal  
    24    astigmatic (<3.000, 4.000, 5.000>)  
    25    : no --> soft (<1.000, 0.000, 5.000>)   
    26    : yes --> hard (<2.000, 4.000, 0.000>)   
    27 : reduced --> none (<12.000, 0.000, 0.000>)  
  • Orange/testing/regression/xtest.py

    r9873 r9972  
    1212platform = sys.platform 
    1313pyversion = sys.version[:3] 
    14 states = ["OK", "changed", "random", "error", "crash"] 
     14states = ["OK", "timedout", "changed", "random", "error", "crash"] 
    1515 
    1616def file_name_match(name, patterns): 
     
    2222 
    2323def test_scripts(complete, just_print, module="orange", root_directory=".", 
    24                 test_files=None, directories=None): 
     24                test_files=None, directories=None, timeout=5): 
    2525    """Test the scripts in the given directory.""" 
    2626    global error_status 
     
    123123                sys.stdout.flush() 
    124124 
    125                 for state in ["crash", "error", "new", "changed", "random1", "random2"]: 
     125                for state in states: 
    126126                    remname = "%s/%s.%s.%s.%s.txt" % \ 
    127127                              (outputsdir, name, platform, pyversion, state) 
     
    130130 
    131131                titerations = re_israndom.search(open(name, "rt").read()) and 1 or iterations 
    132                 os.spawnl(os.P_WAIT, sys.executable, "-c", regtestdir + "/xtest_one.py", name, str(titerations), outputsdir) 
    133  
    134                 result = open("xtest1_report", "rt").readline().rstrip() or "crash" 
     132                #os.spawnl(os.P_WAIT, sys.executable, "-c", regtestdir + "/xtest_one.py", name, str(titerations), outputsdir) 
     133                p = subprocess.Popen([sys.executable, regtestdir + "/xtest_one.py", name, str(titerations), outputsdir]) 
     134 
     135                passed_time = 0 
     136                while passed_time < timeout: 
     137                    time.sleep(0.01) 
     138                    passed_time += 0.01 
     139 
     140                    if p.poll() is not None: 
     141                        break 
     142 
     143                if p.poll() is None: 
     144                    p.kill() 
     145                    result2 = "timedout" 
     146                    print "timedout (use: --timeout #)" 
     147                    # remove output file and change it for *.timedout.* 
     148                    for state in states: 
     149                        remname = "%s/%s.%s.%s.%s.txt" % \ 
     150                                  (outputsdir, name, platform, pyversion, state) 
     151                        if os.path.exists(remname): 
     152                            os.remove(remname) 
     153 
     154                    timeoutname = "%s/%s.%s.%s.%s.txt" % (outputsdir, name, sys.platform, sys.version[:3], "timedout") 
     155                    open(timeoutname, "wt").close() 
     156                    result = "timedout" 
     157                else: 
     158                    stdout, stderr = p.communicate() 
     159                    result = open("xtest1_report", "rt").readline().rstrip() or "crash" 
     160 
    135161                error_status = max(error_status, states.index(result)) 
    136162                os.remove("xtest1_report") 
     
    139165 
    140166    os.chdir(caller_directory) 
    141  
    142167 
    143168iterations = 1 
     
    147172def usage(): 
    148173    """Print out help.""" 
    149     print "%s [update|test|report|report-html|errors] -[h|s] [--single|--module=[orange|obi|text]|--dir=<dir>|] <files>" % sys.argv[0] 
    150     print "  test:   regression tests on all scripts" 
    151     print "  update: regression tests on all previously failed scripts (default)" 
     174    print "%s [test|update|report|report-html|errors] -[h|s] [--single|--module=[all|orange|docs]|--timeout=<#>|--dir=<dir>|] <files>" % sys.argv[0] 
     175    print "  test:   regression tests on all scripts (default)" 
     176    print "  update: regression tests on all previously failed scripts" 
    152177    print "  report: report on testing results" 
    153178    print "  errors: report on errors from regression tests" 
     
    155180    print "-s, --single: runs a single test on each script" 
    156181    print "--module=<module>: defines a module to test" 
     182    print "--timeout=<#seconds>: defines max. execution time" 
    157183    print "--dir=<dir>: a comma-separated list of names where any should match the directory to be tested" 
    158184    print "<files>: space separated list of string matching the file names to be tested" 
     
    163189    global iterations 
    164190 
    165     command = "update" 
     191    command = "test" 
    166192    if argv: 
    167193        if argv[0] in ["update", "test", "report", "report-html", "errors", "help"]: 
     
    170196 
    171197    try: 
    172         opts, test_files = getopt.getopt(argv, "hs", ["single", "module=", "help", "files=", "verbose="]) 
     198        opts, test_files = getopt.getopt(argv, "hs", ["single", "module=", "timeout=", "help", "files=", "verbose="]) 
    173199    except getopt.GetoptError: 
    174200        print "Warning: Wrong argument" 
     
    183209 
    184210    module = opts.get("--module", "all") 
    185     if module in ["all"]: 
     211    if module == "all": 
    186212        root = "%s/.." % environ.install_dir 
    187213        module = "orange" 
    188         dirs = [("modules", "Orange/doc/modules"), 
    189                 ("reference", "Orange/doc/reference"), 
    190                 ("ofb", "docs/tutorial/rst/code"), 
    191                 ("orange25", "docs/reference/rst/code")] 
    192     elif module in ["orange"]: 
     214        dirs = [("tests", "Orange/testing/regression/tests"), 
     215                ("tests_20", "Orange/testing/regression/tests_20"), 
     216                ("tutorial", "docs/tutorial/rst/code"), 
     217                ("reference", "docs/reference/rst/code")] 
     218    elif module == "orange": 
     219        root = "%s" % environ.install_dir 
     220        module = "orange" 
     221        dirs = [("tests", "testing/regression/tests"), 
     222                ("tests_20", "testing/regression/tests_20")] 
     223    elif module == "docs": 
    193224        root = "%s/.." % environ.install_dir 
    194225        module = "orange" 
    195         dirs = [("modules", "Orange/doc/modules"), 
    196                 ("reference", "Orange/doc/reference"), 
    197                 ("ofb", "docs/tutorial/rst/code")] 
    198     elif module in ["ofb-rst"]: 
    199         root = "%s/.." % environ.install_dir 
    200         module = "orange" 
    201         dirs = [("ofb", "docs/tutorial/rst/code")] 
    202     elif module in ["orange25"]: 
    203         root = "%s/.." % environ.install_dir 
    204         module = "orange" 
    205         dirs = [("orange25", "docs/reference/rst/code")] 
    206     elif module == "obi": 
    207         root = environ.add_ons_dir + "/Bioinformatics/doc" 
    208         dirs = [("modules", "modules")] 
    209     elif module == "text": 
    210         root = environ.add_ons_dir + "/Text/doc" 
    211         dirs = [("modules", "modules")] 
     226        dirs = [("tutorial", "docs/tutorial/rst/code"), 
     227                ("reference", "docs/reference/rst/code")] 
    212228    else: 
    213         print "Error: %s is wrong name of the module, should be in [orange|obi|text]" % module 
     229        print "Error: %s is wrong name of the module, should be in [orange|docs]" % module 
     230        sys.exit(1) 
     231 
     232    timeout = 5 
     233    try: 
     234        _t = opts.get("--timeout", "5") 
     235        timeout = int(_t) 
     236        if timeout <= 0 or timeout >= 120: 
     237            raise AttributeError() 
     238    except AttributeError: 
     239        print "Error: timeout out of range (0 < # < 120)" 
     240        sys.exit(1) 
     241    except: 
     242        print "Error: %s wrong timeout" % opts.get("--timeout", "5") 
    214243        sys.exit(1) 
    215244 
    216245    test_scripts(command == "test", command == "report" or (command == "report-html" and command or False), 
    217246                 module=module, root_directory=root, 
    218                  test_files=test_files, directories=dirs) 
     247                 test_files=test_files, directories=dirs, timeout=timeout) 
    219248    # sys.exit(error_status) 
    220249 
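The timeout handling added to `xtest.py` above (start the script with `subprocess.Popen`, poll it, kill it once the limit is reached) can be sketched in isolation as follows. This is a minimal illustration, not the code from the changeset: the helper name `run_with_timeout` and its `poll_interval` parameter are hypothetical, and it compares against a wall-clock deadline instead of summing the sleep intervals as the committed loop does.

```python
import subprocess
import sys
import time


def run_with_timeout(args, timeout=5, poll_interval=0.01):
    """Run `args` as a child process and kill it after `timeout` seconds.

    Hypothetical helper: returns the string "timedout" if the child had
    to be killed, otherwise the child's exit code.
    """
    p = subprocess.Popen(args)
    deadline = time.time() + timeout
    # Poll until the child exits on its own or the deadline passes.
    while p.poll() is None and time.time() < deadline:
        time.sleep(poll_interval)
    if p.poll() is None:
        # Still running after the deadline: terminate it and reap it.
        p.kill()
        p.wait()
        return "timedout"
    return p.returncode
```

Calling `run_with_timeout([sys.executable, "script.py"], timeout=5)` with a placeholder script name would mirror the changeset's default of five seconds. The deadline-based loop avoids the drift that builds up when each `time.sleep(0.01)` takes slightly longer than requested.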
  • Orange/testing/unit/tests/test_association.py

    r9679 r9979  
    1414         
    1515    self.assertLessEqual(len(rules), self.inducer.max_item_sets) 
    16     print "\n%5s   %5s" % ("supp", "conf") 
    1716    for r in rules: 
    18         print "%5.3f   %5.3f   %s" % (r.support, r.confidence, r) 
    1917        self.assertGreaterEqual(r.support, self.inducer.support) 
    2018        self.assertIsNotNone(r.left) 
  • Orange/testing/unit/tests/test_ensemble.py

    r9679 r9978  
    2323        testing.LearnerTestCase.test_pickling_on(self, dataset) 
    2424 
     25 
    2526@datasets_driven(datasets=testing.CLASSIFICATION_DATASETS) 
    2627class TestRandomForest(testing.LearnerTestCase): 
     
    3132    @test_on_datasets(datasets=["iris"]) 
    3233    def test_pickling_on(self, dataset): 
    33         testing.LearnerTestCase.test_pickling_on(self, dataset) 
     34        raise NotImplementedError("SmallTreeLearner pickling is not implemented") 
     35#        testing.LearnerTestCase.test_pickling_on(self, dataset) 
     36         
    3437         
    3538         
  • docs/reference/rst/Orange.associate.rst

    r9372 r9994  
    33==================================== 
    44 
    5 .. automodule:: Orange.associate 
     5============================== 
     6Induction of association rules 
     7============================== 
     8 
     9Orange provides two algorithms for induction of 
     10`association rules <http://en.wikipedia.org/wiki/Association_rule_learning>`_. 
     11One is the basic Agrawal's algorithm with dynamic induction of supported 
     12itemsets and rules that is designed specifically for datasets with a 
     13large number of different items. This is, however, not really suitable 
     14for feature-based machine learning problems. 
     15We have adapted the original algorithm for efficiency 
     16with the latter type of data, and to induce rules in which 
     17both sides contain not only features 
     18(like "bread, butter -> jam") but also their values 
     19("bread = wheat, butter = yes -> jam = plum"). 
     20 
     21It is also possible to extract item sets instead of association rules. These 
     22are often more interesting than the rules themselves. 
     23 
     24Besides the association rule inducers, Orange also provides a rather simplified 
     25method for classification by association rules. 
     26 
     27=================== 
     28Agrawal's algorithm 
     29=================== 
     30 
     31The class that induces rules by Agrawal's algorithm accepts data examples 
     32in two forms. The first is the standard form in which each example is 
     33described by values of a fixed list of features (defined in domain). 
     34The algorithm, however, disregards the feature values and only checks whether 
     35the value is defined or not. The rule shown above ("bread, butter -> jam") 
     36actually means that if "bread" and "butter" are defined, then "jam" is defined 
     37as well. It is expected that most values will be undefined - if this is not 
     38so, use the :class:`~AssociationRulesInducer`. 
     39 
     40:class:`AssociationRulesSparseInducer` can also use sparse data. 
     41Sparse examples have no fixed 
     42features - the domain is empty. All values assigned to an example are given as meta attributes. 
     43All meta attributes need to be registered with the :obj:`~Orange.data.Domain`. 
     44The most suitable format for this kind of data is the basket format. 
     45 
     46The algorithm first dynamically builds all itemsets (sets of features) that have 
     47at least the prescribed support. Each of these is then used to derive rules 
     48with requested confidence. 
     49 
     50If examples were given in the sparse form, so are the left and right side 
     51of the induced rules. If examples were given in the standard form, so are 
     52the examples in association rules. 
     53 
     54.. class:: AssociationRulesSparseInducer 
     55 
     56    .. attribute:: support 
     57 
     58        Minimal support for the rule. 
     59 
     60    .. attribute:: confidence 
     61 
     62        Minimal confidence for the rule. 
     63 
     64    .. attribute:: store_examples 
     65 
     66        Store the examples covered by each rule and 
     67        those confirming it. 
     68 
     69    .. attribute:: max_item_sets 
     70 
     71        The maximal number of itemsets. The algorithm's 
     72        running time (and its memory consumption) depends on the minimal support; 
     73        the lower the requested support, the more eligible itemsets will be found. 
     74        There is no general rule for setting support - perhaps it 
     75        should be around 0.3, but this depends on the data set. 
     76        If the support is set too low, the algorithm could run out of memory. 
     77        Therefore, Orange limits the number of generated rules to 
     78        :obj:`max_item_sets`. If Orange reports that the prescribed 
     79        :obj:`max_item_sets` was exceeded, increase the required support 
     80        or, alternatively, increase :obj:`max_item_sets` to as high as your computer 
     81        can handle. 
     82 
     83    .. method:: __call__(data, weight_id) 
     84 
     85        Induce rules from the data set. 
     86 
     87 
     88    .. method:: get_itemsets(data) 
     89 
     90        Returns a list of pairs. The first element of a pair is a tuple with 
     91        indices of features in the item set (negative for sparse data). 
     92        The second element is a list of indices supporting the item set, that is, 
     93        all the items in the set. If :obj:`store_examples` is False, the second 
     94        element is None. 
     95 
     96We shall test the rule inducer on a dataset consisting of a brief description 
     97of the Spanish Inquisition, given by Palin et al.: 
     98 
     99    NOBODY expects the Spanish Inquisition! Our chief weapon is surprise...surprise and fear...fear and surprise.... Our two weapons are fear and surprise...and ruthless efficiency.... Our *three* weapons are fear, surprise, and ruthless efficiency...and an almost fanatical devotion to the Pope.... Our *four*...no... *Amongst* our weapons.... Amongst our weaponry...are such elements as fear, surprise.... I'll come in again. 
     100 
     101    NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope, and nice red uniforms - Oh damn! 
     102 
     103The text needs to be cleaned of punctuation marks and capital letters at beginnings of the sentences, each sentence needs to be put in a new line and commas need to be inserted between the words. 
     104 
     105Data example (:download:`inquisition.basket <code/inquisition.basket>`): 
     106 
     107.. literalinclude:: code/inquisition.basket 
     108 
     109Inducing the rules is trivial:: 
     110 
     111    import Orange 
     112    data = Orange.data.Table("inquisition") 
     113 
     114    rules = Orange.associate.AssociationRulesSparseInducer(data, support = 0.5) 
     115 
     116    print "%5s   %5s" % ("supp", "conf") 
     117    for r in rules: 
     118        print "%5.3f   %5.3f   %s" % (r.support, r.confidence, r) 
     119 
     120The induced rules are surprisingly fear-full: :: 
     121 
     122    0.500   1.000   fear -> surprise 
     123    0.500   1.000   surprise -> fear 
     124    0.500   1.000   fear -> surprise our 
     125    0.500   1.000   fear surprise -> our 
     126    0.500   1.000   fear our -> surprise 
     127    0.500   1.000   surprise -> fear our 
     128    0.500   1.000   surprise our -> fear 
     129    0.500   0.714   our -> fear surprise 
     130    0.500   1.000   fear -> our 
     131    0.500   0.714   our -> fear 
     132    0.500   1.000   surprise -> our 
     133    0.500   0.714   our -> surprise 
     134 
     135To get only a list of supported item sets, one should call the method 
     136get_itemsets:: 
     137 
     138    inducer = Orange.associate.AssociationRulesSparseInducer(support = 0.5, store_examples = True) 
     139    itemsets = inducer.get_itemsets(data) 
     140 
     141Now itemsets is a list of itemsets along with the examples supporting them 
     142since we set store_examples to True. :: 
     143 
     144    >>> itemsets[5] 
     145    ((-11, -7), [1, 2, 3, 6, 9]) 
     146    >>> [data.domain[i].name for i in itemsets[5][0]] 
     147    ['surprise', 'our'] 
     148 
     149The sixth itemset contains features with indices -11 and -7, that is, the 
     150words "surprise" and "our". The examples supporting it are those with 
     151indices 1, 2, 3, 6 and 9. 
     152 
     153This way of representing the itemsets is memory efficient and faster than using 
     154objects like :obj:`~Orange.feature.Descriptor` and :obj:`~Orange.data.Instance`. 
     155 
     156.. _non-sparse-examples: 
     157 
     158=================== 
     159Non-sparse data 
     160=================== 
     161 
     162:class:`AssociationRulesInducer` works with non-sparse data. 
     163Unknown values are ignored, while values of features are not (as opposed to 
     164the algorithm for sparse rules). In addition, the algorithm 
     165can be directed to search only for classification rules, in which the only 
     166feature on the right-hand side is the class variable. 
     167 
     168.. class:: AssociationRulesInducer 
     169 
     170    All attributes can be set with the constructor. 
     171 
     172    .. attribute:: support 
     173 
     174        Minimal support for the rule. 
     175 
     176    .. attribute:: confidence 
     177 
     178        Minimal confidence for the rule. 
     179 
     180    .. attribute:: classification_rules 
     181 
     182        If True (default is False), the classification rules are constructed instead 
     183        of general association rules. 
     184 
     185    .. attribute:: store_examples 
     186 
     187        Store the examples covered by each rule and those 
     188        confirming it. 
     189 
     190    .. attribute:: max_item_sets 
     191 
     192        The maximal number of itemsets. 
     193 
     194    .. method:: __call__(data, weight_id) 
     195 
     196        Induce rules from the data set. 
     197 
     198    .. method:: get_itemsets(data) 
     199 
     200        Returns a list of pairs. The first element of a pair is a tuple with 
     201        indices of features in the item set (negative for sparse data). 
     202        The second element is a list of indices supporting the item set, that is, 
     203        all the items in the set. If :obj:`store_examples` is False, the second 
     204        element is None. 
     205 
     206The example:: 
     207 
     208    import Orange 
     209 
     210    data = Orange.data.Table("lenses") 
     211 
     212    print "Association rules" 
     213    rules = Orange.associate.AssociationRulesInducer(data, support = 0.3) 
     214    for r in rules: 
     215        print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r) 
     216 
     217The found rules are: :: 
     218 
     219    0.333  0.533  lenses=none -> prescription=hypermetrope 
     220    0.333  0.667  prescription=hypermetrope -> lenses=none 
     221    0.333  0.533  lenses=none -> astigmatic=yes 
     222    0.333  0.667  astigmatic=yes -> lenses=none 
     223    0.500  0.800  lenses=none -> tear_rate=reduced 
     224    0.500  1.000  tear_rate=reduced -> lenses=none 
     225 
     226To limit the algorithm to classification rules, set classification_rules to True: ::
     227
     228    print "\\nClassification rules"
     229    rules = Orange.associate.AssociationRulesInducer(data, support = 0.3, classification_rules = True)
     230    for r in rules: 
     231        print "%5.3f  %5.3f  %s" % (r.support, r.confidence, r) 
     232 
     233The found rules are, naturally, a subset of the above rules: :: 
     234 
     235    0.333  0.667  prescription=hypermetrope -> lenses=none 
     236    0.333  0.667  astigmatic=yes -> lenses=none 
     237    0.500  1.000  tear_rate=reduced -> lenses=none 
     238 
     239Itemsets are induced in a similar fashion as for sparse data, except that the 
     240first element of the tuple, the item set, is represented not by indices of 
     241features, as before, but with tuples (feature-index, value-index): :: 
     242 
     243    inducer = Orange.associate.AssociationRulesInducer(support = 0.3, store_examples = True) 
     244    itemsets = inducer.get_itemsets(data) 
     245    print itemsets[8] 
     246 
     247This prints out :: 
     248 
     249    (((2, 1), (4, 0)), [2, 6, 10, 14, 15, 18, 22, 23]) 
     250 
     251meaning that the ninth itemset contains the second value of the third feature 
     252(2, 1), and the first value of the fifth (4, 0). 
     253 
     254======================= 
     255Representation of rules 
     256======================= 
     257 
     258An :class:`AssociationRule` represents a rule. In Orange, methods for 
     259induction of association rules return the induced rules in 
     260:class:`AssociationRules`, which is basically a list of :class:`AssociationRule` instances. 
     261 
     262.. class:: AssociationRule 
     263 
     264    .. method:: __init__(left, right, n_applies_left, n_applies_right, n_applies_both, n_examples) 
     265 
     266        Constructs an association rule and computes all measures listed above. 
     267 
     268    .. method:: __init__(left, right, support, confidence) 
     269 
     270        Constructs an association rule and sets its support and confidence. If
     271        you intend to pass such a rule on, you should set the other attributes
     272        manually; the constructor cannot compute them from the arguments
     273        support and confidence alone.
     274 
     275    .. method:: __init__(rule) 
     276 
     277        Given an association rule as the argument, the constructor copies the
     278        rule.
     279 
     280    .. attribute:: left, right 
     281 
     282        The left and the right side of the rule. Both are given as :class:`Orange.data.Instance`. 
     283        In rules created by :class:`AssociationRulesSparseInducer` from examples that 
     284        contain all values as meta-values, left and right are examples in the 
     285        same form. Otherwise, values in left that do not appear in the rule 
     286        are "don't care", and values in right are "don't know". Both can,
     287        however, be tested by :meth:`~Orange.data.Value.is_special`. 
     288 
     289    .. attribute:: n_left, n_right 
     290 
     291        The number of features (i.e. defined values) on the left and on the 
     292        right side of the rule. 
     293 
     294    .. attribute:: n_applies_left, n_applies_right, n_applies_both 
     295 
     296        The number of (learning) examples that conform to the left, the right 
     297        and to both sides of the rule. 
     298 
     299    .. attribute:: n_examples 
     300 
     301        The total number of learning examples. 
     302 
     303    .. attribute:: support 
     304 
     305        n_applies_both/n_examples.
     306 
     307    .. attribute:: confidence 
     308 
     309        n_applies_both/n_applies_left. 
     310 
     311    .. attribute:: coverage 
     312 
     313        n_applies_left/n_examples. 
     314 
     315    .. attribute:: strength 
     316 
     317        n_applies_right/n_applies_left. 
     318 
     319    .. attribute:: lift 
     320 
     321        n_examples * n_applies_both / (n_applies_left * n_applies_right). 
     322 
     323    .. attribute:: leverage 
     324 
     325        (n_applies_both * n_examples - n_applies_left * n_applies_right).
     326 
     327    .. attribute:: examples, match_left, match_both 
     328 
     329        If store_examples was True during induction, examples contains a copy 
     330        of the example table used to induce the rules. Attributes match_left 
     331        and match_both are lists of integers, representing the indices of 
     332        examples which match the left-hand side of the rule and both sides, 
     333        respectively. 
     334 
     335    .. method:: applies_left(example) 
     336 
     337    .. method:: applies_right(example) 
     338 
     339    .. method:: applies_both(example) 
     340 
     341        Tells whether the example fits into the left, right or both sides of 
     342        the rule, respectively. If the rule is represented by sparse examples, 
     343        the given example must be sparse as well. 
     344 
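The measures listed for :class:`AssociationRule` all follow from four counts. The sketch below is plain Python, not Orange code; the function name ``rule_measures`` is hypothetical, and the counts are illustrative values chosen to be consistent with the printed rule ``tear_rate=reduced -> lenses=none`` (support 0.500, confidence 1.000), assuming the data set holds 24 examples.

```python
def rule_measures(n_applies_left, n_applies_right, n_applies_both, n_examples):
    """Compute rule measures from raw counts, using the formulas listed above."""
    left = float(n_applies_left)
    total = float(n_examples)
    return {
        "support": n_applies_both / total,
        "confidence": n_applies_both / left,
        "coverage": n_applies_left / total,
        "strength": n_applies_right / left,
        "lift": total * n_applies_both / (n_applies_left * n_applies_right),
        # Leverage is kept in the unnormalized form given in the attribute list.
        "leverage": n_applies_both * n_examples - n_applies_left * n_applies_right,
    }


# Illustrative counts (an assumption, see the note above).
measures = rule_measures(n_applies_left=12, n_applies_right=15,
                         n_applies_both=12, n_examples=24)
for name in ("support", "confidence", "coverage", "strength", "lift", "leverage"):
    print("%-10s %s" % (name, measures[name]))
```

With these counts the support and confidence match the printed values, which is a quick way to check that the counts of a hand-built rule are consistent.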
     345Association rule inducers do not store evidence about which example supports 
     346which rule. Let us write a function that finds the examples that 
     347confirm the rule (fit both sides of it) and those that contradict it (fit the 
     348left-hand side but not the right). The example:: 
     349 
     350    import Orange 
     351 
     352    data = Orange.data.Table("lenses") 
     353 
     354    rules = Orange.associate.AssociationRulesInducer(data, support = 0.3)
     355    rule = rules[0] 
     356 
     357    print 
     358    print "Rule: ", rule 
     359    print 
     360 
     361    print "Supporting examples:" 
     362    for example in data: 
     363        if rule.applies_both(example):
     364            print example 
     365    print 
     366 
     367    print "Contradicting examples:" 
     368    for example in data: 
     369        if rule.applies_left(example) and not rule.applies_right(example): 
     370            print example 
     371    print 
     372 
     373The latter printouts get simpler and faster if we instruct the inducer to 
     374store the examples. We can then do, for instance, this: :: 
     375 
     376    print "Match left: " 
     377    print "\\n".join(str(rule.examples[i]) for i in rule.match_left) 
     378    print "\\nMatch both: " 
     379    print "\\n".join(str(rule.examples[i]) for i in rule.match_both) 
     380 
     381The "contradicting" examples are then those whose indices are found in
     382match_left but not in match_both. The first way below is friendlier to
     383memory, the second is faster: ::
     384 
     385    >>> [x for x in rule.match_left if not x in rule.match_both] 
     386    [0, 2, 8, 10, 16, 17, 18] 
     387    >>> set(rule.match_left) - set(rule.match_both) 
     388    set([0, 2, 8, 10, 16, 17, 18]) 
     389 
     390=============== 
     391Utilities 
     392=============== 
     393 
     394.. autofunction:: print_rules 
     395 
     396.. autofunction:: sort 
  • docs/reference/rst/Orange.data.continuization.rst

    r9941 r9966  
    1111variable separately. 
    1212 
    13 .. class DomainContinuizer 
     13.. class:: DomainContinuizer 
    1414 
    1515    Returns a new domain containing only continuous attributes given a 
     
    2929      ``multinomial_treatment``. 
    3030 
    31     .. attribute zero_based 
     31    The typical use of the class is as follows:: 
     32 
     33        continuizer = Orange.data.continuization.DomainContinuizer()
     34        continuizer.multinomial_treatment = continuizer.LowestIsBase
     35        domain0 = continuizer(data) 
     36        data0 = data.translate(domain0) 
     37 
     38    .. attribute:: zero_based 
    3239 
    3340        Determines the value used as the "low" value of the variable. When 
     
    3845        following text assumes the default case. 
    3946 
    40     .. attribute multinomial_treatment 
     47    .. attribute:: multinomial_treatment 
    4148 
    4249       Decides the treatment of multinomial variables. Let N be the 
     
    5461           used (directly) in, for instance, linear or logistic regression. 
    5562 
     63           For example, data set "bridges" has feature "RIVER" with 
     64           values "M", "A", "O" and "Y", in that order. Its value for 
     65           the 15th row is "M". Continuization replaces the variable 
     66           with variables "RIVER=M", "RIVER=A", "RIVER=O" and 
     67           "RIVER=Y". For the 15th row, the first has value 1 and 
     68           others are 0. 
     69 
    5670       DomainContinuizer.LowestIsBase 
    5771           Similar to the above except that it creates only N-1 
     
    6377           specified value is used as base instead of the lowest one. 
    6478 
     79           Continuizing the variable "RIVER" gives similar results as 
     80           above except that it would omit "RIVER=M"; all three 
     81           variables would be zero for the 15th data instance. 
     82 
    6583       DomainContinuizer.FrequentIsBase 
    66  
    6784           Like above, except that the most frequent value is used as the 
    6885           base (this can again be overridden by setting the descriptor's
     
    7188           extracted from data, so this option cannot be used if constructor 
    7289           is given only a domain. 
     90 
     91           Variable "RIVER" would be continuized similarly to above 
     92           except that it omits "RIVER=A", which is the most frequent value. 
    7393            
    7494       DomainContinuizer.Ignore 
     
    87107           variable. 
    88108 
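The "RIVER" examples for the indicator-based treatments can be reproduced with a few lines of plain Python. This is only an illustration of the encoding, not the ``DomainContinuizer`` implementation; the function ``indicator_variables`` and its ``base`` parameter are hypothetical, and the default 0/1 coding (``zero_based`` left at its default) is assumed.

```python
def indicator_variables(name, value, values, base=None):
    """Return (variable name, indicator) pairs for one instance.

    With base=None every value gets its own indicator (N variables).
    With a base value, that value is omitted (N-1 variables), so an
    instance holding the base value gets zeros everywhere.
    """
    kept = [v for v in values if v != base]
    return [("%s=%s" % (name, v), int(v == value)) for v in kept]


river_values = ["M", "A", "O", "Y"]

# N indicator variables: the first one is 1 for an instance with value "M".
print(indicator_variables("RIVER", "M", river_values))

# Lowest value as base: "RIVER=M" is omitted, all three indicators are 0.
print(indicator_variables("RIVER", "M", river_values, base="M"))

# Most frequent value as base ("A" in the text): "RIVER=A" is omitted.
print(indicator_variables("RIVER", "M", river_values, base="A"))
```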
    89     .. attribute normalize_continuous 
     109    .. attribute:: normalize_continuous 
    90110 
    91111        If ``False`` (default), continuous variables are left unchanged. If
  • docs/reference/rst/Orange.data.discretization.rst

    r9900 r9963  
    1 .. py:currentmodule:: Orange.data 
     1.. py:currentmodule:: Orange.data.discretization 
    22 
    3 ################################### 
    4 Discretization (``discretization``) 
    5 ################################### 
     3######################################## 
     4Data discretization (``discretization``) 
     5######################################## 
    66 
    77.. index:: discretization 
     
    1010   single: data; discretization 
    1111 
    12 Continues features in the data can be discretized using a uniform discretization method. The approach will consider 
    13 only continues features, and replace them in the data set with corresponding categorical features: 
     12Continuous features in the data can be discretized using a uniform discretization method. Discretization considers
     13only continuous features, and replaces them in the new data set with corresponding categorical features:
    1414 
    1515.. literalinclude:: code/discretization-table.py 
    1616 
    17 Discretization introduces new categorical features and computes their values in accordance to 
    18 a discretization method:: 
     17Discretization introduces new categorical features with discretized values:: 
    1918 
    2019    Original data set: 
     
    2827    ['<=5.45', '>3.15', '<=2.45', '<=0.80', 'Iris-setosa'] 
    2928 
    30 The procedure uses feature discretization classes as define in XXX and applies them on entire data sets. 
    31 The suported discretization methods are: 
     29Data discretization uses feature discretization classes from
     30:doc:`Orange.feature.discretization` and applies them to the entire data set. The supported discretization methods are:
    3231 
    3332* equal width discretization, where the domain of a continuous feature is split into intervals of the same
    34   width equal-sized intervals (:class:`EqualWidth`), 
    35 * equal frequency discretization, where each intervals contains equal number of data instances (:class:`EqualFreq`), 
     33  width (uses :class:`Orange.feature.discretization.EqualWidth`),
     34* equal frequency discretization, where each interval contains an equal number of data instances (uses
     35  :class:`Orange.feature.discretization.EqualFreq`),
    3636* entropy-based, as originally proposed by [FayyadIrani1993]_ that infers the intervals to minimize 
    37   within-interval entropy of class distributions (:class:`Entropy`), 
     37  within-interval entropy of class distributions (uses :class:`Orange.feature.discretization.Entropy`), 
    3838* bi-modal, using three intervals to optimize the difference of the class distribution in 
    39   the middle with the distribution outside it (:class:`BiModal`), 
     39  the middle with the distribution outside it (uses :class:`Orange.feature.discretization.BiModal`), 
    4040* fixed, with the user-defined cut-off points. 
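The difference between the equal width and equal frequency methods is easiest to see on the cut-off points they produce. The following is a minimal sketch in plain Python, not the Orange implementation; all function names are hypothetical, and placing equal-frequency cut-offs halfway between neighbouring sorted values is an assumption made for the illustration.

```python
from bisect import bisect_left


def equal_width_cutoffs(values, n):
    """Split the range [min, max] into n intervals of the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(n)
    return [lo + i * width for i in range(1, n)]


def equal_freq_cutoffs(values, n):
    """Place cut-offs so that each interval holds about len(values)/n items."""
    ordered = sorted(values)
    cutoffs = []
    for i in range(1, n):
        k = i * len(ordered) // n
        # Assumption: cut halfway between the two neighbouring values.
        cutoffs.append((ordered[k - 1] + ordered[k]) / 2.0)
    return cutoffs


def interval_index(value, cutoffs):
    """Index of the interval a value falls into (a cut-off belongs to the lower one)."""
    return bisect_left(cutoffs, value)


print(equal_width_cutoffs([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 3))
print(equal_freq_cutoffs([6, 1, 4, 2, 5, 3], 3))
print(interval_index(3.5, [3.0, 5.0]))
```

A value equal to a cut-off is assigned to the lower interval, which matches interval labels of the form ``<=5.45`` shown in the output above.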
    4141 
    42 The above script used the default discretization method (equal frequency with three intervals). This can be 
    43 changed while some selected discretization approach as demonstrated below: 
     42.. FIXME give a corresponding class for fixed discretization 
     43 
     44The default discretization method (equal frequency with three intervals) can be replaced with other
     45discretization approaches as demonstrated below: 
    4446 
    4547.. literalinclude:: code/discretization-table-method.py 
    4648    :lines: 3-5 
    4749 
    48 Classes 
    49 ======= 
     50Entropy-based discretization is special as it may infer new features that are constant and have only one value. Such
     51features are redundant and provide no information about the class. By default,
     52:class:`DiscretizeTable` removes them, in effect performing feature subset selection. The effect of removing
     53non-informative features is also demonstrated in the following script:
    5054 
    51 Some functions and classes that can be used for 
    52 categorization of continuous features. Besides several general classes that 
    53 can help in this task, we also provide a function that may help in 
    54 entropy-based discretization (Fayyad & Irani), and a wrapper around classes for 
    55 categorization that can be used for learning. 
     55.. literalinclude:: code/discretization-entropy.py 
     56    :lines: 3- 
    5657 
    57 .. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class 
     58In the sampled data set above, three features were discretized to a constant and thus removed::
     59 
     60    Redundant features (3 of 13): 
     61    cholesterol, rest SBP, age 
     62 
     63.. note:: 
     64    Entropy-based and bi-modal discretization require class-labeled data sets. 
     65 
     66Data discretization classes 
     67=========================== 
     68 
     69.. .. autoclass:: Orange.feature.discretization.DiscretizedLearner_Class 
    5870 
    5971.. autoclass:: DiscretizeTable 
    6072 
    61 .. rubric:: Example 
     73.. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in Orange 
     74   for Beginners tutorial shows the use of DiscretizedLearner. Other 
     75   discretization classes from core Orange are listed in chapter on 
     76   `categorization <../ofb/o_categorization.htm>`_ of the same tutorial. -> should put in classification/wrappers 
    6277 
    63 FIXME. A chapter on `feature subset selection <../ofb/o_fss.htm>`_ in Orange 
    64 for Beginners tutorial shows the use of DiscretizedLearner. Other 
    65 discretization classes from core Orange are listed in chapter on 
    66 `categorization <../ofb/o_categorization.htm>`_ of the same tutorial. 
     78.. [FayyadIrani1993] UM Fayyad and KB Irani. Multi-interval discretization of continuous valued 
     79  attributes for classification learning. In Proc. 13th International Joint Conference on Artificial Intelligence, pages 
     80  1022--1029, Chambery, France, 1993. 
  • docs/reference/rst/Orange.data.domain.rst

    r9936 r9958  
    308308         variable from the list is used as the class variable. :: 
    309309 
    310              >>> domain1 = orange.Domain([a, b]) 
    311              >>> domain2 = orange.Domain(["a", b, c], domain) 
     310             >>> domain1 = Orange.data.Domain([a, b]) 
     311             >>> domain2 = Orange.data.Domain(["a", b, c], domain) 
    312312 
    313313         :param variables: List of variables (strings or instances of :obj:`~Orange.feature.Descriptor`) 
     
    323323         last variable should be used as the class variable. :: 
    324324 
    325              >>> domain1 = orange.Domain([a, b], False) 
    326              >>> domain2 = orange.Domain(["a", b, c], False, domain) 
     325             >>> domain1 = Orange.data.Domain([a, b], False) 
     326             >>> domain2 = Orange.data.Domain(["a", b, c], False, domain) 
    327327 
    328328         :param variables: List of variables (strings or instances of :obj:`~Orange.feature.Descriptor`) 
  • docs/reference/rst/Orange.data.instance.rst

    r9936 r9958  
    9191passed along with the data:: 
    9292 
    93     bayes = orange.BayesLearner(data, id) 
     93    bayes = Orange.classification.bayes.NaiveLearner(data, id) 
    9494 
    9595Many other functions accept weights in similar fashion. 
     
    112112accessed:: 
    113113 
    114     w = orange.FloatVariable("w") 
     114    w = Orange.feature.Continuous("w") 
    115115    data.domain.addmeta(id, w) 
    116116 
     
    125125allows for conversion from Python native types:: 
    126126 
    127     ok = orange.EnumVariable("ok?", values=["no", "yes"]) 
     127    ok = Orange.feature.Discrete("ok?", values=["no", "yes"]) 
    128128    ok_id = Orange.feature.Descriptor.new_meta_id() 
    129129    data.domain.addmeta(ok_id, ok) 
     
    237237        Convert the instance into an ordinary Python list. If the 
    238238        optional argument `level` is 1 (default), the result is a list of 
    239         instances of :obj:`orange.data.Value`. If it is 0, it contains 
     239        instances of :obj:`Orange.data.Value`. If it is 0, it contains 
    240240        pure Python objects, that is, strings for discrete variables 
    241241        and numbers for continuous ones. 
     
    281281        attributes are returned. :: 
    282282 
    283             data = orange.ExampleTable("inquisition2") 
     283            data = Orange.data.Table("inquisition2") 
    284284            example = data[4] 
    285             print example.getmetas() 
    286             print example.getmetas(int) 
    287             print example.getmetas(str) 
    288             print example.getmetas(orange.Variable) 
     285            print example.get_metas() 
     286            print example.get_metas(int) 
     287            print example.get_metas(str) 
     288            print example.get_metas(Orange.feature.Descriptor) 
    289289 
    290290        :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.feature.Descriptor` 
  • docs/reference/rst/Orange.data.rst

    r9941 r9992  
    1313    Orange.data.discretization 
    1414    Orange.data.continuization 
     15    Orange.data.utils 
     16    Orange.data.continuization 
     17    Orange.data.sql 
  • docs/reference/rst/Orange.data.value.rst

    r9927 r9958  
    133133    deg3 = Orange.feature.Discrete( 
    134134        "deg3", values=["little", "medium", "big"]) 
    135     deg4 = orange.feature.Discrete( 
     135    deg4 = Orange.feature.Discrete( 
    136136        "deg4", values=["tiny", "little", "big", "huge"]) 
    137     val3 = orange.Value(deg3) 
    138     val4 = orange.Value(deg4) 
     137    val3 = Orange.data.Value(deg3) 
     138    val4 = Orange.data.Value(deg4) 
    139139    val3.value = "medium" 
    140140    val4.value = "little" 
  • docs/reference/rst/Orange.evaluation.scoring.rst

    r9904 r10004  
    77.. index: scoring 
    88 
    9 This module contains various measures of quality for classification and 
    10 regression. Most functions require an argument named :obj:`res`, an instance of 
    11 :class:`Orange.evaluation.testing.ExperimentResults` as computed by 
    12 functions from :mod:`Orange.evaluation.testing` and which contains 
    13 predictions obtained through cross-validation, 
    14 leave one-out, testing on training data or test set instances. 
     9Scoring plays an integral role in the evaluation of any prediction model. Orange
     10implements various scores for evaluation of classification,
     11regression and multi-label models. Most of the methods need to be called
     12with an instance of :obj:`ExperimentResults`.
     13 
     14.. literalinclude:: code/statExample0.py 
    1515 
    1616============== 
    1717Classification 
    1818============== 
     19 
     20Many scores for evaluation of classification models can be computed solely 
     21from the confusion matrix constructed manually with the 
     22:obj:`confusion_matrices` function. If the class variable has more than two
     23values, the index of the value to calculate the confusion matrix for should 
     24be passed as well. 
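To make the dependence on the confusion matrix concrete, the sketch below computes several of these scores from the four cells of a binary confusion matrix. It uses the standard textbook definitions in plain Python; the function ``confusion_scores`` is hypothetical and is not the Orange API, whose functions take a confusion matrix object. The cell counts are made up for illustration, and zero denominators are not handled.

```python
from math import sqrt


def confusion_scores(tp, fp, fn, tn):
    """Scores derived from a binary confusion matrix (standard definitions)."""
    sens = tp / float(tp + fn)   # sensitivity, the same as recall
    spec = tn / float(tn + fp)   # specificity
    ppv = tp / float(tp + fp)    # positive predictive value, the same as precision
    npv = tn / float(tn + fn)    # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens)
    mcc = (tp * tn - fp * fn) / sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sens": sens, "spec": spec, "PPV": ppv, "NPV": npv,
            "F1": f1, "MCC": mcc}


# Made-up counts for illustration only.
scores = confusion_scores(tp=40, fp=10, fn=20, tn=30)
for name in ("sens", "spec", "PPV", "NPV", "F1", "MCC"):
    print("%-5s %5.3f" % (name, scores[name]))
```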
     25 
     26Calibration scores 
     27================== 
     28 
     29.. autoclass:: CAClass 
     30.. autofunction:: sens 
     31.. autofunction:: spec 
     32.. autofunction:: PPV 
     33.. autofunction:: NPV 
     34.. autofunction:: precision 
     35.. autofunction:: recall 
     36.. autofunction:: F1 
     37.. autofunction:: Falpha 
     38.. autofunction:: MCC 
     39.. autofunction:: AP 
     40.. autofunction:: IS 
     41 
     42Discriminatory scores 
     43===================== 
     44 
     45.. autofunction:: Brier_score 
     46 
     47.. autofunction:: AUC 
     48 
     49   .. attribute:: AUC.ByWeightedPairs (or 0)
     50 
     51      Computes AUC for each pair of classes (ignoring instances of all other 
     52      classes) and averages the results, weighting them by the number of 
     53      pairs of instances from these two classes (e.g. by the product of 
     54      probabilities of the two classes). AUC computed in this way still 
     55      behaves as concordance index, e.g., gives the probability that two 
     56      randomly chosen instances from different classes will be correctly 
     57      recognized (this is of course true only if the classifier knows 
     58      from which two classes the instances came). 
     59 
     60   .. attribute:: AUC.ByPairs (or 1) 
     61 
     62      Similar to the above, except that the average over class pairs is not
     63      weighted. This AUC is, like the binary, independent of class 
     64      distributions, but it is not related to concordance index any more. 
     65 
     66   .. attribute:: AUC.WeightedOneAgainstAll (or 2) 
     67 
     68      For each class, it computes AUC for this class against all others (that 
     69      is, treating other classes as one class). The AUCs are then averaged by 
     70      the class probabilities. This is related to concordance index in which 
     71      we test the classifier's (average) capability for distinguishing the 
     72      instances from a specified class from those that come from other classes. 
     73      Unlike the binary AUC, the measure is not independent of class 
     74      distributions. 
     75 
     76   .. attribute:: AUC.OneAgainstAll (or 3) 
     77 
     78      As above, except that the average is not weighted. 
     79 
     80   In case of multiple folds (for instance if the data comes from cross 
     81   validation), the computation goes like this. When computing the partial 
     82   AUCs for individual pairs of classes or singled-out classes, AUC is 
     83   computed for each fold separately and then averaged (ignoring the number 
     84   of instances in each fold, it's just a simple average). However, if a 
     85   certain fold doesn't contain any instances of a certain class (from the 
     86   pair), the partial AUC is computed treating the results as if they came 
     87   from a single fold. This is not really correct since the class
     88   probabilities from different folds are not necessarily comparable,
     89   yet since this will most often occur in leave-one-out experiments,
     90   comparability shouldn't be a problem.
     91 
     92   Computing and printing out the AUCs looks just like printing out
     93   classification accuracies (except that we call AUC instead of 
     94   CA, of course):: 
     95 
     96       AUCs = Orange.evaluation.scoring.AUC(res) 
     97       for l in range(len(learners)): 
     98           print "%10s: %5.3f" % (learners[l].name, AUCs[l]) 
     99 
     100   For vehicle, you can run exactly this same code; it will compute AUCs 
     101   for all pairs of classes and return the average weighted by probabilities 
     102   of pairs. Or, you can specify the averaging method yourself, like this:: 
     103 
     104       AUCs = Orange.evaluation.scoring.AUC(resVeh, Orange.evaluation.scoring.AUC.WeightedOneAgainstAll) 
     105 
     106   The following snippet tries out all four. (We don't claim that this is 
     107   how the function needs to be used; it's better to stay with the default.):: 
     108 
     109       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"] 
     110       print " " *25 + "  \tbayes\ttree\tmajority" 
     111       for i in range(4): 
     112           AUCs = Orange.evaluation.scoring.AUC(resVeh, i) 
     113           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs)) 
     114 
     115   As you can see from the output:: 
     116 
     117                                   bayes   tree    majority 
     118              by pairs, weighted:  0.789   0.871   0.500 
     119                        by pairs:  0.791   0.872   0.500 
     120           one vs. all, weighted:  0.783   0.800   0.500 
     121                     one vs. all:  0.783   0.800   0.500 
     122 
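The averaging modes described above differ only in how the partial AUCs are combined. The sketch below, in plain Python rather than Orange code, shows the binary AUC as a concordance index and then a weighted and an unweighted average of partial AUCs. The function names, the scores and the weights are all hypothetical.

```python
def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, the usual convention for the concordance index.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def average_auc(partial_aucs, weights=None):
    """Unweighted mean of partial AUCs, or a mean weighted by the given weights."""
    if weights is None:
        return sum(partial_aucs) / float(len(partial_aucs))
    total = float(sum(weights))
    return sum(a * w for a, w in zip(partial_aucs, weights)) / total


# Hypothetical classifier scores for positive and negative instances.
print(binary_auc([0.9, 0.8, 0.4], [0.7, 0.3]))

# Hypothetical one-against-all AUCs for three classes and their probabilities.
per_class = [0.9, 0.7, 0.8]
print(average_auc(per_class))                      # unweighted average
print(average_auc(per_class, [0.5, 0.25, 0.25]))   # weighted by class probability
```

In the weighted one-against-all mode the weights would be the class probabilities, and in the weighted by-pairs mode the products of the probabilities of the two classes in each pair.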
     123.. autofunction:: AUC_single 
     124 
     125.. autofunction:: AUC_pair 
     126 
     127.. autofunction:: AUC_matrix 
     128 
     129The remaining functions, which plot the curves and statistically compare 
     130them, require that the results come from a test with a single iteration, 
     131and they always compare one chosen class against all others. If you have 
     132cross validation results, you can either use split_by_iterations to split the 
     133results by folds, call the function for each fold separately and then sum 
     134the results up however you see fit, or you can set the ExperimentResults' 
     135attribute number_of_iterations to 1, to cheat the function - at your own 
     136responsibility for the statistical correctness. Regarding the multi-class 
     137problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class
     138attribute's baseValue at the time when results were computed. If baseValue 
     139was not given at that time, 1 (that is, the second class) is used as default. 
     140 
     141We shall use the following code to prepare suitable experimental results:: 
     142 
     143    ri2 = Orange.data.sample.SubsetIndices2(voting, 0.6)
     144    train = voting.selectref(ri2, 0) 
     145    test = voting.selectref(ri2, 1) 
     146    res1 = Orange.evaluation.testing.learnAndTestOnTestData(learners, train, test) 
     147 
     148 
     149.. autofunction:: AUCWilcoxon 
     150 
     151.. autofunction:: compute_ROC 
     152 
     153 
     154.. autofunction:: confusion_matrices 
     155 
     156.. autoclass:: ConfusionMatrix 
     157 
    19158 
    20159To prepare some data for examples on this page, we shall load the voting data 
     
    27166 
    28167Basic cross validation example is shown in the following part of 
    29 (:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`): 
    30  
    31 .. literalinclude:: code/statExample0.py 
     168(:download:`statExamples.py <code/statExamples.py>`): 
    32169 
    33170If instances are weighted, weights are taken into account. This can be 
     
    39176=========================== 
    40177 
    41 .. autofunction:: CA 
    42  
    43 .. autofunction:: AP 
    44  
    45 .. autofunction:: Brier_score 
    46  
    47 .. autofunction:: IS 
     178 
     179 
     180 
    48181 
    49182So, let's compute all this in part of 
    50 (:download:`statExamples.py <code/statExamples.py>`, uses :download:`voting.tab <code/voting.tab>` and :download:`vehicle.tab <code/vehicle.tab>`) and print it out: 
     183(:download:`statExamples.py <code/statExamples.py>`) and print it out: 
    51184 
    52185.. literalinclude:: code/statExample1.py 
     
    58191    bayes   0.903   0.902   0.175    0.759 
    59192    tree    0.846   0.845   0.286    0.641 
    60     majorty  0.614   0.526   0.474   -0.000 
     193    majority  0.614   0.526   0.474   -0.000 
    61194 
    62195Script :download:`statExamples.py <code/statExamples.py>` contains another example that also prints out 
     
    163296   instances. The classifier is obviously quite biased to vans. 
    164297 
    165    .. method:: sens(confm) 
    166    .. method:: spec(confm) 
    167    .. method:: PPV(confm) 
    168    .. method:: NPV(confm) 
    169    .. method:: precision(confm) 
    170    .. method:: recall(confm) 
    171    .. method:: F2(confm) 
    172    .. method:: Falpha(confm, alpha=2.0) 
    173    .. method:: MCC(conf) 
     298 
    174299 
    175300   With the confusion matrix defined in terms of positive and negative 
     
    321446We shall use the following code to prepare suitable experimental results:: 
    322447 
    323     ri2 = Orange.core.MakeRandomIndices2(voting, 0.6) 
     448    ri2 = Orange.data.sample.SubsetIndices2(voting, 0.6) 
    324449    train = voting.selectref(ri2, 0) 
    325450    test = voting.selectref(ri2, 1) 
     
    417542 
    418543So, let's compute all this and print it out (part of 
    419 :download:`mlc-evaluate.py <code/mlc-evaluate.py>`, uses 
    420 :download:`emotions.tab <code/emotions.tab>`): 
     544:download:`mlc-evaluate.py <code/mlc-evaluate.py>`): 
    421545 
    422546.. literalinclude:: code/mlc-evaluate.py 
  • docs/reference/rst/Orange.evaluation.testing.rst

    r9696 r9994  
    3232list of learning algorithms is prepared. 
    3333 
    34 part of :download:`testing-test.py <code/testing-test.py>` (uses :download:`voting.tab <code/voting.tab>`) 
     34part of :download:`testing-test.py <code/testing-test.py>` 
    3535 
    3636.. literalinclude:: code/testing-test.py 
  • docs/reference/rst/Orange.feature.discretization.rst

    r9927 r9964  
    11.. py:currentmodule:: Orange.feature.discretization 
    22 
    3 ###################################<