source: orange/Orange/classification/rules.py @ 9738:99aafa81f22e

Revision 9738:99aafa81f22e, 109.6 KB checked in by Miha Stajdohar <miha.stajdohar@…>

Added create_dichotomous_class.

"""

.. index:: rule induction

.. index::
   single: classification; rule induction

**************************
Rule induction (``rules``)
**************************

This module implements supervised rule induction algorithms
and rule-based classification methods, specifically the
`CN2 induction algorithm <http://www.springerlink.com/content/k6q2v76736w5039r/>`_
in multiple variants, including an argument-based learning one.
The implementation is modular, based on the rule induction
framework that is described below, providing the opportunity to change, specialize
and improve the algorithm.

CN2 algorithm
=============

.. index::
   single: classification; CN2

Several variations of the well-known CN2 rule learning algorithm are
implemented, all by wrapping the
:class:`~Orange.classification.rules.RuleLearner` class. Each CN2 learner class
in this module changes some of RuleLearner's replaceable components to reflect
the required behavior.

Usage is consistent with typical learner usage in Orange:

:download:`rules-cn2.py <code/rules-cn2.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-cn2.py
    :lines: 7-

The result::

    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
    IF status=['second'] THEN survived=yes<13.000, 80.000>
    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF status=['crew'] THEN survived=no<670.000, 192.000>
    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF sex=['male'] THEN survived=no<118.000, 57.000>
    IF age=['child'] THEN survived=no<17.000, 14.000>
    IF TRUE THEN survived=no<89.000, 76.000>

.. autoclass:: Orange.classification.rules.CN2Learner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
   
.. autoclass:: Orange.classification.rules.CN2Classifier
   :members:
   :show-inheritance:
   :exclude-members: beamWidth, resultType
   
.. index:: unordered CN2

.. index::
   single: classification; unordered CN2

.. autoclass:: Orange.classification.rules.CN2UnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
   
.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier
   :members:
   :show-inheritance:
   
.. index:: CN2-SD
.. index:: subgroup discovery

.. index::
   single: classification; CN2-SD
   
.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
   
.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner
   :members:
   :show-inheritance:
   
References
----------

* Clark, Niblett. `The CN2 Induction Algorithm
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine
  Learning 3(4):261--284, 1989.
* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In
  Machine Learning - EWSL-91. Proceedings of the European Working Session on
  Learning, pp 151--163, Porto, Portugal, March 1991.
* Lavrac, Kavsek, Flach, Todorovski. `Subgroup Discovery with CN2-SD
  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal
  of Machine Learning Research 5:153--188, 2004.


Argument based CN2
==================

Orange also supports argument-based CN2 learning.

.. autoclass:: Orange.classification.rules.ABCN2
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID,
      argument_id
   
   This class has many more undocumented methods; see the source code for
   reference.
   
.. autoclass:: Orange.classification.rules.ABCN2Ordered
   :members:
   :show-inheritance:
   
.. autoclass:: Orange.classification.rules.ABCN2M
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

This module has many more undocumented classes related to argument-based
learning; see the source code for reference.

References
----------

* Bratko, Mozina, Zabkar. `Argument-Based Machine Learning
  <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in
  Computer Science: vol. 4203/2006, 11-17, 2006.


Rule induction framework
========================

A general framework of classes supports the described CN2 implementation, and
can in fact be fine-tuned to specific needs by replacing individual components.
A simple example is shown below; the detailed architecture is described in the
documentation of the classes that follow.

part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-customized.py
    :lines: 7-17

In this example, the rule evaluator is set to an m-estimate of
probability with m=50. The result is::

    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
    IF status=['second'] THEN survived=yes<13.000, 104.000>
    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
    IF status=['crew'] THEN survived=yes<3.000, 20.000>
    IF sex=['female'] THEN survived=no<106.000, 90.000>
    IF TRUE THEN survived=yes<0.000, 5.000>

Notice that it is first necessary to set the :obj:`rule_finder` component,
because the default components are not constructed when the learner is
constructed, but only when we run it on data. At that time, the algorithm
checks which components are necessary and sets defaults. Similarly, when the
learner finishes, it destroys all *default* components. Continuing with our
example, assume that we wish to set a different validation function and a
different beam width. This is simply written as:

part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-customized.py
    :lines: 19-23

.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)
   
   Representation of a single induced rule.
   
   Parameters passed to the constructor correspond to the first
   seven attributes. All attributes are:
   
   .. attribute:: filter
   
      contents of the rule; this is the basis of the Rule class. Must be of
      type :class:`Orange.core.Filter`; an instance of
      :class:`Orange.core.Filter_values` is set as a default.
   
   .. attribute:: classifier
     
      each rule can be used as an ordinary Orange
      classifier. Must be of type :class:`Orange.classification.Classifier`.
      By default, an instance of :class:`Orange.classification.ConstantClassifier` is used.
   
   .. attribute:: learner
     
      learner to be used for making a classifier. Must be of type
      :class:`Orange.classification.Learner`. By default,
      :class:`Orange.classification.majority.MajorityLearner` is used.
   
   .. attribute:: class_distribution
     
      distribution of class in data instances covered by this rule
      (:class:`Orange.statistics.distribution.Distribution`).
   
   .. attribute:: examples
     
      data instances covered by this rule (:class:`Orange.data.Table`).
   
   .. attribute:: weight_id
   
      ID of the weight meta-attribute for the stored data instances (int).
   
   .. attribute:: quality
     
      quality of the rule. Rules with higher quality are better (float).
   
   .. attribute:: complexity
   
      complexity of the rule (float). Complexity is used for
      selecting between rules with equal quality, where rules with lower
      complexity are preferred. Typically, complexity corresponds to the
      number of selectors in the rule (actually to the number of conditions
      in the filter), but any other measure can be applied.
   
   .. method:: filter_and_store(instances, weight_id=0, target_class=-1)
   
      Filter passed data instances and store them in the attribute 'examples'.
      Also, compute class_distribution, set weight of stored examples and create
      a new classifier using 'learner' attribute.
     
      :param weight_id: ID of the weight meta-attribute.
      :type weight_id: int
      :param target_class: index of target class; -1 for all.
      :type target_class: int
   
   Objects of this class can be invoked:

   .. method:: __call__(instance, instances, weight_id=0, target_class=-1)
   
      There are two ways of invoking this method. One way is to pass only a
      data instance; the Rule object then returns True if the given instance
      is covered by the rule's filter.
     
      :param instance: data instance.
      :type instance: :class:`Orange.data.Instance`
     
      The other way is to pass a table of data instances, in which case a
      table of the instances covered by this rule is returned.
     
      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`
      :param ref: TODO
      :type ref: bool
      :param negate: if set to True, the result is inverted: the resulting
          table contains instances *not* covered by the rule.
      :type negate: bool

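The semantics of the two invocation modes can be illustrated in plain Python (a toy sketch, not the Orange API: ``ToyRule``, ``covers``, ``filter_table`` and the dictionary-based instances below are hypothetical stand-ins for :obj:`Rule`, its filter and :class:`Orange.data.Table`):

```python
# Toy sketch of the two Rule.__call__ modes (hypothetical stand-ins,
# not the Orange API): a rule is a conjunction of attribute-value tests.
class ToyRule:
    def __init__(self, conditions):
        # conditions: dict mapping attribute name -> required value
        self.conditions = conditions

    def covers(self, instance):
        # Mode 1: a single instance -> True iff all conditions hold.
        return all(instance.get(a) == v for a, v in self.conditions.items())

    def filter_table(self, table, negate=False):
        # Mode 2: a table of instances -> the covered subset
        # (or, with negate=True, the uncovered subset).
        return [x for x in table if self.covers(x) != negate]

rule = ToyRule({"sex": "female", "status": "first"})
passengers = [
    {"sex": "female", "status": "first", "survived": "yes"},
    {"sex": "male", "status": "crew", "survived": "no"},
]
print(rule.covers(passengers[0]))                       # True
print(len(rule.filter_table(passengers)))               # 1
print(len(rule.filter_table(passengers, negate=True)))  # 1
```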
.. py:class:: Orange.classification.rules.RuleLearner(store_instances = True, target_class = -1, base_rules = Orange.classification.rules.RuleList())
   
   Bases: :class:`Orange.classification.Learner`
   
   A base rule induction learner. The algorithm follows the separate-and-conquer
   strategy, which has its origins in the AQ family of algorithms
   (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence
   Review 13, 3-54, 1999). Basically, such algorithms search for the "best"
   possible rule on the learning instances, remove the covered data from the
   learning instances (separate) and repeat the process (conquer) on the
   remaining instances.
   
   The class' functionality can be best explained by showing its __call__
   function:
   
   .. parsed-literal::

      def \_\_call\_\_(self, instances, weight_id=0):
          rule_list = Orange.classification.rules.RuleList()
          all_instances = Orange.data.Table(instances)
          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class):
              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class,
                                        self.base_rules)
              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id):
                  break
              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances,
                                                      weight_id, self.target_class)
              rule_list.append(new_rule)
          return Orange.classification.rules.RuleClassifier_FirstRule(
              rules=rule_list, instances=all_instances)

   The four customizable components here are the invoked :obj:`data_stopping`,
   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping`
   objects. By default, components of the original CN2 algorithm are used,
   but this can be changed by modifying those attributes:
   
   .. attribute:: data_stopping
   
      an object of class
      :class:`~Orange.classification.rules.RuleDataStoppingCriteria`
      that determines whether there will be any benefit from further learning
      (i.e. whether there is enough data to continue learning). The default
      implementation
      (:class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives`)
      returns True if there are no more instances of the given class.
   
   .. attribute:: rule_stopping
     
      an object of class
      :class:`~Orange.classification.rules.RuleStoppingCriteria`
      that decides, based on the last learned rule, whether it is worthwhile
      to use the rule and to learn more rules. By default, no rule stopping
      criterion is used (:obj:`rule_stopping` == :obj:`None`), thus accepting
      all rules.
       
   .. attribute:: cover_and_remove
       
      an object of
      :class:`RuleCovererAndRemover` that removes
      instances covered by the rule and returns remaining instances. The
      default implementation
      (:class:`RuleCovererAndRemover_Default`)
      removes only the instances that belong to the given target class, unless
      no target class is given (i.e. :obj:`target_class` == -1).
   
   .. attribute:: rule_finder
     
      an object of class
      :class:`~Orange.classification.rules.RuleFinder` that learns a single
      rule from instances. Default implementation is
      :class:`~Orange.classification.rules.RuleBeamFinder`.

   :param store_instances: if set to True, the rules will have data instances
       stored.
   :type store_instances: bool
   
   :param target_class: index of a specific class being learned; -1 for all.
   :type target_class: int
   
   :param base_rules: Rules that we would like to use in :obj:`rule_finder` to
       constrain the learning space. If not set, it will be set to a set
       containing only an empty rule.
   :type base_rules: :class:`~Orange.classification.rules.RuleList`

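The control flow shown in the ``__call__`` listing above can be mimicked in a few lines of plain Python (a schematic sketch under simplifying assumptions, not the Orange implementation: rules are single attribute-value tests, ``find_best_rule`` is a trivial stand-in for :obj:`rule_finder`, and covered instances are removed outright, as the default :obj:`cover_and_remove` does for a target class):

```python
# Schematic separate-and-conquer loop (plain Python, not the Orange API).
from collections import Counter

def find_best_rule(instances):
    # Trivial stand-in for rule_finder: pick the attribute-value
    # test that covers the most instances.
    counts = Counter((a, v) for x in instances for a, v in x.items())
    (attr, val), _ = counts.most_common(1)[0]
    return (attr, val)

def covers(rule, instance):
    attr, val = rule
    return instance.get(attr) == val

def separate_and_conquer(instances):
    rule_list = []
    while instances:                      # data_stopping: stop on empty data
        rule = find_best_rule(instances)  # rule_finder
        covered = [x for x in instances if covers(rule, x)]
        if not covered:                   # rule_stopping: useless rule
            break
        # cover_and_remove: drop covered instances, keep the rest
        instances = [x for x in instances if not covers(rule, x)]
        rule_list.append(rule)
    return rule_list

data = [{"sex": "male"}, {"sex": "male"}, {"sex": "female"}]
print(separate_and_conquer(data))  # [('sex', 'male'), ('sex', 'female')]
```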
Rule finders
------------

.. class:: Orange.classification.rules.RuleFinder

   Base class for all rule finders. These are used to learn a single rule from
   instances.
   
   Rule finders are invokable in the following manner:
   
   .. method:: __call__(table, weight_id, target_class, base_rules)
   
      Return a new rule, induced from instances in the given table.
     
      :param table: data instances to learn from.
      :type table: :class:`Orange.data.Table`
     
      :param weight_id: ID of the weight meta-attribute for the stored data
          instances.
      :type weight_id: int
     
      :param target_class: index of a specific class being learned; -1 for all.
      :type target_class: int
     
      :param base_rules: Rules that we would like to use in :obj:`rule_finder`
          to constrain the learning space. If not set, it will be set to a set
          containing only an empty rule.
      :type base_rules: :class:`~Orange.classification.rules.RuleList`

.. class:: Orange.classification.rules.RuleBeamFinder
   
   Bases: :class:`~Orange.classification.rules.RuleFinder`
   
   Beam search for the best rule. This is the default class used in RuleLearner
   to find the best rule. Pseudo code of the algorithm is shown here:

   .. parsed-literal::

      def \_\_call\_\_(self, table, weight_id, target_class, base_rules):
          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior)
          \# compute quality of rules in rules_star and best_rule
          ...
          while len(rules_star) \> 0:
              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id)
              for cand in candidates:
                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class)
                  for new_rule in new_rules:
                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution):
                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior)
                          rules_star.append(new_rule)
                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and
                              new_rule.quality \> best_rule.quality:
                              best_rule = new_rule
              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id)
          return best_rule

   Bolded in the pseudo-code are several exchangeable components, exposed as
   attributes. These are:

   .. attribute:: initializer
   
      an object of class
      :class:`~Orange.classification.rules.RuleBeamInitializer`
      used to initialize :obj:`rules_star` and to select the
      initial best rule. By default
      (:class:`~Orange.classification.rules.RuleBeamInitializer_Default`),
      :obj:`base_rules` are returned as the starting :obj:`rules_star` and the
      best rule from :obj:`base_rules` is set as :obj:`best_rule`. If
      :obj:`base_rules` is not set, this class returns :obj:`rules_star` with
      a rule that covers all instances (has no selectors), and this rule is
      also used as :obj:`best_rule`.
   
   .. attribute:: candidate_selector
   
      an object of class
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector`
      used to separate a subset of rules from the current
      :obj:`rules_star` and return it. These rules will be used in the next
      specialization step. The default component (an instance of
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`)
      takes all rules in :obj:`rules_star`.
   
   .. attribute:: refiner
   
      an object of class
      :class:`~Orange.classification.rules.RuleBeamRefiner`
      used to refine a given rule. A new rule should cover a
      strict subset of the examples covered by the given rule. The default
      component (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`)
      adds a conjunctive selector to the selectors already present in the rule.
   
   .. attribute:: rule_filter
   
      an object of class
      :class:`~Orange.classification.rules.RuleBeamFilter`
      used to filter rules, keeping the beam relatively small
      and thus limiting search complexity. By default, it keeps the five best
      rules: :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ *(m=5)*\ .

   .. method:: __call__(data, weight_id, target_class, base_rules)

      Determines the next best rule to cover the remaining data instances.
      
      :param data: data instances.
      :type data: :class:`Orange.data.Table`
      
      :param weight_id: index of the weight meta-attribute.
      :type weight_id: int
      
      :param target_class: index of the target class.
      :type target_class: int
      
      :param base_rules: existing rules.
      :type base_rules: :class:`~Orange.classification.rules.RuleList`

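The beam search itself can be sketched without Orange (a toy version with hard-coded stand-ins for the components above: the refiner adds one attribute-value test, the evaluator is a Laplace-corrected precision on the target class, and the filter keeps at most ``beam_width`` rules, mirroring :class:`RuleBeamFilter_Width`; all names below are hypothetical):

```python
# Toy beam search for a single rule (plain Python, not the Orange API).
# A rule is a frozenset of (attribute, value) tests; the empty rule
# covers everything, as in RuleBeamInitializer_Default.

def covers(rule, x):
    return all(x.get(a) == v for a, v in rule)

def evaluate(rule, data, target):
    # Stand-in evaluator: Laplace-corrected precision on the target class.
    covered = [x for x in data if covers(rule, x)]
    pos = sum(1 for x in covered if x["class"] == target)
    return (pos + 1.0) / (len(covered) + 2.0)

def refine(rule, data):
    # Stand-in refiner: add one extra attribute-value selector.
    tests = {(a, v) for x in data for a, v in x.items() if a != "class"}
    return [rule | {t} for t in tests if t not in rule]

def beam_find_rule(data, target, beam_width=5):
    best = frozenset()                      # empty rule covers all data
    best_q = evaluate(best, data, target)
    rules_star = [best]
    while rules_star:
        candidates, rules_star = rules_star, []
        for cand in candidates:
            for new_rule in refine(cand, data):
                q = evaluate(new_rule, data, target)
                if q > best_q:              # validator + best-rule update
                    best, best_q = new_rule, q
                    rules_star.append(new_rule)
        # rule_filter: keep at most beam_width rules in the beam
        rules_star.sort(key=lambda r: evaluate(r, data, target), reverse=True)
        rules_star = rules_star[:beam_width]
    return best

data = [
    {"sex": "female", "class": "yes"},
    {"sex": "female", "class": "yes"},
    {"sex": "male", "class": "no"},
]
print(sorted(beam_find_rule(data, "yes")))  # [('sex', 'female')]
```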
Rule evaluators
---------------

.. class:: Orange.classification.rules.RuleEvaluator

   Base class for rule evaluators that evaluate the quality of the rule based
   on covered data instances. All evaluators support being invoked in the
   following manner:
   
   .. method:: __call__(rule, instances, weight_id, target_class, prior)
   
      Calculates a non-negative rule quality.
     
      :param rule: rule to evaluate.
      :type rule: :class:`~Orange.classification.rules.Rule`
     
      :param instances: a table of instances, covered by the rule.
      :type instances: :class:`Orange.data.Table`
     
      :param weight_id: index of the weight meta-attribute.
      :type weight_id: int
     
      :param target_class: index of target class of this rule.
      :type target_class: int
     
      :param prior: prior class distribution.
      :type prior: :class:`Orange.statistics.distribution.Distribution`

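The quantities computed by the concrete evaluators listed below can be illustrated numerically in plain Python (standard formulas for the Laplace estimate, the m-estimate and weighted relative accuracy; the function names and the numbers are made up for illustration, not Orange calls):

```python
# Rule-quality measures on a toy covered-class distribution
# (plain-Python illustrations of the standard formulas).

def laplace(pos, covered, n_classes=2):
    # Laplace rule of succession: (p + 1) / (n + k)
    return (pos + 1.0) / (covered + n_classes)

def m_estimate(pos, covered, prior, m=2.0):
    # m-estimate of probability: (p + m * prior) / (n + m)
    return (pos + m * prior) / (covered + m)

def wracc(pos, covered, target_total, total):
    # Weighted relative accuracy: coverage * (precision - prior)
    coverage = covered / float(total)
    return coverage * (pos / float(covered) - target_total / float(total))

# A rule covering 20 instances, 18 of them in the target class,
# drawn from 100 instances of which 50 are in the target class:
print(round(laplace(18, 20), 3))                  # 0.864
print(round(m_estimate(18, 20, 0.5, m=4.0), 3))   # 0.833
print(round(wracc(18, 20, 50, 100), 3))           # 0.08
```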
.. autoclass:: Orange.classification.rules.LaplaceEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID

.. autoclass:: Orange.classification.rules.WRACCEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID
   
.. class:: Orange.classification.rules.RuleEvaluator_Entropy

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
   
.. class:: Orange.classification.rules.RuleEvaluator_LRS

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_Laplace

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_mEVC

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
   
Instance covering and removal
-----------------------------

.. class:: RuleCovererAndRemover

   Base class for rule coverers and removers that, when invoked, remove
   instances covered by the rule and return remaining instances.

   .. method:: __call__(rule, instances, weights, target_class)
   
      Removes the data instances covered by the given rule and returns the
      remaining instances.
     
      :param rule: the rule whose covered instances are removed.
      :type rule: :class:`~Orange.classification.rules.Rule`
     
      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`
     
      :param weights: index of the weight meta-attribute.
      :type weights: int
     
      :param target_class: index of the target class of this rule.
      :type target_class: int

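The weight-based variants below replace hard removal with a softer scheme. The idea behind multiplicative reweighting (as used in weighted covering for subgroup discovery, e.g. CN2-SD) can be sketched in plain Python (a schematic illustration with made-up names, not the Orange implementation):

```python
# Schematic multiplicative "cover and reweight" step (plain Python).
# Instead of removing covered instances, their weights are multiplied
# by a factor gamma in (0, 1), so later rules can still see them.

def cover_and_reweight(rule_covers, instances, weights, gamma=0.5):
    new_weights = [w * gamma if rule_covers(x) else w
                   for x, w in zip(instances, weights)]
    return instances, new_weights

instances = [{"sex": "female"}, {"sex": "male"}]
weights = [1.0, 1.0]
is_covered = lambda x: x["sex"] == "female"
_, weights = cover_and_reweight(is_covered, instances, weights)
print(weights)  # [0.5, 1.0]
```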
.. autoclass:: CovererAndRemover_MultWeights

.. autoclass:: CovererAndRemover_AddWeights
   
Miscellaneous functions
-----------------------

.. automethod:: Orange.classification.rules.rule_to_string

..
    Undocumented are:
    Data-based Stopping Criteria
    ----------------------------
    Rule-based Stopping Criteria
    ----------------------------

"""

import random
import math
import operator
import numpy

import Orange
import Orange.core
from Orange.core import \
    AssociationClassifier, \
    AssociationLearner, \
    RuleClassifier, \
    RuleClassifier_firstRule, \
    RuleClassifier_logit, \
    RuleLearner, \
    Rule, \
    RuleBeamCandidateSelector, \
    RuleBeamCandidateSelector_TakeAll, \
    RuleBeamFilter, \
    RuleBeamFilter_Width, \
    RuleBeamInitializer, \
    RuleBeamInitializer_Default, \
    RuleBeamRefiner, \
    RuleBeamRefiner_Selector, \
    RuleClassifierConstructor, \
    RuleCovererAndRemover, \
    RuleCovererAndRemover_Default, \
    RuleDataStoppingCriteria, \
    RuleDataStoppingCriteria_NoPositives, \
    RuleEvaluator, \
    RuleEvaluator_Entropy, \
    RuleEvaluator_LRS, \
    RuleEvaluator_Laplace, \
    RuleEvaluator_mEVC, \
    RuleFinder, \
    RuleBeamFinder, \
    RuleList, \
    RuleStoppingCriteria, \
    RuleStoppingCriteria_NegativeDistribution, \
    RuleValidator, \
    RuleValidator_LRS
from Orange.misc import deprecated_keywords
from Orange.misc import deprecated_members


class ConvertClass:
    """ Converts a class variable into a dichotomous class variable. """
    def __init__(self, classAtt, classValue, newClassAtt):
        self.classAtt = classAtt
        self.classValue = classValue
        self.newClassAtt = newClassAtt

    def __call__(self, example, returnWhat):
        if example[self.classAtt] == self.classValue:
            return Orange.data.Value(self.newClassAtt, self.classValue + "_")
        else:
            return Orange.data.Value(self.newClassAtt, "not " + self.classValue)


def create_dichotomous_class(domain, att, value, negate, removeAtt=None):
    # create new variable
    newClass = Orange.data.variable.Discrete(att.name + "_", values=[str(value) + "_", "not " + str(value)])
    positive = Orange.data.Value(newClass, str(value) + "_")
    negative = Orange.data.Value(newClass, "not " + str(value))
    newClass.getValueFrom = ConvertClass(att, str(value), newClass)

    att = [a for a in domain.attributes]
    newDomain = Orange.data.Domain(att + [newClass])
    newDomain.addmetas(domain.getmetas())
    if negate == 1:
        return (newDomain, negative)
    else:
        return (newDomain, positive)


class LaplaceEvaluator(RuleEvaluator):
    """
    Laplace's rule of succession.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class > -1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class > -1:
            return (rule.class_distribution[target_class] + 1) / (sumDist + 2)
        else:
            return (max(rule.class_distribution) + 1) / (sumDist + len(data.domain.class_var.values))

LaplaceEvaluator = deprecated_members({"weightID": "weight_id",
                                       "targetClass": "target_class"})(LaplaceEvaluator)


class WRACCEvaluator(RuleEvaluator):
    """
    Weighted relative accuracy.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class > -1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class > -1:
            pRule = rule.class_distribution[target_class] / apriori[target_class]
            pTruePositive = rule.class_distribution[target_class] / sumDist
            pClass = apriori[target_class] / apriori.cases
        else:
            pRule = sumDist / apriori.cases
            pTruePositive = max(rule.class_distribution) / sumDist
            pClass = apriori[rule.class_distribution.modus()] / sum(apriori)
        if pTruePositive > pClass:
            return pRule * (pTruePositive - pClass)
        else:
            return (pTruePositive - pClass) / max(pRule, 1e-6)

WRACCEvaluator = deprecated_members({"weightID": "weight_id",
                                     "targetClass": "target_class"})(WRACCEvaluator)


class MEstimateEvaluator(RuleEvaluator):
    """
    Rule evaluator using m-estimate of probability rule evaluation function.

    :param m: m-value for m-estimate
    :type m: int

    """
    def __init__(self, m=2):
        self.m = m
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.abs
        if self.m == 0 and not sumDist:
            return 0.
        # get distribution
        if target_class > -1:
            p = rule.class_distribution[target_class] + self.m * apriori[target_class] / apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        else:
            p = max(rule.class_distribution) + self.m * apriori[rule.\
                class_distribution.modus()] / apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        return p

MEstimateEvaluator = deprecated_members({"weightID": "weight_id",
                                         "targetClass": "target_class"})(MEstimateEvaluator)


class CN2Learner(RuleLearner):
    """
    Classical CN2 (see Clark and Niblett, 1989) induces a set of ordered
    rules, which means that the classifier must apply the rules in the same
    order as they were learned.
725   
726    If data instances are provided to the constructor, the learning algorithm
727    is called and the resulting classifier is returned instead of the learner.
728
729    :param evaluator: an object that evaluates a rule from covered instances.
730        By default, entropy is used as a measure.
731    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
732    :param beam_width: width of the search beam.
733    :type beam_width: int
734    :param alpha: significance level of the likelihood ratio statistics to
735        determine whether rule is better than the default rule.
736    :type alpha: float
737
738    """
739
740    def __new__(cls, instances=None, weight_id=0, **kwargs):
741        self = RuleLearner.__new__(cls, **kwargs)
742        if instances is not None:
743            self.__init__(**kwargs)
744            return self.__call__(instances, weight_id)
745        else:
746            return self
747
748    def __init__(self, evaluator=RuleEvaluator_Entropy(), beam_width=5,
749        alpha=1.0, **kwds):
750        self.__dict__.update(kwds)
751        self.rule_finder = RuleBeamFinder()
752        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width)
753        self.rule_finder.evaluator = evaluator
754        self.rule_finder.validator = RuleValidator_LRS(alpha=alpha)
755
756    def __call__(self, instances, weight=0):
757        supervisedClassCheck(instances)
758
759        cl = RuleLearner.__call__(self, instances, weight)
760        rules = cl.rules
761        return CN2Classifier(rules, instances, weight)
762
CN2Learner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2Learner)

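# The learner above follows the classic separate-and-conquer scheme: repeatedly
# ask the rule finder for the best rule on the remaining data, keep it, and
# remove the instances it covers. A self-contained sketch of that loop (toy
# predicates instead of Orange rules; entropy as the evaluator, as in the
# default above; all names here are illustrative, not Orange API):

```python
import math

def entropy(counts):
    # Entropy of the class distribution covered by a rule (lower = purer).
    total = float(sum(counts.values()))
    return -sum((c / total) * math.log(c / total, 2)
                for c in counts.values() if c)

def covered_counts(rule, data):
    # Class counts of the instances a rule covers; data is (features, label) pairs.
    counts = {}
    for x, y in data:
        if rule(x):
            counts[y] = counts.get(y, 0) + 1
    return counts

def separate_and_conquer(candidate_rules, data, min_covered=1):
    # Greedy covering loop: pick the purest rule, drop what it covers, repeat.
    rules = []
    remaining = list(data)
    while remaining:
        scored = [(entropy(covered_counts(r, remaining)), r)
                  for r in candidate_rules
                  if sum(covered_counts(r, remaining).values()) >= min_covered]
        if not scored:
            break
        best = min(scored, key=lambda pair: pair[0])[1]
        rules.append(best)
        remaining = [(x, y) for x, y in remaining if not best(x)]
    return rules
```

# The real implementation delegates rule search to RuleBeamFinder and covering
# to a replaceable cover_and_remove component; the sketch only shows the shape
# of the outer loop.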

class CN2Classifier(RuleClassifier):
    """
    Classical CN2 (see Clark and Niblett, 1988) classifies a new instance
    using an ordered set of rules. Usually the learner
    (:class:`~Orange.classification.rules.CN2Learner`) is used to construct the
    classifier.

    :param rules: learned rules to be used for classification (mandatory).
    :type rules: :class:`~Orange.classification.rules.RuleList`

    :param instances: data instances that were used for learning.
    :type instances: :class:`Orange.data.Table`

    :param weight_id: ID of the weight meta-attribute.
    :type weight_id: int

    """

    @deprecated_keywords({"examples": "instances"})
    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.weight_id = weight_id
        self.class_var = None if instances is None else instances.domain.class_var
        self.__dict__.update(argkw)
        if instances is not None:
            self.prior = Orange.statistics.distribution.Distribution(instances.domain.class_var, instances)

    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
        """
        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`

        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
        """
        classifier = None
        for r in self.rules:
        #   r.filter.domain = instance.domain
            if r(instance) and r.classifier:
                classifier = r.classifier
                classifier.default_distribution = r.class_distribution
                break
        if not classifier:
            classifier = Orange.classification.ConstantClassifier(instance.domain.class_var, \
                self.prior.modus())
            classifier.default_distribution = self.prior

        classifier.default_distribution.normalize()
        if result_type == Orange.classification.Classifier.GetValue:
            return classifier(instance)
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return classifier.default_distribution
        return (classifier(instance), classifier.default_distribution)

    def __str__(self):
        ret_str = rule_to_string(self.rules[0]) + " " + \
            str(self.rules[0].class_distribution) + "\n"
        for r in self.rules[1:]:
            ret_str += "ELSE " + rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return ret_str

CN2Classifier = deprecated_members({"resultType": "result_type",
                                    "beamWidth": "beam_width"})(CN2Classifier)

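# CN2Classifier.__call__ above applies a first-match policy: the first rule
# that fires determines the class, and the prior majority is the fallback.
# A stripped-down sketch of that policy (illustrative names only; real Orange
# rules carry full class distributions rather than a single predicted class):

```python
def classify_ordered(rules, prior_majority, instance):
    # rules: list of (predicate, predicted_class) pairs, in learned order.
    for matches, cls in rules:
        if matches(instance):
            return cls  # first rule that fires wins
    return prior_majority  # no rule fired: default to the prior majority class
```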
class CN2UnorderedLearner(RuleLearner):
    """
    CN2 unordered (see Clark and Boswell, 1991) induces a set of unordered
    rules; classification with such rules does not assume any ordering of
    the rules. Rule learning is quite similar to classical CN2, except that
    rules are learned separately for each class in turn.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, Laplace's rule of succession is used as a measure.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
    :param beam_width: width of the search beam.
    :type beam_width: int
    :param alpha: significance level of the likelihood ratio statistics to
        determine whether a rule is better than the default rule.
    :type alpha: float
    """
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = RuleLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=RuleEvaluator_Laplace(), beam_width=5,
        alpha=1.0, **kwds):
        self.__dict__.update(kwds)
        self.rule_finder = RuleBeamFinder()
        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width)
        self.rule_finder.evaluator = evaluator
        self.rule_finder.validator = RuleValidator_LRS(alpha=alpha)
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0)
        self.rule_stopping = RuleStopping_Apriori()
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()

    @deprecated_keywords({"weight": "weight_id"})
    def __call__(self, instances, weight_id=0):
        supervisedClassCheck(instances)

        rules = RuleList()
        self.rule_stopping.apriori = Orange.statistics.distribution.Distribution(
            instances.domain.class_var, instances)
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                instances.domain.class_var, instances, weight_id)
            distrib.normalize()
        for target_class in instances.domain.class_var:
            if progress:
                progress.start = progress.end
                progress.end += distrib[target_class]
            self.target_class = target_class
            cl = RuleLearner.__call__(self, instances, weight_id)
            for r in cl.rules:
                rules.append(r)
        if progress:
            progress(1.0, None)
        return CN2UnorderedClassifier(rules, instances, weight_id)

CN2UnorderedLearner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2UnorderedLearner)

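# Unordered CN2 learns rules for each class in turn, treating that class as the
# target, and its default quality measure is the Laplace estimate
# (s + 1) / (n + k). A small sketch of both ideas; learn_rules_for is a
# hypothetical per-class learner stand-in, not an Orange function:

```python
def laplace(target_covered, total_covered, n_classes):
    # Laplace rule of succession: (s + 1) / (n + k), where s is the number of
    # covered target-class instances, n all covered instances, k the class count.
    return (target_covered + 1.0) / (total_covered + n_classes)

def learn_unordered(learn_rules_for, classes, data):
    # One rule-learning pass per class; the union is the unordered rule set.
    all_rules = []
    for cls in classes:
        all_rules.extend(learn_rules_for(cls, data))
    return all_rules
```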
class CN2UnorderedClassifier(RuleClassifier):
    """
    CN2 unordered (see Clark and Boswell, 1991) classifies a new instance using
    a set of unordered rules. Usually the learner
    (:class:`~Orange.classification.rules.CN2UnorderedLearner`) is used to
    construct the classifier.

    :param rules: learned rules to be used for classification (mandatory).
    :type rules: :class:`~Orange.classification.rules.RuleList`

    :param instances: data instances that were used for learning.
    :type instances: :class:`Orange.data.Table`

    :param weight_id: ID of the weight meta-attribute.
    :type weight_id: int

    """

    @deprecated_keywords({"examples": "instances"})
    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.weight_id = weight_id
        self.class_var = instances.domain.class_var if instances is not None else None
        self.__dict__.update(argkw)
        if instances is not None:
            self.prior = Orange.statistics.distribution.Distribution(
                                instances.domain.class_var, instances)

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        """
        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`

        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
        """
        def add(disc1, disc2, sumd):
            disc = Orange.statistics.distribution.Discrete(disc1)
            sumdisc = sumd
            for i, d in enumerate(disc):
                disc[i] += disc2[i]
                sumdisc += disc2[i]
            return disc, sumdisc

        # create empty distribution
        retDist = Orange.statistics.distribution.Discrete(self.examples.domain.class_var)
        covRules = RuleList()
        # iterate through rules - sum distributions of all rules covering the instance
        sumdisc = 0.
        for r in self.rules:
            if r(instance) and r.class_distribution:
                retDist, sumdisc = add(retDist, r.class_distribution, sumdisc)
                covRules.append(r)
        if not sumdisc:
            retDist = self.prior
            sumdisc = self.prior.abs

        if sumdisc > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdisc
        else:
            retDist.normalize()

        if ret_rules:
            if result_type == Orange.classification.Classifier.GetValue:
                return (retDist.modus(), covRules)
            if result_type == Orange.classification.Classifier.GetProbabilities:
                return (retDist, covRules)
            return (retDist.modus(), retDist, covRules)
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(), retDist)

    def __str__(self):
        retStr = ""
        for r in self.rules:
            retStr += rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return retStr

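# The classifier above votes by summing the class distributions of every rule
# that covers the instance, normalizing, and falling back to the prior when no
# rule fires. A self-contained sketch with plain dictionaries in place of
# Orange distributions (illustrative only):

```python
def classify_unordered(rules, prior, instance):
    # rules: list of (predicate, class_distribution_dict) pairs.
    total = dict((c, 0.0) for c in prior)
    fired = False
    for matches, dist in rules:
        if matches(instance):
            fired = True
            for c, n in dist.items():
                total[c] += n  # accumulate covering rules' distributions
    if not fired:
        total = dict(prior)  # fall back to the apriori distribution
    s = sum(total.values())
    probs = dict((c, n / s) for c, n in total.items())
    return max(probs, key=probs.get), probs
```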
class CN2SDUnorderedLearner(CN2UnorderedLearner):
    """
    CN2-SD (see Lavrac et al., 2004) induces a set of unordered rules, just
    like :class:`~Orange.classification.rules.CN2UnorderedLearner`. The
    difference between classical unordered CN2 and CN2-SD lies in the choice
    of the evaluation and covering functions: :class:`WRACCEvaluator`
    implements weighted relative accuracy, and
    :class:`CovererAndRemover_MultWeights` does not exclude covered
    instances, but instead multiplies their weight by the value of the mult
    parameter.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, weighted relative accuracy is used.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`

    :param beam_width: width of the search beam.
    :type beam_width: int

    :param alpha: significance level of the likelihood ratio statistics to
        determine whether a rule is better than the default rule.
    :type alpha: float

    :param mult: multiplier for the weights of covered instances.
    :type mult: float
    """
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = CN2UnorderedLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=WRACCEvaluator(), beam_width=5,
                alpha=0.05, mult=0.7, **kwds):
        CN2UnorderedLearner.__init__(self, evaluator=evaluator,
                                          beam_width=beam_width, alpha=alpha, **kwds)
        self.cover_and_remove = CovererAndRemover_MultWeights(mult=mult)

    def __call__(self, instances, weight=0):
        supervisedClassCheck(instances)

        oldInstances = Orange.data.Table(instances)
        classifier = CN2UnorderedLearner.__call__(self, instances, weight)
        for r in classifier.rules:
            r.filterAndStore(oldInstances, weight, r.classifier.default_val)
        return classifier

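# CN2-SD swaps in weighted relative accuracy as rule quality and, instead of
# removing covered instances, multiplies their weight by mult. Both pieces
# sketched with plain lists and floats (illustrative, not the Orange
# implementation):

```python
def wracc(covered_pos, covered_total, pos_total, n_total):
    # WRAcc = p(cond) * (p(class|cond) - p(class)):
    # trades coverage against accuracy gain over the prior.
    if covered_total == 0:
        return 0.0
    return (covered_total / float(n_total)) * \
           (covered_pos / float(covered_total) - pos_total / float(n_total))

def cover_and_reweight(rule, weights, instances, mult=0.7):
    # CN2-SD covering: covered instances stay in the data with reduced weight,
    # so later rules can still partially re-cover them.
    return [w * mult if rule(x) else w for w, x in zip(weights, instances)]
```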
class ABCN2(RuleLearner):
    """
    This is an implementation of argument-based CN2 that uses EVC (extreme
    value correction) for rule evaluation and LRC for classification.

    Rule learning parameters that can be passed to the constructor:

    :param width: beam width (default 5).
    :type width: int
    :param learn_for_class: class for which to learn; None (default) if all
       classes are to be learned.
    :param learn_one_rule: decides whether to learn only a single rule
       (default False).
    :type learn_one_rule: boolean
    :param analyse_argument: index of argument to analyse; -1 to learn normally
       (default).
    :type analyse_argument: int
    :param debug: sets debug mode, which prints some info during execution;
       False (default).
    :type debug: boolean

    The following evaluator-related arguments are supported:

    :param m: m for the m-estimate to be corrected with EVC (default 2).
    :type m: int
    :param opt_reduction: type of EVC correction: 0=no correction,
       1=pessimistic, 2=normal (default 2).
    :type opt_reduction: int
    :param nsampling: number of samples in estimating extreme value
       distribution for EVC (default 100).
    :type nsampling: int
    :param evd: pre-given extreme value distributions.
    :param evd_arguments: pre-given extreme value distributions for arguments.

    The following parameters control rule validation:

    :param rule_sig: minimal rule significance (default 1.0).
    :type rule_sig: float
    :param att_sig: minimal attribute significance in rule (default 1.0).
    :type att_sig: float
    :param max_rule_complexity: maximum number of conditions in rule (default 5).
    :type max_rule_complexity: int
    :param min_coverage: minimal number of covered instances (default 5).
    :type min_coverage: int

    Probabilistic covering can be controlled using:

    :param min_improved: minimal number of instances improved in probabilistic covering (default 1).
    :type min_improved: int
    :param min_improved_perc: minimal percentage of covered instances that need to be improved (default 0.0).
    :type min_improved_perc: float

    Finally, LRC (classifier) related parameters are:

    :param add_sub_rules: decides whether to add sub-rules.
    :type add_sub_rules: boolean
    :param min_cl_sig: minimal significance of beta in classifier (default 0.5).
    :type min_cl_sig: float
    :param min_beta: minimal beta value (default 0.0).
    :type min_beta: float
    :param set_prefix_rules: decides whether ordered prefix rules should be
       added (default False).
    :type set_prefix_rules: boolean
    :param alternative_learner: use rule-learner as a correction method for
       other machine learning methods (default None).

    """

    def __init__(self, argument_id=0, width=5, m=2, opt_reduction=2, nsampling=100, max_rule_complexity=5,
                 rule_sig=1.0, att_sig=1.0, postpruning=None, min_quality=0., min_coverage=1, min_improved=1, min_improved_perc=0.0,
                 learn_for_class=None, learn_one_rule=False, evd=None, evd_arguments=None, prune_arguments=False, analyse_argument= -1,
                 alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, add_sub_rules=False, debug=False,
                 **kwds):

        # argument ID which is passed to abcn2 learner
        self.argument_id = argument_id
        # learn for specific class only?
        self.learn_for_class = learn_for_class
        # only analysing a specific argument or learning all at once
        self.analyse_argument = analyse_argument
        # should we learn only one rule?
        self.learn_one_rule = learn_one_rule
        self.postpruning = postpruning
        # rule finder
        self.rule_finder = RuleBeamFinder()
        self.ruleFilter = RuleBeamFilter_Width(width=width)
        self.ruleFilter_arguments = ABBeamFilter(width=width)
        if max_rule_complexity < 1:
            max_rule_complexity = 10
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0, min_quality=0., max_rule_complexity=max_rule_complexity - 1, min_coverage=min_coverage)
        self.refiner = RuleBeamRefiner_Selector()
        self.refiner_arguments = SelectorAdder(discretizer=Orange.feature.discretization.EntropyDiscretization(forceAttribute=1,
                                                                                           maxNumberOfIntervals=2))
        self.prune_arguments = prune_arguments
        # evc evaluator
        evdGet = Orange.core.EVDistGetter_Standard()
        self.rule_finder.evaluator = RuleEvaluator_mEVC(m=m, evDistGetter=evdGet, min_improved=min_improved, min_improved_perc=min_improved_perc)
        self.rule_finder.evaluator.returnExpectedProb = True
        self.rule_finder.evaluator.optimismReduction = opt_reduction
        self.rule_finder.evaluator.ruleAlpha = rule_sig
        self.rule_finder.evaluator.attributeAlpha = att_sig
        self.rule_finder.evaluator.validator = RuleValidator_LRS(alpha=1.0, min_quality=min_quality, min_coverage=min_coverage, max_rule_complexity=max_rule_complexity - 1)

        # learn stopping criteria
        self.rule_stopping = None
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()
        # evd fitting
        self.evd_creator = EVDFitter(self, n=nsampling)
        self.evd = evd
        self.evd_arguments = evd_arguments
        # classifier
        self.add_sub_rules = add_sub_rules
        self.classifier = PILAR(alternative_learner=alternative_learner, min_cl_sig=min_cl_sig, min_beta=min_beta, set_prefix_rules=set_prefix_rules)
        self.debug = debug
        # arbitrary parameters
        self.__dict__.update(kwds)


    def __call__(self, examples, weight_id=0):
        # initialize progress bar
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                             examples.domain.class_var, examples, weight_id)
            distrib.normalize()

        # we begin with an empty set of rules
        all_rules = RuleList()

        # then, iterate through all classes and learn rules for each class separately
        for cl_i, cl in enumerate(examples.domain.class_var):
            if progress:
                step = distrib[cl] / 2.
                progress.start = progress.end
                progress.end += step

            if self.learn_for_class and self.learn_for_class not in [cl, cl_i]:
                continue

            # rules for this class only
            rules = RuleList()

            # create dichotomous class
            dich_data = self.create_dich_class(examples, cl)

            # preparation of the learner (covering, evd, etc.)
            self.prepare_settings(dich_data, weight_id, cl_i, progress)

            # learn argumented rules first ...
            self.turn_ABML_mode(dich_data, weight_id, cl_i)
            # first specialize all unspecialized arguments
            # dich_data = self.specialise_arguments(dich_data, weight_id)
            # comment: specialisation of arguments is within learning of an argumented rule;
            #          this is now different from the published algorithm
            if progress:
                progress.start = progress.end
                progress.end += step

            aes = self.get_argumented_examples(dich_data)
            aes = self.sort_arguments(aes, dich_data)
            while aes:
                if self.analyse_argument > -1 and \
                   (isinstance(self.analyse_argument, Orange.core.Example) and not Orange.core.Example(dich_data.domain, self.analyse_argument) == aes[0] or \
                    isinstance(self.analyse_argument, int) and not dich_data[self.analyse_argument] == aes[0]):
                    aes = aes[1:]
                    continue
                ae = aes[0]
                rule = self.learn_argumented_rule(ae, dich_data, weight_id) # target class is always first class (0)
                if self.debug and rule:
                    print "learned arg rule", Orange.classification.rules.rule_to_string(rule)
                elif self.debug:
                    print "no rule came out of ", ae
                if rule:
                    rules.append(rule)
                    aes = filter(lambda x: not rule(x), aes)
                else:
                    aes = aes[1:]
                aes = aes[1:]

            if not progress and self.debug:
                print " arguments finished ... "

            # remove all examples covered by rules
            for rule in rules:
                dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
            if progress:
                progress(self.remaining_probability(dich_data), None)

            # learn normal rules on remaining examples
            if self.analyse_argument == -1:
                self.turn_normal_mode(dich_data, weight_id, cl_i)
                while dich_data:
                    # learn a rule
                    rule = self.learn_normal_rule(dich_data, weight_id, self.apriori)
                    if not rule:
                        break
                    if self.debug:
                        print "rule learned: ", Orange.classification.rules.rule_to_string(rule), rule.quality
                    dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
                    if progress:
                        progress(self.remaining_probability(dich_data), None)
                    rules.append(rule)
                    if self.learn_one_rule:
                        break

            # prune unnecessary rules
            rules = self.prune_unnecessary_rules(rules, dich_data, weight_id)

            if self.add_sub_rules:
                rules = self.add_sub_rules_call(rules, dich_data, weight_id)

            # restore domain and class in rules, add them to all_rules
            for r in rules:
                all_rules.append(self.change_domain(r, cl, examples, weight_id))

            if progress:
                progress(1.0, None)
        # create a classifier from all rules
        return self.create_classifier(all_rules, examples, weight_id)

    def learn_argumented_rule(self, ae, examples, weight_id):
        # prepare roots of rules from arguments
        positive_args = self.init_pos_args(ae, examples, weight_id)
        if not positive_args: # something went wrong
            raise ValueError("There is a problem with argumented example %s" % str(ae))
        negative_args = self.init_neg_args(ae, examples, weight_id)

        # set negative arguments in refiner
        self.rule_finder.refiner.notAllowedSelectors = negative_args
        self.rule_finder.refiner.example = ae
        # set arguments to filter
        self.rule_finder.ruleFilter.setArguments(examples.domain, positive_args)

        # learn a rule
        self.rule_finder.evaluator.bestRule = None
        self.rule_finder(examples, weight_id, 0, positive_args)

        # return best rule
        return self.rule_finder.evaluator.bestRule

    def prepare_settings(self, examples, weight_id, cl_i, progress):
        # apriori distribution
        self.apriori = Orange.statistics.distribution.Distribution(
                                examples.domain.class_var, examples, weight_id)

        # prepare covering mechanism
        self.cover_and_remove = CovererAndRemover_Prob(examples, weight_id, 0, self.apriori, self.argument_id)
        self.rule_finder.evaluator.probVar = examples.domain.getmeta(self.cover_and_remove.probAttribute)

        # compute extreme distributions
        # TODO: why evd and evd_this????
        if self.rule_finder.evaluator.optimismReduction > 0 and not self.evd:
            self.evd_this = self.evd_creator.computeEVD(examples, weight_id, target_class=0, progress=progress)
        if self.evd:
            self.evd_this = self.evd[cl_i]

    def turn_ABML_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0 and self.argument_id:
            if self.evd_arguments:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_arguments[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD_example(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner_arguments
        self.rule_finder.refiner.argument_id = self.argument_id
        self.rule_finder.ruleFilter = self.ruleFilter_arguments

    def create_dich_class(self, examples, cl):
        """
        Create dichotomous class.
        """
        (newDomain, targetVal) = create_dichotomous_class(examples.domain, examples.domain.class_var, str(cl), negate=0)
        newDomain.addmeta(Orange.data.new_meta_id(), examples.domain.class_var) # old class as meta
        dichData = examples.select(newDomain)
        if self.argument_id:
            for d in dichData: # remove arguments given to other classes
                if not d.getclass() == targetVal:
                    d[self.argument_id] = "?"
        return dichData

    def get_argumented_examples(self, examples):
        if not self.argument_id:
            return None

        # get argumented examples
        return ArgumentFilter_hasSpecial()(examples, self.argument_id, target_class=0)

    def sort_arguments(self, arg_examples, examples):
        if not self.argument_id:
            return None
        evaluateAndSortArguments(examples, self.argument_id)
        if len(arg_examples) > 0:
            # sort examples by their arguments' quality (using the first argument, as it has already been sorted)
            sorted_examples = arg_examples.native()
            sorted_examples.sort(lambda x, y:-cmp(x[self.argument_id].value.positive_arguments[0].quality,
                                         y[self.argument_id].value.positive_arguments[0].quality))
            return Orange.data.Table(examples.domain, sorted_examples)
        else:
            return None

    def turn_normal_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0:
            if self.evd:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner
        self.rule_finder.ruleFilter = self.ruleFilter

    def learn_normal_rule(self, examples, weight_id, apriori):
        if hasattr(self.rule_finder.evaluator, "bestRule"):
            self.rule_finder.evaluator.bestRule = None
        rule = self.rule_finder(examples, weight_id, 0, RuleList())
        if hasattr(self.rule_finder.evaluator, "bestRule") and self.rule_finder.evaluator.returnExpectedProb:
            rule = self.rule_finder.evaluator.bestRule
            self.rule_finder.evaluator.bestRule = None
        if self.postpruning:
            rule = self.postpruning(rule, examples, weight_id, 0, apriori)
        return rule

    def remove_covered_examples(self, rule, examples, weight_id):
        nexamples, nweight = self.cover_and_remove(rule, examples, weight_id, 0)
        return nexamples

    def prune_unnecessary_rules(self, rules, examples, weight_id):
        return self.cover_and_remove.getBestRules(rules, examples, weight_id)

    def change_domain(self, rule, cl, examples, weight_id):
        rule.filter = Orange.core.Filter_values(domain=examples.domain,
                                        conditions=rule.filter.conditions)
        rule.filterAndStore(examples, weight_id, cl)
        if hasattr(rule, "learner") and hasattr(rule.learner, "arg_example"):
            rule.learner.arg_example = Orange.data.Instance(examples.domain, rule.learner.arg_example)
        return rule

    def create_classifier(self, rules, examples, weight_id):
        return self.classifier(rules, examples, weight_id)

    def add_sub_rules_call(self, rules, examples, weight_id):
        apriori = Orange.statistics.distribution.Distribution(
                            examples.domain.class_var, examples, weight_id)
        new_rules = RuleList()
        for r in rules:
            new_rules.append(r)

        # loop through rules
        for r in rules:
            tmpList = RuleList()
            tmpRle = r.clone()
            tmpRle.filter.conditions = r.filter.conditions[:r.requiredConditions] # do not split argument
            tmpRle.parentRule = None
            tmpRle.filterAndStore(examples, weight_id, r.classifier.default_val)
            tmpRle.complexity = 0
            tmpList.append(tmpRle)
            while tmpList and len(tmpList[0].filter.conditions) <= len(r.filter.conditions):
                tmpList2 = RuleList()
                for tmpRule in tmpList:
                    # evaluate tmpRule
                    oldREP = self.rule_finder.evaluator.returnExpectedProb
                    self.rule_finder.evaluator.returnExpectedProb = False
                    tmpRule.quality = self.rule_finder.evaluator(tmpRule, examples, weight_id, r.classifier.default_val, apriori)
                    self.rule_finder.evaluator.returnExpectedProb = oldREP
                tmpList.sort(lambda x, y:-cmp(x.quality, y.quality))
                tmpList = tmpList[:self.ruleFilter.width]

                for tmpRule in tmpList:
                    # if rule not in rules already, add it to the list
                    if not True in [Orange.classification.rules.rules_equal(ri, tmpRule) for ri in new_rules] and len(tmpRule.filter.conditions) > 0 and tmpRule.quality > apriori[r.classifier.default_val] / apriori.abs:
                        new_rules.append(tmpRule)
                    # create new tmpRules, set parent Rule, append them to tmpList2
                    if not True in [Orange.classification.rules.rules_equal(ri, tmpRule) for ri in new_rules]:
                        for c in r.filter.conditions:
                            tmpRule2 = tmpRule.clone()
                            tmpRule2.parentRule = tmpRule
                            tmpRule2.filter.conditions.append(c)
                            tmpRule2.filterAndStore(examples, weight_id, r.classifier.default_val)
                            tmpRule2.complexity += 1
                            if tmpRule2.class_distribution.abs < tmpRule.class_distribution.abs:
                                tmpList2.append(tmpRule2)
                tmpList = tmpList2
        return new_rules
1449
1450    def init_pos_args(self, ae, examples, weight_id):
1451        pos_args = RuleList()
1452        # prepare arguments
1453        for p in ae[self.argument_id].value.positive_arguments:
1454            new_arg = Rule(filter=ArgFilter(argument_id=self.argument_id,
1455                                                   filter=self.newFilter_values(p.filter),
1456                                                   arg_example=ae),
1457                                                   complexity=0)
1458            new_arg.valuesFilter = new_arg.filter.filter
1459            pos_args.append(new_arg)
1460
1461
1462        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1463            old_exp = self.rule_finder.evaluator.returnExpectedProb
1464            self.rule_finder.evaluator.returnExpectedProb = False
1465
1466        # argument pruning (all or just unfinished arguments)
1467        # if pruning is chosen, then prune arguments if possible
1468        for p in pos_args:
1469            p.filterAndStore(examples, weight_id, 0)
1470            if not p.learner:
1471                p.learner = DefaultLearner(default_value=ae.getclass())
1472            # pruning on: we check on all conditions and take only best
1473            if self.prune_arguments:
1474                allowed_conditions = [c for c in p.filter.conditions]
1475                pruned_conditions = self.prune_arg_conditions(ae, allowed_conditions, examples, weight_id)
1476                p.baseDist = Orange.statistics.distribution.Distribution(examples.domain.classVar, examples, weight_id)
1477                p.filter.conditions = pruned_conditions
1478                p.learner.setattr("arg_length", 0)
1479
1480            else: # prune only unspecified conditions
1481                spec_conditions = [c for c in p.filter.conditions if not c.unspecialized_condition]
1482                unspec_conditions = [c for c in p.filter.conditions if c.unspecialized_condition]
1483                # let rule cover now all examples filtered by specified conditions
1484                p.filter.conditions = spec_conditions
1485                p.filterAndStore(examples, weight_id, 0)
1486                p.baseDist = p.classDistribution
1487                p.learner.setattr("arg_length", len(p.filter.conditions))
1488                pruned_conditions = self.prune_arg_conditions(ae, unspec_conditions, p.examples, p.weightID)
1489                p.filter.conditions.extend(pruned_conditions)
1490                p.filter.filter.conditions.extend(pruned_conditions)
1491                # if argument does not contain all unspecialized reasons, add those reasons with minimum values
1492                at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.core.ValueFilter_continuous]
1493                for u in unspec_conditions:
1494                    if not (u.position, u.oper) in at_oper_pairs:
1495                        # find minimum value
1496                        if u.oper == Orange.core.ValueFilter_continuous.Greater or u.oper == Orange.core.ValueFilter_continuous.GreaterEqual:
1497                            u.ref = min([float(e[u.position]) - 10. for e in p.examples])
1498                        else:
1499                            u.ref = max([float(e[u.position]) + 10. for e in p.examples])
1500                        p.filter.conditions.append(u)
1501                        p.filter.filter.conditions.append(u)
1502
1503        # set parameters to arguments
1504        for p_i, p in enumerate(pos_args):
1505            p.filterAndStore(examples, weight_id, 0)
1506            p.filter.domain = examples.domain
1507            p.classifier = p.learner(p.examples, p.weightID)
1508            p.requiredConditions = len(p.filter.conditions)
1509            p.learner.setattr("arg_example", ae)
1510            p.complexity = len(p.filter.conditions)
1511
1512        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1513            self.rule_finder.evaluator.returnExpectedProb = old_exp
1514
1515        return pos_args
1516
1517    def newFilter_values(self, filter):
1518        newFilter = Orange.core.Filter_values()
1519        newFilter.conditions = filter.conditions[:]
1520        newFilter.domain = filter.domain
1521        newFilter.negate = filter.negate
1522        newFilter.conjunction = filter.conjunction
1523        return newFilter
1524
1525    def init_neg_args(self, ae, examples, weight_id):
1526        return ae[self.argument_id].value.negative_arguments
1527
1528    def remaining_probability(self, examples):
1529        return self.cover_and_remove.covered_percentage(examples)
1530
1531    def prune_arg_conditions(self, crit_example, allowed_conditions, examples, weight_id):
1532        if not allowed_conditions:
1533            return []
1534        cn2_learner = Orange.classification.rules.CN2UnorderedLearner()
1535        cn2_learner.rule_finder = RuleBeamFinder()
1536        cn2_learner.rule_finder.refiner = SelectorArgConditions(crit_example, allowed_conditions)
1537        cn2_learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(self.rule_finder.evaluator.m)
1538        rule = cn2_learner.rule_finder(examples, weight_id, 0, RuleList())
1539        return rule.filter.conditions
1540
1541ABCN2 = deprecated_members({"beamWidth": "beam_width",
1542                     "ruleFinder": "rule_finder",
1543                     "ruleStopping": "rule_stopping",
1544                     "dataStopping": "data_stopping",
1545                     "coverAndRemove": "cover_and_remove",
1546                     "storeInstances": "store_instances",
1547                     "targetClass": "target_class",
1548                     "baseRules": "base_rules",
1549                     "weightID": "weight_id",
1550                     "argumentID": "argument_id"})(ABCN2)
1551
class CN2EVCUnorderedLearner(ABCN2):
    """
    A learner that induces a set of unordered rules in a similar manner as
    CN2-SD (see Lavrac et al., 2004) and
    :class:`~Orange.classification.rules.CN2SDUnorderedLearner`. This
    implementation uses the EVC (extreme value correction) rule evaluation.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param width: width of the search beam.
    :type width: int

    :param nsampling: number of data samplings used to fit the extreme
        value distributions.
    :type nsampling: int

    :param rule_sig: significance level for accepting a rule.
    :type rule_sig: float

    :param att_sig: significance level for adding a condition to a rule.
    :type att_sig: float

    :param min_coverage: minimal number of instances a rule must cover.
    :type min_coverage: int

    :param max_rule_complexity: maximal number of conditions in a rule.
    :type max_rule_complexity: int
    """
    def __init__(self, width=5, nsampling=100, rule_sig=1.0, att_sig=1.0, \
        min_coverage=1., max_rule_complexity=5.):
        ABCN2.__init__(self, width=width, nsampling=nsampling,
            rule_sig=rule_sig, att_sig=att_sig, min_coverage=int(min_coverage),
            max_rule_complexity=int(max_rule_complexity))

class DefaultLearner(Orange.core.Learner):
    """
    Default learner: returns a classifier that always predicts the given
    default class value.
    """
    def __init__(self, default_value=None):
        self.default_value = default_value

    def __call__(self, examples, weight_id=0):
        return Orange.classification.majority.ConstantClassifier(self.default_value, defaultDistribution=Orange.core.Distribution(examples.domain.class_var, examples, weight_id))

class ABCN2Ordered(ABCN2):
    """
    Rules learned by ABCN2 are ordered and used as a decision list.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.classifier.set_prefix_rules = True
        self.classifier.optimize_betas = False

class ABCN2M(ABCN2):
    """
    Argument-based rule learning with the m-estimate as the evaluation
    function.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.opt_reduction = 0
        self.rule_finder.evaluator.optimismReduction = self.opt_reduction
        self.classifier = CN2UnorderedClassifier

class ABCN2MLRC(ABCN2):
    """
    Argument-based rule learning with the m-estimate as the evaluation
    function; LRC is used as the classification method.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.opt_reduction = 0
        self.rule_finder.evaluator.optimismReduction = self.opt_reduction

class ABCN2_StandardClassification(ABCN2):
    """
    Argument-based rule learning with the original classification technique.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.classifier = CN2UnorderedClassifier

class RuleStopping_Apriori(RuleStoppingCriteria):
    def __init__(self, apriori=None):
        self.apriori = apriori

    def __call__(self, rules, rule, instances, data):
        if not self.apriori:
            return False
        if not type(rule.classifier) == Orange.classification.ConstantClassifier:
            return False
        ruleAcc = rule.class_distribution[rule.classifier.default_val] / rule.class_distribution.abs
        aprioriAcc = self.apriori[rule.classifier.default_val] / self.apriori.abs
        if ruleAcc > aprioriAcc:
            return False
        return True

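The idea behind `RuleStopping_Apriori` can be sketched on plain class counts, outside the Orange API (`stop_rule` and its dictionary arguments are hypothetical names for this illustration): a rule is discarded when the accuracy of its predicted class on the covered instances does not beat the apriori (prior) accuracy of that class.

```python
# Sketch (not the Orange API) of the apriori stopping criterion:
# stop (return True) when the rule's class accuracy on covered
# instances is no better than the prior accuracy of that class.

def stop_rule(rule_dist, target, apriori_dist):
    """rule_dist / apriori_dist map class -> (weighted) counts;
    target is the class the rule predicts."""
    rule_acc = rule_dist[target] / float(sum(rule_dist.values()))
    apriori_acc = apriori_dist[target] / float(sum(apriori_dist.values()))
    return rule_acc <= apriori_acc
```

A rule covering 8 of 10 instances of class "yes" beats a 50:50 prior and is kept; a rule at 40% accuracy is stopped.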
class RuleStopping_SetRules(RuleStoppingCriteria):
    def __init__(self, validator):
        self.rule_stopping = RuleStoppingCriteria_NegativeDistribution()
        self.validator = validator

    def __call__(self, rules, rule, instances, data):
        ru_st = self.rule_stopping(rules, rule, instances, data)
        if not ru_st:
            self.validator.rules.append(rule)
        return bool(ru_st)

class LengthValidator(RuleValidator):
    """ Prune rules with more conditions than self.length. """
    def __init__(self, length=-1):
        self.length = length

    def __call__(self, rule, data, weight_id, target_class, apriori):
        if self.length >= 0:
            return len(rule.filter.conditions) <= self.length
        return True


class NoDuplicatesValidator(RuleValidator):
    def __init__(self, alpha=.05, min_coverage=0, max_rule_length=0, rules=None):
        # avoid a shared mutable default argument
        self.rules = rules if rules is not None else RuleList()
        self.validator = RuleValidator_LRS(alpha=alpha, \
            min_coverage=min_coverage, max_rule_length=max_rule_length)

    def __call__(self, rule, data, weight_id, target_class, apriori):
        if rule_in_set(rule, self.rules):
            return False
        return bool(self.validator(rule, data, weight_id, target_class, apriori))


class RuleClassifier_BestRule(RuleClassifier):
    def __init__(self, rules, instances, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.class_var = instances.domain.class_var
        self.__dict__.update(argkw)
        self.prior = Orange.statistics.distribution.Distribution(
                    instances.domain.class_var, instances)

    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
        retDist = Orange.statistics.distribution.Distribution(instance.domain.class_var)
        bestRule = None
        for r in self.rules:
            if r(instance) and (not bestRule or r.quality > bestRule.quality):
                for v_i, v in enumerate(instance.domain.class_var):
                    retDist[v_i] = r.class_distribution[v_i]
                bestRule = r
        if not bestRule:
            retDist = self.prior
        else:
            bestRule.used += 1
        sumdist = sum(retDist)
        if sumdist > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdist
        else:
            retDist.normalize()
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(), retDist)

    def __str__(self):
        retStr = ""
        for r in self.rules:
            retStr += rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return retStr


class CovererAndRemover_MultWeights(RuleCovererAndRemover):
    """
    Covering and removing of instances using weight multiplication.

    :param mult: weighting multiplication factor
    :type mult: float
    """

    def __init__(self, mult=0.7):
        self.mult = mult

    def __call__(self, rule, instances, weights, target_class):
        if not weights:
            weights = Orange.data.new_meta_id()
            instances.addMetaAttribute(weights, 1.)
            instances.domain.addmeta(weights, Orange.data.variable.\
                Continuous("weights-" + str(weights)), True)
        newWeightsID = Orange.data.new_meta_id()
        instances.addMetaAttribute(newWeightsID, 1.)
        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
            Continuous("weights-" + str(newWeightsID)), True)
        for instance in instances:
            if rule(instance) and instance.getclass() == rule.classifier(\
                instance, Orange.classification.Classifier.GetValue):
                instance[newWeightsID] = instance[weights] * self.mult
            else:
                instance[newWeightsID] = instance[weights]
        return (instances, newWeightsID)


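The multiplicative covering scheme above can be shown on plain weights instead of Orange tables (`mult_cover` is a hypothetical helper for this sketch): every correctly covered instance has its weight multiplied by `mult`, every other instance keeps its weight.

```python
# Sketch of the CN2-SD style multiplicative covering used by
# CovererAndRemover_MultWeights: covered instances are down-weighted
# by the factor mult, the rest keep their previous weight.

def mult_cover(weights, covered, mult=0.7):
    return [w * mult if c else w for w, c in zip(weights, covered)]
```

Repeated application shrinks the influence of instances already explained by earlier rules without ever removing them outright.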
class CovererAndRemover_AddWeights(RuleCovererAndRemover):
    """
    Covering and removing of instances using weight addition.
    """

    def __call__(self, rule, instances, weights, target_class):
        if not weights:
            weights = Orange.data.new_meta_id()
            instances.addMetaAttribute(weights, 1.)
            instances.domain.addmeta(weights, Orange.data.variable.\
                Continuous("weights-" + str(weights)), True)
        try:
            coverage = instances.domain.getmeta("Coverage")
        except Exception:
            coverage = Orange.data.variable.Continuous("Coverage")
            instances.domain.addmeta(Orange.data.new_meta_id(), coverage, True)
            instances.addMetaAttribute(coverage, 0.0)
        newWeightsID = Orange.data.new_meta_id()
        instances.addMetaAttribute(newWeightsID, 1.)
        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
            Continuous("weights-" + str(newWeightsID)), True)
        for instance in instances:
            if rule(instance) and instance.getclass() == rule.classifier(instance, \
                    Orange.classification.Classifier.GetValue):
                try:
                    instance[coverage] += 1.0
                except Exception:
                    instance[coverage] = 1.0
                instance[newWeightsID] = 1.0 / (instance[coverage] + 1)
            else:
                instance[newWeightsID] = instance[weights]
        return (instances, newWeightsID)


class CovererAndRemover_Prob(RuleCovererAndRemover):
    """ This class implements probabilistic covering. """
    def __init__(self, examples, weight_id, target_class, apriori, argument_id):
        self.best_rule = [None] * len(examples)
        self.prob_attribute = Orange.data.new_meta_id()
        self.apriori_prob = apriori[target_class] / apriori.abs
        examples.addMetaAttribute(self.prob_attribute, self.apriori_prob)
        examples.domain.addmeta(self.prob_attribute,
            Orange.data.variable.Continuous("Probs"))
        self.argument_id = argument_id

    def getBestRules(self, current_rules, examples, weight_id):
        best_rules = RuleList()
        for r_i, r in enumerate(self.best_rule):
            if r and not rule_in_set(r, best_rules) and int(examples[r_i].getclass()) == int(r.classifier.default_value):
                if hasattr(r.learner, "arg_example"):
                    setattr(r, "best_example", r.learner.arg_example)
                else:
                    setattr(r, "best_example", examples[r_i])
                best_rules.append(r)
        return best_rules

    def __call__(self, rule, examples, weights, target_class):
        """ If an example has an argument, the rule must be consistent with that argument. """
        example = getattr(rule.learner, "arg_example", None)
        for ei, e in enumerate(examples):
            if e == example:
                e[self.prob_attribute] = 1.0
                self.best_rule[ei] = rule
            elif rule(e) and rule.quality > e[self.prob_attribute]:
                e[self.prob_attribute] = rule.quality + 0.001 # 0.001 is added to avoid numerical errors
                self.best_rule[ei] = rule
        return (examples, weights)

    def filter_covers_example(self, example, filter):
        filter_indices = RuleCoversArguments.filterIndices(filter)
        if filter(example):
            try:
                if example[self.argument_id].value and len(example[self.argument_id].value.positive_arguments) > 0: # example has positive arguments
                    # conditions should cover at least one of the positive arguments
                    one_arg_covered = False
                    for pA in example[self.argument_id].value.positive_arguments:
                        arg_covered = [self.condIn(c, filter_indices) for c in pA.filter.conditions]
                        one_arg_covered = one_arg_covered or len(arg_covered) == sum(arg_covered)
                        if one_arg_covered:
                            break
                    if not one_arg_covered:
                        return False
                if example[self.argument_id].value and len(example[self.argument_id].value.negative_arguments) > 0: # example has negative arguments
                    # conditions should not cover any of the negative arguments
                    for pN in example[self.argument_id].value.negative_arguments:
                        arg_covered = [self.condIn(c, filter_indices) for c in pN.filter.conditions]
                        if len(arg_covered) == sum(arg_covered):
                            return False
            except Exception:
                return True
            return True
        return False

    def condIn(self, cond, filter_indices): # is the condition in the filter?
        condInd = RuleCoversArguments.conditionIndex(cond)
        if operator.or_(condInd, filter_indices[cond.position]) == filter_indices[cond.position]:
            return True
        return False

    def covered_percentage(self, examples):
        p = 0.0
        for ei, e in enumerate(examples):
            p += (e[self.prob_attribute] - self.apriori_prob) / (1.0 - self.apriori_prob)
        return p / len(examples)


@deprecated_keywords({"showDistribution": "show_distribution"})
def rule_to_string(rule, show_distribution=True):
    """
    Return a string presentation of a rule in human-readable format.

    :param rule: rule to pretty-print.
    :type rule: :class:`~Orange.classification.rules.Rule`

    :param show_distribution: determines whether the presentation should also
        contain the distribution of covered instances
    :type show_distribution: bool

    """
    def selectSign(oper):
        if oper == Orange.core.ValueFilter_continuous.Less:
            return "<"
        elif oper == Orange.core.ValueFilter_continuous.LessEqual:
            return "<="
        elif oper == Orange.core.ValueFilter_continuous.Greater:
            return ">"
        elif oper == Orange.core.ValueFilter_continuous.GreaterEqual:
            return ">="
        else:
            return "="

    if not rule:
        return "None"
    conds = rule.filter.conditions
    domain = rule.filter.domain

    ret = "IF "
    if len(conds) == 0:
        ret = ret + "TRUE"

    for i, c in enumerate(conds):
        if i > 0:
            ret += " AND "
        if type(c) == Orange.core.ValueFilter_discrete:
            ret += domain[c.position].name + "=" + str([domain[c.position].\
                values[int(v)] for v in c.values])
        elif type(c) == Orange.core.ValueFilter_continuous:
            ret += domain[c.position].name + selectSign(c.oper) + str(c.ref)
    if rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
            and rule.classifier.default_val:
        ret = ret + " THEN " + domain.class_var.name + "=" + \
        str(rule.classifier.default_value)
        if show_distribution:
            ret += str(rule.class_distribution)
    elif rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
            and type(domain.class_var) == Orange.core.EnumVariable:
        ret = ret + " THEN " + domain.class_var.name + "=" + \
        str(rule.class_distribution.modus())
        if show_distribution:
            ret += str(rule.class_distribution)
    return ret

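The IF ... THEN layout produced by `rule_to_string` can be sketched on plain condition triples instead of Orange filter objects (`format_rule` and its `(name, op, value)` tuples are assumptions of this illustration, not part of the module):

```python
# Sketch of the IF ... THEN formatting: a rule with no conditions
# prints as "IF TRUE", conditions are joined with " AND ".

def format_rule(conditions, class_name, class_value):
    if conditions:
        body = " AND ".join("%s%s%s" % (n, op, v) for n, op, v in conditions)
    else:
        body = "TRUE"
    return "IF %s THEN %s=%s" % (body, class_name, class_value)
```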
def supervisedClassCheck(instances):
    if not instances.domain.class_var:
        raise Exception("Class variable is required!")
    if instances.domain.class_var.varType == Orange.core.VarTypes.Continuous:
        raise Exception("CN2 requires a discrete class!")


def rule_in_set(rule, rules):
    for r in rules:
        if rules_equal(rule, r):
            return True
    return False

def rules_equal(rule1, rule2):
    if not len(rule1.filter.conditions) == len(rule2.filter.conditions):
        return False
    for c1 in rule1.filter.conditions:
        found = False # find the same condition in the other rule
        for c2 in rule2.filter.conditions:
            try:
                if not c1.position == c2.position: continue # same feature?
                if not type(c1) == type(c2): continue # same type of condition?
                if type(c1) == Orange.core.ValueFilter_discrete:
                    if not type(c1.values[0]) == type(c2.values[0]): continue
                    if not c1.values[0] == c2.values[0]: continue # same value?
                if type(c1) == Orange.core.ValueFilter_continuous:
                    if not c1.oper == c2.oper: continue # same operator?
                    if not c1.ref == c2.ref: continue # same threshold?
                found = True
                break
            except Exception:
                pass
        if not found:
            return False
    return True

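The comparison implemented by `rules_equal` is order-independent: two rules match when every condition of one has an equal counterpart in the other. On hashable condition tuples the same idea reduces to a set comparison (`conditions_equal` is a hypothetical helper for this sketch):

```python
# Sketch of the order-independent rule comparison: each condition is
# represented as a hashable tuple, e.g. (position, oper, ref).

def conditions_equal(conds1, conds2):
    return len(conds1) == len(conds2) and set(conds1) == set(conds2)
```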
# Miscellaneous - utility functions
def avg(l):
    if len(l) == 0:
        return 0.
    return sum(l) / len(l)

def var(l):
    if len(l) < 2:
        return 0.
    av = avg(l)
    vars = [math.pow(li - av, 2) for li in l]
    return sum(vars) / (len(l) - 1)

def median(l):
    if len(l) == 0:
        return 0.
    s = sorted(l) # do not mutate the caller's list
    le = len(s)
    if le % 2 == 1:
        return s[(le - 1) // 2]
    else:
        return (s[le // 2 - 1] + s[le // 2]) / 2.

def perc(l, p):
    s = sorted(l) # do not mutate the caller's list
    return s[int(math.floor(p * len(s)))]

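The utilities above are small enough to check by hand; a self-contained copy (rewritten here so it runs without the module's imports) behaves as follows:

```python
# Self-contained copies of the avg/var/median utilities for a quick check.

def avg(l):
    return sum(l) / float(len(l)) if l else 0.

def var(l):
    # unbiased sample variance
    if len(l) < 2:
        return 0.
    av = avg(l)
    return sum((li - av) ** 2 for li in l) / (len(l) - 1)

def median(l):
    if not l:
        return 0.
    s = sorted(l)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0
```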
class EVDFitter:
    """ Randomizes a dataset and fits an extreme value distribution onto it. """

    def __init__(self, learner, n=200, randomseed=100):
        self.learner = learner
        self.n = n
        self.randomseed = randomseed
        # initialize the random seed to make experiments repeatable
        random.seed(self.randomseed)

    def createRandomDataSet(self, data):
        newData = Orange.core.ExampleTable(data)
        # shuffle the class values
        cl_num = newData.toNumpy("C")
        random.shuffle(cl_num[0][:, 0])
        clData = Orange.core.ExampleTable(Orange.core.Domain([newData.domain.classVar]), cl_num[0])
        for d_i, d in enumerate(newData):
            d[newData.domain.classVar] = clData[d_i][newData.domain.classVar]
        return newData

    def createEVDistList(self, evdList):
        l = Orange.core.EVDistList()
        for el in evdList:
            l.append(Orange.core.EVDist(mu=el[0], beta=el[1], percentiles=el[2]))
        return l

    # estimate Fisher-Tippett parameters for the values in the vals list (+ deciles)
    def compParameters(self, vals, oldMi, oldBeta, oldPercs, fixedBeta=False):
        # compute decile averages
        vals.sort()
        N = len(vals)
        percs = [avg(vals[int(float(N) * i / 10):int(float(N) * (i + 1) / 10)]) for i in range(10)]
        if N < 10:
            return oldMi, oldBeta, percs
        if not fixedBeta:
            beta = min(2.0, math.sqrt(6 * var(vals) / math.pow(math.pi, 2)))
        else:
            beta = oldBeta
        return max(oldMi, numpy.average(vals) - beta * 0.5772156649), beta, None

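The estimate in `compParameters` is the method of moments for the Gumbel (Fisher-Tippett) distribution: the scale is obtained from the sample variance via Var = pi^2 beta^2 / 6, and the location from the sample mean via mean = mu + gamma*beta, with gamma ~ 0.5772 the Euler-Mascheroni constant. A standalone sketch (`gumbel_moments` is a hypothetical name, and it omits the capping and `oldMi` bookkeeping of the method above):

```python
import math

# Method-of-moments fit of a Gumbel distribution:
#   beta = sqrt(6 * sample_variance / pi^2)
#   mu   = sample_mean - 0.5772156649 * beta

def gumbel_moments(vals):
    n = len(vals)
    mean = sum(vals) / float(n)
    s2 = sum((v - mean) ** 2 for v in vals) / (n - 1)  # sample variance
    beta = math.sqrt(6.0 * s2 / math.pi ** 2)
    mu = mean - 0.5772156649 * beta
    return mu, beta
```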
    def prepare_learner(self):
        self.oldStopper = self.learner.ruleFinder.ruleStoppingValidator
        self.evaluator = self.learner.ruleFinder.evaluator
        self.refiner = self.learner.ruleFinder.refiner
        self.validator = self.learner.ruleFinder.validator
        self.ruleFilter = self.learner.ruleFinder.ruleFilter
        self.learner.ruleFinder.validator = None
        self.learner.ruleFinder.evaluator = Orange.core.RuleEvaluator_LRS()
        self.learner.ruleFinder.evaluator.storeRules = True
        self.learner.ruleFinder.ruleStoppingValidator = Orange.core.RuleValidator_LRS(alpha=1.0)
        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = 0
        self.learner.ruleFinder.refiner = Orange.core.RuleBeamRefiner_Selector()
        self.learner.ruleFinder.ruleFilter = Orange.core.RuleBeamFilter_Width(width=5)

    def restore_learner(self):
        self.learner.ruleFinder.evaluator = self.evaluator
        self.learner.ruleFinder.ruleStoppingValidator = self.oldStopper
        self.learner.ruleFinder.refiner = self.refiner
        self.learner.ruleFinder.validator = self.validator
        self.learner.ruleFinder.ruleFilter = self.ruleFilter

    def computeEVD(self, data, weightID=0, target_class=0, progress=None):
        # prepare the learner for distribution computation
        self.prepare_learner()

        # loop through n sampling repetitions
        extremeDists = [(0, 1, [])]
        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = self.oldStopper.max_rule_complexity
        maxVals = [[] for l in range(self.oldStopper.max_rule_complexity + 1)]
        for d_i in range(self.n):
            if not progress:
                if self.learner.debug:
                    print d_i,
            else:
                progress(float(d_i) / self.n, None)
            # create a data set with a randomized class
            tempData = self.createRandomDataSet(data)
            self.learner.ruleFinder.evaluator.rules = RuleList()
            for l in range(self.oldStopper.max_rule_complexity + 2):
                self.learner.ruleFinder.evaluator.rules.append(None)
            # learn a rule
            self.learner.ruleFinder(tempData, weightID, target_class, RuleList())
            for l in range(self.oldStopper.max_rule_complexity + 1):
                if self.learner.ruleFinder.evaluator.rules[l]:
                    maxVals[l].append(self.learner.ruleFinder.evaluator.rules[l].quality)
                else:
                    maxVals[l].append(0)

        # a longer rule should always be better than a shorter rule
        for l in range(self.oldStopper.max_rule_complexity):
            for i in range(len(maxVals[l])):
                if maxVals[l + 1][i] < maxVals[l][i]:
                    maxVals[l + 1][i] = maxVals[l][i]

        mu, beta, perc = 1.0, 2.0, [0.0] * 10
        for mi, m in enumerate(maxVals):
            mu, beta, perc = self.compParameters(m, mu, beta, perc, fixedBeta=True)
            extremeDists.append((mu, beta, perc))
            extremeDists.extend([(0, 1, [])] * (mi))
            if self.learner.debug:
                print mi, mu, beta, perc

        self.restore_learner()
        return self.createEVDistList(extremeDists)

class ABBeamFilter(Orange.core.RuleBeamFilter):
    """
    ABBeamFilter filters the beam so that it keeps
        - the first N rules (by quality), and
        - the first N rules whose condition parts contain only argument selectors.
    """
    def __init__(self, width=5):
        self.width = width
        self.pArgs = None

    def __call__(self, rulesStar, examples, weight_id):
        newStar = Orange.core.RuleList()
        rulesStar.sort(lambda x, y:-cmp(x.quality, y.quality))
        argsNum = 0
        for r_i, r in enumerate(rulesStar):
            if r_i < self.width: # the rule is one of the best "width" rules
                newStar.append(r)
            elif self.onlyPositives(r):
                if argsNum < self.width:
                    newStar.append(r)
                    argsNum += 1
        return newStar

    def setArguments(self, domain, positive_arguments):
        self.pArgs = positive_arguments
        self.domain = domain
        self.argTab = [0] * len(self.domain.attributes)
        for arg in self.pArgs:
            for cond in arg.filter.conditions:
                self.argTab[cond.position] = 1

    def onlyPositives(self, rule):
        if not self.pArgs:
            return False

        ruleTab = [0] * len(self.domain.attributes)
        for cond in rule.filter.conditions:
            ruleTab[cond.position] = 1
        return map(operator.or_, ruleTab, self.argTab) == self.argTab


class RuleCoversArguments:
    """
    Determines whether a rule covers at least one argument from a given set.
    """
    def __init__(self, arguments):
        self.arguments = arguments
        self.indices = []
        for a in self.arguments:
            indNA = getattr(a.filter, "indices", None)
            if not indNA:
                a.filter.setattr("indices", RuleCoversArguments.filterIndices(a.filter))
            self.indices.append(a.filter.indices)

    def __call__(self, rule):
        if not self.indices:
            return False
        if not getattr(rule.filter, "indices", None):
            rule.filter.indices = RuleCoversArguments.filterIndices(rule.filter)
        for index in self.indices:
            if map(operator.or_, rule.filter.indices, index) == rule.filter.indices:
                return True
        return False

    def filterIndices(filter):
        if not filter.domain:
            return []
        ind = [0] * len(filter.domain.attributes)
        for c in filter.conditions:
            ind[c.position] = operator.or_(ind[c.position],
                                           RuleCoversArguments.conditionIndex(c))
        return ind
    filterIndices = staticmethod(filterIndices)

    def conditionIndex(c):
        if type(c) == Orange.core.ValueFilter_continuous:
            if (c.oper == Orange.core.ValueFilter_continuous.GreaterEqual or
                c.oper == Orange.core.ValueFilter_continuous.Greater):
                return 5 # 0101
            elif (c.oper == Orange.core.ValueFilter_continuous.LessEqual or
                  c.oper == Orange.core.ValueFilter_continuous.Less):
                return 3 # 0011
            else:
                return c.oper
        else:
            return 1 # 0001
    conditionIndex = staticmethod(conditionIndex)

    def oneSelectorToCover(ruleIndices, argIndices):
        at, type = -1, 0
        for r_i, ind in enumerate(ruleIndices):
            if not argIndices[r_i]:
                continue
            if at > -1 and not ind == argIndices[r_i]: # would need two changes
                return (-1, 0)
            if not ind == argIndices[r_i]:
                if argIndices[r_i] in [1, 3, 5]:
                    at, type = r_i, argIndices[r_i]
                if argIndices[r_i] == 6:
                    if ind == 3:
                        at, type = r_i, 5
                    if ind == 5:
                        at, type = r_i, 3
        return at, type
    oneSelectorToCover = staticmethod(oneSelectorToCover)

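# The covering test above works on per-attribute bitmasks: conditionIndex
# encodes each selector as a small mask (1 = discrete, 3 = 0011 = "<=",
# 5 = 0101 = ">"), and a rule covers an argument when OR-ing the argument's
# masks into the rule's masks leaves the rule unchanged. A minimal standalone
# sketch of that idea (plain lists instead of Orange filters; illustrative only):

```python
def covers(rule_indices, arg_indices):
    # the rule covers the argument iff its selectors are a superset,
    # i.e. OR-ing in the argument's masks changes nothing
    return [r | a for r, a in zip(rule_indices, arg_indices)] == rule_indices

rule = [1, 3, 0]   # discrete selector on attr 0, "<=" selector on attr 1
arg1 = [1, 0, 0]   # argument constrains only attr 0 -> covered
arg2 = [0, 0, 1]   # argument constrains attr 2, which the rule lacks

assert covers(rule, arg1)
assert not covers(rule, arg2)
```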
class SelectorAdder(Orange.core.RuleBeamRefiner):
    """
    A refiner that adds selectors to rules; refined rules are not
    consistent with any of the negative arguments.
    """
    def __init__(self, example=None, not_allowed_selectors=[], argument_id=None,
                 discretizer=Orange.core.EntropyDiscretization(forceAttribute=True)):
        # required values - needed values of attributes
        self.example = example
        self.argument_id = argument_id
        self.not_allowed_selectors = not_allowed_selectors
        self.discretizer = discretizer

    def __call__(self, oldRule, data, weight_id, target_class=-1):
        inNotAllowedSelectors = RuleCoversArguments(self.not_allowed_selectors)
        new_rules = Orange.core.RuleList()

        # get positive indices (selectors already in the rule)
        indices = getattr(oldRule.filter, "indices", None)
        if not indices:
            indices = RuleCoversArguments.filterIndices(oldRule.filter)
            oldRule.filter.setattr("indices", indices)

        # get negative indices (selectors that should not be in the rule)
        negative_indices = [0] * len(data.domain.attributes)
        for nA in self.not_allowed_selectors:
            #print indices, nA.filter.indices
            at_i, type_na = RuleCoversArguments.oneSelectorToCover(indices, nA.filter.indices)
            if at_i > -1:
                negative_indices[at_i] = operator.or_(negative_indices[at_i], type_na)

        # iterate through indices = attributes
        for i, ind in enumerate(indices):
            if not self.example[i] or self.example[i].isSpecial():
                continue
            if ind == 1:
                continue
            if data.domain[i].varType == Orange.core.VarTypes.Discrete and not negative_indices[i] == 1: # DISCRETE attribute
                if self.example:
                    values = [self.example[i]]
                else:
                    values = data.domain[i].values
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_discrete(position=i,
                                                                                  values=[Orange.core.Value(data.domain[i], v)],
                                                                                  acceptSpecial=0))
                    tempRule.complexity += 1
                    tempRule.filter.indices[i] = 1 # 1 stands for a discrete attribute (see RuleCoversArguments.conditionIndex)
                    tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                    if len(tempRule.examples) < len(oldRule.examples):
                        new_rules.append(tempRule)
            elif data.domain[i].varType == Orange.core.VarTypes.Continuous and not negative_indices[i] == 7: # CONTINUOUS attribute
                try:
                    at = data.domain[i]
                    at_d = self.discretizer(at, oldRule.examples)
                except:
                    continue # discretization failed!
                # add conditions only if discretization produced at least two intervals
                if len(at_d.values) > 1:
                    for p in at_d.getValueFrom.transformer.points:
                        # LESS
                        if not negative_indices[i] == 3:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.LessEqual, p, target_class, 3)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] <= p:# and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
                        # GREATER
                        if not negative_indices[i] == 5:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.Greater, p, target_class, 5)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] > p:# and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
        for r in new_rules:
            r.parentRule = oldRule
            r.valuesFilter = r.filter.filter
        return new_rules

    def getTempRule(self, oldRule, pos, oper, ref, target_class, atIndex):
        tempRule = oldRule.clone()

        tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=pos,
                                                                        oper=oper,
                                                                        ref=ref,
                                                                        acceptSpecial=0))
        tempRule.complexity += 1
        tempRule.filter.indices[pos] = operator.or_(tempRule.filter.indices[pos], atIndex) # from RuleCoversArguments.conditionIndex
        tempRule.filterAndStore(oldRule.examples, tempRule.weightID, target_class)
        return tempRule

    def setCondition(self, oldRule, target_class, ci, condition):
        tempRule = oldRule.clone()
        tempRule.filter.conditions[ci] = condition
        tempRule.filter.conditions[ci].setattr("specialized", 1)
        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
        return tempRule

SelectorAdder = deprecated_members({"notAllowedSelectors": "not_allowed_selectors",
                     "argumentID": "argument_id"})(SelectorAdder)
# This filter is the ugliest code ever! Problem is with Orange, I had some problems with inheriting deepCopy
# I should take another look at it.
class ArgFilter(Orange.core.Filter):
    """ This class implements the AB-covering principle. """
    def __init__(self, argument_id=None, filter=Orange.core.Filter_values(), arg_example=None):
        self.filter = filter
        self.indices = getattr(filter, "indices", [])
        if not self.indices and len(filter.conditions) > 0:
            self.indices = RuleCoversArguments.filterIndices(filter)
        self.argument_id = argument_id
        self.domain = self.filter.domain
        self.conditions = filter.conditions
        self.arg_example = arg_example

    def condIn(self, cond): # is the condition in the filter?
        condInd = RuleCoversArguments.conditionIndex(cond)
        if operator.or_(condInd, self.indices[cond.position]) == self.indices[cond.position]:
            return True
        return False

    def __call__(self, example):
##        print "in", self.filter(example)#, self.filter.conditions[0](example)
##        print self.filter.conditions[1].values
        if self.filter(example) and example != self.arg_example:
            return True
        elif self.filter(example):
            try:
                if example[self.argument_id].value and len(example[self.argument_id].value.positiveArguments) > 0: # example has positive arguments
                    # conditions should cover at least one of the positive arguments
                    oneArgCovered = False
                    for pA in example[self.argument_id].value.positiveArguments:
                        argCovered = [self.condIn(c) for c in pA.filter.conditions]
                        oneArgCovered = oneArgCovered or len(argCovered) == sum(argCovered) #argCovered
                        if oneArgCovered:
                            break
                    if not oneArgCovered:
                        return False
                if example[self.argument_id].value and len(example[self.argument_id].value.negativeArguments) > 0: # example has negative arguments
                    # conditions should not fully cover any of the negative arguments
                    for pN in example[self.argument_id].value.negativeArguments:
                        argCovered = [self.condIn(c) for c in pN.filter.conditions]
                        if len(argCovered) == sum(argCovered):
                            return False
            except:
                return True
            return True
        else:
            return False

    def __setattr__(self, name, obj):
        self.__dict__[name] = obj
        self.filter.setattr(name, obj)

    def deep_copy(self):
        newFilter = ArgFilter(argument_id=self.argument_id)
        newFilter.filter = Orange.core.Filter_values() #self.filter.deepCopy()
        newFilter.filter.conditions = self.filter.conditions[:]
        newFilter.domain = self.filter.domain
        newFilter.negate = self.filter.negate
        newFilter.conjunction = self.filter.conjunction
        newFilter.conditions = newFilter.filter.conditions
        newFilter.indices = self.indices[:]
        return newFilter

ArgFilter = deprecated_members({"argumentID": "argument_id"})(ArgFilter)

class SelectorArgConditions(Orange.core.RuleBeamRefiner):
    """
    A refiner that specializes rules using a given set of allowed selectors;
    refined rules are not consistent with any of the negative arguments.
    """
    def __init__(self, example, allowed_selectors):
        # required values - needed values of attributes
        self.example = example
        self.allowed_selectors = allowed_selectors

    def __call__(self, oldRule, data, weight_id, target_class=-1):
        if len(oldRule.filter.conditions) >= len(self.allowed_selectors):
            return Orange.core.RuleList()
        new_rules = Orange.core.RuleList()
        for c in self.allowed_selectors:
            # normal condition
            if not c.unspecialized_condition:
                tempRule = oldRule.clone()
                tempRule.filter.conditions.append(c)
                tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                if len(tempRule.examples) < len(oldRule.examples):
                    new_rules.append(tempRule)
            # unspecialized condition
            else:
                # find all possible example values
                vals = {}
                for e in oldRule.examples:
                    if not e[c.position].isSpecial():
                        vals[str(e[c.position])] = 1
                values = vals.keys()
                # make a condition for each value
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=c.position,
                                                                                    oper=c.oper,
                                                                                    ref=float(v),
                                                                                    acceptSpecial=0))
                    if tempRule(self.example):
                        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                        if len(tempRule.examples) < len(oldRule.examples):
                            new_rules.append(tempRule)
##        print " NEW RULES "
##        for r in new_rules:
##            print Orange.classification.rules.rule_to_string(r)
        for r in new_rules:
            r.parentRule = oldRule
##            print Orange.classification.rules.rule_to_string(r)
        return new_rules

class CrossValidation:
    def __init__(self, folds=5, random_generator=150):
        self.folds = folds
        self.random_generator = random_generator

    def __call__(self, learner, examples, weight):
        res = orngTest.crossValidation([learner], (examples, weight), folds=self.folds, random_generator=self.random_generator)
        return self.get_prob_from_res(res, examples)

    def get_prob_from_res(self, res, examples):
        prob_dist = Orange.core.DistributionList()
        for tex in res.results:
            d = Orange.core.Distribution(examples.domain.class_var)
            for di in range(len(d)):
                d[di] = tex.probabilities[0][di]
            prob_dist.append(d)
        return prob_dist


class PILAR:
    """
    PILAR (Probabilistic improvement of learning algorithms with rules).
    """
    def __init__(self, alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, optimize_betas=True):
        self.alternative_learner = alternative_learner
        self.min_cl_sig = min_cl_sig
        self.min_beta = min_beta
        self.set_prefix_rules = set_prefix_rules
        self.optimize_betas = optimize_betas
        self.selected_evaluation = CrossValidation(folds=5)

    def __call__(self, rules, examples, weight=0):
        rules = self.add_null_rule(rules, examples, weight)
        if self.alternative_learner:
            prob_dist = self.selected_evaluation(self.alternative_learner, examples, weight)
            classifier = self.alternative_learner(examples, weight)
##            prob_dist = Orange.core.DistributionList()
##            for e in examples:
##                prob_dist.append(classifier(e,Orange.core.GetProbabilities))
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist)
        else:
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas)

        for ri, r in enumerate(cl.rules):
            cl.rules[ri].setattr("beta", cl.ruleBetas[ri])
##            if cl.ruleBetas[ri] > 0:
##                print Orange.classification.rules.rule_to_string(r), r.quality, cl.ruleBetas[ri]
        cl.all_rules = cl.rules
        cl.rules = self.sort_rules(cl.rules)
        cl.ruleBetas = [r.beta for r in cl.rules]
        cl.setattr("data", examples)
        return cl

    def add_null_rule(self, rules, examples, weight):
        for cl in examples.domain.class_var:
            tmpRle = Orange.core.Rule()
            tmpRle.filter = Orange.core.Filter_values(domain=examples.domain)
            tmpRle.parentRule = None
            tmpRle.filterAndStore(examples, weight, int(cl))
            tmpRle.quality = tmpRle.class_distribution[int(cl)] / tmpRle.class_distribution.abs
            rules.append(tmpRle)
        return rules

    def sort_rules(self, rules):
        new_rules = Orange.core.RuleList()
        foundRule = True
        while foundRule:
            foundRule = False
            bestRule = None
            for r in rules:
                if r in new_rules:
                    continue
                if r.beta < 0.01 and r.beta > -0.01:
                    continue
                if not bestRule:
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) < len(bestRule.filter.conditions):
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) == len(bestRule.filter.conditions) and r.beta > bestRule.beta:
                    bestRule = r
                    foundRule = True
                    continue
            if bestRule:
                new_rules.append(bestRule)
        return new_rules

PILAR = deprecated_members({"sortRules": "sort_rules"})(PILAR)

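# PILAR.sort_rules above drops rules with near-zero beta and then repeatedly
# picks the rule with the fewest conditions, breaking ties by larger beta.
# A standalone sketch of that ordering, with rules mocked as
# (n_conditions, beta) tuples (names here are illustrative only):

```python
def sort_rules_sketch(rules, eps=0.01):
    # drop rules whose beta is within (-eps, eps), then order by
    # fewer conditions first, larger beta second
    kept = [r for r in rules if abs(r[1]) >= eps]
    return sorted(kept, key=lambda r: (r[0], -r[1]))

rules = [(2, 0.5), (1, 0.005), (1, 0.3), (2, 0.9)]
ordered = sort_rules_sketch(rules)  # (1, 0.005) is dropped; shortest rule first
```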

class RuleClassifier_bestRule(Orange.core.RuleClassifier):
    """
    A very simple classifier: it takes the best rule for each class and
    normalizes the probabilities.
    """
    def __init__(self, rules, examples, weight_id=0, **argkw):
        self.rules = rules
        self.examples = examples
        self.apriori = Orange.core.Distribution(examples.domain.class_var, examples, weight_id)
        self.apriori_prob = [a / self.apriori.abs for a in self.apriori]
        self.weight_id = weight_id
        self.__dict__.update(argkw)
        self.default_class_index = -1

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, example, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        example = Orange.core.Example(self.examples.domain, example)
        tempDist = Orange.core.Distribution(example.domain.class_var)
        best_rules = [None] * len(example.domain.class_var.values)

        for r in self.rules:
            if r(example) and not self.default_class_index == int(r.classifier.default_val) and \
               (not best_rules[int(r.classifier.default_val)] or r.quality > tempDist[r.classifier.default_val]):
                tempDist[r.classifier.default_val] = r.quality
                best_rules[int(r.classifier.default_val)] = r
        for b in best_rules:
            if b:
                used = getattr(b, "used", 0.0)
                b.setattr("used", used + 1)
        nonCovPriorSum = sum([tempDist[i] == 0. and self.apriori_prob[i] or 0. for i in range(len(self.apriori_prob))])
        if tempDist.abs < 1.:
            residue = 1. - tempDist.abs
            for a_i, a in enumerate(self.apriori_prob):
                if tempDist[a_i] == 0.:
                    tempDist[a_i] = self.apriori_prob[a_i] * residue / nonCovPriorSum
            final_dist = tempDist #Orange.core.Distribution(example.domain.class_var)
        else:
            tempDist.normalize() # prior probability
            tmp_examples = Orange.core.ExampleTable(self.examples)
            for r in best_rules:
                if r:
                    tmp_examples = r.filter(tmp_examples)
            tmpDist = Orange.core.Distribution(tmp_examples.domain.class_var, tmp_examples, self.weight_id)
            tmpDist.normalize()
            probs = [0.] * len(self.examples.domain.class_var.values)
            for i in range(len(self.examples.domain.class_var.values)):
                probs[i] = tmpDist[i] + tempDist[i] * 2
            final_dist = Orange.core.Distribution(self.examples.domain.class_var)
            for cl_i, cl in enumerate(self.examples.domain.class_var):
                final_dist[cl] = probs[cl_i]
            final_dist.normalize()

        if ret_rules: # should the rules be returned along with the classification?
            if result_type == Orange.classification.Classifier.GetValue:
                return (final_dist.modus(), best_rules)
            if result_type == Orange.core.GetProbabilities:
                return (final_dist, best_rules)
            return (final_dist.modus(), final_dist, best_rules)
        if result_type == Orange.classification.Classifier.GetValue:
            return final_dist.modus()
        if result_type == Orange.core.GetProbabilities:
            return final_dist
        return (final_dist.modus(), final_dist)

RuleClassifier_bestRule = deprecated_members({"defaultClassIndex": "default_class_index"})(RuleClassifier_bestRule)
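# When the best rules' qualities sum to less than 1, RuleClassifier_bestRule
# distributes the remaining probability mass among the non-covered classes in
# proportion to their prior probabilities. A standalone sketch of just that
# completion step (plain lists instead of Orange distributions; illustrative only):

```python
def complete_distribution(rule_probs, priors):
    # rule_probs: best-rule quality per class (0.0 for non-covered classes)
    # priors: apriori class probabilities; both assumed to index the same classes
    residue = 1.0 - sum(rule_probs)
    noncov_prior = sum(p for rp, p in zip(rule_probs, priors) if rp == 0.0)
    return [rp if rp > 0.0 else p * residue / noncov_prior
            for rp, p in zip(rule_probs, priors)]

dist = complete_distribution([0.6, 0.0, 0.0], [0.5, 0.3, 0.2])
# the residual mass of 0.4 is split 0.3 : 0.2 between the two uncovered classes
```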