source: orange/Orange/classification/rules.py @ 9701:485a2ee92454

Revision 9701:485a2ee92454, 108.6 KB, checked in by anze <anze.staric@…>, 2 years ago

Removed orngABML import.

"""

.. index:: rule induction

.. index::
   single: classification; rule induction

**************************
Rule induction (``rules``)
**************************

This module implements supervised rule induction algorithms
and rule-based classification methods, specifically the
`CN2 induction algorithm <http://www.springerlink.com/content/k6q2v76736w5039r/>`_
in multiple variants, including an argument-based learning one.
The implementation is modular, based on the rule induction
framework described below, which provides the opportunity to change,
specialize and improve the algorithm.

CN2 algorithm
=============

.. index::
   single: classification; CN2

Several variants of the well-known CN2 rule learning algorithm are implemented.
All are implemented by wrapping the
:class:`~Orange.classification.rules.RuleLearner` class. Each CN2 learner class
in this module changes some of RuleLearner's replaceable components to reflect
the required behavior.

Usage is consistent with typical learner usage in Orange:

:download:`rules-cn2.py <code/rules-cn2.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-cn2.py
    :lines: 7-

The result::

    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
    IF status=['second'] THEN survived=yes<13.000, 80.000>
    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF status=['crew'] THEN survived=no<670.000, 192.000>
    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF sex=['male'] THEN survived=no<118.000, 57.000>
    IF age=['child'] THEN survived=no<17.000, 14.000>
    IF TRUE THEN survived=no<89.000, 76.000>

.. autoclass:: Orange.classification.rules.CN2Learner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2Classifier
   :members:
   :show-inheritance:
   :exclude-members: beamWidth, resultType

.. index:: unordered CN2

.. index::
   single: classification; unordered CN2

.. autoclass:: Orange.classification.rules.CN2UnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier
   :members:
   :show-inheritance:

.. index:: CN2-SD
.. index:: subgroup discovery

.. index::
   single: classification; CN2-SD

.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner
   :members:
   :show-inheritance:

References
----------

* Clark, Niblett. `The CN2 Induction Algorithm
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine
  Learning 3(4):261--284, 1989.
* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In
  Machine Learning - EWSL-91. Proceedings of the European Working Session on
  Learning, pp 151--163, Porto, Portugal, March 1991.
* Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD
  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal
  of Machine Learning Research 5: 153--188, 2004.


Argument based CN2
==================

Orange also supports argument-based CN2 learning.

.. autoclass:: Orange.classification.rules.ABCN2
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID,
      argument_id

   This class has many more undocumented methods; see the source code for
   reference.

.. autoclass:: Orange.classification.rules.ABCN2Ordered
   :members:
   :show-inheritance:

.. autoclass:: Orange.classification.rules.ABCN2M
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

This module has many more undocumented classes related to argument-based
learning; see the source code for reference.

References
----------

* Bratko, Mozina, Zabkar. `Argument-Based Machine Learning
  <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in
  Computer Science: vol. 4203/2006, 11-17, 2006.


Rule induction framework
========================

A general framework of classes supports the described CN2 implementation and
can be fine-tuned to specific needs by replacing individual components. A
simple example follows; the detailed architecture is described in the class
documentation below:

part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-customized.py
    :lines: 7-17

In this example, the rule evaluation function is set to the m-estimate of
probability with m=50. The result is::

    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
    IF status=['second'] THEN survived=yes<13.000, 104.000>
    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
    IF status=['crew'] THEN survived=yes<3.000, 20.000>
    IF sex=['female'] THEN survived=no<106.000, 90.000>
    IF TRUE THEN survived=yes<0.000, 5.000>

Notice that it is first necessary to set the :obj:`rule_finder` component,
because the default components are not constructed when the learner is
constructed, but only when it is run on data. At that time, the algorithm
checks which components are necessary and sets the defaults. Similarly, when
the learner finishes, it destructs all *default* components. Continuing with
our example, assume that we wish to set a different validation function and a
different beam width. This is simply written as:

part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)

.. literalinclude:: code/rules-customized.py
    :lines: 19-23
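The :class:`~Orange.classification.rules.RuleValidator_LRS` validator set in this example is based on the likelihood ratio statistic. As a rough, self-contained illustration of that statistic (plain Python, independent of Orange; the counts below are invented):

```python
import math

def lrs(covered, prior):
    """Likelihood ratio statistic: compares the class distribution of
    instances covered by a rule with the distribution expected under the
    prior. Larger values indicate a more significant rule."""
    n, total = sum(covered), sum(prior)
    stat = 0.0
    for obs, pri in zip(covered, prior):
        if obs == 0:
            continue
        expected = n * pri / total  # expected count under the prior
        stat += obs * math.log(obs / expected)
    return 2 * stat

# A rule covering 40 positive and 5 negative instances, with a 500/500 prior,
# yields a statistic of about 31, well past common chi-square cutoffs:
print(round(lrs([40, 5], [500, 500]), 2))
```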

.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)

   Representation of a single induced rule.

   The parameters that can be passed to the constructor correspond to the
   first seven attributes. All attributes are:

   .. attribute:: filter

      contents of the rule; this is the basis of the Rule class. Must be of
      type :class:`Orange.core.Filter`; an instance of
      :class:`Orange.core.Filter_values` is set as a default.

   .. attribute:: classifier

      each rule can be used as a classifier, like any other Orange
      classifier. Must be of type :class:`Orange.classification.Classifier`.
      By default, an instance of :class:`Orange.classification.ConstantClassifier` is used.

   .. attribute:: learner

      learner to be used for making a classifier. Must be of type
      :class:`Orange.classification.Learner`. By default,
      :class:`Orange.classification.majority.MajorityLearner` is used.

   .. attribute:: class_distribution

      distribution of the class in data instances covered by this rule
      (:class:`Orange.statistics.distribution.Distribution`).

   .. attribute:: examples

      data instances covered by this rule (:class:`Orange.data.Table`).

   .. attribute:: weight_id

      ID of the weight meta-attribute for the stored data instances (int).

   .. attribute:: quality

      quality of the rule. Rules with higher quality are better (float).

   .. attribute:: complexity

      complexity of the rule (float). Complexity is used for
      selecting between rules with equal quality, where rules with lower
      complexity are preferred. Typically, complexity corresponds to the
      number of selectors in the rule (more precisely, to the number of
      conditions in the filter), but any other measure can be applied.

   .. method:: filter_and_store(instances, weight_id=0, target_class=-1)

      Filter the passed data instances and store them in the attribute
      :obj:`examples`. Also compute :obj:`class_distribution`, set the weight
      of stored examples and create a new classifier using the :obj:`learner`
      attribute.

      :param weight_id: ID of the weight meta-attribute.
      :type weight_id: int
      :param target_class: index of the target class; -1 for all.
      :type target_class: int

   Objects of this class can be invoked:

   .. method:: __call__(instance, instances, weight_id=0, target_class=-1)

      There are two ways of invoking this method. One way is to pass only a
      data instance; the Rule object then returns True if the given instance
      is covered by the rule's filter.

      :param instance: data instance.
      :type instance: :class:`Orange.data.Instance`

      The other way of invocation is to pass a table of data instances,
      in which case a table of the instances covered by this rule is returned.

      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`
      :param ref: TODO
      :type ref: bool
      :param negate: if set to True, the result is inverted: the resulting
          table contains instances *not* covered by the rule.
      :type negate: bool
.. py:class:: Orange.classification.rules.RuleLearner(store_instances = true, target_class = -1, base_rules = Orange.classification.rules.RuleList())

   Bases: :class:`Orange.classification.Learner`

   A base rule induction learner. The algorithm follows a separate-and-conquer
   strategy, which has its origins in the AQ family of algorithms
   (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence
   Review 13, 3-54, 1999). Basically, such algorithms search for the "best"
   possible rule in the learning instances, remove the covered data from the
   learning instances (separate) and repeat the process (conquer) on the
   remaining instances.

   The class's functionality is best explained by its __call__
   function:

   .. parsed-literal::

      def \_\_call\_\_(self, instances, weight_id=0):
          rule_list = Orange.classification.rules.RuleList()
          all_instances = Orange.data.Table(instances)
          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class):
              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class,
                                        self.base_rules)
              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id):
                  break
              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances,
                                                      weight_id, self.target_class)
              rule_list.append(new_rule)
          return Orange.classification.rules.RuleClassifier_FirstRule(
              rules=rule_list, instances=all_instances)

   The four customizable components here are the invoked :obj:`data_stopping`,
   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping`
   objects. By default, the components of the original CN2 algorithm are used,
   but this can be changed by modifying those attributes:

   .. attribute:: data_stopping

      an object of class
      :class:`~Orange.classification.rules.RuleDataStoppingCriteria`
      that determines whether there will be any benefit from further learning
      (i.e. whether there is enough data to continue learning). The default
      implementation
      (:class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives`)
      returns True if there are no more instances of the given class.

   .. attribute:: rule_stopping

      an object of class
      :class:`~Orange.classification.rules.RuleStoppingCriteria`
      that decides, based on the last learned rule, whether it is worthwhile
      to use the rule and learn more rules. By default, no rule stopping
      criterion is used (:obj:`rule_stopping` == :obj:`None`), thus accepting
      all rules.

   .. attribute:: cover_and_remove

      an object of
      :class:`RuleCovererAndRemover` that removes
      instances covered by the rule and returns the remaining instances. The
      default implementation
      (:class:`RuleCovererAndRemover_Default`)
      only removes the instances that belong to the given target class, unless
      no target class is given (i.e. :obj:`target_class` == -1).

   .. attribute:: rule_finder

      an object of class
      :class:`~Orange.classification.rules.RuleFinder` that learns a single
      rule from instances. The default implementation is
      :class:`~Orange.classification.rules.RuleBeamFinder`.

   :param store_instances: if set to True, the rules will have data instances
       stored.
   :type store_instances: bool

   :param target_class: index of a specific class being learned; -1 for all.
   :type target_class: int

   :param base_rules: Rules that we would like to use in :obj:`rule_finder` to
       constrain the learning space. If not set, it will be set to a set
       containing only an empty rule.
   :type base_rules: :class:`~Orange.classification.rules.RuleList`
Rule finders
------------

.. class:: Orange.classification.rules.RuleFinder

   Base class for all rule finders. These are used to learn a single rule from
   instances.

   Rule finders are invoked in the following manner:

   .. method:: __call__(table, weight_id, target_class, base_rules)

      Return a new rule, induced from instances in the given table.

      :param table: data instances to learn from.
      :type table: :class:`Orange.data.Table`

      :param weight_id: ID of the weight meta-attribute for the stored data
          instances.
      :type weight_id: int

      :param target_class: index of a specific class being learned; -1 for all.
      :type target_class: int

      :param base_rules: Rules that we would like to use in :obj:`rule_finder`
          to constrain the learning space. If not set, it will be set to a set
          containing only an empty rule.
      :type base_rules: :class:`~Orange.classification.rules.RuleList`

.. class:: Orange.classification.rules.RuleBeamFinder

   Bases: :class:`~Orange.classification.rules.RuleFinder`

   Beam search for the best rule. This is the default class used in RuleLearner
   to find the best rule. Pseudo-code of the algorithm is shown here:

   .. parsed-literal::

      def \_\_call\_\_(self, table, weight_id, target_class, base_rules):
          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior)
          \# compute quality of rules in rules_star and best_rule
          ...
          while len(rules_star) \> 0:
              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id)
              for cand in candidates:
                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class)
                  for new_rule in new_rules:
                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution):
                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior)
                          rules_star.append(new_rule)
                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and
                              new_rule.quality \> best_rule.quality:
                              best_rule = new_rule
              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id)
          return best_rule

   The components in bold in the pseudo-code are exchangeable and exposed as
   attributes. These are:

   .. attribute:: initializer

      an object of class
      :class:`~Orange.classification.rules.RuleBeamInitializer`
      used to initialize :obj:`rules_star` and to select the
      initial best rule. By default
      (:class:`~Orange.classification.rules.RuleBeamInitializer_Default`),
      :obj:`base_rules` are returned as the starting :obj:`rules_star` and the
      best rule from :obj:`base_rules` is set as :obj:`best_rule`. If
      :obj:`base_rules` are not set, this class will return :obj:`rules_star`
      with a rule that covers all instances (has no selectors), and this rule
      will also be used as :obj:`best_rule`.

   .. attribute:: candidate_selector

      an object of class
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector`
      used to separate a subset from the current
      :obj:`rules_star` and return it. These rules will be used in the next
      specification step. The default component (an instance of
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`)
      takes all rules in :obj:`rules_star`.

   .. attribute:: refiner

      an object of class
      :class:`~Orange.classification.rules.RuleBeamRefiner`
      used to refine a given rule. A new rule should cover a
      strict subset of the examples covered by the given rule. The default
      component (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`)
      adds a conjunctive selector to the selectors already present in the rule.

   .. attribute:: rule_filter

      an object of class
      :class:`~Orange.classification.rules.RuleBeamFilter`
      used to filter rules, keeping the beam relatively small
      in order to limit search complexity. By default, it keeps the five best
      rules: :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ *(m=5)*\ .

   .. method:: __call__(data, weight_id, target_class, base_rules)

   Determines the next best rule to cover the remaining data instances.

   :param data: data instances.
   :type data: :class:`Orange.data.Table`

   :param weight_id: index of the weight meta-attribute.
   :type weight_id: int

   :param target_class: index of the target class.
   :type target_class: int

   :param base_rules: existing rules.
   :type base_rules: :class:`~Orange.classification.rules.RuleList`

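The control flow of the beam search, with its exchangeable components, can be sketched in isolation (plain Python; rules are modelled as strings of selector characters and rule quality as rule length, purely to keep the example short; none of this is the Orange API):

```python
BEAM_WIDTH = 5

def refiner(rule, selectors):
    # add one yet-unused selector, as RuleBeamRefiner_Selector adds a condition
    return [rule + s for s in selectors if s not in rule]

def beam_search(selectors, max_len=3):
    rules_star, best = [""], ""     # initializer: start from the empty rule
    quality = len                   # stand-in evaluator
    while rules_star:
        candidates = rules_star     # take-all candidate selector
        rules_star = []
        for cand in candidates:
            for new_rule in refiner(cand, selectors):
                if len(new_rule) <= max_len:      # stand-in validator
                    rules_star.append(new_rule)
                    if quality(new_rule) > quality(best):
                        best = new_rule
        # rule_filter: keep only the BEAM_WIDTH best rules in the beam
        rules_star = sorted(rules_star, key=quality, reverse=True)[:BEAM_WIDTH]
    return best

print(beam_search("abc"))
```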
Rule evaluators
---------------

.. class:: Orange.classification.rules.RuleEvaluator

   Base class for rule evaluators that evaluate the quality of a rule based
   on the covered data instances. All evaluators can be invoked in the
   following manner:

   .. method:: __call__(rule, instances, weight_id, target_class, prior)

      Calculates a non-negative rule quality.

      :param rule: rule to evaluate.
      :type rule: :class:`~Orange.classification.rules.Rule`

      :param instances: a table of instances, covered by the rule.
      :type instances: :class:`Orange.data.Table`

      :param weight_id: index of the weight meta-attribute.
      :type weight_id: int

      :param target_class: index of the target class of this rule.
      :type target_class: int

      :param prior: prior class distribution.
      :type prior: :class:`Orange.statistics.distribution.Distribution`

.. autoclass:: Orange.classification.rules.LaplaceEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID

.. autoclass:: Orange.classification.rules.WRACCEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID

.. class:: Orange.classification.rules.RuleEvaluator_Entropy

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_LRS

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_Laplace

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_mEVC

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

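The Laplace and m-estimate evaluation functions listed above reduce to short formulas. A self-contained numeric sketch (the counts are invented; ``n_c`` is the number of covered instances of the target class, ``n`` the number of all covered instances):

```python
def laplace(n_c, n, k=2):
    """Laplace estimate of rule accuracy: (n_c + 1) / (n + k),
    where k is the number of classes."""
    return (n_c + 1.0) / (n + k)

def m_estimate(n_c, n, p0, m=2):
    """m-estimate of probability: (n_c + m * p0) / (n + m),
    where p0 is the prior probability of the target class."""
    return (n_c + m * p0) / (n + m)

# A rule covering 12 target and 1 other instance:
print(round(laplace(12, 13), 3))
# With a strong prior pull (m=50), the estimate shrinks toward p0=0.5:
print(round(m_estimate(12, 13, 0.5, m=50), 3))
```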
Instance covering and removal
-----------------------------

.. class:: RuleCovererAndRemover

   Base class for rule coverers and removers that, when invoked, remove
   instances covered by the rule and return the remaining instances.

   .. method:: __call__(rule, instances, weights, target_class)

      Remove the instances covered by the rule and return the remaining
      instances.

      :param rule: the rule whose covered instances are removed.
      :type rule: :class:`~Orange.classification.rules.Rule`

      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`

      :param weights: index of the weight meta-attribute.
      :type weights: int

      :param target_class: index of the target class of this rule.
      :type target_class: int

.. autoclass:: CovererAndRemover_MultWeights

.. autoclass:: CovererAndRemover_AddWeights

Miscellaneous functions
-----------------------

.. automethod:: Orange.classification.rules.rule_to_string

..
    Undocumented are:
    Data-based Stopping Criteria
    ----------------------------
    Rule-based Stopping Criteria
    ----------------------------

"""

import random
import math
import operator
import numpy

import Orange
import Orange.core
from Orange.core import \
    AssociationClassifier, \
    AssociationLearner, \
    RuleClassifier, \
    RuleClassifier_firstRule, \
    RuleClassifier_logit, \
    RuleLearner, \
    Rule, \
    RuleBeamCandidateSelector, \
    RuleBeamCandidateSelector_TakeAll, \
    RuleBeamFilter, \
    RuleBeamFilter_Width, \
    RuleBeamInitializer, \
    RuleBeamInitializer_Default, \
    RuleBeamRefiner, \
    RuleBeamRefiner_Selector, \
    RuleClassifierConstructor, \
    RuleCovererAndRemover, \
    RuleCovererAndRemover_Default, \
    RuleDataStoppingCriteria, \
    RuleDataStoppingCriteria_NoPositives, \
    RuleEvaluator, \
    RuleEvaluator_Entropy, \
    RuleEvaluator_LRS, \
    RuleEvaluator_Laplace, \
    RuleEvaluator_mEVC, \
    RuleFinder, \
    RuleBeamFinder, \
    RuleList, \
    RuleStoppingCriteria, \
    RuleStoppingCriteria_NegativeDistribution, \
    RuleValidator, \
    RuleValidator_LRS
from Orange.misc import deprecated_keywords
from Orange.misc import deprecated_members


class LaplaceEvaluator(RuleEvaluator):
    """
    Laplace's rule of succession.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class>-1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class>-1:
            return (rule.class_distribution[target_class]+1)/(sumDist+2)
        else:
            return (max(rule.class_distribution)+1)/(sumDist+len(data.domain.class_var.values))

LaplaceEvaluator = deprecated_members({"weightID": "weight_id",
                                       "targetClass": "target_class"})(LaplaceEvaluator)


class WRACCEvaluator(RuleEvaluator):
    """
    Weighted relative accuracy.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class>-1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class>-1:
            pRule = rule.class_distribution[target_class]/apriori[target_class]
            pTruePositive = rule.class_distribution[target_class]/sumDist
            pClass = apriori[target_class]/apriori.cases
        else:
            pRule = sumDist/apriori.cases
            pTruePositive = max(rule.class_distribution)/sumDist
            pClass = apriori[rule.class_distribution.modus()]/sum(apriori)
        if pTruePositive>pClass:
            return pRule*(pTruePositive-pClass)
        else: return (pTruePositive-pClass)/max(pRule,1e-6)

WRACCEvaluator = deprecated_members({"weightID": "weight_id",
                                     "targetClass": "target_class"})(WRACCEvaluator)


class MEstimateEvaluator(RuleEvaluator):
    """
    Rule evaluator using the m-estimate of probability evaluation function.

    :param m: m-value for the m-estimate
    :type m: int

    """
    def __init__(self, m=2):
        self.m = m
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.abs
        if self.m == 0 and not sumDist:
            return 0.
        # get distribution
        if target_class>-1:
            p = rule.class_distribution[target_class]+self.m*apriori[target_class]/apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        else:
            p = max(rule.class_distribution)+self.m*apriori[rule.\
                class_distribution.modus()]/apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        return p

MEstimateEvaluator = deprecated_members({"weightID": "weight_id",
                                         "targetClass": "target_class"})(MEstimateEvaluator)


class CN2Learner(RuleLearner):
    """
    Classical CN2 (see Clark and Niblett, 1989) induces a set of ordered
    rules, which means that the classifier must try these rules in the same
    order as they were learned.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, entropy is used as a measure.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
    :param beam_width: width of the search beam.
    :type beam_width: int
    :param alpha: significance level of the likelihood ratio statistics to
        determine whether the rule is better than the default rule.
    :type alpha: float

    """

    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = RuleLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator = RuleEvaluator_Entropy(), beam_width = 5,
        alpha = 1.0, **kwds):
        self.__dict__.update(kwds)
        self.rule_finder = RuleBeamFinder()
        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width = beam_width)
        self.rule_finder.evaluator = evaluator
        self.rule_finder.validator = RuleValidator_LRS(alpha = alpha)

    def __call__(self, instances, weight=0):
        supervisedClassCheck(instances)

        cl = RuleLearner.__call__(self,instances,weight)
        rules = cl.rules
        return CN2Classifier(rules, instances, weight)

CN2Learner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2Learner)
742
743
744class CN2Classifier(RuleClassifier):
745    """
746    Classical CN2 (see Clark and Niblett; 1988) classifies a new instance
747    using an ordered set of rules. Usually the learner
748    (:class:`~Orange.classification.rules.CN2Learner`) is used to construct the
749    classifier.
750   
751    :param rules: learned rules to be used for classification (mandatory).
752    :type rules: :class:`~Orange.classification.rules.RuleList`
753   
754    :param instances: data instances that were used for learning.
755    :type instances: :class:`Orange.data.Table`
756   
757    :param weight_id: ID of the weight meta-attribute.
758    :type weight_id: int
759
760    """

    @deprecated_keywords({"examples": "instances"})
    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.weight_id = weight_id
        self.class_var = None if instances is None else instances.domain.class_var
        self.__dict__.update(argkw)
        if instances is not None:
            self.prior = Orange.statistics.distribution.Distribution(
                instances.domain.class_var, instances)

    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
        """
        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`

        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
        """
        classifier = None
        for r in self.rules:
            if r(instance) and r.classifier:
                classifier = r.classifier
                # copy the distribution, so that normalize() below cannot
                # modify the one stored in the rule
                classifier.default_distribution = Orange.statistics.distribution.Discrete(
                    r.class_distribution)
                break
        if not classifier:
            classifier = Orange.classification.ConstantClassifier(
                instance.domain.class_var, self.prior.modus())
            classifier.default_distribution = Orange.statistics.distribution.Discrete(
                self.prior)

        classifier.default_distribution.normalize()
        if result_type == Orange.classification.Classifier.GetValue:
            return classifier(instance)
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return classifier.default_distribution
        return (classifier(instance), classifier.default_distribution)

    def __str__(self):
        ret_str = rule_to_string(self.rules[0]) + " " + \
            str(self.rules[0].class_distribution) + "\n"
        for r in self.rules[1:]:
            ret_str += "ELSE " + rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return ret_str

CN2Classifier = deprecated_members({"resultType": "result_type",
                                    "beamWidth": "beam_width"})(CN2Classifier)


class CN2UnorderedLearner(RuleLearner):
    """
    Unordered CN2 (see Clark and Boswell; 1991) induces a set of unordered
    rules; classification with such rules does not assume any ordering among
    them. Learning is much like in classical CN2, except that rules are
    induced separately for each class in turn.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, Laplace's rule of succession is used as a measure.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
    :param beam_width: width of the search beam.
    :type beam_width: int
    :param alpha: significance level of the likelihood ratio statistics to
        determine whether a rule is better than the default rule.
    :type alpha: float
    """
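Laplace's rule of succession, the default quality measure here, has a simple closed form: with ``n_target`` covered instances of the target class among ``n_covered`` covered instances and ``n_classes`` classes, the estimated accuracy is ``(n_target + 1) / (n_covered + n_classes)``. A minimal sketch, with illustrative names rather than the :class:`RuleEvaluator_Laplace` API:

```python
# Laplace's rule of succession, the default rule-quality measure of
# CN2UnorderedLearner: (covered target instances + 1) /
# (covered instances + number of classes).

def laplace_quality(n_target, n_covered, n_classes):
    return (n_target + 1.0) / (n_covered + n_classes)
```

An empty rule thus scores 1/k for k classes, and even a pure rule approaches 1 only as its coverage grows, which penalizes overly specific rules.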
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = RuleLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=RuleEvaluator_Laplace(), beam_width=5,
                 alpha=1.0, **kwds):
        self.__dict__.update(kwds)
        self.rule_finder = RuleBeamFinder()
        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width)
        self.rule_finder.evaluator = evaluator
        self.rule_finder.validator = RuleValidator_LRS(alpha=alpha)
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0)
        self.rule_stopping = RuleStopping_Apriori()
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()

    @deprecated_keywords({"weight": "weight_id"})
    def __call__(self, instances, weight_id=0):
        supervisedClassCheck(instances)

        rules = RuleList()
        self.rule_stopping.apriori = Orange.statistics.distribution.Distribution(
            instances.domain.class_var, instances)
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                instances.domain.class_var, instances, weight_id)
            distrib.normalize()
        for target_class in instances.domain.class_var:
            if progress:
                progress.start = progress.end
                progress.end += distrib[target_class]
            self.target_class = target_class
            cl = RuleLearner.__call__(self, instances, weight_id)
            for r in cl.rules:
                rules.append(r)
        if progress:
            progress(1.0, None)
        return CN2UnorderedClassifier(rules, instances, weight_id)

CN2UnorderedLearner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2UnorderedLearner)


class CN2UnorderedClassifier(RuleClassifier):
    """
    CN2 unordered (see Clark and Boswell; 1991) classifies a new instance using
    a set of unordered rules. Usually the learner
    (:class:`~Orange.classification.rules.CN2UnorderedLearner`) is used to
    construct the classifier.

    :param rules: learned rules to be used for classification (mandatory).
    :type rules: :class:`~Orange.classification.rules.RuleList`

    :param instances: data instances that were used for learning.
    :type instances: :class:`Orange.data.Table`

    :param weight_id: ID of the weight meta-attribute.
    :type weight_id: int

    """
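Instead of taking the first matching rule, the unordered classifier sums the class distributions of all rules that cover the instance, normalizes the sum, and falls back to the apriori distribution when no rule fires. A self-contained sketch of that aggregation with plain dicts, not the Orange API:

```python
def classify_unordered(rules, instance, prior):
    """rules: (predicate, {class: count}) pairs; prior: {class: count}."""
    total = dict.fromkeys(prior, 0.0)
    fired = False
    for matches, dist in rules:
        if matches(instance):
            fired = True
            # accumulate the class distribution of every covering rule
            for c, n in dist.items():
                total[c] += n
    if not fired:
        total = dict(prior)  # copy, so the caller's prior is not modified
    s = float(sum(total.values()))
    return dict((c, n / s) for c, n in total.items())
```

Summing distributions rather than counting votes means a rule's influence is proportional to how many training instances back it up.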

    @deprecated_keywords({"examples": "instances"})
    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.weight_id = weight_id
        self.class_var = instances.domain.class_var if instances is not None else None
        self.__dict__.update(argkw)
        if instances is not None:
            self.prior = Orange.statistics.distribution.Distribution(
                                instances.domain.class_var, instances)

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        """
        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`

        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
        """
        def add(disc1, disc2, sumd):
            disc = Orange.statistics.distribution.Discrete(disc1)
            sumdisc = sumd
            for i, d in enumerate(disc):
                disc[i] += disc2[i]
                sumdisc += disc2[i]
            return disc, sumdisc

        # create an empty distribution
        retDist = Orange.statistics.distribution.Discrete(self.examples.domain.class_var)
        covRules = RuleList()
        # iterate through the rules and sum the distributions of those
        # that cover the instance
        sumdisc = 0.
        for r in self.rules:
            if r(instance) and r.class_distribution:
                retDist, sumdisc = add(retDist, r.class_distribution, sumdisc)
                covRules.append(r)
        if not sumdisc:
            # no rule fired; fall back to a copy of the apriori distribution
            # (a copy, so that the normalization below cannot alter self.prior)
            retDist = Orange.statistics.distribution.Discrete(self.prior)
            sumdisc = self.prior.abs

        if sumdisc > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdisc
        else:
            retDist.normalize()

        if ret_rules:
            if result_type == Orange.classification.Classifier.GetValue:
                return (retDist.modus(), covRules)
            if result_type == Orange.classification.Classifier.GetProbabilities:
                return (retDist, covRules)
            return (retDist.modus(), retDist, covRules)
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(), retDist)

    def __str__(self):
        retStr = ""
        for r in self.rules:
            retStr += rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return retStr


class CN2SDUnorderedLearner(CN2UnorderedLearner):
    """
    CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules, just
    like :class:`~Orange.classification.rules.CN2UnorderedLearner`. The
    difference between classical unordered CN2 and CN2-SD lies in the choice
    of evaluation and covering functions: :class:`WRACCEvaluator` implements
    weighted relative accuracy, and :class:`CovererAndRemover_MultWeights`
    does not exclude covered instances, but instead multiplies their weights
    by the value of the mult parameter.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, weighted relative accuracy is used.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`

    :param beam_width: width of the search beam.
    :type beam_width: int

    :param alpha: significance level of the likelihood ratio statistics to
        determine whether a rule is better than the default rule.
    :type alpha: float

    :param mult: multiplier for the weights of covered instances.
    :type mult: float
    """
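Both distinguishing components of CN2-SD can be stated in a few lines: weighted relative accuracy trades off a rule's coverage against its accuracy gain over the prior, and covering reduces, rather than removes, the weight of covered instances. A sketch under those definitions; the counts, the weight list, and the function names are illustrative, not the Orange API:

```python
def wracc(n_cov_target, n_cov, n_target, n_total):
    """Weighted relative accuracy: coverage * (rule accuracy - prior accuracy)."""
    if n_cov == 0:
        return 0.0
    coverage = float(n_cov) / n_total
    return coverage * (float(n_cov_target) / n_cov - float(n_target) / n_total)

def reweight_covered(weights, covered, mult=0.7):
    """Multiplicative covering: covered instances keep mult of their weight."""
    return [w * mult if i in covered else w for i, w in enumerate(weights)]
```

Because covered instances retain part of their weight, later rules may cover them again, which yields the overlapping rule sets characteristic of subgroup discovery.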
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = CN2UnorderedLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=WRACCEvaluator(), beam_width=5,
                 alpha=0.05, mult=0.7, **kwds):
        CN2UnorderedLearner.__init__(self, evaluator=evaluator,
                                     beam_width=beam_width, alpha=alpha, **kwds)
        self.cover_and_remove = CovererAndRemover_MultWeights(mult=mult)

    def __call__(self, instances, weight=0):
        supervisedClassCheck(instances)

        oldInstances = Orange.data.Table(instances)
        classifier = CN2UnorderedLearner.__call__(self, instances, weight)
        for r in classifier.rules:
            r.filterAndStore(oldInstances, weight, r.classifier.default_val)
        return classifier


class ABCN2(RuleLearner):
    """
    An implementation of argument-based CN2 that uses EVC (extreme value
    correction) for rule evaluation and LRC for classification.

    Rule learning parameters that can be passed to the constructor:

    :param width: beam width (default 5).
    :type width: int
    :param learn_for_class: class for which to learn; None (default) if all
       classes are to be learned.
    :param learn_one_rule: decides whether to learn only a single rule (default
       False).
    :type learn_one_rule: boolean
    :param analyse_argument: index of argument to analyse; -1 to learn normally
       (default).
    :type analyse_argument: int
    :param debug: sets debug mode - prints some info during execution; False (default).
    :type debug: boolean

    The following evaluator-related arguments are supported:

    :param m: m for the m-estimate to be corrected with EVC (default 2).
    :type m: int
    :param opt_reduction: type of EVC correction: 0=no correction,
       1=pessimistic, 2=normal (default 2).
    :type opt_reduction: int
    :param nsampling: number of samples used in estimating the extreme value
       distribution for EVC (default 100).
    :type nsampling: int
    :param evd: pre-given extreme value distributions.
    :param evd_arguments: pre-given extreme value distributions for arguments.

    These parameters control rule validation:

    :param rule_sig: minimal rule significance (default 1.0).
    :type rule_sig: float
    :param att_sig: minimal attribute significance in a rule (default 1.0).
    :type att_sig: float
    :param max_rule_complexity: maximum number of conditions in a rule (default 5).
    :type max_rule_complexity: int
    :param min_coverage: minimal number of covered instances (default 5).
    :type min_coverage: int

    Probabilistic covering can be controlled using:

    :param min_improved: minimal number of instances improved in probabilistic covering (default 1).
    :type min_improved: int
    :param min_improved_perc: minimal percentage of covered instances that need to be improved (default 0.0).
    :type min_improved_perc: float

    Finally, LRC (classifier) related parameters are:

    :param add_sub_rules: decides whether to add sub-rules.
    :type add_sub_rules: boolean
    :param min_cl_sig: minimal significance of beta in the classifier (default 0.5).
    :type min_cl_sig: float
    :param min_beta: minimal beta value (default 0.0).
    :type min_beta: float
    :param set_prefix_rules: decides whether ordered prefix rules should be
       added (default False).
    :type set_prefix_rules: boolean
    :param alternative_learner: use the rule learner as a correction method for
       other machine learning methods (default None).

    """
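The ``m`` parameter above feeds an m-estimate of rule accuracy, which EVC then corrects for the optimism introduced by searching over many candidate rules. The uncorrected m-estimate itself is simple: with ``n_target`` target instances among ``n_covered`` covered ones and an apriori target probability ``prior_target``, it is ``(n_target + m * prior_target) / (n_covered + m)``. A minimal sketch with illustrative names, not the :class:`RuleEvaluator_mEVC` API:

```python
def m_estimate(n_target, n_covered, prior_target, m=2.0):
    """m-estimate of rule accuracy: shrinks the observed accuracy toward
    the apriori probability, more strongly for low-coverage rules."""
    return (n_target + m * prior_target) / (n_covered + m)
```

With m=2, the default here, a rule covering no instances simply returns the prior, while high-coverage rules keep nearly their observed accuracy.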

    def __init__(self, argument_id=0, width=5, m=2, opt_reduction=2, nsampling=100, max_rule_complexity=5,
                 rule_sig=1.0, att_sig=1.0, postpruning=None, min_quality=0., min_coverage=1, min_improved=1, min_improved_perc=0.0,
                 learn_for_class=None, learn_one_rule=False, evd=None, evd_arguments=None, prune_arguments=False, analyse_argument=-1,
                 alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, add_sub_rules=False, debug=False,
                 **kwds):

        # argument ID which is passed to the abcn2 learner
        self.argument_id = argument_id
        # learn for a specific class only?
        self.learn_for_class = learn_for_class
        # analyse a specific argument only, or learn all at once
        self.analyse_argument = analyse_argument
        # should we learn only one rule?
        self.learn_one_rule = learn_one_rule
        self.postpruning = postpruning
        # rule finder
        self.rule_finder = RuleBeamFinder()
        self.ruleFilter = RuleBeamFilter_Width(width=width)
        self.ruleFilter_arguments = ABBeamFilter(width=width)
        if max_rule_complexity - 1 < 0:
            max_rule_complexity = 10
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0, min_quality=0., max_rule_complexity=max_rule_complexity - 1, min_coverage=min_coverage)
        self.refiner = RuleBeamRefiner_Selector()
        self.refiner_arguments = SelectorAdder(discretizer=Orange.feature.discretization.EntropyDiscretization(forceAttribute=1,
                                                                                           maxNumberOfIntervals=2))
        self.prune_arguments = prune_arguments
        # evc evaluator
        evdGet = Orange.core.EVDistGetter_Standard()
        self.rule_finder.evaluator = RuleEvaluator_mEVC(m=m, evDistGetter=evdGet, min_improved=min_improved, min_improved_perc=min_improved_perc)
        self.rule_finder.evaluator.returnExpectedProb = True
        self.rule_finder.evaluator.optimismReduction = opt_reduction
        self.rule_finder.evaluator.ruleAlpha = rule_sig
        self.rule_finder.evaluator.attributeAlpha = att_sig
        self.rule_finder.evaluator.validator = RuleValidator_LRS(alpha=1.0, min_quality=min_quality, min_coverage=min_coverage, max_rule_complexity=max_rule_complexity - 1)

        # learning stopping criteria
        self.rule_stopping = None
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()
        # evd fitting
        self.evd_creator = EVDFitter(self, n=nsampling)
        self.evd = evd
        self.evd_arguments = evd_arguments
        # classifier
        self.add_sub_rules = add_sub_rules
        self.classifier = PILAR(alternative_learner=alternative_learner, min_cl_sig=min_cl_sig, min_beta=min_beta, set_prefix_rules=set_prefix_rules)
        self.debug = debug
        # arbitrary parameters
        self.__dict__.update(kwds)


    def __call__(self, examples, weight_id=0):
        # initialize the progress bar
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                             examples.domain.class_var, examples, weight_id)
            distrib.normalize()

        # we begin with an empty set of rules
        all_rules = RuleList()

        # then iterate through all classes and learn rules for each class separately
        for cl_i, cl in enumerate(examples.domain.class_var):
            if progress:
                step = distrib[cl] / 2.
                progress.start = progress.end
                progress.end += step

            if self.learn_for_class and not self.learn_for_class in [cl, cl_i]:
                continue

            # rules for this class only
            rules = RuleList()

            # create dichotomous class
            dich_data = self.create_dich_class(examples, cl)

            # preparation of the learner (covering, evd, etc.)
            self.prepare_settings(dich_data, weight_id, cl_i, progress)

            # learn argumented rules first ...
            self.turn_ABML_mode(dich_data, weight_id, cl_i)
            # first specialize all unspecialized arguments
            # dich_data = self.specialise_arguments(dich_data, weight_id)
            # comment: specialisation of arguments is within learning of an argumented rule;
            #          this is now different from the published algorithm
            if progress:
                progress.start = progress.end
                progress.end += step

            aes = self.get_argumented_examples(dich_data)
            aes = self.sort_arguments(aes, dich_data)
            while aes:
                if self.analyse_argument > -1 and \
                   (isinstance(self.analyse_argument, Orange.core.Example) and not Orange.core.Example(dich_data.domain, self.analyse_argument) == aes[0] or \
                    isinstance(self.analyse_argument, int) and not dich_data[self.analyse_argument] == aes[0]):
                    aes = aes[1:]
                    continue
                ae = aes[0]
                rule = self.learn_argumented_rule(ae, dich_data, weight_id) # target class is always first class (0)
                if self.debug and rule:
                    print "learned arg rule", Orange.classification.rules.rule_to_string(rule)
                elif self.debug:
                    print "no rule came out of ", ae
                if rule:
                    rules.append(rule)
                    aes = filter(lambda x: not rule(x), aes)
                else:
                    aes = aes[1:]
                aes = aes[1:]

            if not progress and self.debug:
                print " arguments finished ... "

            # remove all examples covered by rules
            for rule in rules:
                dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
            if progress:
                progress(self.remaining_probability(dich_data), None)

            # learn normal rules on the remaining examples
            if self.analyse_argument == -1:
                self.turn_normal_mode(dich_data, weight_id, cl_i)
                while dich_data:
                    # learn a rule
                    rule = self.learn_normal_rule(dich_data, weight_id, self.apriori)
                    if not rule:
                        break
                    if self.debug:
                        print "rule learned: ", Orange.classification.rules.rule_to_string(rule), rule.quality
                    dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
                    if progress:
                        progress(self.remaining_probability(dich_data), None)
                    rules.append(rule)
                    if self.learn_one_rule:
                        break

            # prune unnecessary rules
            rules = self.prune_unnecessary_rules(rules, dich_data, weight_id)

            if self.add_sub_rules:
                rules = self.add_sub_rules_call(rules, dich_data, weight_id)

            # restore domain and class in rules, add them to all_rules
            for r in rules:
                all_rules.append(self.change_domain(r, cl, examples, weight_id))

            if progress:
                progress(1.0, None)
        # create a classifier from all rules
        return self.create_classifier(all_rules, examples, weight_id)

    def learn_argumented_rule(self, ae, examples, weight_id):
        # prepare roots of rules from arguments
        positive_args = self.init_pos_args(ae, examples, weight_id)
        if not positive_args: # something went wrong
            raise Exception("There is a problem with argumented example %s" % str(ae))
        negative_args = self.init_neg_args(ae, examples, weight_id)

        # set negative arguments in the refiner
        self.rule_finder.refiner.notAllowedSelectors = negative_args
        self.rule_finder.refiner.example = ae
        # set arguments to filter
        self.rule_finder.ruleFilter.setArguments(examples.domain, positive_args)

        # learn a rule
        self.rule_finder.evaluator.bestRule = None
        self.rule_finder(examples, weight_id, 0, positive_args)

        # return the best rule
        return self.rule_finder.evaluator.bestRule

    def prepare_settings(self, examples, weight_id, cl_i, progress):
        # apriori distribution
        self.apriori = Orange.statistics.distribution.Distribution(
                                examples.domain.class_var, examples, weight_id)

        # prepare the covering mechanism
        self.cover_and_remove = CovererAndRemover_Prob(examples, weight_id, 0, self.apriori, self.argument_id)
        self.rule_finder.evaluator.probVar = examples.domain.getmeta(self.cover_and_remove.probAttribute)

        # compute extreme distributions
        # TODO: why evd and evd_this????
        if self.rule_finder.evaluator.optimismReduction > 0 and not self.evd:
            self.evd_this = self.evd_creator.computeEVD(examples, weight_id, target_class=0, progress=progress)
        if self.evd:
            self.evd_this = self.evd[cl_i]

    def turn_ABML_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0 and self.argument_id:
            if self.evd_arguments:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_arguments[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD_example(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner_arguments
        self.rule_finder.refiner.argument_id = self.argument_id
        self.rule_finder.ruleFilter = self.ruleFilter_arguments

    def create_dich_class(self, examples, cl):
        """
        Create a dichotomous class.
        """
        (newDomain, targetVal) = createDichotomousClass(examples.domain, examples.domain.class_var, str(cl), negate=0)
        newDomainmetas = newDomain.getmetas()
        newDomain.addmeta(Orange.data.new_meta_id(), examples.domain.class_var) # old class as meta
        dichData = examples.select(newDomain)
        if self.argument_id:
            for d in dichData: # remove arguments given to other classes
                if not d.getclass() == targetVal:
                    d[self.argument_id] = "?"
        return dichData

    def get_argumented_examples(self, examples):
        if not self.argument_id:
            return None

        # get argumented examples
        return ArgumentFilter_hasSpecial()(examples, self.argument_id, target_class=0)

    def sort_arguments(self, arg_examples, examples):
        if not self.argument_id:
            return None
        evaluateAndSortArguments(examples, self.argument_id)
        if len(arg_examples) > 0:
            # sort examples by the quality of their arguments (using the first
            # argument, as the arguments have already been sorted)
            sorted_examples = arg_examples.native()
            sorted_examples.sort(lambda x, y: -cmp(x[self.argument_id].value.positive_arguments[0].quality,
                                                   y[self.argument_id].value.positive_arguments[0].quality))
            return Orange.data.Table(examples.domain, sorted_examples)
        else:
            return None

    def turn_normal_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0:
            if self.evd:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner
        self.rule_finder.ruleFilter = self.ruleFilter

    def learn_normal_rule(self, examples, weight_id, apriori):
        if hasattr(self.rule_finder.evaluator, "bestRule"):
            self.rule_finder.evaluator.bestRule = None
        rule = self.rule_finder(examples, weight_id, 0, RuleList())
        if hasattr(self.rule_finder.evaluator, "bestRule") and self.rule_finder.evaluator.returnExpectedProb:
            rule = self.rule_finder.evaluator.bestRule
            self.rule_finder.evaluator.bestRule = None
        if self.postpruning:
            rule = self.postpruning(rule, examples, weight_id, 0, apriori)
        return rule

    def remove_covered_examples(self, rule, examples, weight_id):
        nexamples, nweight = self.cover_and_remove(rule, examples, weight_id, 0)
        return nexamples


    def prune_unnecessary_rules(self, rules, examples, weight_id):
        return self.cover_and_remove.getBestRules(rules, examples, weight_id)

    def change_domain(self, rule, cl, examples, weight_id):
        rule.filter = Orange.core.Filter_values(domain=examples.domain,
                                        conditions=rule.filter.conditions)
        rule.filterAndStore(examples, weight_id, cl)
        if hasattr(rule, "learner") and hasattr(rule.learner, "arg_example"):
            rule.learner.arg_example = Orange.data.Instance(examples.domain, rule.learner.arg_example)
        return rule

    def create_classifier(self, rules, examples, weight_id):
        return self.classifier(rules, examples, weight_id)

1376    def add_sub_rules_call(self, rules, examples, weight_id):
1377        apriori = Orange.statistics.distribution.Distribution(
1378                            examples.domain.class_var,examples,weight_id)
1379        new_rules = RuleList()
1380        for r in rules:
            new_rules.append(r)

        # loop through rules
        for r in rules:
            tmpList = RuleList()
            tmpRle = r.clone()
            tmpRle.filter.conditions = r.filter.conditions[:r.requiredConditions] # do not split argument
            tmpRle.parentRule = None
            tmpRle.filterAndStore(examples,weight_id,r.classifier.default_val)
            tmpRle.complexity = 0
            tmpList.append(tmpRle)
            while tmpList and len(tmpList[0].filter.conditions) <= len(r.filter.conditions):
                tmpList2 = RuleList()
                for tmpRule in tmpList:
                    # evaluate tmpRule
                    oldREP = self.rule_finder.evaluator.returnExpectedProb
                    self.rule_finder.evaluator.returnExpectedProb = False
                    tmpRule.quality = self.rule_finder.evaluator(tmpRule,examples,weight_id,r.classifier.default_val,apriori)
                    self.rule_finder.evaluator.returnExpectedProb = oldREP
                tmpList.sort(lambda x,y: -cmp(x.quality, y.quality))
                tmpList = tmpList[:self.rule_filter.width]

                for tmpRule in tmpList:
                    # if rule not in rules already, add it to the list
                    if not True in [Orange.classification.rules.rules_equal(ri,tmpRule) for ri in new_rules] and len(tmpRule.filter.conditions)>0 and tmpRule.quality > apriori[r.classifier.default_val]/apriori.abs:
                        new_rules.append(tmpRule)
                    # create new tmpRules, set parent rule, append them to tmpList2
                    if not True in [Orange.classification.rules.rules_equal(ri,tmpRule) for ri in new_rules]:
                        for c in r.filter.conditions:
                            tmpRule2 = tmpRule.clone()
                            tmpRule2.parentRule = tmpRule
                            tmpRule2.filter.conditions.append(c)
                            tmpRule2.filterAndStore(examples,weight_id,r.classifier.default_val)
                            tmpRule2.complexity += 1
                            if tmpRule2.class_distribution.abs < tmpRule.class_distribution.abs:
                                tmpList2.append(tmpRule2)
                tmpList = tmpList2
        return new_rules

    def init_pos_args(self, ae, examples, weight_id):
        pos_args = RuleList()
        # prepare arguments
        for p in ae[self.argument_id].value.positive_arguments:
            new_arg = Rule(filter=ArgFilter(argument_id = self.argument_id,
                                                   filter = self.newFilter_values(p.filter),
                                                   arg_example = ae),
                                                   complexity = 0)
            new_arg.valuesFilter = new_arg.filter.filter
            pos_args.append(new_arg)


        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
            old_exp = self.rule_finder.evaluator.returnExpectedProb
            self.rule_finder.evaluator.returnExpectedProb = False

        # argument pruning (all or just unfinished arguments)
        # if pruning is chosen, then prune arguments if possible
        for p in pos_args:
            p.filterAndStore(examples, weight_id, 0)
            if not p.learner:
                p.learner = DefaultLearner(default_value=ae.getclass())
            # pruning on: we check on all conditions and take only the best
            if self.prune_arguments:
                allowed_conditions = [c for c in p.filter.conditions]
                pruned_conditions = self.prune_arg_conditions(ae, allowed_conditions, examples, weight_id)
                p.baseDist = Orange.statistics.distribution.Distribution(examples.domain.classVar, examples, weight_id)
                p.filter.conditions = pruned_conditions
                p.learner.setattr("arg_length", 0)

            else: # prune only unspecified conditions
                spec_conditions = [c for c in p.filter.conditions if not c.unspecialized_condition]
                unspec_conditions = [c for c in p.filter.conditions if c.unspecialized_condition]
                # let rule cover now all examples filtered by specified conditions
                p.filter.conditions = spec_conditions
                p.filterAndStore(examples, weight_id, 0)
                p.baseDist = p.classDistribution
                p.learner.setattr("arg_length", len(p.filter.conditions))
                pruned_conditions = self.prune_arg_conditions(ae, unspec_conditions, p.examples, p.weightID)
                p.filter.conditions.extend(pruned_conditions)
                p.filter.filter.conditions.extend(pruned_conditions)
                # if argument does not contain all unspecialized reasons, add those reasons with minimum values
                at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.core.ValueFilter_continuous]
                for u in unspec_conditions:
                    if not (u.position, u.oper) in at_oper_pairs:
                        # find minimum value
                        if u.oper == Orange.core.ValueFilter_continuous.Greater or u.oper == Orange.core.ValueFilter_continuous.GreaterEqual:
                            u.ref = min([float(e[u.position])-10. for e in p.examples])
                        else:
                            u.ref = max([float(e[u.position])+10. for e in p.examples])
                        p.filter.conditions.append(u)
                        p.filter.filter.conditions.append(u)

        # set parameters to arguments
        for p_i,p in enumerate(pos_args):
            p.filterAndStore(examples,weight_id,0)
            p.filter.domain = examples.domain
            p.classifier = p.learner(p.examples, p.weightID)
            p.requiredConditions = len(p.filter.conditions)
            p.learner.setattr("arg_example", ae)
            p.complexity = len(p.filter.conditions)

        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
            self.rule_finder.evaluator.returnExpectedProb = old_exp

        return pos_args

    def newFilter_values(self, filter):
        newFilter = Orange.core.Filter_values()
        newFilter.conditions = filter.conditions[:]
        newFilter.domain = filter.domain
        newFilter.negate = filter.negate
        newFilter.conjunction = filter.conjunction
        return newFilter

    def init_neg_args(self, ae, examples, weight_id):
        return ae[self.argument_id].value.negative_arguments

    def remaining_probability(self, examples):
        return self.cover_and_remove.covered_percentage(examples)

    def prune_arg_conditions(self, crit_example, allowed_conditions, examples, weight_id):
        if not allowed_conditions:
            return []
        cn2_learner = Orange.classification.rules.CN2UnorderedLearner()
        cn2_learner.rule_finder = RuleBeamFinder()
        cn2_learner.rule_finder.refiner = SelectorArgConditions(crit_example, allowed_conditions)
        cn2_learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(self.rule_finder.evaluator.m)
        rule = cn2_learner.rule_finder(examples,weight_id,0,RuleList())
        return rule.filter.conditions

ABCN2 = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id",
                     "argumentID": "argument_id"})(ABCN2)

class CN2EVCUnorderedLearner(ABCN2):
    """
    A learner that induces a set of unordered rules in a similar manner as
    CN2-SD (see Lavrac et al., 2004) and
    :class:`~Orange.classification.rules.CN2SDUnorderedLearner`. This
    implementation uses the EVC (extreme value correction) rule evaluation.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param width: width of the search beam.
    :type width: int

    :param nsampling: number of data samplings used to estimate the extreme
        value distributions for EVC.
    :type nsampling: int

    :param rule_sig: minimal required significance of a rule.
    :type rule_sig: float

    :param att_sig: minimal required significance of a single condition.
    :type att_sig: float

    :param min_coverage: minimal number of instances a rule must cover.
    :type min_coverage: int

    :param max_rule_complexity: maximal number of conditions in a rule.
    :type max_rule_complexity: int
    """
    def __init__(self, width=5, nsampling=100, rule_sig=1.0, att_sig=1.0,\
        min_coverage = 1., max_rule_complexity = 5.):
        ABCN2.__init__(self, width=width, nsampling=nsampling,
            rule_sig=rule_sig, att_sig=att_sig, min_coverage=int(min_coverage),
            max_rule_complexity = int(max_rule_complexity))

class DefaultLearner(Orange.core.Learner):
    """
    Default learner - returns a default classifier that always predicts the
    predefined class value.
    """
    def __init__(self, default_value=None):
        self.default_value = default_value
    def __call__(self, examples, weight_id=0):
        return Orange.classification.majority.ConstantClassifier(self.default_value,
            defaultDistribution=Orange.core.Distribution(examples.domain.class_var, examples, weight_id))

class ABCN2Ordered(ABCN2):
    """
    Rules learned by ABCN2 are ordered and used as a decision list.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.classifier.set_prefix_rules = True
        self.classifier.optimize_betas = False

class ABCN2M(ABCN2):
    """
    Argument based rule learning with m-estimate as evaluation function.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.opt_reduction = 0
        self.rule_finder.evaluator.optimismReduction = self.opt_reduction
        self.classifier = CN2UnorderedClassifier

class ABCN2MLRC(ABCN2):
    """
    Argument based rule learning with m-estimate as evaluation function.
    LRC is used as a classification method.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.opt_reduction = 0
        self.rule_finder.evaluator.optimismReduction = self.opt_reduction

class ABCN2_StandardClassification(ABCN2):
    """
    Argument based rule learning with the original classification technique.
    """
    def __init__(self, argument_id=0, **kwds):
        ABCN2.__init__(self, argument_id=argument_id, **kwds)
        self.classifier = CN2UnorderedClassifier


class RuleStopping_Apriori(RuleStoppingCriteria):
    def __init__(self, apriori=None):
        self.apriori = apriori

    def __call__(self,rules,rule,instances,data):
        if not self.apriori:
            return False
        if not type(rule.classifier) == Orange.classification.ConstantClassifier:
            return False
        ruleAcc = rule.class_distribution[rule.classifier.default_val]/rule.class_distribution.abs
        aprioriAcc = self.apriori[rule.classifier.default_val]/self.apriori.abs
        if ruleAcc>aprioriAcc:
            return False
        return True


class RuleStopping_SetRules(RuleStoppingCriteria):
    def __init__(self,validator):
        self.rule_stopping = RuleStoppingCriteria_NegativeDistribution()
        self.validator = validator

    def __call__(self,rules,rule,instances,data):
        ru_st = self.rule_stopping(rules,rule,instances,data)
        if not ru_st:
            self.validator.rules.append(rule)
        return bool(ru_st)


class LengthValidator(RuleValidator):
    """ Prune rules with more conditions than self.length. """
    def __init__(self, length=-1):
        self.length = length

    def __call__(self, rule, data, weight_id, target_class, apriori):
        if self.length >= 0:
            return len(rule.filter.conditions) <= self.length
        return True


class NoDuplicatesValidator(RuleValidator):
    def __init__(self,alpha=.05,min_coverage=0,max_rule_length=0,rules=RuleList()):
        self.rules = rules
        self.validator = RuleValidator_LRS(alpha=alpha,\
            min_coverage=min_coverage,max_rule_length=max_rule_length)

    def __call__(self, rule, data, weight_id, target_class, apriori):
        if rule_in_set(rule,self.rules):
            return False
        return bool(self.validator(rule,data,weight_id,target_class,apriori))


class RuleClassifier_BestRule(RuleClassifier):
    def __init__(self, rules, instances, weight_id = 0, **argkw):
        self.rules = rules
        self.examples = instances
        self.class_var = instances.domain.class_var
        self.__dict__.update(argkw)
        self.prior = Orange.statistics.distribution.Distribution(
                    instances.domain.class_var, instances)

    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
        retDist = Orange.statistics.distribution.Distribution(instance.domain.class_var)
        bestRule = None
        for r in self.rules:
            if r(instance) and (not bestRule or r.quality>bestRule.quality):
                for v_i,v in enumerate(instance.domain.class_var):
                    retDist[v_i] = r.class_distribution[v_i]
                bestRule = r
        if not bestRule:
            retDist = self.prior
        else:
            bestRule.used += 1
        sumdist = sum(retDist)
        if sumdist > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdist
        else:
            retDist.normalize()
        # return classifier(instance, result_type=result_type)
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(),retDist)

    def __str__(self):
        retStr = ""
        for r in self.rules:
            retStr += rule_to_string(r)+" "+str(r.class_distribution)+"\n"
        return retStr


class CovererAndRemover_MultWeights(RuleCovererAndRemover):
    """
    Covering and removing of instances using weight multiplication:

    :param mult: weighting multiplication factor
    :type mult: float
    """

    def __init__(self, mult = 0.7):
        self.mult = mult

    def __call__(self, rule, instances, weights, target_class):
        if not weights:
            weights = Orange.data.new_meta_id()
            instances.addMetaAttribute(weights,1.)
            instances.domain.addmeta(weights, Orange.data.variable.\
                Continuous("weights-"+str(weights)), True)
        newWeightsID = Orange.data.new_meta_id()
        instances.addMetaAttribute(newWeightsID,1.)
        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
            Continuous("weights-"+str(newWeightsID)), True)
        for instance in instances:
            if rule(instance) and instance.getclass() == rule.classifier(\
                instance,Orange.classification.Classifier.GetValue):
                instance[newWeightsID]=instance[weights]*self.mult
            else:
                instance[newWeightsID]=instance[weights]
        return (instances,newWeightsID)

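# The covering scheme above can be illustrated without Orange. This is a
# hypothetical, self-contained sketch of weight multiplication: instances
# correctly covered by a rule keep only a fraction `mult` of their weight,
# so subsequent rules focus on the instances the rule set explains poorly.

```python
def multiply_covered_weights(weights, covered, mult=0.7):
    # Down-weight covered instances by `mult`; leave the rest unchanged.
    return [w * mult if c else w for w, c in zip(weights, covered)]
```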

class CovererAndRemover_AddWeights(RuleCovererAndRemover):
    """
    Covering and removing of instances using weight addition.

    """

    def __call__(self, rule, instances, weights, target_class):
        if not weights:
            weights = Orange.data.new_meta_id()
            instances.addMetaAttribute(weights,1.)
            instances.domain.addmeta(weights, Orange.data.variable.\
                Continuous("weights-"+str(weights)), True)
        try:
            coverage = instances.domain.getmeta("Coverage")
        except:
            coverage = Orange.data.variable.Continuous("Coverage")
            instances.domain.addmeta(Orange.data.new_meta_id(),coverage, True)
            instances.addMetaAttribute(coverage,0.0)
        newWeightsID = Orange.data.new_meta_id()
        instances.addMetaAttribute(newWeightsID,1.)
        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
            Continuous("weights-"+str(newWeightsID)), True)
        for instance in instances:
            if rule(instance) and instance.getclass() == rule.classifier(instance,\
                    Orange.classification.Classifier.GetValue):
                try:
                    instance[coverage]+=1.0
                except:
                    instance[coverage]=1.0
                instance[newWeightsID]=1.0/(instance[coverage]+1)
            else:
                instance[newWeightsID]=instance[weights]
        return (instances,newWeightsID)


class CovererAndRemover_Prob(RuleCovererAndRemover):
    """ This class implements probabilistic covering. """
    def __init__(self, examples, weight_id, target_class, apriori, argument_id):
        self.best_rule = [None]*len(examples)
        self.prob_attribute = Orange.data.new_meta_id()
        self.apriori_prob = apriori[target_class]/apriori.abs
        examples.addMetaAttribute(self.prob_attribute, self.apriori_prob)
        examples.domain.addmeta(self.prob_attribute,
            Orange.data.variable.Continuous("Probs"))
        self.argument_id = argument_id

    def getBestRules(self, current_rules, examples, weight_id):
        best_rules = RuleList()
        for r_i,r in enumerate(self.best_rule):
            if r and not rule_in_set(r,best_rules) and int(examples[r_i].getclass())==int(r.classifier.default_value):
                if hasattr(r.learner, "arg_example"):
                    setattr(r, "best_example", r.learner.arg_example)
                else:
                    setattr(r, "best_example", examples[r_i])
                best_rules.append(r)
        return best_rules

    def __call__(self, rule, examples, weights, target_class):
        """ If an example has an argument, then the rule must be consistent with the argument. """
        example = getattr(rule.learner, "arg_example", None)
        for ei, e in enumerate(examples):
            if e == example:
                e[self.prob_attribute] = 1.0
                self.best_rule[ei] = rule
            elif rule(e) and rule.quality>e[self.prob_attribute]:
                e[self.prob_attribute] = rule.quality+0.001 # 0.001 is added to avoid numerical errors
                self.best_rule[ei]=rule
        return (examples,weights)

    def filter_covers_example(self, example, filter):
        filter_indices = RuleCoversArguments.filterIndices(filter)
        if filter(example):
            try:
                if example[self.argument_id].value and len(example[self.argument_id].value.positive_arguments)>0: # example has positive arguments
                    # conditions should cover at least one of the positive arguments
                    one_arg_covered = False
                    for pA in example[self.argument_id].value.positive_arguments:
                        arg_covered = [self.condIn(c,filter_indices) for c in pA.filter.conditions]
                        one_arg_covered = one_arg_covered or len(arg_covered) == sum(arg_covered) #arg_covered
                        if one_arg_covered:
                            break
                    if not one_arg_covered:
                        return False
                if example[self.argument_id].value and len(example[self.argument_id].value.negative_arguments)>0: # example has negative arguments
                    # conditions must not cover any of the negative arguments
                    for pN in example[self.argument_id].value.negative_arguments:
                        arg_covered = [self.condIn(c, filter_indices) for c in pN.filter.conditions]
                        if len(arg_covered)==sum(arg_covered):
                            return False
            except:
                return True
            return True
        return False

    def condIn(self, cond, filter_indices): # is condition in the filter?
        condInd = RuleCoversArguments.conditionIndex(cond)
        if operator.or_(condInd,filter_indices[cond.position]) == filter_indices[cond.position]:
            return True
        return False


    def covered_percentage(self, examples):
        p = 0.0
        for ei, e in enumerate(examples):
            p += (e[self.prob_attribute] - self.apriori_prob)/(1.0-self.apriori_prob)
        return p/len(examples)
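# The probabilistic covering above can be sketched without Orange. In this
# hypothetical, self-contained illustration each instance stores the quality
# of the best rule covering it, and the covered percentage measures how far
# those per-instance probabilities have risen above the apriori probability.

```python
def update_best_probs(probs, covered, rule_quality):
    # Raise each covered instance's stored probability to rule_quality.
    return [max(p, rule_quality) if c else p for p, c in zip(probs, covered)]

def covered_percentage(probs, apriori_prob):
    # Average relative improvement of instance probabilities over the prior.
    return sum((p - apriori_prob) / (1.0 - apriori_prob) for p in probs) / len(probs)
```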


@deprecated_keywords({"showDistribution": "show_distribution"})
def rule_to_string(rule, show_distribution = True):
    """
    Return a string presentation of a rule in a human-readable format.

    :param rule: rule to pretty-print.
    :type rule: :class:`~Orange.classification.rules.Rule`

    :param show_distribution: determines whether the presentation should also
        contain the distribution of covered instances.
    :type show_distribution: bool

    """
    def selectSign(oper):
        if oper == Orange.core.ValueFilter_continuous.Less:
            return "<"
        elif oper == Orange.core.ValueFilter_continuous.LessEqual:
            return "<="
        elif oper == Orange.core.ValueFilter_continuous.Greater:
            return ">"
        elif oper == Orange.core.ValueFilter_continuous.GreaterEqual:
            return ">="
        else: return "="

    if not rule:
        return "None"
    conds = rule.filter.conditions
    domain = rule.filter.domain

    ret = "IF "
    if len(conds)==0:
        ret = ret + "TRUE"

    for i,c in enumerate(conds):
        if i > 0:
            ret += " AND "
        if type(c) == Orange.core.ValueFilter_discrete:
            ret += domain[c.position].name + "=" + str([domain[c.position].\
                values[int(v)] for v in c.values])
        elif type(c) == Orange.core.ValueFilter_continuous:
            ret += domain[c.position].name + selectSign(c.oper) + str(c.ref)
    if rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
            and rule.classifier.default_val:
        ret = ret + " THEN "+domain.class_var.name+"="+\
        str(rule.classifier.default_value)
        if show_distribution:
            ret += str(rule.class_distribution)
    elif rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
            and type(domain.class_var) == Orange.core.EnumVariable:
        ret = ret + " THEN "+domain.class_var.name+"="+\
        str(rule.class_distribution.modus())
        if show_distribution:
            ret += str(rule.class_distribution)
    return ret

def supervisedClassCheck(instances):
    if not instances.domain.class_var:
        raise Exception("Class variable is required!")
    if instances.domain.class_var.varType == Orange.core.VarTypes.Continuous:
        raise Exception("CN2 requires a discrete class!")


def rule_in_set(rule,rules):
    for r in rules:
        if rules_equal(rule,r):
            return True
    return False
def rules_equal(rule1,rule2):
    if not len(rule1.filter.conditions)==len(rule2.filter.conditions):
        return False
    for c1 in rule1.filter.conditions:
        found=False # find the same condition in the other rule
        for c2 in rule2.filter.conditions:
            try:
                if not c1.position == c2.position: continue # same feature?
                if not type(c1) == type(c2): continue # same type of condition?
                if type(c1) == Orange.core.ValueFilter_discrete:
                    if not type(c1.values[0]) == type(c2.values[0]): continue
                    if not c1.values[0] == c2.values[0]: continue # same value?
                if type(c1) == Orange.core.ValueFilter_continuous:
                    if not c1.oper == c2.oper: continue # same operator?
                    if not c1.ref == c2.ref: continue # same threshold?
                found=True
                break
            except:
                pass
        if not found:
            return False
    return True

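# The equality test above can be reduced to a simple principle. In this
# hypothetical sketch, a condition is modeled as a (feature, operator, value)
# tuple: two rules are equal when they have the same number of conditions and
# every condition of one appears somewhere in the other, regardless of order.

```python
def rules_equal_simple(conds1, conds2):
    # Order-insensitive comparison of two condition lists.
    return len(conds1) == len(conds2) and all(c in conds2 for c in conds1)
```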
# Miscellaneous - utility functions
def avg(l):
    if len(l)==0:
        return 0.
    return sum(l)/len(l)

def var(l):
    if len(l)<2:
        return 0.
    av = avg(l)
    vars=[math.pow(li-av,2) for li in l]
    return sum(vars)/(len(l)-1)

def median(l):
    if len(l)==0:
        return 0.
    l.sort()
    le = len(l)
    if le%2 == 1:
        return l[(le-1)/2]
    else:
        return (l[le/2-1]+l[le/2])/2

def perc(l,p):
    l.sort()
    return l[int(math.floor(p*len(l)))]

class EVDFitter:
    """ Randomizes a dataset and fits an extreme value distribution onto it. """

    def __init__(self, learner, n=200, randomseed=100):
        self.learner = learner
        self.n = n
        self.randomseed = randomseed
        # initialize random seed to make experiments repeatable
        random.seed(self.randomseed)


    def createRandomDataSet(self, data):
        newData = Orange.core.ExampleTable(data)
        # shuffle data
        cl_num = newData.toNumpy("C")
        random.shuffle(cl_num[0][:,0])
        clData = Orange.core.ExampleTable(Orange.core.Domain([newData.domain.classVar]),cl_num[0])
        for d_i,d in enumerate(newData):
            d[newData.domain.classVar] = clData[d_i][newData.domain.classVar]
        return newData

    def createEVDistList(self, evdList):
        l = Orange.core.EVDistList()
        for el in evdList:
            l.append(Orange.core.EVDist(mu=el[0],beta=el[1],percentiles=el[2]))
        return l


    # estimate Fisher-Tippett parameters for a set of values given in the vals list (+ deciles)
    def compParameters(self, vals, oldMi, oldBeta, oldPercs, fixedBeta=False):
        # compute percentiles
        vals.sort()
        N = len(vals)
        percs = [avg(vals[int(float(N)*i/10):int(float(N)*(i+1)/10)]) for i in range(10)]
        if N<10:
            return oldMi, oldBeta, percs
        if not fixedBeta:
            beta = min(2.0, math.sqrt(6*var(vals)/math.pow(math.pi,2)))#min(2.0, max(oldBeta, math.sqrt(6*var(vals)/math.pow(math.pi,2))))
        else:
            beta = oldBeta
        mi = max(oldMi,percs[-1]+beta*math.log(-math.log(0.95)))
        mi = percs[-1]+beta*math.log(-math.log(0.95))
        return max(oldMi, numpy.average(vals)-beta*0.5772156649), beta, None

    def prepare_learner(self):
        self.oldStopper = self.learner.ruleFinder.ruleStoppingValidator
        self.evaluator = self.learner.ruleFinder.evaluator
        self.refiner = self.learner.ruleFinder.refiner
        self.validator = self.learner.ruleFinder.validator
        self.ruleFilter = self.learner.ruleFinder.ruleFilter
        self.learner.ruleFinder.validator = None
        self.learner.ruleFinder.evaluator = Orange.core.RuleEvaluator_LRS()
        self.learner.ruleFinder.evaluator.storeRules = True
        self.learner.ruleFinder.ruleStoppingValidator = Orange.core.RuleValidator_LRS(alpha=1.0)
        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = 0
        self.learner.ruleFinder.refiner = Orange.core.RuleBeamRefiner_Selector()
        self.learner.ruleFinder.ruleFilter = Orange.core.RuleBeamFilter_Width(width = 5)


    def restore_learner(self):
        self.learner.ruleFinder.evaluator = self.evaluator
        self.learner.ruleFinder.ruleStoppingValidator = self.oldStopper
        self.learner.ruleFinder.refiner = self.refiner
        self.learner.ruleFinder.validator = self.validator
        self.learner.ruleFinder.ruleFilter = self.ruleFilter

    def computeEVD(self, data, weightID=0, target_class=0, progress=None):
        import time
        # prepare learner for distribution computation
        self.prepare_learner()

        # loop through N (sampling repetitions)
        extremeDists=[(0, 1, [])]
        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = self.oldStopper.max_rule_complexity
        maxVals = [[] for l in range(self.oldStopper.max_rule_complexity+1)]
        for d_i in range(self.n):
            if not progress:
                if self.learner.debug:
                    print d_i,
            else:
                progress(float(d_i)/self.n, None)
            # create data set (remove and randomize)
            a = time.time()
            tempData = self.createRandomDataSet(data)
            a = time.time()
            self.learner.ruleFinder.evaluator.rules = RuleList()
            a = time.time()
            for l in range(self.oldStopper.max_rule_complexity+2):
                self.learner.ruleFinder.evaluator.rules.append(None)
            a = time.time()
            # Next, learn a rule
            self.learner.ruleFinder(tempData,weightID,target_class, RuleList())
            a = time.time()
            for l in range(self.oldStopper.max_rule_complexity+1):
                if self.learner.ruleFinder.evaluator.rules[l]:
                    maxVals[l].append(self.learner.ruleFinder.evaluator.rules[l].quality)
                else:
                    maxVals[l].append(0)
##                qs = [r.quality for r in self.learner.ruleFinder.evaluator.rules if r.complexity == l+1]
####                if qs:
####                    for r in self.learner.ruleFinder.evaluator.rules:
####                        if r.quality == max(qs) and r.classDistribution.abs == 16 and r.classDistribution[0] == 16:
####                            print "best rule", orngCN2.ruleToString(r), r.quality
##                if qs:
##                    maxVals[l].append(max(qs))
##                else:
##                    maxVals[l].append(0)
            a = time.time()

        # a longer rule should always be better than a shorter rule
        for l in range(self.oldStopper.max_rule_complexity):
            for i in range(len(maxVals[l])):
                if maxVals[l+1][i] < maxVals[l][i]:
                    maxVals[l+1][i] = maxVals[l][i]
##        print
##        for mi, m in enumerate(maxVals):
##            print "mi=",mi,m

        mu, beta, perc = 1.0, 2.0, [0.0]*10
        for mi,m in enumerate(maxVals):
##            if mi == 0:
##                mu, beta, perc = self.compParameters(m, mu, beta, perc)
##            else:
            mu, beta, perc = self.compParameters(m, mu, beta, perc, fixedBeta=True)
            extremeDists.append((mu, beta, perc))
            extremeDists.extend([(0,1,[])]*(mi))
            if self.learner.debug:
                print mi, mu, beta, perc

        self.restore_learner()
        return self.createEVDistList(extremeDists)

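# The parameter estimation inside compParameters follows the moment-based fit
# of a Gumbel (Fisher-Tippett type I) distribution. This hypothetical,
# self-contained sketch shows the underlying formulas: scale
# beta = sqrt(6 * variance / pi^2) and location mu = mean - gamma * beta,
# where gamma ~ 0.5772156649 is the Euler-Mascheroni constant.

```python
import math

def fit_gumbel(vals):
    # Method-of-moments estimate of Gumbel location (mu) and scale (beta).
    mean = sum(vals) / float(len(vals))
    variance = sum((v - mean) ** 2 for v in vals) / float(len(vals) - 1)
    beta = math.sqrt(6.0 * variance / math.pi ** 2)
    mu = mean - 0.5772156649 * beta
    return mu, beta
```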
class ABBeamFilter(Orange.core.RuleBeamFilter):
    """
    ABBeamFilter: filters the beam:
        - keeps the first N rules (by quality),
        - keeps the first N rules whose condition part contains only argument conditions.
    """
2086    def __init__(self,width=5):
2087        self.width=width
2088        self.pArgs=None
2089
2090    def __call__(self,rulesStar,examples,weight_id):
2091        newStar=Orange.core.RuleList()
2092        rulesStar.sort(lambda x,y: -cmp(x.quality,y.quality))
2093        argsNum=0
2094        for r_i,r in enumerate(rulesStar):
2095            if r_i<self.width: # either is one of best "width" rules
2096                newStar.append(r)
2097            elif self.onlyPositives(r):
2098                if argsNum<self.width:
2099                    newStar.append(r)
2100                    argsNum+=1
2101        return newStar               
2102
2103    def setArguments(self,domain,positive_arguments):
2104        self.pArgs = positive_arguments
2105        self.domain = domain
2106        self.argTab = [0]*len(self.domain.attributes)
2107        for arg in self.pArgs:
2108            for cond in arg.filter.conditions:
2109                self.argTab[cond.position]=1
2110       
2111    def onlyPositives(self,rule):
2112        if not self.pArgs:
2113            return False
2114
2115        ruleTab=[0]*len(self.domain.attributes)
2116        for cond in rule.filter.conditions:
2117            ruleTab[cond.position]=1
2118        return map(operator.or_,ruleTab,self.argTab)==self.argTab
2119
2120
class RuleCoversArguments:
    """
    Determines whether a rule covers at least one of a set of arguments.
    """
    def __init__(self, arguments):
        self.arguments = arguments
        self.indices = []
        for a in self.arguments:
            indNA = getattr(a.filter, "indices", None)
            if not indNA:
                a.filter.setattr("indices", RuleCoversArguments.filterIndices(a.filter))
            self.indices.append(a.filter.indices)

    def __call__(self, rule):
        if not self.indices:
            return False
        if not getattr(rule.filter, "indices", None):
            rule.filter.indices = RuleCoversArguments.filterIndices(rule.filter)
        for index in self.indices:
            if map(operator.or_, rule.filter.indices, index) == rule.filter.indices:
                return True
        return False

    def filterIndices(filter):
        if not filter.domain:
            return []
        ind = [0]*len(filter.domain.attributes)
        for c in filter.conditions:
            ind[c.position] = operator.or_(ind[c.position],
                                           RuleCoversArguments.conditionIndex(c))
        return ind
    filterIndices = staticmethod(filterIndices)

    def conditionIndex(c):
        if type(c) == Orange.core.ValueFilter_continuous:
            if (c.oper == Orange.core.ValueFilter_continuous.GreaterEqual or
                c.oper == Orange.core.ValueFilter_continuous.Greater):
                return 5  # 0101
            elif (c.oper == Orange.core.ValueFilter_continuous.LessEqual or
                  c.oper == Orange.core.ValueFilter_continuous.Less):
                return 3  # 0011
            else:
                return c.oper
        else:
            return 1  # 0001
    conditionIndex = staticmethod(conditionIndex)

    def oneSelectorToCover(ruleIndices, argIndices):
        at, sel_type = -1, 0
        for r_i, ind in enumerate(ruleIndices):
            if not argIndices[r_i]:
                continue
            if at > -1 and not ind == argIndices[r_i]: # would need two changes
                return (-1, 0)
            if not ind == argIndices[r_i]:
                if argIndices[r_i] in [1, 3, 5]:
                    at, sel_type = r_i, argIndices[r_i]
                if argIndices[r_i] == 6:
                    if ind == 3:
                        at, sel_type = r_i, 5
                    if ind == 5:
                        at, sel_type = r_i, 3
        return at, sel_type
    oneSelectorToCover = staticmethod(oneSelectorToCover)


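# A minimal, self-contained sketch (not part of the module) of the bitmask
# encoding used by RuleCoversArguments.conditionIndex: a discrete selector is
# 0001, a lower bound (Less/LessEqual) is 0011, an upper bound
# (Greater/GreaterEqual) is 0101, and OR-ing a lower and an upper bound on the
# same attribute gives 0111, i.e. an interval.  The name
# condition_index_sketch is hypothetical.
def condition_index_sketch(kind):
    return {"discrete": 1, "less": 3, "greater": 5}[kind]

assert condition_index_sketch("less") | condition_index_sketch("greater") == 7
assert condition_index_sketch("discrete") | condition_index_sketch("discrete") == 1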
class SelectorAdder(Orange.core.RuleBeamRefiner):
    """
    A refiner that adds selectors to a rule; refined rules are not
    consistent with any of the negative arguments.
    """
    def __init__(self, example=None, not_allowed_selectors=[], argument_id=None,
                 discretizer=Orange.core.EntropyDiscretization(forceAttribute=True)):
        # required values - needed values of attributes
        self.example = example
        self.argument_id = argument_id
        self.not_allowed_selectors = not_allowed_selectors
        self.discretizer = discretizer

    def __call__(self, oldRule, data, weight_id, target_class=-1):
        inNotAllowedSelectors = RuleCoversArguments(self.not_allowed_selectors)
        new_rules = Orange.core.RuleList()

        # get positive indices (selectors already in the rule)
        indices = getattr(oldRule.filter, "indices", None)
        if not indices:
            indices = RuleCoversArguments.filterIndices(oldRule.filter)
            oldRule.filter.setattr("indices", indices)

        # get negative indices (selectors that should not be in the rule)
        negative_indices = [0]*len(data.domain.attributes)
        for nA in self.not_allowed_selectors:
            at_i, type_na = RuleCoversArguments.oneSelectorToCover(indices, nA.filter.indices)
            if at_i > -1:
                negative_indices[at_i] = operator.or_(negative_indices[at_i], type_na)

        # iterate over indices (one per attribute)
        for i, ind in enumerate(indices):
            if not self.example[i] or self.example[i].isSpecial():
                continue
            if ind == 1:
                continue
            if data.domain[i].varType == Orange.core.VarTypes.Discrete and not negative_indices[i] == 1: # DISCRETE attribute
                if self.example:
                    values = [self.example[i]]
                else:
                    values = data.domain[i].values
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_discrete(position=i,
                                                                                  values=[Orange.core.Value(data.domain[i], v)],
                                                                                  acceptSpecial=0))
                    tempRule.complexity += 1
                    tempRule.filter.indices[i] = 1 # 1 stands for a discrete attribute (see RuleCoversArguments.conditionIndex)
                    tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                    if len(tempRule.examples) < len(oldRule.examples):
                        new_rules.append(tempRule)
            elif data.domain[i].varType == Orange.core.VarTypes.Continuous and not negative_indices[i] == 7: # CONTINUOUS attribute
                try:
                    at = data.domain[i]
                    at_d = self.discretizer(at, oldRule.examples)
                except Exception:
                    continue # discretization failed
                # proceed only if discretization produced more than one interval
                if len(at_d.values) > 1:
                    for p in at_d.getValueFrom.transformer.points:
                        # LESS
                        if not negative_indices[i] == 3:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.LessEqual, p, target_class, 3)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] <= p: # and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
                        # GREATER
                        if not negative_indices[i] == 5:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.Greater, p, target_class, 5)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] > p: # and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
        for r in new_rules:
            r.parentRule = oldRule
            r.valuesFilter = r.filter.filter
        return new_rules

    def getTempRule(self, oldRule, pos, oper, ref, target_class, atIndex):
        tempRule = oldRule.clone()

        tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=pos,
                                                                        oper=oper,
                                                                        ref=ref,
                                                                        acceptSpecial=0))
        tempRule.complexity += 1
        tempRule.filter.indices[pos] = operator.or_(tempRule.filter.indices[pos], atIndex) # from RuleCoversArguments.conditionIndex
        tempRule.filterAndStore(oldRule.examples, tempRule.weightID, target_class)
        return tempRule

    def setCondition(self, oldRule, target_class, ci, condition):
        tempRule = oldRule.clone()
        tempRule.filter.conditions[ci] = condition
        tempRule.filter.conditions[ci].setattr("specialized", 1)
        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
        return tempRule

SelectorAdder = deprecated_members({"notAllowedSelectors": "not_allowed_selectors",
                     "argumentID": "argument_id"})(SelectorAdder)

# This filter is ugly because of problems with inheriting deepCopy in Orange;
# it should be revisited.
class ArgFilter(Orange.core.Filter):
    """ This class implements the AB-covering principle. """
    def __init__(self, argument_id=None, filter=Orange.core.Filter_values(), arg_example=None):
        self.filter = filter
        self.indices = getattr(filter, "indices", [])
        if not self.indices and len(filter.conditions) > 0:
            self.indices = RuleCoversArguments.filterIndices(filter)
        self.argument_id = argument_id
        self.domain = self.filter.domain
        self.conditions = filter.conditions
        self.arg_example = arg_example

    def condIn(self, cond): # is the condition in the filter?
        condInd = RuleCoversArguments.conditionIndex(cond)
        return operator.or_(condInd, self.indices[cond.position]) == self.indices[cond.position]

    def __call__(self, example):
        if self.filter(example) and example != self.arg_example:
            return True
        elif self.filter(example):
            try:
                if example[self.argument_id].value and len(example[self.argument_id].value.positiveArguments) > 0: # example has positive arguments
                    # conditions should cover at least one of the positive arguments
                    oneArgCovered = False
                    for pA in example[self.argument_id].value.positiveArguments:
                        argCovered = [self.condIn(c) for c in pA.filter.conditions]
                        oneArgCovered = oneArgCovered or len(argCovered) == sum(argCovered)
                        if oneArgCovered:
                            break
                    if not oneArgCovered:
                        return False
                if example[self.argument_id].value and len(example[self.argument_id].value.negativeArguments) > 0: # example has negative arguments
                    # conditions must not fully cover any of the negative arguments
                    for pN in example[self.argument_id].value.negativeArguments:
                        argCovered = [self.condIn(c) for c in pN.filter.conditions]
                        if len(argCovered) == sum(argCovered):
                            return False
            except Exception:
                return True
            return True
        else:
            return False

    def __setattr__(self, name, obj):
        self.__dict__[name] = obj
        self.filter.setattr(name, obj)

    def deep_copy(self):
        newFilter = ArgFilter(argument_id=self.argument_id)
        newFilter.filter = Orange.core.Filter_values()
        newFilter.filter.conditions = self.filter.conditions[:]
        newFilter.domain = self.filter.domain
        newFilter.negate = self.filter.negate
        newFilter.conjunction = self.filter.conjunction
        newFilter.conditions = newFilter.filter.conditions
        newFilter.indices = self.indices[:]
        return newFilter

ArgFilter = deprecated_members({"argumentID": "argument_id"})(ArgFilter)

class SelectorArgConditions(Orange.core.RuleBeamRefiner):
    """
    A refiner that adds only the allowed (argument-based) selectors;
    refined rules are not consistent with any of the negative arguments.
    """
    def __init__(self, example, allowed_selectors):
        # required values - needed values of attributes
        self.example = example
        self.allowed_selectors = allowed_selectors

    def __call__(self, oldRule, data, weight_id, target_class=-1):
        if len(oldRule.filter.conditions) >= len(self.allowed_selectors):
            return Orange.core.RuleList()
        new_rules = Orange.core.RuleList()
        for c in self.allowed_selectors:
            # normal condition
            if not c.unspecialized_condition:
                tempRule = oldRule.clone()
                tempRule.filter.conditions.append(c)
                tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                if len(tempRule.examples) < len(oldRule.examples):
                    new_rules.append(tempRule)
            # unspecialized condition
            else:
                # find all possible example values
                vals = {}
                for e in oldRule.examples:
                    if not e[c.position].isSpecial():
                        vals[str(e[c.position])] = 1
                values = vals.keys()
                # make a condition for each value
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=c.position,
                                                                                    oper=c.oper,
                                                                                    ref=float(v),
                                                                                    acceptSpecial=0))
                    if tempRule(self.example):
                        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                        if len(tempRule.examples) < len(oldRule.examples):
                            new_rules.append(tempRule)
        for r in new_rules:
            r.parentRule = oldRule
        return new_rules


class CrossValidation:
    def __init__(self, folds=5, random_generator=150):
        self.folds = folds
        self.random_generator = random_generator

    def __call__(self, learner, examples, weight):
        res = orngTest.crossValidation([learner], (examples, weight), folds=self.folds, random_generator=self.random_generator)
        return self.get_prob_from_res(res, examples)

    def get_prob_from_res(self, res, examples):
        prob_dist = Orange.core.DistributionList()
        for tex in res.results:
            d = Orange.core.Distribution(examples.domain.class_var)
            for di in range(len(d)):
                d[di] = tex.probabilities[0][di]
            prob_dist.append(d)
        return prob_dist


class PILAR:
    """
    PILAR (probabilistic improvement of learning algorithms with rules).
    """
    def __init__(self, alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, optimize_betas=True):
        self.alternative_learner = alternative_learner
        self.min_cl_sig = min_cl_sig
        self.min_beta = min_beta
        self.set_prefix_rules = set_prefix_rules
        self.optimize_betas = optimize_betas
        self.selected_evaluation = CrossValidation(folds=5)

    def __call__(self, rules, examples, weight=0):
        rules = self.add_null_rule(rules, examples, weight)
        if self.alternative_learner:
            prob_dist = self.selected_evaluation(self.alternative_learner, examples, weight)
            classifier = self.alternative_learner(examples, weight)
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist)
        else:
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas)

        for ri, r in enumerate(cl.rules):
            cl.rules[ri].setattr("beta", cl.ruleBetas[ri])
        cl.all_rules = cl.rules
        cl.rules = self.sort_rules(cl.rules)
        cl.ruleBetas = [r.beta for r in cl.rules]
        cl.setattr("data", examples)
        return cl

    def add_null_rule(self, rules, examples, weight):
        for cl in examples.domain.class_var:
            tmpRle = Orange.core.Rule()
            tmpRle.filter = Orange.core.Filter_values(domain=examples.domain)
            tmpRle.parentRule = None
            tmpRle.filterAndStore(examples, weight, int(cl))
            tmpRle.quality = tmpRle.class_distribution[int(cl)]/tmpRle.class_distribution.abs
            rules.append(tmpRle)
        return rules

    def sort_rules(self, rules):
        new_rules = Orange.core.RuleList()
        foundRule = True
        while foundRule:
            foundRule = False
            bestRule = None
            for r in rules:
                if r in new_rules:
                    continue
                if -0.01 < r.beta < 0.01:
                    continue
                if not bestRule:
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) < len(bestRule.filter.conditions):
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) == len(bestRule.filter.conditions) and r.beta > bestRule.beta:
                    bestRule = r
                    foundRule = True
                    continue
            if bestRule:
                new_rules.append(bestRule)
        return new_rules

PILAR = deprecated_members({"sortRules": "sort_rules"})(PILAR)
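# A minimal, self-contained sketch (not part of the module) of the ordering
# implemented by PILAR.sort_rules above: rules with |beta| < 0.01 are dropped,
# then the rule with the fewest conditions is repeatedly picked, with ties
# broken by the larger beta.  The tuple representation (n_conditions, beta)
# and the name sort_rules_sketch are hypothetical.
def sort_rules_sketch(rules):
    kept = [r for r in rules if abs(r[1]) >= 0.01]
    return sorted(kept, key=lambda r: (r[0], -r[1]))

assert sort_rules_sketch([(2, 0.5), (1, 0.005), (1, 0.3), (2, 0.9)]) == \
       [(1, 0.3), (2, 0.9), (2, 0.5)]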


class RuleClassifier_bestRule(Orange.core.RuleClassifier):
    """
    A very simple classifier: it takes the best rule for each class and
    normalizes the probabilities.
    """
    def __init__(self, rules, examples, weight_id=0, **argkw):
        self.rules = rules
        self.examples = examples
        self.apriori = Orange.core.Distribution(examples.domain.class_var, examples, weight_id)
        self.apriori_prob = [a/self.apriori.abs for a in self.apriori]
        self.weight_id = weight_id
        self.__dict__.update(argkw)
        self.default_class_index = -1

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, example, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        example = Orange.core.Example(self.examples.domain, example)
        tempDist = Orange.core.Distribution(example.domain.class_var)
        best_rules = [None]*len(example.domain.class_var.values)

        for r in self.rules:
            if r(example) and not self.default_class_index == int(r.classifier.default_val) and \
               (not best_rules[int(r.classifier.default_val)] or r.quality > tempDist[r.classifier.default_val]):
                tempDist[r.classifier.default_val] = r.quality
                best_rules[int(r.classifier.default_val)] = r
        for b in best_rules:
            if b:
                used = getattr(b, "used", 0.0)
                b.setattr("used", used + 1)
        nonCovPriorSum = sum(self.apriori_prob[i] for i in range(len(self.apriori_prob)) if tempDist[i] == 0.)
        if tempDist.abs < 1.:
            # spread the residual probability over the uncovered classes,
            # proportionally to their prior probabilities
            residue = 1. - tempDist.abs
            for a_i, a in enumerate(self.apriori_prob):
                if tempDist[a_i] == 0.:
                    tempDist[a_i] = self.apriori_prob[a_i]*residue/nonCovPriorSum
            final_dist = tempDist
        else:
            tempDist.normalize() # prior probability
            tmp_examples = Orange.core.ExampleTable(self.examples)
            for r in best_rules:
                if r:
                    tmp_examples = r.filter(tmp_examples)
            tmpDist = Orange.core.Distribution(tmp_examples.domain.class_var, tmp_examples, self.weight_id)
            tmpDist.normalize()
            probs = [0.]*len(self.examples.domain.class_var.values)
            for i in range(len(self.examples.domain.class_var.values)):
                probs[i] = tmpDist[i] + tempDist[i]*2
            final_dist = Orange.core.Distribution(self.examples.domain.class_var)
            for cl_i, cl in enumerate(self.examples.domain.class_var):
                final_dist[cl] = probs[cl_i]
            final_dist.normalize()

        if ret_rules: # return the rules along with the classification?
            if result_type == Orange.classification.Classifier.GetValue:
                return (final_dist.modus(), best_rules)
            if result_type == Orange.core.GetProbabilities:
                return (final_dist, best_rules)
            return (final_dist.modus(), final_dist, best_rules)
        if result_type == Orange.classification.Classifier.GetValue:
            return final_dist.modus()
        if result_type == Orange.core.GetProbabilities:
            return final_dist
        return (final_dist.modus(), final_dist)

RuleClassifier_bestRule = deprecated_members({"defaultClassIndex": "default_class_index"})(RuleClassifier_bestRule)
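# A minimal, self-contained sketch (not part of the module) of how
# RuleClassifier_bestRule fills in classes not covered by any rule: the
# residual probability mass is split among the uncovered classes in
# proportion to their prior probabilities.  The name fill_with_prior_sketch
# and the plain-list representation are hypothetical.
def fill_with_prior_sketch(dist, prior):
    residue = 1.0 - sum(dist)
    uncovered = sum(p for d, p in zip(dist, prior) if d == 0.0)
    return [d if d else p*residue/uncovered for d, p in zip(dist, prior)]

assert [round(x, 10) for x in fill_with_prior_sketch([0.6, 0.0, 0.0], [0.5, 0.3, 0.2])] == [0.6, 0.24, 0.16]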