source: orange/Orange/classification/rules.py @ 10219:11f9bd9ce992

Revision 10219:11f9bd9ce992, 109.4 KB checked in by Janez Demšar <janez.demsar@…>, 2 years ago

Orange.classification.rules no longer exports AssociationLearner and Classifier

"""

.. index:: rule induction

.. index::
   single: classification; rule induction

**************************
Rule induction (``rules``)
**************************

This module implements supervised rule induction algorithms
and rule-based classification methods, in particular the
`CN2 induction algorithm <http://www.springerlink.com/content/k6q2v76736w5039r/>`_
in several variants, including an argument-based one.
The implementation is modular: it is built on the rule induction
framework described below, whose individual components can be changed,
specialized and improved.

CN2 algorithm
=============

.. index::
   single: classification; CN2

Several variations of the well-known CN2 rule learning algorithm are
implemented. All are implemented by wrapping the
:class:`~Orange.classification.rules.RuleLearner` class. Each CN2 learner class
in this module changes some of RuleLearner's replaceable components to achieve
the desired behavior.

Usage is consistent with typical learner usage in Orange:

:download:`rules-cn2.py <code/rules-cn2.py>`

.. literalinclude:: code/rules-cn2.py
    :lines: 7-

The result::

    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
    IF status=['second'] THEN survived=yes<13.000, 80.000>
    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF status=['crew'] THEN survived=no<670.000, 192.000>
    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF sex=['male'] THEN survived=no<118.000, 57.000>
    IF age=['child'] THEN survived=no<17.000, 14.000>
    IF TRUE THEN survived=no<89.000, 76.000>

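At classification time this ordered rule list is applied top-down: the first
rule whose condition matches the instance makes the prediction. A minimal
plain-Python sketch of the first-match principle (the rule representation
below is a hypothetical stand-in, not the Orange API):

```python
# First-match classification over an ordered rule list, as produced by CN2.
# Each rule is (conditions, prediction); conditions is a dict of required
# attribute values, and an empty dict plays the role of "IF TRUE".

def classify(ordered_rules, instance):
    for conditions, prediction in ordered_rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return prediction  # first matching rule wins
    return None  # unreachable if the last rule is the default "IF TRUE" rule

rules = [
    ({"sex": "female", "status": "first"}, "yes"),
    ({"status": "crew"}, "no"),
    ({}, "no"),  # default rule
]

print(classify(rules, {"sex": "female", "status": "first", "age": "adult"}))  # prints yes
print(classify(rules, {"sex": "male", "status": "crew", "age": "adult"}))     # prints no
```

Unordered CN2 variants instead collect *all* matching rules and combine their
class distributions, which is why rule order matters only for the ordered learners.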
.. autoclass:: Orange.classification.rules.CN2Learner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2Classifier
   :members:
   :show-inheritance:
   :exclude-members: beamWidth, resultType

.. index:: unordered CN2

.. index::
   single: classification; unordered CN2

.. autoclass:: Orange.classification.rules.CN2UnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier
   :members:
   :show-inheritance:

.. index:: CN2-SD
.. index:: subgroup discovery

.. index::
   single: classification; CN2-SD

.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner
   :members:
   :show-inheritance:

References
----------

* Clark, Niblett. `The CN2 Induction Algorithm
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine
  Learning 3(4):261--284, 1989.
* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements
  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In
  Machine Learning - EWSL-91. Proceedings of the European Working Session on
  Learning, pp 151--163, Porto, Portugal, March 1991.
* Lavrac, Kavsek, Flach, Todorovski. `Subgroup Discovery with CN2-SD
  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal
  of Machine Learning Research 5:153--188, 2004.


Argument based CN2
==================

Orange also supports argument-based CN2 learning.

.. autoclass:: Orange.classification.rules.ABCN2
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID,
      argument_id

   This class has many more undocumented methods; see the source code for
   reference.

.. autoclass:: Orange.classification.rules.ABCN2Ordered
   :members:
   :show-inheritance:

.. autoclass:: Orange.classification.rules.ABCN2M
   :members:
   :show-inheritance:
   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
      ruleFinder, ruleStopping, storeInstances, targetClass, weightID

This module has many more undocumented classes related to argument-based
learning; see the source code for reference.

References
----------

* Bratko, Mozina, Zabkar. `Argument-Based Machine Learning
  <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in
  Computer Science: vol. 4203/2006, 11-17, 2006.


Rule induction framework
========================

A general framework of classes supports the described CN2 implementation and
can in fact be fine-tuned to specific needs by replacing individual components.
Here is a simple example; the detailed architecture is described in the
documentation of the classes that follow:

part of :download:`rules-customized.py <code/rules-customized.py>`

.. literalinclude:: code/rules-customized.py
    :lines: 7-17

This example sets the rule evaluation function to the m-estimate of
probability with m=50. The result is::

    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
    IF status=['second'] THEN survived=yes<13.000, 104.000>
    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
    IF status=['crew'] THEN survived=yes<3.000, 20.000>
    IF sex=['female'] THEN survived=no<106.000, 90.000>
    IF TRUE THEN survived=yes<0.000, 5.000>

Notice that it is first necessary to set the :obj:`rule_finder` component,
because the default components are not constructed when the learner is
constructed, but only when it is run on data. At that time, the algorithm
checks which components are needed and sets the defaults. Similarly, when the
learner finishes, it destroys all *default* components. Continuing with our
example, assume that we wish to set a different validation function and a
different beam width. This is simply written as:

part of :download:`rules-customized.py <code/rules-customized.py>`

.. literalinclude:: code/rules-customized.py
    :lines: 19-23

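The customization pattern above can be illustrated without Orange at all: a
learner object holds its replaceable components as plain attributes, and
customizing it means assigning a different object or callable before the
learner is run. All names in this sketch are hypothetical stand-ins for the
framework's components:

```python
# Sketch of the component-based design used by the rule induction framework:
# behavior is changed by assigning new components, not by subclassing.
# ToyRuleFinder/ToyLearner and both evaluators are illustrative stand-ins.

class ToyRuleFinder:
    def __init__(self):
        self.evaluator = None   # a default would be filled in when learning starts
        self.beam_width = 5

class ToyLearner:
    def __init__(self):
        self.rule_finder = ToyRuleFinder()

def laplace(positive, total):
    # default-style evaluator: Laplace estimate for two classes
    return (positive + 1.0) / (total + 2.0)

def m_estimate_50(positive, total, prior=0.5, m=50):
    # custom evaluator: m-estimate of probability with m=50
    return (positive + m * prior) / (total + m)

learner = ToyLearner()
learner.rule_finder.evaluator = m_estimate_50   # swap in the custom evaluator
learner.rule_finder.beam_width = 10             # and a different beam width
print(learner.rule_finder.evaluator(45, 50))    # prints 0.7
```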
.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)

   Representation of a single induced rule.

   The parameters that can be passed to the constructor correspond to the
   first seven attributes. All attributes are:

   .. attribute:: filter

      contents of the rule; this is the basis of the Rule class. Must be of
      type :class:`Orange.core.Filter`; an instance of
      :class:`Orange.core.Filter_values` is set as a default.

   .. attribute:: classifier

      each rule can be used as an ordinary Orange classifier. Must be of type
      :class:`Orange.classification.Classifier`. By default, an instance of
      :class:`Orange.classification.ConstantClassifier` is used.

   .. attribute:: learner

      learner to be used for making a classifier. Must be of type
      :class:`Orange.classification.Learner`. By default,
      :class:`Orange.classification.majority.MajorityLearner` is used.

   .. attribute:: class_distribution

      class distribution of the data instances covered by this rule
      (:class:`Orange.statistics.distribution.Distribution`).

   .. attribute:: examples

      data instances covered by this rule (:class:`Orange.data.Table`).

   .. attribute:: weight_id

      ID of the weight meta-attribute for the stored data instances (int).

   .. attribute:: quality

      quality of the rule. Rules with higher quality are better (float).

   .. attribute:: complexity

      complexity of the rule (float). Complexity is used for selecting
      between rules of equal quality; rules with lower complexity are
      preferred. Typically, complexity corresponds to the number of selectors
      in the rule (actually to the number of conditions in the filter), but
      any other measure can be used.

   .. method:: filter_and_store(instances, weight_id=0, target_class=-1)

      Filter the passed data instances and store them in the attribute
      :obj:`examples`. Also compute :obj:`class_distribution`, set the weight
      of stored examples and create a new classifier using the :obj:`learner`
      attribute.

      :param weight_id: ID of the weight meta-attribute.
      :type weight_id: int
      :param target_class: index of the target class; -1 for all.
      :type target_class: int

   Objects of this class can be invoked:

   .. method:: __call__(instance, instances, weight_id=0, target_class=-1)

      This method can be invoked in two ways. The first is to pass only a
      data instance; the Rule object then returns True if the given instance
      is covered by the rule's filter.

      :param instance: data instance.
      :type instance: :class:`Orange.data.Instance`

      The second is to pass a table of data instances, in which case a table
      of the instances covered by this rule is returned.

      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`
      :param ref: TODO
      :type ref: bool
      :param negate: if set to True, the result is inverted: the resulting
          table contains instances *not* covered by the rule.
      :type negate: bool

.. py:class:: Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList())

   Bases: :class:`Orange.classification.Learner`

   A base rule induction learner. The algorithm follows the
   separate-and-conquer strategy, which has its origins in the AQ family of
   algorithms (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial
   Intelligence Review 13, 3-54, 1999). Basically, such algorithms search for
   the "best" possible rule in the learning instances, remove the covered
   instances (separate) and repeat the process (conquer) on the remaining
   instances.

   The class' functionality is best explained by its __call__
   function:

   .. parsed-literal::

      def \_\_call\_\_(self, instances, weight_id=0):
          rule_list = Orange.classification.rules.RuleList()
          all_instances = Orange.data.Table(instances)
          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class):
              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class,
                                        self.base_rules)
              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id):
                  break
              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances,
                                                      weight_id, self.target_class)
              rule_list.append(new_rule)
          return Orange.classification.rules.RuleClassifier_FirstRule(
              rules=rule_list, instances=all_instances)

   The four customizable components here are the invoked :obj:`data_stopping`,
   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping`
   objects. By default, components of the original CN2 algorithm are used,
   but this can be changed by modifying those attributes:

   .. attribute:: data_stopping

      an object of class
      :class:`~Orange.classification.rules.RuleDataStoppingCriteria`
      that determines whether there will be any benefit from further learning
      (i.e. whether there is enough data to continue learning). The default
      implementation
      (:class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives`)
      returns True if there are no more instances of the given class.

   .. attribute:: rule_stopping

      an object of class
      :class:`~Orange.classification.rules.RuleStoppingCriteria`
      that decides, based on the last learned rule, whether it is worthwhile
      to use the rule and to learn more rules. By default, no rule stopping
      criterion is used (:obj:`rule_stopping` == :obj:`None`), so all rules
      are accepted.

   .. attribute:: cover_and_remove

      an object of
      :class:`RuleCovererAndRemover` that removes
      instances covered by the rule and returns the remaining instances. The
      default implementation
      (:class:`RuleCovererAndRemover_Default`)
      removes only the instances that belong to the given target class, unless
      the target class is not given (i.e. :obj:`target_class` == -1).

   .. attribute:: rule_finder

      an object of class
      :class:`~Orange.classification.rules.RuleFinder` that learns a single
      rule from instances. Default implementation is
      :class:`~Orange.classification.rules.RuleBeamFinder`.

   :param store_instances: if set to True, the rules will have data instances
       stored.
   :type store_instances: bool

   :param target_class: index of a specific class being learned; -1 for all.
   :type target_class: int

   :param base_rules: Rules that we would like to use in :obj:`rule_finder` to
       constrain the learning space. If not set, it will be set to a set
       containing only an empty rule.
   :type base_rules: :class:`~Orange.classification.rules.RuleList`

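The separate-and-conquer loop from the pseudo-code above can be mimicked in a
few lines of plain Python on a toy dataset. The rule representation, the data
format and the greedy single-condition "rule finder" below are deliberately
simplistic stand-ins, not the Orange API:

```python
# Toy separate-and-conquer loop mirroring RuleLearner.__call__: find the best
# rule, remove the instances it covers, repeat while target-class instances
# remain (the data stopping criterion).

def find_rule(rows, target):
    # stand-in rule finder: pick the attribute=value condition with the
    # highest purity for the target class among covered rows
    best, best_score = None, -1.0
    for row in rows:
        for attr, value in row.items():
            if attr == "class":
                continue
            covered = [r for r in rows if r[attr] == value]
            score = sum(r["class"] == target for r in covered) / len(covered)
            if score > best_score:
                best, best_score = (attr, value), score
    return best

def separate_and_conquer(rows, target):
    rules = []
    while any(r["class"] == target for r in rows):    # data stopping
        attr, value = find_rule(rows, target)          # rule finder
        rules.append((attr, value))
        rows = [r for r in rows if r[attr] != value]   # cover and remove
    return rules

data = [
    {"sex": "female", "status": "first", "class": "yes"},
    {"sex": "female", "status": "third", "class": "yes"},
    {"sex": "male", "status": "first", "class": "no"},
    {"sex": "male", "status": "third", "class": "no"},
]
print(separate_and_conquer(data, "yes"))  # prints [('sex', 'female')]
```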
Rule finders
------------

.. class:: Orange.classification.rules.RuleFinder

   Base class for all rule finders. These are used to learn a single rule from
   instances.

   Rule finders are invokable in the following manner:

   .. method:: __call__(table, weight_id, target_class, base_rules)

      Return a new rule, induced from instances in the given table.

      :param table: data instances to learn from.
      :type table: :class:`Orange.data.Table`

      :param weight_id: ID of the weight meta-attribute for the stored data
          instances.
      :type weight_id: int

      :param target_class: index of a specific class being learned; -1 for all.
      :type target_class: int

      :param base_rules: Rules that we would like to use in :obj:`rule_finder`
          to constrain the learning space. If not set, it will be set to a set
          containing only an empty rule.
      :type base_rules: :class:`~Orange.classification.rules.RuleList`

.. class:: Orange.classification.rules.RuleBeamFinder

   Bases: :class:`~Orange.classification.rules.RuleFinder`

   Beam search for the best rule. This is the default class used in RuleLearner
   to find the best rule. Pseudo code of the algorithm is shown here:

   .. parsed-literal::

      def \_\_call\_\_(self, table, weight_id, target_class, base_rules):
          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior)
          \# compute quality of rules in rules_star and best_rule
          ...
          while len(rules_star) \> 0:
              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id)
              for cand in candidates:
                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class)
                  for new_rule in new_rules:
                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution):
                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior)
                          rules_star.append(new_rule)
                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and
                              new_rule.quality \> best_rule.quality:
                              best_rule = new_rule
              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id)
          return best_rule

   Bolded in the pseudo-code are several exchangeable components, exposed as
   attributes. These are:

   .. attribute:: initializer

      an object of class
      :class:`~Orange.classification.rules.RuleBeamInitializer`
      used to initialize :obj:`rules_star` and to select the
      initial best rule. By default
      (:class:`~Orange.classification.rules.RuleBeamInitializer_Default`),
      :obj:`base_rules` are returned as the starting :obj:`rules_star` and the
      best rule from :obj:`base_rules` is set as :obj:`best_rule`. If
      :obj:`base_rules` are not set, this class will return :obj:`rules_star`
      with a rule that covers all instances (has no selectors), and this rule
      will also be used as :obj:`best_rule`.

   .. attribute:: candidate_selector

      an object of class
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector`
      used to separate a subset of rules from the current
      :obj:`rules_star` and return it. These rules will be used in the next
      specialization step. The default component (an instance of
      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`)
      takes all rules in :obj:`rules_star`.

   .. attribute:: refiner

      an object of class
      :class:`~Orange.classification.rules.RuleBeamRefiner`
      used to refine a given rule. The new rule should cover a
      strict subset of the examples covered by the given rule. The default
      component (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`)
      adds a conjunctive selector to the selectors already present in the rule.

   .. attribute:: rule_filter

      an object of class
      :class:`~Orange.classification.rules.RuleBeamFilter`
      used to filter rules, keeping the beam relatively small
      to limit search complexity. By default, it keeps the five best rules:
      :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ *(m=5)*\ .

   .. method:: __call__(data, weight_id, target_class, base_rules)

   Determines the next best rule to cover the remaining data instances.

   :param data: data instances.
   :type data: :class:`Orange.data.Table`

   :param weight_id: index of the weight meta-attribute.
   :type weight_id: int

   :param target_class: index of the target class.
   :type target_class: int

   :param base_rules: existing rules.
   :type base_rules: :class:`~Orange.classification.rules.RuleList`

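One specialization step of this beam search can be sketched in plain Python:
every candidate is refined by one extra condition, the refinements are scored,
and only the best ``width`` rules survive (the job of :obj:`rule_filter`). The
rule representation and the dummy evaluator below are illustrative stand-ins:

```python
# One specialization step of beam search over rules, in the spirit of
# RuleBeamFinder. Rules are frozensets of (attribute, value) conditions;
# the empty frozenset is the rule that covers everything.

def refine(rule, attributes):
    # add one conjunctive selector for every attribute not yet used
    used = {attr for attr, _ in rule}
    return [rule | {(attr, value)}
            for attr, values in attributes.items() if attr not in used
            for value in values]

def beam_step(rules_star, attributes, evaluator, width=5):
    new_rules = [new for rule in rules_star for new in refine(rule, attributes)]
    new_rules.sort(key=evaluator, reverse=True)   # best rules first
    return new_rules[:width]                       # the rule_filter's job

attributes = {"sex": ["male", "female"], "age": ["child", "adult"]}
evaluator = len   # dummy quality measure: prefer more specific rules
beam = beam_step([frozenset()], attributes, evaluator, width=2)
print(beam)
```

A real run would iterate ``beam_step`` until the beam empties, keeping the
best rule seen so far, as in the pseudo-code above.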
Rule evaluators
---------------

.. class:: Orange.classification.rules.RuleEvaluator

   Base class for rule evaluators that evaluate the quality of the rule based
   on covered data instances. All evaluators support being invoked in the
   following manner:

   .. method:: __call__(rule, instances, weight_id, target_class, prior)

      Calculates a non-negative rule quality.

      :param rule: rule to evaluate.
      :type rule: :class:`~Orange.classification.rules.Rule`

      :param instances: a table of instances, covered by the rule.
      :type instances: :class:`Orange.data.Table`

      :param weight_id: index of the weight meta-attribute.
      :type weight_id: int

      :param target_class: index of target class of this rule.
      :type target_class: int

      :param prior: prior class distribution.
      :type prior: :class:`Orange.statistics.distribution.Distribution`

.. autoclass:: Orange.classification.rules.LaplaceEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID

.. autoclass:: Orange.classification.rules.WRACCEvaluator
   :members:
   :show-inheritance:
   :exclude-members: targetClass, weightID

.. class:: Orange.classification.rules.RuleEvaluator_Entropy

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_LRS

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_Laplace

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

.. class:: Orange.classification.rules.RuleEvaluator_mEVC

   Bases: :class:`~Orange.classification.rules.RuleEvaluator`

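The measures behind several of these evaluators are simple closed-form
expressions. The plain-Python functions below show simplified two-class
versions (``p`` covered positives, ``n`` covered instances, ``prior`` the
overall positive share, ``N`` the dataset size); they illustrate the formulas
and are not the exact Orange implementations:

```python
# Simplified two-class rule quality measures.

def laplace(p, n):
    # Laplace's rule of succession for two classes
    return (p + 1.0) / (n + 2.0)

def m_estimate(p, n, prior, m=2):
    # pulls the observed precision p/n towards the prior
    return (p + m * prior) / (n + m)

def wracc(p, n, prior, N):
    # weighted relative accuracy: coverage times precision gain over the prior
    return (n / N) * (p / n - prior)

print(laplace(8, 10))             # prints 0.75
print(m_estimate(8, 10, 0.5, 2))  # prints 0.75
print(wracc(8, 10, 0.5, 100))     # ~0.03
```

Laplace and the m-estimate reward precise rules, while WRACC trades precision
against coverage, which is why it is the natural choice for subgroup discovery.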
Instance covering and removal
-----------------------------

.. class:: RuleCovererAndRemover

   Base class for rule coverers and removers that, when invoked, remove
   instances covered by the rule and return the remaining instances.

   .. method:: __call__(rule, instances, weights, target_class)

      Removes the instances covered by the rule and returns the remaining
      instances.

      :param rule: the rule whose covered instances are removed.
      :type rule: :class:`~Orange.classification.rules.Rule`

      :param instances: a table of data instances.
      :type instances: :class:`Orange.data.Table`

      :param weights: index of the weight meta-attribute.
      :type weights: int

      :param target_class: index of the target class of this rule.
      :type target_class: int

.. autoclass:: CovererAndRemover_MultWeights

.. autoclass:: CovererAndRemover_AddWeights

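Weighted covering, as used in CN2-SD, keeps covered instances but reduces
their weight; multiplicative weighting scales the weights of covered
instances by a constant factor. A sketch of the idea, where the
``rule_covers`` predicate and the instance format are hypothetical stand-ins:

```python
# Sketch of weighted covering in the spirit of CovererAndRemover_MultWeights:
# instead of deleting covered instances, their weights are multiplied by a
# factor, so later rules can still use them, only with less influence.

def mult_weights_cover(rule_covers, instances, weights, mult=0.7):
    return [w * mult if rule_covers(inst) else w
            for inst, w in zip(instances, weights)]

instances = [{"status": "crew"}, {"status": "first"}, {"status": "crew"}]
weights = [1.0, 1.0, 1.0]

def covers(inst):
    return inst["status"] == "crew"   # a toy rule: IF status=crew

weights = mult_weights_cover(covers, instances, weights)
print(weights)  # prints [0.7, 1.0, 0.7]
```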
Miscellaneous functions
-----------------------

.. automethod:: Orange.classification.rules.rule_to_string

..
    Undocumented are:
    Data-based Stopping Criteria
    ----------------------------
    Rule-based Stopping Criteria
    ----------------------------

"""

import random
import math
import operator
import numpy

import Orange
import Orange.core
from Orange.core import \
    RuleClassifier, \
    RuleClassifier_firstRule, \
    RuleClassifier_logit, \
    RuleLearner, \
    Rule, \
    RuleBeamCandidateSelector, \
    RuleBeamCandidateSelector_TakeAll, \
    RuleBeamFilter, \
    RuleBeamFilter_Width, \
    RuleBeamInitializer, \
    RuleBeamInitializer_Default, \
    RuleBeamRefiner, \
    RuleBeamRefiner_Selector, \
    RuleClassifierConstructor, \
    RuleCovererAndRemover, \
    RuleCovererAndRemover_Default, \
    RuleDataStoppingCriteria, \
    RuleDataStoppingCriteria_NoPositives, \
    RuleEvaluator, \
    RuleEvaluator_Entropy, \
    RuleEvaluator_LRS, \
    RuleEvaluator_Laplace, \
    RuleEvaluator_mEVC, \
    RuleFinder, \
    RuleBeamFinder, \
    RuleList, \
    RuleStoppingCriteria, \
    RuleStoppingCriteria_NegativeDistribution, \
    RuleValidator, \
    RuleValidator_LRS
from Orange.misc import deprecated_keywords
from Orange.misc import deprecated_members


class ConvertClass:
    """ Convert a class variable into a dichotomous class variable. """
    def __init__(self, classAtt, classValue, newClassAtt):
        self.classAtt = classAtt
        self.classValue = classValue
        self.newClassAtt = newClassAtt

    def __call__(self, example, returnWhat):
        if example[self.classAtt] == self.classValue:
            return Orange.data.Value(self.newClassAtt, self.classValue + "_")
        else:
            return Orange.data.Value(self.newClassAtt, "not " + self.classValue)


def create_dichotomous_class(domain, att, value, negate, removeAtt=None):
    # create new variable
    newClass = Orange.feature.Discrete(att.name + "_", values=[str(value) + "_", "not " + str(value)])
    positive = Orange.data.Value(newClass, str(value) + "_")
    negative = Orange.data.Value(newClass, "not " + str(value))
    newClass.getValueFrom = ConvertClass(att, str(value), newClass)

    att = [a for a in domain.attributes]
    newDomain = Orange.data.Domain(att + [newClass])
    newDomain.addmetas(domain.getmetas())
    if negate == 1:
        return (newDomain, negative)
    else:
        return (newDomain, positive)


class LaplaceEvaluator(RuleEvaluator):
    """
    Laplace's rule of succession.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class > -1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class > -1:
            return (rule.class_distribution[target_class] + 1) / (sumDist + 2)
        else:
            return (max(rule.class_distribution) + 1) / (sumDist + len(data.domain.class_var.values))

LaplaceEvaluator = deprecated_members({"weightID": "weight_id",
                                       "targetClass": "target_class"})(LaplaceEvaluator)


class WRACCEvaluator(RuleEvaluator):
    """
    Weighted relative accuracy.
    """
    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.cases
        if not sumDist or (target_class > -1 and not rule.class_distribution[target_class]):
            return 0.
        # get distribution
        if target_class > -1:
            pRule = rule.class_distribution[target_class] / apriori[target_class]
            pTruePositive = rule.class_distribution[target_class] / sumDist
            pClass = apriori[target_class] / apriori.cases
        else:
            pRule = sumDist / apriori.cases
            pTruePositive = max(rule.class_distribution) / sumDist
            pClass = apriori[rule.class_distribution.modus()] / sum(apriori)
        if pTruePositive > pClass:
            return pRule * (pTruePositive - pClass)
        else:
            return (pTruePositive - pClass) / max(pRule, 1e-6)

WRACCEvaluator = deprecated_members({"weightID": "weight_id",
                                     "targetClass": "target_class"})(WRACCEvaluator)


class MEstimateEvaluator(RuleEvaluator):
    """
    Rule evaluator using the m-estimate of probability.

    :param m: m-value for the m-estimate
    :type m: int

    """
    def __init__(self, m=2):
        self.m = m

    def __call__(self, rule, data, weight_id, target_class, apriori):
        if not rule.class_distribution:
            return 0.
        sumDist = rule.class_distribution.abs
        if self.m == 0 and not sumDist:
            return 0.
        # get distribution
        if target_class > -1:
            p = rule.class_distribution[target_class] + self.m * apriori[target_class] / apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        else:
            p = max(rule.class_distribution) + self.m * \
                apriori[rule.class_distribution.modus()] / apriori.abs
            p = p / (rule.class_distribution.abs + self.m)
        return p

MEstimateEvaluator = deprecated_members({"weightID": "weight_id",
                                         "targetClass": "target_class"})(MEstimateEvaluator)


class CN2Learner(RuleLearner):
    """
    Classical CN2 (see Clark and Niblett, 1989) induces a set of ordered
    rules, which means that the classifier must try these rules in the same
    order as they were learned.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, entropy is used as a measure.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
    :param beam_width: width of the search beam.
    :type beam_width: int
    :param alpha: significance level of the likelihood ratio statistics to
        determine whether the rule is better than the default rule.
    :type alpha: float

    """

    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = RuleLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=RuleEvaluator_Entropy(), beam_width=5,
        alpha=1.0, **kwds):
        self.__dict__.update(kwds)
        self.rule_finder = RuleBeamFinder()
        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width)
        self.rule_finder.evaluator = evaluator
        self.rule_finder.validator = RuleValidator_LRS(alpha=alpha)

    def __call__(self, instances, weight=0):
        supervisedClassCheck(instances)

        cl = RuleLearner.__call__(self, instances, weight)
        rules = cl.rules
        return CN2Classifier(rules, instances, weight)

CN2Learner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2Learner)


class CN2Classifier(RuleClassifier):
    """
    Classical CN2 (see Clark and Niblett, 1989) classifies a new instance
    using an ordered set of rules. Usually the learner
    (:class:`~Orange.classification.rules.CN2Learner`) is used to construct the
    classifier.

    :param rules: learned rules to be used for classification (mandatory).
    :type rules: :class:`~Orange.classification.rules.RuleList`

    :param instances: data instances that were used for learning.
    :type instances: :class:`Orange.data.Table`

    :param weight_id: ID of the weight meta-attribute.
    :type weight_id: int

    """
789
790    @deprecated_keywords({"examples": "instances"})
791    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
792        self.rules = rules
793        self.examples = instances
794        self.weight_id = weight_id
795        self.class_var = None if instances is None else instances.domain.class_var
796        self.__dict__.update(argkw)
797        if instances is not None:
798            self.prior = Orange.statistics.distribution.Distribution(instances.domain.class_var, instances)
799
800    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
801        """
802        :param instance: instance to be classified.
803        :type instance: :class:`Orange.data.Instance`
804       
805        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
806              :class:`Orange.classification.Classifier.GetProbabilities` or
807              :class:`Orange.classification.Classifier.GetBoth`
808       
809        :rtype: :class:`Orange.data.Value`,
810              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
811        """
812        classifier = None
813        for r in self.rules:
814         #   r.filter.domain = instance.domain
815            if r(instance) and r.classifier:
816                classifier = r.classifier
817                classifier.defaultDistribution = r.class_distribution
818                break
819        if not classifier:
820            classifier = Orange.classification.ConstantClassifier(instance.domain.class_var, \
821                self.prior.modus())
822            classifier.defaultDistribution = self.prior
823
824        classifier.defaultDistribution.normalize()
825        if result_type == Orange.classification.Classifier.GetValue:
826          return classifier(instance)
827        if result_type == Orange.classification.Classifier.GetProbabilities:
828          return classifier.default_distribution
829        return (classifier(instance), classifier.default_distribution)
830
831    def __str__(self):
832        ret_str = rule_to_string(self.rules[0]) + " " + str(self.rules[0].\
833            class_distribution) + "\n"
834        for r in self.rules[1:]:
835            ret_str += "ELSE " + rule_to_string(r) + " " + str(r.class_distribution) + "\n"
836        return ret_str
837
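# The ordered "first rule wins" strategy implemented by the classifier above
# can be sketched in a few lines of plain Python. This is an illustrative
# toy, independent of Orange: a rule is represented as a (predicate,
# predicted_class) pair, and the apriori majority class is the fallback.

```python
def classify_ordered(rule_list, instance, prior_majority):
    # Scan rules in order; the first one whose predicate covers
    # the instance decides the class.
    for predicate, predicted_class in rule_list:
        if predicate(instance):
            return predicted_class
    # No rule fired: fall back to the apriori majority class.
    return prior_majority
```

# With the Titanic-style rules from the module docstring, a female
# second-class passenger is classified by the first matching rule,
# even though a later rule would also cover her.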
CN2Classifier = deprecated_members({"resultType": "result_type",
                                    "beamWidth": "beam_width"})(CN2Classifier)


class CN2UnorderedLearner(RuleLearner):
    """
    CN2 unordered (see Clark and Boswell; 1991) induces a set of unordered
    rules - classification with such rules does not assume any ordering.
    Rule learning is similar to classical CN2, except that rules are
    learned for each class separately.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, Laplace's rule of succession is used as a measure.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
    :param beam_width: width of the search beam.
    :type beam_width: int
    :param alpha: significance level of the likelihood ratio statistic used to
        determine whether a rule is better than the default rule.
    :type alpha: float
    """
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = RuleLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=RuleEvaluator_Laplace(), beam_width=5,
        alpha=1.0, **kwds):
        self.__dict__.update(kwds)
        self.rule_finder = RuleBeamFinder()
        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width)
        self.rule_finder.evaluator = evaluator
        self.rule_finder.validator = RuleValidator_LRS(alpha=alpha)
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0)
        self.rule_stopping = RuleStopping_Apriori()
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()

    @deprecated_keywords({"weight": "weight_id"})
    def __call__(self, instances, weight_id=0):
        supervisedClassCheck(instances)

        rules = RuleList()
        self.rule_stopping.apriori = Orange.statistics.distribution.Distribution(
            instances.domain.class_var, instances)
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                instances.domain.class_var, instances, weight_id)
            distrib.normalize()
        # learn rules for each class value separately
        for target_class in instances.domain.class_var:
            if progress:
                progress.start = progress.end
                progress.end += distrib[target_class]
            self.target_class = target_class
            cl = RuleLearner.__call__(self, instances, weight_id)
            for r in cl.rules:
                rules.append(r)
        if progress:
            progress(1.0, None)
        return CN2UnorderedClassifier(rules, instances, weight_id)

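# The unordered learner's default evaluator is Laplace's rule of succession:
# the estimated probability of the target class under a rule covering n_t
# target instances out of N, with k classes, is (n_t + 1) / (N + k). A
# minimal sketch of that estimate (an illustration, not Orange's
# RuleEvaluator_Laplace):

```python
def laplace_estimate(target_count, total_count, n_classes):
    # Laplace's rule of succession: (n_t + 1) / (N + k). An empty rule
    # is pulled toward the uniform estimate 1/k rather than 0/0.
    return (target_count + 1.0) / (total_count + n_classes)
```

# A rule covering nothing scores 1/k, so overly specific rules with
# tiny coverage are penalized relative to their raw accuracy.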
CN2UnorderedLearner = deprecated_members({"beamWidth": "beam_width",
                     "ruleFinder": "rule_finder",
                     "ruleStopping": "rule_stopping",
                     "dataStopping": "data_stopping",
                     "coverAndRemove": "cover_and_remove",
                     "storeInstances": "store_instances",
                     "targetClass": "target_class",
                     "baseRules": "base_rules",
                     "weightID": "weight_id"})(CN2UnorderedLearner)


class CN2UnorderedClassifier(RuleClassifier):
    """
    CN2 unordered (see Clark and Boswell; 1991) classifies a new instance using
    a set of unordered rules. Usually the learner
    (:class:`~Orange.classification.rules.CN2UnorderedLearner`) is used to
    construct the classifier.

    :param rules: learned rules to be used for classification (mandatory).
    :type rules: :class:`~Orange.classification.rules.RuleList`

    :param instances: data instances that were used for learning.
    :type instances: :class:`Orange.data.Table`

    :param weight_id: ID of the weight meta-attribute.
    :type weight_id: int

    """

    @deprecated_keywords({"examples": "instances"})
    def __init__(self, rules=None, instances=None, weight_id=0, **argkw):
        self.rules = rules
        self.examples = instances
        self.weight_id = weight_id
        self.class_var = instances.domain.class_var if instances is not None else None
        self.__dict__.update(argkw)
        if instances is not None:
            self.prior = Orange.statistics.distribution.Distribution(
                                instances.domain.class_var, instances)

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        """
        :param instance: instance to be classified.
        :type instance: :class:`Orange.data.Instance`

        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
        """
        def add(disc1, disc2, sumd):
            disc = Orange.statistics.distribution.Discrete(disc1)
            sumdisc = sumd
            for i, d in enumerate(disc):
                disc[i] += disc2[i]
                sumdisc += disc2[i]
            return disc, sumdisc

        # start with an empty distribution
        retDist = Orange.statistics.distribution.Discrete(self.examples.domain.class_var)
        covRules = RuleList()
        # iterate through rules - sum distributions of those covering the instance
        sumdisc = 0.
        for r in self.rules:
            if r(instance) and r.class_distribution:
                retDist, sumdisc = add(retDist, r.class_distribution, sumdisc)
                covRules.append(r)
        if not sumdisc:
            # no rule covers the instance: fall back to the apriori distribution
            retDist = self.prior
            sumdisc = self.prior.abs

        if sumdisc > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdisc
        else:
            retDist.normalize()

        if ret_rules:
            if result_type == Orange.classification.Classifier.GetValue:
                return (retDist.modus(), covRules)
            if result_type == Orange.classification.Classifier.GetProbabilities:
                return (retDist, covRules)
            return (retDist.modus(), retDist, covRules)
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(), retDist)

    def __str__(self):
        retStr = ""
        for r in self.rules:
            retStr += rule_to_string(r) + " " + str(r.class_distribution) + "\n"
        return retStr

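# The unordered classifier above sums the class distributions of all covering
# rules, normalizes the sum, and falls back to the prior when no rule covers
# the instance. The same idea in self-contained form (an illustrative sketch,
# with rules represented as (predicate, {class: count}) pairs, not Orange's
# data structures):

```python
def classify_unordered(rules, instance, prior):
    # Sum class distributions of every rule that covers the instance.
    totals = dict.fromkeys(prior, 0.0)
    for predicate, distribution in rules:
        if predicate(instance):
            for cls, count in distribution.items():
                totals[cls] += count
    mass = sum(totals.values())
    if mass == 0.0:
        # No covering rule: fall back to the apriori distribution.
        totals, mass = dict(prior), float(sum(prior.values()))
    # Normalize to a probability distribution over classes.
    return dict((cls, count / mass) for cls, count in totals.items())
```

# Unlike the ordered classifier, every covering rule contributes, so two
# conflicting rules of equal strength yield a 50/50 distribution.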

class CN2SDUnorderedLearner(CN2UnorderedLearner):
    """
    CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules, which
    is the same as :class:`~Orange.classification.rules.CN2UnorderedLearner`.
    The difference between classical CN2 unordered and CN2-SD is the choice of
    evaluation and covering functions:
    :class:`WRACCEvaluator` is used to implement
    weighted relative accuracy, and
    :class:`CovererAndRemover_MultWeights` does not exclude covered
    instances, but rather multiplies their weight by the value of the
    mult parameter.

    If data instances are provided to the constructor, the learning algorithm
    is called and the resulting classifier is returned instead of the learner.

    :param evaluator: an object that evaluates a rule from covered instances.
        By default, weighted relative accuracy is used.
    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`

    :param beam_width: width of the search beam.
    :type beam_width: int

    :param alpha: significance level of the likelihood ratio statistic used to
        determine whether a rule is better than the default rule.
    :type alpha: float

    :param mult: multiplier for weights of covered instances.
    :type mult: float
    """
    def __new__(cls, instances=None, weight_id=0, **kwargs):
        self = CN2UnorderedLearner.__new__(cls, **kwargs)
        if instances is not None:
            self.__init__(**kwargs)
            return self.__call__(instances, weight_id)
        else:
            return self

    def __init__(self, evaluator=WRACCEvaluator(), beam_width=5,
                alpha=0.05, mult=0.7, **kwds):
        CN2UnorderedLearner.__init__(self, evaluator=evaluator,
                                          beam_width=beam_width, alpha=alpha, **kwds)
        self.cover_and_remove = CovererAndRemover_MultWeights(mult=mult)

    def __call__(self, instances, weight=0):
        supervisedClassCheck(instances)

        oldInstances = Orange.data.Table(instances)
        classifier = CN2UnorderedLearner.__call__(self, instances, weight)
        for r in classifier.rules:
            r.filterAndStore(oldInstances, weight, r.classifier.default_val)
        return classifier

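# The two CN2-SD ingredients named above can each be sketched in a line or
# two of plain Python. This is a hedged illustration of the underlying
# formulas, not Orange's WRACCEvaluator or CovererAndRemover_MultWeights:
# weighted relative accuracy is the rule's coverage times its gain in
# target-class probability over the apriori probability, and covering
# multiplies the weight of covered instances instead of removing them.

```python
def wracc(covered_target, covered_total, apriori_target, n_total):
    # WRAcc = coverage * (p(target | rule) - p(target)).
    if covered_total == 0 or n_total == 0:
        return 0.0
    coverage = covered_total / float(n_total)
    return coverage * (covered_target / float(covered_total)
                       - apriori_target / float(n_total))

def mult_weight_cover(weights, covered_indices, mult=0.7):
    # CN2-SD covering: covered instances keep a reduced weight
    # (w * mult) so later rules can still partly cover them.
    return [w * mult if i in covered_indices else w
            for i, w in enumerate(weights)]
```

# A rule covering 10 of 100 instances, 8 of them in a target class with
# apriori probability 0.5, scores 0.1 * (0.8 - 0.5) = 0.03.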

class ABCN2(RuleLearner):
    """
    This is an implementation of argument-based CN2 using EVC as evaluation
    and LRC classification.

    Rule learning parameters that can be passed to the constructor:

    :param width: beam width (default 5).
    :type width: int
    :param learn_for_class: class for which to learn; None (default) if all
       classes are to be learned.
    :param learn_one_rule: decides whether to learn only a single rule (default
       False).
    :type learn_one_rule: boolean
    :param analyse_argument: index of argument to analyse; -1 to learn normally
       (default).
    :type analyse_argument: int
    :param debug: sets debug mode - prints some info during execution; False (default).
    :type debug: boolean

    The following evaluator related arguments are supported:

    :param m: m for m-estimate to be corrected with EVC (default 2).
    :type m: int
    :param opt_reduction: type of EVC correction: 0=no correction,
       1=pessimistic, 2=normal (default 2).
    :type opt_reduction: int
    :param nsampling: number of samples in estimating extreme value
       distribution for EVC (default 100).
    :type nsampling: int
    :param evd: pre-given extreme value distributions.
    :param evd_arguments: pre-given extreme value distributions for arguments.

    The following parameters control rule validation:

    :param rule_sig: minimal rule significance (default 1.0).
    :type rule_sig: float
    :param att_sig: minimal attribute significance in rule (default 1.0).
    :type att_sig: float
    :param max_rule_complexity: maximum number of conditions in rule (default 5).
    :type max_rule_complexity: int
    :param min_coverage: minimal number of covered instances (default 1).
    :type min_coverage: int

    Probabilistic covering can be controlled using:

    :param min_improved: minimal number of instances improved in probabilistic covering (default 1).
    :type min_improved: int
    :param min_improved_perc: minimal percentage of covered instances that need to be improved (default 0.0).
    :type min_improved_perc: float

    Finally, LRC (classifier) related parameters are:

    :param add_sub_rules: decides whether to add sub-rules.
    :type add_sub_rules: boolean
    :param min_cl_sig: minimal significance of beta in classifier (default 0.5).
    :type min_cl_sig: float
    :param min_beta: minimal beta value (default 0.0).
    :type min_beta: float
    :param set_prefix_rules: decides whether ordered prefix rules should be
       added (default False).
    :type set_prefix_rules: boolean
    :param alternative_learner: use rule-learner as a correction method for
       other machine learning methods (default None).

    """

    def __init__(self, argument_id=0, width=5, m=2, opt_reduction=2, nsampling=100, max_rule_complexity=5,
                 rule_sig=1.0, att_sig=1.0, postpruning=None, min_quality=0., min_coverage=1, min_improved=1, min_improved_perc=0.0,
                 learn_for_class=None, learn_one_rule=False, evd=None, evd_arguments=None, prune_arguments=False, analyse_argument=-1,
                 alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, add_sub_rules=False, debug=False,
                 **kwds):

        # argument ID which is passed to the abcn2 learner
        self.argument_id = argument_id
        # learn for a specific class only?
        self.learn_for_class = learn_for_class
        # only analyse a specific argument, or learn all at once
        self.analyse_argument = analyse_argument
        # should we learn only one rule?
        self.learn_one_rule = learn_one_rule
        self.postpruning = postpruning
        # rule finder
        self.rule_finder = RuleBeamFinder()
        self.ruleFilter = RuleBeamFilter_Width(width=width)
        self.ruleFilter_arguments = ABBeamFilter(width=width)
        if max_rule_complexity - 1 < 0:
            max_rule_complexity = 10
        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0, min_quality=0., max_rule_complexity=max_rule_complexity - 1, min_coverage=min_coverage)
        self.refiner = RuleBeamRefiner_Selector()
        self.refiner_arguments = SelectorAdder(discretizer=Orange.feature.discretization.Entropy(forceAttribute=1,
                                                                                           maxNumberOfIntervals=2))
        self.prune_arguments = prune_arguments
        # EVC evaluator
        evdGet = Orange.core.EVDistGetter_Standard()
        self.rule_finder.evaluator = RuleEvaluator_mEVC(m=m, evDistGetter=evdGet, min_improved=min_improved, min_improved_perc=min_improved_perc)
        self.rule_finder.evaluator.returnExpectedProb = True
        self.rule_finder.evaluator.optimismReduction = opt_reduction
        self.rule_finder.evaluator.ruleAlpha = rule_sig
        self.rule_finder.evaluator.attributeAlpha = att_sig
        self.rule_finder.evaluator.validator = RuleValidator_LRS(alpha=1.0, min_quality=min_quality, min_coverage=min_coverage, max_rule_complexity=max_rule_complexity - 1)

        # learning stopping criteria
        self.rule_stopping = None
        self.data_stopping = RuleDataStoppingCriteria_NoPositives()
        # EVD fitting
        self.evd_creator = EVDFitter(self, n=nsampling)
        self.evd = evd
        self.evd_arguments = evd_arguments
        # classifier
        self.add_sub_rules = add_sub_rules
        self.classifier = PILAR(alternative_learner=alternative_learner, min_cl_sig=min_cl_sig, min_beta=min_beta, set_prefix_rules=set_prefix_rules)
        self.debug = debug
        # arbitrary parameters
        self.__dict__.update(kwds)

    def __call__(self, examples, weight_id=0):
        # initialize progress bar
        progress = getattr(self, "progressCallback", None)
        if progress:
            progress.start = 0.0
            progress.end = 0.0
            distrib = Orange.statistics.distribution.Distribution(
                             examples.domain.class_var, examples, weight_id)
            distrib.normalize()

        # we begin with an empty set of rules
        all_rules = RuleList()

        # then iterate through all classes and learn rules for each class separately
        for cl_i, cl in enumerate(examples.domain.class_var):
            if progress:
                step = distrib[cl] / 2.
                progress.start = progress.end
                progress.end += step

            if self.learn_for_class is not None and self.learn_for_class not in [cl, cl_i]:
                continue

            # rules for this class only
            rules = RuleList()

            # create dichotomous class
            dich_data = self.create_dich_class(examples, cl)

            # preparation of the learner (covering, evd, etc.)
            self.prepare_settings(dich_data, weight_id, cl_i, progress)

            # learn argumented rules first ...
            self.turn_ABML_mode(dich_data, weight_id, cl_i)
            # first specialize all unspecialized arguments
            # dich_data = self.specialise_arguments(dich_data, weight_id)
            # comment: specialisation of arguments is within learning of an argumented rule;
            #          this is now different from the published algorithm
            if progress:
                progress.start = progress.end
                progress.end += step

            aes = self.get_argumented_examples(dich_data)
            aes = self.sort_arguments(aes, dich_data)
            while aes:
                if self.analyse_argument > -1 and \
                   (isinstance(self.analyse_argument, Orange.core.Example) and not Orange.core.Example(dich_data.domain, self.analyse_argument) == aes[0] or \
                    isinstance(self.analyse_argument, int) and not dich_data[self.analyse_argument] == aes[0]):
                    aes = aes[1:]
                    continue
                ae = aes[0]
                rule = self.learn_argumented_rule(ae, dich_data, weight_id) # target class is always first class (0)
                if self.debug and rule:
                    print "learned arg rule", Orange.classification.rules.rule_to_string(rule)
                elif self.debug:
                    print "no rule came out of ", ae
                if rule:
                    rules.append(rule)
                    aes = filter(lambda x: not rule(x), aes)
                else:
                    aes = aes[1:]
                aes = aes[1:]

            if not progress and self.debug:
                print " arguments finished ... "

            # remove all examples covered by rules
            for rule in rules:
                dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
            if progress:
                progress(self.remaining_probability(dich_data), None)

            # learn normal rules on remaining examples
            if self.analyse_argument == -1:
                self.turn_normal_mode(dich_data, weight_id, cl_i)
                while dich_data:
                    # learn a rule
                    rule = self.learn_normal_rule(dich_data, weight_id, self.apriori)
                    if not rule:
                        break
                    if self.debug:
                        print "rule learned: ", Orange.classification.rules.rule_to_string(rule), rule.quality
                    dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
                    if progress:
                        progress(self.remaining_probability(dich_data), None)
                    rules.append(rule)
                    if self.learn_one_rule:
                        break

            # prune unnecessary rules
            rules = self.prune_unnecessary_rules(rules, dich_data, weight_id)

            if self.add_sub_rules:
                rules = self.add_sub_rules_call(rules, dich_data, weight_id)

            # restore domain and class in rules, add them to all_rules
            for r in rules:
                all_rules.append(self.change_domain(r, cl, examples, weight_id))

            if progress:
                progress(1.0, None)
        # create a classifier from all rules
        return self.create_classifier(all_rules, examples, weight_id)

    def learn_argumented_rule(self, ae, examples, weight_id):
        # prepare roots of rules from arguments
        positive_args = self.init_pos_args(ae, examples, weight_id)
        if not positive_args: # something went wrong
            raise Exception("There is a problem with argumented example %s" % str(ae))
        negative_args = self.init_neg_args(ae, examples, weight_id)

        # set negative arguments in refiner
        self.rule_finder.refiner.notAllowedSelectors = negative_args
        self.rule_finder.refiner.example = ae
        # set arguments to filter
        self.rule_finder.ruleFilter.setArguments(examples.domain, positive_args)

        # learn a rule
        self.rule_finder.evaluator.bestRule = None
        self.rule_finder(examples, weight_id, 0, positive_args)

        # return best rule
        return self.rule_finder.evaluator.bestRule

    def prepare_settings(self, examples, weight_id, cl_i, progress):
        # apriori distribution
        self.apriori = Orange.statistics.distribution.Distribution(
                                examples.domain.class_var, examples, weight_id)

        # prepare covering mechanism
        self.cover_and_remove = CovererAndRemover_Prob(examples, weight_id, 0, self.apriori, self.argument_id)
        self.rule_finder.evaluator.probVar = examples.domain.getmeta(self.cover_and_remove.probAttribute)

        # compute extreme value distributions
        # TODO: why evd and evd_this????
        if self.rule_finder.evaluator.optimismReduction > 0 and not self.evd:
            self.evd_this = self.evd_creator.computeEVD(examples, weight_id, target_class=0, progress=progress)
        if self.evd:
            self.evd_this = self.evd[cl_i]

    def turn_ABML_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0 and self.argument_id:
            if self.evd_arguments:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_arguments[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD_example(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner_arguments
        self.rule_finder.refiner.argument_id = self.argument_id
        self.rule_finder.ruleFilter = self.ruleFilter_arguments

    def create_dich_class(self, examples, cl):
        """
        Create a dichotomous class.
        """
        (newDomain, targetVal) = create_dichotomous_class(examples.domain, examples.domain.class_var, str(cl), negate=0)
        newDomainmetas = newDomain.getmetas()
        newDomain.addmeta(Orange.feature.Descriptor.new_meta_id(), examples.domain.class_var) # old class as meta
        dichData = examples.select(newDomain)
        if self.argument_id:
            for d in dichData: # remove arguments given to other classes
                if not d.getclass() == targetVal:
                    d[self.argument_id] = "?"
        return dichData

    def get_argumented_examples(self, examples):
        if not self.argument_id:
            return None

        # get argumented examples
        return ArgumentFilter_hasSpecial()(examples, self.argument_id, target_class=0)

    def sort_arguments(self, arg_examples, examples):
        if not self.argument_id:
            return None
        evaluateAndSortArguments(examples, self.argument_id)
        if len(arg_examples) > 0:
            # sort examples by the quality of their arguments (using the
            # first argument, as arguments have already been sorted)
            sorted_examples = arg_examples.native()
            sorted_examples.sort(lambda x, y: -cmp(x[self.argument_id].value.positive_arguments[0].quality,
                                                   y[self.argument_id].value.positive_arguments[0].quality))
            return Orange.data.Table(examples.domain, sorted_examples)
        else:
            return None

    def turn_normal_mode(self, examples, weight_id, cl_i):
        # evaluator
        if self.rule_finder.evaluator.optimismReduction > 0:
            if self.evd:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd[cl_i]
            else:
                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD(examples, weight_id, target_class=0)
        # rule refiner
        self.rule_finder.refiner = self.refiner
        self.rule_finder.ruleFilter = self.ruleFilter

    def learn_normal_rule(self, examples, weight_id, apriori):
        if hasattr(self.rule_finder.evaluator, "bestRule"):
            self.rule_finder.evaluator.bestRule = None
        rule = self.rule_finder(examples, weight_id, 0, RuleList())
        if hasattr(self.rule_finder.evaluator, "bestRule") and self.rule_finder.evaluator.returnExpectedProb:
            rule = self.rule_finder.evaluator.bestRule
            self.rule_finder.evaluator.bestRule = None
        if self.postpruning:
            rule = self.postpruning(rule, examples, weight_id, 0, apriori)
        return rule

    def remove_covered_examples(self, rule, examples, weight_id):
        nexamples, nweight = self.cover_and_remove(rule, examples, weight_id, 0)
        return nexamples

1390    def prune_unnecessary_rules(self, rules, examples, weight_id):
1391        return self.cover_and_remove.getBestRules(rules, examples, weight_id)
1392
1393    def change_domain(self, rule, cl, examples, weight_id):
1394        rule.filter = Orange.core.Filter_values(domain=examples.domain,
1395                                        conditions=rule.filter.conditions)
1396        rule.filterAndStore(examples, weight_id, cl)
1397        if hasattr(rule, "learner") and hasattr(rule.learner, "arg_example"):
1398            rule.learner.arg_example = Orange.data.Instance(examples.domain, rule.learner.arg_example)
1399        return rule
1400
1401    def create_classifier(self, rules, examples, weight_id):
1402        return self.classifier(rules, examples, weight_id)
1403
1404    def add_sub_rules_call(self, rules, examples, weight_id):
1405        apriori = Orange.statistics.distribution.Distribution(
1406                            examples.domain.class_var, examples, weight_id)
1407        new_rules = RuleList()
1408        for r in rules:
1409            new_rules.append(r)
1410
1411        # loop through rules
1412        for r in rules:
1413            tmpList = RuleList()
1414            tmpRle = r.clone()
1415            tmpRle.filter.conditions = r.filter.conditions[:r.requiredConditions] # do not split argument
1416            tmpRle.parentRule = None
1417            tmpRle.filterAndStore(examples, weight_id, r.classifier.default_val)
1418            tmpRle.complexity = 0
1419            tmpList.append(tmpRle)
1420            while tmpList and len(tmpList[0].filter.conditions) <= len(r.filter.conditions):
1421                tmpList2 = RuleList()
1422                for tmpRule in tmpList:
1423                    # evaluate tmpRule
1424                    oldREP = self.rule_finder.evaluator.returnExpectedProb
1425                    self.rule_finder.evaluator.returnExpectedProb = False
1426                    tmpRule.quality = self.rule_finder.evaluator(tmpRule, examples, weight_id, r.classifier.default_val, apriori)
1427                    self.rule_finder.evaluator.returnExpectedProb = oldREP
1428                tmpList.sort(lambda x, y:-cmp(x.quality, y.quality))
1429                tmpList = tmpList[:self.rule_filter.width]
1430
1431                for tmpRule in tmpList:
1432                    # if rule not in rules already, add it to the list
1433                    if not True in [Orange.classification.rules.rules_equal(ri, tmpRule) for ri in new_rules] and len(tmpRule.filter.conditions) > 0 and tmpRule.quality > apriori[r.classifier.default_val] / apriori.abs:
1434                        new_rules.append(tmpRule)
1435                    # create new tmpRules, set parent Rule, append them to tmpList2
1436                    if not True in [Orange.classification.rules.rules_equal(ri, tmpRule) for ri in new_rules]:
1437                        for c in r.filter.conditions:
1438                            tmpRule2 = tmpRule.clone()
1439                            tmpRule2.parentRule = tmpRule
1440                            tmpRule2.filter.conditions.append(c)
1441                            tmpRule2.filterAndStore(examples, weight_id, r.classifier.default_val)
1442                            tmpRule2.complexity += 1
                            if tmpRule2.class_distribution.abs < tmpRule.class_distribution.abs:
1444                                tmpList2.append(tmpRule2)
1445                tmpList = tmpList2
1446        return new_rules
1447
1448    def init_pos_args(self, ae, examples, weight_id):
1449        pos_args = RuleList()
1450        # prepare arguments
1451        for p in ae[self.argument_id].value.positive_arguments:
1452            new_arg = Rule(filter=ArgFilter(argument_id=self.argument_id,
1453                                                   filter=self.newFilter_values(p.filter),
1454                                                   arg_example=ae),
1455                                                   complexity=0)
1456            new_arg.valuesFilter = new_arg.filter.filter
1457            pos_args.append(new_arg)
1458
1459
1460        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1461            old_exp = self.rule_finder.evaluator.returnExpectedProb
1462            self.rule_finder.evaluator.returnExpectedProb = False
1463
1464        # argument pruning (all or just unfinished arguments)
1465        # if pruning is chosen, then prune arguments if possible
1466        for p in pos_args:
1467            p.filterAndStore(examples, weight_id, 0)
1468            if not p.learner:
1469                p.learner = DefaultLearner(default_value=ae.getclass())
1470            # pruning on: we check on all conditions and take only best
1471            if self.prune_arguments:
1472                allowed_conditions = [c for c in p.filter.conditions]
1473                pruned_conditions = self.prune_arg_conditions(ae, allowed_conditions, examples, weight_id)
1474                p.baseDist = Orange.statistics.distribution.Distribution(examples.domain.classVar, examples, weight_id)
1475                p.filter.conditions = pruned_conditions
1476                p.learner.setattr("arg_length", 0)
1477
1478            else: # prune only unspecified conditions
1479                spec_conditions = [c for c in p.filter.conditions if not c.unspecialized_condition]
1480                unspec_conditions = [c for c in p.filter.conditions if c.unspecialized_condition]
1481                # let rule cover now all examples filtered by specified conditions
1482                p.filter.conditions = spec_conditions
1483                p.filterAndStore(examples, weight_id, 0)
1484                p.baseDist = p.classDistribution
1485                p.learner.setattr("arg_length", len(p.filter.conditions))
1486                pruned_conditions = self.prune_arg_conditions(ae, unspec_conditions, p.examples, p.weightID)
1487                p.filter.conditions.extend(pruned_conditions)
1488                p.filter.filter.conditions.extend(pruned_conditions)
1489                # if argument does not contain all unspecialized reasons, add those reasons with minimum values
1490                at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.core.ValueFilter_continuous]
1491                for u in unspec_conditions:
1492                    if not (u.position, u.oper) in at_oper_pairs:
1493                        # find minimum value
1494                        if u.oper == Orange.core.ValueFilter_continuous.Greater or u.oper == Orange.core.ValueFilter_continuous.GreaterEqual:
1495                            u.ref = min([float(e[u.position]) - 10. for e in p.examples])
1496                        else:
1497                            u.ref = max([float(e[u.position]) + 10. for e in p.examples])
1498                        p.filter.conditions.append(u)
1499                        p.filter.filter.conditions.append(u)
1500
1501        # set parameters to arguments
1502        for p_i, p in enumerate(pos_args):
1503            p.filterAndStore(examples, weight_id, 0)
1504            p.filter.domain = examples.domain
1505            p.classifier = p.learner(p.examples, p.weightID)
1506            p.requiredConditions = len(p.filter.conditions)
1507            p.learner.setattr("arg_example", ae)
1508            p.complexity = len(p.filter.conditions)
1509
1510        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1511            self.rule_finder.evaluator.returnExpectedProb = old_exp
1512
1513        return pos_args
1514
1515    def newFilter_values(self, filter):
1516        newFilter = Orange.core.Filter_values()
1517        newFilter.conditions = filter.conditions[:]
1518        newFilter.domain = filter.domain
1519        newFilter.negate = filter.negate
1520        newFilter.conjunction = filter.conjunction
1521        return newFilter
1522
1523    def init_neg_args(self, ae, examples, weight_id):
1524        return ae[self.argument_id].value.negative_arguments
1525
1526    def remaining_probability(self, examples):
1527        return self.cover_and_remove.covered_percentage(examples)
1528
1529    def prune_arg_conditions(self, crit_example, allowed_conditions, examples, weight_id):
1530        if not allowed_conditions:
1531            return []
1532        cn2_learner = Orange.classification.rules.CN2UnorderedLearner()
1533        cn2_learner.rule_finder = RuleBeamFinder()
1534        cn2_learner.rule_finder.refiner = SelectorArgConditions(crit_example, allowed_conditions)
1535        cn2_learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(self.rule_finder.evaluator.m)
1536        rule = cn2_learner.rule_finder(examples, weight_id, 0, RuleList())
1537        return rule.filter.conditions
1538
1539ABCN2 = deprecated_members({"beamWidth": "beam_width",
1540                     "ruleFinder": "rule_finder",
1541                     "ruleStopping": "rule_stopping",
1542                     "dataStopping": "data_stopping",
1543                     "coverAndRemove": "cover_and_remove",
1544                     "storeInstances": "store_instances",
1545                     "targetClass": "target_class",
1546                     "baseRules": "base_rules",
1547                     "weightID": "weight_id",
1548                     "argumentID": "argument_id"})(ABCN2)
1549
1550class CN2EVCUnorderedLearner(ABCN2):
1551    """
1552    CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules in a
1553    simmilar manner as
1554    :class:`~Orange.classification.rules.CN2SDUnorderedLearner`. This
1555    implementation uses the EVC rule evaluation.
1556   
1557    If data instances are provided to the constructor, the learning algorithm
1558    is called and the resulting classifier is returned instead of the learner.
1559
1560    :param evaluator: an object that evaluates a rule from covered instances.
1561        By default, weighted relative accuracy is used.
1562    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
1563   
1564    :param beam_width: width of the search beam.
1565    :type beam_width: int
1566   
1567    :param alpha: significance level of the likelihood ratio statistics to
1568        determine whether rule is better than the default rule.
1569    :type alpha: float
1570   
1571    :param mult: multiplicator for weights of covered instances.
1572    :type mult: float
1573    """
1574    def __init__(self, width=5, nsampling=100, rule_sig=1.0, att_sig=1.0, \
1575        min_coverage=1., max_rule_complexity=5.):
1576        ABCN2.__init__(self, width=width, nsampling=nsampling,
1577            rule_sig=rule_sig, att_sig=att_sig, min_coverage=int(min_coverage),
1578            max_rule_complexity=int(max_rule_complexity))
1579
1580class DefaultLearner(Orange.core.Learner):
1581    """
    Default learner - returns a default classifier with a predefined output class.
1583    """
1584    def __init__(self, default_value=None):
1585        self.default_value = default_value
1586    def __call__(self, examples, weight_id=0):
1587        return Orange.classification.ConstantClassifier(self.default_value, defaultDistribution=Orange.core.Distribution(examples.domain.class_var, examples, weight_id))
1588
1589class ABCN2Ordered(ABCN2):
1590    """
1591    Rules learned by ABCN2 are ordered and used as a decision list.
1592    """
1593    def __init__(self, argument_id=0, **kwds):
1594        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1595        self.classifier.set_prefix_rules = True
1596        self.classifier.optimize_betas = False
1597
1598class ABCN2M(ABCN2):
1599    """
1600    Argument based rule learning with m-estimate as evaluation function.
1601    """
1602    def __init__(self, argument_id=0, **kwds):
1603        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1604        self.opt_reduction = 0
1605        self.rule_finder.evaluator.optimismReduction = self.opt_reduction
1606        self.classifier = CN2UnorderedClassifier
1607
1608class ABCN2MLRC(ABCN2):
1609    """
1610    Argument based rule learning with m-estimate as evaluation function. LRC is used as a classification method.
1611    """
1612    def __init__(self, argument_id=0, **kwds):
1613        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1614        self.opt_reduction = 0
1615        self.rule_finder.evaluator.optimismReduction = self.opt_reduction
1616
1617class ABCN2_StandardClassification(ABCN2):
1618    """
1619    Argument based rule learning with the original classification technique.
1620    """
1621    def __init__(self, argument_id=0, **kwds):
1622        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1623        self.classifier = CN2UnorderedClassifier
1624
1625
1626class RuleStopping_Apriori(RuleStoppingCriteria):
    def __init__(self, apriori=None):
        self.apriori = apriori
1629
1630    def __call__(self, rules, rule, instances, data):
1631        if not self.apriori:
1632            return False
1633        if not type(rule.classifier) == Orange.classification.ConstantClassifier:
1634            return False
1635        ruleAcc = rule.class_distribution[rule.classifier.default_val] / rule.class_distribution.abs
1636        aprioriAcc = self.apriori[rule.classifier.default_val] / self.apriori.abs
1637        if ruleAcc > aprioriAcc:
1638            return False
1639        return True
1640
1641
1642class RuleStopping_SetRules(RuleStoppingCriteria):
1643    def __init__(self, validator):
1644        self.rule_stopping = RuleStoppingCriteria_NegativeDistribution()
1645        self.validator = validator
1646
1647    def __call__(self, rules, rule, instances, data):
1648        ru_st = self.rule_stopping(rules, rule, instances, data)
1649        if not ru_st:
1650            self.validator.rules.append(rule)
1651        return bool(ru_st)
1652
1653
1654class LengthValidator(RuleValidator):
1655    """ prune rules with more conditions than self.length. """
1656    def __init__(self, length= -1):
1657        self.length = length
1658
1659    def __call__(self, rule, data, weight_id, target_class, apriori):
1660        if self.length >= 0:
1661            return len(rule.filter.conditions) <= self.length
1662        return True
1663
1664
1665class NoDuplicatesValidator(RuleValidator):
1666    def __init__(self, alpha=.05, min_coverage=0, max_rule_length=0, rules=RuleList()):
1667        self.rules = rules
1668        self.validator = RuleValidator_LRS(alpha=alpha, \
1669            min_coverage=min_coverage, max_rule_length=max_rule_length)
1670
1671    def __call__(self, rule, data, weight_id, target_class, apriori):
1672        if rule_in_set(rule, self.rules):
1673            return False
1674        return bool(self.validator(rule, data, weight_id, target_class, apriori))
1675
1676
1677
1678class RuleClassifier_BestRule(RuleClassifier):
1679    def __init__(self, rules, instances, weight_id=0, **argkw):
1680        self.rules = rules
1681        self.examples = instances
1682        self.class_var = instances.domain.class_var
1683        self.__dict__.update(argkw)
1684        self.prior = Orange.statistics.distribution.Distribution(
1685                    instances.domain.class_var, instances)
1686
1687    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
1688        retDist = Orange.statistics.distribution.Distribution(instance.domain.class_var)
1689        bestRule = None
1690        for r in self.rules:
1691            if r(instance) and (not bestRule or r.quality > bestRule.quality):
1692                for v_i, v in enumerate(instance.domain.class_var):
1693                    retDist[v_i] = r.class_distribution[v_i]
1694                bestRule = r
1695        if not bestRule:
1696            retDist = self.prior
1697        else:
1698            bestRule.used += 1
        sumdist = sum(retDist)
        if sumdist > 0.0:
            for c in self.examples.domain.class_var:
                retDist[c] /= sumdist
        else:
            retDist.normalize()
        if result_type == Orange.classification.Classifier.GetValue:
            return retDist.modus()
        if result_type == Orange.classification.Classifier.GetProbabilities:
            return retDist
        return (retDist.modus(), retDist)
1711
1712    def __str__(self):
1713        retStr = ""
1714        for r in self.rules:
1715            retStr += rule_to_string(r) + " " + str(r.class_distribution) + "\n"
1716        return retStr
1717
1718
1719class CovererAndRemover_MultWeights(RuleCovererAndRemover):
1720    """
1721    Covering and removing of instances using weight multiplication:
1722   
1723    :param mult: weighting multiplication factor
1724    :type mult: float   
1725    """
1726
1727    def __init__(self, mult=0.7):
1728        self.mult = mult
1729    def __call__(self, rule, instances, weights, target_class):
1730        if not weights:
1731            weights = Orange.feature.Descriptor.new_meta_id()
1732            instances.addMetaAttribute(weights, 1.)
1733            instances.domain.addmeta(weights, Orange.feature.\
1734                Continuous("weights-" + str(weights)), True)
1735        newWeightsID = Orange.feature.Descriptor.new_meta_id()
1736        instances.addMetaAttribute(newWeightsID, 1.)
1737        instances.domain.addmeta(newWeightsID, Orange.feature.\
1738            Continuous("weights-" + str(newWeightsID)), True)
1739        for instance in instances:
1740            if rule(instance) and instance.getclass() == rule.classifier(\
1741                instance, Orange.classification.Classifier.GetValue):
1742                instance[newWeightsID] = instance[weights] * self.mult
1743            else:
1744                instance[newWeightsID] = instance[weights]
1745        return (instances, newWeightsID)
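The multiplicative covering scheme above can be illustrated without Orange: instances that a rule covers and classifies correctly keep their weight multiplied by `mult`, all others keep it unchanged. A simplified sketch with plain lists (`rule` and `predict` are hypothetical stand-ins for the rule's filter and classifier):

```python
def cover_and_remove_mult(instances, weights, rule, predict, mult=0.7):
    """Return new weights: covered, correctly classified instances are down-weighted."""
    new_weights = []
    for (x, y), w in zip(instances, weights):
        if rule(x) and predict(x) == y:
            new_weights.append(w * mult)
        else:
            new_weights.append(w)
    return new_weights

# toy data: (feature, class) pairs
data = [(1, "a"), (2, "a"), (3, "b")]
w = cover_and_remove_mult(data, [1.0, 1.0, 1.0],
                          rule=lambda x: x < 3, predict=lambda x: "a")
```

After one covering step the first two instances (covered and correct) are down-weighted to 0.7 while the third keeps weight 1.0, so later rules focus on the instances not yet explained.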
1746
1747
1748class CovererAndRemover_AddWeights(RuleCovererAndRemover):
1749    """
1750    Covering and removing of instances using weight addition.
1751   
1752    """
1753
1754    def __call__(self, rule, instances, weights, target_class):
1755        if not weights:
1756            weights = Orange.feature.Descriptor.new_meta_id()
1757            instances.addMetaAttribute(weights, 1.)
1758            instances.domain.addmeta(weights, Orange.feature.\
1759                Continuous("weights-" + str(weights)), True)
1760        try:
1761            coverage = instances.domain.getmeta("Coverage")
1762        except:
1763            coverage = Orange.feature.Continuous("Coverage")
1764            instances.domain.addmeta(Orange.feature.Descriptor.new_meta_id(), coverage, True)
1765            instances.addMetaAttribute(coverage, 0.0)
1766        newWeightsID = Orange.feature.Descriptor.new_meta_id()
1767        instances.addMetaAttribute(newWeightsID, 1.)
1768        instances.domain.addmeta(newWeightsID, Orange.feature.\
1769            Continuous("weights-" + str(newWeightsID)), True)
1770        for instance in instances:
1771            if rule(instance) and instance.getclass() == rule.classifier(instance, \
1772                    Orange.classification.Classifier.GetValue):
1773                try:
1774                    instance[coverage] += 1.0
1775                except:
1776                    instance[coverage] = 1.0
1777                instance[newWeightsID] = 1.0 / (instance[coverage] + 1)
1778            else:
1779                instance[newWeightsID] = instance[weights]
1780        return (instances, newWeightsID)
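The additive variant above tracks how many times each instance has been covered and sets its weight to `1 / (coverage + 1)`. A minimal sketch of that bookkeeping, again with hypothetical `rule`/`predict` stand-ins:

```python
def cover_and_remove_add(instances, weights, coverage, rule, predict):
    """Covered, correctly classified instances get weight 1/(times_covered + 1)."""
    new_weights = list(weights)
    for i, (x, y) in enumerate(instances):
        if rule(x) and predict(x) == y:
            coverage[i] += 1.0
            new_weights[i] = 1.0 / (coverage[i] + 1)
    return new_weights

data = [(1, "a"), (2, "b")]
cov = [0.0, 0.0]
w1 = cover_and_remove_add(data, [1.0, 1.0], cov, lambda x: x == 1, lambda x: "a")
w2 = cover_and_remove_add(data, w1, cov, lambda x: x == 1, lambda x: "a")
```

The first covering step drops the covered instance's weight to 1/2, the second to 1/3, so repeatedly covered instances fade out more gently than with a hard removal.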
1781
1782
1783class CovererAndRemover_Prob(RuleCovererAndRemover):
1784    """ This class impements probabilistic covering. """
1785    def __init__(self, examples, weight_id, target_class, apriori, argument_id):
1786        self.best_rule = [None] * len(examples)
1787        self.prob_attribute = Orange.feature.Descriptor.new_meta_id()
1788        self.apriori_prob = apriori[target_class] / apriori.abs
1789        examples.addMetaAttribute(self.prob_attribute, self.apriori_prob)
1790        examples.domain.addmeta(self.prob_attribute,
1791            Orange.feature.Continuous("Probs"))
1792        self.argument_id = argument_id
1793
1794    def getBestRules(self, current_rules, examples, weight_id):
1795        best_rules = RuleList()
1796        for r_i, r in enumerate(self.best_rule):
1797            if r and not rule_in_set(r, best_rules) and int(examples[r_i].getclass()) == int(r.classifier.default_value):
1798                if hasattr(r.learner, "arg_example"):
1799                    setattr(r, "best_example", r.learner.arg_example)
1800                else:
1801                    setattr(r, "best_example", examples[r_i])
1802                best_rules.append(r)
1803        return best_rules
1804
1805    def __call__(self, rule, examples, weights, target_class):
1806        """ if example has an argument, then the rule must be consistent with the argument. """
1807        example = getattr(rule.learner, "arg_example", None)
1808        for ei, e in enumerate(examples):
1809            if e == example:
1810                e[self.prob_attribute] = 1.0
1811                self.best_rule[ei] = rule
1812            elif rule(e) and rule.quality > e[self.prob_attribute]:
1813                e[self.prob_attribute] = rule.quality + 0.001 # 0.001 is added to avoid numerical errors
1814                self.best_rule[ei] = rule
1815        return (examples, weights)
1816
1817    def filter_covers_example(self, example, filter):
1818        filter_indices = RuleCoversArguments.filterIndices(filter)
1819        if filter(example):
1820            try:
1821                if example[self.argument_id].value and len(example[self.argument_id].value.positive_arguments) > 0: # example has positive arguments
1822                    # conditions should cover at least one of the positive arguments
1823                    one_arg_covered = False
1824                    for pA in example[self.argument_id].value.positive_arguments:
1825                        arg_covered = [self.condIn(c, filter_indices) for c in pA.filter.conditions]
1826                        one_arg_covered = one_arg_covered or len(arg_covered) == sum(arg_covered) #arg_covered
1827                        if one_arg_covered:
1828                            break
1829                    if not one_arg_covered:
1830                        return False
1831                if example[self.argument_id].value and len(example[self.argument_id].value.negative_arguments) > 0: # example has negative arguments
                    # conditions should not cover any of the negative arguments
1833                    for pN in example[self.argument_id].value.negative_arguments:
1834                        arg_covered = [self.condIn(c, filter_indices) for c in pN.filter.conditions]
1835                        if len(arg_covered) == sum(arg_covered):
1836                            return False
1837            except:
1838                return True
1839            return True
1840        return False
1841
1842    def condIn(self, cond, filter_indices): # is condition in the filter?
1843        condInd = RuleCoversArguments.conditionIndex(cond)
1844        if operator.or_(condInd, filter_indices[cond.position]) == filter_indices[cond.position]:
1845            return True
1846        return False
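`condIn` above tests set containment with a bitwise OR: a condition, encoded as a bitmask of allowed values, is already subsumed by the filter if OR-ing it into the filter's mask changes nothing. The same check in isolation:

```python
def mask_contains(filter_mask, cond_mask):
    """cond_mask is subsumed by filter_mask iff OR-ing it adds no new bits."""
    return (cond_mask | filter_mask) == filter_mask
```

For example, `0b0101` is contained in `0b1101` (no new bits), while `0b0010` is not.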
1847
1848
1849    def covered_percentage(self, examples):
1850        p = 0.0
1851        for ei, e in enumerate(examples):
1852            p += (e[self.prob_attribute] - self.apriori_prob) / (1.0 - self.apriori_prob)
1853        return p / len(examples)
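`covered_percentage` rescales each instance's assigned probability so that the apriori probability maps to 0 and certainty maps to 1, then averages over all instances. As a standalone formula:

```python
def covered_percentage(probs, apriori_prob):
    """Average of (p - apriori) / (1 - apriori) over all instances."""
    total = 0.0
    for p in probs:
        total += (p - apriori_prob) / (1.0 - apriori_prob)
    return total / len(probs)
```

With an apriori probability of 0.5, probabilities `[1.0, 0.5, 0.5]` yield 1/3: one instance is fully covered, the other two not at all.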
1854
1855
1856
1857
1858@deprecated_keywords({"showDistribution": "show_distribution"})
1859def rule_to_string(rule, show_distribution=True):
1860    """
    Return a string representation of a rule in human-readable format.
1862   
1863    :param rule: rule to pretty-print.
1864    :type rule: :class:`~Orange.classification.rules.Rule`
1865   
1866    :param show_distribution: determines whether presentation should also
1867        contain the distribution of covered instances
1868    :type show_distribution: bool
1869   
1870    """
1871    def selectSign(oper):
1872        if oper == Orange.core.ValueFilter_continuous.Less:
1873            return "<"
1874        elif oper == Orange.core.ValueFilter_continuous.LessEqual:
1875            return "<="
1876        elif oper == Orange.core.ValueFilter_continuous.Greater:
1877            return ">"
1878        elif oper == Orange.core.ValueFilter_continuous.GreaterEqual:
1879            return ">="
1880        else: return "="
1881
1882    if not rule:
1883        return "None"
1884    conds = rule.filter.conditions
1885    domain = rule.filter.domain
1886
1887    ret = "IF "
1888    if len(conds) == 0:
1889        ret = ret + "TRUE"
1890
1891    for i, c in enumerate(conds):
1892        if i > 0:
1893            ret += " AND "
1894        if type(c) == Orange.core.ValueFilter_discrete:
1895            ret += domain[c.position].name + "=" + str([domain[c.position].\
1896                values[int(v)] for v in c.values])
1897        elif type(c) == Orange.core.ValueFilter_continuous:
1898            ret += domain[c.position].name + selectSign(c.oper) + str(c.ref)
1899    if rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
1900            and rule.classifier.default_val:
1901        ret = ret + " THEN " + domain.class_var.name + "=" + \
1902        str(rule.classifier.default_value)
1903        if show_distribution:
1904            ret += str(rule.class_distribution)
1905    elif rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
1906            and type(domain.class_var) == Orange.core.EnumVariable:
1907        ret = ret + " THEN " + domain.class_var.name + "=" + \
1908        str(rule.class_distribution.modus())
1909        if show_distribution:
1910            ret += str(rule.class_distribution)
1911    return ret
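A stripped-down version of the pretty-printer above, for rules given as plain `(attribute, operator, value)` triples instead of Orange filter objects (a hypothetical representation, for illustration only):

```python
def simple_rule_to_string(conditions, class_name, class_value):
    """conditions: list of (attribute, operator, value) triples."""
    if conditions:
        body = " AND ".join("%s%s%s" % c for c in conditions)
    else:
        body = "TRUE"
    return "IF %s THEN %s=%s" % (body, class_name, class_value)
```

An empty condition list prints as `IF TRUE ...`, mirroring the handling of rules with no conditions in `rule_to_string`.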
1912
1913def supervisedClassCheck(instances):
1914    if not instances.domain.class_var:
1915        raise Exception("Class variable is required!")
1916    if instances.domain.class_var.varType == Orange.core.VarTypes.Continuous:
1917        raise Exception("CN2 requires a discrete class!")
1918
1919
1920def rule_in_set(rule, rules):
1921    for r in rules:
1922        if rules_equal(rule, r):
1923            return True
1924    return False
1925
1926def rules_equal(rule1, rule2):
1927    if not len(rule1.filter.conditions) == len(rule2.filter.conditions):
1928        return False
1929    for c1 in rule1.filter.conditions:
1930        found = False # find the same condition in the other rule
1931        for c2 in rule2.filter.conditions:
1932            try:
1933                if not c1.position == c2.position: continue # same feature?
1934                if not type(c1) == type(c2): continue # same type of condition
1935                if type(c1) == Orange.core.ValueFilter_discrete:
1936                    if not type(c1.values[0]) == type(c2.values[0]): continue
1937                    if not c1.values[0] == c2.values[0]: continue # same value?
1938                if type(c1) == Orange.core.ValueFilter_continuous:
1939                    if not c1.oper == c2.oper: continue # same operator?
1940                    if not c1.ref == c2.ref: continue #same threshold?
1941                found = True
1942                break
1943            except:
1944                pass
1945        if not found:
1946            return False
1947    return True
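`rules_equal` treats a rule as an unordered collection of conditions: every condition of one rule must have a matching condition in the other. With hashable condition triples the same idea reduces to a set comparison (a sketch that ignores Orange's operator and value subtleties):

```python
def conditions_equal(rule1, rule2):
    """Rules are equal if they contain the same conditions, order ignored."""
    return set(rule1) == set(rule2)
```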
1948
1949# Miscellaneous - utility functions
1950def avg(l):
1951    if len(l) == 0:
1952        return 0.
1953    return sum(l) / len(l)
1954
1955def var(l):
1956    if len(l) < 2:
1957        return 0.
1958    av = avg(l)
1959    vars = [math.pow(li - av, 2) for li in l]
1960    return sum(vars) / (len(l) - 1)
1961
1962def median(l):
1963    if len(l) == 0:
1964        return 0.
1965    l.sort()
1966    le = len(l)
1967    if le % 2 == 1:
1968        return l[(le - 1) / 2]
1969    else:
1970        return (l[le / 2 - 1] + l[le / 2]) / 2
1971
1972def perc(l, p):
1973    l.sort()
1974    return l[int(math.floor(p * len(l)))]
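The helpers above follow the usual definitions: unbiased sample variance (dividing by n - 1) and the median as the middle element, or the mean of the two middle elements for even-length lists. Standalone versions, using floor division for the index arithmetic:

```python
def sample_var(values):
    """Unbiased sample variance (Bessel's correction, divides by n - 1)."""
    if len(values) < 2:
        return 0.0
    mean = sum(values) / float(len(values))
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def sample_median(values):
    """Middle element, or mean of the two middle elements."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[(n - 1) // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2.0
```

Unlike `median` above, `sample_median` sorts a copy, so the caller's list is left untouched.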
1975
1976class EVDFitter:
1977    """ Randomizes a dataset and fits an extreme value distribution onto it. """
1978
1979    def __init__(self, learner, n=200, randomseed=100):
1980        self.learner = learner
1981        self.n = n
1982        self.randomseed = randomseed
1983        # initialize random seed to make experiments repeatable
1984        random.seed(self.randomseed)
1985
1986
1987    def createRandomDataSet(self, data):
1988        newData = Orange.core.ExampleTable(data)
1989        # shuffle data
1990        cl_num = newData.toNumpy("C")
1991        random.shuffle(cl_num[0][:, 0])
1992        clData = Orange.core.ExampleTable(Orange.core.Domain([newData.domain.classVar]), cl_num[0])
1993        for d_i, d in enumerate(newData):
1994            d[newData.domain.classVar] = clData[d_i][newData.domain.classVar]
1995        return newData
1996
1997    def createEVDistList(self, evdList):
1998        l = Orange.core.EVDistList()
1999        for el in evdList:
2000            l.append(Orange.core.EVDist(mu=el[0], beta=el[1], percentiles=el[2]))
2001        return l
2002
2003
    # estimate Fisher-Tippett parameters (and deciles) for the values in vals
2005    def compParameters(self, vals, oldMi, oldBeta, oldPercs, fixedBeta=False):
2006        # compute percentiles
2007        vals.sort()
2008        N = len(vals)
2009        percs = [avg(vals[int(float(N) * i / 10):int(float(N) * (i + 1) / 10)]) for i in range(10)]
2010        if N < 10:
2011            return oldMi, oldBeta, percs
2012        if not fixedBeta:
            beta = min(2.0, math.sqrt(6 * var(vals) / math.pow(math.pi, 2)))
2014        else:
2015            beta = oldBeta
        # location parameter: sample mean minus beta times the Euler-Mascheroni constant
        return max(oldMi, numpy.average(vals) - beta * 0.5772156649), beta, None
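`compParameters` uses the method of moments for the Gumbel (Fisher-Tippett type I) distribution: scale `beta = sqrt(6 * Var / pi^2)` and location `mu = mean - beta * gamma`, where gamma is the Euler-Mascheroni constant (about 0.5772). A self-contained sketch of the same estimator, checked against maxima of normal samples (which are approximately Gumbel distributed):

```python
import math
import random

def fit_gumbel(values):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    n = len(values)
    mean = sum(values) / float(n)
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    beta = math.sqrt(6.0 * var / math.pi ** 2)
    mu = mean - beta * 0.5772156649  # Euler-Mascheroni constant
    return mu, beta

random.seed(42)
# maxima of 100 standard-normal draws, repeated 500 times
samples = [max(random.gauss(0, 1) for _ in range(100)) for _ in range(500)]
mu, beta = fit_gumbel(samples)
```

For this sample size the fitted location lands near 2.5 with a scale well below 1, consistent with extreme-value theory for normal maxima.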
2019
2020    def prepare_learner(self):
2021        self.oldStopper = self.learner.ruleFinder.ruleStoppingValidator
2022        self.evaluator = self.learner.ruleFinder.evaluator
2023        self.refiner = self.learner.ruleFinder.refiner
2024        self.validator = self.learner.ruleFinder.validator
2025        self.ruleFilter = self.learner.ruleFinder.ruleFilter
2026        self.learner.ruleFinder.validator = None
2027        self.learner.ruleFinder.evaluator = Orange.core.RuleEvaluator_LRS()
2028        self.learner.ruleFinder.evaluator.storeRules = True
2029        self.learner.ruleFinder.ruleStoppingValidator = Orange.core.RuleValidator_LRS(alpha=1.0)
2030        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = 0
2031        self.learner.ruleFinder.refiner = Orange.core.RuleBeamRefiner_Selector()
2032        self.learner.ruleFinder.ruleFilter = Orange.core.RuleBeamFilter_Width(width=5)
2033
2034
2035    def restore_learner(self):
2036        self.learner.ruleFinder.evaluator = self.evaluator
2037        self.learner.ruleFinder.ruleStoppingValidator = self.oldStopper
2038        self.learner.ruleFinder.refiner = self.refiner
2039        self.learner.ruleFinder.validator = self.validator
2040        self.learner.ruleFinder.ruleFilter = self.ruleFilter
2041
    def computeEVD(self, data, weightID=0, target_class=0, progress=None):
        # prepare the learner for distribution computation
        self.prepare_learner()

        # loop through N (sampling repetitions)
        extremeDists = [(0, 1, [])]
        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = self.oldStopper.max_rule_complexity
        maxVals = [[] for l in range(self.oldStopper.max_rule_complexity + 1)]
        for d_i in range(self.n):
            if not progress:
                if self.learner.debug:
                    print d_i,
            else:
                progress(float(d_i) / self.n, None)
            # create a data set with a randomized class column
            tempData = self.createRandomDataSet(data)
            self.learner.ruleFinder.evaluator.rules = RuleList()
            for l in range(self.oldStopper.max_rule_complexity + 2):
                self.learner.ruleFinder.evaluator.rules.append(None)
            # learn a rule on the randomized data
            self.learner.ruleFinder(tempData, weightID, target_class, RuleList())
            for l in range(self.oldStopper.max_rule_complexity + 1):
                if self.learner.ruleFinder.evaluator.rules[l]:
                    maxVals[l].append(self.learner.ruleFinder.evaluator.rules[l].quality)
                else:
                    maxVals[l].append(0)
2084
2085        # longer rule should always be better than shorter rule
2086        for l in range(self.oldStopper.max_rule_complexity):
2087            for i in range(len(maxVals[l])):
2088                if maxVals[l + 1][i] < maxVals[l][i]:
2089                    maxVals[l + 1][i] = maxVals[l][i]
2090##        print
2091##        for mi, m in enumerate(maxVals):
2092##            print "mi=",mi,m
2093
2094        mu, beta, perc = 1.0, 2.0, [0.0] * 10
2095        for mi, m in enumerate(maxVals):
2096##            if mi == 0:
2097##                mu, beta, perc = self.compParameters(m, mu, beta, perc)
2098##            else:
2099            mu, beta, perc = self.compParameters(m, mu, beta, perc, fixedBeta=True)
2100            extremeDists.append((mu, beta, perc))
2101            extremeDists.extend([(0, 1, [])] * (mi))
2102            if self.learner.debug:
2103                print mi, mu, beta, perc
2104
2105        self.restore_learner()
2106        return self.createEVDistList(extremeDists)
2107
class ABBeamFilter(Orange.core.RuleBeamFilter):
    """
    ABBeamFilter: filters the beam;
        - keeps the best N rules (by quality)
        - keeps an additional N rules whose conditions contain only argument attributes
    """
    def __init__(self, width=5):
        self.width = width
        self.pArgs = None

    def __call__(self, rulesStar, examples, weight_id):
        newStar = Orange.core.RuleList()
        rulesStar.sort(lambda x, y:-cmp(x.quality, y.quality))
        argsNum = 0
        for r_i, r in enumerate(rulesStar):
            if r_i < self.width: # one of the best "width" rules
                newStar.append(r)
            elif self.onlyPositives(r):
                if argsNum < self.width:
                    newStar.append(r)
                    argsNum += 1
        return newStar

    def setArguments(self, domain, positive_arguments):
        self.pArgs = positive_arguments
        self.domain = domain
        self.argTab = [0] * len(self.domain.attributes)
        for arg in self.pArgs:
            for cond in arg.filter.conditions:
                self.argTab[cond.position] = 1

    def onlyPositives(self, rule):
        if not self.pArgs:
            return False

        ruleTab = [0] * len(self.domain.attributes)
        for cond in rule.filter.conditions:
            ruleTab[cond.position] = 1
        return map(operator.or_, ruleTab, self.argTab) == self.argTab

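# The beam-filtering policy above can be sketched in plain Python. This is
# an illustrative, self-contained model (rules as (quality, uses_only_args)
# tuples, not Orange objects; the helper name is hypothetical): keep the
# `width` best rules by quality, plus up to `width` further rules whose
# conditions are built only from argument attributes.
def _filter_beam_sketch(rules, width=5):
    star = sorted(rules, key=lambda r: -r[0])  # best quality first
    kept, args_kept = [], 0
    for i, rule in enumerate(star):
        if i < width:  # one of the best `width` rules
            kept.append(rule)
        elif rule[1] and args_kept < width:  # argument-only rule
            kept.append(rule)
            args_kept += 1
    return kept
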

class RuleCoversArguments:
    """
    Determines whether a rule covers at least one of a set of arguments.
    """
    def __init__(self, arguments):
        self.arguments = arguments
        self.indices = []
        for a in self.arguments:
            indNA = getattr(a.filter, "indices", None)
            if not indNA:
                a.filter.setattr("indices", RuleCoversArguments.filterIndices(a.filter))
            self.indices.append(a.filter.indices)

    def __call__(self, rule):
        if not self.indices:
            return False
        if not getattr(rule.filter, "indices", None):
            rule.filter.indices = RuleCoversArguments.filterIndices(rule.filter)
        for index in self.indices:
            if map(operator.or_, rule.filter.indices, index) == rule.filter.indices:
                return True
        return False

    def filterIndices(filter):
        if not filter.domain:
            return []
        ind = [0] * len(filter.domain.attributes)
        for c in filter.conditions:
            ind[c.position] = operator.or_(ind[c.position],
                                         RuleCoversArguments.conditionIndex(c))
        return ind
    filterIndices = staticmethod(filterIndices)

    def conditionIndex(c):
        if type(c) == Orange.core.ValueFilter_continuous:
            if (c.oper == Orange.core.ValueFilter_continuous.GreaterEqual or
                c.oper == Orange.core.ValueFilter_continuous.Greater):
                return 5 # 0101
            elif (c.oper == Orange.core.ValueFilter_continuous.LessEqual or
                  c.oper == Orange.core.ValueFilter_continuous.Less):
                return 3 # 0011
            else:
                return c.oper
        else:
            return 1 # 0001
    conditionIndex = staticmethod(conditionIndex)

    def oneSelectorToCover(ruleIndices, argIndices):
        at, type = -1, 0
        for r_i, ind in enumerate(ruleIndices):
            if not argIndices[r_i]:
                continue
            if at > -1 and not ind == argIndices[r_i]: # would need two changes
                return (-1, 0)
            if not ind == argIndices[r_i]:
                if argIndices[r_i] in [1, 3, 5]:
                    at, type = r_i, argIndices[r_i]
                if argIndices[r_i] == 6:
                    if ind == 3:
                        at, type = r_i, 5
                    if ind == 5:
                        at, type = r_i, 3
        return at, type
    oneSelectorToCover = staticmethod(oneSelectorToCover)
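
# A self-contained sketch (illustrative names, not part of the Orange API)
# of the bitmask encoding used by RuleCoversArguments.conditionIndex:
# a discrete condition maps to 1 (0001), "<="/"<" to 3 (0011), and
# ">="/">" to 5 (0101); OR-ing the two continuous masks yields 7 (0111),
# i.e. the attribute is bounded from both sides.
import operator
from functools import reduce

_DISCRETE, _LESS_EQ, _GREATER = 1, 3, 5

def _combine_masks(masks):
    # OR together the index masks of all conditions on one attribute
    return reduce(operator.or_, masks, 0)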


class SelectorAdder(Orange.core.RuleBeamRefiner):
    """
    A refiner function that adds selectors to rules;
       - refined rules are not consistent with any of the negative arguments.
    """
    def __init__(self, example=None, not_allowed_selectors=[], argument_id=None,
                 discretizer=Orange.feature.discretization.Entropy(forceAttribute=True)):
        # required values - needed values of attributes
        self.example = example
        self.argument_id = argument_id
        self.not_allowed_selectors = not_allowed_selectors
        self.discretizer = discretizer

    def __call__(self, oldRule, data, weight_id, target_class= -1):
        inNotAllowedSelectors = RuleCoversArguments(self.not_allowed_selectors)
        new_rules = Orange.core.RuleList()

        # get positive indices (selectors already in the rule)
        indices = getattr(oldRule.filter, "indices", None)
        if not indices:
            indices = RuleCoversArguments.filterIndices(oldRule.filter)
            oldRule.filter.setattr("indices", indices)

        # get negative indices (selectors that should not be in the rule)
        negative_indices = [0] * len(data.domain.attributes)
        for nA in self.not_allowed_selectors:
            at_i, type_na = RuleCoversArguments.oneSelectorToCover(indices, nA.filter.indices)
            if at_i > -1:
                negative_indices[at_i] = operator.or_(negative_indices[at_i], type_na)

        # iterate through indices = attributes
        for i, ind in enumerate(indices):
            if not self.example[i] or self.example[i].isSpecial():
                continue
            if ind == 1:
                continue
            if data.domain[i].varType == Orange.core.VarTypes.Discrete and not negative_indices[i] == 1: # DISCRETE attribute
                if self.example:
                    values = [self.example[i]]
                else:
                    values = data.domain[i].values
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_discrete(position=i,
                                                                                  values=[Orange.core.Value(data.domain[i], v)],
                                                                                  acceptSpecial=0))
                    tempRule.complexity += 1
                    tempRule.filter.indices[i] = 1 # 1 stands for a discrete attribute (see RuleCoversArguments.conditionIndex)
                    tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                    if len(tempRule.examples) < len(oldRule.examples):
                        new_rules.append(tempRule)
            elif data.domain[i].varType == Orange.core.VarTypes.Continuous and not negative_indices[i] == 7: # CONTINUOUS attribute
                try:
                    at = data.domain[i]
                    at_d = self.discretizer(at, oldRule.examples)
                except:
                    continue # discretization failed
                # proceed only if discretization produced more than one value
                if len(at_d.values) > 1:
                    for p in at_d.getValueFrom.transformer.points:
                        # LESS
                        if not negative_indices[i] == 3:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.LessEqual, p, target_class, 3)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] <= p:# and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
                        # GREATER
                        if not negative_indices[i] == 5:
                            tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.Greater, p, target_class, 5)
                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] > p:# and not inNotAllowedSelectors(tempRule):
                                new_rules.append(tempRule)
        for r in new_rules:
            r.parentRule = oldRule
            r.valuesFilter = r.filter.filter
        return new_rules

    def getTempRule(self, oldRule, pos, oper, ref, target_class, atIndex):
        tempRule = oldRule.clone()

        tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=pos,
                                                                        oper=oper,
                                                                        ref=ref,
                                                                        acceptSpecial=0))
        tempRule.complexity += 1
        tempRule.filter.indices[pos] = operator.or_(tempRule.filter.indices[pos], atIndex) # from RuleCoversArguments.conditionIndex
        tempRule.filterAndStore(oldRule.examples, tempRule.weightID, target_class)
        return tempRule

    def setCondition(self, oldRule, target_class, ci, condition):
        tempRule = oldRule.clone()
        tempRule.filter.conditions[ci] = condition
        tempRule.filter.conditions[ci].setattr("specialized", 1)
        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
        return tempRule

SelectorAdder = deprecated_members({"notAllowedSelectors": "not_allowed_selectors",
                     "argumentID": "argument_id"})(SelectorAdder)

# Note: this filter implements its own deep_copy because inheriting
# deepCopy from Orange filters proved problematic; it should be revisited.
class ArgFilter(Orange.core.Filter):
    """ This class implements the AB-covering principle. """
    def __init__(self, argument_id=None, filter=Orange.core.Filter_values(), arg_example=None):
        self.filter = filter
        self.indices = getattr(filter, "indices", [])
        if not self.indices and len(filter.conditions) > 0:
            self.indices = RuleCoversArguments.filterIndices(filter)
        self.argument_id = argument_id
        self.domain = self.filter.domain
        self.conditions = filter.conditions
        self.arg_example = arg_example

    def condIn(self, cond): # is the condition in the filter?
        condInd = RuleCoversArguments.conditionIndex(cond)
        return operator.or_(condInd, self.indices[cond.position]) == self.indices[cond.position]

    def __call__(self, example):
        if self.filter(example) and example != self.arg_example:
            return True
        elif self.filter(example):
            try:
                if example[self.argument_id].value and len(example[self.argument_id].value.positiveArguments) > 0: # example has positive arguments
                    # conditions should cover at least one of the positive arguments
                    oneArgCovered = False
                    for pA in example[self.argument_id].value.positiveArguments:
                        argCovered = [self.condIn(c) for c in pA.filter.conditions]
                        oneArgCovered = oneArgCovered or len(argCovered) == sum(argCovered)
                        if oneArgCovered:
                            break
                    if not oneArgCovered:
                        return False
                if example[self.argument_id].value and len(example[self.argument_id].value.negativeArguments) > 0: # example has negative arguments
                    # conditions should not cover any of the negative arguments
                    for pN in example[self.argument_id].value.negativeArguments:
                        argCovered = [self.condIn(c) for c in pN.filter.conditions]
                        if len(argCovered) == sum(argCovered):
                            return False
            except:
                return True
            return True
        else:
            return False

    def __setattr__(self, name, obj):
        self.__dict__[name] = obj
        self.filter.setattr(name, obj)

    def deep_copy(self):
        newFilter = ArgFilter(argument_id=self.argument_id)
        newFilter.filter = Orange.core.Filter_values()
        newFilter.filter.conditions = self.filter.conditions[:]
        newFilter.domain = self.filter.domain
        newFilter.negate = self.filter.negate
        newFilter.conjunction = self.filter.conjunction
        newFilter.conditions = newFilter.filter.conditions
        newFilter.indices = self.indices[:]
        return newFilter

ArgFilter = deprecated_members({"argumentID": "argument_id"})(ArgFilter)
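
# A simplified pure-Python sketch of the AB-covering test in
# ArgFilter.__call__ above (conditions modeled as plain strings; the helper
# name is illustrative, not part of the Orange API): an example that carries
# arguments is kept only if the rule's conditions include all conditions of
# at least one positive argument and do not include all conditions of any
# negative argument.
def _ab_covers_sketch(rule_conds, positive_args, negative_args):
    rule = set(rule_conds)
    if positive_args and not any(set(a) <= rule for a in positive_args):
        return False  # no positive argument is fully covered
    if any(set(a) <= rule for a in negative_args):
        return False  # some negative argument is fully covered
    return True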

class SelectorArgConditions(Orange.core.RuleBeamRefiner):
    """
    A refiner function that specializes rules using the allowed selectors;
      - refined rules are not consistent with any of the negative arguments.
    """
    def __init__(self, example, allowed_selectors):
        # required values - needed values of attributes
        self.example = example
        self.allowed_selectors = allowed_selectors

    def __call__(self, oldRule, data, weight_id, target_class= -1):
        if len(oldRule.filter.conditions) >= len(self.allowed_selectors):
            return Orange.core.RuleList()
        new_rules = Orange.core.RuleList()
        for c in self.allowed_selectors:
            # normal condition
            if not c.unspecialized_condition:
                tempRule = oldRule.clone()
                tempRule.filter.conditions.append(c)
                tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                if len(tempRule.examples) < len(oldRule.examples):
                    new_rules.append(tempRule)
            # unspecialized condition
            else:
                # find all possible example values
                vals = {}
                for e in oldRule.examples:
                    if not e[c.position].isSpecial():
                        vals[str(e[c.position])] = 1
                values = vals.keys()
                # make a condition for each value
                for v in values:
                    tempRule = oldRule.clone()
                    tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=c.position,
                                                                                    oper=c.oper,
                                                                                    ref=float(v),
                                                                                    acceptSpecial=0))
                    if tempRule(self.example):
                        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
                        if len(tempRule.examples) < len(oldRule.examples):
                            new_rules.append(tempRule)
        for r in new_rules:
            r.parentRule = oldRule
        return new_rules


class CrossValidation:
    def __init__(self, folds=5, random_generator=150):
        self.folds = folds
        self.random_generator = random_generator

    def __call__(self, learner, examples, weight):
        res = orngTest.crossValidation([learner], (examples, weight), folds=self.folds, random_generator=self.random_generator)
        return self.get_prob_from_res(res, examples)

    def get_prob_from_res(self, res, examples):
        prob_dist = Orange.core.DistributionList()
        for tex in res.results:
            d = Orange.core.Distribution(examples.domain.class_var)
            for di in range(len(d)):
                d[di] = tex.probabilities[0][di]
            prob_dist.append(d)
        return prob_dist


class PILAR:
    """
    PILAR (Probabilistic improvement of learning algorithms with rules).
    """
    def __init__(self, alternative_learner=None, min_cl_sig=0.5, min_beta=0.0, set_prefix_rules=False, optimize_betas=True):
        self.alternative_learner = alternative_learner
        self.min_cl_sig = min_cl_sig
        self.min_beta = min_beta
        self.set_prefix_rules = set_prefix_rules
        self.optimize_betas = optimize_betas
        self.selected_evaluation = CrossValidation(folds=5)

    def __call__(self, rules, examples, weight=0):
        rules = self.add_null_rule(rules, examples, weight)
        if self.alternative_learner:
            prob_dist = self.selected_evaluation(self.alternative_learner, examples, weight)
            classifier = self.alternative_learner(examples, weight)
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist)
        else:
            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas)

        for ri, r in enumerate(cl.rules):
            cl.rules[ri].setattr("beta", cl.ruleBetas[ri])
        cl.all_rules = cl.rules
        cl.rules = self.sort_rules(cl.rules)
        cl.ruleBetas = [r.beta for r in cl.rules]
        cl.setattr("data", examples)
        return cl

    def add_null_rule(self, rules, examples, weight):
        for cl in examples.domain.class_var:
            tmpRle = Orange.core.Rule()
            tmpRle.filter = Orange.core.Filter_values(domain=examples.domain)
            tmpRle.parentRule = None
            tmpRle.filterAndStore(examples, weight, int(cl))
            tmpRle.quality = tmpRle.class_distribution[int(cl)] / tmpRle.class_distribution.abs
            rules.append(tmpRle)
        return rules

    def sort_rules(self, rules):
        new_rules = Orange.core.RuleList()
        foundRule = True
        while foundRule:
            foundRule = False
            bestRule = None
            for r in rules:
                if r in new_rules:
                    continue
                if -0.01 < r.beta < 0.01: # skip rules with a negligible weight
                    continue
                if not bestRule:
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) < len(bestRule.filter.conditions):
                    bestRule = r
                    foundRule = True
                    continue
                if len(r.filter.conditions) == len(bestRule.filter.conditions) and r.beta > bestRule.beta:
                    bestRule = r
                    foundRule = True
                    continue
            if bestRule:
                new_rules.append(bestRule)
        return new_rules

PILAR = deprecated_members({"sortRules": "sort_rules"})(PILAR)
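
# A pure-Python sketch of the ordering implemented by PILAR.sort_rules
# above (rules modeled as (n_conditions, beta) tuples; the helper name is
# illustrative only): repeatedly pick the best remaining rule, preferring
# fewer conditions and, on ties, a larger beta, while dropping rules whose
# beta is negligibly small (|beta| < 0.01).
def _sort_rules_sketch(rules):
    remaining = [r for r in rules if not -0.01 < r[1] < 0.01]
    ordered = []
    while remaining:
        best = remaining[0]
        for r in remaining[1:]:
            if r[0] < best[0] or (r[0] == best[0] and r[1] > best[1]):
                best = r
        ordered.append(best)
        remaining.remove(best)
    return ordered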


class RuleClassifier_bestRule(Orange.core.RuleClassifier):
    """
    A very simple classifier: it takes the best rule for each class and
    normalizes the probabilities.
    """
    def __init__(self, rules, examples, weight_id=0, **argkw):
        self.rules = rules
        self.examples = examples
        self.apriori = Orange.core.Distribution(examples.domain.class_var, examples, weight_id)
        self.apriori_prob = [a / self.apriori.abs for a in self.apriori]
        self.weight_id = weight_id
        self.__dict__.update(argkw)
        self.default_class_index = -1

    @deprecated_keywords({"retRules": "ret_rules"})
    def __call__(self, example, result_type=Orange.classification.Classifier.GetValue, ret_rules=False):
        example = Orange.core.Example(self.examples.domain, example)
        tempDist = Orange.core.Distribution(example.domain.class_var)
        best_rules = [None] * len(example.domain.class_var.values)

        for r in self.rules:
            if r(example) and not self.default_class_index == int(r.classifier.default_val) and \
               (not best_rules[int(r.classifier.default_val)] or r.quality > tempDist[r.classifier.default_val]):
                tempDist[r.classifier.default_val] = r.quality
                best_rules[int(r.classifier.default_val)] = r
        for b in best_rules:
            if b:
                used = getattr(b, "used", 0.0)
                b.setattr("used", used + 1)
        nonCovPriorSum = sum(p for i, p in enumerate(self.apriori_prob) if tempDist[i] == 0.)
        if tempDist.abs < 1.:
            # distribute the residue among classes without a covering rule,
            # proportionally to their prior probabilities
            residue = 1. - tempDist.abs
            for a_i, a in enumerate(self.apriori_prob):
                if tempDist[a_i] == 0.:
                    tempDist[a_i] = self.apriori_prob[a_i] * residue / nonCovPriorSum
            final_dist = tempDist
        else:
            tempDist.normalize() # prior probability
            tmp_examples = Orange.core.ExampleTable(self.examples)
            for r in best_rules:
                if r:
                    tmp_examples = r.filter(tmp_examples)
            tmpDist = Orange.core.Distribution(tmp_examples.domain.class_var, tmp_examples, self.weight_id)
            tmpDist.normalize()
            probs = [0.] * len(self.examples.domain.class_var.values)
            for i in range(len(self.examples.domain.class_var.values)):
                probs[i] = tmpDist[i] + tempDist[i] * 2
            final_dist = Orange.core.Distribution(self.examples.domain.class_var)
            for cl_i, cl in enumerate(self.examples.domain.class_var):
                final_dist[cl] = probs[cl_i]
            final_dist.normalize()

        if ret_rules: # return the rules along with the classification?
            if result_type == Orange.classification.Classifier.GetValue:
                return (final_dist.modus(), best_rules)
            if result_type == Orange.core.GetProbabilities:
                return (final_dist, best_rules)
            return (final_dist.modus(), final_dist, best_rules)
        if result_type == Orange.classification.Classifier.GetValue:
            return final_dist.modus()
        if result_type == Orange.core.GetProbabilities:
            return final_dist
        return (final_dist.modus(), final_dist)

RuleClassifier_bestRule = deprecated_members({"defaultClassIndex": "default_class_index"})(RuleClassifier_bestRule)
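
# A pure-Python sketch of the uncovered-class handling in
# RuleClassifier_bestRule.__call__ above (plain lists; the helper name is
# illustrative only): when the best-rule qualities sum to less than 1, the
# residue is split among classes without a covering rule, proportionally
# to their prior probabilities.
def _redistribute_sketch(rule_quality, priors):
    residue = 1.0 - sum(rule_quality)
    uncov_prior = sum(p for q, p in zip(rule_quality, priors) if q == 0.0)
    dist = list(rule_quality)
    for i, q in enumerate(rule_quality):
        if q == 0.0:
            dist[i] = priors[i] * residue / uncov_prior
    return dist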