source: orange/orange/classification/rules.py @ 9669:165371b04b4a

Revision 9669:165371b04b4a, 108.6 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved content of Orange dir to package dir

Line 
1"""
2
3.. index:: rule induction
4
5.. index::
6   single: classification; rule induction
7
8**************************
9Rule induction (``rules``)
10**************************
11
12This module implements supervised rule induction algorithms
13and rule-based classification methods, specifically the
14`CN2 induction algorithm <http://www.springerlink.com/content/k6q2v76736w5039r/>`_
15in multiple variants, including an argument-based learning one.
16The implementation is modular, based on the rule induction
17framework that is described below, providing the opportunity to change, specialize
18and improve the algorithm.
19
20CN2 algorithm
21=============
22
23.. index::
24   single: classification; CN2
25
26Several variations of well-known CN2 rule learning algorithms are implemented.
27All are implemented by wrapping the
28:class:`~Orange.classification.rules.RuleLearner` class. Each CN2 learner class
29in this module changes some of RuleLearner's replaceable components to reflect
30the required behavior.
31
32Usage is consistent with typical learner usage in Orange:
33
34:download:`rules-cn2.py <code/rules-cn2.py>` (uses :download:`titanic.tab <code/titanic.tab>`)
35
36.. literalinclude:: code/rules-cn2.py
37    :lines: 7-
38
39The result::
40   
41    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
42    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
43    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
44    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
45    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
46    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
47    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
48    IF status=['second'] THEN survived=yes<13.000, 80.000>
49    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
50    IF status=['crew'] THEN survived=no<670.000, 192.000>
51    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
52    IF sex=['male'] THEN survived=no<118.000, 57.000>
53    IF age=['child'] THEN survived=no<17.000, 14.000>
54    IF TRUE THEN survived=no<89.000, 76.000>
55   
56.. autoclass:: Orange.classification.rules.CN2Learner
57   :members:
58   :show-inheritance:
59   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
60      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
61   
62.. autoclass:: Orange.classification.rules.CN2Classifier
63   :members:
64   :show-inheritance:
65   :exclude-members: beamWidth, resultType
66   
67.. index:: unordered CN2
68
69.. index::
70   single: classification; unordered CN2
71
72.. autoclass:: Orange.classification.rules.CN2UnorderedLearner
73   :members:
74   :show-inheritance:
75   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
76      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
77   
78.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier
79   :members:
80   :show-inheritance:
81   
82.. index:: CN2-SD
83.. index:: subgroup discovery
84
85.. index::
86   single: classification; CN2-SD
87   
88.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner
89   :members:
90   :show-inheritance:
91   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
92      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
93   
94.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner
95   :members:
96   :show-inheritance:
97   
98References
99----------
100
101* Clark, Niblett. `The CN2 Induction Algorithm
102  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine
103  Learning 3(4):261--284, 1989.
104* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements
105  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In
106  Machine Learning - EWSL-91. Proceedings of the European Working Session on
107  Learning, pp 151--163, Porto, Portugal, March 1991.
108* Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD
109  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal
110  of Machine Learning Research 5: 153-188, 2004.
111
112
113Argument based CN2
114==================
115
116Orange also supports argument-based CN2 learning.
117
118.. autoclass:: Orange.classification.rules.ABCN2
119   :members:
120   :show-inheritance:
121   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
122      ruleFinder, ruleStopping, storeInstances, targetClass, weightID,
123      argument_id
124   
125   This class has many more undocumented methods; see the source code for
126   reference.
127   
128.. autoclass:: Orange.classification.rules.ABCN2Ordered
129   :members:
130   :show-inheritance:
131   
132.. autoclass:: Orange.classification.rules.ABCN2M
133   :members:
134   :show-inheritance:
135   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
136      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
137
138Thismodule has many more undocumented argument-based learning related classed;
139see the source code for reference.
140
141References
142----------
143
144* Bratko, Mozina, Zabkar. `Argument-Based Machine Learning
145  <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in
146  Computer Science: vol. 4203/2006, 11-17, 2006.
147
148
149Rule induction framework
150========================
151
152A general framework of classes supports the described CN2 implementation, and
153can in fact be fine-tuned to specific needs by replacing individual components.
154Here is a simple example, while a detailed architecture can be observed
155in description of classes that follows it:
156
157part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)
158
159.. literalinclude:: code/rules-customized.py
160    :lines: 7-17
161
162probability with m=50. The result is::
163
164    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
165    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
166    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
167    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
168    IF status=['second'] THEN survived=yes<13.000, 104.000>
169    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
170    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
171    IF status=['crew'] THEN survived=yes<3.000, 20.000>
172    IF sex=['female'] THEN survived=no<106.000, 90.000>
173    IF TRUE THEN survived=yes<0.000, 5.000>
174
175Notice that it is first necessary to set the :obj:`rule_finder` component,
176because the default components are not constructed when the learner is
177constructed, but only when we run it on data. At that time, the algorithm
178checks which components are necessary and sets defaults. Similarly, when the
179learner finishes, it destructs all *default* components. Continuing with our
180example, assume that we wish to set a different validation function and a
181different bean width. This is simply written as:
182
183part of :download:`rules-customized.py <code/rules-customized.py>` (uses :download:`titanic.tab <code/titanic.tab>`)
184
185.. literalinclude:: code/rules-customized.py
186    :lines: 19-23
187
188.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)
189   
190   Representation of a single induced rule.
191   
192   Parameters, that can be passed to the constructor, correspond to the first
193   7 attributes. All attributes are:
194   
195   .. attribute:: filter
196   
197      contents of the rule; this is the basis of the Rule class. Must be of
198      type :class:`Orange.core.Filter`; an instance of
199      :class:`Orange.core.Filter_values` is set as a default.
200   
201   .. attribute:: classifier
202     
203      each rule can be used as a classical Orange like
204      classifier. Must be of type :class:`Orange.classification.Classifier`.
205      By default, an instance of :class:`Orange.classification.ConstantClassifier` is used.
206   
207   .. attribute:: learner
208     
209      learner to be used for making a classifier. Must be of type
210      :class:`Orange.classification.Learner`. By default,
211      :class:`Orange.classification.majority.MajorityLearner` is used.
212   
213   .. attribute:: class_distribution
214     
215      distribution of class in data instances covered by this rule
216      (:class:`Orange.statistics.distribution.Distribution`).
217   
218   .. attribute:: examples
219     
220      data instances covered by this rule (:class:`Orange.data.Table`).
221   
222   .. attribute:: weight_id
223   
224      ID of the weight meta-attribute for the stored data instances (int).
225   
226   .. attribute:: quality
227     
228      quality of the rule. Rules with higher quality are better (float).
229   
230   .. attribute:: complexity
231   
232      complexity of the rule (float). Complexity is used for
233      selecting between rules with equal quality, where rules with lower
234      complexity are preferred. Typically, complexity corresponds to the
235      number of selectors in rule (actually to number of conditions in filter),
236      but, obviously, any other measure can be applied.
237   
238   .. method:: filter_and_store(instances, weight_id=0, target_class=-1)
239   
240      Filter passed data instances and store them in the attribute 'examples'.
241      Also, compute class_distribution, set weight of stored examples and create
242      a new classifier using 'learner' attribute.
243     
244      :param weight_id: ID of the weight meta-attribute.
245      :type weight_id: int
246      :param target_class: index of target class; -1 for all.
247      :type target_class: int
248   
249   Objects of this class can be invoked:
250
251   .. method:: __call__(instance, instances, weight_id=0, target_class=-1)
252   
253      There are two ways of invoking this method. One way is only passing the
254      data instance; then the Rule object returns True if the given instance is
255      covered by the rule's filter.
256     
257      :param instance: data instance.
258      :type instance: :class:`Orange.data.Instance`
259     
260      Another way of invocation is passing a table of data instances,
261      in which case a table of instances, covered by this rule, is returned.
262     
263      :param instances: a table of data instances.
264      :type instances: :class:`Orange.data.Table`
265      :param ref: TODO
266      :type ref: bool
267      :param negate: if set to True, the result is inverted: the resulting
268          table contains instances *not* covered by the rule.
269      :type negate: bool
270
271.. py:class:: Orange.classification.rules.RuleLearner(store_instances = true, target_class = -1, base_rules = Orange.classification.rules.RuleList())
272   
273   Bases: :class:`Orange.classification.Learner`
274   
275   A base rule induction learner. The algorithm follows separate-and-conquer
276   strategy, which has its origins in the AQ family of algorithms
277   (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence
278   Review 13, 3-54, 1999). Basically, such algorithms search for the "best"
279   possible rule in learning instancess, remove covered data from learning
280   instances (separate) and repeat the process (conquer) on the remaining
281   instances.
282   
283   The class' functionality can be best explained by showing its __call__
284   function:
285   
286   .. parsed-literal::
287
288      def \_\_call\_\_(self, instances, weight_id=0):
289          rule_list = Orange.classification.rules.RuleList()
290          all_instances = Orange.data.Table(instances)
291          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class):
292              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class,
293                                        self.base_rules)
294              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id):
295                  break
296              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances,
297                                                      weight_id, self.target_class)
298              rule_list.append(new_rule)
299          return Orange.classification.rules.RuleClassifier_FirstRule(
300              rules=rule_list, instances=all_instances)
301               
302   The four customizable components here are the invoked :obj:`data_stopping`,
303   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping`
304   objects. By default, components of the original CN2 algorithm are be used,
305   but this can be changed by modifying those attributes:
306   
307   .. attribute:: data_stopping
308   
309      an object of class
310      :class:`~Orange.classification.rules.RuleDataStoppingCriteria`
311      that determines whether there will be any benefit from further learning
312      (ie. if there is enough data to continue learning). The default
313      implementation
314      (:class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives`)
315      returns True if there are no more instances of given class.
316   
317   .. attribute:: rule_stopping
318     
319      an object of class
320      :class:`~Orange.classification.rules.RuleStoppingCriteria`
321      that decides from the last rule learned if it is worthwhile to use the
322      rule and learn more rules. By default, no rule stopping criteria is
323      used (:obj:`rule_stopping` == :obj:`None`), thus accepting all
324      rules.
325       
326   .. attribute:: cover_and_remove
327       
328      an object of
329      :class:`RuleCovererAndRemover` that removes
330      instances covered by the rule and returns remaining instances. The
331      default implementation
332      (:class:`RuleCovererAndRemover_Default`)
333      only removes the instances that belong to given target class, except if
334      it is not given (ie. :obj:`target_class` == -1).
335   
336   .. attribute:: rule_finder
337     
338      an object of class
339      :class:`~Orange.classification.rules.RuleFinder` that learns a single
340      rule from instances. Default implementation is
341      :class:`~Orange.classification.rules.RuleBeamFinder`.
342
343   :param store_instances: if set to True, the rules will have data instances
344       stored.
345   :type store_instances: bool
346   
347   :param target_class: index of a specific class being learned; -1 for all.
348   :type target_class: int
349   
350   :param base_rules: Rules that we would like to use in :obj:`rule_finder` to
351       constrain the learning space. If not set, it will be set to a set
352       containing only an empty rule.
353   :type base_rules: :class:`~Orange.classification.rules.RuleList`
354
355Rule finders
356------------
357
358.. class:: Orange.classification.rules.RuleFinder
359
360   Base class for all rule finders. These are used to learn a single rule from
361   instances.
362   
363   Rule finders are invokable in the following manner:
364   
365   .. method:: __call__(table, weight_id, target_class, base_rules)
366   
367      Return a new rule, induced from instances in the given table.
368     
369      :param table: data instances to learn from.
370      :type table: :class:`Orange.data.Table`
371     
372      :param weight_id: ID of the weight meta-attribute for the stored data
373          instances.
374      :type weight_id: int
375     
376      :param target_class: index of a specific class being learned; -1 for all.
377      :type target_class: int
378     
379      :param base_rules: Rules that we would like to use in :obj:`rule_finder`
380          to constrain the learning space. If not set, it will be set to a set
381          containing only an empty rule.
382      :type base_rules: :class:`~Orange.classification.rules.RuleList`
383
384.. class:: Orange.classification.rules.RuleBeamFinder
385   
386   Bases: :class:`~Orange.classification.rules.RuleFinder`
387   
388   Beam search for the best rule. This is the default class used in RuleLearner
389   to find the best rule. Pseudo code of the algorithm is shown here:
390
391   .. parsed-literal::
392
393      def \_\_call\_\_(self, table, weight_id, target_class, base_rules):
394          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
395          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior)
396          \# compute quality of rules in rules_star and best_rule
397          ...
398          while len(rules_star) \> 0:
399              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id)
400              for cand in candidates:
401                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class)
402                  for new_rule in new_rules:
403                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution):
404                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior)
405                          rules_star.append(new_rule)
406                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and
407                              new_rule.quality \> best_rule.quality:
408                              best_rule = new_rule
409              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id)
410          return best_rule
411
412   Bolded in the pseudo-code are several exchangeable components, exposed as
413   attributes. These are:
414
415   .. attribute:: initializer
416   
417      an object of class
418      :class:`~Orange.classification.rules.RuleBeamInitializer`
419      used to initialize :obj:`rules_star` and for selecting the
420      initial best rule. By default
421      (:class:`~Orange.classification.rules.RuleBeamInitializer_Default`),
422      :obj:`base_rules` are returned as starting :obj:`rulesSet` and the best
423      from :obj:`base_rules` is set as :obj:`best_rule`. If :obj:`base_rules`
424      are not set, this class will return :obj:`rules_star` with rule that
425      covers all instances (has no selectors) and this rule will be also used
426      as :obj:`best_rule`.
427   
428   .. attribute:: candidate_selector
429   
430      an object of class
431      :class:`~Orange.classification.rules.RuleBeamCandidateSelector`
432      used to separate a subset from the current
433      :obj:`rules_star` and return it. These rules will be used in the next
434      specification step. Default component (an instance of
435      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`)
436      takes all rules in :obj:`rules_star`.
437   
438   .. attribute:: refiner
439   
440      an object of class
441      :class:`~Orange.classification.rules.RuleBeamRefiner`
442      used to refine given rule. New rule should cover a
443      strict subset of examples covered by given rule. Default component
444      (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`) adds
445      a conjunctive selector to selectors present in the rule.
446   
447   .. attribute:: rule_filter
448   
449      an object of class
450      :class:`~Orange.classification.rules.RuleBeamFilter`
451      used to filter rules to keep beam relatively small
452      to contain search complexity. By default, it takes five best rules:
453      :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ *(m=5)*\ .
454
455   .. method:: __call__(data, weight_id, target_class, base_rules)
456
457   Determines the next best rule to cover the remaining data instances.
458   
459   :param data: data instances.
460   :type data: :class:`Orange.data.Table`
461   
462   :param weight_id: index of the weight meta-attribute.
463   :type weight_id: int
464   
465   :param target_class: index of the target class.
466   :type target_class: int
467   
468   :param base_rules: existing rules.
469   :type base_rules: :class:`~Orange.classification.rules.RuleList`
470
471Rule evaluators
472---------------
473
474.. class:: Orange.classification.rules.RuleEvaluator
475
476   Base class for rule evaluators that evaluate the quality of the rule based
477   on covered data instances. All evaluators support being invoked in the
478   following manner:
479   
480   .. method:: __call__(rule, instances, weight_id, target_class, prior)
481   
482      Calculates a non-negative rule quality.
483     
484      :param rule: rule to evaluate.
485      :type rule: :class:`~Orange.classification.rules.Rule`
486     
487      :param instances: a table of instances, covered by the rule.
488      :type instances: :class:`Orange.data.Table`
489     
490      :param weight_id: index of the weight meta-attribute.
491      :type weight_id: int
492     
493      :param target_class: index of target class of this rule.
494      :type target_class: int
495     
496      :param prior: prior class distribution.
497      :type prior: :class:`Orange.statistics.distribution.Distribution`
498
499.. autoclass:: Orange.classification.rules.LaplaceEvaluator
500   :members:
501   :show-inheritance:
502   :exclude-members: targetClass, weightID
503
504.. autoclass:: Orange.classification.rules.WRACCEvaluator
505   :members:
506   :show-inheritance:
507   :exclude-members: targetClass, weightID
508   
509.. class:: Orange.classification.rules.RuleEvaluator_Entropy
510
511   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
512   
513.. class:: Orange.classification.rules.RuleEvaluator_LRS
514
515   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
516
517.. class:: Orange.classification.rules.RuleEvaluator_Laplace
518
519   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
520
521.. class:: Orange.classification.rules.RuleEvaluator_mEVC
522
523   Bases: :class:`~Orange.classification.rules.RuleEvaluator`
524   
525Instance covering and removal
526-----------------------------
527
528.. class:: RuleCovererAndRemover
529
530   Base class for rule coverers and removers that, when invoked, remove
531   instances covered by the rule and return remaining instances.
532
533   .. method:: __call__(rule, instances, weights, target_class)
534   
535      Calculates a non-negative rule quality.
536     
537      :param rule: rule to evaluate.
538      :type rule: :class:`~Orange.classification.rules.Rule`
539     
540      :param instances: a table of instances, covered by the rule.
541      :type instances: :class:`Orange.data.Table`
542     
543      :param weights: index of the weight meta-attribute.
544      :type weights: int
545     
546      :param target_class: index of target class of this rule.
547      :type target_class: int
548
549.. autoclass:: CovererAndRemover_MultWeights
550
551.. autoclass:: CovererAndRemover_AddWeights
552   
553Miscellaneous functions
554-----------------------
555
556.. automethod:: Orange.classification.rules.rule_to_string
557
558..
559    Undocumented are:
560    Data-based Stopping Criteria
561    ----------------------------
562    Rule-based Stopping Criteria
563    ----------------------------
564    Rule-based Stopping Criteria
565    ----------------------------
566
567"""
568
569import random
570import math
571import operator
572import numpy
573
574import Orange
575import Orange.core
576from Orange.core import \
577    AssociationClassifier, \
578    AssociationLearner, \
579    RuleClassifier, \
580    RuleClassifier_firstRule, \
581    RuleClassifier_logit, \
582    RuleLearner, \
583    Rule, \
584    RuleBeamCandidateSelector, \
585    RuleBeamCandidateSelector_TakeAll, \
586    RuleBeamFilter, \
587    RuleBeamFilter_Width, \
588    RuleBeamInitializer, \
589    RuleBeamInitializer_Default, \
590    RuleBeamRefiner, \
591    RuleBeamRefiner_Selector, \
592    RuleClassifierConstructor, \
593    RuleCovererAndRemover, \
594    RuleCovererAndRemover_Default, \
595    RuleDataStoppingCriteria, \
596    RuleDataStoppingCriteria_NoPositives, \
597    RuleEvaluator, \
598    RuleEvaluator_Entropy, \
599    RuleEvaluator_LRS, \
600    RuleEvaluator_Laplace, \
601    RuleEvaluator_mEVC, \
602    RuleFinder, \
603    RuleBeamFinder, \
604    RuleList, \
605    RuleStoppingCriteria, \
606    RuleStoppingCriteria_NegativeDistribution, \
607    RuleValidator, \
608    RuleValidator_LRS
609from Orange.misc import deprecated_keywords
610from Orange.misc import deprecated_members
611
612from orngABML import *
613
614class LaplaceEvaluator(RuleEvaluator):
615    """
616    Laplace's rule of succession.
617    """
618    def __call__(self, rule, data, weight_id, target_class, apriori):
619        if not rule.class_distribution:
620            return 0.
621        sumDist = rule.class_distribution.cases
622        if not sumDist or (target_class>-1 and not rule.class_distribution[target_class]):
623            return 0.
624        # get distribution
625        if target_class>-1:
626            return (rule.class_distribution[target_class]+1)/(sumDist+2)
627        else:
628            return (max(rule.class_distribution)+1)/(sumDist+len(data.domain.class_var.values))
629
630LaplaceEvaluator = deprecated_members({"weightID": "weight_id",
631                                       "targetClass": "target_class"})(LaplaceEvaluator)
632
633
634class WRACCEvaluator(RuleEvaluator):
635    """
636    Weighted relative accuracy.
637    """
638    def __call__(self, rule, data, weight_id, target_class, apriori):
639        if not rule.class_distribution:
640            return 0.
641        sumDist = rule.class_distribution.cases
642        if not sumDist or (target_class>-1 and not rule.class_distribution[target_class]):
643            return 0.
644        # get distribution
645        if target_class>-1:
646            pRule = rule.class_distribution[target_class]/apriori[target_class]
647            pTruePositive = rule.class_distribution[target_class]/sumDist
648            pClass = apriori[target_class]/apriori.cases
649        else:
650            pRule = sumDist/apriori.cases
651            pTruePositive = max(rule.class_distribution)/sumDist
652            pClass = apriori[rule.class_distribution.modus()]/sum(apriori)
653        if pTruePositive>pClass:
654            return pRule*(pTruePositive-pClass)
655        else: return (pTruePositive-pClass)/max(pRule,1e-6)
656
657WRACCEvaluator = deprecated_members({"weightID": "weight_id",
658                                     "targetClass": "target_class"})(WRACCEvaluator)
659
660
661class MEstimateEvaluator(RuleEvaluator):
662    """
663    Rule evaluator using m-estimate of probability rule evaluation function.
664   
665    :param m: m-value for m-estimate
666    :type m: int
667   
668    """
669    def __init__(self, m=2):
670        self.m = m
671    def __call__(self, rule, data, weight_id, target_class, apriori):
672        if not rule.class_distribution:
673            return 0.
674        sumDist = rule.class_distribution.abs
675        if self.m == 0 and not sumDist:
676            return 0.
677        # get distribution
678        if target_class>-1:
679            p = rule.class_distribution[target_class]+self.m*apriori[target_class]/apriori.abs
680            p = p / (rule.class_distribution.abs + self.m)
681        else:
682            p = max(rule.class_distribution)+self.m*apriori[rule.\
683                class_distribution.modus()]/apriori.abs
684            p = p / (rule.class_distribution.abs + self.m)     
685        return p
686
687MEstimateEvaluator = deprecated_members({"weightID": "weight_id",
688                                         "targetClass": "target_class"})(MEstimateEvaluator)
689
690
691class CN2Learner(RuleLearner):
692    """
693    Classical CN2 (see Clark and Niblett; 1988) induces a set of ordered
694    rules, which means that classificator must try these rules in the same
695    order as they were learned.
696   
697    If data instances are provided to the constructor, the learning algorithm
698    is called and the resulting classifier is returned instead of the learner.
699
700    :param evaluator: an object that evaluates a rule from covered instances.
701        By default, entropy is used as a measure.
702    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
703    :param beam_width: width of the search beam.
704    :type beam_width: int
705    :param alpha: significance level of the likelihood ratio statistics to
706        determine whether rule is better than the default rule.
707    :type alpha: float
708
709    """
710   
711    def __new__(cls, instances=None, weight_id=0, **kwargs):
712        self = RuleLearner.__new__(cls, **kwargs)
713        if instances is not None:
714            self.__init__(**kwargs)
715            return self.__call__(instances, weight_id)
716        else:
717            return self
718       
719    def __init__(self, evaluator = RuleEvaluator_Entropy(), beam_width = 5,
720        alpha = 1.0, **kwds):
721        self.__dict__.update(kwds)
722        self.rule_finder = RuleBeamFinder()
723        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width = beam_width)
724        self.rule_finder.evaluator = evaluator
725        self.rule_finder.validator = RuleValidator_LRS(alpha = alpha)
726       
727    def __call__(self, instances, weight=0):
728        supervisedClassCheck(instances)
729       
730        cl = RuleLearner.__call__(self,instances,weight)
731        rules = cl.rules
732        return CN2Classifier(rules, instances, weight)
733
734CN2Learner = deprecated_members({"beamWidth": "beam_width",
735                     "ruleFinder": "rule_finder",
736                     "ruleStopping": "rule_stopping",
737                     "dataStopping": "data_stopping",
738                     "coverAndRemove": "cover_and_remove",
739                     "storeInstances": "store_instances",
740                     "targetClass": "target_class",
741                     "baseRules": "base_rules",
742                     "weightID": "weight_id"})(CN2Learner)
743
744
745class CN2Classifier(RuleClassifier):
746    """
747    Classical CN2 (see Clark and Niblett; 1988) classifies a new instance
748    using an ordered set of rules. Usually the learner
749    (:class:`~Orange.classification.rules.CN2Learner`) is used to construct the
750    classifier.
751   
752    :param rules: learned rules to be used for classification (mandatory).
753    :type rules: :class:`~Orange.classification.rules.RuleList`
754   
755    :param instances: data instances that were used for learning.
756    :type instances: :class:`Orange.data.Table`
757   
758    :param weight_id: ID of the weight meta-attribute.
759    :type weight_id: int
760
761    """
762   
763    @deprecated_keywords({"examples": "instances"})
764    def __init__(self, rules=None, instances=None, weight_id = 0, **argkw):
765        self.rules = rules
766        self.examples = instances
767        self.weight_id = weight_id
768        self.class_var = None if instances is None else instances.domain.class_var
769        self.__dict__.update(argkw)
770        if instances is not None:
771            self.prior = Orange.statistics.distribution.Distribution(instances.domain.class_var,instances)
772
773    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
774        """
775        :param instance: instance to be classified.
776        :type instance: :class:`Orange.data.Instance`
777       
778        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
779              :class:`Orange.classification.Classifier.GetProbabilities` or
780              :class:`Orange.classification.Classifier.GetBoth`
781       
782        :rtype: :class:`Orange.data.Value`,
783              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
784        """
785        classifier = None
786        for r in self.rules:
787         #   r.filter.domain = instance.domain
788            if r(instance) and r.classifier:
789                classifier = r.classifier
790                classifier.defaultDistribution = r.class_distribution
791                break
792        if not classifier:
793            classifier = Orange.classification.ConstantClassifier(instance.domain.class_var,\
794                self.prior.modus())
795            classifier.defaultDistribution = self.prior
796
797        classifier.defaultDistribution.normalize()
798        if result_type == Orange.classification.Classifier.GetValue:
799          return classifier(instance)
800        if result_type == Orange.classification.Classifier.GetProbabilities:
801          return classifier.default_distribution
802        return (classifier(instance),classifier.default_distribution)
803
804    def __str__(self):
805        ret_str = rule_to_string(self.rules[0])+" "+str(self.rules[0].\
806            class_distribution)+"\n"
807        for r in self.rules[1:]:
808            ret_str += "ELSE "+rule_to_string(r)+" "+str(r.class_distribution)+"\n"
809        return ret_str
810
811CN2Classifier = deprecated_members({"resultType": "result_type",
812                                    "beamWidth": "beam_width"})(CN2Classifier)
813
814
815class CN2UnorderedLearner(RuleLearner):
816    """
817    CN2 unordered (see Clark and Boswell; 1991) induces a set of unordered
818    rules - classification from rules does not assume ordering of rules.
819    Learning rules is quite similar to learning in classical CN2, where
820    the process of learning of rules is separated to learning rules for each
821    class.
822   
823    If data instances are provided to the constructor, the learning algorithm
824    is called and the resulting classifier is returned instead of the learner.
825
826    :param evaluator: an object that evaluates a rule from covered instances.
827        By default, Laplace's rule of succession is used as a measure.
828    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
829    :param beam_width: width of the search beam.
830    :type beam_width: int
831    :param alpha: significance level of the likelihood ratio statistics to
832        determine whether rule is better than the default rule.
833    :type alpha: float
834    """
835    def __new__(cls, instances=None, weight_id=0, **kwargs):
836        self = RuleLearner.__new__(cls, **kwargs)
837        if instances is not None:
838            self.__init__(**kwargs)
839            return self.__call__(instances, weight_id)
840        else:
841            return self
842           
843    def __init__(self, evaluator = RuleEvaluator_Laplace(), beam_width = 5,
844        alpha = 1.0, **kwds):
845        self.__dict__.update(kwds)
846        self.rule_finder = RuleBeamFinder()
847        self.rule_finder.ruleFilter = RuleBeamFilter_Width(width = beam_width)
848        self.rule_finder.evaluator = evaluator
849        self.rule_finder.validator = RuleValidator_LRS(alpha = alpha)
850        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha = 1.0)
851        self.rule_stopping = RuleStopping_Apriori()
852        self.data_stopping = RuleDataStoppingCriteria_NoPositives()
853   
854    @deprecated_keywords({"weight": "weight_id"})
855    def __call__(self, instances, weight_id=0):
856        supervisedClassCheck(instances)
857       
858        rules = RuleList()
859        self.rule_stopping.apriori = Orange.statistics.distribution.Distribution(
860            instances.domain.class_var,instances)
861        progress=getattr(self,"progressCallback",None)
862        if progress:
863            progress.start = 0.0
864            progress.end = 0.0
865            distrib = Orange.statistics.distribution.Distribution(
866                instances.domain.class_var, instances, weight_id)
867            distrib.normalize()
868        for target_class in instances.domain.class_var:
869            if progress:
870                progress.start = progress.end
871                progress.end += distrib[target_class]
872            self.target_class = target_class
873            cl = RuleLearner.__call__(self,instances,weight_id)
874            for r in cl.rules:
875                rules.append(r)
876        if progress:
877            progress(1.0,None)
878        return CN2UnorderedClassifier(rules, instances, weight_id)
879
880CN2UnorderedLearner = deprecated_members({"beamWidth": "beam_width",
881                     "ruleFinder": "rule_finder",
882                     "ruleStopping": "rule_stopping",
883                     "dataStopping": "data_stopping",
884                     "coverAndRemove": "cover_and_remove",
885                     "storeInstances": "store_instances",
886                     "targetClass": "target_class",
887                     "baseRules": "base_rules",
888                     "weightID": "weight_id"})(CN2UnorderedLearner)
889
890
891class CN2UnorderedClassifier(RuleClassifier):
892    """
893    CN2 unordered (see Clark and Boswell; 1991) classifies a new instance using
894    a set of unordered rules. Usually the learner
895    (:class:`~Orange.classification.rules.CN2UnorderedLearner`) is used to
896    construct the classifier.
897   
898    :param rules: learned rules to be used for classification (mandatory).
899    :type rules: :class:`~Orange.classification.rules.RuleList`
900   
901    :param instances: data instances that were used for learning.
902    :type instances: :class:`Orange.data.Table`
903   
904    :param weight_id: ID of the weight meta-attribute.
905    :type weight_id: int
906
907    """
908
909    @deprecated_keywords({"examples": "instances"})
910    def __init__(self, rules = None, instances = None, weight_id = 0, **argkw):
911        self.rules = rules
912        self.examples = instances
913        self.weight_id = weight_id
914        self.class_var = instances.domain.class_var if instances is not None else None
915        self.__dict__.update(argkw)
916        if instances is not None:
917            self.prior = Orange.statistics.distribution.Distribution(
918                                instances.domain.class_var, instances)
919
920    @deprecated_keywords({"retRules": "ret_rules"})
921    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, ret_rules = False):
922        """
923        :param instance: instance to be classified.
924        :type instance: :class:`Orange.data.Instance`
925       
926        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
927              :class:`Orange.classification.Classifier.GetProbabilities` or
928              :class:`Orange.classification.Classifier.GetBoth`
929       
930        :rtype: :class:`Orange.data.Value`,
931              :class:`Orange.statistics.distribution.Distribution` or a tuple with both
932        """
933        def add(disc1, disc2, sumd):
934            disc = Orange.statistics.distribution.Discrete(disc1)
935            sumdisc = sumd
936            for i,d in enumerate(disc):
937                disc[i]+=disc2[i]
938                sumdisc += disc2[i]
939            return disc, sumdisc
940
941        # create empty distribution
942        retDist = Orange.statistics.distribution.Discrete(self.examples.domain.class_var)
943        covRules = RuleList()
944        # iterate through instances - add distributions
945        sumdisc = 0.
946        for r in self.rules:
947            if r(instance) and r.class_distribution:
948                retDist, sumdisc = add(retDist, r.class_distribution, sumdisc)
949                covRules.append(r)
950        if not sumdisc:
951            retDist = self.prior
952            sumdisc = self.prior.abs
953           
954        if sumdisc > 0.0:
955            for c in self.examples.domain.class_var:
956                retDist[c] /= sumdisc
957        else:
958            retDist.normalize()
959       
960        if ret_rules:
961            if result_type == Orange.classification.Classifier.GetValue:
962              return (retDist.modus(), covRules)
963            if result_type == Orange.classification.Classifier.GetProbabilities:
964              return (retDist, covRules)
965            return (retDist.modus(),retDist,covRules)
966        if result_type == Orange.classification.Classifier.GetValue:
967          return retDist.modus()
968        if result_type == Orange.classification.Classifier.GetProbabilities:
969          return retDist
970        return (retDist.modus(),retDist)
971
972    def __str__(self):
973        retStr = ""
974        for r in self.rules:
975            retStr += rule_to_string(r)+" "+str(r.class_distribution)+"\n"
976        return retStr
977
978
979class CN2SDUnorderedLearner(CN2UnorderedLearner):
980    """
981    CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules, which
982    is the same as :class:`~Orange.classification.rules.CN2UnorderedLearner`.
983    The difference between classical CN2 unordered and CN2-SD is selection of
984    specific evaluation function and covering function:
985    :class:`WRACCEvaluator` is used to implement
986    weight-relative accuracy and
987    :class:`CovererAndRemover_MultWeights` avoids
988    excluding covered instances, multiplying their weight by the value of
989    mult parameter instead.
990   
991    If data instances are provided to the constructor, the learning algorithm
992    is called and the resulting classifier is returned instead of the learner.
993
994    :param evaluator: an object that evaluates a rule from covered instances.
995        By default, weighted relative accuracy is used.
996    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
997   
998    :param beam_width: width of the search beam.
999    :type beam_width: int
1000   
1001    :param alpha: significance level of the likelihood ratio statistics to
1002        determine whether rule is better than the default rule.
1003    :type alpha: float
1004   
1005    :param mult: multiplicator for weights of covered instances.
1006    :type mult: float
1007    """
1008    def __new__(cls, instances=None, weight_id=0, **kwargs):
1009        self = CN2UnorderedLearner.__new__(cls, **kwargs)
1010        if instances is not None:
1011            self.__init__(**kwargs)
1012            return self.__call__(instances, weight_id)
1013        else:
1014            return self
1015       
1016    def __init__(self, evaluator = WRACCEvaluator(), beam_width = 5,
1017                alpha = 0.05, mult=0.7, **kwds):
1018        CN2UnorderedLearner.__init__(self, evaluator = evaluator,
1019                                          beam_width = beam_width, alpha = alpha, **kwds)
1020        self.cover_and_remove = CovererAndRemover_MultWeights(mult=mult)
1021
1022    def __call__(self, instances, weight=0):       
1023        supervisedClassCheck(instances)
1024       
1025        oldInstances = Orange.data.Table(instances)
1026        classifier = CN2UnorderedLearner.__call__(self,instances,weight)
1027        for r in classifier.rules:
1028            r.filterAndStore(oldInstances,weight,r.classifier.default_val)
1029        return classifier
1030
1031
1032class ABCN2(RuleLearner):
1033    """
1034    This is an implementation of argument-based CN2 using EVC as evaluation
1035    and LRC classification.
1036   
1037    Rule learning parameters that can be passed to constructor:
1038   
1039    :param width: beam width (default 5).
1040    :type width: int
1041    :param learn_for_class: class for which to learn; None (default) if all
1042       classes are to be learnt.
1043    :param learn_one_rule: decides whether to rule one rule only (default
1044       False).
1045    :type learn_one_rule: boolean
1046    :param analyse_argument: index of argument to analyse; -1 to learn normally
1047       (default)
1048    :type analyse_argument: int
1049    :param debug: sets debug mode - prints some info during execution; False (default)
1050    :type debug: boolean
1051   
1052    The following evaluator related arguments are supported:
1053   
1054    :param m: m for m-estimate to be corrected with EVC (default 2).
1055    :type m: int
1056    :param opt_reduction: type of EVC correction: 0=no correction,
1057       1=pessimistic, 2=normal (default 2).
1058    :type opt_reduction: int
1059    :param nsampling: number of samples in estimating extreme value
1060       distribution for EVC (default 100).
1061    :type nsampling: int
1062    :param evd: pre-given extreme value distributions.
1063    :param evd_arguments: pre-given extreme value distributions for arguments.
1064   
1065    Those parameters control rule validation:
1066   
1067    :param rule_sig: minimal rule significance (default 1.0).
1068    :type rule_sig: float
1069    :param att_sig: minimal attribute significance in rule (default 1.0).
1070    :type att_sig: float
1071    :param max_rule_complexity: maximum number of conditions in rule (default 5).
1072    :type max_rule_complexity: int
1073    :param min_coverage: minimal number of covered instances (default 5).
1074    :type min_coverage: int
1075   
1076    Probabilistic covering can be controlled using:
1077   
1078    :param min_improved: minimal number of instances improved in probabilistic covering (default 1).
1079    :type min_improved: int
1080    :param min_improved_perc: minimal percentage of covered instances that need to be improved (default 0.0).
1081    :type min_improved_perc: float
1082   
1083    Finally, LRC (classifier) related parameters are:
1084   
1085    :param add_sub_rules: decides whether to add sub-rules.
1086    :type add_sub_rules: boolean
1087    :param min_cl_sig: minimal significance of beta in classifier (default 0.5).
1088    :type min_cl_sig: float
1089    :param min_beta: minimal beta value (default 0.0).
1090    :type min_beta: float
1091    :param set_prefix_rules: decides whether ordered prefix rules should be
1092       added (default False).
1093    :type set_prefix_rules: boolean
1094    :param alternative_learner: use rule-learner as a correction method for
1095       other machine learning methods (default None).
1096
1097    """
1098   
1099    def __init__(self, argument_id=0, width=5, m=2, opt_reduction=2, nsampling=100, max_rule_complexity=5,
1100                 rule_sig=1.0, att_sig=1.0, postpruning=None, min_quality=0., min_coverage=1, min_improved=1, min_improved_perc=0.0,
1101                 learn_for_class = None, learn_one_rule = False, evd=None, evd_arguments=None, prune_arguments=False, analyse_argument=-1,
1102                 alternative_learner = None, min_cl_sig = 0.5, min_beta = 0.0, set_prefix_rules = False, add_sub_rules = False, debug=False,
1103                 **kwds):
1104       
1105        # argument ID which is passed to abcn2 learner
1106        self.argument_id = argument_id
1107        # learn for specific class only?       
1108        self.learn_for_class = learn_for_class
1109        # only analysing a specific argument or learning all at once
1110        self.analyse_argument = analyse_argument
1111        # should we learn only one rule?
1112        self.learn_one_rule = learn_one_rule
1113        self.postpruning = postpruning
1114        # rule finder
1115        self.rule_finder = RuleBeamFinder()
1116        self.ruleFilter = RuleBeamFilter_Width(width=width)
1117        self.ruleFilter_arguments = ABBeamFilter(width=width)
1118        if max_rule_complexity - 1 < 0:
1119            max_rule_complexity = 10
1120        self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha = 1.0, min_quality = 0., max_rule_complexity = max_rule_complexity - 1, min_coverage=min_coverage)
1121        self.refiner = RuleBeamRefiner_Selector()
1122        self.refiner_arguments = SelectorAdder(discretizer = Orange.feature.discretization.EntropyDiscretization(forceAttribute = 1,
1123                                                                                           maxNumberOfIntervals = 2))
1124        self.prune_arguments = prune_arguments
1125        # evc evaluator
1126        evdGet = Orange.core.EVDistGetter_Standard()
1127        self.rule_finder.evaluator = RuleEvaluator_mEVC(m=m, evDistGetter = evdGet, min_improved = min_improved, min_improved_perc = min_improved_perc)
1128        self.rule_finder.evaluator.returnExpectedProb = True
1129        self.rule_finder.evaluator.optimismReduction = opt_reduction
1130        self.rule_finder.evaluator.ruleAlpha = rule_sig
1131        self.rule_finder.evaluator.attributeAlpha = att_sig
1132        self.rule_finder.evaluator.validator = RuleValidator_LRS(alpha = 1.0, min_quality = min_quality, min_coverage=min_coverage, max_rule_complexity = max_rule_complexity - 1)
1133
1134        # learn stopping criteria
1135        self.rule_stopping = None
1136        self.data_stopping = RuleDataStoppingCriteria_NoPositives()
1137        # evd fitting
1138        self.evd_creator = EVDFitter(self,n=nsampling)
1139        self.evd = evd
1140        self.evd_arguments = evd_arguments
1141        # classifier
1142        self.add_sub_rules = add_sub_rules
1143        self.classifier = PILAR(alternative_learner = alternative_learner, min_cl_sig = min_cl_sig, min_beta = min_beta, set_prefix_rules = set_prefix_rules)
1144        self.debug = debug
1145        # arbitrary parameters
1146        self.__dict__.update(kwds)
1147
1148
1149    def __call__(self, examples, weight_id=0):
1150        # initialize progress bar
1151        progress=getattr(self,"progressCallback",None)
1152        if progress:
1153            progress.start = 0.0
1154            progress.end = 0.0
1155            distrib = Orange.statistics.distribution.Distribution(
1156                             examples.domain.class_var, examples, weight_id)
1157            distrib.normalize()
1158       
1159        # we begin with an empty set of rules
1160        all_rules = RuleList()
1161
1162        # th en, iterate through all classes and learn rule for each class separately
1163        for cl_i,cl in enumerate(examples.domain.class_var):
1164            if progress:
1165                step = distrib[cl] / 2.
1166                progress.start = progress.end
1167                progress.end += step
1168               
1169            if self.learn_for_class and not self.learn_for_class in [cl,cl_i]:
1170                continue
1171
1172            # rules for this class only
1173            rules = RuleList()
1174
1175            # create dichotomous class
1176            dich_data = self.create_dich_class(examples, cl)
1177
1178            # preparation of the learner (covering, evd, etc.)
1179            self.prepare_settings(dich_data, weight_id, cl_i, progress)
1180
1181            # learn argumented rules first ...
1182            self.turn_ABML_mode(dich_data, weight_id, cl_i)
1183            # first specialize all unspecialized arguments
1184            # dich_data = self.specialise_arguments(dich_data, weight_id)
1185            # comment: specialisation of arguments is within learning of an argumented rule;
1186            #          this is now different from the published algorithm
1187            if progress:
1188                progress.start = progress.end
1189                progress.end += step
1190           
1191            aes = self.get_argumented_examples(dich_data)
1192            aes = self.sort_arguments(aes, dich_data)
1193            while aes:
1194                if self.analyse_argument > -1 and \
1195                   (isinstance(self.analyse_argument, Orange.core.Example) and not Orange.core.Example(dich_data.domain, self.analyse_argument) == aes[0] or \
1196                    isinstance(self.analyse_argument, int) and not dich_data[self.analyse_argument] == aes[0]):
1197                    aes = aes[1:]
1198                    continue
1199                ae = aes[0]
1200                rule = self.learn_argumented_rule(ae, dich_data, weight_id) # target class is always first class (0)
1201                if self.debug and rule:
1202                    print "learned arg rule", Orange.classification.rules.rule_to_string(rule)
1203                elif self.debug:
1204                    print "no rule came out of ", ae
1205                if rule:
1206                    rules.append(rule)
1207                    aes = filter(lambda x: not rule(x), aes)
1208                else:
1209                    aes = aes[1:]
1210                aes = aes[1:]
1211
1212            if not progress and self.debug:
1213                print " arguments finished ... "                   
1214                   
1215            # remove all examples covered by rules
1216            for rule in rules:
1217                dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
1218            if progress:
1219                progress(self.remaining_probability(dich_data),None)
1220
1221            # learn normal rules on remaining examples
1222            if self.analyse_argument == -1:
1223                self.turn_normal_mode(dich_data, weight_id, cl_i)
1224                while dich_data:
1225                    # learn a rule
1226                    rule = self.learn_normal_rule(dich_data, weight_id, self.apriori)
1227                    if not rule:
1228                        break
1229                    if self.debug:
1230                        print "rule learned: ", Orange.classification.rules.rule_to_string(rule), rule.quality
1231                    dich_data = self.remove_covered_examples(rule, dich_data, weight_id)
1232                    if progress:
1233                        progress(self.remaining_probability(dich_data),None)
1234                    rules.append(rule)
1235                    if self.learn_one_rule:
1236                        break
1237
1238            # prune unnecessary rules
1239            rules = self.prune_unnecessary_rules(rules, dich_data, weight_id)
1240
1241            if self.add_sub_rules:
1242                rules = self.add_sub_rules_call(rules, dich_data, weight_id)
1243
1244            # restore domain and class in rules, add them to all_rules
1245            for r in rules:
1246                all_rules.append(self.change_domain(r, cl, examples, weight_id))
1247
1248            if progress:
1249                progress(1.0,None)
1250        # create a classifier from all rules       
1251        return self.create_classifier(all_rules, examples, weight_id)
1252
1253    def learn_argumented_rule(self, ae, examples, weight_id):
1254        # prepare roots of rules from arguments
1255        positive_args = self.init_pos_args(ae, examples, weight_id)
1256        if not positive_args: # something wrong
1257            raise "There is a problem with argumented example %s"%str(ae)
1258            return None
1259        negative_args = self.init_neg_args(ae, examples, weight_id)
1260
1261        # set negative arguments in refiner
1262        self.rule_finder.refiner.notAllowedSelectors = negative_args
1263        self.rule_finder.refiner.example = ae
1264        # set arguments to filter
1265        self.rule_finder.ruleFilter.setArguments(examples.domain,positive_args)
1266
1267        # learn a rule
1268        self.rule_finder.evaluator.bestRule = None
1269        self.rule_finder(examples,weight_id, 0, positive_args)
1270       
1271        # return best rule
1272        return self.rule_finder.evaluator.bestRule
1273       
1274    def prepare_settings(self, examples, weight_id, cl_i, progress):
1275        # apriori distribution
1276        self.apriori = Orange.statistics.distribution.Distribution(
1277                                examples.domain.class_var,examples,weight_id)
1278       
1279        # prepare covering mechanism
1280        self.coverAndRemove = CovererAndRemover_Prob(examples, weight_id, 0, self.apriori, self.argument_id)
1281        self.rule_finder.evaluator.probVar = examples.domain.getmeta(self.cover_and_remove.probAttribute)
1282
1283        # compute extreme distributions
1284        # TODO: why evd and evd_this????
1285        if self.rule_finder.evaluator.optimismReduction > 0 and not self.evd:
1286            self.evd_this = self.evd_creator.computeEVD(examples, weight_id, target_class=0, progress = progress)
1287        if self.evd:
1288            self.evd_this = self.evd[cl_i]
1289
1290    def turn_ABML_mode(self, examples, weight_id, cl_i):
1291        # evaluator
1292        if self.rule_finder.evaluator.optimismReduction > 0 and self.argument_id:
1293            if self.evd_arguments:
1294                self.rule_finder.evaluator.evDistGetter.dists = self.evd_arguments[cl_i]
1295            else:
1296                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD_example(examples, weight_id, target_class=0)
1297        # rule refiner
1298        self.rule_finder.refiner = self.refiner_arguments
1299        self.rule_finder.refiner.argument_id = self.argument_id
1300        self.rule_finder.ruleFilter = self.ruleFilter_arguments
1301
1302    def create_dich_class(self, examples, cl):
1303        """
1304        Create dichotomous class.
1305        """
1306        (newDomain, targetVal) = createDichotomousClass(examples.domain, examples.domain.class_var, str(cl), negate=0)
1307        newDomainmetas = newDomain.getmetas()
1308        newDomain.addmeta(Orange.data.new_meta_id(), examples.domain.class_var) # old class as meta
1309        dichData = examples.select(newDomain)
1310        if self.argument_id:
1311            for d in dichData: # remove arguments given to other classes
1312                if not d.getclass() == targetVal:
1313                    d[self.argument_id] = "?"
1314        return dichData
1315
1316    def get_argumented_examples(self, examples):
1317        if not self.argument_id:
1318            return None
1319       
1320        # get argumented examples
1321        return ArgumentFilter_hasSpecial()(examples, self.argument_id, target_class = 0)
1322
1323    def sort_arguments(self, arg_examples, examples):
1324        if not self.argument_id:
1325            return None
1326        evaluateAndSortArguments(examples, self.argument_id)
1327        if len(arg_examples)>0:
1328            # sort examples by their arguments quality (using first argument as it has already been sorted)
1329            sorted = arg_examples.native()
1330            sorted.sort(lambda x,y: -cmp(x[self.argument_id].value.positive_arguments[0].quality,
1331                                         y[self.argument_id].value.positive_arguments[0].quality))
1332            return Orange.data.Table(examples.domain, sorted)
1333        else:
1334            return None
1335
1336    def turn_normal_mode(self, examples, weight_id, cl_i):
1337        # evaluator
1338        if self.rule_finder.evaluator.optimismReduction > 0:
1339            if self.evd:
1340                self.rule_finder.evaluator.evDistGetter.dists = self.evd[cl_i]
1341            else:
1342                self.rule_finder.evaluator.evDistGetter.dists = self.evd_this # self.evd_creator.computeEVD(examples, weight_id, target_class=0)
1343        # rule refiner
1344        self.rule_finder.refiner = self.refiner
1345        self.rule_finder.ruleFilter = self.ruleFilter
1346       
1347    def learn_normal_rule(self, examples, weight_id, apriori):
1348        if hasattr(self.rule_finder.evaluator, "bestRule"):
1349            self.rule_finder.evaluator.bestRule = None
1350        rule = self.rule_finder(examples,weight_id,0,RuleList())
1351        if hasattr(self.rule_finder.evaluator, "bestRule") and self.rule_finder.evaluator.returnExpectedProb:
1352            rule = self.rule_finder.evaluator.bestRule
1353            self.rule_finder.evaluator.bestRule = None
1354        if self.postpruning:
1355            rule = self.postpruning(rule,examples,weight_id,0, aprior)
1356        return rule
1357
1358    def remove_covered_examples(self, rule, examples, weight_id):
1359        nexamples, nweight = self.cover_and_remove(rule,examples,weight_id,0)
1360        return nexamples
1361
1362
1363    def prune_unnecessary_rules(self, rules, examples, weight_id):
1364        return self.cover_and_remove.getBestRules(rules,examples,weight_id)
1365
1366    def change_domain(self, rule, cl, examples, weight_id):
1367        rule.filter = Orange.core.Filter_values(domain = examples.domain,
1368                                        conditions = rule.filter.conditions)
1369        rule.filterAndStore(examples, weight_id, cl)       
1370        if hasattr(rule, "learner") and hasattr(rule.learner, "arg_example"):
1371            rule.learner.arg_example = Orange.data.Instance(examples.domain, rule.learner.arg_example)
1372        return rule
1373
1374    def create_classifier(self, rules, examples, weight_id):
1375        return self.classifier(rules, examples, weight_id)
1376
1377    def add_sub_rules_call(self, rules, examples, weight_id):
1378        apriori = Orange.statistics.distribution.Distribution(
1379                            examples.domain.class_var,examples,weight_id)
1380        new_rules = RuleList()
1381        for r in rules:
1382            new_rules.append(r)
1383
1384        # loop through rules
1385        for r in rules:
1386            tmpList = RuleList()
1387            tmpRle = r.clone()
1388            tmpRle.filter.conditions = r.filter.conditions[:r.requiredConditions] # do not split argument
1389            tmpRle.parentRule = None
1390            tmpRle.filterAndStore(examples,weight_id,r.classifier.default_val)
1391            tmpRle.complexity = 0
1392            tmpList.append(tmpRle)
1393            while tmpList and len(tmpList[0].filter.conditions) <= len(r.filter.conditions):
1394                tmpList2 = RuleList()
1395                for tmpRule in tmpList:
1396                    # evaluate tmpRule
1397                    oldREP = self.rule_finder.evaluator.returnExpectedProb
1398                    self.rule_finder.evaluator.returnExpectedProb = False
1399                    tmpRule.quality = self.rule_finder.evaluator(tmpRule,examples,weight_id,r.classifier.default_val,apriori)
1400                    self.rule_finder.evaluator.returnExpectedProb = oldREP
1401                tmpList.sort(lambda x,y: -cmp(x.quality, y.quality))
1402                tmpList = tmpList[:self.rule_filter.width]
1403                   
1404                for tmpRule in tmpList:
1405                    # if rule not in rules already, add it to the list
1406                    if not True in [Orange.classification.rules.rules_equal(ri,tmpRule) for ri in new_rules] and len(tmpRule.filter.conditions)>0 and tmpRule.quality > apriori[r.classifier.default_val]/apriori.abs:
1407                        new_rules.append(tmpRule)
1408                    # create new tmpRules, set parent Rule, append them to tmpList2
1409                    if not True in [Orange.classification.rules.rules_equal(ri,tmpRule) for ri in new_rules]:
1410                        for c in r.filter.conditions:
1411                            tmpRule2 = tmpRule.clone()
1412                            tmpRule2.parentRule = tmpRule
1413                            tmpRule2.filter.conditions.append(c)
1414                            tmpRule2.filterAndStore(examples,weight_id,r.classifier.default_val)
1415                            tmpRule2.complexity += 1
1416                            if tmpRule2.class_distribution.abs < tmprule.class_distribution.abs:
1417                                tmpList2.append(tmpRule2)
1418                tmpList = tmpList2
1419        return new_rules
1420
1421    def init_pos_args(self, ae, examples, weight_id):
1422        pos_args = RuleList()
1423        # prepare arguments
1424        for p in ae[self.argument_id].value.positive_arguments:
1425            new_arg = Rule(filter=ArgFilter(argument_id = self.argument_id,
1426                                                   filter = self.newFilter_values(p.filter),
1427                                                   arg_example = ae),
1428                                                   complexity = 0)
1429            new_arg.valuesFilter = new_arg.filter.filter
1430            pos_args.append(new_arg)
1431
1432
1433        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1434            old_exp = self.rule_finder.evaluator.returnExpectedProb
1435            self.rule_finder.evaluator.returnExpectedProb = False
1436           
1437        # argument pruning (all or just unfinished arguments)
1438        # if pruning is chosen, then prune arguments if possible
1439        for p in pos_args:
1440            p.filterAndStore(examples, weight_id, 0)
1441            if not p.learner:
1442                p.learner = DefaultLearner(default_value=ae.getclass())
1443            # pruning on: we check on all conditions and take only best
1444            if self.prune_arguments:
1445                allowed_conditions = [c for c in p.filter.conditions]
1446                pruned_conditions = self.prune_arg_conditions(ae, allowed_conditions, examples, weight_id)
1447                p.baseDist = Orange.statistics.distribution.Distribution(examples.domain.classVar, examples, weight_id)
1448                p.filter.conditions = pruned_conditions
1449                p.learner.setattr("arg_length", 0)
1450               
1451            else: # prune only unspecified conditions
1452                spec_conditions = [c for c in p.filter.conditions if not c.unspecialized_condition]
1453                unspec_conditions = [c for c in p.filter.conditions if c.unspecialized_condition]
1454                # let rule cover now all examples filtered by specified conditions
1455                p.filter.conditions = spec_conditions
1456                p.filterAndStore(examples, weight_id, 0)
1457                p.baseDist = p.classDistribution
1458                p.learner.setattr("arg_length", len(p.filter.conditions))
1459                pruned_conditions = self.prune_arg_conditions(ae, unspec_conditions, p.examples, p.weightID)
1460                p.filter.conditions.extend(pruned_conditions)
1461                p.filter.filter.conditions.extend(pruned_conditions)
1462                # if argument does not contain all unspecialized reasons, add those reasons with minimum values
1463                at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.core.ValueFilter_continuous]
1464                for u in unspec_conditions:
1465                    if not (u.position, u.oper) in at_oper_pairs:
1466                        # find minimum value
1467                        if u.oper == Orange.core.ValueFilter_continuous.Greater or u.oper == Orange.core.ValueFilter_continuous.GreaterEqual:
1468                            u.ref = min([float(e[u.position])-10. for e in p.examples])
1469                        else:
1470                            u.ref = max([float(e[u.position])+10. for e in p.examples])
1471                        p.filter.conditions.append(u)
1472                        p.filter.filter.conditions.append(u)
1473               
1474        # set parameters to arguments
1475        for p_i,p in enumerate(pos_args):
1476            p.filterAndStore(examples,weight_id,0)
1477            p.filter.domain = examples.domain
1478            p.classifier = p.learner(p.examples, p.weightID)
1479            p.requiredConditions = len(p.filter.conditions)
1480            p.learner.setattr("arg_example", ae)
1481            p.complexity = len(p.filter.conditions)
1482           
1483        if hasattr(self.rule_finder.evaluator, "returnExpectedProb"):
1484            self.rule_finder.evaluator.returnExpectedProb = old_exp
1485
1486        return pos_args
1487
1488    def newFilter_values(self, filter):
1489        newFilter = Orange.core.Filter_values()
1490        newFilter.conditions = filter.conditions[:]
1491        newFilter.domain = filter.domain
1492        newFilter.negate = filter.negate
1493        newFilter.conjunction = filter.conjunction
1494        return newFilter
1495
1496    def init_neg_args(self, ae, examples, weight_id):
1497        return ae[self.argument_id].value.negative_arguments
1498
1499    def remaining_probability(self, examples):
1500        return self.cover_and_remove.covered_percentage(examples)
1501
1502    def prune_arg_conditions(self, crit_example, allowed_conditions, examples, weight_id):
1503        if not allowed_conditions:
1504            return []
1505        cn2_learner = Orange.classification.rules.CN2UnorderedLearner()
1506        cn2_learner.rule_finder = RuleBeamFinder()
1507        cn2_learner.rule_finder.refiner = SelectorArgConditions(crit_example, allowed_conditions)
1508        cn2_learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(self.rule_finder.evaluator.m)
1509        rule = cn2_learner.rule_finder(examples,weight_id,0,RuleList())
1510        return rule.filter.conditions
1511
1512ABCN2 = deprecated_members({"beamWidth": "beam_width",
1513                     "ruleFinder": "rule_finder",
1514                     "ruleStopping": "rule_stopping",
1515                     "dataStopping": "data_stopping",
1516                     "coverAndRemove": "cover_and_remove",
1517                     "storeInstances": "store_instances",
1518                     "targetClass": "target_class",
1519                     "baseRules": "base_rules",
1520                     "weightID": "weight_id",
1521                     "argumentID": "argument_id"})(ABCN2)
1522
1523class CN2EVCUnorderedLearner(ABCN2):
1524    """
1525    CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules in a
1526    simmilar manner as
1527    :class:`~Orange.classification.rules.CN2SDUnorderedLearner`. This
1528    implementation uses the EVC rule evaluation.
1529   
1530    If data instances are provided to the constructor, the learning algorithm
1531    is called and the resulting classifier is returned instead of the learner.
1532
1533    :param evaluator: an object that evaluates a rule from covered instances.
1534        By default, weighted relative accuracy is used.
1535    :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator`
1536   
1537    :param beam_width: width of the search beam.
1538    :type beam_width: int
1539   
1540    :param alpha: significance level of the likelihood ratio statistics to
1541        determine whether rule is better than the default rule.
1542    :type alpha: float
1543   
1544    :param mult: multiplicator for weights of covered instances.
1545    :type mult: float
1546    """
1547    def __init__(self, width=5, nsampling=100, rule_sig=1.0, att_sig=1.0,\
1548        min_coverage = 1., max_rule_complexity = 5.):
1549        ABCN2.__init__(self, width=width, nsampling=nsampling,
1550            rule_sig=rule_sig, att_sig=att_sig, min_coverage=int(min_coverage),
1551            max_rule_complexity = int(max_rule_complexity))
1552
1553class DefaultLearner(Orange.core.Learner):
1554    """
1555    Default lerner - returns default classifier with predefined output class.
1556    """
1557    def __init__(self,default_value = None):
1558        self.default_value = default_value
1559    def __call__(self,examples,weight_id=0):
1560        return Orange.classification.majority.ConstantClassifier(self.default_value,defaultDistribution = Orange.core.Distribution(examples.domain.class_var,examples,weight_id))
1561
1562class ABCN2Ordered(ABCN2):
1563    """
1564    Rules learned by ABCN2 are ordered and used as a decision list.
1565    """
1566    def __init__(self, argument_id=0, **kwds):
1567        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1568        self.classifier.set_prefix_rules = True
1569        self.classifier.optimize_betas = False
1570
1571class ABCN2M(ABCN2):
1572    """
1573    Argument based rule learning with m-estimate as evaluation function.
1574    """
1575    def __init__(self, argument_id=0, **kwds):
1576        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1577        self.opt_reduction = 0
1578        self.rule_finder.evaluator.optimismReduction = self.opt_reduction       
1579        self.classifier = CN2UnorderedClassifier
1580
1581class ABCN2MLRC(ABCN2):
1582    """
1583    Argument based rule learning with m-estimate as evaluation function. LRC is used as a classification method.
1584    """
1585    def __init__(self, argument_id=0, **kwds):
1586        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1587        self.opt_reduction = 0
1588        self.rule_finder.evaluator.optimismReduction = self.opt_reduction       
1589
1590class ABCN2_StandardClassification(ABCN2):
1591    """
1592    Argument based rule learning with the original classification technique.
1593    """
1594    def __init__(self, argument_id=0, **kwds):
1595        ABCN2.__init__(self, argument_id=argument_id, **kwds)
1596        self.classifier = CN2UnorderedClassifier       
1597
1598
1599class RuleStopping_Apriori(RuleStoppingCriteria):
1600    def __init__(self, apriori=None):
1601        self.apriori =  None
1602       
1603    def __call__(self,rules,rule,instances,data):
1604        if not self.apriori:
1605            return False
1606        if not type(rule.classifier) == Orange.classification.ConstantClassifier:
1607            return False
1608        ruleAcc = rule.class_distribution[rule.classifier.default_val]/rule.class_distribution.abs
1609        aprioriAcc = self.apriori[rule.classifier.default_val]/self.apriori.abs
1610        if ruleAcc>aprioriAcc:
1611            return False
1612        return True
1613
1614
1615class RuleStopping_SetRules(RuleStoppingCriteria):
1616    def __init__(self,validator):
1617        self.rule_stopping = RuleStoppingCriteria_NegativeDistribution()
1618        self.validator = validator
1619
1620    def __call__(self,rules,rule,instances,data):       
1621        ru_st = self.rule_stopping(rules,rule,instances,data)
1622        if not ru_st:
1623            self.validator.rules.append(rule)
1624        return bool(ru_st)
1625
1626
1627class LengthValidator(RuleValidator):
1628    """ prune rules with more conditions than self.length. """
1629    def __init__(self, length=-1):
1630        self.length = length
1631       
1632    def __call__(self, rule, data, weight_id, target_class, apriori):
1633        if self.length >= 0:
1634            return len(rule.filter.conditions) <= self.length
1635        return True   
1636
1637
1638class NoDuplicatesValidator(RuleValidator):
1639    def __init__(self,alpha=.05,min_coverage=0,max_rule_length=0,rules=RuleList()):
1640        self.rules = rules
1641        self.validator = RuleValidator_LRS(alpha=alpha,\
1642            min_coverage=min_coverage,max_rule_length=max_rule_length)
1643       
1644    def __call__(self, rule, data, weight_id, target_class, apriori):
1645        if rule_in_set(rule,self.rules):
1646            return False
1647        return bool(self.validator(rule,data,weight_id,target_class,apriori))
1648               
1649
1650
1651class RuleClassifier_BestRule(RuleClassifier):
1652    def __init__(self, rules, instances, weight_id = 0, **argkw):
1653        self.rules = rules
1654        self.examples = instances
1655        self.class_var = instances.domain.class_var
1656        self.__dict__.update(argkw)
1657        self.prior = Orange.statistics.distribution.Distribution(
1658                    instances.domain.class_var, instances)
1659
1660    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue):
1661        retDist = Orange.statistics.distribution.Distribution(instance.domain.class_var)
1662        bestRule = None
1663        for r in self.rules:
1664            if r(instance) and (not bestRule or r.quality>bestRule.quality):
1665                for v_i,v in enumerate(instance.domain.class_var):
1666                    retDist[v_i] = r.class_distribution[v_i]
1667                bestRule = r
1668        if not bestRule:
1669            retDist = self.prior
1670        else:
1671            bestRule.used += 1
1672        sumdist = sum(retDist)
1673        if sumdist > 0.0:
1674            for c in self.examples.domain.class_var:
1675                retDist[c] /= sumdisc
1676        else:
1677            retDist.normalize()
1678        # return classifier(instance, result_type=result_type)
1679        if result_type == Orange.classification.Classifier.GetValue:
1680          return retDist.modus()
1681        if result_type == Orange.classification.Classifier.GetProbabilities:
1682          return retDist
1683        return (retDist.modus(),retDist)
1684
1685    def __str__(self):
1686        retStr = ""
1687        for r in self.rules:
1688            retStr += rule_to_string(r)+" "+str(r.class_distribution)+"\n"
1689        return retStr   
1690
1691
1692class CovererAndRemover_MultWeights(RuleCovererAndRemover):
1693    """
1694    Covering and removing of instances using weight multiplication:
1695   
1696    :param mult: weighting multiplication factor
1697    :type mult: float   
1698    """
1699   
1700    def __init__(self, mult = 0.7):
1701        self.mult = mult
1702    def __call__(self, rule, instances, weights, target_class):
1703        if not weights:
1704            weights = Orange.data.new_meta_id()
1705            instances.addMetaAttribute(weights,1.)
1706            instances.domain.addmeta(weights, Orange.data.variable.\
1707                Continuous("weights-"+str(weights)), True)
1708        newWeightsID = Orange.data.new_meta_id()
1709        instances.addMetaAttribute(newWeightsID,1.)
1710        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
1711            Continuous("weights-"+str(newWeightsID)), True)
1712        for instance in instances:
1713            if rule(instance) and instance.getclass() == rule.classifier(\
1714                instance,Orange.classification.Classifier.GetValue):
1715                instance[newWeightsID]=instance[weights]*self.mult
1716            else:
1717                instance[newWeightsID]=instance[weights]
1718        return (instances,newWeightsID)
1719
1720
1721class CovererAndRemover_AddWeights(RuleCovererAndRemover):
1722    """
1723    Covering and removing of instances using weight addition.
1724   
1725    """
1726   
1727    def __call__(self, rule, instances, weights, target_class):
1728        if not weights:
1729            weights = Orange.data.new_meta_id()
1730            instances.addMetaAttribute(weights,1.)
1731            instances.domain.addmeta(weights, Orange.data.variable.\
1732                Continuous("weights-"+str(weights)), True)
1733        try:
1734            coverage = instances.domain.getmeta("Coverage")
1735        except:
1736            coverage = Orange.data.variable.Continuous("Coverage")
1737            instances.domain.addmeta(Orange.data.new_meta_id(),coverage, True)
1738            instances.addMetaAttribute(coverage,0.0)
1739        newWeightsID = Orange.data.new_meta_id()
1740        instances.addMetaAttribute(newWeightsID,1.)
1741        instances.domain.addmeta(newWeightsID, Orange.data.variable.\
1742            Continuous("weights-"+str(newWeightsID)), True)
1743        for instance in instances:
1744            if rule(instance) and instance.getclass() == rule.classifier(instance,\
1745                    Orange.classification.Classifier.GetValue):
1746                try:
1747                    instance[coverage]+=1.0
1748                except:
1749                    instance[coverage]=1.0
1750                instance[newWeightsID]=1.0/(instance[coverage]+1)
1751            else:
1752                instance[newWeightsID]=instance[weights]
1753        return (instances,newWeightsID)
1754
1755
1756class CovererAndRemover_Prob(RuleCovererAndRemover):
1757    """ This class impements probabilistic covering. """
1758    def __init__(self, examples, weight_id, target_class, apriori, argument_id):
1759        self.best_rule = [None]*len(examples)
1760        self.prob_attribute = Orange.data.new_meta_id()
1761        self.apriori_prob = apriori[target_class]/apriori.abs
1762        examples.addMetaAttribute(self.prob_attribute, self.apriori_prob)
1763        examples.domain.addmeta(self.prob_attribute, 
1764            Orange.data.variable.Continuous("Probs"))
1765        self.argument_id = argument_id
1766
1767    def getBestRules(self, current_rules, examples, weight_id):
1768        best_rules = RuleList()
1769        for r_i,r in enumerate(self.best_rule):
1770            if r and not rule_in_set(r,best_rules) and int(examples[r_i].getclass())==int(r.classifier.default_value):
1771                if hasattr(r.learner, "arg_example"):
1772                    setattr(r, "best_example", r.learner.arg_example)
1773                else:
1774                    setattr(r, "best_example", examples[r_i])
1775                best_rules.append(r)
1776        return best_rules
1777       
1778    def __call__(self, rule, examples, weights, target_class):
1779        """ if example has an argument, then the rule must be consistent with the argument. """
1780        example = getattr(rule.learner, "arg_example", None)
1781        for ei, e in enumerate(examples):
1782            if e == example:
1783                e[self.prob_attribute] = 1.0
1784                self.best_rule[ei] = rule
1785            elif rule(e) and rule.quality>e[self.prob_attribute]:
1786                e[self.prob_attribute] = rule.quality+0.001 # 0.001 is added to avoid numerical errors
1787                self.best_rule[ei]=rule
1788        return (examples,weights)
1789
1790    def filter_covers_example(self, example, filter):
1791        filter_indices = RuleCoversArguments.filterIndices(filter)
1792        if filter(example):
1793            try:
1794                if example[self.argument_id].value and len(example[self.argument_id].value.positive_arguments)>0: # example has positive arguments
1795                    # conditions should cover at least one of the positive arguments
1796                    one_arg_covered = False
1797                    for pA in example[self.argument_id].value.positive_arguments:
1798                        arg_covered = [self.condIn(c,filter_indices) for c in pA.filter.conditions]
1799                        one_arg_covered = one_arg_covered or len(arg_covered) == sum(arg_covered) #arg_covered
1800                        if one_arg_covered:
1801                            break
1802                    if not one_arg_covered:
1803                        return False
1804                if example[self.argument_id].value and len(example[self.argument_id].value.negative_arguments)>0: # example has negative arguments
1805                    # condition should not cover neither of negative arguments
1806                    for pN in example[self.argument_id].value.negative_arguments:
1807                        arg_covered = [self.condIn(c, filter_indices) for c in pN.filter.conditions]
1808                        if len(arg_covered)==sum(arg_covered):
1809                            return False
1810            except:
1811                return True
1812            return True
1813        return False
1814
1815    def condIn(self, cond, filter_indices): # is condition in the filter?
1816        condInd = RuleCoversArguments.conditionIndex(cond)
1817        if operator.or_(condInd,filter_indices[cond.position]) == filter_indices[cond.position]:
1818            return True
1819        return False       
1820
1821
1822    def covered_percentage(self, examples):
1823        p = 0.0
1824        for ei, e in enumerate(examples):
1825            p += (e[self.prob_attribute] - self.apriori_prob)/(1.0-self.apriori_prob)
1826        return p/len(examples)
1827
1828
1829
1830
1831@deprecated_keywords({"showDistribution": "show_distribution"})
1832def rule_to_string(rule, show_distribution = True):
1833    """
1834    Write a string presentation of rule in human readable format.
1835   
1836    :param rule: rule to pretty-print.
1837    :type rule: :class:`~Orange.classification.rules.Rule`
1838   
1839    :param show_distribution: determines whether presentation should also
1840        contain the distribution of covered instances
1841    :type show_distribution: bool
1842   
1843    """
1844    def selectSign(oper):
1845        if oper == Orange.core.ValueFilter_continuous.Less:
1846            return "<"
1847        elif oper == Orange.core.ValueFilter_continuous.LessEqual:
1848            return "<="
1849        elif oper == Orange.core.ValueFilter_continuous.Greater:
1850            return ">"
1851        elif oper == Orange.core.ValueFilter_continuous.GreaterEqual:
1852            return ">="
1853        else: return "="
1854
1855    if not rule:
1856        return "None"
1857    conds = rule.filter.conditions
1858    domain = rule.filter.domain
1859   
1860    ret = "IF "
1861    if len(conds)==0:
1862        ret = ret + "TRUE"
1863
1864    for i,c in enumerate(conds):
1865        if i > 0:
1866            ret += " AND "
1867        if type(c) == Orange.core.ValueFilter_discrete:
1868            ret += domain[c.position].name + "=" + str([domain[c.position].\
1869                values[int(v)] for v in c.values])
1870        elif type(c) == Orange.core.ValueFilter_continuous:
1871            ret += domain[c.position].name + selectSign(c.oper) + str(c.ref)
1872    if rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
1873            and rule.classifier.default_val:
1874        ret = ret + " THEN "+domain.class_var.name+"="+\
1875        str(rule.classifier.default_value)
1876        if show_distribution:
1877            ret += str(rule.class_distribution)
1878    elif rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\
1879            and type(domain.class_var) == Orange.core.EnumVariable:
1880        ret = ret + " THEN "+domain.class_var.name+"="+\
1881        str(rule.class_distribution.modus())
1882        if show_distribution:
1883            ret += str(rule.class_distribution)
1884    return ret       
1885
1886def supervisedClassCheck(instances):
1887    if not instances.domain.class_var:
1888        raise Exception("Class variable is required!")
1889    if instances.domain.class_var.varType == Orange.core.VarTypes.Continuous:
1890        raise Exception("CN2 requires a discrete class!")
1891   
1892
1893def rule_in_set(rule,rules):
1894    for r in rules:
1895        if rules_equal(rule,r):
1896            return True
1897    return False
1898
1899def rules_equal(rule1,rule2):
1900    if not len(rule1.filter.conditions)==len(rule2.filter.conditions):
1901        return False
1902    for c1 in rule1.filter.conditions:
1903        found=False # find the same condition in the other rule
1904        for c2 in rule2.filter.conditions:
1905            try:
1906                if not c1.position == c2.position: continue # same feature?
1907                if not type(c1) == type(c2): continue # same type of condition
1908                if type(c1) == Orange.core.ValueFilter_discrete:
1909                    if not type(c1.values[0]) == type(c2.values[0]): continue
1910                    if not c1.values[0] == c2.values[0]: continue # same value?
1911                if type(c1) == Orange.core.ValueFilter_continuous:
1912                    if not c1.oper == c2.oper: continue # same operator?
1913                    if not c1.ref == c2.ref: continue #same threshold?
1914                found=True
1915                break
1916            except:
1917                pass
1918        if not found:
1919            return False
1920    return True
1921
1922# Miscellaneous - utility functions
1923def avg(l):
1924    if len(l)==0:
1925        return 0.
1926    return sum(l)/len(l)
1927
1928def var(l):
1929    if len(l)<2:
1930        return 0.
1931    av = avg(l)
1932    vars=[math.pow(li-av,2) for li in l]
1933    return sum(vars)/(len(l)-1)
1934
1935def median(l):
1936    if len(l)==0:
1937        return 0.   
1938    l.sort()
1939    le = len(l)
1940    if le%2 == 1:
1941        return l[(le-1)/2]
1942    else:
1943        return (l[le/2-1]+l[le/2])/2
1944
1945def perc(l,p):
1946    l.sort()
1947    return l[int(math.floor(p*len(l)))]
1948
1949class EVDFitter:
1950    """ Randomizes a dataset and fits an extreme value distribution onto it. """
1951
1952    def __init__(self, learner, n=200, randomseed=100):
1953        self.learner = learner
1954        self.n = n
1955        self.randomseed = randomseed
1956        # initialize random seed to make experiments repeatable
1957        random.seed(self.randomseed)
1958       
1959       
1960    def createRandomDataSet(self, data):
1961        newData = Orange.core.ExampleTable(data)
1962        # shuffle data
1963        cl_num = newData.toNumpy("C")
1964        random.shuffle(cl_num[0][:,0])
1965        clData = Orange.core.ExampleTable(Orange.core.Domain([newData.domain.classVar]),cl_num[0])
1966        for d_i,d in enumerate(newData):
1967            d[newData.domain.classVar] = clData[d_i][newData.domain.classVar]
1968        return newData
1969
1970    def createEVDistList(self, evdList):
1971        l = Orange.core.EVDistList()
1972        for el in evdList:
1973            l.append(Orange.core.EVDist(mu=el[0],beta=el[1],percentiles=el[2]))
1974        return l
1975
1976
1977    # estimated fisher tippett parameters for a set of values given in vals list (+ deciles)
1978    def compParameters(self, vals, oldMi, oldBeta, oldPercs, fixedBeta=False):                   
1979        # compute percentiles
1980        vals.sort()
1981        N = len(vals)
1982        percs = [avg(vals[int(float(N)*i/10):int(float(N)*(i+1)/10)]) for i in range(10)]
1983        if N<10:
1984            return oldMi, oldBeta, percs
1985        if not fixedBeta:
1986            beta = min(2.0, math.sqrt(6*var(vals)/math.pow(math.pi,2)))#min(2.0, max(oldBeta, math.sqrt(6*var(vals)/math.pow(math.pi,2))))
1987        else:
1988            beta = oldBeta
1989        mi = max(oldMi,percs[-1]+beta*math.log(-math.log(0.95)))
1990        mi = percs[-1]+beta*math.log(-math.log(0.95))
1991        return max(oldMi, numpy.average(vals)-beta*0.5772156649), beta, None
1992   
1993    def prepare_learner(self):
1994        self.oldStopper = self.learner.ruleFinder.ruleStoppingValidator
1995        self.evaluator = self.learner.ruleFinder.evaluator
1996        self.refiner = self.learner.ruleFinder.refiner
1997        self.validator = self.learner.ruleFinder.validator
1998        self.ruleFilter = self.learner.ruleFinder.ruleFilter
1999        self.learner.ruleFinder.validator = None
2000        self.learner.ruleFinder.evaluator = Orange.core.RuleEvaluator_LRS()
2001        self.learner.ruleFinder.evaluator.storeRules = True
2002        self.learner.ruleFinder.ruleStoppingValidator = Orange.core.RuleValidator_LRS(alpha=1.0)
2003        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = 0
2004        self.learner.ruleFinder.refiner = Orange.core.RuleBeamRefiner_Selector()
2005        self.learner.ruleFinder.ruleFilter = Orange.core.RuleBeamFilter_Width(width = 5)
2006
2007
2008    def restore_learner(self):
2009        self.learner.ruleFinder.evaluator = self.evaluator
2010        self.learner.ruleFinder.ruleStoppingValidator = self.oldStopper
2011        self.learner.ruleFinder.refiner = self.refiner
2012        self.learner.ruleFinder.validator = self.validator
2013        self.learner.ruleFinder.ruleFilter = self.ruleFilter
2014
2015    def computeEVD(self, data, weightID=0, target_class=0, progress=None):
2016        import time
2017        # prepare learned for distribution computation       
2018        self.prepare_learner()
2019
2020        # loop through N (sampling repetitions)
2021        extremeDists=[(0, 1, [])]
2022        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = self.oldStopper.max_rule_complexity
2023        maxVals = [[] for l in range(self.oldStopper.max_rule_complexity+1)]
2024        for d_i in range(self.n):
2025            if not progress:
2026                if self.learner.debug:
2027                    print d_i,
2028            else:
2029                progress(float(d_i)/self.n, None)               
2030            # create data set (remove and randomize)
2031            a = time.time()
2032            tempData = self.createRandomDataSet(data)
2033            a = time.time()
2034            self.learner.ruleFinder.evaluator.rules = RuleList()
2035            a = time.time()
2036            for l in range(self.oldStopper.max_rule_complexity+2):
2037               self.learner.ruleFinder.evaluator.rules.append(None)
2038            a = time.time()
2039            # Next, learn a rule
2040            self.learner.ruleFinder(tempData,weightID,target_class, RuleList())
2041            a = time.time()
2042            for l in range(self.oldStopper.max_rule_complexity+1):
2043                if self.learner.ruleFinder.evaluator.rules[l]:
2044                    maxVals[l].append(self.learner.ruleFinder.evaluator.rules[l].quality)
2045                else:
2046                    maxVals[l].append(0)
2047##                qs = [r.quality for r in self.learner.ruleFinder.evaluator.rules if r.complexity == l+1]
2048####                if qs:
2049####                    for r in self.learner.ruleFinder.evaluator.rules:
2050####                        if r.quality == max(qs) and r.classDistribution.abs == 16 and r.classDistribution[0] == 16:
2051####                            print "best rule", orngCN2.ruleToString(r), r.quality
2052##                if qs:
2053##                    maxVals[l].append(max(qs))
2054##                else:
2055##                    maxVals[l].append(0)
2056            a = time.time()
2057
2058        # longer rule should always be better than shorter rule
2059        for l in range(self.oldStopper.max_rule_complexity):
2060            for i in range(len(maxVals[l])):
2061                if maxVals[l+1][i] < maxVals[l][i]:
2062                    maxVals[l+1][i] = maxVals[l][i]
2063##        print
2064##        for mi, m in enumerate(maxVals):
2065##            print "mi=",mi,m
2066
2067        mu, beta, perc = 1.0, 2.0, [0.0]*10
2068        for mi,m in enumerate(maxVals):
2069##            if mi == 0:
2070##                mu, beta, perc = self.compParameters(m, mu, beta, perc)
2071##            else:
2072            mu, beta, perc = self.compParameters(m, mu, beta, perc, fixedBeta=True)
2073            extremeDists.append((mu, beta, perc))
2074            extremeDists.extend([(0,1,[])]*(mi))
2075            if self.learner.debug:
2076                print mi, mu, beta, perc
2077
2078        self.restore_learner()
2079        return self.createEVDistList(extremeDists)
2080       
2081class ABBeamFilter(Orange.core.RuleBeamFilter):
2082    """
2083    ABBeamFilter: Filters beam;
2084        - leaves first N rules (by quality)
2085        - leaves first N rules that have only of arguments in condition part
2086    """
2087    def __init__(self,width=5):
2088        self.width=width
2089        self.pArgs=None
2090
2091    def __call__(self,rulesStar,examples,weight_id):
2092        newStar=Orange.core.RuleList()
2093        rulesStar.sort(lambda x,y: -cmp(x.quality,y.quality))
2094        argsNum=0
2095        for r_i,r in enumerate(rulesStar):
2096            if r_i<self.width: # either is one of best "width" rules
2097                newStar.append(r)
2098            elif self.onlyPositives(r):
2099                if argsNum<self.width:
2100                    newStar.append(r)
2101                    argsNum+=1
2102        return newStar               
2103
2104    def setArguments(self,domain,positive_arguments):
2105        self.pArgs = positive_arguments
2106        self.domain = domain
2107        self.argTab = [0]*len(self.domain.attributes)
2108        for arg in self.pArgs:
2109            for cond in arg.filter.conditions:
2110                self.argTab[cond.position]=1
2111       
2112    def onlyPositives(self,rule):
2113        if not self.pArgs:
2114            return False
2115
2116        ruleTab=[0]*len(self.domain.attributes)
2117        for cond in rule.filter.conditions:
2118            ruleTab[cond.position]=1
2119        return map(operator.or_,ruleTab,self.argTab)==self.argTab
2120
2121
2122class RuleCoversArguments:
2123    """
2124    Class determines if rule covers one out of a set of arguments.
2125    """
2126    def __init__(self, arguments):
2127        self.arguments = arguments
2128        self.indices = []
2129        for a in self.arguments:
2130            indNA = getattr(a.filter,"indices",None)
2131            if not indNA:
2132                a.filter.setattr("indices", RuleCoversArguments.filterIndices(a.filter))
2133            self.indices.append(a.filter.indices)
2134
2135    def __call__(self, rule):
2136        if not self.indices:
2137            return False
2138        if not getattr(rule.filter,"indices",None):
2139            rule.filter.indices = RuleCoversArguments.filterIndices(rule.filter)
2140        for index in self.indices:
2141            if map(operator.or_,rule.filter.indices,index) == rule.filter.indices:
2142                return True
2143        return False
2144
2145    def filterIndices(filter):
2146        if not filter.domain:
2147            return []
2148        ind = [0]*len(filter.domain.attributes)
2149        for c in filter.conditions:
2150            ind[c.position]=operator.or_(ind[c.position],
2151                                         RuleCoversArguments.conditionIndex(c))
2152        return ind
2153    filterIndices = staticmethod(filterIndices)
2154
2155    def conditionIndex(c):
2156        if type(c) == Orange.core.ValueFilter_continuous:
2157            if (c.oper == Orange.core.ValueFilter_continuous.GreaterEqual or
2158                c.oper == Orange.core.ValueFilter_continuous.Greater):
2159                return 5# 0101
2160            elif (c.oper == Orange.core.ValueFilter_continuous.LessEqual or
2161                  c.oper == Orange.core.ValueFilter_continuous.Less):
2162                return 3 # 0011
2163            else:
2164                return c.oper
2165        else:
2166            return 1 # 0001
2167    conditionIndex = staticmethod(conditionIndex)       
2168
2169    def oneSelectorToCover(ruleIndices, argIndices):
2170        at, type = -1, 0
2171        for r_i, ind in enumerate(ruleIndices):
2172            if not argIndices[r_i]:
2173                continue
2174            if at>-1 and not ind == argIndices[r_i]: # need two changes
2175                return (-1,0)
2176            if not ind == argIndices[r_i]:
2177                if argIndices[r_i] in [1,3,5]:
2178                    at,type=r_i,argIndices[r_i]
2179                if argIndices[r_i]==6:
2180                    if ind==3:
2181                        at,type=r_i,5
2182                    if ind==5:
2183                        at,type=r_i,3
2184        return at,type
2185    oneSelectorToCover = staticmethod(oneSelectorToCover)
2186
2187
2188class SelectorAdder(Orange.core.RuleBeamRefiner):
2189    """
2190    Selector adder, this function is a refiner function:
2191       - refined rules are not consistent with any of negative arguments.
2192    """
2193    def __init__(self, example=None, not_allowed_selectors=[], argument_id = None,
2194                 discretizer = Orange.core.EntropyDiscretization(forceAttribute=True)):
2195        # required values - needed values of attributes
2196        self.example = example
2197        self.argument_id = argument_id
2198        self.not_allowed_selectors = not_allowed_selectors
2199        self.discretizer = discretizer
2200       
2201    def __call__(self, oldRule, data, weight_id, target_class=-1):
2202        inNotAllowedSelectors = RuleCoversArguments(self.not_allowed_selectors)
2203        new_rules = Orange.core.RuleList()
2204
2205        # get positive indices (selectors already in the rule)
2206        indices = getattr(oldRule.filter,"indices",None)
2207        if not indices:
2208            indices = RuleCoversArguments.filterIndices(oldRule.filter)
2209            oldRule.filter.setattr("indices",indices)
2210
2211        # get negative indices (selectors that should not be in the rule)
2212        negative_indices = [0]*len(data.domain.attributes)
2213        for nA in self.not_allowed_selectors:
2214            #print indices, nA.filter.indices
2215            at_i,type_na = RuleCoversArguments.oneSelectorToCover(indices, nA.filter.indices)
2216            if at_i>-1:
2217                negative_indices[at_i] = operator.or_(negative_indices[at_i],type_na)
2218
2219        #iterate through indices = attributes
2220        for i,ind in enumerate(indices):
2221            if not self.example[i] or self.example[i].isSpecial():
2222                continue
2223            if ind == 1: 
2224                continue
2225            if data.domain[i].varType == Orange.core.VarTypes.Discrete and not negative_indices[i]==1: # DISCRETE attribute
2226                if self.example:
2227                    values = [self.example[i]]
2228                else:
2229                    values = data.domain[i].values
2230                for v in values:
2231                    tempRule = oldRule.clone()
2232                    tempRule.filter.conditions.append(Orange.core.ValueFilter_discrete(position = i,
2233                                                                                  values = [Orange.core.Value(data.domain[i],v)],
2234                                                                                  acceptSpecial=0))
2235                    tempRule.complexity += 1
2236                    tempRule.filter.indices[i] = 1 # 1 stands for discrete attribute (see RuleCoversArguments.conditionIndex)
2237                    tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
2238                    if len(tempRule.examples)<len(oldRule.examples):
2239                        new_rules.append(tempRule)
2240            elif data.domain[i].varType == Orange.core.VarTypes.Continuous and not negative_indices[i]==7: # CONTINUOUS attribute
2241                try:
2242                    at = data.domain[i]
2243                    at_d = self.discretizer(at,oldRule.examples)
2244                except:
2245                    continue # discretization failed !
2246                # If discretization makes sense? then:
2247                if len(at_d.values)>1:
2248                    for p in at_d.getValueFrom.transformer.points:
2249                        #LESS
2250                        if not negative_indices[i]==3:
2251                            tempRule = self.getTempRule(oldRule,i,Orange.core.ValueFilter_continuous.LessEqual,p,target_class,3)
2252                            if len(tempRule.examples)<len(oldRule.examples) and self.example[i]<=p:# and not inNotAllowedSelectors(tempRule):
2253                                new_rules.append(tempRule)
2254                        #GREATER
2255                        if not negative_indices[i]==5:
2256                            tempRule = self.getTempRule(oldRule,i,Orange.core.ValueFilter_continuous.Greater,p,target_class,5)
2257                            if len(tempRule.examples)<len(oldRule.examples) and self.example[i]>p:# and not inNotAllowedSelectors(tempRule):
2258                                new_rules.append(tempRule)
2259        for r in new_rules:
2260            r.parentRule = oldRule
2261            r.valuesFilter = r.filter.filter
2262        return new_rules
2263
2264    def getTempRule(self,oldRule,pos,oper,ref,target_class,atIndex):
2265        tempRule = oldRule.clone()
2266
2267        tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=pos,
2268                                                                        oper=oper,
2269                                                                        ref=ref,
2270                                                                        acceptSpecial=0))
2271        tempRule.complexity += 1
2272        tempRule.filter.indices[pos] = operator.or_(tempRule.filter.indices[pos],atIndex) # from RuleCoversArguments.conditionIndex
2273        tempRule.filterAndStore(oldRule.examples,tempRule.weightID,target_class)
2274        return tempRule
2275
2276    def setCondition(self, oldRule, target_class, ci, condition):
2277        tempRule = oldRule.clone()
2278        tempRule.filter.conditions[ci] = condition
2279        tempRule.filter.conditions[ci].setattr("specialized",1)
2280        tempRule.filterAndStore(oldRule.examples,oldRule.weightID,target_class)
2281        return tempRule
2282
2283SelectorAdder = deprecated_members({"notAllowedSelectors": "not_allowed_selectors",
2284                     "argumentID": "argument_id"})(SelectorAdder)
2285
2286# This filter is the ugliest code ever! Problem is with Orange, I had some problems with inheriting deepCopy
2287# I should take another look at it.
2288class ArgFilter(Orange.core.Filter):
2289    """ This class implements AB-covering principle. """
2290    def __init__(self, argument_id=None, filter = Orange.core.Filter_values(), arg_example = None):
2291        self.filter = filter
2292        self.indices = getattr(filter,"indices",[])
2293        if not self.indices and len(filter.conditions)>0:
2294            self.indices = RuleCoversArguments.filterIndices(filter)
2295        self.argument_id = argument_id
2296        self.domain = self.filter.domain
2297        self.conditions = filter.conditions
2298        self.arg_example = arg_example
2299       
2300    def condIn(self,cond): # is condition in the filter?
2301        condInd = ruleCoversArguments.conditionIndex(cond)
2302        if operator.or_(condInd,self.indices[cond.position]) == self.indices[cond.position]:
2303            return True
2304        return False
2305   
2306    def __call__(self,example):
2307##        print "in", self.filter(example)#, self.filter.conditions[0](example)
2308##        print self.filter.conditions[1].values
2309        if self.filter(example) and example != self.arg_example:
2310            return True
2311        elif self.filter(example):
2312            try:
2313                if example[self.argument_id].value and len(example[self.argument_id].value.positiveArguments)>0: # example has positive arguments
2314                    # conditions should cover at least one of the positive arguments
2315                    oneArgCovered = False
2316                    for pA in example[self.argument_id].value.positiveArguments:
2317                        argCovered = [self.condIn(c) for c in pA.filter.conditions]
2318                        oneArgCovered = oneArgCovered or len(argCovered) == sum(argCovered) #argCovered
2319                        if oneArgCovered:
2320                            break
2321                    if not oneArgCovered:
2322                        return False
2323                if example[self.argument_id].value and len(example[self.argument_id].value.negativeArguments)>0: # example has negative arguments
2324                    # condition should not cover neither of negative arguments
2325                    for pN in example[self.argument_id].value.negativeArguments:
2326                        argCovered = [self.condIn(c) for c in pN.filter.conditions]
2327                        if len(argCovered)==sum(argCovered):
2328                            return False
2329            except:
2330                return True
2331            return True
2332        else:
2333            return False
2334
2335    def __setattr__(self,name,obj):
2336        self.__dict__[name]=obj
2337        self.filter.setattr(name,obj)
2338
2339    def deep_copy(self):
2340        newFilter = ArgFilter(argument_id=self.argument_id)
2341        newFilter.filter = Orange.core.Filter_values() #self.filter.deepCopy()
2342        newFilter.filter.conditions = self.filter.conditions[:]
2343        newFilter.domain = self.filter.domain
2344        newFilter.negate = self.filter.negate
2345        newFilter.conjunction = self.filter.conjunction
2346        newFilter.domain = self.filter.domain
2347        newFilter.conditions = newFilter.filter.conditions
2348        newFilter.indices = self.indices[:]
2349        return newFilter
2350
2351ArgFilter = deprecated_members({"argumentID": "argument_id"})(ArgFilter)
2352
2353class SelectorArgConditions(Orange.core.RuleBeamRefiner):
2354    """
2355    Selector adder, this function is a refiner function:
2356      - refined rules are not consistent with any of negative arguments.
2357    """
2358    def __init__(self, example, allowed_selectors):
2359        # required values - needed values of attributes
2360        self.example = example
2361        self.allowed_selectors = allowed_selectors
2362
2363    def __call__(self, oldRule, data, weight_id, target_class=-1):
2364        if len(oldRule.filter.conditions) >= len(self.allowed_selectors):
2365            return Orange.core.RuleList()
2366        new_rules = Orange.core.RuleList()
2367        for c in self.allowed_selectors:
2368            # normal condition
2369            if not c.unspecialized_condition:
2370                tempRule = oldRule.clone()
2371                tempRule.filter.conditions.append(c)
2372                tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
2373                if len(tempRule.examples)<len(oldRule.examples):
2374                    new_rules.append(tempRule)
2375            # unspecified condition
2376            else:
2377                # find all possible example values
2378                vals = {}
2379                for e in oldRule.examples:
2380                    if not e[c.position].isSpecial():
2381                        vals[str(e[c.position])] = 1
2382                values = vals.keys()
2383                # for each value make a condition
2384                for v in values:
2385                    tempRule = oldRule.clone()
2386                    tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=c.position,
2387                                                                                    oper=c.oper,
2388                                                                                    ref=float(v),
2389                                                                                    acceptSpecial=0))
2390                    if tempRule(self.example):
2391                        tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class)
2392                        if len(tempRule.examples)<len(oldRule.examples):
2393                            new_rules.append(tempRule)
2394##        print " NEW RULES "
2395##        for r in new_rules:
2396##            print Orange.classification.rules.rule_to_string(r)
2397        for r in new_rules:
2398            r.parentRule = oldRule
2399##            print Orange.classification.rules.rule_to_string(r)
2400        return new_rules
2401
2402
2403class CrossValidation:
2404    def __init__(self, folds=5, random_generator = 150):
2405        self.folds = folds
2406        self.random_generator = random_generator
2407
2408    def __call__(self, learner, examples, weight):
2409        res = orngTest.crossValidation([learner], (examples, weight), folds = self.folds, random_generator = self.random_generator)
2410        return self.get_prob_from_res(res, examples)
2411
2412    def get_prob_from_res(self, res, examples):
2413        prob_dist = Orange.core.DistributionList()
2414        for tex in res.results:
2415            d = Orange.core.Distribution(examples.domain.class_var)
2416            for di in range(len(d)):
2417                d[di] = tex.probabilities[0][di]
2418            prob_dist.append(d)
2419        return prob_dist
2420
2421
2422class PILAR:
2423    """
2424    PILAR (Probabilistic improvement of learning algorithms with rules).
2425    """
2426    def __init__(self, alternative_learner = None, min_cl_sig = 0.5, min_beta = 0.0, set_prefix_rules = False, optimize_betas = True):
2427        self.alternative_learner = alternative_learner
2428        self.min_cl_sig = min_cl_sig
2429        self.min_beta = min_beta
2430        self.set_prefix_rules = set_prefix_rules
2431        self.optimize_betas = optimize_betas
2432        self.selected_evaluation = CrossValidation(folds=5)
2433
2434    def __call__(self, rules, examples, weight=0):
2435        rules = self.add_null_rule(rules, examples, weight)
2436        if self.alternative_learner:
2437            prob_dist = self.selected_evaluation(self.alternative_learner, examples, weight)
2438            classifier = self.alternative_learner(examples,weight)
2439##            prob_dist = Orange.core.DistributionList()
2440##            for e in examples:
2441##                prob_dist.append(classifier(e,Orange.core.GetProbabilities))
2442            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist)
2443        else:
2444            cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas)
2445
2446##        print "result"
2447        for ri,r in enumerate(cl.rules):
2448            cl.rules[ri].setattr("beta",cl.ruleBetas[ri])
2449##            if cl.ruleBetas[ri] > 0:
2450##                print Orange.classification.rules.rule_to_string(r), r.quality, cl.ruleBetas[ri]
2451        cl.all_rules = cl.rules
2452        cl.rules = self.sort_rules(cl.rules)
2453        cl.ruleBetas = [r.beta for r in cl.rules]
2454        cl.setattr("data", examples)
2455        return cl
2456
2457    def add_null_rule(self, rules, examples, weight):
2458        for cl in examples.domain.class_var:
2459            tmpRle = Orange.core.Rule()
2460            tmpRle.filter = Orange.core.Filter_values(domain = examples.domain)
2461            tmpRle.parentRule = None
2462            tmpRle.filterAndStore(examples,weight,int(cl))
2463            tmpRle.quality = tmpRle.class_distribution[int(cl)]/tmpRle.class_distribution.abs
2464            rules.append(tmpRle)
2465        return rules
2466       
2467    def sort_rules(self, rules):
2468        new_rules = Orange.core.RuleList()
2469        foundRule = True
2470        while foundRule:
2471            foundRule = False
2472            bestRule = None
2473            for r in rules:
2474                if r in new_rules:
2475                    continue
2476                if r.beta < 0.01 and r.beta > -0.01:
2477                    continue
2478                if not bestRule:
2479                    bestRule = r
2480                    foundRule = True
2481                    continue
2482                if len(r.filter.conditions) < len(bestRule.filter.conditions):
2483                    bestRule = r
2484                    foundRule = True
2485                    continue
2486                if len(r.filter.conditions) ==  len(bestRule.filter.conditions) and r.beta > bestRule.beta:
2487                    bestRule = r
2488                    foundRule = True
2489                    continue
2490            if bestRule:
2491                new_rules.append(bestRule)
2492        return new_rules
2493
2494PILAR = deprecated_members({"sortRules": "sort_rules"})(PILAR)
2495
2496
2497class RuleClassifier_bestRule(Orange.core.RuleClassifier):
2498    """
2499    A very simple classifier, it takes the best rule of each class and
2500    normalizes probabilities.
2501    """
2502    def __init__(self, rules, examples, weight_id = 0, **argkw):
2503        self.rules = rules
2504        self.examples = examples
2505        self.apriori = Orange.core.Distribution(examples.domain.class_var,examples,weight_id)
2506        self.apriori_prob = [a/self.apriori.abs for a in self.apriori]
2507        self.weight_id = weight_id
2508        self.__dict__.update(argkw)
2509        self.default_class_index = -1
2510
2511    @deprecated_keywords({"retRules": "ret_rules"})
2512    def __call__(self, example, result_type=Orange.classification.Classifier.GetValue, ret_rules = False):
2513        example = Orange.core.Example(self.examples.domain,example)
2514        tempDist = Orange.core.Distribution(example.domain.class_var)
2515        best_rules = [None]*len(example.domain.class_var.values)
2516
2517        for r in self.rules:
2518            if r(example) and not self.default_class_index == int(r.classifier.default_val) and \
2519               (not best_rules[int(r.classifier.default_val)] or r.quality>tempDist[r.classifier.default_val]):
2520                tempDist[r.classifier.default_val] = r.quality
2521                best_rules[int(r.classifier.default_val)] = r
2522        for b in best_rules:
2523            if b:
2524                used = getattr(b,"used",0.0)
2525                b.setattr("used",used+1)
2526        nonCovPriorSum = sum([tempDist[i] == 0. and self.apriori_prob[i] or 0. for i in range(len(self.apriori_prob))])
2527        if tempDist.abs < 1.:
2528            residue = 1. - tempDist.abs
2529            for a_i,a in enumerate(self.apriori_prob):
2530                if tempDist[a_i] == 0.:
2531                    tempDist[a_i]=self.apriori_prob[a_i]*residue/nonCovPriorSum
2532            final_dist = tempDist #Orange.core.Distribution(example.domain.class_var)
2533        else:
2534            tempDist.normalize() # prior probability
2535            tmp_examples = Orange.core.ExampleTable(self.examples)
2536            for r in best_rules:
2537                if r:
2538                    tmp_examples = r.filter(tmp_examples)
2539            tmpDist = Orange.core.Distribution(tmp_examples.domain.class_var,tmp_examples,self.weight_id)
2540            tmpDist.normalize()
2541            probs = [0.]*len(self.examples.domain.class_var.values)
2542            for i in range(len(self.examples.domain.class_var.values)):
2543                probs[i] = tmpDist[i]+tempDist[i]*2
2544            final_dist = Orange.core.Distribution(self.examples.domain.class_var)
2545            for cl_i,cl in enumerate(self.examples.domain.class_var):
2546                final_dist[cl] = probs[cl_i]
2547            final_dist.normalize()
2548               
2549        if ret_rules: # Do you want to return rules with classification?
2550            if result_type == Orange.classification.Classifier.GetValue:
2551              return (final_dist.modus(),best_rules)
2552            if result_type == Orange.core.GetProbabilities:
2553              return (final_dist, best_rules)
2554            return (final_dist.modus(),final_dist, best_rules)
2555        if result_type == Orange.classification.Classifier.GetValue:
2556          return final_dist.modus()
2557        if result_type == Orange.core.GetProbabilities:
2558          return final_dist
2559        return (final_dist.modus(),final_dist)
2560   
2561RuleClassifier_bestRule = deprecated_members({"defaultClassIndex": "default_class_index"})(RuleClassifier_bestRule)
Note: See TracBrowser for help on using the repository browser.