source: orange/docs/reference/rst/Orange.classification.rules.rst @ 10378:4d682c357ada

Revision 10378:4d682c357ada, 19.7 KB checked in by janezd <janez.demsar@…>, 2 years ago (diff)

More fixes due to refactoring of classification.rules

Line 
1.. py:currentmodule:: Orange.classification.rules
2
3.. index:: rule induction
4
5.. index:: 
6   single: classification; rule induction
7
8**************************
9Rule induction (``rules``)
10**************************
11
12Module ``rules`` implements supervised rule induction algorithms and
13rule-based classification methods. Rule induction is based on a
14comprehensive framework of components that can be modified or
15replaced. For ease of use, the module already provides multiple
16variations of `CN2 induction algorithm
17<http://www.springerlink.com/content/k6q2v76736w5039r/>`_.
18
19CN2 algorithm
20=============
21
22.. index:: 
23   single: classification; CN2
24
25The use of rule learning algorithms is consistent with a typical
26learner usage in Orange:
27
28:download:`rules-cn2.py <code/rules-cn2.py>`
29
30.. literalinclude:: code/rules-cn2.py
31    :lines: 7-
32
33::
34   
35    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
36    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
37    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
38    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
39    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
40    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
41    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
42    IF status=['second'] THEN survived=yes<13.000, 80.000>
43    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
44    IF status=['crew'] THEN survived=no<670.000, 192.000>
45    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
46    IF sex=['male'] THEN survived=no<118.000, 57.000>
47    IF age=['child'] THEN survived=no<17.000, 14.000>
48    IF TRUE THEN survived=no<89.000, 76.000>
49   
50.. autoclass:: Orange.classification.rules.CN2Learner(evaluator=Evaluator_Entropy, beam_width=5, alpha=1)
51   :members:
52   :show-inheritance:
53   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
54      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
55   
56.. autoclass:: Orange.classification.rules.CN2Classifier
57   :members:
58   :show-inheritance:
59   :exclude-members: beamWidth, resultType
60   
61.. index:: unordered CN2
62
63.. index:: 
64   single: classification; unordered CN2
65
66.. autoclass:: Orange.classification.rules.CN2UnorderedLearner(evaluator=Evaluator_Laplace(), beam_width=5, alpha=1.0)
67   :members:
68   :show-inheritance:
69   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
70      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
71   
72.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier
73   :members:
74   :show-inheritance:
75   
76.. index:: CN2-SD
77.. index:: subgroup discovery
78
79.. index:: 
80   single: classification; CN2-SD
81   
82.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner(evaluator=WRACCEvaluator(), beam_width=5, alpha=0.05, mult=0.7)
83   :members:
84   :show-inheritance:
85   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
86      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
87   
88.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner
89   :members:
90   :show-inheritance:
91   
92
93..
94    This part is commented out since
95    - there is no documentation on how to provide arguments
96    - the whole thing is represent original research work particular to
97      a specific project and belongs to an
98      extension rather than to the main package
99
100    Argument based CN2
101    ==================
102
103    Orange also supports argument-based CN2 learning.
104
105    .. autoclass:: Orange.classification.rules.ABCN2
106       :members:
107       :show-inheritance:
108       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
109      ruleFinder, ruleStopping, storeInstances, targetClass, weightID,
110      argument_id
111
112       This class has many more undocumented methods; see the source code for
113       reference.
114
115    .. autoclass:: Orange.classification.rules.ABCN2Ordered
116       :members:
117       :show-inheritance:
118
119    .. autoclass:: Orange.classification.rules.ABCN2M
120       :members:
121       :show-inheritance:
122       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping,
123      ruleFinder, ruleStopping, storeInstances, targetClass, weightID
124
125    Thismodule has many more undocumented argument-based learning
126    related classed; see the source code for reference.
127
128    References
129    ----------
130
131    * Bratko, Mozina, Zabkar. `Argument-Based Machine Learning
132      <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in
133      Computer Science: vol. 4203/2006, 11-17, 2006.
134
135
136Rule induction framework
137========================
138
139The classes described above are based on a more general framework that
140can be fine-tuned to specific needs by replacing individual components.
141Here is an example:
142
143part of :download:`rules-customized.py <code/rules-customized.py>`
144
145.. literalinclude:: code/rules-customized.py
146    :lines: 7-17
147
148::
149
150    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
151    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
152    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
153    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
154    IF status=['second'] THEN survived=yes<13.000, 104.000>
155    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
156    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
157    IF status=['crew'] THEN survived=yes<3.000, 20.000>
158    IF sex=['female'] THEN survived=no<106.000, 90.000>
159    IF TRUE THEN survived=yes<0.000, 5.000>
160
161In the example, we wanted to use a rule evaluator based on the
162m-estimate and set ``m`` to 50. The evaluator is a subcomponent of the
163:obj:`rule_finder` component. Thus, to be able to set the evaluator,
164we first set the :obj:`rule_finder` component, then we added the
165desired subcomponent and set its options. All other components, which
166are left unspecified, are provided by the learner at the training time
167and removed afterwards.
168
169Continuing with the example, assume that we wish to set a different
170validation function and a different beam width.
171
172part of :download:`rules-customized.py <code/rules-customized.py>`
173
174.. literalinclude:: code/rules-customized.py
175    :lines: 19-23
176
177.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)
178   
179   Represents a single rule. Constructor arguments correspond to the
180   first seven of the attributes (from :obj:`filter` to
181   :obj:`quality`) below.
182   
183   .. attribute:: filter
184   
185      Rule condition; an instance of
186      :class:`Orange.data.filter.Filter`, typically an instance of a
187      class derived from :class:`Orange.data.filter.Values`
188   
189   .. attribute:: classifier
190     
191      A rule predicts the class by calling an embedded classifier that
192      must be an instance of
193      :class:`~Orange.classification.Classifier`, typically
194      :class:`~Orange.classification.ConstantClassifier`. This
195      classifier is called by the rule classifier, such as
196      :obj:`RuleClassifier`.
197   
198   .. attribute:: learner
199     
200      Learner that is used for constructing a classifier. It must be
201      an instance of :class:`~Orange.classification.Learner`,
202      typically
203      :class:`~Orange.classification.majority.MajorityLearner`.
204   
205   .. attribute:: class_distribution
206     
207      Distribution of class in data instances covered by this rule
208      (:class:`~Orange.statistics.distribution.Distribution`).
209   
210   .. attribute:: instances
211     
212      Data instances covered by this rule (:class:`~Orange.data.Table`).
213   
214   .. attribute:: weight_id
215   
216      ID of the weight meta-attribute for the stored data instances
217      (``int``).
218   
219   .. attribute:: quality
220     
221      Quality of the rule. Rules with higher quality are better
222      (``float``).
223   
224   .. attribute:: complexity
225   
226      Complexity of the rule (``float``), typically the number of
227      selectors (conditions) in the rule. Complexity is used for
228      choosing between rules with the same quality; rules with lower
229      complexity are preferred.
230   
231   .. method:: filter_and_store(instances, weight_id=0, target_class=-1)
232   
233      Filter passed data instances and store them in :obj:`instances`.
234      Also, compute :obj:`class_distribution`, set the weight of
235      stored examples and create a new classifier using :obj:`learner`.
236     
237      :param weight_id: ID of the weight meta-attribute.
238      :type weight_id: int
239      :param target_class: index of target class; -1 for all classes.
240      :type target_class: int
241   
242   .. method:: __call__(instance)
243   
244      Return ``True`` if the given instance matches the rule condition.
245     
246      :param instance: data instance
247      :type instance: :class:`Orange.data.Instance`
248     
249   .. method:: __call__(instances, ref=True, negate=False)
250
251      Return a table of instances that match the rule condition.
252     
253      :param instances: a table of data instances
254      :type instances: :class:`Orange.data.Table`
255      :param ref: if ``True`` (default), the constructed table contains
256          references to the original data instances; if ``False``, the
257          data is copied
258      :type ref: bool
259      :param negate: inverts the selection
260      :type negate: bool
261
262
263
264.. py:class:: Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList())
265   
266   Bases: :class:`Orange.classification.Learner`
267   
268   A base rule induction learner. The algorithm follows
269   separate-and-conquer strategy, which has its origins in the AQ
270   family of algorithms (Fuernkranz J.; Separate-and-Conquer Rule
271   Learning, Artificial Intelligence Review 13, 3-54, 1999). Such
272   algorithms search for the optimal rule for the current training
273   set, remove the covered training instances (`separate`) and repeat
274   the process (`conquer`) on the remaining data.
275   
276   :param store_instances: if ``True`` (default), the induced rules
277       contain a table with references to the stored data instances.
278   :type store_instances: bool
279   
280   :param target_class: index of a specific class to learn; if -1
281        there is no target class
282   :type target_class: int
283   
284   :param base_rules: An optional list of initial rules for constraining the :obj:`rule_finder`.
285   :type base_rules: :class:`~Orange.classification.rules.RuleList`
286
287   The class' functionality is best explained by its ``__call__``
288   function.
289   
290   .. parsed-literal::
291
292      def \_\_call\_\_(self, instances, weight_id=0):
293          rule_list = Orange.classification.rules.RuleList()
294          all_instances = Orange.data.Table(instances)
295          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class):
296              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class, self.base_rules)
297              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id):
298                  break
299              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances, weight_id, self.target_class)
300              rule_list.append(new_rule)
301          return Orange.classification.rules.RuleClassifier_FirstRule(
302              rules=rule_list, instances=all_instances)
303       
304   The customizable components are :obj:`data_stopping`,
305   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping`
306   objects.
307   
308   .. attribute:: data_stopping
309   
310      An instance of
311      :class:`~Orange.classification.rules.DataStoppingCriteria`
312      that determines whether to continue the induction. The default
313      component,
314      :class:`~Orange.classification.rules.DataStoppingCriteria_NoPositives`
315      returns ``True`` if there are no more instances of the target class.
316   
317   .. attribute:: rule_finder
318     
319      An instance of :class:`~Orange.classification.rules.Finder`
320      that learns a single rule. Default is
321      :class:`~Orange.classification.rules.BeamFinder`.
322
323   .. attribute:: rule_stopping
324     
325      An instance of
326      :class:`~Orange.classification.rules.StoppingCriteria` that
327      decides whether to use the induced rule or to discard it and stop
328      the induction. If ``None`` (default) all rules are accepted.
329       
330   .. attribute:: cover_and_remove
331       
332      An instance of :class:`RuleCovererAndRemover` that removes
333      instances covered by the rule and returns remaining
334      instances. The default implementation
335      (:class:`RuleCovererAndRemover_Default`) removes the instances
336      that belong to given target class; if the target is not
337      specified (:obj:`target_class` == -1), it removes all covered
338      instances.   
339
340
341Rule finders
342------------
343
344.. class:: Orange.classification.rules.Finder
345
346   Base class for rule finders, which learn a single rule from
347   instances.
348   
349   .. method:: __call__(table, weight_id, target_class, base_rules)
350   
351      Induce a new rule from the given data.
352     
353      :param table: training data instances
354      :type table: :class:`Orange.data.Table`
355     
356      :param weight_id: ID of the weight meta-attribute
357      :type weight_id: int
358     
359      :param target_class: index of a specific class being learned; -1 for all.
360      :type target_class: int
361     
362      :param base_rules: A list of initial rules for constraining the search space
363      :type base_rules: :class:`~Orange.classification.rules.RuleList`
364
365
366.. class:: Orange.classification.rules.BeamFinder
367   
368   Bases: :class:`~Orange.classification.rules.Finder`
369   
370   Beam search for the best rule. This is the default finder for
371   :obj:`RuleLearner`. Pseudo code of the algorithm is as follows.
372
373   .. parsed-literal::
374
375      def \_\_call\_\_(self, table, weight_id, target_class, base_rules):
376          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
377          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior)
378          \# compute quality of rules in rules_star and best_rule
379          ...
380          while len(rules_star) \> 0:
381              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id)
382              for cand in candidates:
383                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class)
384                  for new_rule in new_rules:
385                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution):
386                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior)
387                          rules_star.append(new_rule)
388                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and
389                              new_rule.quality \> best_rule.quality:
390                              best_rule = new_rule
391              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id)
392          return best_rule
393         
394   Modifiable components are shown in bold. These are:
395
396   .. attribute:: initializer
397   
398      An instance of
399      :obj:`~Orange.classification.rules.BeamInitializer` that
400      is used to construct the initial list of rules. The default,
401      :class:`~Orange.classification.rules.BeamInitializer_Default`,
402      returns :obj:`base_rules`, or a rule with no conditions if
403      :obj:`base_rules` is not set.
404   
405   .. attribute:: candidate_selector
406   
407      An instance of
408      :class:`~Orange.classification.rules.BeamCandidateSelector`
409      used to separate a subset of rules from the current
410      :obj:`rules_star` that will be further specialized.  The default
411      component, an instance of
412      :class:`~Orange.classification.rules.BeamCandidateSelector_TakeAll`,
413      selects all rules.
414   
415   .. attribute:: refiner
416   
417      An instance of
418      :class:`~Orange.classification.rules.BeamRefiner` that is
419      used for refining the rules. Refined rule should cover a strict
420      subset of instances covered by the given rule. Default component
421      (:class:`~Orange.classification.rules.BeamRefiner_Selector`)
422      adds a conjunctive selector to selectors present in the rule.
423   
424   .. attribute:: rule_filter
425   
426      An instance of
427      :class:`~Orange.classification.rules.BeamFilter` that is
428      used for filtering rules to trim the search beam. The default
429      component,
430      :class:`~Orange.classification.rules.BeamFilter_Width`\
431      *(m=5)*\, keeps the five best rules.
432
433   .. method:: __call__(data, weight_id, target_class, base_rules)
434
435       Determines the optimal rule to cover the given data instances.
436
437       :param data: data instances.
438       :type data: :class:`Orange.data.Table`
439
440       :param weight_id: index of the weight meta-attribute.
441       :type weight_id: int
442
443       :param target_class: index of the target class.
444       :type target_class: int
445
446       :param base_rules: existing rules
447       :type base_rules: :class:`~Orange.classification.rules.RuleList`
448
449Rule evaluators
450---------------
451
452.. class:: Orange.classification.rules.Evaluator
453
454   Base class for rule evaluators that evaluate the quality of the
455   rule based on the data instances they cover.
456   
457   .. method:: __call__(rule, instances, weight_id, target_class, prior)
458   
459      Calculate a (non-negative) rule quality.
460     
461      :param rule: rule to evaluate
462      :type rule: :class:`~Orange.classification.rules.Rule`
463     
464      :param instances: data instances covered by the rule
465      :type instances: :class:`Orange.data.Table`
466     
467      :param weight_id: index of the weight meta-attribute
468      :type weight_id: int
469     
470      :param target_class: index of target class of this rule
471      :type target_class: int
472     
473      :param prior: prior class distribution
474      :type prior: :class:`Orange.statistics.distribution.Distribution`
475
476.. autoclass:: Orange.classification.rules.LaplaceEvaluator
477   :members:
478   :show-inheritance:
479   :exclude-members: targetClass, weightID
480
481.. autoclass:: Orange.classification.rules.WRACCEvaluator
482   :members:
483   :show-inheritance:
484   :exclude-members: targetClass, weightID
485   
486.. class:: Orange.classification.rules.Evaluator_Entropy
487
488   Bases: :class:`~Orange.classification.rules.Evaluator`
489   
490.. class:: Orange.classification.rules.Evaluator_LRS
491
492   Bases: :class:`~Orange.classification.rules.Evaluator`
493
494.. class:: Orange.classification.rules.Evaluator_Laplace
495
496   Bases: :class:`~Orange.classification.rules.Evaluator`
497
498.. class:: Orange.classification.rules.Evaluator_mEVC
499
500   Bases: :class:`~Orange.classification.rules.Evaluator`
501   
502Instance covering and removal
503-----------------------------
504
505.. class:: RuleCovererAndRemover
506
507   Base class for rule coverers and removers that, when invoked, remove
508   instances covered by the rule and return remaining instances.
509
510.. autoclass:: CovererAndRemover_MultWeights
511
512.. autoclass:: CovererAndRemover_AddWeights
513   
514Miscellaneous functions
515-----------------------
516
517.. automethod:: Orange.classification.rules.rule_to_string
518
519..
520    Undocumented are:
521    Data-based Stopping Criteria
522    ----------------------------
523    Rule-based Stopping Criteria
524    ----------------------------
525    Rule-based Stopping Criteria
526    ----------------------------
527
528References
529----------
530
531* Clark, Niblett. `The CN2 Induction Algorithm
532  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine
533  Learning 3(4):261--284, 1989.
534* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements
535  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In
536  Machine Learning - EWSL-91. Proceedings of the European Working Session on
537  Learning, pp 151--163, Porto, Portugal, March 1991.
538* Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD
539  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal
540  of Machine Learning Research 5: 153-188, 2004.
541
Note: See TracBrowser for help on using the repository browser.