Ignore:
Timestamp:
02/25/12 22:39:37 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Removed prefix Rule from most classes related to classification rules.
Moved the documentation to rst and polished it

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/reference/rst/Orange.classification.rules.rst

    r9372 r10370  
    1 .. automodule:: Orange.classification.rules 
     1.. py:currentmodule:: Orange.classification.rules 
     2 
     3.. index:: rule induction 
     4 
     5.. index::  
     6   single: classification; rule induction 
     7 
     8************************** 
     9Rule induction (``rules``) 
     10************************** 
     11 
     12Module ``rules`` implements supervised rule induction algorithms and 
     13rule-based classification methods. Rule induction is based on a 
     14comprehensive framework of components that can be modified or 
     15replaced. For ease of use, the module already provides multiple 
     16variations of `CN2 induction algorithm 
     17<http://www.springerlink.com/content/k6q2v76736w5039r/>`_. 
     18 
     19CN2 algorithm 
     20============= 
     21 
     22.. index::  
     23   single: classification; CN2 
     24 
     25The use of rule learning algorithms is consistent with a typical 
     26learner usage in Orange: 
     27 
     28:download:`rules-cn2.py <code/rules-cn2.py>` 
     29 
     30.. literalinclude:: code/rules-cn2.py 
     31    :lines: 7- 
     32 
     33:: 
     34     
     35    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000> 
     36    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000> 
     37    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000> 
     38    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000> 
     39    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000> 
     40    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000> 
     41    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000> 
     42    IF status=['second'] THEN survived=yes<13.000, 80.000> 
     43    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
     44    IF status=['crew'] THEN survived=no<670.000, 192.000> 
     45    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
     46    IF sex=['male'] THEN survived=no<118.000, 57.000> 
     47    IF age=['child'] THEN survived=no<17.000, 14.000> 
     48    IF TRUE THEN survived=no<89.000, 76.000> 
     49     
     50.. autoclass:: Orange.classification.rules.CN2Learner(evaluator=Evaluator_Entropy, beam_width=5, alpha=1) 
     51   :members: 
     52   :show-inheritance: 
     53   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     54      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     55    
     56.. autoclass:: Orange.classification.rules.CN2Classifier 
     57   :members: 
     58   :show-inheritance: 
     59   :exclude-members: beamWidth, resultType 
     60    
     61.. index:: unordered CN2 
     62 
     63.. index::  
     64   single: classification; unordered CN2 
     65 
     66.. autoclass:: Orange.classification.rules.CN2UnorderedLearner(evaluator=Evaluator_Laplace(), beam_width=5, alpha=1.0) 
     67   :members: 
     68   :show-inheritance: 
     69   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     70      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     71    
     72.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier 
     73   :members: 
     74   :show-inheritance: 
     75    
     76.. index:: CN2-SD 
     77.. index:: subgroup discovery 
     78 
     79.. index::  
     80   single: classification; CN2-SD 
     81    
     82.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner(evaluator=WRACCEvaluator(), beam_width=5, alpha=0.05, mult=0.7) 
     83   :members: 
     84   :show-inheritance: 
     85   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     86      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     87    
     88.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner 
     89   :members: 
     90   :show-inheritance: 
     91    
     92 
     93.. 
     94    This part is commented out since 
     95    - there is no documentation on how to provide arguments 
     96    - the whole thing is represent original research work particular to 
     97      a specific project and belongs to an 
     98      extension rather than to the main package 
     99 
     100    Argument based CN2 
     101    ================== 
     102 
     103    Orange also supports argument-based CN2 learning. 
     104 
     105    .. autoclass:: Orange.classification.rules.ABCN2 
     106       :members: 
     107       :show-inheritance: 
     108       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     109      ruleFinder, ruleStopping, storeInstances, targetClass, weightID, 
     110      argument_id 
     111 
     112       This class has many more undocumented methods; see the source code for 
     113       reference. 
     114 
     115    .. autoclass:: Orange.classification.rules.ABCN2Ordered 
     116       :members: 
     117       :show-inheritance: 
     118 
     119    .. autoclass:: Orange.classification.rules.ABCN2M 
     120       :members: 
     121       :show-inheritance: 
     122       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     123      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     124 
     125    Thismodule has many more undocumented argument-based learning 
     126    related classed; see the source code for reference. 
     127 
     128    References 
     129    ---------- 
     130 
     131    * Bratko, Mozina, Zabkar. `Argument-Based Machine Learning 
     132      <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in 
     133      Computer Science: vol. 4203/2006, 11-17, 2006. 
     134 
     135 
     136Rule induction framework 
     137======================== 
     138 
     139The classes described above are based on a more general framework that 
     140can be fine-tuned to specific needs by replacing individual components. 
     141Here is an example: 
     142 
     143part of :download:`rules-customized.py <code/rules-customized.py>` 
     144 
     145.. literalinclude:: code/rules-customized.py 
     146    :lines: 7-17 
     147 
     148:: 
     149 
     150    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000> 
     151    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
     152    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000> 
     153    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000> 
     154    IF status=['second'] THEN survived=yes<13.000, 104.000> 
     155    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
     156    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000> 
     157    IF status=['crew'] THEN survived=yes<3.000, 20.000> 
     158    IF sex=['female'] THEN survived=no<106.000, 90.000> 
     159    IF TRUE THEN survived=yes<0.000, 5.000> 
     160 
     161In the example, we wanted to use a rule evaluator based on the 
     162m-estimate and set ``m`` to 50. The evaluator is a subcomponent of the 
     163:obj:`rule_finder` component. Thus, to be able to set the evaluator, 
     164we first set the :obj:`rule_finder` component, then we added the 
     165desired subcomponent and set its options. All other components, which 
     166are left unspecified, are provided by the learner at the training time 
     167and removed afterwards. 
     168 
     169Continuing with the example, assume that we wish to set a different 
     170validation function and a different beam width. 
     171 
     172part of :download:`rules-customized.py <code/rules-customized.py>` 
     173 
     174.. literalinclude:: code/rules-customized.py 
     175    :lines: 19-23 
     176 
     177.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1) 
     178    
     179   Represents a single rule. Constructor arguments correspond to the 
     180   first seven of the attributes (from :obj:`filter` to 
     181   :obj:`quality`) below. 
     182    
     183   .. attribute:: filter 
     184    
     185      Rule condition; an instance of 
     186      :class:`Orange.data.filter.Filter`, typically an instance of a 
     187      class derived from :class:`Orange.data.filter.Values` 
     188    
     189   .. attribute:: classifier 
     190       
     191      A rule predicts the class by calling an embedded classifier that 
     192      must be an instance of 
     193      :class:`~Orange.classification.Classifier`, typically 
     194      :class:`~Orange.classification.ConstantClassifier`. This 
     195      classifier is called by the rule classifier, such as 
     196      :obj:`RuleClassifier`. 
     197    
     198   .. attribute:: learner 
     199       
     200      Learner that is used for constructing a classifier. It must be 
     201      an instance of :class:`~Orange.classification.Learner`, 
     202      typically 
     203      :class:`~Orange.classification.majority.MajorityLearner`. 
     204    
     205   .. attribute:: class_distribution 
     206       
     207      Distribution of class in data instances covered by this rule 
     208      (:class:`~Orange.statistics.distribution.Distribution`). 
     209    
     210   .. attribute:: instances 
     211       
     212      Data instances covered by this rule (:class:`~Orange.data.Table`). 
     213    
     214   .. attribute:: weight_id 
     215    
     216      ID of the weight meta-attribute for the stored data instances 
     217      (``int``). 
     218    
     219   .. attribute:: quality 
     220       
     221      Quality of the rule. Rules with higher quality are better 
     222      (``float``). 
     223    
     224   .. attribute:: complexity 
     225    
     226      Complexity of the rule (``float``), typically the number of 
     227      selectors (conditions) in the rule. Complexity is used for 
     228      choosing between rules with the same quality; rules with lower 
     229      complexity are preferred. 
     230    
     231   .. method:: filter_and_store(instances, weight_id=0, target_class=-1) 
     232    
     233      Filter passed data instances and store them in :obj:`instances`. 
     234      Also, compute :obj:`class_distribution`, set the weight of 
     235      stored examples and create a new classifier using :obj:`learner`. 
     236       
     237      :param weight_id: ID of the weight meta-attribute. 
     238      :type weight_id: int 
     239      :param target_class: index of target class; -1 for all classes. 
     240      :type target_class: int 
     241    
     242   .. method:: __call__(instance) 
     243    
     244      Return ``True`` if the given instance matches the rule condition. 
     245       
     246      :param instance: data instance 
     247      :type instance: :class:`Orange.data.Instance` 
     248       
     249   .. method:: __call__(instances, ref=True, negate=False) 
     250 
     251      Return a table of instances that match the rule condition. 
     252       
     253      :param instances: a table of data instances 
     254      :type instances: :class:`Orange.data.Table` 
     255      :param ref: if ``True`` (default), the constructed table contains 
     256          references to the original data instances; if ``False``, the 
     257          data is copied 
     258      :type ref: bool 
     259      :param negate: inverts the selection 
     260      :type negate: bool 
     261 
     262 
     263 
     264.. py:class:: Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList()) 
     265    
     266   Bases: :class:`Orange.classification.Learner` 
     267    
     268   A base rule induction learner. The algorithm follows 
     269   separate-and-conquer strategy, which has its origins in the AQ 
     270   family of algorithms (Fuernkranz J.; Separate-and-Conquer Rule 
     271   Learning, Artificial Intelligence Review 13, 3-54, 1999). Such 
     272   algorithms search for the optimal rule for the current training 
     273   set, remove the covered training instances (`separate`) and repeat 
     274   the process (`conquer`) on the remaining data. 
     275    
     276   :param store_instances: if ``True`` (default), the induced rules 
     277       contain a table with references to the stored data instances. 
     278   :type store_instances: bool 
     279     
     280   :param target_class: index of a specific class to learn; if -1 
     281        there is no target class 
     282   :type target_class: int 
     283    
     284   :param base_rules: An optional list of initial rules for constraining the :obj:`rule_finder`. 
     285   :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     286 
     287   The class' functionality is best explained by its ``__call__`` 
     288   function. 
     289    
     290   .. parsed-literal:: 
     291 
     292      def \_\_call\_\_(self, instances, weight_id=0): 
     293          rule_list = Orange.classification.rules.RuleList() 
     294          all_instances = Orange.data.Table(instances) 
     295          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class): 
     296              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class, self.base_rules) 
     297              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id): 
     298                  break 
     299              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances, weight_id, self.target_class) 
     300              rule_list.append(new_rule) 
     301          return Orange.classification.rules.RuleClassifier_FirstRule( 
     302              rules=rule_list, instances=all_instances) 
     303        
     304   The customizable components are :obj:`data_stopping`, 
     305   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping` 
     306   objects. 
     307    
     308   .. attribute:: data_stopping 
     309    
     310      An instance of 
     311      :class:`~Orange.classification.rules.RuleDataStoppingCriteria` 
     312      that determines whether to continue the induction. The default 
     313      component, 
     314      :class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives` 
     315      returns ``True`` if there are no more instances of the target class.  
     316    
     317   .. attribute:: rule_finder 
     318       
     319      An instance of :class:`~Orange.classification.rules.RuleFinder` 
     320      that learns a single rule. Default is 
     321      :class:`~Orange.classification.rules.RuleBeamFinder`. 
     322 
     323   .. attribute:: rule_stopping 
     324       
     325      An instance of 
     326      :class:`~Orange.classification.rules.RuleStoppingCriteria` that 
     327      decides whether to use the induced rule or to discard it and stop 
     328      the induction. If ``None`` (default) all rules are accepted. 
     329        
     330   .. attribute:: cover_and_remove 
     331        
     332      An instance of :class:`RuleCovererAndRemover` that removes 
     333      instances covered by the rule and returns remaining 
     334      instances. The default implementation 
     335      (:class:`RuleCovererAndRemover_Default`) removes the instances 
     336      that belong to given target class; if the target is not 
     337      specified (:obj:`target_class` == -1), it removes all covered 
     338      instances.     
     339 
     340 
     341Rule finders 
     342------------ 
     343 
     344.. class:: Orange.classification.rules.RuleFinder 
     345 
     346   Base class for rule finders, which learn a single rule from 
     347   instances. 
     348    
     349   .. method:: __call__(table, weight_id, target_class, base_rules) 
     350    
     351      Induce a new rule from the given data. 
     352       
     353      :param table: training data instances 
     354      :type table: :class:`Orange.data.Table` 
     355       
     356      :param weight_id: ID of the weight meta-attribute 
     357      :type weight_id: int 
     358       
     359      :param target_class: index of a specific class being learned; -1 for all. 
     360      :type target_class: int  
     361       
     362      :param base_rules: A list of initial rules for constraining the search space 
     363      :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     364 
     365 
     366.. class:: Orange.classification.rules.RuleBeamFinder 
     367    
     368   Bases: :class:`~Orange.classification.rules.RuleFinder` 
     369    
     370   Beam search for the best rule. This is the default finder for 
     371   :obj:`RuleLearner`. Pseudo code of the algorithm is as follows. 
     372 
     373   .. parsed-literal:: 
     374 
     375      def \_\_call\_\_(self, table, weight_id, target_class, base_rules): 
     376          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id) 
     377          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior) 
     378          \# compute quality of rules in rules_star and best_rule 
     379          ... 
     380          while len(rules_star) \> 0: 
     381              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id) 
     382              for cand in candidates: 
     383                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class) 
     384                  for new_rule in new_rules: 
     385                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution): 
     386                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior) 
     387                          rules_star.append(new_rule) 
     388                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and 
     389                              new_rule.quality \> best_rule.quality: 
     390                              best_rule = new_rule 
     391              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id) 
     392          return best_rule 
     393           
     394   Modifiable components are shown in bold. These are: 
     395 
     396   .. attribute:: initializer 
     397    
     398      An instance of 
     399      :obj:`~Orange.classification.rules.RuleBeamInitializer` that 
     400      is used to construct the initial list of rules. The default, 
     401      :class:`~Orange.classification.rules.RuleBeamInitializer_Default`, 
     402      returns :obj:`base_rules`, or a rule with no conditions if 
     403      :obj:`base_rules` is not set. 
     404    
     405   .. attribute:: candidate_selector 
     406    
     407      An instance of 
     408      :class:`~Orange.classification.rules.RuleBeamCandidateSelector` 
     409      used to separate a subset of rules from the current 
     410      :obj:`rules_star` that will be further specialized.  The default 
     411      component, an instance of 
     412      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`, 
     413      selects all rules. 
     414     
     415   .. attribute:: refiner 
     416    
     417      An instance of 
     418      :class:`~Orange.classification.rules.RuleBeamRefiner` that is 
     419      used for refining the rules. Refined rule should cover a strict 
     420      subset of instances covered by the given rule. Default component 
     421      (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`) 
     422      adds a conjunctive selector to selectors present in the rule. 
     423     
     424   .. attribute:: rule_filter 
     425    
     426      An instance of 
     427      :class:`~Orange.classification.rules.RuleBeamFilter` that is 
     428      used for filtering rules to trim the search beam. The default 
     429      component, 
     430      :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ 
     431      *(m=5)*\, keeps the five best rules. 
     432 
     433   .. method:: __call__(data, weight_id, target_class, base_rules) 
     434 
     435       Determines the optimal rule to cover the given data instances. 
     436 
     437       :param data: data instances. 
     438       :type data: :class:`Orange.data.Table` 
     439 
     440       :param weight_id: index of the weight meta-attribute. 
     441       :type weight_id: int 
     442 
     443       :param target_class: index of the target class. 
     444       :type target_class: int 
     445 
     446       :param base_rules: existing rules 
     447       :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     448 
     449Rule evaluators 
     450--------------- 
     451 
     452.. class:: Orange.classification.rules.RuleEvaluator 
     453 
     454   Base class for rule evaluators that evaluate the quality of the 
     455   rule based on the data instances they cover. 
     456    
     457   .. method:: __call__(rule, instances, weight_id, target_class, prior) 
     458    
     459      Calculate a (non-negative) rule quality. 
     460       
     461      :param rule: rule to evaluate 
     462      :type rule: :class:`~Orange.classification.rules.Rule` 
     463       
     464      :param instances: data instances covered by the rule 
     465      :type instances: :class:`Orange.data.Table` 
     466       
     467      :param weight_id: index of the weight meta-attribute 
     468      :type weight_id: int 
     469       
     470      :param target_class: index of target class of this rule 
     471      :type target_class: int 
     472       
     473      :param prior: prior class distribution 
     474      :type prior: :class:`Orange.statistics.distribution.Distribution` 
     475 
     476.. autoclass:: Orange.classification.rules.LaplaceEvaluator 
     477   :members: 
     478   :show-inheritance: 
     479   :exclude-members: targetClass, weightID 
     480 
     481.. autoclass:: Orange.classification.rules.WRACCEvaluator 
     482   :members: 
     483   :show-inheritance: 
     484   :exclude-members: targetClass, weightID 
     485    
     486.. class:: Orange.classification.rules.RuleEvaluator_Entropy 
     487 
     488   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     489     
     490.. class:: Orange.classification.rules.RuleEvaluator_LRS 
     491 
     492   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     493 
     494.. class:: Orange.classification.rules.RuleEvaluator_Laplace 
     495 
     496   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     497 
     498.. class:: Orange.classification.rules.RuleEvaluator_mEVC 
     499 
     500   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     501    
     502Instance covering and removal 
     503----------------------------- 
     504 
     505.. class:: RuleCovererAndRemover 
     506 
     507   Base class for rule coverers and removers that, when invoked, remove 
     508   instances covered by the rule and return remaining instances. 
     509 
     510.. autoclass:: CovererAndRemover_MultWeights 
     511 
     512.. autoclass:: CovererAndRemover_AddWeights 
     513    
     514Miscellaneous functions 
     515----------------------- 
     516 
     517.. automethod:: Orange.classification.rules.rule_to_string 
     518 
     519.. 
     520    Undocumented are: 
     521    Data-based Stopping Criteria 
     522    ---------------------------- 
     523    Rule-based Stopping Criteria 
     524    ---------------------------- 
     525    Rule-based Stopping Criteria 
     526    ---------------------------- 
     527 
     528References 
     529---------- 
     530 
     531* Clark, Niblett. `The CN2 Induction Algorithm 
     532  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine 
     533  Learning 3(4):261--284, 1989. 
     534* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements 
     535  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In 
     536  Machine Learning - EWSL-91. Proceedings of the European Working Session on 
     537  Learning, pp 151--163, Porto, Portugal, March 1991. 
     538* Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD 
     539  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal 
     540  of Machine Learning Research 5: 153-188, 2004. 
     541 
Note: See TracChangeset for help on using the changeset viewer.