Files:
1 added
14 edited

  • Orange/classification/majority.py

    r10191 r10368  
    1 """ 
    2 *********************** 
    3 Majority (``majority``) 
    4 *********************** 
    5  
    6 .. index:: majority classifier 
    7    pair: classification; majority classifier 
    8  
    9 Accuracy of classifiers is often compared to the "default accuracy", 
    10 that is, the accuracy of a classifier which classifies all instances 
    11 to the majority class. To fit into the standard schema, even this 
    12 algorithm is provided in the form of the usual learner-classifier pair. 
    13 Learning is done by :obj:`MajorityLearner` and the classifier it 
    14 constructs is an instance of :obj:`ConstantClassifier`. 
    15  
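A minimal usage sketch (assuming the ``monks-1`` dataset bundled with
Orange; any data with a discrete class behaves the same)::

    import Orange

    data = Orange.data.Table("monks-1")
    majority = Orange.classification.majority.MajorityLearner(data)
    # the classifier always predicts the majority class ...
    print majority(data[0])
    # ... and always reports the same class probabilities
    print majority(data[0], Orange.classification.Classifier.GetProbabilities)
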
    16 .. class:: MajorityLearner 
    17  
    18     MajorityLearner will most often be used as is, without setting any 
    19     parameters. Nevertheless, it has two. 
    20  
    21     .. attribute:: estimator_constructor 
    22      
    23         An estimator constructor that can be used for estimation of 
    24         class probabilities. If left None, the probability of each class is 
    25         estimated as the relative frequency of instances belonging to 
    26         this class. 
    27          
    28     .. attribute:: apriori_distribution 
    29      
    30         Apriori class distribution that is passed to the estimator 
    31         constructor if one is given. 
    32  
    33 .. class:: ConstantClassifier 
    34  
    35     ConstantClassifier always classifies to the same class and reports the 
    36     same class probabilities. 
    37  
    38     Its constructor can be called without arguments, with a variable (for 
    39     :obj:`class_var`), a value (for :obj:`default_val`), or both. If the value 
    40     is given and is of type :obj:`Orange.data.Value` (alternatives are an 
    41     integer index of a discrete value or a continuous value), its attribute 
    42     :obj:`Orange.data.Value.variable` is either used to initialize 
    43     :obj:`class_var`, if variable is not given as an argument, or checked 
    44     against the variable argument, if it is given. 
    45      
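    A sketch of the constructor forms described above (assuming the bundled
    ``monks-1`` dataset)::

        import Orange

        data = Orange.data.Table("monks-1")
        class_var = data.domain.class_var
        # variable only: class_var is set, default_val is left unset
        c1 = Orange.classification.ConstantClassifier(class_var)
        # value only: class_var is taken from the value's variable
        c2 = Orange.classification.ConstantClassifier(Orange.data.Value(class_var, 0))
        print c2(data[0])
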
    46     .. attribute:: default_val 
    47      
    48         Value that is returned by the classifier. 
    49      
    50     .. attribute:: default_distribution 
    51  
    52         Class probabilities returned by the classifier. 
    53      
    54     .. attribute:: class_var 
    55      
    56         Class variable that the classifier predicts. 
    57  
    58  
    59 Examples 
    60 ======== 
    61  
    62 This "learning algorithm" will most often be used as a baseline, 
    63 that is, to determine if some other learning algorithm provides 
    64 any information about the class (:download:`majority-classification.py <code/majority-classification.py>`): 
    65  
    66 .. literalinclude:: code/majority-classification.py 
    67     :lines: 7- 
    68  
    69 """ 
    70  
    711from Orange import core 
    722 
  • Orange/classification/rules.py

    r10219 r10370  
    1 """ 
    2  
    3 .. index:: rule induction 
    4  
    5 .. index::  
    6    single: classification; rule induction 
    7  
    8 ************************** 
    9 Rule induction (``rules``) 
    10 ************************** 
    11  
    12 This module implements supervised rule induction algorithms 
    13 and rule-based classification methods, specifically the  
    14 `CN2 induction algorithm <http://www.springerlink.com/content/k6q2v76736w5039r/>`_ 
    15 in multiple variants, including an argument-based learning one.  
    16 The implementation is modular, based on the rule induction  
    17 framework that is described below, providing the opportunity to change, specialize 
    18 and improve the algorithm. 
    19  
    20 CN2 algorithm 
    21 ============= 
    22  
    23 .. index::  
    24    single: classification; CN2 
    25  
    26 Several variations of the well-known CN2 rule learning algorithm are 
    27 implemented, all by wrapping the 
    28 :class:`~Orange.classification.rules.RuleLearner` class. Each CN2 learner class 
    29 in this module changes some of RuleLearner's replaceable components to reflect 
    30 the required behavior. 
    31  
    32 Usage is consistent with typical learner usage in Orange: 
    33  
    34 :download:`rules-cn2.py <code/rules-cn2.py>` 
    35  
    36 .. literalinclude:: code/rules-cn2.py 
    37     :lines: 7- 
    38  
    39 The result:: 
    40      
    41     IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000> 
    42     IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000> 
    43     IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000> 
    44     IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000> 
    45     IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000> 
    46     IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000> 
    47     IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000> 
    48     IF status=['second'] THEN survived=yes<13.000, 80.000> 
    49     IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
    50     IF status=['crew'] THEN survived=no<670.000, 192.000> 
    51     IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
    52     IF sex=['male'] THEN survived=no<118.000, 57.000> 
    53     IF age=['child'] THEN survived=no<17.000, 14.000> 
    54     IF TRUE THEN survived=no<89.000, 76.000> 
    55      
    56 .. autoclass:: Orange.classification.rules.CN2Learner 
    57    :members: 
    58    :show-inheritance: 
    59    :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
    60       ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
    61     
    62 .. autoclass:: Orange.classification.rules.CN2Classifier 
    63    :members: 
    64    :show-inheritance: 
    65    :exclude-members: beamWidth, resultType 
    66     
    67 .. index:: unordered CN2 
    68  
    69 .. index::  
    70    single: classification; unordered CN2 
    71  
    72 .. autoclass:: Orange.classification.rules.CN2UnorderedLearner 
    73    :members: 
    74    :show-inheritance: 
    75    :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
    76       ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
    77     
    78 .. autoclass:: Orange.classification.rules.CN2UnorderedClassifier 
    79    :members: 
    80    :show-inheritance: 
    81     
    82 .. index:: CN2-SD 
    83 .. index:: subgroup discovery 
    84  
    85 .. index::  
    86    single: classification; CN2-SD 
    87     
    88 .. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner 
    89    :members: 
    90    :show-inheritance: 
    91    :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
    92       ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
    93     
    94 .. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner 
    95    :members: 
    96    :show-inheritance: 
    97     
    98 References 
    99 ---------- 
    100  
    101 * Clark, Niblett. `The CN2 Induction Algorithm 
    102   <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine 
    103   Learning 3(4):261--284, 1989. 
    104 * Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements 
    105   <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In 
    106   Machine Learning - EWSL-91. Proceedings of the European Working Session on 
    107   Learning, pp 151--163, Porto, Portugal, March 1991. 
    108 * Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD 
    109   <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal 
    110   of Machine Learning Research 5: 153-188, 2004. 
    111  
    112  
    113 Argument based CN2 
    114 ================== 
    115  
    116 Orange also supports argument-based CN2 learning. 
    117  
    118 .. autoclass:: Orange.classification.rules.ABCN2 
    119    :members: 
    120    :show-inheritance: 
    121    :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
    122       ruleFinder, ruleStopping, storeInstances, targetClass, weightID, 
    123       argument_id 
    124     
    125    This class has many more undocumented methods; see the source code for 
    126    reference. 
    127     
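A minimal usage sketch (assuming the bundled ``titanic`` dataset; the
constructor parameters are those listed above)::

    import Orange

    data = Orange.data.Table("titanic")
    learner = Orange.classification.rules.ABCN2(width=5, m=2)
    classifier = learner(data)
    print classifier(data[0])
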
    128 .. autoclass:: Orange.classification.rules.ABCN2Ordered 
    129    :members: 
    130    :show-inheritance: 
    131     
    132 .. autoclass:: Orange.classification.rules.ABCN2M 
    133    :members: 
    134    :show-inheritance: 
    135    :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
    136       ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
    137  
    138 This module has many more undocumented classes related to argument-based 
    139 learning; see the source code for reference. 
    140  
    141 References 
    142 ---------- 
    143  
    144 * Bratko, Mozina, Zabkar. `Argument-Based Machine Learning 
    145   <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in 
    146   Computer Science: vol. 4203/2006, 11-17, 2006. 
    147  
    148  
    149 Rule induction framework 
    150 ======================== 
    151  
    152 A general framework of classes supports the described CN2 implementation, and 
    153 can in fact be fine-tuned to specific needs by replacing individual components. 
    154 Here is a simple example; the detailed architecture is described 
    155 in the documentation of the classes that follows: 
    156  
    157 part of :download:`rules-customized.py <code/rules-customized.py>` 
    158  
    159 .. literalinclude:: code/rules-customized.py 
    160     :lines: 7-17 
    161  
    162 In the example, the rule evaluation function was set to the m-estimate of probability with m=50. The result is:: 
    163  
    164     IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000> 
    165     IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
    166     IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000> 
    167     IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000> 
    168     IF status=['second'] THEN survived=yes<13.000, 104.000> 
    169     IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
    170     IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000> 
    171     IF status=['crew'] THEN survived=yes<3.000, 20.000> 
    172     IF sex=['female'] THEN survived=no<106.000, 90.000> 
    173     IF TRUE THEN survived=yes<0.000, 5.000> 
    174  
    175 Notice that it is first necessary to set the :obj:`rule_finder` component, 
    176 because the default components are not constructed when the learner is 
    177 constructed, but only when we run it on data. At that time, the algorithm 
    178 checks which components are necessary and sets defaults. Similarly, when the 
    179 learner finishes, it destroys all *default* components. Continuing with our 
    180 example, assume that we wish to set a different validation function and a 
    181 different beam width. This is simply written as: 
    182  
    183 part of :download:`rules-customized.py <code/rules-customized.py>` 
    184  
    185 .. literalinclude:: code/rules-customized.py 
    186     :lines: 19-23 
    187  
    188 .. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1) 
    189     
    190    Representation of a single induced rule. 
    191     
    192    Parameters that can be passed to the constructor correspond to the first 
    193    seven attributes. All attributes are: 
    194     
    195    .. attribute:: filter 
    196     
    197       contents of the rule; this is the basis of the Rule class. Must be of 
    198       type :class:`Orange.core.Filter`; an instance of 
    199       :class:`Orange.core.Filter_values` is set as the default. 
    200     
    201    .. attribute:: classifier 
    202        
    203       each rule can be used as an ordinary Orange 
    204       classifier. Must be of type :class:`Orange.classification.Classifier`. 
    205       By default, an instance of :class:`Orange.classification.ConstantClassifier` is used. 
    206     
    207    .. attribute:: learner 
    208        
    209       learner to be used for making a classifier. Must be of type 
    210       :class:`Orange.classification.Learner`. By default, 
    211       :class:`Orange.classification.majority.MajorityLearner` is used. 
    212     
    213    .. attribute:: class_distribution 
    214        
    215       class distribution of the data instances covered by this rule 
    216       (:class:`Orange.statistics.distribution.Distribution`). 
    217     
    218    .. attribute:: examples 
    219        
    220       data instances covered by this rule (:class:`Orange.data.Table`). 
    221     
    222    .. attribute:: weight_id 
    223     
    224       ID of the weight meta-attribute for the stored data instances (int). 
    225     
    226    .. attribute:: quality 
    227        
    228       quality of the rule. Rules with higher quality are better (float). 
    229     
    230    .. attribute:: complexity 
    231     
    232       complexity of the rule (float). Complexity is used for 
    233       selecting between rules with equal quality, where rules with lower 
    234       complexity are preferred. Typically, complexity corresponds to the 
    235       number of selectors in the rule (more precisely, the number of conditions in the filter), 
    236       but any other measure can be applied. 
    237     
    238    .. method:: filter_and_store(instances, weight_id=0, target_class=-1) 
    239     
    240       Filter the passed data instances and store them in the attribute :obj:`examples`. 
    241       Also compute :obj:`class_distribution`, set the weight of the stored examples and create 
    242       a new classifier using the :obj:`learner` attribute. 
    243        
    244       :param weight_id: ID of the weight meta-attribute. 
    245       :type weight_id: int 
    246       :param target_class: index of target class; -1 for all. 
    247       :type target_class: int 
    248     
    249    Objects of this class can be invoked: 
    250  
    251    .. method:: __call__(instance, instances, weight_id=0, target_class=-1) 
    252     
    253       There are two ways of invoking this method. One way is only passing the 
    254       data instance; then the Rule object returns True if the given instance is 
    255       covered by the rule's filter. 
    256        
    257       :param instance: data instance. 
    258       :type instance: :class:`Orange.data.Instance` 
    259        
    260       Another way of invocation is passing a table of data instances, 
    261       in which case a table of the instances covered by this rule is returned. 
    262        
    263       :param instances: a table of data instances. 
    264       :type instances: :class:`Orange.data.Table` 
    265       :param ref: TODO 
    266       :type ref: bool 
    267       :param negate: if set to True, the result is inverted: the resulting 
    268           table contains instances *not* covered by the rule. 
    269       :type negate: bool 
    270  
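   A sketch of both invocation forms (assuming the bundled ``titanic``
   dataset and a CN2 classifier with a ``rules`` attribute, as above)::

       import Orange

       data = Orange.data.Table("titanic")
       classifier = Orange.classification.rules.CN2Learner(data)
       rule = classifier.rules[0]
       print rule(data[0])                    # True if the instance is covered
       covered = rule(data)                   # instances covered by the rule
       not_covered = rule(data, negate=True)  # instances *not* covered
       print len(covered), len(not_covered)
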
    271 .. py:class:: Orange.classification.rules.RuleLearner(store_instances = true, target_class = -1, base_rules = Orange.classification.rules.RuleList()) 
    272     
    273    Bases: :class:`Orange.classification.Learner` 
    274     
    275    A base rule induction learner. The algorithm follows the separate-and-conquer 
    276    strategy, which has its origins in the AQ family of algorithms 
    277    (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence 
    278    Review 13, 3-54, 1999). Basically, such algorithms search for the "best" 
    279    possible rule in the learning instances, remove the covered data from the learning 
    280    instances (separate) and repeat the process (conquer) on the remaining 
    281    instances. 
    282     
    283    The class's functionality is best explained by showing its __call__ 
    284    function: 
    285     
    286    .. parsed-literal:: 
    287  
    288       def \_\_call\_\_(self, instances, weight_id=0): 
    289           rule_list = Orange.classification.rules.RuleList() 
    290           all_instances = Orange.data.Table(instances) 
    291           while not self.\ **data_stopping**\ (instances, weight_id, self.target_class): 
    292               new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class, 
    293                                         self.base_rules) 
    294               if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id): 
    295                   break 
    296               instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances, 
    297                                                       weight_id, self.target_class) 
    298               rule_list.append(new_rule) 
    299           return Orange.classification.rules.RuleClassifier_FirstRule( 
    300               rules=rule_list, instances=all_instances) 
    301                  
    302    The four customizable components here are the invoked :obj:`data_stopping`, 
    303    :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping` 
    304    objects. By default, components of the original CN2 algorithm are used, 
    305    but this can be changed by modifying those attributes (see the sketch after the parameter list below): 
    306     
    307    .. attribute:: data_stopping 
    308     
    309       an object of class 
    310       :class:`~Orange.classification.rules.RuleDataStoppingCriteria` 
    311       that determines whether there will be any benefit from further learning 
    312       (i.e. if there is enough data to continue learning). The default 
    313       implementation 
    314       (:class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives`) 
    315       returns True if there are no more instances of the given class.  
    316     
    317    .. attribute:: rule_stopping 
    318        
    319       an object of class  
    320       :class:`~Orange.classification.rules.RuleStoppingCriteria` 
    321       that decides from the last rule learned if it is worthwhile to use the 
    322       rule and learn more rules. By default, no rule stopping criterion is 
    323       used (:obj:`rule_stopping` == :obj:`None`), thus accepting all 
    324       rules. 
    325         
    326    .. attribute:: cover_and_remove 
    327         
    328       an object of 
    329       :class:`RuleCovererAndRemover` that removes 
    330       instances covered by the rule and returns the remaining instances. The 
    331       default implementation 
    332       (:class:`RuleCovererAndRemover_Default`) 
    333       removes only the instances that belong to the given target class, unless 
    334       no target class is given (i.e. :obj:`target_class` == -1). 
    335      
    336    .. attribute:: rule_finder 
    337        
    338       an object of class 
    339       :class:`~Orange.classification.rules.RuleFinder` that learns a single 
    340       rule from instances. The default implementation is 
    341       :class:`~Orange.classification.rules.RuleBeamFinder`. 
    342  
    343    :param store_instances: if set to True, the rules will have data instances 
    344        stored. 
    345    :type store_instances: bool 
    346      
    347    :param target_class: index of a specific class being learned; -1 for all. 
    348    :type target_class: int 
    349     
    350    :param base_rules: Rules that we would like to use in :obj:`rule_finder` to 
    351        constrain the learning space. If not set, it will be set to a set 
    352        containing only an empty rule. 
    353    :type base_rules: :class:`~Orange.classification.rules.RuleList` 
    354  
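   A sketch of replacing components (as in the customization example above;
   note that :obj:`rule_finder` must be set before its sub-components)::

       import Orange

       data = Orange.data.Table("titanic")
       learner = Orange.classification.rules.RuleLearner()
       learner.rule_finder = Orange.classification.rules.BeamFinder()
       learner.rule_finder.evaluator = \
           Orange.classification.rules.MEstimateEvaluator(m=50)
       classifier = learner(data)
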
    355 Rule finders 
    356 ------------ 
    357  
    358 .. class:: Orange.classification.rules.RuleFinder 
    359  
    360    Base class for all rule finders. These are used to learn a single rule from 
    361    instances. 
    362     
    363    Rule finders are invokable in the following manner: 
    364     
    365    .. method:: __call__(table, weight_id, target_class, base_rules) 
    366     
    367       Return a new rule, induced from instances in the given table. 
    368        
    369       :param table: data instances to learn from. 
    370       :type table: :class:`Orange.data.Table` 
    371        
    372       :param weight_id: ID of the weight meta-attribute for the stored data 
    373           instances. 
    374       :type weight_id: int 
    375        
    376       :param target_class: index of a specific class being learned; -1 for all. 
    377       :type target_class: int  
    378        
    379       :param base_rules: Rules that we would like to use in :obj:`rule_finder` 
    380           to constrain the learning space. If not set, it will be set to a set 
    381           containing only an empty rule. 
    382       :type base_rules: :class:`~Orange.classification.rules.RuleList` 
    383  
    384 .. class:: Orange.classification.rules.RuleBeamFinder 
    385     
    386    Bases: :class:`~Orange.classification.rules.RuleFinder` 
    387     
    388    Beam search for the best rule. This is the default class used in RuleLearner 
    389    to find the best rule. Pseudo-code of the algorithm is shown here: 
    390  
    391    .. parsed-literal:: 
    392  
    393       def \_\_call\_\_(self, table, weight_id, target_class, base_rules): 
    394           prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id) 
    395           rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior) 
    396           \# compute quality of rules in rules_star and best_rule 
    397           ... 
    398           while len(rules_star) \> 0: 
    399               candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id) 
    400               for cand in candidates: 
    401                   new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class) 
    402                   for new_rule in new_rules: 
    403                       if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution): 
    404                           new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior) 
    405                           rules_star.append(new_rule) 
    406                           if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and 
    407                               new_rule.quality \> best_rule.quality: 
    408                               best_rule = new_rule 
    409               rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id) 
    410           return best_rule 
    411  
    412    Bolded in the pseudo-code are several exchangeable components, exposed as 
    413    attributes. These are: 
    414  
    415    .. attribute:: initializer 
    416     
    417       an object of class 
    418       :class:`~Orange.classification.rules.RuleBeamInitializer` 
    419       used to initialize :obj:`rules_star` and for selecting the 
    420       initial best rule. By default 
    421       (:class:`~Orange.classification.rules.RuleBeamInitializer_Default`), 
    422       :obj:`base_rules` are returned as the starting :obj:`rules_star` and the best 
    423       from :obj:`base_rules` is set as :obj:`best_rule`. If :obj:`base_rules` 
    424       are not set, this class will return :obj:`rules_star` with a rule that 
    425       covers all instances (has no selectors), and this rule will also be used 
    426       as :obj:`best_rule`. 
    427     
    428    .. attribute:: candidate_selector 
    429     
    430       an object of class 
    431       :class:`~Orange.classification.rules.RuleBeamCandidateSelector` 
    432       used to separate a subset from the current 
    433       :obj:`rules_star` and return it. These rules will be used in the next 
    434       specification step. The default component (an instance of 
    435       :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`) 
    436       takes all rules in :obj:`rules_star`. 
    437      
    438    .. attribute:: refiner 
    439     
    440       an object of class 
    441       :class:`~Orange.classification.rules.RuleBeamRefiner` 
    442       used to refine a given rule. A new rule should cover a 
    443       strict subset of the examples covered by the given rule. The default component 
    444       (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`) adds 
    445       a conjunctive selector to the selectors already present in the rule. 
    446      
    447    .. attribute:: rule_filter 
    448     
    449       an object of class 
    450       :class:`~Orange.classification.rules.RuleBeamFilter` 
    451       used to filter rules, keeping the beam relatively small 
    452       to contain search complexity. By default, it keeps the five best rules: 
    453       :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ *(m=5)*\ . 
    454  
    455    .. method:: __call__(data, weight_id, target_class, base_rules) 
    456  
    457    Determines the next best rule to cover the remaining data instances. 
    458     
    459    :param data: data instances. 
    460    :type data: :class:`Orange.data.Table` 
    461     
    462    :param weight_id: index of the weight meta-attribute. 
    463    :type weight_id: int 
    464     
    465    :param target_class: index of the target class. 
    466    :type target_class: int 
    467     
    468    :param base_rules: existing rules. 
    469    :type base_rules: :class:`~Orange.classification.rules.RuleList` 
    470  
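   A sketch of widening the beam through the :obj:`rule_filter` attribute
   (using the short class names exported by this module)::

       import Orange

       finder = Orange.classification.rules.BeamFinder()
       # keep the ten best rules in the beam instead of the default five
       finder.rule_filter = Orange.classification.rules.BeamFilter_Width(width=10)
       learner = Orange.classification.rules.RuleLearner()
       learner.rule_finder = finder
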
    471 Rule evaluators 
    472 --------------- 
    473  
    474 .. class:: Orange.classification.rules.RuleEvaluator 
    475  
    476    Base class for rule evaluators that evaluate the quality of the rule based 
    477    on covered data instances. All evaluators support being invoked in the 
    478    following manner: 
    479     
    480    .. method:: __call__(rule, instances, weight_id, target_class, prior) 
    481     
    482       Calculates a non-negative rule quality. 
    483        
    484       :param rule: rule to evaluate. 
    485       :type rule: :class:`~Orange.classification.rules.Rule` 
    486        
    487       :param instances: a table of instances, covered by the rule. 
    488       :type instances: :class:`Orange.data.Table` 
    489        
    490       :param weight_id: index of the weight meta-attribute. 
    491       :type weight_id: int 
    492        
    493       :param target_class: index of target class of this rule. 
    494       :type target_class: int 
    495        
    496       :param prior: prior class distribution. 
    497       :type prior: :class:`Orange.statistics.distribution.Distribution` 
    498  
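   A sketch of a custom evaluator written in Python; the covered-accuracy
   measure below is illustrative, not one of the built-in evaluators::

       import Orange

       class AccuracyEvaluator(Orange.classification.rules.Evaluator):
           def __call__(self, rule, instances, weight_id, target_class, prior):
               # fraction of covered instances that belong to the target class
               if not len(instances):
                   return 0.0
               hits = sum(1 for e in instances
                          if int(e.get_class()) == target_class)
               return float(hits) / len(instances)

       learner = Orange.classification.rules.RuleLearner()
       learner.rule_finder = Orange.classification.rules.BeamFinder()
       learner.rule_finder.evaluator = AccuracyEvaluator()
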
    499 .. autoclass:: Orange.classification.rules.LaplaceEvaluator 
    500    :members: 
    501    :show-inheritance: 
    502    :exclude-members: targetClass, weightID 
    503  
    504 .. autoclass:: Orange.classification.rules.WRACCEvaluator 
    505    :members: 
    506    :show-inheritance: 
    507    :exclude-members: targetClass, weightID 
    508     
    509 .. class:: Orange.classification.rules.RuleEvaluator_Entropy 
    510  
    511    Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
    512      
    513 .. class:: Orange.classification.rules.RuleEvaluator_LRS 
    514  
    515    Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
    516  
    517 .. class:: Orange.classification.rules.RuleEvaluator_Laplace 
    518  
    519    Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
    520  
    521 .. class:: Orange.classification.rules.RuleEvaluator_mEVC 
    522  
    523    Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
    524     
    525 Instance covering and removal 
    526 ----------------------------- 
    527  
    528 .. class:: RuleCovererAndRemover 
    529  
    530    Base class for rule coverers and removers that, when invoked, remove 
    531    instances covered by the rule and return the remaining instances. 
    532  
    533    .. method:: __call__(rule, instances, weights, target_class) 
    534     
    535       Removes the instances covered by the rule and returns the remaining instances. 
    536        
    537       :param rule: the rule whose covered instances are removed. 
    538       :type rule: :class:`~Orange.classification.rules.Rule` 
    539        
    540       :param instances: a table of data instances to be covered and filtered. 
    541       :type instances: :class:`Orange.data.Table` 
    542        
    543       :param weights: index of the weight meta-attribute. 
    544       :type weights: int 
    545        
    546       :param target_class: index of target class of this rule. 
    547       :type target_class: int 
    548  
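   A sketch of a custom component written in Python; it removes every covered
   instance regardless of class, and assumes the return convention from the
   RuleLearner pseudo-code (remaining instances plus the weight id)::

       import Orange

       class RemoveAllCovered(Orange.classification.rules.CovererAndRemover):
           def __call__(self, rule, instances, weights, target_class):
               # keep only the instances the rule does not cover
               return rule(instances, negate=True), weights

       learner = Orange.classification.rules.RuleLearner()
       learner.cover_and_remove = RemoveAllCovered()
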
    549 .. autoclass:: CovererAndRemover_MultWeights 
    550  
    551 .. autoclass:: CovererAndRemover_AddWeights 
    552     
    553 Miscellaneous functions 
    554 ----------------------- 
    555  
    556 .. automethod:: Orange.classification.rules.rule_to_string 
    557  
    558 .. 
    559     Undocumented are: 
    560     Data-based Stopping Criteria 
    561     ---------------------------- 
    562     Rule-based Stopping Criteria 
    563     ---------------------------- 
    566  
    567 """ 
    568  
    5691import random 
    5702import math 
     
    5746import Orange 
    5757import Orange.core 
    576 from Orange.core import \ 
    577     RuleClassifier, \ 
    578     RuleClassifier_firstRule, \ 
    579     RuleClassifier_logit, \ 
    580     RuleLearner, \ 
    581     Rule, \ 
    582     RuleBeamCandidateSelector, \ 
    583     RuleBeamCandidateSelector_TakeAll, \ 
    584     RuleBeamFilter, \ 
    585     RuleBeamFilter_Width, \ 
    586     RuleBeamInitializer, \ 
    587     RuleBeamInitializer_Default, \ 
    588     RuleBeamRefiner, \ 
    589     RuleBeamRefiner_Selector, \ 
    590     RuleClassifierConstructor, \ 
    591     RuleCovererAndRemover, \ 
    592     RuleCovererAndRemover_Default, \ 
    593     RuleDataStoppingCriteria, \ 
    594     RuleDataStoppingCriteria_NoPositives, \ 
    595     RuleEvaluator, \ 
    596     RuleEvaluator_Entropy, \ 
    597     RuleEvaluator_LRS, \ 
    598     RuleEvaluator_Laplace, \ 
    599     RuleEvaluator_mEVC, \ 
    600     RuleFinder, \ 
    601     RuleBeamFinder, \ 
    602     RuleList, \ 
    603     RuleStoppingCriteria, \ 
    604     RuleStoppingCriteria_NegativeDistribution, \ 
    605     RuleValidator, \ 
    606     RuleValidator_LRS 
     8 
     9RuleClassifier = Orange.core.RuleClassifier 
     10RuleClassifier_firstRule = Orange.core.RuleClassifier_firstRule 
     11RuleClassifier_logit = Orange.core.RuleClassifier_logit 
     12RuleLearner = Orange.core.RuleLearner 
     13Rule = Orange.core.Rule 
     14RuleList = Orange.core.RuleList 
     15 
     16BeamCandidateSelector = Orange.core.RuleBeamCandidateSelector 
     17BeamCandidateSelector_TakeAll = Orange.core.RuleBeamCandidateSelector_TakeAll 
     18BeamFilter = Orange.core.RuleBeamFilter 
     19BeamFilter_Width = Orange.core.RuleBeamFilter_Width 
     20BeamInitializer = Orange.core.RuleBeamInitializer 
     21BeamInitializer_Default = Orange.core.RuleBeamInitializer_Default 
     22BeamRefiner = Orange.core.RuleBeamRefiner 
     23BeamRefiner_Selector = Orange.core.RuleBeamRefiner_Selector 
     24ClassifierConstructor = Orange.core.RuleClassifierConstructor 
     25CovererAndRemover = Orange.core.RuleCovererAndRemover 
     26CovererAndRemover_Default = Orange.core.RuleCovererAndRemover_Default 
     27DataStoppingCriteria = Orange.core.RuleDataStoppingCriteria 
     28DataStoppingCriteria_NoPositives = Orange.core.RuleDataStoppingCriteria_NoPositives 
     29Evaluator = Orange.core.RuleEvaluator 
     30Evaluator_Entropy = Orange.core.RuleEvaluator_Entropy 
     31Evaluator_LRS = Orange.core.RuleEvaluator_LRS 
     32Evaluator_Laplace = Orange.core.RuleEvaluator_Laplace 
     33Evaluator_mEVC = Orange.core.RuleEvaluator_mEVC 
     34Finder = Orange.core.RuleFinder 
     35BeamFinder = Orange.core.RuleBeamFinder 
     36StoppingCriteria = Orange.core.RuleStoppingCriteria 
     37StoppingCriteria_NegativeDistribution = Orange.core.RuleStoppingCriteria_NegativeDistribution 
     38Validator = Orange.core.RuleValidator 
     39Validator_LRS = Orange.core.RuleValidator_LRS 
     40     
    60741from Orange.misc import deprecated_keywords 
    60842from Orange.misc import deprecated_members 
     
    63973 
    64074 
    641 class LaplaceEvaluator(RuleEvaluator): 
     75class LaplaceEvaluator(Evaluator): 
    64276    """ 
    64377    Laplace's rule of succession. 
     
    65993 
    66094 
    661 class WRACCEvaluator(RuleEvaluator): 
     95class WRACCEvaluator(Evaluator): 
    66296    """ 
    66397    Weighted relative accuracy. 
     
    686120 
    687121 
    688 class MEstimateEvaluator(RuleEvaluator): 
     122class MEstimateEvaluator(Evaluator): 
    689123    """ 
    690124    Rule evaluator using m-estimate of probability rule evaluation function. 
     
    718152class CN2Learner(RuleLearner): 
    719153    """ 
    720     Classical CN2 (see Clark and Niblett; 1988) induces a set of ordered 
    721     rules, which means that classificator must try these rules in the same 
    722     order as they were learned. 
    723      
    724     If data instances are provided to the constructor, the learning algorithm 
    725     is called and the resulting classifier is returned instead of the learner. 
    726  
    727     :param evaluator: an object that evaluates a rule from covered instances. 
     154    Classical CN2 inducer (Clark and Niblett; 1988) that constructs a 
     155    set of ordered rules. Constructor returns either an instance of 
     156    :obj:`CN2Learner` or, if training data is provided, a 
     157    :obj:`CN2Classifier`. 
     158     
     159    :param evaluator: an object that evaluates a rule from instances. 
    728160        By default, entropy is used as a measure.  
    729     :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator` 
     161    :type evaluator: :class:`~Orange.classification.rules.Evaluator` 
    730162    :param beam_width: width of the search beam. 
    731163    :type beam_width: int 
    732     :param alpha: significance level of the likelihood ratio statistics to 
    733         determine whether rule is better than the default rule. 
     164    :param alpha: significance level of the likelihood ratio statistics 
     165        to determine whether rule is better than the default rule. 
    734166    :type alpha: float 
    735167 
     
    744176            return self 
    745177 
    746     def __init__(self, evaluator=RuleEvaluator_Entropy(), beam_width=5, 
     178    def __init__(self, evaluator=Evaluator_Entropy(), beam_width=5, 
    747179        alpha=1.0, **kwds): 
    748180        self.__dict__.update(kwds) 
    749         self.rule_finder = RuleBeamFinder() 
    750         self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width) 
     181        self.rule_finder = BeamFinder() 
     182        self.rule_finder.ruleFilter = BeamFilter_Width(width=beam_width) 
    751183        self.rule_finder.evaluator = evaluator 
    752         self.rule_finder.validator = RuleValidator_LRS(alpha=alpha) 
     184        self.rule_finder.validator = Validator_LRS(alpha=alpha) 
    753185 
    754186    def __call__(self, instances, weight=0): 
     
    772204class CN2Classifier(RuleClassifier): 
    773205    """ 
    774     Classical CN2 (see Clark and Niblett; 1988) classifies a new instance 
    775     using an ordered set of rules. Usually the learner 
    776     (:class:`~Orange.classification.rules.CN2Learner`) is used to construct the 
    777     classifier. 
    778      
    779     :param rules: learned rules to be used for classification (mandatory). 
    780     :type rules: :class:`~Orange.classification.rules.RuleList` 
    781      
    782     :param instances: data instances that were used for learning. 
     206    Classical CN2 classifier (Clark and Niblett; 1988) that predicts a 
     207    class from an ordered list of rules. The classifier is usually 
     208    constructed by :class:`~Orange.classification.rules.CN2Learner`. 
     209     
     210    :param rules: induced rules 
     211    :type rules: :class:`~Orange.classification.rules.RuleList` 
     212     
     213    :param instances: stored training data instances 
    783214    :type instances: :class:`Orange.data.Table` 
    784215     
     
    842273class CN2UnorderedLearner(RuleLearner): 
    843274    """ 
    844     CN2 unordered (see Clark and Boswell; 1991) induces a set of unordered 
    845     rules - classification from rules does not assume ordering of rules. 
    846     Learning rules is quite similar to learning in classical CN2, where 
    847     the process of learning of rules is separated to learning rules for each 
    848     class. 
    849      
    850     If data instances are provided to the constructor, the learning algorithm 
    851     is called and the resulting classifier is returned instead of the learner. 
    852  
     275    Unordered CN2 (Clark and Boswell; 1991) induces a set of unordered 
     276    rules. Rule learning is quite similar to that in classical 
     277    CN2, except that rules are learned separately for 
     278    each class. 
     279 
     280    Constructor returns either an instance of 
     281    :obj:`CN2UnorderedLearner` or, if training data is provided, a 
     282    :obj:`CN2UnorderedClassifier`. 
     283     
    853284    :param evaluator: an object that evaluates a rule from covered instances. 
    854285        By default, Laplace's rule of succession is used as a measure.  
    855     :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator` 
     286    :type evaluator: :class:`~Orange.classification.rules.Evaluator` 
    856287    :param beam_width: width of the search beam. 
    857288    :type beam_width: int 
     
    868299            return self 
    869300 
    870     def __init__(self, evaluator=RuleEvaluator_Laplace(), beam_width=5, 
     301    def __init__(self, evaluator=Evaluator_Laplace(), beam_width=5, 
    871302        alpha=1.0, **kwds): 
    872303        self.__dict__.update(kwds) 
    873         self.rule_finder = RuleBeamFinder() 
    874         self.rule_finder.ruleFilter = RuleBeamFilter_Width(width=beam_width) 
     304        self.rule_finder = BeamFinder() 
     305        self.rule_finder.ruleFilter = BeamFilter_Width(width=beam_width) 
    875306        self.rule_finder.evaluator = evaluator 
    876         self.rule_finder.validator = RuleValidator_LRS(alpha=alpha) 
    877         self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0) 
    878         self.rule_stopping = RuleStopping_Apriori() 
    879         self.data_stopping = RuleDataStoppingCriteria_NoPositives() 
     307        self.rule_finder.validator = Validator_LRS(alpha=alpha) 
     308        self.rule_finder.rule_stoppingValidator = Validator_LRS(alpha=1.0) 
     309        self.rule_stopping = Stopping_Apriori() 
     310        self.data_stopping = DataStoppingCriteria_NoPositives() 
    880311 
    881312    @deprecated_keywords({"weight": "weight_id"}) 
     
    918349class CN2UnorderedClassifier(RuleClassifier): 
    919350    """ 
    920     CN2 unordered (see Clark and Boswell; 1991) classifies a new instance using 
    921     a set of unordered rules. Usually the learner 
    922     (:class:`~Orange.classification.rules.CN2UnorderedLearner`) is used to 
    923     construct the classifier. 
    924      
    925     :param rules: learned rules to be used for classification (mandatory). 
     351    Unordered CN2 classifier (Clark and Boswell; 1991) classifies an 
     352    instance using a set of unordered rules. The classifier is 
     353    typically constructed with 
     354    :class:`~Orange.classification.rules.CN2UnorderedLearner`. 
     355     
     356    :param rules: induced rules 
    926357    :type rules: :class:`~Orange.classification.rules.RuleList` 
    927358     
    928     :param instances: data instances that were used for learning. 
     359    :param instances: stored training data instances 
    929360    :type instances: :class:`Orange.data.Table` 
    930361     
     
    948379    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, ret_rules=False): 
    949380        """ 
     381        The call has another optional argument that is used to tell 
     382        the classifier to also return the rules that cover the given 
     383        data instance. 
     384 
    950385        :param instance: instance to be classified. 
    951386        :type instance: :class:`Orange.data.Instance` 
     
    956391         
    957392        :rtype: :class:`Orange.data.Value`,  
    958               :class:`Orange.statistics.distribution.Distribution` or a tuple with both 
     393              :class:`Orange.statistics.distribution.Distribution` or a tuple with both, and a list of rules if :obj:`ret_rules` is ``True`` 
    959394        """ 
    960395        def add(disc1, disc2, sumd): 
     
    1006441class CN2SDUnorderedLearner(CN2UnorderedLearner): 
    1007442    """ 
    1008     CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules, which 
    1009     is the same as :class:`~Orange.classification.rules.CN2UnorderedLearner`. 
    1010     The difference between classical CN2 unordered and CN2-SD is selection of 
    1011     specific evaluation function and covering function: 
    1012     :class:`WRACCEvaluator` is used to implement 
    1013     weight-relative accuracy and  
    1014     :class:`CovererAndRemover_MultWeights` avoids 
    1015     excluding covered instances, multiplying their weight by the value of 
    1016     mult parameter instead. 
    1017      
    1018     If data instances are provided to the constructor, the learning algorithm 
    1019     is called and the resulting classifier is returned instead of the learner. 
     443    CN2-SD (Lavrac et al.; 2004) induces a set of unordered rules used 
     444    by :class:`~Orange.classification.rules.CN2UnorderedClassifier`. 
     445    CN2-SD differs from unordered CN2 in the default evaluation function and 
     446    covering function: :class:`WRACCEvaluator` computes weighted 
     447    relative accuracy and :class:`CovererAndRemover_MultWeights` 
     448    decreases the weight of covered data instances instead of removing 
     449    them. 
     450     
     451    Constructor returns either an instance of 
     452    :obj:`CN2SDUnorderedLearner` or, if training data is provided, a 
     453    :obj:`CN2UnorderedClassifier`. 
    1020454 
    1021455    :param evaluator: an object that evaluates a rule from covered instances. 
    1022456        By default, weighted relative accuracy is used. 
    1023     :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator` 
     457    :type evaluator: :class:`~Orange.classification.rules.Evaluator` 
    1024458     
    1025459    :param beam_width: width of the search beam. 
     
    1059493class ABCN2(RuleLearner): 
    1060494    """ 
    1061     This is an implementation of argument-based CN2 using EVC as evaluation 
    1062     and LRC classification. 
    1063      
    1064     Rule learning parameters that can be passed to constructor: 
     495    Argument-based CN2 that uses EVC for evaluation 
     496    and LRC for classification. 
    1065497     
    1066498    :param width: beam width (default 5). 
    1067499    :type width: int 
    1068     :param learn_for_class: class for which to learn; None (default) if all 
    1069        classes are to be learnt. 
    1070     :param learn_one_rule: decides whether to rule one rule only (default 
    1071        False). 
     500    :param learn_for_class: class for which to learn; ``None`` (default) if all 
     501       classes are to be learned. 
     502    :param learn_one_rule: decides whether to learn only a single rule (default: 
     503       ``False``). 
    1072504    :type learn_one_rule: boolean 
    1073505    :param analyse_argument: index of argument to analyse; -1 to learn normally 
    1074506       (default) 
    1075507    :type analyse_argument: int 
    1076     :param debug: sets debug mode - prints some info during execution; False (default) 
     508    :param debug: sets debug mode that prints some info during execution (default: ``False``) 
    1077509    :type debug: boolean 
    1078510     
    1079     The following evaluator related arguments are supported: 
     511    The following evaluator related arguments are also supported: 
    1080512     
    1081513    :param m: m for m-estimate to be corrected with EVC (default 2). 
    1082514    :type m: int 
    1083515    :param opt_reduction: type of EVC correction: 0=no correction, 
    1084        1=pessimistic, 2=normal (default 2). 
     516       1=pessimistic, 2=normal (default). 
    1085517    :type opt_reduction: int 
    1086     :param nsampling: number of samples in estimating extreme value 
    1087        distribution for EVC (default 100). 
     518    :param nsampling: number of samples for estimation of extreme value 
     519       distribution for EVC (default: 100). 
    1088520    :type nsampling: int 
    1089521    :param evd: pre-given extreme value distributions. 
    1090522    :param evd_arguments: pre-given extreme value distributions for arguments. 
    1091523     
    1092     Those parameters control rule validation: 
     524    The following parameters control rule validation: 
    1093525     
    1094526    :param rule_sig: minimal rule significance (default 1.0). 
     
    1140572        self.postpruning = postpruning 
    1141573        # rule finder 
    1142         self.rule_finder = RuleBeamFinder() 
    1143         self.ruleFilter = RuleBeamFilter_Width(width=width) 
     574        self.rule_finder = BeamFinder() 
     575        self.ruleFilter = BeamFilter_Width(width=width) 
    1144576        self.ruleFilter_arguments = ABBeamFilter(width=width) 
    1145577        if max_rule_complexity - 1 < 0: 
    1146578            max_rule_complexity = 10 
    1147         self.rule_finder.rule_stoppingValidator = RuleValidator_LRS(alpha=1.0, min_quality=0., max_rule_complexity=max_rule_complexity - 1, min_coverage=min_coverage) 
    1148         self.refiner = RuleBeamRefiner_Selector() 
     579        self.rule_finder.rule_stoppingValidator = Validator_LRS(alpha=1.0, min_quality=0., max_rule_complexity=max_rule_complexity - 1, min_coverage=min_coverage) 
     580        self.refiner = BeamRefiner_Selector() 
    1149581        self.refiner_arguments = SelectorAdder(discretizer=Orange.feature.discretization.Entropy(forceAttribute=1, 
    1150582                                                                                           maxNumberOfIntervals=2)) 
     
    1152584        # evc evaluator 
    1153585        evdGet = Orange.core.EVDistGetter_Standard() 
    1154         self.rule_finder.evaluator = RuleEvaluator_mEVC(m=m, evDistGetter=evdGet, min_improved=min_improved, min_improved_perc=min_improved_perc) 
     586        self.rule_finder.evaluator = Evaluator_mEVC(m=m, evDistGetter=evdGet, min_improved=min_improved, min_improved_perc=min_improved_perc) 
    1155587        self.rule_finder.evaluator.returnExpectedProb = True 
    1156588        self.rule_finder.evaluator.optimismReduction = opt_reduction 
    1157589        self.rule_finder.evaluator.ruleAlpha = rule_sig 
    1158590        self.rule_finder.evaluator.attributeAlpha = att_sig 
    1159         self.rule_finder.evaluator.validator = RuleValidator_LRS(alpha=1.0, min_quality=min_quality, min_coverage=min_coverage, max_rule_complexity=max_rule_complexity - 1) 
     591        self.rule_finder.evaluator.validator = Validator_LRS(alpha=1.0, min_quality=min_quality, min_coverage=min_coverage, max_rule_complexity=max_rule_complexity - 1) 
    1160592 
    1161593        # learn stopping criteria 
    1162594        self.rule_stopping = None 
    1163         self.data_stopping = RuleDataStoppingCriteria_NoPositives() 
     595        self.data_stopping = DataStoppingCriteria_NoPositives() 
    1164596        # evd fitting 
    1165597        self.evd_creator = EVDFitter(self, n=nsampling) 
     
    1220652            while aes: 
    1221653                if self.analyse_argument > -1 and \ 
    1222                    (isinstance(self.analyse_argument, Orange.core.Example) and not Orange.core.Example(dich_data.domain, self.analyse_argument) == aes[0] or \ 
     654                   (isinstance(self.analyse_argument, Orange.data.Instance) and not Orange.data.Instance(dich_data.domain, self.analyse_argument) == aes[0] or \ 
    1223655                    isinstance(self.analyse_argument, int) and not dich_data[self.analyse_argument] == aes[0]): 
    1224656                    aes = aes[1:] 
     
    1392824 
    1393825    def change_domain(self, rule, cl, examples, weight_id): 
    1394         rule.filter = Orange.core.Filter_values(domain=examples.domain, 
    1395                                         conditions=rule.filter.conditions) 
     826        rule.filter = Orange.data.Values( 
     827            domain=examples.domain, conditions=rule.filter.conditions) 
    1396828        rule.filterAndStore(examples, weight_id, cl) 
    1397829        if hasattr(rule, "learner") and hasattr(rule.learner, "arg_example"): 
     
    1488920                p.filter.filter.conditions.extend(pruned_conditions) 
    1489921                # if argument does not contain all unspecialized reasons, add those reasons with minimum values 
    1490                 at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.core.ValueFilter_continuous] 
     922                at_oper_pairs = [(c.position, c.oper) for c in p.filter.conditions if type(c) == Orange.data.filter.ValueFilterContinuous] 
    1491923                for u in unspec_conditions: 
    1492924                    if not (u.position, u.oper) in at_oper_pairs: 
    1493925                        # find minimum value 
    1494                         if u.oper == Orange.core.ValueFilter_continuous.Greater or u.oper == Orange.core.ValueFilter_continuous.GreaterEqual: 
     926                        if u.oper == Orange.data.filter.ValueFilter.Greater or \ 
     927                            u.oper == Orange.data.filter.ValueFilter.GreaterEqual: 
    1495928                            u.ref = min([float(e[u.position]) - 10. for e in p.examples]) 
    1496929                        else: 
     
    1514947 
    1515948    def newFilter_values(self, filter): 
    1516         newFilter = Orange.core.Filter_values() 
     949        newFilter = Orange.data.filter.Values() 
    1517950        newFilter.conditions = filter.conditions[:] 
    1518951        newFilter.domain = filter.domain 
     
    1531964            return [] 
    1532965        cn2_learner = Orange.classification.rules.CN2UnorderedLearner() 
    1533         cn2_learner.rule_finder = RuleBeamFinder() 
     966        cn2_learner.rule_finder = BeamFinder() 
    1534967        cn2_learner.rule_finder.refiner = SelectorArgConditions(crit_example, allowed_conditions) 
    1535968        cn2_learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(self.rule_finder.evaluator.m) 
     
    1550983class CN2EVCUnorderedLearner(ABCN2): 
    1551984    """ 
    1552     CN2-SD (see Lavrac et al.; 2004) induces a set of unordered rules in a 
    1553     simmilar manner as 
    1554     :class:`~Orange.classification.rules.CN2SDUnorderedLearner`. This 
    1555     implementation uses the EVC rule evaluation. 
    1556      
    1557     If data instances are provided to the constructor, the learning algorithm 
    1558     is called and the resulting classifier is returned instead of the learner. 
     985    A learner similar to CN2-SD (:obj:`CN2SDUnorderedLearner`) except that 
     986    it uses EVC for rule evaluation. 
    1559987 
    1560988    :param evaluator: an object that evaluates a rule from covered instances. 
    1561989        By default, weighted relative accuracy is used. 
    1562     :type evaluator: :class:`~Orange.classification.rules.RuleEvaluator` 
     990    :type evaluator: :class:`~Orange.classification.rules.Evaluator` 
    1563991     
    1564992    :param beam_width: width of the search beam. 
     
    15781006            max_rule_complexity=int(max_rule_complexity)) 
    15791007 
    1580 class DefaultLearner(Orange.core.Learner): 
    1581     """ 
    1582     Default lerner - returns default classifier with predefined output class. 
     1008class DefaultLearner(Orange.classification.Learner): 
     1009    """ 
     1010    Default learner - returns default classifier with predefined output class. 
    15831011    """ 
    15841012    def __init__(self, default_value=None): 
    15851013        self.default_value = default_value 
    15861014    def __call__(self, examples, weight_id=0): 
    1587         return Orange.classification.ConstantClassifier(self.default_value, defaultDistribution=Orange.core.Distribution(examples.domain.class_var, examples, weight_id)) 
     1015        return Orange.classification.ConstantClassifier(self.default_value, defaultDistribution=Orange.statistics.Distribution(examples.domain.class_var, examples, weight_id)) 
    15881016 
    15891017class ABCN2Ordered(ABCN2): 
     
    16241052 
    16251053 
    1626 class RuleStopping_Apriori(RuleStoppingCriteria): 
     1054class Stopping_Apriori(StoppingCriteria): 
    16271055    def __init__(self, apriori=None): 
    16281056        self.apriori = None 
     
    16401068 
    16411069 
    1642 class RuleStopping_SetRules(RuleStoppingCriteria): 
     1070class Stopping_SetRules(StoppingCriteria): 
    16431071    def __init__(self, validator): 
    1644         self.rule_stopping = RuleStoppingCriteria_NegativeDistribution() 
     1072        self.rule_stopping = StoppingCriteria_NegativeDistribution() 
    16451073        self.validator = validator 
    16461074 
     
    16521080 
    16531081 
    1654 class LengthValidator(RuleValidator): 
     1082class LengthValidator(Validator): 
    16551083    """ prune rules with more conditions than self.length. """ 
    16561084    def __init__(self, length= -1): 
     
    16631091 
    16641092 
    1665 class NoDuplicatesValidator(RuleValidator): 
     1093class NoDuplicatesValidator(Validator): 
    16661094    def __init__(self, alpha=.05, min_coverage=0, max_rule_length=0, rules=RuleList()): 
    16671095        self.rules = rules 
    1668         self.validator = RuleValidator_LRS(alpha=alpha, \ 
     1096        self.validator = Validator_LRS(alpha=alpha, \ 
    16691097            min_coverage=min_coverage, max_rule_length=max_rule_length) 
    16701098 
     
    16761104 
    16771105 
    1678 class RuleClassifier_BestRule(RuleClassifier): 
     1106class Classifier_BestRule(RuleClassifier): 
    16791107    def __init__(self, rules, instances, weight_id=0, **argkw): 
    16801108        self.rules = rules 
     
    17171145 
    17181146 
    1719 class CovererAndRemover_MultWeights(RuleCovererAndRemover): 
    1720     """ 
    1721     Covering and removing of instances using weight multiplication: 
     1147class CovererAndRemover_MultWeights(CovererAndRemover): 
     1148    """ 
     1149    Covering and removing of instances using weight multiplication. 
    17221150     
    17231151    :param mult: weighting multiplication factor 
     
    17461174 
    17471175 
    1748 class CovererAndRemover_AddWeights(RuleCovererAndRemover): 
     1176class CovererAndRemover_AddWeights(CovererAndRemover): 
    17491177    """ 
    17501178    Covering and removing of instances using weight addition. 
     
    17811209 
    17821210 
    1783 class CovererAndRemover_Prob(RuleCovererAndRemover): 
     1211class CovererAndRemover_Prob(CovererAndRemover): 
    17841212    """ This class implements probabilistic covering. """ 
    17851213    def __init__(self, examples, weight_id, target_class, apriori, argument_id): 
     
    18161244 
    18171245    def filter_covers_example(self, example, filter): 
    1818         filter_indices = RuleCoversArguments.filterIndices(filter) 
     1246        filter_indices = CoversArguments.filterIndices(filter) 
    18191247        if filter(example): 
    18201248            try: 
     
    18411269 
    18421270    def condIn(self, cond, filter_indices): # is condition in the filter? 
    1843         condInd = RuleCoversArguments.conditionIndex(cond) 
     1271        condInd = CoversArguments.conditionIndex(cond) 
    18441272        if operator.or_(condInd, filter_indices[cond.position]) == filter_indices[cond.position]: 
    18451273            return True 
     
    18701298    """ 
    18711299    def selectSign(oper): 
    1872         if oper == Orange.core.ValueFilter_continuous.Less: 
     1300        if oper == Orange.data.filter.ValueFilter.Less: 
    18731301            return "<" 
    1874         elif oper == Orange.core.ValueFilter_continuous.LessEqual: 
     1302        elif oper == Orange.data.filter.ValueFilter.LessEqual: 
    18751303            return "<=" 
    1876         elif oper == Orange.core.ValueFilter_continuous.Greater: 
     1304        elif oper == Orange.data.filter.ValueFilter.Greater: 
    18771305            return ">" 
    1878         elif oper == Orange.core.ValueFilter_continuous.GreaterEqual: 
     1306        elif oper == Orange.data.filter.ValueFilter.GreaterEqual: 
    18791307            return ">=" 
    18801308        else: return "=" 
     
    18921320        if i > 0: 
    18931321            ret += " AND " 
    1894         if type(c) == Orange.core.ValueFilter_discrete: 
     1322        if isinstance(c, Orange.data.filter.ValueFilterDiscrete): 
    18951323            ret += domain[c.position].name + "=" + str([domain[c.position].\ 
    18961324                values[int(v)] for v in c.values]) 
    1897         elif type(c) == Orange.core.ValueFilter_continuous: 
     1325        elif isinstance(c, Orange.data.filter.ValueFilterContinuous): 
    18981326            ret += domain[c.position].name + selectSign(c.oper) + str(c.ref) 
    1899     if rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\ 
     1327    if isinstance(rule.classifier, Orange.classification.ConstantClassifier) \ 
    19001328            and rule.classifier.default_val: 
    19011329        ret = ret + " THEN " + domain.class_var.name + "=" + \ 
    1902         str(rule.classifier.default_value) 
     1330            str(rule.classifier.default_value) 
    19031331        if show_distribution: 
    19041332            ret += str(rule.class_distribution) 
    1905     elif rule.classifier and type(rule.classifier) == Orange.classification.ConstantClassifier\ 
    1906             and type(domain.class_var) == Orange.core.EnumVariable: 
     1333    elif isinstance(rule.classifier, Orange.classification.ConstantClassifier) \ 
     1334            and isinstance(domain.class_var, Orange.feature.Discrete): 
    19071335        ret = ret + " THEN " + domain.class_var.name + "=" + \ 
    1908         str(rule.class_distribution.modus()) 
     1336            str(rule.class_distribution.modus()) 
    19091337        if show_distribution: 
    19101338            ret += str(rule.class_distribution) 
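
Pieced together, the function renders a rule in ``IF ... THEN ...`` form.
An illustrative (made-up) result for a rule with one continuous and one
discrete condition, with ``show_distribution`` set, would be::

    IF age>=30.0 AND sex=['male'] THEN survived=no <150.000, 20.000>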
     
    19141342    if not instances.domain.class_var: 
    19151343        raise Exception("Class variable is required!") 
    1916     if instances.domain.class_var.varType == Orange.core.VarTypes.Continuous: 
     1344    if instances.domain.class_var.var_type != Orange.feature.Type.Discrete: 
    19171345        raise Exception("CN2 requires a discrete class!") 
    19181346 
     
    19251353 
    19261354def rules_equal(rule1, rule2): 
    1927     if not len(rule1.filter.conditions) == len(rule2.filter.conditions): 
     1355    if len(rule1.filter.conditions) != len(rule2.filter.conditions): 
    19281356        return False 
    19291357    for c1 in rule1.filter.conditions: 
     
    19311359        for c2 in rule2.filter.conditions: 
    19321360            try: 
    1933                 if not c1.position == c2.position: continue # same feature? 
    1934                 if not type(c1) == type(c2): continue # same type of condition 
    1935                 if type(c1) == Orange.core.ValueFilter_discrete: 
    1936                     if not type(c1.values[0]) == type(c2.values[0]): continue 
    1937                     if not c1.values[0] == c2.values[0]: continue # same value? 
    1938                 if type(c1) == Orange.core.ValueFilter_continuous: 
    1939                     if not c1.oper == c2.oper: continue # same operator? 
    1940                     if not c1.ref == c2.ref: continue #same threshold? 
     1361                if c1.position == c2.position and type(c1) == type(c2): 
     1362                    continue # same feature and type? 
     1363                if isinstance(c1, Orange.data.filter.ValueFilterDiscrete): 
     1364                    if type(c1.values[0]) != type(c2.values[0]) or \ 
     1365                            c1.values[0] != c2.values[0]: 
     1366                        continue # same value? 
     1367                if isinstance(c1, Orange.data.filter.ValueFilterContinuous): 
     1368                    if c1.oper != c2.oper or c1.ref != c2.ref: 
     1369                        continue # same operator? 
    19411370                found = True 
    19421371                break 
     
    19861415 
    19871416    def createRandomDataSet(self, data): 
    1988         newData = Orange.core.ExampleTable(data) 
     1417        newData = Orange.data.Table(data) 
    19891418        # shuffle data 
    19901419        cl_num = newData.toNumpy("C") 
    19911420        random.shuffle(cl_num[0][:, 0]) 
    1992         clData = Orange.core.ExampleTable(Orange.core.Domain([newData.domain.classVar]), cl_num[0]) 
     1421        clData = Orange.data.Table(Orange.data.Domain([newData.domain.classVar]), cl_num[0]) 
    19931422        for d_i, d in enumerate(newData): 
    19941423            d[newData.domain.classVar] = clData[d_i][newData.domain.classVar] 
     
    20291458        self.learner.ruleFinder.ruleStoppingValidator = Orange.core.RuleValidator_LRS(alpha=1.0) 
    20301459        self.learner.ruleFinder.ruleStoppingValidator.max_rule_complexity = 0 
    2031         self.learner.ruleFinder.refiner = Orange.core.RuleBeamRefiner_Selector() 
    2032         self.learner.ruleFinder.ruleFilter = Orange.core.RuleBeamFilter_Width(width=5) 
     1460        self.learner.ruleFinder.refiner = BeamRefiner_Selector() 
     1461        self.learner.ruleFinder.ruleFilter = BeamFilter_Width(width=5) 
    20331462 
    20341463 
     
    21061535        return self.createEVDistList(extremeDists) 
    21071536 
    2108 class ABBeamFilter(Orange.core.RuleBeamFilter): 
     1537class ABBeamFilter(BeamFilter): 
    21091538    """ 
    21101539    ABBeamFilter: Filters beam; 
     
    21171546 
    21181547    def __call__(self, rulesStar, examples, weight_id): 
    2119         newStar = Orange.core.RuleList() 
     1548        newStar = RuleList() 
    21201549        rulesStar.sort(lambda x, y:-cmp(x.quality, y.quality)) 
    21211550        argsNum = 0 
     
    21471576 
    21481577 
    2149 class RuleCoversArguments: 
     1578class CoversArguments: 
    21501579    """ 
    21511580    Determines whether a rule covers at least one of a set of arguments. 
     
    21571586            indNA = getattr(a.filter, "indices", None) 
    21581587            if not indNA: 
    2159                 a.filter.setattr("indices", RuleCoversArguments.filterIndices(a.filter)) 
     1588                a.filter.setattr("indices", CoversArguments.filterIndices(a.filter)) 
    21601589            self.indices.append(a.filter.indices) 
    21611590 
     
    21641593            return False 
    21651594        if not getattr(rule.filter, "indices", None): 
    2166             rule.filter.indices = RuleCoversArguments.filterIndices(rule.filter) 
     1595            rule.filter.indices = CoversArguments.filterIndices(rule.filter) 
    21671596        for index in self.indices: 
    21681597            if map(operator.or_, rule.filter.indices, index) == rule.filter.indices: 
     
    21761605        for c in filter.conditions: 
    21771606            ind[c.position] = operator.or_(ind[c.position], 
    2178                                          RuleCoversArguments.conditionIndex(c)) 
     1607                                         CoversArguments.conditionIndex(c)) 
    21791608        return ind 
    21801609    filterIndices = staticmethod(filterIndices) 
    21811610 
    21821611    def conditionIndex(c): 
    2183         if type(c) == Orange.core.ValueFilter_continuous: 
    2184             if (c.oper == Orange.core.ValueFilter_continuous.GreaterEqual or 
    2185                 c.oper == Orange.core.ValueFilter_continuous.Greater): 
     1612        if isinstance(c, Orange.data.filter.ValueFilterContinuous): 
     1613            if (c.oper == Orange.data.filter.ValueFilter.GreaterEqual or 
     1614                c.oper == Orange.data.filter.ValueFilter.Greater): 
    21861615                return 5# 0101 
    2187             elif (c.oper == Orange.core.ValueFilter_continuous.LessEqual or 
    2188                   c.oper == Orange.core.ValueFilter_continuous.Less): 
     1616            elif (c.oper == Orange.data.filter.ValueFilter.LessEqual or 
     1617                  c.oper == Orange.data.filter.ValueFilter.Less): 
    21891618                return 3 # 0011 
    21901619            else: 
     
    22131642 
    22141643 
    2215 class SelectorAdder(Orange.core.RuleBeamRefiner): 
     1644class SelectorAdder(BeamRefiner): 
    22161645    """ 
    22171646    Selector adder, this function is a refiner function: 
     
    22271656 
    22281657    def __call__(self, oldRule, data, weight_id, target_class= -1): 
    2229         inNotAllowedSelectors = RuleCoversArguments(self.not_allowed_selectors) 
    2230         new_rules = Orange.core.RuleList() 
     1658        inNotAllowedSelectors = CoversArguments(self.not_allowed_selectors) 
     1659        new_rules = RuleList() 
    22311660 
    22321661        # get positive indices (selectors already in the rule) 
    22331662        indices = getattr(oldRule.filter, "indices", None) 
    22341663        if not indices: 
    2235             indices = RuleCoversArguments.filterIndices(oldRule.filter) 
     1664            indices = CoversArguments.filterIndices(oldRule.filter) 
    22361665            oldRule.filter.setattr("indices", indices) 
    22371666 
     
    22401669        for nA in self.not_allowed_selectors: 
    22411670            #print indices, nA.filter.indices 
    2242             at_i, type_na = RuleCoversArguments.oneSelectorToCover(indices, nA.filter.indices) 
     1671            at_i, type_na = CoversArguments.oneSelectorToCover(indices, nA.filter.indices) 
    22431672            if at_i > -1: 
    22441673                negative_indices[at_i] = operator.or_(negative_indices[at_i], type_na) 
     
    22501679            if ind == 1: 
    22511680                continue 
    2252             if data.domain[i].varType == Orange.core.VarTypes.Discrete and not negative_indices[i] == 1: # DISCRETE attribute 
     1681            if data.domain[i].varType == Orange.feature.Type.Discrete and not negative_indices[i] == 1: # DISCRETE attribute 
    22531682                if self.example: 
    22541683                    values = [self.example[i]] 
     
    22571686                for v in values: 
    22581687                    tempRule = oldRule.clone() 
    2259                     tempRule.filter.conditions.append(Orange.core.ValueFilter_discrete(position=i, 
    2260                                                                                   values=[Orange.core.Value(data.domain[i], v)], 
    2261                                                                                   acceptSpecial=0)) 
     1688                    tempRule.filter.conditions.append( 
     1689                        Orange.data.filter.Discrete( 
     1690                            position=i, 
     1691                            values=[Orange.data.Value(data.domain[i], v)], 
     1692                            acceptSpecial=0)) 
    22621693                    tempRule.complexity += 1 
    2263                     tempRule.filter.indices[i] = 1 # 1 stands for discrete attribute (see RuleCoversArguments.conditionIndex) 
     1694                    tempRule.filter.indices[i] = 1 # 1 stands for discrete attribute (see CoversArguments.conditionIndex) 
    22641695                    tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class) 
    22651696                    if len(tempRule.examples) < len(oldRule.examples): 
    22661697                        new_rules.append(tempRule) 
    2267             elif data.domain[i].varType == Orange.core.VarTypes.Continuous and not negative_indices[i] == 7: # CONTINUOUS attribute 
     1698            elif data.domain[i].varType == Orange.feature.Type.Continuous and not negative_indices[i] == 7: # CONTINUOUS attribute 
    22681699                try: 
    22691700                    at = data.domain[i] 
     
    22761707                        #LESS 
    22771708                        if not negative_indices[i] == 3: 
    2278                             tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.LessEqual, p, target_class, 3) 
     1709                            tempRule = self.getTempRule(oldRule, i, Orange.data.filter.ValueFilter.LessEqual, p, target_class, 3) 
    22791710                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] <= p:# and not inNotAllowedSelectors(tempRule): 
    22801711                                new_rules.append(tempRule) 
    22811712                        #GREATER 
    22821713                        if not negative_indices[i] == 5: 
    2283                             tempRule = self.getTempRule(oldRule, i, Orange.core.ValueFilter_continuous.Greater, p, target_class, 5) 
     1714                            tempRule = self.getTempRule(oldRule, i, Orange.data.filter.ValueFilter.Greater, p, target_class, 5) 
    22841715                            if len(tempRule.examples) < len(oldRule.examples) and self.example[i] > p:# and not inNotAllowedSelectors(tempRule): 
    22851716                                new_rules.append(tempRule) 
     
    22921723        tempRule = oldRule.clone() 
    22931724 
    2294         tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=pos, 
    2295                                                                         oper=oper, 
    2296                                                                         ref=ref, 
    2297                                                                         acceptSpecial=0)) 
     1725        tempRule.filter.conditions.append( 
     1726            Orange.data.filter.ValueFilterContinuous( 
     1727                position=pos, oper=oper, ref=ref, acceptSpecial=0)) 
    22981728        tempRule.complexity += 1 
    22991729        tempRule.filter.indices[pos] = operator.or_(tempRule.filter.indices[pos], atIndex) # from CoversArguments.conditionIndex 
     
    23131743# This filter is the ugliest code ever! Problem is with Orange, I had some problems with inheriting deepCopy 
    23141744# I should take another look at it. 
    2315 class ArgFilter(Orange.core.Filter): 
     1745class ArgFilter(Orange.data.filter.Filter): 
    23161746    """ This class implements the AB-covering principle. """ 
    2317     def __init__(self, argument_id=None, filter=Orange.core.Filter_values(), arg_example=None): 
     1747    def __init__(self, argument_id=None, filter=Orange.data.filter.Values(), arg_example=None): 
    23181748        self.filter = filter 
    23191749        self.indices = getattr(filter, "indices", []) 
    23201750        if not self.indices and len(filter.conditions) > 0: 
    2321             self.indices = RuleCoversArguments.filterIndices(filter) 
     1751            self.indices = CoversArguments.filterIndices(filter) 
    23221752        self.argument_id = argument_id 
    23231753        self.domain = self.filter.domain 
     
    23661796    def deep_copy(self): 
    23671797        newFilter = ArgFilter(argument_id=self.argument_id) 
    2368         newFilter.filter = Orange.core.Filter_values() #self.filter.deepCopy() 
     1798        newFilter.filter = Orange.data.filter.Values() #self.filter.deepCopy() 
    23691799        newFilter.filter.conditions = self.filter.conditions[:] 
    23701800        newFilter.domain = self.filter.domain 
     
    23781808ArgFilter = deprecated_members({"argumentID": "argument_id"})(ArgFilter) 
    23791809 
    2380 class SelectorArgConditions(Orange.core.RuleBeamRefiner): 
     1810class SelectorArgConditions(BeamRefiner): 
    23811811    """ 
    23821812    Selector adder; this class acts as a refiner function: 
     
    23901820    def __call__(self, oldRule, data, weight_id, target_class= -1): 
    23911821        if len(oldRule.filter.conditions) >= len(self.allowed_selectors): 
    2392             return Orange.core.RuleList() 
    2393         new_rules = Orange.core.RuleList() 
     1822            return RuleList() 
     1823        new_rules = RuleList() 
    23941824        for c in self.allowed_selectors: 
    23951825            # normal condition 
     
    24111841                for v in values: 
    24121842                    tempRule = oldRule.clone() 
    2413                     tempRule.filter.conditions.append(Orange.core.ValueFilter_continuous(position=c.position, 
    2414                                                                                     oper=c.oper, 
    2415                                                                                     ref=float(v), 
    2416                                                                                     acceptSpecial=0)) 
     1843                    tempRule.filter.conditions.append( 
     1844                        Orange.data.filter.ValueFilterContinuous( 
     1845                            position=c.position, oper=c.oper, 
     1846                            ref=float(v), acceptSpecial=0)) 
    24171847                    if tempRule(self.example): 
    2418                         tempRule.filterAndStore(oldRule.examples, oldRule.weightID, target_class) 
     1848                        tempRule.filterAndStore( 
     1849                            oldRule.examples, oldRule.weightID, target_class) 
    24191850                        if len(tempRule.examples) < len(oldRule.examples): 
    24201851                            new_rules.append(tempRule) 
     
    24401871        prob_dist = Orange.core.DistributionList() 
    24411872        for tex in res.results: 
    2442             d = Orange.core.Distribution(examples.domain.class_var) 
     1873            d = Orange.statistics.Distribution(examples.domain.class_var) 
    24431874            for di in range(len(d)): 
    24441875                d[di] = tex.probabilities[0][di] 
     
    24671898##            for e in examples: 
    24681899##                prob_dist.append(classifier(e,Orange.core.GetProbabilities)) 
    2469             cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist) 
     1900            cl = RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas, classifier, prob_dist) 
    24701901        else: 
    2471             cl = Orange.core.RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas) 
     1902            cl = RuleClassifier_logit(rules, self.min_cl_sig, self.min_beta, examples, weight, self.set_prefix_rules, self.optimize_betas) 
    24721903 
    24731904##        print "result" 
     
    24841915    def add_null_rule(self, rules, examples, weight): 
    24851916        for cl in examples.domain.class_var: 
    2486             tmpRle = Orange.core.Rule() 
    2487             tmpRle.filter = Orange.core.Filter_values(domain=examples.domain) 
     1917            tmpRle = Rule() 
     1918            tmpRle.filter = Orange.data.filter.Values(domain=examples.domain) 
    24881919            tmpRle.parentRule = None 
    24891920            tmpRle.filterAndStore(examples, weight, int(cl)) 
     
    24931924 
    24941925    def sort_rules(self, rules): 
    2495         new_rules = Orange.core.RuleList() 
     1926        new_rules = RuleList() 
    24961927        foundRule = True 
    24971928        while foundRule: 
     
    25221953 
    25231954 
    2524 class RuleClassifier_bestRule(Orange.core.RuleClassifier): 
     1955class RuleClassifier_bestRule(RuleClassifier): 
    25251956    """ 
    25261957    A very simple classifier; it takes the best rule of each class and 
     
    25301961        self.rules = rules 
    25311962        self.examples = examples 
    2532         self.apriori = Orange.core.Distribution(examples.domain.class_var, examples, weight_id) 
     1963        self.apriori = Orange.statistics.Distribution(examples.domain.class_var, examples, weight_id) 
    25331964        self.apriori_prob = [a / self.apriori.abs for a in self.apriori] 
    25341965        self.weight_id = weight_id 
     
    25381969    @deprecated_keywords({"retRules": "ret_rules"}) 
    25391970    def __call__(self, example, result_type=Orange.classification.Classifier.GetValue, ret_rules=False): 
    2540         example = Orange.core.Example(self.examples.domain, example) 
    2541         tempDist = Orange.core.Distribution(example.domain.class_var) 
     1971        example = Orange.data.Instance(self.examples.domain, example) 
     1972        tempDist = Orange.statistics.Distribution(example.domain.class_var) 
    25421973        best_rules = [None] * len(example.domain.class_var.values) 
    25431974 
     
    25601991        else: 
    25611992            tempDist.normalize() # prior probability 
    2562             tmp_examples = Orange.core.ExampleTable(self.examples) 
     1993            tmp_examples = Orange.data.Table(self.examples) 
    25631994            for r in best_rules: 
    25641995                if r: 
    25651996                    tmp_examples = r.filter(tmp_examples) 
    2566             tmpDist = Orange.core.Distribution(tmp_examples.domain.class_var, tmp_examples, self.weight_id) 
     1997            tmpDist = Orange.statistics.Distribution(tmp_examples.domain.class_var, tmp_examples, self.weight_id) 
    25671998            tmpDist.normalize() 
    25681999            probs = [0.] * len(self.examples.domain.class_var.values) 
    25692000            for i in range(len(self.examples.domain.class_var.values)): 
    25702001                probs[i] = tmpDist[i] + tempDist[i] * 2 
    2571             final_dist = Orange.core.Distribution(self.examples.domain.class_var) 
     2002            final_dist = Orange.statistics.Distribution(self.examples.domain.class_var) 
    25722003            for cl_i, cl in enumerate(self.examples.domain.class_var): 
    25732004                final_dist[cl] = probs[cl_i] 
     
    25772008            if result_type == Orange.classification.Classifier.GetValue: 
    25782009              return (final_dist.modus(), best_rules) 
    2579             if result_type == Orange.core.GetProbabilities: 
     2010            if result_type == Orange.classification.Classifier.GetProbabilities: 
    25802011              return (final_dist, best_rules) 
    25812012            return (final_dist.modus(), final_dist, best_rules) 
    25822013        if result_type == Orange.classification.Classifier.GetValue: 
    25832014          return final_dist.modus() 
    2584         if result_type == Orange.core.GetProbabilities: 
     2015        if result_type == Orange.classification.Classifier.GetProbabilities: 
    25852016          return final_dist 
    25862017        return (final_dist.modus(), final_dist) 
  • Orange/classification/svm/__init__.py

    r10300 r10369  
    1 """ 
    2 .. index:: support vector machines (SVM) 
    3 .. index: 
    4    single: classification; support vector machines (SVM) 
    5     
    6 ********************************* 
    7 Support Vector Machines (``svm``) 
    8 ********************************* 
    9  
    10 This is a module for `Support Vector Machine`_ (SVM) classification. It 
    11 exposes the underlying `LibSVM`_ and `LIBLINEAR`_ library in a standard 
    12 Orange Learner/Classifier interface. 
    13  
    14 Choosing the right learner 
    15 ========================== 
    16  
    17 Choose an SVM learner suitable for the problem. 
    18 :obj:`SVMLearner` is a general SVM learner. :obj:`SVMLearnerEasy` will 
    19 help with the data normalization and parameter tuning. Learn with a fast 
    20 :obj:`LinearSVMLearner` on data sets with a large number of features.  
    21  
    22 .. note:: SVM can perform poorly on some data sets. Choose the parameters  
    23           carefully. In cases of low classification accuracy, try scaling the  
    24           data and experiment with different parameters. \ 
     25           The :obj:`SVMLearnerEasy` class does this automatically (it is similar 
    26           to the `svm-easy.py` script in the LibSVM distribution). 
    27  
    28            
    29 SVM learners (from `LibSVM`_) 
    30 ============================= 
    31  
     32 The most basic :class:`SVMLearner` implements the standard `LibSVM`_ learner. 
     33 It supports four built-in kernel types (Linear, Polynomial, RBF and Sigmoid). 
     34 Additionally, kernel functions defined in Python can be used instead.  
    35  
    36 .. note:: For learning from ordinary :class:`Orange.data.Table` use the \ 
     37     :class:`SVMLearner`. For learning from a sparse dataset (i.e. 
    38     data in `basket` format) use the :class:`SVMLearnerSparse` class. 
    39  
    40 .. autoclass:: Orange.classification.svm.SVMLearner 
    41     :members: 
    42  
    43 .. autoclass:: Orange.classification.svm.SVMLearnerSparse 
    44     :members: 
    45     :show-inheritance: 
    46      
    47 .. autoclass:: Orange.classification.svm.SVMLearnerEasy 
    48     :members: 
    49     :show-inheritance: 
    50  
    51 The next example shows how to use SVM learners and that :obj:`SVMLearnerEasy`  
    52 with automatic data preprocessing and parameter tuning  
    53 outperforms :obj:`SVMLearner` with the default :obj:`~SVMLearner.nu` and :obj:`~SVMLearner.gamma`:   
    54      
    55 .. literalinclude:: code/svm-easy.py 
    56  
    57  
    58     
    59 Linear SVM learners (from `LIBLINEAR`_) 
    60 ======================================= 
    61  
     62 The :class:`LinearSVMLearner` learner is more suitable for large-scale 
     63 problems as it is significantly faster than :class:`SVMLearner` and its 
     64 subclasses. A downside is that it only supports a linear kernel (as the 
     65 name suggests) and does not support probability estimation for the 
     66 classifications. In addition, a multi-class SVM learner, 
     67 :class:`MultiClassSVMLearner`, is provided. 
    68     
    69 .. autoclass:: Orange.classification.svm.LinearSVMLearner 
    70    :members: 
    71     
    72 .. autoclass:: Orange.classification.svm.MultiClassSVMLearner 
    73    :members: 
    74     
    75     
    76 SVM Based feature selection and scoring 
    77 ======================================= 
    78  
    79 .. autoclass:: Orange.classification.svm.RFE 
    80  
    81 .. autoclass:: Orange.classification.svm.ScoreSVMWeights 
    82     :show-inheritance: 
    83   
    84   
    85 Utility functions 
    86 ================= 
    87  
    88 .. automethod:: Orange.classification.svm.max_nu 
    89  
    90 .. automethod:: Orange.classification.svm.get_linear_svm_weights 
    91  
    92 .. automethod:: Orange.classification.svm.table_to_svm_format 
    93  
    94 The following example shows how to get linear SVM weights: 
    95      
    96 .. literalinclude:: code/svm-linear-weights.py     
    97  
    98  
    99 .. _kernel-wrapper: 
    100  
    101 Kernel wrappers 
    102 =============== 
    103  
    104 Kernel wrappers are helper classes used to build custom kernels for use 
    105 with :class:`SVMLearner` and subclasses. All wrapper constructors take 
    106 one or more Python functions (`wrapped` attribute) to wrap. The  
    107 function must be a positive definite kernel, taking two arguments of  
     108 type :class:`Orange.data.Instance` and returning a float. 
    109  
    110 .. autoclass:: Orange.classification.svm.kernels.KernelWrapper 
    111    :members: 
    112  
    113 .. autoclass:: Orange.classification.svm.kernels.DualKernelWrapper 
    114    :members: 
    115  
    116 .. autoclass:: Orange.classification.svm.kernels.RBFKernelWrapper 
    117    :members: 
    118  
    119 .. autoclass:: Orange.classification.svm.kernels.PolyKernelWrapper 
    120    :members: 
    121  
    122 .. autoclass:: Orange.classification.svm.kernels.AdditionKernelWrapper 
    123    :members: 
    124  
    125 .. autoclass:: Orange.classification.svm.kernels.MultiplicationKernelWrapper 
    126    :members: 
    127  
    128 .. autoclass:: Orange.classification.svm.kernels.CompositeKernelWrapper 
    129    :members: 
    130  
    131 .. autoclass:: Orange.classification.svm.kernels.SparseLinKernel 
    132    :members: 
    133  
    134 Example: 
    135  
    136 .. literalinclude:: code/svm-custom-kernel.py 
    137  
    138 .. _`Support Vector Machine`: http://en.wikipedia.org/wiki/Support_vector_machine 
    139 .. _`LibSVM`: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
    140 .. _`LIBLINEAR`: http://www.csie.ntu.edu.tw/~cjlin/liblinear/ 
    141  
    142 """ 
    143  
    1441import math 
    1452 
     
    17128 
    17229def max_nu(data): 
    173     """Return the maximum nu parameter for Nu_SVC support vector learning  
    174     for the given data table.  
     30    """ 
     31    Return the maximum nu parameter for the given data table for 
     32    Nu_SVC learning. 
    17533     
    17634    :param data: Data with discrete class variable 
     
    19149class SVMLearner(_SVMLearner): 
    19250    """ 
    193     :param svm_type: defines the SVM type (can be C_SVC, Nu_SVC  
    194         (default), OneClass, Epsilon_SVR, Nu_SVR) 
     51    :param svm_type: the SVM type 
    19552    :type svm_type: SVMLearner.SVMType 
    196     :param kernel_type: defines the kernel type for learning 
    197         (can be kernels.RBF (default), kernels.Linear, kernels.Polynomial,  
    198         kernels.Sigmoid, kernels.Custom) 
     53    :param kernel_type: the kernel type 
    19954    :type kernel_type: SVMLearner.Kernel 
    200     :param degree: kernel parameter (for Polynomial) (default 3) 
     55    :param degree: kernel parameter (only for ``Polynomial``) 
    20156    :type degree: int 
    202     :param gamma: kernel parameter (Polynomial/RBF/Sigmoid) 
    203         (default 1.0/num_of_features) 
     57    :param gamma: kernel parameter; if 0, it is set to 1.0/#features (for ``Polynomial``, ``RBF`` and ``Sigmoid``) 
    20458    :type gamma: float 
    205     :param coef0: kernel parameter (Polynomial/Sigmoid) (default 0) 
     59    :param coef0: kernel parameter (for ``Polynomial`` and ``Sigmoid``) 
    20660    :type coef0: int 
    207     :param kernel_func: function that will be called if `kernel_type` is 
    208         `kernels.Custom`. It must accept two :obj:`Orange.data.Instance` 
    209         arguments and return a float (see :ref:`kernel-wrapper` for some 
    210         examples). 
    211     :type kernel_func: callable function 
    212     :param C: C parameter for C_SVC, Epsilon_SVR and Nu_SVR 
     61    :param kernel_func: kernel function if ``kernel_type`` is 
     62        ``kernels.Custom`` 
     63    :type kernel_func: callable object 
     64    :param C: C parameter (for ``C_SVC``, ``Epsilon_SVR`` and ``Nu_SVR``) 
    21365    :type C: float 
    214     :param nu: Nu parameter for Nu_SVC, Nu_SVR and OneClass (default 0.5) 
     66    :param nu: Nu parameter (for ``Nu_SVC``, ``Nu_SVR`` and ``OneClass``) 
    21567    :type nu: float 
    216     :param p: epsilon in loss-function for Epsilon_SVR 
     68    :param p: epsilon parameter (for ``Epsilon_SVR``) 
    21769    :type p: float 
    218     :param cache_size: cache memory size in MB (default 200) 
     70    :param cache_size: cache memory size in MB 
    21971    :type cache_size: int 
    220     :param eps: tolerance of termination criterion (default 0.001) 
     72    :param eps: tolerance of termination criterion 
    22173    :type eps: float 
    22274    :param probability: build a probability model 
    223         (default False) 
    22475    :type probability: bool 
    22576    :param shrinking: use shrinking heuristics  
    226         (default True) 
    22777    :type shrinking: bool 
    22878    :param weight: a list of class weights 
    22979    :type weight: list 
    230      
     80 
    23181    Example: 
    23282     
     
    23484        >>> from Orange.classification import svm 
    23585        >>> from Orange.evaluation import testing, scoring 
    236         >>> table = Orange.data.Table("vehicle.tab") 
     86        >>> data = Orange.data.Table("vehicle.tab") 
    23787        >>> learner = svm.SVMLearner() 
    238         >>> results = testing.cross_validation([learner], table, folds=5) 
     88        >>> results = testing.cross_validation([learner], data, folds=5) 
    23989        >>> print scoring.CA(results)[0] 
    24090        0.789613644274 
     
    283133        :type table: Orange.data.Table 
    284134         
    285         :param weight: unused - use the constructors ``weight`` 
    286             parameter to set class weights 
    287          
      135        :param weight: ignored (required by the base class signature). 
    288136        """ 
    289137 
     
    338186    def tune_parameters(self, data, parameters=None, folds=5, verbose=0, 
    339187                       progress_callback=None): 
    340         """Tune the ``parameters`` on given ``data`` using  
    341         cross validation. 
     188        """Tune the ``parameters`` on the given ``data`` using  
     189        internal cross validation. 
    342190         
    343191        :param data: data for parameter tuning 
    344192        :type data: Orange.data.Table  
    345         :param parameters: defaults to ["nu", "C", "gamma"] 
     193        :param parameters: names of parameters to tune 
     194            (default: ["nu", "C", "gamma"]) 
    346195        :type parameters: list of strings 
    347         :param folds: number of folds used for cross validation 
     196        :param folds: number of folds for internal cross validation 
    348197        :type folds: int 
    349         :param verbose: default False 
     198        :param verbose: set verbose output 
    350199        :type verbose: bool 
    351         :param progress_callback: report progress 
     200        :param progress_callback: callback function for reporting progress 
    352201        :type progress_callback: callback function 
    353202             
    354         An example that tunes the `gamma` parameter on `data` using 3-fold cross  
    355         validation. :: 
      203        Here is an example of tuning the `gamma` parameter using 
     204        3-fold cross validation. :: 
    356205 
    357206            svm = Orange.classification.svm.SVMLearner() 
     
    445294class SVMLearnerSparse(SVMLearner): 
    446295 
    447     """A :class:`SVMLearner` that learns from 
    448     meta attributes. 
    449      
    450     Meta attributes do not need to be registered with the data set domain, or  
    451     present in all the instances. Use this for large  
    452     sparse data sets. 
    453      
     296    """ 
     297    A :class:`SVMLearner` that learns from data stored in meta 
     298    attributes. Meta attributes do not need to be registered with the 
     299    data set domain, or present in all data instances. 
    454300    """ 
    455301 
     
    472318class SVMLearnerEasy(SVMLearner): 
    473319 
    474     """Apart from the functionality of :obj:`SVMLearner` it automatically scales the  
    475     data and perform parameter optimization with the  
    476     :func:`SVMLearner.tune_parameters`. It is similar to the easy.py script in  
    477     the LibSVM package. 
     320    """A class derived from :obj:`SVMLearner` that automatically 
     321    scales the data and performs parameter optimization using 
     322    :func:`SVMLearner.tune_parameters`. The procedure is similar to 
      323    that implemented in the easy.py script from the LibSVM package. 
    478324     
    479325    """ 
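
A sketch of the comparison described above (reusing the `vehicle.tab` data
set from the earlier example; exact scores will vary)::

    import Orange
    from Orange.classification import svm
    from Orange.evaluation import testing, scoring

    data = Orange.data.Table("vehicle.tab")
    learners = [svm.SVMLearner(), svm.SVMLearnerEasy()]
    results = testing.cross_validation(learners, data, folds=5)
    # SVMLearnerEasy is expected to score higher, at the cost of a much
    # longer training time spent on scaling and parameter tuning
    print [round(ca, 3) for ca in scoring.CA(results)]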
     
    545391    def __init__(self, solver_type=L2R_L2LOSS_DUAL, C=1.0, eps=0.01, **kwargs): 
    546392        """ 
    547         :param solver_type: Can be one of class constants: 
    548          
    549             - L2R_L2LOSS_DUAL 
    550             - L2R_L2LOSS  
    551             - L2R_L1LOSS_DUAL 
    552             - L2R_L1LOSS 
    553             - L1R_L2LOSS 
      393        :param solver_type: One of the following class constants: ``L2R_L2LOSS_DUAL``, ``L2R_L2LOSS``, ``L2R_L1LOSS_DUAL``, ``L2R_L1LOSS`` or ``L1R_L2LOSS`` 
    554394         
    555395        :param C: Regularization parameter (default 1.0) 
     
    611451    """Extract attribute weights from the linear SVM classifier. 
    612452     
    613     For multi class classification the weights are square-summed over all 
    614     binary one vs. one classifiers unles obj:`sum` is False, in which case 
    615     the return value is a list of weights for each individual binary 
    616     classifier (in the order of [class1 vs class2, class1 vs class3 ... class2 
    617     vs class3 ...]). 
     453    For multi class classification, the result depends on the argument 
     454    :obj:`sum`. If ``True`` (default) the function computes the 
     455    squared sum of the weights over all binary one vs. one 
     456    classifiers. If :obj:`sum` is ``False`` it returns a list of 
     457    weights for each individual binary classifier (in the order of 
     458    [class1 vs class2, class1 vs class3 ... class2 vs class3 ...]). 
    618459         
    619460    """ 
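
A short usage sketch (the `iris` data set is illustrative; as described
above, the classifier must be trained with a linear kernel)::

    import Orange
    from Orange.classification import svm
    from Orange.classification.svm import kernels

    data = Orange.data.Table("iris")
    learner = svm.SVMLearner(kernel_type=kernels.Linear)
    classifier = learner(data)
    weights = svm.get_linear_svm_weights(classifier)  # sum=True by default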
     
    687528 
    688529class ScoreSVMWeights(Orange.feature.scoring.Score): 
    689     """Score feature by training a linear SVM classifier, using a squared sum of  
    690     weights (of each binary classifier) as the returned score. 
     530    """ 
     531    Score a feature by the squared sum of weights using a linear SVM 
     532    classifier. 
    691533         
    692534    Example: 
     
    759601class RFE(object): 
    760602 
    761     """Recursive feature elimination using linear SVM derived attribute  
    762     weights. 
     603    """Iterative feature elimination based on weights computed by 
     604    linear SVM. 
    763605     
    764606    Example:: 
     
    770612            normalization=False) # normalization=False will not change the domain 
    771613        rfe = Orange.classification.svm.RFE(l) 
    772         data_with_removed_features = rfe(table, 5) 
     614        data_subset_of_features = rfe(table, 5) 
    773615         
    774616    """ 
  • Orange/classification/svm/kernels.py

    r9671 r10369  
    1414    """A base class for kernel function wrappers. 
    1515     
    16     :param wrapped: a function to wrap 
    17     :type wrapped: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
     16    :param wrapped: a kernel function to wrap 
    1817     
    1918    """ 
     
    2221        self.wrapped=wrapped 
    2322         
    24     def __call__(self, example1, example2): 
    25         return self.wrapped(example1, example2) 
     23    def __call__(self, inst1, inst2): 
     24        return self.wrapped(inst1, inst2) 
    2625  
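
For illustration, a hand-written kernel function and its wrapper might look
like this (a sketch; it assumes instances with continuous features and the
class value in the last position)::

    def dot_product(inst1, inst2):
        # linear kernel over the ordinary (non-class) feature values
        return sum(float(a) * float(b)
                   for a, b in zip(list(inst1)[:-1], list(inst2)[:-1]))

    wrapped = KernelWrapper(dot_product)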
    2726class DualKernelWrapper(KernelWrapper): 
    2827     
    29     """A base class for kernel wrapper that wraps two other kernel functions. 
      28    """A base class for kernel wrappers that wrap two kernel functions. 
    3029     
    31     :param wrapped1:  a function to wrap 
    32     :type wrapped1: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
    33     :param wrapped2:  a function to wrap 
    34     :type wrapped2: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
     30    :param wrapped1:  first kernel function 
     31    :param wrapped2:  second kernel function 
    3532     
    3633    """ 
     
    4239class RBFKernelWrapper(KernelWrapper): 
    4340     
    44     """A Kernel wrapper that uses a wrapped kernel function in a RBF 
    45     (Radial Basis Function). 
      41    """A kernel wrapper that wraps the given function into an RBF kernel. 
    4642     
    47     :param wrapped: a function to wrap 
    48     :type wrapped: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
     43    :param wrapped: a kernel function 
    4944    :param gamma: the gamma of the RBF 
    5045    :type gamma: double 
     
    5651        self.gamma=gamma 
    5752         
    58     def __call__(self, example1, example2): 
    59         """:math:`exp(-gamma * wrapped(example1, example2) ^ 2)`  
    60          
     53    def __call__(self, inst1, inst2): 
     54        """Return :math:`exp(-gamma * wrapped(inst1, inst2) ^ 2)`  
    6155        """ 
    6256         
    63         return math.exp(-self.gamma*math.pow(self.wrapped(example1,  
    64                                                           example2),2)) 
     57        return math.exp( 
     58            -self.gamma*math.pow(self.wrapped(inst1, inst2), 2)) 
    6559             
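
A sketch of wrapping a custom distance in an RBF (the ``euclidean`` helper
is hypothetical)::

    import math

    def euclidean(inst1, inst2):
        return math.sqrt(sum((float(a) - float(b)) ** 2
                             for a, b in zip(list(inst1)[:-1],
                                             list(inst2)[:-1])))

    rbf = RBFKernelWrapper(euclidean, gamma=0.5)
    # usable as: SVMLearner(kernel_type=kernels.Custom, kernel_func=rbf)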
    6660class PolyKernelWrapper(KernelWrapper): 
     
    6862    """Polynomial kernel wrapper. 
    6963     
    70     :param wrapped: a function to wrap 
    71     :type wrapped: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
     64    :param wrapped: a kernel function 
     65 
    7266    :param degree: degree of the polynomial 
    73     :type degree: double 
     67    :type degree: float 
    7468     
    7569    """ 
     
    7973        self.degree=degree 
    8074         
    81     def __call__(self, example1, example2): 
    82         """:math:`wrapped(example1, example2) ^ d`""" 
     75    def __call__(self, inst1, inst2): 
     76        """Return :math:`wrapped(inst1, inst2) ^ d`""" 
    8377         
    84         return math.pow(self.wrapped(example1, example2), self.degree) 
     78        return math.pow(self.wrapped(inst1, inst2), self.degree) 
    8579 
    8680class AdditionKernelWrapper(DualKernelWrapper): 
    8781     
    88     """Addition kernel wrapper.""" 
     82    """ 
     83    Addition kernel wrapper. 
     84 
     85    :param wrapped1:  first kernel function 
     86    :param wrapped2:  second kernel function 
     87 
     88    """ 
    8989     
    90     def __call__(self, example1, example2): 
    91         """:math:`wrapped1(example1, example2) + wrapped2(example1, example2)` 
     90    def __call__(self, inst1, inst2): 
     91        """Return :math:`wrapped1(inst1, inst2) + wrapped2(inst1, inst2)` 
    9292             
    9393        """ 
    9494         
    95         return self.wrapped1(example1, example2) + \ 
    96                                             self.wrapped2(example1, example2) 
     95        return self.wrapped1(inst1, inst2) + self.wrapped2(inst1, inst2) 
    9796 
    9897class MultiplicationKernelWrapper(DualKernelWrapper): 
    9998     
    100     """Multiplication kernel wrapper.""" 
     99    """ 
     100    Multiplication kernel wrapper. 
     101 
     102    :param wrapped1:  first kernel function 
     103    :param wrapped2:  second kernel function 
      104    """ 
    101105     
    102     def __call__(self, example1, example2): 
    103         """:math:`wrapped1(example1, example2) * wrapped2(example1, example2)` 
     106    def __call__(self, inst1, inst2): 
     107        """Return :math:`wrapped1(inst1, inst2) * wrapped2(inst1, inst2)` 
    104108             
    105109        """ 
    106110         
    107         return self.wrapped1(example1, example2) * \ 
    108                                             self.wrapped2(example1, example2) 
     111        return self.wrapped1(inst1, inst2) * self.wrapped2(inst1, inst2) 
    109112 
    110113class CompositeKernelWrapper(DualKernelWrapper): 
    111114     
    112115    """Composite kernel wrapper. 
    113      
    114     :param wrapped1:  a function to wrap 
    115     :type wrapped1: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
    116     :param wrapped2:  a function to wrap 
    117     :type wrapped2: function(:class:`Orange.data.Instance`, :class:`Orange.data.Instance`) 
     116 
     117    :param wrapped1:  first kernel function 
     118    :param wrapped2:  second kernel function 
    118119    :param l: coefficient 
    119120    :type l: double 
     
    125126        self.l=l 
    126127         
    127     def __call__(self, example1, example2): 
    128         """:math:`l*wrapped1(example1,example2)+(1-l)*wrapped2(example1,example2)` 
     128    def __call__(self, inst1, inst2): 
     129        """Return :math:`l*wrapped1(inst1, inst2) + (1-l)*wrapped2(inst1, inst2)` 
    129130             
    130131        """ 
    131         return self.l * self.wrapped1(example1, example2) + (1-self.l) * \ 
    132                                             self.wrapped2(example1,example2) 
     132        return self.l * self.wrapped1(inst1, inst2) + \ 
     133            (1-self.l) * self.wrapped2(inst1, inst2) 
    133134 
    134135class SparseLinKernel(object): 
    135     def __call__(self, example1, example2): 
    136         """Computes a linear kernel function using the examples meta attributes 
    137         (need to be floats). 
     136    def __call__(self, inst1, inst2): 
     137        """ 
     138        Compute a linear kernel function using the instances' meta attributes. 
     139        The meta attributes' values must be floats. 
    138140         
    139141        """ 
    140         s = set(example1.getmetas().keys()) & set(example2.getmetas().keys()) 
     142        s = set(inst1.getmetas().keys()) & set(inst2.getmetas().keys()) 
    141143        sum = 0 
    142144        for key in s: 
    143             sum += float(example2[key]) * float(example1[key]) 
     145            sum += float(inst2[key]) * float(inst1[key]) 
    144146        return sum 
    145147 
  • Orange/classification/tree.py

    r10208 r10371  
    16891689    A classification or regression tree learner. If a set of instances 
    16901690    is given on initialization, a :class:`TreeClassifier` is built and 
    1691     returned instead. All attributes can also be set on initialization. 
     1691    returned instead. 
     1692 
     1693    The learning algorithm has a large number of parameters. The class 
     1694    provides reasonable defaults; they can be modified either as attributes 
     1695    or as arguments given to the constructor. 
     1696 
     1697    The algorithm is very flexible, yet slower than the other two 
      1698    implementations that are more suitable for large-scale 
     1699    experiments. 
    16921700 
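For example (a sketch; ``max_depth`` and ``min_instances`` are assumed
parameter names from the Orange 2.x TreeLearner)::

    import Orange

    data = Orange.data.Table("iris")
    # parameters can be passed to the constructor ...
    learner = Orange.classification.tree.TreeLearner(max_depth=3)
    # ... or set afterwards, as attributes
    learner.min_instances = 5
    tree = learner(data)
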
    16931701    **The tree induction process** 
  • Orange/statistics/distribution.py

    r9927 r10372  
    1 """ 
    2 .. index:: Distributions 
    3  
    4 ============= 
    5 Distributions 
    6 ============= 
    7  
    8 :obj:`Distribution` and derived classes store empirical 
    9 distributions of discrete and continuous variables. 
    10  
    11 .. class:: Distribution 
    12  
    13     This class can 
    14     store absolute or relative frequencies. It provides a convenience constructor 
    15     which constructs instances of derived classes. :: 
    16  
    17         >>> import Orange 
    18         >>> data = Orange.data.Table("adult_sample") 
    19         >>> disc = Orange.statistics.distribution.Distribution("workclass", data) 
    20         >>> print disc 
    21         <685.000, 72.000, 28.000, 29.000, 59.000, 43.000, 2.000> 
    22         >>> print type(disc) 
    23         <type 'DiscDistribution'> 
    24  
    25     The resulting distribution is of type :obj:`DiscDistribution` since variable 
     26     `workclass` is discrete. The printed numbers are counts of examples that have a 
     27     particular attribute value. :: 
    28  
    29         >>> workclass = data.domain["workclass"] 
    30         >>> for i in range(len(workclass.values)): 
    31         ...     print "%20s: %5.3f" % (workclass.values[i], disc[i]) 
    32                  Private: 685.000 
    33         Self-emp-not-inc: 72.000 
    34             Self-emp-inc: 28.000 
    35              Federal-gov: 29.000 
    36                Local-gov: 59.000 
    37                State-gov: 43.000 
    38              Without-pay: 2.000 
    39             Never-worked: 0.000 
    40  
     41     Distributions resemble dictionaries, supporting indexing by instances of 
    42     :obj:`Orange.data.Value`, integers or floats (depending on the distribution 
    43     type), and symbolic names (if :obj:`variable` is defined). 
    44  
    45     For instance, the number of examples with `workclass="private"`, can be 
    46     obtained in three ways:: 
    47      
    48         print "Private: ", disc["Private"] 
    49         print "Private: ", disc[0] 
    50         print "Private: ", disc[orange.Value(workclass, "Private")] 
    51  
    52     Elements cannot be removed from distributions. 
    53  
     54     The length of a distribution equals the number of possible values for discrete 
     55     distributions (if :obj:`variable` is set), the value with the highest index 
     56     encountered (if the distribution is discrete and :obj:`variable` is 
     57     :obj:`None`) or the number of different values encountered (for continuous 
     58     distributions). 
    59  
    60     .. attribute:: variable 
    61  
    62         Variable to which the distribution applies; may be :obj:`None` if not 
    63         applicable. 
    64  
    65     .. attribute:: unknowns 
    66  
    67         The number of instances for which the value of the variable was 
    68         undefined. 
    69  
    70     .. attribute:: abs 
    71  
    72         Sum of all elements in the distribution. Usually it equals either 
    73         :obj:`cases` if the instance stores absolute frequencies or 1 if the 
    74         stored frequencies are relative, e.g. after calling :obj:`normalize`. 
    75  
    76     .. attribute:: cases 
    77  
    78         The number of instances from which the distribution is computed, 
    79         excluding those on which the value was undefined. If instances were 
    80         weighted, this is the sum of weights. 
    81  
    82     .. attribute:: normalized 
    83  
    84         :obj:`True` if distribution is normalized. 
    85  
    86     .. attribute:: random_generator 
    87  
    88         A pseudo-random number generator used for method :obj:`Orange.misc.Random`. 
    89  
    90     .. method:: __init__(variable[, data[, weightId=0]]) 
    91  
    92         Construct either :obj:`DiscDistribution` or :obj:`ContDistribution`, 
    93         depending on the variable type. If the variable is the only argument, it 
    94         must be an instance of :obj:`Orange.feature.Descriptor`. In that case, 
    95         an empty distribution is constructed. If data is given as well, the 
    96         variable can also be specified by name or index in the 
    97         domain. Constructor then computes the distribution of the specified 
    98         variable on the given data. If instances are weighted, the id of 
    99         meta-attribute with weights can be passed as the third argument. 
    100  
    101         If variable is given by descriptor, it doesn't need to exist in the 
    102         domain, but it must be computable from given instances. For example, the 
    103         variable can be a discretized version of a variable from data. 
    104  
    105     .. method:: keys() 
    106  
    107         Return a list of possible values (if distribution is discrete and 
     108         :obj:`variable` is set) or a list of encountered values otherwise. 
    109  
    110     .. method:: values() 
    111  
     112         Return a list of frequencies of the values, as described above. 
    113  
    114     .. method:: items() 
    115  
    116         Return a list of pairs of elements of the above lists. 
    117  
    118     .. method:: native() 
    119  
    120         Return the distribution as a list (for discrete distributions) or as a 
    121         dictionary (for continuous distributions) 
    122  
    123     .. method:: add(value[, weight=1]) 
    124  
    125         Increase the count of the element corresponding to ``value`` by 
    126         ``weight``. 
    127  
    128         :param value: Value 
    129         :type value: :obj:`Orange.data.Value`, string (if :obj:`variable` is set), :obj:`int` for discrete distributions or :obj:`float` for continuous distributions 
    130         :param weight: Weight to be added to the count for ``value`` 
    131         :type weight: float 
    132  
    133     .. method:: normalize() 
    134  
    135         Divide the counts by their sum, set :obj:`normalized` to :obj:`True` and 
    136         :obj:`abs` to 1. Attributes :obj:`cases` and :obj:`unknowns` are 
     137         unchanged. This changes absolute frequencies into relative ones. 
    138  
    139     .. method:: modus() 
    140  
    141         Return the most common value. If there are multiple such values, one is 
    142         chosen at random, although the chosen value will always be the same for 
    143         the same distribution. 
    144  
    145     .. method:: random() 
    146  
    147         Return a random value based on the stored empirical probability 
    148         distribution. For continuous distributions, this will always be one of 
    149         the values which actually appeared (e.g. one of the values from 
    150         :obj:`keys`). 
    151  
    152         The method uses :obj:`random_generator`. If none has been constructed or 
    153         assigned yet, a new one is constructed and stored for further use. 
    154  
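
Putting several of the methods above together (a sketch; the variable and
the counts are illustrative)::

    import Orange

    color = Orange.feature.Discrete("color", values=["red", "green", "blue"])
    dist = Orange.statistics.distribution.Distribution(color)
    dist.add("red", 2)      # add "red" with weight 2
    dist.add("green")
    dist.normalize()        # abs becomes 1, frequencies become relative
    print dist.modus()      # "red", the most frequent value
    print dist.random()     # a value drawn from the stored distribution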
    155  
    156 .. class:: Discrete 
    157  
    158     Stores a discrete distribution of values. The class differs from its parent 
    159     class in having a few additional constructors. 
    160  
    161     .. method:: __init__(variable) 
    162  
    163         Construct an instance of :obj:`Discrete` and set the variable 
    164         attribute. 
    165  
    166         :param variable: A discrete variable 
    167         :type variable: Orange.feature.Discrete 
    168  
    169     .. method:: __init__(frequencies) 
    170  
    171         Construct an instance and initialize the frequencies from the list, but 
    172         leave `Distribution.variable` empty. 
    173  
    174         :param frequencies: A list of frequencies 
    175         :type frequencies: list 
    176  
    177         A distribution constructed in this way can be used, for instance, to
    178         generate random numbers from a given discrete distribution:: 
    179  
    180             disc = Orange.statistics.distribution.Discrete([0.5, 0.3, 0.2]) 
    181             for i in range(20): 
    182                 print disc.random(), 
    183  
    184         This prints out approximately ten 0's, six 1's and four 2's. The values
    185         can be named by assigning a variable:: 
    186  
    187             v = Orange.feature.Discrete("color", values=["red", "green", "blue"])
    188             disc.variable = v 
    189  
    190     .. method:: __init__(distribution) 
    191  
    192         Copy constructor; makes a shallow copy of the given distribution.
    193  
    194         :param distribution: An existing discrete distribution 
    195         :type distribution: Discrete 
    196  
    197  
    198 .. class:: Continuous 
    199  
    200     Stores a continuous distribution, that is, a dictionary-like structure with 
    201     values and their frequencies. 
    202  
    203     .. method:: __init__(variable) 
    204  
    205         Construct an instance of :obj:`Continuous` and set the variable
    206         attribute. 
    207  
    208         :param variable: A continuous variable 
    209         :type variable: Orange.feature.Continuous 
    210  
    211     .. method:: __init__(frequencies) 
    212  
    213         Construct an instance of :obj:`Continuous` and initialize it from 
    214         the given dictionary, which maps values to their frequencies.
    215  
    216         :param frequencies: Values and their corresponding frequencies 
    217         :type frequencies: dict 
    218  
    219     .. method:: __init__(distribution) 
    220  
    221         Copy constructor; makes a shallow copy of the given distribution.
    222  
    223         :param distribution: An existing continuous distribution 
    224         :type distribution: Continuous 
    225  
    226     .. method:: average() 
    227  
    228         Return the average value. Note that the average can also be
    229         computed using the simpler and faster classes from the module
    230         :obj:`Orange.statistics.basic`.
    231  
    232     .. method:: var() 
    233  
    234         Return the variance of the distribution.
    235  
    236     .. method:: dev() 
    237  
    238         Return the standard deviation. 
    239  
    240     .. method:: error() 
    241  
    242         Return the standard error. 
    243  
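        The four statistics can be obtained as follows (a sketch, assuming
        the bundled `adult_sample` data set with a continuous `age`
        attribute)::

            import Orange

            data = Orange.data.Table("adult_sample")
            d_age = Orange.statistics.distribution.Distribution("age", data)
            print "%.2f +- %.2f" % (d_age.average(), d_age.error())
            print "var: %.2f, dev: %.2f" % (d_age.var(), d_age.dev())
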
    244     .. method:: percentile(p) 
    245  
    246         Return the value at the `p`-th percentile. 
    247  
    248         :param p: The percentile, must be between 0 and 100 
    249         :type p: float 
    250         :rtype: float 
    251  
    252         For example, if `d_age` is a continuous distribution, the quartiles can 
    253         be printed by :: 
    254  
    255             print "Quartiles: %5.3f - %5.3f - %5.3f" % (
    256                  d_age.percentile(25), d_age.percentile(50), d_age.percentile(75))
    257  
    258     .. method:: density(x)
    259  
    260         Return the probability density at `x`. If the value is not in 
    261         :obj:`Distribution.keys`, it is interpolated. 
    262  
    263  
    264 .. class:: Gaussian 
    265  
    266     A class imitating :obj:`Continuous` by returning the statistics and
    267     densities for a Gaussian distribution. The class is meant only as a
    268     convenient substitute for code which expects an instance of
    269     :obj:`Distribution`. For general use, the Python module :obj:`random`
    270     provides a comprehensive set of functions for various random distributions.
    271  
    272     .. attribute:: mean 
    273  
    274         The mean value parameter of the Gaussian distribution.
    275  
    276     .. attribute:: sigma 
    277  
    278         The standard deviation of the distribution.
    279  
    280     .. attribute:: abs 
    281  
    282         The simulated number of instances; in effect, the Gaussian distribution
    283         density, as returned by the method :obj:`density`, is multiplied by
    284         :obj:`abs`.
    285  
    286     .. method:: __init__([mean=0, sigma=1]) 
    287  
    288         Construct an instance, set :obj:`mean` and :obj:`sigma` to the given 
    289         values and :obj:`abs` to 1. 
    290  
    291     .. method:: __init__(distribution) 
    292  
    293         Construct a distribution which approximates the given distribution,
    294         which must be either a :obj:`Continuous`, in which case its
    295         average and deviation will be used for mean and sigma, or an existing
    296         :obj:`Gaussian`, which will be copied. Attribute :obj:`abs`
    297         is set to the given distribution's ``abs``.
    298  
    299     .. method:: average() 
    300  
    301         Return :obj:`mean`. 
    302  
    303     .. method:: dev() 
    304  
    305         Return :obj:`sigma`. 
    306  
    307     .. method:: var() 
    308  
    309         Return the square of :obj:`sigma`.
    310  
    311     .. method:: density(x) 
    312  
    313         Return the density at point ``x``, that is, the Gaussian distribution 
    314         density multiplied by :obj:`abs`. 
    315  
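    A sketch of typical use: approximate an empirical continuous distribution
    (`d_age` from the example above) with a Gaussian::

        gauss = Orange.statistics.distribution.Gaussian(d_age)
        print gauss.mean, gauss.dev()    # close to d_age.average() and d_age.dev()
        print gauss.density(gauss.mean)  # density at the mean, scaled by abs
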
    316  
    317 Class distributions 
    318 =================== 
    319  
    320 There is a convenience function for computing empirical class distributions from 
    321 data. 
    322  
    323 .. function:: getClassDistribution(data[, weightID=0]) 
    324  
    325     Return a class distribution for the given data. 
    326  
    327     :param data: A set of instances. 
    328     :type data: Orange.data.Table 
    329     :param weightID: The id of the meta attribute with instance weights
    330     :type weightID: int 
    331     :rtype: :obj:`Discrete` or :obj:`Continuous`, depending on the class type 
    332  
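    For example (a sketch using the `monks-1` data set, whose class variable
    is discrete)::

        import Orange

        monks = Orange.data.Table("monks-1")
        class_dist = Orange.statistics.distribution.getClassDistribution(monks)
        print class_dist   # an instance of Discrete
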
    333 Distributions of all variables 
    334 ============================== 
    335  
    336 Distributions of all variables can be computed and stored in 
    337 :obj:`Domain`. The list-like object can be indexed by variable 
    338 indices in the domain, as well as by variables and their names. 
    339  
    340 .. class:: Domain 
    341  
    342     .. method:: __init__(data[, weightID=0]) 
    343  
    344         Construct an instance with distributions of all discrete and continuous 
    345         variables from the given data. 
    346  
    347         :param data: A set of instances.
    348         :type data: Orange.data.Table
    349         :param weightID: The id of the meta attribute with instance weights
    350         :type weightID: int
    351  
    352 The script below computes distributions for all attributes in the data and
    353 prints out distributions for discrete attributes and averages for continuous ones. ::
    354  
    355     dist = Orange.statistics.distribution.Domain(data) 
    356  
    357     for d in dist: 
    358         if d.variable.var_type == Orange.feature.Type.Discrete: 
    359              print "%30s: %s" % (d.variable.name, d) 
    360         else: 
    361              print "%30s: avg. %5.3f" % (d.variable.name, d.average()) 
    362  
    363 The distribution for, say, attribute `age` can be obtained by its index in the
    364 domain as well as by its name::
    365  
    366     dist_age = dist["age"] 
    367  
    368 """ 
    369  
    370  
    3711from Orange.core import Distribution 
    3722from Orange.core import DiscDistribution as Discrete 
  • docs/reference/rst/Orange.classification.classfromvar.rst

    r10363 r10376  
    77************************ 
    88 
    9 Classifiers from variable are used not to predict class values 
    10 but to compute variable's values from another variables. 
    11 For instance, when a continuous variable is discretized and replaced by 
    12 a discrete variable, an instance of a classifier from variable takes 
    13 care of automatic value computation when needed. 
     9:obj:`~Orange.classification.ClassifierFromVar` and 
     10:obj:`~Orange.classification.ClassifierFromVarFD` are helper 
     11classifiers used to compute a variable's value from other variables. They are used, for instance, in the discretization of continuous variables.
    1412 
    15 There are two classifiers from variable; the simpler :obj:`ClassifierFromVarFD` 
    16 supposes that example is from some fixed domain and the safer 
    17 :obj:`ClassifierFromVar` does not. 
     13:obj:`~Orange.classification.ClassifierFromVarFD` retrieves the 
     14feature value based on its position in the domain and 
     15:obj:`~Orange.classification.ClassifierFromVar` retrieves the feature 
     16with the given descriptor. 
    1817 
    19 Both classifiers can be given a transformer that can modify the value. 
    20 In discretization, for instance, the transformer is responsible to compute 
    21 a discrete interval for a continuous value of the original variable. 
     18Both classifiers can be given a function to transform the value. In 
     19discretization, for instance, the transformer computes the 
     20corresponding discrete interval for a continuous value of the original 
     21variable. 
    2222 
    2323 
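As a minimal sketch of the idea (the names below are illustrative; the
complete, bundled example is included further down)::

    import Orange

    monks = Orange.data.Table("monks-1")
    e = monks.domain["e"]

    # a new variable whose value is computed from e
    e1 = Orange.feature.Discrete("e1", values=["1", "not 1"])

    def transform_e(value):
        # map value "1" of e to index 0 of e1, anything else to index 1
        return Orange.data.Value(e1, 0 if str(value) == "1" else 1)

    e1.get_value_from = Orange.classification.ClassifierFromVar(e, transform_e)
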
    24 ClassifierFromVar 
    25 ================= 
    26  
    27 .. class:: ClassifierFromVar(which_var, transformer) 
     24.. class:: ClassifierFromVar(which_var[, transformer]) 
    2825     
    29     Compute variable's values from variable :obj:`~ClassifierFromVar.which_var` 
    30     using transformation defined by :obj:`~ClassifierFromVar.transformer`. 
    31  
     26    Return the value of the variable :obj:`~ClassifierFromVar.which_var`,
     27    transformed by :obj:`~ClassifierFromVar.transformer` if one
     28    is given.
     29  
    3230    .. attribute:: which_var 
    3331 
    34         The descriptor of the attribute whose value is to be returned. 
     32        The descriptor of the feature whose value is returned. 
    3533 
    3634    .. attribute:: transformer         
    3735 
    38         The transformer for the value. It should be a class derived from 
    39         :obj:`~Orange.data.utils.TransformValue`, but you can also use a 
    40         callback function. 
     36        The transformer for the value. It should be an instance of a class
     37        derived from :obj:`~Orange.data.utils.TransformValue` or a function
     38        written in Python.
     39 
     40    .. attribute:: transform_unknowns 
     41 
     42        Defines the treatment of missing values. 
    4143 
    4244    .. attribute:: distribution_for_unknown 
    4345 
    4446        The distribution that is returned when the 
    45         :obj:`~ClassifierFromVar.which_var`'s value is undefined. 
     47        :obj:`~ClassifierFromVar.which_var`'s value is undefined and 
     48        :obj:`~ClassifierFromVar.transform_unknowns` is ``False``. 
    4649 
    47 When given an instance, :obj:`ClassifierFromVar` will return 
    48 ``transformer(instance[which_var])``. 
    49 Attribute :obj:`~ClassifierFromVar.which_var` can be either an ordinary 
    50 variable, a meta variable or a variable which is not defined for the instance 
    51 but has :obj:`~Orange.feature.Descriptor.get_value_from` that can be used to 
    52 compute the value. If none goes through or if the value found is unknown, a 
    53 Value of subtype Distribution containing 
    54 :obj:`~ClassifierFromVar.distribution_for_unknown` is returned. 
     50    .. method:: __call__(inst[, result_type]) 
    5551 
    56 The class stores the domain version for the last example and its position in 
    57 the domain. If consecutive examples come from the same domain (which is usually 
    58 the case), :obj:`~Orange.classification.ClassifierFromVar` is just two simple 
    59 ifs slower than :obj:`~Orange.classification.ClassifierFromVarFD`. 
     52        Return ``transformer(instance[which_var])``. Here,
     53        :obj:`~ClassifierFromVar.which_var` can be either an ordinary
     54        variable, a meta variable, or a variable which is not defined
     55        for the instance but whose descriptor has a
     56        :obj:`~Orange.feature.Descriptor.get_value_from` that can be
     57        used to compute the value.
    6058 
    61 As you might have guessed, the crucial component here is the transformer. 
    62 Let us, for sake of demonstration, load a ``monks-1`` dataset and construct an 
    63 attribute ``e1`` that will have value "1", when ``e`` is "1", and "not 1" when 
    64 ``e`` is different than 1. There are many ways to do it, and that same problem 
    65 is covered in different places in Orange documentation. Although the way 
    66 presented here is not the simplest, it will serve to demonstrate how 
    67 ClassifierFromVar works. 
     59        If the feature is not found or its value is missing, the 
     60        missing value is passed to the transformer if 
     61        :obj:`~ClassifierFromVar.transform_unknowns` is 
     62        ``True``. Otherwise, 
     63        :obj:`~ClassifierFromVar.distribution_for_unknown` is 
     64        returned. 
    6865 
     66The following example demonstrates the use of the class on the Monk 1
     67dataset. It constructs a new variable `e1` that has the value `1` when
     68`e` is `1`, and `not 1` otherwise.
    6969 
    7070.. literalinclude:: code/classifier-from-var-example.py 
    7171    :lines: 1-19 
    7272 
    73 ClassifierFromVarFD 
    74 =================== 
     73 
    7574 
    7675.. class:: ClassifierFromVarFD 
    7776 
    78     :obj:`ClassifierFromVarFD` is very similar to :obj:`ClassifierFromVar` 
    79     except that the variable is not given as a descriptor (like 
    80     :obj:`~ClassifierFromVar.which_var`) but as an index. The index can be 
    81     either a position of the variable in the domain or a meta-id. Given that 
    82     :obj:`ClassifierFromVarFD` is practically no faster than 
    83     :obj:`ClassifierFromVar` (and can in future even be merged with the 
    84     latter), you should seldom need to use the class. 
     77 
     78    A class similar to 
     79    :obj:`~Orange.classification.ClassifierFromVar` except that the 
     80    variable is given by its index in the domain. The index can also 
     81    be negative to denote a meta attribute. 
     82 
     83    The only practical difference between the two classes is that this
     84    one does not compute the value of the variable from other variables
     85    through the descriptor's 
     86    :obj:`Orange.feature.Descriptor.get_value_from`. 
    8587 
    8688    .. attribute:: domain (inherited from :obj:`ClassifierFromVarFD`) 
    8789     
    88         The domain on which the classifier operates. 
     90        The domain to which the :obj:`position` applies. 
    8991 
    9092    .. attribute:: position 
     
    9294        The position of the attribute in the domain or its meta-id. 
    9395 
    94     .. attribute:: transformer 
     96    .. attribute:: transformer         
    9597 
    96         The transformer for the value. 
     98        The transformer for the value. It should be an instance of a class
     99        derived from :obj:`Orange.data.utils.TransformValue` or a function
     100        written in Python.
     101 
     102    .. attribute:: transform_unknowns 
     103 
     104        Defines the treatment of missing values. 
    97105 
    98106    .. attribute:: distribution_for_unknown 
    99107 
    100         The distribution that is returned when the which_var's value is undefined. 
     108        The distribution that is returned when the `which_var`'s value 
     109        is undefined and :obj:`transform_unknowns` is ``False``. 
    101110 
    102 When an instance is passed to :obj:`~Orange.classification.ClassifierFromVarFD`, 
    103 it is first checked whether it is from the correct domain; an exception is 
    104 raised if not. If the domain is OK, the corresponding attribute value is 
    105 retrieved, transformed and returned. 
     111    The use of this class is similar to that of  
     112    :obj:`~Orange.classification.ClassifierFromVar`. 
    106113 
    107 :obj:`ClassifierFromVarFD`'s twin brother, :obj:`ClassifierFromVar`, can also 
    108 handle variables that are not in the instances' domain or meta-variables, 
    109 but can be computed therefrom by using their 
    110 :obj:`~Orange.feature.Descriptor.get_value_from`. Since 
    111 :obj:`ClassifierFromVarFD` doesn't store attribute descriptor but only an index, 
    112 such functionality is obviously impossible. 
    113  
    114 To rewrite the above script to use :obj:`ClassifierFromVarFD`, 
    115 we need to set the domain and the ``e``'s index to position 
    116 (equivalent to setting which_var in :obj:`ClassifierFromVar`). 
    117 The initialization of :obj:`ClassifierFromVarFD` thus goes like this: 
    118  
    119 .. literalinclude:: code/classifier-from-var-example.py 
    120     :lines: 21-25 
     114    .. literalinclude:: code/classifier-from-var-example.py 
     115        :lines: 21-25 
  • docs/reference/rst/Orange.classification.majority.rst

    r9372 r10368  
    1 .. automodule:: Orange.classification.majority 
     1.. py:currentmodule:: Orange.classification.majority 
     2 
     3*********************** 
     4Majority (``majority``) 
     5*********************** 
     6 
     7.. index:: majority classifier 
     8   pair: classification; majority classifier 
     9 
     10Accuracy of classifiers is often compared to the "default accuracy", 
     11that is, the accuracy of a classifier which classifies all instances 
     12to the majority class. The training of such a classifier consists of
     13computing the class distribution and its modus. The model is represented as an instance of :obj:`Orange.classification.ConstantClassifier`. 
     14 
     15.. class:: MajorityLearner 
     16 
     17    MajorityLearner has two components, which are seldom used. 
     18 
     19    .. attribute:: estimator_constructor 
     20     
     21        An estimator constructor that can be used for estimation of 
     22        class probabilities. If left ``None``, the probability of each class is
     23        estimated as the relative frequency of instances belonging to 
     24        this class. 
     25         
     26    .. attribute:: apriori_distribution 
     27     
     28        Apriori class distribution that is passed to estimator 
     29        constructor if one is given. 
     30 
     31Example 
     32======== 
     33 
     34This "learning algorithm" will most often be used as a baseline, 
     35that is, to determine if some other learning algorithm provides 
     36any information about the class (:download:`majority-classification.py <code/majority-classification.py>`): 
     37 
     38.. literalinclude:: code/majority-classification.py 
     39    :lines: 7- 
  • docs/reference/rst/Orange.classification.rst

    r10347 r10375  
    66 
    77Induction of models in Orange is implemented through a two-class 
    8 schema. A learning algorithm is represented as an instance of a class 
     8schema. A learning algorithm is represented by an instance of a class 
    99derived from :obj:`Orange.classification.Learner`. The learner stores 
    10 all parameters of the learning algorithm. When a learner is called 
    11 with some data, it fits a model of the kind specific to the learning 
    12 algorithm and returns it as a (new) instance of a class derived 
    13 :obj:`Orange.classification.Classifier` that holds parameters of the model. 
     10all parameters of the learning algorithm. Induced models are 
     11represented by instances of classes derived from 
     12:obj:`Orange.classification.Classifier`. 
     13 
     14Therefore, to induce models from data, one first needs to construct 
     15the instance representing a learning algorithm 
     16(e.g. :obj:`~Orange.classification.tree.TreeLearner`) and set its 
     17parameters. Calling the learner with some training data returns a 
     18classifier (e.g. :obj:`~Orange.classification.tree.TreeClassifier`). The 
     19learner does not "learn" to classify but constructs classifiers. 
    1420 
    1521.. literalinclude:: code/bayes-run.py 
    1622   :lines: 7- 
    1723 
    18 Orange implements various classifiers that are described in detail on 
     24To simplify the procedure, the learner's constructor can also be given 
     25training data, in which case it fits and returns a model (an instance 
     26of :obj:`~Orange.classification.Classifier`) instead of a learner:: 
     27 
     28    classifier = Orange.classification.bayes.NaiveLearner(titanic) 
     29 
     30 
     31Orange contains a number of learning algorithms described in detail on 
    1932separate pages. 
    2033 
     
    3144   Orange.classification.lookup 
    3245   Orange.classification.classfromvar 
     46   Orange.classification.constant 
    3347    
    34 Base classes 
    35 ------------ 
    3648 
    3749 All learning algorithms and prediction models are derived from the following two classes.
     
    4153    Abstract base class for learning algorithms. 
    4254 
    43     .. method:: __call__(data) 
     55    .. method:: __call__(data[, weightID]) 
    4456 
    4557        An abstract method that fits a model and returns it as an 
    46         instance of :class:`Classifier`. 
     58        instance of :class:`Classifier`. The first argument gives the 
     59        data (as an :obj:`Orange.data.Table`) and the optional second
     60        argument gives the id of the meta attribute with instance 
     61        weights. 
    4762 
    4863 
     
    8095              :class:`~Orange.statistics.distribution.Distribution` or a 
    8196              tuple with both 
    82  
    83  
    84 Constant Classifier 
    85 ------------------- 
    86  
    87 The classification module also contains a classifier that always 
    88 predicts the same value. This classifier is constructed by different 
    89 learners such as 
    90 :obj:`~Orange.classification.majority.MajorityLearner`, and also by 
    91 some other methods. 
    92  
    93 .. class:: ConstantClassifier 
    94  
    95     Predict the specified ``default_val`` or ``default_distribution`` 
    96     for any instance. 
    97  
    98     .. attribute:: class_var 
    99  
    100         Class variable that the classifier predicts. 
    101  
    102     .. attribute:: default_val 
    103  
    104         The value returned by the classifier. 
    105  
    106     .. attribute:: default_distribution 
    107  
    108         Class probabilities returned by the classifier. 
    109      
    110     .. method:: __init__(variable, value, distribution) 
    111  
    112         Constructor can be called without arguments, with a 
    113         variable, value or both. If the value is given and is of type 
    114         :obj:`Orange.data.Value`, its attribute 
    115         :obj:`Orange.data.Value.variable` will either be used for 
    116         initializing 
    117         :obj:`~Orange.classification.ConstantClassifier.variable` or 
    118         checked against it, if :obj:`variable` is given as an 
    119         argument. 
    120          
    121         :param variable: Class variable that the classifier predicts. 
    122         :type variable: :obj:`Orange.feature.Descriptor` 
    123         :param value: Value returned by the classifier. 
    124         :type value: :obj:`Orange.data.Value` or int (index) or float 
    125         :param distribution: Class probabilities returned by the classifier. 
    126         :type distribution: :obj:`Orange.statistics.distribution.Distribution`
    127         
    128     .. method:: __call__(instance, return_type) 
    129          
    130         Return :obj:`default_val` and/or :obj:`default_distribution` 
    131         (depending upon :obj:`return_type`) disregarding the 
    132         :obj:`instance`. 
    133  
    134  
    135  
  • docs/reference/rst/Orange.classification.rules.rst

    r9372 r10370  
    1 .. automodule:: Orange.classification.rules 
     1.. py:currentmodule:: Orange.classification.rules 
     2 
     3.. index:: rule induction 
     4 
     5.. index::  
     6   single: classification; rule induction 
     7 
     8************************** 
     9Rule induction (``rules``) 
     10************************** 
     11 
     12Module ``rules`` implements supervised rule induction algorithms and 
     13rule-based classification methods. Rule induction is based on a 
     14comprehensive framework of components that can be modified or 
     15replaced. For ease of use, the module already provides multiple 
     16variations of the `CN2 induction algorithm
     17<http://www.springerlink.com/content/k6q2v76736w5039r/>`_. 
     18 
     19CN2 algorithm 
     20============= 
     21 
     22.. index::  
     23   single: classification; CN2 
     24 
     25The use of rule learning algorithms is consistent with a typical 
     26learner usage in Orange: 
     27 
     28:download:`rules-cn2.py <code/rules-cn2.py>` 
     29 
     30.. literalinclude:: code/rules-cn2.py 
     31    :lines: 7- 
     32 
     33:: 
     34     
     35    IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000> 
     36    IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000> 
     37    IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000> 
     38    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000> 
     39    IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000> 
     40    IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000> 
     41    IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000> 
     42    IF status=['second'] THEN survived=yes<13.000, 80.000> 
     43    IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
     44    IF status=['crew'] THEN survived=no<670.000, 192.000> 
     45    IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
     46    IF sex=['male'] THEN survived=no<118.000, 57.000> 
     47    IF age=['child'] THEN survived=no<17.000, 14.000> 
     48    IF TRUE THEN survived=no<89.000, 76.000> 
     49     
     50.. autoclass:: Orange.classification.rules.CN2Learner(evaluator=Evaluator_Entropy, beam_width=5, alpha=1) 
     51   :members: 
     52   :show-inheritance: 
     53   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     54      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     55    
     56.. autoclass:: Orange.classification.rules.CN2Classifier 
     57   :members: 
     58   :show-inheritance: 
     59   :exclude-members: beamWidth, resultType 
     60    
     61.. index:: unordered CN2 
     62 
     63.. index::  
     64   single: classification; unordered CN2 
     65 
     66.. autoclass:: Orange.classification.rules.CN2UnorderedLearner(evaluator=Evaluator_Laplace(), beam_width=5, alpha=1.0) 
     67   :members: 
     68   :show-inheritance: 
     69   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     70      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     71    
     72.. autoclass:: Orange.classification.rules.CN2UnorderedClassifier 
     73   :members: 
     74   :show-inheritance: 
     75    
     76.. index:: CN2-SD 
     77.. index:: subgroup discovery 
     78 
     79.. index::  
     80   single: classification; CN2-SD 
     81    
     82.. autoclass:: Orange.classification.rules.CN2SDUnorderedLearner(evaluator=WRACCEvaluator(), beam_width=5, alpha=0.05, mult=0.7) 
     83   :members: 
     84   :show-inheritance: 
     85   :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     86      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     87    
     88.. autoclass:: Orange.classification.rules.CN2EVCUnorderedLearner 
     89   :members: 
     90   :show-inheritance: 
     91    
     92 
     93.. 
     94    This part is commented out since 
     95    - there is no documentation on how to provide arguments 
     96    - the whole thing is represent original research work particular to 
     97      a specific project and belongs to an 
     98      extension rather than to the main package 
     99 
     100    Argument based CN2 
     101    ================== 
     102 
     103    Orange also supports argument-based CN2 learning. 
     104 
     105    .. autoclass:: Orange.classification.rules.ABCN2 
     106       :members: 
     107       :show-inheritance: 
     108       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     109      ruleFinder, ruleStopping, storeInstances, targetClass, weightID, 
     110      argument_id 
     111 
     112       This class has many more undocumented methods; see the source code for 
     113       reference. 
     114 
     115    .. autoclass:: Orange.classification.rules.ABCN2Ordered 
     116       :members: 
     117       :show-inheritance: 
     118 
     119    .. autoclass:: Orange.classification.rules.ABCN2M 
     120       :members: 
     121       :show-inheritance: 
     122       :exclude-members: baseRules, beamWidth, coverAndRemove, dataStopping, 
     123      ruleFinder, ruleStopping, storeInstances, targetClass, weightID 
     124 
     125    This module has many more undocumented argument-based learning
     126    related classes; see the source code for reference.
     127 
     128    References 
     129    ---------- 
     130 
     131    * Bratko, Mozina, Zabkar. `Argument-Based Machine Learning 
     132      <http://www.springerlink.com/content/f41g17t1259006k4/>`_. Lecture Notes in 
     133      Computer Science: vol. 4203/2006, 11-17, 2006. 
     134 
     135 
     136Rule induction framework 
     137======================== 
     138 
     139The classes described above are based on a more general framework that 
     140can be fine-tuned to specific needs by replacing individual components. 
     141Here is an example: 
     142 
     143part of :download:`rules-customized.py <code/rules-customized.py>` 
     144 
     145.. literalinclude:: code/rules-customized.py 
     146    :lines: 7-17 
     147 
     148:: 
     149 
     150    IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000> 
     151    IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000> 
     152    IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000> 
     153    IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000> 
     154    IF status=['second'] THEN survived=yes<13.000, 104.000> 
     155    IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000> 
     156    IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000> 
     157    IF status=['crew'] THEN survived=yes<3.000, 20.000> 
     158    IF sex=['female'] THEN survived=no<106.000, 90.000> 
     159    IF TRUE THEN survived=yes<0.000, 5.000> 
     160 
     161In the example, we wanted to use a rule evaluator based on the 
     162m-estimate and set ``m`` to 50. The evaluator is a subcomponent of the 
     163:obj:`rule_finder` component. Thus, to be able to set the evaluator, 
     164we first set the :obj:`rule_finder` component, then added the
     165desired subcomponent and set its options. All other components, which
     166are left unspecified, are provided by the learner at training time
     167and removed afterwards.
     168 
     169Continuing with the example, assume that we wish to set a different 
     170validation function and a different beam width. 
     171 
     172part of :download:`rules-customized.py <code/rules-customized.py>` 
     173 
     174.. literalinclude:: code/rules-customized.py 
     175    :lines: 19-23 
     176 
     177.. py:class:: Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1) 
     178    
     179   Represents a single rule. Constructor arguments correspond to the 
     180   first seven of the attributes (from :obj:`filter` to 
     181   :obj:`quality`) below. 
     182    
     183   .. attribute:: filter 
     184    
     185      Rule condition; an instance of 
     186      :class:`Orange.data.filter.Filter`, typically an instance of a 
     187      class derived from :class:`Orange.data.filter.Values` 
     188    
     189   .. attribute:: classifier 
     190       
     191      A rule predicts the class by calling an embedded classifier that 
     192      must be an instance of 
     193      :class:`~Orange.classification.Classifier`, typically 
     194      :class:`~Orange.classification.ConstantClassifier`. This 
     195      classifier is called by the rule classifier, such as 
     196      :obj:`RuleClassifier`. 
     197    
     198   .. attribute:: learner 
     199       
     200      Learner that is used for constructing a classifier. It must be 
     201      an instance of :class:`~Orange.classification.Learner`, 
     202      typically 
     203      :class:`~Orange.classification.majority.MajorityLearner`. 
     204    
     205   .. attribute:: class_distribution 
     206       
     207      Distribution of class in data instances covered by this rule 
     208      (:class:`~Orange.statistics.distribution.Distribution`). 
     209    
     210   .. attribute:: instances 
     211       
     212      Data instances covered by this rule (:class:`~Orange.data.Table`). 
     213    
     214   .. attribute:: weight_id 
     215    
     216      ID of the weight meta-attribute for the stored data instances 
     217      (``int``). 
     218    
     219   .. attribute:: quality 
     220       
     221      Quality of the rule. Rules with higher quality are better 
     222      (``float``). 
     223    
     224   .. attribute:: complexity 
     225    
     226      Complexity of the rule (``float``), typically the number of 
     227      selectors (conditions) in the rule. Complexity is used for 
     228      choosing between rules with the same quality; rules with lower 
     229      complexity are preferred. 
     230    
     231   .. method:: filter_and_store(instances, weight_id=0, target_class=-1) 
     232    
     233      Filter passed data instances and store them in :obj:`instances`. 
     234      Also, compute :obj:`class_distribution`, set the weight of 
     235      stored examples and create a new classifier using :obj:`learner`. 
     236       
     237      :param weight_id: ID of the weight meta-attribute. 
     238      :type weight_id: int 
     239      :param target_class: index of target class; -1 for all classes. 
     240      :type target_class: int 
     241    
     242   .. method:: __call__(instance) 
     243    
     244      Return ``True`` if the given instance matches the rule condition. 
     245       
     246      :param instance: data instance 
     247      :type instance: :class:`Orange.data.Instance` 
     248       
     249   .. method:: __call__(instances, ref=True, negate=False) 
     250 
     251      Return a table of instances that match the rule condition. 
     252       
     253      :param instances: a table of data instances 
     254      :type instances: :class:`Orange.data.Table` 
     255      :param ref: if ``True`` (default), the constructed table contains 
     256          references to the original data instances; if ``False``, the 
     257          data is copied 
     258      :type ref: bool 
     259      :param negate: inverts the selection 
     260      :type negate: bool 
     261 
     262 
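   As a sketch of use (assuming ``classifier`` is the CN2 classifier induced
   from the ``titanic`` data in the first example above)::

      rule = classifier.rules[0]
      covered = rule(titanic)   # instances matching the rule's condition
      print len(covered), Orange.classification.rules.rule_to_string(rule)
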
     263 
     264.. py:class:: Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList()) 
     265    
     266   Bases: :class:`Orange.classification.Learner` 
     267    
     268   A base rule induction learner. The algorithm follows the
     269   separate-and-conquer strategy, which has its origins in the AQ 
     270   family of algorithms (Fuernkranz J.; Separate-and-Conquer Rule 
     271   Learning, Artificial Intelligence Review 13, 3-54, 1999). Such 
     272   algorithms search for the optimal rule for the current training 
     273   set, remove the covered training instances (`separate`) and repeat 
     274   the process (`conquer`) on the remaining data. 
     275    
     276   :param store_instances: if ``True`` (default), the induced rules 
     277       contain a table with references to the stored data instances. 
     278   :type store_instances: bool 
     279     
     280   :param target_class: index of a specific class to learn; if -1 
     281        there is no target class 
     282   :type target_class: int 
     283    
     284   :param base_rules: An optional list of initial rules for constraining the :obj:`rule_finder`. 
     285   :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     286 
     287   The class' functionality is best explained by its ``__call__`` 
     288   function. 
     289    
     290   .. parsed-literal:: 
     291 
     292      def \_\_call\_\_(self, instances, weight_id=0): 
     293          rule_list = Orange.classification.rules.RuleList() 
     294          all_instances = Orange.data.Table(instances) 
     295          while not self.\ **data_stopping**\ (instances, weight_id, self.target_class): 
     296              new_rule = self.\ **rule_finder**\ (instances, weight_id, self.target_class, self.base_rules) 
     297              if self.\ **rule_stopping**\ (rule_list, new_rule, instances, weight_id): 
     298                  break 
     299              instances, weight_id = self.\ **cover_and_remove**\ (new_rule, instances, weight_id, self.target_class) 
     300              rule_list.append(new_rule) 
     301          return Orange.classification.rules.RuleClassifier_FirstRule( 
     302              rules=rule_list, instances=all_instances) 
     303        
     304   The customizable components are :obj:`data_stopping`, 
     305   :obj:`rule_finder`, :obj:`cover_and_remove` and :obj:`rule_stopping` 
     306   objects. 
     307    
     308   .. attribute:: data_stopping 
     309    
     310      An instance of 
     311      :class:`~Orange.classification.rules.RuleDataStoppingCriteria` 
     312      that determines whether to continue the induction. The default 
     313      component, 
     314      :class:`~Orange.classification.rules.RuleDataStoppingCriteria_NoPositives` 
     315      returns ``True`` if there are no more instances of the target class.  
     316    
     317   .. attribute:: rule_finder 
     318       
     319      An instance of :class:`~Orange.classification.rules.RuleFinder` 
     320      that learns a single rule. Default is 
     321      :class:`~Orange.classification.rules.RuleBeamFinder`. 
     322 
     323   .. attribute:: rule_stopping 
     324       
     325      An instance of 
     326      :class:`~Orange.classification.rules.RuleStoppingCriteria` that 
     327      decides whether to use the induced rule or to discard it and stop 
     328      the induction. If ``None`` (default) all rules are accepted. 
     329        
     330   .. attribute:: cover_and_remove 
     331        
     332      An instance of :class:`RuleCovererAndRemover` that removes 
     333      instances covered by the rule and returns remaining 
     334      instances. The default implementation 
     335      (:class:`RuleCovererAndRemover_Default`) removes the instances 
     336      that belong to given target class; if the target is not 
     337      specified (:obj:`target_class` == -1), it removes all covered 
     338      instances.     
     339 
     340 
     341Rule finders 
     342------------ 
     343 
     344.. class:: Orange.classification.rules.RuleFinder 
     345 
     346   Base class for rule finders, which learn a single rule from 
     347   instances. 
     348    
     349   .. method:: __call__(table, weight_id, target_class, base_rules) 
     350    
     351      Induce a new rule from the given data. 
     352       
     353      :param table: training data instances 
     354      :type table: :class:`Orange.data.Table` 
     355       
     356      :param weight_id: ID of the weight meta-attribute 
     357      :type weight_id: int 
     358       
     359      :param target_class: index of a specific class being learned; -1 for all. 
     360      :type target_class: int  
     361       
     362      :param base_rules: A list of initial rules for constraining the search space 
     363      :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     364 
     365 
     366.. class:: Orange.classification.rules.RuleBeamFinder 
     367    
     368   Bases: :class:`~Orange.classification.rules.RuleFinder` 
     369    
     370   Beam search for the best rule. This is the default finder for 
     371   :obj:`RuleLearner`. Pseudo code of the algorithm is as follows. 
     372 
     373   .. parsed-literal:: 
     374 
     375      def \_\_call\_\_(self, table, weight_id, target_class, base_rules): 
     376          prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id) 
     377          rules_star, best_rule = self.\ **initializer**\ (table, weight_id, target_class, base_rules, self.evaluator, prior) 
     378          \# compute quality of rules in rules_star and best_rule 
     379          ... 
     380          while len(rules_star) \> 0: 
     381              candidates, rules_star = self.\ **candidate_selector**\ (rules_star, table, weight_id) 
     382              for cand in candidates: 
     383                  new_rules = self.\ **refiner**\ (cand, table, weight_id, target_class) 
     384                  for new_rule in new_rules: 
     385                      if self.\ **rule_stopping_validator**\ (new_rule, table, weight_id, target_class, cand.class_distribution): 
     386                          new_rule.quality = self.\ **evaluator**\ (new_rule, table, weight_id, target_class, prior) 
     387                          rules_star.append(new_rule) 
     388                          if self.\ **validator**\ (new_rule, table, weight_id, target_class, prior) and 
     389                              new_rule.quality \> best_rule.quality: 
     390                              best_rule = new_rule 
     391              rules_star = self.\ **rule_filter**\ (rules_star, table, weight_id) 
     392          return best_rule 
     393           
     394   Modifiable components are shown in bold. These are: 
     395 
     396   .. attribute:: initializer 
     397    
     398      An instance of 
     399      :obj:`~Orange.classification.rules.RuleBeamInitializer` that 
     400      is used to construct the initial list of rules. The default, 
     401      :class:`~Orange.classification.rules.RuleBeamInitializer_Default`, 
     402      returns :obj:`base_rules`, or a rule with no conditions if 
     403      :obj:`base_rules` is not set. 
     404    
     405   .. attribute:: candidate_selector 
     406    
     407      An instance of 
     408      :class:`~Orange.classification.rules.RuleBeamCandidateSelector` 
     409      used to separate a subset of rules from the current 
     410      :obj:`rules_star` that will be further specialized.  The default 
     411      component, an instance of 
     412      :class:`~Orange.classification.rules.RuleBeamCandidateSelector_TakeAll`, 
     413      selects all rules. 
     414     
     415   .. attribute:: refiner 
     416    
     417      An instance of 
     418      :class:`~Orange.classification.rules.RuleBeamRefiner` that is 
     419      used for refining the rules. A refined rule should cover a strict
     420      subset of the instances covered by the given rule. The default component
     421      (:class:`~Orange.classification.rules.RuleBeamRefiner_Selector`)
     422      adds a conjunctive selector to the selectors present in the rule.
     423     
     424   .. attribute:: rule_filter 
     425    
     426      An instance of 
     427      :class:`~Orange.classification.rules.RuleBeamFilter` that is 
     428      used for filtering rules to trim the search beam. The default 
     429      component, 
     430      :class:`~Orange.classification.rules.RuleBeamFilter_Width`\ 
     431      *(m=5)*\, keeps the five best rules. 
     432 
     433   .. method:: __call__(data, weight_id, target_class, base_rules) 
     434 
     435       Determine the optimal rule to cover the given data instances.
     436 
     437       :param data: data instances. 
     438       :type data: :class:`Orange.data.Table` 
     439 
     440       :param weight_id: index of the weight meta-attribute. 
     441       :type weight_id: int 
     442 
     443       :param target_class: index of the target class. 
     444       :type target_class: int 
     445 
     446       :param base_rules: existing rules 
     447       :type base_rules: :class:`~Orange.classification.rules.RuleList` 
     448 
     449Rule evaluators 
     450--------------- 
     451 
     452.. class:: Orange.classification.rules.RuleEvaluator 
     453 
     454   Base class for rule evaluators that evaluate the quality of the 
     455   rule based on the data instances they cover. 
     456    
     457   .. method:: __call__(rule, instances, weight_id, target_class, prior) 
     458    
     459      Calculate a (non-negative) rule quality. 
     460       
     461      :param rule: rule to evaluate 
     462      :type rule: :class:`~Orange.classification.rules.Rule` 
     463       
     464      :param instances: data instances covered by the rule 
     465      :type instances: :class:`Orange.data.Table` 
     466       
     467      :param weight_id: index of the weight meta-attribute 
     468      :type weight_id: int 
     469       
     470      :param target_class: index of target class of this rule 
     471      :type target_class: int 
     472       
     473      :param prior: prior class distribution 
     474      :type prior: :class:`Orange.statistics.distribution.Distribution` 
     475 
     476.. autoclass:: Orange.classification.rules.LaplaceEvaluator 
     477   :members: 
     478   :show-inheritance: 
     479   :exclude-members: targetClass, weightID 
     480 
     481.. autoclass:: Orange.classification.rules.WRACCEvaluator 
     482   :members: 
     483   :show-inheritance: 
     484   :exclude-members: targetClass, weightID 
     485    
     486.. class:: Orange.classification.rules.RuleEvaluator_Entropy 
     487 
     488   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     489     
     490.. class:: Orange.classification.rules.RuleEvaluator_LRS 
     491 
     492   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     493 
     494.. class:: Orange.classification.rules.RuleEvaluator_Laplace 
     495 
     496   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     497 
     498.. class:: Orange.classification.rules.RuleEvaluator_mEVC 
     499 
     500   Bases: :class:`~Orange.classification.rules.RuleEvaluator` 
     501    
     502Instance covering and removal 
     503----------------------------- 
     504 
     505.. class:: RuleCovererAndRemover 
     506 
     507   Base class for rule coverers and removers that, when invoked, remove 
     508   instances covered by the rule and return the remaining instances.
     509 
     510.. autoclass:: CovererAndRemover_MultWeights 
     511 
     512.. autoclass:: CovererAndRemover_AddWeights 
     513    
     514Miscellaneous functions 
     515----------------------- 
     516 
     517.. automethod:: Orange.classification.rules.rule_to_string 
     518 
     519.. 
     520    Undocumented are: 
     521    Data-based Stopping Criteria 
     522    ---------------------------- 
     523    Rule-based Stopping Criteria 
     524    ---------------------------- 
     527 
     528References 
     529---------- 
     530 
     531* Clark, Niblett. `The CN2 Induction Algorithm 
     532  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9180>`_. Machine 
     533  Learning 3(4):261--284, 1989. 
     534* Clark, Boswell. `Rule Induction with CN2: Some Recent Improvements 
     535  <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1700>`_. In 
     536  Machine Learning - EWSL-91. Proceedings of the European Working Session on 
     537  Learning, pp 151--163, Porto, Portugal, March 1991. 
     538* Lavrac, Kavsek, Flach, Todorovski: `Subgroup Discovery with CN2-SD 
     539  <http://jmlr.csail.mit.edu/papers/volume5/lavrac04a/lavrac04a.pdf>`_. Journal 
     540  of Machine Learning Research 5: 153-188, 2004. 
     541 
  • docs/reference/rst/Orange.classification.svm.rst

    r9372 r10369  
    1 .. automodule:: Orange.classification.svm 
     1.. py:currentmodule:: Orange.classification.svm 
     2 
     3.. index:: support vector machines (SVM) 
     4.. index: 
     5   single: classification; support vector machines (SVM) 
     6    
     7********************************* 
     8Support Vector Machines (``svm``) 
     9********************************* 
     10 
     11The module for `Support Vector Machine`_ (SVM) classification is based 
     12on the popular `LibSVM`_ and `LIBLINEAR`_ libraries. It provides several 
     13learning algorithms: 
     14 
     15- :obj:`SVMLearner`, a general SVM learner; 
     16- :obj:`SVMLearnerEasy`, which is similar to the `svm-easy.py` script 
     17  from the LibSVM distribution and helps with the data normalization and
     18  parameter tuning;
     19- :obj:`LinearSVMLearner`, a fast learner useful for data sets with a large
     20  number of features.
     21           
     22SVM learners (from `LibSVM`_) 
     23============================= 
     24 
     25:class:`SVMLearner` uses the standard `LibSVM`_ learner. It supports 
     26several built-in kernel types and user-defined kernel functions
     27written in Python. The kernel type is denoted by constants ``Linear``, 
     28``Polynomial``, ``RBF``, ``Sigmoid`` and ``Custom`` defined in 
     29``Orange.classification.svm.kernels``.  A custom kernel function must 
     30accept two data instances and return a float. See 
     31:ref:`kernel-wrapper` for examples. 
     32     
     33The class also supports several types of optimization: ``C_SVC``, 
     34``Nu_SVC`` (default), ``OneClass``, ``Epsilon_SVR`` and ``Nu_SVR`` 
     35(defined in ``Orange.classification.svm.SVMLearner``). 
     36 
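As a sketch (the parameter values are arbitrary; the constants are those
named above), a learner with C-SVC optimization and an RBF kernel could be
constructed like this::

    import Orange
    from Orange.classification import svm

    learner = svm.SVMLearner(
        svm_type=svm.SVMLearner.C_SVC,   # optimization type
        kernel_type=svm.kernels.RBF,     # built-in RBF kernel
        gamma=0.25, C=1.0)
    iris = Orange.data.Table("iris")
    classifier = learner(iris)
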
     37Class :obj:`SVMLearner` works on non-sparse data and class
     38:class:`SVMLearnerSparse` works on sparse data sets, for
     39instance data in the `basket` format.
     40 
     41.. autoclass:: Orange.classification.svm.SVMLearner 
     42    :members: 
     43 
     44.. autoclass:: Orange.classification.svm.SVMLearnerSparse 
     45    :members: 
     46    :show-inheritance: 
     47     
     48.. autoclass:: Orange.classification.svm.SVMLearnerEasy 
     49    :members: 
     50    :show-inheritance: 
     51 
     52The example below compares the performance of :obj:`SVMLearnerEasy`
     53with automatic data preprocessing and parameter tuning and 
     54:obj:`SVMLearner` with the default :obj:`~SVMLearner.nu` and 
     55:obj:`~SVMLearner.gamma`: 
     56     
     57.. literalinclude:: code/svm-easy.py 
     58 
     59 
     60    
     61Linear SVM learners (from `LIBLINEAR`_) 
     62======================================= 
     63 
     64Linear SVM learners are more suitable for large scale problems since 
     65they are significantly faster than :class:`SVMLearner` and its
     66subclasses. A downside is that they support only a linear kernel and
     67cannot estimate probabilities.
     68    
     69.. autoclass:: Orange.classification.svm.LinearSVMLearner 
     70   :members: 
     71    
     72.. autoclass:: Orange.classification.svm.MultiClassSVMLearner 
     73   :members: 
     74    
     75    
     76SVM Based feature selection and scoring 
     77======================================= 
     78 
     79.. autoclass:: Orange.classification.svm.RFE 
     80 
     81.. autoclass:: Orange.classification.svm.ScoreSVMWeights 
     82    :show-inheritance: 
     83  
     84  
     85Utility functions 
     86================= 
     87 
     88.. automethod:: Orange.classification.svm.max_nu 
     89 
     90.. automethod:: Orange.classification.svm.get_linear_svm_weights 
     91 
     92.. automethod:: Orange.classification.svm.table_to_svm_format 
     93 
     94The following example shows how to get linear SVM weights: 
     95     
     96.. literalinclude:: code/svm-linear-weights.py     
     97 
     98 
     99.. _kernel-wrapper: 
     100 
     101Kernel wrappers 
     102=============== 
     103 
     104Kernel wrappers are helper classes for building custom kernels for use 
     105with :class:`SVMLearner` and its subclasses. They wrap and transform one
     106or two Python functions (stored in the attributes :obj:`wrapped` or :obj:`wrapped1`
     107and :obj:`wrapped2`). Each function must be a positive definite kernel
     108that takes two arguments of type :class:`Orange.data.Instance` and 
     109returns a float. 
     110 
     111.. autoclass:: Orange.classification.svm.kernels.KernelWrapper 
     112   :members: 
     113 
     114.. autoclass:: Orange.classification.svm.kernels.DualKernelWrapper 
     115   :members: 
     116 
     117.. autoclass:: Orange.classification.svm.kernels.RBFKernelWrapper 
     118   :members: 
     119 
     120.. autoclass:: Orange.classification.svm.kernels.PolyKernelWrapper 
     121   :members: 
     122 
     123.. autoclass:: Orange.classification.svm.kernels.AdditionKernelWrapper 
     124   :members: 
     125 
     126.. autoclass:: Orange.classification.svm.kernels.MultiplicationKernelWrapper 
     127   :members: 
     128 
     129.. autoclass:: Orange.classification.svm.kernels.CompositeKernelWrapper 
     130   :members: 
     131 
     132.. autoclass:: Orange.classification.svm.kernels.SparseLinKernel 
     133   :members: 
     134 
     135Example: 
     136 
     137.. literalinclude:: code/svm-custom-kernel.py 
     138 
     139.. _`Support Vector Machine`: http://en.wikipedia.org/wiki/Support_vector_machine 
     140.. _`LibSVM`: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
     141.. _`LIBLINEAR`: http://www.csie.ntu.edu.tw/~cjlin/liblinear/ 
  • docs/reference/rst/Orange.classification.tree.rst

    r9372 r10371  
    1 .. automodule:: Orange.classification.tree 
     1.. py:currentmodule:: Orange.classification.tree 
     2 
     3.. index:: classification tree 
     4 
     5.. index:: 
     6   single: classification; tree 
     7 
     8******************************* 
     9Classification trees (``tree``) 
     10******************************* 
     11 
     12Orange includes multiple implementations of classification tree learners: 
     13a very flexible :class:`TreeLearner`, a fast :class:`SimpleTreeLearner`, 
     14and a :class:`C45Learner`, which uses the C4.5 tree induction 
     15algorithm. 
     16 
     17The following code builds a :obj:`TreeClassifier` on the Iris data set 
     18(with the depth limited to three levels): 
     19 
     20.. literalinclude:: code/orngTree1.py 
     21   :lines: 1-4 
     22 
     23See `Decision tree learning 
     24<http://en.wikipedia.org/wiki/Decision_tree_learning>`_ on Wikipedia 
     25for an introduction to classification trees.
     26 
     27============================== 
     28Component-based Tree Inducer 
     29============================== 
     30 
     31.. autoclass:: TreeLearner 
     32    :members: 
     33 
     34.. autoclass:: TreeClassifier 
     35    :members: 
     36 
     37.. class:: Node 
     38 
     39    Classification trees are a hierarchy of :obj:`Node` classes. 
     40 
     41    Node stores the instances belonging to the node, a branch selector, 
     42    a list of branches (if the node is not a leaf) with their descriptions 
     43    and strengths, and a classifier. 
     44 
     45    .. attribute:: distribution 
     46     
     47        A distribution of learning instances. 
     48 
     49    .. attribute:: contingency 
     50 
     51        Complete contingency matrices for the learning instances. 
     52 
     53    .. attribute:: instances, weightID 
     54 
     55        Learning instances and the ID of weight meta attribute. The root 
     56        of the tree actually stores all instances, while other nodes 
      57        store only references to instances in the root node. 
     58 
     59    .. attribute:: node_classifier 
     60 
     61        A classifier for instances coming to the node. If the node is a 
     62        leaf, it chooses the class (or class distribution) of an instance. 
     63 
     64    Internal nodes have additional attributes. The lists :obj:`branches`, 
     65    :obj:`branch_descriptions` and :obj:`branch_sizes` are of the 
     66    same length. 
     67 
     68    .. attribute:: branches 
     69 
     70        A list of subtrees. Each element is a :obj:`Node` or None. 
     71        If None, the node is empty. 
     72 
     73    .. attribute:: branch_descriptions 
     74 
     75        A list with strings describing branches. They are constructed 
     76        by :obj:`SplitConstructor`. A string can contain anything, 
     77        for example 'red' or '>12.3'. 
     78 
     79    .. attribute:: branch_sizes 
     80 
      81        The (weighted) number of training instances for  
     82        each branch. It can be used, for instance, for modeling 
     83        probabilities when classifying instances with unknown values. 
     84 
     85    .. attribute:: branch_selector 
     86 
     87        A :obj:`~Orange.classification.Classifier` that returns a branch 
     88        for each instance (as 
     89        :obj:`Orange.data.Value` in ``[0, len(branches)-1]``).  When an 
     90        instance cannot be classified unambiguously, the selector can 
     91        return a discrete distribution, which proposes how to divide 
     92        the instance between the branches. Whether the proposition will 
     93        be used depends upon the :obj:`Splitter` (for learning) 
     94        or :obj:`Descender` (for classification). 
     95 
     96    .. method:: tree_size() 
     97         
     98        Return the number of nodes in the subtrees (including the node, 
     99        excluding null-nodes). 
     100 
     101-------- 
     102Examples 
     103-------- 
     104 
     105Tree Structure 
     106============== 
     107 
     108This example works with the lenses data set: 
     109 
     110    >>> import Orange 
     111    >>> lenses = Orange.data.Table("lenses") 
     112    >>> tree_classifier = Orange.classification.tree.TreeLearner(lenses) 
     113 
     114The following function counts the number of nodes in a tree: 
     115 
     116    >>> def tree_size(node): 
     117    ...    if not node: 
     118    ...        return 0 
     119    ...    size = 1 
     120    ...    if node.branch_selector: 
     121    ...        for branch in node.branches: 
     122    ...            size += tree_size(branch) 
     123    ...    return size 
     124 
      125If node is None, the function above returns 0. Otherwise, the size is 1 
      126(this node) plus the sizes of all subtrees. The function needs to check 
      127whether a node is internal (it has a :obj:`~Node.branch_selector`), as 
      128leaves don't have the :obj:`~Node.branches` attribute. 
     129 
     130    >>> tree_size(tree_classifier.tree) 
     131    15 
     132 
     133Note that a :obj:`Node` already has a built-in method 
     134:func:`~Node.tree_size`. 
     135 
     136Trees can be printed with a simple recursive function: 
     137 
     138    >>> def print_tree0(node, level): 
     139    ...     if not node: 
     140    ...         print " "*level + "<null node>" 
     141    ...         return 
     142    ...     if node.branch_selector: 
     143    ...         node_desc = node.branch_selector.class_var.name 
     144    ...         node_cont = node.distribution 
     145    ...         print "\\n" + "   "*level + "%s (%s)" % (node_desc, node_cont), 
     146    ...         for i in range(len(node.branches)): 
     147    ...             print "\\n" + "   "*level + ": %s" % node.branch_descriptions[i], 
     148    ...             print_tree0(node.branches[i], level+1) 
     149    ...     else: 
     150    ...         node_cont = node.distribution 
     151    ...         major_class = node.node_classifier.default_value 
     152    ...         print "--> %s (%s) " % (major_class, node_cont), 
     153 
      154The crux of the example is not the formatting (the \\n's etc.) 
      155but the traversal logic around the print statements. The code 
      156handles three node types separately: 
     157 
     158* For null nodes (a node to which no learning instances were classified), 
     159  it just prints "<null node>". 
      160* For internal nodes, it prints the node description: 
      161  the feature's name and the distribution of classes. :obj:`Node`'s 
      162  branch selector is a :obj:`~Orange.classification.Classifier`, 
      163  and its ``class_var`` is the feature whose name is printed.  Class 
      164  distributions are printed as well (they are assumed to be stored). 
      165  :obj:`print_tree0` is then called recursively for each branch, with 
      166  the level increased by 1 to deepen the indent. 
     167* If the node is a leaf, it prints the distribution of learning instances 
     168  in the node and the class to which the instances in the node would 
     169  be classified. We assume that the :obj:`~Node.node_classifier` is a 
     170  :obj:`DefaultClassifier`. A better print function should be aware of 
     171  possible alternatives. 
     172 
     173The wrapper function that accepts either a 
     174:obj:`TreeClassifier` or a :obj:`Node` can be written as follows: 
     175 
     176    >>> def print_tree(x): 
     177    ...     if isinstance(x, Orange.classification.tree.TreeClassifier): 
     178    ...         print_tree0(x.tree, 0) 
     179    ...     elif isinstance(x, Orange.classification.tree.Node): 
     180    ...         print_tree0(x, 0) 
     181    ...     else: 
     182    ...         raise TypeError, "invalid parameter" 
     183 
      184It's straightforward: if ``x`` is a 
      185:obj:`TreeClassifier`, it prints ``x.tree``; if it's a :obj:`Node`, it 
      186prints ``x``. If it's of some other type, 
      187an exception is raised. The output: 
     188 
     189    >>> print_tree(tree_classifier) 
     190    <BLANKLINE> 
     191    tear_rate (<15.000, 4.000, 5.000>)  
     192    : normal  
     193       astigmatic (<3.000, 4.000, 5.000>)  
     194       : no  
     195          age (<1.000, 0.000, 5.000>)  
     196          : pre-presbyopic --> soft (<0.000, 0.000, 2.000>)   
     197          : presbyopic  
     198             prescription (<1.000, 0.000, 1.000>)  
     199             : hypermetrope --> soft (<0.000, 0.000, 1.000>)   
     200             : myope --> none (<1.000, 0.000, 0.000>)   
     201          : young --> soft (<0.000, 0.000, 2.000>)   
     202       : yes  
     203          prescription (<2.000, 4.000, 0.000>)  
     204          : hypermetrope  
     205             age (<2.000, 1.000, 0.000>)  
     206             : pre-presbyopic --> none (<1.000, 0.000, 0.000>)   
     207             : presbyopic --> none (<1.000, 0.000, 0.000>)   
     208             : young --> hard (<0.000, 1.000, 0.000>)   
     209          : myope --> hard (<0.000, 3.000, 0.000>)   
     210    : reduced --> none (<12.000, 0.000, 0.000>)  
     211 
     212The tree structure examples conclude with a simple pruning function, 
     213written entirely in Python and unrelated to any :class:`Pruner`. It limits 
     214the tree depth (the number of internal nodes on any path down the tree). 
      215For example, to get a two-level tree, call ``cut_tree(root, 2)``. The function 
      216is recursive, with the second argument (level) decreasing at each call; 
      217when it reaches zero, the current node is made a leaf: 
     218 
     219    >>> def cut_tree(node, level): 
     220    ...     if node and node.branch_selector: 
     221    ...         if level: 
     222    ...             for branch in node.branches: 
     223    ...                 cut_tree(branch, level-1) 
     224    ...         else: 
     225    ...             node.branch_selector = None 
     226    ...             node.branches = None 
     227    ...             node.branch_descriptions = None 
     228 
      229The function acts only when :obj:`node` and :obj:`node.branch_selector` 
      230are defined. If the level is not zero, it recursively calls itself 
      231for each branch. Otherwise, it clears the selector, branches and branch 
      232descriptions. 
     233 
     234    >>> cut_tree(tree_classifier.tree, 2) 
     235    >>> print_tree(tree_classifier) 
     236    <BLANKLINE> 
     237    tear_rate (<15.000, 4.000, 5.000>)  
     238    : normal  
     239       astigmatic (<3.000, 4.000, 5.000>)  
     240       : no --> soft (<1.000, 0.000, 5.000>)   
     241       : yes --> hard (<2.000, 4.000, 0.000>)   
     242    : reduced --> none (<12.000, 0.000, 0.000>)  
     243 
     244Setting learning parameters 
     245=========================== 
     246 
     247Let us construct a :obj:`TreeLearner` to play with: 
     248 
     249    >>> import Orange 
     250    >>> lenses = Orange.data.Table("lenses") 
     251    >>> learner = Orange.classification.tree.TreeLearner() 
     252 
     253There are three crucial components in learning: the 
     254:obj:`~TreeLearner.split` and :obj:`~TreeLearner.stop` criteria, and the 
      255instance :obj:`~TreeLearner.splitter`. The default ``stop`` is set with: 
     256 
     257    >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
     258 
     259The default stopping parameters are: 
     260 
     261    >>> print learner.stop.max_majority, learner.stop.min_examples 
     262    1.0 0.0 
     263 
     264The defaults only stop splitting when no instances are left or all of 
     265them are in the same class. 
     266 
     267If the minimal subset that is allowed to be split further is set to five 
     268instances, the resulting tree is smaller. 
     269 
     270    >>> learner.stop.min_examples = 5.0 
     271    >>> tree = learner(lenses) 
     272    >>> print tree 
     273    tear_rate=reduced: none (100.00%) 
     274    tear_rate=normal 
     275    |    astigmatic=no 
     276    |    |    age=pre-presbyopic: soft (100.00%) 
     277    |    |    age=presbyopic: none (50.00%) 
     278    |    |    age=young: soft (100.00%) 
     279    |    astigmatic=yes 
     280    |    |    prescription=hypermetrope: none (66.67%) 
     281    |    |    prescription=myope: hard (100.00%) 
     282    <BLANKLINE> 
     283 
      284We can also limit the maximal proportion of the majority class. 
     285 
     286    >>> learner.stop.max_majority = 0.5 
     287    >>> tree = learner(lenses) 
     288    >>> print tree 
     289    none (62.50%) 
     290 
     291Redefining tree induction components 
     292==================================== 
     293 
      294This example shows how to use a custom stop function.  First, the 
      295``def_stop`` function defines the default stop function. The first tree 
      296has some added randomness; the induction also stops in 20% of the 
      297cases when ``def_stop`` returns False. The stopping criterion for the 
      298second tree is completely random: it stops induction in 20% of cases. 
      299Note that the lambda function in the second case still has three parameters, 
      300even though it does not need any, since that is the signature 
      301required for :obj:`~TreeLearner.stop`. 
     302 
     303.. literalinclude:: code/tree3.py 
     304   :lines: 8-23 
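
A minimal sketch of the first variant (assuming the lenses data;
``random.random() < 0.2`` stands in for the 20% chance)::

    import random
    import Orange

    lenses = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner()
    def_stop = Orange.classification.tree.StopCriteria()
    # stop when the default criterion fires, plus in 20% of other cases
    learner.stop = lambda instances, weight_id, contingency: \
        def_stop(instances, weight_id, contingency) or random.random() < 0.2
    tree = learner(lenses)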
     305 
     306--------------------------------- 
     307Learner and Classifier Components 
     308--------------------------------- 
     309 
     310Split constructors 
     311===================== 
     312 
     313.. class:: SplitConstructor 
     314 
      315    Decide how to divide learning instances, i.e. define the branching criteria. 
     316     
      317    A :obj:`SplitConstructor` should use the domain 
      318    contingency when possible, both for speed and adaptability.  
      319    Sometimes the domain contingency does 
      320    not suffice, for example when the ReliefF score is used. 
     321 
     322    A :obj:`SplitConstructor` can veto further tree induction by returning 
     323    no classifier. This is generally related to the number of learning 
     324    instances that would go in each branch. If there are no splits with 
     325    more than :obj:`SplitConstructor.min_subset` instances in the branches 
     326    (null nodes are allowed), the induction is stopped. 
     327 
     328    Split constructors that cannot handle a particular feature 
     329    type (discrete, continuous) quietly skip them. When in doubt, use 
     330    :obj:`SplitConstructor_Combined`, which delegates features to 
     331    specialized split constructors. 
     332 
     333    The same split constructors can be used both for classification and 
     334    regression, if the chosen score (for :obj:`SplitConstructor_Score` 
     335    and derived classes) supports both. 
     336 
     337    .. attribute:: min_subset 
     338 
      339        The minimal (weighted) number of instances in non-null leaves. 
     340 
     341    .. method:: __call__(data, [ weightID, contingency, apriori_distribution, candidates, clsfr])  
     342 
     343        :param data: in any acceptable form. 
     344        :param weightID: Optional; the default of 0 means that all 
     345            instances have a weight of 1.0.  
     346        :param contingency: a domain contingency 
     347        :param apriori_distribution: apriori class probabilities. 
     348        :type apriori_distribution: :obj:`Orange.statistics.distribution.Distribution` 
     349        :param candidates: only consider these  
     350            features (one boolean for each feature). 
     351        :param clsfr: a node classifier (if it was constructed, that is,  
     352            if :obj:`store_node_classifier` is True)  
     353 
     354        Construct a split. Return a tuple (:obj:`branch_selector`, 
     355        :obj:`branch_descriptions` (a list), :obj:`subset_sizes` 
     356        (the number of instances for each branch, may also be 
     357        empty), :obj:`quality` (higher numbers mean better splits), 
     358        :obj:`spent_feature`). If no split is constructed, 
     359        the :obj:`selector`, :obj:`branch_descriptions` and 
     360        :obj:`subset_sizes` are None, while :obj:`quality` is 0.0 and 
     361        :obj:`spent_feature` is -1. 
     362 
     363        If the chosen feature will be useless in the future and 
     364        should not be considered for splitting in any of the subtrees 
     365        (typically, when discrete features are used as-they-are, without 
     366        any binarization or subsetting), then it should return the index 
     367        of this feature as :obj:`spent_feature`. If no features are spent, 
     368        :obj:`spent_feature` is -1. 
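
    To make the returned tuple concrete, a split constructor can also be
    called directly; a minimal sketch (the
    :class:`~Orange.feature.scoring.GainRatio` score and the lenses data
    are just examples)::

        import Orange

        lenses = Orange.data.Table("lenses")
        split = Orange.classification.tree.SplitConstructor_Feature(
            measure=Orange.feature.scoring.GainRatio())
        selector, descriptions, sizes, quality, spent = split(lenses)
        print descriptions  # the values of the feature chosen for the split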
     369 
     370.. class:: SplitConstructor_Score 
     371 
     372    Bases: :class:`SplitConstructor` 
     373 
      374    An abstract base class that compares splits 
     375    with a :class:`Orange.feature.scoring.Score`.  All split 
     376    constructors except for :obj:`SplitConstructor_Combined` are derived 
     377    from this class. 
     378 
     379    .. attribute:: measure 
     380 
     381        A :class:`Orange.feature.scoring.Score` for split evaluation. It 
     382        has to handle the class type - for example, you cannot use 
     383        :class:`~Orange.feature.scoring.GainRatio` for regression or 
     384        :class:`~Orange.feature.scoring.MSE` for classification. 
     385 
     386    .. attribute:: worst_acceptable 
     387 
      388        The lowest allowed split quality.  The value strongly depends 
      389        on the chosen :obj:`measure` component. Default is 0.0. 
     390 
     391.. class:: SplitConstructor_Feature 
     392 
     393    Bases: :class:`SplitConstructor_Score` 
     394 
     395    Each value of a discrete feature corresponds to a branch.  The feature 
     396    with the highest score (:obj:`~Measure.measure`) is selected. When 
     397    tied, a random feature is selected. 
     398 
      399    The constructed :obj:`branch_selector` is an instance of 
      400    :obj:`orange.ClassifierFromVarFD` that returns a value of the selected 
      401    feature. :obj:`branch_descriptions` contains the feature's 
      402    values. The feature is marked as spent (it cannot reappear in the 
      403    node's subtrees). 
     404 
     405.. class:: SplitConstructor_ExhaustiveBinary 
     406 
     407    Bases: :class:`SplitConstructor_Score` 
     408 
     409    Finds the binarization with the highest score among all features. In 
     410    case of ties, a random feature is selected. 
     411 
      412    The constructed :obj:`branch_selector` is an instance of 
      413    :obj:`orange.ClassifierFromVarFD` that returns a value of the 
     414    selected feature. Its :obj:`transformer` contains a ``MapIntValue`` 
     415    that maps values of the feature into a binary feature. Branches 
     416    with a single feature value are described with that value and 
     417    branches with more than one are described with ``[<val1>, <val2>, 
     418    ..., <valn>]``. Only binary features are marked as spent. 
     419 
     420.. class:: SplitConstructor_Threshold 
     421 
     422    Bases: :class:`SplitConstructor_Score` 
     423 
      424    The only split constructor for continuous features. It divides the 
      425    range of feature values with a threshold that maximizes the split's 
      426    quality; the feature that yields the best binary split is returned. 
      427    In case of ties, a random feature is selected. 
     428 
     429    The constructed :obj:`branch_selector` is an instance of 
     430    :obj:`orange.ClassifierFromVarFD` with an attached :obj:`transformer`, 
     431    of type :obj:`Orange.feature.discretization.ThresholdDiscretizer`. The 
     432    branch descriptions are "<threshold" and ">=threshold". The feature 
     433    is not spent. 
     434 
     435.. class:: SplitConstructor_OneAgainstOthers 
     436     
     437    Bases: :class:`SplitConstructor_Score` 
     438 
     439    Undocumented. 
     440 
     441.. class:: SplitConstructor_Combined 
     442 
     443    Bases: :class:`SplitConstructor` 
     444 
     445    Uses different split constructors for discrete and continuous 
     446    features. Each split constructor is called with appropriate 
     447    features. Both construct a candidate for a split; the better of them 
     448    is used. 
     449 
     450    The choice of the split is not probabilistically fair, when 
     451    multiple candidates have the same score. For example, if there 
      452    are nine discrete features with the highest score, the split 
     453    constructor for discrete features will select one of them. Now, 
     454    if there is also a single continuous feature with the same score, 
     455    :obj:`SplitConstructor_Combined` would randomly select between the 
     456    proposed discrete feature and continuous feature, unaware that the 
     457    discrete feature  has already competed with eight others.  So, 
     458    the probability for selecting (each) discrete feature would be 
     459    1/18 instead of 1/10. Although incorrect, this should not affect 
     460    the performance. 
     461 
      462    .. attribute:: discrete_split_constructor 
     463 
     464        Split constructor for discrete features;  
     465        for instance, :obj:`SplitConstructor_Feature` or 
     466        :obj:`SplitConstructor_ExhaustiveBinary`. 
     467 
      468    .. attribute:: continuous_split_constructor 
      469 
      470        Split constructor for continuous features; it  
      471        can be either :obj:`SplitConstructor_Threshold` or 
      472        a custom split constructor. 
     473 
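A combined constructor can be assembled like this (a sketch; the
:class:`~Orange.feature.scoring.GainRatio` score is just an example)::

    import Orange
    tree = Orange.classification.tree

    split = tree.SplitConstructor_Combined()
    split.discrete_split_constructor = tree.SplitConstructor_ExhaustiveBinary(
        measure=Orange.feature.scoring.GainRatio())
    split.continuous_split_constructor = tree.SplitConstructor_Threshold(
        measure=Orange.feature.scoring.GainRatio())
    learner = tree.TreeLearner(split=split)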
     474 
     475StopCriteria and StopCriteria_common 
     476============================================ 
     477 
     478:obj:`StopCriteria` determines when to stop the induction of subtrees.  
     479 
     480.. class:: StopCriteria 
     481 
     482    Provides the basic stopping criteria: the tree induction stops 
     483    when there is at most one instance left (the actual, not weighted, 
     484    number). The induction also stops when all instances are in the 
     485    same class (for discrete problems) or have the same outcome value 
     486    (for regression problems). 
     487 
     488    .. method:: __call__(instances[, weightID, domain contingencies]) 
     489 
      490        Return True (stop) or False (continue the induction). 
     491        Contingencies are not used for counting. Derived classes should 
     492        use the contingencies whenever possible. 
     493 
     494.. class:: StopCriteria_common 
     495 
     496    Pre-pruning with additional criteria. 
     497 
     498    .. attribute:: max_majority 
     499 
     500        Maximum proportion of majority class. When exceeded, 
     501        induction stops. 
     502 
     503    .. attribute:: min_instances 
     504 
      505        Minimum number of instances for splitting. Subsets with fewer 
      506        than :obj:`min_instances` instances are not split further. 
      507        The instance count is weighted. 
     508 
     509 
     510Splitters 
     511================= 
     512 
     513Splitters sort learning instances into branches (the branches are selected 
     514with a :obj:`SplitConstructor`, while a :obj:`Descender` decides the 
     515branch for an instance during classification). 
     516 
     517Most splitters call :obj:`Node.branch_selector` and assign 
     518instances correspondingly. When the value is unknown they choose a 
     519particular branch or skip the instance. 
     520 
      521Some splitters can also split instances: a weighted instance is  
      522used in more than one subset. Each branch has a weight ID (usually 
      523its own) and all instances in that branch should have this meta attribute.  
      524 
      525An instance that  
      526hasn't been split has only one additional attribute (the weight 
      527ID corresponding to the subset to which it went). An instance that is split 
      528between, say, three subsets, has three new meta attributes, one for each 
      529subset. The weights are used only when needed; when there is no 
      530splitting, no weight IDs are returned. 
     531 
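A splitter is simply assigned to the learner; for example (a sketch,
using one of the splitters documented below)::

    import Orange

    learner = Orange.classification.tree.TreeLearner()
    # send instances with unknown values into all branches,
    # weighted by the branch sizes
    learner.splitter = Orange.classification.tree.Splitter_UnknownsAsBranchSizes()
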
     532.. class:: Splitter 
     533 
     534    An abstract base class that splits instances 
     535    into subsets. 
     536 
     537    .. method:: __call__(node, instances[, weightID]) 
     538 
     539        :param node: a node. 
     540        :type node: :obj:`Node` 
     541        :param instances: a set of instances 
     542        :param weightID: weight ID.  
     543         
     544        Use the information in :obj:`Node` (particularly the 
     545        :obj:`~Node.branch_selector`) to split the given set of instances into 
     546        subsets.  Return a tuple with a list of instance subsets and 
     547        a list of weights.  The list of weights is either a 
     548        list of integers or None when no weights are added. 
     549 
     550.. class:: Splitter_IgnoreUnknowns 
     551 
     552    Bases: :class:`Splitter` 
     553 
     554    Ignores the instances for which no single branch can be determined. 
     555 
     556.. class:: Splitter_UnknownsToCommon 
     557 
     558    Bases: :class:`Splitter` 
     559 
      560    Places all ambiguous instances into the branch with the highest number of 
      561    instances. If there is more than one such branch, one is selected at 
      562    random and then used for all instances. 
     563 
     564.. class:: Splitter_UnknownsToAll 
     565 
     566    Bases: :class:`Splitter` 
     567 
     568    Splits instances with an unknown value of the feature into all branches. 
     569 
     570.. class:: Splitter_UnknownsToRandom 
     571 
     572    Bases: :class:`Splitter` 
     573 
     574    Selects a random branch for ambiguous instances. 
     575 
     576.. class:: Splitter_UnknownsToBranch 
     577 
     578    Bases: :class:`Splitter` 
     579 
     580    Constructs an additional branch for ambiguous instances.  
     581    The branch's description is "unknown". 
     582 
     583.. class:: Splitter_UnknownsAsBranchSizes 
     584 
     585    Bases: :class:`Splitter` 
     586 
     587    Splits instances with unknown value of the feature according to 
     588    proportions of instances in each branch. 
     589 
     590.. class:: Splitter_UnknownsAsSelector 
     591 
     592    Bases: :class:`Splitter` 
     593 
     594    Splits instances with unknown value of the feature according to 
     595    distribution proposed by selector (usually the same as proportions 
     596    of instances in branches). 
     597 
     598Descenders 
     599============================= 
     600 
      601Descenders decide where instances that cannot be unambiguously put 
      602into a single branch should go during classification (the branches are selected 
      603with a :obj:`SplitConstructor`, while a :obj:`Splitter` sorts instances 
      604during learning). 
     605 
     606.. class:: Descender 
     607 
      608    An abstract base tree descender. It descends 
      609    an instance as deep as possible, according to the values 
      610    of the instance's features. The :obj:`Descender` calls the node's 
      611    :obj:`~Node.branch_selector` to get the branch index. If it's a 
     612    simple index, the corresponding branch is followed. If not, the 
     613    descender decides what to do. A descender can choose a single 
     614    branch (for instance, the one that is the most recommended by the 
     615    :obj:`~Node.branch_selector`) or it can let the branches vote. 
     616 
      617    There are three possible outcomes of a descent: 
     618 
     619    #. The descender reaches a leaf. This happens when 
     620       there were no unknown or out-of-range values, or when the 
      621       descender selected a single branch and continued the descent 
     622       despite them. The descender returns the :obj:`Node` it has reached. 
     623    #. Node's :obj:`~Node.branch_selector` returned a distribution and 
      624       :obj:`Descender` decided to stop the descent at this (internal) 
     625       node. It returns the current :obj:`Node`. 
     626    #. Node's :obj:`~Node.branch_selector` returned a distribution and the 
     627       :obj:`Node` wants to split the instance (i.e., to decide the class 
     628       by voting). It returns a :obj:`Node` and the vote-weights for 
     629       the branches.  The weights can correspond, for example,  to the 
     630       distribution returned by node's :obj:`~Node.branch_selector`, or to 
     631       the number of learning instances that were assigned to each branch. 
     632 
     633    .. method:: __call__(node, instance) 
     634 
     635        Descends until it reaches a leaf or a node in 
     636        which a vote of subtrees is required. A tuple 
     637        of two elements is returned. If it reached a leaf, the tuple contains 
     638        the leaf node and None. If not, it contains a node and 
     639        a list of floats (weights of votes). 
     640 
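A descender is likewise assigned to the learner, which passes it on to
the classifiers it constructs; a sketch (using one of the descenders
documented below)::

    import Orange

    learner = Orange.classification.tree.TreeLearner()
    # let subtrees vote on ambiguous instances, weighted by branch sizes
    learner.descender = Orange.classification.tree.Descender_UnknownMergeAsBranchSizes()
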
     641.. class:: Descender_UnknownToNode 
     642 
     643    Bases: :obj:`Descender` 
     644 
      645    When an instance cannot be classified into a single branch, the current 
     646    node is returned. Thus, the node's :obj:`~Node.node_classifier` 
     647    will be used to make a decision. Therefore, internal nodes 
     648    need to have :obj:`Node.node_classifier` defined. 
     649 
     650.. class:: Descender_UnknownToBranch 
     651 
     652    Bases: :obj:`Descender` 
     653 
     654    Classifies instances with unknown value to a special branch. This 
     655    makes sense only if the tree itself was constructed with 
     656    :obj:`Splitter_UnknownsToBranch`. 
     657 
     658.. class:: Descender_UnknownToCommonBranch 
     659 
     660    Bases: :obj:`Descender` 
     661 
     662    Classifies instances with unknown values to the branch with the 
     663    highest number of instances. If there is more than one such branch, 
      664    a random branch is chosen for each instance. 
     665 
     666.. class:: Descender_UnknownToCommonSelector 
     667 
     668    Bases: :obj:`Descender` 
     669 
     670    Classifies instances with unknown values to the branch which received 
     671    the highest recommendation by the selector. 
     672 
     673.. class:: Descender_UnknownMergeAsBranchSizes 
     674 
     675    Bases: :obj:`Descender` 
     676 
     677    The subtrees vote for the instance's class; the vote is weighted 
     678    according to the sizes of the branches. 
     679 
     680.. class:: Descender_UnknownMergeAsSelector 
     681 
     682    Bases: :obj:`Descender` 
     683 
     684    The subtrees vote for the instance's class; the vote is weighted 
      685    according to the selector's proposal. 
     686 
     687Pruning 
     688======= 
     689 
     690.. index:: 
     691    pair: classification trees; pruning 
     692 
      693The pruners construct a shallow copy of a tree. The pruned tree's 
      694:obj:`Node` objects contain references to the same contingency matrices, 
      695node classifiers, branch selectors, ...  as the original tree. 
     696 
     697Pruners cannot construct a new :obj:`~Node.node_classifier`.  Thus, for 
     698pruning, internal nodes must have :obj:`~Node.node_classifier` defined 
     699(the default). 
     700 
     701.. class:: Pruner 
     702 
     703    An abstract base tree pruner. 
     704 
     705    .. method:: __call__(tree) 
     706 
      707        :param tree: either 
      708            a :obj:`Node` or the C++ version of the classifier, 
      709            saved in :obj:`TreeClassifier.base_classifier`. 
     710 
     711        The resulting pruned tree is of the same type as the argument. 
     712        The original tree remains intact. 
     713 
     714.. class:: Pruner_SameMajority 
     715 
     716    Bases: :class:`Pruner` 
     717 
      718    A tree can have subtrees in which all the leaves have 
      719    the same majority class. This is allowed because leaves can still 
     720    have different class distributions and thus predict different 
     721    probabilities.  The :obj:`Pruner_SameMajority` prunes the tree so 
     722    that there is no subtree in which all the nodes would have the same 
     723    majority class. 
     724 
     725    This pruner will only prune the nodes in which the node classifier 
     726    is a :obj:`~Orange.classification.ConstantClassifier` 
     727    (or a derived class). 
     728 
     729    The pruning works from leaves to the root. 
      730    If siblings have (at least one) common majority class, they can be pruned. 
     731 
     732.. class:: Pruner_m 
     733 
     734    Bases: :class:`Pruner` 
     735 
     736    Prunes a tree by comparing m-estimates of static and dynamic  
     737    error as defined in (Bratko, 2002). 
     738 
     739    .. attribute:: m 
     740 
     741        Parameter m for m-estimation. 
     742 
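Pruning a built tree could look like this minimal sketch (the value
``m=2.0`` is arbitrary; the lenses data is just an example)::

    import Orange

    lenses = Orange.data.Table("lenses")
    tree = Orange.classification.tree.TreeLearner(lenses)
    pruner = Orange.classification.tree.Pruner_m(m=2.0)
    pruned = pruner(tree.tree)  # a new root Node; the original is intact
    print pruned.tree_size()
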
     743Printing the tree 
     744================= 
     745 
     746The tree printing functions are very flexible. They can print, for 
     747example, numbers of instances, proportions of majority class in nodes 
     748and similar, or more complex statistics like the proportion of instances 
     749in a particular class divided by the proportion of instances of this 
     750class in a parent node. Users may also pass their own functions to print 
     751certain elements. 
     752 
      753The easiest way to print the tree is to print the :obj:`TreeClassifier` itself:: 
     754 
     755    >>> print tree 
     756    petal width<0.800: Iris-setosa (100.00%) 
     757    petal width>=0.800 
     758    |    petal width<1.750 
     759    |    |    petal length<5.350: Iris-versicolor (94.23%) 
     760    |    |    petal length>=5.350: Iris-virginica (100.00%) 
     761    |    petal width>=1.750 
     762    |    |    petal length<4.850: Iris-virginica (66.67%) 
     763    |    |    petal length>=4.850: Iris-virginica (100.00%) 
     764 
     765 
     766Format string 
     767------------- 
     768 
      769Format strings are printed at every leaf or internal node with certain 
      770format specifiers replaced by data from the tree node. Specifiers are 
      771generally of the form **%[^]<precision><quantity><divisor>**. 
     772 
      773**^** at the start indicates that the number should be multiplied by 100, 
      774which is useful for printing proportions as percentages. 
     775 
     776**<precision>** is in the same format as in Python (or C) string 
     777formatting. For instance, ``%N`` denotes the number of instances in 
     778the node, hence ``%6.2N`` would mean output to two decimal digits 
     779and six places altogether. If left out, a default format ``5.3`` is 
     780used, unless the numbers are multiplied by 100, in which case the default 
     781is ``.0`` (no decimals, the number is rounded to the nearest integer). 
     782 
     783**<divisor>** tells what to divide the quantity in that node with. 
     784``bP`` means division by the same quantity in the parent node; for instance, 
     785``%NbP`` gives the number of instances in the node divided by the 
     786number of instances in parent node. Precision formatting can be added, 
     787e.g. ``%6.2NbP``. ``bA`` denotes division by the same quantity over the entire 
     788data set, so ``%NbA`` will tell you the proportion of instances (out 
     789of the entire training data set) that fell into that node. If division is 
     790impossible since the parent node does not exist or some data is missing, 
     791a dot is printed out instead. 
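
For instance, a sketch combining these elements (assuming ``tree`` is a
built classification tree)::

    print tree.to_string(leaf_str="%V (%^NbA% of all instances)")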
     792 
     793**<quantity>** defines what to print and is the only required element.  
     794It can be: 
     795 
     796``V`` 
     797    The predicted value at that node. Precision  
     798    or divisor can not be defined here. 
     799 
     800``N`` 
     801    The number of instances in the node. 
     802 
     803``M`` 
     804    The number of instances in the majority class (that is, the class  
     805    predicted by the node). 
     806 
     807``m`` 
     808    The proportion of instances in the majority class. 
     809 
     810``A`` 
      811    The average class for instances in the node; this is available only for  
     812    regression trees. 
     813 
     814``E`` 
     815    Standard error for class of instances in the node; available only for 
     816    regression trees. 
     817 
     818``I`` 
     819    Print out the confidence interval. The modifier is used as  
      820    ``%I(95)`` or (more complicated) ``%5.3I(95)bP``. 
     821 
     822``C`` 
      823    The number of instances in the given class.  For a classification 
      824    example, ``%5.3C="Iris-virginica"bP`` denotes the number of instances 
      825    of Iris-virginica divided by the number of instances of this class in the 
      826    parent node (instances that are *not* Iris-virginica could be described with 
      827    ``%5.3CbP!="Iris-virginica"``). 
     828 
     829    For regression trees, use operators =, !=, <, <=, >, and >=, as in 
     830    ``%C<22``, with optional precision and divisor. Intervals are also 
     831    possible: ``%C[20, 22]`` gives the number of instances between 
     832    20 and 22 (inclusive) and ``%C(20, 22)`` gives the number of such 
     833    instances excluding the boundaries. Mixing of parentheses is allowed, 
     834    e.g. ``%C(20, 22]``.  Add ``!`` for instances outside the interval, 
     835    like ``%C!(20, 22]``. 
     836 
     837``c`` 
     838    Same as ``C``, except that it computes the proportion of the class 
     839    instead of the number of instances. 
     840 
     841``D`` 
     842    The number of instances in each class. Precision and the divisor 
     843    are applied to each number in the distribution.  This quantity can 
     844    not be computed for regression trees. 
     845 
     846``d`` 
     847    Same as ``D``, except that it shows proportions of instances. 
     848 
     849<user defined formats> 
     850    Instructions and examples of added formats are at the end of this 
     851    section. 
     852 
     853.. rubric:: Examples on classification trees 
     854 
     855A tree on the iris data set with the depth limited to three 
     856levels is built as follows: 
     857     
     858.. literalinclude:: code/orngTree1.py 
     859   :lines: 1-4 
     860 
     861Printing the predicted class at each node, the number 
     862of instances in the majority class with the total number of instances in 
     863the node requires a custom format string:: 
     864 
     865    >>> print tree.to_string(leaf_str="%V (%M out of %N)") 
     866    petal width<0.800: Iris-setosa (50.000 out of 50.000) 
     867    petal width>=0.800 
     868    |    petal width<1.750 
     869    |    |    petal length<5.350: Iris-versicolor (49.000 out of 52.000) 
     870    |    |    petal length>=5.350: Iris-virginica (2.000 out of 2.000) 
     871    |    petal width>=1.750 
     872    |    |    petal length<4.850: Iris-virginica (2.000 out of 3.000) 
     873    |    |    petal length>=4.850: Iris-virginica (43.000 out of 43.000) 
     874 
     875The number of instances as 
     876compared to the entire data set and to the parent node:: 
     877 
     878    >>> print tree.to_string(leaf_str="%V (%^MbA%, %^MbP%)") 
     879    petal width<0.800: Iris-setosa (100%, 100%) 
     880    petal width>=0.800 
     881    |    petal width<1.750 
     882    |    |    petal length<5.350: Iris-versicolor (98%, 100%) 
     883    |    |    petal length>=5.350: Iris-virginica (4%, 40%) 
     884    |    petal width>=1.750 
     885    |    |    petal length<4.850: Iris-virginica (4%, 4%) 
     886    |    |    petal length>=4.850: Iris-virginica (86%, 96%) 
     887 
      888``%M`` is the number of instances in the majority class. Dividing by 
      889the number of all instances from this class on the entire data set 
      890is described with ``%MbA``. Add ``^`` in front for multiplication by 
      891100. The percent sign *after* that is printed out literally, just as the 
      892comma and parentheses. For the proportion of this class in the parent, the 
      893``bA`` is replaced with ``bP``. 
     894 
     895To print the number of versicolors in each node, together with the 
     896proportion of versicolors among the instances in this particular node 
     897and among all versicolors, use the following:: 
     898 
      899    '%C="Iris-versicolor" (%^c="Iris-versicolor"% of node, %^CbA="Iris-versicolor"% of versicolors)' 
     900 
     901It gives:: 
     902 
     903    petal width<0.800: 0.000 (0% of node, 0% of versicolors) 
     904    petal width>=0.800 
     905    |    petal width<1.750 
     906    |    |    petal length<5.350: 49.000 (94% of node, 98% of versicolors) 
     907    |    |    petal length>=5.350: 0.000 (0% of node, 0% of versicolors) 
     908    |    petal width>=1.750 
     909    |    |    petal length<4.850: 1.000 (33% of node, 2% of versicolors) 
     910    |    |    petal length>=4.850: 0.000 (0% of node, 0% of versicolors) 
     911 
      912Finally, to print the distributions, use the format string ``%D``:: 
     913 
     914    petal width<0.800: [50.000, 0.000, 0.000] 
     915    petal width>=0.800 
     916    |    petal width<1.750 
     917    |    |    petal length<5.350: [0.000, 49.000, 3.000] 
     918    |    |    petal length>=5.350: [0.000, 0.000, 2.000] 
     919    |    petal width>=1.750 
     920    |    |    petal length<4.850: [0.000, 1.000, 2.000] 
     921    |    |    petal length>=4.850: [0.000, 0.000, 43.000] 
     922 
     923As the order of classes is the same as in ``data.domain.class_var.values`` 
     924(setosa, versicolor, virginica), there are 49 versicolors and 3 virginicae 
     925in the node at ``petal length<5.350``. To print the proportions within 
     926nodes rounded to two decimals use ``%.2d``:: 
     927 
     928    petal width<0.800: [1.00, 0.00, 0.00] 
     929    petal width>=0.800 
     930    |    petal width<1.750 
     931    |    |    petal length<5.350: [0.00, 0.94, 0.06] 
     932    |    |    petal length>=5.350: [0.00, 0.00, 1.00] 
     933    |    petal width>=1.750 
     934    |    |    petal length<4.850: [0.00, 0.33, 0.67] 
     935    |    |    petal length>=4.850: [0.00, 0.00, 1.00] 
     936 
      937The most trivial format string for internal nodes simply prints 
      938the node predictions. A ``.`` in the following example specifies 
      939that ``node_str`` should be the same as ``leaf_str``. 
     940 
     941:: 
     942 
     943    tree.to_string(leaf_str="%V", node_str=".") 
     944  
     945The output:: 
     946 
     947    root: Iris-setosa 
     948    |    petal width<0.800: Iris-setosa 
     949    |    petal width>=0.800: Iris-versicolor 
     950    |    |    petal width<1.750: Iris-versicolor 
     951    |    |    |    petal length<5.350: Iris-versicolor 
     952    |    |    |    petal length>=5.350: Iris-virginica 
     953    |    |    petal width>=1.750: Iris-virginica 
     954    |    |    |    petal length<4.850: Iris-virginica 
     955    |    |    |    petal length>=4.850: Iris-virginica 
     956 
      957A node *root* has appeared and the tree looks one level 
      958deeper. This is needed to print the data for the tree root as well. 
     959 
     960To observe how the number 
     961of virginicas decreases down the tree try:: 
     962 
     963    print tree.to_string(leaf_str='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)', node_str='.') 
     964 
      965Interpretation: ``CbA="Iris-virginica"`` is  
      966the number of instances of virginica, divided by the total number 
      967of instances in this class. Add ``^.1`` and the result will be 
      968multiplied by 100 and printed with one decimal. The trailing ``%`` is printed 
      969out literally. In parentheses, the same quantity is divided by 
      970the number of instances in the parent node. Single quotes delimit the string, so 
      971that double quotes inside it can specify the class. 
     972 
     973:: 
     974 
     975    root: 100.0% (.%) 
     976    |    petal width<0.800: 0.0% (0.0%) 
     977    |    petal width>=0.800: 100.0% (100.0%) 
     978    |    |    petal width<1.750: 10.0% (10.0%) 
     979    |    |    |    petal length<5.350: 6.0% (60.0%) 
     980    |    |    |    petal length>=5.350: 4.0% (40.0%) 
     981    |    |    petal width>=1.750: 90.0% (90.0%) 
     982    |    |    |    petal length<4.850: 4.0% (4.4%) 
     983    |    |    |    petal length>=4.850: 86.0% (95.6%) 
     984 
     985If :meth:`~TreeClassifier.to_string` cannot compute something, in this case 
     986because the root has no parent, it prints out a dot. 
     987 
     988The final example with classification trees prints the distributions in 
     989nodes, the distribution compared to the parent, the proportions compared 
     990to the parent and the predicted class in the leaves:: 
     991 
     992    >>> print tree.to_string(leaf_str='"%V   %D %.2DbP %.2dbP"', node_str='"%D %.2DbP %.2dbP"') 
     993    root: [50.000, 50.000, 50.000] . . 
     994    |    petal width<0.800: [50.000, 0.000, 0.000] [1.00, 0.00, 0.00] [3.00, 0.00, 0.00]: 
     995    |        Iris-setosa   [50.000, 0.000, 0.000] [1.00, 0.00, 0.00] [3.00, 0.00, 0.00] 
     996    |    petal width>=0.800: [0.000, 50.000, 50.000] [0.00, 1.00, 1.00] [0.00, 1.50, 1.50] 
     997    |    |    petal width<1.750: [0.000, 49.000, 5.000] [0.00, 0.98, 0.10] [0.00, 1.81, 0.19] 
     998    |    |    |    petal length<5.350: [0.000, 49.000, 3.000] [0.00, 1.00, 0.60] [0.00, 1.04, 0.62]: 
     999    |    |    |        Iris-versicolor   [0.000, 49.000, 3.000] [0.00, 1.00, 0.60] [0.00, 1.04, 0.62] 
     1000    |    |    |    petal length>=5.350: [0.000, 0.000, 2.000] [0.00, 0.00, 0.40] [0.00, 0.00, 10.80]: 
     1001    |    |    |        Iris-virginica   [0.000, 0.000, 2.000] [0.00, 0.00, 0.40] [0.00, 0.00, 10.80] 
     1002    |    |    petal width>=1.750: [0.000, 1.000, 45.000] [0.00, 0.02, 0.90] [0.00, 0.04, 1.96] 
     1003    |    |    |    petal length<4.850: [0.000, 1.000, 2.000] [0.00, 1.00, 0.04] [0.00, 15.33, 0.68]: 
     1004    |    |    |        Iris-virginica   [0.000, 1.000, 2.000] [0.00, 1.00, 0.04] [0.00, 15.33, 0.68] 
     1005    |    |    |    petal length>=4.850: [0.000, 0.000, 43.000] [0.00, 0.00, 0.96] [0.00, 0.00, 1.02]: 
     1006    |    |    |        Iris-virginica   [0.000, 0.000, 43.000] [0.00, 0.00, 0.96] [0.00, 0.00, 1.02] 
     1007 
     1008 
     1009.. rubric:: Examples on regression trees 
     1010 
      1011The regression tree examples use a tree induced from the housing data 
      1012set. Without other arguments, :meth:`TreeClassifier.to_string` prints the 
     1013following:: 
     1014 
     1015    RM<6.941 
     1016    |    LSTAT<14.400 
     1017    |    |    DIS<1.385: 45.6 
     1018    |    |    DIS>=1.385: 22.9 
     1019    |    LSTAT>=14.400 
     1020    |    |    CRIM<6.992: 17.1 
     1021    |    |    CRIM>=6.992: 12.0 
     1022    RM>=6.941 
     1023    |    RM<7.437 
     1024    |    |    CRIM<7.393: 33.3 
     1025    |    |    CRIM>=7.393: 14.4 
     1026    |    RM>=7.437 
     1027    |    |    TAX<534.500: 45.9 
     1028    |    |    TAX>=534.500: 21.9 
     1029 
     1030To add the standard error in both internal nodes and leaves, and 
     1031the 90% confidence intervals in the leaves, use:: 
     1032 
     1033    >>> print tree.to_string(leaf_str="[SE: %E]\t %V %I(90)", node_str="[SE: %E]") 
     1034    root: [SE: 0.409] 
     1035    |    RM<6.941: [SE: 0.306] 
     1036    |    |    LSTAT<14.400: [SE: 0.320] 
     1037    |    |    |    DIS<1.385: [SE: 4.420]: 
     1038    |    |    |        [SE: 4.420]   45.6 [38.331-52.829] 
     1039    |    |    |    DIS>=1.385: [SE: 0.244]: 
     1040    |    |    |        [SE: 0.244]   22.9 [22.504-23.306] 
     1041    |    |    LSTAT>=14.400: [SE: 0.333] 
     1042    |    |    |    CRIM<6.992: [SE: 0.338]: 
     1043    |    |    |        [SE: 0.338]   17.1 [16.584-17.691] 
     1044    |    |    |    CRIM>=6.992: [SE: 0.448]: 
     1045    |    |    |        [SE: 0.448]   12.0 [11.243-12.714] 
     1046    |    RM>=6.941: [SE: 1.031] 
     1047    |    |    RM<7.437: [SE: 0.958] 
     1048    |    |    |    CRIM<7.393: [SE: 0.692]: 
     1049    |    |    |        [SE: 0.692]   33.3 [32.214-34.484] 
     1050    |    |    |    CRIM>=7.393: [SE: 2.157]: 
     1051    |    |    |        [SE: 2.157]   14.4 [10.862-17.938] 
     1052    |    |    RM>=7.437: [SE: 1.124] 
     1053    |    |    |    TAX<534.500: [SE: 0.817]: 
     1054    |    |    |        [SE: 0.817]   45.9 [44.556-47.237] 
     1055    |    |    |    TAX>=534.500: [SE: 0.000]: 
     1056    |    |    |        [SE: 0.000]   21.9 [21.900-21.900] 
     1057 
     1058The predicted value (``%V``) and the average (``%A``) may differ because 
     1059a regression tree does not always predict the leaf average, but whatever 
      1060the :obj:`~Node.node_classifier` in a leaf returns.  As ``%V`` uses the 
      1061:obj:`Orange.feature.Continuous` formatting for printing the 
      1062value, the number has the same number of decimals as in the data file. 
     1063 
     1064Regression trees cannot print the distributions in the same way 
     1065as classification trees. They instead offer a set of operators for 
     1066observing the number of instances within a certain range. For instance, 
     1067to print the number of instances with values below 22 and compare 
     1068it with values in the parent nodes use:: 
     1069 
     1070    >>> print tree.to_string(leaf_str="%C<22 (%cbP<22)", node_str=".") 
     1071    root: 277.000 (.) 
     1072    |    RM<6.941: 273.000 (1.160) 
     1073    |    |    LSTAT<14.400: 107.000 (0.661) 
     1074    |    |    |    DIS<1.385: 0.000 (0.000) 
     1075    |    |    |    DIS>=1.385: 107.000 (1.020) 
     1076    |    |    LSTAT>=14.400: 166.000 (1.494) 
     1077    |    |    |    CRIM<6.992: 93.000 (0.971) 
     1078    |    |    |    CRIM>=6.992: 73.000 (1.040) 
     1079    |    RM>=6.941: 4.000 (0.096) 
     1080    |    |    RM<7.437: 3.000 (1.239) 
     1081    |    |    |    CRIM<7.393: 0.000 (0.000) 
     1082    |    |    |    CRIM>=7.393: 3.000 (15.333) 
     1083    |    |    RM>=7.437: 1.000 (0.633) 
     1084    |    |    |    TAX<534.500: 0.000 (0.000) 
      1085    |    |    |    TAX>=534.500: 1.000 (30.000) 
     1086 
      1087The last line, for instance, says that among the instances with tax 
      1088above 534, the number of instances with the class below 22 is 30 times 
      1089higher than in the parent node. 
     1090 
     1091To count the same for all instances *outside* 
     1092interval [20, 22] and print out the proportions as percents use:: 
     1093 
     1094    >>> print tree.to_string(leaf_str="%C![20,22] (%^cbP![20,22]%)", node_str=".") 
     1095 
      1096The format string ``%c![20, 22]`` denotes the proportion of instances 
      1097(within the node) whose values are below 20 or above 22. ``%cbP![20, 
      109822]`` divides it by the same statistic computed on the parent. A ``^`` is added 
      1099to print percentages. 
     1100 
     1101:: 
     1102 
     1103    root: 439.000 (.%) 
     1104    |    RM<6.941: 364.000 (98%) 
     1105    |    |    LSTAT<14.400: 200.000 (93%) 
     1106    |    |    |    DIS<1.385: 5.000 (127%) 
     1107    |    |    |    DIS>=1.385: 195.000 (99%) 
     1108    |    |    LSTAT>=14.400: 164.000 (111%) 
     1109    |    |    |    CRIM<6.992: 91.000 (96%) 
     1110    |    |    |    CRIM>=6.992: 73.000 (105%) 
     1111    |    RM>=6.941: 75.000 (114%) 
     1112    |    |    RM<7.437: 46.000 (101%) 
     1113    |    |    |    CRIM<7.393: 43.000 (100%) 
     1114    |    |    |    CRIM>=7.393: 3.000 (100%) 
     1115    |    |    RM>=7.437: 29.000 (98%) 
     1116    |    |    |    TAX<534.500: 29.000 (103%) 
     1117    |    |    |    TAX>=534.500: 0.000 (0%) 
     1118 
     1119 
     1120Defining custom printouts 
     1121------------------------- 
     1122 
     1123:meth:`TreeClassifier.to_string`'s argument :obj:`user_formats` can be used to 
      1124print other information.  :obj:`user_formats` should 
     1125contain a list of tuples with a regular expression and a function to be 
     1126called when that expression is found in the format string. Expressions 
     1127from :obj:`user_formats` are checked before the built-in expressions 
     1128discussed above. 
     1129 
      1130The regular expression should describe a string like those used above, 
     1131for instance ``%.2DbP``. When a leaf or internal node 
     1132is printed, the format string (:obj:`leaf_str` or :obj:`node_str`) 
     1133is checked for these regular expressions and when the match is found, 
     1134the corresponding callback function is called. 
     1135 
     1136The passed function will get five arguments: the format string  
     1137(:obj:`leaf_str` or :obj:`node_str`), the match object, the node which is 
     1138being printed, its parent (can be None) and the tree classifier. 
     1139The function should return the format string in which the part described 
     1140by the match object (that is, the part that is matched by the regular 
     1141expression) is replaced by whatever information your callback function 
     1142is supposed to give. 
     1143 
      1144The function can use several utility functions provided in the module. 
     1145 
     1146.. autofunction:: insert_str 
     1147 
     1148.. autofunction:: insert_dot 
     1149 
     1150.. autofunction:: insert_num 
     1151 
     1152.. autofunction:: by_whom 
     1153 
     1154The module also includes reusable regular expressions:  
     1155 
     1156.. autodata:: fs 
     1157 
     1158.. autodata:: by 
     1159 
     1160For a trivial example, ``%V`` is implemented with the 
     1161following tuple:: 
     1162 
     1163    (re.compile("%V"), replaceV) 
     1164 
     1165And ``replaceV`` is defined by:: 
     1166 
     1167    def replaceV(strg, mo, node, parent, tree): 
     1168        return insert_str(strg, mo, str(node.node_classifier.default_value)) 
     1169 
     1170``replaceV`` takes the value predicted at the node 
     1171(``node.node_classifier.default_value`` ), converts it to a string 
     1172and passes it to :func:`insert_str`. 
     1173 
     1174A more complex regular expression is the one for the proportion of 
     1175majority class, defined as ``"%"+fs+"M"+by``. It uses the two partial 
     1176expressions defined above (:obj:`fs` and :obj:`by`). 
     1177 
     1178The following code prints the classification margin for each node, 
     1179that is, the difference between the proportion of the largest and the 
     1180second largest class in the node: 
     1181 
     1182.. literalinclude:: code/orngTree2.py 
     1183   :lines: 7-31 
     1184 
      1185``get_margin`` computes the margin from the distribution. The replacing 
      1186function, ``replaceB``, computes the margin for the node.  If the :data:`by` 
      1187group is present, we call :func:`by_whom` to get the node by whose 
      1188margin this node's margin is to be divided. If that node (usually the 
      1189parent) does not exist or if its margin is zero, :func:`insert_dot` 
      1190inserts a dot; otherwise :func:`insert_num` is called, which inserts the 
      1191number in the user-specified format.  ``my_format`` contains the regular 
      1192expression and the callback function. 
     1193 
     1194Printing the tree with 
     1195 
     1196.. literalinclude:: code/orngTree2.py 
     1197    :lines: 33 
     1198 
     1199yields:: 
     1200 
     1201    petal width<0.800: Iris-setosa 100% (100.00%) 
     1202    petal width>=0.800 
     1203    |    petal width<1.750 
     1204    |    |    petal length<5.350: Iris-versicolor 88% (108.57%) 
     1205    |    |    petal length>=5.350: Iris-virginica 100% (122.73%) 
     1206    |    petal width>=1.750 
     1207    |    |    petal length<4.850: Iris-virginica 33% (34.85%) 
     1208    |    |    petal length>=4.850: Iris-virginica 100% (104.55%) 
     1209 
     1210Plotting with Dot 
     1211--------------------------- 
     1212 
     1213To produce images of trees, first create a .dot file with 
     1214:meth:`TreeClassifier.dot`. If it was saved to "tree5.dot", plot a gif 
     1215with the following command:: 
     1216 
     1217    dot -Tgif tree5.dot -otree5.gif 
     1218 
     1219Check GraphViz's dot documentation for more options and 
     1220output formats. 
     1221 
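The .dot file itself can be produced with a call along these lines (a
sketch; the formatting arguments mirror :meth:`~TreeClassifier.to_string`)::

    tree.dot("tree5.dot", leaf_str="%V", node_str=".")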
     1222 
     1223=========================== 
     1224C4.5 Tree Inducer 
     1225=========================== 
     1226 
      1227C4.5 is, as a standard benchmark in machine learning, incorporated in 
      1228Orange. The implementation uses the original C4.5 code, so the resulting 
      1229tree is exactly like the one that would be built by the standalone C4.5. The 
      1230built tree is then made accessible in Python. 
     1231 
     1232:class:`C45Learner` and :class:`C45Classifier` behave 
      1233like any other Orange learner and classifier. Unlike most Orange  
      1234learning algorithms, C4.5 does not accept weighted instances. 
     1235 
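Once the plug-in described below is built, usage is the same as for any
other learner (a minimal sketch, assuming the lenses data)::

    import Orange

    lenses = Orange.data.Table("lenses")
    c45 = Orange.classification.tree.C45Learner(lenses)
    for instance in lenses[:5]:
        print instance.get_class(), c45(instance)
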
     1236------------------------- 
     1237Building the C4.5 plug-in 
     1238------------------------- 
     1239 
     1240Due to copyright restrictions, C4.5 is not distributed with Orange, 
     1241but it can be added as a plug-in. A C compiler is needed for the 
     1242procedure: on Windows MS Visual C (CL.EXE and LINK.EXE must be on the 
     1243PATH), on Linux and OS X gcc (OS X users can download it from Apple). 
     1244 
     1245Orange must be installed prior to building C4.5. 
     1246 
     1247#. Download  
     1248   `C4.5 (Release 8) sources <http://www.rulequest.com/Personal/c4.5r8.tar.gz>`_ 
      1249   from `RuleQuest's site <http://www.rulequest.com/>`_ and extract 
      1250   them. The files will be modified during the 
      1251   build process. 
     1252#. Download 
     1253   `buildC45.zip <http://orange.biolab.si/orange/download/buildC45.zip>`_ 
     1254   and unzip its contents into the directory R8/Src of the C4.5 sources 
     1255   (this directory contains, for instance, the file average.c). 
     1256#. Run buildC45.py, which will build the plug-in and put it next to  
     1257   orange.pyd (or orange.so on Linux/Mac). 
      1258#. Run Python, type ``import Orange`` and 
      1259   create ``Orange.classification.tree.C45Learner()``. This should 
      1260   succeed; see the smoke test after this list. 
     1261#. Finally, you can remove C4.5 sources. 
     1262 
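A minimal smoke test for step 4 (any classification data set will do; `iris`
is used here only as an example)::

    import Orange

    data = Orange.data.Table("iris")
    c45 = Orange.classification.tree.C45Learner(data)  # raises an error if the plug-in is missing
    print c45(data[0])                                 # prints the predicted class
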
      1263The script buildC45.py creates .h files that wrap Quinlan's .i files and 
      1264ensures that they are not included twice. It modifies the C4.5 sources to 
      1265include the .h's instead of the .i's (this step can hardly fail). It then 
      1266compiles ensemble.c into c45.dll or c45.so and puts it next to Orange. 
      1267Finally, it checks whether the built C4.5 gives the same results as the original. 
     1268 
     1269.. autoclass:: C45Learner 
     1270    :members: 
     1271 
     1272.. autoclass:: C45Classifier 
     1273    :members: 
     1274 
     1275.. class:: C45Node 
     1276 
     1277    This class is a reimplementation of the corresponding *struct* from 
     1278    Quinlan's C4.5 code. 
     1279 
     1280    .. attribute:: node_type 
     1281 
      1282        Type of the node: :obj:`C45Node.Leaf` (0), 
      1283        :obj:`C45Node.Branch` (1), :obj:`C45Node.Cut` (2), 
      1284        :obj:`C45Node.Subset` (3). "Leaves" are leaves, "branches" 
      1285        split instances based on the values of a discrete attribute, 
      1286        "cuts" split them according to a threshold value of a continuous 
      1287        attribute, and "subsets" use discrete attributes with subsetting, 
      1288        so that several values can go into the same branch. 
     1289 
     1290    .. attribute:: leaf 
     1291 
      1292        Value returned by the node if it is a leaf; the field is defined 
      1293        for internal nodes as well. 
     1294 
     1295    .. attribute:: items 
     1296 
     1297        Number of (learning) instances in the node. 
     1298 
     1299    .. attribute:: class_dist 
     1300 
     1301        Class distribution for the node (of type  
     1302        :obj:`Orange.statistics.distribution.Discrete`). 
     1303 
     1304    .. attribute:: tested 
     1305         
      1306        The attribute used in the node's test. If the node is a leaf, 
      1307        :obj:`tested` is None; if the node is of type :obj:`Branch` or 
      1308        :obj:`Subset`, :obj:`tested` is a discrete attribute; and if the node 
      1309        is of type :obj:`Cut`, :obj:`tested` is a continuous attribute. 
     1310 
     1311    .. attribute:: cut 
     1312 
     1313        A threshold for continuous attributes, if node is of type :obj:`Cut`. 
     1314        Undefined otherwise. 
     1315 
     1316    .. attribute:: mapping 
     1317 
      1318        Mapping for nodes of type :obj:`Subset`. Element ``mapping[i]`` 
      1319        gives the branch index for an instance whose value of :obj:`tested` is *i*. 
      1320        Here, *i* denotes the index of the value, not a :class:`Orange.data.Value`. 
     1321 
     1322    .. attribute:: branch 
     1323         
     1324        A list of branches stemming from this node. 
     1325 
     1326-------- 
     1327Examples 
     1328-------- 
     1329 
     1330This 
     1331script constructs the same learner as you would get by calling 
     1332the usual C4.5: 
     1333 
     1334.. literalinclude:: code/tree_c45.py 
     1335   :lines: 7-14 
     1336 
     1337Both C4.5 command-line symbols and variable names can be used. The  
     1338following lines produce the same result:: 
     1339 
     1340    tree = Orange.classification.tree.C45Learner(data, m=100) 
     1341    tree = Orange.classification.tree.C45Learner(data, min_objs=100) 
     1342 
      1343A veteran C4.5 user might prefer :func:`C45Learner.commandline`:: 
     1344 
     1345    lrn = Orange.classification.tree.C45Learner() 
     1346    lrn.commandline("-m 1 -s") 
     1347    tree = lrn(data) 
     1348 
      1349The following script prints out the tree in the same format as C4.5 does. 
     1350 
     1351.. literalinclude:: code/tree_c45_printtree.py 
     1352 
      1353For the leaves, just the value in ``node.leaf`` is printed. Since 
      1354:obj:`C45Node` does not know to which attribute it belongs, we need to 
      1355convert it to a string through ``classvar``, which is passed as an extra 
      1356argument to the recursive part of ``print_tree``. 
     1357 
     1358For discrete splits without subsetting, we print out all attribute values 
     1359and recursively call the function for all branches. Continuous splits 
     1360are equally easy to handle. 
     1361 
      1362For discrete splits with subsetting, we iterate through the branches, 
      1363collect the values that go into each branch into a list (``inset``), 
      1364turn the values into strings and print them out, separately treating 
      1365the case when only a single value goes into the branch. 
     1366 
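A condensed sketch of such a printer; the attribute names follow the
:obj:`C45Node` description above, while the exact output format is an
assumption (the included script is authoritative)::

    def print_tree0(node, classvar, lev):
        var = node.tested
        if node.node_type == node.Leaf:          # leaf: print the class value
            print "%s (%.1f)" % (classvar.values[int(node.leaf)], node.items),
        elif node.node_type == node.Branch:      # one branch per discrete value
            for i, branch in enumerate(node.branch):
                print "\n" + "|    " * lev + "%s = %s:" % (var.name, var.values[i]),
                print_tree0(branch, classvar, lev + 1)
        elif node.node_type == node.Cut:         # binary split on a threshold
            print "\n" + "|    " * lev + "%s <= %.1f:" % (var.name, node.cut),
            print_tree0(node.branch[0], classvar, lev + 1)
            print "\n" + "|    " * lev + "%s > %.1f:" % (var.name, node.cut),
            print_tree0(node.branch[1], classvar, lev + 1)
        else:                                    # Subset: several values per branch
            for i, branch in enumerate(node.branch):
                inset = [v for j, v in enumerate(var.values) if node.mapping[j] == i]
                print "\n" + "|    " * lev + "%s in {%s}:" % (var.name, ", ".join(inset)),
                print_tree0(branch, classvar, lev + 1)
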
     1367=================== 
     1368Simple Tree Inducer 
     1369=================== 
     1370 
     1371.. include:: /SimpleTreeLearner.txt 
     1372 
     1373--------         
     1374Examples 
     1375-------- 
     1376 
     1377:obj:`SimpleTreeLearner` is used in much the same way as :obj:`TreeLearner`. 
     1378A typical example of using :obj:`SimpleTreeLearner` would be to build a random 
     1379forest: 
     1380 
     1381.. literalinclude:: code/simple_tree_random_forest.py 
     1382 
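In outline (the forest learner's parameter names, ``trees`` and
``base_learner``, are assumptions here; the included script is authoritative)::

    import Orange

    data = Orange.data.Table("iris")
    base = Orange.classification.tree.SimpleTreeLearner()
    forest = Orange.ensemble.forest.RandomForestLearner(trees=50, base_learner=base)
    classifier = forest(data)
    print classifier(data[0])
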
     1383========== 
     1384References 
     1385========== 
     1386 
      1387Bratko, I. (2002). `Prolog Programming for Artificial Intelligence`. 
      1388Addison Wesley. 
      1389 
      1390Koutsofios, E., North, S. C. (1993). Drawing Graphs with dot. AT&T Bell 
      1391Laboratories, Murray Hill, NJ, USA. 
      1392 
      1393`Graphviz - open source graph drawing software <http://www.research.att.com/sw/tools/graphviz/>`_: 
      1394the home page of AT&T's dot and related software packages. 
     1395 
     1396""" 
     1397 
     1398""" 
     1399TODO C++ aliases 
     1400 
     1401SplitConstructor.discrete/continuous_split_constructor -> SplitConstructor.discrete  
     1402Node.examples -> Node.instances 
  • docs/reference/rst/Orange.statistics.distribution.rst

    r9372 r10372  
    1 .. automodule:: Orange.statistics.distribution 
     1.. py:currentmodule:: Orange.statistics.distribution 
     2 
     3.. index:: Distributions 
     4 
     5============= 
     6Distributions 
     7============= 
     8 
     9:obj:`Distribution` and derived classes store empirical 
     10distributions of discrete and continuous variables. 
     11 
     12.. class:: Distribution 
     13 
     14    This class can 
     15    store absolute or relative frequencies. It provides a convenience constructor 
     16    which constructs instances of derived classes. :: 
     17 
     18        >>> import Orange 
     19        >>> data = Orange.data.Table("adult_sample") 
     20        >>> disc = Orange.statistics.distribution.Distribution("workclass", data) 
     21        >>> print disc 
     22        <685.000, 72.000, 28.000, 29.000, 59.000, 43.000, 2.000> 
     23        >>> print type(disc) 
     24        <type 'DiscDistribution'> 
     25 
      26    The resulting distribution is of type :obj:`DiscDistribution` since the 
      27    variable `workclass` is discrete. The printed numbers are counts of examples 
      28    that have a particular attribute value. :: 
     29 
     30        >>> workclass = data.domain["workclass"] 
     31        >>> for i in range(len(workclass.values)): 
     32        ...     print "%20s: %5.3f" % (workclass.values[i], disc[i]) 
     33                 Private: 685.000 
     34        Self-emp-not-inc: 72.000 
     35            Self-emp-inc: 28.000 
     36             Federal-gov: 29.000 
     37               Local-gov: 59.000 
     38               State-gov: 43.000 
     39             Without-pay: 2.000 
     40            Never-worked: 0.000 
     41 
      42    Distributions resemble dictionaries, supporting indexing by instances of 
     43    :obj:`Orange.data.Value`, integers or floats (depending on the distribution 
     44    type), and symbolic names (if :obj:`variable` is defined). 
     45 
      46    For instance, the number of examples with `workclass="Private"` can be 
      47    obtained in three ways:: 
     48     
     49        print "Private: ", disc["Private"] 
     50        print "Private: ", disc[0] 
     51        print "Private: ", disc[orange.Value(workclass, "Private")] 
     52 
     53    Elements cannot be removed from distributions. 
     54 
      55    The length of a distribution equals the number of possible values for 
      56    discrete distributions (if :obj:`variable` is set), the highest index of 
      57    an encountered value (if the distribution is discrete and :obj:`variable` 
      58    is :obj:`None`), or the number of different values encountered (for 
      59    continuous distributions). 
     60 
     61    .. attribute:: variable 
     62 
     63        Variable to which the distribution applies; may be :obj:`None` if not 
     64        applicable. 
     65 
     66    .. attribute:: unknowns 
     67 
     68        The number of instances for which the value of the variable was 
     69        undefined. 
     70 
     71    .. attribute:: abs 
     72 
     73        Sum of all elements in the distribution. Usually it equals either 
     74        :obj:`cases` if the instance stores absolute frequencies or 1 if the 
     75        stored frequencies are relative, e.g. after calling :obj:`normalize`. 
     76 
     77    .. attribute:: cases 
     78 
     79        The number of instances from which the distribution is computed, 
     80        excluding those on which the value was undefined. If instances were 
     81        weighted, this is the sum of weights. 
     82 
     83    .. attribute:: normalized 
     84 
     85        :obj:`True` if distribution is normalized. 
     86 
     87    .. attribute:: random_generator 
     88 
      89        A pseudo-random number generator (an instance of :obj:`Orange.misc.Random`) used by the method :obj:`random`. 
     90 
     91    .. method:: __init__(variable[, data[, weightId=0]]) 
     92 
     93        Construct either :obj:`DiscDistribution` or :obj:`ContDistribution`, 
     94        depending on the variable type. If the variable is the only argument, it 
     95        must be an instance of :obj:`Orange.feature.Descriptor`. In that case, 
     96        an empty distribution is constructed. If data is given as well, the 
     97        variable can also be specified by name or index in the 
     98        domain. Constructor then computes the distribution of the specified 
     99        variable on the given data. If instances are weighted, the id of 
     100        meta-attribute with weights can be passed as the third argument. 
     101 
     102        If variable is given by descriptor, it doesn't need to exist in the 
     103        domain, but it must be computable from given instances. For example, the 
     104        variable can be a discretized version of a variable from data. 
     105 
     106    .. method:: keys() 
     107 
      108        Return a list of possible values (if the distribution is discrete and 
      109        :obj:`variable` is set) or a list of encountered values otherwise. 
     110 
     111    .. method:: values() 
     112 
      113        Return a list of frequencies of the values, as described above. 
     114 
     115    .. method:: items() 
     116 
     117        Return a list of pairs of elements of the above lists. 
     118 
     119    .. method:: native() 
     120 
     121        Return the distribution as a list (for discrete distributions) or as a 
      122        dictionary (for continuous distributions). 
     123 
     124    .. method:: add(value[, weight=1]) 
     125 
     126        Increase the count of the element corresponding to ``value`` by 
     127        ``weight``. 
     128 
     129        :param value: Value 
     130        :type value: :obj:`Orange.data.Value`, string (if :obj:`variable` is set), :obj:`int` for discrete distributions or :obj:`float` for continuous distributions 
     131        :param weight: Weight to be added to the count for ``value`` 
     132        :type weight: float 
     133 
     134    .. method:: normalize() 
     135 
     136        Divide the counts by their sum, set :obj:`normalized` to :obj:`True` and 
     137        :obj:`abs` to 1. Attributes :obj:`cases` and :obj:`unknowns` are 
      138        unchanged. This changes absolute frequencies into relative ones. 
     139 
     140    .. method:: modus() 
     141 
     142        Return the most common value. If there are multiple such values, one is 
     143        chosen at random, although the chosen value will always be the same for 
     144        the same distribution. 
     145 
     146    .. method:: random() 
     147 
     148        Return a random value based on the stored empirical probability 
     149        distribution. For continuous distributions, this will always be one of 
     150        the values which actually appeared (e.g. one of the values from 
     151        :obj:`keys`). 
     152 
     153        The method uses :obj:`random_generator`. If none has been constructed or 
     154        assigned yet, a new one is constructed and stored for further use. 
     155 
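    A small sketch pulling these methods together; the variable and the
    counts are made up for illustration::

        import Orange

        size = Orange.feature.Discrete("size", values=["small", "large"])
        d = Orange.statistics.distribution.Distribution(size)  # empty distribution
        d.add("small", weight=2)
        d.add("large")
        d.normalize()      # absolute frequencies become relative
        print d            # <0.667, 0.333>
        print d.modus()    # small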
     156 
     157.. class:: Discrete 
     158 
     159    Stores a discrete distribution of values. The class differs from its parent 
     160    class in having a few additional constructors. 
     161 
     162    .. method:: __init__(variable) 
     163 
     164        Construct an instance of :obj:`Discrete` and set the variable 
     165        attribute. 
     166 
     167        :param variable: A discrete variable 
     168        :type variable: Orange.feature.Discrete 
     169 
     170    .. method:: __init__(frequencies) 
     171 
     172        Construct an instance and initialize the frequencies from the list, but 
     173        leave `Distribution.variable` empty. 
     174 
     175        :param frequencies: A list of frequencies 
     176        :type frequencies: list 
     177 
     178        Distribution constructed in this way can be used, for instance, to 
     179        generate random numbers from a given discrete distribution:: 
     180 
     181            disc = Orange.statistics.distribution.Discrete([0.5, 0.3, 0.2]) 
     182            for i in range(20): 
     183                print disc.random(), 
     184 
      185        This prints out approximately ten 0's, six 1's and four 2's. The values 
     186        can be named by assigning a variable:: 
     187 
      188            v = Orange.feature.Discrete(values=["red", "green", "blue"]) 
     189            disc.variable = v 
     190 
     191    .. method:: __init__(distribution) 
     192 
      193        Copy constructor; makes a shallow copy of the given distribution. 
     194 
     195        :param distribution: An existing discrete distribution 
     196        :type distribution: Discrete 
     197 
     198 
     199.. class:: Continuous 
     200 
     201    Stores a continuous distribution, that is, a dictionary-like structure with 
     202    values and their frequencies. 
     203 
     204    .. method:: __init__(variable) 
     205 
      206        Construct an instance of :obj:`Continuous` and set the variable 
     207        attribute. 
     208 
     209        :param variable: A continuous variable 
     210        :type variable: Orange.feature.Continuous 
     211 
     212    .. method:: __init__(frequencies) 
     213 
      214        Construct an instance of :obj:`Continuous` and initialize it from the 
      215        given dictionary, which maps values of the variable to their frequencies. 
     216 
     217        :param frequencies: Values and their corresponding frequencies 
     218        :type frequencies: dict 
     219 
     220    .. method:: __init__(distribution) 
     221 
      222        Copy constructor; makes a shallow copy of the given distribution. 
     223 
     224        :param distribution: An existing continuous distribution 
     225        :type distribution: Continuous 
     226 
     227    .. method:: average() 
     228 
     229        Return the average value. Note that the average can also be 
      230        computed using the simpler and faster classes from module 
     231        :obj:`Orange.statistics.basic`. 
     232 
     233    .. method:: var() 
     234 
      235        Return the variance of the distribution. 
     236 
     237    .. method:: dev() 
     238 
     239        Return the standard deviation. 
     240 
     241    .. method:: error() 
     242 
     243        Return the standard error. 
     244 
     245    .. method:: percentile(p) 
     246 
     247        Return the value at the `p`-th percentile. 
     248 
     249        :param p: The percentile, must be between 0 and 100 
     250        :type p: float 
     251        :rtype: float 
     252 
      253        For example, if ``d_age`` is a continuous distribution, the quartiles 
      254        can be printed by:: 
      255 
      256            print "Quartiles: %5.3f - %5.3f - %5.3f" % ( 
      257                d_age.percentile(25), d_age.percentile(50), d_age.percentile(75)) 
     258 
      259    .. method:: density(x) 
     260 
     261        Return the probability density at `x`. If the value is not in 
     262        :obj:`Distribution.keys`, it is interpolated. 
     263 
     264 
     265.. class:: Gaussian 
     266 
      267    A class imitating :obj:`Continuous` by returning the statistics and 
      268    densities for a Gaussian distribution. The class is meant only as a 
      269    convenient substitution for code which expects an instance of 
      270    :obj:`Distribution`. For general use, the Python module :obj:`random` 
      271    provides a comprehensive set of functions for various random distributions. 
     272 
     273    .. attribute:: mean 
     274 
      275        The mean value parameter of the Gaussian distribution. 
     276 
     277    .. attribute:: sigma 
     278 
      279        The standard deviation of the distribution. 
     280 
     281    .. attribute:: abs 
     282 
     283        The simulated number of instances; in effect, the Gaussian distribution 
     284        density, as returned by method :obj:`density` is multiplied by 
     285        :obj:`abs`. 
     286 
     287    .. method:: __init__([mean=0, sigma=1]) 
     288 
     289        Construct an instance, set :obj:`mean` and :obj:`sigma` to the given 
     290        values and :obj:`abs` to 1. 
     291 
     292    .. method:: __init__(distribution) 
     293 
     294        Construct a distribution which approximates the given distribution, 
     295        which must be either :obj:`Continuous`, in which case its 
      296        average and deviation will be used for mean and sigma, or an existing 
      297        :obj:`Gaussian`, which will be copied. Attribute :obj:`abs` 
     298        is set to the given distribution's ``abs``. 
     299 
     300    .. method:: average() 
     301 
     302        Return :obj:`mean`. 
     303 
     304    .. method:: dev() 
     305 
     306        Return :obj:`sigma`. 
     307 
     308    .. method:: var() 
     309 
     310        Return square of :obj:`sigma`. 
     311 
     312    .. method:: density(x) 
     313 
     314        Return the density at point ``x``, that is, the Gaussian distribution 
     315        density multiplied by :obj:`abs`. 
     316 
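    A minimal usage sketch with standard-normal parameters::

        import Orange

        g = Orange.statistics.distribution.Gaussian(0.0, 1.0)
        print g.average(), g.dev()   # 0.0 1.0
        print g.density(0.0)         # about 0.399, the standard normal peak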
     317 
     318Class distributions 
     319=================== 
     320 
     321There is a convenience function for computing empirical class distributions from 
     322data. 
     323 
     324.. function:: getClassDistribution(data[, weightID=0]) 
     325 
     326    Return a class distribution for the given data. 
     327 
     328    :param data: A set of instances. 
     329    :type data: Orange.data.Table 
     330    :param weightID: An id for meta attribute with weights of instances 
     331    :type weightID: int 
     332    :rtype: :obj:`Discrete` or :obj:`Continuous`, depending on the class type 
     333 
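For example (the data set name is only illustrative)::

    import Orange

    data = Orange.data.Table("lenses")
    dist = Orange.statistics.distribution.getClassDistribution(data)
    print dist   # class counts, e.g. <15.000, 5.000, 4.000>
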
     334Distributions of all variables 
     335============================== 
     336 
     337Distributions of all variables can be computed and stored in 
     338:obj:`Domain`. The list-like object can be indexed by variable 
     339indices in the domain, as well as by variables and their names. 
     340 
     341.. class:: Domain 
     342 
     343    .. method:: __init__(data[, weightID=0]) 
     344 
     345        Construct an instance with distributions of all discrete and continuous 
     346        variables from the given data. 
     347 
      348        :param data: A set of instances. 
      349        :type data: Orange.data.Table 
      350        :param weightID: An id for meta attribute with weights of instances 
      351        :type weightID: int 
     352 
     353The script below computes distributions for all attributes in the data and 
     354prints out distributions for discrete and averages for continuous attributes. :: 
     355 
     356    dist = Orange.statistics.distribution.Domain(data) 
     357 
     358    for d in dist: 
     359        if d.variable.var_type == Orange.feature.Type.Discrete: 
     360             print "%30s: %s" % (d.variable.name, d) 
     361        else: 
     362             print "%30s: avg. %5.3f" % (d.variable.name, d.average()) 
     363 
      364The distribution for, say, attribute `age` can be obtained by its name 
      365(indexing by position or by descriptor works as well):: 
     366 
     367    dist_age = dist["age"] 
  • source/orange/_aliases.txt

    r10268 r10374  
    8484FindNearest_BruteForce 
    8585instances examples 
     86 
     87Rule 
     88instances examples 
     89 