Changeset 8237:522a4f1bcb1f in orange


Timestamp:
08/19/11 14:06:51
Author:
markotoplak
Branch:
default
Convert:
10014875e7b7c9f23340b01f1ddba58c000195c4
Message:

Orange.classification.tree: documentation edits.

Location:
orange
Files:
2 edited

  • orange/Orange/classification/tree.py

    r8216 r8237  
    3232.. autoclass:: TreeClassifier 
    3333    :members: 
     34 
     35.. class:: Node 
     36 
     37    Classification trees are represented as a tree-like hierarchy of 
     38    :obj:`Node` objects. 
     39 
     40    Node stores the instances belonging to the node, a branch selector, 
     41    a list of branches (if the node is not a leaf) with their descriptions 
     42    and strengths, and a classifier. 
     43 
     44    .. attribute:: distribution 
     45     
     46        The class distribution of the learning instances in the 
     47        node. 
     48 
     49    .. attribute:: contingency 
     50 
     51        Complete contingency matrices for the learning instances 
     52        in the node. 
     53 
     54    .. attribute:: examples, weightID 
     55 
     56        Learning instances for the node and the ID of the corresponding 
     57        weight meta attribute. The root of the tree stores all 
     58        instances, while other nodes store only references to instances 
     59        in the root node. 
     60 
     61    .. attribute:: node_classifier 
     62 
     63        A classifier (usually a :obj:`DefaultClassifier`) that can be used 
     64        to classify instances coming to the node. If the node is a leaf, 
     65        this is used to decide the final class (or class distribution) 
     66        of an instance. For an internal node, it is stored only if the 
     67        flag :obj:`store_node_classifier` is set. Since 
     68        the :obj:`node_classifier` is needed by :obj:`Descender` and 
     69        for pruning (described later), storing it is the default behaviour; 
     70        the space consumption of the default :obj:`DefaultClassifier` is 
     71        rather small. Do not disable this if you intend to 
     72        prune the tree later. 
     73 
     74    If the node is a leaf, the remaining fields are None. If it's 
     75    an internal node, there are several additional fields. The lists 
     76    :obj:`branches`, :obj:`branch_descriptions` and :obj:`branch_sizes` 
     77    are of the same length. 
     78 
     79    .. attribute:: branches 
     80 
     81        A list of subtrees, each given as a :obj:`Node`. An element 
     82        can be None, in which case the corresponding branch is a null node. 
     83 
     84    .. attribute:: branch_descriptions 
     85 
     86        A list with string descriptions for branches, constructed by 
     87        :obj:`SplitConstructor`. It can contain different kinds of 
     88        descriptions, but typically they look like 'red' or '>12.3'. 
     89 
     90    .. attribute:: branch_sizes 
     91 
     92        The (weighted) number of training instances that went into 
     93        each branch. This can be used later, for instance, for modeling 
     94        probabilities when classifying instances with unknown values. 
     95 
     96    .. attribute:: branch_selector 
     97 
     98        Selects a branch for each instance. The same object is used 
     99        during learning and classifying. The :obj:`branch_selector` 
     100        is of type :obj:`Orange.classification.Classifier`, since its job is 
     101        similar to that of a classifier: it gets an instance and 
     102        returns a discrete :obj:`Orange.data.Value` in the range :samp:`[0, 
     103        len(branches)-1]`.  When an instance cannot be classified to 
     104        any branch, the selector can return a :obj:`Orange.data.Value` 
     105        containing a special value (sVal), which should be a discrete 
     106        distribution (DiscDistribution). It represents the 
     107        :obj:`branch_selector`'s proposal for how to divide the instance 
     108        between the branches. Whether the proposal is used or not 
     109        depends on the chosen :obj:`Splitter` (when learning) 
     110        or :obj:`Descender` (when classifying). 
     111 
     112    .. method:: tree_size() 
     113         
     114        Return the number of nodes in the subtrees (including the node, 
     115        excluding null-nodes). 
    34116 
    35117 
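The structure described above can be illustrated with a small, self-contained sketch in plain Python. It does not use Orange: ``SketchNode``, ``normalize`` and ``class_probabilities`` are hypothetical names, ``branch_selector`` is modelled as a function returning a branch index (or ``None`` where the real selector would return a distribution), and ``None`` entries in ``branches`` stand for null nodes. Falling back to the parent's distribution on a null branch is an assumed policy, not necessarily what Orange's :obj:`Descender` components do; instances with an unknown value are divided between branches in proportion to ``branch_sizes``.

```python
from collections import Counter


class SketchNode:
    """Hypothetical stand-in for Orange's Node; not the real class."""

    def __init__(self, distribution, branch_selector=None, branches=None,
                 branch_descriptions=None, branch_sizes=None):
        self.distribution = Counter(distribution)  # class -> (weighted) count
        self.branch_selector = branch_selector     # instance -> index or None
        self.branches = branches                   # None for a leaf
        self.branch_descriptions = branch_descriptions
        self.branch_sizes = branch_sizes

    def is_leaf(self):
        return self.branches is None

    def tree_size(self):
        """Count this node and all non-null nodes below it."""
        if self.is_leaf():
            return 1
        return 1 + sum(b.tree_size() for b in self.branches if b is not None)


def normalize(dist):
    total = sum(dist.values())
    return {c: n / total for c, n in dist.items()}


def class_probabilities(node, instance):
    """Descend towards the leaves and return a class -> probability dict."""
    if node.is_leaf():
        return normalize(node.distribution)
    index = node.branch_selector(instance)
    if index is not None:
        child = node.branches[index]
        if child is None:  # null node: fall back to this node (assumed policy)
            return normalize(node.distribution)
        return class_probabilities(child, instance)
    # Unknown value: mix the non-null subtrees, weighted by branch_sizes.
    total = sum(s for s, b in zip(node.branch_sizes, node.branches) if b is not None)
    mixed = Counter()
    for size, child in zip(node.branch_sizes, node.branches):
        if child is not None:
            for c, p in class_probabilities(child, instance).items():
                mixed[c] += size / total * p
    return dict(mixed)


color_index = {"red": 0, "green": 1, "blue": 2}
root = SketchNode({"yes": 6, "no": 4},
                  branch_selector=lambda inst: color_index.get(inst["color"]),
                  branches=[SketchNode({"yes": 6}), SketchNode({"no": 4}), None],
                  branch_descriptions=["red", "green", "blue"],
                  branch_sizes=[6, 4, 0])

print(root.tree_size())                             # 3
print(class_probabilities(root, {"color": "red"}))  # {'yes': 1.0}
print(class_probabilities(root, {"color": None}))   # {'yes': 0.6, 'no': 0.4}
```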
     
    97179for nicer output. What matters is everything but the print statements. 
    99180First, we check whether the node is a null-node (a node to which no 
    99 learning examples were classified). If this is so, we just print out 
     181learning instances were classified). If this is so, we just print out 
    100182"<null node>" and return. 
    101183 
    102184After handling null nodes, remaining nodes are internal nodes and 
    103185leaves.  For internal nodes, we print a node description consisting 
    104 of the attribute's name and distribution of classes. :obj:`Node`'s 
     186of the feature's name and distribution of classes. :obj:`Node`'s 
    105187branch description is, for all currently defined splits, an instance 
    106188of a class derived from :obj:`Orange.classification.Classifier`  
     
    114196 
    115197Finally, if the node is a leaf, we print out the distribution of learning 
    116 examples in the node and the class to which the examples in the node 
     198instances in the node and the class to which the instances in the node 
    117199would be classified. We again assume that the :obj:`~Node.node_classifier` is 
    118200the default one - a :obj:`DefaultClassifier`. A better print function 
     
    226308    1.0 0.0 
    227309 
    228 Not very restrictive. This keeps splitting the examples until there's 
    229 nothing left to split or all the examples are in the same class. Let us 
    230 set the minimal subset that we allow to be split to five examples and 
     310Not very restrictive. This keeps splitting the instances until there's 
     311nothing left to split or all the instances are in the same class. Let us 
     312set the minimal subset that we allow to be split to five instances and 
    231313see what comes out. 
    232314 
    233     >>> learner.stop.min_examples = 5.0 
     315    >>> learner.stop.min_instances = 5.0 
    234316    >>> tree = learner(data) 
    235317    >>> print tree.dump() 
     
    257339================================= 
    258340 
    259 Classification trees are represented as a tree-like hierarchy of 
    260 :obj:`Node` classes. 
    261  
    262 Classes :obj:`SplitConstructor`, :obj:`StopCriteria`, 
    263 :obj:`ExampleSplitter`, :obj:`Descender` can be subtyped in Python. You 
    264 can thus program your own components based on these classes (TODO). 
    265  
    266 .. class:: Node 
    267  
    268     Node stores the instances belonging to the node, a branch selector, 
    269     a list of branches (if the node is not a leaf) with their descriptions 
    270     and strengths, and a classifier. 
    271  
    272     .. attribute:: distribution 
    273      
    274         A distribution for learning instances in the 
    275         node. 
    276  
    277     .. attribute:: contingency 
    278  
    279         Complete contingency matrices for the learning instances 
    280         in the node. 
    281  
    282     .. attribute:: examples, weightID 
    283  
    284         Learning instancess for the node and the corresponding ID 
    285         of weight meta attribute. The root of the tree stores all 
    286         instances, while other nodes store only reference to instances 
    287         in the root node. 
    288  
    289     .. attribute:: node_classifier 
    290  
    291         A classifier (usually a :obj:`DefaultClassifier`) that can be used 
    292         to classify instances coming to the node. If the node is a leaf, 
    293         this is used to decide the final class (or class distribution) 
    294         of an instance. If it's an internal node, it is stored if 
    295         :obj:`Node`'s flag :obj:`store_node_classifier` is set. Since 
    296         the :obj:`node_classifier` is needed by :obj:`Descender` and 
    297         for pruning (see far below), this is the default behaviour; 
    298         space consumption of the default :obj:`DefaultClassifier` is 
    299         rather small. You should never disable this if you intend to 
    300         prune the tree later. 
    301  
    302     If the node is a leaf, the remaining fields are None. If it's 
    303     an internal node, there are several additional fields. The lists 
    304     :obj:`branches`, :obj:`branch_descriptions` and :obj:`branch_sizes` 
    305     are of the same length. 
    306  
    307     .. attribute:: branches 
    308  
    309         Stores a list of subtrees, given as :obj:`Node`.  An element 
    310         can be None; in this case the node is empty. 
    311  
    312     .. attribute:: branch_descriptions 
    313  
    314         A list with string descriptions for branches, constructed by 
    315         :obj:`SplitConstructor`. It can contain different kinds of 
    316         descriptions, but basically, expect things like 'red' or '>12.3'. 
    317  
    318     .. attribute:: branch_sizes 
    319  
    320         Gives a (weighted) number of training instances that went into 
    321         each branch. This can be used later, for instance, for modeling 
    322         probabilities when classifying instances with unknown values. 
    323  
    324     .. attribute:: branch_selector 
    325  
    326         Gives a branch for each instance. The same object is used 
    327         during learning and classifying. The :obj:`branch_selector` 
    328         is of type :obj:`Orange.classification.Classifier`, since its job is 
    329         similar to that of a classifier: it gets an instance and 
    330         returns discrete :obj:`Orange.data.Value` in range :samp:`[0, 
    331         len(branches)-1]`.  When an instance cannot be classified to 
    332         any branch, the selector can return a :obj:`Orange.data.Value` 
    333         containing a special value (sVal) which should be a discrete 
    334         distribution (DiscDistribution). This should represent a 
    335         :obj:`branch_selector`'s opinion of how to divide the instance 
    336         between the branches. Whether the proposition will be used or not 
    337         depends upon the chosen :obj:`ExampleSplitter` (when learning) 
    338         or :obj:`Descender` (when classifying). 
    339  
    340     .. method:: tree_size() 
    341          
    342         Return the number of nodes in the subtrees (including the node, 
    343         excluding null-nodes). 
    344  
    345  
    346341Split constructors 
    347342===================== 
    348343 
    349 Split construction is almost as exciting as waiting for a delayed flight. 
    350 Boring, that is. Split constructors juggle with contingency matrices, 
    351 with separate cases for discrete and continuous classes... Most split 
    352 constructors work either for discrete or for continuous attributes. We 
    353 suggest to use a :obj:`SplitConstructor_Combined` that delegates 
    354 attributes to specialized split constructors. 
    355  
    356 Split constructors that cannot handle attributes of particular 
    357 type (discrete, continuous) do not report an error or a warning but 
    358 simply skip the attribute. It is your responsibility to use a correct 
    359 split constructor for your dataset. (May we again suggest using 
    360 :obj:`SplitConstructor_Combined`?) 
     344Split constructors that cannot handle features of a particular type 
     345(discrete, continuous) quietly skip them. Therefore, choose 
     346a split constructor suited to your dataset. We suggest 
     347:obj:`SplitConstructor_Combined`, which delegates features to specialized 
     348split constructors. 
    361349 
    362350The same components can be used for inducing either classification or 
    363351regression trees. The only component that needs to be chosen accordingly 
    364 is the 'measure' attribute for the :obj:`SplitConstructor_Measure` class 
     352is the 'measure' attribute for the :obj:`SplitConstructor_Score` class 
    365353(and derived classes). 
    366354 
     
    368356 
    369357    Finds a suitable criterion for dividing the learning (and later 
    370     testing) examples coming to the node. The data it gets is a set of 
    371     examples (and, optionally, an ID of weight meta-attribute), a domain 
    372     contingency computed from examples, apriori class probabilities, a 
    373     list of candidate attributes it should consider and a node classifier 
    374     (if it was constructed, that is, if :obj:`store_node_classifier` 
    375     is left true). 
    376  
    377     The :obj:`SplitConstructor` should use the domain contingency 
    378     when possible. The reasons are two-fold; one is that it's faster 
    379     and the other is that the contingency matrices are not necessarily 
    380     constructed by simply counting the examples. Why and how is 
    381     explained later. There are, however, cases, when domain contingency 
    382     does not suffice, for examples, when ReliefF is used as a measure 
    383     of quality of attributes. In this case, there's no other way but to 
    384     use the examples and ignore the precomputed contingencies. 
    385  
    386     :obj:`SplitConstructor` returns most of the data we talked 
    387     about when describing the :obj:`Node`. It returns a classifier 
    388     to be used as :obj:`Node`'s :obj:`branch_selector`, a list of branch 
    389     descriptions and a list with the number of examples that go into 
    390     each branch. Just what we need for the :obj:`Node`.  It can return 
    391     an empty list for the number of examples in branches; in this case, 
    392     the :obj:`TreeLearner` will find the number itself after splitting 
    393     the example set into subsets. However, if a split constructors can 
    394     provide the numbers at no extra computational cost, it should do so. 
    395  
    396     In addition, it returns a quality of the split; a number without 
    397     any fixed meaning except that higher numbers mean better splits. 
    398  
    399     If the constructed splitting criterion uses an attribute in such 
    400     a way that the attribute is 'completely spent' and should not be 
     358    testing) instances.  
     359     
     360    The :obj:`SplitConstructor` should use the domain contingency when 
     361    possible, both because it's faster and because the contingency 
     362    matrices are not necessarily constructed by simply counting the 
     363    instances. There are, however, cases when domain contingency does not 
     364    suffice, for example, when ReliefF is used to score features. 
     365 
     366    :obj:`SplitConstructor` returns a classifier to be used as 
     367    :obj:`Node`'s :obj:`~Node.branch_selector`, a list of branch descriptions, 
     368    a list with the number of instances that go into each branch 
     369    (if empty, the :obj:`TreeLearner` will find the number itself after 
     370    splitting the instances into subsets), and a split quality (a number without 
     371    any fixed meaning except that higher numbers mean better splits). 
     372 
     373    If the constructed splitting criterion uses a feature in such 
     374    a way that the feature will be useless in the future and should not be 
    401375    considered as a split criterion in any of the subtrees (the typical 
    402     case of this are discrete attributes that are used as-they-are, that 
    403     is, without any binarization or subsetting), then it should report 
    404     the index of this attribute. Some splits do not spend any attribute; 
     376    case being a discrete feature used as-is, 
     377    without any binarization or subsetting), then it should report 
     378    the index of this feature. Some splits do not spend any features; 
    405379    this is indicated by returning a negative index. 
    406380 
    407381    A :obj:`SplitConstructor` can veto the further tree induction 
    408382    by returning no classifier. This can happen for many reasons. 
    409     A general one is related to number of examples in the branches. 
     383    A common one is related to the number of instances in the branches. 
    410384    :obj:`SplitConstructor` has a field :obj:`min_subset`, which sets 
    411     the minimal number of examples in a branch; null nodes, however, 
     385    the minimal number of instances in a branch; null nodes 
    412386    are allowed. If there is no split where this condition is met, 
    413387    :obj:`SplitConstructor` stops the induction. 
     
    415389    .. attribute:: min_subset 
    416390 
    417         Sets the minimal number of examples in non-null leaves. As 
    418         always in Orange (where not specified otherwise), "number of  
    419         examples" refers to the weighted number of examples. 
    420      
    421     .. method:: __call__(examples, [weightID=0, apriori_distribution, candidates])  
    422  
    423         Construct a split. Returns a tuple (:obj:`branch_selector`, 
    424         :obj:`branch_descriptions`, :obj:`subsetSizes`, :obj:`quality`, 
    425         :obj:`spentAttribute`). :obj:`spentAttribute` is -1 if no 
    426         attribute is completely spent by the split criterion. If no 
     391        The minimal (weighted) number of instances in non-null leaves. 
     392 
     393    .. method:: __call__(instances[, weightID, contingency, apriori_distribution, candidates, clsfr])  
     394 
     395        :param instances: Instances can be given in any acceptable form 
     396            (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a 
     397            list of instances). 
     398        :param weightID: Optional; the default of 0 means that all 
     399            instances have a weight of 1.0.  
     400        :param contingency: A domain contingency. 
     401        :param apriori_distribution: Apriori class probabilities. 
     402        :type apriori_distribution: :obj:`Orange.statistics.distribution.Distribution` 
     403        :param candidates: The split constructor should consider only  
     404            the features in the candidate list (one boolean for each 
     405            feature). 
     406        :param clsfr: A node classifier (if it was constructed, that is,  
     407            if :obj:`store_node_classifier` is True). 
     408 
     409        Construct a split. Return a tuple (:obj:`branch_selector`, 
     410        :obj:`branch_descriptions`, :obj:`subset_sizes`, :obj:`quality`, 
     411        :obj:`spent_feature`). :obj:`spent_feature` is -1 if no 
     412        feature is completely spent by the split criterion. If no 
    427413        split is constructed, the :obj:`branch_selector`, :obj:`branch_descriptions` 
    428         and :obj:`subsetSizes` are None, while :obj:`quality` is 0.0 and 
    429         :obj:`spentAttribute` is -1. 
    430  
    431         :param examples:  Examples can be given in any acceptable form 
    432             (an :obj:`ExampleGenerator`, such as :obj:`ExampleTable`, or a 
    433             list of examples). 
    434         :param weightID: Optional; the default of 0 means that all 
    435             examples have a weight of 1.0.  
    436         :param apriori-distribution: Should be of type  
    437             :obj:`Orange.statistics.distribution.Distribution` and candidates should be a Python  
    438             list of objects which are interpreted as booleans. 
    439         :param candidates: The split constructor should consider only  
    440             the attributes in the candidate list (one boolean for each 
    441             attribute). 
    442  
    443  
    444 .. class:: SplitConstructor_Measure 
     414        and :obj:`subset_sizes` are None, while :obj:`quality` is 0.0 and 
     415        :obj:`spent_feature` is -1.  
     416 
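A minimal sketch of the return convention and the :obj:`min_subset` veto, assuming candidate splits are already available as 5-tuples in the order documented above. ``Split``, ``NO_SPLIT``, ``satisfies_min_subset`` and ``pick_split`` are hypothetical names, not Orange API, and ties are resolved by taking the first best candidate rather than a random one.

```python
from typing import Callable, List, Optional, Tuple

# (branch_selector, branch_descriptions, subset_sizes, quality, spent_feature)
Split = Tuple[Optional[Callable], Optional[List[str]],
              Optional[List[float]], float, int]

NO_SPLIT: Split = (None, None, None, 0.0, -1)


def satisfies_min_subset(subset_sizes, min_subset):
    """Null branches (weight 0) are allowed; other branches need min_subset."""
    return all(size == 0 or size >= min_subset for size in subset_sizes)


def pick_split(candidates, min_subset):
    """Return the best admissible candidate, or NO_SPLIT to veto induction."""
    valid = [c for c in candidates if satisfies_min_subset(c[2], min_subset)]
    if not valid:
        return NO_SPLIT
    return max(valid, key=lambda c: c[3])  # the first one wins on ties


colors = ["red", "green", "blue"]
sizes = ["small", "large"]
by_color = (lambda inst: colors.index(inst["color"]), colors,
            [6.0, 4.0, 0.0], 0.30, 0)
by_size = (lambda inst: sizes.index(inst["size"]), sizes,
           [9.0, 1.0], 0.45, 1)

print(pick_split([by_color, by_size], min_subset=2.0)[1])  # ['red', 'green', 'blue']
print(pick_split([by_size], min_subset=2.0) == NO_SPLIT)   # True
```

Here ``by_size`` scores higher but is rejected because one of its branches holds only 1.0 (weighted) instance; the empty third branch of ``by_color`` is a permitted null node.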
     417.. class:: SplitConstructor_Score 
    445418 
    446419    Bases: :class:`SplitConstructor` 
    447420 
    448421    An abstract base class for split constructors that employ 
    449     a :class:`Orange.feature.scoring.Measure` to assess a 
     422    a :class:`Orange.feature.scoring.Score` to assess the 
    450423    quality of a split.  All split constructors except for 
    451424    :obj:`SplitConstructor_Combined` are derived from this class. 
     
    453426    .. attribute:: measure 
    454427 
    455         A component of type :class:`Orange.feature.scoring.Measure` 
     428        A component of type :class:`Orange.feature.scoring.Score` 
    456429        used for split evaluation. You must select a 
    457         :class:`Orange.feature.scoring.Measure` capable of 
     430        :class:`Orange.feature.scoring.Score` capable of 
    458431        handling your class type - for example, you cannot use 
    459         :class:`Orange.feature.scoring.GainRatio` for building regression 
     432        :class:`Orange.feature.scoring.GainRatio` for regression 
    460433        trees or :class:`Orange.feature.scoring.MSE` for classification 
    461434        trees. 
     
    467440        :obj:`measure` component. Default is 0.0. 
    468441 
    469 .. class:: SplitConstructor_Attribute 
    470  
    471     Bases: :class:`SplitConstructor_Measure` 
    472  
    473     Attempts to use a discrete attribute as a split; each value of 
    474     the attribute corresponds to a branch in the tree. Attributes are 
    475     evaluated with the :obj:`measure` and the one with the highest score 
    476     is used for a split. If there is more than one attribute with the 
    477     highest score, one of them is selected by random. 
     442.. class:: SplitConstructor_Feature 
     443 
     444    Bases: :class:`SplitConstructor_Score` 
     445 
     446    Each value of a discrete feature corresponds to a branch 
     447    in the tree. The feature with the highest score 
     448    (computed by :obj:`measure`) is used for the split. If multiple features 
     449    are tied for the highest score, one of them is selected at random. 
    478450 
    479451    The constructed :obj:`branch_selector` is an instance of 
     452    :obj:`orange.ClassifierFromVarFD` that returns a value of the 
     453    selected feature. :obj:`branch_descriptions` are the feature's 
     454    values. The feature is marked as spent, so that it cannot reappear 
     455    in the node's subtrees. 
     456 
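The selection performed by :obj:`SplitConstructor_Feature` can be sketched in plain Python as follows. Information gain stands in for the :obj:`measure` component (an assumption; any suitable :class:`Orange.feature.scoring.Score` could take its place), instances are tuples of discrete values, and ``entropy``, ``info_gain`` and ``split_on_feature`` are hypothetical names.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_on_feature(rows, labels, candidates, values, rng=random):
    """Return (selector, descriptions, sizes, quality, spent_feature)."""
    scores = {f: info_gain(rows, labels, f)
              for f, allowed in enumerate(candidates) if allowed}
    if not scores:
        return (None, None, None, 0.0, -1)
    best = max(scores.values())
    # Ties for the highest score are broken at random.
    feature = rng.choice([f for f, s in scores.items() if s == best])
    index = {v: i for i, v in enumerate(values[feature])}

    def selector(row):  # one branch per value of the chosen feature
        return index[row[feature]]

    sizes = [0] * len(values[feature])
    for row in rows:
        sizes[selector(row)] += 1
    # The feature is used as-is, so it is reported as spent.
    return (selector, list(values[feature]), sizes, best, feature)


rows = [("red", "s"), ("red", "l"), ("green", "s"), ("green", "l")]
labels = ["yes", "yes", "no", "no"]
values = [["red", "green"], ["s", "l"]]
selector, descriptions, sizes, quality, spent = split_on_feature(
    rows, labels, [True, True], values)
print(descriptions, sizes, spent)  # ['red', 'green'] [2, 2] 0
```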
     457.. class:: SplitConstructor_ExhaustiveBinary 
     458 
     459    Bases: :class:`SplitConstructor_Score` 
     460 
     461    Works on discrete features. For each feature, it determines which 
     462    binarization gives the highest score. In case of ties, a random 
     463    feature is selected. 
     464 
     465    The constructed :obj:`branch_selector` is an instance of 
    480466    :obj:`orange.ClassifierFromVarFD` that returns a value of the selected 
    481     attribute. If the attribute is :obj:`Orange.data.variable.Discrete`, 
    482     :obj:`branch_description`'s are the attribute's values. The attribute 
    483     is marked as spent, so that it cannot reappear in the node's subtrees. 
    484  
    485 .. class:: SplitConstructor_ExhaustiveBinary 
    486  
    487     Bases: :class:`SplitConstructor_Measure` 
    488  
    489     Works on discrete attributes. For each attribute, it determines 
    490     which binarization of the attribute gives the split with the 
    491     highest score. If more than one split has the highest score, one 
    492     of them is selected by random. After trying all the attributes, 
    493     it returns one of those with the highest score. 
    494  
    495     The constructed :obj:`branch_selector` is again an instance 
    496     :obj:`orange.ClassifierFromVarFD` that returns a value of the 
    497     selected attribute. This time, however, its :obj:`transformer` 
    498     contains an instance of :obj:`MapIntValue` that maps the values of 
    499     the attribute into a binary attribute. Branch descriptions are of 
    500     form "[<val1>, <val2>, ...<valn>]" for branches corresponding to 
    501     more than one value of the attribute. Branches that correspond to a 
    502     single value of the attribute are described with this value. If the 
    503     attribute was originally binary, it is spent and cannot be used in 
    504     the node's subtrees. Otherwise, it can reappear in the subtrees. 
     467    feature. Its :obj:`transformer` contains a :obj:`MapIntValue` 
     468    that maps values of the feature into a binary feature. Branch 
     469    descriptions are of form ``[<val1>, <val2>, ...<valn>]`` for branches 
     470    with more than one feature value. Branches with a single feature 
     471    value are described with that value. If the feature was binary, 
     472    it is spent and cannot be used in the node's subtrees. Otherwise, 
     473    it can reappear in the subtrees. 
    505474 
    506475 
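A sketch of the exhaustive search over binarizations, under the same assumptions as before (plain Python, information gain as the score, hypothetical function names). Fixing the first value in the left group ensures each two-way partition is generated only once, so a feature with n values yields 2^(n-1) - 1 candidate binarizations. Descriptions follow the format given above.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def binarizations(values):
    """Yield every split of values into two non-empty groups exactly once."""
    first, rest = values[0], values[1:]
    for k in range(len(rest)):  # k < len(rest) keeps the right group non-empty
        for extra in combinations(rest, k):
            left = [first, *extra]
            yield left, [v for v in values if v not in left]


def describe(group):
    # A single value describes itself; larger groups become "[v1, v2, ...]".
    return group[0] if len(group) == 1 else "[" + ", ".join(group) + "]"


def best_binarization(pairs, values, rng=random):
    """pairs: (feature value, class) tuples. Returns (descriptions, gain, spent)."""
    labels = [label for _, label in pairs]
    scored = []
    for left, right in binarizations(values):
        groups = [[label for v, label in pairs if v in side]
                  for side in (left, right)]
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups if g)
        scored.append((entropy(labels) - remainder, left, right))
    best = max(gain for gain, _, _ in scored)
    _, left, right = rng.choice([t for t in scored if t[0] == best])
    spent = len(values) == 2  # only an originally binary feature is spent
    return [describe(left), describe(right)], best, spent


pairs = [("red", "yes"), ("red", "yes"), ("green", "yes"),
         ("green", "yes"), ("blue", "no"), ("blue", "no")]
descriptions, gain, spent = best_binarization(pairs, ["red", "green", "blue"])
print(descriptions, spent)  # ['[red, green]', 'blue'] False
```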
    507476.. class:: SplitConstructor_Threshold 
    508477 
    509     Bases: :class:`SplitConstructor_Measure` 
    510  
    511     This is currently the only constructor for splits with continuous  
    512     attributes. It divides the range of attributes values with a threshold  
    513     that maximizes the split's quality. As always, if there is more than 
    514     one split with the highest score, a random threshold is selected. 
    515     The attribute that yields the highest binary split is returned. 
    516  
    517     The constructed :obj:`branch_selector` is again an instance 
    518     of :obj:`orange.ClassifierFromVarFD` with an attached 
    519     :obj:`transformer`. This time, :obj:`transformer` is of type 
    520     :obj:`Orange.feature.discretization.ThresholdDiscretizer`. The branch 
    521     descriptions are "<threshold" and ">=threshold". The attribute is 
    522     not spent. 
     478    Bases: :class:`SplitConstructor_Score` 
     479 
     480    Currently the only split constructor for continuous features. It divides the 
     481    range of feature values with a threshold that maximizes the split's 
     482    quality. In case of ties, a random threshold is selected. The feature 
     483    that yields the best binary split is returned. 
     484 
     485    The constructed :obj:`branch_selector` is an instance of 
     486    :obj:`orange.ClassifierFromVarFD` with an attached :obj:`transformer`, 
     487    of type :obj:`Orange.feature.discretization.ThresholdDiscretizer`. The 
     488    branch descriptions are "<threshold" and ">=threshold". The feature 
     489    is not spent. 
    523490 
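The threshold search can be sketched in the same style. Taking midpoints between adjacent distinct values as candidate thresholds is an assumption of this sketch, as is information gain as the score; ``best_threshold`` is a hypothetical name, and ties keep the first threshold found instead of a random one.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, gain, descriptions) for the best split x < t / x >= t,
    or None if the feature has fewer than two distinct values."""
    base, n = entropy(ys), len(ys)
    distinct = sorted(set(xs))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2  # assumed: midpoints as candidate thresholds
        below = [y for x, y in zip(xs, ys) if x < t]
        above = [y for x, y in zip(xs, ys) if x >= t]
        gain = base - (len(below) / n * entropy(below)
                       + len(above) / n * entropy(above))
        if best is None or gain > best[1]:  # the first threshold wins on ties
            best = (t, gain)
    if best is None:
        return None
    t, gain = best
    return t, gain, ["<%g" % t, ">=%g" % t]


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(xs, ys))  # (6.5, 1.0, ['<6.5', '>=6.5'])
```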
    524491.. class:: SplitConstructor_OneAgainstOthers 
    525492     
    526     Bases: :class:`SplitConstructor_Measure` 
     493    Bases: :class:`SplitConstructor_Score` 
    527494 
    528495    Undocumented. 
     
    534501    This constructor delegates the task of finding the optimal split  
    535502    to separate split constructors for discrete and for continuous 
    536     attributes. Each split constructor is called, given only attributes 
    537     of appropriate types as candidates. Both construct a candidate for 
     503    features. Each split constructor is called given only features 
     504    of the appropriate type. Both construct a candidate for 
    538505    a split; the better of them is selected. 
    539506 
    540     (Note that there is a problem when more candidates have the same 
    541     score. Let there be are nine discrete attributes with the highest 
    542     score; the split constructor for discrete attributes will select 
     507    Note that there is a problem when several candidates have the same 
     508    score. Suppose there are nine discrete features with the highest 
     509    score; the split constructor for discrete features will select 
    543510    one of them. Now, let us suppose that there is a single continuous 
    544     attribute with the same score. :obj:`SplitConstructor_Combined` 
    545     would randomly select between the proposed discrete attribute and 
    546     the continuous attribute, not aware of the fact that the discrete 
    547     has already competed with eight other discrete attributes. So, he 
    548     probability for selecting (each) discrete attribute would be 1/18 
     511    feature with the same score. :obj:`SplitConstructor_Combined` 
     512    would randomly select between the proposed discrete feature and 
     513    the continuous feature, unaware that the discrete feature 
     514    has already competed with eight other discrete features. So, the 
     515    probability for selecting (each) discrete feature would be 1/18 
    549516    instead of 1/10. Although not really correct, we doubt that this would 
    550     affect the tree's performance; many other machine learning systems 
    551     simply choose the first attribute with the highest score anyway.) 
     517    affect the tree's performance. 
    552518 
    553519    The :obj:`branch_selector`, :obj:`branch_descriptions` and whether 
    554     the attribute is spent is decided by the winning split constructor. 
     520    the feature is spent is decided by the winning split constructor. 
    555521 
    556522    .. attribute:: discrete_split_constructor 
    557523 
    558         Split constructor for discrete attributes; can be, 
    559         for instance, :obj:`SplitConstructor_Attribute` or 
     524        Split constructor for discrete features, 
     525        for instance, :obj:`SplitConstructor_Feature` or 
    560526        :obj:`SplitConstructor_ExhaustiveBinary`. 
    561527 
    562528    .. attribute: continuous_split_constructor 
    563529 
    564         Split constructor for continuous attributes; at the moment, it  
     530        Split constructor for continuous features; it  
    565531        can be either :obj:`SplitConstructor_Threshold` or a  
    566532        split constructor you programmed in Python. 
    567533 
    568     .. attribute: continuous_split_constructor 
    569      
    570         Split constructor for continuous attributes; at the moment, 
    571         it can be either :obj:`SplitConstructor_Threshold` or a split 
    572         constructor you programmed in Python. 
    573  
    574534 
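The tie-breaking bias described in the note on :obj:`SplitConstructor_Combined` can be checked with exact arithmetic. The sketch assumes that both specialized constructors return tied candidates and that every random choice is uniform; ``combined_selection_probs`` is a hypothetical helper.

```python
from fractions import Fraction


def combined_selection_probs(n_discrete_tied, n_continuous_tied):
    """Chance that one particular tied feature is chosen.

    Each specialized constructor picks uniformly among its own tied
    features; the combined constructor then picks uniformly between
    the two proposals. 'fair' is the uniform chance over all ties."""
    per_discrete = Fraction(1, n_discrete_tied) * Fraction(1, 2)
    per_continuous = Fraction(1, n_continuous_tied) * Fraction(1, 2)
    fair = Fraction(1, n_discrete_tied + n_continuous_tied)
    return per_discrete, per_continuous, fair


print(combined_selection_probs(9, 1))
# (Fraction(1, 18), Fraction(1, 2), Fraction(1, 10))
```

With nine tied discrete features and one tied continuous feature, each discrete feature wins with probability 1/18 while the continuous one wins with probability 1/2, instead of 1/10 each.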
    575535StopCriteria and StopCriteria_common 
    576536============================================ 
    577537 
    578 obj:`StopCriteria` determines when to stop the induction of subtrees.  
     538:obj:`StopCriteria` determines when to stop the induction of subtrees.  
    579539 
    580540.. class:: StopCriteria 
    581541 
    582     Given a set of examples, weight ID and contingency matrices, decide 
     542    Decide 
    583543    whether to continue the induction or not. The basic criterion checks 
    584     whether there are any examples and whether they belong to at least 
     544    if there are any instances and if they belong to at least 
    585545    two different classes (if the class is discrete). Derived components 
    586     check things like the number of examples and the proportion of 
     546    check things like the number of instances and the proportion of 
    587547    majority classes. 
    588548 
    589     As opposed to :obj:`SplitConstructor` and similar basic classes, 
    590549    :obj:`StopCriteria` is not an abstract but a fully functional 
    591550    class that provides the basic stopping criteria. That is, the tree 
    592     induction stops when there is at most one example left; in this case, 
    593     it is not the weighted but the actual number of examples that counts. 
    594     Besides that, the induction stops when all examples are in the same 
    595     class (for discrete problems) or have the same value of the outcome 
     551    induction stops when at most one instance is left (the actual,
     552    not the weighted, number of instances counts).
     553    The induction also stops when all instances are in the same 
     554    class (for discrete problems) or have the same outcome value  
    596555    (for regression problems). 
    597556 
    598     .. method:: __call__(examples[, weightID, domain contingencies]) 
    599  
    600         Decides whether to stop (true) or continue (false) the induction. 
     557    .. method:: __call__(instances[, weightID, domain contingencies]) 
     558 
     559        Return True (stop) or False (continue the induction).
    601560        If contingencies are given, they are used for checking whether 
    602         the examples are in the same class (but not for counting the 
    603         examples). Derived classes should use the contingencies whenever 
    604         possible. If contingencies are not given, :obj:`StopCriteria` 
    605         will work without them. Derived classes should also use them if 
    606         they are available, but otherwise compute them only when they 
    607         really need them. 
     561        the instances are in the same class (but not for
     562        instance counting). Derived classes should use the contingencies 
     563        whenever possible. 
    608564 
    609565.. class:: StopCriteria_common 
    610566 
    611     :obj:`StopCriteria` contains additional criteria for pre-pruning: 
    612     it checks the proportion of majority class and the number of weighted 
    613     examples. 
     567    Additional criteria for pre-pruning: 
     568    the proportion of majority class and the number of weighted 
     569    instances. 
    614570 
    615571    .. attribute:: max_majority 
    616572 
    617         Maximal proportion of majority class. When this is exceeded, 
     573        Maximal proportion of majority class. When exceeded, 
    618574        induction stops. 
    619575 
    620     .. attribute:: min_examples 
    621  
    622         Minimal number of examples in internal leaves. Subsets with 
    623         less than :obj:`min_examples` examples are not split any further. 
    624         Example count is weighed. 
    625  
    626  
    627 Example Splitters 
     576    .. attribute:: min_instances 
     577 
     578        Minimal number of instances in internal nodes. Subsets with
     579        fewer than :obj:`min_instances` instances are not split any further.
     580        The instance count is weighted.
     581 
     582 
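The two sets of stopping rules above can be illustrated with a small plain-Python sketch. It does not use Orange: instances are assumed to be `(class_value, weight)` pairs, the class is assumed to be discrete, and the names `basic_stop` and `common_stop` are hypothetical. Computing the majority proportion from weighted counts is also an assumption of this sketch.

```python
from collections import Counter


def basic_stop(instances):
    """Hypothetical sketch of the basic StopCriteria rule.

    `instances` is a list of (class_value, weight) pairs.
    Returns True to stop the induction, False to continue.
    """
    # At most one instance left: the actual count matters, not the weights.
    if len(instances) <= 1:
        return True
    # Stop if all instances belong to the same (discrete) class.
    return len({cls for cls, _ in instances}) == 1


def common_stop(instances, max_majority=1.0, min_instances=0.0):
    """Hypothetical sketch of StopCriteria_common: basic rule plus pre-pruning."""
    if basic_stop(instances):
        return True
    # Weighted class distribution of the instances in the node (assumption).
    dist = Counter()
    for cls, weight in instances:
        dist[cls] += weight
    total = sum(dist.values())
    # Too few (weighted) instances: do not split any further.
    if total < min_instances:
        return True
    # Stop when the majority class proportion exceeds the threshold.
    return max(dist.values()) / total > max_majority
```

For three instances with weights 1, 1 and 0.5 (two of class `yes`), the majority proportion is 2 / 2.5 = 0.8, so `common_stop` stops with `max_majority=0.7` but continues with the permissive defaults.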
     583Splitters 
    628584================= 
    629585 
    630586Just like the :obj:`Descender` decides the branch for an 
    631 example during classification, the :obj:`ExampleSplitter` 
    632 sorts the learning examples into branches. 
    633  
    634 :obj:`ExampleSplitter` is given a :obj:`Node` (from which  
     587instance during classification, the :obj:`Splitter` 
     588sorts the learning instances into branches. 
     589 
     590:obj:`Splitter` is given a :obj:`Node` (from which  
    635591it can use various information, but most splitters only use the
    636 :obj:`branch_selector`), a set of examples to be divided, and  
    637 the weight ID. The result is a list of subsets of examples 
     592:obj:`branch_selector`), a set of instances to be divided, and  
     593the weight ID. The result is a list of subsets of instances 
    638594and, optionally, a list of new weight ID's. 
    639595 
    640 Most :obj:`ExampleSplitter` classes simply call the node's 
    641 :obj:`branch_selector` and assign examples to corresponding branches. When 
     596Most :obj:`Splitter` classes simply call the node's 
     597:obj:`branch_selector` and assign instances to corresponding branches. When 
    642598the value is unknown they choose a particular branch or simply skip 
    643 the example. 
    644  
    645 Some enhanced splitters can split examples. An example (actually, a 
     599the instance. 
     600 
     601Some enhanced splitters can split instances. An instance (actually, a 
    646602pointer to it) is copied to more than one subset. To facilitate real 
    647603splitting, weights are needed. Each branch is assigned a weight ID (each 
    648 would usually have its own ID) and all examples that are in that branch 
     604would usually have its own ID) and all instances that are in that branch 
    649605(either completely or partially) should have this meta attribute. If an 
    650 example hasn't been split, it has only one additional attribute - with 
     606instance hasn't been split, it has only one additional attribute - with 
    651607weight ID corresponding to the subset to which it went. An instance that
    652608is split between, say, three subsets, has three new meta attributes, 
    653609one for each subset. ID's of weight meta attributes are returned by 
    654 the :obj:`ExampleSplitter` to be used at induction of the corresponding 
     610the :obj:`Splitter` to be used at induction of the corresponding 
    655611subtrees. 
    656612 
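A minimal sketch of the weight bookkeeping described above, in plain Python rather than Orange's API. Each instance is assumed to be a dict with a `value` and a `weights` dict mapping weight IDs to weights (a stand-in for weight meta attributes). The names `split_with_weights`, `branch_of` and `proportions` are hypothetical, and dividing an undecided instance by fixed proportions is only one possible policy.

```python
import itertools

# Hypothetical source of fresh weight IDs (stand-ins for meta attribute IDs).
_fresh_ids = itertools.count(1)


def split_with_weights(instances, branch_of, proportions, weight_id):
    """Sort instances into branches, giving every branch its own weight ID.

    instances   -- list of dicts {"value": ..., "weights": {weight_id: weight}}
    branch_of   -- maps a value to a branch index, or None if undecided
    proportions -- share of an undecided instance that each branch receives
    weight_id   -- ID of the weight used in the parent node
    Returns (subsets, weight_ids), one entry per branch.
    """
    weight_ids = [next(_fresh_ids) for _ in proportions]
    subsets = [[] for _ in proportions]
    for inst in instances:
        w = inst["weights"].get(weight_id, 1.0)
        branch = branch_of(inst["value"])
        if branch is not None:
            # Not split: one new weight, for the subset it went to.
            inst["weights"][weight_ids[branch]] = w
            subsets[branch].append(inst)
        else:
            # Split: the same object (a reference, not a copy) goes into
            # every subset, with one new weight per subset.
            for b, share in enumerate(proportions):
                inst["weights"][weight_ids[b]] = w * share
                subsets[b].append(inst)
    return subsets, weight_ids
```

As long as the proportions sum to one, the weights an undecided instance receives across the branches add up to its original weight.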
     
    660616 
    661617 
    662 .. class:: ExampleSplitter 
    663  
    664     An abstract base class for objects that split sets of examples 
    665     into subsets. The derived classes treat examples which cannot be 
     618.. class:: Splitter 
     619 
     620    An abstract base class for objects that split sets of instances 
     621    into subsets. The derived classes treat instances which cannot be 
    666622    unambiguously placed into a single branch (usually due to unknown 
    667623    value of the crucial attribute) differently. 
    668624 
    669     .. method:: __call__(node, examples[, weightID]) 
     625    .. method:: __call__(node, instances[, weightID]) 
    670626         
    671627        Use the information in :obj:`node` (particularly the 
    672         :obj:`branch_selector`) to split the given set of examples into 
    673         subsets.  Return a tuple with a list of example generators and 
     628        :obj:`branch_selector`) to split the given set of instances into 
     629        subsets.  Return a tuple with a list of instance generators and 
    674630        a list of weights.  The list of weights is either an ordinary 
    675         python list of integers or a None when no splitting of examples 
     631        Python list of integers or None when no splitting of instances
    676632        occurs and thus no weights are needed. 
    677633 
    678634 
    679 .. class:: ExampleSplitter_IgnoreUnknowns 
    680  
    681     Bases: :class:`ExampleSplitter` 
    682  
    683     Simply ignores the examples for which no single branch can be 
     635.. class:: Splitter_IgnoreUnknowns 
     636 
     637    Bases: :class:`Splitter` 
     638 
     639    Simply ignores the instances for which no single branch can be 
    684640    determined. 
    685641 
    686 .. class:: ExampleSplitter_UnknownsToCommon 
    687  
    688     Bases: :class:`ExampleSplitter` 
    689  
    690     Places all such examples to a branch with the highest number of 
    691     examples. If there is more than one such branch, one is selected at 
    692     random and then used for all examples. 
    693  
    694 .. class:: ExampleSplitter_UnknownsToAll 
    695  
    696     Bases: :class:`ExampleSplitter` 
    697  
    698     Places examples with unknown value of the attribute into all branches. 
    699  
    700 .. class:: ExampleSplitter_UnknownsToRandom 
    701  
    702     Bases: :class:`ExampleSplitter` 
    703  
    704     Selects a random branch for such examples. 
    705  
    706 .. class:: ExampleSplitter_UnknownsToBranch 
    707  
    708     Bases: :class:`ExampleSplitter` 
    709  
    710     Constructs an additional branch to contain all such examples.  
     642.. class:: Splitter_UnknownsToCommon 
     643 
     644    Bases: :class:`Splitter` 
     645 
     646    Places all such instances into the branch with the highest number of
     647    instances. If there is more than one such branch, one is selected at 
     648    random and then used for all instances. 
     649 
     650.. class:: Splitter_UnknownsToAll 
     651 
     652    Bases: :class:`Splitter` 
     653 
     654    Places instances with unknown value of the attribute into all branches. 
     655 
     656.. class:: Splitter_UnknownsToRandom 
     657 
     658    Bases: :class:`Splitter` 
     659 
     660    Selects a random branch for such instances. 
     661 
     662.. class:: Splitter_UnknownsToBranch 
     663 
     664    Bases: :class:`Splitter` 
     665 
     666    Constructs an additional branch to contain all such instances.  
    711667    The branch's description is "unknown". 
    712668 
    713 .. class:: ExampleSplitter_UnknownsAsBranchSizes 
    714  
    715     Bases: :class:`ExampleSplitter` 
    716  
    717     Splits examples with unknown value of the attribute according to 
    718     proportions of examples in each branch. 
    719  
    720 .. class:: ExampleSplitter_UnknownsAsSelector 
    721  
    722     Bases: :class:`ExampleSplitter` 
    723  
    724     Splits examples with unknown value of the attribute according to 
     669.. class:: Splitter_UnknownsAsBranchSizes 
     670 
     671    Bases: :class:`Splitter` 
     672 
     673    Splits instances with unknown value of the attribute according to 
     674    proportions of instances in each branch. 
     675 
     676.. class:: Splitter_UnknownsAsSelector 
     677 
     678    Bases: :class:`Splitter` 
     679 
     680    Splits instances with unknown value of the attribute according to 
    725681    distribution proposed by selector (which is in most cases the same 
    726     as proportions of examples in branches). 
     682    as proportions of instances in branches). 
    727683 
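For the splitters that never divide an instance, the routing of undecided instances can be sketched in plain Python. The policy strings below are hypothetical and only loosely mirror :obj:`Splitter_IgnoreUnknowns`, :obj:`Splitter_UnknownsToCommon`, :obj:`Splitter_UnknownsToRandom` and :obj:`Splitter_UnknownsToBranch`. The return value follows the `(subsets, weights)` convention of `Splitter.__call__`, with None because no instance is divided.

```python
import random


def split_unknowns(values, branch_of, n_branches, policy, rng=random):
    """Hypothetical sketch of how non-dividing splitters route instances.

    values     -- attribute values of the instances to split
    branch_of  -- returns a branch index, or None if no single branch applies
    policy     -- "ignore", "to_common", "to_random" or "to_branch"
    Returns (subsets, None); no instance is divided, so no weights are needed.
    """
    subsets = [[] for _ in range(n_branches)]
    unknown = []
    for v in values:
        b = branch_of(v)
        (unknown if b is None else subsets[b]).append(v)
    if policy == "to_common":
        # One largest branch (random among ties) receives all of them.
        largest = max(len(s) for s in subsets)
        target = rng.choice([i for i, s in enumerate(subsets) if len(s) == largest])
        subsets[target].extend(unknown)
    elif policy == "to_random":
        # Each such instance independently goes to a random branch.
        for v in unknown:
            subsets[rng.randrange(n_branches)].append(v)
    elif policy == "to_branch":
        # An additional branch (described as "unknown") holds them all.
        subsets.append(unknown)
    elif policy != "ignore":
        raise ValueError("unknown policy: %r" % policy)
    # With "ignore", undecided instances are simply dropped.
    return subsets, None
```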
    728684Descenders 
    729685============================= 
    730686 
    731 This is a classifier's counterpart for :class:`ExampleSplitter`. It 
    732 decides the destiny of examples that need to be classified and cannot 
     687This is a classifier's counterpart for :class:`Splitter`. It 
     688decides what happens to instances that need to be classified and cannot
    733689be unambiguously put in a branch. 
    734690 
     
    763719       instances that were assigned to each branch, or to something else. 
    764720 
    765     .. method:: __call__(node, example) 
     721    .. method:: __call__(node, instance) 
    766722 
    767723        Descends down the tree until it reaches a leaf or a node in 
     
    771727        weights of votes for subtrees (a list of floats). 
    772728 
    773         :obj:`Descender`'s that never split examples always descend to a 
    774         leaf, but they differ in the treatment of examples with unknown 
    775         values (or, in general, examples for which a branch cannot be 
     729        :obj:`Descender` classes that never split instances always descend to a
     730        leaf, but they differ in the treatment of instances with unknown
     731        values (or, in general, instances for which a branch cannot be
    776732        determined at some node(s) of the tree).  :obj:`Descender` classes that
    777         do split examples differ in returned vote weights. 
     733        do split instances differ in returned vote weights. 
    778734 
    779735.. class:: Descender_UnknownsToNode 
     
    781737    Bases: :obj:`Descender` 
    782738 
    783     When example cannot be classified into a single branch, the current 
     739    When an instance cannot be classified into a single branch, the current
    784740    node is returned. Thus, the node's :obj:`NodeClassifier` will be used 
    785741    to make a decision. It is your responsibility to see that even the 
     
    792748    Bases: :obj:`Descender` 
    793749 
    794     Classifies examples with unknown value to a special branch. This 
     750    Classifies instances with unknown value to a special branch. This 
    795751    makes sense only if the tree itself was constructed with 
    796     :obj:`ExampleSplitter_UnknownsToBranch`. 
     752    :obj:`Splitter_UnknownsToBranch`. 
    797753 
    798754.. class:: Descender_UnknownsToCommonBranch 
     
    800756    Bases: :obj:`Descender` 
    801757 
    802     Classifies examples with unknown values to the branch with the 
    803     highest number of examples. If there is more than one such branch, 
    804     random branch is chosen for each example that is to be classified. 
     758    Classifies instances with unknown values to the branch with the 
     759    highest number of instances. If there is more than one such branch, 
     760    a random branch is chosen for each instance that is to be classified.
    805761 
    806762.. class:: Descender_UnknownsToCommonSelector 
     
    808764    Bases: :obj:`Descender` 
    809765 
    810     Classifies examples with unknown values to the branch which received 
     766    Classifies instances with unknown values to the branch which received 
    811767    the highest recommendation by the selector. 
    812768 
     
    815771    Bases: :obj:`Descender` 
    816772 
    817     Makes the subtrees vote for the example's class; the vote is weighted 
     773    Makes the subtrees vote for the instance's class; the vote is weighted 
    818774    according to the sizes of the branches. 
    819775 
     
    822778    Bases: :obj:`Descender` 
    823779 
    824     Makes the subtrees vote for the example's class; the vote is weighted 
     780    Makes the subtrees vote for the instance's class; the vote is weighted 
    825781    according to the selector's proposal.
    826782 
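The descent can be sketched as follows. This is a simplified, hypothetical model: the `Node` class here is not Orange's :obj:`Node` but a stand-in with a `select` function (returning a branch index or None), a list of `branches` (None for a leaf) and `branch_sizes`. With `merge=False` it behaves like :obj:`Descender_UnknownsToNode`; with `merge=True` it returns vote weights proportional to branch sizes, as the branch-size voting descender above does.

```python
class Node(object):
    """Hypothetical minimal node: a leaf if `branches` is None."""

    def __init__(self, select=None, branches=None, branch_sizes=None, label=None):
        self.select = select              # instance -> branch index or None
        self.branches = branches          # child nodes, or None for a leaf
        self.branch_sizes = branch_sizes  # learning instances per branch
        self.label = label


def descend(node, instance, merge=False):
    """Sketch of two descender behaviours.

    Returns (leaf, None) when a leaf is reached. When a branch cannot be
    determined, returns (node, None) if merge is False, or (node, votes)
    with votes proportional to branch sizes if merge is True.
    """
    while node.branches is not None:
        b = node.select(instance)
        if b is None:
            if not merge:
                # Let the node's own classifier decide.
                return node, None
            total = float(sum(node.branch_sizes))
            return node, [size / total for size in node.branch_sizes]
        node = node.branches[b]
    return node, None
```

The node classifier must be stored in internal nodes for the `merge=False` variant to be usable, which matches the caveat given for :obj:`Descender_UnknownsToNode`.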
     
    898854 
    899855The included printing functions can print out practically anything you'd 
    900 like to know, from the number of examples, proportion of examples of 
     856like to know, from the number of instances, proportion of instances of 
    901857majority class in nodes and similar, to more complex statistics like the 
    902 proportion of examples in a particular class divided by the proportion 
     858proportion of instances in a particular class divided by the proportion 
    903859of instances of this class in the parent node. You can also define
    904860your own callback functions to be used for printing. 
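As a worked example of the last statistic, the ratio for a class can be computed from the class distributions of a node and its parent. The helper below is a hypothetical plain-Python function, not part of the printing API: with 8 of 10 instances of class `yes` in a node and 10 of 20 in its parent, the ratio is 0.8 / 0.5 = 1.6, meaning the class is 1.6 times more frequent in the node than in its parent.

```python
def relative_class_proportion(node_dist, parent_dist, cls):
    """Proportion of `cls` in a node divided by its proportion in the parent.

    Both distributions map class values to (possibly weighted) counts.
    """
    p_node = node_dist[cls] / float(sum(node_dist.values()))
    p_parent = parent_dist[cls] / float(sum(parent_dist.values()))
    return p_node / p_parent


ratio = relative_class_proportion({"yes": 8, "no": 2}, {"yes": 10, "no": 10}, "yes")
```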
     
    16481604         TreeSplitConstructor as SplitConstructor, \ 
    16491605              TreeSplitConstructor_Combined as SplitConstructor_Combined, \ 
    1650               TreeSplitConstructor_Measure as SplitConstructor_Measure, \ 
     1606              TreeSplitConstructor_Measure as SplitConstructor_Score, \ 
    16511607                   TreeSplitConstructor_Attribute as SplitConstructor_Feature, \ 
    16521608                   TreeSplitConstructor_ExhaustiveBinary as SplitConstructor_ExhaustiveBinary, \ 
     
    19781934        If 1, :class:`SplitConstructor_ExhaustiveBinary` is used. 
    19791935        If 2, use :class:`SplitConstructor_OneAgainstOthers`. If 
    1980         0, do not use binarization (use :class:`SplitConstructor_Attribute`). 
     1936        0, do not use binarization (use :class:`SplitConstructor_Feature`). 
    19811937        Default: 0. 
    19821938 
     
    19981954    .. attribute:: splitter 
    19991955 
    2000         :class:`ExampleSplitter`  or a function with the same 
    2001         signature as :obj:`ExampleSplitter.__call__`. The default is 
    2002         :class:`ExampleSplitter_UnknownsAsSelector` that splits the 
     1956        :class:`Splitter`  or a function with the same 
     1957        signature as :obj:`Splitter.__call__`. The default is 
     1958        :class:`Splitter_UnknownsAsSelector` that splits the 
    20031959        learning instances according to distributions given by the 
    20041960        selector. 
  • orange/fixes/fix_changed_names.py

    r8157 r8237  
    190190           "orange.TreeSplitConstructor":"Orange.classification.tree.SplitConstructor", 
    191191           "orange.TreeSplitConstructor_Combined":"Orange.classification.tree.SplitConstructor_Combined", 
    192            "orange.TreeSplitConstructor_Measure":"Orange.classification.tree.SplitConstructor_Measure", 
     192           "orange.TreeSplitConstructor_Measure":"Orange.classification.tree.SplitConstructor_Score", 
    193193           "orange.TreeSplitConstructor_Attribute":"Orange.classification.tree.SplitConstructor_Feature", 
    194194           "orange.TreeSplitConstructor_ExhaustiveBinary":"Orange.classification.tree.SplitConstructor_ExhaustiveBinary", 