Changeset 8258:7772627263db in orange


Timestamp: 08/22/11 15:50:27 (3 years ago)
Author: markotoplak
Branch: default
Convert: 9d93b8257ac287c9aa4d2077acabc036ada1a5ec
Message: Orange.classification.tree fixes.
File: 1 edited

  • orange/Orange/classification/tree.py

    r8237 r8258  
    419419    Bases: :class:`SplitConstructor` 
    420420 
    421     An abstract base class for split constructors that employ 
    422     a :class:`Orange.feature.scoring.Score` to assess a 
    423     quality of a split.  All split constructors except for 
    424     :obj:`SplitConstructor_Combined` are derived from this class. 
     421    An abstract base class for split constructors that compare splits 
     422    with a :class:`Orange.feature.scoring.Score`.  All split 
     423    constructors except for :obj:`SplitConstructor_Combined` are derived 
     424    from this class. 
    425425 
    426426    .. attribute:: measure 
    427427 
    428         A component of type :class:`Orange.feature.scoring.Score` 
    429         used for split evaluation. You must select a 
    430         :class:`Orange.feature.scoring.Score` capable of 
    431         handling your class type - for example, you cannot use 
    432         :class:`Orange.feature.scoring.GainRatio` for regression 
    433         trees or :class:`Orange.feature.scoring.MSE` for classification 
    434         trees. 
     428        A :class:`Orange.feature.scoring.Score` for split evaluation. It 
     429        has to handle the class type - for example, you cannot use 
     430        :class:`~Orange.feature.scoring.GainRatio` for regression or 
     431        :class:`~Orange.feature.scoring.MSE` for classification. 
    435432 
    436433    .. attribute:: worst_acceptable 
    437434 
    438         The lowest required split quality for a split to be acceptable. 
    439         Note that this value make sense only in connection with a 
    440         :obj:`measure` component. Default is 0.0. 
     435        The lowest allowed split quality.  The value strongly depends 
     436        on chosen :obj:`measure` component. Default is 0.0. 
    441437 
    442438.. class:: SplitConstructor_Feature 
     
    444440    Bases: :class:`SplitConstructor_Score` 
    445441 
    446     Each value of a discrete feature corresponds to a branch 
    447     in the tree. The features with with the highest score 
    448     (:obj:`~Measure.measure`) is used for a split. If multiple features 
    449     are tied for highest score, select a random one. 
     442    Each value of a discrete feature corresponds to a branch.  The feature 
     443    with the highest score (:obj:`~Measure.measure`) is selected. When 
     444    tied, a random feature is selected. 
    450445 
    451446    The constructed :obj:`branch_selector` is an instance of 
    452     :obj:`orange.ClassifierFromVarFD` that returns a value of the 
    453     selected feature. :obj:`branch_description` contains the feature's 
    454     values. The feature is marked as spent, so that it cannot reappear 
    455     in the node's subtrees. 
     447    :obj:`orange.ClassifierFromVarFD` that returns a value of the selected 
     448    feature. :obj:`branch_description` contains the feature's 
     449    values. The feature is marked as spent (it cannot reappear in the 
     450    node's subtrees). 
    456451 
    457452.. class:: SplitConstructor_ExhaustiveBinary 
     
    459454    Bases: :class:`SplitConstructor_Score` 
    460455 
    461     Works on discrete features. For each feature it determines which 
    462     binarization gives the the highest score. In case of ties, a random 
    463     feature is selected. 
     456    For each discrete feature it determines which binarization gives 
     457    the the highest score. In case of ties, a random feature is selected. 
    464458 
    465459    The constructed :obj:`branch_selector` is an instance 
     
    470464    with more than one feature value. Branches with a single feature 
    471465    value are described with that value. If the feature was binary, 
    472     it is spent and cannot be used in the node's subtrees. Otherwise, 
    473     it can reappear in the subtrees. 
     466    it is spent and cannot be used in the node's subtrees. Otherwise 
     467    it is not spent. 
    474468 
    475469 
     
    478472    Bases: :class:`SplitConstructor_Score` 
    479473 
    480     Currently the only one for continuous features.  It divides the 
     474    The only split constructor for continuous features.  It divides the 
    481475    range of feature values with a threshold that maximizes the split's 
    482476    quality. In case of ties, a random feature is selected.  The feature 
     
    499493    Bases: :class:`SplitConstructor` 
    500494 
    501     This constructor delegates the task of finding the optimal split  
    502     to separate split constructors for discrete and for continuous 
    503     features. Each split constructor is called given only features 
    504     of appropriate type. Both construct a candidate for 
    505     a split; the better of them is selected. 
    506  
    507     Note that there is a problem when more candidates have the same 
    508     score. Let there be are nine discrete features with the highest 
    509     score; the split constructor for discrete features will select 
    510     one of them. Now, let us suppose that there is a single continuous 
    511     feature with the same score. :obj:`SplitConstructor_Combined` 
    512     would randomly select between the proposed discrete feature and 
    513     the continuous feature, not aware of the fact that the discrete 
    514     has already competed with eight other discrete features. So, the 
    515     probability for selecting (each) discrete feature would be 1/18 
    516     instead of 1/10. Although not really correct, we doubt that this would 
    517     affect the tree's performance. 
     495    This split constructor uses different split constructors for 
     496    discrete and continuous features. Each split constructor is called 
     497    with features of appropriate type only. Both construct a candidate 
     498    for a split; the better of them is selected. 
     499 
     500    There is a problem when multiple candidates have the same score. Let 
     501    there be nine discrete features with the highest score; the split 
     502    constructor for discrete features will select one of them. Now, 
     503    if there is a single continuous feature with the same score, 
     504    :obj:`SplitConstructor_Combined` would randomly select between the 
     505    proposed discrete feature and the continuous feature. It is not aware 
     506    of that the discrete has already competed with eight other discrete 
     507    features. So, the probability for selecting (each) discrete feature 
     508    would be 1/18 instead of 1/10. Although incorrect, we doubt that 
     509    this would affect the tree's performance. 
    518510 
    519511    The :obj:`branch_selector`, :obj:`branch_descriptions` and whether 
     
    540532.. class:: StopCriteria 
    541533 
    542     Decide 
    543     whether to continue the induction or not. The basic criterion checks 
    544     if there are any instances and if they belong to at least 
    545     two different classes (if the class is discrete). Derived components 
    546     check things like the number of instances and the proportion of 
    547     majority classes. 
    548  
    549     :obj:`StopCriteria` is not an abstract but a fully functional 
    550     class that provides the basic stopping criteria. That is, the tree 
    551     induction stops when there is at most one instance left;  
    552     it is not the weighted but the actual number of instances that counts. 
    553     The induction also stops when all instances are in the same 
    554     class (for discrete problems) or have the same outcome value  
     534    Provides the basic stopping criteria: the tree induction stops 
     535    when there is at most one instance left (the actual, not weighted, 
     536    number). The induction also stops when all instances are in the 
     537    same class (for discrete problems) or have the same outcome value 
    555538    (for regression problems). 
    556539 
    557540    .. method:: __call__(instances[, weightID, domain contingencies]) 
    558541 
    559         Retunr True (stop) of False (continue the induction). 
     542        Return True (stop) of False (continue the induction). 
    560543        If contingencies are given, they are used for checking whether 
    561         the instances are in the same classm (but not for 
    562         instance counting). Derived classes should use the contingencies 
    563         whenever possible. 
     544        classes but not for counting. Derived classes should use the 
     545        contingencies whenever possible. 
    564546 
    565547.. class:: StopCriteria_common 
    566548 
    567     Additional criteria for pre-pruning: 
    568     the proportion of majority class and the number of weighted 
    569     instances. 
     549    Additional criteria for pre-pruning: the proportion of majority 
     550    class and the number of weighted instances. 
    570551 
    571552    .. attribute:: max_majority 
     
    576557    .. attribute:: min_instances 
    577558 
    578         Minimal number of instances in internal leaves. Subsets with 
    579         less than :obj:`min_instances` instances are not split any further. 
     559        Minimal number of instances in internal leaves. Subsets with less 
     560        than :obj:`min_instances` instances are not split further. 
    580561        The sample count is weighed. 
    581562 
     
    15741555A home page of AT&T's dot and similar software packages. 
    15751556 
     1557""" 
     1558 
     1559""" 
     1560TODO C++ aliases 
     1561 
     1562SplitConstructor.discrete/continuous_split_constructor -> SplitConstructor.discrete  
     1563Node.examples -> Node.instances 
    15761564""" 
    15771565 
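
The tie-breaking remark in the SplitConstructor_Combined docstring (1/18 instead of 1/10) can be checked with a few lines of exact arithmetic. The sketch below is plain Python and does not call Orange; the function names are hypothetical and exist only for this illustration. It assumes, as the docstring describes, that each type-specific constructor picks uniformly among its tied features and that the combined constructor then picks uniformly between the two proposed candidates.

```python
from fractions import Fraction


def two_stage_probabilities(n_discrete, n_continuous=1):
    """Selection probabilities when ties are broken in two stages.

    Hypothetical helper, not part of Orange. Stage one: each
    type-specific constructor picks uniformly among its own tied
    features. Stage two: the combined constructor picks uniformly
    between the discrete and the continuous candidate.
    """
    stage_two = Fraction(1, 2)
    per_discrete = Fraction(1, n_discrete) * stage_two
    per_continuous = Fraction(1, n_continuous) * stage_two
    return per_discrete, per_continuous


def uniform_probability(n_discrete, n_continuous=1):
    """Probability per feature if all tied features competed at once."""
    return Fraction(1, n_discrete + n_continuous)


# The docstring's scenario: nine tied discrete features, one continuous.
per_discrete, per_continuous = two_stage_probabilities(9)
fair = uniform_probability(9)

print(per_discrete, per_continuous, fair)  # 1/18 1/2 1/10
```

Under these assumptions each discrete feature is chosen with probability 1/18 while the single continuous feature is chosen with probability 1/2, against 1/10 for every feature under a single uniform draw.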