Changeset 8185:4c70bc72d28a in orange


Timestamp:
08/16/11 16:40:25 (3 years ago)
Author:
markotoplak
Branch:
default
Convert:
27e8f21365fb533f3d52240c5d1c5f82f9a731e2
Message:

Orange.classification.tree text.

File:
1 edited

  • orange/Orange/classification/tree.py

    r8148 r8185  
    1010******************************* 
    1111 
    12 To build a small tree (:obj:`TreeClassifier`) from the iris data set 
     12To build a :obj:`TreeClassifier` from the Iris data set 
    1313(with the depth limited to three levels), use (part of `orngTree1.py`_, 
    1414uses `iris.tab`_): 
     
    1919.. _orngTree1.py: code/orngTree1.py 
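A minimal sketch of what such a script might look like, assuming the usual Orange 2.x API (the actual `orngTree1.py` is not reproduced in this diff):

    import Orange

    iris = Orange.data.Table("iris")                      # loads iris.tab
    tree = Orange.classification.tree.TreeLearner(iris, max_depth=3)
    print(tree.dump())                                    # textual form of the induced tree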
    2020 
     21See `Decision tree learning 
     22<http://en.wikipedia.org/wiki/Decision_tree_learning>`_ on Wikipedia 
     23for introduction to classification trees. 
    2124 
    2225This page first describes the learner and the classifier, and then 
     
    2932.. autoclass:: TreeClassifier 
    3033    :members: 
     34 
     35    .. attribute:: tree 
     36 
     37        The root of the tree, represented as a :class:`Node`. 
     38 
     39 
    3140 
    3241======== 
     
    255264:obj:`Node` classes. 
    256265 
    257 Several classes described above are already functional and can 
    258 (and mostly will) be used as they are: :obj:`Node`, :obj:`_TreeLearner` 
    259 and :obj:`TreeClassifier`.  Classes :obj:`SplitConstructor`, 
    260 :obj:`StopCriteria`, :obj:`ExampleSplitter`, :obj:`Descender` are among 
    261 the Orange (C++ implemented) classes that can be subtyped in Python. You 
    262 can thus program your own components based on these classes. 
     266Classes :obj:`SplitConstructor`, :obj:`StopCriteria`, 
     267:obj:`ExampleSplitter`, :obj:`Descender` can be subtyped in Python. You 
     268can thus program your own components based on these classes (TODO). 
    263269 
    264270.. class:: Node 
    265271 
    266     Node stores information about the learning examples belonging to 
    267     the node, a branch selector, a list of branches (if the node is not 
    268     a leaf) with their descriptions and strengths, and a classifier. 
     272    Node stores the instances belonging to the node, a branch selector, 
     273    a list of branches (if the node is not a leaf) with their descriptions 
     274    and strengths, and a classifier. 
    269275 
    270276    .. attribute:: distribution 
    271277     
    272         Stores a distribution for learning examples belonging to the 
    273         node.  Storing distributions can be disabled by setting the 
    274         :obj:`_TreeLearner`'s store_distributions flag to false. 
     278        A distribution for learning instances in the 
     279        node. 
    275280 
    276281    .. attribute:: contingency 
    277282 
    278         Stores complete contingency matrices for the learning examples 
    279         belonging to the node. Storing contingencies can be enabled by 
    280         setting :obj:`_TreeLearner`'s :obj:`store_contingencies` flag to 
    281         true. Note that even when the flag is not set, the contingencies 
    282         get computed and stored to :obj:`Node`, but are removed shortly 
    283         afterwards.  The details are given in the description of the 
    284         :obj:`_TreeLearner` object. 
     283        Complete contingency matrices for the learning instances 
     284        in the node. 
    285285 
    286286    .. attribute:: examples, weightID 
    287287 
    288         Store a set of learning examples for the node and the 
    289         corresponding ID of /weight meta attribute. The root of the 
    290         tree stores a "master table" of examples, while other nodes' 
    291         :obj:`Orange.data.Table` contain reference to examples in the 
    292         root's :obj:`Orange.data.Table`. Examples are only stored if 
    293         a corresponding flag (:obj:`store_examples`) has been set while 
    294         building the tree; to conserve the space, storing is disabled 
    295         by default. 
     288        Learning instances for the node and the corresponding ID 
     289        of the weight meta attribute. The root of the tree stores all 
     290        instances, while other nodes store only references to instances 
     291        in the root node. 
    296292 
    297293    .. attribute:: node_classifier 
    298294 
    299         A classifier (usually, but not necessarily, a 
    300         :obj:`DefaultClassifier`) that can be used to classify examples 
    301         coming to the node. If the node is a leaf, this is used to 
    302         decide the final class (or class distribution) of an example. If 
    303         it's an internal node, it is stored if :obj:`Node`'s flag 
    304         :obj:`store_node_classifier` is set. Since the :obj:`node_classifier` 
    305         is needed by :obj:`Descender` and for pruning (see far below), 
    306         this is the default behaviour; space consumption of the default 
    307         :obj:`DefaultClassifier` is rather small. You should never 
    308         disable this if you intend to prune the tree later. 
    309  
    310     If the node is a leaf, the remaining fields are None.  If it's an 
    311     internal node, there are several additional fields. 
     295        A classifier (usually a :obj:`DefaultClassifier`) that can be used 
     296        to classify instances coming to the node. If the node is a leaf, 
     297        this is used to decide the final class (or class distribution) 
     298        of an instance. If it's an internal node, it is stored if 
     299        :obj:`Node`'s flag :obj:`store_node_classifier` is set. Since 
     300        the :obj:`node_classifier` is needed by :obj:`Descender` and 
     301        for pruning (see far below), this is the default behaviour; 
     302        space consumption of the default :obj:`DefaultClassifier` is 
     303        rather small. You should never disable this if you intend to 
     304        prune the tree later. 
     305 
     306    If the node is a leaf, the remaining fields are None. If it's 
     307    an internal node, there are several additional fields. The lists 
     308    :obj:`branches`, :obj:`branch_descriptions` and :obj:`branch_sizes` 
     309    are of the same length. 
    312310 
    313311    .. attribute:: branches 
     
    324322    .. attribute:: branch_sizes 
    325323 
    326         Gives a (weighted) number of training examples that went into 
     324        Gives a (weighted) number of training instances that went into 
    327325        each branch. This can be used later, for instance, for modeling 
    328         probabilities when classifying examples with unknown values. 
     326        probabilities when classifying instances with unknown values. 
    329327 
    330328    .. attribute:: branch_selector 
    331329 
    332         Gives a branch for each example. The same object is used 
     330        Gives a branch for each instance. The same object is used 
    333331        during learning and classifying. The :obj:`branch_selector` 
    334332        is of type :obj:`Orange.classification.Classifier`, since its job is 
    335         similar to that of a classifier: it gets an example and 
     333        similar to that of a classifier: it gets an instance and 
    336334        returns discrete :obj:`Orange.data.Value` in range :samp:`[0, 
    337         len(branches)-1]`.  When an example cannot be classified to 
     335        len(branches)-1]`.  When an instance cannot be classified to 
    338336        any branch, the selector can return a :obj:`Orange.data.Value` 
    339337        containing a special value (sVal) which should be a discrete 
    340338        distribution (DiscDistribution). This should represent a 
    341         :obj:`branch_selector`'s opinion of how to divide the example 
     339        :obj:`branch_selector`'s opinion of how to divide the instance 
    342340        between the branches. Whether the proposition will be used or not 
    343341        depends upon the chosen :obj:`ExampleSplitter` (when learning) 
    344342        or :obj:`Descender` (when classifying). 
    345  
    346     The lists :obj:`branches`, :obj:`branch_descriptions` and 
    347     :obj:`branch_sizes` are of the same length; all of them are 
    348     defined if the node is internal and none if it is a leaf. 
    349343 
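As an aside, a hedged sketch of traversing the induced tree through the :obj:`Node` attributes described above (assuming `tree` is the :obj:`TreeClassifier` from the introductory sketch; only names documented here are used):

    def count_leaves(node):
        # A leaf has no branches (the remaining fields are None).
        if not node.branches:
            return 1
        # Internal node: branches may contain None for null children
        # (an assumption), so skip those.
        return sum(count_leaves(b) for b in node.branches if b)

    print(count_leaves(tree.tree))   # tree.tree is the root Node

The built-in :meth:`tree_size` method (below) offers a related count without manual traversal.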
    350344    .. method:: tree_size() 
     
    357351 
    358352    Classifies examples according to a tree stored in :obj:`tree`. 
    359     Not meant to be used directly. The :class:`TreeLearner` class 
    360     constructs :class:`TreeClassifier`. 
     353    This class is not to be used directly. The :class:`TreeLearner` 
     354    constructs a :class:`TreeClassifier`, which uses this class. 
    361355 
    362356    .. attribute:: tree 
     
    19861980    """ 
    19871981 
    1988     def __init__(self, baseClassifier): 
    1989         self.nativeClassifier = baseClassifier 
     1982    def __init__(self, base_classifier): 
     1983        self.nativeClassifier = base_classifier 
    19901984        for k, v in self.nativeClassifier.__dict__.items(): 
    19911985            self.__dict__[k] = v 
     
    21122106class TreeLearner(Orange.core.Learner): 
    21132107    """ 
    2114     Assembles the generic classification or regression tree learner  
    2115     (from Orange's objects for induction of decision trees).  
    2116     :class:`TreeLearner` is essentially a wrapper 
    2117     around :class:`_TreeLearner`, provided for easier use of the latter. 
    2118     It sets a number of parameters used in induction that 
    2119     can also be set after the creation of the object, most often through 
    2120     the object's attributes. If upon initialization 
    2121     :class:`TreeLearner` is given a set of examples, then an instance 
    2122     of :class:`TreeClassifier` object is returned instead. 
    2123  
    2124     Attributes can be also be set in the constructor.  
     2108    Assembles the classification or regression tree learner.  Essentially, 
     2109    :class:`TreeLearner` is a wrapper around :class:`_TreeLearner` and 
     2110    provides easier use of the latter.  It sets parameters for tree 
     2111    induction that are controlled through the object's attributes. 
     2112    If upon initialization :class:`TreeLearner` 
     2113    is given a set of instances, then a :class:`TreeClassifier` object 
     2114    is built and returned instead. 
     2115 
     2116    Attributes can also be set on initialization. 
    21252117 
    21262118    .. attribute:: node_learner 
    21272119 
    2128         Induces a classifier from examples belonging to a node. The 
    2129         same learner is used for internal nodes and for leaves. The 
    2130         default is :obj:`Orange.classification.majority.MajorityLearner`. 
      2120        Induces a classifier from instances in a node, used for 
      2121        both internal nodes and leaves. The default is 
     2122        :obj:`Orange.classification.majority.MajorityLearner`. 
    21312123 
    21322124    **Split construction** 
     
    21342126    .. attribute:: split 
    21352127         
    2136         Defines a function that will be used in place of 
    2137         :obj:`SplitConstructor`.  
    2138         Useful when prototyping new tree induction 
    2139         algorithms. When this parameter is defined, other parameters that 
    2140         affect the procedures for growing of the tree are ignored. These 
    2141         include :obj:`binarization`, :obj:`measure`, 
    2142         :obj:`worst_acceptable` and :obj:`min_subset` (Default: 
    2143         :class:SplitConstructor_Combined  
    2144         with separate constructors for discrete and continuous attributes. 
    2145         Discrete attributes are used as they are, while  
    2146         continuous attributes are binarized. 
    2147         Gain ratio is used to select attributes.  
    2148         A minimum of two examples in a leaf is required for  
    2149         discrete and five examples in a leaf for continuous attributes.) 
     2128        A :obj:`SplitConstructor` or a function with the same signature as 
     2129        :obj:`SplitConstructor.__call__`. It is useful for prototyping 
     2130        new tree induction algorithms. When defined, other parameters 
      2131        that affect the split construction are ignored. These include 
     2132        :obj:`binarization`, :obj:`measure`, :obj:`worst_acceptable` and 
     2133        :obj:`min_subset`. Default: :class:`SplitConstructor_Combined` 
     2134        with separate constructors for discrete and continuous 
     2135        attributes.  Discrete attributes are used as they are, while 
     2136        continuous attributes are binarized.  Gain ratio is used to select 
     2137        attributes.  A minimum of two instances in a leaf is required for 
     2138        discrete and five instances in a leaf for continuous attributes. 
    21502139 
    21512140    .. attribute:: binarization 
     
    21692158    .. attribute:: relief_m, relief_k 
    21702159 
    2171         Sem `m` and `k` to given values if the :obj:`measure` is relief. 
     2160        Set `m` and `k` for Relief, if chosen. 
    21722161 
    21732162    .. attribute:: splitter 
    21742163 
    2175         Object of type :class:`ExampleSplitter`. The default splitter 
    2176         is :class:`ExampleSplitter_UnknownsAsSelector` that splits 
    2177         the learning examples according to distributions given by the 
      2164        :class:`ExampleSplitter` or a function with the same 
     2165        signature as :obj:`ExampleSplitter.__call__`. The default is 
     2166        :class:`ExampleSplitter_UnknownsAsSelector` that splits the 
     2167        learning examples according to distributions given by the 
    21782168        selector. 
    21792169 
     
    21912181    .. attribute:: min_subset 
    21922182 
    2193         Minimal number of examples in non-null leaves (default: 0). 
      2183        The smallest number of instances in non-null leaves (default: 0). 
    21942184 
    21952185    .. attribute:: min_examples 
    21962186 
    21972187        Data subsets with less than :obj:`min_examples` 
    2198         examples are not split any further, that is, all leaves in the tree 
    2199         will contain at least that many of examples (default: 0). 
     2188        instances are not split any further, that is, all leaves in the tree 
     2189        will contain at least that many instances (default: 0). 
    22002190 
    22012191    .. attribute:: max_depth 
     
    22072197 
    22082198        Induction stops when the proportion of majority class in the 
    2209         node exceeds the value set by this parameter(default: 1.0).  
     2199        node exceeds the value set by this parameter (default: 1.0).  
    22102200        To stop the induction as soon as the majority class reaches 70%, 
    2211         you should use :samp:`max_majority=0.7`, as in the following 
     2201        use :samp:`max_majority=0.7`, as in the following 
    22122202        example. The numbers show the majority class  
    22132203        proportion at each node. The script `tree2.py`_ induces and  
    22142204        prints this tree. 
     2205 
     2206        FIXME 
    22152207 
    22162208        .. _tree2.py: code/tree2.py 
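Pending the example marked FIXME above, a hedged sketch of the intended behaviour (the actual `tree2.py` script is not reproduced here):

    tree = Orange.classification.tree.TreeLearner(iris, max_majority=0.7)
    print(tree.dump())   # nodes whose majority class exceeds 70% are not split further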
     
    22262218    .. attribute:: stop 
    22272219 
    2228         Used for passing a function which is used in place of 
    2229         :class:`StopCriteria`. Useful when prototyping new 
    2230         tree induction algorithms. See a documentation on  
    2231         :class:`StopCriteria` for more info on this function.  
    2232         When used, parameters  :obj:`max_majority` and :obj:`min_examples`  
    2233         will not be  considered (default: None).  
    2234         The default stopping criterion stops induction when all examples  
    2235         in a node belong to the same class. 
     2220        :class:`StopCriteria` or a function with the same signature as 
     2221        :obj:`StopCriteria.__call__`. Useful for prototyping new tree 
      2222        induction algorithms.  When used, parameters :obj:`max_majority` 
      2223        and :obj:`min_examples` will not be considered.  The default 
     2224        stopping criterion stops induction when all examples in a node 
     2225        belong to the same class. 
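A hedged sketch of plugging in such a function; the argument list below (instances, weight ID, class contingency) is an assumption about :obj:`StopCriteria.__call__`'s signature, not something stated in this diff:

    import random

    def noisy_stop(instances, weight_id, contingency):
        # Toy criterion: stop early on small subsets or at random (assumed signature).
        return len(instances) < 20 or random.random() < 0.1

    learner = Orange.classification.tree.TreeLearner(stop=noisy_stop)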
    22362226 
    22372227    .. attribute:: m_pruning 
     
    22532243        Determines whether to store class distributions, contingencies and 
    22542244        examples in :class:`Node`, and whether the :obj:`node_classifier` 
    2255         should be build for internal nodes. By default everything except  
    2256         :obj:`store_examples` is enabled. You won't save any memory by not storing  
    2257         distributions but storing contingencies, since distributions actually points to 
    2258         the same distribution that is stored in 
    2259         :obj:`contingency.classes`. (default: True except for 
    2260         store_examples, which defaults to False). 
      2245        should be built for internal nodes.  You won't save any memory 
      2246        by not storing distributions but storing contingencies, since 
      2247        distributions actually point to the same distribution that is 
     2248        stored in :obj:`contingency.classes`.  By default everything 
     2249        except :obj:`store_examples` is enabled. 
    22612250     
    22622251    """ 
     
    22892278            self.__setattr__(k,v) 
    22902279       
    2291     def __call__(self, examples, weight=0): 
     2280    def __call__(self, instances, weight=0): 
    22922281        """ 
    2293         Return a classifier from the given examples. 
     2282        Return a classifier from the given instances. 
    22942283        """ 
    22952284        bl = self._base_learner() 
     
    22992288        if not self._handset_split and not self.measure: 
    23002289            measure = fscoring.GainRatio() \ 
    2301                 if examples.domain.class_var.var_type == Orange.data.Type.Discrete \ 
     2290                if instances.domain.class_var.var_type == Orange.data.Type.Discrete \ 
    23022291                else fscoring.MSE() 
    23032292            bl.split.continuous_split_constructor.measure = measure 
     
    23082297 
    23092298        #post pruning 
    2310         tree = bl(examples, weight) 
     2299        tree = bl(instances, weight) 
    23112300        if getattr(self, "same_majority_pruning", 0): 
    23122301            tree = Pruner_SameMajority(tree) 
     
    23142303            tree = Pruner_m(tree, m=self.m_pruning) 
    23152304 
    2316         return TreeClassifier(baseClassifier=tree)  
     2305        return TreeClassifier(base_classifier=tree)  
    23172306 
    23182307    def __setattr__(self, name, value): 
     
    29562945    """ 
    29572946     
    2958     def __init__(self, baseClassifier=None): 
    2959         if not baseClassifier: baseClassifier = _TreeClassifier() 
    2960         self.nativeClassifier = baseClassifier 
     2947    def __init__(self, base_classifier=None): 
     2948        if not base_classifier: base_classifier = _TreeClassifier() 
     2949        self.nativeClassifier = base_classifier 
    29612950        for k, v in self.nativeClassifier.__dict__.items(): 
    29622951            self.__dict__[k] = v 
     
    29992988        Return a string representation of a tree. 
    30002989 
    3001         :arg leaf_str: The format string for printing the tree leaves. If  
     2990        :arg leaf_str: The format string for the tree leaves. If  
    30022991          left empty, "%V (%^.2m%)" will be used for classification trees 
    30032992          and "%V" for regression trees. 
    30042993        :type leaf_str: string 
    3005         :arg node_str: The format string for printing out the internal nodes. 
     2994        :arg node_str: The format string for the internal nodes. 
    30062995          If left empty (as it is by default), no data is printed out for 
    30072996          internal nodes. If set to :samp:`"."`, the same string is 
     
    30363025            user_formats=[], min_examples=0, max_depth=1e10, \ 
    30373026            simple_first=True): 
    3038         """ Print the tree to a file in a format used by  
    3039         `GraphViz <http://www.research.att.com/sw/tools/graphviz>`_. 
    3040         Uses the same parameters as :meth:`dump` defined above 
    3041         plus two parameters which define the shape used for internal 
    3042         nodes and leaves of the tree: 
     3027        """ Print the tree to a file in a format used by `GraphViz 
     3028        <http://www.research.att.com/sw/tools/graphviz>`_.  Uses the 
     3029        same parameters as :meth:`dump` plus two which define the shape 
     3030        of internal nodes and leaves of the tree: 
    30433031 
    30443032        :param leaf_shape: Shape of the outline around leaves of the tree.  
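To close, a hedged usage sketch of the two printing methods described above, with `tree` as in the earlier sketches; the `fileName` argument and the GraphViz shape names are assumptions, since the full :meth:`dot` signature is not visible in this diff:

    # Textual dump, using the documented default leaf format for classification trees.
    print(tree.dump(leaf_str="%V (%^.2m%)"))

    # GraphViz output for rendering with dot; parameter names here are assumed.
    tree.dot(fileName="iris_tree.dot", leaf_shape="box", node_shape="ellipse")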