Changeset 9073:36588b8e18ca in orange


Timestamp: 10/06/11 23:30:16 (3 years ago)
Author: markotoplak
Branch: default
Convert: 044a14a3681a2b095f4ab30e84ed9c34c4300694
Message: updates to Orange.classification.Tree
File: 1 edited

  • orange/Orange/classification/tree.py

--- orange/Orange/classification/tree.py (r9032)
+++ orange/Orange/classification/tree.py (r9073)
@@ -10,6 +10,6 @@
 *******************************
 
-To build a :obj:`TreeClassifier` from the Iris data set
-(with the depth limited to three levels), use:
+The following code builds a :obj:`TreeClassifier` on the Iris data set
+(with the depth limited to three levels):
 
 .. literalinclude:: code/orngTree1.py
@@ -34,6 +34,5 @@
 .. class:: Node
 
-    Classification trees are represented as a hierarchy of
-    :obj:`Node` classes.
+    Classification trees are a a hierarchy of :obj:`Node` classes.
 
     Node stores the instances belonging to the node, a branch selector,
@@ -43,30 +42,24 @@
     .. attribute:: distribution
 
-        A distribution for learning instances in the
-        node.
+        A distribution of learning instances.
 
     .. attribute:: contingency
 
-        Complete contingency matrices for the learning instances
-        in the node.
+        Complete contingency matrices for the learning instances.
 
     .. attribute:: examples, weightID
 
-        Learning instancess for the node and the corresponding ID
-        of weight meta attribute. The root of the tree stores all
-        instances, while other nodes store only reference to instances
-        in the root node.
+        Learning instances and the ID of weight meta attribute. The root
+        of the tree actually stores all instances, while other nodes
+        store only reference to instances in the root node.
 
     .. attribute:: node_classifier
 
-        A classifier (usually a :obj:`DefaultClassifier`) for instances
-        coming to the node. If the node is a leaf, it chooses the class
-        (or class distribution) of an instance. If it's an internal node,
-        it is only stored if :obj:`TreeLearner.store_node_classifier`
-        is True.
-
-    Internal nodes have additional
-    attributes. The lists :obj:`branches`, :obj:`branch_descriptions` and
-    :obj:`branch_sizes` are of the same length.
+        A classifier for instances coming to the node. If the node is a
+        leaf, it chooses the class (or class distribution) of an instance.
+
+    Internal nodes have additional attributes. The lists :obj:`branches`,
+    :obj:`branch_descriptions` and :obj:`branch_sizes` are of the
+    same length.
 
     .. attribute:: branches
@@ -77,6 +70,6 @@
     .. attribute:: branch_descriptions
 
-        A list with string describing branches, which are constructed by
-        :obj:`SplitConstructor`. It can contain anything,
+        A list with strings describing branches. They are constructed
+        by :obj:`SplitConstructor`. A string can contain anything,
         for example 'red' or '>12.3'.
 
@@ -90,6 +83,6 @@
 
         A :obj:`~Orange.classification.Classifier` that returns a branch
-        for each instance: it returns discrete
-        :obj:`Orange.data.Value` in ``[0, len(branches)-1]``.  When an
+        for each instance (as
+        :obj:`Orange.data.Value` in ``[0, len(branches)-1]``).  When an
         instance cannot be classified unambiguously, the selector can
         return a discrete distribution, which proposes how to divide
@@ -1744,52 +1737,31 @@
 class TreeLearner(Orange.core.Learner):
     """
-    A classification or regression tree learner.  If upon
-    initialization :class:`TreeLearner` is given a set of instances,
-    then an :class:`TreeClassifier` object is built and returned
-    instead. Attributes can be also be set on initialization.
+    A classification or regression tree learner. If a set of instances
+    is given on initialization, a :class:`TreeClassifier` is built and
+    returned instead. All attributes can also be set on initialization.
 
     **The tree induction process**
 
     #. The learning instances are copied to a table, unless
-       :obj:`store_instances` is `False`  and they already are in table.
-    #. Apriori class probabilities are computed. If the sum
-       of instance weights is zero, the process stops. A list of
+       :obj:`store_instances` is `False` and they already are in table.
+    #. Apriori class probabilities are computed. A list of
        candidate features for the split is compiled; in the beginning,
-       all attributes are candidates.
-    #. The recursive part. Its
-       arguments are a set of instances, a weight meta-attribute ID
-       (it can change to
-       accomodate splitting of instances among branches), apriori class
-       distribution and a list of candidates (as a vector
-       of booleans).
-    #. The contingency matrix is computed.
-    #. A :obj:`stop` is called
-       to check whether to continue. If not, a
-       :obj:`~Node.node_classifier` is built and the :obj:`Node`
-       is returned. A :obj:`~Node.node_classifier` is also built
-       for internal nodes if :obj:`store_node_classifier` is
-       `True`.  The :obj:`~Node.node_classifier` is build by calling
-       :obj:`node_learner`'s :obj:`smart_learn` function with the given
-       instances, weight ID and the contingency matrix. As the learner
-       uses contingencies whenever possible, a :obj:`contingency_computer`
-       will often affect the :obj:`~Node.node_classifier`. If
-       :obj:`node_learner` does not return a classifier and the classifier
-       would be needed for classification, the :obj:`TreeClassifier`'s
-       function returns DK or an empty distribution.
-    #. If the induction continues continue, a :obj:`split` is called.
-       If it fails to return a branch selector, induction stops and the
-       :obj:`Node` is returned.
+       all features are candidates.
+    #. The recursive part. The contingency matrix is computed by
+       :obj:`contingency_computer`. Contingencies are used by :obj:`split`,
+       :obj:`stop` and :obj:`splitter`.
+    #. If the induction should :obj:`stop`, a :obj:`~Node.node_classifier`
+       is built by calling :obj:`node_learner` with the given instances,
+       weight ID and the contingency matrix. As the learner uses
+       contingencies whenever possible, the :obj:`contingency_computer`
+       will affect the :obj:`~Node.node_classifier`. The node is returned.
+    #. If the induction continues, a :obj:`split` is called.
+       If :obj:`split` fails to return a branch selector, induction stops
+       and the :obj:`Node` is returned.
+    #. The feature spent (if any) is removed from the candidate list.
     #. Instances are divided into child nodes with :obj:`splitter`.
-    #. The contingency is removed if :obj:`store_contingencies` is
-       `False`. Thus, :obj:`split`, :obj:`stop` and :obj:`splitter`
-       were able to use the contingency matrices.
-    #. The object recursively calls itself (see step 3) for each of
-       the non-empty subsets. If the splitter returned weights,
-       they are used for each branch. The feature spent
-       (if any) is removed from the candidate list
-       for the subtree.
-    #. Instances are stored in the corresponding node,
-       if :obj:`store_instances` is `True`. If not, the new weight
-       attributes that were created are removed.
+       The process recursively continues with step 3 for
+       each of the non-empty subsets. If the splitter returned weights,
+       they are used for each branch.
 
     **Attributes**
@@ -1803,8 +1775,15 @@
     .. attribute:: descender
 
-        Descending component that the induces :obj:`TreeClassifier` will
-        use. Default descender is :obj:`Descender_UnknownMergeAsSelector`
-        which votes using the :obj:`branch_selector`'s distribution for
+        The descender that the induced :obj:`TreeClassifier` will
+        use. The default is :obj:`Descender_UnknownMergeAsSelector`.
+        It votes with the :obj:`branch_selector`'s distribution for
         vote weights.
+
+    .. attribute:: contingency_computer
+
+        Defines the computation of contingency matrices (used by
+        :obj:`split`, :obj:`stop`, :obj:`splitter`). It can be used,
+        for example, to change the treatment of unknown values. By
+        default ordinary contingency matrices are computed.
 
     **Split construction**
@@ -1814,13 +1793,13 @@
         A :obj:`SplitConstructor` or a function with the same signature as
         :obj:`SplitConstructor.__call__`. It is useful for prototyping
-        new tree induction algorithms. When defined, other parameters
-        that affect  the split construction are ignored. These include
+        new tree induction algorithms. If :obj:`split` is defined, other
+        arguments that affect split construction are ignored. These include
         :obj:`binarization`, :obj:`measure`, :obj:`worst_acceptable` and
         :obj:`min_subset`. Default: :class:`SplitConstructor_Combined`
         with separate constructors for discrete and continuous
-        attributes.  Discrete attributes are used as they are, while
-        continuous attributes are binarized.  Gain ratio is used to select
-        attributes.  A minimum of two instances in a leaf is required for
-        discrete and five instances in a leaf for continuous attributes.
+        features. Discrete features are used as they are, while
+        continuous are binarized. Features are scored with gain ratio.
+        At least two instances in a leaf are required for
+        discrete and five for continuous features.
 
     .. attribute:: binarization
@@ -1833,20 +1812,21 @@
     .. attribute:: measure
 
-        Measure for scoring of the attributes when deciding which of the
-        attributes will be used for splitting of the instances in a node.
-        A subclass of :class:`Orange.feature.scoring.Score` (perhaps
-        :class:`~Orange.feature.scoring.InfoGain`,
-        :class:`~Orange.feature.scoring.GainRatio`,
+        A score to evaluate features for splitting instances in a
+        node.  A subclass of :class:`Orange.feature.scoring.Score`
+        (perhaps :class:`~Orange.feature.scoring.InfoGain`,
+        :class:`~Orange.feature.scoring.GainRatio`,
         :class:`~Orange.feature.scoring.Gini`,
         :class:`~Orange.feature.scoring.Relief`, or
-        :class:`~Orange.feature.scoring.MSE`). Default: :class:`Orange.feature.scoring.GainRatio`.
+        :class:`~Orange.feature.scoring.MSE`). Default:
+        :class:`Orange.feature.scoring.GainRatio`.
 
     .. attribute:: relief_m, relief_k
 
-        Set `m` and `k` for Relief, if chosen.
+        Set `m` and `k` for :class:`~Orange.feature.scoring.Relief`,
+        if chosen.
 
     .. attribute:: splitter
 
-        :class:`Splitter`  or a function with the same
+        :class:`Splitter` or a function with the same
         signature as :obj:`Splitter.__call__`. The default is
         :class:`Splitter_UnknownsAsSelector` that splits the
@@ -1854,36 +1834,25 @@
         selector.
 
-    .. attribute:: contingency_computer
-
-        Used to change the way the contingency matrices (used
-        by :class:`SplitConstructor` and :class:`StopCriteria`) are
-        computed, for example, to change the treatment of unknown values.
-        By default ordinary contingency matrices are computed for
-        instances at each node.
-
     **Pruning**
 
     .. attribute:: worst_acceptable
 
-        Used in pre-pruning, sets the lowest required attribute
-        score. If the score of the best attribute is below this margin, the
-        tree at that node is not grown further (default: 0).
-
-        So, to allow splitting only when gain ratio (the default measure)
-        is greater than 0.6, set ``worst_acceptable=0.6``.
+        The lowest required feature score. If the score of the best
+        feature is below this margin, the tree is not grown further
+        (default: 0).
 
     .. attribute:: min_subset
 
-        The smalles number of instances in non-null leaves (default: 0).
+        The lowest required number of instances in non-null leaves (default: 0).
 
     .. attribute:: min_instances
 
         Data subsets with less than :obj:`min_instances`
-        instances are not split any further, that is, all leaves in the tree
+        instances are not split any further. Therefore, all leaves in the tree
         will contain at least that many instances (default: 0).
 
     .. attribute:: max_depth
 
-        Gives maximal tree depth;  0 means that only root is generated.
+        Maximal tree depth. If 0, only root is generated.
         The default is 100.
 
@@ -1927,5 +1896,5 @@
         contingencies and instances in :class:`Node`, and whether the
         :obj:`Node.node_classifier` should be build for internal nodes
-        also (it is needed by the :obj:`Descender` or for pruning).
+        also (it is needed by the :obj:`Descender` or for post-pruning).
         Not storing distributions but storing contingencies does not
         save any memory, since distributions actually points to the
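
The induction process described in the revised TreeLearner docstring (compute contingencies, check the stop criteria, pick a split, drop the spent feature, divide the instances, recurse) can be illustrated with a small standalone sketch. This is not Orange's implementation and uses none of its API: the functions `contingency`, `info_gain`, `induce` and `classify` and the dict-based node layout are hypothetical, information gain stands in for the default gain ratio, and instance weights and unknown values are left out. Only the parameter names `max_depth`, `min_instances` and `worst_acceptable` are taken from the docstring.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def contingency(rows, labels, feature):
    """Map each value of `feature` to a Counter of class labels."""
    table = {}
    for row, label in zip(rows, labels):
        table.setdefault(row[feature], Counter())[label] += 1
    return table


def info_gain(rows, labels, feature):
    """Information gain of `feature`, computed from its contingency table."""
    total = len(labels)
    remainder = 0.0
    for counts in contingency(rows, labels, feature).values():
        n = sum(counts.values())
        remainder -= sum(c / total * log2(c / n) for c in counts.values())
    return entropy(labels) - remainder


def induce(rows, labels, candidates, depth=0, max_depth=100,
           min_instances=0, worst_acceptable=0.0):
    """Build a tree of dicts; internal nodes carry 'feature' and 'branches'."""
    # Every node gets a majority-class "node classifier" and a distribution.
    node = {"distribution": Counter(labels),
            "label": Counter(labels).most_common(1)[0][0]}
    # Stop criteria: pure node, too few instances, depth limit, no candidates.
    if (len(set(labels)) == 1 or len(labels) < min_instances
            or depth >= max_depth or not candidates):
        return node
    # Split construction: score the candidates and keep the best one.
    best = max(candidates, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) < worst_acceptable:
        return node
    # The spent feature is removed from the candidate list for the subtree.
    remaining = [f for f in candidates if f != best]
    # Splitter: divide the instances among branches by the value of `best`.
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    node["feature"] = best
    node["branches"] = {
        value: induce(r, l, remaining, depth + 1, max_depth,
                      min_instances, worst_acceptable)
        for value, (r, l) in subsets.items()
    }
    return node


def classify(node, row):
    """Descend to a leaf; stop early if the row's value has no branch."""
    while "feature" in node and row[node["feature"]] in node["branches"]:
        node = node["branches"][row[node["feature"]]]
    return node["label"]


# Toy data (made up for this sketch): the class depends on "color" only.
rows = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "green", "size": "small"},
]
labels = ["yes", "yes", "no", "no"]
tree = induce(rows, labels, ["color", "size"])
```

With `max_depth=0` the sketch returns only the root, matching the docstring's description of that attribute.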