Timestamp:
03/02/11 13:19:22
Author:
markotoplak
Branch:
default
Convert:
c291da2078664eda697cccf093625ac47ee00d26
Message:

Some documentation update.

File:
1 edited

  • orange/Orange/classification/tree.py

    r7713 r7718  
    3030    :members: 
    3131 
    32  
    33 For a bit more complex example, here's how to write your own stop 
    34 function. The example itself is more amusing than useful. It constructs 
     32======== 
     33Examples 
     34======== 
     35 
     36For example, here's how to write your own stop 
     37function. The example constructs 
    3538and prints two trees. For the first one we define the *defStop* 
    3639function, which is used by default, and combine it with a random function 
    37 so that the stop criterion will also be met in an additional 20% of the cases 
    38 when *defStop* is false. The second tree is built such that it 
    39 considers only the random function as the stopping criterion. Note that in 
     40so that the stop criterion will also be met in 20% of the cases 
     41when *defStop* is false. For the second tree the stopping criterion 
     42is random. Note that in 
    4043the second case the lambda function still has three parameters, since this is 
    4144the number of parameters required of a stop function (:obj:`StopCriteria`). 
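A sketch of what such a script could look like (a reconstruction, not the documentation's actual example file; it assumes the learner accepts a Python callable in its stop slot with the three-parameter signature described above)::

    import random
    import Orange

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner()

    # first tree: the default criterion, plus a random stop in a further
    # 20% of the cases where defStop alone would not stop
    defStop = Orange.classification.tree.StopCriteria()
    learner.stop = lambda examples, weightID, contingency: \
        defStop(examples, weightID, contingency) or random.random() < 0.2
    print learner(data).dump()

    # second tree: purely random stopping; the lambda still takes three
    # parameters, as required of a stop function
    learner.stop = lambda examples, weightID, contingency: random.random() < 0.2
    print learner(data).dump()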
     
    4952The output is not shown here since the resulting trees are rather 
    5053big. 
     54 
     55Tree Structure 
     56============== 
     57 
     58To have something to work on, we'll take the data from the lenses dataset 
     59and build a tree using the default components (part of `treestructure.py`_, uses `lenses.tab`_): 
     60 
     61.. literalinclude:: code/treestructure.py 
     62   :lines: 7-10 
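The included lines are not reproduced on this page; they presumably amount to something like the following (a hedged reconstruction, with names taken from the surrounding text)::

    import Orange

    data = Orange.data.Table("lenses")
    treeClassifier = Orange.classification.tree.TreeLearner(data)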
     63 
     64How big is our tree (part of `treestructure.py`_, uses `lenses.tab`_)? 
     65 
     66.. _lenses.tab: code/lenses.tab 
     67.. _treestructure.py: code/treestructure.py 
     68 
     69.. literalinclude:: code/treestructure.py 
     70   :lines: 12-21 
     71 
     72If node is None, we have a null-node; null nodes don't count,  
     73so we return 0. Otherwise, the size is 1 (this node) plus the 
     74sizes of all subtrees. The node is an internal node if it has a  
     75:obj:`branchSelector`; if there's no selector, it's a leaf. Don't 
     76attempt to skip the if statement: leaves don't have an empty list  
     77of branches, they don't have a list of branches at all. 
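Following that description, the counting function reads roughly like this (a sketch, not the verbatim `treestructure.py`_)::

    def treeSize(node):
        if not node:
            # a null node counts as zero
            return 0
        size = 1  # this node
        if node.branchSelector:
            # an internal node: add the sizes of all subtrees
            for branch in node.branches:
                size += treeSize(branch)
        return size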
     78 
     79    >>> treeSize(treeClassifier.tree) 
     80    10 
     81 
     82Don't forget that this was only an exercise - :obj:`Node` has a 
     83built-in method :obj:`Node.treeSize` that does exactly the same. 
     84 
     85Let us now write a script that prints out a tree. The recursive 
     86part of the function will get a node and its level  
     87(part of `treestructure.py`_, uses `lenses.tab`_). 
     88 
     89.. literalinclude:: code/treestructure.py 
     90   :lines: 26-41 
     91 
     92Don't waste time studying formatting tricks (\n's etc.); this is just 
     93for nicer output. What matters is everything but the print statements. 
     94First, we check whether the node is a null-node (a node to which no 
     95learning examples were classified). If this is so, we just print out 
     96"<null node>" and return. 
     97 
     98After handling null nodes, the remaining nodes are internal nodes and leaves. 
     99For internal nodes, we print a node description consisting of the 
     100attribute's name and distribution of classes. :obj:`Node`'s branch 
     101selector is, for all currently defined splits, an instance of a 
     102class derived from :obj:`orange.Classifier` (in fact, it is a 
     103:obj:`orange.ClassifierFromVarFD`, but a :obj:`orange.Classifier` would  
     104suffice), and its :obj:`classVar` points to the attribute we seek.  
     105So we print its name. We will also assume that storing class distributions  
     106has not been disabled and print them as well.   
     107Then we iterate  
     108through branches; for each we print a branch description and recursively 
     109call the :obj:`printTree0` with a level increased by 1 (to increase  
     110the indent). 
     111 
     112Finally, if the node is a leaf, we print out the distribution of  
     113learning examples in the node and the class to which the examples in  
     114the node would be classified. We again assume that the :obj:`nodeClassifier`  
     115is the default one - a :obj:`DefaultClassifier`. A better print  
     116function should be aware of possible alternatives. 
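Pieced together from this description, :obj:`printTree0` might look roughly as follows (a sketch under the stated assumptions: stored distributions, a :obj:`branchSelector` with a :obj:`classVar`, and a :obj:`DefaultClassifier` as the :obj:`nodeClassifier`; the exact spacing differs from the output shown below)::

    def printTree0(node, level):
        if not node:
            # a null node: no learning examples were sorted into it
            print " " * level + "<null node>"
            return
        if node.branchSelector:
            # internal node: print the attribute's name and the class
            # distribution, then each branch with an increased indent
            print "%s%s (%s)" % (" " * level,
                node.branchSelector.classVar.name, node.distribution)
            for branch, desc in zip(node.branches, node.branchDescriptions):
                print " " * level + ": %s" % desc
                printTree0(branch, level + 1)
        else:
            # leaf: print the distribution and the predicted class
            print "%s--> %s (%s)" % (" " * level,
                node.nodeClassifier.defaultValue, node.distribution)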
     117 
     118Now, we just need to write a simple function to call our printTree0.  
     119We could write something like... 
     120 
     121:: 
     122 
     123    def printTree(x): 
     124        printTree0(x.tree, 0) 
     125 
     126... but we won't. Let us learn how to handle arguments of different 
     127types. Let's write a function that will accept either a  
     128:obj:`TreeClassifier` or a :obj:`Node`. 
     129Part of `treestructure.py`_, uses `lenses.tab`_. 
     130 
     131.. literalinclude:: code/treestructure.py 
     132   :lines: 43-49 
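The included function is along these lines (a hedged reconstruction of the quoted lines)::

    def printTree(x):
        if isinstance(x, Orange.classification.tree.TreeClassifier):
            printTree0(x.tree, 0)
        elif isinstance(x, Orange.classification.tree.Node):
            printTree0(x, 0)
        else:
            raise TypeError("invalid parameter")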
     133 
     134It's fairly straightforward: if :obj:`x` is of a type derived from 
     135:obj:`TreeClassifier`, we print :obj:`x.tree`; if it's  
     136:obj:`Node` we just call :obj:`printTree0` with :obj:`x`. If it's  
     137of some other type, we don't know how to handle it and thus raise  
     138an exception. The output:: 
     139 
     140    >>> printTree(treeClassifier) 
     141    tear_rate (<15.000, 5.000, 4.000>) 
     142    : reduced --> none (<12.000, 0.000, 0.000>) 
     143    : normal 
     144       astigmatic (<3.000, 5.000, 4.000>) 
     145       : no 
     146          age (<1.000, 5.000, 0.000>) 
     147          : young --> soft (<0.000, 2.000, 0.000>) 
     148          : pre-presbyopic --> soft (<0.000, 2.000, 0.000>) 
     149          : presbyopic --> none (<1.000, 1.000, 0.000>) 
     150       : yes 
     151          prescription (<2.000, 0.000, 4.000>) 
     152          : myope --> hard (<0.000, 0.000, 3.000>) 
     153          : hypermetrope --> none (<2.000, 0.000, 1.000>) 
     154 
     155For a final exercise, let us write a simple pruning function. It will  
     156be written entirely in Python, unrelated to any :obj:`Pruner`. It 
     157will limit the maximal tree depth (the number of internal nodes on any 
     158path down the tree) given as an argument. 
     159For example, to get a two-level tree, we would 
     160call cutTree(root, 2). The function will be recursive, with the second  
     161argument (level) decreasing at each call; when zero, the current node  
     162will be made a leaf (part of `treestructure.py`_, uses `lenses.tab`_): 
     163 
     164.. literalinclude:: code/treestructure.py 
     165   :lines: 54-62 
     166 
     167There's nothing to prune at null-nodes or leaves, so we act only when  
     168:obj:`node` and :obj:`node.branchSelector` are defined. If level is  
     169not zero, we call the function for each branch. Otherwise, we clear  
     170the selector, branches and branch descriptions. 
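In code, the procedure reads roughly as follows (again a sketch, not the verbatim script)::

    def cutTree(node, level):
        # nothing to prune at null nodes or leaves
        if node and node.branchSelector:
            if level:
                for branch in node.branches:
                    cutTree(branch, level - 1)
            else:
                # make this node a leaf: clear the selector, the
                # branches and the branch descriptions
                node.branchSelector = None
                node.branches = None
                node.branchDescriptions = None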
     171 
     172    >>> cutTree(tree.tree, 2) 
     173    >>> printTree(tree) 
     174    tear_rate (<15.000, 5.000, 4.000>) 
     175    : reduced --> none (<12.000, 0.000, 0.000>) 
     176    : normal 
     177       astigmatic (<3.000, 5.000, 4.000>) 
     178       : no --> soft (<1.000, 5.000, 0.000>) 
     179       : yes --> hard (<2.000, 0.000, 4.000>) 
     180 
     181Learning 
     182======== 
     183 
     184You could just call :class:`TreeLearner` and let it fill the empty 
     185slots with the default 
     186components. This section will teach you three things: what the 
     187missing components are (and how to set them yourself), 
     188how to use alternative components to get a different tree, and, 
     189finally, how to write a skeleton for tree induction in Python. 
     190 
     191.. _treelearner.py: code/treelearner.py 
     192 
     193Let us construct a :obj:`TreeLearner` to play with (`treelearner.py`_, uses `lenses.tab`_): 
     194 
     195.. literalinclude:: code/treelearner.py 
     196   :lines: 7-10 
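Again, the included lines are not shown here; a plausible reconstruction is::

    import Orange

    data = Orange.data.Table("lenses")
    learner = Orange.classification.tree.TreeLearner()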
     197 
     198There are three crucial components in learning: the split and stop 
     199criteria, and the :obj:`exampleSplitter` YY (there are some others, 
     200which become important during classification; we'll talk about them 
     201later). They are not defined; if you use the learner, the slots are 
     202filled temporarily but later cleared again. 
     203 
     204FIXME: the following example is not true anymore. 
     205 
     206:: 
     207 
     208    >>> print learner.split 
     209    None 
     210    >>> learner(data) 
     211    <TreeClassifier instance at 0x01F08760> 
     212    >>> print learner.split 
     213    None 
     214 
     215Stopping criteria 
     216----------------- 
     217 
     218The stop is trivial. The default is set by 
     219 
     220:: 
     221 
     222    >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
     223 
     224Well, this is actually done in C++ and it uses a global component 
     225that is constructed once and for all, but apart from that we did 
     226effectively the same thing. 
     227 
     228We can now examine the default stopping parameters. 
     229 
     230    >>> print learner.stop.maxMajority, learner.stop.minExamples 
     231    1.0 0.0 
     232 
     233Not very restrictive. This keeps splitting the examples until 
     234there's nothing left to split or all the examples are in the same 
     235class. Let us set the minimal size of a subset that may still be split to 
     236five examples and see what comes out. 
     237 
     238    >>> learner.stop.minExamples = 5.0 
     239    >>> tree = learner(data) 
     240    >>> print tree.dump() 
     241    tear_rate=reduced: none (100.00%) 
     242    tear_rate=normal 
     243    |    astigmatic=no 
     244    |    |    age=pre-presbyopic: soft (100.00%) 
     245    |    |    age=presbyopic: none (50.00%) 
     246    |    |    age=young: soft (100.00%) 
     247    |    astigmatic=yes 
     248    |    |    prescription=hypermetrope: none (66.67%) 
     249    |    |    prescription=myope: hard (100.00%) 
     250 
     251OK, that's better. If we want an even smaller tree, we can also limit 
     252the maximal proportion of the majority class. 
     253 
     254    >>> learner.stop.maxMajority = 0.5 
     255    >>> tree = learner(data) 
     256    >>> print tree.dump() 
     257    none (62.50%) 
     258 
    51259 
    52260================= 
     
    15021710        Parameter m for m-estimation. 
    15031711 
    1504 ======== 
    1505 Examples 
    1506 ======== 
    1507  
    1508 This page does not provide examples for programming your own components,  
    1509 such as a :obj:`SplitConstructor`. Those examples 
    1510 can be found on a page dedicated to callbacks to Python XXXXXXXX. 
    1511  
    1512 Tree Structure 
    1513 ============== 
    1514  
    1515 To have something to work on, we'll take the data from the lenses dataset 
    1516 and build a tree using the default components (part of `treestructure.py`_, uses `lenses.tab`_): 
    1517  
    1518 .. literalinclude:: code/treestructure.py 
    1519    :lines: 7-10 
    1520  
    1521 How big is our tree (part of `treestructure.py`_, uses `lenses.tab`_)? 
    1522  
    1523 .. _lenses.tab: code/lenses.tab 
    1524 .. _treestructure.py: code/treestructure.py 
    1525  
    1526 .. literalinclude:: code/treestructure.py 
    1527    :lines: 12-21 
    1528  
    1529 If node is None, we have a null-node; null nodes don't count,  
    1530 so we return 0. Otherwise, the size is 1 (this node) plus the 
    1531 sizes of all subtrees. The node is an internal node if it has a  
    1532 :obj:`branchSelector`; if there's no selector, it's a leaf. Don't 
    1533 attempt to skip the if statement: leaves don't have an empty list  
    1534 of branches, they don't have a list of branches at all. 
    1535  
    1536     >>> treeSize(treeClassifier.tree) 
    1537     10 
    1538  
    1539 Don't forget that this was only an exercise - :obj:`Node` has a 
    1540 built-in method :obj:`Node.treeSize` that does exactly the same. 
    1541  
    1542 Let us now write a simple script that prints out a tree. The recursive 
    1543 part of the function will get a node and its level (part of `treestructure.py`_, uses `lenses.tab`_). 
    1544  
    1545 .. literalinclude:: code/treestructure.py 
    1546    :lines: 26-41 
    1547  
    1548 Don't waste time studying formatting tricks (\n's etc.); this is just 
    1549 for nicer output. What matters is everything but the print statements. 
    1550 First, we check whether the node is a null-node (a node to which no 
    1551 learning examples were classified). If this is so, we just print out 
    1552 "<null node>" and return. 
    1553  
    1554 After handling null nodes, the remaining nodes are internal nodes and leaves. 
    1555 For internal nodes, we print a node description consisting of the 
    1556 attribute's name and distribution of classes. :obj:`Node`'s branch 
    1557 selector is, for all currently defined splits, an instance of a 
    1558 class derived from :obj:`orange.Classifier` (in fact, it is a 
    1559 :obj:`orange.ClassifierFromVarFD`, but a :obj:`orange.Classifier` would  
    1560 suffice), and its :obj:`classVar` XXXXX points to the attribute we seek.  
    1561 So we print its name. We will also assume that storing class distributions  
    1562 has not been disabled and print them as well. A more capable function for 
    1563 printing trees (:meth:`TreeClassifier.dump`) has an alternative 
    1564 means to get the distribution when this fails. Then we iterate 
    1565 through branches; for each we print a branch description and recursively 
    1566 call the :obj:`printTree0` with a level increased by 1 (to increase  
    1567 the indent). 
    1568  
    1569 Finally, if the node is a leaf, we print out the distribution of  
    1570 learning examples in the node and the class to which the examples in  
    1571 the node would be classified. We again assume that the :obj:`nodeClassifier`  
    1572 is the default one - a :obj:`DefaultClassifier`. A better print  
    1573 function should be aware of possible alternatives. 
    1574  
    1575 Now, we just need to write a simple function to call our printTree0.  
    1576 We could write something like... 
    1577  
    1578 :: 
    1579  
    1580     def printTree(x): 
    1581         printTree0(x.tree, 0) 
    1582  
    1583 ... but we won't. Let us learn how to handle arguments of different 
    1584 types. Let's write a function that will accept either a :obj:`TreeClassifier` 
    1585 or a :obj:`Node`; just like :obj:`Pruner`, remember? Part of `treestructure.py`_, uses `lenses.tab`_. 
    1586  
    1587 .. literalinclude:: code/treestructure.py 
    1588    :lines: 43-49 
    1589  
    1590 It's fairly straightforward: if :obj:`x` is of a type derived from 
    1591 :obj:`TreeClassifier`, we print :obj:`x.tree`; if it's  
    1592 :obj:`Node` we just call :obj:`printTree0` with :obj:`x`. If it's  
    1593 of some other type, we don't know how to handle it and thus raise  
    1594 an exception. (Note that we could also use  
    1595  
    1596 :: 
    1597  
    1598     if type(x) == Orange.classification.tree.TreeClassifier: 
    1599  
    1600 but this would only work if :obj:`x` were of type 
    1601 :obj:`TreeClassifier` and not of any derived types. The latter,  
    1602 however, do not exist yet...) 
    1603  
    1604     >>> printTree(treeClassifier) 
    1605     tear_rate (<15.000, 5.000, 4.000>) 
    1606     : reduced --> none (<12.000, 0.000, 0.000>) 
    1607     : normal 
    1608        astigmatic (<3.000, 5.000, 4.000>) 
    1609        : no 
    1610           age (<1.000, 5.000, 0.000>) 
    1611           : young --> soft (<0.000, 2.000, 0.000>) 
    1612           : pre-presbyopic --> soft (<0.000, 2.000, 0.000>) 
    1613           : presbyopic --> none (<1.000, 1.000, 0.000>) 
    1614        : yes 
    1615           prescription (<2.000, 0.000, 4.000>) 
    1616           : myope --> hard (<0.000, 0.000, 3.000>) 
    1617           : hypermetrope --> none (<2.000, 0.000, 1.000>) 
    1618  
    1619 For a final exercise, let us write a simple pruning procedure. It will  
    1620 be written entirely in Python, unrelated to any :obj:`Pruner`. Our 
    1621 procedure will limit the tree depth - the maximal depth (here defined 
    1622 as the number of internal nodes on any path down the tree) shall be 
    1623 given as an argument. For example, to get a two-level tree, we would 
    1624 call cutTree(root, 2). The function will be recursive, with the second  
    1625 argument (level) decreasing at each call; when zero, the current node  
    1626 will be made a leaf (part of `treestructure.py`_, uses `lenses.tab`_): 
    1627  
    1628 .. literalinclude:: code/treestructure.py 
    1629    :lines: 54-62 
    1630  
    1631 There's nothing to prune at null-nodes or leaves, so we act only when  
    1632 :obj:`node` and :obj:`node.branchSelector` are defined. If level is  
    1633 not zero, we call the function for each branch. Otherwise, we clear  
    1634 the selector, branches and branch descriptions. 
    1635  
    1636     >>> cutTree(tree.tree, 2) 
    1637     >>> printTree(tree) 
    1638     tear_rate (<15.000, 5.000, 4.000>) 
    1639     : reduced --> none (<12.000, 0.000, 0.000>) 
    1640     : normal 
    1641        astigmatic (<3.000, 5.000, 4.000>) 
    1642        : no --> soft (<1.000, 5.000, 0.000>) 
    1643        : yes --> hard (<2.000, 0.000, 4.000>) 
    1644  
    1645 Learning 
    1646 ======== 
    1647  
    1648 You've already seen a simple example of using a :obj:`TreeLearnerBase`. 
    1649 You can just call it and let it fill the empty slots with the default 
    1650 components. This section will teach you three things: what the 
    1651 missing components are (and how to set them yourself), 
    1652 how to use alternative components to get a different tree, and, 
    1653 finally, how to write a skeleton for tree induction in Python. 
    1654  
    1655 Default components for TreeLearnerBase 
    1656 ====================================== 
    1657  
    1658 Let us construct a :obj:`TreeLearnerBase` to play with. 
    1659  
    1660 .. _treelearner.py: code/treelearner.py 
    1661  
    1662 `treelearner.py`_, uses `lenses.tab`_: 
    1663  
    1664 .. literalinclude:: code/treelearner.py 
    1665    :lines: 7-10 
    1666  
    1667 There are three crucial components in learning: the split and stop 
    1668 criteria, and the :obj:`exampleSplitter` (there are some others, 
    1669 which become important during classification; we'll talk about them 
    1670 later). They are not defined; if you use the learner, the slots are 
    1671 filled temporarily but later cleared again. 
    1672  
    1673 :: 
    1674  
    1675     >>> print learner.split 
    1676     None 
    1677     >>> learner(data) 
    1678     <TreeClassifier instance at 0x01F08760> 
    1679     >>> print learner.split 
    1680     None 
    1681  
    1682 Stopping criteria 
    1683 ================= 
    1684  
    1685 The stop is trivial. The default is set by 
    1686  
    1687 :: 
    1688     >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
    1689  
    1690 Well, this is actually done in C++ and it uses a global component 
    1691 that is constructed once and for all, but apart from that we did 
    1692 effectively the same thing. 
    1693  
    1694 We can now examine the default stopping parameters. 
    1695  
    1696     >>> print learner.stop.maxMajority, learner.stop.minExamples 
    1697     1.0 0.0 
    1698  
    1699 Not very restrictive. This keeps splitting the examples until 
    1700 there's nothing left to split or all the examples are in the same 
    1701 class. Let us set the minimal size of a subset that may still be split to 
    1702 five examples and see what comes out. 
    1703  
    1704     >>> learner.stop.minExamples = 5.0 
    1705     >>> tree = learner(data) 
    1706     >>> print tree.dump() 
    1707     tear_rate=reduced: none (100.00%) 
    1708     tear_rate=normal 
    1709     |    astigmatic=no 
    1710     |    |    age=pre-presbyopic: soft (100.00%) 
    1711     |    |    age=presbyopic: none (50.00%) 
    1712     |    |    age=young: soft (100.00%) 
    1713     |    astigmatic=yes 
    1714     |    |    prescription=hypermetrope: none (66.67%) 
    1715     |    |    prescription=myope: hard (100.00%) 
    1716  
    1717 OK, that's better. If we want an even smaller tree, we can also limit 
    1718 the maximal proportion of the majority class. 
    1719  
    1720     >>> learner.stop.maxMajority = 0.5 
    1721     >>> tree = learner(data) 
    1722     >>> print tree.dump() 
    1723     none (62.50%) 
    1724  
    17251712Undocumented 
    17261713============ 
     
    19841971 
    19851972.. autofunction:: printTreeC45 
     1973 
     1974TODO 
     1975==== 
     1976 
     1977This page does not provide examples for programming your own components,  
     1978such as a :obj:`SplitConstructor`. Those examples 
     1979can be found on a page dedicated to callbacks to Python. 
    19861980 
    19871981References 
     
    21972191 
    21982192        So, to allow splitting only when gainRatio (the default measure) 
    2199         is greater than 0.6, one should run the learner like this: 
    2200         :samp:`l = Orange.classification.tree.TreeLearner(data, worstAcceptable=0.6)` 
     2193        is greater than 0.6, set :samp:`worstAcceptable=0.6`. 
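        For instance (this is the exact call that the previous revision of
        this paragraph spelled out)::

            l = Orange.classification.tree.TreeLearner(data, worstAcceptable=0.6)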
    22012194 
    22022195    .. attribute:: minSubset 
     
    22202213        node exceeds the value set by this parameter (default: 1.0).  
    22212214        To stop the induction as soon as the majority class reaches 70%, 
    2222         you should user :samp:`maxMajority = 0.7`. 
    2223  
    2224         This is an example of the tree on iris data set, with  
    2225         :samp:`maxMajority = 0.7`. The numbers show the majority class  
     2215        you should use :samp:`maxMajority=0.7`, as in the following 
     2216        example. The numbers show the majority class  
    22262217        proportion at each node. The script `tree2.py`_ induces and  
    22272218        prints this tree. 
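        A call consistent with that description (illustrative only; the
        actual script is `tree2.py`_)::

            learner = Orange.classification.tree.TreeLearner(data, maxMajority=0.7)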
     
    22832274       
    22842275    def __init__(self, **kw): 
    2285         self.learner = None 
    22862276        self.__dict__.update(kw) 
    22872277       
     
    29592949        """ Prints the tree to a file in a format used by  
    29602950        `GraphViz <http://www.research.att.com/sw/tools/graphviz>`_. 
    2961         Uses the same parameters as :func:`printTxt` defined above 
     2951        Uses the same parameters as :meth:`dump` defined above 
    29622952        plus two parameters which define the shape used for internal 
    29632953        nodes and leaves of the tree: 
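        The parameter list itself is cut off in this excerpt. Purely as an
        illustration, a call might look like this (the method and parameter
        names are assumptions, not confirmed by this page)::

            treeClassifier.dot("lenses.dot", internalNodeShape="box",
                               leafShape="plaintext")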