Changeset 8999:90315f44439a in orange


Timestamp: 09/21/11 16:37:18
Author: markotoplak
Branch: default
Convert: e020d075c7bd75f774142a147d3a52c04e9acfde
Message: Orange.classification.tree: C45 is now underscore_separated.
Location: orange
Files: 1 deleted, 2 edited

  • orange/Orange/classification/tree.py

    r8986 r8999  
    1111 
    1212To build a :obj:`TreeClassifier` from the Iris data set 
    13 (with the depth limited to three levels), use (part of `orngTree1.py`_, 
    14 uses `iris.tab`_): 
     13(with the depth limited to three levels), use: 
    1514 
    1615.. literalinclude:: code/orngTree1.py 
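In rough outline, such a script could look like the following (a sketch only; the ``max_depth`` argument name is an assumption, not taken from the included file)::

    import Orange

    # load the Iris data set and induce a tree no deeper than three levels;
    # max_depth is assumed to be the relevant TreeLearner argument
    iris = Orange.data.Table("iris")
    tree = Orange.classification.tree.TreeLearner(iris, max_depth=3)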
     
    128127is random. Note that in the second case the lambda function still has three 
    129128parameters, since this is the required number of parameters for the stop 
    130 function (:obj:`StopCriteria`).  Part of `tree3.py`_ (uses  `iris.tab`_): 
     129function (:obj:`StopCriteria`).  
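For illustration, a stopping function of this kind might be written as follows (a sketch; the parameter names and the ``stop`` argument are assumptions, the actual example is in the included script)::

    import random
    import Orange

    data = Orange.data.Table("iris")
    # the three parameters are assumed to be the instances, the weight meta id
    # and the class contingency; they are ignored here and the node is stopped
    # at random in about 20% of the calls
    stop = lambda instances, weight, contingency: random.random() < 0.2
    tree = Orange.classification.tree.TreeLearner(data, stop=stop)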
    131130 
    132131.. _tree3.py: code/tree3.py 
     
    142141 
    143142To have something to work on, we'll take the data from the lenses dataset and 
    144 build a tree using the default components (part of `treestructure.py`_, 
    145 uses `lenses.tab`_): 
     143build a tree using the default components: 
    146144 
    147145.. literalinclude:: code/treestructure.py 
    148146   :lines: 7-10 
    149147 
    150 How big is our tree (part of `treestructure.py`_, uses `lenses.tab`_)? 
     148How big is our tree? 
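One way to answer that is to count the nodes recursively (a sketch, assuming internal nodes expose ``branch_selector`` and a ``branches`` list; the included script is authoritative)::

    def tree_size(node):
        # count this node plus, for internal nodes, all nodes in the subtrees
        if not node:
            return 0
        size = 1
        if node.branch_selector:
            for branch in node.branches:
                size += tree_size(branch)
        return size

    # tree_classifier is the classifier built above (hypothetical name);
    # its root node is in the .tree attribute
    print tree_size(tree_classifier.tree)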
    151149 
    152150.. _lenses.tab: code/lenses.tab 
     
    170168 
    171169Let us now write a script that prints out a tree. The recursive part of 
    172 the function will get a node and its level (part of `treestructure.py`_, 
    173 uses `lenses.tab`_). 
     170the function will get a node and its level. 
    174171 
    175172.. literalinclude:: code/treestructure.py 
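A sketch of such a recursive printer, assuming the underscore_separated node attributes (``branch_selector``, ``branches``, ``branch_descriptions``, ``node_classifier`` and friends)::

    def print_tree0(node, level):
        # indent by level; print the tested attribute for internal nodes
        # and the majority class for leaves
        if not node:
            print " " * level + "<null node>"
            return
        if node.branch_selector:
            print " " * level + node.branch_selector.class_var.name
            for branch, description in zip(node.branches, node.branch_descriptions):
                print " " * level + ": %s" % description
                print_tree0(branch, level + 1)
        else:
            print " " * level + "--> %s" % node.node_classifier.default_value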
     
    211208... but we won't. Let us learn how to handle arguments of 
    212209different types. Let's write a function that will accept either a 
    213 :obj:`TreeClassifier` or a :obj:`Node`.  Part of `treestructure.py`_, 
    214 uses `lenses.tab`_. 
     210:obj:`TreeClassifier` or a :obj:`Node`. 
    215211 
    216212.. literalinclude:: code/treestructure.py 
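A sketch of the dispatching wrapper (it reuses a ``print_tree0`` helper like the one above)::

    def print_tree(x):
        # accept either a built classifier or a single node
        if isinstance(x, Orange.classification.tree.TreeClassifier):
            print_tree0(x.tree, 0)
        elif isinstance(x, Orange.classification.tree.Node):
            print_tree0(x, 0)
        else:
            raise TypeError("invalid parameter")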
     
    243239tree, we would call cutTree(root, 2). The function will be recursive, 
    244240with the second argument (level) decreasing at each call; when zero, 
    245 the current node will be made a leaf (part of `treestructure.py`_, uses 
    246 `lenses.tab`_): 
     241the current node will be made a leaf: 
    247242 
    248243.. literalinclude:: code/treestructure.py 
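A sketch of such a pruning function, under the same attribute-name assumptions as above::

    def cut_tree(node, level):
        # once level reaches zero, drop the branching information so the
        # node behaves as a leaf
        if node and node.branch_selector:
            if level:
                for branch in node.branches:
                    cut_tree(branch, level - 1)
            else:
                node.branch_selector = None
                node.branches = None
                node.branch_descriptions = None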
     
    274269.. _treelearner.py: code/treelearner.py 
    275270 
    276 Let us construct a :obj:`TreeLearner` to play with (`treelearner.py`_, 
    277 uses `lenses.tab`_): 
     271Let us construct a :obj:`TreeLearner` to play with: 
    278272 
    279273.. literalinclude:: code/treelearner.py 
     
    935929 
    936930We shall build a small tree from the iris data set - we shall limit the 
    937 depth to three levels (part of `orngTree1.py`_, uses `iris.tab`_): 
     931depth to three levels: 
    938932 
    939933.. literalinclude:: code/orngTree1.py 
     
    13071301Let's say we would like to print the classification margin for each node, 
    13081302that is, the difference between the proportion of the largest and the 
    1309 second largest class in the node (part of `orngTree2.py`_): 
     1303second largest class in the node: 
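The margin itself could be computed along these lines (a sketch; it assumes the node's class distribution is available as ``node.distribution``)::

    def margin(node):
        # proportion of the majority class minus the proportion of the runner-up
        dist = sorted(node.distribution, reverse=True)
        total = sum(dist)
        if total == 0 or len(dist) < 2:
            return 0.0
        return (dist[0] - dist[1]) / total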
    13101304 
    13111305.. _orngTree2.py: code/orngTree2.py 
     
    13581352=========================== 
    13591353 
    1360 As C4.5 is a standard benchmark in machine learning,  
    1361 it is incorporated in Orange, although Orange has its own 
    1362 implementation of decision trees. 
    1363  
    1364 The implementation uses the original Quinlan's code for learning so the 
    1365 tree you get is exactly like the one that would be build by standalone 
    1366 C4.5. Upon return, however, the original tree is copied to Orange 
    1367 components that contain exactly the same information plus what is needed 
    1368 to make them visible from Python. To be sure that the algorithm behaves 
    1369 just as the original, we use a dedicated class :class:`C45Node` 
    1370 instead of reusing the components used by Orange's tree inducer 
    1371 (ie, :class:`Node`). This, however, could be done and probably 
    1372 will be done in the future; we shall still retain :class:`C45Node`  
    1373 but offer transformation to :class:`Node` so that routines 
    1374 that work on Orange trees will also be usable for C45 trees. 
     1354C4.5 is incorporated in Orange because it is a standard benchmark in 
     1355machine learning. The implementation uses the original C4.5 code, so the 
     1356resulting tree is exactly like the one that would be built by standalone 
     1357C4.5. The built tree is made accessible in Python. 
    13751358 
    13761359:class:`C45Learner` and :class:`C45Classifier` behave 
     
    13811364========================= 
    13821365 
    1383 C4.5 is not distributed with Orange, but it can be incorporated as a 
    1384 plug-in. A C compiler is need for the procedure: on Windows MS Visual C 
     1366C4.5 is not distributed with Orange, but it can be added as a 
     1367plug-in. A C compiler is needed for the procedure: on Windows MS Visual C 
    13851368(CL.EXE and LINK.EXE must be on the PATH), on Linux and OS X gcc (OS X 
    13861369users can download it from Apple). 
     
    14731456.. _iris.tab: code/iris.tab 
    14741457 
    1475 The simplest way to use :class:`C45Learner` is to call it. This 
     1458This 
    14761459script constructs the same learner as you would get by calling 
    1477 the usual C4.5 (`tree_c45.py`_, uses `iris.tab`_): 
     1460the usual C4.5: 
    14781461 
    14791462.. literalinclude:: code/tree_c45.py 
    14801463   :lines: 7-14 
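The essential call is simply passing the data to the learner; a minimal sketch (not the verbatim content of the included script)::

    import Orange

    data = Orange.data.Table("iris")
    tree = Orange.classification.tree.C45Learner(data)

    # compare true and predicted classes for a few instances
    for instance in data[:5]:
        print instance.get_class(), tree(instance)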
    14811464 
    1482 Arguments can be set by the usual mechanism (the below to lines do the 
    1483 same, except that one uses command-line symbols and the other internal 
    1484 variable names) 
    1485  
    1486 :: 
     1465Both C4.5 command-line symbols and variable names can be used. The  
     1466following lines produce the same result:: 
    14871467 
    14881468    tree = Orange.classification.tree.C45Learner(data, m=100) 
    1489     tree = Orange.classification.tree.C45Learner(data, minObjs=100) 
    1490  
    1491 The way that could be prefered by veteran C4.5 user might be through 
    1492 method `:obj:C45Learner.commandline`. 
    1493  
    1494 :: 
     1469    tree = Orange.classification.tree.C45Learner(data, min_objs=100) 
     1470 
     1471A veteran C4.5 user might prefer :func:`C45Learner.commandline`:: 
    14951472 
    14961473    lrn = Orange.classification.tree.C45Learner() 
     
    14981475    tree = lrn(data) 
    14991476 
    1500 There's nothing special about using :obj:`C45Classifier` - it's  
    1501 just like any other. To demonstrate what the structure of  
    1502 :class:`C45Node`'s looks like, will show a script that prints  
    1503 it out in the same format as C4.5 does. 
     1477The following script prints out the tree in the same format as C4.5 does. 
    15041478 
    15051479.. literalinclude:: code/tree_c45_printtree.py 
    15061480 
    1507 Leaves are the simplest. We just print out the value contained 
    1508 in :samp:`node.leaf`. Since this is not a qualified value (ie.,  
    1509 :obj:`C45Node` does not know to which attribute it belongs), we need to 
    1510 convert it to a string through :obj:`class_var`, which is passed as an 
    1511 extra argument to the recursive part of printTree. 
     1481For the leaves, just the value in ``node.leaf`` is printed. Since 
     1482:obj:`C45Node` does not know to which attribute it belongs, we need to 
     1483convert it to a string through ``class_var``, which is passed as an extra 
     1484argument to the recursive part of printTree. 
    15121485 
    15131486For discrete splits without subsetting, we print out all attribute values 
    1514 and recursively call the function for all branches. Continuous splits are 
    1515 equally easy to handle. 
    1516  
    1517 For discrete splits with subsetting, we iterate through branches, retrieve 
    1518 the corresponding values that go into each branch to inset, turn 
    1519 the values into strings and print them out, separately treating the 
    1520 case when only a single value goes into the branch. 
     1487and recursively call the function for all branches. Continuous splits 
     1488are equally easy to handle. 
     1489 
     1490For discrete splits with subsetting, we iterate through branches, 
     1491retrieve the corresponding values that go into each branch (collected 
     1492in ``inset``), turn the values into strings and print them out, 
     1493separately treating the case when only a single value goes into the branch. 
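A condensed sketch of such a printer (the ``node_type`` codes for leaf, discrete and continuous splits and the other attribute names are assumptions based on the description above; the subset case is omitted)::

    def print_tree_c45(tree):
        print_node_c45(tree.tree, tree.class_var, 0)

    def print_node_c45(node, class_var, lev):
        var = node.tested
        if node.node_type == 0:        # leaf: print the class value and weight
            print "%s (%.1f)" % (class_var.values[int(node.leaf)], node.items),
        elif node.node_type == 1:      # discrete split without subsetting
            for i, value in enumerate(var.values):
                print ("\n" + "|   " * lev + "%s = %s:") % (var.name, value),
                print_node_c45(node.branch[i], class_var, lev + 1)
        elif node.node_type == 2:      # continuous split at node.cut
            print ("\n" + "|   " * lev + "%s <= %.1f:") % (var.name, node.cut),
            print_node_c45(node.branch[0], class_var, lev + 1)
            print ("\n" + "|   " * lev + "%s > %.1f:") % (var.name, node.cut),
            print_node_c45(node.branch[1], class_var, lev + 1)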
    15211494 
    15221495================= 
     
    15561529:obj:`SimpleTreeLearner` is used in much the same way as :obj:`TreeLearner`. 
    15571530A typical example of using :obj:`SimpleTreeLearner` would be to build a random 
    1558 forest (uses `iris.tab`_): 
     1531forest: 
    15591532 
    15601533.. literalinclude:: code/simple_tree_random_forest.py 
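A rough sketch of the idea (not the content of the included script; the ``skip_prob`` argument and the voting code are assumptions)::

    import Orange

    data = Orange.data.Table("iris")
    # grow 50 randomized simple trees; skip_prob is assumed to make each
    # split consider only a random subset of the attributes
    forest = [Orange.classification.tree.SimpleTreeLearner(skip_prob=0.6)(data)
              for i in range(50)]

    def vote(instance):
        # plain majority vote over the individual trees
        votes = [str(tree(instance)) for tree in forest]
        return max(set(votes), key=votes.count)

    print vote(data[0])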
     
    16361609    nothing, you are running C4.5. 
    16371610 
    1638     .. attribute:: gainRatio (g) 
     1611    .. attribute:: gain_ratio (g) 
    16391612         
    16401613        Determines whether to use information gain (false, default) 
     
    16511624        Enables subsetting (default: false, no subsetting), 
    16521625  
    1653     .. attribute:: probThresh (p) 
     1626    .. attribute:: prob_thresh (p) 
    16541627 
    16551628        Probabilistic threshold for continuous attributes (default: false). 
    16561629 
    1657     .. attribute:: minObjs (m) 
     1630    .. attribute:: min_objs (m) 
    16581631         
    16591632        Minimal number of objects (examples) in leaves (default: 2). 
     
    16821655    """ 
    16831656 
     1657    _rename_new_old = { "min_objs": "minObjs", "prob_thresh": "probThresh", 
     1658            "gain_ratio": "gainRatio" } 
     1659    #_rename_new_old = {} 
     1660    _rename_old_new = dict((a,b) for b,a in _rename_new_old.items()) 
     1661 
     1662    @classmethod 
     1663    def _rename_dict(cls, dic): 
     1664        return dict((cls._rename_arg(a),b) for a,b in dic.items()) 
     1665 
     1666    @classmethod 
     1667    def _rename_arg(cls, a): 
     1668        if a in cls._rename_old_new: 
     1669            Orange.misc.deprecation_warning(a, cls._rename_old_new[a], stacklevel=4) 
     1670        return cls._rename_new_old.get(a, a) 
     1671 
    16841672    def __new__(cls, instances = None, weightID = 0, **argkw): 
    1685         self = Orange.classification.Learner.__new__(cls, **argkw) 
     1673        self = Orange.classification.Learner.__new__(cls, **cls._rename_dict(argkw)) 
    16861674        if instances: 
    1687             self.__init__(**argkw) 
     1675            self.__init__(**cls._rename_dict(argkw)) 
    16881676            return self.__call__(instances, weightID) 
    16891677        else: 
     
    16911679         
    16921680    def __init__(self, **kwargs): 
    1693         self.base = _C45Learner(**kwargs) 
     1681        self.base = _C45Learner(**self._rename_dict(kwargs)) 
    16941682 
    16951683    def __setattr__(self, name, value): 
    1696         if name != "base" and name in self.base.__dict__: 
    1697             self.base.__dict__[name] = value 
     1684        nn = self._rename_arg(name) 
     1685        if name != "base" and nn in self.base.__dict__: 
     1686            self.base.__dict__[nn] = value 
     1687        elif name == "base": 
     1688            self.__dict__["base"] = value 
    16981689        else: 
    1699             self.__dict__["base"] = value 
    1700  
    1701     def __getattr(self, name): 
     1690            raise AttributeError(name) 
     1691 
     1692    def __getattr__(self, name): 
     1693        nn = self._rename_arg(name) 
    17021694        if name != "base": 
    1703             return self.base.__dict__[name] 
     1695            return self.base.__dict__[nn] 
    17041696        else: 
    17051697            return self.base 
    17061698 
    17071699    def __call__(self, *args, **kwargs): 
    1708         return C45Classifier(self.base(*args, **kwargs)) 
     1700        return C45Classifier(self.base(*args, **self._rename_dict(kwargs))) 
    17091701 
    17101702    def commandline(self, ln): 
     
    17871779 
    17881780 
    1789         The standalone C4.5 would, on the same data, print:: 
     1781        The standalone C4.5 would print:: 
    17901782 
    17911783            physician-fee-freeze = n: democrat (253.4/5.9) 
     
    18001792            |   |   |   |   anti-satellite-test-ban = y: republican (2.2/1.0) 
    18011793 
    1802         4.5 also prints out the number of errors on learning data in 
     1794        C4.5 also prints out the number of errors on learning data in 
    18031795        each node. 
    18041796        """ 
  • orange/doc/Orange/rst/code/tree_c45.py

    r8042 r8999  
    3636print 
    3737 
    38 import orngStat, orngTest 
    39 res = orngTest.crossValidation([Orange.classification.tree.C45Learner(),  
     38 
     39res = Orange.evaluation.testing.cross_validation([Orange.classification.tree.C45Learner(),  
    4040    Orange.classification.tree.C45Learner(convertToOrange=1)], data) 
    41 print "Classification accuracy: %5.3f (converted to tree: %5.3f)" % tuple(orngStat.CA(res)) 
    42 print "Brier score: %5.3f (converted to tree: %5.3f)" % tuple(orngStat.BrierScore(res)) 
     41print "Classification accuracy: %5.3f (converted to tree: %5.3f)" % tuple(Orange.evaluation.scoring.CA(res)) 
     42print "Brier score: %5.3f (converted to tree: %5.3f)" % tuple(Orange.evaluation.scoring.Brier_score(res)) 