Changeset 7739:369736292db8 in orange


Timestamp:
03/14/11 16:45:03 (3 years ago)
Author:
markotoplak
Branch:
default
Convert:
e919c3943f4f42a7b444475b0cea7eb9db96926b
Message:

Made Python wrappers around C45Learner + Classifiers (this makes it possible to implement dump() as a method of the classifier).
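
As a rough illustration of the wrapper idea in this message, here is a minimal sketch in plain Python. It does not use Orange: `_NativeLearner`, `_NativeClassifier`, `Learner`, `Classifier` and the `min_objs` option are hypothetical stand-ins for the C-level classes and their Python wrappers, and the "tree" is just a dictionary.

```python
class _NativeClassifier(object):
    """Hypothetical stand-in for a C-level classifier (not Orange code)."""
    def __init__(self, tree):
        self.tree = tree

    def __call__(self, instance):
        return self.tree.get(instance, "unknown")


class _NativeLearner(object):
    """Hypothetical stand-in for a C-level learner."""
    def __init__(self, **kwargs):
        self.min_objs = kwargs.get("min_objs", 2)

    def __call__(self, data):
        # "Training" only memorises the pairs; a real learner builds a tree.
        return _NativeClassifier(dict(data))


class Classifier(object):
    """Python wrapper: delegates classification and adds dump()."""
    def __init__(self, native):
        self.native = native

    def __call__(self, instance):
        return self.native(instance)

    def dump(self):
        # Return the text instead of printing it; the caller decides what to do.
        items = sorted(self.native.tree.items())
        return "\n".join("%s: %s" % kv for kv in items)


class Learner(object):
    """Python wrapper: owns the native learner and wraps whatever it returns."""
    def __init__(self, **kwargs):
        self.base = _NativeLearner(**kwargs)

    def __call__(self, data):
        return Classifier(self.base(data))
```

Because the wrapper is an ordinary Python class, a method such as `dump()` can be added to it without touching the native code, which is the point the message makes.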

Location:
orange
Files:
1 added
2 edited

  • orange/Orange/classification/tree.py

    r7738 r7739  
    17431743gives the same results as the original. 
    17441744 
    1745 .. class:: C45Learner 
    1746  
    1747     :class:`C45Learner`'s attributes have double names - those that 
    1748     you know from C4.5 command lines and the corresponding names of C4.5's 
    1749     internal variables. All defaults are set as in C4.5; if you change 
    1750     nothing, you are running C4.5. 
    1751  
    1752     .. attribute:: gainRatio (g) 
    1753          
    1754         Determines whether to use information gain (false>, default) 
    1755         or gain ratio for selection of attributes (true). 
    1756  
    1757     .. attribute:: batch (b) 
    1758  
    1759         Turn on batch mode (no windows, no iterations); this option is 
    1760         not documented in C4.5 manuals. It conflicts with "window", 
    1761         "increment" and "trials". 
    1762  
    1763     .. attribute:: subset (s) 
    1764          
    1765         Enables subsetting (default: false, no subsetting), 
    1766   
    1767     .. attribute:: probThresh (p) 
    1768  
    1769         Probabilistic threshold for continuous attributes (default: false). 
    1770  
    1771     .. attribute:: minObjs (m) 
    1772          
    1773         Minimal number of objects (examples) in leaves (default: 2). 
    1774  
    1775     .. attribute:: window (w) 
    1776  
    1777         Initial windows size (default: maximum of 20% and twice the 
    1778         square root of the number of data objects). 
    1779  
    1780     .. attribute:: increment (i) 
    1781  
    1782         The maximum number of objects that can be added to the window 
    1783         at each iteration (default: 20% of the initial window size). 
    1784  
    1785     .. attribute:: cf (c) 
    1786  
    1787         Prunning confidence level (default: 25%). 
    1788  
    1789     .. attribute:: trials (t) 
    1790  
    1791         Set the number of trials in iterative (i.e. non-batch) mode (default: 10). 
    1792  
    1793     .. attribute:: prune 
    1794          
    1795         Return pruned tree (not an original C4.5 option) (default: true) 
    1796  
    1797  
    1798 :class:`C45Learner` also offers another way for setting 
    1799 the arguments: it provides a function :obj:`C45Learner.commandLine` 
    1800 which is given a string and parses it the same way as C4.5 would 
    1801 parse its command line.  
    1802  
    1803 .. class:: C45Classifier 
    1804  
    1805     A faithful reimplementation of Quinlan's function from C4.5. The only 
    1806     difference (and the only reason it's been rewritten) is that it uses 
    1807     a tree composed of :class:`C45Node` instead of C4.5's 
    1808     original tree structure. 
    1809  
    1810     .. attribute:: tree 
    1811  
    1812         C4.5 tree stored as a tree of :obj:`C45Node`. 
    1813  
     1745.. autoclass:: C45Learner 
     1746    :members: 
     1747 
     1748.. autoclass:: C45Classifier 
     1749    :members: 
    18141750 
    18151751.. class:: C45Node 
     
    18911827:: 
    18921828 
    1893     lrn = Orange.classification.tree..C45Learner() 
     1829    lrn = Orange.classification.tree.C45Learner() 
    18941830    lrn.commandline("-m 1 -s") 
    18951831    tree = lrn(data) 
     
    19171853case when only a single value goes into the branch. 
    19181854 
    1919 Printing out C45 Tree 
    1920 ===================== 
    1921  
    1922 .. autofunction:: printTreeC45 
    1923  
    19241855References 
    19251856========== 
     
    19391870     TreeLearner as _TreeLearner, \ 
    19401871         TreeClassifier as _TreeClassifier, \ 
    1941          C45Learner, \ 
    1942          C45Classifier, \ 
     1872         C45Learner as _C45Learner, \ 
     1873         C45Classifier as _C45Classifier, \ 
    19431874         C45TreeNode as C45Node, \ 
    19441875         C45TreeNodeList as C45NodeList, \ 
     
    19791910import Orange.data 
    19801911import Orange.feature.scoring 
    1981 import Orange.classification.tree 
     1912 
     1913class C45Learner(Orange.classification.Learner): 
     1914    """ 
     1915    :class:`C45Learner`'s attributes have double names - those that 
     1916    you know from C4.5 command lines and the corresponding names of C4.5's 
     1917    internal variables. All defaults are set as in C4.5; if you change 
     1918    nothing, you are running C4.5. 
     1919 
     1920    .. attribute:: gainRatio (g) 
     1921         
     1922        Determines whether to use information gain (false, default) 
     1923        or gain ratio for selection of attributes (true). 
     1924 
     1925    .. attribute:: batch (b) 
     1926 
     1927        Turn on batch mode (no windows, no iterations); this option is 
     1928        not documented in C4.5 manuals. It conflicts with "window", 
     1929        "increment" and "trials". 
     1930 
     1931    .. attribute:: subset (s) 
     1932         
     1933        Enables subsetting (default: false, no subsetting). 
     1934  
     1935    .. attribute:: probThresh (p) 
     1936 
     1937        Probabilistic threshold for continuous attributes (default: false). 
     1938 
     1939    .. attribute:: minObjs (m) 
     1940         
     1941        Minimal number of objects (examples) in leaves (default: 2). 
     1942 
     1943    .. attribute:: window (w) 
     1944 
     1945        Initial window size (default: maximum of 20% and twice the 
     1946        square root of the number of data objects). 
     1947 
     1948    .. attribute:: increment (i) 
     1949 
     1950        The maximum number of objects that can be added to the window 
     1951        at each iteration (default: 20% of the initial window size). 
     1952 
     1953    .. attribute:: cf (c) 
     1954 
     1955        Pruning confidence level (default: 25%). 
     1956 
     1957    .. attribute:: trials (t) 
     1958 
     1959        Set the number of trials in iterative (i.e. non-batch) mode (default: 10). 
     1960 
     1961    .. attribute:: prune 
     1962         
     1963        Return pruned tree (not an original C4.5 option) (default: true). 
     1964    """ 
     1965 
     1966    def __new__(cls, instances = None, weightID = 0, **argkw): 
     1967        self = Orange.classification.Learner.__new__(cls, **argkw) 
     1968        if instances: 
     1969            self.__init__(**argkw) 
     1970            return self.__call__(instances, weightID) 
     1971        else: 
     1972            return self 
     1973         
     1974    def __init__(self, **kwargs): 
     1975        self.base = _C45Learner(**kwargs) 
     1976 
     1977    def __setattr__(self, name, value): 
     1978        if name != "base" and name in self.base.__dict__: 
     1979            self.base.__dict__[name] = value 
     1980        else: 
     1981            self.__dict__[name] = value 
     1982 
     1983    def __getattr__(self, name): 
     1984        if name != "base": 
     1985            return self.base.__dict__[name] 
     1986        else: 
     1987            raise AttributeError(name) 
     1988 
     1989    def __call__(self, *args, **kwargs): 
     1990        return C45Classifier(self.base(*args, **kwargs)) 
     1991 
     1992    def commandline(self, ln): 
     1993        """ 
     1994        Set the arguments with a C4.5 command line. 
     1995        """ 
     1996        self.base.commandline(ln) 
     1997     
     1998  
     1999class C45Classifier(Orange.classification.Classifier): 
     2000    """ 
     2001    A faithful reimplementation of Quinlan's function from C4.5. The only 
     2002    difference (and the only reason it's been rewritten) is that it uses 
     2003    a tree composed of :class:`C45Node` instead of C4.5's 
     2004    original tree structure. 
     2005 
     2006    .. attribute:: tree 
     2007 
     2008        C4.5 tree stored as a tree of :obj:`C45Node`. 
     2009    """ 
     2010 
     2011    def __init__(self, baseClassifier): 
     2012        self.nativeClassifier = baseClassifier 
     2013        for k, v in self.nativeClassifier.__dict__.items(): 
     2014            self.__dict__[k] = v 
     2015   
     2016    def __call__(self, instance, result_type=Orange.classification.Classifier.GetValue, 
     2017                 *args, **kwdargs): 
     2018        """Classify a new instance. 
     2019         
     2020        :param instance: instance to be classified. 
     2021        :type instance: :class:`Orange.data.Instance` 
     2022        :param result_type:  
     2023              :class:`Orange.classification.Classifier.GetValue` or \ 
     2024              :class:`Orange.classification.Classifier.GetProbabilities` or 
     2025              :class:`Orange.classification.Classifier.GetBoth` 
     2026         
     2027        :rtype: :class:`Orange.data.Value`,  
     2028              :class:`Orange.statistics.Distribution` or a tuple with both 
     2029        """ 
     2030        return self.nativeClassifier(instance, result_type, *args, **kwdargs) 
     2031 
     2032    def __setattr__(self, name, value): 
     2033        if name == "nativeClassifier": 
     2034            self.__dict__[name] = value 
     2035            return 
     2036        if name in self.nativeClassifier.__dict__: 
     2037            self.nativeClassifier.__dict__[name] = value 
     2038        self.__dict__[name] = value 
     2039     
     2040    def dump(self):   
     2041        """ 
     2042        Returns the tree as a string in the same form as Ross Quinlan's 
     2043        C4.5 program prints it. 
     2044 
     2045        :: 
     2046 
     2047            import Orange 
     2048 
     2049            data = Orange.data.Table("voting") 
     2050            c45 = Orange.classification.tree.C45Learner(data) 
     2051            print c45.dump() 
     2052 
     2053        will print out 
     2054 
     2055        :: 
     2056 
     2057            physician-fee-freeze = n: democrat (253.4) 
     2058            physician-fee-freeze = y: 
     2059            |   synfuels-corporation-cutback = n: republican (145.7) 
     2060            |   synfuels-corporation-cutback = y: 
     2061            |   |   mx-missile = y: democrat (6.0) 
     2062            |   |   mx-missile = n: 
     2063            |   |   |   adoption-of-the-budget-resolution = n: republican (22.6) 
     2064            |   |   |   adoption-of-the-budget-resolution = y: 
     2065            |   |   |   |   anti-satellite-test-ban = n: democrat (5.0) 
     2066            |   |   |   |   anti-satellite-test-ban = y: republican (2.2) 
     2067 
     2068 
     2069        If you run the original C4.5 (that is, the standalone C4.5 - Orange does use the original C4.5) on the same data, it will print out 
     2070 
     2071        :: 
     2072 
     2073            physician-fee-freeze = n: democrat (253.4/5.9) 
     2074            physician-fee-freeze = y: 
     2075            |   synfuels-corporation-cutback = n: republican (145.7/6.2) 
     2076            |   synfuels-corporation-cutback = y: 
     2077            |   |   mx-missile = y: democrat (6.0/2.4) 
     2078            |   |   mx-missile = n: 
     2079            |   |   |   adoption-of-the-budget-resolution = n: republican (22.6/5.2) 
     2080            |   |   |   adoption-of-the-budget-resolution = y: 
     2081            |   |   |   |   anti-satellite-test-ban = n: democrat (5.0/1.2) 
     2082            |   |   |   |   anti-satellite-test-ban = y: republican (2.2/1.0) 
     2083 
     2084        which is strikingly similar, except that C4.5 tested the tree on 
     2085        the learning data and also printed the number of errors 
     2086        in each node, something which :obj:`dump` cannot do 
     2087        (nor is there any need for it to). 
     2088 
     2089        """ 
     2090        return  _c45_printTree0(self.tree, self.classVar, 0) 
    19822091 
    19832092def _c45_showBranch(node, classvar, lev, i): 
    19842093    var = node.tested 
     2094    str_ = "" 
    19852095    if node.nodeType == 1: 
    1986         print ("\n"+"|   "*lev + "%s = %s:") % (var.name, var.values[i]), 
    1987         _c45_printTree0(node.branch[i], classvar, lev+1) 
     2096        str_ += "\n"+"|   "*lev + "%s = %s:" % (var.name, var.values[i]) 
     2097        str_ += _c45_printTree0(node.branch[i], classvar, lev+1) 
    19882098    elif node.nodeType == 2: 
    1989         print ("\n"+"|   "*lev + "%s %s %.1f:") % (var.name, ["<=", ">"][i], node.cut), 
    1990         _c45_printTree0(node.branch[i], classvar, lev+1) 
     2099        str_ += "\n"+"|   "*lev + "%s %s %.1f:" % (var.name, ["<=", ">"][i], node.cut) 
     2100        str_ += _c45_printTree0(node.branch[i], classvar, lev+1) 
    19912101    else: 
    19922102        inset = filter(lambda a:a[1]==i, enumerate(node.mapping)) 
    19932103        inset = [var.values[j[0]] for j in inset] 
    19942104        if len(inset)==1: 
    1995             print ("\n"+"|   "*lev + "%s = %s:") % (var.name, inset[0]), 
     2105            str_ += "\n"+"|   "*lev + "%s = %s:" % (var.name, inset[0]) 
    19962106        else: 
    1997             print ("\n"+"|   "*lev + "%s in {%s}:") % (var.name, ", ".join(inset)), 
    1998         _c45_printTree0(node.branch[i], classvar, lev+1) 
     2107            str_ +=  "\n"+"|   "*lev + "%s in {%s}:" % (var.name, ", ".join(inset)) 
     2108        str_ += _c45_printTree0(node.branch[i], classvar, lev+1) 
     2109    return str_ 
    19992110         
    20002111         
    20012112def _c45_printTree0(node, classvar, lev): 
    20022113    var = node.tested 
     2114    str_ = "" 
    20032115    if node.nodeType == 0: 
    2004         print "%s (%.1f)" % (classvar.values[int(node.leaf)], node.items), 
     2116        str_ += " %s (%.1f)" % (classvar.values[int(node.leaf)], node.items) 
    20052117    else: 
    20062118        for i, branch in enumerate(node.branch): 
    20072119            if not branch.nodeType: 
    2008                 _c45_showBranch(node, classvar, lev, i) 
     2120                str_ += _c45_showBranch(node, classvar, lev, i) 
    20092121        for i, branch in enumerate(node.branch): 
    20102122            if branch.nodeType: 
    2011                 _c45_showBranch(node, classvar, lev, i) 
    2012  
    2013 def printTreeC45(tree): 
    2014     """ 
    2015     Prints the tree given as an argument in the same form as Ross Quinlan's  
    2016     C4.5 program. 
    2017  
    2018     :: 
    2019  
    2020         import Orange 
    2021  
    2022         data = Orange.data.Table("voting") 
    2023         c45 = Orange.classification.tree.C45Learner(data) 
    2024         Orange.classification.tree.printTreeC45(c45) 
    2025  
    2026     will print out 
    2027  
    2028     :: 
    2029  
    2030         physician-fee-freeze = n: democrat (253.4) 
    2031         physician-fee-freeze = y: 
    2032         |   synfuels-corporation-cutback = n: republican (145.7) 
    2033         |   synfuels-corporation-cutback = y: 
    2034         |   |   mx-missile = y: democrat (6.0) 
    2035         |   |   mx-missile = n: 
    2036         |   |   |   adoption-of-the-budget-resolution = n: republican (22.6) 
    2037         |   |   |   adoption-of-the-budget-resolution = y: 
    2038         |   |   |   |   anti-satellite-test-ban = n: democrat (5.0) 
    2039         |   |   |   |   anti-satellite-test-ban = y: republican (2.2) 
    2040  
    2041  
    2042     If you run the original C4.5 (that is, the standalone C4.5 - Orange does use the original C4.5) on the same data, it will print out 
    2043  
    2044     :: 
    2045  
    2046         physician-fee-freeze = n: democrat (253.4/5.9) 
    2047         physician-fee-freeze = y: 
    2048         |   synfuels-corporation-cutback = n: republican (145.7/6.2) 
    2049         |   synfuels-corporation-cutback = y: 
    2050         |   |   mx-missile = y: democrat (6.0/2.4) 
    2051         |   |   mx-missile = n: 
    2052         |   |   |   adoption-of-the-budget-resolution = n: republican (22.6/5.2) 
    2053         |   |   |   adoption-of-the-budget-resolution = y: 
    2054         |   |   |   |   anti-satellite-test-ban = n: democrat (5.0/1.2) 
    2055         |   |   |   |   anti-satellite-test-ban = y: republican (2.2/1.0) 
    2056  
    2057     which is adoringly similar, except that C4.5 tested the tree on  
    2058     the learning data and has also printed out the number of errors  
    2059     in each node - something which :obj:`c45_printTree` obviously can't do 
    2060     (nor is there any need it should). 
    2061  
    2062     """ 
    2063     _c45_printTree0(tree.tree, tree.classVar, 0) 
     2123                str_ += _c45_showBranch(node, classvar, lev, i) 
     2124    return str_ 
     2125 
     2126def _printTreeC45(tree): 
     2127    print _c45_printTree0(tree.tree, tree.classVar, 0) 
    20642128 
    20652129 
     
    23552419        return learner 
    23562420 
     2421# 
    23572422# the following is for the output 
    2358  
     2423# 
    23592424 
    23602425fs = r"(?P<m100>\^?)(?P<fs>(\d*\.?\d*)?)" 
  • orange/orngC45.py

    r7414 r7739  
    1 from Orange.classification.tree import printTreeC45 as printTree 
     1from Orange.classification.tree import _printTreeC45 as printTree 
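
The change to `_c45_printTree0` and `_c45_showBranch` above replaces printing with building and returning a string, while keeping the order in which branches are shown (leaf branches first, then subtrees). The following is a minimal sketch of that technique in plain Python. `Node` and `render` are hypothetical names, not part of Orange, and the node layout is simplified to discrete attributes only.

```python
class Node(object):
    """Hypothetical tree node: a leaf has a label, an inner node has branches."""
    def __init__(self, label=None, attr=None, branches=None):
        self.label = label              # class label (leaves only)
        self.attr = attr                # tested attribute name (inner nodes only)
        self.branches = branches or []  # list of (value, child) pairs

    def is_leaf(self):
        return not self.branches


def render(node, level=0):
    """Return the subtree as text; leaf branches are listed before subtrees."""
    if node.is_leaf():
        return " %s" % node.label
    # Two passes over the branches, mirroring the two loops in the diff.
    ordered = ([b for b in node.branches if b[1].is_leaf()] +
               [b for b in node.branches if not b[1].is_leaf()])
    text = ""
    for value, child in ordered:
        text += "\n" + "|   " * level + "%s = %s:" % (node.attr, value)
        text += render(child, level + 1)
    return text
```

Returning the text leaves the choice to the caller: it can be printed, written to a file, or compared in a test, which a function that prints directly does not allow.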