Changeset 7522:4790e178d5d1 in orange


Timestamp: 02/04/11 20:17:40 (3 years ago)
Author: markotoplak
Branch: default
Convert: d0fbe6f6c76e3e6710d66ee794e36514de273f06
Message: Cleaned out orngTree.py.
Location: orange
Files: 1 added, 2 edited
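
Most edits in this changeset are mechanical renames from the flat `orange` namespace to the hierarchical `Orange` package. The following is a minimal sketch of that mapping, collected from the hunks of this changeset. The names `RENAMES` and `new_name` are hypothetical helpers introduced here for illustration; they are not part of the changeset or of Orange.

```python
# Old flat names -> new hierarchical names, as they appear in this changeset.
# RENAMES and new_name are illustrative only; they are not part of Orange.
RENAMES = {
    "orange.Learner": "Orange.core.Learner",
    "orange.Value": "Orange.data.Value",
    "orange.Example": "Orange.data.Instance",
    "orange.EnumVariable": "Orange.data.feature.Discrete",
    "orange.FloatVariable": "Orange.data.feature.Continuous",
    "orange.VarTypes.Discrete": "Orange.data.Type.Discrete",
    "orange.VarTypes.Continuous": "Orange.data.Type.Continuous",
    "orange.MajorityLearner": "Orange.classification.majority.MajorityLearner",
    "orange.MeasureAttribute": "Orange.feature.scoring.Measure",
    "orange.MeasureAttribute_info": "Orange.feature.scoring.InfoGain",
    "orange.MeasureAttribute_gainRatio": "Orange.feature.scoring.GainRatio",
    "orange.MeasureAttribute_gini": "Orange.feature.scoring.Gini",
    "orange.MeasureAttribute_relief": "Orange.feature.scoring.Relief",
    "orange.MeasureAttribute_MSE": "Orange.feature.scoring.MSE",
    "orange.TreeLearner": "Orange.classification.tree.TreeLearnerBase",
    "orange.TreeStopCriteria_common": "Orange.classification.tree.StopCriteria_common",
    "orange.TreePruner_SameMajority": "Orange.classification.tree.Pruner_SameMajority",
    "orange.TreePruner_m": "Orange.classification.tree.Pruner_m",
    "orange.C45Learner": "Orange.classification.tree.C45Learner",
    "orngTree.TreeLearner": "Orange.classification.tree.TreeLearner",
    "orngTree.printTree": "Orange.classification.tree.printTree",
}


def new_name(old):
    """Return the renamed identifier, or the input unchanged if it is not listed."""
    return RENAMES.get(old, old)
```

The table only lists names that occur in the hunks shown here, so it should not be read as a complete migration guide.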

  • orange/Orange/classification/tree.py

    r7485 r7522  
    9191        type :obj:`orange.Classifier`, since its job is similar to that 
    9292        of a classifier: it gets an example and returns discrete 
    93         :obj:`orange.Value` in range :samp:`[0, len(branches)-1]`. 
     93        :obj:`Orange.data.Value` in range :samp:`[0, len(branches)-1]`. 
    9494        When an example cannot be classified to any branch, the selector 
    95         can return a :obj:`orange.Value` containing a special value 
     95        can return a :obj:`Orange.data.Value` containing a special value 
    9696        (sVal) which should be a discrete distribution 
    9797        (DiscDistribution). This should represent a 
     
    173173If you'd like to understand how the classification works in C++,  
    174174start reading at :obj:`TTreeClassifier::vote`. It gets a  
    175 :obj:`Node`, an :obj:`orange.Example`> and a distribution of  
     175:obj:`Node`, an :obj:`Orange.data.Instance`> and a distribution of  
    176176vote weights. For each node, it calls the  
    177177:obj:`TTreeClassifier::classDistribution` and then multiplies  
     
    415415        Induces a classifier from examples belonging to a node. The 
    416416        same learner is used for internal nodes and for leaves. The 
    417         default :obj:`nodeLearner` is :obj:`orange.MajorityLearner`. 
     417        default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`. 
    418418 
    419419    .. attribute:: descender 
     
    484484    the given examples, weight ID and the just computed matrix. If  
    485485    the learner can use the matrix (and the default,  
    486     :obj:`MajorityLearner`, can), it won't touch the examples. Thus, 
     486    :obj:`Orange.classification.majority.MajorityLearner`, can), it won't touch the examples. Thus, 
    487487    a choice of :obj:`contingencyComputer` will, in many cases,  
    488488    affect the :obj:`nodeClassifier`. The :obj:`nodeLearner` can 
     
    554554in Python and have the call operator overloadedd in such a way that it 
    555555is callbacked from C++ code. You can thus program your own components 
    556 for :obj:`orange.TreeLearnerBase` and :obj:`TreeClassifier`. The detailed  
     556for :obj:`TreeLearnerBase` and :obj:`TreeClassifier`. The detailed  
    557557information on how this is done and what can go wrong, is given in a  
    558558separate page, dedicated to callbacks to Python XXXXXXXXXX. 
     
    585585 
    586586    An abstract base class for split constructors that employ  
    587     a :obj:`orange.MeasureAttribute` to assess a quality of a split. At present, 
     587    a :obj:`Orange.feature.scoring.Measure` to assess a quality of a split. At present, 
    588588    all split constructors except for :obj:`SplitConstructor_Combined` 
    589589    are derived from this class. 
     
    591591    .. attribute:: measure 
    592592 
    593         A component of type :obj:`orange.MeasureAttribute` used for 
     593        A component of type :obj:`Orange.feature.scoring.Measure` used for 
    594594        evaluation of a split. Note that you must select the subclass  
    595         :obj:`MeasureAttribute` capable of handling your class type  
    596         - you cannot use :obj:`orange.MeasureAttribute_gainRatio` 
    597         for building regression trees or :obj:`orange.MeasureAttribute_MSE` 
     595        :obj:`Orange.feature.scoring.Measure` capable of handling your class type  
     596        - you cannot use :obj:`Orange.feature.scoring.GainRatio` 
     597        for building regression trees or :obj:`Orange.feature.scoring.MSE` 
    598598        for classification trees. 
    599599 
     
    616616    The constructed :obj:`branchSelector` is an instance of  
    617617    :obj:`orange.ClassifierFromVarFD` that returns a value of the  
    618     selected attribute. If the attribute is :obj:`orange.EnumVariable`, 
     618    selected attribute. If the attribute is :obj:`Orange.data.feature.Discrete`, 
    619619    :obj:`branchDescription`'s are the attribute's values. The  
    620620    attribute is marked as spent, so that it cannot reappear in the  
     
    980980So we print its name. We will also assume that storing class distributions  
    981981has not been disabled and print them as well. A more able function for  
    982 printing trees (as one defined in orngTree XXXXXXXXXX) has an alternative  
     982printing trees (as one defined in XXXXXXXXXX) has an alternative  
    983983means to get the distribution, when this fails. Then we iterate  
    984984through branches; for each we print a branch description and iteratively  
     
    10081008 
    10091009It's fairly straightforward: if :obj:`x` is of type derived from  
    1010 :obj:`orange.TreeClassifier`, we print :obj:`x.tree`; if it's  
     1010:obj:`TreeClassifier`, we print :obj:`x.tree`; if it's  
    10111011:obj:`Node` we just call :obj:`printTree0` with :obj:`x`. If it's  
    10121012of some other type, we don't know how to handle it and thus raise  
     
    10151015:: 
    10161016 
    1017     if type(x) == orange.TreeClassifier: 
     1017    if isinstance(x, Orange.classification.tree.TreeClassifier) 
    10181018 
    10191019but this would only work if :obj:`x` would be of type  
    1020 :obj:`orange.TreeClassifier` and not of any derived types. The latter,  
     1020:obj:`TreeClassifier` and not of any derived types. The latter,  
    10211021however, do not exist yet...) 
    10221022 
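
The hunk above replaces an exact type comparison with `isinstance` in the documented example. The difference can be sketched with stand-in classes; `TreeClassifier` and `DerivedTreeClassifier` below are hypothetical placeholders, not the Orange classes, and the documentation itself notes that no derived types exist yet.

```python
class TreeClassifier(object):
    """Stand-in for Orange.classification.tree.TreeClassifier (hypothetical)."""


class DerivedTreeClassifier(TreeClassifier):
    """Hypothetical subclass, used only to show the difference."""


x = DerivedTreeClassifier()

# Exact type comparison rejects instances of derived classes.
exact_match = type(x) == TreeClassifier

# isinstance accepts the class itself and anything derived from it.
subclass_match = isinstance(x, TreeClassifier)
```

Here `exact_match` is `False` and `subclass_match` is `True`, which is why the `isinstance` form keeps working if derived classifier types are added later.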
     
    11051105 
    11061106:: 
    1107     >>> learner.stop = orange.TreeStopCriteria_common() 
     1107    >>> learner.stop = Orange.classification.tree.StopCriteria_common() 
    11081108 
    11091109Well, this is actually done in C++ and it uses a global component 
     
    12111211#. Run buildC45.py, which will build the plug-in and put it next to  
    12121212   orange.pyd (or orange.so on Linux/Mac). 
    1213 #. Run python, import orange and create create :samp:`orange.C45Learner()`. 
     1213#. Run python, type :samp:`import Orange` and  
     1214   create create :samp:`Orange.classification.tree.C45Learner()`. 
    12141215   If this fails, something went wrong; see the diagnostic messages from 
    12151216   buildC45.py and read the below paragraph. 
     
    13441345        Mapping for nodes of type :obj:`Subset`. Element :samp:`mapping[i]` 
    13451346        gives the index for an example whose value of :obj:`tested` is *i*.  
    1346         Here, *i* denotes an index of value, not a :class:`orange.Value`. 
     1347        Here, *i* denotes an index of value, not a :class:`Orange.data.Value`. 
    13471348 
    13481349    .. attribute:: branch 
     
    13691370:: 
    13701371 
    1371     tree = orange.C45Learner(data, m=100) 
    1372     tree = orange.C45Learner(data, minObjs=100) 
     1372    tree = Orange.classification.tree.C45Learner(data, m=100) 
     1373    tree = Orange.classification.tree.C45Learner(data, minObjs=100) 
    13731374 
    13741375The way that could be prefered by veteran C4.5 user might be through 
     
    13771378:: 
    13781379 
    1379     lrn = orange.C45Learner() 
     1380    lrn = Orange.classification.tree..C45Learner() 
    13801381    lrn.commandline("-m 1 -s") 
    13811382    tree = lrn(data) 
     
    14081409.. autofunction:: printTreeC45 
    14091410 
    1410 =============== 
    1411 orngTree module 
    1412 =============== 
     1411======================= 
     1412orngTree module XXXXXXX 
     1413======================= 
    14131414 
    14141415.. autoclass:: TreeLearner 
     
    15581559 
    15591560We shall build a small tree from the iris data set - we shall limit the 
    1560 depth to three levels. 
    1561  
    1562 <p class="header">part of <a href="orngTree1.py">orngTree1.py</a></p> 
    1563 <xmp class="code">import orange, orngTree 
    1564 data = Orange.data.Table("iris") 
    1565 tree = orngTree.TreeLearner(data, maxDepth=3) 
    1566 </xmp> 
     1561depth to three levels (part of `orngTree1.py`_, uses `iris.tab`_): 
     1562 
     1563.. literalinclude:: code/orngTree1.py 
     1564   :lines: 0-4 
     1565 
     1566.. _orngTree1.py: code/orngTree1.py 
    15671567 
    15681568The easiest way to call the function is to pass the tree as the only  
    15691569argument:: 
    15701570 
    1571     >>> orngTree.printTree(tree) 
     1571    >>> Orange.classification.tree.printTree(tree) 
    15721572    petal width<0.800: Iris-setosa (100.00%) 
    15731573    petal width>=0.800 
     
    15831583in the node:: 
    15841584 
    1585     >>> orngTree.printTree(tree, leafStr="%V (%M out of %N)") 
     1585    >>> Orange.classification.tree.printTree(tree, leafStr="%V (%M out of %N)") 
    15861586    petal width<0.800: Iris-setosa (50.000 out of 50.000) 
    15871587    petal width>=0.800 
     
    16781678:: 
    16791679 
    1680     orngTree.printTree(tree, leafStr="%V", nodeStr=".") 
     1680    Orange.classification.tree.printTree(tree, leafStr="%V", nodeStr=".") 
    16811681     
    16821682says that the nodeStr should be the same as leafStr (not very useful  
     
    17021702of virginicas decreases down the tree:: 
    17031703 
    1704     orngTree.printTree(tree, leafStr='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)', nodeStr='.') 
     1704    Orange.classification.tree.printTree(tree, leafStr='%^.1CbA="Iris-virginica"% (%^.1CbP="Iris-virginica"%)', nodeStr='.') 
    17051705 
    17061706Let's first interpret the format string: :samp:`CbA="Iris-virginica"` is  
     
    17371737:: 
    17381738 
    1739     >>>orngTree.printTree(tree, leafStr='"%V   %D %.2DbP %.2dbP"', nodeStr='"%D %.2DbP %.2dbP"') 
     1739    >>>Orange.classification.tree.printTree(tree, leafStr='"%V   %D %.2DbP %.2dbP"', nodeStr='"%D %.2DbP %.2dbP"') 
    17401740    root: [50.000, 50.000, 50.000] . . 
    17411741    |    petal width<0.800: [50.000, 0.000, 0.000] [1.00, 0.00, 0.00] [3.00, 0.00, 0.00]: 
     
    1775177590% confidence intervals in the leaves:: 
    17761776 
    1777     >>> orngTree.printTree(tree, leafStr="[SE: %E]\t %V %I(90)", nodeStr="[SE: %E]") 
     1777    >>> Orange.classification.tree.printTree(tree, leafStr="[SE: %E]\t %V %I(90)", nodeStr="[SE: %E]") 
    17781778    root: [SE: 0.409] 
    17791779    |    RM<6.941: [SE: 0.306] 
     
    18051805:attr:`TreeClassifier.nodeClassifier` in a leaf returns.  
    18061806As :samp:`%V` uses the  
    1807 :obj:`orange.FloatVariable`'s function for printing out the value,  
     1807:obj:`Orange.data.feature.Continuous`'s function for printing out the value,  
    18081808therefore the printed number has the same number of decimals  
    18091809as in the data file. 
     
    18151815this number with values in the parent nodes:: 
    18161816 
    1817     >>> orngTree.printTree(tree, leafStr="%C<22 (%cbP<22)", nodeStr=".") 
     1817    >>> Orange.classification.tree.printTree(tree, leafStr="%C<22 (%cbP<22)", nodeStr=".") 
    18181818    root: 277.000 (.) 
    18191819    |    RM<6.941: 273.000 (1.160) 
     
    18421842:: 
    18431843 
    1844     >>> orngTree.printTree(tree, leafStr="%C![20,22] (%^cbP![20,22]%)", nodeStr=".") 
     1844    >>> Orange.classification.tree.printTree(tree, leafStr="%C![20,22] (%^cbP![20,22]%)", nodeStr=".") 
    18451845 
    18461846OK, let's observe the format string for one last time. :samp:`%c![20, 22]` 
     
    20252025         TreeSplitConstructor as SplitConstructor, \ 
    20262026              TreeSplitConstructor_Combined as SplitConstructor_Combined, \ 
    2027               TreeSplitConstructor_Measure as SplitConstructor_Score, \ 
     2027              TreeSplitConstructor_Measure as SplitConstructor_Measure, \ 
    20282028                   TreeSplitConstructor_Attribute as SplitConstructor_Feature, \ 
    20292029                   TreeSplitConstructor_ExhaustiveBinary as SplitConstructor_ExhaustiveBinary, \ 
     
    20342034              TreeStopCriteria_common as StopCriteria_common 
    20352035 
    2036 import orange 
     2036import Orange.core 
    20372037import operator 
    20382038import base64 
    20392039import re 
     2040import Orange.data 
     2041import Orange.feature.scoring 
     2042import Orange.classification.tree 
    20402043 
    20412044def _c45_showBranch(node, classvar, lev, i): 
     
    21222125 
    21232126 
    2124 class TreeLearner(orange.Learner): 
     2127class TreeLearner(Orange.core.Learner): 
    21252128    """ 
    21262129    Assembles the generic classification or regression tree learner  
     
    21402143        Induces a classifier from examples belonging to a node. The 
    21412144        same learner is used for internal nodes and for leaves. The 
    2142         default :obj:`nodeLearner` is :obj:`MajorityLearner`. 
     2145        default :obj:`nodeLearner` is :obj:`Orange.classification.majority.MajorityLearner`. 
    21432146 
    21442147    **Split construction** 
     
    21732176        attributes will be used for splitting of the example set in the node. 
    21742177        Can be either a measure XXXXX or one of 
    2175         "infoGain" (:class:`orange.MeasureAttribute_info`),  
    2176         "gainRatio" (:class:`orange.MeasureAttribute_gainRatio`),  
    2177         "gini" (:class:`orange.MeasureAttribute_gini`), 
    2178         "relief" (:class:`orange.MeasureAttribute_relief`), 
    2179         "retis" (:class: `orange.MeasureAttribute_MSE`). Default: "gainRatio". 
     2178        "infoGain" (:class:`Orange.feature.scoring.InfoGain`),  
     2179        "gainRatio" (:class:`Orange.feature.scoring.GainRatio`),  
     2180        "gini" (:class:`Orange.feature.scoring.Gini`), 
     2181        "relief" (:class:`Orange.feature.scoring.Relief`), 
     2182        "retis" (:class: `Orange.feature.scoring.MSE`). Default: "gainRatio". 
    21802183 
    21812184    .. attribute:: reliefM, reliefK 
     
    21932196        So, to allow splitting only when gainRatio (the default measure) 
    21942197        is greater than 0.6, one should run the learner like this: 
    2195         :samp:`l = orngTree.TreeLearner(data, worstAcceptable=0.6)` 
     2198        :samp:`l = Orange.classification.tree.TreeLearner(data, worstAcceptable=0.6)` 
    21962199 
    21972200    .. attribute:: minSubset 
     
    22162219        to stop the induction as soon as the majority class reaches 70%, 
    22172220        you should say  
    2218         :samp:`tree2 = orngTree.TreeLearner(data, maxMajority=0.7)` 
     2221        :samp:`tree2 = Orange.classification.tree.TreeLearner(data, maxMajority=0.7)` 
    22192222 
    22202223        This is an example of the tree on iris data set, built with 
     
    22692272    """ 
    22702273    def __new__(cls, examples = None, weightID = 0, **argkw): 
    2271         self = orange.Learner.__new__(cls, **argkw) 
     2274        self = Orange.core.Learner.__new__(cls, **argkw) 
    22722275        if examples: 
    22732276            self.__init__(**argkw) 
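
The `__new__` shown in this hunk lets one call either construct a learner or, when examples are supplied, train immediately and return the resulting classifier. Below is a minimal sketch of that idiom using plain Python objects. `MajorityLearner` here is a hypothetical stand-in written for this note, not Orange's class, and the "classifier" is just a function returning the most common label.

```python
from collections import Counter


class MajorityLearner(object):
    """Hypothetical learner that mimics the constructor idiom, not Orange's API."""

    def __new__(cls, examples=None, **kwargs):
        self = object.__new__(cls)
        if examples:
            # Data was given: configure the learner, train, return the classifier.
            self.__init__(**kwargs)
            return self(examples)
        # No data: return the learner; Python then calls __init__ as usual.
        return self

    def __init__(self, examples=None, **kwargs):
        self.__dict__.update(kwargs)

    def __call__(self, examples):
        majority = Counter(examples).most_common(1)[0][0]
        return lambda example: majority


learner = MajorityLearner()                   # a learner object
classifier = MajorityLearner(["a", "b", "a"])  # already a trained classifier
```

Because `__new__` returns an object that is not an instance of the class in the second case, Python does not call `__init__` again on the result.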
     
    22932296            self.learner = self.instance() 
    22942297        if not hasattr(self, "split") and not hasattr(self, "measure"): 
    2295             if examples.domain.classVar.varType == orange.VarTypes.Discrete: 
    2296                 measure = orange.MeasureAttribute_gainRatio() 
     2298            if examples.domain.classVar.varType == Orange.data.Type.Discrete: 
     2299                measure = Orange.feature.scoring.GainRatio() 
    22972300            else: 
    2298                 measure = orange.MeasureAttribute_MSE() 
     2301                measure = Orange.feature.scoring.MSE() 
    22992302            self.learner.split.continuousSplitConstructor.measure = measure 
    23002303            self.learner.split.discreteSplitConstructor.measure = measure 
     
    23022305        tree = self.learner(examples, weight) 
    23032306        if getattr(self, "sameMajorityPruning", 0): 
    2304             tree = orange.TreePruner_SameMajority(tree) 
     2307            tree = Orange.classification.tree.Pruner_SameMajority(tree) 
    23052308        if getattr(self, "mForPruning", 0): 
    2306             tree = orange.TreePruner_m(tree, m = self.mForPruning) 
     2309            tree = Orange.classification.tree.Pruner_m(tree, m = self.mForPruning) 
    23072310        return tree 
    23082311 
     
    23112314        Return the constructed learner - an object of :class:`TreeLearnerBase`. 
    23122315        """ 
    2313         learner = orange.TreeLearner() 
     2316        learner = Orange.classification.tree.TreeLearnerBase() 
    23142317 
    23152318        hasSplit = hasattr(self, "split") 
     
    23172320            learner.split = self.split 
    23182321        else: 
    2319             learner.split = orange.TreeSplitConstructor_Combined() 
    2320             learner.split.continuousSplitConstructor = orange.TreeSplitConstructor_Threshold() 
     2322            learner.split = Orange.classification.tree.SplitConstructor_Combined() 
     2323            learner.split.continuousSplitConstructor = Orange.classification.tree.SplitConstructor_Threshold() 
    23212324            binarization = getattr(self, "binarization", 0) 
    23222325            if binarization == 1: 
    2323                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_ExhaustiveBinary() 
     2326                learner.split.discreteSplitConstructor = Orange.classification.tree.SplitConstructor_ExhaustiveBinary() 
    23242327            elif binarization == 2: 
    2325                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_OneAgainstOthers() 
     2328                learner.split.discreteSplitConstructor = Orange.classification.tree.SplitConstructor_OneAgainstOthers() 
    23262329            else: 
    2327                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_Attribute() 
    2328  
    2329             measures = {"infoGain": orange.MeasureAttribute_info, 
    2330                 "gainRatio": orange.MeasureAttribute_gainRatio, 
    2331                 "gini": orange.MeasureAttribute_gini, 
    2332                 "relief": orange.MeasureAttribute_relief, 
    2333                 "retis": orange.MeasureAttribute_MSE 
     2330                learner.split.discreteSplitConstructor = Orange.classification.tree.SplitConstructor_Feature() 
     2331 
     2332            measures = {"infoGain": Orange.feature.scoring.InfoGain, 
     2333                "gainRatio": Orange.feature.scoring.GainRatio, 
     2334                "gini": Orange.feature.scoring.Gini, 
     2335                "relief": Orange.feature.scoring.Relief, 
     2336                "retis": Orange.feature.scoring.MSE 
    23342337                } 
    23352338 
     
    23382341                measure = measures[measure]() 
    23392342            if not hasSplit and not measure: 
    2340                 measure = orange.MeasureAttribute_gainRatio() 
    2341  
    2342             measureIsRelief = type(measure) == orange.MeasureAttribute_relief 
     2343                measure = Orange.feature.scoring.GainRatio() 
     2344 
     2345            measureIsRelief = type(measure) == Orange.feature.scoring.Relief 
    23432346            relM = getattr(self, "reliefM", None) 
    23442347            if relM and measureIsRelief: 
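
The measure selection in this hunk accepts either a measure object or a string key, falls back to gain ratio, and applies the Relief parameters only when the measure is Relief. A rough sketch of that logic with stand-in classes follows. `GainRatio`, `Relief`, `MEASURES` and `resolve_measure` are hypothetical names defined here for illustration, and the `None` defaults for `m` and `k` are an assumption.

```python
class GainRatio(object):
    """Stand-in for Orange.feature.scoring.GainRatio (hypothetical)."""


class Relief(object):
    """Stand-in for Orange.feature.scoring.Relief (hypothetical)."""

    def __init__(self):
        self.m = None  # assumed placeholder defaults
        self.k = None


MEASURES = {"gainRatio": GainRatio, "relief": Relief}


def resolve_measure(measure=None, relief_m=None, relief_k=None):
    # A string key is looked up and instantiated.
    if isinstance(measure, str):
        measure = MEASURES[measure]()
    # Nothing given: fall back to gain ratio.
    if not measure:
        measure = GainRatio()
    # Relief-specific settings are applied only to an exact Relief instance,
    # mirroring the type(...) == ... test in the source.
    if type(measure) == Relief:
        if relief_m:
            measure.m = relief_m
        if relief_k:
            measure.k = relief_k
    return measure
```

For example, `resolve_measure("relief", relief_m=100)` would return a `Relief` stand-in with `m` set to 100 and `k` left untouched.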
     
    23652368            learner.stop = self.stop 
    23662369        else: 
    2367             learner.stop = orange.TreeStopCriteria_common() 
     2370            learner.stop = Orange.classification.tree.StopCriteria_common() 
    23682371            mm = getattr(self, "maxMajority", 1.0) 
    23692372            if mm < 1.0: 
     
    23972400    :type tree: :class:`TreeClassifier` 
    23982401    """ 
    2399     return __countNodes(isinstance(tree == Orange.classification.tree.TreeClassifier) and tree.tree or tree) 
     2402    return __countNodes(isinstance(tree, Orange.classification.tree.TreeClassifier) and tree.tree or tree) 
    24002403 
    24012404 
     
    24172420    :type tree: :class:`TreeClassifier` 
    24182421    """ 
    2419     return __countLeaves(isinstance(tree == Orange.classification.tree.TreeClassifier) and tree.tree or tree) 
     2422    return __countLeaves(isinstance(tree, Orange.classification.tree.TreeClassifier) and tree.tree or tree) 
    24202423 
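
The two hunks above fix `isinstance(tree == ...)`, which passes a single argument to `isinstance`, into the two-argument form. The following self-contained sketch shows the counting helpers with that fix applied. `Node` and `TreeClassifier` are hypothetical stand-ins, and the sketch uses a conditional expression where the source uses the `and ... or ...` idiom; the two behave the same as long as `tree.tree` is truthy.

```python
class Node(object):
    """Stand-in tree node; a leaf has no branches (hypothetical)."""

    def __init__(self, branches=None):
        self.branches = branches


class TreeClassifier(object):
    """Stand-in classifier holding the root node in .tree (hypothetical)."""

    def __init__(self, tree):
        self.tree = tree


def _root(tree):
    # Accept either a classifier or a bare node.
    return tree.tree if isinstance(tree, TreeClassifier) else tree


def _count_nodes(node):
    count = 0
    if node:  # null branches (None) are skipped
        count += 1
        if node.branches:
            for child in node.branches:
                count += _count_nodes(child)
    return count


def _count_leaves(node):
    count = 0
    if node:
        if node.branches:  # internal node
            for child in node.branches:
                count += _count_leaves(child)
        else:
            count += 1
    return count


def count_nodes(tree):
    return _count_nodes(_root(tree))


def count_leaves(tree):
    return _count_leaves(_root(tree))


# Root with one leaf, one internal node with two leaves, and one null branch.
root = Node([Node(), Node([Node(), Node()]), None])
```

With this tree, `count_nodes` gives 5 and `count_leaves` gives 3, whether the root node or a wrapping classifier is passed.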
    24212424 
     
    25402543 
    25412544def replaceCdisc(strg, mo, node, parent, tree): 
    2542     if tree.classVar.varType != orange.VarTypes.Discrete: 
     2545    if tree.classVar.varType != Orange.data.Type.Discrete: 
    25432546        return insertDot(strg, mo) 
    25442547     
     
    25582561     
    25592562def replacecdisc(strg, mo, node, parent, tree): 
    2560     if tree.classVar.varType != orange.VarTypes.Discrete: 
     2563    if tree.classVar.varType != Orange.data.Type.Discrete: 
    25612564        return insertDot(strg, mo) 
    25622565     
     
    25802583 
    25812584def replaceCcont(strg, mo, node, parent, tree): 
    2582     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2585    if tree.classVar.varType != Orange.data.Type.Continuous: 
    25832586        return insertDot(strg, mo) 
    25842587     
     
    26002603     
    26012604def replaceccont(strg, mo, node, parent, tree): 
    2602     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2605    if tree.classVar.varType != Orange.data.Type.Continuous: 
    26032606        return insertDot(strg, mo) 
    26042607     
     
    26342637     
    26352638def replaceCconti(strg, mo, node, parent, tree): 
    2636     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2639    if tree.classVar.varType != Orange.data.Type.Continuous: 
    26372640        return insertDot(strg, mo) 
    26382641 
     
    26522655             
    26532656def replacecconti(strg, mo, node, parent, tree): 
    2654     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2657    if tree.classVar.varType != Orange.data.Type.Continuous: 
    26552658        return insertDot(strg, mo) 
    26562659 
     
    26742677     
    26752678def replaceD(strg, mo, node, parent, tree): 
    2676     if tree.classVar.varType != orange.VarTypes.Discrete: 
     2679    if tree.classVar.varType != Orange.data.Type.Discrete: 
    26772680        return insertDot(strg, mo) 
    26782681 
     
    26932696 
    26942697def replaced(strg, mo, node, parent, tree): 
    2695     if tree.classVar.varType != orange.VarTypes.Discrete: 
     2698    if tree.classVar.varType != Orange.data.Type.Discrete: 
    26962699        return insertDot(strg, mo) 
    26972700 
     
    27152718 
    27162719def replaceAE(strg, mo, node, parent, tree): 
    2717     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2720    if tree.classVar.varType != Orange.data.Type.Continuous: 
    27182721        return insertDot(strg, mo) 
    27192722 
     
    27442747 
    27452748def replaceI(strg, mo, node, parent, tree): 
    2746     if tree.classVar.varType != orange.VarTypes.Continuous: 
     2749    if tree.classVar.varType != Orange.data.Type.Continuous: 
    27472750        return insertDot(strg, mo) 
    27482751 
     
    27822785            self.leafStr = leafStr 
    27832786        else: 
    2784             if tree.classVar.varType == orange.VarTypes.Discrete: 
     2787            if tree.classVar.varType == Orange.data.Type.Discrete: 
    27852788                self.leafStr = "%V (%^.2m%)" 
    27862789            else: 
  • orange/orngTree.py

    r7340 r7522  
    1 import orange 
    2 import base64 
    3 from warnings import warn 
    4  
    5 class TreeLearner(orange.Learner): 
    6     """TreeLearner(**kwargs) 
    7      
    8     Keyword arguments: 
    9     split -- Object of type TreeSplitConstructor. Default value, provided by TreeLearner, is SplitConstructor_Combined  
    10         with separate constructors for discrete and continuous attributes. Discrete attributes are used as are, while  
    11         continuous attributes are binarized. Gain ratio is used to select attributes.  
    12         A minimum of two examples in a leaf is required for discrete and five examples in a leaf for continuous attributes. 
    13     binarization -- can be one of: 
    14         1: orange.TreeSplitConstructor_ExhaustiveBinary() 
    15         2: orange.TreeSplitConstructor_OneAgainstOthers() 
    16         else: learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_Attribute() 
    17     measure -- can be either a measure or a string, in which case one of the following is used: 
    18         "infoGain": orange.MeasureAttribute_info, 
    19         "gainRatio": orange.MeasureAttribute_gainRatio, 
    20         "gini": orange.MeasureAttribute_gini, 
    21         "relief": orange.MeasureAttribute_relief, 
    22         "retis": orange.MeasureAttribute_MSE 
    23     worstAcceptable --  
    24     minSubset -- 
    25     stop -- Object of type TreeStopCriteria. The default stopping criterion stops induction when all examples in a node belong to the same class. 
    26     maxMajority -- 
    27     minExamples -- 
    28     nodeLearner -- 
    29     maxDepth -- Gives maximal tree depth;  
    30         0 means that only root is generated. The default is 100 to prevent any infinite tree induction due to missettings in stop criteria.  
    31         If you are sure you need larger trees, increase it. If you, on the other hand, want to lower this hard limit, you can do so as well. 
    32     sameMajorityPruning -- 
    33     mForPruning --  
    34     reliefM --  
    35     reliefK -- 
    36     storeDistributions -- 
    37     storeContingencies -- 
    38     storeExamples -- 
    39     storeNodeClassifier -- 
    40     nodeLearner -- 
    41     """ 
    42     def __new__(cls, examples = None, weightID = 0, **argkw): 
    43         self = orange.Learner.__new__(cls, **argkw) 
    44         if examples: 
    45             self.__init__(**argkw) 
    46             return self.__call__(examples, weightID) 
    47         else: 
    48             return self 
    49        
    50     def __init__(self, **kw): 
    51         self.learner = None 
    52         self.__dict__.update(kw) 
    53        
    54     def __setattr__(self, name, value): 
    55         if name in ["split", "binarization", "measure", "worstAcceptable", "minSubset", 
    56               "stop", "maxMajority", "minExamples", "nodeLearner", "maxDepth", "reliefM", "reliefK"]: 
    57             self.learner = None 
    58         self.__dict__[name] = value 
    59  
    60     def __call__(self, examples, weight=0): 
    61         if not self.learner: 
    62             self.learner = self.instance() 
    63         if not hasattr(self, "split") and not hasattr(self, "measure"): 
    64             if examples.domain.classVar.varType == orange.VarTypes.Discrete: 
    65                 measure = orange.MeasureAttribute_gainRatio() 
    66             else: 
    67                 measure = orange.MeasureAttribute_MSE() 
    68             self.learner.split.continuousSplitConstructor.measure = measure 
    69             self.learner.split.discreteSplitConstructor.measure = measure 
    70              
    71         tree = self.learner(examples, weight) 
    72         if getattr(self, "sameMajorityPruning", 0): 
    73             tree = orange.TreePruner_SameMajority(tree) 
    74         if getattr(self, "mForPruning", 0): 
    75             tree = orange.TreePruner_m(tree, m = self.mForPruning) 
    76         return tree 
    77  
    78     def instance(self): 
    79         learner = orange.TreeLearner() 
    80  
    81         hasSplit = hasattr(self, "split") 
    82         if hasSplit: 
    83             learner.split = self.split 
    84         else: 
    85             learner.split = orange.TreeSplitConstructor_Combined() 
    86             learner.split.continuousSplitConstructor = orange.TreeSplitConstructor_Threshold() 
    87             binarization = getattr(self, "binarization", 0) 
    88             if binarization == 1: 
    89                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_ExhaustiveBinary() 
    90             elif binarization == 2: 
    91                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_OneAgainstOthers() 
    92             else: 
    93                 learner.split.discreteSplitConstructor = orange.TreeSplitConstructor_Attribute() 
    94  
    95             measures = {"infoGain": orange.MeasureAttribute_info, 
    96                 "gainRatio": orange.MeasureAttribute_gainRatio, 
    97                 "gini": orange.MeasureAttribute_gini, 
    98                 "relief": orange.MeasureAttribute_relief, 
    99                 "retis": orange.MeasureAttribute_MSE 
    100                 } 
    101  
    102             measure = getattr(self, "measure", None) 
    103             if type(measure) == str: 
    104                 measure = measures[measure]() 
    105             if not hasSplit and not measure: 
    106                 measure = orange.MeasureAttribute_gainRatio() 
    107  
    108             measureIsRelief = type(measure) == orange.MeasureAttribute_relief 
    109             relM = getattr(self, "reliefM", None) 
    110             if relM and measureIsRelief: 
    111                 measure.m = relM 
    112              
    113             relK = getattr(self, "reliefK", None) 
    114             if relK and measureIsRelief: 
    115                 measure.k = relK 
    116  
    117             learner.split.continuousSplitConstructor.measure = measure 
    118             learner.split.discreteSplitConstructor.measure = measure 
    119  
    120             wa = getattr(self, "worstAcceptable", 0) 
    121             if wa: 
    122                 learner.split.continuousSplitConstructor.worstAcceptable = wa 
    123                 learner.split.discreteSplitConstructor.worstAcceptable = wa 
    124  
    125             ms = getattr(self, "minSubset", 0) 
    126             if ms: 
    127                 learner.split.continuousSplitConstructor.minSubset = ms 
    128                 learner.split.discreteSplitConstructor.minSubset = ms 
    129  
    130         if hasattr(self, "stop"): 
    131             learner.stop = self.stop 
    132         else: 
    133             learner.stop = orange.TreeStopCriteria_common() 
    134             mm = getattr(self, "maxMajority", 1.0) 
    135             if mm < 1.0: 
    136                 learner.stop.maxMajority = self.maxMajority 
    137             me = getattr(self, "minExamples", 0) 
    138             if me: 
    139                 learner.stop.minExamples = self.minExamples 
    140  
    141         for a in ["storeDistributions", "storeContingencies", "storeExamples", "storeNodeClassifier", "nodeLearner", "maxDepth"]: 
    142             if hasattr(self, a): 
    143                 setattr(learner, a, getattr(self, a)) 
    144  
    145         return learner 
    146  
    147  
    148 def __countNodes(node): 
    149     count = 0 
    150     if node: 
    151         count += 1 
    152         if node.branches: 
    153             for node in node.branches: 
    154                 count += __countNodes(node) 
    155     return count 
    156  
    157 def countNodes(tree): 
    158     return __countNodes(type(tree) == orange.TreeClassifier and tree.tree or tree) 
    159  
    160  
    161 def __countLeaves(node): 
    162     count = 0 
    163     if node: 
    164         if node.branches: # internal node 
    165             for node in node.branches: 
    166                 count += __countLeaves(node) 
    167         else: 
    168             count += 1 
    169     return count 
    170  
    171 def countLeaves(tree): 
    172     return __countLeaves(type(tree) == orange.TreeClassifier and tree.tree or tree) 
    173  
    174  
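Reviewer sketch: the two recursive counters above can be tried without Orange. This is a minimal illustration, assuming a hypothetical `Node` stand-in whose `branches` attribute is either a list of children (which may contain `None` for null branches) or `None` for a leaf. The names `Node`, `count_nodes` and `count_leaves` are illustrative and are not part of orngTree.

```python
class Node(object):
    """Hypothetical stand-in for a tree node; not an Orange class."""
    def __init__(self, branches=None):
        self.branches = branches


def count_nodes(node):
    # A null node contributes nothing; otherwise count the node and its subtrees.
    if not node:
        return 0
    return 1 + sum(count_nodes(child) for child in (node.branches or []))


def count_leaves(node):
    # Only nodes without branches count; null branches are skipped.
    if not node:
        return 0
    if node.branches:
        return sum(count_leaves(child) for child in node.branches)
    return 1


# Root with one leaf and one internal node that has a null branch.
tree = Node([Node(), Node([Node(), None, Node()])])
print(count_nodes(tree), count_leaves(tree))
```

As in the original, null branches are tolerated rather than treated as errors, which matters for trees where some branch received no examples.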
    175  
    176 import re 
    177 fs = r"(?P<m100>\^?)(?P<fs>(\d*\.?\d*)?)" 
    178 by = r"(?P<by>(b(P|A)))?" 
    179 bysub = r"((?P<bysub>b|s)(?P<by>P|A))?" 
    180 opc = r"(?P<op>=|<|>|(<=)|(>=)|(!=))(?P<num>\d*\.?\d+)" 
    181 opd = r'(?P<op>=|(!=))"(?P<cls>[^"]*)"' 
    182 intrvl = r'((\((?P<intp>\d+)%?\))|(\(0?\.(?P<intv>\d+)\))|)' 
    183 fromto = r"(?P<out>!?)(?P<lowin>\(|\[)(?P<lower>\d*\.?\d+)\s*,\s*(?P<upper>\d*\.?\d+)(?P<upin>\]|\))" 
    184 re_V = re.compile("%V") 
    185 re_N = re.compile("%"+fs+"N"+by) 
    186 re_M = re.compile("%"+fs+"M"+by) 
    187 re_m = re.compile("%"+fs+"m"+by) 
    188 re_Ccont = re.compile("%"+fs+"C"+by+opc) 
    189 re_Cdisc = re.compile("%"+fs+"C"+by+opd) 
    190 re_ccont = re.compile("%"+fs+"c"+by+opc) 
    191 re_cdisc = re.compile("%"+fs+"c"+by+opd) 
    192 re_Cconti = re.compile("%"+fs+"C"+by+fromto) 
    193 re_cconti = re.compile("%"+fs+"c"+by+fromto) 
    194 re_D = re.compile("%"+fs+"D"+by) 
    195 re_d = re.compile("%"+fs+"d"+by) 
    196 re_AE = re.compile("%"+fs+"(?P<AorE>A|E)"+bysub) 
    197 re_I = re.compile("%"+fs+"I"+intrvl) 
    198  
    199 def insertDot(s, mo): 
    200     return s[:mo.start()] + "." + s[mo.end():] 
    201  
    202 def insertStr(s, mo, sub): 
    203     return s[:mo.start()] + sub + s[mo.end():] 
    204  
    205 def insertNum(s, mo, n): 
    206     grps = mo.groupdict() 
    207     m100 = grps.get("m100", None) 
    208     if m100: 
    209         n *= 100 
    210     fs = grps.get("fs") or (m100 and ".0" or "5.3") 
    211     return s[:mo.start()] + ("%%%sf" % fs % n) + s[mo.end():] 
    212  
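Reviewer sketch: `insertNum` formats in two stages, first building a `%<fs>f` specifier from the matched prefix and then applying it to the number. The snippet below reproduces that idea in isolation, assuming only the `fs` prefix grammar shown above; `re_m_demo` and `insert_num` are illustrative names, and the `by` suffix is left out for brevity.

```python
import re

# Same prefix grammar as the fs pattern: an optional "^" (multiply by 100),
# then an optional width/precision such as "5.3" or ".1".
FS = r"(?P<m100>\^?)(?P<fs>(\d*\.?\d*)?)"
re_m_demo = re.compile("%" + FS + "m")


def insert_num(s, mo, n):
    groups = mo.groupdict()
    m100 = groups.get("m100")
    if m100:
        n *= 100
    # Default format: no decimals for percentages, "5.3" otherwise.
    fs = groups.get("fs") or (".0" if m100 else "5.3")
    # Two-stage formatting: "%%%sf" % fs yields e.g. "%.1f", which is then applied to n.
    return s[:mo.start()] + ("%%%sf" % fs % n) + s[mo.end():]


text = "majority share: %^.1m%"
result = insert_num(text, re_m_demo.search(text), 0.9371)
plain = insert_num("%m", re_m_demo.search("%m"), 0.5)
print(result)
print(plain)
```

The trailing `%` in the template is not part of the match, so it survives as a literal percent sign after the number.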
    213 def byWhom(by, parent, tree): 
    214     if by == "bP": 
    215         return parent 
    216     else: 
    217         return tree.tree 
    218  
    219 def replaceV(strg, mo, node, parent, tree): 
    220     return insertStr(strg, mo, str(node.nodeClassifier.defaultValue)) 
    221  
    222 def replaceN(strg, mo, node, parent, tree): 
    223     by = mo.group("by") 
    224     N = node.distribution.abs 
    225     if by: 
    226         whom = byWhom(by, parent, tree) 
    227         if whom and whom.distribution: 
    228             if whom.distribution.abs > 1e-30: 
    229                 N /= whom.distribution.abs 
    230         else: 
    231             return insertDot(strg, mo) 
    232     return insertNum(strg, mo, N) 
    233          
    234  
    235 def replaceM(strg, mo, node, parent, tree): 
    236     by = mo.group("by") 
    237     maj = int(node.nodeClassifier.defaultValue) 
    238     N = node.distribution[maj] 
    239     if by: 
    240         whom = byWhom(by, parent, tree) 
    241         if whom and whom.distribution: 
    242             if whom.distribution[maj] > 1e-30: 
    243                 N /= whom.distribution[maj] 
    244         else: 
    245             return insertDot(strg, mo) 
    246     return insertNum(strg, mo, N) 
    247          
    248  
    249 def replacem(strg, mo, node, parent, tree): 
    250     by = mo.group("by") 
    251     maj = int(node.nodeClassifier.defaultValue) 
    252     if node.distribution.abs > 1e-30: 
    253         N = node.distribution[maj] / node.distribution.abs 
    254         whom = by and byWhom(by, parent, tree)  # node to normalize by (parent or root) 
    255         if whom and whom.distribution: 
    256             byN = whom.distribution[maj] / whom.distribution.abs 
    257             if byN > 1e-30: 
    258                 N /= byN 
    259         elif by: 
    260             return insertDot(strg, mo) 
    261     else: 
    262         N = 0. 
    263     return insertNum(strg, mo, N) 
    264  
    265  
    266 def replaceCdisc(strg, mo, node, parent, tree): 
    267     if tree.classVar.varType != orange.VarTypes.Discrete: 
    268         return insertDot(strg, mo) 
    269      
    270     by, op, cls = mo.group("by", "op", "cls") 
    271     N = node.distribution[cls] 
    272     if op == "!=": 
    273         N = node.distribution.abs - N 
    274     if by: 
    275         whom = byWhom(by, parent, tree) 
    276         if whom and whom.distribution: 
    277             if whom.distribution[cls] > 1e-30: 
    278                 N /= whom.distribution[cls] 
    279         else: 
    280             return insertDot(strg, mo) 
    281     return insertNum(strg, mo, N) 
    282  
    283      
    284 def replacecdisc(strg, mo, node, parent, tree): 
    285     if tree.classVar.varType != orange.VarTypes.Discrete: 
    286         return insertDot(strg, mo) 
    287      
    288     op, by, cls = mo.group("op", "by", "cls") 
    289     N = node.distribution[cls] 
    290     if node.distribution.abs > 1e-30: 
    291         N /= node.distribution.abs 
    292         if op == "!=": 
    293             N = 1 - N 
    294     if by: 
    295         whom = byWhom(by, parent, tree) 
    296         if whom and whom.distribution: 
    297             if whom.distribution[cls] > 1e-30: 
    298                 N /= whom.distribution[cls] / whom.distribution.abs 
    299         else: 
    300             return insertDot(strg, mo) 
    301     return insertNum(strg, mo, N) 
    302  
    303  
    304 import operator 
    305 __opdict = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge, "=": operator.eq, "!=": operator.ne} 
    306  
    307 def replaceCcont(strg, mo, node, parent, tree): 
    308     if tree.classVar.varType != orange.VarTypes.Continuous: 
    309         return insertDot(strg, mo) 
    310      
    311     by, op, num = mo.group("by", "op", "num") 
    312     op = __opdict[op] 
    313     num = float(num) 
    314     N = sum([x[1] for x in node.distribution.items() if op(x[0], num)], 0.) 
    315     if by: 
    316         whom = byWhom(by, parent, tree) 
    317         if whom and whom.distribution: 
    318             byN = sum([x[1] for x in whom.distribution.items() if op(x[0], num)], 0.) 
    319             if byN > 1e-30: 
    320                 N /= byN 
    321         else: 
    322             return insertDot(strg, mo) 
    323  
    324     return insertNum(strg, mo, N) 
    325      
    326      
    327 def replaceccont(strg, mo, node, parent, tree): 
    328     if tree.classVar.varType != orange.VarTypes.Continuous: 
    329         return insertDot(strg, mo) 
    330      
    331     by, op, num = mo.group("by", "op", "num") 
    332     op = __opdict[op] 
    333     num = float(num) 
    334     N = sum([x[1] for x in node.distribution.items() if op(x[0], num)], 0.) 
    335     if node.distribution.abs > 1e-30: 
    336         N /= node.distribution.abs 
    337     if by: 
    338         whom = byWhom(by, parent, tree) 
    339         if whom and whom.distribution: 
    340             byN = sum([x[1] for x in whom.distribution.items() if op(x[0], num)], 0.) 
    341             if byN > 1e-30: 
    342                 N /= byN/whom.distribution.abs # abs > byN, so byN>1e-30 => abs>1e-30 
    343         else: 
    344             return insertDot(strg, mo) 
    345     return insertNum(strg, mo, N) 
    346  
    347  
    348 def extractInterval(mo, dist): 
    349     out, lowin, lower, upper, upin = mo.group("out", "lowin", "lower", "upper", "upin") 
    350     lower, upper = float(lower), float(upper) 
    351     if out: 
    352         lop = lowin == "(" and operator.le or operator.lt 
    353         hop = upin == ")" and operator.ge or operator.gt 
    354         return filter(lambda x:lop(x[0], lower) or hop(x[0], upper), dist.items()) 
    355     else: 
    356         lop = lowin == "(" and operator.gt or operator.ge 
    357         hop = upin == ")" and operator.lt or operator.le 
    358         return filter(lambda x:lop(x[0], lower) and hop(x[0], upper), dist.items()) 
    359  
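Reviewer sketch: the interval filter can be checked on plain `(value, weight)` pairs. This version assumes that the negated form (`!` prefix) is meant to be the exact complement of the interval, so a closed bound is excluded from the outside set and an open bound is included. The function name and its boolean parameters are illustrative, not the original signature.

```python
import operator


def extract_interval(items, lower, upper, low_closed, up_closed, outside=False):
    """Keep (value, weight) pairs inside, or outside, the given interval."""
    if outside:
        # Complement: a closed bound belongs to the interval, so it is excluded here.
        lop = operator.lt if low_closed else operator.le
        hop = operator.gt if up_closed else operator.ge
        return [(v, w) for v, w in items if lop(v, lower) or hop(v, upper)]
    lop = operator.ge if low_closed else operator.gt
    hop = operator.le if up_closed else operator.lt
    return [(v, w) for v, w in items if lop(v, lower) and hop(v, upper)]


items = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
inside = extract_interval(items, 1.0, 3.0, True, False)          # [1, 3)
outside = extract_interval(items, 1.0, 3.0, True, False, True)   # ![1, 3)
print(inside, outside)
```

Under that assumption every item lands in exactly one of the two sets, which is a convenient property to test for.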
    360      
    361 def replaceCconti(strg, mo, node, parent, tree): 
    362     if tree.classVar.varType != orange.VarTypes.Continuous: 
    363         return insertDot(strg, mo) 
    364  
    365     by = mo.group("by") 
    366     N = sum([x[1] for x in extractInterval(mo, node.distribution)]) 
    367     if by: 
    368         whom = byWhom(by, parent, tree) 
    369         if whom and whom.distribution: 
    370             byN = sum([x[1] for x in extractInterval(mo, whom.distribution)]) 
    371             if byN > 1e-30: 
    372                 N /= byN 
    373         else: 
    374             return insertDot(strg, mo) 
    375          
    376     return insertNum(strg, mo, N) 
    377  
    378              
    379 def replacecconti(strg, mo, node, parent, tree): 
    380     if tree.classVar.varType != orange.VarTypes.Continuous: 
    381         return insertDot(strg, mo) 
    382  
    383     N = sum([x[1] for x in extractInterval(mo, node.distribution)]) 
    384     ab = node.distribution.abs 
    385     if ab > 1e-30: 
    386         N /= ab 
    387  
    388     by = mo.group("by") 
    389     if by: 
    390         whom = byWhom(by, parent, tree) 
    391         if whom and whom.distribution: 
    392             byN = sum([x[1] for x in extractInterval(mo, whom.distribution)]) 
    393             if byN > 1e-30: 
    394                 N /= byN/whom.distribution.abs 
    395         else: 
    396             return insertDot(strg, mo) 
    397          
    398     return insertNum(strg, mo, N) 
    399  
    400      
    401 def replaceD(strg, mo, node, parent, tree): 
    402     if tree.classVar.varType != orange.VarTypes.Discrete: 
    403         return insertDot(strg, mo) 
    404  
    405     fs, by, m100 = mo.group("fs", "by", "m100") 
    406     dist = list(node.distribution) 
    407     if by: 
    408         whom = byWhom(by, parent, tree) 
    409         if whom: 
    410             for i, d in enumerate(whom.distribution): 
    411                 if d > 1e-30: 
    412                     dist[i] /= d 
    413         else: 
    414             return insertDot(strg, mo) 
    415     mul = m100 and 100 or 1 
    416     fs = fs or (m100 and ".0" or "5.3") 
    417     return insertStr(strg, mo, "["+", ".join(["%%%sf" % fs % (N*mul) for N in dist])+"]") 
    418  
    419  
    420 def replaced(strg, mo, node, parent, tree): 
    421     if tree.classVar.varType != orange.VarTypes.Discrete: 
    422         return insertDot(strg, mo) 
    423  
    424     fs, by, m100 = mo.group("fs", "by", "m100") 
    425     dist = list(node.distribution) 
    426     ab = node.distribution.abs 
    427     if ab > 1e-30: 
    428         dist = [d/ab for d in dist] 
    429     if by: 
    430         whom = byWhom(by, parent, tree) 
    431         if whom: 
    432             for i, d in enumerate(whom.distribution): 
    433                 if d > 1e-30: 
    434                     dist[i] /= d/whom.distribution.abs # abs > d => d>1e-30 => abs>1e-30 
    435         else: 
    436             return insertDot(strg, mo) 
    437     mul = m100 and 100 or 1 
    438     fs = fs or (m100 and ".0" or "5.3") 
    439     return insertStr(strg, mo, "["+", ".join(["%%%sf" % fs % (N*mul) for N in dist])+"]") 
    440  
    441  
    442 def replaceAE(strg, mo, node, parent, tree): 
    443     if tree.classVar.varType != orange.VarTypes.Continuous: 
    444         return insertDot(strg, mo) 
    445  
    446     AorE, bysub, by = mo.group("AorE", "bysub", "by") 
    447      
    448     if AorE == "A": 
    449         A = node.distribution.average() 
    450     else: 
    451         A = node.distribution.error() 
    452     if by: 
    453         whom = byWhom("b"+by, parent, tree) 
    454         if whom: 
    455             if AorE == "A": 
    456                 avg = whom.distribution.average() 
    457             else: 
    458                 avg = whom.distribution.error() 
    459             if bysub == "b": 
    460                 if avg > 1e-30: 
    461                     A /= avg 
    462             else: 
    463                 A -= avg 
    464         else: 
    465             return insertDot(strg, mo) 
    466     return insertNum(strg, mo, A) 
    467  
    468  
    469 Z = { 0.75:1.15, 0.80:1.28, 0.85:1.44, 0.90:1.64, 0.95:1.96, 0.99:2.58 } 
    470  
    471 def replaceI(strg, mo, node, parent, tree): 
    472     if tree.classVar.varType != orange.VarTypes.Continuous: 
    473         return insertDot(strg, mo) 
    474  
    475     fs = mo.group("fs") or "5.3" 
    476     intrvl = float(mo.group("intp") or mo.group("intv") or "95")/100. 
    477     mul = mo.group("m100") and 100 or 1 
    478  
    479     if not Z.has_key(intrvl): 
    480         raise SystemError, "Cannot compute %5.3f%% confidence intervals" % intrvl 
    481  
    482     av = node.distribution.average()     
    483     il = node.distribution.error() * Z[intrvl] 
    484     return insertStr(strg, mo, "[%%%sf-%%%sf]" % (fs, fs) % ((av-il)*mul, (av+il)*mul)) 
    485  
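Reviewer sketch: `replaceI` computes a symmetric interval around the node average, with the half-width given by the distribution's error times a z-score looked up in `Z`. The arithmetic can be sketched on its own, assuming `error()` returns a standard-error-like spread; `confidence_interval` and its parameters are illustrative names.

```python
# z-scores copied from the Z table above.
Z = {0.75: 1.15, 0.80: 1.28, 0.85: 1.44, 0.90: 1.64, 0.95: 1.96, 0.99: 2.58}


def confidence_interval(average, error, level=0.95):
    # Only the tabulated confidence levels are supported.
    if level not in Z:
        raise ValueError("Cannot compute %g%% confidence intervals" % (level * 100))
    half_width = error * Z[level]
    return average - half_width, average + half_width


low, high = confidence_interval(22.5, 0.8, 0.95)
print("[%5.3f-%5.3f]" % (low, high))
```

Because the lookup uses exact float keys, only levels that parse to one of the six tabulated values are accepted.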
    486  
    487 # This class is more a collection of functions, merged into a class so that they don't 
    488 # need to pass too many arguments around. It is constructed, used and discarded; 
    489 # it is not meant to store any information. 
    490 class __TreeDumper: 
    491     defaultStringFormats = [(re_V, replaceV), (re_N, replaceN), (re_M, replaceM), (re_m, replacem), 
    492                               (re_Cdisc, replaceCdisc), (re_cdisc, replacecdisc), 
    493                               (re_Ccont, replaceCcont), (re_ccont, replaceccont), 
    494                               (re_Cconti, replaceCconti), (re_cconti, replacecconti), 
    495                               (re_D, replaceD), (re_d, replaced), 
    496                               (re_AE, replaceAE), (re_I, replaceI) 
    497                              ] 
    498  
    499     def __init__(self, leafStr, nodeStr, stringFormats, minExamples, maxDepth, simpleFirst, tree, **kw): 
    500         self.stringFormats = stringFormats 
    501         self.minExamples = minExamples 
    502         self.maxDepth = maxDepth 
    503         self.simpleFirst = simpleFirst 
    504         self.tree = tree 
    505         self.__dict__.update(kw) 
    506  
    507         if leafStr: 
    508             self.leafStr = leafStr 
    509         else: 
    510             if tree.classVar.varType == orange.VarTypes.Discrete: 
    511                 self.leafStr = "%V (%^.2m%)" 
    512             else: 
    513                 self.leafStr = "%V" 
    514  
    515         if nodeStr == ".": 
    516             self.nodeStr = self.leafStr 
    517         else: 
    518             self.nodeStr = nodeStr 
    519          
    520  
    521     def formatString(self, strg, node, parent): 
    522         if hasattr(strg, "__call__"): 
    523             return strg(node, parent, self.tree) 
    524          
    525         if not node: 
    526             return "<null node>" 
    527          
    528         for rgx, replacer in self.stringFormats: 
    529             if not node.distribution: 
    530                 strg = rgx.sub(".", strg) 
    531             else: 
    532                 strt = 0 
    533                 while True: 
    534                     mo = rgx.search(strg, strt) 
    535                     if not mo: 
    536                         break 
    537                     strg = replacer(strg, mo, node, parent, self.tree) 
    538                     strt = mo.start()+1 
    539                          
    540         return strg 
    541          
    542  
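Reviewer sketch: `formatString` walks a list of `(regex, replacer)` pairs and keeps substituting until a pattern no longer matches. The loop can be illustrated with a plain dict in place of a tree node; the `node` dict, `replace_value` and `replace_count` are hypothetical stand-ins for the real replacers.

```python
import re


def format_string(template, node, formats):
    """Apply each (regex, replacer) pair until no match is left."""
    for rgx, replacer in formats:
        start = 0
        while True:
            mo = rgx.search(template, start)
            if not mo:
                break
            template = replacer(template, mo, node)
            # Resume one character past where the old match began.
            start = mo.start() + 1
    return template


def replace_value(s, mo, node):
    return s[:mo.start()] + str(node["value"]) + s[mo.end():]


def replace_count(s, mo, node):
    return s[:mo.start()] + "%d" % node["count"] + s[mo.end():]


formats = [(re.compile("%V"), replace_value), (re.compile("%N"), replace_count)]
node = {"value": "setosa", "count": 50}
label = format_string("%V (%N examples, again %V)", node, formats)
print(label)
```

Since user formats are prepended to the defaults in `dumpTree` and `dotTree`, a custom pair gets the first chance to consume a specifier before the built-in ones see the string.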
    543     def showBranch(self, node, parent, lev, i): 
    544         bdes = node.branchDescriptions[i] 
    545         bdes = node.branchSelector.classVar.name + (bdes[0] not in "<=>" and "=" or "") + bdes 
    546         if node.branches[i]: 
    547             nodedes = self.nodeStr and ": "+self.formatString(self.nodeStr, node.branches[i], node) or "" 
    548         else: 
    549             nodedes = "<null node>" 
    550         return "|    "*lev + bdes + nodedes 
    551          
    552          
    553     def dumpTree0(self, node, parent, lev): 
    554         if node.branches: 
    555             if node.distribution.abs < self.minExamples or lev > self.maxDepth: 
    556                 return "|    "*lev + ". . .\n" 
    557              
    558             res = "" 
    559             if self.leafStr and self.nodeStr and self.leafStr != self.nodeStr: 
    560                 leafsep = "\n"+("|    "*lev)+"    " 
    561             else: 
    562                 leafsep = "" 
    563             if self.simpleFirst: 
    564                 for i, branch in enumerate(node.branches): 
    565                     if not branch or not branch.branches: 
    566                         if self.leafStr == self.nodeStr: 
    567                             res += "%s\n" % self.showBranch(node, parent, lev, i) 
    568                         else: 
    569                             res += "%s: %s\n" % (self.showBranch(node, parent, lev, i), 
    570                                                  leafsep + self.formatString(self.leafStr, branch, node)) 
    571             for i, branch in enumerate(node.branches): 
    572                 if branch and branch.branches: 
    573                     res += "%s\n%s" % (self.showBranch(node, parent, lev, i), 
    574                                        self.dumpTree0(branch, node, lev+1)) 
    575                 elif not self.simpleFirst: 
    576                     if self.leafStr == self.nodeStr: 
    577                         res += "%s\n" % self.showBranch(node, parent, lev, i) 
    578                     else: 
    579                         res += "%s: %s\n" % (self.showBranch(node, parent, lev, i), 
    580                                              leafsep+self.formatString(self.leafStr, branch, node)) 
    581             return res 
    582         else: 
    583             return self.formatString(self.leafStr, node, parent) 
    584  
    585  
    586     def dumpTree(self): 
    587         if self.nodeStr: 
    588             lev, res = 1, "root: %s\n" % self.formatString(self.nodeStr, self.tree.tree, None) 
    589             self.maxDepth += 1 
    590         else: 
    591             lev, res = 0, "" 
    592         return res + self.dumpTree0(self.tree.tree, None, lev) 
    593          
    594  
    595     def dotTree0(self, node, parent, internalName): 
    596         if node.branches: 
    597             if node.distribution.abs < self.minExamples or len(internalName)-1 > self.maxDepth: 
    598                 self.fle.write('%s [ shape="plaintext" label="..." ]\n' % _quoteName(internalName)) 
    599                 return 
    600                  
    601             label = node.branchSelector.classVar.name 
    602             if self.nodeStr: 
    603                 label += "\\n" + self.formatString(self.nodeStr, node, parent) 
    604             self.fle.write('%s [ shape=%s label="%s"]\n' % (_quoteName(internalName), self.nodeShape, label)) 
    605              
    606             for i, branch in enumerate(node.branches): 
    607                 if branch: 
    608                     internalBranchName = internalName+chr(i+65) 
    609                     self.fle.write('%s -> %s [ label="%s" ]\n' % (_quoteName(internalName), _quoteName(internalBranchName), node.branchDescriptions[i])) 
    610                     self.dotTree0(branch, node, internalBranchName) 
    611                      
    612         else: 
    613             self.fle.write('%s [ shape=%s label="%s"]\n' % (_quoteName(internalName), self.leafShape, self.formatString(self.leafStr, node, parent))) 
    614  
    615  
    616     def dotTree(self, internalName="n"): 
    617         self.fle.write("digraph G {\n") 
    618         self.dotTree0(self.tree.tree, None, internalName) 
    619         self.fle.write("}\n") 
    620  
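Reviewer sketch: the DOT writer names each node by appending one letter per level (`n`, `nA`, `nB`, `nAA`, ...) and base64-encodes that name for use as a quoted DOT identifier. The naming scheme can be sketched on a hypothetical nested-dict tree; `quote_name`, `dot_lines` and the dict layout are assumptions made for illustration, and the explicit `encode`/`decode` calls are there so the snippet also runs on Python 3.

```python
import base64


def quote_name(name):
    # Base64 keeps the identifier safe for DOT regardless of its characters.
    return '"%s"' % base64.b64encode(name.encode("ascii")).decode("ascii")


def dot_lines(tree, name="n"):
    """tree is a hypothetical dict: {"label": str, "branches": [(edge, subtree), ...]}."""
    lines = ['%s [ label="%s" ]' % (quote_name(name), tree["label"])]
    for i, (edge, child) in enumerate(tree.get("branches", [])):
        child_name = name + chr(i + 65)  # "nA", "nB", ...: one letter per level
        lines.append('%s -> %s [ label="%s" ]'
                     % (quote_name(name), quote_name(child_name), edge))
        lines.extend(dot_lines(child, child_name))
    return lines


tree = {"label": "petal width", "branches": [
    ("<0.8", {"label": "setosa"}),
    (">=0.8", {"label": "versicolor"}),
]}
dot = "digraph G {\n" + "\n".join(dot_lines(tree)) + "\n}\n"
print(dot)
```

Node declarations and edges must use the same quoted name, otherwise Graphviz treats them as different nodes and draws unlabeled duplicates.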
    621 def _quoteName(x): 
    622     return '"%s"' % (base64.b64encode(x)) 
    623  
    624 def dumpTree(tree, leafStr = "", nodeStr = "", **argkw): 
    625     return __TreeDumper(leafStr, nodeStr, argkw.get("userFormats", []) + __TreeDumper.defaultStringFormats, 
    626                         argkw.get("minExamples", 0), argkw.get("maxDepth", 1e10), argkw.get("simpleFirst", True), 
    627                         tree).dumpTree() 
    628  
    629  
    630 def printTree(*a, **aa): 
    631     print dumpTree(*a, **aa) 
    632  
    633 printTxt = printTree 
    634  
    635  
    636 def dotTree(tree, fileName, leafStr = "", nodeStr = "", leafShape="plaintext", nodeShape="plaintext", **argkw): 
    637     fle = type(fileName) == str and file(fileName, "wt") or fileName 
    638  
    639     __TreeDumper(leafStr, nodeStr, argkw.get("userFormats", []) + __TreeDumper.defaultStringFormats, 
    640                  argkw.get("minExamples", 0), argkw.get("maxDepth", 1e10), argkw.get("simpleFirst", True), 
    641                  tree, 
    642                  leafShape = leafShape, nodeShape = nodeShape, fle = fle).dotTree() 
    643                          
    644 printDot = dotTree 
    645          
    646 ##import orange, orngTree, os 
    647 ##os.chdir("c:\\d\\ai\\orange\\doc\\datasets") 
    648 ##data = orange.ExampleTable("iris") 
    649 ###data = orange.ExampleTable("housing") 
    650 ##tree = orngTree.TreeLearner(data) 
    651 ##printTxt(tree) 
    652 ###print printTree(tree, '%V %4.2NbP %.3C!="Iris-virginica"') 
    653 ###print printTree(tree, '%A %I(95) %C![20,22)bP', ".", maxDepth=3) 
    654 ###dotTree("c:\\d\\ai\\orange\\x.dot", tree, '%A', maxDepth= 3) 
     1 from Orange.classification.tree import * 