Changeset 8999:90315f44439a in orange
- Timestamp: 09/21/11 16:37:18 (20 months ago)
- Branch: default
- Convert: e020d075c7bd75f774142a147d3a52c04e9acfde
- Location: orange
- Files: 1 deleted, 2 edited
  - Orange/classification/tree.py (modified) (20 diffs)
  - doc/Orange/rst/code/c45.py (deleted)
  - doc/Orange/rst/code/tree_c45.py (modified) (1 diff)
--- orange/Orange/classification/tree.py (r8986)
+++ orange/Orange/classification/tree.py (r8999)
@@ -11,6 +11,5 @@
 
 To build a :obj:`TreeClassifier` from the Iris data set
-(with the depth limited to three levels), use (part of `orngTree1.py`_,
-uses `iris.tab`_):
+(with the depth limited to three levels), use:
 
 .. literalinclude:: code/orngTree1.py
@@ -128,5 +127,5 @@
 is random. Note that in the second case lambda function still has three
 parameters, since this is a necessary number of parameters for the stop
-function (:obj:`StopCriteria`). Part of `tree3.py`_ (uses `iris.tab`_):
+function (:obj:`StopCriteria`).
 
 .. _tree3.py: code/tree3.py
@@ -142,11 +141,10 @@
 
 To have something to work on, we'll take the data from lenses dataset and
-build a tree using the default components (part of `treestructure.py`_,
-uses `lenses.tab`_):
+build a tree using the default components:
 
 .. literalinclude:: code/treestructure.py
    :lines: 7-10
 
-How big is our tree (part of `treestructure.py`_, uses `lenses.tab`_)?
+How big is our tree?
 
 .. _lenses.tab: code/lenses.tab
@@ -170,6 +168,5 @@
 
 Let us now write a script that prints out a tree. The recursive part of
-the function will get a node and its level (part of `treestructure.py`_,
-uses `lenses.tab`_).
+the function will get a node and its level.
 
 .. literalinclude:: code/treestructure.py
@@ -211,6 +208,5 @@
 ... but we won't. Let us learn how to handle arguments of
 different types. Let's write a function that will accept either a
-:obj:`TreeClassifier` or a :obj:`Node`. Part of `treestructure.py`_,
-uses `lenses.tab`_.
+:obj:`TreeClassifier` or a :obj:`Node`.
 
 .. literalinclude:: code/treestructure.py
@@ -243,6 +239,5 @@
 tree, we would call cutTree(root, 2). The function will be recursive,
 with the second argument (level) decreasing at each call; when zero,
-the current node will be made a leaf (part of `treestructure.py`_, uses
-`lenses.tab`_):
+the current node will be made a leaf:
 
 .. literalinclude:: code/treestructure.py
@@ -274,6 +269,5 @@
 .. _treelearner.py: code/treelearner.py
 
-Let us construct a :obj:`TreeLearner` to play with (`treelearner.py`_,
-uses `lenses.tab`_):
+Let us construct a :obj:`TreeLearner` to play with:
 
 .. literalinclude:: code/treelearner.py
@@ -935,5 +929,5 @@
 
 We shall build a small tree from the iris data set - we shall limit the
-depth to three levels (part of `orngTree1.py`_, uses `iris.tab`_):
+depth to three levels:
 
 .. literalinclude:: code/orngTree1.py
@@ -1307,5 +1301,5 @@
 Let's say with like to print the classification margin for each node,
 that is, the difference between the proportion of the largest and the
-second largest class in the node (part of `orngTree2.py`_):
+second largest class in the node:
 
 .. _orngTree2.py: code/orngTree2.py
@@ -1358,19 +1352,8 @@
 ===========================
 
-As C4.5 is a standard benchmark in machine learning,
-it is incorporated in Orange, although Orange has its own
-implementation of decision trees.
-
-The implementation uses the original Quinlan's code for learning so the
-tree you get is exactly like the one that would be build by standalone
-C4.5. Upon return, however, the original tree is copied to Orange
-components that contain exactly the same information plus what is needed
-to make them visible from Python. To be sure that the algorithm behaves
-just as the original, we use a dedicated class :class:`C45Node`
-instead of reusing the components used by Orange's tree inducer
-(ie, :class:`Node`). This, however, could be done and probably
-will be done in the future; we shall still retain :class:`C45Node`
-but offer transformation to :class:`Node` so that routines
-that work on Orange trees will also be usable for C45 trees.
+C4.5 is incorporated in Orange because it is a standard benchmark in
+machine learning. The implementation uses the original C4.5 code, so the
+resulting tree is exactly like the one that would be build by standalone
+C4.5. The built tree is made accessible in Python.
 
 :class:`C45Learner` and :class:`C45Classifier` behave
@@ -1381,6 +1364,6 @@
 =========================
 
-C4.5 is not distributed with Orange, but it can be incorporated as a
-plug-in. A C compiler is need for the procedure: on Windows MS Visual C
+C4.5 is not distributed with Orange, but it can be added as a
+plug-in. A C compiler is needed for the procedure: on Windows MS Visual C
 (CL.EXE and LINK.EXE must be on the PATH), on Linux and OS X gcc (OS X
 users can download it from Apple).
@@ -1473,24 +1456,18 @@
 .. _iris.tab: code/iris.tab
 
-The simplest way to use :class:`C45Learner` is to call it. This
+This
 script constructs the same learner as you would get by calling
-the usual C4.5 (`tree_c45.py`_, uses `iris.tab`_):
+the usual C4.5:
 
 .. literalinclude:: code/tree_c45.py
    :lines: 7-14
 
-Arguments can be set by the usual mechanism (the below to lines do the
-same, except that one uses command-line symbols and the other internal
-variable names)
-
-::
+Both C4.5 command-line symbols and variable names can be used. The
+following lines produce the same result::
 
     tree = Orange.classification.tree.C45Learner(data, m=100)
-    tree = Orange.classification.tree.C45Learner(data, minObjs=100)
-
-The way that could be prefered by veteran C4.5 user might be through
-method `:obj:C45Learner.commandline`.
-
-::
+    tree = Orange.classification.tree.C45Learner(data, min_objs=100)
+
+A veteran C4.5 might prefer :func:`C45Learner.commandline`::
 
     lrn = Orange.classification.tree.C45Learner()
@@ -1498,25 +1475,21 @@
     tree = lrn(data)
 
-There's nothing special about using :obj:`C45Classifier` - it's
-just like any other. To demonstrate what the structure of
-:class:`C45Node`'s looks like, will show a script that prints
-it out in the same format as C4.5 does.
+The following script prints out the tree same format as C4.5 does.
 
 .. literalinclude:: code/tree_c45_printtree.py
 
-Leaves are the simplest. We just print out the value contained
-in :samp:`node.leaf`. Since this is not a qualified value (ie.,
-:obj:`C45Node` does not know to which attribute it belongs), we need to
-convert it to a string through :obj:`class_var`, which is passed as an
-extra argument to the recursive part of printTree.
+For the leaves just the value in ``node.leaf`` in printed. Since
+:obj:`C45Node` does not know to which attribute it belongs, we need to
+convert it to a string through ``classvar``, which is passed as an extra
+argument to the recursive part of printTree.
 
 For discrete splits without subsetting, we print out all attribute values
-and recursively call the function for all branches. Continuous splits are
-equally easy to handle.
-
-For discrete splits with subsetting, we iterate through branches, retrieve
-the corresponding values that go into each branch to inset, turn
-the values into strings and print them out, separately treating the
-case when only a single value goes into the branch.
+and recursively call the function for all branches. Continuous splits
+are equally easy to handle.
+
+For discrete splits with subsetting, we iterate through branches,
+retrieve the corresponding values that go into each branch to inset,
+turn the values into strings and print them out, separately treating
+the case when only a single value goes into the branch.
 
 =================
@@ -1556,5 +1529,5 @@
 :obj:`SimpleTreeLearner` is used in much the same way as :obj:`TreeLearner`.
 A typical example of using :obj:`SimpleTreeLearner` would be to build a random
-forest (uses `iris.tab`_):
+forest:
 
 .. literalinclude:: code/simple_tree_random_forest.py
@@ -1636,5 +1609,5 @@
     nothing, you are running C4.5.
 
-    .. attribute:: gainRatio (g)
+    .. attribute:: gain_ratio (g)
 
         Determines whether to use information gain (false, default)
@@ -1651,9 +1624,9 @@
         Enables subsetting (default: false, no subsetting),
 
-    .. attribute:: probThresh (p)
+    .. attribute:: prob_thresh (p)
 
         Probabilistic threshold for continuous attributes (default: false).
 
-    .. attribute:: minObjs (m)
+    .. attribute:: min_objs (m)
 
         Minimal number of objects (examples) in leaves (default: 2).
@@ -1682,8 +1655,23 @@
     """
 
+    _rename_new_old = { "min_objs": "minObjs", "probTresh": "prob_tresh",
+            "gain_ratio": "gainRatio" }
+    #_rename_new_old = {}
+    _rename_old_new = dict((a,b) for b,a in _rename_new_old.items())
+
+    @classmethod
+    def _rename_dict(cls, dic):
+        return dict((cls._rename_arg(a),b) for a,b in dic.items())
+
+    @classmethod
+    def _rename_arg(cls, a):
+        if a in cls._rename_old_new:
+            Orange.misc.deprecation_warning(a, cls._rename_old_new[a], stacklevel=4)
+        return cls._rename_new_old.get(a, a)
+
     def __new__(cls, instances = None, weightID = 0, **argkw):
-        self = Orange.classification.Learner.__new__(cls, **argkw)
+        self = Orange.classification.Learner.__new__(cls, **cls._rename_dict(argkw))
         if instances:
-            self.__init__(**argkw)
+            self.__init__(**cls._rename_dict(argkw))
             return self.__call__(instances, weightID)
         else:
@@ -1691,20 +1679,24 @@
 
     def __init__(self, **kwargs):
-        self.base = _C45Learner(**kwargs)
+        self.base = _C45Learner(**self._rename_dict(kwargs))
 
     def __setattr__(self, name, value):
-        if name != "base" and name in self.base.__dict__:
-            self.base.__dict__[name] = value
+        nn = self._rename_arg(name)
+        if name != "base" and nn in self.base.__dict__:
+            self.base.__dict__[nn] = value
+        elif name == "base":
+            self.__dict__["base"] = value
         else:
-            self.__dict__["base"] = value
-
-    def __getattr(self, name):
+            settingAttributesNotSuccessful
+
+    def __getattr__(self, name):
+        nn = self._rename_arg(name)
         if name != "base":
-            return self.base.__dict__[name]
+            return self.base.__dict__[nn]
         else:
             return self.base
 
     def __call__(self, *args, **kwargs):
-        return C45Classifier(self.base(*args, **kwargs))
+        return C45Classifier(self.base(*args, **self._rename_dict(kwargs)))
 
     def commandline(self, ln):
@@ -1787,5 +1779,5 @@
 
 
-    The standalone C4.5 would, on the same data, print::
+    The standalone C4.5 would print::
 
         physician-fee-freeze = n: democrat (253.4/5.9)
@@ -1800,5 +1792,5 @@
         |   |   |   |   anti-satellite-test-ban = y: republican (2.2/1.0)
 
-    4.5 also prints out the number of errors on learning data in
+    C4.5 also prints out the number of errors on learning data in
     each node.
     """
--- orange/doc/Orange/rst/code/tree_c45.py (r8042)
+++ orange/doc/Orange/rst/code/tree_c45.py (r8999)
@@ -36,7 +36,7 @@
 print
 
-import orngStat, orngTest
-res = orngTest.crossValidation([Orange.classification.tree.C45Learner(),
+
+res = Orange.evaluation.testing.cross_validation([Orange.classification.tree.C45Learner(),
     Orange.classification.tree.C45Learner(convertToOrange=1)], data)
-print "Classification accuracy: %5.3f (converted to tree: %5.3f)" % tuple(orngStat.CA(res))
-print "Brier score: %5.3f (converted to tree: %5.3f)" % tuple(orngStat.BrierScore(res))
+print "Classification accuracy: %5.3f (converted to tree: %5.3f)" % tuple(Orange.evaluation.scoring.CA(res))
+print "Brier score: %5.3f (converted to tree: %5.3f)" % tuple(Orange.evaluation.scoring.Brier_score(res))
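The keyword renaming that this changeset adds to `C45Learner` in tree.py can be tried in isolation with the sketch below. It is an illustration under assumptions, not Orange's code: the module-level names `NEW_TO_OLD`, `OLD_TO_NEW`, `rename_arg` and `rename_dict` are hypothetical, the standard `warnings.warn` stands in for `Orange.misc.deprecation_warning` (whose exact behaviour is not shown in the changeset), and the threshold entry is spelled `prob_thresh`/`probThresh` as in the docstring, whereas the committed dictionary lists it as `"probTresh": "prob_tresh"`.

```python
import warnings

# Hypothetical stand-alone mapping: new underscore name -> old camelCase name
# that the wrapped C4.5 learner is assumed to expect.
NEW_TO_OLD = {
    "min_objs": "minObjs",
    "prob_thresh": "probThresh",   # assumption: spelling taken from the docstring
    "gain_ratio": "gainRatio",
}
# Inverse mapping, used only to detect deprecated spellings.
OLD_TO_NEW = dict((old, new) for new, old in NEW_TO_OLD.items())


def rename_arg(name):
    """Return the old-style name for `name`, warning if `name` is deprecated."""
    if name in OLD_TO_NEW:
        # Stand-in for Orange.misc.deprecation_warning (assumption).
        warnings.warn(
            "%s is deprecated; use %s" % (name, OLD_TO_NEW[name]),
            DeprecationWarning,
            stacklevel=2,
        )
    # New names are translated; everything else passes through unchanged.
    return NEW_TO_OLD.get(name, name)


def rename_dict(kwargs):
    """Apply rename_arg to every key of a keyword dictionary."""
    return dict((rename_arg(key), value) for key, value in kwargs.items())


translated = rename_dict({"min_objs": 100, "m": 5})
```

Here `translated` is `{"minObjs": 100, "m": 5}`: the new-style key is mapped to the name the base learner uses, and the command-line symbol `m` passes through untouched. Passing the old spelling `minObjs` still works but triggers a `DeprecationWarning`.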
