Changeset 11051:04009d17e84e in orange for docs/tutorial/rst/regression.rst
 Timestamp:
 01/06/13 00:27:59 (16 months ago)
 Branch:
 default
 File:
 1 edited
Legend:
 (no prefix) Unmodified
 + Added
 - Removed

docs/tutorial/rst/regression.rst
--- docs/tutorial/rst/regression.rst (r9385)
+++ docs/tutorial/rst/regression.rst (r11051)

-.. index:: regression
-
 Regression
 ==========

-At the time of writing of this part of tutorial, there were
-essentially two different learning methods for regression modelling:
-regression trees and instance-based learner (k-nearest neighbors). In
-this lesson, we will see that using regression is just like using
-classifiers, and evaluation techniques are not much different either.
+.. index:: regression
+
+From the interface point of view, regression methods in Orange are very similar to classification. Both intended for supervised data mining, they require class-labeled data. Just like in classification, regression is implemented with learners and regression models (regressors). Regression learners are objects that accept data and return regressors. Regression models are given data items to predict the value of continuous class:
+
+.. literalinclude:: code/regression.py
+
+
+Handful of Regressors
+---------------------

 .. index::
-   single: regression; regression trees
+   single: regression; tree

-Few simple regressors
----------------------
+Let us start with regression trees. Below is an example script that builds the tree from data on housing prices and prints out the tree in textual form:

-Let us start with regression trees. Below is an example script that builds
-the tree from :download:`housing.tab <code/housing.tab>` data set and prints
-out the tree in textual form (:download:`regression1.py <code/regression1.py>`)::
+.. literalinclude:: code/regression-tree.py
+   :lines: 3-

-   import orange, orngTree
+The script outputs the tree::

-   data = orange.ExampleTable("housing.tab")
-   rt = orngTree.TreeLearner(data, measure="retis", mForPruning=2, minExamples=20)
-   orngTree.printTxt(rt, leafStr="%V %I")
-
-Notice special setting for attribute evaluation measure! Following is
-the output of this script::
-
-   RM<6.941: 19.9 [19.333-20.534]
-   RM>=6.941
-   |    RM<7.437
-   |    |    CRIM>=7.393: 14.4 [10.172-18.628]
-   |    |    CRIM<7.393
-   |    |    |    DIS<1.886: 45.7 [37.124-54.176]
-   |    |    |    DIS>=1.886: 32.7 [31.656-33.841]
-   |    RM>=7.437
-   |    |    TAX<534.500: 45.9 [44.295-47.498]
-   |    |    TAX>=534.500: 21.9 [21.900-21.900]
+   RM<=6.941: 19.9
+   RM>6.941
+   |    RM<=7.437
+   |    |    CRIM>7.393: 14.4
+   |    |    CRIM<=7.393
+   |    |    |    DIS<=1.886: 45.7
+   |    |    |    DIS>1.886: 32.7
+   |    RM>7.437
+   |    |    TAX<=534.500: 45.9
+   |    |    TAX>534.500: 21.9
+
+Following is initialization of few other regressors and their prediction of the first five data instances in housing price data set:

 .. index::
-   single: regression; k nearest neighbours
+   single: regression; mars
+   single: regression; linear

-Predicting continues classes is just like predicting crisp ones. In
-this respect, the following script will be nothing new. It uses both
-regression trees and k-nearest neighbors, and also uses a majority
-learner which for regression simply returns an average value from
-learning data set (:download:`regression2.py <code/regression2.py>`)::
+.. literalinclude:: code/regression-other.py
+   :lines: 3-

-   import orange, orngTree, orngTest, orngStat
-
-   data = orange.ExampleTable("housing.tab")
-   selection = orange.MakeRandomIndices2(data, 0.5)
-   train_data = data.select(selection, 0)
-   test_data = data.select(selection, 1)
-
-   maj = orange.MajorityLearner(train_data)
-   maj.name = "default"
-
-   rt = orngTree.TreeLearner(train_data, measure="retis", mForPruning=2, minExamples=20)
-   rt.name = "reg. tree"
-
-   k = 5
-   knn = orange.kNNLearner(train_data, k=k)
-   knn.name = "kNN (k=%i)" % k
-
-   regressors = [maj, rt, knn]
-
-   print "\n%10s " % "original",
-   for r in regressors:
-       print "%10s " % r.name,
-   print
-
-   for i in range(10):
-       print "%10.1f " % test_data[i].getclass(),
-       for r in regressors:
-           print "%10.1f " % r(test_data[i]),
-       print
+Looks like the housing prices are not that hard to predict::

-The otput of this script is::
+   y    lin  mars tree
+   21.4 24.8 23.0 20.1
+   15.7 14.4 19.0 17.3
+   36.5 35.7 35.6 33.8

-     original    default  reg. tree  kNN (k=5)
-         24.0       50.0       25.0       24.6
-         21.6       50.0       25.0       22.0
-         34.7       50.0       35.4       26.6
-         28.7       50.0       25.0       36.2
-         27.1       50.0       21.7       18.9
-         15.0       50.0       21.7       18.9
-         18.9       50.0       21.7       18.9
-         18.2       50.0       21.7       21.0
-         17.5       50.0       21.7       16.6
-         20.2       50.0       21.7       23.1
+Cross Validation
+----------------

-.. index: mean squared error
+Just like for classification, the same evaluation module (``Orange.evaluation``) is available for regression. Its testing submodule includes procedures such as cross-validation, leave-one-out testing and similar, and functions in scoring submodule can assess the accuracy from the testing:

-Evaluation and scoring
-----------------------
+.. literalinclude:: code/regression-other.py
+   :lines: 3-

-For our third and last example for regression, let us see how we can
-use cross-validation testing and for a score function use
-(:download:`regression3.py <code/regression3.py>`, uses `housing.tab <code/housing.tab>`)::
+.. index:
+   single: regression; root mean squared error

-   import orange, orngTree, orngTest, orngStat
-
-   data = orange.ExampleTable("housing.tab")
-
-   maj = orange.MajorityLearner()
-   maj.name = "default"
-   rt = orngTree.TreeLearner(measure="retis", mForPruning=2, minExamples=20)
-   rt.name = "regression tree"
-   k = 5
-   knn = orange.kNNLearner(k=k)
-   knn.name = "kNN (k=%i)" % k
-   learners = [maj, rt, knn]
-
-   data = orange.ExampleTable("housing.tab")
-   results = orngTest.crossValidation(learners, data, folds=10)
-   mse = orngStat.MSE(results)
-
-   print "Learner MSE"
-   for i in range(len(learners)):
-       print "%-15s %5.3f" % (learners[i].name, mse[i])
+`MARS <http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines>`_ has the lowest root mean squared error::

-Again, compared to classification tasks, this is nothing new. The only
-news in the above script is a mean squared error evaluation function
-(``orngStat.MSE``). The scripts prints out the following report::
+   Learner  RMSE
+   lin      4.83
+   mars     3.84
+   tree     5.10

-   Learner         MSE
-   default         84.777
-   regression tree 40.096
-   kNN (k=5)       17.532
-
-Other scoring techniques are available to evaluate the success of
-regression. Script below uses a range of them, plus features a nice
-implementation where a list of scoring techniques is defined
-independetly from the code that reports on the results (part of
-:download:`regression4.py <code/regression4.py>`)::
-
-   lr = orngRegression.LinearRegressionLearner(name="lr")
-   rt = orngTree.TreeLearner(measure="retis", mForPruning=2,
-       minExamples=20, name="rt")
-   maj = orange.MajorityLearner(name="maj")
-   knn = orange.kNNLearner(k=10, name="knn")
-   learners = [maj, lr, rt, knn]
-
-   # evaluation and reporting of scores
-   results = orngTest.learnAndTestOnTestData(learners, train, test)
-   scores = [("MSE", orngStat.MSE),
-             ("RMSE", orngStat.RMSE),
-             ("MAE", orngStat.MAE),
-             ("RSE", orngStat.RSE),
-             ("RRSE", orngStat.RRSE),
-             ("RAE", orngStat.RAE),
-             ("R2", orngStat.R2)]
-
-   print "Learner  " + "".join(["%7s" % s[0] for s in scores])
-   for i in range(len(learners)):
-       print "%-8s " % learners[i].name + "".join(["%6.3f " % s[1](results)[i] for s in scores])
-
-Here, we used a number of different scores, including:
-
-* MSE - mean squared errror,
-* RMSE - root mean squared error,
-* MAE - mean absolute error,
-* RSE - relative squared error,
-* RRSE - root relative squared error,
-* RAE - relative absolute error, and
-* R2 - coefficient of determinatin, also referred to as R-squared.
-
-For precise definition of these measures, see :py:mod:`Orange.statistics`. Running
-the script above yields::
-
-   Learner    MSE   RMSE    MAE    RSE   RRSE    RAE     R2
-   maj     84.777  9.207  6.659  1.004  1.002  1.002 -0.004
-   lr      23.729  4.871  3.413  0.281  0.530  0.513  0.719
-   rt      40.096  6.332  4.569  0.475  0.689  0.687  0.525
-   knn     17.244  4.153  2.670  0.204  0.452  0.402  0.796
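The changeset swaps the hand-written evaluation scripts for ``Orange.evaluation``, but the scoring measures themselves (MSE, RMSE, MAE, RSE, RRSE, RAE, R2) are simple to state. The sketch below restates their standard definitions in plain Python, independent of Orange's implementation; the three data points are the ones quoted in the new tutorial text, so the printed scores are illustrative only and will not match the 10-fold cross-validated numbers in the file.

```python
from math import sqrt

def mse(y, p):
    # mean squared error
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    # root mean squared error
    return sqrt(mse(y, p))

def mae(y, p):
    # mean absolute error
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def rse(y, p):
    # relative squared error: squared error relative to always
    # predicting the mean of the true values
    m = sum(y) / len(y)
    return sum((a - b) ** 2 for a, b in zip(y, p)) / \
           sum((a - m) ** 2 for a in y)

def rrse(y, p):
    # root relative squared error
    return sqrt(rse(y, p))

def rae(y, p):
    # relative absolute error
    m = sum(y) / len(y)
    return sum(abs(a - b) for a, b in zip(y, p)) / \
           sum(abs(a - m) for a in y)

def r2(y, p):
    # coefficient of determination (R-squared); equals 1 - RSE
    return 1.0 - rse(y, p)

# the three (actual, tree-predicted) pairs quoted in the new tutorial text
y    = [21.4, 15.7, 36.5]
tree = [20.1, 17.3, 33.8]
for name, score in [("MSE", mse), ("RMSE", rmse), ("MAE", mae),
                    ("RSE", rse), ("RRSE", rrse), ("RAE", rae), ("R2", r2)]:
    print("%-5s %6.3f" % (name, score(y, tree)))
```

Note how R2 = 1 - RSE ties the columns of the removed score table together: the majority (mean) predictor there has RSE 1.004 and R2 -0.004, i.e. it is no better than always predicting the mean.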