# source:orange/Orange/doc/ofb/regression.htm@9671:a7b056375472

Revision 9671:a7b056375472, 6.8 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<body>

<p class="Path">
Prev: <a href="c_bagging.htm">Bagging</a>,
Next: <a href="assoc.htm">Association Rules</a>,
Up: <a href="default.htm">On Tutorial 'Orange for Beginners'</a>
</p>

<H1>Regression</H1>
<index name="regression">
<p>At the time this part of the tutorial was written, Orange offered
essentially two learning methods for regression modelling:
regression trees and an instance-based learner (k-nearest neighbors). In
this lesson, we will see that using regressors is just like using
classifiers, and that the evaluation techniques are not much different
either.</p>

<h2>A Few Simple Regressors</h2>

<p>Let us start with <index name="regression/regression trees"></index>regression trees. Below is an example script that builds
a tree from the <a href="housing.tab">housing</a> data set and prints
it out in textual form.</p>

<p class="header"><a href="regression1.py">regression1.py</a> (uses <a href=
"housing.tab">housing.tab</a>)</p>
<xmp class="code">import orange, orngTree

# load the housing data set
data = orange.ExampleTable("housing.tab")
# build a regression tree from the data
rt = orngTree.TreeLearner(data, measure="retis", mForPruning=2, minExamples=20)
# print the tree as text, using the leaf format "%V %I"
orngTree.printTxt(rt, leafStr="%V %I")
</xmp>

<p>Notice the special setting for the attribute evaluation measure (<code>measure="retis"</code>). The script prints the following:</p>

<xmp class="code">RM<6.941: 19.9 [19.333-20.534]
RM>=6.941
|    RM<7.437
|    |    CRIM>=7.393: 14.4 [10.172-18.628]
|    |    CRIM<7.393
|    |    |    DIS<1.886: 45.7 [37.124-54.176]
|    |    |    DIS>=1.886: 32.7 [31.656-33.841]
|    RM>=7.437
|    |    TAX<534.500: 45.9 [44.295-47.498]
|    |    TAX>=534.500: 21.9 [21.900-21.900]
</xmp>
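
<p>To make the printed tree easier to read, the sketch below transcribes it by hand into nested conditionals in plain Python. The function name <code>predict_from_printed_tree</code> and its lowercase arguments are hypothetical names chosen for illustration. This is only a reading aid for the output above, not how Orange evaluates the tree internally.</p>

```python
def predict_from_printed_tree(rm, crim, dis, tax):
    """Follow the branches of the printed tree and return the leaf value.

    Thresholds and leaf values are copied from the textual output above;
    the intervals printed in brackets are ignored here.
    """
    if rm < 6.941:
        return 19.9
    if rm < 7.437:
        if crim >= 7.393:
            return 14.4
        if dis < 1.886:
            return 45.7
        return 32.7
    # rm >= 7.437
    if tax < 534.5:
        return 45.9
    return 21.9


# hypothetical attribute values, chosen only to reach different leaves
print(predict_from_printed_tree(rm=6.0, crim=0.1, dis=4.0, tax=300.0))
print(predict_from_printed_tree(rm=7.0, crim=1.0, dis=1.5, tax=300.0))
```

<p>Each call follows exactly one path from the root to a leaf, which is also how the indentation of the printed tree should be read.</p>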

<index name="regression/k nearest neighbours">
<p>Predicting continuous classes is just like predicting discrete ones. In
this respect, the following script brings nothing new. It uses both
regression trees and k-nearest neighbors, as well as a majority
learner, which for regression simply returns the average class value of
the learning data set.</p>

<p class="header"><a href="regression2.py">regression2.py</a> (uses <a href=
"housing.tab">housing.tab</a>)</p>
<xmp class="code">import orange, orngTree, orngTest, orngStat

# split the data in half: one part for learning, the other for testing
data = orange.ExampleTable("housing.tab")
selection = orange.MakeRandomIndices2(data, 0.5)
train_data = data.select(selection, 0)
test_data = data.select(selection, 1)

# baseline that ignores the attributes
maj = orange.MajorityLearner(train_data)
maj.name = "default"

rt = orngTree.TreeLearner(train_data, measure="retis", mForPruning=2, minExamples=20)
rt.name = "reg. tree"

k = 5
knn = orange.kNNLearner(train_data, k=k)
knn.name = "k-NN (k=%i)" % k

regressors = [maj, rt, knn]

# header row: true value followed by one column per regressor
print "\n%10s " % "original",
for r in regressors:
  print "%10s " % r.name,
print

# true and predicted values for the first ten test examples
for i in range(10):
  print "%10.1f " % test_data[i].getclass(),
  for r in regressors:
    print "%10.1f " % r(test_data[i]),
  print
</xmp>

<p>The script prints the following:</p>
<xmp class="code">  original     default   reg. tree  k-NN (k=5)
      24.0        50.0        25.0        24.6
      21.6        50.0        25.0        22.0
      34.7        50.0        35.4        26.6
      28.7        50.0        25.0        36.2
      27.1        50.0        21.7        18.9
      15.0        50.0        21.7        18.9
      18.9        50.0        21.7        18.9
      18.2        50.0        21.7        21.0
      17.5        50.0        21.7        16.6
      20.2        50.0        21.7        23.1
</xmp>
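
<p>To illustrate what a mean baseline and a k-nearest neighbors regressor compute, here is a minimal sketch in plain Python that does not use Orange. The helper names <code>mean_regressor</code> and <code>knn_regressor</code>, the toy data, the Euclidean distance and the unweighted averaging are all assumptions made for illustration. Orange's own learners may normalize attributes or weight neighbors differently.</p>

```python
import math


def mean_regressor(train_targets):
    """Return a predictor that ignores its input and returns the training mean."""
    mean = sum(train_targets) / float(len(train_targets))
    return lambda x: mean


def knn_regressor(train_points, train_targets, k):
    """Return a predictor that averages the targets of the k nearest points."""
    def predict(x):
        # pair every training target with its Euclidean distance to x
        by_distance = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x))), t)
            for p, t in zip(train_points, train_targets)
        )
        nearest = by_distance[:k]
        return sum(t for _, t in nearest) / float(len(nearest))
    return predict


# toy one-attribute data set (hypothetical values)
points = [(1.0,), (2.0,), (3.0,), (10.0,)]
targets = [10.0, 20.0, 30.0, 100.0]

default = mean_regressor(targets)
knn = knn_regressor(points, targets, k=2)

print(default((2.4,)))  # mean of all targets
print(knn((2.4,)))      # mean of the targets at 2.0 and 3.0
```

<p>The baseline returns the same number for every example, which is why a constant column is what one expects from a default regressor, while the k-NN prediction changes with the query point.</p>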

<h2>Evaluation and Scoring</h2>

<p>For our third example, let us see how to use cross-validation
for testing and <index>mean squared error</index> as the score function.</p>

<p class="header"><a href="regression3.py">regression3.py</a> (uses <a href=
"housing.tab">housing.tab</a>)</p>
<xmp class="code">import orange, orngTree, orngTest, orngStat

data = orange.ExampleTable("housing.tab")

# learners are constructed without data; cross-validation trains them on each fold
maj = orange.MajorityLearner()
maj.name = "default"
rt = orngTree.TreeLearner(measure="retis", mForPruning=2, minExamples=20)
rt.name = "regression tree"
k = 5
knn = orange.kNNLearner(k=k)
knn.name = "k-NN (k=%i)" % k
learners = [maj, rt, knn]

# 10-fold cross-validation, scored with mean squared error
results = orngTest.crossValidation(learners, data, folds=10)
mse = orngStat.MSE(results)

print "Learner        MSE"
for i in range(len(learners)):
  print "%-15s %5.3f" % (learners[i].name, mse[i])
</xmp>

<p>Again, compared to classification tasks, this is nothing new. The
only novelty in the above script is the mean squared error evaluation
function (<code>orngStat.MSE</code>). The script prints out the
following report:</p>

<xmp class="code">Learner        MSE
default         84.777
regression tree 40.096
k-NN (k=5)      17.532
</xmp>
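
<p>The idea behind cross-validated scoring can be sketched in plain Python without Orange. The names <code>cross_validated_mse</code> and <code>fit_mean</code>, the toy data and the deterministic fold assignment (index modulo the number of folds) are assumptions made for illustration. Orange assigns folds differently, and this sketch pools all out-of-fold predictions before computing the error, which may not match how <code>orngStat.MSE</code> aggregates.</p>

```python
def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / float(len(actual))


def cross_validated_mse(xs, ys, fit, folds):
    """Score a learner by k-fold cross-validation.

    fit(train_xs, train_ys) must return a predictor mapping x to a value.
    Every example is predicted exactly once, by a model that did not see it.
    """
    actual, predicted = [], []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            actual.append(ys[i])
            predicted.append(model(xs[i]))
    return mse(actual, predicted)


def fit_mean(train_xs, train_ys):
    """Baseline learner: always predict the mean of the training targets."""
    mean = sum(train_ys) / float(len(train_ys))
    return lambda x: mean


# toy data (hypothetical values)
xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_validated_mse(xs, ys, fit_mean, folds=3)
print("default %5.3f" % score)
```

<p>Any other learner can be scored the same way by passing a different <code>fit</code> function, which mirrors how the script above hands a list of learners to the testing routine.</p>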

<p>Other scoring techniques are available to evaluate the success of
regression. The script below uses a range of them, and also shows a neat
design in which the list of scoring techniques is defined
independently of the code that reports the results.</p>

<p class="header">part of <a href="regression4.py">regression4.py</a> (uses <a href=
"housing.tab">housing.tab</a>)</p>
<xmp class="code">lr = orngRegression.LinearRegressionLearner(name="lr")
rt = orngTree.TreeLearner(measure="retis", mForPruning=2,
                          minExamples=20, name="rt")
maj = orange.MajorityLearner(name="maj")
knn = orange.kNNLearner(k=10, name="knn")
learners = [maj, lr, rt, knn]

# evaluation and reporting of scores
# (train and test are prepared in the part of the script not shown here)
results = orngTest.learnAndTestOnTestData(learners, train, test)
scores = [("MSE", orngStat.MSE),
          ("RMSE", orngStat.RMSE),
          ("MAE", orngStat.MAE),
          ("RSE", orngStat.RSE),
          ("RRSE", orngStat.RRSE),
          ("RAE", orngStat.RAE),
          ("R2", orngStat.R2)]

print "Learner  " + "".join(["%-7s" % s[0] for s in scores])
for i in range(len(learners)):
    print "%-8s " % learners[i].name + "".join(["%6.3f " % s[1](results)[i] for s in scores])
</xmp>

<p>Here, we used a number of different scores, including:</p>
<ul>
  <li>MSE - mean squared error</li>
  <li>RMSE - root mean squared error</li>
  <li>MAE - mean absolute error</li>
  <li>RSE - relative squared error</li>
  <li>RRSE - root relative squared error</li>
  <li>RAE - relative absolute error</li>
  <li>R2 - coefficient of determination, also referred to as R-squared</li>
</ul>
<p>For precise definitions of these measures, see the <a href="../modules/orngStat.htm">orngStat documentation</a>. Running the script above yields:</p>

<xmp class="code">Learner  MSE    RMSE   MAE    RSE    RRSE   RAE    R2
maj      84.777  9.207  6.659  1.004  1.002  1.002 -0.004
lr       23.729  4.871  3.413  0.281  0.530  0.513  0.719
rt       40.096  6.332  4.569  0.475  0.689  0.687  0.525
knn      17.244  4.153  2.670  0.204  0.452  0.402  0.796
</xmp>
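
<p>For readers who want to see how these scores relate to each other, the sketch below computes them in plain Python using their common textbook definitions. It also follows the same pattern of keeping the list of scores separate from the reporting code. The function names and toy data are hypothetical, and the relative measures here use the mean of the actual test values as the reference predictor, which is an assumption. The orngStat documentation remains the authority on Orange's exact definitions.</p>

```python
import math


def _mean(values):
    return sum(values) / float(len(values))


def mse(a, p):
    return _mean([(x - y) ** 2 for x, y in zip(a, p)])


def rmse(a, p):
    return math.sqrt(mse(a, p))


def mae(a, p):
    return _mean([abs(x - y) for x, y in zip(a, p)])


def rse(a, p):
    # squared error relative to always predicting the mean of the actual values
    m = _mean(a)
    return sum((x - y) ** 2 for x, y in zip(a, p)) / sum((x - m) ** 2 for x in a)


def rrse(a, p):
    return math.sqrt(rse(a, p))


def rae(a, p):
    # absolute error relative to always predicting the mean of the actual values
    m = _mean(a)
    return sum(abs(x - y) for x, y in zip(a, p)) / sum(abs(x - m) for x in a)


def r2(a, p):
    return 1.0 - rse(a, p)


# the list of scores is defined independently of the reporting code
scores = [("MSE", mse), ("RMSE", rmse), ("MAE", mae),
          ("RSE", rse), ("RRSE", rrse), ("RAE", rae), ("R2", r2)]

# toy data (hypothetical values)
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]

report = dict((name, fn(actual, predicted)) for name, fn in scores)
print("".join(["%-7s" % name for name, _ in scores]))
print("".join(["%6.3f " % report[name] for name, _ in scores]))
```

<p>Under these definitions RRSE is the square root of RSE and R2 equals one minus RSE, which agrees with the rows of the table above. Adding another score only requires appending one pair to <code>scores</code>.</p>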

</body></html>