source: orange/Orange/doc/modules/orngLR.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 16.5 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>
<h1>orngLR: Orange&nbsp;Logistic Regression&nbsp;Module</h1>
<index name="modules/logistic regression">
<index name="classifiers+logistic regression">
<index name="classes/LogRegLearner (in orngLR)">

<P>Module orngLR is a set of wrappers around the classes LogisticLearner and
LogisticClassifier implemented in core Orange. The module extends logistic
regression to discrete attributes and helps handle various anomalies in
attributes, such as constant variables and singularities, that make fitting
logistic regression almost impossible. It also implements a function for
constructing stepwise logistic regression, which is a good technique for
preventing overfitting and serves as a feature subset selection
technique as well.<HR>

</P>

<H2>Functions</H2>
<DL>

<DT><B>LogRegLearner</B>([<EM>examples=None, weightID=0, removeSingular=0, fitter=None, stepwiseLR=0, addCrit=0.2, deleteCrit=0.3, numAttr=-1</EM>])
<DD class=ddfun>
Returns a LogisticClassifier if examples are given. If <code>examples</code> are
not specified, an instance of LogisticLearner with its parameters
appropriately initialized is returned. Parameter <code>weightID</code>
defines the ID of the weight meta attribute. Set <code>removeSingular</code> to 1 if you want automatic removal of disturbing attributes, such as constants and
singularities. Examples can contain discrete and continuous attributes.
Parameter <code>fitter</code> selects an alternative fitting algorithm; by default a Newton-Raphson fitting algorithm is used, but you can replace it with something else, for example <code>bayesianFitter</code> from orngLR. The last three parameters, <code>addCrit</code>, <code>deleteCrit</code> and <code>numAttr</code>, set the parameters for stepwise attribute selection (see next method). To use stepwise selection within LogRegLearner, <code>stepwiseLR</code> must be set to 1.
</dd>
<DT><B>stepWiseFSS</B>([<em>examples=None, addCrit=0.2, deleteCrit=0.3, numAttr=-1</em>])
<DD class=ddfun>
If <code>examples</code> are specified, stepwise logistic regression implemented in <code>stepWiseFSS_class</code> is performed and a list of chosen attributes is returned. If <code>examples</code> are not specified, an instance of <code>stepWiseFSS_class</code> with all parameters set is returned. Parameters <code>addCrit</code>, <code>deleteCrit</code> and <code>numAttr</code> are explained in the description of <code>stepWiseFSS_class</code>.
</dd>
<DT><B>bestNAtt</B>([<em>examples, N, addCrit=0.2, deleteCrit=0.3</em>])
<DD class=ddfun>
Returns "best" <code>N</code> attributes selected with stepwise logistic regression. Parameter <code>examples</code> is required.
</dd>

<DT><B>printOUT</B>([<em>classifier</em>])
<DD class=ddfun>
Prints a formatted summary of all major attributes of a logistic regression classifier to the console. Parameter <code>classifier</code> is a logistic regression classifier.
</dd>

</dl>
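<p>The Newton-Raphson fitting used by the default fitter can be illustrated without Orange at all. The sketch below (a minimal illustration under our own naming, not orngLR's actual implementation) fits a univariate logistic model with an intercept by repeatedly solving the 2x2 Newton system by hand:</p>

```python
import math

def fit_logreg_newton(xs, ys, iters=20):
    """Fit P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # gradient of the log-likelihood and the (negated) Hessian entries
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)          # weight of this example in the Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^-1 * gradient (2x2 solve by hand)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# overlapping (non-separable) toy data, so the maximum likelihood is finite
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logreg_newton(xs, ys)
```

<p>On non-separable data the iteration converges in a handful of steps; on perfectly separable data the coefficients diverge, which is one reason the real fitter must watch for anomalies in the attributes.</p>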

<h2>Classes</h2>


<dl>
<dt><b>stepWiseFSS_class</b>([<em>examples=None, addCrit=0.2, deleteCrit=0.3, numAttr=-1</em>])</dt>
<dd class="ddfun">Performs stepwise logistic regression and returns a list of the &quot;most&quot; informative attributes. Each step of the algorithm is composed of two parts. The first is backward elimination, in which every attribute already in the model is tested for a significant contribution to the overall model.
If the worst among all tested attributes has significance higher than <code>deleteCrit</code>, that attribute is removed from the model. The second part is forward selection, which works in the opposite direction: it loops through all attributes not in the model and tests whether they contribute to the model with significance lower than <code>addCrit</code>.
The algorithm stops when no attribute in the model should be removed and no attribute outside the model should be added. By setting <code>numAttr</code> larger than -1, the algorithm also stops once the number of attributes in the model exceeds that number.
<br>
Significances are assessed via the likelihood ratio chi-square test. The usual F test is not appropriate, because the errors are assumed to follow a binomial distribution.

</dd>

</dl>
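<p>Independently of Orange, the decision rule of one stepwise step can be sketched in plain Python. The helper names below are our own, and the sketch assumes the maximized log-likelihoods of the refitted candidate models are already available:</p>

```python
import math

def lr_test_pvalue(ll_full, ll_reduced):
    """P-value of the likelihood-ratio chi-square test (one degree of
    freedom) between two nested logistic models, given their maximized
    log-likelihoods."""
    g = 2.0 * (ll_full - ll_reduced)          # the G statistic
    # chi-square(1) survival function, expressed with the error function
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))

def stepwise_step(ll_model, ll_without, ll_with, addCrit=0.2, deleteCrit=0.3):
    """One step of the stepwise procedure described above, driven by
    precomputed log-likelihoods: ll_without[a] is the current model
    refitted with attribute a removed, ll_with[a] the model refitted
    with a added.  Returns ('remove', a), ('add', a) or ('stop', None)."""
    # backward elimination: test the attribute whose removal costs the least
    if ll_without:
        worst = max(ll_without, key=ll_without.get)
        if lr_test_pvalue(ll_model, ll_without[worst]) > deleteCrit:
            return ('remove', worst)
    # forward selection: test the attribute whose addition gains the most
    if ll_with:
        best = max(ll_with, key=ll_with.get)
        if lr_test_pvalue(ll_with[best], ll_model) < addCrit:
            return ('add', best)
    return ('stop', None)
```

<p>Repeating <code>stepwise_step</code> until it returns <code>('stop', None)</code>, refitting the candidate models after each change, reproduces the loop described above.</p>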

<H2>Examples</H2>

The first example shows a very simple induction of a logistic regression classifier.
<XMP class=code>import orange, orngLR
data = orange.ExampleTable("titanic")

lr = orngLR.LogRegLearner(data)

# compute classification accuracy
correct = 0.0
for ex in data:
    if lr(ex) == ex.getclass():
        correct += 1
print "Classification accuracy:", correct/len(data)
orngLR.printOUT(lr)
</XMP>

<p>Result:</p>

<XMP class=code>Classification accuracy: 0.778282598819

class attribute = survived
class values = <yes, no>

    Attribute       beta  st. error     wald Z          P

    Intercept       0.38       0.14       2.73       0.01
status=second       1.02       0.20       5.13       0.00
 status=third       1.78       0.17      10.24       0.00
  status=crew       0.86       0.16       5.39       0.00
    age=child      -1.06       0.25      -4.30       0.00
   sex=female      -2.42       0.14     -17.04       0.00
</XMP>
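<p>The wald Z and P columns of this printout are computed from beta and its standard error. They can be recomputed without Orange (the helper name below is our own; values recomputed from the rounded table entries differ slightly from the printed ones, e.g. 2.71 rather than 2.73 for the intercept):</p>

```python
import math

def wald_z_and_p(beta, se):
    """Wald Z statistic and its two-sided p-value for one coefficient,
    i.e. the last two columns of the printout above."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))
    return z, p

# recompute the Intercept row (beta = 0.38, st. error = 0.14)
z, p = wald_z_and_p(0.38, 0.14)
```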

The next example shows how to handle singularities in data sets.
<XMP class=code>import orange, orngLR
data = orange.ExampleTable("adult_sample.tab")
lr = orngLR.LogRegLearner(data, removeSingular = 1)

for ex in data[:5]:
    print ex.getclass(), lr(ex)
orngLR.printOUT(lr)
</XMP>

<p>Result:</p>

<XMP class=code>removing education=Preschool
removing education-num
removing workclass=Never-worked
removing native-country=Honduras
removing native-country=Thailand
removing native-country=Ecuador
removing native-country=Portugal
removing native-country=France
removing native-country=Yugoslavia
removing native-country=Trinadad&Tobago
removing native-country=Hong
removing native-country=Hungary
removing native-country=Holand-Netherlands
<=50K <=50K
<=50K <=50K
<=50K <=50K
>50K >50K
<=50K >50K

class attribute = y
class values = <>50K, <=50K>

                                Attribute       beta  st. error     wald Z          P

                                Intercept       1.39       0.82       1.71       0.09
                                      age      -0.04       0.01      -3.60       0.00
               workclass=Self-emp-not-inc      -0.10       0.37      -0.27       0.79
                   workclass=Self-emp-inc       0.49       0.50       0.97       0.33
                    workclass=Federal-gov      -1.38       0.52      -2.69       0.01
                      workclass=Local-gov      -0.12       0.41      -0.29       0.77
                      workclass=State-gov       0.30       0.47       0.63       0.53
                    workclass=Without-pay       2.50       2.55       0.98       0.33
                                   fnlwgt      -0.00       0.00      -0.66       0.51
                   education=Some-college       1.12       0.32       3.44       0.00
                           education=11th       1.51       0.75       2.02       0.04
                        education=HS-grad       1.16       0.31       3.76       0.00
                    education=Prof-school       0.03       1.18       0.03       0.98
                     education=Assoc-acdm       0.42       0.48       0.88       0.38
                      education=Assoc-voc       2.05       0.58       3.55       0.00
                            education=9th       2.67       0.99       2.71       0.01
                        education=7th-8th       2.04       0.77       2.64       0.01
                           education=12th       3.82       1.11       3.44       0.00
                        education=Masters      -0.14       0.41      -0.34       0.73
                        education=1st-4th       1.05       1.52       0.69       0.49
                           education=10th       3.00       0.91       3.29       0.00
                      education=Doctorate       0.03       0.80       0.04       0.97
                        education=5th-6th       0.33       1.28       0.26       0.80
                  marital-status=Divorced       3.71       1.09       3.39       0.00
             marital-status=Never-married       3.36       1.03       3.28       0.00
                 marital-status=Separated       3.21       1.22       2.64       0.01
                   marital-status=Widowed       3.66       1.20       3.04       0.00
     marital-status=Married-spouse-absent       5.28       1.72       3.07       0.00
         marital-status=Married-AF-spouse       4.93       5.06       0.97       0.33
                  occupation=Craft-repair       0.67       0.51       1.30       0.19
                 occupation=Other-service       1.97       0.64       3.06       0.00
                         occupation=Sales       0.43       0.53       0.81       0.42
               occupation=Exec-managerial       0.45       0.51       0.89       0.37
                occupation=Prof-specialty       0.54       0.52       1.03       0.30
             occupation=Handlers-cleaners       1.71       0.74       2.33       0.02
             occupation=Machine-op-inspct       1.15       0.62       1.84       0.07
                  occupation=Adm-clerical       0.67       0.53       1.27       0.20
               occupation=Farming-fishing       1.40       0.78       1.80       0.07
              occupation=Transport-moving       0.92       0.57       1.62       0.10
               occupation=Priv-house-serv       2.38       1.81       1.32       0.19
               occupation=Protective-serv       0.47       0.76       0.61       0.54
                  occupation=Armed-Forces       1.89       6.36       0.30       0.77
                   relationship=Own-child      -0.18       1.05      -0.17       0.87
                     relationship=Husband       1.21       0.51       2.37       0.02
               relationship=Not-in-family      -0.31       1.12      -0.28       0.78
              relationship=Other-relative      -1.00       1.23      -0.81       0.42
                   relationship=Unmarried      -0.47       1.17      -0.40       0.69
                  race=Asian-Pac-Islander      -0.66       0.90      -0.74       0.46
                  race=Amer-Indian-Eskimo       1.65       1.91       0.86       0.39
                               race=Other       2.67       1.53       1.75       0.08
                               race=Black       0.48       0.38       1.26       0.21
                                 sex=Male      -0.18       0.37      -0.49       0.62
                             capital-gain      -0.00       0.00      -6.74       0.00
                             capital-loss      -0.00       0.00      -2.96       0.00
                           hours-per-week      -0.04       0.01      -4.37       0.00
                      native-country=Cuba      -1.04       5.24      -0.20       0.84
                   native-country=Jamaica      -4.48       2.25      -1.99       0.05
                     native-country=India       1.03       1.42       0.73       0.47
                    native-country=Mexico       0.77       0.95       0.81       0.42
                     native-country=South       1.36       5.84       0.23       0.82
               native-country=Puerto-Rico       0.52       5.00       0.10       0.92
                   native-country=England       1.50       2.40       0.63       0.53
                    native-country=Canada      -0.68       1.41      -0.48       0.63
                   native-country=Germany      -0.61       0.91      -0.67       0.50
                      native-country=Iran       3.31       3.46       0.96       0.34
               native-country=Philippines       1.56       1.98       0.79       0.43
                     native-country=Italy      -2.10       1.40      -1.50       0.13
                    native-country=Poland       0.84       2.60       0.32       0.75
                  native-country=Columbia      -1.93       1.78      -1.08       0.28
                  native-country=Cambodia       1.19       5.52       0.21       0.83
                      native-country=Laos       3.40       3.34       1.02       0.31
                    native-country=Taiwan      -0.14       1.81      -0.08       0.94
                     native-country=Haiti      -3.22       2.07      -1.55       0.12
        native-country=Dominican-Republic      -3.90       7.85      -0.50       0.62
               native-country=El-Salvador       1.23       2.87       0.43       0.67
                 native-country=Guatemala      -1.17       5.41      -0.22       0.83
                     native-country=China       0.95       1.69       0.56       0.58
                     native-country=Japan      -2.31       3.31      -0.70       0.48
                      native-country=Peru      -0.40       4.39      -0.09       0.93
native-country=Outlying-US(Guam-USVI-etc)       0.95       4.78       0.20       0.84
                  native-country=Scotland       1.88       3.15       0.60       0.55
                    native-country=Greece      -0.12       5.00      -0.02       0.98
                 native-country=Nicaragua       1.26       2.80       0.45       0.65
                   native-country=Vietnam       2.33       2.53       0.92       0.36
                   native-country=Ireland      -1.00       1.45      -0.69       0.49
</XMP>

<p>If we set <code>removeSingular</code> to 0, inducing a logistic regression classifier returns an error:</p>

<XMP class=code>Traceback (most recent call last):
  File "C:\Python23\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Python23\Lib\site-packages\logreg1.py", line 4, in ?
    lr = Ndomain.LogisticLearner(data, removeSingular = 1)
  File "C:\Python23\Lib\site-packages\Ndomain.py", line 49, in LogisticLearner
    return lr(examples)
  File "C:\Python23\Lib\site-packages\Ndomain.py", line 68, in __call__
    lr = orange.LogisticLearner(nexamples, showSingularity = self.showSingularity)
KernelException: 'orange.LogisticFitterMinimization': singularity in education=Preschool
</XMP>
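<p>A singularity like the one reported here comes from a degenerate column of the design matrix. As a rough, Orange-independent sketch of what <code>removeSingular</code> guards against, the hypothetical helper below flags only the two simplest cases, constant columns and exact duplicates of earlier columns (full collinearity detection would need a rank test):</p>

```python
def removable_columns(rows):
    """Indices of columns that would cause singularities in the design
    matrix: constant columns and exact duplicates of earlier columns."""
    ncols = len(rows[0])
    cols = [tuple(r[i] for r in rows) for i in range(ncols)]
    drop = []
    seen = {}
    for i, col in enumerate(cols):
        if len(set(col)) == 1:       # constant column, e.g. an all-zero
            drop.append(i)           # indicator such as education=Preschool
        elif col in seen:            # exact duplicate -> linearly dependent
            drop.append(i)
        else:
            seen[col] = i
    return drop

# toy design matrix: column 1 is constant, column 3 duplicates column 0
rows = [(1, 0, 2, 1),
        (0, 0, 5, 0),
        (1, 0, 3, 1)]
```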

<p>We can see that the attribute <code>education=Preschool</code> is causing the singularity. We can either remove <code>Preschool</code> manually or leave it to the function <code>LogRegLearner</code> to remove it automatically. The last example shows how the use of stepwise logistic regression can help us achieve better classification.</p>

<XMP class=code>import orange, orngFSS, orngTest, orngStat, orngLR

def StepWiseFSS_Filter(examples = None, **kwds):
    """
        check function StepWiseFSS()
    """

    filter = apply(StepWiseFSS_Filter_class, (), kwds)
    if examples:
        return filter(examples)
    else:
        return filter


class StepWiseFSS_Filter_class:
    def __init__(self, addCrit=0.2, deleteCrit=0.3, numAttr = -1):
        self.addCrit = addCrit
        self.deleteCrit = deleteCrit
        self.numAttr = numAttr
    def __call__(self, examples):
        attr = orngLR.StepWiseFSS(examples, addCrit=self.addCrit, deleteCrit = self.deleteCrit, numAttr = self.numAttr)
        return examples.select(orange.Domain(attr, examples.domain.classVar))


data = orange.ExampleTable("ionosphere")

lr = orngLR.LogRegLearner(removeSingular=1)
learners = (orngLR.LogRegLearner(name='logistic', removeSingular=1),
            orngFSS.FilteredLearner(lr, filter=StepWiseFSS_Filter(addCrit=0.05, deleteCrit=0.9), name='filtered'))
results = orngTest.crossValidation(learners, data, storeClassifiers=1)

# output the results
print "Learner      CA"
for i in range(len(learners)):
  print "%-12s %5.3f" % (learners[i].name, orngStat.CA(results)[i])

# find out which attributes were retained by filtering

print "\nNumber of times attributes were used in cross-validation:"
attsUsed = {}
for i in range(10):
  for a in results.classifiers[i][1].atts():
    if a.name in attsUsed.keys(): attsUsed[a.name] += 1
    else: attsUsed[a.name] = 1
for k in attsUsed.keys():
  print "%2d x %s" % (attsUsed[k], k)
</XMP>

<p>Result:</p>

<XMP class=code>Learner      CA
logistic     0.835
filtered     0.846

Number of times attributes were used in cross-validation:
 1 x a20
 1 x a21
10 x a22
 7 x a23
 5 x a24
 2 x a25
10 x a26
10 x a27
 3 x a29
 3 x a17
 1 x a16
 4 x a12
 2 x a32
 7 x a15
10 x a14
10 x a31
 8 x a30
10 x a11
 1 x a10
 1 x a13
10 x a34
 1 x a18
10 x a3
10 x a5
 5 x a4
 3 x a7
 7 x a6
 7 x a9
10 x a8
</XMP>


<H2>References</H2>
<P>
David W. Hosmer, Stanley Lemeshow. Applied Logistic Regression, 2nd ed. Wiley, New York, 2000.
</P>


</body>
</HTML>