# source:orange/Orange/doc/reference/LogisticLearner.htm@9671:a7b056375472

Revision 9671:a7b056375472, 6.9 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<html>
<body>
<index name="classifiers+logistic regression">
<h1>Logistic Regression Classifier and Learner</h1>
<P align=left>Logistic regression is a popular classification method that comes from statistics. The model is described by a linear combination of coefficients,</P>
<P align=center>F = beta_0 + beta_1*X_1 + beta_2*X_2 + ... + beta_k*X_k<BR></P>
<P>and the probability (p) of a class value is computed as:</P>
<P align=center>p = exp(F)/(1+exp(F))</P>

<P>The outcome variable (class) must be binary (dichotomous), and discrete attributes must be translated to continuous ones. While the Orange kernel provides the basic functionality, the module <A href="../modules/orngLR.htm">orngLR.py</A> covers the necessary adaptations and conversions.</P>
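<P>As a quick illustration of the two equations above, here is a minimal plain-Python sketch; the coefficient and attribute values are made up for this example:</P>

```python
import math

def logistic_probability(beta, x):
    # beta[0] is the intercept beta_0; beta[1:] pair with the attribute
    # values in x, giving F = beta_0 + beta_1*X_1 + ... + beta_k*X_k
    F = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # p = exp(F) / (1 + exp(F))
    return math.exp(F) / (1.0 + math.exp(F))

# With F == 0 the model is maximally uncertain: p == 0.5
p = logistic_probability([-1.0, 0.5, 2.0], [1.0, 0.25])
```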

<hr>

<H2>LogRegClassifier</H2>
<index name="classes/LogRegClassifier">

<P><CODE>LogRegClassifier</CODE> stores the estimated values of the regression coefficients and their significances, and uses them to predict classes and class probabilities using the equations described above.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>beta</DT>
<DD>Estimated regression coefficients.</DD>

<DT>beta_se</DT>
<DD>Estimated standard errors of the regression coefficients.</DD>

<DT>wald_Z</DT>
<DD>Wald Z statistics for the beta coefficients, computed as <CODE>beta</CODE>/<CODE>beta_se</CODE>.</DD>

<DT>P</DT>
<DD>List of P-values for the beta coefficients, that is, for each coefficient, the probability of observing a Wald statistic at least this large if the true coefficient were 0. The probability is computed from the squared Wald Z statistic, which follows a chi-square distribution.</DD>

<DT>likelihood</DT>
<DD>The probability of the sample (i.e., the learning examples) under the fitted model, as a function of the regression parameters.</DD>

<DT>fitStatus</DT>
<DD>Tells how the model fitting ended: either regularly (<CODE>LogRegFitter.OK</CODE>), or it was interrupted because one of the beta coefficients escaped towards infinity (<CODE>LogRegFitter.Infinity</CODE>) or because the values did not converge (<CODE>LogRegFitter.Divergence</CODE>). The value describes the classifier's "reliability"; the classifier itself is usable in any case.</DD>
</DL>
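<P>The relation between <CODE>beta</CODE>, <CODE>beta_se</CODE>, <CODE>wald_Z</CODE> and <CODE>P</CODE> can be sketched in plain Python; the tail probability of a chi-square distribution with one degree of freedom can be written with the complementary error function (the numbers below are made up for illustration):</P>

```python
import math

def wald_p_value(beta, beta_se):
    # wald_Z = beta / beta_se; its square is approximately chi-square
    # distributed with one degree of freedom, so the P-value is
    # P(chi2_1 >= z**2) == erfc(|z| / sqrt(2))
    z = beta / beta_se
    return math.erfc(abs(z) / math.sqrt(2.0))

p_value = wald_p_value(0.98, 0.50)   # z = 1.96, p is about 0.05
```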

<H2>LogRegLearner</H2>
<index name="classes/LogRegLearner">

<P><CODE>LogRegLearner</CODE> fits the beta coefficients and computes the related statistics by calling the specified <CODE>fitter</CODE>.</P>

<P class=section>Attributes</P>
<DL class=attributes>
<DT>fitter</DT>
<DD>An object that fits the beta coefficients and the corresponding standard errors from a set of data.</DD>
</DL>

<P class=section>Methods</P>
<DL class=attributes>
<DT>fitModel(examples[, weightID])</DT>
<DD>Fits the model by calling <CODE>fitter</CODE>. If fitting succeeds, it returns a <CODE>Classifier</CODE>; if not, it returns the offending attribute. You should therefore always check the type of the result, as follows.

<XMP class=code>c = learner.fitModel(examples)
if isinstance(c, orange.Variable):
    < remove the attribute c and try again >
else:
    < we have a classifier, life is beautiful >
</XMP>
</DD>
</DL>

<P>Like all learners, <CODE>LogRegLearner</CODE> also provides the usual call operator, to which you pass examples (and weights, if you have them) and which returns a classifier or raises an exception if it cannot. Use <CODE>fitModel</CODE> in code that iteratively removes problematic attributes until it gets a classifier; in fact, that is exactly what <a href="../modules/orngLR.htm"><CODE>orngLR</CODE></a> does.</P>
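<P>The iterative-removal pattern described above can be sketched with stand-in classes; the names <CODE>Variable</CODE>, <CODE>Classifier</CODE> and <CODE>fit_model</CODE> below are illustrative stubs, not the real Orange API:</P>

```python
class Variable:
    # stands in for an Orange attribute descriptor
    def __init__(self, name):
        self.name = name

class Classifier:
    # stands in for a successfully fitted classifier
    pass

def fit_model(attributes):
    # Pretend the attribute named "constant" makes the matrix singular,
    # so the fitter returns it instead of a classifier.
    for a in attributes:
        if a.name == "constant":
            return a
    return Classifier()

attributes = [Variable("a1"), Variable("constant"), Variable("a2")]
result = fit_model(attributes)
while isinstance(result, Variable):
    # remove the offending attribute and retry, as orngLR does
    attributes = [a for a in attributes if a is not result]
    result = fit_model(attributes)
```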

<H2><A name=fitters></A>Logistic Regression Fitters</H2>

<P>Fitters are objects that <CODE>LogRegLearner</CODE> uses to fit the model.</P>

<H3>LogRegFitter</H3>
<index name="classes/LogRegFitter">

<P><CODE>LogRegFitter</CODE> is the abstract base class for logistic regression fitters. It defines the form of the call operator and the constants denoting its (un)success:</P>

<P class=section>Constants</P>
<DL class=attributes>
<DT>OK</DT>
<DD>The fitter converged to the optimal fit.</DD>

<DT>Infinity</DT>
<DD>The fitter failed because one or more beta coefficients escaped towards infinity.</DD>

<DT>Divergence</DT>
<DD>The beta coefficients failed to converge, but none of them escaped towards infinity.</DD>

<DT>Constant</DT>
<DD>A constant attribute causes the matrix to be singular.</DD>

<DT>Singularity</DT>
<DD>The matrix is singular.</DD>
</DL>

<P class=section>Methods</P>
<DL class=attributes>
<DT>__call__(examples, weightID)</DT>
<DD>Performs the fitting. There are two possible outcomes: either the fitting finds a set of beta coefficients (although possibly with difficulties), or it fails altogether. The two cases return different results.

<DL class=attributes>
<DT>(status, beta, beta_se, likelihood)</DT>
<DD>The fitter managed to fit the model. The first element of the tuple, <CODE>status</CODE>, reports any problems that occurred; it can be <CODE>OK</CODE>, <CODE>Infinity</CODE> or <CODE>Divergence</CODE>. In the latter two cases the returned values may still be useful for making predictions, but it is recommended that you inspect the coefficients and their standard errors before deciding whether to use the model.</DD>

<DT>(status, attribute)</DT>
<DD>The fitter failed, and the returned <CODE>attribute</CODE> is responsible for the failure. The type of failure is reported in <CODE>status</CODE>, which can be either <CODE>Constant</CODE> or <CODE>Singularity</CODE>.</DD>
</DL>

<P style="margin-top:12pt">The proper way of calling the fitter is to expect and handle all the situations described. For instance, if <CODE>fitter</CODE> is an instance of some fitter and <CODE>examples</CODE> contains a set of suitable examples, a script should look like this:</P>

<XMP class=code>res = fitter(examples)
if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
   status, beta, beta_se, likelihood = res
   < proceed by doing something with what you got >
else:
   status, attr = res
   < remove the attribute or complain to the user or ... >
</XMP>
</DD>
</DL>

<H3>LogRegFitter_Cholesky</H3>
<index name="classes/LogRegFitter_Cholesky">

<P><CODE>LogRegFitter_Cholesky</CODE> is the only fitter available at the moment. It is a C++ translation of <A href="http://users.bigpond.net.au/amiller/">Alan Miller's logistic regression code</A>. It uses the Newton-Raphson algorithm, iteratively solving a weighted least-squares problem at each step to maximize the likelihood of the model on the learning examples.</P>
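<P>As a rough, self-contained illustration of the Newton-Raphson idea (not Miller's actual code), the sketch below fits a one-attribute logistic model in pure Python; each iteration solves a small 2x2 weighted system built from the current predictions:</P>

```python
import math

def fit_logistic_1d(xs, ys, iters=25):
    # Newton-Raphson for p = 1 / (1 + exp(-(b0 + b1*x)))
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)              # weight of this example
            g0 += y - p                    # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                       # entries of the (negative) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break                          # singular system, cf. Singularity
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Non-separable toy data: larger x makes class 1 more likely
b0, b1 = fit_logistic_1d([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
```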

<hr>

<H2>Examples</H2>

<p>Since basic logistic regression allows only continuous attributes and a dichotomous class, we show only a very basic example. More detailed use of logistic regression is shown in the <A href="../modules/orngLR.htm">logistic regression module</A>.</p>

<p>Let us load the data, induce a classifier and see how it performs on the first five examples.</p>

<xmp class="code">>>> data = orange.ExampleTable("ionosphere")
>>> logistic = orange.LogRegLearner(data)
>>>
>>> for ex in data[:5]:
...    print ex.getclass(), logistic(ex)
g g
b b
g g
b b
g g
</xmp>

</body>
</html>