# source:orange/docs/reference/rst/Orange.classification.logreg.rst@9818:2ec8ecdb81e5

Revision 9818:2ec8ecdb81e5, 9.1 KB checked in by Matija Polajnar <matija.polajnar@…>, 2 years ago

Finish the logreg refactoring, along with documentation improvement.

.. automodule:: Orange.classification.logreg

.. index: logistic regression
.. index:
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_
is a statistical classification method that fits data to a logistic
function. Orange's implementation of the algorithm can handle various
anomalies in features, such as constant variables and singularities,
that could make direct fitting of logistic regression almost impossible.
Stepwise logistic regression, which iteratively selects the most
informative features, is also supported.
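The logistic function mentioned above is what turns a weighted sum of feature
values into a class probability. A minimal sketch in plain Python (the
coefficient values are made up for illustration; this is not Orange's API):

```python
import math

def logistic(z):
    """The logistic (sigmoid) function maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(beta, instance):
    """Class probability of one instance under a logistic model.

    beta[0] is the intercept; beta[1:] are the feature coefficients.
    """
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], instance))
    return logistic(z)

# A hypothetical model with intercept -1.0 and coefficients 0.8 and 2.4:
p = predict_probability([-1.0, 0.8, 2.4], [1.0, 1.0])
```

Fitting logistic regression means finding the coefficient vector that makes
these probabilities agree best with the observed classes.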

.. autoclass:: LogRegLearner
   :members:

.. class:: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    the regression coefficients and their significances, and uses them to
    predict classes and class probabilities.

    .. attribute:: beta

        Estimated regression coefficients.

    .. attribute:: beta_se

        Estimated standard errors of the regression coefficients.

    .. attribute:: wald_Z

        Wald Z statistics for the beta coefficients, computed as
        beta/beta_se.

    .. attribute:: P

        P-values of the beta coefficients, that is, the probability of
        observing an estimate at least this far from zero if the true
        coefficient were zero. Each P-value is computed from the squared
        Wald Z statistic, which follows a chi-square distribution.

    .. attribute:: likelihood

        The probability of the sample (i.e. the learning examples) under
        the fitted model, as a function of the regression parameters.

    .. attribute:: fit_status

        Tells how the model fitting ended: either regularly
        (:obj:`LogRegFitter.OK`), or it was interrupted because a beta
        coefficient escaped towards infinity (:obj:`LogRegFitter.Infinity`),
        or because the values did not converge (:obj:`LogRegFitter.Divergence`).
        The value indicates the classifier's reliability; the classifier
        itself is usable in either case.

    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
              :class:`~Orange.classification.Classifier.GetProbabilities` or
              :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both

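The relation between the attributes above can be reproduced with plain math:
dividing a coefficient by its standard error gives the Wald Z statistic, and
since the squared statistic follows a chi-square distribution with one degree
of freedom, the two-sided P-value reduces to the complementary error function.
A sketch (not Orange's internal code; the numbers come from the first example's
output table further below):

```python
import math

def wald_z(beta, beta_se):
    """Wald Z statistic: a coefficient divided by its standard error."""
    return beta / beta_se

def p_value(z):
    """Two-sided P-value for a Wald Z statistic.

    z**2 follows a chi-square distribution with 1 degree of freedom,
    so the P-value equals erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative numbers from the status=first row: beta=0.86, st. error=0.16.
z = wald_z(0.86, 0.16)
p = p_value(z)   # a very small P: the coefficient is significant
```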

.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic regression
    fitters. It defines the form of the call operator and the constants
    denoting its (un)success:

    .. attribute:: OK

        The fitter converged to the optimal fit.

    .. attribute:: Infinity

        The fitter failed because one or more beta coefficients escaped
        towards infinity.

    .. attribute:: Divergence

        The beta coefficients failed to converge, but none of them escaped
        towards infinity.

    .. attribute:: Constant

        A constant attribute is causing the matrix to be singular.

    .. attribute:: Singularity

        The matrix is singular.


    .. method:: __call__(examples, weight_id)

        Performs the fitting. There are two possible outcomes: either
        the fitting found a set of beta coefficients (although possibly
        with difficulties), or it failed altogether. The two cases
        return different results.

        `(status, beta, beta_se, likelihood)`
            The fitter managed to fit the model. The first element of
            the tuple, status, reports any problems that occurred; it can
            be :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In the
            latter two cases the returned values may still be useful for
            making predictions, but it is recommended to inspect the
            coefficients and their errors before deciding whether to
            use the model.

        `(status, attribute)`
            The fitter failed and the returned attribute is responsible
            for the failure. The type of failure is reported in status,
            which can be either :obj:`Constant` or :obj:`Singularity`.

        The proper way of calling the fitter is to expect and handle both
        situations. For instance, if ``fitter`` is a fitter instance and
        ``examples`` is a suitable set of examples, the call should look
        like this::

            res = fitter(examples)
            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
               status, beta, beta_se, likelihood = res
               # proceed by doing something with what you got
            else:
               status, attr = res
               # remove the attribute, complain to the user, ...


.. class:: LogRegFitter_Cholesky

    Currently the only fitter available. It is a C++ translation of `Alan
    Miller's logistic regression code <http://users.bigpond.net.au/amiller/>`_,
    which uses the Newton-Raphson algorithm to iteratively minimize the
    least squares error computed from the learning examples.
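As a rough illustration of what such a fitter does, here is a bare-bones
Newton-Raphson fit of a one-feature logistic model in pure Python. This is a
didactic sketch under simplifying assumptions, not the C++ implementation: it
maximizes the log-likelihood directly, caps the number of iterations rather
than diagnosing divergence, and handles only a single feature plus intercept.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit p(y=1|x) = 1/(1+exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns the coefficient pair (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient and Hessian of the log-likelihood at the current point.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 Newton system H * step = g by hand.
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:   # singular Hessian: give up
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Non-separable toy data: larger x makes y=1 more likely.
b0, b1 = fit_logistic([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 1, 0, 1, 1])
```

On perfectly separable data the coefficients would grow without bound, which
is exactly the situation the :obj:`LogRegFitter.Infinity` status reports.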


.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

.. autofunction:: dump



Examples
--------

The first example shows a very simple induction of a logistic regression
classifier.

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute       beta  st. error     wald Z          P OR=exp(beta)

        Intercept      -1.23       0.08     -15.15      -0.00
     status=first       0.86       0.16       5.39       0.00       2.36
    status=second      -0.16       0.18      -0.91       0.36       0.85
     status=third      -0.92       0.15      -6.12       0.00       0.40
        age=child       1.06       0.25       4.30       0.00       2.89
       sex=female       2.42       0.14      17.04       0.00      11.25

The next example shows how to handle singularities in data sets.

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute       beta  st. error     wald Z          P OR=exp(beta)

                               Intercept       6.62      -0.00       -inf       0.00
                                     age      -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00
                           education-num      -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62
            marital-status=Never-married       3.79       0.00        inf       0.00      44.45
                marital-status=Separated       3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier raises an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

We can see that the attribute workclass is causing a singularity.

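A common source of such singularities is a column that holds a single value
for every learning example, so its indicator column in the design matrix is
constant. A pre-fitting check for such columns can be sketched in plain
Python (the helper name and the row layout are illustrative, not part of
Orange; the learner's singularity handling does this for you):

```python
def constant_columns(rows):
    """Return indices of columns that hold a single value in every row.

    Such columns make the design matrix singular and should be removed
    before fitting.
    """
    if not rows:
        return []
    n = len(rows[0])
    return [j for j in range(n)
            if len(set(row[j] for row in rows)) == 1]

# Column 1 is constant ("Never-worked" for everyone); columns 0 and 2 vary.
rows = [
    (25, "Never-worked", "<=50K"),
    (38, "Never-worked", ">50K"),
    (52, "Never-worked", "<=50K"),
]
bad = constant_columns(rows)   # [1]
```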
The example below shows how the use of stepwise logistic regression can help
to improve classification accuracy.

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8
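The stepwise idea behind :obj:`StepWiseFSS` can be sketched as a greedy loop
that keeps adding whichever attribute most improves a scoring criterion and
stops when no candidate helps. This is a simplified sketch with a made-up
scoring function; the real implementation also considers removing attributes
and uses significance thresholds rather than raw score gains.

```python
def forward_selection(attributes, score):
    """Greedy forward feature selection.

    `score(subset)` evaluates a candidate attribute subset (higher is
    better); attributes are added one at a time while the score improves.
    """
    chosen = []
    best = score(chosen)
    while True:
        candidates = [a for a in attributes if a not in chosen]
        if not candidates:
            break
        # Score every one-attribute extension of the current subset.
        gains = [(score(chosen + [a]), a) for a in candidates]
        top_score, top_attr = max(gains)
        if top_score <= best:   # no candidate improves the score: stop
            break
        chosen.append(top_attr)
        best = top_score
    return chosen

# Toy criterion: a1 and a3 are informative, the rest only add noise.
useful = {"a1": 0.4, "a3": 0.2}
def score(subset):
    return sum(useful.get(a, -0.05) for a in subset)

selected = forward_selection(["a1", "a2", "a3", "a4"], score)   # ["a1", "a3"]
```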