source: orange/docs/reference/rst/Orange.classification.logreg.rst @ 10346:c99dada8a093

.. automodule:: Orange.classification.logreg

.. index:: logistic regression
.. index::
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression
<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical
classification method that fits data to a logistic function. Orange
provides various enhancements of the method, such as stepwise selection
of variables and handling of constant variables and singularities.

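Logistic regression follows the usual Orange learner/classifier pattern:
constructing :obj:`LogRegLearner` without data gives a learner that can be
used, for instance, in cross-validation, while calling it with data returns a
fitted classifier. A minimal sketch, using the Titanic data set from the
examples below::

    import Orange

    titanic = Orange.data.Table("titanic")

    # constructing the learner without data returns the (untrained) learner
    learner = Orange.classification.logreg.LogRegLearner()

    # calling the learner with data returns a fitted classifier
    classifier = learner(titanic)
    print classifier(titanic[0])
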
.. autoclass:: LogRegLearner
   :members:

.. class:: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    regression coefficients and their significances, and uses them to predict
    classes and class probabilities.

    .. attribute:: beta

        Estimated regression coefficients.

    .. attribute:: beta_se

        Estimated standard errors for regression coefficients.

    .. attribute:: wald_Z

        Wald Z statistics for beta coefficients. Wald Z is computed
        as ``beta``/``beta_se``.

    .. attribute:: P

        List of P-values of the beta coefficients: the probability of
        observing a coefficient at least this different from zero if its
        true value were zero. The P-value is computed from the squared
        Wald Z statistic, which follows a chi-squared distribution.

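        As an illustration (not the module's own code; assumes SciPy is
        available), the P-value of a single coefficient could be computed
        from its Wald Z statistic as follows::

            from scipy.stats import chi2

            def wald_p_value(beta, beta_se):
                # the squared Wald Z statistic follows a chi-squared
                # distribution with one degree of freedom
                return chi2.sf((beta / beta_se) ** 2, 1)
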
    .. attribute:: likelihood

        The likelihood of the sample (i.e. the learning data) given the
        fitted model.

    .. attribute:: fit_status

        Tells how the model fitting ended: it either finished regularly
        (:obj:`LogRegFitter.OK`), or it was interrupted because one of the
        beta coefficients escaped towards infinity
        (:obj:`LogRegFitter.Infinity`) or because the values did not
        converge (:obj:`LogRegFitter.Divergence`).

        Although the model is functional in all cases, it is recommended
        to inspect the model's coefficients if the fitting did not end
        normally.

    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
              :class:`~Orange.classification.Classifier.GetProbabilities` or
              :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both


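For example, the coefficients and the predictions of a model fitted to the
Titanic data set can be inspected as follows (a minimal sketch; the data set
and the direct call of :obj:`LogRegLearner` with data are taken from the
examples below)::

    import Orange

    titanic = Orange.data.Table("titanic")
    classifier = Orange.classification.logreg.LogRegLearner(titanic)

    # estimated coefficients and their P-values
    print classifier.beta
    print classifier.P

    # predicted class and class probabilities for the first instance
    value, dist = classifier(titanic[0],
                             Orange.classification.Classifier.GetBoth)
    print value, dist
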
.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic
    fitters. Fitters can be called with a data table and return a
    vector of coefficients and the corresponding statistics, or a
    status signifying an error. The possible statuses are

    .. attribute:: OK

        Optimization converged.

    .. attribute:: Infinity

        Optimization failed due to one or more beta coefficients
        escaping towards infinity.

    .. attribute:: Divergence

        Beta coefficients failed to converge, but without any of the
        beta coefficients escaping towards infinity.

    .. attribute:: Constant

        The data is singular due to a constant variable.

    .. attribute:: Singularity

        The data is singular.


    .. method:: __call__(data, weight_id)

        Fit the model and return a tuple with the fitted values and
        the corresponding statistics or an error indicator. The two
        cases differ by the tuple length and the status (the first
        tuple element).

        ``(status, beta, beta_se, likelihood)``
            Fitting succeeded. The first element, ``status``, is either
            :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In the latter
            two cases, the returned values may still be useful for making
            predictions, but it is recommended to inspect the coefficients
            and their errors and decide whether to use the model or not.

        ``(status, variable)``
            The fitter failed due to the indicated
            ``variable``. ``status`` is either :obj:`Constant` or
            :obj:`Singularity`.

        The proper way of calling the fitter is to handle both scenarios::

            res = fitter(data, weight_id)
            if res[0] in (fitter.OK, fitter.Infinity, fitter.Divergence):
                status, beta, beta_se, likelihood = res
                # proceed by doing something with the fitted coefficients
                print "fitted coefficients:", beta
            else:
                status, attr = res
                # remove the offending variable or report the error
                print "fitting failed due to", attr.name

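The status constants can also be used to check how the fitting of an induced
classifier ended (a minimal sketch, reusing the Titanic data set from the
examples below)::

    import Orange

    data = Orange.data.Table("titanic")
    classifier = Orange.classification.logreg.LogRegLearner(data)

    # fit_status is one of OK, Infinity or Divergence
    if classifier.fit_status == Orange.classification.logreg.LogRegFitter.OK:
        print "fitting converged"
    else:
        print "fitting did not end normally; inspect the coefficients"
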

.. class:: LogRegFitter_Cholesky

    The sole fitter available at the moment. This is a C++ translation of
    `Alan Miller's logistic regression code
    <http://users.bigpond.net.au/amiller/>`_, which uses the Newton-Raphson
    algorithm to iteratively minimize the least squares error computed from
    the training data.


.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

.. autofunction:: dump



Examples
--------

The first example shows a straightforward use of logistic regression (:download:`logreg-run.py <code/logreg-run.py>`).

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute       beta  st. error     wald Z          P OR=exp(beta)

        Intercept      -1.23       0.08     -15.15      -0.00
     status=first       0.86       0.16       5.39       0.00       2.36
    status=second      -0.16       0.18      -0.91       0.36       0.85
     status=third      -0.92       0.15      -6.12       0.00       0.40
        age=child       1.06       0.25       4.30       0.00       2.89
       sex=female       2.42       0.14      17.04       0.00      11.25

The next example shows how to handle singularities in data sets
(:download:`logreg-singularities.py <code/logreg-singularities.py>`).

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute       beta  st. error     wald Z          P OR=exp(beta)

                               Intercept       6.62      -0.00       -inf       0.00
                                     age      -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00
                           education-num      -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62
            marital-status=Never-married       3.79       0.00        inf       0.00      44.45
                marital-status=Separated       3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier raises an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

The variable that causes the singularity is ``workclass``.

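One way to proceed is to remove the offending variable and induce the
classifier from the reduced domain. The following is only a sketch: the data
set name ``adult_sample`` is an assumption, and the snippet is not part of
:download:`logreg-singularities.py <code/logreg-singularities.py>`::

    import Orange

    data = Orange.data.Table("adult_sample")

    # build a domain without the variable that causes the singularity
    features = [attr for attr in data.domain.features if attr.name != "workclass"]
    domain = Orange.data.Domain(features + [data.domain.class_var])
    reduced = Orange.data.Table(domain, data)

    lr = Orange.classification.logreg.LogRegLearner(reduced, remove_singular=0)
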
The example below shows how stepwise logistic regression can improve
classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8
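
Stepwise selection can also be run on its own to obtain the list of chosen
features. This is only a sketch: it assumes that calling :obj:`StepWiseFSS`
with a data table returns the selected features, and that the ionosphere data
set is the one used in the script above::

    import Orange

    data = Orange.data.Table("ionosphere")

    # the features chosen by stepwise selection
    selected = Orange.classification.logreg.StepWiseFSS(data)
    print [attr.name for attr in selected]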