.. automodule:: Orange.classification.logreg

.. index: logistic regression
.. index:
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression
<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical
classification method that fits data to a logistic function. Orange
provides various enhancements of the method, such as stepwise selection
of variables and handling of constant variables and singularities.
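
The learner is used like any other Orange learner: calling it with a
data table returns a fitted classifier. The short sketch below assumes
the bundled ``titanic`` data set, which is also used in the examples at
the end of this page::

    import Orange

    # load a data set with a discrete class
    titanic = Orange.data.Table("titanic")

    # induce a logistic regression model and classify the first instance
    classifier = Orange.classification.logreg.LogRegLearner(titanic)
    print(classifier(titanic[0]))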

.. autoclass:: LogRegLearner
   :members:

.. class :: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    regression coefficients and their significances, and uses them to predict
    classes and class probabilities.

    .. attribute :: beta

        Estimated regression coefficients.

    .. attribute :: beta_se

        Estimated standard errors for regression coefficients.

    .. attribute :: wald_Z

        Wald Z statistics for beta coefficients. Wald Z is computed
        as beta/beta_se.

    .. attribute :: P

        List of P-values for the beta coefficients, that is, the
        probabilities that the beta coefficients differ from 0.0. The
        probabilities are computed from the squared Wald Z statistics,
        which follow a chi-squared distribution.

    .. attribute :: likelihood

        The likelihood of the sample (i.e. the learning data) given the
        fitted model.

    .. attribute :: fit_status

        Tells how the model fitting ended: either regularly
        (:obj:`LogRegFitter.OK`), or it was interrupted because one of
        the beta coefficients escaped towards infinity
        (:obj:`LogRegFitter.Infinity`) or because the values did not
        converge (:obj:`LogRegFitter.Divergence`).

        Although the model is functional in all cases, it is
        recommended to inspect the coefficients of the model
        if the fitting did not end normally.

    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
              :class:`~Orange.classification.Classifier.GetProbabilities` or
              :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both


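As an illustration (a sketch only, reusing the ``titanic`` data from
above), the estimated coefficients and their P-values can be read
directly from the classifier, ``GetBoth`` returns the predicted class
together with the class distribution, and :obj:`dump` (documented
below) is assumed here to return a readable summary of the model::

    import Orange

    titanic = Orange.data.Table("titanic")
    classifier = Orange.classification.logreg.LogRegLearner(titanic)

    # regression coefficients and the corresponding P-values
    for beta, p in zip(classifier.beta, classifier.P):
        print("beta=%6.2f  P=%5.3f" % (beta, p))

    # predicted class together with the class probabilities
    value, dist = classifier(titanic[0],
                             Orange.classification.Classifier.GetBoth)
    print("%s %s" % (value, dist))

    # human-readable summary of the fitted model
    print(Orange.classification.logreg.dump(classifier))
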
.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic
    fitters. Fitters can be called with a data table and return a
    vector of coefficients and the corresponding statistics, or a
    status signifying an error. The possible statuses are:

    .. attribute:: OK

        Optimization converged.

    .. attribute:: Infinity

        Optimization failed due to one or more beta coefficients
        escaping towards infinity.

    .. attribute:: Divergence

        Beta coefficients failed to converge, but without any of the
        beta coefficients escaping towards infinity.

    .. attribute:: Constant

        The data is singular due to a constant variable.

    .. attribute:: Singularity

        The data is singular.


    .. method:: __call__(data, weight_id)

        Fit the model and return a tuple with the fitted values and
        the corresponding statistics or an error indicator. The two
        cases differ by the tuple length and the status (the first
        tuple element).

        ``(status, beta, beta_se, likelihood)``
            Fitting succeeded. The first element, ``status``, is either
            :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In the
            latter two cases, the returned values may still be useful
            for making predictions, but it is recommended to inspect
            the coefficients and their errors and decide whether to use
            the model or not.

        ``(status, variable)``
            The fitter failed due to the indicated
            ``variable``. ``status`` is either :obj:`Constant` or
            :obj:`Singularity`.

        The proper way of calling the fitter is to handle both scenarios ::

            res = fitter(examples)
            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
                status, beta, beta_se, likelihood = res
                # proceed by doing something with what you got
            else:
                status, attr = res
                # remove the attribute or complain to the user or ...


.. class :: LogRegFitter_Cholesky

    The sole fitter available at the
    moment. This is a C++ translation of `Alan Miller's logistic regression
    code <http://users.bigpond.net.au/amiller/>`_ that uses the Newton-Raphson
    algorithm to iteratively minimize the least squares error computed from
    the training data.


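A fitter can also be constructed and used directly, or plugged into the
learner. The sketch below assumes a data table with a binary class and
continuous features (``ionosphere`` is used here) and assumes that the
learner accepts the fitter through its ``fitter`` attribute::

    import Orange

    data = Orange.data.Table("ionosphere")
    fitter = Orange.classification.logreg.LogRegFitter_Cholesky()

    res = fitter(data)
    if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
        status, beta, beta_se, likelihood = res
        print(likelihood)
    else:
        # the data is singular; res[1] is the offending variable
        status, variable = res
        print(variable.name)

    # the same fitter can be given to the learner explicitly
    learner = Orange.classification.logreg.LogRegLearner(fitter=fitter)
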
.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

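:obj:`StepWiseFSS` can also be applied on its own to select a subset of
variables before learning. The snippet below is only a sketch: the
keyword names ``add_crit``, ``delete_crit`` and ``num_features``, and
the behaviour of returning the selected features when the object is
called with data, are assumptions::

    import Orange

    data = Orange.data.Table("ionosphere")

    # hypothetical thresholds for adding and removing variables
    fss = Orange.classification.logreg.StepWiseFSS(add_crit=0.05,
                                                   delete_crit=0.9,
                                                   num_features=10)
    selected = fss(data)
    print([attr.name for attr in selected])
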
.. autofunction:: dump

.. autoclass:: LibLinearLogRegLearner
   :members:


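:obj:`LibLinearLogRegLearner` is used in the same way as
:obj:`LogRegLearner`. The comparison below is a sketch only: both
learners are constructed with their default parameters, and the
standard helpers from :obj:`Orange.evaluation` are assumed::

    import Orange

    data = Orange.data.Table("titanic")
    logreg = Orange.classification.logreg.LogRegLearner()
    liblinear = Orange.classification.logreg.LibLinearLogRegLearner()

    # 10-fold cross-validation, scored by classification accuracy
    results = Orange.evaluation.testing.cross_validation([logreg, liblinear], data)
    for name, ca in zip(["logreg", "liblinear"], Orange.evaluation.scoring.CA(results)):
        print("%-10s %.3f" % (name, ca))
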
Examples
--------

The first example shows a straightforward use of logistic regression
(:download:`logreg-run.py <code/logreg-run.py>`).

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute       beta  st. error     wald Z          P OR=exp(beta)

        Intercept      -1.23       0.08     -15.15      -0.00
     status=first       0.86       0.16       5.39       0.00       2.36
    status=second      -0.16       0.18      -0.91       0.36       0.85
     status=third      -0.92       0.15      -6.12       0.00       0.40
        age=child       1.06       0.25       4.30       0.00       2.89
       sex=female       2.42       0.14      17.04       0.00      11.25

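The last column is the odds ratio, exp(beta): for instance,
exp(0.86) is approximately 2.36 for ``status=first``, so first-class
passengers have about 2.4 times the baseline odds of survival.
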
The next example shows how to handle singularities in data sets
(:download:`logreg-singularities.py <code/logreg-singularities.py>`).

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute       beta  st. error     wald Z          P OR=exp(beta)

                               Intercept       6.62      -0.00       -inf       0.00
                                     age      -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00
                           education-num      -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62
            marital-status=Never-married       3.79       0.00        inf       0.00      44.45
                marital-status=Separated       3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier raises an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

The variable that causes the singularity is ``workclass``.

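Apart from letting the learner remove such variables via
:obj:`remove_singular`, the offending variable can also be excluded by
hand before learning. A sketch, assuming the bundled ``adult_sample``
data set::

    import Orange

    table = Orange.data.Table("adult_sample")

    # build a new domain without the variable that caused the singularity
    features = [attr for attr in table.domain.features if attr.name != "workclass"]
    domain = Orange.data.Domain(features, table.domain.class_var)
    reduced = Orange.data.Table(domain, table)

    lr = Orange.classification.logreg.LogRegLearner(reduced, remove_singular=0)
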
The example below shows how stepwise logistic regression can improve
classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8

..