source: orange/docs/reference/rst/Orange.classification.logreg.rst @ 11605:a35955141513

Revision 11605:a35955141513, 8.6 KB checked in by Ales Erjavec <ales.erjavec@…>, 10 months ago

Added some basic documentation for the LIBLINEAR based classifiers.

.. automodule:: Orange.classification.logreg

.. index: logistic regression
.. index:
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression
<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical
classification method that fits data to a logistic function. Orange
provides various enhancements of the method, such as stepwise selection
of variables and handling of constant variables and singularities.

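The model itself is simple: the class probability is obtained by passing a linear combination of the feature values through the logistic function. A minimal sketch in plain Python (the helper names and coefficients here are made up for illustration; this is not the Orange API):

```python
import math

def logistic(t):
    """The logistic (sigmoid) function, mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def predict_probability(beta, x):
    """P(class = 1 | x) for a fitted logistic regression model.

    ``beta[0]`` is the intercept; ``beta[1:]`` are the coefficients
    of the feature values in ``x``.
    """
    t = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return logistic(t)
```

A probability above 0.5 corresponds to ``t > 0``, so the decision boundary is the hyperplane where the linear combination equals zero.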
.. autoclass:: LogRegLearner
   :members:

.. class:: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    regression coefficients and their significances, and uses them to predict
    classes and class probabilities.

    .. attribute:: beta

        Estimated regression coefficients.

    .. attribute:: beta_se

        Estimated standard errors of the regression coefficients.

    .. attribute:: wald_Z

        Wald Z statistics for the beta coefficients, computed as
        beta/beta_se.

    .. attribute:: P

        P-values of the beta coefficients, that is, the probabilities
        that the observed coefficients differ from 0 merely by chance.
        Each is computed from the squared Wald Z statistic, which
        follows a chi-squared distribution.

    .. attribute:: likelihood

        The likelihood of the sample (i.e. the learning data) given the
        fitted model.

    .. attribute:: fit_status

        Tells how the model fitting ended: regularly
        (:obj:`LogRegFitter.OK`), interrupted because one of the beta
        coefficients escaped towards infinity
        (:obj:`LogRegFitter.Infinity`), or interrupted because the
        values did not converge (:obj:`LogRegFitter.Divergence`).

        Although the model is functional in all cases, it is
        recommended to inspect the coefficients of the model if the
        fitting did not end normally.

    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
              :class:`~Orange.classification.Classifier.GetProbabilities` or
              :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both


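The statistics stored in ``beta``, ``beta_se``, ``wald_Z`` and ``P`` are related by simple formulas that can be reproduced in plain Python. A sketch (not using Orange itself), relying on the fact that the squared Wald Z statistic is chi-squared distributed with one degree of freedom, whose survival function reduces to ``erfc(|z| / sqrt(2))``:

```python
import math

def wald_statistics(beta, beta_se):
    """Return (wald_Z, P) for one regression coefficient.

    The squared Wald Z statistic follows a chi-squared distribution
    with one degree of freedom; its survival function equals
    erfc(|z| / sqrt(2)).
    """
    z = beta / beta_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For instance, ``wald_statistics(1.96, 1.0)`` yields a P-value close to the familiar 0.05 threshold.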
.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic
    regression fitters. Fitters can be called with a data table and
    return a vector of coefficients and the corresponding statistics,
    or a status signifying an error. The possible statuses are:

    .. attribute:: OK

        Optimization converged.

    .. attribute:: Infinity

        Optimization failed due to one or more beta coefficients
        escaping towards infinity.

    .. attribute:: Divergence

        Beta coefficients failed to converge, but without any of the
        coefficients escaping towards infinity.

    .. attribute:: Constant

        The data is singular due to a constant variable.

    .. attribute:: Singularity

        The data is singular.


    .. method:: __call__(data, weight_id)

        Fit the model and return a tuple with the fitted values and
        the corresponding statistics, or an error indicator. The two
        cases differ in tuple length and in the status (the first
        tuple element).

        ``(status, beta, beta_se, likelihood)``
            Fitting succeeded. The first element, ``status``, is
            either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In
            the latter two cases, the returned values may still be
            useful for making predictions, but it is recommended to
            inspect the coefficients and their errors and decide
            whether to use the model or not.

        ``(status, variable)``
            The fitter failed due to the indicated
            ``variable``. ``status`` is either :obj:`Constant` or
            :obj:`Singularity`.

        The proper way to call the fitter is to handle both scenarios::

            res = fitter(examples)
            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
                status, beta, beta_se, likelihood = res
                # proceed by doing something with what you got
            else:
                status, attr = res
                # remove the attribute or complain to the user


.. class:: LogRegFitter_Cholesky

    The sole fitter available at the moment. This is a C++ translation
    of `Alan Miller's logistic regression code
    <http://users.bigpond.net.au/amiller/>`_ that uses the
    Newton-Raphson algorithm, iteratively minimizing a weighted least
    squares error computed from the training data.

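For illustration, the kind of iteration such a fitter performs can be sketched on a univariate model in plain Python. This is a deliberately minimal re-implementation, not the actual C++ fitter: it handles a single feature, solves the 2x2 Newton system directly rather than via a Cholesky decomposition, and does no singularity handling beyond a determinant check:

```python
import math

def sigmoid(t):
    """The logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def fit_logreg_newton(xs, ys, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson iteration.

    A toy sketch of the iteration performed by a logistic regression
    fitter; the real fitter handles many variables and reports
    statuses such as Infinity or Singularity instead of silently
    stopping.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Gradient of the log-likelihood and the (negated) Hessian.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)          # weight of this instance
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break                      # singular Hessian; give up
        # Newton step: (b0, b1) += H^-1 * g, with H inverted explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Note that on perfectly separable data the coefficients would grow without bound, which is exactly the situation the :obj:`LogRegFitter.Infinity` status reports.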

.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

.. autofunction:: dump


.. autoclass:: LibLinearLogRegLearner
   :members:
   :member-order: bysource


Examples
--------

The first example shows a straightforward use of logistic regression
(:download:`logreg-run.py <code/logreg-run.py>`).

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute       beta  st. error     wald Z          P OR=exp(beta)

        Intercept      -1.23       0.08     -15.15      -0.00
     status=first       0.86       0.16       5.39       0.00       2.36
    status=second      -0.16       0.18      -0.91       0.36       0.85
     status=third      -0.92       0.15      -6.12       0.00       0.40
        age=child       1.06       0.25       4.30       0.00       2.89
       sex=female       2.42       0.14      17.04       0.00      11.25

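The last column, the odds ratio, is simply the exponential of the corresponding coefficient. It can be reproduced directly (``odds_ratio`` is a hypothetical helper for illustration, not part of Orange):

```python
import math

def odds_ratio(beta):
    """OR = exp(beta): the multiplicative change in the odds of the
    class when the indicator variable switches from 0 to 1."""
    return math.exp(beta)
```

For example, ``round(odds_ratio(2.42), 2)`` reproduces the 11.25 shown in the ``sex=female`` row above.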
The next example shows how to handle singularities in data sets
(:download:`logreg-singularities.py <code/logreg-singularities.py>`).

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute       beta  st. error     wald Z          P OR=exp(beta)

                               Intercept       6.62      -0.00       -inf       0.00
                                     age      -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00
                           education-num      -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62
            marital-status=Never-married       3.79       0.00        inf       0.00      44.45
                marital-status=Separated       3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier raises an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

The variable that causes the singularity is ``workclass``.

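A singularity of the :obj:`Constant` kind can be anticipated before fitting by checking for variables that take a single value in the (sub)sample being fitted. A minimal sketch on plain lists of rows (``constant_columns`` is a hypothetical helper, not part of Orange):

```python
def constant_columns(rows):
    """Return the indices of columns that hold a single distinct value
    across all rows; such columns make the data matrix singular."""
    if not rows:
        return []
    return [j for j in range(len(rows[0]))
            if len({row[j] for row in rows}) == 1]
```

Here the middle column plays the role of a variable like ``workclass`` stuck at a single value.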
The example below shows how the use of stepwise logistic regression can
improve classification performance
(:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8

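Usage counts of this kind can be gathered with a standard ``Counter`` over the attributes chosen in each cross-validation fold. A sketch with made-up fold contents (the real script derives them from the stepwise-selected domains):

```python
from collections import Counter

# Hypothetical attribute selections from three cross-validation folds.
folds = [
    ["a3", "a5", "a22", "a26"],
    ["a3", "a22", "a27"],
    ["a3", "a5", "a9"],
]

# Count how many folds used each attribute.
usage = Counter(attr for fold in folds for attr in fold)
for attr, count in usage.most_common():
    print("%2d x %s" % (count, attr))
```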