Timestamp:
02/06/12 20:00:44
Author:
Matija Polajnar <matija.polajnar@…>
Branch:
default
rebase_source:
50b865d3d6764767b1ca538019c5b08631aee272
Message:

Finish the logreg refactoring, along with documentation improvement.

File:
1 edited

  • docs/reference/rst/Orange.classification.logreg.rst

.. automodule:: Orange.classification.logreg

.. index:: logistic regression
.. index::
   single: classification; logistic regression

********************************
Logistic regression (``logreg``)
********************************

`Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_
is a statistical classification method that fits data to a logistic
function. Orange's implementation of the algorithm
can handle various anomalies in features, such as constant variables and
singularities, that could make direct fitting of logistic regression almost
impossible. Stepwise logistic regression, which iteratively selects the most
informative features, is also supported.
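
The logistic function mentioned above maps any real-valued linear combination of
feature values into the interval (0, 1), which is what allows the model's output
to be read as a class probability. A minimal sketch in plain Python (for
intuition only; this is not Orange's API)::

```python
import math

def logistic(z):
    """The logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(beta0, betas, features):
    """P(class = 1) for a logistic regression model: the logistic
    function applied to a linear combination of feature values."""
    z = beta0 + sum(b * x for b, x in zip(betas, features))
    return logistic(z)
```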

.. autoclass:: LogRegLearner
   :members:

.. class:: LogRegClassifier

    A logistic regression classification model. Stores estimated values of
    regression coefficients and their significances, and uses them to predict
    classes and class probabilities.

    .. attribute:: beta

        Estimated regression coefficients.

    .. attribute:: beta_se

        Estimated standard errors for regression coefficients.

    .. attribute:: wald_Z

        Wald Z statistics for beta coefficients, computed
        as beta/beta_se.

    .. attribute:: P

        P-values for the beta coefficients, that is, the probabilities of
        observing coefficients this far from 0.0 if their true values
        were 0.0. Each is computed from the squared Wald Z statistic,
        which follows a chi-square distribution with one degree of freedom.

    .. attribute:: likelihood

        The probability of the sample (i.e., the learning examples) observed on
        the basis of the derived model, as a function of the regression
        parameters.

    .. attribute:: fit_status

        Tells how the model fitting ended: either regularly
        (:obj:`LogRegFitter.OK`), or it was interrupted because a beta
        coefficient escaped towards infinity (:obj:`LogRegFitter.Infinity`),
        or because the values did not converge (:obj:`LogRegFitter.Divergence`). The
        value indicates the classifier's reliability; the classifier
        itself is usable in any of these cases.

    .. method:: __call__(instance, result_type)

        Classify a new instance.

        :param instance: instance to be classified.
        :type instance: :class:`~Orange.data.Instance`
        :param result_type: :class:`~Orange.classification.Classifier.GetValue` or
              :class:`~Orange.classification.Classifier.GetProbabilities` or
              :class:`~Orange.classification.Classifier.GetBoth`

        :rtype: :class:`~Orange.data.Value`,
              :class:`~Orange.statistics.distribution.Distribution` or a
              tuple with both

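
The P attribute of :obj:`LogRegClassifier` can be cross-checked by hand: the
square of a standard-normal Wald Z statistic follows a chi-square distribution
with one degree of freedom, so the two-sided p-value reduces to a complementary
error function. A small sketch in plain Python (not part of Orange's API)::

```python
import math

def wald_p_value(beta, beta_se):
    """Two-sided p-value for a coefficient from its Wald Z statistic.

    For one degree of freedom, P(chi2 > z**2) equals erfc(|z| / sqrt(2)).
    """
    z = beta / beta_se
    return math.erfc(abs(z) / math.sqrt(2))
```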


.. class:: LogRegFitter

    :obj:`LogRegFitter` is the abstract base class for logistic fitters. It
    defines the form of the call operator and the constants denoting its
    (un)success:

    .. attribute:: OK

        The fitter converged to the optimal fit.

    .. attribute:: Infinity

        The fitter failed because one or more beta coefficients escaped towards infinity.

    .. attribute:: Divergence

        Beta coefficients failed to converge, but none escaped towards infinity.

    .. attribute:: Constant

        A constant attribute caused the matrix to be singular.

    .. attribute:: Singularity

        The matrix is singular.


    .. method:: __call__(examples, weight_id)

        Performs the fitting. There are two possible outcomes: either
        the fitter succeeds in finding a set of beta coefficients (although
        possibly with difficulties), or the fitting fails altogether. The
        two cases return different results.

        `(status, beta, beta_se, likelihood)`
            The fitter managed to fit the model. The first element of
            the tuple, status, reports any problems that occurred; it can
            be :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In
            the latter two cases, the returned values may still be useful for
            making predictions, but it is recommended that you inspect
            the coefficients and their errors before deciding
            whether to use the model.

        `(status, attribute)`
            The fitter failed and the returned attribute is responsible
            for it. The type of failure is reported in status, which
            can be either :obj:`Constant` or :obj:`Singularity`.

        The proper way of calling the fitter is to expect and handle all
        the situations described. For instance, if fitter is an instance
        of some fitter and examples contain a set of suitable examples,
        a script should look like this::

            res = fitter(examples)
            if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]:
                status, beta, beta_se, likelihood = res
                # proceed by doing something with what you got
            else:
                status, attr = res
                # remove the attribute, or report the problem to the user



.. class:: LogRegFitter_Cholesky

    The sole fitter available at the
    moment. It is a C++ translation of `Alan Miller's logistic regression
    code <http://users.bigpond.net.au/amiller/>`_. It uses the Newton-Raphson
    algorithm to iteratively minimize the least squares error computed from
    the learning examples.
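
The Newton-Raphson iteration such fitters rely on can be illustrated on a toy
single-coefficient model. This is a hand-rolled sketch for intuition only, not
the C++ code the class wraps::

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_single_beta(xs, ys, iterations=25):
    """Newton-Raphson fit of a one-coefficient logistic model
    P(y = 1) = sigmoid(beta * x), maximizing the log-likelihood."""
    beta = 0.0
    for _ in range(iterations):
        ps = [sigmoid(beta * x) for x in xs]
        # Gradient and (negated) curvature of the log-likelihood:
        gradient = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        curvature = sum(p * (1 - p) * x * x for x, p in zip(xs, ps))
        beta += gradient / curvature  # Newton step
    return beta
```

Note that on linearly separable data this iteration would drive beta towards
infinity, which is exactly the situation the :obj:`Infinity` status reports.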


.. autoclass:: StepWiseFSS
   :members:
   :show-inheritance:

.. autofunction:: dump


Examples
--------

The first example shows a very simple induction of a logistic regression
classifier (:download:`logreg-run.py <code/logreg-run.py>`).

.. literalinclude:: code/logreg-run.py

Result::

    Classification accuracy: 0.778282598819

    class attribute = survived
    class values = <no, yes>

        Attribute       beta  st. error     wald Z          P OR=exp(beta)

        Intercept      -1.23       0.08     -15.15      -0.00
     status=first       0.86       0.16       5.39       0.00       2.36
    status=second      -0.16       0.18      -0.91       0.36       0.85
     status=third      -0.92       0.15      -6.12       0.00       0.40
        age=child       1.06       0.25       4.30       0.00       2.89
       sex=female       2.42       0.14      17.04       0.00      11.25

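
The OR=exp(beta) column in this output is simply the exponentiated coefficient,
i.e. the multiplicative change in the odds of the positive class when the
indicator attribute holds. The reported values can be verified directly from
the beta column::

```python
import math

def odds_ratio(beta):
    """OR = exp(beta): multiplicative change in the odds of the
    positive class per unit increase of the attribute."""
    return math.exp(beta)

# Values taken from the beta column of the table above:
assert round(odds_ratio(2.42), 2) == 11.25   # sex=female
assert round(odds_ratio(0.86), 2) == 2.36    # status=first
assert round(odds_ratio(-0.92), 2) == 0.40   # status=third
```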
The next example shows how to handle singularities in data sets
(:download:`logreg-singularities.py <code/logreg-singularities.py>`).

.. literalinclude:: code/logreg-singularities.py

The first few lines of the output of this script are::

    <=50K <=50K
    <=50K <=50K
    <=50K <=50K
    >50K >50K
    <=50K >50K

    class attribute = y
    class values = <>50K, <=50K>

                               Attribute       beta  st. error     wald Z          P OR=exp(beta)

                               Intercept       6.62      -0.00       -inf       0.00
                                     age      -0.04       0.00       -inf       0.00       0.96
                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00
                           education-num      -0.28       0.00       -inf       0.00       0.76
                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62
            marital-status=Never-married       3.79       0.00        inf       0.00      44.45
                marital-status=Separated       3.46       0.00        inf       0.00      31.95
                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96
    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63
        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19
                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72

If :obj:`remove_singular` is set to 0, inducing a logistic regression
classifier raises an error::

    Traceback (most recent call last):
      File "logreg-singularities.py", line 4, in <module>
        lr = classification.logreg.LogRegLearner(table, removeSingular=0)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner
        return lr(examples, weightID)
      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__
        lr = learner(examples, weight)
    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked

We can see that the attribute workclass causes a singularity.
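
One common source of such singularities is a column that never varies across the
learning examples. A hypothetical helper (plain Python, not part of Orange's
API) that flags such columns before learning might look like this::

```python
def constant_columns(rows, names):
    """Return the names of columns whose value never varies across rows.

    With an intercept in the model, such columns make the design matrix
    singular, which is the condition remove_singular works around.
    """
    flagged = []
    for i, name in enumerate(names):
        values = {row[i] for row in rows}
        if len(values) <= 1:
            flagged.append(name)
    return flagged
```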

The example below shows how the use of stepwise logistic regression can help
improve classification performance (:download:`logreg-stepwise.py <code/logreg-stepwise.py>`):

.. literalinclude:: code/logreg-stepwise.py

The output of this script is::

    Learner      CA
    logistic     0.841
    filtered     0.846

    Number of times attributes were used in cross-validation:
     1 x a21
    10 x a22
     8 x a23
     7 x a24
     1 x a25
    10 x a26
    10 x a27
     3 x a28
     7 x a29
     9 x a31
     2 x a16
     7 x a12
     1 x a32
     8 x a15
    10 x a14
     4 x a17
     7 x a30
    10 x a11
     1 x a10
     1 x a13
    10 x a34
     2 x a19
     1 x a18
    10 x a3
    10 x a5
     4 x a4
     4 x a7
     8 x a6
    10 x a9
    10 x a8
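
Tallies like the one above can be reproduced for any feature-selection run with
a few lines of standard Python. A sketch, assuming ``selected_per_fold`` is a
list of the attribute-name lists chosen in each cross-validation fold (the
names and data here are hypothetical)::

```python
from collections import Counter

def usage_counts(selected_per_fold):
    """Count how often each attribute was selected across CV folds."""
    counts = Counter()
    for chosen in selected_per_fold:
        counts.update(chosen)
    return counts

# Hypothetical example: three folds with overlapping selections.
folds = [["a3", "a5"], ["a3", "a8"], ["a3", "a5", "a8"]]
for name, n in usage_counts(folds).most_common():
    print("%2d x %s" % (n, name))
```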