Changeset 10346:c99dada8a093 in orange


Timestamp:
02/23/12 22:47:51
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Polished documentation for logistic regression

Files:
2 edited

  • Orange/classification/logreg.py

    r10246 r10346  
    88 
    99def dump(classifier): 
    10     """ Return a formatted string of all major features in logistic regression 
    11     classifier. 
     10    """ Return a formatted string describing the logistic regression model 
    1211 
    1312    :param classifier: logistic regression classifier. 
     
    5352    """ Logistic regression learner. 
    5453 
    55     If data instances are provided to 
    56     the constructor, the learning algorithm is called and the resulting 
    57     classifier is returned instead of the learner. 
    58  
    59     :param data: data table with either discrete or continuous features 
     54    Returns either a learning algorithm (instance of 
     55    :obj:`LogRegLearner`) or, if data is provided, a fitted model 
     56    (instance of :obj:`LogRegClassifier`). 
     57 
     58    :param data: data table; it may contain discrete and continuous features 
    6059    :type data: Orange.data.Table 
    6160    :param weight_id: the ID of the weight meta attribute 
    6261    :type weight_id: int 
    63     :param remove_singular: set to 1 if you want automatic removal of 
    64         disturbing features, such as constants and singularities 
     62    :param remove_singular: automated removal of constant 
     63        features and singularities (default: `False`) 
    6564    :type remove_singular: bool 
    66     :param fitter: the fitting algorithm (by default the Newton-Raphson 
    67         fitting algorithm is used) 
    68     :param stepwise_lr: set to 1 if you wish to use stepwise logistic 
    69         regression 
     65    :param fitter: the fitting algorithm (default: :obj:`LogRegFitter_Cholesky`) 
     66    :param stepwise_lr: enables stepwise feature selection (default: `False`) 
    7067    :type stepwise_lr: bool 
    71     :param add_crit: parameter for stepwise feature selection 
     68    :param add_crit: threshold for adding a feature in stepwise 
     69        selection (default: 0.2) 
    7270    :type add_crit: float 
    73     :param delete_crit: parameter for stepwise feature selection 
     71    :param delete_crit: threshold for removing a feature in stepwise 
     72        selection (default: 0.3) 
    7473    :type delete_crit: float 
    75     :param num_features: parameter for stepwise feature selection 
     74    :param num_features: number of features in stepwise selection 
     75        (default: -1, no limit) 
    7676    :type num_features: int 
    7777    :rtype: :obj:`LogRegLearner` or :obj:`LogRegClassifier` 
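The docstring above describes Orange's dual constructor convention: calling the learner class without data returns a learner, while passing data fits immediately and returns a classifier. A toy sketch of that pattern (plain Python, not Orange code; `ToyLearner` and `ToyClassifier` are hypothetical names used only for illustration):

```python
class ToyClassifier:
    """A trivial fitted model: always predicts the majority label."""
    def __init__(self, majority):
        self.majority = majority

    def __call__(self, instance):
        return self.majority


class ToyLearner:
    """Illustrates the learner-or-classifier constructor convention."""
    def __new__(cls, data=None, **kwargs):
        self = super().__new__(cls)
        if data is not None:
            # Data given: configure, fit now, and return the classifier
            # instead of the learner (Python skips __init__ because the
            # returned object is not an instance of this class).
            self.__init__(**kwargs)
            return self(data)
        return self

    def __init__(self, data=None, remove_singular=False):
        self.remove_singular = remove_singular

    def __call__(self, data):
        # "Fitting" here is just finding the most common label.
        labels = [label for _, label in data]
        return ToyClassifier(max(set(labels), key=labels.count))
```

With data, `ToyLearner(data)` yields a `ToyClassifier` directly; without it, `ToyLearner()` yields a learner that can be called on data later, mirroring the `:rtype:` above.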
     
    9696    @deprecated_keywords({"examples": "data"}) 
    9797    def __call__(self, data, weight=0): 
    98         """Learn from the given table of data instances. 
    99  
    100         :param data: Data instances to learn from. 
     98        """Fit a model to the given data. 
     99 
     100        :param data: Data instances. 
    101101        :type data: :class:`~Orange.data.Table` 
    102         :param weight: Id of meta attribute with weights of instances 
     102        :param weight: Id of meta attribute with instance weights 
    103103        :type weight: int 
    104104        :rtype: :class:`~Orange.classification.logreg.LogRegClassifier` 
     
    685685class StepWiseFSS(Orange.classification.Learner): 
    686686  """ 
    687   Algorithm described in Hosmer and Lemeshow, 
    688   Applied Logistic Regression, 2000. 
    689  
    690   Perform stepwise logistic regression and return a list of the 
    691   most "informative" features. Each step of the algorithm is composed 
    692   of two parts. The first is backward elimination, where each already 
    693   chosen feature is tested for a significant contribution to the overall 
    694   model. If the worst among all tested features has higher significance 
    695   than is specified in :obj:`delete_crit`, the feature is removed from 
    696   the model. The second step is forward selection, which is similar to 
    697   backward elimination. It loops through all the features that are not 
    698   in the model and tests whether they contribute to the common model 
    699   with significance lower that :obj:`add_crit`. The algorithm stops when 
    700   no feature in the model is to be removed and no feature not in the 
    701   model is to be added. By setting :obj:`num_features` larger than -1, 
    702   the algorithm will stop its execution when the number of features in model 
    703   exceeds that number. 
    704  
    705   Significances are assesed via the likelihood ration chi-square 
    706   test. Normal F test is not appropriate, because errors are assumed to 
    707   follow a binomial distribution. 
    708  
    709   If :obj:`table` is specified, stepwise logistic regression implemented 
    710   in :obj:`StepWiseFSS` is performed and a list of chosen features 
    711   is returned. If :obj:`table` is not specified, an instance of 
    712   :obj:`StepWiseFSS` with all parameters set is returned and can be called 
    713   with data later. 
    714  
    715   :param table: data set. 
     687  A learning algorithm for logistic regression that implements 
     688  stepwise feature subset selection as described in Applied Logistic 
     689  Regression (Hosmer and Lemeshow, 2000). 
     690 
     691  Each step of the algorithm is composed of two parts. The first is 
     692  backward elimination in which the least significant variable in the 
     693  model is removed if its p-value is above the prescribed threshold 
     694  :obj:`delete_crit`. The second step is forward selection in which 
     695  all variables are tested for addition to the model, and the one with 
     696  the most significant contribution is added if the corresponding 
     697  p-value is smaller than the prescribed :obj:`add_crit`. The 
     698  algorithm stops when no more variables can be added or removed. 
     699 
     700  The model can be additionally constrained by setting 
     701  :obj:`num_features` to a non-negative value. The algorithm will then 
     702  stop when the number of variables exceeds the given limit. 
     703 
     704  Significances are assessed by the likelihood ratio chi-square 
     705  test. Normal F test is not appropriate since the errors are assumed 
     706  to follow a binomial distribution. 
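The likelihood-ratio test mentioned here compares the log-likelihoods of the full and the reduced model: the statistic G = 2(ll_full - ll_reduced) follows a chi-squared distribution under the null hypothesis. A minimal sketch for the one-extra-parameter case (plain Python, not Orange's implementation; for df=1 the chi-squared survival function reduces to `erfc(sqrt(G/2))`, so no statistics library is needed):

```python
import math

def lr_test_pvalue(ll_full, ll_reduced):
    """p-value of the likelihood-ratio test for one extra parameter (df=1).

    G = 2 * (ll_full - ll_reduced) is chi-squared distributed with one
    degree of freedom; its survival function is erfc(sqrt(G / 2)).
    """
    g = 2.0 * (ll_full - ll_reduced)
    return math.erfc(math.sqrt(g / 2.0))
```

In stepwise selection, a candidate variable would be added when this p-value falls below `add_crit`, and a chosen variable removed when it rises above `delete_crit`.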
     707 
     708  The class constructor returns an instance of the learning algorithm 
     709  or, if given training data, a list of the selected variables. 
     710 
     711  :param table: training data. 
    716712  :type table: Orange.data.Table 
    717713 
    718   :param add_crit: "Alpha" level to judge if variable has enough importance to 
    719        be added in the new set. (e.g. if add_crit is 0.2, 
    720        then features is added if its P is lower than 0.2). 
     714  :param add_crit: threshold for adding a variable (default: 0.2) 
    721715  :type add_crit: float 
    722716 
    723   :param delete_crit: Similar to add_crit, just that it is used at backward 
    724       elimination. It should be higher than add_crit! 
     717  :param delete_crit: threshold for removing a variable 
     718      (default: 0.3); should be higher than :obj:`add_crit`. 
    725719  :type delete_crit: float 
    726720 
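The two-phase loop described in the docstring (backward elimination, then forward selection, until nothing changes) can be sketched as follows. This is plain Python, not Orange's `StepWiseFSS`; `p_value(chosen, var)` is a hypothetical callback standing in for the likelihood-ratio chi-square test of `var`'s contribution given the currently chosen variables:

```python
def stepwise(variables, p_value, add_crit=0.2, delete_crit=0.3,
             num_features=-1):
    """Return the selected variables after stepwise selection."""
    chosen = []
    while True:
        changed = False
        # Backward elimination: drop the least significant chosen
        # variable if its p-value exceeds delete_crit.
        if chosen:
            worst = max(chosen, key=lambda v: p_value(chosen, v))
            if p_value(chosen, worst) > delete_crit:
                chosen.remove(worst)
                changed = True
        # Forward selection: add the most significant remaining variable
        # if its p-value is below add_crit and the limit is not reached.
        candidates = [v for v in variables if v not in chosen]
        if candidates and (num_features < 0 or len(chosen) < num_features):
            best = min(candidates, key=lambda v: p_value(chosen, v))
            if p_value(chosen, best) < add_crit:
                chosen.append(best)
                changed = True
        if not changed:
            return chosen
```

Keeping `delete_crit` above `add_crit`, as the docstring advises, prevents the loop from repeatedly adding and removing the same variable.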
  • docs/reference/rst/Orange.classification.logreg.rst

    r10246 r10346  
    99******************************** 
    1010 
    11 `Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_ 
    12 is a statistical classification methods that fits data to a logistic 
    13 function. Orange's implementation of algorithm 
    14 can handle various anomalies in features, such as constant variables and 
    15 singularities, that could make direct fitting of logistic regression almost 
    16 impossible. Stepwise logistic regression, which iteratively selects the most 
    17 informative features, is also supported. 
     11`Logistic regression 
     12<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical 
     13classification method that fits data to a logistic function. Orange 
     14provides various enhancements of the method, such as stepwise selection 
     15of variables and handling of constant variables and singularities. 
    1816 
    1917.. autoclass:: LogRegLearner 
     
    4442        that beta coefficients differ from 0.0. The probability is 
    4543        computed from squared Wald Z statistics that is distributed with 
    46         Chi-Square distribution. 
     44        chi-squared distribution. 
    4745 
    4846    .. attribute :: likelihood 
    4947 
    50         The probability of the sample (ie. learning examples) observed on 
    51         the basis of the derived model, as a function of the regression 
    52         parameters. 
     48        The likelihood of the sample (ie. learning data) given the 
     49        fitted model. 
    5350 
    5451    .. attribute :: fit_status 
    5552 
    56         Tells how the model fitting ended - either regularly 
    57         (:obj:`LogRegFitter.OK`), or it was interrupted due to one of beta 
    58         coefficients escaping towards infinity (:obj:`LogRegFitter.Infinity`) 
    59         or since the values didn't converge (:obj:`LogRegFitter.Divergence`). The 
    60         value tells about the classifier's "reliability"; the classifier 
    61         itself is useful in either case. 
     53        Tells how the model fitting ended: either regularly 
     54        (:obj:`LogRegFitter.OK`), or it was interrupted because one of 
     55        the beta coefficients escaped towards infinity 
     56        (:obj:`LogRegFitter.Infinity`) or because the values did not 
     57        converge (:obj:`LogRegFitter.Divergence`). 
     58 
     59        Although the model is functional in all cases, it is 
     60        recommended to inspect the coefficients of the model if 
     61        the fitting did not end normally. 
    6262 
    6363    .. method:: __call__(instance, result_type) 
     
    7878.. class:: LogRegFitter 
    7979 
    80     :obj:`LogRegFitter` is the abstract base class for logistic fitters. It 
    81     defines the form of call operator and the constants denoting its 
    82     (un)success: 
    83  
    84     .. attribute:: OK 
    85  
    86         Fitter succeeded to converge to the optimal fit. 
    87  
    88     .. attribute:: Infinity 
    89  
    90         Fitter failed due to one or more beta coefficients escaping towards infinity. 
    91  
    92     .. attribute:: Divergence 
    93  
    94         Beta coefficients failed to converge, but none of beta coefficients escaped. 
    95  
    96     .. attribute:: Constant 
    97  
    98         There is a constant attribute that causes the matrix to be singular. 
    99  
    100     .. attribute:: Singularity 
    101  
    102         The matrix is singular. 
     80    :obj:`LogRegFitter` is the abstract base class for logistic 
     81    fitters. Fitters can be called with a data table and return a 
     82    vector of coefficients and the corresponding statistics, or a 
     83    status signifying an error. The possible statuses are 
     84 
     85    .. attribute:: OK 
     86 
     87        Optimization converged 
     88 
     89    .. attribute:: Infinity 
     90 
     91        Optimization failed due to one or more beta coefficients 
     92        escaping towards infinity. 
     93 
     94    .. attribute:: Divergence 
     95 
     96        Beta coefficients failed to converge, but without any of the 
     97        beta coefficients escaping toward infinity. 
     98 
     99    .. attribute:: Constant 
     100 
     101        The data is singular due to a constant variable. 
     102 
     103    .. attribute:: Singularity 
     104 
     105        The data is singular. 
    103106 
    104107 
    105108    .. method:: __call__(data, weight_id) 
    106109 
    107         Performs the fitting. There can be two different cases: either 
    108         the fitting succeeded to find a set of beta coefficients (although 
    109         possibly with difficulties) or the fitting failed altogether. The 
    110         two cases return different results. 
    111  
    112         `(status, beta, beta_se, likelihood)` 
    113             The fitter managed to fit the model. The first element of 
    114             the tuple, result, tells about the problems occurred; it can 
    115             be either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In 
    116             the latter cases, returned values may still be useful for 
    117             making predictions, but it's recommended that you inspect 
    118             the coefficients and their errors and make your decision 
    119             whether to use the model or not. 
    120  
    121         `(status, attribute)` 
    122             The fitter failed and the returned attribute is responsible 
    123             for it. The type of failure is reported in status, which 
    124             can be either :obj:`Constant` or :obj:`Singularity`. 
    125  
    126         The proper way of calling the fitter is to expect and handle all 
    127         the situations described. For instance, if fitter is an instance 
    128         of some fitter and examples contain a set of suitable examples, 
    129         a script should look like this:: 
     110        Fit the model and return a tuple with the fitted values and 
     111        the corresponding statistics or an error indicator. The two 
     112        cases differ by the tuple length and the status (the first 
     113        tuple element). 
     114 
     115        ``(status, beta, beta_se, likelihood)`` 
     116            Fitting succeeded. The first element, ``status``, is either 
     117            :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In the 
     118            latter cases, returned values may still be useful for making 
     119            predictions, but it is recommended to inspect the 
     120            coefficients and their errors and decide whether to use 
     121            the model or not. 
     122 
     123        ``(status, variable)`` 
     124            The fitter failed due to the indicated 
     125            ``variable``. ``status`` is either :obj:`Constant` or 
     126            :obj:`Singularity`. 
     127 
     128        The proper way of calling the fitter is to handle both scenarios :: 
    130129 
    131130            res = fitter(examples) 
     
    141140 
    142141    The sole fitter available at the 
    143     moment. It is a C++ translation of `Alan Miller's logistic regression 
    144     code <http://users.bigpond.net.au/amiller/>`_. It uses Newton-Raphson 
     142    moment. This is a C++ translation of `Alan Miller's logistic regression 
     143    code <http://users.bigpond.net.au/amiller/>`_ that uses the Newton-Raphson 
    145144    algorithm to iteratively minimize least squares error computed from 
    146     learning examples. 
     145    training data. 
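A minimal sketch of what a Newton-Raphson iteration for logistic regression looks like, for the simplest case of one feature plus an intercept (pure Python for illustration; this is not Miller's code, which handles the full design matrix):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(xs, ys, iterations=25):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the log-likelihood gradient and the entries of
        # the (negated) Hessian, a 2x2 weighted moment matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: beta += H^-1 * gradient (2x2 solve by hand).
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Note that on perfectly separable data this iteration drives a coefficient toward infinity, which is exactly the situation the :obj:`Infinity` status above reports.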
    147146 
    148147 
     
    158157-------- 
    159158 
    160 The first example shows a very simple induction of a logistic regression 
    161 classifier (:download:`logreg-run.py <code/logreg-run.py>`). 
     159The first example shows a straightforward use of logistic regression (:download:`logreg-run.py <code/logreg-run.py>`). 
    162160 
    163161.. literalinclude:: code/logreg-run.py 
     
    210208 
    211209If :obj:`remove_singular` is set to 0, inducing a logistic regression 
    212 classifier would return an error:: 
     210classifier returns an error:: 
    213211 
    214212    Traceback (most recent call last): 
     
    221219    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
    222220 
    223 We can see that the attribute workclass is causing a singularity. 
     221The variable causing the singularity is ``workclass``. 
    224222 
    225223The example below shows how the use of stepwise logistic regression can help to 