Changeset 10346:c99dada8a093 in orange
Timestamp: 02/23/12 22:47:51 (15 months ago)
Branch: default
Files: 2 edited
- Orange/classification/logreg.py (modified) (4 diffs)
- docs/reference/rst/Orange.classification.logreg.rst (modified) (7 diffs)
Orange/classification/logreg.py
--- Orange/classification/logreg.py (r10246)
+++ Orange/classification/logreg.py (r10346)
@@ -8,6 +8,5 @@
 
 def dump(classifier):
-    """ Return a formatted string of all major features in logistic regression
-    classifier.
+    """ Return a formatted string describing the logistic regression model
 
     :param classifier: logistic regression classifier.
@@ -53,25 +52,26 @@
     """ Logistic regression learner.
 
-    If data instances are provided to
-    the constructor, the learning algorithm is called and the resulting
-    classifier is returned instead of the learner.
-
-    :param data: data table with either discrete or continuous features
+    Returns either a learning algorithm (instance of
+    :obj:`LogRegLearner`) or, if data is provided, a fitted model
+    (instance of :obj:`LogRegClassifier`).
+
+    :param data: data table; it may contain discrete and continuous features
     :type data: Orange.data.Table
     :param weight_id: the ID of the weight meta attribute
     :type weight_id: int
-    :param remove_singular: set to 1 if you want automatic removal of
-        disturbing features, such as constants and singularities
+    :param remove_singular: automated removal of constant
+        features and singularities (default: `False`)
     :type remove_singular: bool
-    :param fitter: the fitting algorithm (by default the Newton-Raphson
-        fitting algorithm is used)
-    :param stepwise_lr: set to 1 if you wish to use stepwise logistic
-        regression
+    :param fitter: the fitting algorithm (default: :obj:`LogRegFitter_Cholesky`)
+    :param stepwise_lr: enables stepwise feature selection (default: `False`)
     :type stepwise_lr: bool
-    :param add_crit: parameter for stepwise feature selection
+    :param add_crit: threshold for adding a feature in stepwise
+        selection (default: 0.2)
     :type add_crit: float
-    :param delete_crit: parameter for stepwise feature selection
+    :param delete_crit: threshold for removing a feature in stepwise
+        selection (default: 0.3)
     :type delete_crit: float
-    :param num_features: parameter for stepwise feature selection
+    :param num_features: number of features in stepwise selection
+        (default: -1, no limit)
     :type num_features: int
     :rtype: :obj:`LogRegLearner` or :obj:`LogRegClassifier`
@@ -96,9 +96,9 @@
     @deprecated_keywords({"examples": "data"})
     def __call__(self, data, weight=0):
-        """ Learn from the given table of data instances.
-
-        :param data: Data instances to learn from.
+        """Fit a model to the given data.
+
+        :param data: Data instances.
         :type data: :class:`~Orange.data.Table`
-        :param weight: Id of meta attribute with weights of instances
+        :param weight: Id of meta attribute with instance weights
         :type weight: int
         :rtype: :class:`~Orange.classification.logreg.LogRegClassifier`
@@ -685,42 +685,36 @@
 class StepWiseFSS(Orange.classification.Learner):
     """
-    Algorithm described in Hosmer and Lemeshow,
-    Applied Logistic Regression, 2000.
-
-    Perform stepwise logistic regression and return a list of the
-    most "informative" features. Each step of the algorithm is composed
-    of two parts. The first is backward elimination, where each already
-    chosen feature is tested for a significant contribution to the overall
-    model. If the worst among all tested features has higher significance
-    than is specified in :obj:`delete_crit`, the feature is removed from
-    the model. The second step is forward selection, which is similar to
-    backward elimination. It loops through all the features that are not
-    in the model and tests whether they contribute to the common model
-    with significance lower that :obj:`add_crit`. The algorithm stops when
-    no feature in the model is to be removed and no feature not in the
-    model is to be added. By setting :obj:`num_features` larger than -1,
-    the algorithm will stop its execution when the number of features in model
-    exceeds that number.
-
-    Significances are assesed via the likelihood ration chi-square
-    test. Normal F test is not appropriate, because errors are assumed to
-    follow a binomial distribution.
-
-    If :obj:`table` is specified, stepwise logistic regression implemented
-    in :obj:`StepWiseFSS` is performed and a list of chosen features
-    is returned. If :obj:`table` is not specified, an instance of
-    :obj:`StepWiseFSS` with all parameters set is returned and can be called
-    with data later.
-
-    :param table: data set.
+    A learning algorithm for logistic regression that implements a
+    stepwise feature subset selection as described in Applied Logistic
+    Regression (Hosmer and Lemeshow, 2000).
+
+    Each step of the algorithm is composed of two parts. The first is
+    backward elimination in which the least significant variable in the
+    model is removed if its p-value is above the prescribed threshold
+    :obj:`delete_crit`. The second step is forward selection in which
+    all variables are tested for addition to the model, and the one with
+    the most significant contribution is added if the corresponding
+    p-value is smaller than the prescribed :obj:d`add_crit`. The
+    algorithm stops when no more variables can be added or removed.
+
+    The model can be additionaly constrained by setting
+    :obj:`num_features` to a non-negative value. The algorithm will then
+    stop when the number of variables exceeds the given limit.
+
+    Significances are assesed by the likelihood ratio chi-square
+    test. Normal F test is not appropriate since the errors are assumed
+    to follow a binomial distribution.
+
+    The class constructor returns an instance of learning algorithm or,
+    if given training data, a list of selected variables.
+
+    :param table: training data.
     :type table: Orange.data.Table
 
-    :param add_crit: "Alpha" level to judge if variable has enough importance to
-        be added in the new set. (e.g. if add_crit is 0.2,
-        then features is added if its P is lower than 0.2).
+    :param add_crit: threshold for adding a variable (default: 0.2)
     :type add_crit: float
 
-    :param delete_crit: Similar to add_crit, just that it is used at backward
-        elimination. It should be higher than add_crit!
+    :param delete_crit: threshold for removing a variable
+        (default: 0.3); should be higher than :obj:`add_crit`.
     :type delete_crit: float
 
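The stepwise procedure described in the revised StepWiseFSS docstring can be illustrated with a small standalone sketch. This is not the Orange implementation: `stepwise_select`, the `p_remove` and `p_add` callables (standing in for the likelihood ratio test) and the `max_steps` guard are hypothetical names introduced only for illustration, and the `num_features` limit is interpreted here as "stop once the limit is reached", which is an assumption.

```python
def stepwise_select(candidates, p_remove, p_add,
                    add_crit=0.2, delete_crit=0.3,
                    num_features=-1, max_steps=1000):
    """Illustrative stepwise selection loop (not Orange's code).

    p_remove(selected, var) and p_add(selected, var) are hypothetical
    callables returning the p-value of `var` given the current model.
    """
    selected = []
    for _ in range(max_steps):  # guard against cycling; an added safeguard
        changed = False

        # Backward elimination: drop the least significant variable
        # if its p-value is above delete_crit.
        if selected:
            worst = max(selected, key=lambda v: p_remove(selected, v))
            if p_remove(selected, worst) > delete_crit:
                selected.remove(worst)
                changed = True

        # Optional size constraint (assumed meaning: stop at the limit).
        if num_features > -1 and len(selected) >= num_features:
            break

        # Forward selection: add the most significant remaining variable
        # if its p-value is below add_crit.
        remaining = [v for v in candidates if v not in selected]
        if remaining:
            best = min(remaining, key=lambda v: p_add(selected, v))
            if p_add(selected, best) < add_crit:
                selected.append(best)
                changed = True

        if not changed:
            break
    return selected


# Toy usage with fixed, made-up p-values per variable.
toy_p = {"a": 0.01, "b": 0.15, "c": 0.5, "d": 0.25}
toy_test = lambda selected, var: toy_p[var]
chosen = stepwise_select(list(toy_p), toy_test, toy_test)
```

Keeping `delete_crit` above `add_crit`, as the docstring recommends, makes it less likely that a variable is added and then immediately removed again.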
docs/reference/rst/Orange.classification.logreg.rst
--- docs/reference/rst/Orange.classification.logreg.rst (r10246)
+++ docs/reference/rst/Orange.classification.logreg.rst (r10346)
@@ -9,11 +9,9 @@
 ********************************
 
-`Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_
-is a statistical classification methods that fits data to a logistic
-function. Orange's implementation of algorithm
-can handle various anomalies in features, such as constant variables and
-singularities, that could make direct fitting of logistic regression almost
-impossible. Stepwise logistic regression, which iteratively selects the most
-informative features, is also supported.
+`Logistic regression
+<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical
+classification method that fits data to a logistic function. Orange
+provides various enhancement of the method, such as stepwise selection
+of variables and handling of constant variables and singularities.
 
 .. autoclass:: LogRegLearner
@@ -44,20 +42,22 @@
         that beta coefficients differ from 0.0. The probability is
         computed from squared Wald Z statistics that is distributed with
-        Chi-Square distribution.
+        chi-squared distribution.
 
     .. attribute :: likelihood
 
-        The probability of the sample (ie. learning examples) observed on
-        the basis of the derived model, as a function of the regression
-        parameters.
+        The likelihood of the sample (ie. learning data) given the
+        fitted model.
 
     .. attribute :: fit_status
 
-        Tells how the model fitting ended - either regularly
-        (:obj:`LogRegFitter.OK`), or it was interrupted due to one of beta
-        coefficients escaping towards infinity (:obj:`LogRegFitter.Infinity`)
-        or since the values didn't converge (:obj:`LogRegFitter.Divergence`). The
-        value tells about the classifier's "reliability"; the classifier
-        itself is useful in either case.
+        Tells how the model fitting ended, either regularly
+        (:obj:`LogRegFitter.OK`), or it was interrupted due to one of
+        beta coefficients escaping towards infinity
+        (:obj:`LogRegFitter.Infinity`) or since the values did not
+        converge (:obj:`LogRegFitter.Divergence`).
+
+        Although the model is functional in all cases, it is
+        recommended to inspect whether the coefficients of the model
+        if the fitting did not end normally.
 
     .. method:: __call__(instance, result_type)
@@ -78,54 +78,53 @@
 .. class:: LogRegFitter
 
-    :obj:`LogRegFitter` is the abstract base class for logistic fitters. It
-    defines the form of call operator and the constants denoting its
-    (un)success:
-
-    .. attribute:: OK
-
-        Fitter succeeded to converge to the optimal fit.
-
-    .. attribute:: Infinity
-
-        Fitter failed due to one or more beta coefficients escaping towards infinity.
-
-    .. attribute:: Divergence
-
-        Beta coefficients failed to converge, but none of beta coefficients escaped.
-
-    .. attribute:: Constant
-
-        There is a constant attribute that causes the matrix to be singular.
-
-    .. attribute:: Singularity
-
-        The matrix is singular.
+    :obj:`LogRegFitter` is the abstract base class for logistic
+    fitters. Fitters can be called with a data table and return a
+    vector of coefficients and the corresponding statistics, or a
+    status signifying an error. The possible statuses are
+
+    .. attribute:: OK
+
+        Optimization converged
+
+    .. attribute:: Infinity
+
+        Optimization failed due to one or more beta coefficients
+        escaping towards infinity.
+
+    .. attribute:: Divergence
+
+        Beta coefficients failed to converge, but without any of beta
+        coefficients escaping toward infinity.
+
+    .. attribute:: Constant
+
+        The data is singular due to a constant variable.
+
+    .. attribute:: Singularity
+
+        The data is singular.
 
 
     .. method:: __call__(data, weight_id)
 
-        Performs the fitting. There can be two different cases: either
-        the fitting succeeded to find a set of beta coefficients (although
-        possibly with difficulties) or the fitting failed altogether. The
-        two cases return different results.
-
-        `(status, beta, beta_se, likelihood)`
-            The fitter managed to fit the model. The first element of
-            the tuple, result, tells about the problems occurred; it can
-            be either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In
-            the latter cases, returned values may still be useful for
-            making predictions, but it's recommended that you inspect
-            the coefficients and their errors and make your decision
-            whether to use the model or not.
-
-        `(status, attribute)`
-            The fitter failed and the returned attribute is responsible
-            for it. The type of failure is reported in status, which
-            can be either :obj:`Constant` or :obj:`Singularity`.
-
-        The proper way of calling the fitter is to expect and handle all
-        the situations described. For instance, if fitter is an instance
-        of some fitter and examples contain a set of suitable examples,
-        a script should look like this::
+        Fit the model and return a tuple with the fitted values and
+        the corresponding statistics or an error indicator. The two
+        cases differ by the tuple length and the status (the first
+        tuple element).
+
+        ``(status, beta, beta_se, likelihood)`` Fitting succeeded. The
+            first element, ``status`` is either :obj:`OK`,
+            :obj:`Infinity` or :obj:`Divergence`. In the latter cases,
+            returned values may still be useful for making
+            predictions, but it is recommended to inspect the
+            coefficients and their errors and decide whether to use
+            the model or not.
+
+        ``(status, variable)``
+            The fitter failed due to the indicated
+            ``variable``. ``status`` is either :obj:`Constant` or
+            :obj:`Singularity`.
+
+        The proper way of calling the fitter is to handle both scenarios ::
 
             res = fitter(examples)
@@ -141,8 +140,8 @@
 
     The sole fitter available at the
-    moment. It is a C++ translation of `Alan Miller's logistic regression
-    code <http://users.bigpond.net.au/amiller/>`_ . It uses Newton-Raphson
+    moment. This is a C++ translation of `Alan Miller's logistic regression
+    code <http://users.bigpond.net.au/amiller/>`_ that uses Newton-Raphson
     algorithm to iteratively minimize least squares error computed from
-    learning examples.
+    training data.
 
 
@@ -158,6 +157,5 @@
 --------
 
-The first example shows a very simple induction of a logistic regression
-classifier (:download:`logreg-run.py <code/logreg-run.py>`).
+The first example shows a straightforward use a logistic regression (:download:`logreg-run.py <code/logreg-run.py>`).
 
 .. literalinclude:: code/logreg-run.py
@@ -210,5 +208,5 @@
 
 If :obj:`remove_singular` is set to 0, inducing a logistic regression
-classifier would return an error::
+classifier returns an error::
 
     Traceback (most recent call last):
@@ -221,5 +219,5 @@
     orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked
 
-We can see that the attribute workclass is causing a singularity.
+The attribute variable which causes the singularity is ``workclass``.
 
 The example below shows how the use of stepwise logistic regression can help to
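The two result shapes that the revised `LogRegFitter.__call__` documentation describes (a four-element tuple when fitting produced coefficients, a two-element tuple when a variable made the data singular) could be handled along the following lines. This is a hedged sketch that does not import Orange: the status constants are hypothetical stand-ins with arbitrary values for `LogRegFitter.OK`, `Infinity`, `Divergence`, `Constant` and `Singularity`, and `interpret_fit` is an illustrative helper, not part of the library.

```python
# Hypothetical stand-ins for the LogRegFitter status constants;
# the real values live in Orange and are not reproduced here.
OK, INFINITY, DIVERGENCE, CONSTANT, SINGULARITY = range(5)


def interpret_fit(result):
    """Unpack a fitter result tuple as described in the documentation.

    Returns (beta, beta_se, likelihood, reliable) when coefficients were
    produced, and raises ValueError when the fitter reported a constant
    or singular variable.
    """
    status = result[0]

    if status in (CONSTANT, SINGULARITY):
        # Failure case: the second element names the offending variable.
        _, variable = result
        kind = "constant" if status == CONSTANT else "singularity"
        raise ValueError("fitting failed (%s) due to %r" % (kind, variable))

    # Success case: coefficients are available, but with INFINITY or
    # DIVERGENCE they should be inspected before the model is trusted.
    _, beta, beta_se, likelihood = result
    reliable = (status == OK)
    return beta, beta_se, likelihood, reliable


# Toy usage with made-up numbers in place of a real fitter call.
beta, beta_se, likelihood, reliable = interpret_fit((OK, [0.5], [0.1], -3.2))
```

Returning a separate `reliable` flag keeps the coefficients available in the `Infinity` and `Divergence` cases, which matches the documentation's advice to inspect them rather than discard the model outright.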