Changeset 7583:c9b17506b1c3 in orange


Timestamp:
02/04/11 23:55:31
Author:
blaz <blaz.zupan@…>
Branch:
default
Convert:
3468eeed63d850768a0dbfacaa44b72494013e51
Message:

some editing, also, needs further work (statistical functions at the end need to be moved)

File:
1 edited

  • orange/Orange/classification/logreg.py

    r7413 r7583  
    11""" 
    2 .. index: logreg 
    3  
    4 =================== 
    5 Logistic Regression 
    6 =================== 
    7  
    8 Implements logistic regression and extends its use to discrete features. 
    9 It can handle various anomalies in features, such as constant variables 
    10 and singularities, that make fitting logistic regression almost 
    11 impossible. It also implements a function for constructing stepwise 
    12 logistic regression, which is a good technique for preventing overfitting, 
    13 and is a good feature subset selection technique as well. 
    14  
    15  
    16 Useful Functions 
    17 ---------------- 
     2.. index:: logistic regression 
     3.. index:: 
     4   single: classification; logistic regression 
     5 
     6******************* 
     7Logistic regression 
     8******************* 
     9 
     10Implements `logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_ 
     11with an extension for proper treatment of discrete features. 
     12The algorithm can handle various anomalies in features, such as constant variables 
     13and singularities, which could make fitting logistic regression almost 
     14impossible. Stepwise logistic regression, which iteratively selects the most informative features, 
     15is also supported. 
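As a minimal illustration of the model being fitted (a plain-Python sketch, not the Orange API), the learned coefficients define a class probability through the logistic function:

```python
import math

def logistic(x, beta):
    # P(class = 1 | x) under a logistic regression model:
    # p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# With all coefficients zero the model is maximally uncertain:
print(logistic([1.0, 2.0], [0.0, 0.0, 0.0]))  # 0.5
```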
     16 
    1817 
    1918.. autofunction:: LogRegLearner 
     
    2120.. autofunction:: dump 
    2221 
    23  
    24 Class 
    25 ----- 
    26  
    2722.. autoclass:: StepWiseFSS_class 
    2823   :members: 
    2924 
    3025Examples 
    31 -------- 
     26======== 
    3227 
    3328The first example shows a very simple induction of a logistic regression 
     
    5752.. literalinclude:: code/logreg-singularities.py 
    5853 
    59 Result:: 
     54The first few lines of the output of this script are:: 
    6055 
    6156    <=50K <=50K 
     
    8176        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19 
    8277                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72 
    83                  occupation=Craft-repair       0.37       0.00        inf       0.00       1.45 
    84                 occupation=Other-service       2.68       0.00        inf       0.00      14.61 
    85                         occupation=Sales       0.22       0.00        inf       0.00       1.24 
    86                occupation=Prof-specialty       0.18       0.00        inf       0.00       1.19 
    87             occupation=Handlers-cleaners       1.29       0.00        inf       0.00       3.64 
    88             occupation=Machine-op-inspct       0.86       0.00        inf       0.00       2.37 
    89                  occupation=Adm-clerical       0.30       0.00        inf       0.00       1.35 
    90               occupation=Farming-fishing       1.12       0.00        inf       0.00       3.06 
    91              occupation=Transport-moving       0.62       0.00        inf       0.00       1.85 
    92               occupation=Priv-house-serv       3.46       0.00        inf       0.00      31.87 
    93               occupation=Protective-serv       0.11       0.00        inf       0.00       1.12 
    94                  occupation=Armed-Forces       0.59       0.00        inf       0.00       1.81 
    95                        relationship=Wife      -1.06      -0.00        inf       0.00       0.35 
    96                   relationship=Own-child      -1.04      60.00      -0.02       0.99       0.35 
    97               relationship=Not-in-family      -1.94  532845.00      -0.00       1.00       0.14 
    98              relationship=Other-relative      -2.42       0.00       -inf       0.00       0.09 
    99                   relationship=Unmarried      -1.92       0.00       -inf       0.00       0.15 
    100                  race=Asian-Pac-Islander      -0.19       0.00       -inf       0.00       0.83 
    101                  race=Amer-Indian-Eskimo       2.88       0.00        inf       0.00      17.78 
    102                               race=Other       3.93       0.00        inf       0.00      51.07 
    103                               race=Black       0.11       0.00        inf       0.00       1.12 
    104                               sex=Female       0.30       0.00        inf       0.00       1.36 
    105                             capital-gain      -0.00       0.00       -inf       0.00       1.00 
    106                             capital-loss      -0.00       0.00       -inf       0.00       1.00 
    107                           hours-per-week      -0.04       0.00       -inf       0.00       0.96 
    108  
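The OR column in the printout above is the odds ratio, exp(beta); small discrepancies against the table come from the beta values being rounded to two decimals. A quick check against a row of the table:

```python
import math

# Odds ratio for a coefficient is exp(beta); e.g. relationship=Wife
# above has beta = -1.06, and the OR column shows 0.35:
print(math.exp(-1.06))  # ~0.3465, i.e. 0.35 after rounding
```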
    109 In case we set :obj:`removeSingular` to 0, inducing a logistic regression 
     78 
     79If :obj:`removeSingular` is set to 0, inducing a logistic regression 
    11080classifier would return an error:: 
    11181 
     
    11989    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
    12090 
    121  
    12291We can see that the attribute workclass is causing a singularity. 
    12392 
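One common source of such singularities is a feature that is constant on the training sample (it is then collinear with the intercept). A plain-Python sketch of a pre-fit check, not the test Orange performs internally:

```python
def constant_features(rows):
    # Indices of columns that take a single value across all rows;
    # such a feature is collinear with the intercept and makes the
    # logistic regression fit singular.
    return [j for j in range(len(rows[0]))
            if len({row[j] for row in rows}) == 1]

data = [
    ["Never-worked", 40, "<=50K"],
    ["Never-worked", 50, ">50K"],
    ["Never-worked", 20, "<=50K"],
]
print(constant_features(data))  # column 0 is constant -> [0]
```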
    124 The last example shows how the use of stepwise logistic 
    125 regression can help us achieve better classification 
    126 (`logreg-stepwise.py`_, uses `ionosphere.tab`_): 
     93The example below shows how stepwise logistic regression can improve 
     94classification performance (`logreg-stepwise.py`_, uses `ionosphere.tab`_): 
    12795 
    12896.. literalinclude:: code/logreg-stepwise.py 
    12997 
    130 Result:: 
     98The output of this script is:: 
    13199 
    132100    Learner      CA 
     
    166134    10 x a8 
    167135 
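The stepwise procedure behind results like these is a greedy search over feature subsets. A schematic forward-selection sketch (the `score` callback is a hypothetical stand-in for cross-validated accuracy, not the `StepWiseFSS_class` interface):

```python
def forward_select(features, score):
    # Greedy forward selection: repeatedly add the feature that most
    # improves the score; stop when no remaining feature helps.
    selected = []
    best = score([])
    improved = True
    while improved:
        improved = False
        for f in set(features) - set(selected):
            s = score(selected + [f])
            if s > best:
                best, best_f, improved = s, f, True
        if improved:
            selected.append(best_f)
    return selected
```

A toy scoring function (reward features in a "useful" set, penalize subset size) shows the search keeping only the informative features.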
    168 References 
    169 ---------- 
    170  
    171 David W. Hosmer, Stanley Lemeshow. Applied Logistic Regression - 2nd ed. Wiley, New York, 2000  
    172  
    173  
    174136.. _logreg-run.py: code/logreg-run.py 
    175137.. _logreg-singularities.py: code/logreg-singularities.py 
     
    191153 
    192154 
    193 ####################### 
    194 ## Print out methods ## 
    195 ####################### 
     155########################################################################## 
     156## Print out methods 
     157 
    196158def dump(classifier): 
    197159    """ Formatted print to console of all major features in logistic 
     
    800762     
    801763############################################################ 
    802 ####  Feature subset selection for logistic regression  #### 
    803 ############################################################ 
     764#  Feature subset selection for logistic regression 
    804765 
    805766 
     
    1027988 
    1028989#################################### 
    1029 ####  PROBABILITY CALCULATIONS  #### 
    1030 #################################### 
     990##  PROBABILITY CALCULATIONS 
    1031991 
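For the 1-df case, the tail probability these helpers compute can be sketched directly with the standard library (this specialization is an assumption for illustration; the code below handles general df): a chi-square variate with 1 df is the square of a standard normal.

```python
import math

def chisq_prob_df1(chisq):
    # 1-tailed (upper) probability for a chi-square variate with 1 df.
    # Since chi-square(1) is the square of a standard normal Z:
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2.0))

print(chisq_prob_df1(3.841))  # ~0.05, the classic 5% critical value
```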
    1032992def lchisqprob(chisq,df): 
    1033993    """ 
    1034 Returns the (1-tailed) probability value associated with the provided 
    1035 chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat. 
    1036  
    1037 Usage:   lchisqprob(chisq,df) 
    1038 """ 
     994    Return the (1-tailed) probability value associated with the provided 
     995    chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat. 
     996    """ 
    1039997    BIG = 20.0 
    1040998    def ex(x): 
     
    10911049def zprob(z): 
    10921050    """ 
    1093 Returns the area under the normal curve 'to the left of' the given z value. 
    1094 Thus,  
     1051    Returns the area under the normal curve 'to the left of' the given z value. 
     1052    Thus::  
     1053 
    10951054    for z<0, zprob(z) = 1-tail probability 
    10961055    for z>0, 1.0-zprob(z) = 1-tail probability 
    10971056    for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 
    1098 Adapted from z.c in Gary Perlman's |Stat. 
    1099  
    1100 Usage:   lzprob(z) 
    1101 """ 
     1057 
     1058    Adapted from z.c in Gary Perlman's |Stat. 
     1059    """ 
    11021060    Z_MAX = 6.0    # maximum meaningful z-value 
    11031061    if z == 0.0: 
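The same left-tail area can be obtained from the standard library's error function, which is handy for sanity-checking a zprob-style routine:

```python
import math

def zprob_erf(z):
    # Area under the standard normal curve to the left of z:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Two-tailed probability for |z| = 1.96 is ~0.05:
print(2.0 * (1.0 - zprob_erf(abs(-1.96))))
```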