Changeset 7583:c9b17506b1c3 in orange
 Timestamp:
 02/04/11 23:55:31
 Branch:
 default
 Convert:
 3468eeed63d850768a0dbfacaa44b72494013e51
 File:

 1 edited

orange/Orange/classification/logreg.py
--- r7413
+++ r7583

 """
-.. index: logreg
+.. index: logistic regression
+.. index:
+   single: classification; logistic regression
 
-===================
-Logistic Regression
-===================
+*******************
+Logistic regression
+*******************
 
-Implements logistic regression and extends it's use to discrete features.
-It can handle various anomalies in features, such as constant variables
-and singularities, that make fitting logistic regression almost
-impossible. It also implements a function for constructing stepwise
-logistic regression, which is a good technique for prevent overfitting,
-and is a good feature subset selection technique as well.
-
-
-Useful Functions
+Implements `logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_
+with an extension for proper treatment of discrete features.
+The algorithm can handle various anomalies in features, such as constant variables
+and singularities, that could make fitting of logistic regression almost
+impossible. Stepwise logistic regression, which iteratively selects the most
+informative features, is also supported.
 
 .. autofunction:: LogRegLearner
…
 .. autofunction:: dump
 
-
-Class
-
-
 .. autoclass:: StepWiseFSS_class
    :members:
 
 Examples
---------
+========
 
 The first example shows a very simple induction of a logistic regression
…
 .. literalinclude:: code/logregsingularities.py
 
-Result::
+The first few lines of the output of this script are::
 
     <=50K <=50K
…
     marital-status=Married-AF-spouse     4.01      0.00    inf   0.00   55.19
     occupation=Tech-support             -0.32      0.00   -inf   0.00    0.72
-    occupation=Craft-repair              0.37      0.00    inf   0.00    1.45
-    occupation=Other-service             2.68      0.00    inf   0.00   14.61
-    occupation=Sales                     0.22      0.00    inf   0.00    1.24
-    occupation=Prof-specialty            0.18      0.00    inf   0.00    1.19
-    occupation=Handlers-cleaners         1.29      0.00    inf   0.00    3.64
-    occupation=Machine-op-inspct         0.86      0.00    inf   0.00    2.37
-    occupation=Adm-clerical              0.30      0.00    inf   0.00    1.35
-    occupation=Farming-fishing           1.12      0.00    inf   0.00    3.06
-    occupation=Transport-moving          0.62      0.00    inf   0.00    1.85
-    occupation=Priv-house-serv           3.46      0.00    inf   0.00   31.87
-    occupation=Protective-serv           0.11      0.00    inf   0.00    1.12
-    occupation=Armed-Forces              0.59      0.00    inf   0.00    1.81
-    relationship=Wife                   -1.06      0.00   -inf   0.00    0.35
-    relationship=Own-child              -1.04     60.00   -0.02  0.99    0.35
-    relationship=Not-in-family          -1.94  532845.00   0.00  1.00    0.14
-    relationship=Other-relative         -2.42      0.00   -inf   0.00    0.09
-    relationship=Unmarried              -1.92      0.00   -inf   0.00    0.15
-    race=Asian-Pac-Islander             -0.19      0.00   -inf   0.00    0.83
-    race=Amer-Indian-Eskimo              2.88      0.00    inf   0.00   17.78
-    race=Other                           3.93      0.00    inf   0.00   51.07
-    race=Black                           0.11      0.00    inf   0.00    1.12
-    sex=Female                           0.30      0.00    inf   0.00    1.36
-    capital-gain                         0.00      0.00    inf   0.00    1.00
-    capital-loss                         0.00      0.00    inf   0.00    1.00
-    hours-per-week                      -0.04      0.00   -inf   0.00    0.96
 
-In case we set :obj:`removeSingular` to 0, inducing a logistic regression
+If :obj:`removeSingular` is set to 0, inducing a logistic regression
 classifier would return an error::
…
     orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked
 
 We can see that the attribute workclass is causing a singularity.
 
-The last example shows, how the use of stepwise logistic
-regression can help us achieve better classification
-(`logregstepwise.py`_, uses `ionosphere.tab`_):
+The example below shows, how the use of stepwise logistic regression can help to
+gain in classification performance (`logregstepwise.py`_, uses `ionosphere.tab`_):
 
 .. literalinclude:: code/logregstepwise.py
 
-Result::
+The output of this script is::
 
     Learner      CA
…
     10 x a8
 
-References
-----------
-
-David W. Hosmer, Stanley Lemeshow. Applied Logistic Regression - 2nd ed. Wiley, New York, 2000
-
 .. _logregrun.py: code/logregrun.py
 .. _logregsingularities.py: code/logregsingularities.py
…
 
-#######################
-## Print out methods ##
-#######################
+##########################################################################
+## Print out methods
+
 def dump(classifier):
     """ Formatted print to console of all major features in logistic
…
 ############################################################
-#### Feature subset selection for logistic regression ####
-############################################################
+# Feature subset selection for logistic regression
…
 ####################################
-#### PROBABILITY CALCULATIONS ####
-####################################
+## PROBABILITY CALCULATIONS
 
 def lchisqprob(chisq,df):
     """
-    Returns the (1-tailed) probability value associated with the provided
-    chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.
-
-    Usage: lchisqprob(chisq,df)
-    """
+    Return the (1-tailed) probability value associated with the provided
+    chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.
+    """
     BIG = 20.0
     def ex(x):
…
 def zprob(z):
     """
-    Returns the area under the normal curve 'to the left of' the given z value.
-    Thus,
+    Returns the area under the normal curve 'to the left of' the given z value.
+    Thus::
+
     for z<0, zprob(z) = 1-tail probability
     for z>0, 1.0-zprob(z) = 1-tail probability
     for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
-    Adapted from z.c in Gary Perlman's |Stat.
-
-    Usage: lzprob(z)
-    """
+
+    Adapted from z.c in Gary Perlman's |Stat.
+    """
     Z_MAX = 6.0    # maximum meaningful z-value
     if z == 0.0:
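The changeset's docstring describes stepwise logistic regression as iteratively selecting the most informative features. The greedy forward-selection loop at the heart of that idea can be sketched in a few lines of plain Python. This is a hedged illustration only: the function name `forward_stepwise`, the caller-supplied `score` callback, and the toy scorer are all invented for the example, and Orange's StepWiseFSS differs in that it adds and removes features based on significance tests rather than a generic score.

```python
def forward_stepwise(features, score, max_features=None):
    """Greedy forward selection: repeatedly add the single feature that
    most improves score(selected), stopping when no addition helps.

    `score` is any callable mapping a list of feature names to a number
    to maximise (e.g. cross-validated accuracy of a logistic regression
    fitted on those features).  A generic sketch of the idea, not
    Orange's StepWiseFSS implementation.
    """
    selected = []
    best = score(selected)
    while features and (max_features is None or len(selected) < max_features):
        # Evaluate every remaining candidate feature one step ahead.
        gains = [(score(selected + [f]), f) for f in features]
        top_score, top_f = max(gains)
        if top_score <= best:
            break                      # no candidate improves the score
        best = top_score
        selected.append(top_f)
        features = [f for f in features if f != top_f]
    return selected

# Toy scorer: each feature has a fixed additive usefulness, with a
# penalty for taking more than two features (diminishing returns).
useful = {"a3": 3.0, "a8": 2.0, "a1": 0.5, "a4": -1.0}
toy_score = lambda fs: sum(useful[f] for f in fs) - 0.6 * max(0, len(fs) - 2) ** 2
print(forward_stepwise(list(useful), toy_score))   # ['a3', 'a8']
```

The stopping rule mirrors the usual stepwise criterion: selection halts as soon as the best one-step extension fails to improve the objective.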
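As an aside, the `zprob` helper touched at the end of the diff returns the area under the standard normal curve to the left of a given z value. The same quantity has a closed form via the error function, which makes a handy independent cross-check. The sketch below uses the standard library's `math.erf` and is not the module's implementation, which uses a series approximation adapted from z.c in Gary Perlman's |Stat:

```python
import math

def zprob(z):
    """Area under the standard normal curve to the left of z,
    via the closed form Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The tail identities quoted in the docstring:
#   for z < 0: zprob(z) is the 1-tail probability
#   for z > 0: 1.0 - zprob(z) is the 1-tail probability
#   for any z: 2.0 * (1.0 - zprob(abs(z))) is the 2-tail probability
print(round(zprob(0.0), 4))                    # 0.5
print(round(zprob(1.96), 4))                   # 0.975
print(round(2.0 * (1.0 - zprob(1.96)), 4))     # 0.05
```

Comparing this against the series-based version in `logreg.py` for a few z values is a quick way to confirm the refactored docstring still matches the code's behaviour.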