# source: orange/orange/doc/reference/LogisticLearner.htm @ 6538:a5f65d7f0b2c

Revision 6538:a5f65d7f0b2c, 6.9 KB, checked in by Mitar <Mitar@…>, 4 years ago

Line | |
---|---|

1 | <html> |

2 | <HEAD> |

3 | <LINK REL=StyleSheet HREF="../style.css" TYPE="text/css"> |

4 | <LINK REL=StyleSheet HREF="style-print.css" TYPE="text/css" MEDIA=print> |

5 | </HEAD> |

6 | <body> |

7 | <index name="classifiers+logistic regression"> |

8 | <h1>Logistic Regression Classifier and Learner</h1> |

9 | <P align=left>Logistic regression is a popular classification method that comes from statistics. The model is described by a linear combination of the attribute values X_i, weighted by the coefficients beta_i, |

10 | <P align=center>F = beta_0 + beta_1*X_1 + beta_2*X_2 + ... + beta_k*X_k<BR></P> |

11 | <P>and the probability (p) of a class value is computed as:</P> |

12 | <P align=center>p = exp(F)/(1+exp(F))</P> |
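The two formulas above can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not Orange's implementation; the function name `logistic_probability` and the sample coefficients are made up for the example.

```python
import math

def logistic_probability(beta, x):
    """Return p = exp(F) / (1 + exp(F)), where
    F = beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1].

    `beta` holds the intercept followed by one coefficient per attribute;
    `x` holds the (continuous) attribute values. Both are hypothetical inputs.
    """
    f = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # Algebraically identical to exp(F)/(1+exp(F)); the two branches only
    # avoid overflow in exp() when |F| is large.
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

# F = 0.5 + 1.0*1.5 - 2.0*0.25 = 1.5
p = logistic_probability([0.5, 1.0, -2.0], [1.5, 0.25])
```

With F = 0 the function returns exactly 0.5, which is the decision boundary between the two class values.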

13 | |

14 | <P>The outcome variable (class) must be binary (dichotomous), and discrete attributes must be translated to continuous ones. While the Orange kernel provides the basic functionality, the module <A href="../modules/orngLR.htm">orngLR.py</A> covers the necessary adaptations and conversions.</P> |

15 | |

16 | <hr> |

17 | |

18 | <H2>LogRegClassifier</H2> |

19 | <index name="classes/LogRegClassifier"> |

20 | |

21 | <P><CODE>LogRegClassifier</CODE> stores estimated values of regression coefficients and their significances, and uses them to predict classes and class probabilities using the equations described above.</P> |

22 | |

23 | <P class=section>Attributes</P> |

24 | <DL class=attributes> |

25 | <DT>beta</DT> |

26 | <DD>Estimated regression coefficients.</DD> |

27 | |

28 | <DT>beta_se</DT> |

29 | <DD>Estimated standard errors for regression coefficients.</DD> |

30 | |

31 | <DT>wald_Z</DT> |

32 | <DD>Wald Z statistics for beta coefficients. Wald Z is computed as <CODE>beta</CODE>/<CODE>beta_se</CODE>.</DD> |

33 | |

34 | <DT>P</DT> |

35 | <DD>List of P-values for beta coefficients, that is, the probability of observing a Wald statistic at least as extreme as the computed one if the true coefficient were 0.0. The probability is computed from the squared Wald Z statistic, which follows a Chi-Square distribution with one degree of freedom.</DD> |

36 | |

37 | <DT>likelihood</DT> |

38 | <DD>The probability of the observed sample (i.e., the learning examples) under the derived model, as a function of the regression parameters. |

39 | </DD> |

40 | |

41 | <DT>fitStatus</DT> |

42 | <DD>Tells how the model fitting ended: either regularly (<CODE>LogRegFitter.OK</CODE>), or interrupted because one of the beta coefficients escaped towards infinity (<CODE>LogRegFitter.Infinity</CODE>) or because the values did not converge (<CODE>LogRegFitter.Divergence</CODE>). The value indicates the classifier's "reliability"; the classifier itself can be used in all three cases.</DD> |

43 | </DL> |
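As a rough illustration of how `wald_Z` and `P` relate to `beta` and `beta_se`, the sketch below recomputes them in plain Python. It assumes the P-value is the upper tail of a Chi-Square distribution with one degree of freedom evaluated at the squared Wald Z, which equals `erfc(|Z| / sqrt(2))`. The helper name `wald_statistics` and the numbers are hypothetical, and Orange's own computation may differ in detail.

```python
import math

def wald_statistics(beta, beta_se):
    """Return (wald_z, p_values) for coefficient estimates and standard errors."""
    # Wald Z is the coefficient divided by its standard error.
    wald_z = [b / se for b, se in zip(beta, beta_se)]
    # For one degree of freedom, P(chi2 >= z*z) = erfc(|z| / sqrt(2)).
    p_values = [math.erfc(abs(z) / math.sqrt(2.0)) for z in wald_z]
    return wald_z, p_values

# Made-up estimates: the first coefficient is about two standard errors from 0,
# the second is exactly 0.
wald_z, p_values = wald_statistics([0.98, 0.0], [0.5, 0.3])
```

A small P-value suggests that the corresponding coefficient differs from 0.0; a coefficient of exactly 0 gives a P-value of 1.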

44 | |

45 | <H2>LogRegLearner</H2> |

46 | <index name="classes/LogRegLearner"> |

47 | |

48 | <P>Logistic learner fits the beta coefficients and computes the related statistics by calling the specified <CODE>fitter</CODE>.</P> |

49 | |

50 | <P class=section>Attributes</P> |

51 | <DL class=attributes> |

52 | <DT>fitter</DT> |

53 | <DD>An object that fits beta coefficients and corresponding |

54 | standard errors from a set of data.</DD> |

55 | </DL> |

56 | |

57 | <P class=section>Methods</P> |

58 | <DL class=attributes> |

59 | <DT>fitModel(examples[, weightID =])</DT> |

60 | <DD>Fits the model by calling <CODE>fitter</CODE>. If fitting succeeds, it returns a <CODE>Classifier</CODE>; if not, it returns the offending attribute. You should therefore always check the type of the returned result, as follows. |

61 | |

62 | <XMP class=code>c = fitModel(examples) |

63 | if isinstance(c, Variable): |

64 |     pass  # c is the offending attribute: remove it and see what happens |

65 | else: |

66 |     pass  # c is a classifier, life is beautiful |

67 | </XMP> |

68 | </DL> |

69 | |

70 | <P>Like all learners, <CODE>LogRegLearner</CODE> provides the usual call operator, to which you pass examples (and weights, if you have them) and which returns a classifier or throws an exception if it cannot construct one. Use <CODE>fitModel</CODE> in code that iteratively removes problematic attributes until it gets a classifier; in fact, that is exactly what <a href="../modules/orngLR.htm"><CODE>orngLR</CODE></A> does.</DD> |

71 | </DL> |
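The iterative removal described above can be sketched as a small loop. To keep the example self-contained it does not use Orange at all: `toy_fit_model` is a hypothetical stand-in for `fitModel` that returns either an offending attribute (here a plain string) or a dummy "classifier" (here a dict), and `fit_with_removal` is an invented helper name. With real Orange objects the type check would be the `isinstance(c, Variable)` test shown earlier.

```python
def toy_fit_model(attributes):
    """Hypothetical stand-in for fitModel: attributes whose names start with
    'const' are reported as offending; otherwise a dummy classifier is returned."""
    for name in attributes:
        if name.startswith("const"):
            return name
    return {"attributes": list(attributes)}

def fit_with_removal(fit_model, attributes):
    """Call fit_model repeatedly, dropping each offending attribute it reports,
    until it returns something that is not an attribute."""
    remaining = list(attributes)
    removed = []
    while True:
        result = fit_model(remaining)
        if not isinstance(result, str):
            # Not an attribute, so we have a classifier.
            return result, removed
        # fit_model named an offending attribute: drop it and try again.
        remaining.remove(result)
        removed.append(result)

classifier, removed = fit_with_removal(
    toy_fit_model, ["a", "const1", "b", "const2"])
```

Keeping the list of removed attributes is a design choice of this sketch; it lets the caller report which attributes were dropped.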

72 | |

73 | <H2><A name=fitters></A>Logistic Regression Fitters</H2> |

74 | |

75 | <P>Fitters are objects that LogRegLearner uses to fit the model.</P> |

76 | |

77 | <H3>LogRegFitter</H3> |

78 | <index name="classes/LogRegFitter"> |

79 | |

80 | <P><CODE>LogRegFitter</CODE> is the abstract base class for logistic fitters. It defines the form of the call operator and the constants denoting its success or failure:</P> |

81 | |

82 | <P class=section>Constants</P> |

83 | <DL class=attributes> |

84 | <DT>OK</DT> |

85 | <DD>The fitter converged to the optimal fit.</DD> |

86 | |

87 | <DT>Infinity</DT> |

88 | <DD>Fitting was interrupted because one or more beta coefficients escaped towards infinity.</DD> |

89 | |

90 | <DT>Divergence</DT> |

91 | <DD>The beta coefficients failed to converge, but none of them escaped towards infinity.</DD> |

92 | |

93 | <DT>Constant</DT> |

94 | <DD>There is a constant attribute that causes the matrix to be singular.</DD> |

95 | |

96 | <DT>Singularity</DT> |

97 | <DD>The matrix is singular.</DD> |

98 | </DL> |

99 | |

100 | <P class=section>Methods</P> |

101 | <DL class=attributes> |

102 | <DT>__call__(examples, weightID)</DT> |

103 | <DD>Performs the fitting. There are two possible outcomes: either the fitting succeeded in finding a set of beta coefficients (although possibly with difficulties), or the fitting failed altogether. The two cases return different results. |

104 | <DL class=attributes> |

105 | <DT>(status, beta, beta_se, likelihood)</DT> |

106 | <DD>The fitter managed to fit the model. The first element of the tuple, <CODE>status</CODE>, reports any problems that occurred; it can be either <CODE>OK</CODE>, <CODE>Infinity</CODE> or <CODE>Divergence</CODE>. In the latter two cases, the returned values may still be useful for making predictions, but it is recommended that you inspect the coefficients and their errors and decide for yourself whether to use the model.</DD> |

107 | |

108 | <DT>(status, attribute)</DT> |

109 | <DD>The fitter failed and the returned <CODE>attribute</CODE> is responsible for it. The type of failure is reported in <CODE>status</CODE>, which can be either <CODE>Constant</CODE> or <CODE>Singularity</CODE>.</DD> |

110 | </DL> |

111 | </P> |

112 | |

113 | <P style="margin-top:12pt">The proper way of calling the fitter is to expect and handle all the situations described. For instance, if <CODE>fitter</CODE> is an instance of some fitter and <CODE>examples</CODE> contains a set of suitable examples, a script should look like this:</P> |

114 | |

115 | <XMP class=code>res = fitter(examples) |

116 | if res[0] in [fitter.OK, fitter.Infinity, fitter.Divergence]: |

117 |     status, beta, beta_se, likelihood = res |

118 |     # proceed by doing something with what you got |

119 | else: |

120 |     status, attr = res |

121 |     # remove the attribute or complain to the user or ... |

122 | </XMP> |

123 | </DL> |

124 | |

125 | <H3>LogRegFitter_Cholesky</H3> |

126 | <index name="classes/LogRegFitter_Cholesky"> |

127 | |

128 | <P><CODE>LogRegFitter_Cholesky</CODE> is currently the only fitter available. It is a C++ translation of <A href="http://users.bigpond.net.au/amiller/">Alan Miller's logistic regression code</A>. It uses the Newton-Raphson algorithm, which iteratively solves weighted least squares problems computed from the learning examples.</P> |
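To give a feel for what Newton-Raphson fitting involves, here is a minimal sketch for a single attribute plus an intercept, written in plain Python. It is not a translation of Alan Miller's code and does not use a Cholesky decomposition; it simply applies Newton-Raphson steps to the log-likelihood and solves the 2x2 system directly. The function name, the toy data and the stopping rule are all assumptions made for the example.

```python
import math

def fit_logistic_newton(xs, ys, max_iterations=25, tol=1e-10):
    """Fit p = 1 / (1 + exp(-(b0 + b1*x))) to 0/1 labels by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # weighted cross-products (information matrix)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)   # weight of this example in the current step
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular matrix, e.g. a constant attribute")
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data that is not perfectly separable, so the coefficients stay finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(xs, ys)
```

On perfectly separable data the coefficients would keep growing from one iteration to the next, which corresponds to the situation a fitter reports as `Infinity`.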

129 | |

130 | <hr> |

131 | |

132 | <P><FONT size=5>Examples</FONT></P></H2> |

133 | |

134 | <p>Since basic logistic regression allows only continuous |

135 | attributes and a dichotomous class, we show only a very basic example. More detailed use of logistic regression is shown in the logistic regression module.</p> |

136 | |

137 | <p>Let us load the data, induce a classifier and see how it performs on the first five examples.</p> |

138 | |

139 | <xmp class="code">>>> data = orange.ExampleTable("ionosphere") |

140 | >>> logistic = orange.LogRegLearner(data) |

141 | >>> |

142 | >>> for ex in data[:5]: |

143 | ...     print ex.getclass(), logistic(ex) |

144 | g g |

145 | b b |

146 | g g |

147 | b b |

148 | g g |

149 | </xmp> |

150 | |

151 | </body> |
