.. Source: orange/docs/widgets/rst/classify/nomogram.rst, revision
   11778:ecd4beec2099, checked in by Ales Erjavec.

.. _Nomogram:

Nomogram
========

.. image:: ../../../../Orange/OrangeWidgets/Classify/icons/Nomogram.svg

Nomogram

Signals
-------

Inputs:
   - Classifier (orange.Classifier)
        A classifier (either a naive Bayesian classifier or logistic regression).

Outputs:
   - None

Description
-----------

A nomogram is a simple and intuitive, yet useful and powerful representation of
linear models such as logistic regression and the naive Bayesian classifier. In
statistical terms, the nomogram plots the log odds ratio for each value of each
attribute. We describe its basic properties here, but we recommend reading the
paper in which we introduced nomograms for the naive Bayesian classifier,
`Nomograms for Visualization of Naive Bayesian Classifier`_. This description
uses the nomogram of a naive Bayesian classifier; nomograms for other types of
classifiers are similar, though they lack some functionality due to inherent
limitations of those models.

.. _Nomograms for Visualization of Naive Bayesian Classifier: http://www.ailab.si/blaz/papers/2004-PKDD.pdf
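
As a rough illustration of the quantity behind each point on the axes, the
sketch below computes the log odds ratio of one value of a discrete attribute
from class-conditional counts, taking it here as
``log(P(value | target) / P(value | other classes))``. The function name, the
Laplace smoothing parameter ``m`` and the counts are assumptions made for this
example, not the widget's own code.

.. code-block:: python

    import math

    def log_odds_ratio(n_value_target, n_target, n_value_other, n_other,
                       n_values, m=1.0):
        """Hypothetical helper: log(P(v | target) / P(v | other classes)).

        n_value_target -- target-class examples that have value v
        n_target       -- all target-class examples
        n_value_other  -- examples of the other classes that have value v
        n_other        -- all examples of the other classes
        n_values       -- number of distinct values of the attribute
        m              -- Laplace smoothing strength (an assumption of this sketch)
        """
        p_value_target = (n_value_target + m) / (n_target + m * n_values)
        p_value_other = (n_value_other + m) / (n_other + m * n_values)
        return math.log(p_value_target / p_value_other)

    # Hypothetical counts for a two-valued attribute (not the heart disease data)
    female = log_odds_ratio(20, 100, 50, 100, n_values=2)  # about -0.89
    male = log_odds_ratio(80, 100, 50, 100, n_values=2)    # about 0.46

A negative value argues against the target class and a positive one argues for
it, which is what places the two values on opposite sides of the zero line.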

The snapshot below shows a naive Bayesian nomogram for the heart disease data.
The first attribute, gender, has two values: the log odds ratio for females is
-1 (as read from the axis at the top), and for males it is around 0.4. For the
next attribute, the type of chest pain, asymptomatic pain votes for the target
class (having narrowed vessels), while the other three values have negative log
odds ratios of different magnitudes. Note that these are log odds ratios of a
naive Bayesian classifier, where, unlike in logistic regression, there is no
"base value" whose log odds ratio is fixed at zero.

.. image:: images/Nomogram.png

The third attribute, SBP at rest, is continuous. To get the log odds ratio for
a particular value of the attribute, find that value (say, 175) on the vertical
axis to the left of the attribute's curve. Follow an imaginary horizontal line
from that value to the curve, then go straight up from the point where the line
hits the curve and read the number on the top scale. An SBP of 175 has a log
odds ratio of approximately 1 (0.93, to be precise). The curve thus maps
attribute values on the left to log odds ratios at the top.
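
Reading a value off the curve amounts to evaluating a function of the
attribute value. A minimal sketch, assuming the curve is available as sorted
``(value, log_odds)`` samples and that values between samples are linearly
interpolated (both are assumptions; the sample points below are made up):

.. code-block:: python

    from bisect import bisect_left

    def log_odds_at(curve, x):
        """Hypothetical helper: log odds for value x from sorted (value, log_odds) samples.

        Values between two samples are linearly interpolated; values outside the
        sampled range are clamped to the nearest end of the curve.
        """
        xs = [value for value, _ in curve]
        if x <= xs[0]:
            return curve[0][1]
        if x >= xs[-1]:
            return curve[-1][1]
        i = bisect_left(xs, x)  # first sample with value >= x; here 0 < i < len(xs)
        (x0, y0), (x1, y1) = curve[i - 1], curve[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    sbp_curve = [(100, -0.5), (150, 0.2), (200, 1.5)]  # made-up samples
    log_odds_at(sbp_curve, 175)  # halfway between 0.2 and 1.5, i.e. about 0.85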

The nomogram is a great data exploration tool. The length of an attribute's
line corresponds to the span of its log odds ratios, which suggests the
attribute's importance. The nomogram also shows the impact of individual values:
being female is good and being male is bad (with respect to this disease, at
least); moreover, being female is much more beneficial than being male is
harmful. Gender is, however, a much less important attribute than the maximal
heart rate (HR), whose log odds ratios range from -3.5 to +2.2. SBP values from
125 to 140 are equivalent, that is, they have the same log odds ratio.
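
Ranking attributes by line length can be sketched as follows. The helper name
is hypothetical, and the numbers are only the approximate end points quoted
above (for the heart rate, just its lowest and highest log odds ratio):

.. code-block:: python

    def span(log_odds_by_value):
        """Hypothetical helper: spread between an attribute's lowest and highest log odds ratio."""
        ratios = list(log_odds_by_value.values())
        return max(ratios) - min(ratios)

    attributes = {
        "gender": {"female": -1.0, "male": 0.4},
        "maximal heart rate": {"lowest": -3.5, "highest": 2.2},
    }
    ranking = sorted(attributes, key=lambda name: span(attributes[name]), reverse=True)
    # ranking == ["maximal heart rate", "gender"]; spans are about 5.7 and 1.4

A long line says how far an attribute can move the prediction, not how often
its extreme values occur, which is why it only suggests importance.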

.. image:: images/Nomogram-predictions.png

Nomograms can also be used for making probabilistic predictions. The sum of log
odds ratios for a male with asymptomatic chest pain, a resting SBP of 100,
cholesterol of 200 and a maximal heart rate of 175 is
``0.38 + 1.16 + -0.51 + -0.4 = -0.58``, which corresponds to a probability of
32 % of having the disease. To use the widget for classification, check
:obj:`Show predictions`. The widget then shows blue dots on the attribute axes,
which can be dragged around, or left at the zero line if the corresponding
value is unknown. The axes at the bottom then map the sum of log odds ratios to
probabilities.
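
The arithmetic behind the probability axis can be sketched as below, assuming
the usual naive Bayesian decomposition: the log odds of the target class equal
the prior log odds plus the sum of the log odds ratios of the observed values,
and the logistic function turns log odds into a probability. The function name,
the explicit prior term and the numbers are assumptions for illustration; for
logistic regression, the intercept would play the role of the prior term.

.. code-block:: python

    import math

    def predicted_probability(prior_log_odds, contributions):
        """Hypothetical helper: probability of the target class.

        contributions -- log odds ratio of each attribute's value, or None if the
                         value is unknown (the dot stays on the zero line and adds 0)
        """
        total = prior_log_odds + sum(c for c in contributions if c is not None)
        return 1.0 / (1.0 + math.exp(-total))

    # Made-up values: two known attributes and one unknown
    predicted_probability(0.0, [0.5, -1.0, None])  # log odds -0.5, probability about 0.38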

Now for the settings. Option :obj:`Target Class` defines the target class:
attribute values to the right of the zero line are arguments for that class,
and values to the left are arguments against it.

Log odds ratios for the naive Bayesian classifier are computed so that all
values can have non-zero log odds ratios. The nomogram is drawn as shown above
if the alignment is set to :obj:`Align by zero influence`. If it is set to
:obj:`Align left`, all attribute axes are left-aligned. Logistic regression
compares the base value with the other attribute values, so the base value
always has a log odds ratio of 0, and the attribute axes are always aligned to
the left.
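
The difference between the two models can be sketched with a dummy-coded
logistic regression: the base value has no coefficient of its own, so its log
odds ratio is 0 and every other value's log odds ratio is its coefficient. The
second helper shows one possible reading of :obj:`Align left`, shifting an
attribute's values so that its smallest log odds ratio sits at a common left
edge. Both helpers, the value names and the coefficients are hypothetical.

.. code-block:: python

    def logistic_log_odds(coefficients, base_value):
        """Hypothetical helper: log odds ratios of a dummy-coded discrete attribute."""
        ratios = {base_value: 0.0}   # the base value is the reference, fixed at 0
        ratios.update(coefficients)  # every other value contributes its coefficient
        return ratios

    def left_aligned(log_odds_by_value):
        """Hypothetical helper: offsets of each value from the attribute's smallest log odds ratio."""
        lowest = min(log_odds_by_value.values())
        return {value: ratio - lowest for value, ratio in log_odds_by_value.items()}

    ratios = logistic_log_odds({"b": -0.7, "c": -1.2}, base_value="a")
    # {"a": 0.0, "b": -0.7, "c": -1.2}
    offsets = left_aligned(ratios)
    # offsets of about 1.2 for "a", 0.5 for "b" and 0.0 for "c"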

The influence of a continuous attribute can be shown as a two-dimensional
curve (:obj:`2D curve`) or with the values projected onto a single line
(:obj:`1D projection`). The latter makes the nomogram smaller, but can be
unreadable if the log odds are not monotonic. In our sample, the projection
would look fine for the heart rate and SBP, but not for cholesterol.
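
Whether a projection stays readable can be checked mechanically: when the log
odds never change direction along the attribute's range, the values stay in
order along the projected line; when the curve turns back, values from
different parts of the range land on top of each other. A sketch with a
hypothetical helper and made-up curves:

.. code-block:: python

    def is_monotonic(curve):
        """Hypothetical helper: True if log odds never change direction as the value grows."""
        ratios = [ratio for _, ratio in sorted(curve)]
        steps = [b - a for a, b in zip(ratios, ratios[1:])]
        return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)

    falling = [(80, 2.0), (120, 0.5), (160, -1.0), (200, -3.0)]          # made up
    non_monotonic = [(100, 1.0), (200, -0.5), (300, 0.6), (600, -0.8)]   # made up
    is_monotonic(falling)        # True
    is_monotonic(non_monotonic)  # False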

The widget can show either log odds ratios (:obj:`Log odds ratios`), as above,
or "points" (:obj:`Point scale`). In the latter case, the log odds ratios are
simply scaled to the interval from -100 to 100 for easier manual calculation,
for instance if one wishes to print out the nomogram and use it on paper.
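
A minimal sketch of such a scaling, assuming (this is not stated above) that
the largest absolute log odds ratio in the whole nomogram is mapped to 100 and
that zero stays at zero; the helper name is hypothetical:

.. code-block:: python

    def to_points(log_odds_ratios):
        """Hypothetical helper: scale log odds ratios linearly into [-100, 100]."""
        largest = max(abs(r) for r in log_odds_ratios)
        if largest == 0:
            return [0.0 for _ in log_odds_ratios]
        return [100.0 * r / largest for r in log_odds_ratios]

    to_points([-3.5, 2.2, 0.4, -1.0])  # -3.5 becomes -100.0, the rest keep their proportions

Because the scaling is linear with a positive factor, sums of points order
cases exactly as sums of log odds ratios do, so a printed nomogram still ranks
patients the same way.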

:obj:`Show prediction` puts a blue dot on each attribute, which can be dragged
to the corresponding value. The widget sums the log odds ratios and shows the
probability of the target class on the bottom axes. :obj:`Confidence intervals`
adds confidence intervals for the individual log odds ratios and for the
predicted probability. :obj:`Show histogram` adds, for each value of a discrete
attribute, a bar whose height represents the relative number of examples with
that value; for continuous attributes, the curve is thickened where the number
of examples is higher.
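
To see why values with fewer examples get wider intervals, here is a textbook
delta-method approximation of a 95 % confidence interval for
``log(P(v | target) / P(v | other classes))``, the log ratio of two independent
proportions. This is a standard statistical technique chosen for illustration,
not necessarily the one the widget uses; the helper and the counts are
hypothetical.

.. code-block:: python

    import math

    def log_ratio_interval(a, n1, c, n2, z=1.96):
        """Hypothetical helper: approximate CI for log((a / n1) / (c / n2)).

        a  -- target-class examples with value v, out of n1 target-class examples
        c  -- other-class examples with value v, out of n2 other-class examples
        a and c must be positive; z=1.96 gives roughly a 95 % interval.
        """
        estimate = math.log((a / n1) / (c / n2))
        standard_error = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
        return estimate - z * standard_error, estimate + z * standard_error

    # Same proportions, ten times more examples: same estimate, narrower interval
    low_small, high_small = log_ratio_interval(10, 100, 20, 100)
    low_large, high_large = log_ratio_interval(100, 1000, 200, 1000)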

.. image:: images/Nomogram-histograms.png

For instance, there are about twice as many males as females, and the
confidence interval for the log odds ratio of males is correspondingly
narrower. The histograms and confidence intervals also explain the strange
finding that an extreme cholesterol level (600) is healthy, healthier than 200,
while a really low level (50) is almost as bad as levels around 300. The vast
majority of patients have cholesterol between 200 and 300; what happens outside
this interval may be a random effect, which is also suggested by the very wide
confidence intervals.

Examples
--------

To draw a nomogram, we need to get some data (e.g. from the :ref:`File`
widget), induce a classifier from it and give the classifier to the Nomogram
widget.

.. image:: images/NaiveBayes-SchemaClassifier.png
   :alt: Naive Bayesian Classifier - Schema with a Nomogram
