source: orange/docs/widgets/rst/classify/nomogram.rst @ 11778:ecd4beec2099

Revision 11778:ecd4beec2099, 5.9 KB checked in by Ales Erjavec <ales.erjavec@…>, 5 months ago (diff)

Use new SVG icons in the widget documentation.

.. _Nomogram:

Nomogram
========

.. image:: ../../../../Orange/OrangeWidgets/Classify/icons/Nomogram.svg

Nomogram

Signals
-------

Inputs:
   - Classifier (orange.Classifier)
      A classifier (either naive Bayesian classifier or logistic regression)


Outputs:
   - None


Description
-----------

A nomogram is a simple and intuitive, yet useful and powerful representation of
linear models, such as logistic regression and the naive Bayesian classifier. In
statistical terms, the nomogram plots log odds ratios for each value of each
attribute. We shall describe its basic properties here, though we recommend
reading the paper in which we introduced nomograms for the naive Bayesian
classifier, `Nomograms for Visualization of Naive Bayesian Classifier`_. This
description will show the nomogram for a naive Bayesian classifier; nomograms
for other types of classifiers are similar, though they lack some functionality
due to inherent limitations of these models.

.. _Nomograms for Visualization of Naive Bayesian Classifier: http://www.ailab.si/blaz/papers/2004-PKDD.pdf

The snapshot below shows a naive Bayesian nomogram for the heart disease data.
The first attribute, gender, has two values, where the log odds ratio for
females is -1 (as read from the axis on the top) and for males it is around
0.4. For the next attribute, the type of chest pain, asymptomatic pain
votes for the target class (having narrowed vessels), and the other three
have negative log odds ratios of different magnitudes. Note that these are
log odds ratios for the naive Bayesian classifier, where, unlike in logistic
regression, there is no "base value" which would have a log odds ratio of zero.
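
To make the notion of a log odds ratio for a single attribute value concrete,
here is a minimal sketch. It assumes the naive Bayesian log odds ratio of a
value is the logarithm of the ratio between the value's relative frequency in
the target class and in the remaining examples. The function name and the
counts are hypothetical, and the widget may estimate the probabilities
differently (for instance with smoothing).

```python
import math


def log_odds_ratio(n_value_target, n_target, n_value_other, n_other):
    """Return ln(P(value | target) / P(value | other)) from raw counts."""
    p_value_given_target = n_value_target / n_target
    p_value_given_other = n_value_other / n_other
    return math.log(p_value_given_target / p_value_given_other)


# Hypothetical counts, not taken from the heart disease data.
female = log_odds_ratio(25, 140, 70, 160)
male = log_odds_ratio(115, 140, 90, 160)

print(round(female, 2), round(male, 2))
```

With these counts both values get a non-zero log odds ratio, one negative and
one positive, so neither of them acts as a base value.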

.. image:: images/Nomogram.png

The third attribute, SBP at rest, is continuous. To get the log odds ratio
for a particular value of the attribute, find the value (say 175) on the
vertical axis to the left of the curve corresponding to the attribute. Then
imagine a horizontal line from that value to the curve; at the point where
it hits the curve, turn upwards and read the number on the top scale. An SBP
of 175 has a log odds ratio of approximately 1 (0.93, to be precise). The
curve thus shows a mapping from attribute values on the left to log odds at
the top.

The nomogram is a great data exploration tool. Lengths of the lines correspond
to spans of log odds ratios, suggesting the importance of attributes. It also
shows the impact of individual values: being female is good and being male is
bad (w.r.t. this disease, at least); besides, being female is much more
beneficial than being male is harmful. Gender is, however, a much less
important attribute than the maximal heart rate (HR), with log odds from
-3.5 to +2.2. SBP values from 125 to 140 are equivalent, that is, they have
the same log odds ratios...
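
The comparison of line lengths can be sketched in a few lines of Python. The
ranges below are the approximate values quoted above for gender and maximal
heart rate; the dictionary and variable names are only illustrative.

```python
# Approximate (lowest, highest) log odds ratios as read from the nomogram.
log_odds_ranges = {
    "gender": (-1.0, 0.4),
    "max heart rate": (-3.5, 2.2),
}

# The span of an attribute is the length of its line in the nomogram.
spans = {name: high - low for name, (low, high) in log_odds_ranges.items()}

# Attributes with longer lines come first.
ranked = sorted(spans, key=spans.get, reverse=True)
print(ranked)
```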

.. image:: images/Nomogram-predictions.png

Nomograms can also be used for making probabilistic predictions. A sum
of log odds ratios for a male with asymptomatic chest pain, a rest
SBP of 100, cholesterol 200 and maximal heart rate 175 is
`0.38 + 1.16 + -0.51 + -0.4 = -0.58`, which corresponds to a probability
of 32 % for having the disease. To use the widget for classification,
check :obj:`Show predictions`. The widget then shows blue dots on the
attribute axes, which can be dragged around, or left at the zero line
if the corresponding value is unknown. The axes at the bottom then show
a mapping from the sum of log odds to probabilities.
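
The mapping from a sum of log odds to a probability can be sketched with the
logistic function. This is an illustration under assumptions, not the widget's
code: the function name, the ``prior_log_odds`` parameter and the example
contributions are hypothetical, and how the widget accounts for the class
prior is not described here.

```python
import math


def probability_from_log_odds(contributions, prior_log_odds=0.0):
    """Map a sum of log odds contributions to a probability.

    An attribute with an unknown value is simply left out of
    ``contributions``, which is the same as adding zero for it.
    """
    total = prior_log_odds + sum(contributions)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical contributions of five attribute values.
p = probability_from_log_odds([0.4, 1.2, -0.5, -0.4, -1.2])
print(round(p, 3))
```

With no contributions and a zero prior the sum is zero, which this mapping
turns into a probability of 0.5.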

Now for the settings. Option :obj:`Target Class` defines the target class.
Attribute values to the right of the zero line represent arguments for
that class and values to the left are arguments against it.


Log odds for the naive Bayesian classifier are computed so that all values
can have non-zero log odds. The nomogram is drawn as shown above if
alignment is set to :obj:`Align by zero influence`. If set to
:obj:`Align left`, all attribute axes are left-aligned. Logistic regression
compares the base value with other attribute values, so the base value
always has a log odds ratio of 0, and the attribute axes are always aligned
to the left.

The influence of a continuous attribute can be shown as a two-dimensional
curve (:obj:`2D curve`) or with the values projected onto a single line
(:obj:`1D projection`). The latter makes the nomogram smaller, but can be
unreadable if the log odds are not monotonic. In our sample, the
nomogram would look OK for the heart rate and SBP, but not for cholesterol.

The widget can show either log odds ratios (:obj:`Log odds ratios`),
as above, or "points" (:obj:`Point scale`). In the latter case, log odds
ratios are simply scaled to the interval -100 to 100 for easier (manual)
calculation, for instance, if one wishes to print out the nomogram
and use it on paper.
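
One way such a point scale could be obtained is shown below. This is only a
sketch under the assumption that all log odds ratios are divided by the
largest absolute value and multiplied by 100; the exact scaling used by the
widget is not specified here, and the function name is hypothetical.

```python
def to_points(log_odds_ratios, max_points=100.0):
    """Linearly rescale log odds ratios so the largest magnitude maps to max_points."""
    largest = max(abs(value) for value in log_odds_ratios)
    if largest == 0:
        # All values are neutral, so every value gets zero points.
        return [0.0 for _ in log_odds_ratios]
    return [max_points * value / largest for value in log_odds_ratios]


points = to_points([-3.5, -1.0, 0.4, 2.2])
print([round(value, 1) for value in points])
```

Because the scaling is linear, points can be added up by hand in the same way
as log odds ratios.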

:obj:`Show prediction` puts a blue dot at each attribute which we
can drag to the corresponding value. The widget sums the log odds
ratios and shows the probability of the target class on the bottom
axes. :obj:`Confidence intervals` adds confidence intervals for the
individual log odds ratios and for the probability prediction.
:obj:`Show histogram` adds a bar whose height represents the relative number
of examples for each value of a discrete attribute, while for continuous
attributes the curve is thickened where the number of examples is higher.

.. image:: images/Nomogram-histograms.png

For instance, for gender the number of males is about twice as large as
the number of females, and the confidence interval for the log odds ratio is
correspondingly narrower. The histograms and confidence intervals also
explain the strange finding that an extreme cholesterol level (600) is healthy,
healthier than 200, while really low cholesterol (50) is almost as bad as
levels around 300. The vast majority of patients have cholesterol between
200 and 300; what happens outside this interval may be a random effect,
which is also suggested by the very wide confidence intervals.


Examples
--------

To draw a nomogram, we need to get some data (e.g. from the
:ref:`File` widget), induce a classifier and give it to the nomogram.

.. image:: images/NaiveBayes-SchemaClassifier.png
   :alt: Naive Bayesian Classifier - Schema with a Nomogram