source: orange/orange/doc/datasets/lung-cancer.htm @ 1760:9d4bb141fb0e

Revision 1760:9d4bb141fb0e, 2.2 KB checked in by blaz <blaz.zupan@…>, 9 years ago (diff)

data info file

<html>
<head>
<title>Lung Cancer Data Base</title>
</head>
<body>
<h1>Info on Lung Cancer Data Base</h1>
<pre>
1. Title: Lung Cancer Data

2. Source Information:
    - Data was published in:
      Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small
      Number of Samples and Design Method of Classifier on the Plane",
      Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
    - Donor: Stefan Aeberhard, stefan@coral.cs.jcu.edu.au
    - Date: May, 1992

3. Past Usage:
    - Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small
      Number of Samples and Design Method of Classifier on the Plane",
      Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
    - Aeberhard, S., Coomans, D., De Vel, O. "Comparisons of
      Classification Methods in High Dimensional Settings",
      submitted to Technometrics.
    - Aeberhard, S., Coomans, D., De Vel, O. "The Dangers of
      Bias in High Dimensional Settings", submitted to
      Pattern Recognition.

4. Relevant Information:
    - This data was used by Hong and Yang to illustrate the
      power of the optimal discriminant plane even in ill-posed
      settings. Applying the KNN method in the resulting plane
      gave 77% accuracy. However, these results are strongly
      biased (see Aeberhard's second reference above, or email
      stefan@coral.cs.jcu.edu.au). Results obtained by
      Aeberhard et al. are:
      RDA: 62.5%, KNN: 53.1%, Opt. Disc. Plane: 59.4%

      The data describes 3 types of pathological lung cancers.
      The authors give no information on the individual
      variables nor on where the data was originally used.

       -  In the original data 4 values for the fifth attribute were -1.
          These values have been changed to ? (unknown). (*)
       -  In the original data 1 value for the 39th attribute was 4. This
          value has been changed to ? (unknown). (*)

5. Number of Instances: 32

6. Number of Attributes: 57 (1 class attribute, 56 predictive)

7. Attribute Information:

    Attribute 1 is the class label.

    - All predictive attributes are nominal, taking on integer
      values 0-3.

8. Missing Attribute Values: Attributes 5 and 39 (*)
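
The attribute layout and the ? marker described above can be read
with a few lines of Python. This is a minimal sketch, not the
loader used by any particular tool: the comma separator, the names
parse_record and missing_attributes, and the example record are all
assumptions made for illustration. Attribute numbers count the class
label as attribute 1, as this description does.

```python
NUM_ATTRIBUTES = 57  # 1 class attribute + 56 predictive (from the description)


def parse_record(line, sep=","):
    """Split one record into (class_label, features).

    Assumes the class label comes first and fields are separated by `sep`;
    the separator is an assumption, not stated in the description.
    """
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != NUM_ATTRIBUTES:
        raise ValueError(f"expected {NUM_ATTRIBUTES} fields, got {len(fields)}")
    label = int(fields[0])
    # "?" marks an unknown value; keep it as None rather than inventing a number.
    features = [None if field == "?" else int(field) for field in fields[1:]]
    return label, features


def missing_attributes(features):
    """Return 1-based attribute numbers of unknown values, counting the class
    label as attribute 1 (so the first predictive attribute is number 2)."""
    return [i + 2 for i, value in enumerate(features) if value is None]


# Made-up record for illustration only: class 1, then 56 predictive values,
# with the fifth attribute unknown.
example = ",".join(["1", "0", "3", "2", "?"] + ["1"] * 52)
label, features = parse_record(example)
print(label, len(features), missing_attributes(features))  # prints: 1 56 [5]
```

Keeping unknown values as None leaves the choice of how to handle
them (dropping, imputing, or treating ? as its own category) to the
caller.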

9. Class Distribution:
    - 3 classes:
        1.) 9 observations
        2.) 13 observations
        3.) 10 observations
</pre>
</body>
</html>