source: orange/orange/doc/datasets/iris.htm @ 1760:9d4bb141fb0e

Revision 1760:9d4bb141fb0e, 2.7 KB checked in by blaz <blaz.zupan@…>, 9 years ago (diff)

data info file

<html>
<head>
<title>Iris Plants Database</title>
</head>
<body>
<h1>Info on Iris Plants Database</h1>
<pre>
1. Title: Iris Plants Database

2. Sources:
     (a) Creator: R.A. Fisher
     (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
     (c) Date: July, 1988

3. Past Usage:
   - Publications: too many to list.  Here are a few.
   1. Fisher, R.A. "The use of multiple measurements in taxonomic problems"
      Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
      to Mathematical Statistics" (John Wiley, NY, 1950).
   2. Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
      (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
      Structure and Classification Rule for Recognition in Partially Exposed
      Environments".  IEEE Transactions on Pattern Analysis and Machine
      Intelligence, Vol. PAMI-2, No. 1, 67-71.
      -- Results:
         -- very low misclassification rates (0% for the setosa class)
   4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE
      Transactions on Information Theory, May 1972, 431-433.
      -- Results:
         -- very low misclassification rates again
   5. See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al.'s AUTOCLASS II
      conceptual clustering system finds 3 classes in the data.

4. Relevant Information:
   --- This is perhaps the best-known database to be found in the pattern
       recognition literature.  Fisher's paper is a classic in the field
       and is referenced frequently to this day.  (See Duda & Hart, for
       example.)  The data set contains 3 classes of 50 instances each,
       where each class refers to a type of iris plant.  One class is
       linearly separable from the other 2; the latter are NOT linearly
       separable from each other.
   --- Predicted attribute: class of iris plant.
   --- This is an exceedingly simple domain.

5. Number of Instances: 150 (50 in each of three classes)

6. Number of Attributes: 4 numeric, predictive attributes and the class

7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class:
      -- Iris Setosa
      -- Iris Versicolor
      -- Iris Virginica

8. Missing Attribute Values: None

Summary Statistics (measurements in cm):
                  Min  Max   Mean    SD   Class Correlation
   sepal length:  4.3  7.9   5.84  0.83    0.7826
    sepal width:  2.0  4.4   3.05  0.43   -0.4194
   petal length:  1.0  6.9   3.76  1.76    0.9490  (high!)
    petal width:  0.1  2.5   1.20  0.76    0.9565  (high!)

9. Class Distribution: 33.3% for each of 3 classes.
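
Illustrative sketch (not part of the original data description):
   --- The columns of the Summary Statistics table above (Min, Max, Mean,
       SD, Class Correlation) could be computed for a single attribute
       roughly as follows.  The function names, the integer coding of the
       three classes, and the choice of the sample (n - 1) standard
       deviation are all assumptions; this file does not say how its
       figures were derived.  The toy values are made up for illustration
       and are NOT iris measurements.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summarize(values, class_codes):
    """Min, max, mean, SD and class correlation for one numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample SD (n - 1); an assumption
        "class_corr": pearson(values, class_codes),
    }


# Toy values for illustration only; NOT taken from the iris data.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Hypothetical integer coding of three classes, two instances each.
toy_codes = [0, 0, 1, 1, 2, 2]

stats = summarize(toy_values, toy_codes)
print(stats)
```

   --- To apply this to the real data, one would pass a full column of
       150 measurements together with the matching class codes.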
</pre>
</body>
</html>