<html>
<head>
<title>Hayes-Roth & Hayes-Roth Data Base</title>
</head>
<body>
<h1>Info on Hayes-Roth & Hayes-Roth Data Base</h1>
<pre>
1. Title: Hayes-Roth & Hayes-Roth (1977) Database

2. Source Information:
   (a) Creators: Barbara and Frederick Hayes-Roth
   (b) Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
   (c) Date: March, 1989

3. Past Usage:
    1. Hayes-Roth, B., & Hayes-Roth, F. (1977).  Concept learning and the
       recognition and classification of exemplars.  Journal of Verbal Learning
       and Verbal Behavior, 16, 321-338.
       -- Results:
          -- Human subjects' classification and recognition performance:
           1. decreases with distance from the prototype,
           2. is better on unseen prototypes than on old instances, and
           3. improves with presentation frequency during learning.
    2. Anderson, J.R., & Kline, P.J. (1979).  A learning system and its
       psychological implications.  In Proceedings of the Sixth International
       Joint Conference on Artificial Intelligence (pp. 16-21).  Tokyo, Japan:
       Morgan Kaufmann.
       -- Partitioned the results into 4 classes:
        1. prototypes
        2. near-prototypes with high presentation frequency during learning
        3. near-prototypes with low presentation frequency during learning
        4. instances that are far from prototypes
       -- Described evidence that ACT's classification confidence and
          recognition behaviors closely simulated human subjects' behaviors.
    3. Aha, D.W. (1989).  Incremental learning of independent, overlapping, and
       graded concept descriptions with an instance-based process framework.
       Manuscript submitted for publication.
       -- Used same partition as Anderson & Kline
       -- Described evidence that Bloom's classification confidence behavior
          is similar to the human subjects' behavior.  Bloom fitted the data
          more closely than did ACT.

4. Relevant Information:
     This database contains 5 numeric-valued attributes.  Only 3 of them
     (the last 3) are used during testing.  Furthermore, only 2 of the
     3 concepts are "used" during testing (i.e., those with the prototypes
     000 and 111).  I've mapped all values to their zero-indexing equivalents.

     Some instances could be placed in either category 0 or 1.  I've followed
     the authors' suggestion, placing them in each category with equal
     probability.

     I've replaced the actual values of the attributes (e.g., hobby has values
     chess, sports, and stamps) with numeric values.  I think this is what
     the authors did when testing the categorization models described
     in the paper.  I find this unfair.  While the subjects were able to bring
     background knowledge to bear on the attribute values and their
     relationships, the algorithms were provided with no such knowledge.  I'm
     uncertain whether the 2 distractor attributes (name and hobby) are
     presented to the authors' algorithms during testing.  However, it is clear
     that only the age, educational status, and marital status attributes are
     given during the human subjects' transfer tests.

5. Number of Instances: 132 training instances, 28 test instances

6. Number of Attributes: 5 plus the class membership attribute.  3 concepts.

7. Attribute Information:
      -- 1. name: distinct for each instance and represented numerically
      -- 2. hobby: nominal values ranging between 1 and 3
      -- 3. age: nominal values ranging between 1 and 4
      -- 4. educational level: nominal values ranging between 1 and 4
      -- 5. marital status: nominal values ranging between 1 and 4
      -- 6. class: nominal value between 1 and 3

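A minimal sketch of how one record with the six fields listed above might
be represented in Python.  It assumes, hypothetically, that each instance is
stored as one comma-separated line with the fields in the order given above;
the storage format is not stated in this description.  Instance and
parse_instance are illustrative names, and the sample record is made up.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """One record, with fields in the order of the attribute list above."""
    name: int
    hobby: int
    age: int
    education: int
    marital_status: int
    label: int  # the class attribute (1, 2, or 3)


def parse_instance(line: str) -> Instance:
    """Parse one comma-separated record (assumed layout, not from the source)."""
    fields = [int(field) for field in line.strip().split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return Instance(*fields)


# Made-up record for illustration; it is not taken from the data set.
example = parse_instance("7,2,1,1,3,1")
print(example.age, example.education, example.marital_status)
```

Named fields keep the two distractor attributes (name and hobby) visibly
separate from the three diagnostic ones.
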
9. Missing Attribute Values: none

10. Class Distribution: see below

11. Detailed description of the experiment:
  1. 3 categories (1, 2, and neither -- which I call 3)
     -- some of the instances could be classified in either class 1 or 2, and
        they have been evenly distributed between the two classes
  2. 5 Attributes
     -- A. name (a randomly generated number between 1 and 132)
     -- B. hobby (a randomly generated number between 1 and 3)
     -- C. age (a number between 1 and 4)
     -- D. education level (a number between 1 and 4)
     -- E. marital status (a number between 1 and 4)
  3. Classification:
     -- only attributes C-E are diagnostic; values for A and B are ignored
     -- Class Neither: if a 4 occurs for any attribute C-E
     -- Class 1: Otherwise, if (# of 1's)>(# of 2's) for attributes C-E
     -- Class 2: Otherwise, if (# of 2's)>(# of 1's) for attributes C-E
     -- Either 1 or 2: Otherwise, if (# of 2's)=(# of 1's) for attributes C-E
  4. Prototypes:
     -- Class 1: 111
     -- Class 2: 222
     -- Class Either: 333
     -- Class Neither: 444
  5. Number of training instances: 132
     -- Each instance presented 0, 1, or 10 times
     -- None of the prototypes seen during training
     -- 3 instances from each of categories 1, 2, and Either are repeated
        10 times each
     -- 3 additional instances from the Either category are shown during
        learning
  6. Number of test instances: 28
     -- All 9 class 1
     -- All 9 class 2
     -- All 6 class Either
     -- All 4 prototypes
     --------------------
     --    28 total

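The classification rule above can be sketched in Python as follows.  This is
an illustration under stated assumptions, not code from the original
experiments: classify and count_non_prototype_instances are hypothetical
names, and the class labels are returned as strings.  Enumerating every
combination of values 1-3 for attributes C-E and leaving out the prototypes
gives counts that agree with the test-set breakdown above (9 + 9 + 6, plus
the 4 prototypes, for 28 in total).

```python
from collections import Counter
from itertools import product

# Prototypes as listed in the description (values for attributes C, D, E).
PROTOTYPES = {(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)}


def classify(age: int, education: int, marital_status: int) -> str:
    """Apply the rule from the description to attributes C-E."""
    values = (age, education, marital_status)
    if 4 in values:
        return "neither"
    ones = values.count(1)
    twos = values.count(2)
    if ones > twos:
        return "1"
    if twos > ones:
        return "2"
    return "either"


def count_non_prototype_instances() -> Counter:
    """Count the non-prototype instances per class over values 1-3."""
    counts = Counter()
    for values in product((1, 2, 3), repeat=3):
        if values in PROTOTYPES:
            continue
        counts[classify(*values)] += 1
    return counts


counts = count_non_prototype_instances()
print(dict(counts))
print(sum(counts.values()) + len(PROTOTYPES))
```

Values for name and hobby are deliberately not passed to classify, since
only attributes C-E are diagnostic.
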
Observations of interest:
  1. Relative classification confidence of
     -- prototypes for classes 1 and 2 (2 instances)
        (Anderson calls these Class 1 instances)
     -- instances of class 1 with frequency 10 during training and
        instances of class 2 with frequency 10 during training that
        are 1 value away from their respective prototypes (6 instances)
        (Anderson calls these Class 2 instances)
     -- instances of class 1 with frequency 1 during training and
        instances of class 2 with frequency 1 during training that
        are 1 value away from their respective prototypes (6 instances)
        (Anderson calls these Class 3 instances)
     -- instances of class 1 with frequency 1 during training and
        instances of class 2 with frequency 1 during training that
        are 2 values away from their respective prototypes (6 instances)
        (Anderson calls these Class 4 instances)
  2. Relative recognition of the same groups of instances

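The groups above are defined by how many values separate an instance from
its class prototype.  A minimal sketch of that distance, assuming it is
simply the number of attributes C-E whose value differs from the prototype
(one reading of "1 value away"; the original papers may define it
differently).  distance_from_prototype is a hypothetical helper name.

```python
# Prototypes for classes 1 and 2 (values for attributes C, D, E).
CLASS_PROTOTYPES = {1: (1, 1, 1), 2: (2, 2, 2)}


def distance_from_prototype(values: tuple, label: int) -> int:
    """Count the attributes C-E that differ from the class prototype."""
    prototype = CLASS_PROTOTYPES[label]
    return sum(1 for value, proto in zip(values, prototype) if value != proto)


print(distance_from_prototype((1, 1, 1), 1))  # the prototype itself
print(distance_from_prototype((1, 1, 3), 1))  # one value changed
print(distance_from_prototype((2, 3, 3), 2))  # two values changed
```
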
Some expected results:
   Both frequency and distance from prototype will affect the classification
   accuracy of instances.  The greater the frequency, the higher the
   classification confidence.  The closer to the prototype, the higher the
   classification confidence.
</pre>
</body>
</html>