source: orange/docs/tutorial/rst/code/disc1.py @ 9744:a0ad3a6cd405

Revision 9744:a0ad3a6cd405, 1.0 KB checked in by Miha Stajdohar <miha.stajdohar@…>, 2 years ago

Changed data set path.

# Description: Entropy-based discretization compared to discretization with equal-frequency
#              of instances in intervals
# Category:    preprocessing
# Uses:        wdbc.tab
# Classes:     Preprocessor_discretize, EntropyDiscretization
# Referenced:  o_categorization.htm

import orange

def show_values(data, heading):
  # Print each feature as "name/number of values: comma-separated values".
  # Note: heading is accepted but not printed.
  for a in data.domain.attributes:
    print "%s/%d: %s" % (a.name, len(a.values), ', '.join(a.values))

data = orange.ExampleTable("wdbc.tab")
print '%d features in original data set, discretized:' % len(data.domain.attributes)
# Discretize every continuous feature with entropy-based discretization.
data_ent = orange.Preprocessor_discretize(data, method=orange.EntropyDiscretization())
show_values(data_ent, "Entropy based discretization")

# A feature with a single value received no cut points and carries no information.
print '\nFeatures with sole value after discretization:'
for a in data_ent.domain.attributes:
  if len(a.values) == 1:
    print a.name

# orngDisc.entropyDiscretization also drops the features discretized to a constant.
import orngDisc
data_ent2 = orngDisc.entropyDiscretization(data)
print '%d features after removing features discretized to a constant value' % len(data_ent2.domain.attributes)
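
The script above relies on Orange for the discretization itself. To illustrate why entropy-based discretization can leave a feature with a sole value, while equal-frequency discretization always produces intervals, here is a minimal pure-Python sketch. It assumes that the entropy-based method follows the Fayyad-Irani MDL stopping rule; this is an assumption about the technique, not a copy of Orange's implementation. The names entropy, equal_frequency_cuts and entropy_cuts are hypothetical helpers and are not part of the Orange API.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = float(len(labels))
    return -sum(c / n * log(c / n, 2) for c in Counter(labels).values())


def equal_frequency_cuts(values, n_intervals):
    """Cut points that put roughly the same number of instances in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, n_intervals):
        pos = i * n // n_intervals
        if pos == 0:
            continue  # fewer instances than intervals
        # Place the cut halfway between neighbouring values; skip ties.
        if ordered[pos - 1] < ordered[pos]:
            cut = (ordered[pos - 1] + ordered[pos]) / 2.0
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def entropy_cuts(pairs):
    """Recursive entropy-based cuts with the Fayyad-Irani MDL stopping rule.

    pairs is a list of (value, label) tuples sorted by value.
    Returns [] when no cut is accepted, i.e. the feature becomes constant.
    """
    labels = [label for _, label in pairs]
    n = len(pairs)
    total = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left, right = labels[:i], labels[i:]
        split = (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or split < best[0]:
            best = (split, i)
    if best is None:
        return []
    split, i = best
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log(3 ** k - 2, 2) - (k * total - k1 * entropy(left) - k2 * entropy(right))
    gain = total - split
    # MDL criterion: reject the cut unless the gain pays for encoding it.
    if gain <= (log(n - 1, 2) + delta) / n:
        return []
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return entropy_cuts(pairs[:i]) + [cut] + entropy_cuts(pairs[i:])


# Toy data (made up for illustration): values 1..20.
separable = [(v, 'a' if v <= 10 else 'b') for v in range(1, 21)]
alternating = [(v, 'ab'[v % 2]) for v in range(1, 21)]

print(entropy_cuts(separable))    # [10.5]
print(entropy_cuts(alternating))  # [] -> feature would have a sole value
print(equal_frequency_cuts([v for v, _ in alternating], 4))  # [5.5, 10.5, 15.5]
```

On the toy data, the class-separating feature gets one cut, while the feature whose values alternate between classes gets none and would therefore be reported as having a sole value. Equal-frequency discretization ignores the class and still splits it into four intervals.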