Orange Forum • View topic - Behaviour of EntropyDiscretization()?

Post by RichardMR » Tue May 05, 2009 19:21

Dear Orange,

I would be grateful if anyone would clarify the functioning of EntropyDiscretization() for me.

I've read: ... zation.htm

I've also had a look at the source code: ... retize.cpp (Although, I'm much more comfortable programming in Python; I think I can get the gist of what most functions are doing here, but wasn't able to answer my own question.)

So, I understand that this method of discretization works by selecting candidate split points from the N - 1 midpoints of the N-member data set, ordered by the attribute to be discretized. A split point is chosen if the information gain > MDL (with probabilities calculated from simple frequency counts).
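To be explicit about what I mean by "information gain > MDL": this is the acceptance test as I understand it from Fayyad and Irani's method. I am assuming, not asserting, that Orange implements exactly this form. Here S is the set of N examples being split at candidate point T on attribute A, S_1 and S_2 are the two resulting subsets, and k, k_1, k_2 are the numbers of classes present in S, S_1 and S_2.

```latex
\mathrm{Gain}(A,T;S) = \mathrm{Ent}(S) - \frac{|S_1|}{N}\,\mathrm{Ent}(S_1) - \frac{|S_2|}{N}\,\mathrm{Ent}(S_2)

\text{accept } T \iff \mathrm{Gain}(A,T;S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(A,T;S)}{N}

\Delta(A,T;S) = \log_2\!\left(3^{k}-2\right) - \bigl[\,k\,\mathrm{Ent}(S) - k_1\,\mathrm{Ent}(S_1) - k_2\,\mathrm{Ent}(S_2)\,\bigr]
```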

If, however, forceAttribute=True, I understood that the split point with the highest information gain would be selected even if the gain < MDL, if and only if the normal procedure would lead to no split points being selected.
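To make the behaviour I expected concrete, here is a minimal sketch in plain Python. It is not Orange's code: the function names (entropy, best_cut, mdl_threshold, discretize) and the force parameter are my own hypothetical stand-ins, and the threshold is the standard Fayyad-Irani MDL criterion, which I am only assuming matches what Orange computes.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Class entropy in bits, with probabilities taken from frequency counts."""
    n = float(len(labels))
    return -sum((c / n) * log(c / n, 2) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (gain, i) for the best boundary in value-sorted (value, label) pairs.

    A cut at index i separates pairs[:i] from pairs[i:]. Only boundaries
    between distinct values are candidates; returns None if there are none.
    """
    labels = [lab for _, lab in pairs]
    n = float(len(pairs))
    base = entropy(labels)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        gain = (base - (i / n) * entropy(labels[:i])
                - ((n - i) / n) * entropy(labels[i:]))
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdl_threshold(pairs, i):
    """Fayyad-Irani MDL threshold that the gain of a cut at index i must exceed."""
    labels = [lab for _, lab in pairs]
    left, right = labels[:i], labels[i:]
    n = float(len(labels))
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log(3 ** k - 2, 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return (log(n - 1, 2) + delta) / n


def discretize(pairs, force=False):
    """Return sorted cut points for value-sorted (value, label) pairs.

    With force=True, a single best cut is returned when the normal
    procedure would find none; otherwise force changes nothing.
    """
    found = best_cut(pairs)
    if found is None:
        return []
    gain, i = found
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint between neighbours
    if gain <= mdl_threshold(pairs, i):
        return [cut] if force else []
    # force is not passed down: deeper cuts must still pass the MDL test.
    return discretize(pairs[:i]) + [cut] + discretize(pairs[i:])
```

Under this reading, an attribute that already gets split points from the normal procedure would get exactly the same points with force=True, which is not what I observed.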

However, I've found that forceAttribute=True also generated extra split points for other attributes that already had split points without it.

I'm confused. Does forceAttribute=True mean that MDL is simply ignored? If so, I would presume this would generate N-1 split points?

Or, does it mean MDL is ignored once, giving rise to an extra split in all attributes that were split before?

The Python code I used to observe this:

import orange

train_data = orange.ExampleTable(train_tab_output_name)

entro = orange.EntropyDiscretization()

if ForceOneCut:  # My own command-line adjustable boolean
    entro = orange.EntropyDiscretization(forceAttribute=True)

Desc2Points = {}  # attribute name -> list of split points
for attr in train_data.domain.attributes:
    # Don't discretize the IDs!
    if attr.name != IDTag:
        disc = entro(attr, train_data)
        Desc2Points[attr.name] = disc.getValueFrom.transformer.points

Thank you so much if you can clarify my understanding of this method!
