Orange Forum • View topic - Manhattan distance for discrete attributes

Post by Guest1 » Wed Feb 10, 2010 22:18

Hi!

I have been looking at the reference documentation on distances between examples, and I don't understand how you calculate Manhattan distance between discrete attributes. Can you please help me?

I know what Manhattan distance is, but I can't figure out how you get these results:
*** Reference example: ['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'reduced', 'none'] 0.0
['young', 'myope', 'no', 'normal', 'soft'] 1.0
['young', 'myope', 'yes', 'reduced', 'none'] 1.0
['young', 'myope', 'yes', 'normal', 'hard'] 2.0

Looks strange

Post by michaelhecht » Thu Feb 11, 2010 9:39

I tried to reproduce this but got the expected behaviour:

Code:
import orange

a = orange.EnumVariable("a",values=['0','1','2'])
b = orange.EnumVariable("b",values=['0','1','2'])
c = orange.EnumVariable("c",values=['0','1','2'])
d = orange.EnumVariable("d",values=['0','1','2'])
e = orange.EnumVariable("e",values=['0','1','2'])
f = orange.EnumVariable("f",values=['0','1','2'])
g = orange.EnumVariable("g",values=['0','1','2'])
h = orange.EnumVariable("h",values=['0','1','2'])

myDom = orange.Domain([a,b,c,d,e,f,g,h])

myDat = []
myDat.append(['0','0','0','0','0','0','0','0'])
myDat.append(['1','0','0','0','0','0','0','0'])
myDat.append(['1','1','0','0','0','0','0','0'])
myDat.append(['1','1','1','0','0','0','0','0'])
myDat.append(['1','1','1','1','0','0','0','0'])
myDat.append(['1','1','1','1','1','0','0','0'])
myDat.append(['1','1','1','1','1','1','0','0'])
myDat.append(['1','1','1','1','1','1','1','0'])
myDat.append(['1','1','1','1','1','1','1','1'])

data = orange.ExampleTable(myDom,myDat)

distance = orange.ExamplesDistanceConstructor_Manhattan(data)

for ex in data:
  print ex, distance(ex, data[0])


results in

Code:
['0', '0', '0', '0', '0', '0', '0', '0'] 0.0
['1', '0', '0', '0', '0', '0', '0', '0'] 1.0
['1', '1', '0', '0', '0', '0', '0', '0'] 2.0
['1', '1', '1', '0', '0', '0', '0', '0'] 3.0
['1', '1', '1', '1', '0', '0', '0', '0'] 4.0
['1', '1', '1', '1', '1', '0', '0', '0'] 5.0
['1', '1', '1', '1', '1', '1', '0', '0'] 6.0
['1', '1', '1', '1', '1', '1', '1', '0'] 7.0
['1', '1', '1', '1', '1', '1', '1', '1'] 7.0



The only strange thing is that the last two examples have the same distance to the reference.

Post by Ales » Thu Feb 11, 2010 11:12

Distance measures ignore the class value by default.
To include it, pass the ignoreClass argument to the ExamplesDistanceConstructor like this:
Code:
distance = orange.ExamplesDistanceConstructor_Manhattan(data, ignoreClass=False)
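
To illustrate why the last two rows above both come out as 7.0, here is a minimal plain-Python sketch. It is not Orange's actual implementation. It assumes each discrete attribute contributes 0 when the values match and 1 when they differ (which is consistent with the outputs in this thread) and that the last element of each row is the class value. The name mismatch_distance and its ignore_class parameter are made up for illustration.

```python
def mismatch_distance(x, y, ignore_class=True):
    """Count the attributes on which x and y differ (hypothetical helper).

    Assumption: the last element of each row is the class value.
    """
    if ignore_class:
        # Drop the class value from both rows before comparing.
        x, y = x[:-1], y[:-1]
    # Each differing discrete attribute adds 1 to the distance.
    return float(sum(1 for a, b in zip(x, y) if a != b))


reference = ['0'] * 8          # first row of the table above
row7 = ['1'] * 7 + ['0']       # differs in 7 attributes, same class
row8 = ['1'] * 8               # differs in 7 attributes and in the class

for row in (row7, row8):
    print("%s %.1f %.1f" % (row,
                            mismatch_distance(row, reference),
                            mismatch_distance(row, reference, ignore_class=False)))
```

With the class ignored, the two rows are indistinguishable; once the class is counted, the last row is one step further from the reference.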

Post by Guest1 » Thu Feb 11, 2010 20:25

michaelhecht, did you try working with strings? I know that it works correctly with integers and floats. It is only with strings that I can't figure out how it computes Manhattan or even Hamming distance...

Found it

Post by michaelhecht » Fri Feb 12, 2010 17:51

It's funny, but my problem is in fact also your problem:

The variables in the "lenses" data set are of the same type, i.e. EnumVariable. The differences you get seem to result from the fact that the "lenses" data have no class attribute, so the distance operator interprets the last attribute as the class.

If you use:
Code:
distance = orange.ExamplesDistanceConstructor_Manhattan(data, ignoreClass=False)


you will get the right distances:

Code:
[age, prescription, astigmatic, tear_rate, lenses]
EnumVariable 'age' <pre-presbyopic, presbyopic, young>
EnumVariable 'prescription' <hypermetrope, myope>
EnumVariable 'astigmatic' <no, yes>
EnumVariable 'tear_rate' <normal, reduced>
*** Reference example:  ['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'reduced', 'none'] 0.0
['young', 'myope', 'no', 'normal', 'soft'] 2.0
['young', 'myope', 'yes', 'reduced', 'none'] 1.0
['young', 'myope', 'yes', 'normal', 'hard'] 3.0
['young', 'hypermetrope', 'no', 'reduced', 'none'] 1.0
['young', 'hypermetrope', 'no', 'normal', 'soft'] 3.0
['young', 'hypermetrope', 'yes', 'reduced', 'none'] 2.0
['young', 'hypermetrope', 'yes', 'normal', 'hard'] 4.0
['pre-presbyopic', 'myope', 'no', 'reduced', 'none'] 1.0
['pre-presbyopic', 'myope', 'no', 'normal', 'soft'] 3.0
['pre-presbyopic', 'myope', 'yes', 'reduced', 'none'] 2.0
['pre-presbyopic', 'myope', 'yes', 'normal', 'hard'] 4.0
['pre-presbyopic', 'hypermetrope', 'no', 'reduced', 'none'] 2.0
['pre-presbyopic', 'hypermetrope', 'no', 'normal', 'soft'] 4.0
['pre-presbyopic', 'hypermetrope', 'yes', 'reduced', 'none'] 3.0
['pre-presbyopic', 'hypermetrope', 'yes', 'normal', 'none'] 4.0
['presbyopic', 'myope', 'no', 'reduced', 'none'] 1.0
['presbyopic', 'myope', 'no', 'normal', 'none'] 2.0
['presbyopic', 'myope', 'yes', 'reduced', 'none'] 2.0
['presbyopic', 'myope', 'yes', 'normal', 'hard'] 4.0
['presbyopic', 'hypermetrope', 'no', 'reduced', 'none'] 2.0
['presbyopic', 'hypermetrope', 'no', 'normal', 'soft'] 4.0
['presbyopic', 'hypermetrope', 'yes', 'reduced', 'none'] 3.0
['presbyopic', 'hypermetrope', 'yes', 'normal', 'none'] 4.0
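
The 1.0 versus 2.0 difference for the second example can be traced attribute by attribute. The sketch below is plain Python, not Orange code. It assumes a 0/1 mismatch per discrete attribute and treats the last column, lenses, as the class. The helper differing_attributes is a hypothetical name used only for this illustration.

```python
# Attribute names as printed in the domain listing above; 'lenses' is last.
ATTRIBUTES = ['age', 'prescription', 'astigmatic', 'tear_rate', 'lenses']


def differing_attributes(x, y, ignore_class=True):
    """Return the names of attributes on which x and y differ (hypothetical helper)."""
    # When the class is ignored, the last column is left out of the comparison;
    # zip stops at the shortest sequence, so the class values are never visited.
    names = ATTRIBUTES[:-1] if ignore_class else ATTRIBUTES
    return [name for name, a, b in zip(names, x, y) if a != b]


reference = ['young', 'myope', 'no', 'reduced', 'none']
example = ['young', 'myope', 'no', 'normal', 'soft']

without_class = differing_attributes(example, reference)
with_class = differing_attributes(example, reference, ignore_class=False)

print("class ignored:  %s -> %.1f" % (without_class, len(without_class)))
print("class included: %s -> %.1f" % (with_class, len(with_class)))
```

Under these assumptions only tear_rate differs when the class is ignored, and tear_rate plus lenses differ when it is included, which matches the 1.0 and 2.0 reported for this example.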

Post by Janez » Fri Feb 12, 2010 22:58

This one is mine. I found this great comment in my code:

if (ignoreClass) // can't check it, but suppose there is a class attribute
ei--;

This code is really old, so I only vaguely remember why I "can't check it". This was really sloppy programming, I'll fix it.

Post by Janez » Fri Feb 12, 2010 23:26

No, wait: lenses does have a class attribute. I think Orange works OK here.

However, I have fixed the problem mentioned above. It was a bug, but in a part of the code that only executes when the user gives some weird combination of arguments. It didn't execute in your case.

