Orange Forum • View topic - Manhattan distance for discrete attributes

## Manhattan distance for discrete attributes



Hi!

I have been looking at the reference documentation on distances between examples, and I don't understand how you calculate Manhattan distance between discrete attributes. Can you please help me?

I know what Manhattan distance is, but I can't figure out how you get these results:
```
*** Reference example: ['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'reduced', 'none'] 0.0
['young', 'myope', 'no', 'normal', 'soft'] 1.0
['young', 'myope', 'yes', 'reduced', 'none'] 1.0
['young', 'myope', 'yes', 'normal', 'hard'] 2.0
```

### Looks strange

I tried to reproduce this but got the expected behaviour:

```python
import orange

a = orange.EnumVariable("a", values=['0','1','2'])
b = orange.EnumVariable("b", values=['0','1','2'])
c = orange.EnumVariable("c", values=['0','1','2'])
d = orange.EnumVariable("d", values=['0','1','2'])
e = orange.EnumVariable("e", values=['0','1','2'])
f = orange.EnumVariable("f", values=['0','1','2'])
g = orange.EnumVariable("g", values=['0','1','2'])
h = orange.EnumVariable("h", values=['0','1','2'])
myDom = orange.Domain([a,b,c,d,e,f,g,h])

myDat = []
myDat.append(['0','0','0','0','0','0','0','0'])
myDat.append(['1','0','0','0','0','0','0','0'])
myDat.append(['1','1','0','0','0','0','0','0'])
myDat.append(['1','1','1','0','0','0','0','0'])
myDat.append(['1','1','1','1','0','0','0','0'])
myDat.append(['1','1','1','1','1','0','0','0'])
myDat.append(['1','1','1','1','1','1','0','0'])
myDat.append(['1','1','1','1','1','1','1','0'])
myDat.append(['1','1','1','1','1','1','1','1'])
data = orange.ExampleTable(myDom, myDat)

distance = orange.ExamplesDistanceConstructor_Manhattan(data)
for ex in data:
    print ex, distance(ex, data[0])
```

results in

```
['0', '0', '0', '0', '0', '0', '0', '0'] 0.0
['1', '0', '0', '0', '0', '0', '0', '0'] 1.0
['1', '1', '0', '0', '0', '0', '0', '0'] 2.0
['1', '1', '1', '0', '0', '0', '0', '0'] 3.0
['1', '1', '1', '1', '0', '0', '0', '0'] 4.0
['1', '1', '1', '1', '1', '0', '0', '0'] 5.0
['1', '1', '1', '1', '1', '1', '0', '0'] 6.0
['1', '1', '1', '1', '1', '1', '1', '0'] 7.0
['1', '1', '1', '1', '1', '1', '1', '1'] 7.0
```

The only strange thing is that the last two examples have the same distance to the reference.

Distance measures by default ignore the class value.
You can pass the ignoreClass argument to the ExamplesDistanceConstructor like this:
```python
distance = orange.ExamplesDistanceConstructor_Manhattan(data, ignoreClass=False)
```
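The effect of skipping the class can be imitated in plain Python, without Orange. This is only a sketch under an assumption that matches the outputs in this thread: each discrete attribute contributes 0 when the two values are equal and 1 when they differ, and the class is the last value in each example. `mismatch_distance` and its `ignore_class` parameter are made-up names for illustration, not part of the Orange API.

```python
def mismatch_distance(a, b, ignore_class=True):
    """Count differing positions; optionally skip the last one (assumed to be the class)."""
    if ignore_class:
        a, b = a[:-1], b[:-1]
    return float(sum(1 for x, y in zip(a, b) if x != y))


# The same nine examples as in the script above: k ones followed by zeros.
rows = [['1'] * k + ['0'] * (8 - k) for k in range(9)]
reference = rows[0]

with_class_skipped = [mismatch_distance(r, reference) for r in rows]
with_class_counted = [mismatch_distance(r, reference, ignore_class=False) for r in rows]

print(with_class_skipped)
print(with_class_counted)
```

With the last position skipped, the final two examples tie at 7.0, as in the output above; with it counted, the last example comes out at 8.0.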

michaelhecht, did you try working with strings? I know that it works correctly with integers and floats. It is only with strings that I can't figure out how it computes Manhattan or even Hamming distance...

### Found it

It's funny, but my problem is in fact also your problem:

The type of variables in the "lenses" data set is the same, i.e. EnumVariable. The differences you get seem to result from the fact that the "lenses" data have no class attribute. Therefore the last attribute is interpreted as class by the distance operator.

If you use:
```python
distance = orange.ExamplesDistanceConstructor_Manhattan(data, ignoreClass=False)
```

you will get the right distances:

```
[age, prescription, astigmatic, tear_rate, lenses]
EnumVariable 'age' <pre-presbyopic, presbyopic, young>
EnumVariable 'prescription' <hypermetrope, myope>
EnumVariable 'astigmatic' <no, yes>
EnumVariable 'tear_rate' <normal, reduced>
*** Reference example:  ['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'reduced', 'none'] 0.0
['young', 'myope', 'no', 'normal', 'soft'] 2.0
['young', 'myope', 'yes', 'reduced', 'none'] 1.0
['young', 'myope', 'yes', 'normal', 'hard'] 3.0
['young', 'hypermetrope', 'no', 'reduced', 'none'] 1.0
['young', 'hypermetrope', 'no', 'normal', 'soft'] 3.0
['young', 'hypermetrope', 'yes', 'reduced', 'none'] 2.0
['young', 'hypermetrope', 'yes', 'normal', 'hard'] 4.0
['pre-presbyopic', 'myope', 'no', 'reduced', 'none'] 1.0
['pre-presbyopic', 'myope', 'no', 'normal', 'soft'] 3.0
['pre-presbyopic', 'myope', 'yes', 'reduced', 'none'] 2.0
['pre-presbyopic', 'myope', 'yes', 'normal', 'hard'] 4.0
['pre-presbyopic', 'hypermetrope', 'no', 'reduced', 'none'] 2.0
['pre-presbyopic', 'hypermetrope', 'no', 'normal', 'soft'] 4.0
['pre-presbyopic', 'hypermetrope', 'yes', 'reduced', 'none'] 3.0
['pre-presbyopic', 'hypermetrope', 'yes', 'normal', 'none'] 4.0
['presbyopic', 'myope', 'no', 'reduced', 'none'] 1.0
['presbyopic', 'myope', 'no', 'normal', 'none'] 2.0
['presbyopic', 'myope', 'yes', 'reduced', 'none'] 2.0
['presbyopic', 'myope', 'yes', 'normal', 'hard'] 4.0
['presbyopic', 'hypermetrope', 'no', 'reduced', 'none'] 2.0
['presbyopic', 'hypermetrope', 'no', 'normal', 'soft'] 4.0
['presbyopic', 'hypermetrope', 'yes', 'reduced', 'none'] 3.0
['presbyopic', 'hypermetrope', 'yes', 'normal', 'none'] 4.0
```
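To see where the individual numbers come from, it can help to list which attributes actually differ from the reference example. The snippet below is a plain-Python illustration, not Orange code; `differing` is a hypothetical helper, and it assumes that every differing discrete attribute adds exactly 1 to the distance, which is consistent with the listing above.

```python
names = ['age', 'prescription', 'astigmatic', 'tear_rate', 'lenses']
reference = ['young', 'myope', 'no', 'reduced', 'none']
examples = [
    ['young', 'myope', 'no', 'normal', 'soft'],
    ['young', 'myope', 'yes', 'reduced', 'none'],
    ['young', 'myope', 'yes', 'normal', 'hard'],
]


def differing(example, ref):
    """Names of the attributes whose values differ between the two examples."""
    return [n for n, x, y in zip(names, example, ref) if x != y]


breakdown = [differing(ex, reference) for ex in examples]
for ex, diff in zip(examples, breakdown):
    # 'lenses' is the last attribute, i.e. the one treated as the class.
    without_class = [n for n in diff if n != 'lenses']
    print("%s differs in %s: %d with the class, %d without"
          % (ex, diff, len(diff), len(without_class)))
```

Counting `lenses` gives 2, 1 and 3 for these three rows, as in the listing above; leaving it out gives 1, 1 and 2, which are the values from the original question.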

This one is mine. I found this great comment in my code:

```cpp
if (ignoreClass) // can't check it, but suppose there is a class attribute
    ei--;
```

This code is really old, so I only vaguely remember why I "can't check it". This was really sloppy programming, I'll fix it.
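The worry with that snippet is that the last attribute would be dropped even for data that has no class at all. A guarded version could look roughly like the following, written in Python rather than the original C++. It is only a sketch: `attributes_to_compare` and the `has_class` flag are hypothetical stand-ins for whatever the domain reports, and this is not the actual fix that went into Orange.

```python
def attributes_to_compare(values, ignore_class=True, has_class=True):
    """Return the values that should take part in the distance.

    The last value is dropped only when the caller asks to ignore the class
    AND the data really has one (has_class is a hypothetical flag).
    """
    if ignore_class and has_class:
        return values[:-1]
    return list(values)


example = ['young', 'myope', 'no', 'reduced', 'none']
print(attributes_to_compare(example))                   # class present: last value dropped
print(attributes_to_compare(example, has_class=False))  # class-less data: nothing dropped
```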

No, wait: lenses does have the class attribute. I think Orange works OK here.

However, I have fixed the problem mentioned above. It was a bug, but in a part of the code which only executes when the user gives some weird combination of arguments. It didn't execute in your case.