Postby disappearedng » Sun Feb 07, 2010 7:52

I am planning to use Orange for k-means clustering. I have gone through the tutorials, but I still have a few questions:

I am clustering high-dimensional vectors.
1) Is a cosine distance implemented?
2) I do not want to fill empty values with zeros. When I leave those fields blank instead of entering zeros, I get this error:
SystemError: 'orange.TabDelimExampleGenerator': the number of attribute types does not match the number of attributes

How do I indicate an empty value?
3) Is there a way to incorporate an "ID" into the example table? I want to label my data by an ID (NOT a class) for easier reference, but I do not want the ID column to be part of the actual data.

4) Is there a way to get the k-means clustering output in a different format?
I would much prefer something in this format:
cluster1: [ <id1>, <id2>, ...]
cluster2: [ <id3>, ... ]
rather than just [1, 2, 3, 1, 2, ...].
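The regrouping asked for in question 4 can be done outside Orange once the flat list of cluster indices is available. A minimal plain-Python sketch, assuming the cluster indices come in the same order as the examples and that the IDs are available as a parallel list (`ids`, `labels` and `group_by_cluster` are hypothetical names, not part of Orange):

```python
from collections import defaultdict


def group_by_cluster(ids, labels):
    """Map each cluster label to the list of IDs assigned to it.

    ids    -- hypothetical list of example IDs, one per example
    labels -- flat list of cluster indices, in the same order as ids
    """
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    clusters = defaultdict(list)
    # Walk both lists in parallel and append each ID to its cluster.
    for example_id, label in zip(ids, labels):
        clusters[label].append(example_id)
    return dict(clusters)


if __name__ == "__main__":
    ids = ["id1", "id2", "id3", "id4", "id5"]
    labels = [1, 2, 3, 1, 2]  # the flat format from the question
    # Print one line per cluster, in label order.
    for label, members in sorted(group_by_cluster(ids, labels).items()):
        print("cluster%d: %s" % (label, members))
```

With the sample data this prints `cluster1: ['id1', 'id4']`, `cluster2: ['id2', 'id5']` and `cluster3: ['id3']`, one per line.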


Postby Ales » Thu Feb 11, 2010 11:43

1. No.

2. You can mark an unknown value by putting "?" in the field (see for more details).

3. Yes, you can define IDs as meta attributes (see the same link as above).
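Since the reply says Orange has no cosine distance, here is a minimal plain-Python sketch of one (it is not part of Orange, and how to plug it into Orange's k-means is not covered here). As an assumption for this sketch, unknown values are represented as `None`, mirroring "?" in the data file, and a dimension is skipped whenever either vector is unknown there (pairwise deletion); that handling is a design choice, not something the thread specifies.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v), ignoring positions where either value is None.

    None stands in for an unknown value (an assumption of this sketch).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Keep only the dimensions that are known in both vectors.
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine_distance([1, 0], [0, 1]))           # orthogonal vectors
    print(cosine_distance([1, None, 0], [0, 5, 1]))  # middle value unknown
```

If there are no unknowns, a common alternative is to scale every vector to unit length before running ordinary k-means: for unit vectors the squared Euclidean distance equals 2 × (1 − cos), so it ranks pairs exactly as the cosine distance does.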

