Orange Forum • View topic - clustering question

clustering question

Postby tripy_r » Wed Jan 28, 2009 1:59

Hello,

Maybe I can get some help on this site.

I am looking for a clustering algorithm that is not swayed by point density and can isolate a single point as its own cluster, even when the other clusters are high-density groups.

I would like to give my example:

I have around 100 points scattered around x=1, y=1.
I have another 100 points scattered around x=1.2, y=1.2.
Now I add another point (just one) at x=100, y=100.

I would like an algorithm that will either see it as a new (third) cluster, or see the 200 points as one cluster and the last point as another.

Most algorithms I tested join the last point to one of the groups. Even when I asked for more than 3 groups (I tried as many as 10), none isolated it.

What type of algorithm can deal with such scenarios?
My real data set is 5D, not 2D as presented here. I would be glad to provide it if anyone wants to play with it.

I will appreciate any help on this matter,
T.

Postby Blaz » Sun Mar 01, 2009 18:18

First, sorry for the late reply. Most data analysis algorithms would (correctly) view your [100, 100] point as an outlier and would not consider it an independent cluster. An outlier detection algorithm (such as the one in Orange) should detect such a data point. However, if you run hierarchical clustering on your data set and then cut the dendrogram to obtain two or three clusters, you would indeed get this particular point "as a separate cluster" (try the Hierarchical Clustering widget on your data).
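
To make the hierarchical-clustering suggestion concrete, here is a minimal sketch in Python. It uses SciPy rather than the Orange widget, which is an assumption made for illustration. The synthetic data, the spread of 0.05, and the choice of average linkage are also assumptions, not part of the original question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the data described in the question;
# the spread (scale=0.05) is an assumption.
group_a = rng.normal(loc=1.0, scale=0.05, size=(100, 2))
group_b = rng.normal(loc=1.2, scale=0.05, size=(100, 2))
outlier = np.array([[100.0, 100.0]])
points = np.vstack([group_a, group_b, outlier])

# Build the merge tree (average linkage, Euclidean distance).
tree = linkage(points, method="average")

# Cut the tree so that at most two clusters remain.
labels = fcluster(tree, t=2, criterion="maxclust")

# The outlier is the last row; count how many points share its label.
outlier_label = labels[-1]
outlier_cluster_size = int(np.sum(labels == outlier_label))
print("size of the cluster holding [100, 100]:", outlier_cluster_size)
```

Because the far point joins the tree only at the very last merge, cutting at two clusters leaves it alone. Whether a cut at three also separates the two dense groups cleanly depends on the linkage and on how much the groups overlap.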

Have you looked at MST-based clustering algos?

Postby Yakov » Sat Mar 07, 2009 2:23

You're probably looking for a graph-based (MST-based) clustering algo. Here is a link to an algo that I've found highly useful: http://people.cs.uchicago.edu/~pff/segment/

If you define your pairwise metric appropriately, the dimensionality of your space will no longer matter.

I have it coded in Matlab and can port it to Python if there is enough interest.
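
As a rough illustration of the MST idea in general (not the algorithm at the link above, and not the Matlab code mentioned), the sketch below builds a minimum spanning tree with Prim's algorithm and drops the longest edges. The function names `mst_edges` and `mst_clusters`, the 5D synthetic data, and the rule of cutting the k-1 longest edges are all assumptions chosen for illustration. The `dist` argument is where a custom pairwise metric would go.

```python
import math
import random

def mst_edges(points, dist):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the connection cost of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k, dist=math.dist):
    """Drop the k-1 longest MST edges and label the remaining components."""
    edges = sorted(mst_edges(points, dist))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))  # union-find forest

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

# Hypothetical 5D version of the example in the question.
random.seed(0)
pts = [[random.gauss(1.0, 0.05) for _ in range(5)] for _ in range(100)]
pts += [[random.gauss(1.2, 0.05) for _ in range(5)] for _ in range(100)]
pts.append([100.0] * 5)            # the single far-away point

labels = mst_clusters(pts, k=2)
print("points sharing the far point's label:", labels.count(labels[-1]))
```

The edge that attaches the far point to the tree is by far the longest, so it is the first one removed and the point ends up on its own regardless of how dense the other groups are. Only pairwise distances are used, so the same code runs unchanged in any number of dimensions.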

Yakov

