## clustering question

Hello,

Maybe I can get some help on this site.

I am looking for a clustering algorithm that is not swayed by point density and can isolate a single point as its own cluster, even when the other clusters are dense groups.

I would like to give my example:

I have around 100 points scattered around x=1, y=1.

I have another 100 points scattered around x=1.2, y=1.2.

Now I add another point (just one) at x=100, y=100.

I would like an algorithm that will either see it as a new cluster, or see the 200 points as one cluster and the last point as another cluster.

Most algorithms I tested join the last point to one of the groups. Even when I ask for more than 3 groups (I tried as many as 10), none of them isolated it.

What type of algorithm will be able to deal with such scenarios?

My real data set is 5D, not 2D as I presented here. I would be glad to provide it if anyone wants to play with it.

I will appreciate any help on this matter,

T.

### Have you looked at MST-based clustering algos?

You're probably looking for a graph-based clustering algo built on a minimum spanning tree (MST). Here is a link to one that I've found highly useful: http://people.cs.uchicago.edu/~pff/segment/

If you define your pairwise metric appropriately, the dimensionality of your space will no longer matter, because a graph-based method only ever looks at distances between pairs of points.
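To make the MST idea concrete, here is a minimal Python sketch. It is not the linked algorithm, only its simplest relative: build the MST with Kruskal's method and cut the longest edges, which amounts to single-linkage clustering. The names `mst_clusters` and `euclidean`, the scatter of 0.05 around each center, and the random seed are all assumptions made for illustration; the metric is a plain function argument, so the same code accepts 5D points or a custom distance.

```python
import math
import random
from itertools import combinations


def euclidean(p, q):
    """Default pairwise metric; any symmetric distance function can be swapped in."""
    return math.dist(p, q)


def mst_clusters(points, n_clusters, metric=euclidean):
    """Label points by growing an MST (Kruskal) and leaving out its longest edges.

    Stopping Kruskal once only `n_clusters` components remain is equivalent to
    building the full MST and then cutting its n_clusters - 1 longest edges.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first. O(n^2) edges: fine for a few hundred points.
    edges = sorted(
        (metric(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        if components <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n)]


# Toy data modeled on the question (the 0.05 spread is an assumed value).
random.seed(0)
group_a = [(random.gauss(1.0, 0.05), random.gauss(1.0, 0.05)) for _ in range(100)]
group_b = [(random.gauss(1.2, 0.05), random.gauss(1.2, 0.05)) for _ in range(100)]
points = group_a + group_b + [(100.0, 100.0)]

labels = mst_clusters(points, n_clusters=2)
print(labels[-1], set(labels[:200]))  # the far point should sit alone in its own cluster
```

Because the edge reaching the far point is by far the longest one in the tree, it is the first to be cut, no matter how dense the other groups are. Whether asking for three clusters also separates the two dense groups depends on how much they overlap; a plain longest-edge cut may instead split off a stray point at the fringe of one group.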

I have it coded in Matlab and can port it to Python if there is enough interest.

Yakov

