Orange Forum • View topic - K-means clustering questions

K-means clustering questions

Postby RRBammann » Mon Oct 22, 2012 19:52

Hi,

I'm working with neurons and need to divide them into groups based on their electrophysiological properties. I'm using 15 parameters, following a paper by Perrenoud (2012). However, I'm not very familiar with clustering, especially the method they use, k-means clustering. To complicate things, the Orange Widgets documentation for K-means clustering is out of date with respect to the widget. For instance, there are no Fitness and BIC values, only a Score. What does the score mean, and which one is the best? How many restarts are ideal? I assumed random initialization would be best, but I'm not sure.
I appreciate any help in these matters.

Re: K-means clustering questions

Postby Ales » Thu Oct 25, 2012 10:36

RRBammann wrote: What does the score mean, and which one is the best?
The values in the 'Score' column are the best values (over all restarts) of the 'Scoring' function (selected in the 'Clusters (k)' box) for each number of clusters (k). The best one is marked with a dot (and is automatically selected). Whether a bigger score is better depends on the chosen scoring function (for 'Distance to centroids' a smaller score is better).
RRBammann wrote: How many restarts are ideal for the settings? I assumed random initialization would be best, but not totally sure.
There is no easy answer to this. It's more of a 'try it and see if it works' problem, but generally random initialization and a large number of restarts work well.
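
To make the restart idea concrete, here is a minimal pure-Python sketch of k-means with random initialization and several restarts, keeping the run with the lowest score. It is not the widget's code: the function names, the toy data, and the use of the sum of squared distances to the assigned centroid as a stand-in for a 'Distance to centroids'-style score are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (centroids, labels, score)."""
    centroids = rng.sample(points, k)  # random initialization: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Assumed score: sum of squared distances to the assigned centroid
    # (smaller is better).
    score = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, score


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Repeat k-means from different random starts and keep the best run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
centroids, labels, score = kmeans_restarts(points, k=2, restarts=20)
print(labels, round(score, 3))
```

Because each run can end in a different local optimum, more restarts only cost time and can only improve (or tie) the best score found, which is why a generous restart count is a reasonable default.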

Re: K-means clustering questions

Postby RRBammann » Mon Oct 29, 2012 11:20

Hi Ales,

Thanks for your reply, it helped me understand the widget. I'm going for a large number of restarts, as you said, so it might give some good results. But comparing k-means clustering to hierarchical clustering, the latter has a way of normalizing the data when using the "Example Distance" widget. Is there a way of doing the same normalization for k-means clustering, or do I have to prepare the data before inputting it into my schema?
Thanks for the help!

Re: K-means clustering questions

Postby Ales » Fri Nov 02, 2012 12:16

RRBammann wrote: But comparing k-means clustering to hierarchical clustering, the latter has a way of normalizing the data when using the "Example Distance" widget. Is there a way of doing the same normalization for k-means clustering, or do I have to prepare the data before inputting it into my schema?
The data is normalized automatically by the K-Means widget where applicable (i.e. it computes distances in the same way that 'Example Distance' widget would with the 'Normalize data' checked).
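
For readers who want to see why normalization matters for distance-based clustering, here is a small sketch. The thread does not say which normalization scheme the widget applies, so the min-max rescaling to [0, 1] below is an assumption, and the feature names and values are made up for illustration.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical cells: (input resistance in MOhm, spike half-width in ms).
cells = [(100.0, 0.5), (180.0, 0.6), (120.0, 1.5), (300.0, 1.0)]
scaled = min_max_normalize(cells)

# On raw values the large-valued first feature dominates the distance,
# so cell 2 looks closer to cell 0 than cell 1 does.
print(euclidean(cells[0], cells[1]), euclidean(cells[0], cells[2]))

# After rescaling both features contribute comparably and the order flips.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```

With 15 electrophysiological parameters measured in different units, some such rescaling keeps one large-valued parameter from deciding the clusters on its own.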

Re: K-means clustering questions

Postby RRBammann » Fri Nov 02, 2012 12:35

Thanks for the answer! That is helpful.

One last question: is there a way of making the software learn the classification, so that new data can be assigned to one of the clusters produced by the k-means clustering widget?
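
The thread does not record an answer to this. One generic approach, sketched below in plain Python and not tied to any Orange API, is to keep the centroids from the finished k-means run and assign each new data point to its nearest centroid. The centroid values, the new points, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


# Hypothetical centroids saved from an earlier k-means run.
centroids = [(0.1, 0.13), (5.03, 5.1)]

# Hypothetical new measurements to be assigned to the existing clusters.
new_cells = [(0.3, 0.2), (4.8, 5.0)]
assigned = [nearest_centroid(p, centroids) for p in new_cells]
print(assigned)
```

If the original data were normalized before clustering, the new data would need the same rescaling (using the parameters computed on the original data) before the distances are compared.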

