## K-means clustering questions

### K-means clustering questions

Hi,

I'm working with neurons and need to divide them into groups based on their electrophysiological properties. I'm using 15 parameters, based on a paper by Perrenoud (2012). However, I'm not very familiar with clustering, especially the method they use, k-means clustering. To make things a bit more complicated, the Orange Widgets documentation for K-means clustering is out of date with respect to the widget. For instance, there are no Fitness and BIC values; instead there is a Score. What does the score mean, and which one is the best? How many restarts are ideal? I assumed random initialization would be best, but I'm not sure.

I appreciate any help in these matters.

### Re: K-means clustering questions

RRBammann wrote: What does the score mean, and which one is the best?

The values in the 'Score' column are the best values (over all restarts) of the 'Scoring' function (selected in the 'Clusters (k)' box) for each number of clusters (k). The best one is marked with a dot (and is automatically selected). Whether a bigger score is better depends on the scoring function chosen (for 'Distance to centroids', a smaller score is better).

RRBammann wrote: How many restarts are ideal for the settings? I assumed random initialization would be best, but not totally sure.

There is no easy answer to this. It's more of a 'try it and see if it works' problem, but random initialization with a large number of restarts generally works well.
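To make the idea of restarts concrete, here is a minimal pure-Python sketch. It is not Orange's implementation: the function names `kmeans_once` and `kmeans_restarts`, the `restarts` and `seed` parameters, and the toy data are all assumptions made for illustration. Each run starts from randomly chosen data points, and the run with the smallest sum of squared distances to its centroids is kept (smaller is better, in the spirit of 'Distance to centroids').

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (centroids, labels, score)."""
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    score = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, score


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Run k-means `restarts` times and keep the run with the lowest score."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, score = kmeans_restarts(points, k=2, restarts=10)
```

More restarts cost more time but lower the chance that the reported result comes from an unlucky starting position.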

### Re: K-means clustering questions

Hi Ales,

Thanks for your reply, it helped me understand the widget. I'm going for a large number of restarts, as you said, so it might give some good results. But comparing k-means clustering to hierarchical clustering, the latter has a way of normalizing the data when using the "Example Distance" widget. Is there a way of doing the same normalization for k-means clustering, or do I have to prepare the data before feeding it into my schema?

Thanks for the help!

### Re: K-means clustering questions

RRBammann wrote: But comparing k-means clustering to hierarchical clustering, the latter has a way of normalizing the data when using "Example Distance" widged. Is there a way of doing the same normalization for the k-means clustering, or do I have to prepare the data before inputing into my schema?

The data is normalized automatically by the K-Means widget where applicable (i.e. it computes distances in the same way that the 'Example Distance' widget would with 'Normalize data' checked).
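As a rough illustration of why normalization matters for distance-based clustering, the sketch below rescales every feature to zero mean and unit variance. This is an assumption for illustration only: the exact scheme the widget applies is not stated in this thread, and `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has stdev 0; divide by 1.0 instead so it maps to all zeros.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical measurements: the second feature is on a 100x larger scale,
# so it would dominate Euclidean distances if left unscaled.
rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
scaled = standardize(rows)
```

After rescaling, both features contribute comparably to the distance, whatever units they were recorded in.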

### Re: K-means clustering questions

Thanks for the answer! That is helpful.

One last question: is there a way of making the software learn the classification, so that new data could be assigned to one of the clusters made by the k-means clustering widget?
