Orange Forum • View topic - orngClustering errors

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

orngClustering errors

Post by srikanth » Wed May 26, 2010 5:26

(1) If only a single attribute is passed as an array, all the cases are assigned to a single cluster.

(2) If the array contains repeated values, the clustering goes haywire: two or more cases with the same attribute value end up in different clusters.

The clustering seems to work normally only when there are multiple attributes, and even then only if no significant repeats are present. Kindly clarify.

Post by Ales » Wed May 26, 2010 10:13

Make sure the one attribute in your array is not the class attribute. The distance measures used by default ignore the class value of the examples.
You can override this behavior by passing an ignoreClass=False keyword argument to the example distance constructors.
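
To illustrate the point about the class value being ignored, here is a minimal plain-Python sketch. It is not Orange's code: the `manhattan` function and its `class_index` and `ignore_class` parameters are hypothetical names made up for this example. It only shows why a lone attribute that is also the class leaves every pair of cases at distance zero.

```python
def manhattan(a, b, class_index=None, ignore_class=True):
    """Manhattan distance between two rows given as lists of numbers.

    class_index marks the class column (hypothetical parameter). When
    ignore_class is True that column is skipped, imitating the default
    behaviour described in the post above.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if ignore_class and i == class_index:
            continue  # the class column does not contribute
        total += abs(x - y)
    return total


# Three cases with a single column, and that column is the class.
rows = [[1.0], [5.0], [9.0]]

# Class ignored: nothing is left to compare, so all distances are zero.
ignored = [manhattan(rows[0], r, class_index=0) for r in rows]

# Class counted: the cases become distinguishable again.
counted = [manhattan(rows[0], r, class_index=0, ignore_class=False) for r in rows]

print(ignored)
print(counted)
```

With all pairwise distances equal to zero, a clustering routine has no basis for separating the cases, which matches symptom (1) in the first post.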

orngClustering

Post by srikanth » Thu May 27, 2010 6:30

Hi Ales,

You have said "..passing a ignoreClass=False keyword argument..."

Could you please elaborate? I am using
orngClustering.KMeans

and it does not have the above argument. So where exactly do we specify what you have referred to?

orngClustering errors /data

Post by srikanth » Thu May 27, 2010 7:32

Further, I have tried with
...
km = orngClustering.KMeans(data, max_clusters, minscorechange=0, inner_callback=callback, distance=orange.ExamplesDistanceConstructor_Manhattan(ignoreClass=False))
...
The issues stated earlier remain. I can provide the data and the code used if required.

Post by Ales » Thu May 27, 2010 11:48

OK. Looking at the code for k-means, it seems that the ignoreClass argument won't fix the problem if your one attribute is the class. The problem is in the data_center function, which sets the class variable to "0" regardless of its center.

Instead, try making sure the domain of your example table does not have a class attribute. Try running this before the call to k-means.

Code:
data = orange.ExampleTable(orange.Domain(data.domain.variables, None), data)


If this does not fix the problem then the code and data would be welcome.
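
The effect described for data_center can be sketched in plain Python. This is an assumption-laden illustration, not the orngClustering source: `center` and `nearest` are hypothetical helpers, and forcing the class column to 0 simply imitates the behaviour reported in the post above. It shows why removing the class from the domain helps where ignoreClass=False does not.

```python
def center(rows, class_index=None):
    """Column-wise mean of the rows.

    If class_index is given, that column is forced to 0.0. This is an
    assumption modelled on the data_center behaviour described above.
    """
    n = len(rows)
    c = [sum(col) / n for col in zip(*rows)]
    if class_index is not None:
        c[class_index] = 0.0
    return c


def nearest(row, centers):
    """Index of the center closest to row by Manhattan distance."""
    dists = [sum(abs(x - y) for x, y in zip(row, c)) for c in centers]
    return dists.index(min(dists))


# Two obvious groups in a single column.
groups = [[[1.0], [2.0]], [[10.0], [11.0]]]

# The single column is treated as the class: both centers collapse to [0.0].
with_class = [center(g, class_index=0) for g in groups]

# The column is an ordinary attribute (the "declassed" domain).
declassed = [center(g) for g in groups]

assign_with_class = [nearest(r, with_class) for g in groups for r in g]
assign_declassed = [nearest(r, declassed) for g in groups for r in g]

print(with_class, assign_with_class)
print(declassed, assign_declassed)
```

When the centers are identical, every case is equally far from all of them and falls into the first cluster, even though the class value is counted in the distance. With the class removed, the centers keep their real positions and the two groups separate.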

orngClustering errors - solved

Post by srikanthvc » Thu Jun 03, 2010 6:22

It works fine now, after 'declassing' the domain. Thanks for the hint. You may want to add this to the documentation, even though it is a trivial case.

orngClustering /handling missing values

Post by srik » Wed Jun 23, 2010 10:41

Hi

How should missing values be handled (in datasets having 2 or more attributes) while clustering? Currently the program returns zero if any missing values are present.

