Orange Forum • View topic - New to orange, new to data mining

New to orange, new to data mining

Post by virtualsolo » Sat Sep 08, 2012 9:08

Hi,

I have a task about recommending restaurants to a given customer. First I want to group (cluster) customers based on their preferences. A sample of my data looks like this:

userID  smoker  drink_level  dress_preference  ambience  transport  marital_status  budget
------  ------  -----------  ----------------  --------  ---------  --------------  ------
U001    False   social       informal          family    on foot                    medium
U002    True    casual       formal            friends   car                        low
U003    True    abstemious   no preference     solitary  public                     low

There are also other attributes that may be irrelevant for clustering, such as 'height' and 'religion'. So far I have simply passed the whole customer dataset to Orange's clustering, like this: orngClustering.KMeans(user_data, 5). It did give me some output, but I am not sure whether it is a good clustering.
So I want to know the right way to use the Orange clustering class to get a good clustering. Specifically, in my case, do I need to preprocess my user data, for example by removing irrelevant attributes?
Your help will be appreciated!

Re: New to orange, new to data mining

Post by Ales » Mon Sep 10, 2012 10:25

virtualsolo wrote: So I want to know what is the right way to use orange clustering class in order to get a good clustering.
See Orange.clustering.kmeans, specifically plot_silhouette.
virtualsolo wrote: I need to do some preprocessing work on my user data, like remove some irrelevant attributes?
This is not specifically tailored to clustering tasks, but you could try treating the clusters as classes and using Orange.feature.scoring to remove the least informative features, then observe what effect removing them has on the silhouette.
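To make the silhouette idea concrete, here is a minimal, library-free sketch. It does not use Orange's API. It assumes a simple mismatch (Hamming) distance for categorical attributes, a made-up toy dataset whose third column stands in for an irrelevant attribute such as 'height', and hand-picked cluster labels that are held fixed for simplicity. In practice you would re-run the clustering after removing a feature.

```python
def hamming(a, b):
    """Fraction of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_silhouette(records, labels, dist=hamming):
    """Average silhouette width over all records (closer to 1 is better)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the record's own cluster
        a = sum(dist(records[i], records[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def drop_column(records, col):
    """Return copies of the records without the attribute at index col."""
    return [r[:col] + r[col + 1:] for r in records]


# Hypothetical toy data: (smoker, drink_level, height_bucket)
records = [
    ("True", "casual", "tall"),
    ("True", "casual", "short"),
    ("True", "social", "tall"),
    ("False", "abstemious", "short"),
    ("False", "abstemious", "tall"),
    ("False", "social", "short"),
]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster assignment

before = mean_silhouette(records, labels)
after = mean_silhouette(drop_column(records, 2), labels)
print("silhouette with all attributes:", round(before, 3))
print("silhouette without height_bucket:", round(after, 3))
```

If the score goes up after dropping an attribute, that attribute was probably adding noise to the distance rather than helping to separate the groups.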

Re: New to orange, new to data mining

Post by virtualsolo » Tue Sep 11, 2012 9:03

Thanks for your reply.

After trying KMeans clustering, I have several questions about clustering my user data:
    1. Why is userID displayed as '?' in km.centroids? (I marked userID with the prefix S#.) What should I do if I want to see which users are selected as centroids in each iteration?
    2. I want to plot the clustering process following kmeans-trace.py. But in my case, how do I specify the x, y attributes that determine the coordinates, given that most of the relevant attributes are discrete (like smoker, drink_level, etc.)?

Sorry for asking so many questions right at the start. I really want to do something useful with Orange :)
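On question 1: a k-means centroid is generally an aggregate of its cluster's members rather than one of the input rows, which would be consistent with its userID showing up as undefined. One possible workaround is to look up the actual user closest to each centroid. The sketch below is library-free and does not use Orange's API. It assumes the centroid of categorical data is the most common value per attribute, uses a mismatch count as the distance, and relies on hypothetical users U004 to U006 and a hypothetical cluster assignment.

```python
from collections import Counter


def mode_centroid(rows):
    """Most common value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def nearest_user(centroid, members):
    """ID of the member whose record differs from the centroid least."""
    return min(
        members,
        key=lambda uid: sum(x != y for x, y in zip(members[uid], centroid)),
    )


# (smoker, drink_level, dress_preference); U004-U006 are hypothetical
users = {
    "U001": ("False", "social", "informal"),
    "U002": ("True", "casual", "formal"),
    "U003": ("True", "abstemious", "no preference"),
    "U004": ("False", "casual", "formal"),
    "U005": ("False", "social", "formal"),
    "U006": ("False", "casual", "informal"),
}
# Hypothetical cluster labels, e.g. taken from a finished clustering run
assignment = {"U001": 0, "U005": 0, "U006": 0,
              "U002": 1, "U003": 1, "U004": 1}

# Group the user records by cluster label
clusters = {}
for uid, label in assignment.items():
    clusters.setdefault(label, {})[uid] = users[uid]

# For each cluster, find the real user nearest to the synthetic centroid
representatives = {}
for label, members in clusters.items():
    centroid = mode_centroid(list(members.values()))
    representatives[label] = nearest_user(centroid, members)
    print(label, centroid, "->", representatives[label])
```

The representative is only the closest real user. The centroid itself may not match any single row in the data.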

