Orange Forum • View topic - Text Classification problem

Text Classification problem

Postby w4k1ng » Sun Jun 01, 2014 0:42

I want to use Orange to classify text into a 'problem' category (i.e. what the problem is).

1. At present I am feeding the data into a few classifiers (bayes, svm) and it is just spitting out one problem category. We have 10,000 rows of data, all forum posts that each indicate a particular problem. There are about 5 problem categories, and the text has been cleaned and split into individual word terms. No matter what I try, I can only get a classifier to output one particular problem category on the test data.

2. One issue is that a post can be linked to up to 4 problems. I am not sure how to cut the data and set up the classifiers to handle multiple problems. Can anyone assist?

Any thoughts or ideas would be appreciated.

Thanks

Re: Text Classification problem

Postby axanthos » Sun Jun 01, 2014 14:05

Regarding your second bullet point, I'd consider training 5 independent classifiers to produce a binary response for each of the 5 categories. Something like:
- classifier 1, "does the post belong to category 1" => yes/no
- classifier 2, "does the post belong to category 2" => yes/no
and so on.
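
To make the recoding concrete, here is a minimal sketch in plain Python (not Orange's own API) of how one multi-category dataset could be cut into 5 yes/no datasets, one per classifier. The toy posts, the category numbers and the names `posts` and `binary_targets` are all made up for illustration.

```python
# Hypothetical toy data: (post text, set of category numbers 1..5).
posts = [
    ("screen flickers after update", {1}),
    ("cannot log in and screen flickers", {1, 2}),
    ("battery drains overnight", {3}),
]

N_CATEGORIES = 5  # the thread mentions about 5 problem categories


def binary_targets(posts, n_categories=N_CATEGORIES):
    """Return {category: [(text, 'yes' or 'no'), ...]}, one dataset per category."""
    datasets = {}
    for cat in range(1, n_categories + 1):
        datasets[cat] = [
            (text, "yes" if cat in cats else "no") for text, cats in posts
        ]
    return datasets


datasets = binary_targets(posts)
for cat, rows in datasets.items():
    # Each of these label columns would be the class variable of one classifier.
    print(cat, [label for _, label in rows])
```

Each of the 5 resulting datasets has the same text but a different yes/no class column, so a post linked to several problems simply gets "yes" in several of them.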

Another option would be to recode multiple categories into single labels (e.g. "00001" for a post that belongs to category 1 only, "00010" for category 2, "00011" for categories 1 and 2, and so on), then proceed as with a single category classification task.
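
As a rough sketch of that recoding, again in plain Python rather than Orange's API, and with the hypothetical helper names `encode_labels` and `decode_labels`, the combined label could be built and read back like this (category 1 is the rightmost digit, matching the examples above):

```python
N_CATEGORIES = 5  # assumed number of problem categories


def encode_labels(categories, n_categories=N_CATEGORIES):
    """Turn a set of category numbers (1-based) into a label such as '00011'."""
    bits = ["0"] * n_categories
    for cat in categories:
        bits[n_categories - cat] = "1"  # category 1 is the rightmost digit
    return "".join(bits)


def decode_labels(label):
    """Turn a label such as '00011' back into the set of category numbers."""
    n = len(label)
    return {n - i for i, bit in enumerate(label) if bit == "1"}


print(encode_labels({1}))      # category 1 only
print(encode_labels({1, 2}))   # categories 1 and 2
print(decode_labels("00011"))  # back to a set of category numbers
```

Note that with 5 categories and between 1 and 4 of them per post, there can be up to 30 distinct combined labels, so some combinations may have few training examples.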

Hope this helps

Re: Text Classification problem

Postby w4k1ng » Tue Jun 03, 2014 5:07

Thanks axanthos, which classifiers should I look at and how would I set it up in Orange? (Without code)

Thanks

Re: Text Classification problem

Postby axanthos » Fri Jun 13, 2014 15:01

It seems you're already doing it:

"I am feeding the data into a few classifiers (bayes, svm) and it is just spitting out one problem category"


My suggestion is essentially to repeat this for each category, using whichever classifier gives the best results on your specific dataset (as shown, for example, by the Evaluate > Test learners widget).

