Orange Forum • View topic - how to implement document classifier?

Post by guest » Tue Aug 03, 2010 22:40

I have a series of documents, each of which belongs to one of two classes. Each document contains a few thousand binary vectors. I would like to train a classifier on this data set, but all the documentation I can find describes training on tab-delimited data within a single file, so I am not sure how to go about building this classifier in Orange.

You can think of the problem as spam filtering, where the documents are emails, the elements of the documents are words, and each document is either spam or not. My case is roughly equivalent, except that instead of words, the elements of the documents are binary vectors of length ~800.

I built a very primitive classifier in pure Python, but as I am new to these techniques I don't quite trust it, so I wanted to build it with Orange. Please let me know if you have any advice, or if you could point me to an example that classifies documents based on the elements within each document. Thanks in advance!
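
One possible way to get this kind of data into the single-file, tab-delimited layout is to collapse each document into one fixed-length row before training. The sketch below is only an illustration under several assumptions: it averages a document's binary vectors position by position (which discards which bits occurred together), the names `summarize_document`, `write_tab_file`, `f0`, `f1`, ... and `label` are hypothetical, and the three header rows (names, types, flags) follow the tab format as described for Orange, which should be checked against the documentation for the installed version.

```python
import csv
import io


def summarize_document(vectors):
    """Collapse one document's binary vectors into per-position means.

    Assumption: averaging is just one possible summary; it loses
    information about which bits co-occur within a single vector.
    """
    if not vectors:
        raise ValueError("document has no vectors")
    length = len(vectors[0])
    totals = [0] * length
    for vec in vectors:
        if len(vec) != length:
            raise ValueError("all vectors must have the same length")
        for i, bit in enumerate(vec):
            totals[i] += bit
    return [t / float(len(vectors)) for t in totals]


def write_tab_file(documents, labels, out):
    """Write one row per document to a tab-delimited stream.

    Assumed header layout: row 1 = column names, row 2 = types
    ("c" continuous, "d" discrete), row 3 = flags ("class" marks the target).
    """
    rows = [summarize_document(doc) for doc in documents]
    n = len(rows[0])
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["f%d" % i for i in range(n)] + ["label"])
    writer.writerow(["c"] * n + ["d"])
    writer.writerow([""] * n + ["class"])
    for row, label in zip(rows, labels):
        writer.writerow(["%.4f" % v for v in row] + [label])


# Toy data: two documents with vectors of length 3 (real ones would be ~800).
docs = [
    [[1, 0, 1], [1, 1, 1]],
    [[0, 0, 1], [0, 1, 0], [0, 0, 0]],
]
labels = ["spam", "ham"]

buf = io.StringIO()
write_tab_file(docs, labels, buf)
tab_text = buf.getvalue()
print(tab_text)
```

To produce an actual file, an opened file object can be passed in place of `buf`. If the mean turns out to be too lossy, the same one-row-per-document layout would work with a different summary, for example counts of how often each distinct vector appears in a document.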
