Orange Forum • View topic - Failing while building SVM classifier

Postby Alan Gibson » Sun Jan 07, 2007 14:21

First off, great work on Orange. I really like what I see so far, but I'm running into a bizarre problem using support vector machines. I'm trying to do relatively straightforward natural language classification, but I can't get a classifier built.

I'm using a very sparse document-term matrix stored in a CSV file as input. The CSV file itself is 25.7 MB; since it represents textual data, most of it is just zeros. There are about 4000 examples (rows) and about 5000 word attributes (columns), plus one additional column for the class, named 'cD#category'.

The data loads fine with ExampleTable. Accessing the ExampleTable's domain attribute returns all of the expected ~5000 attributes.

When I build the classifier, Python seems to have a full-scale nervous breakdown. After control returns to the Python command line (i.e. the learner has presumably finished building the classifier), the Python CLI still responds, but nothing works, not even simple print statements. Main memory is tapped out, but plenty of swap remains, so I don't think the issue is resources.

When used directly from the command line, libsvm has no problem handling this data.
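One likely reason command-line libsvm copes is that its input format is itself sparse: each line is `label index:value ...` with 1-based, ascending indices, and zero entries are simply omitted. The sketch below converts a dense CSV like the one described into that format. It assumes a comma-separated file with a header row and numeric word counts; the helper name `csv_to_libsvm` and the integer label mapping are hypothetical, added because libsvm expects numeric labels.

```python
import csv
import io


def csv_to_libsvm(csv_text, class_column):
    """Hypothetical helper: convert a dense CSV document-term matrix
    into libsvm's sparse text format.

    Returns (lines, label_ids), where label_ids maps each class name
    to the integer label written into the libsvm lines.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    class_idx = header.index(class_column)
    label_ids = {}
    lines = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # libsvm wants numeric labels, so number classes by first appearance.
        label = label_ids.setdefault(row[class_idx], len(label_ids))
        features = [v for i, v in enumerate(row) if i != class_idx]
        # Indices are 1-based and ascending; zero-valued entries are dropped.
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1)
                 if float(v) != 0.0]
        lines.append(" ".join([str(label)] + pairs))
    return lines, label_ids
```

For a file of ~4000 mostly-zero rows, each output line only carries the handful of words that actually occur in that document, which is why the libsvm file ends up far smaller than the dense CSV.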

Any ideas on what might be going on?

Thanks,
Alan

Postby Ales » Fri Mar 23, 2007 11:23

Try the latest snapshot; it should fix the problem with the output.
As far as memory is concerned, just loading a 5000x4000 file into Orange eats up 300 MB of memory for me, and the SVM makes another copy of the data for internal use (without taking the sparseness into account; I'll try to fix this some day).
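A rough back-of-envelope comparison helps show why a second dense copy hurts. The sketch below assumes 8-byte doubles per value and 4-byte integer indices for the sparse case; neither figure is Orange's actual internal layout (the thread doesn't state it), and the 1% density used in the example is a hypothetical value, not a measurement of Alan's data.

```python
def dense_vs_sparse_bytes(n_rows, n_cols, density, value_bytes=8, index_bytes=4):
    """Estimate memory for one copy of a matrix, dense vs. sparse.

    Assumptions (hypothetical): each stored value takes value_bytes,
    and the sparse form stores one index per nonzero value.
    """
    cells = n_rows * n_cols
    dense = cells * value_bytes
    nonzeros = round(cells * density)
    sparse = nonzeros * (value_bytes + index_bytes)
    return dense, sparse


# 4000 documents x 5000 words, assuming 1% of the cells are nonzero.
dense, sparse = dense_vs_sparse_bytes(4000, 5000, 0.01)
print(dense, sparse)  # 160000000 2400000
```

Under these assumptions each dense copy is about 160 MB, versus about 2.4 MB for a sparse one, so a learner that builds its own dense copy roughly doubles an already large footprint.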

Postby Ales » Fri May 18, 2007 9:07

Ales wrote: "...and the svm uses another copy of it for internal use (without taking into account the sparseness -I'll try to fix this some day)."


Fixed it. Or, to be more exact, I added a new learner, SVMLearnerSparse, that learns from the examples' meta attributes (we use meta attributes in Orange to represent sparse data sets).
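To illustrate the idea behind learning from sparse examples, here is a plain-Python sketch. It is not Orange's API and not how SVMLearnerSparse is implemented; it only shows that if each example keeps just its nonzero word values (keyed by attribute index, similar in spirit to storing values as meta attributes only where present), the kernel computations an SVM needs cost time proportional to the number of nonzeros rather than to all ~5000 attributes. The function names are hypothetical; the RBF formula exp(-gamma * |x - y|^2) is the standard one libsvm uses.

```python
import math


def to_sparse(dense_row):
    """Keep only the nonzero entries, keyed by attribute index."""
    return {i: v for i, v in enumerate(dense_row) if v != 0}


def sparse_dot(a, b):
    """Linear kernel x.y; iterates over the smaller example only."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


def sparse_rbf(a, b, gamma):
    """RBF kernel exp(-gamma * |x - y|^2), using
    |x - y|^2 = x.x + y.y - 2 x.y so that no dense vector is built."""
    sq_dist = sparse_dot(a, a) + sparse_dot(b, b) - 2 * sparse_dot(a, b)
    return math.exp(-gamma * sq_dist)


x = to_sparse([0, 2, 0, 1])  # {1: 2, 3: 1}
y = to_sparse([3, 1, 0, 1])  # {0: 3, 1: 1, 3: 1}
print(sparse_dot(x, y))      # 3
```

In a real learner the self-products x.x would typically be cached per example, since they are reused in every kernel evaluation involving that example.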

