Orange Forum • View topic - Issue with classification

Issue with classification

A place to ask questions about methods in Orange and how they are used and other general support.

Postby billyb » Mon Sep 10, 2012 22:07

Hi,

I've created and saved a Random Forest classifier for predicting a discrete string variable (5 classes) on a .tab file with 300,000 records and 25 variables.

Classifying the original dataset with my saved classifier works fine, but I have trouble using it on data from a new period (of similar size and exactly the same structure). A small sample of 20 records runs fine, while anything larger (e.g. 10,000 records) immediately yields a "pythonw.exe has stopped working" error, with no other errors logged in the output window.

My workflow is very simple: an input File widget and a Load Classifier widget, both connected to the Predictions widget.

I find it hard to believe that size would be an issue, since there was no problem with the original dataset. Hardware should not be an issue either (workstation-spec machine with 8 GB RAM, Windows Server 2008 R2).

Am I missing something really obvious? Any suggestions will be appreciated.

Re: Issue with classification

Postby Ales » Tue Sep 11, 2012 14:41

I am not sure what the problem could be (it's probably not the size). First, we should determine whether the problem is with the classifier or with the Predictions widget itself.

Can you add a Python Script widget to the canvas and connect it to the 'Load Classifier' widget and the File widget (in place of the Predictions widget)? Then copy/paste and execute
Code:
# Classify every instance; a crash here would implicate the classifier itself.
for inst in in_data:
    p = in_classifier(inst, 1)
print("Success")

If this succeeds (does not crash) then the problem is in the Predictions widget and not the classifier.
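
If the loop does take the interpreter down instead of raising an exception, nothing is printed and there is no traceback to inspect. A minimal sketch of one way to narrow that down is to record progress on disk before each batch of instances, so that the last line in the log shows roughly where the crash happened. The helper name `classify_with_progress`, its parameters, and the log file name are hypothetical and not part of Orange; the classifier is treated as any callable that takes one instance.

```python
import os


def classify_with_progress(classifier, data, log_path, every=1000):
    """Classify each instance, recording progress in a log file.

    Hypothetical helper, not an Orange API: `classifier` is any callable
    that takes a single instance, `data` is any iterable of instances.
    """
    results = []
    with open(log_path, "w") as log:
        for i, inst in enumerate(data):
            if i % every == 0:
                # Flush and fsync so the line reaches the disk even if the
                # process dies before the file is closed normally.
                log.write("about to classify instance %d\n" % i)
                log.flush()
                os.fsync(log.fileno())
            results.append(classifier(inst))
        log.write("finished %d instances\n" % len(results))
    return results
```

In the Python Script widget this could presumably be called as `classify_with_progress(lambda inst: in_classifier(inst, 1), in_data, "progress.log")`, using the `in_data` and `in_classifier` names from the snippet above. Lowering `every` gives a more precise location at the cost of more disk writes.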

Re: Issue with classification

Postby billyb » Tue Sep 11, 2012 15:28

Thank you for the quick reply.

Yes, the script finished with "Success".

Re: Issue with classification

Postby Ales » Thu Sep 13, 2012 11:32

Sorry, but I can't reproduce the problem.
Can you provide an example workflow and dataset that trigger it (if they are not confidential)?

