Ticket #1195 (closed bug: fixed)

Opened 2 years ago

Last modified 2 years ago

RandomForestClassifier gives C++ assertion error, followed by core dump

Reported by: holbech
Owned by: jurezb
Component: library
Severity: minor
Keywords: RandomForest pickle

Description

I have trained a RandomForestClassifier in the Orange Canvas and saved it with pickle from a "Python Script" node.

When I later unpickle the RandomForestClassifier in my own Python script and call it on an instance created with the RandomForestClassifier's own domain, I get the following error:

{{{
python: tdidt_simple.cpp:770: virtual TValue TSimpleTreeClassifier::operator()(const TExample&): Assertion `type == Regression' failed.
Aborted (core dumped)
}}}

I have attached the pickled file, but am unsure what else to include. Of course I will add any information that you might need.

(Btw. I am quite impressed by Orange.)

Attachments

RandomForest.pkl (5.6 KB) - added by holbech 2 years ago.
Screenshot.png (38.6 KB) - added by holbech 2 years ago.
dataset.tab (445.8 KB) - added by holbech 2 years ago.
dataset2.tab (246.8 KB) - added by holbech 2 years ago.
Not working.ows (23.2 KB) - added by holbech 2 years ago.
rf.pickle (7.9 KB) - added by holbech 2 years ago.

Change History

Changed 2 years ago by holbech

comment:1 Changed 2 years ago by holbech

I have worked some more on this and have managed to isolate the behaviour fairly precisely. I have uploaded a simple canvas project that exhibits the problem. The flow looks like this:

The top branch reads a data file (dataset.tab) and builds a classifier; the bottom branch just reads another data file (dataset2.tab). The "Apply_works" node applies the constructed classifier to the instances in dataset2.tab using:

{{{
# in_data and in_classifier are the inputs of the "Python Script" widget
for instance in in_data:
    print(in_classifier(instance))
}}}

This works.

The node "Apply_does_not_work" has the same inputs as "Apply_works" and a similar loop; the only difference is that the classifier is pickled and unpickled before it is used. This node fails with the error message given in my original post.

I hope this makes it easier to pursue. (I have tried looking into the sources, but I cannot quite grasp how the C++ logic is wrapped in Python, and hence how the pickling happens.)

comment:2 Changed 2 years ago by ales

  • Status changed from new to assigned
  • Owner set to jurezb

comment:3 Changed 2 years ago by jurezb

holbech,

thank you for the detailed bug report. Unfortunately, I cannot reproduce the problem: "Apply_does_not_work" runs fine on my computer (Linux). It generates a valid pickle file, then reads it, rebuilds the model and predicts the categories. The pickle file you attached does not actually contain a complete random forest model, so I'm guessing the problem lies in pickling. Can you run the following script in the same directory as dataset.tab:

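{{{
import pickle

import Orange

# Train a random forest on the reporter's data set
data = Orange.data.Table('dataset.tab')
model = Orange.ensemble.forest.RandomForestLearner(data)

# Pickle the classifier to disk, then load it back (binary file modes)
with open('rf.pickle', 'wb') as f:
    pickle.dump(model, f)
with open('rf.pickle', 'rb') as f:
    model = pickle.load(f)

# Classify the first instance with the unpickled model
print(model(data[0]))
}}}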

The script prints "product" on my machine but might crash on yours. If it does crash, please attach the generated rf.pickle file. In any case, I would like to know which operating system and which version of Orange you are using.

comment:4 Changed 2 years ago by holbech

Thank you for looking into this so quickly. The script that you gave me crashed in the same way on my machine. I am attaching the pickle file below.

My machine is a fairly unmodified Ubuntu 12.04:

{{{
$ uname -a
Linux admazely-lap1 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
}}}

I am using the latest version of Orange from the official APT repository (version 2.0b, built on 24 January this year). Python is at version 2.7.3.

Changed 2 years ago by holbech

comment:5 Changed 2 years ago by jurezb

Your version of Orange is pretty old, and pickling of SimpleTreeLearner, the base tree learner of random forests, was implemented only recently. You can try building a fresh version of Orange from source. First uninstall the old version of Orange, then issue the following commands (don't worry: they only write to /tmp/orange and should not mess up your file system):

{{{
$ cd /tmp

# pull the latest sources from our hg repository
$ hg clone https://bitbucket.org/biolab/orange

# build the C++ sources
$ cd orange/source
$ make

# tell Python where to find Orange
$ export PYTHONPATH=/tmp/orange

# run Orange Canvas
$ python /tmp/orange/Orange/OrangeCanvas/orngCanvas.pyw
}}}
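The explanation above, and the question in comment 1 about how the C++ objects get pickled, can be illustrated in plain Python. `pickle` saves only what an object's `__reduce__` (or `__getstate__`) hands it. Ordinary Python classes get this automatically from their instance `__dict__`, but types implemented in C or C++ keep their fields outside it and have to supply such a method themselves, which fits the remark that pickling of SimpleTreeLearner had only recently been implemented. The sketch below is a hypothetical pure-Python stand-in (the `SimpleTree` class and its fields are invented for illustration) and is not Orange's actual implementation.

```python
import pickle


class SimpleTree(object):
    """Hypothetical pure-Python stand-in for a tree classifier. In Orange
    the equivalent state lives in C++ objects, not in Python attributes."""

    def __init__(self, tree_type, leaf_value):
        self.tree_type = tree_type    # "classification" or "regression"
        self.leaf_value = leaf_value  # prediction returned for every example

    def __call__(self, example):
        # A real tree would descend to a leaf; this toy has a single leaf.
        # Loosely like the assertion in this ticket, a tree restored with
        # a bad type field refuses to predict.
        if self.tree_type not in ("classification", "regression"):
            raise ValueError("unexpected tree type: %r" % (self.tree_type,))
        return self.leaf_value

    def __reduce__(self):
        # Tell pickle to rebuild the object as
        # SimpleTree(tree_type, leaf_value). Anything left out of this
        # tuple is not saved at all, so an incomplete __reduce__ yields
        # an incomplete model after unpickling.
        return (SimpleTree, (self.tree_type, self.leaf_value))


tree = SimpleTree("classification", "product")
restored = pickle.loads(pickle.dumps(tree))
print(restored("any example"))  # prints: product
```

For a pure-Python class the explicit `__reduce__` is redundant, since default pickling would save `__dict__` anyway; it is written out only to show the hook that an extension type has to provide.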

comment:6 follow-up: ↓ 7 Changed 2 years ago by holbech

Yes, that works. Thank you.

I am planning to use Orange in a production setup, where multiple developers and servers need to have the exact same stable version of Orange installed at all times, and I hoped that apt could guarantee that. I guess I will have to think of something else; maybe have mercurial check out a specific version or something.

Thanks again for prompt answers.

comment:7 in reply to: ↑ 6 Changed 2 years ago by jurezb

  • Status changed from assigned to closed
  • Resolution set to fixed

> maybe have mercurial check out a specific version or something.

I think this will be your best bet. I'm closing the ticket now.
