Orange Forum • View topic - Pickling Trees/Forests for re-instantiation

## Pickling Trees/Forests for re-instantiation

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

### Pickling Trees/Forests for re-instantiation

Hello!

I was playing with pickling a 'learned' decision tree or random forest object, so that it could be instantiated and used later. I was using the wine dataset from the UCI repository.

I found that the 'unpickled' tree did not have the correct attribute information for the branch nodes.

E.g.:
Original tree:

A13<875.000
| A7<1.400
| | A10<3.635: 2 (100.00%)
| | A10>=3.635: 3 (100.00%)
| A7>=1.400
| | A13>=725.000: 1 (100.00%)
| | A13<725.000
| | | A12<1.445: 3 (100.00%)
| | | A12>=1.445: 2 (100.00%)
A13>=875.000
| A10<3.525: 2 (100.00%)
| A10>=3.525: 1 (100.00%)

Tree after pickling, saving, loading and unpickling:

Wine<875.000
| Wine<1.400
| | Wine<3.635: 2 (100.00%)
| | Wine>=3.635: 3 (100.00%)
| Wine>=1.400
| | Wine>=725.000: 1 (100.00%)
| | Wine<725.000
| | | Wine<1.445: 3 (100.00%)
| | | Wine>=1.445: 2 (100.00%)
Wine>=875.000
| Wine<3.525: 2 (100.00%)
| Wine>=3.525: 1 (100.00%)

The very simple steps I used were these:
----------------------
import pickle
import orange, orngTree, orngEnsemble

data = orange.ExampleTable('wine.tab')

# build tree
tree = orngTree.TreeLearner(data)
orngTree.printTree(tree)

# save to file; close it so the pickle is flushed to disk before reading
file1 = file('pickler_save.txt', 'w')
pickler = pickle.Pickler(file1)
pickler.dump(tree)
file1.close()

# load the tree back from the file
file2 = file('pickler_save.txt', 'r')
unpickler = pickle.Unpickler(file2)
tree_file = unpickler.load()
file2.close()

# print the unpickled tree
orngTree.printTree(tree_file)

----------------------

Am I doing anything wrong?

Thank you very much,
Suraj
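
For comparison, a complete dump/load cycle with the standard `pickle` module might look like the sketch below. It is only a sketch under stated assumptions: the model is a plain nested dict standing in for a learned tree (it is not an Orange object, and its thresholds are illustrative), and the file name `model.pkl` is hypothetical. The file is opened in binary mode, closed before it is read again, and the object is recovered with `pickle.load`.

```python
import os
import pickle
import tempfile

# Stand-in for a learned model: a plain nested dict, NOT an Orange tree.
model = {
    "attr": "A13", "threshold": 875.0,
    "low": {"label": "2"},
    "high": {"label": "1"},
}

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Write in binary mode; leaving the "with" block closes and flushes the file.
with open(path, "wb") as out:
    pickle.dump(model, out, protocol=2)

# Read back in binary mode and rebuild the object.
with open(path, "rb") as src:
    restored = pickle.load(src)

print(restored == model)  # True: same content
print(restored is model)  # False: a separate object
```

A plain dict survives this round trip unchanged, so this only illustrates the mechanics of saving and loading; it says nothing about how Orange's own objects behave when pickled.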

There are two issues here. Your trees are just printed out wrong; they would actually work ... if it weren't for another, more serious problem with pickling, which we have been aware of for quite some time but hoped would never show up. I am not going into details, but it comes down to a certain inconsistency that we hoped wouldn't matter. The problem is that I see no way of fixing it other than redesigning a part of the kernel, which would take at least a week of work. I hope I'll have time this week, but I can promise nothing.

Hi Janez,

Thank you for the clarification!

I also tested whether the trees actually work. I did the following:

-----
ex = orange.Example(data.domain, list(data[1]))

print tree(ex, orange.GetBoth)
# prints: (<orange.Value 'Wine'='1'>, <1.000, 0.000, 0.000>)

print tree_file(ex, orange.GetBoth)
# prints: (<orange.Value 'Wine'='2'>, <0.331, 0.399, 0.270>)
-----

Are the different results due to the printing problem too? I am looking to build an application that can save the trees and later re-instantiate them to validate them on newer, unseen datasets.

Is there any other way you would suggest I proceed?

Suraj
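
The single-example check above can be extended to a whole set of examples. The following is a minimal sketch, assuming only that both classifiers are callables that take an example and return a prediction. The helper `disagreements` and the two stand-in classifiers are hypothetical names introduced for illustration, and the example values are invented rather than taken from the wine data.

```python
def disagreements(original, restored, examples):
    """Return (example, original_prediction, restored_prediction)
    for every example on which the two classifiers differ."""
    diffs = []
    for ex in examples:
        a, b = original(ex), restored(ex)
        if a != b:
            diffs.append((ex, a, b))
    return diffs


# Hypothetical stand-ins for the original and the unpickled classifier.
def original(ex):
    return "1" if ex["A13"] >= 875.0 else "2"


def faulty(ex):
    # Mimics a restored model that has lost its attribute information:
    # it ignores the example and always predicts the same class.
    return "2"


examples = [{"A13": 1065.0}, {"A13": 520.0}]  # illustrative values only

print(disagreements(original, original, examples))  # []
print(disagreements(original, faulty, examples))
```

An empty list means only that the two classifiers agree on the examples supplied; it does not prove that the restored model is identical to the original.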

Oops, this is more than a printing problem. I'll try to fix this as soon as possible. I know a quick solution, so I hope I can do it in a few days.

Thanks, Janez!

I'm sorry, this is getting worse and worse the further I go into it. I'll have to add a note that pickling is unsafe: what you get at unpickling may or may not be the same thing that you pickled.

We found out that Orange's structures are connected in such a way that they cannot always be unpickled, but we can make the problem rare and we can always detect whether the problem occurred and raise an exception in that case.

But it will require some work, and since I'm the only one who knows that part of Orange and I'm going on vacation in a few days, it will have to wait. We will probably even postpone the problem and go after other, simpler and more commonly occurring bugs first.

Janez
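
The idea of detecting a bad round trip and raising an exception can also be approximated from the user's side. The sketch below is not the mechanism planned for Orange, which the thread does not describe. It is an assumption-laden illustration: the model is a nested dict, `predict`, `dumps_checked` and `loads_checked` are hypothetical helpers, and the probe values are invented. The predictions of the original model on a few probe examples are stored next to the model, and loading fails loudly if the restored model no longer reproduces them.

```python
import pickle


def predict(node, example):
    """Walk a nested-dict tree down to a leaf and return its label."""
    while "label" not in node:
        branch = "low" if example[node["attr"]] < node["threshold"] else "high"
        node = node[branch]
    return node["label"]


def dumps_checked(model, probes):
    """Pickle the model together with probe examples and the
    predictions the original model makes on them."""
    expected = [predict(model, ex) for ex in probes]
    bundle = {"model": model, "probes": probes, "expected": expected}
    return pickle.dumps(bundle, protocol=2)


def loads_checked(payload):
    """Unpickle the bundle and raise ValueError if the restored model
    does not reproduce the stored predictions."""
    bundle = pickle.loads(payload)
    model = bundle["model"]
    actual = [predict(model, ex) for ex in bundle["probes"]]
    if actual != bundle["expected"]:
        raise ValueError("restored model disagrees with the original on the probes")
    return model


# Stand-in model and probes (illustrative values, not real wine data).
model = {"attr": "A13", "threshold": 875.0,
         "low": {"label": "2"}, "high": {"label": "1"}}
probes = [{"A13": 520.0}, {"A13": 1065.0}]

restored = loads_checked(dumps_checked(model, probes))
print(restored == model)  # True
```

Such a check only covers the probe examples that were stored, so it can miss damage that does not affect them.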

Thank you for keeping me updated!

I will try to find a workaround to do what I need to do.

Thanks much,
Suraj

### All methods have pickling problems?

I was wondering if this pickling problem affects all methods, or just orngTree. Thanks!

Not all, only most. :(

It's almost solved, but we need some time to finish it, and now we'd also like to test it more thoroughly than the first time.

Janez