Postby suraj_amo » Thu May 31, 2007 22:04

Hello!

I was experimenting with pickling a 'learned' object, such as a decision tree or a random forest, so that it could be instantiated and used later. I was using the wines dataset from the UCI repository.

I found that the 'unpickled' tree did not have the correct attribute information for the branch nodes.

Original Tree :

| A7<1.400
| | A10<3.635: 2 (100.00%)
| | A10>=3.635: 3 (100.00%)
| A7>=1.400
| | A13>=725.000: 1 (100.00%)
| | A13<725.000
| | | A12<1.445: 3 (100.00%)
| | | A12>=1.445: 2 (100.00%)
| A10<3.525: 2 (100.00%)
| A10>=3.525: 1 (100.00%)

Pickled - Saved - Unpickled - Loaded tree :

| Wine<1.400
| | Wine<3.635: 2 (100.00%)
| | Wine>=3.635: 3 (100.00%)
| Wine>=1.400
| | Wine>=725.000: 1 (100.00%)
| | Wine<725.000
| | | Wine<1.445: 3 (100.00%)
| | | Wine>=1.445: 2 (100.00%)
| Wine<3.525: 2 (100.00%)
| Wine>=3.525: 1 (100.00%)

The very simple steps I used were these:

import pickle
import orange, orngTree, orngEnsemble

# read data
data = orange.ExampleTable('')

# build tree
tree = orngTree.TreeLearner(data)

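# save to file (dump the tree and close the file so the data is flushed)
file1 = file('pickler_save.txt','w')
pickler = pickle.Pickler(file1)
pickler.dump(tree)
file1.close()

# read from file
file2 = file('pickler_save.txt','r')
unpickler = pickle.Unpickler(file2)
tree_file = unpickler.load()
file2.close()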



Am I doing anything wrong?

Thank you very much,

Postby Janez » Sat Jun 02, 2007 11:55

There are two issues here. Your trees are just printed out wrong; they would actually work, if it weren't for another, more serious problem with pickling that we have been aware of for quite some time but hoped would not surface. I won't go into details, but it comes down to a certain inconsistency that we hoped wouldn't matter. The problem is that I see no way of fixing it other than redesigning a part of the kernel, which would take at least a week of work. I hope to have time this week, but I can promise nothing.

Postby suraj_amo » Mon Jun 04, 2007 15:43

Hi Janez,

Thank you for the clarification !

I did test how the trees behave, too. I did the following:

ex = orange.Example(data.domain, list(data[1]))

print tree(ex,orange.GetBoth)
> > (<orange.Value 'Wine'='1'>, <1.000, 0.000, 0.000>)

print tree_file(ex,orange.GetBoth)
> >(<orange.Value 'Wine'='2'>, <0.331, 0.399, 0.270>)

Are the different results due to the printing problem too? I was looking to build an application that can save the trees and later re-instantiate them to validate them on newer, unseen datasets.

Is there any other way you would suggest I proceed?

Thanks much for your time,

Postby Janez » Mon Jun 04, 2007 17:00

Oops, this is more than a printing problem, I'll try to fix this as soon as possible. I know a quick solution, so I hope I can do it in a few days.

Postby suraj_amo » Mon Jun 04, 2007 18:06

Thanks Janez !

Postby Janez » Wed Jun 20, 2007 0:48

I'm sorry, this is getting worse and worse the more I go into it. I'll have to put up a note that pickling is unsafe: what you get at unpickling may or may not be the same thing that you pickled.

We found out that Orange's structures are connected in such a way that they cannot always be unpickled, but we can make the problem rare and we can always detect whether the problem occurred and raise an exception in that case.

But it will require some work, and since I'm the only one who knows that part of Orange and I'm going on vacation in a few days, it will have to wait. We will probably even postpone the problem and go after other, simpler and more commonly occurring bugs first.

Sorry about that...


Postby suraj_amo » Wed Jun 20, 2007 13:37

Thank you for keeping me updated !

I will try to find a work-around to do what I need to do. :)
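
One possible work-around is to check every saved model right after pickling it: reload the bytes, compare predictions on a few known examples, and refuse to keep the file if they differ. The sketch below is written in current Python and does not use Orange; the names `checked_dumps`, `RoundTripError` and `predict`, and the dict used as a model, are hypothetical stand-ins. With an Orange classifier, `predict` would presumably just call the classifier on the example.

```python
import pickle


class RoundTripError(Exception):
    """Raised when a reloaded model disagrees with the original."""


def checked_dumps(model, predict, examples):
    """Pickle `model`, reload it, and compare predictions on `examples`.

    `predict(model, example)` must return the model's prediction.
    The pickled bytes are returned only if the reloaded copy predicts
    the same value as the original for every example.
    """
    payload = pickle.dumps(model)
    reloaded = pickle.loads(payload)
    for example in examples:
        expected = predict(model, example)
        actual = predict(reloaded, example)
        if expected != actual:
            raise RoundTripError(
                "prediction changed after unpickling: %r != %r for %r"
                % (expected, actual, example))
    return payload


# Hypothetical stand-in for a learned model: one threshold test on one
# attribute. This is an assumption for illustration, not an Orange tree.
def predict(model, example):
    return 1 if example[model["attribute"]] >= model["threshold"] else 2


model = {"attribute": "A7", "threshold": 1.4}
examples = [{"A7": 0.5}, {"A7": 2.0}]

payload = checked_dumps(model, predict, examples)
restored = pickle.loads(payload)
```

This only shows that the reloaded model agrees on the examples that were checked, so a sample covering every branch of the tree gives more confidence than a single example.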

Thanks much,

All methods have pickling problems?

Postby danielamaral » Tue Aug 21, 2007 18:27

I was wondering if this pickling problem affects all methods, or just the orngTree. Thanks!

Postby Janez » Tue Aug 21, 2007 19:24

Not all, only most. :(

It's almost solved, but we need some time to finish it, and this time we'd also like to test it more thoroughly than the first time.


