Orange Forum • View topic - Error creating ExampleTable from Numpy Array

Error creating ExampleTable from Numpy Array

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

Post by psederberg » Wed Sep 12, 2007 4:37

Hi Everybody:

Given that I'm already doing a lot of data manipulation with python-numpy, I figured that the easiest way for me to make use of Orange is to generate my ExampleTables on the fly instead of first writing out to file. So, I followed the tutorial on how to do this:

http://www.ailab.si/orange/doc/reference/matrix.htm

However, I think there is a problem when using the EnumVariable, which gives rise to incorrect classification and even segmentation faults. If you run the sample code provided in matrix.py, it improperly fills the class variable with #RNGE instead of the actual class:

import orange
data = orange.ExampleTable("/usr/share/doc/orange/ofb/iris.tab")
a = data.toNumarray("ac")[0]
columns = "sep length", "sep width", "pet length", "pet width"
classValues = "setosa", "versicolor", "virginica"
d4 = orange.Domain(map(orange.FloatVariable, columns),orange.EnumVariable("type", values=classValues))
t4 = orange.ExampleTable(d4, a)

In [9]: t4[0]
Out[9]: [5.100, 3.500, 1.400, 0.200, '#RNGE']

If you reuse an already-created domain, it works:

t5 = orange.ExampleTable(data.domain,a)

In [11]: t5[0]
Out[11]: [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']

The incorrect classes subsequently give rise to incorrect classification.

Finally, this bug occurs on both 32- and 64-bit Linux, running Python 2.5.1.

Thanks for any help or bug fixes,
Per

EnumVariable problem on 64-bit

Post by psederberg » Wed Sep 12, 2007 14:19

As a follow-up to my post, I've tried out various things on my 64-bit Linux machine and I'm now pretty convinced the error is in EnumVariable on 64-bit.

If I run the following code:

import orange
fruit = orange.EnumVariable("fruit", values = ["plum", "apple", "lemon"])
fruit.values

no values are set and I get <>.

This code works correctly on 32-bit linux. So, does anyone have any ideas what would cause this only on 64-bit? I know 64-bit is not supported and that I had to make some minor code changes to get it to compile on my machine, but it sure would be nice to use orange on my system.

Thanks for any help,
Per

64bit domain issue

Post by C. » Thu Sep 13, 2007 16:49

I have the same problem. On a 64-bit machine the domain is not created correctly when creating ExampleTables on the fly. The only way I could make it work is to read in the domain from a file, which is really cumbersome. Does anybody have any idea how I could use Orange on a 64-bit machine without having to read the domain from a file? It works fine on a 32-bit machine.

Thanks for any help / work-arounds!

Post by Janez » Fri Sep 14, 2007 13:33

I have a 32-bit Windows machine and cannot reproduce the bug.

There is a 64-bit dual-boot machine I could borrow to fix Orange for 64 bits, but I'd prefer to do it on Windows (I have never debugged on Linux :oops:): can anybody tell me whether this problem also occurs on 64-bit Windows with Orange compiled on that machine, not with the 32-bit binary of Orange?

Janez

Domain with EnumVariable bug

Post by psederberg » Fri Sep 14, 2007 15:19

Hi Janez:

I don't have any Windows machines, so I can't verify this for you, but since I do have 64-bit Linux machines I can try to help figure it out.

I've narrowed it down to an error in creating the domain (if I use an existing domain loaded in with data from a .tab file the EnumVariable fills correctly when I create an ExampleTable from numpy arrays.)

If I create a domain with an EnumVariable as outlined in the examples, the values for the EnumVariable are empty. So, we have a relatively small area of the code to examine.

I'll look through the code, but not having looked at it before I'm at a great disadvantage compared to you. If you can point me to precise places to test I'll be happy to try them out.

Thanks for working on this. I really would like to use orange to classify neural data, but it's hard when I can't create the ExampleTables on the fly.

Best,
Per

Post by Janez » Fri Sep 14, 2007 18:39

To narrow down the problem some more, can you check whether either of these two assignments works (check fruit.values afterwards)?

fruit.values = ["plum", "apple", "lemon"]
fruit.values = orange.StringList(["plum", "apple", "lemon"])


If not, what do you get with

print orange.StringList(["plum", "apple", "lemon"])


(If the first line works, the problem is in the assignment within the constructor. If the second is OK, we have a problem with type casting. The last one checks whether at least the StringList, the class containing the values, constructs OK.)

EnumVariable problem

Post by psederberg » Sat Sep 15, 2007 14:17

Howdy Janez:

I'm happy to report that both of the tests worked! That means that I can do all the domain creation I need by making the EnumVariable first, adding in the values, and then passing it into the Domain.

So, it looks like the error is in the constructor. If you would like for me to try out a change, just let me know and I'll recompile and test it.

Thanks for the help!
Per
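
The workaround described in this post can be sketched as below. It is only a sketch, assuming the Orange 2.x calls already used in this thread (`orange.EnumVariable`, `orange.FloatVariable`, `orange.Domain`); the helper name `build_domain` and the idea of passing the module in as an argument are illustrative and not part of Orange.

```python
def build_domain(orange, columns, class_name, class_values):
    """Build a Domain whose class variable gets its values by assignment.

    `orange` is the already-imported Orange module (an assumption of this
    sketch; it is passed in rather than imported here). Assigning `values`
    after construction avoids the keyword-argument path of the constructor,
    which is the path that failed on 64-bit builds in this thread.
    """
    class_var = orange.EnumVariable(class_name)
    class_var.values = list(class_values)  # assignment worked in Per's test
    features = [orange.FloatVariable(name) for name in columns]
    return orange.Domain(features, class_var)
```

With the names from the first post, this would be called as `d4 = build_domain(orange, columns, "type", classValues)`, followed by `orange.ExampleTable(d4, a)`.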

EnumVariable bug in 64-bit

Post by psederberg » Wed Sep 19, 2007 20:14

Hi Janez:

Though I now know how to avoid the bug in my code, there is still code in orange that creates EnumVariables on the fly, filling in the values with the constructor, and these still break on my machine.

Specifically, I'd like to make use of the MultiClass code; however, orngMultiClass.py has the following:

bin = orange.EnumVariable(name="binary",values=['0','1'])

which fails due to the same bug.

Now that we've narrowed down that the bug is in the constructor, is there a code change to the EnumVariable constructor that would fix it on 64-bit systems?

As always, I'm happy to make the change on my machine, recompile, and test to see if it works.

Thanks,
Per

Post by Janez » Fri Sep 21, 2007 13:01

Can you try this: in file source/orange/orange.cpp, line 215 (in function SetAttr_FromDict), change
int pos = 0;

to
Py_ssize_t pos = 0;


Does it change anything?

Thanks for your help,
Janez

EnumVariables

Post by psederberg » Fri Sep 21, 2007 15:49

That worked like a charm, too!

I had put the cast on another line in order to get it to compile, which is probably why I was having problems:

while (PyDict_Next(dict, (Py_ssize_t *)&pos, &key, &value)) {

I figure that cast was not the best way to do it.

Thanks for all the help and I'll fix this in my code snapshot from the website so that I can use it on my 64-bit machine.

Best,
Per
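
Why the cast compiled but still misbehaved can be illustrated outside Orange. The following is a sketch in Python's ctypes, not Orange code: it places a 4-byte integer next to an unrelated 4-byte value and then reads 8 bytes from the same address, which is roughly what happens when the address of an `int` is reinterpreted as a `Py_ssize_t *` on a 64-bit build. The struct `Frame` and the field `neighbour` are made up for the illustration; the real stack layout inside Orange is not known from this thread.

```python
import ctypes

class Frame(ctypes.Structure):
    # Hypothetical layout: a 4-byte `pos` followed by unrelated data.
    _fields_ = [("pos", ctypes.c_int32), ("neighbour", ctypes.c_int32)]

frame = Frame(pos=0, neighbour=12345)

# Reinterpret the same 8 bytes as a single 64-bit integer, the way a
# callee expecting a pointer to a 64-bit index would read them.
misread = ctypes.c_int64.from_buffer(frame).value

print("pos as written:", frame.pos)
print("pos as read through a 64-bit pointer:", misread)
print("sizeof(int) =", ctypes.sizeof(ctypes.c_int),
      "sizeof(Py_ssize_t) =", ctypes.sizeof(ctypes.c_ssize_t))
```

The value read back is no longer zero, because the neighbouring bytes become part of the index. One plausible reading of the bug, offered here as an assumption, is that PyDict_Next saw such an out-of-range starting position and returned immediately, so none of the keyword arguments were applied.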

64-bit Working!!!

Post by psederberg » Fri Sep 21, 2007 18:35

Hi Janez:

I made all the following changes to the current snapshot release and everything I have tried so far seems to work on 64-bit linux running Ubuntu:

[/opt/local/src/orange/source/orange] % grep -n Py_ssize_t *
dictproxy.cpp:193: Py_ssize_t pos = 0;
dictproxy.cpp:507: Py_ssize_t di_used;
dictproxy.cpp:508: Py_ssize_t di_pos;
lib_components.cpp:588: Py_ssize_t bufSize;
lib_components.cpp:4065: Py_ssize_t pos = 0;
lib_kernel.cpp:505: Py_ssize_t i = 0;
lib_kernel.cpp:1089: Py_ssize_t pos = 0;
lib_kernel.cpp:1167: Py_ssize_t pos=0;
lib_kernel.cpp:2082: Py_ssize_t pos=0;
lib_kernel.cpp:4464: Py_ssize_t pos = 0;
maptemplates.hpp:125: Py_ssize_t pos=0;
orange.cpp:215: Py_ssize_t pos = 0;


Thanks for all the help. Will there be some way to include these changes in the future releases, perhaps with a compile flag to switch the int to Py_ssize_t in those locations?

Best,
Per

Post by Janez » Fri Sep 21, 2007 21:42

That's great.

All indices in Python 2.5 should be of type Py_ssize_t. However, it only matters for lists that contain more than 2^31 - 1 elements, the largest value a 32-bit int can hold (which is not very likely to happen and would break on 32-bit machines anyway), or when the argument is a pointer to the index, for instance in function PyDict_Next, which is, according to your list, the only such function we use.

I will surely change this in the source code. No compile flags are necessary since Py_ssize_t is defined as int on 32-bit machines.

You may encounter some more problems in pickling (when it finally gets working) and in scatterplot or linear projection with bitmap background. Can you open the canvas, give some data (say Iris) to the scatter plot and check Show probabilities (tab Settings)? I assume it crashes, but it should be easy to fix.

The Python documentation (PEP 353) recommends manually checking every use of "int" in the code, which is insane.

Thanks,
Janez
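
The point about PyDict_Next taking a pointer to the index can be tried from Python itself. This is a minimal sketch, assuming CPython (where `ctypes.pythonapi` exposes the C API); the function name `count_items_via_c_api` is illustrative. The position is declared as `c_ssize_t`, the ctypes counterpart of `Py_ssize_t`, so the pointer handed to PyDict_Next refers to storage of the width the function expects.

```python
import ctypes

_next = ctypes.pythonapi.PyDict_Next
_next.argtypes = [
    ctypes.py_object,                  # the dict to iterate over
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *ppos
    ctypes.POINTER(ctypes.c_void_p),   # PyObject **pkey (borrowed reference)
    ctypes.POINTER(ctypes.c_void_p),   # PyObject **pvalue (borrowed reference)
]
_next.restype = ctypes.c_int

def count_items_via_c_api(d):
    """Count dict entries by walking them with PyDict_Next."""
    pos = ctypes.c_ssize_t(0)  # must start at 0, and must be Py_ssize_t-sized
    key = ctypes.c_void_p()
    value = ctypes.c_void_p()
    count = 0
    while _next(d, ctypes.byref(pos), ctypes.byref(key), ctypes.byref(value)):
        count += 1
    return count

print(count_items_via_c_api({"plum": 0, "apple": 1, "lemon": 2}))
```

The keys and values are kept as raw pointers here because the sketch only counts entries; the C code in Orange would use them as borrowed `PyObject *` references.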

