Orange Forum • DECREF Leak (somewhere!)

DECREF Leak (somewhere!)

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

Postby Feanor76 » Wed Feb 21, 2007 20:47

Hey all,

So, sometime and somewhere after the stable release of August 18, 2006 (version 0.9.63?), a DECREF leak (an extra Py_DECREF somewhere) seems to have found its way into the C code.

The line of code that fails is a call to ExampleTable.select with a newly created orngAttr that has a custom getValueFrom. The purpose here is to construct a new feature from an old feature.

Now, I believe the error comes from the fact that the orngAttr is a PythonVariable. The error does not occur every time: the program will run successfully for several iterations and then crash (a true crash, no traceback) on the ExampleTable.select call with the following error:

"Fatal Python error: deallocating None"

From some Google research, this appears to happen when a Py_DECREF(Py_None) drops None's reference count to zero. Of course, if PyObject *x = Py_None is followed by a Py_DECREF(x) without a matching Py_INCREF, that would do it too.

I did some diffing between the August release and the current stable release, but I wasn't able to come up with anything; there were too many differences for me to sort through. If this is in fact a problem with a DECREF dealing with attrs of type PythonVariable, what parts of the code should I look at? I assume pythonvars.cpp, but the DECREFs in there look safe.

Also, I sprinkled my code with:

print sys.getrefcount(None)

And the count seemed to be stable (before and after the offending ExampleTable.select call). So, could there be another problem that gives the same error? Another interesting fact: Python apparently starts up with about 1500 references to None, so if you have a DECREF leak, you have to burn through (at least) those before your program will bomb.
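The getrefcount check above can be wrapped in a small helper that calls a function many times and reports how far None's reference count moved. This is only a sketch: none_refcount_delta is a hypothetical name, not an Orange or CPython tool. It is written to run on both Python 2 and 3, but note that from CPython 3.12 on None is immortal (PEP 683), so its reported count never changes and the helper always returns 0 there.

```python
import sys


def none_refcount_delta(func, args=(), repeat=1000):
    """Call func(*args) `repeat` times; return the change in None's refcount.

    A steadily negative result hints at a stray Py_DECREF(Py_None) in the
    code being called; a positive one means references to None are being
    kept alive (e.g. stored in a container).
    """
    before = sys.getrefcount(None)
    for _ in range(repeat):
        func(*args)
    after = sys.getrefcount(None)
    return after - before


if __name__ == "__main__":
    kept = []
    # Each append stores one more reference to None, so CPython before 3.12
    # prints 100 here; on 3.12+ None is immortal and this prints 0.
    print(none_refcount_delta(kept.append, (None,), repeat=100))
```

A stable count across a single call, as observed here, does not rule out a stray DECREF that only happens on some calls; measuring over many repetitions makes that kind of drift easier to spot.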



To give you the context (although it is embedded in a LOT of other stuff, so it won't really help):


    self.orngAttr = self.myType.asOrange(self.name)
    self.orngAttr.getValueFrom = self._inner

    # For Aggregate and other PythonVariable Types
    # we have to have this odd second level of getValueFrom
    if isinstance(self.orngAttr, orange.PythonVariable):
        self.orngAttr.getValueFrom.classVar = self.orngAttr

    tTable = orange.ExampleTable([_li for _a in self.args
                                  for _li in _a.getOrngDataList()])
    self.orngData = tTable.select([self.orngAttr])

The last line is the line that bombs. All this code has been working for a long time (2 years).

Regards,
Mark

Postby Janez » Fri Feb 23, 2007 13:34

This can be nasty. I wanted to log all decrefs of None in Orange, but what if Orange steals a borrowed reference (not necessarily a direct reference to None) and puts it into, say, a list? None is then decref-ed when the list is deallocated a light year away.
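The point that the crash can surface far from the faulty code can be illustrated with a toy model in plain Python. ToyObject and ToyList are hypothetical classes that only mimic reference counting by hand; they are not part of Orange or the Python C API. A container that "steals" a borrowed reference stores the object without incrementing its count, and nothing looks wrong until the container is freed and the count reaches zero.

```python
class ToyObject(object):
    """Hypothetical stand-in for a PyObject with a manual reference count."""

    def __init__(self, name, refcnt):
        self.name = name
        self.refcnt = refcnt

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            # Mimics CPython aborting when None's count reaches zero.
            raise RuntimeError("Fatal Python error: deallocating " + self.name)


class ToyList(object):
    """Container that owns one reference per item and drops them in free()."""

    def __init__(self):
        self.items = []

    def append_owned(self, obj):
        # Correct: take our own reference before storing the object.
        obj.incref()
        self.items.append(obj)

    def append_stolen(self, obj):
        # "Steals" the caller's reference (no incref). Only correct if the
        # caller owned a new reference; with a borrowed one, the count is
        # now one short and the error surfaces later, in free().
        self.items.append(obj)

    def free(self):
        for obj in self.items:
            obj.decref()
        self.items = []


if __name__ == "__main__":
    none = ToyObject("None", 3)   # pretend three owners already hold None
    lst = ToyList()
    for _ in range(3):
        lst.append_stolen(none)   # bug: borrowed reference stored without incref
    print(none.refcnt)            # still 3, nothing looks wrong yet
    try:
        lst.free()                # the failure appears here, not at append time
    except RuntimeError as exc:
        print(exc)
```

In real C code, one example of the same mistake is passing Py_None (a borrowed reference) to PyList_SetItem, which steals a reference, without a preceding Py_INCREF.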

We'll have to narrow down the code where the bug happens. You say that it occurs when you use ExampleTable.select with a PythonVariable which has getValueFrom defined? Does it also occur if you have a PythonVariable and call its getValueFrom many times? Can you try executing for i in range(100000): tTable[0][self.orngAttr]? Or for i in range(100000): self.orngAttr.getValueFrom(tTable[0])?

Do you have time to construct a simpler script which reproduces the bug?

Thanks,
Janez

Postby Janez » Fri Feb 23, 2007 18:31

I may have fixed it. I found a bug in PythonVariable that sometimes caused Py_None to be deallocated; it was introduced in September 2006, so it is probably what bothered you. Please do a CVS update of the C sources (you still compile Orange yourself, right?) and check.

I just hope I didn't break anything else by fixing it (regression tests pass OK, but...)

Janez

Postby Feanor76 » Thu Mar 01, 2007 4:23

Janez,

Wow, thanks for looking at this so quickly. I can't verify whether your fix in the CVS sources solved my issue until Monday (March 5) at the earliest. I'll try to test it then; at worst, it will take me the week. I'll try to generate a simple script that recreates the error (if it still exists).

Regards,
Mark

Postby Feanor76 » Thu Mar 08, 2007 18:39

It appears that your fix worked. I need to test it a bit more thoroughly, but things look good on my end. Thanks again. That's ANOTHER beer I owe you.

Regards,
Mark

