Orange Forum • View topic - Random Forest

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)

Random Forest

Post by Guest » Sat Oct 22, 2005 15:44

Hello,

I installed orange-source-snap-2005-10-21 (I get the same problems with previous snapshots and with orange-source-0.9.61).

The ensemble2.py example for random forest learning works fine with the bupa.tab dataset, but fails with some other datasets (such as lenses.tab). The error message is:

>>> results = orngTest.crossValidation(learners, data, folds=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "orngTest.py", line 160, in crossValidation
    return apply(testWithIndices, (learners, (examples, weight), indices, indicesrandseed, pps), argkw)
  File "orngTest.py", line 366, in testWithIndices
    cr = classifiers[cl](ex, orange.GetBoth)
  File "orngEnsemble.py", line 217, in __call__
    a = [x for x in c(example, orange.GetProbabilities)]
TypeError: iteration over non-sequence


If I train a random forest directly as a classifier, i.e.:
>>> forest = orngEnsemble.RandomForestLearner(data, trees=50, name="forest")

training works fine, but when I ask for predictions I get errors for some instances, for example:

>>> forest(data[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "orngEnsemble.py", line 229, in __call__
    cfreq[int(c(example))] += 1
TypeError: value of 'lenses' is unknown and cannot be cast to an integer

but not for every instance; for example:
>>> forest(data[3])
<orange.Value 'lenses'='hard'>

I can't see any differences between data[0] and data[3] in lenses.tab.
Similar behaviour occurs for other datasets with discrete attributes that I created.
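
To find out exactly which instances fail, one option is to loop over the data and record every index where the call raises a TypeError. The sketch below is written in current Python, not the Python 2.3 used in this thread. The names find_failing_instances, stub_forest and BROKEN are hypothetical, and the stub only stands in for a trained forest so that the snippet runs without Orange.

```python
def find_failing_instances(classifier, instances):
    """Return (index, error message) pairs for instances the classifier rejects."""
    failures = []
    for index, instance in enumerate(instances):
        try:
            classifier(instance)
        except TypeError as err:
            # Record the failure and keep going instead of stopping at the first one.
            failures.append((index, str(err)))
    return failures


# Hypothetical stand-in for a trained forest: it refuses a fixed set of inputs.
BROKEN = {"a", "c"}


def stub_forest(instance):
    if instance in BROKEN:
        raise TypeError("value is unknown and cannot be cast to an integer")
    return "hard"


sample = ["a", "b", "c", "d"]
failing = find_failing_instances(stub_forest, sample)
failing_indices = [index for index, _ in failing]
print(failing_indices)
```

With Orange installed, the same helper could presumably be called as find_failing_instances(forest, data), using the forest and data objects from the session in this post; that call is an assumption and was not tested against Orange.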

Oh, and orngEnsemble.py contains the header of a function whose body is empty (line 9):

def boostrapSample(examples, rand, returnOutOfBag=0):

(I commented it out.)


Thanks for any help

Re: Random Forest

Post by Brett » Fri Dec 09, 2005 19:26

I am having the same problem with the crossValidation line of ensemble2.py. It works on bupa, but I get the same error reported in the first post when I apply it to other data sets (adult_sample.tab, cars.tab, galaxy.tab). I haven't yet found another data set it works on.

Thanks for any suggestions.

Post by Blaz » Sun Feb 19, 2006 11:30

This bug should now be fixed. The problem was in Orange's tree classifier. To speed it up for random forests, we made some changes that affected how trees handle cases where there is no data for some (required) attribute-value combination (so-called null leaves). This has now been fixed. I tried it on lenses and some other smaller data sets. Let us know if you have any more problems of this type. [For random forests to work in the cases you reported, download the latest snapshot of Orange.]
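
As an illustration of why one tree without an answer can break the whole forest, here is a minimal sketch in plain Python. It is not Orange's code and not the fix described in this post (that fix was made in the tree classifier itself). The functions naive_vote and tolerant_vote are hypothetical, and a null leaf is modelled simply as a tree returning None.

```python
def naive_vote(tree_predictions, n_classes):
    """Count votes per class index; fails if any tree returned None."""
    cfreq = [0] * n_classes
    for prediction in tree_predictions:
        cfreq[int(prediction)] += 1  # int(None) raises TypeError
    return cfreq.index(max(cfreq))


def tolerant_vote(tree_predictions, n_classes):
    """Same vote, but trees that gave no answer (None) are skipped."""
    cfreq = [0] * n_classes
    for prediction in tree_predictions:
        if prediction is None:
            continue
        cfreq[prediction] += 1
    if sum(cfreq) == 0:
        return None  # no tree could classify the instance
    return cfreq.index(max(cfreq))


# One of the four trees ends in a "null leaf" and returns no class.
predictions = [1, None, 1, 0]

try:
    naive_vote(predictions, n_classes=2)
    naive_failed = False
except TypeError:
    naive_failed = True

print(naive_failed)
print(tolerant_vote(predictions, n_classes=2))
```

The sketch only shows the failure mode: the instances that happen to reach a null leaf in at least one tree are the ones that raise an error, which would be consistent with some instances working and others not.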

problem after snapshot

Post by Brett » Mon Feb 20, 2006 16:04

Hi,

I downloaded the 2006-02-17 snapshot and installed it (python setup.py install as root), but I'm still getting similar errors on lenses.tab. I'm not sure what might be wrong.

Thanks, Brett

[ml-brett:RandomForest]% python ensemble2.py
Traceback (most recent call last):
  File "ensemble2.py", line 22, in ?
    results = orngTest.crossValidation(learners, data, folds=10)
  File "/usr/lib/python2.3/site-packages/orange/orngTest.py", line 161, in crossValidation
    return apply(testWithIndices, (learners, (examples, weight), indices, indicesrandseed, pps), argkw)
  File "/usr/lib/python2.3/site-packages/orange/orngTest.py", line 367, in testWithIndices
    cr = classifiers[cl](ex, orange.GetBoth)
  File "/usr/lib/python2.3/site-packages/orange/orngEnsemble.py", line 214, in __call__
    a = [x for x in c(example, orange.GetProbabilities)]
TypeError: iteration over non-sequence
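
When a fresh install still shows the old behaviour, it can help to confirm which copy of a module Python actually imports and when that file was last modified. The following is a generic sketch using only the standard library of current Python (the thread itself used Python 2.3). The helper module_origin is a hypothetical name, and it is demonstrated on the standard module json so that it runs without Orange.

```python
import importlib
import os
import time


def module_origin(module_name):
    """Return (absolute file path, last-modified timestamp) of an importable module."""
    module = importlib.import_module(module_name)
    path = os.path.abspath(module.__file__)
    modified = time.strftime(
        "%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path))
    )
    return path, modified


# Demonstrated on a standard-library module; substitute the module under suspicion.
path, modified = module_origin("json")
print(path, modified)
```

A modification date older than the announced fix would suggest that the installed files predate it.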

Post by Blaz » Mon Feb 20, 2006 16:12

My mistake: I posted this note too early. The fix went into CVS only yesterday, and since the Windows snapshots are generated at midnight, you should take the snapshot from Sunday (that is, orange-win-snap-2006-02-20.exe). Sorry about this.

Post by Brett » Mon Feb 20, 2006 16:18

I'm using Linux. Will the changes be reflected in one of these snapshots? Thanks

Post by Janez » Mon Feb 20, 2006 21:15

Brett, you can build orange from the sources at the bottom of http://www.ailab.si/orange/downloads.asp. Random forests should work.

It seems that Linux binaries haven't compiled since 02/17 and the sources in there are old as well. I'll check what's wrong tomorrow.

