Orange Forum • View topic - ReliefF does not work with TreeLearner?

ReliefF does not work with TreeLearner?

Report bugs (or imagined bugs).
(Archived/read-only, please use our ticketing system for reporting bugs and their discussion.)


Postby rmarko » Sun Apr 30, 2006 18:41

The code
Code:
tree = orngTree.TreeLearner(measure="relief")
classifier = tree(data)


produces the error message

File "C:\Python23\lib\site-packages\orange\orngTree.py", line 33, in __call__
tree = self.learner(examples, weight)
KernelException: 'orange.TreeSplitConstructor_Threshold': cannot use a measure that requires an example set or domain contingency


Is this an error or should I supply additional arguments?
Other measures like infoGain work as documented.
I have the latest snapshot.

Cheers, Marko

Postby Janez » Mon May 01, 2006 18:03

Hi Marko,

it doesn't work with continuous attributes. It's an old bug which I never added to the list and never fixed. I've added it now - do you need it soon?

(It's not that ReliefF wouldn't be able to do it, the bug is a side-effect of some optimizations in the tree induction algorithm.)

Janez

Postby rmarko » Tue May 02, 2006 14:26

Hi Janez,

it is not an urgent matter, I will work around it :)
Let's hope that we will have another version available in Orange soon.

Cheers, Marko

Postby Janez » Sun Jun 04, 2006 22:15

Marko,

I'm working on that. When Orange computes the gain ratio or similar (myopic) impurity measures for different thresholds of a continuous attribute (or for different value subsets of a multinomial discrete one), all it needs to do is move the examples from the right interval's distribution into the left one and recompute the entropy. ReliefF was programmed according to its textbook definition, which doesn't allow for such a shortcut.
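The threshold-sweeping shortcut described above can be sketched as follows. This is a minimal pure-Python illustration, not Orange's actual code: as the candidate threshold moves right, class counts are shifted from the right interval into the left one, so each split is scored by updating counts rather than rescanning all the examples.

```python
import math

def entropy(counts):
    """Entropy of a class-count distribution, in bits."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c) if n else 0.0

def best_threshold(values, labels, n_classes=2):
    """Sweep candidate thresholds over a continuous attribute.

    Instead of rebuilding both class distributions for every threshold,
    each example crossing the split is moved from the right counts into
    the left counts, and the weighted entropy is recomputed from the
    updated counts only.
    """
    pairs = sorted(zip(values, labels))
    left = [0] * n_classes
    right = [0] * n_classes
    for _, y in pairs:
        right[y] += 1
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(n - 1):
        _, y = pairs[i]
        left[y] += 1       # move one example into the left interval...
        right[y] -= 1      # ...out of the right interval
        if pairs[i][0] == pairs[i + 1][0]:
            continue       # no valid threshold between equal values
        score = ((i + 1) * entropy(left) + (n - i - 1) * entropy(right)) / n
        if score < best[0]:
            best = (score, (pairs[i][0] + pairs[i + 1][0]) / 2)
    return best
```

On a cleanly separable attribute the sweep finds the midpoint split with zero weighted entropy.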

I am now rewriting the implementation of ReliefF so that it prepares lists of reference examples and their neighbours, so it can quickly (although not as fast as impurity measures) compute ReliefF for any attribute constructed from the existing attributes. These new attributes do not participate in the computation of distances between examples.
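For illustration, here is a minimal pure-Python sketch of that idea, not the Orange implementation: nearest hits and misses are found once, using distances over the original attributes only, after which any attribute - including a newly constructed one - can be scored cheaply. A real ReliefF uses k neighbours, sampling, and range normalisation; this sketch uses a single neighbour per example and assumes attribute values in [0, 1].

```python
import math

def relieff_scorer(data, labels):
    """Precompute each example's nearest hit and nearest miss once.

    Distances use only the original attributes, so the returned scorer
    can then evaluate any newly constructed attribute without redoing
    the neighbour search.
    """
    n = len(data)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbours = []
    for i in range(n):
        hit = min((j for j in range(n) if j != i and labels[j] == labels[i]),
                  key=lambda j: dist(data[i], data[j]))
        miss = min((j for j in range(n) if labels[j] != labels[i]),
                   key=lambda j: dist(data[i], data[j]))
        neighbours.append((hit, miss))

    def score(attr_values):
        """ReliefF-style estimate for one (possibly constructed) attribute."""
        s = 0.0
        for i, (hit, miss) in enumerate(neighbours):
            s += abs(attr_values[i] - attr_values[miss])  # separates classes
            s -= abs(attr_values[i] - attr_values[hit])   # noise within a class
        return s / n
    return score
```

An informative attribute scores above zero, a noise attribute below it, without recomputing any neighbours between the two calls.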

OK, I'm writing this just to tell you that we're working on that. Since you based your earlier career on ReliefF, drop by if you have any advice I could use.

Postby Janez » Sat Jun 10, 2006 22:13

Fixed, I guess.

ReliefF should now work in tree induction on data sets with continuous attributes, too. It will even select the attribute whose binarized form has the best ReliefF score, not merely the best binarization of the attribute with the best ReliefF.

However, if you use ReliefF with discrete attributes, don't enable binarization yet. This combination used to throw an exception - the same one as in your first message. It is implemented now, but still needs some debugging. With binarization, ReliefF is computed for each possible binarization of the attribute; it will throw an exception for attributes with more than 16 values, but such attributes are not that common.
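For a discrete attribute with k distinct values there are 2**(k-1) - 1 two-way splits to score, which is why a cap at 16 values keeps the count manageable (2**15 - 1 = 32767 candidates). A small pure-Python sketch, not Orange code, that enumerates them:

```python
from itertools import combinations

def binarizations(values):
    """All distinct two-way splits of a discrete attribute's value set.

    Fixing the first value on the left side avoids counting each split
    twice as its own mirror image, giving 2**(k-1) - 1 splits in total.
    """
    values = sorted(values)
    rest = values[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {values[0], *combo}
            right = set(values) - left
            if right:  # skip the degenerate split with an empty side
                splits.append((left, right))
    return splits
```

For three values this yields 3 splits, for four values 7, doubling (plus one) with every extra value.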

