Orange Forum • A Question about Random Forests

Post by suraj_amo » Fri Aug 17, 2007 14:18

Hello,

Breiman's Random Forest implementation exposes several parameters that can be modified. The most important ones are the number of trees (N), the number of randomly selected variables for tree-building (parameter f), and the attributes used to build a forest (m).

In the Orange implementation, the number of trees is trivial to set.
It was not apparent how the other parameters can be tweaked.

How can these parameters be accessed?

Also, does the Orange implementation have a way of computing the importance of the features (this procedure is found in Breiman's implementation)?

If not, are there any recommendations for making the changes described above?

Thanks!
Suraj

Post by Blaz » Tue Aug 21, 2007 10:03

When deciding which attribute to use in a decision tree node, the Random Forest implementation in orngEnsemble randomly chooses a fixed-size subset of attributes and then selects the most informative attribute from this subset. The size of the subset is controlled by the argument "attributes". See the documentation for this and other arguments.
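To make the node-level selection concrete, here is a minimal, self-contained sketch in plain Python. It is not the orngEnsemble code: the helper names (entropy, info_gain, choose_split_attribute), the use of information gain as the "informativeness" score, and the toy data are all assumptions made for illustration. The subset_size parameter plays the role that the "attributes" argument is described as playing above.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting on the discrete attribute at index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def choose_split_attribute(rows, labels, subset_size, rng):
    """Sample subset_size candidate attributes and return the most informative one.

    Hypothetical helper: returns (best attribute index, sampled candidates).
    """
    n_attrs = len(rows[0])
    candidates = rng.sample(range(n_attrs), min(subset_size, n_attrs))
    best = max(candidates, key=lambda a: info_gain(rows, labels, a))
    return best, candidates

# Toy data (made up): attribute 0 determines the class, attributes 1 and 2 are noise.
rows = [("a", "x", "p"), ("a", "y", "q"), ("b", "x", "q"), ("b", "y", "p")]
labels = ["yes", "yes", "no", "no"]

rng = random.Random(0)
best, candidates = choose_split_attribute(rows, labels, subset_size=2, rng=rng)
print("candidates:", candidates, "chosen:", best)
```

With a small subset size, the truly informative attribute is sometimes left out of the candidates, which is what makes the individual trees differ from one another.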

I have been using Random Forests a lot lately, and they do prove the most predictive on many bioinformatics data sets (compared with, for instance, C4.5, variants of SVM, and Naive Bayes). I most often limit the depth of the trees in the forest. You can do this by setting properties of the learner (that is, of the underlying classification tree inducer), for example:

Code:
import orngEnsemble
# Forest of 100 trees; cap the depth of each underlying tree at 5
forest = orngEnsemble.RandomForestLearner(trees=100, name="forest")
forest.learner.maxDepth = 5


The above builds a random forest whose trees have a maximum depth of 5 (with the root node at depth 1, I believe).

Feature importance with random forests: we have prototype code for this, and you've just reminded me to include it in CVS (should be done in a few days; I will post a note here).

Post by suraj_amo » Tue Aug 21, 2007 13:27

Hi Blaz,

Thank you very much for the information! That was very helpful.

The importance calculation should be a great help for the Random Forests code too.

Regards,
Suraj

Post by marko » Wed Aug 22, 2007 8:52

For computing feature importance with random forests, you can use MeasureAttribute_randomForests. It is documented under orngEnsemble.

Marko
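
For readers unfamiliar with the idea behind forest-based feature importance, the sketch below illustrates the general permutation approach associated with Breiman's method: shuffle one attribute's values and measure how much accuracy drops. It is plain Python and is not taken from orngEnsemble. The predict function is a hypothetical stand-in for a trained forest, the data are synthetic, and Breiman's procedure works on the out-of-bag examples of each tree rather than on a single data set as done here.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, attr, rng, repeats=20):
    """Mean drop in accuracy after shuffling the values of column attr."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[attr] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in position attr.
        shuffled = [r[:attr] + (v,) + r[attr + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / repeats

# Hypothetical stand-in for a trained forest: it looks only at attribute 0.
def predict(row):
    return "yes" if row[0] > 0.5 else "no"

# Synthetic data: labels are generated from attribute 0; attribute 1 is noise.
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [predict(r) for r in rows]

for attr in range(2):
    print("attribute", attr, "importance:",
          permutation_importance(predict, rows, labels, attr, rng))
```

An attribute the model never uses gets an importance of zero, while shuffling an attribute the model depends on produces a clear drop in accuracy.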

