Ticket #1243 (closed bug: fixed)

Opened 17 months ago

Last modified 9 months ago

Segmentation fault when training a randomforest learner

Reported by: blisszen
Owned by: jurezb
Milestone:
Component: library
Severity: major
Keywords: segmentation fault
Cc: blisszen
Blocking:
Blocked By:

Description (last modified by blisszen)

I installed Orange a few days ago using easy_install, so I guess it's the latest version. When training a RandomForestLearner, Python raised a "Segmentation fault" error and stopped. This crash happens on my Windows machine (Windows 7 x64) and on my Linux machine as well (sorry, I don't know much about the Linux machine). Please find attached an example script that crashes, along with the training/testing data.

Attachments

orange_crash.zip (89.7 KB) - added by blisszen 17 months ago.

Change History

Changed 17 months ago by blisszen

comment:1 Changed 17 months ago by blisszen

  • Cc blisszen added
  • Severity changed from minor to major

comment:2 Changed 17 months ago by blisszen

  • Description modified

comment:3 follow-up: ↓ 4 Changed 17 months ago by ales

  • Owner set to jurezb
  • Status changed from new to assigned

The problem seems to be in SimpleTreeLearner (in tdidt_simple.cpp, build_tree). It seems to get stuck in an infinite recursion, always passing down a data subset of the same size (20). My guess is that all of the values of the split attribute on the subset are the same, so build_tree always splits it into subsets of 0 and 20 examples.
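To illustrate the suspected failure mode, here is a minimal Python sketch. It is not the code from tdidt_simple.cpp; the helper names `split` and `naive_depth`, the attribute name `x`, and the `limit` parameter are all made up for this example. It shows that a subset whose split attribute is constant divides 0/20, and that a recursion without a progress check never shrinks the subset.

```python
def split(rows, attr, threshold):
    """Partition rows by comparing one attribute against a threshold."""
    left = [r for r in rows if r[attr] < threshold]
    right = [r for r in rows if r[attr] >= threshold]
    return left, right


def naive_depth(rows, attr, threshold, depth=0, limit=50):
    """Recurse into the larger branch with no progress check.

    `limit` exists only so this demonstration terminates; without it the
    recursion would continue until the stack is exhausted.
    """
    if len(rows) <= 1 or depth >= limit:
        return depth
    left, right = split(rows, attr, threshold)
    larger = left if len(left) > len(right) else right
    return naive_depth(larger, attr, threshold, depth + 1, limit)


# 20 examples whose split attribute is constant, as guessed above.
subset = [{"x": 1.0} for _ in range(20)]
left, right = split(subset, "x", 1.0)
sizes = (len(left), len(right))          # one side is empty
reached = naive_depth(subset, "x", 1.0)  # stops only because of `limit`
```

In this sketch the recursion ends only because of the artificial `limit`; an unbounded version would keep recursing on the same 20 examples until the stack overflows, which would be consistent with a segmentation fault in native code.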

comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 17 months ago by blisszen

Replying to ales:

The problem seems to be in SimpleTreeLearner (in tdidt_simple.cpp, build_tree). It seems to get stuck in an infinite recursion, always passing down a data subset of the same size (20). My guess is that all of the values of the split attribute on the subset are the same, so build_tree always splits it into subsets of 0 and 20 examples.

Hi Ales,

Thank you for the reply. Then, what do I need to do to resolve it? Do I have to wait until a new patch comes out?

comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 17 months ago by ales

Replying to blisszen:

Hi Ales,

Thank you for the reply. Then, what do I need to do to resolve it? Do I have to wait until a new patch comes out?

Yes, although you can use a workaround for the time being. Set the base_learner on the RandomForestLearner to be a SimpleTreeLearner with a fixed max_depth.

import Orange

# Cap the tree depth so a degenerate split cannot recurse indefinitely.
stl = Orange.classification.tree.SimpleTreeLearner(max_depth=1000)
rndfor = Orange.ensemble.forest.RandomForestLearner(training_data,
                                                    base_learner=stl)

comment:6 in reply to: ↑ 5 Changed 17 months ago by blisszen

Replying to ales:

Replying to blisszen:

Hi Ales,

Thank you for the reply. Then, what do I need to do to resolve it? Do I have to wait until a new patch comes out?

Yes, although you can use a workaround for the time being. Set the base_learner on the RandomForestLearner to be a SimpleTreeLearner with a fixed max_depth.

stl = Orange.classification.tree.SimpleTreeLearner(max_depth=1000)
rndfor = Orange.ensemble.forest.RandomForestLearner(training_data,
                                                    base_learner=stl)

Thank you for the workaround. I'll try.

comment:7 Changed 9 months ago by Ales Erjavec <ales.erjavec@…>

  • Status changed from assigned to closed
  • Resolution set to fixed

In [365165ac0244558a571a5738ffdb22d3aebe426d/orange]:

Added an explicit check for subset size reduction.

Fixes #1243
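
The commit itself is not shown in this ticket, so the following is only a rough Python sketch of what an explicit check for subset size reduction could look like. The function `build_tree`, the mean-value threshold, and the dictionary node layout are assumptions made for illustration, not the actual change to tdidt_simple.cpp.

```python
def build_tree(values):
    """Build a toy binary tree over a list of numbers.

    The split threshold is the mean of the values (an assumption for this
    sketch). A node becomes a leaf whenever a split fails to produce two
    strictly smaller subsets.
    """
    if len(values) <= 1:
        return {"leaf": True, "size": len(values)}

    threshold = sum(values) / len(values)
    left = [v for v in values if v < threshold]
    right = [v for v in values if v >= threshold]

    # Explicit progress check: if either child has as many examples as the
    # parent, recursing would never terminate, so stop here.
    if len(left) == len(values) or len(right) == len(values):
        return {"leaf": True, "size": len(values)}

    return {"leaf": False,
            "left": build_tree(left),
            "right": build_tree(right)}


def count_leaves(node):
    """Count the leaf nodes of a tree produced by build_tree."""
    if node["leaf"]:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


constant_tree = build_tree([1.0] * 20)          # degenerate 0/20 split
mixed_tree = build_tree([1.0, 2.0, 3.0, 4.0])   # splits normally
```

With the check in place, the 20 constant examples from the report become a single leaf instead of triggering unbounded recursion, while data that can actually be separated still splits as usual.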
