Ticket #1111 (accepted wish)

Opened 3 years ago

Last modified 3 years ago

MultiTreeLearner fails on missing classes

Reported by: blaz Owned by: lanz
Milestone: Future Component: library
Severity: minor Keywords:
Cc: Blocking:
Blocked By:

Description

MultiTreeLearner fails if any of the classes contains missing values. It should work for this type of data as well. Example:

import Orange
data = Orange.data.Table("bridges.mt1.tab")
tree = Orange.multitarget.tree.MultiTreeLearner()
c = tree(data)

This throws ValueError: Data has missing class values.

Change History

comment:1 Changed 3 years ago by lanz

  • Status changed from new to accepted
  • Component changed from bioinformatics add-on to library

Well, when features have missing values it should work (and does, as far as I know). When classes have missing values it becomes more like a semi-supervised learning problem, for which there could be multiple approaches.

The wish is probably that data should be imputed automatically before learning, which sounds reasonable to me (see also #1087). It is certainly better than nothing. The only drawback of doing it automatically is that users might not know whether they are doing supervised or semi-supervised learning.
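
As a rough illustration of what imputing class values before learning could look like, here is a minimal sketch in plain Python. It does not use the Orange API: the helper name impute_class_columns, the use of None for a missing value, and the choice of the most frequent value as the fill are all assumptions made for the example.

```python
from collections import Counter

MISSING = None  # assumption: a missing class value is represented as None


def impute_class_columns(class_rows):
    """Return a copy of class_rows with missing entries replaced by the
    most frequent known value of the same class column (hypothetical helper)."""
    n_classes = len(class_rows[0])
    fills = []
    for j in range(n_classes):
        known = [row[j] for row in class_rows if row[j] is not MISSING]
        # If a column has no known values at all, leave it missing.
        fills.append(Counter(known).most_common(1)[0][0] if known else MISSING)
    return [
        [fills[j] if value is MISSING else value for j, value in enumerate(row)]
        for row in class_rows
    ]


# Made-up class values for four instances with two classes each.
rows = [["wood", "short"], [None, "short"], ["wood", None], ["steel", "long"]]
imputed = impute_class_columns(rows)
```

The original rows are left untouched, so a caller can still tell which class values were observed and which were filled in.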

Also, some data (multilabel?) have many classes, but only a few are defined for each instance. These should probably be handled differently (maybe by transforming missing values to a special value, or by some other appropriate method).
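
The "special value" idea could be sketched as follows, again in plain Python rather than the Orange API. The placeholder label "undefined", the helper name mark_missing, and None as the missing marker are assumptions for illustration only.

```python
UNDEFINED = "undefined"  # hypothetical placeholder label for an undefined class


def mark_missing(class_rows, placeholder=UNDEFINED):
    """Return a copy of class_rows where every missing (None) class value
    is replaced by an explicit placeholder category."""
    return [
        [placeholder if value is None else value for value in row]
        for row in class_rows
    ]


# Multilabel-style data: only a few of the three classes are defined per instance.
labels = [["yes", None, None], [None, "no", None]]
marked = mark_missing(labels)
```

With this transformation every class column gains one extra category, so the learner treats "not defined" as a value of its own instead of rejecting the data.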

A little off-topic: I see bridges has non-binary categorical classes... To handle these correctly, a modified splitting function should be used (entropy instead of variance). This is still on the todo list; I will open another ticket for that.
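
To show why variance is a poor fit for non-binary categorical classes, here is a small sketch comparing the two impurity measures. It is not the MultiTreeLearner splitting code; the function names and the integer encoding of the categories are assumptions for the example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def variance(values):
    """Population variance of a non-empty list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Assumed encoding of three categories as integers: a=0, b=1, c=2.
encoding = {"a": 0, "b": 1, "c": 2}
node_ab = ["a", "a", "b", "b"]
node_ac = ["a", "a", "c", "c"]

# Both nodes are an even mix of two categories, so entropy is the same.
entropy_ab = entropy(node_ab)
entropy_ac = entropy(node_ac)

# Variance depends on the arbitrary integer codes, so the nodes differ.
variance_ab = variance([encoding[v] for v in node_ab])
variance_ac = variance([encoding[v] for v in node_ac])
```

Both nodes are equally impure in terms of class mix, yet variance rates the second one four times worse purely because of how the categories happened to be numbered.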
