Ticket #795 (closed task: wontfix)

Opened 3 years ago

Last modified 3 years ago

Multiclass learner wrapper (contrib)

Reported by: yang
Owned by: matija
Milestone: 2.5
Component: library
Severity: minor
Keywords:
Cc:
Blocking:
Blocked By:

Description

A learner that wraps an existing learner to handle multi-class prediction by building one binary (one-vs-rest) classifier per class value. It is somewhat similar to orngMultivariatePrediction, but is easier to use and more complete (that module's MultiClassPrediction does not really behave like an Orange classifier).

import orange

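def transform_classvar(table, new_values, xformer):
  """Return a copy of table whose class variable takes new_values.

  xformer maps each original class value to one of new_values.
  """
  classvar = table.domain.classVar
  newclassvar = orange.EnumVariable(name=classvar.name, values=new_values)

  # Keep every attribute except the original class variable
  newattrs = [a for a in table.domain if a != classvar]

  newdomain = orange.Domain(newattrs, newclassvar)

  # Carry the meta attributes over to the new domain
  for meta_id, meta in table.domain.getmetas().iteritems():
    newdomain.addmeta(meta_id, meta)

  newtable = orange.ExampleTable(newdomain)
  for row in table:
    # Attribute values are copied as they are; only the class value is transformed
    newrow = orange.Example(
      newdomain,
      [v.value for v in row if v.variable != classvar] +
      [xformer(row[classvar].value)])

    for meta_id, val in row.getmetas().iteritems():
      newrow.setmeta(meta_id, val.value)

    newtable.append(newrow)

  return newtable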

class MultiClassLearner(object):
  """
  A learner that wraps around an existing learner for learning multi-class
  predictions, basically by building a binary classifier for each class value.
  This is somewhat similar to orngMultivariatePrediction, but is
  easier to use and more complete (its MultiClassPrediction isn't really
  orange-classifier-like).
  """
  
  def __init__(self, base_learner, name='multiclass', **kwargs):
    self.base_learner = base_learner
    self.name = name
    self.__dict__.update(kwargs)
    
  def __call__(self, data, weight=0):
    # Train one binary (True/False) classifier per class value
    classifiers = []
    for klass in data.domain.classVar.values:
      table = transform_classvar(data, ['True', 'False'], lambda val: str(val == klass))
      classifiers.append(self.base_learner(table, weight))
      
    return MultiClassClassifier(name=self.name, classifiers=classifiers, classVar=data.domain.classVar)
  
class MultiClassClassifier(object):
  def __init__(self, **kwargs):
    self.__dict__.update(kwargs)
    
  def __call__(self, x, what=orange.GetValue):
    # Get probabilities from each binary classifier
    probs = [c(x, orange.GetProbabilities) for c in self.classifiers]
    
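    # Keep only the probability of 'True' from each binary classifier.
    # These per-class values are not normalized, so they need not sum to 1.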
    probs = [t for t, _ in probs]
    
    if what == orange.GetProbabilities:
      return probs
    else:
      klass = self.classVar[probs.index(max(probs))]
      return klass if what == orange.GetValue else (klass, probs)
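
The one-vs-rest scheme used by MultiClassLearner can also be sketched without Orange. The following is only a minimal illustration in plain Python, not part of the submitted code: `centroid_learner` is a hypothetical toy base learner, `one_vs_rest_learner` and its `normalize` flag are made-up names, and the sketch assumes numeric attributes and at least two classes, each with at least one example.

```python
def centroid_learner(rows, labels):
    """Hypothetical toy binary learner (an assumption for illustration).

    labels are booleans. The returned classifier scores a row by how much
    closer it lies to the centroid of the True rows than to the False rows.
    """
    def centroid(flag):
        members = [r for r, l in zip(rows, labels) if l == flag]
        return [sum(col) / len(members) for col in zip(*members)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    pos, neg = centroid(True), centroid(False)

    def classify(row):
        d_pos, d_neg = dist(row, pos), dist(row, neg)
        total = d_pos + d_neg
        # A score near 1 means the row is close to the True centroid
        return 0.5 if total == 0 else d_neg / total

    return classify


def one_vs_rest_learner(base_learner, rows, labels):
    """Train one binary classifier per class value (one-vs-rest)."""
    classes = sorted(set(labels))
    # For each class, relabel the data as "is this class" / "is not"
    classifiers = [base_learner(rows, [l == c for l in labels])
                   for c in classes]

    def classify(row, normalize=False):
        scores = [clf(row) for clf in classifiers]
        if normalize:
            # Optional: rescale the independent scores so they sum to 1
            total = sum(scores)
            if total > 0:
                scores = [s / total for s in scores]
        best = classes[scores.index(max(scores))]
        return best, scores

    return classify


# Tiny made-up data set with three classes
rows = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.0, 5.0], [0.2, 5.1]]
labels = ['a', 'a', 'b', 'b', 'c', 'c']
predict = one_vs_rest_learner(centroid_learner, rows, labels)
label, scores = predict([0.1, 0.1])
```

As in MultiClassClassifier, the predicted class is the one whose binary classifier gives the highest "true" score. Because each binary classifier is trained independently, the raw scores need not sum to 1; the optional normalization shown here is one possible way to turn them into something distribution-like, and is not something the submitted code does.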

Change History

comment:1 Changed 3 years ago by janez

  • Status changed from new to assigned
  • Owner changed from janez to matija

comment:2 Changed 3 years ago by matija

Thank you for your contribution.

This year, as part of a 'Google Summer of Code' project, Orange is on its way to getting a full multi-label classification framework. Therefore, I will not include your code in the Orange distribution, but we will surely take a careful look at it to see if we can gain any ideas from it.

comment:3 Changed 3 years ago by matija

  • Status changed from assigned to closed
  • Resolution set to wontfix

I've sent your contribution to the student working on MLC implementation; he's read it and has learnt from it what he could.
