classification, gsoc, mlc, multilabel

Multi-label classification (and Multi-target prediction) in Orange

BIOLAB

Jan 09, 2012

The last summer, student Wencan Luo participated in Google Summer of Code to implement Multi-label Classification in Orange. He provided a framework, implemented a few algorithms and some prototype widgets. His work has been "hidden" in our repositories for too long; finally, we have merged part of his code into Orange (widgets are not there yet ...) and added a more general support for multi-target prediction.

You can load multi-label tab-delimited data (e.g. emotions.tab) just like any other tab-delimited data:

    >>> zoo = Orange.data.Table('zoo')            # single-target
    >>> emotions = Orange.data.Table('emotions')  # multi-label

The difference is that now zoo's domain has a non-empty class_var field, while a list of emotions' labels can be obtained through it's domain's class_vars:

    >>> zoo.domain.class_var
    EnumVariable 'type'
    >>> emotions.domain.class_vars
    <EnumVariable 'amazed-suprised',
     EnumVariable 'happy-pleased',
     EnumVariable 'relaxing-calm',
     EnumVariable 'quiet-still',
     EnumVariable 'sad-lonely',
     EnumVariable 'angry-aggresive'>

A simple example of a multi-label classification learner is a "binary relevance" learner. Let's try it out.

    >>> learner = Orange.multilabel.BinaryRelevanceLearner()
    >>> classifier = learner(emotions)
    >>> classifier(emotions[0])
    [<orange.Value 'amazed-suprised'='0'>,
     <orange.Value 'happy-pleased'='0'>,
     <orange.Value 'relaxing-calm'='1'>,
     <orange.Value 'quiet-still'='1'>,
     <orange.Value 'sad-lonely'='1'>,
     <orange.Value 'angry-aggresive'='0'>]
    >>> classifier(emotions[0], Orange.classification.Classifier.GetProbabilities)
    [<1.000, 0.000>, <0.881, 0.119>, <0.000, 1.000>,
     <0.046, 0.954>, <0.000, 1.000>, <1.000, 0.000>]

Real values of label variables of emotions[0] instance can be obtained by calling emotions[0].get_classes(), which is analogous to the get_class method in the single-target case.

For multi-label classification, we can also perform testing like usual, however, specialised evaluation measures have to be used:

    >>> test = Orange.evaluation.testing.cross_validation([learner], emotions)
    >>> Orange.evaluation.scoring.mlc_hamming_loss(test)
    [0.2228780213603148]

In one of the following blog posts, a multi-target regression method PLS that is in the process of implementation will be described.