Version 3 (modified by wencanluo, 3 years ago)

Initial version of the plan for MultiLabelClassification for Google Summer of Code, by Wencan Luo

Plan of Multi-label Classification Implementation

Dataset Support

Method 1: add a special prefix to each label

Add a special prefix to each class label and set the optional flag to ‘meta’. For example, suppose a data set has four labels: “Sports”, “Religion”, “Science”, and “Politics”. We can then name the corresponding attributes “_c_Sports”, “_c_Religion”, “_c_Science”, and “_c_Politics”. With this flag, we can identify and handle the labels. What needs to be changed?

Problems to be solved:

  • Whenever the code accesses the class attributes, it must search all attributes to locate those that have the prefix “_c_”
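The prefix search described above could be sketched as follows. This is a minimal illustration on plain attribute names and does not use Orange's API; the names `LABEL_PREFIX` and `split_label_attributes`, and the feature names in the sample, are hypothetical.

```python
LABEL_PREFIX = "_c_"  # prefix proposed in Method 1


def split_label_attributes(attribute_names, prefix=LABEL_PREFIX):
    """Split attribute names into (label names, ordinary attribute names).

    Label names are returned with the prefix stripped.
    """
    labels, others = [], []
    for name in attribute_names:
        if name.startswith(prefix):
            labels.append(name[len(prefix):])
        else:
            others.append(name)
    return labels, others


# Hypothetical attribute list mixing ordinary attributes and prefixed labels.
names = ["word_count", "_c_Sports", "_c_Religion", "source",
         "_c_Science", "_c_Politics"]
labels, others = split_label_attributes(names)
print(labels)  # ['Sports', 'Religion', 'Science', 'Politics']
print(others)  # ['word_count', 'source']
```

The scan is linear in the number of attributes, so if class attributes are accessed often, the result would probably be worth caching once per data set.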

Method 2: use a special attribute value

Orange also supports arbitrary attribute types, such as a list, through types derived from PythonVariable, and these can be converted to ordinary types (see the PythonVariable documentation). In this way, we can store the labels as a list and use a special converter such as getValueFrom to handle them.
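To make the conversion idea concrete, here is a minimal sketch in plain Python. It does not use PythonVariable or getValueFrom; it only shows what converting a list-valued label field to ordinary 0/1 values, and back, might look like. The function names and `ALL_LABELS` are hypothetical.

```python
ALL_LABELS = ["Sports", "Religion", "Science", "Politics"]


def labels_to_indicators(label_list, all_labels=ALL_LABELS):
    """Convert a list-valued label field into one 0/1 value per known label."""
    unknown = set(label_list) - set(all_labels)
    if unknown:
        raise ValueError("unknown labels: %s" % sorted(unknown))
    present = set(label_list)
    return [1 if label in present else 0 for label in all_labels]


def indicators_to_labels(indicators, all_labels=ALL_LABELS):
    """Inverse conversion: 0/1 values back to a list of label names."""
    return [label for label, flag in zip(all_labels, indicators) if flag]


row = labels_to_indicators(["Science", "Sports"])
print(row)                        # [1, 0, 1, 0]
print(indicators_to_labels(row))  # ['Sports', 'Science']
```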

Method 3: add a special value to each label attribute's 'attributes' dictionary

Method 4: allow multiple 'class' optional flags

Currently a tab file can have at most one 'class' flag. We could allow several attributes to be flagged as 'class'. What needs to be changed?

  • ExampleTable: add a vector to store the names of all class attributes
  • data input and output related to ExampleTable
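As a rough illustration of the input side, the sketch below collects every column flagged 'class' from a tab-delimited header. It assumes the usual three-line header (names, types, flags) and is not Orange's actual loader; `find_class_columns` and the sample data are hypothetical.

```python
def find_class_columns(tab_text):
    """Return the names of all columns whose flag is 'class'.

    Assumes a three-line header: attribute names, types, optional flags.
    """
    lines = tab_text.split("\n")
    if len(lines) < 3:
        raise ValueError("expected three header lines")
    names = lines[0].split("\t")
    flags = lines[2].split("\t")
    # Pad missing trailing flags with empty strings.
    flags += [""] * (len(names) - len(flags))
    return [name for name, flag in zip(names, flags) if flag.strip() == "class"]


SAMPLE = (
    "length\tSports\tReligion\tScience\n"
    "c\td\td\td\n"
    "\tclass\tclass\tclass\n"
    "120\t1\t0\t1\n"
)
print(find_class_columns(SAMPLE))  # ['Sports', 'Religion', 'Science']
```

The returned list plays the role of the proposed vector of class names; the output side would need the matching change so that every name in that vector is written with the 'class' flag.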

Problem-Transformation Methods

  • BR (mulan/classifier/transformation/
  • CLR (mulan/classifier/transformation/
  • LP (mulan/classifier/transformation/
  • PPT (mulan/classifier/transformation/
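Two of the transformations listed above, BR (binary relevance) and LP (label powerset), can be sketched as plain data transformations. This is a minimal illustration under stated assumptions, not the Mulan code: examples are feature tuples paired with Python sets of labels, and the function names are hypothetical.

```python
def binary_relevance(rows, label_sets, all_labels):
    """BR: build one binary data set per label.

    Returns {label: [(features, 0 or 1), ...]}.
    """
    return {
        label: [(x, int(label in y)) for x, y in zip(rows, label_sets)]
        for label in all_labels
    }


def label_powerset(rows, label_sets):
    """LP: treat each distinct label set as one class of a single-label problem.

    Returns [(features, class_name), ...]; class_name joins the sorted labels.
    """
    return [(x, "+".join(sorted(y))) for x, y in zip(rows, label_sets)]


rows = [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5)]
label_sets = [{"Sports"}, {"Science", "Politics"}, {"Sports", "Science"}]
all_labels = ["Sports", "Science", "Politics"]

br = binary_relevance(rows, label_sets, all_labels)
print(br["Science"])  # [((0.1, 0.9), 0), ((0.8, 0.2), 1), ((0.5, 0.5), 1)]
print(label_powerset(rows, label_sets))
```

Either output can then be handed to any ordinary single-label learner: BR trains one classifier per label, while LP trains a single multi-class classifier.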

Algorithm Adaptation Methods

  • ML-kNN (mulan/classifier/lazy/
  • BR-KNN (mulan/classifier/lazy/
  • MultiLabel-KNN (mulan/classifier/lazy/
  • MMP (mulan/classifier/neural/model/
  • NaiveBayes (weka.classifiers.bayes/NaiveBayes.class)
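To give a feel for the kNN-based adaptation methods, here is a deliberately simplified BR-kNN-style sketch: find the k nearest training examples once, then predict each label that more than half of them carry. It is an assumption-laden illustration, not Mulan's implementation, and it is not ML-kNN, which additionally estimates prior and posterior probabilities. The function name and data are hypothetical.

```python
def knn_label_vote(train_x, train_y, query, k=3):
    """Predict a label set for `query` from its k nearest training examples.

    A label is predicted when more than half of the neighbours carry it.
    """
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Indices of the training examples, nearest first.
    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], query))
    neighbours = order[:k]

    # Count how many neighbours carry each label.
    counts = {}
    for i in neighbours:
        for label in train_y[i]:
            counts[label] = counts.get(label, 0) + 1
    return {label for label, n in counts.items() if n > len(neighbours) / 2}


train_x = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)]
train_y = [{"Sports"}, {"Sports", "Politics"}, {"Science"},
           {"Science", "Politics"}, {"Sports"}]

print(knn_label_vote(train_x, train_y, (0.1, 0.1)))   # {'Sports'}
print(knn_label_vote(train_x, train_y, (0.95, 0.9)))  # {'Science'}
```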

Evaluation Measures

  • Hamming loss (mulan/evaluation/loss/
  • Example-based Accuracy, Precision, Recall (mulan/evaluation/measure/ExampleBased*.java)
  • Label-based (mulan/evaluation/measure/LabelBased*.java)
  • Ranking (mulan/evaluation/measure/
  • Average precision (mulan/evaluation/measure/
  • Hierarchical (mulan/evaluation/measure/
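The first two groups of measures can be sketched with Python sets, using the common definitions: Hamming loss averages the size of the symmetric difference between true and predicted label sets over examples and labels, and the example-based scores average per-example overlap ratios. The handling of empty sets below is an assumption of this sketch, and the function names are hypothetical.

```python
def hamming_loss(true_sets, pred_sets, num_labels):
    """Fraction of (example, label) pairs that are predicted wrongly."""
    wrong = sum(len(y ^ z) for y, z in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * num_labels)


def example_based_scores(true_sets, pred_sets):
    """Return example-based (accuracy, precision, recall).

    Assumed conventions: accuracy is 1 when both sets are empty;
    precision is 0 for an empty prediction, recall 0 for an empty truth.
    """
    n = len(true_sets)
    acc = prec = rec = 0.0
    for y, z in zip(true_sets, pred_sets):
        hit = len(y & z)
        acc += hit / len(y | z) if (y | z) else 1.0
        prec += hit / len(z) if z else 0.0
        rec += hit / len(y) if y else 0.0
    return acc / n, prec / n, rec / n


true_sets = [{"Sports"}, {"Science", "Politics"}]
pred_sets = [{"Sports", "Science"}, {"Science"}]

print(hamming_loss(true_sets, pred_sets, num_labels=4))  # 0.25
print(example_based_scores(true_sets, pred_sets))        # (0.5, 0.75, 0.75)
```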

Feature Selection

  • LP based (mulan/dimensionalityReduction/
  • Transformation based(../
  • Ranking (../

GUI Support

See Widget development manual


April 25 – May 23 (before the official coding period)
Discuss the details of my ideas with my mentor to reach a final agreement, including the design of the dataset support and the choice of which transformation and adaptation methods to implement. Based on the final agreement, I will write some tests to make all goals clear.
May 23 – June 18 (Official coding period starts)
Start designing the framework to support multi-label classification, including the multi-label data structures (instance, instances, attribute, evaluator, etc.). Implement the basic multi-label dataset; two problem-transformation methods, Binary Relevance (BR) and Calibrated Label Ranking (CLR); one GUI widget; and two groups of evaluation measures: example-based (Hamming loss, classification accuracy, precision, recall) and label-based.
June 18 – July 5
Finish improving the problem-transformation methods and test all the work so far to ensure it works properly. Start coding the algorithm adaptation methods: ML-kNN and the Multi-class Multi-label Perceptron (MMP).
July 6 – July 15 (Mid-term)
Finish the adaptation methods and test them to ensure they work properly. Start implementing the feature selection methods: LP based and transformation based. Submit the mid-term evaluation.
July 16 – July 31
Finish all my work on the multi-label project, then fix bugs and test. Write a document describing what we have now and what to do next.
August 1 – August 15
Buffer time for unforeseen work. If possible, I could implement more problem-transformation methods, adaptation methods, and evaluation measures. Submit the final evaluation.