Version 7 (modified by loop111, 3 years ago)

It is better not to store the label attributes as meta attributes: if they are kept as ordinary attributes, it is easy to set some of them as the class.

Plan of Multi-label Classification Implementation

Dataset Support

Method 1: add a special prefix to each label

Add a special prefix to each class label and set its optional flag to ‘meta’. For example, take a data set with four labels: “Sports”, “Religion”, “Science” and “Politics”. We can then name the corresponding attributes “_c_Sports”, “_c_Religion”, “_c_Science” and “_c_Politics”. With this marker, the code can tell which attributes are labels.

What needs to be changed?

Problems to be solved:

  • Whenever the code accesses the class attributes, it has to search all attributes to locate those with the prefix “_c_”. This problem can be solved by adding a flag that indicates which attributes are of class type.
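
The prefix lookup described above can be sketched in plain Python. This is only an illustration and not Orange code: attribute names are modelled as plain strings, and `LABEL_PREFIX`, `find_label_attributes` and `strip_prefix` are hypothetical names introduced here.

```python
LABEL_PREFIX = "_c_"  # hypothetical prefix marking label attributes


def find_label_attributes(attribute_names):
    """Scan all attribute names and return those carrying the label prefix."""
    return [name for name in attribute_names if name.startswith(LABEL_PREFIX)]


def strip_prefix(name):
    """Return the readable label name, e.g. '_c_Sports' -> 'Sports'."""
    return name[len(LABEL_PREFIX):] if name.startswith(LABEL_PREFIX) else name


names = ["Feature", "_c_Sports", "_c_Religion", "_c_Science", "_c_Politics"]

# Do the scan once and keep the result, so later class lookups
# do not have to search every attribute again.
label_names = find_label_attributes(names)
labels = [strip_prefix(n) for n in label_names]
print(labels)
```

Caching the scan result plays the role of the flag mentioned above: the set of label attributes is determined once instead of on every access.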

Method 2: using special Attribute value

Orange can also support arbitrary attribute types, such as a list, through descriptors derived from PythonVariable. Such values can also be converted to ordinary types; see the PythonVariable documentation for more. In this way, we can store the labels as a list and use a special converter, like getValueFrom, to deal with it.
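
The idea of a list-valued label attribute plus a converter can be sketched without Orange. The code below does not use PythonVariable or getValueFrom; it is a plain-Python stand-in, and `ALL_LABELS`, `make_label_converter` and the example data are assumptions made for illustration.

```python
ALL_LABELS = ["Sports", "Religion", "Science", "Politics"]

# Each example keeps its labels in one list-valued attribute (illustrative data).
examples = [
    {"Feature": 1, "labels": ["Sports", "Politics"]},
    {"Feature": 2, "labels": ["Science", "Politics"]},
]


def make_label_converter(label):
    """Build a converter that maps an example to a 0/1 value for one label."""
    def convert(example):
        return 1 if label in example["labels"] else 0
    return convert


# One converter per label turns the list value into ordinary binary values.
converters = {label: make_label_converter(label) for label in ALL_LABELS}
indicator_rows = [[converters[label](ex) for label in ALL_LABELS] for ex in examples]
print(indicator_rows)
```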

Method 3: adding a special value into their 'attributes' dictionary

The Variable descriptors can store additional variables; see Storing additional variables. So we can add a special entry, such as "label", to the 'attributes' dictionary with value 1.

In this way, a tab file will look like this:

Feature	Sports	Religion	SCience	Politics
d	d	d	d	d
	label=1	label=1	label=1	label=1
1	1	0	0	1
2	0	0	1	1
3	1	0	0	0
4	0	1	1	0

where 'Sports', 'Religion', 'SCience' and 'Politics' are the labels. The first example has the feature value "Feature=1" and belongs to the labels 'Sports' and 'Politics'; the second example belongs to 'SCience' and 'Politics'; the third belongs only to 'Sports'; the fourth belongs to 'Religion' and 'SCience'.
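
How the label columns could be recovered from such a file is sketched below in plain Python. This is not Orange's tab parser: `parse_multilabel_tab` is a hypothetical helper that assumes the three header rows shown above (names, types, flags) and treats a column as a label when its flag row contains "label=1".

```python
TAB_TEXT = (
    "Feature\tSports\tReligion\tSCience\tPolitics\n"
    "d\td\td\td\td\n"
    "\tlabel=1\tlabel=1\tlabel=1\tlabel=1\n"
    "1\t1\t0\t0\t1\n"
    "2\t0\t0\t1\t1\n"
    "3\t1\t0\t0\t0\n"
    "4\t0\t1\t1\t0\n"
)


def parse_multilabel_tab(text):
    """Return (label names, list of label sets per example) from tab text."""
    rows = [line.split("\t") for line in text.rstrip("\n").split("\n")]
    names, _types, flags = rows[0], rows[1], rows[2]
    # A column is a label if its flag cell carries the 'label=1' entry.
    label_idx = [i for i, flag in enumerate(flags) if "label=1" in flag.split()]
    label_names = [names[i] for i in label_idx]
    # For every data row, collect the names of the labels set to "1".
    label_sets = [[names[i] for i in label_idx if row[i] == "1"] for row in rows[3:]]
    return label_names, label_sets


label_names, label_sets = parse_multilabel_tab(TAB_TEXT)
print(label_names)
print(label_sets)
```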

What needs to be changed?

  • Getclass method

What doesn't need to be changed?

  • Input file parsers, such as the tab file parser
  • Single-label methods

Method 4: allow multiple 'class' optional flags

Currently a tab file can have at most one 'class' flag. We can allow several attributes to be flagged as 'class'.

What needs to be changed?

  • ExampleTable: add a vector to store the names of all classes
  • Data input and output related to ExampleTable

Problem-Transformation Methods

  • BR (mulan/classifier/transformation/BinaryRelevance.java)
  • CLR (mulan/classifier/transformation/CalibratedLabelRanking.java)
  • LP (mulan/classifier/transformation/LabelPowerset.java)
  • PPT (mulan/classifier/transformation/PPT.java)
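
As a rough illustration of what two of these transformations do, the sketch below shows the textbook idea behind BR and LP in plain Python. It is not a port of the Mulan classes, and the function names and the small label matrix `Y` are assumptions made for this example.

```python
def binary_relevance(label_rows):
    """BR: split the label matrix into one binary target vector per label."""
    n_labels = len(label_rows[0])
    return [[row[j] for row in label_rows] for j in range(n_labels)]


def label_powerset(label_rows):
    """LP: treat each distinct label combination as one class of a single target."""
    class_of = {}
    targets = []
    for row in label_rows:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class index
        targets.append(class_of[key])
    return targets, class_of


# Rows are examples; columns are Sports, Religion, Science, Politics.
Y = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

br_targets = binary_relevance(Y)          # four single-label binary problems
lp_targets, lp_classes = label_powerset(Y)  # one multi-class problem
print(br_targets)
print(lp_targets)
```

Each vector in `br_targets` can be handed to an ordinary single-label learner, which is why the single-label methods do not need to change.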

Algorithm Adaptation Methods

  • ML-kNN (mulan/classifier/lazy/MLkNN.java)
  • BR-KNN (mulan/classifier/lazy/BRkNN.java)
  • MultiLabel-KNN (mulan/classifier/lazy/MultiLabelKNN.java)
  • MMP (mulan/classifier/neural/model/MMPLearner.java)
  • NaiveBayes (weka.classifiers.bayes/NaiveBayes.class)

Evaluation Measures

  • Hamming loss (mulan/evaluation/loss/HammingLoss.java)
  • Example-based Accuracy, Precision, Recall (mulan/evaluation/measure/ExampleBased*.java)
  • Label-based (mulan/evaluation/measure/LabelBased*.java)
  • Ranking (mulan/evaluation/measure/RankingLoss.java)
  • Average precision (mulan/evaluation/measure/AveragePrecision.java)
  • Hierarchical (mulan/evaluation/measure/HierarchicalLoss.java)
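
Two of these measures can be sketched in plain Python using their usual definitions: Hamming loss as the fraction of label slots predicted wrongly, and example-based accuracy as the mean of |true ∩ predicted| / |true ∪ predicted| per example. This is not Mulan's implementation; the function names are hypothetical, and scoring an example with no true and no predicted labels as 1.0 is an assumption.

```python
def hamming_loss(true_rows, pred_rows):
    """Fraction of (example, label) pairs where prediction and truth differ."""
    n_labels = len(true_rows[0])
    mismatches = sum(
        t != p
        for true_row, pred_row in zip(true_rows, pred_rows)
        for t, p in zip(true_row, pred_row)
    )
    return mismatches / (len(true_rows) * n_labels)


def example_based_accuracy(true_rows, pred_rows):
    """Mean over examples of |intersection| / |union| of true and predicted labels."""
    total = 0.0
    for true_row, pred_row in zip(true_rows, pred_rows):
        inter = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        # Assumption: an example with no labels on either side counts as correct.
        total += inter / union if union else 1.0
    return total / len(true_rows)


y_true = [[1, 0, 0, 1], [0, 0, 1, 1]]
y_pred = [[1, 0, 0, 0], [0, 0, 1, 1]]

print(hamming_loss(y_true, y_pred))
print(example_based_accuracy(y_true, y_pred))
```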

Feature Selection

  • LP based (mulan/dimensionalityReduction/LabelPowersetAttributeEvaluator.java)
  • Transformation based (../MultiClassAttributeEvaluator.java)
  • Ranking (../Ranker.java)

GUI Support

See Widget development manual

ToDo List

Before May 23 (official coding time)

  • Designing the dataset support
  • Choosing which transformation and adaptation methods to implement
  • Becoming familiar with the structure of the Orange source code
  • Understanding how the Python code and the C++ code are combined

May 23 – June 18 (Official coding period starts)

  • Design the framework to support multi-label classification, including the multi-label data structures (instance, instances, attribute, evaluator, etc.)
  • Implement basic multi-label dataset support
  • Two problem-transformation methods: binary relevance (BR) and calibrated label ranking (CLR)
  • One GUI widget
  • Two kinds of evaluation measures: example-based (Hamming loss, classification accuracy, precision, recall) and label-based
  • Convert the Mulan multi-label data file format to tab files

June 18 – July 5

  • Finish the work on improving the problem-transformation methods
  • Test all work so far to ensure it works properly
  • Start coding the algorithm adaptation methods: ML-kNN and multi-class multi-label perceptron (MMP)

July 6 – July 15 (Mid-term)

  • Finish the work on the adaptation methods and test them to ensure they work properly
  • Start to implement feature selection methods: LP based and Transformation based.
  • Submit mid-term evaluation.

July 16 – July 31

  • Finish all work on the multi-label project, then test and fix bugs.
  • Write a document describing what we have now and what to do next.

August 1 – August 15

  • Buffer time for unforeseen tasks.
  • Submit final evaluation.