Version 4 (modified by wencanluo, 3 years ago)
Plan of Multi-label Classification Implementation
Method 1: add a special prefix to each label
Add a special prefix to each class label and set its optional flag to ‘meta’. For example, consider a data set with four labels: “Sports”, “Religion”, “Science” and “Politics”. We can name the corresponding attributes “_c_Sports”, “_c_Religion”, “_c_Science” and “_c_Politics”. The prefix then lets the code recognise which attributes are labels.
What needs to be changed?
Problems to be solved:
- Whenever the code accesses the class attributes, it has to search all attributes to find those with the prefix “_c_”. This problem can be solved by adding a flag that indicates which attributes are class attributes.
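The prefix lookup described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Orange code: the function name `split_labels` and the sample feature names are hypothetical, and the attribute names are treated as ordinary strings.

```python
LABEL_PREFIX = "_c_"  # prefix proposed in Method 1


def split_labels(attribute_names, prefix=LABEL_PREFIX):
    """Split attribute names into (features, labels).

    Label names are returned with the prefix stripped.
    """
    features, labels = [], []
    for name in attribute_names:
        if name.startswith(prefix):
            labels.append(name[len(prefix):])
        else:
            features.append(name)
    return features, labels


# Hypothetical attribute names: two ordinary features and four labels.
names = ["length", "_c_Sports", "_c_Religion",
         "word_count", "_c_Science", "_c_Politics"]
features, labels = split_labels(names)
print(features)
print(labels)
```

Running the scan once and storing the result plays the role of the flag mentioned above: later code can use the stored label list instead of searching every attribute again.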
Method 2: use a special attribute value
Orange also supports arbitrary attribute types, such as a list, through types derived from PythonVariable, and these can be converted to ordinary types (see the PythonVariable link for more). We can therefore store the labels as a list and use a special converter such as getValueFrom to handle them.
Method 3: adding a special value into their 'attributes' dictionary
Method 4: allow multiple 'class' optional flags
Currently a tab file can have at most one 'class' flag. We can allow several attributes to be flagged as 'class'.
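As a rough sketch of what this could look like, the snippet below reads the three header lines of a tab-delimited file (names, types, flags) and collects every column flagged as 'class'. It uses only the Python standard library rather than Orange's own loader; the sample data and the function name `read_header` are made up for illustration, and accepting more than one 'class' flag is the proposed behaviour, not something the current loader does.

```python
import csv
import io

# Hypothetical tab file: line 1 = names, line 2 = types, line 3 = flags.
SAMPLE = (
    "title\tlength\tSports\tReligion\n"
    "string\tc\td\td\n"
    "meta\t\tclass\tclass\n"
    "Match report\t120\t1\t0\n"
)


def read_header(stream):
    """Return (names, types, class_names) from a tab-delimited header."""
    reader = csv.reader(stream, delimiter="\t")
    names = next(reader)
    types = next(reader)
    flags = next(reader)
    # Collect every column flagged as 'class' instead of stopping at the first.
    class_names = [n for n, f in zip(names, flags) if "class" in f.split()]
    return names, types, class_names


names, types, class_names = read_header(io.StringIO(SAMPLE))
print(class_names)
```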
What needs to be changed?
- ExampleTable: add a vector to store the names of all class attributes
- data input and output related to ExampleTable
- BR (mulan/classifier/transformation/BinaryRelevance.java)
- CLR (mulan/classifier/transformation/CalibratedLabelRanking.java)
- LP (mulan/classifier/transformation/LabelPowerset.java)
- PPT (mulan/classifier/transformation/PPT.java)
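To make two of the transformations listed above concrete, here is a minimal sketch of binary relevance (BR) and label powerset (LP) on a toy data set. It is plain Python written for illustration, not a port of the Mulan classes; the data representation (a list of `(features, label_set)` pairs), the function names and the `"+"`-joined class names are assumptions.

```python
def binary_relevance(rows, labels):
    """BR: build one binary data set per label.

    rows is a list of (features, label_set) pairs. The result maps each
    label to a list of (features, 0 or 1) pairs.
    """
    return {
        label: [(features, int(label in label_set))
                for features, label_set in rows]
        for label in labels
    }


def label_powerset(rows):
    """LP: treat each distinct label set as one class of a single-label problem."""
    return [(features, "+".join(sorted(label_set)) or "none")
            for features, label_set in rows]


# Toy data: three examples with two features each.
rows = [
    ([1.0, 0.0], {"Sports"}),
    ([0.0, 1.0], {"Religion", "Politics"}),
    ([1.0, 1.0], set()),
]
labels = ["Sports", "Religion", "Science", "Politics"]

br = binary_relevance(rows, labels)
lp = label_powerset(rows)
print(br["Sports"])
print(lp)
```

Each binary data set from BR, and the single data set from LP, can then be handed to any ordinary single-label learner.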
Algorithm Adaptation Methods
- ML-kNN (mulan/classifier/lazy/MLkNN.java)
- BR-KNN (mulan/classifier/lazy/BRkNN.java)
- MultiLabel-KNN (mulan/classifier/lazy/MultiLabelKNN.java)
- MMP (mulan/classifier/neural/model/MMPLearner.java)
- NaiveBayes (weka.classifiers.bayes/NaiveBayes.class)
- Hamming loss (mulan/evaluation/loss/HammingLoss.java)
- Example-based Accuracy, Precision, Recall (mulan/evaluation/measure/ExampleBased*.java)
- Label-based (mulan/evaluation/measure/LabelBased*.java)
- Ranking (mulan/evaluation/measure/RankingLoss.java)
- Average precision (mulan/evaluation/measure/AveragePrecision.java)
- Hierarchical (mulan/evaluation/measure/HierarchicalLoss.java)
- LP based (mulan/dimensionalityReduction/LabelPowersetAttributeEvaluator.java)
- Transformation based (../MultiClassAttributeEvaluator.java)
- Ranking (../Ranker.java)
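The example-based measures listed above (Hamming loss, accuracy, precision and recall) can be sketched as follows, using the usual set-based definitions with Y as the true label set and Z as the predicted label set of an example. This is plain Python for illustration, not the Mulan implementation; the function names are hypothetical, and scoring an empty denominator as 1.0 is an assumption made here that other implementations may handle differently.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def _ratio(numerator, denominator):
    # Assumption: an empty denominator counts as a perfect score of 1.0.
    return numerator / denominator if denominator else 1.0


def hamming_loss(true_sets, pred_sets, n_labels):
    """Mean size of the symmetric difference, divided by the number of labels."""
    return _mean(len(y ^ z) / n_labels for y, z in zip(true_sets, pred_sets))


def accuracy(true_sets, pred_sets):
    """Mean of |Y & Z| / |Y | Z| over the examples."""
    return _mean(_ratio(len(y & z), len(y | z))
                 for y, z in zip(true_sets, pred_sets))


def precision(true_sets, pred_sets):
    """Mean of |Y & Z| / |Z| over the examples."""
    return _mean(_ratio(len(y & z), len(z))
                 for y, z in zip(true_sets, pred_sets))


def recall(true_sets, pred_sets):
    """Mean of |Y & Z| / |Y| over the examples."""
    return _mean(_ratio(len(y & z), len(y))
                 for y, z in zip(true_sets, pred_sets))


# Toy example with four possible labels.
true_sets = [{"Sports"}, {"Religion", "Politics"}]
pred_sets = [{"Sports", "Science"}, {"Religion"}]

print(hamming_loss(true_sets, pred_sets, n_labels=4))
print(accuracy(true_sets, pred_sets))
print(precision(true_sets, pred_sets))
print(recall(true_sets, pred_sets))
```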
See Widget development manual
- April 25 – May 23 (Before official coding time)
- Discuss the details of my ideas with my mentor to reach a final agreement, including the design of the dataset support and the choice of transformation and adaptation methods to implement. Become familiar with the structure of the Orange source code, in particular how the Python and C++ code are combined. Based on the final agreement, write some test code to make all the goals clear.
- May 23 – June 18 (Official coding period starts)
- Start designing the framework to support multi-label classification, including the multi-label data structures (instance, instances, attribute, evaluator, etc.). Implement the basic multi-label dataset; two problem-transformation methods, binary relevance (BR) and calibrated label ranking (CLR); one GUI widget; and two families of evaluation measures: example-based (Hamming loss, classification accuracy, precision, recall) and label-based.
- June 18 – July 5
- Finish improving the problem-transformation methods and test them to make sure they work properly. Start coding the algorithm adaptation methods: ML-kNN and the multi-class multi-label perceptron (MMP).
- July 6 – July 15 (Mid-term)
- Finish the adaptation methods and test them to make sure they work properly. Start implementing the feature selection methods: LP based and transformation based. Submit the mid-term evaluation.
- July 16 – July 31
- Finish all work on the multi-label project, including bug fixing and testing. Write a document describing what has been done and what remains to do.
- August 1 – August 15
- Buffer time for unforeseen issues. If possible, I will implement more problem-transformation methods, adaptation methods and evaluation measures. Submit the final evaluation.