## How to compare classifiers across different data sets?

3 posts • Page 1 of 1

### How to compare classifiers across different data sets?

I have been using AUC as the primary measure for ranking classifiers. Within a single data set, I have been using Orange.evaluation.scoring.McNemar_of_two() to determine whether the difference between two classifiers is significant.
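For reference, McNemar's test on a single data set only needs the counts of examples where the two classifiers disagree. A minimal sketch using scipy (the function name and the chi-squared approximation with continuity correction are my own choices, not Orange's implementation):

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-squared with continuity correction) for two
    classifiers evaluated on the same examples."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    n01 = int(np.sum(a_ok & ~b_ok))   # A correct, B wrong
    n10 = int(np.sum(~a_ok & b_ok))   # A wrong, B correct
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return stat, chi2.sf(stat, df=1)  # p-value, 1 degree of freedom
```

Only the discordant pairs (n01, n10) enter the statistic; examples where both classifiers agree carry no information about which one is better.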

I have two data sets with different features, and a third which is a combination of the other two. I think ranking them by AUC is still fine (I need to find a mathematician to talk to).

How can I determine if the difference between classifiers, across different data sets, is significant?

### Re: How to compare classifiers across different data sets?

A couple of papers claim that the Wilcoxon signed-rank test is reasonable for this. The method Orange.evaluation.scoring.AUCWilcoxon() looks close. To perform the test, is it just a matter of collating the AUC scores from AUCWilcoxon() for each iteration (10-fold cross-validation) for each classifier, and then using scipy.stats.wilcoxon() to obtain the p-value?
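If so, the collation step might look something like this (the per-fold AUC values below are invented for illustration; in practice they would come from AUCWilcoxon() or similar for each fold):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold AUCs from 10-fold cross-validation
auc_a = np.array([0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.94, 0.90, 0.91])
auc_b = np.array([0.85, 0.86, 0.89, 0.84, 0.88, 0.87, 0.83, 0.90, 0.86, 0.85])

# Paired two-sided Wilcoxon signed-rank test on the fold-by-fold scores
stat, p = wilcoxon(auc_a, auc_b)
```

Note that scipy.stats.wilcoxon() takes the two paired samples (or their differences) directly, not precomputed statistics; with only 10 folds the test has limited power, so a non-significant result is weak evidence either way.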

### Re: How to compare classifiers across different data sets?

After discussion with the resident "math" guru in our lab, here is what I am doing. Hopefully it will be of some help to others, should they need to compare classifiers across different data sets (populations).

1. Use Orange.evaluation.scoring.AUCWilcoxon() for each classifier. As I understand this function, it produces a Wilcoxon statistic, i.e. a value that can be used in the Wilcoxon signed-rank test.

2. Given the paired scores for the two classifiers, I can then use scipy.stats.wilcoxon() to obtain the p-value.

Michael.

--

Note: Posting as a separate message in case others are tracking the thread, as I am not sure whether edits trigger email notifications.
