## Area under ROC in multivalued class problems

### Area under ROC in multivalued class problems

Hi,

I am relatively new to Orange, so any help would be appreciated. I noticed that Orange's default method for computing the area under the ROC curve is ByWeightedPairs. I also found a Python script in the documentation that helped me understand how the area under ROC is computed when the testing method is CrossValidation. But I couldn't figure out how ByWeightedPairs works when the testing method is LeaveOneOut. Each fold in this testing method contains only one item, so pairing is not possible.

Is ByWeightedPairs not actually the default method for computing the area under the ROC curve, or does ByWeightedPairs behave differently when LeaveOneOut testing is used?

Thanks in advance

### Re: Area under ROC in multivalued class problems

ByWeightedPairs is used when the class variable has more than two values (as in the iris dataset). It specifies that AUC is computed for each pair of class values (such as setosa vs. versicolor, versicolor vs. virginica, and setosa vs. virginica), and a weighted sum of the computed AUC values is returned.
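To make the pairwise idea concrete, here is a minimal sketch in plain Python. It is not Orange's actual code. The function names are hypothetical, and two things are assumptions on my part: each pair is weighted by the product of the two class frequencies, and within a pair the examples are ranked by the predicted probability of the first class.

```python
from collections import Counter
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_by_weighted_pairs(labels, probs):
    """labels: true class per example; probs: one dict (class -> probability) per example.

    Assumption: pair weights are products of class frequencies, normalized to sum to 1.
    """
    counts = Counter(labels)
    n = len(labels)
    total = 0.0
    weight_sum = 0.0
    for a, b in combinations(sorted(counts), 2):
        # Ignore examples of all other classes; rank the rest by P(class a).
        pos = [p[a] for y, p in zip(labels, probs) if y == a]
        neg = [p[a] for y, p in zip(labels, probs) if y == b]
        weight = (counts[a] / n) * (counts[b] / n)
        total += weight * binary_auc(pos, neg)
        weight_sum += weight
    return total / weight_sum


# Toy three-class data (made up for illustration): the classifier puts the
# highest probability on the true class, so every pairwise AUC is 1.
labels = ["a", "a", "b", "b", "c", "c"]
probs = [{c: (0.8 if c == y else 0.1) for c in "abc"} for y in labels]
result = auc_by_weighted_pairs(labels, probs)
```

For a three-class problem such as iris, the loop visits exactly the three pairs listed above.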

In cross-validation, AUC is computed for each fold independently and the average score is returned. (If the dataset has a multivalued class variable, the method from the first paragraph is used within each fold.) As you correctly noticed, a per-fold AUC cannot be computed with leave-one-out validation, because each fold contains only one example. In that specific case, Orange combines the results from all folds and computes AUC on the pooled results.
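A rough sketch of that fallback for the two-class case, again in plain Python and not taken from Orange's source. The names are hypothetical, and the condition used here (pool whenever some fold lacks a positive or a negative example) is my simplification of the check Orange performs.

```python
def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def split_by_class(examples):
    """examples: list of (is_positive, score) tuples."""
    pos = [score for is_pos, score in examples if is_pos]
    neg = [score for is_pos, score in examples if not is_pos]
    return pos, neg


def auc_over_folds(folds):
    """Average the per-fold AUC when every fold allows it; otherwise pool all folds."""
    per_fold = [split_by_class(fold) for fold in folds]
    if all(pos and neg for pos, neg in per_fold):
        return sum(binary_auc(pos, neg) for pos, neg in per_fold) / len(per_fold)
    # Some fold cannot be scored on its own (for example, leave-one-out),
    # so merge the results from all folds and compute a single AUC.
    pooled = [example for fold in folds for example in fold]
    pos, neg = split_by_class(pooled)
    return binary_auc(pos, neg)


# The same four made-up predictions, arranged as two folds and as leave-one-out.
two_folds = [[(True, 0.9), (False, 0.2)], [(True, 0.4), (False, 0.5)]]
leave_one_out = [[(True, 0.9)], [(False, 0.2)], [(True, 0.4)], [(False, 0.5)]]

averaged = auc_over_folds(two_folds)      # (1.0 + 0.0) / 2 = 0.5
pooled = auc_over_folds(leave_one_out)    # 3 of 4 pairs ranked correctly = 0.75
```

Note that the two strategies can give different numbers on the same predictions, so scores from leave-one-out and from k-fold cross-validation are not directly comparable.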

If you are interested in where this happens in the AUC code, see https://bitbucket.org/biolab/orange/src/c4f002bce47b/Orange/evaluation/scoring.py#cl-1725. When a fold contains only one example, is_CDT_empty(cdts[0]) returns True, and the code below it computes AUC using all of the examples.

I hope this answers your question.

Anže