Orange Forum • View topic - Use of One-Class SVM

Use of One-Class SVM

Postby Feanor76 » Wed Aug 29, 2012 17:59

Hi folks,

I've been away from Orange for a long time, but I'm very happy with its continued success. I'm now exploring some uses for a one-class SVM, and I can't quite get the results I expect. As a warning, my expectations may be wrong or misguided. Sample code follows this message.

When I run it and vary the value of "nu", I can see that testing on the training data classifies different proportions of the training cases as normal or outlier (incidentally, is 1.0 the "normal" label and -1.0 the "abnormal/outlier" label?). However, two handcrafted test points that aren't in the training data are both always labelled -1.0. One is chosen from the center of the training data (it should be the mean) and the other is chosen far outside the training data. Perhaps an alternative kernel choice would help?
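As a rough sanity check on that expectation, here is a minimal sketch in plain Python (no Orange involved). It relies on two assumptions that are not stated in the thread: that the learner uses an RBF kernel, and that gamma takes the LIBSVM-style default of 1 / number of features. A one-class SVM decides by the sign of sum_i alpha_i * K(x_i, x) - rho, and with nu close to 1 the alphas are close to uniform, so the plain average kernel value against the training set is used here as a crude stand-in for the weighted sum. The helper names rbf and mean_kernel are made up for this illustration.

```python
import math
import random

NUM_FTRS = 5
NUM_CASES = 1000
# Assumption: LIBSVM-style default gamma of 1 / number of features.
GAMMA = 1.0 / NUM_FTRS


def rbf(a, b, gamma=GAMMA):
    """RBF kernel exp(-gamma * ||a - b||^2) between two points."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(point, train):
    """Average kernel value between `point` and every training case."""
    return sum(rbf(point, row) for row in train) / len(train)


# Same shape of data as in the question: uniform on [0, 5] in each feature.
random.seed(0)
train = [[random.uniform(0, 5) for _ in range(NUM_FTRS)]
         for _ in range(NUM_CASES)]

center_score = mean_kernel([2.5] * NUM_FTRS, train)
far_score = mean_kernel([100.0] * NUM_FTRS, train)

print("center:", center_score)
print("far:", far_score)
```

Under these assumptions the distant point has essentially zero similarity to every training case, while the central point scores far higher. So an RBF-kernel model would be expected to separate the two points, and identical labels for both hint at a problem somewhere other than the kernel choice.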

In my real use case, the data are discrete tokens from a vocabulary. I don't know if that matters to this example.

Any comments or suggestions would be most welcome.

Thanks,
Mark

Code:
import numpy as np
import Orange
from Orange.classification import svm

def randomUniformLike(shape, low=-0.01, high=0.01):
    return np.random.uniform(low = low,
                             high = high,
                             size=shape)

NUM_FTRS = 5
NUM_CASES = 1000
trainDataArray = randomUniformLike((NUM_CASES, NUM_FTRS), 0, 5)

makeVar = Orange.feature.Continuous
vars = [makeVar('var%i' % x) for x in range(NUM_FTRS)]
d = Orange.data.Domain(vars)

trainData = Orange.data.Table(d, trainDataArray)

learnerArgs = {"svm_type" : svm.SVMLearner.OneClass,
               "nu" : .9999}
oLearner = svm.SVMLearner(**learnerArgs)
oClassifier = oLearner(trainData)

# "test" on the training data
print sum(oClassifier(case) for case in trainData) / NUM_CASES

newCase = [100.0] * NUM_FTRS
print "on an outlier:", oClassifier(newCase)

newCase = [2.5] * NUM_FTRS
print "on an inlier:", oClassifier(newCase)
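A note on reading the number printed for the training data: it is the mean of the predicted labels, not a proportion. Assuming the LIBSVM convention of +1.0 for inliers and -1.0 for outliers (Orange's SVMLearner wraps LIBSVM), the mean label m converts to an outlier fraction of (1 - m) / 2. A small sketch, with outlier_fraction as a hypothetical helper name:

```python
def outlier_fraction(labels):
    """Fraction of -1 labels in a sequence of +1 / -1 predictions.

    Assumes the LIBSVM convention: +1 = inlier, -1 = outlier.
    """
    # Convert to float so the division below is true division
    # even under Python 2.
    labels = [float(v) for v in labels]
    mean_label = sum(labels) / len(labels)
    return (1.0 - mean_label) / 2.0


# Example: 900 inliers and 100 outliers give a mean label of 0.8.
labels = [1.0] * 900 + [-1.0] * 100
print(outlier_fraction(labels))
```

For a one-class SVM this fraction, measured on the training data, should come out close to "nu" (nu is an upper bound on the fraction of training outliers), which gives a quick way to check that the labels are being read correctly.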

Re: Use of One-Class SVM

Postby Ales » Thu Aug 30, 2012 10:12

Using plain Python lists confuses the domain translation system in this case (I remember running into this problem before). In fact, I would advise against using plain lists as arguments for classifiers.

Instead, use explicit Orange.data.Instance objects built with the original training domain.
Code:
newCase = Orange.data.Instance(trainData.domain, [2.5] * NUM_FTRS)

Re: Use of One-Class SVM

Postby Feanor76 » Thu Aug 30, 2012 11:54

Ales,

Thank you! I would not have thought of that as an issue. I am very glad you knew of that problem and were able to share it with me. That could have been very hard to debug.

Best,
Mark

