Problem with labels returned by classifier - #RNGE label

A place to ask questions about methods in Orange and how they are used and other general support.

Problem with labels returned by classifier - #RNGE label

Post by michaelpl » Wed May 18, 2011 11:57

Hi,

I wrote the following short piece of code:
Code:
import orange
import numpy as np
from sets import Set

#prepare train data:
vals = []
etiqs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
for i in range(12):
    vals.append([2 * i, 4 - i / 10.0, 5.7])

print "vals:", vals
print "etiqs:", etiqs

#train orange:
data = map(lambda a: a[0] + [a[1]], zip(vals, etiqs))
#make numpy array
num_data = np.array(data)
#name columns:
columns = tuple(["col" + str(i) for i in range(len(vals[0]))])  # ("col0", "col1", "col2")
classValues = tuple(map(lambda a: str(a), list(Set(etiqs))))  # ("1", "2", "3", "4")
domain = orange.Domain(map(orange.FloatVariable, columns),
       orange.EnumVariable("type", values=classValues))
exmpl_data = orange.ExampleTable(domain, num_data)

tab = orange.ExampleTable(domain, exmpl_data)
classif_orang = orange.kNNLearner(tab, 1)

#test:
for i in range(20):
    j = [2 * i, 4 - i / 20.0, 4.2]
    print "classiffying element:", j
    print "orange:", classif_orang(orange.Example(domain, j+[0]))


Which gives the following results:
Code:
vals: [[0, 4.0, 5.7000000000000002], [2, 3.8999999999999999, 5.7000000000000002], [4, 3.7999999999999998, 5.7000000000000002], [6, 3.7000000000000002, 5.7000000000000002], [8, 3.6000000000000001, 5.7000000000000002], [10, 3.5, 5.7000000000000002], [12, 3.3999999999999999, 5.7000000000000002], [14, 3.2999999999999998, 5.7000000000000002], [16, 3.2000000000000002, 5.7000000000000002], [18, 3.1000000000000001, 5.7000000000000002], [20, 3.0, 5.7000000000000002], [22, 2.8999999999999999, 5.7000000000000002]]
etiqs: [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
classifying element: [0, 4.0, 4.2000000000000002]
orange: 2
classifying element: [2, 3.9500000000000002, 4.2000000000000002]
orange: 2
classifying element: [4, 3.8999999999999999, 4.2000000000000002]
orange: 2
classifying element: [6, 3.8500000000000001, 4.2000000000000002]
orange: 2
classifying element: [8, 3.7999999999999998, 4.2000000000000002]
orange: 3
classifying element: [10, 3.75, 4.2000000000000002]
orange: 3
classifying element: [12, 3.7000000000000002, 4.2000000000000002]
orange: 3
classifying element: [14, 3.6499999999999999, 4.2000000000000002]
orange: 3
classifying element: [16, 3.6000000000000001, 4.2000000000000002]
orange: 4
classifying element: [18, 3.5499999999999998, 4.2000000000000002]
orange: 4
classifying element: [20, 3.5, 4.2000000000000002]
orange: 4
classifying element: [22, 3.4500000000000002, 4.2000000000000002]
orange: 4
classifying element: [24, 3.3999999999999999, 4.2000000000000002]
orange: #RNGE
classifying element: [26, 3.3500000000000001, 4.2000000000000002]
orange: #RNGE
classifying element: [28, 3.2999999999999998, 4.2000000000000002]
orange: #RNGE
classifying element: [30, 3.25, 4.2000000000000002]
orange: #RNGE
classifying element: [32, 3.2000000000000002, 4.2000000000000002]
orange: #RNGE
classifying element: [34, 3.1499999999999999, 4.2000000000000002]
orange: #RNGE
classifying element: [36, 3.1000000000000001, 4.2000000000000002]
orange: #RNGE
classifying element: [38, 3.0499999999999998, 4.2000000000000002]
orange: #RNGE


I am confused by the #RNGE label returned by this 1-NN classifier. What does it mean? What am I doing wrong?

Cheers!

Re: Problem with labels returned by classifier - #RNGE label

Post by Ales » Thu May 19, 2011 9:57

'#RNGE' is the string printed by orange.Value for an orange.EnumVariable when the value's index into the variable's list of values is out of bounds. Note that the values of an EnumVariable are arbitrary strings, and that when constructing an orange.ExampleTable, numerical values are interpreted as indices into this list of values.

In your case you are passing values in the range [1..4], which map to [classValues[1]..classValues[4]] instead of [classValues[0]..classValues[3]].
Just subtract 1 from the class column of your data to make it work properly.
Code:
num_data[:, -1] -= 1
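To see the index-to-string mapping at work, here is a minimal sketch (my own illustration based on the behaviour described above, assuming the Orange 2.x orange module and numpy):
Code:
import orange
import numpy as np

# A two-class domain: the only valid class indices are 0 and 1.
domain = orange.Domain([orange.FloatVariable("x")],
                       orange.EnumVariable("type", values=["1", "2"]))

# The last column of the array is taken as an index into the values list.
tab = orange.ExampleTable(domain, np.array([[0.5, 0], [1.5, 1], [2.5, 2]]))
print tab[0].getclass()  # "1"     (index 0 -> first value)
print tab[1].getclass()  # "2"     (index 1 -> second value)
print tab[2].getclass()  # "#RNGE" (index 2 is out of bounds)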

Re: Problem with labels returned by classifier - #RNGE label

Post by michaelpl » Fri May 20, 2011 0:07

Thank you Ales, that helped. However, this brings me back to the problem from my earlier post (in another topic), where I suspected that Orange performs some kind of normalization of the data.

Here is a modified piece of code. On the same input data I run three classifiers: Orange's kNN, MLPy's kNN, and my own kNN implementation. Near the class boundaries, the results from Orange differ from those given by MLPy and by my own code.

Code:
import orange
import numpy as np
import mlpy
from sets import Set
import math

def my_knn(vals_train, etiqs_train, inp):
    # 1-NN using the sum of absolute differences (Manhattan distance);
    # error holds [label, smallest_distance_so_far]
    error = [-1, 10000]
    # iterate through the training data:
    for v, e in zip(vals_train, etiqs_train):
        curr_error = [e, 0]
        # accumulate the distance between training sample and input:
        for etrain, einp in zip(v, inp):
            curr_error[1] += abs(einp - etrain)
        # keep whichever of the two is closer:
        error = min(curr_error, error, key = lambda l: l[1])
    return error[0]

#prepare train data:
vals = []
etiqs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
for i in range(12):
    vals.append([2 * i, 4 - i / 10.0, 5.7])

print "vals:", vals
print "etiqs:", etiqs

#train orange:
data = map(lambda a: a[0] + [a[1]], zip(vals, etiqs))
#make numpy array
num_data = np.array(data)
#name columns:
columns = tuple(["col" + str(i) for i in range(len(vals[0]))])
classValues = tuple(map(lambda a: str(a), list(Set(etiqs))))
domain = orange.Domain(map(orange.FloatVariable, columns),
       orange.EnumVariable("type", values=classValues))
num_data[:, -1] -= 1
exmpl_data = orange.ExampleTable(domain, num_data)
tab = orange.ExampleTable(domain, exmpl_data)
classif_orang = orange.kNNLearner(tab, 1)

#train mlpy:
classif_mlpy = mlpy.Knn(k=1, dist = 'e')
arr = np.array(vals)
etiqs_np = np.array(etiqs)
classif_mlpy.compute(arr, etiqs_np)

#test:
for i in range(20):
    j = [2 * i, 4 - i / 20.0, 4.2]
    print "classiffying element:", j
    print "orange:", classif_orang(orange.Example(domain, j+[0]))
    xtr = np.array(j)
    print "mlpy:", classif_mlpy.predict(xtr)
    print "my knn:", my_knn(vals, etiqs, j)


Code:
vals: [[0, 4.0, 5.7000000000000002], [2, 3.8999999999999999, 5.7000000000000002], [4, 3.7999999999999998, 5.7000000000000002], [6, 3.7000000000000002, 5.7000000000000002], [8, 3.6000000000000001, 5.7000000000000002], [10, 3.5, 5.7000000000000002], [12, 3.3999999999999999, 5.7000000000000002], [14, 3.2999999999999998, 5.7000000000000002], [16, 3.2000000000000002, 5.7000000000000002], [18, 3.1000000000000001, 5.7000000000000002], [20, 3.0, 5.7000000000000002], [22, 2.8999999999999999, 5.7000000000000002]]
etiqs: [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
classifying element: [0, 4.0, 4.2000000000000002]
orange: 1
mlpy: 1
my knn: 1
classifying element: [2, 3.9500000000000002, 4.2000000000000002]
orange: 1
mlpy: 1
my knn: 1
classifying element: [4, 3.8999999999999999, 4.2000000000000002]
orange: 1
mlpy: 1
my knn: 1
classifying element: [6, 3.8500000000000001, 4.2000000000000002]
orange: 1
mlpy: 2
my knn: 2
classifying element: [8, 3.7999999999999998, 4.2000000000000002]
orange: 2
mlpy: 2
my knn: 2
classifying element: [10, 3.75, 4.2000000000000002]
orange: 2
mlpy: 2
my knn: 2
classifying element: [12, 3.7000000000000002, 4.2000000000000002]
orange: 2
mlpy: 3
my knn: 3
classifying element: [14, 3.6499999999999999, 4.2000000000000002]
orange: 2
mlpy: 3
my knn: 3
classifying element: [16, 3.6000000000000001, 4.2000000000000002]
orange: 3
mlpy: 3
my knn: 3
classifying element: [18, 3.5499999999999998, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: 4
classifying element: [20, 3.5, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: 4
classifying element: [22, 3.4500000000000002, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: 4
classifying element: [24, 3.3999999999999999, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [26, 3.3500000000000001, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [28, 3.2999999999999998, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [30, 3.25, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [32, 3.2000000000000002, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [34, 3.1499999999999999, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [36, 3.1000000000000001, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4
classifying element: [38, 3.0499999999999998, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: 4


Do you know what the problem is?

Re: Problem with labels returned by classifier - #RNGE label

Post by Ales » Fri May 20, 2011 9:59

My guess is that it is due to precision: Orange uses C++ single-precision floats, while Python uses double precision.

Edit: The differences seem to be too big for precision errors.
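For scale, here is a quick check (my own addition, using numpy) of how large the single/double precision discrepancy actually is for numbers like these:
Code:
import numpy as np

x = 3.95
# difference between the double-precision value and its float32 rounding:
print repr(x - float(np.float32(x)))  # on the order of 1e-7

The test points above are spaced 0.05 apart or more, so a discrepancy of that size could not move a sample across a class boundary.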

Re: Problem with labels returned by classifier - #RNGE label

Post by michaelpl » Mon May 23, 2011 1:22

I think so as well.. Do you have any other guesses?

Cheers,
Michal

Re: Problem with labels returned by classifier - #RNGE label

Post by Janez » Mon May 23, 2011 10:10

If there are two (or more) training examples at the same distance from the example you are classifying, you take the first example you encounter, right? Orange will pick one at random - but always the same one, so that the result is random but deterministic. I guess this is where the difference comes from. This is easy to check - just change your knn to print out something when it has more than one closest neighbour.
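As a toy illustration of "random but deterministic" tie breaking (my own sketch; Orange's actual mechanism may differ), one can seed a private RNG from the query itself, so the pick looks arbitrary yet is reproducible:
Code:
import random

def pick_deterministic(candidates, key):
    # same query -> same seed -> same pick, on every run
    rng = random.Random(hash(key))
    return rng.choice(candidates)

print pick_deterministic([1, 5], key=(2, 3.95, 4.2))  # identical on every run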

Re: Problem with labels returned by classifier - #RNGE label

Post by michaelpl » Mon May 23, 2011 12:10

Hi,

Thanks for another shot ;).

I'm afraid this is not the case. As you suggested, I changed the my_knn method to return all classes whose samples are at the same minimal distance:
Code:
def my_knn(vals_train, etiqs_train, inp):
    # list of [label, distance] pairs for all current nearest neighbours
    error = [[-1, 10000]]
    # iterate through the training data:
    for v, e in zip(vals_train, etiqs_train):
        curr_error = [e, 0]
        # accumulate the Manhattan distance:
        for etrain, einp in zip(v, inp):
            curr_error[1] += abs(einp - etrain)
        # collect ties; start a new list when this sample is strictly closer:
        #error = min(curr_error, error, key = lambda l: l[1])
        if(error[0][1] == curr_error[1]):
            error.append(curr_error)
        elif(error[0][1] > curr_error[1]):
            error = [curr_error]
    return [error_single[0] for error_single in error]


I also added one more label with a single training sample, to make sure that at least one test sample is equidistant from two training samples with different labels: the test point [2, 3.95, 4.2] lies at L1 distance 0.05 + 1.5 = 1.55 from both [2, 3.9, 5.7] (label 1) and [2, 3.9, 2.7] (label 5). The code now looks like this:

Code:
import orange
import numpy as np
import mlpy
from sets import Set
import math

def my_knn(vals_train, etiqs_train, inp):
    # list of [label, distance] pairs for all current nearest neighbours
    error = [[-1, 10000]]
    # iterate through the training data:
    for v, e in zip(vals_train, etiqs_train):
        curr_error = [e, 0]
        # accumulate the Manhattan distance:
        for etrain, einp in zip(v, inp):
            curr_error[1] += abs(einp - etrain)
        # collect ties; start a new list when this sample is strictly closer:
        #error = min(curr_error, error, key = lambda l: l[1])
        if(error[0][1] == curr_error[1]):
            error.append(curr_error)
        elif(error[0][1] > curr_error[1]):
            error = [curr_error]
    return [error_single[0] for error_single in error]

#prepare train data:
vals = []
etiqs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5]
for i in range(12):
    vals.append([2 * i, 4 - i / 10.0, 5.7])

vals.append([2 * 1, 4 - 1 / 10.0, 2.7])  # the extra sample for label 5

print "vals:", vals
print "etiqs:", etiqs

#train orange:
data = map(lambda a: a[0] + [a[1]], zip(vals, etiqs))
#make numpy array
num_data = np.array(data)
#name columns:
columns = tuple(["col" + str(i) for i in range(len(vals[0]))])
classValues = tuple(map(lambda a: str(a), list(Set(etiqs))))
domain = orange.Domain(map(orange.FloatVariable, columns),
       orange.EnumVariable("type", values=classValues))
num_data[:, -1] -= 1
exmpl_data = orange.ExampleTable(domain, num_data)
tab = orange.ExampleTable(domain, exmpl_data)
classif_orang = orange.kNNLearner(tab, 1)

#train mlpy:
classif_mlpy = mlpy.Knn(k=1, dist = 'e')
arr = np.array(vals)
etiqs_np = np.array(etiqs)
classif_mlpy.compute(arr, etiqs_np)

#test:
for i in range(20):
    j = [2 * i, 4 - i / 20.0, 4.2]
    print "classiffying element:", j
    print "orange:", classif_orang(orange.Example(domain, j+[0]))
    xtr = np.array(j)
    print "mlpy:", classif_mlpy.predict(xtr)
    print "my knn:", my_knn(vals, etiqs, j)


Unfortunately, the problem remains the same:

Code:
vals: [[0, 4.0, 5.7000000000000002], [2, 3.8999999999999999, 5.7000000000000002], [4, 3.7999999999999998, 5.7000000000000002], [6, 3.7000000000000002, 5.7000000000000002], [8, 3.6000000000000001, 5.7000000000000002], [10, 3.5, 5.7000000000000002], [12, 3.3999999999999999, 5.7000000000000002], [14, 3.2999999999999998, 5.7000000000000002], [16, 3.2000000000000002, 5.7000000000000002], [18, 3.1000000000000001, 5.7000000000000002], [20, 3.0, 5.7000000000000002], [22, 2.8999999999999999, 5.7000000000000002], [2, 3.8999999999999999, 2.7000000000000002]]
etiqs: [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5]
classifying element: [0, 4.0, 4.2000000000000002]
orange: 1
mlpy: 1
my knn: [1]
classifying element: [2, 3.9500000000000002, 4.2000000000000002]
orange: 5
mlpy: 5
my knn: [1, 5]
classifying element: [4, 3.8999999999999999, 4.2000000000000002]
orange: 5
mlpy: 1
my knn: [1]
classifying element: [6, 3.8500000000000001, 4.2000000000000002]
orange: 1
mlpy: 2
my knn: [2]
classifying element: [8, 3.7999999999999998, 4.2000000000000002]
orange: 2
mlpy: 2
my knn: [2]
classifying element: [10, 3.75, 4.2000000000000002]
orange: 2
mlpy: 2
my knn: [2]
classifying element: [12, 3.7000000000000002, 4.2000000000000002]
orange: 2
mlpy: 3
my knn: [3]
classifying element: [14, 3.6499999999999999, 4.2000000000000002]
orange: 2
mlpy: 3
my knn: [3]
classifying element: [16, 3.6000000000000001, 4.2000000000000002]
orange: 3
mlpy: 3
my knn: [3]
classifying element: [18, 3.5499999999999998, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: [4]
classifying element: [20, 3.5, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: [4]
classifying element: [22, 3.4500000000000002, 4.2000000000000002]
orange: 3
mlpy: 4
my knn: [4]
classifying element: [24, 3.3999999999999999, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [26, 3.3500000000000001, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [28, 3.2999999999999998, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [30, 3.25, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [32, 3.2000000000000002, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [34, 3.1499999999999999, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [36, 3.1000000000000001, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]
classifying element: [38, 3.0499999999999998, 4.2000000000000002]
orange: 4
mlpy: 4
my knn: [4]


Maybe I specified something wrong in the arguments to Orange's kNN? Can you see it?

Cheers!

Re: Problem with labels returned by classifier - #RNGE label

Post by Anze » Tue May 24, 2011 13:54

If no distance estimator is specified, Orange uses ExamplesDistanceConstructor_Euclidean by default.

ExamplesDistanceConstructor_Euclidean calculates distances with the following formula:
Code:
sqrt(apply(add, [x*x for x in dist]))

where dist is the result of ExamplesDistance_Normalized.attributeDistances.

The k-NN learner itself does not preprocess the data, but the distance function used by default does: it normalizes each attribute. As far as I know, Orange does not provide a Euclidean distance that leaves the data unnormalized.
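To make that normalization concrete, here is a rough pure-Python illustration (my own sketch, not Orange's exact formula). In the data above, col0 spans 0..22 while col1 spans only 2.9..4.0, so dividing each attribute difference by the attribute's range can change which training example is nearest:
Code:
import math

# plain Euclidean distance on the raw attribute values
def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# range-normalized Euclidean distance, roughly what the default
# normalized distance computes
def euclid_norm(a, b, ranges):
    return math.sqrt(sum(((x - y) / r) ** 2 for x, y, r in zip(a, b, ranges)))

a, b = [0, 4.0], [22, 2.9]  # two training points (constant col2 omitted)
q = [14, 3.65]              # a query near a class boundary
ranges = [22.0, 1.1]        # per-attribute value ranges of the training data

print euclid(q, a), euclid(q, b)                            # b is nearer in raw space
print euclid_norm(q, a, ranges), euclid_norm(q, b, ranges)  # a is nearer after normalization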

You can, however, provide your own implementation of the distance function:
Code:
class EuclideanDistance(orange.ExamplesDistance):
    def __call__(self, a, b):
        import math
        # plain Euclidean distance over the raw attribute values, no normalization
        return math.sqrt(sum((a[attr]-b[attr])**2 for attr in a.domain.attributes))

class EuclideanDistance_Constructor(orange.ExamplesDistanceConstructor):
    def __call__(self, *args, **kwds):
        return EuclideanDistance()

classif_orang = orange.kNNLearner(tab, 1, distance_constructor=EuclideanDistance_Constructor())

