Orange Forum • View topic - orange.ContingencyAttrClass index out of range

Post by lefterav » Fri Jun 29, 2012 16:38

Hi, I have trained many classifiers on a training set of a few thousand instances. When I ask the classifiers to classify a test set of instances from a new .tab file, most of them work. Only the Bayes classifier fails, with this error:

Code:
File "~/lib/python2.7/site-packages/Orange/classification/bayes.py", line 151, in __call__
    return self.native_bayes_classifier(instance, result_type, *args, **kwdargs)
orange.KernelException: 'orange.ContingencyAttrClass': index 2 is out of range 0-1


Does it have to do with the fact that an attribute of the test set (discrete?), has a value which was not seen in the training set? How can I fix that? Shouldn't the domain of the classifier be "translated" automatically, to fit the test set?

Re: orange.ContingencyAttrClass index out of range

Post by Ales » Mon Jul 02, 2012 9:57

lefterav wrote: Does it have to do with the fact that an attribute of the test set (discrete?), has a value which was not seen in the training set?

Most likely.

lefterav wrote: How can I fix that?

You can list all of the values of a discrete variable in the .tab file even if a value is not used in the data itself. For instance:
Code:
my_var
value1 value2 value3

value1
value2

Note that the values in the declaration are separated by spaces.
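
If the .tab files are generated by a script, the declaration line can be built from the union of the values found in the training and test columns, so that both files declare the same values in the same order. A minimal sketch in plain Python that does not use Orange; the function name `declaration_line` and the sample columns are hypothetical, and values that themselves contain spaces would need extra handling:

```python
def declaration_line(train_values, test_values):
    """Return a space-separated list of every value seen in either column."""
    # Sorting gives both files the same declaration, hence the same value order.
    all_values = sorted(set(train_values) | set(test_values))
    return " ".join(all_values)


train_column = ["value1", "value2", "value1"]
test_column = ["value2", "value3"]  # value3 never occurs in the training data
print(declaration_line(train_column, test_column))
```

The resulting line would go in the second header row of both files, in place of a declaration derived from one file only.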

Re: orange.ContingencyAttrClass index out of range

Post by lefterav » Mon Jul 02, 2012 13:04

Thanks. I understand the nature of the problem. Though, it seems that it is not always possible during training to predict the values of some discrete features. Of course, the solution would be to filter out these out-of-domain values of the particular instances, since they would anyway be useless for the testing (the classifier doesn't know how to treat these values anyway).

I think I could handle this programmatically. The questions are:
1) Does the classifier store in its object the exact domain it has been trained on, or should I retain it from the training data set?
2) Why does this particular classifier give this error, while a similar set works fine with other classifiers? Is this maybe a bug?
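
The filtering idea mentioned above (setting aside the instances that carry out-of-domain values) can be sketched in plain Python, independent of Orange. The function name `split_by_known_values` and the dictionary-per-row layout are assumptions made only for illustration:

```python
def split_by_known_values(rows, column, known_values):
    """Split rows into (kept, dropped) by whether row[column] was seen in training."""
    known = set(known_values)
    kept, dropped = [], []
    for row in rows:
        if row[column] in known:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


test_rows = [
    {"my_var": "value1"},
    {"my_var": "value3"},  # not among the training values
    {"my_var": "value2"},
]
kept, dropped = split_by_known_values(test_rows, "my_var", ["value1", "value2"])
print(len(kept), len(dropped))
```

Keeping the dropped rows in a separate list makes it possible to report how many instances could not be classified instead of losing them silently.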

Re: orange.ContingencyAttrClass index out of range

Post by djwonk » Sat Nov 10, 2012 17:43

lefterav: Your points seem valid to me. I'm also running into this.

What work-around did you go with?

Unless I'm misunderstanding something, the default behavior of a classifier *should be* to tolerate and ignore discrete values it has not seen before.
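
To make the "tolerate and ignore" behavior concrete, here is a toy naive Bayes in plain Python that skips any attribute value it did not see during training. This is only a sketch of the idea and is not how Orange implements its Bayes classifier; the names `train` and `classify` and the sample data are hypothetical:

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class value frequencies for each attribute."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    seen_values = defaultdict(set)       # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def classify(model, row):
    """Return the most probable class, ignoring values unseen in training."""
    class_counts, value_counts, seen_values = model
    total = float(sum(class_counts.values()))
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            if value not in seen_values[i]:
                continue  # unseen value: contributes no evidence either way
            # Laplace smoothing keeps the probability non-zero.
            smoothed = (value_counts[(i, label)][value] + 1.0) / (count + len(seen_values[i]))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
print(classify(model, ("rainy", "freezing")))  # "freezing" was never seen
```

Because the counts are keyed by value rather than by position in a fixed-size table, an unseen value cannot produce an out-of-range index here; it is simply skipped.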

Re: orange.ContingencyAttrClass index out of range

Post by lefterav » Sun Jun 30, 2013 2:30

This is the function I used to fix the problem:

Code:

import logging
import sys

# Assumed Orange 2.x module layout; adjust these imports to your installation.
from Orange import feature
from Orange.statistics import distribution

# myclassifier holds the classifier you trained.
# Each entry is unpacked below as a (feature, status) pair.
discrete_features = [
    feature.Descriptor.make(feat.name, feat.var_type, [], feat.values, 0)
    for feat in myclassifier.domain.features
    if isinstance(feat, feature.Discrete)
]

def clean_discrete_features(orange_table):
    """Replace values the classifier never saw with the table's most frequent value.

    orange_table is your Orange data table; it is modified in place.
    """
    logging.debug(len(orange_table))
    reset_count = 0    # instances whose value was replaced
    feature_count = 0  # features that had unseen values
    for feat, status in discrete_features:
        classifier_feat_values = set(feat.values)
        table_feat_values = set(orange_table.domain[feat.name].values)
        # values present in the table but unknown to the classifier
        unseen_values = table_feat_values - classifier_feat_values

        if not unseen_values:
            continue

        # most frequent value of this feature in the table
        modus = distribution.Distribution(feat.name, orange_table).modus()
        instances = set(orange_table.filter_ref({feat.name: list(unseen_values)}))
        for inst in instances:
            inst[feat.name] = modus

        reset_count += len(instances)
        feature_count += 1
    sys.stderr.write("Warning: Reset {} appearances of {} discrete attributes\n".format(reset_count, feature_count))
    return orange_table


I cannot check whether the original Naive Bayes code in Orange already does something similar, but if it does not, I would strongly recommend adding it.

