Orange Forum • View topic - inconsistency check for dataset

Post by defig » Sat Feb 09, 2008 5:59

Is there an inconsistency check for a data set?

By inconsistency I mean the same input with different outputs, for instance [[1 1] 1] vs [[1 1] 2].

Thanks

Post by Janez » Tue Mar 11, 2008 20:55

The following is overkill, and it only works if all attributes are discrete.

Code:
orange.IMBySorting(data, []).fuzzy()


If this returns 1, the data set is inconsistent.

You may also want to do it yourself in Python:

Code:
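import itertools

# Sort in place so that examples with equal attribute values become adjacent.
data.sort()
# Python 2: filter() returns a list here, so bool() is True only if some pair matched.
inconsistent = bool(filter(None,
    (list(ex1)[:-1]==list(ex2)[:-1] and ex1.getclass()!=ex2.getclass()
     for ex1, ex2 in itertools.izip(data[:-1], data[1:]))))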


This sorts the examples, constructs a list of True values, one for each consecutive pair with the same attribute values but different classes, and casts the list to a Boolean to check whether it is empty.
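
If you want to try the idea without Orange, the same sort-and-compare-neighbours check can be sketched in plain Python. This is only an illustration under assumptions: the function name `is_inconsistent` and the layout of `rows` as (attribute tuple, class) pairs are made up here and are not part of Orange.

```python
def is_inconsistent(rows):
    """Return True if two rows share attribute values but differ in class.

    `rows` is assumed to be a list of (attributes, label) pairs, where
    `attributes` is a tuple of mutually comparable values. This layout is
    hypothetical and chosen only for the example.
    """
    # Sort by attributes only, so rows with equal attributes become adjacent.
    ordered = sorted(rows, key=lambda row: row[0])
    # If a group of equal-attribute rows contains more than one class,
    # at least one adjacent pair inside that group differs in class.
    return any(
        attrs1 == attrs2 and label1 != label2
        for (attrs1, label1), (attrs2, label2) in zip(ordered, ordered[1:])
    )


consistent = [((1, 1), 1), ((1, 2), 2), ((1, 1), 1)]
conflicting = [((1, 1), 1), ((1, 2), 2), ((1, 1), 2)]

print(is_inconsistent(consistent))   # False
print(is_inconsistent(conflicting))  # True
```

Unlike the snippet above, this version does not modify the input and stops at the first conflicting pair, because `any()` short-circuits.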

