Post by defig » Sat Feb 09, 2008 5:59

Is there an inconsistency check for a data set?

By inconsistency I mean the same input with different outputs, for instance [[1 1] 1] vs. [[1 1] 2].


Post by Janez » Tue Mar 11, 2008 20:55

This is overkill and it only works if all attributes are discrete.

orange.IMBySorting(data, []).fuzzy()

If this returns 1, the data set is inconsistent.

You may also want to do it yourself, in Python:

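import itertools
data.sort()  # sort in place, so examples with equal attribute values become adjacent
# Collect a True for every consecutive pair with equal attributes but different classes.
inconsistent = bool(filter(None,
    (list(ex1)[:-1]==list(ex2)[:-1] and ex1.getclass()!=ex2.getclass()
     for ex1, ex2 in itertools.izip(data[:-1], data[1:]))))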

This sorts the examples so that those with equal attribute values end up next to each other, builds a list with one True for each consecutive pair that has the same attribute values but different classes, and casts the list to a Boolean to check whether it is empty.
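
For readers who want to try the idea without Orange, here is a minimal sketch of the same sort-then-compare-neighbours check on plain Python lists. It is only an illustration: the function name is_inconsistent and the row layout (attribute values first, class value last) are assumptions made for this example, not part of Orange.

```python
def is_inconsistent(rows):
    """Return True if two rows share all attribute values but differ in class.

    Assumption for this sketch: each row is a list whose last element is the
    class value, and the values in each column are mutually comparable.
    """
    # Sorting whole rows puts rows with equal attribute values side by side,
    # ordered by class, so any class conflict shows up in a neighbouring pair.
    ordered = sorted(rows)
    return any(a[:-1] == b[:-1] and a[-1] != b[-1]
               for a, b in zip(ordered[:-1], ordered[1:]))


print(is_inconsistent([[1, 1, 1], [0, 1, 2], [1, 1, 2]]))  # True
print(is_inconsistent([[1, 1, 1], [0, 1, 2], [1, 1, 1]]))  # False
```

Unlike the Orange snippet, this version sorts a copy, so the original list keeps its order.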

