Outlier detection (outliers)
- class Orange.data.outliers.OutlierDetection
A class for detecting outliers.
It calculates the average distance from each example to the other examples and converts these averages to Z-scores. A Z-score higher than zero denotes an example that is farther from the other examples than average.
Outliers can be detected directly on examples or on an existing distance matrix. The number of nearest neighbours used for averaging distances can also be set. The default, 0, means that all examples are used when calculating average distances.
- distance_matrix()
Return the distance matrix of the dataset.
- set_distance_matrix(distances)
Set the distance matrix on which the outlier detection will be performed.
- set_examples(examples, distance=None)
Set the examples on which the outlier detection will be performed. The distance argument is a distance constructor for distances between examples. If omitted, Manhattan distance is used.
- set_knn(knn=0)
Set the number of nearest neighbours considered when calculating average distances.
- z_values()
Return a list of Z-values of the average distance from each example to the others. The N-th number in the list is the Z-value of the N-th example.
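The calculation described above can be sketched in plain Python, without Orange. This is only an illustration under stated assumptions, not Orange's implementation: the helper names (manhattan, average_distances, z_scores) and the sample points are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator the class uses.

```python
import math


def manhattan(a, b):
    """Manhattan (city-block) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_distances(points, knn=0, distance=manhattan):
    """Average distance from each point to the other points.

    knn=0 averages over all other points; knn=k averages over the
    k nearest ones only.
    """
    averages = []
    for i, p in enumerate(points):
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        if knn > 0:
            dists = dists[:knn]
        averages.append(sum(dists) / float(len(dists)))
    return averages


def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    n = float(len(values))
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # All averages are equal, so no point stands out.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# Hypothetical data: four points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
scores = z_scores(average_distances(points))
for point, score in zip(points, scores):
    print("%s Z-score: %.3f" % (point, score))
```

In this sketch the isolated point receives the only positive Z-score, while the four clustered points fall below zero. Passing a non-zero knn restricts the average to the nearest neighbours, which makes the score reflect local rather than global isolation.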
The following example prints a list of Z-values of the examples in the bridges dataset (outlier1.py).
    import Orange

    bridges = Orange.data.Table("bridges")
    outlier_det = Orange.data.outliers.OutlierDetection()
    outlier_det.set_examples(bridges)
    # Print the Z-value of every example, comma-separated.
    print ", ".join("%.8f" % val for val in outlier_det.z_values())
The following example prints the five examples with the highest Z-scores. Euclidean distance is used as the distance measure, and the average distance is calculated over the 3 nearest neighbours (outlier2.py).
    import Orange

    bridges = Orange.data.Table("bridges")
    outlier_det = Orange.data.outliers.OutlierDetection()
    # Use Euclidean distance and average over the 3 nearest neighbours.
    outlier_det.set_examples(bridges, Orange.distance.Euclidean(bridges))
    outlier_det.set_knn(3)
    z_values = outlier_det.z_values()
    # Sort the (example, Z-score) pairs by Z-score and keep the five highest.
    for ex, zv in sorted(zip(bridges, z_values), key=lambda x: x[1])[-5:]:
        print ex, "Z-score: %5.3f" % zv
Output:

    ['M', 1838, 'HIGHWAY', ?, 2, 'N', 'THROUGH', 'WOOD', '?', 'S', 'WOOD'] Z-score: 1.732
    ['M', 1818, 'HIGHWAY', ?, 2, 'N', 'THROUGH', 'WOOD', 'SHORT', 'S', 'WOOD'] Z-score: 1.732
    ['A', 1853, 'RR', ?, 2, 'N', 'DECK', 'WOOD', '?', 'S', 'WOOD'] Z-score: 1.732
    ['A', 1829, 'AQUEDUCT', ?, 1, 'N', 'THROUGH', 'WOOD', '?', 'S', 'WOOD'] Z-score: 1.733
    ['A', 1848, 'AQUEDUCT', ?, 1, 'N', 'DECK', 'WOOD', '?', 'S', 'WOOD'] Z-score: 1.733
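Instead of taking a fixed number of top-ranked examples, the Z-values can also be compared with a cutoff. The sketch below is plain Python and does not depend on Orange. The function name flag_outliers, the sample data, and the cutoff of 1.5 are all hypothetical choices made for illustration; a suitable cutoff depends on the data.

```python
def flag_outliers(items, z_values, threshold=1.5):
    """Return (item, z) pairs whose Z-score exceeds threshold, highest first."""
    flagged = [(item, z) for item, z in zip(items, z_values) if z > threshold]
    # Sort by the Z-score (second element of each pair), not by the item.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical items and Z-scores, standing in for examples and z_values().
names = ["a", "b", "c", "d"]
z = [-0.4, 1.8, 0.1, 2.3]
result = flag_outliers(names, z)
for name, score in result:
    print("%s Z-score: %5.3f" % (name, score))
```

Sorting on the score explicitly, rather than on the whole pair, keeps the ranking independent of how the items themselves compare.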