The orngMDS module provides functionality for performing multidimensional scaling (MDS).


MDS is the main class for performing multidimensional scaling.


Attributes

points
Holds the current configuration of projected points.
distances
An orange.SymMatrix that contains the distances that we want to achieve (LSMT changes these).
projectedDistances
An orange.SymMatrix that contains the distances between the elements of points.
originalDistances
An orange.SymMatrix that contains the original distances.
stress
An orange.SymMatrix holding the stress.
dim
An integer holding the dimension of the projected space.
n
An integer holding the number of elements.
avgStress
A float holding the average of the values in the stress matrix.
progressCallback
A function that gets called after each optimization step in the run() method.


Methods

MDS(diss, dim=2, points=None)
Constructor that takes the original (dis)similarity matrix diss, an optional argument dim giving the dimension of the projected space, and an optional initial configuration points.
getDistance()
Computes the distances between points and updates the projectedDistances matrix.
getStress(stressFunc)
Computes the stress between the current projectedDistances and distances matrices using stressFunc, and updates the stress matrix and avgStress accordingly.
Torgerson()
Runs the Torgerson algorithm, which computes an initial analytical solution of the problem.
LSMT()
Kruskal monotone transformation.
SMACOFstep()
Performs a single iteration of the SMACOF algorithm, which optimizes the stress and updates the points.
run(numIter, stressFunc=SgnRelStress, eps=1e-3, progressCallback=None)
A convenience function that runs the optimization until a stopping condition is met: either numIter iterations of SMACOFstep have been performed, or the stress improvement ratio falls below eps (that is, oldStress - newStress is smaller than oldStress * eps).
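The stopping rule described for run() can be sketched in plain Python without Orange. This is a minimal illustration under stated assumptions, not the module's implementation: optimize, step, average_stress and the toy problem are hypothetical names introduced here, and the exact order of checks inside run() is not confirmed by the documentation.

```python
def optimize(step, average_stress, num_iter, eps=1e-3):
    """Call step() until num_iter iterations have run or the relative
    stress improvement drops below eps. Returns the iterations performed.

    step and average_stress are hypothetical callables standing in for
    SMACOFstep and the avgStress attribute."""
    old_stress = average_stress()
    for iteration in range(1, num_iter + 1):
        step()
        new_stress = average_stress()
        # Stop when oldStress - newStress is smaller than oldStress * eps.
        if old_stress - new_stress < old_stress * eps:
            return iteration
        old_stress = new_stress
    return num_iter


# Toy problem (hypothetical): each step halves the stress until it
# bottoms out at 1.0.
state = {"stress": 64.0}


def toy_step():
    state["stress"] = max(1.0, state["stress"] / 2.0)


iterations = optimize(toy_step, lambda: state["stress"], num_iter=100)
print("stopped after %d iterations, stress %.1f" % (iterations, state["stress"]))
```

In the toy run the loop ends well before numIter because the stress stops improving; with a small numIter the iteration cap ends it first.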


MDS scatterplot

In our first example, we will take the iris data set, compute the distances between the examples, and then run MDS on the resulting distance matrix. This is done by the following code:


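import orange
import orngMDS

# Load the data set.
data = orange.ExampleTable("../datasets/")

# Fill the lower triangle of a symmetric matrix with Euclidean distances.
euclidean = orange.ExamplesDistanceConstructor_Euclidean(data)
distance = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i + 1):
        distance[i, j] = euclidean(data[i], data[j])

# Project to two dimensions and optimize for 100 iterations.
mds = orngMDS.MDS(distance)
mds.run(100)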

Notice that we are running MDS through 100 iterations. We will now use matplotlib to plot the data points using the coordinates computed with MDS (matplotlib does not come with Orange, so you need to install it separately). Each data point in iris belongs to one of three classes, so we will use colors to denote each instance's class.


from pylab import *

colors = ["red", "yellow", "blue"]

# Collect (x, y, class) triples for every instance.
points = []
for (i, d) in enumerate(data):
    points.append((mds.points[i][0], mds.points[i][1], d.getclass()))

# Draw one scatter series per class value.
for c in range(len(data.domain.classVar.values)):
    sel = [p for p in points if p[-1] == c]
    x = [s[0] for s in sel]
    y = [s[1] for s in sel]
    scatter(x, y, c=colors[c])
show()

Executing the above script pops up a pylab window showing the scatterplot.

Iris is a relatively simple data set with respect to classification, so it is no surprise that MDS finds a 2-D placement in which instances of different classes are well separated. Note that MDS does this with no knowledge of the instance classes.

A more advanced example

We are going to write a script that works much like the run() method but performs 10 steps of SMACOF optimization before each stress computation. This is suitable if you have a large data set and want to save some time. First we load the data and compute the distance matrix, just as in the previous example.

import orange
import orngMDS
import math

data = orange.ExampleTable("../datasets/")

# Fill the lower triangle of a symmetric matrix with Euclidean distances.
dist = orange.ExamplesDistanceConstructor_Euclidean(data)
matrix = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i + 1):
        matrix[i, j] = dist(data[i], data[j])

Then we construct the MDS instance and compute the initial Torgerson approximation, after which we update the stress matrix using the orngMDS.KruskalStress function.

mds = orngMDS.MDS(matrix)
mds.Torgerson()
mds.getStress(orngMDS.KruskalStress)
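To make the stress update more concrete, here is a rough plain-Python sketch of the two quantities involved: the distances between the projected points and the per-pair stress with its average. All function names are hypothetical, the squared-difference stress is only a stand-in (it is not claimed to match orngMDS.KruskalStress), and averaging over the off-diagonal pairs is an assumption about how an average stress could be defined.

```python
import math


def pairwise_distances(points):
    """Lower-triangular matrix of Euclidean distances between points."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
             for j in range(i + 1)]
            for i in range(len(points))]


def squared_stress(correct, current, weight=1.0):
    """Stand-in stress function: weighted squared difference."""
    return weight * (current - correct) ** 2


def stress_matrix(target, projected, stress_func):
    """Per-pair stress (lower triangle) and its average over the pairs i > j."""
    n = len(target)
    stress = [[stress_func(target[i][j], projected[i][j]) for j in range(i + 1)]
              for i in range(n)]
    pairs = [stress[i][j] for i in range(n) for j in range(i)]
    return stress, sum(pairs) / len(pairs)


# Target distances of a 3-4-5 triangle, stored as a lower triangle.
target = [[0.0], [3.0, 0.0], [4.0, 5.0, 0.0]]
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # reproduces the targets
rough = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # one side too long

_, avg_exact = stress_matrix(target, pairwise_distances(exact), squared_stress)
_, avg_rough = stress_matrix(target, pairwise_distances(rough), squared_stress)
print("average stress: exact=%.3f rough=%.3f" % (avg_exact, avg_rough))
```

A configuration that reproduces the target distances has zero stress; any other configuration has a positive average under this stand-in measure.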

Finally comes the main optimization loop, after which we print the projected points along with the data:

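for i in range(100):
    oldStress = mds.avgStress
    # Ten SMACOF steps between stress evaluations.
    for j in range(10):
        mds.SMACOFstep()
    mds.getStress(orngMDS.KruskalStress)
    # Stop once the change in stress is below 0.1% of the previous value.
    if oldStress * 1e-3 > math.fabs(oldStress - mds.avgStress):
        break

# Print each projected point next to its data instance.
for (p, e) in zip(mds.points, data):
    print p, e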
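For readers curious about what a single SMACOF iteration does, the sketch below shows one common textbook form: the Guttman transform with unit weights, written in plain Python. It is an illustration under that assumption and is not taken from the orngMDS source; euclid, raw_stress and smacof_step are hypothetical names, and the module's own step may differ in weighting or other details.

```python
import math


def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def raw_stress(points, target):
    """Sum of squared differences between projected and target distances."""
    n = len(points)
    return sum((euclid(points[i], points[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i))


def smacof_step(points, target):
    """One Guttman transform with unit weights:
    x_i <- (1/n) * sum over j != i of (target_ij / d_ij) * (x_i - x_j)."""
    n = len(points)
    dim = len(points[0])
    new_points = []
    for i in range(n):
        acc = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            d = euclid(points[i], points[j])
            if d == 0.0:
                continue  # coincident points contribute nothing
            ratio = target[i][j] / d
            for k in range(dim):
                acc[k] += ratio * (points[i][k] - points[j][k])
        new_points.append([a / n for a in acc])
    return new_points


# Target distances of a 3-4-5 triangle (full symmetric matrix).
target = [[0.0, 3.0, 4.0],
          [3.0, 0.0, 5.0],
          [4.0, 5.0, 0.0]]
start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

before = raw_stress(start, target)
after = raw_stress(smacof_step(start, target), target)
print("stress before: %.3f, after one step: %.3f" % (before, after))
```

Each such step cannot increase the raw stress, which is why a loop can safely run several steps between stress evaluations.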

Stress function

StressFunction computes the stress between two points.


__call__(correct, current, weight=1.0)
Computes the stress from the correct and the current distance values (the corresponding elements of distances and projectedDistances).
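The call signature above can be mirrored by any Python object that defines __call__. The class below is a hypothetical illustration of that shape only: RelativeStress and its floor parameter are invented here, the formula is not claimed to match any of the built-in stress functions, and whether orngMDS accepts such a pure-Python object in place of a built-in is an assumption that should be checked.

```python
class RelativeStress(object):
    """Hypothetical stress function: signed error relative to the
    correct (target) distance, scaled by an optional weight."""

    def __init__(self, floor=1e-6):
        # Guard against division by zero for coincident targets.
        self.floor = floor

    def __call__(self, correct, current, weight=1.0):
        return weight * (current - correct) / max(correct, self.floor)


stress = RelativeStress()
print(stress(2.0, 3.0))        # projected distance too long: positive
print(stress(2.0, 1.0, 2.0))   # too short, with weight 2: negative
```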

The orngMDS module provides four stress functions:

  • orngMDS.SgnRelStress
  • orngMDS.KruskalStress
  • orngMDS.SammonStress
  • orngMDS.SgnSammonStress