Orange

orngMDS

The orngMDS module provides the functionality to perform multidimensional scaling (MDS).

MDS

MDS is the main class for performing multidimensional scaling.

Attributes

points
Holds the current configuration of projected points
distances
An orange.SymMatrix that contains the target distances we want the projection to achieve (LSMT changes these)
projectedDistances
An orange.SymMatrix that contains the distances between the elements of points
originalDistances
An orange.SymMatrix that contains the original distances
stress
An orange.SymMatrix holding the stress of each pair of points
dim
An integer holding the dimension of the projected space
n
An integer holding the number of elements
avgStress
A float holding the average of the values in stress
progressCallback
A function that gets called after each optimization step in the run() method
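The distances attribute is the one LSMT overwrites. Kruskal's monotone transformation is usually computed as an isotonic (monotone) regression of the projected distances, taken in the order of the original dissimilarities, and solved with the pool-adjacent-violators algorithm. The following plain-Python sketch of that textbook procedure does not use Orange. The function names pool_adjacent_violators and monotone_disparities are hypothetical. It also assumes the pairwise distances arrive as two flat lists in matching pair order. Ties in the original distances simply keep their sorted order, which may differ from how orngMDS treats them.

```python
def pool_adjacent_violators(values):
    """Least-squares non-decreasing fit of `values` (pool adjacent violators)."""
    blocks = []  # each block is [sum, count] of pooled neighbouring values
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication, since counts are positive).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # every member of a block gets its mean
    return fitted


def monotone_disparities(original, projected):
    """Return values close to `projected` that never decrease as `original` increases.

    `original` and `projected` list the pairwise distances in the same pair order.
    """
    order = sorted(range(len(original)), key=lambda k: original[k])
    fitted = pool_adjacent_violators([projected[k] for k in order])
    disparities = [0.0] * len(original)
    for rank, k in enumerate(order):
        disparities[k] = fitted[rank]  # put each fitted value back in its pair slot
    return disparities
```

For example, if the projected distances are 2.0, 1.0 and 1.5 for pairs whose original distances are 0.5, 0.1 and 0.9, the pairs with the two largest original distances get pooled to 1.75 each, because their projected order violates monotonicity.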

Methods

MDS(diss, dim=2, points=None)
Constructor that takes the original dissimilarity matrix diss and, optionally, dim, the dimension of the projected space, and points, an initial configuration of points
getDistance()
Computes the distances between points and updates the projectedDistances matrix
getStress(stressFunc=orngMDS.SgnRelStress)
Computes the stress between the current projectedDistances and distances matrices using stressFunc, and updates the stress matrix and avgStress accordingly
Torgerson()
Runs the Torgerson algorithm (classical scaling), which computes an initial analytical solution to the problem
LSMT()
Applies Kruskal's monotone transformation to the distances matrix
SMACOFstep()
Performs a single iteration of the SMACOF algorithm, which optimizes stress and updates the points
run(numIter, stressFunc=SgnRelStress, eps=1e-3, progressCallback=None)
A convenience function that performs optimization until a stopping condition is met: either numIter iterations of SMACOFstep have been run, or the stress improvement ratio is smaller than eps (that is, oldStress - newStress is smaller than oldStress*eps)
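The stopping rule of run() can be sketched in plain Python as shown below. This loop is illustrative and is not the orngMDS source. run_until_converged, step, stress and steps_per_check are hypothetical names: step stands in for SMACOFstep, and stress for a function that returns the current average stress. The sketch follows the run() description literally (oldStress - newStress < oldStress*eps). Setting steps_per_check=10 gives a batched variant that evaluates the stress only after every tenth step.

```python
def run_until_converged(step, stress, num_iter=100, eps=1e-3, steps_per_check=1):
    """Call step() repeatedly, evaluating stress() after every `steps_per_check` steps.

    Stops after `num_iter` stress checks, or as soon as the improvement
    old - new falls below old * eps. Returns (final_stress, checks_performed).
    """
    old = stress()
    new = old
    checks = 0
    for _ in range(num_iter):
        for _ in range(steps_per_check):
            step()  # one optimization step, e.g. a SMACOF iteration
        new = stress()
        checks += 1
        if old - new < old * eps:  # relative improvement too small: stop
            break
        old = new
    return new, checks
```

With a toy optimizer whose stress is 1 + 0.5**k after k steps, the loop stops after 10 checks. At that point the improvement 0.5**10 first drops below 0.1% of the previous stress.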

Examples

MDS scatterplot

In our first example, we take the iris data set, compute the distances between its examples, and then run MDS on the resulting distance matrix. This is done by the following code:

part of mds2.py (uses iris.tab)

import orange
import orngMDS

data = orange.ExampleTable("../datasets/iris.tab")
euclidean = orange.ExamplesDistanceConstructor_Euclidean(data)
distance = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        distance[i, j] = euclidean(data[i], data[j])

mds = orngMDS.MDS(distance)
mds.run(100)

Notice that we run MDS for 100 iterations. We now use matplotlib to plot the data points at the coordinates computed by MDS (matplotlib does not come with Orange and must be installed separately). Each instance in iris belongs to one of three classes, so we use colors to denote each instance's class.

part of mds2.py (uses iris.tab)

from pylab import *

colors = ["red", "yellow", "blue"]

# Collect (x, y, class index) for every instance
points = []
for (i, d) in enumerate(data):
    points.append((mds.points[i][0], mds.points[i][1], int(d.getclass())))

# Draw one scatter series per class, each in its own color
for c in range(len(data.domain.classVar.values)):
    sel = [p for p in points if p[-1] == c]
    x = [s[0] for s in sel]
    y = [s[1] for s in sel]
    scatter(x, y, c=colors[c])
show()

Executing the above script pops up a pylab window with the following scatterplot:

Iris is a relatively simple data set with respect to classification, so it is no surprise that MDS finds a 2-D placement in which instances of different classes are well separated. Notice also that MDS does this without any knowledge of the instance classes.

A more advanced example

We are going to write a script that is similar to the MDS.run method, but performs 10 steps of SMACOF optimization before each stress computation. This is useful for large data sets, where computing the stress after every step would waste time. First we load the data and compute the distance matrix (just as in the previous example).

mds1.py (uses iris.tab)

import orange
import orngMDS
import math

data = orange.ExampleTable("../datasets/iris.tab")
dist = orange.ExamplesDistanceConstructor_Euclidean(data)
matrix = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        matrix[i, j] = dist(data[i], data[j])

Then we construct the MDS instance and compute the initial Torgerson approximation, after which we update the stress matrix using the orngMDS.KruskalStress function.

mds = orngMDS.MDS(matrix)
mds.Torgerson()
mds.getStress(orngMDS.KruskalStress)
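Torgerson's method (classical scaling) double-centers the squared distance matrix, B = -1/2 J D^2 J, where J is the centering matrix. It then uses the leading eigenvectors of B, scaled by the square roots of their eigenvalues, as coordinates. The plain-Python sketch below illustrates this idea and is not the orngMDS implementation. The function names double_center and torgerson are hypothetical, and the distance matrix is assumed to be a full list of rows. Eigenvectors are found by power iteration with deflation. That approach converges slowly when the leading eigenvalues are close, and it assumes the dominant eigenvalues are positive, which holds for Euclidean distances. Negative eigenvalues are clamped to zero.

```python
import math
import random


def double_center(dist):
    """B = -1/2 * J D^2 J for a symmetric distance matrix given as a list of rows."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(means) / n
    return [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
            for i in range(n)]


def torgerson(dist, dim=2, iters=1000, seed=0):
    """Classical (Torgerson) scaling via power iteration with deflation."""
    b = double_center(dist)
    n = len(b)
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        # Random start: the constant vector lies in B's null space, so avoid it.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue belonging to v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate: remove this eigen-direction so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

For distances that come from points in the plane, a 2-D Torgerson solution reproduces those distances up to rotation and reflection. This is why it makes a good starting configuration for SMACOF.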

Finally comes the main optimization loop, after which we print the projected points along with the corresponding data instances.

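for i in range(100):
    oldStress = mds.avgStress
    # Run 10 SMACOF steps before paying for a stress evaluation
    for j in range(10):
        mds.SMACOFstep()
    mds.getStress(orngMDS.KruskalStress)
    # Stop once the relative change in stress falls below 1e-3
    if oldStress * 1e-3 > math.fabs(oldStress - mds.avgStress):
        break

for (p, e) in zip(mds.points, data):
    print p, e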
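One SMACOF iteration can be understood as a Guttman transform. For unit weights the update is X_new = B(X) X / n. Here B_ij = -delta_ij / d_ij(X) for i != j (0 when d_ij is 0), and the diagonal entries make every row of B sum to zero. This update never increases raw stress. The plain-Python sketch below illustrates that idea under those assumptions (unit weights, raw stress) and is not the orngMDS source. The names pairwise, raw_stress and smacof_step are hypothetical. pairwise plays the role that getDistance plays for projectedDistances.

```python
import math


def pairwise(points):
    """Euclidean distances between all rows of `points`."""
    n = len(points)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]


def raw_stress(target, points):
    """Sum of squared differences between target and projected distances, pairs i > j."""
    d = pairwise(points)
    n = len(points)
    return sum((target[i][j] - d[i][j]) ** 2 for i in range(n) for j in range(i))


def smacof_step(target, points):
    """One Guttman transform X_new = B(X) X / n, assuming unit weights."""
    n = len(points)
    dim = len(points[0])
    d = pairwise(points)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > 0.0:
                b[i][j] = -target[i][j] / d[i][j]
        b[i][i] = -sum(b[i][j] for j in range(n) if j != i)  # rows sum to zero
    return [[sum(b[i][k] * points[k][c] for k in range(n)) / n for c in range(dim)]
            for i in range(n)]
```

Calling smacof_step repeatedly from any starting configuration yields a non-increasing sequence of raw stress values. This is the property that makes it safe to run several steps between stress evaluations, as the loop above does.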

Stress function

StressFunction computes the stress for a single pair of points

Methods

__call__(correct, current, weight=1.0)
Computes the stress from the correct and the current distance values (elements of distances and projectedDistances, respectively)

The orngMDS module provides four stress functions:

  • orngMDS.SgnRelStress
  • orngMDS.KruskalStress
  • orngMDS.SammonStress
  • orngMDS.SgnSammonStress
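For orientation, the sketch below writes two textbook per-pair stress terms using the same (correct, current, weight=1.0) calling convention. It also averages them over all pairs i > j, in the spirit of avgStress. This page does not give the exact formulas behind orngMDS.KruskalStress, orngMDS.SammonStress or the signed variants, nor how getStress normalizes avgStress, so treat these definitions as assumptions. The names kruskal_term, sammon_term and average_stress are hypothetical. Returning 0 for a zero target distance in sammon_term is also an assumed choice.

```python
def kruskal_term(correct, current, weight=1.0):
    """Squared difference between the target and the projected distance."""
    return weight * (correct - current) ** 2


def sammon_term(correct, current, weight=1.0):
    """Squared difference scaled by the target distance (Sammon's weighting)."""
    if correct == 0.0:
        return 0.0  # assumed convention: pairs with zero target distance contribute nothing
    return weight * (correct - current) ** 2 / correct


def average_stress(stress_func, correct, current):
    """Mean of stress_func over all pairs i > j of two square distance matrices."""
    n = len(correct)
    terms = [stress_func(correct[i][j], current[i][j])
             for i in range(n) for j in range(i)]
    return sum(terms) / len(terms)
```

Sammon-style weighting divides by the target distance, so errors on small distances count relatively more than under the Kruskal-style term. It therefore tends to favor preserving local neighborhoods.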