# orngMDS

The orngMDS module provides functionality for performing multidimensional scaling (MDS).

## MDS

Attributes

- points
  - Holds the current configuration of projected points.
- distances
  - An orange.SymMatrix that contains the distances that we want to achieve (`LSMT` changes these).
- projectedDistances
  - An orange.SymMatrix that contains the distances between the elements of `points`.
- originalDistances
  - An orange.SymMatrix that contains the original distances.
- stress
  - An orange.SymMatrix holding the stress.
- dim
  - An integer holding the dimension of the projected space.
- n
  - An integer holding the number of elements.
- avgStress
  - A float holding the average stress in `stress`.
- progressCallback
  - A function that gets called after each optimization step in the `run()` method.

Methods

- MDS(diss, dim=2, points=None)
  - Constructor that takes the original (dis)similarity matrix `diss` and two optional arguments: `dim`, the dimension of the projected space, and `points`, an initial configuration of points.
- getDistance()
  - Computes the distances between `points` and updates the `projectedDistances` matrix.
- getStress(stressFunc=orngMDS.SgnRelStress)
  - Computes the stress between the current `projectedDistances` and `distances` matrices using `stressFunc`, and updates the `stress` matrix and `avgStress` accordingly.
- Torgerson()
  - Runs the Torgerson algorithm, which computes an initial analytical solution of the problem.
- LSMT()
  - Kruskal monotone transformation.
- SMACOFstep()
  - Performs a single iteration of the SMACOF algorithm, which optimizes stress and updates `points`.
- run(numIter, stressFunc=SgnRelStress, eps=1e-3, progressCallback=None)
  - A convenience function that performs optimization until a stopping condition is met: either the optimization has run for `numIter` iterations of `SMACOFstep`, or the stress improvement ratio is smaller than `eps` (oldStress-newStress smaller than oldStress*eps).

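
The stopping rule described for `run` can be sketched in plain Python without Orange. This is only an illustration of the two conditions as stated above, not the module's implementation. The names `optimize`, `step`, and `measure` are hypothetical: `step` stands in for one `SMACOFstep` call and `measure` for recomputing and returning the average stress.

```python
def optimize(step, measure, num_iter, eps=1e-3):
    """Run `step` until `num_iter` iterations pass or the improvement is small.

    Returns the number of iterations actually performed.
    """
    old_stress = measure()
    for iteration in range(1, num_iter + 1):
        step()
        new_stress = measure()
        # Stop when oldStress - newStress is smaller than oldStress * eps.
        if old_stress - new_stress < old_stress * eps:
            return iteration
        old_stress = new_stress
    return num_iter


# Toy stand-in for an optimizer: the "stress" halves until it reaches a floor of 1.0.
state = {"stress": 64.0}

def toy_step():
    state["stress"] = max(1.0, state["stress"] / 2.0)

def toy_measure():
    return state["stress"]

iterations = optimize(toy_step, toy_measure, num_iter=100)
```

In the toy run the loop ends well before `num_iter`, because the first step that no longer lowers the stress satisfies the `eps` condition.
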
## Examples

### MDS scatterplot

In our first example, we take the iris data set, compute the distances between the examples, and then run MDS on the resulting distance matrix. This is done by the following code:

part of mds2.py (uses iris.tab)

```python
import orange
import orngMDS

data = orange.ExampleTable("../datasets/iris.tab")

# Fill the symmetric matrix with pairwise Euclidean distances (entries with j <= i).
euclidean = orange.ExamplesDistanceConstructor_Euclidean(data)
distance = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        distance[i, j] = euclidean(data[i], data[j])

# Project to two dimensions (the default) and optimize for up to 100 iterations.
mds = orngMDS.MDS(distance)
mds.run(100)
```

Notice that we run MDS for up to 100 iterations; `run` may stop earlier if the stress improvement falls below `eps`. We will now use matplotlib to plot the data points using the coordinates computed with MDS (you need to install matplotlib separately; it does not come with Orange). Each data point in iris belongs to one of three classes, so we will use colors to denote each instance's class.

part of mds2.py (uses iris.tab)

```python
from pylab import *

colors = ["red", "yellow", "blue"]

# Collect (x, y, class) for every projected data point.
points = []
for (i, d) in enumerate(data):
    points.append((mds.points[i][0], mds.points[i][1], d.getclass()))

# Draw one scatter series per class so that each class gets its own color.
for c in range(len(data.domain.classVar.values)):
    sel = filter(lambda x: x[-1]==c, points)
    x = [s[0] for s in sel]
    y = [s[1] for s in sel]
    scatter(x, y, c=colors[c])
show()
```

Executing the above script pops up a pylab window with a scatterplot of the projected points, colored by class.

Iris is a relatively simple data set with respect to classification, so it is no surprise that MDS finds a 2-D placement in which instances of different classes are well separated. Notice also that MDS does this with no knowledge of the instance class.
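
The nested loop in the first script assigns only the entries with `j <= i` and relies on `orange.SymMatrix` being symmetric for the rest. The same pattern can be sketched without Orange, assuming plain numeric tuples as instances and a list of rows in place of `SymMatrix`. The helper names `lower_triangle_distances` and `lookup` are hypothetical and not part of orngMDS.

```python
import math

def lower_triangle_distances(rows):
    """Return a list of rows where row i holds the distances to items 0..i."""
    matrix = []
    for i in range(len(rows)):
        row = []
        for j in range(i + 1):
            # Euclidean distance between instance i and instance j.
            row.append(math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(rows[i], rows[j]))))
        matrix.append(row)
    return matrix

def lookup(matrix, i, j):
    """Read entry (i, j) of the symmetric matrix from its stored lower triangle."""
    return matrix[i][j] if j <= i else matrix[j][i]

toy = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
dist = lower_triangle_distances(toy)
```

Storing only the lower triangle halves the number of distance computations, which is why the inner loop stops at `i`.
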

### A more advanced example

We are going to write a script that works like the `run` method of `orngMDS.MDS`, but performs 10 steps of SMACOF optimization before each stress computation. This is suitable if you have a large data set and want to save some time. First we load the data and compute the distance matrix (just like in our previous example).

```python
import orange
import orngMDS
import math

data = orange.ExampleTable("../datasets/iris.tab")

# Pairwise Euclidean distances between all examples.
dist = orange.ExamplesDistanceConstructor_Euclidean(data)
matrix = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        matrix[i, j] = dist(data[i], data[j])
```

Then we construct the MDS instance and perform the initial Torgerson approximation, after which we update the stress matrix using the orngMDS.KruskalStress function.

```python
mds = orngMDS.MDS(matrix)
mds.Torgerson()
mds.getStress(orngMDS.KruskalStress)
```

And finally the main optimization loop, after which we print the projected points along with the data:

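```python
for i in range(100):
    oldStress = mds.avgStress
    # Ten SMACOF steps between two stress computations.
    for j in range(10):
        mds.SMACOFstep()
    mds.getStress(orngMDS.KruskalStress)
    # Stop once the change in average stress is below 0.1% of the old value.
    if oldStress*1e-3 > math.fabs(oldStress-mds.avgStress):
        break

for (p, e) in zip(mds.points, data):
    print p, e
```

## Stress function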

`StressFunction` computes the stress between two points.

Methods

- __call__(correct, current, weight=1.0)
  - Computes the stress using the correct and the current distance value (the `distances` and `projectedDistances` elements).

The orngMDS module provides 4 stress functions:

- orngMDS.SgnRelStress
- orngMDS.KruskalStress
- orngMDS.SammonStress
- orngMDS.SgnSammonStress
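
The `__call__(correct, current, weight=1.0)` signature can be illustrated with a small stand-alone class. The formula below (a weighted squared difference) is an assumption chosen for simplicity and is not claimed to match any of the four built-in functions. `SquaredDifferenceStress` and `average_stress` are hypothetical names, the averaging over pairs is likewise an assumption, and whether `getStress` accepts a plain Python object like this is not stated here.

```python
class SquaredDifferenceStress(object):
    """Hypothetical stress function: weighted squared difference of two distances."""

    def __call__(self, correct, current, weight=1.0):
        return weight * (current - correct) ** 2


def average_stress(stress_func, correct, current):
    """Average the per-pair stress over all pairs (i, j) with i > j."""
    n = len(correct)
    values = [stress_func(correct[i][j], current[i][j])
              for i in range(n) for j in range(i)]
    return sum(values) / len(values)


# Toy 3x3 distance matrices: the distances we want and the ones we currently have.
correct = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
current = [[0.0, 2.0, 4.0], [2.0, 0.0, 7.0], [4.0, 7.0, 0.0]]
avg = average_stress(SquaredDifferenceStress(), correct, current)
```

A per-pair function of this shape keeps the choice of stress measure separate from the loop that walks the two distance matrices.
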