orngMDS
The orngMDS module provides functionality for performing multidimensional scaling (MDS).
MDS
Attributes
- points
- Holds the current configuration of projected points
- distances
- An orange.SymMatrix that contains the distances that we want to achieve (LSMT changes these)
- projectedDistances
- An orange.SymMatrix that contains the distances between the elements of points
- originalDistances
- An orange.SymMatrix that contains the original distances
- stress
- An orange.SymMatrix holding the stress
- dim
- An integer holding the dimension of the projected space
- n
- An integer holding the number of elements
- avgStress
- A float holding the average stress in stress
- progressCallback
- A function that gets called after each optimization step in the run() method
Methods
- MDS(diss, dim=2, points=None)
- Constructor that takes the original (dis)similarity matrix diss and two optional arguments: dim, the dimension of the projected space, and points, an initial configuration of points
- getDistance()
- Computes the distances between points and updates the projectedDistances matrix
- getStress(stressFunc=orngMDS.SgnRelStress)
- Computes the stress between the current projectedDistances and distances matrix using stressFunc and updates the stress matrix and avgStress accordingly
- Torgerson()
- Runs the Torgerson algorithm, which computes an initial analytical solution of the problem
- LSMT()
- Kruskal monotone transformation
- SMACOFstep()
- Performs a single iteration of the SMACOF algorithm, which optimizes stress and updates the points
- run(numIter, stressFunc=SgnRelStress, eps=1e-3, progressCallback=None)
- A convenience function that performs optimization until a stopping condition is met: either the optimization has run for numIter iterations of the SMACOFstep function, or the stress improvement ratio is smaller than eps (oldStress-newStress smaller than oldStress*eps)
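The interplay between SMACOFstep() and the stopping rule of run() can be sketched in plain Python, without Orange. This is an illustration under stated assumptions, not the orngMDS implementation: the function names (pairwise_distances, avg_stress, smacof_step, run) are hypothetical, the step is the textbook unweighted Guttman transform, and the stress is a simple mean squared difference standing in for the module's stress functions. Only the stopping condition (oldStress-newStress smaller than oldStress*eps) is taken from the description above.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(points):
    """Full matrix of distances between all rows of `points`."""
    n = len(points)
    return [[euclid(points[i], points[j]) for j in range(n)] for i in range(n)]


def avg_stress(target, projected):
    """Mean squared difference over all pairs i > j (assumed stress measure)."""
    n = len(target)
    total = sum((projected[i][j] - target[i][j]) ** 2
                for i in range(n) for j in range(i))
    return total / (n * (n - 1) / 2.0)


def smacof_step(points, target):
    """One unweighted Guttman transform; returns the new configuration."""
    n, dim = len(points), len(points[0])
    projected = pairwise_distances(points)
    new_points = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or projected[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = target[i][j] / projected[i][j]
            for k in range(dim):
                row[k] += ratio * (points[i][k] - points[j][k])
        new_points.append([v / n for v in row])
    return new_points


def run(points, target, num_iter, eps=1e-3):
    """Stop after num_iter steps or when the improvement drops below eps."""
    stress = avg_stress(target, pairwise_distances(points))
    for _ in range(num_iter):
        points = smacof_step(points, target)
        new_stress = avg_stress(target, pairwise_distances(points))
        converged = stress - new_stress < stress * eps
        stress = new_stress
        if converged:
            break
    return points, stress


# Target distances come from the corners of a unit square;
# the starting configuration is random but seeded for repeatability.
square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
target = pairwise_distances(square)
rng = random.Random(0)
start = [[rng.random(), rng.random()] for _ in square]
start_stress = avg_stress(target, pairwise_distances(start))
final_points, final_stress = run(start, target, num_iter=200)
```

Because each Guttman transform cannot increase this kind of stress, final_stress should not exceed start_stress; how close it gets to zero depends on the starting configuration.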
Examples
MDS scatterplot
In our first example, we will take the iris data set, compute the distances between the examples, and then run MDS on the resulting distance matrix. This is done by the following code:
part of mds2.py (uses iris.tab)
    import orange
    import orngMDS

    data = orange.ExampleTable("../datasets/iris.tab")

    # Fill the lower triangle of the symmetric distance matrix
    euclidean = orange.ExamplesDistanceConstructor_Euclidean(data)
    distance = orange.SymMatrix(len(data))
    for i in range(len(data)):
        for j in range(i+1):
            distance[i, j] = euclidean(data[i], data[j])

    # Project to 2-D (the default dim) and optimize
    mds = orngMDS.MDS(distance)
    mds.run(100)

Notice that we run MDS for at most 100 iterations; run() may stop earlier if the stress improvement falls below eps. We will now use matplotlib to plot the data points using the coordinates computed with MDS (matplotlib does not come with Orange and must be installed separately). Each data point in iris belongs to one of three classes, so we will use colors to denote each instance's class.
part of mds2.py (uses iris.tab)
    from pylab import *

    colors = ["red", "yellow", "blue"]

    # Collect (x, y, class) for every projected instance
    points = []
    for (i, d) in enumerate(data):
        points.append((mds.points[i][0], mds.points[i][1], d.getclass()))

    # Draw one scatter series per class value
    for c in range(len(data.domain.classVar.values)):
        sel = filter(lambda x: x[-1]==c, points)
        x = [s[0] for s in sel]
        y = [s[1] for s in sel]
        scatter(x, y, c=colors[c])
    show()

Executing the above script pops up a pylab window with the following scatterplot:
Iris is a relatively simple data set with respect to classification, so it is no surprise that MDS found a placement in 2-D in which instances of different classes are well separated. Notice also that MDS does this with no knowledge of the instance classes.
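The double loop in the first script assigns only entries with j <= i, which is enough for a symmetric matrix because each distance is shared by the positions [i, j] and [j, i]. The following stand-alone sketch shows the same filling pattern without Orange. LowerTriangle is a hypothetical stand-in written for this illustration, not the implementation of orange.SymMatrix, and the three sample rows are made up.

```python
import math


class LowerTriangle(object):
    """Hypothetical symmetric matrix that stores only entries with j <= i."""

    def __init__(self, n):
        self.n = n
        self.rows = [[0.0] * (i + 1) for i in range(n)]

    def __getitem__(self, index):
        i, j = index
        if j > i:
            i, j = j, i  # a read above the diagonal is mirrored below it
        return self.rows[i][j]

    def __setitem__(self, index, value):
        i, j = index
        if j > i:
            i, j = j, i
        self.rows[i][j] = value


def euclidean(p, q):
    """Euclidean distance between two tuples of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Made-up sample rows standing in for data instances
rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

distance = LowerTriangle(len(rows))
for i in range(len(rows)):
    for j in range(i + 1):
        distance[i, j] = euclidean(rows[i], rows[j])
```

After the loop, distance[0, 1] and distance[1, 0] return the same stored value, so n*(n+1)/2 distance computations suffice instead of n*n.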
A more advanced example
We are going to write a script that is similar in functionality to the run method of orngMDS.MDS, but performs 10 steps of SMACOF optimization between stress computations instead of computing the stress after every step. This saves time on large data sets. First we load the data and compute the distance matrix (just like in our previous example).
    import orange
    import orngMDS
    import math

    data = orange.ExampleTable("../datasets/iris.tab")

    # Fill the lower triangle of the symmetric distance matrix
    dist = orange.ExamplesDistanceConstructor_Euclidean(data)
    matrix = orange.SymMatrix(len(data))
    for i in range(len(data)):
        for j in range(i+1):
            matrix[i, j] = dist(data[i], data[j])

Then we construct the MDS instance and perform the initial Torgerson approximation, after which we update the stress matrix using the orngMDS.KruskalStress function.
    mds = orngMDS.MDS(matrix)
    mds.Torgerson()
    mds.getStress(orngMDS.KruskalStress)

Finally, we run the main optimization loop, after which we print the projected points along with the data:
    for i in range(100):
        oldStress = mds.avgStress
        # Ten SMACOF steps between stress evaluations
        for j in range(10):
            mds.SMACOFstep()
        mds.getStress(orngMDS.KruskalStress)
        # Stop when the relative improvement falls below 1e-3
        if oldStress*1e-3 > math.fabs(oldStress-mds.avgStress):
            break

    for (p, e) in zip(mds.points, data):
        print p, e

Stress function
StressFunction computes the stress between two points
Methods
- __call__(correct, current, weight=1.0)
- Computes the stress using the correct and the current distance value (the distances and projectedDistances elements)
The orngMDS module provides 4 stress functions:
- orngMDS.SgnRelStress
- orngMDS.KruskalStress
- orngMDS.SammonStress
- orngMDS.SgnSammonStress
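To make the __call__(correct, current, weight=1.0) interface concrete, here is a small stand-alone sketch of a callable with that signature, together with a helper that averages it over all pairs of elements. Everything in it is an assumption made for illustration: SquaredDiffStress and average_stress are hypothetical names, the weighted squared difference is a generic choice that is not claimed to match any of the four functions listed above, and the two matrices are made-up nested lists rather than orange.SymMatrix objects.

```python
class SquaredDiffStress(object):
    """Hypothetical stress function: weighted squared difference."""

    def __call__(self, correct, current, weight=1.0):
        diff = current - correct
        return weight * diff * diff


def average_stress(distances, projected, stress_func):
    """Average stress_func over all pairs i > j of two square matrices."""
    n = len(distances)
    values = [stress_func(distances[i][j], projected[i][j])
              for i in range(n) for j in range(i)]
    return sum(values) / len(values)


# Made-up target distances and projected distances for three elements
distances = [[0.0, 1.0, 2.0],
             [1.0, 0.0, 1.0],
             [2.0, 1.0, 0.0]]
projected = [[0.0, 1.0, 1.0],
             [1.0, 0.0, 1.0],
             [1.0, 1.0, 0.0]]

stress = SquaredDiffStress()
avg = average_stress(distances, projected, stress)
```

Only the pair (2, 0) differs between the two matrices here, so it alone contributes to the average. The optional weight argument scales the contribution of an individual pair.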
