<html>
<HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="../style-print.css" TYPE="text/css" MEDIA=print>
</HEAD>

<BODY>
<h1>orngMDS</h1>

<index name="modules+multidimensional scaling">

<p>The orngMDS module provides the functionality to perform
multidimensional scaling (MDS).</p>

<h2>MDS</h2>

<p><INDEX name="classes/MDS (in orngMDS)">MDS is the main class for
performing multidimensional scaling.</p>

<p class=section>Attributes</p>
<DL class=attributes>
<DT>points</DT>
<DD>Holds the current configuration of projected points</DD>
<DT>distances</DT>
<DD>An orange.SymMatrix that contains the distances that we want to achieve (<code>LSMT</code> changes these)</DD>
<DT>projectedDistances</DT>
<DD>An orange.SymMatrix that contains the distances between the elements of <code>points</code></DD>
<DT>originalDistances</DT>
<DD>An orange.SymMatrix that contains the original distances</DD>
<DT>stress</DT>
<DD>An orange.SymMatrix holding the stress values computed by <code>getStress()</code></DD>
<DT>dim</DT>
<DD>An integer holding the dimension of the projected space</DD>
<DT>n</DT>
<DD>An integer holding the number of elements</DD>
<DT>avgStress</DT>
<DD>A float holding the average of the values in <code>stress</code></DD>
<DT>progressCallback</DT>
<DD>A function that gets called after each optimization step in the <code>run()</code> method</DD>
</DL>
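<p>To make the relation between <code>points</code> and <code>projectedDistances</code> concrete, here is a minimal sketch in plain Python. It does not use Orange: the function name <code>projected_distances</code> and the list-of-rows layout are assumptions made for illustration, and Euclidean distance is assumed for the projected space.</p>

```python
import math

def projected_distances(points):
    """Return lower-triangular Euclidean distances between points.

    Row i holds the distances from point i to points 0..i, which mirrors
    how the lower triangle of a symmetric matrix is filled.
    (Hypothetical helper, not part of orngMDS.)
    """
    rows = []
    for i, p in enumerate(points):
        row = []
        for q in points[:i + 1]:
            row.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))))
        rows.append(row)
    return rows

# Three points in a 2-dimensional projected space (dim = 2, n = 3)
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dists = projected_distances(points)
```

<p>Only the lower triangle is stored because the distance from <code>i</code> to <code>j</code> equals the distance from <code>j</code> to <code>i</code>.</p>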
<P class=section>Methods</P>
<DL class=attributes>
<DT>MDS(diss, dim=2, points=None)</DT>
<DD>Constructor that takes the original (dis)similarity matrix and two optional arguments: <code>dim</code>, the dimension of the projected space, and <code>points</code>, an initial configuration of points</DD>
<DT>getDistance()</DT>
<DD>Computes the distances between <code>points</code> and updates the <code>projectedDistances</code> matrix</DD>
<DT>getStress(stressFunc=orngMDS.SgnRelStress)</DT>
<DD>Computes the stress between the current <code>projectedDistances</code> and <code>distances</code> matrices using <code>stressFunc</code> and updates the <code>stress</code> matrix and <code>avgStress</code> accordingly</DD>
<DT>Torgerson()</DT>
<DD>Runs the Torgerson algorithm, which computes an initial analytical solution of the problem</DD>
<DT>LSMT()</DT>
<DD>Performs the Kruskal monotone transformation (this changes <code>distances</code>)</DD>
<DT>SMACOFstep()</DT>
<DD>Performs a single iteration of the SMACOF algorithm, which optimizes stress and updates <code>points</code></DD>
<DT>run(numIter, stressFunc=SgnRelStress, eps=1e-3, progressCallback=None)</DT>
<DD>A convenience function that performs optimization until a stopping condition is met: either the optimization has run for <code>numIter</code> iterations of <code>SMACOFstep</code>, or the stress improvement ratio is smaller than <code>eps</code> (that is, oldStress-newStress is smaller than oldStress*eps)</DD>
</DL>
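<p>The stopping rule described for <code>run()</code> can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the module's implementation: <code>run_until_converged</code>, <code>step</code> and <code>get_stress</code> are hypothetical names, and the toy "optimizer" simply moves a stress value halfway toward 0.1 on every step.</p>

```python
def run_until_converged(step, get_stress, num_iter, eps=1e-3):
    """Call step() at most num_iter times.

    Stops early once old_stress - new_stress < old_stress * eps and
    returns the number of iterations performed.
    (Hypothetical helper, not part of orngMDS.)
    """
    old_stress = get_stress()
    for iteration in range(1, num_iter + 1):
        step()
        new_stress = get_stress()
        if old_stress - new_stress < old_stress * eps:
            return iteration
        old_stress = new_stress
    return num_iter

# Toy stand-in for an optimizer: stress moves halfway toward 0.1 each step
state = {"stress": 1.0}

def toy_step():
    state["stress"] = 0.5 * state["stress"] + 0.05

iterations = run_until_converged(toy_step, lambda: state["stress"], 100)
```

<p>Because the improvement is measured relative to the previous stress, the same <code>eps</code> behaves consistently regardless of the scale of the distances.</p>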

<h2>Examples</h2>

<h3>MDS scatterplot</h3>

<p>In our first example, we take the iris data set, compute the
distances between the examples, and then run MDS on the resulting
distance matrix. This is done by the following code:</p>

<p class="header">part of <a href="mds2.py">mds2.py</a> (uses <a href=
"iris.tab">iris.tab</a>)</p>
<xmp class=code>import orange
import orngMDS

data = orange.ExampleTable("../datasets/iris.tab")

# Fill a symmetric matrix with pairwise Euclidean distances
euclidean = orange.ExamplesDistanceConstructor_Euclidean(data)
distance = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        distance[i, j] = euclidean(data[i], data[j])

# Run at most 100 optimization iterations
mds = orngMDS.MDS(distance)
mds.run(100)
</xmp>

<p>Notice that we run MDS for at most 100 iterations (<code>run</code>
may stop earlier if the stress improvement falls below <code>eps</code>).
We will now use <a href="http://matplotlib.sourceforge.net/">matplotlib</a>
to plot the data points using the coordinates computed by MDS (you need to
install matplotlib separately; it does not come with Orange). Each data
point in iris belongs to one of three classes, so we use colors to denote
the class of each instance.</p>

<p class="header">part of <a href="mds2.py">mds2.py</a> (uses <a href=
"iris.tab">iris.tab</a>)</p>
<xmp class=code>from pylab import *
colors = ["red", "yellow", "blue"]

# Collect (x, y, class) for every instance
points = []
for (i, d) in enumerate(data):
    points.append((mds.points[i][0], mds.points[i][1], d.getclass()))

# Draw one scatter series per class value
for c in range(len(data.domain.classVar.values)):
    sel = filter(lambda x: x[-1]==c, points)
    x = [s[0] for s in sel]
    y = [s[1] for s in sel]
    scatter(x, y, c=colors[c])
show()
</xmp>

<p>Executing the above script opens a pylab window with the
following scatterplot:</p>

<img src="mds-iris.png">

<p>Iris is a relatively simple data set with respect to
classification, so it is no surprise that MDS finds a 2-D placement
in which instances of different classes are well separated. Notice
also that MDS does this with no knowledge of the instance
class.</p>

<h3>A more advanced example</h3>

<p>We are going to write a script that works much like the
<code>MDS.run</code> method, but performs 10 steps of SMACOF
optimization between successive stress computations. This is useful
if you have a large data set and want to save some time. First we load
the data and compute the distance matrix (just like in our previous
example).</p>

<p class="header"><a href="mds1.py">mds1.py</a> (uses <a href=
"iris.tab">iris.tab</a>)</p>
<XMP class= code>import orange
import orngMDS
import math

data = orange.ExampleTable("../datasets/iris.tab")
dist = orange.ExamplesDistanceConstructor_Euclidean(data)
matrix = orange.SymMatrix(len(data))
for i in range(len(data)):
    for j in range(i+1):
        matrix[i, j] = dist(data[i], data[j])
</XMP>

<p>Then we construct the MDS instance and compute the initial
Torgerson approximation, after which we update the stress matrix using
the <code>orngMDS.KruskalStress</code> function.</p>

<XMP class= code>mds = orngMDS.MDS(matrix)
mds.Torgerson()
mds.getStress(orngMDS.KruskalStress)
</XMP>

<p>Finally comes the main optimization loop, after which we print the
projected points along with the data.</p>

<XMP class= code>for i in range(100):
    oldStress = mds.avgStress
    # Take 10 SMACOF steps between stress evaluations
    for j in range(10):
        mds.SMACOFstep()
    mds.getStress(orngMDS.KruskalStress)
    # Stop once the improvement falls below 0.1% of the old stress
    if oldStress*1e-3 > math.fabs(oldStress-mds.avgStress):
        break

# Print each projected point next to its data instance
for (p, e) in zip(mds.points, data):
    print p, e
</XMP>

<h2>Stress function</h2>
<P>A <code>StressFunction</code> computes the stress between two points.</P>
<P class=section>Methods</P>
<DL class=attributes>
<DT>__call__(correct, current, weight=1.0)</DT>
<DD>Computes the stress from the correct and the current distance values (the corresponding elements of <code>distances</code> and <code>projectedDistances</code>)</DD>
</DL>
<p>The orngMDS module provides four stress functions:</p>
<ul>
<li>orngMDS.SgnRelStress</li>
<li>orngMDS.KruskalStress</li>
<li>orngMDS.SammonStress</li>
<li>orngMDS.SgnSammonStress</li>
</ul>
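<p>As an illustration of the <code>__call__(correct, current, weight=1.0)</code> signature, the sketch below defines a stress function in plain Python and averages it over all pairs. Everything in it is an assumption made for illustration: <code>SquaredDifferenceStress</code> and <code>average_stress</code> are hypothetical names, the weighted squared difference is not claimed to match any of the four functions listed above, and the averaging is not claimed to be how orngMDS computes <code>avgStress</code>.</p>

```python
class SquaredDifferenceStress(object):
    """Hypothetical stress function; not one of those shipped with orngMDS."""

    def __call__(self, correct, current, weight=1.0):
        # Weighted squared difference between desired and projected distance
        diff = current - correct
        return weight * diff * diff


def average_stress(stress_func, distances, projected):
    """Average stress_func over all pairs i > j.

    Both arguments are lower-triangular lists of rows (row i has i+1 entries).
    """
    total, count = 0.0, 0
    for i in range(len(distances)):
        for j in range(i):
            total += stress_func(distances[i][j], projected[i][j])
            count += 1
    return total / count if count else 0.0


# Desired distances and distances in the current projection (3 elements)
distances = [[0.0], [1.0, 0.0], [2.0, 1.0, 0.0]]
projected = [[0.0], [1.0, 0.0], [1.0, 2.0, 0.0]]
avg = average_stress(SquaredDifferenceStress(), distances, projected)
```

<p>Any callable with the same three parameters could be substituted for <code>SquaredDifferenceStress</code> in this sketch.</p>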
</body>
</html>