source: orange/Orange/doc/modules/orngPCA.htm @ 9671:a7b056375472

Revision 9671:a7b056375472, 8.2 KB checked in by anze <anze.staric@…>, 2 years ago

Moved orange to Orange (part 2)

<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>

<h1>orngPCA: Principal component analysis module</h1>

<p>This module contains tools for performing principal component analysis (PCA) on data stored in an <code>ExampleTable</code>.</p>

<h2>PCA class</h2>

<dl class=attributes>
<p>PCA(dataset = None, attributes = None, rows = None, standardize = 0, imputer = defaultImputer, continuizer = defaultContinuizer,
maxNumberOfComponents = 10, varianceCovered = 0.95, useGeneralizedVectors = 0)</p>
<dt>dataset</dt>
<dd><code>ExampleTable</code> instance on which PCA will be performed. If <code>None</code>, only the parameters are stored and a
<code>PCA</code> instance is returned; calling that instance with data then performs the analysis:</dd>

<xmp class="code">import orange, orngPCA

dataset = orange.ExampleTable('iris.tab')

# Only set the parameters; no data is given yet
pca = orngPCA.PCA(standardize = True)
# Calling the PCA instance with data performs the analysis
pca = pca(dataset)</xmp>

<dt>attributes</dt>
<dd>List of names of the attributes to be used in the projection. Names must match attribute names in the <code>ExampleTable</code>
instance, and at least two must be given. If <code>None</code>, the whole domain is used.</dd>

<p class="header">part of <a href="PCA1.py">PCA1.py</a></p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("iris.tab")
attributes = ['sepal length', 'sepal width', 'petal length', 'petal width']

# Restrict PCA to the four listed attributes
pca = orngPCA.PCA(data, standardize = True, attributes = attributes)

print "PCA on attributes sepal length, sepal width, petal length, petal width:"
print pca</xmp>
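<p>The rules for <code>attributes</code> can be summarized as a small check. The sketch below is plain Python and does not use Orange;
<code>resolve_attributes</code> is a hypothetical helper, and raising <code>ValueError</code> is an assumption, not necessarily what
orngPCA does.</p>

```python
def resolve_attributes(domain_names, attributes=None):
    """Return the attribute names to use, following the documented rules.

    Hypothetical helper for illustration; it is not part of orngPCA.
    """
    if attributes is None:
        # None: the whole domain is used
        return list(domain_names)
    # Every requested name must exist in the domain
    unknown = [name for name in attributes if name not in domain_names]
    if unknown:
        raise ValueError("unknown attributes: " + ", ".join(unknown))
    # A projection needs at least two attributes
    if len(attributes) < 2:
        raise ValueError("at least two attributes are needed")
    return list(attributes)


iris_names = ['sepal length', 'sepal width', 'petal length', 'petal width']
print(resolve_attributes(iris_names, ['petal length', 'petal width']))
```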

<dt>rows</dt>
<dd><code>True</code>/<code>False</code> array or list with one entry per example in the <code>ExampleTable</code> instance.
Only examples whose entry is <code>True</code> are used for the projection. If <code>None</code>, all examples are used.</dd>

<p class="header">part of <a href="PCA1.py">PCA1.py</a></p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("iris.tab")
# Select every second example (iris has 150 examples, so the mask has 150 entries)
rows = [1, 0] * (len(data) / 2)

pca = orngPCA.PCA(data, standardize = True, rows = rows)

print "PCA on every second row:"
print pca</xmp>
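<p>The effect of <code>rows</code> can be illustrated with a plain-Python filter. <code>select_rows</code> is a hypothetical helper, not
part of orngPCA; rejecting a mask of the wrong length is an assumption based on the requirement that it have one entry per example.</p>

```python
def select_rows(examples, rows=None):
    """Keep the examples whose mask entry is true (hypothetical helper)."""
    if rows is None:
        # None: all examples are used
        return list(examples)
    if len(rows) != len(examples):
        raise ValueError("rows must have one entry per example")
    return [ex for ex, keep in zip(examples, rows) if keep]


examples = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5']
mask = [1, 0] * (len(examples) // 2)  # every second example, as in PCA1.py
print(select_rows(examples, mask))  # ['e0', 'e2', 'e4']
```

<p>With an odd number of examples, <code>[1, 0] * (len(data) / 2)</code> yields one entry too few, so the mask in PCA1.py relies on
iris having an even number of examples.</p>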

<dt>standardize</dt>
<dd>If <code>True</code>, the data are standardized before projection.</dd>
<dt>imputer</dt>
<dd>Imputer constructor (for example <code>orange.ImputerConstructor_average</code>) that defines how missing values are imputed.
It must NOT be already trained; it is trained on the data passed to PCA. Default is average imputation.</dd>
<dt>continuizer</dt>
<dd>
<p><code>orange.DomainContinuizer</code> instance. Defines how data is continuized. Default values:</p>
<p>- Multinomial -> as normalized ordinal</p>
<p>- Class -> ignore</p>
<p>- Continuous -> leave</p>
</dd>

<p class="header">Example of using your own imputer and continuizer (<a href="PCA2.py">PCA2.py</a>)</p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("bridges.tab")

# Untrained imputer constructor that imputes maximal values
imputer = orange.ImputerConstructor_maximal

# Continuizer with the same settings as the default one
continuizer = orange.DomainContinuizer()
continuizer.multinomialTreatment = continuizer.AsNormalizedOrdinal
continuizer.classTreatment = continuizer.Ignore
continuizer.continuousTreatment = continuizer.Leave

pca = orngPCA.PCA(data, standardize = True, imputer = imputer, continuizer = continuizer)
print pca</xmp>
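<p>To make the defaults more concrete, the plain-Python sketch below shows average imputation of a continuous column and the "as
normalized ordinal" treatment of a discrete attribute, which we assume maps the i-th of n ordered values to i/(n-1). The function names
are hypothetical; Orange's imputers and <code>DomainContinuizer</code> work on whole example tables, not on lists.</p>

```python
def impute_average(column):
    """Replace missing values (None) by the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / float(len(known))
    return [mean if v is None else v for v in column]


def as_normalized_ordinal(column, values):
    """Map the i-th of n ordered values to i / (n - 1); assumes n >= 2."""
    last = float(len(values) - 1)
    return [values.index(v) / last for v in column]


print(impute_average([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
print(as_normalized_ordinal(['low', 'high', 'medium'], ['low', 'medium', 'high']))  # [0.0, 1.0, 0.5]
```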

<dt>maxNumberOfComponents</dt>
<dd>Maximum number of components to retain. Default is 10; if -1, all components are retained.</dd>
<dt>varianceCovered</dt>
<dd>Proportion of the variance of the original data that the retained components should explain. Default is 0.95.</dd>
<dt>useGeneralizedVectors</dt>
<dd>If <code>True</code>, generalized vectors are used.</dd>

<p class="header">part of <a href="PCA3.py">PCA3.py</a></p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("iris.tab")

attributes = ['sepal length', 'sepal width', 'petal length', 'petal width']

# Keep all components, explaining 100% of the variance
pca = orngPCA.PCA(data, standardize = True, attributes = attributes,
                  maxNumberOfComponents = -1, varianceCovered = 1.0)

print pca</xmp>

<p class="header">Output:</p>
<xmp class="code">PCA SUMMARY

Center:

    sepal length       sepal width      petal length       petal width
          5.8433            3.0540            3.7587            1.1987

Deviation:

    sepal length       sepal width      petal length       petal width
          0.8253            0.4321            1.7585            0.7606

Importance of components:

  eigenvalues     proportion     cumulative
       2.9108         0.7277         0.7277
       0.9212         0.2303         0.9580
       0.1474         0.0368         0.9948
       0.0206         0.0052         1.0000

Loadings:

      PC1      PC2      PC3      PC4
   0.5224  -0.3723  -0.7210   0.2620   sepal length
  -0.2634  -0.9256   0.2420  -0.1241   sepal width
   0.5813  -0.0211   0.1409  -0.8012   petal length
   0.5656  -0.0654   0.6338   0.5235   petal width</xmp>
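<p>The "Importance of components" table can be reproduced from the eigenvalues alone: each proportion is an eigenvalue divided by their
sum, and the cumulative column is the running total. Because the data were standardized, the eigenvalues sum to the number of
attributes (here 4). In the sketch below, <code>n_components</code> is a hypothetical illustration of how
<code>maxNumberOfComponents</code> and <code>varianceCovered</code> might interact (the smallest number of components reaching
<code>varianceCovered</code>, capped by <code>maxNumberOfComponents</code>); this rule is an assumption, not confirmed by this page.</p>

```python
def explained_variance(eigenvalues):
    """Return the proportion and cumulative proportion of variance per component."""
    total = float(sum(eigenvalues))
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


def n_components(eigenvalues, variance_covered=0.95, max_components=10):
    """Hypothetical rule: the smallest k whose cumulative proportion reaches
    variance_covered, capped by max_components (-1 means no cap)."""
    _, cumulative = explained_variance(eigenvalues)
    k = len(eigenvalues)  # fall back to all components
    for i, c in enumerate(cumulative):
        if c >= variance_covered:
            k = i + 1
            break
    if max_components != -1:
        k = min(k, max_components)
    return k


# Eigenvalues from the PCA summary of the iris data
eigenvalues = [2.9108, 0.9212, 0.1474, 0.0206]
proportions, cumulative = explained_variance(eigenvalues)
print(n_components(eigenvalues))  # 2, since 0.7277 + 0.2303 = 0.9580 >= 0.95
print(n_components(eigenvalues, variance_covered=1.0, max_components=-1))  # 4
```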

</dl>

<h2>PCAClassifier class</h2>

<p>An object of this class is returned when PCA completes successfully. It contains the domain of the data on which PCA was
performed, the imputer and continuizer used for projection, and the center, deviation, eigenvalues and loadings computed by PCA.
It also stores the data of the last projection performed, for use by <code>biplot</code>.</p>
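<p>In standard PCA, the eigenvalues and loadings are the eigenvalues and eigenvectors of the covariance matrix of the prepared data;
with <code>standardize = True</code> this is the correlation matrix, whose eigenvalues sum to the number of attributes. As a
self-contained illustration, the sketch below finds them with power iteration and deflation in plain Python. This textbook technique
is swapped in for brevity and is not necessarily how orngPCA computes them; all function names are hypothetical. Eigenvectors are only
defined up to sign, so loadings from different implementations may differ in sign.</p>

```python
import math


def covariance(rows):
    """Population covariance matrix (divides by n); rows are examples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / float(n) for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / float(n)
             for b in range(d)] for a in range(d)]


def power_iteration(m, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(m)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this matrix
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v (v has unit length)
    eigenvalue = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


def principal_components(m, k):
    """First k (eigenvalue, loading vector) pairs, found by deflation."""
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        eigenvalue, v = power_iteration(m)
        pairs.append((eigenvalue, v))
        # Deflation: subtract eigenvalue * v v^T so the next component dominates
        for i in range(len(m)):
            for j in range(len(m)):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


for eigenvalue, loading in principal_components([[2.0, 1.0], [1.0, 2.0]], 2):
    print(round(eigenvalue, 6), [round(x, 6) for x in loading])  # eigenvalues close to 3 and 1
```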

<p>A summary of the analysis can be obtained by printing the <code>PCAClassifier</code> instance once PCA has completed successfully.</p>

<h3>Performing projection:</h3>

<p>A projection is performed by calling the <code>PCAClassifier</code> instance with an <code>ExampleTable</code> instance. Projection
fails if the domain of the <code>ExampleTable</code> differs from that of the training set (the attributes need not be in the same
order, however). The returned <code>ExampleTable</code> contains the projected data; its domain consists of attributes PC1, PC2, ...,
PC<i>N</i>, where <i>N</i> is the number of retained components.</p>

<p class="header">part of <a href="PCA4.py">PCA4.py</a></p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("iris.tab")

attributes = ['sepal length', 'sepal width', 'petal length', 'petal width']
pca = orngPCA.PCA(data, attributes = attributes, standardize = True)

# Project the data onto the principal components
projected = pca(data)</xmp>
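<p>Assuming the usual PCA projection, the score of an example on each component is its standardized attribute vector multiplied by the
loadings. The plain-Python sketch below applies the center, deviation and loadings printed in the iris summary; <code>project</code>
is a hypothetical function, and the real classifier also imputes and continuizes the data first.</p>

```python
# Values copied from the PCA summary of the iris data
center = [5.8433, 3.0540, 3.7587, 1.1987]
deviation = [0.8253, 0.4321, 1.7585, 0.7606]
# loadings[i][k]: weight of attribute i in component PC(k+1)
loadings = [
    [0.5224, -0.3723, -0.7210, 0.2620],
    [-0.2634, -0.9256, 0.2420, -0.1241],
    [0.5813, -0.0211, 0.1409, -0.8012],
    [0.5656, -0.0654, 0.6338, 0.5235],
]


def project(example, center, deviation, loadings, standardize=True):
    """Scores of one example on every retained component."""
    # Center (and optionally scale) each attribute value
    if standardize:
        z = [(x - c) / s for x, c, s in zip(example, center, deviation)]
    else:
        z = [x - c for x, c in zip(example, center)]
    n_components = len(loadings[0])
    # Score on component k = sum over attributes of z_i * loading_ik
    return [sum(z[i] * loadings[i][k] for i in range(len(z)))
            for k in range(n_components)]


scores = project([5.1, 3.5, 1.4, 0.2], center, deviation, loadings)
print(["PC%d=%.3f" % (k + 1, s) for k, s in enumerate(scores)])
```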

<h3>Plotting functions:</h3>

<p>The plotting functions require matplotlib.</p>

<dl class=attributes>
<dt>plot(title = 'Scree plot', filename = 'Scree_plot.png')</dt>
<dd>
<p>Creates a scree plot for the current PCA.</p>
<p>title: title of the scree plot</p>
<p>filename: path of the file where the figure is saved. If <code>None</code>, the figure is displayed directly.</p>
</dd>

<dt>biplot(choices = [1,2], scale = 1., title = 'Biplot', filename = 'biplot.png')</dt>
<dd>
<p>Creates a biplot for the current projection.</p>
<p>At least one projection must be performed before calling biplot; the most recently performed projection is plotted.</p>
<p>- choices: numbers of the two components to plot, the first on the x-axis and the second on the y-axis. Components are numbered
from 1 to N, where N is the number of components returned by PCA. Biplot does not work if only one component is available. Only the
default, <code>[1,2]</code>, gives a biplot in the strict sense.</p>
<p>- scale: the transformed data are scaled by lambda ^ scale and the loadings by 1/(lambda ^ scale), where lambda are the singular
values as computed by princomp multiplied by the square root of the number of examples. Normally scale is within [0, 1]; a
warning is printed if the specified scale is outside this range.</p>
<p>- title and filename: same as for <code>plot</code></p>
</dd>

<p class="header">part of <a href="PCA5.py">PCA5.py</a></p>
<xmp class="code">import orange, orngPCA

data = orange.ExampleTable("iris.tab")

attributes = ['sepal length', 'sepal width', 'petal length', 'petal width']
pca = orngPCA.PCA(data, standardize = True, attributes = attributes)

# biplot plots the last projection, so project the data first
pca(data)
pca.biplot()</xmp>

<p class="header">Output stored in file <code>biplot.png</code>:</p>
<p><img src="biplot.png"></p>
</dl>

<h2>Utility functions</h2>

<dl class=attributes>
<dt>defaultImputer(dataset)</dt>
<dd>Returns <code>orange.ImputerConstructor_average(dataset)</code>.</dd>
<dt>defaultContinuizer(dataset)</dt>
<dd>
<p>Creates the default continuizer, with:</p>
<p>- multinomial -> as normalized ordinal</p>
<p>- class -> ignore</p>
<p>- continuous -> leave</p>
</dd>
<dt>centerData(dataMatrix)</dt>
<dd>Performs centering of data along rows; returns the center and the centered data. <code>dataMatrix</code> is a
<code>numpy.array</code> instance.</dd>
<dt>standardizeData(dataMatrix)</dt>
<dd>Performs standardization of data along rows; returns the scale and the scaled data. Raises an error if a constant variable is
present.</dd>
</dl>
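<p>A plain-Python sketch of what the two helpers compute, assuming that "along rows" means each column (attribute) is summarized over
all rows (examples), and that the scale is the population standard deviation (dividing by n), which appears consistent with the
Deviation values in the PCA summary. The real functions take a <code>numpy.array</code>; these hypothetical versions take lists of
lists so that they run without numpy.</p>

```python
import math


def center_data(matrix):
    """Subtract per-column means; rows are examples, columns are attributes."""
    n = len(matrix)
    center = [sum(col) / float(n) for col in zip(*matrix)]
    centered = [[x - c for x, c in zip(row, center)] for row in matrix]
    return center, centered


def standardize_data(matrix):
    """Divide each column by its population standard deviation.

    Only scales; pass centered data (e.g. from center_data) to get
    standardized values.
    """
    n = len(matrix)
    scale = []
    for col in zip(*matrix):
        mean = sum(col) / float(n)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd == 0.0:
            raise ValueError("constant variable cannot be standardized")
        scale.append(sd)
    scaled = [[x / s for x, s in zip(row, scale)] for row in matrix]
    return scale, scaled


center, centered = center_data([[1.0, 10.0], [3.0, 30.0]])
scale, scaled = standardize_data(centered)
print(center, scale)  # [2.0, 20.0] [1.0, 10.0]
```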