Orange Forum • View topic - Interpretation of PCA Report

Interpretation of PCA Report

Post by bricklemacho » Sat Sep 21, 2013 2:00

I have been using the PCA widget for dimensionality reduction and I want to investigate the relationship of PCA components to the original variables.

I believe that PCA transforms the data set into linear combinations of the original variables. How do I find out the contribution of each original variable to a component? Is it in the "report" within the PCA widget? For example, if Comp1 has a Proportion of Var of 35.16, does that essentially mean it holds 35.16% of each of the original variables? Or do I need to do some magic with the eigenvectors on the output?

If some PCA components have a proportion of variance of zero or close to zero, does this mean I have some poor/bad/redundant features? If so, is there a simple way to determine which they are from the PCA results?

Let me know if I need to take this to a maths/stats forum.



Re: Interpretation of PCA Report

Post by bricklemacho » Tue Sep 24, 2013 1:30

Okay, I am a step closer. I hope the following helps anyone who stumbles across this question. The principal components are defined as linear combinations of the original variables X_1, ..., X_j, ..., X_m. The Extracted Eigenvectors table provides the coefficients for the equation below [1].

Y_k = C_k1 X_1 + C_k2 X_2 + ... + C_km X_m

where:

Y_k is the k-th principal component
C_kj is the coefficient of original variable X_j for component k, as listed in the table
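
In case it helps to see the equation applied, here is a minimal Python sketch. The toy data and the coefficient values are made up for illustration (they are not taken from an Orange report), and the sketch assumes each variable is centered before the coefficients are applied, which is the usual PCA convention but may not match every widget setting.

```python
# Hypothetical toy data: 3 examples (rows), 2 original variables X1 and X2.
data = [
    [3.0, 4.0],
    [6.0, 8.0],
    [9.0, 12.0],
]

# Coefficients laid out like an eigenvector table: one row per component,
# one column per original variable. Values are made up to match the toy data.
eigenvectors = [
    [0.6, 0.8],   # C_11, C_12 -> Y_1
    [-0.8, 0.6],  # C_21, C_22 -> Y_2
]


def center(rows):
    """Subtract each column's mean (assumes PCA was fitted on centered data)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def component_scores(rows, coefficients):
    """Apply Y_k = C_k1*X_1 + ... + C_km*X_m to every example."""
    return [
        [sum(c * x for c, x in zip(coef_row, row)) for coef_row in coefficients]
        for row in center(rows)
    ]


scores = component_scores(data, eigenvectors)
for row in scores:
    print([round(value, 3) for value in row])
```

In this toy data X2 is always 4/3 of X1, so every example should get a second-component score of about zero. A component with near-zero variance like this signals that some original variables are (almost) linearly dependent. The proportion of variance reported for a component describes how much of the total spread ends up in that component's scores; it is not a percentage of each original variable.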

I am still getting up to speed on PCA jargon and terminology. Does anyone know how to produce a "Score Plot" and/or a "Loading Plot" given the above information?


Re: Interpretation of PCA Report

Post by bricklemacho » Sun Sep 29, 2013 8:28

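As a final follow-up: multiplying the data by the eigenvectors gives the component scores, with one row per example. Plotting each example's scores on the first two components gives a score plot (examples that behave similarly should be grouped on the plot). Plotting each original variable's eigenvector coefficients on the first two components gives a loadings plot (attributes that are similar will be grouped).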
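
For anyone who wants to try this outside the widget, here is a rough Python sketch of how the coordinates for both plots could be computed. Under the usual definitions a score plot has one point per example and a loading plot has one point per original variable. The data, labels, and eigenvector values are hypothetical, the data is assumed to be centered before projection, and the raw eigenvector coefficients are plotted as loadings (some tools scale them by the square root of the eigenvalue instead). The `plot_scores_and_loadings` helper is my own name and needs matplotlib, which is imported lazily so the rest runs without it.

```python
# Hypothetical toy data: 4 examples (rows), 3 original variables (columns).
examples = ["a", "b", "c", "d"]
variables = ["X1", "X2", "X3"]
data = [
    [3.0, 4.0, 1.0],
    [6.0, 8.0, 1.0],
    [3.0, 4.0, 5.0],
    [6.0, 8.0, 5.0],
]

# First two eigenvectors (one row per component), made up to match the toy data.
eigenvectors = [
    [0.6, 0.8, 0.0],  # component 1
    [0.0, 0.0, 1.0],  # component 2
]

# Center each column, assuming the PCA was fitted on centered data.
means = [sum(col) / len(data) for col in zip(*data)]
centered = [[x - m for x, m in zip(row, means)] for row in data]

# Score plot coordinates: one (Y_1, Y_2) point per example.
scores = [
    tuple(sum(c * x for c, x in zip(vec, row)) for vec in eigenvectors)
    for row in centered
]

# Loading plot coordinates: one (C_1j, C_2j) point per original variable.
loadings = list(zip(*eigenvectors))


def plot_scores_and_loadings():
    """Draw both plots side by side (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, (ax_scores, ax_loadings) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, points, labels, title in (
        (ax_scores, scores, examples, "Score plot"),
        (ax_loadings, loadings, variables, "Loading plot"),
    ):
        ax.scatter([p[0] for p in points], [p[1] for p in points])
        for label, point in zip(labels, points):
            ax.annotate(label, point)
        ax.set_xlabel("Component 1")
        ax.set_ylabel("Component 2")
        ax.set_title(title)
    plt.show()
```

Calling `plot_scores_and_loadings()` should show X1 and X2 close together on the loading plot, since both contribute only to component 1, while X3 sits apart on the component 2 axis.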
