Orange Forum • View topic - Multiply PCA eigenvector with new data in Orange-canvas

Post by bricklemacho » Tue Oct 01, 2013 10:27

So I have trained a classifier on data transformed via PCA. At the moment I save the eigenvectors, perform the multiplication outside Orange, and then load the transformed data.

Can I multiply the PCA eigenvectors with new data within Orange?

Re: Multiply PCA eigenvector with new data in Orange-canvas

Post by Ales » Wed Oct 02, 2013 15:36

You can use the "Translate domain" widget (in the Prototypes category).
Connect the "Transformed Data" (NOT "Eigen Vectors") PCA output to the "Target Domain" input and the data you wish to transform to the "Input Data".

Although, if you just want to use the classifier on the original data, the classifier should apply the transform automatically.

Also see this thread for some more discussion.

Re: Multiply PCA eigenvector with new data in Orange-canvas

Post by bricklemacho » Wed Oct 02, 2013 16:32

Ales wrote: You can use the "Translate domain" widget (in the Prototypes category).
Connect the "Transformed Data" (NOT "Eigen Vectors") PCA output to the "Target Domain" input and the data you wish to transform to the "Input Data".


Thanks. I'll have a look/play. I am on a bit of a steep learning curve, but have started to do most of the work from Python scripts. I still use the visual interface to explore, prototype and communicate my ideas to others. I have even converted a couple of people within our lab to Orange.

I am confused by "NOT 'Eigen Vectors'". I wasn't clear on my purpose. At the moment I perform PCA on all the data, but I am pretty sure this is mathematically incorrect. What I want to do is split the data into train/test prior to PCA. I am probably showing my ignorance of PCA, but I was under the impression that I need to transform the unseen data using the "learned" eigenvectors and test on the transformed unseen data. Do I have this wrong?

Secondly, I want to produce Scores and Loadings plots. I haven't worked this out yet, but I was under the impression that they somehow use the eigenvectors and the original data to let me explore the relations between examples (Scores plot) and attributes (Loadings plot). I am hoping to use the Scatter Plot widget for this, though I may need to customise the PCA widget.
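
For anyone scripting this, the relation between eigenvectors, scores and loadings can be sketched with plain NumPy (not Orange's own API). The function name `pca_scores_and_loadings` and the random data are assumptions for illustration only. Here the loadings are the raw eigenvectors; some conventions additionally scale them by the square root of the eigenvalues.

```python
import numpy as np

def pca_scores_and_loadings(X, n_components=2):
    """Return (scores, loadings, variances) for a data matrix X (rows = examples)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # The covariance matrix is symmetric, so eigh is the appropriate solver.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the largest ones first.
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]   # one row per attribute, one column per component
    scores = centered @ loadings   # one row per example, one column per component
    return scores, loadings, eigvals[order]

# Illustrative random data: 50 examples, 4 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
scores, loadings, variances = pca_scores_and_loadings(X)
```

A Scores plot would then be a scatter plot of `scores[:, 0]` against `scores[:, 1]` (one point per example), and a Loadings plot a scatter plot of `loadings[:, 0]` against `loadings[:, 1]` (one point per attribute).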


Ales wrote: Although, if you just want to use the classifier on the original data, the classifier should apply the transform automatically.

Also see this thread for some more discussion.


I am confused: how can the classifier apply PCA?

Michael.
--

Re: Multiply PCA eigenvector with new data in Orange-canvas

Post by bricklemacho » Wed Oct 02, 2013 16:40

Ales wrote: You can use the "Translate domain" widget (in the Prototypes category).
Connect the "Transformed Data" (NOT "Eigen Vectors") PCA output to the "Target Domain" input and the data you wish to transform to the "Input Data".


Thanks. That worked, as described. Now just to understand what is going on.

Re: Multiply PCA eigenvector with new data in Orange-canvas

Post by Ales » Wed Oct 02, 2013 18:16

bricklemacho wrote: What I want to do is split the data into train/test prior to PCA. I am probably showing my ignorance of PCA, but I was under the impression that I need to transform the unseen data using the "learned" eigenvectors and test on the transformed unseen data. Do I have this wrong?
No, you are correct.
But the PCA component features (Comp.1, Comp.2, ...) already "carry" with them the proper transformation. Imagine them as derived/defined features that compute their value based on some other features (the original untransformed ones).
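
The train/test procedure confirmed above can be sketched outside Orange with plain NumPy. This is a minimal illustration, not Orange's implementation; `fit_pca`, `apply_pca` and the random data are hypothetical names and values. The key point is that both the centering vector and the eigenvectors are learned from the training split only and then reused unchanged on the unseen data.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the centering vector and eigenvectors from the training data only."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def apply_pca(X, mean, components):
    """Project any data using the stored training mean and eigenvectors."""
    return (X - mean) @ components

# Illustrative random data: 100 examples, 5 attributes, split 70/30.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X_train, X_test = X[:70], X[70:]

mean, components = fit_pca(X_train, n_components=2)
train_scores = apply_pca(X_train, mean, components)  # used to train the classifier
test_scores = apply_pca(X_test, mean, components)    # used to evaluate it
```

Note that the test data is centered with the training mean, not its own mean; recomputing the mean or the eigenvectors on the test split would leak information and put the two splits in different coordinate systems.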

bricklemacho wrote: I am confused: how can the classifier apply PCA?

The domain on which the classifier was trained is stored (*). Then, when the classifier is presented with a new instance to classify, the first thing it does is translate this instance into that domain. Something like this:
Code:
instance = Orange.data.Instance(self.domain, instance)

And this is where the magic happens. If the domain contains a feature that is not in the instance's domain, and that feature has a get_value_from, then get_value_from(instance) is called in an attempt to fill in the feature's value.
In PCA's case this means that _ProjectSingleComponent.__call__ is called.

* Actually this might not always be the case. Various learners might apply additional domain transformations like discretization, ...
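
The mechanism described in this post can be mimicked with a small toy model in plain Python. This is only a sketch of the idea, not Orange's code: `Feature`, `translate` and `make_component` are hypothetical stand-ins, instances are modelled as plain dicts, and the mean and eigenvector values are made up. Only the name `get_value_from` is taken from the explanation above.

```python
class Feature:
    """A named feature; derived features carry a function that computes them."""
    def __init__(self, name, get_value_from=None):
        self.name = name
        self.get_value_from = get_value_from

def translate(instance, domain):
    """Convert `instance` (dict of feature name -> value) into `domain`."""
    translated = {}
    for feature in domain:
        if feature.name in instance:
            # The value is already present in the source instance.
            translated[feature.name] = instance[feature.name]
        elif feature.get_value_from is not None:
            # Derived feature: compute its value from the original features.
            translated[feature.name] = feature.get_value_from(instance)
        else:
            translated[feature.name] = None  # value unknown
    return translated

def make_component(name, mean, eigenvector):
    """Build a derived feature that projects original values onto one eigenvector."""
    def project(instance):
        return sum((instance[attr] - mean[attr]) * weight
                   for attr, weight in eigenvector.items())
    return Feature(name, get_value_from=project)

# Made-up training mean and eigenvectors for two original features.
mean = {"x": 1.0, "y": 2.0}
pca_domain = [
    make_component("Comp.1", mean, {"x": 0.6, "y": 0.8}),
    make_component("Comp.2", mean, {"x": -0.8, "y": 0.6}),
]

new_instance = {"x": 2.0, "y": 4.0}  # untransformed, "unseen" data
result = translate(new_instance, pca_domain)
```

Because each component feature stores its own projection, a classifier that keeps `pca_domain` can accept untransformed instances and obtain the component values on its own.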

Re: Multiply PCA eigenvector with new data in Orange-canvas

Post by bricklemacho » Wed Oct 02, 2013 19:05

Ales wrote: But the PCA component features (Comp.1, Comp.2, ...) already "carry" with them the proper transformation. Imagine them as derived/defined features that compute their value based on some other features (the original untransformed ones).


Thanks, understand now.


Ales wrote: The domain on which the classifier was trained is stored (*). Then, when the classifier is presented with a new instance to classify, the first thing it does is translate this instance into that domain. Something like this:
Code:
instance = Orange.data.Instance(self.domain, instance)

And this is where the magic happens. If the domain contains a feature that is not in the instance's domain, and that feature has a get_value_from, then get_value_from(instance) is called in an attempt to fill in the feature's value.
In PCA's case this means that _ProjectSingleComponent.__call__ is called.


That's pretty impressive. So depending on the classifier, I could give it the "unseen" data (split from original) and it will auto-magically transform the data as required.

Thanks, these concepts will also help in the scripting environment.

