Orange Forum • View topic - Any way to save or lock in Linear Projection anchor info?

Postby Pea » Mon Apr 30, 2012 16:53

I'm currently trying to use PCA and the Linear Projection tool to sort images of fruit by type. I took each 64x64 image, converted it into an array of integers representing its byte values, removed every 0 value, then padded the end with 0 values so that all arrays have the same length. The data for 24 images goes into a data table, which is sent through PCA to a linear projection of the transformed data. From there, I use FreeViz to optimize separation via the Python implementation, and it separates the images by fruit (passed in as the class) extremely well.
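The preprocessing described above (drop the zero bytes, then right-pad to a common length) could be sketched roughly as follows. This is only an illustration in plain Python: the function name `strip_and_pad` and its `width` parameter are hypothetical, and the tiny lists stand in for real image byte arrays.

```python
def strip_and_pad(byte_rows, width=None, pad_value=0):
    """Drop zero bytes from each row, then right-pad every row to `width`.

    If `width` is None, the longest stripped row decides the width.
    """
    stripped = [[b for b in row if b != 0] for row in byte_rows]
    if width is None:
        width = max((len(row) for row in stripped), default=0)
    if any(len(row) > width for row in stripped):
        raise ValueError("a row has more non-zero bytes than the requested width")
    return [row + [pad_value] * (width - len(row)) for row in stripped]


# Three tiny stand-ins for flattened image byte arrays.
images = [
    [0, 12, 0, 7, 255],
    [3, 0, 0, 0, 9],
    [1, 2, 3, 4, 5],
]
table = strip_and_pad(images)
for row in table:
    print(row)

# A later image is padded to the width of the existing table, not to its own.
new_row = strip_and_pad([[0, 0, 42]], width=len(table[0]))[0]
print(new_row)
```

Passing an explicit `width` for later images keeps the number of columns identical to the original table, which matters whenever new rows have to line up with existing ones.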

What I want to do, but can't figure out how, is send an additional image through the process without changing any of the previous data. In practice, once I have the eigenvectors (which I have from the PCA widget), I could manually transform the new data item in the same manner as the existing data. I can't figure out how to do this through Orange, but if it comes to that I can always cheat and pass it in already transformed. The problem I'm having is that every time the input changes at all, the anchors for all the principal components on the linear projection reset, so I can't see where the new data point falls. Additionally, I can't simply send the new data point through the whole process and re-optimize, because the mandatory class value influences the optimization and I end up with something completely different. If I pass it in with the correct class value, I know the anchors will shift to make it fit, but if I pass in a new class value or a different existing one, the optimization will make sure the new item does not appear in the proper area.
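The manual transformation mentioned above amounts to reusing the fitted mean and eigenvectors on the new item instead of refitting. A minimal NumPy sketch of that idea, not Orange's own implementation: `fit_pca` and `project` are hypothetical helper names, and the random arrays are stand-ins for the image data.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (center, components) from an SVD of the centered data."""
    center = X.mean(axis=0)
    # Rows of vt are the principal axes (eigenvectors of the covariance matrix).
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    return center, vt[:n_components]


def project(X, center, components):
    """Apply a previously fitted projection without refitting it."""
    return (np.atleast_2d(X) - center) @ components.T


rng = np.random.default_rng(0)
train = rng.normal(size=(24, 50))   # 24 items, 50 features (stand-in data)
new_item = rng.normal(size=50)      # one unseen item with the same 50 features

center, components = fit_pca(train, n_components=3)
train_scores = project(train, center, components)
new_scores = project(new_item, center, components)
print(new_scores.shape)  # (1, 3)
```

Because `center` and `components` come only from the original data, adding the new item does not move the existing points. Multiplying a raw row by the eigenvectors without first subtracting the training mean gives coordinates that are offset from the plotted ones.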

My question: Is there any way to lock in the anchor settings for a linear projection between changes to the input? Alternatively, is there a way to view and manually set the anchor angles/positions/lengths? Also, is it possible to perform transpositions, multiplications, or other mathematical operations on data tables as matrices?
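On the matrix question: outside the canvas widgets, operations like these are usually done in NumPy once the table has been exported to an array (a later post in this thread uses `to_numpy` for that export). The sketch below uses a hand-written array as a stand-in for a data table, so no Orange call is assumed.

```python
import numpy as np

# Stand-in for a data table: 3 instances (rows) by 2 features (columns).
table = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

transposed = table.T        # 2 x 3
gram = transposed @ table   # (2 x 3) times (3 x 2) gives 2 x 2
halved = table * 0.5        # element-wise scaling

print(transposed.shape, gram.shape)
print(gram)
```

The matrix product is only defined when the column count of the left operand equals the row count of the right one, which is why `transposed @ table` works here while `table @ table` would raise an error.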

Re: Any way to save or lock in Linear Projection anchor info

Postby whoeverest » Wed May 02, 2012 23:19

Hi Pea. I can't help you with the anchor settings problem, because I'm a relatively new Orange user and I'm not familiar with the implementation in question. A member of the core team will probably be able to give you more information.

However, I'm developing an add-on for Orange for computer vision, as a part of GSoC, concretely widgets for image preprocessing. And since you are using Orange for working with images, can you tell me - from your perspective as a user - what would you like to see implemented in such an add-on? It would be really useful for me to get such feedback so I can make the add-on better and more useful.

As for working with matrices: the widgets that I'll implement are going to be based on the OpenCV library for Python that uses NumPy arrays (matrices really), so after I'm done you should be able to. But for now - no, I'm afraid.

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Thu May 03, 2012 10:29

You can use the 'Translate Domain' widget in the prototypes section (http://orange.biolab.si/trac/wiki/HowTo/EnablePrototypesWidgetCategory) to compute the transformations for the new data instances.

Connect the 'Transformed Data' output channel from the 'PCA' widget to the 'Target Domain' channel of the 'Translate Domain' widget and your new additional image instances to the 'Input Data' channel. Then connect the 'Translate Domain's 'Translated Data' channel to the Linear Projection's 'Data Subset' input.

If all went well you should see your additional image instances in the Linear Projection widget drawn with filled symbols and the projection should stay the same even if the 'Data Subset' channel changes.

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Thu May 03, 2012 14:29

Thank you so much, Ales! I'll try that out right away and post the results.


Whoeverest, congrats on getting into GSoC! As far as my use of images with Orange goes, it's only temporary. I'm researching the application of PCA in malware detection, so I hope to be using executable files instead of images in the near future. Large sets of same-size images are just a lot easier to find. I've progressed to a set of 105 icons, and I'm still hoping to find some larger sets.

I'm not sure if it's already implemented (I'm pretty darn new), but the ability to turn an image into a matrix of pixels (for one image) or a matrix where each row holds all of one image's pixels (for multiple images) would be convenient. I currently convert all the images into a tab-separated text file of the byte values before I pass them into Orange, so it's not necessarily something that needs to be done in Orange (and I'm sure there are people more familiar with it who feel it shouldn't be done). However, now that I think (and Google) about it, being able to categorize and distinguish between different types of pictures is kind of like computer vision, isn't it? Well, I'll see how the Translate Domain widget works out for me.
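The "one row per image" layout and the tab-separated export described above could look roughly like this. It is a sketch using only the standard library: the helper names, the `p0`, `p1`, ... column names, and the `fruit` class column are all made up for illustration, and Orange's own tab format may expect additional header rows that are omitted here.

```python
import csv
import io


def images_to_rows(images):
    """Flatten each 2-D pixel grid into one row (row-major order)."""
    return [[pixel for line in image for pixel in line] for image in images]


def write_tab_file(rows, labels, handle, class_name="fruit"):
    """Write a header line, then one tab-separated line per image, label last."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    n_pixels = len(rows[0])
    writer.writerow(["p%d" % i for i in range(n_pixels)] + [class_name])
    for row, label in zip(rows, labels):
        writer.writerow(row + [label])


# Two 2x2 "images" as stand-ins for real pixel data.
images = [[[0, 1], [2, 3]], [[9, 8], [7, 6]]]
buffer = io.StringIO()
write_tab_file(images_to_rows(images), ["apple", "pear"], buffer)
print(buffer.getvalue())
```

Writing to an `io.StringIO` keeps the example self-contained. An open file handle can be passed in the same way.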

Thanks!


EDIT: I processed data for additional images to test out the distribution, but when I pass the new data into the Translate Domain widget, it reports an error (Failed to convert the domain(ValueError('shape mismatch: objects cannot be broadcast to a single shape',))). The info displayed by the widget itself is "Target domain with 105 features and discrete class" and "input data with 10 instances". Any idea what I'm doing wrong?

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Fri May 04, 2012 9:57

Pea wrote: it puts up an error (Failed to convert the domain(ValueError('shape mismatch: objects cannot be broadcast to a single shape',)))

My guess is that this is due to the new data having a different number of features than the data used to construct the PCA. Can you check the domain of the new data?

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Mon May 07, 2012 14:58

Okay, this is a bit funky...

Both data sets had the same number of columns (which I believe are considered features). When I plugged it into the widget, it displayed "Target Domain with 105 features and discrete class. Input data with 10 instances." as it had before. Then I remembered the fun stuff about requirements for multiplying matrices and how the number of columns for one has to equal the number of rows in the other... Long story short, I swapped the two (target domain vs. input data) and it passed through. Of course, since the data was not meant to be sent through that way, all I got was a 105x105 table of ?'s. Then I tried using the Transpose widget on the new data to make it have 105 instances, like the original data, but it threw the original error again.

I also decided to try plugging the new data directly into the data subset port on the linear projection widget. It throws the same error, but the error output popup window appears with what may be a more helpful description of the error being thrown. I'll paste it below.

Code:
There were problems importing the following widgets:
   OWC45Tree: c45 is not found
   OWItemsetViz: No module named OWNetworkCanvas
   OWModelMap: No module named scipy.stats
   OWRScript: No module named rpy2.robjects
   OWLinProj3D: No module named OpenGL
   OWNxExplorer3D: No module named OWNxExplorerQt
   OWScatterPlot3D: No module named OpenGL
   OWSphereviz3D: No module named OpenGL
The following widgets could not be imported and will not be available: OWSphereviz3D, OWScatterPlot3D, OWLinProj3D, OWC45Tree, OWNxExplorer3D.
The following prototype widgets could not be imported and will not be available: OWRScript, OWModelMap, OWItemsetViz.

Unhandled exception of type ValueError occured at 9:51:16:
Traceback:
  File: orngSignalManager.py, line 622 in processNewSignals
  Code: self.widgets[i].processSignals()
    File: OWBaseWidget.py, line 691 in processSignals
    Code: self.handleNewSignals()
      File: OWLinProj.py, line 301 in handleNewSignals
      Code: self.graph.setData(self.data, self.subsetData)
        File: __init__.py, line 199 in wrap_call
        Code: return func(*args, **kwargs)
          File: OWLinProjGraph.py, line 73 in setData
          Code: orngScaleLinProjData.setData(self, data, subsetData, **args)
            File: __init__.py, line 199 in wrap_call
            Code: return func(*args, **kwargs)
              File: __init__.py, line 199 in wrap_call
              Code: return func(*args, **kwargs)
                File: scaling.py, line 266 in set_data
                Code: full_data = self.merge_data_sets(data, subset_data)
                  File: __init__.py, line 199 in wrap_call
                  Code: return func(*args, **kwargs)
                    File: __init__.py, line 199 in wrap_call
                    Code: return func(*args, **kwargs)
                      ValueError: shape mismatch: objects cannot be broadcast to a single shape

Unhandled exception of type TypeError occured at 9:51:34:
Traceback:
  File: __init__.py, line 199 in wrap_call
  Code: return func(*args, **kwargs)
    File: OWLinProjGraph.py, line 487 in mouseMoveEvent
    Code: attrVal = self.scaledData[self.attributeNameIndex[label]][index]
      TypeError: 'NoneType' object is not subscriptable


* Error message from passing a 105-column, 10-row table into Linear Projection as the data subset for the existing 105x105 data set with matching column names.

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Mon May 07, 2012 17:06

Pea wrote: Then I remembered the fun stuff about requirements for multiplying matrices and how the number of columns for one had to be equal to the number of rows in the other... Long story short, I swapped the two (target domain vs. input data) and it passed through. Of course, since the data was not meant to be sent through that way, all I got was a 105x105 table of ?'s. Then I tried using the Transpose widget on the new data to make it have 105 instances, like the original data, but it threw the original error again.
This is not necessary (in fact it is detrimental). The PCA transformations are defined by feature descriptors for the PCA components (using http://orange.biolab.si/doc/reference/Orange.feature.descriptor/#Orange.feature.Descriptor.get_value_from), and these expect the data to be in the same domain as the input to the PCA itself (i.e. the original domain).

The error most probably happens in _ProjectSingleComponent.__call__ (http://orange.biolab.si/trac/browser/orange/Orange/projection/linear.py#L1269).

This means that if the input data domain matches the one fed into PCA, then the PCA itself may be at fault.
Can you please add a 'Python Script' widget to the canvas and connect the "Transformed Data" channel to it? Then execute this script in the widget:
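Code:
# Callback that computes the first PCA component from the original features
p = in_data.domain[0].get_value_from.__callback
print p.projector.center.shape                  # centering vector
print p.projector.scale.shape                   # scaling vector
print p.projector.projection[p.idx, :].shape    # this component's row of the projection matrix
print len(p.projector.input_domain.features)    # number of features in the PCA input domain
print p.projector.input_domain.class_var        # class variable of the PCA input domain

print p.projector                               # summary of the fitted PCA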

What is the output?

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Tue May 08, 2012 14:19

Okay, I connected the PCA widget's 'Transformed Data' output to the Python Script widget's input. The output of the script was as follows:

Code:
Running script:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 1
    p = in_data.domain[0].get_value_from.__callback
   ^
IndentationError: unexpected indent

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Tue May 08, 2012 14:26

Make sure you copy the script to the widget exactly as is (that includes indentation).
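The traceback in the previous post reports "unexpected indent" on line 1, which is what Python raises when the first statement of a script starts with whitespace. A small sketch of the failure and one possible fix, using a made-up two-line script rather than the one from this thread:

```python
import textwrap

# Every line carries stray leading spaces, as can happen when pasting.
pasted = "    p = 1\n    print(p)\n"

try:
    compile(pasted, "<pasted>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

# Strip the whitespace that is common to all lines, then run the result.
cleaned = textwrap.dedent(pasted)
namespace = {}
exec(compile(cleaned, "<pasted>", "exec"), namespace)
print(error_name, namespace["p"])
```

`textwrap.dedent` only removes whitespace shared by every line, so relative indentation inside loops or functions is preserved.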

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Tue May 08, 2012 14:38

Wow.... I didn't even make that connection...

Okay, I fixed that and it gave much more relevant looking information:

Code:
Running script:
(5002,)
(5002,)
(5002,)
5002
Orange.feature.Discrete 'Owner'
PCA SUMMARY

Std. deviation of components:
                   Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8     Comp.9    Comp.10        ...   Comp.105
Std. deviation      0.256      0.237      0.191      0.176      0.165      0.159      0.152      0.147      0.141      0.137                 0.000
Proportion Var      6.547      5.617      3.640      3.094      2.710      2.519      2.310      2.175      1.977      1.885                 0.000
Cumulative Var      6.547     12.164     15.804     18.898     21.608     24.127     26.437     28.612     30.589     32.473               100.000

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Tue May 08, 2012 15:50

Can you please also connect your new data instances (the 'Input Data' for the 'Translate Domain' widget) to the 'Python Script' widget and execute:
Code:
import Orange
print len(in_data.domain.features)   # number of features in the new data
print in_data.domain.class_var       # class variable of the new data
# Attribute values of the first instance as a NumPy array
array = Orange.data.Table(in_data[:1]).to_numpy("a")[0]
print array.shape

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Tue May 08, 2012 16:15

File(new data) -> Data Table (all rows selected) -> Python Script

Here is the output:

Code:
Running script:
105
Orange.feature.Discrete 'Owner'
(1, 105)
>>>

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Tue May 08, 2012 16:44

Then as I mentioned before, you need to provide the input data with the same domain as the input to the 'PCA' widget (that means all of the 5002 features that were used by PCA).
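The two script outputs explain the error: the PCA was fitted on 5002 features, while the new data has 105, so the two vectors cannot be combined element by element. The NumPy sketch below reproduces that situation. It is not Orange's code: `center_row` is a hypothetical helper, and the zero vector stands in for the projector's real centering vector.

```python
import numpy as np

N_PCA_FEATURES = 5002                 # feature count the PCA was fitted on (from the thread)
center = np.zeros(N_PCA_FEATURES)     # stand-in for the projector's centering vector


def center_row(row, center):
    """Subtract the PCA center from one row, checking the feature count first."""
    row = np.asarray(row, dtype=float)
    if row.shape != center.shape:
        raise ValueError(
            "expected %d features, got %d" % (center.shape[0], row.shape[0])
        )
    return row - center


original_format_row = np.ones(5002)   # same features as the PCA input
reduced_row = np.ones(105)            # only 105 features, as in the failing case

print(center_row(original_format_row, center).shape)

try:
    center_row(reduced_row, center)
except ValueError as exc:
    message = str(exc)
    print("rejected:", message)
```

Without the explicit check, NumPy itself raises a `ValueError` for `reduced_row - center`, because shapes `(105,)` and `(5002,)` cannot be broadcast together. The check only makes the message easier to read.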

Re: Any way to save or lock in Linear Projection anchor info

Postby Pea » Tue May 08, 2012 17:04

Wait, so... The new data needs to be in the same format as the original data was before it went through the PCA? That... would be amazing. I was using Excel to multiply the new data by the eigenvector matrix, but the result was always on some different scale.

I ran it with the new data in the same format as the pre-PCA original, and it worked! Thank you! Now, is there any way to get the locations of the anchors? Not necessarily to plug in later, just to record elsewhere so I can use the data from the projection if it proves supportive (which, with the new data being plotted now, doesn't look likely).

Thanks so much for your help!

Re: Any way to save or lock in Linear Projection anchor info

Postby Ales » Wed May 09, 2012 9:51

Pea wrote:Now, is there any way to get the locations of the anchors?
I think this is not possible at the moment.

