Orange Forum • View topic - GSOC -Support for parallel computation for scripting/backend

GSOC -Support for parallel computation for scripting/backend

General discussions about Orange and things connected with Orange (data mining, machine learning, bioinformatics...).

Post by flaviovdf » Tue Mar 22, 2011 0:02

Hello,

My name is Flavio Figueiredo and I am interested in developing the support for parallel computation for scripting/backend during GSoC 2011. I have experience in parallel and distributed programming, having been a developer of grid middleware in the past.

I currently do research on social networks and IR, and Orange has been one of the tools I have used most in recent months. In fact, I already do some simple parallel computation with Orange + IPython. If you would like to know more about me, please check out my website and CV (which is on the website), or just ask me.

homepage: http://www.dcc.ufmg.br/~flaviov


Re: GSOC -Support for parallel computation for scripting/bac

Post by Anze » Wed Mar 23, 2011 12:54

We are looking forward to reading your idea proposal. If you have any questions or wish to discuss your idea before submitting it to Google, post on this forum or contact us on Skype chat.

Re: GSOC -Support for parallel computation for scripting/bac

Post by flaviovdf » Wed Mar 30, 2011 7:50

Hello again,

I am currently writing the proposal for this. I can clearly see how to parallelize Orange.evaluation.testing (formerly orngTest), i.e. cross-validation and repetitions, as stated in the project idea, but many other aspects of Orange can be parallelized too. For example:

* When dealing with multiple datasets, any analysis can be parallelized across datasets, since they are independent.
* Distance calculations between distinct pairs of examples can also be parallelized.
* Different runs of k-means in clustering.
* Parameter selection from Orange's orngWrap.
Among others.
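As an illustration of the second item, here is a minimal sketch of how pairwise distances might be split into independent chunks. It does not use Orange at all: `euclidean`, `distance_chunk` and `pairwise_distances` are hypothetical names, and the examples are plain tuples of numbers. A thread pool is used only so the snippet runs anywhere; for CPU-bound work one would presumably pass `ProcessPoolExecutor` as `executor_cls` instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_chunk(examples, pairs):
    """Compute distances for one chunk of index pairs; chunks share no state."""
    return [(i, j, euclidean(examples[i], examples[j])) for i, j in pairs]


def pairwise_distances(examples, n_chunks=4, executor_cls=ThreadPoolExecutor):
    """Return a symmetric distance matrix, computing chunks of pairs concurrently."""
    pairs = list(combinations(range(len(examples)), 2))
    # Deal the pairs out round-robin so every chunk gets a similar share.
    chunks = [pairs[k::n_chunks] for k in range(n_chunks)]
    n = len(examples)
    matrix = [[0.0] * n for _ in range(n)]
    with executor_cls(max_workers=n_chunks) as pool:
        futures = [pool.submit(distance_chunk, examples, chunk) for chunk in chunks]
        for future in futures:
            for i, j, d in future.result():
                matrix[i][j] = matrix[j][i] = d
    return matrix


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
result = pairwise_distances(points)
```

Because every chunk only reads the examples and returns its own list, the chunks can be computed in any order and merged afterwards.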

The proposal could cover one module only, such as orngTest, or a more complex solution (I am not sure whether this is what you have in mind for Summer of Code, as it could take more time) that enables the parallelization of any completely independent tasks, such as the ones above. This second idea would be to implement a work queue for functions that can run in parallel without affecting one another. We would have to:

* Identify such functions (pre-processing is definitely not one of them, unless the ExampleTable is cloned).
* Implement asynchronous method calls, in which the user would submit tasks to the work queue and wait for results accordingly.

Simple example:

-------
from orange import ExampleTable
import orange_clust  # proposed module, does not exist yet

t1 = ExampleTable(...)
t2 = ExampleTable(...)
t3 = ExampleTable(...)

orange_clust.config('some_config_file')
# Submit three independent tasks; one future is returned per task.
future1, future2, future3 = orange_clust.execute('orngTest.crossValidation(t1)', 'orngClust.kmeans(t2)', 'orngTest.leave_one_out(t3)')

future1.wait()  # block until the first task has finished
...
orange_clust.execute('orngWrap.Tune1Parameter(data, ...)')

# I know some methods do not represent the current API; this is a simple idea sketch.
-------
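For comparison, the submit-then-wait pattern of the sketch above can be tried out today with the standard library's `concurrent.futures` module. This is only a rough stand-in under stated assumptions: `WorkQueue`, `submit_all`, `mean` and `value_range` are hypothetical names, tasks are passed as callables rather than strings, and the two toy functions merely take the place of real Orange analyses.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkQueue:
    """Hypothetical stand-in for the proposed work queue module."""

    def __init__(self, workers=2, executor_cls=ThreadPoolExecutor):
        self._pool = executor_cls(max_workers=workers)

    def submit_all(self, *tasks):
        """Submit (function, args) pairs; return one future per task, in order."""
        return [self._pool.submit(func, *args) for func, args in tasks]

    def shutdown(self):
        """Wait for pending tasks and release the workers."""
        self._pool.shutdown(wait=True)


# Toy analyses standing in for cross-validation, k-means and so on.
def mean(values):
    return sum(values) / len(values)


def value_range(values):
    return max(values) - min(values)


queue = WorkQueue(workers=2)
future1, future2 = queue.submit_all(
    (mean, ([1, 2, 3, 4],)),
    (value_range, ([5, 9, 2],)),
)
mean_result = future1.result()   # blocks until the first task has finished
range_result = future2.result()
queue.shutdown()
```

Passing callables instead of code strings avoids evaluating text on the worker side, at the cost of requiring picklable functions once a process pool is used.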

I am not sure whether the description was clear, but in summary I would like to know whether:
1) there is a priority order for the modules to be parallelized (I personally like orngTest since I use it the most ;-)),
2) or the idea is to have something more generic,
3) or this is to be studied during the Summer of Code.

Re: GSOC -Support for parallel computation for scripting/bac

Post by Mitar » Wed Mar 30, 2011 10:04

The modules listed in the proposal are definitely the ones we would like to see prioritized (those are the modules for which we have wished for parallelization in practice). But yes, everything else is also welcome. Just do not over-plan. Summer is short. ;-)

I would say that the implementation should be generic, with example use in the modules mentioned in the idea proposal and, if you have time, in others as well.

This should mostly be studied now; it is your homework for the proposal. Of course we can and will refine it together if you are accepted. And you can ask questions now if you have any.

I am not sure about your proposed API sketch; I will ask others to look into it and maybe comment.

Keep up the good work and write a good proposal.

Re: GSOC -Support for parallel computation for scripting/bac

Post by Anze » Wed Mar 30, 2011 11:09

Users should be able to take advantage of parallel computation without explicitly saying what should be run in parallel. They should be able to enable parallel cross-validation in existing scripts with a single line of code. Something like "Orange.core.enable_parallelization(no_of_cores=4)" at the beginning of the script should make all calls to Orange.evaluation.testing.cross_validation(...) run their folds in parallel.

The API you described seems reasonable enough to use. Keep in mind that for parallelization of parameter selection you would have to run several cross-validations in parallel, each of which itself runs its folds in parallel, so the API should support this.
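One possible way to handle this nesting, shown here only as a sketch, is to flatten every (parameter value, fold) combination into a single task list for one pool and regroup the scores afterwards, rather than starting a pool inside each cross-validation. All names (`make_folds`, `fold_error`, `tune_parameter`) and the toy "shrunken mean" learner are assumptions made for illustration, not Orange API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_folds(data, k):
    """Split data into k (train, test) pairs by striding over the examples."""
    return [
        ([x for j, x in enumerate(data) if j % k != i], data[i::k])
        for i in range(k)
    ]


def fold_error(task):
    """Score one (parameter, fold) combination with a toy shrunken-mean learner."""
    shrink, (train, test) = task
    prediction = shrink * sum(train) / len(train)
    error = sum(abs(x - prediction) for x in test) / len(test)
    return shrink, error


def tune_parameter(data, candidates, k=3, workers=4, executor_cls=ThreadPoolExecutor):
    """Evaluate every candidate on every fold in one flat pool of tasks."""
    folds = make_folds(data, k)
    tasks = list(product(candidates, folds))  # len(candidates) * k independent tasks
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fold_error, tasks))
    totals = {c: 0.0 for c in candidates}
    for shrink, error in results:
        totals[shrink] += error
    mean_errors = {c: totals[c] / k for c in candidates}
    best = min(mean_errors, key=mean_errors.get)
    return best, mean_errors


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
best, mean_errors = tune_parameter(data, candidates=[0.5, 1.0])
```

With a flat task list the number of workers is bounded in one place, whereas nested pools would multiply the worker count of the outer level by that of the inner level.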

Re: GSOC -Support for parallel computation for scripting/bac

Post by Mitar » Wed Mar 30, 2011 11:26

So, yes. Both parallelization with local processes and across the grid should be supported. Do not forget that.

(Python currently does not support threading in a useful way because of the global interpreter lock, so all parallelization has to be done through processes anyway.)
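To make the threads-versus-processes point concrete, here is a small sketch using the standard library's `concurrent.futures`. The function names `count_primes` and `run_tasks` are hypothetical. Both executor classes share one interface, so the same code can run on threads (correct results, but CPU-bound work is serialized by the global interpreter lock) or on processes (each worker has its own interpreter and its own lock).

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_tasks(limits, workers=2, executor_cls=ProcessPoolExecutor):
    """Run count_primes on each limit; the default backend is worker processes."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


# Smoke check on threads: same interface and same answers, but no speed-up
# for CPU-bound work because only one thread executes bytecode at a time.
thread_result = run_tasks([10, 20], executor_cls=ThreadPoolExecutor)
```

Calling `run_tasks([...])` with the default `ProcessPoolExecutor` should be done from under an `if __name__ == "__main__":` guard, since on platforms that start workers by spawning a fresh interpreter the main module is imported again in each worker.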

Re: GSOC -Support for parallel computation for scripting/bac

Post by flaviovdf » Fri Apr 08, 2011 6:09

Hi,

Unfortunately I will not be able to dedicate as much time to GSoC as I would have wished. Yesterday I received some exam results (academic, not medical) and they were worse than expected. Since summer up north is not summer in Brazil, the GSoC period is not necessarily vacation time for me. Also, I am visiting another university, so my time would already be split. After much thought, I decided to review my priorities and leave GSoC out this year.

I hope someone else gets this project, and I am still an enthusiastic user of Orange =) Maybe I can still develop this project (or help someone with it) outside GSoC, since you and Google would probably expect more effort than I can allocate.

Flavio

Re: GSOC -Support for parallel computation for scripting/bac

Post by Mitar » Fri Apr 08, 2011 9:13

We are sad to hear that, but we hope you know that you are always welcome to join the development at any time, also outside of the GSoC program, as you said.

Re: GSOC -Support for parallel computation for scripting/bac

Post by yuhongguo » Fri Apr 15, 2011 12:17

Hi Mitar

I'm Yuhong, a GSoC 2011 multi-label classification student.

How can I contact you via email? Can we talk about the ideas?

Re: GSOC -Support for parallel computation for scripting/bac

Post by Mitar » Mon Apr 18, 2011 1:27

For this idea you should turn to Matija.

Re: GSOC -Support for parallel computation for scripting/bac

Post by matija » Mon Apr 18, 2011 7:20

Hi,

You can reach me via the "Send message" button here: Matija Polajnar. I'll respond by e-mail.

Matija

