Version 12 (modified by mitar, 4 years ago)

Google Summer of Code Ideas

Here is a list of project ideas that we think would be interesting and useful to do for Orange during the Google Summer of Code program. You can of course also propose your own idea(s), as long as they are connected with Orange, data mining, machine learning, artificial intelligence in general, bioinformatics, or other fields we are interested in (or that you can get us interested in).

Ideas are listed in no particular order.

Time-series analysis

Orange currently lacks any time-series analysis tools. It would be great to develop some basic tools for working with time series: reading, normalizing, basic pattern search, (auto-)correlation and similar basic techniques. Research what similar applications support and propose which features would make a useful basic set of tools.
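To give a feel for the scale of such basic tools, here is a minimal sketch of two of the operations mentioned above, normalization and autocorrelation. It is not existing Orange code: the function names `normalize` and `autocorrelation` are hypothetical, it uses plain Python lists rather than Orange data tables, and it uses the common biased autocorrelation estimator (dividing by the full-series variance).

```python
from statistics import mean, pstdev


def normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(x - mu) / sigma for x in series]


def autocorrelation(series, lag):
    """Sample autocorrelation at a non-negative lag (biased estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Products of deviations that are `lag` steps apart.
    num = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return num / denom


signal = [1.0, 2.0, 3.0, 2.0] * 4      # toy signal with period 4
scaled = normalize(signal)
r_period = autocorrelation(signal, 4)   # 0.75: the signal repeats every 4 steps
r_half = autocorrelation(signal, 2)     # -0.875: half a period apart, anti-correlated
```

A peak in the autocorrelation at some lag is a simple way to detect periodicity, which is one form of the "basic pattern search" mentioned above.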

Useful skills: Python. Data analysis experience. Digital signal processing experience could also help.

Level from 1 (beginner) to 5 (professional): 4

Widgets in separate processes

Widgets in Orange Canvas currently run in a single process. As they are independent given their inputs, they could frequently work in parallel (in a data-flow manner). The objective of this task is to modify Orange Canvas so that each widget runs in its own process.

It would also be useful to separate the GUI thread from the main payload computation of widgets. Currently we use just one thread (the GUI thread) for everything, so while a widget is working it has to call back into the GUI repeatedly to keep it responsive. Separating the two would make the code cleaner.
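The separation of GUI and payload could look roughly like the sketch below. It is only an illustration of the pattern, using the standard `threading` and `queue` modules and no Qt at all: `run_in_background`, `slow_sum` and `drain` are hypothetical names, and `drain` merely stands in for the GUI event loop, which in a real widget would poll the queue or receive the messages through Qt's own mechanisms.

```python
import queue
import threading


def run_in_background(compute, data, messages):
    """Start compute(data, report) on a worker thread; output goes to `messages`."""

    def report(fraction):
        # Called by the payload code instead of calling back into the GUI.
        messages.put(("progress", fraction))

    def target():
        try:
            messages.put(("done", compute(data, report)))
        except Exception as exc:  # hand worker errors over to the GUI side
            messages.put(("error", exc))

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    return worker


def slow_sum(values, report):
    """Stand-in for a widget's payload computation (hypothetical)."""
    total = 0
    for i, value in enumerate(values, start=1):
        total += value
        report(i / len(values))
    return total


def drain(messages):
    """Stand-in for the GUI loop: consume messages until the worker finishes."""
    progress = []
    while True:
        kind, payload = messages.get()
        if kind == "progress":
            progress.append(payload)  # a real widget would update a progress bar
        elif kind == "done":
            return payload, progress
        else:
            raise payload


messages = queue.Queue()
run_in_background(slow_sum, [1, 2, 3, 4], messages)
result, progress = drain(messages)
```

The payload function only knows about a `report` callable, so the same computation can be reused from scripts without any GUI. Running each widget in its own process would follow the same message-passing shape, with an inter-process queue in place of `queue.Queue`.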

Useful skills: Python programming with multiple processes and threads. Qt and PyQt experience. Program design.

Level from 1 (beginner) to 5 (professional): 5

ANOVA

Implement ANOVA regression, which would support arbitrary models, similar to the R implementation.
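As a reference point, the simplest special case, a one-way ANOVA with a single factor, reduces to comparing between-group and within-group variance. The sketch below shows only that case in plain Python; the function name `one_way_anova` is hypothetical, and supporting arbitrary models as R does would instead require building design matrices and fitting them by least squares, which is not shown here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova(groups)
```

For this toy data the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26 / 2) / (6 / 6) = 13 on (2, 6) degrees of freedom. Turning F into a p-value needs the F distribution, which is left out of this sketch.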

Useful skills: Python. The candidate should be familiar with statistics and matrix computation (NumPy).

Level from 1 (beginner) to 5 (professional): 3

Support for parallel computation for scripting/backend

Another idea on this page discusses running GUI widgets in parallel, separate processes. This idea is instead about having the scripting (backend) part of Orange support (semi-)automatic parallelization, with work split into processes that could possibly also run on different computers. Cross-validation with multiple folds is a simple example of an easily parallelized technique: each fold can be computed independently and the results are then easily combined into the final result.

It would be good to analyze such opportunities for parallelization, find what they have in common, and maybe devise a small helper library (possibly a wrapper for an existing grid computing system, like Xgrid) that makes code run in parallel when such an environment is available and run normally when it is not. Then, of course, move as many of the existing implementations as possible to this new parallelization support.
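A helper of this kind might look like the following sketch, shown on the cross-validation example. All names here (`make_folds`, `map_folds`, `fold_error`) are hypothetical and nothing in it is existing Orange API. The helper accepts any object with an `executor.map` method in the style of Python's `concurrent.futures`, and falls back to a plain loop when none is given. A thread pool is used only to keep the sketch runnable anywhere; CPU-bound pure-Python work would need a process pool or a grid backend behind the same interface to actually gain speed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def make_folds(n_items, n_folds):
    """Split indices 0..n_items-1 into n_folds (train, test) index pairs."""
    indices = list(range(n_items))
    folds = []
    for k in range(n_folds):
        test = indices[k::n_folds]  # every n_folds-th item, starting at k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        folds.append((train, test))
    return folds


def map_folds(evaluate, folds, executor=None):
    """Apply evaluate to every fold, in parallel if an executor is given."""
    if executor is None:
        return [evaluate(fold) for fold in folds]  # plain serial fallback
    return list(executor.map(evaluate, folds))     # results keep fold order


def fold_error(data, fold):
    """Mean absolute error of predicting the training mean on the test part."""
    train, test = fold
    prediction = sum(data[i] for i in train) / len(train)
    return sum(abs(data[i] - prediction) for i in test) / len(test)


data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
folds = make_folds(len(data), n_folds=3)
evaluate = partial(fold_error, data)  # a top-level function, so it stays picklable

serial = map_folds(evaluate, folds)
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = map_folds(evaluate, folds, executor=pool)

cv_error = sum(parallel) / len(parallel)  # combine fold results into one score
```

Because the calling code is identical in both cases, existing implementations could be moved to such a helper one at a time, and they would keep working on machines where no parallel environment is available.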

Useful skills: Python. Grid computing experience.

Level from 1 (beginner) to 5 (professional): 4