Version 38 (modified by blaz, 4 years ago)

Google Summer of Code Ideas

Here is a list of project ideas that we think would be interesting and useful to do for Orange during the Google Summer of Code program. You are of course welcome to propose your own idea(s) as well, as long as they are connected with Orange, data mining, machine learning, artificial intelligence in general, bioinformatics, or other fields we are interested in (or that you can get us interested in).

You can find more information about our participation in Google Summer of Code here.

Ideas are listed in no particular order.

3D Widgets in Orange

The idea is to support 3D visualizations in Orange Widgets. The first task is to develop a 3D Plot class (similar to QwtPlot3D) that can be used in an arbitrary Orange Widget instead of OWGraph. The second task is to develop use case examples, especially Scatterplot 3D and NetExplorer 3D.

Useful skills: Knowledge of 3D modeling (QtOpenGL, PyOpenGL). Python programming.

Level from 1 (beginner) to 5 (professional): 4.5

Support for parallel computation for scripting/backend

The project will develop support for (semi)automatic parallelization/separation into processes, and possibly also distribution of processes over different computers. Cross-validation with multiple folds is a simple example of an easily parallelized technique, as each fold can be computed independently and the results then easily combined into the final result. Parallelization should be seamless from the point of view of the user (the script writer).

It would be good to analyze such opportunities for parallelization, find what they have in common, and maybe devise a small helper library (possibly a wrapper for some existing grid computing system) that code can use to easily run in parallel if such an environment is available, and run normally if not. And then, of course, move as many of the existing implementations as possible to this new support for parallelization.
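To make the cross-validation example concrete, here is a minimal sketch of what such a helper might look like. It is only an illustration built on Python's standard concurrent.futures module, not an existing Orange API: the names run_folds, cross_validate, make_folds and evaluate_fold are hypothetical, and the "learner" is a stand-in that predicts the majority label.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_items, n_folds):
    """Split indices 0..n_items-1 into n_folds disjoint test sets."""
    return [list(range(i, n_items, n_folds)) for i in range(n_folds)]


def evaluate_fold(job):
    """Train on everything outside the fold, then test on the fold.

    Stand-in learner (an assumption for this sketch): always predict
    the majority label of the training part.
    """
    labels, test_idx = job
    test = set(test_idx)
    train_labels = [y for i, y in enumerate(labels) if i not in test]
    majority = max(set(train_labels), key=train_labels.count)
    correct = sum(1 for i in test_idx if labels[i] == majority)
    return correct, len(test_idx)


def run_folds(func, jobs, executor_cls=None, max_workers=None):
    """Hypothetical helper: run jobs through an executor if one is given,
    otherwise fall back to a plain serial loop."""
    jobs = list(jobs)
    if executor_cls is None:
        return [func(job) for job in jobs]
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, jobs))


def cross_validate(labels, n_folds=5, executor_cls=None):
    """Each fold is an independent job; the results are combined at the end."""
    jobs = [(labels, fold) for fold in make_folds(len(labels), n_folds)]
    results = run_folds(evaluate_fold, jobs, executor_cls)
    correct = sum(c for c, _ in results)
    total = sum(n for _, n in results)
    return correct / total


labels = ["a"] * 7 + ["b"] * 3
serial_accuracy = cross_validate(labels, n_folds=5)
pooled_accuracy = cross_validate(labels, n_folds=5, executor_cls=ThreadPoolExecutor)
```

The script writer calls cross_validate the same way in both cases; only the optional executor changes. Passing concurrent.futures.ProcessPoolExecutor instead (from a script guarded by if __name__ == "__main__":) would spread the folds over separate processes, and a wrapper around a grid computing system could plug in at the same place.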

Useful skills: Python. Grid computing experience.

Level from 1 (beginner) to 5 (professional): 4

Multi-label classification

Orange lacks support for multi-label learning and classification – at the data-structure, algorithmic and GUI levels. Update the structures, implement at least a few algorithms, adapt evaluation methods and add GUI support for it. A neat repository of literature on multi-label learning is ML&KD's web page Learning from Multi-Label Data. Also, there are excellent libraries for these methods in Java, like mulan, and one possibility is to reimplement it in Orange, test it, and craft nice documentation with examples.
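For readers new to the topic, the following sketch shows two standard building blocks from the multi-label literature: the binary relevance transformation (one binary problem per label) and the Hamming loss evaluation measure. It is plain Python and says nothing about how Orange's data structures would represent label sets; the function names and the example labels are made up for illustration.

```python
def binary_relevance_split(label_sets, all_labels):
    """Turn one multi-label problem into one binary target list per label."""
    return {label: [label in s for s in label_sets] for label in all_labels}


def hamming_loss(true_sets, predicted_sets, all_labels):
    """Fraction of (instance, label) pairs where prediction and truth disagree.

    Assumes every label in the sets also appears in all_labels.
    """
    mistakes = sum(len(set(t) ^ set(p)) for t, p in zip(true_sets, predicted_sets))
    return mistakes / (len(true_sets) * len(all_labels))


all_labels = ["sports", "politics", "tech"]
true_sets = [{"sports"}, {"politics", "tech"}, {"tech"}]
predicted_sets = [{"sports", "tech"}, {"politics"}, {"tech"}]

targets = binary_relevance_split(true_sets, all_labels)
loss = hamming_loss(true_sets, predicted_sets, all_labels)
```

Each list in targets can be handed to an ordinary binary learner, which is why binary relevance is often the first multi-label method to implement. Here two of the nine (instance, label) pairs are wrong, so the loss is 2/9.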

Useful skills: Python. C/C++. A bit of machine learning.

Level from 1 (beginner) to 5 (professional): 4

Test scripts, example scripts and documentation

Orange comes with substantial documentation for scripting which, in places, could be substantially improved. Also, Orange 2.5 with its new class hierarchy and functions is coming, and code snippets and the corresponding documentation will both require revision. The project would embark on the design of new use cases (snippets of code that demonstrate various aspects of Orange), a review of the present set of snippets, and the integration of code snippets within the documentation.

Snippets in documentation also serve as regression scripts upon which Orange is tested daily. Another purpose of this project is to increase the number and coverage of regression scripts.
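As an illustration of the general idea (not a description of Orange's actual test infrastructure), a documentation snippet can double as a regression test by running it and comparing what it prints with a previously recorded output. The helper names below are hypothetical.

```python
import contextlib
import io


def run_snippet(source):
    """Execute snippet source code and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<snippet>", "exec"), {"__name__": "__snippet__"})
    return buffer.getvalue()


def check_snippet(source, recorded_output):
    """Return True if the snippet still prints the recorded output."""
    return run_snippet(source) == recorded_output


snippet = "values = [3, 1, 2]\nprint(sorted(values))\n"
recorded = "[1, 2, 3]\n"
still_passes = check_snippet(snippet, recorded)
```

A snippet whose output drifts from the recorded one signals either a regression in the library or documentation that needs updating, which is why more snippets mean better coverage.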

This could also be a good project if you would like to learn more about Orange, data mining and machine learning.

Useful skills: Proficiency in English (probably native speaker). Language/writing skills. Python.

Level from 1 (beginner) to 5 (professional): 3

A social platform for Orange

Orange's visual programming environment can incorporate any of its over 100 widgets into schemas that do clustering, classification, visualization-based analysis, PCA, and much more. A repository of typical schemas would be most welcome: novice users could choose from the already defined schemas and analyze their own data with them, intermediate users could use the library to improve/augment their own schemas, and experienced users would be able to store their inventions in a repository for others to use. Orange could also train a widget recommendation system from the schemas in the repository. The repository could feature tagging, liking, commenting, and everything else that a social platform can provide. Schemas in the repository could be described in text or video.

In summary, the project would develop a new social platform for data mining solutions, possibly relying on an existing solution (e.g. myExperiment) or crafting something new (and simpler?). The repository would feature web access, but schemas from it should also be available in Orange (browsing, uploading and downloading). Seamless integration of the repository and Orange is crucial to the success of this project.

Useful skills: Knowledge of how to develop a social platform (possibly Django). Python programming. Some knowledge of data mining/machine learning would help as well.
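One very simple way such a widget recommendation system could work is to count which widgets appear together in the stored schemas and suggest the most frequent companions of the widgets already on the canvas. The sketch below assumes that a schema can be reduced to a list of widget names; the widget names and function names are illustrative only, and a real system would likely also use the connections between widgets.

```python
from collections import Counter
from itertools import permutations


def train_cooccurrence(schemas):
    """For each widget, count how often every other widget shares a schema with it."""
    counts = {}
    for schema in schemas:
        for a, b in permutations(set(schema), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, current_widgets, top_n=3):
    """Suggest widgets that most often accompany the ones already in use."""
    scores = Counter()
    for widget in current_widgets:
        scores.update(counts.get(widget, {}))
    for widget in current_widgets:
        scores.pop(widget, None)  # never suggest what is already on the canvas
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Illustrative schemas; the widget names are placeholders.
schemas = [
    ["File", "PCA", "Scatterplot"],
    ["File", "k-Means", "Scatterplot"],
    ["File", "Classification Tree", "Test Learners"],
]
counts = train_cooccurrence(schemas)
suggestions = recommend(counts, ["File", "PCA"], top_n=1)
```

With these three schemas, a canvas holding File and PCA gets Scatterplot as the top suggestion, because it co-occurs with both.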

Level from 1 (beginner) to 5 (professional): 4.5

Repository for add-ons

This project is related to the social platform for Orange (see above), and the two can be executed together (merged into one project) or in close collaboration.

Orange supports add-ons, which can add new scripting features and new widgets (GUI). Currently, this feature is highly underused and is used only for a few internally developed add-ons. It would be great to open this up so that contributors around the world could submit their add-ons to a central repository, from which it would then be possible to install add-ons into Orange and use them. This could take the form of a web portal, maybe something along the lines of Trac Hacks. The portal should encourage collaboration, code exchange, help and community. This would also improve collaboration within the global data mining and machine learning community.

It would be good to try to integrate this with existing technologies and portals (Bitbucket, GitHub, Python Package Index).

Useful skills: Python. Web programming experience (suggested technologies are Django and jQuery).

Level from 1 (beginner) to 5 (professional): 3

Widgets in separate processes

Widgets in Orange Canvas currently run in a single process. As they are independent given their inputs, they could frequently work in parallel (in a data-flow manner). The objective of this task would be to modify Orange Canvas so that each widget would run in its own process.

It would also be useful to separate the GUI thread from the main payload computation of widgets. Currently we use just one thread (the GUI thread) for everything, so while a widget is working it has to repeatedly call back into the GUI to keep it responsive. Separating the two would make the code cleaner.
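The separation of GUI and payload described above can be sketched without any Qt code: a worker thread does the computation and reports through a queue, while the GUI side only reacts to messages. This is a plain-Python illustration using the standard threading and queue modules; the function names are hypothetical, and a real PyQt widget would more likely deliver such messages through Qt's own mechanisms, which are not shown here.

```python
import queue
import threading


def heavy_computation(data, messages):
    """Worker: do the payload work and report progress through a queue."""
    total = 0
    for i, value in enumerate(data, start=1):
        total += value * value
        messages.put(("progress", i / len(data)))
    messages.put(("done", total))


def run_widget(data):
    """Stand-in for the GUI side: it only reacts to messages, never computes."""
    messages = queue.Queue()
    worker = threading.Thread(target=heavy_computation, args=(data, messages))
    worker.start()
    progress_seen = []
    while True:
        # A real GUI would poll this from its event loop instead of blocking.
        kind, payload = messages.get()
        if kind == "progress":
            progress_seen.append(payload)  # e.g. update a progress bar
        else:
            worker.join()
            return payload, progress_seen


result, progress = run_widget([1, 2, 3])
```

The computation never calls back into the GUI, which is the cleanup the paragraph above asks for. Running the payload in a separate process instead of a thread would follow the same message-passing shape.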

Useful skills: Python programming with multiple processes and threads. Qt and PyQt experience. Program design.

Level from 1 (beginner) to 5 (professional): 5

Bridge between Orange and R

R contains many great methods/tools which would also be very useful in Orange. To prevent duplication of work, it would be great to be able to use those methods/tools directly in Orange, so that it is not necessary to reimplement them.

The idea is to research possibilities for this and then implement a future-proof bridge between Orange and R.

Useful skills: Python. C/C++. Experience with R. Experience with program-to-program interfaces.

Level from 1 (beginner) to 5 (professional): 4