Changes between Version 51 and Version 52 of GSoC/Ideas


Timestamp: 05/04/11 10:47:12
Author: mitar
  • GSoC/Ideas

    v51 v52  
    77''Ideas are listed in no particular order.'' 
    88 
    9 == Replacing PyQwt with pure PyQt visualizations  == 
     9== Open ideas == 
     10 
     11=== 3D Widgets in Orange  === 
     12 
     13In parallel to OWGraph we should have a module OWGraph3D with similar functions, but for 3D visualization, which should be based on OpenGL. The test case examples for this task would be a 3D scatter plot and 3D net explorer widgets. 
     14 
     15Useful skills: Python, C++, [http://www.riverbankcomputing.co.uk/software/sip/intro sip], [http://doc.qt.nokia.com/latest/qtopengl.html QtOpenGL], [http://pyopengl.sourceforge.net/ PyOpenGL].  
     16 
     17Level from 1 (beginner) to 5 (professional): 4.5 
     18 
     19Possible mentors: Miha, Janez 
     20 
     21=== Support for parallel computation for scripting/backend === 
     22 
     23The project will develop support for (semi)automatic parallelisation/separation into processes, and possibly also distribution of processes over different computers. [wikipedia:Cross-validation_(statistics) Cross-validation] with multiple folds is one simple example of an easily parallelized technique, as each fold can be computed independently and the results then combined into the final result. Parallelization should be seamless from the point of view of the user, i.e. the script writer. 
     24 
     25It would be good to analyze such opportunities for parallelization, find what they have in common, and maybe devise a small helper library (possibly a wrapper for some existing grid computing system) that makes code run in parallel if such an environment is available, and run normally if not. Then, of course, as many of the existing implementations as possible should be moved to this new support for parallelization. 
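
A minimal sketch of what such a helper could look like, using only Python's standard `multiprocessing` module. All names here (`parallel_map`, `fold_indices`, `evaluate_fold`, `cross_validate`) are hypothetical and not part of Orange, and the majority-class "learner" is only a stand-in for a real per-fold computation.

```python
import multiprocessing


def fold_indices(n_items, n_folds):
    """Split range(n_items) into n_folds (train, test) index pairs.

    Assumes n_folds <= n_items, so that no test fold is empty.
    """
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    pairs = []
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        pairs.append((train, test))
    return pairs


def evaluate_fold(task):
    """Hypothetical per-fold job: predict the majority class, return accuracy."""
    labels, train, test = task
    train_labels = [labels[j] for j in train]
    # Ties are broken in favour of the alphabetically first label.
    majority = max(sorted(set(train_labels)), key=train_labels.count)
    return sum(labels[j] == majority for j in test) / len(test)


def parallel_map(func, tasks, processes=None):
    """Apply func to every task, in worker processes when possible.

    processes=1 forces a plain serial run. The exception list below is an
    assumption about how an unusable environment would show up.
    """
    if processes == 1:
        return [func(task) for task in tasks]
    try:
        pool = multiprocessing.Pool(processes)
    except (ImportError, OSError, NotImplementedError):
        # No usable multiprocessing environment: run normally.
        return [func(task) for task in tasks]
    with pool:
        return pool.map(func, tasks)


def cross_validate(labels, n_folds=5, processes=None):
    """Average the per-fold accuracies; folds are independent of each other."""
    tasks = [(labels, train, test)
             for train, test in fold_indices(len(labels), n_folds)]
    scores = parallel_map(evaluate_fold, tasks, processes)
    return sum(scores) / len(scores)
```

The script writer only calls `cross_validate`; whether folds run in parallel is decided inside `parallel_map`. On platforms where worker processes are spawned rather than forked, the calling script would need the usual `if __name__ == "__main__":` guard.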
     26 
     27Useful skills: Python. Grid computing experience. 
     28 
     29Level from 1 (beginner) to 5 (professional): 4 
     30 
     31Possible mentors: Anže 
     32 
     33=== Test scripts, example scripts and documentation === 
     34 
     35Orange comes with substantial documentation for scripting which, in places, could be substantially improved. Also, Orange 2.5 with its new class hierarchy and functions is coming, and both the code snippets and the corresponding documentation will require revision. The project would involve designing new use cases (snippets of code that demonstrate various aspects of Orange), reviewing the present set of snippets, and integrating code snippets into the documentation. 
     36 
     37Snippets in documentation also serve as regression scripts upon which Orange is tested daily. Another purpose of this project is to increase the number and coverage of regression scripts. 
     38 
     39This could also be a good project if you would like to learn more about Orange, data mining and machine learning itself. 
     40 
     41Useful skills: Proficiency in English (probably native speaker). Language/writing skills. Python. 
     42 
     43Level from 1 (beginner) to 5 (professional): 3 
     44 
     45Possible mentors: Blaž 
     46 
     47=== A social platform for Orange === 
     48 
     49Orange's visual programming environment can incorporate any of its over 100 widgets into schemas that do clustering, classification, visualization-based analysis, PCA, and much more. A repository of typical schemas would be most welcome: novice users could choose from the already defined schemas and analyze their own data with them, intermediate users could use the library to improve/augment their own schemas, and experienced users would be able to store their inventions in a repository for others to use. Orange could also train a widget recommendation system from the schemas in the repository. The repository could feature tagging, liking, commenting, and everything else that a social platform can provide. Schemas in the repository could be described in text or video. 
     50 
     51We would also like to provide users a way to upload their datasets and scripts/code snippets, the latter in a way similar to [https://gist.github.com/ Gist], but probably based on Mercurial (or not). 
     52 
     53Users could follow each other and their contributions (similar to [http://www.projectnoah.org/ Project Noah]), use OpenID to log in, have their profiles and so on. 
     54 
     55In summary, the project would develop a new social platform for data mining solutions, possibly relying on an existing solution (e.g. [http://www.myexperiment.org/ myExperiment]) or crafting something new (and simpler?). The repository would feature web access, but schemas from it should also be available in Orange (browsing, uploading and downloading). Seamless integration of the repository and Orange is crucial to the success of this project (the Orange part of this integration will be done by someone from the laboratory; the student will only have to define a simple HTTP-based API/protocol for it and implement it on the server side). 
     56 
     57Useful skills: Knowledge of how to develop a social platform (possibly Django). Python programming. Some knowledge of data mining/machine learning would help as well. 
     58 
     59Level from 1 (beginner) to 5 (professional): 4.5 
     60 
     61Possible mentors: Blaž, Mitar 
     62 
     63=== Repository for add-ons === 
     64 
     65This project is related to the social platform for Orange (see above), and the two can be carried out together (merged into one project) or in close collaboration. 
     66 
     67Orange supports add-ons, which can add new scripting features and new widgets (GUI). Currently, this feature is highly underused and is used only for a few internally developed add-ons. It would be great to open it up so that contributors around the world could submit their add-ons to a central repository, from which add-ons could then be installed into and used in Orange. This could take the form of a web portal, maybe something along the lines of [http://trac-hacks.org/ Trac Hacks]. The portal should encourage collaboration, code exchange, help and community. In this way, collaboration within the global data mining and machine learning community would also improve. 
     68 
     69It would be good to try to integrate this with existing technologies and portals ([https://bitbucket.org/ Bitbucket], [https://github.com/ GitHub], [http://pypi.python.org/ Python Package Index]). 
     70 
     71Useful skills: Python. Web programming experience (suggested technologies are Django and jQuery). 
     72 
     73Level from 1 (beginner) to 5 (professional): 3 
     74 
     75Possible mentors: Matija, Mitar 
     76 
     77=== Widgets in separate processes === 
     78 
     79Widgets in Orange Canvas currently run in a single process. As they are independent given their inputs, they could frequently work in parallel (in a [wikipedia:Dataflow data-flow manner]). The objective of this task would be to modify Orange Canvas so that each widget would run in its own process. 
     80 
     81It would also be useful to separate the GUI thread from the main payload computation of widgets. Currently we use just one thread for everything (the GUI thread), and while a widget is working we have to call back into the GUI repeatedly to keep it responsive. It would be great to have these separated so that the code would be cleaner. 
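
A minimal sketch of the separation described above, written with the standard `threading` and `queue` modules rather than Qt so that it runs on its own. The names `compute` and `run_widget` are hypothetical; in Orange Canvas the consuming side would be the Qt event loop, and Qt's own mechanisms would presumably replace the blocking loop used here.

```python
import queue
import threading


def compute(items, events):
    """Hypothetical widget payload: it never touches the GUI directly.

    Progress and the final result are posted as (kind, value) messages.
    """
    total = 0
    for done, item in enumerate(items, start=1):
        total += item * item
        events.put(("progress", done / len(items)))
    events.put(("result", total))


def run_widget(items):
    """Stand-in for the GUI side: start the worker and consume its messages."""
    events = queue.Queue()
    worker = threading.Thread(target=compute, args=(items, events), daemon=True)
    worker.start()
    progress = []
    while True:
        # A real GUI would poll the queue from a timer instead of blocking here.
        kind, value = events.get()
        if kind == "progress":
            progress.append(value)  # e.g. update a progress bar
        else:
            worker.join()
            return value, progress
```

The payload no longer has to call back into the GUI to keep it responsive; it only reports what happened, and the GUI side decides when to redraw.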
     82 
     83Useful skills: Python programming with multiple processes and threads. Qt and PyQt experience. Program design. 
     84 
     85Level from 1 (beginner) to 5 (professional): 5 
     86 
     87Possible mentors: Marko 
     88 
     89=== Bridge between Orange and R === 
     90 
     91[http://www.r-project.org/ R] contains many great methods/tools which would also be very useful in Orange. To prevent duplication of work, it would be great to be able to use those methods/tools directly in Orange, so that it is not necessary to reimplement them. 
     92 
     93The idea is to research possibilities for this and then implement a future-proof bridge between Orange and R. 
     94 
     95Useful skills: Python. C/C++. Experience with R. Experience with program-to-program interfaces. 
     96 
     97Level from 1 (beginner) to 5 (professional): 4 
     98 
     99Possible mentors: Jure 
     100 
     101=== Time-series analysis === 
     102 
     103Orange currently lacks any [http://en.wikipedia.org/wiki/Time_series time-series] analysis tools. It would be great to develop some basic tools for dealing with them: reading, normalizing, basic pattern search, feature extraction, some (auto-)correlation and similar basic techniques, and so on. Research what other similar applications support and propose which features would be useful to have as a basic set of tools. 
     104 
     105It is important to implement feature extraction from time series in a modular way, so that it can be integrated with the rest of Orange (and with the learning, classification and visualization tools already existing there). 
     106 
     107'''Disclaimer''': We do not have any experts on time-series analysis in the laboratory, so the student will have to be independent and willing to learn about this on their own. The mentor will provide some guidance and help with integration into Orange. 
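
As an illustration of the kind of basic tool meant here, the following is a minimal sketch of sample autocorrelation and of a feature extractor built on it, in plain Python. The function names and the choice of features are assumptions made for the example, not an existing Orange API.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance


def extract_features(series, max_lag=3):
    """Turn one time series into a fixed-length dict of features.

    Requires max_lag < len(series). Each key could become one attribute
    of an ordinary data table, one row per series.
    """
    features = {
        "mean": sum(series) / len(series),
        "min": min(series),
        "max": max(series),
    }
    for lag in range(1, max_lag + 1):
        features["acf_%d" % lag] = autocorrelation(series, lag)
    return features
```

Because every series is reduced to the same set of named values, the output can be fed to existing learners and visualizations without them knowing anything about time series.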
     108 
     109Useful skills: Python. Data analysis experience. Digital signal processing experience could also help. 
     110 
     111Level from 1 (beginner) to 5 (professional): 4 
     112 
     113Possible mentors: Mitar 
     114 
     115=== Widgets for statistics === 
     116 
     117Orange is rather weak in basic statistics, from various statistical tests to linear regression, dimensionality reduction and so forth. It would be great to have some widgets for this. The code for computation of all this is already available in other libraries which we can call from Python, so what we actually need is a good integration within the canvas.  
     118 
     119Level from 1 (beginner) to 5 (professional): 3.5 
     120 
     121Possible mentors: Janez 
     122 
     123== Ideas selected for GSoC 2011 == 
     124 
     125=== Replacing PyQwt with pure PyQt visualizations  === 
    10126 
    11127Many visualizations in Orange widgets currently use PyQwt. It seems a good idea to migrate to a pure Qt implementation, for several reasons: 
     
    23139Possible mentors: Miha, Janez 
    24140 
    25 == 3D Widgets in Orange  == 
    26  
    27 In parallel to OWGraph we should have a module OWGraph3D with similar functions, but for 3D visualization, which should be based on OpenGL. The test case examples for this task would be a 3D scatter plot and 3D net explorer widgets. 
    28  
    29 Useful skills: Python, C++, [http://www.riverbankcomputing.co.uk/software/sip/intro sip], [http://doc.qt.nokia.com/latest/qtopengl.html QtOpenGL], [http://pyopengl.sourceforge.net/ PyOpenGL].  
    30  
    31 Level from 1 (beginner) to 5 (professional): 4.5 
    32  
    33 Possible mentors: Miha, Janez 
    34  
    35 == Support for parallel computation for scripting/backend == 
    36  
    37 The project will develop support for (semi)automatic parallelisation/separation into processes, and possibly also distribution of processes over different computers. [wikipedia:Cross-validation_(statistics) Cross-validation] with multiple folds is one simple example of an easily parallelized technique, as each fold can be computed independently and the results then combined into the final result. Parallelization should be seamless from the point of view of the user, i.e. the script writer. 
    38  
    39 It would be good to analyze such opportunities for parallelization, find what they have in common, and maybe devise a small helper library (possibly a wrapper for some existing grid computing system) that makes code run in parallel if such an environment is available, and run normally if not. Then, of course, as many of the existing implementations as possible should be moved to this new support for parallelization. 
    40  
    41 Useful skills: Python. Grid computing experience. 
    42  
    43 Level from 1 (beginner) to 5 (professional): 4 
    44  
    45 Possible mentors: Anže 
    46  
    47 == Multi-label classification == 
     141=== Multi-label classification === 
    48142 
    49143Orange lacks support for multi-label learning and classification at the data-structure, algorithmic and GUI levels. Update the structures, implement at least a few algorithms, adapt evaluation methods and add GUI support for it. A neat repository of literature on multi-label learning is ML&KD's web page [http://mlkd.csd.auth.gr/multilabel.html Learning from Multi-Label Data]. Also, there are excellent libraries for these methods in Java, like [http://mulan.sourceforge.net/ mulan], and one possibility is its reimplementation, testing, and crafting of nice documentation with examples in Orange. 
     
    55149Possible mentors: Matija 
    56150 
    57 == Matrix factorization techniques for data mining == 
     151=== Matrix factorization techniques for data mining === 
    58152 
    59153Matrix factorization is a fundamental building block of many current data mining approaches. Factorization techniques, like non-negative and probabilistic sparse matrix factorizations, are today widely used in various applications of data mining. The aim of this project is to develop a scripting library for Orange that includes various matrix factorization techniques and, in addition, provides documentation of the code and working examples that demonstrate various types of applications. Selected examples are to be included in the documentation. The entire development is therefore oriented towards the creation of the scripting library; that is, the project would not involve any widget programming. We would, though, like to have the student sketch how the widgets that use this library should look, which methods developed in the scripting library they should access, and which (if any) visualizations would be useful to implement. 
     
    64158 
    65159Possible mentors: Blaž 
    66  
    67 == Test scripts, example scripts and documentation == 
    68  
    69 Orange comes with substantial documentation for scripting which, in places, could be substantially improved. Also, Orange 2.5 with its new class hierarchy and functions is coming, and both the code snippets and the corresponding documentation will require revision. The project would involve designing new use cases (snippets of code that demonstrate various aspects of Orange), reviewing the present set of snippets, and integrating code snippets into the documentation. 
    70  
    71 Snippets in documentation also serve as regression scripts upon which Orange is tested daily. Another purpose of this project is to increase the number and coverage of regression scripts. 
    72  
    73 This could also be a good project if you would like to learn more about Orange, data mining and machine learning itself. 
    74  
    75 Useful skills: Proficiency in English (probably native speaker). Language/writing skills. Python. 
    76  
    77 Level from 1 (beginner) to 5 (professional): 3 
    78  
    79 Possible mentors: Blaž 
    80  
    81 == A social platform for Orange == 
    82  
    83 Orange's visual programming environment can incorporate any of its over 100 widgets into schemas that do clustering, classification, visualization-based analysis, PCA, and much more. A repository of typical schemas would be most welcome: novice users could choose from the already defined schemas and analyze their own data with them, intermediate users could use the library to improve/augment their own schemas, and experienced users would be able to store their inventions in a repository for others to use. Orange could also train a widget recommendation system from the schemas in the repository. The repository could feature tagging, liking, commenting, and everything else that a social platform can provide. Schemas in the repository could be described in text or video. 
    84  
    85 We would also like to provide users a way to upload their datasets and scripts/code snippets, the latter in a way similar to [https://gist.github.com/ Gist], but probably based on Mercurial (or not). 
    86  
    87 Users could follow each other and their contributions (similar to [http://www.projectnoah.org/ Project Noah]), use OpenID to log in, have their profiles and so on. 
    88  
    89 In summary, the project would develop a new social platform for data mining solutions, possibly relying on an existing solution (e.g. [http://www.myexperiment.org/ myExperiment]) or crafting something new (and simpler?). The repository would feature web access, but schemas from it should also be available in Orange (browsing, uploading and downloading). Seamless integration of the repository and Orange is crucial to the success of this project (the Orange part of this integration will be done by someone from the laboratory; the student will only have to define a simple HTTP-based API/protocol for it and implement it on the server side). 
    90  
    91 Useful skills: Knowledge of how to develop a social platform (possibly Django). Python programming. Some knowledge of data mining/machine learning would help as well. 
    92  
    93 Level from 1 (beginner) to 5 (professional): 4.5 
    94  
    95 Possible mentors: Blaž, Mitar 
    96  
    97 == Repository for add-ons == 
    98  
    99 This project is related to the social platform for Orange (see above), and the two can be carried out together (merged into one project) or in close collaboration. 
    100  
    101 Orange supports add-ons, which can add new scripting features and new widgets (GUI). Currently, this feature is highly underused and is used only for a few internally developed add-ons. It would be great to open it up so that contributors around the world could submit their add-ons to a central repository, from which add-ons could then be installed into and used in Orange. This could take the form of a web portal, maybe something along the lines of [http://trac-hacks.org/ Trac Hacks]. The portal should encourage collaboration, code exchange, help and community. In this way, collaboration within the global data mining and machine learning community would also improve. 
    102  
    103 It would be good to try to integrate this with existing technologies and portals ([https://bitbucket.org/ Bitbucket], [https://github.com/ GitHub], [http://pypi.python.org/ Python Package Index]). 
    104  
    105 Useful skills: Python. Web programming experience (suggested technologies are Django and jQuery). 
    106  
    107 Level from 1 (beginner) to 5 (professional): 3 
    108  
    109 Possible mentors: Matija, Mitar 
    110  
    111 == Widgets in separate processes == 
    112  
    113 Widgets in Orange Canvas currently run in a single process. As they are independent given their inputs, they could frequently work in parallel (in a [wikipedia:Dataflow data-flow manner]). The objective of this task would be to modify Orange Canvas so that each widget would run in its own process. 
    114  
    115 It would also be useful to separate the GUI thread from the main payload computation of widgets. Currently we use just one thread for everything (the GUI thread), and while a widget is working we have to call back into the GUI repeatedly to keep it responsive. It would be great to have these separated so that the code would be cleaner. 
    116  
    117 Useful skills: Python programming with multiple processes and threads. Qt and PyQt experience. Program design. 
    118  
    119 Level from 1 (beginner) to 5 (professional): 5 
    120  
    121 Possible mentors: Marko 
    122  
    123 == Bridge between Orange and R == 
    124  
    125 [http://www.r-project.org/ R] contains many great methods/tools which would also be very useful in Orange. To prevent duplication of work, it would be great to be able to use those methods/tools directly in Orange, so that it is not necessary to reimplement them. 
    126  
    127 The idea is to research possibilities for this and then implement a future-proof bridge between Orange and R. 
    128  
    129 Useful skills: Python. C/C++. Experience with R. Experience with program-to-program interfaces. 
    130  
    131 Level from 1 (beginner) to 5 (professional): 4 
    132  
    133 Possible mentors: Jure 
    134  
    135 == Time-series analysis == 
    136  
    137 Orange currently lacks any [http://en.wikipedia.org/wiki/Time_series time-series] analysis tools. It would be great to develop some basic tools for dealing with them: reading, normalizing, basic pattern search, feature extraction, some (auto-)correlation and similar basic techniques, and so on. Research what other similar applications support and propose which features would be useful to have as a basic set of tools. 
    138  
    139 It is important to implement feature extraction from time series in a modular way, so that it can be integrated with the rest of Orange (and with the learning, classification and visualization tools already existing there). 
    140  
    141 '''Disclaimer''': We do not have any experts on time-series analysis in the laboratory, so the student will have to be independent and willing to learn about this on their own. The mentor will provide some guidance and help with integration into Orange. 
    142  
    143 Useful skills: Python. Data analysis experience. Digital signal processing experience could also help. 
    144  
    145 Level from 1 (beginner) to 5 (professional): 4 
    146  
    147 Possible mentors: Mitar 
    148  
    149 == Widgets for statistics == 
    150  
    151 Orange is rather weak in basic statistics, from various statistical tests to linear regression, dimensionality reduction and so forth. It would be great to have some widgets for this. The code for computation of all this is already available in other libraries which we can call from Python, so what we actually need is a good integration within the canvas.  
    152  
    153 Level from 1 (beginner) to 5 (professional): 3.5 
    154  
    155 Possible mentors: Janez