Changes between Version 82 and Version 83 of GSoC/Ideas


Timestamp:
03/28/13 14:07:42 (16 months ago)
Author:
thocevar
  • GSoC/Ideas

    v82 v83  
    60 60 
    61 61 Possible mentors: Marko 
    62  
    63 == Previous ideas == 
    64  
    65 === Data input from mldata.org === 
    66  
    67 [http://mldata.org/ mldata.org] is an excellent machine learning data repository. It would be great if Orange had script-based access to the repository that also supported querying (e.g., show me a list of all regression data sets with more than 100 features and 1000 data instances). Automatic querying and data download would also provide a basis for a widget for browsing, searching and filtering mldata.org data sets. At present, [http://mldata.org/ mldata.org] does not offer programmatic access or querying, so this task may first require changes to [https://github.com/open-machine-learning/mldata their code]. 
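
The kind of query mentioned above could look roughly like the sketch below. Since mldata.org has no programmatic access yet, everything here is an assumption: the `DataSetInfo` record, the `query` function, its parameter names and the toy catalog are hypothetical and only illustrate filtering data set metadata once it has been obtained somehow.

```python
from collections import namedtuple

# Hypothetical metadata record; the real repository may describe data sets differently.
DataSetInfo = namedtuple("DataSetInfo", "name task n_features n_instances")

# Made-up placeholder entries, NOT real mldata.org data sets.
CATALOG = [
    DataSetInfo("toy-regression-wide", "regression", 250, 5000),
    DataSetInfo("toy-regression-small", "regression", 12, 400),
    DataSetInfo("toy-classification", "classification", 300, 20000),
    DataSetInfo("toy-regression-edge", "regression", 100, 1000),
]


def query(catalog, task=None, more_features_than=0, more_instances_than=0):
    """Return entries matching the task with strictly more features/instances."""
    return [
        d for d in catalog
        if (task is None or d.task == task)
        and d.n_features > more_features_than
        and d.n_instances > more_instances_than
    ]


# "All regression data sets with more than 100 features and 1000 data instances."
matches = query(CATALOG, task="regression",
                more_features_than=100, more_instances_than=1000)
```

A widget for browsing and filtering could call the same function with values taken from its controls.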
    68  
    69 Useful skills: Python.  
    70  
    71 Level from 1 (beginner) to 5 (professional): 4 
    72  
    73 Possible mentors: Blaž 
    74  
    75  
    76 === Support for parallel computation for scripting/backend === 
    77  
    78 The project will develop support for (semi)automatic parallelization: separating work into processes and possibly also distributing those processes over different computers. [wikipedia:Cross-validation_(statistics) Cross-validation] with multiple folds is a simple example of an easily parallelized technique, as each fold can be computed independently and the results then combined into the final result. Parallelization should be seamless from the point of view of the user, the script writer. 
    79  
    80 It would be good to analyze such opportunities for parallelization, find what they have in common, and perhaps devise a small helper library (possibly a wrapper for an existing grid computing system) that makes code run in parallel when such an environment is available and run normally when it is not. Then, of course, as many of the existing implementations as possible should be moved to this new parallelization support. 
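
A minimal sketch of such a helper, using cross-validation as the example, might look as follows. It is not Orange code: `map_folds`, `evaluate_fold`, `make_folds`, the toy `LABELS` and the majority-class "learner" are all made up for illustration, and Python's standard `concurrent.futures` stands in for whatever grid system the project would actually wrap.

```python
from concurrent.futures import ProcessPoolExecutor

# Toy class labels; a real script would use an Orange data table instead.
LABELS = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]


def make_folds(n_instances, n_folds):
    """Return (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_instances))
    folds = []
    for k in range(n_folds):
        test = indices[k::n_folds]
        train = [i for i in indices if i % n_folds != k]
        folds.append((train, test))
    return folds


def evaluate_fold(fold):
    """Score a majority-class predictor on one fold, independently of the others."""
    train, test = fold
    train_labels = [LABELS[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    correct = sum(1 for i in test if LABELS[i] == majority)
    return correct, len(test)


def map_folds(func, folds, workers=1):
    """Run func on every fold: in worker processes if workers > 1, else serially."""
    if workers > 1:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(func, folds))
    return [func(fold) for fold in folds]


def cross_validate(n_folds=5, workers=1):
    """Combine the per-fold counts into a single accuracy."""
    folds = make_folds(len(LABELS), n_folds)
    results = map_folds(evaluate_fold, folds, workers)
    correct = sum(c for c, _ in results)
    total = sum(n for _, n in results)
    return correct / total


accuracy = cross_validate(n_folds=5, workers=1)
```

The script writer only chooses `workers`; the rest of the script stays the same in both modes. When `workers > 1`, the call should be made from a script guarded by `if __name__ == "__main__":`, since some platforms start worker processes by re-importing the main module.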
    81  
    82 Useful skills: Python. Grid computing experience. 
    83  
    84 Level from 1 (beginner) to 5 (professional): 4 
    85  
    86 Possible mentors: Anže 
    87  
    88 === Test scripts, example scripts and documentation === 
    89  
    90 Orange comes with substantial documentation for scripting which, in places, could be considerably improved. Also, Orange 2.5, with its new class hierarchy and functions, is about to be released, and some code snippets and the corresponding documentation require revision (note that the [http://orange.biolab.si/doc/reference/ Reference Guide] has already been rewritten). The project would design new use cases (snippets of code that demonstrate various aspects of Orange), review the present set of snippets, and integrate code snippets into the documentation. Writing an Orange Cookbook or an Orange User's Guide would be most welcome. 
    91  
    92 Snippets in the documentation also serve as regression scripts on which Orange is tested daily. Another purpose of this project might be to increase the number and coverage of unit tests. 
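
One way a documentation snippet can double as a regression test is sketched below with Python's standard `doctest` module. This is only an illustration of the idea, not a description of how Orange's daily tests are actually run; `run_snippet` and the snippet text are hypothetical, and the snippet uses plain Python rather than Orange calls.

```python
import doctest

# A documentation snippet written as an interactive session with expected output.
SNIPPET = """
>>> values = [3, 1, 2]
>>> sorted(values)
[1, 2, 3]
>>> sum(values) / len(values)
2.0
"""


def run_snippet(text, name="snippet"):
    """Execute the examples in text and return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


failed, attempted = run_snippet(SNIPPET)
```

A snippet whose output changes after a code change shows up as a nonzero `failed` count.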
    93  
    94 This could also be a good project if you would like to learn more about Orange, data mining and machine learning. 
    95  
    96 Useful skills: Proficiency in English (probably native speaker) if the target is documentation writing. Language/writing skills. Good knowledge of Python if the target is writing of unit tests. 
    97  
    98 Level from 1 (beginner) to 5 (professional): 3 
    99  
    100 Possible mentors: Blaž 
    101  
    102 === Repository for add-ons === 
    103  
    104 Orange supports add-ons, which can add new scripting features and new widgets (GUI). Currently, this feature is underused: it serves only a few internally developed add-ons. It would be great to open it up so that contributors around the world could submit their add-ons to a central repository, from which add-ons could then be installed into Orange.  
    105  
    106 It would be good to try to integrate this with existing technologies and portals ([https://bitbucket.org/ Bitbucket], [https://github.com/ GitHub], [http://pypi.python.org/ Python Package Index]). 
    107  
    108 Useful skills: Python (along with distutils, python package directory structure rules and packaging skills). 
    109  
    110 Level from 1 (beginner) to 5 (professional): 3 
    111  
    112 Possible mentors: Matija, Mitar 
    113  
    114  
    115  
    116 === Animations in Orange === 
    117  
    118 Data visualization plays a very important role in understanding relationships in the data. Unfortunately, it is usually limited to two dimensions (e.g. a scatter plot); additional information about the data can be presented through different colors, sizes and shapes of the points. There can, however, be additional variables in the data (e.g. time) that strongly influence the scatter plot. As in [http://www.gapminder.org/world/#$majorMode=chart$is;shi=t;ly=2003;lb=f;il=t;fs=11;al=30;stl=t;st=t;nsl=t;se=t$wst;tts=C$ts;sp=5.59290322580644;ti=2010$zpv;v=0$inc_x;mmid=XCOORDS;iid=phAwcNAVuyj1jiMAkmq1iMg;by=ind$inc_y;mmid=YCOORDS;iid=phAwcNAVuyj2tPLxKvvnNPA;by=ind$inc_s;uniValue=8.21; Gapminder], time can be used as an "animation" variable: one could play an animation and see how the scatter plot changes over time (or over any other continuous variable from the data). 
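
The data side of such an animation could be prepared roughly as in the sketch below, which splits instances into consecutive frames by a continuous variable. It assumes rows are plain dictionaries and equal-width binning is acceptable; `animation_frames`, its parameters and the sample rows are made up for illustration, and drawing the frames in a widget is left out.

```python
def animation_frames(rows, x, y, by, n_frames):
    """Split rows into n_frames lists of (x, y) points, binned by the 'by' variable."""
    values = [row[by] for row in rows]
    lo, hi = min(values), max(values)
    # Equal-width bins; fall back to width 1.0 when all values are identical.
    width = (hi - lo) / n_frames or 1.0
    frames = [[] for _ in range(n_frames)]
    for row in rows:
        # The maximum value would land one past the last bin, so clamp it.
        k = min(int((row[by] - lo) / width), n_frames - 1)
        frames[k].append((row[x], row[y]))
    return frames


# Made-up sample data: two numeric variables plus a "year" animation variable.
ROWS = [
    {"income": 1.0, "life": 50.0, "year": 2000},
    {"income": 1.5, "life": 52.0, "year": 2001},
    {"income": 2.0, "life": 55.0, "year": 2002},
    {"income": 2.5, "life": 57.0, "year": 2003},
]

frames = animation_frames(ROWS, "income", "life", "year", n_frames=2)
```

Playing the animation then amounts to redrawing the scatter plot with each frame's points in turn.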
    119  
    120 Useful skills: Python. Widgets programming. 
    121  
    122 Level from 1 (beginner) to 5 (professional): 4 
    123  
    124 Possible mentors: Blaž