Changes between Version 71 and Version 72 of GSoC/Ideas


Timestamp: 03/18/13 09:12:59 (18 months ago)
Author: anze

  • GSoC/Ideas

    v71 v72  
    99== Open ideas == 
    1010 
    11 === Text mining add-on for Orange === 
    12 The current [https://bitbucket.org/biolab/orange-addon-text Orange add-on for text mining] is outdated and incomplete. The source code needs refactoring to comply with the [http://orange.biolab.si/trac/wiki/Orange25 Orange 2.5 development guidelines]. Additionally, the current text mining add-on lacks documentation in reST format (including a tutorial for beginners), unit tests, and installation through [http://pypi.python.org/pypi PyPI]. 
     11=== Porting Python code to Orange 3.0 === 
    1312 
    14 The project should also include a comparison between the basic text (pre)processing techniques already implemented in the current version of the add-on (lemmatization, stemming, document distance, feature subset selection, phrase detection) and the latest state-of-the-art techniques. If necessary, additional algorithms (for example, multinomial Naive Bayes) should be (re)implemented. It would be very nice if the text mining add-on's functionality were also available from widgets in OrangeCanvas. 
     13While migrating to Python 3.0, which broke compatibility with older versions of Python, we decided to seize the opportunity to clean up our own house, too. A majority of Orange's C++ code has been rewritten, and while most functionality is still there, classes have been renamed, some have been eliminated, function arguments have been cleaned up and so forth. Now we need to change the Python parts correspondingly. This will require some routine refactoring, as well as reimplementing some functionality that used to be in C++ but should be moved to Python. At the same time, we need tools to make this process as automated as possible (an Orange equivalent of the 2to3 script for Python). 
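As a rough illustration of what an automated renaming step could look like, here is a minimal sketch built on Python's standard `tokenize` module. It is not the project's tool: the function name `rename_identifiers` and the old/new class names in the mapping are hypothetical, and the real list of renames would have to come from the core group. The sketch rewrites only identifier tokens, so the same text inside strings and comments is left alone.

```python
import io
import tokenize

# Hypothetical mapping of old names to new names (illustration only).
RENAMES = {"OldClassName": "NewClassName"}


def rename_identifiers(source, renames=RENAMES):
    """Return `source` with identifier tokens renamed according to `renames`.

    Only NAME tokens are touched, so strings and comments stay unchanged.
    """
    lines = source.split("\n")
    hits = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string in renames
    ]
    # Apply replacements from the end of the file backwards so that the
    # column offsets of earlier tokens on the same line remain valid.
    for tok in reversed(hits):
        row, col = tok.start
        end_col = tok.end[1]
        line = lines[row - 1]
        lines[row - 1] = line[:col] + renames[tok.string] + line[end_col:]
    return "\n".join(lines)
```

A plain rename like this covers only the routine part of the work; changed function arguments or functionality moved from C++ to Python would still need manual attention.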
    1514 
    16 Useful skills: Python. Data mining. 
     15Note that this is not about porting Orange from Python 2.X to Py3K: this is trivial and can be done in one evening (we tried it). The work ranges from running 2to3 to redesigning some architectural parts, so the student will have to be in constant contact with the core group. 
     16 
     17Required skills: Good knowledge of Orange and Python. 
    1718 
    1819Level from 1 (beginner) to 5 (professional): 4 
    1920 
    20 Possible mentors: Črt 
     21Possible mentors: Janez 
    2122 
    2223=== Data input from mldata.org === 
     
    2930 
    3031Possible mentors: Blaž 
    31  
    32 === Widgets for statistics === 
    33  
    34 Orange is rather weak in basic statistics, from various statistical tests to linear regression, dimensionality reduction and so forth. It would be great to have some widgets for this. The code for computation of all this is already available in other libraries which we can call from Python, so what we actually need is a good integration within the canvas.  
    35  
    36 Level from 1 (beginner) to 5 (professional): 3.5 
    37  
    38 Possible mentors: Janez 
    3932 
    4033=== Support for parallel computation for scripting/backend === 
     
    8881Possible mentors: Marko 
    8982 
    90 === Neural Networks === 
    91  
    92 Orange implements many algorithms for classification, but currently  
    93 lacks support for neural network learning. The task consists of surveying 
    94 available open-source implementations of neural networks. If no suitable 
    95 library is found, you will be asked to implement neural networks in C++ 
    96 from scratch. 
    97  
    98 Level from 1 (beginner) to 5 (professional): 3 
    99  
    100 Possible mentors: Jure 
    101  
    102 === Time-series analysis === 
    103  
    104 Orange currently lacks any [http://en.wikipedia.org/wiki/Time_series time-series] analysis tools. It would be great to develop some basic tools for dealing with them: reading, normalizing, basic pattern search, feature extraction, some (auto-)correlation and similar basic techniques, and so on. Research what other similar applications support and propose which features would be useful to have as a basic set of tools. 
    105  
    106 It is important to implement feature extraction from time series in a modular way, so that it can be integrated with the rest of Orange (and with the learning, classification and visualization tools that already exist there). 
    107  
    108 '''Disclaimer''': We do not have any experts on time-series analysis in the laboratory, so the student will have to be independent and willing to learn the topic on their own. The mentor will provide some guidance and help with integration into Orange. 
    109  
    110 Useful skills: Python. Data analysis experience. Digital signal processing experience could also help. 
    111  
    112 Level from 1 (beginner) to 5 (professional): 4 
    113  
    114 Possible mentors: Mitar 
    115  
    116 === Porting Python code to Orange 3.0 === 
    117  
    118 While migrating to Python 3.0, which broke compatibility with older versions of Python, we decided to seize the opportunity to clean up our own house, too. A majority of Orange's C++ code has been rewritten, and while most functionality is still there, classes have been renamed, some have been eliminated, function arguments have been cleaned up and so forth. Now we need to change the Python parts correspondingly. This will require some routine refactoring, as well as reimplementing some functionality that used to be in C++ but should be moved to Python. At the same time, we need tools to make this process as automated as possible (an Orange equivalent of the 2to3 script for Python). 
    119  
    120 Note that this is not about porting Orange from Python 2.X to Py3K: this is trivial and can be done in one evening (we tried it). The work ranges from running 2to3 to redesigning some architectural parts, so the student will have to be in constant contact with the core group. 
    121  
    122 Required skills: Good knowledge of Orange and Python. 
    123  
    124 Level from 1 (beginner) to 5 (professional): 4 
    125  
    126 Possible mentors: Janez 
    127  
    128 === biox library (NGS, next-generation sequencing) === 
    129  
    130 Orange already offers the Bioinformatics add-on but currently lacks tools for NGS (next-generation sequencing) data management and analysis. We suggest developing a Python library, biox, to be used in Orange; where possible it should integrate existing state-of-the-art software. 
    131  
    132 Short description of project tasks: 
    133 * develop support for reading/writing/searching the most commonly used bioinformatics file formats: fasta, fastq, bed, wig, bigWig, gtf, gff3, bedGraph. Carefully craft memory-efficient representations of various features (if needed, represent features in C and connect them with Python), 
    134 * develop simple (programmatically easy to use) wrappers for existing open-source NGS software, such as read quality analysis (e.g. FASTQC), mapping of reads to reference genomes (e.g. bowtie, bowtie2, tophat), and differential expression analysis (e.g. DESeq, baySeq), 
    135 * where needed, the tools should be able to produce statistical reports in both text and graphical form (matplotlib). 
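To give a feel for the first task, here is a minimal sketch of a streaming reader for the simplest of the listed formats, fasta. The biox API is not defined anywhere in this proposal, so the function name `read_fasta` and its signature are assumptions made for illustration only. The reader is a generator, so it holds one record in memory at a time.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of fasta lines.

    `lines` can be an open text file or any other iterable of strings.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record in the input.
    if header is not None:
        yield header, "".join(chunks)
```

Because any iterable of strings is accepted, the same function can be used as `read_fasta(open(path))` or on lines read from a decompressed stream.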
    136  
    137 Level from 1 (beginner) to 5 (professional): 5 
    138  
    139 Possible mentors: Gregor, Tomaz, Crt 
    140  
    14183=== Animations in Orange === 
    14284