Changes between Version 57 and Version 58 of GSoC/Ideas


Timestamp: 03/08/12 10:46:45
Author: gregorr

  • GSoC/Ideas

    v57  v58
    122  122  Possible mentors: Janez
    123  123
         124  === biox library (NGS, next-generation sequencing) ===
         125
         126  Orange already offers the Bioinformatics add-on but currently lacks tools for NGS (next-generation sequencing) data management and analysis. We suggest developing a Python library, biox, for use in Orange (partly by integrating existing state-of-the-art software).
         127
         128  Short description of project tasks:
         129  * develop support for reading, writing and searching the most widely used bioinformatics file formats: fasta, fastq, bed, wig, bigWig, gtf, gff3, bedGraph. Carefully craft memory-efficient representations of the various features (if needed, represent them in C and connect them to Python),
         130  * develop simple (programmatically easy-to-use) wrappers for existing open-source NGS software, such as read quality analysis (e.g. FASTQC), mapping of reads to reference genomes (e.g. bowtie, bowtie2, tophat) and differential expression analysis (e.g. DESeq, baySeq),
         131  * where needed, the tools should be able to produce statistical reports both as text and in graphical form (matplotlib).
         132
         133  Level from 1 (beginner) to 5 (professional): 5
         134
         135  Possible mentors: Gregor, Tomaz, Crt
         136
    124  137  == Ideas selected for GSoC 2011 ==
    125  138
     
    160  173
    161  174  Possible mentors: ?
    162
    163       === biox library (NGS, next-generation sequencing) ===
    164
    165       Orange already offers the Bioinformatics add-on but currently lacks tools for NGS (next-generation sequencing) data management and analysis. We suggest developing a Python library, biox, for use in Orange (partly by integrating existing state-of-the-art software).
    166
    167       Short description of project tasks:
    168       * develop support for reading, writing and searching the most widely used bioinformatics file formats: fasta, fastq, bed, wig, bigWig, gtf, gff3, bedGraph. Carefully craft memory-efficient representations of the various features (if needed, represent them in C and connect them to Python),
    169       * develop simple (programmatically easy-to-use) wrappers for existing open-source NGS software, such as read quality analysis (e.g. FASTQC), mapping of reads to reference genomes (e.g. bowtie, bowtie2, tophat) and differential expression analysis (e.g. DESeq, baySeq),
    170       * where needed, the tools should be able to produce statistical reports both as text and in graphical form (matplotlib).
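The first biox task (reading and writing formats such as fastq with memory-efficient representations) can be illustrated with a minimal Python sketch. It is not part of biox or Orange. The names `FastqRecord`, `read_fastq`, `write_fastq` and `mean_phred` are hypothetical. The sketch assumes the common four-line FASTQ layout and Phred+33 quality encoding. Records are streamed through a generator, so memory use does not grow with file size.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ entry: read name, base sequence and per-base quality string."""
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records one at a time so the whole file is never held in memory.

    Assumes the common four-line layout (no wrapped sequence lines).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not plus.startswith("+"):
            raise ValueError("missing '+' separator line")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def write_fastq(handle: TextIO, records: Iterable[FastqRecord]) -> None:
    """Write records back out in the same four-line layout."""
    for rec in records:
        handle.write(f"@{rec.name}\n{rec.seq}\n+\n{rec.qual}\n")


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred quality, assuming ASCII offset 33 (Sanger / Illumina 1.8+)."""
    if not qual:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in qual) / len(qual)


if __name__ == "__main__":
    sample = "@read1\nACGT\n+\nIIII\n@read2\nGGC\n+\n!!5\n"
    for rec in read_fastq(io.StringIO(sample)):
        print(rec.name, rec.seq, round(mean_phred(rec.qual), 2))
```

A streaming reader like this keeps memory flat but only supports sequential scans. The "searching" part of the task (for example, region queries on bed or bigWig data) would need an index on top of it.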