Orange Forum • View topic - GSOC 2013 Application : Porting Orange code to Python 3.0

General discussions about Orange and Orange-related topics (data mining, machine learning, bioinformatics...).

GSOC 2013 Application : Porting Orange code to Python 3.0

Post by rahulkatare » Mon Apr 01, 2013 21:51

Hi,

I am Rahul Katare, a fourth-year computer science student at IIT Kharagpur. I am interested in contributing to open source development this summer, and I decided that applying for GSoC would be a good start. I went through the list on Orange's GSoC ideas page and found the projects quite interesting and well suited to my skills. I used the NumPy and SciPy libraries in a project for a Complex Networks course, where I needed statistical functions and also used matplotlib. I also have experience with a variety of other Python projects: I built a complete geographical information system over our campus map using the Tkinter library, a multi-threaded web crawler and a demo search engine using the Beautiful Soup and urllib2 libraries, and a toolkit for deductive fault simulation using the NetworkX library. I can send screenshots and code from these projects to the mailing list for code review if desired.

I was looking at the project idea "Porting Orange code to Python 3.0" and found it really interesting to work on. I cloned the Orange repository from Bitbucket and looked at the source code. There is a lot we could do while re-implementing Orange from scratch. First, I think it is best to re-implement the C++ source code module by module. To re-implement a piece of functionality, say bayes.cpp, we can pick up the modules in that file, understand what they do by reading the code, and then map the same functionality to Python 3. As suggested, we can use Cython for modules that perform heavy computation, since Cython can give a substantial speed-up. We can also look for optimizations in each module wherever possible.
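
To make the "read the C++, then map it to Python 3" step concrete, here is a minimal sketch of a categorical naive Bayes classifier with Laplace smoothing in plain Python 3. It is not a translation of Orange's bayes.cpp and does not follow Orange's API; the class name `CategoricalNaiveBayes`, its methods and the `alpha` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


class CategoricalNaiveBayes:
    """Hypothetical naive Bayes learner for discrete attributes (illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[(attribute index, class)][value] -> frequency
        self.counts = defaultdict(Counter)
        # distinct values seen per attribute, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(i, c)][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, row, c):
        # Unnormalised log P(c) + sum_i log P(x_i | c)
        lp = math.log(self.class_counts[c] / self.n)
        for i, v in enumerate(row):
            num = self.counts[(i, c)][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[i])
            lp += math.log(num / den)
        return lp

    def predict(self, row):
        # Pick the class with the highest (unnormalised) posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("rainy", "mild")))  # yes
```

The counting loop and the per-attribute log-probability sum are the kind of inner loops that would be candidates for Cython or NumPy vectorisation in a real port.
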
I am very interested in being part of this project and contributing to it on a long-term basis, and I will stay in contact with the Orange developers daily. I also have the skills this project requires, i.e. familiarity with Python libraries such as NumPy and SciPy, and with Cython, which I am currently using in my project on predicting the click-through rate of ads, where I have a huge dataset of 150 million instances on which to train different machine learning algorithms. I have also coded basic versions of the Naive Bayes and Linear Regression algorithms in Python. I can use my knowledge of algorithms to perform optimizations wherever required. For example, data mining algorithms involve a lot of matrix operations, and I am quite familiar with different matrix algorithms such as Strassen's matrix multiplication, which reduces the number of scalar multiplications needed to multiply two n×n matrices from O(n^3) to about O(n^2.81). We could write a Cython module for Strassen multiplication instead of using NumPy's built-in matrix multiplication. This is one example of an optimization we could perform, and there are many others.
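
A pure Python/NumPy sketch of Strassen's recursion, assuming square matrices whose size stays even at every level above the cut-off (a power of two always works). The function name `strassen` and the `leaf_size` cut-off are hypothetical. It only shows the seven-product recursion; whether a Cython version would actually beat NumPy's BLAS-backed `@` would need benchmarking.

```python
import numpy as np


def strassen(a, b, leaf_size=64):
    """Multiply square matrices a and b using Strassen's recursion.

    Assumes both are n x n and n stays even at every level above
    leaf_size; at or below leaf_size it falls back to NumPy's product.
    """
    n = a.shape[0]
    if n <= leaf_size:
        return a @ b
    h = n // 2
    a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
    b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
    # Seven recursive products instead of the eight of the naive block method
    m1 = strassen(a11 + a22, b11 + b22, leaf_size)
    m2 = strassen(a21 + a22, b11, leaf_size)
    m3 = strassen(a11, b12 - b22, leaf_size)
    m4 = strassen(a22, b21 - b11, leaf_size)
    m5 = strassen(a11 + a12, b22, leaf_size)
    m6 = strassen(a21 - a11, b11 + b12, leaf_size)
    m7 = strassen(a12 - a22, b21 + b22, leaf_size)
    # Recombine the products into the four quadrants of the result
    c = np.empty((n, n), dtype=np.result_type(a, b))
    c[:h, :h] = m1 + m4 - m5 + m7
    c[:h, h:] = m3 + m5
    c[h:, :h] = m2 + m4
    c[h:, h:] = m1 - m2 + m3 + m6
    return c


rng = np.random.default_rng(0)
a = rng.integers(-5, 5, size=(128, 128))
b = rng.integers(-5, 5, size=(128, 128))
print(np.array_equal(strassen(a, b, leaf_size=16), a @ b))  # True
```

Comparing against `a @ b` on small integer matrices, where the results must match exactly, is a simple correctness check before any performance work.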

To get a realistic sense of the time required to code the different files, and to make a work plan for my proposal, I would like to discuss this further with the mentors, Janez and Marko. Mainly, I would like to discuss the current code, which parts of the code base are their top priority to port, and any specific requirements.
Please let me know your views.

Warm Regards,
Rahul Katare

Re: GSOC 2013 Application : Porting Orange code to Python 3.

Post by Ales » Tue Apr 02, 2013 10:45

Note that the current work on porting Orange to Python 3 is in https://bitbucket.org/biolab/orange3, so you should also look over the code there (it is very much a work in progress, though).

Also, we would like to reuse scikit-learn whenever possible; for instance, the current codebase uses scikit-learn's naive Bayes.

Re: GSOC 2013 Application : Porting Orange code to Python 3.

Post by rahulkatare » Wed Apr 03, 2013 14:52

Thanks, Ales, for the link. I have cloned the repository and am going through the current source code. I also looked at the modules in scikit-learn; it seems many of the machine learning algorithms are already implemented there :). I was also looking through the other ideas on the GSoC page and found the "Widgets in separate process" and "Neural Network" ideas pretty interesting too.
I have taken a course on operating systems and have good knowledge of processes and threads. I also built a multithreaded web crawler in Python. I have used the C++ version of Qt quite recently, and I am going through the PyQt documentation to gain more experience.

Regarding the widget idea, the procedure is laid out quite clearly in the idea description, and it would be good to start with a single widget now and then integrate the other widgets. Could you confirm that for this project I should be looking at the repository https://bitbucket.org/biolab/orange?

As for the neural network project, we could use the Python NetworkX library to create the node graph. Back-propagation is easy to implement once we have the network graph. I implemented a similar back-propagation process when I built a deductive fault simulator, and using NetworkX to create the node graph made that much easier. This is my current view.
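
A minimal sketch of back-propagation over a network stored as a graph. To keep it self-contained it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) instead of NetworkX, whose `topological_sort` would play the same role. The edge-weight dictionary, sigmoid units without bias terms and the squared-error loss are assumptions made for illustration; `forward` and `backprop` are hypothetical names, not part of Orange or NetworkX.

```python
import math
from graphlib import TopologicalSorter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, inputs):
    """Evaluate a feed-forward network stored as a graph.

    weights maps (src, dst) edges to floats; every non-input node applies
    a sigmoid to the weighted sum of its predecessors (no bias terms).
    """
    preds = {}
    for src, dst in weights:
        preds.setdefault(dst, []).append(src)
    # static_order() yields every node after all of its predecessors
    order = list(TopologicalSorter(preds).static_order())
    out = dict(inputs)
    for node in order:
        if node not in out:
            z = sum(weights[(p, node)] * out[p] for p in preds[node])
            out[node] = sigmoid(z)
    return order, preds, out


def backprop(weights, inputs, target, output_node):
    """Return dL/dw for every edge, with L = 0.5 * (out[output_node] - target) ** 2."""
    order, preds, out = forward(weights, inputs)
    delta = {}  # delta[n] = dL/dz_n, where z_n is the pre-activation of node n
    grads = {}
    # Reverse topological order visits every node after all of its successors
    for node in reversed(order):
        if node in inputs:
            continue
        if node == output_node:
            dl_dout = out[node] - target
        else:
            dl_dout = sum(delta[dst] * w
                          for (src, dst), w in weights.items() if src == node)
        delta[node] = dl_dout * out[node] * (1.0 - out[node])  # sigmoid derivative
        for p in preds[node]:
            grads[(p, node)] = delta[node] * out[p]
    return grads


weights = {
    ("x1", "h1"): 0.5, ("x2", "h1"): -0.3,
    ("x1", "h2"): 0.8, ("x2", "h2"): 0.2,
    ("h1", "y"): 1.0, ("h2", "y"): -1.5,
}
grads = backprop(weights, {"x1": 1.0, "x2": 0.5}, target=1.0, output_node="y")
for edge, g in sorted(grads.items()):
    print(edge, round(g, 4))
```

Comparing each gradient with a finite-difference estimate of the loss is an easy way to validate code like this before building a larger network on top of it.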

I would also like to discuss these ideas with the mentors and hear their views and ideas for implementation. It would also be good if I could be given some initial work on whichever of these ideas has higher priority at the moment, so that I gain experience with the code base; it would also serve as a review.

Thanks

Re: GSOC 2013 Application : Porting Orange code to Python 3.

Post by Ales » Mon Apr 15, 2013 18:58

Sorry, but we were not accepted as a mentoring organization for GSoC 2013.