Orange Forum • View topic - Matrix factorization techniques for data mining

Matrix factorization techniques for data mining

Postby Blaz » Sat Mar 26, 2011 19:46

I have just added a new topic to the list of ideas which involves implementation of matrix factorization techniques.

Re: Matrix factorization techniques for data mining

Postby sumith » Sun Mar 27, 2011 13:42

Hi Blaz,

I'm interested in this idea. Briefly about me: I am a PhD student at Monash University, Australia, working in data mining, especially text mining. I completed my undergraduate studies in Computer Science and Engineering at Moratuwa University, Sri Lanka, where I was awarded the gold medal for the best student of the entire batch. As for programming, I have more than 7 years of experience, both in academia and as a software engineer in industry.

I have worked with different matrix factorization techniques during my undergraduate and postgraduate studies, so I think I am comfortable with the theoretical side of this project. I will need to refresh my Python, since I haven't used it recently, but I am pretty sure it won't take long.

I would like to hear your input on the project so that I better understand what is expected.

Cheers,
Sumith

Re: Matrix factorization techniques for data mining

Postby Blaz » Sun Mar 27, 2011 15:09

Dear Sumith,

We would need a library of implementations of various matrix factorization techniques, good documentation for them, example cases, and so on: exactly what we wrote in the proposal for this idea. Good integration with Orange is of course compulsory.

Best wishes,
Blaz.
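
For readers new to the topic, here is a minimal sketch of one technique of the kind discussed in this thread: non-negative matrix factorization (NMF) with the Lee and Seung multiplicative updates for the squared-error objective. It is only an illustration, not the method proposed for Orange and not part of Orange's API. The function name `nmf`, its parameters, and the toy matrix `V` are assumptions made up for this example, and it uses plain NumPy.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H.

    Hypothetical helper for illustration: W is (m x rank), H is (rank x n),
    both kept non-negative by multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Start from strictly positive random factors.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (made up): a rank-1 non-negative matrix.
V = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])

W, H = nmf(V, rank=1)
error = np.linalg.norm(V - W @ H)  # Frobenius norm of the residual
print("reconstruction error:", error)
```

On this rank-1 toy matrix the residual should be close to zero. On real data the rank, the number of iterations, and the initialization would all need tuning.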

Re: Matrix factorization techniques for data mining

Postby sumith » Sun Apr 03, 2011 19:13

Hi Blaz,

I have submitted my proposal to Google. It would be really nice if you could give me your feedback so that I can improve it. Since this is my first time applying for GSoC, I'm not sure how much detail I should include, so please advise me on this.

At the moment I am working on proposals for the new ideas I suggested earlier: a Text Clustering Module and GSOM clustering. I hope it is OK to submit multiple proposals to the same organisation. I would be very interested in integrating all of these, and I can keep contributing even after GSoC, so I thought they were worth proposing.

Thanks,
Sumith

Re: Matrix factorization techniques for data mining

Postby Mitar » Sun Apr 03, 2011 19:37

I am not sure whether Google's interface allows you to submit multiple proposals to the same organization. Does it?

Anyway, we would like to see each student submit only one proposal. It is the student's job to decide what he finds most interesting, what he is best at, what he has experience with, and so on. So please concentrate on one proposal: make it shine, research it, check existing papers on the subject, and so on.

Of course you are invited to discuss your ideas here, so that you get a better feeling for what suits you best.

Re: Matrix factorization techniques for data mining

Postby Mitar » Sun Apr 03, 2011 19:39

Hm, it seems it really is possible to submit multiple proposals.

Re: Matrix factorization techniques for data mining

Postby sumith » Mon Apr 04, 2011 4:19

Hi Blaz,

Ah, is that so? The Google page says you can submit multiple proposals. But that's fine if you prefer one proposal from each student.

Actually, based on my interests, I would like to improve the text mining module in Orange. At the same time, I like the matrix factorization idea proposed by you. I have worked in both areas during my research, so I'm really fine with either.

But I really don't have any idea which one is more important to you. If I propose a new idea of my own, it might not stand out, given the number of proposals you have.

What is your personal view on submitting a proposal for a topic proposed by you versus a new topic proposed by me? Will both have an equal chance of being accepted? If not, do you mind if I submit two proposals? I have already submitted one and am working on the other at the moment.

I'm really sorry about the trouble, but I would really like to join the open source community in the area that I research, and this is a very good starting point.

Thanks,
Sumith

Re: Matrix factorization techniques for data mining

Postby Blaz » Mon Apr 04, 2011 6:13

Sumith, others,

Both text mining and matrix factorization are important to Orange; we will treat them on equal grounds. In any case, try to be more descriptive when listing the techniques that you would like to implement. Provide a brief description of each method (two or three sentences), perhaps with a link to where it is described in detail. You can also add to your proposal a link to some library or software tool where the technique has already been implemented, as an example implementation. All proposals also require writing documentation. You may want to get familiar with the style in which Orange documentation is currently written, and perhaps propose an example (problem, data set) that you will use in your documentation.

Ah, and from Orange 2.5, which is just coming up, all documentation will be in reStructuredText.

Best wishes,
Blaz

