Orange Forum • View topic - Text mining add-on for Orange and Time-series analysis proj

Text mining add-on for Orange and Time-series analysis proj

General discussions about Orange and with Orange connected things (data mining, machine learning, bioinformatics...).


Postby HarshitDubey » Thu Mar 15, 2012 21:12

Hi

I am interested in two of the GSoC projects:

1. Text mining add-on for Orange
I have used many data mining tools (including Orange) and found that much more functionality for processing text data could be introduced, such as stemming, parsing (regex search), and phrase detection. I also have experience working on data mining projects (classification and clustering) that deal with processing text data.

2. Time-series analysis
I am also interested in this project. I am familiar with reading and normalizing time series, searching for patterns, feature extraction, finding correlations, and other basic techniques.
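As a rough illustration of two of the text preprocessing steps mentioned in point 1 (regex-based parsing and phrase detection), here is a minimal sketch in plain Python. It assumes a simple alphanumeric regex tokenizer and treats a bigram as a candidate phrase when it occurs at least `min_count` times; the function names are hypothetical and are not part of Orange or its text mining add-on.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric runs extracted with a regex.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Return the lowercase alphanumeric tokens found in `text`."""
    return TOKEN_RE.findall(text.lower())


def detect_phrases(documents, min_count=2):
    """Return bigrams occurring at least `min_count` times across all documents.

    This is a plain frequency heuristic, not a statistical collocation test.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Count adjacent token pairs within each document.
        counts.update(zip(tokens, tokens[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}


docs = [
    "Data mining with Orange is fun.",
    "Text mining is a branch of data mining.",
    "Orange supports data mining and machine learning.",
]
print(detect_phrases(docs))  # {'data mining': 3}
```

A real implementation would probably add stop-word filtering and a scoring measure such as pointwise mutual information, but raw bigram counts are enough to show the idea.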

Could you please advise me on what I should do next?

Re: Text mining add-on for Orange and Time-series analysis p

Postby Mitar » Thu Mar 15, 2012 21:19

Think about concrete proposals on how to contribute to Orange for GSoC. You can also already code something and show it to us, so that we can see your ideas and how you work. If you are really interested in all this, you can contribute to Orange anyway, with or without participating in GSoC.

Re: Text mining add-on for Orange and Time-series analysis p

Postby HarshitDubey » Sat Mar 17, 2012 23:39

Hi
I really want to contribute to Orange, but I do not understand how to get started. I have downloaded the source code and built it, but I do not know which feature I should implement first (for the text mining project).
Please guide me.

Re: Text mining add-on for Orange and Time-series analysis p

Postby crtg » Wed Mar 21, 2012 16:50

Hi,

You should try to use our text mining add-on; I am sure you will find some room for improvement. We welcome new features, but we would also like to refactor the existing text mining add-on. Many features are implemented, but some of them probably don't work with the current version of Orange, some aren't documented, and some don't have widgets.

Crt

Re: Text mining add-on for Orange and Time-series analysis p

Postby HarshitDubey » Thu Mar 22, 2012 20:58

Hi

Thanks for the reply. I was going through the documentation of the text mining add-on and found that the add-on is lacking a lot of documentation. Many preprocessing functions (such as stemming, or a scoring function other than tf*idf) could be implemented.
However, I am finding the current text mining add-on very hard to use. Could you please provide some help with this?
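To make "a scoring function other than tf*idf" concrete, here is a minimal sketch comparing a classic tf*idf weight with Okapi BM25, one widely used alternative. It assumes documents are already tokenized into lists of strings and uses the common defaults `k1=1.5` and `b=0.75` with a Lucene-style non-negative IDF; the function names are hypothetical and this is not code from the text mining add-on.

```python
import math


def tf_idf(term, doc, docs):
    """Raw term frequency in `doc` times log(N / df) over the collection `docs`."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)


def bm25(term, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25 weight of `term` in `doc` (Lucene-style non-negative IDF)."""
    tf = doc.count(term)
    if tf == 0:
        return 0.0
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    avgdl = sum(len(d) for d in docs) / n
    idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
    # Term frequency saturates as tf grows (k1); longer-than-average
    # documents are penalized through the length normalization (b).
    norm = tf + k1 * (1.0 - b + b * len(doc) / avgdl)
    return idf * tf * (k1 + 1.0) / norm


docs = [
    ["orange", "data", "mining"],
    ["text", "mining", "text", "text"],
    ["time", "series"],
]
for term in ("text", "mining"):
    print(term, tf_idf(term, docs[1], docs), bm25(term, docs[1], docs))
```

The main design difference: tf*idf grows linearly with term frequency, while BM25 saturates and also accounts for document length, which often helps with documents of very different sizes.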

Thanks
Harshit Dubey

Re: Text mining add-on for Orange and Time-series analysis p

Postby Mitar » Thu Mar 22, 2012 23:58

Harshit, one of the very important qualities we look for in our candidates is independence. As we have explained to you, the text mining add-on is in a poor state. This is why we are proposing that a student take care of it. Of course, this means he or she should be capable of getting a grasp of the existing code mostly independently and then proposing a path for its improvement. If you need as much help as you keep requesting, then maybe you are not the candidate we are looking for. If you are just trying to engage with the community, a much better way is to start proposing code patches rather than asking for very generic help ("please help me"). You should be much more precise if you want any help. And, as I explained above, you should try to solve problems yourself and prove to us that you are capable of doing so.

Re: Text mining add-on for Orange and Time-series analysis p

Postby HarshitDubey » Fri Mar 23, 2012 20:45

Sorry, I may not have been precise about the type of help I needed; I will take care of that from now on. I have also read the source code of the text mining add-on and tried to integrate it with the Porter stemmer. I have already sent a pull request.
I am eager to contribute, and perhaps in my eagerness I asked for more help than I should have.

Thanks and Regards
Harshit Dubey

Re: Text mining add-on for Orange and Time-series analysis p

Postby HarshitDubey » Mon Mar 26, 2012 11:48

Hi

Along with exploring Orange's text mining add-on, I have been exploring other tools available for text mining (for example, Lucene) and found several features that we could include:

1. Include parsing of different file formats:
currently only XML parsing is available; we could add parsing of JSON, HTML, and other file formats.

2. Store the index created for a file in an index file, so that the user can reuse it later without having to index the file again and again.

3. Provide more detailed documentation and write a tutorial for the text mining add-on.
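Point 2 (reusing an index instead of rebuilding it) can be sketched with a plain inverted index serialized to JSON. This is a minimal illustration assuming a simple regex tokenizer and a JSON file on disk; the file name and function names are hypothetical, and it is not how Lucene stores its indexes (Lucene uses its own binary format).

```python
import json
import os
import re
import tempfile
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each token to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def save_index(index, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_or_build(documents, path):
    """Reuse the index stored at `path` if present, otherwise build and save it.

    Note: this sketch does not detect a stale index when the documents change.
    """
    if os.path.exists(path):
        return load_index(path)
    index = build_index(documents)
    save_index(index, path)
    return index


docs = ["Orange text mining", "Time series in Orange"]
path = os.path.join(tempfile.mkdtemp(), "index.json")  # hypothetical location
first = load_or_build(docs, path)   # builds the index and writes it to disk
second = load_or_build(docs, path)  # reads the saved file instead of re-indexing
print(second["orange"])  # [0, 1]
```

A real implementation would also need to store something like a checksum or modification time of the source file, so that the saved index is rebuilt when the input changes.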

Please give me some feedback on how useful these changes would be and what more I could include.

Thanks and Regards
Harshit Dubey

Re: Text mining add-on for Orange and Time-series analysis p

Postby HarshitDubey » Wed Mar 28, 2012 10:48

Hi

Just a gentle reminder about my last post.

Re: Text mining add-on for Orange and Time-series analysis p

Postby Mitar » Thu Mar 29, 2012 14:48

I can only speak about the time series idea. As we currently do not have anything specific for that, any basic/initial support for time series analysis would be good. Keep that in mind while writing a proposal (if you decide on time series). What would also be important for us is finding some way of integrating time series with the rest of Orange, for example a widget that would extract features that we could then work with in existing widgets.
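One way to read the suggestion above: turn each time series into a fixed-length row of summary features, which ordinary attribute-value methods (and thus existing widgets) can then work with. A minimal sketch in plain Python follows; the chosen features (mean, standard deviation, least-squares slope, lag-1 autocorrelation) and the function name are illustrative assumptions, not an existing Orange API, and converting the rows into an Orange data table is left out.

```python
import statistics


def series_features(values):
    """Return a dict of summary features for a numeric time series."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    # Least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (v - mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    # Lag-1 autocorrelation; defined as 0.0 for a constant series.
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        lag1 = 0.0
    else:
        lag1 = sum(
            (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
        ) / var
    return {"mean": mean, "std": std, "slope": slope, "lag1_autocorr": lag1}


series = {
    "rising": [1, 2, 3, 4, 5],
    "flat": [3, 3, 3, 3, 3],
}
# Each series becomes one row of features, ready for a regular data table.
for name, values in series.items():
    print(name, series_features(values))
```

Because every series maps to the same set of columns regardless of its length, the resulting rows could be fed to clustering or classification just like any other attribute-value data.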

