Changes between Version 60 and Version 61 of GSoC/Ideas
- 03/08/12 12:30:13 (21 months ago)
v60 → v61

Line 12 (unchanged):
The current [https://bitbucket.org/biolab/orange-addon-text Orange add-on for text mining] is outdated and incomplete. The source code needs refactoring to comply with the [http://orange.biolab.si/trac/wiki/Orange25 Orange 2.5 development guidelines]. Additionally, the current text mining add-on lacks documentation in reST format (including a tutorial for beginners), unit tests, and installation support via PyPI (http://pypi.python.org/pypi).

Line 14 (v60, removed):
Review existing text preprocessing methods (lemmatization and stemming) in orngText and propose improvements,

Line 14 (v61, added):
The project should also include a comparison between the basic text (pre)processing techniques already implemented in the current version of the add-on (lemmatization, stemming, document distance, feature subset selection, phrase detection) and the latest state-of-the-art techniques. If necessary, additional algorithms should be implemented. It would be very nice if the text mining add-on's functionality were also available from widgets in OrangeCanvas.

Line 16 (unchanged):
Useful skills: Python. Data mining.