Partial least squares regression is a regression technique which supports multiple response variables. PLS regression is very popular in areas such as bioinformatics, chemometrics etc. where the number of observations is usually less than the number of measured variables and where there exists multicollinearity among the predictor variables. In such situations, standard regression techniques would usually fail. The PLS regression is now available in Orange (see documentation)!
Orange 2.5a2 available
Orange 2.5a2 has been uploaded to PyPI. It now includes basic support for multi-label classification (developed during the Google Summer of Code 2011), some new widget icons and documentation for basket format. Release is also tagged on our Bitbucket repository.
Multi-label classification (and Multi-target prediction) in Orange
The last summer, student Wencan Luo participated in Google Summer of Code to implement Multi-label Classification in Orange. He provided a framework, implemented a few algorithms and some prototype widgets. His work has been "hidden" in our repositories for too long; finally, we have merged part of his code into Orange (widgets are not there yet ...) and added a more general support for multi-target prediction.
New Orange icons
As new and new widgets with new features are added to Orange, icons for them have to be drawn. Most of the time those are just some quick sketches or even missing altogether. But now we are starting to redraw and unify them. A few of them have already been made.
Parallel Orange?
We attended a NIPS 2011 workshop on processing and learning from large scale data. Various presenters showed different tools and frameworks that can be used when developing algorithms suitable for dealing with large scale data, but none of them were written in Python and as such, not useful for Orange. We have been looking for a framework that would help us run code in parallel for some time, but so far with no luck.
Earth - Multivariate adaptive regression splines
There have recently been some additions to the lineup of Orange learners. One of these is Orange.regression.earth.EarthLearner. It is an Orange interface to the Earth library written by Stephen Milborrow implementing Multivariate adaptive regression splines.
Orange 2.5: code conversion
Orange 2.5 unifies Orange's C++ core and Python modules into a single module hierarchy. To use the new module hierarchy, import Orange instead of orange and accompanying orng* modules. While we will maintain backward compatibility in 2.* releases, we nevertheless suggest programmers to use the new interface. The provided conversion tool can help refactor your code to use the new interface.
Random forest switches to Simple tree learner by default
Random forest classifiers now use Orange.classification.tree.SimpleTreeLearner by default, which considerably shortens their construction times.
GSoC Mentor Summit
On 22th and 23th October 2011 there was Google Summer of Code Mentor Summit in Mountain View, California. Google Summer of Code is Google's program for encouraging students to work on open-source projects during their summer break. Because this year Orange participated in this program too, we decided to participate also in this summit and get to know other mentors, other open-source projects and organizations, exchange our experiences, learn something new, and improve our connections and collaborations with others.
Debian packages support multiple Python versions now
We have created Debian packages for multiple Python versions. This means that they work now with both Python 2.6 and 2.7 out of the box, or if you compile them manually, with any (supported) version you have installed on your (Debian-based) system.
Practically, this means that now you can install them without manual compiling on current Debian and Ubuntu systems. Give it a try, add our Debian package repository, apt-get install python-orange for Orange library/modules and/or orange-canvas for GUI. If you install the later package, type orange in the terminal and Orange canvas will pop-up.
3D Visualizations in Orange
Over the summer I worked (and still do) on several new 3D visualization widgets as well as a 3D plotting library they use, which will hopefully simplify making more widgets. The library is designed to be similar in terms of API to the new Qt plotting library Noughmad is working on.
The library uses OpenGL 2/3: since Khronos deprecated parts of the old OpenGL API (particularly immediate mode and fixed-function functionality) care has been taken to use only capabilities less likely to go away in the years to come. All the drawing is done using shaders; geometry data is fed to the graphics hardware using Vertex Buffers. The library is fully functional under OpenGL 2.0; when hardware supports newer versions (3+), several optimizations are possible (e.g. geometry processing is done on the GPU rather than on CPU), possibly resulting in improved user experience.
GSoC Review: Visualizations with Qt
During the course of this summer, I created a new plotting library for Orange plot, replacing the use of PyQwt. I can say that I have succesfully completed my project, but the library (and especially the visualization widgets) could still use some more work. The new library supports a similar interface, so little change is needed to convert individual widgets, but it also has several advantages over the old implementation:
- Animations: When using a single curve to show all data points, data changes only move the points instead of replacing them. These moves are now animated, as are color and size changes.
- Multithreading: All position calculations are done in separate threads, so the interface remains responsive even when an long operation is running in the background.
- Speed: I removed several occurances of needlessly clearing and repopulating the graph.
- Simplicity: Because it was written with Orange in mind, the new library has functions that match Orange's data structures. This leads to simpler code in widgets using the library, and less operations in Python.
- Appearance: The plot can use the system palette, or a custom color theme. In general, I think it looks much nicer that Qwt-based plots.
- Documentation: There is an extensive API documentation (will soon be available at Orange 2.5 documentation), as well as two widget examples.
GSoC Review: Multi-label Classification Implementation
Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label l from a set of disjoint labels L, |L| > 1. If |L| = 2, then the learning problem is called a binary classification problem, while if |L| > 2, then it is called a multi-class classification problem (Tsoumakas & Katakis, 2007).
Multi-label classification methods are increasingly used by many applications, such as textual data classification, protein function classification, music categorization and semantic scene classification. However, currently, Orange can only handle single-label problems. As a result, the project Multi-label classification Implementation has been proposed to extend Orange to support multi-label.
We can group the existing methods for multi-label classification into two main categories: a) problem transformation method, and b) algorithm adaptation methods. In the former one, multi-label problems are converted to single-label, and then the traditional binary classification can apply; in the latter case, methods directly classify the multi-label data, instead.
In this project, two transformation methods and two algorithm adaptation methods are implemented. Along with the methods, their widgets are also added. As the evaluation metrics for multi-label data are different from the single-label ones, new evaluation measures are supported. The code is available in SVN branch.
Fortunately, benefiting from the Tab file format, the ExampleTable can store multi-label data without any modification. Now, we can add a special value – label into the attributes dictionary of the domain with value 1. In this way, if the attribute description has the keyword label, then it is a label; otherwise, it is a normal feature.
GSoC Review: MF - Matrix Factorization Techniques for Data Mining
MF - Matrix Factorization Techniques for Data Mining is a Python scripting library which includes a number of published matrix factorization algorithms, initialization methods, quality and performance measures and facilitates the combination of these to produce new strategies. The library represents a unified and efficient interface to matrix factorization algorithms and methods.
The MF works with numpy dense matrices and scipy sparse matrices (where this is possible to save on space). The library has support for multiple runs of the algorithms which can be used for some quality measures. By setting runtime specific options tracking the residuals error within one (or more) run or tracking fitted factorization model is possible. Extensive documentation with working examples which demonstrate real applications, commonly used benchmark data and visualization methods are provided to help with the interpretation and comprehension of the results.
See entries for:
See entries tagged:
- 3d
- analysis
- badges
- bioinformatics
- bioorange
- bohinj
- classification
- conference
- dataloading
- debian
- distribution
- download
- earth
- fink
- forestlearner
- gsoc
- icons
- mars
- matrixfactorization
- mlc
- multilabel
- multitarget
- network
- networkx
- opengl
- orange25
- packaging
- parallelization
- performance
- plot
- pls
- pypi
- python
- qt
- regression
- release
- retreat
- simpletreelearner
- summit
- tree
- t-shirts
- visualization
- website
