New in Orange: Partial least squares regression

Partial least squares (PLS) regression is a regression technique that supports multiple response variables. It is very popular in areas such as bioinformatics and chemometrics, where the number of observations is usually smaller than the number of measured variables and where the predictor variables are multicollinear. In such situations, standard regression techniques usually fail. PLS regression is now available in Orange (see documentation)!
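
To give a feel for why PLS copes with more variables than observations, here is a minimal pure-Python sketch of a one-component, single-response PLS fit. It is not Orange's implementation: the function names, the toy data and the restriction to one response and one component are all assumptions made for illustration.

```python
def center(values):
    """Return the mean-centered values together with their mean."""
    mean = sum(values) / float(len(values))
    return [v - mean for v in values], mean


def pls1_one_component(X, y):
    """Fit a one-component PLS model for a single response (illustrative only).

    X is a list of rows, y a list of floats. Returns a predict(row) function.
    """
    n, p = len(X), len(X[0])
    # Center every predictor column and the response.
    centered_cols = [center([row[j] for row in X]) for j in range(p)]
    x_means = [mean for _, mean in centered_cols]
    Xc = [[centered_cols[j][0][i] for j in range(p)] for i in range(n)]
    yc, y_mean = center(y)
    # Weight vector: the direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each observation onto that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Ordinary regression of y on the single score vector.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [q * wj for wj in w]

    def predict(row):
        return y_mean + sum(c * (v - m) for c, v, m in zip(coef, row, x_means))

    return predict


# Toy data (made up): 3 observations, 5 perfectly collinear predictors.
base = [1.0, 2.0, 3.0]
X = [[b, 2 * b, 3 * b, -b, 0.5 * b] for b in base]
y = [2.0, 4.0, 6.0]
predict = pls1_one_component(X, y)
fitted = [predict(row) for row in X]
```

Ordinary least squares has no unique solution here because the predictors are collinear and outnumber the observations, whereas the PLS fit only ever regresses on one projected score vector.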

published: 02/02/12 16:21 | author: lan | tags: multitarget pls regression

Orange 2.5a2 available

Orange 2.5a2 has been uploaded to PyPI. It now includes basic support for multi-label classification (developed during the Google Summer of Code 2011), some new widget icons and documentation for the basket format. The release is also tagged on our Bitbucket repository.

published: 23/01/12 11:23 | author: anze | tags: gsoc pypi release

Multi-label classification (and Multi-target prediction) in Orange

Last summer, student Wencan Luo participated in Google Summer of Code to implement multi-label classification in Orange. He provided a framework and implemented a few algorithms and some prototype widgets. His work has been "hidden" in our repositories for too long; we have finally merged part of his code into Orange (the widgets are not there yet ...) and added more general support for multi-target prediction.

published: 09/01/12 12:41 | author: matija | tags: classification gsoc mlc multilabel

New Orange icons

As more and more widgets with new features are added to Orange, icons have to be drawn for them. Most of the time these are just quick sketches, or they are missing altogether. Now we are starting to redraw and unify them. A few have already been made.

Feature Constructor, Image Viewer, Paint Data, Preprocess, Python Script, SVM

published: 06/01/12 14:54 | author: mitar | tags: icons

Parallel Orange?

We attended a NIPS 2011 workshop on processing and learning from large-scale data. Various presenters showed tools and frameworks that can be used to develop algorithms suitable for large-scale data, but none of them were written in Python, so they are not useful for Orange. We have been looking for some time for a framework that would help us run code in parallel, but so far with no luck.

published: 03/01/12 08:59 | author: anze | tags: parallelization

Earth - Multivariate adaptive regression splines

There have recently been some additions to the lineup of Orange learners. One of these is Orange.regression.earth.EarthLearner, an Orange interface to the Earth library written by Stephen Milborrow, which implements multivariate adaptive regression splines.

published: 20/12/11 12:22 | author: ales | tags: earth mars regression

Orange 2.5: code conversion

Orange 2.5 unifies Orange's C++ core and Python modules into a single module hierarchy. To use the new module hierarchy, import Orange instead of orange and the accompanying orng* modules. While we will maintain backward compatibility in 2.* releases, we nevertheless recommend that programmers use the new interface. The provided conversion tool can help refactor your code to use the new interface.

published: 20/12/11 12:21 | author: marko | tags: orange25

Random forest switches to Simple tree learner by default

Random forest classifiers now use Orange.classification.tree.SimpleTreeLearner by default, which considerably shortens their construction times.

published: 08/12/11 15:28 | author: jurezb | tags: forestlearner simpletreelearner

GSoC Mentor Summit

On 22nd and 23rd October 2011, the Google Summer of Code Mentor Summit took place in Mountain View, California. Google Summer of Code is Google's program for encouraging students to work on open-source projects during their summer break. Because Orange participated in the program this year, we decided to attend the summit as well, to get to know other mentors, open-source projects and organizations, exchange experiences, learn something new, and improve our connections and collaborations with others.

published: 26/10/11 11:29 | author: mitar | tags: gsoc summit

Debian packages support multiple Python versions now

We have created Debian packages for multiple Python versions. This means that they now work with both Python 2.6 and 2.7 out of the box or, if you compile them manually, with any (supported) version you have installed on your (Debian-based) system.

In practice, this means that you can now install them without manual compiling on current Debian and Ubuntu systems. Give it a try: add our Debian package repository and apt-get install python-orange for the Orange library/modules and/or orange-canvas for the GUI. If you install the latter package, type orange in the terminal and Orange canvas will pop up.

published: 13/09/11 00:32 | author: mitar | tags: debian packaging python

3D Visualizations in Orange

Over the summer I worked (and am still working) on several new 3D visualization widgets, as well as the 3D plotting library they use, which will hopefully simplify making more widgets. The library is designed to be similar in terms of API to the new Qt plotting library Noughmad is working on.

The library uses OpenGL 2/3: since Khronos deprecated parts of the old OpenGL API (particularly immediate mode and fixed-function functionality) care has been taken to use only capabilities less likely to go away in the years to come. All the drawing is done using shaders; geometry data is fed to the graphics hardware using Vertex Buffers. The library is fully functional under OpenGL 2.0; when hardware supports newer versions (3+), several optimizations are possible (e.g. geometry processing is done on the GPU rather than on CPU), possibly resulting in improved user experience.

published: 07/09/11 10:30 | author: matejd | tags: 3d opengl visualization

Orange badges are here!

Orange badges are here! They come in two flavors. Tasty!

published: 04/09/11 03:29 | author: mitar | tags: badges

GSoC Review: Visualizations with Qt

During the course of this summer, I created a new plotting library for Orange, replacing the use of PyQwt. I can say that I have successfully completed my project, but the library (and especially the visualization widgets) could still use some more work. The new library supports a similar interface, so little change is needed to convert individual widgets, but it also has several advantages over the old implementation:

  • Animations: When using a single curve to show all data points, data changes only move the points instead of replacing them. These moves are now animated, as are color and size changes.
  • Multithreading: All position calculations are done in separate threads, so the interface remains responsive even when a long operation is running in the background.
  • Speed: I removed several occurrences of needlessly clearing and repopulating the graph.
  • Simplicity: Because it was written with Orange in mind, the new library has functions that match Orange's data structures. This leads to simpler code in widgets using the library, and fewer operations in Python.
  • Appearance: The plot can use the system palette or a custom color theme. In general, I think it looks much nicer than Qwt-based plots.
  • Documentation: There is extensive API documentation (soon to be available in the Orange 2.5 documentation), as well as two widget examples.
published: 03/09/11 08:28 | author: Noughmad | tags: gsoc plot qt visualization

GSoC Review: Multi-label Classification Implementation

Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label l from a set of disjoint labels L, |L| > 1. If |L| = 2, then the learning problem is called a binary classification problem, while if |L| > 2, then it is called a multi-class classification problem (Tsoumakas & Katakis, 2007).

Multi-label classification methods are increasingly used in many applications, such as textual data classification, protein function classification, music categorization and semantic scene classification. Currently, however, Orange can only handle single-label problems. The Multi-label Classification Implementation project was therefore proposed to extend Orange with support for multi-label classification.

We can group the existing methods for multi-label classification into two main categories: a) problem transformation methods, and b) algorithm adaptation methods. In the former, multi-label problems are converted into single-label problems, to which traditional binary classification can then be applied; in the latter, the methods classify the multi-label data directly.
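
As an illustration of the problem transformation idea, here is a minimal sketch of binary relevance, a well-known transformation that builds one binary dataset per label. It is not necessarily one of the methods implemented in this project, and the function name, data layout and toy examples are assumptions.

```python
def binary_relevance(examples, labels):
    """Turn a multi-label dataset into one binary dataset per label.

    examples: list of (features, set_of_labels) pairs.
    Returns a dict mapping each label to a list of (features, 0 or 1) pairs.
    """
    return dict(
        (label, [(features, int(label in label_set))
                 for features, label_set in examples])
        for label in labels
    )


# Toy music categorization data (made up): two numeric features per song.
songs = [
    ((0.9, 0.1), {"happy"}),
    ((0.2, 0.8), {"calm", "sad"}),
    ((0.6, 0.7), {"happy", "calm"}),
]
per_label = binary_relevance(songs, ["happy", "calm", "sad"])
```

Any ordinary binary learner can then be trained on each of the per-label datasets, and the predicted label set for a new example is the set of labels whose classifier answers 1.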

In this project, two transformation methods and two algorithm adaptation methods are implemented. Along with the methods, their widgets are also added. As the evaluation metrics for multi-label data are different from the single-label ones, new evaluation measures are supported. The code is available in an SVN branch.
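
To show how a multi-label measure differs from single-label accuracy, here is a small sketch of Hamming loss, a commonly used multi-label measure. The text does not say which measures the project supports, so the choice of measure, the function name and the toy data are assumptions.

```python
def hamming_loss(true_sets, predicted_sets, labels):
    """Fraction of (example, label) pairs on which prediction and truth disagree."""
    # The symmetric difference holds the labels that are wrongly added or missed.
    errors = sum(len(t ^ p) for t, p in zip(true_sets, predicted_sets))
    return errors / float(len(true_sets) * len(labels))


labels = ["a", "b", "c"]
true_sets = [{"a", "b"}, {"c"}]
predicted_sets = [{"a"}, {"b", "c"}]
loss = hamming_loss(true_sets, predicted_sets, labels)
```

Unlike plain accuracy, this gives partial credit: a prediction that gets two of three labels right for an example is penalized less than one that gets all of them wrong.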

Fortunately, thanks to the Tab file format, the ExampleTable can store multi-label data without any modification. We can now add a special key, label, with value 1 to the attributes dictionary of an attribute in the domain. If an attribute's description contains the keyword label, it is a label; otherwise, it is a normal feature.
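
The rule for telling labels from features can be sketched with plain Python objects. The Feature class below is a hypothetical stand-in for an Orange attribute descriptor and is not Orange's API; only the idea of a label key with value 1 in an attributes dictionary comes from the text.

```python
class Feature(object):
    """Hypothetical stand-in for an attribute descriptor with an attributes dict."""

    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = attributes or {}


def split_domain(features):
    """Separate normal features from labels by the presence of the 'label' key."""
    normal = [f.name for f in features if "label" not in f.attributes]
    labels = [f.name for f in features if "label" in f.attributes]
    return normal, labels


# Made-up domain: two ordinary features and two labels.
domain = [
    Feature("tempo"),
    Feature("loudness"),
    Feature("happy", {"label": 1}),
    Feature("calm", {"label": 1}),
]
normal, labels = split_domain(domain)
```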

published: 02/09/11 05:47 | author: wencanluo | tags: classification gsoc multilabel

GSoC Review: MF - Matrix Factorization Techniques for Data Mining

MF - Matrix Factorization Techniques for Data Mining is a Python scripting library which includes a number of published matrix factorization algorithms, initialization methods, quality and performance measures and facilitates the combination of these to produce new strategies. The library represents a unified and efficient interface to matrix factorization algorithms and methods.

MF works with numpy dense matrices and scipy sparse matrices (where possible, to save space). The library supports multiple runs of the algorithms, which can be used for some quality measures. By setting runtime-specific options, it is possible to track the residual error within one (or more) runs or to track the fitted factorization model. Extensive documentation with working examples that demonstrate real applications, commonly used benchmark data and visualization methods is provided to help with the interpretation and comprehension of the results.
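
The ideas of multiple runs and residual tracking can be sketched in pure Python with non-negative matrix factorization using the standard Lee and Seung multiplicative updates. This is not the MF library's API: the function names, parameters and toy matrix are assumptions, and plain lists are used instead of numpy only to keep the sketch self-contained.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def residual(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for v_row, wh_row in zip(V, WH)
               for v, wh in zip(v_row, wh_row)) ** 0.5


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V as W * H, recording the residual after each step."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(V))]
    H = [[rng.random() + 0.1 for _ in range(len(V[0]))] for _ in range(rank)]
    residuals = [residual(V, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        residuals.append(residual(V, W, H))
    return W, H, residuals


# Made-up non-negative data matrix.
V = [[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [1.0, 1.0, 5.0], [0.0, 1.0, 4.0]]
# Several runs from different random starts; keep the one with the smallest final residual.
runs = [nmf(V, rank=2, seed=s) for s in range(3)]
best_W, best_H, best_residuals = min(runs, key=lambda run: run[2][-1])
```

Because each run starts from a different random initialization, the results differ between runs, which is why a library would offer multiple runs and per-run residual tracking in the first place.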

published: 01/09/11 23:48 | author: marinkaz | tags: gsoc matrixfactorization