Orange Canvas applied to x-ray optics

Orange Canvas is being adopted by researchers who want to use it as a graphical environment for simulating x-ray optics.

Manuel Sanchez del Rio, from the European Synchrotron Radiation Facility (ESRF) in Grenoble, France, and Luca Rebuffi, from Elettra Sincrotrone Trieste, Italy, were looking for a tool that would help them integrate the various programs for x-ray optics simulations, such as the popular SHADOW and SRW. They discovered that the data workflow paradigm, like the one used in Orange Canvas, fits their needs perfectly. They took Orange and replaced the existing widgets with new widgets that represent sources of photons (bending magnets, in the case of ESRF), various optical elements, like lenses and mirrors, and detectors. The channels between the widgets no longer pass data tables, as in standard Orange, but rays of photons. How cool is this?

The result is a system in which the user can arrange the elements in a layout that resembles the actual physical setup, and then run the simulations using the most powerful tools available in x-ray optics.

A prototype of the tool was presented at SPIE Optics + Photonics 2014 in San Diego, the largest meeting of its kind.

We're really excited about this novel use of Orange Canvas.

spie.jpg

published: 26/08/14 15:59 | author: janez

Orange and SQL

Orange 3.0 will also support working with data stored in a database.

We already talked about this some time ago; here we describe some technical details for anybody interested. This is not a thorough technical report; its purpose is only to give an impression of the architecture of the upcoming version of Orange.

So, data tables in Orange 3.0 can refer to data in the working memory or in the database. Any (properly written) code that uses tables should work the same with both kinds of storage. When the data is stored in the database, the table is implemented as a "proxy object" that holds the meta-data needed to construct the SQL query that retrieves the data when it is needed. Operations on the data only modify the meta-data, without retrieving any actual data. For instance, constructing a new table with a selected subset of the data, say all instances that match a certain condition, creates a new proxy with additional conditions for the WHERE clause. Similarly, selecting a subset of features only changes the domain (the list of features), which is later reflected in the columns of the SELECT clause.
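To make the proxy idea concrete, here is a minimal Python sketch. SqlTableProxy and its methods filter, select, sql and fetch are hypothetical names, not Orange 3.0's actual API, and SQLite from the standard library stands in for the database server. Filtering and feature selection return new proxies that differ only in their query meta-data; rows are retrieved only when fetch is called.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SqlTableProxy:
    """Hypothetical proxy table: it stores only the meta-data of a query."""
    table: str
    columns: Tuple[str, ...]          # the domain, i.e. the SELECT list
    conditions: Tuple[str, ...] = ()  # conjuncts of the WHERE clause

    def filter(self, condition: str) -> "SqlTableProxy":
        # Selecting a data subset only adds a condition; nothing is fetched.
        return replace(self, conditions=self.conditions + (condition,))

    def select(self, *columns: str) -> "SqlTableProxy":
        # Selecting features only changes the domain (the SELECT columns).
        return replace(self, columns=tuple(columns))

    def sql(self) -> str:
        query = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            query += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        return query

    def fetch(self, conn: sqlite3.Connection):
        # The only place where data actually leaves the database.
        return conn.execute(self.sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (sepal_length REAL, petal_length REAL, species TEXT)")
conn.executemany(
    "INSERT INTO iris VALUES (?, ?, ?)",
    [(5.1, 1.4, "setosa"), (7.0, 4.7, "versicolor"), (6.3, 6.0, "virginica")],
)

data = SqlTableProxy("iris", ("sepal_length", "petal_length", "species"))
subset = data.filter("petal_length > 2").select("sepal_length", "species")
print(subset.sql())
# SELECT sepal_length, species FROM iris WHERE (petal_length > 2)
print(sorted(subset.fetch(conn)))
# [(6.3, 'virginica'), (7.0, 'versicolor')]
```

Conditions are kept as raw SQL strings only for brevity; a real implementation would quote identifiers and bind parameters instead of pasting text into the query.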

Features in this model are no longer described just by their names, but also by the part of the query that retrieves or constructs their values. Discretization, for instance, constructs new features that wrap the expression of a continuous feature in a CASE statement, which assigns a value based on the bin boundaries.
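The CASE construction can be illustrated with a short Python sketch that builds the SQL expression for a discretized feature. The helper names bin_labels and discretized_expr, the feature petal_length and the cut points are all assumptions for illustration; in practice the bin boundaries would come from a discretization method rather than being fixed by hand.

```python
import sqlite3
from typing import List, Sequence


def bin_labels(cuts: Sequence[float]) -> List[str]:
    """Readable labels for the intervals defined by sorted cut points."""
    labels = [f"< {cuts[0]}"]
    labels += [f"[{lo}, {hi})" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f">= {cuts[-1]}")
    return labels


def discretized_expr(expr: str, cuts: Sequence[float]) -> str:
    """Wrap a continuous feature's SQL expression in a CASE expression.

    WHEN branches are tried in order, so a value gets the label of the first
    bin whose upper boundary it is below; larger values fall into ELSE.
    """
    labels = bin_labels(cuts)
    # NULL < c is not true, so without this branch NULLs would land in ELSE.
    whens = [f"WHEN {expr} IS NULL THEN NULL"]
    whens += [f"WHEN {expr} < {c} THEN '{label}'" for c, label in zip(cuts, labels)]
    return f"CASE {' '.join(whens)} ELSE '{labels[-1]}' END"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (petal_length REAL)")
conn.executemany("INSERT INTO iris VALUES (?)", [(1.4,), (4.7,), (6.0,), (None,)])

expr = discretized_expr("petal_length", [2.5, 5.0])
print([value for (value,) in conn.execute(f"SELECT {expr} FROM iris ORDER BY rowid")])
# ['< 2.5', '[2.5, 5.0)', '>= 5.0', None]
```

The new feature is just another expression, so it can be placed in a SELECT list, a WHERE clause or a GROUP BY like any plain column.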

Since the goal was to make the code in modules and widgets oblivious to the storage, we also needed separate implementations of the operations that must be aware of how the data is stored. For instance, the code that computes the average values of attributes needs to differ between the two storages: for in-memory data we use the corresponding numpy functions, and for databases the average is computed on the server.

We went through the code of Orange 2.7 and identified the common operations on the data. We found that all data access falls into the following types:

  1. basic aggregates like mean, variance, median, minimum and maximum values,
  2. distributions of discrete and continuous variables, values at percentiles,
  3. contingency matrices,
  4. covariance matrices,
  5. filtering of rows based on various criteria, including random sampling,
  6. selection of columns,
  7. construction of variables from values of other variables,
  8. matrices of distances (e.g. Euclidean) between all row pairs,
  9. individual data rows.

Points 1 to 4 are typical examples of what cannot be done efficiently on the client but can be done efficiently in the database. The storage (a class derived from Table) now provides specialized methods for computing aggregates, distributions and contingencies, which use numpy for in-memory data and SQL for data in the database.
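The split between storage-oblivious code and storage-specific aggregates might look like the minimal Python sketch below. The names Storage, InMemoryStorage, SqlStorage, mean, min_max and summarize are hypothetical, not Orange 3.0's API; the in-memory class uses plain Python where Orange uses numpy, so the sketch has no dependencies, and SQLite stands in for the database server.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Storage(ABC):
    """Hypothetical common interface: widget code calls only these methods."""

    @abstractmethod
    def mean(self, column: str) -> float: ...

    @abstractmethod
    def min_max(self, column: str) -> Tuple[float, float]: ...


class InMemoryStorage(Storage):
    # Orange would use numpy here; plain Python keeps the sketch dependency-free.
    def __init__(self, columns: Dict[str, List[float]]):
        self.columns = columns

    def mean(self, column):
        values = self.columns[column]
        return sum(values) / len(values)

    def min_max(self, column):
        values = self.columns[column]
        return min(values), max(values)


class SqlStorage(Storage):
    # The database computes the aggregates; only the results are transferred.
    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn, self.table = conn, table

    def mean(self, column):
        return self.conn.execute(f"SELECT AVG({column}) FROM {self.table}").fetchone()[0]

    def min_max(self, column):
        query = f"SELECT MIN({column}), MAX({column}) FROM {self.table}"
        return tuple(self.conn.execute(query).fetchone())


def summarize(data: Storage, column: str) -> str:
    """Storage-oblivious code, as in a widget."""
    low, high = data.min_max(column)
    return f"{column}: mean={data.mean(column):.2f}, range=[{low}, {high}]"


values = [1.4, 4.7, 6.0]
memory = InMemoryStorage({"petal_length": values})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (petal_length REAL)")
conn.executemany("INSERT INTO iris VALUES (?)", [(v,) for v in values])
database = SqlStorage(conn, "iris")

print(summarize(memory, "petal_length"))    # petal_length: mean=4.03, range=[1.4, 6.0]
print(summarize(database, "petal_length"))  # the same line, computed by SQLite
```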

Points 5 to 7 are implemented “lazily”, by modifying the SQL query that defines the data, as explained above.

Point 8 is difficult to implement efficiently in common relational databases and, besides, it produces a matrix that is larger than the data itself: n rows give n(n-1)/2 distinct pairs. Methods that require such a matrix will need to be reimplemented and made aware of the storage mechanism.
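A quick calculation shows how fast the distance matrix outgrows the data. The sketch assumes a hypothetical table of 100,000 rows and 10 features and 8-byte floating-point values; the function name storage_sizes is made up for illustration.

```python
def storage_sizes(n_rows: int, n_features: int, bytes_per_value: int = 8):
    """Bytes needed for a data table and for its distance matrix.

    A symmetric distance matrix with a zero diagonal still needs one value
    per pair of rows, n * (n - 1) / 2, so it grows quadratically with n.
    """
    data_values = n_rows * n_features
    distance_values = n_rows * (n_rows - 1) // 2
    return data_values * bytes_per_value, distance_values * bytes_per_value


data_bytes, distance_bytes = storage_sizes(100_000, 10)
print(f"data: {data_bytes / 1e6:.0f} MB, distances: {distance_bytes / 1e9:.0f} GB")
# data: 8 MB, distances: 40 GB
```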

Point 9 requires some caution with regard to how the data is retrieved and what it is used for. Access to individual rows should be used sparingly, and sequential retrieval, especially of all rows, needs to be avoided. For efficiency, most methods that did so in previous versions of Orange will need to be reimplemented to use aggregate data (possibly as approximations) or to be aware of the data storage and execute some operations directly through SQL.

We have already ported a number of visualizations and other widgets to the new Orange. Here is one nice example: the Mosaic widget needs to discretize the variables and then compute contingency matrices. Within the above scheme, the widget does not care about the storage mechanism, yet its computation is still as efficient as possible.
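A contingency matrix is exactly the kind of computation that maps onto a single GROUP BY query, so only the counts travel to the client. The Python sketch below contrasts that with counting in memory; the function names, the table zoo and its five rows are made up for illustration and are not Orange code or the real zoo data.

```python
import sqlite3
from collections import Counter
from typing import Dict, Sequence, Tuple


def contingency_sql(conn: sqlite3.Connection, table: str,
                    row_expr: str, col_expr: str) -> Dict[Tuple, int]:
    """Counts of value pairs, computed by the database with one GROUP BY.

    row_expr and col_expr may be plain columns or derived expressions,
    such as a CASE expression produced by discretization.
    """
    query = (f"SELECT {row_expr}, {col_expr}, COUNT(*) FROM {table} "
             f"GROUP BY {row_expr}, {col_expr}")
    return {(r, c): n for r, c, n in conn.execute(query)}


def contingency_memory(rows: Sequence[Sequence], i: int, j: int) -> Dict[Tuple, int]:
    """The same counts for rows held in memory."""
    return dict(Counter((row[i], row[j]) for row in rows))


rows = [("yes", "mammal"), ("yes", "mammal"), ("no", "bird"),
        ("no", "fish"), ("no", "bird")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zoo (hair TEXT, animal_type TEXT)")
conn.executemany("INSERT INTO zoo VALUES (?, ?)", rows)

print(contingency_sql(conn, "zoo", "hair", "animal_type")
      == contingency_memory(rows, 0, 1))  # True
```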

mosaic.png

The described activities were funded in part by the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.

published: 29/05/14 22:34 | author: janez

Workshops at Baylor College of Medicine

On May 22nd and May 23rd, we (Blaz Zupan and Janez Demsar, assisted by Marinka Zitnik and Balaji Santhanam) gave two hands-on workshops called Data Mining without Programming at Baylor College of Medicine in Houston, Texas.

Actually, there was a lot of programming, but no Python or the like. The workshop was designed for biomedical students and Baylor's faculty members. We presented a visual programming approach to developing data mining workflows for interactive data exploration. Each three-hour workshop consisted of 15 data mining lessons on visual data exploration, classification, clustering, network analysis, and gene expression analytics. Each lesson focused on a particular data analysis task that the attendees solved with Orange.

The two workshops were organized by Baylor's Computational and Integrative Biomedical Research Center. Over the two days, they drew a large audience of 120 attendees.

workshop-a.jpg

workshop-b.jpg

published: 26/05/14 00:51 | author: blaz

Viewing Images

Lately I have been having fun with the Image Viewer widget. It has recently been updated and can display images stored locally or on the internet. But wait, what images? How on earth can Orange display images if it handles only tabular or basket-based data?

Here's an example. I took a subset of animals from the zoo.tab data set (it comes with the Orange installation) and, for demonstration purposes, selected only a handful of attributes. I added a new string attribute ("images") and declared it a meta attribute of type "image". The values of this attribute are links to images on the web:

animals-dataset.png

Here is the resulting data set, zoo-with-images.tab. I used this data set in a schema with hierarchical clustering, where selecting a part of the clustering tree displays the associated images:

animals-schema.png

Typically, just like above, you would use a string meta attribute to store links to images. Images can be referred to by an HTTP address or, if stored locally, by a path relative to the location of the data file.
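The lookup rule described above (a web address is used as it is, anything else is a path relative to the data file) can be sketched in a few lines of Python. The helper resolve_image and the file names in the example are hypothetical; this is not the Image Viewer widget's code.

```python
import os
from urllib.parse import urlparse


def resolve_image(value: str, data_file: str) -> str:
    """Hypothetical helper: locate the image named in a meta attribute.

    Web addresses are returned unchanged; any other value is treated as a
    path relative to the directory that contains the data file.
    """
    if urlparse(value).scheme in ("http", "https"):
        return value
    data_dir = os.path.dirname(os.path.abspath(data_file))
    return os.path.normpath(os.path.join(data_dir, value))


print(resolve_image("http://example.com/images/bear.jpg", "zoo-with-images.tab"))
# http://example.com/images/bear.jpg
print(resolve_image("images/9_01.png", "/home/user/digits/digits.tab"))
# /home/user/digits/images/9_01.png (on POSIX systems)
```

Keeping the paths relative means the data file and its image folder can be moved or zipped together, as with the digits example, without editing the data.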

Here is another example, where all the images are local and we have associated them with the famous digits data set (digits.zip contains the data set in the Orange format together with the image files). The task for this data set is to classify handwritten digits based on their bitmap representation. In the schema below we wanted to find out which errors a classification algorithm makes most frequently, and what the images of the misclassified digits look like. It turns out that SVM with an RBF kernel most often misclassifies the digit 9, confusing it with 3:
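Finding the most frequent errors amounts to counting the off-diagonal cells of a confusion matrix. A minimal Python sketch follows; the function name most_frequent_errors is hypothetical, and the labels are made up for illustration, not results from the digits data.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def most_frequent_errors(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                         top: int = 3) -> List[Tuple[Tuple[Hashable, Hashable], int]]:
    """Count (true, predicted) pairs where the classifier was wrong, i.e. the
    off-diagonal cells of the confusion matrix, most frequent first."""
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return errors.most_common(top)


# Made-up labels for illustration only.
actual = [9, 9, 9, 3, 1, 7, 4]
predicted = [3, 3, 4, 3, 7, 1, 4]
print(most_frequent_errors(actual, predicted, top=1))  # [((9, 3), 2)]
```

The instances behind the top pair are the ones whose images one would then inspect.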

digits-schema.png

published: 29/04/14 21:28 | author: blaz | tags: clustering images visualization

Paint Your Data

One of the widgets I very much enjoy when teaching an introductory course in data mining is the Paint Data widget. In the data I paint in this widget, I would intentionally include some clusters, or intentionally obscure them, or draw them in strange shapes. Then I would discuss with the students whether these clusters are identified by k-means clustering, or by hierarchical clustering. We would also discuss automatic scoring of cluster quality and come up with the idea of a silhouette (OK, it has already been invented, but it helps if you arrive at the idea on your own). And then we would play with various data sets, clustering techniques and their parameters in Orange.

Like in the following workflow, where I drew three clusters that were indeed recognized by k-means clustering. Notice that, with silhouette scoring, even the number of clusters was guessed correctly. I also plotted the clustered data in the Scatterplot to check that the clusters are indeed where they should be.
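The silhouette of a point compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). One common way to guess the number of clusters is to pick the clustering with the highest average silhouette. The pure-Python sketch below compares two hand-made labelings of three painted-like blobs instead of running k-means, to stay short; it is an illustration of the score, not Orange's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Average silhouette s(i) = (b - a) / max(a, b) over all points.

    a: mean distance to the other members of the point's own cluster,
    b: mean distance to the members of the nearest other cluster.
    Requires at least two clusters.
    """
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:  # a singleton cluster gets s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean([math.dist(p, q) for q, lab in zip(points, labels) if lab == other])
                for other in set(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Three small, well-separated blobs.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]
three = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # the last two blobs merged into one cluster

print(silhouette(points, three) > silhouette(points, two))  # True
```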

PaintData-k-Means-ok.png

Or like in the workflow below where k-means fails miserably (but some other clustering technique would not).

PaintData-k-Means-notok.png

Paint Data can also be used in a supervised setting, for classification tasks. We can set the intended number of classes, and then choose any of them to paint its data. Below I have used it to create data sets for checking the behavior of several classifiers.

PaintData-Supervised.png

There are tons of other workflows where Paint Data can be useful. Give it a try!

published: 20/12/13 20:19 | author: blaz | tags: data visualization

Brief History of Orange, Praise to Donald Michie

Informatica has recently published our paper on the history of Orange. The paper is a post-conference publication from the Conference on 100 Years of Alan Turing and 20 Years of Slovene AI Society, where Janez Demšar gave a talk on the topic.

The history of Orange goes all the way back to 1997, when the late Donald Michie had the idea that machine learning needs an open toolbox. To spark its development, we co-organized WebLab97 at beautiful Bled, Slovenia. The workshop's name reflected Michie's idea that the toolbox should be a web application where people could submit data mining code, procedures, testing scripts, and data, and share them in a joint web workspace.

Donald Michie, a pioneer of Artificial Intelligence, was always ahead of his time. (Check out a great talk by Ivan Bratko on their friendship and adventures in chess and machine learning.) At WebLab97, Michie was actually very, very far ahead of his time. Despite the presence of IBM's Java team, which could have guided us in developing the toolbox, the technology was not ripe, and the WebLab initiative faded away when the conference ended. But, at least for us, the idea lived on: it sparked the interest of Janez and myself, and development of what is now Orange began shortly after.

Our paper gives a brief account of Orange's history and its development since WebLab97. For brevity, it does not mention that before Qt we experimented with other GUI platforms. Before Qt, we placed our hopes in Pmw (Python megawidgets), a library that helped us construct the first Orange graphical user interface. The GUI part of Orange was called Orange*First. Its screenshot shows a tab for interactive discretisation, included thanks to Noriaki Aoki, who proposed at the time that this kind of visualisation would be useful in medical data analysis:

orange first

PS: Somehow, I have lost the LaTeX file with the WebLab97 program. It should be on some backup tape, somewhere. The following scan of the first page (and a weblab97.pdf), left in some PPT presentation, is all that I can retrieve. The program of the second day is missing, with keynotes from Tom Mitchell and much talk about R, which was already a success story at the time.

WebLab97.jpg

published: 09/10/13 06:33 | author: blaz | tags: history michie

JMLR Publishes Article on Orange

Journal of Machine Learning Research has just published our paper on Orange. In the paper we focus on its Python scripting part. We last reported on Orange scripting at ECML/PKDD 2004. That manuscript was well received (over 270 citations on Google Scholar), but it is now entirely outdated, and it was our only formal publication on Orange scripting. The JMLR paper is now the current description of Orange and will be, for a while :-), Orange’s primary reference.

Here's a reference:

Demšar, J., Curk, T., Erjavec, A., et al. Orange: Data Mining Toolbox in Python. Journal of Machine Learning Research 14(Aug):2349-2353, 2013.

and bibtex entry:

@article{JMLR:demsar13a,
  author  = {Janez Dem\v{s}ar and Toma\v{z} Curk and Ale\v{s} Erjavec and
             \v{C}rt Gorup and Toma\v{z} Ho\v{c}evar and Mitar Milutinovi\v{c} and
             Martin Mo\v{z}ina and Matija Polajnar and Marko Toplak and
             An\v{z}e Stari\v{c} and Miha \v{S}tajdohar and Lan Umek and
             Lan \v{Z}agar and Jure \v{Z}bontar and Marinka \v{Z}itnik and
             Bla\v{z} Zupan},
  title   = {Orange: Data Mining Toolbox in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2013},
  volume  = {14},
  pages   = {2349-2353},
  url     = {http://jmlr.org/papers/v14/demsar13a.html}
}
published: 03/10/13 02:50 | author: blaz | tags: article jmlr scripting toolbox

Orange and AXLE project

Our group at the University of Ljubljana is a partner in the EU FP7 project Advanced Analytics for Extremely Large European Databases (AXLE). The project is particularly interesting because of its diverse partners, which cover the entire vertical: from studying hardware architectures that would better support extremely large databases (University of Manchester, Barcelona Supercomputing Center), to making the necessary adjustments related to speed and security of databases (2ndQuadrant), to data analytics (our group), to handling and analyzing real data and decision making (Portavita).

As a result of the project, Orange will be better connected with databases. Currently, all data is stored in working memory, while the forthcoming Orange 3.0 will be able to handle data stored in a database. We are also working on a parallel computation architecture. Visualization of large data presents a big challenge: we cannot transfer large amounts of data from the database to the desktop, and on the other hand, it is difficult to provide a rich interactive experience if visualizations are created on the server side. Moreover, most visualizations are intrinsically unsuitable for large data sets. For instance, the scatter plot represents each data instance with a symbol. Even if each instance is represented by a single pixel, only a few million data points fit on a computer screen. So in the context of big data, we will have to replace scatter plots with heatmaps.
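A heatmap replaces one symbol per instance with one count per cell, so its size depends on the resolution rather than on the number of rows, and the counting can happen on the server. The Python sketch below bins points into a grid both in memory and with a single GROUP BY query, with SQLite standing in for the database server; the function names, the table points and the random data are assumptions for illustration, not Orange's heatmap implementation.

```python
import random
import sqlite3
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def bin_index(value: float, low: float, high: float, bins: int) -> int:
    # A value equal to the upper bound goes into the last bin.
    return min(int((value - low) * bins / (high - low)), bins - 1)


def heatmap_memory(points: Sequence[Tuple[float, float]], x_range: Range,
                   y_range: Range, bins: int) -> List[List[int]]:
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if x0 <= x <= x1 and y0 <= y <= y1:  # skip points outside the view
            grid[bin_index(y, y0, y1, bins)][bin_index(x, x0, x1, bins)] += 1
    return grid


def heatmap_sql(conn: sqlite3.Connection, table: str, x: str, y: str,
                x_range: Range, y_range: Range, bins: int) -> List[List[int]]:
    # The database bins and counts; at most bins * bins rows are transferred.
    (x0, x1), (y0, y1) = x_range, y_range
    query = (
        f"SELECT MIN(CAST(({x} - ?) * ? / (? - ?) AS INTEGER), ?) AS bin_x, "
        f"MIN(CAST(({y} - ?) * ? / (? - ?) AS INTEGER), ?) AS bin_y, COUNT(*) "
        f"FROM {table} WHERE {x} BETWEEN ? AND ? AND {y} BETWEEN ? AND ? "
        f"GROUP BY bin_x, bin_y"
    )
    params = (x0, bins, x1, x0, bins - 1, y0, bins, y1, y0, bins - 1, x0, x1, y0, y1)
    grid = [[0] * bins for _ in range(bins)]
    for i, j, count in conn.execute(query, params):
        grid[j][i] = count
    return grid


rng = random.Random(0)
points = [(rng.uniform(-1, 11), rng.uniform(-1, 11)) for _ in range(1000)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)", points)

local = heatmap_memory(points, (0.0, 10.0), (0.0, 10.0), bins=4)
remote = heatmap_sql(conn, "points", "x", "y", (0.0, 10.0), (0.0, 10.0), bins=4)
print(local == remote)  # True
```

In both versions the bin index is computed with the same arithmetic and truncation, which is why the two grids agree.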

What have we got so far? Orange 3, which is in an early stage of development, features a new architecture that allows the data to be stored either in memory or in a database. In the latter case, selecting a subset of features or filtering the data does not copy the data but only modifies the queries used to access the data when needed. Computation of, for instance, distributions or contingency matrices is performed on the server, so only the minimal amount of data is transferred to the client.

We also already have a small suite of widgets that work with this new architecture. Just to whet your appetite, here is the new box plot widget.

BoxPlot-Orange30.png
published: 02/09/13 21:27 | author: janez

Network Add-on Published in JSS

NetExplorer, a widget for network exploration, has been in Orange for over five years. Several network analysis widgets have been added to Orange since, and we decided to move the entire network functionality to the Orange Network add-on.

We recently published a paper, Interactive Network Exploration with Orange, in the Journal of Statistical Software. We invite you to read this tutorial on network exploration. It is aimed at beginners in the topic and includes detailed explanations with images.

NetAddon.png NetExplorer
published: 03/06/13 20:15 | author: miha

Orange 2.7

Orange 2.7 is out, with a major update to the visual programming environment. Redesigned interface, new widgets, welcome screen with workflow browser. Text annotations and arrow lines in the workspace. Preloaded workflows with annotations. Widget menu and search can now be activated with a key press (enable this option in the Settings). Extended or minimised widget tab. Improved widget browsing. Enjoy!

orange27-cv.png

orange27-recent.png

orange27-tree.png

published: 25/05/13 10:09 | author: blaz

Problems With Orange Website

Our servers crashed on Friday, March 1st, due to technical problems. The Orange website was offline for several hours, and the Mac bundle was inaccessible until today.

We are still checking whether our other services work. If you notice any problems, please ping us.

Stay tuned and fruitful downloading!

published: 04/03/13 11:50 | author: miha

New canvas

Orange Canvas, a visual programming environment for Orange, has been around for a while. Adding more and more features degraded the quality of the code to a point where further development became a daunting task. With the ever increasing number of widgets, the existing widget toolbar is becoming harder and harder to use, but improving it is really hard. For that reason, we decided Orange needs a new Canvas, a rewrite that would keep all of the features of the existing one but introduce the needed structure and modularity to the source code.

The project started about a year ago, and more than 20 thousand lines of code later, we have something to show you. As of yesterday, the new canvas has been merged into the main Orange repository, where it lives alongside the old one. At the moment, it still needs a lot of testing and some features are not completely implemented, but the main functionality, i.e. visual programming with widgets and links, should work.

New canvas

If you are feeling adventurous, you can try it out yourself. Download the latest version from our website and run:

Windows:

C:\Python27\python.exe -m Orange.OrangeCanvas.main

Mac OS X bundle:

/Applications/Orange.app/Contents/MacOS/python -m Orange.OrangeCanvas.main

or, regardless of your operating system,

python -m Orange.OrangeCanvas.main

using the Python interpreter that has Orange installed.

What to expect?

Nothing will explode, but short of that, anything might happen. If you stumble upon issues or have helpful suggestions, please post them on our issue tracker. There are some known problems we are aware of; you do not need to report those :).

published: 14/02/13 14:37 | author: anze

Orange NMF add-on

Nimfa, a Python library for non-negative matrix factorization (NMF), which was part of Orange's Google Summer of Code (GSoC) program back in 2011, has got its own add-on.

published: 06/02/13 13:47 | author: marinkaz | tags: addons matrixfactorization nmf

Writing Orange Add-ons

Orange 2.6 introduced official support for add-ons. You should start by checking the list of available add-ons. We pull those automatically from PyPI, which is our preferred distribution channel. Try to install an add-on by either:

  • writing "pip install <add-on name>" in the terminal or
  • selecting "Options / Add-ons..." in the menu of the Orange Canvas GUI.

Everything should just work. Writing add-ons is as easy as writing your own Orange widgets or Orange scripts. Just follow this tutorial, and you will have your brand-new Orange add-on on PyPI in no time (an hour at most).

Orange Add Ons
published: 29/01/13 09:00 | author: miha

Orange 2.6

A new version of Orange, 2.6, has been uploaded to the Python Package Index. Since the version on the Orange website is always up to date (we post daily builds), this may not affect you. Nevertheless, let us explain what we have been working on over the last year.

The most important improvement to Orange is an implementation of an add-on framework that is much more "standard pythonic". As a consequence, the add-on installation procedure has been simplified for both individual users and system administrators. For developers, the new framework eases the development and distribution of add-ons. This enabled us to take the first steps towards removing the rarely used parts of Orange from the core distribution, which will ultimately result in fewer external dependencies and fewer warnings on module import. Orange 2.6 lacks the modules for network analysis (Orange.network) and prediction reliability assessment (Orange.reliability), but fear not: you can get them back by installing the Orange-Network and Orange-Reliability add-ons.

Apart from that, we have been mostly squashing bugs. It is a fun spare-time activity: you can join us anytime by cloning our repository and sending us a pull request. :)

If our version numbering system confuses you, let us try to explain. For the last (couple of) year(s), our version numbers have been a mess. Orange2.5a4 was uploaded to PyPI almost a year ago, and was followed by a 2.6a2 release that was never available outside our repository/daily builds. From this day forth, our versioning system should be as follows.

  • If you install Orange from PyPI, the version (Orange.version.full_version) will be something like 2.6 or 2.6.1.
  • If you use our daily builds or build Orange yourself from the source available in our repository, your version will look like 2.6.1.dev-8804fbc (the third version number will be larger by one, and the .dev- suffix will show the source-control revision used for the build).
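The rule for development versions can be illustrated with a tiny Python function. dev_version is a hypothetical helper, not the code Orange's build uses, and it assumes the rule carries over unchanged to later releases such as 2.6.1.

```python
def dev_version(release: str, revision: str) -> str:
    """Hypothetical helper illustrating the scheme described above: take the
    latest release, raise its third version number by one and append the
    .dev- suffix with the source-control revision."""
    parts = [int(p) for p in release.split(".")]
    parts += [0] * (3 - len(parts))  # treat "2.6" as "2.6.0"
    parts[2] += 1
    return ".".join(str(p) for p in parts) + ".dev-" + revision


print(dev_version("2.6", "8804fbc"))    # 2.6.1.dev-8804fbc
print(dev_version("2.6.1", "8804fbc"))  # 2.6.2.dev-8804fbc
```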
published: 21/01/13 14:23 | author: anze