Paint Your Data

One of the widgets I very much enjoy when teaching an introductory course in data mining is the Paint Data widget. In the data I paint with this widget I intentionally include some clusters, or intentionally obscure them, or draw them in some strange shape. Then I discuss with students whether these clusters would be identified by k-means clustering, or by hierarchical clustering. We also discuss automatic scoring of the quality of clusters and come up with the idea of a silhouette (ok, already invented, but it helps if you get this idea on your own as well). And then we play with various data sets, clustering techniques and their parameters in Orange.

Like in the following workflow, where I drew three clusters that were indeed recognized by k-means clustering. Notice that silhouette scoring even guessed the number of clusters correctly. I also plotted the clustered data in the Scatterplot to check whether the clusters are indeed where they should be.
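For readers who prefer code to widgets, the same experiment can be sketched in plain Python. This is only an illustration under stated assumptions, not Orange's implementation: the helper names (dist, kmeans, silhouette), the three synthetic blobs standing in for painted data, and the deterministic farthest-point initialisation are all made up for this example.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def silhouette(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != label)
        total += (b - a) / max(a, b)
    return total / len(points)


# Three well-separated synthetic blobs (an assumption standing in for painted data).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
          for cx, cy in blob_centers for _ in range(30)]

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k in sorted(scores):
    print("k=%d  silhouette=%.3f" % (k, scores[k]))
print("best k:", best_k)
```

On well-separated blobs like these the silhouette should peak at the true number of clusters; on oddly shaped painted data it may not, which is exactly the point of the classroom discussion.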


Or like in the workflow below where k-means fails miserably (but some other clustering technique would not).


Paint Data can also be used in a supervised setting, for classification tasks. We can set the intended number of classes and then choose any of them to paint its data. Below I have used it to create data sets for checking the behavior of several classifiers.


There are tons of other workflows where Paint Data can be useful. Give it a try!

published: 20/12/13 20:19 | author: blaz | tags: data visualization

Brief History of Orange, Praise to Donald Michie

Informatica has recently published our paper on the history of Orange. The paper is a post-publication from the Conference on 100 Years of Alan Turing and 20 Years of Slovene AI Society, where Janez Demšar gave a talk on the topic.

The history of Orange goes all the way back to 1997, when the late Donald Michie had the idea that machine learning needs an open toolbox. To spark the development, we co-organized WebLab97 in beautiful Bled, Slovenia. The workshop's name reflected Michie's idea that the tool should be a web application where people can submit data mining code, procedures, testing scripts and data, and share them in a joint web workspace.

Donald Michie, a pioneer of Artificial Intelligence, was always ahead of his time. (Check out a great talk by Ivan Bratko on their friendship and adventures in chess and machine learning.) At WebLab97, Michie was actually very, very far ahead of his time. Despite the presence of IBM's Java team, which could have guided us in the development of the toolbox, the technology was not ripe, and the WebLab initiative was gone as soon as the conference ended. But the idea sparked the interest of Janez and myself, and development of what is now Orange began shortly after.

Our paper gives a brief account of Orange's history and its development since WebLab97. For reasons of brevity it does not mention that we experimented with other GUI platforms before Qt. We first pinned our hopes on Pmw (Python megawidgets), a library that helped us construct the first Orange graphical user interface. The GUI part of Orange was called Orange*First. Its screenshot shows a tab for interactive discretisation, thanks to Noriaki Aoki, who at the time proposed that this kind of visualisation would be useful in medical data analysis:

[Screenshot: Orange*First]

PS Somehow, I have lost the LaTeX file with the WebLab97 program. It should be on some backup tape, somewhere. The following scan of the first page (and a weblab97.pdf), left in some PPT presentation, is all that I can retrieve. The program of the second day is missing, with keynotes from Tom Mitchell and much talk about R, by then already a success story.


published: 09/10/13 06:33 | author: blaz | tags: history michie

JMLR Publishes Article on Orange

The Journal of Machine Learning Research has just published our paper on Orange. In the paper we focus on its Python scripting part. We last reported on Orange scripting at ECML/PKDD 2004. That manuscript was well received (over 270 citations on Google Scholar), but it is now entirely outdated. It was also our only formal publication on Orange scripting. The JMLR paper is now a current description of Orange and will be, for a while :-), Orange’s primary reference.

Here's a reference:

Demšar, J., Curk, T., Erjavec, A., et al. Orange: Data Mining Toolbox in Python. Journal of Machine Learning Research 14(Aug):2349−2353, 2013.

and bibtex entry:

  author  = {Janez Dem\v{s}ar and Toma\v{z} Curk and Ale\v{s} Erjavec and
             \v{C}rt Gorup and Toma\v{z} Ho\v{c}evar and Mitar Milutinovi\v{c} and
             Martin Mo\v{z}ina and Matija Polajnar and Marko Toplak and
             An\v{z}e Stari\v{c} and Miha \v{S}tajdohar and Lan Umek and
             Lan \v{Z}agar and Jure \v{Z}bontar and Marinka \v{Z}itnik and
             Bla\v{z} Zupan},
  title   = {Orange: Data Mining Toolbox in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2013},
  volume  = {14},
  pages   = {2349-2353},
  url     = {}
published: 03/10/13 02:50 | author: blaz | tags: article jmlr scripting toolbox

Orange and AXLE project

Our group at University of Ljubljana is a partner in the EU 7FP project Advanced Analytics for Extremely Large European Databases (AXLE). The project is particularly interesting because of the diverse partners that cover the entire vertical, from studying hardware architectures that would better support extremely large databases (University of Manchester, Barcelona Supercomputing Center) to making the necessary adjustments related to speed and security of databases (2ndQuadrant) to data analytics (our group) to handling and analyzing real data and decision making (Portavita).

As a result of the project, Orange will be better connected with databases. Currently, all data is stored in working memory, while the forthcoming Orange 3.0 will be able to handle data that is stored in a database. We are also working on a parallel computation architecture. Visualization of large data presents a big challenge as well: we cannot transfer large amounts of data from the database to the desktop, and on the other hand it is difficult to provide a rich interactive experience if visualizations are created on the server side. Also, most visualizations are intrinsically unsuitable for large data sets. For instance, the scatter plot represents each data instance with a symbol. Even when each datum is represented by a single pixel, only a few million data points fit on a computer screen. So in the context of big data, we will have to replace scatter plots with heatmaps.
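To make the scatter plot versus heatmap argument concrete, here is a minimal sketch of the binning behind a heatmap. It is an illustration only, not Orange code: the function name bin_points, the grid size and the random points are assumptions. Whatever the number of points, the result has a fixed number of cells, so only the counts would ever need to reach the desktop.

```python
import random


def bin_points(points, x_range, y_range, bins):
    """Count points per cell of a bins x bins grid (the data behind a heatmap)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Map the coordinate to a cell index; the upper edge falls in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        grid[row][col] += 1
    return grid


# 100,000 random points reduce to a 4 x 4 table of counts.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(100000)]
grid = bin_points(points, (0.0, 1.0), (0.0, 1.0), bins=4)
for row in grid:
    print(row)
```

In a database setting the same counts could presumably be computed by a grouped query on the server, so the client would receive 16 numbers instead of 100,000 rows.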

What have we got so far? Orange 3, which is in an early stage of development, features a new architecture that allows the data to be stored either in memory or in a database. In the latter case, selecting a subset of features or filtering the data does not copy the data but only modifies the queries that are used to access the data when needed. Computation of, for instance, distributions or contingency matrices is performed on the server, so only the minimal amount of data is transferred to the client.
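The idea of modifying queries instead of copying data can be sketched with Python's built-in sqlite3 module. The LazyTable class below is hypothetical and is not the Orange 3 API; the table and column names are made up, and column names are pasted into the SQL without any escaping, which is acceptable only in a toy example.

```python
import sqlite3


class LazyTable:
    """Hypothetical lazy view of a database table: operations only edit the query."""

    def __init__(self, conn, table, columns=("*",), filters=(), params=()):
        self.conn = conn
        self.table = table
        self.columns = tuple(columns)
        self.filters = tuple(filters)
        self.params = tuple(params)

    def select(self, *columns):
        # Choosing features returns a new view; no rows are read or copied.
        return LazyTable(self.conn, self.table, columns, self.filters, self.params)

    def where(self, condition, *params):
        # Filtering appends a condition; again, nothing is fetched yet.
        return LazyTable(self.conn, self.table, self.columns,
                         self.filters + (condition,), self.params + params)

    def _where_clause(self):
        return " WHERE " + " AND ".join(self.filters) if self.filters else ""

    def sql(self):
        return "SELECT {} FROM {}{}".format(
            ", ".join(self.columns), self.table, self._where_clause())

    def rows(self):
        # Data is transferred only when it is actually needed.
        return self.conn.execute(self.sql(), self.params).fetchall()

    def distribution(self, column):
        # The counting is done by the database; only the counts come back.
        sql = "SELECT {0}, COUNT(*) FROM {1}{2} GROUP BY {0} ORDER BY {0}".format(
            column, self.table, self._where_clause())
        return dict(self.conn.execute(sql, self.params).fetchall())


# A tiny in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (sepal_length REAL, species TEXT)")
conn.executemany("INSERT INTO iris VALUES (?, ?)", [
    (5.1, "setosa"), (4.9, "setosa"), (7.0, "versicolor"),
    (6.4, "versicolor"), (6.3, "virginica")])

long_sepals = LazyTable(conn, "iris").where("sepal_length > ?", 6.0).select("species")
print(long_sepals.sql())
print(long_sepals.distribution("species"))
```

Each call to select or where returns a new lightweight object that shares the connection, so building up a view costs nothing until rows or a distribution are requested.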

We also already have a small suite of widgets that work with this new architecture. Just to whet your appetite, here is the new box plot widget.

published: 02/09/13 21:27 | author: janez

Network Add-on Published in JSS

NetExplorer, a widget for network exploration, has been part of Orange for over five years. Several network analysis widgets have been added to Orange since, and we decided to move the entire network functionality to an Orange Network add-on.

We recently published a paper, Interactive Network Exploration with Orange, in the Journal of Statistical Software. We invite you to read the tutorial on network exploration. It is aimed at beginners in this topic and includes detailed explanations with images.

published: 03/06/13 20:15 | author: miha

Orange 2.7

Orange 2.7 is out with a major update in the visual programming environment. Redesigned interface, new widgets, welcome screen with workflow browser. Text annotation and arrow lines in workspace. Preloaded workflows with annotations. Widget menu and search can now be activated through key press (open the Settings to make this option available). Extended or minimised widget tab. Improved widget browsing. Enjoy!




published: 25/05/13 10:09 | author: blaz

Problems With Orange Website

Our servers crashed on Friday, March 1st due to technical problems. The Orange website was offline for several hours and the Mac bundle was inaccessible until today.

We are still checking whether our other services work. If you notice any problems, please ping us.

Stay tuned and fruitful downloading!

published: 04/03/13 11:50 | author: miha

New canvas

Orange Canvas, a visual programming environment for Orange, has been around for a while. Integrating more and more features degraded the quality of the code to the point where further development became a daunting task. With an ever increasing number of widgets, the existing widget toolbar is becoming harder and harder to use, and improving it is really hard. For that reason, we decided that Orange needs a new Canvas, a rewrite that would keep all of the features of the existing one but introduce the needed structure and modularity to the source code.

The project started about a year ago, and more than 20 thousand lines of code later, we have something to show you. As of yesterday, the new canvas has been merged into the main Orange repository, where it lives alongside the old one. At the moment it still lacks a lot of testing and some features are not completely implemented, but the main functionality, i.e. visual programming with widgets and links, should work.

[Screenshot: New canvas]

If you are feeling adventurous, you can try it out yourself. Download the latest version from our website and run one of the following.

Windows:

C:\Python27\python.exe -m Orange.OrangeCanvas.main

Mac OS X bundle:

/Applications/ -m Orange.OrangeCanvas.main

or, regardless of your operating system,

python -m Orange.OrangeCanvas.main

with the python that has Orange installed.

What to expect?

Nothing will explode, but short of that, anything might happen. If you stumble upon issues or have helpful suggestions, please post them on our issue tracker. There are some known problems we are aware of; you do not need to report those :).

published: 14/02/13 14:37 | author: anze

Orange NMF add-on

Nimfa, a Python library for non-negative matrix factorization (NMF), which was part of the Orange GSoC program back in 2011, got its own add-on.

published: 06/02/13 13:47 | author: marinkaz | tags: addons matrixfactorization nmf

Writing Orange Add-ons

We officially support add-ons as of Orange 2.6. You should start by checking the list of available add-ons. We pull those automatically from PyPI, which is our preferred distribution channel. Try to install an add-on by either:

  • writing "pip install <add-on name>" in the terminal, or
  • selecting "Options / Add-ons..." in the Orange Canvas menu.

Everything should just work. Writing add-ons is as easy as writing your own Orange Widgets or Orange Scripts. Just follow this tutorial and you will have your brand-new Orange add-on on PyPI in no time (an hour at most).

published: 29/01/13 09:00 | author: miha

Orange 2.6

A new version of Orange, 2.6, has been uploaded to the Python Package Index. Since the version on the Orange website is always up to date (we post daily builds), this may not affect you. Nevertheless, let us explain what we have been working on for the last year.

The most important improvement to Orange is an implementation of an add-on framework that is much more "standard pythonic". As a consequence, the add-on installation procedure has been simplified for both individual users and system administrators. For developers, the new framework eases the development and distribution of add-ons. This enabled us to take the first steps towards the goal of removing the rarely used parts of Orange from the core distribution, which will ultimately result in fewer external dependencies and fewer warnings on module import. Orange 2.6 lacks the modules for network analysis and prediction reliability assessment (Orange.reliability), but fear not: you can get them back by installing the Orange-Network and Orange-Reliability add-ons.

Apart from that, we have been mostly squashing bugs. A fun spare time activity - you can join us anytime by cloning our repository and sending us a pull request. :)

If our version numbering system confuses you, let us try to explain. For the last (couple of) year(s), our version numbers have been a mess. Orange2.5a4 was uploaded to PyPI almost a year ago, and was followed by a 2.6a2 release that was never available outside our repository/daily builds. From this day forth, our versioning system should be as follows.

  • If you install Orange from PyPI, the version (Orange.version.full_version) will be something like 2.6 or 2.6.1.
  • If you use our daily builds or build Orange yourself from the source available in our repository, your version will be (the minor version will be larger by one, and the .dev- suffix will show the source control revision that was used for the build)
published: 21/01/13 14:23 | author: anze

New scripting tutorial

Orange just got a new, completely rewritten scripting tutorial. The tutorial uses the Orange class hierarchy as introduced in version 2.5. It is meant to be a gentle introduction to Orange scripting and includes many examples, from really simple ones to more complex ones. To give you a hint about the latter, here is the tutorial's code for a learner with feature subset selection:

import Orange

class SmallLearner(Orange.classification.PyLearner):
    def __init__(self, base_learner=Orange.classification.bayes.NaiveLearner,
                 name='small', m=5):
        self.name = name
        self.m = m
        self.base_learner = base_learner

    def __call__(self, data, weight=None):
        # score every feature by information gain and keep the m best
        gain = Orange.feature.scoring.InfoGain()
        m = min(self.m, len(data.domain.features))
        best = [f for _, f in sorted((gain(x, data), x)
                                     for x in data.domain.features)[-m:]]
        domain = Orange.data.Domain(best + [data.domain.class_var])

        # train the base learner on the data reduced to the selected features
        model = self.base_learner(Orange.data.Table(domain, data), weight)
        return Orange.classification.PyClassifier(classifier=model,
                                                  name=self.name)

The tutorial was first written for Python 2.3. Since then, Python and Orange have changed a lot. And so have I. Most of the for loops have become one-liners, list and dictionary comprehensions have become a must, and many new and great libraries have emerged. The (boring) tutorial code that used to read

c = [0] * len(data.domain.classVar.values)
for e in data:
    c[int(e.getclass())] += 1
print "Instances: ", len(data), "total",
r = [0.] * len(c)
for i in range(len(c)):
    r[i] = c[i] * 100. / len(data)
for i in range(len(data.domain.classVar.values)):
    print ", %d(%4.1f%s) with class %s" % \
        (c[i], r[i], '%', data.domain.classVar.values[i]),

is now replaced with

from collections import Counter
print Counter(str(d.get_class()) for d in data)

Ok. Pretty print is missing, but that, if not in the same line, could be done in another one.
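As a rough sketch of what that extra line could look like, here is a Python 3 version that works on a plain list of class labels. The labels list is a made-up stand-in for the class values one would read from an Orange data table.

```python
from collections import Counter

# Made-up class labels standing in for the class values of a data table.
labels = ["yes", "yes", "no", "yes"]

counts = Counter(labels)
total = sum(counts.values())

# One extra line turns the counts into the old report format.
summary = ", ".join("%d (%4.1f%%) with class %s" % (n, 100.0 * n / total, c)
                    for c, n in sorted(counts.items()))
print("Instances: %d total, %s" % (total, summary))
```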

For now, the tutorial focuses on data input and output, classification and regression. We plan to add other sections, and you can also give us a hint if there are any you would like to see included.

published: 06/01/13 19:30 | author: blaz | tags: documentation examples tutorial

The easy way to install add-ons

The possibility of extending the functionality of Orange through add-ons has been present for a long time. In fact, we never provided the toolbox for crunching bioinformatics data as an integral part of Orange; it has always been an add-on. The exact mechanism of add-on distribution has changed significantly in the last year to simplify the process for add-on authors and to make it more standards-compliant. Among other things, this enables system administrators to install add-ons system-wide directly from PyPI using easy_install or pip. Unfortunately, there were also negative side effects of this process, notably the temporary breakage of the add-on management dialog within the Orange Canvas.

We are happy to report that this is now being taken care of and you are encouraged to test the functionality.

published: 30/11/12 08:00 | author: matija

Coming soon: Orange's new interface

Orange will soon get an entirely new interface. The GUI will feature a new canvas, new icons and a new presentation of data flow. Orange will be upgraded with on-line help for widgets and tutorials. The prototype is now in testing and should replace the current version of Orange in early 2013.

published: 27/11/12 09:01 | author: blaz

Short history of Orange

A few weeks back we celebrated 20 years of the Slovene Artificial Intelligence Society. I very much enjoyed Ivan Bratko's talk on AI history and his account of events triggered by the late Donald Michie. Many interesting talks followed, including highlights by Stephen Muggleton and Claude Sammut.

The last talk of the event was on Orange. Janez talked about its birth, history and future prospects. You can see his presentation on videolectures and check out the paper with the lecture notes.

published: 23/10/12 08:58 | author: blaz | tags: future history