Orange Blog

By: BLAZ, Aug 11, 2017

It's Sailing Time (Again)

Every fall I teach an Introduction to Data Mining course. While the course is really about statistical learning and its applications, I also venture into classification trees, for several reasons. First, I can introduce information gain and with it feature scoring and ranking. Second, classification trees are one of the first machine learning approaches co-invented by engineers (Ross Quinlan) and statisticians (Leo Breiman, Jerome Friedman, Charles J. Stone, Richard A.

Categories: classification tree

By: BLAZ, Dec 22, 2016

The Beauty of Random Forest

It is the time of the year when we adore Christmas trees. But these are not the only trees we on the Orange team think about. In fact, thanks to the almost lifelong professional deformation of being a data scientist, when I think about trees I often think about classification and regression trees. And they can be beautiful as well, not only for their elegance in explaining hidden patterns, but also aesthetically, when rendered in Orange.

By: AJDA, Jul 29, 2016

Pythagorean Trees and Forests

Classification Trees are great, but what about when they outgrow even your 27” screen? Can we make the tree fit snugly onto the screen and still tell the whole story? Well, yes we can. The Pythagorean Tree widget shows the same information as Classification Tree, but far more concisely. Pythagorean Trees represent nodes with squares whose size is proportional to the number of covered training instances. Once the data is split into two subsets, the corresponding new squares form a right triangle on top of the parent square.
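
The geometry described above can be sketched in a few lines of Python. This is only an illustration, not the widget's code: the function name child_sides is hypothetical, and it assumes that "size" means the area of a square. Under that assumption the two child areas add up to the parent area, which is exactly the Pythagorean condition for the three sides to form a right triangle.

```python
import math


def child_sides(parent_side, n_left, n_right):
    """Return the side lengths of the two child squares.

    Assumption: the AREA of each square is proportional to the number
    of training instances the node covers, so a child covering a
    fraction f of the parent's instances has side parent_side * sqrt(f).
    """
    total = n_left + n_right
    left = parent_side * math.sqrt(n_left / total)
    right = parent_side * math.sqrt(n_right / total)
    return left, right


# A parent node covering 150 instances, split into 90 and 60.
left, right = child_sides(10.0, 90, 60)

# The child sides are the legs of a right triangle whose hypotenuse
# is the top edge of the parent square.
assert math.isclose(left ** 2 + right ** 2, 10.0 ** 2)
```

A more uneven split gives a more lopsided triangle, which is why the shape of the drawing reflects how balanced the tree is.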

By: AJDA, Aug 14, 2015

Classifying instances with Orange in Python

Last week we showed you how to create your own data table in the Python shell. Now we’re going to take you a step further and show you how to easily classify data with Orange. First we’re going to create a new data table with 10 fruits as our instances.

    import Orange
    from Orange.data import *

    color = DiscreteVariable("color", values=["orange", "green", "yellow"])
    calories = ContinuousVariable("calories")
    fiber = ContinuousVariable("fiber")
    fruit = DiscreteVariable("fruit", values=["orange", "apple", "peach"])
    domain = Domain([color, calories, fiber], class_vars=fruit)
    data = Table(domain, [
        ["green", 4, 1.

By: BIOLAB, Feb 5, 2012

Random decisions behind your back

When Orange builds a decision tree, candidate attributes are evaluated and the best candidate is chosen. But what if two or more share first place? Most machine learning systems don’t care and always take the first, which is unfair and, besides, has strange effects: the induced model and, consequently, its accuracy depend on the order of attributes, which they shouldn’t. This is not an isolated problem. Another instance is when a classifier has to choose between two equally probable classes and there is no additional information (such as classification costs) to help make the prediction.
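
One way to make the choice independent of attribute order is sketched below. It is an assumption for illustration, not necessarily what Orange does: the function pick_best and its seed parameter are hypothetical. Tied candidates are sorted by name first, then one of them is drawn with a seeded random generator, so the same data always yields the same choice regardless of how the attributes are ordered.

```python
import random


def pick_best(scores, seed=0):
    """Pick the top-scoring attribute, breaking ties at random.

    `scores` maps attribute name -> score (higher is better).
    Ties are detected by exact equality, which is enough for this
    sketch; real scores may need a tolerance.
    """
    best = max(scores.values())
    # Sort the tied names so the outcome does not depend on the
    # order in which attributes were listed.
    tied = sorted(name for name, s in scores.items() if s == best)
    # A seeded generator keeps the "random" choice reproducible.
    return random.Random(seed).choice(tied)


# The same scores, listed in two different attribute orders.
scores_a = {"color": 0.42, "calories": 0.42, "fiber": 0.10}
scores_b = {"fiber": 0.10, "calories": 0.42, "color": 0.42}

assert pick_best(scores_a, seed=1) == pick_best(scores_b, seed=1)
```

Always taking the first tied candidate would instead return "color" for scores_a and "calories" for scores_b, which is the order dependence described above.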

Categories: tree

By: BIOLAB, Aug 24, 2011

Faster classification and regression trees

SimpleTreeLearner is an implementation of classification and regression trees that sacrifices flexibility for speed. A benchmark on 42 different datasets reveals that SimpleTreeLearner is 11 times faster than the original TreeLearner. The motivation behind developing a new tree induction algorithm from scratch was to speed up the construction of random forests, but you can also use it as a standalone learner. SimpleTreeLearner uses gain ratio for classification and MSE for regression and can handle unknown values.
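
For readers unfamiliar with gain ratio, here is a minimal sketch of the textbook definition for a discrete attribute: information gain divided by the entropy of the split itself. It is not SimpleTreeLearner's code, the function names are made up for this example, and it ignores unknown values.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute with respect to class labels."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# An attribute that separates the two classes perfectly.
color = ["orange", "orange", "green", "green"]
fruit = ["orange", "orange", "apple", "apple"]
assert math.isclose(gain_ratio(color, fruit), 1.0)
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.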