Orange Forum • View topic - R or GSL?

R or GSL?

Postby Janez » Fri Apr 01, 2005 12:17

Some time ago somebody asked why Orange uses GSL and not R. I've looked into this and even reprogrammed linear regression to use R instead of GSL, but now we have to decide.

Orange will certainly rely on only one of them, since it already depends on too many other libraries (Python, Qt, PyQt, PyQwt, Numeric...).

I'm talking about the C part: Orange would be calling R.dll, the C (OK, Fortran) part of R. If something exists only as an R script, it doesn't interest me, since I don't want Orange's C code, which is supposed to be fast, calling an R script, which is interpreted and generally slow.

Also bear in mind that this is unrelated to using R on Orange data: it is trivial to provide an interface between Orange and RPy, and we can do this regardless of whether Orange internally uses R or GSL.

So, does anybody have experience with using the two libraries from C? They both have the basic stuff, such as QR decomposition, but which one will be more suitable for other machine learning needs? Which one is more convenient? From reprogramming linear regression, I got the impression that GSL does more, while R has the basic stuff in Fortran and handles the rest in its scripting language. Since I don't want to call R scripts, I had to program things like unpivoting and computing the standard errors myself in Orange (GSL already provides them). This is probably true in general, right?
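To illustrate what "computing the standard errors myself" involves, here is a minimal pure-Python sketch. It is not Orange's C code, nor GSL's or R's routines: it fits least squares from a thin QR decomposition (modified Gram-Schmidt, chosen here for brevity; production libraries typically use Householder reflections) and derives standard errors from the identity (XᵀX)⁻¹ = R⁻¹R⁻ᵀ. The `unpivot` helper is hypothetical and assumes a 0-based pivot array in which `pivot[i]` is the original column of the i-th computed coefficient.

```python
import math

def qr_gram_schmidt(X):
    """Thin QR of an n x p matrix X (list of rows) by modified Gram-Schmidt.
    Returns Q as a list of p orthonormal columns and R as a p x p upper-triangular matrix."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # work column-wise
    Q, R = [], [[0.0] * p for _ in range(p)]
    for j in range(p):
        v = cols[j][:]
        for k in range(j):
            # project the *updated* v on q_k (this is what makes it "modified" G-S)
            R[k][j] = sum(Q[k][i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        if R[j][j] == 0.0:
            # exactly dependent column; a real implementation would pivot here
            raise ValueError("X is rank-deficient")
        Q.append([x / R[j][j] for x in v])
    return Q, R

def back_substitute(R, c):
    """Solve R b = c for upper-triangular R."""
    p = len(R)
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (c[i] - sum(R[i][k] * b[k] for k in range(i + 1, p))) / R[i][i]
    return b

def ols_qr(X, y):
    """Return (coefficients, standard errors) of the least-squares fit of y on X."""
    n, p = len(X), len(X[0])
    Q, R = qr_gram_schmidt(X)
    qty = [sum(Q[j][i] * y[i] for i in range(n)) for j in range(p)]
    beta = back_substitute(R, qty)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance estimate
    # columns of R^-1, obtained by solving R x = e_k for each unit vector e_k
    rinv_cols = [back_substitute(R, [1.0 if i == k else 0.0 for i in range(p)])
                 for k in range(p)]
    # diag((X'X)^-1) = diag(R^-1 R^-T) = sum of squares along each row of R^-1
    se = [math.sqrt(sigma2 * sum(rinv_cols[k][j] ** 2 for k in range(p)))
          for j in range(p)]
    return beta, se

def unpivot(values, pivot):
    """Hypothetical helper: put values[i] back at original column pivot[i]."""
    out = [0.0] * len(values)
    for i, col in enumerate(pivot):
        out[col] = values[i]
    return out

# Intercept column plus x = 0, 1, 2; y = 1, 2, 4
beta, se = ols_qr([[1, 0], [1, 1], [1, 2]], [1, 2, 4])
# beta ≈ [0.8333, 1.5], se ≈ [0.3727, 0.2887]
```

A library that only returns the triangular factor and pivoted coefficients leaves the last two steps (the `R⁻¹` diagonal and the reordering) to the caller, which is the extra work described above.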

Then there are the distribution issues. If someone builds Orange on Linux, it's quite likely that he already has R and less likely that he has GSL.

On the other hand, when we build a binary, we can statically link Orange with GSL, and any decent linker will exclude all the unneeded functions. So the user doesn't need to know about GSL, and the distribution size barely increases.

Not so with R: I don't think we can link it statically (at least not on Windows), so either the user would have to install R or we would need to include the entire R.dll (2 MB) in the distribution. There is a way around this: Orange can load R.dll if it exists; if it does not, some functionality will be unavailable. The user then has the option to download R if he needs it.
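The fallback described above (load R.dll only if it exists, otherwise disable the features that need it) is ordinary dynamic loading. From C, Orange would do this with `LoadLibrary` on Windows or `dlopen` elsewhere; for brevity, this sketch shows the same pattern in Python using the standard `ctypes` module. The candidate library name and the helper names `load_optional_library` and `require_r` are assumptions for illustration, not part of Orange.

```python
import ctypes
import ctypes.util

def load_optional_library(names):
    """Return the first shared library in `names` that can be loaded, else None."""
    for name in names:
        # find_library turns a bare name into something CDLL can open
        # (e.g. libR.so on Linux); if it finds nothing, try the bare name as-is
        path = ctypes.util.find_library(name) or name
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue  # not installed or not loadable: try the next candidate
    return None

# Hypothetical module-level flag that R-dependent features check before running.
R_LIB = load_optional_library(["R"])
HAVE_R = R_LIB is not None

def require_r(lib, feature):
    """Return `lib`, or raise an error naming the feature that needs R."""
    if lib is None:
        raise RuntimeError(feature + " is unavailable because R is not installed; "
                           "install R to enable it.")
    return lib
```

Everything that does not need R keeps working when `HAVE_R` is false; only the features that call `require_r` report that R must be downloaded.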

Based on all this, I lean towards GSL. Any other opinions? Again, think of what R provides in R.dll, not in the R scripts. If you want to call R scripts from Python, you will be able to do so anyway.

Postby jgilbert » Tue May 24, 2005 20:40

I say use R.

Distribution under UNIX is much simpler than dealing with GSL. Under Windows, 2 MB is not worth worrying about; it's peanuts.

I have, however, never used R from C. I've used R itself, Rpy and RSPython, and found all of them to be quite sophisticated and useful tools.

One man's opinion.

Postby Janez » Tue May 24, 2005 22:30

Thanks. In the meantime we have reached the same conclusion and have already replaced some GSL stuff with R.

R-scripting widgets available??

Postby Lareo » Sat Nov 19, 2005 11:05

I am completely new to Orange, so sorry if my question is obsolete!

I use R professionally for many data mining tasks, and I would very much like to embed these scripts into widgets. There are currently two frameworks that do this: 1. Statistiklabor, a German statistics lab with visual programming capabilities, and 2. Scicraft. Unfortunately, the latter is not mature yet and the former appears too weak.

How about Orange? Does anyone have a widget for R scripting?

Thanks

Lareo

Postby jgilbert » Tue Nov 22, 2005 0:20

I'm not quite sure what you mean. Statistiklabor appears to be an R GUI, and Sciviews provides an R GUI (they do other things as well). If you're looking for an R GUI, there are a bunch of them listed at http://www.sciviews.org/_rgui/.

As for using R scripts in Python (and Orange), you can do that through RSPython (http://www.omegahat.org/RSPython/). I've used it a number of times and it's very straightforward.

There's also rpy (http://rpy.sourceforge.net/). I've never used it, but I've heard good things. It's in Debian, which is always nice.

Once you can call R from Python, it should be very easy to create widgets that call R scripts. Hmm. In fact, if we removed the Orange widgets, put in some R widgets and added an R shell, we would have a new R GUI. That could be pretty cool.


Drop me a line if you want to discuss it further, I think this could have real potential.

R-integration

Postby Lareo » Wed Nov 23, 2005 11:49

The latter is exactly what I mean: "creating an R widget".

The many R GUIs are all familiar to me. If you ever have the chance to look at the product Insightful Miner, you will see what I mean. Insightful Miner integrates a large set of data mining functionality together with "nodes" or "widgets" for data preprocessing, such as outlier handling and merging and removing columns in large data tables. It also integrates the full functionality of S-PLUS (it is distributed by Insightful).

So, in conclusion: full integration of R would turn Orange into a framework similar to Insightful Miner.

Lareo

Postby jgilbert » Thu Nov 24, 2005 5:49

OK, I see what Insightful Miner does. Seems very doable.

Naturally, a number of potential features suggest themselves immediately. In Orange there is no way to create a widget on the fly; this would be useful if the target audience is well versed in R.

Orange currently uses a tab-and-button interface for selecting widgets. A hierarchical list would probably be better if you intend to offer a wider range of tools.

Orange pushes all of the data through each step before moving on to the next step. For large datasets, processing the data in blocks, as Insightful Miner does, would work much better.
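The contrast between whole-table and block-wise processing can be sketched with Python generators: instead of each step consuming the entire table before the next one starts, every step handles one block of rows and passes it on, so memory use is bounded by the block size. This illustrates the general idea only; it is not how Orange or Insightful Miner are implemented, and the step functions below are hypothetical.

```python
from itertools import islice

def read_in_blocks(rows, block_size):
    """Yield successive lists of at most `block_size` rows from any iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block

def drop_missing(blocks):
    """Hypothetical cleaning step: drop rows containing a missing value (None)."""
    for block in blocks:
        yield [row for row in block if None not in row]

def scale_column(blocks, col, factor):
    """Hypothetical transform step: multiply one column by `factor`."""
    for block in blocks:
        yield [row[:col] + (row[col] * factor,) + row[col + 1:] for row in block]

# Chained steps: each block flows through the whole pipeline before the next is read.
rows = [(1, 2), (None, 3), (4, 5)]
for block in scale_column(drop_missing(read_in_blocks(rows, 2)), 0, 2):
    print(block)
```

Steps that need whole-table statistics, such as normalizing by a column mean, would need a separate pass over the blocks before the streaming pass.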

So a number of changes to Orange would be required to move in the direction of Insightful Miner. I don't see any insurmountable obstacles, merely features to be taken on one at a time.

I do not think that Orange will move in this direction. Currently, R is used as a GSL replacement, a stopgap for the lack of routines in Python extension modules. The target audience of what you suggest is quite different from the one I see Orange targeting.

A very interesting project. Do you plan to pursue it?

R-extension and further plans

Postby Lareo » Thu Nov 24, 2005 11:51

I am willing to extend Orange if possible, and I could assign developers to it. This only makes sense if there are no fundamental obstacles. What I do not know is how active the project is at the moment. I would very much like to get in touch with the people behind the scenes.

There seems to be a lot of work remaining. For example, the nodes need to store their "state" in the settings list. Currently the "Select Attributes" node and others don't do that when the whole schema is saved (or do they?).
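The requirement above is that every node persists its user-facing state when the schema is saved and restores it when the schema is loaded. A minimal sketch of one way to do that, assuming a hypothetical base class in which each widget lists the attribute names to persist; the names `Widget`, `settings_list`, `SelectAttributes`, `save_schema` and the JSON format are illustrative, not Orange's actual API.

```python
import json

class Widget:
    """Hypothetical base class: attributes named in settings_list are saved with the schema."""
    settings_list = ()

    def get_settings(self):
        return {name: getattr(self, name) for name in self.settings_list}

    def set_settings(self, settings):
        for name in self.settings_list:
            if name in settings:  # tolerate schemas saved by older versions
                setattr(self, name, settings[name])

class SelectAttributes(Widget):
    """Hypothetical node whose selection must survive saving the schema."""
    settings_list = ("selected", "keep_class")

    def __init__(self):
        self.selected = []
        self.keep_class = True

def save_schema(widgets):
    """Serialize the settings of every widget, keyed by its id in the schema."""
    return json.dumps({key: w.get_settings() for key, w in widgets.items()})

def load_schema(widgets, text):
    """Restore previously saved settings into freshly created widgets."""
    data = json.loads(text)
    for key, w in widgets.items():
        w.set_settings(data.get(key, {}))
```

With this design, a node only has to declare which attributes make up its state; saving and loading the schema need no per-node code.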

We have great demand for accessing different data sources, data cleansing, table merging, etc. These are the modules I would like to build.

Lareo

Postby Fabrice C. » Tue Jan 10, 2006 9:16

May I suggest you have a look at Scicraft (www.scicraft.org)? It is getting more and more mature, although it is oriented more toward data mining and microarrays than toward machine learning (which is where Orange shines, in my opinion, along with the ability to choose whether to use a GUI or the library). Scicraft is already interfaced with R (through Rpy) and Octave. It is also a Python app (the GUI is also PyQt, by the way).

As far as I am concerned, I am looking forward to the development of both Scicraft and Orange, as I see them as very useful tools with different strengths and weaknesses.

Fabrice Capiez

Re: R-extension and further plans

Postby orangelinux » Sat Jun 02, 2007 15:30

I'm using Orange on Linux, but many functions don't work... Lareo is right! It should be updated, because it could be very useful.
I'm working on a dataset with 209 attributes and 100,000 tuples, and on Windows it drives me crazy... It would be great to update Orange on Linux.
This is only a suggestion, because I think this is a good program and it could work very well with very big databases too.

