Orange Forum • View topic - evaluating Orange for data analysis and data modeling

evaluating Orange for data analysis and data modeling

A place to ask questions about methods in Orange and how they are used and other general support.

Postby srio » Fri Jun 28, 2013 10:57

Hello,

First of all, many thanks for creating and supporting this lovely tool.

I have been playing with Orange to evaluate it as an infrastructure/framework for data analysis and modeling. I am not so interested in data mining, but I find the canvas beautiful for designing, visualizing and storing sequences of data analysis steps, e.g., image analysis, data manipulation, simulations of sequential systems (as in optics), etc. We are a synchrotron facility that produces a lot of scientific data. I am looking for a Python-based design tool in which widgets and windows can be implemented EASILY and which can do sophisticated data visualization. I think Orange is a very good candidate for it, because:

1) it presents a very nice canvas interface
2) it encapsulates input/output methods and analysis algorithms in widgets (or scripts)
3) it passes data between widgets in an intelligent way
4) it offers high flexibility for implementing ad hoc methods using the Python (and R) script widgets
5) it saves/restores the whole workspace in a scheme file
6) it uses Python technology, fully compatible with the programming tools we develop.


I am not an expert in widget programming with Qt, and I have not looked into how easy it is to implement a new widget, but it looks very reasonable.

I would like to ask how easy or complicated it would be to:

- change, replace and customize the full gallery of widgets with a new one for local use (e.g., read some kinds of binary files, work with HDF5 files, implement some analysis methods via Python code, visualize and overplot several line plots, display surfaces and contour curves, etc.)

- create some loop or batch processing, e.g., widget1 loads a file, widget2 does some manipulation, widget3 saves the results to a file. I want to repeat this over a list of files (in a text list or in a directory). I think this is not possible with Orange now. It requires a sort of "file injector" with global "play" and "pause" buttons. Do you have any ideas for doing that?

- create a widget that encapsulates a group of widgets.

Other things that could be useful:
- add options in the "right-click" menu:
- execute the widget/script (without opening the window)
- copy widget (and paste on canvas)
- print schema
- create an on-line widget library of user contributions

If we decide to use Orange as a tool for our data processing and modeling, we will invest time and effort in Orange and we would like to meet you personally. Would it be possible to visit you?


Many thanks,


Manuel Sanchez del Rio
ESRF, Grenoble, France

Re: evaluating Orange for data analysis and data modeling

Postby Blaz » Mon Jul 01, 2013 22:09

Dear Manuel, thanks for your suggestions and questions. Let me try to answer them, one by one:

change, replace and customize the full gallery of widgets


This should be easy. Adding new widgets is what the Orange environment was designed for. Please see the widget development documentation on the Orange documentation web page. We made quite an effort to enable fast prototyping of widgets. For instance, state persistence of the widgets is provided automatically for the registered controls. That is, widget GUI controls automatically remember their settings. Settings can also be associated with input data (context-based settings). We would most often implement all the computational procedures in a separate module, and then design the widgets as a GUI wrapper around the computational part, keeping the widgets as lightweight as possible.
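
To illustrate the separation described above (computation in its own module, the widget as a thin wrapper), here is a minimal sketch in plain Python. It does not use Orange's widget API: `normalize`, `NormalizeWidget`, `set_data` and the `send` callback are all hypothetical names chosen for the example, and a real widget would subclass the toolkit's base class instead.

```python
# Hypothetical sketch: none of these names come from Orange's API.

# --- "computational module": pure functions, no GUI dependencies ---
def normalize(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# --- "widget": a thin wrapper that holds settings and forwards data ---
class NormalizeWidget:
    """Stand-in for a GUI widget; a real one would subclass a toolkit class."""

    def __init__(self, send):
        self.enabled = True   # a setting that a real GUI control would toggle
        self._send = send     # callback used to pass results downstream

    def set_data(self, values):
        """Called when upstream data arrives; delegates the real work."""
        result = normalize(values) if self.enabled else list(values)
        self._send(result)


received = []
widget = NormalizeWidget(send=received.append)
widget.set_data([2, 4, 6])
print(received)
```

Because `normalize` knows nothing about the GUI, it can be tested and reused from ordinary scripts, which is the point of keeping the widget lightweight.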

create a widget that encapsulates a group of widgets


Meta widgets would not be so difficult to implement, but of course this would require changes to the current visual programming environment and its GUI. About every year or so we discuss such an implementation, and conclude that it is not required for data mining. But if there were a problem domain where many (many!) widgets are needed to compose a processing pipeline, then such functionality would be desirable.

create some loop or batch processing


This could already be implemented easily in Orange's current visual programming environment. Say, with a widget that reads files from some directory and sends one data set after the other to the output channel. Downstream widgets then process the data sets (one at a time, as they arrive). At the end of the workflow there's a widget that collects and saves the results.
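
The load / process / collect chain described above can be sketched in plain Python with generators. This is only an analogy under stated assumptions: generators are pull-based, whereas canvas widgets push signals downstream, and the function names (`load_files`, `process`, `collect`) and the word-count "processing" step are made up for the example.

```python
import os
import tempfile


def load_files(directory):
    """'Load' stage: yield (name, text) for each file, one at a time."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path) as handle:
                yield name, handle.read()


def process(items):
    """'Processing' stage: here, simply count the words in each file."""
    for name, text in items:
        yield name, len(text.split())


def collect(items):
    """'Collect' stage: gather all results at the end of the workflow."""
    return dict(items)


# Demonstration on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "one two three"), ("b.txt", "four five")]:
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    results = collect(process(load_files(tmp)))

print(results)
```

Each file travels through the whole chain before the next one is read, so only one data set is held in memory at a time.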

Alternatively, one could extend Orange's current visual programming paradigm (add loops, conditions, etc.). This would require a lot of work. Years back we decided that, for data mining or bioinformatics, this is not needed. Procedures such as cross-validation are easier to embed into a special widget than to construct an environment that would enable users to develop them from scratch. Also, the interface would get too complicated.

other things that could be useful


Thanks for the list. Which options did you have in mind to associate with right click?

For execution of widgets without opening a window: this is the second request for this functionality in a couple of weeks. It is not a simple one. A number of widgets in Orange require interaction. For others, say classifiers, cross-validation, data input, etc., these are all one-liners in Python, and we have so far refrained from automating script generation from a visual program. But I understand that if the widgets are many (and not interactive), such functionality would be useful.

An online widget library and some sort of collaborative/social environment would be great. Not only for widgets but for workflows as well. Something like myexperiment.org. We have thought about this, but would need help from other groups to develop and maintain such a platform.

If we decide to use Orange as a tool for our data processing and modeling, we will invest time and effort in Orange and we would like to meet you personally. Would it be possible to visit you?


Sure. For the visit, please contact us directly (blaz dot zupan at fri dot uni-lj dot si). Best wishes,
Blaz.

Re: evaluating Orange for data analysis and data modeling

Postby srio » Tue Jul 02, 2013 16:12

This could already be implemented easily in Orange's current visual programming environment. Say, with a widget that reads files from some directory and sends one data set after the other to the output channel. Downstream widgets then process the data sets (one at a time, as they arrive). At the end of the workflow there's a widget that collects and saves the results.


Many thanks Blaz for your answers. I understand...

I come back to loops. What you propose implies that I must click "execute" at each iteration in the script widget that sends the file name through the output channel to the other widgets. Can this be avoided? Would it be possible to define a loop that changes out_data at each iteration, so that the other linked widgets react?
Actually, I can write a script with a loop that does read+operate+write, but then we lose the power and beauty of the chained widgets.

Re: evaluating Orange for data analysis and data modeling

Postby Blaz » Tue Jul 02, 2013 21:40

What you propose implies that I must click "execute" at each iteration in the script widget that sends the file name through the output channel to the other widgets. Can this be avoided? Would it be possible to define a loop that changes out_data at each iteration, so that the other linked widgets react?


No, you would not need to click. As long as the "load" widget sends tokens to the output channel, these will be processed by the downstream workflow. We had a widget like that for microarray processing almost a decade ago; it read the data files from one directory and passed them one after the other to the processing widgets. Along with the data, the widget also sent a unique ID, and passing a None token signified that the data stream had reached its end. Passing an ID with a data token to a communication channel is still supported (Aleš tells me).
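
The token scheme described above (each data set sent with a unique ID, then a None token to mark the end of the stream) can be sketched in plain Python as follows. This is not Orange's signal code: the `deque` merely stands in for a communication channel, and `load_widget` and `downstream_widget` are hypothetical names.

```python
from collections import deque

channel = deque()  # stand-in for a communication channel between widgets


def load_widget(datasets):
    """Send each data set with a unique ID, then None to mark the end."""
    for token_id, data in enumerate(datasets):
        channel.append((token_id, data))
    channel.append(None)


def downstream_widget():
    """Process tokens as they arrive until the end-of-stream marker."""
    results = {}
    while True:
        token = channel.popleft()
        if token is None:
            break
        token_id, data = token
        results[token_id] = sum(data)  # placeholder for real processing
    return results


load_widget([[1, 2], [3, 4], [5]])
totals = downstream_widget()
print(totals)
```

The ID lets a collecting widget match each result to its source data set, and the None marker tells it when it is safe to save.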

Re: evaluating Orange for data analysis and data modeling

Postby srio » Wed Jul 03, 2013 11:03

No, you would not need to click. As long as the "load" widget would send out tokens to the output channel, these would be processed by the downstream workflow.


Thanks. I understand this can be programmed in widgets. Can it also be implemented in Python scripts? I want to connect two Python scripts, script1 for sending file names and script2 for processing a single file. Is there any way for script1 to trigger the execution of script2?

Re: evaluating Orange for data analysis and data modeling

Postby Blaz » Wed Jul 03, 2013 14:42

Sure, you can do "everything" in Python scripts. But I am not sure I understood your question correctly. The way I understood it, you are asking whether one can read the files to be processed and send them through a workflow defined by functions in some other modules. But I guess you are not referring to explicit, code-based processing, but to the triggering of events and event-driven processing. Something like a Petri net? Orange, though, does not implement event processing that you could use in Python scripts.

Re: evaluating Orange for data analysis and data modeling

Postby srio » Thu Jul 04, 2013 10:38

Sure, you can do "everything" in Python scripts. But I am not sure if I understood your question correctly


My problem is this. Script1 iteratively loads some images:

for i in range(100):
    img = getmyimage(i)
    out_data = img

Script2 displays the image (e.g., using matplotlib):

import matplotlib.pyplot as plt
p = plt.imshow(in_data)
fig = plt.gcf()
plt.clim() # clamp the color limits
plt.title("mytitle")
plt.pause(0.5)

The question is how to refresh the plot at each iteration (at present it only displays the last image). Is it possible?

Many many thanks!

Re: evaluating Orange for data analysis and data modeling

Postby Ales » Thu Jul 04, 2013 13:07

srio wrote: My problem is: script1 loads iteratively some images:

do i in range(100):
    img = getmyimage(i)
    out_data = img

Script2 displays the image (e.g., using matplotlib)

import matplotlib.pyplot as plt
p = plt.imshow(in_data)
fig = plt.gcf()
plt.clim() # clamp the color limits
plt.title("mytitle")
plt.pause(0.5)

I take it that you mean running this in a Python Script widget within Orange Canvas.

srio wrote: The question is how to refresh the plot at each iteration (it actually displays the last image). Is it possible?

Using the current Python Script widget this is not possible.

Code:
for i in range(100):
    img = getmyimage(i)
    out_data = img

This does not have the intended effect: the 'out_data' variable is only retrieved and sent downstream after the script finishes executing, so only the last image would be shown in Script2.
Something like this would be possible (if the send function were exposed to the script's local namespace):
Code:
for i in range(100):
    img = getmyimage(i)
    send("out_data", img, i)

It would also require changes to the Python Script widget's input handling code. Actually, most widgets that have more than one input channel would need to explicitly track the incoming signal ids in order to work in batch mode. This is bad design, as it pushes the execution semantics into the widget itself. Also note that, in the above code, the send function would return immediately and simply fill the queue with 100 images, so this would not be appropriate for large workloads.
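
To make the id-tracking burden concrete, here is a rough sketch of what a two-input widget would have to do on its own in batch mode. It is plain Python, not Orange code: the class `PairingWidget`, its `receive` method and the channel names "image" and "mask" are invented for the illustration.

```python
class PairingWidget:
    """Hypothetical two-input node that pairs inputs carrying the same id."""

    def __init__(self):
        # Values that arrived on one channel and still wait for a partner.
        self._pending = {"image": {}, "mask": {}}
        self.outputs = []

    def receive(self, channel, value, signal_id):
        """Store the value, or emit a pair if its partner already arrived."""
        other = "mask" if channel == "image" else "image"
        if signal_id in self._pending[other]:
            partner = self._pending[other].pop(signal_id)
            pair = (value, partner) if channel == "image" else (partner, value)
            self.outputs.append((signal_id, pair))
        else:
            self._pending[channel][signal_id] = value


node = PairingWidget()
node.receive("image", "img0", 0)
node.receive("image", "img1", 1)
node.receive("mask", "mask1", 1)   # completes the pair with id 1 first
node.receive("mask", "mask0", 0)
print(node.outputs)
```

Every multi-input widget would need bookkeeping of this kind, and the pending buffers grow without bound if one channel runs far ahead of the other.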

Instead, I think it would be better to design and add proper support for batch processing to the signal propagation code (this code is pretty self-contained, so it should not have much effect on the rest of the code).

