Orange Forum • View topic - Pre-processing files of 3 GB ?

Pre-processing files of 3 GB ?

A place to ask questions about methods in Orange and how they are used and other general support.


Postby Joerg » Tue Feb 07, 2006 18:27


I have a file with 10,000 instances and 20,000 features. Is it possible to reduce the number of features with Orange?
If so, I would appreciate a few lines of code as an answer.

Best, Joerg

Postby Janez » Wed Feb 08, 2006 18:10

Orange will choke on a file that large, because it tries to load everything into physical memory. It was not meant for data sets of that size.

If you only want to select a small subset of a few tens of features, you can try something like this:

Code:
import orange

# An example generator reads examples from the file on the fly,
# rather than loading the whole file at once
d = orange.TabDelimExampleGenerator("iris")

# Keep only the listed attributes
pp = orange.Preprocessor_select()
attrs = ["petal length", "sepal length"]
pp.attributes = [d.domain[x] for x in attrs]

t = pp(d)

Here, d is an example generator, which reads examples from the file on the fly (this one is written for tab-delimited files; if you have C4.5 files, use C45ExampleGenerator, etc.).

But t is built entirely in memory. To avoid that, you would have to go into the C++ code.
