Orange Forum • View topic - from numpy arrays to orange data

Post by Sheila » Thu Jul 10, 2014 13:19

I have data and associated class in numpy array.

d = numpy.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [5, 5, 2, 1, 1] ]) #this is data
c = numpy.array([0, 0, 1]) #this is class tag

How do I change it to Orange data format and feed it in orange classifiers?
Last edited by Sheila on Thu Jul 10, 2014 15:03, edited 1 time in total.

Re: from numpy arrays to orange data

Post by menosys » Thu Jul 10, 2014 14:35

If you are working with non-sparse data, the example below should do the job:

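Code:
import numpy
import Orange

# one continuous variable per column of the array
d = Orange.data.Domain([Orange.feature.Continuous('a%i' % x) for x in range(5)])
a = numpy.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
t = Orange.data.Table(d, a)  # build the table from the domain and the array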

Taken from http://orange.biolab.si/docs/latest/reference/rst/Orange.data.table/#example-table-prog1; see that page for more details.

Re: from numpy arrays to orange data

Post by Ales » Thu Jul 10, 2014 14:54

Sheila wrote: I have data and associated class in numpy array.

data = numpy.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [5, 5, 2, 1, 1] ])
class_ = numpy.array([0, 0, 1])  # "class" is a reserved word in Python, hence class_

How do I change it to Orange data format and feed it in orange classifiers?

First you need to create an appropriate domain for the dataset (here I am assuming 5 continuous features and a discrete class variable):
Code:
_, p = data.shape  # number of features
features = [Orange.feature.Continuous("X%i" % (i + 1)) for i in range(p)]
class_var = Orange.feature.Discrete("C", values=["0", "1"])  # class variable
domain = Orange.data.Domain(features, class_var)
print domain # --> [X1, X2, X3, X4, X5, C]

then pass the domain and the data array (with the class column included) to the Orange.data.Table constructor ...
Code:
table = Orange.data.Table(domain, numpy.hstack((data, class_.reshape(-1, 1))))

... and train a classifier
Code:
tree = Orange.classification.tree.TreeLearner(table)
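
To see what the array passed to the Table constructor looks like, the stacking step can be run with numpy alone. This is a minimal sketch that reuses the example values from the question and does not need Orange:

```python
import numpy

data = numpy.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [5, 5, 2, 1, 1]])
class_ = numpy.array([0, 0, 1])

# reshape(-1, 1) turns the 1-D label vector into an (n, 1) column,
# so it can be appended to the (n, p) feature matrix.
combined = numpy.hstack((data, class_.reshape(-1, 1)))

print(combined.shape)   # (3, 6)
print(combined[:, -1])  # the last column holds the class labels
```

Each row of the combined array has the feature values first and the class label last, which matches the order of the variables in the domain.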

Re: from numpy arrays to orange data

Post by Sheila » Thu Jul 10, 2014 15:17

I am still confused. I do not understand where to specify the class label.
Here 'd' is the data matrix of size n x m (n samples and m features).
'c' is the class label vector of length n (one label per sample).
d = numpy.array([[1, 2, 3], [5, 4, 3], [5, 0.5, 2], [0.4, 3, 0.2] , [0.1, 0.8, 3] ]) #this is data
c = numpy.array( [0, 1, 0,0,1] ) #this is class label

Is it possible to make a simple function which takes the input data and the corresponding class labels and gives an Orange table?

Thank you.

Re: from numpy arrays to orange data

Post by Ales » Thu Jul 10, 2014 15:48

Sheila wrote: I am still confused. I do not understand where to specify the class label.

The class is specified in the Orange.data.Domain(list_of_features, class_var) constructor, i.e.
if X1, X2 and X3 are the predictor variables and C is the class variable (with "0" and "1" labels), then
Code:
Orange.data.Domain([X1, X2, X3], C)
creates the domain for the table and
Code:
Orange.data.Table(domain, [[1, 2, 3, 0], [4, 3, 3, 1]])
creates the following table
Code:
domain:
[X1   X2   X3  |  C]
with values:
[ 1    2    3  |  0]
[ 4    3    3  |  1]
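
The row layout shown above (feature values first, class value last) can also be built from plain Python lists. The helper below is only a sketch, and rows_with_class is a hypothetical name, not part of Orange:

```python
def rows_with_class(features, labels):
    """Append each label to its feature row: [x1, ..., xm, c]."""
    if len(features) != len(labels):
        raise ValueError("need exactly one label per row")
    return [list(row) + [label] for row, label in zip(features, labels)]


rows = rows_with_class([[1, 2, 3], [4, 3, 3]], [0, 1])
print(rows)  # [[1, 2, 3, 0], [4, 3, 3, 1]]
```

The resulting list has the same shape as the second argument of the Orange.data.Table call above.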


Sheila wrote: Is it possible to make a simple function which takes the input data and the corresponding class labels and gives an Orange table?

Code:
import numpy
import Orange

def create_table_with_binary_class(d, c):
    # d: (n, p) array of feature values, c: length-n array of 0/1 class labels
    _, p = d.shape  # number of features
    features = [Orange.feature.Continuous("X%i" % (i + 1)) for i in range(p)]
    class_var = Orange.feature.Discrete("C", values=["0", "1"])  # class variable
    domain = Orange.data.Domain(features, class_var)
    # append the labels as the last column, matching the domain order
    return Orange.data.Table(domain, numpy.hstack((d, c.reshape(-1, 1))))
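
The function above hard-codes the class values "0" and "1". For labels that are not already 0 and 1, a small helper could derive the value list and re-code the labels first. This is only a sketch: encode_labels is a hypothetical name, and it assumes that a discrete column in the numeric array is read as an index into the values list, which should be checked against the Orange documentation.

```python
def encode_labels(labels):
    """Return (values, codes).

    values: the sorted unique labels as strings.
    codes:  each label replaced by its index in that sorted list.
    """
    unique = sorted(set(labels))
    index = {label: i for i, label in enumerate(unique)}
    values = [str(label) for label in unique]
    codes = [index[label] for label in labels]
    return values, codes


values, codes = encode_labels([3, 7, 3, 3, 7])
print(values)  # ['3', '7']
print(codes)   # [0, 1, 0, 0, 1]
```

Under that assumption, values could be passed as values= to Orange.feature.Discrete and codes used in place of c.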

