Orange Forum • View topic - Manually create sparse data table

Post by menosys » Wed Jul 09, 2014 14:43

Hi,

I am currently working with a sparse data set similar to the inquisition basket (http://orange.biolab.si/docs/latest/_downloads/inquisition.basket), retrieved from a database.

I would therefore like to create a sparse data table manually from the retrieved data, instead of loading it from a file. I have found a few examples of creating a data table from non-sparse data (http://orange.biolab.si/docs/latest/reference/rst/Orange.data.table/#example-table-prog1), but I cannot find an equivalent for sparse data.

Experimenting in IPython with the inquisition basket gave surprising results that do not help me understand how to create such a table. Any ideas about the code required and the input type?

Code:
import Orange

# Load the sparse (basket-format) example data set.
data = Orange.data.Table("inquisition.basket")

# Every basket item is listed under a negative id.
print(data.domain)
# Output: {-3:nobody, -4:expects, -5:the, -6:Spanish, -7:Inquisition, -8:our, -9:chief, -10:weapon, -11:is, -12:surprise, -13:and, -14:fear, -15:two, -16:weapons, -17:are, -18:ruthless, -19:efficiency, -20:three, -21:an, -22:almost, -23:fanatical, -24:devotion, -25:to, -26:Pope, -27:four, -28:no, -29:amongst, -30:weaponry, -31:such, -32:elements, -33:as, -34:I'll, -35:come, -36:in, -37:again, -38:diverse, -39:nice, -40:red, -41:uniforms, -42:oh damn}

# The attribute matrix comes back empty: 10 rows, 0 columns.
print(data.to_numpy())
# Output: (array([], shape=(10, 0), dtype=float64), None, None)

Re: Manually create sparse data table

Post by Ales » Thu Jul 10, 2014 16:30

Unfortunately, this is rather tedious.
You have to add the 'sparse features' to the domain as meta attributes:
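Code:
import Orange

# Start from a domain with no regular attributes.
domain = Orange.data.Domain([])
# new_meta_id hands out a fresh meta id on every call.
mid = Orange.feature.Descriptor.new_meta_id
# Register one continuous feature per sparse item; the second
# argument (True) marks these meta attributes as optional.
domain.add_metas(
    {mid(): Orange.feature.Continuous("nobody"),
     mid(): Orange.feature.Continuous("expects")},
    True
)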

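and then construct the data instances, setting only the features each one actually has:
Code:
# First instance: only "nobody" is set.
inst1 = Orange.data.Instance(domain)
inst1["nobody"] = 1.0
# Second instance: only "expects" is set.
inst2 = Orange.data.Instance(domain)
inst2["expects"] = 2.0
# Combine the instances into a table over the same domain.
table = Orange.data.Table(domain, [inst1, inst2])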
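
The two steps above can be wrapped in a small helper when the features and values come from data rather than being typed by hand. The following is only a sketch and not part of the original answer: collect_feature_names and build_sparse_table are hypothetical names, the input is assumed to be a list of {feature name: value} dicts, and the Orange calls are limited to the ones shown above (Orange 2.x; that part has not been tested here).

```python
def collect_feature_names(rows):
    """Return the distinct feature names in first-seen order."""
    names = []
    seen = set()
    for row in rows:
        for name in row:
            if name not in seen:
                seen.add(name)
                names.append(name)
    return names


def build_sparse_table(rows):
    """Build a table from a list of {feature name: value} dicts.

    Hypothetical helper: it only repeats the calls shown in this
    thread and needs an Orange 2.x installation to run.
    """
    import Orange  # imported lazily so collect_feature_names works without it

    domain = Orange.data.Domain([])
    mid = Orange.feature.Descriptor.new_meta_id
    metas = dict((mid(), Orange.feature.Continuous(name))
                 for name in collect_feature_names(rows))
    domain.add_metas(metas, True)

    instances = []
    for row in rows:
        inst = Orange.data.Instance(domain)
        for name, value in row.items():
            inst[name] = float(value)
        instances.append(inst)
    return Orange.data.Table(domain, instances)


# Same two instances as in the answer above.
rows = [{"nobody": 1.0}, {"expects": 2.0}]
feature_names = collect_feature_names(rows)
# table = build_sparse_table(rows)  # uncomment where Orange 2.x is installed
```

The domain is built once from all rows, so every instance shares the same meta ids.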

Re: Manually create sparse data table

Post by menosys » Thu Jul 17, 2014 13:54

Great!

It might be worth adding this code sample here: http://orange.biolab.si/docs/latest/reference/rst/Orange.data.table/#example-table-prog1

I have also found an earlier post (thanks to Ales again) with further details: http://orange.biolab.si/forum/viewtopic.php?f=4&t=1646

For anyone who finds the previous code hard to follow, here is the equivalent for the simple market basket example: http://orange.biolab.si/docs/latest/reference/rst/Orange.associate/#Orange.associate.AssociationRule

Code:
import Orange

domain = Orange.data.Domain([])
mid = Orange.feature.Descriptor.new_meta_id
domain.add_metas(
    {mid(): Orange.feature.Continuous("Bread"),
     mid(): Orange.feature.Continuous("Milk"),
     mid(): Orange.feature.Continuous("Diapers"),
     mid(): Orange.feature.Continuous("Beer"),
     mid(): Orange.feature.Continuous("Eggs"),
     mid(): Orange.feature.Continuous("Cola")},
    True
)

transaction1 = Orange.data.Instance(domain)
transaction1["Bread"] = 1.0
transaction1["Milk"] = 1.0

transaction2 = Orange.data.Instance(domain)
transaction2["Bread"] = 1.0
transaction2["Diapers"] = 1.0
transaction2["Beer"] = 1.0
transaction2["Eggs"] = 1.0

transaction3 = Orange.data.Instance(domain)
transaction3["Milk"] = 1.0
transaction3["Diapers"] = 1.0
transaction3["Beer"] = 1.0
transaction3["Cola"] = 1.0

transaction4 = Orange.data.Instance(domain)
transaction4["Bread"] = 1.0
transaction4["Milk"] = 1.0
transaction4["Diapers"] = 1.0
transaction4["Beer"] = 1.0

transaction5 = Orange.data.Instance(domain)
transaction5["Bread"] = 1.0
transaction5["Milk"] = 1.0
transaction5["Diapers"] = 1.0
transaction5["Cola"] = 1.0

table = Orange.data.Table(domain, [transaction1, transaction2, transaction3, transaction4, transaction5])
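
When the transactions come from a database, as in the original question, the rows usually have to be grouped per transaction before any instance can be built. Here is a rough sketch of that step, assuming the query returns (transaction_id, item) pairs; group_rows and the sample rows are hypothetical, and every item present gets the value 1.0 as in the code above.

```python
from collections import OrderedDict


def group_rows(rows):
    """Group (transaction_id, item) pairs into one {item: 1.0} dict per transaction.

    Transactions and items keep the order in which they first appear.
    """
    grouped = OrderedDict()
    for transaction_id, item in rows:
        grouped.setdefault(transaction_id, OrderedDict())[item] = 1.0
    return list(grouped.values())


# Hypothetical query result reproducing the first two transactions above.
rows = [
    (1, "Bread"), (1, "Milk"),
    (2, "Bread"), (2, "Diapers"), (2, "Beer"), (2, "Eggs"),
]
transactions = group_rows(rows)
```

Each resulting dict can then be copied into an Orange.data.Instance with inst[name] = value, exactly as the transactions above are filled in by hand.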

