Changeset 9531:feba07f2b199 in orange


Timestamp: 01/12/12 15:22:26
Author: jzbontar <jure.zbontar@…>
Branch: default
Convert: ab9c381449681aeaf16585f61f9f63fbb1f357e2
Message: Basket format documentation
File: 1 edited

docs/reference/rst/Orange.data.formats.rst (r9524 to r9531)

* -dc

Baskets
-------

Baskets can be used for storing sparse data in tab-delimited files. They were
designed specifically for text mining. If text mining and sparse data are not
your business, you can skip this section.

A basket is given as a list of space-separated ``<name>=<value>`` atoms. For
each atom, a continuous meta attribute named ``<name>`` is created and added to
the domain as optional, unless it is already there, and the value is added to
the example as that attribute's meta value. If the value is 1, the
``=<value>`` part can be omitted.
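
The rule for a single basket cell can be sketched in plain Python. This is an
illustration of the rules stated above, not Orange's own reader; the function
name ``parse_basket_cell`` is hypothetical. Repeated names are summed, as happens
with ``a`` in the example that follows.

```python
def parse_basket_cell(cell):
    """Turn a basket cell such as "a b a c" or "b=2 d" into a dict.

    Atoms are separated by whitespace; each atom is either ``name``
    (value 1.0) or ``name=value``. Repeated names are summed.
    """
    values = {}
    for atom in cell.split():
        name, sep, value = atom.partition("=")
        # Without "=", sep is empty and the value defaults to 1.0.
        number = float(value) if sep else 1.0
        values[name] = values.get(name, 0.0) + number
    return values


print(parse_basket_cell("a b a c"))  # {'a': 2.0, 'b': 1.0, 'c': 1.0}
print(parse_basket_cell("b=2 d"))    # {'b': 2.0, 'd': 1.0}
print(parse_basket_cell(""))         # {}
```

Using ``str.partition`` keeps bare names and ``name=value`` atoms on one code
path, so the default value of 1 needs no special case.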

Only continuous meta attributes can be stored in a basket.

A tab-delimited file with a basket can look like this::

    K       Ca      b_foo     Ba  y
    c       c       basket    c   c
            meta              i   class
    0.06    8.75    a b a c   0   1
    0.48            b=2 d     0   1
    0.39    7.78              0   1
    0.57    8.22    c=13      0   1

These are the examples read from such a file::

    [0.06, 1], {"Ca":8.75, "a":2.000, "b":1.000, "c":1.000}
    [0.48, 1], {"Ca":?, "b":2.000, "d":1.000}
    [0.39, 1], {"Ca":7.78}
    [0.57, 1], {"Ca":8.22, "c":13.000}
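
Putting the pieces together, the sketch below reads the example file in plain
Python and reproduces the examples listed above, with ``None`` standing for
``?``. It is not Orange's loader: it assumes the three header rows hold names,
types and flags exactly as in this example, understands only the types ``c``
and ``basket`` and the flags ``meta``, ``i`` (ignore) and ``class``, and its
function names are hypothetical.

```python
def parse_basket_cell(cell):
    """Space-separated name[=value] atoms; repeated names are summed."""
    values = {}
    for atom in cell.split():
        name, sep, value = atom.partition("=")
        values[name] = values.get(name, 0.0) + (float(value) if sep else 1.0)
    return values


def read_tab_with_basket(text):
    """Read a tab-delimited file with name, type and flag header rows."""
    lines = text.split("\n")
    names, types, flags = (line.split("\t") for line in lines[:3])
    examples = []
    for line in lines[3:]:
        if not line.strip():
            continue  # skip blank lines
        attributes, metas = [], {}
        for name, typ, flag, cell in zip(names, types, flags, line.split("\t")):
            if flag == "i":
                continue  # ignored column (Ba in the example)
            if typ == "basket":
                # The basket column's own name (b_foo) is not used.
                metas.update(parse_basket_cell(cell))
            elif flag == "meta":
                # Required meta: always present, None marks an unknown value.
                metas[name] = float(cell) if cell else None
            else:
                # Ordinary attribute or class, kept in column order.
                attributes.append(float(cell) if cell else None)
        examples.append((attributes, metas))
    return examples


rows = [
    ["K", "Ca", "b_foo", "Ba", "y"],
    ["c", "c", "basket", "c", "c"],
    ["", "meta", "", "i", "class"],
    ["0.06", "8.75", "a b a c", "0", "1"],
    ["0.48", "", "b=2 d", "0", "1"],
    ["0.39", "7.78", "", "0", "1"],
    ["0.57", "8.22", "c=13", "0", "1"],
]
text = "\n".join("\t".join(row) for row in rows)

for attributes, metas in read_tab_with_basket(text):
    print(attributes, metas)
```

``Ca`` is present in every dict, with ``None`` where its cell is empty, while
basket names appear only in the rows that mention them.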

It is recommended to have the basket as the last column, especially if it
contains a lot of data.

Note a few things. The name of the basket column, ``b_foo``, is not used. In
the first example, the value of ``a`` is 2 since ``a`` appears twice. The
ordinary meta attribute ``Ca`` appears in all examples, even in those where its
value is undefined, while meta attributes from the basket appear only in the
examples where they are defined. This is because the two kinds of meta
attributes differ: ``Ca`` is required, while the others are optional. Calling
``getmetas`` on the domain of the loaded data ``d`` lists all meta attributes;
passing ``False`` lists only the required ones and passing ``True`` only the
optional ones::

    >>> d.domain.getmetas()
    {-6: FloatVariable 'd', -22: FloatVariable 'Ca', -5: FloatVariable 'c', -4: FloatVariable 'b', -3: FloatVariable 'a'}
    >>> d.domain.getmetas(False)
    {-22: FloatVariable 'Ca'}
    >>> d.domain.getmetas(True)
    {-6: FloatVariable 'd', -5: FloatVariable 'c', -4: FloatVariable 'b', -3: FloatVariable 'a'}

For a full picture, read the documentation on meta attributes in Domain and the
description of the basket file format below (a simple format that is limited to
baskets only).

Basket Format
=============

Basket files (.basket) are suitable for representing sparse data. Each example
is represented by one line of the file, written as a comma-separated list of
name-value pairs; the ``=<value>`` part may be omitted, in which case the value
is 1.0. Here is an example of such a file::

    nobody, expects, the, Spanish, Inquisition=5
    our, chief, weapon, is, surprise=3, surprise=2, and, fear,fear, and, surprise
    our, two, weapons, are, fear, and, surprise, and, ruthless, efficiency
    to, the, Pope, and, nice, red, uniforms, oh damn

The file contains four examples. The first example has five attributes
defined, "nobody", "expects", "the", "Spanish" and "Inquisition"; the first
four have the default value of 1.0 and the last has a value of 5.0.

Unlike other formats supported by Orange, the basket format does not define
attributes in a header or in a separate file; the domain consists of the
attributes that appear in the data.

If an attribute appears more than once, its values are summed. For instance, in
the second example the value of "surprise" is 6.0 and the value of "fear" is
2.0: the former appears three times, with values 3.0, 2.0 and 1.0, and the
latter appears twice, each time with value 1.0.
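
A line of a .basket file can be parsed in the same way as a basket cell,
splitting on commas instead of whitespace. The following is a plain-Python
sketch of the rules stated in this section (a bare name has value 1.0, repeated
names are summed); it assumes that whitespace around items is insignificant, and
``parse_basket_line`` is a hypothetical name, not part of Orange.

```python
def parse_basket_line(line):
    """Parse one line of a .basket file into a {name: value} dict.

    Items are comma-separated; each is ``name`` (value 1.0) or
    ``name=value``. Repeated names are summed.
    """
    example = {}
    for item in line.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing comma
        name, sep, value = item.partition("=")
        name = name.strip()
        number = float(value) if sep else 1.0
        example[name] = example.get(name, 0.0) + number
    return example


lines = [
    "nobody, expects, the, Spanish, Inquisition=5",
    "our, chief, weapon, is, surprise=3, surprise=2, and, fear,fear, and, surprise",
]
first, second = (parse_basket_line(line) for line in lines)
print(first)
print(second["surprise"], second["fear"], second["and"])  # 6.0 2.0 2.0
```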

All attributes are loaded as optional meta attributes, so attributes with zero
values take no memory, unless a zero value is given explicitly in the file. See
also the section on meta attributes in the reference for domain descriptors.
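
A rough way to see the saving: the sketch below stores the first two examples
of the file above as dicts holding only the attributes that occur in each line
(with repeats already summed) and compares the number of stored values with the
number of cells a dense table over the whole vocabulary would need. It only
counts values; it is not a model of Orange's internal memory layout.

```python
# Sparse examples: only the attributes that occur in a line are stored
# (values taken from the first two lines of the example file).
examples = [
    {"nobody": 1.0, "expects": 1.0, "the": 1.0, "Spanish": 1.0, "Inquisition": 5.0},
    {"our": 1.0, "chief": 1.0, "weapon": 1.0, "is": 1.0,
     "surprise": 6.0, "and": 2.0, "fear": 2.0},
]

# The domain is the union of all names seen in the data.
vocabulary = sorted(set().union(*examples))

# A dense table needs one cell per (example, attribute) pair ...
dense_cells = len(examples) * len(vocabulary)
# ... whereas the sparse dicts store only the values that were given.
sparse_cells = sum(len(example) for example in examples)

print(len(vocabulary), dense_cells, sparse_cells)  # 12 24 12
```

With a realistic text corpus the vocabulary grows far faster than the number of
words per line, so the gap widens accordingly.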

Note that, at the time of writing, only association rules can directly use data
presented in the basket format.


Other supported data formats