Timestamp:
02/06/12 00:53:53 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
rebase_source:
3c816a2053cc5cd2dba7278e26f1da0248e87c51
Message:

Fixes in documentation of Orange.data.Domain

File:
1 edited

  • docs/reference/rst/Orange.data.domain.rst

    r9553 r9652  
    55=============================== 
    66 
    7 In Orange, the term `domain` denotes a set of features, which will be 
    8 used to describe the data instances, the class variables, meta 
    9 attributes and similar. Each data instance, as well as many 
    10 classifiers and other objects are associated with a domain descriptor, 
    11 which defines the object's content and/or its input and output data 
    12 format. 
    13  
    14 Domain descriptors are also responsible for converting data instances 
    15 from one domain to another, e.g. from the original feature space to 
    16 one with different set of features which are selected or constructed 
    17 from the original set. 
    18  
    19 Domains as lists 
    20 ================ 
    21  
    22 Domains resemble lists: the length of domain is the number of 
    23 variables, including the class variable. Iterating through domain 
    24 goes through features and the class variable, but not through meta 
    25 attributes. Domains can be indexed by integer indices, variable names 
    26 or instances of :obj:`Orange.data.variables.Variable`. Domain has a 
    27 method :obj:`Domain.index` that returns the index of a variable 
    28 specified by a descriptor, name. Slices can be retrieved, but not 
    29 set. :: 
    30  
    31     >>> print d2 
    32     [a, b, e, y], {-4:c, -5:d, -6:f, -7:X} 
    33     >>> d2[1] 
    34     EnumVariable 'b' 
    35     >>> d2["e"] 
    36     EnumVariable 'e' 
    37     >>> d2["d"] 
    38     EnumVariable 'd' 
    39     >>> d2[-4] 
    40     EnumVariable 'c' 
    41     >>> for attr in d2: 
     7In Orange, the term `domain` denotes a set of features, 
     8meta attributes and class attribute that describe data. Domain 
     9descriptors are attached to data instances, data tables, 
     10classifiers and other objects. 
     11 
     12Besides describing the data, domain descriptors contain methods for 
     13converting data instances from one domain to another, 
     14e.g. from the original feature space to one with a different set of 
     15features that are selected or constructed from the original set. 
     16 
     17The following examples will use domain constructed when reading the data 
     18set `zoo`:: 
     19 
     20    >>> data = Orange.data.Table("zoo") 
     21    >>> domain = data.domain 
     22    >>> domain 
     23    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, 
     24    backbone, breathes, venomous, fins, legs, tail, domestic, catsize, 
     25    type], {-2:name} 
     26 
     27Domains consist of ordinary features and the class attribute, 
     28if there is one, and of meta attributes. We will refer to features and 
     29the class attribute as *variables*. Variables are printed out 
     30in a form similar to a list whose elements are attribute names, 
     31and meta attributes are printed like a dictionary whose "keys" are meta 
     32attribute id's and "values" are attribute names. In the above case, 
     33each data instance corresponds to an animal and is described by the 
     34animal's properties and its type (the class); the meta attribute contains 
     35the animal's name. 
     36 
     37Domains as lists and dictionaries 
     38================================= 
     39 
     40Domains behave like lists: the length of domain is the number of 
     41variables including the class variable. Domains can be indexed by integer 
     42indices, variable names or instances of 
     43:obj:`Orange.data.variables.Variable`:: 
     44 
     45    >>> domain["feathers"] 
     46    EnumVariable 'feathers' 
     47    >>> domain[1] 
     48    EnumVariable 'feathers' 
     49    >>> feathers = domain[1] 
     50    >>> domain[feathers] 
     51    EnumVariable 'feathers' 
     52 
     53Meta attributes are indexed similarly:: 
     54 
     55    >>> domain[-2] 
     56    StringVariable 'name' 
     57    >>> domain["name"] 
     58    StringVariable 'name' 
     59 
     60Method :obj:`Domain.index` returns the index of a variable specified by a 
     61descriptor or name:: 
     62 
     63    >>> domain.index("feathers") 
     64    1 
     65    >>> domain.index(feathers) 
     66    1 
     67    >>> domain.index("name") 
     68    -2 
     69 
     70Slices can be retrieved, but not set. 
     71 
     72Iterating through domain goes through features and the class variable, 
     73but not through meta attributes:: 
     74 
     75    >>> for attr in domain: 
    4276    ...     print attr.name, 
    4377    ... 
    44     a b e y  
     78    hair feathers eggs milk airborne aquatic predator toothed backbone 
     79    breathes venomous fins legs tail domestic catsize type 
     80 
    4581 
    4682Conversions between domains 
    4783=========================== 
    4884 
    49 Domain descriptors are used to convert instances from one domain to 
    50 another. :: 
    51  
    52      >>> data = Orange.data.Table("monk1") 
    53      >>> d2 = Orange.data.Domain(["a", "b", "e", "y"], data.domain) 
    54      >>>  
     85Domain descriptors can convert instances from one domain to another 
     86(details on construction of domains are described later). :: 
     87 
     88     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"], 
     89     domain) 
    5590     >>> inst = data[55] 
    56      >>> print inst 
    57      ['1', '2', '1', '1', '4', '2', '0'] 
    58      >>> inst2 = d2(inst) 
    59      >>>  print inst2 
    60      ['1', '2', '4', '0'] 
     91     >>> inst 
     92     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4', 
     93     '1', '0', '1', 'mammal'], {"name":'oryx'} 
     94     >>> inst2 = new_domain(inst) 
     95     >>> inst2 
     96     ['0', '4', 'mammal'] 
    6197 
    6298This is used, for instance, in classifiers: classifiers are often 
    63 trained on a preprocessed domain (e.g. with a subset of features or 
    64 with discretized data) and later used on instances from the original 
     99trained on a preprocessed domain (e.g. on a subset of features or 
     100on discretized data) and later used on instances from the original 
    65101domain. Classifiers store the training domain descriptor and use it 
    66102for converting new instances. 
    67103 
    68 Equivalently, instances can be converted by passing the new domain to 
    69 the constructor:: 
    70  
    71      >>> inst2 = Orange.data.Instance(d2, inst) 
    72  
    73 Entire data table can be converted similarly:: 
    74  
    75      >>> data2 = Orange.data.Table(d2, data) 
    76      >>> print data2[55] 
    77      ['1', '2', '4', '0'] 
     104Alternatively, instances can be converted by constructing a new instance 
     105and passing the new domain to the constructor:: 
     106 
     107     >>> inst2 = Orange.data.Instance(new_domain, inst) 
     108 
     109Entire data table can be converted in a similar way:: 
     110 
     111     >>> data2 = Orange.data.Table(new_domain, data) 
     112     >>> data2[55] 
     113     ['0', '4', 'mammal'] 
    78114 
    79115 
     
    96132Meta-values are additional values that can be attached to instances. 
    97133It is not necessary that all instances in the same table (or even all 
    98 instances from the same domain) have certain meta-value. See documentation 
    99 on :obj:`Orange.data.Instance` for a more thorough description of meta-values. 
     134instances from the same domain) have the same meta attributes. See 
     135documentation on :obj:`Orange.data.Instance` for a more thorough 
     136description of meta-values. 
    100137 
    101138Meta attributes that appear in instances can, but don't need to be 
    102 registered in the domain. Typically, the meta attribute will be 
    103 registered for the following reasons. 
     139listed in the domain. Typically, the meta attribute will be included in 
     140the domain for the following reasons. 
    104141 
    105142     * If the domain knows about a meta attribute, its values can be 
    106143       obtained with indexing by names and variable descriptors, 
    107        e.g. ``inst["age"]``. Values of unregistered meta attributes can 
    108        be obtained only through integer indices (e.g. inst[id], where 
     144       e.g. ``inst["age"]``. Values of unknown meta attributes 
     145       can be obtained only through integer indices (e.g. inst[id], where 
    109146       id needs to be an integer). 
    110147 
     
    115152       instead of a meta-id. 
    116153 
    117      * Registering an attribute provides a way to attach a descriptor 
    118        to a meta-id. See how the basket file format uses this feature. 
    119  
    120154     * When saving instances to a file, only the values of registered 
    121155       meta attributes are saved. 
    122156 
    123      * When a new data instance is constructed, it is automatically 
    124        assigned the meta attributes listed in the domain, with their 
    125        values set to unknown. 
     157     * When a new data instance is constructed, it will have all the 
     158       meta attributes listed in the domain, with their values set to 
     159       unknown. 
    126160 
    127161For the latter two points - saving to a file and construction of new 
    128162instances - there is an additional flag: a meta attribute can be 
    129163marked as "optional". Such meta attributes are not saved and not added 
    130 to newly constructed data instances. This functionality is used in, 
    131 for instance, the above mentioned basket format, where new meta 
    132 attributes are created while loading the file and new instances to 
    133 contain all words from the past examples. 
    134  
    135 There is another distinction between the optional and non-optional 
    136 meta attributes: the latter are `expected to be` present in all 
    137 examples of that domain. Saving to files expects them and will fail if 
    138 a non-optional meta value is missing. Optional attributes may be 
    139 missing. In most other places, these rules are not strictly enforced, 
    140 so adhering to them is rather up to choice. 
    141  
    142 Meta attributes can be added and removed even after the domain is 
    143 constructed and instances of that domain already exist. For instance, 
    144 if data contains the Monk 1 data set, we can add a new continuous 
    145 attribute named "misses" with the following code (a detailed 
    146 desription of methods related to meta attributes is given below):: 
     164to newly constructed data instances. 
     165 
     166Another distinction between the optional and non-optional meta 
     167attributes is that the latter are *expected to be* present in all 
     168data instances from that domain. Saving to files will fail 
     169if a non-optional meta value is missing; in most other places, 
     170these rules are not strictly enforced, so adhering to them is rather up 
     171to choice. 
     172 
     173While the list of features and the class value are constant, 
     174meta attributes can be added and removed at any time (a detailed 
     175description of methods related to meta attributes is given below):: 
    147176 
    148177     >>> misses = Orange.data.variable.Continuous("misses") 
     
    180209 
    181210     >>> for inst in data: 
    182      ... if inst.get_class() != classifier(example): 
    183      ...     example[misses] += 1 
     211     ... if inst.get_class() != classifier(inst): 
     212     ...     inst[misses] += 1 
    184213 
    185214The other effect of registering meta attributes is that they appear in 
     
    188217that domain. If the meta attributes occur in the original domain of 
    189218the instance or if they can be computed from them, they will have 
    190 appropriate values, otherwise they will have a "don't know" value. :: 
    191  
    192      domain = data.domain 
    193      d2 = Orange.data.Domain(["a", "b", "e", "y"], domain) 
    194      for attr in ["c", "d", "f"]: 
    195      d2.add_meta(Orange.data.new_meta_id(), domain[attr]) 
    196      d2.add_meta(Orange.data.new_meta_id(), orange.data.variable.Discrete("X")) 
    197      data2 = Orange.data.Table(d2, data) 
    198  
    199 Domain ``d2`` in this example has variables ``a``, ``b``, ``e`` and the 
    200 class, while the other three variables are added as meta 
    201 attributes, together with additional attribute X. Results are as 
    202 follows. :: 
    203  
    204      >>> print data[55] 
    205      ['1', '2', '1', '1', '4', '2', '0'], {"misses":0.000000} 
    206      >>> print data2[55] 
    207      ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?'} 
    208  
    209 After conversion, the three attributes are moved to meta attributes 
    210 and the new attribute appears as unknown. 
     219appropriate values, otherwise their value will be missing. :: 
     220 
     221    new_domain = Orange.data.Domain(["feathers", "legs"], domain) 
     222    new_domain.add_meta(Orange.data.new_meta_id(), domain["type"]) 
     223    new_domain.add_meta(Orange.data.new_meta_id(), domain["legs"]) 
     224    new_domain.add_meta( 
     225        Orange.data.new_meta_id(), Orange.data.variable.Discrete("X")) 
     226    data2 = Orange.data.Table(new_domain, data) 
     227 
     228Domain ``new_domain`` in this example has variables ``feathers`` and 
     229``legs`` and meta attributes ``type``, ``legs`` (again) and ``X`` which 
     230is a new feature with no relation to the existing ones. :: 
     231 
     232    >>> data[55] 
     233    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', 
     234    '4', '1', '0', '1', 'mammal'], {"name":'oryx'} 
     235    >>> data2[55] 
     236    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'} 
    211237 
    212238 
     
    238264 
    239265     An integer value that is changed when the domain is 
    240      modified. Can be also used as unique domain identifier; two 
    241      different domains also have different versions. 
     266     modified. The value can also be used as a unique domain identifier; two 
     267     different domains have different values of ``version``. 
    242268 
    243269     .. method:: __init__(variables[, class_vars=]) 
    244270 
    245      Construct a domain with the given variables specified; the 
     271     Construct a domain with the given variables; the 
    246272     last one is used as the class variable. :: 
    247273 
    248          >>> a, b, c = [Orange.data.variable.Discrete(x) 
    249                 for x in ["a", "b", "c"]] 
    250          >>> d = Orange.data.Domain([a, b, c]) 
    251          >>> print d.features 
     274         >>> a, b, c = [Orange.data.variable.Discrete(x) for x in "abc"] 
     275         >>> domain = Orange.data.Domain([a, b, c]) 
     276         >>> domain.features 
    252277         <EnumVariable 'a', EnumVariable 'b'> 
    253          >>> print d.class_var 
     278         >>> domain.class_var 
    254279         EnumVariable 'c' 
    255280 
    256      :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`) 
    257          :param class_vars: A list of multiple classes; must be a keword argument 
     281     :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`) 
    258282     :type variables: list 
    259  
    260      .. method:: __init__(features, class_variable[, classVars=]) 
     283     :param class_vars: A list of multiple classes; must be a keyword argument 
     284     :type class_vars: list 
     285 
     286     .. method:: __init__(features, class_variable[, class_vars=]) 
    261287 
    262288     Construct a domain with the given list of features and the 
    263289     class variable. :: 
    264290 
    265          >>> d = Orange.data.Domain([a, b], c) 
    266          >>> print d.features 
     291         >>> domain = Orange.data.Domain([a, b], c) 
     292         >>> domain.features 
    267293         <EnumVariable 'a', EnumVariable 'b'> 
    268          >>> print d.class_var EnumVariable 'c' 
    269  
    270      :param features: List of features (instances of :obj:`Orange.data.variable.Variable`) 
    271      :type features: list 
    272      :param class_variable: Class variable 
    273          :param class_vars: A list of multiple classes; must be a keword argument 
    274      :type features: Orange.data.variable.Variable 
     294         >>> domain.class_var 
     295         EnumVariable 'c' 
     296 
     297     :param features: List of features (instances of :obj:`Orange.data.variable.Variable`) 
     298     :type features: list 
     299     :param class_variable: Class variable 
     300     :type class_variable: Orange.data.variable.Variable 
     301     :param class_vars: A list of multiple classes; must be a keyword argument 
     302     :type class_vars: list 
    275303 
    276304     .. method:: __init__(variables, has_class[, class_vars=]) 
    277305 
    278      Construct a domain with the given variables. If has_class is 
     306     Construct a domain with the given variables. If `has_class` is 
    279307     :obj:`True`, the last one is used as the class variable. :: 
    280308 
    281          >>> d = Orange.data.Domain([a, b, c], False) 
    282          >>> print d.features 
     309         >>> domain = Orange.data.Domain([a, b, c], False) 
     310         >>> domain.features 
    283311         <EnumVariable 'a', EnumVariable 'b'> 
    284          >>> print d.class_var 
     312         >>> domain.class_var 
    285313         EnumVariable 'c' 
    286314 
     
    288316     :type features: list 
    289317     :param has_class: A flag telling whether the domain has a class 
    290          :param class_vars: A list of multiple classes; must be a keword argument 
    291318     :type has_class: bool 
     319     :param class_vars: A list of multiple classes; must be a keyword argument 
     320     :type class_vars: list 
    292321 
    293322     .. method:: __init__(variables, source[, class_vars=]) 
    294323 
    295      Construct a domain with the given variables, which can also be 
    296      specified by names, provided that the variables with that 
    297      names exist in the source list. The last variable from the 
    298      list is used as the class variable. :: 
    299  
    300          >>> d1 = orange.Domain([a, b]) 
    301          >>> d2 = orange.Domain(["a", b, c], d1)  
     324     Construct a domain with the given variables that can also be 
     325     specified by names if the variables with those names exist in the 
     326     source list. The last variable from the list is used as the class 
     327     variable. :: 
     328 
     329         >>> domain1 = Orange.data.Domain([a, b]) 
     330         >>> domain2 = Orange.data.Domain(["a", b, c], domain1) 
    302331 
    303332     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`) 
    304333     :type variables: list 
    305334     :param source: An existing domain or a list of variables 
    306          :param class_vars: A list of multiple classes; must be a keword argument 
    307335     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable` 
     336     :param class_vars: A list of multiple classes; must be a keyword argument 
     337     :type class_vars: list 
    308338 
    309339     .. method:: __init__(variables, has_class, source[, class_vars=]) 
     
    312342     last variable should be used as the class variable. :: 
    313343 
    314          >>> d1 = orange.Domain([a, b]) 
    315          >>> d2 = orange.Domain(["a", b, c], d1)  
     344         >>> domain1 = Orange.data.Domain([a, b], False) 
     345         >>> domain2 = Orange.data.Domain(["a", b, c], False, domain1) 
    316346 
    317347     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`) 
     
    320350     :type has_class: bool 
    321351     :param source: An existing domain or a list of variables 
    322          :param class_vars: A list of multiple classes; must be a keword argument 
    323352     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable` 
     353     :param class_vars: A list of multiple classes; must be a keyword argument 
     354     :type class_vars: list 
    324355 
    325356     .. method:: __init__(domain, class_var[, class_vars=]) 
    326357 
    327      Construct a domain as a shallow copy of an existing domain 
     358     Construct a copy of an existing domain 
    328359     except that the class variable is replaced with the given one 
    329      and the class variable of the existing domain becoems an 
     360     and the class variable of the existing domain becomes an 
    330361     ordinary feature. If the new class is one of the original 
    331362     domain's features, it can also be specified by a name. 
     
    334365     :type domain: :obj:`Orange.data.Domain` 
    335366     :param class_var: Class variable for the new domain 
    336          :param class_vars: A list of multiple classes; must be a keword argument 
    337367     :type class_var: string or :obj:`Orange.data.variable.Variable` 
     368     :param class_vars: A list of multiple classes; must be a keyword argument 
     369     :type class_vars: list 
    338370 
    339371     .. method:: __init__(domain, has_class=False[, class_vars=]) 
    340372 
    341      Construct a shallow copy of the domain. If the ``has_class`` 
    342      flag is given and equals :obj:`False`, it moves the class 
     373     Construct a copy of the domain. If the ``has_class`` 
     374     flag is given and is :obj:`False`, it moves the class 
    343375     attribute to ordinary features. 
    344376 
     
    346378     :type domain: :obj:`Orange.data.Domain` 
    347379     :param has_class: A flag telling whether the domain has a class 
    348          :param class_vars: A list of multiple classes; must be a keword argument 
    349380     :type has_class: bool 
     381     :param class_vars: A list of multiple classes; must be a keyword argument 
     382     :type class_vars: list 
    350383 
    351384     .. method:: has_discrete_attributes(include_class=True) 
    352385 
    353386     Return :obj:`True` if the domain has any discrete variables; 
    354      class is considered unless ``include_class`` is ``False``. 
     387     class is included unless ``include_class`` is ``False``. 
    355388 
    356389     :param has_class: Tells whether to consider the class variable 
     
    361394 
    362395     Return :obj:`True` if the domain has any continuous variables; 
    363      class is considered unless ``include_class`` is ``False``. 
     396     class is included unless ``include_class`` is ``False``. 
    364397 
    365398     :param has_class: Tells whether to consider the class variable 
     
    371404     Return :obj:`True` if the domain has any variables which are 
    372405     neither discrete nor continuous, such as, for instance string variables. 
    373      class is considered unless ``include_class`` is ``False``. 
     406     class is included unless ``include_class`` is ``False``. 
    374407 
    375408     :param has_class: Tells whether to consider the class variable 
     
    381414 
    382415     Register a meta attribute with the given id (obtained by 
    383      :obj:`Orange.data.new_meta_id`). The same meta attribute can (and 
    384      should) have the same id when registered in different domains. :: 
     416     :obj:`Orange.data.new_meta_id`). The same meta attribute should 
     417     have the same id in all domains in which it is registered. :: 
    385418 
    386419         >>> newid = Orange.data.new_meta_id() 
    387          >>> d2.add_meta(newid, Orange.data.variable.String("name")) 
    388          >>> d2[55]["name"] = "Joe" 
    389          >>> print data2[55] 
    390          ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?', "name":'Joe'} 
     420         >>> domain.add_meta(newid, Orange.data.variable.String("origin")) 
     421         >>> data[55]["origin"] = "Nepal" 
     422         >>> data[55] 
     423         ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', 
     424         '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'} 
    391425 
    392426     The third argument tells whether the meta attribute is optional or 
    393427     not. The parameter is an integer, with any non-zero value meaning that 
    394428     the attribute is optional. Different values can be used to distinguish 
    395      between various optional attributes; the meaning of the value is not 
    396      defined in advance and can be used arbitrarily by the application. 
     429     between various types of optional attributes; the meaning of the value 
     430     is not defined in advance and can be used arbitrarily by the 
     431     application. 
    397432 
    398433     :param id: id of the new meta attribute 
     
    406441 
    407442     Add multiple meta attributes at once. The dictionary contains id's as 
    408      keys and variables as the corresponding values. The following example 
    409      shows how to add all meta attributes from one domain to another:: 
    410  
    411           newdomain.add_metas(domain.get_metas) 
     443     keys and variables (:obj:`~Orange.data.variable.Variable`) as the corresponding 
     444     values. The following example shows how to add all meta attributes 
     445     from one domain to another:: 
     446 
     447          newdomain.add_metas(domain.get_metas()) 
    412448 
    413449     The optional second argument has the same meaning as in :obj:`add_meta`. 
     
    458494 
    459495      Return a dictionary with meta attribute id's as keys and corresponding 
    460       variable descriptors as values; the dictionary contains only meta 
     496      variable descriptors as values. The dictionary contains only meta 
    461497      attributes for which the argument ``optional`` matches the flag given 
    462498      when the attributes were added using :obj:`add_meta` or :obj:`add_metas`. 
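
The list-like and dictionary-like behaviour that the revised documentation describes can be illustrated with a small stand-alone sketch. This is not Orange's implementation: ``ToyDomain`` and its ``convert`` method are hypothetical names invented for this illustration, variables are represented by plain strings instead of descriptors, and the sample domain is a shortened version of the `zoo` domain used in the examples. It only mirrors the documented rules: length and iteration cover the features and the class but not meta attributes, meta attributes are reached through negative ids or names, and conversion picks values by variable name, using an unknown value where the source has none.

```python
class ToyDomain(object):
    """Hypothetical stand-in for a domain descriptor (not Orange's class)."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)   # features followed by the class
        self.metas = dict(metas or {})     # negative meta id -> attribute name

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration skips meta attributes as well.
        return iter(self.variables)

    def index(self, name):
        # Position for ordinary variables, meta id for meta attributes.
        if name in self.variables:
            return self.variables.index(name)
        for meta_id, meta_name in self.metas.items():
            if meta_name == name:
                return meta_id
        raise KeyError(name)

    def __getitem__(self, key):
        if isinstance(key, str):
            key = self.index(key)
        if key < 0:
            return self.metas[key]
        return self.variables[key]

    def convert(self, source, values):
        # Pick this domain's variables out of an instance of `source`;
        # variables missing from the source get the unknown value '?'.
        return [values[source.index(name)] if name in source.variables else "?"
                for name in self.variables]


zoo = ToyDomain(["hair", "feathers", "legs", "type"], {-2: "name"})
small = ToyDomain(["feathers", "legs", "type"])
oryx = ["1", "0", "4", "mammal"]
converted = small.convert(zoo, oryx)
```

With these assumptions, ``zoo.index("name")`` gives the meta id ``-2`` while ``zoo.index("feathers")`` gives the position ``1``, and ``converted`` keeps only the values of ``feathers``, ``legs`` and ``type``.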