source: orange/docs/reference/rst/Orange.data.domain.rst @ 9535:6ad805782021

Revision 9535:6ad805782021, 17.1 KB checked in by jzbontar <jure.zbontar@…>, 2 years ago (diff)

Basket format documentation - added links

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of features used to describe
data instances, together with the class variable, meta attributes and
similar. Each data instance, as well as many classifiers and other
objects, is associated with a domain descriptor, which defines the
object's content and/or its input and output data format.

Domain descriptors are also responsible for converting data instances
from one domain to another, e.g. from the original feature space to one
with a different set of features, selected or constructed from the
original set.

Domains as lists
================

Domains resemble lists: the length of a domain is the number of
variables, including the class variable. Iterating through a domain
goes through the features and the class variable, but not through meta
attributes. Domains can be indexed by integer indices, variable names
or instances of :obj:`Orange.data.variable.Variable`. Method
:obj:`Domain.index` returns the index of a variable specified by a
descriptor or a name. Slices can be retrieved, but not set. The
examples below use domain ``d2``, which is constructed in the section
on :ref:`meta-attributes`. ::

    >>> print d2
    [a, b, e, y], {-4:c, -5:d, -6:f, -7:X}
    >>> d2[1]
    EnumVariable 'b'
    >>> d2["e"]
    EnumVariable 'e'
    >>> d2["d"]
    EnumVariable 'd'
    >>> d2[-4]
    EnumVariable 'c'
    >>> for attr in d2:
    ...     print attr.name,
    ...
    a b e y

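The list-like behavior described above can be mimicked with a small pure-Python model. This is a sketch under stated assumptions, not Orange's implementation: ``ToyDomain`` and ``Var`` are hypothetical names, variables are reduced to their names, and meta attributes are kept in a separate dictionary keyed by negative meta ids, as in the printout of ``d2``.

```python
from collections import namedtuple

# Hypothetical stand-in for a variable descriptor; only the name is modeled.
Var = namedtuple("Var", "name")


class ToyDomain(object):
    """Toy list-like domain (an illustration, not Orange's implementation)."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)  # features followed by the class
        self.metas = dict(metas or {})    # meta id (negative int) -> Var

    def __len__(self):
        # Meta attributes do not count towards the length
        return len(self.variables)

    def __iter__(self):
        # Iteration covers the features and the class, not the metas
        return iter(self.variables)

    def index(self, key):
        """Return the position of a variable, or the id of a meta attribute."""
        name = key.name if isinstance(key, Var) else key
        for i, var in enumerate(self.variables):
            if var.name == name:
                return i
        for meta_id, var in self.metas.items():
            if var.name == name:
                return meta_id
        raise KeyError(name)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slices can be read; there is no __setitem__, so they cannot be set
            return self.variables[key]
        if isinstance(key, int):
            # Non-negative integers index the variables, negative ones are meta ids
            return self.variables[key] if key >= 0 else self.metas[key]
        return self[self.index(key)]


d2 = ToyDomain([Var("a"), Var("b"), Var("e"), Var("y")],
               {-4: Var("c"), -5: Var("d"), -6: Var("f"), -7: Var("X")})
print(len(d2))                    # 4
print([var.name for var in d2])   # ['a', 'b', 'e', 'y']
print(d2["d"], d2.index("d"))     # Var(name='d') -5
```

Treating negative integers as meta ids follows the ``d2[-4]`` example above; it is a modeling choice, which is why the toy class does not inherit from ``list``.
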
Conversions between domains
===========================

Domain descriptors are used to convert instances from one domain to
another. ::

    >>> data = Orange.data.Table("monk1")
    >>> d2 = Orange.data.Domain(["a", "b", "e", "y"], data.domain)
    >>> inst = data[55]
    >>> print inst
    ['1', '2', '1', '1', '4', '2', '0']
    >>> inst2 = d2(inst)
    >>> print inst2
    ['1', '2', '4', '0']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. with a subset of features or
with discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
to convert new instances.

Equivalently, instances can be converted by passing the new domain to
the constructor::

    >>> inst2 = Orange.data.Instance(d2, inst)

An entire data table can be converted similarly::

    >>> data2 = Orange.data.Table(d2, data)
    >>> print data2[55]
    ['1', '2', '4', '0']

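For the simple case in which the target variables are taken over unchanged from the source domain, the conversion amounts to picking values by variable name. The sketch below assumes instances are plain lists aligned with a list of variable names; ``convert`` is a hypothetical helper, and Orange's own conversion additionally handles variables computed from the source ones.

```python
# Instances are modeled as plain lists aligned with a list of variable names
# (a hypothetical simplification of Orange's domain conversion).

def convert(instance, source_names, target_names):
    """Rearrange the values of `instance` to follow `target_names`.

    Variables absent from the source domain get the unknown value '?'."""
    position = {name: i for i, name in enumerate(source_names)}
    return [instance[position[name]] if name in position else "?"
            for name in target_names]


monk_names = ["a", "b", "c", "d", "e", "f", "y"]
inst = ["1", "2", "1", "1", "4", "2", "0"]          # data[55] in the example
print(convert(inst, monk_names, ["a", "b", "e", "y"]))  # ['1', '2', '4', '0']

# Converting a whole table is the same operation applied row by row
table2 = [convert(row, monk_names, ["a", "b", "e", "y"]) for row in [inst]]
```
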
Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features, except that they are not used for learning.
The list of such classes is stored in ``class_vars``. When converting
between domains, additional classes can become ordinary features or the
class variable, and vice versa.

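The role changes described above can be illustrated with a toy model in which a domain merely assigns variable names to the roles ``features``, ``class_var`` and ``class_vars``. The dictionaries and the ``split_roles`` helper are hypothetical; they mimic the behavior, not Orange's data structures.

```python
def split_roles(values, domain):
    """Group a name -> value mapping by the roles defined in `domain`."""
    return {
        "features": [values[n] for n in domain["features"]],
        "class_var": values[domain["class_var"]] if domain["class_var"] else None,
        "class_vars": [values[n] for n in domain["class_vars"]],
    }


row = {"x1": 0.5, "x2": 1.5, "y1": "yes", "y2": "no"}
original = {"features": ["x1", "x2"], "class_var": "y1", "class_vars": ["y2"]}
# In the new domain, the additional class y2 becomes the class variable and
# the former class y1 becomes an ordinary feature.
swapped = {"features": ["x1", "x2", "y1"], "class_var": "y2", "class_vars": []}

print(split_roles(row, original)["class_vars"])  # ['no']
print(split_roles(row, swapped)["features"])     # [0.5, 1.5, 'yes']
```
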
.. _meta-attributes:

Meta attributes
===============

Meta-values are additional values that can be attached to instances.
Not all instances in the same table (or even all instances from the
same domain) need to have a certain meta-value. See the documentation
on :obj:`Orange.data.Instance` for a more thorough description of
meta-values.

Meta attributes that appear in instances can, but need not, be
registered in the domain. Typically, a meta attribute is registered
for the following reasons.

    * If the domain knows about a meta attribute, its values can be
      obtained by indexing with names and variable descriptors,
      e.g. ``inst["age"]``. Values of unregistered meta attributes can
      be obtained only through integer indices (e.g. ``inst[id]``, where
      ``id`` needs to be an integer).

    * When printing out an instance, the symbolic values of discrete
      meta attributes can only be printed if the attribute is
      registered. Also, if the attribute is registered, the printed
      instance shows the (more informative) attribute name instead of
      the meta-id.

    * Registering an attribute provides a way to attach a descriptor
      to a meta-id. See how the basket file format uses this feature.

    * When saving instances to a file, only the values of registered
      meta attributes are saved.

    * When a new data instance is constructed, it is automatically
      assigned the meta attributes listed in the domain, with their
      values set to unknown.

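The first point of the list, addressing by name only for registered meta attributes and by integer id always, can be sketched with a toy instance. ``ToyInstance`` and its ``registry`` dictionary (name to meta-id, standing in for the domain's registered meta attributes) are hypothetical illustrations, not Orange's API.

```python
class ToyInstance(object):
    """Toy instance whose meta-values are stored by integer meta-id."""

    def __init__(self, registry):
        self.registry = registry   # name -> meta-id, as known to the domain
        self.metas = {}            # meta-id -> value

    def _meta_id(self, key):
        if isinstance(key, int):
            return key                   # integer ids always work
        if key in self.registry:
            return self.registry[key]    # names work only when registered
        raise KeyError("meta attribute %r is not registered" % key)

    def __getitem__(self, key):
        # Meta-values that were never set read as unknown ('?')
        return self.metas.get(self._meta_id(key), "?")

    def __setitem__(self, key, value):
        self.metas[self._meta_id(key)] = value


inst = ToyInstance(registry={"age": -2})
inst["age"] = 42          # allowed: "age" is registered
inst[-3] = "note"         # allowed: unregistered, but addressed by its id
print(inst[-2])           # 42
```
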
For the latter two points (saving to a file and constructing new
instances) there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and are not
added to newly constructed data instances. This functionality is used,
for instance, in the above-mentioned basket format, where new meta
attributes are created while the file is loaded; without the flag,
each new instance would contain all the words from the previous
examples.

There is another distinction between optional and non-optional meta
attributes: the latter are `expected to be` present in all instances
of the domain. Saving to a file expects them and fails if a
non-optional meta-value is missing. Optional attributes may be
missing. In most other places these rules are not strictly enforced,
so adhering to them is left to the user.

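The saving and construction rules for optional meta attributes can be sketched as two small functions. The layout of ``metas`` (meta-id mapped to a ``(name, optional_flag)`` pair) and the function names are assumptions made for this illustration; they are not part of Orange.

```python
# Hypothetical registry: meta-id -> (name, optional flag); 0 means non-optional.
metas = {-2: ("weight", 0), -3: ("word_apple", 1), -4: ("word_pear", 1)}


def new_instance_metas(metas):
    """New instances get every non-optional meta attribute, set to unknown."""
    return {mid: "?" for mid, (name, optional) in metas.items() if not optional}


def saved_metas(instance_metas, metas):
    """Return the (name, value) pairs that would be written to a file.

    Optional meta attributes are skipped; a missing non-optional one is an error."""
    saved = []
    for mid, (name, optional) in sorted(metas.items(), reverse=True):
        if optional:
            continue
        if mid not in instance_metas:
            raise ValueError("non-optional meta %r is missing" % name)
        saved.append((name, instance_metas[mid]))
    return saved


inst_metas = new_instance_metas(metas)   # {-2: '?'}
inst_metas[-3] = 1.0                     # an optional word count
print(saved_metas(inst_metas, metas))    # [('weight', '?')]
```
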
Meta attributes can be added and removed even after the domain is
constructed and instances of that domain already exist. For instance,
if ``data`` contains the Monk 1 data set, we can add a new continuous
attribute named "misses" with the following code (a detailed
description of methods related to meta attributes is given below)::

    >>> misses = Orange.data.variable.Continuous("misses")
    >>> id = Orange.data.new_meta_id()
    >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

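The sketch below models the two steps above: obtaining a fresh meta-id and registering it, which touches only the domain. It assumes, in line with the negative ids in the printouts of this page, that ids are unique negative integers handed out in sequence; ``new_meta_id`` and ``ToyDomain`` here are hypothetical stand-ins for the Orange functions.

```python
import itertools

# Assumption: meta-ids are unique negative integers handed out in sequence.
_next_id = itertools.count(-1, -1)


def new_meta_id():
    """Hypothetical stand-in for Orange.data.new_meta_id."""
    return next(_next_id)


class ToyDomain(object):
    def __init__(self):
        self.metas = {}                  # meta-id -> variable name

    def add_meta(self, meta_id, name):
        # Registration only updates the domain's table of meta attributes
        self.metas[meta_id] = name


domain = ToyDomain()
rows = [{"a": "1"}, {"a": "2"}]          # existing instances, as value dicts
misses_id = new_meta_id()
domain.add_meta(misses_id, "misses")
print(domain.metas)                      # {-1: 'misses'}
print(rows)                              # [{'a': '1'}, {'a': '2'}]
```
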
Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

    >>> for inst in data:
    ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

    >>> for inst in data:
    ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

    >>> for inst in data:
    ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta-values of registered meta attributes are shown as
"name:value" pairs, while unregistered ones are shown with their id
instead of a name.

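The difference in printouts can be approximated as follows. ``format_metas`` is a hypothetical helper that only roughly follows Orange's output format (for instance, it prints continuous values with ``repr`` rather than with six decimals).

```python
def format_metas(values, names):
    """Render meta-values roughly as in the printouts on this page.

    `values` maps meta-ids to values and `names` maps the ids of registered
    meta attributes to their names; unregistered ids are printed as numbers."""
    parts = []
    for meta_id in sorted(values, reverse=True):
        value = values[meta_id]
        if meta_id in names:
            parts.append('"%s":%r' % (names[meta_id], value))
        else:
            parts.append("%d:%r" % (meta_id, value))
    return "{" + ", ".join(parts) + "}"


print(format_metas({-4: "1", -9: 0.0}, {-4: "c"}))   # {"c":'1', -9:0.0}
```
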
A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

    >>> for inst in data:
    ...     if inst.get_class() != classifier(inst):
    ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or can be computed from its attributes, they will have
appropriate values; otherwise they will have a "don't know" value. ::

    domain = data.domain
    d2 = Orange.data.Domain(["a", "b", "e", "y"], domain)
    for attr in ["c", "d", "f"]:
        d2.add_meta(Orange.data.new_meta_id(), domain[attr])
    d2.add_meta(Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
    data2 = Orange.data.Table(d2, data)

Domain ``d2`` in this example has variables ``a``, ``b``, ``e`` and the
class, while the other three variables are added as meta attributes,
together with the additional attribute ``X``. The results are as
follows. ::

    >>> print data[55]
    ['1', '2', '1', '1', '4', '2', '0'], {"misses":0.000000}
    >>> print data2[55]
    ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?'}

After conversion, the three attributes are moved to meta attributes
and the new attribute appears as unknown.

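The conversion rule for registered meta attributes, copying what the source provides and marking the rest unknown, can be sketched as below. The sketch assumes instances are given as name-to-value dictionaries and ignores values computed from other attributes; ``convert_with_metas`` is a hypothetical helper.

```python
def convert_with_metas(values, attributes, metas):
    """Convert one instance, given as a name -> value dict.

    `attributes` lists the target domain's features and class, `metas` the
    names of its registered meta attributes; anything the source does not
    provide becomes the unknown value '?'."""
    row = [values.get(name, "?") for name in attributes]
    meta_values = dict((name, values.get(name, "?")) for name in metas)
    return row, meta_values


monk_names = ["a", "b", "c", "d", "e", "f", "y"]
inst = dict(zip(monk_names, ["1", "2", "1", "1", "4", "2", "0"]))  # data[55]
row, meta_values = convert_with_metas(inst, ["a", "b", "e", "y"],
                                      ["c", "d", "f", "X"])
print(row)                   # ['1', '2', '4', '0']
print(meta_values["X"])      # ?
```
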
.. class:: Domain

    .. attribute:: features

        List of domain attributes
        (:obj:`Orange.data.variable.Variables`) without the class
        variable. Read only.

    .. attribute:: variables

        List of domain attributes
        (:obj:`Orange.data.variable.Variables`) including the class
        variable. Read only.

    .. attribute:: class_var

        The class variable (:obj:`Orange.data.variable.Variable`), or
        :obj:`None` if there is none. Read only.

    .. attribute:: class_vars

        A list of additional class attributes. Read only.

    .. attribute:: version

        An integer value that is changed when the domain is
        modified. It can also be used as a unique domain identifier;
        two different domains also have different versions.

    .. method:: __init__(variables[, class_vars=])

        Construct a domain with the given variables; the last one is
        used as the class variable. ::

            >>> a, b, c = [Orange.data.variable.Discrete(x)
            ...            for x in ["a", "b", "c"]]
            >>> d = Orange.data.Domain([a, b, c])
            >>> print d.features
            <EnumVariable 'a', EnumVariable 'b'>
            >>> print d.class_var
            EnumVariable 'c'

        :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
        :type variables: list
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(features, class_variable[, class_vars=])

        Construct a domain with the given list of features and the
        class variable. ::

            >>> d = Orange.data.Domain([a, b], c)
            >>> print d.features
            <EnumVariable 'a', EnumVariable 'b'>
            >>> print d.class_var
            EnumVariable 'c'

        :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
        :type features: list
        :param class_variable: Class variable
        :type class_variable: Orange.data.variable.Variable
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(variables, has_class[, class_vars=])

        Construct a domain with the given variables. If ``has_class`` is
        :obj:`True`, the last one is used as the class variable;
        otherwise all variables are features. ::

            >>> d = Orange.data.Domain([a, b, c], False)
            >>> print d.features
            <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
            >>> print d.class_var
            None

        :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
        :type variables: list
        :param has_class: A flag telling whether the domain has a class
        :type has_class: bool
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(variables, source[, class_vars=])

        Construct a domain with the given variables, which can also be
        specified by names, provided that variables with these names
        exist in the source. The last variable from the list is used
        as the class variable. ::

            >>> d1 = Orange.data.Domain([a, b])
            >>> d2 = Orange.data.Domain(["a", b, c], d1)

        :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
        :type variables: list
        :param source: An existing domain or a list of variables
        :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(variables, has_class, source[, class_vars=])

        Similar to the above, except for the flag which tells whether
        the last variable should be used as the class variable. ::

            >>> d1 = Orange.data.Domain([a, b])
            >>> d2 = Orange.data.Domain(["a", b, c], False, d1)

        :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
        :type variables: list
        :param has_class: A flag telling whether the domain has a class
        :type has_class: bool
        :param source: An existing domain or a list of variables
        :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(domain, class_var[, class_vars=])

        Construct a domain as a shallow copy of an existing domain,
        except that the class variable is replaced with the given one
        and the class variable of the existing domain becomes an
        ordinary feature. If the new class is one of the original
        domain's features, it can also be specified by name.

        :param domain: An existing domain
        :type domain: :obj:`Orange.data.Domain`
        :param class_var: Class variable for the new domain
        :type class_var: string or :obj:`Orange.data.variable.Variable`
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: __init__(domain, has_class=False[, class_vars=])

        Construct a shallow copy of the domain. If the ``has_class``
        flag is given and equals :obj:`False`, the class attribute is
        moved to the ordinary features.

        :param domain: An existing domain
        :type domain: :obj:`Orange.data.Domain`
        :param has_class: A flag telling whether the domain has a class
        :type has_class: bool
        :param class_vars: A list of multiple classes; must be a keyword argument

    .. method:: has_discrete_attributes(include_class=True)

        Return :obj:`True` if the domain has any discrete variables;
        the class is considered unless ``include_class`` is ``False``.

        :param include_class: Tells whether to consider the class variable
        :type include_class: bool
        :rtype: bool

    .. method:: has_continuous_attributes(include_class=True)

        Return :obj:`True` if the domain has any continuous variables;
        the class is considered unless ``include_class`` is ``False``.

        :param include_class: Tells whether to consider the class variable
        :type include_class: bool
        :rtype: bool

    .. method:: has_other_attributes(include_class=True)

        Return :obj:`True` if the domain has any variables which are
        neither discrete nor continuous, such as string variables. The
        class is considered unless ``include_class`` is ``False``.

        :param include_class: Tells whether to consider the class variable
        :type include_class: bool
        :rtype: bool

    .. method:: add_meta(id, variable, optional=0)

        Register a meta attribute with the given id (obtained by
        :obj:`Orange.data.new_meta_id`). The same meta attribute can (and
        should) have the same id when registered in different domains. ::

            >>> newid = Orange.data.new_meta_id()
            >>> d2.add_meta(newid, Orange.data.variable.String("name"))
            >>> data2[55]["name"] = "Joe"
            >>> print data2[55]
            ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?', "name":'Joe'}

        The third argument tells whether the meta attribute is optional or
        not. The parameter is an integer, with any non-zero value meaning
        that the attribute is optional. Different values can be used to
        distinguish between various optional attributes; the meaning of
        the value is not defined in advance and can be used arbitrarily
        by the application.

        :param id: id of the new meta attribute
        :type id: int
        :param variable: variable descriptor
        :type variable: Orange.data.variable.Variable
        :param optional: tells whether the meta attribute is optional
        :type optional: int

    .. method:: add_metas(attributes, optional=0)

        Add multiple meta attributes at once. The dictionary contains ids
        as keys and variables as the corresponding values. The following
        example shows how to add all meta attributes from one domain to
        another::

            newdomain.add_metas(domain.get_metas())

        The optional second argument has the same meaning as in :obj:`add_meta`.

        :param attributes: dictionary of ids and variables
        :type attributes: dict
        :param optional: tells whether the meta attribute is optional
        :type optional: int

    .. method:: remove_meta(attribute)

        Remove one or multiple meta attributes. Removing a meta attribute
        has no effect on data instances.

        :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
        :type attribute: string, int, Orange.data.variable.Variable; or a list

    .. method:: has_attribute(attribute)

        Return :obj:`True` if the domain contains the specified meta attribute.

        :param attribute: attribute to be checked
        :type attribute: string, int, Orange.data.variable.Variable
        :rtype: bool

    .. method:: meta_id(attribute)

        Return the id of a meta attribute.

        :param attribute: name or variable descriptor of the attribute
        :type attribute: string or Orange.data.variable.Variable
        :rtype: int

    .. method:: get_meta(attribute)

        Return a variable descriptor corresponding to the meta attribute.

        :param attribute: name or id of the attribute
        :type attribute: string or int
        :rtype: Orange.data.variable.Variable

    .. method:: get_metas()

        Return a dictionary with meta attribute ids as keys and the
        corresponding variable descriptors as values.

    .. method:: get_metas(optional)

        Return a dictionary with meta attribute ids as keys and the
        corresponding variable descriptors as values; the dictionary
        contains only the meta attributes for which the argument
        ``optional`` matches the flag given when the attributes were
        added using :obj:`add_meta` or :obj:`add_metas`.

        :param optional: flag that specifies the attributes to be returned
        :type optional: int
        :rtype: dict

    .. method:: is_optional_meta(attribute)

        Return :obj:`True` if the given meta attribute is optional and
        :obj:`False` otherwise.

        :param attribute: attribute to be checked
        :type attribute: string, int, Orange.data.variable.Variable
        :rtype: bool
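
The bookkeeping behind ``has_discrete_attributes``, ``add_meta``, ``add_metas``, ``get_metas``, ``meta_id``, ``is_optional_meta`` and ``remove_meta`` can be modeled in a few lines of pure Python. This is a sketch under stated assumptions, not Orange's implementation: ``ToyDomain`` and ``Var`` are hypothetical, meta attributes are kept in a dictionary mapping a meta-id to a ``(variable, optional flag)`` pair, and the two ``get_metas`` signatures are merged into one method whose ``optional=None`` default means "all meta attributes".

```python
from collections import namedtuple

# Hypothetical stand-in for a variable descriptor; `kind` is "discrete",
# "continuous" or anything else (e.g. "string").
Var = namedtuple("Var", "name kind")


class ToyDomain(object):
    """Toy model of the meta-attribute bookkeeping described above."""

    def __init__(self, features, class_var=None):
        self.features = list(features)
        self.class_var = class_var
        self._metas = {}                 # meta-id -> (variable, optional flag)

    def _variables(self, include_class):
        # Features, plus the class variable unless it is excluded or absent
        if include_class and self.class_var is not None:
            return self.features + [self.class_var]
        return self.features

    def has_discrete_attributes(self, include_class=True):
        return any(v.kind == "discrete" for v in self._variables(include_class))

    def add_meta(self, meta_id, variable, optional=0):
        self._metas[meta_id] = (variable, optional)

    def add_metas(self, attributes, optional=0):
        # `attributes` maps meta-ids to variables, as returned by get_metas()
        for meta_id, variable in attributes.items():
            self.add_meta(meta_id, variable, optional)

    def get_metas(self, optional=None):
        # No argument: all metas; otherwise only those with a matching flag
        return dict((mid, var) for mid, (var, flag) in self._metas.items()
                    if optional is None or flag == optional)

    def meta_id(self, attribute):
        name = attribute.name if isinstance(attribute, Var) else attribute
        for mid, (var, _) in self._metas.items():
            if var.name == name:
                return mid
        raise KeyError(name)

    def _resolve(self, attribute):
        # Accept an integer id, a name or a variable descriptor
        if isinstance(attribute, int):
            return attribute
        return self.meta_id(attribute)

    def is_optional_meta(self, attribute):
        return self._metas[self._resolve(attribute)][1] != 0

    def remove_meta(self, attribute):
        # A single attribute or a list of them; no instances are touched
        items = attribute if isinstance(attribute, list) else [attribute]
        for item in items:
            del self._metas[self._resolve(item)]


d = ToyDomain([Var("a", "discrete"), Var("w", "continuous")],
              class_var=Var("y", "discrete"))
d.add_meta(-1, Var("name", "string"))
d.add_meta(-2, Var("word_apple", "continuous"), optional=1)
d.add_meta(-3, Var("word_pear", "continuous"), optional=2)
print(sorted(d.get_metas(0)))                 # [-1]
print(d.is_optional_meta("word_apple"))       # True
```

Note that copying meta attributes with ``add_metas(domain.get_metas())`` registers them all with the same ``optional`` flag (0 by default), since ``get_metas`` returns only the descriptors, not the flags.
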