source: orange/docs/reference/rst/Orange.data.domain.rst @ 9652:51f4370e879a

Revision 9652:51f4370e879a, 18.1 KB checked in by janezd <janez.demsar@…>, 2 years ago

Fixes in documentation of Orange.data.Domain

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of features,
meta attributes and the class attribute that describe data. Domain
descriptors are attached to data instances, data tables,
classifiers and other objects.

Besides describing the data, domain descriptors contain methods for
converting data instances from one domain to another,
e.g. from the original feature space to one with a different set of
features that are selected or constructed from the original set.

The following examples use the domain constructed when reading the data
set `zoo`::

    >>> import Orange
    >>> data = Orange.data.Table("zoo")
    >>> domain = data.domain
    >>> domain
    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
    backbone, breathes, venomous, fins, legs, tail, domestic, catsize,
    type], {-2:name}

Domains consist of ordinary features and the class attribute,
if there is one, and of meta attributes. We will refer to features and
the class attribute as *variables*. Variables are printed out
in a form similar to a list whose elements are attribute names,
and meta attributes are printed like a dictionary whose "keys" are meta
attribute ids and "values" are attribute names. In the above case,
each data instance corresponds to an animal and is described by the
animal's properties and its type (the class); the meta attribute contains
the animal's name.

Domains as lists and dictionaries
=================================

Domains behave like lists: the length of a domain is the number of
variables including the class variable. Domains can be indexed by integer
indices, variable names or instances of
:obj:`Orange.data.variable.Variable`::

    >>> domain["feathers"]
    EnumVariable 'feathers'
    >>> domain[1]
    EnumVariable 'feathers'
    >>> feathers = domain[1]
    >>> domain[feathers]
    EnumVariable 'feathers'

Meta attributes are indexed similarly::

    >>> domain[-2]
    StringVariable 'name'
    >>> domain["name"]
    StringVariable 'name'

Method :obj:`Domain.index` returns the index of a variable specified by a
descriptor or name::

    >>> domain.index("feathers")
    1
    >>> domain.index(feathers)
    1
    >>> domain.index("name")
    -2

Slices can be retrieved, but not set.

Iterating over a domain goes through features and the class variable,
but not through meta attributes::

    >>> for attr in domain:
    ...     print attr.name,
    ...
    hair feathers eggs milk airborne aquatic predator toothed backbone
    breathes venomous fins legs tail domestic catsize type


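The combined list and dictionary behaviour can be illustrated with a small
stand-alone model. This is only a sketch in plain Python, not Orange's
implementation: the class ``ToyDomain`` and its attributes are hypothetical
names, and variables are represented by plain strings rather than by
descriptors. ::

    class ToyDomain(object):
        """Toy model: variables in a list, metas in a dict keyed by negative id."""

        def __init__(self, variables, metas):
            self.variables = list(variables)   # features followed by the class
            self.metas = dict(metas)           # {negative id: name}

        def index(self, key):
            # Integer keys are returned unchanged; names are looked up first
            # among ordinary variables, then among meta attributes.
            if isinstance(key, int):
                return key
            if key in self.variables:
                return self.variables.index(key)
            for meta_id, name in self.metas.items():
                if name == key:
                    return meta_id
            raise KeyError(key)

        def __getitem__(self, key):
            idx = self.index(key)
            return self.metas[idx] if idx < 0 else self.variables[idx]

        def __len__(self):
            return len(self.variables)         # metas are not counted

        def __iter__(self):
            return iter(self.variables)        # metas are not visited


    toy = ToyDomain(["hair", "feathers", "type"], {-2: "name"})
    print(toy.index("feathers"))   # 1
    print(toy["name"])             # name
    print(len(toy))                # 3
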
Conversions between domains
===========================

Domain descriptors can convert instances from one domain to another
(details on construction of domains are described later). ::

     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"],
     ...                                 domain)
     >>> inst = data[55]
     >>> inst
     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
     '1', '0', '1', 'mammal'], {"name":'oryx'}
     >>> inst2 = new_domain(inst)
     >>> inst2
     ['0', '4', 'mammal']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. on a subset of features or
on discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Alternatively, instances can be converted by constructing a new instance
and passing the new domain to the constructor::

     >>> inst2 = Orange.data.Instance(new_domain, inst)

An entire data table can be converted in a similar way::

     >>> data2 = Orange.data.Table(new_domain, data)
     >>> data2[55]
     ['0', '4', 'mammal']


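The way a classifier can rely on its stored training domain may be sketched
as follows. The sketch is plain Python and does not use Orange: instances are
modelled as dictionaries, a domain as a list of variable names, and
``convert`` and ``ToyClassifier`` are hypothetical helpers introduced only
for illustration. ::

    def convert(domain, instance):
        """Pick the values of the variables in ``domain``; '?' marks a missing value."""
        return [instance.get(name, "?") for name in domain]


    class ToyClassifier(object):
        def __init__(self, training_domain):
            self.domain = training_domain      # stored when the model is trained

        def __call__(self, instance):
            # Convert the incoming instance to the training domain first.
            feathers, legs = convert(self.domain, instance)
            # A made-up rule standing in for a trained model.
            return "bird" if feathers == "1" and legs == "2" else "other"


    oryx = {"hair": "1", "feathers": "0", "legs": "4", "type": "mammal"}
    classifier = ToyClassifier(["feathers", "legs"])
    print(convert(["feathers", "legs", "type"], oryx))   # ['0', '4', 'mammal']
    print(classifier(oryx))                              # other
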
.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta-values are additional values that can be attached to instances.
It is not necessary that all instances in the same table (or even all
instances from the same domain) have the same meta attributes. See
documentation on :obj:`Orange.data.Instance` for a more thorough
description of meta-values.

Meta attributes that appear in instances can, but need not, be
listed in the domain. Typically, a meta attribute is included in
the domain for the following reasons.

     * If the domain knows about a meta attribute, its values can be
       obtained with indexing by names and variable descriptors,
       e.g. ``inst["age"]``. Values of unknown meta attributes
       can be obtained only through integer indices (e.g. ``inst[id]``,
       where ``id`` needs to be an integer).

     * When printing out an instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance will show the (more informative) attribute's name
       instead of a meta-id.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it will have all the
       meta attributes listed in the domain, with their values set to
       unknown.

For the last two points - saving to a file and construction of new
instances - there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and not added
to newly constructed data instances.

Another distinction between the optional and non-optional meta
attributes is that the latter are *expected to be* present in all
data instances from that domain. Saving to files will fail
if a non-optional meta value is missing; in most other places,
these rules are not strictly enforced, so adhering to them is largely a
matter of choice.

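The rules for saving can be summarised in a short sketch. It is a plain-Python
illustration under simplifying assumptions, not the code Orange uses:
``metas_to_save`` is a hypothetical function, registered meta attributes are
given as a dictionary mapping an id to a ``(name, optional)`` pair, and the
meta-values of an instance as a dictionary mapping an id to a value. ::

    def metas_to_save(registered, instance_metas):
        """Return {name: value} for the meta-values that would be written."""
        saved = {}
        for meta_id, (name, optional) in registered.items():
            if optional:
                continue                  # optional metas are not saved
            if meta_id not in instance_metas:
                # non-optional metas are expected in every instance
                raise ValueError("missing non-optional meta attribute %r" % name)
            saved[name] = instance_metas[meta_id]
        return saved


    registered = {-2: ("name", 0), -3: ("weight", 1)}
    # id -4 is not registered in the domain, so its value is skipped as well
    print(metas_to_save(registered, {-2: "oryx", -3: 1.5, -4: "x"}))
    # {'name': 'oryx'}
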
While the list of features and the class variable are fixed,
meta attributes can be added and removed at any time (a detailed
description of methods related to meta attributes is given below)::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta-values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of a name.

A meta-attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or can be computed from its variables, they will have
appropriate values; otherwise their value will be missing. ::

    new_domain = Orange.data.Domain(["feathers", "legs"], domain)
    new_domain.add_meta(Orange.data.new_meta_id(), domain["type"])
    new_domain.add_meta(Orange.data.new_meta_id(), domain["legs"])
    new_domain.add_meta(
        Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
    data2 = Orange.data.Table(new_domain, data)

Domain ``new_domain`` in this example has variables ``feathers`` and
``legs`` and meta attributes ``type``, ``legs`` (again) and ``X``, which
is a new feature with no relation to the existing ones. ::

    >>> data[55]
    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
    '4', '1', '0', '1', 'mammal'], {"name":'oryx'}
    >>> data2[55]
    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'}



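How registered meta attributes end up in converted instances can be mimicked
with a few lines of plain Python. This is a sketch under simplifying
assumptions rather than Orange's conversion code: instances are dictionaries
from names to values, and ``convert_with_metas`` is a hypothetical helper. ::

    def convert_with_metas(variables, metas, instance):
        """Return (values, meta_values) of ``instance`` in the target domain."""
        values = [instance.get(name, "?") for name in variables]
        # Every meta registered in the target domain appears in the result;
        # values that cannot be found in the source are missing ('?').
        meta_values = dict((name, instance.get(name, "?")) for name in metas)
        return values, meta_values


    oryx = {"feathers": "0", "legs": "4", "type": "mammal", "name": "oryx"}
    values, meta_values = convert_with_metas(
        ["feathers", "legs"], ["type", "legs", "X"], oryx)
    print(values)                        # ['0', '4']
    print(sorted(meta_values.items()))
    # [('X', '?'), ('legs', '4'), ('type', 'mammal')]
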
.. class:: Domain

     .. attribute:: features

     List of domain attributes
     (:obj:`Orange.data.variable.Variable`) without the class
     variable. Read only.

     .. attribute:: variables

     List of domain attributes
     (:obj:`Orange.data.variable.Variable`) including the class
     variable. Read only.

     .. attribute:: class_var

     The class variable (:obj:`Orange.data.variable.Variable`), or
     :obj:`None` if there is none. Read only.

     .. attribute:: class_vars

     A list of additional class attributes. Read only.

     .. attribute:: version

     An integer value that is changed when the domain is
     modified. The value can also be used as a unique domain identifier; two
     different domains have different values of ``version``.

     .. method:: __init__(variables[, class_vars=])

     Construct a domain with the given variables; the
     last one is used as the class variable. ::

         >>> a, b, c = [Orange.data.variable.Discrete(x) for x in "abc"]
         >>> domain = Orange.data.Domain([a, b, c])
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b'>
         >>> domain.class_var
         EnumVariable 'c'

     :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(features, class_variable[, class_vars=])

     Construct a domain with the given list of features and the
     class variable. ::

         >>> domain = Orange.data.Domain([a, b], c)
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b'>
         >>> domain.class_var
         EnumVariable 'c'

     :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
     :type features: list
     :param class_variable: Class variable
     :type class_variable: Orange.data.variable.Variable
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, has_class[, class_vars=])

     Construct a domain with the given variables. If `has_class` is
     :obj:`True`, the last one is used as the class variable; otherwise
     all variables are ordinary features and the domain has no class. ::

         >>> domain = Orange.data.Domain([a, b, c], False)
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
         >>> domain.class_var is None
         True

     :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, source[, class_vars=])

     Construct a domain with the given variables, which can also be
     specified by names if variables with those names exist in the
     source list. The last variable from the list is used as the class
     variable. ::

         >>> domain1 = Orange.data.Domain([a, b])
         >>> domain2 = Orange.data.Domain(["a", b, c], domain1)

     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param source: An existing domain or a list of variables
     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, has_class, source[, class_vars=])

     Similar to the above, except for the flag which tells whether the
     last variable should be used as the class variable. ::

         >>> domain1 = Orange.data.Domain([a, b], False)
         >>> domain2 = Orange.data.Domain(["a", b, c], False, domain1)

     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param source: An existing domain or a list of variables
     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(domain, class_var[, class_vars=])

     Construct a copy of an existing domain
     except that the class variable is replaced with the given one
     and the class variable of the existing domain becomes an
     ordinary feature. If the new class is one of the original
     domain's features, it can also be specified by a name.

     :param domain: An existing domain
     :type domain: :obj:`Orange.data.Domain`
     :param class_var: Class variable for the new domain
     :type class_var: string or :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(domain, has_class=False[, class_vars=])

     Construct a copy of the domain. If the ``has_class``
     flag is given and is :obj:`False`, the class
     attribute is moved to the ordinary features.

     :param domain: An existing domain
     :type domain: :obj:`Orange.data.Domain`
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: has_discrete_attributes(include_class=True)

     Return :obj:`True` if the domain has any discrete variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

     Return :obj:`True` if the domain has any continuous variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool

     .. method:: has_other_attributes(include_class=True)

     Return :obj:`True` if the domain has any variables which are
     neither discrete nor continuous, such as string variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

     Register a meta attribute with the given id (obtained by
     :obj:`Orange.data.new_meta_id`). The same meta attribute should
     have the same id in all domains in which it is registered. ::

         >>> newid = Orange.data.new_meta_id()
         >>> domain.add_meta(newid, Orange.data.variable.String("origin"))
         >>> data[55]["origin"] = "Nepal"
         >>> data[55]
         ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
         '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'}

     The third argument tells whether the meta attribute is optional or
     not. The parameter is an integer, with any non-zero value meaning that
     the attribute is optional. Different values can be used to distinguish
     between various types of optional attributes; the meaning of the value
     is not defined in advance and can be used arbitrarily by the
     application.

     :param id: id of the new meta attribute
     :type id: int
     :param variable: variable descriptor
     :type variable: Orange.data.variable.Variable
     :param optional: tells whether the meta attribute is optional
     :type optional: int

     .. method:: add_metas(attributes, optional=0)

     Add multiple meta attributes at once. The dictionary contains ids as
     keys and variables (:obj:`~Orange.data.variable.Variable`) as the
     corresponding values. The following example shows how to add all meta
     attributes from one domain to another::

          newdomain.add_metas(domain.get_metas())

     The optional second argument has the same meaning as in :obj:`add_meta`.

     :param attributes: dictionary of ids and variables
     :type attributes: dict
     :param optional: tells whether the meta attribute is optional
     :type optional: int

     .. method:: remove_meta(attribute)

     Remove one or multiple meta attributes. Removing a meta attribute has
     no effect on data instances.

     :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
     :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

     Return True if the domain contains the specified meta attribute.

     :param attribute: attribute to be checked
     :type attribute: string, int, Orange.data.variable.Variable
     :rtype: bool

     .. method:: meta_id(attribute)

     Return the id of a meta attribute.

     :param attribute: name or variable descriptor of the attribute
     :type attribute: string or Orange.data.variable.Variable
     :rtype: int

     .. method:: get_meta(attribute)

     Return the variable descriptor corresponding to the meta attribute.

     :param attribute: name or id of the attribute
     :type attribute: string or int
     :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

     Return a dictionary with meta attribute ids as keys and corresponding
     variable descriptors as values.

     .. method:: get_metas(optional)

     Return a dictionary with meta attribute ids as keys and corresponding
     variable descriptors as values. The dictionary contains only meta
     attributes for which the argument ``optional`` matches the flag given
     when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

     :param optional: flag that specifies the attributes to be returned
     :type optional: int
     :rtype: dict

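     The filtering performed by the second form can be pictured with plain
     Python. The sketch below is not Orange code: ``filter_metas`` is a
     hypothetical function, and registered meta attributes are modelled as a
     dictionary mapping an id to a ``(variable, optional)`` pair. ::

         def filter_metas(registered, optional=None):
             """Return {id: variable}; keep entries whose flag equals ``optional``."""
             return dict((meta_id, var)
                         for meta_id, (var, flag) in registered.items()
                         if optional is None or flag == optional)


         registered = {-2: ("name", 0), -3: ("weight", 1), -4: ("source", 2)}
         print(len(filter_metas(registered)))   # 3
         print(filter_metas(registered, 0))     # {-2: 'name'}
         print(filter_metas(registered, 2))     # {-4: 'source'}
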
     .. method:: is_optional_meta(attribute)

     Return True if the given meta attribute is optional, and False if it is
     not.

     :param attribute: attribute to be checked
     :type attribute: string, int, Orange.data.variable.Variable
     :rtype: bool