source: orange/docs/reference/rst/Orange.data.domain.rst @ 9678:928c30f33a6f

Revision 9678:928c30f33a6f, 18.2 KB checked in by janezd <janez.demsar@…>, 2 years ago

Still more changes to Domain documentation

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of variables and meta
attributes that describe data. A domain descriptor is attached to data
instances, data tables, classifiers and other objects. A descriptor is
constructed, for instance, after reading data from a file.

    >>> import Orange
    >>> data = Orange.data.Table("zoo")
    >>> domain = data.domain
    >>> domain
    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
    backbone, breathes, venomous, fins, legs, tail, domestic, catsize,
    type], {-2:name}

A domain consists of ordinary features (from "hair" to "catsize" in the
above example), the class attribute ("type"), and meta attributes
("name"). We will refer to features and the class attribute as
*variables*. Variables are printed out in a form similar to a list whose
elements are attribute names, and meta attributes are printed like a
dictionary whose "keys" are meta attribute ids and "values" are
attribute names. In the above case, each data instance corresponds to an
animal and is described by the animal's properties and its type (the
class); the meta attribute contains the animal's name.

Domains as lists and dictionaries
=================================

Domains behave like lists: the length of a domain is the number of
variables, including the class variable. Domains can be indexed by integer
indices, variable names or instances of
:obj:`Orange.data.variable.Variable`::

    >>> domain["feathers"]
    EnumVariable 'feathers'
    >>> domain[1]
    EnumVariable 'feathers'
    >>> feathers = domain[1]
    >>> domain[feathers]
    EnumVariable 'feathers'

Meta attributes work the same way::

    >>> domain[-2]
    StringVariable 'name'
    >>> domain["name"]
    StringVariable 'name'


Slices can be retrieved, but not set. Iterating over a domain goes
through the features and the class variable, but not through meta
attributes::

    >>> for attr in domain:
    ...     print attr.name,
    ...
    hair feathers eggs milk airborne aquatic predator toothed backbone
    breathes venomous fins legs tail domestic catsize type

Method :obj:`Domain.index` returns the index of a variable specified by a
descriptor or name::

    >>> domain.index("feathers")
    1
    >>> domain.index(feathers)
    1
    >>> domain.index("name")
    -2


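
The lookup rules above can be mimicked with plain Python containers. The
following is only an illustrative sketch, not Orange's implementation:
``ToyDomain`` and its attributes are hypothetical names, variables are
represented by plain strings, and meta attributes are assumed to live in a
dictionary keyed by negative ids.

```python
class ToyDomain(object):
    """Hypothetical stand-in for a domain; not part of Orange."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)   # features followed by the class
        self.metas = dict(metas or {})     # meta id (negative int) -> name

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration visits features and the class, never meta attributes.
        return iter(self.variables)

    def index(self, key):
        """Return the position of a variable or the id of a meta attribute."""
        if isinstance(key, int):
            # In this toy model, an int is either a meta id or a position.
            if key in self.metas or 0 <= key < len(self.variables):
                return key
            raise IndexError(key)
        if key in self.variables:
            return self.variables.index(key)
        for meta_id, name in self.metas.items():
            if name == key:
                return meta_id
        raise KeyError(key)

    def __getitem__(self, key):
        idx = self.index(key)
        return self.metas[idx] if idx in self.metas else self.variables[idx]


toy = ToyDomain(["hair", "feathers", "type"], {-2: "name"})
```

With this toy object, ``toy["feathers"]`` and ``toy[1]`` resolve to the same
entry, ``toy.index("name")`` yields the meta id, and ``len(toy)`` ignores the
meta attribute.
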
Conversions between domains
===========================

Domain descriptors can convert instances from one domain to another
(details on construction of domains are described later). ::

     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"], domain)
     >>> inst = data[55]
     >>> inst
     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
     '1', '0', '1', 'mammal'], {"name":'oryx'}
     >>> inst2 = new_domain(inst)
     >>> inst2
     ['0', '4', 'mammal']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. on a subset of features or
on discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

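
Conceptually, converting an instance amounts to looking up each variable of
the target domain in the source instance. Below is a minimal sketch with
plain lists, assuming that variables are identified by name only.
``convert`` is a hypothetical helper, not an Orange function; marking absent
variables with ``'?'`` is an assumption of this sketch, and real conversions
can also compute values (for example for discretized features), which is
ignored here.

```python
def convert(values, source_names, target_names, unknown="?"):
    """Reorder `values` (aligned with `source_names`) to match `target_names`."""
    by_name = dict(zip(source_names, values))
    # Variables absent from the source get the unknown marker (assumption).
    return [by_name.get(name, unknown) for name in target_names]


source = ["hair", "feathers", "legs", "type"]
inst = ["1", "0", "4", "mammal"]
inst2 = convert(inst, source, ["feathers", "legs", "type"])
```

Here ``inst2`` keeps only the values of ``feathers``, ``legs`` and ``type``,
in the order requested by the target list.
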
Alternatively, instances can be converted by constructing a new instance
and passing the new domain to the constructor::

     >>> inst2 = Orange.data.Instance(new_domain, inst)

An entire data table can be converted in a similar way::

     >>> data2 = Orange.data.Table(new_domain, data)
     >>> data2[55]
     ['0', '4', 'mammal']


.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta attributes hold additional data attached to individual
instances. Different instances from the same domain or even the same
table may have different meta attributes. (See documentation on
:obj:`Orange.data.Instance` for details about meta values.)

Meta attributes that appear in instances can, but need not, be
listed in the domain. Typically, a meta attribute is included in
the domain for the following reasons.

     * If the domain knows about meta attributes, their values can be
       obtained with indexing by names and variable descriptors,
       e.g. ``inst["age"]``. Values of unknown meta attributes
       can be obtained only through integer indices (e.g. ``inst[id]``,
       where ``id`` needs to be an integer).

     * When printing out a data instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance shows the (more informative) attribute name
       instead of a meta id.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it will have all the
       meta attributes listed in the domain, with their values set to
       unknown.

A meta attribute can be marked as "optional". Non-optional meta
attributes are *expected to be* present in all data instances from that
domain. This rule is not strictly enforced. As one of the few places
where the difference matters, saving to files fails if a non-optional
meta value is missing; optional attributes are not written to the file
at all. Also, newly constructed data instances initially have all the
non-optional meta attributes.

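
The saving rule can be pictured as follows. This is a sketch of the behaviour
described above, not Orange's file writer: ``metas_to_save`` is a
hypothetical function, and registered meta attributes are modelled as a
dictionary that maps names to their ``optional`` flag.

```python
def metas_to_save(registered, instance_metas):
    """Select the meta values that would be written for one instance.

    registered     -- dict: meta attribute name -> optional flag (int)
    instance_metas -- dict: meta attribute name -> value in this instance
    """
    saved = {}
    for name, optional in registered.items():
        if optional:
            continue  # optional meta attributes are never written
        if name not in instance_metas:
            # a missing non-optional value makes saving fail
            raise ValueError("missing non-optional meta value: %s" % name)
        saved[name] = instance_metas[name]
    return saved


registered = {"name": 0, "misses": 1}
row = metas_to_save(registered, {"name": "oryx", "misses": 3})
```

In this model, ``row`` contains only the non-optional ``name`` value, and an
instance without ``name`` raises an error instead of being written.
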
While the list of features and the class variable are immutable,
meta attributes can be added and removed at any time::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances. Methods related to meta attributes are described in more
detail later.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta values of registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of the name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or if they can be computed from them, they will have
appropriate values; otherwise their value will be missing. ::

    new_domain = Orange.data.Domain(["feathers", "legs"], domain)
    new_domain.add_meta(Orange.data.new_meta_id(), domain["type"])
    new_domain.add_meta(Orange.data.new_meta_id(), domain["legs"])
    new_domain.add_meta(
        Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
    data2 = Orange.data.Table(new_domain, data)

Domain ``new_domain`` in this example has variables ``feathers`` and
``legs`` and meta attributes ``type``, ``legs`` (again) and ``X``, which
is a new feature with no relation to the existing ones. ::

    >>> data[55]
    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
    '4', '1', '0', '1', 'mammal'], {"name":'oryx'}
    >>> data2[55]
    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'}



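
The effect on converted instances can be imitated with dictionaries. The
sketch below is not how Orange performs the conversion;
``convert_with_metas`` is a hypothetical helper that looks values up by name
and marks anything it cannot find as unknown (``'?'``).

```python
def convert_with_metas(instance, variables, metas, unknown="?"):
    """Split a name -> value mapping into ordinary values and meta values."""
    # Ordinary variables keep the order requested by the target domain.
    values = [instance.get(name, unknown) for name in variables]
    # Every meta attribute registered in the target appears in the result.
    meta_values = dict((name, instance.get(name, unknown)) for name in metas)
    return values, meta_values


oryx = {"feathers": "0", "legs": "4", "type": "mammal", "name": "oryx"}
values, meta_values = convert_with_metas(
    oryx, ["feathers", "legs"], ["type", "legs", "X"])
```

As in the example with ``data2[55]``, the unrelated attribute ``X`` ends up
with an unknown value, while ``type`` and ``legs`` are copied from the source.
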
.. class:: Domain

     .. attribute:: features

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes
         (:obj:`~Orange.data.variable.Variables`) including the class
         variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`~Orange.data.variable.Variable`), or
         :obj:`None` if there is none. Read only.

     .. attribute:: class_vars

         A list of additional class attributes. Read only.

     .. attribute:: version

         An integer value that is changed when the domain is
         modified. The value can also be used as a unique domain
         identifier; two different domains have different values of
         ``version``.

     .. method:: __init__(variables[, class_vars=])

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.data.variable.Discrete(x) for x in "abc"]
             >>> domain = Orange.data.Domain([a, b, c])
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(features, class_variable[, class_vars=])

         Construct a domain with the given list of features and the
         class variable. ::

             >>> domain = Orange.data.Domain([a, b], c)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`~Orange.data.variable.Variable`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.data.variable.Variable
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class[, class_vars=])

         Construct a domain with the given variables. If `has_class` is
         :obj:`True`, the last one is used as the class variable;
         otherwise the domain has no class. ::

             >>> domain = Orange.data.Domain([a, b, c], False)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
             >>> domain.class_var is None
             True

         :param variables: List of variables (instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, source[, class_vars=])

         Construct a domain with the given variables, which can also be
         specified by name if variables with those names exist in the
         source list. The last variable from the list is used as the class
         variable. ::

             >>> domain1 = Orange.data.Domain([a, b])
             >>> domain2 = Orange.data.Domain(["a", b, c], domain1)

         :param variables: List of variables (strings or instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class, source[, class_vars=])

         Similar to the above, except for the flag that tells whether the
         last variable should be used as the class variable. ::

             >>> domain1 = Orange.data.Domain([a, b], False)
             >>> domain2 = Orange.data.Domain(["a", b, c], False, domain1)

         :param variables: List of variables (strings or instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, class_var[, class_vars=])

         Construct a copy of an existing domain
         except that the class variable is replaced with the given one
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, it can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: string or :obj:`~Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, has_class=False[, class_vars=])

         Construct a copy of the domain. If the ``has_class``
         flag is given and is :obj:`False`, the class
         attribute becomes an ordinary feature.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: has_discrete_attributes(include_class=True)

         Return :obj:`True` if the domain has any discrete variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return :obj:`True` if the domain has any continuous variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return :obj:`True` if the domain has any variables that are
         neither discrete nor continuous, such as string variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (obtained by
         :obj:`Orange.data.new_meta_id`). The same meta attribute should
         have the same id in all domains in which it is registered. ::

             >>> newid = Orange.data.new_meta_id()
             >>> domain.add_meta(newid, Orange.data.variable.String("origin"))
             >>> data[55]["origin"] = "Nepal"
             >>> data[55]
             ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
             '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'}

         The third argument tells whether the meta attribute is optional or
         not. The parameter is an integer, with any non-zero value meaning that
         the attribute is optional. Different values can be used to distinguish
         between various types of optional attributes; the meaning of the value
         is not defined in advance and can be used arbitrarily by the
         application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.data.variable.Variable
         :param optional: tells whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The dictionary contains ids as
         keys and variables (:obj:`~Orange.data.variable.Variable`) as the
         corresponding values. The following example shows how to add all
         meta attributes from one domain to another::

              newdomain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or multiple meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

         Return :obj:`True` if the domain contains the specified meta attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.data.variable.Variable
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and corresponding
         variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and corresponding
         variable descriptors as values. The dictionary contains only meta
         attributes for which the argument ``optional`` matches the flag given
         when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return :obj:`True` if the given meta attribute is optional, and
         :obj:`False` if it is not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool
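
The filtering performed by ``get_metas(optional)`` can be illustrated with a
small registry. This is a sketch under the assumption that each registered
meta attribute simply remembers the flag it was added with;
``ToyMetaRegistry`` is a hypothetical class, not part of Orange, and
variables are represented by plain strings.

```python
class ToyMetaRegistry(object):
    """Hypothetical model of a domain's meta attribute bookkeeping."""

    def __init__(self):
        self._metas = {}  # meta id -> (variable, optional flag)

    def add_meta(self, meta_id, variable, optional=0):
        self._metas[meta_id] = (variable, optional)

    def get_metas(self, optional=None):
        # Without an argument, return everything; otherwise match the flag.
        return dict((meta_id, var)
                    for meta_id, (var, flag) in self._metas.items()
                    if optional is None or flag == optional)

    def is_optional_meta(self, meta_id):
        # Any non-zero flag marks the attribute as optional.
        return self._metas[meta_id][1] != 0


registry = ToyMetaRegistry()
registry.add_meta(-2, "name")
registry.add_meta(-3, "origin", optional=1)
registry.add_meta(-4, "source", optional=2)
```

In this model, ``registry.get_metas(1)`` returns only the attribute added with
flag 1, which shows how different non-zero values can separate application
defined groups of optional attributes.
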