source: orange/docs/reference/rst/Orange.data.domain.rst @ 9704:4eaf240118e8

Revision 9704:4eaf240118e8, 18.7 KB checked in by janezd <janez.demsar@…>, 2 years ago (diff)

Finished polishing documentation about Domain

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of variables and meta
attributes that describe data. A domain descriptor is attached to data
instances, data tables, classifiers and other objects. A descriptor is
constructed, for instance, after reading data from a file.

    >>> data = Orange.data.Table("zoo")
    >>> domain = data.domain
    >>> domain
    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
    backbone, breathes, venomous, fins, legs, tail, domestic, catsize,
    type], {-2:name}

A domain consists of ordinary features (from "hair" to "catsize" in the
above example), the class attribute ("type"), and meta attributes
("name"). We will refer to features and the class attribute as
*variables*. Variables are printed out in a form similar to a list whose
elements are attribute names, and meta attributes are printed like a
dictionary whose "keys" are meta attribute ids and "values" are
attribute names. In the above case, each data instance corresponds to an
animal and is described by the animal's properties and its type (the
class); the meta attribute contains the animal's name.

Domains as lists and dictionaries
=================================

Domains behave like lists: the length of a domain is the number of
variables, including the class variable. Domains can be indexed by integer
indices, variable names or instances of
:obj:`Orange.data.variable.Variable`::

    >>> domain["feathers"]
    EnumVariable 'feathers'
    >>> domain[1]
    EnumVariable 'feathers'
    >>> feathers = domain[1]
    >>> domain[feathers]
    EnumVariable 'feathers'

Meta attributes work the same way::

    >>> domain[-2]
    StringVariable 'name'
    >>> domain["name"]
    StringVariable 'name'


Slices can be retrieved, but not set. Iterating over a domain goes
through the features and the class variable, but not through meta
attributes::

    >>> for attr in domain:
    ...     print attr.name,
    ...
    hair feathers eggs milk airborne aquatic predator toothed backbone
    breathes venomous fins legs tail domestic catsize type

The method :obj:`Domain.index` returns the index of a variable specified
by a descriptor or name::

    >>> domain.index("feathers")
    1
    >>> domain.index(feathers)
    1
    >>> domain.index("name")
    -2

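As a rough illustration of the lookup rules above, the following
standalone sketch mimics list-like and dictionary-like access with a toy
class. ``ToyVariable`` and ``ToyDomain`` are hypothetical names invented
for this example and do not reproduce Orange's actual implementation; the
sketch only assumes, as in the zoo example, that meta attributes are keyed
by negative ids.

```python
class ToyVariable(object):
    """Hypothetical minimal descriptor: it only carries a name."""

    def __init__(self, name):
        self.name = name


class ToyDomain(object):
    """Hypothetical stand-in for a domain; not Orange's implementation."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)   # features followed by the class
        self.metas = dict(metas or {})     # meta id (negative int) -> variable

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration covers features and the class, never meta attributes.
        return iter(self.variables)

    def index(self, key):
        """Return the position of a variable, or the id of a meta attribute."""
        for position, var in enumerate(self.variables):
            if var is key or var.name == key:
                return position
        for meta_id, var in self.metas.items():
            if var is key or var.name == key:
                return meta_id
        raise KeyError(key)

    def __getitem__(self, key):
        if not isinstance(key, int):
            key = self.index(key)
        # Negative integers are treated as meta ids in this sketch.
        return self.metas[key] if key < 0 else self.variables[key]


toy = ToyDomain(
    [ToyVariable(n) for n in ("hair", "feathers", "eggs", "type")],
    metas={-2: ToyVariable("name")},
)
feathers = toy["feathers"]
```

In this sketch ``toy[1]``, ``toy["feathers"]`` and ``toy[feathers]`` all
return the same descriptor object, while ``toy.index("name")`` returns the
meta id rather than a list position.
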

Conversions between domains
===========================

Domain descriptors can convert instances from one domain to another
(details on construction of domains are described later). ::

     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"], domain)
     >>> inst = data[55]
     >>> inst
     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
     '1', '0', '1', 'mammal'], {"name":'oryx'}
     >>> inst2 = new_domain(inst)
     >>> inst2
     ['0', '4', 'mammal']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. on a subset of features or
on discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Alternatively, instances can be converted by constructing a new instance
and passing the new domain to the constructor::

     >>> inst2 = Orange.data.Instance(new_domain, inst)

An entire data table can be converted in a similar way::

     >>> data2 = Orange.data.Table(new_domain, data)
     >>> data2[55]
     ['0', '4', 'mammal']

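The effect of such a conversion can be sketched without Orange as a
lookup of values by variable name. The helper ``convert_by_name`` below is
a hypothetical simplification written for this illustration; real
conversion works with variable descriptors and can also compute values,
which this sketch ignores.

```python
# Names of the zoo variables in domain order; the class ("type") is last.
ZOO_NAMES = [
    "hair", "feathers", "eggs", "milk", "airborne", "aquatic", "predator",
    "toothed", "backbone", "breathes", "venomous", "fins", "legs", "tail",
    "domestic", "catsize", "type",
]

# Values of instance 55 (the oryx) as printed in the example above.
ORYX = ["1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "4",
        "1", "0", "1", "mammal"]


def convert_by_name(values, source_names, target_names):
    """Reorder and subset values to match target_names.

    Variables missing from the source get the unknown marker '?'.
    """
    by_name = dict(zip(source_names, values))
    return [by_name.get(name, "?") for name in target_names]


converted = convert_by_name(ORYX, ZOO_NAMES, ["feathers", "legs", "type"])
```

Here ``converted`` holds the values of ``feathers``, ``legs`` and ``type``
in the order requested by the target list.
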

.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta attributes hold additional data attached to individual
instances. Different instances from the same domain or even the same
table may have different meta attributes. (See documentation on
:obj:`Orange.data.Instance` for details about meta values.)

Meta attributes that appear in instances can, but need not, be
listed in the domain. Typically, a meta attribute is included in
the domain for the following reasons.

     * If the domain knows about meta attributes, their values can be
       obtained with indexing by names and variable descriptors,
       e.g. ``inst["age"]``. Values of unknown meta attributes
       can be obtained only through integer indices (e.g. ``inst[id]``,
       where ``id`` needs to be an integer).

     * When printing out a data instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance will show the (more informative) attribute name
       instead of a meta id.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it will have all the
       meta attributes listed in the domain, with their values set to
       unknown.

A meta attribute can be marked as "optional". Non-optional meta
attributes are *expected to be* present in all data instances from that
domain. This rule is not strictly enforced. As one of the few places
where the difference matters, saving to files fails if a non-optional
meta value is missing; optional attributes are not written to the file
at all. Also, newly constructed data instances initially have all the
non-optional meta attributes.

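The difference between optional and non-optional meta attributes can be
modelled with a small standalone sketch. ``ToyMetaRegistry`` and its
methods ``new_instance_metas`` and ``metas_to_save`` are hypothetical
names made up for this illustration; they only encode the rules stated in
the paragraph above and say nothing about how Orange implements them.

```python
UNKNOWN = "?"  # marker for an unknown value in this sketch


class ToyMetaRegistry(object):
    """Hypothetical registry of meta attributes with an 'optional' flag."""

    def __init__(self):
        self._metas = {}  # meta id -> (name, optional flag)

    def add_meta(self, meta_id, name, optional=0):
        # A non-zero flag marks the meta attribute as optional.
        self._metas[meta_id] = (name, optional)

    def is_optional_meta(self, meta_id):
        return self._metas[meta_id][1] != 0

    def new_instance_metas(self):
        """Metas of a newly constructed instance: the non-optional ones."""
        return dict(
            (name, UNKNOWN)
            for name, optional in self._metas.values()
            if optional == 0
        )

    def metas_to_save(self, instance_metas):
        """Select the meta values to write; optional ones are skipped."""
        saved = {}
        for name, optional in self._metas.values():
            if optional != 0:
                continue
            if name not in instance_metas:
                raise ValueError("missing non-optional meta: %s" % name)
            saved[name] = instance_metas[name]
        return saved


registry = ToyMetaRegistry()
registry.add_meta(-2, "name")                  # non-optional
registry.add_meta(-3, "comment", optional=1)   # optional
fresh = registry.new_instance_metas()
written = registry.metas_to_save({"name": "oryx", "comment": "seen at dusk"})
```

In this model, calling ``metas_to_save`` with a dictionary that lacks
``"name"`` raises an error, mirroring the failed save described above.
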
While the list of features and the class variable are immutable,
meta attributes can be added and removed at any time::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances. Methods related to meta attributes are described in more
detail later.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of a name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or if they can be computed from it, they will have
appropriate values; otherwise their value will be missing. ::

    new_domain = Orange.data.Domain(["feathers", "legs"], domain)
    new_domain.add_meta(Orange.data.new_meta_id(), domain["type"])
    new_domain.add_meta(Orange.data.new_meta_id(), domain["legs"])
    new_domain.add_meta(
        Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
    data2 = Orange.data.Table(new_domain, data)

Domain ``new_domain`` in this example has variables ``feathers`` and
``legs`` and meta attributes ``type``, ``legs`` (again) and ``X``, which
is a new feature with no relation to the existing ones. ::

    >>> data[55]
    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
    '4', '1', '0', '1', 'mammal'], {"name":'oryx'}
    >>> data2[55]
    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'}

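The way registered meta attributes are filled in during conversion can be
approximated with a short standalone sketch. The function
``convert_with_metas`` is a hypothetical helper that matches variables by
name only; it is an assumption made for illustration and leaves out
computed values and meta ids.

```python
ZOO_NAMES = [
    "hair", "feathers", "eggs", "milk", "airborne", "aquatic", "predator",
    "toothed", "backbone", "breathes", "venomous", "fins", "legs", "tail",
    "domestic", "catsize", "type",
]
ORYX = ["1", "0", "0", "1", "0", "0", "0", "1", "1", "1", "0", "0", "4",
        "1", "0", "1", "mammal"]


def convert_with_metas(values, source_names, variable_names, meta_names):
    """Return (variables, metas) for the target domain.

    Values are copied from the source where the name is known there;
    anything else, such as a brand new attribute, becomes unknown ('?').
    """
    by_name = dict(zip(source_names, values))
    variables = [by_name.get(name, "?") for name in variable_names]
    metas = dict((name, by_name.get(name, "?")) for name in meta_names)
    return variables, metas


variables, metas = convert_with_metas(
    ORYX, ZOO_NAMES, ["feathers", "legs"], ["type", "legs", "X"])
```

The result pairs the two variables with a dictionary of three meta values,
of which ``X`` is unknown because nothing in the source domain provides it.
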


.. class:: Domain

     .. attribute:: features

         List of domain attributes
         (of type :obj:`Orange.data.variable.Variables`) without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes including the class variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`~Orange.data.variable.Variable`) or
         ``None``. Read only.

     .. attribute:: class_vars

         A list of additional class attributes. Read only.

     .. attribute:: version

         An integer value that changes whenever the domain is
         modified. The value can also be used as a unique domain
         identifier; two different domains have different values of
         ``version``.

     .. method:: __init__(variables[, class_vars=])

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.data.variable.Discrete(x) for x in "abc"]
             >>> domain = Orange.data.Domain([a, b, c])
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(features, class_variable[, class_vars=])

         Construct a domain with the given list of features and the
         class variable. ::

             >>> domain = Orange.data.Domain([a, b], c)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`~Orange.data.variable.Variable`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.data.variable.Variable
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class[, class_vars=])

         Construct a domain with the given variables. If ``has_class``
         is ``True``, the last variable is the class. ::

             >>> domain = Orange.data.Domain([a, b, c], True)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, source[, class_vars=])

         Construct a domain with the given variables. Variables specified
         by name are looked up in the ``source`` argument. The last
         variable from the list is used as the class variable. ::

             >>> domain1 = Orange.data.Domain([a, b])
             >>> domain2 = Orange.data.Domain(["a", b, c], domain)

         :param variables: List of variables (strings or instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class, source[, class_vars=])

         Similar to the above, except for the flag that tells whether the
         last variable should be used as the class variable. ::

             >>> domain1 = Orange.data.Domain([a, b], False)
             >>> domain2 = Orange.data.Domain(["a", b, c], False, domain)

         :param variables: List of variables (strings or instances of :obj:`~Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, class_var[, class_vars=])

         Construct a copy of an existing domain, except that the class
         variable is replaced with the one specified in the argument
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, ``class_var`` can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: :obj:`~Orange.data.variable.Variable` or string
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, has_class=False[, class_vars=])

         Construct a copy of the domain. If the ``has_class``
         flag is given and is :obj:`False`, the class attribute becomes
         an ordinary feature.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param has_class: A flag indicating whether the domain will have a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: has_discrete_attributes(include_class=True)

         Return ``True`` if the domain has any discrete variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return ``True`` if the domain has any continuous variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return ``True`` if the domain has any variables that are
         neither discrete nor continuous, such as string
         variables. The class is included unless ``include_class`` is
         ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (see
         :obj:`Orange.data.new_meta_id`). The same meta attribute should
         have the same id in all domains in which it is registered. ::

             >>> newid = Orange.data.new_meta_id()
             >>> domain.add_meta(newid, Orange.data.variable.String("origin"))
             >>> data[55]["origin"] = "Nepal"
             >>> data[55]
             ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
             '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'}

         The third argument tells whether the meta attribute is optional or
         not; non-zero values indicate optional attributes. Different
         values can be used to distinguish between various types of
         optional attributes; the meaning of the value is not defined in
         advance and can be used arbitrarily by the application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.data.variable.Variable
         :param optional: indicates whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The dictionary contains ids as
         keys and variables (:obj:`~Orange.data.variable.Variable`) as the
         corresponding values. The following example shows how to add all
         meta attributes from another domain::

             >>> newdomain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or multiple meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

         Return ``True`` if the domain contains the specified meta
         attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.data.variable.Variable
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and
         corresponding variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and
         corresponding variable descriptors as values. The dictionary
         contains only meta attributes for which the argument ``optional``
         matches the flag given when the attributes were added using
         :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return ``True`` if the given meta attribute is optional,
         and ``False`` if it is not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool