source: orange/docs/reference/rst/Orange.data.domain.rst @ 9664:6638cc93015a

Revision 9664:6638cc93015a, 17.8 KB checked in by janezd <janez.demsar@…>, 2 years ago (diff)

Small fixed in documention of Orange.data.domain

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of variables and meta
attributes that describe data. A domain descriptor is attached to data
instances, data tables, classifiers and other objects. A descriptor is
constructed, for instance, after reading data from a file.

    >>> data = Orange.data.Table("zoo")
    >>> domain = data.domain
    >>> domain
    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
    backbone, breathes, venomous, fins, legs, tail, domestic, catsize,
    type], {-2:name}

Domains consist of ordinary features (from "hair" to "catsize" in the
above example), the class attribute ("type"), and meta attributes
("name"). We will refer to features and the class attribute as
*variables*. Variables are printed in a form similar to a list whose
elements are attribute names, and meta attributes are printed like a
dictionary whose "keys" are meta attribute ids and "values" are
attribute names. In the above case, each data instance corresponds to an
animal and is described by the animal's properties and its type (the
class); the meta attribute contains the animal's name.

Domains as lists and dictionaries
=================================

Domains behave like lists: the length of a domain is the number of
variables including the class variable. Domains can be indexed by integer
indices, variable names or instances of
:obj:`Orange.data.variable.Variable`::

    >>> domain["feathers"]
    EnumVariable 'feathers'
    >>> domain[1]
    EnumVariable 'feathers'
    >>> feathers = domain[1]
    >>> domain[feathers]
    EnumVariable 'feathers'

Meta attributes work the same way, with the meta attribute id used as the
integer index::

    >>> domain[-2]
    StringVariable 'name'
    >>> domain["name"]
    StringVariable 'name'


Slices can be retrieved, but not set. Iterating over a domain goes
through features and the class variable, but not through meta attributes::

    >>> for attr in domain:
    ...     print attr.name,
    ...
    hair feathers eggs milk airborne aquatic predator toothed backbone
    breathes venomous fins legs tail domestic catsize type

Method :obj:`Domain.index` returns the index of a variable specified by a
descriptor or name::

    >>> domain.index("feathers")
    1
    >>> domain.index(feathers)
    1
    >>> domain.index("name")
    -2

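The lookup rules above can be pictured with a small pure-Python model. This is
only an illustrative sketch, not Orange's implementation: ``Variable`` and
``ToyDomain`` are hypothetical stand-ins, and the sketch assumes that variables
sit in a list while meta attributes sit in a dictionary keyed by negative ids,
as the printouts above suggest.

```python
class Variable(object):
    """Hypothetical stand-in for a variable descriptor."""

    def __init__(self, name):
        self.name = name


class ToyDomain(object):
    """Toy model: variables in a list, meta attributes in a dict keyed by id."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)
        self.metas = dict(metas or {})

    def __len__(self):
        # Only features and the class variable count, not meta attributes.
        return len(self.variables)

    def __iter__(self):
        # Iteration skips meta attributes as well.
        return iter(self.variables)

    def index(self, key):
        """Return a list position for a variable, or the id of a meta attribute."""
        if isinstance(key, int):
            return key
        for position, var in enumerate(self.variables):
            if var is key or var.name == key:
                return position
        for meta_id, var in self.metas.items():
            if var is key or var.name == key:
                return meta_id
        raise KeyError(key)

    def __getitem__(self, key):
        i = self.index(key)
        # Non-negative indices address variables; negative ids address metas.
        return self.variables[i] if i >= 0 else self.metas[i]


names = ["hair", "feathers", "eggs", "type"]
toy = ToyDomain([Variable(n) for n in names], {-2: Variable("name")})
feathers_var = toy[1]
```

In this model ``toy["feathers"]``, ``toy[1]`` and ``toy[feathers_var]`` all
return the same object, while ``toy.index("name")`` yields the meta id ``-2``.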

Conversions between domains
===========================

Domain descriptors can convert instances from one domain to another
(details on construction of domains are described later). ::

     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"], domain)
     >>> inst = data[55]
     >>> inst
     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
     '1', '0', '1', 'mammal'], {"name":'oryx'}
     >>> inst2 = new_domain(inst)
     >>> inst2
     ['0', '4', 'mammal']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. on a subset of features or
on discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Alternatively, instances can be converted by constructing a new instance
and passing the new domain to the constructor::

     >>> inst2 = Orange.data.Instance(new_domain, inst)

An entire data table can be converted in a similar way::

     >>> data2 = Orange.data.Table(new_domain, data)
     >>> data2[55]
     ['0', '4', 'mammal']

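As a rough illustration of what such a conversion does when the new domain is
a subset of the old one, the following pure-Python sketch projects a list of
values onto a shorter list of variable names. The function ``project`` is
hypothetical and does not use Orange; real conversions can also compute new
values (for example for discretized features), which this sketch ignores.

```python
def project(values, source_names, target_names):
    """Pick the values of target_names out of an instance laid out by source_names."""
    position = {name: i for i, name in enumerate(source_names)}
    return [values[position[name]] for name in target_names]


zoo_names = ["hair", "feathers", "eggs", "milk", "airborne", "aquatic",
             "predator", "toothed", "backbone", "breathes", "venomous",
             "fins", "legs", "tail", "domestic", "catsize", "type"]
oryx = ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
        '1', '0', '1', 'mammal']

# Same selection as new_domain in the example above.
projected = project(oryx, zoo_names, ["feathers", "legs", "type"])
print(projected)  # ['0', '4', 'mammal']
```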

.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta attributes hold additional data attached to individual
instances. Different instances from the same domain or even the same
table may have different meta attributes. See documentation on
:obj:`Orange.data.Instance` for a more thorough description of meta
values.

Meta attributes that appear in instances can, but need not, be
listed in the domain. Typically, a meta attribute is included in
the domain for the following reasons.

     * If the domain knows about a meta attribute, its values can be
       obtained by indexing with names and variable descriptors,
       e.g. ``inst["age"]``. Values of unknown meta attributes
       can be obtained only through integer indices (e.g. ``inst[id]``,
       where ``id`` needs to be an integer).

     * When printing out an instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance will show the (more informative) attribute name
       instead of a meta id.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it will have all the
       meta attributes listed in the domain, with their values set to
       unknown.

For the latter two points (saving to a file and construction of new
instances) there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and not added
to newly constructed data instances.

Another distinction between the optional and non-optional meta
attributes is that the latter are *expected to be* present in all
data instances from that domain. Saving to files will fail
if a non-optional meta value is missing; in most other places,
these rules are not strictly enforced, so adhering to them is largely
a matter of choice.

While the list of features and the class variable are fixed,
meta attributes can be added and removed at any time (a detailed
description of methods related to meta attributes is given below)::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of a name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or if they can be computed from them, they will have
appropriate values; otherwise their value will be missing. ::

    new_domain = Orange.data.Domain(["feathers", "legs"], domain)
    new_domain.add_meta(Orange.data.new_meta_id(), domain["type"])
    new_domain.add_meta(Orange.data.new_meta_id(), domain["legs"])
    new_domain.add_meta(
        Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
    data2 = Orange.data.Table(new_domain, data)

Domain ``new_domain`` in this example has variables ``feathers`` and
``legs`` and meta attributes ``type``, ``legs`` (again) and ``X``, which
is a new feature with no relation to the existing ones. ::

    >>> data[55]
    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
    '4', '1', '0', '1', 'mammal'], {"name":'oryx'}
    >>> data2[55]
    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'}

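The way registered meta attributes are filled in during conversion can be
sketched without Orange. The helper ``convert_with_metas`` below is
hypothetical: it treats an instance as two dictionaries (variable values and
meta values, both keyed by name) and marks anything it cannot find as ``'?'``.
It only mimics the outcome shown above and says nothing about how Orange does
this internally.

```python
def convert_with_metas(values, metas, target_vars, target_metas):
    """Return (variable values, meta values) for the target domain.

    values and metas map names to values in the source instance;
    target_vars and target_metas list the names registered in the
    target domain. Anything not found in the source becomes '?'.
    """
    known = dict(values)
    known.update(metas)
    new_values = [known.get(name, "?") for name in target_vars]
    new_metas = {name: known.get(name, "?") for name in target_metas}
    return new_values, new_metas


# Only the parts of the oryx instance that matter for this example.
oryx_values = {"feathers": "0", "legs": "4", "type": "mammal"}
oryx_metas = {"name": "oryx"}

new_values, new_metas = convert_with_metas(
    oryx_values, oryx_metas, ["feathers", "legs"], ["type", "legs", "X"])
```

Here ``new_values`` holds the two features, while ``new_metas`` contains
``type`` and ``legs`` copied from the source and ``X`` set to ``'?'``.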


.. class:: Domain

     .. attribute:: features

     List of domain attributes
     (:obj:`Orange.data.variable.Variable`) without the class
     variable. Read only.

     .. attribute:: variables

     List of domain attributes
     (:obj:`Orange.data.variable.Variable`) including the class
     variable. Read only.

     .. attribute:: class_var

     The class variable (:obj:`Orange.data.variable.Variable`), or
     :obj:`None` if there is none. Read only.

     .. attribute:: class_vars

     A list of additional class attributes. Read only.

     .. attribute:: version

     An integer value that is changed when the domain is
     modified. The value can also be used as a unique domain identifier; two
     different domains have different values of ``version``.

     .. method:: __init__(variables[, class_vars=])

     Construct a domain with the given variables; the
     last one is used as the class variable. ::

         >>> a, b, c = [Orange.data.variable.Discrete(x) for x in "abc"]
         >>> domain = Orange.data.Domain([a, b, c])
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b'>
         >>> domain.class_var
         EnumVariable 'c'

     :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(features, class_variable[, class_vars=])

     Construct a domain with the given list of features and the
     class variable. ::

         >>> domain = Orange.data.Domain([a, b], c)
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b'>
         >>> domain.class_var
         EnumVariable 'c'

     :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
     :type features: list
     :param class_variable: Class variable
     :type class_variable: Orange.data.variable.Variable
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, has_class[, class_vars=])

     Construct a domain with the given variables. If `has_class` is
     :obj:`True`, the last one is used as the class variable; otherwise
     all variables are ordinary features and the domain has no class. ::

         >>> domain = Orange.data.Domain([a, b, c], False)
         >>> domain.features
         <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
         >>> domain.class_var is None
         True

     :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, source[, class_vars=])

     Construct a domain with the given variables, which can also be
     specified by names if variables with those names exist in the
     source list. The last variable from the list is used as the class
     variable. ::

         >>> domain1 = Orange.data.Domain([a, b])
         >>> domain2 = Orange.data.Domain(["a", b, c], domain1)

     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param source: An existing domain or a list of variables
     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(variables, has_class, source[, class_vars=])

     Similar to the above, except for the flag that tells whether the
     last variable should be used as the class variable. ::

         >>> domain1 = Orange.data.Domain([a, b], False)
         >>> domain2 = Orange.data.Domain(["a", b, c], False, domain1)

     :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
     :type variables: list
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param source: An existing domain or a list of variables
     :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(domain, class_var[, class_vars=])

     Construct a copy of an existing domain
     except that the class variable is replaced with the given one
     and the class variable of the existing domain becomes an
     ordinary feature. If the new class is one of the original
     domain's features, it can also be specified by name.

     :param domain: An existing domain
     :type domain: :obj:`Orange.data.Domain`
     :param class_var: Class variable for the new domain
     :type class_var: string or :obj:`Orange.data.variable.Variable`
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: __init__(domain, has_class=False[, class_vars=])

     Construct a copy of the domain. If the ``has_class``
     flag is given and is :obj:`False`, the class
     attribute is moved to the ordinary features.

     :param domain: An existing domain
     :type domain: :obj:`Orange.data.Domain`
     :param has_class: A flag telling whether the domain has a class
     :type has_class: bool
     :param class_vars: A list of multiple classes; must be a keyword argument
     :type class_vars: list

     .. method:: has_discrete_attributes(include_class=True)

     Return :obj:`True` if the domain has any discrete variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

     Return :obj:`True` if the domain has any continuous variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool

     .. method:: has_other_attributes(include_class=True)

     Return :obj:`True` if the domain has any variables which are
     neither discrete nor continuous, such as string variables;
     the class is included unless ``include_class`` is ``False``.

     :param include_class: Tells whether to consider the class variable
     :type include_class: bool
     :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

     Register a meta attribute with the given id (obtained by
     :obj:`Orange.data.new_meta_id`). The same meta attribute should
     have the same id in all domains in which it is registered. ::

         >>> newid = Orange.data.new_meta_id()
         >>> domain.add_meta(newid, Orange.data.variable.String("origin"))
         >>> data[55]["origin"] = "Nepal"
         >>> data[55]
         ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
         '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'}

     The third argument tells whether the meta attribute is optional or
     not. The parameter is an integer, with any non-zero value meaning that
     the attribute is optional. Different values can be used to distinguish
     between various types of optional attributes; the meaning of the value
     is not defined in advance and can be used arbitrarily by the
     application.

     :param id: id of the new meta attribute
     :type id: int
     :param variable: variable descriptor
     :type variable: Orange.data.variable.Variable
     :param optional: tells whether the meta attribute is optional
     :type optional: int

     .. method:: add_metas(attributes, optional=0)

     Add multiple meta attributes at once. The dictionary contains ids as
     keys and variables (:obj:`~Orange.data.variable.Variable`) as the
     corresponding values. The following example shows how to add all meta
     attributes from one domain to another::

          newdomain.add_metas(domain.get_metas())

     The optional second argument has the same meaning as in :obj:`add_meta`.

     :param attributes: dictionary of ids and variables
     :type attributes: dict
     :param optional: tells whether the meta attributes are optional
     :type optional: int

     .. method:: remove_meta(attribute)

     Remove one or more meta attributes. Removing a meta attribute has
     no effect on data instances.

     :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
     :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

     Return :obj:`True` if the domain contains the specified meta attribute.

     :param attribute: attribute to be checked
     :type attribute: string, int, Orange.data.variable.Variable
     :rtype: bool

     .. method:: meta_id(attribute)

     Return the id of a meta attribute.

     :param attribute: name or variable descriptor of the attribute
     :type attribute: string or Orange.data.variable.Variable
     :rtype: int

     .. method:: get_meta(attribute)

     Return the variable descriptor corresponding to the meta attribute.

     :param attribute: name or id of the attribute
     :type attribute: string or int
     :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

     Return a dictionary with meta attribute ids as keys and corresponding
     variable descriptors as values.

     .. method:: get_metas(optional)

     Return a dictionary with meta attribute ids as keys and corresponding
     variable descriptors as values. The dictionary contains only meta
     attributes for which the argument ``optional`` matches the flag given
     when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

     :param optional: flag that specifies the attributes to be returned
     :type optional: int
     :rtype: dict

     .. method:: is_optional_meta(attribute)

     Return :obj:`True` if the given meta attribute is optional, and
     :obj:`False` if it is not.

     :param attribute: attribute to be checked
     :type attribute: string, int, Orange.data.variable.Variable
     :rtype: bool
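
The bookkeeping behind the meta attribute methods of ``Domain`` can be
pictured with a small registry. The sketch below is a hypothetical pure-Python
model, not Orange code: ``MetaRegistry`` and this ``new_meta_id`` are
illustrative names, variables are represented by plain strings, and the use of
fresh negative integers as ids is an assumption based on the ``-2`` seen in
the examples.

```python
import itertools

_counter = itertools.count(1)


def new_meta_id():
    """Hypothetical id generator that hands out fresh negative integers."""
    return -next(_counter)


class MetaRegistry(object):
    """Toy model of a domain's meta attribute table."""

    def __init__(self):
        self._metas = {}  # id -> (variable, optional flag)

    def add_meta(self, meta_id, variable, optional=0):
        self._metas[meta_id] = (variable, optional)

    def add_metas(self, attributes, optional=0):
        for meta_id, variable in attributes.items():
            self.add_meta(meta_id, variable, optional)

    def get_metas(self, optional=None):
        # Without an argument return everything; otherwise only the
        # attributes whose stored flag equals the given one.
        return {i: var for i, (var, flag) in self._metas.items()
                if optional is None or flag == optional}

    def is_optional_meta(self, meta_id):
        # Any non-zero flag means "optional".
        return self._metas[meta_id][1] != 0

    def remove_meta(self, meta_id):
        del self._metas[meta_id]


registry = MetaRegistry()
name_id, origin_id = new_meta_id(), new_meta_id()
registry.add_meta(name_id, "name")
registry.add_meta(origin_id, "origin", optional=1)

# Copy all meta attributes to another registry, as in the add_metas example.
copy = MetaRegistry()
copy.add_metas(registry.get_metas())
```

Note that the copy made through ``add_metas`` registers every attribute with
the flag passed to ``add_metas`` (``0`` by default), so the optional status of
``origin`` is not carried over in this model.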