source: orange/docs/reference/rst/Orange.data.domain.rst @ 9929:6df3696524a2

Revision 9929:6df3696524a2, 18.6 KB checked in by markotoplak, 2 years ago

data.variable -> feature.

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes a set of variables and meta
attributes that describe data. A domain descriptor is attached to data
instances, data tables, classifiers and other objects. A descriptor is
constructed, for instance, after reading data from a file.

    >>> data = Orange.data.Table("zoo")
    >>> domain = data.domain
    >>> domain
    [hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
    backbone, breathes, venomous, fins, legs, tail, domestic, catsize,
    type], {-2:name}

A domain consists of ordinary features (from "hair" to "catsize" in the
above example), the class attribute ("type"), and meta attributes
("name"). We will refer to features and the class attribute as
*variables*. Variables are printed out in a form similar to a list whose
elements are attribute names, and meta attributes are printed like a
dictionary whose "keys" are meta attribute ids and "values" are
attribute names. In the above case, each data instance corresponds to an
animal and is described by the animal's properties and its type (the
class); the meta attribute contains the animal's name.

Domains as lists and dictionaries
=================================

Domains behave like lists: the length of a domain is the number of
variables, including the class variable. Domains can be indexed by integer
indices, variable names or instances of
:obj:`Orange.feature.Descriptor`::

    >>> domain["feathers"]
    EnumVariable 'feathers'
    >>> domain[1]
    EnumVariable 'feathers'
    >>> feathers = domain[1]
    >>> domain[feathers]
    EnumVariable 'feathers'

Meta attributes work the same::

    >>> domain[-2]
    StringVariable 'name'
    >>> domain["name"]
    StringVariable 'name'


Slices can be retrieved, but not set. Iterating over a domain goes
through the features and the class variable, but not through meta
attributes::

    >>> for attr in domain:
    ...     print attr.name,
    ...
    hair feathers eggs milk airborne aquatic predator toothed backbone
    breathes venomous fins legs tail domestic catsize type

Method :obj:`Domain.index` returns the index of a variable specified by a
descriptor or name::

    >>> domain.index("feathers")
    1
    >>> domain.index(feathers)
    1
    >>> domain.index("name")
    -2

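The lookup rules described in this section can be mimicked in a few lines
of plain Python. The following is only a toy model and not Orange's
implementation: the class ``ToyDomain`` and its attributes are
hypothetical, and variables are represented by bare strings instead of
descriptors.

.. code-block:: python

    class ToyDomain(object):
        """Hypothetical stand-in for a domain; variables are plain strings."""

        def __init__(self, variables, metas):
            self.variables = list(variables)  # features followed by the class
            self.metas = dict(metas)          # meta id (negative int) -> name

        def __len__(self):
            return len(self.variables)        # meta attributes are not counted

        def __iter__(self):
            return iter(self.variables)       # meta attributes are skipped

        def index(self, key):
            """Return a variable's position or a meta attribute's id."""
            if isinstance(key, int):
                if 0 <= key < len(self.variables) or key in self.metas:
                    return key
                raise IndexError(key)
            if key in self.variables:
                return self.variables.index(key)
            for meta_id, name in self.metas.items():
                if name == key:
                    return meta_id
            raise KeyError(key)

        def __getitem__(self, key):
            idx = self.index(key)
            return self.variables[idx] if idx >= 0 else self.metas[idx]


    toy = ToyDomain(["hair", "feathers", "type"], {-2: "name"})

In this sketch ``toy["feathers"]`` and ``toy[1]`` resolve to the same
variable, while ``toy[-2]`` and ``toy["name"]`` resolve to the meta
attribute, mirroring the behaviour shown for the zoo domain.
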


Conversions between domains
===========================

Domain descriptors can convert instances from one domain to another
(details on construction of domains are described later). ::

     >>> new_domain = Orange.data.Domain(["feathers", "legs", "type"], domain)
     >>> inst = data[55]
     >>> inst
     ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0', '4',
     '1', '0', '1', 'mammal'], {"name":'oryx'}
     >>> inst2 = new_domain(inst)
     >>> inst2
     ['0', '4', 'mammal']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. on a subset of features or
on discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Alternatively, instances can be converted by constructing a new instance
and passing the new domain to the constructor::

     >>> inst2 = Orange.data.Instance(new_domain, inst)

An entire data table can be converted in a similar way::

     >>> data2 = Orange.data.Table(new_domain, data)
     >>> data2[55]
     ['0', '4', 'mammal']

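Conceptually, such a conversion picks the values of the new domain's
variables out of the original instance. The sketch below illustrates the
idea in plain Python under strong simplifications: the function
``convert`` is hypothetical, variables are matched by name only, and
values that cannot be found are marked with ``"?"``. It is not how Orange
implements conversion.

.. code-block:: python

    def convert(values, source_names, target_names, unknown="?"):
        """Pick the values of target_names from a row laid out as source_names."""
        by_name = dict(zip(source_names, values))
        return [by_name.get(name, unknown) for name in target_names]


    # A shortened, made-up version of the zoo domain and one of its rows.
    source = ["hair", "feathers", "legs", "type"]
    oryx = ["1", "0", "4", "mammal"]
    converted = convert(oryx, source, ["feathers", "legs", "type"])

Here ``converted`` holds the values for ``feathers``, ``legs`` and
``type`` in the order requested by the target list.
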

.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta attributes hold additional data attached to individual
instances. Different instances from the same domain or even the same
table may have different meta attributes. (See documentation on
:obj:`Orange.data.Instance` for details about meta values.)

Meta attributes that appear in instances can, but need not, be
listed in the domain. Typically, a meta attribute is included in
the domain for the following reasons.

     * If the domain knows about meta attributes, their values can be
       obtained with indexing by names and variable descriptors,
       e.g. ``inst["age"]``. Values of unknown meta attributes
       can be obtained only through integer indices (e.g. ``inst[id]``,
       where ``id`` must be an integer).

     * When printing out a data instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printout
       shows the attribute's (more informative) name instead of its
       meta id.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it will have all the
       non-optional meta attributes listed in the domain, with their
       values set to unknown.

A meta attribute can be marked as "optional". Non-optional meta
attributes are *expected to be* present in all data instances from that
domain. This rule is not strictly enforced. As one of the few places
where the difference matters, saving to files fails if a non-optional
meta value is missing; optional attributes are not written to the file
at all. Also, newly constructed data instances initially have all the
non-optional meta attributes.

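The role of the "optional" flag can be illustrated with a small
plain-Python model. Everything in it is an assumption made for
illustration: the class ``ToyMetaRegistry`` is hypothetical, its method
names merely echo the :obj:`Domain` methods documented below, and meta
attributes are represented by names rather than descriptors.

.. code-block:: python

    class ToyMetaRegistry(object):
        """Hypothetical registry of meta attributes: id -> (name, flag)."""

        def __init__(self):
            self._metas = {}

        def add_meta(self, meta_id, name, optional=0):
            # A non-zero flag marks the attribute as optional.
            self._metas[meta_id] = (name, optional)

        def get_metas(self, optional=None):
            """Return {id: name}; restrict to one flag value if given."""
            return {meta_id: name
                    for meta_id, (name, flag) in self._metas.items()
                    if optional is None or flag == optional}

        def is_optional_meta(self, meta_id):
            return self._metas[meta_id][1] != 0

        def new_instance_metas(self, unknown="?"):
            """Metas of a fresh instance: the non-optional ones, set to unknown."""
            return {meta_id: unknown for meta_id in self.get_metas(0)}


    registry = ToyMetaRegistry()
    registry.add_meta(-2, "name")                # non-optional
    registry.add_meta(-3, "misses", optional=1)  # optional

In this model a freshly created instance starts with the non-optional
``name`` only; the optional ``misses`` appears once a value is assigned.
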

While the list of features and the class variable are immutable,
meta attributes can be added and removed at any time::

     >>> misses = Orange.feature.Continuous("misses")
     >>> id = Orange.feature.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances. Methods related to meta attributes are described in more
detail later.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta values of registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of the name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or can be computed from the attributes of that domain,
they will have appropriate values; otherwise their values will be
missing. ::

    new_domain = Orange.data.Domain(["feathers", "legs"], domain)
    new_domain.add_meta(Orange.feature.new_meta_id(), domain["type"])
    new_domain.add_meta(Orange.feature.new_meta_id(), domain["legs"])
    new_domain.add_meta(
        Orange.feature.new_meta_id(), Orange.feature.Discrete("X"))
    data2 = Orange.data.Table(new_domain, data)

Domain ``new_domain`` in this example has variables ``feathers`` and
``legs`` and meta attributes ``type``, ``legs`` (again) and ``X``, which
is a new feature with no relation to the existing ones. ::

    >>> data[55]
    ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
    '4', '1', '0', '1', 'mammal'], {"name":'oryx'}
    >>> data2[55]
    ['0', '4'], {"type":'mammal', "legs":'4', "X":'?'}

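The way registered meta attributes show up in converted instances can be
sketched in plain Python. The helper ``convert_with_metas`` is
hypothetical and matches attributes by name only; a value that does not
occur in the original instance becomes ``"?"``. It illustrates the rule
stated above and is not Orange's conversion code.

.. code-block:: python

    def convert_with_metas(row, target_features, target_metas, unknown="?"):
        """Split a {name: value} row into feature values and meta values."""
        values = [row.get(name, unknown) for name in target_features]
        metas = {name: row.get(name, unknown) for name in target_metas}
        return values, metas


    # A made-up excerpt of the oryx instance, keyed by attribute name.
    oryx = {"feathers": "0", "legs": "4", "type": "mammal", "name": "oryx"}
    values, metas = convert_with_metas(
        oryx, ["feathers", "legs"], ["type", "legs", "X"])

The meta attribute ``name`` is dropped because the target does not
register it, and ``X`` is unknown because the source has no such value.
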


.. class:: Domain

     .. attribute:: features

         Immutable list of domain attributes without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes including the class variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`~Orange.feature.Descriptor`) or
         ``None``. Read only.

     .. attribute:: class_vars

         A list of additional class attributes. Read only.

     .. attribute:: version

         An integer value that is changed when the domain is
         modified. The value can also be used as a unique domain
         identifier; two different domains have different values of
         ``version``.

     .. method:: __init__(variables[, class_vars=])

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.feature.Discrete(x) for x in "abc"]
             >>> domain = Orange.data.Domain([a, b, c])
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`~Orange.feature.Descriptor`)
         :type variables: list
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(features, class_variable[, class_vars=])

         Construct a domain with the given list of features and the
         class variable. ::

             >>> domain = Orange.data.Domain([a, b], c)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> domain.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`~Orange.feature.Descriptor`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.feature.Descriptor
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class[, class_vars=])

         Construct a domain with the given variables. If ``has_class``
         is ``True``, the last variable is the class; otherwise the
         domain has no class and all variables are features. ::

             >>> domain = Orange.data.Domain([a, b, c], False)
             >>> domain.features
             <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
             >>> domain.class_var is None
             True

         :param variables: List of variables (instances of :obj:`~Orange.feature.Descriptor`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, source[, class_vars=])

         Construct a domain with the given variables. Variables specified
         by name are looked up in the ``source`` argument. The last
         variable from the list is used as the class variable. ::

             >>> domain1 = Orange.data.Domain([a, b])
             >>> domain2 = Orange.data.Domain(["a", b, c], domain)

         :param variables: List of variables (strings or instances of :obj:`~Orange.feature.Descriptor`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.feature.Descriptor`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(variables, has_class, source[, class_vars=])

         Similar to the above, except for the flag that tells whether the
         last variable should be used as the class variable. ::

             >>> domain1 = Orange.data.Domain([a, b], False)
             >>> domain2 = Orange.data.Domain(["a", b, c], False, domain)

         :param variables: List of variables (strings or instances of :obj:`~Orange.feature.Descriptor`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`~Orange.feature.Descriptor`
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, class_var[, class_vars=])

         Construct a copy of an existing domain except that the class
         variable is replaced with the one specified in the argument
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, ``class_var`` can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: :obj:`~Orange.feature.Descriptor` or string
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: __init__(domain, has_class=False[, class_vars=])

         Construct a copy of the domain. If the ``has_class``
         flag is given and is :obj:`False`, the class attribute becomes
         an ordinary feature.

         :param domain: An existing domain
         :type domain: :obj:`~Orange.data.Domain`
         :param has_class: A flag indicating whether the domain will have a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument
         :type class_vars: list

     .. method:: has_discrete_attributes(include_class=True)

         Return ``True`` if the domain has any discrete variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return ``True`` if the domain has any continuous variables;
         the class is included unless ``include_class`` is ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return ``True`` if the domain has any variables that are
         neither discrete nor continuous, such as string
         variables. The class is included unless ``include_class`` is
         ``False``.

         :param include_class: tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (see
         :obj:`Orange.feature.new_meta_id`). The same meta attribute should
         have the same id in all domains in which it is registered. ::

             >>> newid = Orange.feature.new_meta_id()
             >>> domain.add_meta(newid, Orange.feature.String("origin"))
             >>> data[55]["origin"] = "Nepal"
             >>> data[55]
             ['1', '0', '0', '1', '0', '0', '0', '1', '1', '1', '0', '0',
             '4', '1', '0', '1', 'mammal'], {"name":'oryx', "origin":'Nepal'}

         The third argument tells whether the meta attribute is optional or
         not; non-zero values indicate optional attributes. Different
         values can be used to distinguish between various types of
         optional attributes; the meaning of the value is not defined in
         advance and can be used arbitrarily by the application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.feature.Descriptor
         :param optional: indicates whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The dictionary contains ids as
         keys and variables (:obj:`~Orange.feature.Descriptor`) as the
         corresponding values. The following example shows how to add all
         meta attributes from another domain::

             >>> new_domain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or more meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.feature.Descriptor; or a list

     .. method:: has_attribute(attribute)

         Return ``True`` if the domain contains the specified meta
         attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.feature.Descriptor
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.feature.Descriptor
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.feature.Descriptor

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and
         corresponding variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and
         corresponding variable descriptors as values. The dictionary
         contains only meta attributes for which the argument ``optional``
         matches the flag given when the attributes were added using
         :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return ``True`` if the given meta attribute is optional,
         and ``False`` if it is not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.feature.Descriptor
         :rtype: bool