source: orange/docs/reference/rst/Orange.data.domain.rst @ 9372:aef193695ea9

Revision 9372:aef193695ea9, 16.0 KB checked in by mitar, 2 years ago

Moved documentation to the separate directory.

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes the set of variables used to
describe data instances: the features, the class variable, meta
attributes and similar. Each data instance, as well as many
classifiers and other objects, is associated with a domain descriptor,
which defines the object's content and/or its input and output data
format.

Domain descriptors are also responsible for converting data instances
from one domain to another, e.g. from the original feature space to
one with a different set of features that are selected or constructed
from the original set.

Domains as lists
================

Domains resemble lists: the length of a domain is the number of
variables, including the class variable. Iterating through a domain
goes through the features and the class variable, but not through meta
attributes. Domains can be indexed by integer indices, variable names
or instances of :obj:`Orange.data.variable.Variable`. The method
:obj:`Domain.index` returns the index of a variable specified by a
descriptor or a name. Slices can be retrieved, but not set. In the
example below, ``d2`` is the domain with four meta attributes that is
constructed in the section on meta attributes. ::

    >>> print d2
    [a, b, e, y], {-4:c, -5:d, -6:f, -7:X}
    >>> d2[1]
    EnumVariable 'b'
    >>> d2["e"]
    EnumVariable 'e'
    >>> d2["d"]
    EnumVariable 'd'
    >>> d2[-4]
    EnumVariable 'c'
    >>> for attr in d2:
    ...     print attr.name,
    ...
    a b e y
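
The list-like behaviour described above can be mimicked with a few
lines of plain Python. This is only an illustrative sketch, not
Orange's implementation: the class ``ToyDomain`` is hypothetical,
variables are represented by their names, and meta attributes are
assumed to be keyed by negative integer ids, as in the printout of
``d2``.

```python
class ToyDomain:
    """Hypothetical stand-in for a domain; not Orange's implementation."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)  # features followed by the class variable
        self.metas = dict(metas or {})    # negative meta id -> meta attribute name

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration covers the features and the class variable only.
        return iter(self.variables)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.variables[key]
        if isinstance(key, int):
            # Assumption: negative integers are meta ids.
            return self.metas[key] if key < 0 else self.variables[key]
        if key in self.variables or key in self.metas.values():
            return key
        raise KeyError(key)

    def index(self, name):
        # This toy version only looks up ordinary variables.
        return self.variables.index(name)


d2 = ToyDomain(["a", "b", "e", "y"], {-4: "c", -5: "d", -6: "f", -7: "X"})
print(len(d2), list(d2))        # 4 ['a', 'b', 'e', 'y']
print(d2[1], d2["d"], d2[-4])   # b d c
print(d2[1:3], d2.index("e"))   # ['b', 'e'] 2
```

Because the sketch defines no ``__setitem__``, assigning to an index
or a slice fails, which mirrors the "retrieved, but not set" rule.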

Conversions between domains
===========================

Domain descriptors are used to convert instances from one domain to
another. ::

     >>> data = Orange.data.Table("monk1")
     >>> d2 = Orange.data.Domain(["a", "b", "e", "y"], data.domain)
     >>>
     >>> inst = data[55]
     >>> print inst
     ['1', '2', '1', '1', '4', '2', '0']
     >>> inst2 = d2(inst)
     >>> print inst2
     ['1', '2', '4', '0']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. with a subset of features or
with discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.
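
The idea of a classifier that remembers its training domain can be
sketched without Orange. Everything below is hypothetical: instances
are plain dictionaries, ``convert`` and ``ToyClassifier`` are made-up
names, and the decision rule is an arbitrary placeholder chosen only
for illustration.

```python
def convert(target_names, instance):
    """Project a name -> value mapping onto the variables in target_names."""
    return {name: instance[name] for name in target_names}


class ToyClassifier:
    """Hypothetical classifier that stores the names of its training features."""

    def __init__(self, training_features, rule):
        self.training_features = list(training_features)
        self.rule = rule

    def __call__(self, instance):
        # Convert the incoming instance to the training domain first,
        # so the rule only ever sees the variables it was built for.
        return self.rule(convert(self.training_features, instance))


# Values of instance 55 from the example above, keyed by variable name.
original = {"a": "1", "b": "2", "c": "1", "d": "1", "e": "4", "f": "2", "y": "0"}

# Placeholder rule over the reduced domain (an assumption, not a trained model).
clf = ToyClassifier(
    ["a", "b", "e"],
    lambda inst: "1" if inst["a"] == inst["b"] or inst["e"] == "1" else "0",
)

reduced = convert(["a", "b", "e", "y"], original)
prediction = clf(original)
print(reduced)
print(prediction)
```

The caller passes an instance from the original domain; the conversion
to the stored training domain happens inside the classifier.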

Equivalently, instances can be converted by passing the new domain to
the constructor::

     >>> inst2 = Orange.data.Instance(d2, inst)

An entire data table can be converted similarly::

     >>> data2 = Orange.data.Table(d2, data)
     >>> print data2[55]
     ['1', '2', '4', '0']


Meta attributes
===============

Meta-values are additional values that can be attached to instances.
It is not necessary that all instances in the same table (or even all
instances from the same domain) have a certain meta-value. See the documentation
on :obj:`Orange.data.Instance` for a more thorough description of meta-values.

Meta attributes that appear in instances can, but need not, be
registered in the domain. Typically, a meta attribute is
registered for the following reasons.

     * If the domain knows about a meta attribute, its values can be
       obtained by indexing with names and variable descriptors,
       e.g. ``inst["age"]``. Values of unregistered meta attributes can
       be obtained only through integer indices (e.g. ``inst[id]``, where
       ``id`` needs to be an integer).

     * When printing out an instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance shows the (more informative) attribute name
       instead of the meta-id.

     * Registering an attribute provides a way to attach a descriptor
       to a meta-id. See how the basket file format uses this feature.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it is automatically
       assigned the meta attributes listed in the domain, with their
       values set to unknown.

For the last two points (saving to a file and constructing new
instances) there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and not added
to newly constructed data instances. This functionality is used, for
instance, in the above-mentioned basket format, where new meta
attributes are created while loading the file and new instances should
not contain all words from the past instances.

There is another distinction between optional and non-optional
meta attributes: the latter are `expected to be` present in all
instances of that domain. Saving to files expects them and will fail if
a non-optional meta value is missing. Optional attributes may be
missing. In most other places, these rules are not strictly enforced,
so adhering to them is largely a matter of choice.
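
The rules for optional and non-optional meta attributes can be
summarized in a short sketch. It is a plain-Python model of the
behaviour described above, not Orange's code; the function names, the
dictionary representation and the ``"?"`` marker for unknown values
are assumptions made for the illustration.

```python
UNKNOWN = "?"  # assumed marker for an unknown value


def values_to_save(registered_metas, instance_metas):
    """Select the meta values that would be written to a file.

    registered_metas maps a name to its optional flag (0 = non-optional);
    instance_metas maps a name to the value stored in the instance.
    """
    saved = {}
    for name, optional in registered_metas.items():
        if optional:
            continue  # optional meta attributes are not saved
        if name not in instance_metas:
            # Saving expects every non-optional meta value to be present.
            raise ValueError("missing non-optional meta value: %s" % name)
        saved[name] = instance_metas[name]
    return saved  # unregistered metas never get here, so they are not saved


def new_instance_metas(registered_metas):
    """Meta values given to a newly constructed instance."""
    return {name: UNKNOWN
            for name, optional in registered_metas.items()
            if not optional}


registered = {"misses": 0, "word": 1}  # "word" is optional
stored = {"misses": 2.0, "word": 1.0, "unregistered": 5}
print(values_to_save(registered, stored))
print(new_instance_metas(registered))
```

Only ``misses`` is saved and only ``misses`` is attached to a new
instance; the optional ``word`` and the unregistered value are skipped
in both cases.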

Meta attributes can be added and removed even after the domain is
constructed and instances of that domain already exist. For instance,
if ``data`` contains the Monk 1 data set, we can add a new continuous
attribute named "misses" with the following code (a detailed
description of methods related to meta attributes is given below)::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta-values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given
instead of the name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or can be computed from its variables, they will have
appropriate values; otherwise they will have a "don't know" value. ::

     domain = data.domain
     d2 = Orange.data.Domain(["a", "b", "e", "y"], domain)
     for attr in ["c", "d", "f"]:
         d2.add_meta(Orange.data.new_meta_id(), domain[attr])
     d2.add_meta(Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
     data2 = Orange.data.Table(d2, data)

Domain ``d2`` in this example has variables ``a``, ``b``, ``e`` and the
class, while the other three variables are added as meta
attributes, together with an additional attribute ``X``. The results are as
follows. ::

     >>> print data[55]
     ['1', '2', '1', '1', '4', '2', '0'], {"misses":0.000000}
     >>> print data2[55]
     ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?'}

After conversion, the three attributes are moved to meta attributes
and the new attribute appears as unknown.
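
The effect of the conversion on meta attributes can be reproduced
with a small model. This is a hedged sketch in plain Python rather
than Orange's conversion code: ``convert_with_metas`` is a hypothetical
helper, instances are dictionaries keyed by variable name, and ``"?"``
stands for the "don't know" value.

```python
UNKNOWN = "?"  # assumed marker for the "don't know" value


def convert_with_metas(variables, metas, instance):
    """Convert a name -> value mapping to a domain with registered metas.

    Values found in the original instance are copied; anything the
    original instance cannot provide is set to UNKNOWN.
    """
    values = [instance.get(name, UNKNOWN) for name in variables]
    meta_values = {name: instance.get(name, UNKNOWN) for name in metas}
    return values, meta_values


# Instance 55 of the original data, keyed by variable name.
inst = dict(zip("abcdefy", ["1", "2", "1", "1", "4", "2", "0"]))

values, meta_values = convert_with_metas(
    ["a", "b", "e", "y"],      # ordinary variables of the new domain
    ["c", "d", "f", "X"],      # meta attributes registered in the new domain
    inst,
)
print(values)
print(meta_values)
```

The three variables that exist in the original instance keep their
values as meta attributes, while ``X`` has no source and stays unknown.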

.. class:: Domain

     .. attribute:: features

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) including the class
         variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`Orange.data.variable.Variable`), or
         :obj:`None` if there is none. Read only.

     .. attribute:: version

         An integer value that is changed when the domain is
         modified. It can also be used as a unique domain identifier,
         since two different domains have different versions.

     .. method:: __init__(variables)

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.data.variable.Discrete(x)
             ...            for x in ["a", "b", "c"]]
             >>> d = Orange.data.Domain([a, b, c])
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list

     .. method:: __init__(features, class_variable)

         Construct a domain with the given list of features and the
         class variable. ::

             >>> d = Orange.data.Domain([a, b], c)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.data.variable.Variable

     .. method:: __init__(variables, has_class)

         Construct a domain with the given variables. If ``has_class`` is
         :obj:`True`, the last one is used as the class variable. ::

             >>> d = Orange.data.Domain([a, b, c], False)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
             >>> print d.class_var
             None

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool

     .. method:: __init__(variables, source)

         Construct a domain with the given variables, which can also be
         specified by names, provided that variables with those
         names exist in the source list. The last variable from the
         list is used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`

     .. method:: __init__(variables, has_class, source)

         Similar to the above, except for the flag that tells whether the
         last variable should be used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], False, d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`

     .. method:: __init__(domain, class_var)

         Construct a domain as a shallow copy of an existing domain,
         except that the class variable is replaced with the given one
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, it can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: string or :obj:`Orange.data.variable.Variable`

     .. method:: __init__(domain, has_class=False)

         Construct a shallow copy of the domain. If the ``has_class``
         flag is given and equals :obj:`False`, the class
         attribute is moved to the ordinary features.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool

     .. method:: has_discrete_attributes(include_class=True)

         Return :obj:`True` if the domain has any discrete variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return :obj:`True` if the domain has any continuous variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return :obj:`True` if the domain has any variables that are
         neither discrete nor continuous, such as string variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (obtained by
         :obj:`Orange.data.new_meta_id`). The same meta attribute can (and
         should) have the same id when registered in different domains. ::

             >>> newid = Orange.data.new_meta_id()
             >>> d2.add_meta(newid, Orange.data.variable.String("name"))
             >>> data2[55]["name"] = "Joe"
             >>> print data2[55]
             ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?', "name":'Joe'}

         The third argument tells whether the meta attribute is optional or
         not. The parameter is an integer, with any non-zero value meaning that
         the attribute is optional. Different values can be used to distinguish
         between various optional attributes; the meaning of the value is not
         defined in advance and can be used arbitrarily by the application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.data.variable.Variable
         :param optional: tells whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The dictionary contains ids as
         keys and variables as the corresponding values. The following example
         shows how to add all meta attributes from one domain to another::

             newdomain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or more meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

         Return :obj:`True` if the domain contains the specified meta attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.data.variable.Variable
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and the corresponding
         variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and the corresponding
         variable descriptors as values; the dictionary contains only meta
         attributes for which the argument ``optional`` matches the flag given
         when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return :obj:`True` if the given meta attribute is optional, and
         :obj:`False` if it is not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool
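
The way the integer ``optional`` flag can separate groups of meta
attributes may be easier to see in a small model. The sketch below is
not Orange's implementation: ``ToyMetaRegistry`` is a hypothetical
class that stores names instead of variable descriptors and only
mirrors the documented behaviour of ``add_meta``, ``add_metas``,
``get_metas`` and ``is_optional_meta``.

```python
class ToyMetaRegistry:
    """Hypothetical model of a domain's meta attribute bookkeeping."""

    def __init__(self):
        self._metas = {}  # meta id -> (name, optional flag)

    def add_meta(self, meta_id, name, optional=0):
        self._metas[meta_id] = (name, optional)

    def add_metas(self, attributes, optional=0):
        # All attributes added in one call share the same flag.
        for meta_id, name in attributes.items():
            self.add_meta(meta_id, name, optional)

    def get_metas(self, optional=None):
        # Without an argument return everything; otherwise keep only the
        # attributes whose stored flag equals the requested value.
        return {meta_id: name
                for meta_id, (name, flag) in self._metas.items()
                if optional is None or flag == optional}

    def is_optional_meta(self, meta_id):
        # Any non-zero flag means the attribute is optional.
        return self._metas[meta_id][1] != 0


registry = ToyMetaRegistry()
registry.add_meta(-1, "misses")                                 # non-optional
registry.add_metas({-2: "word1", -3: "word2"}, optional=1)      # one optional group
registry.add_meta(-4, "note", optional=2)                       # another optional group

print(registry.get_metas(1))
print(registry.get_metas(0))
print(registry.is_optional_meta(-4))
```

Using distinct non-zero values (here 1 and 2) lets an application
retrieve each group of optional attributes separately, while both
groups still count as optional.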