.. Source: orange/docs/reference/rst/Orange.data.domain.rst, revision
   9553:be024c77a0df, checked in by lanz. Commit message: Added some
   documentation for Orange.multitarget.

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes the set of variables that describe
data instances: features, class variables, meta attributes and
similar. Each data instance, as well as many classifiers and other
objects, is associated with a domain descriptor, which defines the
object's content and/or its input and output data format.

Domain descriptors are also responsible for converting data instances
from one domain to another, e.g. from the original feature space to
one with a different set of features that are selected or constructed
from the original set.

Domains as lists
================

Domains resemble lists: the length of a domain is the number of
variables, including the class variable. Iterating over a domain
goes through the features and the class variable, but not through meta
attributes. Domains can be indexed by integer indices, variable names
or instances of :obj:`Orange.data.variable.Variable`. The method
:obj:`Domain.index` returns the index of a variable specified by a
descriptor or name. Slices can be retrieved, but not set. The example
uses the domain ``d2`` constructed in the section :ref:`meta-attributes`. ::

    >>> print d2
    [a, b, e, y], {-4:c, -5:d, -6:f, -7:X}
    >>> d2[1]
    EnumVariable 'b'
    >>> d2["e"]
    EnumVariable 'e'
    >>> d2["d"]
    EnumVariable 'd'
    >>> d2[-4]
    EnumVariable 'c'
    >>> for attr in d2:
    ...     print attr.name,
    ...
    a b e y

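The list-like behaviour described above can be mimicked in a few lines of
plain Python. The sketch below is not Orange code: ``ToyDomain`` is a
hypothetical stand-in that stores variable names only, and it assumes, as in
the printout above, that meta attributes are addressed by negative integer ids.

```python
class ToyDomain(object):
    """Hypothetical stand-in for a domain; holds variable names only."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)   # features followed by the class
        self.metas = dict(metas or {})     # negative meta id -> name

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration skips meta attributes.
        return iter(self.variables)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.variables[key]
        if isinstance(key, int):
            # Negative integers are treated as meta ids in this toy model.
            return self.metas[key] if key < 0 else self.variables[key]
        if key in self.variables or key in self.metas.values():
            return key
        raise KeyError(key)

    def index(self, name):
        # Regular variables yield their position, meta attributes their id.
        if name in self.variables:
            return self.variables.index(name)
        for meta_id, meta_name in self.metas.items():
            if meta_name == name:
                return meta_id
        raise KeyError(name)


d2 = ToyDomain(["a", "b", "e", "y"], {-4: "c", -5: "d", -6: "f", -7: "X"})
```

With this toy object, ``len(d2)`` is 4, ``list(d2)`` omits the meta
attributes, and ``d2[-4]`` resolves to the meta attribute ``c``.
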
Conversions between domains
===========================

Domain descriptors are used to convert instances from one domain to
another. ::

     >>> data = Orange.data.Table("monk1")
     >>> d2 = Orange.data.Domain(["a", "b", "e", "y"], data.domain)
     >>>
     >>> inst = data[55]
     >>> print inst
     ['1', '2', '1', '1', '4', '2', '0']
     >>> inst2 = d2(inst)
     >>> print inst2
     ['1', '2', '4', '0']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. with a subset of features or
with discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Equivalently, instances can be converted by passing the new domain to
the constructor::

     >>> inst2 = Orange.data.Instance(d2, inst)

An entire data table can be converted similarly::

     >>> data2 = Orange.data.Table(d2, data)
     >>> print data2[55]
     ['1', '2', '4', '0']

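Conceptually, conversion looks up each variable of the target domain in the
source domain and copies the corresponding value. The following plain-Python
sketch illustrates the idea with a hypothetical helper, ``convert``; it is not
part of Orange, it ignores meta attributes and computed variables, and the
variable order ``a`` to ``f`` followed by ``y`` is assumed from the Monk 1
printout above.

```python
def convert(values, source_names, target_names, unknown="?"):
    """Reorder `values` (aligned with `source_names`) to follow `target_names`."""
    lookup = dict(zip(source_names, values))
    # Variables missing from the source become unknown in this toy model.
    return [lookup.get(name, unknown) for name in target_names]


monk1_names = ["a", "b", "c", "d", "e", "f", "y"]  # assumed variable order
inst = ["1", "2", "1", "1", "4", "2", "0"]

inst2 = convert(inst, monk1_names, ["a", "b", "e", "y"])
print(inst2)  # ['1', '2', '4', '0']
```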

.. _multiple-classes:

Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features, except that they are not used for learning. The
list of such classes is stored in :obj:`~Orange.data.Domain.class_vars`.
When converting between domains, multiple classes can become ordinary
features or the class, and vice versa.

.. _meta-attributes:

Meta attributes
===============

Meta-values are additional values that can be attached to instances.
It is not necessary that all instances in the same table (or even all
instances from the same domain) have a certain meta-value. See the documentation
on :obj:`Orange.data.Instance` for a more thorough description of meta-values.

Meta attributes that appear in instances can be, but do not need to be,
registered in the domain. Typically, a meta attribute is
registered for the following reasons.

     * If the domain knows about a meta attribute, its values can be
       obtained by indexing with names and variable descriptors,
       e.g. ``inst["age"]``. Values of unregistered meta attributes can
       be obtained only through integer indices (e.g. ``inst[id]``, where
       ``id`` must be an integer).

     * When printing out an instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printed
       instance shows the (more informative) attribute name
       instead of a meta-id.

     * Registering an attribute provides a way to attach a descriptor
       to a meta-id. See how the basket file format uses this feature.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it is automatically
       assigned the meta attributes listed in the domain, with their
       values set to unknown.

For the last two points (saving to a file and construction of new
instances) there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and not added
to newly constructed data instances. This functionality is used in,
for instance, the above-mentioned basket format, where new meta
attributes are created while loading the file and it would be undesirable
for new instances to contain all words from the past examples.

There is another distinction between the optional and non-optional
meta attributes: the latter are `expected to be` present in all
examples of that domain. Saving to files expects them and will fail if
a non-optional meta value is missing. Optional attributes may be
missing. In most other places, these rules are not strictly enforced,
so adhering to them is largely a matter of choice.

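The bookkeeping behind the optional flag can be pictured as a mapping from
meta ids to a name and an integer flag. The sketch below is a plain-Python
toy, not the Orange implementation; ``ToyMetaRegistry`` and
``metas_for_new_instance`` are hypothetical, and the other method names merely
echo the :obj:`Domain` methods documented further below.

```python
class ToyMetaRegistry(object):
    """Hypothetical registry of meta attributes; not the Orange implementation."""

    def __init__(self):
        self._metas = {}  # meta id -> (name, optional flag)

    def add_meta(self, meta_id, name, optional=0):
        # Any non-zero flag marks the meta attribute as optional.
        self._metas[meta_id] = (name, optional)

    def is_optional_meta(self, meta_id):
        return self._metas[meta_id][1] != 0

    def get_metas(self, optional=None):
        # Without an argument return everything; otherwise match the flag.
        return dict((meta_id, name)
                    for meta_id, (name, flag) in self._metas.items()
                    if optional is None or flag == optional)

    def metas_for_new_instance(self, unknown="?"):
        # Only non-optional meta attributes are added, with unknown values.
        return dict((name, unknown) for name in self.get_metas(0).values())


registry = ToyMetaRegistry()
registry.add_meta(-1, "misses")            # non-optional
registry.add_meta(-2, "word", optional=1)  # optional, e.g. a basket item
```

In this toy model a newly constructed instance would receive only ``misses``
(set to unknown), while ``word`` stays absent until it is set explicitly.
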
Meta attributes can be added and removed even after the domain is
constructed and instances of that domain already exist. For instance,
if ``data`` contains the Monk 1 data set, we can add a new continuous
attribute named "misses" with the following code (a detailed
description of methods related to meta attributes is given below)::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta-values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given instead
of a name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance or if they can be computed from it, they will have
appropriate values, otherwise they will have a "don't know" value. ::

     domain = data.domain
     d2 = Orange.data.Domain(["a", "b", "e", "y"], domain)
     for attr in ["c", "d", "f"]:
         d2.add_meta(Orange.data.new_meta_id(), domain[attr])
     d2.add_meta(Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
     data2 = Orange.data.Table(d2, data)

Domain ``d2`` in this example has variables ``a``, ``b``, ``e`` and the
class, while the other three variables are added as meta
attributes, together with an additional attribute ``X``. Results are as
follows. ::

     >>> print data[55]
     ['1', '2', '1', '1', '4', '2', '0'], {"misses":0.000000}
     >>> print data2[55]
     ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?'}

After conversion, the three attributes are moved to meta attributes
and the new attribute appears as unknown.

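The effect of registered meta attributes on conversion can be reproduced with
a small plain-Python sketch. The helper ``convert_with_metas`` is hypothetical
and not part of Orange; it assumes that values are looked up by variable name
and that anything absent from the source instance becomes unknown.

```python
def convert_with_metas(values, source_names, target_names, target_metas,
                       unknown="?"):
    """Split a source instance into regular values and meta values."""
    lookup = dict(zip(source_names, values))
    regular = [lookup.get(name, unknown) for name in target_names]
    # Meta attributes found in the source keep their value; others are unknown.
    metas = dict((name, lookup.get(name, unknown)) for name in target_metas)
    return regular, metas


monk1_names = ["a", "b", "c", "d", "e", "f", "y"]  # assumed variable order
inst = ["1", "2", "1", "1", "4", "2", "0"]

regular, metas = convert_with_metas(
    inst, monk1_names, ["a", "b", "e", "y"], ["c", "d", "f", "X"])
```

Here ``regular`` holds the values of ``a``, ``b``, ``e`` and ``y``, while
``metas`` maps ``c``, ``d`` and ``f`` to their original values and ``X`` to
the unknown marker, mirroring the printout of ``data2[55]`` above.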


.. class:: Domain

     .. attribute:: features

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) including the class
         variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`Orange.data.variable.Variable`), or
         :obj:`None` if there is none. Read only.

     .. attribute:: class_vars

         A list of additional class attributes. Read only.

     .. attribute:: version

         An integer value that is changed when the domain is
         modified. It can also be used as a unique domain identifier:
         two different domains have different versions.

     .. method:: __init__(variables[, class_vars=])

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.data.variable.Discrete(x)
             ...            for x in ["a", "b", "c"]]
             >>> d = Orange.data.Domain([a, b, c])
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(features, class_variable[, class_vars=])

         Construct a domain with the given list of features and the
         class variable. ::

             >>> d = Orange.data.Domain([a, b], c)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.data.variable.Variable
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, has_class[, class_vars=])

         Construct a domain with the given variables. If ``has_class`` is
         :obj:`True`, the last one is used as the class variable;
         otherwise the domain has no class. ::

             >>> d = Orange.data.Domain([a, b, c], False)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
             >>> print d.class_var
             None

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, source[, class_vars=])

         Construct a domain with the given variables, which can also be
         specified by names, provided that variables with those
         names exist in the source list. The last variable from the
         list is used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, has_class, source[, class_vars=])

         Similar to the above, except for the flag that tells whether the
         last variable should be used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], False, d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(domain, class_var[, class_vars=])

         Construct a domain as a shallow copy of an existing domain,
         except that the class variable is replaced with the given one
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, it can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: string or :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(domain, has_class=False[, class_vars=])

         Construct a shallow copy of the domain. If the ``has_class``
         flag is given and equals :obj:`False`, the class
         attribute is moved to ordinary features.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: has_discrete_attributes(include_class=True)

         Return :obj:`True` if the domain has any discrete variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return :obj:`True` if the domain has any continuous variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return :obj:`True` if the domain has any variables that are
         neither discrete nor continuous, such as string variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool


     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (obtained by
         :obj:`Orange.data.new_meta_id`). The same meta attribute can (and
         should) have the same id when registered in different domains. ::

             >>> newid = Orange.data.new_meta_id()
             >>> d2.add_meta(newid, Orange.data.variable.String("name"))
             >>> data2[55]["name"] = "Joe"
             >>> print data2[55]
             ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?', "name":'Joe'}

         The third argument tells whether the meta attribute is optional or
         not. The parameter is an integer, with any non-zero value meaning that
         the attribute is optional. Different values can be used to distinguish
         between various optional attributes; the meaning of the value is not
         defined in advance and can be used arbitrarily by the application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.data.variable.Variable
         :param optional: tells whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The dictionary contains ids as
         keys and variables as the corresponding values. The following example
         shows how to add all meta attributes from one domain to another::

              newdomain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or more meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

         Return :obj:`True` if the domain contains the specified meta attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.data.variable.Variable
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and the corresponding
         variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and the corresponding
         variable descriptors as values; the dictionary contains only meta
         attributes for which the argument ``optional`` matches the flag given
         when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return :obj:`True` if the given meta attribute is optional, and
         :obj:`False` if it is not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool