.. Source: orange/docs/reference/rst/Orange.data.domain.rst, revision
   9524:c806ca0fa3a9, checked in by janezd <janez.demsar@…>.
   Commit message: Added documentation about multiple classes

.. py:currentmodule:: Orange.data

===============================
Domain description (``Domain``)
===============================

In Orange, the term `domain` denotes the set of variables that describe
data instances: the features, the class variables, meta attributes and
similar. Each data instance, as well as many classifiers and other
objects, is associated with a domain descriptor, which defines the
object's content and/or its input and output data format.

Domain descriptors are also responsible for converting data instances
from one domain to another, e.g. from the original feature space to
one with a different set of features that are selected or constructed
from the original set.

Domains as lists
================

Domains resemble lists: the length of a domain is the number of
variables, including the class variable. Iterating through a domain
goes through the features and the class variable, but not through meta
attributes. Domains can be indexed by integer indices, variable names
or instances of :obj:`Orange.data.variable.Variable`. The method
:obj:`Domain.index` returns the index of a variable specified by a
descriptor or a name. Slices can be retrieved, but not set. The
example uses the domain ``d2`` with four registered meta attributes,
as constructed in the section on meta attributes. ::

    >>> print d2
    [a, b, e, y], {-4:c, -5:d, -6:f, -7:X}
    >>> d2[1]
    EnumVariable 'b'
    >>> d2["e"]
    EnumVariable 'e'
    >>> d2["d"]
    EnumVariable 'd'
    >>> d2[-4]
    EnumVariable 'c'
    >>> for attr in d2:
    ...     print attr.name,
    ...
    a b e y

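The list-like behaviour described above can be mimicked in a few lines of plain Python 3. This is only a sketch and not Orange's implementation: ``ToyDomain`` is a hypothetical class, variables are represented by plain strings instead of descriptors, and the handling of negative integers as meta ids is an assumption read off the printout above.

```python
class ToyDomain:
    """Toy stand-in for a domain; variables are plain strings here."""

    def __init__(self, variables, metas=None):
        self.variables = list(variables)   # features followed by the class
        self.metas = dict(metas or {})     # meta id (negative int) -> name

    def __len__(self):
        # Meta attributes do not count towards the length.
        return len(self.variables)

    def __iter__(self):
        # Iteration covers the features and the class, but not the metas.
        return iter(self.variables)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.variables[key]
        if isinstance(key, int):
            # In this sketch, negative integers are looked up as meta ids.
            return self.metas[key] if key < 0 else self.variables[key]
        if key in self.variables or key in self.metas.values():
            return key
        raise KeyError(key)

    def index(self, name):
        # Position of an ordinary variable (metas are left out of this sketch).
        return self.variables.index(name)


d2 = ToyDomain(["a", "b", "e", "y"], {-4: "c", -5: "d", -6: "f", -7: "X"})
print(len(d2), d2[1], d2["d"], d2[-4], d2.index("e"), d2[1:3], list(d2))
```

Because the sketch defines no ``__setitem__``, slices can be read but an assignment such as ``d2[1:3] = [...]`` raises a ``TypeError``, which loosely mirrors the read-only slices mentioned above.
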
Conversions between domains
===========================

Domain descriptors are used to convert instances from one domain to
another. ::

     >>> data = Orange.data.Table("monk1")
     >>> d2 = Orange.data.Domain(["a", "b", "e", "y"], data.domain)
     >>>
     >>> inst = data[55]
     >>> print inst
     ['1', '2', '1', '1', '4', '2', '0']
     >>> inst2 = d2(inst)
     >>> print inst2
     ['1', '2', '4', '0']

This is used, for instance, in classifiers: classifiers are often
trained on a preprocessed domain (e.g. with a subset of features or
with discretized data) and later used on instances from the original
domain. Classifiers store the training domain descriptor and use it
for converting new instances.

Equivalently, instances can be converted by passing the new domain to
the constructor::

     >>> inst2 = Orange.data.Instance(d2, inst)

An entire data table can be converted similarly::

     >>> data2 = Orange.data.Table(d2, data)
     >>> print data2[55]
     ['1', '2', '4', '0']

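To illustrate why a classifier keeps its training domain, here is a minimal plain-Python 3 sketch. It is not Orange code: ``convert`` and ``StoredDomainClassifier`` are hypothetical names, instances are plain lists of strings, and a hand-written rule stands in for a trained model.

```python
def convert(values, source_names, target_names, unknown="?"):
    """Pick the values of target_names out of an instance, matching by name."""
    by_name = dict(zip(source_names, values))
    return [by_name.get(name, unknown) for name in target_names]


class StoredDomainClassifier:
    """Keeps the names of its training domain and converts inputs to it."""

    def __init__(self, training_names, rule):
        self.training_names = training_names
        self.rule = rule  # callable on values ordered as in training_names

    def __call__(self, values, source_names):
        return self.rule(convert(values, source_names, self.training_names))


original_names = ["a", "b", "c", "d", "e", "f", "y"]
inst = ["1", "2", "1", "1", "4", "2", "0"]

# Hand-written rule on (a, b, e), standing in for a trained model.
classifier = StoredDomainClassifier(
    ["a", "b", "e"],
    lambda v: "1" if v[0] == v[1] or v[2] == "1" else "0")

inst2 = convert(inst, original_names, ["a", "b", "e", "y"])
prediction = classifier(inst, original_names)
print(inst2, prediction)
```

The caller passes an instance from the original domain; the classifier does the conversion itself, so it can be applied to data that was never preprocessed.
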
Multiple classes
================

A domain can have multiple additional class attributes. These are stored
similarly to other features except that they are not used for learning. The
list of such classes is stored in ``class_vars``. When converting between
domains, multiple classes can become ordinary features or the class, and
vice versa.

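The roles a variable can play may be easier to see in a small model. The following plain-Python 3 sketch is an assumption-laden illustration, not Orange's conversion logic: ``ToyDomain`` and ``with_class`` are hypothetical, and the rule that the old class becomes a feature is just one possible conversion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyDomain:
    features: List[str]
    class_var: Optional[str] = None
    class_vars: List[str] = field(default_factory=list)  # additional classes


def with_class(domain, name):
    """Return a new ToyDomain in which `name` plays the role of the class."""
    features = [v for v in domain.features if v != name]
    if domain.class_var is not None and domain.class_var != name:
        features.append(domain.class_var)  # the old class becomes a feature
    class_vars = [v for v in domain.class_vars if v != name]
    return ToyDomain(features, name, class_vars)


d = ToyDomain(["a", "b"], "y", ["y2", "y3"])
d_new = with_class(d, "y2")
print(d_new)
```

Here ``y2`` moves from the additional classes to the class, while the former class ``y`` is kept as an ordinary feature.
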
Meta attributes
===============

Meta-values are additional values that can be attached to instances.
It is not necessary that all instances in the same table (or even all
instances from the same domain) have a certain meta-value. See the documentation
on :obj:`Orange.data.Instance` for a more thorough description of meta-values.

Meta attributes that appear in instances can, but need not, be
registered in the domain. Typically, a meta attribute is
registered for the following reasons.

     * If the domain knows about a meta attribute, its values can be
       obtained with indexing by names and variable descriptors,
       e.g. ``inst["age"]``. Values of unregistered meta attributes can
       be obtained only through integer indices (e.g. ``inst[id]``, where
       ``id`` needs to be an integer).

     * When printing out an instance, the symbolic values of discrete
       meta attributes can only be printed if the attribute is
       registered. Also, if the attribute is registered, the printout
       shows the (more informative) attribute name
       instead of a meta-id.

     * Registering an attribute provides a way to attach a descriptor
       to a meta-id. See how the basket file format uses this feature.

     * When saving instances to a file, only the values of registered
       meta attributes are saved.

     * When a new data instance is constructed, it is automatically
       assigned the meta attributes listed in the domain, with their
       values set to unknown.

For the last two points, saving to a file and construction of new
instances, there is an additional flag: a meta attribute can be
marked as "optional". Such meta attributes are not saved and not added
to newly constructed data instances. This functionality is used in,
for instance, the above-mentioned basket format, where new meta
attributes are created while loading the file, and new instances should
not automatically contain all the words from past examples.

There is another distinction between optional and non-optional
meta attributes: the latter are *expected to be* present in all
examples of that domain. Saving to files expects them and will fail if
a non-optional meta value is missing. Optional attributes may be
missing. In most other places, these rules are not strictly enforced,
so adhering to them is largely a matter of choice.

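The rules for optional and non-optional meta attributes can be summarised in a short plain-Python 3 sketch. It only restates the prose above and is not how Orange stores or saves data: the function names, the ``"?"`` marker for unknown values and the dictionary layout are all assumptions made for illustration.

```python
UNKNOWN = "?"


def new_instance_metas(registered):
    """Metas given to a freshly constructed instance.

    `registered` maps a meta id to a (name, optional_flag) pair.
    Only non-optional metas are added, with unknown values.
    """
    return {meta_id: UNKNOWN
            for meta_id, (name, optional) in registered.items()
            if not optional}


def metas_to_save(registered, instance_metas):
    """Select the meta values that would be written to a file."""
    saved = {}
    for meta_id, (name, optional) in registered.items():
        if optional:
            continue  # optional metas are not saved
        if meta_id not in instance_metas:
            raise ValueError("missing non-optional meta attribute %r" % name)
        saved[name] = instance_metas[meta_id]
    return saved  # unregistered metas never get here


registered = {-2: ("misses", 0), -3: ("word", 1)}
fresh = new_instance_metas(registered)
saved = metas_to_save(registered, {-2: 3.0, -3: 1.0, -9: "unregistered"})
print(fresh, saved)
```

Calling ``metas_to_save(registered, {-3: 1.0})`` raises a ``ValueError`` in this sketch, because the non-optional ``misses`` value is missing.
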
Meta attributes can be added and removed even after the domain is
constructed and instances of that domain already exist. For instance,
if ``data`` contains the Monk 1 data set, we can add a new continuous
attribute named "misses" with the following code (a detailed
description of methods related to meta attributes is given below)::

     >>> misses = Orange.data.variable.Continuous("misses")
     >>> id = Orange.data.new_meta_id()
     >>> data.domain.add_meta(id, misses)

This does not change the data: no attributes are added to data
instances.

Registering meta attributes enables addressing by indexing, either by
name or by descriptor. For instance, the following snippet sets the new
attribute to 0 for all instances in the data table::

     >>> for inst in data:
     ...     inst[misses] = 0

An alternative is to refer to the attribute by name::

     >>> for inst in data:
     ...     inst["misses"] = 0

If the attribute were not registered, it could still be set using the
integer index::

     >>> for inst in data:
     ...     inst.set_meta(id, 0)

Registering the meta attribute also enhances printouts. When an instance
is printed, meta-values for registered meta attributes are shown as
"name:value" pairs, while for unregistered ones only the id is given instead
of a name.

A meta attribute can be used, for instance, to record the number of
misclassifications by a given ``classifier``::

     >>> for inst in data:
     ...     if inst.get_class() != classifier(inst):
     ...         inst[misses] += 1

The other effect of registering meta attributes is that they appear in
converted instances: whenever an instance is converted to some
domain, it will have all the meta attributes that are registered in
that domain. If the meta attributes occur in the original domain of
the instance, or if they can be computed from its variables, they will have
appropriate values; otherwise they will have a "don't know" value. ::

     # `data` is the Monk 1 table loaded earlier
     domain = data.domain
     d2 = Orange.data.Domain(["a", "b", "e", "y"], domain)
     for attr in ["c", "d", "f"]:
         d2.add_meta(Orange.data.new_meta_id(), domain[attr])
     d2.add_meta(Orange.data.new_meta_id(), Orange.data.variable.Discrete("X"))
     data2 = Orange.data.Table(d2, data)

Domain ``d2`` in this example has variables ``a``, ``b``, ``e`` and the
class, while the other three variables are added as meta
attributes, together with an additional attribute ``X``. Results are as
follows. ::

     >>> print data[55]
     ['1', '2', '1', '1', '4', '2', '0'], {"misses":0.000000}
     >>> print data2[55]
     ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?'}

After conversion, the three attributes are moved to meta attributes
and the new attribute appears as unknown.

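The effect of registered meta attributes on conversion can be reproduced with a small plain-Python 3 model. This is a sketch under simplifying assumptions, not Orange's converter: ``convert_with_metas`` is a hypothetical helper, values are matched purely by name, and values that could be computed from other variables are not handled.

```python
def convert_with_metas(values, source_names, target_names, target_metas,
                       unknown="?"):
    """Convert an instance and fill in the target domain's meta attributes."""
    by_name = dict(zip(source_names, values))
    converted = [by_name.get(name, unknown) for name in target_names]
    # Metas found in the source get their value; the rest stay unknown.
    metas = {name: by_name.get(name, unknown) for name in target_metas}
    return converted, metas


converted, metas = convert_with_metas(
    ["1", "2", "1", "1", "4", "2", "0"],
    ["a", "b", "c", "d", "e", "f", "y"],
    ["a", "b", "e", "y"],
    ["c", "d", "f", "X"])
print(converted, metas)
```

The values of ``c``, ``d`` and ``f`` are carried over from the source instance, while ``X`` has no counterpart there and remains unknown, as in the printout above.
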
.. class:: Domain

     .. attribute:: features

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) without the class
         variable. Read only.

     .. attribute:: variables

         List of domain attributes
         (:obj:`Orange.data.variable.Variables`) including the class
         variable. Read only.

     .. attribute:: class_var

         The class variable (:obj:`Orange.data.variable.Variable`), or
         :obj:`None` if there is none. Read only.

     .. attribute:: class_vars

         A list of additional class attributes. Read only.

     .. attribute:: version

         An integer value that is changed when the domain is
         modified. It can also be used as a unique domain identifier; two
         different domains also have different versions.

     .. method:: __init__(variables[, class_vars=])

         Construct a domain with the given variables; the
         last one is used as the class variable. ::

             >>> a, b, c = [Orange.data.variable.Discrete(x)
             ...            for x in ["a", "b", "c"]]
             >>> d = Orange.data.Domain([a, b, c])
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(features, class_variable[, class_vars=])

         Construct a domain with the given list of features and the
         class variable. ::

             >>> d = Orange.data.Domain([a, b], c)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b'>
             >>> print d.class_var
             EnumVariable 'c'

         :param features: List of features (instances of :obj:`Orange.data.variable.Variable`)
         :type features: list
         :param class_variable: Class variable
         :type class_variable: Orange.data.variable.Variable
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, has_class[, class_vars=])

         Construct a domain with the given variables. If ``has_class`` is
         :obj:`True`, the last one is used as the class variable;
         otherwise all variables are features and the domain has no class. ::

             >>> d = Orange.data.Domain([a, b, c], False)
             >>> print d.features
             <EnumVariable 'a', EnumVariable 'b', EnumVariable 'c'>
             >>> print d.class_var
             None

         :param variables: List of variables (instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, source[, class_vars=])

         Construct a domain with the given variables, which can also be
         specified by names, provided that variables with those
         names exist in the source list. The last variable from the
         list is used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(variables, has_class, source[, class_vars=])

         Similar to the above, except for the flag which tells whether the
         last variable should be used as the class variable. ::

             >>> d1 = Orange.data.Domain([a, b])
             >>> d2 = Orange.data.Domain(["a", b, c], False, d1)

         :param variables: List of variables (strings or instances of :obj:`Orange.data.variable.Variable`)
         :type variables: list
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param source: An existing domain or a list of variables
         :type source: Orange.data.Domain or list of :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(domain, class_var[, class_vars=])

         Construct a domain as a shallow copy of an existing domain,
         except that the class variable is replaced with the given one
         and the class variable of the existing domain becomes an
         ordinary feature. If the new class is one of the original
         domain's features, it can also be specified by name.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param class_var: Class variable for the new domain
         :type class_var: string or :obj:`Orange.data.variable.Variable`
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: __init__(domain, has_class=False[, class_vars=])

         Construct a shallow copy of the domain. If the ``has_class``
         flag is given and equals :obj:`False`, the class
         variable becomes an ordinary feature.

         :param domain: An existing domain
         :type domain: :obj:`Orange.data.Domain`
         :param has_class: A flag telling whether the domain has a class
         :type has_class: bool
         :param class_vars: A list of multiple classes; must be a keyword argument

     .. method:: has_discrete_attributes(include_class=True)

         Return :obj:`True` if the domain has any discrete variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_continuous_attributes(include_class=True)

         Return :obj:`True` if the domain has any continuous variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: has_other_attributes(include_class=True)

         Return :obj:`True` if the domain has any variables which are
         neither discrete nor continuous, such as string variables;
         the class is considered unless ``include_class`` is ``False``.

         :param include_class: Tells whether to consider the class variable
         :type include_class: bool
         :rtype: bool

     .. method:: add_meta(id, variable, optional=0)

         Register a meta attribute with the given id (obtained by
         :obj:`Orange.data.new_meta_id`). The same meta attribute can (and
         should) have the same id when registered in different domains. ::

             >>> newid = Orange.data.new_meta_id()
             >>> d2.add_meta(newid, Orange.data.variable.String("name"))
             >>> data2[55]["name"] = "Joe"
             >>> print data2[55]
             ['1', '2', '4', '0'], {"c":'1', "d":'1', "f":'2', "X":'?', "name":'Joe'}

         The third argument tells whether the meta attribute is optional or
         not. The parameter is an integer, with any non-zero value meaning that
         the attribute is optional. Different values can be used to distinguish
         between various optional attributes; the meaning of the value is not
         defined in advance and can be used arbitrarily by the application.

         :param id: id of the new meta attribute
         :type id: int
         :param variable: variable descriptor
         :type variable: Orange.data.variable.Variable
         :param optional: tells whether the meta attribute is optional
         :type optional: int

     .. method:: add_metas(attributes, optional=0)

         Add multiple meta attributes at once. The ``attributes`` dictionary
         contains ids as keys and variables as the corresponding values. The
         following example shows how to add all meta attributes from one
         domain to another::

             newdomain.add_metas(domain.get_metas())

         The optional second argument has the same meaning as in :obj:`add_meta`.

         :param attributes: dictionary of ids and variables
         :type attributes: dict
         :param optional: tells whether the meta attributes are optional
         :type optional: int

     .. method:: remove_meta(attribute)

         Remove one or more meta attributes. Removing a meta attribute has
         no effect on data instances.

         :param attribute: attribute(s) to be removed, given as name, id, variable descriptor or a list of them
         :type attribute: string, int, Orange.data.variable.Variable; or a list

     .. method:: has_attribute(attribute)

         Return True if the domain contains the specified meta attribute.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

     .. method:: meta_id(attribute)

         Return the id of a meta attribute.

         :param attribute: name or variable descriptor of the attribute
         :type attribute: string or Orange.data.variable.Variable
         :rtype: int

     .. method:: get_meta(attribute)

         Return the variable descriptor corresponding to the meta attribute.

         :param attribute: name or id of the attribute
         :type attribute: string or int
         :rtype: Orange.data.variable.Variable

     .. method:: get_metas()

         Return a dictionary with meta attribute ids as keys and corresponding
         variable descriptors as values.

     .. method:: get_metas(optional)

         Return a dictionary with meta attribute ids as keys and corresponding
         variable descriptors as values; the dictionary contains only meta
         attributes for which the argument ``optional`` matches the flag given
         when the attributes were added using :obj:`add_meta` or :obj:`add_metas`.

         :param optional: flag that specifies the attributes to be returned
         :type optional: int
         :rtype: dict

     .. method:: is_optional_meta(attribute)

         Return True if the given meta attribute is optional, and False if it is
         not.

         :param attribute: attribute to be checked
         :type attribute: string, int, Orange.data.variable.Variable
         :rtype: bool

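How the integer ``optional`` flag interacts with the meta-attribute methods documented above can be sketched in plain Python 3. The class below is a hypothetical toy registry, not Orange's ``Domain``: it borrows the documented method names for readability, represents variables as strings, and looks attributes up by id only.

```python
class ToyMetaRegistry:
    """Toy registry of meta attributes keyed by id; not Orange's Domain."""

    def __init__(self):
        self._metas = {}  # meta id -> (variable, optional flag)

    def add_meta(self, meta_id, variable, optional=0):
        self._metas[meta_id] = (variable, optional)

    def add_metas(self, attributes, optional=0):
        # `attributes` maps ids to variables; all share one flag.
        for meta_id, variable in attributes.items():
            self.add_meta(meta_id, variable, optional)

    def get_metas(self, optional=None):
        # Without an argument, return everything; otherwise return only
        # the metas whose stored flag equals `optional`.
        return {meta_id: variable
                for meta_id, (variable, flag) in self._metas.items()
                if optional is None or flag == optional}

    def is_optional_meta(self, meta_id):
        # Any non-zero flag means "optional".
        return self._metas[meta_id][1] != 0


registry = ToyMetaRegistry()
registry.add_meta(-1, "misses")            # non-optional
registry.add_metas({-2: "word"}, 1)        # optional, flag 1
registry.add_meta(-3, "tag", optional=2)   # optional, flag 2
print(registry.get_metas(0), registry.get_metas(2),
      registry.is_optional_meta(-3))
```

Using distinct non-zero flags lets an application keep several groups of optional attributes apart, as the description of ``add_meta`` suggests.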