source: orange/docs/widgets/rst/data/purgedomain.rst @ 11359:8d54e79aa135

Revision 11359:8d54e79aa135, 3.4 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago (diff)

Cleanup of 'Widget catalog' documentation.

Fixed rst text formating, replaced dead hardcoded reference links (now using
:ref:), etc.

.. _Purge Domain:

Purge Domain
============

.. image:: ../icons/PurgeDomain.png

Removes unused attribute values and useless attributes, and sorts the values
of the remaining attributes.

Signals
-------

Inputs:


   - Examples (ExampleTable)
      A data set.


Outputs:


   - Examples (ExampleTable)
      Filtered data set.


Description
-----------

Definitions of nominal attributes sometimes contain values that do not appear
in the data. Even if this does not happen in the original data, filtering the
data, selecting subsets of examples and similar operations can remove all
examples for which the attribute has a particular value. Such values clutter
data presentation, especially visualizations, and should be removed.

After purging, an attribute may become single-valued or, in the extreme case,
have no values at all (if the value of this attribute was undefined for all
examples). In such cases, the attribute can be removed.

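The two situations above can be illustrated with a small sketch in plain
Python. It does not use the Orange API; the data layout (a dictionary of
declared values and a list of row dictionaries) and the helper name
``used_values`` are assumptions made only for this illustration.

.. code-block:: python

   # Declared values of each nominal attribute (hypothetical toy data).
   declared = {
       "colour": ["red", "green", "blue"],
       "shape": ["round", "square"],
       "note": ["a", "b"],
   }

   # None stands for an undefined (missing) value.
   rows = [
       {"colour": "red", "shape": "round", "note": None},
       {"colour": "blue", "shape": "round", "note": None},
   ]


   def used_values(name, declared, rows):
       """Return the declared values of `name` that occur in `rows`."""
       seen = {row[name] for row in rows if row[name] is not None}
       # Keep the declared order; drop values that never occur.
       return [value for value in declared[name] if value in seen]


   purged = {name: used_values(name, declared, rows) for name in declared}

   # Attributes left with fewer than two values are candidates for removal.
   useless = [name for name, values in purged.items() if len(values) < 2]

Here ``colour`` keeps two values, ``shape`` becomes single-valued and ``note``
ends up with no values at all, so the last two would be candidates for removal.
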
A different issue is the order of attribute values: if the data is read from a
file in a format where the values are not declared in advance, they are sorted
"in order of appearance". Sometimes we would prefer to have them sorted
alphabetically.

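The difference between the two orderings can be sketched in plain Python. The
function name ``appearance_order`` and the sample column are hypothetical; how
Orange's file readers order values internally is not shown here.

.. code-block:: python

   def appearance_order(column):
       """Collect distinct values in the order in which they first occur."""
       values = []
       for value in column:
           if value not in values:
               values.append(value)
       return values


   column = ["mammal", "bird", "mammal", "fish", "bird"]

   as_read = appearance_order(column)  # order of appearance in the file
   alphabetical = sorted(as_read)      # order after sorting the values
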
.. image:: images/PurgeDomain.png

Such purification is done by the Purge Domain widget. Ordinary attributes and
class attributes are treated separately. For each, we can decide whether we
want the values sorted or not. Next, we may allow the widget to remove
attributes with fewer than two values, or remove the class attribute if there
are fewer than two classes. Finally, we can instruct the widget to check which
values of attributes actually appear in the data and remove the unused values.
The widget cannot remove values if it is not allowed to remove the attributes,
since (potentially) having attributes without values makes no sense.

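How the three settings interact can be sketched as a single function in plain
Python. This is only an illustration of the rules described above, not the
widget's implementation; the function ``purge_attribute`` and its parameter
names are hypothetical.

.. code-block:: python

   def purge_attribute(declared, column, sort_values=True,
                       remove_attribute=True, remove_values=True):
       """Return the new list of values, or None if the attribute is dropped.

       `declared` is the list of declared values and `column` holds the
       values that actually occur in the data.
       """
       # Unused values may be removed only if the attribute itself may be
       # removed; otherwise an attribute without values could remain.
       remove_values = remove_values and remove_attribute

       values = list(declared)
       if remove_values:
           present = set(column)
           values = [value for value in values if value in present]
       if sort_values:
           values = sorted(values)
       if remove_attribute and len(values) < 2:
           return None
       return values


   kept = purge_attribute(["b", "a", "c"], ["b", "a", "b"])
   dropped = purge_attribute(["x", "y"], ["x", "x"])
   untouched = purge_attribute(["x", "y"], ["x", "x"], remove_attribute=False)

In this sketch ``kept`` holds the two used values in alphabetical order,
``dropped`` is ``None`` because only one value remains, and ``untouched``
keeps both declared values because removal was not allowed.
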
If :obj:`Send automatically` is checked, the widget will send data at each
change of widget settings. Otherwise, sending the data needs to be explicitly
initiated by clicking the :obj:`Send data` button.

The new, reduced attributes get the prefix "R", which distinguishes them from
the original ones. The values of the new attributes can be computed from the
old ones, but not vice versa. This means that if you construct a classifier
from the new attributes, you can use it to classify examples described by the
original attributes. The opposite does not hold: constructing a classifier
from the old attributes and using it on examples described by the reduced ones
won't work. Fortunately, the latter is seldom needed. In a typical setup, one
would explore the data, visualize it, filter it, purify it... and then test
the final model on the original data.

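The one-way relation between reduced and original attributes can be pictured
with a toy model in plain Python. This is not Orange code: the ``Attribute``
class, its ``source`` field and the name ``R_type`` are hypothetical and only
mimic the idea that a reduced attribute knows how to compute its value from
the original one, while the original knows nothing about the reduced one.

.. code-block:: python

   class Attribute:
       """Toy attribute that may be derived from another attribute."""

       def __init__(self, name, values, source=None):
           self.name = name
           self.values = values
           self.source = source  # attribute this one is computed from, if any

       def value_for(self, example):
           """Read the value directly, or derive it through `source`."""
           if self.name in example:
               return example[self.name]
           if self.source is not None:
               value = self.source.value_for(example)
               # Values purged from this attribute have no counterpart.
               return value if value in self.values else None
           raise KeyError(self.name)


   original = Attribute("type", ["bird", "fish", "mammal"])
   reduced = Attribute("R_type", ["bird", "mammal"], source=original)

   old_example = {"type": "bird"}    # described by the original attribute
   new_example = {"R_type": "bird"}  # described by the reduced attribute

   # Reduced value computed from an example that uses the original attribute.
   from_old = reduced.value_for(old_example)

   # The opposite direction is not defined in this model.
   try:
       original.value_for(new_example)
       reverse_works = True
   except KeyError:
       reverse_works = False

In this model a classifier built on ``reduced`` could look up its inputs in
``old_example``, whereas one built on ``original`` has no way to read
``new_example``.
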
Examples
--------

Purge Domain would typically appear after data filtering, for instance when
selecting a subset of visualized examples.

.. image:: images/PurgeDomain-Schema.png
   :alt: Schema with Purge Domain

In the above schema we play with the Zoo data set: we visualize it and select
a portion of the data which contains only four of the seven original classes.
To get rid of the empty classes, we put the data through Purge Domain before
passing it on to, say, the Attribute Statistics widget. The latter then shows
only the four classes which actually appear. To see the effect of data
purification, uncheck :obj:`Remove unused class values` and observe the effect
this has on Attribute Statistics.

.. image:: images/PurgeDomain-Widgets.png
   :alt: Schema with Purge Domain