source: orange/docs/widgets/rst/data/impute.rst @ 11359:8d54e79aa135

Revision 11359:8d54e79aa135, 3.4 KB checked in by Ales Erjavec <ales.erjavec@…>, 14 months ago (diff)

Cleanup of 'Widget catalog' documentation.

Fixed rst text formating, replaced dead hardcoded reference links (now using
:ref:), etc.

.. _Impute:

Impute
======

.. image:: ../icons/Impute.png

Replaces unknown values in the data.

Signals
-------

Inputs:

   - Examples (ExampleTable)
      Data set.

   - Learner for Imputation
      A learning algorithm to be used when values are imputed using a
      predictive model. If given, this algorithm replaces the default
      (1-NNLearner).


Outputs:

   - Examples (ExampleTable)
      The same data set as on the input, but with the missing values imputed.


Description
-----------

Some of Orange's algorithms and visualizations cannot handle unknown values in
the data. This widget does what statisticians call imputation: it replaces
unknown values with values computed from the data or set by the user.

.. image:: images/Impute.png
   :alt: Impute widget

In the top-most box, :obj:`Default imputation method`, the user can specify a
general imputation technique for all attributes.

   - :obj:`Don't Impute` does nothing with the missing values.

   - :obj:`Average/Most-frequent` uses the average value (for continuous
     attributes) or the most common value (for discrete attributes).

   - :obj:`Model-based imputer` constructs a model for predicting the missing
     value based on the values of other attributes; a separate model is
     constructed for each attribute. The default model is the 1-NN learner,
     which takes the value from the most similar example (this is sometimes
     referred to as hot deck imputation). This algorithm can be replaced by
     one that the user connects to the input signal
     :obj:`Learner for Imputation`. Note, however, that if the data contains
     both discrete and continuous attributes, the algorithm needs to be
     capable of handling both; at the moment only the kNN learner can do
     that. (In the future, when Orange has more regressors, the Impute widget
     may have separate input signals for discrete and continuous models.)

   - :obj:`Random values` computes the distribution of values for each
     attribute and then imputes by picking random values from it.

   - :obj:`Remove examples with missing values` removes examples that contain
     missing values, unless the missing value belongs to an attribute for
     which a specific action is defined as described below. This check also
     applies to the class attribute if :obj:`Impute class values` is checked.


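The idea behind the default model-based imputer (1-NN, or hot deck imputation)
can be illustrated with a short plain-Python sketch. This is not Orange's
implementation: the function names, the ``None`` marker for unknown values, and
the distance measure (range-scaled differences for continuous attributes, 0/1
mismatch for discrete ones) are assumptions chosen for the illustration.

```python
import math

MISSING = None  # assumption: unknown values are represented by None


def attribute_ranges(rows):
    """Per attribute: value range if continuous, None if discrete."""
    ranges = []
    for column in zip(*rows):
        known = [v for v in column if v is not MISSING]
        if known and all(isinstance(v, (int, float)) for v in known):
            ranges.append(max(known) - min(known))
        else:
            ranges.append(None)
    return ranges


def distance(a, b, ranges):
    """Distance over the attributes that are known in both examples."""
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is MISSING or y is MISSING:
            continue
        if r is None:   # discrete: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        elif r > 0:     # continuous: squared difference scaled by the range
            total += (abs(x - y) / r) ** 2
        used += 1
    return math.sqrt(total / used) if used else math.inf


def impute_1nn(rows):
    """Fill each unknown value with the value from the most similar example."""
    ranges = attribute_ranges(rows)
    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Donors are the other examples that know attribute j.
            donors = [r for k, r in enumerate(rows)
                      if k != i and r[j] is not MISSING]
            if donors:
                nearest = min(donors, key=lambda r: distance(row, r, ranges))
                new_row[j] = nearest[j]
        result.append(new_row)
    return result


# Toy data with one continuous, one discrete and one continuous attribute.
rows = [
    [1.0, "a", 10.0],
    [1.1, "a", None],
    [5.0, "b", 50.0],
    [5.2, None, 52.0],
]
imputed = impute_1nn(rows)
```

In this toy table the second example borrows its last value from the first
example, and the fourth example borrows its discrete value from the third,
because those are their nearest neighbours under the assumed distance.
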
It is also possible to specify an individual treatment for each attribute,
which overrides the default treatment set above. One can also specify a
manually defined value to be used for imputation. In the snapshot above, we
decided not to impute the values of "normalized-losses" and "make"; the missing
values of "aspiration" will be replaced by random values, while the missing
values of "body-style" and "drive-wheels" are replaced by "hatchback" and
"fwd", respectively. If the value of "length", "width" or "height" is missing,
the example is discarded. Values of all other attributes use the default
method set above (model-based imputer, in our case).

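The interplay between the default method and the per-attribute overrides can
be sketched in plain Python as follows. The sketch is only an illustration and
not the widget's code: the method names (``"dont_impute"``, ``"average"``,
``"random"``, ``"remove"``, ``("value", v)``), the ``impute`` function, the
``"horsepower"`` attribute and all data values are hypothetical, and
``"average"`` stands in as the default method to keep the example short.

```python
import random
from collections import Counter

MISSING = None  # assumption: unknown values are represented by None


def average_or_most_frequent(known):
    """Mean for continuous values, most common value for discrete ones."""
    if all(isinstance(v, (int, float)) for v in known):
        return sum(known) / len(known)
    return Counter(known).most_common(1)[0][0]


def impute(rows, names, default, overrides, seed=0):
    """Apply the per-attribute method, falling back to the default one."""
    rng = random.Random(seed)
    # Known values of each attribute, i.e. its empirical distribution.
    known = {name: [r[i] for r in rows if r[i] is not MISSING]
             for i, name in enumerate(names)}
    result = []
    for row in rows:
        new_row, keep = list(row), True
        for i, name in enumerate(names):
            if row[i] is not MISSING:
                continue
            method = overrides.get(name, default)
            if method == "remove":
                keep = False          # discard the whole example
                break
            if isinstance(method, tuple):
                new_row[i] = method[1]  # ("value", v): user-defined value
            elif method == "average" and known[name]:
                new_row[i] = average_or_most_frequent(known[name])
            elif method == "random" and known[name]:
                new_row[i] = rng.choice(known[name])
            # "dont_impute": leave the value unknown
        if keep:
            result.append(new_row)
    return result


names = ["make", "aspiration", "body-style", "length", "horsepower"]
rows = [
    ["audi", "std", "sedan", 176.6, 102.0],
    [None, "turbo", None, 170.0, None],
    ["bmw", None, "sedan", None, 110.0],
    ["audi", None, None, 180.0, 98.0],
]
overrides = {
    "make": "dont_impute",
    "aspiration": "random",
    "body-style": ("value", "hatchback"),
    "length": "remove",
}
result = impute(rows, names, default="average", overrides=overrides)
```

Here the third example is discarded because its "length" is unknown, "make"
stays unknown where it was missing, and "horsepower", which has no override,
falls back to the default method.
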
The :obj:`Set All to Default` button resets the individual attribute
treatments to the default.

Imputing class values is typically not good practice, so it is off by
default. It can be enabled by checking :obj:`Impute class values`. If it is
checked and the default method is to remove the examples with missing values,
then examples with unknown classes are removed as well; otherwise they are not.

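The rule for unknown class values can be summarized in a few lines of Python.
This is a sketch of the rule as described above, not the widget's code; the
function name and the returned labels are hypothetical, and the ``"impute"``
branch (class values being imputed by the default method when the box is
checked) is our reading of the description.

```python
def class_treatment(class_value, impute_class_values, default_method):
    """Decide what happens to one example's class value."""
    if class_value is not None:
        return "keep"      # the class is known, nothing to do
    if not impute_class_values:
        return "keep"      # box unchecked: unknown classes are left alone
    if default_method == "remove":
        return "remove"    # the example is discarded
    return "impute"        # assumed: the default method fills the class


decisions = [
    class_treatment(None, False, "remove"),
    class_treatment(None, True, "remove"),
    class_treatment(None, True, "average"),
    class_treatment("yes", True, "remove"),
]
```
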
All changes are committed immediately if :obj:`Send automatically` is checked.
Otherwise, :obj:`Apply` must be clicked to apply any new settings.