.. Source: orange/docs/widgets/rst/data/datasampler.rst, revision 11809:cf2369b2427d, checked in by blaz

.. _Data Sampler:

Data Sampler
============

.. image:: ../../../../Orange/OrangeWidgets/Data/icons/DataSampler.svg
   :alt: Data Sampler icon
   :class: widget-category-data widget-icon

Selects a subset of data instances from the input data set.

Signals
-------

Inputs:
    - Data
        Input data set to be sampled.

Outputs:
    - Data Sample
        A set of sampled data instances.
    - Remaining Data
        All other data instances from the input data set that are not
        included in the sample.

Description
-----------

Data Sampler implements several methods for sampling data from the input
channel. It outputs the sampled data set and the complementary data set
(instances from the input set that are not included in the sample). The
output is set when an input data set is provided and after
:obj:`Sample Data` is pressed.
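
The relation between the two outputs can be illustrated with a minimal
sketch in plain Python. It uses only the standard library and is not the
widget's own code; the function name ``sample_and_remaining`` and the
stand-in data are assumptions made for illustration.

.. code-block:: python

   import random

   def sample_and_remaining(instances, n, seed=None):
       """Return (sample, remaining): n random instances and all the others."""
       rng = random.Random(seed)  # a fixed seed gives the same split every time
       chosen = set(rng.sample(range(len(instances)), n))
       sample = [x for i, x in enumerate(instances) if i in chosen]
       remaining = [x for i, x in enumerate(instances) if i not in chosen]
       return sample, remaining

   # Stand-in for an input data set with 150 instances.
   data = list(range(150))
   sample, remaining = sample_and_remaining(data, 10, seed=42)
   print(len(sample), len(remaining))  # 10 140

Every input instance ends up in exactly one of the two lists, which mirrors
the :obj:`Data Sample` and :obj:`Remaining Data` outputs.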

.. image:: images/DataSampler-stamped.png
   :alt: Data Sampler
   :align: right

.. rst-class:: stamp-list

   1. Information on the input and output data sets.
   #. If the input data contains a class, sampling will try to match
      its class distribution in the output data sets.
   #. Set the random seed to always obtain the same sample for a given
      data set and sampling parameters.
   #. :obj:`Random sampling` can draw a fixed number of instances or create
      a data set whose size is a given proportion of the input data set.
      In repeated sampling, a data instance may be included in the sample
      several times (as in bootstrap).
   #. :obj:`Cross validation`, :obj:`Leave-one-out`, and
      :obj:`Multiple subsets` all create several data samples.
      Cross validation splits the data into equally sized subsets
      (:obj:`Number of folds`) and uses one of them as the sample.
      Leave-one-out randomly chooses one data instance; all other instances
      go to the :obj:`Remaining Data` channel. Multiple subsets creates
      subsets of preset sizes relative to the input data set (as in random
      sampling); the subsets can differ in size.
   #. For sampling methods that create several data subsets, this
      determines which subset is pushed to the :obj:`Data Sample` channel.
   #. Press :obj:`Sample Data` to push the sample to the output
      channel of the widget.
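
Class-matched sampling, seeding, and repeated sampling from the list above
can be sketched in plain Python. This is a hedged illustration using only
the standard library, not the widget's implementation; the function names
and the three equally sized example classes are assumptions.

.. code-block:: python

   import random
   from collections import defaultdict

   def stratified_sample(labels, proportion, seed=None):
       """Pick about `proportion` of the indices from every class."""
       rng = random.Random(seed)
       by_class = defaultdict(list)
       for index, label in enumerate(labels):
           by_class[label].append(index)
       chosen = []
       for label in sorted(by_class):
           indices = by_class[label]
           k = round(proportion * len(indices))  # per-class sample size
           chosen.extend(rng.sample(indices, k))
       return sorted(chosen)

   def bootstrap_sample(n_instances, n_draws, seed=None):
       """Draw indices with replacement, so an index may repeat."""
       rng = random.Random(seed)
       return [rng.randrange(n_instances) for _ in range(n_draws)]

   # Illustrative class labels: three classes with 50 instances each.
   labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

   picked = stratified_sample(labels, 0.2, seed=1)
   print(len(picked))  # 30, that is 10 from each class

   drawn = bootstrap_sample(len(labels), len(labels), seed=1)
   never_drawn = sorted(set(range(len(labels))) - set(drawn))

In the bootstrap case, the indices in ``never_drawn`` play the role of the
remaining data: they are the instances that were not drawn even once.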

.. container:: clearer

   .. image:: images/spacer.png
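
The fold-based methods described above can be sketched as follows. This is
a minimal plain-Python illustration under stated assumptions, not the
widget's code: the function names are hypothetical, and folds are built by
shuffling indices and dealing them out in turn.

.. code-block:: python

   import random

   def cross_validation_folds(n_instances, n_folds, seed=None):
       """Split shuffled indices into n_folds subsets of near-equal size."""
       rng = random.Random(seed)
       indices = list(range(n_instances))
       rng.shuffle(indices)
       return [sorted(indices[i::n_folds]) for i in range(n_folds)]

   def select_fold(folds, selected):
       """Use one fold as the sample; all other folds form the remainder."""
       sample = folds[selected]
       remaining = sorted(
           i for k, fold in enumerate(folds) if k != selected for i in fold
       )
       return sample, remaining

   def leave_one_out(n_instances, seed=None):
       """Choose one index at random; every other index is the remainder."""
       rng = random.Random(seed)
       left_out = rng.randrange(n_instances)
       rest = [i for i in range(n_instances) if i != left_out]
       return [left_out], rest

   folds = cross_validation_folds(150, 10, seed=0)
   fold_sample, fold_remaining = select_fold(folds, 0)
   print(len(fold_sample), len(fold_remaining))  # 15 135

   loo_sample, loo_remaining = leave_one_out(150, seed=0)
   print(len(loo_sample), len(loo_remaining))  # 1 149

The ``selected`` argument corresponds to choosing which subset is pushed to
the :obj:`Data Sample` channel.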

Example
-------

In the following workflow, we sample 10 data instances from the Iris data
set and send both the original data and the sample to the Scatterplot
widget. Sampled data instances are plotted with filled circles.

.. image:: images/DataSampler-Example.png
   :alt: A workflow with Data Sampler