Changeset 10073:521af73504e4 in orange


Timestamp:
02/08/12 14:37:23
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Documentation for Orange.data.sample

Files:
4 edited

  • Orange/data/sample.py

    r9994 r10073  
    1 """ 
    2 ================================= 
    3 Sampling of examples (``sample``) 
    4 ================================= 
    5  
    6 Example sampling is one of the basic procedures in machine learning. If
    7 nothing else, everybody needs to split a dataset into training and
    8 testing examples.
    9   
    10 It is easy to select a subset of examples in Orange. The key idea is the
    11 use of indices: first construct a list of indices, one corresponding
    12 to each example. Then you can select examples by index, say, take
    13 all examples with index 3, or all examples with an index other than 3.
    14 This is obviously useful for many typical setups, such as 70-30 splits or
    15 cross-validation.
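The index-based selection described above can be sketched in plain Python. The helper ``cv_splits`` is hypothetical and written only for this illustration; it is not Orange's API. Given one fold index per example, it pairs the examples with index *k* (the test set) with all examples whose index differs from *k* (the training set).

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cv_splits(data: Sequence[T], fold_indices: Sequence[int]
              ) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs: for each fold k, the test set holds the
    items whose index is k and the training set all the others."""
    if len(data) != len(fold_indices):
        raise ValueError("need exactly one index per example")
    for k in sorted(set(fold_indices)):
        test = [x for x, i in zip(data, fold_indices) if i == k]
        train = [x for x, i in zip(data, fold_indices) if i != k]
        yield train, test


# Six toy "examples" assigned to three folds by hand.
examples = ["a", "b", "c", "d", "e", "f"]
folds = [0, 1, 2, 0, 1, 2]
for train, test in cv_splits(examples, folds):
    print("test:", test, "train:", train)
# test: ['a', 'd'] train: ['b', 'c', 'e', 'f']
# test: ['b', 'e'] train: ['a', 'c', 'd', 'f']
# test: ['c', 'f'] train: ['a', 'b', 'd', 'e']
```

Every example lands in exactly one test set, which is the property cross-validation relies on.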
    16   
    17 Orange provides methods for making such selections, such as 
    18 :obj:`Orange.data.Table.select`.  And, of course, it provides methods 
    19 for constructing indices for different kinds of splits. For instance, 
    20 for the most commonly used sampling method, cross-validation, Orange's
    21 class :obj:`SubsetIndicesCV` prepares a list of indices that assign a 
    22 fold to each example. 
    23  
    24 Classes that construct such indices are derived from a basic 
    25 abstract :obj:`SubsetIndices`. There are three different classes 
    26 provided. :obj:`SubsetIndices2` constructs a list of 0's and 1's in a
    27 prescribed proportion; it can be used, for instance, for 70-30 divisions
    28 into training and testing examples. A more general :obj:`SubsetIndicesN`
    29 constructs a list of indices from 0 to N-1 in given proportions. Finally,
    30 the most often used :obj:`SubsetIndicesCV` prepares indices for 
    31 cross-validation. 
    32  
    33 Subset indices are more deterministic than in versions of Orange prior to 
    34 September 2003. See examples in the section about :obj:`SubsetIndices2` 
    35 for details. 
    36   
    37 .. class:: SubsetIndices 
    38  
    39     .. data:: Stratified 
    40  
    41     .. data:: NotStratified 
    42  
    43     .. data:: StratifiedIfPossible 
    44          
    45         Constants for setting :obj:`stratified`. If 
    46         :obj:`StratifiedIfPossible`, Orange will try to construct 
    47         stratified indices, but fall back to non-stratified if anything 
    48         goes wrong. For stratified indices, it needs to see the example 
    49         table (see the calling operator below), and the class should be 
    50         discrete and have no unknown values. 
    51  
    52  
    53     .. attribute:: stratified 
    54  
    55         Defines whether the division should be stratified, that is, 
    56         whether all subsets should have approximately equal class
    57         distributions. Possible values are :obj:`Stratified`, 
    58         :obj:`NotStratified` and :obj:`StratifiedIfPossible` (default). 
    59  
    60     .. attribute:: randseed 
    61      
    62     .. attribute:: random_generator 
    63  
    64         These two fields deal with the way :obj:`SubsetIndices` generates 
    65         random numbers. 
    66  
    67         If :obj:`random_generator` (of type :obj:`Orange.misc.Random`) 
    68         is set, it is used. The same random generator can be shared 
    69         between different objects; this can be useful when constructing an 
    70         experiment that depends on a single random seed. If you use this, 
    71         :obj:`SubsetIndices` will return a different set of indices each 
    72         time it's called, even with the same arguments.
    73  
    74         If :obj:`random_generator` is not given, but :attr:`randseed` is 
    75         (positive values denote a defined :obj:`randseed`), the value is 
    76         used to initiate a new, temporary local random generator. This 
    77         way, the indices generator will always give the same indices for
    78         the same data. 
    79  
    80         If neither of the two is defined, a new random generator
    81         is constructed each time the object is called (note that 
    82         this is unlike some other classes, such as :obj:`Variable`, 
    83         :obj:`Distribution` and :obj:`Orange.data.Table`, that store 
    84         such generators for future use; the generator constructed by 
    85         :obj:`SubsetIndices` is disposed after use) and initialized 
    86         with random seed 0. This thus has the same effect as setting 
    87         :obj:`randseed` to 0. 
    88  
    89         The example for :obj:`SubsetIndices2` shows the difference 
    90         between those options. 
    91  
    92     .. method:: __call__(examples) 
    93  
    94         :obj:`SubsetIndices` can be called to return a list of 
    95         indices. The argument can be either the desired length of the list 
    96         (presumably corresponding to the length of some list of examples)
    97         or a set of examples, given as :obj:`Orange.data.Table` or a plain
    98         Python list. In the former case, the indices obviously
    99         cannot correspond to a stratified division; if :obj:`stratified` 
    100         is set to :obj:`Stratified`, an exception is raised. 
    101  
    102 .. class:: SubsetIndices2 
    103  
    104     This object prepares a list of 0's and 1's. 
    105   
    106     .. attribute:: p0 
    107  
    108         The proportion or a number of 0's. If :obj:`p0` is less than 
    109         1, it's a proportion. For instance, if :obj:`p0` is 0.2, 20% 
    110         of indices will be 0's and 80% will be 1's. If :obj:`p0` 
    111         is 1 or more, it gives the exact number of 0's. For instance, 
    112         with :obj:`p0` of 10, you will get a list with 10 0's and 
    113         the rest of the list will be 1's. 
    114   
    115 Say that you have loaded the lenses dataset into ``lenses``. We'll split
    116 it into two datasets, the first containing only 6 examples and the other 
    117 containing the rest (from :download:`randomindices2.py <code/randomindices2.py>`): 
    118   
    119 .. literalinclude:: code/randomindices2.py 
    120     :lines: 11-17 
    121  
    122 Output:: 
    123  
    124     <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1> 
    125     6 18 
    126   
    127 No surprises here. Let's now see what happens with random seeds and generators. First, we shall simply construct and print five lists of random indices.
    128   
    129 .. literalinclude:: code/randomindices2.py 
    130     :lines: 19-21 
    131  
    132 Output:: 
    133  
    134     Indices without playing with random generator 
    135     <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1> 
    136     <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1> 
    137     <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1> 
    138     <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1> 
    139     <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1> 
    140  
    141  
    142 We ran it five times and got the same result each time.
    143  
    144 .. literalinclude:: code/randomindices2.py 
    145     :lines: 23-26 
    146  
    147 Output:: 
    148  
    149     Indices with random generator 
    150     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    151     <1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1> 
    152     <1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1> 
    153     <1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0> 
    154     <1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1> 
    155  
    156 We have constructed a private random generator for random indices and
    157 got five different lists. If you run the whole script again, however, you'll
    158 get the same five lists, since the generator will be constructed again
    159 and start generating numbers from the beginning. Again, you should
    160 get these same indices on any operating system.
    161  
    162 .. literalinclude:: code/randomindices2.py 
    163     :lines: 28-32 
    164  
    165 Output:: 
    166  
    167     Indices with randseed 
    168     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    169     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    170     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    171     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    172     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    173  
    174  
    175 Here we have set the random seed and removed the random generator 
    176 (otherwise the seed would have no effect as the generator has the 
    177 priority). Each time we run the indices generator, it constructs a 
    178 private random generator and initializes it with the given seed, and 
    179 consequently always returns the same indices.
    180  
    181 Let's play with :obj:`SubsetIndices2.p0`. There are 24 examples in the 
    182 dataset. Setting :obj:`SubsetIndices2.p0` to 0.25 instead of 6 shouldn't 
    183 alter the indices. Let's check it. 
    184  
    185 .. literalinclude:: code/randomindices2.py 
    186     :lines: 35-37 
    187  
    188 Output:: 
    189  
    190     Indices with p0 set as probability (not 'a number of') 
    191     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    192  
    193 Finally, let's observe the effects of :obj:`~SubsetIndices.stratified`. By 
    194 default, indices are stratified if it's possible and, in our case, 
    195 it is and they are. 
    196  
    197 .. literalinclude:: code/randomindices2.py 
    198     :lines: 39-49 
    199  
    200 Output:: 
    201  
    202     ... with stratification 
    203     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    204     <0.625, 0.167, 0.208> 
    205     <0.611, 0.167, 0.222> 
    206  
    207 We explicitly requested stratification and got the same indices as
    208 before. That's OK. We also printed out the distribution for the whole
    209 dataset and for the selected dataset (as we gave no second parameter,
    210 the examples with non-zero indices got selected). They are not the same, but
    211 they are pretty close. :obj:`SubsetIndices2` did what it could. Now let's
    212 try without stratification. The script is pretty much the same except for setting
    213 :obj:`~SubsetIndices.stratified` to :obj:`~SubsetIndices.NotStratified`.
    214  
    215 .. literalinclude:: code/randomindices2.py 
    216     :lines: 51-62 
    217  
    218 Output:: 
    219      
    220     ... and without stratification 
    221     <0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1> 
    222     <0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1> 
    223     <0.625, 0.167, 0.208> 
    224     <0.611, 0.167, 0.222> 
    225  
    226  
    227 Different indices and ... just look at the distribution. Could be worse 
    228 but, well, :obj:`~SubsetIndices.NotStratified` doesn't mean that Orange 
    229 will make an effort to get uneven distributions. It just won't care
    230 about them.
    231  
    232 For a final test, you can set the class of one of the examples to unknown 
    233 and rerun the last script, setting :obj:`~SubsetIndices.stratified`
    234 once to :obj:`~SubsetIndices.Stratified` and once to
    235 :obj:`~SubsetIndices.StratifiedIfPossible`. In the first case you'll
    236 get an error, and in the second you'll get non-stratified indices.
    237  
    238 .. literalinclude:: code/randomindices2.py 
    239     :lines: 64-70 
    240  
    241 Output:: 
    242  
    243     ... stratified 'if possible' 
    244     <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1> 
    245  
    246     ... stratified 'if possible', after removing the first example's class 
    247     <0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1> 
    248   
    249 .. class:: SubsetIndicesN 
    250  
    251     A straight generalization of :obj:`SubsetIndices2`, so there's not
    252     much to be told about it. 
    253  
    254     .. attribute:: p 
    255  
    256         A list of proportions of examples that go to each fold. If 
    257         :obj:`p` has a length of 3, the returned list will have four 
    258         different indices; the first three will have probabilities as
    259         defined in :obj:`p` while the last will have a probability of 
    260         (1 - sum of elements of :obj:`p`). 
    261  
    262 :obj:`SubsetIndicesN` does not support stratification; setting 
    263 :obj:`stratified` to :obj:`Stratified` will yield an error. 
    264  
    265 Let us construct a list of indices that would assign half of examples 
    266 to the first set and a quarter to the second and third (part of 
    267 :download:`randomindicesn.py <code/randomindicesn.py>`): 
    268  
    269 .. literalinclude:: code/randomindicesn.py 
    270     :lines: 9-14 
    271  
    272 Output::
    273  
    274     <1, 0, 0, 2, 0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 0, 0, 2, 0, 0, 0, 1, 2, 1, 0> 
    275  
    276 Count them and you'll see that, out of 24, there are 12 0's and 6 each of 1's and 2's.
    277   
    278 .. class:: SubsetIndicesCV 
    279   
    280     :obj:`SubsetIndicesCV` computes indices for cross-validation. 
    281  
    282     It constructs a list of indices between 0 and :obj:`folds` -1 
    283     (inclusive), with an equal number of each (if the number of examples 
    284     is not divisible by :obj:`folds`, the last folds will have one 
    285     example less). 
    286  
    287     .. attribute:: folds 
    288  
    289         Number of folds. Default is 10. 
    290   
    291 We shall prepare indices for an ordinary ten-fold cross-validation and
    292 indices for 10 examples for 5-fold cross-validation. For the latter,
    293 we shall only pass the number of examples, which, of course, prevents
    294 stratification (part of :download:`randomindicescv.py <code/randomindicescv.py>`):
    295  
    296 .. literalinclude:: code/randomindicescv.py 
    297     :lines: 7-12 
    298  
    299 Output:: 
    300  
    301     Indices for ordinary 10-fold CV 
    302     <1, 1, 3, 8, 8, 3, 2, 7, 5, 0, 1, 5, 2, 9, 4, 7, 4, 9, 3, 6, 0, 2, 0, 6> 
    303     Indices for 5 folds on 10 examples 
    304     <3, 0, 1, 0, 3, 2, 4, 4, 1, 2> 
    305  
    306  
    307 Since examples don't divide evenly into ten folds, the first four folds
    308 have one example more: there are three 0's, 1's, 2's and 3's, but only
    309 two of each index from 4 to 9.
    310  
    311 """ 
    312  
    313 pass 
    314  
    3151from orange import \ 
    3162     MakeRandomIndices as SubsetIndices, \ 
  • docs/reference/rst/Orange.data.sample.rst

    r9372 r10073  
    11.. automodule:: Orange.data.sample 
     2 
     3====================================
     4Random sampling of data (``sample``)
     5====================================
     6 
     7Random sampling is done by constructing a vector of subset indices 
     8(e.g. a list of 0's and 1's), one corresponding to each instance, and
     9then passing the vector to the table's :obj:`Orange.data.Table.select` 
     10method. 
     11  
     12Orange provides several methods for construction of such indices: 
     13:obj:`SubsetIndices2` for splitting into two sets (or extracting a 
     14random subset), :obj:`SubsetIndicesN` for splitting into multiple 
     15sets and :obj:`SubsetIndicesCV` for cross validation. All classes are 
     16derived from the abstract class :obj:`SubsetIndices`. 
     17 
     18The typical usage pattern is as follows. :: 
     19 
     20    lenses = Orange.data.Table("lenses") 
     21    indices2 = Orange.data.sample.SubsetIndices2(p0=0.25) 
     22    ind = indices2(lenses) 
     23    lenses0 = lenses.select(ind, 0) 
     24    lenses1 = lenses.select(ind, 1) 
     25 
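To show what the index vector does, the sketch below mimics ``lenses.select(ind, 0)`` and ``lenses.select(ind, 1)`` on a plain Python list. ``select_by_index`` is a hypothetical helper, not part of Orange, and it assumes that selecting with a value keeps exactly the instances whose index equals that value.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_by_index(data: Sequence[T], indices: Sequence[int], value: int) -> List[T]:
    """Keep the items whose subset index equals ``value`` (illustration only)."""
    if len(data) != len(indices):
        raise ValueError("need exactly one index per instance")
    return [item for item, idx in zip(data, indices) if idx == value]


# Ten toy instances and a 0/1 index vector with three 0's.
instances = ["inst%d" % i for i in range(10)]
ind = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

part0 = select_by_index(instances, ind, 0)
part1 = select_by_index(instances, ind, 1)
print(part0)                    # ['inst0', 'inst3', 'inst7']
print(len(part0), len(part1))   # 3 7
```

The two selections are disjoint and together cover every instance, which is what makes a single index vector sufficient to describe a split.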
     26Subset indices are deterministic in the sense that unless the caller 
     27explicitly modifies random seeds, the same setup will always return 
     28the same indices. Details are shown in the section about 
     29:obj:`SubsetIndices2`. 
     30  
     31.. class:: SubsetIndices 
     32 
     33    .. attribute:: stratified 
     34 
     35        Defines whether the samples should be stratified, that is,
     36        whether all subsets should have approximately equal class
     37        distributions. Possible values are the constants listed below.
     38 
     39    .. data:: Stratified 
     40 
     41            Division is stratified; an exception is raised if this is
     42            not possible, for instance if the class is continuous or has unknown values.
     43 
     44    .. data:: NotStratified 
     45 
     46            Division is not stratified. 
     47 
     48    .. data:: StratifiedIfPossible 
     49 
     50            Division is stratified if possible and unstratified 
     51            otherwise (default). 
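A minimal sketch of a stratified two-way split, under assumptions: ``split_indices`` and ``split_if_possible`` are hypothetical names, ``None`` stands in for an unknown class value, proportions are rounded per class, and Python's ``random`` module does the shuffling. Orange's own algorithm may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def split_indices(classes: Sequence[Optional[Hashable]], p0: float,
                  stratified: bool = True, seed: int = 0) -> List[int]:
    """0/1 indices in which roughly a proportion p0 of items get index 0.

    With stratified=True the proportion is applied within each class, so
    both subsets keep about the class distribution of the whole data;
    an unknown class value (None) makes this impossible.
    """
    rng = random.Random(seed)
    if stratified and any(c is None for c in classes):
        raise ValueError("cannot stratify: unknown class value")
    groups: Dict[Optional[Hashable], List[int]] = defaultdict(list)
    for pos, c in enumerate(classes):
        # one group per class, or a single group when not stratifying
        groups[c if stratified else None].append(pos)
    indices = [1] * len(classes)
    for positions in groups.values():
        rng.shuffle(positions)
        for pos in positions[:round(p0 * len(positions))]:
            indices[pos] = 0
    return indices


def split_if_possible(classes: Sequence[Optional[Hashable]], p0: float,
                      seed: int = 0) -> List[int]:
    """Mimics the StratifiedIfPossible policy: fall back to a plain split."""
    try:
        return split_indices(classes, p0, stratified=True, seed=seed)
    except ValueError:
        return split_indices(classes, p0, stratified=False, seed=seed)


# 24 items with class counts 15/4/5; a quarter should go to subset 0.
classes = ["a"] * 15 + ["b"] * 4 + ["c"] * 5
ind = split_indices(classes, 0.25)
zeros = {c: sum(1 for k, i in zip(classes, ind) if k == c and i == 0) for c in "abc"}
print(zeros)   # {'a': 4, 'b': 1, 'c': 1}
```

Because each class is rounded separately, the total number of 0's can in general differ slightly from ``p0`` times the number of instances.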
     52 
     53    .. attribute:: randseed 
     54     
     55    .. attribute:: random_generator 
     56 
     57        If :obj:`random_generator` (of type :obj:`Orange.misc.Random`) 
     58        is set, it is used for generation of random numbers. In this 
     59        case, :obj:`SubsetIndices` will return a different set of 
     60        indices each time it is called. 
     61 
     62        The same generator can be shared between different objects; 
     63        this can be useful when constructing an experiment that 
     64        depends on a single random seed. 
     65 
     66        If :obj:`random_generator` is not given, but :attr:`randseed` 
     67        is set (that is, positive), the value is used to initiate a 
     68        new, temporary local random generator. This way, the indices 
     69        generator will always give the same indices for the same data.
     70 
     71        If neither of the two is defined, a new random generator is
     72        constructed each time the object is called and initialized
     73        with a seed of 0. Note that this is different from some other
     74        classes, such as :obj:`~Orange.data.feature.Descriptor`,
     75        :obj:`~Orange.statistics.distribution.Distribution` and
     76        :obj:`~Orange.data.Table`, that store such generators for
     77        future use: the generator constructed by :obj:`SubsetIndices`
     78        is disposed of after use. The effect is thus the same as
     79        setting :obj:`randseed` to 0.
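The priority rules above can be illustrated with a small class. ``IndicesSketch`` is hypothetical, and Python's ``random.Random`` stands in for :obj:`Orange.misc.Random`; only the seeding logic is meant to mirror the description.

```python
import random
from typing import List, Optional


class IndicesSketch:
    """Hypothetical 0/1 index generator illustrating the seeding rules:
    a shared generator takes priority over randseed; with neither, a
    fresh generator seeded with 0 is built for every call."""

    def __init__(self, p0: int, randseed: int = 0,
                 random_generator: Optional[random.Random] = None):
        self.p0 = p0
        self.randseed = randseed
        self.random_generator = random_generator

    def __call__(self, n: int) -> List[int]:
        if self.random_generator is not None:
            rng = self.random_generator  # shared: its state carries over
        else:
            # temporary local generator, discarded after this call
            rng = random.Random(self.randseed if self.randseed > 0 else 0)
        indices = [0] * self.p0 + [1] * (n - self.p0)
        rng.shuffle(indices)
        return indices


no_seed = IndicesSketch(p0=6)
print(no_seed(24) == no_seed(24))        # True: seed 0 on every call

seeded = IndicesSketch(p0=6, randseed=42)
print(seeded(24) == seeded(24))          # True: same seed on every call

shared = IndicesSketch(p0=6, random_generator=random.Random(42))
first, second = shared(24), shared(24)   # generator advances between calls
print(first == seeded(24))               # True: both start from seed 42
```

Reusing one shared generator advances its state between calls, while re-creating it with the same seed (as rerunning a script does) replays the same sequence of index lists.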
     80 
     81        Examples are shown in documentation for :obj:`SubsetIndices2`. 
     82 
     83    .. method:: __call__(data) 
     84 
     85        Return a list of indices. The argument can be either the 
     86        desired length of the list or a set of instances, given as 
     87        :obj:`Orange.data.Table` or as a plain Python list. In the
     88        former case, sampling cannot be stratified. 
     89 
     90.. class:: SubsetIndices2 
     91 
     92    Prepares a list of 0's and 1's in the given proportions. 
     93  
     94    .. attribute:: p0 
     95 
     96        The proportion or a number of 0's. If :obj:`p0` is less than 
     97        1, the number gives a proportion; for instance, if :obj:`p0` 
     98        is 0.2, 20% of indices will be 0's and 80% will be 1's. If 
     99        :obj:`p0` is 1 or more, it gives the number of 0's; with 
     100        ``p0=10``, the list will have 10 0's and the rest of the
     101        list will be 1's. 
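The rule for :obj:`p0` can be expressed as a small helper. This is a sketch under assumptions: ``number_of_zeros`` and ``indices2`` are hypothetical names, proportions are rounded to the nearest integer, and Python's ``random`` module stands in for Orange's generator.

```python
import random
from typing import List


def number_of_zeros(p0: float, n: int) -> int:
    """p0 < 1 is read as a proportion of n, p0 >= 1 as an exact count."""
    if p0 < 1:
        return round(p0 * n)  # rounding to the nearest integer is assumed
    if p0 > n:
        raise ValueError("p0 asks for more 0's than there are items")
    return int(p0)


def indices2(p0: float, n: int, seed: int = 0) -> List[int]:
    """Hypothetical 0/1 index list with number_of_zeros(p0, n) zeros."""
    k = number_of_zeros(p0, n)
    ind = [0] * k + [1] * (n - k)
    random.Random(seed).shuffle(ind)
    return ind


print(number_of_zeros(0.2, 24), number_of_zeros(0.25, 24), number_of_zeros(6, 24))  # 5 6 6
print(indices2(0.25, 24) == indices2(6, 24))   # True: both give six 0's
```

With 24 instances, as in the lenses data, a proportion of 0.25 and a count of 6 give the same number of 0's, so under a fixed seed the two settings yield identical indices.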
     102  
     103    The following example splits the lenses data into two datasets,
     104    the first containing only 6 data instances and the other 
     105    containing the rest (from :download:`randomindices2.py 
     106    <code/randomindices2.py>`): 
     107 
     108    .. literalinclude:: code/randomindices2.py 
     109        :lines: 11-17
     110
     111    Output::
     112
     113        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     114        6 18
     115 
     116    Repeating this gives the same set of indices. 
     117 
     118    .. literalinclude:: code/randomindices2.py 
     119        :lines: 19-21
     120
     121    Output::
     122
     123        Indices without playing with random generator
     124        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     125        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     126        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     127        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     128        <0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1>
     129 
     130    With a random generator, it gives different indices every time. 
     131 
     132    .. literalinclude:: code/randomindices2.py 
     133        :lines: 23-26
     134
     135    Output::
     136
     137        Indices with random generator
     138        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     139        <1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1>
     140        <1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1>
     141        <1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0>
     142        <1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1>
     143 
     144    Running the same script again, however, gives the same indices,
     145    since the same random generator is constructed and used.
     146 
     147    The next example sets the random seed and removes the random 
     148    generator (otherwise the seed would have no effect as the 
     149    generator has priority). At each call, it constructs a private
     150    random generator and initializes it with the given seed, and 
     151    therefore always returns the same indices. 
     152 
     153    .. literalinclude:: code/randomindices2.py 
     154        :lines: 28-32
     155
     156    Output::
     157
     158        Indices with randseed
     159        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     160        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     161        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     162        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     163        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     164 
     165    There are 24 instances in the dataset. Setting 
     166    :obj:`SubsetIndices2.p0` to 0.25 instead of 6 gives the same result. 
     167 
     168    .. literalinclude:: code/randomindices2.py 
     169        :lines: 35-37
     170
     171    Output::
     172
     173        Indices with p0 set as probability (not 'a number of')
     174        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     175 
     176    The class can also be called with a number of data instances 
     177    instead of the data. In this case, stratification is not possible. 
     178 
     179    .. literalinclude:: code/randomindices2.py 
     180        :lines: 64-66
     181
     182    Output::
     183
     184        ... stratified 'if possible'
     185        <1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1>
     186  
     187.. class:: SubsetIndicesN 
     188 
     189    A generalization of :obj:`SubsetIndices2` to multiple subsets.
     190 
     191    .. attribute:: p 
     192 
     193        A list of proportions of data that go to each fold. If 
     194        :obj:`p` has a length of 3, the returned list will have four 
     195        different indices; the first three will have probabilities as
     196        defined in :obj:`p` while the last will have a probability of 
     197        (1 - sum of elements of :obj:`p`). 
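The way :obj:`p` maps to subset sizes can be sketched as follows, assuming that proportions are rounded to whole numbers of instances (Orange may instead draw indices with the given probabilities). ``subset_sizes`` and ``indices_n`` are hypothetical helpers, and Python's ``random`` module does the shuffling.

```python
import random
from typing import List, Sequence


def subset_sizes(p: Sequence[float], n: int) -> List[int]:
    """Sizes of the len(p) + 1 subsets: the first len(p) follow the given
    proportions (rounded, an assumption) and the last takes the remainder,
    i.e. about 1 - sum(p) of the items."""
    if any(share < 0 for share in p) or sum(p) > 1:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    sizes = [round(share * n) for share in p]
    if sum(sizes) > n:
        raise ValueError("rounded proportions exceed the number of items")
    return sizes + [n - sum(sizes)]


def indices_n(p: Sequence[float], n: int, seed: int = 0) -> List[int]:
    """Hypothetical index list with values 0 .. len(p), shuffled."""
    ind = [subset for subset, size in enumerate(subset_sizes(p, n))
           for _ in range(size)]
    random.Random(seed).shuffle(ind)
    return ind


# Half of 24 items to subset 0, a quarter each to subsets 1 and 2.
print(subset_sizes([0.5, 0.25], 24))   # [12, 6, 6]
```

With ``p = [0.5, 0.25]`` and 24 items, a list of length 2 yields three subsets, the last one receiving the remaining quarter.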
     198 
     199    :obj:`SubsetIndicesN` does not support stratification; setting 
     200    :obj:`stratified` to :obj:`Stratified` will yield an error. 
     201 
     202    The following constructs a division in which one half of the data is
     203    in the first set and one quarter in each of the second and third
     204    (:download:`randomindicesn.py <code/randomindicesn.py>`).
     205 
     206    .. literalinclude:: code/randomindicesn.py 
     207        :lines: 9-14 
     208 
     209    Output:: 
     210 
     211        <1, 0, 0, 2, 0, 1, 1, 0, 2, 0, 2, 2, 1, 0, 0, 0, 2, 0, 0, 0, 1, 2, 1, 0> 
     212 
     213 
     214.. class:: SubsetIndicesCV 
     215  
     216    Computes indices for cross-validation by constructing a list of 
     217    indices between 0 and :obj:`folds`-1 (inclusive), with an equal 
     218    number of each (if the number of instances is not divisible by 
     219    :obj:`folds`, the last folds will have one element less). 
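A minimal sketch of the fold-assignment rule, assuming unstratified assignment and Python's ``random`` module; ``cv_indices`` is a hypothetical helper, not Orange's implementation.

```python
import random
from typing import List


def cv_indices(n: int, folds: int = 10, seed: int = 0) -> List[int]:
    """Fold index for each of n items, with fold sizes as equal as possible.

    Each fold gets n // folds items and the first n % folds folds get one
    more, so the last folds have one element less when n is not divisible.
    """
    if folds < 1 or n < 0:
        raise ValueError("need at least one fold and a non-negative n")
    base, extra = divmod(n, folds)
    ind: List[int] = []
    for fold in range(folds):
        ind.extend([fold] * (base + (1 if fold < extra else 0)))
    random.Random(seed).shuffle(ind)  # Python's RNG stands in for Orange's
    return ind


ind = cv_indices(24)
print([ind.count(f) for f in range(10)])    # [3, 3, 3, 3, 2, 2, 2, 2, 2, 2]
print(sorted(cv_indices(10, folds=5)))      # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Shuffling a list built with the exact fold sizes, rather than drawing each index independently, guarantees the equal-count property regardless of the seed.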
     220 
     221    .. attribute:: folds 
     222 
     223        Number of folds. Default is 10. 
     224  
     225    The following prepares indices for ordinary ten-fold cross-validation
     226    and indices for 5-fold cross-validation of 10 data instances, giving
     227    only the number of instances in the latter case
     228    (:download:`randomindicescv.py <code/randomindicescv.py>`).
     229 
     230    .. literalinclude:: code/randomindicescv.py 
     231        :lines: 7-12
     232
     233    Output::
     234
     235        Indices for ordinary 10-fold CV
     236        <1, 1, 3, 8, 8, 3, 2, 7, 5, 0, 1, 5, 2, 9, 4, 7, 4, 9, 3, 6, 0, 2, 0, 6>
     237        Indices for 5 folds on 10 instances
     238        <3, 0, 1, 0, 3, 2, 4, 4, 1, 2>
     239
     240    Since instances do not divide evenly into ten folds, the first
     241    four folds have one element more: there are three 0's, 1's, 2's
     242    and 3's, but only two of each index from 4 to 9.
  • docs/reference/rst/code/randomindices2.py

    r9946 r10073  
    1 # Description: Shows how to sample example by random divisions into two groups 
     1# Description: Shows how to sample by random divisions into two groups 
    22# Category:    sampling 
    33# Classes:     SubsetIndices2, RandomGenerator 
     
    6666print indices2(lenses) 
    6767 
    68 print "\n... stratified 'if possible', after removing the first example's class" 
     68print "\n... stratified 'if possible', after removing the first instance's class" 
    6969lenses[0].setclass("?") 
    7070print indices2(lenses) 
  • docs/reference/rst/code/randomindicescv.py

    r9823 r10073  
    99print "Indices for ordinary 10-fold CV" 
    1010print Orange.data.sample.SubsetIndicesCV(lenses) 
    11 print "Indices for 5 folds on 10 examples" 
     11print "Indices for 5 folds on 10 instances" 
    1212print Orange.data.sample.SubsetIndicesCV(10, folds=5) 