Changeset 10164:5b658417f984 in orange


Timestamp: 02/11/12 21:29:36 (2 years ago)
Author: janezd <janez.demsar@…>
Branch: default
Message:

Finished polishing documentation for Orange.data.filter

File:
1 edited

  • docs/reference/rst/Orange.data.filter.rst

    r10163 r10164  
    1010********************** 
    1111 
    12 Filters select subsets of instances. Consider the following 
    13 example that selects instances with age="young" from data set lenses: 
    14  
    15 .. literalinclude:: code/filter.py 
    16     :lines: 58-64 
    17  
    18 Output:: 
    19  
    20     Young instances 
    21     ['young', 'myope', 'no', 'reduced', 'none'] 
    22     ['young', 'myope', 'no', 'normal', 'soft'] 
    23     ['young', 'myope', 'yes', 'reduced', 'none'] 
    24     ['young', 'myope', 'yes', 'normal', 'hard'] 
    25     ['young', 'hypermetrope', 'no', 'reduced', 'none'] 
    26     ['young', 'hypermetrope', 'no', 'normal', 'soft'] 
    27     ['young', 'hypermetrope', 'yes', 'reduced', 'none'] 
    28     ['young', 'hypermetrope', 'yes', 'normal', 'hard'] 
    29  
    30 ``data.domain.``:obj:`~Orange.data.Domain.features` behaves as a list and provides method 
    31 `index`, which is used to retrieve the position of feature `age`. Feature 
    32 `age` is also used to construct a :obj:`~Orange.data.Value`. 
    33  
    34  
    35 Filters operator on individual instances, not the entire data table, 
    36 and are limited to accepting or rejecting instances. All filters are derived from the base class :obj:`Filter`. 
     12Filters select subsets of instances. They are most typically used to 
     13select data instances from a table, for example to drop all 
     14instances that have no class value:: 
     15 
     16    filtered = Orange.data.filter.HasClassValue(data) 
     17 
     18Despite this typical use, filters operate on individual instances, not 
     19the entire data table: they can be called with an instance and return 
     20``True`` or ``False`` to accept or reject the instance. Most
     21examples below use them like this for the sake of demonstration.
     22 
     23An alternative way to apply a filter is to call 
     24:obj:`Orange.data.Table.filter` on the data table. 
     25 
     26All filters are derived from the base class :obj:`Filter`. 
    3727 
    3828.. class:: Filter 
    3929 
     30    Abstract base class for filters. 
     31 
    4032    .. attribute:: negate 
    4133 
     
    4436    .. attribute:: domain 
    4537 
    46         Domain to which examples are converted prior to checking. 
     38        Domain to which data instances are converted before checking. 
    4739 
    4840    .. method:: __call__(instance) 
    4941 
    5042        Check whether the instance matches the filter's criterion and 
    51         return either :obj:`True` or :obj:`False`. 
     43        return either ``True`` or ``False``. 
    5244 
    5345    .. method:: __call__(table) 
     
    5648        the criterion. 
    5749 
    58         An alternative way to apply a filter is to call 
    59         :obj:`~Orange.data.Table.filter` on the data table. 
    60  
    61 Random filter 
    62 ------------- 
    63  
    64 .. class:: Random 
    65  
    66     Accepts an instance with a given probability. 
    67  
    68     .. attribute:: prob 
    69  
    70         Probability for accepting an instance. 
    71  
    72     .. attribute:: random_generator 
    73  
    74         The random number generator used for making selections. If not 
    75         set before filtering, a new generator is constructed and 
    76         stored here for later use. 
    77  
    78 .. literalinclude:: code/filter.py 
    79     :lines: 12-14 
    80  
    81 The output is:: 
    82  
    83     1 0 0 0 1 1 0 1 0 1 
    84  
    85 Although the probability of selecting an instance is set to 0.7, the 
    86 filter accepted five out of ten instances since the decision is made for each instance separately. To select exactly 70 % of instance (except for a rounding error), use :obj:`~Orange.data.sample.SubsetIndices2`. 
    87  
    88 Setting the random generator ensures that the filter will always 
    89 select the same instances. Setting `randomGenerator=24` is a shortcut 
    90 for `randomGenerator = Orange.misc.Random(initseed=24)` or 
    91 `randomGenerator = Orange.misc.Random(initseed=24)`. 
    92  
    93 To select a subset of instances instead of calling the filter for each 
    94 individual example, call:: 
    95  
    96     data70 = randomfilter(data) 
    97  
    98  
    99 Filtering instances with missing data 
    100 ------------------------------------- 
     50 
     51 
     52Filtering missing data 
     53---------------------- 
    10154 
    10255.. class:: IsDefined 
    10356 
    104     Selects instances for which all feature values are defined.  By 
    105     default, the filter checks all features; this can be changed by 
    106     setting the attribute :obj:`check`. The filter does not check meta 
    107     attributes. 
     57    Selects instances for which all feature values are defined. 
    10858 
    10959    .. attribute:: check 
    11060 
    111     A list of ``bool``s specifying which features to check. Each 
     61    A list of ``bool`` values specifying which features to check. Each
    11262    element corresponds to a feature in the domain. By default, 
    11363    :obj:`check` is ``None``, meaning that all features are 
     
    11969    or descriptors can also be used as indices. 
    12070 
    121     .. literalinclude:: code/filter.py 
    122         :lines: 9, 20-40 
    123  
    124  
    125 .. class:: HasClass 
     71.. literalinclude:: code/filter.py 
     72    :lines: 9, 20-40 
     73 
     74 
     75.. class:: HasClassValue 
    12676 
    12777    Selects instances with defined class value. Setting 
    128     :obj:`~Orange.data.filter.Filter.negate` to inverts the selection. 
    129  
    130  
    131     .. literalinclude:: code/filter.py 
    132         :lines: 9, 49-55 
     78    :obj:`~Orange.data.filter.Filter.negate` inverts the selection and 
     79    chooses examples with unknown class. 
     80 
     81.. literalinclude:: code/filter.py 
     82    :lines: 9, 49-55 
    13383 
    13484 
     
    14191        The id of the meta attribute to look for. 
    14292 
    143     This is filter is especially useful with instances from basket 
    144     files, which have optional meta attributes. If they come, for 
    145     example, from a text mining domain, we can use it to get the 
    146     documents that contain a specific word: 
    147  
    148     .. literalinclude:: code/filterm.py 
    149         :lines: 3, 5 
    150  
    151  
    152 Filtering by value of a single feature 
    153 -------------------------------------- 
     93This filter is especially useful with instances from basket files,
     94which have optional meta attributes. If they come, for example, from a 
     95text mining domain, we can use it to get the documents that contain a 
     96specific word: 
     97 
     98.. literalinclude:: code/filterm.py 
     99    :lines: 3, 5 
     100 
     101Random filter 
     102------------- 
     103 
     104.. class:: Random 
     105 
     106    Accepts an instance with a given probability. 
     107 
     108    .. attribute:: prob 
     109 
     110        Probability for accepting an instance. 
     111 
     112    .. attribute:: random_generator 
     113 
     114        The random number generator used for making selections. If not 
     115        set before filtering, a new generator is constructed and 
     116        stored here for later use. If the attribute is set to an 
     117        integer, Orange constructs a random generator and uses the 
     118        integer as a seed. 
     119 
     120.. literalinclude:: code/filter.py 
     121    :lines: 12-14 
     122 
     123The output is:: 
     124 
     125    1 0 0 0 1 1 0 1 0 1 
     126 
     127Although the probability of selecting an instance is set to 0.7, the 
     128filter accepted five out of ten instances since the decision is made for each instance separately. To select exactly 70 % of instances (except for a rounding error), use :obj:`~Orange.data.sample.SubsetIndices2`.
     129 
     130Setting the random generator ensures that the filter will always 
     131select the same instances. Setting `random_generator=24` is a shortcut 
     132for `random_generator = Orange.misc.Random(initseed=24)`. 
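
The per-instance behavior described above can be illustrated without Orange. The following is a minimal pure-Python sketch, not Orange's implementation: ``make_random_filter`` is a hypothetical helper built on the standard ``random`` module, and the seed 24 only mirrors the example above. It shows that independent decisions rarely accept exactly the requested share, that a fixed seed reproduces the same decisions, and that drawing a sample of fixed size is what gives an exact share.

```python
import random


def make_random_filter(prob, seed):
    """Return a callable that accepts each instance independently.

    Hypothetical helper for illustration; it is not part of Orange.
    """
    rng = random.Random(seed)

    def accept(_instance):
        # Every call draws a fresh number, so decisions are independent.
        return rng.random() < prob

    return accept


instances = list(range(10))

# Independent decisions: the accepted count varies around prob * n.
random_filter = make_random_filter(prob=0.7, seed=24)
decisions = [random_filter(inst) for inst in instances]
accepted_count = sum(decisions)

# The same seed reproduces exactly the same decisions.
same_seed_filter = make_random_filter(prob=0.7, seed=24)
repeated = [same_seed_filter(inst) for inst in instances]

# Sampling a fixed number of instances yields an exact share instead.
exact_subset = random.Random(24).sample(instances, round(0.7 * len(instances)))
```

Here ``decisions == repeated`` holds because both filters start from the same seed, while ``len(exact_subset)`` is always 7 regardless of the seed.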
     133 
     134 
     135Filtering by single features 
     136---------------------------- 
    154137 
    155138.. class:: SameValue 
     
    160143    .. attribute:: position 
    161144 
    162         Index of feature in the :obj:`~Orange.data.Domain`, as 
    163         returned by :obj:`Orange.data.Domain.index`. 
     145        Index of feature in the :obj:`~Orange.data.Domain` as returned 
     146        by :obj:`Orange.data.Domain.index`. 
    164147 
    165148    .. attribute:: value 
     
    167150        Feature's value.
    168151 
    169  
    170 Filtering by multiple values 
    171 ---------------------------- 
     152The following example selects instances with age="young" from data set 
     153lenses: 
     154 
     155.. literalinclude:: code/filter.py 
     156    :lines: 58-64 
     157 
     158 
     159``data.domain.``:obj:`~Orange.data.Domain.features` behaves as a list and provides method 
     160`index`, which is used to retrieve the position of feature `age`. Feature 
     161`age` is also used to construct a :obj:`~Orange.data.Value`. 
     162 
     163 
     164Filtering by multiple features 
     165------------------------------ 
    172166 
    173167:obj:`~Orange.data.filter.Values` filters by values of multiple 
    174 features and can compute conjunctions and disjunctions of more complex 
    175 conditions. 
     168features presented as subfilters derived from 
     169:obj:`Orange.data.filter.ValueFilter`. 
    176170 
    177171.. class:: Values 
     
    189183        at least one value is accepted. 
    190184 
     185The attribute :obj:`conditions` contains subfilter instances of the following classes. 
     186 
    191187.. class:: ValueFilter 
    192188 
    193     The abstract base class for filters for discrete and continuous features. 
     189    The abstract base class for subfilters. 
    194190 
    195191    .. attribute:: position 
     
    205201.. class:: ValueFilterDiscrete 
    206202 
    207     Accepts the listed discrete values. 
     203    Subfilter for values of discrete features. 
    208204 
    209205    .. attribute:: values 
     
    214210.. class:: ValueFilterContinous 
    215211 
    216     Accepts the continuous values within (or without) the given interval. 
    217  
    218     .. attribute:: min, ref 
     212    Subfilter for values of continuous features. 
     213 
     214    .. attribute:: min / ref 
    219215 
    220216        Lower bound of the interval (``min`` and ``ref`` are aliases 
    221         for the same attribute. 
     217        for the same attribute). 
    222218 
    223219    .. attribute:: max 
     
    231227        :obj:`ValueFilter.LessEqual`, :obj:`ValueFilter.Greater`, 
    232228        :obj:`ValueFilter.GreaterEqual`, :obj:`ValueFilter.Between`, 
    233         :obj:`ValueFilter.Outside`. Fields ``min`` and ``max`` to 
    234         define the interval for interval operators 
    235         (:obj:`ValueFilter.Between` and :obj:`ValueFilter.Outside`), 
    236         and ``ref`` (which is the same as ``min``) for the others. 
     229        :obj:`ValueFilter.Outside`. 
     230 
     231    Attributes ``min`` and ``max`` define the interval for 
     232    operators :obj:`ValueFilter.Between` and :obj:`ValueFilter.Outside` 
     233    and ``ref`` (which is the same as ``min``) for the others. 
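
How ``min``, ``max`` and ``ref`` relate to the operators can be sketched in plain Python. This is an illustration only, not Orange code: the function ``check`` and the string operator names are hypothetical stand-ins for the ``ValueFilter`` constants listed above, and the inclusive bounds for ``Between`` are an assumption.

```python
import operator

# Single-reference operators compare the value against ``ref``,
# which occupies the same slot as ``min``.
SINGLE_REFERENCE = {
    "LessEqual": operator.le,
    "Greater": operator.gt,
    "GreaterEqual": operator.ge,
}


def check(value, oper, min_=None, max_=None):
    """Evaluate one condition (hypothetical helper, not part of Orange)."""
    if oper == "Between":
        # Assumed inclusive on both ends.
        return min_ <= value <= max_
    if oper == "Outside":
        # The complement of Between under the same assumption.
        return value < min_ or value > max_
    # For all other operators, min_ plays the role of ref.
    return SINGLE_REFERENCE[oper](value, min_)


in_range = check(5, "Between", min_=1, max_=10)
out_of_range = check(11, "Outside", min_=1, max_=10)
at_least = check(3, "GreaterEqual", min_=3)
strictly_more = check(3, "Greater", min_=3)
```

Interval operators read both bounds, whereas the remaining operators ignore ``max_`` and use only the lower slot.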
    237234 
    238235 
    239236.. class:: ValueFilterString 
    240237 
    241     Accepts the string values within (or without) the given interval. 
    242  
    243     .. attribute:: min, ref 
     238    Subfilter for values of string features.
     239 
     240    .. attribute:: min / ref 
    244241 
    245242        Lower bound of the interval (``min`` and ``ref`` are aliases 
     
    258255        :obj:`ValueFilter.Outside`, :obj:`Contains`, 
    259256        :obj:`NotContains`, :obj:`BeginsWith`, :obj:`EndsWith`. 
    260  
    261         Fields ``min`` and ``max`` to define the interval for interval 
    262         operators (:obj:`ValueFilter.Between` and 
    263         :obj:`ValueFilter.Outside`), and ``ref`` (which is the same as 
    264         ``min``) for the others. 
    265257     
    266258    .. attribute:: case_sensitive 
     
    268260        Tells whether the comparisons are case sensitive. Default is ``True``. 
    269261 
     262    Attributes ``min`` and ``max`` define the interval for 
     263    operators :obj:`ValueFilter.Between` and :obj:`ValueFilter.Outside` 
     264    and ``ref`` (which is the same as ``min``) for the others. 
     265 
    270266.. class:: ValueFilterStringList 
    271267 
     
    274270    .. attribute:: values 
    275271 
    276         An list of accepted values. 
     272        A list of accepted strings. 
    277273 
    278274    .. attribute:: case_sensitive 
    279275 
    280276        Tells whether the comparisons are case sensitive. Default is ``True``. 
     277 
    281278 
    282279The following script selects instances whose age is "young" or "presbyopic" and 
     
    304301.. literalinclude:: code/filter.py 
    305302    :lines: 129-141 
     303 
     304Composition of filters 
     305---------------------- 
     306 
     307Filters can be combined into conjunctions or disjunctions using the following descendants of :obj:`Filter`. It is possible to build hierarchies of filters (e.g. a disjunction of conjunctions).
     308 
     309.. class:: FilterConjunction 
     310 
     311    Conjunction of filters. Reject the instance if any of the 
     312    combined filters rejects it. Conjunction can be negated using the 
     313    inherited :obj:`~Filter.negate` flag.
     314 
     315    .. attribute:: filters 
     316 
     317        A list of filters (instances of :obj:`Filter`) 
     318 
     319.. class:: FilterDisjunction 
     320 
     321    Disjunction of filters. Accept the instance if any of the 
     322    combined filters accepts it. Disjunction can be negated using the 
     323    inherited :obj:`~Filter.negate` flag.
     324     
     325    .. attribute:: filters 
     326 
     327        A list of filters (instances of :obj:`Filter`) 
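
The composition rules can be sketched in plain Python. The classes below are hypothetical stand-ins written for illustration, not Orange's ``FilterConjunction`` and ``FilterDisjunction``; instances are modelled as dictionaries, and the feature name ``prescription`` is an assumption.

```python
class SketchFilter:
    """Hypothetical stand-in for a filter: a predicate plus a negate flag."""

    def __init__(self, predicate, negate=False):
        self.predicate = predicate
        self.negate = negate

    def __call__(self, instance):
        # Comparing with negate inverts the decision when the flag is set.
        return bool(self.predicate(instance)) != self.negate


class SketchConjunction(SketchFilter):
    """Rejects the instance if any of the combined filters rejects it."""

    def __init__(self, filters, negate=False):
        self.filters = list(filters)
        super().__init__(lambda inst: all(f(inst) for f in self.filters), negate)


class SketchDisjunction(SketchFilter):
    """Accepts the instance if any of the combined filters accepts it."""

    def __init__(self, filters, negate=False):
        self.filters = list(filters)
        super().__init__(lambda inst: any(f(inst) for f in self.filters), negate)


is_young = SketchFilter(lambda inst: inst["age"] == "young")
is_myope = SketchFilter(lambda inst: inst["prescription"] == "myope")
is_presbyopic = SketchFilter(lambda inst: inst["age"] == "presbyopic")

young_and_myope = SketchConjunction([is_young, is_myope])
young_or_myope = SketchDisjunction([is_young, is_myope])
neither = SketchDisjunction([is_young, is_myope], negate=True)

# A hierarchy: a disjunction whose first member is a conjunction.
nested = SketchDisjunction([young_and_myope, is_presbyopic])

instance = {"age": "young", "prescription": "hypermetrope"}
results = (
    young_and_myope(instance),
    young_or_myope(instance),
    neither(instance),
    nested(instance),
)
```

Because composite filters are themselves filters, they can be placed in another composite's ``filters`` list, which is what makes hierarchies possible.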