Changeset 10105:9bc7eb543746 in orange


Timestamp:
02/08/12 18:32:30 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Changed documentation for filter; not done yet

Location:
docs/reference/rst
Files:
2 edited

  • docs/reference/rst/Orange.data.filter.rst

    r10049 r10105  
    1010********************** 
    1111 
    12 Filters are used to select subsets of instances. Consider the following 
    13 example, where instances with age="young" from lenses are 
    14 selected: 
     12Filters select subsets of instances. Consider the following 
     13example that selects instances with age="young" from data set lenses: 
    1514 
    1615.. literalinclude:: code/filter.py 
    1716    :lines: 58-64 
    1817 
    19 Outputs:: 
     18Output:: 
    2019 
    2120    Young instances 
     
    2928    ['young', 'hypermetrope', 'yes', 'normal', 'hard'] 
    3029 
    31 :obj:`~Orange.data.Domain.features` behaves as a list and provides method 
     30``data.domain.``:obj:`~Orange.data.Domain.features` behaves as a list and provides method 
    3231`index`, which is used to retrieve the position of feature `age`. Feature 
    3332`age` is also used to construct a :obj:`~Orange.data.Value`. 
    3433 
    35 Structure 
    36 --------- 
    37  
    38 Filters see individual instances, not the entire table, 
    39 and are limited to accepting or rejecting instances. All filters have this 
    40 structure: 
     34 
     35Filters operate on individual instances, not the entire data table, 
     36and are limited to accepting or rejecting instances. All filters are derived from the base class :obj:`Filter`. 
    4137 
    4238.. class:: Filter 
     
    4440    .. attribute:: negate 
    4541 
    46     Inverts the selection. Defaults to :obj:`False`. 
     42        Inverts the selection. Defaults to :obj:`False`. 
    4743 
    4844    .. attribute:: domain 
    4945 
    50     Domain to which examples are converted prior to checking. 
    51     :obj:`Random` ignores this field. 
     46        Domain to which examples are converted prior to checking. 
    5247 
    5348    .. method:: __call__(instance) 
    5449 
    55     Checks whether the instance matches the filter's criterion and returns 
    56     either :obj:`True` or :obj:`False`. 
     50        Check whether the instance matches the filter's criterion and 
     51        return either :obj:`True` or :obj:`False`. 
    5752 
    5853    .. method:: __call__(table) 
    5954 
    60     When given an entire data table, it returns a list of instances (as a 
    61     :obj:`~Orange.data.Table`) that matches the criterion. 
    62  
    63     .. method:: selectionVector(table) 
    64  
    65     Returns a list of :obj:`bool` of the same length as :obj:`table`, 
    66     denoting which instances are accepted. Equivalent to 
    67     `[filter(ex) for ex in table]`. 
    68  
    69 An alternative way to apply a filter is to call 
    70 :obj:`~Orange.data.Table.filter` on the data table. 
     55        Return a new data table containing the instances that match 
     56        the criterion. 
     57 
     58        An alternative way to apply a filter is to call 
     59        :obj:`~Orange.data.Table.filter` on the data table. 
    7160 
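The calling convention described above (one call form for a single instance, another for a whole table, plus ``negate``) can be sketched in plain Python. This is only an illustration, not Orange's implementation: ``SketchFilter``, ``accepts`` and the use of dicts and lists in place of instances and tables are all hypothetical stand-ins.

```python
class SketchFilter(object):
    """Hypothetical stand-in for a filter; not Orange's implementation."""

    def __init__(self, predicate, negate=False):
        self.predicate = predicate  # per-instance acceptance test
        self.negate = negate        # inverts the selection when True

    def accepts(self, instance):
        # Comparing with `negate` flips the predicate's verdict when it is True.
        return bool(self.predicate(instance)) != self.negate

    def __call__(self, data):
        # Assumption: a list of instances plays the role of a data table.
        if isinstance(data, list):
            return [inst for inst in data if self.accepts(inst)]
        return self.accepts(data)


rows = [{"age": "young"}, {"age": "presbyopic"}, {"age": "young"}]
young = SketchFilter(lambda row: row["age"] == "young")
not_young = SketchFilter(lambda row: row["age"] == "young", negate=True)
print(young(rows[0]), len(young(rows)), len(not_young(rows)))
```

Each decision depends only on the instance at hand, which is why a filter can accept or reject instances but cannot make choices that depend on the rest of the table.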
    7261Random filter 
     
    7564.. class:: Random 
    7665 
    77     It accepts an instance with a given probability. 
     66    Accepts an instance with a given probability. 
    7867 
    7968    .. attribute:: prob 
    8069 
    81     Probability for accepting an instance. 
     70        Probability for accepting an instance. 
    8271 
    8372    .. attribute:: random_generator 
    8473 
    85     The random number generator used for making selections. If not set 
    86     before filtering, a new generator is constructed and stored here for 
    87     later use. 
     74        The random number generator used for making selections. If not 
     75        set before filtering, a new generator is constructed and 
     76        stored here for later use. 
    8877 
    8978.. literalinclude:: code/filter.py 
     
    9483    1 0 0 0 1 1 0 1 0 1 
    9584 
    96 In this script, :obj:`instance` should be some learning instance; 
    97 you can load any data and set `instance = data[0]`. Although the probability 
    98 of selecting an instance is set to 0.7, the filter accepted five out of ten 
    99 instances. Because the filter only sees individual instances, it cannot be 
    100 accurate in this regard. If exactly 70% of instances are needed then use 
    101 :obj:`~Orange.data.sample.SubsetIndices2`. 
    102  
    103 Setting the random generator ensures that the filter will always select 
    104 the same instances, regardless of how many times you run the script or what 
    105 you do in Orange before you run it. Setting `randomGenerator=24` is a 
    106 shortcut for `randomGenerator = Orange.misc.Random(initseed=24)` or 
     85Although the probability of selecting an instance is set to 0.7, the 
     86filter accepted five out of ten instances, since the decision is made for each instance separately. To select exactly 70% of the instances (up to rounding), use :obj:`~Orange.data.sample.SubsetIndices2`. 
     87 
     88Setting the random generator ensures that the filter will always 
     89select the same instances. Setting `randomGenerator=24` is a shortcut 
     90for `randomGenerator = Orange.misc.Random(initseed=24)` or 
    10791`randomGenerator = Orange.misc.Random(initseed=24)`. 
    10892 
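The difference between deciding per instance and drawing an exact share can be illustrated with the standard ``random`` module. This is a sketch under stated assumptions: ``bernoulli_select`` and ``exact_select`` are hypothetical helpers that only mimic the behaviour described above and do not use Orange's classes.

```python
import random


def bernoulli_select(n, prob, seed):
    """Accept each of n instances independently with probability prob."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return [rng.random() < prob for _ in range(n)]


def exact_select(n, prob, seed):
    """Mark exactly round(n * prob) of n instances as accepted."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n), int(round(n * prob))))
    return [i in chosen for i in range(n)]


per_instance = bernoulli_select(10, 0.7, seed=24)
exact = exact_select(10, 0.7, seed=24)
# The per-instance count varies with the seed; the exact variant always gives 7.
print(sum(per_instance), sum(exact))
```

With the same seed, both helpers return the same selection on every run, which mirrors the effect of setting the random generator on the filter.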
    10993To select a subset of instances instead of calling the filter for each 
    110 individual example, use a filter like this:: 
     94individual example, call:: 
    11195 
    11296    data70 = randomfilter(data) 
    11397 
    114 Unknown values 
    115 -------------- 
     98 
     99Filtering instances with missing data 
     100------------------------------------- 
    116101 
    117102.. class:: IsDefined 
    118103 
    119     This class selects instances for which all feature values are defined. 
    120     By default, the filter checks all features; you can modify 
    121     the list :obj:`check` to limit the features to check. 
    122     This filter does not check meta attributes. 
    123  
    124 .. class:: HasSpecial 
    125  
    126     This is an obsolete filter which selects instances with at least one 
    127     unknown value in any feature. 
    128     This filter does not check meta attributes. 
     104    Selects instances for which all feature values are defined.  By 
     105    default, the filter checks all features; this can be changed by 
     106    setting the attribute :obj:`check`. The filter does not check meta 
     107    attributes. 
     108 
     109    .. attribute:: check 
     110 
     111        A list of ``bool`` values specifying which features to check. 
     112        Each element corresponds to a feature in the domain. By default, 
     113        :obj:`check` is ``None``, meaning that all features are 
     114        checked. The list is initialized to a list of ``True`` when 
     115        the filter's :obj:`~Orange.data.filter.Filter.domain` is set, 
     116        unless the list already exists. The list can be indexed by 
     117        ordinary integers (for example, `check[0]`); if 
     118        :obj:`~Orange.data.filter.Filter.domain` is set, feature names 
     119        or descriptors can also be used as indices. 
     120 
     121    .. literalinclude:: code/filter.py 
     122        :lines: 9, 20-40 
     123 
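The role of the ``check`` list can be sketched without Orange. In this illustration, ``all_defined`` is a hypothetical helper, instances are plain lists, and ``None`` marks a missing value; all three are assumptions made for the example.

```python
def all_defined(instance, check=None):
    """Return True if every checked feature value is defined.

    Assumption: `instance` is a list of feature values and None marks a
    missing value. `check` is a list of bools, one per feature; None
    means that all features are checked.
    """
    if check is None:
        check = [True] * len(instance)
    return all(value is not None
               for value, checked in zip(instance, check)
               if checked)


row = ["young", None, "yes"]
print(all_defined(row))                       # second value is missing
print(all_defined(row, [True, False, True]))  # second feature is not checked
```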
    129124 
    130125.. class:: HasClass 
    131126 
    132     Selects instances with defined class value. You can use 
    133     :obj:`~Orange.data.filter.Filter.negate` to invert the selection, 
    134     as shown in the script below. 
    135  
    136     .. attribute:: check 
    137  
    138     A list of :obj:`bool` elements specifying which features to check. Each 
    139     element corresponds to a feature in the domain. By default, 
    140     :obj:`check` is :obj:`None`, meaning that all features are checked. The 
    141     list is initialized to a list of :obj:`True` when the filter's 
    142     :obj:`~Orange.data.filter.Filter.domain` is set unless the list 
    143     already exists. You can also set 
    144     :obj:`~Orange.data.filter.HasClass.check` manually, 
    145     even without setting the :obj:`~Orange.data.filter.Filter.domain`. The list 
    146     can be indexed by ordinary integers (for example, 
    147     `check[0]`). If :obj:`~Orange.data.filter.Filter.domain` is set, 
    148     you can also address the list by feature names or descriptors. 
    149  
    150 After setting :obj:`~Orange.data.filter.Filter.domain` 
    151 the :obj:`~Orange.data.Domain` should not be modified. Changes will 
    152 disrupt the correspondence between the domain features and the 
    153 list :obj:`~Orange.data.filter.HasClass.check`, causing unpredictable 
    154 behaviour. 
    155  
    156 .. literalinclude:: code/filter.py 
    157     :lines: 9, 20-55 
    158  
    159 Meta values 
    160 ----------- 
     127    Selects instances with defined class value. Setting 
     128    :obj:`~Orange.data.filter.Filter.negate` to ``True`` inverts the selection. 
     129 
     130 
     131    .. literalinclude:: code/filter.py 
     132        :lines: 9, 49-55 
     133 
    161134 
    162135.. class:: HasMeta 
    163136 
    164     Filters out instances that don't have a meta attribute with the given id. 
     137    Filters out instances that do not have a meta attribute with the given id. 
    165138 
    166139    .. attribute:: id 
    167140 
    168     The id of the meta attribute to look for. 
    169  
    170 This filter is especially useful with instances from basket format and 
    171 their optional meta attributes. If they come, for example, 
    172 from a text mining domain, we can use it to get the documents that contain a 
    173 specific word: 
    174  
    175 .. literalinclude:: code/filterm.py 
    176     :lines: 3-5 
    177  
    178 In this example all instances that contain the word "surprise" are selected. 
    179 It does so by searching the :obj:`~Orange.data.Domain` for a meta attribute 
    180 named "surprise" present in the instance. This is an optional attribute that 
    181 does not necessarily appear in all instances. This filter can be used in 
    182 other situations involving meta values that appear only in some instances. 
    183 The corresponding attributes do not need to be registered in the domain. 
    184  
    185 Filtering by value 
    186 ------------------ 
     141        The id of the meta attribute to look for. 
     142 
     143    This filter is especially useful with instances from basket 
     144    files, which have optional meta attributes. If they come, for 
     145    example, from a text mining domain, we can use it to get the 
     146    documents that contain a specific word: 
     147 
     148    .. literalinclude:: code/filterm.py 
     149        :lines: 3, 5 
     150 
     151 
     152Filtering by values 
     153------------------- 
    187154 
    188155Single values 
    189 ============= 
     156............. 
    190157 
    191158.. class:: SameValue 
    192159 
    193     This is a fast filter for selecting instances with a particular value of a 
     160    Fast filter for selecting instances with a particular value of a 
    194161    feature. 
    195162 
    196163    .. attribute:: position 
    197164 
    198     Index of feature in the :obj:`~Orange.data.Domain`. Method `index` 
    199     provided by :obj:`~Orange.data.Domain` can be used to retrieve the 
    200     position of a feature. 
     165        Index of feature in the :obj:`~Orange.data.Domain`, as 
     166        returned by :obj:`Orange.data.Domain.index`. 
    201167 
    202168    .. attribute:: value 
    203169 
    204     Feature's value. 
     170        Feature's value. 
     171 
    205172 
    206173Continuous features 
    207 =================== 
    208  
    209 :obj:`ValueFilter` provides different methods for filtering values of 
    210 continuous features: :obj:`ValueFilter.Equal`, 
     174................... 
     175 
     176:obj:`Orange.data.filter.Values` provides different methods for 
     177filtering values of continuous features: :obj:`ValueFilter.Equal`, 
    211178:obj:`ValueFilter.Less`, :obj:`ValueFilter.LessEqual`, 
    212179:obj:`ValueFilter.Greater`, :obj:`ValueFilter.GreaterEqual`, 
     
    215182In the following example two different filters are used: 
    216183:obj:`ValueFilter.GreaterEqual`, which needs only one parameter and 
    217 :obj:`ValueFilter.Between`, which needs to be defined by two parameters. 
     184:obj:`ValueFilter.Between`, which needs two. 
    218185 
    219186.. literalinclude:: code/filterv.py 
    220187    :lines: 52, 75-83 
    221188 
     189 
    222190Multiple values and features 
    223 ============================ 
    224  
    225 :obj:`~Orange.data.filter.Values` performs a similar function as 
    226 :obj:`~Orange.data.filter.SameValue`, but can handle conjunctions and 
    227 disjunctions of more complex conditions. 
     191............................ 
     192 
     193:obj:`~Orange.data.filter.Values` filters by values of multiple 
     194features and can compute conjunctions and disjunctions of more complex 
     195conditions. 
    228196 
    229197.. class:: Values 
     
    231199    .. attribute:: conditions 
    232200 
    233     A list of :obj:`~Orange.data.filter.ValueFilterList` that contains 
    234     conditions. Elements must be objects of type 
    235     :obj:`~Orange.data.filter.ValueFilterDiscrete` for discrete and 
    236     :obj:`~Orange.data.filter.ValueFilterContinuous` for continuous 
    237     attributes; both are derived from 
    238     :obj:`Orange.data.filter.ValueFilter`. 
     201        A list of conditions described by instances of 
     202        :obj:`~Orange.data.filter.ValueFilterDiscrete` for discrete 
     203        features and :obj:`~Orange.data.filter.ValueFilterContinuous` 
     204        for continuous ones; both are derived from 
     205        :obj:`Orange.data.filter.ValueFilter`. 
    239206 
    240207    .. attribute:: conjunction 
    241208 
    242     Decides whether the filter will compute conjunction or disjunction of 
    243     conditions. If :obj:`True`, instance is accepted if no values are 
    244     rejected. If :obj:`False`, instance is accepted if at least one value is 
    245     accepted. 
     209        Decides whether the filter computes conjunction or disjunction 
     210        of conditions. If ``True``, instance is accepted if no 
     211        values are rejected. If ``False``, instance is accepted if 
     212        at least one value is accepted. 
    246213 
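The effect of ``conjunction`` can be sketched as the difference between ``all`` and ``any`` over the list of conditions. This is a simplified illustration: ``matches`` is a hypothetical helper, conditions are plain callables, instances are dicts, and the handling of undefined values is left out.

```python
def matches(instance, conditions, conjunction=True):
    """Combine per-feature conditions as a conjunction or a disjunction."""
    results = (condition(instance) for condition in conditions)
    # Conjunction: no condition may reject. Disjunction: one acceptance suffices.
    return all(results) if conjunction else any(results)


conditions = [
    lambda row: row["age"] in ("young", "presbyopic"),
    lambda row: row["astigmatic"] == "yes",
]
row = {"age": "young", "astigmatic": "no"}
print(matches(row, conditions, conjunction=True))
print(matches(row, conditions, conjunction=False))
```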
    247214.. class:: ValueFilter 
    248215 
     216    The abstract base class for filters for discrete and continuous features. 
     217 
    249218    .. attribute:: position 
    250219 
    251     Indicates the position of the checked feature (similar to 
    252     :obj:`Orange.data.filter.SameValue`). 
     220        The position of the checked feature (as returned by, for 
     221        instance, :obj:`Orange.data.Domain.index`). 
    253222 
    254223    .. attribute:: accept_special 
    255224 
    256     Determines whether undefined values are accepted (1), 
    257     rejected (0) or simply ignored (-1, default). 
     225        Determines whether undefined values are accepted (``1``), 
     226        rejected (``0``) or ignored (``-1``, default). 
    258227 
    259228.. class:: ValueFilterDiscrete 
     
    261230    .. attribute:: values 
    262231 
    263     An immutable :obj:`list` that contains objects of type 
    264     :obj:`~Orange.data.Value`, with values to accept. 
     232        An immutable ``list`` that contains objects of type 
     233        :obj:`~Orange.data.Value`, with values to accept. 
    265234 
    266235.. class:: ValueFilterContinuous 
     
    268237    .. attribute:: min 
    269238 
    270     Lower bound of values to consider. 
     239        Lower bound of values to consider. 
    271240 
    272241    .. attribute:: max 
    273242 
    274     Upper bound of values to consider. 
     243        Upper bound of values to consider. 
    275244 
    276245    .. attribute:: outside 
    277246 
    278     Indicates whether instances outside the interval should be accepted. 
    279     Defaults to :obj:`False`. 
     247        Indicates whether instances outside the interval should be 
     248        accepted.  Defaults to :obj:`False`. 
    280249 
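How ``min``, ``max`` and ``outside`` interact can be sketched as an interval test whose result is flipped when ``outside`` is set. The helper ``in_range`` is hypothetical, and treating both bounds as inclusive is an assumption that the text above does not confirm.

```python
def in_range(value, lower, upper, outside=False):
    """Accept values inside [lower, upper], or outside it if `outside` is True.

    Assumption: both bounds are inclusive.
    """
    inside = lower <= value <= upper
    return inside != outside


print(in_range(5.0, 1.0, 10.0))
print(in_range(5.0, 1.0, 10.0, outside=True))
print(in_range(12.5, 1.0, 10.0, outside=True))
```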
    281250.. literalinclude:: code/filter.py 
     
    303272.. literalinclude:: code/filter.py 
    304273    :lines: 129-141 
     274 
  • docs/reference/rst/code/filter.py

    r10031 r10105  
    5959age = data.domain["age"] 
    6060filteryoung.value = Orange.data.Value(age, "young") 
    61 filteryoung.position = data.domain.attributes.index(age) 
     61filteryoung.position = data.domain.features.index(age) 
    6262print "\nYoung instances" 
    6363for ex in filteryoung(data): 
     
    7171fya.conditions.append( 
    7272    Orange.data.filter.ValueFilterDiscrete( 
    73         position=data.domain.attributes.index(age), 
     73        position=data.domain.features.index(age), 
    7474        values=[Orange.data.Value(age, "young"), 
    7575                Orange.data.Value(age, "presbyopic")]) 
     
    7777fya.conditions.append( 
    7878    Orange.data.filter.ValueFilterDiscrete( 
    79         position = data.domain.attributes.index(astigm), 
     79        position = data.domain.features.index(astigm), 
    8080        values=[Orange.data.Value(astigm, "yes")])) 
    8181for ex in fya(data): 
     
    8686    [ 
    8787    Orange.data.filter.ValueFilterDiscrete( 
    88         position=data.domain.attributes.index(age), 
     88        position=data.domain.features.index(age), 
    8989        values=[Orange.data.Value(age, "young"), 
    9090                Orange.data.Value(age, "presbyopic")]), 
    9191    Orange.data.filter.ValueFilterDiscrete( 
    92         position=data.domain.attributes.index(astigm), 
     92        position=data.domain.features.index(astigm), 
    9393        values=[Orange.data.Value(astigm, "yes")]) 
    9494    ]) 
     
    101101    [ 
    102102    Orange.data.filter.ValueFilterDiscrete( 
    103         position=data.domain.attributes.index(age), 
     103        position=data.domain.features.index(age), 
    104104        values=[Orange.data.Value(age, "young"), 
    105105                Orange.data.Value(age, "presbyopic")], acceptSpecial = 0), 
    106106    Orange.data.filter.ValueFilterDiscrete( 
    107         position=data.domain.attributes.index(astigm), 
     107        position=data.domain.features.index(astigm), 
    108108        values=[Orange.data.Value(astigm, "yes")]) 
    109109    ]) 
     
    115115    [ 
    116116    Orange.data.filter.ValueFilterDiscrete( 
    117         position=data.domain.attributes.index(age), 
     117        position=data.domain.features.index(age), 
    118118        values=[Orange.data.Value(age, "young"), 
    119119                Orange.data.Value(age, "presbyopic") 
    120120                ], acceptSpecial = 1), 
    121121    Orange.data.filter.ValueFilterDiscrete( 
    122         position=data.domain.attributes.index(astigm), 
     122        position=data.domain.features.index(astigm), 
    123123        values=[Orange.data.Value(astigm, "yes")]) 
    124124    ]) 
     
    130130    [ 
    131131    Orange.data.filter.ValueFilterDiscrete( 
    132         position=data.domain.attributes.index(age), 
     132        position=data.domain.features.index(age), 
    133133        values=[Orange.data.Value(age, "young"), 
    134134                Orange.data.Value(age, "presbyopic") 
    135135                ], acceptSpecial = 1), 
    136136    Orange.data.filter.ValueFilterDiscrete( 
    137         position=data.domain.attributes.index(astigm), 
     137        position=data.domain.features.index(astigm), 
    138138        values=[Orange.data.Value(astigm, "yes")]) 
    139139    ], 