Changeset 9808:0a0cb189ea89 in orange


Timestamp:
02/06/12 16:39:09 (2 years ago)
Author:
tomazc <tomaz.curk@…>
Branch:
default
rebase_source:
540d741850d292ab579d3f0d544347ebde6689cf
Message:

Changes to Orange.feature.imputation.

Location:
docs/reference/rst
Files:
2 edited

  • docs/reference/rst/Orange.feature.imputation.rst

    r9807 r9808  
    3030that accept training data and construct an instance of a class derived from 
    3131:obj:`Imputer`. When an :obj:`Imputer` is called with an 
    32 :obj:`Orange.data.Instance` it returns a new example with the 
    33 missing values imputed (leaving the original example intact). If imputer is 
    34 called with an :obj:`Orange.data.Table` it returns a new example table 
    35 with imputed instances. 
     32:obj:`Orange.data.Instance` it returns a new instance with the 
     33missing values imputed (leaving the original instance intact). If imputer is 
     34called with an :obj:`Orange.data.Table` it returns a new data table with 
     35imputed instances. 
    3636 
    3737.. class:: ImputerConstructor 
     
    3939    .. attribute:: imputeClass 
    4040 
    41     Indicates whether to impute the class value. Default is True. 
     41    Indicates whether to impute the class value. Defaults to True. 
    4242 
    4343    .. attribute:: deterministic 
    4444 
    45     Indicates whether to initialize random by example's CRC. Default is False. 
     45    Indicates whether to initialize random by example's CRC. Defaults to False. 
    4646 
    4747Simple imputation 
    4848================= 
    4949 
    50 Simple imputers always impute the same value for a particular attribute, 
    51 disregarding the values of other attributes. They all use the same class 
     50Simple imputers always impute the same value for a particular feature, 
     51disregarding the values of other features. They all use the same class 
    5252:obj:`Imputer_defaults`. 
    5353 
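The behaviour described in this hunk (an imputer that is called on an instance or on a table, returns new data, and always fills a feature with the same value) can be sketched in plain Python. This is only an illustration under stated assumptions, not Orange code: `SimpleImputer` and `most_common_values` are hypothetical names, instances are modelled as dicts, tables as lists of dicts, and `None` stands for a missing value.

```python
# Plain-Python sketch, not Orange code. None marks a missing value.
from collections import Counter


def most_common_values(table):
    """Return the most frequent known value of each feature in `table`."""
    defaults = {}
    for feature in table[0]:
        known = [row[feature] for row in table if row[feature] is not None]
        if known:
            defaults[feature] = Counter(known).most_common(1)[0][0]
    return defaults


class SimpleImputer(object):
    """Imputes one fixed value per feature, ignoring all other features."""

    def __init__(self, table):
        self.defaults = most_common_values(table)

    def __call__(self, data):
        if isinstance(data, list):          # a whole table: impute each row
            return [self(row) for row in data]
        imputed = dict(data)                # copy; the original stays intact
        for feature, value in data.items():
            if value is None and feature in self.defaults:
                imputed[feature] = self.defaults[feature]
        return imputed


table = [
    {"MATERIAL": "STEEL", "LANES": 2},
    {"MATERIAL": "STEEL", "LANES": None},
    {"MATERIAL": None, "LANES": 2},
]
imputer = SimpleImputer(table)
imputed_table = imputer(table)
```

Calling `imputer(table[1])` returns a new dict, while `table[1]` itself still holds the missing value.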
     
    6969they will impute the lowest (the one with index 0, e. g. attr.values[0]), 
    7070the highest (attr.values[-1]), and the most common value encountered in the 
    71 data. 
    72  
    73 If values of discrete features are be ordered according to their 
    74 impact on class (for example, possible values for symptoms of some 
     71data, respectively. If values of discrete features are ordered according to 
     72their impact on class (for example, possible values for symptoms of some 
    7573disease can be ordered according to their seriousness), 
    76 the minimal and maximal imputers will then represent optimistic and 
     74the minimal and maximal imputers  will then represent optimistic and 
    7775pessimistic imputations. 
    7876 
    79 To construct the :obj:`~Orange.feature.imputation.Imputer_defaults` 
    80 yourself and specify your own defaults. Or leave some values unspecified, in 
    81 which case the imputer won't impute them, as in the following example. Here, 
    82 the only attribute whose values will get imputed is "LENGTH"; the imputed value 
    83 will be 1234. 
     77User-defined defaults can be given when constructing an 
     78:obj:`~Orange.feature.imputation.Imputer_defaults`. Values that are left 
     79unspecified do not get imputed. In the following example "LENGTH" is the 
     80only attribute to get imputed, with the value 1234: 
    8481 
    8582.. literalinclude:: code/imputation-complex.py 
    8683    :lines: 56-69 
    8784 
    88 :obj:`Orange.feature.imputation.Imputer_defaults`'s constructor will accept an 
    89 argument of type :obj:`Orange.data.Domain` (in which case it will construct an 
    90 empty instance for :obj:`defaults`) or an example. (Be careful with this: 
    91 :obj:`Orange.feature.imputation.Imputer_defaults` will have a reference to the 
    92 instance and not a copy. But you can make a copy yourself to avoid problems: 
    93 instead of `Imputer_defaults(data[0])` you may want to write 
     85If :obj:`~Orange.feature.imputation.Imputer_defaults`'s constructor is given 
     86an argument of type :obj:`~Orange.data.Domain` it constructs an empty instance 
     87for :obj:`defaults`. If an instance is given, a reference to that instance 
     88is kept rather than a copy. To avoid the problems this can cause with 
     89`Imputer_defaults(data[0])`, it is better to provide a copy of the instance: 
    9490`Imputer_defaults(Orange.data.Instance(data[0]))`. 
    9591 
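Why the copy matters can be shown with a small plain-Python sketch. It does not use Orange: `DefaultsImputer` is a hypothetical stand-in that, like the text says of `Imputer_defaults`, keeps a reference to the object it is given. Instances are modelled as dicts and `None` stands for an unspecified or missing value.

```python
# Plain-Python sketch, not Orange code. None marks a missing value.

class DefaultsImputer(object):
    """Keeps a reference to `defaults`; unspecified defaults are skipped."""

    def __init__(self, defaults):
        self.defaults = defaults            # reference, not a copy

    def __call__(self, instance):
        imputed = dict(instance)            # leave the caller's data intact
        for feature, value in instance.items():
            default = self.defaults.get(feature)
            if value is None and default is not None:
                imputed[feature] = default
        return imputed


first = {"LENGTH": 1234, "LANES": None}
shared = DefaultsImputer(first)             # shares state with `first`
copied = DefaultsImputer(dict(first))       # works on an independent copy

first["LENGTH"] = 999                       # a later change to the source

bridge = {"LENGTH": None, "LANES": None}
shared_result = shared(bridge)
copied_result = copied(bridge)
```

The imputer built from the shared reference silently picks up the later change, while the one built from a copy keeps imputing 1234. "LANES" has no default in either case and stays missing.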
     
    108104    .. attribute:: deterministic 
    109105 
    110     If true (default is False), random generator is initialized for each 
    111     example using the example's hash value as a seed. This results in same 
    112     examples being always imputed the same values. 
     106    If true (defaults to False), the random generator is initialized for each 
     107    instance using the instance's hash value as a seed. As a result, identical 
     108    instances are always imputed with the same (random) values. 
    113109 
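The effect of the `deterministic` flag can be illustrated with a minimal sketch. It is not Orange's implementation: `random_impute` is a hypothetical function, and seeding from a CRC32 of the instance's textual form is an assumption standing in for whatever hash Orange actually uses.

```python
# Plain-Python sketch, not Orange code. None marks a missing value.
import random
import zlib


def random_impute(instance, choices, deterministic=False):
    """Fill missing values with random picks from `choices[feature]`."""
    if deterministic:
        # Assumed seeding scheme: CRC32 of the instance's sorted items.
        text = repr(sorted(instance.items())).encode("utf-8")
        rng = random.Random(zlib.crc32(text) & 0xFFFFFFFF)
    else:
        rng = random.Random()               # seeded from the system
    imputed = dict(instance)                # leave the original intact
    for feature in sorted(instance):
        if instance[feature] is None:
            imputed[feature] = rng.choice(choices[feature])
    return imputed


choices = {"MATERIAL": ["WOOD", "IRON", "STEEL"]}
bridge = {"MATERIAL": None, "LANES": 2}
first_run = random_impute(bridge, choices, deterministic=True)
second_run = random_impute(bridge, choices, deterministic=True)
```

With `deterministic=True` the two runs agree because the generator is reseeded from the same instance each time; without it, repeated calls may differ.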
    114110Model-based imputation 
     
    117113.. class:: ImputerConstructor_model 
    118114 
    119     Model-based imputers learn to predict the attribute's value from values of 
    120     other attributes. :obj:`ImputerConstructor_model` are given a learning 
    121     algorithm (two, actually - one for discrete and one for continuous 
    122     attributes) and they construct a classifier for each attribute. The 
    123     constructed imputer :obj:`Imputer_model` stores a list of classifiers which 
    124     are used when needed. 
     115    Model-based imputers learn to predict a feature's value from the values of 
     116    other features. :obj:`ImputerConstructor_model` is given two learning 
     117    algorithms and constructs a classifier for each attribute. The 
     118    constructed imputer :obj:`Imputer_model` stores a list of classifiers that 
     119    are used for imputation. 
    125120 
    126121    .. attribute:: learner_discrete, learner_continuous 
    127122 
    128123    Learner for discrete and for continuous attributes. If any of them is 
    129     missing, the attributes of the corresponding type won't get imputed. 
     124    missing, the attributes of the corresponding type will not get imputed. 
    130125 
    131126    .. attribute:: use_class 
    132127 
    133     Tells whether the imputer is allowed to use the class value. As this is 
    134     most often undesired, this option is by default set to False. It can 
    135     however be useful for a more complex design in which we would use one 
    136     imputer for learning examples (this one would use the class value) and 
    137     another for testing examples (which would not use the class value as this 
    138     is unavailable at that moment). 
     128    Tells whether the imputer can use the class attribute. Defaults to 
     129    False. It is useful in more complex designs in which one imputer is used 
     130    on learning instances, where it uses the class value, 
     131    and a second imputer on testing instances, where class is not available. 
    139132 
    140133.. class:: Imputer_model 
    141134 
    142     .. attribute: models 
    143  
    144     A list of classifiers, each corresponding to one attribute of the examples 
    145     whose values are to be imputed. The :obj:`classVar`'s of the models should 
    146     equal the examples' attributes. If any of classifier is missing (that is, 
    147     the corresponding element of the table is :obj:`None`, the corresponding 
    148     attribute's values will not be imputed. 
     135    .. attribute:: models 
     136 
     137    A list of classifiers, each corresponding to one attribute to be imputed. 
     138    The :obj:`class_var`'s of the models should equal the instances' 
     139    attributes. If an element is :obj:`None`, the corresponding attribute's 
     140    values are not imputed. 
    149141 
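How a list of models with `None` entries drives imputation can be sketched in plain Python. This is an illustration only, not Orange code: `constant_model` and `impute_with_models` are hypothetical helpers, a "model" is any callable that maps an instance to a value, and the feature names are borrowed from the bridges data used elsewhere in this document.

```python
# Plain-Python sketch, not Orange code. None marks a missing value.

def constant_model(value):
    """Return a model that predicts `value` for every instance."""
    return lambda instance: value


def impute_with_models(instance, features, models):
    """Impute each missing feature with its model; skip None models."""
    imputed = dict(instance)                # leave the original intact
    for feature, model in zip(features, models):
        if model is not None and instance[feature] is None:
            imputed[feature] = model(instance)
    return imputed


features = ["LANES", "T-OR-D", "LENGTH"]
models = [None] * len(features)             # start with no models at all
models[0] = constant_model(2.0)             # continuous feature
models[1] = constant_model("THROUGH")       # discrete feature
# models[2] stays None, so "LENGTH" is never imputed

bridge = {"LANES": None, "T-OR-D": None, "LENGTH": None}
result = impute_with_models(bridge, features, models)
```

In a real model-based imputer the callables would be trained classifiers or regressors rather than constants; the list layout and the `None` convention are the point of the sketch.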
    150142.. rubric:: Examples 
    151143 
    152 The following imputer predicts the missing attribute values using 
    153 classification and regression trees with the minimum of 20 examples in a leaf. 
    154 Part of :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
     144Examples are taken from :download:`imputation-complex.py 
     145<code/imputation-complex.py>`. The following imputer predicts the missing 
     146attribute values using classification and regression trees with the minimum 
     147of 20 examples in a leaf. 
    155148 
    156149.. literalinclude:: code/imputation-complex.py 
    157150    :lines: 74-76 
    158151 
    159 We could even use the same learner for discrete and continuous attributes, 
    160 as :class:`Orange.classification.tree.TreeLearner` checks the class type 
    161 and constructs regression or classification trees accordingly. The 
    162 common parameters, such as the minimal number of 
    163 examples in leaves, are used in both cases. 
    164  
    165 You can also use different learning algorithms for discrete and 
    166 continuous attributes. Probably a common setup will be to use 
    167 :class:`Orange.classification.bayes.BayesLearner` for discrete and 
    168 :class:`Orange.regression.mean.MeanLearner` (which 
    169 just remembers the average) for continuous attributes. Part of 
    170 :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
     152A common setup, where different learning algorithms are used for discrete 
     153and continuous features, is to use 
     154:class:`~Orange.classification.bayes.NaiveLearner` for discrete and 
     155:class:`~Orange.regression.mean.MeanLearner` (which 
     156just remembers the average) for continuous attributes: 
    171157 
    172158.. literalinclude:: code/imputation-complex.py 
    173159    :lines: 91-94 
    174160 
    175 You can also construct an :class:`Imputer_model` yourself. You will do 
     161You can also construct an :class:`Imputer_model` yourself. You will do 
    176162this if different attributes need different treatment. Brace for an 
    177163example that will be a bit more complex. First we shall construct an 
    178164:class:`Imputer_model` and initialize an empty list of models. 
    179 The following code snippets are from 
    180 :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`): 
    181  
    182 .. literalinclude:: code/imputation-complex.py 
    183     :lines: 108-109 
    184  
    185 Attributes "LANES" and "T-OR-D" will always be imputed values 2 and 
    186 "THROUGH". Since "LANES" is continuous, it suffices to construct a 
    187 :obj:`DefaultClassifier` with the default value 2.0 (don't forget the 
    188 decimal part, or else Orange will think you talk about an index of a discrete 
    189 value - how could it tell?). For the discrete attribute "T-OR-D", we could 
    190 construct a :class:`Orange.classification.ConstantClassifier` and give the index of value 
    191 "THROUGH" as an argument. But we shall do it nicer, by constructing a 
    192 :class:`Orange.data.Value`. Both classifiers will be stored at the appropriate places 
    193 in :obj:`imputer.models`. 
    194  
    195 .. literalinclude:: code/imputation-complex.py 
    196     :lines: 110-112 
    197  
    198  
    199 "LENGTH" will be computed with a regression tree induced from "MATERIAL", 
    200 "SPAN" and "ERECTED" (together with "LENGTH" as the class attribute, of 
    201 course). Note that we initialized the domain by simply giving a list with 
    202 the names of the attributes, with the domain as an additional argument 
    203 in which Orange will look for the named attributes. 
     165 
     166To construct a user-defined :class:`Imputer_model`: 
     167 
     168.. literalinclude:: code/imputation-complex.py 
     169    :lines: 108-112 
     170 
     171A list of empty models is first initialized. Continuous feature "LANES" is 
     172imputed with value 2, using :obj:`DefaultClassifier` with the default value 
     1732.0. A float must be given, because integer values are interpreted as indexes 
     174of discrete features. Discrete feature "T-OR-D" is imputed using 
     175:class:`Orange.classification.ConstantClassifier` which is given the index 
     176of value "THROUGH" as an argument. Both classifiers are stored at the 
     177appropriate places in :obj:`Imputer_model.models`. 
     178 
     179Feature "LENGTH" is computed with a regression tree induced from "MATERIAL", 
     180"SPAN" and "ERECTED" (feature "LENGTH" is used as class attribute here). 
     181The domain is initialized by giving a list of feature names, with the 
     182existing domain as an additional argument in which Orange looks up those features. 
    204183 
    205184.. literalinclude:: code/imputation-complex.py 
    206185    :lines: 114-119 
    207186 
    208 We printed the tree just to see what it looks like. 
    209  
    210 :: 
     187This is what the inferred tree should look like:: 
    211188 
    212189    <XMP class=code>SPAN=SHORT: 1158 
  • docs/reference/rst/code/imputation-values.py

    r9806 r9808  
    1616print imputer(bridges[10]) 
    1717 
    18 impdata = imputer(bridges) 
    19 print impdata[10] 
    20  
     18imputed_bridges = imputer(bridges) 
     19print imputed_bridges[10] 