source: orange/orange/Orange/feature/imputation.py @ 9349:fa13a2c52fcd

Revision 9349:fa13a2c52fcd, 28.4 KB checked in by mitar, 2 years ago (diff)

Changed way of linking to code in documentation.

"""
###########################
Imputation (``imputation``)
###########################

.. index:: imputation

.. index::
   single: feature; value imputation


Imputation is a procedure of replacing the missing feature values with
appropriate values. Imputation is needed by methods (learning algorithms and
others) that cannot handle unknown values, for instance logistic regression.

Missing values sometimes have a special meaning, so they need to be replaced
by a designated value. Sometimes we know what to replace the missing value
with; for instance, in a medical problem, some laboratory tests might not be
done when it is known what their results would be. In that case, we impute a
certain fixed value instead of the missing one. In the most complex case, we
assign values that are computed based on some model; we can, for instance,
impute the average or majority value, or even a value computed from the
values of other, known features, using a classifier.

In a learning/classification process, imputation is needed on two occasions.
Before learning, the imputer needs to process the training examples.
Afterwards, the imputer is called for each example to be classified.

In general, the imputer itself needs to be trained. This is, of course, not
needed when the imputer imputes a certain fixed value. However, when it
imputes the average or majority value, it needs to compute the statistics on
the training examples and use them afterwards for imputation of training and
testing examples.

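The fit-then-reuse behaviour described above can be sketched in plain Python.
This is an illustrative stand-in, not the Orange API: ``None`` plays the role
of a missing value, and the per-feature averages are computed once from the
training rows only.

```python
# Hypothetical sketch (not the Orange API): an "average" imputer is
# trained once on the training rows, then reused to impute any row.
def fit_average_imputer(rows):
    """Compute per-column averages over known values (None = missing)."""
    n_cols = len(rows[0])
    averages = []
    for c in range(n_cols):
        known = [row[c] for row in rows if row[c] is not None]
        averages.append(sum(known) / len(known) if known else None)

    def impute(row):
        # Return a new row; the original row is left intact.
        return [averages[c] if v is None else v for c, v in enumerate(row)]

    return impute

train = [[1.0, None], [3.0, 4.0], [None, 8.0]]
imputer = fit_average_imputer(train)
imputed = imputer([None, None])   # uses statistics from `train` only
```

The same fitted ``imputer`` would then be applied unchanged to testing rows.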
While reading this document, bear in mind that imputation is a part of the
learning process. If we fit the imputation model, for instance, by learning
how to predict the feature's value from other features, or even if we
simply compute the average or the minimal value for the feature and use it
in imputation, this should only be done on the learning data. If cross
validation is used for sampling, imputation should be done on the training
folds only. Orange provides simple means for doing that.

This page will first explain how to construct various imputers. Then follow
the examples of `proper use of imputers <#using-imputers>`_. Finally, quite
often you will want to use imputation with special requests, such as certain
features' missing values getting replaced by constants and others by values
computed using models induced from specified other features. For instance,
in one of the studies we worked on, the patient's pulse rate needed to be
estimated using regression trees that included the scope of the patient's
injuries, sex and age; some attributes' values were replaced by the most
pessimistic ones and others were computed with regression trees based on
values of all features. If you are using learners that need the imputer as a
component, you will need to `write your own imputer constructor
<#write-your-own-imputer-constructor>`_. This is trivial and is explained at
the end of this page.

Wrapper for learning algorithms
===============================

This wrapper can be used with learning algorithms that cannot handle missing
values: it will impute the missing values using the imputer, call the
learning algorithm and, if imputation is also needed by the classifier, wrap
the resulting classifier into another wrapper that will impute the missing
values in examples to be classified.

Even so, the module is somewhat redundant, as all learners that cannot handle
missing values should, in principle, provide the slots for an imputer
constructor. For instance, :obj:`Orange.classification.logreg.LogRegLearner`
has an attribute
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`, and even
if you don't set it, it will do some imputation by default.

.. class:: ImputeLearner

    Wraps a learner and performs data imputation before learning.

    Most of Orange's learning algorithms do not use imputers because they can
    appropriately handle the missing values. The Bayesian classifier, for
    instance, simply skips the corresponding attributes in the formula, while
    classification/regression trees have components for handling the missing
    values in various ways.

    If for any reason you want these algorithms to run on imputed data, you
    can use this wrapper. The class description is a matter of a separate
    page, but we shall show its code here as another demonstration of how to
    use the imputers - logistic regression is implemented in essentially the
    same way as the classes below.

    This is basically a learner, so the constructor will return either an
    instance of :obj:`ImputeLearner` or, if called with examples, an instance
    of some classifier. There are a few attributes that need to be set,
    though.

    .. attribute:: base_learner

        A wrapped learner.

    .. attribute:: imputer_constructor

        An instance of a class derived from :obj:`ImputerConstructor` (or a
        class with the same call operator).

    .. attribute:: dont_impute_classifier

        If given and set (this attribute is optional), the classifier will
        not be wrapped into an imputer. Do this if the classifier doesn't
        mind if the examples it is given have missing values.

    The learner is best illustrated by its code - here is its complete
    :obj:`__call__` method::

        def __call__(self, data, weight=0):
            trained_imputer = self.imputer_constructor(data, weight)
            imputed_data = trained_imputer(data, weight)
            base_classifier = self.base_learner(imputed_data, weight)
            if self.dont_impute_classifier:
                return base_classifier
            else:
                return ImputeClassifier(base_classifier, trained_imputer)

    So "learning" goes like this. :obj:`ImputeLearner` will first construct
    the imputer (that is, call :obj:`self.imputer_constructor` to get a
    trained imputer). Then it will use the imputer to impute the data, and
    call the given :obj:`base_learner` to construct a classifier. For
    instance, :obj:`base_learner` could be a learner for logistic regression
    and the result would be a logistic regression model. If the classifier
    can handle unknown values (that is, if :obj:`dont_impute_classifier` is
    set), we return it as it is; otherwise we wrap it into
    :obj:`ImputeClassifier`, which is given the base classifier and the
    imputer it can use to impute the missing values in (testing) examples.

.. class:: ImputeClassifier

    Objects of this class are returned by :obj:`ImputeLearner` when given data.

    .. attribute:: base_classifier

        A wrapped classifier.

    .. attribute:: imputer

        An imputer for imputation of unknown values.

    .. method:: __call__

        This class is even more trivial than the learner. Its constructor
        accepts two arguments, the classifier and the imputer, which are
        stored in the corresponding attributes. The call operator which does
        the classification then looks like this::

            def __call__(self, ex, what=orange.GetValue):
                return self.base_classifier(self.imputer(ex), what)

    It imputes the missing values by calling the :obj:`imputer` and passes
    the resulting example to the base classifier.

.. note::
   In this setup the imputer is trained on the training data - even if you
   use cross validation, the imputer will be trained on the right data. In
   the classification phase we again use the imputer which was trained on
   the training data only.

.. rubric:: Code of ImputeLearner and ImputeClassifier

:obj:`Orange.feature.imputation.ImputeLearner` puts the keyword arguments
into the instance's dictionary. You are expected to call it like
:obj:`ImputeLearner(base_learner=<someLearner>,
imputer_constructor=<someImputerConstructor>)`. When the learner is called
with examples, it trains the imputer, imputes the data, induces a
:obj:`base_classifier` with the :obj:`base_learner` and constructs an
:obj:`ImputeClassifier` that stores the :obj:`base_classifier` and the
:obj:`imputer`. For classification, the missing values are imputed and the
classifier's prediction is returned.

Note that this code is slightly simplified: the omitted details handle
non-essential technical issues that are unrelated to imputation::

    class ImputeLearner(orange.Learner):
        def __new__(cls, examples=None, weight_id=0, **keyw):
            self = orange.Learner.__new__(cls, **keyw)
            self.__dict__.update(keyw)
            if examples:
                return self.__call__(examples, weight_id)
            else:
                return self

        def __call__(self, data, weight=0):
            trained_imputer = self.imputer_constructor(data, weight)
            imputed_data = trained_imputer(data, weight)
            base_classifier = self.base_learner(imputed_data, weight)
            return ImputeClassifier(base_classifier, trained_imputer)

    class ImputeClassifier(orange.Classifier):
        def __init__(self, base_classifier, imputer):
            self.base_classifier = base_classifier
            self.imputer = imputer

        def __call__(self, ex, what=orange.GetValue):
            return self.base_classifier(self.imputer(ex), what)

.. rubric:: Example

Although most of Orange's learning algorithms will take care of imputation
internally, if needed, it can sometimes happen that an expert will be able to
tell you exactly what to put in the data instead of the missing values. In
this example we shall suppose that we want to impute the minimal value of
each feature. We will try to determine whether the naive Bayesian classifier
with its implicit internal imputation works better than one that uses
imputation by minimal values.

:download:`imputation-minimal-imputer.py <code/imputation-minimal-imputer.py>` (uses :download:`voting.tab <code/voting.tab>`):

.. literalinclude:: code/imputation-minimal-imputer.py
    :lines: 7-

Should output this::

    Without imputation: 0.903
    With imputation: 0.899

.. note::
   Note that we constructed just one instance of
   :obj:`Orange.classification.bayes.NaiveLearner`, but this same instance is
   used twice in each fold. Once it is given the examples as they are and
   returns an instance of :obj:`Orange.classification.bayes.NaiveClassifier`.
   The second time it is called by :obj:`imba`, and the
   :obj:`Orange.classification.bayes.NaiveClassifier` it returns is wrapped
   into :obj:`Orange.feature.imputation.ImputeClassifier`. We thus have only
   one learner, but it produces two different classifiers in each round of
   testing.

Abstract imputers
=================

As is common in Orange, imputation is done by pairs of classes: one that does
the work and another that constructs it. :obj:`ImputerConstructor` is the
abstract root of the hierarchy of classes that get the training data (with an
optional id for weight) and construct an instance of a class derived from
:obj:`Imputer`. An :obj:`Imputer` can be called with an
:obj:`Orange.data.Instance` and it will return a new example with the missing
values imputed (it will leave the original example intact). If the imputer is
called with an :obj:`Orange.data.Table`, it will return a new example table
with imputed examples.

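The constructor/imputer pattern can be mimicked in plain Python. The names
below are hypothetical stand-ins for :obj:`ImputerConstructor` and
:obj:`Imputer`, with ``None`` playing the role of an unknown value:

```python
# Illustrative sketch of the constructor/imputer pair (names are made
# up; None stands for a missing value).
class MinimalImputerConstructor:
    """Called with training rows, returns a trained imputer."""
    def __call__(self, rows):
        minima = []
        for c in range(len(rows[0])):
            known = [row[c] for row in rows if row[c] is not None]
            minima.append(min(known) if known else None)
        return MinimalImputer(minima)

class MinimalImputer:
    """Called with a row, returns a new row with unknowns filled in."""
    def __init__(self, minima):
        self.minima = minima

    def __call__(self, row):
        # The original row is left intact, as with Orange's imputers.
        return [self.minima[c] if v is None else v
                for c, v in enumerate(row)]

imputer = MinimalImputerConstructor()([[2.0, None], [5.0, 1.0]])
```

Calling the constructor with data directly yields the trained imputer, which
mirrors the behaviour of the real classes described below.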
.. class:: ImputerConstructor

    .. attribute:: impute_class

        Tells whether to impute the class value (default) or not.

Simple imputation
=================

The simplest imputers always impute the same value for a particular
attribute, disregarding the values of other attributes. They all use the
same imputer class, :obj:`Imputer_defaults`.

.. class:: Imputer_defaults

    .. attribute:: defaults

        An example with the default values to be imputed instead of the
        missing ones. Examples to be imputed must be from the same domain as
        :obj:`defaults`.

    Instances of this class can be constructed by
    :obj:`Orange.feature.imputation.ImputerConstructor_minimal`,
    :obj:`Orange.feature.imputation.ImputerConstructor_maximal` and
    :obj:`Orange.feature.imputation.ImputerConstructor_average`.

    For continuous features, they will impute the smallest, largest or the
    average value encountered in the training examples. For discrete
    features, they will impute the lowest value (the one with index 0, e.g.
    ``attr.values[0]``), the highest value (``attr.values[-1]``), or the most
    common value encountered in the data, respectively. The first two
    imputers will mostly be used when the discrete values are ordered
    according to their impact on the class (for instance, possible values for
    symptoms of some disease can be ordered according to their seriousness).
    The minimal and maximal imputers will then represent optimistic and
    pessimistic imputations.

    The following code will load the bridges data, and first impute the
    values in a single example and then in the whole table.

:download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`):

.. literalinclude:: code/imputation-complex.py
    :lines: 9-23

This example shows what the imputer does, not how it is to be used. Don't
impute all the data and then use it for cross-validation. As warned at the
top of this page, see the instructions for actual `use of
imputers <#using-imputers>`_.

.. note:: :obj:`ImputerConstructor` is another class with a dual-purpose
   constructor: if you give the constructor the data, it will return an
   :obj:`Imputer` - the above call is equivalent to calling
   :obj:`Orange.feature.imputation.ImputerConstructor_minimal()(data)`.
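
For a discrete feature, the three simple defaults can be sketched in plain
Python (a hedged illustration, not Orange code; the symptom values below are
invented):

```python
# Invented example: symptom severity values, ordered by seriousness.
values = ["none", "mild", "severe"]            # like attr.values
observed = ["mild", None, "severe", "mild"]    # None marks a missing value
known = [v for v in observed if v is not None]

minimal = values[0]      # optimistic imputation (lowest, index 0)
maximal = values[-1]     # pessimistic imputation (highest, index -1)
most_common = max(set(known), key=known.count)  # "average" for discrete
```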

You can also construct the :obj:`Orange.feature.imputation.Imputer_defaults`
yourself and specify your own defaults. Or leave some values unspecified, in
which case the imputer won't impute them, as in the following example. Here,
the only attribute whose values will get imputed is "LENGTH"; the imputed
value will be 1234.

.. literalinclude:: code/imputation-complex.py
    :lines: 56-69

The constructor of :obj:`Orange.feature.imputation.Imputer_defaults` accepts
an argument of type :obj:`Orange.data.Domain` (in which case it will
construct an empty instance for :obj:`defaults`) or an example. (Be careful
with this: :obj:`Orange.feature.imputation.Imputer_defaults` will keep a
reference to the instance, not a copy. You can make a copy yourself to avoid
problems: instead of `Imputer_defaults(data[0])` you may want to write
`Imputer_defaults(Orange.data.Instance(data[0]))`.)

Random imputation
=================

.. class:: Imputer_Random

    Imputes random values. The corresponding constructor is
    :obj:`ImputerConstructor_Random`.

    .. attribute:: impute_class

        Tells whether to impute the class values or not. Defaults to True.

    .. attribute:: deterministic

        If true (default is False), the random generator is initialized for
        each example using the example's hash value as a seed. This results
        in the same examples always being imputed the same values.

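The effect of the :obj:`deterministic` flag described above can be sketched
with Python's ``random`` module (an illustration only, not Orange's
implementation; ``None`` stands for a missing value):

```python
import random

# Seeding the generator with the example's hash makes repeated
# imputations of the same example agree with each other.
def impute_random(row, choices, deterministic=False):
    rng = random.Random(hash(row) if deterministic else None)
    return [rng.choice(choices) if v is None else v for v in row]

row = ("a", None, "b")   # a tuple, so it is hashable
first = impute_random(row, ["a", "b", "c"], deterministic=True)
second = impute_random(row, ["a", "b", "c"], deterministic=True)
```

With ``deterministic=True``, ``first`` and ``second`` are guaranteed to be
equal; without it, the imputed value may differ between calls.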
Model-based imputation
======================

.. class:: ImputerConstructor_model

    Model-based imputers learn to predict the attribute's value from values
    of other attributes. :obj:`ImputerConstructor_model` is given a learning
    algorithm (two, actually - one for discrete and one for continuous
    attributes) and it constructs a classifier for each attribute. The
    constructed imputer :obj:`Imputer_model` stores a list of classifiers
    which are used when needed.

    .. attribute:: learner_discrete, learner_continuous

        Learners for discrete and for continuous attributes. If either is
        missing, the attributes of the corresponding type won't get imputed.

    .. attribute:: use_class

        Tells whether the imputer is allowed to use the class value. As this
        is most often undesired, this option is by default set to False. It
        can however be useful for a more complex design in which we would
        use one imputer for learning examples (this one would use the class
        value) and another for testing examples (which would not use the
        class value, as this is unavailable at that moment).

.. class:: Imputer_model

    .. attribute:: models

        A list of classifiers, each corresponding to one attribute of the
        examples whose values are to be imputed. The :obj:`classVar` of each
        model should equal the corresponding attribute of the examples. If
        any of the classifiers is missing (that is, the corresponding
        element of the list is :obj:`None`), the corresponding attribute's
        values will not be imputed.

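The idea - one model per attribute, each predicting that attribute from the
others - can be sketched like this. The "models" here are plain functions
standing in for trained classifiers, and each placeholder merely memorizes
the column average instead of genuinely looking at the other columns:

```python
# Sketch of Imputer_model's model list: one callable per column. A real
# model would predict from the other columns of `row`; this placeholder
# returns the column average it memorized at training time.
def make_column_models(rows):
    models = []
    for c in range(len(rows[0])):
        known = [row[c] for row in rows if row[c] is not None]
        avg = sum(known) / len(known)
        models.append(lambda row, avg=avg: avg)
    return models

def impute_with_models(row, models):
    # A None in `models` would mean: leave that column unimputed.
    return [models[c](row) if v is None else v
            for c, v in enumerate(row)]

models = make_column_models([[1.0, 10.0], [3.0, None]])
filled = impute_with_models([None, None], models)
```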
.. rubric:: Examples

The following imputer predicts the missing attribute values using
classification and regression trees with a minimum of 20 examples in a leaf.
Part of :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`):

.. literalinclude:: code/imputation-complex.py
    :lines: 74-76

We could even use the same learner for discrete and continuous attributes,
as :class:`Orange.classification.tree.TreeLearner` checks the class type
and constructs regression or classification trees accordingly. The
common parameters, such as the minimal number of
examples in leaves, are used in both cases.

You can also use different learning algorithms for discrete and
continuous attributes. Probably a common setup will be to use
:class:`Orange.classification.bayes.BayesLearner` for discrete and
:class:`Orange.regression.mean.MeanLearner` (which
just remembers the average) for continuous attributes. Part of
:download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`):

.. literalinclude:: code/imputation-complex.py
    :lines: 91-94

You can also construct an :class:`Imputer_model` yourself. You will do
this if different attributes need different treatment. Brace for an
example that will be a bit more complex. First we shall construct an
:class:`Imputer_model` and initialize an empty list of models.
The following code snippets are from
:download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`):

.. literalinclude:: code/imputation-complex.py
    :lines: 108-109

Attributes "LANES" and "T-OR-D" will always be imputed the values 2 and
"THROUGH". Since "LANES" is continuous, it suffices to construct a
:obj:`DefaultClassifier` with the default value 2.0 (don't forget the
decimal part, or else Orange will think you are talking about the index of a
discrete value - how could it tell?). For the discrete attribute "T-OR-D",
we could construct a :class:`Orange.classification.ConstantClassifier` and
give the index of the value "THROUGH" as an argument. But we shall do it
more elegantly, by constructing an :class:`Orange.data.Value`. Both
classifiers will be stored at the appropriate places in
:obj:`imputer.models`.

.. literalinclude:: code/imputation-complex.py
    :lines: 110-112


"LENGTH" will be computed with a regression tree induced from "MATERIAL",
"SPAN" and "ERECTED" (together with "LENGTH" as the class attribute, of
course). Note that we initialized the domain by simply giving a list with
the names of the attributes, with the domain as an additional argument
in which Orange will look for the named attributes.

.. literalinclude:: code/imputation-complex.py
    :lines: 114-119

We printed the tree just to see what it looks like.

::

    SPAN=SHORT: 1158
    SPAN=LONG: 1907
    SPAN=MEDIUM
    |    ERECTED<1908.500: 1325
    |    ERECTED>=1908.500: 1528

Small and nice. Now for "SPAN". Wooden bridges and walkways are short,
while the others are mostly medium. This could be done with
:class:`Orange.classification.lookup.ClassifierByLookupTable` - that would
be faster than what we plan here. See the corresponding documentation on the
lookup classifier. Here we are going to do it with a Python function.

.. literalinclude:: code/imputation-complex.py
    :lines: 121-128

:obj:`compute_span` could also be written as a class, if you'd prefer
it. It's important that it behaves like a classifier, that is, gets an
example and returns a value. The second argument tells, as usual, what the
caller expects the classifier to return - a value, a distribution or both.
Since the caller, :obj:`Imputer_model`, always wants values, we shall ignore
the argument (at the risk of having problems in the future when imputers
might handle distributions as well).

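The shape of such a function-as-classifier can be sketched as follows. This
is a hypothetical reconstruction based on the rule described above (the real
``compute_span`` lives in ``imputation-complex.py``); the dictionary stands
in for an Orange example, which supports the same ``example["NAME"]``
indexing:

```python
# Hypothetical reconstruction of a function used as a classifier: it
# takes an example and a second argument saying what to return, and
# ignores the latter since the imputer always asks for a value.
def compute_span(example, return_what=None):
    if example["TYPE"] == "WOOD" or example["PURPOSE"] == "WALK":
        return "SHORT"
    return "MEDIUM"

wooden = {"TYPE": "WOOD", "PURPOSE": "RR"}
arched = {"TYPE": "ARCH", "PURPOSE": "RR"}
```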
Missing values as special values
================================

Missing values sometimes have a special meaning. The fact that something was
not measured can sometimes tell a lot. Be, however, cautious when using such
values in decision models; if the decision not to measure something (for
instance, performing a laboratory test on a patient) is based on the
expert's knowledge of the class value, such unknown values clearly should
not be used in models.

.. class:: ImputerConstructor_asValue

    Constructs a new domain in which each discrete attribute is replaced
    with a new attribute that has one value more: "NA". The new attribute
    computes its values on the fly from the old one, copying the normal
    values and replacing the unknowns with "NA".

    For continuous attributes, it constructs a two-valued discrete attribute
    with values "def" and "undef", telling whether the value of the
    continuous attribute was defined or not. The attribute's name will equal
    the original's with "_def" appended. The original continuous attribute
    will remain in the domain and its unknowns will be replaced by averages.

    :class:`ImputerConstructor_asValue` has no specific attributes.

    It constructs an :class:`Imputer_asValue` (I bet you wouldn't have
    guessed). It converts the example into the new domain, which imputes the
    values for discrete attributes. If continuous attributes are present, it
    will also replace their values by the averages.

.. class:: Imputer_asValue

    .. attribute:: domain

        The domain with the new attributes constructed by
        :class:`ImputerConstructor_asValue`.

    .. attribute:: defaults

        Default values for continuous attributes. Present only if there are
        any.

The following code shows what this imputer actually does to the domain.
Part of :download:`imputation-complex.py <code/imputation-complex.py>` (uses :download:`bridges.tab <code/bridges.tab>`):

.. literalinclude:: code/imputation-complex.py
    :lines: 137-151

The script's output looks like this::

    [RIVER, ERECTED, PURPOSE, LENGTH, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE]

    [RIVER, ERECTED_def, ERECTED, PURPOSE, LENGTH_def, LENGTH, LANES_def, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE]

    RIVER: M -> M
    ERECTED: 1874 -> 1874 (def)
    PURPOSE: RR -> RR
    LENGTH: ? -> 1567 (undef)
    LANES: 2 -> 2 (def)
    CLEAR-G: ? -> NA
    T-OR-D: THROUGH -> THROUGH
    MATERIAL: IRON -> IRON
    SPAN: ? -> NA
    REL-L: ? -> NA
    TYPE: SIMPLE-T -> SIMPLE-T

Seemingly, the two examples have the same attributes (with
:samp:`imputed` having a few additional ones). If you check this with
:samp:`original.domain[0] == imputed.domain[0]`, you will see that this
first impression is false. The attributes only have the same names, but they
are different attributes; as far as Orange is concerned, having the same
name does not make them the same attribute.

Therefore, if we wrote :samp:`imputed[i]` the program would fail,
since :samp:`imputed` does not contain the attribute :samp:`i`. But it has
an attribute with the same name (which usually even has the same value). We
therefore use :samp:`i.name` to index the attributes of
:samp:`imputed`. (Using names for indexing is not fast, though; if you do
it a lot, compute the integer index with
:samp:`imputed.domain.index(i.name)`.)

For continuous attributes, there is an additional attribute with "_def"
appended; we get it by :samp:`i.name+"_def"`.

The first continuous attribute, "ERECTED", is defined. Its value remains
1874 and the additional attribute "ERECTED_def" has the value "def". Not so
for "LENGTH". Its undefined value is replaced by the average (1567) and the
new attribute has the value "undef". The undefined discrete attribute
"CLEAR-G" (and all other undefined discrete attributes) is assigned the
value "NA".

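The transformation can be summarized in plain Python (an illustration of the
semantics only, not Orange's implementation; ``None`` stands for "?"):

```python
# For each continuous column: emit a "def"/"undef" indicator and replace
# unknowns by the column average. Discrete unknowns become "NA".
def as_value(rows, continuous_cols):
    averages = {}
    for c in continuous_cols:
        known = [row[c] for row in rows if row[c] is not None]
        averages[c] = sum(known) / len(known)
    transformed = []
    for row in rows:
        new_row = []
        for c, v in enumerate(row):
            if c in continuous_cols:
                new_row.append("def" if v is not None else "undef")
                new_row.append(v if v is not None else averages[c])
            else:
                new_row.append(v if v is not None else "NA")
        transformed.append(new_row)
    return transformed

rows = [["IRON", 1000.0], [None, None], ["WOOD", 2000.0]]
result = as_value(rows, continuous_cols={1})
```

The second row comes out as ``["NA", "undef", 1500.0]``: the discrete
unknown becomes "NA", while the continuous unknown is flagged "undef" and
replaced by the average.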
Using imputers
==============

To be used properly in the learning process, the imputation classes must be
trained on training examples only. Imputing the missing values and
subsequently using the data set in cross-validation will give overly
optimistic results.

Learners with imputer as a component
------------------------------------

Orange learners that cannot handle missing values will generally provide a
slot for the imputer component. An example of such a class is
:obj:`Orange.classification.logreg.LogRegLearner` with an attribute called
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`. To it
you can assign an imputer constructor - one of the above constructors or a
specific constructor you wrote yourself. When given learning examples,
:obj:`Orange.classification.logreg.LogRegLearner` will pass them to
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor` to get
an imputer (again one of the above, or a specific imputer you programmed).
It will immediately use the imputer to impute the missing values in the
learning data set, so they can be used by the actual learning algorithm.
Besides, when the classifier
:obj:`Orange.classification.logreg.LogRegClassifier` is constructed, the
imputer will be stored in its attribute
:obj:`Orange.classification.logreg.LogRegClassifier.imputer`. At
classification time, the imputer will be used for imputation of the missing
values in (testing) examples.

Although details may vary from algorithm to algorithm, this is how
imputation is generally used in Orange's learners. Also, if you write your
own learners, it is recommended that you use imputation according to the
described procedure.

Write your own imputer
======================

Imputation classes provide the Python-callback functionality (not all Orange
classes do so, refer to the documentation on `subtyping the Orange classes
in Python <callbacks.htm>`_ for a list). If you want to write your own
imputer constructor or imputer, you simply need to program a Python function
that will behave like the built-in Orange classes. For an imputer it is even
simpler: you only need to write a function that gets an example as an
argument; imputation of example tables will then use that function.

You will most often write the imputation constructor when you have a special
imputation procedure or separate procedures for various attributes, as we've
demonstrated in the description of
:obj:`Orange.feature.imputation.ImputerConstructor_model`. You basically
only need to pack everything we've written there into an imputer constructor
that will accept a data set and the id of the weight meta-attribute (ignore
it if you will, but you must accept two arguments), and return the imputer
(probably an :obj:`Orange.feature.imputation.Imputer_model`). The benefit of
implementing an imputer constructor, as opposed to what we did above, is
that you can use such a constructor as a component for Orange learners (like
logistic regression) or for wrappers from the module orngImpute, and in that
way properly use it in classifier testing procedures.

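The required signature can be sketched as a plain function (hypothetical
names; ``None`` stands for a missing value): it accepts the data and the
weight id, and returns a callable imputer.

```python
# A hand-written imputer constructor: two arguments in, one imputer out.
# The weight id is accepted but ignored, as the text above allows.
def my_imputer_constructor(data, weight_id=0):
    minima = []
    for c in range(len(data[0])):
        known = [row[c] for row in data if row[c] is not None]
        minima.append(min(known) if known else None)

    def imputer(row):
        return [minima[c] if v is None else v for c, v in enumerate(row)]

    return imputer

imputer = my_imputer_constructor([[3.0, 7.0], [1.0, None]], 0)
```

Because it takes data and returns an imputer, such a function can be plugged
in wherever an imputer constructor component is expected.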
"""

602import Orange.core as orange
603from orange import ImputerConstructor_minimal
604from orange import ImputerConstructor_maximal
605from orange import ImputerConstructor_average
606from orange import Imputer_defaults
607from orange import ImputerConstructor_model
608from orange import Imputer_model
609from orange import ImputerConstructor_asValue
610
611import Orange.misc
612
613class ImputeLearner(orange.Learner):
    def __new__(cls, examples=None, weight_id=0, **keyw):
615        self = orange.Learner.__new__(cls, **keyw)
616        self.dont_impute_classifier = False
617        self.__dict__.update(keyw)
618        if examples:
619            return self.__call__(examples, weight_id)
620        else:
621            return self
622       
623    def __call__(self, data, weight=0):
624        trained_imputer = self.imputer_constructor(data, weight)
625        imputed_data = trained_imputer(data, weight)
626        base_classifier = self.base_learner(imputed_data, weight)
627        if self.dont_impute_classifier:
628            return base_classifier
629        else:
630            return ImputeClassifier(base_classifier, trained_imputer)
631
632ImputeLearner = Orange.misc.deprecated_members(
633  {
634      "dontImputeClassifier": "dont_impute_classifier",
635      "imputerConstructor": "imputer_constructor",
636      "baseLearner": "base_learner",
637      "weightID": "weight_id"
638  })(ImputeLearner)
639
640
641class ImputeClassifier(orange.Classifier):
642    def __init__(self, base_classifier, imputer, **argkw):
643        self.base_classifier = base_classifier
644        self.imputer = imputer
645        self.__dict__.update(argkw)
646
647    def __call__(self, ex, what=orange.GetValue):
648        return self.base_classifier(self.imputer(ex), what)
649
650ImputeClassifier = Orange.misc.deprecated_members(
651  {
652      "baseClassifier": "base_classifier"
653  })(ImputeClassifier)