source: orange/orange/Orange/feature/imputation.py @ 7764:773887c2a6e2

Revision 7764:773887c2a6e2, 27.8 KB checked in by markotoplak, 3 years ago

Imputation for 2.5.

"""

.. index:: imputation

.. index::
   single: feature; value imputation


Imputation is the procedure of replacing missing feature values with
appropriate values. It is needed because some methods (learning algorithms
and others), for instance logistic regression, cannot handle unknown values.

Missing values sometimes have a special meaning, so they need to be replaced
by a designated value. Sometimes we know what to replace the missing value
with; for instance, in a medical problem, some laboratory tests might not be
done when it is known what their results would be. In that case, we impute
a certain fixed value instead of the missing one. In the most complex case, we
assign values that are computed from some model; we can, for instance, impute
the average or majority value, or even a value computed from the values of
other, known features using a classifier.

In a learning/classification process, imputation is needed on two occasions.
Before learning, the imputer needs to process the training examples.
Afterwards, the imputer is called for each example to be classified.

In general, the imputer itself needs to be trained. This is, of course, not
needed when the imputer imputes a certain fixed value. However, when it imputes
the average or majority value, it needs to compute the statistics on the
training examples and use them afterwards for imputation of both training and
testing examples.

While reading this document, bear in mind that imputation is a part of the
learning process. If we fit the imputation model, for instance, by learning
how to predict the feature's value from other features, or even if we
simply compute the average or the minimal value for the feature and use it
in imputation, this should only be done on learning data. If cross validation
is used for sampling, imputation should be done on training folds only. Orange
provides simple means for doing that.
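
The rule that statistics come from the training data only can be sketched in
plain Python, without Orange. The function ``fit_average_imputer`` and the toy
rows below are hypothetical and serve only as an illustration; missing values
are represented by ``None``.

```python
# Plain-Python illustration; fit_average_imputer is a hypothetical helper,
# not part of Orange.

def fit_average_imputer(train_rows):
    # Compute per-column averages over the known (non-None) training values.
    n_cols = len(train_rows[0])
    averages = []
    for col in range(n_cols):
        known = [row[col] for row in train_rows if row[col] is not None]
        averages.append(sum(known) / float(len(known)) if known else None)

    def impute(row):
        # Return a new row; the original row is left intact.
        return [avg if value is None else value
                for value, avg in zip(row, averages)]

    return impute


train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
test = [[None, None], [5.0, 7.0]]

imputer = fit_average_imputer(train)   # statistics from training rows only
imputed_train = [imputer(row) for row in train]
imputed_test = [imputer(row) for row in test]
```

The test rows never influence the averages; they are only filled in with
values learned from the training rows.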

This page will first explain how to construct various imputers. Examples of
the `proper use of imputers <#using-imputers>`_ follow. Finally, quite
often you will want to use imputation with special requests, such as certain
features' missing values getting replaced by constants and others by values
computed using models induced from specified other features. For instance,
in one of the studies we worked on, the patient's pulse rate needed to be
estimated using regression trees that included the scope of the patient's
injuries, sex and age; some attributes' values were replaced by the most
pessimistic ones, and others were computed with regression trees based on
values of all features. If you are using learners that need the imputer as a
component, you will need to `write your own imputer constructor
<#write-your-own-imputer-constructor>`_. This is trivial and is explained at
the end of this page.

Wrapper for learning algorithms
===============================

This wrapper can be used with learning algorithms that cannot handle missing
values: it will impute the missing values using the imputer, call the
learner and, if the imputation is also needed by the classifier, wrap the
resulting classifier into another wrapper that will impute the missing values
in examples to be classified.

Even so, the module is somewhat redundant, as all learners that cannot handle
missing values should, in principle, provide a slot for an imputer constructor.
For instance, :obj:`Orange.classification.logreg.LogRegLearner` has an attribute
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`, and even
if you don't set it, it will do some imputation by default.

.. class:: ImputeLearner

    Wraps a learner and performs data imputation before learning.

    Most of Orange's learning algorithms do not use imputers because they can
    appropriately handle the missing values. The Bayesian classifier, for
    instance, simply skips the corresponding attributes in the formula, while
    classification/regression trees have components for handling the missing
    values in various ways.

    If for any reason you want to use these algorithms to run on imputed data,
    you can use this wrapper. The class description is a matter of a separate
    page, but we shall show its code here as another demonstration of how to
    use the imputers - logistic regression is implemented essentially the same
    way as the classes below.

    This is basically a learner, so the constructor will return either an
    instance of :obj:`ImputeLearner` or, if called with examples, an instance
    of some classifier. There are a few attributes that need to be set, though.

    .. attribute:: base_learner

        A wrapped learner.

    .. attribute:: imputer_constructor

        An instance of a class derived from :obj:`ImputerConstructor` (or a
        class with the same call operator).

    .. attribute:: dont_impute_classifier

        If given and set (this attribute is optional), the classifier will not
        be wrapped into an imputer. Do this if the classifier doesn't mind if
        the examples it is given have missing values.

    The learner is best illustrated by its code - here's its complete
    :obj:`__call__` method::

        def __call__(self, data, weight=0):
            trained_imputer = self.imputer_constructor(data, weight)
            imputed_data = trained_imputer(data, weight)
            base_classifier = self.base_learner(imputed_data, weight)
            if self.dont_impute_classifier:
                return base_classifier
            else:
                return ImputeClassifier(base_classifier, trained_imputer)

    So "learning" goes like this. :obj:`ImputeLearner` will first construct
    the imputer (that is, call :obj:`self.imputer_constructor` to get a trained
    imputer). Then it will use the imputer to impute the data, and call the
    given :obj:`base_learner` to construct a classifier. For instance,
    :obj:`base_learner` could be a learner for logistic regression and the
    result would be a logistic regression model. If the classifier can handle
    unknown values (that is, if :obj:`dont_impute_classifier` is set), we
    return it as it is; otherwise we wrap it into :obj:`ImputeClassifier`,
    which is given the base classifier and the imputer which it can use to
    impute the missing values in (testing) examples.
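
The same control flow can be tried out without Orange. The sketch below is a
plain-Python analogue with hypothetical names (``ImputeLearnerSketch``,
``constant_imputer_constructor``, ``nearest_mean_learner``); rows are lists,
missing values are ``None``, and the toy base learner fails on ``None``, which
is why it needs the wrapper.

```python
# Hypothetical stand-ins in plain Python; none of these are Orange classes.

def constant_imputer_constructor(rows, fill=0.0):
    # "Training" is trivial here: the imputer replaces None with a constant.
    def imputer(row):
        return [fill if value is None else value for value in row]
    return imputer


def nearest_mean_learner(rows, labels):
    # Toy base learner: sum(row) raises TypeError on None, so it needs
    # complete data. It predicts the label whose mean row sum is closest.
    sums = {}
    for row, label in zip(rows, labels):
        sums.setdefault(label, []).append(sum(row))
    means = dict((label, sum(v) / float(len(v))) for label, v in sums.items())

    def classifier(row):
        total = sum(row)
        return min(means, key=lambda label: abs(means[label] - total))
    return classifier


class ImputeLearnerSketch(object):
    def __init__(self, base_learner, imputer_constructor,
                 dont_impute_classifier=False):
        self.base_learner = base_learner
        self.imputer_constructor = imputer_constructor
        self.dont_impute_classifier = dont_impute_classifier

    def __call__(self, rows, labels):
        trained_imputer = self.imputer_constructor(rows)
        imputed_rows = [trained_imputer(row) for row in rows]
        base_classifier = self.base_learner(imputed_rows, labels)
        if self.dont_impute_classifier:
            return base_classifier
        # Wrap the classifier so that examples are imputed before prediction.
        return lambda row: base_classifier(trained_imputer(row))


rows = [[1.0, None], [4.0, 5.0]]
labels = ["low", "high"]
learner = ImputeLearnerSketch(nearest_mean_learner,
                              constant_imputer_constructor)
classifier = learner(rows, labels)
prediction = classifier([None, 9.0])
```

The returned classifier accepts rows with missing values because the trained
imputer is applied first, mirroring what :obj:`ImputeClassifier` does.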

.. class:: ImputeClassifier

    Objects of this class are returned by :obj:`ImputeLearner` when given data.

    .. attribute:: base_classifier

        A wrapped classifier.

    .. attribute:: imputer

        An imputer for imputation of unknown values.

    .. method:: __call__

        This class is even more trivial than the learner. Its constructor
        accepts two arguments, the classifier and the imputer, which are
        stored into the corresponding attributes. The call operator which does
        the classification then looks like this::

            def __call__(self, ex, what=orange.GetValue):
                return self.base_classifier(self.imputer(ex), what)

        It imputes the missing values by calling the :obj:`imputer` and passes
        the imputed example to the base classifier.

.. note::
   In this setup the imputer is trained on the training data - even if you do
   cross validation, the imputer will be trained on the right data. In the
   classification phase we again use the imputer which was trained on the
   training data only.

.. rubric:: Code of ImputeLearner and ImputeClassifier

:obj:`Orange.feature.imputation.ImputeLearner` puts the keyword arguments into
the instance's dictionary. You are expected to call it like
:obj:`ImputeLearner(base_learner=<someLearner>,
imputer_constructor=<someImputerConstructor>)`. When the learner is called with
examples, it trains the imputer, imputes the data, induces a
:obj:`base_classifier` by the :obj:`base_learner` and constructs
:obj:`ImputeClassifier` that stores the :obj:`base_classifier` and the
:obj:`imputer`. For classification, the missing values are imputed and the
classifier's prediction is returned.

Note that this code is slightly simplified; the omitted details handle
non-essential technical issues that are unrelated to imputation::

    class ImputeLearner(orange.Learner):
        def __new__(cls, examples = None, weightID = 0, **keyw):
            self = orange.Learner.__new__(cls, **keyw)
            self.__dict__.update(keyw)
            if examples:
                return self.__call__(examples, weightID)
            else:
                return self

        def __call__(self, data, weight=0):
            trained_imputer = self.imputer_constructor(data, weight)
            imputed_data = trained_imputer(data, weight)
            base_classifier = self.base_learner(imputed_data, weight)
            return ImputeClassifier(base_classifier, trained_imputer)

    class ImputeClassifier(orange.Classifier):
        def __init__(self, base_classifier, imputer):
            self.base_classifier = base_classifier
            self.imputer = imputer

        def __call__(self, ex, what=orange.GetValue):
            return self.base_classifier(self.imputer(ex), what)

.. rubric:: Example

Although most of Orange's learning algorithms will take care of imputation
internally if needed, it can sometimes happen that an expert will be able to
tell you exactly what to put in the data instead of the missing values. In this
example we shall suppose that we want to impute the minimal value of each
feature. We will try to determine whether the naive Bayesian classifier with
its implicit internal imputation works better than one that uses imputation by
minimal values.

`imputation-minimal-imputer.py`_ (uses `voting.tab`_):

.. literalinclude:: code/imputation-minimal-imputer.py
    :lines: 7-

The script should output this::

    Without imputation: 0.903
    With imputation: 0.899

.. note::
   We constructed just one instance of
   :obj:`Orange.classification.bayes.NaiveLearner`, but this same instance is
   used twice in each fold. The first time it is given the examples as they
   are and returns an instance of
   :obj:`Orange.classification.bayes.NaiveClassifier`. The second time it is
   called by :obj:`imba` and the
   :obj:`Orange.classification.bayes.NaiveClassifier` it returns is wrapped
   into :obj:`Orange.feature.imputation.ImputeClassifier`. We thus have only
   one learner, which produces two different classifiers in each round of
   testing.

Abstract imputers
=================

As is common in Orange, imputation is done by a pair of classes: one that does
the work and another that constructs it. :obj:`ImputerConstructor` is an
abstract root of the hierarchy of classes that get the training data (with an
optional id for weight) and construct an instance of a class derived from
:obj:`Imputer`. An :obj:`Imputer` can be called with an
:obj:`Orange.data.Instance` and it will return a new example with the missing
values imputed (it will leave the original example intact!). If the imputer is
called with an :obj:`Orange.data.Table`, it will return a new example table
with imputed examples.

.. class:: ImputerConstructor

    .. attribute:: imputeClass

        Tells whether to impute the class value (default) or not.

Simple imputation
=================

The simplest imputers always impute the same value for a particular attribute,
disregarding the values of other attributes. They all use the same imputer
class, :obj:`Imputer_defaults`.

.. class:: Imputer_defaults

    .. attribute::  defaults

        An example with the default values to be imputed instead of the
        missing ones. Examples to be imputed must be from the same domain as
        :obj:`defaults`.

    Instances of this class can be constructed by
    :obj:`Orange.feature.imputation.ImputerConstructor_minimal`,
    :obj:`Orange.feature.imputation.ImputerConstructor_maximal`,
    :obj:`Orange.feature.imputation.ImputerConstructor_average`.

    For continuous features, they will impute the smallest, largest or the
    average values encountered in the training examples. For discrete features,
    they will impute the lowest (the one with index 0, e.g. attr.values[0]),
    the highest (attr.values[-1]), and the most common value encountered in the
    data. The first two imputers will mostly be used when the discrete values
    are ordered according to their impact on the class (for instance, possible
    values for symptoms of some disease can be ordered according to their
    seriousness). The minimal and maximal imputers will then represent
    optimistic and pessimistic imputations.

    The following code will load the bridges data, and first impute the values
    in a single example and then in the whole table.

`imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 9-23

This example shows what the imputer does, not how it is to be used. Don't
impute all the data and then use it for cross-validation. As warned at the top
of this page, see the instructions for actual `use of
imputers <#using-imputers>`_.

.. note:: :obj:`ImputerConstructor` is another class with a dual-purpose
  constructor: if you give the constructor the data, it will return an
  :obj:`Imputer` - the above call is equivalent to calling
  :obj:`Orange.feature.imputation.ImputerConstructor_minimal()(data)`.

You can also construct the :obj:`Orange.feature.imputation.Imputer_defaults`
yourself and specify your own defaults. Or leave some values unspecified, in
which case the imputer won't impute them, as in the following example. Here,
the only attribute whose values will get imputed is "LENGTH"; the imputed value
will be 1234.

`imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 56-69

The constructor of :obj:`Orange.feature.imputation.Imputer_defaults` accepts an
argument of type :obj:`Orange.data.Domain` (in which case it will construct an
empty instance for :obj:`defaults`) or an example. Be careful with this:
:obj:`Orange.feature.imputation.Imputer_defaults` will hold a reference to the
instance and not a copy. You can make a copy yourself to avoid problems:
instead of `Imputer_defaults(data[0])` you may want to write
`Imputer_defaults(Orange.data.Instance(data[0]))`.
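
Both points, unspecified defaults being skipped and the difference between
holding a reference and holding a copy, can be illustrated with a small
plain-Python analogue. ``DefaultsImputerSketch`` is a hypothetical class, not
Orange's :obj:`Imputer_defaults`; rows are lists and ``None`` marks a missing
value or an unspecified default.

```python
# Plain-Python sketch; DefaultsImputerSketch is a hypothetical name.

class DefaultsImputerSketch(object):
    def __init__(self, defaults):
        # Copy the defaults so that later changes to the caller's list do
        # not silently change what gets imputed.
        self.defaults = list(defaults)

    def __call__(self, row):
        # A None default means "no default given": the gap is left as it is.
        return [default if (value is None and default is not None) else value
                for value, default in zip(row, self.defaults)]


defaults = [None, 1234.0, None]    # only the second column has a default
imputer = DefaultsImputerSketch(defaults)
defaults[1] = -1.0                 # does not affect the imputer's copy
result = imputer([None, None, "WOOD"])
```

Only the second column is filled in; the first stays missing because no
default was specified for it.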

Random imputation
=================

.. class:: Imputer_Random

    Imputes random values. The corresponding constructor is
    :obj:`ImputerConstructor_Random`.

    .. attribute:: impute_class

        Tells whether to impute the class values or not. Defaults to True.

    .. attribute:: deterministic

        If true (default is False), the random generator is initialized for
        each example using the example's hash value as a seed. As a result,
        identical examples are always imputed the same values.
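
The effect of the deterministic mode can be sketched in plain Python. The
function ``random_imputer_sketch`` is hypothetical, and a CRC of the row's
textual form is used here as an assumed stand-in for the example's hash value;
how Orange computes that hash is not shown on this page.

```python
import random
import zlib


def random_imputer_sketch(row, candidates, deterministic=False):
    # candidates[i] lists the values that may be imputed in column i.
    if deterministic:
        # Seed from the row's content so that equal rows get equal
        # imputations. crc32 of repr() is an assumed stand-in for the
        # example's hash value.
        seed = zlib.crc32(repr(row).encode("utf-8"))
        rng = random.Random(seed)
    else:
        rng = random.Random()
    return [rng.choice(candidates[i]) if value is None else value
            for i, value in enumerate(row)]


candidates = [["A", "B", "C"], [1, 2, 3, 4]]
first = random_imputer_sketch([None, None], candidates, deterministic=True)
second = random_imputer_sketch([None, None], candidates, deterministic=True)
```

With ``deterministic=True`` the two calls return the same imputed row; without
it, repeated calls on the same row may differ.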

Model-based imputation
======================

.. class:: ImputerConstructor_model

    Model-based imputers learn to predict the attribute's value from values of
    other attributes. :obj:`ImputerConstructor_model` is given a learning
    algorithm (two, actually - one for discrete and one for continuous
    attributes) and constructs a classifier for each attribute. The
    constructed imputer :obj:`Imputer_model` stores a list of classifiers which
    are used when needed.

    .. attribute:: learner_discrete, learner_continuous

        Learner for discrete and for continuous attributes. If any of them is
        missing, the attributes of the corresponding type won't get imputed.

    .. attribute:: use_class

        Tells whether the imputer is allowed to use the class value. As this
        is most often undesired, this option is by default set to False. It
        can however be useful for a more complex design in which we would use
        one imputer for learning examples (this one would use the class value)
        and another for testing examples (which would not use the class value
        as this is unavailable at that moment).

.. class:: Imputer_model

    .. attribute:: models

        A list of classifiers, each corresponding to one attribute of the
        examples whose values are to be imputed. The :obj:`classVar`'s of the
        models should equal the examples' attributes. If any classifier is
        missing (that is, the corresponding element of the list is
        :obj:`None`), the corresponding attribute's values will not be imputed.

.. rubric:: Examples

The following imputer predicts the missing attribute values using
classification and regression trees with the minimum of 20 examples in a leaf.
Part of `imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 74-76

We could even use the same learner for discrete and continuous attributes,
as :class:`Orange.classification.tree.TreeLearner` checks the class type
and constructs regression or classification trees accordingly. The
common parameters, such as the minimal number of
examples in leaves, are used in both cases.

You can also use different learning algorithms for discrete and
continuous attributes. Probably a common setup will be to use
:class:`Orange.classification.bayes.BayesLearner` for discrete and
:class:`Orange.regression.mean.MeanLearner` (which
just remembers the average) for continuous attributes. Part of
`imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 91-94

You can also construct an :class:`Imputer_model` yourself. You will do
this if different attributes need different treatment. Brace for an
example that will be a bit more complex. First we shall construct an
:class:`Imputer_model` and initialize an empty list of models.
The following code snippets are from
`imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 108-109

Attributes "LANES" and "T-OR-D" will always be imputed values 2 and
"THROUGH". Since "LANES" is continuous, it suffices to construct a
:obj:`DefaultClassifier` with the default value 2.0 (don't forget the
decimal part, or else Orange will think you are talking about an index of a
discrete value - how could it tell?). For the discrete attribute "T-OR-D", we
could construct a :class:`Orange.classification.ConstantClassifier` and give
the index of value "THROUGH" as an argument. But we shall do it more nicely, by
constructing a :class:`Orange.data.Value`. Both classifiers will be stored at
the appropriate places in :obj:`imputer.models`.

.. literalinclude:: code/imputation-complex.py
    :lines: 110-112


"LENGTH" will be computed with a regression tree induced from "MATERIAL",
"SPAN" and "ERECTED" (together with "LENGTH" as the class attribute, of
course). Note that we initialized the domain by simply giving a list with
the names of the attributes, with the domain as an additional argument
in which Orange will look for the named attributes.

.. literalinclude:: code/imputation-complex.py
    :lines: 114-119

We printed the tree just to see what it looks like.

::

    SPAN=SHORT: 1158
    SPAN=LONG: 1907
    SPAN=MEDIUM
    |    ERECTED<1908.500: 1325
    |    ERECTED>=1908.500: 1528

Small and nice. Now for the "SPAN". Wooden bridges and walkways are short,
while the others are mostly medium. This could be done by
:class:`Orange.classifier.ClassifierByLookupTable` - this would be faster
than what we plan here. See the corresponding documentation on lookup
classifiers. Here we are going to do it with a Python function.

.. literalinclude:: code/imputation-complex.py
    :lines: 121-128

:obj:`compute_span` could also be written as a class, if you'd prefer
it. It's important that it behaves like a classifier, that is, gets an example
and returns a value. The second argument tells, as usual, what the caller
expects the classifier to return - a value, a distribution or both. Since the
caller, :obj:`Imputer_model`, always wants values, we shall ignore the argument
(at risk of having problems in the future when imputers might handle
distributions as well).
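
The idea of one model per attribute, with missing models meaning "do not
impute", can be sketched without Orange. Everything below is a hypothetical
plain-Python analogue: rows are dictionaries keyed by attribute name, the
models are kept in a dictionary rather than a list, and the rule inside
``compute_span_sketch`` is only an assumed reading of "wooden bridges and
walkways are short", not the code from the script.

```python
# Plain-Python sketch with hypothetical names; not Orange's Imputer_model.

def constant_model(value):
    # Behaves like a classifier that always predicts the same value.
    return lambda row: value


def compute_span_sketch(row):
    # Assumed rule: wooden bridges and walkways are short, others medium.
    if row.get("MATERIAL") == "WOOD" or row.get("PURPOSE") == "WALK":
        return "SHORT"
    return "MEDIUM"


models = {
    "LANES": constant_model(2.0),
    "T-OR-D": constant_model("THROUGH"),
    "SPAN": compute_span_sketch,
    "LENGTH": None,            # no model: a missing LENGTH stays missing
}


def impute_with_models(row, models):
    imputed = dict(row)        # work on a copy, leave the original intact
    for name, value in row.items():
        model = models.get(name)
        if value is None and model is not None:
            imputed[name] = model(row)
    return imputed


bridge = {"MATERIAL": "WOOD", "PURPOSE": "HIGHWAY", "LANES": None,
          "T-OR-D": None, "SPAN": None, "LENGTH": None}
imputed_bridge = impute_with_models(bridge, models)
```

Each model only needs to be callable with an example and return a value, which
is why a constant, a function and a trained classifier can be mixed freely.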

Missing values as special values
================================

Missing values sometimes have a special meaning. The fact that something was
not measured can sometimes tell a lot. Be cautious, however, when using such
values in decision models; if the decision not to measure something (for
instance, not performing a laboratory test on a patient) is based on the
expert's knowledge of the class value, such unknown values clearly should not
be used in models.

.. class:: ImputerConstructor_asValue

    Constructs a new domain in which each
    discrete attribute is replaced with a new attribute that has one value more:
    "NA". The new attribute will compute its values on the fly from the old one,
    copying the normal values and replacing the unknowns with "NA".

    For continuous attributes, it will
    construct a two-valued discrete attribute with values "def" and "undef",
    telling whether the continuous attribute was defined or not. The attribute's
    name will equal the original's with "_def" appended. The original continuous
    attribute will remain in the domain and its unknowns will be replaced by
    averages.

    :class:`ImputerConstructor_asValue` has no specific attributes.

    It constructs :class:`Imputer_asValue` (I bet you
    wouldn't guess). It converts the example into the new domain, which imputes
    the values for discrete attributes. If continuous attributes are present, it
    will also replace their missing values by the averages.

.. class:: Imputer_asValue

    .. attribute:: domain

        The domain with the new attributes constructed by
        :class:`ImputerConstructor_asValue`.

    .. attribute:: defaults

        Default values for continuous attributes. Present only if there are any.

The following code shows what this imputer actually does to the domain.
Part of `imputation-complex.py`_ (uses `bridges.tab`_):

.. literalinclude:: code/imputation-complex.py
    :lines: 137-151


The script's output looks like this::

    [RIVER, ERECTED, PURPOSE, LENGTH, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE]

    [RIVER, ERECTED_def, ERECTED, PURPOSE, LENGTH_def, LENGTH, LANES_def, LANES, CLEAR-G, T-OR-D, MATERIAL, SPAN, REL-L, TYPE]

    RIVER: M -> M
    ERECTED: 1874 -> 1874 (def)
    PURPOSE: RR -> RR
    LENGTH: ? -> 1567 (undef)
    LANES: 2 -> 2 (def)
    CLEAR-G: ? -> NA
    T-OR-D: THROUGH -> THROUGH
    MATERIAL: IRON -> IRON
    SPAN: ? -> NA
    REL-L: ? -> NA
    TYPE: SIMPLE-T -> SIMPLE-T

Seemingly, the two examples have the same attributes (with
:samp:`imputed` having a few additional ones). If you check this with
:samp:`original.domain[0] == imputed.domain[0]`, you will see that the
first impression is wrong: the comparison yields False. The attributes only
have the same names, but they are different attributes. (If you have read this
far, which is already a bit advanced, you know that Orange does not really
care about the attribute names.)

Therefore, if we wrote :samp:`imputed[i]` the program would fail
since :samp:`imputed` has no attribute :samp:`i`. But it has an
attribute with the same name (which usually even has the same value). We
therefore use :samp:`i.name` to index the attributes of
:samp:`imputed`. (Using names for indexing is not fast, though; if you do
it a lot, compute the integer index with
:samp:`imputed.domain.index(i.name)`.)

For continuous attributes, there is an additional attribute with "_def"
appended; we get it by :samp:`i.name+"_def"`.

The first continuous attribute, "ERECTED", is defined. Its value remains 1874
and the additional attribute "ERECTED_def" has value "def". Not so for
"LENGTH". Its undefined value is replaced by the average (1567) and the new
attribute has value "undef". The undefined discrete attribute "CLEAR-G" (and
all other undefined discrete attributes) is assigned the value "NA".
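
The transformation described above can be written out as a small plain-Python
sketch. ``as_value_sketch`` is a hypothetical function working on dictionaries,
not Orange's :class:`Imputer_asValue`; the average 1567 for "LENGTH" is taken
from the output shown above.

```python
# Plain-Python sketch; as_value_sketch is a hypothetical helper.

def as_value_sketch(row, kinds, averages):
    # kinds maps an attribute name to "discrete" or "continuous";
    # averages holds the training averages of continuous attributes.
    result = {}
    for name, value in row.items():
        if kinds[name] == "discrete":
            # Unknown discrete values become the extra value "NA".
            result[name] = "NA" if value is None else value
        else:
            # Continuous attributes get a companion "_def" indicator and
            # unknowns are replaced by the average.
            result[name + "_def"] = "undef" if value is None else "def"
            result[name] = averages[name] if value is None else value
    return result


kinds = {"ERECTED": "continuous", "LENGTH": "continuous",
         "CLEAR-G": "discrete"}
averages = {"LENGTH": 1567.0}
converted = as_value_sketch(
    {"ERECTED": 1874, "LENGTH": None, "CLEAR-G": None}, kinds, averages)
```

The result keeps "ERECTED" at 1874 with "ERECTED_def" equal to "def", fills
"LENGTH" with the average and marks it "undef", and turns the unknown
"CLEAR-G" into "NA".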

Using imputers
==============

To properly use the imputation classes in the learning process, they must be
trained on training examples only. Imputing the missing values and subsequently
using the data set in cross-validation will give overly optimistic results.

Learners with imputer as a component
------------------------------------

Orange learners that cannot handle missing values will generally provide a slot
for the imputer component. An example of such a class is
:obj:`Orange.classification.logreg.LogRegLearner` with an attribute called
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor`. To it you
can assign an imputer constructor - one of the above constructors or a specific
constructor you wrote yourself. When given learning examples,
:obj:`Orange.classification.logreg.LogRegLearner` will pass them to
:obj:`Orange.classification.logreg.LogRegLearner.imputerConstructor` to get an
imputer (again one of the above or a specific imputer you programmed). It will
immediately use the imputer to impute the missing values in the learning data
set, so it can be used by the actual learning algorithm. Besides, when the
classifier :obj:`Orange.classification.logreg.LogRegClassifier` is constructed,
the imputer will be stored in its attribute
:obj:`Orange.classification.logreg.LogRegClassifier.imputer`. At
classification, the imputer will be used for imputation of missing values in
(testing) examples.

Although details may vary from algorithm to algorithm, this is how the
imputation is generally used in Orange's learners. Also, if you write your own
learners, it is recommended that you use imputation according to the described
procedure.

Write your own imputer
======================

Imputation classes provide the Python-callback functionality (not all Orange
classes do so; refer to the documentation on `subtyping the Orange classes
in Python <callbacks.htm>`_ for a list). If you want to write your own
imputation constructor or an imputer, you simply need to program a Python
function that will behave like the built-in Orange classes (and even less:
for an imputer, you only need to write a function that gets an example as
argument; imputation for example tables will then use that function).

You will most often write the imputation constructor when you have a special
imputation procedure or separate procedures for various attributes, as we've
demonstrated in the description of
:obj:`Orange.feature.imputation.ImputerConstructor_model`. You basically only
need to pack everything we've written there into an imputer constructor that
will accept a data set and the id of the weight meta-attribute (ignore it if
you will, but you must accept two arguments), and return the imputer (probably
:obj:`Orange.feature.imputation.Imputer_model`). The benefit of implementing an
imputer constructor as opposed to what we did above is that you can use such a
constructor as a component for Orange learners (like logistic regression) or
for wrappers from module orngImpute, and that way properly use it in
classifier testing procedures.
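
The calling convention described above (a constructor that takes the data and
a weight id and returns an imputer, which in turn takes one example) can be
sketched in plain Python. ``median_imputer_constructor`` is a hypothetical
example working on lists of rows, not on Orange data tables.

```python
# Plain-Python sketch of the constructor/imputer calling convention.

def median_imputer_constructor(rows, weight_id=0):
    # weight_id is accepted to match the two-argument convention and is
    # ignored in this sketch.
    n_cols = len(rows[0])
    medians = []
    for col in range(n_cols):
        known = sorted(row[col] for row in rows if row[col] is not None)
        # Upper median for an even number of known values.
        medians.append(known[len(known) // 2] if known else None)

    def imputer(row):
        # Gets one example and returns a new, imputed example.
        return [median if value is None else value
                for value, median in zip(row, medians)]

    return imputer


training = [[1.0, 8.0], [2.0, None], [9.0, 4.0]]
imputer = median_imputer_constructor(training, 0)
filled = imputer([None, None])
```

Because the constructor is an ordinary callable taking the data and the
weight id, the same shape of function is what a learner's imputer slot or a
wrapper would expect to receive.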

.. _imputation-minimal-imputer.py: code/imputation-minimal-imputer.py
.. _imputation-complex.py: code/imputation-complex.py
.. _voting.tab: code/voting.tab
.. _bridges.tab: code/bridges.tab

"""

import Orange.core as orange
from orange import ImputerConstructor_minimal
from orange import ImputerConstructor_maximal
from orange import ImputerConstructor_average
from orange import Imputer_defaults
from orange import ImputerConstructor_model
from orange import Imputer_model
from orange import ImputerConstructor_asValue

class ImputeLearner(orange.Learner):
    def __new__(cls, examples = None, weightID = 0, **keyw):
        self = orange.Learner.__new__(cls, **keyw)
        # By default the resulting classifier is wrapped with the imputer.
        self.dont_impute_classifier = False
        self.__dict__.update(keyw)
        # When called with examples, learn immediately and return a classifier.
        if examples:
            return self.__call__(examples, weightID)
        else:
            return self

    def __call__(self, data, weight=0):
        # Train the imputer on the given data, impute it, then learn.
        trained_imputer = self.imputer_constructor(data, weight)
        imputed_data = trained_imputer(data, weight)
        base_classifier = self.base_learner(imputed_data, weight)
        if self.dont_impute_classifier:
            return base_classifier
        else:
            return ImputeClassifier(base_classifier, trained_imputer)

class ImputeClassifier(orange.Classifier):
    def __init__(self, base_classifier, imputer, **argkw):
        self.base_classifier = base_classifier
        self.imputer = imputer
        self.__dict__.update(argkw)

    def __call__(self, ex, what=orange.GetValue):
        # Impute the example before passing it to the wrapped classifier.
        return self.base_classifier(self.imputer(ex), what)