<html><HEAD>
<LINK REL=StyleSheet HREF="../style.css" TYPE="text/css">
</HEAD>
<body>

<h1>orngImpute: An Imputation Wrapper for Learning Algorithms</h1>
<index name="modules+imputation">
<index name="classifiers/with imputation">

<P>This module used to be larger, but most of its code was moved into Orange's core for various reasons. What remains is a wrapper for learning algorithms that cannot handle missing values: it imputes the missing values using the given imputer, calls the learner and, if imputation is also needed at classification time, wraps the resulting classifier into another wrapper that imputes the missing values in the examples to be classified.</P>

<P>Even so, the module is somewhat redundant, as all learners that cannot handle missing values should, in principle, provide a slot for an imputer constructor. For instance, <code>orange.LogRegLearner</code> has an attribute <code>imputerConstructor</code>, and even if you don't set it, it will do some imputation by default.</P>

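<P>For illustration, such a slot can be set explicitly. A minimal sketch for <code>orange.LogRegLearner</code>, where the choice of <code>orange.ImputerConstructor_minimal</code> is arbitrary (any imputer constructor from the <a href="../reference/imputation.htm">imputation reference</a> would do):</P>
<xmp class="code">import orange

# explicitly choose the imputer used internally by logistic regression:
# here, impute the lowest value of each attribute
lr = orange.LogRegLearner()
lr.imputerConstructor = orange.ImputerConstructor_minimal
</xmp>
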
<P>The module consists of two classes. The first is <code><INDEX name="classes/ImputeLearner (in orngImpute)">ImputeLearner</code>. It is basically a learner, so the constructor will return either an instance of <code>ImputeLearner</code> or, if called with examples, an instance of some classifier. There are a few attributes that need to be set, though (a short usage sketch follows the list below).</P>

<p class="section">Attributes</p>
<dl class="attributes">
<dt>baseLearner</dt>
<dd>The wrapped learner.</dd>

<dt>imputerConstructor</dt>
<dd>An instance of a class derived from <a href="../reference/imputation.htm"><code>ImputerConstructor</code></a> (or a class with the same call operator).</dd>

<dt>dontImputeClassifier</dt>
<dd>If given and set (this attribute is optional), the classifier will not be wrapped into an imputer. Do this if the classifier doesn't mind if the examples it is given have missing values.</dd>
</dl>

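<P>Here is the promised usage sketch of both calling conventions; the "voting" data set and the minimal-value imputer are only placeholders for illustration:</P>
<xmp class="code">import orange, orngImpute

data = orange.ExampleTable("voting")
ba = orange.BayesLearner()

# constructed without examples: we get a learner that we can call later
imputing_learner = orngImpute.ImputeLearner(baseLearner=ba,
    imputerConstructor=orange.ImputerConstructor_minimal)
classifier = imputing_learner(data)

# called with examples: the (wrapped) classifier is returned immediately
classifier = orngImpute.ImputeLearner(data, baseLearner=ba,
    imputerConstructor=orange.ImputerConstructor_minimal)
</xmp>
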
<P>The learner is best illustrated by its code - here's its complete <code>__call__</code> operator.</P>
<xmp class="code">    def __call__(self, data, weight=0):
        trained_imputer = self.imputerConstructor(data, weight)
        imputed_data = trained_imputer(data, weight)
        baseClassifier = self.baseLearner(imputed_data, weight)
        if self.dontImputeClassifier:
            return baseClassifier
        else:
            return ImputeClassifier(baseClassifier, trained_imputer)
</xmp>

<P>So "learning" goes like this. <code>ImputeLearner</code> first constructs the imputer (that is, it calls <code>self.imputerConstructor</code> to get a trained imputer). Then it uses the imputer to impute the data and calls the given <code>baseLearner</code> to construct a classifier. For instance, <code>baseLearner</code> could be a learner for logistic regression and the result would be a logistic regression model. If the classifier can handle unknown values (that is, if <code>dontImputeClassifier</code> is set), we return it as it is; otherwise we wrap it into <code><INDEX name="classes/ImputeClassifier (in orngImpute)">ImputeClassifier</code>, which is given the base classifier and the imputer it can use to impute the missing values in (testing) examples.</P>

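<P>For comparison, here are the same steps written out by hand, without the wrapper. This is only a sketch mirroring the <code>__call__</code> above; the "voting" data set, the minimal-value imputer and the naive Bayesian learner are arbitrary choices:</P>
<xmp class="code">import orange

data = orange.ExampleTable("voting")

imputer = orange.ImputerConstructor_minimal(data)    # train the imputer on the data
imputed_data = imputer(data)                         # impute the training examples
base_classifier = orange.BayesLearner(imputed_data)  # learn from the imputed data
</xmp>
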
<P>The other class in the module is, of course, the classifier with imputation, <code>ImputeClassifier</code>.</P>

<p class="section">Attributes</p>
<dl class="attributes">
<dt>baseClassifier</dt>
<dd>The wrapped classifier.</dd>

<dt>imputer</dt>
<dd>The imputer for imputation of unknown values.</dd>
</dl>

<P>This class is even more trivial than the learner. Its constructor accepts two arguments, the classifier and the imputer, which are stored in the corresponding attributes. The call operator that does the classification then looks like this:</P>
<xmp class="code">    def __call__(self, ex, what=orange.GetValue):
        return self.baseClassifier(self.imputer(ex), what)
</xmp>
<P>It imputes the missing values by calling the imputer and passes the imputed example to the base classifier.</P>

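<P>If you already have a trained classifier and a matching imputer, the two can also be put together by hand. A small sketch, reusing <code>data</code>, <code>base_classifier</code> and <code>imputer</code> from the sketch above:</P>
<xmp class="code">import orange, orngImpute

wrapped = orngImpute.ImputeClassifier(base_classifier, imputer)
print wrapped(data[0])                           # missing values are imputed before classification
print wrapped(data[0], orange.GetProbabilities)  # the second argument is passed on unchanged
</xmp>
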
<P>Note that in this setup the imputer is trained on the training data - even if you do cross-validation, the imputer will be trained on the proper training folds. In the classification phase we again use the imputer that was trained on the training data only.</P>

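<P>The following sketch makes this explicit with a simple train/test split instead of cross-validation. The data set, the split proportion and the choice of imputer are placeholders; the point is only that everything inside <code>imba</code>, the imputer included, sees the training part alone:</P>
<xmp class="code">import orange, orngImpute

data = orange.ExampleTable("voting")
imba = orngImpute.ImputeLearner(baseLearner=orange.BayesLearner(),
    imputerConstructor=orange.ImputerConstructor_minimal)

# split the examples; the imputer inside imba is built from 'train' only
indices = orange.MakeRandomIndices2(data, p0=0.7)
train = data.select(indices, 0)
test = data.select(indices, 1)

model = imba(train)   # both the imputer and the classifier are built from 'train'
print model(test[0])  # unknown values in the test example are imputed here
</xmp>
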
<P>Now for an example. Although most of Orange's learning algorithms will take care of imputation internally when needed, it can sometimes happen that an expert can tell you exactly what to put in the data instead of the missing values. The <a href="../reference/imputation.htm">documentation on imputers</a> in the Reference Guide presents various classes for imputation, but for this example we shall suppose that we want to impute the minimal value of each attribute. We will try to determine whether the naive Bayesian classifier with its implicit internal imputation works better than one that uses imputation by minimal values.</P>

<p class="header">part of <a href="imputation.py">imputation.py</a></p>
<xmp class="code">import orange, orngImpute, orngTest, orngStat

ba = orange.BayesLearner()
imba = orngImpute.ImputeLearner(baseLearner = ba, imputerConstructor=orange.ImputerConstructor_minimal)

data = orange.ExampleTable("voting")
res = orngTest.crossValidation([ba, imba], data)
CAs = orngStat.CA(res)

print "Without imputation: %5.3f" % CAs[0]
print "With imputation: %5.3f" % CAs[1]
</xmp>

<P>Note that we constructed just one instance of <code>orange.BayesLearner</code>, but this same instance is used twice in each fold: the first time it is given the examples as they are and returns an instance of <code>orange.BayesClassifier</code>; the second time it is called by <code>imba</code>, and the <code>orange.BayesClassifier</code> it returns is wrapped into <code>orngImpute.ImputeClassifier</code>. We thus have a single learner that produces two different classifiers in each round of testing.</P>
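
<P>To see the difference outside of cross-validation, you can call both learners directly on the full table. A quick sketch, continuing the example above:</P>
<xmp class="code"># ba(data) returns a plain orange.BayesClassifier;
# imba(data) returns an orngImpute.ImputeClassifier wrapping a BayesClassifier
plain = ba(data)
wrapped = imba(data)
print plain(data[0]), wrapped(data[0])
</xmp>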