# source:orange/orange/evaluation/reliability.py@9669:165371b04b4a

Revision 9669:165371b04b4a, 34.0 KB checked in by anze <anze.staric@…>, 2 years ago (diff)

Moved content of Orange dir to package dir

"""
########################################
Reliability estimation (reliability)
########################################

.. index:: Reliability Estimation

.. index::
   single: reliability; Reliability Estimation for Regression

*************************************
Reliability Estimation for Regression
*************************************

This module includes several algorithms for estimating the reliability of
individual predictions. Most of the algorithms are taken from Comparison of
approaches for estimating reliability of individual regression predictions,
Zoran Bosnic, 2008.

The following example shows basic reliability estimation usage
(:download:`reliability-basic.py <code/reliability-basic.py>`, uses :download:`housing.tab <code/housing.tab>`):

.. literalinclude:: code/reliability_basic.py

First we load the desired data table and choose a learner on which we want
to use reliability estimation. We also choose to compute only the Mahalanobis
and local cross-validation estimates, with the desired parameters. We fit the
estimator on the data and estimate the reliability for the first instance of
the data table. Finally, we output the estimates used and their values.

Reliability estimation can also be run on a whole data table, not only on a
single instance. The next example performs cross-validation on the desired
data table using the default reliability estimates, and at the end outputs
the reliability estimates for the first instance of the data table
(:download:`reliability-run.py <code/reliability-run.py>`, uses :download:`housing.tab <code/housing.tab>`):

.. literalinclude:: code/reliability-run.py

Reliability estimation methods are computationally quite demanding, so it may
take a while for this script to produce a result. In the above example we
first create the learner that we are interested in, in this case
k-nearest-neighbours, wrap it in a reliability learner, and run cross
validation to get the results. We then output, for the first example in the
data table, all the reliability estimates and their names.

Reliability Methods
===================

Sensitivity Analysis (SAvar and SAbias)
---------------------------------------
.. autoclass:: SensitivityAnalysis

Variance of bagged models (BAGV)
--------------------------------
.. autoclass:: BaggingVariance

Local cross validation reliability estimate (LCV)
-------------------------------------------------
.. autoclass:: LocalCrossValidation

Local modeling of prediction error (CNK)
----------------------------------------
.. autoclass:: CNeighbours

Bagging variance c-neighbours (BVCK)
------------------------------------

.. autoclass:: BaggingVarianceCNeighbours

Mahalanobis distance
--------------------

.. autoclass:: Mahalanobis

Mahalanobis to center
---------------------

.. autoclass:: MahalanobisToCenter

Reliability estimate learner
============================

.. autoclass:: Learner
    :members:

Reliability estimation scoring methods
======================================

.. autofunction:: get_pearson_r

.. autofunction:: get_pearson_r_by_iterations

.. autofunction:: get_spearman_r

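To make the scoring concrete, here is a minimal plain-Python sketch of the
Pearson correlation these functions compute between (absolute) prediction
errors and a reliability estimate; the error and estimate values below are
made up for illustration (the real functions also return p-values via statc):

```python
import math

def pearson_r(xs, ys):
    # Plain Pearson correlation coefficient, as used to score how well a
    # reliability estimate tracks the (absolute) prediction error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up absolute prediction errors and an ABSOLUTE-type estimate:
errors = [0.5, 1.2, 0.3, 2.0, 0.9]
estimates = [0.6, 1.0, 0.4, 1.8, 1.1]
print(round(pearson_r(errors, estimates), 3))
```

A large positive r means the estimate ranks hard examples as unreliable,
which is what get_pearson_r reports for each estimator.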
Referencing
===========

There is a dictionary named :data:`METHOD_NAME` which stores the names of
all the reliability estimates::

  METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
                 3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
                 6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
                 10: "ICV"}

and two constants that state whether an estimate is signed or an absolute
value::

  SIGNED = 0
  ABSOLUTE = 1

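For example, a list of (estimate, signed_or_absolute, method) tuples, such as
the one attached to probabilities.reliability_estimate, can be rendered with
these constants (a minimal sketch; the tuples below are made up):

```python
METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
               3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
               6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
               10: "ICV"}
SIGNED = 0
ABSOLUTE = 1

# Made-up (estimate, signed_or_absolute, method) tuples:
reliability_estimate = [(0.32, ABSOLUTE, 3), (-0.15, SIGNED, 4)]
for value, kind, method in reliability_estimate:
    kind_name = "signed" if kind == SIGNED else "absolute"
    print("%-22s %-8s %6.3f" % (METHOD_NAME[method], kind_name, value))
```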
Example of usage
================

Here we walk through a somewhat longer example of how to use the reliability
estimation module (:download:`reliability-long.py <code/reliability-long.py>`, uses :download:`prostate.tab <code/prostate.tab>`):

.. literalinclude:: code/reliability-long.py
    :lines: 1-16

After loading the Orange library we open our dataset. We choose to work with
kNNLearner, which also works on regression problems. We create our
reliability estimation learner and test it with cross validation.
The estimates are then compared to the prediction error using Pearson's
coefficient, and the p-values are also computed::

  Estimate               r       p
  SAvar absolute        -0.077   0.454
  SAbias signed         -0.165   0.105
  SAbias absolute       -0.099   0.333
  BAGV absolute          0.104   0.309
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.092   0.368
  Mahalanobis absolute   0.091   0.375

.. literalinclude:: code/reliability-long.py
    :lines: 18-28

Outputs::

  Estimate               r       p
  BAGV absolute          0.126   0.220
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.105   0.305
  Mahalanobis absolute   0.091   0.375


As the above code shows, you can also choose which reliability estimation
methods to use. You might want to do this to reduce computation time, or
because some methods do not perform well enough on your data.


References
==========

Bosnic Z, Kononenko I (2007) Estimation of individual prediction reliability using the local
sensitivity analysis.
*Applied Intelligence* 29(3), 187-203.

Bosnic Z, Kononenko I (2008) `Comparison of approaches for estimating reliability of
individual regression predictions. <http://www.sciencedirect.com/science/article/pii/S0169023X08001080>`_
*Data & Knowledge Engineering* 67(3), 504-516.

Bosnic Z, Kononenko I (2010) `Automatic selection of reliability estimates for individual
regression predictions. <http://journals.cambridge.org/abstract_S0269888909990154>`_
*The Knowledge Engineering Review* 25(1), 27-47.

"""
import Orange

import random
import statc
import math
import warnings
import numpy

from collections import defaultdict
from itertools import izip

# Labels and final variables
labels = ["SAvar", "SAbias", "BAGV", "CNK", "LCV", "BVCK", "Mahalanobis", "ICV"]

"""
# All the estimator calculation constants
DO_SA = 0
DO_BAGV = 1
DO_CNK = 2
DO_LCV = 3
DO_BVCK = 4
DO_MAHAL = 5
"""
198
199# All the estimator method constants
200SAVAR_ABSOLUTE = 0
201SABIAS_SIGNED = 1
202SABIAS_ABSOLUTE = 2
203BAGV_ABSOLUTE = 3
204CNK_SIGNED = 4
205CNK_ABSOLUTE = 5
206LCV_ABSOLUTE = 6
207BVCK_ABSOLUTE = 7
208MAHAL_ABSOLUTE = 8
209BLENDING_ABSOLUTE = 9
210ICV_METHOD = 10
211MAHAL_TO_CENTER_ABSOLUTE = 13
212
213# Type of estimator constant
214SIGNED = 0
215ABSOLUTE = 1
216
217# Names of all the estimator methods
218METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
219               3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
220               6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
221               9: "BLENDING absolute", 10: "ICV", 11: "RF Variance", 12: "RF Std",
222               13: "Mahalanobis to center"}
223
224select_with_repeat = Orange.core.MakeRandomIndicesMultiple()
225select_with_repeat.random_generator = Orange.core.RandomGenerator()
226

def get_reliability_estimation_list(res, i):
    return ([result.probabilities[0].reliability_estimate[i].estimate for result in res.results],
            res.results[0].probabilities[0].reliability_estimate[i].signed_or_absolute,
            res.results[0].probabilities[0].reliability_estimate[i].method)

def get_prediction_error_list(res):
    return [result.actualClass - result.classes[0] for result in res.results]

def get_description_list(res, i):
    return [result.probabilities[0].reliability_estimate[i].text_description for result in res.results]

def get_pearson_r(res):
    """
    Returns Pearson's coefficient between the prediction error and each of
    the used reliability estimates. Also returns the p-value of each of
    the coefficients.
    """
    prediction_error = get_prediction_error_list(res)
    results = []
    for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)):
        reliability_estimate, signed_or_absolute, method = get_reliability_estimation_list(res, i)
        try:
            if signed_or_absolute == SIGNED:
                r, p = statc.pearsonr(prediction_error, reliability_estimate)
            else:
                r, p = statc.pearsonr([abs(pe) for pe in prediction_error], reliability_estimate)
        except Exception:
            r = p = float("NaN")
        results.append((r, p, signed_or_absolute, method))
    return results

def get_spearman_r(res):
    """
    Returns Spearman's coefficient between the prediction error and each of
    the used reliability estimates. Also returns the p-value of each of
    the coefficients.
    """
    prediction_error = get_prediction_error_list(res)
    results = []
    for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)):
        reliability_estimate, signed_or_absolute, method = get_reliability_estimation_list(res, i)
        try:
            if signed_or_absolute == SIGNED:
                r, p = statc.spearmanr(prediction_error, reliability_estimate)
            else:
                r, p = statc.spearmanr([abs(pe) for pe in prediction_error], reliability_estimate)
        except Exception:
            r = p = float("NaN")
        results.append((r, p, signed_or_absolute, method))
    return results

def get_pearson_r_by_iterations(res):
    """
    Returns the average Pearson's coefficient over all folds between the
    prediction error and each of the used estimates.
    """
    results_by_fold = Orange.evaluation.scoring.split_by_iterations(res)
    number_of_estimates = len(res.results[0].probabilities[0].reliability_estimate)
    number_of_examples = len(res.results)
    number_of_folds = len(results_by_fold)
    results = [0 for _ in xrange(number_of_estimates)]
    sig = [0 for _ in xrange(number_of_estimates)]
    method_list = [0 for _ in xrange(number_of_estimates)]

    for res in results_by_fold:
        prediction_error = get_prediction_error_list(res)
        for i in xrange(number_of_estimates):
            reliability_estimate, signed_or_absolute, method = get_reliability_estimation_list(res, i)
            try:
                if signed_or_absolute == SIGNED:
                    r, _ = statc.pearsonr(prediction_error, reliability_estimate)
                else:
                    r, _ = statc.pearsonr([abs(pe) for pe in prediction_error], reliability_estimate)
            except Exception:
                r = float("NaN")
            results[i] += r
            sig[i] = signed_or_absolute
            method_list[i] = method

    # Average r over folds and calculate p-values
    results = [float(res) / number_of_folds for res in results]
    ps = [p_value_from_r(r, number_of_examples) for r in results]

    return zip(results, ps, sig, method_list)

def p_value_from_r(r, n):
    """
    Calculate the p-value from the Pearson coefficient and the sample size.
    """
    df = n - 2
    t = r * (df / ((-r + 1.0 + 1e-30) * (r + 1.0 + 1e-30))) ** 0.5
    return statc.betai(df * 0.5, 0.5, df / (df + t * t))

class Estimate:
    def __init__(self, estimate, signed_or_absolute, method, icv_method=-1):
        self.estimate = estimate
        self.signed_or_absolute = signed_or_absolute
        self.method = method
        self.method_name = METHOD_NAME[method]
        self.icv_method = icv_method
        self.icv_method_name = METHOD_NAME[icv_method] if icv_method != -1 else ""
        self.text_description = None

class DescriptiveAnalysis:
    def __init__(self, estimator, desc=["high", "medium", "low"], procentage=[0.00, 0.33, 0.66]):
        self.desc = desc
        self.procentage = procentage
        self.estimator = estimator

    def __call__(self, examples, weight=None, **kwds):

        # Calculate borders using cross validation
        res = Orange.evaluation.testing.cross_validation([self.estimator], examples)
        all_borders = []
        for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)):
            estimates, signed_or_absolute, method = get_reliability_estimation_list(res, i)
            sorted_estimates = sorted(abs(x) for x in estimates)
            borders = [sorted_estimates[int(len(estimates) * p) - 1] for p in self.procentage]
            all_borders.append(borders)

        # Learn on the whole training data
        estimator_classifier = self.estimator(examples)

        return DescriptiveAnalysisClassifier(estimator_classifier, all_borders, self.desc)

class DescriptiveAnalysisClassifier:
    def __init__(self, estimator_classifier, all_borders, desc):
        self.estimator_classifier = estimator_classifier
        self.all_borders = all_borders
        self.desc = desc

    def __call__(self, example, result_type=Orange.core.GetValue):
        predicted, probabilities = self.estimator_classifier(example, Orange.core.GetBoth)

        for borders, estimate in zip(self.all_borders, probabilities.reliability_estimate):
            estimate.text_description = self.desc[0]
            for lower_border, text_desc in zip(borders, self.desc):
                if estimate.estimate >= lower_border:
                    estimate.text_description = text_desc

        # Return the appropriate type of result
        if result_type == Orange.core.GetValue:
            return predicted
        elif result_type == Orange.core.GetProbabilities:
            return probabilities
        else:
            return predicted, probabilities

class SensitivityAnalysis:
    """

    :param e: List of possible :math:`\epsilon` values for the SAvar and SAbias reliability estimates; the default is [0.01, 0.1, 0.5, 1.0, 2.0].
    :type e: list of floats

    :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier`

    To estimate the reliability of a given example, we extend the learning set
    with that example, labeling it with :math:`K + \epsilon (l_{max} - l_{min})`,
    where :math:`K` denotes the initial prediction, :math:`\epsilon` is the
    sensitivity parameter, and :math:`l_{min}` and :math:`l_{max}` denote the lower
    and upper bounds of the class values of the learning examples. After computing
    sensitivity predictions for different values of :math:`\epsilon`, the predictions
    are combined into SAvar and SAbias.
    SAbias can be used as a signed estimate or as an absolute value.

    :math:`SAvar = \\frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}`

    :math:`SAbias = \\frac{\sum_{\epsilon \in E} (K_{\epsilon} - K) + (K_{-\epsilon} - K)}{2 |E|}`

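    The combination step can be sketched in plain Python; the k_plus and
    k_minus values below are made-up stand-ins for the :math:`K_{\epsilon}`
    and :math:`K_{-\epsilon}` sensitivity predictions:

```python
# Hypothetical sensitivity predictions for E = [0.01, 0.1, 0.5]:
K = 20.0                      # initial prediction
k_plus = [20.5, 21.0, 22.4]   # predictions after labeling with K + eps*(l_max - l_min)
k_minus = [19.6, 19.1, 18.0]  # predictions after labeling with K - eps*(l_max - l_min)

E = len(k_plus)
SAvar = sum(kp - km for kp, km in zip(k_plus, k_minus)) / E
SAbias = sum((kp - K) + (km - K) for kp, km in zip(k_plus, k_minus)) / (2 * E)
print(SAvar, SAbias, abs(SAbias))
```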
    """
    def __init__(self, e=[0.01, 0.1, 0.5, 1.0, 2.0]):
        self.e = e

    def __call__(self, examples, learner):
        min_value = max_value = examples[0].getclass().value
        for ex in examples:
            if ex.getclass().value > max_value:
                max_value = ex.getclass().value
            if ex.getclass().value < min_value:
                min_value = ex.getclass().value
        return SensitivityAnalysisClassifier(self.e, examples, min_value, max_value, learner)

class SensitivityAnalysisClassifier:
    def __init__(self, e, examples, min_value, max_value, learner):
        self.e = e
        self.examples = examples
        self.max_value = max_value
        self.min_value = min_value
        self.learner = learner

    def __call__(self, example, predicted, probabilities):
        # Create new dataset
        r_data = Orange.data.Table(self.examples)

        # Create new example
        modified_example = Orange.data.Instance(example)

        # Append it to the data
        r_data.append(modified_example)

        # Calculate SAvar & SAbias
        SAvar = SAbias = 0

        for eps in self.e:
            # +epsilon
            r_data[-1].setclass(predicted.value + eps * (self.max_value - self.min_value))
            c = self.learner(r_data)
            k_plus = c(example, Orange.core.GetValue)

            # -epsilon
            r_data[-1].setclass(predicted.value - eps * (self.max_value - self.min_value))
            c = self.learner(r_data)
            k_minus = c(example, Orange.core.GetValue)

            # accumulate the SAvar and SAbias terms
            SAvar += k_plus.value - k_minus.value
            SAbias += k_plus.value + k_minus.value - 2 * predicted.value

        SAvar /= len(self.e)
        SAbias /= 2 * len(self.e)

        return [Estimate(SAvar, ABSOLUTE, SAVAR_ABSOLUTE),
                Estimate(SAbias, SIGNED, SABIAS_SIGNED),
                Estimate(abs(SAbias), ABSOLUTE, SABIAS_ABSOLUTE)]

class BaggingVariance:
    """

    :param m: Number of bagged models used in the BAGV estimate
    :type m: int

    :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier`

    We construct m bagged models of the chosen learner and use their
    predictions (:math:`K_i, i = 1, ..., m`) for the given example to calculate
    the variance, which we use as the reliability estimate.

    :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2`

    where

    :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}`

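    The BAGV formula can be sketched in plain Python (the bagged predictions
    below are made up for illustration):

```python
# Hypothetical predictions of m = 5 bagged models for one example:
bagged = [19.8, 20.4, 20.1, 21.0, 19.7]

m = len(bagged)
K = sum(bagged) / m                               # mean bagged prediction
BAGV = sum((k_i - K) ** 2 for k_i in bagged) / m  # variance around the mean
print(BAGV)
```

    A high variance means the bagged models disagree, so the prediction is
    considered less reliable.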
    """
    def __init__(self, m=50):
        self.m = m

    def __call__(self, examples, learner):
        classifiers = []

        # Create bagged classifiers using sampling with replacement
        for _ in xrange(self.m):
            selection = select_with_repeat(len(examples))
            data = examples.select(selection)
            classifiers.append(learner(data))
        return BaggingVarianceClassifier(classifiers)

class BaggingVarianceClassifier:
    def __init__(self, classifiers):
        self.classifiers = classifiers

    def __call__(self, example, *args):
        # Calculate the bagging variance
        bagged_values = [c(example, Orange.core.GetValue).value for c in self.classifiers if c is not None]

        k = sum(bagged_values) / len(bagged_values)

        BAGV = sum((bagged_value - k) ** 2 for bagged_value in bagged_values) / len(bagged_values)

        return [Estimate(BAGV, ABSOLUTE, BAGV_ABSOLUTE)]

class LocalCrossValidation:
    """

    :param k: Number of nearest neighbours used in the LCV estimate
    :type k: int

    :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier`

    We find the k nearest neighbours of the given example and put them in a
    separate dataset. On this dataset we perform leave-one-out
    validation using the given model. The reliability estimate is then the
    distance-weighted absolute prediction error.

    1. define the set of k nearest neighbours :math:`N = \{(x_1, c_1), ..., (x_k, c_k)\}`
    2. FOR EACH :math:`(x_i, c_i) \in N`

      2.1. generate model M on :math:`N \\backslash (x_i, c_i)`

      2.2. for :math:`(x_i, c_i)` compute LOO prediction :math:`K_i`

      2.3. for :math:`(x_i, c_i)` compute LOO error :math:`E_i = | c_i - K_i |`

    3. :math:`LCV(x) = \\frac{ \sum_{(x_i, c_i) \in N} d(x_i, x) * E_i }{ \sum_{(x_i, c_i) \in N} d(x_i, x) }`

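    The weighting step can be sketched in plain Python; note that the
    implementation below weights each LOO error by :math:`e^{-d}`, where d is
    the distance to the example. The distances and LOO errors are made up:

```python
import math

# Hypothetical (distance, LOO absolute error) pairs for k = 4 neighbours:
neighbours = [(0.2, 1.5), (0.5, 0.8), (0.9, 2.0), (1.4, 0.4)]

LCVer = sum(err * math.exp(-d) for d, err in neighbours)
LCVdi = sum(math.exp(-d) for d, err in neighbours)
LCV = LCVer / LCVdi if LCVdi != 0 else 0
print(LCV)
```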
    """
    def __init__(self, k=0):
        self.k = k

    def __call__(self, examples, learner):
        nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
        nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()

        distance_id = Orange.data.new_meta_id()
        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)

        if self.k == 0:
            self.k = max(5, len(examples) / 20)

        return LocalCrossValidationClassifier(distance_id, nearest_neighbours, self.k, learner)

class LocalCrossValidationClassifier:
    def __init__(self, distance_id, nearest_neighbours, k, learner):
        self.distance_id = distance_id
        self.nearest_neighbours = nearest_neighbours
        self.k = k
        self.learner = learner

    def __call__(self, example, *args):
        LCVer = 0
        LCVdi = 0

        # Find the k nearest neighbours
        knn = [ex for ex in self.nearest_neighbours(example, self.k)]

        # Distance-weighted leave-one-out prediction error
        for i in xrange(len(knn)):
            train = knn[:]
            del train[i]

            classifier = self.learner(Orange.data.Table(train))

            returned_value = classifier(knn[i], Orange.core.GetValue)

            e = abs(knn[i].getclass().value - returned_value.value)

            LCVer += e * math.exp(-knn[i][self.distance_id])
            LCVdi += math.exp(-knn[i][self.distance_id])

        LCV = LCVer / LCVdi if LCVdi != 0 else 0
        if math.isnan(LCV):
            LCV = 0.0
        return [Estimate(LCV, ABSOLUTE, LCV_ABSOLUTE)]

class CNeighbours:
    """

    :param k: Number of nearest neighbours used in the CNK estimate
    :type k: int

    :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier`

    The CNK estimate is defined for an unlabeled example as the difference
    between the average label of its nearest neighbours and the example's
    prediction. CNK can be used as a signed estimate or as an absolute value.

    :math:`CNK = \\frac{\sum_{i=1}^{k} C_i}{k} - K`

    where k denotes the number of neighbours, :math:`C_i` denotes the
    neighbours' labels, and :math:`K` denotes the example's prediction.

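    The CNK formula amounts to a one-liner in plain Python (the neighbour
    labels and the prediction K below are made up for illustration):

```python
# Hypothetical labels of the k = 5 nearest neighbours, and the prediction K:
neighbour_labels = [22.0, 24.5, 21.0, 23.5, 24.0]
K = 25.0

CNK = sum(neighbour_labels) / len(neighbour_labels) - K
print(CNK, abs(CNK))  # signed and absolute variants
```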
    """
    def __init__(self, k=5):
        self.k = k

    def __call__(self, examples, learner):
        nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
        nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()

        distance_id = Orange.data.new_meta_id()
        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
        return CNeighboursClassifier(nearest_neighbours, self.k)

class CNeighboursClassifier:
    def __init__(self, nearest_neighbours, k):
        self.nearest_neighbours = nearest_neighbours
        self.k = k

    def __call__(self, example, predicted, probabilities):
        CNK = 0

        # Find the k nearest neighbours
        knn = [ex for ex in self.nearest_neighbours(example, self.k)]

        # Average label of the neighbours, minus the prediction
        for ex in knn:
            CNK += ex.getclass().value

        CNK /= self.k
        CNK -= predicted.value

        return [Estimate(CNK, SIGNED, CNK_SIGNED),
                Estimate(abs(CNK), ABSOLUTE, CNK_ABSOLUTE)]

class Mahalanobis:
    """

    :param k: Number of nearest neighbours used in the Mahalanobis estimate
    :type k: int

    :rtype: :class:`Orange.evaluation.reliability.MahalanobisClassifier`

    The Mahalanobis distance estimate is defined as the sum of `Mahalanobis
    distances <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ from the
    chosen example to its k nearest neighbours.

    """
    def __init__(self, k=3):
        self.k = k

    def __call__(self, examples, *args):
        nnm = Orange.classification.knn.FindNearestConstructor()
        nnm.distanceConstructor = Orange.distance.MahalanobisConstructor()

        mid = Orange.data.new_meta_id()
        nnm = nnm(examples, 0, mid)
        return MahalanobisClassifier(self.k, nnm, mid)

class MahalanobisClassifier:
    def __init__(self, k, nnm, mid):
        self.k = k
        self.nnm = nnm
        self.mid = mid

    def __call__(self, example, *args):
        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(example, self.k))

        return [Estimate(mahalanobis_distance, ABSOLUTE, MAHAL_ABSOLUTE)]

class MahalanobisToCenter:
    """
    :rtype: :class:`Orange.evaluation.reliability.MahalanobisToCenterClassifier`

    The Mahalanobis distance to center estimate is defined as the `Mahalanobis
    distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ from the
    example to the centroid of the data.

    """
    def __init__(self):
        pass

    def __call__(self, examples, *args):
        dc = Orange.core.DomainContinuizer()
        dc.classTreatment = Orange.core.DomainContinuizer.Ignore
        dc.continuousTreatment = Orange.core.DomainContinuizer.NormalizeBySpan
        dc.multinomialTreatment = Orange.core.DomainContinuizer.NValues

        new_domain = dc(examples)
        new_examples = examples.translate(new_domain)

        X, _, _ = new_examples.to_numpy()
        example_avg = numpy.average(X, 0)

        distance_constructor = Orange.distance.MahalanobisConstructor()
        distance = distance_constructor(new_examples)

        average_example = Orange.data.Instance(new_examples.domain, list(example_avg) + ["?"])

        return MahalanobisToCenterClassifier(distance, average_example, new_domain)

class MahalanobisToCenterClassifier:
    def __init__(self, distance, average_example, new_domain):
        self.distance = distance
        self.average_example = average_example
        self.new_domain = new_domain

    def __call__(self, example, *args):

        ex = Orange.data.Instance(self.new_domain, example)

        mahalanobis_to_center = self.distance(ex, self.average_example)

        return [Estimate(mahalanobis_to_center, ABSOLUTE, MAHAL_TO_CENTER_ABSOLUTE)]


class BaggingVarianceCNeighbours:
    """

    :param bagv: Instance of the Bagging Variance estimator.
    :type bagv: :class:`Orange.evaluation.reliability.BaggingVariance`

    :param cnk: Instance of the CNK estimator.
    :type cnk: :class:`Orange.evaluation.reliability.CNeighbours`

    :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier`

    BVCK is a combination of bagging variance and local modeling of prediction
    error: the estimate is the average of the two.

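    The combination is a plain average of the two absolute estimates (the
    component values below are made up for illustration):

```python
# Hypothetical component estimates for one example:
BAGV = 0.22      # bagging variance (absolute)
CNK_abs = 2.0    # absolute CNK

BVCK = (BAGV + CNK_abs) / 2
print(BVCK)
```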
    """
    def __init__(self, bagv=BaggingVariance(), cnk=CNeighbours()):
        self.bagv = bagv
        self.cnk = cnk

    def __call__(self, examples, learner):
        bagv_classifier = self.bagv(examples, learner)
        cnk_classifier = self.cnk(examples, learner)
        return BaggingVarianceCNeighboursClassifier(bagv_classifier, cnk_classifier)

class BaggingVarianceCNeighboursClassifier:
    def __init__(self, bagv_classifier, cnk_classifier):
        self.bagv_classifier = bagv_classifier
        self.cnk_classifier = cnk_classifier

    def __call__(self, example, predicted, probabilities):
        bagv_estimates = self.bagv_classifier(example, predicted, probabilities)
        cnk_estimates = self.cnk_classifier(example, predicted, probabilities)

        bvck_value = (bagv_estimates[0].estimate + cnk_estimates[1].estimate) / 2
        bvck_estimates = [Estimate(bvck_value, ABSOLUTE, BVCK_ABSOLUTE)]
        bvck_estimates.extend(bagv_estimates)
        bvck_estimates.extend(cnk_estimates)
        return bvck_estimates

class ErrorPredicting:
    def __init__(self):
        pass

    def __call__(self, examples, learner):
        res = Orange.evaluation.testing.cross_validation([learner], examples)
        prediction_errors = get_prediction_error_list(res)

        new_domain = Orange.data.Domain(examples.domain.attributes, Orange.core.FloatVariable("pe"))
        new_dataset = Orange.data.Table(new_domain, examples)

        for example, prediction_error in izip(new_dataset, prediction_errors):
            example.set_class(prediction_error)

        rf = Orange.ensemble.forest.RandomForestLearner()
        rf_classifier = rf(new_dataset)

        return ErrorPredictingClassification(rf_classifier, new_domain)

class ErrorPredictingClassification:
    def __init__(self, rf_classifier, new_domain):
        self.rf_classifier = rf_classifier
        self.new_domain = new_domain

    def __call__(self, example, predicted, probabilities):
        new_example = Orange.data.Instance(self.new_domain, example)
        value = self.rf_classifier(new_example, Orange.core.GetValue)

        return [Estimate(value.value, SIGNED, SABIAS_SIGNED)]

778class Learner:
779    """
780    Reliability estimation wrapper around a learner we want to test.
781    Different reliability estimation algorithms can be used on the
782    chosen learner. This learner works as any other and can be used as one.
783    The only difference is when the classifier is called with a given
784    example instead of only return the value and probabilities, it also
785    attaches a list of reliability estimates to
786    :data:probabilities.reliability_estimate.
787    Each reliability estimate consists of a tuple
788    (estimate, signed_or_absolute, method).
789
790    :param box_learner: Learner we want to wrap into reliability estimation
791    :type box_learner: learner
792
793    :param estimators: List of different reliability estimation methods we
794                       want to use on the chosen learner.
795    :type estimators: list of reliability estimators
796
797    :param name: Name of this reliability learner
798    :type name: string
799
800    :rtype: :class:Orange.evaluation.reliability.Learner
801    """
802    def __init__(self, box_learner, name="Reliability estimation",
803                 estimators = [SensitivityAnalysis(),
804                               LocalCrossValidation(),
805                               BaggingVarianceCNeighbours(),
806                               Mahalanobis(),
807                               MahalanobisToCenter()
808                               ],
809                 **kwds):
810        self.__dict__.update(kwds)
811        self.name = name
812        self.estimators = estimators
813        self.box_learner = box_learner
814        self.blending = False
815
816
    def __call__(self, examples, weight=None, **kwds):
        """Learn from the given table of data instances.

        :param examples: Data instances to learn from.
        :type examples: Orange.data.Table
        :param weight: Id of meta attribute with weights of instances
        :type weight: integer
        :rtype: :class:`Orange.evaluation.reliability.Classifier`
        """

        blending_classifier = None
        new_domain = None

        if examples.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous:
            raise Exception("This method only works on data with continuous class.")

        return Classifier(examples, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier)

    def internal_cross_validation(self, examples, folds=10):
        """ Performs the usual internal cross validation for getting the best
        reliability estimate. It uses the reliability estimators defined in
        the estimators attribute. Returns the id of the method that scored
        best. """
        res = Orange.evaluation.testing.cross_validation([self], examples, folds=folds)
        results = get_pearson_r(res)
        sorted_results = sorted(results)
        return sorted_results[-1][3]

    def internal_cross_validation_testing(self, examples, folds=10):
        """ Performs internal cross validation (as in Automatic selection of
        reliability estimates for individual regression predictions,
        Zoran Bosnic 2010) and returns the id of the method
        that scored best on this data. """
        cv_indices = Orange.core.MakeRandomIndicesCV(examples, folds)

        sum_of_rs = defaultdict(float)

        for fold in xrange(folds):
            data = examples.select(cv_indices, fold)
            if len(data) < 10:
                res = Orange.evaluation.testing.leave_one_out([self], data)
            else:
                res = Orange.evaluation.testing.cross_validation([self], data)
            results = get_pearson_r(res)
            for r, _, _, method in results:
                sum_of_rs[method] += r
        sorted_sum_of_rs = sorted(sum_of_rs.items(), key=lambda estimate: estimate[1], reverse=True)
        return sorted_sum_of_rs[0][0]

    labels = ["SAvar", "SAbias", "BAGV", "CNK", "LCV", "BVCK", "Mahalanobis", "ICV"]

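The selection loop in `internal_cross_validation_testing` sums each estimator's Pearson correlation over the folds and returns the method id with the largest sum. The same logic can be sketched in plain Python without Orange; `select_best_method` and the fold scores below are hypothetical stand-ins, not part of the Orange API:

```python
from collections import defaultdict

def select_best_method(fold_results):
    """Pick the method id with the largest summed Pearson r.

    fold_results: a list of folds, each a list of (r, method_id)
    pairs -- a stand-in for what get_pearson_r() yields per fold.
    """
    sum_of_rs = defaultdict(float)
    for fold in fold_results:
        for r, method in fold:
            sum_of_rs[method] += r
    # Order method ids by accumulated correlation, best first.
    ranked = sorted(sum_of_rs.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]

# Illustrative scores: BAGV sums to 0.9, LCV to 1.1.
folds = [[(0.4, "BAGV"), (0.6, "LCV")],
         [(0.5, "BAGV"), (0.5, "LCV")]]
print(select_best_method(folds))  # LCV
```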
class Classifier:
    def __init__(self, examples, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
        self.__dict__.update(kwds)
        self.examples = examples
        self.box_learner = box_learner
        self.estimators = estimators
        self.blending = blending
        self.blending_domain = blending_domain
        self.rf_classifier = rf_classifier

        # Train the learner with original data
        self.classifier = box_learner(examples)

        # Train all the estimators and create their classifiers
        self.estimation_classifiers = [estimator(examples, box_learner) for estimator in estimators]

    def __call__(self, example, result_type=Orange.core.GetValue):
        """
        Classify and estimate reliability of a new instance. When you choose
        Orange.core.GetBoth or Orange.core.GetProbabilities, you can access
        the reliability estimates inside probabilities.reliability_estimate.

        :param example: instance to be classified.
        :type example: :class:`Orange.data.Instance`
        :param result_type: :class:`Orange.classification.Classifier.GetValue` or \
              :class:`Orange.classification.Classifier.GetProbabilities` or
              :class:`Orange.classification.Classifier.GetBoth`

        :rtype: :class:`Orange.data.Value`,
              :class:`Orange.statistics.Distribution` or a tuple with both
        """
        predicted, probabilities = self.classifier(example, Orange.core.GetBoth)

        # Create a placeholder for estimates
        if probabilities is None:
            probabilities = Orange.statistics.distribution.Continuous()
        probabilities.setattr('reliability_estimate', [])

        # Calculate all the estimates and add them to the results
        for estimate in self.estimation_classifiers:
            probabilities.reliability_estimate.extend(estimate(example, predicted, probabilities))

        # Return the appropriate type of result
        if result_type == Orange.core.GetValue:
            return predicted
        elif result_type == Orange.core.GetProbabilities:
            return probabilities
        else:
            return predicted, probabilities
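Because each entry in `probabilities.reliability_estimate` is a (estimate, signed_or_absolute, method) tuple, a caller can rank or filter the estimates with ordinary tuple operations. A minimal sketch with stand-in values; the helper below is hypothetical, and it assumes, as with variance-style estimators such as BAGV or LCV, that a smaller absolute estimate indicates a more trustworthy prediction:

```python
def most_reliable(reliability_estimate):
    # Pick the (estimate, signed_or_absolute, method) tuple with the
    # smallest absolute estimate; for variance-style estimators a
    # smaller value suggests a more reliable prediction.
    return min(reliability_estimate, key=lambda t: abs(t[0]))

# Stand-in for probabilities.reliability_estimate after a call with
# Orange.core.GetBoth (method ids here are illustrative only).
estimates = [(2.5, 0, 1), (0.7, 0, 4), (1.3, 0, 6)]
print(most_reliable(estimates))  # (0.7, 0, 4)
```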