Changeset 9680:4689e76f99be in orange for orange/Orange/evaluation/reliability.py

Timestamp: 02/06/12 10:08:15 (2 years ago)
Branch: default
Files: 1 edited
Legend:
 unchanged lines have no prefix; added lines are prefixed with '+', removed lines with '-'

orange/Orange/evaluation/reliability.py
r9663 → r9680

-"""
-########################################
-Reliability estimation (``reliability``)
-########################################
-
-.. index:: Reliability Estimation
-
-.. index::
-   single: reliability; Reliability Estimation for Regression
-
-*************************************
-Reliability Estimation for Regression
-*************************************
-
-This module includes different implementations of algorithms used for
-predicting the reliability of single predictions. Most of the algorithms are
-taken from Comparison of approaches for estimating reliability of individual
-regression predictions, Zoran Bosnic, 2008.
-
-The next example shows basic reliability estimation usage
-(:download:`reliabilitybasic.py <code/reliabilitybasic.py>`, uses :download:`housing.tab <code/housing.tab>`):
-
-.. literalinclude:: code/reliability_basic.py
-
-First we load the desired data table and choose the learner we want to use
-reliability estimation on. We also choose to calculate only the Mahalanobis
-and local cross-validation estimates, with the desired parameters. We train
-the estimator on the data and estimate the reliability for the first instance
-of the data table. We output the estimates used and the numbers.
-
-Reliability estimation can also be run on a whole data table, not only on a
-single instance. The example below runs cross-validation on the desired data
-table, using the default reliability estimates, and at the end outputs the
-reliability estimates for the first instance of the data table
-(:download:`reliabilityrun.py <code/reliabilityrun.py>`, uses :download:`housing.tab <code/housing.tab>`):
-
-.. literalinclude:: code/reliabilityrun.py
-
-Reliability estimation methods are computationally demanding, so it may take
-a while for this script to produce a result. In the above example we first
-create the learner we are interested in (k-nearest neighbours), use it inside
-a reliability learner, and run cross-validation to get the results. We then
-output all the reliability estimates and their names for the first example in
-the data table.
-
-Reliability Methods
-===================
-
-Sensitivity Analysis (SAvar and SAbias)
-
-.. autoclass:: SensitivityAnalysis
-
-Variance of bagged models (BAGV)
-
-.. autoclass:: BaggingVariance
-
-Local cross-validation reliability estimate (LCV)
-
-.. autoclass:: LocalCrossValidation
-
-Local modeling of prediction error (CNK)
-
-.. autoclass:: CNeighbours
-
-Bagging variance c-neighbours (BVCK)
-
-.. autoclass:: BaggingVarianceCNeighbours
-
-Mahalanobis distance
-
-.. autoclass:: Mahalanobis
-
-Mahalanobis to center
-
-.. autoclass:: MahalanobisToCenter
-
-Reliability estimate learner
-============================
-
-.. autoclass:: Learner
-   :members:
-
-Reliability estimation scoring methods
-======================================
-
-.. autofunction:: get_pearson_r
-
-.. autofunction:: get_pearson_r_by_iterations
-
-.. autofunction:: get_spearman_r
-
-Referencing
-===========
-
-There is a dictionary named :data:`METHOD_NAME` which stores the names of
-all the reliability estimates::
-
-  METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
-                 3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
-                 6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
-                 10: "ICV"}
-
-and two constants that say whether an estimate is signed or an absolute
-value::
-
-  SIGNED = 0
-  ABSOLUTE = 1
-
-Example of usage
-================
-
-Here we walk through a somewhat longer example of using the reliability
-estimation module (:download:`reliabilitylong.py <code/reliabilitylong.py>`, uses :download:`prostate.tab <code/prostate.tab>`):
-
-.. literalinclude:: code/reliabilitylong.py
-   :lines: 1-16
-
-After loading the Orange library we open our data set. We chose to work with
-kNNLearner, which also works on regression problems. We create the reliability
-estimation learner and test it with cross-validation.
-The estimates are then compared to the prediction error using Pearson's
-coefficient, and the p-values are also computed::
-
-  Estimate               r       p
-  SAvar absolute         0.077   0.454
-  SAbias signed          0.165   0.105
-  SAbias absolute        0.099   0.333
-  BAGV absolute          0.104   0.309
-  CNK signed             0.233   0.021
-  CNK absolute           0.057   0.579
-  LCV absolute           0.069   0.504
-  BVCK_absolute          0.092   0.368
-  Mahalanobis absolute   0.091   0.375
-
-.. literalinclude:: code/reliabilitylong.py
-   :lines: 18-28
-
-Outputs::
-
-  Estimate               r       p
-  BAGV absolute          0.126   0.220
-  CNK signed             0.233   0.021
-  CNK absolute           0.057   0.579
-  LCV absolute           0.069   0.504
-  BVCK_absolute          0.105   0.305
-  Mahalanobis absolute   0.091   0.375
-
-As the above code shows, you can also choose which reliability estimation
-methods to use. You might want to do this to reduce computation time or
-because you think some of them do not perform well enough.
-
-References
-==========
-
-Bosnic Z, Kononenko I (2007) `Estimation of individual prediction reliability using local
-sensitivity analysis. <http://www.springerlink.com/content/e27p2584387532g8/>`_
-*Applied Intelligence* 29(3), 187-203.
-
-Bosnic Z, Kononenko I (2008) `Comparison of approaches for estimating reliability of
-individual regression predictions.
-<http://www.sciencedirect.com/science/article/pii/S0169023X08001080>`_
-*Data & Knowledge Engineering* 67(3), 504-516.
-
-Bosnic Z, Kononenko I (2010) `Automatic selection of reliability estimates for individual
-regression predictions.
-<http://journals.cambridge.org/abstract_S0269888909990154>`_
-*The Knowledge Engineering Review* 25(1), 27-47.
-
-"""
 import Orange

…

 def get_pearson_r(res):
     """
-    Returns Pearsons coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    :param res: results of evaluation, done using learners
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return Pearson's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of
     the coefficients.
     """

…

 def get_spearman_r(res):
     """
-    Returns Spearmans coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    :param res: results of evaluation, done using learners
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return Spearman's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of
     the coefficients.
     """

…

 def get_pearson_r_by_iterations(res):
     """
-    Returns average Pearsons coefficient over all folds between prediction error
+    :param res: results of evaluation, done using learners
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return the average Pearson's coefficient over all folds between prediction error
     and each of the used estimates.
     """

…

 class Estimate:
+    """
+    Reliability estimate. Contains attributes that describe the results of
+    reliability estimation.
+
+    .. attribute:: estimate
+
+        A numerical reliability estimate.
+
+    .. attribute:: signed_or_absolute
+
+        Determines whether the method used gives a signed or absolute result.
+        Has a value of either :obj:`SIGNED` or :obj:`ABSOLUTE`.
+
+    .. attribute:: method
+
+        An integer ID of the reliability estimation method used.
+
+    .. attribute:: method_name
+
+        Name (string) of the reliability estimation method used.
+
+    .. attribute:: icv_method
+
+        An integer ID of the reliability estimation method that performed
+        best, as determined by ICV, and whose estimate is stored in the
+        :obj:`estimate` field. (:obj:`None` when ICV was not used.)
+
+    .. attribute:: icv_method_name
+
+        Name (string) of the reliability estimation method that performed
+        best, as determined by ICV. (:obj:`None` when ICV was not used.)
+
+    """
     def __init__(self, estimate, signed_or_absolute, method, icv_method=-1):
         self.estimate = estimate

…

     """

-    :param e: List of possible e values for SAvar and SAbias reliability estimates; the default value is [0.01, 0.1, 0.5, 1.0, 2.0].
+    :param e: List of possible :math:`\epsilon` values for SAvar and SAbias
+        reliability estimates.
     :type e: list of floats

     :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier`

-    To estimate the reliability for a given example we extend the learning set
-    with the given example, labeling it with :math:`K + \epsilon (l_{max} - l_{min})`,
-    where K denotes the initial prediction, :math:`\epsilon` is a sensitivity parameter and
-    :math:`l_{min}` and :math:`l_{max}` denote the lower and upper bound of
-    the learning examples. After computing different sensitivity predictions
-    using different values of e, the predictions are combined into SAvar and SAbias.
-    SAbias can be used as a signed estimate or as the absolute value of SAbias.
+    To estimate the reliability for a given instance, the learning set is extended
+    with this instance, labeled with :math:`K + \epsilon (l_{max} - l_{min})`,
+    where :math:`K` denotes the initial prediction,
+    :math:`\epsilon` is a sensitivity parameter and :math:`l_{min}` and
+    :math:`l_{max}` denote the lower and upper bound of the learning examples.
+    After computing different sensitivity predictions using different values
+    of :math:`\epsilon`, the predictions are combined into SAvar and SAbias.
+    SAbias can be used as a signed estimate or as an absolute value.

     :math:`SAvar = \\frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}`

…

     """

-    :param m: Number of bagged models to be used with the BAGV estimate
+    :param m: Number of bagging models to be used with the BAGV estimate
     :type m: int

     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier`

-    We construct m different bagging models of the originally chosen learner and use
-    their predictions (:math:`K_i, i = 1, ..., m`) for the given example to
-    calculate the variance, which we use as the reliability estimate.
+    :math:`m` different bagging models are constructed and used to estimate
+    the value of the dependent variable for a given instance. The variance of
+    those predictions is used as the prediction reliability estimate.

     :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2`

-    where
-
-    :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}`
+    where :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are the
+    predictions of the individual constructed models.

     """

…

     :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier`

-    We find the k nearest neighbours of the given example and put them in a
-    separate data set. On this data set we do leave-one-out validation using
-    the given model. The reliability estimate is then the distance-weighted
-    absolute prediction error.
-
-    1. define the set of k nearest neighbours :math:`N = {(x_1, c_1), ..., (x_k, c_k)}`
-    2. FOR EACH :math:`(x_i, c_i) \in N`
-
-       2.1. generate model M on :math:`N \\backslash (x_i, c_i)`
-
-       2.2. for :math:`(x_i, c_i)` compute the LOO prediction :math:`K_i`
-
-       2.3. for :math:`(x_i, c_i)` compute the LOO error :math:`E_i = |C_i - K_i|`
-
+    :math:`k` nearest neighbours to the given instance are found and put in
+    a separate data set. On this data set, leave-one-out validation is
+    performed.
+    The reliability estimate is then the distance-weighted absolute
+    prediction error.
+
+    If the special value 0 is passed as :math:`k` (as it is by default),
+    it is set to 1/20 of the data set size (or 5, whichever is greater).
+
+    1. Determine the set of k nearest neighbours :math:`N = {(x_1, c_1), ...,
+       (x_k, c_k)}`.
+    2. On this set, compute leave-one-out predictions :math:`K_i` and
+       prediction errors :math:`E_i = |C_i - K_i|`.
     3. :math:`LCV(x) = \\frac{\sum_{(x_i, c_i) \in N} d(x_i, x) * E_i}{\sum_{(x_i, c_i) \in N} d(x_i, x)}`

…

         self.k = k

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
         nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()

         distance_id = Orange.data.new_meta_id()
-        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
+        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id)

         if self.k == 0:
-            self.k = max(5, len(examples)/20)
+            self.k = max(5, len(instances)/20)

         return LocalCrossValidationClassifier(distance_id, nearest_neighbours, self.k, learner)

…

     :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier`

-    The estimate CNK is defined for an unlabeled example as the difference between
-    the average label of its nearest neighbours and the example's prediction. CNK
-    can be used as a signed estimate or only as an absolute value.
+    CNK is defined for an unlabeled instance as the difference between the
+    average label of its nearest neighbours and its prediction. CNK can be
+    used as a signed or absolute estimate.

     :math:`CNK = \\frac{\sum_{i=1}^{k}C_i}{k} - K`

-    Where k denotes the number of neighbors, C :sub:`i` denotes the neighbours'
-    labels and K denotes the example's prediction.
+    where :math:`k` denotes the number of neighbors, C :sub:`i` denotes the
+    neighbours' labels and :math:`K` denotes the instance's prediction.

     """

…

     """

-    :param k: Number of nearest neighbours used in the Mahalanobis estimate
+    :param k: Number of nearest neighbours used in the Mahalanobis estimate.
     :type k: int

     :rtype: :class:`Orange.evaluation.reliability.MahalanobisClassifier`

-    The Mahalanobis distance estimate is defined as the `Mahalanobis distance
-    <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the k nearest
-    neighbours of the chosen example.
+    The Mahalanobis distance reliability estimate is defined as the
+    `Mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_
+    to the evaluated instance's :math:`k` nearest neighbours.

…

     :rtype: :class:`Orange.evaluation.reliability.MahalanobisToCenterClassifier`

-    The Mahalanobis distance to center estimate is defined as the `Mahalanobis
-    distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the
-    centroid of the data.
+    The Mahalanobis distance to center reliability estimate is defined as the
+    `Mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_
+    between the predicted instance and the centroid of the data.

…

     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier`

-    BVCK is a combination of Bagging variance and local modeling of prediction
-    error; for this estimate we take the average of both.
+    BVCK is a combination (average) of Bagging variance and local modeling of
+    prediction error.

     """

…

     Reliability estimation wrapper around a learner we want to test.
     Different reliability estimation algorithms can be used on the
-    chosen learner. This learner works as any other and can be used as one.
-    The only difference is that when the classifier is called with a given
-    example, instead of only returning the value and probabilities it also
-    attaches a list of reliability estimates to
-    :data:`probabilities.reliability_estimate`.
-    Each reliability estimate consists of a tuple
-    (estimate, signed_or_absolute, method).
-
-    :param box_learner: Learner we want to wrap into reliability estimation
+    chosen learner. This learner works as any other and can be used as one,
+    but it returns its classifier wrapped into an instance of
+    :class:`Orange.evaluation.reliability.Classifier`.
+
+    :param box_learner: Learner we want to wrap into a reliability estimation
+        classifier.
     :type box_learner: learner

…

     def internal_cross_validation(self, examples, folds=10):
-        """ Performs the usual internal cross-validation for getting the best
-        reliability estimate. It uses the reliability estimators defined in
-        the estimators attribute. Returns the id of the method that scored
-        best. """
+        """ Perform internal cross-validation to get the best
+        reliability estimate. It uses the reliability estimators defined in
+        the estimators attribute.
+
+        Returns the id of the method that scored best.
+
+        :param examples: Data instances to use for ICV.
+        :type examples: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+
+        """
         res = Orange.evaluation.testing.cross_validation([self], examples, folds=folds)
         results = get_pearson_r(res)

…

     def internal_cross_validation_testing(self, examples, folds=10):
-        """ Performs internal cross-validation (as in Automatic selection of
-        reliability estimates for individual regression predictions,
-        Zoran Bosnic 2010) and returns the id of the method
-        that scored best on this data. """
+        """ Perform internal cross-validation (as in Automatic selection of
+        reliability estimates for individual regression predictions,
+        Zoran Bosnic, 2010) and return the id of the method
+        that scored best on this data.
+
+        :param examples: Data instances to use for ICV.
+        :type examples: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+
+        """
         cv_indices = Orange.core.MakeRandomIndicesCV(examples, folds)

…

 class Classifier:
+    """
+    A reliability estimation wrapper for classifiers.
+
+    What distinguishes this classifier is that the returned probabilities (if
+    :obj:`Orange.classification.Classifier.GetProbabilities` or
+    :obj:`Orange.classification.Classifier.GetBoth` is passed) contain an
+    additional attribute :obj:`reliability_estimate`, which is an instance of
+    :class:`~Orange.evaluation.reliability.Estimate`.
+
+    """
+
     def __init__(self, examples, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
         self.__dict__.update(kwds)

…

     def __call__(self, example, result_type=Orange.core.GetValue):
         """
-        Classify and estimate a new instance. When you choose
-        Orange.core.GetBoth or Orange.core.GetProbabilities, you can access
-        the reliability estimates inside probabilities.reliability_estimate.
+        Classify and estimate the reliability of the estimation for a new
+        instance. When :obj:`result_type` is set to
+        :obj:`Orange.classification.Classifier.GetBoth` or
+        :obj:`Orange.classification.Classifier.GetProbabilities`,
+        an additional attribute :obj:`reliability_estimate`,
+        which is an instance of
+        :class:`~Orange.evaluation.reliability.Estimate`,
+        is added to the distribution object.

         :param instance: instance to be classified.
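The SAvar formula in the diff can be illustrated outside Orange. The sketch below is an assumption-laden illustration, not the module's implementation: it takes precomputed sensitivity predictions and combines them into SAvar, plus SAbias in one common formulation (mean signed deviation from the initial prediction, following Bosnic & Kononenko, 2008); the function name and input shape are invented for the example.

```python
def sa_estimates(k0, sensitivity_predictions):
    # k0: the initial prediction K.
    # sensitivity_predictions: {epsilon: (K_eps, K_minus_eps)} for each
    # epsilon in E -- the predictions after relabelling the appended
    # instance with +epsilon and -epsilon offsets.
    e = len(sensitivity_predictions)
    # SAvar = sum over E of (K_eps - K_-eps), divided by |E|.
    savar = sum(k_pos - k_neg
                for k_pos, k_neg in sensitivity_predictions.values()) / e
    # SAbias (signed): mean deviation of the sensitivity predictions from K.
    sabias = sum((k_pos - k0) + (k_neg - k0)
                 for k_pos, k_neg in sensitivity_predictions.values()) / (2 * e)
    return savar, sabias
```

The absolute SAbias variant is just `abs(sabias)`, matching the "SAbias signed" / "SAbias absolute" pair in `METHOD_NAME`.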
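The BAGV formula in the diff (variance of bagged models' predictions) can be sketched in plain Python, independently of Orange. The `mean_learner` toy learner and the helper names below are illustrative only, not part of the Orange API:

```python
import random
from statistics import mean

def bagv(predictions):
    # BAGV = (1/m) * sum_i (K_i - K)^2, with K the mean of the m predictions.
    k = mean(predictions)
    return sum((k_i - k) ** 2 for k_i in predictions) / len(predictions)

def mean_learner(data):
    # Toy regression "learner": always predicts the mean training label.
    prediction = mean(label for _, label in data)
    return lambda x: prediction

def bagv_estimate(data, learn, x, m=10, seed=0):
    # Train m models on bootstrap replicates of the data and take the
    # variance of their predictions for instance x as the reliability estimate.
    rng = random.Random(seed)
    preds = [learn([rng.choice(data) for _ in data])(x) for _ in range(m)]
    return bagv(preds)
```

A higher variance across the bootstrap replicates means a less stable, and hence less reliable, prediction.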
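The CNK estimate (average neighbour label minus the prediction) is simple enough to state directly. This standalone helper illustrates the formula; it is not Orange's implementation:

```python
def cnk(neighbour_labels, prediction, signed=True):
    # CNK = (1/k) * sum_i C_i - K; the absolute variant takes |CNK|.
    value = sum(neighbour_labels) / len(neighbour_labels) - prediction
    return value if signed else abs(value)
```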
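The LCV steps (leave-one-out over the k nearest neighbours, then distance-weighted absolute error) can likewise be sketched generically. Here `learn` is any function from a training list to a predictor, and the `(x_i, c_i, d_i)` input shape is an assumption made for the example:

```python
def lcv(neighbours, learn):
    # neighbours: list of (x_i, c_i, d_i), where d_i = d(x_i, x) is the
    # distance from neighbour x_i to the evaluated instance x.
    num = den = 0.0
    for i, (x_i, c_i, d_i) in enumerate(neighbours):
        rest = [(x_j, c_j) for j, (x_j, c_j, _) in enumerate(neighbours) if j != i]
        k_i = learn(rest)(x_i)      # leave-one-out prediction K_i for x_i
        e_i = abs(c_i - k_i)        # leave-one-out error E_i = |C_i - K_i|
        num += d_i * e_i            # distance-weighted error (numerator)
        den += d_i
    return num / den
```

The weighting follows the LCV formula as stated in the docstring, with weights d(x_i, x).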
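Finally, the scoring functions in the changeset correlate each reliability estimate with the prediction error via Pearson's r. The coefficient itself (without the p-value, which the Orange functions also return) reduces to:

```python
def pearson_r(xs, ys):
    # Pearson's correlation: covariance normalised by both standard deviations.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A good reliability estimate correlates positively with the absolute prediction error, which is exactly what the r/p tables in the removed module docstring report.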