Changeset 9684:323e440e4272 in orange

Timestamp:
 02/06/12 11:30:29 (2 years ago)
Branch:
 default
Parents:
 9683:c52ceca4a985 (diff), 9672:3b64ad491c7e (diff)

Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Files:
 2150 deleted
 1 edited

Orange/evaluation/reliability.py
r9671 → r9684

"""
########################################
Reliability estimation (``reliability``)
########################################

.. index:: Reliability Estimation

.. index::
   single: reliability; Reliability Estimation for Regression

*************************************
Reliability Estimation for Regression
*************************************

This module includes different implementations of algorithms used for
predicting the reliability of single predictions. Most of the algorithms are
taken from *Comparison of approaches for estimating reliability of individual
regression predictions*, Zoran Bosnic, 2008.

The next example shows basic reliability estimation usage
(:download:`reliabilitybasic.py <code/reliabilitybasic.py>`, uses :download:`housing.tab <code/housing.tab>`):

.. literalinclude:: code/reliability_basic.py

First we load the desired data table and choose the learner we want to use
reliability estimation on. We also want to calculate only the Mahalanobis and
local cross-validation estimates, with the desired parameters. We train the
estimator on the data and estimate the reliability for the first instance of
the data table. Finally, we output the estimates used and their values.

We can also do reliability estimation on a whole data table, not only on a
single instance. The example runs cross-validation on the desired data table
using the default reliability estimates, and at the end outputs the
reliability estimates for the first instance of the data table
(:download:`reliabilityrun.py <code/reliabilityrun.py>`, uses :download:`housing.tab <code/housing.tab>`):

.. literalinclude:: code/reliabilityrun.py

Reliability estimation methods are computationally quite demanding, so it may
take a while for this script to produce a result.
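Stripped of Orange specifics, the pattern the docstring describes — a learner wrapped so that its classifier returns reliability estimates alongside each prediction — can be sketched in plain Python. All names below are illustrative, not part of the Orange API:

```python
# Schematic sketch of the reliability-wrapper pattern (illustrative names,
# not the Orange API): the wrapped learner's classifier returns the
# prediction together with a list of (method_name, estimate) pairs.
class ReliabilityLearner:
    def __init__(self, learner, estimators):
        self.learner = learner        # any object with fit(X, y) -> model
        self.estimators = estimators  # callables: (model, x) -> (name, value)

    def fit(self, X, y):
        return ReliabilityClassifier(self.learner.fit(X, y), self.estimators)

class ReliabilityClassifier:
    def __init__(self, model, estimators):
        self.model = model
        self.estimators = estimators

    def predict(self, x):
        value = self.model.predict(x)
        # attach one reliability estimate per configured estimator
        return value, [est(self.model, x) for est in self.estimators]
```

In the Orange module itself this role is played by `Learner` and `Classifier`, with the estimates attached to the returned probabilities object instead of a tuple.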
In the above example we first create the learner that we are interested in (in
this example k-nearest neighbours), use it inside the reliability learner, and
run cross-validation to get the results. We then output all the reliability
estimates, with their names, for the first instance in the data table.

Reliability Methods
===================

Sensitivity Analysis (SAvar and SAbias)

.. autoclass:: SensitivityAnalysis

Variance of bagged models (BAGV)

.. autoclass:: BaggingVariance

Local cross validation reliability estimate (LCV)

.. autoclass:: LocalCrossValidation

Local modeling of prediction error (CNK)

.. autoclass:: CNeighbours

Bagging variance c-neighbours (BVCK)

.. autoclass:: BaggingVarianceCNeighbours

Mahalanobis distance

.. autoclass:: Mahalanobis

Mahalanobis to center

.. autoclass:: MahalanobisToCenter

Reliability estimate learner
============================

.. autoclass:: Learner
    :members:

Reliability estimation scoring methods
======================================

.. autofunction:: get_pearson_r

.. autofunction:: get_pearson_r_by_iterations

.. autofunction:: get_spearman_r

Referencing
===========

There is a dictionary named :data:`METHOD_NAME` which stores the names of
all the reliability estimates::

  METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
                 3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
                 6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
                 10: "ICV"}

and also two constants for saying whether the estimate is signed or an
absolute value::

  SIGNED = 0
  ABSOLUTE = 1

Example of usage
================

Here we will walk through a somewhat longer example of how to use the
reliability estimate module
(:download:`reliabilitylong.py <code/reliabilitylong.py>`, uses :download:`prostate.tab <code/prostate.tab>`):

.. literalinclude:: code/reliabilitylong.py
    :lines: 1-16

After loading the Orange library we open our dataset. We chose to work with
kNNLearner, which also works on regression problems. We create our reliability
estimate learner and test it with cross-validation. Estimates are then
compared, using Pearson's coefficient, to the prediction error. The p-values
are also computed::

  Estimate               r       p
  SAvar absolute         0.077   0.454
  SAbias signed          0.165   0.105
  SAbias absolute        0.099   0.333
  BAGV absolute          0.104   0.309
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.092   0.368
  Mahalanobis absolute   0.091   0.375

.. literalinclude:: code/reliabilitylong.py
    :lines: 18-28

Outputs::

  Estimate               r       p
  BAGV absolute          0.126   0.220
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.105   0.305
  Mahalanobis absolute   0.091   0.375

As you can see in the above code, you can also choose which reliability
estimation methods you want to use. You might want to do this to reduce
computation time, or because you think some do not perform well enough.

References
==========

Bosnic Z, Kononenko I (2007) `Estimation of individual prediction reliability using local
sensitivity analysis. <http://www.springerlink.com/content/e27p2584387532g8/>`_
*Applied Intelligence* 29(3), 187-203.

Bosnic Z, Kononenko I (2008) `Comparison of approaches for estimating reliability of
individual regression predictions.
<http://www.sciencedirect.com/science/article/pii/S0169023X08001080>`_
*Data & Knowledge Engineering* 67(3), 504-516.

Bosnic Z, Kononenko I (2010) `Automatic selection of reliability estimates for individual
regression predictions.
<http://journals.cambridge.org/abstract_S0269888909990154>`_
*The Knowledge Engineering Review* 25(1), 27-47.

"""

 import Orange
...
 def get_pearson_r(res):
     """
-    Returns Pearsons coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    :param res: results of evaluation, done using learners,
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return Pearson's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of
     the coefficients.
     """
...
 def get_spearman_r(res):
     """
-    Returns Spearmans coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    :param res: results of evaluation, done using learners,
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return Spearman's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of
     the coefficients.
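What these scoring functions compute — the correlation between a reliability estimate and the per-instance prediction errors — can be sketched on plain lists rather than Orange's result objects (p-values omitted for brevity; the function names below are illustrative):

```python
# Sketch of the scoring idea: correlate reliability estimates with the
# absolute prediction errors; a high positive correlation means the
# estimate tracks the true error well.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks (no tie handling)."""
    rank = lambda a: np.argsort(np.argsort(a))
    return pearson_r(rank(x), rank(y))
```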
     """
...
 def get_pearson_r_by_iterations(res):
     """
-    Returns average Pearsons coefficient over all folds between prediction error
+    :param res: results of evaluation, done using learners,
+        wrapped into :class:`Orange.evaluation.reliability.Classifier`.
+    :type res: :class:`Orange.evaluation.testing.ExperimentResults`
+
+    Return average Pearson's coefficient over all folds between prediction error
     and each of the used estimates.
     """
     results_by_fold = Orange.evaluation.scoring.split_by_iterations(res)
     number_of_estimates = len(res.results[0].probabilities[0].reliability_estimate)
-    number_of_examples = len(res.results)
+    number_of_instances = len(res.results)
     number_of_folds = len(results_by_fold)
     results = [0 for _ in xrange(number_of_estimates)]
...
     # Calculate p-values
     results = [float(res) / number_of_folds for res in results]
-    ps = [p_value_from_r(r, number_of_examples) for r in results]
+    ps = [p_value_from_r(r, number_of_instances) for r in results]

     return zip(results, ps, sig, method_list)
...
 class Estimate:
+    """
+    Reliability estimate. Contains attributes that describe the results of
+    reliability estimation.
+
+    .. attribute:: estimate
+
+        A numerical reliability estimate.
+
+    .. attribute:: signed_or_absolute
+
+        Determines whether the method used gives a signed or absolute result.
+        Has a value of either :obj:`SIGNED` or :obj:`ABSOLUTE`.
+
+    .. attribute:: method
+
+        An integer ID of reliability estimation method used.
+
+    .. attribute:: method_name
+
+        Name (string) of reliability estimation method used.
+
+    .. attribute:: icv_method
+
+        An integer ID of reliability estimation method that performed best,
+        as determined by ICV, and of which estimate is stored in the
+        :obj:`estimate` field. (:obj:`None` when ICV was not used.)
+
+    .. attribute:: icv_method_name
+
+        Name (string) of reliability estimation method that performed best,
+        as determined by ICV. (:obj:`None` when ICV was not used.)
+
+    """
     def __init__(self, estimate, signed_or_absolute, method, icv_method=-1):
         self.estimate = estimate
...
         self.estimator = estimator

-    def __call__(self, examples, weight=None, **kwds):
+    def __call__(self, instances, weight=None, **kwds):

         # Calculate borders using cross validation
-        res = Orange.evaluation.testing.cross_validation([self.estimator], examples)
+        res = Orange.evaluation.testing.cross_validation([self.estimator], instances)
         all_borders = []
         for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)):
...
         # Learn on whole train data
-        estimator_classifier = self.estimator(examples)
+        estimator_classifier = self.estimator(instances)

         return DescriptiveAnalysisClassifier(estimator_classifier, all_borders, self.desc)
...
         self.desc = desc

-    def __call__(self, example, result_type=Orange.core.GetValue):
-        predicted, probabilities = self.estimator_classifier(example, Orange.core.GetBoth)
+    def __call__(self, instance, result_type=Orange.core.GetValue):
+        predicted, probabilities = self.estimator_classifier(instance, Orange.core.GetBoth)

         for borders, estimate in zip(self.all_borders, probabilities.reliability_estimate):
...
     """

-    :param e: List of possible e values for SAvar and SAbias reliability estimates, the default value is [0.01, 0.1, 0.5, 1.0, 2.0].
+    :param e: List of possible :math:`\epsilon` values for SAvar and SAbias
+        reliability estimates.
     :type e: list of floats

     :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier`

-    To estimate the reliabilty for given example we extend the learning set
-    with given example and labeling it with :math:`K + \epsilon (l_{max} - l_{min})`,
-    where K denotes the initial prediction, :math:`\epsilon` is sensitivity parameter and
-    :math:`l_{min}` and :math:`l_{max}` denote lower and the upper bound of
-    the learning examples. After computing different sensitivity predictions
-    using different values of e, the prediction are combined into SAvar and SAbias.
-    SAbias can be used as signed estimate or as absolute value of SAbias.
+    To estimate the reliability of prediction for given instance,
+    the learning set is extended with this instance, labeled with
+    :math:`K + \epsilon (l_{max} - l_{min})`,
+    where :math:`K` denotes the initial prediction,
+    :math:`\epsilon` is sensitivity parameter and :math:`l_{min}` and
+    :math:`l_{max}` denote lower and the upper bound of the learning
+    instances' labels. After computing different sensitivity predictions
+    using different values of :math:`\epsilon`, the predictions are combined
+    into SAvar and SAbias. SAbias can be used in a signed or absolute form.

     :math:`SAvar = \\frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}`
...
         self.e = e

-    def __call__(self, examples, learner):
-        min_value = max_value = examples[0].getclass().value
-        for ex in examples:
+    def __call__(self, instances, learner):
+        min_value = max_value = instances[0].getclass().value
+        for ex in instances:
             if ex.getclass().value > max_value:
                 max_value = ex.getclass().value
             if ex.getclass().value < min_value:
                 min_value = ex.getclass().value
-        return SensitivityAnalysisClassifier(self.e, examples, min_value, max_value, learner)
+        return SensitivityAnalysisClassifier(self.e, instances, min_value, max_value, learner)

 class SensitivityAnalysisClassifier:
-    def __init__(self, e, examples, min_value, max_value, learner):
+    def __init__(self, e, instances, min_value, max_value, learner):
         self.e = e
-        self.examples = examples
+        self.instances = instances
         self.max_value = max_value
         self.min_value = min_value
         self.learner = learner

-    def __call__(self, example, predicted, probabilities):
+    def __call__(self, instance, predicted, probabilities):
         # Create new dataset
-        r_data = Orange.data.Table(self.examples)
-
-        # Create new example
-        modified_example = Orange.data.Instance(example)
+        r_data = Orange.data.Table(self.instances)
+
+        # Create new instance
+        modified_instance = Orange.data.Instance(instance)

         # Append it to the data
-        r_data.append(modified_example)
+        r_data.append(modified_instance)

         # Calculate SAvar & SAbias
...
             r_data[-1].setclass(predicted.value + eps*(self.max_value - self.min_value))
             c = self.learner(r_data)
-            k_plus = c(example, Orange.core.GetValue)
+            k_plus = c(instance, Orange.core.GetValue)

             # -epsilon
             r_data[-1].setclass(predicted.value - eps*(self.max_value - self.min_value))
             c = self.learner(r_data)
-            k_minus = c(example, Orange.core.GetValue)
+            k_minus = c(instance, Orange.core.GetValue)
             #print len(r_data)
             #print eps*(self.max_value - self.min_value)
...
     """

-    :param m: Number of bagged models to be used with BAGV estimate
+    :param m: Number of bagging models to be used with BAGV estimate
     :type m: int

     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier`

-    We construct m different bagging models of the original chosen learner and use
-    those predictions (:math:`K_i, i = 1, ..., m`) of given example to calculate the variance, which we use as
-    reliability estimator.
+    :math:`m` different bagging models are constructed and used to estimate
+    the value of dependent variable for a given instance. The variance of
+    those predictions is used as a prediction reliability estimate.

     :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2`

-    where
-
-    :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}`
+    where :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are
+    predictions of individual constructed models.
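The BAGV formula above is just the variance of the bagged models' predictions. A minimal NumPy sketch, with a caller-supplied `fit_predict` standing in for the wrapped learner (names are illustrative, not Orange's):

```python
import numpy as np

def bagv(predictions):
    """BAGV = (1/m) * sum_i (K_i - K)^2 over the bagged predictions K_i,
    with K their mean."""
    preds = np.asarray(predictions, float)
    return float(np.mean((preds - preds.mean()) ** 2))

def bagv_for_instance(X, y, x, fit_predict, m=30, seed=0):
    """Train m bootstrap models and return the variance of their predictions
    for instance x. fit_predict(X_s, y_s, x) -> prediction (any learner)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(m):
        idx = rng.integers(0, n, n)  # sampling with replacement
        preds.append(fit_predict([X[i] for i in idx], [y[i] for i in idx], x))
    return bagv(preds)
```

Models that agree closely yield a small BAGV (a reliable prediction); disagreement yields a large one.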
     """
...
         self.m = m

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         classifiers = []

         # Create bagged classifiers using sampling with replacement
         for _ in xrange(self.m):
-            selection = select_with_repeat(len(examples))
-            data = examples.select(selection)
+            selection = select_with_repeat(len(instances))
+            data = instances.select(selection)
             classifiers.append(learner(data))
         return BaggingVarianceClassifier(classifiers)
...
         self.classifiers = classifiers

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         BAGV = 0

         # Calculate the bagging variance
-        bagged_values = [c(example, Orange.core.GetValue).value for c in self.classifiers if c is not None]
+        bagged_values = [c(instance, Orange.core.GetValue).value for c in self.classifiers if c is not None]

         k = sum(bagged_values) / len(bagged_values)
...
     :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier`

-    We find k nearest neighbours to the given example and put them in
-    seperate dataset. On this dataset we do leave one out
-    validation using given model. Reliability estimate is then distance
-    weighted absolute prediction error.
-
-    1. define the set of k nearest neighours :math:`N = { (x_1, x_1),..., (x_k, c_k)}`
-    2. FOR EACH :math:`(x_i, c_i) \in N`
-
-      2.1. generare model M on :math:`N \\backslash (x_i, c_i)`
-
-      2.2. for :math:`(x_i, c_i)` compute LOO prediction :math:`K_i`
-
-      2.3. for :math:`(x_i, c_i)` compute LOO error :math:`E_i = | C_i - K_i |`
-
+    :math:`k` nearest neighbours to the given instance are found and put in
+    a separate data set. On this data set, a leave-one-out validation is
+    performed. Reliability estimate is then the distance weighted absolute
+    prediction error.
+    If a special value 0 is passed as :math:`k` (as is by default),
+    it is set as 1/20 of data set size (or 5, whichever is greater).
+
+    1. Determine the set of k nearest neighours :math:`N = { (x_1, c_1),...,
+       (x_k, c_k)}`.
+    2. On this set, compute leave-one-out predictions :math:`K_i` and
+       prediction errors :math:`E_i = | C_i - K_i |`.
     3. :math:`LCV(x) = \\frac{ \sum_{(x_i, c_i) \in N} d(x_i, x) * E_i }{ \sum_{(x_i, c_i) \in N} d(x_i, x) }`

...
         self.k = k

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
         nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()

         distance_id = Orange.data.new_meta_id()
-        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
+        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id)

         if self.k == 0:
-            self.k = max(5, len(examples)/20)
+            self.k = max(5, len(instances)/20)

         return LocalCrossValidationClassifier(distance_id, nearest_neighbours, self.k, learner)
...
         self.learner = learner

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         LCVer = 0
         LCVdi = 0

         # Find k nearest neighbors

-        knn = [ex for ex in self.nearest_neighbours(example, self.k)]
+        knn = [ex for ex in self.nearest_neighbours(instance, self.k)]

         # leave one out of prediction error
...
     :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier`

-    Estimate CNK is defined for unlabeled example as difference between
-    average label of the nearest neighbours and the examples prediction. CNK can
-    be used as a signed estimate or only as absolute value.
+    CNK is defined for an unlabeled instance as a difference between average
+    label of its nearest neighbours and its prediction. CNK can be used as a
+    signed or absolute estimate.

     :math:`CNK = \\frac{\sum_{i=1}^{k}C_i}{k} - K`

-    Where k denotes number of neighbors, C :sub:`i` denotes neighbours' labels and
-    K denotes the example's prediction.
+    where :math:`k` denotes number of neighbors, C :sub:`i` denotes neighbours'
+    labels and :math:`K` denotes the instance's prediction.

     """
...
         self.k = k

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
         nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()

         distance_id = Orange.data.new_meta_id()
-        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
+        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id)
         return CNeighboursClassifier(nearest_neighbours, self.k)
...
         self.k = k

-    def __call__(self, example, predicted, probabilities):
+    def __call__(self, instance, predicted, probabilities):
         CNK = 0

         # Find k nearest neighbors

-        knn = [ex for ex in self.nearest_neighbours(example, self.k)]
+        knn = [ex for ex in self.nearest_neighbours(instance, self.k)]

         # average label of neighbors
...
     """

-    :param k: Number of nearest neighbours used in Mahalanobis estimate
+    :param k: Number of nearest neighbours used in Mahalanobis estimate.
     :type k: int

     :rtype: :class:`Orange.evaluation.reliability.MahalanobisClassifier`

-    Mahalanobis distance estimate is defined as `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the
-    k nearest neighbours of chosen example.
+    Mahalanobis distance reliability estimate is defined as
+    `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_
+    to the evaluated instance's :math:`k` nearest neighbours.
...
         self.k = k

-    def __call__(self, examples, *args):
+    def __call__(self, instances, *args):
         nnm = Orange.classification.knn.FindNearestConstructor()
         nnm.distanceConstructor = Orange.distance.MahalanobisConstructor()

         mid = Orange.data.new_meta_id()
-        nnm = nnm(examples, 0, mid)
+        nnm = nnm(instances, 0, mid)
         return MahalanobisClassifier(self.k, nnm, mid)
...
         self.mid = mid

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         mahalanobis_distance = 0

-        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(example, self.k))
+        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(instance, self.k))

         return [ Estimate(mahalanobis_distance, ABSOLUTE, MAHAL_ABSOLUTE) ]
...
     :rtype: :class:`Orange.evaluation.reliability.MahalanobisToCenterClassifier`

-    Mahalanobis distance to center estimate is defined as `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the
-    centroid of the data.
+    Mahalanobis distance to center reliability estimate is defined as a
+    `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_
+    between the predicted instance and the centroid of the data.
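The Mahalanobis-to-center estimate reduces to a standard computation: the distance between an instance and the data centroid under the data's covariance. A NumPy sketch, assuming purely continuous features (the module continuizes the domain first):

```python
import numpy as np

def mahalanobis_to_center(X, x):
    """Mahalanobis distance from instance x to the centroid of data X."""
    X = np.asarray(X, float)
    x = np.asarray(x, float)
    center = X.mean(axis=0)
    # Pseudo-inverse keeps the sketch usable when the covariance is singular.
    vi = np.linalg.pinv(np.cov(X, rowvar=False))
    d = x - center
    return float(np.sqrt(d @ vi @ d))
```

Instances far from the bulk of the training data get a large distance, flagging their predictions as less reliable.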
...
         pass

-    def __call__(self, examples, *args):
+    def __call__(self, instances, *args):
         dc = Orange.core.DomainContinuizer()
         dc.classTreatment = Orange.core.DomainContinuizer.Ignore
...
         dc.multinomialTreatment = Orange.core.DomainContinuizer.NValues

-        new_domain = dc(examples)
-        new_examples = examples.translate(new_domain)
-
-        X, _, _ = new_examples.to_numpy()
-        example_avg = numpy.average(X, 0)
+        new_domain = dc(instances)
+        new_instances = instances.translate(new_domain)
+
+        X, _, _ = new_instances.to_numpy()
+        instance_avg = numpy.average(X, 0)

         distance_constructor = Orange.distance.MahalanobisConstructor()
-        distance = distance_constructor(new_examples)
-
-        average_example = Orange.data.Instance(new_examples.domain, list(example_avg) + ["?"])
-
-        return MahalanobisToCenterClassifier(distance, average_example, new_domain)
+        distance = distance_constructor(new_instances)
+
+        average_instance = Orange.data.Instance(new_instances.domain, list(instance_avg) + ["?"])
+
+        return MahalanobisToCenterClassifier(distance, average_instance, new_domain)

 class MahalanobisToCenterClassifier:
-    def __init__(self, distance, average_example, new_domain):
+    def __init__(self, distance, average_instance, new_domain):
         self.distance = distance
-        self.average_example = average_example
+        self.average_instance = average_instance
         self.new_domain = new_domain

-    def __call__(self, example, *args):
-
-        ex = Orange.data.Instance(self.new_domain, example)
-
-        mahalanobis_to_center = self.distance(ex, self.average_example)
+    def __call__(self, instance, *args):
+
+        inst = Orange.data.Instance(self.new_domain, instance)
+
+        mahalanobis_to_center = self.distance(inst, self.average_instance)

         return [ Estimate(mahalanobis_to_center, ABSOLUTE, MAHAL_TO_CENTER_ABSOLUTE) ]
...
     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier`

-    BVCK is a combination of Bagging variance and local modeling of prediction
-    error, for this estimate we take the average of both.
+    BVCK is a combination (average) of Bagging variance and local modeling of
+    prediction error.

     """
...
         self.cnk = cnk

-    def __call__(self, examples, learner):
-        bagv_classifier = self.bagv(examples, learner)
-        cnk_classifier = self.cnk(examples, learner)
+    def __call__(self, instances, learner):
+        bagv_classifier = self.bagv(instances, learner)
+        cnk_classifier = self.cnk(instances, learner)
         return BaggingVarianceCNeighboursClassifier(bagv_classifier, cnk_classifier)
...
         self.cnk_classifier = cnk_classifier

-    def __call__(self, example, predicted, probabilities):
-        bagv_estimates = self.bagv_classifier(example, predicted, probabilities)
-        cnk_estimates = self.cnk_classifier(example, predicted, probabilities)
+    def __call__(self, instance, predicted, probabilities):
+        bagv_estimates = self.bagv_classifier(instance, predicted, probabilities)
+        cnk_estimates = self.cnk_classifier(instance, predicted, probabilities)

         bvck_value = (bagv_estimates[0].estimate + cnk_estimates[1].estimate)/2
...
         pass

-    def __call__(self, examples, learner):
-        res = Orange.evaluation.testing.cross_validation([learner], examples)
+    def __call__(self, instances, learner):
+        res = Orange.evaluation.testing.cross_validation([learner], instances)
         prediction_errors = get_prediction_error_list(res)

-        new_domain = Orange.data.Domain(examples.domain.attributes, Orange.core.FloatVariable("pe"))
-        new_dataset = Orange.data.Table(new_domain, examples)
-
-        for example, prediction_error in izip(new_dataset, prediction_errors):
-            example.set_class(prediction_error)
+        new_domain = Orange.data.Domain(instances.domain.attributes, Orange.core.FloatVariable("pe"))
+        new_dataset = Orange.data.Table(new_domain, instances)
+
+        for instance, prediction_error in izip(new_dataset, prediction_errors):
+            instance.set_class(prediction_error)

         rf = Orange.ensemble.forest.RandomForestLearner()
...
         self.new_domain = new_domain

-    def __call__(self, example, predicted, probabilities):
-        new_example = Orange.data.Instance(self.new_domain, example)
-        value = self.rf_classifier(new_example, Orange.core.GetValue)
+    def __call__(self, instance, predicted, probabilities):
+        new_instance = Orange.data.Instance(self.new_domain, instance)
+        value = self.rf_classifier(new_instance, Orange.core.GetValue)

         return [Estimate(value.value, SIGNED, SABIAS_SIGNED)]
...
     Reliability estimation wrapper around a learner we want to test.
     Different reliability estimation algorithms can be used on the
-    chosen learner. This learner works as any other and can be used as one.
-    The only difference is when the classifier is called with a given
-    example instead of only return the value and probabilities, it also
-    attaches a list of reliability estimates to
-    :data:`probabilities.reliability_estimate`.
-    Each reliability estimate consists of a tuple
-    (estimate, signed_or_absolute, method).
-
-    :param box_learner: Learner we want to wrap into reliability estimation
+    chosen learner. This learner works as any other and can be used as one,
+    but it returns the classifier, wrapped into an instance of
+    :class:`Orange.evaluation.reliability.Classifier`.
+
+    :param box_learner: Learner we want to wrap into a reliability estimation
+        classifier.
     :type box_learner: learner
...

-    def __call__(self, examples, weight=None, **kwds):
+    def __call__(self, instances, weight=None, **kwds):
         """Learn from the given table of data instances.
...
         new_domain = None

-        if examples.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous:
+        if instances.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous:
             raise Exception("This method only works on data with continuous class.")

-        return Classifier(examples, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier)
-
-    def internal_cross_validation(self, examples, folds=10):
-        """ Performs the ususal internal cross validation for getting the best
-        reliability estimate. It uses the reliability estimators defined in
-        estimators attribute. Returns the id of the method that scored the
-        best. """
-        res = Orange.evaluation.testing.cross_validation([self], examples, folds=folds)
+        return Classifier(instances, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier)
+
+    def internal_cross_validation(self, instances, folds=10):
+        """ Perform the internal cross validation for getting the best
+        reliability estimate. It uses the reliability estimators defined in
+        estimators attribute.
+
+        Returns the id of the method that scored the best.
+
+        :param instances: Data instances to use for ICV.
+        :type instances: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+
+        """
+        res = Orange.evaluation.testing.cross_validation([self], instances, folds=folds)
         results = get_pearson_r(res)
         sorted_results = sorted(results)
         return sorted_results[-1][3]

-    def internal_cross_validation_testing(self, examples, folds=10):
-        """ Performs internal cross validation (as in Automatic selection of
+    def internal_cross_validation_testing(self, instances, folds=10):
+        """ Perform internal cross validation (as in Automatic selection of
         reliability estimates for individual regression predictions,
-        Zoran Bosnic 2010) and return id of the method
-        that scored best on this data. """
-        cv_indices = Orange.core.MakeRandomIndicesCV(examples, folds)
+        Zoran Bosnic, 2010) and return id of the method
+        that scored best on this data.
+
+        :param instances: Data instances to use for ICV.
+        :type instances: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+
+        """
+        cv_indices = Orange.core.MakeRandomIndicesCV(instances, folds)

         list_of_rs = []
...

         for fold in xrange(folds):
-            data = examples.select(cv_indices, fold)
+            data = instances.select(cv_indices, fold)
             if len(data) < 10:
                 res = Orange.evaluation.testing.leave_one_out([self], data)
...

 class Classifier:
-    def __init__(self, examples, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
+    """
+    A reliability estimation wrapper for classifiers.
+
+    What distinguishes this classifier is that the returned probabilities (if
+    :obj:`Orange.classification.Classifier.GetProbabilities` or
+    :obj:`Orange.classification.Classifier.GetBoth` is passed) contain an
+    additional attribute :obj:`reliability_estimate`, which is an instance of
+    :class:`~Orange.evaluation.reliability.Estimate`.
+
+    """
+
+    def __init__(self, instances, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
         self.__dict__.update(kwds)
-        self.examples = examples
+        self.instances = instances
         self.box_learner = box_learner
         self.estimators = estimators
...

         # Train the learner with original data
-        self.classifier = box_learner(examples)
+        self.classifier = box_learner(instances)

         # Train all the estimators and create their classifiers
-        self.estimation_classifiers = [estimator(examples, box_learner) for estimator in estimators]
-
-    def __call__(self, example, result_type=Orange.core.GetValue):
+        self.estimation_classifiers = [estimator(instances, box_learner) for estimator in estimators]
+
+    def __call__(self, instance, result_type=Orange.core.GetValue):
         """
-        Classify and estimate a new instance. When you chose
-        Orange.core.GetBoth or Orange.core.getProbabilities, you can access
-        the reliability estimates inside probabilities.reliability_estimate.
+        Classify and estimate reliability of estimation for a new instance.
+        When :obj:`result_type` is set to
+        :obj:`Orange.classification.Classifier.GetBoth` or
+        :obj:`Orange.classification.Classifier.GetProbabilities`,
+        an additional attribute :obj:`reliability_estimate`,
+        which is an instance of
+        :class:`~Orange.evaluation.reliability.Estimate`,
+        is added to the distribution object.

         :param instance: instance to be classified.
...
             :class:`Orange.statistics.Distribution` or a tuple with both
         """
-        predicted, probabilities = self.classifier(example, Orange.core.GetBoth)
+        predicted, probabilities = self.classifier(instance, Orange.core.GetBoth)

         # Create a place holder for estimates
...
         # Calculate all the estimates and add them to the results
         for estimate in self.estimation_classifiers:
-            probabilities.reliability_estimate.extend(estimate(example, predicted, probabilities))
+            probabilities.reliability_estimate.extend(estimate(instance, predicted, probabilities))

         # Return the appropriate type of result
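As a closing illustration, the CNK estimate changed above is similarly compact: the average label of the k nearest neighbours minus the model's prediction. A NumPy sketch with plain Euclidean neighbours (the module itself uses Orange's nearest-neighbour machinery and distance constructors; names here are illustrative):

```python
import numpy as np

def cnk(X, y, x, prediction, k=3):
    """CNK = mean label of the k nearest neighbours minus the prediction.
    This is the signed form; abs() of it gives the absolute variant."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    dists = np.linalg.norm(X - np.asarray(x, float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(y[nearest].mean() - prediction)
```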