# Changes in [9679:3879dea56188:9686:819a1e2b751f] in orange

Files:
- 1 deleted
- 8 edited
• ## Orange/evaluation/reliability.py

 r9671 """ ######################################## Reliability estimation (reliability) ######################################## .. index:: Reliability Estimation .. index:: single: reliability; Reliability Estimation for Regression ************************************* Reliability Estimation for Regression ************************************* This module includes different implementations of algorithm used for predicting reliability of single predictions. Most of the algorithm are taken from Comparison of approaches for estimating reliability of individual regression predictions, Zoran Bosnic 2008. Next example shows basic reliability estimation usage (:download:reliability-basic.py , uses :download:housing.tab ): .. literalinclude:: code/reliability_basic.py First we load our desired data table and choose on learner we want to use reliability estimation on. We also want to calculate only the Mahalanobis and local cross validation estimates with desired parameters. We learn our estimator on data, and estimate the reliability for first instance of data table. We output the estimates used and the numbers. We can also do reliability estimation on whole data table not only on single instance. Example shows us doing cross validation on the desired data table, using default reliability estimates, and at the ending output reliability estimates for the first instance of data table. (:download:reliability-run.py , uses :download:housing.tab ): .. literalinclude:: code/reliability-run.py Reliability estimation methods are computationally quite hard so it may take a bit of time for this script to produce a result. In the above example we first create a learner that we're interested in, in this example k-nearest-neighbors, and use it inside reliability learner and do cross validation to get the results. Now we output for the first example in the data table all the reliability estimates and their names. 
Reliability Methods
===================

Sensitivity Analysis (SAvar and SAbias)
---------------------------------------
.. autoclass:: SensitivityAnalysis

Variance of bagged models (BAGV)
--------------------------------
.. autoclass:: BaggingVariance

Local cross validation reliability estimate (LCV)
-------------------------------------------------
.. autoclass:: LocalCrossValidation

Local modeling of prediction error (CNK)
----------------------------------------
.. autoclass:: CNeighbours

Bagging variance c-neighbours (BVCK)
------------------------------------
.. autoclass:: BaggingVarianceCNeighbours

Mahalanobis distance
--------------------
.. autoclass:: Mahalanobis

Mahalanobis to center
---------------------
.. autoclass:: MahalanobisToCenter

Reliability estimate learner
============================
.. autoclass:: Learner
   :members:

Reliability estimation scoring methods
======================================
.. autofunction:: get_pearson_r
.. autofunction:: get_pearson_r_by_iterations
.. autofunction:: get_spearman_r

Referencing
===========

There is a dictionary named :data:`METHOD_NAME` which stores the names of all the reliability estimates::

  METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute",
                 3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute",
                 6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute",
                 10: "ICV"}

There are also two constants indicating whether an estimate is signed or an absolute value::

  SIGNED = 0
  ABSOLUTE = 1

Example of usage
================

Here we walk through a somewhat longer example of how to use the reliability estimation module (:download:`reliability-long.py`, uses :download:`prostate.tab`):

.. literalinclude:: code/reliability-long.py
   :lines: 1-16

After loading the Orange library we open our dataset. We choose to work with kNNLearner, which also works on regression problems. We create our reliability estimate learner and test it with cross-validation.
Estimates are then compared, using Pearson's coefficient, to the prediction error. The p-values are also computed::

  Estimate               r       p
  SAvar absolute        -0.077   0.454
  SAbias signed         -0.165   0.105
  SAbias absolute       -0.099   0.333
  BAGV absolute          0.104   0.309
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.092   0.368
  Mahalanobis absolute   0.091   0.375

.. literalinclude:: code/reliability-long.py
   :lines: 18-28

Outputs::

  Estimate               r       p
  BAGV absolute          0.126   0.220
  CNK signed             0.233   0.021
  CNK absolute           0.057   0.579
  LCV absolute           0.069   0.504
  BVCK_absolute          0.105   0.305
  Mahalanobis absolute   0.091   0.375

As the code above shows, you can also choose which reliability estimation methods to use. You might want to do this to reduce computation time, or because you think some methods do not perform well enough.

References
==========

Bosnic Z, Kononenko I (2007) Estimation of individual prediction reliability using local sensitivity analysis. *Applied Intelligence* 29(3), 187-203.

Bosnic Z, Kononenko I (2008) Comparison of approaches for estimating reliability of individual regression predictions. *Data & Knowledge Engineering* 67(3), 504-516.

Bosnic Z, Kononenko I (2010) Automatic selection of reliability estimates for individual regression predictions. *The Knowledge Engineering Review* 25(1), 27-47.

The remaining changes rename ``example``/``examples`` to ``instance``/``instances`` throughout the module and revise the docstrings. The hunks, as unified diffs:

Scoring functions:

```diff
 import Orange

 def get_pearson_r(res):
     """
-    Returns Pearsons coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    Return Pearson's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of the
+    coefficients.

     :param res: results of evaluation, done using learners,
         wrapped into :class:`Orange.evaluation.reliability.Classifier`.
     :type res: :class:`Orange.evaluation.testing.ExperimentResults`
     """

 def get_spearman_r(res):
     """
-    Returns Spearmans coefficient between the prediction error and each of the
-    used reliability estimates. Function also return the p-value of each of
+    Return Spearman's coefficient between the prediction error and each of the
+    used reliability estimates. Also, return the p-value of each of the
+    coefficients.

     :param res: results of evaluation, done using learners,
         wrapped into :class:`Orange.evaluation.reliability.Classifier`.
     :type res: :class:`Orange.evaluation.testing.ExperimentResults`
     """

 def get_pearson_r_by_iterations(res):
     """
-    Returns average Pearsons coefficient over all folds between prediction error
+    Return average Pearson's coefficient over all folds between prediction
+    error and each of the used estimates.

     :param res: results of evaluation, done using learners,
         wrapped into :class:`Orange.evaluation.reliability.Classifier`.
     :type res: :class:`Orange.evaluation.testing.ExperimentResults`
     """
     results_by_fold = Orange.evaluation.scoring.split_by_iterations(res)
     number_of_estimates = len(res.results[0].probabilities[0].reliability_estimate)
-    number_of_examples = len(res.results)
+    number_of_instances = len(res.results)
     number_of_folds = len(results_by_fold)
     results = [0 for _ in xrange(number_of_estimates)]
     # Calculate p-values
     results = [float(res) / number_of_folds for res in results]
-    ps = [p_value_from_r(r, number_of_examples) for r in results]
+    ps = [p_value_from_r(r, number_of_instances) for r in results]
     return zip(results, ps, sig, method_list)
```

``Estimate`` class docstring (new version)::

  class Estimate:
      """
      Reliability estimate. Contains attributes that describe the results of
      reliability estimation.

      .. attribute:: estimate

          A numerical reliability estimate.

      .. attribute:: signed_or_absolute

          Determines whether the method used gives a signed or absolute result.
          Has a value of either :obj:`SIGNED` or :obj:`ABSOLUTE`.

      .. attribute:: method

          An integer ID of reliability estimation method used.

      .. attribute:: method_name

          Name (string) of reliability estimation method used.

      .. attribute:: icv_method

          An integer ID of the reliability estimation method that performed
          best, as determined by ICV, and whose estimate is stored in the
          :obj:`estimate` field. (:obj:`None` when ICV was not used.)

      .. attribute:: icv_method_name

          Name (string) of the reliability estimation method that performed
          best, as determined by ICV. (:obj:`None` when ICV was not used.)
      """
      def __init__(self, estimate, signed_or_absolute, method, icv_method = -1):
          self.estimate = estimate

Descriptive analysis learner and classifier:

```diff
         self.estimator = estimator

-    def __call__(self, examples, weight=None, **kwds):
+    def __call__(self, instances, weight=None, **kwds):
         # Calculate borders using cross validation
-        res = Orange.evaluation.testing.cross_validation([self.estimator], examples)
+        res = Orange.evaluation.testing.cross_validation([self.estimator], instances)
         all_borders = []
         for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)):
         # Learn on whole train data
-        estimator_classifier = self.estimator(examples)
+        estimator_classifier = self.estimator(instances)
         return DescriptiveAnalysisClassifier(estimator_classifier, all_borders, self.desc)

         self.desc = desc

-    def __call__(self, example, result_type=Orange.core.GetValue):
-        predicted, probabilities = self.estimator_classifier(example, Orange.core.GetBoth)
+    def __call__(self, instance, result_type=Orange.core.GetValue):
+        predicted, probabilities = self.estimator_classifier(instance, Orange.core.GetBoth)
         for borders, estimate in zip(self.all_borders, probabilities.reliability_estimate):
```

``SensitivityAnalysis``:

```diff
     """
-    :param e: List of possible e values for SAvar and SAbias reliability
-        estimates, the default value is [0.01, 0.1, 0.5, 1.0, 2.0].
+    :param e: List of possible :math:`\epsilon` values for SAvar and SAbias
+        reliability estimates.
     :type e: list of floats
     :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier`

-    To estimate the reliabilty for given example we extend the learning set
-    with given example and labeling it with :math:`K + \epsilon (l_{max} -
-    l_{min})`, where K denotes the initial prediction, :math:`\epsilon` is
-    sensitivity parameter and :math:`l_{min}` and :math:`l_{max}` denote lower
-    and the upper bound of the learning examples. After computing different
-    sensitivity predictions using different values of e, the prediction are
-    combined into SAvar and SAbias. SAbias can be used as signed estimate or
-    as absolute value of SAbias.
+    To estimate the reliability of prediction for a given instance, the
+    learning set is extended with this instance, labeled with
+    :math:`K + \epsilon (l_{max} - l_{min})`, where :math:`K` denotes the
+    initial prediction, :math:`\epsilon` is the sensitivity parameter and
+    :math:`l_{min}` and :math:`l_{max}` denote the lower and upper bound of
+    the learning instances' labels. After computing different sensitivity
+    predictions using different values of :math:`\epsilon`, the predictions
+    are combined into SAvar and SAbias. SAbias can be used in a signed or
+    absolute form.

     :math:`SAvar = \frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}`
     """
         self.e = e

-    def __call__(self, examples, learner):
-        min_value = max_value = examples[0].getclass().value
-        for ex in examples:
+    def __call__(self, instances, learner):
+        min_value = max_value = instances[0].getclass().value
+        for ex in instances:
             if ex.getclass().value > max_value:
                 max_value = ex.getclass().value
             if ex.getclass().value < min_value:
                 min_value = ex.getclass().value
-        return SensitivityAnalysisClassifier(self.e, examples, min_value, max_value, learner)
+        return SensitivityAnalysisClassifier(self.e, instances, min_value, max_value, learner)

 class SensitivityAnalysisClassifier:
-    def __init__(self, e, examples, min_value, max_value, learner):
+    def __init__(self, e, instances, min_value, max_value, learner):
         self.e = e
-        self.examples = examples
+        self.instances = instances
         self.max_value = max_value
         self.min_value = min_value
         self.learner = learner

-    def __call__(self, example, predicted, probabilities):
+    def __call__(self, instance, predicted, probabilities):
         # Create new dataset
-        r_data = Orange.data.Table(self.examples)
-        # Create new example
-        modified_example = Orange.data.Instance(example)
+        r_data = Orange.data.Table(self.instances)
+        # Create new instance
+        modified_instance = Orange.data.Instance(instance)
         # Append it to the data
-        r_data.append(modified_example)
+        r_data.append(modified_instance)
         # Calculate SAvar & SAbias
             r_data[-1].setclass(predicted.value + eps*(self.max_value - self.min_value))
             c = self.learner(r_data)
-            k_plus = c(example, Orange.core.GetValue)
+            k_plus = c(instance, Orange.core.GetValue)
             # -epsilon
             r_data[-1].setclass(predicted.value - eps*(self.max_value - self.min_value))
             c = self.learner(r_data)
-            k_minus = c(example, Orange.core.GetValue)
+            k_minus = c(instance, Orange.core.GetValue)
             #print len(r_data)
             #print eps*(self.max_value - self.min_value)
```

``BaggingVariance``:

```diff
     """
-    :param m: Number of bagged models to be used with BAGV estimate
+    :param m: Number of bagging models to be used with BAGV estimate
     :type m: int
     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier`

-    We construct m different bagging models of the original chosen learner
-    and use those predictions (:math:`K_i, i = 1, ..., m`) of given example
-    to calculate the variance, which we use as reliability estimator.
+    :math:`m` different bagging models are constructed and used to estimate
+    the value of dependent variable for a given instance. The variance of
+    those predictions is used as a prediction reliability estimate.

     :math:`BAGV = \frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2`

-    where :math:`K = \frac{\sum_{i=1}^{m} K_i}{m}`
+    where :math:`K = \frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are
+    predictions of individual constructed models.
     """
         self.m = m

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         classifiers = []
         # Create bagged classifiers using sampling with replacement
         for _ in xrange(self.m):
-            selection = select_with_repeat(len(examples))
-            data = examples.select(selection)
+            selection = select_with_repeat(len(instances))
+            data = instances.select(selection)
             classifiers.append(learner(data))
         return BaggingVarianceClassifier(classifiers)

         self.classifiers = classifiers

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         BAGV = 0
         # Calculate the bagging variance
-        bagged_values = [c(example, Orange.core.GetValue).value for c in self.classifiers if c is not None]
+        bagged_values = [c(instance, Orange.core.GetValue).value for c in self.classifiers if c is not None]
         k = sum(bagged_values) / len(bagged_values)
```

``LocalCrossValidation``:

```diff
     """
     :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier`

-    We find k nearest neighbours to the given example and put them in
-    seperate dataset. On this dataset we do leave one out validation using
-    given model. Reliability estimate is then distance weighted absolute
-    prediction error.
-
-    1. define the set of k nearest neighours
-       :math:`N = { (x_1, c_1),..., (x_k, c_k)}`
-    2. FOR EACH :math:`(x_i, c_i) \in N`
-       2.1. generate model M on :math:`N \backslash (x_i, c_i)`
-       2.2. for :math:`(x_i, c_i)` compute LOO prediction :math:`K_i`
-       2.3. for :math:`(x_i, c_i)` compute LOO error :math:`E_i = | C_i - K_i |`
+    :math:`k` nearest neighbours to the given instance are found and put in a
+    separate data set. On this data set, a leave-one-out validation is
+    performed. Reliability estimate is then the distance weighted absolute
+    prediction error. If a special value 0 is passed as :math:`k` (as is by
+    default), it is set as 1/20 of data set size (or 5, whichever is greater).
+
+    1. Determine the set of k nearest neighours
+       :math:`N = { (x_1, c_1),..., (x_k, c_k)}`.
+    2. On this set, compute leave-one-out predictions :math:`K_i` and
+       prediction errors :math:`E_i = | C_i - K_i |`.
+    3. :math:`LCV(x) = \frac{ \sum_{(x_i, c_i) \in N} d(x_i, x) * E_i }{ \sum_{(x_i, c_i) \in N} d(x_i, x) }`
     """
         self.k = k

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
         nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()
         distance_id = Orange.data.new_meta_id()
-        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
+        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id)
         if self.k == 0:
-            self.k = max(5, len(examples)/20)
+            self.k = max(5, len(instances)/20)
         return LocalCrossValidationClassifier(distance_id, nearest_neighbours, self.k, learner)

         self.learner = learner

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         LCVer = 0
         LCVdi = 0
         # Find k nearest neighbors
-        knn = [ex for ex in self.nearest_neighbours(example, self.k)]
+        knn = [ex for ex in self.nearest_neighbours(instance, self.k)]
         # leave one out of prediction error
```

``CNeighbours``:

```diff
     """
     :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier`

-    Estimate CNK is defined for unlabeled example as difference between
-    average label of the nearest neighbours and the examples prediction. CNK
-    can be used as a signed estimate or only as absolute value.
+    CNK is defined for an unlabeled instance as a difference between average
+    label of its nearest neighbours and its prediction. CNK can be used as a
+    signed or absolute estimate.

     :math:`CNK = \frac{\sum_{i=1}^{k}C_i}{k} - K`

-    Where k denotes number of neighbors, C :sub:`i` denotes neighbours'
-    labels and K denotes the example's prediction.
+    where :math:`k` denotes number of neighbors, C :sub:`i` denotes
+    neighbours' labels and :math:`K` denotes the instance's prediction.
     """
         self.k = k

-    def __call__(self, examples, learner):
+    def __call__(self, instances, learner):
         nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor()
         nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor()
         distance_id = Orange.data.new_meta_id()
-        nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id)
+        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id)
         return CNeighboursClassifier(nearest_neighbours, self.k)

         self.k = k

-    def __call__(self, example, predicted, probabilities):
+    def __call__(self, instance, predicted, probabilities):
         CNK = 0
         # Find k nearest neighbors
-        knn = [ex for ex in self.nearest_neighbours(example, self.k)]
+        knn = [ex for ex in self.nearest_neighbours(instance, self.k)]
         # average label of neighbors
```

``Mahalanobis``:

```diff
     """
-    :param k: Number of nearest neighbours used in Mahalanobis estimate
+    :param k: Number of nearest neighbours used in Mahalanobis estimate.
     :type k: int
     :rtype: :class:`Orange.evaluation.reliability.MahalanobisClassifier`

-    Mahalanobis distance estimate is defined as `mahalanobis distance`_ to
-    the k nearest neighbours of chosen example.
+    Mahalanobis distance reliability estimate is defined as `mahalanobis
+    distance`_ to the evaluated instance's :math:`k` nearest neighbours.
     """
         self.k = k

-    def __call__(self, examples, *args):
+    def __call__(self, instances, *args):
         nnm = Orange.classification.knn.FindNearestConstructor()
         nnm.distanceConstructor = Orange.distance.MahalanobisConstructor()
         mid = Orange.data.new_meta_id()
-        nnm = nnm(examples, 0, mid)
+        nnm = nnm(instances, 0, mid)
         return MahalanobisClassifier(self.k, nnm, mid)

         self.mid = mid

-    def __call__(self, example, *args):
+    def __call__(self, instance, *args):
         mahalanobis_distance = 0
-        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(example, self.k))
+        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(instance, self.k))
         return [ Estimate(mahalanobis_distance, ABSOLUTE, MAHAL_ABSOLUTE) ]
```

``MahalanobisToCenter``:

```diff
     """
     :rtype: :class:`Orange.evaluation.reliability.MahalanobisToCenterClassifier`

-    Mahalanobis distance to center estimate is defined as `mahalanobis
-    distance`_ to the centroid of the data.
+    Mahalanobis distance to center reliability estimate is defined as a
+    `mahalanobis distance`_ between the predicted instance and the centroid
+    of the data.
     """
         pass

-    def __call__(self, examples, *args):
+    def __call__(self, instances, *args):
         dc = Orange.core.DomainContinuizer()
         dc.classTreatment = Orange.core.DomainContinuizer.Ignore
         dc.multinomialTreatment = Orange.core.DomainContinuizer.NValues
-        new_domain = dc(examples)
-        new_examples = examples.translate(new_domain)
-        X, _, _ = new_examples.to_numpy()
-        example_avg = numpy.average(X, 0)
+        new_domain = dc(instances)
+        new_instances = instances.translate(new_domain)
+        X, _, _ = new_instances.to_numpy()
+        instance_avg = numpy.average(X, 0)
         distance_constructor = Orange.distance.MahalanobisConstructor()
-        distance = distance_constructor(new_examples)
-        average_example = Orange.data.Instance(new_examples.domain, list(example_avg) + ["?"])
-        return MahalanobisToCenterClassifier(distance, average_example, new_domain)
+        distance = distance_constructor(new_instances)
+        average_instance = Orange.data.Instance(new_instances.domain, list(instance_avg) + ["?"])
+        return MahalanobisToCenterClassifier(distance, average_instance, new_domain)

 class MahalanobisToCenterClassifier:
-    def __init__(self, distance, average_example, new_domain):
+    def __init__(self, distance, average_instance, new_domain):
         self.distance = distance
-        self.average_example = average_example
+        self.average_instance = average_instance
         self.new_domain = new_domain

-    def __call__(self, example, *args):
-        ex = Orange.data.Instance(self.new_domain, example)
-        mahalanobis_to_center = self.distance(ex, self.average_example)
+    def __call__(self, instance, *args):
+        inst = Orange.data.Instance(self.new_domain, instance)
+        mahalanobis_to_center = self.distance(inst, self.average_instance)
         return [ Estimate(mahalanobis_to_center, ABSOLUTE, MAHAL_TO_CENTER_ABSOLUTE) ]
```

``BaggingVarianceCNeighbours`` and the error-prediction learner:

```diff
     """
     :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier`

-    BVCK is a combination of Bagging variance and local modeling of
-    prediction error, for this estimate we take the average of both.
+    BVCK is a combination (average) of Bagging variance and local modeling
+    of prediction error.
     """
         self.cnk = cnk

-    def __call__(self, examples, learner):
-        bagv_classifier = self.bagv(examples, learner)
-        cnk_classifier = self.cnk(examples, learner)
+    def __call__(self, instances, learner):
+        bagv_classifier = self.bagv(instances, learner)
+        cnk_classifier = self.cnk(instances, learner)
         return BaggingVarianceCNeighboursClassifier(bagv_classifier, cnk_classifier)

         self.cnk_classifier = cnk_classifier

-    def __call__(self, example, predicted, probabilities):
-        bagv_estimates = self.bagv_classifier(example, predicted, probabilities)
-        cnk_estimates = self.cnk_classifier(example, predicted, probabilities)
+    def __call__(self, instance, predicted, probabilities):
+        bagv_estimates = self.bagv_classifier(instance, predicted, probabilities)
+        cnk_estimates = self.cnk_classifier(instance, predicted, probabilities)
         bvck_value = (bagv_estimates[0].estimate + cnk_estimates[1].estimate)/2

         pass

-    def __call__(self, examples, learner):
-        res = Orange.evaluation.testing.cross_validation([learner], examples)
+    def __call__(self, instances, learner):
+        res = Orange.evaluation.testing.cross_validation([learner], instances)
         prediction_errors = get_prediction_error_list(res)
-        new_domain = Orange.data.Domain(examples.domain.attributes, Orange.core.FloatVariable("pe"))
-        new_dataset = Orange.data.Table(new_domain, examples)
-        for example, prediction_error in izip(new_dataset, prediction_errors):
-            example.set_class(prediction_error)
+        new_domain = Orange.data.Domain(instances.domain.attributes, Orange.core.FloatVariable("pe"))
+        new_dataset = Orange.data.Table(new_domain, instances)
+        for instance, prediction_error in izip(new_dataset, prediction_errors):
+            instance.set_class(prediction_error)
         rf = Orange.ensemble.forest.RandomForestLearner()

         self.new_domain = new_domain

-    def __call__(self, example, predicted, probabilities):
-        new_example = Orange.data.Instance(self.new_domain, example)
-        value = self.rf_classifier(new_example, Orange.core.GetValue)
+    def __call__(self, instance, predicted, probabilities):
+        new_instance = Orange.data.Instance(self.new_domain, instance)
+        value = self.rf_classifier(new_instance, Orange.core.GetValue)
         return [Estimate(value.value, SIGNED, SABIAS_SIGNED)]
```

``Learner``:

```diff
     """
-    Reliability estimation wrapper around a learner we want to test.
-    Different reliability estimation algorithms can be used on the chosen
-    learner. This learner works as any other and can be used as one. The
-    only difference is when the classifier is called with a given example
-    instead of only return the value and probabilities, it also attaches a
-    list of reliability estimates to
-    :data:`probabilities.reliability_estimate`. Each reliability estimate
-    consists of a tuple (estimate, signed_or_absolute, method).
-
-    :param box_learner: Learner we want to wrap into reliability estimation
+    Reliability estimation wrapper around a learner we want to test.
+    Different reliability estimation algorithms can be used on the chosen
+    learner. This learner works as any other and can be used as one, but it
+    returns the classifier, wrapped into an instance of
+    :class:`Orange.evaluation.reliability.Classifier`.
+
+    :param box_learner: Learner we want to wrap into a reliability
+        estimation classifier.
     :type box_learner: learner
     """

-    def __call__(self, examples, weight=None, **kwds):
+    def __call__(self, instances, weight=None, **kwds):
         """Learn from the given table of data instances."""
         new_domain = None
-        if examples.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous:
+        if instances.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous:
             raise Exception("This method only works on data with continuous class.")
-        return Classifier(examples, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier)
-
-    def internal_cross_validation(self, examples, folds=10):
-        """ Performs the ususal internal cross validation for getting the
-        best reliability estimate. It uses the reliability estimators
-        defined in estimators attribute. Returns the id of the method that
-        scored the best. """
-        res = Orange.evaluation.testing.cross_validation([self], examples, folds=folds)
+        return Classifier(instances, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier)
+
+    def internal_cross_validation(self, instances, folds=10):
+        """ Perform the internal cross validation for getting the best
+        reliability estimate. It uses the reliability estimators defined in
+        estimators attribute. Returns the id of the method that scored the
+        best.
+
+        :param instances: Data instances to use for ICV.
+        :type instances: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+        """
+        res = Orange.evaluation.testing.cross_validation([self], instances, folds=folds)
         results = get_pearson_r(res)
         sorted_results = sorted(results)
         return sorted_results[-1][3]

-    def internal_cross_validation_testing(self, examples, folds=10):
-        """ Performs internal cross validation (as in Automatic selection of
-        reliability estimates for individual regression predictions, Zoran
-        Bosnic 2010) and return id of the method that scored best on this
-        data. """
-        cv_indices = Orange.core.MakeRandomIndicesCV(examples, folds)
+    def internal_cross_validation_testing(self, instances, folds=10):
+        """ Perform internal cross validation (as in Automatic selection of
+        reliability estimates for individual regression predictions, Zoran
+        Bosnic, 2010) and return id of the method that scored best on this
+        data.
+
+        :param instances: Data instances to use for ICV.
+        :type instances: :class:`Orange.data.Table`
+        :param folds: number of folds for ICV.
+        :type folds: int
+        :rtype: int
+        """
+        cv_indices = Orange.core.MakeRandomIndicesCV(instances, folds)
         list_of_rs = []
         for fold in xrange(folds):
-            data = examples.select(cv_indices, fold)
+            data = instances.select(cv_indices, fold)
             if len(data) < 10:
                 res = Orange.evaluation.testing.leave_one_out([self], data)
```

``Classifier``:

```diff
 class Classifier:
-    def __init__(self, examples, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
+    """
+    A reliability estimation wrapper for classifiers. What distinguishes
+    this classifier is that the returned probabilities (if
+    :obj:`Orange.classification.Classifier.GetProbabilities` or
+    :obj:`Orange.classification.Classifier.GetBoth` is passed) contain an
+    additional attribute :obj:`reliability_estimate`, which is an instance
+    of :class:`~Orange.evaluation.reliability.Estimate`.
+    """
+
+    def __init__(self, instances, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds):
         self.__dict__.update(kwds)
-        self.examples = examples
+        self.instances = instances
         self.box_learner = box_learner
         self.estimators = estimators
         # Train the learner with original data
-        self.classifier = box_learner(examples)
+        self.classifier = box_learner(instances)
         # Train all the estimators and create their classifiers
-        self.estimation_classifiers = [estimator(examples, box_learner) for estimator in estimators]
-
-    def __call__(self, example, result_type=Orange.core.GetValue):
-        """ Classify and estimate a new instance. When you chose
-        Orange.core.GetBoth or Orange.core.getProbabilities, you can access
-        the reliability estimates inside
-        probabilities.reliability_estimate.
+        self.estimation_classifiers = [estimator(instances, box_learner) for estimator in estimators]
+
+    def __call__(self, instance, result_type=Orange.core.GetValue):
+        """ Classify and estimate reliability of estimation for a new
+        instance. When :obj:`result_type` is set to
+        :obj:`Orange.classification.Classifier.GetBoth` or
+        :obj:`Orange.classification.Classifier.GetProbabilities`, an
+        additional attribute :obj:`reliability_estimate`, which is an
+        instance of :class:`~Orange.evaluation.reliability.Estimate`, is
+        added to the distribution object.
+
+        :param instance: instance to be classified.
         :class:`Orange.statistics.Distribution` or a tuple with both
         """
-        predicted, probabilities = self.classifier(example, Orange.core.GetBoth)
+        predicted, probabilities = self.classifier(instance, Orange.core.GetBoth)
         # Create a place holder for estimates
         # Calculate all the estimates and add them to the results
         for estimate in self.estimation_classifiers:
-            probabilities.reliability_estimate.extend(estimate(example, predicted, probabilities))
+            probabilities.reliability_estimate.extend(estimate(instance, predicted, probabilities))
         # Return the appropriate type of result
```
• ## Orange/misc/environ.py

 r9671

```diff
 orange_no_deprecated_members = "False"

-try:
-    install_dir = os.path.dirname(os.path.abspath(__file__)) # Orange/misc
-    install_dir = os.path.dirname(install_dir) # Orange/
-    install_dir = os.path.dirname(install_dir) #
-except Exception:
-    # Why should this happen??
-    raise
-    import orange
-    install_dir = os.path.dirname(os.path.abspath(orange.__file__))
+install_dir = os.path.dirname(os.path.abspath(__file__)) # Orange/misc
+install_dir = os.path.dirname(install_dir) # Orange/
+print install_dir

 doc_install_dir = os.path.join(install_dir, "doc")
```
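The `install_dir` computation above just walks up the package tree with repeated `dirname` calls. For illustration, with a hypothetical install prefix (not Orange's actual layout on any particular system):

```python
import os.path

# Simulate a module located at <prefix>/Orange/misc/environ.py:
module_file = "/usr/lib/python2.7/site-packages/Orange/misc/environ.py"

install_dir = os.path.dirname(os.path.abspath(module_file))  # .../Orange/misc
install_dir = os.path.dirname(install_dir)                   # .../Orange
print(install_dir)  # /usr/lib/python2.7/site-packages/Orange

doc_install_dir = os.path.join(install_dir, "doc")
print(doc_install_dir)  # /usr/lib/python2.7/site-packages/Orange/doc
```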
• ## Orange/testing/regression/results/orange/orange25/reliability-long.py.txt

 r9679

```diff
 SAbias absolute        0.095   0.352
 LCV absolute           0.069   0.504
-BVCK_absolute          0.060   0.562
-BAGV absolute          0.078   0.448
+BVCK_absolute          0.058   0.574
+BAGV absolute            nan     nan
 CNK signed             0.233   0.021
 CNK absolute           0.058   0.574
```
• ## Orange/testing/regression/results/orange/orange25/reliability-run.py.txt

 r9679 — previous output for the first instance::

  SAvar absolute 10.701260376
  SAbias signed -0.0246158599854
  SAbias absolute 0.0246158599854
  LCV absolute 5.16450013273
  BVCK_absolute 0.0697603345819
  BAGV absolute 0.0221634151355
  CNK signed -0.117357254028
  CNK absolute 0.117357254028
  Mahalanobis absolute 5.70795369148
  Mahalanobis to center 2.74633686492

New output::

  Instance 0
  SAvar absolute 10.701260376
  SAbias signed -0.0246158599854
  SAbias absolute 0.0246158599854
  LCV absolute 5.16450013273
  BVCK_absolute 0.0586786270142
  BAGV absolute 0.0
  CNK signed -0.117357254028
  CNK absolute 0.117357254028
  Mahalanobis absolute 5.70795369148
  Mahalanobis to center 2.74633686492

  Instance 1
  SAvar absolute 10.7012599945
  SAbias signed -0.0270875930786
  SAbias absolute 0.0270875930786
  LCV absolute 2.99310729294
  BVCK_absolute 0.530257987976
  BAGV absolute 0.0
  CNK signed 1.06051597595
  CNK absolute 1.06051597595
  Mahalanobis absolute 4.45522534847
  Mahalanobis to center 2.11993694001

  Instance 2
  SAvar absolute 10.7012619019
  SAbias signed -0.0769199371338
  SAbias absolute 0.0769199371338
  LCV absolute 4.46030248588
  BVCK_absolute 0.803800964355
  BAGV absolute 0.0
  CNK signed -1.60760192871
  CNK absolute 1.60760192871
  Mahalanobis absolute 4.45998907089
  Mahalanobis to center 2.20910797014

  Instance 3
  SAvar absolute 10.7012609482
  SAbias signed 0.00884370803833
  SAbias absolute 0.00884370803833
  LCV absolute 3.63051784733
  BVCK_absolute 0.0919738769531
  BAGV absolute 0.0
  CNK signed 0.183947753906
  CNK absolute 0.183947753906
  Mahalanobis absolute 4.43703401089
  Mahalanobis to center 2.66496913406

  Instance 4
  SAvar absolute 10.701257515
  SAbias signed 0.0423228263855
  SAbias absolute 0.0423228263855
  LCV absolute 3.32640218766
  BVCK_absolute 0.561947059631
  BAGV absolute 0.0
  CNK signed 1.12389411926
  CNK absolute 1.12389411926
  Mahalanobis absolute 4.91473317146
  Mahalanobis to center 2.69116657497

  Instance 5
  SAvar absolute 10.7012573242
  SAbias signed 0.273178482056
  SAbias absolute 0.273178482056
  LCV absolute 3.18996776229
  BVCK_absolute 0.0291097640991
  BAGV absolute 0.0
  CNK signed 0.0582195281982
  CNK absolute 0.0582195281982
  Mahalanobis absolute 4.46615076065
  Mahalanobis to center 2.58802566358

  Instance 6
  SAvar absolute 10.7012595177
  SAbias signed -0.101607465744
  SAbias absolute 0.101607465744
  LCV absolute 2.95812406923
  BVCK_absolute 0.36095199585
  BAGV absolute 0.0
  CNK signed -0.721903991699
  CNK absolute 0.721903991699
  Mahalanobis absolute 4.64355945587
  Mahalanobis to center 2.50879836044

  Instance 7
  SAvar absolute 10.701257515
  SAbias signed -0.133856678009
  SAbias absolute 0.133856678009
  LCV absolute 2.64825094326
  BVCK_absolute 0.236968421936
  BAGV absolute 0.0
  CNK signed -0.473936843872
  CNK absolute 0.473936843872
  Mahalanobis absolute 3.15618926287
  Mahalanobis to center 3.76633615153

  Instance 8
  SAvar absolute 10.687993145
  SAbias signed -0.0223073482513
  SAbias absolute 0.0223073482513
  LCV absolute 3.23505105569
  BVCK_absolute 0.163931274414
  BAGV absolute 0.0
  CNK signed -0.327862548828
  CNK absolute 0.327862548828
  Mahalanobis absolute 6.07192790508
  Mahalanobis to center 5.21631597918

  Instance 9
  SAvar absolute 10.7012593269
  SAbias signed 0.0480660915375
  SAbias absolute 0.0480660915375
  LCV absolute 3.09854127044
  BVCK_absolute 0.265921974182
  BAGV absolute 0.0
  CNK signed -0.531843948364
  CNK absolute 0.531843948364
  Mahalanobis absolute 4.02153724432
  Mahalanobis to center 3.93183348061
• ## docs/reference/rst/Orange.evaluation.reliability.rst

 r9372
.. automodule:: Orange.evaluation.reliability

.. index:: Reliability Estimation

.. index::
   single: reliability; Reliability Estimation for Regression

########################################
Reliability estimation (reliability)
########################################

*************************************
Reliability Estimation for Regression
*************************************

Reliability assessment statistically predicts the reliability of single
predictions. Most of the implemented algorithms are taken from Comparison of
approaches for estimating reliability of individual regression predictions,
Zoran Bosnić, 2008.

The following example shows basic usage of the reliability estimation methods:

.. literalinclude:: code/reliability-basic.py
   :lines: 7-

The important points of this example are:

* construction of reliability estimators using the classes implemented in
  this module,
* construction of a reliability learner that bonds a regular learner
  (:class:`~Orange.classification.knn.kNNLearner` in this case) with
  reliability estimators,
* calling the constructed classifier with the
  :obj:`Orange.classification.Classifier.GetBoth` option to obtain class
  probabilities; :obj:`probability` is the object to which the reliability
  learner appends the :obj:`reliability_estimate` attribute, an instance of
  :class:`Orange.evaluation.reliability.Estimate`.

It is also possible to estimate reliability on a whole data table, not only
on a single instance. The next example demonstrates the use of
cross-validation for reliability estimation; the reliability estimates for
the first 10 instances are printed:

.. literalinclude:: code/reliability-run.py
   :lines: 11-

Reliability Methods
===================

Sensitivity Analysis (SAvar and SAbias)
---------------------------------------
.. autoclass:: SensitivityAnalysis

Variance of bagged models (BAGV)
--------------------------------
.. autoclass:: BaggingVariance

Local cross validation reliability estimate (LCV)
-------------------------------------------------
.. autoclass:: LocalCrossValidation

Local modeling of prediction error (CNK)
----------------------------------------
.. autoclass:: CNeighbours

Bagging variance c-neighbours (BVCK)
------------------------------------
.. autoclass:: BaggingVarianceCNeighbours

Mahalanobis distance
--------------------
.. autoclass:: Mahalanobis

Mahalanobis to center
---------------------
.. autoclass:: MahalanobisToCenter

Reliability estimation wrappers
===============================

.. autoclass:: Learner
   :members:

.. autoclass:: Classifier
   :members:

Reliability estimation results
==============================

.. autoclass:: Estimate
   :members:
   :show-inheritance:

There is a dictionary named :obj:`METHOD_NAME` that maps reliability
estimation method IDs (ints) to method names (strings). The module also
defines two constants for distinguishing signed and absolute reliability
estimation measures::

   SIGNED = 0
   ABSOLUTE = 1

Reliability estimation scoring methods
======================================

.. autofunction:: get_pearson_r

.. autofunction:: get_pearson_r_by_iterations

.. autofunction:: get_spearman_r

Example of usage
================

.. literalinclude:: code/reliability-long.py
   :lines: 11-26

This script prints out Pearson's r coefficient between the reliability
estimates and the actual prediction errors, with the corresponding p-value,
for each of the reliability estimation measures used by default::

   Estimate               r       p
   SAvar absolute        -0.077   0.454
   SAbias signed         -0.165   0.105
   SAbias absolute       -0.099   0.333
   BAGV absolute          0.104   0.309
   CNK signed             0.233   0.021
   CNK absolute           0.057   0.579
   LCV absolute           0.069   0.504
   BVCK_absolute          0.092   0.368
   Mahalanobis absolute   0.091   0.375

References
==========

Bosnić, Z., Kononenko, I. (2007) Estimation of individual prediction
reliability using local sensitivity analysis. *Applied Intelligence* 29(3),
pp. 187-203.

Bosnić, Z., Kononenko, I. (2008) Comparison of approaches for estimating
reliability of individual regression predictions. *Data & Knowledge
Engineering* 67(3), pp. 504-516.

Bosnić, Z., Kononenko, I. (2010) Automatic selection of reliability estimates
for individual regression predictions. *The Knowledge Engineering Review*
25(1), pp. 27-47.
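The Mahalanobis-to-center estimate documented above scores an instance by its distance from the centroid of the training data, scaled by the data's spread. A minimal stdlib-only sketch of that idea (not Orange's implementation; it simplifies the full Mahalanobis distance to a diagonal covariance, and the data is made up):

```python
from math import sqrt

def mahalanobis_to_center(data, x):
    """Distance of instance x from the centroid of `data`, scaled by
    per-feature variance (diagonal-covariance simplification of the
    full Mahalanobis distance). Larger values suggest x lies farther
    from the bulk of the training data, i.e. less reliable predictions."""
    n = len(data)
    dims = len(x)
    mean = [sum(row[d] for row in data) / n for d in range(dims)]
    var = [sum((row[d] - mean[d]) ** 2 for row in data) / n
           for d in range(dims)]
    return sqrt(sum((x[d] - mean[d]) ** 2 / var[d]
                    for d in range(dims) if var[d] > 0))

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (2.0, 1.0)]
print(mahalanobis_to_center(data, (2.0, 2.0)))  # the centroid itself -> 0.0
```

Orange's own `Mahalanobis` and `MahalanobisToCenter` classes use the full covariance matrix; this sketch only conveys why a far-from-center instance gets a worse reliability score.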
• ## docs/reference/rst/code/reliability-basic.py

 r9372
# Description: Reliability estimation - basic & fast
# Category:    evaluation
# Uses:        housing
# Referenced:  Orange.evaluation.reliability
# Classes:     Orange.evaluation.reliability.Mahalanobis, Orange.evaluation.reliability.LocalCrossValidation, Orange.evaluation.reliability.Learner

import Orange
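The basic example above uses, among others, the local cross validation (LCV) estimator. The idea behind LCV can be sketched without Orange: take the nearest neighbours of the query instance, predict each neighbour's target from the remaining ones, and use the mean local error as the reliability score. A hedged, stdlib-only illustration (1-D features and a mean predictor for brevity; the data and the `lcv` helper are hypothetical, not Orange's `LocalCrossValidation`):

```python
def lcv(points, x, k=3):
    """Local cross-validation sketch. `points` is a list of
    (feature, target) pairs; for the k nearest neighbours of x,
    predict each neighbour's target as the mean of the other k-1
    and return the mean absolute error of those local predictions."""
    neighbours = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    errors = []
    for i, (_, y) in enumerate(neighbours):
        rest = [t for j, (_, t) in enumerate(neighbours) if j != i]
        errors.append(abs(y - sum(rest) / len(rest)))
    return sum(errors) / k

pts = [(1.0, 1.0), (2.0, 1.1), (3.0, 0.9), (10.0, 5.0)]
print(lcv(pts, 2.0))  # small local error -> the prediction looks reliable
```

A query near the outlier at feature 10.0 would draw it into the neighbourhood and get a much larger local error, which is exactly the signal LCV-style estimates exploit.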
• ## docs/reference/rst/code/reliability-long.py

 r9372
# Description: Reliability estimation
# Category:    evaluation
# Uses:        prostate
# Referenced:  Orange.evaluation.reliability
# Classes:     Orange.evaluation.reliability.Learner

import Orange

Orange.evaluation.reliability.select_with_repeat.random_generator = None
Orange.evaluation.reliability.select_with_repeat.randseed = 42

table = Orange.data.Table("prostate.tab")

print "Estimate               r       p"
for estimate in reliability_res:
    print "%-20s %7.3f %7.3f" % (Orange.evaluation.reliability.METHOD_NAME[estimate[3]], estimate[0], estimate[1])

print "Estimate               r       p"
for estimate in reliability_res:
    print "%-20s %7.3f %7.3f" % (Orange.evaluation.reliability.METHOD_NAME[estimate[3]], estimate[0], estimate[1])
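The script above reports Pearson's r between reliability estimates and actual prediction errors: a good estimate should correlate positively with the error it is meant to predict. A self-contained sketch of that correlation (plain Python 3, made-up numbers; Orange's `get_pearson_r` additionally returns the p-value):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical reliability estimates vs. absolute prediction errors:
estimates = [0.1, 0.4, 0.2, 0.9, 0.7]
errors = [0.2, 0.5, 0.1, 1.1, 0.6]
print("r = %.3f" % pearson_r(estimates, errors))
```

An r close to 1 means the estimator ranks hard instances as hard; values near 0 (as for several measures in the table above) mean the estimate carries little information about the actual error.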
• ## docs/reference/rst/code/reliability-run.py

 r9372
# Description: Reliability estimation with cross-validation
# Category:    evaluation
# Uses:        housing
# Referenced:  Orange.evaluation.reliability
# Classes:     Orange.evaluation.reliability.Learner

import Orange

Orange.evaluation.reliability.select_with_repeat.random_generator = None
Orange.evaluation.reliability.select_with_repeat.randseed = 42

table = Orange.data.Table("housing.tab")

results = Orange.evaluation.testing.cross_validation([reliability], table)

for estimate in results.results[0].probabilities[0].reliability_estimate:
    print estimate.method_name, estimate.estimate

for i, instance in enumerate(results.results[:10]):
    print "Instance", i
    for estimate in instance.probabilities[0].reliability_estimate:
        print "  ", estimate.method_name, estimate.estimate
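Among the estimates this script prints is BAGV, the variance of bagged models: train several models on bootstrap resamples and take the spread of their predictions as the reliability score. A deliberately minimal sketch (stdlib only, with the simplest possible "model" that predicts the mean of its resample; the `bagv` helper and its data are hypothetical, not Orange's `BaggingVariance`):

```python
import random

def bagv(targets, k=50, seed=42):
    """Variance of bagged models (BAGV) sketch: each of k bootstrap
    'models' predicts the mean of its resample of `targets`; the
    variance of those k predictions is the reliability score."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(targets)
    preds = []
    for _ in range(k):
        sample = [targets[rng.randrange(n)] for _ in range(n)]
        preds.append(sum(sample) / n)
    mean = sum(preds) / k
    return sum((p - mean) ** 2 for p in preds) / k

print(bagv([20.1, 21.5, 19.8, 50.0, 20.7]))  # the outlier inflates the variance
```

In the real estimator each bootstrap model is a full regression learner and the variance is taken over its predictions for the query instance, but the principle is the same: when bagged models disagree, the prediction is flagged as unreliable.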