Changeset 9684:323e440e4272 in orange


Timestamp:
02/06/12 11:30:29 (2 years ago)
Author:
Matija Polajnar <matija.polajnar@…>
Branch:
default
Parents:
9683:c52ceca4a985 (diff), 9672:3b64ad491c7e (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge.

Files:
2150 deleted
1 edited

  • Orange/evaluation/reliability.py

    r9671 r9684  
    1 """ 
    2 ######################################## 
    3 Reliability estimation (``reliability``) 
    4 ######################################## 
    5  
    6 .. index:: Reliability Estimation 
    7  
    8 .. index:: 
    9    single: reliability; Reliability Estimation for Regression 
    10  
    11 ************************************* 
    12 Reliability Estimation for Regression 
    13 ************************************* 
    14  
    15 This module includes different implementations of algorithms used for 
    16 predicting the reliability of individual predictions. Most of the algorithms are 
    17 taken from Comparison of approaches for estimating reliability of individual 
    18 regression predictions (Bosnic, Kononenko, 2008). 
    19  
    20 The following example shows basic reliability estimation usage 
    21 (:download:`reliability-basic.py <code/reliability-basic.py>`, uses :download:`housing.tab <code/housing.tab>`): 
    22  
    23 .. literalinclude:: code/reliability-basic.py 
    24  
    25 First we load the desired data table and choose the learner we want to use 
    26 reliability estimation on. In this case we calculate only the Mahalanobis and 
    27 local cross validation estimates, with the desired parameters. We train the 
    28 estimator on the data and estimate the reliability for the first instance of the 
    29 data table. Finally, we output the names of the estimates used and their values. 
    30  
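A minimal sketch of such a script is given below. It assumes that :class:`Learner` accepts the list of estimators through an ``estimators`` keyword argument (suggested by its :obj:`estimators` attribute); the parameter values ``k=9`` and ``k=31`` are illustrative only and are not taken from this module::

  import Orange

  housing = Orange.data.Table("housing.tab")
  knn = Orange.classification.knn.kNNLearner()

  # Compute only the Mahalanobis and local cross validation estimates.
  estimators = [Orange.evaluation.reliability.Mahalanobis(k=9),
                Orange.evaluation.reliability.LocalCrossValidation(k=31)]

  reliability = Orange.evaluation.reliability.Learner(knn, estimators=estimators)
  restimator = reliability(housing)

  # Estimate the reliability of the prediction for the first instance.
  value, probabilities = restimator(housing[0], Orange.core.GetBoth)
  for estimate in probabilities.reliability_estimate:
      print estimate.method_name, estimate.estimate
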
    31 Reliability estimation can also be run on a whole data table, not only on a single 
    32 instance. The next example runs cross validation on the desired data table, 
    33 using the default reliability estimates, and at the end outputs the reliability 
    34 estimates for the first instance of the data table 
    35 (:download:`reliability-run.py <code/reliability-run.py>`, uses :download:`housing.tab <code/housing.tab>`): 
    36  
    37 .. literalinclude:: code/reliability-run.py 
    38  
    39 Reliability estimation methods are computationally quite demanding, so it may 
    40 take a while for this script to produce a result. In the above example we 
    41 first create the learner that we are interested in, in this case 
    42 k-nearest-neighbours, wrap it into a reliability learner and run cross 
    43 validation to get the results. We then output all the reliability estimates, 
    44 together with their names, for the first instance in the data table. 
    45  
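A hedged sketch of such a run, under the same assumptions as the sketch above (using the default estimator set when none is given is itself an assumption about the :class:`Learner` constructor)::

  import Orange

  housing = Orange.data.Table("housing.tab")
  knn = Orange.classification.knn.kNNLearner()

  # Wrap the learner into a reliability estimation learner and cross-validate.
  reliability = Orange.evaluation.reliability.Learner(knn)
  results = Orange.evaluation.testing.cross_validation([reliability], housing)

  # Reliability estimates attached to the first tested instance.
  for estimate in results.results[0].probabilities[0].reliability_estimate:
      print estimate.method_name, estimate.estimate
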
    46 Reliability Methods 
    47 =================== 
    48  
    49 Sensitivity Analysis (SAvar and SAbias) 
    50 --------------------------------------- 
    51 .. autoclass:: SensitivityAnalysis 
    52  
    53 Variance of bagged models (BAGV) 
    54 -------------------------------- 
    55 .. autoclass:: BaggingVariance 
    56  
    57 Local cross validation reliability estimate (LCV) 
    58 ------------------------------------------------- 
    59 .. autoclass:: LocalCrossValidation 
    60  
    61 Local modeling of prediction error (CNK) 
    62 ---------------------------------------- 
    63 .. autoclass:: CNeighbours 
    64  
    65 Bagging variance c-neighbours (BVCK) 
    66 ------------------------------------ 
    67  
    68 .. autoclass:: BaggingVarianceCNeighbours 
    69  
    70 Mahalanobis distance 
    71 -------------------- 
    72  
    73 .. autoclass:: Mahalanobis 
    74  
    75 Mahalanobis to center 
    76 --------------------- 
    77  
    78 .. autoclass:: MahalanobisToCenter 
    79  
    80 Reliability estimate learner 
    81 ============================ 
    82  
    83 .. autoclass:: Learner 
    84     :members: 
    85  
    86 Reliability estimation scoring methods 
    87 ====================================== 
    88  
    89 .. autofunction:: get_pearson_r 
    90  
    91 .. autofunction:: get_pearson_r_by_iterations 
    92  
    93 .. autofunction:: get_spearman_r 
    94  
    95 Referencing 
    96 =========== 
    97  
    98 There is a dictionary named :data:`METHOD_NAME` which stores the names of 
    99 all the reliability estimates:: 
    100  
    101   METHOD_NAME = {0: "SAvar absolute", 1: "SAbias signed", 2: "SAbias absolute", 
    102                  3: "BAGV absolute", 4: "CNK signed", 5: "CNK absolute", 
    103                  6: "LCV absolute", 7: "BVCK_absolute", 8: "Mahalanobis absolute", 
    104                  10: "ICV"} 
    105  
    106 and two constants denoting whether an estimate is a signed or an 
    107 absolute value:: 
    108  
    109   SIGNED = 0 
    110   ABSOLUTE = 1 
    111  
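For illustration, a short snippet that keeps only the absolute-valued estimates of a single prediction; it assumes a ``probabilities`` object obtained from a reliability classifier as in the sketches above::

  from Orange.evaluation import reliability

  # Keep estimates whose method returns an absolute value, then print their names.
  absolute_estimates = [est for est in probabilities.reliability_estimate
                        if est.signed_or_absolute == reliability.ABSOLUTE]
  for est in absolute_estimates:
      print reliability.METHOD_NAME[est.method], est.estimate
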
    112 Example of usage 
    113 ================ 
    114  
    115 Here we walk through a somewhat longer example of how to use the reliability 
    116 estimation module (:download:`reliability-long.py <code/reliability-long.py>`, uses :download:`prostate.tab <code/prostate.tab>`): 
    117  
    118 .. literalinclude:: code/reliability-long.py 
    119     :lines: 1-16 
    120  
    121 After loading the Orange library we open our data set. We choose to work with 
    122 kNNLearner, which also works on regression problems. We create our reliability 
    123 estimation learner and test it with cross validation. 
    124 The estimates are then compared to the prediction error using Pearson's coefficient. 
    125 The p-values are also computed:: 
    126  
    127   Estimate               r       p 
    128   SAvar absolute        -0.077   0.454 
    129   SAbias signed         -0.165   0.105 
    130   SAbias absolute       -0.099   0.333 
    131   BAGV absolute          0.104   0.309 
    132   CNK signed             0.233   0.021 
    133   CNK absolute           0.057   0.579 
    134   LCV absolute           0.069   0.504 
    135   BVCK_absolute          0.092   0.368 
    136   Mahalanobis absolute   0.091   0.375 
    137  
    138 .. literalinclude:: code/reliability-long.py 
    139     :lines: 18-28 
    140  
    141 Outputs:: 
    142  
    143   Estimate               r       p 
    144   BAGV absolute          0.126   0.220 
    145   CNK signed             0.233   0.021 
    146   CNK absolute           0.057   0.579 
    147   LCV absolute           0.069   0.504 
    148   BVCK_absolute          0.105   0.305 
    149   Mahalanobis absolute   0.091   0.375 
    150  
    151  
    152 As you can see in the above code, you can also choose which reliability estimation 
    153 methods you want to use. You might want to do this to reduce computation time 
    154 or because you think some of them do not perform well enough. 
    155  
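For instance, a learner that computes only the CNK and Mahalanobis estimates might be constructed as follows; as before, the ``estimators`` keyword argument and the ``k`` values are illustrative assumptions::

  import Orange

  # Restrict the wrapper to two (comparatively cheap) estimators.
  reliability = Orange.evaluation.reliability.Learner(
      Orange.classification.knn.kNNLearner(),
      estimators=[Orange.evaluation.reliability.CNeighbours(k=5),
                  Orange.evaluation.reliability.Mahalanobis(k=3)])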
    156  
    157 References 
    158 ========== 
    159  
    160 Bosnic Z, Kononenko I (2007) `Estimation of individual prediction reliability using local 
    161 sensitivity analysis. <http://www.springerlink.com/content/e27p2584387532g8/>`_ 
    162 *Applied Intelligence* 29(3), 187-203. 
    163  
    164 Bosnic Z, Kononenko I (2008) `Comparison of approaches for estimating reliability of  
    165 individual regression predictions. 
    166 <http://www.sciencedirect.com/science/article/pii/S0169023X08001080>`_ 
    167 *Data & Knowledge Engineering* 67(3), 504-516. 
    168  
    169 Bosnic Z, Kononenko I (2010) `Automatic selection of reliability estimates for individual  
    170 regression predictions. 
    171 <http://journals.cambridge.org/abstract_S0269888909990154>`_ 
    172 *The Knowledge Engineering Review* 25(1), 27-47. 
    173  
    174 """ 
    1751import Orange 
    1762 
     
    23662def get_pearson_r(res): 
    23763    """ 
    238     Returns Pearsons coefficient between the prediction error and each of the 
    239     used reliability estimates. Function also return the p-value of each of 
     64    :param res: results of evaluation, done using learners, 
     65        wrapped into :class:`Orange.evaluation.reliability.Classifier`. 
     66    :type res: :class:`Orange.evaluation.testing.ExperimentResults` 
     67 
     68    Return Pearson's coefficient between the prediction error and each of the 
     69    used reliability estimates. Also, return the p-value of each of 
    24070    the coefficients. 
    24171    """ 
     
    25686def get_spearman_r(res): 
    25787    """ 
    258     Returns Spearmans coefficient between the prediction error and each of the 
    259     used reliability estimates. Function also return the p-value of each of 
     88    :param res: results of evaluation, done using learners, 
     89        wrapped into :class:`Orange.evaluation.reliability.Classifier`. 
     90    :type res: :class:`Orange.evaluation.testing.ExperimentResults` 
     91 
     92    Return Spearman's coefficient between the prediction error and each of the 
     93    used reliability estimates. Also, return the p-value of each of 
    26094    the coefficients. 
    26195    """ 
     
    276110def get_pearson_r_by_iterations(res): 
    277111    """ 
    278     Returns average Pearsons coefficient over all folds between prediction error 
     112    :param res: results of evaluation, done using learners, 
     113        wrapped into :class:`Orange.evaluation.reliability.Classifier`. 
     114    :type res: :class:`Orange.evaluation.testing.ExperimentResults` 
     115 
     116    Return average Pearson's coefficient over all folds between prediction error 
    279117    and each of the used estimates. 
    280118    """ 
    281119    results_by_fold = Orange.evaluation.scoring.split_by_iterations(res) 
    282120    number_of_estimates = len(res.results[0].probabilities[0].reliability_estimate) 
    283     number_of_examples = len(res.results) 
     121    number_of_instances = len(res.results) 
    284122    number_of_folds = len(results_by_fold) 
    285123    results = [0 for _ in xrange(number_of_estimates)] 
     
    304142    # Calculate p-values 
    305143    results = [float(res) / number_of_folds for res in results] 
    306     ps = [p_value_from_r(r, number_of_examples) for r in results] 
     144    ps = [p_value_from_r(r, number_of_instances) for r in results] 
    307145     
    308146    return zip(results, ps, sig, method_list) 
     
    317155 
    318156class Estimate: 
     157    """ 
     158    Reliability estimate. Contains attributes that describe the results of 
     159    reliability estimation. 
     160 
     161    .. attribute:: estimate 
     162 
     163        A numerical reliability estimate. 
     164 
     165    .. attribute:: signed_or_absolute 
     166 
     167        Determines whether the method used gives a signed or absolute result. 
     168        Has a value of either :obj:`SIGNED` or :obj:`ABSOLUTE`. 
     169 
     170    .. attribute:: method 
     171 
     172        An integer ID of reliability estimation method used. 
     173 
     174    .. attribute:: method_name 
     175 
     176        Name (string) of reliability estimation method used. 
     177 
     178    .. attribute:: icv_method 
     179 
     180        An integer ID of reliability estimation method that performed best, 
     181        as determined by ICV, and of which estimate is stored in the 
     182        :obj:`estimate` field. (:obj:`None` when ICV was not used.) 
     183 
     184    .. attribute:: icv_method_name 
     185 
     186        Name (string) of reliability estimation method that performed best, 
     187        as determined by ICV. (:obj:`None` when ICV was not used.) 
     188 
     189    """ 
    319190    def __init__(self, estimate, signed_or_absolute, method, icv_method = -1): 
    320191        self.estimate = estimate 
     
    332203        self.estimator = estimator 
    333204     
    334     def __call__(self, examples, weight=None, **kwds): 
     205    def __call__(self, instances, weight=None, **kwds): 
    335206         
    336207        # Calculate borders using cross validation 
    337         res = Orange.evaluation.testing.cross_validation([self.estimator], examples) 
     208        res = Orange.evaluation.testing.cross_validation([self.estimator], instances) 
    338209        all_borders = [] 
    339210        for i in xrange(len(res.results[0].probabilities[0].reliability_estimate)): 
     
    344215         
    345216        # Learn on whole train data 
    346         estimator_classifier = self.estimator(examples) 
     217        estimator_classifier = self.estimator(instances) 
    347218         
    348219        return DescriptiveAnalysisClassifier(estimator_classifier, all_borders, self.desc) 
     
    354225        self.desc = desc 
    355226     
    356     def __call__(self, example, result_type=Orange.core.GetValue): 
    357         predicted, probabilities = self.estimator_classifier(example, Orange.core.GetBoth) 
     227    def __call__(self, instance, result_type=Orange.core.GetValue): 
     228        predicted, probabilities = self.estimator_classifier(instance, Orange.core.GetBoth) 
    358229         
    359230        for borders, estimate in zip(self.all_borders, probabilities.reliability_estimate): 
     
    374245    """ 
    375246     
    376     :param e: List of possible e values for SAvar and SAbias reliability estimates, the default value is [0.01, 0.1, 0.5, 1.0, 2.0]. 
     247    :param e: List of possible :math:`\epsilon` values for SAvar and SAbias 
     248        reliability estimates. 
    377249    :type e: list of floats 
    378250     
    379251    :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier` 
    380252     
    381     To estimate the reliabilty for given example we extend the learning set  
    382     with given example and labeling it with :math:`K + \epsilon (l_{max} - l_{min})`, 
    383     where K denotes the initial prediction, :math:`\epsilon` is sensitivity parameter and 
    384     :math:`l_{min}` and :math:`l_{max}` denote lower and the upper bound of 
    385     the learning examples. After computing different sensitivity predictions 
    386     using different values of e, the prediction are combined into SAvar and SAbias. 
    387     SAbias can be used as signed estimate or as absolute value of SAbias.  
     253    To estimate the reliability of the prediction for a given instance, 
     254    the learning set is extended with this instance, labeled with 
     255    :math:`K + \epsilon (l_{max} - l_{min})`, 
     256    where :math:`K` denotes the initial prediction, 
     257    :math:`\epsilon` is the sensitivity parameter, and :math:`l_{min}` and 
     258    :math:`l_{max}` denote the lower and upper bounds of the learning 
     259    instances' labels. After computing different sensitivity predictions 
     260    using different values of :math:`\epsilon`, the predictions are combined 
     261    into SAvar and SAbias. SAbias can be used in a signed or absolute form. 
    388262 
    389263    :math:`SAvar = \\frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}` 
     
    396270        self.e = e 
    397271     
    398     def __call__(self, examples, learner): 
    399         min_value = max_value = examples[0].getclass().value 
    400         for ex in examples: 
     272    def __call__(self, instances, learner): 
     273        min_value = max_value = instances[0].getclass().value 
     274        for ex in instances: 
    401275            if ex.getclass().value > max_value: 
    402276                max_value = ex.getclass().value 
    403277            if ex.getclass().value < min_value: 
    404278                min_value = ex.getclass().value 
    405         return SensitivityAnalysisClassifier(self.e, examples, min_value, max_value, learner) 
     279        return SensitivityAnalysisClassifier(self.e, instances, min_value, max_value, learner) 
    406280     
    407281class SensitivityAnalysisClassifier: 
    408     def __init__(self, e, examples, min_value, max_value, learner): 
     282    def __init__(self, e, instances, min_value, max_value, learner): 
    409283        self.e = e 
    410         self.examples = examples 
     284        self.instances = instances 
    411285        self.max_value = max_value 
    412286        self.min_value = min_value 
    413287        self.learner = learner 
    414288     
    415     def __call__(self, example, predicted, probabilities): 
     289    def __call__(self, instance, predicted, probabilities): 
    416290        # Create new dataset 
    417         r_data = Orange.data.Table(self.examples) 
    418          
    419         # Create new example 
    420         modified_example = Orange.data.Instance(example) 
     291        r_data = Orange.data.Table(self.instances) 
     292         
     293        # Create new instance 
     294        modified_instance = Orange.data.Instance(instance) 
    421295         
    422296        # Append it to the data 
    423         r_data.append(modified_example) 
     297        r_data.append(modified_instance) 
    424298         
    425299        # Calculate SAvar & SAbias 
     
    430304            r_data[-1].setclass(predicted.value + eps*(self.max_value - self.min_value)) 
    431305            c = self.learner(r_data) 
    432             k_plus = c(example, Orange.core.GetValue) 
     306            k_plus = c(instance, Orange.core.GetValue) 
    433307             
    434308            # -epsilon 
    435309            r_data[-1].setclass(predicted.value - eps*(self.max_value - self.min_value)) 
    436310            c = self.learner(r_data) 
    437             k_minus = c(example, Orange.core.GetValue) 
     311            k_minus = c(instance, Orange.core.GetValue) 
    438312            #print len(r_data) 
    439313            #print eps*(self.max_value - self.min_value) 
     
    454328    """ 
    455329     
    456     :param m: Number of bagged models to be used with BAGV estimate 
     330    :param m: Number of bagging models to be used with BAGV estimate 
    457331    :type m: int 
    458332     
    459333    :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier` 
    460334     
    461     We construct m different bagging models of the original chosen learner and use 
    462     those predictions (:math:`K_i, i = 1, ..., m`) of given example to calculate the variance, which we use as 
    463     reliability estimator. 
     335    :math:`m` different bagging models are constructed and used to estimate 
     336    the value of the dependent variable for a given instance. The variance of 
     337    these predictions is used as the prediction reliability estimate. 
    464338 
    465339    :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2` 
    466340 
    467     where 
    468  
    469     :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` 
     341    where :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are 
     342    predictions of individual constructed models. 
    470343     
    471344    """ 
     
    473346        self.m = m 
    474347     
    475     def __call__(self, examples, learner): 
     348    def __call__(self, instances, learner): 
    476349        classifiers = [] 
    477350         
    478351        # Create bagged classifiers using sampling with replacement 
    479352        for _ in xrange(self.m): 
    480             selection = select_with_repeat(len(examples)) 
    481             data = examples.select(selection) 
     353            selection = select_with_repeat(len(instances)) 
     354            data = instances.select(selection) 
    482355            classifiers.append(learner(data)) 
    483356        return BaggingVarianceClassifier(classifiers) 
     
    487360        self.classifiers = classifiers 
    488361     
    489     def __call__(self, example, *args): 
     362    def __call__(self, instance, *args): 
    490363        BAGV = 0 
    491364         
    492365        # Calculate the bagging variance 
    493         bagged_values = [c(example, Orange.core.GetValue).value for c in self.classifiers if c is not None] 
     366        bagged_values = [c(instance, Orange.core.GetValue).value for c in self.classifiers if c is not None] 
    494367         
    495368        k = sum(bagged_values) / len(bagged_values) 
     
    507380    :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier` 
    508381     
    509     We find k nearest neighbours to the given example and put them in 
    510     seperate dataset. On this dataset we do leave one out 
    511     validation using given model. Reliability estimate is then distance 
    512     weighted absolute prediction error. 
    513      
    514     1. define the set of k nearest neighours :math:`N = { (x_1, x_1),..., (x_k, c_k)}` 
    515     2. FOR EACH :math:`(x_i, c_i) \in N` 
    516      
    517       2.1. generare model M on :math:`N \\backslash (x_i, c_i)` 
    518      
    519       2.2. for :math:`(x_i, c_i)` compute LOO prediction :math:`K_i` 
    520      
    521       2.3. for :math:`(x_i, c_i)` compute LOO error :math:`E_i = | C_i - K_i |` 
    522      
     382    The :math:`k` nearest neighbours of the given instance are found and put in 
     383    a separate data set. On this data set, leave-one-out validation is 
     384    performed. The reliability estimate is then the distance-weighted absolute 
     385    prediction error. 
     386 
     387    If the special value 0 is passed as :math:`k` (as it is by default), 
     388    it is set to 1/20 of the data set size (or 5, whichever is greater). 
     389     
     390    1. Determine the set of :math:`k` nearest neighbours :math:`N = \\{ (x_1, c_1),..., 
     391       (x_k, c_k)\\}`. 
     392    2. On this set, compute leave-one-out predictions :math:`K_i` and 
     393       prediction errors :math:`E_i = | C_i - K_i |`. 
    523394    3. :math:`LCV(x) = \\frac{ \sum_{(x_i, c_i) \in N} d(x_i, x) * E_i }{ \sum_{(x_i, c_i) \in N} d(x_i, x) }` 
    524395     
     
    527398        self.k = k 
    528399     
    529     def __call__(self, examples, learner): 
     400    def __call__(self, instances, learner): 
    530401        nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor() 
    531402        nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor() 
    532403         
    533404        distance_id = Orange.data.new_meta_id() 
    534         nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id) 
     405        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id) 
    535406         
    536407        if self.k == 0: 
    537             self.k = max(5, len(examples)/20) 
     408            self.k = max(5, len(instances)/20) 
    538409         
    539410        return LocalCrossValidationClassifier(distance_id, nearest_neighbours, self.k, learner) 
     
    546417        self.learner = learner 
    547418     
    548     def __call__(self, example, *args): 
     419    def __call__(self, instance, *args): 
    549420        LCVer = 0 
    550421        LCVdi = 0 
     
    552423        # Find k nearest neighbors 
    553424         
    554         knn = [ex for ex in self.nearest_neighbours(example, self.k)] 
     425        knn = [ex for ex in self.nearest_neighbours(instance, self.k)] 
    555426         
    556427        # leave one out of prediction error 
     
    581452    :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier` 
    582453     
    583     Estimate CNK is defined for unlabeled example as difference between 
    584     average label of the nearest neighbours and the examples prediction. CNK can 
    585     be used as a signed estimate or only as absolute value.  
     454    CNK is defined for an unlabeled instance as the difference between the average 
     455    label of its nearest neighbours and its prediction. CNK can be used as a 
     456    signed or absolute estimate. 
    586457     
    587458    :math:`CNK = \\frac{\sum_{i=1}^{k}C_i}{k} - K` 
    588459     
    589     Where k denotes number of neighbors, C :sub:`i` denotes neighbours' labels and 
    590     K denotes the example's prediction. 
     460    where :math:`k` denotes the number of neighbours, :math:`C_i` denotes the 
     461    neighbours' labels and :math:`K` denotes the instance's prediction. 
    591462     
    592463    """ 
     
    594465        self.k = k 
    595466     
    596     def __call__(self, examples, learner): 
     467    def __call__(self, instances, learner): 
    597468        nearest_neighbours_constructor = Orange.classification.knn.FindNearestConstructor() 
    598469        nearest_neighbours_constructor.distanceConstructor = Orange.distance.EuclideanConstructor() 
    599470         
    600471        distance_id = Orange.data.new_meta_id() 
    601         nearest_neighbours = nearest_neighbours_constructor(examples, 0, distance_id) 
     472        nearest_neighbours = nearest_neighbours_constructor(instances, 0, distance_id) 
    602473        return CNeighboursClassifier(nearest_neighbours, self.k) 
    603474 
     
    607478        self.k = k 
    608479     
    609     def __call__(self, example, predicted, probabilities): 
     480    def __call__(self, instance, predicted, probabilities): 
    610481        CNK = 0 
    611482         
    612483        # Find k nearest neighbors 
    613484         
    614         knn = [ex for ex in self.nearest_neighbours(example, self.k)] 
     485        knn = [ex for ex in self.nearest_neighbours(instance, self.k)] 
    615486         
    616487        # average label of neighbors 
     
    627498    """ 
    628499     
    629     :param k: Number of nearest neighbours used in Mahalanobis estimate 
     500    :param k: Number of nearest neighbours used in Mahalanobis estimate. 
    630501    :type k: int 
    631502     
    632503    :rtype: :class:`Orange.evaluation.reliability.MahalanobisClassifier` 
    633504     
    634     Mahalanobis distance estimate is defined as `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the 
    635     k nearest neighbours of chosen example. 
     505    The Mahalanobis distance reliability estimate is defined as the sum of 
     506    `Mahalanobis distances <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ 
     507    to the evaluated instance's :math:`k` nearest neighbours. 
    636508 
    637509     
     
    640512        self.k = k 
    641513     
    642     def __call__(self, examples, *args): 
     514    def __call__(self, instances, *args): 
    643515        nnm = Orange.classification.knn.FindNearestConstructor() 
    644516        nnm.distanceConstructor = Orange.distance.MahalanobisConstructor() 
    645517         
    646518        mid = Orange.data.new_meta_id() 
    647         nnm = nnm(examples, 0, mid) 
     519        nnm = nnm(instances, 0, mid) 
    648520        return MahalanobisClassifier(self.k, nnm, mid) 
    649521 
     
    654526        self.mid = mid 
    655527     
    656     def __call__(self, example, *args): 
     528    def __call__(self, instance, *args): 
    657529        mahalanobis_distance = 0 
    658530         
    659         mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(example, self.k)) 
     531        mahalanobis_distance = sum(ex[self.mid].value for ex in self.nnm(instance, self.k)) 
    660532         
    661533        return [ Estimate(mahalanobis_distance, ABSOLUTE, MAHAL_ABSOLUTE) ] 
     
    665537    :rtype: :class:`Orange.evaluation.reliability.MahalanobisToCenterClassifier` 
    666538     
    667     Mahalanobis distance to center estimate is defined as `mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ to the 
    668     centroid of the data. 
     539    The Mahalanobis distance to center reliability estimate is defined as the 
     540    `Mahalanobis distance <http://en.wikipedia.org/wiki/Mahalanobis_distance>`_ 
     541    between the evaluated instance and the centroid of the data. 
    669542 
    670543     
     
    673546        pass 
    674547     
    675     def __call__(self, examples, *args): 
     548    def __call__(self, instances, *args): 
    676549        dc = Orange.core.DomainContinuizer() 
    677550        dc.classTreatment = Orange.core.DomainContinuizer.Ignore 
     
    679552        dc.multinomialTreatment = Orange.core.DomainContinuizer.NValues 
    680553         
    681         new_domain = dc(examples) 
    682         new_examples = examples.translate(new_domain) 
    683          
    684         X, _, _ = new_examples.to_numpy() 
    685         example_avg = numpy.average(X, 0) 
     554        new_domain = dc(instances) 
     555        new_instances = instances.translate(new_domain) 
     556         
     557        X, _, _ = new_instances.to_numpy() 
     558        instance_avg = numpy.average(X, 0) 
    686559         
    687560        distance_constructor = Orange.distance.MahalanobisConstructor() 
    688         distance = distance_constructor(new_examples) 
    689          
    690         average_example = Orange.data.Instance(new_examples.domain, list(example_avg) + ["?"]) 
    691          
    692         return MahalanobisToCenterClassifier(distance, average_example, new_domain) 
     561        distance = distance_constructor(new_instances) 
     562         
     563        average_instance = Orange.data.Instance(new_instances.domain, list(instance_avg) + ["?"]) 
     564         
     565        return MahalanobisToCenterClassifier(distance, average_instance, new_domain) 
    693566 
    694567class MahalanobisToCenterClassifier: 
    695     def __init__(self, distance, average_example, new_domain): 
     568    def __init__(self, distance, average_instance, new_domain): 
    696569        self.distance = distance 
    697         self.average_example = average_example 
     570        self.average_instance = average_instance 
    698571        self.new_domain = new_domain 
    699572     
    700     def __call__(self, example, *args): 
    701          
    702         ex = Orange.data.Instance(self.new_domain, example) 
    703          
    704         mahalanobis_to_center = self.distance(ex, self.average_example) 
     573    def __call__(self, instance, *args): 
     574         
     575        inst = Orange.data.Instance(self.new_domain, instance) 
     576         
     577        mahalanobis_to_center = self.distance(inst, self.average_instance) 
    705578         
    706579        return [ Estimate(mahalanobis_to_center, ABSOLUTE, MAHAL_TO_CENTER_ABSOLUTE) ] 
     
    718591    :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier` 
    719592     
    720     BVCK is a combination of Bagging variance and local modeling of prediction 
    721     error, for this estimate we take the average of both. 
     593    BVCK is the average of the bagging variance (BAGV) and the local modeling 
     594    of prediction error (CNK) estimates. 
    722595     
    723596    """ 
     
    726599        self.cnk = cnk 
    727600     
    728     def __call__(self, examples, learner): 
    729         bagv_classifier = self.bagv(examples, learner) 
    730         cnk_classifier = self.cnk(examples, learner) 
     601    def __call__(self, instances, learner): 
     602        bagv_classifier = self.bagv(instances, learner) 
     603        cnk_classifier = self.cnk(instances, learner) 
    731604        return BaggingVarianceCNeighboursClassifier(bagv_classifier, cnk_classifier) 
    732605 
     
    736609        self.cnk_classifier = cnk_classifier 
    737610     
    738     def __call__(self, example, predicted, probabilities): 
    739         bagv_estimates = self.bagv_classifier(example, predicted, probabilities) 
    740         cnk_estimates = self.cnk_classifier(example, predicted, probabilities) 
     611    def __call__(self, instance, predicted, probabilities): 
     612        bagv_estimates = self.bagv_classifier(instance, predicted, probabilities) 
     613        cnk_estimates = self.cnk_classifier(instance, predicted, probabilities) 
    741614         
    742615        bvck_value = (bagv_estimates[0].estimate + cnk_estimates[1].estimate)/2 
     
    750623        pass 
    751624     
    752     def __call__(self, examples, learner): 
    753         res = Orange.evaluation.testing.cross_validation([learner], examples) 
     625    def __call__(self, instances, learner): 
     626        res = Orange.evaluation.testing.cross_validation([learner], instances) 
    754627        prediction_errors = get_prediction_error_list(res) 
    755628         
    756         new_domain = Orange.data.Domain(examples.domain.attributes, Orange.core.FloatVariable("pe")) 
    757         new_dataset = Orange.data.Table(new_domain, examples) 
    758          
    759         for example, prediction_error in izip(new_dataset, prediction_errors): 
    760             example.set_class(prediction_error) 
     629        new_domain = Orange.data.Domain(instances.domain.attributes, Orange.core.FloatVariable("pe")) 
     630        new_dataset = Orange.data.Table(new_domain, instances) 
     631         
     632        for instance, prediction_error in izip(new_dataset, prediction_errors): 
     633            instance.set_class(prediction_error) 
    761634         
    762635        rf = Orange.ensemble.forest.RandomForestLearner() 
     
    770643        self.new_domain = new_domain 
    771644     
    772     def __call__(self, example, predicted, probabilities): 
    773         new_example = Orange.data.Instance(self.new_domain, example) 
    774         value = self.rf_classifier(new_example, Orange.core.GetValue) 
     645    def __call__(self, instance, predicted, probabilities): 
     646        new_instance = Orange.data.Instance(self.new_domain, instance) 
     647        value = self.rf_classifier(new_instance, Orange.core.GetValue) 
    775648         
    776649        return [Estimate(value.value, SIGNED, SABIAS_SIGNED)] 
     
    780653    Reliability estimation wrapper around a learner we want to test. 
    781654    Different reliability estimation algorithms can be used on the 
    782     chosen learner. This learner works as any other and can be used as one. 
    783     The only difference is when the classifier is called with a given 
    784     example instead of only return the value and probabilities, it also 
    785     attaches a list of reliability estimates to  
    786     :data:`probabilities.reliability_estimate`. 
    787     Each reliability estimate consists of a tuple  
    788     (estimate, signed_or_absolute, method). 
    789      
    790     :param box_learner: Learner we want to wrap into reliability estimation 
     655    chosen learner. This learner works like any other learner, 
     656    but it returns a classifier wrapped into an instance of 
     657    :class:`Orange.evaluation.reliability.Classifier`. 
     658     
     659    :param box_learner: Learner we want to wrap into a reliability estimation 
     660        classifier. 
    791661    :type box_learner: learner 
    792662     
     
    815685         
    816686     
    817     def __call__(self, examples, weight=None, **kwds): 
     687    def __call__(self, instances, weight=None, **kwds): 
    818688        """Learn from the given table of data instances. 
    819689         
     
    828698        new_domain = None 
    829699         
    830         if examples.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous: 
     700        if instances.domain.class_var.var_type != Orange.data.variable.Continuous.Continuous: 
    831701            raise Exception("This method only works on data with continuous class.") 
    832702         
    833         return Classifier(examples, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier) 
    834      
    835     def internal_cross_validation(self, examples, folds=10): 
    836         """ Performs the ususal internal cross validation for getting the best 
    837         reliability estimate. It uses the reliability estimators defined in  
    838         estimators attribute. Returns the id of the method that scored the  
    839         best. """ 
    840         res = Orange.evaluation.testing.cross_validation([self], examples, folds=folds) 
     703        return Classifier(instances, self.box_learner, self.estimators, self.blending, new_domain, blending_classifier) 
     704     
     705    def internal_cross_validation(self, instances, folds=10): 
     706        """ Perform the internal cross validation for getting the best 
     707        reliability estimate. It uses the reliability estimators defined in 
     708        estimators attribute. 
     709 
     710        Returns the id of the method that scored the best. 
     711 
     712        :param instances: Data instances to use for ICV. 
     713        :type instances: :class:`Orange.data.Table` 
     714        :param folds: number of folds for ICV. 
     715        :type folds: int 
     716        :rtype: int 
     717 
     718        """ 
     719        res = Orange.evaluation.testing.cross_validation([self], instances, folds=folds) 
    841720        results = get_pearson_r(res) 
    842721        sorted_results = sorted(results) 
    843722        return sorted_results[-1][3] 
    844723     
    845     def internal_cross_validation_testing(self, examples, folds=10): 
    846         """ Performs internal cross validation (as in Automatic selection of 
     724    def internal_cross_validation_testing(self, instances, folds=10): 
     725        """ Perform internal cross validation (as in Automatic selection of 
    847726        reliability estimates for individual regression predictions, 
    848         Zoran Bosnic 2010) and return id of the method 
    849         that scored best on this data. """ 
    850         cv_indices = Orange.core.MakeRandomIndicesCV(examples, folds) 
     727        Zoran Bosnic, 2010) and return the id of the method 
     728        that scored best on this data. 
     729 
     730        :param instances: Data instances to use for ICV. 
     731        :type instances: :class:`Orange.data.Table` 
     732        :param folds: number of folds for ICV. 
     733        :type folds: int 
     734        :rtype: int 
     735 
     736        """ 
     737        cv_indices = Orange.core.MakeRandomIndicesCV(instances, folds) 
    851738         
    852739        list_of_rs = [] 
     
    855742         
    856743        for fold in xrange(folds): 
    857             data = examples.select(cv_indices, fold) 
     744            data = instances.select(cv_indices, fold) 
    858745            if len(data) < 10: 
    859746                res = Orange.evaluation.testing.leave_one_out([self], data) 
     
    869756 
    870757class Classifier: 
    871     def __init__(self, examples, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds): 
     758    """ 
     759    A reliability estimation wrapper for classifiers. 
     760 
     761    What distinguishes this classifier is that the returned probabilities (if 
     762    :obj:`Orange.classification.Classifier.GetProbabilities` or 
     763    :obj:`Orange.classification.Classifier.GetBoth` is passed) contain an 
     764    additional attribute :obj:`reliability_estimate`, which is a list of 
     765    :class:`~Orange.evaluation.reliability.Estimate` instances. 
     766 
     767    """ 
     768 
     769    def __init__(self, instances, box_learner, estimators, blending, blending_domain, rf_classifier, **kwds): 
    872770        self.__dict__.update(kwds) 
    873         self.examples = examples 
     771        self.instances = instances 
    874772        self.box_learner = box_learner 
    875773        self.estimators = estimators 
     
    879777         
    880778        # Train the learner with original data 
    881         self.classifier = box_learner(examples) 
     779        self.classifier = box_learner(instances) 
    882780         
    883781        # Train all the estimators and create their classifiers 
    884         self.estimation_classifiers = [estimator(examples, box_learner) for estimator in estimators] 
    885      
    886     def __call__(self, example, result_type=Orange.core.GetValue): 
     782        self.estimation_classifiers = [estimator(instances, box_learner) for estimator in estimators] 
     783     
     784    def __call__(self, instance, result_type=Orange.core.GetValue): 
    887785        """ 
    888         Classify and estimate a new instance. When you chose  
    889         Orange.core.GetBoth or Orange.core.getProbabilities, you can access  
    890         the reliability estimates inside probabilities.reliability_estimate. 
     786        Classify a new instance and estimate the reliability of its prediction. 
     787        When :obj:`result_type` is set to 
     788        :obj:`Orange.classification.Classifier.GetBoth` or 
     789        :obj:`Orange.classification.Classifier.GetProbabilities`, 
     790        an additional attribute :obj:`reliability_estimate`, 
     791        which is a list of 
     792        :class:`~Orange.evaluation.reliability.Estimate` instances, 
     793        is added to the distribution object. 
    891794         
    892795        :param instance: instance to be classified. 
     
    899802              :class:`Orange.statistics.Distribution` or a tuple with both 
    900803        """ 
    901         predicted, probabilities = self.classifier(example, Orange.core.GetBoth) 
     804        predicted, probabilities = self.classifier(instance, Orange.core.GetBoth) 
    902805         
    903806        # Create a place holder for estimates 
     
    910813        # Calculate all the estimates and add them to the results 
    911814        for estimate in self.estimation_classifiers: 
    912             probabilities.reliability_estimate.extend(estimate(example, predicted, probabilities)) 
     815            probabilities.reliability_estimate.extend(estimate(instance, predicted, probabilities)) 
    913816         
    914817        # Return the appropriate type of result 