Changeset 53:ba8bc7d59e7a in orange-reliability


Timestamp: 10/04/13 13:35:18 (6 months ago)
Author: markotoplak
Branch: default
Message: Updates to documentation.
Files: 5 edited

  • README.rst

    r50 r53  
    22================== 
    33 
    4 Orange Reliability is an add-on for Orange_ data mining software package. It 
    5 extends Orange by providing functionality to estimate reliability of individual 
    6 regression and classification predictions. 
     4Orange Reliability is an add-on for the Orange_ data mining software package that 
     5enables the estimation of reliabilities for individual predictions. 
    76 
    87.. _Orange: http://orange.biolab.si/ 
  • docs/rst/Orange.evaluation.reliability.rst

    r42 r53  
    168168 
    169169.. [Pevec2011] Pevec, D., Štrumbelj, E., Kononenko, I. (2011) `Evaluating Reliability of Single Classifications of Neural Networks. <http://www.springerlink.com/content/48u881761h127r33/export-citation/>`_ *Adaptive and Natural Computing Algorithms*, 2011, pp. 22-30. 
     170 
     171.. [Wolpert1992] Wolpert, David H. (1992) `Stacked generalization.` *Neural Networks*, Vol. 5, 1992,  pp. 241-259. 
     172 
     173.. [Dzeroski2004] Dzeroski, S. and Zenko, B. (2004) `Is combining classifiers with stacking better than selecting the best one?` *Machine Learning*, Vol. 54, 2004,  pp. 255-273. 
  • docs/rst/index.rst

    r4 r53  
    22================================ 
    33 
    4 Orange Reliability is an add-on for Orange_ data mining software package. It 
    5 extends Orange by providing functionality to estimate reliability of individual 
    6 regression and classification predictions. 
     4Orange Reliability is an add-on for the Orange_ data mining software package that 
     5enables the estimation of reliabilities for individual predictions. 
    76 
    87.. _Orange: http://orange.biolab.si/ 
  • orangecontrib/reliability/__init__.py

    r42 r53  
    276276    """ 
    277277     
    278     :param e: List of possible :math:`\epsilon` values for SAvar and SAbias 
    279         reliability estimates. 
     278    :param e: Values of :math:`\epsilon` used to compute the SAvar and SAbias estimates. 
    280279    :type e: list of floats 
    281280     
    282281    :rtype: :class:`Orange.evaluation.reliability.SensitivityAnalysisClassifier` 
    283282     
    284     To estimate the reliability of prediction for given instance, 
    285     the learning set is extended with this instance, labeled with 
    286     :math:`K + \epsilon (l_{max} - l_{min})`, 
    287     where :math:`K` denotes the initial prediction, 
    288     :math:`\epsilon` is sensitivity parameter and :math:`l_{min}` and 
    289     :math:`l_{max}` denote lower and the upper bound of the learning 
    290     instances' labels. After computing different sensitivity predictions 
    291     using different values of :math:`\epsilon`, the prediction are combined 
    292     into SAvar and SAbias. SAbias can be used in a signed or absolute form. 
     283    To estimate the reliability of the prediction for a given instance, 
     284    the learning set is extended with that instance, with its label changed to 
     285    :math:`K + \epsilon (l_{max} - l_{min})` (:math:`K` is the initial prediction, 
     286    :math:`\epsilon` a sensitivity parameter, and :math:`l_{min}` and 
     287    :math:`l_{max}` the lower and upper bounds of labels on the training data). 
     288    Results for multiple values of :math:`\epsilon` are combined 
     289    into SAvar and SAbias. SAbias can be used either in a signed or an absolute form. 
    293290 
    294291    :math:`SAvar = \\frac{\sum_{\epsilon \in E}(K_{\epsilon} - K_{-\epsilon})}{|E|}` 
    295  
    296292    :math:`SAbias = \\frac{\sum_{\epsilon \in E} (K_{\epsilon} - K ) + (K_{-\epsilon} - K)}{2 |E|}` 
    297293     
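
As a rough illustration of the SAvar and SAbias formulas above (not the add-on's actual code), a minimal sketch in plain Python, assuming the perturbed predictions :math:`K_{\epsilon}` have already been obtained by retraining on the extended learning set::

    def sa_estimates(preds, K, epsilons):
        # preds[eps] is the prediction K_eps obtained after adding the instance
        # labelled with K + eps * (l_max - l_min) and retraining the learner
        sa_var = sum(preds[e] - preds[-e] for e in epsilons) / len(epsilons)
        sa_bias = sum((preds[e] - K) + (preds[-e] - K)
                      for e in epsilons) / (2 * len(epsilons))
        return sa_var, sa_bias

    # hypothetical perturbed predictions for epsilon in {0.5, 1.0}
    preds = {0.5: 10.4, -0.5: 9.8, 1.0: 10.9, -1.0: 9.5}
    print(sa_estimates(preds, K=10.0, epsilons=[0.5, 1.0]))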
     
    364360    :rtype: :class:`Orange.evaluation.reliability.ReferenceExpectedErrorClassifier` 
    365361 
    366     Reference reliability estimation method for classification [Pevec2011]_: 
    367  
    368     :math:`O_{ref} = 2 (\hat y - \hat y ^2) = 2 \hat y (1-\hat y)`, 
    369  
    370     where :math:`\hat y` is the estimated probability of the predicted class. 
    371  
    372     Note that for this method, in contrast with all others, a greater estimate means lower reliability (greater expected error). 
     362    Reference estimate for classification: :math:`O_{ref} = 2 (\hat y - \hat y ^2) = 2 \hat y (1-\hat y)`, where :math:`\hat y` is the estimated probability of the predicted class [Pevec2011]_. 
     363 
     364    A greater estimate means a greater expected error. 
    373365 
    374366    """ 
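
Since the reference estimate depends only on the predicted class probability, a one-line sketch is enough to illustrate it (``p_hat``, standing for :math:`\hat y`, is just an example value)::

    p_hat = 0.9                       # estimated probability of the predicted class
    o_ref = 2 * p_hat * (1 - p_hat)   # O_ref = 2 * y_hat * (1 - y_hat)
    print(o_ref)                      # 0.18 -> small expected error for a confident prediction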
     
    397389    :type m: int 
    398390     
     391    :param for_instances: Optional. If test instances
     392      are given as a parameter, this class can compute their reliabilities
     393      on the fly, which saves memory. 
     394
     395    :type for_instances: Orange.data.Table
     396     
    399397    :rtype: :class:`Orange.evaluation.reliability.BaggingVarianceClassifier` 
    400398     
    401     :math:`m` different bagging models are constructed and used to estimate 
    402     the value of dependent variable for a given instance. In regression, 
    403     the variance of those predictions is used as a prediction reliability 
    404     estimate. 
    405  
    406     :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2` 
    407  
    408     where :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are 
    409     predictions of individual constructed models. Note that a greater value 
    410     implies greater error. 
     399     
     400    :math:`m` different bagging models are used to estimate 
     401    the value of the dependent variable for a given instance. For regression, 
     402    the variance of their predictions is the reliability 
     403    estimate: 
     404 
     405    :math:`BAGV = \\frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2`, where  
     406    :math:`K = \\frac{\sum_{i=1}^{m} K_i}{m}` and :math:`K_i` are 
     407    predictions of individual models. 
    411408 
    412409    For classification, 1 minus the average Euclidean distance between class 
    413410    probability distributions predicted by the model, and distributions 
    414     predicted by the individual bagged models, is used as the BAGV reliability 
    415     measure. Note that in this case a greater value implies a better 
     411    predicted by the individual bagged models, is the BAGV reliability 
     412    measure. In this case, a greater value implies a better 
    416413    prediction. 
    417414     
    418     This reliability measure can run out of memory fast if individual classifiers 
    419     use a lot of memory, as it build m of them, thereby using :math:`m` times memory 
    420     for a single classifier. If instances for measuring predictions 
    421     are given as a parameter, this class can only compute their reliability, 
    422     which saves memory.  
    423  
     415    This reliability measure can run out of memory if the individual classifiers 
     416    themselves use a lot of memory, as it needs :math:`m` times the memory 
     417    of a single classifier. 
    424418    """ 
    425419    def __init__(self, m=50, name="bv", randseed=0, for_instances=None): 
    426         """ 
    427         for_instances:  
    428         """ 
     420 
    429421        self.m = m 
    430422        self.name = name 
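
For illustration, a minimal sketch of the regression form of BAGV, assuming the :math:`m` per-model predictions :math:`K_i` are already available (constructing the bagged models themselves is omitted)::

    def bagv(predictions):
        # predictions: K_1 ... K_m from the individual bagged models
        m = len(predictions)
        K = sum(predictions) / m                        # bagged prediction
        return sum((K_i - K) ** 2 for K_i in predictions) / m

    print(bagv([10.2, 9.7, 10.5, 9.9]))   # larger spread -> larger expected error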
     
    501493    """ 
    502494 
    503     :param k: Number of nearest neighbours used in LCV estimate 
     495    :param k: Number of nearest neighbours used. Default: 0, which denotes 
     496        1/20 of the data set size (or 5, whichever is greater). 
    504497    :type k: int 
    505498 
    506     :param distance: function that computes a distance between two discrete 
     499    :param distance: Function that computes a distance between two discrete 
    507500        distributions (used only in classification problems). The default 
    508501        is Hellinger distance. 
    509502    :type distance: function 
    510503 
    511     :param distance_weighted: for classification reliability estimation, 
     504    :param distance_weighted: For classification, 
    512505        use an average distance between distributions, weighted by :math:`e^{-d}`, 
    513506        where :math:`d` is the distance between predicted instance and the 
     
    516509    :rtype: :class:`Orange.evaluation.reliability.LocalCrossValidationClassifier` 
    517510 
    518     :math:`k` nearest neighbours to the given instance are found and put in 
    519     a separate data set. On this data set, a leave-one-out validation is 
    520     performed. Reliability estimate for regression is then the distance 
    521     weighted absolute prediction error. In classification, 1 minus the average 
     511    Leave-one-out validation is 
     512    performed on the :math:`k` nearest neighbours of the given instance. 
     513    The reliability estimate for regression is then the distance-weighted 
     514    absolute prediction error. For classification, it is 1 minus the average 
    522515    distance between the predicted class probability distribution and the 
    523516    (trivial) probability distributions of the nearest neighbours. 
    524  
    525     If a special value 0 is passed as :math:`k` (as is by default), 
    526     it is set as 1/20 of data set size (or 5, whichever is greater). 
    527  
    528     Summary of the algorithm for regression: 
    529  
    530     1. Determine the set of k nearest neighours :math:`N = { (x_1, c_1),..., 
    531        (x_k, c_k)}`. 
    532     2. On this set, compute leave-one-out predictions :math:`K_i` and 
    533        prediction errors :math:`E_i = | C_i - K_i |`. 
    534     3. :math:`LCV(x) = \\frac{ \sum_{(x_i, c_i) \in N} d(x_i, x) * E_i }{ \sum_{(x_i, c_i) \in N} d(x_i, x) }` 
    535  
    536517    """ 
    537518    def __init__(self, k=0, distance=hellinger_dist, distance_weighted=True, name="lcv"): 
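
For the regression case of LCV, a minimal sketch of the final weighting step, assuming the leave-one-out absolute errors :math:`E_i` and the distances :math:`d(x_i, x)` to the :math:`k` neighbours have already been computed::

    def lcv(distances, errors):
        # distance-weighted average of the leave-one-out absolute errors E_i
        weighted = sum(d * e for d, e in zip(distances, errors))
        return weighted / sum(distances)

    print(lcv(distances=[0.2, 0.5, 0.9], errors=[1.1, 0.4, 0.7]))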
     
    603584    """ 
    604585     
    605     :param k: Number of nearest neighbours used in CNK estimate 
     586    :param k: Number of nearest neighbours. 
    606587    :type k: int 
    607588 
     
    613594    :rtype: :class:`Orange.evaluation.reliability.CNeighboursClassifier` 
    614595     
    615     For regression, CNK is defined for an unlabeled instance as a difference 
    616     between average label of its nearest neighbours and its prediction. CNK 
    617     can be used as a signed or absolute estimate. 
    618      
    619     :math:`CNK = \\frac{\sum_{i=1}^{k}C_i}{k} - K` 
    620      
    621     where :math:`k` denotes number of neighbors, C :sub:`i` denotes neighbours' 
    622     labels and :math:`K` denotes the instance's prediction. Note that a greater 
    623     value implies greater prediction error. 
     596    For regression, CNK is defined as the difference 
     597    between the average label of the instance's nearest neighbours and its prediction. CNK 
     598    can be either signed or absolute. A greater value implies a greater prediction error. 
    624599 
    625600    For classification, CNK is equal to 1 minus the average distance between 
    626601    predicted class distribution and (trivial) class distributions of the 
    627     $k$ nearest neighbours from the learning set. Note that in this case 
    628     a greater value implies better prediction. 
     602    :math:`k` nearest neighbours from the learning set. A greater value implies a better prediction. 
    629603     
    630604    """ 
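
A minimal sketch of the regression form of CNK, assuming the neighbours' labels and the instance's prediction are given (the nearest-neighbour search itself is omitted)::

    def cnk(neighbour_labels, prediction, signed=True):
        # difference between the neighbours' average label and the prediction
        diff = sum(neighbour_labels) / len(neighbour_labels) - prediction
        return diff if signed else abs(diff)

    print(cnk([10.1, 9.6, 10.4], prediction=10.8))   # signed estimate, here negative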
     
    905879 
    906880    This method develops a model that integrates reliability estimates 
    907     from all available reliability scoring techniques. To develop such 
    908     model it needs to performs internal cross-validation, similarly to :class:`ICV`. 
     881    from all available reliability scoring techniques (see [Wolpert1992]_ and [Dzeroski2004]_). It 
     882    performs internal cross-validation and therefore takes roughly the same time 
     883    as :class:`ICV`. 
    909884 
    910885    :param stack_learner: a data modelling method. Default (if None): unregularized linear regression with prior normalization. 
     
    918893 
    919894    :param save_data: If True, save the data used for training the 
    920         model for intergration into resulting classifier's .data attribute (default False). 
      895        integration model into the resulting classifier's .data attribute (default False). 
    921896    :type save_data: :obj:`bool` 
    922897  
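
A rough sketch of the stacking idea only (not the add-on's implementation): reliability scores collected during internal cross-validation act as features and the absolute prediction errors as the target; any regression learner could play the role of ``stack_learner``, here an ordinary least-squares fit via numpy is used purely for illustration, on hypothetical values::

    import numpy as np

    # hypothetical data: rows = held-out instances from internal cross-validation,
    # columns = individual reliability estimates for each instance
    scores = [[0.12, 0.30, 0.05],
              [0.40, 0.55, 0.20],
              [0.08, 0.10, 0.02],
              [0.33, 0.47, 0.15]]
    abs_errors = [0.9, 2.1, 0.3, 1.6]

    X = np.column_stack([np.ones(len(scores)), np.array(scores)])
    coef, *_ = np.linalg.lstsq(X, np.array(abs_errors), rcond=None)

    new_scores = [0.25, 0.35, 0.10]          # estimates for a new prediction
    combined = float(np.dot([1.0, *new_scores], coef))
    print(combined)                          # integrated reliability estimate
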
  • setup.py

    r52 r53  
    1616DOCUMENTATION_NAME = 'Orange Reliability' 
    1717 
    18 VERSION = '0.2.13' 
     18VERSION = '0.2.14' 
    1919 
    2020DESCRIPTION = 'Orange Reliability add-on for Orange data mining software package.' 