Changeset 7464:7327feaec329 in orange


Ignore:
Timestamp:
02/04/11 16:13:54 (3 years ago)
Author:
mocnik <mocnik@…>
Branch:
default
Convert:
189b44b77d59dcc83f2603d96b5024114547ff30
Message:

Modifying Orange.evaluate.scoring documentation during retreat.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • orange/Orange/evaluation/scoring.py

    r7396 r7464  
    22 
    33.. index: scoring 
    4  
    5 ================================= 
    6 Orange Statistics for Predictions 
    7 ================================= 
    84 
    95This module contains various measures of quality for classification and 
    106regression. Most functions require an argument named res, an instance of 
    11 :class:Orange.evaluation.testing.ExperimentResults as computed by functions from orngTest and which contains 
    12 predictions obtained through cross-validation, leave one-out, testing on 
    13 training data or test set examples. 
     7:class:`Orange.evaluation.testing.ExperimentResults` as computed by 
     8functions from orngTest and which contains predictions obtained through 
     9cross-validation, leave one-out, testing on training data or test set examples. 
     10 
     11============== 
     12Classification 
     13============== 
     14 
     15To prepare some data for examples on this page, we shall load the voting data 
     16set (problem of predicting the congressman's party (republican, democrat) 
      17based on a selection of votes) and evaluate a naive Bayesian learner, a 
      18classification tree and a majority classifier using cross-validation. 
     19For examples requiring a multivalued class problem, we shall do the same 
     20with the vehicle data set (telling whether a vehicle described by the features 
     21extracted from a picture is a van, bus, or Opel or Saab car). 
     22 
      23A basic cross-validation example is shown in the following part of  
      24(`statExamples.py`_, uses `voting.tab`_ and `vehicle.tab`_): 
     25 
     26.. literalinclude:: code/statExample0.py 
     27 
     28.. _voting.tab: code/voting.tab 
     29.. _vehicle.tab: code/vehicle.tab 
     30.. _statExamples.py: code/statExamples.py 
     31 
      32If instances are weighted, the weights are taken into account. This can be 
      33disabled by passing :obj:`unweighted=1` as a keyword argument. Another way of 
      34disabling weights is to clear the :obj:`weights` flag of the 
      35:class:`Orange.evaluation.testing.ExperimentResults` object. 
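
For instance, a minimal sketch (assuming res holds the cross-validation results
prepared above; unweighted is the keyword argument described here)::

    cas = orngStat.CA(res)                           # uses instance weights, if any
    cas_unweighted = orngStat.CA(res, unweighted=1)  # ignores instance weights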
     36 
     37General Measures of Quality 
     38=========================== 
     39 
     40.. autofunction:: CA 
     41 
     42.. autofunction:: AP 
     43 
     44.. autofunction:: BrierScore 
     45 
     46.. autofunction:: IS 
     47 
      48So, let's compute all of these scores in the following part of  
      49(`statExamples.py`_, uses `voting.tab`_ and `vehicle.tab`_) and print them out: 
     50 
     51.. literalinclude:: code/statExample1.py 
     52 
     53.. _voting.tab: code/voting.tab 
     54.. _vehicle.tab: code/vehicle.tab 
     55.. _statExamples.py: code/statExamples.py 
     56 
     57The output should look like this:: 
     58 
     59    method  CA  AP  Brier   IS 
     60    bayes   0.903   0.902   0.175    0.759 
     61    tree    0.846   0.845   0.286    0.641 
     62    majrty  0.614   0.526   0.474   -0.000 
     63 
     64Script `statExamples.py`_ contains another example that also prints out  
     65the standard errors. 
     66 
     67.. _statExamples.py: code/statExamples.py 
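
As a minimal sketch, the same scores can also be requested together with their
standard errors through the reportSE argument shown in the signatures above
(res and learners as prepared earlier)::

    cas = orngStat.CA(res, reportSE=True)   # a list of (accuracy, standard error) tuples
    for learner, (ca, se) in zip(learners, cas):
        print "%s: %5.3f +- %5.3f" % (learner.name, ca, se)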
     68 
     69Confusion Matrix 
     70================ 
     71 
     72.. autofunction:: confusionMatrices 
     73 
      74   **A positive-negative confusion matrix** is computed (a) if the class is 
      75   binary, unless the classIndex argument is -2, or (b) if the class is multivalued 
      76   and classIndex is non-negative. The argument classIndex then tells which 
      77   class is positive. In case (a), classIndex may be omitted; the first class 
      78   is then negative and the second is positive, unless the baseClass attribute 
      79   of the object with results has a non-negative value. In that case, baseClass 
      80   is the index of the target class. The baseClass attribute of the results object 
      81   should be set manually. The result of the function is a list of instances 
      82   of class ConfusionMatrix, containing the (weighted) number of true 
      83   positives (TP), false negatives (FN), false positives (FP) and true 
      84   negatives (TN). 
     85    
     86   We can also add the keyword argument cutoff 
      87   (e.g. confusionMatrices(results, cutoff=0.3)); if we do, confusionMatrices 
     88   will disregard the classifiers' class predictions and observe the predicted 
     89   probabilities, and consider the prediction "positive" if the predicted 
     90   probability of the positive class is higher than the cutoff. 
     91 
     92   The example (part of `statExamples.py`_) below shows how setting the 
      93   cutoff threshold from the default 0.5 to 0.2 affects the confusion matrices  
      94   for the naive Bayesian classifier:: 
     95    
     96       cm = orngStat.confusionMatrices(res)[0] 
     97       print "Confusion matrix for naive Bayes:" 
     98       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN) 
     99        
     100       cm = orngStat.confusionMatrices(res, cutoff=0.2)[0] 
     101       print "Confusion matrix for naive Bayes:" 
     102       print "TP: %i, FP: %i, FN: %s, TN: %i" % (cm.TP, cm.FP, cm.FN, cm.TN) 
     103 
     104   .. _statExamples.py: code/statExamples.py 
     105    
     106   The output:: 
     107    
     108       Confusion matrix for naive Bayes: 
     109       TP: 238, FP: 13, FN: 29.0, TN: 155 
     110       Confusion matrix for naive Bayes: 
     111       TP: 239, FP: 18, FN: 28.0, TN: 150 
     112    
     113   shows that the number of true positives increases (and hence the number of 
     114   false negatives decreases) by only a single example, while five examples 
     115   that were originally true negatives become false positives due to the 
     116   lower threshold. 
     117    
      118   To observe how good the classifiers are at detecting vans in the vehicle 
     119   data set, we would compute the matrix like this:: 
     120    
     121      cm = orngStat.confusionMatrices(resVeh, \ 
     122vehicle.domain.classVar.values.index("van")) 
     123    
      124   and get results like these:: 
     125    
     126       TP: 189, FP: 241, FN: 10.0, TN: 406 
     127    
     128   while the same for class "opel" would give:: 
     129    
     130       TP: 86, FP: 112, FN: 126.0, TN: 522 
     131        
     132   The main difference is that there are only a few false negatives for the 
     133   van, meaning that the classifier seldom misses it (if it says it's not a 
     134   van, it's almost certainly not a van). Not so for the Opel car, where the 
     135   classifier missed 126 of them and correctly detected only 86. 
     136    
      137   **A general confusion matrix** is computed (a) in the case of a binary class, 
      138   when :obj:`classIndex` is set to -2, or (b) when we have a multivalued class and  
      139   the caller doesn't specify the :obj:`classIndex` of the positive class. 
     140   When called in this manner, the function cannot use the argument 
     141   :obj:`cutoff`. 
     142    
     143   The function then returns a three-dimensional matrix, where the element 
     144   A[:obj:`learner`][:obj:`actualClass`][:obj:`predictedClass`] 
     145   gives the number of examples belonging to 'actualClass' for which the 
     146   'learner' predicted 'predictedClass'. We shall compute and print out 
     147   the matrix for naive Bayesian classifier. 
     148    
     149   Here we see another example from `statExamples.py`_:: 
     150    
     151       cm = orngStat.confusionMatrices(resVeh)[0] 
     152       classes = vehicle.domain.classVar.values 
     153       print "\t"+"\t".join(classes) 
     154       for className, classConfusions in zip(classes, cm): 
     155           print ("%s" + ("\t%i" * len(classes))) % ((className, ) + tuple(classConfusions)) 
     156    
     157   .. _statExamples.py: code/statExamples.py 
     158    
     159   So, here's what this nice piece of code gives:: 
     160    
     161              bus   van  saab opel 
     162       bus     56   95   21   46 
     163       van     6    189  4    0 
     164       saab    3    75   73   66 
     165       opel    4    71   51   86 
     166        
      167   Vans are clearly simple: 189 vans were classified as vans (we know this 
     168   already, we've printed it out above), and the 10 misclassified pictures 
     169   were classified as buses (6) and Saab cars (4). In all other classes, 
     170   there were more examples misclassified as vans than correctly classified 
      171   examples. The classifier is obviously quite biased towards vans. 
     172    
     173.. method:: sens(confm)  
     174.. method:: spec(confm) 
     175.. method:: PPV(confm) 
     176.. method:: NPV(confm) 
     177.. method:: precision(confm) 
     178.. method:: recall(confm) 
      179.. method:: F1(confm) 
     180.. method:: Falpha(confm, alpha=2.0) 
      181.. method:: MCC(confm) 
     182 
     183   With the confusion matrix defined in terms of positive and negative 
     184   classes, you can also compute the  
     185   `sensitivity <http://en.wikipedia.org/wiki/Sensitivity_(tests)>`_ 
     186   [TP/(TP+FN)], `specificity \ 
     187<http://en.wikipedia.org/wiki/Specificity_%28tests%29>`_ 
     188   [TN/(TN+FP)], `positive predictive value \ 
     189<http://en.wikipedia.org/wiki/Positive_predictive_value>`_ 
     190   [TP/(TP+FP)] and `negative predictive value \ 
     191<http://en.wikipedia.org/wiki/Negative_predictive_value>`_ [TN/(TN+FN)].  
     192   In information retrieval, positive predictive value is called precision 
     193   (the ratio of the number of relevant records retrieved to the total number 
     194   of irrelevant and relevant records retrieved), and sensitivity is called 
     195   `recall <http://en.wikipedia.org/wiki/Information_retrieval>`_  
     196   (the ratio of the number of relevant records retrieved to the total number 
     197   of relevant records in the database). The  
     198   `harmonic mean <http://en.wikipedia.org/wiki/Harmonic_mean>`_ of precision 
     199   and recall is called an  
      200   `F-measure <http://en.wikipedia.org/wiki/F-measure>`_, which, depending 
      201   on the ratio of the weights given to precision and recall, is implemented 
      202   as F1 [2*precision*recall/(precision+recall)] or, for the general case, 
      203   Falpha [(1+alpha)*precision*recall / (alpha*precision + recall)]. 
      204   The `Matthews correlation coefficient \ 
      205<http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_ 
      206   is in essence a correlation coefficient between 
     207   the observed and predicted binary classifications; it returns a value 
     208   between -1 and +1. A coefficient of +1 represents a perfect prediction, 
     209   0 an average random prediction and -1 an inverse prediction. 
     210    
     211   If the argument :obj:`confm` is a single confusion matrix, a single 
     212   result (a number) is returned. If confm is a list of confusion matrices, 
     213   a list of scores is returned, one for each confusion matrix. 
     214    
     215   Note that weights are taken into account when computing the matrix, so 
     216   these functions don't check the 'weighted' keyword argument. 
     217    
     218   Let us print out sensitivities and specificities of our classifiers in 
     219   part of `statExamples.py`_:: 
     220    
     221       cm = orngStat.confusionMatrices(res) 
     222       print 
     223       print "method\tsens\tspec" 
     224       for l in range(len(learners)): 
     225           print "%s\t%5.3f\t%5.3f" % (learners[l].name, orngStat.sens(cm[l]), orngStat.spec(cm[l])) 
     226    
     227   .. _statExamples.py: code/statExamples.py 
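
   Precision, recall and the F-measure come from the same matrices; a short 
   sketch (cm as computed above; by the formula above, alpha=1.0 gives F1):: 

       cm0 = cm[0]   # confusion matrix of the first learner (naive Bayes)
       print "precision: %5.3f" % orngStat.precision(cm0)
       print "recall:    %5.3f" % orngStat.recall(cm0)
       print "F1:        %5.3f" % orngStat.Falpha(cm0, alpha=1.0)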
     228 
     229ROC Analysis 
     230============ 
     231 
     232`Receiver Operating Characteristic \ 
     233<http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_  
      234(ROC) analysis was initially developed for 
      235binary class problems, and there is no consensus on how to apply it to 
      236multi-class problems, nor do we know for sure how to do ROC analysis after 
      237cross validation and similar multiple sampling techniques. If you are 
      238interested in the area under the curve, the function AUC deals with those 
      239problems as described below. 
     240 
     241.. autofunction:: AUC 
     242    
     243   .. attribute:: AUC.ByWeightedPairs (or 0) 
     244       
     245      Computes AUC for each pair of classes (ignoring examples of all other 
     246      classes) and averages the results, weighting them by the number of 
     247      pairs of examples from these two classes (e.g. by the product of 
      248      probabilities of the two classes). AUC computed in this way still 
      249      behaves as a concordance index, i.e., it gives the probability that two 
      250      randomly chosen examples from different classes will be correctly 
     251      recognized (this is of course true only if the classifier knows 
     252      from which two classes the examples came). 
     253    
     254   .. attribute:: AUC.ByPairs (or 1) 
     255    
      256      Similar to the above, except that the average over class pairs is not 
      257      weighted. This AUC is, like the binary AUC, independent of class 
      258      distributions, but it is no longer related to the concordance index. 
     259       
     260   .. attribute:: AUC.WeightedOneAgainstAll (or 2) 
     261       
     262      For each class, it computes AUC for this class against all others (that 
     263      is, treating other classes as one class). The AUCs are then averaged by 
     264      the class probabilities. This is related to concordance index in which 
     265      we test the classifier's (average) capability for distinguishing the 
     266      examples from a specified class from those that come from other classes. 
     267      Unlike the binary AUC, the measure is not independent of class 
     268      distributions. 
     269       
     270   .. attribute:: AUC.OneAgainstAll (or 3) 
     271    
     272      As above, except that the average is not weighted. 
     273    
      274   In the case of multiple folds (for instance, if the data comes from cross 
      275   validation), the computation goes like this. When computing the partial 
      276   AUCs for individual pairs of classes or singled-out classes, AUC is 
      277   computed for each fold separately and then averaged (ignoring the number 
      278   of examples in each fold; it's just a simple average). However, if a 
      279   certain fold doesn't contain any examples of a certain class (from the 
      280   pair), the partial AUC is computed treating the results as if they came 
      281   from a single fold. This is not really correct since the class 
      282   probabilities from different folds are not necessarily comparable; 
      283   yet, since this will most often occur in leave-one-out experiments, 
      284   comparability shouldn't be a problem. 
     285    
      286   Computing and printing out the AUCs looks just like printing out 
     287   classification accuracies (except that we call AUC instead of 
     288   CA, of course):: 
     289    
     290       AUCs = orngStat.AUC(res) 
     291       for l in range(len(learners)): 
     292           print "%10s: %5.3f" % (learners[l].name, AUCs[l]) 
     293            
     294   For vehicle, you can run exactly this same code; it will compute AUCs 
     295   for all pairs of classes and return the average weighted by probabilities 
     296   of pairs. Or, you can specify the averaging method yourself, like this:: 
     297    
     298       AUCs = orngStat.AUC(resVeh, orngStat.AUC.WeightedOneAgainstAll) 
     299    
     300   The following snippet tries out all four. (We don't claim that this is 
     301   how the function needs to be used; it's better to stay with the default.):: 
     302    
     303       methods = ["by pairs, weighted", "by pairs", "one vs. all, weighted", "one vs. all"] 
     304       print " " *25 + "  \tbayes\ttree\tmajority" 
     305       for i in range(4): 
     306           AUCs = orngStat.AUC(resVeh, i) 
     307           print "%25s: \t%5.3f\t%5.3f\t%5.3f" % ((methods[i], ) + tuple(AUCs)) 
     308    
     309   As you can see from the output:: 
     310    
     311                                   bayes   tree    majority 
     312              by pairs, weighted:  0.789   0.871   0.500 
     313                        by pairs:  0.791   0.872   0.500 
     314           one vs. all, weighted:  0.783   0.800   0.500 
     315                     one vs. all:  0.783   0.800   0.500 
     316 
     317.. autofunction:: AUC_single 
     318 
     319.. autofunction:: AUC_pair 
     320 
     321.. autofunction:: AUC_matrix 
     322 
     323The remaining functions, which plot the curves and statistically compare 
     324them, require that the results come from a test with a single iteration, 
     325and they always compare one chosen class against all others. If you have 
     326cross validation results, you can either use splitByIterations to split the 
     327results by folds, call the function for each fold separately and then sum 
      328the results up however you see fit, or you can set the ExperimentResults' 
      329attribute numberOfIterations to 1 to cheat the function - at your own 
      330responsibility for statistical correctness. Regarding multi-class 
      331problems, if you don't choose a specific class, orngStat will use the class 
     332attribute's baseValue at the time when results were computed. If baseValue 
     333was not given at that time, 1 (that is, the second class) is used as default. 
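
A minimal sketch of the first option, splitting cross-validation results by
folds (res as above; how the per-fold results are combined is up to you)::

    for fold in orngStat.splitByIterations(res):
        aucs = orngStat.AUCWilcoxon(fold)   # one (AUC, standard error) pair per learner
        print ["%5.3f" % auc for auc, se in aucs]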
     334 
     335We shall use the following code to prepare suitable experimental results:: 
     336 
     337    ri2 = orange.MakeRandomIndices2(voting, 0.6) 
     338    train = voting.selectref(ri2, 0) 
     339    test = voting.selectref(ri2, 1) 
     340    res1 = orngTest.learnAndTestOnTestData(learners, train, test) 
     341 
     342 
     343.. autofunction:: AUCWilcoxon 
     344 
     345.. autofunction:: computeROC 
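
A minimal sketch of using computeROC (res1 from above; this assumes that the
function returns one list of curve points per learner)::

    curves = orngStat.computeROC(res1)
    for learner, curve in zip(learners, curves):
        print learner.name, curve[:3]   # the first few (1-specificity, sensitivity) points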
     346 
     347Comparison of Algorithms 
     348------------------------ 
     349 
     350.. autofunction:: McNemar 
     351 
     352.. autofunction:: McNemarOfTwo 
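
A small sketch of both functions (res1 from above, which comes from a single
train/test split)::

    mcm = orngStat.McNemar(res1)   # triangular matrix of pairwise McNemar statistics
    print "learners 0 vs. 1: %5.3f" % orngStat.McNemarOfTwo(res1, 0, 1)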
     353 
     354========== 
     355Regression 
     356========== 
     357 
      358General Measures of Quality 
      359=========================== 
     360 
     361Several alternative measures, as given below, can be used to evaluate 
      362the success of numeric prediction: 
     363 
     364.. image:: files/statRegression.png 
     365 
     366.. autofunction:: MSE 
     367 
     368.. autofunction:: RMSE 
     369 
     370.. autofunction:: MAE 
     371 
     372.. autofunction:: RSE 
     373 
     374.. autofunction:: RRSE 
     375 
     376.. autofunction:: RAE 
     377 
     378.. autofunction:: R2 
     379 
     380The following code (`statExamples.py`_) uses most of the above measures to 
     381score several regression methods. 
     382 
     383.. literalinclude:: code/statExamplesRegression.py 
     384 
     385.. _statExamples.py: code/statExamples.py 
     386 
     387The code above produces the following output:: 
     388 
     389    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2 
     390    maj       84.585  9.197   6.653   1.002   1.001   1.001  -0.002 
     391    rt        40.015  6.326   4.592   0.474   0.688   0.691   0.526 
     392    knn       21.248  4.610   2.870   0.252   0.502   0.432   0.748 
     393    lr        24.092  4.908   3.425   0.285   0.534   0.515   0.715 
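
Standard errors over folds can be requested for the regression scores as well;
a minimal sketch (resRegr is a hypothetical name for cross-validation results
on a regression problem; SE is the keyword handled by regressionError)::

    rmses = orngStat.RMSE(resRegr, SE=1)   # one (mean score, spread across folds) pair per learner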
     394     
      395================== 
      396Plotting Functions 
      397================== 
     398 
     399.. autofunction:: graph_ranks 
     400 
      402The following script (`statExamplesGraphRanks.py`_) shows how to plot a graph: 
     402 
     403.. literalinclude:: code/statExamplesGraphRanks.py 
     404 
     405.. _statExamplesGraphRanks.py: code/statExamplesGraphRanks.py 
     406 
      407The code produces the following graph:  
     408 
     409.. image:: files/statExamplesGraphRanks1.png 
     410 
     411.. autofunction:: compute_CD 
     412 
     413.. autofunction:: compute_friedman 
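
A minimal sketch combining these helpers (the names, ranks and file name are
hypothetical; the signatures are as documented above)::

    names = ["bayes", "tree", "knn", "majority"]   # hypothetical method names
    avranks = [1.9, 2.3, 2.8, 3.0]                 # hypothetical average ranks over 30 data sets
    cd = orngStat.compute_CD(avranks, 30)          # critical difference (Nemenyi, alpha=0.05)
    orngStat.graph_ranks("ranks.png", avranks, names, cd=cd, width=6, textspace=1.5)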
     414 
     415================= 
     416Utility Functions 
     417================= 
     418 
     419.. autofunction:: splitByIterations 
     420 
    14421""" 
    15422 
    16423import statc 
    17424from operator import add 
     425import numpy 
     426 
    18427import orngMisc, orngTest 
    19 import numpy 
    20428 
    21429#### Private stuff 
     
    26434 
    27435def checkNonZero(x): 
     436    """Throw Value Error when x = 0.0.""" 
    28437    if x==0.0: 
    29438        raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0." 
     
    43452 
    44453def splitByIterations(res): 
     454    """ Splits ExperimentResults of multiple iteratation test into a list 
     455    of ExperimentResults, one for each iteration. 
     456    """ 
    45457    if res.numberOfIterations < 2: 
    46458        return [res] 
     
    206618 
    207619def MSE(res, **argkw): 
    208     """MSE(res) -> mean-squared error""" 
     620    """ Computes mean-squared error. """ 
    209621    return regressionError(res, **argkw) 
    210622     
    211623def RMSE(res, **argkw): 
    212     """RMSE(res) -> root mean-squared error""" 
     624    """ Computes root mean-squared error. """ 
    213625    argkw.setdefault("sqrt", True) 
    214626    return regressionError(res, **argkw) 
    215627 
    216628def MAE(res, **argkw): 
    217     """MAE(res) -> mean absolute error""" 
     629    """ Computes mean absolute error. """ 
    218630    argkw.setdefault("abs", True) 
    219631    return regressionError(res, **argkw) 
    220632 
    221633def RSE(res, **argkw): 
    222     """RSE(res) -> relative squared error""" 
     634    """ Computes relative squared error. """ 
    223635    argkw.setdefault("norm-sqr", True) 
    224636    return regressionError(res, **argkw) 
    225637 
    226638def RRSE(res, **argkw): 
    227     """RRSE(res) -> root relative squared error""" 
     639    """ Computes relative squared error. """ 
    228640    argkw.setdefault("norm-sqr", True) 
    229641    argkw.setdefault("sqrt", True) 
     
    231643 
    232644def RAE(res, **argkw): 
    233     """RAE(res) -> relative absolute error""" 
     645    """ Computes relative absolute error. """ 
    234646    argkw.setdefault("abs", True) 
    235647    argkw.setdefault("norm-abs", True) 
     
    237649 
    238650def R2(res, **argkw): 
    239     """R2(res) -> R-squared""" 
     651    """ Computes the coefficient of determination, R-squared. """ 
    240652    argkw.setdefault("norm-sqr", True) 
    241653    argkw.setdefault("R2", True) 
     
    282694    return MSE_old(res, **argkw) 
    283695 
    284  
    285 ######################################################################### 
    286 # PERFORMANCE MEASURES: 
    287 # Scores for evaluation of numeric predictions 
    288  
    289 def checkArgkw(dct, lst): 
    290     """checkArgkw(dct, lst) -> returns true if any items have non-zero value in dct""" 
    291     return reduce(lambda x,y: x or y, [dct.get(k, 0) for k in lst]) 
    292  
    293 def regressionError(res, **argkw): 
    294     """regressionError(res) -> regression error (default: MSE)""" 
    295     if argkw.get("SE", 0) and res.numberOfIterations > 1: 
    296         # computes the scores for each iteration, then averages 
    297         scores = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)] 
    298         if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0): 
    299             norm = [0.0] * res.numberOfIterations 
    300  
    301         nIter = [0]*res.numberOfIterations       # counts examples in each iteration 
    302         a = [0]*res.numberOfIterations           # average class in each iteration 
    303         for tex in res.results: 
    304             nIter[tex.iterationNumber] += 1 
    305             a[tex.iterationNumber] += float(tex.actualClass) 
    306         a = [a[i]/nIter[i] for i in range(res.numberOfIterations)] 
    307  
    308         if argkw.get("unweighted", 0) or not res.weights: 
    309             # iterate accross test cases 
    310             for tex in res.results: 
    311                 ai = float(tex.actualClass) 
    312                 nIter[tex.iterationNumber] += 1 
    313  
    314                 # compute normalization, if required 
    315                 if argkw.get("norm-abs", 0): 
    316                     norm[tex.iterationNumber] += abs(ai - a[tex.iterationNumber]) 
    317                 elif argkw.get("norm-sqr", 0): 
    318                     norm[tex.iterationNumber] += (ai - a[tex.iterationNumber])**2 
    319  
    320                 # iterate accross results of different regressors 
    321                 for i, cls in enumerate(tex.classes): 
    322                     if argkw.get("abs", 0): 
    323                         scores[i][tex.iterationNumber] += abs(float(cls) - ai) 
    324                     else: 
    325                         scores[i][tex.iterationNumber] += (float(cls) - ai)**2 
    326         else: # unweighted<>0 
    327             raise NotImplementedError, "weighted error scores with SE not implemented yet" 
    328  
    329         if argkw.get("norm-abs") or argkw.get("norm-sqr"): 
    330             scores = [[x/n for x, n in zip(y, norm)] for y in scores] 
    331         else: 
    332             scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores] 
    333  
    334         if argkw.get("R2"): 
    335             scores = [[1.0 - x for x in y] for y in scores] 
    336  
    337         if argkw.get("sqrt", 0): 
    338             scores = [[math.sqrt(x) for x in y] for y in scores] 
    339  
    340         return [(statc.mean(x), statc.std(x)) for x in scores] 
    341          
    342     else: # single iteration (testing on a single test set) 
    343         scores = [0.0] * res.numberOfLearners 
    344         norm = 0.0 
    345  
    346         if argkw.get("unweighted", 0) or not res.weights: 
    347             a = sum([tex.actualClass for tex in res.results]) \ 
    348                 / len(res.results) 
    349             for tex in res.results: 
    350                 if argkw.get("abs", 0): 
    351                     scores = map(lambda res, cls, ac = float(tex.actualClass): 
    352                                  res + abs(float(cls) - ac), scores, tex.classes) 
    353                 else: 
    354                     scores = map(lambda res, cls, ac = float(tex.actualClass): 
    355                                  res + (float(cls) - ac)**2, scores, tex.classes) 
    356  
    357                 if argkw.get("norm-abs", 0): 
    358                     norm += abs(tex.actualClass - a) 
    359                 elif argkw.get("norm-sqr", 0): 
    360                     norm += (tex.actualClass - a)**2 
    361             totweight = gettotsize(res) 
    362         else: 
    363             # UNFINISHED 
    364             for tex in res.results: 
    365                 MSEs = map(lambda res, cls, ac = float(tex.actualClass), 
    366                            tw = tex.weight: 
    367                            res + tw * (float(cls) - ac)**2, MSEs, tex.classes) 
    368             totweight = gettotweight(res) 
    369  
    370         if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0): 
    371             scores = [s/norm for s in scores] 
    372         else: # normalize by number of instances (or sum of weights) 
    373             scores = [s/totweight for s in scores] 
    374  
    375         if argkw.get("R2"): 
    376             scores = [1.0 - s for s in scores] 
    377  
    378         if argkw.get("sqrt", 0): 
    379             scores = [math.sqrt(x) for x in scores] 
    380  
    381         return scores 
    382  
    383 def MSE(res, **argkw): 
    384     """MSE(res) -> mean-squared error""" 
    385     return regressionError(res, **argkw) 
    386      
    387 def RMSE(res, **argkw): 
    388     """RMSE(res) -> root mean-squared error""" 
    389     argkw.setdefault("sqrt", True) 
    390     return regressionError(res, **argkw) 
    391  
    392 def MAE(res, **argkw): 
    393     """MAE(res) -> mean absolute error""" 
    394     argkw.setdefault("abs", True) 
    395     return regressionError(res, **argkw) 
    396  
    397 def RSE(res, **argkw): 
    398     """RSE(res) -> relative squared error""" 
    399     argkw.setdefault("norm-sqr", True) 
    400     return regressionError(res, **argkw) 
    401  
    402 def RRSE(res, **argkw): 
    403     """RRSE(res) -> root relative squared error""" 
    404     argkw.setdefault("norm-sqr", True) 
    405     argkw.setdefault("sqrt", True) 
    406     return regressionError(res, **argkw) 
    407  
    408 def RAE(res, **argkw): 
    409     """RAE(res) -> relative absolute error""" 
    410     argkw.setdefault("abs", True) 
    411     argkw.setdefault("norm-abs", True) 
    412     return regressionError(res, **argkw) 
    413  
    414 def R2(res, **argkw): 
    415     """R2(res) -> R-squared""" 
    416     argkw.setdefault("norm-sqr", True) 
    417     argkw.setdefault("R2", True) 
    418     return regressionError(res, **argkw) 
    419  
    420 def MSE_old(res, **argkw): 
    421     """MSE(res) -> mean-squared error""" 
    422     if argkw.get("SE", 0) and res.numberOfIterations > 1: 
    423         MSEs = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)] 
    424         nIter = [0]*res.numberOfIterations 
    425         if argkw.get("unweighted", 0) or not res.weights: 
    426             for tex in res.results: 
    427                 ac = float(tex.actualClass) 
    428                 nIter[tex.iterationNumber] += 1 
    429                 for i, cls in enumerate(tex.classes): 
    430                     MSEs[i][tex.iterationNumber] += (float(cls) - ac)**2 
    431         else: 
    432             raise ValueError, "weighted RMSE with SE not implemented yet" 
    433         MSEs = [[x/ni for x, ni in zip(y, nIter)] for y in MSEs] 
    434         if argkw.get("sqrt", 0): 
    435             MSEs = [[math.sqrt(x) for x in y] for y in MSEs] 
    436         return [(statc.mean(x), statc.std(x)) for x in MSEs] 
    437          
    438     else: 
    439         MSEs = [0.0]*res.numberOfLearners 
    440         if argkw.get("unweighted", 0) or not res.weights: 
    441             for tex in res.results: 
    442                 MSEs = map(lambda res, cls, ac = float(tex.actualClass): 
    443                            res + (float(cls) - ac)**2, MSEs, tex.classes) 
    444             totweight = gettotsize(res) 
    445         else: 
    446             for tex in res.results: 
    447                 MSEs = map(lambda res, cls, ac = float(tex.actualClass), tw = tex.weight: 
    448                            res + tw * (float(cls) - ac)**2, MSEs, tex.classes) 
    449             totweight = gettotweight(res) 
    450  
    451         if argkw.get("sqrt", 0): 
    452             MSEs = [math.sqrt(x) for x in MSEs] 
    453         return [x/totweight for x in MSEs] 
    454  
    455 def RMSE_old(res, **argkw): 
    456     """RMSE(res) -> root mean-squared error""" 
    457     argkw.setdefault("sqrt", 1) 
    458     return MSE_old(res, **argkw) 
    459  
    460  
    461696######################################################################### 
    462697# PERFORMANCE MEASURES: 
     
    464699 
    465700def CA(res, reportSE = False, **argkw): 
     701    """ Computes classification accuracy, i.e. percentage of matches between 
     702    predicted and actual class. The function returns a list of classification 
     703    accuracies of all classifiers tested. If reportSE is set to true, the list 
     704    will contain tuples with accuracies and standard errors. 
     705     
     706    If results are from multiple repetitions of experiments (like those 
     707    returned by orngTest.crossValidation or orngTest.proportionTest) the 
      708    standard error (SE) is estimated from the deviation of classification 
      709    accuracy across folds (SD), as SE = SD/sqrt(N), where N is the number 
      710    of repetitions (e.g. the number of folds). 
     711     
      712    If results are from a single repetition, we assume independence of 
      713    examples and treat the classification accuracy as distributed according 
      714    to a binomial distribution. This can be approximated by a normal distribution, 
      715    so we report the SE as sqrt(CA*(1-CA)/N), where CA is the classification 
      716    accuracy and N is the number of test examples. 
     717     
     718    Instead of ExperimentResults, this function can be given a list of 
     719    confusion matrices (see below). Standard errors are in this case 
     720    estimated using the latter method. 
     721    """ 
    466722    if res.numberOfIterations==1: 
    467723        if type(res)==ConfusionMatrix: 
     
    512768 
    513769def AP(res, reportSE = False, **argkw): 
     770    """ Computes the average probability assigned to the correct class. """ 
    514771    if res.numberOfIterations == 1: 
    515772        APs=[0.0]*res.numberOfLearners 
     
    541798 
    542799def BrierScore(res, reportSE = False, **argkw): 
    543     """Computes Brier score""" 
     800    """ Computes the Brier's score, defined as the average (over test examples) 
     801    of sumx(t(x)-p(x))2, where x is a class, t(x) is 1 for the correct class 
     802    and 0 for the others, and p(x) is the probability that the classifier 
     803    assigned to the class x 
     804    """ 
    544805    # Computes an average (over examples) of sum_x(t(x) - p(x))^2, where 
    545806    #    x is class, 
     
    655916     
    656917def IS(res, apriori=None, reportSE = False, **argkw): 
     918    """ Computes the information score as defined by  
     919    `Kononenko and Bratko (1991) \ 
     920    <http://www.springerlink.com/content/g5p7473160476612/>`_. 
      921    The argument 'apriori' gives the a priori class 
      922    distribution; if it is omitted, the class distribution is computed from 
      923    the actual classes of examples in res. 
     924    """ 
    657925    if not apriori: 
    658926        apriori = classProbabilitiesFromRes(res) 
     
    7541022 
    7551023def confusionMatrices(res, classIndex=-1, **argkw): 
     1024    """ This function can compute two different forms of confusion matrix: 
     1025    one in which a certain class is marked as positive and the other(s) 
     1026    negative, and another in which no class is singled out. The way to 
     1027    specify what we want is somewhat confusing due to backward 
     1028    compatibility issues. 
     1029    """ 
    7561030    tfpns = [ConfusionMatrix() for i in range(res.numberOfLearners)] 
    7571031     
     
    9981272 
    9991273def AUCWilcoxon(res, classIndex=-1, **argkw): 
     1274    """ Computes the area under ROC (AUC) and its standard error using 
      1275    Wilcoxon's approach proposed by Hanley and McNeil (1982). If classIndex 
      1276    is not specified, the first class is used as the positive class and the 
      1277    others as negative. The result is a list of tuples (aROC, standard error). 
     1278    """ 
    10001279    import corn 
    10011280    useweights = res.weights and not argkw.get("unweighted", 0) 
     
    10361315     
    10371316def computeROC(res, classIndex=-1): 
     1317    """ Computes a ROC curve as a list of (x, y) tuples, where x is  
     1318    1-specificity and y is sensitivity. 
     1319    """ 
    10381320    import corn 
    10391321    problists, tots = corn.computeROCCumulative(res, classIndex) 
     
    15741856    return sum_aucs 
    15751857 
     1858def AUC(): 
     1859    pass 
     1860 
     1861AUC.ByWeightedPairs = 0 
    15761862 
    15771863# Computes AUC, possibly for multiple classes (the averaging method can be specified) 
    15781864# Results over folds are averages; if some folds examples from one class only, the folds are merged 
    1579 def AUC(res, method = 0, useWeights = True): 
     1865def AUC(res, method = AUC.ByWeightedPairs, useWeights = True): 
     1866    """ Returns the area under ROC curve (AUC) given a set of experimental 
     1867    results. For multivalued class problems, it will compute some sort of 
     1868    average, as specified by the argument method. 
     1869    """ 
    15801870    if len(res.classValues) < 2: 
    15811871        raise ValueError("Cannot compute AUC on a single-class problem") 
     
    15941884# Results over folds are averages; if some folds examples from one class only, the folds are merged 
    15951885def AUC_single(res, classIndex = -1, useWeights = True): 
     1886    """ Computes AUC where the class given classIndex is singled out, and 
     1887    all other classes are treated as a single class. To find how good our 
      1888    classifiers are at distinguishing between vans and other vehicles, call 
     1889    the function like this:: 
     1890     
     1891        orngStat.AUC_single(resVeh, \ 
     1892classIndex = vehicle.domain.classVar.values.index("van")) 
     1893    """ 
    15961894    if classIndex<0: 
    15971895        if res.baseClass>=0: 
     
    16081906# Results over folds are averages; if some folds have examples from one class only, the folds are merged 
    16091907def AUC_pair(res, classIndex1, classIndex2, useWeights = True): 
     1908    """ Computes AUC between a pair of examples, ignoring examples from all 
     1909    other classes. 
     1910    """ 
    16101911    if res.numberOfIterations > 1: 
    16111912        return AUC_iterations(AUC_ij, splitByIterations(res), (classIndex1, classIndex2, useWeights, res, res.numberOfIterations)) 
     
    16161917# AUC for multiclass problems 
    16171918def AUC_matrix(res, useWeights = True): 
     1919    """ Computes a (lower diagonal) matrix with AUCs for all pairs of classes. 
     1920    If there are empty classes, the corresponding elements in the matrix 
     1921    are -1. Remember the beautiful(?) code for printing out the confusion 
     1922    matrix? Here it strikes again:: 
     1923     
     1924        classes = vehicle.domain.classVar.values 
     1925        AUCmatrix = orngStat.AUC_matrix(resVeh)[0] 
     1926        print "\t"+"\t".join(classes[:-1]) 
     1927        for className, AUCrow in zip(classes[1:], AUCmatrix[1:]): 
     1928            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow)) 
     1929    """ 
    16181930    numberOfClasses = len(res.classValues) 
    16191931    numberOfLearners = res.numberOfLearners 
     
    16401952 
    16411953def McNemar(res, **argkw): 
     1954    """ Computes a triangular matrix with McNemar statistics for each pair of 
     1955    classifiers. The statistics is distributed by chi-square distribution with 
     1956    one degree of freedom; critical value for 5% significance is around 3.84. 
     1957    """ 
    16421958    nLearners = res.numberOfLearners 
    16431959    mcm = [] 
     
    16831999 
    16842000def McNemarOfTwo(res, lrn1, lrn2): 
     2001    """ McNemarOfTwo computes a McNemar statistics for a pair of classifier, 
     2002    specified by indices learner1 and learner2. 
     2003    """ 
    16852004    tf = ft = 0.0 
    16862005    if not res.weights or argkw.get("unweighted"): 
     
    19902309 
    19912310def compute_friedman(avranks, N): 
    1992     """ 
    1993     Returns a tuple (friedman statistic, degrees of freedom) 
    1994     and (Iman statistic - F-distribution, degrees of freedom) 
     2311    """ Returns a tuple composed of (friedman statistic, degrees of freedom) 
     2312    and (Iman statistic - F-distribution, degrees of freedoma) given average 
     2313    ranks and a number of tested data sets N. 
    19952314    """ 
    19962315 
     
    20102329 
    20112330def compute_CD(avranks, N, alpha="0.05", type="nemenyi"): 
    2012     """ 
    2013     if type == "nemenyi": 
    2014         critical difference for Nemenyi two tailed test. 
    2015     if type == "bonferroni-dunn": 
    2016         critical difference for Bonferroni-Dunn test 
     2331    """ Returns critical difference for Nemenyi or Bonferroni-Dunn test 
     2332    according to given alpha (either alpha="0.05" or alpha="0.1") for average 
     2333    ranks and number of tested data sets N. Type can be either "nemenyi" for 
     2334    for Nemenyi two tailed test or "bonferroni-dunn" for Bonferroni-Dunn test. 
    20172335    """ 
    20182336 
     
    20422360    Needs matplotlib to work. 
    20432361 
    2044     Arguments: 
    2045     filename -- Output file name (with extension). Formats supported 
    2046         by matplotlib can be used. 
    2047     avranks -- List of average methods' ranks. 
    2048     names -- List of methods' names. 
    2049  
    2050     Keyword arguments: 
    2051     cd -- Critical difference. Used for marking methods that whose 
    2052         difference is not statistically significant. 
    2053     lowv -- The lowest shown rank, if None, use 1. 
    2054     highv -- The highest shown rank, if None, use len(avranks). 
    2055     width -- Width of the drawn figure in inches, default 6 in. 
    2056     textspace -- Space on figure sides left for the description 
    2057         of methods, default 1 in. 
    2058     reverse -- If True, the lowest rank is on the right. Default: 
    2059         False. 
    2060     cdmethod -- None by default. It can be an index of element in avranks or 
    2061         or names which specifies the method which should be marked 
    2062         with an interval. 
    2063  
    2064     Maintainer: Marko Toplak 
     2362    :param filename: Output file name (with extension). Formats supported  
     2363                     by matplotlib can be used. 
     2364    :param avranks: List of average methods' ranks. 
     2365    :param names: List of methods' names. 
     2366 
      2367    :param cd: Critical difference. Used for marking methods whose 
      2368               difference is not statistically significant. 
     2369    :param lowv: The lowest shown rank, if None, use 1. 
     2370    :param highv: The highest shown rank, if None, use len(avranks). 
     2371    :param width: Width of the drawn figure in inches, default 6 in. 
     2372    :param textspace: Space on figure sides left for the description 
     2373                      of methods, default 1 in. 
     2374    :param reverse:  If True, the lowest rank is on the right. Default: False. 
      2375    :param cdmethod: None by default. It can be an index of an element in avranks 
      2376                     or names that specifies the method which should be 
      2377                     marked with an interval. 
    20652378    """ 
    20662379 