Changeset 4058:bc1ec4bfa8c3 in orange


Timestamp: 08/07/07 09:16:30 (7 years ago)
Author: blaz <blaz.zupan@…>
Branch: default
Convert: 5aca394d1d3fab6490dcdc54336c5686f74602a8
Message: new: regression scores, in classification F1 and Falpha
Location: orange
Files: 2 edited

  • orange/doc/modules/orngStat.htm

r3468 → r4058
 <body>
 
-<h1>orngStat: Orange Statistics for Classifiers</h1>
+<h1>orngStat: Orange Statistics for Predictors</h1>
 <index name="modules/performance of classifiers">
+<index name="modules/performance of regressors">
 <index name="classifiers/accuracy of">
-
-<P>This module contains various measures of quality, such as
-classification accuracy, ROC statistics and similar. Most functions
-require an argument named <code>res</code>, an instance of
-<code><INDEX name="classes/ExperimentResults (in
+<index name="regression/evaluation of">
+
+<P>This module contains various measures of quality for classification
+and regression. Most functions require an argument named
+<code>res</code>, an instance of <code><INDEX
+name="classes/ExperimentResults (in
 orngTest)">ExperimentResults</code> as computed by functions from <a
-href="orngTest.htm">orngTest</a> and which contains classifications
-and probabilities obtained through cross-validation, leave one-out,
-testing on training or new examples...</P>
+href="orngTest.htm">orngTest</a> and which contains predictions
+obtained through cross-validation, leave one-out, testing on training
+data or test set examples.</P>
+
+<h2>Classification</h2>
 
 <P>To prepare some data for examples on this page, we shall load the
…
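
The workflow the paragraph above describes (build an ExperimentResults with orngTest, then score it with orngStat) can be sketched as follows; a minimal sketch, in which the voting dataset and the two learners are assumptions, not part of this changeset:

    import orange, orngTest, orngStat

    # evaluate two learners with 10-fold cross-validation
    data = orange.ExampleTable("voting")
    learners = [orange.BayesLearner(name="bayes"),
                orange.kNNLearner(k=5, name="knn")]
    res = orngTest.crossValidation(learners, data, folds=10)

    # res is the ExperimentResults instance that orngStat functions expect
    for name, ca in zip([l.name for l in learners], orngStat.CA(res)):
        print "%s: CA = %.3f" % (name, ca)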
 
 
-<H2>General Measures of Quality</H2>
+<H3>General Measures of Quality</H3>
 
 <DL class="attributes">
…
 <P>Script <a href="statExamples.py">statExamples.py</a> contains another example that also prints out the standard errors.</P>
 
-<H2>Confusion Matrix</H2>
+<H3>Confusion Matrix</H3>
 <index name="performance scores+confusion matrix">
 
…
 </DD>
 
-<dt>sens(confm), spec(confm), PPV(confm), NPV(confm), precision(confm), recall(confm), fmeasure(conf)
+<dt>sens(confm), spec(confm), PPV(confm), NPV(confm), precision(confm), recall(confm), F1(confm), Falpha(confm, alpha=1.0)
 <index name="performance scores+sensitivity">
 <index name="performance scores+specificity">
…
 <index name="performance scores+precision">
 <index name="performance scores+recall">
-<index name="performance scores+F-measure"></dt>
+<index name="performance scores+F-measure">
+<index name="performance scores+F1">
+<index name="performance scores+Falpha"></dt>
 
 <dd><p>With the confusion matrix defined in terms of positive and
…
 predictive value is called precision (the ratio of the number of
 relevant records retrieved to the total number of irrelevant and
-relevant records retrieved), and sensitivity is called recall (the
-ratio of the number of relevant records retrieved to the total number
-of relevant records in the database). The <a
+relevant records retrieved), and sensitivity is called <a
+href="http://en.wikipedia.org/wiki/Information_retrieval">recall</a>
+(the ratio of the number of relevant records retrieved to the total
+number of relevant records in the database). The <a
 href="http://en.wikipedia.org/wiki/Harmonic_mean">harmonic mean</a> of
 precision and recall [] is called an <a
-href="http://en.wikipedia.org/wiki/Sensitivity_(tests)">F-measure</a>.</p>
+href="http://en.wikipedia.org/wiki/Sensitivity_(tests)">F-measure</a>;
+with precision and recall weighted equally it is implemented as
+<code>F1</code>, and for a general weighting as <code>Falpha</code>.</p>
 
 <br>
…
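
For illustration, the two scores reduce to short formulas over the cells of a binary confusion matrix; a minimal sketch (plain TP/FP/FN counts are assumed here, rather than the module's confusion matrix objects):

    def precision_recall(TP, FP, FN):
        # precision: relevant retrieved / all retrieved
        # recall: relevant retrieved / all relevant
        return TP / float(TP + FP), TP / float(TP + FN)

    def f1(TP, FP, FN):
        # harmonic mean of precision and recall
        p, r = precision_recall(TP, FP, FN)
        return 2.0 * p * r / (p + r)

    def falpha(TP, FP, FN, alpha=1.0):
        # weighted variant; alpha=1.0 reduces to f1
        p, r = precision_recall(TP, FP, FN)
        return (1.0 + alpha) * p * r / (alpha * p + r)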
 
 
-<H2>ROC Analysis</H2>
+<H3>ROC Analysis</H3>
 <index name="performance scores+ROC analysis">
 <index name="performance scores+AUC">
…
 </DL>
 
-<H2>Comparison of Algorithms</H2>
+<H3>Comparison of Algorithms</H3>
 
 <DL>
 <DT><B>McNemar(res)</B>
 <index name="performance scores+McNemar test"></DT>
-<DD>Computes a triangular matrix with McNemar statistics for each pair of classifiers. The statistics is distributed by chi-square distribution with one degree of freedom; critical value for 5% significance is around 3.84.</DD><P>
-
+<DD>Computes a triangular matrix with the McNemar statistic for each
+pair of classifiers. The statistic follows the chi-squared
+distribution with one degree of freedom; the critical value at 5%
+significance is about 3.84.</DD>
 <DT><B>McNemarOfTwo(res, learner1, learner2)</B></DT>
-<DD>McNemarOfTwo computes a McNemar statistics for a pair of classifier, specified by indices learner1 and learner2.</DD><P>
-
+<DD>McNemarOfTwo computes the McNemar statistic for a pair of
+classifiers, specified by the indices learner1 and learner2.</DD>
 </DL>
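
The statistic itself needs only the counts of examples on which the two classifiers disagree; a minimal sketch (without any continuity correction; whether the module applies one is not shown in this changeset):

    def mcnemar(n01, n10):
        # n01: examples misclassified only by the first classifier
        # n10: examples misclassified only by the second classifier
        # compare against chi-squared with 1 degree of freedom (5% ~ 3.84)
        if n01 + n10 == 0:
            return 0.0
        return float(n01 - n10) ** 2 / (n01 + n10)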
+
+<H2>Regression</H2>
+
+<p>Several alternative measures, as given below, can be used to
+evaluate the success of numeric prediction:</p>
+
+<img src="orngStat-regression.png">
+
+<dl class="attributes">
+
+<dt>MSE(res)
+<index name="performance scores+mean-squared error"></dt>
+<dd>Computes mean-squared error.</dd>
+
+<dt>RMSE(res)
+<index name="performance scores+root mean-squared error"></dt>
+<dd>Computes root mean-squared error.</dd>
+
+<dt>MAE(res)
+<index name="performance scores+mean absolute error"></dt>
+<dd>Computes mean absolute error.</dd>
+
+<dt>RSE(res)
+<index name="performance scores+relative squared error"></dt>
+<dd>Computes relative squared error.</dd>
+
+<dt>RRSE(res)
+<index name="performance scores+root relative squared error"></dt>
+<dd>Computes root relative squared error.</dd>
+
+<dt>RAE(res)
+<index name="performance scores+relative absolute error"></dt>
+<dd>Computes relative absolute error.</dd>
+
+<dt>R2(res)
+<index name="performance scores+R-squared"></dt>
+<dd>Computes the coefficient of determination, R-squared.</dd>
+
+</dl>
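
The definitions behind these scores are simple enough to state directly; a minimal sketch over plain lists (the names actual and predicted are assumptions), complementing the full module-level example below:

    import math

    def regression_scores(actual, predicted):
        n = float(len(actual))
        mean = sum(actual) / n
        sqr = sum([(p - a) ** 2 for p, a in zip(predicted, actual)])
        absd = sum([abs(p - a) for p, a in zip(predicted, actual)])
        var = sum([(a - mean) ** 2 for a in actual])   # total squared deviation
        dev = sum([abs(a - mean) for a in actual])     # total absolute deviation
        return {"MSE": sqr / n,              "RMSE": math.sqrt(sqr / n),
                "MAE": absd / n,             "RSE": sqr / var,
                "RRSE": math.sqrt(sqr / var), "RAE": absd / dev,
                "R2": 1.0 - sqr / var}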
+
+<p>The following code uses most of the above measures to score
+several regression methods.</p>
+
+<p class="header"><a href="statExamples-regression.py">statExamples-regression.py</a></p>
+<xmp class="code">import orange
+import orngRegression as r
+import orngTree
+import orngStat, orngTest
+
+data = orange.ExampleTable("housing")
+
+# definition of regressors
+lr = r.LinearRegressionLearner(name="lr")
+rt = orngTree.TreeLearner(measure="retis", mForPruning=2,
+                          minExamples=20, name="rt")
+maj = orange.MajorityLearner(name="maj")
+knn = orange.kNNLearner(k=10, name="knn")
+
+learners = [maj, rt, knn, lr]
+
+# cross validation, selection of scores, report of results
+results = orngTest.crossValidation(learners, data, folds=3)
+scores = [("MSE", orngStat.MSE),   ("RMSE", orngStat.RMSE),
+          ("MAE", orngStat.MAE),   ("RSE", orngStat.RSE),
+          ("RRSE", orngStat.RRSE), ("RAE", orngStat.RAE),
+          ("R2", orngStat.R2)]
+
+print "Learner   " + "".join(["%-8s" % s[0] for s in scores])
+for i in range(len(learners)):
+    print "%-8s " % learners[i].name + \
+    "".join(["%7.3f " % s[1](results)[i] for s in scores])
+</xmp>
+
+<p>The code above produces the following output:</p>
+
+<xmp class="code">Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
+maj       84.585   9.197   6.653   1.002   1.001   1.001  -0.002
+rt        40.015   6.326   4.592   0.474   0.688   0.691   0.526
+knn       21.248   4.610   2.870   0.252   0.502   0.432   0.748
+lr        24.092   4.908   3.425   0.285   0.534   0.515   0.715
+</xmp>
 
 <H2>Plotting Functions</H2>
  • orange/orngStat.py

r3469 → r4058
         return [statc.mean(x) for x in stats]
 
-#### Statistics
-
 def ME(res, **argkw):
     MEs = [0.0]*res.numberOfLearners
…
 MAE = ME
 
+#########################################################################
+# PERFORMANCE MEASURES:
+# Scores for evaluation of numeric predictions
+
+def checkArgkw(dct, lst):
+    """checkArgkw(dct, lst) -> true if any of the keys in lst has a non-zero value in dct"""
+    return reduce(lambda x,y: x or y, [dct.get(k, 0) for k in lst])
+
+def regressionError(res, **argkw):
+    """regressionError(res) -> regression error (default: MSE)"""
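+    # recognized keyword flags (all off by default):
+    #   abs        - absolute instead of squared error (MAE, RAE)
+    #   sqrt       - square root of the final score (RMSE, RRSE)
+    #   norm-abs   - normalize by total absolute deviation from the mean (RAE)
+    #   norm-sqr   - normalize by total squared deviation from the mean (RSE, RRSE, R2)
+    #   R2         - report 1 - score (coefficient of determination)
+    #   SE         - with several iterations, return (mean, deviation) pairs
+    #   unweighted - ignore example weights even if present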
+    if argkw.get("SE", 0) and res.numberOfIterations > 1:
+        # computes the scores for each iteration, then averages
+        scores = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)]
+        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
+            norm = [0.0] * res.numberOfIterations
+
+        nIter = [0]*res.numberOfIterations       # counts examples in each iteration
+        a = [0]*res.numberOfIterations           # average class in each iteration
+        for tex in res.results:
+            nIter[tex.iterationNumber] += 1
+            a[tex.iterationNumber] += float(tex.actualClass)
+        a = [a[i]/nIter[i] for i in range(res.numberOfIterations)]
+
+        if argkw.get("unweighted", 0) or not res.weights:
+            # iterate across test cases
+            for tex in res.results:
+                ai = float(tex.actualClass)
+
+                # compute normalization, if required
+                if argkw.get("norm-abs", 0):
+                    norm[tex.iterationNumber] += abs(ai - a[tex.iterationNumber])
+                elif argkw.get("norm-sqr", 0):
+                    norm[tex.iterationNumber] += (ai - a[tex.iterationNumber])**2
+
+                # iterate across results of different regressors
+                for i, cls in enumerate(tex.classes):
+                    if argkw.get("abs", 0):
+                        scores[i][tex.iterationNumber] += abs(float(cls) - ai)
+                    else:
+                        scores[i][tex.iterationNumber] += (float(cls) - ai)**2
+        else: # weighted
+            raise SystemError, "weighted error scores with SE not implemented yet"
+
+        if argkw.get("norm-abs") or argkw.get("norm-sqr"):
+            scores = [[x/n for x, n in zip(y, norm)] for y in scores]
+        else:
+            scores = [[x/ni for x, ni in zip(y, nIter)] for y in scores]
+
+        if argkw.get("R2"):
+            scores = [[1.0 - x for x in y] for y in scores]
+
+        if argkw.get("sqrt", 0):
+            scores = [[math.sqrt(x) for x in y] for y in scores]
+
+        return [(statc.mean(x), statc.std(x)) for x in scores]
+
+    else: # single iteration (testing on a single test set)
+        scores = [0.0] * res.numberOfLearners
+        norm = 0.0
+
+        if argkw.get("unweighted", 0) or not res.weights:
+            a = sum([tex.actualClass for tex in res.results]) \
+                / len(res.results)
+            for tex in res.results:
+                if argkw.get("abs", 0):
+                    scores = map(lambda res, cls, ac = float(tex.actualClass):
+                                 res + abs(float(cls) - ac), scores, tex.classes)
+                else:
+                    scores = map(lambda res, cls, ac = float(tex.actualClass):
+                                 res + (float(cls) - ac)**2, scores, tex.classes)
+
+                if argkw.get("norm-abs", 0):
+                    norm += abs(tex.actualClass - a)
+                elif argkw.get("norm-sqr", 0):
+                    norm += (tex.actualClass - a)**2
+            totweight = gettotsize(res)
+        else:
+            # UNFINISHED: weighted case lacks abs/norm handling
+            for tex in res.results:
+                scores = map(lambda res, cls, ac = float(tex.actualClass),
+                             tw = tex.weight:
+                             res + tw * (float(cls) - ac)**2, scores, tex.classes)
+            totweight = gettotweight(res)
+
+        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0):
+            scores = [s/norm for s in scores]
+        else: # normalize by number of instances (or sum of weights)
+            scores = [s/totweight for s in scores]
+
+        if argkw.get("R2"):
+            scores = [1.0 - s for s in scores]
+
+        if argkw.get("sqrt", 0):
+            scores = [math.sqrt(x) for x in scores]
+
+        return scores
+
 def MSE(res, **argkw):
+    """MSE(res) -> mean-squared error"""
+    return regressionError(res, **argkw)
+
+def RMSE(res, **argkw):
+    """RMSE(res) -> root mean-squared error"""
+    argkw.setdefault("sqrt", True)
+    return regressionError(res, **argkw)
+
+def MAE(res, **argkw):
+    """MAE(res) -> mean absolute error"""
+    argkw.setdefault("abs", True)
+    return regressionError(res, **argkw)
+
+def RSE(res, **argkw):
+    """RSE(res) -> relative squared error"""
+    argkw.setdefault("norm-sqr", True)
+    return regressionError(res, **argkw)
+
+def RRSE(res, **argkw):
+    """RRSE(res) -> root relative squared error"""
+    argkw.setdefault("norm-sqr", True)
+    argkw.setdefault("sqrt", True)
+    return regressionError(res, **argkw)
+
+def RAE(res, **argkw):
+    """RAE(res) -> relative absolute error"""
+    argkw.setdefault("abs", True)
+    argkw.setdefault("norm-abs", True)
+    return regressionError(res, **argkw)
+
+def R2(res, **argkw):
+    """R2(res) -> R-squared"""
+    argkw.setdefault("norm-sqr", True)
+    argkw.setdefault("R2", True)
+    return regressionError(res, **argkw)
+
+def MSE_old(res, **argkw):
+    """MSE_old(res) -> mean-squared error"""
     if argkw.get("SE", 0) and res.numberOfIterations > 1:
         MSEs = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)]
…
         return [x/totweight for x in MSEs]
 
-def RMSE(res, **argkw):
+def RMSE_old(res, **argkw):
+    """RMSE_old(res) -> root mean-squared error"""
     argkw.setdefault("sqrt", 1)
-    return MSE(res, **argkw)
-
+    return MSE_old(res, **argkw)
+
+
+#########################################################################
+# PERFORMANCE MEASURES:
+# Scores for evaluation of classifiers
 
 def CA(res, reportSE = False, **argkw):
…
             foldN[tex.iterationNumber] += tex.weight
 
-    return statisticsByFolds(APsByFold, foldN, reportSE, True)
-
-
+    return statisticsByFolds(APsByFold, foldN, reportSE, True)
 
 
 def BrierScore(res, reportSE = False, **argkw):
+    """Computes Brier score"""
     # Computes an average (over examples) of sum_x(t(x) - p(x))^2, where
     #    x is class,
…
     else:
         return [x+1.0 for x in stats]
-
-
 
 def BSS(res, **argkw):
…
 ##
 
-
-
 def IS_ex(Pc, P):
     "Pc aposterior probability, P aprior"
…
         return -(-log2(1-P)+log2(1-Pc))
 
-
-
 def IS(res, apriori=None, reportSE = False, **argkw):
     if not apriori:
…
     return F, statc.chisqprob(F, k-1)
 
+
 def Wilcoxon(res, statistics, **argkw):
     res1, res2 = [], []
…
         return confm.TP/tot
 
+
 def precision(confm):
     return PPV(confm)
…
         return confm.TP/tot
 
-def fmeasure(confm):
+def F1(confm):
     if type(confm) == list:
         return [F1(cm) for cm in confm]
…
         r = recall(confm)
         return 2. * p * r / (p + r)
+
+def Falpha(confm, alpha=1.0):
+    if type(confm) == list:
+        return [Falpha(cm, alpha) for cm in confm]
+    else:
+        p = precision(confm)
+        r = recall(confm)
+        return (1. + alpha) * p * r / (alpha * p + r)
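+
+# usage sketch (assumes a confusion matrix obtained from this module,
+# e.g. cm = confusionMatrices(res)[0] for the first learner):
+#   print F1(cm)
+#   print Falpha(cm, alpha=0.5)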
 
 def AUCWilcoxon(res, classIndex=-1, **argkw):
…
                         if td<best[0]:
                             best=(td, t)
-                    #print wanted, best
                     if not best[1] in newn:
                         newn.append(best[1])