Changeset 9447:df7e3665f3c7 in orange


Timestamp:
07/04/11 13:59:54 (3 years ago)
Author:
wencanluo <wencanluo@…>
Branch:
default
Convert:
06db485eb984e0f17b4667561337304e344b6db7
Message:

Merged the testing and scoring in multilabel and the old one

Location:
orange
Files:
7 added
9 deleted
8 edited

  • orange/Orange/__init__.py

    r9445 r9447  
    9090_import("multilabel.br") 
    9191_import("multilabel.testing") 
     92_import("multilabel.scoring") 
    9293 
  • orange/Orange/evaluation/scoring.py

    r8264 r9447  
    145145    
    146146   The function then returns a three-dimensional matrix, where the element 
    147    A[:obj:`learner`][:obj:`actualClass`][:obj:`predictedClass`] 
    148    gives the number of instances belonging to 'actualClass' for which the 
     147   A[:obj:`learner`][:obj:`actual_class`][:obj:`predictedClass`] 
     148   gives the number of instances belonging to 'actual_class' for which the 
    149149   'learner' predicted 'predictedClass'. We shall compute and print out 
    150150   the matrix for naive Bayesian classifier. 
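For illustration, element access could look like this (a sketch; ``confusion_matrices`` and ``res`` are assumed names from the surrounding documentation, not verbatim from this changeset)::

    # hypothetical: instances of actual class 0 that learner 0 predicted as class 1
    cm = confusion_matrices(res)
    print cm[0][0][1]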
     
    330330results by folds, call the function for each fold separately and then sum 
    331331the results up however you see fit, or you can set the ExperimentResults' 
    332 attribute numberOfIterations to 1, to cheat the function - at your own 
     332attribute number_of_iterations to 1, to cheat the function - at your own 
    333333risk with respect to statistical correctness. For multi-class 
    333334problems, if you don't choose a specific class, Orange.evaluation.scoring will use the class 
     
    421421 
    422422.. autofunction:: split_by_iterations 
     423 
     424====================== 
     425Scoring for multilabel 
     426====================== 
     427 
     428Multi-label classification requires different metrics than those used in traditional single-label 
     429classification. This module presents the various metrics that have been proposed in the literature. 
     430Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples 
     431:math:`(x_i,Y_i)`, :math:`i=1..|D|`, :math:`Y_i \\subseteq L`. Let :math:`H` be a multi-label classifier  
     432and :math:`Z_i=H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`. 
     433 
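As a toy illustration of these definitions (hypothetical label sets, not taken from any data set in this changeset)::

    # per-example quantities behind the metrics documented below
    Y = set(['sports', 'politics'])  # actual label set Y_i
    Z = set(['sports'])              # predicted label set Z_i = H(x_i)
    print len(Y & Z) / float(len(Y | Z))  # example accuracy:  0.5
    print len(Y & Z) / float(len(Z))      # example precision: 1.0
    print len(Y & Z) / float(len(Y))      # example recall:    0.5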
     434.. autofunction:: hamming_loss  
     435.. autofunction:: accuracy 
     436.. autofunction:: precision 
     437.. autofunction:: recall 
     438 
     439Let us compute all of these scores for a sample data set 
     440(`mlc-evaluator.py`_, uses `multidata.tab`_) and print them out: 
     441 
     442.. literalinclude:: code/mlc-evaluator.py 
     443   :lines: 1- 
     444 
     445.. _multidata.tab: code/multidata.tab 
     446.. _mlc-evaluator.py: code/mlc-evaluator.py 
     447 
     448The output should look like this:: 
     449 
     450    loss= [0.9375] 
     451    accuracy= [0.875] 
     452    precision= [1.0] 
     453    recall= [0.875] 
     454 
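The calls behind such output might look roughly as follows (a sketch only; the learner placeholder and data set name are assumptions, not the contents of `mlc-evaluator.py`_)::

    import Orange
    data = Orange.data.Table('multidata')  # assumed multi-label data set
    learners = [some_multilabel_learner]   # placeholder, e.g. a binary-relevance learner
    res = Orange.evaluation.testing.cross_validation(learners, data)
    print 'loss=', Orange.evaluation.scoring.hamming_loss(res)
    print 'accuracy=', Orange.evaluation.scoring.accuracy(res)
    print 'precision=', Orange.evaluation.scoring.precision(res)
    print 'recall=', Orange.evaluation.scoring.recall(res)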
     455References 
     456========== 
     457 
     458Boutell, M.R., Luo, J., Shen, X. & Brown, C.M. (2004), 'Learning multi-label scene classification', 
     459Pattern Recognition, vol. 37, no. 9, pp. 1757-1771. 
     460 
     461Godbole, S. & Sarawagi, S. (2004), 'Discriminative Methods for Multi-labeled Classification', in 
     462Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining 
     463(PAKDD 2004). 
     464  
     465Schapire, R.E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization', 
     466Machine Learning, vol. 39, no. 2/3, pp. 135-168. 
    423467 
    424468""" 
     
    461505    of ExperimentResults, one for each iteration. 
    462506    """ 
    463     if res.numberOfIterations < 2: 
     507    if res.number_of_iterations < 2: 
    464508        return [res] 
    465509         
    466     ress = [Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.classValues, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
    467             for i in range(res.numberOfIterations)] 
     510    ress = [Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
     511            for i in range(res.number_of_iterations)] 
    468512    for te in res.results: 
    469         ress[te.iterationNumber].results.append(te) 
     513        ress[te.iteration_number].results.append(te) 
    470514    return ress     
    471515 
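For example (a sketch, with ``res`` an :obj:`ExperimentResults` and ``CA`` the classification-accuracy scorer from this module), per-fold scores can be computed as::

    # score each iteration (fold) separately, as the documentation above suggests
    for fold_res in split_by_iterations(res):
        print CA(fold_res)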
     
    473517def class_probabilities_from_res(res, **argkw): 
    474518    """Calculate class probabilities""" 
    475     probs = [0.0] * len(res.classValues) 
     519    probs = [0.0] * len(res.class_values) 
    476520    if argkw.get("unweighted", 0) or not res.weights: 
    477521        for tex in res.results: 
    478             probs[int(tex.actualClass)] += 1.0 
     522            probs[int(tex.actual_class)] += 1.0 
    479523        totweight = gettotsize(res) 
    480524    else: 
    481525        totweight = 0.0 
    482526        for tex in res.results: 
    483             probs[tex.actualClass] += tex.weight 
     527            probs[tex.actual_class] += tex.weight 
    484528            totweight += tex.weight 
    485529        check_non_zero(totweight) 
     
    492536        if not stats: 
    493537            raise ValueError, "Cannot compute the score: no examples or sum of weights is 0.0." 
    494         numberOfLearners = len(stats[0]) 
     538        number_of_learners = len(stats[0]) 
    495539        stats = filter(lambda (x, fN): fN>0.0, zip(stats,foldN)) 
    496         stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(numberOfLearners)] 
     540        stats = [ [x[lrn]/fN for x, fN in stats] for lrn in range(number_of_learners)] 
    497541    else: 
    498542        stats = [ [x/Fn for x, Fn in filter(lambda (x, Fn): Fn > 0.0, zip(lrnD, foldN))] for lrnD in stats] 
     
    509553     
    510554def ME(res, **argkw): 
    511     MEs = [0.0]*res.numberOfLearners 
     555    MEs = [0.0]*res.number_of_learners 
    512556 
    513557    if argkw.get("unweighted", 0) or not res.weights: 
    514558        for tex in res.results: 
    515             MEs = map(lambda res, cls, ac = float(tex.actualClass): 
     559            MEs = map(lambda res, cls, ac = float(tex.actual_class): 
    516560                      res + abs(float(cls) - ac), MEs, tex.classes) 
    517561        totweight = gettotsize(res) 
    518562    else: 
    519563        for tex in res.results: 
    520             MEs = map(lambda res, cls, ac = float(tex.actualClass), tw = tex.weight: 
     564            MEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight: 
    521565                       res + tw*abs(float(cls) - ac), MEs, tex.classes) 
    522566        totweight = gettotweight(res) 
     
    536580def regression_error(res, **argkw): 
    537581    """regression_error(res) -> regression error (default: MSE)""" 
    538     if argkw.get("SE", 0) and res.numberOfIterations > 1: 
     582    if argkw.get("SE", 0) and res.number_of_iterations > 1: 
    539583        # computes the scores for each iteration, then averages 
    540         scores = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)] 
     584        scores = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)] 
    541585        if argkw.get("norm-abs", 0) or argkw.get("norm-sqr", 0): 
    542             norm = [0.0] * res.numberOfIterations 
    543  
    544         nIter = [0]*res.numberOfIterations       # counts examples in each iteration 
    545         a = [0]*res.numberOfIterations           # average class in each iteration 
     586            norm = [0.0] * res.number_of_iterations 
     587 
     588        nIter = [0]*res.number_of_iterations       # counts examples in each iteration 
     589        a = [0]*res.number_of_iterations           # average class in each iteration 
    546590        for tex in res.results: 
    547             nIter[tex.iterationNumber] += 1 
    548             a[tex.iterationNumber] += float(tex.actualClass) 
    549         a = [a[i]/nIter[i] for i in range(res.numberOfIterations)] 
     591            nIter[tex.iteration_number] += 1 
     592            a[tex.iteration_number] += float(tex.actual_class) 
     593        a = [a[i]/nIter[i] for i in range(res.number_of_iterations)] 
    550594 
    551595        if argkw.get("unweighted", 0) or not res.weights: 
    552596            # iterate across test cases 
    553597            for tex in res.results: 
    554                 ai = float(tex.actualClass) 
    555                 nIter[tex.iterationNumber] += 1 
     598                ai = float(tex.actual_class) 
     599                nIter[tex.iteration_number] += 1 
    556600 
    557601                # compute normalization, if required 
    558602                if argkw.get("norm-abs", 0): 
    559                     norm[tex.iterationNumber] += abs(ai - a[tex.iterationNumber]) 
     603                    norm[tex.iteration_number] += abs(ai - a[tex.iteration_number]) 
    560604                elif argkw.get("norm-sqr", 0): 
    561                     norm[tex.iterationNumber] += (ai - a[tex.iterationNumber])**2 
     605                    norm[tex.iteration_number] += (ai - a[tex.iteration_number])**2 
    562606 
    563607                # iterate across results of different regressors 
    564608                for i, cls in enumerate(tex.classes): 
    565609                    if argkw.get("abs", 0): 
    566                         scores[i][tex.iterationNumber] += abs(float(cls) - ai) 
     610                        scores[i][tex.iteration_number] += abs(float(cls) - ai) 
    567611                    else: 
    568                         scores[i][tex.iterationNumber] += (float(cls) - ai)**2 
     612                        scores[i][tex.iteration_number] += (float(cls) - ai)**2 
    569613        else: # unweighted<>0 
    570614            raise NotImplementedError, "weighted error scores with SE not implemented yet" 
     
    584628         
    585629    else: # single iteration (testing on a single test set) 
    586         scores = [0.0] * res.numberOfLearners 
     630        scores = [0.0] * res.number_of_learners 
    587631        norm = 0.0 
    588632 
    589633        if argkw.get("unweighted", 0) or not res.weights: 
    590             a = sum([tex.actualClass for tex in res.results]) \ 
     634            a = sum([tex.actual_class for tex in res.results]) \ 
    591635                / len(res.results) 
    592636            for tex in res.results: 
    593637                if argkw.get("abs", 0): 
    594                     scores = map(lambda res, cls, ac = float(tex.actualClass): 
     638                    scores = map(lambda res, cls, ac = float(tex.actual_class): 
    595639                                 res + abs(float(cls) - ac), scores, tex.classes) 
    596640                else: 
    597                     scores = map(lambda res, cls, ac = float(tex.actualClass): 
     641                    scores = map(lambda res, cls, ac = float(tex.actual_class): 
    598642                                 res + (float(cls) - ac)**2, scores, tex.classes) 
    599643 
    600644                if argkw.get("norm-abs", 0): 
    601                     norm += abs(tex.actualClass - a) 
     645                    norm += abs(tex.actual_class - a) 
    602646                elif argkw.get("norm-sqr", 0): 
    603                     norm += (tex.actualClass - a)**2 
     647                    norm += (tex.actual_class - a)**2 
    604648            totweight = gettotsize(res) 
    605649        else: 
    606650            # UNFINISHED 
    607651            for tex in res.results: 
    608                 MSEs = map(lambda res, cls, ac = float(tex.actualClass), 
     652                MSEs = map(lambda res, cls, ac = float(tex.actual_class), 
    609653                           tw = tex.weight: 
    610654                           res + tw * (float(cls) - ac)**2, MSEs, tex.classes) 
     
    663707def MSE_old(res, **argkw): 
    664708    """MSE(res) -> mean-squared error""" 
    665     if argkw.get("SE", 0) and res.numberOfIterations > 1: 
    666         MSEs = [[0.0] * res.numberOfIterations for i in range(res.numberOfLearners)] 
    667         nIter = [0]*res.numberOfIterations 
     709    if argkw.get("SE", 0) and res.number_of_iterations > 1: 
     710        MSEs = [[0.0] * res.number_of_iterations for i in range(res.number_of_learners)] 
     711        nIter = [0]*res.number_of_iterations 
    668712        if argkw.get("unweighted", 0) or not res.weights: 
    669713            for tex in res.results: 
    670                 ac = float(tex.actualClass) 
    671                 nIter[tex.iterationNumber] += 1 
     714                ac = float(tex.actual_class) 
     715                nIter[tex.iteration_number] += 1 
    672716                for i, cls in enumerate(tex.classes): 
    673                     MSEs[i][tex.iterationNumber] += (float(cls) - ac)**2 
     717                    MSEs[i][tex.iteration_number] += (float(cls) - ac)**2 
    674718        else: 
    675719            raise ValueError, "weighted RMSE with SE not implemented yet" 
     
    680724         
    681725    else: 
    682         MSEs = [0.0]*res.numberOfLearners 
     726        MSEs = [0.0]*res.number_of_learners 
    683727        if argkw.get("unweighted", 0) or not res.weights: 
    684728            for tex in res.results: 
    685                 MSEs = map(lambda res, cls, ac = float(tex.actualClass): 
     729                MSEs = map(lambda res, cls, ac = float(tex.actual_class): 
    686730                           res + (float(cls) - ac)**2, MSEs, tex.classes) 
    687731            totweight = gettotsize(res) 
    688732        else: 
    689733            for tex in res.results: 
    690                 MSEs = map(lambda res, cls, ac = float(tex.actualClass), tw = tex.weight: 
     734                MSEs = map(lambda res, cls, ac = float(tex.actual_class), tw = tex.weight: 
    691735                           res + tw * (float(cls) - ac)**2, MSEs, tex.classes) 
    692736            totweight = gettotweight(res) 
     
    728772    estimated using the latter method. 
    729773    """ 
    730     if res.numberOfIterations==1: 
     774    if res.number_of_iterations==1: 
    731775        if type(res)==ConfusionMatrix: 
    732776            div = nm.TP+nm.FN+nm.FP+nm.TN 
     
    734778            ca = [(nm.TP+nm.TN)/div] 
    735779        else: 
    736             CAs = [0.0]*res.numberOfLearners 
     780            CAs = [0.0]*res.number_of_learners 
    737781            if argkw.get("unweighted", 0) or not res.weights: 
    738782                totweight = gettotsize(res) 
    739783                for tex in res.results: 
    740                     CAs = map(lambda res, cls: res+(cls==tex.actualClass), CAs, tex.classes) 
     784                    CAs = map(lambda res, cls: res+(cls==tex.actual_class), CAs, tex.classes) 
    741785            else: 
    742786                totweight = 0. 
    743787                for tex in res.results: 
    744                     CAs = map(lambda res, cls: res+(cls==tex.actualClass and tex.weight), CAs, tex.classes) 
     788                    CAs = map(lambda res, cls: res+(cls==tex.actual_class and tex.weight), CAs, tex.classes) 
    745789                    totweight += tex.weight 
    746790            check_non_zero(totweight) 
     
    753797         
    754798    else: 
    755         CAsByFold = [[0.0]*res.numberOfIterations for i in range(res.numberOfLearners)] 
    756         foldN = [0.0]*res.numberOfIterations 
     799        CAsByFold = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)] 
     800        foldN = [0.0]*res.number_of_iterations 
    757801 
    758802        if argkw.get("unweighted", 0) or not res.weights: 
    759803            for tex in res.results: 
    760                 for lrn in range(res.numberOfLearners): 
    761                     CAsByFold[lrn][tex.iterationNumber] += (tex.classes[lrn]==tex.actualClass) 
    762                 foldN[tex.iterationNumber] += 1 
     804                for lrn in range(res.number_of_learners): 
     805                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) 
     806                foldN[tex.iteration_number] += 1 
    763807        else: 
    764808            for tex in res.results: 
    765                 for lrn in range(res.numberOfLearners): 
    766                     CAsByFold[lrn][tex.iterationNumber] += (tex.classes[lrn]==tex.actualClass) and tex.weight 
    767                 foldN[tex.iterationNumber] += tex.weight 
     809                for lrn in range(res.number_of_learners): 
     810                    CAsByFold[lrn][tex.iteration_number] += (tex.classes[lrn]==tex.actual_class) and tex.weight 
     811                foldN[tex.iteration_number] += tex.weight 
    768812 
    769813        return statistics_by_folds(CAsByFold, foldN, reportSE, False) 
     
    777821def AP(res, reportSE = False, **argkw): 
    778822    """ Computes the average probability assigned to the correct class. """ 
    779     if res.numberOfIterations == 1: 
    780         APs=[0.0]*res.numberOfLearners 
     823    if res.number_of_iterations == 1: 
     824        APs=[0.0]*res.number_of_learners 
    781825        if argkw.get("unweighted", 0) or not res.weights: 
    782826            for tex in res.results: 
    783                 APs = map(lambda res, probs: res + probs[tex.actualClass], APs, tex.probabilities) 
     827                APs = map(lambda res, probs: res + probs[tex.actual_class], APs, tex.probabilities) 
    784828            totweight = gettotsize(res) 
    785829        else: 
    786830            totweight = 0. 
    787831            for tex in res.results: 
    788                 APs = map(lambda res, probs: res + probs[tex.actualClass]*tex.weight, APs, tex.probabilities) 
     832                APs = map(lambda res, probs: res + probs[tex.actual_class]*tex.weight, APs, tex.probabilities) 
    789833                totweight += tex.weight 
    790834        check_non_zero(totweight) 
    791835        return [AP/totweight for AP in APs] 
    792836 
    793     APsByFold = [[0.0]*res.numberOfLearners for i in range(res.numberOfIterations)] 
    794     foldN = [0.0] * res.numberOfIterations 
     837    APsByFold = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)] 
     838    foldN = [0.0] * res.number_of_iterations 
    795839    if argkw.get("unweighted", 0) or not res.weights: 
    796840        for tex in res.results: 
    797             APsByFold[tex.iterationNumber] = map(lambda res, probs: res + probs[tex.actualClass], APsByFold[tex.iterationNumber], tex.probabilities) 
    798             foldN[tex.iterationNumber] += 1 
     841            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class], APsByFold[tex.iteration_number], tex.probabilities) 
     842            foldN[tex.iteration_number] += 1 
    799843    else: 
    800844        for tex in res.results: 
    801             APsByFold[tex.iterationNumber] = map(lambda res, probs: res + probs[tex.actualClass] * tex.weight, APsByFold[tex.iterationNumber], tex.probabilities) 
    802             foldN[tex.iterationNumber] += tex.weight 
     845            APsByFold[tex.iteration_number] = map(lambda res, probs: res + probs[tex.actual_class] * tex.weight, APsByFold[tex.iteration_number], tex.probabilities) 
     846            foldN[tex.iteration_number] += tex.weight 
    803847 
    804848    return statistics_by_folds(APsByFold, foldN, reportSE, True) 
     
    821865    # We take max(result, 0) to avoid -0.0000x due to rounding errors 
    822866 
    823     if res.numberOfIterations == 1: 
    824         MSEs=[0.0]*res.numberOfLearners 
     867    if res.number_of_iterations == 1: 
     868        MSEs=[0.0]*res.number_of_learners 
    825869        if argkw.get("unweighted", 0) or not res.weights: 
    826870            totweight = 0.0 
    827871            for tex in res.results: 
    828872                MSEs = map(lambda res, probs: 
    829                            res + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actualClass], MSEs, tex.probabilities) 
     873                           res + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities) 
    830874                totweight += tex.weight 
    831875        else: 
    832876            for tex in res.results: 
    833877                MSEs = map(lambda res, probs: 
    834                            res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actualClass], MSEs, tex.probabilities) 
     878                           res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], MSEs, tex.probabilities) 
    835879            totweight = gettotweight(res) 
    836880        check_non_zero(totweight) 
     
    840884            return [max(x/totweight+1.0, 0) for x in MSEs] 
    841885 
    842     BSs = [[0.0]*res.numberOfLearners for i in range(res.numberOfIterations)] 
    843     foldN = [0.] * res.numberOfIterations 
     886    BSs = [[0.0]*res.number_of_learners for i in range(res.number_of_iterations)] 
     887    foldN = [0.] * res.number_of_iterations 
    844888 
    845889    if argkw.get("unweighted", 0) or not res.weights: 
    846890        for tex in res.results: 
    847             BSs[tex.iterationNumber] = map(lambda rr, probs: 
    848                        rr + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actualClass], BSs[tex.iterationNumber], tex.probabilities) 
    849             foldN[tex.iterationNumber] += 1 
     891            BSs[tex.iteration_number] = map(lambda rr, probs: 
     892                       rr + reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities) 
     893            foldN[tex.iteration_number] += 1 
    850894    else: 
    851895        for tex in res.results: 
    852             BSs[tex.iterationNumber] = map(lambda res, probs: 
    853                        res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actualClass], BSs[tex.iterationNumber], tex.probabilities) 
    854             foldN[tex.iterationNumber] += tex.weight 
     896            BSs[tex.iteration_number] = map(lambda res, probs: 
     897                       res + tex.weight*reduce(lambda s, pi: s+pi**2, probs, 0) - 2*probs[tex.actual_class], BSs[tex.iteration_number], tex.probabilities) 
     898            foldN[tex.iteration_number] += tex.weight 
    855899 
    856900    stats = statistics_by_folds(BSs, foldN, reportSE, True) 
     
    881925        apriori = class_probabilities_from_res(res) 
    882926 
    883     if res.numberOfIterations==1: 
    884         ISs = [0.0]*res.numberOfLearners 
     927    if res.number_of_iterations==1: 
     928        ISs = [0.0]*res.number_of_learners 
    885929        if argkw.get("unweighted", 0) or not res.weights: 
    886930            for tex in res.results: 
    887931              for i in range(len(tex.probabilities)): 
    888                     cls = tex.actualClass 
     932                    cls = tex.actual_class 
    889933                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) 
    890934            totweight = gettotsize(res) 
     
    892936            for tex in res.results: 
    893937              for i in range(len(tex.probabilities)): 
    894                     cls = tex.actualClass 
     938                    cls = tex.actual_class 
    895939                    ISs[i] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight 
    896940            totweight = gettotweight(res) 
     
    901945 
    902946         
    903     ISs = [[0.0]*res.numberOfIterations for i in range(res.numberOfLearners)] 
    904     foldN = [0.] * res.numberOfIterations 
     947    ISs = [[0.0]*res.number_of_iterations for i in range(res.number_of_learners)] 
     948    foldN = [0.] * res.number_of_iterations 
    905949 
    906950    # compute info scores for each fold     
     
    908952        for tex in res.results: 
    909953            for i in range(len(tex.probabilities)): 
    910                 cls = tex.actualClass 
    911                 ISs[i][tex.iterationNumber] += IS_ex(tex.probabilities[i][cls], apriori[cls]) 
    912             foldN[tex.iterationNumber] += 1 
     954                cls = tex.actual_class 
     955                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls]) 
     956            foldN[tex.iteration_number] += 1 
    913957    else: 
    914958        for tex in res.results: 
    915959            for i in range(len(tex.probabilities)): 
    916                 cls = tex.actualClass 
    917                 ISs[i][tex.iterationNumber] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight 
    918             foldN[tex.iterationNumber] += tex.weight 
     960                cls = tex.actual_class 
     961                ISs[i][tex.iteration_number] += IS_ex(tex.probabilities[i][cls], apriori[cls]) * tex.weight 
     962            foldN[tex.iteration_number] += tex.weight 
    919963 
    920964    return statistics_by_folds(ISs, foldN, reportSE, False) 
     
    930974            sums = ranks 
    931975            k = len(sums) 
    932     N = res.numberOfIterations 
     976    N = res.number_of_iterations 
    933977    k = len(sums) 
    934978    T = sum([x*x for x in sums]) 
     
    9871031    compatibility issues. 
    9881032    """ 
    989     tfpns = [ConfusionMatrix() for i in range(res.numberOfLearners)] 
     1033    tfpns = [ConfusionMatrix() for i in range(res.number_of_learners)] 
    9901034     
    9911035    if classIndex<0: 
    992         numberOfClasses = len(res.classValues) 
     1036        numberOfClasses = len(res.class_values) 
    9931037        if classIndex < -1 or numberOfClasses > 2: 
    994             cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.numberOfLearners)] 
     1038            cm = [[[0.0] * numberOfClasses for i in range(numberOfClasses)] for l in range(res.number_of_learners)] 
    9951039            if argkw.get("unweighted", 0) or not res.weights: 
    9961040                for tex in res.results: 
    997                     trueClass = int(tex.actualClass) 
     1041                    trueClass = int(tex.actual_class) 
    9981042                    for li, pred in enumerate(tex.classes): 
    9991043                        predClass = int(pred) 
     
    10021046            else: 
    10031047                for tex in enumerate(res.results): 
    1004                     trueClass = int(tex.actualClass) 
     1048                    trueClass = int(tex.actual_class) 
    10051049                    for li, pred in tex.classes: 
    10061050                        predClass = int(pred) 
     
    10181062        if argkw.get("unweighted", 0) or not res.weights: 
    10191063            for lr in res.results: 
    1020                 isPositive=(lr.actualClass==classIndex) 
    1021                 for i in range(res.numberOfLearners): 
     1064                isPositive=(lr.actual_class==classIndex) 
     1065                for i in range(res.number_of_learners): 
    10221066                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive) 
    10231067        else: 
    10241068            for lr in res.results: 
    1025                 isPositive=(lr.actualClass==classIndex) 
    1026                 for i in range(res.numberOfLearners): 
     1069                isPositive=(lr.actual_class==classIndex) 
     1070                for i in range(res.number_of_learners): 
    10271071                    tfpns[i].addTFPosNeg(lr.probabilities[i][classIndex]>cutoff, isPositive, lr.weight) 
    10281072    else: 
    10291073        if argkw.get("unweighted", 0) or not res.weights: 
    10301074            for lr in res.results: 
    1031                 isPositive=(lr.actualClass==classIndex) 
    1032                 for i in range(res.numberOfLearners): 
     1075                isPositive=(lr.actual_class==classIndex) 
     1076                for i in range(res.number_of_learners): 
    10331077                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive) 
    10341078        else: 
    10351079            for lr in res.results: 
    1036                 isPositive=(lr.actualClass==classIndex) 
    1037                 for i in range(res.numberOfLearners): 
     1080                isPositive=(lr.actual_class==classIndex) 
     1081                for i in range(res.number_of_learners): 
    10381082                    tfpns[i].addTFPosNeg(lr.classes[i]==classIndex, isPositive, lr.weight) 
    10391083    return tfpns 
     
    15531597    import corn 
    15541598    ## merge multiple iterations into one 
    1555     mres = Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.classValues, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
     1599    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
    15561600    for te in res.results: 
    15571601        mres.results.append( te ) 
     
    16151659    import corn 
    16161660    ## merge multiple iterations into one 
    1617     mres = Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.classValues, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
     1661    mres = Orange.evaluation.testing.ExperimentResults(1, res.classifierNames, res.class_values, res.weights, classifiers=res.classifiers, loaded=res.loaded) 
    16181662    for te in res.results: 
    16191663        mres.results.append( te ) 
     
    16591703    weightByClasses = argkw.get("weightByClasses", True) 
    16601704 
    1661     if (res.numberOfIterations>1): 
    1662         CDTs = [CDT() for i in range(res.numberOfLearners)] 
     1705    if (res.number_of_iterations>1): 
     1706        CDTs = [CDT() for i in range(res.number_of_learners)] 
    16631707        iterationExperiments = split_by_iterations(res) 
    16641708        for exp in iterationExperiments: 
     
    16681712                CDTs[i].D += expCDTs[i].D 
    16691713                CDTs[i].T += expCDTs[i].T 
    1670         for i in range(res.numberOfLearners): 
     1714        for i in range(res.number_of_learners): 
    16711715            if is_CDT_empty(CDTs[0]): 
    16721716                return corn.computeCDT(res, classIndex, useweights) 
     
    17481792# in these cases the result is returned immediately 
    17491793def AUC_iterations(AUCcomputer, iterations, computerArgs): 
    1750     subsum_aucs = [0.] * iterations[0].numberOfLearners 
     1794    subsum_aucs = [0.] * iterations[0].number_of_learners 
    17511795    for ite in iterations: 
    17521796        aucs, foldsUsed = AUCcomputer(*(ite, ) + computerArgs) 
     
    17611805# AUC for binary classification problems 
    17621806def AUC_binary(res, useWeights = True): 
    1763     if res.numberOfIterations > 1: 
    1764         return AUC_iterations(AUC_i, split_by_iterations(res), (-1, useWeights, res, res.numberOfIterations)) 
     1807    if res.number_of_iterations > 1: 
     1808        return AUC_iterations(AUC_i, split_by_iterations(res), (-1, useWeights, res, res.number_of_iterations)) 
    17651809    else: 
    17661810        return AUC_i(res, -1, useWeights)[0] 
     
    17681812# AUC for multiclass problems 
    17691813def AUC_multi(res, useWeights = True, method = 0): 
    1770     numberOfClasses = len(res.classValues) 
    1771      
    1772     if res.numberOfIterations > 1: 
     1814    numberOfClasses = len(res.class_values) 
     1815     
     1816    if res.number_of_iterations > 1: 
    17731817        iterations = split_by_iterations(res) 
    17741818        all_ite = res 
     
    17781822     
    17791823    # by pairs 
    1780     sum_aucs = [0.] * res.numberOfLearners 
     1824    sum_aucs = [0.] * res.number_of_learners 
    17811825    usefulClassPairs = 0. 
    17821826 
     
    17871831        for classIndex1 in range(numberOfClasses): 
    17881832            for classIndex2 in range(classIndex1): 
    1789                 subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.numberOfIterations)) 
     1833                subsum_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations)) 
    17901834                if subsum_aucs: 
    17911835                    if method == 0: 
     
    17981842    else: 
    17991843        for classIndex in range(numberOfClasses): 
    1800             subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.numberOfIterations)) 
     1844            subsum_aucs = AUC_iterations(AUC_i, iterations, (classIndex, useWeights, all_ite, res.number_of_iterations)) 
    18011845            if subsum_aucs: 
    18021846                if method == 0: 
     
    18251869    average, as specified by the argument method. 
    18261870    """ 
    1827     if len(res.classValues) < 2: 
     1871    if len(res.class_values) < 2: 
    18281872        raise ValueError("Cannot compute AUC on a single-class problem") 
    1829     elif len(res.classValues) == 2: 
     1873    elif len(res.class_values) == 2: 
    18301874        return AUC_binary(res, useWeights) 
    18311875    else: 
     
    18551899            classIndex = 1 
    18561900 
    1857     if res.numberOfIterations > 1: 
    1858         return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.numberOfIterations)) 
     1901    if res.number_of_iterations > 1: 
     1902        return AUC_iterations(AUC_i, split_by_iterations(res), (classIndex, useWeights, res, res.number_of_iterations)) 
    18591903    else: 
    18601904        return AUC_i( res, classIndex, useWeights)[0] 
     
    18661910    other classes. 
    18671911    """ 
    1868     if res.numberOfIterations > 1: 
    1869         return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.numberOfIterations)) 
     1912    if res.number_of_iterations > 1: 
     1913        return AUC_iterations(AUC_ij, split_by_iterations(res), (classIndex1, classIndex2, useWeights, res, res.number_of_iterations)) 
    18701914    else: 
    18711915        return AUC_ij(res, classIndex1, classIndex2, useWeights) 
     
    18851929            print ("%s" + ("\t%5.3f" * len(AUCrow))) % ((className, ) + tuple(AUCrow)) 
    18861930    """ 
    1887     numberOfClasses = len(res.classValues) 
    1888     numberOfLearners = res.numberOfLearners 
    1889      
    1890     if res.numberOfIterations > 1: 
     1931    numberOfClasses = len(res.class_values) 
     1932    number_of_learners = res.number_of_learners 
     1933     
     1934    if res.number_of_iterations > 1: 
    18911935        iterations, all_ite = split_by_iterations(res), res 
    18921936    else: 
    18931937        iterations, all_ite = [res], None 
    18941938     
    1895     aucs = [[[] for i in range(numberOfClasses)] for i in range(numberOfLearners)] 
     1939    aucs = [[[] for i in range(numberOfClasses)] for i in range(number_of_learners)] 
    18961940    prob = class_probabilities_from_res(res) 
    18971941         
    18981942    for classIndex1 in range(numberOfClasses): 
    18991943        for classIndex2 in range(classIndex1): 
    1900             pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.numberOfIterations)) 
     1944            pair_aucs = AUC_iterations(AUC_ij, iterations, (classIndex1, classIndex2, useWeights, all_ite, res.number_of_iterations)) 
    19011945            if pair_aucs: 
    1902                 for lrn in range(numberOfLearners): 
     1946                for lrn in range(number_of_learners): 
    19031947                    aucs[lrn][classIndex1].append(pair_aucs[lrn]) 
    19041948            else: 
    1905                 for lrn in range(numberOfLearners): 
     1949                for lrn in range(number_of_learners): 
    19061950                    aucs[lrn][classIndex1].append(-1) 
    19071951    return aucs 
     
    19131957    one degree of freedom; critical value for 5% significance is around 3.84. 
    19141958    """ 
    1915     nLearners = res.numberOfLearners 
     1959    nLearners = res.number_of_learners 
    19161960    mcm = [] 
    19171961    for i in range(nLearners): 
    1918        mcm.append([0.0]*res.numberOfLearners) 
     1962       mcm.append([0.0]*res.number_of_learners) 
    19191963 
    19201964    if not res.weights or argkw.get("unweighted"): 
    19211965        for i in res.results: 
    1922             actual = i.actualClass 
     1966            actual = i.actual_class 
    19231967            classes = i.classes 
    19241968            for l1 in range(nLearners): 
     
    19311975    else: 
    19321976        for i in res.results: 
    1933             actual = i.actualClass 
     1977            actual = i.actual_class 
    19341978            classes = i.classes 
    19351979            for l1 in range(nLearners): 
     
    19622006    if not res.weights or argkw.get("unweighted"): 
    19632007        for i in res.results: 
    1964             actual=i.actualClass 
     2008            actual=i.actual_class 
    19652009            if i.classes[lrn1]==actual: 
    19662010                if i.classes[lrn2]!=actual: 
     
    19702014    else: 
    19712015        for i in res.results: 
    1972             actual=i.actualClass 
     2016            actual=i.actual_class 
    19732017            if i.classes[lrn1]==actual: 
    19742018                if i.classes[lrn2]!=actual: 
     
    25502594    print_figure(fig, filename, **kwargs) 
    25512595 
     2596def hamming_loss(res): 
     2597    """ 
      2598    Schapire and Singer (2000) presented the Hamming loss, which is defined as: 
      2599     
      2600    :math:`HammingLoss(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\vartriangle Z_i|}{|L|}` 
     2601    """ 
     2602    losses = [0.0]*res.number_of_learners 
     2603    label_num = len(res.class_values) 
     2604    example_num = gettotsize(res) 
     2605     
     2606    for e in res.results: 
     2607        aclass = e.actual_class 
     2608        for i in range(len(e.classes)): 
     2609            labels = e.classes[i]  
     2610            if len(labels) <> len(aclass): 
     2611                raise ValueError, "The dimensions of the classified output and the actual class array do not match." 
     2612            for j in range(label_num): 
     2613                if labels[j] == aclass[j]: 
     2614                    losses[i] = losses[i]+1 
     2615             
     2616    return [x/label_num/example_num for x in losses] 
     2617 
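A toy check of the per-label comparison performed above (hypothetical 0/1 label strings; note that the loop accumulates per-label agreements, so the returned value is the average fraction of matching labels)::

    predicted = '1101'  # Z_i encoded as one character per label
    actual    = '1001'  # Y_i in the same encoding
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    print matches / float(len(actual))  # 0.75 for this example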
     2618def accuracy(res, forgiveness_rate = 1.0): 
     2619    """ 
      2620    Godbole & Sarawagi (2004) use the metrics accuracy, precision and recall as follows: 
     2621      
     2622    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|}` 
     2623     
     2624    Boutell et al. (2004) give a more generalized version using a parameter :math:`\\alpha \\ge 0`,  
     2625    called forgiveness rate: 
     2626     
     2627    :math:`Accuracy(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} (\\frac{|Y_i \\cap Z_i|}{|Y_i \\cup Z_i|})^{\\alpha}` 
     2628    """ 
     2629    accuracies = [0.0]*res.number_of_learners 
     2630    label_num = len(res.class_values) 
     2631    example_num = gettotsize(res) 
     2632     
     2633    for e in res.results: 
     2634        aclass = e.actual_class 
     2635        for i in range(len(e.classes)): 
     2636            labels = e.classes[i]  
     2637            if len(labels) <> len(aclass): 
     2638                raise ValueError, "The dimensions of the classified output and the actual class array do not match." 
     2639             
     2640            intersection = 0.0 
     2641            union = 0.0 
     2642            for j in range(label_num): 
     2643                if labels[j]=='1' and aclass[j]=='1': 
     2644                    intersection = intersection+1 
     2645                if labels[j]=='1' or aclass[j]=='1': 
     2646                    union = union+1 
     2647            #print intersection, union 
     2648            if union <> 0: 
     2649                accuracies[i] = accuracies[i] + intersection/union 
     2650             
     2651    return [math.pow(x/example_num,forgiveness_rate) for x in accuracies] 
     2652 
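A usage sketch (assuming ``res`` holds multi-label results from Orange.evaluation.testing)::

    print accuracy(res)                        # forgiveness rate 1.0 (default)
    print accuracy(res, forgiveness_rate=0.5)  # more forgiving of partial matches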
     2653def precision(res): 
     2654    """ 
     2655    :math:`Precision(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Z_i|}` 
     2656    """ 
     2657    precisions = [0.0]*res.number_of_learners 
     2658    label_num = len(res.class_values) 
     2659    example_num = gettotsize(res) 
     2660     
     2661    for e in res.results: 
     2662        aclass = e.actual_class 
     2663        for i in range(len(e.classes)): 
     2664            labels = e.classes[i]  
     2665            if len(labels) <> len(aclass): 
     2666                raise ValueError, "The dimensions of the classified output and the actual class array do not match." 
     2667             
     2668            intersection = 0.0 
     2669            predicted = 0.0 
     2670            for j in range(label_num): 
     2671                if labels[j]=='1' and aclass[j]=='1': 
     2672                    intersection = intersection+1 
     2673                if labels[j] == '1': 
     2674                    predicted = predicted + 1 
     2675            if predicted <> 0: 
     2676                precisions[i] = precisions[i] + intersection/predicted 
     2677             
     2678    return [x/example_num for x in precisions] 
     2679 
     2680def recall(res): 
     2681    """ 
     2682    :math:`Recall(H,D)=\\frac{1}{|D|} \\sum_{i=1}^{|D|} \\frac{|Y_i \\cap Z_i|}{|Y_i|}` 
     2683    """ 
     2684    recalls = [0.0]*res.number_of_learners 
     2685    label_num = len(res.class_values) 
     2686    example_num = gettotsize(res) 
     2687     
     2688    for e in res.results: 
     2689        aclass = e.actual_class 
     2690        for i in range(len(e.classes)): 
     2691            labels = e.classes[i]  
     2692            if len(labels) <> len(aclass): 
     2693                raise ValueError, "The dimensions of the classified output and the actual class array do not match." 
     2694             
     2695            intersection = 0.0 
     2696            actual = 0.0 
     2697            for j in range(label_num): 
     2698                if labels[j]=='1' and aclass[j]=='1': 
     2699                    intersection = intersection+1 
     2700                if aclass[j] == '1': 
     2701                    actual = actual + 1 
     2702            if actual <> 0: 
     2703                recalls[i] = recalls[i] + intersection/actual 
     2704             
     2705    return [x/example_num for x in recalls] 
     2706 
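A toy check of the per-example quantities the two loops above accumulate (hypothetical 0/1 label strings)::

    predicted, actual = '1100', '1010'
    inter = sum(1 for p, a in zip(predicted, actual) if p == a == '1')
    print inter / float(predicted.count('1'))  # example precision: 0.5
    print inter / float(actual.count('1'))     # example recall:    0.5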
     2707def ranking_loss(res): 
     2708    pass 
     2709 
     2710def average_precision(res): 
     2711    pass 
     2712 
     2713def hierarchical_loss(res): 
     2714    pass 
     2715 
     2716######################################################################################### 
    25522717if __name__ == "__main__": 
    25532718    avranks =  [3.143, 2.000, 2.893, 1.964] 
  • orange/Orange/evaluation/testing.py

    r8264 r9447  
    3636    :start-after: import random 
    3737    :end-before: def printResults(res) 
     38 
      39An example script for multi-label data, 
      40part of `mlc-evaluator.py`_ (uses `multidata.tab`_): 
     41 
     42.. literalinclude:: code/mlc-evaluator.py 
     43    :lines: 1-7 
    3844 
    3945After testing is done, classification accuracies can be computed and 
     
    7379    if the class variable is discrete and has no unknown values. 
    7480 
    75 *randseed (obsolete: indicesrandseed), randomGenerator* 
     81*randseed (obsolete: indices_randseed), randomGenerator* 
    7682    Random seed (``randseed``) or random generator (``randomGenerator``) for 
    7783    random selection of examples. If omitted, random seed of 0 is used and 
     
    138144    to be made, where applicable. The default is ``[0.1, 0.2, ..., 1.0]``. 
    139145 
    140 *storeClassifiers (keyword argument)* 
     146*store_classifiers (keyword argument)* 
    141147    If this flag is set, the testing procedure will store the constructed 
    142148    classifiers. For each iteration of the test (eg for each fold in 
     
    196202from Orange.misc import demangleExamples, getobjectname, printVerbose 
    197203import exceptions, cPickle, os, os.path 
     204import Orange.multilabel.label as label  
    198205 
    199206#### Some private stuff 
     
    223230        A list of probabilities of classes, one for each classifier. 
    224231 
    225     .. attribute:: iterationNumber 
     232    .. attribute:: iteration_number 
    226233 
    227234        Iteration number (e.g. fold) in which the TestedExample was created/tested. 
    228235 
    229     .. attribute:: actualClass 
     236    .. attribute:: actual_class 
    230237 
    231238        The correct class of the example 
     
    235242        Example's weight. Even if the example set was not weighted, 
    236243        this attribute is present and equals 1.0. 
    237  
    238     :param iterationNumber: 
    239     :paramtype iterationNumber: type??? 
    240     :param actualClass: 
    241     :paramtype actualClass: type??? 
     244         
     245    .. attribute:: multilabel_flag 
     246         
      247       Flag indicating whether the example is a multi-label instance: 0 means single-label, anything else multi-label. 
     248     
     249    :param iteration_number: 
     250    :paramtype iteration_number: int 
     251    :param actual_class: 
     252    :paramtype actual_class: :class:`Orange.data.Value` for single-label classification, and list of :class:`Orange.data.Value` for multi-label classification 
    242253    :param n: 
    243254    :paramtype n: int 
    244255    :param weight: 
    245256    :paramtype weight: float 
    246  
    247     """ 
    248  
    249     def __init__(self, iterationNumber=None, actualClass=None, n=0, weight=1.0): 
     257    :param multilabel_flag: 
     258    :paramtype multilabel_flag: int 
     259    """ 
     260 
     261    def __init__(self, iteration_number=None, actual_class=None, n=0, weight=1.0, multilabel_flag = 0): 
    250262        self.classes = [None]*n 
    251263        self.probabilities = [None]*n 
    252         self.iterationNumber = iterationNumber 
    253         self.actualClass= actualClass 
     264        self.iteration_number = iteration_number 
     265        self.actual_class= actual_class 
    254266        self.weight = weight 
     267        self.multilabel_flag = multilabel_flag 
    255268     
    256269    def add_result(self, aclass, aprob): 
    257270        """Appends a new result (class and probability prediction by a single classifier) to the classes and probabilities field.""" 
    258      
    259         if type(aclass.value)==float: 
     271        if self.multilabel_flag and type(aclass.value)==float: 
    260272            self.classes.append(float(aclass)) 
    261273            self.probabilities.append(aprob) 
     
    263275            self.classes.append(int(aclass)) 
    264276            self.probabilities.append(list(aprob)) 
    265  
     277        
    266278    def set_result(self, i, aclass, aprob): 
    267279        """Sets the result of the i-th classifier to the given values.""" 
    268         if type(aclass.value)==float: 
    269             self.classes[i] = float(aclass) 
    270             self.probabilities[i] = aprob 
     280        if self.multilabel_flag == 0: 
     281            if  type(aclass.value)==float: 
     282                self.classes[i] = float(aclass) 
     283                self.probabilities[i] = aprob 
     284            else: 
     285                self.classes[i] = int(aclass) 
     286                self.probabilities[i] = list(aprob) 
    271287        else: 
    272             self.classes[i] = int(aclass) 
     288            self.classes[i] = aclass 
    273289            self.probabilities[i] = list(aprob) 
    274  
     290             
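A construction sketch for the extended class (the values are assumptions)::

    # single-label tested example, as before
    te = TestedExample(iteration_number=0, actual_class=1, n=2)
    # multi-label tested example: actual_class holds one value per label
    mte = TestedExample(iteration_number=0, actual_class=['1', '0', '1'],
                        n=2, multilabel_flag=1)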
    275291class ExperimentResults(object): 
    276292    """ 
     
    289305        fold). Each element is a list of classifiers, one for each 
    290306        learner. This field is used only if storing is enabled by 
    291         ``storeClassifiers=1``. 
    292  
    293     .. attribute:: numberOfIterations 
     307        ``store_classifiers=1``. 
     308 
     309    .. attribute:: number_of_iterations 
    294310 
    295311        Number of iterations. This can be the number of folds 
    296312        (in cross validation) or the number of repetitions of some 
    297         test. ``TestedExample``'s attribute ``iterationNumber`` should 
    298         be in range ``[0, numberOfIterations-1]``. 
    299  
    300     .. attribute:: numberOfLearners 
     313        test. ``TestedExample``'s attribute ``iteration_number`` should 
     314        be in range ``[0, number_of_iterations-1]``. 
     315 
     316    .. attribute:: number_of_learners 
    301317 
    302318        Number of learners. Lengths of lists classes and probabilities 
    303         in each :obj:`TestedExample` should equal ``numberOfLearners``. 
     319        in each :obj:`TestedExample` should equal ``number_of_learners``. 
    304320 
    305321    .. attribute:: loaded 
     
    320336 
    321337    """ 
    322     def __init__(self, iterations, classifierNames, classValues, weights, baseClass=-1, **argkw): 
    323         self.classValues = classValues 
    324         self.classifierNames = classifierNames 
    325         self.numberOfIterations = iterations 
    326         self.numberOfLearners = len(classifierNames) 
     338    def __init__(self, iterations, classifier_names, class_values, weights, base_class=-1, **argkw): 
     339        self.class_values = class_values 
     340        self.classifier_names = classifier_names 
     341        self.number_of_iterations = iterations 
     342        self.number_of_learners = len(classifier_names) 
    327343        self.results = [] 
    328344        self.classifiers = [] 
    329345        self.loaded = None 
    330         self.baseClass = baseClass 
     346        self.base_class = base_class 
    331347        self.weights = weights 
    332348        self.__dict__.update(argkw) 
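A minimal construction sketch using the renamed parameters (values are assumptions)::

    res = ExperimentResults(iterations=10,
                            classifier_names=['bayes', 'tree'],
                            class_values=['0', '1'],
                            weights=0)
    print res.number_of_learners    # 2
    print res.number_of_iterations  # 10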
     
    342358                for ex in range(len(self.results)): 
    343359                    tre = self.results[ex] 
    344                     if (tre.actualClass, tre.iterationNumber) != d[ex][0]: 
     360                    if (tre.actual_class, tre.iteration_number) != d[ex][0]: 
    345361                        raise SystemError, "mismatching example tables or sampling" 
    346362                    self.results[ex].set_result(i, d[ex][1][0], d[ex][1][1]) 
     
    364380        The data is saved in a separate file for each classifier. The 
    365381        file is a binary pickle file containing a list of tuples 
    366         ``((x.actualClass, x.iterationNumber), (x.classes[i], 
     382        ``((x.actual_class, x.iteration_number), (x.classes[i], 
    367383        x.probabilities[i]))`` where ``x`` is a :obj:`TestedExample` 
    368384        and ``i`` is the index of a learner. 
     
    396412                f=open(fname, "wb") 
    397413                pickler=cPickle.Pickler(f, 1) 
    398                 pickler.dump([(  (x.actualClass, x.iterationNumber), (x.classes[i], x.probabilities[i])  ) for x in self.results]) 
     414                pickler.dump([(  (x.actual_class, x.iteration_number), (x.classes[i], x.probabilities[i])  ) for x in self.results]) 
    399415                f.close() 
    400416 
     
    404420            del r.classes[index] 
    405421            del r.probabilities[index] 
    406         del self.classifierNames[index] 
    407         self.numberOfLearners -= 1 
     422        del self.classifier_names[index] 
     423        self.number_of_learners -= 1 
    408424 
    409425    def add(self, results, index, replace=-1): 
     
    411427        if len(self.results)<>len(results.results): 
    412428            raise SystemError, "mismatch in number of test cases" 
    413         if self.numberOfIterations<>results.numberOfIterations: 
     429        if self.number_of_iterations<>results.number_of_iterations: 
    414430            raise SystemError, "mismatch in number of iterations (%d<>%d)" % \ 
    415                   (self.numberOfIterations, results.numberOfIterations) 
     431                  (self.number_of_iterations, results.number_of_iterations) 
    416432        if len(self.classifiers) and len(results.classifiers)==0: 
    417433            raise SystemError, "no classifiers in results" 
    418434 
    419         if replace < 0 or replace >= self.numberOfLearners: # results for new learner 
    420             self.classifierNames.append(results.classifierNames[index]) 
    421             self.numberOfLearners += 1 
     435        if replace < 0 or replace >= self.number_of_learners: # results for new learner 
     436            self.classifier_names.append(results.classifier_names[index]) 
     437            self.number_of_learners += 1 
    422438            for i,r in enumerate(self.results): 
    423439                r.classes.append(results.results[i].classes[index]) 
    424440                r.probabilities.append(results.results[i].probabilities[index]) 
    425441            if len(self.classifiers): 
    426                 for i in range(self.numberOfIterations): 
     442                for i in range(self.number_of_iterations): 
    427443                    self.classifiers[i].append(results.classifiers[i][index]) 
    428444        else: # replace results of existing learner 
    429             self.classifierNames[replace] = results.classifierNames[index] 
     445            self.classifier_names[replace] = results.classifier_names[index] 
    430446            for i,r in enumerate(self.results): 
    431447                r.classes[replace] = results.results[i].classes[index] 
    432448                r.probabilities[replace] = results.results[i].probabilities[index] 
    433449            if len(self.classifiers): 
    434                 for i in range(self.numberOfIterations): 
     450                for i in range(self.number_of_iterations): 
    435451                    self.classifiers[replace] = results.classifiers[i][index] 
    436452 
    437453#### Experimental procedures 
    438454 
    439 def leave_one_out(learners, examples, pps=[], indicesrandseed="*", **argkw): 
     455def leave_one_out(learners, examples, pps=[], indices_randseed="*", **argkw): 
    440456 
    441457    """leave-one-out evaluation of learners on a data set 
     
    449465 
    450466    (examples, weight) = demangleExamples(examples) 
    451     return test_with_indices(learners, examples, range(len(examples)), indicesrandseed, pps, **argkw) 
     467    return test_with_indices(learners, examples, range(len(examples)), indices_randseed, pps, **argkw) 
    452468    # return test_with_indices(learners, examples, range(len(examples)), pps=pps, argkw) 
    453469 
    454 # apply(test_with_indices, (learners, (examples, weight), indices, indicesrandseed, pps), argkw) 
     470# apply(test_with_indices, (learners, (examples, weight), indices, indices_randseed, pps), argkw) 
    455471 
    456472 
     
    473489    Note that Python allows naming the arguments; instead of "100" you 
    474490    can use "times=100" to increase the clarity (not so with keyword 
    475     arguments, such as ``storeClassifiers``, ``randseed`` or ``verbose`` 
     491    arguments, such as ``store_classifiers``, ``randseed`` or ``verbose`` 
    476492    that must always be given with a name). 
    477493 
     
    480496    # randomGenerator is set either to what users provided or to orange.RandomGenerator(0) 
    481497    # If we left it None or if we set MakeRandomIndices2.randseed, it would give same indices each time it's called 
    482     randomGenerator = argkw.get("indicesrandseed", 0) or argkw.get("randseed", 0) or argkw.get("randomGenerator", 0) 
     498    randomGenerator = argkw.get("indices_randseed", 0) or argkw.get("randseed", 0) or argkw.get("randomGenerator", 0) 
    483499    pick = Orange.core.MakeRandomIndices2(stratified = strat, p0 = learnProp, randomGenerator = randomGenerator) 
    484500     
     
    490506    else: 
    491507        baseValue = values = None 
    492     testResults = ExperimentResults(times, [l.name for l in learners], values, weight!=0, baseValue) 
     508    test_results = ExperimentResults(times, [l.name for l in learners], values, weight!=0, baseValue) 
    493509 
    494510    for time in range(times): 
     
    496512        learnset = examples.selectref(indices, 0) 
    497513        testset = examples.selectref(indices, 1) 
    498         learn_and_test_on_test_data(learners, (learnset, weight), (testset, weight), testResults, time, pps, **argkw) 
     514        learn_and_test_on_test_data(learners, (learnset, weight), (testset, weight), test_results, time, pps, **argkw) 
    499515        if callback: callback() 
    500     return testResults 
     516    return test_results 
    501517 
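(To make the calling convention from the docstring concrete; the function name
`proportion_test` and the 0.7 learning proportion below are assumed from the
surrounding code, not confirmed)::

    # "times" may be given positionally or by name ...
    res = proportion_test(learners, data, 0.7, times=100)
    # ... but store_classifiers, randseed and verbose must be keywords
    res = proportion_test(learners, data, 0.7, times=100,
                          store_classifiers=1, randseed=42)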
    502518def cross_validation(learners, examples, folds=10, 
    503519                    strat=Orange.core.MakeRandomIndices.StratifiedIfPossible, 
    504                     pps=[], indicesrandseed="*", **argkw): 
     520                    pps=[], indices_randseed="*", **argkw): 
    505521    """cross-validation evaluation of learners 
    506522 
     
    509525    """ 
    510526    (examples, weight) = demangleExamples(examples) 
    511     if indicesrandseed!="*": 
    512         indices = Orange.core.MakeRandomIndicesCV(examples, folds, randseed=indicesrandseed, stratified = strat) 
     527    if indices_randseed!="*": 
     528        indices = Orange.core.MakeRandomIndicesCV(examples, folds, randseed=indices_randseed, stratified = strat) 
    513529    else: 
    514530        randomGenerator = argkw.get("randseed", 0) or argkw.get("randomGenerator", 0) 
    515531        indices = Orange.core.MakeRandomIndicesCV(examples, folds, stratified = strat, randomGenerator = randomGenerator) 
    516     return test_with_indices(learners, (examples, weight), indices, indicesrandseed, pps, **argkw) 
     532    return test_with_indices(learners, (examples, weight), indices, indices_randseed, pps, **argkw) 
    517533 
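(A usage sketch; a fixed `indices_randseed` makes the folds reproducible and,
together with ``cache=1``, enables the result caching done in
`test_with_indices`; `bayes` and `data` are placeholders)::

    res = Orange.evaluation.testing.cross_validation([bayes], data, folds=10,
                                                     indices_randseed=42)
    print Orange.evaluation.scoring.CA(res)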
    518534 
     
    539555    """ 
    540556 
    541     seed = argkw.get("indicesrandseed", -1) or argkw.get("randseed", -1) 
     557    seed = argkw.get("indices_randseed", -1) or argkw.get("randseed", -1) 
    542558    if seed: 
    543559        randomGenerator = Orange.core.RandomGenerator(seed) 
     
    585601 
    586602    if not cv or not pick:     
    587         seed = argkw.get("indicesrandseed", -1) or argkw.get("randseed", -1) 
     603        seed = argkw.get("indices_randseed", -1) or argkw.get("randseed", -1) 
    588604        if seed: 
    589605            randomGenerator = Orange.core.RandomGenerator(seed) 
     
    613629 
    614630        conv = examples.domain.classVar.varType == Orange.data.Type.Discrete and int or float 
    615         testResults = ExperimentResults(cv.folds, [l.name for l in learners], examples.domain.classVar.values.native(), weight!=0, examples.domain.classVar.baseValue) 
    616         testResults.results = [TestedExample(folds[i], conv(examples[i].getclass()), nLrn, examples[i].getweight(weight)) 
     631        test_results = ExperimentResults(cv.folds, [l.name for l in learners], examples.domain.classVar.values.native(), weight!=0, examples.domain.classVar.baseValue) 
     632        test_results.results = [TestedExample(folds[i], conv(examples[i].getclass()), nLrn, examples[i].getweight(weight)) 
    617633                               for i in range(len(examples))] 
    618634 
    619         if cache and testResults.load_from_files(learners, fnstr): 
     635        if cache and test_results.load_from_files(learners, fnstr): 
    620636            printVerbose("  loaded from cache", verb) 
    621637        else: 
     
    634650                classifiers = [None]*nLrn 
    635651                for i in range(nLrn): 
    636                     if not cache or not testResults.loaded[i]: 
     652                    if not cache or not test_results.loaded[i]: 
    637653                        classifiers[i] = learners[i](learnset, weight) 
    638654 
     
    644660                        ex.setclass("?") 
    645661                        for cl in range(nLrn): 
    646                             if not cache or not testResults.loaded[cl]: 
     662                            if not cache or not test_results.loaded[cl]: 
    647663                                cls, pro = classifiers[cl](ex, Orange.core.GetBoth) 
    648                                 testResults.results[i].set_result(cl, cls, pro) 
     664                                test_results.results[i].set_result(cl, cls, pro) 
    649665                if callback: callback() 
    650666            if cache: 
    651                 testResults.save_to_files(learners, fnstr) 
    652  
    653         allResults.append(testResults) 
     667                test_results.save_to_files(learners, fnstr) 
     668 
     669        allResults.append(test_results) 
    654670         
    655671    return allResults 
     
    685701    testweight = demangleExamples(testset)[1] 
    686702     
    687     randomGenerator = argkw.get("indicesrandseed", 0) or argkw.get("randseed", 0) or argkw.get("randomGenerator", 0) 
     703    randomGenerator = argkw.get("indices_randseed", 0) or argkw.get("randseed", 0) or argkw.get("randomGenerator", 0) 
    688704    pick = Orange.core.MakeRandomIndices2(stratified = strat, randomGenerator = randomGenerator) 
    689705    allResults=[] 
    690706    for p in proportions: 
    691707        printVerbose("Proportion: %5.3f" % p, verb) 
    692         testResults = ExperimentResults(times, [l.name for l in learners], 
     708        test_results = ExperimentResults(times, [l.name for l in learners], 
    693709                                        testset.domain.classVar.values.native(), 
    694710                                        testweight!=0, testset.domain.classVar.baseValue) 
    695         testResults.results = [] 
     711        test_results.results = [] 
    696712         
    697713        for t in range(times): 
    698714            printVerbose("  repetition %d" % t, verb) 
    699715            learn_and_test_on_test_data(learners, (learnset.selectref(pick(learnset, p), 0), learnweight), 
    700                                    testset, testResults, t) 
    701  
    702         allResults.append(testResults) 
     716                                   testset, test_results, t) 
     717 
     718        allResults.append(test_results) 
    703719         
    704720    return allResults 
    705721 
    706722    
    707 def test_with_indices(learners, examples, indices, indicesrandseed="*", pps=[], callback=None, **argkw): 
     723def test_with_indices(learners, examples, indices, indices_randseed="*", pps=[], callback=None, **argkw): 
    708724    """ 
    709725    Performs a cross-validation-like test. The difference is that the 
     
    718734    saved in files or loaded therefrom if you add a keyword argument 
    719735    ``cache=1``. In this case, you also have to specify the random seed 
    720     which was used to compute the indices (argument ``indicesrandseed``; 
     736    which was used to compute the indices (argument ``indices_randseed``; 
    721737    if you don't, there will be no caching). 
    722738 
     
    725741    verb = argkw.get("verbose", 0) 
    726742    cache = argkw.get("cache", 0) 
    727     storeclassifiers = argkw.get("storeclassifiers", 0) or argkw.get("storeClassifiers", 0) 
    728     cache = cache and not storeclassifiers 
     743    store_classifiers = argkw.get("store_classifiers", 0) or argkw.get("storeClassifiers", 0) 
     744    cache = cache and not store_classifiers 
    729745 
    730746    examples, weight = demangleExamples(examples) 
     
    733749    if not examples: 
    734750        raise ValueError("Test data set with no examples") 
    735     if not examples.domain.classVar: 
     751     
     752    # check whether the data set is multi-label 
     753    multilabel_flag = label.is_multilabel(examples) 
     754     
     755    if multilabel_flag == 0 and not examples.domain.classVar: #single-label 
    736756        raise ValueError("Test data set without class attribute") 
    737757     
     
    741761 
    742762    nIterations = max(indices)+1 
    743     if examples.domain.classVar.varType == Orange.data.Type.Discrete: 
    744         values = list(examples.domain.classVar.values) 
    745         basevalue = examples.domain.classVar.baseValue 
     763    if multilabel_flag == 0: #single-label 
     764        if examples.domain.classVar.varType == Orange.data.Type.Discrete: 
     765            values = list(examples.domain.classVar.values) 
     766            basevalue = examples.domain.classVar.baseValue 
     767        else: 
     768            basevalue = values = None 
     769    else: #multi-label 
     770        values = label.get_label_names(examples) 
     771        basevalue = None 
     772     
     773    test_results = ExperimentResults(nIterations, [getobjectname(l) for l in learners], values, weight!=0, basevalue) 
     774    if multilabel_flag == 0: 
     775        conv = examples.domain.classVar.varType == Orange.data.Type.Discrete and int or float 
     776        test_results.results = [TestedExample(indices[i], conv(examples[i].getclass()), nLrn, examples[i].getweight(weight)) 
     777                               for i in range(len(examples))] 
    746778    else: 
    747         basevalue = values = None 
    748  
    749     conv = examples.domain.classVar.varType == Orange.data.Type.Discrete and int or float         
    750     testResults = ExperimentResults(nIterations, [getobjectname(l) for l in learners], values, weight!=0, basevalue) 
    751     testResults.results = [TestedExample(indices[i], conv(examples[i].getclass()), nLrn, examples[i].getweight(weight)) 
     779        test_results.results = [TestedExample(indices[i], label.get_labels(examples, examples[i]), nLrn, examples[i].getweight(weight), multilabel_flag) 
    752780                           for i in range(len(examples))] 
    753  
     781     
    754782    if argkw.get("storeExamples", 0): 
    755         testResults.examples = examples 
     783        test_results.examples = examples 
    756784         
    757785    ccsum = hex(examples.checksum())[2:] 
    758786    ppsp = encode_PP(pps) 
    759     fnstr = "{TestWithIndices}_%s_%s%s-%s" % ("%s", indicesrandseed, ppsp, ccsum) 
     787    fnstr = "{TestWithIndices}_%s_%s%s-%s" % ("%s", indices_randseed, ppsp, ccsum) 
    760788    if "*" in fnstr: 
    761789        cache = 0 
    762790 
    763     if cache and testResults.load_from_files(learners, fnstr): 
     791    if cache and test_results.load_from_files(learners, fnstr): 
    764792        printVerbose("  loaded from cache", verb) 
    765793    else: 
     
    794822            classifiers = [None]*nLrn 
    795823            for i in range(nLrn): 
    796                 if not cache or not testResults.loaded[i]: 
     824                if not cache or not test_results.loaded[i]: 
    797825                    classifiers[i] = learners[i](learnset, weight) 
    798             if storeclassifiers:     
    799                 testResults.classifiers.append(classifiers) 
     826            if store_classifiers:     
     827                test_results.classifiers.append(classifiers) 
    800828 
    801829            # testing 
     
    805833                    # This is to prevent cheating: 
    806834                    ex = Orange.data.Instance(testset[tcn]) 
    807                     ex.setclass("?") 
     835                    if multilabel_flag == 0: 
     836                        ex.setclass("?") 
    808837                    tcn += 1 
    809838                    for cl in range(nLrn): 
    810                         if not cache or not testResults.loaded[cl]: 
     839                        if not cache or not test_results.loaded[cl]: 
    811840                            cr = classifiers[cl](ex, Orange.core.GetBoth)                                       
    812                             if cr[0].isSpecial(): 
     841                            if multilabel_flag == 0 and cr[0].isSpecial(): 
    813842                                raise SystemError, "Classifier %s returned unknown value" % (classifiers[cl].name or ("#%i" % cl)) 
    814                             testResults.results[i].set_result(cl, cr[0], cr[1]) 
     843                            test_results.results[i].set_result(cl, cr[0], cr[1]) 
    815844            if callback: 
    816845                callback() 
    817846        if cache: 
    818             testResults.save_to_files(learners, fnstr) 
     847            test_results.save_to_files(learners, fnstr) 
    819848         
    820     return testResults 
    821  
    822  
    823 def learn_and_test_on_test_data(learners, learnset, testset, testResults=None, iterationNumber=0, pps=[], callback=None, **argkw): 
     849    return test_results 
     850 
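(A sketch of the caching behaviour described in the docstring; `learners` and
`data` are placeholders, and the indices are built the same way
`cross_validation` builds them above)::

    indices = Orange.core.MakeRandomIndicesCV(data, 10, randseed=42)
    # cache=1 saves predictions to files keyed by indices_randseed; with the
    # default indices_randseed="*" the cache is silently disabled
    res = test_with_indices(learners, data, indices,
                            indices_randseed=42, cache=1)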
     851 
     852def learn_and_test_on_test_data(learners, learnset, testset, test_results=None, iteration_number=0, pps=[], callback=None, **argkw): 
    824853    """ 
    825854    This function performs no sampling on its own: two separate datasets 
     
    832861 
    833862    You can pass an already initialized :obj:`ExperimentResults` (argument 
    834     ``results``) and an iteration number (``iterationNumber``). Results 
     863    ``test_results``) and an iteration number (``iteration_number``). Results 
    835864    of the test will be appended with the given iteration 
    836865    number. This is because :obj:`learn_and_test_on_test_data` 
     
    840869 
    841870    """ 
    842     storeclassifiers = argkw.get("storeclassifiers", 0) or argkw.get("storeClassifiers", 0) 
     871    store_classifiers = argkw.get("store_classifiers", 0) or argkw.get("storeClassifiers", 0) 
    843872    storeExamples = argkw.get("storeExamples", 0) 
    844873 
    845874    learnset, learnweight = demangleExamples(learnset) 
    846875    testset, testweight = demangleExamples(testset) 
    847     storeclassifiers = argkw.get("storeclassifiers", 0) or argkw.get("storeClassifiers", 0) 
     876    store_classifiers = argkw.get("store_classifiers", 0) or argkw.get("storeClassifiers", 0) 
    848877     
    849878    for pp in pps: 
     
    867896    classifiers = [learner(learnset, learnweight) for learner in learners] 
    868897    for i in range(len(learners)): classifiers[i].name = getattr(learners[i], 'name', 'noname') 
    869     testResults = test_on_data(classifiers, (testset, testweight), testResults, iterationNumber, storeExamples) 
    870     if storeclassifiers: 
    871         testResults.classifiers.append(classifiers) 
    872     return testResults 
    873  
    874  
    875 def learn_and_test_on_learn_data(learners, learnset, testResults=None, iterationNumber=0, pps=[], callback=None, **argkw): 
     898    test_results = test_on_data(classifiers, (testset, testweight), test_results, iteration_number, storeExamples) 
     899    if store_classifiers: 
     900        test_results.classifiers.append(classifiers) 
     901    return test_results 
     902 
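(A sketch with an explicit train/test split, built with the same
`MakeRandomIndices2` helper this module already uses; `data` and `bayes` are
placeholders)::

    sel = Orange.core.MakeRandomIndices2(data, 0.7)
    train, test = data.selectref(sel, 0), data.selectref(sel, 1)
    res = learn_and_test_on_test_data([bayes], train, test)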
     903 
     904def learn_and_test_on_learn_data(learners, learnset, test_results=None, iteration_number=0, pps=[], callback=None, **argkw): 
    876905    """ 
    877906    This function is similar to the above, except that it learns and 
     
    888917    """ 
    889918 
    890     storeclassifiers = argkw.get("storeclassifiers", 0) or argkw.get("storeClassifiers", 0) 
     919    store_classifiers = argkw.get("store_classifiers", 0) or argkw.get("storeClassifiers", 0) 
    891920    storeExamples = argkw.get("storeExamples", 0) 
    892921 
     
    918947            callback() 
    919948    for i in range(len(learners)): classifiers[i].name = getattr(learners[i], "name", "noname") 
    920     testResults = test_on_data(classifiers, (testset, learnweight), testResults, iterationNumber, storeExamples) 
    921     if storeclassifiers: 
    922         testResults.classifiers.append(classifiers) 
    923     return testResults 
    924  
    925  
    926 def test_on_data(classifiers, testset, testResults=None, iterationNumber=0, storeExamples = False, **argkw): 
     949    test_results = test_on_data(classifiers, (testset, learnweight), test_results, iteration_number, storeExamples) 
     950    if store_classifiers: 
     951        test_results.classifiers.append(classifiers) 
     952    return test_results 
     953 
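(By contrast, accuracy on the training data itself comes from one call; the
resulting estimate is of course optimistic)::

    res_train = learn_and_test_on_learn_data([bayes], train)
    print Orange.evaluation.scoring.CA(res_train)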
     954 
     955def test_on_data(classifiers, testset, test_results=None, iteration_number=0, storeExamples = False, **argkw): 
    927956    """ 
    928957    This function gets a list of classifiers, not learners like the other 
     
    937966    testset, testweight = demangleExamples(testset) 
    938967 
    939     if not testResults: 
     968    if not test_results: 
    940969        classVar = testset.domain.classVar 
    941970        if testset.domain.classVar.varType == Orange.data.Type.Discrete: 
     
    945974            values = None 
    946975            baseValue = -1 
    947         testResults=ExperimentResults(1, [l.name for l in classifiers], values, testweight!=0, baseValue) 
    948  
    949     examples = getattr(testResults, "examples", False) 
     976        test_results=ExperimentResults(1, [l.name for l in classifiers], values, testweight!=0, baseValue) 
     977 
     978    examples = getattr(test_results, "examples", False) 
    950979    if examples and len(examples): 
    951980        # We must not modify an example table we do not own, so we clone it the 
    952981        # first time we have to add to it 
    953         if not getattr(testResults, "examplesCloned", False): 
    954             testResults.examples = Orange.data.Table(testResults.examples) 
    955             testResults.examplesCloned = True 
    956         testResults.examples.extend(testset) 
     982        if not getattr(test_results, "examplesCloned", False): 
     983            test_results.examples = Orange.data.Table(test_results.examples) 
     984            test_results.examplesCloned = True 
     985        test_results.examples.extend(testset) 
    957986    else: 
    958987        # We do not clone at the first iteration - cloning might never be needed at all... 
    959         testResults.examples = testset 
     988        test_results.examples = testset 
    960989     
    961990    conv = testset.domain.classVar.varType == Orange.data.Type.Discrete and int or float 
    962991    for ex in testset: 
    963         te = TestedExample(iterationNumber, conv(ex.getclass()), 0, ex.getweight(testweight)) 
     992        te = TestedExample(iteration_number, conv(ex.getclass()), 0, ex.getweight(testweight)) 
    964993 
    965994        for classifier in classifiers: 
     
    969998            cr = classifier(ex2, Orange.core.GetBoth) 
    970999            te.add_result(cr[0], cr[1]) 
    971         testResults.results.append(te) 
     1000        test_results.results.append(te) 
    9721001         
    973     return testResults 
     1002    return test_results 
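(Because this function takes built classifiers rather than learners, one model
can be scored on several test sets, with `iteration_number` distinguishing the
appended batches; a sketch with placeholder names)::

    classifier = bayes(train)                         # train once
    res = test_on_data([classifier], test)            # score a first test set
    res = test_on_data([classifier], test2, res, iteration_number=1)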
  • orange/Orange/multilabel/br.py

    r9445 r9447  
    4848 
    4949The following example demonstrates a straightforward invocation of 
    50 this algorithm (`br_example.py`_, uses `multidata.tab`_): 
     50this algorithm (`mlc-br-example.py`_, uses `multidata.tab`_): 
    5151 
    52 .. literalinclude:: code/br_example.py 
     52.. literalinclude:: code/mlc-br-example.py 
    5353   :lines: 1- 
    5454 
    55 .. _br_example.py: code/br_example.py 
     55.. _mlc-br-example.py: code/mlc-br-example.py 
    5656.. _multidata.tab: code/multidata.tab 
    5757 
  • orange/Orange/multilabel/label.py

    r9445 r9447  
    1515    return len( [i for i, var in enumerate(data.domain.variables) 
    1616          if var.attributes.has_key('label')]) 
     17 
     18def is_multilabel(data): 
     19    """Return 1 if the data is multi-label, 0 otherwise.""" 
     20    if not isinstance(data, Orange.data.Table): 
     21        raise TypeError('data must be of type \'Orange.data.Table\'') 
     22    if not data.domain.classVar and get_num_labels(data) > 0: 
     23        return 1 
     24    return 0 
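(A sketch of the intended check, assuming this module is importable as
`Orange.multilabel.label` and using the `multidata.tab` example data, whose
label columns carry the label=1 attribute)::

    import Orange
    from Orange.multilabel import label
    data = Orange.data.Table("multidata")
    print label.is_multilabel(data)    # 1: no classVar, but labelled columns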
    1725 
    1826def get_label_indices(data): 
  • orange/doc/Orange/hiearchy.txt

    r9445 r9447  
    3737      tuning 
    3838   multilabel 
    39       br; tesing; scoring 
     39      br 
  • orange/doc/Orange/rst/Orange.multilabel.rst

    r9445 r9447  
    99 
    1010   Orange.multilabel.br 
    11    Orange.multilabel.testing 
    12    Orange.multilabel.scoring 
  • orange/doc/Orange/rst/code/multidata.tab

    r9444 r9447  
    1 Feature Sports  Religion    SCience Politics 
     1Feature Sports  Religion    Science Politics 
    22d   d   d   d   d 
    33    label=1 label=1 label=1 label=1 
Note: See TracChangeset for help on using the changeset viewer.