Changeset 7592:cd2c14f42aba in orange


Timestamp: 02/05/11 00:28:57
Author: miha <miha.stajdohar@…>
Branch: default
Convert: 39e90347a30134b8b13c08d4b0dee2513f7a3c09
Message:
File: 1 edited

  • orange/Orange/optimization/__init__.py

--- orange/Orange/optimization/__init__.py (r7561)
+++ orange/Orange/optimization/__init__.py (r7592)
@@ -8,7 +8,7 @@
 positive class.
 
-=================
+*****************
 Tuning parameters
-=================
+*****************
 
 Two classes support tuning parameters.
     
@@ -33,7 +33,14 @@
    :members:
    
-==========================
+**************************
 Setting Optimal Thresholds
-==========================
+**************************
+
+Some models may perform well in terms of AUC, which measures the ability to
+distinguish between examples of two classes, but have low classification
+accuracies. The reason may be in the threshold: in binary problems, classifiers
+usually classify into the more probable class, while sometimes, when class
+distributions are highly skewed, a modified threshold would give better
+accuracies. Here are two classes that can help.
 
 .. autoclass:: Orange.optimization.ThresholdLearner
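The idea in the added paragraph can be made concrete with a small sketch in
plain Python, independent of Orange (the data and names below are invented for
illustration): scan candidate thresholds on examples with known labels and
keep the cutoff with the best accuracy::

    # Probabilities of the positive class, as a classifier might predict
    # them, and the corresponding true labels; both made up for this sketch.
    probs  = [0.15, 0.35, 0.40, 0.55, 0.70, 0.90]
    labels = [0,    0,    1,    1,    1,    1]

    def accuracy_at(threshold):
        preds = [int(p > threshold) for p in probs]
        return sum(p == l for p, l in zip(preds, labels)) / float(len(labels))

    # With skewed classes the best cutoff is often far from the default 0.5.
    best = max((accuracy_at(t), t) for t in [0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
    print "best accuracy %.3f at threshold %.2f" % best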
     
@@ -42,13 +49,56 @@
 .. autoclass:: Orange.optimization.ThresholdClassifier
    :members:
+   
+Examples
+========
+
+This is how you use the learner.
+
+part of `optimization-thresholding1.py`_
+
+.. literalinclude:: code/optimization-thresholding1.py
+
+The output::
+
+    W/out threshold adjustment: 0.633
+    With adjusted threshold: 0.659
+    With threshold at 0.80: 0.449
+
+shows that fitting the threshold is good (well, although a 2.5 percent
+increase in the accuracy absolutely guarantees you a publication at ICML,
+the difference is still unimportant), while setting it at 80% is a bad idea. Or is it?
+
+part of `optimization-thresholding2.py`_
+
+.. literalinclude:: code/optimization-thresholding2.py
+
+The script first divides the data into training and testing examples. It
+trains a naive Bayesian classifier and then wraps it into
+:obj:`Orange.optimization.ThresholdClassifier` objects with thresholds of .2,
+.5 and .8. The three models are tested on the left-out examples, and we
+compute the confusion matrices from the results. The printout::
+
+    0.20: TP 60.000, TN 1.000
+    0.50: TP 42.000, TN 24.000
+    0.80: TP 2.000, TN 43.000
+
+shows how the varying threshold changes the balance between the number of true
+positives and negatives.
 
 .. autoclass:: Orange.optimization.PreprocessedLearner
    :members:
+   
+.. _optimization-thresholding1.py: code/optimization-thresholding1.py
+.. _optimization-thresholding2.py: code/optimization-thresholding2.py
 
 """
 
 import Orange.core
-
-class TuneParameters(Orange.core.Learner):
+import Orange.classification
+import Orange.evaluation.scoring
+import Orange.evaluation.testing
+import Orange.misc
+
+class TuneParameters(Orange.classification.Learner):
     
     """.. attribute:: examples
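The TP/TN printout in the new documentation reflects a general fact: raising
the threshold trades true positives for true negatives. A self-contained
sketch (with fabricated data, not the script's actual examples) shows the same
drift::

    import random
    random.seed(42)
    # invented (probability-of-positive, true-label) pairs
    cases = [(random.random(), random.randint(0, 1)) for _ in range(200)]

    for threshold in (0.2, 0.5, 0.8):
        tp = sum(1 for p, y in cases if p > threshold and y == 1)
        tn = sum(1 for p, y in cases if p <= threshold and y == 0)
        print "%.2f: TP %d, TN %d" % (threshold, tp, tn)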
     
@@ -62,18 +112,21 @@
     .. attribute:: object
     
-        The learning algorithm whose parameters are to be tuned. This can be, for
-        instance, orngTree.TreeLearner. You will usually use the wrapped learners
-        from modules, not the built-in classifiers, such as orange.TreeLearner
+        The learning algorithm whose parameters are to be tuned. This can be,
+        for instance, :obj:`Orange.classification.tree.TreeLearner`. You will
+        usually use the wrapped learners from modules, not the built-in
+        classifiers, such as :obj:`Orange.classification.tree.TreeLearner`
         directly, since the arguments to be fitted are easier to address in the
-        wrapped versions. But in principle it doesn't matter. 
+        wrapped versions. But in principle it doesn't matter.
     
     .. attribute:: evaluate
     
-        The statistics to evaluate. The default is orngStat.CA, so the learner will
-        be fit for the optimal classification accuracy. You can replace it with,
-        for instance, orngStat.AUC to optimize the AUC. Statistics can return
-        either a single value (classification accuracy), a list with a single value
-        (this is what orngStat.CA actually does), or arbitrary objects which the
-        compare function below must be able to compare.
+        The statistics to evaluate. The default is
+        :obj:`Orange.evaluation.scoring.CA`, so the learner will be fit for the
+        optimal classification accuracy. You can replace it with, for instance,
+        :obj:`Orange.evaluation.scoring.AUC` to optimize the AUC. Statistics
+        can return either a single value (classification accuracy), a list with
+        a single value (this is what :obj:`Orange.evaluation.scoring.CA`
+        actually does), or arbitrary objects which the compare function below
+        must be able to compare.
     
     .. attribute:: folds
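The contract for evaluate, as described above, is loose: it may return a
single value, a one-element list, or anything the compare function can order.
A toy replacement illustrates the shape (the dict of counts is a made-up
structure, not what Orange's testing routines actually return)::

    def my_evaluate(confusion):
        # score = fraction of correctly classified examples
        correct = confusion["TP"] + confusion["TN"]
        return correct / float(sum(confusion.values()))

    print my_evaluate({"TP": 42, "TN": 24, "FP": 10, "FN": 8})  # 0.785...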
     
@@ -83,10 +136,11 @@
     .. attribute:: compare
     
-        The function used to compare the results. The function should accept two
-        arguments (e.g. two classification accuracies, AUCs or whatever the result
-        of evaluate is) and return a positive value if the first argument is
-        better, 0 if they are equal and a negative value if the first is worse than
-        the second. The default compare function is cmp. You don't need to change
-        this if evaluate is such that higher values mean a better classifier.
+        The function used to compare the results. The function should accept
+        two arguments (e.g. two classification accuracies, AUCs or whatever the
+        result of evaluate is) and return a positive value if the first
+        argument is better, 0 if they are equal and a negative value if the
+        first is worse than the second. The default compare function is cmp.
+        You don't need to change this if evaluate is such that higher values
+        mean a better classifier.
     
     .. attribute:: returnWhat
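Since compare follows cmp's protocol (positive, zero, negative), a custom one
is mainly needed when lower evaluate scores are better. A sketch, assuming the
evaluate in use returns an error rate::

    def compare_lower_is_better(a, b):
        # positive if a is better, 0 if equal, negative if a is worse;
        # cmp's ordering, flipped, because smaller errors win
        return cmp(b, a)

    print compare_lower_is_better(0.10, 0.25)   # positive: 0.10 wins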
     
@@ -105,6 +159,6 @@
     
         If 0 (default), the class doesn't print anything. If set to 1, it will
-        print out the optimal value found, if set to 2, it will print out all tried
-        values and the related
+        print out the optimal value found; if set to 2, it will print out all
+        tried values and the related results.
     
     If tuner returns the classifier, it behaves as a learning algorithm. As the
     
@@ -113,7 +167,7 @@
     cross-validation.
 
-    Out of these attributes, the only necessary argument is object. The real tuning
-    classes add two additional - the attributes that tell what parameter(s) to
-    optimize and which values to use.
+    Out of these attributes, the only necessary argument is object. The real
+    tuning classes add two more: the attributes that tell what
+    parameter(s) to optimize and which values to use.
     
     """
     
@@ -125,5 +179,5 @@
     
     def __new__(cls, examples = None, weightID = 0, **argkw):
-        self = Orange.core.Learner.__new__(cls, **argkw)
+        self = Orange.classification.Learner.__new__(cls, **argkw)
         self.__dict__.update(argkw)
         if examples:
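The __new__ above implements a convention used throughout these classes: the
constructor doubles as a learner and, when given examples, trains on the spot
and returns a classifier instead. Reduced to plain Python (all names are
illustrative)::

    class LearnerOrClassifier(object):
        def __new__(cls, examples=None, **kwargs):
            self = object.__new__(cls)
            self.__dict__.update(kwargs)
            if examples is not None:
                return self(examples)   # train now, hand back a classifier
            return self                 # no data: stay a learner

        def __init__(self, examples=None, **kwargs):
            pass                        # state was already set in __new__

        def __call__(self, examples):
            return "classifier built from %d examples" % len(examples)

    print LearnerOrClassifier()                    # a learner
    print LearnerOrClassifier(examples=[1, 2, 3])  # a "classifier"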
     
@@ -140,8 +194,4 @@
         return lastobj, names[-1]
         
-# Same arguments as TuneParameters, plus:
-#   parameter  - a string or a list of strings with parameter(s) to fit
-#   values     - possible values of the parameter
-#                (eg <object>.<parameter> = <value>[i])
 class Tune1Parameter(TuneParameters):
     
     
@@ -163,5 +213,5 @@
 
     .. literalinclude:: code/optimization-tuning1.py
-        :lines: 7-15
+        :lines: 3-11
 
     Set up like this, when the tuner is called, it will set learner.minSubset to 1, 2,
     
@@ -187,5 +237,5 @@
 
     .. literalinclude:: code/optimization-tuning1.py
-        :lines: 17-22
+        :lines: 13-18
     
     This will take some time: for each of 8 values for minSubset it will
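To see where the time goes, here is the tuning loop with the Orange calls
stubbed out (fit_and_score stands in for training and evaluating one fold; it
is invented for this sketch): 8 candidate values times 5 folds means 40 model
fits::

    def fit_and_score(value, fold):
        # pretend the score peaks at value == 3
        return 1.0 - abs(value - 3) * 0.05 - fold * 0.001

    def tune(values, folds=5):
        best_score, best_value = None, None
        for value in values:
            score = sum(fit_and_score(value, f)
                        for f in range(folds)) / float(folds)
            if best_score is None or score > best_score:
                best_score, best_value = score, value
        return best_value

    print tune([1, 2, 3, 4, 5, 10, 15, 20])   # -> 3 with this fake scorer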
     
@@ -204,11 +254,10 @@
     
     def __call__(self, table, weight=None, verbose=0):
-        import orngTest, orngStat, orngMisc
-
         verbose = verbose or getattr(self, "verbose", 0)
-        evaluate = getattr(self, "evaluate", orngStat.CA)
+        evaluate = getattr(self, "evaluate", Orange.evaluation.scoring.CA)
         folds = getattr(self, "folds", 5)
         compare = getattr(self, "compare", cmp)
-        returnWhat = getattr(self, "returnWhat", Tune1Parameter.returnClassifier)
+        returnWhat = getattr(self, "returnWhat", 
+                             Tune1Parameter.returnClassifier)
 
         if (type(self.parameter)==list) or (type(self.parameter)==tuple):
     
@@ -218,13 +267,15 @@
 
         cvind = Orange.core.MakeRandomIndicesCV(table, folds)
-        findBest = orngMisc.BestOnTheFly(seed = table.checksum(), callCompareOn1st = True)
+        findBest = Orange.misc.selection.BestOnTheFly(seed = table.checksum(), 
+                                        callCompareOn1st = True)
         tableAndWeight = weight and (table, weight) or table
         for par in self.values:
             for i in to_set:
                 setattr(i[0], i[1], par)
-            res = evaluate(orngTest.testWithIndices([self.object], tableAndWeight, cvind))
+            res = evaluate(Orange.evaluation.testing.testWithIndices(
+                                        [self.object], tableAndWeight, cvind))
             findBest.candidate((res, par))
             if verbose==2:
-                print '*** orngWrap  %s: %s:' % (par, res)
+                print '*** optimization  %s: %s:' % (par, res)
 
         bestpar = findBest.winner()[1]
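BestOnTheFly, as used here, only has to remember the running winner among
(result, parameter) candidates, comparing on the first element. A rough
Orange-free equivalent (not the actual Orange.misc.selection code)::

    class KeepBest(object):
        def __init__(self):
            self.best = None

        def candidate(self, item):        # item is a (score, parameter) pair
            if self.best is None or item[0] > self.best[0]:
                self.best = item

        def winner(self):
            return self.best

    finder = KeepBest()
    for par, score in [(1, 0.61), (2, 0.67), (4, 0.64)]:
        finder.candidate((score, par))
    print finder.winner()[1]              # -> 2, the best parameter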
     
@@ -246,13 +297,9 @@
             return classifier
 
-
-# Same arguments as TuneParameters, plus
-#   parameters - a list of tuples with parameters to be fitted and the
-#                corresponding possible values, [(parameter(s), values), ...]
-#                (eg <object>.<parameter[j]> = <value[j]>[i])
 class TuneMParameters(TuneParameters):
     
-    """The use of :obj:`Orange.optimization.TuneMParameters differs from 
-    Tune1Parameter only in specification of tuning parameters.
+    """The use of :obj:`Orange.optimization.TuneMParameters` differs from 
+    :obj:`Orange.optimization.Tune1Parameter` only in specification of tuning
+    parameters.
     
     .. attribute:: parameters
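The parameters attribute introduced here is a list of (parameter, values)
pairs, and tuning enumerates every combination. itertools.product reproduces
the enumeration that LimitedCounter performs in the code below (maxDepth is an
invented parameter name used only for illustration)::

    import itertools

    parameters = [("minSubset", [2, 5, 10]),
                  ("maxDepth", [3, 6])]

    names = [name for name, _ in parameters]
    for combo in itertools.product(*[values for _, values in parameters]):
        print dict(zip(names, combo))   # one candidate setting per line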
     
@@ -265,10 +312,10 @@
     tuner as follows:
     
-    part of `optimization-tuningm.py`_
+    `optimization-tuningm.py`_
 
     .. literalinclude:: code/optimization-tuningm.py
-        :lines: 9-12
-        
-    Everything else stays like above, in examples for Tune1Parameter.
+        
+    Everything else stays the same as above, in the examples for
+    :obj:`Orange.optimization.Tune1Parameter`.
     
     .. _optimization-tuningm.py: code/optimization-tuningm.py
     
@@ -277,11 +324,9 @@
     
     def __call__(self, table, weight=None, verbose=0):
-        import orngTest, orngStat, orngMisc
-
-        evaluate = getattr(self, "evaluate", orngStat.CA)
+        evaluate = getattr(self, "evaluate", Orange.evaluation.scoring.CA)
         folds = getattr(self, "folds", 5)
         compare = getattr(self, "compare", cmp)
         verbose = verbose or getattr(self, "verbose", 0)
-        returnWhat = getattr(self, "returnWhat", Tune1Parameter.returnClassifier)
+        returnWhat=getattr(self, "returnWhat", Tune1Parameter.returnClassifier)
         progressCallback = getattr(self, "progressCallback", lambda i: None)
         
     
@@ -298,10 +343,13 @@
 
         cvind = Orange.core.MakeRandomIndicesCV(table, folds)
-        findBest = orngMisc.BestOnTheFly(seed = table.checksum(), callCompareOn1st = True)
+        findBest = Orange.misc.selection.BestOnTheFly(seed = table.checksum(), 
+                                        callCompareOn1st = True)
         tableAndWeight = weight and (table, weight) or table
         numOfTests = sum([len(x[1]) for x in self.parameters])
         milestones = set(range(0, numOfTests, max(numOfTests / 100, 1)))
-        for itercount, valueindices in enumerate(orngMisc.LimitedCounter([len(x[1]) for x in self.parameters])):
-            values = [self.parameters[i][1][x] for i,x in enumerate(valueindices)]
+        for itercount, valueindices in enumerate(Orange.misc.counters.LimitedCounter( \
+                                        [len(x[1]) for x in self.parameters])):
+            values = [self.parameters[i][1][x] for i,x \
+                      in enumerate(valueindices)]
             for pi, value in enumerate(values):
                 for i, par in enumerate(to_set[pi]):
     
@@ -310,5 +358,6 @@
                         print "%s: %s" % (parnames[pi][i], value)
                         
-            res = evaluate(orngTest.testWithIndices([self.object], tableAndWeight, cvind))
+            res = evaluate(Orange.evaluation.testing.testWithIndices(
+                                        [self.object], tableAndWeight, cvind))
             if itercount in milestones:
                 progressCallback(100.0 * itercount / numOfTests)
     
@@ -340,10 +389,38 @@
             return classifier
 
-
-
-
-class ThresholdLearner(Orange.core.Learner):
+class ThresholdLearner(Orange.classification.Learner):
+    
+    """:obj:`Orange.optimization.ThresholdLearner` is a class that wraps around 
+    another learner. When given the data, it calls the wrapped learner to build
+    a classifier, then it uses the classifier to predict the class
+    probabilities on the training examples. Storing the probabilities, it
+    computes the threshold that would give the optimal classification accuracy.
+    Then it wraps the classifier and the threshold into an instance of
+    :obj:`Orange.optimization.ThresholdClassifier`.
+
+    Note that the learner doesn't perform internal cross-validation. Also, the
+    learner doesn't work for multivalued classes. If you don't understand why,
+    think harder. If you still don't, try to program it yourself; this should
+    help. :)
+
+    :obj:`Orange.optimization.ThresholdLearner` has the same interface as any
+    learner: if the constructor is given examples, it returns a classifier,
+    else it returns a learner. It has two attributes.
+    
+    .. attribute:: learner
+    
+        The wrapped learner, for example an instance of
+        :obj:`Orange.classification.bayes.NaiveLearner`.
+    
+    .. attribute:: storeCurve
+    
+        If set, the resulting classifier will contain an attribute curve, with
+        a list of tuples containing thresholds and classification accuracies at
+        that threshold.
+    
+    """
+    
     def __new__(cls, examples = None, weightID = 0, **kwds):
-        self = Orange.core.Learner.__new__(cls, **kwds)
+        self = Orange.classification.Learner.__new__(cls, **kwds)
         self.__dict__.update(kwds)
         if examples:
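What ThresholdCA computes for the learner can be sketched in a few lines: try
the predicted probabilities themselves as candidate cutoffs and keep the one
with the best training accuracy. This is only a conceptual model of the call
in the next hunk, not its actual implementation::

    def optimal_threshold(probs, labels):
        best = (0.0, 0.5)                     # (accuracy, threshold)
        for cut in sorted(set(probs)):
            acc = sum((p > cut) == bool(y)
                      for p, y in zip(probs, labels)) / float(len(labels))
            best = max(best, (acc, cut))
        return best[1]

    print optimal_threshold([0.1, 0.4, 0.45, 0.8], [0, 0, 1, 1])   # -> 0.4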
     
@@ -357,5 +434,7 @@
     
         classifier = self.learner(examples, weightID)
-        threshold, optCA, curve = Orange.core.ThresholdCA(classifier, examples, weightID)
+        threshold, optCA, curve = Orange.wrappers.ThresholdCA(classifier, 
+                                                          examples, 
+                                                          weightID)
         if getattr(self, "storeCurve", 0):
             return ThresholdClassifier(classifier, threshold, curve = curve)
     
@@ -363,5 +442,29 @@
             return ThresholdClassifier(classifier, threshold)
 
-class ThresholdClassifier(Orange.core.Classifier):
+class ThresholdClassifier(Orange.classification.Classifier):
+    
+    """:obj:`Orange.optimization.ThresholdClassifier`, used by both 
+    :obj:`Orange.optimization.ThresholdLearner` and
+    :obj:`Orange.optimization.ThresholdLearner_fixed`, is therefore another
+    wrapper class, containing a classifier and a threshold. When it needs to
+    classify an example, it calls the wrapped classifier to predict
+    probabilities. The example will be classified into the second class only if
+    the probability of that class is above the threshold.
+
+    .. attribute:: classifier
+    
+    The wrapped classifier, normally the one related to the ThresholdLearner's
+    learner, e.g. an instance of
+    :obj:`Orange.classification.bayes.NaiveLearner`.
+    
+    .. attribute:: threshold
+    
+    The threshold for classification into the second class.
+    
+    The two attributes can be set as attributes or given to the
+    constructor as ordinary arguments.
+    
+    """
+    
     def __init__(self, classifier, threshold, **kwds):
         self.classifier = classifier
     
@@ -369,15 +472,39 @@
         self.__dict__.update(kwds)
 
-    def __call__(self, example, what = Orange.core.Classifier.GetValue):
+    def __call__(self, example, what = Orange.classification.Classifier.GetValue):
         probs = self.classifier(example, self.GetProbabilities)
         if what == self.GetProbabilities:
             return probs
-        value = Orange.core.Value(self.classifier.classVar, probs[1]>self.threshold)
-        if what == Orange.core.Classifier.GetValue:
+        value = Orange.data.Value(self.classifier.classVar, probs[1] > \
+                                  self.threshold)
+        if what == Orange.classification.Classifier.GetValue:
             return value
         else:
             return (value, probs)
 
-def ThresholdLearner_fixed(learner, threshold, examples = None, weightId = 0, **kwds):
+def ThresholdLearner_fixed(learner, threshold, 
+                           examples=None, weightId=0, **kwds):
+    
+    """There's also a dumb variant of 
+    :obj:`Orange.optimization.ThresholdLearner`, a class called
+    :obj:`Orange.optimization.ThresholdLearner_fixed`. Instead of finding the
+    optimal threshold it uses a prescribed one. So, it has the following two
+    attributes.
+    
+    .. attribute:: learner
+    
+    The wrapped learner, for example an instance of
+    :obj:`Orange.classification.bayes.NaiveLearner`.
+    
+    .. attribute:: threshold
+    
+    Threshold to use in classification.
+    
+    What this guy does is therefore simple: to learn, it calls the learner and
+    puts the resulting classifier together with the threshold into an instance
+    of ThresholdClassifier.
+    
+    """
+    
     lr = apply(ThresholdLearner_fixed_Class, (learner, threshold), kwds)
     if examples:
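The fixed variant needs no search at all; in miniature (all names below are
illustrative, not Orange's)::

    class FixedThresholdClassifier(object):
        def __init__(self, predict_proba, threshold):
            self.predict_proba = predict_proba   # callable giving P(class 1)
            self.threshold = threshold

        def __call__(self, example):
            return int(self.predict_proba(example) > self.threshold)

    clf = FixedThresholdClassifier(lambda x: x / 10.0, threshold=0.8)
    print clf(9), clf(5)   # -> 1 0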
     
@@ -386,7 +513,7 @@
         return lr
     
-class ThresholdLearner_fixed(Orange.core.Learner):
+class ThresholdLearner_fixed(Orange.classification.Learner):
     def __new__(cls, examples = None, weightID = 0, **kwds):
-        self = Orange.core.Learner.__new__(cls, **kwds)
+        self = Orange.classification.Learner.__new__(cls, **kwds)
         self.__dict__.update(kwds)
         if examples:
     
@@ -403,5 +530,6 @@
             raise "ThresholdLearner handles binary classes only"
     
-        return ThresholdClassifier(self.learner(examples, weightID), self.threshold)
+        return ThresholdClassifier(self.learner(examples, weightID), 
+                                   self.threshold)
 
 class PreprocessedLearner(object):
     
@@ -428,5 +556,9 @@
         hadWeight = hasWeight = weightId is not None
         for preprocessor in self.preprocessors:
-            t = preprocessor(data, weightId) if hasWeight else preprocessor(data)
+            if hasWeight:
+                t = preprocessor(data, weightId)  
+            else:
+                t = preprocessor(data)
+                
             if isinstance(t, tuple):
                 data, weightId = t
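The rewritten loop threads data (and, when present, a weight id) through a
chain of preprocessors, each of which may return either new data or a
(data, weight) tuple. The same logic, with Orange stripped away and toy
preprocessors invented for the occasion::

    def run_chain(preprocessors, data, weight_id=None):
        has_weight = weight_id is not None
        for preprocessor in preprocessors:
            if has_weight:
                t = preprocessor(data, weight_id)
            else:
                t = preprocessor(data)
            if isinstance(t, tuple):             # preprocessor added a weight
                data, weight_id = t
                has_weight = weight_id is not None
            else:
                data = t
        return data, weight_id

    double = lambda data: [x * 2 for x in data]  # rescale
    tag = lambda data: (data, "w1")              # attach a weight id
    print run_chain([double, tag], [1, 2, 3])    # -> ([2, 4, 6], 'w1')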
     
@@ -454,5 +586,6 @@
                 
             def __reduce__(self):
-                return PreprocessedLearner, (self.preprocessor.preprocessors, self.wrappedLearner)
+                return PreprocessedLearner, (self.preprocessor.preprocessors, \
+                                             self.wrappedLearner)
             
             def __getattr__(self, name):