Changeset 7251:c02e45fd5f45 in orange


Timestamp:
02/02/11 21:55:24 (3 years ago)
Author:
jzbontar <jure.zbontar@…>
Branch:
default
Convert:
4b02916d1106f808e18311781ef55005c3df91a5
Message:

checkpoint

Location:
orange
Files:
1 added
1 deleted
2 edited

  • orange/Orange/classification/logreg.py

    r7206 r7251  
    66=================== 
    77 
    8 Module logreg is a set of wrappers around classes LogisticLearner and 
    9 LogisticClassifier, that are implemented in core Orange. This module 
    10 expanses use of logistic regression to discrete attributes, it helps 
    11 handling various anomalies in attributes, such as constant variables 
    12 and singularities, that makes fitting logistic regression almost 
    13 impossible. It also implements a function for constructing a stepwise 
    14 logistic regression, which is a good technique to prevent overfitting, 
    15 and is a good feature subset selection technique as well. 
     8Module :obj:`Orange.classification.logreg` is a set of wrappers around 
     9the classes LogisticLearner and LogisticClassifier that are implemented 
     10in core Orange. This module extends the use of logistic regression 
     11to discrete attributes and helps handle various anomalies in 
     12attributes, such as constant variables and singularities, that make 
     13fitting logistic regression almost impossible. It also implements a 
     14function for constructing a stepwise logistic regression, which is a 
     15good technique for preventing overfitting and a good feature subset 
     16selection technique as well. 
    1617 
    1718Functions 
     
    2021.. autofunction:: LogRegLearner 
    2122.. autofunction:: StepWiseFSS 
    22  
     23.. autofunction:: printOUT 
     24 
     25Class 
     26----- 
     27 
     28.. autoclass:: StepWiseFSS_class 
     29 
     30Examples 
     31-------- 
     32 
     33The first example shows a very simple induction of a logistic regression 
     34classifier (`logreg-run.py`_, uses `titanic.tab`_). 
     35 
     36.. literalinclude:: code/logreg-run.py 
     37 
     38Result:: 
     39 
     40    Classification accuracy: 0.778282598819 
     41 
     42    class attribute = survived 
     43    class values = <no, yes> 
     44 
     45        Attribute       beta  st. error     wald Z          P OR=exp(beta) 
     46 
     47        Intercept      -1.23       0.08     -15.15      -0.00 
     48     status=first       0.86       0.16       5.39       0.00       2.36 
     49    status=second      -0.16       0.18      -0.91       0.36       0.85 
     50     status=third      -0.92       0.15      -6.12       0.00       0.40 
     51        age=child       1.06       0.25       4.30       0.00       2.89 
     52       sex=female       2.42       0.14      17.04       0.00      11.25 
     53 
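
The script pulled in by the literalinclude above is not reproduced here. A minimal
sketch of the kind of code such a script contains might look like the following; the
``orange.ExampleTable`` loader and the training-set accuracy computation are
illustrative assumptions, not the exact contents of `logreg-run.py`_, so the numbers
will not match the output above exactly::

    import orange
    from Orange.classification import logreg

    # load the Titanic data set (assumes titanic.tab is on the data path)
    data = orange.ExampleTable("titanic")

    # passing data directly returns a trained classifier
    classifier = logreg.LogRegLearner(data)

    # crude accuracy estimate on the training data itself
    correct = sum(classifier(ex) == ex.getclass() for ex in data)
    print "Classification accuracy:", float(correct) / len(data)

    # formatted dump of the fitted coefficients
    logreg.printOUT(classifier)
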
     54The next example shows how to handle singularities in data sets 
     55(`logreg-singularities.py`_, uses `adult_sample.tab`_). 
     56 
     57.. literalinclude:: code/logreg-singularities.py 
     58 
     59Result:: 
     60 
     61    <=50K <=50K 
     62    <=50K <=50K 
     63    <=50K <=50K 
     64    >50K >50K 
     65    <=50K >50K 
     66 
     67    class attribute = y 
     68    class values = <>50K, <=50K> 
     69 
     70                               Attribute       beta  st. error     wald Z          P OR=exp(beta) 
     71 
     72                               Intercept       6.62      -0.00       -inf       0.00 
     73                                     age      -0.04       0.00       -inf       0.00       0.96 
     74                                  fnlwgt      -0.00       0.00       -inf       0.00       1.00 
     75                           education-num      -0.28       0.00       -inf       0.00       0.76 
     76                 marital-status=Divorced       4.29       0.00        inf       0.00      72.62 
     77            marital-status=Never-married       3.79       0.00        inf       0.00      44.45 
     78                marital-status=Separated       3.46       0.00        inf       0.00      31.95 
     79                  marital-status=Widowed       3.85       0.00        inf       0.00      46.96 
     80    marital-status=Married-spouse-absent       3.98       0.00        inf       0.00      53.63 
     81        marital-status=Married-AF-spouse       4.01       0.00        inf       0.00      55.19 
     82                 occupation=Tech-support      -0.32       0.00       -inf       0.00       0.72 
     83                 occupation=Craft-repair       0.37       0.00        inf       0.00       1.45 
     84                occupation=Other-service       2.68       0.00        inf       0.00      14.61 
     85                        occupation=Sales       0.22       0.00        inf       0.00       1.24 
     86               occupation=Prof-specialty       0.18       0.00        inf       0.00       1.19 
     87            occupation=Handlers-cleaners       1.29       0.00        inf       0.00       3.64 
     88            occupation=Machine-op-inspct       0.86       0.00        inf       0.00       2.37 
     89                 occupation=Adm-clerical       0.30       0.00        inf       0.00       1.35 
     90              occupation=Farming-fishing       1.12       0.00        inf       0.00       3.06 
     91             occupation=Transport-moving       0.62       0.00        inf       0.00       1.85 
     92              occupation=Priv-house-serv       3.46       0.00        inf       0.00      31.87 
     93              occupation=Protective-serv       0.11       0.00        inf       0.00       1.12 
     94                 occupation=Armed-Forces       0.59       0.00        inf       0.00       1.81 
     95                       relationship=Wife      -1.06      -0.00        inf       0.00       0.35 
     96                  relationship=Own-child      -1.04      60.00      -0.02       0.99       0.35 
     97              relationship=Not-in-family      -1.94  532845.00      -0.00       1.00       0.14 
     98             relationship=Other-relative      -2.42       0.00       -inf       0.00       0.09 
     99                  relationship=Unmarried      -1.92       0.00       -inf       0.00       0.15 
     100                 race=Asian-Pac-Islander      -0.19       0.00       -inf       0.00       0.83 
     101                 race=Amer-Indian-Eskimo       2.88       0.00        inf       0.00      17.78 
     102                              race=Other       3.93       0.00        inf       0.00      51.07 
     103                              race=Black       0.11       0.00        inf       0.00       1.12 
     104                              sex=Female       0.30       0.00        inf       0.00       1.36 
     105                            capital-gain      -0.00       0.00       -inf       0.00       1.00 
     106                            capital-loss      -0.00       0.00       -inf       0.00       1.00 
     107                          hours-per-week      -0.04       0.00       -inf       0.00       0.96 
     108 
     109If we set removeSingular to 0, inducing a logistic regression 
     110classifier raises an error:: 
     111 
     112    Traceback (most recent call last): 
     113      File "logreg-singularities.py", line 4, in <module> 
     114        lr = classification.logreg.LogRegLearner(table, removeSingular=0) 
     115      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 255, in LogRegLearner 
     116        return lr(examples, weightID) 
     117      File "/home/jure/devel/orange/Orange/classification/logreg.py", line 291, in __call__ 
     118        lr = learner(examples, weight) 
     119    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
     120 
     121 
     122We can see that the attribute workclass=Never-worked is causing the 
     123singularity. We can either remove Never-worked manually or leave it to 
     124the function LogRegLearner to remove it automatically, as sketched below. 
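
A rough sketch of both options, assuming the classic ``orange.ExampleTable`` and
``orange.Domain`` constructors; removing the whole workclass feature is a coarser
fix than dropping just the offending value, but it illustrates the manual route::

    import orange
    from Orange.classification import logreg

    data = orange.ExampleTable("adult_sample")

    # option 1: let the learner drop the offending indicator variables itself
    classifier = logreg.LogRegLearner(data, removeSingular=1)

    # option 2: remove the problematic feature by hand before learning
    keep = [attr for attr in data.domain.attributes if attr.name != "workclass"]
    reduced = orange.ExampleTable(orange.Domain(keep + [data.domain.classVar]), data)
    classifier2 = logreg.LogRegLearner(reduced)
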
     125 
     126The last example shows how the use of stepwise logistic 
     127regression can help us achieve better classification 
     128(`logreg-stepwise.py`_, uses `ionosphere.tab`_): 
     129 
     130.. literalinclude:: code/logreg-stepwise.py 
     131 
     132Result:: 
     133 
     134    Learner      CA 
     135    logistic     0.841 
     136    filtered     0.846 
     137 
     138    Number of times attributes were used in cross-validation: 
     139     1 x a21 
     140    10 x a22 
     141     8 x a23 
     142     7 x a24 
     143     1 x a25 
     144    10 x a26 
     145    10 x a27 
     146     3 x a28 
     147     7 x a29 
     148     9 x a31 
     149     2 x a16 
     150     7 x a12 
     151     1 x a32 
     152     8 x a15 
     153    10 x a14 
     154     4 x a17 
     155     7 x a30 
     156    10 x a11 
     157     1 x a10 
     158     1 x a13 
     159    10 x a34 
     160     2 x a19 
     161     1 x a18 
     162    10 x a3 
     163    10 x a5 
     164     4 x a4 
     165     4 x a7 
     166     8 x a6 
     167    10 x a9 
     168    10 x a8 
     169 
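
A hedged sketch of the kind of workflow behind the ``filtered`` line: select a subset
of attributes with ``StepWiseFSS``, then learn on the reduced table. The ``addCrit``
and ``deleteCrit`` values below are arbitrary illustration values, not the ones used in
`logreg-stepwise.py`_, and the sketch assumes ``StepWiseFSS`` returns a list of
attribute descriptors::

    import orange
    from Orange.classification import logreg

    data = orange.ExampleTable("ionosphere")

    # pick an informative subset of attributes with the stepwise procedure
    subset = logreg.StepWiseFSS(data, addCrit=0.05, deleteCrit=0.9)
    print "Chosen attributes:", [attr.name for attr in subset]

    # restrict the data to the chosen attributes and learn on the reduced table
    domain = orange.Domain(list(subset) + [data.domain.classVar])
    reduced = orange.ExampleTable(domain, data)
    classifier = logreg.LogRegLearner(reduced)
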
     170References 
     171---------- 
     172 
     173David W. Hosmer, Stanley Lemeshow. Applied Logistic Regression - 2nd ed. Wiley, New York, 2000  
     174 
     175 
     176.. _logreg-run.py: code/logreg-run.py 
     177.. _logreg-singularities.py: code/logreg-singularities.py 
     178.. _logreg-stepwise.py: code/logreg-stepwise.py 
     179 
     180.. _ionosphere.tab: code/ionosphere.tab 
     181.. _adult_sample.tab: code/adult_sample.tab 
     182.. _titanic.tab: code/titanic.tab 
    23183 
    24184""" 
     
    38198 
    39199def printOUT(classifier): 
     200    """ Formatted print to console of all major attributes of a 
     201    logistic regression classifier. 
     202 
     203    :param classifier: a logistic regression classifier whose 
     204        coefficients are to be printed 
     205 
     206    """ 
     207 
    40208    # print out class values 
    41209    print 
     
    67235    return 0 
    68236 
    69 def LogRegLearner(examples = None, weightID=0, **kwds): 
    70     """ Returns a LogisticClassifier if examples are given. If examples 
    71     are not specified, an instance of object LogisticLearner with 
    72     its parameters appropriately initialized is returned.  Parameter 
    73     weightID defines the ID of the weight meta attribute. Set parameter 
    74     removeSingular to 1,if you want automatic removal of disturbing 
    75     attributes, such as constants and singularities. Examples can contain 
    76     discrete and continuous attributes. Parameter fitter is used to 
    77     alternate fitting algorithm. Currently a Newton-Raphson fitting 
    78     algorithm is used, however you can change it to something else. You 
    79     can find bayesianFitter in orngLR to test it out. The last three 
    80     parameters addCrit, deleteCrit, numAttr are used to set parameters 
    81     for stepwise attribute selection (see next method). If you wish to 
    82     use stepwise within LogRegLearner, stpewiseLR must be set as 1. 
    83  
    84     :param examples: 
    85     :param weightID: 
    86     :param removeSingular: 
    87     :param fitter: 
    88     :param stepwiseLR: 
    89     :param addCrit: 
    90     :param deleteCrit: 
    91     :param numAttr: 
     237def LogRegLearner(table=None, weightID=0, **kwds): 
     238    """ Logistic regression learner 
     239 
     240    :obj:`LogRegLearner` implements logistic regression. If data 
     241    instances are provided to the constructor, the learning algorithm 
     242    is called and the resulting classifier is returned instead of the 
     243    learner. 
     244 
     245    :param table: data set with either discrete or continuous features 
     246    :type table: :obj:`Orange.data.Table` 
     247    :param weightID: the ID of the weight meta attribute 
     248    :param removeSingular: set to 1 if you want automatic removal of disturbing attributes, such as constants and singularities 
     249    :param fitter: alternative fitting algorithm (the Newton-Raphson fitting algorithm is used by default) 
     250    :param stepwiseLR: set to 1 if you wish to use stepwise logistic regression 
     251    :param addCrit: parameter for stepwise attribute selection 
     252    :param deleteCrit: parameter for stepwise attribute selection 
     253    :param numAttr: parameter for stepwise attribute selection 
    92254         
    93255    """ 
    94256    lr = LogRegLearnerClass(**kwds) 
    95     if examples: 
    96         return lr(examples, weightID) 
     257    if table: 
     258        return lr(table, weightID) 
    97259    else: 
    98260        return lr 
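
A minimal sketch of the two calling conventions described in the docstring above; the
``orange.ExampleTable`` loader is an assumption, and only the learner-versus-classifier
behaviour is the point here::

    import orange
    from Orange.classification import logreg

    data = orange.ExampleTable("titanic")

    learner = logreg.LogRegLearner()           # no data: a learner is returned
    classifier = learner(data)                 # ...which can be applied to data later

    classifier2 = logreg.LogRegLearner(data)   # data given: a classifier is returned directly
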
     
    694856 
    695857class StepWiseFSS_class(orange.Learner): 
      858  """ Performs stepwise logistic regression and returns a list of the "most" 
      859  informative attributes. Each step of the algorithm consists of 
      860  two parts. The first is backward elimination, in which each attribute 
      861  already in the model is tested for a significant contribution to the 
      862  overall model. If the worst among all tested attributes has a significance 
      863  higher than specified by deleteCrit, that attribute is removed from 
      864  the model. The second part is forward selection, which is similar 
      865  to backward elimination. It loops through all attributes that are 
      866  not in the model and tests whether they contribute to the model with 
      867  a significance lower than addCrit. The algorithm stops when no attribute 
      868  in the model is to be removed and no attribute out of the model is to 
      869  be added. If numAttr is set to a value larger than -1, the algorithm 
      870  stops as soon as the number of attributes in the model exceeds that number. 
      871 
      872  Significances are assessed with the likelihood ratio chi-square 
      873  test. The usual F-test is not appropriate because the errors are assumed to 
      874  follow a binomial distribution. 
     875 
     876  """ 
     877 
    696878  def __init__(self, addCrit=0.2, deleteCrit=0.3, numAttr = -1, **kwds): 
    697879    self.__dict__.update(kwds) 
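
A short, hedged sketch of using the class directly; it assumes an instance is applied
to a data table like other Orange learners and returns the selected attribute
descriptors, with thresholds chosen purely for illustration::

    import orange
    from Orange.classification import logreg

    data = orange.ExampleTable("ionosphere")

    # configure the stepwise selector, then apply it to the data
    fss = logreg.StepWiseFSS_class(addCrit=0.05, deleteCrit=0.9, numAttr=-1)
    chosen = fss(data)
    print [attr.name for attr in chosen]
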
  • orange/doc/Orange/rst/index.rst

    r7248 r7251  
    1919   orange.classification.svm 
    2020   orange.classification.tree 
    21    orange.classification.logreg 
     21   Orange.classification.logreg 
    2222   orange.classification.rules 
    2323 