Timestamp:
02/25/12 22:37:01 (2 years ago)
Author:
janezd <janez.demsar@…>
Branch:
default
Message:

Polished documentation about SVM

File:
1 edited

  • Orange/classification/svm/__init__.py

    r10300 r10369  
    1 """ 
    2 .. index:: support vector machines (SVM) 
    3 .. index: 
    4    single: classification; support vector machines (SVM) 
    5     
    6 ********************************* 
    7 Support Vector Machines (``svm``) 
    8 ********************************* 
    9  
    10 This is a module for `Support Vector Machine`_ (SVM) classification. It 
    11 exposes the underlying `LibSVM`_ and `LIBLINEAR`_ library in a standard 
    12 Orange Learner/Classifier interface. 
    13  
    14 Choosing the right learner 
    15 ========================== 
    16  
    17 Choose an SVM learner suitable for the problem. 
    18 :obj:`SVMLearner` is a general SVM learner. :obj:`SVMLearnerEasy` will 
    19 help with data normalization and parameter tuning. Use the fast 
    20 :obj:`LinearSVMLearner` on data sets with a large number of features.  
    21  
    22 .. note:: SVM can perform poorly on some data sets. Choose the parameters  
    23           carefully. In cases of low classification accuracy, try scaling the  
    24           data and experiment with different parameters. \ 
    25           :obj:`SVMLearnerEasy` class does this automatically (it is similar 
    26           to the `svm-easy.py` script in the LibSVM distribution). 
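The scaling the note recommends can be illustrated with a plain-Python sketch. This is not Orange's or LibSVM's implementation, only the idea behind the preprocessing that `svm-easy.py`-style tools perform: linearly map each feature to the range [-1, 1] before training. The function name `scale_columns` is hypothetical.

```python
# Hypothetical sketch of svm-easy-style feature scaling (not Orange's code):
# linearly map every column of a list-of-lists data set to [-1, 1].
def scale_columns(rows):
    """Scale each column of a list-of-lists to the range [-1, 1]."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        if hi == lo:  # constant feature: map every value to 0
            scaled_cols.append([0.0] * len(col))
        else:
            scaled_cols.append([2.0 * (v - lo) / (hi - lo) - 1.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]

rows = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
print(scale_columns(rows))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Scaling keeps features with large numeric ranges from dominating the kernel computation, which is one common cause of the poor accuracy the note mentions.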
    27  
    28            
    29 SVM learners (from `LibSVM`_) 
    30 ============================= 
    31  
    32 The most basic :class:`SVMLearner` implements the standard `LibSVM`_ learner. 
    33 It supports four built-in kernel types (Linear, Polynomial, RBF and Sigmoid). 
    34 Additionally, kernel functions defined in Python can be used instead.  
    35  
    36 .. note:: For learning from an ordinary :class:`Orange.data.Table` use the \ 
    37     :class:`SVMLearner`. For learning from a sparse dataset (i.e. 
    38     data in `basket` format) use the :class:`SVMLearnerSparse` class. 
    39  
    40 .. autoclass:: Orange.classification.svm.SVMLearner 
    41     :members: 
    42  
    43 .. autoclass:: Orange.classification.svm.SVMLearnerSparse 
    44     :members: 
    45     :show-inheritance: 
    46      
    47 .. autoclass:: Orange.classification.svm.SVMLearnerEasy 
    48     :members: 
    49     :show-inheritance: 
    50  
    51 The next example shows how to use SVM learners and demonstrates that :obj:`SVMLearnerEasy`  
    52 with automatic data preprocessing and parameter tuning  
    53 outperforms :obj:`SVMLearner` with the default :obj:`~SVMLearner.nu` and :obj:`~SVMLearner.gamma`:   
    54      
    55 .. literalinclude:: code/svm-easy.py 
    56  
    57  
    58     
    59 Linear SVM learners (from `LIBLINEAR`_) 
    60 ======================================= 
    61  
    62 The :class:`LinearSVMLearner` learner is more suitable for large-scale 
    63 problems as it is significantly faster than :class:`SVMLearner` and its 
    64 subclasses. A downside is that it supports only a linear kernel (as the name 
    65 suggests) and does not provide probability estimates for its 
    66 classifications. In addition, a multi-class SVM learner, 
    67 :class:`MultiClassSVMLearner`, is provided. 
    68     
    69 .. autoclass:: Orange.classification.svm.LinearSVMLearner 
    70    :members: 
    71     
    72 .. autoclass:: Orange.classification.svm.MultiClassSVMLearner 
    73    :members: 
    74     
    75     
    76 SVM-based feature selection and scoring 
    77 ======================================= 
    78  
    79 .. autoclass:: Orange.classification.svm.RFE 
    80  
    81 .. autoclass:: Orange.classification.svm.ScoreSVMWeights 
    82     :show-inheritance: 
    83   
    84   
    85 Utility functions 
    86 ================= 
    87  
    88 .. automethod:: Orange.classification.svm.max_nu 
    89  
    90 .. automethod:: Orange.classification.svm.get_linear_svm_weights 
    91  
    92 .. automethod:: Orange.classification.svm.table_to_svm_format 
    93  
    94 The following example shows how to get linear SVM weights: 
    95      
    96 .. literalinclude:: code/svm-linear-weights.py     
    97  
    98  
    99 .. _kernel-wrapper: 
    100  
    101 Kernel wrappers 
    102 =============== 
    103  
    104 Kernel wrappers are helper classes used to build custom kernels for use 
    105 with :class:`SVMLearner` and its subclasses. All wrapper constructors take 
    106 one or more Python functions (the `wrapped` attribute) to wrap. Each  
    107 function must implement a positive definite kernel: it takes two arguments of  
    108 type :class:`Orange.data.Instance` and returns a float. 
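The wrapping idea can be sketched in plain Python. This is a conceptual illustration of what an RBF-style wrapper (such as :class:`RBFKernelWrapper`) does, not Orange's actual implementation; the helper names `rbf_wrap` and `euclidean` are hypothetical.

```python
# Hypothetical sketch of RBF kernel wrapping (not Orange's code): wrap a
# distance function d(x, y) into the kernel exp(-gamma * d(x, y)**2).
import math

def rbf_wrap(distance, gamma=1.0):
    def kernel(x, y):
        return math.exp(-gamma * distance(x, y) ** 2)
    return kernel

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

k = rbf_wrap(euclidean, gamma=0.5)
print(k([0.0, 0.0], [0.0, 0.0]))            # 1.0 (distance is 0)
print(round(k([1.0, 0.0], [0.0, 0.0]), 6))  # exp(-0.5) ~ 0.606531
```

Wrappers compose: an addition or multiplication wrapper would combine two such kernel functions pointwise, which is why each wrapped function must itself be a valid kernel.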
    109  
    110 .. autoclass:: Orange.classification.svm.kernels.KernelWrapper 
    111    :members: 
    112  
    113 .. autoclass:: Orange.classification.svm.kernels.DualKernelWrapper 
    114    :members: 
    115  
    116 .. autoclass:: Orange.classification.svm.kernels.RBFKernelWrapper 
    117    :members: 
    118  
    119 .. autoclass:: Orange.classification.svm.kernels.PolyKernelWrapper 
    120    :members: 
    121  
    122 .. autoclass:: Orange.classification.svm.kernels.AdditionKernelWrapper 
    123    :members: 
    124  
    125 .. autoclass:: Orange.classification.svm.kernels.MultiplicationKernelWrapper 
    126    :members: 
    127  
    128 .. autoclass:: Orange.classification.svm.kernels.CompositeKernelWrapper 
    129    :members: 
    130  
    131 .. autoclass:: Orange.classification.svm.kernels.SparseLinKernel 
    132    :members: 
    133  
    134 Example: 
    135  
    136 .. literalinclude:: code/svm-custom-kernel.py 
    137  
    138 .. _`Support Vector Machine`: http://en.wikipedia.org/wiki/Support_vector_machine 
    139 .. _`LibSVM`: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
    140 .. _`LIBLINEAR`: http://www.csie.ntu.edu.tw/~cjlin/liblinear/ 
    141  
    142 """ 
    143  
    144 1 import math 
    145 2 
     
    171 28 
    172 29 def max_nu(data): 
    173     """Return the maximum nu parameter for Nu_SVC support vector learning  
    174     for the given data table.  
     30    """ 
     31    Return the maximum nu parameter for the given data table for 
     32    Nu_SVC learning. 
    175 33     
    176 34    :param data: Data with discrete class variable 
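The constraint behind this function can be sketched directly. In the LibSVM formulation of Nu-SVC, nu is feasible only if, for every pair of classes, nu <= 2*min(n_i, n_j)/(n_i + n_j), where n_i is the number of instances of class i. The sketch below (function name `max_nu_sketch` is hypothetical, not Orange's code) computes that bound from class labels alone:

```python
# Hypothetical sketch of the nu bound for Nu_SVC (not Orange's max_nu):
# the largest feasible nu is the minimum of 2*min(n_i, n_j)/(n_i + n_j)
# over all pairs of classes, per the LibSVM Nu-SVC formulation.
from collections import Counter
from itertools import combinations

def max_nu_sketch(class_labels):
    counts = Counter(class_labels)
    return min(2.0 * min(ni, nj) / (ni + nj)
               for ni, nj in combinations(counts.values(), 2))

labels = ["a"] * 30 + ["b"] * 10
print(max_nu_sketch(labels))  # 0.5, i.e. 2*10/(30+10)
```

Intuitively, nu bounds the fraction of margin errors, so a heavily imbalanced class pair caps how large nu may be.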
     
    191 49 class SVMLearner(_SVMLearner): 
    192 50     """ 
    193     :param svm_type: defines the SVM type (can be C_SVC, Nu_SVC  
    194         (default), OneClass, Epsilon_SVR, Nu_SVR) 
     51    :param svm_type: the SVM type 
    195 52    :type svm_type: SVMLearner.SVMType 
    196     :param kernel_type: defines the kernel type for learning 
    197         (can be kernels.RBF (default), kernels.Linear, kernels.Polynomial,  
    198         kernels.Sigmoid, kernels.Custom) 
     53    :param kernel_type: the kernel type 
    199 54    :type kernel_type: SVMLearner.Kernel 
    200     :param degree: kernel parameter (for Polynomial) (default 3) 
     55    :param degree: kernel parameter (only for ``Polynomial``) 
    201 56    :type degree: int 
    202     :param gamma: kernel parameter (Polynomial/RBF/Sigmoid) 
    203         (default 1.0/num_of_features) 
     57    :param gamma: kernel parameter; if 0, it is set to 1.0/#features (for ``Polynomial``, ``RBF`` and ``Sigmoid``) 
    204 58    :type gamma: float 
    205     :param coef0: kernel parameter (Polynomial/Sigmoid) (default 0) 
     59    :param coef0: kernel parameter (for ``Polynomial`` and ``Sigmoid``) 
    206 60    :type coef0: int 
    207     :param kernel_func: function that will be called if `kernel_type` is 
    208         `kernels.Custom`. It must accept two :obj:`Orange.data.Instance` 
    209         arguments and return a float (see :ref:`kernel-wrapper` for some 
    210         examples). 
    211     :type kernel_func: callable function 
    212     :param C: C parameter for C_SVC, Epsilon_SVR and Nu_SVR 
     61    :param kernel_func: kernel function if ``kernel_type`` is 
     62        ``kernels.Custom`` 
     63    :type kernel_func: callable object 
     64    :param C: C parameter (for ``C_SVC``, ``Epsilon_SVR`` and ``Nu_SVR``) 
    213 65    :type C: float 
    214     :param nu: Nu parameter for Nu_SVC, Nu_SVR and OneClass (default 0.5) 
     66    :param nu: Nu parameter (for ``Nu_SVC``, ``Nu_SVR`` and ``OneClass``) 
    215 67    :type nu: float 
    216     :param p: epsilon in loss-function for Epsilon_SVR 
     68    :param p: epsilon parameter (for ``Epsilon_SVR``) 
    217 69    :type p: float 
    218     :param cache_size: cache memory size in MB (default 200) 
     70    :param cache_size: cache memory size in MB 
    219 71    :type cache_size: int 
    220     :param eps: tolerance of termination criterion (default 0.001) 
     72    :param eps: tolerance of termination criterion 
    221 73    :type eps: float 
    222 74    :param probability: build a probability model 
    223         (default False) 
    224 75    :type probability: bool 
    225 76    :param shrinking: use shrinking heuristics  
    226         (default True) 
    227 77    :type shrinking: bool 
    228 78    :param weight: a list of class weights 
    229 79    :type weight: list 
    230      
     80 
    231 81    Example: 
    232 82     

    234 84        >>> from Orange.classification import svm 
    235 85        >>> from Orange.evaluation import testing, scoring 
    236         >>> table = Orange.data.Table("vehicle.tab") 
     86        >>> data = Orange.data.Table("vehicle.tab") 
    237 87        >>> learner = svm.SVMLearner() 
    238         >>> results = testing.cross_validation([learner], table, folds=5) 
     88        >>> results = testing.cross_validation([learner], data, folds=5) 
    239 89        >>> print scoring.CA(results)[0] 
    240 90        0.789613644274 
     
    283 133        :type table: Orange.data.Table 
    284 134         
    285         :param weight: unused - use the constructors ``weight`` 
    286             parameter to set class weights 
    287          
     135        :param weight: ignored (required due to base class signature) 
    288 136        """ 
    289 137 
     
    338 186    def tune_parameters(self, data, parameters=None, folds=5, verbose=0, 
    339 187                       progress_callback=None): 
    340         """Tune the ``parameters`` on given ``data`` using  
    341         cross validation. 
     188        """Tune the ``parameters`` on the given ``data`` using  
     189        internal cross validation. 
    342 190         
    343 191        :param data: data for parameter tuning 
    344 192        :type data: Orange.data.Table  
    345         :param parameters: defaults to ["nu", "C", "gamma"] 
     193        :param parameters: names of the parameters to tune 
     194            (default: ["nu", "C", "gamma"]) 
    346 195        :type parameters: list of strings 
    347         :param folds: number of folds used for cross validation 
     196        :param folds: number of folds for internal cross validation 
    348 197        :type folds: int 
    349         :param verbose: default False 
     198        :param verbose: set verbose output 
    350 199        :type verbose: bool 
    351         :param progress_callback: report progress 
     200        :param progress_callback: callback function for reporting progress 
    352 201        :type progress_callback: callback function 
    353 202             
    354         An example that tunes the `gamma` parameter on `data` using 3-fold cross  
    355         validation. :: 
     203        Here is an example of tuning the `gamma` parameter using 
     204        3-fold cross validation. :: 
    356 205 
    357 206            svm = Orange.classification.svm.SVMLearner() 
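The mechanics of this kind of tuning can be sketched without Orange. The helper below (`tune_parameter` is a hypothetical name, not part of the library) tries each candidate value, evaluates it with a caller-supplied cross-validation score, and keeps the best:

```python
# Hypothetical sketch of parameter-tuning mechanics (not Orange's code):
# evaluate each candidate value with a cross-validation score function
# and return the value with the highest score.
def tune_parameter(candidates, cv_score):
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = cv_score(value)  # e.g. mean classification accuracy over folds
        if score > best_score:
            best_value, best_score = value, score
    return best_value

# Toy score function whose peak is at gamma == 1.0:
best = tune_parameter([0.01, 0.1, 1.0, 10.0], lambda g: -abs(g - 1.0))
print(best)  # 1.0
```

Tuning several parameters at once, as ``tune_parameters`` does, amounts to running this loop over the grid of all candidate combinations.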
     
    445 294 class SVMLearnerSparse(SVMLearner): 
    446 295 
    447     """A :class:`SVMLearner` that learns from 
    448     meta attributes. 
    449      
    450     Meta attributes do not need to be registered with the data set domain, or  
    451     present in all the instances. Use this for large  
    452     sparse data sets. 
    453      
     296    """ 
     297    A :class:`SVMLearner` that learns from data stored in meta 
     298    attributes. Meta attributes do not need to be registered with the 
     299    data set domain, or present in all data instances. 
    454 300    """ 
    455 301 
     
    472 318 class SVMLearnerEasy(SVMLearner): 
    473 319 
    474     """Apart from the functionality of :obj:`SVMLearner` it automatically scales the  
    475     data and perform parameter optimization with the  
    476     :func:`SVMLearner.tune_parameters`. It is similar to the easy.py script in  
    477     the LibSVM package. 
     320    """A class derived from :obj:`SVMLearner` that automatically 
     321    scales the data and performs parameter optimization using 
     322    :func:`SVMLearner.tune_parameters`. The procedure is similar to 
     323    that implemented in the easy.py script from the LibSVM package. 
    478 324     
    479 325    """ 
     
    545 391    def __init__(self, solver_type=L2R_L2LOSS_DUAL, C=1.0, eps=0.01, **kwargs): 
    546 392        """ 
    547         :param solver_type: Can be one of class constants: 
    548          
    549             - L2R_L2LOSS_DUAL 
    550             - L2R_L2LOSS  
    551             - L2R_L1LOSS_DUAL 
    552             - L2R_L1LOSS 
    553             - L1R_L2LOSS 
     393        :param solver_type: One of the following class constants: ``L2R_L2LOSS_DUAL``, ``L2R_L2LOSS``, ``L2R_L1LOSS_DUAL``, ``L2R_L1LOSS`` or ``L1R_L2LOSS`` 
    554 394         
    555 395        :param C: Regularization parameter (default 1.0) 
     
    611 451    """Extract attribute weights from the linear SVM classifier. 
    612 452     
    613     For multi class classification the weights are square-summed over all 
    614     binary one vs. one classifiers unles obj:`sum` is False, in which case 
    615     the return value is a list of weights for each individual binary 
    616     classifier (in the order of [class1 vs class2, class1 vs class3 ... class2 
    617     vs class3 ...]). 
     453    For multi-class classification, the result depends on the argument 
     454    :obj:`sum`. If ``True`` (default) the function computes the 
     455    squared sum of the weights over all binary one vs. one 
     456    classifiers. If :obj:`sum` is ``False`` it returns a list of 
     457    weights for each individual binary classifier (in the order of 
     458    [class1 vs class2, class1 vs class3 ... class2 vs class3 ...]). 
    618 459         
    619 460    """ 
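The ``sum=True`` combination described in this docstring can be sketched in plain Python. This mirrors the documented behaviour only; the helper name `square_sum_weights` and the dict-based weight representation are assumptions for illustration, not Orange's internal representation:

```python
# Hypothetical sketch of the sum=True weight combination (not Orange's code):
# square-sum each feature's weight over all one-vs-one binary classifiers.
def square_sum_weights(binary_weights):
    """binary_weights: list of per-classifier weight dicts {feature: weight}."""
    combined = {}
    for weights in binary_weights:
        for feature, w in weights.items():
            combined[feature] = combined.get(feature, 0.0) + w ** 2
    return combined

pairs = [{"f1": 1.0, "f2": -2.0},   # class1 vs class2
         {"f1": 3.0, "f2": 0.5}]    # class1 vs class3
print(square_sum_weights(pairs))  # {'f1': 10.0, 'f2': 4.25}
```

Squaring before summing makes the sign of each binary classifier's weight irrelevant, so the combined value reflects only how strongly each feature contributes overall.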
     
    687 528 
    688 529 class ScoreSVMWeights(Orange.feature.scoring.Score): 
    689     """Score feature by training a linear SVM classifier, using a squared sum of  
    690     weights (of each binary classifier) as the returned score. 
     530    """ 
     531    Score a feature by the squared sum of weights using a linear SVM 
     532    classifier. 
    691 533         
    692 534    Example: 
     
    759 601 class RFE(object): 
    760 602 
    761     """Recursive feature elimination using linear SVM derived attribute  
    762     weights. 
     603    """Iterative feature elimination based on weights computed by 
     604    linear SVM. 
    763 605     
    764 606    Example:: 

    770 612            normalization=False) # normalization=False will not change the domain 
    771 613        rfe = Orange.classification.svm.RFE(l) 
    772         data_with_removed_features = rfe(table, 5) 
     614        data_subset_of_features = rfe(table, 5) 
    773 615         
    774 616    """ 
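The elimination loop itself is simple and can be sketched without Orange. The sketch below (the name `rfe_sketch` and the score-function interface are hypothetical) repeatedly scores the remaining features and drops the weakest one until the requested number is left, which is the essence of SVM-weight-based RFE:

```python
# Hypothetical sketch of recursive feature elimination (not Orange's RFE):
# repeatedly score the remaining features (e.g. by squared SVM weights)
# and remove the lowest-scoring one until `keep` features remain.
def rfe_sketch(features, score_features, keep):
    features = list(features)
    while len(features) > keep:
        scores = score_features(features)  # retrain/score on current subset
        weakest = min(features, key=lambda f: scores[f])
        features.remove(weakest)
    return features

fixed_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
print(rfe_sketch(["a", "b", "c", "d"], lambda fs: fixed_scores, keep=2))
# ['a', 'd']
```

In the real class the score function is not fixed: the linear SVM is retrained on the surviving features at each step, which is what makes the procedure recursive rather than a one-shot ranking.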