Files:
5 edited
  • Orange/classification/logreg.py

    r10246 r10346  
    88 
    99def dump(classifier): 
    10     """ Return a formatted string of all major features in logistic regression 
    11     classifier. 
     10    """ Return a formatted string describing the logistic regression model 
    1211 
    1312    :param classifier: logistic regression classifier. 
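
    For illustration, a hedged sketch of how ``dump`` is used (the bundled
    ``titanic`` data set is an assumption; any table with a discrete class
    should work)::

        import Orange

        data = Orange.data.Table("titanic")
        classifier = Orange.classification.logreg.LogRegLearner(data)
        # print beta coefficients, their standard errors and significances
        print Orange.classification.logreg.dump(classifier)
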
     
    5352    """ Logistic regression learner. 
    5453 
    55     If data instances are provided to 
    56     the constructor, the learning algorithm is called and the resulting 
    57     classifier is returned instead of the learner. 
    58  
    59     :param data: data table with either discrete or continuous features 
     54    Returns either a learning algorithm (instance of 
     55    :obj:`LogRegLearner`) or, if data is provided, a fitted model 
     56    (instance of :obj:`LogRegClassifier`). 
     57 
     58    :param data: data table; it may contain discrete and continuous features 
    6059    :type data: Orange.data.Table 
    6160    :param weight_id: the ID of the weight meta attribute 
    6261    :type weight_id: int 
    63     :param remove_singular: set to 1 if you want automatic removal of 
    64         disturbing features, such as constants and singularities 
     62    :param remove_singular: automated removal of constant 
     63        features and singularities (default: `False`) 
    6564    :type remove_singular: bool 
    66     :param fitter: the fitting algorithm (by default the Newton-Raphson 
    67         fitting algorithm is used) 
    68     :param stepwise_lr: set to 1 if you wish to use stepwise logistic 
    69         regression 
     65    :param fitter: the fitting algorithm (default: :obj:`LogRegFitter_Cholesky`) 
     66    :param stepwise_lr: enables stepwise feature selection (default: `False`) 
    7067    :type stepwise_lr: bool 
    71     :param add_crit: parameter for stepwise feature selection 
     68    :param add_crit: threshold for adding a feature in stepwise 
     69        selection (default: 0.2) 
    7270    :type add_crit: float 
    73     :param delete_crit: parameter for stepwise feature selection 
     71    :param delete_crit: threshold for removing a feature in stepwise 
     72        selection (default: 0.3) 
    7473    :type delete_crit: float 
    75     :param num_features: parameter for stepwise feature selection 
     74    :param num_features: maximum number of features in stepwise selection 
     75        (default: -1, no limit) 
    7676    :type num_features: int 
    7777    :rtype: :obj:`LogRegLearner` or :obj:`LogRegClassifier` 
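
    A sketch of the two construction modes described above (``titanic`` is
    an assumed example data set)::

        import Orange
        from Orange.classification.logreg import LogRegLearner

        data = Orange.data.Table("titanic")

        # no data: the constructor returns a learner
        learner = LogRegLearner(remove_singular=True)
        classifier = learner(data)

        # with data: the model is fitted immediately and a classifier is returned
        classifier = LogRegLearner(data, remove_singular=True)
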
     
    9696    @deprecated_keywords({"examples": "data"}) 
    9797    def __call__(self, data, weight=0): 
    98         """Learn from the given table of data instances. 
    99  
    100         :param data: Data instances to learn from. 
     98        """Fit a model to the given data. 
     99 
     100        :param data: Data instances. 
    101101        :type data: :class:`~Orange.data.Table` 
    102         :param weight: Id of meta attribute with weights of instances 
     102        :param weight: Id of meta attribute with instance weights 
    103103        :type weight: int 
    104104        :rtype: :class:`~Orange.classification.logreg.LogRegClassifier` 
     
    685685class StepWiseFSS(Orange.classification.Learner): 
    686686  """ 
    687   Algorithm described in Hosmer and Lemeshow, 
    688   Applied Logistic Regression, 2000. 
    689  
    690   Perform stepwise logistic regression and return a list of the 
    691   most "informative" features. Each step of the algorithm is composed 
    692   of two parts. The first is backward elimination, where each already 
    693   chosen feature is tested for a significant contribution to the overall 
    694   model. If the worst among all tested features has higher significance 
    695   than is specified in :obj:`delete_crit`, the feature is removed from 
    696   the model. The second step is forward selection, which is similar to 
    697   backward elimination. It loops through all the features that are not 
    698   in the model and tests whether they contribute to the common model 
    699   with significance lower that :obj:`add_crit`. The algorithm stops when 
    700   no feature in the model is to be removed and no feature not in the 
    701   model is to be added. By setting :obj:`num_features` larger than -1, 
    702   the algorithm will stop its execution when the number of features in model 
    703   exceeds that number. 
    704  
    705   Significances are assesed via the likelihood ration chi-square 
    706   test. Normal F test is not appropriate, because errors are assumed to 
    707   follow a binomial distribution. 
    708  
    709   If :obj:`table` is specified, stepwise logistic regression implemented 
    710   in :obj:`StepWiseFSS` is performed and a list of chosen features 
    711   is returned. If :obj:`table` is not specified, an instance of 
    712   :obj:`StepWiseFSS` with all parameters set is returned and can be called 
    713   with data later. 
    714  
    715   :param table: data set. 
      687  A learning algorithm for logistic regression that implements 
      688  stepwise feature subset selection as described in Applied Logistic 
     689  Regression (Hosmer and Lemeshow, 2000). 
     690 
     691  Each step of the algorithm is composed of two parts. The first is 
     692  backward elimination in which the least significant variable in the 
     693  model is removed if its p-value is above the prescribed threshold 
     694  :obj:`delete_crit`. The second step is forward selection in which 
     695  all variables are tested for addition to the model, and the one with 
     696  the most significant contribution is added if the corresponding 
      697  p-value is smaller than the prescribed :obj:`add_crit`. The 
     698  algorithm stops when no more variables can be added or removed. 
     699 
      700  The model can be additionally constrained by setting 
     701  :obj:`num_features` to a non-negative value. The algorithm will then 
     702  stop when the number of variables exceeds the given limit. 
     703 
      704  Significances are assessed by the likelihood ratio chi-square 
     705  test. Normal F test is not appropriate since the errors are assumed 
     706  to follow a binomial distribution. 
     707 
      708  The class constructor returns an instance of the learning algorithm 
     709  if given training data, a list of selected variables. 
     710 
     711  :param table: training data. 
    716712  :type table: Orange.data.Table 
    717713 
    718   :param add_crit: "Alpha" level to judge if variable has enough importance to 
    719        be added in the new set. (e.g. if add_crit is 0.2, 
    720        then features is added if its P is lower than 0.2). 
     714  :param add_crit: threshold for adding a variable (default: 0.2) 
    721715  :type add_crit: float 
    722716 
    723   :param delete_crit: Similar to add_crit, just that it is used at backward 
    724       elimination. It should be higher than add_crit! 
     717  :param delete_crit: threshold for removing a variable 
     718      (default: 0.3); should be higher than :obj:`add_crit`. 
    725719  :type delete_crit: float 
    726720 
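
    A usage sketch matching the constructor behaviour documented above
    (``titanic`` is an assumed example data set)::

        import Orange
        from Orange.classification.logreg import StepWiseFSS

        data = Orange.data.Table("titanic")

        # with data: the selected variables are returned directly
        selected = StepWiseFSS(data, add_crit=0.2, delete_crit=0.3)

        # without data: an instance of the algorithm is returned
        fss = StepWiseFSS(add_crit=0.2, delete_crit=0.3)
        selected = fss(data)
        print [var.name for var in selected]
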
  • Orange/classification/lookup.py

    r10090 r10347  
    1 """ 
    2  
    3 .. index:: classification; lookup 
    4  
    5 ******************************* 
    6 Lookup classifiers (``lookup``) 
    7 ******************************* 
    8  
    9 Lookup classifiers predict classes by looking into stored lists of 
    10 cases. There are two kinds of such classifiers in Orange. The simpler 
    11 and faster :obj:`ClassifierByLookupTable` uses up to three discrete 
    12 features and has a stored mapping from values of those features to the 
    13 class value. The more complex classifiers store an 
    14 :obj:`Orange.data.Table` and predict the class by matching the instance 
    15 to instances in the table. 
    16  
    17 .. index:: 
    18    single: feature construction; lookup classifiers 
    19  
    20 A natural habitat for these classifiers is feature construction: 
    21 they usually reside in :obj:`~Orange.feature.Descriptor.get_value_from` 
    22 fields of constructed 
    23 features to facilitate their automatic computation. For instance, 
    24 the following script shows how to translate the ``monks-1.tab`` data set 
    25 features into a more useful subset that will only include the features 
    26 ``a``, ``b``, ``e``, and features that will tell whether ``a`` and ``b`` 
    27 are equal and whether ``e`` is 1 (part of 
    28 :download:`lookup-lookup.py <code/lookup-lookup.py>`): 
    29  
    30 .. 
    31     .. literalinclude:: code/lookup-lookup.py 
    32         :lines: 7-21 
    33  
    34 .. testcode:: 
    35  
    36     import Orange 
    37  
    38     monks = Orange.data.Table("monks-1") 
    39  
    40     a, b, e = monks.domain["a"], monks.domain["b"], monks.domain["e"] 
    41  
    42     ab = Orange.feature.Discrete("a==b", values = ["no", "yes"]) 
    43     ab.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(ab, a, b, 
    44                         ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
    45  
    46     e1 = Orange.feature.Discrete("e==1", values = ["no", "yes"]) 
    47     e1.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(e1, e, 
    48                         ["yes", "no", "no", "no", "?"]) 
    49  
    50     monks2 = monks.select([a, b, ab, e, e1, monks.domain.class_var]) 
    51      
    52 We can check the correctness of the script by printing out several 
    53 random examples from ``data2``. 
    54  
    55     >>> for i in range(5): 
    56     ...     print monks2.randomexample() 
    57     ['3', '2', 'no', '2', 'no', '0'] 
    58     ['2', '2', 'yes', '2', 'no', '1'] 
    59     ['1', '2', 'no', '2', 'no', '0'] 
    60     ['2', '3', 'no', '1', 'yes', '1'] 
    61     ['1', '3', 'no', '1', 'yes', '1'] 
    62  
    63 The first :obj:`ClassifierByLookupTable` takes values of features ``a`` 
    64 and ``b`` and computes the value of ``ab`` according to the rule given in the 
    65 given table. The first three values correspond to ``a=1`` and ``b=1,2,3``; 
    66 for the first combination, value of ``ab`` should be "yes", for the other 
    67 two ``a`` and ``b`` are different. The next triplet corresponds to ``a=2``; 
    68 here, the middle value is "yes"... 
    69  
    70 The second lookup is simpler: since it involves only a single feature, 
    71 the list is a simple one-to-one mapping from the four-valued ``e`` to the 
    72 two-valued ``e1``. The last value in the list is returned when ``e`` is unknown 
    73 and tells that ``e1`` should be unknown then as well. 
    74  
    75 Note that :obj:`ClassifierByLookupTable` is not needed for this. 
    76 The new feature ``e1`` could be computed with a callback to Python, 
    77 for instance:: 
    78  
    79     e2.get_value_from = lambda ex, rw: orange.Value(e2, ex["e"] == "1") 
    80  
    81  
    82 Classifiers by lookup table 
    83 =========================== 
    84  
    85 .. index:: 
    86    single: classification; lookup table 
    87  
    88 Although the above example used :obj:`ClassifierByLookupTable` as if it 
    89 was a concrete class, :obj:`ClassifierByLookupTable` is actually 
    90 abstract. Calling its constructor is a typical Orange trick: it does not 
    91 return an instance of :obj:`ClassifierByLookupTable`, but either 
    92 :obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or 
    93 :obj:`ClassifierByLookupTable3`. As their names tell, the first 
    94 classifies using a single feature (so that's what we had for ``e1``), 
    95 the second uses a pair of features (and has been constructed for ``ab`` 
    96 above), and the third uses three features. Class predictions for each 
    97 combination of feature values are stored in a (one dimensional) table. 
    98 To classify an instance, the classifier computes an index of the element 
    99 of the table that corresponds to the combination of feature values. 
    100  
    101 These classifiers are built to be fast, not safe. For instance, if the number 
    102 of values for one of the features is changed, Orange will most probably crash. 
    103 To alleviate this, many of these classes' features are read-only and can only 
    104 be set when the object is constructed. 
    105  
    106  
    107 .. py:class:: ClassifierByLookupTable(class_var, variable1[, variable2[, variable3]] [, lookup_table[, distributions]]) 
    108      
    109     A general constructor that, based on the number of feature descriptors, 
    110     constructs one of the three classes discussed. If :obj:`lookup_table` 
    111     and :obj:`distributions` are omitted, the constructor also initializes 
    112     them to two lists of the right sizes, but their elements are don't knows 
    113     and empty distributions. If they are given, they must be of correct size. 
    114      
    115     .. attribute:: variable1[, variable2[, variable3]](read only) 
    116          
    117         The feature(s) that the classifier uses for classification. 
    118         ClassifierByLookupTable1 only has variable1, 
    119         ClassifierByLookupTable2 also has variable2 and 
    120         ClassifierByLookupTable3 has all three. 
    121  
    122     .. attribute:: variables (read only) 
    123          
    124         The above variables, returned as a tuple. 
    125  
    126     .. attribute:: no_of_values1[, no_of_values2[, no_of_values3]] (read only) 
    127          
    128         The number of values for variable1, variable2 and variable3. 
    129         This is stored here to make the classifier faster. Those features 
    130         are defined only for ClassifierByLookupTable2 (the first two) and 
    131         ClassifierByLookupTable3 (all three). 
    132  
    133     .. attribute:: lookup_table (read only) 
    134          
    135         A list of values, one for each possible 
    136         combination of features. For ClassifierByLookupTable1, there is an 
    137         additional element that is returned when the feature's value is 
    138         unknown. Values are ordered by values of features, with variable1 
    139         being the most important. In case of two three valued features, the 
    140         list order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, 
    141         where the first digit corresponds to variable1 and the second to 
    142         variable2. 
    143          
    144         The attribute is read-only - a new list cannot be assigned to it. 
    145         Its elements, however, can be changed. Don't change its size.  
    146  
    147     .. attribute:: distributions (read only) 
    148          
    149         Similar to :obj:`lookup_table`, but it stores a distribution for 
    150         each combination of values.  
    151  
    152     .. attribute:: data_description 
    153          
    154         An object of type :obj:`EFMDataDescription`, defined only for 
    155         ClassifierByLookupTable2 and ClassifierByLookupTable3. They use it 
    156         to make predictions when one or more feature values are unknown. 
    157         ClassifierByLookupTable1 doesn't need it since this case is covered by 
    158         an additional element in :obj:`lookup_table` and :obj:`distributions`, 
    159         as told above. 
    160          
    161     .. method:: get_index(example) 
    162      
    163         Returns an index of ``example`` in :obj:`lookup_table` and 
    164         :obj:`distributions`. The formula depends upon the type of 
    165         the classifier. If value\ *i* is int(example[variable\ *i*]), 
    166         then the corresponding formulae are 
    167  
    168         ClassifierByLookupTable1: 
    169             index = value1, or len(lookup_table) - 1 if value is unknown 
    170         ClassifierByLookupTable2: 
    171             index = value1 * no_of_values1 + value2, or -1 if any value is unknown 
    172         ClassifierByLookupTable3: 
    173             index = (value1 * no_of_values1 + value2) * no_of_values2 + value3, or -1 if any value is unknown 
    174  
    175         Let's see some indices for randomly chosen examples from the original table. 
    176          
    177         part of :download:`lookup-lookup.py <code/lookup-lookup.py>`: 
    178  
    179         .. literalinclude:: code/lookup-lookup.py 
    180             :lines: 26-29 
    181          
    182         Output:: 
    183          
    184             ['3', '2', '1', '2', '2', '1', '0']: ab 7, e1 1  
    185             ['2', '2', '1', '2', '2', '1', '1']: ab 4, e1 1  
    186             ['1', '2', '1', '2', '2', '2', '0']: ab 1, e1 1  
    187             ['2', '3', '2', '3', '1', '1', '1']: ab 5, e1 0  
    188             ['1', '3', '2', '2', '1', '1', '1']: ab 2, e1 0  
    189  
    190  
    191  
    192 .. py:class:: ClassifierByLookupTable1(class_var, variable1 [, lookup_table, distributions]) 
    193      
    194     Uses a single feature for lookup. See 
    195     :obj:`ClassifierByLookupTable` for more details. 
    196  
    197 .. py:class:: ClassifierByLookupTable2(class_var, variable1, variable2, [, lookup_table[, distributions]]) 
    198      
    199     Uses two features for lookup. See 
    200     :obj:`ClassifierByLookupTable` for more details. 
    201          
    202 .. py:class:: ClassifierByLookupTable3(class_var, variable1, variable2, variable3, [, lookup_table[, distributions]]) 
    203      
    204     Uses three features for lookup. See 
    205     :obj:`ClassifierByLookupTable` for more details. 
    206  
    207  
    208 Classifier by data table 
    209 ======================== 
    210  
    211 .. index:: 
    212    single: classification; data table 
    213  
    214 :obj:`ClassifierByDataTable` is used in similar contexts as 
    215 :obj:`ClassifierByLookupTable`. If you write, for instance, a 
    216 constructive induction algorithm, it is recommended that the values 
    217 of the new feature are computed either by one of classifiers by lookup 
    218 table or by ClassifierByDataTable, depending on the number of bound 
    219 features. 
    220  
    221 .. py:class:: ClassifierByDataTable 
    222  
    223     :obj:`ClassifierByDataTable` is the alternative to 
    224     :obj:`ClassifierByLookupTable`. It is to be used when the 
    225     classification is based on more than three features. Instead of having 
    226     a lookup table, it stores an :obj:`Orange.data.Table`, which is 
    227     optimized for a faster access. 
    228      
    229  
    230     .. attribute:: sorted_examples 
    231          
    232         A :obj:`Orange.data.Table` with sorted data instances for lookup. 
    233         Instances in the table can be merged; if there were multiple 
    234         instances with the same feature values (but possibly different 
    235         classes), they are merged into a single instance. Regardless of 
    236         merging, class values in this table are distributed: their svalue 
    237         contains a :obj:`~Orange.statistics.distribution.Distribution`. 
    238  
    239     .. attribute:: classifier_for_unknown 
    240          
    241         This classifier is used to classify instances which were not found 
    242         in the table. If classifier_for_unknown is not set, don't know's are 
    243         returned. 
    244  
    245     .. attribute:: variables (read only) 
    246          
    247         A tuple with features in the domain. This field is here so that 
    248         :obj:`ClassifierByDataTable` appears more similar to 
    249         :obj:`ClassifierByLookupTable`. If a constructive induction 
    250         algorithm returns the result in one of these classifiers, and you 
    251         would like to check which features are used, you can use variables 
    252         regardless of the class you actually got. 
    253  
    254     There are no specific methods for ClassifierByDataTable. 
    255     Since this is a classifier, it can be called. When the instance to be 
    256     classified includes unknown values, :obj:`classifier_for_unknown` will be 
    257     used if it is defined. 
    258  
    259  
    260  
    261 .. py:class:: LookupLearner 
    262      
    263     Although :obj:`ClassifierByDataTable` is not really a classifier in 
    264     the sense that you will use it to classify instances, but is rather a 
    265     function for computation of intermediate values, it has an associated 
    266     learner, :obj:`LookupLearner`. The learner's task is, basically, to 
    267     construct a table for :obj:`ClassifierByDataTable.sorted_examples`. 
    268     It sorts them, merges them 
    269     and regards instance weights in the process as well. 
    270      
    271     If data instances are provided to the constructor, the learning algorithm 
    272     is called and the resulting classifier is returned instead of the learner. 
    273  
    274 part of :download:`lookup-table.py <code/lookup-table.py>`: 
    275  
    276 .. 
    277     .. literalinclude:: code/lookup-table.py 
    278         :lines: 7-13 
    279  
    280 .. testcode:: 
    281          
    282     import Orange 
    283  
    284     table = Orange.data.Table("monks-1") 
    285     a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
    286  
    287     table_s = table.select([a, b, e, table.domain.class_var]) 
    288     abe = Orange.classification.lookup.LookupLearner(table_s) 
    289  
    290  
    291 In ``table_s``, we have prepared a table in which instances are described 
    292 only by ``a``, ``b``, ``e`` and the class. The learner constructs a 
    293 :obj:`ClassifierByDataTable` and stores instances from ``table_s`` into its 
    294 :obj:`~ClassifierByDataTable.sorted_examples`. Instances are merged so that 
    295 there are no duplicates. 
    296  
    297     >>> print len(table_s) 
    298     556 
    299     >>> print len(abe.sorted_examples) 
    300     36 
    301     >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
    302     ...     print i 
    303     ['1', '1', '1', '1'] 
    304     ['1', '1', '2', '1'] 
    305     ['1', '1', '3', '1'] 
    306     ['1', '1', '4', '1'] 
    307     ['1', '2', '1', '1'] 
    308     ['1', '2', '2', '0'] 
    309     ['1', '2', '3', '0'] 
    310     ['1', '2', '4', '0'] 
    311     ['1', '3', '1', '1'] 
    312     ['1', '3', '2', '0'] 
    313  
    314 Well, there's a bit more here than meets the eye: each instance's class 
    315 value also stores the distribution of classes for all instances that 
    316 were merged into it. In our case, the three features suffice to 
    317 unambiguously determine the classes and, since instances covered the 
    318 entire space, all distributions have 12 instances in one of the class 
    319 and none in the other. 
    320  
    321     >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
    322     ...     print i, i.get_class().svalue 
    323     ['1', '1', '1', '1'] <0.000, 12.000> 
    324     ['1', '1', '2', '1'] <0.000, 12.000> 
    325     ['1', '1', '3', '1'] <0.000, 12.000> 
    326     ['1', '1', '4', '1'] <0.000, 12.000> 
    327     ['1', '2', '1', '1'] <0.000, 12.000> 
    328     ['1', '2', '2', '0'] <12.000, 0.000> 
    329     ['1', '2', '3', '0'] <12.000, 0.000> 
    330     ['1', '2', '4', '0'] <12.000, 0.000> 
    331     ['1', '3', '1', '1'] <0.000, 12.000> 
    332     ['1', '3', '2', '0'] <12.000, 0.000> 
    333  
    334 :obj:`ClassifierByDataTable` will usually be used by 
    335 :obj:`~Orange.feature.Descriptor.get_value_from`. So, we 
    336 would probably continue this by constructing a new feature and put the 
    337 classifier into its :obj:`~Orange.feature.Descriptor.get_value_from`. 
    338  
    339     >>> y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
    340     >>> y2.get_value_from = abe 
    341  
    342 Although ``abe`` determines the value of ``y2``, ``abe.class_var`` is still ``y``. 
    343 Orange doesn't mind (the whole example is artificial - the entire data set 
    344 will seldom be packed in an :obj:`ClassifierByDataTable`), but this can still 
    345 be solved by 
    346  
    347     >>> abe.class_var = y2 
    348  
    349 The whole story can be greatly simplified. :obj:`LookupLearner` can also be 
    350 called differently than other learners. Besides instances, you can pass 
    351 the new class variable and the features that should be used for 
    352 classification. This saves us from constructing table_s and reassigning 
    353 the :obj:`~Orange.data.Domain.class_var`. It doesn't set the 
    354 :obj:`~Orange.feature.Descriptor.get_value_from`, though. 
    355  
    356 part of :download:`lookup-table.py <code/lookup-table.py>`:: 
    357  
    358     import Orange 
    359  
    360     table = Orange.data.Table("monks-1") 
    361     a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
    362  
    363     y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
    364     abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table) 
    365  
    366 Let us, for the end, show another use of :obj:`LookupLearner`. With the 
    367 alternative call arguments, it offers an easy way to observe feature 
    368 interactions. For this purpose, we shall omit ``e``, and construct a 
    369 :obj:`ClassifierByDataTable` from ``a`` and ``b`` only (part of 
    370 :download:`lookup-table.py <code/lookup-table.py>`): 
    371  
    372 .. literalinclude:: code/lookup-table.py 
    373     :lines: 32-35 
    374  
    375 The script's output show how the classes are distributed for different 
    376 values of ``a`` and ``b``:: 
    377  
    378     ['1', '1', '1'] <0.000, 48.000> 
    379     ['1', '2', '0'] <36.000, 12.000> 
    380     ['1', '3', '0'] <36.000, 12.000> 
    381     ['2', '1', '0'] <36.000, 12.000> 
    382     ['2', '2', '1'] <0.000, 48.000> 
    383     ['2', '3', '0'] <36.000, 12.000> 
    384     ['3', '1', '0'] <36.000, 12.000> 
    385     ['3', '2', '0'] <36.000, 12.000> 
    386     ['3', '3', '1'] <0.000, 48.000> 
    387  
    388 For instance, when ``a`` is '1' and ``b`` is '3', the majority class is '0', 
    389 and the class distribution is 36:12 in favor of '0'. 
    390  
    391  
    392 Utility functions 
    393 ================= 
    394  
    395  
    396 There are several functions for working with classifiers that use a stored 
    397 data table for making predictions. There are four such classifiers; the most 
    398 general stores a :class:`~Orange.data.Table` and the other three are 
    399 specialized and optimized for cases where the domain contains only one, two or 
    400 three features (besides the class variable). 
    401  
    402 .. function:: lookup_from_bound(class_var, bound) 
    403  
    404     This function constructs an appropriate lookup classifier for one, two or 
    405     three features. If there are more, it returns None. The resulting 
    406     classifier is of type :obj:`ClassifierByLookupTable`, 
    407     :obj:`ClassifierByLookupTable2` or :obj:`ClassifierByLookupTable3`, with 
    408     ``class_var`` and bound set set as given. 
    409  
    410     For example, using the data set ``monks-1.tab``, to construct a new feature 
    411     from features ``a`` and ``b``, this function can be called as follows. 
    412      
    413         >>> new_var = Orange.feature.Discrete() 
    414         >>> bound = [table.domain[name] for name in ["a", "b"]] 
    415         >>> lookup = Orange.classification.lookup.lookup_from_bound(new_var, bound) 
    416         >>> print lookup.lookup_table 
    417         <?, ?, ?, ?, ?, ?, ?, ?, ?> 
    418  
    419     Function ``lookup_from_bound`` does not initialize neither ``new_var`` nor 
    420     the lookup table... 
    421  
    422 .. function:: lookup_from_function(class_var, bound, function) 
    423  
    424     ... and that's exactly where ``lookup_from_function`` differs from 
    425     :obj:`lookup_from_bound`. ``lookup_from_function`` first calls 
    426     :obj:`lookup_from_bound` and then uses the function to initialize the 
    427     lookup table. The other difference between this and the previous function 
    428     is that ``lookup_from_function`` also accepts bound sets with more than three 
    429     features. In this case, it construct a :obj:`ClassifierByDataTable`. 
    430  
    431     The function gets the values of features as integer indices and should 
    432     return an integer index of the "class value". The class value must be 
    433     properly initialized. 
    434  
    435     For exercise, let us construct a new feature called ``a=b`` whose value will 
    436     be "yes" when ``a`` and ``b`` are equal and "no" when they are not. We will then 
    437     add the feature to the data set. 
    438      
    439         >>> bound = [table.domain[name] for name in ["a", "b"]] 
    440         >>> new_var = Orange.feature.Discrete("a=b", values=["no", "yes"]) 
    441         >>> lookup = Orange.classification.lookup.lookup_from_function(new_var, bound, lambda x: x[0] == x[1]) 
    442         >>> new_var.get_value_from = lookup 
    443         >>> import orngCI 
    444         >>> table2 = orngCI.addAnAttribute(new_var, table) 
    445         >>> for i in table2[:30]: 
    446         ...     print i 
    447         ['1', '1', '1', '1', '3', '1', 'yes', '1'] 
    448         ['1', '1', '1', '1', '3', '2', 'yes', '1'] 
    449         ['1', '1', '1', '3', '2', '1', 'yes', '1'] 
    450         ... 
    451         ['1', '2', '1', '1', '1', '2', 'no', '1'] 
    452         ['1', '2', '1', '1', '2', '1', 'no', '0'] 
    453         ['1', '2', '1', '1', '3', '1', 'no', '0'] 
    454         ... 
    455  
    456     The feature was inserted with use of ``orngCI.addAnAttribute``. By setting 
    457     ``new_var.get_value_from`` to ``lookup`` we state that when converting domains 
    458     (either when needed by ``addAnAttribute`` or at some other place), ``lookup`` 
    459     should be used to compute ``new_var``'s value. (A bit off topic, but 
    460     important: you should never call 
    461     :obj:`~Orange.feature.Descriptor.get_value_from` directly, but always  
    462     through :obj:`~Orange.feature.Descriptor.compute_value`.) 
    463  
    464 .. function:: lookup_from_data(examples [, weight]) 
    465  
    466     This function takes a set of data instances (e.g. :obj:`Orange.data.Table`) 
    467     and turns it into a classifier. If there are one, two or three features and 
    468     no ambiguous examples (examples are ambiguous if they have same values of 
    469     features but with different class values), it will construct an appropriate 
    470     :obj:`ClassifierByLookupTable`. Otherwise, it will return an 
    471     :obj:`ClassifierByDataTable`. 
    472      
    473         >>> lookup = Orange.classification.lookup.lookup_from_data(table) 
    474         >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?']) 
    475         >>> lookup(test_instance) 
    476         <orange.Value 'y'='0'> 
    477      
    478 .. function:: dump_lookup_function(func) 
    479  
    480     ``dump_lookup_function`` returns a string with a lookup function in 
    481     tab-delimited format. Argument ``func`` can be any of the above-mentioned 
    482     classifiers or a feature whose 
    483     :obj:`~Orange.feature.Descriptor.get_value_from` points to one of such 
    484     classifiers. 
    485  
    486     For instance, if ``lookup`` is such as constructed in the example for 
    487     ``lookup_from_function``, it can be printed by:: 
    488      
    489         >>> print dump_lookup_function(lookup) 
    490         a      b      a=b 
    491         ------ ------ ------ 
    492         1      1      yes 
    493         1      2      no 
    494         1      3      no 
    495         2      1      no 
    496         2      2      yes 
    497         2      3      no 
    498         3      1      no 
    499         3      2      no 
    500         3      3      yes 
    501  
    502 """ 
    503  
    5041from Orange.misc import deprecated_keywords 
    5052import Orange.data 
     
    52724def lookup_from_function(class_var, bound, function): 
    52825    """ 
    529     Constructs ClassifierByDataTable or ClassifierByLookupTable 
     26    Construct ClassifierByDataTable or ClassifierByLookupTable 
    53027    mirroring the given function. 
    53128     
  • docs/reference/rst/Orange.classification.logreg.rst

    r10246 r10346  
    99******************************** 
    1010 
    11 `Logistic regression <http://en.wikipedia.org/wiki/Logistic_regression>`_ 
    12 is a statistical classification methods that fits data to a logistic 
    13 function. Orange's implementation of algorithm 
    14 can handle various anomalies in features, such as constant variables and 
    15 singularities, that could make direct fitting of logistic regression almost 
    16 impossible. Stepwise logistic regression, which iteratively selects the most 
    17 informative features, is also supported. 
     11`Logistic regression 
     12<http://en.wikipedia.org/wiki/Logistic_regression>`_ is a statistical 
     13classification method that fits data to a logistic function. Orange 
      14provides various enhancements of the method, such as stepwise selection 
     15of variables and handling of constant variables and singularities. 
    1816 
    1917.. autoclass:: LogRegLearner 
     
    4442        that beta coefficients differ from 0.0. The probability is 
    4543        computed from squared Wald Z statistics that is distributed with 
    46         Chi-Square distribution. 
     44        chi-squared distribution. 
    4745 
    4846    .. attribute :: likelihood 
    4947 
    50         The probability of the sample (ie. learning examples) observed on 
    51         the basis of the derived model, as a function of the regression 
    52         parameters. 
      48        The likelihood of the sample (i.e. the training data) given the 
      49        fitted model. 
    5350 
    5451    .. attribute :: fit_status 
    5552 
    56         Tells how the model fitting ended - either regularly 
    57         (:obj:`LogRegFitter.OK`), or it was interrupted due to one of beta 
    58         coefficients escaping towards infinity (:obj:`LogRegFitter.Infinity`) 
    59         or since the values didn't converge (:obj:`LogRegFitter.Divergence`). The 
    60         value tells about the classifier's "reliability"; the classifier 
    61         itself is useful in either case. 
     53        Tells how the model fitting ended, either regularly 
     54        (:obj:`LogRegFitter.OK`), or it was interrupted due to one of 
     55        beta coefficients escaping towards infinity 
     56        (:obj:`LogRegFitter.Infinity`) or since the values did not 
     57        converge (:obj:`LogRegFitter.Divergence`). 
     58 
      59        Although the model is functional in all cases, it is 
      60        recommended to inspect the coefficients of the model 
      61        if the fitting did not end normally. 
    6262 
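
    A minimal sketch of the recommended check (``titanic`` is an assumed
    example data set)::

        import Orange
        from Orange.classification.logreg import LogRegLearner, LogRegFitter

        data = Orange.data.Table("titanic")
        classifier = LogRegLearner(data)

        if classifier.fit_status != LogRegFitter.OK:
            # fitting was interrupted; inspect the coefficients before
            # trusting the model
            print classifier.fit_status, classifier.beta
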
    6363    .. method:: __call__(instance, result_type) 
     
    7878.. class:: LogRegFitter 
    7979 
    80     :obj:`LogRegFitter` is the abstract base class for logistic fitters. It 
    81     defines the form of call operator and the constants denoting its 
    82     (un)success: 
    83  
    84     .. attribute:: OK 
    85  
    86         Fitter succeeded to converge to the optimal fit. 
    87  
    88     .. attribute:: Infinity 
    89  
    90         Fitter failed due to one or more beta coefficients escaping towards infinity. 
    91  
    92     .. attribute:: Divergence 
    93  
    94         Beta coefficients failed to converge, but none of beta coefficients escaped. 
    95  
    96     .. attribute:: Constant 
    97  
    98         There is a constant attribute that causes the matrix to be singular. 
    99  
    100     .. attribute:: Singularity 
    101  
    102         The matrix is singular. 
     80    :obj:`LogRegFitter` is the abstract base class for logistic 
     81    fitters. Fitters can be called with a data table and return a 
     82    vector of coefficients and the corresponding statistics, or a 
      83    status signifying an error. The possible statuses are: 
     84 
     85    .. attribute:: OK 
     86 
      87        Optimization converged. 
     88 
     89    .. attribute:: Infinity 
     90 
     91        Optimization failed due to one or more beta coefficients 
     92        escaping towards infinity. 
     93 
     94    .. attribute:: Divergence 
     95 
      96        Beta coefficients failed to converge, but without any of the 
      97        beta coefficients escaping toward infinity. 
     98 
     99    .. attribute:: Constant 
     100 
     101        The data is singular due to a constant variable. 
     102 
     103    .. attribute:: Singularity 
     104 
     105        The data is singular. 
    103106 
    104107 
    105108    .. method:: __call__(data, weight_id) 
    106109 
    107         Performs the fitting. There can be two different cases: either 
    108         the fitting succeeded to find a set of beta coefficients (although 
    109         possibly with difficulties) or the fitting failed altogether. The 
    110         two cases return different results. 
    111  
    112         `(status, beta, beta_se, likelihood)` 
    113             The fitter managed to fit the model. The first element of 
    114             the tuple, result, tells about the problems occurred; it can 
    115             be either :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In 
    116             the latter cases, returned values may still be useful for 
    117             making predictions, but it's recommended that you inspect 
    118             the coefficients and their errors and make your decision 
    119             whether to use the model or not. 
    120  
    121         `(status, attribute)` 
    122             The fitter failed and the returned attribute is responsible 
    123             for it. The type of failure is reported in status, which 
    124             can be either :obj:`Constant` or :obj:`Singularity`. 
    125  
    126         The proper way of calling the fitter is to expect and handle all 
    127         the situations described. For instance, if fitter is an instance 
    128         of some fitter and examples contain a set of suitable examples, 
    129         a script should look like this:: 
     110        Fit the model and return a tuple with the fitted values and 
     111        the corresponding statistics or an error indicator. The two 
     112        cases differ by the tuple length and the status (the first 
     113        tuple element). 
     114 
      115        ``(status, beta, beta_se, likelihood)`` 
      116            Fitting succeeded. The first element, ``status``, is either 
      117            :obj:`OK`, :obj:`Infinity` or :obj:`Divergence`. In the 
      118            latter cases, returned values may still be useful for making 
      119            predictions, but it is recommended to inspect the 
      120            coefficients and their errors and decide whether to use 
      121            the model or not. 
     122 
     123        ``(status, variable)`` 
     124            The fitter failed due to the indicated 
     125            ``variable``. ``status`` is either :obj:`Constant` or 
     126            :obj:`Singularity`. 
     127 
      128        The proper way of calling the fitter is to handle both scenarios:: 
    130129 
    131130            res = fitter(examples) 
     
    141140 
    142141    The sole fitter available at the 
    143     moment. It is a C++ translation of `Alan Miller's logistic regression 
    144     code <http://users.bigpond.net.au/amiller/>`_. It uses Newton-Raphson 
     142    moment. This is a C++ translation of `Alan Miller's logistic regression 
      143    code <http://users.bigpond.net.au/amiller/>`_ that uses the Newton-Raphson 
    145144    algorithm to iteratively minimize least squares error computed from 
    146     learning examples. 
     145    training data. 
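
    A hedged sketch of handling both return shapes of the fitter
    (``titanic`` is an assumed example data set)::

        import Orange
        from Orange.classification.logreg import LogRegFitter, LogRegFitter_Cholesky

        data = Orange.data.Table("titanic")
        fitter = LogRegFitter_Cholesky()
        res = fitter(data)

        if res[0] in (LogRegFitter.OK, LogRegFitter.Infinity, LogRegFitter.Divergence):
            # fitting succeeded, possibly with difficulties
            status, beta, beta_se, likelihood = res
        else:
            # fitting failed because of the returned variable
            status, variable = res
            print "fitting failed because of", variable.name
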
    147146 
    148147 
     
    158157-------- 
    159158 
    160 The first example shows a very simple induction of a logistic regression 
    161 classifier (:download:`logreg-run.py <code/logreg-run.py>`). 
      159The first example shows a straightforward use of logistic regression (:download:`logreg-run.py <code/logreg-run.py>`). 
    162160 
    163161.. literalinclude:: code/logreg-run.py 
     
    210208 
    211209If :obj:`remove_singular` is set to 0, inducing a logistic regression 
    212 classifier would return an error:: 
     210classifier returns an error:: 
    213211 
    214212    Traceback (most recent call last): 
     
    221219    orange.KernelException: 'orange.LogRegLearner': singularity in workclass=Never-worked 
    222220 
    223 We can see that the attribute workclass is causing a singularity. 
      221The variable causing the singularity is ``workclass``. 
    224222 
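
    A sketch of avoiding the error by enabling :obj:`remove_singular`
    (the ``adult_sample`` data set name is an assumption based on the
    traceback above)::

        import Orange
        from Orange.classification.logreg import LogRegLearner

        data = Orange.data.Table("adult_sample")
        # constant and singular variables such as workclass are removed
        # automatically before fitting
        classifier = LogRegLearner(data, remove_singular=True)
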
    225223The example below shows how the use of stepwise logistic regression can help to 
  • docs/reference/rst/Orange.classification.lookup.rst

    r9372 r10347  
    1 .. automodule:: Orange.classification.lookup 
     1.. py:currentmodule:: Orange.classification.lookup 
     2 
     3.. index:: classification; lookup 
     4 
     5******************************* 
     6Lookup classifiers (``lookup``) 
     7******************************* 
     8 
     9Lookup classifiers predict classes by looking into stored lists of 
     10cases. There are two kinds of such classifiers in Orange. The simpler 
     11and faster :obj:`ClassifierByLookupTable` uses up to three discrete 
     12features and has a stored mapping from values of those features to the 
     13class value. The more complex classifiers store an 
     14:obj:`Orange.data.Table` and predict the class by matching the 
     15instance to instances in the table. 
     16 
     17.. index:: 
     18   single: feature construction; lookup classifiers 
     19 
     20A natural habitat for these classifiers is feature construction: they 
     21usually reside in :obj:`~Orange.feature.Descriptor.get_value_from` 
     22fields of constructed features to facilitate their automatic 
     23computation. For instance, the following script shows how to translate 
     24the ``monks-1.tab`` data set features into a more useful subset that 
     25will only include the features ``a``, ``b``, ``e``, and features that 
     26will tell whether ``a`` and ``b`` are equal and whether ``e`` is 1 
     27(part of :download:`lookup-lookup.py <code/lookup-lookup.py>`): 
     28 
     29.. 
     30    .. literalinclude:: code/lookup-lookup.py 
     31        :lines: 7-21 
     32 
     33.. testcode:: 
     34 
     35    import Orange 
     36 
     37    monks = Orange.data.Table("monks-1") 
     38 
     39    a, b, e = monks.domain["a"], monks.domain["b"], monks.domain["e"] 
     40 
     41    ab = Orange.feature.Discrete("a==b", values = ["no", "yes"]) 
     42    ab.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(ab, a, b, 
     43                        ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
     44 
     45    e1 = Orange.feature.Discrete("e==1", values = ["no", "yes"]) 
     46    e1.get_value_from = Orange.classification.lookup.ClassifierByLookupTable(e1, e, 
     47                        ["yes", "no", "no", "no", "?"]) 
     48 
     49    monks2 = monks.select([a, b, ab, e, e1, monks.domain.class_var]) 
     50     
     51We can check the correctness of the script by printing out several 
      52random examples from ``monks2``. 
     53 
     54    >>> for i in range(5): 
     55    ...     print monks2.randomexample() 
     56    ['3', '2', 'no', '2', 'no', '0'] 
     57    ['2', '2', 'yes', '2', 'no', '1'] 
     58    ['1', '2', 'no', '2', 'no', '0'] 
     59    ['2', '3', 'no', '1', 'yes', '1'] 
     60    ['1', '3', 'no', '1', 'yes', '1'] 
     61 
     62The first :obj:`ClassifierByLookupTable` takes values of features ``a`` 
     63and ``b`` and computes the value of ``ab`` according to the rule given in the 
     64given table. The first three values correspond to ``a=1`` and ``b=1,2,3``; 
     65for the first combination, value of ``ab`` should be "yes", for the other 
     66two ``a`` and ``b`` are different. The next triplet corresponds to ``a=2``; 
     67here, the middle value is "yes"... 
     68 
     69The second lookup is simpler: since it involves only a single feature, 
     70the list is a simple one-to-one mapping from the four-valued ``e`` to the 
     71two-valued ``e1``. The last value in the list is returned when ``e`` is unknown 
     72and tells that ``e1`` should be unknown then as well. 
     73 
     74Note that :obj:`ClassifierByLookupTable` is not needed for this. 
     75The new feature ``e1`` could be computed with a callback to Python, 
     76for instance:: 
     77 
     78    e2.get_value_from = lambda ex, rw: orange.Value(e2, ex["e"] == "1") 
     79 
     80 
     81Classifiers by lookup table 
     82=========================== 
     83 
     84.. index:: 
     85   single: classification; lookup table 
     86 
     87Although the above example used :obj:`ClassifierByLookupTable` as if 
     88it was a concrete class, :obj:`ClassifierByLookupTable` is actually 
     89abstract. Calling its constructor does not return an instance of 
     90:obj:`ClassifierByLookupTable`, but either 
     91:obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or 
     92:obj:`ClassifierByLookupTable3`, that take one (``e``, above), two 
     93(like ``a`` and ``b``) or three features, respectively. Class 
     94predictions for each combination of feature values are stored in a 
     95(one dimensional) table. To classify an instance, the classifier 
     96computes an index of the element of the table that corresponds to the 
     97combination of feature values. 
     98 
     99These classifiers are built to be fast, not safe. For instance, if the 
     100number of values for one of the features is changed, Orange will most 
     101probably crash.  To alleviate this, many of these classes' attributes 
     102are read-only and can only be set when the object is constructed. 
     103 
     104 
     105.. py:class:: ClassifierByLookupTable(class_var, variable1[, variable2[, variable3]] [, lookup_table[, distributions]]) 
     106     
     107    A general constructor that, based on the number of feature 
     108    descriptors, constructs one of the three classes discussed. If 
     109    :obj:`lookup_table` and :obj:`distributions` are omitted, the 
     110    constructor also initializes them to two lists of the right sizes, 
     111    but their elements are missing values and empty distributions. If 
     112    they are given, they must be of correct size. 
     113     
      114    .. attribute:: variable1[, variable2[, variable3]] (read only) 
     115         
     116        The feature(s) that the classifier uses for classification. 
     117        :obj:`ClassifierByLookupTable1` only has :obj:`variable1`, 
     118        :obj:`ClassifierByLookupTable2` also has :obj:`variable2` and 
     119        :obj:`ClassifierByLookupTable3` has all three. 
     120 
     121    .. attribute:: variables (read only) 
     122         
     123        The above variables, returned as a tuple. 
     124 
     125    .. attribute:: no_of_values1[, no_of_values2[, no_of_values3]] (read only) 
     126         
     127        The number of values for :obj:`variable1`, :obj:`variable2` 
     128        and :obj:`variable3`. This is stored here to make the 
     129        classifier faster. These attributes are defined only for 
     130        :obj:`ClassifierByLookupTable2` (the first two) and 
     131        :obj:`ClassifierByLookupTable3` (all three). 
     132 
     133    .. attribute:: lookup_table (read only) 
     134         
     135        A list of values, one for each possible combination of 
     136        features. For :obj:`ClassifierByLookupTable1`, there is an 
     137        additional element that is returned when the feature's value 
     138        is unknown. Values are ordered by values of features, with 
     139        :obj:`variable1` being the most important. For instance, for 
     140        two three-valued features, the elements of :obj:`lookup_table` 
     141        correspond to combinations (1, 1), (1, 2), (1, 3), (2, 1), (2, 
     142        2), (2, 3), (3, 1), (3, 2), (3, 3). 
     143         
     144        The attribute is read-only; it cannot be assigned a new list, 
     145        but the existing list can be changed. Changing its size will 
     146        most likely crash Orange. 
     147 
     148    .. attribute:: distributions (read only) 
     149         
     150        Similar to :obj:`lookup_table`, but storing a distribution for 
     151        each combination of values.  
     152 
     153    .. attribute:: data_description 
     154         
     155        An object of type :obj:`EFMDataDescription`, defined only for 
     156        :obj:`ClassifierByLookupTable2` and 
     157        :obj:`ClassifierByLookupTable3`. They use it to make 
     158        predictions when one or more feature values are missing. 
     159        :obj:`ClassifierByLookupTable1` does not need it since this 
     160        case is covered by an additional element in 
     161        :obj:`lookup_table` and :obj:`distributions`, as described 
     162        above. 
     163         
     164    .. method:: get_index(inst) 
     165     
      166        Returns the index in :obj:`lookup_table` and 
      167        :obj:`distributions` that corresponds to the given data 
      168        instance ``inst``. The formula depends upon the type of the 
      169        classifier. If value\ *i* is int(inst[variable\ *i*]), then 
      170        the corresponding formulae are 
     171 
     172        ``ClassifierByLookupTable1``: 
     173            index = value1, or len(lookup_table) - 1 if value of :obj:`variable1` is missing 
     174 
     175        ``ClassifierByLookupTable2``: 
     176            index = value1 * no_of_values1 + value2, or -1 if ``value1`` or ``value2`` is missing 
     177 
      178        ``ClassifierByLookupTable3``: 
     179            index = (value1 * no_of_values1 + value2) * no_of_values2 + value3, or -1 if any value is missing 
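
    For clarity, the two-feature formula as a plain Python sketch (a
    hypothetical helper, not the library's implementation)::

        def pair_index(value1, value2, no_of_values1, missing=False):
            # documented formula for ClassifierByLookupTable2;
            # -1 signals that some value is missing
            if missing:
                return -1
            return value1 * no_of_values1 + value2
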
     180 
     181.. py:class:: ClassifierByLookupTable1(class_var, variable1 [, lookup_table, distributions]) 
     182     
     183    Uses a single feature for lookup. See 
     184    :obj:`ClassifierByLookupTable` for more details. 
     185 
     186.. py:class:: ClassifierByLookupTable2(class_var, variable1, variable2, [, lookup_table[, distributions]]) 
     187     
     188    Uses two features for lookup. See 
     189    :obj:`ClassifierByLookupTable` for more details. 
     190         
     191.. py:class:: ClassifierByLookupTable3(class_var, variable1, variable2, variable3, [, lookup_table[, distributions]]) 
     192     
     193    Uses three features for lookup. See 
     194    :obj:`ClassifierByLookupTable` for more details. 
     195 
     196 
     197Classifier by data table 
     198======================== 
     199 
     200.. index:: 
     201   single: classification; data table 
     202 
     203:obj:`ClassifierByDataTable` is used in similar contexts as 
      204:obj:`ClassifierByLookupTable`. The class is much slower, so it is recommended to use :obj:`ClassifierByLookupTable` if the number of features is less than four. 
     205 
     206.. py:class:: ClassifierByDataTable 
     207 
     208    :obj:`ClassifierByDataTable` is the alternative to 
     209    :obj:`ClassifierByLookupTable` for more than three features. 
     210    Instead of having a lookup table, it stores the data in 
     211    :obj:`Orange.data.Table` that is optimized for faster access. 
     212     
     213    .. attribute:: sorted_examples 
     214         
     215        A :obj:`Orange.data.Table` with sorted data instances for 
     216        lookup.  If there were multiple instances with the same 
     217        feature values (but possibly different classes) in the 
     218        original data, they can be merged into a single 
     219        instance. Regardless of merging, class values in this table 
     220        are distributed: their ``svalue`` contains a 
     221        :obj:`~Orange.statistics.distribution.Distribution`. 
     222 
     223    .. attribute:: classifier_for_unknown 
     224         
     225        The classifier for instances that are not found in the 
     226        table. If not set, :obj:`ClassifierByDataTable` returns 
     227        missing value for such instances. 
     228 
     229    .. attribute:: variables (read only) 
     230         
     231        A tuple with features in the domain. Equal to 
     232        :obj:`domain.features`, but here for similarity with 
     233        :obj:`ClassifierByLookupTable`. 
     234 
     235 
     236 
     237.. py:class:: LookupLearner 
     238     
     239    A learner that constructs a table for 
     240    :obj:`ClassifierByDataTable.sorted_examples`. It sorts the data 
     241    instances and merges those with the same feature values. 
     242     
      243    The constructor returns an instance of :obj:`LookupLearner`, 
      244    unless data is provided, in which case it returns a 
      245    :obj:`ClassifierByDataTable`. 
     246 
     247    :obj:`LookupLearner` also supports a different call signature than 
     248    other learners. Besides instances, it accepts a new class 
     249    variable and the features that should be used for 
     250    classification.  
     251 
     252part of :download:`lookup-table.py <code/lookup-table.py>`: 
     253 
     254.. 
     255    .. literalinclude:: code/lookup-table.py 
     256        :lines: 7-13 
     257 
     258.. testcode:: 
     259         
     260    import Orange 
     261 
     262    table = Orange.data.Table("monks-1") 
     263    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
     264 
     265    table_s = table.select([a, b, e, table.domain.class_var]) 
     266    abe = Orange.classification.lookup.LookupLearner(table_s) 
     267 
     268 
     269In ``table_s``, we have prepared a table in which instances are described 
     270only by ``a``, ``b``, ``e`` and the class. The learner constructs a 
     271:obj:`ClassifierByDataTable` and stores instances from ``table_s`` into its 
     272:obj:`~ClassifierByDataTable.sorted_examples`. Instances are merged so that 
     273there are no duplicates. 
     274 
     275    >>> print len(table_s) 
     276    556 
     277    >>> print len(abe.sorted_examples) 
     278    36 
     279    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
     280    ...     print i 
     281    ['1', '1', '1', '1'] 
     282    ['1', '1', '2', '1'] 
     283    ['1', '1', '3', '1'] 
     284    ['1', '1', '4', '1'] 
     285    ['1', '2', '1', '1'] 
     286    ['1', '2', '2', '0'] 
     287    ['1', '2', '3', '0'] 
     288    ['1', '2', '4', '0'] 
     289    ['1', '3', '1', '1'] 
     290    ['1', '3', '2', '0'] 
     291 
     292Each instance's class value also stores the distribution of classes 
     293for all instances that were merged into it. In our case, the three 
     294features suffice to unambiguously determine the classes and, since 
     295instances cover the entire space, all distributions have 12 
      296instances in one of the classes and none in the other. 
     297 
     298    >>> for i in abe.sorted_examples[:10]:  # doctest: +SKIP 
     299    ...     print i, i.get_class().svalue 
     300    ['1', '1', '1', '1'] <0.000, 12.000> 
     301    ['1', '1', '2', '1'] <0.000, 12.000> 
     302    ['1', '1', '3', '1'] <0.000, 12.000> 
     303    ['1', '1', '4', '1'] <0.000, 12.000> 
     304    ['1', '2', '1', '1'] <0.000, 12.000> 
     305    ['1', '2', '2', '0'] <12.000, 0.000> 
     306    ['1', '2', '3', '0'] <12.000, 0.000> 
     307    ['1', '2', '4', '0'] <12.000, 0.000> 
     308    ['1', '3', '1', '1'] <0.000, 12.000> 
     309    ['1', '3', '2', '0'] <12.000, 0.000> 
     310 
     311A typical use of :obj:`ClassifierByDataTable` is to construct a new 
     312feature and put the classifier into its 
     313:obj:`~Orange.feature.Descriptor.get_value_from`. 
     314 
     315    >>> y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
     316    >>> y2.get_value_from = abe 
     317 
     318Although ``abe`` determines the value of ``y2``, ``abe.class_var`` is 
     319still ``y``.  Orange does not complain about the mismatch. 
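
To see the new feature in action, instances can be converted to a domain
that includes ``y2``; the conversion computes ``y2`` through
:obj:`~Orange.feature.Descriptor.get_value_from`. A minimal sketch, assuming
the objects built above::

    # y2 becomes the class variable of the new domain
    domain_y2 = Orange.data.Domain([a, b, e, y2])
    print Orange.data.Instance(domain_y2, table[0])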
     320 
      321Using :obj:`LookupLearner`'s alternative call signature saves us 
      322from constructing ``table_s`` and reassigning the 
     323:obj:`~Orange.data.Domain.class_var`, but it still does not set the 
     324:obj:`~Orange.feature.Descriptor.get_value_from`. 
     325 
     326part of :download:`lookup-table.py <code/lookup-table.py>`:: 
     327 
     328    import Orange 
     329 
     330    table = Orange.data.Table("monks-1") 
     331    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
     332 
     333    y2 = Orange.feature.Discrete("y2", values = ["0", "1"]) 
     334    abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table) 
     335 
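The remaining step can be done by hand, as before (a one-line sketch)::

    y2.get_value_from = abe2
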
      336For the final example, :obj:`LookupLearner`'s alternative call 
      337signature offers an easy way to observe feature interactions. For this 
     338purpose, we shall omit ``e``, and construct a 
     339:obj:`ClassifierByDataTable` from ``a`` and ``b`` only (part of 
     340:download:`lookup-table.py <code/lookup-table.py>`): 
     341 
     342.. literalinclude:: code/lookup-table.py 
     343    :lines: 32-35 
     344 
      345The script's output shows how the classes are distributed for different 
     346values of ``a`` and ``b``:: 
     347 
     348    ['1', '1', '1'] <0.000, 48.000> 
     349    ['1', '2', '0'] <36.000, 12.000> 
     350    ['1', '3', '0'] <36.000, 12.000> 
     351    ['2', '1', '0'] <36.000, 12.000> 
     352    ['2', '2', '1'] <0.000, 48.000> 
     353    ['2', '3', '0'] <36.000, 12.000> 
     354    ['3', '1', '0'] <36.000, 12.000> 
     355    ['3', '2', '0'] <36.000, 12.000> 
     356    ['3', '3', '1'] <0.000, 48.000> 
     357 
     358For instance, when ``a`` is '1' and ``b`` is '3', the majority class is '0', 
     359and the class distribution is 36:12 in favor of '0'. 
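
The distributions are easily turned into relative frequencies. A sketch,
assuming the classifier built in the snippet above is bound to a name such
as ``ab`` (a hypothetical name, since the snippet lives in the external
file)::

    for inst in ab.sorted_examples:
        dist = inst.get_class().svalue
        # relative frequency of class '0' among the merged instances
        print inst, dist[0] / (dist[0] + dist[1])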
     360 
     361 
     362Utility functions 
     363================= 
     364 
     365 
     366There are several functions related to the above classes. 
     367 
     368.. function:: lookup_from_function(class_var, bound, function) 
     369 
     370    Construct a :obj:`ClassifierByLookupTable` or 
     371    :obj:`ClassifierByDataTable` with the given bound variables and 
     372    then use the function to initialize the lookup table. 
     373 
     374    The function is given the values of features as integer indices and 
      375    must return an integer index of ``class_var``'s value. 
     376 
     377    The following example constructs a new feature called ``a=b`` 
     378    whose value will be "yes" when ``a`` and ``b`` are equal and "no" 
     379    when they are not. We will then add the feature to the data set. 
     380     
     381        >>> bound = [table.domain[name] for name in ["a", "b"]] 
     382        >>> new_var = Orange.feature.Discrete("a=b", values=["no", "yes"]) 
     383        >>> lookup = Orange.classification.lookup.lookup_from_function(new_var, bound, lambda x: x[0] == x[1]) 
     384        >>> new_var.get_value_from = lookup 
     385        >>> import orngCI 
     386        >>> table2 = orngCI.addAnAttribute(new_var, table) 
     387        >>> for i in table2[:30]: 
     388        ...     print i 
     389        ['1', '1', '1', '1', '3', '1', 'yes', '1'] 
     390        ['1', '1', '1', '1', '3', '2', 'yes', '1'] 
     391        ['1', '1', '1', '3', '2', '1', 'yes', '1'] 
     392        ... 
     393        ['1', '2', '1', '1', '1', '2', 'no', '1'] 
     394        ['1', '2', '1', '1', '2', '1', 'no', '0'] 
     395        ['1', '2', '1', '1', '3', '1', 'no', '0'] 
     396        ... 
     397 
      398    The feature was inserted using ``orngCI.addAnAttribute``. By setting 
      399    ``new_var.get_value_from`` to ``lookup``, we state that whenever domains 
      400    are converted (by ``addAnAttribute`` or anywhere else), ``lookup`` 
      401    should be used to compute ``new_var``'s value. 
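
    If ``orngCI`` is not at hand, the same effect can be achieved with a
    plain domain conversion; a sketch using only ``Orange.data``::

        # a new domain with new_var appended to the original features
        new_domain = Orange.data.Domain(
            list(table.domain.features) + [new_var],
            table.domain.class_var)
        # conversion computes new_var through its get_value_from
        table2 = Orange.data.Table(new_domain, table)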
     402 
     403.. function:: lookup_from_data(examples [, weight]) 
     404 
     405    Take a set of data instances (e.g. :obj:`Orange.data.Table`) and 
     406    turn it into a classifier. If there are one, two or three features 
      407    and no ambiguous data instances (i.e. no instances with the same 
      408    feature values and different classes), it will construct an 
      409    appropriate :obj:`ClassifierByLookupTable`. Otherwise, it will 
      410    return a :obj:`ClassifierByDataTable`. 
     411     
     412        >>> lookup = Orange.classification.lookup.lookup_from_data(table) 
     413        >>> test_instance = Orange.data.Instance(table.domain, ['3', '2', '2', '3', '4', '1', '?']) 
     414        >>> lookup(test_instance) 
     415        <orange.Value 'y'='0'> 
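
    Since ``monks-1`` describes instances with six features, the call above
    returns a :obj:`ClassifierByDataTable`. A quick check (a sketch)::

        print isinstance(lookup,
            Orange.classification.lookup.ClassifierByDataTable)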
     416     
     417.. function:: dump_lookup_function(func) 
     418 
      419    Return a string describing the classifier's lookup table. Argument 
      420    ``func`` can be any of the above-mentioned classifiers or a feature 
      421    whose :obj:`~Orange.feature.Descriptor.get_value_from` contains 
      422    such a classifier. 
     423 
      424    For instance, if ``lookup`` is the classifier constructed in the example for 
     425    ``lookup_from_function``, it can be printed by:: 
     426     
     427        >>> print dump_lookup_function(lookup) 
     428        a      b      a=b 
     429        ------ ------ ------ 
     430        1      1      yes 
     431        1      2      no 
     432        1      3      no 
     433        2      1      no 
     434        2      2      yes 
     435        2      3      no 
     436        3      1      no 
     437        3      2      no 
     438        3      3      yes 
     439 
  • docs/reference/rst/Orange.classification.rst

    r10265 r10347  
    2424   Orange.classification.bayes 
    2525   Orange.classification.knn 
    26    Orange.classification.logreg 
    27    Orange.classification.lookup 
    28    Orange.classification.majority 
    2926   Orange.classification.rules 
    3027   Orange.classification.svm 
    3128   Orange.classification.tree 
     29   Orange.classification.logreg 
     30   Orange.classification.majority 
     31   Orange.classification.lookup 
    3232   Orange.classification.classfromvar 
    3333    