Changeset 7293:79e0f9517c37 in orange


Timestamp: 02/03/11 10:17:56 (3 years ago)
Author: lanz <lan.zagar@…>
Branch: default
Convert: eb367981aeef27ba2e7017543e5b423851fdace0
Message: majority and lookup doc (lanz)
Location: orange
Files: 4 edited

  • orange/Orange/classification/lookup.py

    r7243 r7293  
    55Lookup classifiers predict classes by looking into stored lists of 
    66cases. There are two kinds of such classifiers in Orange. The simpler 
    7 and fastest ClassifierByLookupTable use up to three discrete attributes 
    8 and have a stored mapping from values of those attributes to class 
    9 value. The more complex classifiers stores an ExampleTable and predicts 
    10 the class by matching the example to examples in the table. 
     7and fastest :obj:`ClassifierByLookupTable` use up to three discrete 
     8features and have a stored mapping from values of those features to 
     9class value. The more complex classifiers store a 
     10:obj:`Orange.data.Table` and predict the class by matching the instance 
     11to instances in the table. 
    1112 
    1213The natural habitat of these classifiers is feature construction: 
    13 they usually reside in getValueFrom fields of constructed attributes 
    14 to facilitate their automatic computation. For instance, the following 
    15 script shows how to translate the Monk 1 dataset features into a more 
    16 useful subset that will only include the attributes a, b, e, and 
    17 attributes that will tell whether a and b are equal and whether e is 1 
    18 (don't bother the details, they follow later). 
    19  
    20 `classifierByLookupTable.py`_ (uses: `monks-1.tab`_): 
    21  
    22 .. literalinclude:: code/classifierByLookupTable.py 
    23  
     14they usually reside in :obj:`getValueFrom` fields of constructed 
     15features to facilitate their automatic computation. For instance, 
     16the following script shows how to translate the `monks-1.tab`_ dataset 
     17features into a more useful subset that will only include the features 
     18a, b, e, and features that will tell whether a and b are equal and 
     19whether e is 1 (don't bother about the details, they follow later). 
     20 
     21part of `classifierByLookupTable.py`_ (uses: `monks-1.tab`_):: 
     22 
     23    import Orange 
     24     
     25    data = Orange.data.Table("monks-1") 
     26     
     27    a, b, e = data.domain["a"], data.domain["b"], data.domain["e"] 
     28     
     29    ab = Orange.data.feature.Discrete("a==b", values = ["no", "yes"]) 
     30    ab.getValueFrom = Orange.classification.lookup.ClassifierByLookupTable(ab, a, b, 
     31                        ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
     32     
     33    e1 = Orange.data.feature.Discrete("e==1", values = ["no", "yes"]) 
     34    e1.getValueFrom = Orange.classification.lookup.ClassifierByLookupTable(e1, e, 
     35                        ["yes", "no", "no", "no", "?"]) 
     36     
     37    data2 = data.select([a, b, ab, e, e1, data.domain.classVar]) 
     38     
    2439We can check the correctness of the script by printing out several 
    2540random examples from data2. 
    2641 
     42    >>> for i in range(5): 
     43    ...     print data2.randomexample() 
     44    ['1', '1', 'yes', '4', 'no', '1'] 
     45    ['3', '3', 'yes', '2', 'no', '1'] 
     46    ['2', '1', 'no', '4', 'no', '0'] 
     47    ['2', '1', 'no', '1', 'yes', '1'] 
     48    ['1', '1', 'yes', '3', 'no', '1'] 
     49 
     50The first :obj:`ClassifierByLookupTable` takes values of features a 
     51and b and computes the value of ab according to the rule given in the 
     52given table. The first three values correspond to a=1 and b=1, 2, 3; 
     53for the first combination, value of ab should be "yes", for the other 
     54two a and b are different. The next triplet correspond to a=2; 
     55here, the middle value is "yes"... 
     56 
     57The second lookup is simpler: since it involves only a single feature, 
     58the list is a simple one-to-one mapping from the four-valued e to the 
     59two-valued e1. The last value in the list is returned when e is unknown 
     60and tells that e1 should be unknown then as well. 
     61 
     62Note that you don't need :obj:`ClassifierByLookupTable` for this. 
     63The new feature e1 could be computed with a callback to Python, 
     64for instance:: 
     65 
     66    e2.getValueFrom = lambda ex, rw: orange.Value(e2, ex["e"]=="1") 
     67 
     68=========================== 
     69Classifiers by Lookup Table 
     70=========================== 
     71 
     72Although the above example used :obj:`ClassifierByLookupTable` as if it 
     73was a concrete class, :obj:`ClassifierByLookupTable` is actually 
     74abstract. Calling its constructor is a typical Orange trick: what you 
     75get, is never :obj:`ClassifierByLookupTable`, but either 
     76:obj:`ClassifierByLookupTable1`, :obj:`ClassifierByLookupTable2` or 
     77:obj:`ClassifierByLookupTable3`. As their names tell, the first 
     78classifies using a single feature (so that's what we had for e1), 
     79the second uses a pair of features (and has been constructed for ab 
     80above), and the third uses three features. Class predictions for each 
     81combination of feature values are stored in a (one dimensional) table. 
     82To classify an instance, the classifier computes an index of the element 
     83of the table that corresponds to the combination of feature values. 
     84 
     85These classifiers are built to be fast, not safe. If you, for instance, 
     86change the number of values for one of the features, Orange will 
     87most probably crash. To protect you somewhat, many of these classes' 
     88features are read-only and can only be set when the object is 
     89constructed. 
     90 
     91**Attributes:** 
     92 
     93.. attribute:: variable1[, variable2[, variable3]](read only) 
     94     
     95    The attribute(s) that the classifier uses for classification. ClassifierByLookupTable1 only has variable1, ClassifierByLookupTable2 also has variable2 and ClassifierByLookupTable3 has all three. 
     96 
     97.. attribute:: variables (read only) 
     98     
     99    The above variables, returned as a tuple. 
     100 
     101.. attribute:: noOfValues1, noOfValues2[, noOfValues3] (read only) 
     102     
     103    The number of values for variable1, variable2 and variable3. This is stored here to make the classifier faster. Those attributes are defined only for ClassifierByLookupTable2 (the first two) and ClassifierByLookupTable3 (all three). 
     104 
     105.. attribute:: lookupTable (read only) 
     106     
     107    A list of values (ValueList), one for each possible combination of attributes. For ClassifierByLookupTable1, there is an additional element that is returned when the attribute's value is unknown. Values are ordered by values of attributes, with variable1 being the most important. In case of two three valued attributes, the list order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, where the first digit corresponds to variable1 and the second to variable2. 
     108     
     109    The list is read-only in the sense that you cannot assign a new list to this field. You can, however, change its elements. Don't change its size, though.  
     110 
     111.. attribute:: distributions (read only) 
     112     
     113    Similar to lookupTable, but is of type DistributionList and stores a distribution for each combination of values.  
     114 
     115.. attribute:: dataDescription 
     116     
     117    An object of type EFMDataDescription, defined only for ClassifierByLookupTable2 and ClassifierByLookupTable3. They use it to make predictions when one or more attribute values are unknown. ClassifierByLookupTable1 doesn't need it since this case is covered by an additional element in lookupTable and distributions, as told above.  
     118 
     119**Methods:** 
     120 
     121.. method:: ClassifierByLookupTable(classVar, variable1[, variable2[, variable3]] [, lookupTable[, distributions]]) 
     122 
     123    A general constructor that, based on the number of attribute descriptors, constructs one of the three classes discussed. If lookupTable and distributions are omitted, constructor also initializes lookupTable and distributions to two lists of the right sizes, but their elements are don't knows and empty distributions. If they are given, they must be of correct size. 
     124 
    27125 
    28126.. _classifierByLookupTable.py: code/classifierByLookupTable.py 
     
    32130 
    33131import Orange.data 
    34 import Orange.feature 
    35132from Orange.core import \ 
    36133        LookupLearner, \ 
     
    101198         
    102199def printLookupFunction(func): 
    103     if isinstance(func, Orange.feature.Feature): 
     200    if isinstance(func, Orange.data.feature.Feature): 
    104201        if not func.getValueFrom: 
    105202            raise TypeError, "attribute '%s' does not have an associated function" % func.name 
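
Reviewer note on the lookupTable ordering documented in the lookup.py hunk above: the description (variable1 most significant, so two three-valued features give the order 1-1, 1-2, 1-3, 2-1, ...) amounts to row-major indexing. The following is a minimal plain-Python sketch of that ordering, not Orange's implementation. The names `lookup_index`, `AB_TABLE` and `classify_ab` are hypothetical, feature values are assumed to be 0-based integer indices, and unknown values are not handled.

```python
def lookup_index(value_indices, nos_of_values):
    """Return the row-major position of a combination of feature values.

    value_indices -- 0-based value index for each feature, first feature first
    nos_of_values -- number of possible values of each feature, same order
    """
    index = 0
    for value, n in zip(value_indices, nos_of_values):
        if not 0 <= value < n:
            raise ValueError("value index %r out of range for %r values" % (value, n))
        # The earlier feature is more significant: shift, then add.
        index = index * n + value
    return index


# The nine-element table used for the "a==b" feature in the script in this
# changeset; a and b are both three-valued in monks-1.
AB_TABLE = ["yes", "no", "no", "no", "yes", "no", "no", "no", "yes"]


def classify_ab(a_index, b_index):
    """Look up the a==b value for 0-based indices of a and b (hypothetical helper)."""
    return AB_TABLE[lookup_index((a_index, b_index), (3, 3))]
```

With this ordering, the combination a=2, b=2 (indices 1 and 1) lands on position 4, the middle "yes" of the second triplet, which matches the explanation in the documentation text.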
  • orange/Orange/classification/majority.py

    r7226 r7293  
    1919    features. Nevertheless, it has two. 
    2020 
    21     :param estimatorConstructor: An estimator constructor that can  
    22       be used for estimation of class probabilities. If left None, 
    23       probability of each class is estimated as the relative  
    24       frequency of examples belonging to this class. 
    25     :type estimatorConstructor: :class:`Orange.???` or None 
    26     :param aprioriDistribution: Apriori class distribution that is 
    27       passed to estimator constructor if one is given. 
    28     :type aprioriDistribution: :class:`Orange.???` or None 
    29  
     21    .. attribute:: estimatorConstructor 
     22     
     23        An estimator constructor that can be used for estimation of 
     24        class probabilities. If left None, probability of each class is 
     25        estimated as the relative frequency of examples belonging to 
     26        this class. 
     27         
     28    .. attribute:: aprioriDistribution 
     29     
     30        Apriori class distribution that is passed to estimator 
     31        constructor if one is given. 
    3032 
    3133============== 
     
    3840    same class probabilities. 
    3941 
    40     :param defaultVal: Value that is returned by the classifier. 
    41     :type defaultVal: :class:`Orange.???` or None 
    42     :param defaultDistribution: Class probabilities returned by the 
    43       classifier. 
    44     :type defaultDistribution: :class:`Orange.???` or None 
     42    .. attribute:: defaultVal 
     43     
     44        Value that is returned by the classifier. 
     45     
     46    .. attribute:: defaultDistribution 
     47 
     48        Class probabilities returned by the classifier. 
    4549 
    4650The ConstantClassifier's constructor can be called without arguments, 
  • orange/doc/Orange/rst/code/classifierByLookupTable.py

    r7240 r7293  
    1 # Description: Shows how to construct and use classifiers by lookup table to construct new features from the existing 
    2 # Category:    classification, lookup classifiers, constructive induction, feature construction 
    3 # Classes:     ClassifierByLookupTable, ClassifierByLookupTable1, ClassifierByLookupTable2, ClassifierByLookupTable3 
     1# Description: Shows how to construct and use classifiers by lookup table 
     2#              to construct new features from the existing 
     3# Category:    classification, lookup classifiers, constructive induction, 
     4#              feature construction 
     5# Classes:     ClassifierByLookupTable, ClassifierByLookupTable1, 
     6#              ClassifierByLookupTable2, ClassifierByLookupTable3 
    47# Uses:        monk1 
    58# Referenced:  lookup.htm 
     
    1215 
    1316ab = orange.EnumVariable("a==b", values = ["no", "yes"]) 
    14 ab.getValueFrom = orange.ClassifierByLookupTable(ab, a, b, ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
     17ab.getValueFrom = orange.ClassifierByLookupTable(ab, a, b, 
     18                    ["yes", "no", "no",  "no", "yes", "no",  "no", "no", "yes"]) 
    1519 
    1620e1 = orange.EnumVariable("e==1", values = ["no", "yes"]) 
    17 e1.getValueFrom = orange.ClassifierByLookupTable(e1, e, ["yes", "no", "no", "no", "?"]) 
     21e1.getValueFrom = orange.ClassifierByLookupTable(e1, e, 
     22                    ["yes", "no", "no", "no", "?"]) 
    1823 
    1924data2 = data.select([a, b, ab, e, e1, data.domain.classVar]) 
     
    2429for i in range(5): 
    2530    ex = data.randomexample() 
    26     print "%s: ab %i, e1 %i " % (ex, ab.getValueFrom.getindex(ex), e1.getValueFrom.getindex(ex)) 
     31    print "%s: ab %i, e1 %i " % (ex, ab.getValueFrom.getindex(ex), 
     32                                 e1.getValueFrom.getindex(ex)) 
    2733     
    2834# What follows is only for testing Orange... 
  • orange/doc/Orange/rst/code/majority.py

    r7236 r7293  
    1 # Description: Shows how to "learn" the majority class and compare other classifiers to the default classification 
     1# Description: Shows how to "learn" the majority class and compare 
     2#              other classifiers to the default classification 
    23# Category:    default classification accuracy, statistics 
    3 # Classes:     MajorityLearner, ConstantClassifier, Orange.evaluate.crossValidation 
     4# Classes:     MajorityLearner, Orange.evaluate.crossValidation 
    45# Uses:        monks-1 
    56# Referenced:  majority.htm 
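
Reviewer note on the majority.py docstring in this changeset: it states that, when estimatorConstructor is left as None, the probability of each class is estimated as the relative frequency of examples belonging to that class. A minimal plain-Python sketch of that default follows. It is not Orange's code; `majority_model` is a hypothetical helper, and breaking ties by taking the first class encountered is an assumption made only for this illustration.

```python
from collections import Counter


def majority_model(class_values):
    """Return (majority class, {class: relative frequency}) for a list of labels."""
    counts = Counter(class_values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot estimate a class distribution from no examples")
    # Relative frequency of each class among the examples.
    distribution = dict((cls, n / float(total)) for cls, n in counts.items())
    # Assumption: ties are resolved in favour of the first class seen.
    majority = max(counts, key=counts.get)
    return majority, distribution
```

A constant classifier built from this result would return the same value and the same distribution for every instance, which is the behaviour the docstring describes for defaultVal and defaultDistribution.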