Changeset 7356:201d16be077f in orange


Timestamp:
02/03/11 23:33:29 (3 years ago)
Author:
lanz <lan.zagar@…>
Branch:
default
Convert:
d700d4d3aa05cd31a0cf36a242de6f89d7f13677
Message:

Documentation for lookup (and minor for majority).

Location:
orange
Files:
2 added
1 deleted
2 edited

  • orange/Orange/classification/lookup.py

    r7301 r7356  
    1919whether e is 1 (don't bother about the details, they follow later). 
    2020 
    21 part of `lookup-classifier.py`_ (uses: `monks-1.tab`_): 
    22  
    23 .. literalinclude:: code/lookup-classifier.py 
     21part of `lookup-lookup.py`_ (uses: `monks-1.tab`_): 
     22 
     23.. literalinclude:: code/lookup-lookup.py 
    2424    :lines: 7-21 
    2525     
     
    8080.. attribute:: variable1[, variable2[, variable3]](read only) 
    8181     
    82     The attribute(s) that the classifier uses for classification. ClassifierByLookupTable1 only has variable1, ClassifierByLookupTable2 also has variable2 and ClassifierByLookupTable3 has all three. 
     82    The feature(s) that the classifier uses for classification. 
     83    ClassifierByLookupTable1 only has variable1, 
     84    ClassifierByLookupTable2 also has variable2 and 
     85    ClassifierByLookupTable3 has all three. 
    8386 
    8487.. attribute:: variables (read only) 
     
    8891.. attribute:: noOfValues1, noOfValues2[, noOfValues3] (read only) 
    8992     
    90     The number of values for variable1, variable2 and variable3. This is stored here to make the classifier faster. Those attributes are defined only for ClassifierByLookupTable2 (the first two) and ClassifierByLookupTable3 (all three). 
     93    The number of values for variable1, variable2 and variable3. 
     94    This is stored here to make the classifier faster. These attributes 
     95    are defined only for ClassifierByLookupTable2 (the first two) and 
     96    ClassifierByLookupTable3 (all three). 
    9197 
    9298.. attribute:: lookupTable (read only) 
    9399     
    94     A list of values (ValueList), one for each possible combination of attributes. For ClassifierByLookupTable1, there is an additional element that is returned when the attribute's value is unknown. Values are ordered by values of attributes, with variable1 being the most important. In case of two three valued attributes, the list order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, where the first digit corresponds to variable1 and the second to variable2. 
    95      
    96     The list is read-only in the sense that you cannot assign a new list to this field. You can, however, change its elements. Don't change its size, though.  
     100    A list of values (ValueList), one for each possible combination of 
     101    features. For ClassifierByLookupTable1, there is an additional 
     102    element that is returned when the feature's value is unknown. 
     103    Values are ordered by values of features, with variable1 being the 
     104    most important. In case of two three valued features, the list 
     105    order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, 
     106    where the first digit corresponds to variable1 and the second to 
     107    variable2. 
     108     
     109    The list is read-only in the sense that you cannot assign a new 
     110    list to this field. You can, however, change its elements. Don't 
     111    change its size, though.  
    97112 
    98113.. attribute:: distributions (read only) 
    99114     
    100     Similar to lookupTable, but is of type DistributionList and stores a distribution for each combination of values.  
     115    Similar to :obj:`lookupTable`, but is of type DistributionList 
     116    and stores a distribution for each combination of values.  
    101117 
    102118.. attribute:: dataDescription 
    103119     
    104     An object of type EFMDataDescription, defined only for ClassifierByLookupTable2 and ClassifierByLookupTable3. They use it to make predictions when one or more attribute values are unknown. ClassifierByLookupTable1 doesn't need it since this case is covered by an additional element in lookupTable and distributions, as told above.  
     120    An object of type EFMDataDescription, defined only for 
     121    ClassifierByLookupTable2 and ClassifierByLookupTable3. They use 
     122    it to make predictions when one or more feature values are unknown. 
     123    ClassifierByLookupTable1 doesn't need it since this case is covered 
     124    by an additional element in lookupTable and distributions, 
     125    as described above.  
    105126 
    106127**Methods:** 
     
    108129.. method:: ClassifierByLookupTable(classVar, variable1[, variable2[, variable3]] [, lookupTable[, distributions]]) 
    109130 
    110     A general constructor that, based on the number of attribute descriptors, constructs one of the three classes discussed. If lookupTable and distributions are omitted, constructor also initializes lookupTable and distributions to two lists of the right sizes, but their elements are don't knows and empty distributions. If they are given, they must be of correct size. 
    111  
    112  
    113 .. _lookup-classifier.py: code/lookup-classifier.py 
     131    A general constructor that, based on the number of attribute 
     132    descriptors, constructs one of the three classes discussed. 
     133    If lookupTable and distributions are omitted, the constructor also 
     134    initializes lookupTable and distributions to two lists of the 
     135    right sizes, but their elements are don't knows and empty 
     136    distributions. If they are given, they must be of correct size. 
     137 
     138.. method:: ClassifierByLookupTable1(classVar, variable1 [, lookupTable[, distributions]]) 
     139            ClassifierByLookupTable2(classVar, variable1, variable2 [, lookupTable[, distributions]]) 
     140            ClassifierByLookupTable3(classVar, variable1, variable2, variable3 [, lookupTable[, distributions]]) 
     141     
     142    Class-specific constructors that you can call instead of the general constructor. The number of attributes must match the constructor called. 
     143 
     144.. method:: getindex(example) 
     145     
     146    Returns an index into lookupTable or distributions. The formula 
     147    depends upon the type of the classifier. If value *i* is 
     148    int(example[variable*i*]), then the corresponding formulae are 
     149 
     150    ClassifierByLookupTable1: 
     151        index = value1, or len(lookupTable)-1 if value is unknown 
     152    ClassifierByLookupTable2: 
     153        index = value1*noOfValues2 + value2, or -1 if any value is unknown  
     154    ClassifierByLookupTable3: 
     155        index = (value1*noOfValues2 + value2) * noOfValues3 + value3, or -1 if any value is unknown 
     156 
     157    Let's see some indices for randomly chosen examples from the original table. 
     158     
     159    part of `lookup-lookup.py`_ (uses: `monks-1.tab`_): 
     160 
     161    .. literalinclude:: code/lookup-lookup.py 
     162        :lines: 26-29     
     163 
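The index computation can be sketched in plain Python, without Orange. This is only an illustration: `lookup_index`, `values` and `sizes` are hypothetical names, and unknown values are represented here by `None`. For the table order to come out as described for lookupTable (variable1 most significant), each partial index is multiplied by the number of values of the feature being added.

```python
from itertools import product


def lookup_index(values, sizes):
    """Return the position of a value combination in a lookup table.

    values: integer value of each feature (None stands for unknown).
    sizes:  number of values of each feature.
    Hypothetical helper; not Orange's own implementation.
    """
    if len(values) == 1:
        # One feature: an unknown maps to the extra, last table element.
        return sizes[0] if values[0] is None else values[0]
    if any(v is None for v in values):
        return -1
    index = 0
    for value, size in zip(values, sizes):
        # Horner-style accumulation keeps variable1 most significant.
        index = index * size + value
    return index


sizes = (3, 4)  # variable1 has 3 values, variable2 has 4
order = list(product(range(sizes[0]), range(sizes[1])))
indices = [lookup_index(combo, sizes) for combo in order]
print(indices == list(range(12)))  # combinations enumerate the table in order
print(lookup_index((2, 1), sizes))
print(lookup_index((1, None), sizes))
```

Enumerating the combinations with `itertools.product` gives exactly the 1-1, 1-2, ... order of the table, which is why the computed indices come out as consecutive integers.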
     164========================== 
     165Classifier by ExampleTable 
     166========================== 
     167 
     168:obj:`ClassifierByExampleTable` is the alternative to 
     169:obj:`ClassifierByLookupTable`. It is to be used when the 
     170classification is based on more than three features. Instead of having 
     171a lookup table, it stores an :obj:`Orange.data.Table`, which is 
     172optimized for faster access. 
     173 
     174This class is used in similar contexts as 
     175:obj:`ClassifierByLookupTable`. If you write, for instance, a 
     176constructive induction algorithm, it is recommended that the values 
     177of the new feature are computed either by one of classifiers by lookup 
     178table or by :obj:`ClassifierByExampleTable`, depending on the number 
     179of bound features. 
     180 
     181**Attributes:** 
     182 
     183.. attribute:: sortedExamples 
     184     
     185    A :obj:`Orange.data.Table` with sorted instances for lookup. 
     186    Instances in the table can be merged; if there were multiple 
     187    instances with the same feature values (but possibly different 
     188    classes), they are merged into a single instance. Regardless of 
     189    merging, class values in this table are distributed: their svalue 
     190    contains a :obj:`Distribution`. 
     191 
     192.. attribute:: classifierForUnknown 
     193     
     194    This classifier is used to classify instances which were not found 
     195    in the table. If classifierForUnknown is not set, don't know's are 
     196    returned. 
     197 
     198.. attribute:: variables (read only) 
     199     
     200    A tuple with features in the domain. This field is here so that 
     201    :obj:`ClassifierByExampleTable` appears more similar to 
     202    :obj:`ClassifierByLookupTable`. If a constructive induction 
     203    algorithm returns the result in one of these classifiers, and you 
     204    would like to check which features are used, you can use variables 
     205    regardless of the class you actually got. 
     206 
     207There are no specific methods for :obj:`ClassifierByExampleTable`. 
     208Since this is a classifier, it can be called. When the instance to be 
     209classified includes unknown values, :obj:`classifierForUnknown` will be 
     210used if it is defined. 
     211 
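The calling behaviour described above can be mimicked with a small stand-in class. This is a sketch under stated assumptions, not Orange's implementation: `TableLookup` and `classifier_for_unknown` are hypothetical names, unknown values are represented by `None`, and "don't know" is represented by returning `None`.

```python
class TableLookup(object):
    """Hypothetical stand-in for a classifier backed by a table of examples."""

    def __init__(self, rows, classifier_for_unknown=None):
        # rows: iterable of (feature_values_tuple, class_value) pairs.
        self.table = dict(rows)
        self.classifier_for_unknown = classifier_for_unknown

    def __call__(self, features):
        if features in self.table:
            return self.table[features]
        # Not found (this includes tuples containing unknowns): fall back.
        if self.classifier_for_unknown is not None:
            return self.classifier_for_unknown(features)
        return None  # "don't know"


rows = [(("1", "1", "1"), "1"), (("1", "2", "2"), "0")]
plain = TableLookup(rows)
with_fallback = TableLookup(rows, classifier_for_unknown=lambda features: "0")

print(plain(("1", "2", "2")))           # found in the table
print(plain(("3", "3", "4")))           # not found, no fallback
print(with_fallback(("1", None, "2")))  # unknown value, fallback is used
```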
     212:obj:`ClassifierByExampleTable` is not really a classifier in the 
     213sense that you would use it to classify instances; it is rather a 
     214function for computing intermediate values. Still, it has an associated 
     215learner, :obj:`LookupLearner`. The learner's task is, basically, to 
     216construct a Table for :obj:`sortedExamples`. It sorts the instances, merges 
     217them and takes instance weights into account in the process. 
     218 
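The sort-and-merge step can be illustrated in plain Python. This is a minimal sketch, assuming instances are given as (features, class value, weight) triples and that the merged instance is labelled with the majority class; `merge_instances` is a hypothetical name and the real learner may differ in details such as tie breaking.

```python
from collections import defaultdict


def merge_instances(instances, class_values):
    """Merge weighted instances that share the same feature values.

    instances: iterable of (features, class_value, weight) triples.
    Returns a sorted list of (features, majority_class, distribution).
    """
    totals = defaultdict(lambda: [0.0] * len(class_values))
    for features, class_value, weight in instances:
        # Each instance adds its weight to the count of its class.
        totals[features][class_values.index(class_value)] += weight
    merged = []
    for features in sorted(totals):
        distribution = totals[features]
        majority = class_values[distribution.index(max(distribution))]
        merged.append((features, majority, distribution))
    return merged


instances = [
    (("1", "2"), "0", 1.0),
    (("1", "1"), "1", 2.0),
    (("1", "2"), "1", 0.5),
    (("1", "2"), "0", 1.0),
]
merged = merge_instances(instances, ["0", "1"])
for row in merged:
    print(row)
```

Four weighted instances collapse into two rows, one per distinct combination of feature values, each carrying the summed class weights.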
     219part of `lookup-table.py`_ (uses: `monks-1.tab`_): 
     220 
     221.. literalinclude:: code/lookup-table.py 
     222    :lines: 7-13 
     223 
     224 
     225In data_s, we have prepared a table in which instances are described 
     226only by a, b, e and the class. The learner constructs a 
     227ClassifierByExampleTable and stores instances from data_s into its 
     228sortedExamples. Instances are merged so that there are no duplicates. 
     229 
     230    >>> print len(data_s) 
     231    432 
     232    >>> print len(abe2.sortedExamples) 
     233    36 
     234    >>> for i in abe2.sortedExamples[:10]: 
     235    ...     print i 
     236    ['1', '1', '1', '1'] 
     237    ['1', '1', '2', '1'] 
     238    ['1', '1', '3', '1'] 
     239    ['1', '1', '4', '1'] 
     240    ['1', '2', '1', '1'] 
     241    ['1', '2', '2', '0'] 
     242    ['1', '2', '3', '0'] 
     243    ['1', '2', '4', '0'] 
     244    ['1', '3', '1', '1'] 
     245    ['1', '3', '2', '0'] 
     246 
     247Well, there's a bit more here than meets the eye: each instance's class 
     248value also stores the distribution of classes for all instances that 
     249were merged into it. In our case, the three features suffice to 
     250unambiguously determine the classes and, since instances covered the 
     251entire space, all distributions have 12 instances in one of the classes 
     252and none in the other. 
     253 
     254    >>> for i in abe2.sortedExamples[:10]: 
     255    ...     print i, i.getclass().svalue 
     256    ['1', '1', '1', '1'] <0.000, 12.000> 
     257    ['1', '1', '2', '1'] <0.000, 12.000> 
     258    ['1', '1', '3', '1'] <0.000, 12.000> 
     259    ['1', '1', '4', '1'] <0.000, 12.000> 
     260    ['1', '2', '1', '1'] <0.000, 12.000> 
     261    ['1', '2', '2', '0'] <12.000, 0.000> 
     262    ['1', '2', '3', '0'] <12.000, 0.000> 
     263    ['1', '2', '4', '0'] <12.000, 0.000> 
     264    ['1', '3', '1', '1'] <0.000, 12.000> 
     265    ['1', '3', '2', '0'] <12.000, 0.000> 
     266 
     267ClassifierByExampleTable will usually be used by getValueFrom. So, we 
     268would probably continue by constructing a new feature and putting the 
     269classifier into its getValueFrom. 
     270 
     271    >>> y2 = orange.EnumVariable("y2", values = ["0", "1"]) 
     272    >>> y2.getValueFrom = abe 
     273 
     274There's something disturbing here. Although abe determines the value of 
     275y2, abe.classVar is still y. Orange doesn't mind (the whole example 
     276is artificial; you will seldom pack the entire dataset into a 
     277ClassifierByExampleTable), and neither should you. Still, for the sake 
     278of hygiene, you can conclude with 
     279 
     280    >>> abe.classVar = y2 
     281 
     282The whole story can be greatly simplified. LookupLearner can also be 
     283called differently than other learners. Besides instances, you can pass 
     284the new class variable and the features that should be used for 
     285classification. This saves us from constructing data_s and reassigning 
     286the classVar. It doesn't set the getValueFrom, though. 
     287 
     288part of `lookup-table.py`_ (uses: `monks-1.tab`_):: 
     289 
     290    import Orange 
     291 
     292    table = Orange.data.Table("monks-1") 
     293    a, b, e = table.domain["a"], table.domain["b"], table.domain["e"] 
     294 
     295    y2 = Orange.data.feature.Discrete("y2", values = ["0", "1"]) 
     296    abe2 = Orange.classification.lookup.LookupLearner(y2, [a, b, e], table) 
     297 
     298Finally, let us show another use of LookupLearner. With the 
     299alternative call arguments, it offers an easy way to observe feature 
     300interactions. For this purpose, we shall omit e, and construct a 
     301ClassifierByExampleTable from a and b only. 
     302 
     303part of `lookup-table.py`_ (uses: `monks-1.tab`_): 
     304 
     305.. literalinclude:: code/lookup-table.py 
     306    :lines: 32-35 
     307 
     308The script's output shows how the classes are distributed for different 
     309values of a and b:: 
     310 
     311    ['1', '1', '1'] <0.000, 48.000> 
     312    ['1', '2', '0'] <36.000, 12.000> 
     313    ['1', '3', '0'] <36.000, 12.000> 
     314    ['2', '1', '0'] <36.000, 12.000> 
     315    ['2', '2', '1'] <0.000, 48.000> 
     316    ['2', '3', '0'] <36.000, 12.000> 
     317    ['3', '1', '0'] <36.000, 12.000> 
     318    ['3', '2', '0'] <36.000, 12.000> 
     319    ['3', '3', '1'] <0.000, 48.000> 
     320 
     321For instance, when a is '1' and b is '3', the majority class is '0', 
     322and the class distribution is 36:12 in favor of '0'. 
     323 
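The distributions listed above can be cross-checked without Orange. The sketch below does not read monks-1.tab; it assumes the file contains every combination of the six features exactly once (3 * 3 * 2 * 3 * 4 * 2 = 432 instances) and that the class follows the monks-1 concept "a equals b, or e is 1". The names `VALUES`, `counts` and `lines` are illustrative only.

```python
from collections import Counter
from itertools import product

# Assumed value sets of the six monks-1 features a..f.
VALUES = {"a": "123", "b": "123", "c": "12", "d": "123", "e": "1234", "f": "12"}
NAMES = sorted(VALUES)

counts = Counter()
for combo in product(*(VALUES[name] for name in NAMES)):
    inst = dict(zip(NAMES, combo))
    # Assumed target concept of monks-1: a == b or e == 1.
    cls = "1" if inst["a"] == inst["b"] or inst["e"] == "1" else "0"
    counts[(inst["a"], inst["b"], cls)] += 1

lines = []
for a, b in product(VALUES["a"], VALUES["b"]):
    zeros, ones = counts[(a, b, "0")], counts[(a, b, "1")]
    majority = "1" if ones > zeros else "0"
    lines.append("%s <%.3f, %.3f>" % ([a, b, majority], zeros, ones))

print("\n".join(lines))
```

When a and b are equal, all 48 matching instances are in class '1'; otherwise only the 12 instances with e equal to '1' are, which yields the 36:12 split.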
     324 
     325.. _lookup-lookup.py: code/lookup-lookup.py 
     326.. _lookup-table.py: code/lookup-table.py 
    114327.. _monks-1.tab: code/monks-1.tab 
    115328 
     
    123336              ClassifierByLookupTable2, \ 
    124337              ClassifierByLookupTable3, \ 
    125               ClassifierByExampleTable 
     338              ClassifierByExampleTable as ClassifierByDataTable 
    126339 
    127340 
  • orange/doc/Orange/rst/code/majority.py

    r7297 r7356  
    1010data = Orange.data.Table("monks-1") 
    1111 
    12 treeLearner = orange.TreeLearner() 
    13 bayesLearner = orange.BayesLearner() 
     12treeLearner = Orange.classification.tree.TreeLearner() #orange.TreeLearner() 
     13bayesLearner = Orange.classification.bayes.NaiveBayesLearner() 
    1414majorityLearner = Orange.classification.majority.MajorityLearner() 
    1515learners = [treeLearner, bayesLearner, majorityLearner] 