Changeset 7495:2c9de72398fd in orange


Timestamp:
02/04/11 18:08:46 (3 years ago)
Author:
anze <anze.staric@…>
Branch:
default
Convert:
84d5c14a12e035af175ba15bf3e671400ccf65ef
Message:

bayes documentation

File:
1 edited

  • orange/Orange/classification/bayes.py

    r7477 r7495

        :lines: 7-

    -Observing probabilities shows a shift towards the third, more frequent class -
    +Observing probabilities shows a shift towards the second class -
    as compared to probabilities above, where relative frequencies were used.
    Note that the change in error estimation did not have any effect on apriori

        [0.7901746265516516, 0.8280138859667578]
    Let us load the data, induce a classifier and see how it performs on the first
    five examples.

    >>> from Orange import *
    >>> table = data.Table("lenses")
    >>> bayes = classification.bayes.NaiveLearner(table)
    >>>
    >>> for ex in table[:5]:
    ...     print ex.getclass(), bayes(ex)
    no no
    no no
    soft soft
    no no
    hard hard

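    Under the hood, a naive Bayesian classifier of this kind combines the apriori class distribution with per-attribute conditional distributions. A standalone sketch, not Orange code — the combination rule P(c|x) proportional to P(c) * product of P(c|v_i)/P(c) is our assumption about how the stored distributions are used:

    ```python
    # Hypothetical sketch (not Orange code): combine the apriori class
    # distribution with per-attribute conditional distributions P(class | value),
    # assuming the rule P(c|x) ~ P(c) * product_i [ P(c|v_i) / P(c) ].

    def naive_bayes_predict(apriori, conditionals):
        """apriori: P(c) per class; conditionals: one P(c|v_i) row per attribute."""
        scores = list(apriori)
        for row in conditionals:
            scores = [s * (p / a) for s, p, a in zip(scores, row, apriori)]
        total = sum(scores)
        return [s / total for s in scores]  # normalized to a distribution

    # Toy example with two classes and two attributes:
    probs = naive_bayes_predict([0.4, 0.6], [[0.5, 0.5], [0.2, 0.8]])
    ```

    The returned list is normalized, so it can be read directly as the class probabilities printed by the classifier.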
    The classifier is correct in all five cases. Interested in probabilities,
    maybe?

    >>> for ex in table[:5]:
    ...     print ex.getclass(), bayes(ex,
    ...         classification.Classifier.GetProbabilities)
    no <0.423, 0.000, 0.577>
    no <0.000, 0.000, 1.000>
    soft <0.000, 0.668, 0.332>
    no <0.000, 0.000, 1.000>
    hard <0.715, 0.000, 0.285>

    While very confident about the second and the fourth example, the classifier
    guessed the correct class of the first one only by a small margin of 42 vs.
    58 percent.

    Now, let us peek into the classifier.

    >>> print bayes.estimator
    None
    >>> print bayes.distribution
    <0.167, 0.208, 0.625>
    >>> print bayes.conditionalEstimators
    None
    >>> print bayes.conditionalDistributions[0]
    <'young': <0.250, 0.250, 0.500>, 'p_psby': <0.125, 0.250, 0.625>, (...)
    >>> bayes.conditionalDistributions[0]["young"]
    <0.250, 0.250, 0.500>

    The classifier has no estimator, since the probabilities are stored in
    distribution: the probability of the first class is 0.167, of the second
    0.208, and of the third 0.625. Nor does it have conditionalEstimators; those
    probabilities are stored in conditionalDistributions. We printed the
    contingency matrix for the first attribute and, in the last line, the
    conditional probabilities of the three classes when the value of the first
    attribute is "young".

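    A contingency row like the one above is just the relative frequency of each class among the examples sharing an attribute value. A standalone illustration with a hypothetical helper (not part of Orange):

    ```python
    # Hypothetical helper (not part of Orange): a conditionalDistributions row is
    # the relative frequency of each class among examples with a given value.

    from collections import Counter

    def conditional_distribution(examples, class_values):
        """examples: (attribute_value, class_value) pairs."""
        by_value = {}
        for value, cls in examples:
            by_value.setdefault(value, Counter())[cls] += 1
        return {value: [counts[c] / float(sum(counts.values())) for c in class_values]
                for value, counts in by_value.items()}

    # The 8 'young' examples in lenses split 2/2/4 across hard/soft/no,
    # which gives the <0.250, 0.250, 0.500> row shown above.
    dist = conditional_distribution(
        [("young", "hard")] * 2 + [("young", "soft")] * 2 + [("young", "no")] * 4,
        ["hard", "soft", "no"])
    print(dist["young"])  # [0.25, 0.25, 0.5]
    ```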
    Let us now use the m-estimate instead of relative frequencies.

    >>> bayesl = classification.bayes.NaiveLearner(m=2.0)
    >>> bayes = bayesl(table)

    The classifier is still correct for all examples.

    >>> for ex in table[:5]:
    ...     print ex.getclass(), bayes(ex,
    ...         classification.Classifier.GetProbabilities)
    no <0.375, 0.063, 0.562>
    no <0.016, 0.003, 0.981>
    soft <0.021, 0.607, 0.372>
    no <0.001, 0.039, 0.960>
    hard <0.632, 0.030, 0.338>

    Observing probabilities shows a shift towards the third, more frequent class -
    as compared to probabilities above, where relative frequencies were used.

    >>> print bayes.conditionalDistributions[0]
    <'young': <0.233, 0.242, 0.525>, 'p_psby': <0.133, 0.242, 0.625>, (...)

    Note that the change in probability estimation did not have any effect on the
    apriori probabilities:

    >>> print bayes.distribution
    <0.167, 0.208, 0.625>

    The reason is that this same distribution was used as the apriori
    distribution for the m-estimation.

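    The m-estimate behind these numbers can be sketched directly. Assuming the usual form P(class | value) = (n_class_value + m * P_apriori(class)) / (n_value + m), the standalone helper below (ours, not Orange's) reproduces the smoothed 'young' row printed above:

    ```python
    # Sketch of the m-estimate (standalone, not Orange code):
    #   P(class | value) = (n_class_value + m * P_apriori(class)) / (n_value + m)

    def m_estimate(class_counts, apriori, m=2.0):
        """class_counts: per-class counts among examples with one attribute value."""
        n = float(sum(class_counts))
        return [(c + m * p) / (n + m) for c, p in zip(class_counts, apriori)]

    # The 'young' value covers 8 examples with class counts (2, 2, 4); plugging in
    # the apriori distribution printed above reproduces the smoothed row:
    probs = m_estimate([2, 2, 4], [0.167, 0.208, 0.625], m=2.0)
    print(["%.3f" % p for p in probs])  # ['0.233', '0.242', '0.525']
    ```

    Each conditional probability is pulled towards the apriori class distribution, with m controlling the strength of the pull.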
    Finally, let us show an example with continuous attributes. We will use the
    iris dataset, which contains four continuous attributes and no discrete ones.

    >>> table = data.Table("iris")
    >>> bayes = classification.bayes.NaiveLearner(table)
    >>> for exi in range(0, len(table), 20):
    ...     print table[exi].getclass(), bayes(table[exi],
    ...         classification.Classifier.GetBoth)

    The classifier works well. To get a glimpse of how it works, let us observe
    the conditional distributions for the first attribute. They are stored in
    conditionalDistributions, as before, except that they now behave as a
    dictionary, not as a list (see the documentation on distributions).

    >>> print bayes.conditionalDistributions[0]
    <4.300: <0.837, 0.137, 0.026>, 4.333: <0.834, 0.140, 0.026>, 4.367: <0.830, \
    (...)

    For a nicer picture, we can print out the probabilities, copy and paste them
    into some graph drawing program ... and get something like the figure below.

    >>> for x, probs in bayes.conditionalDistributions[0].items():
    ...     print "%5.3f\t%5.3f\t%5.3f\t%5.3f" % (x, probs[0], probs[1], probs[2])
    4.300   0.837   0.137   0.026
    4.333   0.834   0.140   0.026
    4.367   0.830   0.144   0.026
    4.400   0.826   0.147   0.027
    4.433   0.823   0.150   0.027
    (...)

    If sepal lengths are shorter, the most probable class is "setosa". Irises
    with middle sepal lengths belong to "versicolor", while longer sepal lengths
    indicate "virginica". The critical values, where the decision would change,
    are at about 5.4 and 6.3.

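    Such critical values can also be read off the printed table programmatically: scan the rows and report where the most probable class changes. A hypothetical helper with made-up rows of the same shape as the output above:

    ```python
    # Hypothetical helper: find x values where the most probable class changes
    # in a sorted (x, probabilities) table like the one printed above.

    def decision_changes(table):
        """table: list of (x, [p0, p1, p2]) rows, sorted by x."""
        changes = []
        prev = None
        for x, probs in table:
            best = max(range(len(probs)), key=lambda i: probs[i])
            if prev is not None and best != prev:
                changes.append((x, best))
            prev = best
        return changes

    # Toy rows around a crossover (made-up numbers, not the iris output):
    rows = [(5.367, [0.51, 0.41, 0.08]),
            (5.400, [0.47, 0.45, 0.08]),  # class 0 still wins here
            (5.433, [0.44, 0.48, 0.08])]  # class 1 takes over here
    print(decision_changes(rows))  # [(5.433, 1)]
    ```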
    It is important to stress that the curves are relatively smooth, although no
    fitting of parameters (either manual or automatic) took place.

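    One way curves this smooth can arise is by kernel smoothing of per-example class indicators over the attribute's range. The sketch below uses a Gaussian kernel purely for illustration; Orange's actual estimator for continuous attributes may differ:

    ```python
    # Illustration only (not Orange's implementation): Gaussian kernel smoothing
    # of class-indicator rows produces smooth conditional probability curves.

    import math

    def smooth_probs(points, xs, bandwidth=0.3):
        """points: (x, [p_class0, p_class1, ...]) rows; xs: where to evaluate."""
        out = []
        for x0 in xs:
            weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
                       for x, _ in points]
            total = sum(weights)
            k = len(points[0][1])
            row = [sum(w * p[i] for w, (_, p) in zip(weights, points)) / total
                   for i in range(k)]
            out.append((x0, row))
        return out

    # Three training points: two of class 0 near 4.3-4.5, one of class 1 at 6.0.
    pts = [(4.3, [1.0, 0.0]), (4.5, [1.0, 0.0]), (6.0, [0.0, 1.0])]
    smoothed = smooth_probs(pts, [4.4, 6.0])
    ```

    Because each output row is a weighted average of rows that sum to one, the smoothed rows are valid probability distributions by construction.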
    .. _bayes-run.py: code/bayes-run.py
    .. _bayes-thresholdAdjustment.py: code/bayes-thresholdAdjustment.py
    .. _bayes-mestimate.py: code/bayes-mestimate.py
    .. _adult-sample.tab: code/adult-sample.tab
    .. _iris.tab: code/iris.tab