Changeset 7614:330049b0117c in orange
 Timestamp:
 02/06/11 19:39:05 (3 years ago)
 Branch:
 default
 Convert:
 510fb1eadfbe18dab98b0828b84f899b6da86e49
 File:

 1 edited

orange/Orange/statistics/distributions.py
--- r7574
+++ r7614
@@ -120,11 +120,15 @@
 
 
-==================
 Contingency Matrix
 ==================
 
-Contingency matrix contains conditional distributions. They can work for both,
-discrete and continuous variables; although examples on this page will mostly
-use discrete ones, similar code could be run for continuous variables.
+Contingency matrix contains conditional distributions. When initialized, they
+will typically contain absolute frequencies, that is, the number of instances
+with a particular combination of two variables' values. If they are normalized
+by dividing each cell by the row sum, the represent conditional probabilities
+of the column variable (here denoted as ``innerVariable``) conditioned by the
+row variable (``outerVariable``).
+
+Contingencies work with both, discrete and continuous variables.
 
 .. _distributionscontingency: code/distributionscontingency.py
…
@@ -156,139 +160,120 @@
 .. class:: Orange.statistics.distribution.Contingency
 
-.. attribute:: outerVariable (`Orange.data.feature.Feature`_)
-
-Descriptor of the outer variable.
-
-.. _`Orange.data.feature.Feature`: :obj:`Orange.data.feature.Feature`
-
-.. attribute:: innerVariable (:class:`Orange.data.feature.Feature`)
-
-Descriptor of the inner variable.
-
+.. attribute:: outerVariable
+
+Descriptor (:class:`Orange.data.feature.Feature`) of the outer variable.
+
+.. attribute:: innerVariable
+
+Descriptor (:class:`Orange.data.feature.Feature`) of the inner variable.
+
 .. attribute:: outerDistribution
 
-The distribution (`of the outer feature's values  sums of rows.
-In the above case, distribution of ``e`` is
-<108.000, 108.000, 108.000, 108.000>
+The marginal distribution (:class:`Distribution`) of the outer variable.
 
 .. attribute:: innerDistribution
 
-The distribution of the inner feature.
-In the above case, it is the class distribution
-which is <216.000, 216.000<.
-
+The marginal distribution (:class:`Distribution`) of the inner variable.
+
 .. attribute:: innerDistributionUnknown
 
-The distribution of the inner feature for the
-instances where the outer feature was unknown.
-This is the difference between the innerDistribution
-and the sum of all distributions in the matrix.
+The distribution (:class:`Distribution`) of the inner variable for
+instances for which the outer variable was undefined.
+This is the difference between the ``innerDistribution``
+and unconditional distribution of inner variable.
 
 .. attribute:: varType
 
-The varType for the outer feature (discrete, continuous...);
-varType equals outerVariable.varType and outerDistribution.varType.
-
-Contingency matrix is a cross between dictionary and a list.
-It supports standard dictionary methods keys, values and items.::
-
->> print cont.keys()
-['1', '2', '3', '4']
->>> print cont.values()
-[<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>]
->>> print cont.items()
-[('1', <0.000, 108.000>), ('2', <72.000, 36.000>),
-('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]
-
-Although keys returned by the above functions are strings,
-you can index the contingency with anything that converts into values
-of the outer feature  strings, numbers or instances of Value.::
-
->>> print cont[0]
-<0.000, 108.000>
->>> print cont["1"]
-<0.000, 108.000>
->>> print cont[Orange.data.Value(data.domain["e"], "1")]
-
-Naturally, the length of Contingency equals the number of values of the outer
-feature. The only weird thing is that iterating through contingency
-(by using a for loop, for instance) doesn't return keys, as with dictionaries,
-but dictionary values.::
-
->>> for i in cont:
-... print i
-<0.000, 108.000>
-<72.000, 36.000>
-<72.000, 36.000>
-<72.000, 36.000>
-<72.000, 36.000>
-
-If cont behaved like a normal dictionary, the above script would print out strings from '0' to '3'.
-
-
-Other methods
-
-.. class:: Orange.statistics.distributions.Contingency
-
-.. method:: add(outer_value, inner_value[, weight])
-
-Adds an element to the contingency matrix.
+The type of the outer feature (:obj:`Orange.data.Type`, usually
+:obj:`Orange.data.feature.Discrete` or
+:obj:`Orange.data.feature.Continuous`). ``varType`` equals ``outerVariable.varType`` and ``outerDistribution.varType``.
+
+.. method:: __init__(outerVariable, innerVariable)
+
+:param outerVariable: Descriptor of the outer variable
+:type outerVariable: Orange.data.feature.Feature
+:param outerVariable: Descriptor of the inner variable
+:type innerVariable: Orange.data.feature.Feature
+
+Construct an instance of ``Contingency``.
+
+.. method:: add(outer_value, inner_value[, weight=1])
+
+:param outer_value: The value for the outer variable
+:type outer_value: int, float, string or :obj:`Orange.data.Value`
+:param inner_value: The value for the inner variable
+:type inner_value: int, float, string or :obj:`Orange.data.Value`
+:param weight: Instance weight
+:type weight: float
+
+Add an element to the contingency matrix by adding
+``weight`` to the corresponding cell.
 
 .. method:: normalize()
 
-Normalizes all distributions (rows) in the contingency to sum to 1.
-It doesn't change the innerDistribution or outerDistribution.::
-
->>> cont.normalize()
->>> for val, dist in cont.items():
-print val, dist
-
-This outputs: ::
-
-1 <0.000, 1.000>
-2 <0.667, 0.333>
-3 <0.667, 0.333>
-4 <0.667, 0.333>
-
-.. _distributionscontingency2: code/distributionscontingency2.py
-
-part of `distributionscontingency2`_ (uses monks1.tab)
-
-.. literalinclude:: code/distributionscontingency2.py
-
-The "reproduction" is not perfect. We didn't care about unknown values
-and haven't computed innerDistribution and outerDistribution.
-The better way to do it is by using the method add, so that the loop becomes: ::
-
-for ins in table:
-cont.add(ins["e"], ins.getclass())
-
-It's not only simpler, but also correctly handles unknown values
-and updates innerDistribution and outerDistribution.
+Normalize all distributions (rows) in the contingency to sum to ``1``::
+
+>>> cont.normalize()
+>>> for val, dist in cont.items():
+print val, dist
+
+Output: ::
+
+1 <0.000, 1.000>
+2 <0.667, 0.333>
+3 <0.667, 0.333>
+4 <0.667, 0.333>
+
+.. note::
+
+This method doesn't change the ``innerDistribution`` or
+``outerDistribution``.
+
+With respect to indexing, contingency matrix is a cross between dictionary
+and a list. It supports standard dictionary methods ``keys``, ``values`` and
+``items``.::
+
+>> print cont.keys()
+['1', '2', '3', '4']
+>>> print cont.values()
+[<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>]
+>>> print cont.items()
+[('1', <0.000, 108.000>), ('2', <72.000, 36.000>),
+('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]
+
+Although keys returned by the above functions are strings, contingency
+can be indexed with anything that converts into values
+of the outer variable: strings, numbers or instances of ``Orange.data.Value``.::
+
+>>> print cont[0]
+<0.000, 108.000>
+>>> print cont["1"]
+<0.000, 108.000>
+>>> print cont[orange.Value(data.domain["e"], "1")]
+
+The length of ``Contingency`` equals the number of values of the outer
+variable. However, iterating through contingency
+doesn't return keys, as with dictionaries, but distributions.::
+
+>>> for i in cont:
+... print i
+<0.000, 108.000>
+<72.000, 36.000>
+<72.000, 36.000>
+<72.000, 36.000>
+<72.000, 36.000>
+
 
 .. class:: Orange.statistics.distribution.ContingencyClass
 
-ContingencyClassis an abstract base class for contingency matrices
+``ContingencyClass`` is an abstract base class for contingency matrices
 that contain the class, either as the inner or the outer
-feature. If offers a function for making filing the contingency clearer.
-
-After reading through the rest of this page you might ask yourself
-why do we need to separate the classes ContingencyAttrClass,
-ContingencyClassAttr and ContingencyAttrAttr,
-given that the underlying matrix is the same. This is to avoid confusion
-about what is in the inner and the outer variable.
-Contingency matrices are most often used to compute probabilities of conditional
-classes or features. By separating the classes and giving them specialized
-methods for computing the probabilities that are most suitable to compute
-from a particular class, the user (ie, you or the method that gets passed
-the matrix) is relieved from checking what kind of matrix it got, that is,
-where is the class and where's the feature.
-
-
+variable.
 
 .. attribute:: classVar (read only)
 
 The class attribute descriptor.
-This is always equal either to innerVariable or outerVariable
+This is always equal either to :obj:`Contingency.innerVariable` or
+``outerVariable``.
 
 .. attribute:: variable
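
The behaviour described in the revised docstring (cells accumulate weights through ``add``, ``normalize`` divides each cell by its row sum, and iteration yields rows rather than keys) can be illustrated with a small pure-Python sketch. ``ToyContingency`` is a hypothetical stand-in written for this illustration only; it is not Orange's ``Contingency`` class, and it ignores unknown values and the marginal distributions. The weights are taken from the first two rows of the docstring's example output.

```python
from collections import OrderedDict


class ToyContingency(object):
    """Hypothetical stand-in for illustration; not Orange's Contingency."""

    def __init__(self, outer_values, inner_values):
        self.inner_values = list(inner_values)
        # One row (a list of cell weights) per outer value, in declaration order.
        self.rows = OrderedDict(
            (outer, [0.0] * len(self.inner_values)) for outer in outer_values
        )

    def add(self, outer_value, inner_value, weight=1.0):
        # Add `weight` to the cell for this (outer, inner) combination.
        column = self.inner_values.index(inner_value)
        self.rows[outer_value][column] += weight

    def normalize(self):
        # Divide each cell by its row sum; rows with zero total stay unchanged.
        for row in self.rows.values():
            total = sum(row)
            if total:
                row[:] = [cell / total for cell in row]

    def keys(self):
        return list(self.rows.keys())

    def __len__(self):
        return len(self.rows)

    def __iter__(self):
        # Mirrors the documented behaviour: iterate over rows, not keys.
        return iter(self.rows.values())


cont = ToyContingency(outer_values=["1", "2"], inner_values=["0", "1"])
for outer, inner, weight in [("1", "1", 108), ("2", "0", 72), ("2", "1", 36)]:
    cont.add(outer, inner, weight)

cont.normalize()
normalized_rows = [[round(cell, 3) for cell in row] for row in cont]
print(normalized_rows)  # [[0.0, 1.0], [0.667, 0.333]]
```

After normalization each row reads as the conditional distribution of the inner variable given the outer value, which is why the second row matches the ``<0.667, 0.333>`` shown in the docstring.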