Changeset 10101:5318f864a842 in orange
 Timestamp:
 02/08/12 18:14:51
 Branch:
 default
 rebase_source:
 6f8a6086772a2376a8f6ac7ce2e0f8156dfa1bcf
 File:

 1 edited
Orange/statistics/estimate.py
--- Orange/statistics/estimate.py (r9671)
+++ Orange/statistics/estimate.py (r10101)
@@ -6,87 +6,242 @@
 =======================================
 
-Probability estimators are compute value probabilities.
-
-There are two branches of probability estimators:
-
-#. for unconditional and
-
-#. for conditional probabilities.
-
-For naive Bayesian classification the first compute p(C)
-and the second p(C|v), where C is a class and v is a feature value.
-
-Since probability estimation is usually based on the data, the whole
-setup is done in orange way. As for learning, where you use a learner
-to construct a classifier, in probability estimation there are estimator
-constructors whose purpose is to construct probability estimators.
-
-This page is divided into three sections. The first describes the basic
-classes, the second contains classes that are abstract or only support
-"real" estimators - you would seldom use these directly. The last section
-contains estimators and constructors that you would most often use. If
-you are not interested details, skip the first two sections.
-
-Basic classes
+Probability estimators compute probabilities of values of class variable.
+They come in two flavours:
+
+#. for unconditional probabilities (:math:`p(C=c)`, where :math:`c` is a
+   class) and
+
+#. for conditional probabilities (:math:`p(C=c|V=v)`,
+   where :math:`v` is a feature value).
+
+A duality much like the one between learners and classifiers exists between
+probability estimator constructors and probability estimators: when a
+probability estimator constructor is called with data, it constructs a
+probability estimator that can then be called with a value of class variable
+to obtain a probability of that value. This duality is mainly needed to
+enable probability estimation for continuous variables,
+where it is not possible to generate a list of probabilities of all possible
+values in advance.
+
+First, probability estimation constructors for common probability estimation
+techniques are enumerated. Base classes, knowledge of which is needed to
+develop new techniques, are described later in this document.
+
+Probability Estimation Constructors
+===================================
+
+.. class:: RelativeFrequency
+
+    Bases: :class:`EstimatorConstructor`
+
+    Compute distribution using relative frequencies of classes.
+
+    :rtype: :class:`EstimatorFromDistribution`
+
+.. class:: Laplace
+
+    Bases: :class:`EstimatorConstructor`
+
+    Use Laplace estimation to compute distribution from frequencies of classes:
+
+    .. math::
+
+        p(c) = \\frac{Nc+1}{N+n}
+
+    where :math:`Nc` is number of occurrences of an event (e.g. number of
+    instances in class c), :math:`N` is the total number of events (instances)
+    and :math:`n` is the number of different events (classes).
+
+    :rtype: :class:`EstimatorFromDistribution`
+
+.. class:: M
+
+    Bases: :class:`EstimatorConstructor`
+
+    .. method:: __init__(m)
+
+        :param m: Parameter for m-estimation.
+        :type m: int
+
+    Use m-estimation to compute distribution from frequencies of classes:
+
+    .. math::
+
+        p(c) = \\frac{Nc+m*ap(c)}{N+m}
+
+    where :math:`Nc` is number of occurrences of an event (e.g. number of
+    instances in class c), :math:`N` is the total number of events (instances)
+    and :math:`ap(c)` is the prior probability of event (class) c.
+
+    :rtype: :class:`EstimatorFromDistribution`
+
+.. class:: Kernel
+
+    Bases: :class:`EstimatorConstructor`
+
+    .. method:: __init__(min_impact, smoothing, n_points)
+
+        :param min_impact: A requested minimal weight of a point (default:
+            0.01); points with lower weights won't be taken into account.
+        :type min_impact: float
+
+        :param smoothing: Smoothing factor (default: 1.144).
+        :type smoothing: float
+
+        :param n_points: Number of points for the interpolating curve. If
+            negative, say -3 (default), 3 points will be inserted between each
+            data points.
+        :type n_points: int
+
+    Compute probabilities for continuous variable for certain number of points
+    using Gaussian kernels. The resulting pointwise continuous distribution is
+    stored as :class:`~Orange.statistics.distribution.Continuous`.
+
+    Probabilities are always computed at all points that
+    are present in the data (i.e. the existing values of the continuous
+    feature). If :obj:`n_points` is positive and greater than the
+    number of existing data points, additional points are inserted
+    between the existing points to achieve the required number of
+    points. Approximately equal number of new points is inserted between
+    each adjacent existing point each data points. If :obj:`n_points` is
+    negative, its absolute value determines the number of points to be added
+    between each two data points.
+
+    :rtype: :class:`EstimatorFromDistribution`
+
+.. class:: Loess
+
+    Bases: :class:`EstimatorConstructor`
+
+    .. method:: __init__(window_proportion, n_points)
+
+        :param window_proportion: A proportion of points in a window.
+        :type window_proportion: float
+
+        :param n_points: Number of points for the interpolating curve. If
+            negative, say -3 (default), 3 points will be inserted between each
+            data points.
+        :type n_points: int
+
+    Prepare a probability estimator that computes probability at point ``x``
+    as weighted local regression of probabilities for points in the window
+    around this point.
+
+    The window contains a prescribed proportion of original data points. The
+    window is as symmetric as possible in the sense that the leftmost point in
+    the window is approximately as far from ``x`` as the rightmost. The
+    number of points to the left of ``x`` might thus differ from the number
+    of points to the right.
+
+    Points are weighted by bicubic weight function; a weight of point
+    at ``x'`` is :math:`(1-|t|^3)^3`, where :math:`t` is
+    :math:`(x-x')/h` and :math:`h` is the distance to the farther
+    of the two window edge points.
+
+    :rtype: :class:`EstimatorFromDistribution`
+
+
+.. class:: ConditionalLoess
+
+    Bases: :class:`ConditionalEstimatorConstructor`
+
+    .. method:: __init__(window_proportion, n_points)
+
+        :param window_proportion: A proportion of points in a window.
+        :type window_proportion: float
+
+        :param n_points: Number of points for the interpolating curve. If
+            negative, say -3 (default), 3 points will be inserted between each
+            data points.
+        :type n_points: int
+
+    Construct a conditional probability estimator, in other aspects
+    similar to the one constructed by :class:`Loess`.
+
+    :rtype: :class:`ConditionalEstimatorFromDistribution`.
+
+
+Base classes
 =============
 
-Four basic abstract classes serve as roots of the hierarchy:
-:class:`Estimator`, :class:`EstimatorConstructor`,
-:class:`ConditionalEstimator` and
-:class:`ConditionalEstimatorConstructor`.
+All probability estimators are derived from two base classes: one for
+unconditional and the other for conditional probability estimation. The same
+is true for probability estimator constructors.
+
+.. class:: EstimatorConstructor
+
+    Constructor of an unconditional probability estimator.
+
+    .. method:: __call__([distribution[, apriori]], [instances[, weight_id]])
+
+        :param distribution: input distribution.
+        :type distribution: :class:`~Orange.statistics.distribution.Distribution`
+
+        :param apriori: prior distribution.
+        :type distribution: :class:`~Orange.statistics.distribution.Distribution`
+
+        :param instances: input data.
+        :type distribution: :class:`Orange.data.Table`
+
+        :param weight_id: ID of the weight attribute.
+        :type weight_id: int
+
+        If distribution is given, it can be followed by prior class
+        distribution. Similarly, instances can be followed by with
+        the ID of meta attribute with instance weights. (Hint: to pass a
+        prior distribution and instances, but no distribution,
+        just pass :obj:`None` for the latter.) When both,
+        distribution and instances are given, it is up to constructor to
+        decide what to use.
 
 .. class:: Estimator
 
     .. attribute:: supports_discrete
 
         Tells whether the estimator can handle discrete attributes.
 
     .. attribute:: supports_continuous
 
         Tells whether the estimator can handle continuous attributes.
 
     .. method:: __call__([value])
 
-        If value is given, Return the probability of the value
-        (as float). When the value is omitted, the object attempts
-        to return a distribution of probabilities for all values (as
-        :class:`~Orange.statistics.distribution.Distribution`). The
-        result can be :class:`~Orange.statistics.distribution.Discrete`
-        for discrete, :class:`~Orange.statistics.distribution.Continuous`
-        for continuous features or an instance of some other class derived
-        from :class:`~Orange.statistics.distribution.Distribution`. Note
-        that it indeed makes sense to return continuous
-        distribution. Although probabilities are stored
-        pointwise (as something similar to Python's map, where
-        keys are attribute values and items are probabilities,
-        :class:`~Orange.statistics.distribution.Distribution` can compute
-        probabilities between the recorded values by interpolation.
-
-        The estimator does not necessarily support
-        returning precomputed probabilities in form of
-        :class:`~Orange.statistics.distribution.Distribution`; in this
-        case, it simply returns None.
-
-.. class:: EstimatorConstructor
-
-    This is an abstract class; derived classes define call operators
-    that return different probability estimators. The class is
-    call-constructible (i.e., if called with appropriate parameters,
-    the constructor returns a probability estimator, not a probability
-    estimator constructor).
-
-    The call operator can accept an already computed distribution of
-    classes or a list of examples or both.
-
-    .. method:: __call__([distribution[, apriori]], [examples[,weightID]])
-
-        If distribution is given, it can be followed by apriori class
-        distribution. Similarly, examples can be followed by with
-        the ID of meta attribute with example weights. (Hint: if you
-        want to have examples and a priori distribution, but don't have
-        distribution ready, just pass None for distribution.) When both,
-        distribution and examples are given, it is up to constructor to
+        If value is given, return the probability of the value.
+
+        :rtype: float
+
+        If the value is omitted, an attempt is made
+        to return a distribution of probabilities for all values.
+
+        :rtype: :class:`~Orange.statistics.distribution.Distribution`
+            (usually :class:`~Orange.statistics.distribution.Discrete` for
+            discrete and :class:`~Orange.statistics.distribution.Continuous`
+            for continuous) or :obj:`NoneType`
+
+.. class:: ConditionalEstimatorConstructor
+
+    Constructor of a conditional probability estimator.
+
+    .. method:: __call__([table[, apriori]], [instances[, weight_id]])
+
+        :param table: input distribution.
+        :type table: :class:`Orange.statistics.contingency.Table`
+
+        :param apriori: prior distribution.
+        :type distribution: :class:`~Orange.statistics.distribution.Distribution`
+
+        :param instances: input data.
+        :type distribution: :class:`Orange.data.Table`
+
+        :param weight_id: ID of the weight attribute.
+        :type weight_id: int
+
+        If distribution is given, it can be followed by prior class
+        distribution. Similarly, instances can be followed by with
+        the ID of meta attribute with instance weights. (Hint: to pass a
+        prior distribution and instances, but no distribution,
+        just pass :obj:`None` for the latter.) When both,
+        distribution and instances are given, it is up to constructor to
         decide what to use.
-
 
 .. class:: ConditionalEstimator
@@ -95,88 +250,90 @@
     conditional probabilities.
 
-    .. method:: __call__([[Value,] ConditionValue])
-
-        When given two values, it returns a probability of
-        p(Value|Condition) (as float). When given only one value,
-        it is interpreted as condition; the estimator returns a
-        :class:`~Orange.statistics.distribution.Distribution` with
-        probabilities p(v|Condition) for each possible value v. When
-        called without arguments, it returns a :class:`Orange.statistics.contingency.Table`
-        matrix containing probabilities p(v|c) for each possible value
-        and condition; condition is used as outer variable.
+    .. method:: __call__([[value,] condition_value])
+
+        When given two values, it returns a probability of :math:`p(value|condition)`.
+
+        :rtype: float
+
+        When given only one value, it is interpreted as condition; the estimator
+        attempts to return a distribution of conditional probabilities for all
+        values.
+
+        :rtype: :class:`~Orange.statistics.distribution.Distribution`
+            (usually :class:`~Orange.statistics.distribution.Discrete` for
+            discrete and :class:`~Orange.statistics.distribution.Continuous`
+            for continuous) or :obj:`NoneType`
+
+        When called without arguments, it returns a
+        matrix containing probabilities :math:`p(value|condition)` for each
+        possible :math:`value` and :math:`condition` (a contingency table);
+        condition is used as outer
+        variable.
+
+        :rtype: :class:`Orange.statistics.contingency.Table` or :obj:`NoneType`
 
         If estimator cannot return precomputed distributions and/or
-        contingencies, it returns None.
-
-.. class:: ConditionalEstimatorConstructor
-
-    A counterpart of :class:`EstimatorConstructor`. It has
-    similar arguments, except that the first argument is not a
-    :class:`~Orange.statistics.distribution.Distribution` but
-    :class:`Orange.statistics.contingency.Table`.
-
-
-Abstract and supporting classes
-===============================
-
-There are several abstract classes that simplify the actual classes
-for probability estimation.
+        contingencies, it returns :obj:`None`.
+
+Common Components
+=================
 
 .. class:: EstimatorFromDistribution
 
+    Bases: :class:`Estimator`
+
+    Probability estimator constructors that compute probabilities for all
+    values in advance return this estimator with calculated
+    quantities in the :obj:`probabilities` attribute.
+
     .. attribute:: probabilities
 
         A precomputed list of probabilities.
 
-    There are many estimator constructors that compute
-    probabilities of classes from frequencies of classes
-    or from list of examples. Probabilities are stored as
-    :class:`~Orange.statistics.distribution.Distribution`, and
-    :class:`EstimatorFromDistribution` is returned. This is done for
-    estimators that use relative frequencies, Laplace's estimation,
-    m-estimation and even estimators that compute continuous
-    distributions.
-
-    When asked about probability of certain value, the estimator
-    returns a corresponding element of :obj:`probabilities`. Note that
-    when distribution is continuous, linear interpolation between two
-    points is used to compute the probability. When asked for a complete
-    distribution, it returns a copy of :obj:`probabilities`.
+    .. method:: __call__([value])
+
+        If value is given, return the probability of the value. For discrete
+        variables, every value has an entry in the :obj:`probabilities`
+        attribute. For continuous variables, a linear interpolation between
+        two nearest points is used to compute the probability.
+
+        :rtype: float
+
+        If the value is omitted, a copy of :obj:`probabilities` is returned.
+
+        :rtype: :class:`~Orange.statistics.distribution.Distribution`
+            (usually :class:`~Orange.statistics.distribution.Discrete` for
+            discrete and :class:`~Orange.statistics.distribution.Continuous`
+            for continuous).
 
 .. class:: ConditionalEstimatorFromDistribution
 
+    Bases: :class:`ConditionalEstimator`
+
+    Probability estimator constructors that compute the whole
+    contingency table (:class:`Orange.statistics.contingency.Table`) of
+    conditional probabilities in advance
+    return this estimator with the table in the :obj:`probabilities` attribute.
+
     .. attribute:: probabilities
 
-        A precomputed list of probabilities
-
-    This counterpart of :class:`EstimatorFromDistribution` stores
-    conditional probabilities in :class:`Orange.statistics.contingency.Table`.
-
-.. class:: ConditionalEstimatorByRows
-
-    .. attribute:: estimator_list
-
-        A list of estimators; one for each value of
-        :obj:`Condition`.
-
-    This conditional probability estimator has different estimators for
-    different values of conditional attribute. For instance, when used
-    for computing p(c|A) in naive Bayesian classifier, it would have
-    an estimator for each possible value of attribute A. This does not
-    mean that the estimators were constructed by different constructors,
-    i.e. using different probability estimation methods. This class is
-    normally used when we only have a probability estimator constructor
-    for unconditional probabilities but need to construct a conditional
-    probability estimator; the constructor is used to construct estimators
-    for subsets of original example set and the resulting estimators
-    are stored in :class:`ConditionalEstimatorByRows`.
+        A precomputed contingency table.
+
+    .. method:: __call__([[value,] condition_value])
+
+        For detailed description of handling of different combinations of
+        parameters, see the inherited :obj:`ConditionalEstimator.__call__`.
+        For behaviour with continuous variable distributions,
+        see the unconditional counterpart :obj:`EstimatorFromDistribution.__call__`.
 
 .. class:: ConditionalByRows
 
+    Bases: :class:`ConditionalEstimator`
+
     .. attribute:: estimator_constructor
 
         An unconditional probability estimator constructor.
 
-    This class computes a conditional probability estimator using
+    Computes a conditional probability estimator using
     an unconditional probability estimator constructor. The result
     can be of type :class:`ConditionalEstimatorFromDistribution`
@@ -184,136 +341,53 @@
     constructor.
 
-    The class first computes contingency matrix if it hasn't been
-    computed already. Then it calls :obj:`estimator_constructor`
-    for each value of condition attribute. If all constructed
-    estimators can return distribution of probabilities
-    for all classes (usually either all or none can), the
-    :class:`~Orange.statistics.distribution.Distribution` are put in
-    a contingency, and :class:`ConditionalEstimatorFromDistribution`
-    is constructed and returned. If constructed estimators are
-    not capable of returning distribution of probabilities,
-    a :class:`ConditionalEstimatorByRows` is constructed and the
-    estimators are stored in its :obj:`estimator_list`.
-
-
-Concrete probability estimators and constructors
-================================================
-
-.. class:: RelativeFrequency
-
-    Computes relative frequencies of classes, puts it into a Distribution
-    and returns it as :class:`EstimatorFromDistribution`.
-
-.. class:: Laplace
-
-    Uses Laplace estimation to compute probabilities from frequencies
-    of classes.
-
-    .. math::
-
-        p(c) = (Nc+1) / (N+n)
-
-    where Nc is number of occurences of an event (e.g. number of examples
-    in class c), N is the total number of events (examples) and n is
-    the number of different events (classes).
-
-    The resulting estimator is again of type
-    :class:`EstimatorFromDistribution`.
-
-.. class:: M
-
-    .. attribute:: m
-
-        Parameter for m-estimation
-
-    Uses m-estimation to compute probabilities from frequencies of
-    classes.
-
-    .. math::
-
-        p(c) = (Nc+m*ap(c)) / (N+m)
-
-    where Nc is number of occurences of an event (e.g. number of examples
-    in class c), N is the total number of events (examples) and ap(c)
-    is the apriori probability of event (class) c.
-
-    The resulting estimator is of type :class:`EstimatorFromDistribution`.
-
-.. class:: Kernel
-
-    .. attribute:: min_impact
-
-        A requested minimal weight of a point (default: 0.01); points
-        with lower weights won't be taken into account.
-
-    .. attribute:: smoothing
-
-        Smoothing factor (default: 1.144)
-
-    .. attribute:: n_points
-
-        Number of points for the interpolating curve. If negative, say -3
-        (default), 3 points will be inserted between each data points.
-
-    Useful for continuous distributions, this constructor computes
-    probabilities for certain number of points using Gaussian
-    kernels. The resulting pointwise continuous distribution is stored
-    as :class:`~Orange.statistics.distribution.Continuous` and returned
-    in :class:`EstimatorFromDistribution`.
-
-    The points at which probabilities are computed are determined
-    like this. Probabilities are always computed at all points that
-    are present in the data (i.e. the existing values of the continuous
-    attribute). If :obj:`n_points` is positive and greater than the
-    number of existing data points, additional points are inserted
-    between the existing points to achieve the required number of
-    points. Approximately equal number of new points is inserted between
-    each adjacent existing point each data points.
-
-.. class:: Loess
-
-    .. attribute:: window_proportion
-
-        A proportion of points in a window.
-
-    .. attribute:: n_points
-
-        Number of points for the interpolating curve. If negative, say -3
-        (default), 3 points will be inserted between each data points.
-
-    This method of probability estimation is similar to
-    :class:`Kernel`. They both return a curve computed at certain number
-    of points and the points are determined by the same procedure. They
-    differ, however, at the method for estimating the probabilities.
-
-    To estimate probability at point ``x``, :class:`Loess` examines a
-    window containing a prescribed proportion of original data points. The
-    window is as simetric as possible; the number of points to the left
-    of ``x`` might differ from the number to the right, but the leftmost
-    point is approximately as far from ``x`` as the rightmost. Let us
-    denote the width of the windows, e.g. the distance to the farther
-    of the two edge points, by ``h``.
-
-    Points are weighted by bicubic weight function; a weight of point
-    at ``x'`` is :math:`(1-|t|^3)^3`, where ``t`` is
-    :math:`(x-x')/h`.
-
-    Probability at point ``x`` is then computed as weighted local
-    regression of probabilities for points in the window.
-
-.. class:: ConditionalLoess
-
-    .. attribute:: window_proportion
-
-        A proportion of points in a window.
-
-    .. attribute:: n_points
-
-        Number of points for the interpolating curve. If negative, say -3
-        (default), 3 points will be inserted between each data points.
-
-    Constructs similar estimator as :class:`Loess`, except that
-    it computes conditional probabilites. The result is of type
-    :class:`ConditionalEstimatorFromDistribution`.
+    .. method:: __call__([table[, apriori]], [instances[, weight_id]], estimator)
+
+        :param table: input distribution.
+        :type table: :class:`Orange.statistics.contingency.Table`
+
+        :param apriori: prior distribution.
+        :type distribution: :class:`~Orange.statistics.distribution.Distribution`
+
+        :param instances: input data.
+        :type distribution: :class:`Orange.data.Table`
+
+        :param weight_id: ID of the weight attribute.
+        :type weight_id: int
+
+        :param estimator: unconditional probability estimator constructor.
+        :type estimator: :class:`EstimatorConstructor`
+
+        Compute contingency matrix if it has not been computed already. Then
+        call :obj:`estimator_constructor` for each value of condition attribute.
+        If all constructed estimators can return distribution of probabilities
+        for all classes (usually either all or none can), the
+        :class:`~Orange.statistics.distribution.Distribution` instances are put
+        in a contingency table
+        and :class:`ConditionalEstimatorFromDistribution`
+        is constructed and returned. If constructed estimators are
+        not capable of returning distribution of probabilities,
+        a :class:`ConditionalEstimatorByRows` is constructed and the
+        estimators are stored in its :obj:`estimator_list`.
+
+        :rtype: :class:`ConditionalEstimatorFromDistribution` or :class:`ConditionalEstimatorByRows`
+
+.. class:: ConditionalEstimatorByRows
+
+    Bases: :class:`ConditionalEstimator`
+
+    A conditional probability estimator constructors that itself uses a series
+    of estimators, one for each possible condition,
+    stored in its :obj:`estimator_list` attribute.
+
+    .. attribute:: estimator_list
+
+        A list of estimators; one for each value of :obj:`condition`.
+
+    .. method:: __call__([[value,] condition_value])
+
+        Uses estimators from :obj:`estimator_list`,
+        depending on given `condition_value`.
+        For detailed description of handling of different combinations of
+        parameters, see the inherited :obj:`ConditionalEstimator.__call__`.
 
 """
@@ -331,4 +405,5 @@
 from Orange.core import ConditionalProbabilityEstimator_FromDistribution as ConditionalEstimatorFromDistribution
 from Orange.core import ConditionalProbabilityEstimator_ByRows as ConditionalEstimatorByRows
+from Orange.core import ConditionalProbabilityEstimatorConstructor as ConditionalEstimatorConstructor
 from Orange.core import ConditionalProbabilityEstimatorConstructor_ByRows as ConditionalByRows
 from Orange.core import ConditionalProbabilityEstimatorConstructor_loess as ConditionalLoess
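
Reviewer's sketch (not part of the changeset): the formulas documented for relative frequencies, Laplace estimation and m-estimation can be checked with a few lines of plain Python. This sketch does not call Orange; the function names, the dictionary-based representation of class counts and the Python 3 division semantics are assumptions made purely for illustration.

```python
# Illustrative re-implementation of the documented formulas.
# NOT the Orange API: all names below are hypothetical.

def relative_frequency(counts):
    """p(c) = Nc / N for a mapping of class -> count."""
    total = sum(counts.values())
    return {c: n_c / total for c, n_c in counts.items()}


def laplace(counts):
    """p(c) = (Nc + 1) / (N + n), where n is the number of classes."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: (n_c + 1) / (total + n_classes) for c, n_c in counts.items()}


def m_estimate(counts, prior, m):
    """p(c) = (Nc + m * ap(c)) / (N + m), where ap(c) is the prior of c."""
    total = sum(counts.values())
    return {c: (n_c + m * prior[c]) / (total + m) for c, n_c in counts.items()}


if __name__ == "__main__":
    counts = {"yes": 3, "no": 1}
    uniform_prior = {"yes": 0.5, "no": 0.5}
    print(relative_frequency(counts))
    print(laplace(counts))
    print(m_estimate(counts, uniform_prior, m=2))
```

With a uniform prior and ``m`` equal to the number of classes, the m-estimate reduces to the Laplace estimate, since ``m * ap(c)`` is then 1 for every class.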
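
Reviewer's sketch (not part of the changeset): the point weighting described for ``Loess`` can be written out directly, assuming the weight is (1-|t|^3)^3 with t = (x-x')/h. The docstring calls this weight function bicubic; it is more commonly known as the tricube kernel. The function name and the choice to return zero for points outside the window are assumptions for illustration and are not taken from the Orange source.

```python
# Illustrative weight function for a loess window; not the Orange implementation.

def tricube_weight(x, x_prime, h):
    """Weight of a data point at x_prime when estimating at x.

    h is the window half-width (the distance from x to the farther
    of the two window edge points).
    """
    t = (x - x_prime) / h
    if abs(t) >= 1:
        # Assumption: points on or beyond the window edge get zero weight.
        return 0.0
    return (1 - abs(t) ** 3) ** 3


if __name__ == "__main__":
    # Weights decrease smoothly from 1 at the centre to 0 at the window edge.
    for x_prime in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x_prime, tricube_weight(0.0, x_prime, h=1.0))
```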