Changeset 1889:2ff9da5aed53 in orange-bioinformatics for orangecontrib/bio/obiProb.py


Ignore:
Timestamp:
10/15/13 12:03:11 (6 months ago)
Author:
markotoplak
Branch:
default
rebase_source:
4ad8392b085eabc1a632d544e7017e8668a82706
Message:

Attributes m and n in Hypergeometric test can now be swapped and results do not change. The implementation is more resistant to rounding errors.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • orangecontrib/bio/obiProb.py

    r1873 r1889  
    8787 
    8888    def p_value(self, k, N, m, n): 
    89         subtract = n - k + 1 > k 
    90 ##        result = sum([math.exp(self._logbin(m, i) + self._logbin(N - m, n - i)) for i in (range(k) if subtract else range(k, n+1))]) 
    91         result = sum([self.__call__(i, N, m, n) for i in (range(k) if subtract else range(k, n+1))]) 
    92 ##        result /= math.exp(self._logbin(N, n)) 
    93         return max(1.0 - result if subtract else result, 0.0) 
     89        #From wolfram alpha: 
     90        #1-CDF[HypergeometricDistribution[80, 50, 2000], 20-1] = 1.75695e-16 
     91        #1-CDF[HypergeometricDistribution[80, 50, 2000], 40-1] = 9.92008e-52 
    9492 
     93        if min(n,m) - k + 1 <= k: 
     94            #starting from k gives the shorter list of values 
     95            return sum(self.__call__(i, N, m, n) for i in range(k, min(n,m)+1)) 
     96        else: 
     97            value = 1.0 - sum(self.__call__(i, N, m, n) for i in (range(k))) 
     98            #if the value is small it is probably inexact due to the limited 
     99            #precision of floats, as for example  (1-(1-10e-20)) -> 0 
     100            #if so, compute the result without substraction 
     101            if value < 1e-3: #arbitary threshold 
     102                #print "INEXACT", value, sum(self.__call__(i, N, m, n) for i in range(k, min(n,m)+1)) 
     103                return sum(self.__call__(i, N, m, n) for i in range(k, min(n,m)+1)) 
     104            else: 
     105                return value 
    95106 
    96107## to speed-up FDR, calculate ahead sum([1/i for i in range(1, m+1)]), for m in [1,100000]. For higher values of m use an approximation, with error less or equal to 4.99999157277e-006. (sum([1/i for i in range(1, m+1)])  ~ log(m) + 0.5772..., 0.5572 is an Euler-Mascheroni constant)  
Note: See TracChangeset for help on using the changeset viewer.