Diff for "FAQ/CombiningPvalues" - CBU statistics Wiki

Combining p-values by Stouffer's (preferred) and Fisher's (legacy) methods

Combining p-values by Stouffer's method

function pcomb = stouffer(p)
% Stouffer et al's (1949) unweighted method for combination of 
% independent p-values via z's 
    if isempty(p)
        error('stouffer was passed an empty array of p-values')
    else
        % z_i = sqrt(2)*erfinv(1-2*p_i); pcomb is the upper tail of sum(z_i)/sqrt(n)
        pcomb = (1-erf(sum(sqrt(2) * erfinv(1-2*p))/sqrt(2*length(p))))/2;
    end
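
The erf expression in this function is the usual Stouffer formula written without a normal distribution function. A short derivation, using $$\Phi$$ for the standard normal distribution function (notation introduced here, not part of the original code):

$$z_i = \Phi^{-1}(1-p_i) = \sqrt{2}\,\mathrm{erfinv}(1-2p_i), \qquad Z = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_i$$

$$p_{comb} = 1-\Phi(Z) = \frac{1}{2}\left[1-\mathrm{erf}\left(\frac{Z}{\sqrt{2}}\right)\right] = \frac{1}{2}\left[1-\mathrm{erf}\left(\frac{\sum_{i=1}^{n} z_i}{\sqrt{2n}}\right)\right]$$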

The R code below performs Stouffer's method, assuming the p-values are entered into a vector p, e.g. p <- c(0.1,0.2,0.01).

erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfinv <- function(x) qnorm((x + 1) / 2) / sqrt(2)
pcomb <- function(p) (1 - erf(sum(sqrt(2) * erfinv(1 - 2 * p)) / sqrt(2 * length(p)))) / 2
# length() never returns NA, so test for an empty vector directly
if (length(p) == 0) {
  res <- "There was an empty array of p-values"
} else {
  res <- pcomb(p)
}
print(res)
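
For readers who want a cross-check outside MATLAB and R, the same calculation can be sketched in Python with only the standard library. This is an illustrative sketch, not part of the original page: the function name `stouffer` mirrors the MATLAB version, and the rejection of p-values equal to 0 or 1 is an added assumption (such values map to infinite z-scores).

```python
from math import sqrt
from statistics import NormalDist


def stouffer(pvals):
    """Unweighted Stouffer combination of independent p-values (sketch)."""
    if len(pvals) == 0:
        raise ValueError("stouffer was passed an empty list of p-values")
    # Assumption added for this sketch: 0 and 1 give infinite z-scores.
    if any(not 0.0 < p < 1.0 for p in pvals):
        raise ValueError("p-values must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Convert each p-value to a z-score: z_i = Phi^{-1}(1 - p_i).
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvals]
    # The sum of n independent N(0,1) variates has variance n.
    z_combined = sum(z_scores) / sqrt(len(pvals))
    # Upper-tail probability of the combined z-score.
    return 1.0 - std_normal.cdf(z_combined)


print(stouffer([0.1, 0.2, 0.01]))
```

A single p-value should be returned essentially unchanged, which is a quick sanity check on any implementation.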

A [attachment:combinedp.xls spreadsheet] can also be used to compute Fisher's and Stouffer's combined p.

Combining p-values by Fisher's method

The basic idea is that if $$p_i$$ ($$i=1 \ldots n$$) are the one-sided $$p$$-values for $$n$$ independent statistics then, under the null hypothesis that each $$p_i$$ is a Uniform(0,1) variate, $$-2 \sum_{i=1}^{n} \log(p_i)$$ follows a $$\chi^2(2n)$$ distribution. Large values of this statistic indicate that the combined $$p$$-values are smaller than would be expected under that null hypothesis.
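
A brief sketch of why the reference distribution is $$\chi^2(2n)$$, assuming each $$p_i$$ is Uniform(0,1): for any $$t \ge 0$$,

$$P(-2\log p_i > t) = P(p_i < e^{-t/2}) = e^{-t/2}$$

which is the upper-tail probability of a $$\chi^2(2)$$ variate. The sum of $$n$$ independent $$\chi^2(2)$$ variates is a $$\chi^2(2n)$$ variate.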

The following MATLAB code evaluates the p-value of this statistic directly from the product of the p-values.

function p = pfast(p)
% Fisher's (1925) method for combination of independent p-values
% Code adapted from Bailey and Gribskov (1998)
    product=prod(p);
    n=length(p);
    if n<=0
        error('pfast was passed an empty array of p-values')
    elseif n==1
        % a single p-value is returned unchanged
        p = product;
        return
    elseif product == 0
        % log(0) is undefined, so return 0 directly
        p = 0;
        return
    else
        % x is half of Fisher's statistic -2*sum(log(p))
        x = -log(product);
        t=product;
        p=product;
        % accumulate product * x^i / i! for i = 1..n-1
        for i = 1:n-1
            t = t * x / i;
            p = p + t;
        end
    end
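
The loop in pfast relies on the closed form of the upper tail of a chi-square distribution with an even number of degrees of freedom. Writing $$x = -\log\prod_{j=1}^{n} p_j$$, so that Fisher's statistic is $$2x$$:

$$P(\chi^2(2n) > 2x) = e^{-x}\sum_{i=0}^{n-1}\frac{x^i}{i!} = \left(\prod_{j=1}^{n} p_j\right)\sum_{i=0}^{n-1}\frac{x^i}{i!}$$

Each pass of the loop multiplies the running term t by x/i, which gives the next term of this sum, and adds it to the total.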

Let's try it out:

>> pvals=[0.1 0.01 0.01 0.7 0.3 0.1];
>> pfast(pvals)

ans =

    0.0021

I.e. the combined p-value is 0.0021 for this array of 6 $$p$$-values.
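
The worked example can be reproduced outside MATLAB. The Python sketch below is illustrative only: the function name `fisher` is hypothetical, and it sums logarithms instead of taking the logarithm of the product, a small departure from pfast chosen here to avoid underflow when many small p-values are combined.

```python
from math import exp, log


def fisher(pvals):
    """Fisher's combined p-value via the closed-form chi-square(2n) tail (sketch)."""
    n = len(pvals)
    if n == 0:
        raise ValueError("fisher was passed an empty list of p-values")
    if any(p < 0.0 or p > 1.0 for p in pvals):
        raise ValueError("p-values must lie between 0 and 1")
    if any(p == 0.0 for p in pvals):
        return 0.0  # matches pfast, which returns 0 when the product is 0
    # x is half of Fisher's statistic -2 * sum(log(p_i)).
    x = -sum(log(p) for p in pvals)
    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term
    return exp(-x) * total


pvals = [0.1, 0.01, 0.01, 0.7, 0.3, 0.1]
print(round(fisher(pvals), 4))  # prints 0.0021, matching the MATLAB result
```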

Further investigations suggest that Fisher's method has inappropriate behaviour. [examples to be included]

This method may also be performed using [:FAQ/Rfishp: R code.]

References

Bailey TL, Gribskov M (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics, 14 (1) 48-54.

Fisher RA (1925). Statistical methods for research workers (13th edition). London: Oliver and Boyd.

Stouffer SA, Suchman EA, DeVinney LC, Star SA, Williams RM Jr (1949). Studies in Social Psychology in World War II: The American Soldier. Vol. 1, Adjustment During Army Life. Princeton: Princeton University Press.

None: FAQ/CombiningPvalues (last edited 2015-09-15 14:38:28 by PeterWatson)