FAQ/comparezt

In some experiments it may be of interest to compare output from statistical tests, e.g. across voxels or across subjects (see here for recommended summary measures from repeated measures analyses). It is not usually meaningful, however, to compare statistical significance terms such as t, z or p-values, although p-values can be used to adjust for inflated type I error and t-values can be used to compute effect sizes.
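As a rough illustration of computing an effect size from a t-value, the sketch below uses the standard conversion r = t / sqrt(t^2 + df) for a t-test of a correlation or a single regression slope. The function name and the example values are hypothetical and chosen only for illustration.

```python
import math

def t_to_r(t, df):
    """Convert a t statistic with df degrees of freedom to a correlation-type effect size."""
    return t / math.sqrt(t * t + df)

# The same t-value implies very different effect sizes at different sample sizes.
r_large_sample = t_to_r(2.0, 401)  # df = 401
r_small_sample = t_to_r(2.0, 6)    # df = 6

print(round(r_large_sample, 3), round(r_small_sample, 3))
```

An identical t-value therefore maps onto a small effect in a large sample and a large effect in a small sample, so it is the converted effect sizes that are comparable, not the t-values themselves.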

The reason for this is that we are usually interested in comparing the magnitude of effect sizes, such as correlations and regression coefficients, across voxels or subjects. Statistical significance tells us nothing about the magnitude of the effect size or the magnitude of differences in effect sizes. It reflects instead the ratio of the effect size, or difference in effect sizes, to its standard error, which shrinks as the sample size grows. For example, using a Fisher transformation, where z = atanh(r) x sqrt(n - 3), a Pearson correlation of 0.1 from a sample of size 403 has a z-score of about 2, and so does a Pearson correlation of 0.9 from a sample of only 5 (z is approximately 2.1).
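The Fisher transformation example can be checked numerically. The following is a minimal sketch, assuming the usual large-sample test of a single Pearson correlation against zero, z = atanh(r) * sqrt(n - 3). The function names are hypothetical.

```python
import math

def fisher_z_statistic(r, n):
    """z statistic for testing a Pearson correlation r (sample size n) against zero."""
    return math.atanh(r) * math.sqrt(n - 3)

def sample_size_for_z(r, z):
    """Sample size (not rounded) at which correlation r yields the z statistic z."""
    return 3 + (z / math.atanh(r)) ** 2

# A weak correlation in a large sample ...
z_weak = fisher_z_statistic(0.1, 403)

# ... and the sample size at which a strong correlation gives the same z.
n_strong = sample_size_for_z(0.9, z_weak)

print(z_weak, n_strong)
```

Because the z statistic mixes the size of the correlation with the sample size, very different correlations can produce the same z. That is why z-scores are not a substitute for the correlations themselves when comparing across voxels or subjects.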

A note about comparing z, t statistics and p-values?