## Adjusted p-values in SPSS and R

Howell DC (1992, 1997, 2002) describes various ways of adjusting uncorrected p-values obtained from comparing all possible pairs of repeated measures group means. In particular, Howell recommends and describes the SPSS macros rmpost1.sps and rmpost2.sps, first written by David Nichols of SPSS, which use the Bonferroni correction to control type I error when performing multiple t-tests between the levels of a repeated measures factor.

The macro can then be run using the syntax below, substituting as many variable names as needed for the repeated measures levels (called 'Reading', 'memory', 'attentin' and 'speech' in the example below).

```
include rmpostb.sps.
rmpost var=Reading memory attentin speech /alpha = .05.
Execute.
```
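The Bonferroni arithmetic behind rmpost can be illustrated with a short Python sketch. It is not a reimplementation of the macro; it assumes alpha = .05 and that every pair of levels is tested. With the four example levels there are 4 × 3 / 2 = 6 pairwise comparisons, so each paired t-test is judged against .05 / 6.

```python
from itertools import combinations

# Repeated measures levels taken from the rmpost example
levels = ["Reading", "memory", "attentin", "speech"]
alpha = 0.05

# All possible pairs of levels: k*(k-1)/2 paired comparisons
pairs = list(combinations(levels, 2))
m = len(pairs)

# Bonferroni: test each pair at alpha/m so the familywise error rate is at most alpha
per_test_alpha = alpha / m

for a, b in pairs:
    print(f"{a} vs {b}: significant if p <= {per_test_alpha:.4f}")
```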

An adapted SPSS macro that also computes Holm and Sidak variants is given below. In particular, the Holm-Bonferroni method is recommended for multiple testing of several correlations from the same matrix by Larzelere and Mulaik (1977) and Howell (2002, pp. 388-390). Other work suggests that the Holm-Bonferroni method may be used for correlation matrices smaller than 15 by 15. Bonferroni methods can, however, be conservative, so a Holm-Sidak approach is also available, output as *downsidk* by the macro below. Both the Holm-Bonferroni (recommended here) and Holm-Sidak *step-down* methods, and the FDR method, which is gaining popularity in imaging studies, may also be performed using a spreadsheet or with R. Klockars, Hancock and McAweeney (1995) show that Holm procedures, which use different (weighted) significance levels for the ordered p-values, have greater power to detect a variety of post-hoc differences than the Bonferroni approach, which uses the same (unweighted) significance cut-off for all the p-values. The p.adjust function in R adjusts a set of p-values using a variety of methods, including Bonferroni, Holm, Hochberg and the Benjamini-Hochberg FDR method.
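A minimal Python sketch of the single-step and step-down adjustments discussed above follows. It assumes the conventions of R's p.adjust: adjusted p-values are capped at 1 and step-down values are made monotone with a running maximum, so the Holm and Bonferroni results should match p.adjust(p, "holm") and p.adjust(p, "bonferroni"). The Sidak variants are not p.adjust methods. This convention differs from the SPSS macro, which reports -99 once a step-down comparison is nonsignificant. The function names are our own.

```python
def bonferroni(p):
    """Single-step Bonferroni: multiply each p-value by m, capped at 1."""
    m = len(p)
    return [min(1.0, m * pi) for pi in p]


def sidak(p):
    """Single-step Sidak: 1 - (1 - p)^m."""
    m = len(p)
    return [1.0 - (1.0 - pi) ** m for pi in p]


def _step_down(p, adjust):
    """Generic step-down adjustment.

    Sorts p ascending, applies adjust(p_i, r) where r = m - i + 1 is the
    number of hypotheses still in play at rank i, caps at 1, enforces
    monotonicity with a running maximum and maps back to input order.
    """
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for i, j in enumerate(order):  # i is the 0-based rank
        r = m - i                  # hypotheses remaining
        running_max = max(running_max, min(1.0, adjust(p[j], r)))
        adjusted[j] = running_max
    return adjusted


def holm(p):
    """Holm-Bonferroni step-down: (m - i + 1) * p_(i), made monotone."""
    return _step_down(p, lambda pi, r: r * pi)


def holm_sidak(p):
    """Holm-Sidak step-down: 1 - (1 - p_(i))^(m - i + 1), made monotone."""
    return _step_down(p, lambda pi, r: 1.0 - (1.0 - pi) ** r)


pvals = [0.266, 0.139, 0.016]  # the example p-values used in the SPSS macro
print("Bonferroni:", [round(x, 4) for x in bonferroni(pvals)])
print("Holm:      ", [round(x, 4) for x in holm(pvals)])
print("Holm-Sidak:", [round(x, 4) for x in holm_sidak(pvals)])
```

For the macro's example p-values (0.266, 0.139, 0.016) the Holm values are 0.278, 0.278 and 0.048 in input order; the middle value is raised from 0.266 to 0.278 by the monotonicity step.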

Lix and Sajobi (2010) state that the FDR approach above (Benjamini and Hochberg, 1995) is more powerful than both the Bonferroni method and the Hochberg (1988) method, particularly as the number of tests increases, and that it also controls the familywise error rate 'in a weak sense'. They recommend the Hochberg (1988) method for comparing post-hoc tests in repeated measures because it has good power, and so, indirectly, they also support the FDR method, which is more powerful still.
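A minimal Python sketch of the two step-up procedures compared by Lix and Sajobi, assuming the same conventions as R's p.adjust methods "BH" and "hochberg": the p-values are processed from largest to smallest, multiplied by a rank-dependent factor, made monotone with a running minimum and capped at 1. The helper `_step_up` and the function names are our own.

```python
def _step_up(p, multiplier):
    """Generic step-up adjustment.

    Works from the largest p-value down, multiplies p_(i) (ascending rank i,
    1-based) by multiplier(i, m), takes a running minimum (starting at 1, which
    also caps the result at 1) and maps back to the input order.
    """
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0
    for k, j in enumerate(order):
        i = m - k  # ascending rank of p[j]
        running_min = min(running_min, multiplier(i, m) * p[j])
        adjusted[j] = running_min
    return adjusted


def benjamini_hochberg(p):
    """FDR (Benjamini and Hochberg, 1995): p_(i) * m / i, made monotone."""
    return _step_up(p, lambda i, m: m / i)


def hochberg(p):
    """Hochberg (1988) step-up: p_(i) * (m - i + 1), made monotone."""
    return _step_up(p, lambda i, m: m - i + 1)


pvals = [0.266, 0.139, 0.016]  # the example p-values used in the SPSS macro
print("BH (FDR):", [round(x, 4) for x in benjamini_hochberg(pvals)])
print("Hochberg:", [round(x, 4) for x in hochberg(pvals)])
```

For these three p-values the sketch gives BH values of 0.266, 0.2085 and 0.048 and Hochberg values of 0.266, 0.266 and 0.048, so the FDR adjustment is the least conservative of the two here.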

Keselman et al. (2011, 2012) compare FDR with a range of methods that control the generalized familywise error rate (termed k-FWER), which cap at 5% the chance of making k or more false rejections, and find FDR and the Sarkar method to be more powerful. Note: Holm's method, fitted below, corresponds to the 1-FWER, i.e. it caps the chance of making even one false rejection.
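As an illustration of a k-FWER method, we assume here the generalized Holm step-down procedure of Lehmann and Romano (2005); this is a sketch, not necessarily one of the methods evaluated by Keselman et al. Its critical values are k·alpha/m for the k smallest p-values and k·alpha/(m + k - i) for rank i > k. With k = 1 these reduce to Holm's alpha/(m - i + 1), which is the sense in which Holm's method is the 1-FWER case. The function name and the p-values in the example are hypothetical.

```python
def generalized_holm(p, k=1, alpha=0.05):
    """Step-down k-FWER procedure (Lehmann and Romano, 2005), as a sketch.

    Critical values: alpha_i = k*alpha/m          for i <= k
                     alpha_i = k*alpha/(m + k - i) for i > k  (i = 1-based rank)
    With k = 1 this is Holm's method. Returns booleans (True = rejected)
    in the input order.
    """
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])
    rejected = [False] * m
    for i, j in enumerate(order, start=1):
        crit = k * alpha / m if i <= k else k * alpha / (m + k - i)
        if p[j] > crit:
            break  # step-down: stop at the first non-rejection
        rejected[j] = True
    return rejected


# Hypothetical, made-up p-values for illustration only
pvals = [0.001, 0.008, 0.011, 0.02, 0.04, 0.3]
print("k=1 (Holm):", sum(generalized_holm(pvals, k=1)), "rejections")
print("k=2:       ", sum(generalized_holm(pvals, k=2)), "rejections")
```

With these made-up p-values, Holm (k = 1) rejects three hypotheses and k = 2 rejects four, at the cost of capping only the chance of two or more false rejections.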

Note that Howell DC (1997, p. 351) states that it is not necessary to adjust for post-hoc tests (or even to have an overall statistically significant F test) if you are interested in testing a specific comparison: "Current thinking and the logic behind most of our post-hoc tests, however does not require overall significance before making specific comparisons".

There is also a *multcompare* function in MATLAB which compares all pairs of group means, by default using the Tukey-Kramer test.

References

Benjamini Y and Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. *Journal of the Royal Statistical Society, Series B* **57** 289-300.

Howell DC (1997) *Statistical methods for psychology*. Fourth Edition. Wadsworth: Belmont, CA.

Howell DC (2002) *Statistical methods for psychology*. Fifth Edition. Wadsworth: Pacific Grove, CA.

Keselman HJ, Miller CW and Holland B (2011) Many tests of significance: new methods for controlling type I errors. *Psychological Methods* **16(4)** 420-431. Note: errata in the R code listed in this paper for evaluating commonly used post-hoc tests are given in Keselman et al. (2012).

Keselman HJ, Miller CW and Holland B (2012) Many tests of significance: new methods for controlling type I errors: correction to Keselman et al. (2011). *Psychological Methods* **17(4)** 679.

Klockars AJ, Hancock GR and McAweeney MJ (1995) Power of unweighted and weighted versions of simultaneous and sequential multiple-comparison procedures. *Psychological Bulletin* 300-307.

Larzelere RE and Mulaik SA (1977) Single-sample tests for many correlations. *Psychological Bulletin* **84** 557-569.

Lix LM and Sajobi T (2010) Testing multiple outcomes in repeated measures designs. *Psychological Methods* **15(3)** 268-280.

Copy and paste the syntax below into an SPSS syntax window, select all and run it. Edit the input data as required.

```
* Enter a column of p-values and this macro will adjust them for
* the number of p-values in the column.
* The Ryan and Einot & Gabriel methods are for pairwise
* comparisons of group locations (e.g. means, mean ranks) with a
* step size of abs(j - i) + 1, where the higher of the two means
* has an overall rank of j and the lower an overall rank of i.
* SPSS uses REGWQ to compute this for pairwise comparisons of
* group means in GLM Univariate for between-subjects factors.
* The macro could be applied to p-values from ANY procedure
* (e.g. nonparametric tests) as it uses only the p-values and
* the number of comparisons.
* Create a dataset with all the uncorrected p-values and
* step = abs(difference in ranks of group locations) + 1.
* Adjust the data input below as required.
* If interested ONLY in the Holm and Sidak methods, put
* step = 1 for all input p-values.
* The program creates a file called temp.sps in the My Documents
* folder, which may be deleted after running the macro. Edit the
* path in the WRITE OUTFILE and INCLUDE FILE commands to suit
* your machine.
* -99 in the output for the Holm and step-down Sidak procedures
* indicates that the pairwise comparison is not tested and is
* deemed nonsignificant because the previous comparison was
* nonsignificant (at p = 0.05 by default). This may be changed
* by editing the last line of this syntax.
DATA LIST list / PVAL(f9.3) STEP (f2.0).
BEGIN DATA
0.266 2
0.139 3
0.016 2
END DATA.
set errors=none.
set mprint=off.
DEFINE PV(PVALUE=!TOKENS(1) /STEP=!TOKENS(1) /ALP=!TOKENS(1)).
SORT CASES BY !PVALUE (A).
COMPUTE pos=$CASENUM.
* Calculate the number of p-values.
RANK !PVALUE /n into N.
* N contains the number of cases in the file.
* Make a submacro to be invoked from the syntax.
DO IF $CASENUM=1.
WRITE OUTFILE 'C:\Documents and Settings\peterw\My Documents\temp.sps'
  /"DEFINE !nbcases()"n"!ENDDEFINE.".
END IF.
EXE.
INCLUDE FILE='C:\Documents and Settings\peterw\My Documents\temp.sps'.
/* The number of cases in the file is now accessible using !nbcases */.
COMPUTE bonferr=!PVALUE*!nbcases.
IF (bonferr>1) bonferr=1.
COMPUTE sidak=1-(1-!PVALUE)**!nbcases.
COMPUTE holm=(!nbcases-pos+1)*!PVALUE.
IF (LAG(holm,1)>!ALP | LAG(holm,1)=-99) holm=-99.
COMPUTE downsidk=1-(1-!PVALUE)**(!nbcases-pos+1).
IF (LAG(downsidk,1)>!ALP | LAG(downsidk,1)=-99) downsidk=-99.
COMPUTE ryan=!PVALUE*!nbcases/!STEP.
IF (ryan>1) ryan=1.
COMPUTE eingab=1-(1-!PVALUE)**(!nbcases/!STEP).
IF (eingab>1) eingab=1.
FORMATS bonferr to downsidk ryan eingab (f7.3).
VARIABLE LABELS !PVALUE 'Original' /bonferr '1-step Bonferroni'
  /sidak '1-step Sidak' /holm 'Step-down Holm`s' /downsidk 'Step-down Sidak'
  /ryan 'Ryan' /eingab 'Einot & Gabriel' /STEP 'Step'.
EXECUTE.
REPORT FORMAT=LIST AUTOMATIC ALIGN(CENTER)
  /VARIABLES=!PVALUE bonferr sidak holm downsidk !STEP ryan eingab
  /TITLE "Original and adjusted p-values".
!ENDDEFINE.
* Change the value of ALP to re-specify the significance level.
PV PVALUE=PVAL STEP=STEP ALP=0.05.
```
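To check the macro's arithmetic outside SPSS, the following Python sketch transcribes its COMPUTE and IF statements for the case-by-case loop. It is our own transcription, not part of the original macro: the function name `macro_adjust` is hypothetical and the dictionary keys mirror the macro's variable names. As with SPSS's LAG, the first (smallest) p-value is never set to -99.

```python
def macro_adjust(pvals, steps, alp=0.05):
    """Transcription (a sketch) of the PV macro's COMPUTE/IF logic.

    Rows are returned sorted by ascending p-value, as SORT CASES does in
    the macro. -99 marks Holm / step-down Sidak comparisons that are not
    tested because the previous (smaller) p-value's adjusted value
    exceeded alp or was itself -99.
    """
    n = len(pvals)
    rows = []
    prev_holm = prev_dsid = None  # LAG() is missing for the first case
    for pos, (p, step) in enumerate(sorted(zip(pvals, steps)), start=1):
        holm = (n - pos + 1) * p
        if prev_holm is not None and (prev_holm > alp or prev_holm == -99):
            holm = -99
        dsid = 1 - (1 - p) ** (n - pos + 1)
        if prev_dsid is not None and (prev_dsid > alp or prev_dsid == -99):
            dsid = -99
        rows.append({
            "pval": p,
            "step": step,
            "bonferr": min(1.0, p * n),
            "sidak": 1 - (1 - p) ** n,
            "holm": holm,
            "downsidk": dsid,
            "ryan": min(1.0, p * n / step),
            "eingab": min(1.0, 1 - (1 - p) ** (n / step)),
        })
        prev_holm, prev_dsid = holm, dsid
    return rows


# Example data from the macro's DATA LIST: p-values and their steps
for row in macro_adjust([0.266, 0.139, 0.016], [2, 3, 2]):
    print({k: (v if v == -99 else round(v, 3)) for k, v in row.items()})
```

With the macro's example data, the largest p-value (0.266) gets -99 for both step-down methods, because the adjusted value of the second-smallest p-value (0.278 for Holm) already exceeds .05.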