Making all pairwise comparisons in a between-subjects ANOVA (with a suggested generalization to repeated measures)
Various authors (Ramsey PH and Ramsey PP (2008); Richter and McCann (2012)) recommend the Tukey-Kramer procedure for comparing all possible pairs of group means when the group variances are homogeneous; the procedure allows the group sizes to differ. It is found to give the best any-pair power if the overall F test is not significant. This procedure is computed for up to 10 means using this spreadsheet. It may also be computed using the mc or mcneq procedures, which are run by typing mc or mcneq at a UNIX prompt. Ramsey and Ramsey (2008) further recommend using the less conservative Hayter-Fisher modification of the Tukey-Kramer procedure to maximize any-pair power in exploratory studies when the overall F value is significant. This is also computed in the spreadsheet. The top line in output sheet 2 of the spreadsheet is the studentised range statistic, q(r,df), where r is the total number of groups being compared and df is the degrees of freedom of the error term in the one-way ANOVA (see Howell (2002)). The other two lines are the Tukey-Kramer p-value and the p-value from the Hayter-Fisher modification. Further details about Hayter-Fisher can be found here.
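As an illustration of the statistic behind the Tukey-Kramer procedure, the following is a minimal Python sketch that computes the studentised range statistic q for every pair of groups, using the pooled error mean square from the one-way ANOVA. The function name tukey_kramer_q and the data are hypothetical and are not taken from the spreadsheet or the mc/mcneq programs. The sketch stops at the q statistics: turning them into p-values requires the studentised range distribution with r groups and df error degrees of freedom, which is not implemented here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_kramer_q(groups):
    """Return (error_df, {(a, b): q}) for all pairs of groups.

    groups: dict mapping a group label to a list of observations.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: mean(v) for g, v in groups.items()}
    # Pooled within-group (error) mean square from the one-way ANOVA.
    ss_error = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_error = n_total - k
    ms_error = ss_error / df_error
    q = {}
    for a, b in combinations(groups, 2):
        na, nb = len(groups[a]), len(groups[b])
        # The Tukey-Kramer standard error allows unequal group sizes.
        se = sqrt(ms_error / 2 * (1 / na + 1 / nb))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return df_error, q


# Hypothetical illustrative data (not from any study).
data = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0, 10.0],
    "C": [5.0, 6.0, 7.0],
}
df_error, q_stats = tukey_kramer_q(data)
for pair, q_value in q_stats.items():
    print(pair, round(q_value, 3))
```

Each q would then be referred to the studentised range distribution with r = 3 groups and the returned error degrees of freedom.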
When quoting the results we quote the overall F from the ANOVA and mention which groups differ using Tukey or its variants. For example, we might say: "Using the Tukey-Kramer test there is a significant difference between the tumour diameter for 5-fluorouracil and the other two drugs tested, F(5,47) = 10.46, p < .05."
The Games-Howell approach is recommended by Field (2005) and Howell (2002) for all pairwise comparisons when the group variances are heterogeneous and the group sizes may also be unequal. The Games-Howell approach, as well as all the other post-hoc procedures mentioned on this page, may be computed in SPSS using the GLM:univariate procedure, which handles between-subjects post-hoc comparisons, or by using this spreadsheet.
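For readers who want to see what the Games-Howell approach computes for a single pair of groups, here is a minimal Python sketch. It assumes the usual textbook form of the procedure: each pair uses its own two variances rather than a pooled error term, with the Welch-Satterthwaite approximation for the degrees of freedom. The function name games_howell_pair and the data are hypothetical, and the sketch is not claimed to reproduce the SPSS or spreadsheet output.

```python
from math import sqrt
from statistics import mean, variance


def games_howell_pair(x, y):
    """Return (q, df) for one pair of groups without pooling variances."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = abs(mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    # Convert t to the studentised range scale.
    return sqrt(2) * t, df


# Hypothetical illustrative data (not from any study).
q, df = games_howell_pair([4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0])
print(round(q, 3), round(df, 2))
```

The resulting q would be compared with the studentised range distribution for the total number of groups and this pair-specific df.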
A flow chart detailing issues involved in the choice of post-hoc tests for between-subjects designs is here. The chart suggests two alternative tests to the Tukey-Kramer procedure mentioned above for groups with homogeneous variances but different sizes. Tukey's HSD test can be used when the sample sizes are close, since the harmonic mean can then be used to average group sizes that are not too dissimilar. Tukey's test is computed by this spreadsheet, which uses the harmonic mean of the sample sizes when they are unequal. Tukey's test (see pages 399-400 of Howell (2002)) may also be used for comparing all pairwise group differences based on a single repeated measures factor (Tabachnick and Fidell, 2007). For a worked example of using Tukey's HSD with groups from repeated measures data see here. In this article Howell suggests using stepdown approaches for comparing group means in repeated measures testing. Further details of these methods and their computation may be found here.
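The harmonic mean adjustment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name hsd_standard_error, the group sizes and the error mean square are hypothetical, and the spreadsheet's internal calculation may differ in detail.

```python
from math import sqrt
from statistics import harmonic_mean


def hsd_standard_error(ms_error, group_sizes):
    """Return (standard error, harmonic mean n) for Tukey's HSD.

    A single common group size, the harmonic mean of the actual sizes,
    replaces the individual sizes in the usual equal-n formula.
    """
    n_h = harmonic_mean(group_sizes)
    return sqrt(ms_error / n_h), n_h


# Hypothetical values: three groups of sizes 3, 4 and 3, error mean square 9/7.
se, n_h = hsd_standard_error(9 / 7, [3, 4, 3])
print(round(n_h, 4), round(se, 4))
```

Each difference between two means divided by this standard error gives a q statistic; because one standard error is shared by all pairs, the approximation is only reasonable when the group sizes are similar.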
Note (just in case you were wondering!): The above spreadsheets quote the perhaps more familiar t-statistic rather than the studentised range statistic, q (which is quoted in the output of, e.g., the mc and mcneq UNIX programs at CBSU), although we, equivalently, still use the studentised range statistic for testing. In fact, as Howell (2002, p.392) points out, q and t can be used interchangeably since q = √2 × t.
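The relationship between the two statistics can be seen by writing both for the same pair of groups i and j. In this sketch the symbols are assumptions made for illustration: the group means and sizes, and the error mean square from the ANOVA.

```latex
t = \frac{\bar{X}_i - \bar{X}_j}
         {\sqrt{MS_{error}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}},
\qquad
q = \frac{\bar{X}_i - \bar{X}_j}
         {\sqrt{\frac{MS_{error}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}
  = \sqrt{2}\, t
```

The only difference is the factor of 1/2 under the square root in the denominator of q, so converting between the two is a matter of multiplying or dividing by √2.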
Field A (2005) Discovering statistics using SPSS. Second edition. Sage:London.
Howell DC (2002) Statistical methods for psychologists. Fifth edition. Wadsworth:Pacific Grove, CA.
Ramsey PH and Ramsey PP (2008) Power of pairwise comparisons in the equal variance and unequal sample size case. British Journal of Mathematical and Statistical Psychology 61(1) 115-131.
Richter SJ and McCann MH (2012) Using the Tukey–Kramer omnibus test in the Hayter–Fisher procedure. British Journal of Mathematical and Statistical Psychology 65(3) 499–510.
Tabachnick BG and Fidell LS (2007) Using multivariate statistics. Pearson Educational:Boston, USA.