# Can I do an analysis of covariance using regression (including computing covariate adjusted group means)?

You can run an ANCOVA using regression: just enter the group indicator and the covariates as independent variables. With the two groups coded 0 and 1, the regression coefficient for the group term is the difference between the two group means adjusted for the covariates, and its standard error is the standard error of that difference.
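A minimal sketch of the regression route in Python (standard library only). It assumes two groups coded 0 and 1 and a single covariate; the function names `solve` and `ancova_regression` and the data are hypothetical illustrations, not an SPSS feature. It fits Y = b0 + b_group·group + b_x·X by the normal equations, so b_group is the covariate-adjusted difference in group means.

```python
# Sketch: ANCOVA as ordinary least squares on a 0/1 group indicator and one covariate.
# Names and data are hypothetical illustrations.

def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ancova_regression(y, group, x):
    """Return [b0, b_group, b_x] for y = b0 + b_group*group + b_x*x."""
    rows = [[1.0, float(g), float(xi)] for g, xi in zip(group, x)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


if __name__ == "__main__":
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    x = [1, 2, 3, 4, 3, 4, 5, 6]
    y = [1.8, 2.1, 2.7, 2.8, 4.9, 5.0, 5.2, 6.0]
    b0, b_group, b_x = ancova_regression(y, group, x)
    print(f"adjusted group difference = {b_group:.3f}, covariate slope = {b_x:.3f}")
```

The coefficient for X here is the pooled within-group slope, which is why b_group equals the difference in the adjusted means given by the formula further down.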

You can also fit an ANCOVA, and obtain useful output such as covariate-adjusted group means, in SPSS using the GLM Univariate procedure (Analyze > General Linear Model > Univariate > Options). SPSS calls the covariate-adjusted means *Estimated Marginal Means*. 'Marginal' is used because the group means (e.g. for males and females) are computed by pooling across the covariate (e.g. evaluating each group at the overall age mean). We remove age differences and end up with an age-pooled one-way layout of group means. These are akin to the (gender) edges, or *margins*, of a higher order (age by gender) two-way table, obtained by collapsing across the rows (ages) to get overall column (gender) means. Chapter 7 of Boniface (1995) gives illustrations of computing ANCOVA adjusted means.

To obtain these covariate-adjusted means, put the group factor in the display means box (top right), tick the *Compare main effects* box directly underneath, and run the ANCOVA as normal.

You also get the bonus of a 95% confidence interval for the covariate-adjusted difference in the group means.

Algebraically,

$$ \mbox{Y group mean adjusted for X} = \mbox{ymean}(G) - B\,\big(\mbox{xmean}(G) - \mbox{xmean}\big) $$

where B is the regression coefficient for X in a regression with Y as the response and group as the other predictor, ymean(G) and xmean(G) are the Y and X means in group G, and xmean is the overall X mean. Notice from this formula that there is no adjustment for X if the groups share the same X mean.
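The formula can be sketched in Python as below; the function name `adjusted_means` and the data are hypothetical. It assumes B is computed as the pooled within-group slope of Y on X, which is numerically the same as the regression coefficient for X when group is the other predictor, and it works for any number of groups.

```python
def adjusted_means(y, group, x):
    """Covariate-adjusted group means: ymean(G) - B * (xmean(G) - xmean).

    B is the pooled within-group slope of y on x. Returns (B, {group: adjusted mean}).
    """
    xbar = sum(x) / len(x)  # overall covariate mean
    sxy = sxx = 0.0
    raw = {}
    for g in sorted(set(group)):
        xs = [xi for xi, gi in zip(x, group) if gi == g]
        ys = [yi for yi, gi in zip(y, group) if gi == g]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        # Accumulate within-group cross-products and sums of squares.
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
        raw[g] = (my, mx)
    b = sxy / sxx
    return b, {g: my - b * (mx - xbar) for g, (my, mx) in raw.items()}


# Groups with identical covariate means: the adjusted means equal the raw means.
b, means = adjusted_means([2, 3, 5, 4, 6, 7], [1, 1, 1, 2, 2, 2], [1, 2, 3, 1, 2, 3])
print(b, means)
```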

This approach (ANCOVA) is recommended for comparing groups because the regression coefficient, B, adjusts for regression to the mean caused by the groups having different baseline scores, X (Barnett, van der Pols and Dobson, 2005; Vickers and Altman, 2001; Senn, 2011). The adjustment is needed because, owing to measurement error, people with low baseline values will on average score higher at follow-up, while those with high baseline values will on average score lower. Mean group follow-up scores could, therefore, differ simply because the groups have different distributions of baseline values. ANCOVA adjusts for different group baselines and hence removes this dangerously arbitrary 'polluting' difference from the comparison of group mean follow-up scores.

The statistical significance of this difference can be assessed either by reading off the t or F value for group (with X as the other predictor) from the regression, or by performing an unpaired t test on N-3 degrees of freedom on the adjusted Y group means using their standard errors (s.e.s).

The unpaired t statistic on N-3 df for the difference in a pair of adjusted group means is

$$ t_{N-3} = \frac{\mbox{Difference in adjusted group means}}{\sqrt{\mbox{se1}^{2} + \mbox{se2}^{2}}} $$

where se1 and se2 are the standard errors of the two adjusted means.

For example, suppose we wish to compare the mean difference between actual and predicted feelings (Y) in two equal-sized groups (G) of size 7, adjusted for predicted feelings (X). We can fit an ANCOVA in SPSS using the GLM Univariate (UNIANOVA) procedure (syntax below) and obtain F(1, 11) = 3.44, p = 0.09.

```
UNIANOVA DIFFERENCE BY GROUP WITH ESTIMATED
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(GROUP)
  /CRITERIA = ALPHA(.05)
  /DESIGN = GROUP ESTIMATED .
```

The adjusted emotional difference means are also output by this syntax (via the /EMMEANS subcommand): 3.73 and 1.98 for the two groups respectively, both with a standard error of 0.667. [For two equal-sized groups the adjusted means will *always* share the same standard error: with equal n the two group covariate means lie equally far either side of the overall covariate mean, so the formula below gives the same value for each group. It reduces to sqrt(MSE/n(i)), where MSE is the mean square error from the ANCOVA table, only when the group covariate means coincide.] See Howell (2013, Chapter 16, pp. 600-604) for a formula for the s.e. of the difference in a pair of adjusted means (out of a total of five means). In Howell's example on pages 602-3, this formula (s.e.(difference) = sqrt(0.127) = 0.36) agrees closely with the square root of the sum of the squared s.e.s output by SPSS, as described above (s.e.(difference) = sqrt(0.23^2 + 0.25^2) = 0.34). In fact, using Howell's notation from page 602, the standard error output by SPSS for the k-th group equals

$$ \mbox{s.e.(adj group k mean)} = \sqrt{\mbox{MSE}\left(\frac{1}{n(k)} + \frac{(\mbox{meanc}(k) - \mbox{meanc})^{2}}{\mbox{SSe}(c)}\right)} $$

with k-th group size n(k), k-th covariate mean meanc(k), overall covariate mean (pooled over all the groups) of meanc, mean square error of ANCOVA, MSE, and error sum of squares in an ANOVA of group with the *covariate* as the response, SSe(c).
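A small Python sketch of this s.e. formula. The function name and the numbers are hypothetical; in practice MSE comes from the ANCOVA table and SSe(c) from a one-way ANOVA with the covariate as the response. With two equal-sized groups the covariate means sit symmetrically about the overall mean, so the two s.e.s coincide.

```python
import math

def adjusted_mean_se(mse, n_k, meanc_k, meanc, sse_c):
    """s.e. of the k-th covariate-adjusted group mean in a one-covariate ANCOVA."""
    return math.sqrt(mse * (1.0 / n_k + (meanc_k - meanc) ** 2 / sse_c))

# Hypothetical values: two groups of 7 with covariate means 4 and 6,
# so the overall covariate mean is 5.
mse, n, sse_c = 2.0, 7, 20.0
se1 = adjusted_mean_se(mse, n, 4.0, 5.0, sse_c)
se2 = adjusted_mean_se(mse, n, 6.0, 5.0, sse_c)
print(se1 == se2)  # True: both group covariate means are 1 unit from the overall mean
```

When a group's covariate mean equals the overall covariate mean, the second term vanishes and the s.e. is simply sqrt(MSE/n(k)).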

The unpaired t test comparing the adjusted difference means gives t = (3.73 - 1.98)/sqrt(2 × 0.667^2) = 1.85. Squaring this value gives 3.44, the F(1, 11) value from the ANCOVA: an F statistic with 1 numerator degree of freedom is the square of a t statistic on the same error degrees of freedom, dfe. The two approaches therefore agree for these equal-sized groups. In general, the ANCOVA and unpaired t test approaches give very similar results if the groups are of similar sizes and have similar covariate means.
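The arithmetic of this worked example can be checked with a few lines of Python; `adjusted_t` is a hypothetical helper name, and the inputs are the adjusted means and s.e.s quoted above.

```python
import math

def adjusted_t(mean1, mean2, se1, se2):
    """Unpaired t for two adjusted means: difference / sqrt(se1^2 + se2^2)."""
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Adjusted means 3.73 and 1.98, each with s.e. 0.667 (N = 14, so df = N - 3 = 11).
t = adjusted_t(3.73, 1.98, 0.667, 0.667)
print(f"t(11) = {t:.3f}, t squared = {t ** 2:.2f}")  # t(11) = 1.855, t squared = 3.44
```

The squared value reproduces the ANCOVA F(1, 11) = 3.44 to the precision of the rounded inputs.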

For **three** or more groups you have to enter the group factor into the regression as **dummy** variables (one fewer than the number of groups). These need to be created manually if you use the Linear Regression procedure.

The GLM Univariate procedure, on the other hand, creates and fits these dummy variables for you, saving you the effort of setting them up yourself. See also the GLM Graduate Statistics Talk.
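Creating the dummy variables by hand can be sketched in Python as follows. The helper name `dummy_code` is hypothetical, and using the last (sorted) level as the reference category is an assumption of this sketch; it mirrors the way SPSS GLM reports the last category's parameter as the redundant one set to zero.

```python
def dummy_code(groups):
    """Return (coded_levels, rows): one 0/1 indicator column per non-reference level.

    The last (sorted) level is the reference category, so k groups give k - 1 dummies.
    """
    levels = sorted(set(groups))
    coded = levels[:-1]
    rows = [[1 if g == level else 0 for level in coded] for g in groups]
    return coded, rows

coded, rows = dummy_code([1, 2, 3, 1, 3])
print(coded)  # [1, 2]
print(rows)   # [[1, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
```

These columns, together with the covariate, are then entered as the independent variables in the regression.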

In addition, Searle, Hudson and Federer (1982) and SPSS (1990, p.796) illustrate using the SPSS MANOVA (not GLM!) procedure to test for heterogeneity of covariate slopes across groups. They fit a model which tests for covariate-adjusted group differences allowing for heterogeneous slopes. This is simply done by adding a covariate by group interaction term. See also Rutherford (1992), p.212, Table 1.

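In this model a different slope is used in each group to adjust for the covariate. For an observation (x, y) in group G we would have

$$ \mbox{Adjusted y} = \mbox{Unadjusted y} - B(G)\,(x - \mbox{meanx}) $$

where B(G) is the covariate slope in group G and meanx is the overall covariate mean.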
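A Python sketch of this separate-slopes adjustment; the function name and data are hypothetical. Each observation is adjusted to the overall covariate mean using its own group's least-squares slope, y - B(G)(x - overall covariate mean). Centring on the overall mean, rather than on each group's own covariate mean, is what makes the group means of the adjusted scores differ from the raw group means.

```python
def separate_slope_adjust(y, group, x):
    """Adjust each y to the overall covariate mean using its own group's slope.

    Returns ({group: slope}, list of adjusted y values).
    """
    xbar = sum(x) / len(x)  # overall covariate mean
    slopes = {}
    for g in set(group):
        xs = [xi for xi, gi in zip(x, group) if gi == g]
        ys = [yi for yi, gi in zip(y, group) if gi == g]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        # Least-squares slope of y on x within group g only.
        slopes[g] = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                     / sum((a - mx) ** 2 for a in xs))
    adjusted = [yi - slopes[gi] * (xi - xbar) for yi, gi, xi in zip(y, group, x)]
    return slopes, adjusted


slopes, adj = separate_slope_adjust([1.5, 2, 2.5, 6, 7, 8], [0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6])
print(slopes, adj)
```

The adjusted values can then be analysed with the usual t test or ANOVA, with the error degrees of freedom reduced as described below.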

All you need to do to fit this model, featuring two slopes (and two intercepts), is to add a covariate by group interaction term to the model (it is not fitted by default). For example, in SPSS you just add the GROUP*COV term:

```
UNIANOVA Y BY GROUP WITH COV
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PRINT = PARAMETER
  /CRITERIA = ALPHA(.05)
  /DESIGN = GROUP COV GROUP*COV .
```

You can also do this from the SPSS menu bar by going to **Analyze > General Linear Model > Univariate**, specifying group as a fixed factor and the covariate as a covariate, then clicking **Model > Custom** and moving the terms from the left-hand 'Factors and Covariates' box into the right-hand 'Model' box. To add the interaction term, highlight group, hold the Control key down and click on the covariate, then add the pair to the right-hand box as an interaction. (This is a little trick that SPSS likes to hide from us!) You can obtain the regression coefficients by asking for **Options > Parameter estimates**.

If there is a covariate by group interaction, Rutherford (1992) suggests allowing for this heterogeneity of group covariate slopes by adjusting for each covariate separately within each group, using that group's own slope in the formula above, and then performing the usual t test or ANOVA on the adjusted responses, subtracting the number of covariates adjusted for from the error degrees of freedom.

This spreadsheet will estimate and test the group difference in a two-group ANCOVA with a single covariate, z, at a particular value of the covariate, expressed as the number of standard deviations from the covariate mean. This is useful if there is a group by covariate interaction, so that the difference between the groups depends on z. The spreadsheet can be used to find the range of z over which the pair of group means differ significantly. Raw data are entered into the green area, with group coded 1 or 2. The value of the covariate at which we wish to evaluate the group difference is entered in the yellow cell.

The spreadsheet uses the formula

$$ \mbox{Difference between group means at covariate value z} = B(\mbox{group}) + B(\mbox{group} \times z)\, z $$
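A Python sketch of this kind of calculation, not the spreadsheet itself. As hypothetical choices, it standardises the covariate with the sample SD (n - 1 divisor) over all subjects, reports group 2 minus group 1, and takes the s.e. from the interaction model's pooled error on N - 4 df; the spreadsheet may differ in these details. Fitting a separate line in each group gives the same fit as the group by covariate interaction model (group coded 0/1), so the difference at z equals B(group) + B(group x z) z.

```python
import math

def fit_line(xs, ys):
    """Least-squares line in one group: (intercept, slope, xmean, sxx, sse, n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((c - intercept - slope * a) ** 2 for a, c in zip(xs, ys))
    return intercept, slope, mx, sxx, sse, n


def group_difference_at_z(y, group, cov, z):
    """Group 2 minus group 1 difference at covariate value z (in covariate SDs).

    Returns (difference, standard error, t, df) with df = N - 4.
    """
    mean = sum(cov) / len(cov)
    sd = math.sqrt(sum((c - mean) ** 2 for c in cov) / (len(cov) - 1))
    zs = [(c - mean) / sd for c in cov]  # covariate in SD units
    fits = {}
    for g in (1, 2):
        fits[g] = fit_line([zi for zi, gi in zip(zs, group) if gi == g],
                           [yi for yi, gi in zip(y, group) if gi == g])
    df = len(y) - 4  # two intercepts and two slopes estimated
    mse = (fits[1][4] + fits[2][4]) / df
    a1, b1 = fits[1][:2]
    a2, b2 = fits[2][:2]
    diff = (a2 + b2 * z) - (a1 + b1 * z)
    # Each group's fitted value at z has variance MSE * (1/n + (z - zmean)^2 / Sxx);
    # the two groups' fits are independent, so the variances add.
    var = sum(mse * (1.0 / n + (z - mx) ** 2 / sxx)
              for _, _, mx, sxx, _, n in fits.values())
    se = math.sqrt(var)
    return diff, se, diff / se, df


# Hypothetical data: group coded 1 or 2, as in the spreadsheet.
y = [2.1, 2.9, 4.2, 4.8, 3.0, 4.5, 7.9, 9.2]
group = [1, 1, 1, 1, 2, 2, 2, 2]
cov = [1, 2, 3, 4, 2, 3, 5, 6]
for z in (-1.0, 0.0, 1.0):
    d, se, t, df = group_difference_at_z(y, group, cov, z)
    print(f"z = {z:+.1f}: difference = {d:.2f}, s.e. = {se:.2f}, t({df}) = {t:.2f}")
```

Scanning z over a grid and noting where |t| exceeds the critical t on N - 4 df gives the range of z with a significant group difference.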

Note that covarying out an explanatory variable that we would expect to differ between groups in an ANCOVA is not ideal (Miller and Chapman, 2001).

References

Barnett A. G., van der Pols J. C. and Dobson A. (2005). Regression to the mean: what it is and how to deal with it. *International Journal of Epidemiology* **34** 215-220.

Boniface D. R. (1995). Experiment design and statistical methods for behavioural and social research. Chapman and Hall:London.

Howell D. C. (2013). Statistical methods for psychology. 8th Edition. International Edition. Wadsworth: Belmont, CA. Chapter 16 gives formulae for differences in a pair of adjusted means.

Miller G. A. and Chapman J. P. (2001). Misunderstanding analysis of covariance. *Journal of Abnormal Psychology* **110(1)** 40-48.

Rutherford A. (1992). Alternatives to traditional analysis of covariance. *British Journal of Mathematical and Statistical Psychology* **45(2)** 197-224.

Searle S. R., Hudson G. F. S. and Federer W. T. (1982). Annotated computer output for analysis of covariance. *The American Statistician* **37** 172-173.

Senn S. (2011). Francis Galton and regression to the mean. *Significance* **8(3)** 124-126.

SPSS Inc. (1990). SPSS Version 4.0 Manual. SPSS Inc: Chicago, IL.

Vickers A.J. and Altman D. G. (2001). Analysing controlled trials with baseline and follow-up measurements. *British Medical Journal* **323** 1123-1124.