Can I do an analysis of covariance using regression?

You can run an ANCOVA using regression. Just put the group and covariates in as independent variables. The regression coefficient for the group term, which is reported with its standard error, is the difference between the two group means adjusted for the covariates.
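As a minimal sketch of this equivalence, the Python below fits the regression of Y on a 0/1 group indicator and one covariate by solving the normal equations, then checks that the group coefficient equals the difference between the covariate-adjusted group means. The data, variable names and helper functions are all hypothetical and chosen only for illustration; a statistics package would normally do the fitting.

```python
# Hypothetical data for illustration only (not the example analysed on this page).
group = [0, 0, 0, 0, 1, 1, 1, 1]                  # 0/1 group indicator
x = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]      # covariate
y = [2.1, 2.9, 4.2, 4.8, 4.0, 5.1, 5.9, 7.2]      # response


def solve(a, b):
    """Solve the linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, response):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, response)) for ci in columns]
    return solve(xtx, xty)


def mean(values):
    return sum(values) / len(values)


def group_mean(values, g):
    return mean([v for v, gi in zip(values, group) if gi == g])


# Regression of y on an intercept, the group indicator and the covariate.
b0, b_group, b_x = ols([[1.0] * len(y), group, x], y)

# Adjusted mean = group mean of y minus slope * (group mean of x - overall mean of x).
adjusted = {g: group_mean(y, g) - b_x * (group_mean(x, g) - mean(x)) for g in (0, 1)}

print("group coefficient:           ", round(b_group, 3))
print("difference in adjusted means:", round(adjusted[1] - adjusted[0], 3))
```

The two printed values agree because the fitted residuals average to zero within each group, so each group mean of Y lies exactly on that group's fitted line.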

You can also fit an ANCOVA, and obtain useful output such as estimated group means adjusted for a covariate, in SPSS using the GLM Univariate procedure found under Analyze > General Linear Model > Univariate > Options. SPSS calls the covariate adjusted means Estimated Marginal Means. 'Marginal' is used because group means (e.g. for males and females) are computed at a common value of the covariate (e.g. the overall age mean). We remove age differences and end up with a one-way layout of group means. These are akin to the (gender) edges or margins of a higher order (age by group two-way) table, formed by collapsing across rows (e.g. ages) to get overall column (e.g. gender) means. Chapter 7 of Boniface (1995) gives illustrations of computing ANCOVA adjusted means.

To obtain these covariate adjusted means, put the group factor in the 'Display Means for' box (top right), tick the 'Compare main effects' box directly underneath and run the ANCOVA as normal.

You also get the bonus of a 95% confidence interval for the covariate adjusted difference in the group means.

Algebraically

$$ \mbox{Y group mean adjusted for x} = \bar{y}_{G} - B (\bar{x}_{G} - \bar{x})$$

where B is the regression coefficient for X with group as the other predictor and Y as the response. Notice from this formula that there will be no adjustment for X if the two groups share the same X mean.
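To spell out the last point, apply the formula to two groups (labelled 1 and 2 here purely for illustration) and subtract. The overall mean of X cancels, leaving the raw difference in Y means corrected by B times the difference in X means:

$$ \left[\bar{y}_{1} - B (\bar{x}_{1} - \bar{x})\right] - \left[\bar{y}_{2} - B (\bar{x}_{2} - \bar{x})\right] = (\bar{y}_{1} - \bar{y}_{2}) - B (\bar{x}_{1} - \bar{x}_{2}) $$

When the two groups have equal X means the second term vanishes, so the adjusted and unadjusted differences coincide.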

This approach (ANCOVA) is suggested as a means of comparing groups, using the regression coefficient, B, to adjust for regression to the mean caused by differing group baseline scores, X (Barnett, van der Pols and Dobson, 2005; Vickers and Altman, 2001; Senn, 2011). It is needed because, owing to measurement error, people with low baseline values will tend to increase at follow-up and those with high baseline values will tend to decrease. Mean group follow-up scores could, therefore, differ simply because the groups have different distributions of baseline values. ANCOVA adjusts for different group baselines and, hence, removes this arbitrary 'polluting' difference from the comparison of follow-up group mean scores.

The statistical significance of this difference can be obtained either by reading off the t or F value for group (with X as the other predictor) from the regression, or by performing an unpaired t test on N-3 degrees of freedom on the adjusted Y group means using their standard errors (s.e.s):

The unpaired t statistic on N-3 df for the difference in a pair of adjusted group means =

$$\frac{\mbox{Difference in adjusted group means}}{\sqrt{\mbox{se}_{1}^{2}+\mbox{se}_{2}^{2}}} $$

For example, suppose we wish to compare the mean difference between actual and predicted feelings (Y) between two equal sized groups (G) of size 7, adjusted for predicted feelings (X). We can fit an ANCOVA in SPSS using the UNIVARIATE procedure (syntax below) and obtain F(1, 11)=3.44, p=0.09.

UNIANOVA
  DIFFERENCE  BY GROUP WITH ESTIMATED
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(GROUP) WITH(ESTIMATED=MEAN)
  /CRITERIA = ALPHA(.05)
  /DESIGN = GROUP ESTIMATED .

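The adjusted means of the emotional difference scores are also output by this syntax (using the /EMMEANS subcommand) and are 3.73 and 1.98 for the two equally sized groups respectively, both with a standard error of 0.667. [In fact two equally sized groups will always have the same standard error, since s.e. = sqrt(MSE [1/n(i) + (xbar(i) - xbar)^2 / SSx]), where MSE is the Mean Square error obtained from the ANCOVA table, xbar(i) and xbar are the group and overall covariate means, and SSx is the pooled within-group sum of squares of the covariate. With two equal sized groups both group covariate means lie equally far from the overall mean.]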

The unpaired t test comparing the adjusted difference means gives a value of 1.85 = (3.73-1.98)/sqrt(2 x 0.667^2). Squaring this value we get 3.44, which corresponds to the F(1,dfe) value given by the ANCOVA. This is as expected, since the F(1,dfe) in the ANCOVA is the square of the t statistic, t(dfe), for the group term. The two approaches therefore agree here. Strictly, the t test formula ignores the covariance between the two adjusted means (both are computed using the same B), so the agreement is exact only when the groups share the same covariate mean. In general, the ANCOVA and unpaired t test approaches will give very similar results when the group covariate means are close.
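The arithmetic in this worked example can be checked with a few lines of Python. This is only a sketch that plugs in the rounded adjusted means and standard errors quoted above; the variable names are illustrative.

```python
import math

# Rounded values quoted in the text (SPSS estimated marginal means and s.e.s).
adj_mean_1, adj_mean_2 = 3.73, 1.98
se_1 = se_2 = 0.667

# Unpaired t statistic: difference in adjusted means over sqrt(se1^2 + se2^2).
t = (adj_mean_1 - adj_mean_2) / math.sqrt(se_1**2 + se_2**2)

# Squaring t gives the value to compare with F(1, dfe) from the ANCOVA table.
f_from_t = t**2

print(f"t = {t:.3f}, t squared = {f_from_t:.2f}")
```

Because the inputs are rounded, the result should only be expected to match the reported F to about two decimal places.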

For three or more groups you have to enter the groups as dummy variables in the regression. These need to be created manually if using the linear regression procedure.

The GLM Univariate method, on the other hand, will create and fit these dummy variables for you, saving you the effort of setting up the regression yourself. See also the GLM Graduate Statistics Talk.
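For readers who do want to build the dummy variables by hand, the following is a minimal Python sketch of the coding. The function name `dummy_code`, the group labels and the choice of reference group are all assumptions made for illustration. A factor with k groups needs k-1 indicator variables, each coded 1 for members of one group and 0 otherwise, with the remaining group acting as the reference.

```python
def dummy_code(labels, reference):
    """Return one 0/1 indicator list per group, omitting the reference group."""
    levels = [level for level in sorted(set(labels)) if level != reference]
    return {level: [1 if lab == level else 0 for lab in labels] for level in levels}


# Hypothetical three-group factor; "c" is (arbitrarily) taken as the reference.
labels = ["a", "b", "c", "a", "c", "b"]
dummies = dummy_code(labels, reference="c")

for level, column in dummies.items():
    print(level, column)
```

Each resulting coefficient in the regression then estimates the covariate-adjusted difference between that group and the reference group.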

In addition Searle, Hudson and Federer (1982) and SPSS (1990, p.796) illustrate using the SPSS MANOVA (not GLM!) procedure to test for heterogeneity of covariate slopes in each group. They fit a model which tests for covariate adjusted group differences assuming heterogeneity. This is simply done by adding a covariate by group interaction term. See also Rutherford (1992) p.212: Table 1.

In this model a different slope is used in each group to adjust for the covariate. For an observation (x,y) in the G-th group we would have

$$ \mbox{Adjusted y} = \mbox{Unadjusted y} - B_{G} (\bar{x}_{G} - \mbox{x})$$

All you need to do to fit the above model featuring two slopes (with two intercepts) is to add a covariate by group interaction term to the model (as it is not fitted by default). For example in SPSS you just add the Group*Cov term.

UNIANOVA
  Y  BY GROUP  WITH COV
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PRINT = PARAMETER
  /CRITERIA = ALPHA(.05)
  /DESIGN = GROUP COV GROUP*COV .

You can also do this from the SPSS menu bar: go to General Linear Model > Univariate, specify group as a fixed factor and the covariate as a covariate, then click Model > Custom and move the terms from the left-hand 'Factors and Covariates' box into the right-hand 'Model' box. To add the interaction term, highlight group, hold the control key down and click on the covariate, then add the pair to the right-hand box. (This is a little trick that SPSS like to hide from us!) You can get the regression coefficients by asking for Options > Parameter estimates.
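Fitting group, covariate and their interaction amounts to fitting a separate straight line in each group, so the slopes can be illustrated by running a simple regression within each group. The Python sketch below does this on hypothetical data constructed to lie exactly on two lines; the names and numbers are illustrative only and are not taken from any analysis on this page.

```python
def slope_and_intercept(x, y):
    """Simple least-squares line for one group: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: group 1 follows y = 1 + 2x, group 2 follows y = 3 + 0.5x.
x_1, y_1 = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
x_2, y_2 = [2.0, 3.0, 4.0, 5.0], [4.0, 4.5, 5.0, 5.5]

slope_1, intercept_1 = slope_and_intercept(x_1, y_1)
slope_2, intercept_2 = slope_and_intercept(x_2, y_2)

# The group by covariate interaction measures how far the slopes differ.
slope_difference = slope_1 - slope_2

print("group 1 slope:", slope_1, "group 2 slope:", slope_2)
print("difference in slopes:", slope_difference)
```

The difference in slopes is what the Group*Cov term estimates, up to the sign and reference group used by the software's coding, and a test of that term is a test of heterogeneity of slopes.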

References

Barnett A. G., van der Pols J. C. and Dobson A. (2005). Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology 34 215-220. [attachment:rtm.pdf An on-line pdf copy is here.]

Boniface D. R. (1995). Experiment design and statistical methods for behavioural and social research. Chapman and Hall:London.

Rutherford A. (1992). Alternatives to traditional analysis of covariance. British Journal of Mathematical and Statistical Psychology 45(2) 197-224.

Searle S. R., Hudson G. F. S. and Federer W. T. (1982). Annotated computer output for analysis of covariance. The American Statistician 37 172-173.

Senn S. (2011) Francis Galton and regression to the mean. Significance 8(3) 124-126.

SPSS Version 4.0 Manual, 1990. SPSS Inc:Chicago, IL.

Vickers A.J. and Altman D. G. (2001). Analysing controlled trials with baseline and follow-up measurements. British Medical Journal 323 1123-1124.
