FAQ/rt1d - CBU statistics Wiki

How do I correlate change score with score at baseline?

To correlate change between two time points with baseline (time 1, T1), we correlate the score at T1 with the difference in scores between time 2 (T2) and T1. This is not the same as correlating T1 score with T2 score, nor is it the same as correlating T1 score with the residuals from a regression of T2 score on T1 score.

To see the former, consider an example in which each individual's score at T2 is exactly 1 unit higher than their score at T1.

The change over time is the same (a difference of 1 unit) regardless of baseline, so it has no relationship with baseline score. The correlation between the scores at T1 and T2, on the other hand, is 1, because the score at T2 is exactly 1 higher than the score at T1 and so is perfectly predicted by T1 score.
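As a minimal sketch of this point, the Python below uses hypothetical scores (they are not the data from the original example) in which every T2 score is exactly 1 above the T1 score. The helper names `covariance` and `pearson` are illustrative. Because the change score is constant, its covariance with T1 is zero and the Pearson correlation is, strictly, undefined (it would involve dividing by a zero variance), so the helper returns None in that case.

```python
from statistics import mean

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Pearson correlation; returns None when either variable is constant."""
    vx, vy = covariance(x, x), covariance(y, y)
    if vx == 0 or vy == 0:
        return None
    return covariance(x, y) / (vx * vy) ** 0.5

# Hypothetical scores: every T2 score is exactly 1 above the T1 score.
t1 = [2, 4, 5, 7, 9]
t2 = [score + 1 for score in t1]
change = [after - before for before, after in zip(t1, t2)]

r_t1_t2 = pearson(t1, t2)               # baseline with follow-up
cov_t1_change = covariance(t1, change)  # baseline with change
r_t1_change = pearson(t1, change)       # None: change has zero variance

print("change scores:", change)
print("corr(T1, T2):", r_t1_t2)
print("cov(T1, change):", cov_t1_change)
print("corr(T1, change):", r_t1_change)
```

The correlation of T1 with T2 is 1 (up to floating-point rounding), while the covariance of T1 with the change score is exactly zero.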

The difference score, T2 score - T1 score, is also different from the residual of a regression using T1 score to predict T2 score. To see this, let's consider the regression equation from which this residual is obtained, which is

T2 score = B T1 score + random error

with the regression coefficient, B, therefore approximated by T2 score/T1 score, the expected ratio of a score at time 2 to a score, on the same individual, at time 1. B times T1 score is an estimate of what an individual should score at time 2 given their time 1 score. So the residual, T2 score - B T1 score, obtained from the regression of T2 score on T1 score, indicates whether an individual's T2 score is 'above' or 'below' the 'average' score expected from their score at T1, where this expectation is estimated using all (T1 score, T2 score) pairs.

The pdf plot given here shows that the larger difference score, given in red (a score rising from around 11 at time 1 to 16 at time 2), has a smaller (negative) residual than the smaller difference score, given in blue (a score rising from around 5 at time 1 to 8 at time 2), which has a positive residual. Although the individual scoring 11 at time 1 increases by a larger amount (around 5 units) than the one scoring 5 at time 1 (who rises by about 3 units), the former goes up by less than would be expected assuming a constant overall increase in scores, represented by a constant T2/T1 ratio (in this case T2/T1 is close to 1.5).
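The arithmetic behind the red and blue points can be sketched as follows. The scores are the approximate values quoted in the text, B is taken as exactly 1.5 for simplicity, and the variable names are illustrative.

```python
# Approximate values quoted in the text; B is assumed to be exactly 1.5.
B = 1.5
points = {"red": (11, 16), "blue": (5, 8)}  # (T1 score, T2 score)

results = {}
for label, (t1, t2) in points.items():
    difference = t2 - t1    # change score
    expected_t2 = B * t1    # T2 score expected from the T1 score
    residual = t2 - expected_t2
    results[label] = (difference, residual)
    print(label, "difference:", difference, "residual:", residual)
```

The red point has the larger difference (5 against 3) but a residual of -0.5, because 16 falls short of the expected 16.5; the blue point has a residual of +0.5, because 8 exceeds the expected 7.5.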

Lum, Kabir and Bak-Coleman (2021) show that the correlation between a difference score, y - x, and x, where x and y are percentages, can only take a restricted range of mainly negative values because percentages are constrained to lie between 0% and 100%.
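The simulation below is an illustration only and is not taken from Lum, Kabir and Bak-Coleman (2021). It assumes the simplest case, in which x and y are independent percentages drawn uniformly between 0 and 100. Even though x and y are then unrelated, the correlation between y - x and x is clearly negative: in this case its theoretical value is -1/sqrt(2), about -0.71, since the covariance of y - x with x is minus the variance of x and the variance of y - x is twice the variance of x. The sample size, seed and helper name are arbitrary choices.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
x = [rng.uniform(0, 100) for _ in range(n)]  # percentage at time 1
y = [rng.uniform(0, 100) for _ in range(n)]  # percentage at time 2, independent of x
diff = [b - a for a, b in zip(x, y)]

r = pearson(x, diff)
print("corr(x, y - x):", round(r, 2))
```

Intuitively, an individual starting near 100% can only stay level or fall, and one starting near 0% can only stay level or rise, which pushes the correlation in the negative direction.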

Aside: In fact, the illustrative data in the plot were generated so that T2 = 1.5 T1 + Normal error (mean=0, variance=0.25). How closely the linear regression model fits is related to the size of the variance of the (normally distributed) random error, which is assumed constant across all scores. The larger the error variance, the worse the fit of the linear regression model will be.
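A minimal sketch of this kind of data generation is given below. It is not the code used to produce the plot: the range of T1 scores (uniform between 2 and 12), the sample size, the seed and the function names are all assumptions made for illustration. The slope is estimated by least squares for the no-intercept model T2 = B T1 + error, and the residual mean square is compared for the error variance of 0.25 and for a larger, hypothetical error variance of 4.

```python
import random

def fit_through_origin(t1, t2):
    """Least-squares slope B for the no-intercept model T2 = B * T1 + error."""
    return sum(a * b for a, b in zip(t1, t2)) / sum(a * a for a in t1)

def simulate(error_variance, n=500, seed=1):
    """Generate T2 = 1.5 * T1 + Normal(0, error_variance) and refit the model."""
    rng = random.Random(seed)
    t1 = [rng.uniform(2, 12) for _ in range(n)]  # assumed range of T1 scores
    sd = error_variance ** 0.5                   # random.gauss takes a standard deviation
    t2 = [1.5 * a + rng.gauss(0, sd) for a in t1]
    b = fit_through_origin(t1, t2)
    residuals = [y - b * x for x, y in zip(t1, t2)]
    residual_ms = sum(r * r for r in residuals) / (n - 1)  # one parameter estimated
    return b, residual_ms

b_small, ms_small = simulate(0.25)  # error variance quoted in the text
b_large, ms_large = simulate(4.0)   # larger, hypothetical error variance

print("variance 0.25: B =", round(b_small, 3), "residual mean square =", round(ms_small, 3))
print("variance 4.00: B =", round(b_large, 3), "residual mean square =", round(ms_large, 3))
```

In both runs the estimated slope should be close to 1.5, but the residual mean square tracks the error variance, so the points scatter more widely about the fitted line when the error variance is larger.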

Note: The intuitive way of measuring the influence of baseline score on change from baseline discussed above, namely the correlation of baseline with change from baseline, is criticised by some authors, who instead suggest comparing baseline and retest variances. See the link below for further details. Harris (2001, pp. 39-40) and Senn (2006) suggest ignoring change from baseline and, instead, correlating baseline with follow-up score. It is suggested that the follow-up score can more generally be used as an outcome measure in an ANCOVA, or multiple regression, involving baseline score as a predictor. The use of ANCOVAs is also supported by Vickers (2001), who found they had higher power for comparing change between groups than using the average of the differences or percentage change. ANCOVAs are also recommended here, which can be downloaded from here.


Harris, R.J. (2001). A primer of multivariate statistics. Lawrence Erlbaum: Mahwah, NJ, USA.

Lum, K., Kabir, N. & Bak-Coleman, J. (2021). How to mislead with statistics. Significance 18(4) 30-32.

Senn, S. (2006). Change from baseline and analysis of covariance revisited. Statistics in Medicine 25(24) 4334-44. Powerpoint slides based upon this paper advocating ANCOVA are here.

Vickers, A.J. (2001). The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. Biomed Central Medical Research Methodology 1 6.

None: FAQ/rt1d (last edited 2021-09-09 07:52:39 by PeterWatson)