FAQ/rxxy_correction - CBU statistics Wiki

Revision 18 as of 2011-01-25 16:01:00


Correcting r(x,x-y) for bias

Consider a baseline score x and a later score on the same test, y. The correlation r(x,x-y) between the baseline score and the difference score x-y (that is, baseline with change from baseline) is biased to be considerably greater than zero. Tu and Gilthorpe (2007), among others, illustrate that for two independent random variables x and y with the same variance, the correlation between x and the difference x-y is not zero, as one might expect, but approximately 0.71 (the reciprocal of root 2). Indeed, they show that the x,x-y correlation will always fall between 0 and 0.71 if x and y have the same variance and are non-negatively correlated.
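The size of the bias can be checked with a short calculation. The sketch below is an illustration rather than code from the cited papers: the function name `corr_x_with_change` is hypothetical, and the formula is the standard expression for the correlation of a variable with a difference, obtained from cov(x,x-y) = var(x) - cov(x,y) and var(x-y) = var(x) + var(y) - 2cov(x,y).

```python
import math

def corr_x_with_change(sd_x, sd_y, r_xy):
    """Theoretical correlation between x and x - y.

    sd_x, sd_y : standard deviations of x and y
    r_xy       : correlation between x and y (must give var(x - y) > 0)
    """
    cov_xy = r_xy * sd_x * sd_y
    var_diff = sd_x ** 2 + sd_y ** 2 - 2 * cov_xy  # var(x - y)
    return (sd_x ** 2 - cov_xy) / (sd_x * math.sqrt(var_diff))

# Equal variances, with x and y uncorrelated or positively correlated.
for r in (0.0, 0.5, 0.9):
    print(r, round(corr_x_with_change(1.0, 1.0, r), 3))
```

With equal variances the expression reduces to the square root of (1 - r)/2, so the three cases above give about 0.707, 0.5 and 0.224: the coupling bias is largest when x and y are unrelated.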

This positive bias between x and x-y arises because x appears in both of the terms being correlated whereas y appears in only one. Since x*(x-y) = x*x - x*y, the covariance between x and x-y equals the variance of x minus the covariance between baseline x and retest y, so an unwanted variance term remains even when x and y are unrelated. This biasing due to x appearing in both terms being correlated is a problem known as mathematical coupling. It follows that the correlation between x and y-x is biased negatively from zero, which intuitively says that smaller baseline scores tend to have larger increases.
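Written out as covariances, the coupling looks as follows. These are standard covariance identities given here for illustration, not expressions quoted from the cited papers.

```latex
\begin{align*}
\operatorname{cov}(x,\,x-y) &= \operatorname{var}(x) - \operatorname{cov}(x,y),\\
\operatorname{cov}(x,\,y-x) &= \operatorname{cov}(x,y) - \operatorname{var}(x).
\end{align*}
```

When x and y are independent, cov(x,y) = 0, which leaves var(x) > 0 in the first line and -var(x) < 0 in the second. This is why the two correlations are pushed away from zero in opposite directions.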

There is also a problem caused by measurement error, which may 'throw' a score higher or lower than its true value, so that by chance the next sampled score is likely to be lower or higher respectively. This process, in which scores made falsely extreme by measurement error are followed by less extreme scores, is an illustration of regression to the mean, and it can produce spurious differences in score variances. In practice it is not always easy to gauge the amount, if any, of measurement error, so solutions such as those mentioned below usually assume that measurement error is equal in x and y.
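Regression to the mean is easy to reproduce by simulation. The following is a minimal sketch under assumed values, none of which come from the original page: true scores with mean 50 and standard deviation 10, independent measurement errors with standard deviation 5 at both occasions, and no true change at all.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
true_score = [rng.gauss(50, 10) for _ in range(n)]  # assumed true scores
x = [t + rng.gauss(0, 5) for t in true_score]       # baseline = truth + error
y = [t + rng.gauss(0, 5) for t in true_score]       # retest = truth + new error

change = [xi - yi for xi, yi in zip(x, y)]
r_baseline_change = pearson(x, change)

# Cases with the highest 10% of baseline scores.
top = sorted(range(n), key=lambda i: x[i], reverse=True)[: n // 10]
mean_x_top = sum(x[i] for i in top) / len(top)
mean_y_top = sum(y[i] for i in top) / len(top)

print(round(r_baseline_change, 2), round(mean_x_top, 1), round(mean_y_top, 1))
```

Although the true scores do not change, baseline correlates positively with x-y, and the cases with the highest baseline scores have a lower mean at retest. Both effects are produced purely by the measurement error.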

Most authors suggest comparing the variances of x and y, since if there is a relationship between x and x-y the variance at the later time would be expected to be smaller as scores 'bunch up'. A difference between the variances of x and y therefore indicates a relationship between baseline score and change. Variances of x and y do not suffer from the bias problem associated with the correlation between x and x-y. Myrtek and Foerster (1986) and Maloney and Rastogi (1970) propose a t-test that compares the variances of x and y. The two proposals differ only in that the former assumes the baseline variance is greater than the retest variance (a one-tailed test). The t-test takes measurement error into account by assuming it is equal in x and y.

Tu and Gilthorpe (2007) show, rather surprisingly, that the comparison of x and y variances is equivalent to testing the size of the Pearson correlation between x-y and x+y. These authors therefore recommend using r(x-y,x+y), rather than r(x,x-y), for testing whether baseline influences change score. This rather curious result stems from the fact that y appears in both of the terms being correlated, with opposite signs, so the problematic biasing covariance term that was present in r(x,x-y) now cancels out, since (x-y)(x+y) = x*x + x*y - x*y - y*y = x*x - y*y.
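The cancellation can be written out in covariance form. This is a standard identity added for illustration, not an expression taken from the paper.

```latex
\begin{align*}
\operatorname{cov}(x-y,\,x+y)
  &= \operatorname{var}(x) + \operatorname{cov}(x,y) - \operatorname{cov}(y,x) - \operatorname{var}(y)\\
  &= \operatorname{var}(x) - \operatorname{var}(y).
\end{align*}
```

The covariance, and hence the correlation r(x-y,x+y), is zero exactly when the two variances are equal, whatever the correlation between x and y.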

The t-test can be done by first computing x-y and x+y and then using the correlation or regression procedures in SPSS (or other software). Alternatively, you can simply enter the (baseline) x and (retest) y values (up to 200 pairs) into this [attachment:rxx-r.xls spreadsheet], which computes the correlation testing a difference in x and y variances and the p-value for the t-test of this difference.
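For readers without SPSS or the spreadsheet, the same steps can be sketched in Python. This is a hedged illustration, not a reimplementation of the spreadsheet: the function names are hypothetical, the data are made up, and the test statistic is the usual t-test of a Pearson correlation with n-2 degrees of freedom, which is assumed to be what the correlation procedure reports.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def t_from_r(r, n):
    """t statistic for a Pearson correlation r from n pairs (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def variance_test(x, y):
    """Return (r, t, df) for the correlation between x - y and x + y."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    sums = [xi + yi for xi, yi in zip(x, y)]
    r = pearson(diffs, sums)
    return r, t_from_r(r, len(x)), len(x) - 2

# Made-up illustrative scores: baseline more spread out than retest.
baseline = [1, 2, 3, 4, 5, 6]
retest = [3, 3, 4, 4, 3, 4]
r, t, df = variance_test(baseline, retest)
print(round(r, 3), round(t, 3), df)

# The two-tailed p-value comes from Student's t distribution with df degrees
# of freedom, e.g. 2 * scipy.stats.t.sf(abs(t), df) if SciPy is installed.
```

A positive r indicates a larger variance at baseline than at retest, and a negative r the reverse. As a check on the formula, `t_from_r(0.16, 10)` is about 0.46 on 8 degrees of freedom.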

As an example, we generated 10 cases at random, each consisting of two variables x and y independently sampled from Normal(5,7) and Normal(7,7) distributions respectively. We would expect no relation between baseline and change score, since the true x and y variances are equal. The correlation between x and x-y is, however, a biased 0.60 (p=0.06), whereas the correlation between x-y and x+y, which removes the bias, is only 0.16 (p=0.66), as expected.
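A larger simulation of the same kind shows the two correlations settling near their theoretical values. The sketch below does not reproduce the 10 cases above, which are not available. It assumes the second parameter of Normal(5,7) is a variance (the page does not say), and it uses 20000 cases and an arbitrary seed.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(2011)  # arbitrary seed, for repeatability only
n = 20000                  # far more than the 10 cases in the text
sd = math.sqrt(7)          # assumption: 7 is a variance, not an SD

x = [rng.gauss(5, sd) for _ in range(n)]  # baseline
y = [rng.gauss(7, sd) for _ in range(n)]  # retest, independent of x

diffs = [xi - yi for xi, yi in zip(x, y)]
sums = [xi + yi for xi, yi in zip(x, y)]

r_naive = pearson(x, diffs)         # coupled: close to 1/sqrt(2)
r_corrected = pearson(diffs, sums)  # uncoupled: close to 0

print(round(r_naive, 3), round(r_corrected, 3))
```

With this many cases r(x,x-y) should sit close to 0.71 and r(x-y,x+y) close to zero. The result does not depend on whether 7 is read as a variance or a standard deviation, because only the equality of the two variances matters.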

Other authors take a different tack. For example, Tu et al. (2005) proposed a test of the x,x-y correlation which requires the x and x-y variances to be equal, which will not be the case in general.

References

Maloney, C. J. and Rastogi, S. C. (1970). Significance test for Grubbs's estimators. Biometrics 26 671-676.

Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237.

Tu, Y-K., Baelum, V. and Gilthorpe, M. S. (2005). The relationship between baseline value and its change: problems in categorisation and the proposal of a new method. European Journal of Oral Sciences 113 279-288.

Tu, Y-K. and Gilthorpe, M. S. (2007). Revisiting the relation between change and initial value: A review and evaluation. Statistics in Medicine 26 443-457. A pdf copy is available [http://dionysus.psych.wisc.edu/lit/Topics/Statistics/RegressionToMean/tu_RegressionToTheMean_SiM2007.pdf here (via webpage)] or [attachment:rxxmy.pdf here (pdf file).] This article gives an overview of methods testing the x,x-y correlation.