R code for Jin's test of a correlation between baseline, X, and change from baseline, Y-X

(written by Leon Reteig of the Department of Psychology, University of Amsterdam, January 2019)

corr_change_baseline <- function(X,Y) {
  # Performs a (two-tailed) test for equal variances in the two input vectors. 
  # This is one way to do an unbiased assesment of the correlation between baseline and change from baseline (retest - baseline).
  # In this case, X would be the baseline, and Y the retest.
  #
  # Returns a list of 4 values, following Maloney & Rastogi (1970)
  #   -r: correlation of X-Y vs. X+Y. Assessing significance of this correlation is equivalent to a variance test
  #   -t: t-value constructed from the r-value
  #   -df: degrees of freedom for "r" and "t"
  #   -p: p-value for the variance test. If significant, this means the two vectors do not have equal variances,
  #       suggesting presence of a correlation between baseline and change from baseline.
  #
  # This implementation was inspired by and checked against an Excel spreadsheet, available at:
  # http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/rxxy_correction
  # This page is a great resource for further background on the issue of variance tests / correlations
  #
  # The relevant methods are contained in the following papers:
  #
  #   Jin, P. (1992). Toward a reconceptualization of the law of initial value. Psychological Bulletin 111(1) 176-184.
  #   Maloney, C. J. and Rastogi, S. C. (1970). Significance test for Grubb's estimators. Biometrics 26 671-676.
  #   Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237
  #   Tu, Y-K. and Gilthorpe, M. S. (2007). Revisiting the relation between change and initial value: A review and evaluation. 
  #     Statistics in Medicine 26 443-457
  
  # This approach of testing the correlation correlation of X-Y and X+Y 
  # is equivalent to testing the variance of X vs. the variance of Y (Tu & Gilthorpe, 2007) 
  r_cor <- cor(X - Y, X + Y)
  n <- length(X) # number of observations 
  df <- n - 2 # degrees of freedom for r and t
  
  t_MR_1970 <- r_cor*sqrt(df) / sqrt(1 - r_cor^2) # t-value proposed by Maloney & Rastogi (1970); equation taken from Tu & Gilthorpe (2007)
  p_MR_1970 <- 2*pt(-abs(t_MR_1970), df) # p-value from t-value (for a two-sided test)
  return(list(r = r_cor, df = df, t = t_MR_1970, p = p_MR_1970))
  
  # Myrtek & Foerster (1986) proposed a different test. The approach is equivalent to Maloney & Rastogi (1970) and yields the same p-value.
  # It is only included here for the sake of completeness.
  
  B_e <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cov(X,Y))^2) ) / (2*cov(X,Y)) # structural relation, from Jin (1992)
  # N.B. this is a simplifaction of the equation in the paper, as "cor(X,Y) * sd(X) * sd(Y)"
  # simply reduces to "cov(X,Y)". The formula as literally in Jin (1992):
  # B_e_paper <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cor(X,Y) * sd(X) * sd(Y))^2) ) / (2*cor(X,Y) * sd(X) * sd(Y))
  # This simplification is also used in the formula below.
  
  # t-value proposed by Myrtek & Foerster, 1986; equation taken from from Jin (1992)
  t_MF_1986 <- sqrt( df * sin(2*(atan(B_e) - atan(1)))^2 *
                       ( (1/4) * (var(X) - var(Y))^2 + cov(X,Y)^2 ) / 
                       ( var(X) * var(Y) - cov(X,Y)^2) )
  p_MF_1986 <- 2*pt(-abs(t_MF_1986), df) #p-value from t-value (for a two-sided test)
  # return(list(df = df, t = t_MF_1986, p = p_MF_1986)) # the t- and p-values are identical to t_MR_1970 and p_MR_1970, so only return those
}

Example using the above R function (given in spreadsheet on preceding page)

library(foreign)

> dat2 <- data.frame(dat)
> X <- dat2[,1]
> Y <- dat2[,2]
> X
 [1]  1.05842  2.43411 -0.51272  0.96959  4.00000  5.00000  2.00000  3.00000  4.00000  5.00000  6.00000  7.00000  3.74174
[14] -1.12423  1.67558  2.57155  2.81140  4.79701 -0.31238  3.12158  2.45195  2.07938  5.03972  1.72247  1.06231 -0.99752
[27]  2.66932 -4.12602  2.81685  2.97308  3.13575  0.46938  0.46938
> Y
 [1]  3.41853  0.87283  2.76419  5.36829  2.15316  3.47944 -0.66727 -0.07276 -0.06596  1.54776  2.52074 -0.52881  5.48375
[14]  0.59687 -0.56258  2.86513  2.47741 -0.01483  4.12124  1.63788  0.88078  4.90384  3.36825  2.04285  3.68688  5.04538
[27]  1.52683 -1.26871  0.83673  3.58103  1.03052 -1.08757 -1.08757

This yields the same answer as the spreadsheet

> corr_change_baseline(X,Y)
$r
[1] 0.1296376

$df
[1] 31

$t
[1] 0.7279344

$p
[1] 0.4721166