(written by Leon Reteig of the Department of Psychology, University of Amsterdam, January 2019)
corr_change_baseline <- function(X,Y) { # Performs a (two-tailed) test for equal variances in the two input vectors. # This is one way to do an unbiased assesment of the correlation between baseline and change from baseline (retest - baseline). # In this case, X would be the baseline, and Y the retest. # # Returns a list of 4 values, following Maloney & Rastogi (1970) # -r: correlation of X-Y vs. X+Y. Assessing significance of this correlation is equivalent to a variance test # -t: t-value constructed from the r-value # -df: degrees of freedom for "r" and "t" # -p: p-value for the variance test. If significant, this means the two vectors do not have equal variances, # suggesting presence of a correlation between baseline and change from baseline. # # This implementation was inspired by and checked against an Excel spreadsheet, available at: # http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/rxxy_correction # This page is a great resource for further background on the issue of variance tests / correlations # # The relevant methods are contained in the following papers: # # Jin, P. (1992). Toward a reconceptualization of the law of initial value. Psychological Bulletin 111(1) 176-184. # Maloney, C. J. and Rastogi, S. C. (1970). Significance test for Grubb's estimators. Biometrics 26 671-676. # Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237 # Tu, Y-K. and Gilthorpe, M. S. (2007). Revisiting the relation between change and initial value: A review and evaluation. # Statistics in Medicine 26 443-457 # This approach of testing the correlation correlation of X-Y and X+Y # is equivalent to testing the variance of X vs. the variance of Y (Tu & Gilthorpe, 2007) r_cor <- cor(X - Y, X + Y) n <- length(X) # number of observations df <- n - 2 # degrees of freedom for r and t t_MR_1970 <- r_cor*sqrt(df) / sqrt(1 - r_cor^2) # t-value proposed by Maloney & Rastogi (1970); equation taken from Tu & Gilthorpe (2007) p_MR_1970 <- 2*pt(-abs(t_MR_1970), df) # p-value from t-value (for a two-sided test) return(list(r = r_cor, df = df, t = t_MR_1970, p = p_MR_1970)) # Myrtek & Foerster (1986) proposed a different test. The approach is equivalent to Maloney & Rastogi (1970) and yields the same p-value. # It is only included here for the sake of completeness. B_e <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cov(X,Y))^2) ) / (2*cov(X,Y)) # structural relation, from Jin (1992) # N.B. this is a simplifaction of the equation in the paper, as "cor(X,Y) * sd(X) * sd(Y)" # simply reduces to "cov(X,Y)". The formula as literally in Jin (1992): # B_e_paper <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cor(X,Y) * sd(X) * sd(Y))^2) ) / (2*cor(X,Y) * sd(X) * sd(Y)) # This simplification is also used in the formula below. # t-value proposed by Myrtek & Foerster, 1986; equation taken from from Jin (1992) t_MF_1986 <- sqrt( df * sin(2*(atan(B_e) - atan(1)))^2 * ( (1/4) * (var(X) - var(Y))^2 + cov(X,Y)^2 ) / ( var(X) * var(Y) - cov(X,Y)^2) ) p_MF_1986 <- 2*pt(-abs(t_MF_1986), df) #p-value from t-value (for a two-sided test) # return(list(df = df, t = t_MF_1986, p = p_MF_1986)) # the t- and p-values are identical to t_MR_1970 and p_MR_1970, so only return those }
Example using the above R function (given in spreadsheet on preceding page)
library(foreign) > dat2 <- data.frame(dat) > X <- dat2[,1] > Y <- dat2[,2] > X [1] 1.05842 2.43411 -0.51272 0.96959 4.00000 5.00000 2.00000 3.00000 4.00000 5.00000 6.00000 7.00000 3.74174 [14] -1.12423 1.67558 2.57155 2.81140 4.79701 -0.31238 3.12158 2.45195 2.07938 5.03972 1.72247 1.06231 -0.99752 [27] 2.66932 -4.12602 2.81685 2.97308 3.13575 0.46938 0.46938 > Y [1] 3.41853 0.87283 2.76419 5.36829 2.15316 3.47944 -0.66727 -0.07276 -0.06596 1.54776 2.52074 -0.52881 5.48375 [14] 0.59687 -0.56258 2.86513 2.47741 -0.01483 4.12124 1.63788 0.88078 4.90384 3.36825 2.04285 3.68688 5.04538 [27] 1.52683 -1.26871 0.83673 3.58103 1.03052 -1.08757 -1.08757
This yields the same answer as the spreadsheet
> corr_change_baseline(X,Y) $r [1] 0.1296376 $df [1] 31 $t [1] 0.7279344 $p [1] 0.4721166