Diff for "FAQ/ab-a" - CBU statistics Wiki
location: Diff for "FAQ/ab-a"
Differences between revisions 17 and 30 (spanning 13 versions)
Revision 17 as of 2011-02-04 10:25:19
Size: 3879
Editor: PeterWatson
Comment:
Revision 30 as of 2018-03-12 17:22:22
Size: 5949
Editor: PeterWatson
Comment:
Deletions are marked like this. Additions are marked like this.
Line 3: Line 3:
Suppose we have a response Y and two continuous predictors such as age of onset (A) and duration of hearing deficit (B-A) with B representing the individual's current age. Then there is an equivalence between the coefficients in this regression and the ones associated with the same response,y, being predicted using A and B as predictors. Suppose we have a response Y and two continuous predictors such as age of onset (a) and duration of hearing deficit (b-a) with b representing the individual's current age. Then there is an equivalence between the coefficients in this regression and the ones associated with the same response,y, being predicted using a and b as predictors.
Line 37: Line 37:
se(a) = $$\sqrt{0.446^text{2} + 0.399^text{2} - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553. se(a) = $$\sqrt{0.446^2 ^ + 0.399^2 ^ - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.

The zero-order correlations have the same t-values as the regression estimates used to obtain them and their zero-order correlations correspond to the signed square root of the change in R-squareds.

__Example showing the extraction of zero-order correlations from the above regressions__

We can obtain the zero-order correlations of a and b with y from the regressions involving a and the a+b sum and b with the a+b sum by evaluating R-squareds and regression coefficient t-values associated with the 'a+b' sum regression term. '''These results confirm that the zero-order correlation of a (b) with y can be obtained indirectly from the b (a) scores and the sum of a and b'''. Examples below are for a randomly generated data set.

The zero-order correlation of b with y is the signed square root of the change in R-squared adding 'a+b' to a regression already containing 'a' predicting y = $$\sqrt{0.066-0.050}$$ = -0.12 where the R-squared of 0.066 corresponds to a regression on y using 'a' and 'a+b' as predictors of y and 0.050 is the R-squared of a regression with only 'a' used to predict y. The t-value for 'a+b' =0.34, p=0.75 which equals the p-value for the zero-order correlation of b with y of -0.12.

The zero-order correlation of a with y is the signed square root of the change in R-squared adding 'a+b' to a regression already containing 'b' predicting y = $$\sqrt{0.066-0.014}$$ = 0.22 where the R-squared of 0.066 corresponds to a regression on y using 'a' and 'a+b' as predictors of y and 0.014 is the R-squared of a regression with 'a' as the only predictor of y. The t-value for 'a+b' =-0.62, p=0.55 which equals the p-value for the zero-order correlation of a with y of 0.22.

Line 52: Line 64:
It is also true that ''unless a and b are highly correlated'' so a $$\approx$$ b, It is also true that ''unless a and b are highly correlated'' so a $$\approx$$ $$\pm$$b,
Line 56: Line 68:
because knowing the sum does not tell you the numbers that were added together to give it since there are an infinite set of two numbers that can be summed to give a particular value. It follows that knowing the relationship between y and both a and b does not tell you the relationship between the response and the sum of a and b. because $$B_text{a}$$ and $$B_text{b}$$ will not in general be equal. '''This means you cannot obtain a zero-order correlation between y and the sum of a and b, indirectly, using the separate a and b scores'''.

One can also interpret this as knowing the a+b sum does not tell you the numbers (a and b) that were added together to give it if these number were weighted unequally (so that $$B_text{a}$$ is not equal to $$B_text{b}$$).
Line 60: Line 74:
__Example__

If b =-3a then for a Pearson correlation, r, r(a,y)=-r(b,y). r(a+b,y) = r(b,y)= - r(a,y) since the b values are higher in absolute value than those of a so the summation will have the same sign of relationship with y as the b values.

What is the relationship between regressions involving variables a and b and those involving b-a and a+b in predicting an outcome?

Suppose we have a response y and two continuous predictors such as age of onset (a) and duration of hearing deficit (b-a), with b representing the individual's current age. Then there is an equivalence between the coefficients in this regression and those obtained when the same response, y, is predicted using a and b as predictors.

In particular, if $$B_\text{i}$$ represents the regression coefficient for variable i, then in a regression using a and b-a as predictors:

Predicted y = $$B_\text{a}a + B_\text{b-a}(b-a)$$

= $$B_\text{a}a + B_\text{b-a}b - B_\text{b-a}a$$

= $$(B_\text{a} - B_\text{b-a})a + B_\text{b-a}b$$

So it follows that if $$B_\text{i|i,j}$$ represents the regression coefficient of variable i in a regression using i and j to predict a response, y, then

$$B_\text{a|a,b-a} - B_\text{b-a|a,b-a} = B_\text{a|a,b}$$ and

$$B_\text{b-a|a,b-a} = B_\text{b|a,b}$$

In other words, subtracting the regression coefficient for b-a from that for a in a regression using a and b-a as predictors gives the regression coefficient for a in a regression with a and b as predictors, and the regression coefficient for b-a in the a, b-a regression is the same as the regression coefficient for b in a regression with a as the other predictor.
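As a numerical check, the sketch below (Python with numpy and statsmodels; the simulated ages and coefficient values are arbitrary assumptions, not the data from the example further down) fits both parameterisations and confirms the two identities.

{{{#!python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
a = rng.normal(50, 10, n)            # e.g. age of onset (hypothetical values)
b = a + rng.uniform(1, 20, n)        # current age, so duration is b - a
y = 0.5 * a + 0.3 * (b - a) + rng.normal(0, 5, n)

# Regression of y on a and b-a, and of y on a and b
fit1 = sm.OLS(y, sm.add_constant(np.column_stack([a, b - a]))).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([a, b]))).fit()

# B_a|a,b = B_a|a,b-a - B_b-a|a,b-a
print(fit1.params[1] - fit1.params[2], fit2.params[1])
# B_b|a,b = B_b-a|a,b-a
print(fit1.params[2], fit2.params[2])
}}}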

It also follows that the standard errors of the regression coefficients for a and b can be derived from the standard errors, and covariance, of the regression coefficients for a and b-a.

se($$B_\text{a|a,b}$$) = se($$B_\text{a|a,b-a} - B_\text{b-a|a,b-a}$$) = $$\sqrt{V(B_\text{a|a,b-a}) + V(B_\text{b-a|a,b-a}) - 2\text{Cov}(B_\text{a|a,b-a},B_\text{b-a|a,b-a})}$$ and

se($$B_\text{b|a,b}$$) = se($$B_\text{b-a|a,b-a}$$)

Example

For one study involving a response y and variables a and b-a we have regression coefficients (s.e.s) of 1.170 (0.446) for a and 1.023 (0.399) for b-a.

It follows that in a regression of the same response on a and b, the regression coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).

The regression coefficient for a equals 1.170 - 1.023 = 0.147. Given a covariance of 0.026 between the a and b-a regression coefficients, the se of a in the regression involving a and b is computed using the s.e.s and covariance of the coefficients from the regression with a and b-a as predictors.

se(a) = $$\sqrt{0.446^2 + 0.399^2 - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.
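The same calculation can be reproduced in code. The sketch below (again with arbitrary simulated data rather than the study figures above) reads the variances and covariance of the a and b-a coefficient estimates from the fitted model's coefficient covariance matrix and checks the result against the s.e. of a obtained directly from the a, b regression.

{{{#!python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
a = rng.normal(50, 10, n)
b = a + rng.uniform(1, 20, n)
y = 0.5 * a + 0.3 * (b - a) + rng.normal(0, 5, n)

fit_ad = sm.OLS(y, sm.add_constant(np.column_stack([a, b - a]))).fit()
V = fit_ad.cov_params()              # covariance matrix of the estimates
# se(B_a|a,b) = sqrt( V(B_a|a,b-a) + V(B_b-a|a,b-a) - 2 Cov(B_a, B_b-a) )
se_a = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])

fit_ab = sm.OLS(y, sm.add_constant(np.column_stack([a, b]))).fit()
print(se_a, fit_ab.bse[1])           # identical
}}}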

The zero-order correlations have the same t-values as the regression estimates used to obtain them, and the correlations themselves correspond to the signed square roots of the changes in R-squared.

Example showing the extraction of zero-order correlations from the above regressions

We can obtain the zero-order correlations of a and b with y from the regressions using a with the a+b sum, and b with the a+b sum, by evaluating the R-squareds and the regression coefficient t-values associated with the 'a+b' term. These results confirm that the zero-order correlation of a (b) with y can be obtained indirectly from the b (a) scores and the sum of a and b. The examples below are for a randomly generated data set.

The zero-order correlation of b with y is the signed square root of the change in R-squared when adding 'a+b' to a regression already containing 'a' predicting y: $$\sqrt{0.066-0.050}$$ = -0.12, where 0.066 is the R-squared of a regression of y on 'a' and 'a+b' and 0.050 is the R-squared of a regression with only 'a' predicting y. The t-value for 'a+b' = 0.34, p = 0.75, which equals the p-value for the zero-order correlation of b with y of -0.12.

The zero-order correlation of a with y is the signed square root of the change in R-squared when adding 'a+b' to a regression already containing 'b' predicting y: $$\sqrt{0.066-0.014}$$ = 0.22, where 0.066 is the R-squared of a regression of y on 'b' and 'a+b' and 0.014 is the R-squared of a regression with 'b' as the only predictor of y. The t-value for 'a+b' = -0.62, p = 0.55, which equals the p-value for the zero-order correlation of a with y of 0.22.
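A sketch of this calculation is below, using freshly simulated null data (so the R-squareds, t-values and correlations will differ from the figures quoted above). With a and b generated independently, as here, the indirect and direct estimates of the zero-order correlation agree closely.

{{{#!python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
a, b, y = rng.normal(size=(3, n))    # independently generated null data

full = sm.OLS(y, sm.add_constant(np.column_stack([a, a + b]))).fit()
base = sm.OLS(y, sm.add_constant(a)).fit()

# signed square root of the change in R-squared, signed by the 'a+b' t-value
r_indirect = np.sign(full.tvalues[2]) * np.sqrt(full.rsquared - base.rsquared)
r_direct = np.corrcoef(b, y)[0, 1]   # zero-order correlation of b with y

print(r_indirect, r_direct)
print(full.pvalues[2])               # p-value for the 'a+b' term
}}}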

Relationships between a, b and a+b

It also follows that Predicted y = $$B_\text{a}a + B_\text{a+b}(a+b)$$

= $$(B_\text{a} + B_\text{a+b})a + B_\text{a+b}b$$ and

Predicted y = $$B_\text{b}b + B_\text{a+b}(a+b)$$

= $$(B_\text{b} + B_\text{a+b})b + B_\text{a+b}a$$

So knowing the relationship of the response with both a+b and a is enough to give the relationship between the response and b, and knowing the relationship of the response with both a+b and b is enough to give the relationship with a.
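These identities can be checked in the same way as before; a minimal sketch with arbitrary simulated data follows.

{{{#!python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
a, b = rng.normal(size=(2, n))
y = 1.5 * a - 0.7 * b + rng.normal(size=n)

fit_sum = sm.OLS(y, sm.add_constant(np.column_stack([a, a + b]))).fit()
fit_ab = sm.OLS(y, sm.add_constant(np.column_stack([a, b]))).fit()

# B_a|a,b = B_a|a,a+b + B_a+b|a,a+b and B_b|a,b = B_a+b|a,a+b
print(fit_sum.params[1] + fit_sum.params[2], fit_ab.params[1])
print(fit_sum.params[2], fit_ab.params[2])
}}}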

It is also true that, unless a and b are highly correlated so that a $$\approx \pm b$$,

$$B_\text{a}a + B_\text{b}b \ne B_\text{a+b}(a+b)$$

because $$B_\text{a}$$ and $$B_\text{b}$$ will not in general be equal. This means you cannot obtain the zero-order correlation between y and the sum of a and b indirectly using the separate a and b scores.

One can also interpret this as saying that knowing the a+b sum does not tell you the numbers (a and b) that were added together to give it if these numbers were weighted unequally (so that $$B_\text{a}$$ is not equal to $$B_\text{b}$$).

If a and b are highly correlated then the relationships between y and a and between y and b are nearly equal, and the relationship between y and a+b will then equal the relationship between y and either a or b.

Example

If b = -3a then, for a Pearson correlation r, r(a,y) = -r(b,y). Also r(a+b,y) = r(b,y) = -r(a,y), since the b values are larger in absolute value than those of a, so the sum has the same sign of relationship with y as the b values.
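This can be confirmed directly; in the sketch below (the choice of distribution and sample size is arbitrary) the three correlations satisfy r(b,y) = -r(a,y) and r(a+b,y) = r(b,y) exactly.

{{{#!python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(size=50)
b = -3 * a                           # b is an exact multiple of a
y = a + rng.normal(size=50)

r = lambda u, v: np.corrcoef(u, v)[0, 1]
# a+b = -2a is a positive multiple of b, so it correlates with y as b does
print(r(a, y), r(b, y), r(a + b, y))
}}}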
