FAQ/ab-a - CBU statistics Wiki

Revision 23 as of 2011-02-04 11:22:41

What is the relationship between regressions involving variables a and b and those involving b-a and a+b in predicting an outcome?

Suppose we have a response y and two continuous predictors such as age of onset (a) and duration of hearing deficit (b-a), with b representing the individual's current age. Then there is a direct correspondence between the coefficients in this regression and those obtained when the same response, y, is predicted using a and b as predictors.

In particular, if $$B_{\text{i}}$$ represents the regression coefficient for variable i then in a regression using a and b-a as predictors:

Predicted y = $$B_{\text{a}}$$a + $$B_{\text{b-a}}$$(b-a)

= $$B_{\text{a}}$$a + $$B_{\text{b-a}}$$b - $$B_{\text{b-a}}$$a

= $$(B_{\text{a}} - B_{\text{b-a}})$$a + $$B_{\text{b-a}}$$b

So it follows that if $$B_{\text{i|i,j}}$$ represents the regression coefficient of variable i in a regression with i and j as predictors of a response, y, then

$$B_{\text{a|a,b-a}}$$ - $$B_{\text{b-a|a,b-a}}$$ = $$B_{\text{a|a,b}}$$ and

$$B_{\text{b-a|a,b-a}}$$ = $$B_{\text{b|a,b}}$$

In other words, the difference between the regression coefficients for a and b-a in a regression using a and b-a as predictors equals the regression coefficient for a in a regression with a and b as predictors. Likewise, the regression coefficient for b-a with a and b-a as predictors is the same as the regression coefficient for b in a regression with a as the other predictor.
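The two identities can be checked numerically. The following is a minimal sketch on simulated data, not the study data: the helper `slopes`, the variable names and the simulated values are all assumptions made for illustration, and an intercept is included even though the derivation omits it (the identities hold either way).

```python
import random

def slopes(x1, x2, y):
    """Least-squares slopes for y ~ intercept + x1 + x2 (closed form)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data (hypothetical values, not from the study discussed here)
random.seed(1)
n = 200
a = [random.gauss(30, 10) for _ in range(n)]   # e.g. age of onset
d = [random.gauss(15, 5) for _ in range(n)]    # b - a, e.g. duration
b = [ai + di for ai, di in zip(a, d)]          # e.g. current age
y = [0.4 * ai + 1.1 * di + random.gauss(0, 3) for ai, di in zip(a, d)]

coef_a_ad, coef_d_ad = slopes(a, d, y)   # regression on a and b-a
coef_a_ab, coef_b_ab = slopes(a, b, y)   # regression on a and b

print(coef_a_ad - coef_d_ad, coef_a_ab)  # these two agree up to rounding
print(coef_d_ad, coef_b_ab)              # and so do these
```

Both regressions span the same predictor space, so the fitted values are identical and only the labelling of the coefficients changes.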

It also follows that the standard errors of the regression coefficients for a and b can be derived from the variances (V) and covariance (Cov) of the regression coefficients for a and b-a:

se($$B_{\text{a|a,b}}$$) = se($$B_{\text{a|a,b-a}}$$ - $$B_{\text{b-a|a,b-a}}$$) = $$\sqrt{V(B_{\text{a|a,b-a}}) + V(B_{\text{b-a|a,b-a}}) - 2\,\mbox{Cov}(B_{\text{a|a,b-a}},B_{\text{b-a|a,b-a}})}$$ and

se($$B_{\text{b|a,b}}$$) = se($$B_{\text{b-a|a,b-a}}$$)
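The standard error relationships can also be checked on simulated data. This is a sketch under stated assumptions: the helpers `invert` and `ols_cov` are hypothetical names written for this illustration, the data are randomly generated, and the coefficient covariance matrix is taken to be the usual ordinary least squares estimate (residual variance times the inverse of X'X).

```python
import math
import random

def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(M)
    A = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [row[k:] for row in A]

def ols_cov(columns, y):
    """Return (coefficients, covariance matrix) for y ~ 1 + columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(XtX)
    beta = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)   # residual variance
    return beta, [[s2 * v for v in row] for row in inv]

# Simulated data (hypothetical values, not from the study discussed here)
random.seed(1)
n = 200
a = [random.gauss(30, 10) for _ in range(n)]
d = [random.gauss(15, 5) for _ in range(n)]    # plays the role of b - a
b = [ai + di for ai, di in zip(a, d)]
y = [0.4 * ai + 1.1 * di + random.gauss(0, 3) for ai, di in zip(a, d)]

_, cov_ad = ols_cov([a, d], y)   # index 0 = intercept, 1 = a, 2 = b-a
_, cov_ab = ols_cov([a, b], y)   # index 0 = intercept, 1 = a, 2 = b

se_a_derived = math.sqrt(cov_ad[1][1] + cov_ad[2][2] - 2 * cov_ad[1][2])
se_a_direct = math.sqrt(cov_ab[1][1])
se_b_direct = math.sqrt(cov_ab[2][2])
se_d = math.sqrt(cov_ad[2][2])

print(se_a_derived, se_a_direct)  # agree up to rounding
print(se_d, se_b_direct)          # agree up to rounding
```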

Example

For one study involving a response y and predictors a and b-a we have regression coefficients (s.e.s) of 1.170 (0.446) for a and 1.023 (0.399) for b-a.

It follows that in a regression of the same response on a and b, the regression coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).

The regression coefficient for a equals 1.170 - 1.023 = 0.147 (using the rounded coefficients quoted above).

The s.e. of this coefficient in the regression involving a and b is computed from the s.e.s and the covariance of the regression coefficients in the regression with a and b-a as predictors. Given a covariance of 0.026 between the a and b-a regression coefficients,

se(a) = $$\sqrt{0.446^{2} + 0.399^{2} - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.
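The arithmetic in this example can be reproduced in a few lines. The variable names are illustrative; the numbers are the rounded values quoted in the example, for which the difference of the two coefficients comes to 0.147.

```python
import math

# Values quoted in the example (regression on a and b-a)
coef_a, se_a = 1.170, 0.446      # coefficient and s.e. for a
coef_d, se_d = 1.023, 0.399      # coefficient and s.e. for b-a
cov_a_d = 0.026                  # covariance of the two coefficients

# Coefficient and s.e. for a in the regression on a and b
coef_a_ab = coef_a - coef_d
se_a_ab = math.sqrt(se_a ** 2 + se_d ** 2 - 2 * cov_a_d)

print(round(coef_a_ab, 3), round(se_a_ab, 3))  # 0.147 0.553
```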

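The correlations obtained in this way have the same t-values as the regression estimates used to obtain them, and they correspond to the signed square root of the change in R-squared when the extra term is added. Strictly, the signed square root of a change in R-squared is a semi-partial (part) correlation; it coincides with the zero-order correlation only when the two predictors are uncorrelated with each other.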

Example showing the extraction of zero-order correlations from the above regressions

We can obtain the zero-order correlations of a and b with y from two regressions, one using a and the a+b sum as predictors and one using b and the a+b sum, by evaluating the R-squareds and the t-values of the 'a+b' regression term. These results confirm that the zero-order correlation of either a (b) with y can be obtained indirectly from the b (a) scores and the sum of a and b. The examples below are for a randomly generated data set.

The zero-order correlation of b with y is the signed square root of the change in R-squared on adding 'a+b' to a regression already containing 'a' to predict y: $$\sqrt{0.066-0.050}$$, giving -0.12. Here 0.066 is the R-squared of the regression using 'a' and 'a+b' as predictors of y and 0.050 is the R-squared of the regression with only 'a' used to predict y. The t-value for 'a+b' is 0.34, p=0.75, which equals the p-value for the zero-order correlation of b with y of -0.12.

The zero-order correlation of a with y is the signed square root of the change in R-squared on adding 'a+b' to a regression already containing 'b' to predict y: $$\sqrt{0.066-0.014}$$, giving 0.22. Here 0.066 is the R-squared of the regression using 'b' and 'a+b' as predictors of y and 0.014 is the R-squared of the regression with 'b' as the only predictor of y. The t-value for 'a+b' is -0.62, p=0.55, which equals the p-value for the zero-order correlation of a with y of 0.22.

Relationships between a,b and a+b

It also follows that

Predicted y = $$B_{\text{a}}$$a + $$B_{\text{a+b}}$$(a+b)

= $$(B_{\text{a}} + B_{\text{a+b}})$$a + $$B_{\text{a+b}}$$b and

Predicted y = $$B_{\text{b}}$$b + $$B_{\text{a+b}}$$(a+b)

= $$(B_{\text{b}} + B_{\text{a+b}})$$b + $$B_{\text{a+b}}$$a

So knowing the relationship of the response with both a+b and a is enough to give its relationship with b, and knowing the relationship of the response with both a+b and b is enough to give its relationship with a.
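As a numerical illustration of these identities, the sketch below fits the regressions on (a, a+b), (b, a+b) and (a, b) to simulated data and compares the coefficients. The helper `slopes`, the variable names and the simulated values are assumptions made for this example; an intercept is included in each fit.

```python
import random

def slopes(x1, x2, y):
    """Least-squares slopes for y ~ intercept + x1 + x2 (closed form)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data (hypothetical values)
random.seed(3)
n = 150
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 2) for _ in range(n)]
s = [ai + bi for ai, bi in zip(a, b)]          # the a+b sum
y = [0.8 * ai - 0.3 * bi + random.gauss(0, 1) for ai, bi in zip(a, b)]

coef_a_ab, coef_b_ab = slopes(a, b, y)         # regression on a and b
coef_a_as, coef_s_as = slopes(a, s, y)         # regression on a and a+b
coef_b_bs, coef_s_bs = slopes(b, s, y)         # regression on b and a+b

print(coef_a_as + coef_s_as, coef_a_ab)        # coefficient of a recovered
print(coef_s_as, coef_b_ab)                    # coefficient of b recovered
print(coef_b_bs + coef_s_bs, coef_b_ab)        # coefficient of b recovered
print(coef_s_bs, coef_a_ab)                    # coefficient of a recovered
```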

It is also true that unless a and b are highly correlated so a $$\approx \pm$$ b,

$$B_{\text{a}}$$a + $$B_{\text{b}}$$b $$\ne$$ $$B_{\text{a+b}}$$(a+b)

because $$B_{\text{a}}$$ and $$B_{\text{b}}$$ will not in general be equal.

One can also interpret this as saying that knowing the a+b sum does not tell you the numbers (a and b) that were added together to give it if these numbers were weighted unequally (so that $$B_{\text{a}}$$ is not equal to $$B_{\text{b}}$$).

If a and b are highly correlated then the relationships between y and a and between y and b are nearly equal in strength, and the relationship between y and a+b will then be approximately equal to the relationship between y and either a or b.

Example

If b = -3a then, for a Pearson correlation r, r(a,y) = -r(b,y). Since a+b = -2a, r(a+b,y) = r(b,y) = -r(a,y): the b values are larger in absolute value than those of a, so the sum has the same sign of relationship with y as the b values.
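This example can be verified directly. The sketch below uses simulated a and y (hypothetical values) and a small hand-written `pearson` helper, then sets b = -3a as in the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

# Simulated data (hypothetical values)
random.seed(2)
a = [random.gauss(0, 1) for _ in range(100)]
y = [0.5 * ai + random.gauss(0, 1) for ai in a]
b = [-3 * ai for ai in a]                      # b = -3a, as in the example
s = [ai + bi for ai, bi in zip(a, b)]          # a + b = -2a

r_ay, r_by, r_sy = pearson(a, y), pearson(b, y), pearson(s, y)
print(r_ay, r_by, r_sy)   # r_by and r_sy equal -r_ay up to rounding
```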