FAQ/ab-a - CBU statistics Wiki

Revision 15 as of 2011-02-04 09:59:10

What is the relationship between regressions involving variables A and B and those involving B-A and A+B in predicting an outcome?

Suppose we have a response y and two continuous predictors, such as age of onset (A) and duration of hearing deficit (B-A), with B representing the individual's current age. Then the coefficients in this regression can be converted into those obtained when the same response, y, is predicted using A and B as predictors.

In particular, if $$B_{i}$$ represents the regression coefficient for variable i, then in a regression using a and b-a as predictors:

Predicted y = $$B_{a}a + B_{b-a}(b-a)$$

= $$B_{a}a + B_{b-a}b - B_{b-a}a$$

= $$(B_{a} - B_{b-a})a + B_{b-a}b$$

So it follows that if $$B_{i|i,j}$$ represents the regression coefficient of variable i in a regression with i and j as predictors of a response, y, then

$$B_{a|a,b-a} - B_{b-a|a,b-a} = B_{a|a,b}$$ and

$$B_{b-a|a,b-a} = B_{b|a,b}$$

In other words, taking the regression coefficient for a in a regression using a and b-a as predictors and subtracting the coefficient for b-a from it gives the regression coefficient for a in a regression with a and b as predictors. The regression coefficient for b-a in the regression with a and b-a as predictors is the same as the regression coefficient for b in the regression with a and b as predictors.
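The equivalence can be checked numerically. The following is a minimal sketch on simulated data, not taken from any study: the variable names, the seed, the generating coefficients and the inclusion of an intercept are all assumptions made for illustration, and the small normal-equations solver is only there to keep the example free of external libraries.

```python
import random

def ols(columns, y):
    """Least-squares fit of y on the given predictor columns plus an intercept.

    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan elimination
    and returns [intercept, coef_1, coef_2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        # Partial pivoting: bring the largest remaining entry in column k to row k
        pivot = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(p):
            if r != k:
                factor = M[r][k] / M[k][k]
                M[r] = [M[r][c] - factor * M[k][c] for c in range(p + 1)]
    return [M[r][p] / M[r][r] for r in range(p)]

# Simulated (hypothetical) data: a = age of onset, b = current age
random.seed(1)
n = 200
a = [random.uniform(0, 40) for _ in range(n)]
b = [a_i + random.uniform(1, 30) for a_i in a]
duration = [b_i - a_i for a_i, b_i in zip(a, b)]  # b - a
y = [2.0 + 0.5 * a_i + 1.2 * d_i + random.gauss(0, 1)
     for a_i, d_i in zip(a, duration)]

_, coef_a_dur, coef_dur = ols([a, duration], y)  # predictors a and b-a
_, coef_a_ab, coef_b_ab = ols([a, b], y)         # predictors a and b

print("a | a,b        :", coef_a_ab)
print("a - (b-a) coefs:", coef_a_dur - coef_dur)
print("b | a,b        :", coef_b_ab)
print("b-a | a,b-a    :", coef_dur)
```

The first two printed values should agree, as should the last two, up to floating-point error. The identity does not depend on the intercept, because the two sets of predictors span the same space either way.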

It also follows that the standard errors of the regression coefficients for a and b respectively can be derived using the standard errors of the regression coefficients for a and b-a.

se($$B_{a|a,b}$$) = se($$B_{a|a,b-a} - B_{b-a|a,b-a}$$) = $$\sqrt{V(B_{a|a,b-a}) + V(B_{b-a|a,b-a}) - 2\mbox{Cov}(B_{a|a,b-a},B_{b-a|a,b-a})}$$ and

se($$B_{b|a,b}$$) = se($$B_{b-a|a,b-a}$$)

Example

For one study involving a response y and predictors a and b-a, we have regression coefficients (s.e.s) of 1.170 (0.446) for a and 1.023 (0.399) for b-a.

It follows that in a regression of the same response on a and b, the regression coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).

The regression coefficient for a equals 1.170 - 1.023 = 0.147.

Given a covariance of 0.026 between the a and b-a regression coefficients, the s.e. of a in the regression involving a and b is computed using the s.e.s and covariance of the regression coefficients in the regression with a and b-a as predictors:

se(a) = $$\sqrt{0.446^{2} + 0.399^{2} - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.
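The arithmetic in this example can be reproduced with a short helper. This is only a sketch: the function name `se_of_difference` is hypothetical, and the inputs are the rounded values quoted in the example.

```python
import math

def se_of_difference(se_1, se_2, cov_12):
    """Standard error of (coef_1 - coef_2), given each s.e. and their covariance."""
    variance = se_1 ** 2 + se_2 ** 2 - 2.0 * cov_12
    if variance < 0:
        raise ValueError("inputs imply a negative variance")
    return math.sqrt(variance)

# Values from the example: s.e.s of the a and b-a coefficients and their covariance
se_a = se_of_difference(0.446, 0.399, 0.026)
print(round(se_a, 3))  # 0.553
```

Note that the covariance enters with a minus sign because the two coefficients are being subtracted. For a sum of coefficients the sign would be plus.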

Relationships between a,b and a+b

It also follows that

Predicted y = $$B_{a}a + B_{a+b}(a+b)$$

= $$(B_{a} + B_{a+b})a + B_{a+b}b$$ and

Predicted y = $$B_{b}b + B_{a+b}(a+b)$$

= $$(B_{b} + B_{a+b})b + B_{a+b}a$$

So it follows that knowing the relationship between the response and both a and a+b is enough to give the relationship between the response and b, and knowing the relationship between the response and both b and a+b is enough to give the relationship with a.
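Written in the conditional notation used earlier on this page, where $$B_{i|i,j}$$ is the coefficient of variable i in a regression with i and j as predictors, the two expansions above would appear to give the following identities. This is a sketch that assumes each pair of regressions is fitted by least squares to the same data:

$$B_{a|a,b} = B_{a|a,a+b} + B_{a+b|a,a+b}$$ and $$B_{b|a,b} = B_{a+b|a,a+b}$$

$$B_{b|a,b} = B_{b|b,a+b} + B_{a+b|b,a+b}$$ and $$B_{a|a,b} = B_{a+b|b,a+b}$$

Because the first coefficient in each row is now a sum rather than a difference, its standard error would use the same variance formula as in the b-a case but with the covariance term added rather than subtracted.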

It is also true that, in general,

$$B_{a}a + B_{b}b \ne B_{a+b}(a+b)$$

because knowing the sum does not tell you the numbers that were added together to give it: infinitely many pairs of numbers sum to any particular value. It follows that knowing the relationship between y and both a and b does not tell you the relationship between the response and the sum of a and b.