Diff for "FAQ/ab-a" - CBU statistics Wiki
Differences between revisions 11 and 16 (spanning 5 versions)
Revision 11 as of 2011-01-27 12:36:17
Size: 2573
Editor: PeterWatson
Revision 16 as of 2011-02-04 10:23:53
Size: 3831
Editor: PeterWatson


What is the relationship between regressions involving variables A and B and those involving B-A and A+B in predicting an outcome?

Suppose we have a response y and two continuous predictors, such as age of onset (a) and duration of hearing deficit (b-a), with b representing the individual's current age. Then there is an equivalence between the coefficients in this regression and those obtained when the same response, y, is predicted using a and b as predictors.

In particular, if $$B_{\text{i}}$$ represents the regression coefficient for variable i, then in a regression using a and b-a as predictors:

Predicted y = $$B_{\text{a}}a + B_{\text{b-a}}(b-a)$$

= $$B_{\text{a}}a + B_{\text{b-a}}b - B_{\text{b-a}}a$$

= $$(B_{\text{a}} - B_{\text{b-a}})a + B_{\text{b-a}}b$$

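Because a and b-a produce exactly the same predictions as a and b (each pair is a linear rearrangement of the other), the two regressions give identical fitted values, so the coefficients of a and b must match. It follows that, if $$B_{\text{i|i,j}}$$ represents the regression coefficient of variable i in a regression using i and j as predictors of a response y, then

$$B_{\text{a|a,b-a}} - B_{\text{b-a|a,b-a}} = B_{\text{a|a,b}}$$ and

$$B_{\text{b-a|a,b-a}} = B_{\text{b|a,b}}$$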

In other words, subtracting the coefficient for b-a from the coefficient for a in the regression using a and b-a as predictors gives the coefficient for a in the regression using a and b as predictors. The coefficient for b-a in the a, b-a regression is the same as the coefficient for b in the regression with a as the other predictor.
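The identities above can be checked numerically. The sketch below is a minimal illustration that assumes simulated data: the ages, durations, true coefficients and noise level are invented for the example and come from no study. The helper `ols2` is a hypothetical hand-written least-squares fit, not part of any statistics package. The sketch fits y on a and b-a and also on a and b, then compares the coefficients.

```python
import random


def ols2(x1, x2, y):
    """Slopes (b1, b2) from an OLS fit of y on an intercept, x1 and x2.

    Centring the variables removes the intercept, leaving 2x2 normal
    equations that are solved here by Cramer's rule.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data: a = age of onset, b = current age, so b - a = duration.
rng = random.Random(42)
a = [rng.uniform(20, 60) for _ in range(200)]
b = [ai + rng.uniform(0, 30) for ai in a]
b_minus_a = [bi - ai for ai, bi in zip(a, b)]
y = [5 + 0.3 * ai + 0.8 * bi + rng.gauss(0, 2) for ai, bi in zip(a, b)]

B_a_abma, B_bma_abma = ols2(a, b_minus_a, y)  # regression on a and b-a
B_a_ab, B_b_ab = ols2(a, b, y)                # regression on a and b

print(B_a_abma - B_bma_abma, B_a_ab)  # agree up to floating-point error
print(B_bma_abma, B_b_ab)             # agree up to floating-point error
```

Any other data set would do. The identities hold exactly for every least-squares fit, not just for this simulation.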

It also follows that the standard errors of the regression coefficients for a and b in the a, b regression can be derived from the standard errors of the coefficients for a and b-a in the a, b-a regression, together with the covariance of those two coefficients. Here V denotes the sampling variance, that is, the squared standard error:

se($$B_{\text{a|a,b}}$$) = se($$B_{\text{a|a,b-a}} - B_{\text{b-a|a,b-a}}$$) = $$\sqrt{V(B_{\text{a|a,b-a}}) + V(B_{\text{b-a|a,b-a}}) - 2\,\text{Cov}(B_{\text{a|a,b-a}}, B_{\text{b-a|a,b-a}})}$$ and

se($$B_{\text{b|a,b}}$$) = se($$B_{\text{b-a|a,b-a}}$$)
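The standard-error relationships can be checked the same way. This sketch again uses hypothetical simulated data. It relies on a hand-written helper, `ols2_with_cov`, which is illustrative and not a library function. The helper returns the slope estimates together with their covariance matrix s^2 (Xc'Xc)^-1, computed from the centred predictors. In practice that matrix would come from whatever statistics package fitted the model.

```python
import math
import random


def ols2_with_cov(x1, x2, y):
    """OLS fit of y on an intercept, x1 and x2.

    Returns the slopes (b1, b2) and their 2x2 covariance matrix
    s^2 (Xc'Xc)^-1, where Xc holds the centred predictors and
    s^2 = RSS / (n - 3) (intercept plus two slopes estimated).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(c1, c2, cy))
    s2 = rss / (n - 3)
    # Inverse of [[s11, s12], [s12, s22]] is [[s22, -s12], [-s12, s11]] / det.
    cov = [[s2 * s22 / det, -s2 * s12 / det],
           [-s2 * s12 / det, s2 * s11 / det]]
    return (b1, b2), cov


# Hypothetical simulated data (illustration only).
rng = random.Random(42)
a = [rng.uniform(20, 60) for _ in range(200)]
b = [ai + rng.uniform(0, 30) for ai in a]
b_minus_a = [bi - ai for ai, bi in zip(a, b)]
y = [5 + 0.3 * ai + 0.8 * bi + rng.gauss(0, 2) for ai, bi in zip(a, b)]

_, cov_abma = ols2_with_cov(a, b_minus_a, y)  # a and b-a as predictors
_, cov_ab = ols2_with_cov(a, b, y)            # a and b as predictors

# se(B_a|a,b) derived from the a, b-a fit: sqrt(V + V - 2 Cov)
se_a_derived = math.sqrt(cov_abma[0][0] + cov_abma[1][1] - 2 * cov_abma[0][1])
se_a_direct = math.sqrt(cov_ab[0][0])
# se(B_b|a,b) is simply se(B_b-a|a,b-a)
se_b_derived = math.sqrt(cov_abma[1][1])
se_b_direct = math.sqrt(cov_ab[1][1])

print(se_a_derived, se_a_direct)
print(se_b_derived, se_b_direct)
```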


For one study involving a response y and predictors a and b-a, the regression coefficients (s.e.s) are 1.170 (0.446) for a and 1.023 (0.399) for b-a.

It follows that in a regression of the same response on a and b, the regression coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).

The regression coefficient for a equals 1.170 - 1.023 = 0.147 using the rounded values shown (the figure of 0.148 presumably comes from the unrounded coefficients). Given a covariance of 0.026 between the a and b-a regression coefficients, the s.e. of a in the regression involving a and b is computed from the s.e.s and covariance of the coefficients in the regression with a and b-a as predictors:

se($$B_{\text{a|a,b}}$$) = $$\sqrt{0.446^2 + 0.399^2 - 2(0.026)} = \sqrt{0.306} = 0.553$$.
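The arithmetic of this example can be reproduced in a few lines. The sketch below simply plugs in the rounded values quoted above, so the coefficient for a comes out as 0.147. The variable names are illustrative only.

```python
import math

# Values quoted in the example (regression of y on a and b-a).
b_a, se_a = 1.170, 0.446        # coefficient (s.e.) for a
b_bma, se_bma = 1.023, 0.399    # coefficient (s.e.) for b-a
cov_a_bma = 0.026               # covariance of the two coefficient estimates

coef_a_ab = b_a - b_bma                                        # B_a|a,b
se_a_ab = math.sqrt(se_a ** 2 + se_bma ** 2 - 2 * cov_a_bma)   # se(B_a|a,b)
coef_b_ab, se_b_ab = b_bma, se_bma                             # B_b|a,b and its s.e.

print(f"a: {coef_a_ab:.3f} ({se_a_ab:.3f})")  # prints a: 0.147 (0.553)
print(f"b: {coef_b_ab:.3f} ({se_b_ab:.3f})")  # prints b: 1.023 (0.399)
```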

Relationships between a, b and a+b

By the same argument, in a regression using a and a+b as predictors,

Predicted y = $$B_{\text{a}}a + B_{\text{a+b}}(a+b)$$

= $$(B_{\text{a}} + B_{\text{a+b}})a + B_{\text{a+b}}b$$

and, in a regression using b and a+b as predictors,

Predicted y = $$B_{\text{b}}b + B_{\text{a+b}}(a+b)$$

= $$(B_{\text{b}} + B_{\text{a+b}})b + B_{\text{a+b}}a$$

So the regression of the response on a and a+b is enough to give the coefficient for b in the a, b regression (it equals the a+b coefficient). Likewise, the regression of the response on b and a+b is enough to give the coefficient for a.
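These relations can be confirmed with a small simulation. The sketch below assumes hypothetical data in which a and b are independent uniform variables and y depends on them with arbitrarily chosen coefficients. It reuses a hand-written least-squares helper, `ols2`, which is illustrative and not a library function, to fit the three regressions.

```python
import random


def ols2(x1, x2, y):
    """Slopes (b1, b2) from an OLS fit of y on an intercept, x1 and x2 (centred, Cramer's rule)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data; the coefficients 0.5 and -0.7 are arbitrary.
rng = random.Random(7)
a = [rng.uniform(0, 10) for _ in range(200)]
b = [rng.uniform(0, 10) for _ in range(200)]
a_plus_b = [ai + bi for ai, bi in zip(a, b)]
y = [2 + 0.5 * ai - 0.7 * bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

B_a_ab, B_b_ab = ols2(a, b, y)          # regression on a and b
B_a_as, B_s_as = ols2(a, a_plus_b, y)   # regression on a and a+b
B_b_bs, B_s_bs = ols2(b, a_plus_b, y)   # regression on b and a+b

# From the a, a+b fit: B_a|a,b = B_a + B_a+b and B_b|a,b = B_a+b
print(B_a_as + B_s_as, B_a_ab)
print(B_s_as, B_b_ab)
# From the b, a+b fit: B_b|a,b = B_b + B_a+b and B_a|a,b = B_a+b
print(B_b_bs + B_s_bs, B_b_ab)
print(B_s_bs, B_a_ab)
```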

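It is also true that, unless a and b are highly correlated (so that $$a \approx b$$),

$$B_{\text{a}}a + B_{\text{b}}b \ne B_{\text{a+b}}(a+b)$$

where $$B_{\text{a}}$$ and $$B_{\text{b}}$$ come from the regression of y on a and b, and $$B_{\text{a+b}}$$ comes from the regression of y on a+b alone. This is because knowing a sum does not tell you the two numbers that were added together to give it: infinitely many pairs of numbers have the same sum. Equivalently, regressing y on a+b forces a and b to share a single coefficient, so the two fits agree exactly only when $$B_{\text{a}} = B_{\text{b}}$$ in the a, b regression. It follows that knowing the relationship between y and both a and b does not, in general, tell you the relationship between y and the sum of a and b.

If a and b are highly correlated, the relationships between y and a and between y and b are nearly equal. The regression on a+b then conveys essentially the same relationship as a regression on either a or b alone, with the coefficient roughly halved because $$a+b \approx 2a$$.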
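A small simulation can illustrate these last two paragraphs. The sketch below assumes hypothetical data in which a and b have different coefficients (0.5 and -0.7, chosen arbitrarily) and b equals a plus Gaussian noise. It compares the residual sum of squares from regressing y on a+b alone with that from regressing y on a and b. The helper names are illustrative, not library functions. The ratio is always at least 1, because the a+b model is the a, b model with the two coefficients forced to be equal. The ratio should be well above 1 when a and b are only moderately correlated, and close to 1 when the noise is small, so that a and b are almost the same.

```python
import random


def rss_two(x1, x2, y):
    """Residual sum of squares from an OLS fit of y on an intercept, x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(c1, c2, cy))


def rss_one(x, y):
    """Residual sum of squares from an OLS fit of y on an intercept and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    cx = [v - mx for v in x]
    cy = [v - my for v in y]
    slope = sum(u * w for u, w in zip(cx, cy)) / sum(u * u for u in cx)
    return sum((w - slope * u) ** 2 for u, w in zip(cx, cy))


def compare(noise_sd, seed):
    """RSS(y on a+b) / RSS(y on a, b) for hypothetical data where b = a + noise."""
    rng = random.Random(seed)
    a = [rng.uniform(0, 10) for _ in range(200)]
    b = [ai + rng.gauss(0, noise_sd) for ai in a]
    y = [2 + 0.5 * ai - 0.7 * bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]
    s = [ai + bi for ai, bi in zip(a, b)]
    return rss_one(s, y) / rss_two(a, b, y)


ratio_moderate = compare(noise_sd=5.0, seed=1)  # a and b moderately correlated
ratio_high = compare(noise_sd=0.1, seed=1)      # a and b highly correlated
print(ratio_moderate, ratio_high)
```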

FAQ/ab-a (last edited 2018-03-12 17:22:22 by PeterWatson)