What is the relationship between regressions involving variables A and B to those involving B-A and A+B in predicting an outcome?
Suppose we have a response Y and two continuous predictors, such as age of onset (A) and duration of hearing deficit (B-A), with B representing the individual's current age. There is then an equivalence between the coefficients in this regression and those in a regression predicting the same response, y, from A and B as predictors.
In particular, if $$B_\text{i}$$ represents the regression coefficient for variable i, then in a regression using a and b-a as predictors:
Predicted y = $$B_\text{a}$$a + $$B_\text{b-a}$$(b-a)
= $$B_\text{a}$$a + $$B_\text{b-a}$$b - $$B_\text{b-a}$$a
= $$(B_\text{a} - B_\text{b-a})$$a + $$B_\text{b-a}$$b
So it follows that if $$B_\text{i|i,j}$$ represents the regression coefficient of variable i in a regression with i and j as predictors of a response, y, then
$$B_\text{a|a,b-a} - B_\text{b-a|a,b-a} = B_\text{a|a,b}$$ and
$$B_\text{b-a|a,b-a} = B_\text{b|a,b}$$
In other words, subtracting the b-a coefficient from the a coefficient in a regression using a and b-a as predictors gives the regression coefficient for a in a regression with a and b as predictors, and the coefficient for b-a in the a, b-a regression equals the coefficient for b in a regression with a as the other predictor.
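This equivalence can be checked numerically. The following sketch (assuming numpy is available; the data are simulated, not taken from the example study below) fits both parameterisations by least squares and confirms the two identities:

```python
import numpy as np

# Simulated data: a = age of onset, b = current age (so b - a = duration)
rng = np.random.default_rng(0)
n = 200
a = rng.normal(50, 10, size=n)
b = a + rng.uniform(1, 20, size=n)          # current age exceeds age of onset
y = 2.0 * a + 0.5 * b + rng.normal(size=n)  # arbitrary true relationship

def ols(X, y):
    """Return [intercept, slopes...] from an ordinary least-squares fit."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

coef_ab = ols(np.column_stack([a, b]), y)      # predictors a, b
coef_ad = ols(np.column_stack([a, b - a]), y)  # predictors a, b-a

# B_{a|a,b} = B_{a|a,b-a} - B_{b-a|a,b-a}
print(np.isclose(coef_ab[1], coef_ad[1] - coef_ad[2]))  # True
# B_{b|a,b} = B_{b-a|a,b-a}
print(np.isclose(coef_ab[2], coef_ad[2]))               # True
```

The identities hold exactly (up to floating-point error) because the two design matrices span the same column space, so the fitted values are identical and the coefficients are related by a fixed linear transformation.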
It also follows that the standard errors of the regression coefficients for a and b can be derived from the standard errors of the regression coefficients for a and b-a:
se($$B_\text{a|a,b}$$) = se($$B_\text{a|a,b-a} - B_\text{b-a|a,b-a}$$) = $$\sqrt{V(B_\text{a|a,b-a}) + V(B_\text{b-a|a,b-a}) - 2\text{Cov}(B_\text{a|a,b-a},B_\text{b-a|a,b-a})}$$ and
se($$B_\text{b|a,b}$$) = se($$B_\text{b-a|a,b-a}$$)
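The standard-error relationship can also be verified numerically. A minimal numpy sketch with simulated data, computing each fit's coefficient covariance matrix from first principles:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
a = rng.normal(size=n)
d = rng.uniform(1, 10, size=n)   # d plays the role of b - a
b = a + d
y = 1.0 * a + 0.8 * b + rng.normal(size=n)

def ols_with_cov(X, y):
    """OLS fit returning coefficients and their estimated covariance matrix."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])  # residual variance
    return beta, sigma2 * np.linalg.inv(X1.T @ X1)

_, cov_ad = ols_with_cov(np.column_stack([a, d]), y)  # a, b-a fit
_, cov_ab = ols_with_cov(np.column_stack([a, b]), y)  # a, b fit

# se(B_{a|a,b}) = sqrt(V(B_{a|a,b-a}) + V(B_{b-a|a,b-a}) - 2 Cov(...))
print(np.isclose(np.sqrt(cov_ab[1, 1]),
                 np.sqrt(cov_ad[1, 1] + cov_ad[2, 2] - 2 * cov_ad[1, 2])))  # True
# se(B_{b|a,b}) = se(B_{b-a|a,b-a})
print(np.isclose(np.sqrt(cov_ab[2, 2]), np.sqrt(cov_ad[2, 2])))             # True
```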
Example
For one study involving a response y and predictors a and b-a, the regression coefficients (s.e.s) were 1.170 (0.446) for a and 1.023 (0.399) for b-a.
It follows that in a regression of the same response on a and b, the coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).
The regression coefficient for a equals 1.170 - 1.023 = 0.147.
Given a covariance of 0.026 between the a and b-a regression coefficients, the se of a in the regression involving a and b is computed from the s.e.s and covariance of the coefficients in the regression with a and b-a as predictors:
se(a) = $$\sqrt{0.446^2 + 0.399^2 - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.
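As a quick arithmetic check of the worked example, using Python as a calculator:

```python
import math

# Variance of the difference of the two coefficients:
# V(a) + V(b-a) - 2*Cov, using the s.e.s 0.446 and 0.399 and covariance 0.026
var_diff = 0.446**2 + 0.399**2 - 2 * 0.026
print(round(var_diff, 3))             # 0.306
print(round(math.sqrt(var_diff), 3))  # 0.553
```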
The zero-order correlations have the same t-values as the regression estimates used to obtain them, and they correspond to the signed square roots of the changes in R-squared.
Example showing equivalence of zero-order correlations to the above regressions
The zero-order correlation of b with y is the signed square root of the change in R-squared from adding 'a+b' to a regression already containing 'a' as a predictor of y: $$\sqrt{0.066-0.050}$$ = -0.12, taking the sign from the 'a+b' coefficient, where 0.050 is the R-squared of a regression on y using 'a' only as a predictor and 0.066 is the R-squared of a regression with 'a' and 'a+b' predicting y. The t-value for 'a+b' is 0.34 with p = 0.75, which equals the p-value for the zero-order correlation of -0.12.
Relationships between a,b and a+b
It also follows that Predicted y = $$B_\text{a}$$a + $$B_\text{a+b}$$(a+b)
= $$(B_\text{a} + B_\text{a+b})$$a + $$B_\text{a+b}$$b and
Predicted y = $$B_\text{b}$$b + $$B_\text{a+b}$$(a+b)
= $$(B_\text{b} + B_\text{a+b})$$b + $$B_\text{a+b}$$a
So knowing the relationship between the response and both a+b and a is enough to give the relationship between the response and b, and knowing the relationship between the response and both a+b and b is enough to give the relationship with a.
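These a+b identities can be checked the same way as the b-a ones; a minimal numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
a = rng.normal(size=n)
b = rng.normal(size=n)
y = 1.5 * a - 0.7 * b + rng.normal(size=n)

def ols(X, y):
    """Return [intercept, slopes...] from an ordinary least-squares fit."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

coef_ab = ols(np.column_stack([a, b]), y)        # predictors a, b
coef_asum = ols(np.column_stack([a, a + b]), y)  # predictors a, a+b

# B_{a|a,b} = B_{a|a,a+b} + B_{a+b|a,a+b}
print(np.isclose(coef_ab[1], coef_asum[1] + coef_asum[2]))  # True
# B_{b|a,b} = B_{a+b|a,a+b}
print(np.isclose(coef_ab[2], coef_asum[2]))                 # True
```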
It is also true that unless a and b are highly correlated (so that a $$\approx \pm$$b),
$$B_\text{a}$$a + $$B_\text{b}$$b $$\ne$$ $$B_\text{a+b}$$(a+b)
because $$B_\text{a}$$ and $$B_\text{b}$$ will not in general be equal.
One can also interpret this as saying that knowing the sum a+b does not tell you the numbers (a and b) that were added together to give it when these numbers are weighted unequally (so that $$B_\text{a}$$ is not equal to $$B_\text{b}$$).
If a and b are highly correlated, then the relationships between y and a and between y and b are nearly equal, and the relationship between y and a+b will then equal the relationship between y and either a or b.
Example
If b = -3a then for a Pearson correlation, r, r(a,y) = -r(b,y) and r(a+b,y) = r(b,y) = -r(a,y), since the b values are larger in absolute value than those of a, so the sum has the same sign of relationship with y as the b values.
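A numerical illustration of this sign pattern (hypothetical data; b is constructed as exactly -3a, so the correlations are related exactly):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=100)
b = -3 * a                       # b perfectly (negatively) correlated with a
y = a + rng.normal(scale=0.5, size=100)

def r(u, v):
    """Pearson correlation of u and v."""
    return np.corrcoef(u, v)[0, 1]

print(np.isclose(r(a, y), -r(b, y)))     # True: r(a,y) = -r(b,y)
print(np.isclose(r(a + b, y), r(b, y)))  # True: a+b = -2a, same sign as b
```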