Diff for "FAQ/ab-a" - CBU statistics Wiki
location: Diff for "FAQ/ab-a"
What is the relationship between regressions involving variables A and B and those involving B-A and A+B in predicting an outcome?

Suppose we have a response, y, and two continuous predictors such as age of onset (a) and duration of hearing deficit (b-a), with b representing the individual's current age. Then there is an equivalence between the coefficients in this regression and those obtained when the same response, y, is predicted using a and b as predictors.

In particular, if $$B_\text{i}$$ represents the regression coefficient for variable i, then in a regression using a and b-a as predictors:

Predicted y = $$B_\text{a}a + B_\text{b-a}(b-a)$$

= $$B_\text{a}a + B_\text{b-a}b - B_\text{b-a}a$$

= $$(B_\text{a} - B_\text{b-a})a + B_\text{b-a}b$$

So it follows that, if $$B_\text{i|i,j}$$ represents the regression coefficient of variable i in a regression using i and j as predictors of a response, y, then

$$B_\text{a|a,b-a} - B_\text{b-a|a,b-a} = B_\text{a|a,b}$$ and

$$B_\text{b-a|a,b-a} = B_\text{b|a,b}$$

In other words, the difference between the regression coefficients for a and b-a in a regression using a and b-a as predictors equals the regression coefficient for a in a regression with a and b as predictors, and the regression coefficient for b-a in the a, b-a regression is the same as the regression coefficient for b in a regression with a as the other predictor.
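
As a concrete check, here is a minimal numpy sketch (the simulated data, the generating coefficients and the helper function ols are hypothetical, not taken from this page) verifying both identities numerically:

{{{#!python
# Check: B_a|a,b = B_a|a,b-a - B_(b-a)|a,b-a   and   B_b|a,b = B_(b-a)|a,b-a
import numpy as np

rng = np.random.default_rng(0)
n = 100
a = rng.normal(size=n)                  # e.g. age of onset
b = a + np.abs(rng.normal(size=n))      # current age, so the duration b-a is positive
y = 1.2 * a + 0.8 * (b - a) + rng.normal(size=n)

def ols(y, *cols):
    """OLS coefficients (intercept first) for y regressed on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, Ba_ab, Bb_ab = ols(y, a, b)        # y on a and b
_, Ba_d, Bd_d = ols(y, a, b - a)      # y on a and b-a

print(np.isclose(Ba_ab, Ba_d - Bd_d))   # True: B_a|a,b = B_a|a,b-a - B_(b-a)|a,b-a
print(np.isclose(Bb_ab, Bd_d))          # True: B_b|a,b = B_(b-a)|a,b-a
}}}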

It also follows that the standard errors of the regression coefficients for a and b can be derived from the standard errors (and covariance) of the regression coefficients for a and b-a.

se($$B_\text{a|a,b}$$) = se($$B_\text{a|a,b-a} - B_\text{b-a|a,b-a}$$) = $$\sqrt{V(B_\text{a|a,b-a}) + V(B_\text{b-a|a,b-a}) - 2\mbox{Cov}(B_\text{a|a,b-a},B_\text{b-a|a,b-a})}$$ and

se($$B_\text{b|a,b}$$) = se($$B_\text{b-a|a,b-a}$$)
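
A similar sketch (again with hypothetical simulated data; the helper fit obtains the coefficient variance-covariance matrix from the usual $$\sigma^2 (X^TX)^{-1}$$ formula) checking both standard-error relationships:

{{{#!python
# Check: se(B_a|a,b) = sqrt( V(B_a|a,b-a) + V(B_(b-a)|a,b-a) - 2 Cov(...) )
#        se(B_b|a,b) = se(B_(b-a)|a,b-a)
import numpy as np

rng = np.random.default_rng(1)
n = 100
a = rng.normal(size=n)
b = a + np.abs(rng.normal(size=n))
y = 1.2 * a + 0.8 * (b - a) + rng.normal(size=n)

def fit(y, *cols):
    """OLS fit returning the coefficients and their variance-covariance matrix."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    return beta, sigma2 * np.linalg.inv(X.T @ X)

beta_ab, cov_ab = fit(y, a, b)       # columns: intercept, a, b
beta_d, cov_d = fit(y, a, b - a)     # columns: intercept, a, b-a

se_a_ab = np.sqrt(cov_ab[1, 1])
se_a_from_d = np.sqrt(cov_d[1, 1] + cov_d[2, 2] - 2 * cov_d[1, 2])
print(np.isclose(se_a_ab, se_a_from_d))                         # True
print(np.isclose(np.sqrt(cov_ab[2, 2]), np.sqrt(cov_d[2, 2])))  # True: se(b) = se(b-a)
}}}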

Example

For one study involving a response y and predictors a and b-a, we have regression coefficients (s.e.s) of 1.170 (0.446) for a and 1.023 (0.399) for b-a.

It follows that, in a regression of the same response on a and b, the regression coefficient (s.e.) of b equals that of b-a in the a, b-a regression, namely 1.023 (0.399).

The regression coefficient for a equals 1.170 - 1.023 = 0.147. Given a covariance of 0.026 between the a and b-a regression coefficients, the se(a) in the regression involving a and b can be computed from the s.e.s and covariance of the coefficients in the regression with a and b-a as predictors:

se(a) = $$\sqrt{0.446^2 + 0.399^2 - 2(0.026)}$$ = $$\sqrt{0.306}$$ = 0.553.
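
The same arithmetic as a quick check in Python:

{{{#!python
# Reproducing the arithmetic above
import math
print(round(math.sqrt(0.446**2 + 0.399**2 - 2 * 0.026), 3))   # 0.553
}}}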

The zero-order correlations have the same t-values as the regression estimates used to obtain them, and these correlations correspond to the signed square roots of the changes in R-squared.

Example showing equivalence of zero-order correlations to the above regressions

The zero-order correlation of b with y is the signed square root of the change in R-squared from adding 'a+b' to a regression already containing 'a' predicting y: $$\sqrt{0.066-0.050}$$ = -0.12, where 0.050 is the R-squared of a regression on y using 'a' only as a predictor and 0.066 is the R-squared of a regression with 'a' and 'a+b' predicting y. The t-value for 'a+b' is 0.34 with p=0.75, which equals the p-value for the zero-order correlation of -0.12.
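
The sketch below (hypothetical simulated data and a hypothetical helper fit) illustrates this calculation: it computes the signed square root of the increase in R-squared when 'a+b' is added to a model already containing 'a', together with the t-value and p-value for the 'a+b' coefficient, which is the test quoted above:

{{{#!python
# Signed square root of the R-squared change plus the t and p for the 'a+b' term
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 9
a, b = rng.normal(size=n), rng.normal(size=n)
y = 0.5 * a - 0.2 * b + rng.normal(size=n)

def fit(y, *cols):
    """OLS fit returning R-squared and the t-statistic for the last predictor."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return r2, beta[-1] / np.sqrt(cov[-1, -1])

r2_a, _ = fit(y, a)               # 'a' only
r2_full, t = fit(y, a, a + b)     # 'a' and 'a+b'

signed_root = np.sign(t) * np.sqrt(r2_full - r2_a)   # signed sqrt of the R-squared change
p = 2 * stats.t.sf(abs(t), n - 3)                    # p-value for the 'a+b' coefficient
print(signed_root, t, p)
}}}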

Relationships between a, b and a+b

It also follows that Predicted y = $$B_\text{a}a + B_\text{a+b}(a+b)$$

= $$(B_\text{a} + B_\text{a+b})a + B_\text{a+b}b$$ and

Predicted y = $$B_\text{b}b + B_\text{a+b}(a+b)$$

= $$(B_\text{b} + B_\text{a+b})b + B_\text{a+b}a$$

So it follows that knowing the relationship of the response with both a+b and a is enough to give the relationship between the response and b, and knowing the relationship of the response with both a+b and b is enough to give the relationship with a.
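
A short sketch (hypothetical simulated data) confirming the coefficient correspondences implied by the two expansions above:

{{{#!python
# Check: B_a|a,b = B_a|a,a+b + B_(a+b)|a,a+b   and   B_b|a,b = B_(a+b)|a,a+b
#        B_b|a,b = B_b|b,a+b + B_(a+b)|b,a+b   and   B_a|a,b = B_(a+b)|b,a+b
import numpy as np

rng = np.random.default_rng(3)
n = 100
a, b = rng.normal(size=n), rng.normal(size=n)
y = 0.7 * a - 0.4 * b + rng.normal(size=n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, Ba, Bb = ols(y, a, b)            # y on a and b
_, Ba_s, Bs_a = ols(y, a, a + b)    # y on a and a+b
_, Bb_s, Bs_b = ols(y, b, a + b)    # y on b and a+b

print(np.isclose(Ba, Ba_s + Bs_a), np.isclose(Bb, Bs_a))   # True True
print(np.isclose(Bb, Bb_s + Bs_b), np.isclose(Ba, Bs_b))   # True True
}}}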

It is also true that, unless a and b are highly correlated so that a $$\approx \pm$$b,

$$B_\text{a}a + B_\text{b}b \ne B_\text{a+b}(a+b)$$

because $$B_\text{a}$$ and $$B_\text{b}$$ will not in general be equal.

One can also interpret this as saying that knowing the sum a+b does not tell you the numbers (a and b) that were added together to give it if these numbers are weighted unequally (so that $$B_\text{a}$$ is not equal to $$B_\text{b}$$).
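
For instance, the following sketch (hypothetical data with deliberately unequal weights on a and b) shows that a regression on the sum a+b alone fits less well than a regression on a and b separately:

{{{#!python
# A single coefficient on a+b cannot reproduce unequal weights on a and b
import numpy as np

rng = np.random.default_rng(5)
n = 100
a, b = rng.normal(size=n), rng.normal(size=n)
y = 0.9 * a + 0.1 * b + rng.normal(size=n)   # a and b weighted unequally

def r2(y, *cols):
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

print(r2(y, a + b) < r2(y, a, b))   # True: the sum alone loses information
}}}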

If a and b are highly correlated then the relationships between y and a and between y and b are nearly equal, and the relationship between y and a+b will then equal the relationship between y and either a or b.

Example

If b = -3a then, for a Pearson correlation r, r(a,y) = -r(b,y). Also r(a+b,y) = r(b,y) = -r(a,y), since the b values are larger in absolute value than those of a, so the sum has the same sign of relationship with y as the b values.
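
A quick numerical check of this example (with hypothetical simulated a and y):

{{{#!python
# Check of the b = -3a case
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(size=50)
b = -3 * a                       # b is a perfect negative multiple of a
y = 2 * a + rng.normal(size=50)

r = lambda u, v: np.corrcoef(u, v)[0, 1]
print(r(a, y), r(b, y), r(a + b, y))   # r(b,y) and r(a+b,y) both equal -r(a,y)
}}}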

FAQ/ab-a (last edited 2018-03-12 17:22:22 by PeterWatson)