Diff for "FAQ/dummyCor" - CBU statistics Wiki

Differences between revisions 13 and 14

Regression diagnostics for categorical variables

Some people are uneasy about expressing correlations between dichotomous variables and a continuous variable in a regression, for example as input for multicollinearity diagnostics.

When we have a dichotomous variable (or dummy variable) in a simple regression, its correlation with the outcome measure is termed a point-biserial correlation. Rosenthal (1994) shows that this correlation is related both to the F and t statistics and to the difference in group means expressed in units of the pooled group standard deviation.

In particular, for the former two,

$$r(pb) = \sqrt{ \frac{t^2}{t^2 + df} }$$

and

F(1,df) = [ df(Residual) r(pb) r(pb) ] / [ (1-r(pb)r(pb) ) ]
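The two relations above can be checked numerically. The following is a minimal sketch, assuming a hypothetical t statistic of 2.5 on 30 residual degrees of freedom; the function names are illustrative and not part of the original page. Note that the square root returns the magnitude of r(pb) only, so the sign has to be taken from the direction of the group difference.

```python
import math


def r_pb_from_t(t, df):
    """Magnitude of the point-biserial correlation from t and its residual df."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def f_from_r_pb(r_pb, df):
    """F(1, df) from the point-biserial correlation and the residual df."""
    return df * r_pb ** 2 / (1 - r_pb ** 2)


# Hypothetical values chosen only for illustration.
t_value, df_residual = 2.5, 30

r_pb = r_pb_from_t(t_value, df_residual)
f_value = f_from_r_pb(r_pb, df_residual)

# With one numerator degree of freedom, F recovers t squared
# (up to floating-point error).
print(round(r_pb, 4), round(f_value, 4), round(t_value ** 2, 4))
```

Converting back and forth in this way is a quick consistency check when a paper reports only one of t, F or r.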

For the more general case of a categorical predictor representing, say, k groups, the squared semi-partial correlation of the categorical predictor with the outcome, Rsq, is related to the F value by

F(k-1,df) = [df(Residual)/(k-1)] [Rsq /(1-Rsq)]

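The semi-partial R-squared for group, Rsq(group), is defined as

Rsq(group) = Rsq(all predictors) - Rsq(removing group)

When group is the only predictor, Rsq(removing group) is zero and the F formula holds exactly as written. When other predictors are also in the model, the numerator uses Rsq(group) and the denominator term (1-Rsq) should be read as 1 - Rsq(all predictors).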
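As a worked illustration of the semi-partial R-squared and its F ratio, here is a minimal sketch with hypothetical R-squared values for a three-group predictor. The function names and numbers are assumptions made for the example. The denominator uses the R-squared of the full model, which coincides with Rsq(group) when group is the only predictor.

```python
def semi_partial_rsq(rsq_all, rsq_without_group):
    """Rsq(group) = Rsq(all predictors) - Rsq(removing group)."""
    return rsq_all - rsq_without_group


def f_for_group(rsq_group, rsq_all, k, df_residual):
    """F(k-1, df) for a k-group predictor, with residual df from the full model."""
    return (df_residual / (k - 1)) * (rsq_group / (1 - rsq_all))


# Hypothetical values: full model explains 40% of the variance, the model
# without the group dummies explains 25%, and there are 3 groups.
rsq_all, rsq_reduced = 0.40, 0.25
k, df_residual = 3, 50

rsq_group = semi_partial_rsq(rsq_all, rsq_reduced)
f_group = f_for_group(rsq_group, rsq_all, k, df_residual)
print(round(rsq_group, 4), round(f_group, 4))

# Single-predictor case: Rsq(group) equals the full-model R-squared, so the
# expression reduces to [df/(k-1)] [Rsq/(1-Rsq)].
f_single = f_for_group(0.2, 0.2, 2, 30)
print(round(f_single, 4))
```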

Semi-partial R-squareds and F ratios are routinely used as indicators of predictive strength in simple and multiple regressions. Cohen and Cohen (1983), for example, illustrate semi-partial correlations in a four-predictor multiple regression involving sex.

As an alternative to the above, the stepAIC function in the R package MASS can be used to select the best fitting models by comparing their Akaike Information Criteria (AICs), as described by Venables and Ripley (2002).
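The comparison underlying AIC-based selection can be sketched as follows. This is not the stepwise procedure itself, only the criterion it compares. It assumes least-squares fits, for which AIC can be written up to an additive constant as n log(RSS/n) + 2p, where p is the number of estimated coefficients. The model labels, residual sums of squares and sample size are hypothetical.

```python
import math


def aic_least_squares(rss, n, n_params):
    """AIC, up to an additive constant, for a least-squares fit."""
    return n * math.log(rss / n) + 2 * n_params


n_obs = 60  # hypothetical sample size

# Hypothetical (residual sum of squares, number of coefficients) per model.
candidates = {
    "outcome ~ 1": (120.0, 1),
    "outcome ~ group": (100.0, 3),
    "outcome ~ group + age": (98.5, 4),
}

aics = {
    name: aic_least_squares(rss, n_obs, p)
    for name, (rss, p) in candidates.items()
}

# The preferred model is the one with the smallest AIC.
best_model = min(aics, key=aics.get)
for name, value in aics.items():
    print(f"{name}: {value:.2f}")
print("lowest AIC:", best_model)
```

In this example the extra covariate lowers the residual sum of squares only slightly, so the penalty of 2 per coefficient outweighs the gain and the model with group alone has the lowest AIC.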

References

Cohen, J. and Cohen, P. (1983) Applied multiple regression/correlation analysis for the behavioral sciences. Second edition. London: Lawrence Erlbaum.

Rosenthal, R. (1994) Parametric measures of effect size. In H. Cooper and L. V. Hedges (Eds.) The handbook of research synthesis. New York: Russell Sage Foundation.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. New York: Springer.

-  ⇤ ← Revision 13 as of 2013-04-11 08:55:19 → 
  Size: 2189
  Editor: PeterWatson
  Comment:
+   ← Revision 14 as of 2013-08-20 15:39:17 → ⇥
  Size: 2118
  Editor: PeterWatson
  Comment:
 Line 16:
-$$r_text{pb} = \mbox{the square root of } [ \mbox{t}^2^ / (t^2^ + df) ] $$
+$$r(pb) = \mbox{the square root of } [ \mbox{t}^2^ / (t^2^ + df) ] $$
 Line 20:
-$$F_text{1,df} = \mbox{df(Residual)}r_text{pb}$$ $$r_text{pb} / (1-r_text{pb}r_text{pb}) $$
+F(1,df) = [ df(Residual)  r(pb) r(pb) ] / [ (1-r(pb)r(pb) ) ]
-Line 28:
+Line 27:
-$$F_text{k-1,df} = [\mbox{df(Residual)/(k-1)}]$$ $$ \mbox{Rsq} /(1-\mbox{Rsq}) $$
+F(k-1,df) = [df(Residual)/(k-1)] [Rsq /(1-Rsq)]
