== Using collinearity diagnostics on dummy variables ==

Some analysts are uneasy about computing correlations
between dichotomous variables and a continuous variable in a regression,
for example as input to multicollinearity diagnostics.

When we have a dichotomous variable (or dummy variable) in
a simple regression, its correlation with the outcome measure
is termed a point-biserial correlation. Rosenthal (1994)
shows that this correlation is related both to the F and t statistics
and to the difference in group means expressed in units of
the pooled group standard deviation.

In particular, for the t and F statistics, with df denoting the residual degrees of freedom,
\[
r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}
\]

and

\[
F_{1,df} = df(\mathrm{Residual}) \, \frac{r_{pb}^2}{1-r_{pb}^2}
\]
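
As a rough numerical check of the two relations above, the following sketch computes the point-biserial correlation directly as a Pearson correlation between a 0/1 dummy and the outcome, and compares it with the value recovered from a pooled-variance t statistic. The data and the helper names `pearson_r` and `pooled_t` are made up for illustration and do not come from Rosenthal (1994).

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(g0, g1):
    """Two-sample t statistic (pooled variance) and its residual df."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = mean(g0), mean(g1)
    ss0 = sum((v - m0) ** 2 for v in g0)
    ss1 = sum((v - m1) ** 2 for v in g1)
    df = n0 + n1 - 2
    pooled_var = (ss0 + ss1) / df
    t = (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return t, df

# Hypothetical outcome scores for two groups (illustrative values only).
group0 = [4.1, 5.0, 3.8, 4.6, 5.2]
group1 = [5.9, 6.3, 5.1, 6.8, 6.0, 5.5]

dummy = [0] * len(group0) + [1] * len(group1)
outcome = group0 + group1

r_pb = pearson_r(dummy, outcome)                # point-biserial correlation
t, df = pooled_t(group0, group1)
r_from_t = math.sqrt(t * t / (t * t + df))      # first relation
F_from_r = df * r_pb ** 2 / (1 - r_pb ** 2)     # second relation

print(r_pb, r_from_t)   # these two should agree (up to sign of r_pb)
print(F_from_r, t * t)  # with one numerator df, F equals t squared
```

The square root in the first relation returns the magnitude of the correlation only; the sign has to be taken from the direction of the group difference.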

For the more general case of a categorical predictor representing k groups,
let Rsq denote the square of the semi-partial correlation of
the categorical predictor with the outcome. It is related to the F value by

\[
F_{k-1,df} = \frac{df(\mathrm{Residual})}{k-1} \, \frac{\mathrm{Rsq}}{1-\mathrm{Rsq}}
\]
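
The k-group relation can be checked in the simplest setting, a one-way layout in which group is the only predictor, so that Rsq is the between-group share of the total sum of squares. The sketch below uses made-up data and computes F both from the usual ANOVA mean squares and from Rsq; it is an illustration under those assumptions, not a general routine.

```python
from statistics import mean

# Hypothetical outcome scores for k = 3 groups (illustrative values only).
groups = [
    [4.1, 5.0, 3.8, 4.6],
    [5.9, 6.3, 5.1, 6.8],
    [7.2, 6.5, 7.9, 7.0],
]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = mean([v for g in groups for v in g])

# Sums of squares for the one-way layout.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
ss_total = ss_between + ss_within

rsq = ss_between / ss_total   # R-squared for group as the only predictor
df_res = n - k                # residual degrees of freedom

F_anova = (ss_between / (k - 1)) / (ss_within / df_res)
F_from_rsq = df_res / (k - 1) * rsq / (1 - rsq)

print(F_anova, F_from_rsq)    # the two routes should give the same F
```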


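Semi-partial R-squared for group, Rsq(group), is defined as

Rsq(group) = Rsq(all predictors) - Rsq(removing group)

When group is the only predictor, Rsq(removing group) is zero and Rsq(group)
is the model R-squared, so the F relation above holds exactly. With other
predictors in the model, the denominator 1 - Rsq becomes 1 - Rsq(all predictors).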
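
The definition of Rsq(group) can be illustrated by fitting the regression twice, once with and once without the group dummies, and differencing the two R-squared values. The sketch below is a minimal pure-Python version under stated assumptions: the data, the covariate `age`, and the helpers `solve` and `r_squared` are all hypothetical, and the least-squares fit uses plain normal equations, which is adequate only for small, well-conditioned examples. The F ratio computed at the end uses 1 - Rsq(all predictors) in the denominator, the usual form when other predictors are present.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(columns, y):
    """R-squared of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: one continuous covariate and k = 3 groups coded
# as two dummy variables (the first group is the reference category).
y   = [10.2, 11.1, 13.0, 12.5, 13.9, 15.8, 15.1, 16.0, 14.2]
age = [23, 31, 45, 27, 39, 52, 34, 41, 29]
d1  = [0, 0, 0, 1, 1, 1, 0, 0, 0]
d2  = [0, 0, 0, 0, 0, 0, 1, 1, 1]

rsq_all = r_squared([age, d1, d2], y)    # Rsq(all predictors)
rsq_without = r_squared([age], y)        # Rsq(removing group)
rsq_group = rsq_all - rsq_without        # semi-partial Rsq for group

k = 3
df_res = len(y) - 3 - 1                  # n minus 3 predictors minus intercept
F_group = (rsq_group / (k - 1)) / ((1 - rsq_all) / df_res)

print(rsq_all, rsq_without, rsq_group, F_group)
```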


Semi-partial R-squared values and F ratios are routinely used as indicators
of predictive strength in simple and multiple regressions.
Cohen and Cohen (1983), for example, illustrate semi-partial
correlations in a four-predictor multiple regression that includes sex as a predictor.


References

Cohen, J. and Cohen, P. (1983) Applied multiple regression/correlation
analysis for the behavioral sciences. Second edition. London: Lawrence Erlbaum.

Rosenthal, R. (1994) Parametric measures of effect size. In H. Cooper
and L.V. Hedges (Eds.) The handbook of research synthesis.
New York: Russell Sage Foundation.