## Regression diagnostics for categorical variables

It is not always obvious how to express the correlation between a dichotomous variable and a continuous variable in a regression, for example as input to multicollinearity diagnostics.

When a dichotomous (dummy) variable is the predictor in a simple regression, its correlation with the outcome measure is termed a point-biserial correlation. Rosenthal (1994) shows that this correlation is related both to the F and t statistics and to the difference in group means expressed in units of the pooled group standard deviation.

In particular, for the former two,

$$r_{pb} = \sqrt{\frac{t^{2}}{t^{2} + df}}$$

and

$$F(1, df) = \frac{df \, r_{pb}^{2}}{1 - r_{pb}^{2}}$$

where $df$ is the residual degrees of freedom.
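The two identities above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the data are made up for illustration, and the helper names `point_biserial` and `pooled_t` are hypothetical, not taken from Rosenthal (1994).

```python
import math
from statistics import mean


def point_biserial(y, g):
    """Pearson correlation between an outcome y and a 0/1 indicator g."""
    my, mg = mean(y), mean(g)
    sxy = sum((a - my) * (b - mg) for a, b in zip(y, g))
    syy = sum((a - my) ** 2 for a in y)
    sgg = sum((b - mg) ** 2 for b in g)
    return sxy / math.sqrt(syy * sgg)


def pooled_t(y, g):
    """Two-sample t statistic with pooled variance, plus its residual df."""
    y1 = [a for a, b in zip(y, g) if b == 1]
    y0 = [a for a, b in zip(y, g) if b == 0]
    n1, n0 = len(y1), len(y0)
    m1, m0 = mean(y1), mean(y0)
    df = n1 + n0 - 2
    ss_within = sum((a - m1) ** 2 for a in y1) + sum((a - m0) ** 2 for a in y0)
    pooled_var = ss_within / df
    t = (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return t, df


# Made-up illustrative data: outcome y and a dummy-coded group g.
y = [4.1, 5.0, 3.8, 4.6, 6.2, 5.9, 6.8, 5.5]
g = [0, 0, 0, 0, 1, 1, 1, 1]

r_pb = point_biserial(y, g)
t, df = pooled_t(y, g)

# r_pb recovered from t (the square root loses the sign, so compare magnitudes).
r_from_t = math.sqrt(t ** 2 / (t ** 2 + df))

# F(1, df) computed from r_pb; with one numerator df this equals t squared.
F_from_r = df * r_pb ** 2 / (1 - r_pb ** 2)

print(f"r_pb = {r_pb:.4f}, r from t = {r_from_t:.4f}")
print(f"F from r_pb = {F_from_r:.4f}, t squared = {t ** 2:.4f}")
```

The formula for $r_{pb}$ in terms of $t$ gives only the magnitude of the correlation; its sign follows the direction of the difference in group means.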

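For the more general case of a categorical predictor representing $k$ groups, the F value is related to $R^{2}$, the squared semi-partial correlation of the categorical predictor with the outcome, by

$$F(k-1, df) = \frac{df}{k-1} \cdot \frac{R^{2}}{1 - R^{2}}$$

where $df$ is the residual degrees of freedom. This form is exact when the categorical predictor is the only predictor in the model; when other predictors are present, the $1 - R^{2}$ in the denominator is one minus the $R^{2}$ of the model containing all predictors.

The semi-partial R-squared for group is defined as

$$R^{2}_{\mbox{group}} = R^{2}_{\mbox{all predictors}} - R^{2}_{\mbox{removing group}}$$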
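As an illustration of the relation between F and R-squared for a categorical predictor, here is a small Python sketch using only the standard library. It assumes the simplest case, where group is the only predictor, so that R-squared after removing group is zero and the semi-partial R-squared equals the model R-squared. The data and the function name `one_way_r2_and_f` are hypothetical.

```python
from statistics import mean


def one_way_r2_and_f(groups):
    """R-squared and F for a single categorical predictor with k groups.

    Returns R-squared, F computed from mean squares, F computed from
    R-squared, and the residual degrees of freedom.
    """
    all_y = [v for grp in groups for v in grp]
    grand_mean = mean(all_y)
    n, k = len(all_y), len(groups)

    ss_total = sum((v - grand_mean) ** 2 for v in all_y)
    ss_within = 0.0
    for grp in groups:
        m = mean(grp)
        ss_within += sum((v - m) ** 2 for v in grp)
    ss_between = ss_total - ss_within

    df_resid = n - k
    r2 = ss_between / ss_total

    # F computed directly as mean square between over mean square within.
    f_direct = (ss_between / (k - 1)) / (ss_within / df_resid)
    # F computed from R-squared: [df / (k - 1)] * [R^2 / (1 - R^2)].
    f_from_r2 = (df_resid / (k - 1)) * (r2 / (1 - r2))
    return r2, f_direct, f_from_r2, df_resid


# Made-up illustrative data for k = 3 groups.
groups = [
    [4.1, 5.0, 3.8, 4.6],
    [6.2, 5.9, 6.8, 5.5],
    [8.0, 7.4, 8.3, 7.9],
]

r2, f_direct, f_from_r2, df_resid = one_way_r2_and_f(groups)
print(f"R-squared = {r2:.4f}, residual df = {df_resid}")
print(f"F from mean squares = {f_direct:.4f}, F from R-squared = {f_from_r2:.4f}")
```

The two F values agree because both are the ratio of explained to unexplained variation, each divided by its degrees of freedom.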

Semi-partial R-squareds and F ratios are routinely used as indicators of predictive strength in simple and multiple regressions. Cohen and Cohen (1983), for instance, illustrate semi-partial correlations in a four-predictor multiple regression involving sex.

As an alternative to the above, the stepAIC function in the R package MASS can be used to select the best-fitting model by comparing the Akaike Information Criterion (AIC) of candidate models, as described by Venables and Ripley (2002).
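The idea of comparing models by AIC can be sketched as follows. This is not a reimplementation of stepAIC: it is a minimal Python illustration, on made-up data, that compares an intercept-only model with a model containing a three-level categorical predictor. It assumes the usual least-squares form of AIC up to an additive constant, $n \log(RSS/n) + 2p$, where $p$ counts the fitted mean parameters; the function name `gaussian_aic` is hypothetical.

```python
import math
from statistics import mean


def gaussian_aic(rss, n, n_params):
    """AIC for a least-squares fit, up to an additive constant.

    The constant is the same for every model fitted to the same data,
    so it does not affect which model has the lower AIC.
    """
    return n * math.log(rss / n) + 2 * n_params


# Made-up illustrative data for k = 3 groups.
groups = [
    [4.1, 5.0, 3.8, 4.6],
    [6.2, 5.9, 6.8, 5.5],
    [8.0, 7.4, 8.3, 7.9],
]
all_y = [v for grp in groups for v in grp]
n, k = len(all_y), len(groups)

# Intercept-only model: residuals are deviations from the grand mean.
grand_mean = mean(all_y)
rss_null = sum((v - grand_mean) ** 2 for v in all_y)

# Model with the categorical predictor: residuals are deviations
# from each group's own mean.
rss_group = 0.0
for grp in groups:
    m = mean(grp)
    rss_group += sum((v - m) ** 2 for v in grp)

aic_null = gaussian_aic(rss_null, n, 1)   # one mean parameter
aic_group = gaussian_aic(rss_group, n, k)  # k group means

print(f"AIC (intercept only) = {aic_null:.2f}")
print(f"AIC (with group)     = {aic_group:.2f}")
print("Preferred model:", "with group" if aic_group < aic_null else "intercept only")
```

The model with the lower AIC is preferred; the penalty of 2 per parameter means the categorical predictor is retained only if it reduces the residual sum of squares enough to offset its extra parameters.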

**References**

Cohen, J. and Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Second edition. Lawrence Erlbaum: London.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper and L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. 4th edition. New York: Springer.