FAQ/logrrsq - CBU statistics Wiki

How do I choose between different logistic regression models?

In multiple linear regression we have R-squared to summarise the fit of a model and thereby inform the choice between models.

SPSS produces two R-squared measures (Nagelkerke and Cox-Snell) for binary logistic regression, but their use is controversial (see below). They compare the fit of the model with the predictors to one without them (much like fit indices in structural equation models, for those familiar with that area).

What follows is an extract of an e-mail from Dietrich Alte on the choice of R-squared in logistic regression. The journal containing the recommended reference (see below) is available from the University library.

Menard (2000), referenced below, suggests using

$$R^2 = \frac{\mbox{difference between the -2 log likelihoods of the null model and the model with the covariates of interest}}{\mbox{-2 log likelihood of the null model}}$$

where the null model is a binary logistic regression with a single predictor (covariate) consisting of a column of 1s. To fit this particular model in SPSS you first need to click the options button and ask for no constant to be included in the regression. SPSS and other packages routinely output -2 log likelihoods, which are indices of model goodness-of-fit.
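
The same quantity can be computed directly from the -2 log likelihoods that any package reports. Below is a minimal sketch (not from the original e-mail) using Python with statsmodels; the data, variable names and coefficients are invented purely for illustration.

import numpy as np
import statsmodels.api as sm

# Simulated data purely for illustration: 200 cases, two covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_coefs = np.array([1.0, -0.5])
p = 1.0 / (1.0 + np.exp(-(0.5 + X @ true_coefs)))
y = rng.binomial(1, p)

# Model with the covariates of interest (plus a constant).
fitted = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
# Null (intercept-only) model: a single column of 1s as the only predictor.
null = sm.Logit(y, np.ones((len(y), 1))).fit(disp=0)

dev_fitted = -2 * fitted.llf   # -2 log likelihood of the fitted model
dev_null = -2 * null.llf       # -2 log likelihood of the null model

# Menard (2000): proportional reduction in the -2 log likelihood.
r2_menard = (dev_null - dev_fitted) / dev_null
print(f"Menard R-squared: {r2_menard:.3f}")

# The same quantity equals 1 - fitted.llf / fitted.llnull, which statsmodels
# reports as fitted.prsquared (McFadden's pseudo R-squared).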

The first measure (Cox-Snell) is used to assess models where all the independent variables are continuous and the second (Nagelkerke) is used where there are one or more binary independent variables in the model.

These statistics have at their core the ratio of the likelihood function of the fitted model to the likelihood function of an intercept-only model. What they are actually measuring is the proportional change in the likelihood function of the specified model compared with no model at all.
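
For reference (these are the standard definitions, not part of the original e-mail), writing L0 for the likelihood of the intercept-only (null) model, LM for the likelihood of the fitted model and n for the sample size:

$$R^2_{CS} = 1 - \left( \frac{L_0}{L_M} \right)^{2/n} \qquad \qquad R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{2/n}}$$

The Nagelkerke measure simply rescales the Cox-Snell measure so that its maximum possible value is 1.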

I don't like these statistics very much, and like them even less because their names suggest they are analogous to the "variance explained" measures used in linear models, but they are actually measuring something else.

There was a very good article in the Feb 2000 issue of The American Statistician by Scott Menard called "Coefficients of Determination for Multiple Logistic Regression Analysis," which may be of use.

Reference

Scott Menard, “Coefficients of Determination for Multiple Logistic Regression Analysis,” The American Statistician 54:17-24, 2000.
