How do I summarise a fit for a logistic regression model?
The fit indices described here rest on the idea that a model with a higher log likelihood is providing a better fit.
Menard (2000) compares various R-squared measures for binary logistic regressions and concludes that the log-likelihood ratio chi-square is the most appropriate:
R-squared (likelihood ratio) = 1 - ln(L[m]) / ln(L) = [ln(L) - ln(L[m])] / ln(L) = 1 - [-2 ln(L[m])] / [-2 ln(L)]
where L[m] and L are the likelihoods for the model with predictors and the model containing only the intercept, respectively. The last form uses -2 times the log likelihood, which is what SPSS (and other software) outputs, rather than the log likelihood itself. This R-squared is also known as McFadden's R-squared. The attached note shows that the difference in -2 log likelihoods (-2 Log(intercept only model) - -2 Log(full model)) as used by McFadden is equivalent to the difference in deviances, deviance(intercept only model) - deviance(full model).
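As a check on the arithmetic, McFadden's R-squared can be computed by hand from the two -2 log likelihoods. A minimal Python sketch, using hypothetical values (the two -2 log likelihoods below are assumptions, not taken from any real dataset):

```python
from math import isclose

# Hypothetical -2 log likelihoods, as printed by SPSS:
neg2_ll_null = 27.726   # intercept-only model, -2 ln(L)
neg2_ll_model = 23.768  # model with predictors, -2 ln(L[m])

# McFadden's (likelihood ratio) R-squared: 1 - [-2 ln(L[m])] / [-2 ln(L)]
r2_mcfadden = 1 - neg2_ll_model / neg2_ll_null

# Equivalently, the difference in deviances divided by the null deviance
r2_from_deviances = (neg2_ll_null - neg2_ll_model) / neg2_ll_null

assert isclose(r2_mcfadden, r2_from_deviances)
print(round(r2_mcfadden, 3))
```

The two forms agree because subtracting the ratio from 1 and dividing the difference by the null deviance are algebraically identical.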
The statistical significance of the predictors may be jointly assessed using twice the change in the log likelihoods in the above expression. This equals 2(ln(L[m]) - ln(L)), which is distributed as chi-square(p) if the p predictors jointly have no influence on group membership. This chi-square is computed and outputted by most software which performs binary logistic regressions. In SPSS, for example, this term is the chi-square statistic produced immediately after the predictors are added to the model under the heading 'Block 1 Method=Enter'. For example, running a logistic regression in SPSS to assess the joint importance of two predictors, p1 and p2, with the syntax below
LOGISTIC REGRESSION y /METHOD = ENTER p1 p2 /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
we obtain the likelihood ratio chi-square in the output which is of form:
BLOCK 1: METHOD=ENTER

Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step         3.958    2   .138
         Block        3.958    2   .138
         Model        3.958    2   .138
This may be expressed as chi-square(2) = 3.96, p = 0.14 indicating that together the two predictors, p1 and p2, do not have a statistically significant association with group, y.
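As it happens, the Sig. value here can be verified by hand: for 2 degrees of freedom the chi-square survival function reduces to the closed form exp(-x/2). A minimal Python sketch:

```python
from math import exp

chi_sq = 3.958  # omnibus test chi-square from the output above
# With df = 2 the chi-square distribution is exponential with mean 2,
# so P(X > x) = exp(-x/2); this reproduces the Sig. column.
p_value = exp(-chi_sq / 2)
print(round(p_value, 3))  # → 0.138
```

For other degrees of freedom there is no such simple closed form and a chi-square function (e.g. scipy.stats.chi2.sf in Python) would be needed instead.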
-2L[m] is outputted by SPSS as '-2 Log Likelihood' in the 'Model Summary' box, and -2L can be obtained either by adding the above omnibus test chi-square to -2L[m] or, directly, by creating a column of ones (constant) and fitting this as the sole predictor in a logistic regression omitting the intercept (using the SPSS syntax below)
COMPUTE CONSTANT=1. EXE. LOGISTIC REGRESSION y /METHOD = ENTER constant /ORIGIN /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
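A further cross-check on either route: for a binary outcome the intercept-only -2 log likelihood has the closed form -2[n1 ln(n1/n) + n0 ln(n0/n)], since the intercept-only model fits every subject the observed group proportion. A minimal Python sketch with hypothetical group counts:

```python
from math import log

# Hypothetical binary outcome: 12 subjects in group 1, 8 in group 0
n1, n0 = 12, 8
n = n1 + n0
p_hat = n1 / n  # the intercept-only model's fitted probability for everyone

# -2 log likelihood (= deviance) of the intercept-only model
neg2_ll_null = -2 * (n1 * log(p_hat) + n0 * log(1 - p_hat))
print(round(neg2_ll_null, 3))
```

This gives -2L without a second SPSS run, which is handy for checking the value obtained by adding the omnibus chi-square to -2L[m].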
The above (likelihood ratio) R-squared estimate is also advocated by Train (2003).
Hosmer and Lemeshow (2000) note that the likelihood ratio R-squared does not attain its maximum value of 1 when two or more subjects have the same values of their predictor variables. In this case they propose the following modification of the likelihood ratio R-squared:
[ln(L) - ln(L[m])] / [ln(L) - ln(L[m]) - 0.5D] = [-2 ln(L) - (-2 ln(L[m]))] / [-2 ln(L) - (-2 ln(L[m])) + D] = [omnibus test chi-square for the m predictors] / [omnibus test chi-square for the m predictors + D]
where D is a quantity known as the deviance, which represents the overall lack of fit of the model (the total sum of squared deviations of the subjects' predicted group probabilities from their observed groups). The deviance can be worked out using the subjects' deviance residuals, which can be outputted in SPSS via the /SAVE DEV subcommand as below.
LOGISTIC REGRESSION y /METHOD = ENTER p1 p2 /SAVE = DEV /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) . COMPUTE D2 = DEV_1*DEV_1. EXE. DESCRIPTIVES VARIABLES=D2 /STATISTICS=MEAN STDDEV MIN MAX .
D is then equal to the number of observations multiplied by the mean of D2, the squared subject deviance residuals. Following the above formula, we can then use D together with the outputted omnibus test chi-square to obtain an adjusted likelihood ratio R-squared for cases where two or more subjects have identical predictor variable values. Note that McFadden's R-squared is now outputted by default by the SPSS multinomial logistic regression procedure, which fits models to outcomes of TWO or more groups (i.e. including the binary case).
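The arithmetic following the DESCRIPTIVES output can be sketched in Python. The squared deviance residuals and the omnibus chi-square below are hypothetical placeholders for the values SPSS would produce:

```python
# Hypothetical squared deviance residuals (the D2 = DEV_1*DEV_1 values above)
dev_sq = [0.9, 1.4, 0.3, 2.1, 0.7, 1.1, 0.5, 1.6]

# D = number of observations * mean of squared deviance residuals,
# which is simply their sum
D = len(dev_sq) * (sum(dev_sq) / len(dev_sq))

chi_sq = 3.958  # hypothetical omnibus test chi-square for the m predictors

# Hosmer and Lemeshow's modified likelihood ratio R-squared
r2_adjusted = chi_sq / (chi_sq + D)
print(round(r2_adjusted, 3))
```

In practice D would come from multiplying the DESCRIPTIVES mean of D2 by the sample size rather than from a hand-typed list.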
Hosmer and Lemeshow further warn that R-squared values tend to be lower in logistic regression than the usual linear regression for continuous outcomes and, consequently, can give misleadingly low indications of the fit of models deemed good using other fit criteria such as area under ROC curve or percentage correctly classified. They, therefore, recommend using R-squared to compare competing models rather than as a stand-alone effect size.
An alternative is to use the concordance (c-statistic), which is equivalent to the area under the ROC curve (AUC) (described here). The concordance coefficient (c) is the proportion of pairs of individuals from different groups for which the individual in the numerically higher group (coded 1) has the higher predicted probability. A c-statistic of 0.5 represents chance (see Agresti, 2013, p.224). The AUC, when applied to two intervals over time (e.g. pre and post intervention), is also equivalent to the Nonoverlap of All Pairs (NAP) statistic (see here and Parker and Vannest, 2009).
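A minimal Python sketch of the concordance calculation, using hypothetical predicted probabilities (ties are conventionally counted as one half):

```python
# Hypothetical predicted probabilities from a fitted logistic regression
probs_group1 = [0.8, 0.6, 0.55]  # subjects observed in group 1
probs_group0 = [0.3, 0.6, 0.4]   # subjects observed in group 0

# c-statistic: over all (group 1, group 0) pairs, the proportion where the
# group 1 subject has the higher predicted probability; ties count as 0.5
pairs = [(p1, p0) for p1 in probs_group1 for p0 in probs_group0]
c = sum(1.0 if p1 > p0 else 0.5 if p1 == p0 else 0.0
        for p1, p0 in pairs) / len(pairs)
print(c)  # ≈ 0.833, somewhat better than the chance value of 0.5
```

With real data the same number can be obtained from standard AUC routines (e.g. roc_auc_score in scikit-learn), since c and AUC coincide.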
Note: the mnrfit function can also be used in MATLAB but does not directly give the deviance (lack of fit) for the intercept only model (which allocates individuals to the larger/largest group). If one has equal sized groups the deviance for the intercept only model can be obtained by fitting a variable which takes identical values in all of the groups and obtaining the deviance (lack of fit) for this model. The difference in deviances between the intercept only model and the model containing variables of interest (which always has a smaller or equal deviance to the intercept only model) divided by the deviance of the intercept only model will give McFadden's R-squared. You can more generally also obtain the deviance of the intercept only model in MATLAB by fitting a model including a single predictor variable which takes the value of zero for each individual.
Three or more groups
This section describes fit criteria which can be applied more generally to any logistic regression (i.e. with two or more groups). When we have three or more groups we use multinomial logistic regression, which is available as a separate procedure in SPSS.
In addition to the earlier diagnostics you can also compare models by computing Akaike's and the Bayesian information criteria (AIC, BIC) using the outputted log likelihood and the number of model parameters. These are available for the multinomial logistic regression by clicking on the Statistics button in the GUI window and ticking the box next to 'Information criteria'. Two are produced and are defined below:
Akaike's Information Criterion (AIC) equals -2*log-likelihood + 2*k, where k is the number of estimated parameters (= the number of regression terms in the model = the number of intercepts + the number of predictor regression coefficients). For example, with one covariate predictor and three groups, four terms are estimated: the two intercepts and the two covariate regression coefficients.
The Bayesian Information Criterion (BIC) is similar to AIC and equals -2*log-likelihood + k*log(n), where k is the number of estimated parameters and n is the sample size. The BIC is also known as the Schwarz criterion. See also Algorithms > GENLIN Algorithms > Model Testing (generalized linear models algorithms) > Goodness-of-Fit Statistics (generalized linear models algorithms) under Help > Algorithms in SPSS for these same (and other) information criteria definitions as used in SPSS.
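Both criteria are easy to reproduce from the outputted log likelihood. A Python sketch using the three-group, one-covariate parameter count described above, with a hypothetical -2 log likelihood and sample size:

```python
from math import log

# Hypothetical fitted multinomial model: three groups, one covariate
neg2_ll = 23.768  # -2 log likelihood as printed by SPSS (assumed value)
k = 4             # two intercepts + two covariate coefficients
n = 20            # sample size (assumed)

aic = neg2_ll + 2 * k         # Akaike's Information Criterion
bic = neg2_ll + k * log(n)    # Bayesian (Schwarz) Information Criterion
print(round(aic, 3), round(bic, 3))
```

Note that BIC penalises extra parameters more heavily than AIC whenever log(n) > 2, i.e. for samples of 8 or more.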
The idea is that a good model has BOTH a good fit (a high log likelihood) AND relatively few parameters. Information criteria combine these two desirable features into one number, and we choose the model with the lowest value of an information criterion. Note that Coxe, West and Aiken (2009) mention that information criteria may be used when models are not nested, e.g. when there is overdispersion, whereas R-squared cannot (see also here).
SPSS outputs both these information criteria (requested via the Statistics button) as well as McFadden's R-squared described earlier for the binary case; McFadden's R-squared is outputted by default (as one of the pseudo-R-squareds requested by clicking on the Statistics button) for the multinomial logistic regression model.
Agresti, A. (2013). Categorical data analysis. Third edition. Wiley: New York.
Coxe, S., West, S.G. and Aiken, L.S. (2009). The analysis of count data: a gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91(2), 121-136.
Field, A. (2013). Discovering statistics using IBM SPSS Statistics. Fourth edition. Sage: London. On pages 764-766 the three R-squareds outputted by SPSS are defined.
Hosmer, D.W. and Lemeshow, S. (2000). Applied logistic regression. Second edition. Wiley: New York. IN CBSU LIBRARY.
Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician, 54, 17-24.
Parker, R.I. and Vannest, K.J. (2009). An improved effect size for single case research: non-overlap of all pairs (NAP). Behavior Therapy, 40(4), 357-367.
Train, K. (2003). Discrete choice methods with simulation. Cambridge University Press: Cambridge.