How do I summarise a fit for a logistic regression model?

Menard (2000) compares various R-squared measures for binary logistic regression and concludes that the R-squared based on the likelihood ratio is the most appropriate:

$$ \mbox{R-squared (Likelihood ratio)} = 1 - \frac{\ln(L[m])}{\ln(L[0])} = \frac{\ln(L[0]) - \ln(L[m])}{\ln(L[0])} = 1 - \frac{-2 \ln(L[m])}{-2 \ln(L[0])} $$

where L[m] and L[0] are the likelihoods of the model with predictors and of the model containing only the intercept, respectively, so that ln(L[m]) and ln(L[0]) are the corresponding log likelihoods. The last expression uses -2 times the log likelihood, which is what SPSS (and other software) outputs rather than the log likelihood itself. This form of R-squared is also known as McFadden's R-squared.
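As a minimal sketch of the formula above, the following Python function computes the likelihood ratio R-squared from the two -2 log likelihood values. The function name and the numbers are hypothetical and chosen only for illustration; they do not come from any real output.

```python
def mcfadden_r2(neg2ll_model: float, neg2ll_null: float) -> float:
    """Likelihood ratio (McFadden) R-squared from two -2 log likelihood values."""
    if neg2ll_null <= 0:
        raise ValueError("-2 ln(L[0]) must be positive")
    # 1 - (-2 ln L[m]) / (-2 ln L[0]); the factors of -2 cancel.
    return 1.0 - neg2ll_model / neg2ll_null


# Hypothetical values for illustration only.
neg2ll_model = 120.0  # -2 ln(L[m]), model with predictors
neg2ll_null = 150.0   # -2 ln(L[0]), intercept-only model
print(round(mcfadden_r2(neg2ll_model, neg2ll_null), 3))
```

Working with the -2 log likelihood values directly avoids having to halve and change the sign of the figures reported by the software.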

The statistical significance of the predictors may be jointly assessed using twice the difference between the log likelihoods in the above expression. This equals 2 (ln(L[m]) - ln(L[0])), which is asymptotically distributed as chi-square(p) if the p predictors jointly have no influence on group membership. Most software that performs binary logistic regression computes and outputs this chi-square. In SPSS, for example, it is the chi-square statistic produced immediately after the predictors are added to the model, under the heading 'Block 1 Method=Enter'. For example, running a logistic regression in SPSS to assess the joint importance of two predictors, p1 and p2, with the syntax below

LOGISTIC REGRESSION  y
  /METHOD = ENTER p1 p2
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

we obtain the likelihood ratio chi-square in the output, which has the form:

BLOCK 1: METHOD-ENTER
 
Omnibus Tests of Model Coefficients
                Chi-square      df      Sig.
Step 1  Step    3.958           2       .138
        Block   3.958           2       .138
        Model   3.958           2       .138

This may be expressed as chi-square(2) = 3.96, p = 0.14, indicating that together the two predictors, p1 and p2, do not have a statistically significant association with group, y.
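The p-value in the table can be checked by hand. The sketch below is an illustrative helper (the name chi2_sf is an assumption, not an SPSS or library function) that evaluates the upper tail of a chi-square distribution with integer degrees of freedom using only the Python standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum over i < k of (x/2)^i / i!
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd df = 2k + 1: erfc(sqrt(x/2)) plus
    # exp(-x/2) * sum over j < k of (x/2)^(j + 1/2) / Gamma(j + 3/2)
    k = (df - 1) // 2
    total = sum(half ** (j + 0.5) / math.gamma(j + 1.5) for j in range(k))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Omnibus test from the output above: chi-square = 3.958 on 2 df.
print(round(chi2_sf(3.958, 2), 3))  # 0.138, matching the Sig. column
```

With 2 degrees of freedom the tail probability reduces to exp(-x/2), which is why this example is easy to verify on a calculator.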

-2 ln(L[m]) is output by SPSS as '-2 Log Likelihood' in the 'Model Summary' box. -2 ln(L[0]) can be obtained either by adding the above omnibus test chi-square to -2 ln(L[m]) or, directly, by creating a column of ones (constant) and fitting it as the sole predictor in a logistic regression that omits the intercept (using the SPSS syntax below)

COMPUTE CONSTANT=1.
EXE.
LOGISTIC REGRESSION  y
  /METHOD = ENTER constant
  /ORIGIN
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
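Both routes to -2 ln(L[0]) can be sketched in Python. The first function adds the omnibus chi-square to -2 ln(L[m]). The second relies on the fact that an intercept-only logistic model fits the observed proportion of cases in the target group, so its log likelihood follows from the group counts alone. The function names and all numbers are hypothetical, and the counts are assumed to refer to the same cases used in the model with predictors.

```python
import math


def neg2ll_null_from_output(neg2ll_model: float, omnibus_chisq: float) -> float:
    """-2 ln(L[0]) as -2 ln(L[m]) plus the omnibus test chi-square."""
    return neg2ll_model + omnibus_chisq


def neg2ll_null_from_counts(n_events: int, n_total: int) -> float:
    """-2 ln(L[0]) for an intercept-only model, from the outcome counts."""
    if not 0 < n_events < n_total:
        raise ValueError("need at least one case in each outcome group")
    p = n_events / n_total  # fitted probability equals the observed proportion
    log_lik = n_events * math.log(p) + (n_total - n_events) * math.log(1 - p)
    return -2.0 * log_lik


# Hypothetical data: 40 of 100 cases fall in the target group.
print(round(neg2ll_null_from_counts(40, 100), 3))
```

If the two routes disagree for a real data set, a likely cause is that cases with missing predictor values were dropped from one model but not the other.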

The above (likelihood ratio) R-squared estimate is also advocated by Train (2003).

Hosmer and Lemeshow (2000) note that the likelihood ratio R-squared does not attain the maximum value of 1 when two or more subjects have the same values of their predictor variables. In this case they propose a modification of the likelihood ratio R-squared

$$ \frac{\ln(L[0]) - \ln(L[m])}{\ln(L[0]) - \ln(L[m]) - 0.5D} = \frac{-2 \ln(L[0]) - (-2 \ln(L[m]))}{-2 \ln(L[0]) - (-2 \ln(L[m])) + D} = \frac{\mbox{Omnibus test chi-square for m variables}}{\mbox{Omnibus test chi-square for m variables} + \mbox{deviance}} $$

where D is a quantity known as the deviance, which represents the overall lack of fit of the model (it plays a role analogous to the residual sum of squares in linear regression and equals the sum of the subjects' squared deviance residuals). The deviance can be worked out from the subjects' deviance residuals, which can be saved in SPSS via the /SAVE DEV subcommand as below.

LOGISTIC REGRESSION  y
  /METHOD = ENTER p1 p2
  /SAVE = DEV
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

COMPUTE D2 = DEV_1*DEV_1.
EXE.

DESCRIPTIVES
  VARIABLES=D2
  /STATISTICS=MEAN STDDEV MIN MAX .

D is then equal to the number of observations multiplied by the mean of D2, the squared deviance residuals, which is simply the sum of D2 over subjects. Following the above formula, D can be combined with the omnibus test chi-square from the output to obtain an adjusted likelihood ratio R-squared for cases where two or more subjects have identical predictor variable values.
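The last two steps can be sketched in Python as follows. The residual values are hypothetical stand-ins for a saved DEV_1 column, and the function names are illustrative only; the chi-square of 3.958 is the omnibus test value from the example output above.

```python
def deviance(dev_residuals) -> float:
    """Deviance D as the sum of squared deviance residuals."""
    return sum(r * r for r in dev_residuals)


def adjusted_lr_r2(omnibus_chisq: float, dev: float) -> float:
    """Adjusted likelihood ratio R-squared: chi-square / (chi-square + D)."""
    return omnibus_chisq / (omnibus_chisq + dev)


# Hypothetical deviance residuals standing in for the saved DEV_1 column.
dev_1 = [0.9, -1.2, 1.4, -0.7, 1.1, -1.5]
d2 = [r * r for r in dev_1]
n = len(d2)

d_from_sum = deviance(dev_1)
d_from_mean = n * (sum(d2) / n)  # number of observations times the mean of D2

print(round(d_from_sum, 2), round(d_from_mean, 2))
print(round(adjusted_lr_r2(3.958, d_from_sum), 3))
```

The two ways of computing D agree up to rounding, so the mean reported by DESCRIPTIVES only needs to be multiplied by the number of observations.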

Hosmer and Lemeshow further warn that R-squared values tend to be lower in logistic regression than the usual linear regression for continuous outcomes and, consequently, can give misleadingly low indications of the fit of models deemed good using other fit criteria such as area under ROC curve or percentage correctly classified. They, therefore, recommend using R-squared to compare competing models rather than as a stand-alone effect size.

References

Hosmer, D.W. and Lemeshow, S. (2000). Applied logistic regression. 2nd Edition. Wiley: New York. In CBSU library.

Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. American Statistician, 54, 17-24.

Train, K. (2003). Discrete choice methods with simulation. Cambridge University Press: Cambridge.

Revision 26 as of 2011-08-12 11:21:59 (editor: PeterWatson)

MRC CBU Wiki
