How do I summarise the fit of a logistic regression model?
Menard (2000) compares various R-squared measures for binary logistic regression and concludes that the measure based on the likelihood ratio chi-square is the most appropriate:
$$ \mbox{R-squared (Likelihood ratio)} = 1 - \frac{\ln(L[m])}{\ln(L[0])} = 1 - \frac{-2 \ln(L[m])}{-2 \ln(L[0])} = \frac{\ln(L[0]) - \ln(L[m])}{\ln(L[0])} $$
where L[m] and L[0] are the likelihoods of the model with predictors and of the model containing only the intercept respectively, so that ln(L[m]) and ln(L[0]) are the corresponding log likelihoods. The middle expression uses -2 times the log likelihood, which is what SPSS (and other software) reports, rather than the log likelihood itself; the factor of -2 cancels in the ratio. This R-squared is also known as McFadden's R-squared.
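The equivalent forms of the formula can be checked numerically. The sketch below is a minimal Python illustration (the page's own examples use SPSS syntax): it takes the -2 log-likelihood values that software typically reports for the intercept-only and full models and returns McFadden's R-squared. The function name `mcfadden_r2` and the input values are hypothetical, chosen only for illustration.

```python
import math

def mcfadden_r2(neg2ll_null: float, neg2ll_model: float) -> float:
    """Likelihood ratio (McFadden's) R-squared from -2 log-likelihoods.

    neg2ll_null:  -2 ln(L[0]), intercept-only model
    neg2ll_model: -2 ln(L[m]), model with predictors
    """
    if neg2ll_null <= 0:
        raise ValueError("-2 ln(L[0]) must be positive")
    # 1 - (-2 ln L[m]) / (-2 ln L[0]); the factor -2 cancels
    return 1.0 - neg2ll_model / neg2ll_null

# Hypothetical -2LL values, for illustration only
neg2ll_0, neg2ll_m = 120.0, 100.0
ll_0, ll_m = -neg2ll_0 / 2, -neg2ll_m / 2   # the log-likelihoods themselves

r2 = mcfadden_r2(neg2ll_0, neg2ll_m)
# Equivalent forms: 1 - ln(L[m])/ln(L[0]) and (ln(L[0]) - ln(L[m]))/ln(L[0])
assert math.isclose(r2, 1 - ll_m / ll_0)
assert math.isclose(r2, (ll_0 - ll_m) / ll_0)
print(round(r2, 4))  # 0.1667
```

Working from the -2LL values avoids converting back to log likelihoods, since those are the figures SPSS prints.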
The statistical significance of the predictors may be assessed jointly using twice the change in the log likelihoods in the above expression. This equals 2 (ln(L[m]) - ln(L[0])), which is approximately distributed as chi-square with p degrees of freedom if the p predictors jointly have no influence on group membership. Most software that fits binary logistic regressions computes and reports this chi-square. In SPSS, for example, it is the chi-square statistic produced immediately after the predictors are added to the model, under the heading 'Block 1: Method = Enter'. For example, running a logistic regression in SPSS to assess the joint importance of two predictors, p1 and p2, with the syntax below
```
LOGISTIC REGRESSION y
  /METHOD = ENTER p1 p2
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
```
we obtain the likelihood ratio chi-square in the output, which takes the form:
Block 1: Method = Enter

Omnibus Tests of Model Coefficients

| | | Chi-square | df | Sig. |
|---|---|---|---|---|
| Step 1 | Step | 3.958 | 2 | .138 |
| | Block | 3.958 | 2 | .138 |
| | Model | 3.958 | 2 | .138 |
This may be reported as chi-square(2) = 3.96, p = 0.14, indicating that, taken together, the two predictors p1 and p2 do not have a statistically significant association with group, y.
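The p-value in the Sig. column can be reproduced from the chi-square statistic and its degrees of freedom. Below is a minimal Python sketch using only the standard library; the helper names `lr_chi_square` and `chi2_sf` are hypothetical, and the upper-tail probability uses the standard two-step recurrence for the chi-square distribution rather than any SPSS routine.

```python
import math

def lr_chi_square(ll_null: float, ll_model: float) -> float:
    """Likelihood ratio chi-square 2 (ln(L[m]) - ln(L[0]))."""
    return 2.0 * (ll_model - ll_null)

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from the closed form for df = 2 (even df) or df = 1 (odd df) ...
    if df % 2 == 0:
        sf, k = math.exp(-half), 2
    else:
        sf, k = math.erfc(math.sqrt(half)), 1
    # ... then step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return sf

# Chi-square from the omnibus table above: 3.958 on 2 df
print(round(chi2_sf(3.958, 2), 3))  # 0.138
```

With 2 degrees of freedom the tail probability is simply exp(-x/2), which is why exp(-3.958/2) gives the reported Sig. of .138.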
The above R-squared estimate is also advocated by Train (2003).
Hosmer and Lemeshow (2000) note that the likelihood ratio R-squared need not attain its maximum value of 1.00 when two or more subjects share the same values of the predictor variables. For this case they propose a modification of the likelihood ratio R-squared:
$$ \frac{\ln(L[0]) - \ln(L[m])}{\ln(L[0]) - \ln(L[m]) - 0.5D} = \frac{-2 \ln(L[0]) - (-2 \ln(L[m]))}{-2 \ln(L[0]) - (-2 \ln(L[m])) + D} = \frac{\mbox{Omnibus test chi-square for m variables}}{\mbox{Omnibus test chi-square for m variables} + \mbox{deviance}} $$
where D is the deviance, which measures the overall lack of fit of the model (the deviation of subjects' predicted group probabilities from their observed groups). The deviance is built up from deviance residuals, which can be saved in SPSS using the DEV keyword on the /SAVE subcommand, as below.
```
LOGISTIC REGRESSION y
  /METHOD = ENTER p1 p2
  /SAVE = DEV
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
* Square the deviance residual saved for each case as DEV_1.
COMPUTE D2 = DEV_1*DEV_1.
EXECUTE.
DESCRIPTIVES VARIABLES=D2
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
```
D is then equal to the sum of D2 over all cases, that is, the number of observations multiplied by the mean of D2.
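The deviance and the modified R-squared can be sketched in Python. This is an illustrative sketch, not SPSS's algorithm: the function names and the eight-case data set are hypothetical, and the fitted probabilities are taken as given rather than estimated. It computes D in two ways: from case-by-case deviance residuals, and over covariate patterns (cases sharing the same predictor values), which is how Hosmer and Lemeshow define the deviance. One algebraic consequence is worth checking: with one binary outcome per case the squared deviance residual is -2[y ln(p) + (1 - y) ln(1 - p)], so a case-wise D equals -2 ln(L[m]) exactly and the modified ratio reduces to McFadden's R-squared; the modification only changes the result when D is computed over covariate patterns.

```python
import math
from collections import defaultdict

def _xlogy(a, b):
    """a * ln(b), with the convention 0 * ln(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)

def loglik(y, p):
    """Binary log likelihood ln(L) for 0/1 outcomes y and fitted probabilities p."""
    return sum(_xlogy(yi, pi) + _xlogy(1 - yi, 1 - pi) for yi, pi in zip(y, p))

def casewise_deviance(y, p):
    """Sum of squared case-by-case deviance residuals (n times their mean)."""
    return sum(-2 * (_xlogy(yi, pi) + _xlogy(1 - yi, 1 - pi)) for yi, pi in zip(y, p))

def pattern_deviance(x, y, p):
    """Deviance over covariate patterns: cases with equal x are pooled.

    Assumes all cases in a pattern share one fitted probability.
    """
    groups = defaultdict(lambda: [0, 0, 0.0])   # x -> [cases, events, fitted p]
    for xi, yi, pi in zip(x, y, p):
        g = groups[xi]
        g[0] += 1
        g[1] += yi
        g[2] = pi
    dev = 0.0
    for m, ev, pi in groups.values():
        dev += 2 * (_xlogy(ev, ev / (m * pi))
                    + _xlogy(m - ev, (m - ev) / (m * (1 - pi))))
    return dev

def modified_r2(ll_null, ll_model, deviance):
    """Omnibus chi-square / (omnibus chi-square + deviance)."""
    g = 2 * (ll_model - ll_null)
    return g / (g + deviance)

# Hypothetical data: one binary predictor, so only two covariate patterns
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.25] * 4 + [0.75] * 4   # fitted p = observed proportion per pattern
p0 = [0.5] * 8                # intercept-only fit: overall proportion

ll0, llm = loglik(y, p0), loglik(y, p)
print(round(1 - llm / ll0, 4))                                   # 0.1887
print(round(modified_r2(ll0, llm, casewise_deviance(y, p)), 4))  # 0.1887
print(modified_r2(ll0, llm, pattern_deviance(x, y, p)))          # 1.0
```

With a single binary predictor the fitted model reproduces the observed proportion in each covariate pattern, so it is the best model these data allow. The pattern-based ratio reaches 1.0, while McFadden's R-squared (and the ratio built from a case-wise D) stays at about 0.19, which is the situation Hosmer and Lemeshow describe.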
References
Hosmer, D.W. and Lemeshow, S. (2000). Applied logistic regression. 2nd Edition. Wiley: New York.
Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. American Statistician, 54, 17-24.
Train, K. (2003). Discrete choice methods with simulation. Cambridge University Press: Cambridge.