Inflated standard errors in logistic regression

Problems interpreting logistic regression estimates can be caused by having too good a fit!

Albert and Anderson (1984) observed that when you have perfect or near-perfect prediction the logistic regression estimates and their standard errors are undefined. To see why, suppose we are interested in comparing patients with controls on a pass/fail criterion. All the patients fail and all the controls pass. Now, the exponentiated regression estimates in logistic regression represent odds ratios, for example the ratio of the odds of passing in the controls to the odds of passing in the patients, as given in the equation below.

OR = (A*B)/(C*D)

where A=number of controls who pass, B=number of patients who fail, C=number of controls who fail and D=number of patients who pass.
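As a concrete illustration, the sketch below computes this odds ratio from hypothetical cell counts (made-up numbers, not data from this page) and shows how it degenerates under perfect prediction:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio OR = (A*B)/(C*D) for a 2x2 table where
    a = controls who pass, b = patients who fail,
    c = controls who fail, d = patients who pass.
    Returns infinity when c or d is zero (perfect prediction)."""
    if c == 0 or d == 0:
        return float("inf")
    return (a * b) / (c * d)

# Ordinary table: a finite, interpretable odds ratio.
print(odds_ratio(10, 8, 4, 6))

# Perfect prediction: no controls fail and no patients pass,
# so the odds ratio (and the regression estimate) is undefined.
print(odds_ratio(10, 8, 0, 0))
```

Maximum likelihood tries to push the log odds ratio towards this infinite value, which is why the fitted coefficients and standard errors blow up.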

The odds ratio (and by implication the regression estimate) is undefined in this example because no patients pass and no controls fail. The OR equation shows that the odds ratio is also undefined if either no controls fail or no patients pass. Let's call this scenario a case of perfect prediction. For another illustration see [:FAQ/infmles/likel: here.]

In particular the Wald chi-square statistic that SPSS reports, based on the square of the ratio of the regression estimate to its standard error, should not be used in these perfect-fit cases because it grossly underestimates the effect of the predictor variables.

Instead, twice the difference in log likelihoods should be used to assess the influence of a predictor variable ([attachment:infmles.pdf Rindskopf(2002)]). Collett (1991) also recommends the likelihood ratio chi-square over the Wald chi-square, particularly when the data are sparse, as it is still well approximated by a chi-square distribution. Fit the model with and without the predictor(s) of interest and compare the term called -2 Log Likelihood in the model summary box. The difference between these is chi-squared on p degrees of freedom, where p variables have been dropped from the model. The p-value can be obtained using functions under Transform:Compute, and the test can also be obtained in SPSS by fitting predictors in ''blocks''. An example of this approach using R is [:FAQ/offseteg: here.]
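For a single binary predictor the -2 log likelihood difference can be computed directly from the 2x2 table, because both the null model (one pooled pass rate) and the full model (a pass rate per group) have closed-form binomial log likelihoods. The Python sketch below does this with made-up group sizes and pass counts; the chi-square tail probability on 1 degree of freedom is obtained via the erfc identity rather than a statistics library:

```python
import math

def _ll(y, n, p):
    """Binomial log-likelihood for y successes in n trials at rate p,
    using the 0*log(0) = 0 convention so perfect fits are handled."""
    def term(k, q):
        return 0.0 if k == 0 else k * math.log(q)
    return term(y, p) + term(n - y, 1 - p)

def lr_test(y1, n1, y2, n2):
    """Likelihood ratio (deviance) chi-square on 1 df for a single
    binary predictor: y1/n1 passes in group 1, y2/n2 in group 2."""
    p_pooled = (y1 + y2) / (n1 + n2)
    ll_null = _ll(y1, n1, p_pooled) + _ll(y2, n2, p_pooled)
    ll_full = _ll(y1, n1, y1 / n1) + _ll(y2, n2, y2 / n2)
    g2 = 2 * (ll_full - ll_null)          # -2 log likelihood difference
    # Upper tail of chi-square(1 df) via the complementary error function.
    p_value = math.erfc(math.sqrt(g2 / 2))
    return g2, p_value

# Perfect prediction: all 10 controls pass, none of the 10 patients do.
g2, p = lr_test(10, 10, 0, 10)
```

Note that the test is well defined here even though the odds ratio itself is infinite: the full-model log likelihood is simply zero, since every fitted probability is 0 or 1.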

The chi-square obtained from differencing log likelihoods is valid because, unlike the Wald statistic, it does not depend on the regression estimates and their standard errors, which are not estimable, being unbounded, when we have perfect prediction. Instead it uses probabilities, which are always bounded between zero and one, to measure the change in model fit due to adding or subtracting predictors. In particular, when we have perfect prediction these probabilities tend to zero and one. For example, in our earlier scenario, the probability of a pass for a control is one and the probability of a failure for a control is zero. This procedure does not, however, measure the association (odds ratio) between group (patient versus control) and pass rate.

It is also possible to output an exact p-value (Mehta and Patel, 1995) for a test of a model predictor in logistic regression. This procedure produces an estimated odds ratio even when an odds ratio cannot be estimated using the more traditional likelihood methods because of the occurrence of zero frequencies. A procedure for producing exact odds ratios and exact p-values is available using the LOGISTIC procedure in SAS. Further details of how to do this, and other ways of removing this problem, are illustrated [http://www.uoregon.edu/~robinh/lgst_zero.txt here.] Rindskopf (2002), however, suggests these exact odds ratios do not always give good predictions.
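For the special case of a single binary predictor, the exact conditional test underlying this approach reduces to Fisher's exact test on the 2x2 table, which can be sketched in plain Python (the cell counts below are hypothetical, chosen to mimic the perfect-prediction scenario above):

```python
import math

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table
    [[a, b], [c, d]] (rows = groups, columns = pass/fail):
    sum the hypergeometric probabilities of all tables with the
    observed margins that are no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    def prob(x):  # P(top-left cell = x) given the fixed margins
        return (math.comb(row1, x) * math.comb(row2, col1 - x)
                / math.comb(n, col1))
    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Small tolerance so tables exactly as probable as the observed
    # one are included despite floating-point rounding.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Perfect prediction: 10/10 controls pass, 0/10 patients pass.
p = fisher_exact_p(10, 0, 0, 10)
```

Unlike the Wald statistic, this p-value remains meaningful under perfect prediction, although, as noted above, Rindskopf (2002) cautions against over-interpreting the accompanying exact odds ratio estimates.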

References

Albert A and Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 1-10.

Collett D. (1991). Modelling binary data. Chapman and Hall: London.

Hosmer DW and Lemeshow S (2000). Applied logistic regression. 2nd Edition. Wiley: New York, pp. 135-142. IN CBSU LIBRARY.

Mehta CR and Patel NR (1995). Exact logistic regression: Theory and examples. Statistics in Medicine, 14, 2143-2160.

Rindskopf D (2002) Infinite parameter estimates in logistic regression: opportunities, not problems. Journal of Educational and Behavioral Statistics 27(2) 147-161.

None: FAQ/infmles (last edited 2014-04-16 15:51:50 by PeterWatson)