FAQ/infmles - CBU statistics Wiki

Revision 14 as of 2009-02-25 16:40:30

Inflated standard errors in logistic regression

Problems interpreting logistic regression estimates can be caused by having too good a fit!

Albert and Anderson (1984) observed that when you have perfect or near-perfect prediction, the logistic regression estimates and standard errors are undefined. To see why, suppose we are interested in comparing patients with controls on a pass/fail criterion. All the patients fail and all the controls pass. Now, the regression estimates in logistic regression represent log odds ratios, for example the log of the ratio of the odds of passing to failing in the controls compared to the patients. The odds ratio itself is given in the equation below.

OR = (A*B)/(C*D)

where A=number of controls who pass, B=number of patients who fail, C=number of controls who fail and D=number of patients who pass.

The odds ratio (and by implication the regression estimate) is undefined in this example because no patients pass (D=0) and no controls fail (C=0), so the denominator C*D is zero. The OR equation shows that the odds ratio is also undefined if only one of these holds, that is, if either no controls fail or no patients pass.
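As a minimal sketch of the point above, the OR equation can be written as a small Python function. The function name `odds_ratio` and the cell counts are hypothetical and only illustrate how a zero in C or D leaves the ratio without a finite value.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio (A*B)/(C*D) for the 2x2 table described in the text.

    a = controls who pass, b = patients who fail,
    c = controls who fail, d = patients who pass.
    Returns math.inf when the denominator is zero and the numerator is
    positive, and math.nan when both are zero.
    """
    numerator = a * b
    denominator = c * d
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator

# Hypothetical counts: 20 controls and 20 patients.
print(odds_ratio(15, 15, 6, 6))   # ordinary table, finite odds ratio
print(odds_ratio(20, 20, 0, 0))   # perfect prediction, no finite odds ratio
print(odds_ratio(19, 20, 1, 0))   # no patients pass, still no finite odds ratio
```

Returning infinity rather than raising an error is only a convenience for illustration; a fitting routine in this situation has no finite estimate to converge to.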

In particular, the Wald statistic which SPSS evaluates, based on the square of the ratio of the regression estimate to its standard error, should not be used in these perfect fit cases: the inflated standard error makes it grossly underestimate the effect of the predictor variables.
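The behaviour of the Wald statistic can be illustrated with a rough sketch for the simplest case of a single two-group predictor. It assumes the standard large-sample formulas for a 2x2 table: the estimate is the log odds ratio, its standard error is the square root of the sum of the reciprocal cell counts, and the likelihood ratio statistic is the usual one for independence in the table. The function names and counts are hypothetical.

```python
import math

def wald_chi2(a, b, c, d):
    """Squared ratio of the log odds ratio to its large-sample standard error.

    Requires all four cell counts to be positive.
    """
    log_or = math.log((a * b) / (c * d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (log_or / se) ** 2

def lr_chi2(a, b, c, d):
    """Likelihood ratio chi-squared (1 df) for group by outcome independence.

    a = controls who pass, b = patients who fail,
    c = controls who fail, d = patients who pass.
    """
    n = a + b + c + d
    controls, patients = a + c, b + d
    passes, fails = a + d, b + c
    # Each entry is (observed count, row total, column total).
    cells = [
        (a, controls, passes),
        (c, controls, fails),
        (d, patients, passes),
        (b, patients, fails),
    ]
    g2 = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # an empty cell contributes nothing to the sum
            g2 += observed * math.log(observed * n / (row_total * col_total))
    return 2 * g2

# Hypothetical tables with 21 controls and 21 patients each.
for table in [(15, 15, 6, 6), (20, 20, 1, 1)]:
    print(table, round(wald_chi2(*table), 2), round(lr_chi2(*table), 2))
```

With these made-up counts the two statistics are close for the moderate table (roughly 7.2 and 8.0), but for the near-perfect table the Wald statistic is roughly 17.1 against a likelihood ratio statistic of roughly 42.1.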

Instead, twice the difference in log likelihoods should be used to assess the influence of a predictor variable. Fit the model with and without the predictor(s) of interest and compare the term called –2 log Likelihood in the model summary box. Under the null hypothesis that the dropped predictors have no effect, the difference between these is approximately chi-squared distributed on p degrees of freedom, where p is the number of variables dropped from the model. The p-value can be obtained using the functions under Transform > Compute in SPSS. Another FAQ will deal with this.
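For readers working outside SPSS, the p-value calculation can be sketched in Python using only the standard library. The helper names `chi2_sf` and `lr_test` and the two –2 log Likelihood values are made up for illustration; the upper tail probability uses the closed-form expressions that hold for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)                 # exact tail for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))      # exact tail for df = 1
    # Step the degrees of freedom up by 2 at a time until df is reached.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(neg2ll_reduced, neg2ll_full, p):
    """Likelihood ratio test from two -2 log Likelihood values.

    neg2ll_reduced: -2 log Likelihood of the model without the predictor(s)
    neg2ll_full:    -2 log Likelihood of the model with the predictor(s)
    p:              number of variables dropped from the full model
    """
    statistic = neg2ll_reduced - neg2ll_full
    if statistic < 0:
        raise ValueError("the reduced model cannot fit better than the full model")
    return statistic, chi2_sf(statistic, p)

# Hypothetical -2 log Likelihood values read from two model summary boxes.
statistic, p_value = lr_test(58.22, 16.08, p=1)
print(statistic, p_value)
```

A quick sanity check is that `chi2_sf(3.84, 1)` comes out close to 0.05, the familiar 5% critical value on one degree of freedom.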

It is possible to output an exact p-value (Mehta and Patel, 1995) for a test of a model predictor in logistic regression even when the usual estimation procedure breaks down due to its inability to estimate an infinite odds ratio. A procedure for doing this is available using the LOGISTIC procedure in SAS. Further details of doing this and other ways of removing this problem are illustrated [http://www.uoregon.edu/~robinh/lgst_zero.txt here.]

A practical approach to this issue is given by [attachment:infmles.pdf Rindskopf (2002).]

References

Albert A and Anderson JA (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71, 1-10.

Mehta CR and Patel NR (1995). Exact logistic regression: theory and examples. Statistics in Medicine, 14, 2143-2160.

Rindskopf D (2002). Infinite parameter estimates in logistic regression: opportunities, not problems. Journal of Educational and Behavioral Statistics, 27(2), 147-161.