FAQ/infmles - CBU statistics Wiki

Problems interpreting logistic regression estimates can be caused by having too good a fit!

Albert and Anderson (1984) observed that when you have perfect or near-perfect prediction, the logistic regression estimates and standard errors are undefined. To see why, suppose we are interested in comparing patients with controls on a pass/fail criterion, and all the patients fail while all the controls pass. The regression estimates in logistic regression represent odds ratios, for example the ratio of the odds of passing to failing in the controls compared to the patients, as given in the equation below.

OR = (A*B)/(C*D)

where A=number of controls who pass, B=number of patients who fail, C=number of controls who fail and D=number of patients who pass.

The odds ratio (and by implication the regression estimates) is undefined in this example because no patients pass and no controls fail. The OR equation shows that the odds ratio is also undefined if either no controls fail or no patients pass.

In particular, the Wald statistic which SPSS reports, based on the square of the ratio of the regression estimate to its standard error, should not be used in these perfect-fit cases because it grossly underestimates the effect of the predictor variables.

Instead, twice the difference in log likelihoods should be used to assess the influence of a predictor variable. Fit the model with and without the predictor(s) of interest and compare the term called –2 Log Likelihood in the model summary box. The difference between these is chi-squared on p degrees of freedom, where p is the number of variables dropped from the model. The p-value can be obtained using functions under Transform:Compute. Another FAQ will deal with this.

Reference:

Albert A and Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 1-10

FAQ/infmles (last edited 2014-04-16 15:51:50 by PeterWatson)