FAQ/heterogeneity - CBU statistics Wiki

Heterogeneity of variance in probit and logit analysis (and Poisson regression)

Probit and logit functions are used to relate predictors to probabilities, e.g. the relationship between hearing thresholds and the proportion of people who register a stimulus at those thresholds. They are transforms: the linear regression is fitted on the probit or logit scale, and the inverse transform turns the output of that regression into a proportion between 0 and 1.
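As a minimal sketch of the two transforms, the following Python snippet maps a value on the linear regression scale back to a proportion. The function names inv_logit and inv_probit are illustrative choices, not names taken from SPSS or any particular package.

```python
import math
from statistics import NormalDist


def inv_logit(eta):
    """Inverse logit: maps any real number to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit: the standard normal cumulative distribution function."""
    return NormalDist().cdf(eta)


# Both transforms send a linear predictor of 0 to a proportion of 0.5
# and increase steadily as the linear predictor increases.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 3), round(inv_probit(eta), 3))
```

The two curves have the same S shape and differ mainly in scale, which is why probit and logit analyses of the same data usually lead to similar fitted proportions.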

The output differs slightly from that of the usual linear regression. In particular, a term called the heterogeneity of variance is produced, with very little explanation of what it means. It is analogous to the error mean square in an ordinary analysis of variance table. It represents the lack of fit of the model; if it is sufficiently small it suggests, for example, that the probability of picking up a stimulus is closely related to the hearing threshold of that stimulus: the quieter the sound, the fewer people pick it up. It also suggests that noting the hearing threshold is sufficient, on its own, to predict the probability of hearing a stimulus.

If the model fits, the lack of fit term is chi-square distributed with N-p degrees of freedom for N observations and p-1 predictors. In our example N is the number of different hearing thresholds and p=2, representing the intercept and the hearing threshold regression term, so the lack of fit term has N-2 degrees of freedom. If the lack of fit term is substantially larger than would be expected from a chi-square distribution on N-2 df (whose mean is N-2), then we have a poor model, possibly because factors other than the volume of the sound influence whether a stimulus is heard. This case is known as overdispersion. It can be corrected for using a multiplier obtainable from the output, or by fitting a different distribution to the data such as a negative binomial (see e.g. Heck, Thomas and Tabata (2012), p.102). Another issue is correcting for numbers of events which are compared over different lengths of time. You might expect to see more events if they are observed for longer periods, or to obtain higher numbers of correct answers from tests with more questions. The paper of Coxe, West and Aiken (2009) addresses some straightforward ways of handling these two issues.
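The lack of fit calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS algorithm: the data are hypothetical, a logit model with an intercept and one slope is fitted by Newton-Raphson, the lack of fit is measured by the Pearson chi-square, and the multiplier is taken to be the chi-square divided by its degrees of freedom, which is one common estimate.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps the linear predictor to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_grouped_logit(x, n, y, iterations=25):
    """Fit logit(p) = a + b*x to grouped binomial data by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = inv_logit(a + b * xi)
            w = ni * p * (1.0 - p)   # binomial variance at this level
            r = yi - ni * p          # observed minus expected count
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def pearson_lack_of_fit(x, n, y, a, b):
    """Return the Pearson chi-square and its N-2 degrees of freedom."""
    chi2 = 0.0
    for xi, ni, yi in zip(x, n, y):
        p = inv_logit(a + b * xi)
        chi2 += (yi - ni * p) ** 2 / (ni * p * (1.0 - p))
    return chi2, len(x) - 2  # p = 2 parameters: intercept and slope


# Hypothetical data: six sound levels, 50 listeners tested at each level,
# and the number of listeners who heard the stimulus.
level = [1, 2, 3, 4, 5, 6]
tested = [50, 50, 50, 50, 50, 50]
heard = [4, 10, 22, 33, 42, 47]

a, b = fit_grouped_logit(level, tested, heard)
chi2, df = pearson_lack_of_fit(level, tested, heard, a, b)
dispersion = chi2 / df  # values well above 1 point to overdispersion
print("intercept", a, "slope", b)
print("chi-square", chi2, "df", df, "multiplier", dispersion)
```

To judge whether the chi-square is too large, it can be compared with the upper tail of a chi-square distribution on df degrees of freedom, for example with scipy.stats.chi2.sf(chi2, df) if SciPy is available.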

If there is a statistically significant lack of fit, SPSS adjusts for it when producing confidence intervals. In our example, the intervals for the proportion of people hearing a stimulus at each stimulus threshold become very wide, indicating a lack of confidence, or precision, in predicting the probability of hearing a sound from the volume of that sound.
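The effect of such an adjustment on an interval can be illustrated with a short Python sketch. It assumes a logit model and scales the standard error of the linear predictor by the square root of a heterogeneity multiplier, which is one common way of allowing for overdispersion; the exact calculation SPSS performs may differ, and the function name and inputs below are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps the linear predictor to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def widened_interval(eta, se, heterogeneity=1.0, z=1.96):
    """Approximate 95% interval for a proportion, allowing for lack of fit.

    eta is the fitted linear predictor and se its standard error.
    Assumption: the standard error is only ever inflated, so a
    multiplier below 1 is treated as 1.
    """
    scaled_se = se * math.sqrt(max(heterogeneity, 1.0))
    return inv_logit(eta - z * scaled_se), inv_logit(eta + z * scaled_se)


# Hypothetical fitted value and standard error at one sound level
print(widened_interval(0.5, 0.3))                     # no adjustment
print(widened_interval(0.5, 0.3, heterogeneity=4.0))  # wider interval
```

With a multiplier of 4 the standard error doubles, so the interval on the proportion scale is noticeably wider than the unadjusted one.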

References

Coxe, S, West, SG and Aiken, LS (2009) The Analysis of Count Data: A Gentle Introduction to Poisson Regression and Its Alternatives. Journal of Personality Assessment, 91(2), 121–136.

Heck, RH, Thomas, SL and Tabata, LN (2012) Multilevel modeling of categorical outcomes using IBM SPSS. Routledge: New York.