FAQ/Jaeger - CBU statistics Wiki


When should I use a logit analysis as opposed to an arcsine transformed ANOVA?

Jaeger (2008) advocates, and illustrates, using logistic regression rather than the traditional ANOVA on arcsine-transformed proportions, particularly when the proportions are near 0 or 1, where the ANOVA approach can give spuriously statistically significant results. He suggests fitting separate logistic regressions for each subject and item (Lorch and Myers, 1990) and also fitting generalized linear mixed models to binomial data using the lmer procedure in the free software R (in current versions of the lme4 package, binomial models are fitted with glmer). "Mixed" refers to fitting both fixed and random factors. Jaeger notes that logistic regression has the advantage of giving directional comparisons via its regression coefficients (log odds ratios, which exponentiate to odds ratios), whereas the ANOVA approach needs post-hoc contrasts to obtain this information. These models may also be fitted in SPSS (Heck et al., 2012). Note that a logit transform of the response, of the form log(p/(1-p)), is sometimes used instead, with the importance of predictors assessed via a multiple regression, although this transform is not defined for p=0 or p=1.
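As a minimal sketch of the logit transform mentioned above, the following Python functions compute log(p/(1-p)), its inverse, and the odds ratio implied by a logistic regression coefficient. The function names are illustrative only and are not taken from Jaeger (2008) or from any package.

```python
import math


def logit(p):
    """Return log(p / (1 - p)); the transform is undefined at p = 0 and p = 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log odds value back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio_from_coef(beta):
    """Exponentiate a logistic regression coefficient (a log odds ratio)."""
    return math.exp(beta)
```

Because logit(0) and logit(1) raise an error here, any observed proportions of exactly 0 or 1 would need separate handling before a regression on the transformed response could be run.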

Comparing two proportions

For sufficiently large n, the arcsine-transformed proportion 2[arcsine(sqrt[p])] has a variance of roughly 1/n whatever the value of p, so the variance no longer depends on p. The difference between two arcsine-transformed proportions is known as Cohen's h, and Cohen gives rules of thumb for its magnitude. Alternatively, to compare two proportions one could transform each into odds of the form p/(1-p) and then compare the two estimated odds, for example via their ratio (the odds ratio).
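The two comparisons described above can be sketched in Python as follows. The function names are illustrative, and the thresholds in the comment are Cohen's conventional rules of thumb for h (0.2 small, 0.5 medium, 0.8 large).

```python
import math


def arcsine_transform(p):
    """Variance-stabilising transform 2 * arcsine(sqrt(p)) of a proportion."""
    return 2.0 * math.asin(math.sqrt(p))


def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions.

    Conventional rules of thumb: 0.2 small, 0.5 medium, 0.8 large.
    """
    return arcsine_transform(p1) - arcsine_transform(p2)


def odds(p):
    """Odds p / (1 - p); undefined at p = 1."""
    return p / (1.0 - p)


def odds_ratio(p1, p2):
    """Ratio of the odds of p1 to the odds of p2."""
    return odds(p1) / odds(p2)


# Example: comparing proportions of 0.5 and 0.25
h = cohens_h(0.5, 0.25)            # about 0.524, a medium effect by Cohen's rules
ratio = odds_ratio(0.5, 0.25)      # about 3.0
```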

Other approaches

A beta regression (Smithson and Verkuilen, 2006) models both the response and its variance when the response lies between 0 and 1. The response is assumed to follow a beta distribution, which allows for skewness. The authors have fitted this model using, amongst other software, SPSS and SAS (GLIMMIX) although, contrary to what the paper states, the syntax for fitting this model is no longer available on Michael Smithson's website (it may still be available upon request to the authors). A simpler form of beta regression, which models just the response and treats the precision (dispersion) of the response as constant, may be fitted using the betareg function in R. Smithson and Verkuilen, however, claim such an approach is suboptimal.
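To illustrate how a beta distribution ties the variance of the response to its mean, here is a minimal Python sketch. It assumes the common mean-precision parameterization (mean mu, precision phi, shape parameters mu*phi and (1-mu)*phi); the function names are hypothetical and this is not the authors' SPSS or SAS syntax, nor a fitting routine.

```python
import math


def beta_shapes(mu, phi):
    """Convert a mean (0 < mu < 1) and precision (phi > 0) to beta shape parameters."""
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_variance(mu, phi):
    """Variance of the response: depends on the mean and shrinks as phi grows."""
    return mu * (1.0 - mu) / (1.0 + phi)


def beta_logpdf(y, mu, phi):
    """Log density of a response y (strictly between 0 and 1) under the beta model."""
    a, b = beta_shapes(mu, phi)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
```

In a full beta regression mu (and, in the authors' version, phi as well) would be linked to predictors and estimated by maximising the sum of beta_logpdf over the observations; holding phi constant corresponds to the simpler form described above.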

References

Heck RH, Thomas SL and Tabata LN (2012). Multilevel modeling of categorical outcomes using IBM SPSS. Routledge: New York.

Jaeger TF (2008). Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. Journal of Memory and Language 59(4) 434-446.

Lorch RF and Myers JL (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory and Cognition 16(1) 149-157.

Smithson M and Verkuilen J (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1) 54-71. A copy of this paper is available for free for CBUers using the PsycNET facility.