When should I use a logit analysis as opposed to an arcsine transformed ANOVA?
Jaeger (2008) advocates, and illustrates, using a logistic regression approach when the proportions are near 0 or 1, in preference to the traditional method of an ANOVA on arcsine-transformed proportions. The latter can give spuriously statistically significant results in these cases. He suggests using separate logistic regressions for each subject and item (Lorch and Myers, 1990) and also using the lmer procedure in the freeware R (glmer in current versions of the lme4 package) to fit generalized linear mixed models with random effects to binomial data. "Mixed" refers to the model containing both fixed and random factors. He says that logistic regression has the advantage of giving directional comparisons via its regression coefficients (log odds ratios, which exponentiate to odds ratios), whereas post-hoc contrasts are needed to obtain this information using the ANOVA approach. These models may also be fitted in SPSS (Heck et al., 2012). Note that a logit transform of the response, i.e. of form log(p/(1-p)), is also sometimes used, with the importance of predictors then assessed via a multiple regression, although this transform is not defined for p=0 or p=1.
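As a small illustration of the logit scale and of how a logistic regression coefficient relates to an odds ratio, the following Python sketch may help. It assumes the simplest case of a single two-level predictor, where the fitted coefficient equals the difference between the two conditions' logits. The function names and the proportions are hypothetical and chosen only for illustration.

```python
import math

def logit(p):
    """Log odds log(p / (1 - p)); undefined at p = 0 and p = 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    return math.log(p / (1.0 - p))

def log_odds_ratio(p_a, p_b):
    """Difference in logits between conditions A and B.

    With one two-level predictor this is the coefficient a logistic
    regression would estimate for A versus B.
    """
    return logit(p_a) - logit(p_b)

# Hypothetical proportions correct in two conditions (illustrative only).
p_a, p_b = 0.95, 0.80
coef = log_odds_ratio(p_a, p_b)   # sign gives the direction of the effect
odds_ratio = math.exp(coef)       # exponentiating gives the odds ratio
print(f"coefficient = {coef:.3f}, odds ratio = {odds_ratio:.2f}")
```

The sign of the coefficient gives the direction of the comparison directly, and calling logit(0.0) or logit(1.0) raises an error, mirroring the fact that the transform is undefined at those values.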
Comparing two proportions
The arcsine transform, 2 arcsin(√p), has a variance of roughly 1/n for sufficiently large n, no matter what the value of p. Cohen gives rules of thumb for the magnitude of the difference between two arcsine-transformed proportions, known as Cohen's h. For comparing two proportions one could alternatively transform each proportion into odds of form p/(1-p) and then compare the estimated odds for the two groups, for example via their ratio (the odds ratio).
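The two ways of comparing proportions can be sketched in a few lines of Python, assuming the usual definition of Cohen's h as the difference between the two values of 2 arcsin(√p). The function names and example proportions below are hypothetical. Cohen's commonly quoted rules of thumb treat h of about 0.2, 0.5 and 0.8 as small, medium and large respectively.

```python
import math

def arcsine_transform(p):
    """Variance-stabilising transform 2 * arcsin(sqrt(p)) of a proportion."""
    return 2.0 * math.asin(math.sqrt(p))

def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return arcsine_transform(p1) - arcsine_transform(p2)

def odds(p):
    """Odds p / (1 - p); undefined for p = 1."""
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds for the two proportions."""
    return odds(p1) / odds(p2)

# Two hypothetical pairs with the same raw difference of 0.10.
for p1, p2 in [(0.55, 0.45), (0.95, 0.85)]:
    print(f"p1={p1}, p2={p2}: h={cohens_h(p1, p2):.3f}, "
          f"odds ratio={odds_ratio(p1, p2):.2f}")
```

Both measures treat the same raw difference as larger when the proportions are close to 1 than when they are near 0.5.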
A beta regression (Smithson and Verkuilen, 2006) models both the mean of the response and its variance when the response lies in the interval (0, 1). The response is assumed to follow a beta distribution, which allows for skewness. The authors have fitted this model using, amongst other software, SPSS and SAS (GLIMMIX) although, contrary to the paper, the syntax for fitting this model is no longer available on Michael Smithson's website (it may still be available upon request from the authors). A simpler form of beta regression, which models just the mean of the response and treats the precision (dispersion) of the response as constant, may be implemented using the betareg procedure in R. Smithson and Verkuilen, however, claim such an approach is suboptimal.
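To make the link between the mean and the variance of a beta-distributed response concrete, here is a minimal Python sketch. It assumes the common mean-precision parameterisation, with mean mu and precision phi giving shape parameters mu*phi and (1-mu)*phi. The function names are hypothetical, and this is not the authors' own syntax.

```python
import math

def beta_shapes(mu, phi):
    """Convert mean mu in (0, 1) and precision phi > 0 to shapes (a, b)."""
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi

def beta_variance(mu, phi):
    """Variance of the beta distribution: mu * (1 - mu) / (1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)

def beta_logpdf(y, mu, phi):
    """Log density of a response y in (0, 1) given mean and precision."""
    a, b = beta_shapes(mu, phi)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

# With the precision held fixed, the variance still shrinks as the
# mean approaches 0 or 1.
for mu in (0.5, 0.8, 0.95):
    print(f"mu={mu}: variance={beta_variance(mu, phi=10.0):.4f}")
```

In a full beta regression the mean would typically be tied to predictors through a logit link and the precision through a log link. Maximising the sum of beta_logpdf over the observations gives the maximum-likelihood estimates.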
Heck RH, Thomas SL and Tabata LN (2012). Multilevel modeling of categorical outcomes using IBM SPSS. Routledge: New York.
Jaeger TF (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language 59(4) 434-446.
Lorch RF and Myers JL (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory and Cognition 16(1) 149-157.
Smithson M and Verkuilen J (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1) 54-71. A copy of this paper is available for free for CBUers using the PsychNet facility.