FAQ/thirds - CBU statistics Wiki

Optimal cut-offs for splitting a variable into groups for correlating with an outcome variable

Gelman and Park (2009) suggest that it is optimal (in the sense of giving accurately estimated regression coefficients with small variances) to split a continuous variable into three segments: the lowest values (the lowest quarter or third) are coded as -0.5, the middle section (between the lower and upper quartile or third) as 0, and the highest values (the upper quarter or third) as 0.5 (see page 2 of the paper). With this coding, the regression coefficient for the coded variable represents the difference between the outcome means in the highest and lowest sections. Such a split is shown to perform efficiently if the predictor is uniformly or normally distributed.
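
The coding described above can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the function name code_thirds, the simulated data and the use of NumPy are all assumptions made for the example. It splits by rank so that the lowest and highest groups have the same size, in which case the fitted slope equals the difference between the two group means exactly.

```python
import numpy as np

def code_thirds(x, frac=1/3):
    """Code x as -0.5 (lowest frac), 0 (middle) and 0.5 (highest frac).

    Hypothetical helper: groups are formed by rank so that the lowest
    and highest groups contain the same number of observations.
    frac should be at most 0.5 (e.g. 1/3 or 1/4).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = int(round(frac * n))              # size of each extreme group
    order = np.argsort(x, kind="stable")  # indices from smallest to largest x
    z = np.zeros(n)
    z[order[:k]] = -0.5                   # lowest section
    z[order[n - k:]] = 0.5                # highest section
    return z

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=90)
y = 2.0 * x + rng.normal(size=90)

z = code_thirds(x)

# Simple regression of y on the coded variable (np.polyfit returns slope first)
slope, intercept = np.polyfit(z, y, 1)

# Difference between the outcome means in the highest and lowest sections
diff = y[z == 0.5].mean() - y[z == -0.5].mean()

print(slope, diff)  # the two values agree up to rounding error
```

Calling code_thirds(x, frac=0.25) would instead split at the lower and upper quartiles.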

Semi-partial correlations and regression coefficients can be obtained in the usual way when other predictors are included in the model. The regression coefficient for the coded variable then corresponds to the difference between the means of the highest and lowest sections, with the values of the other predictors held constant (see page 6 of the paper).
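
As a rough sketch of "the usual way", the example below fits a multiple regression containing the coded variable and one further predictor, and computes the semi-partial correlation of the outcome with the coded variable. The covariate w, the simulated data and the variable names are assumptions made for illustration only. The semi-partial correlation is taken here as the correlation between the outcome and the part of the coded variable that is not explained by the other predictor.

```python
import numpy as np

# Simulated data, purely for illustration
rng = np.random.default_rng(1)
n = 90
x = rng.normal(size=n)
w = 0.5 * x + rng.normal(size=n)           # hypothetical second predictor
y = 1.5 * x + 1.0 * w + rng.normal(size=n)

# Code x into thirds by rank: -0.5 (lowest), 0 (middle), 0.5 (highest)
k = n // 3
order = np.argsort(x, kind="stable")
z = np.zeros(n)
z[order[:k]] = -0.5
z[order[n - k:]] = 0.5

# Multiple regression of y on an intercept, the coded variable and w
X = np.column_stack([np.ones(n), z, w])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b_z = beta[1]                              # coefficient of the coded variable

# Remove from z the part explained by the intercept and w
W = np.column_stack([np.ones(n), w])
gamma, *_ = np.linalg.lstsq(W, z, rcond=None)
e_z = z - W @ gamma

# Semi-partial correlation of y with the coded variable, adjusting z for w
semi_partial = np.corrcoef(y, e_z)[0, 1]

# Check: regressing y on the residualised z reproduces the same coefficient
b_z_check = (e_z @ y) / (e_z @ e_z)

print(b_z, b_z_check, semi_partial)
```

The agreement between b_z and b_z_check is the standard result that a multiple regression coefficient can be obtained by first adjusting the predictor for the other predictors in the model.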

Reference

Gelman A and Park D (2009) Splitting a predictor at the upper quarter or third and the lower quarter or third. The American Statistician 63(1), 1-8.