# A note on correcting for restriction of range, which leads to underestimated Pearson correlations

A Pearson correlation computed when at least one of the two variables takes only a subset of its values will tend to be smaller than the correlation obtained using a larger range of values. For example, in the extreme case where we only used people with an IQ score of exactly 100 and correlated IQ with memory score, IQ would be a constant, so no association with memory could be detected (strictly, the correlation is undefined because IQ has zero variance): a zero-order correlation needs two quantities that both vary. Variables with an absolute skew greater than 2 have been suggested as likely to suffer from this problem. See also Kendall and Stuart (1958).
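The shrinkage can be illustrated with a small simulation. The sketch below is in Python using only the standard library; the population correlation of 0.5, the IQ distribution (mean 100, SD 15) and the 95 to 105 window are assumptions chosen for illustration, not values taken from any of the cited papers.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
TRUE_R = 0.5  # assumed population correlation (illustrative only)

iq, memory = [], []
for _ in range(20000):
    z = random.gauss(0, 1)
    iq.append(100 + 15 * z)  # IQ with mean 100 and SD 15
    # Standardised memory score correlated TRUE_R with IQ
    memory.append(TRUE_R * z + math.sqrt(1 - TRUE_R ** 2) * random.gauss(0, 1))

# Keep only the cases whose IQ lies in the narrow 95 to 105 band
pairs = [(x, y) for x, y in zip(iq, memory) if 95 <= x <= 105]
iq_restricted = [x for x, _ in pairs]
memory_restricted = [y for _, y in pairs]

r_full = pearson_r(iq, memory)
r_restricted = pearson_r(iq_restricted, memory_restricted)
print(f"full range:     r = {r_full:.2f}")
print(f"IQ 95-105 only: r = {r_restricted:.2f}")
```

The correlation in the restricted band should come out far smaller than the full-range correlation, even though the same linear relationship generated both sets of scores.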

Chan and Chan (2004), amongst others, present formulae which adjust (upwards) a correlation based on a subset of values in one of the variables to estimate the correlation that would have been obtained using a larger range of values, e.g. taking the correlation obtained using IQ scores between 95 and 105 and adjusting it to estimate the correlation that would have been obtained using IQ scores between 70 and 140. You do, though, need to know the variance of that variable over this larger range of values.

In particular, as suggested here (pdf is here), the (Pearson) correlation for the larger range, r(corrected), is obtained using VR and SDR, the variance and standard deviation of one of the variables (the predictor) in the restricted range, and V and SD, its variance and standard deviation when it takes the larger range of values:

$$r(\text{corrected}) = \frac{r \, (SD/SDR)}{\sqrt{(1 - r^{2}) + r^{2} (V/VR)}}$$

which is equivalent to formula (1) in the above, obtained by dividing its numerator and denominator by SDR.
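For readers who want to check a value by hand, the correction can be sketched in a few lines of Python. This is a minimal illustration rather than the R implementation: the function name `correct_for_range_restriction` and its argument names are hypothetical, and the denominator is taken to be the square root of 1 - r² + r²(SD/SDR)², the usual form of this correction, noting that V/VR = (SD/SDR)².

```python
import math


def correct_for_range_restriction(r, sd_full, sd_restricted):
    """Estimate the full-range correlation from a restricted-range one.

    r             -- correlation observed in the restricted range
    sd_full       -- SD of the predictor over the larger range (SD)
    sd_restricted -- SD of the predictor in the restricted range (SDR)
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if sd_full <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_full / sd_restricted  # SD/SDR, so u**2 equals V/VR
    return r * u / math.sqrt(1 - r ** 2 + r ** 2 * u ** 2)


# Hypothetical values: r = 0.3 observed where the predictor SD is 3,
# corrected to a population in which the predictor SD is 15.
r_corrected = correct_for_range_restriction(r=0.3, sd_full=15.0, sd_restricted=3.0)
print(round(r_corrected, 3))
```

With these made-up inputs the corrected value is about 0.84. When the two standard deviations are equal the function returns r unchanged, as it should.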

The correction also assumes that the relationship between the two variables is linear and that predictions are equally accurate (i.e. the residual variance is the same) in the smaller and larger ranges.
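As a rough check of what these assumptions provide, the Python sketch below simulates scores that are linear with constant residual variance by construction, restricts IQ to 95 to 105, and then applies the correction. The population correlation of 0.5, the sample size and all function names are illustrative assumptions; the correction is a hand-written version of the formula, not the R code.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correct_for_range_restriction(r, sd_full, sd_restricted):
    """Range-restriction correction with u = SD/SDR (hypothetical helper)."""
    u = sd_full / sd_restricted
    return r * u / math.sqrt(1 - r ** 2 + r ** 2 * u ** 2)


random.seed(2)
TRUE_R = 0.5  # assumed population correlation (illustrative only)

data = []
for _ in range(100000):
    z = random.gauss(0, 1)
    x = 100 + 15 * z  # predictor: IQ
    # Linear relationship with the same residual variance everywhere
    y = TRUE_R * z + math.sqrt(1 - TRUE_R ** 2) * random.gauss(0, 1)
    data.append((x, y))

x_all = [x for x, _ in data]
x_res = [x for x, _ in data if 95 <= x <= 105]
y_res = [y for x, y in data if 95 <= x <= 105]

r_res = pearson_r(x_res, y_res)
r_est = correct_for_range_restriction(
    r_res, statistics.stdev(x_all), statistics.stdev(x_res)
)
print(f"restricted r = {r_res:.2f}, corrected r = {r_est:.2f}, true r = {TRUE_R}")
```

Because the simulated data satisfy both assumptions, the corrected value should land close to the population correlation. If the relationship were curved, or the scatter differed between the two ranges, there would be no such guarantee.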

The above formula for adjustment is one of several that may be computed using the *rangeCorrection* function in R (see here). It is based upon Thorndike (1949).

This underestimation of a relationship due to limited variation in a variable is also sometimes called *attenuation*.

Relatively small amounts of measurement error can also distort correlations if the error is large relative to the variance of either of the two variables being correlated (Bland, 2005). This is more likely if one of the variables has a restricted range.
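A little arithmetic shows why the same measurement error matters more in a restricted range. The Python sketch below assumes the classical test theory model, in which reliability is the true-score variance divided by the true-score plus error variance, and an observed correlation is shrunk by the square root of the reliability when only one variable is measured with error. That model and the numbers used (an error SD of 3, true-score SDs of 15 and 3) are assumptions for illustration and are not taken from Bland (2005).

```python
import math


def reliability(true_sd, error_sd):
    """Share of observed variance that is true-score variance."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)


ERROR_SD = 3.0  # the same (hypothetical) measurement error in both samples

rel_full = reliability(true_sd=15.0, error_sd=ERROR_SD)       # wide range
rel_restricted = reliability(true_sd=3.0, error_sd=ERROR_SD)  # narrow range

# Factor by which a correlation is shrunk when one variable has this reliability
shrink_full = math.sqrt(rel_full)
shrink_restricted = math.sqrt(rel_restricted)

print(f"full range:       reliability = {rel_full:.2f}, shrinkage = {shrink_full:.2f}")
print(f"restricted range: reliability = {rel_restricted:.2f}, shrinkage = {shrink_restricted:.2f}")
```

Under these assumptions the error barely affects the full-range correlation (a factor of about 0.98) but shrinks the restricted-range correlation to about 0.71 of its error-free value.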

Formulae for adjusting for restriction of range are also discussed in Cohen and Cohen (1983) which is available in the CBSU library.

References

Baguley T (2012) Serious stats: A guide to advanced statistics for the behavioral sciences, pp. 215-7. Palgrave Macmillan: New York.

Bland M (2005) Measuring agreement between measurements. Talk presented at Centre for Statistics in Medicine, Oxford.

Chan W and Chan DW-L (2004) Bootstrap standard error and confidence intervals for the correlation corrected for range restriction: a simulation study. *Psychological Methods* **9(3)** 369-385.

Cohen J and Cohen P (1983) Applied multiple regression/correlation analysis for the behavioral sciences. Second edition. Lawrence Erlbaum: Hillsdale, New Jersey.

Cohen J, Cohen P, West SG and Aiken LS (2003) Applied multiple regression/correlation analysis for the behavioral sciences. Third edition. Lawrence Erlbaum: Mahwah, New Jersey.

Fiedler K (2011) Voodoo correlations are everywhere - not only in neuroscience. *Perspectives on Psychological Science* **6** 163-171. The abstract of this article may be found on-line here.

Kendall MG and Stuart A (1958) The advanced theory of statistics. Hafner: New York.

Thorndike RL (1949) Personnel selection: Test and measurement techniques. Wiley: New York.