FAQ/skew - CBU statistics Wiki

How should I deal with skew when doing correlations?

Skewness (where the data are bunched at one end, e.g. because of ceiling or floor effects) and, in particular, outliers can produce spurious Pearson correlations.

To assess the effect of skew properly, regress one of the variables on the other and look at the residuals. If the residuals are not approximately normally distributed about zero, the Pearson correlation could be unreliable. This can be checked by plotting the residuals; see the regression talk at StatsCourse2006.
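As a rough numerical companion to the residual plot, the sketch below fits a least-squares line and computes the sample skewness of the residuals. The data are made up for illustration, the function names are hypothetical, and using skewness as the check is an assumption of this example, not something the talk prescribes; a value far from zero only suggests that the residuals deserve a closer look.

```python
from statistics import mean

def ols_residuals(x, y):
    """Residuals from the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Moment-based sample skewness; near 0 for symmetric data."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative (made-up) data: a near-linear trend with one high outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 9.1, 25.0]

res = ols_residuals(x, y)
print("mean residual:", round(mean(res), 6))  # zero by construction
print("residual skewness:", round(skewness(res), 2))
```

With these data the single outlier drags the residual skewness well above zero, which is the kind of pattern a residual plot would also reveal.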

A suggested strategy is to transform one of the two variables using a power transform. If the residuals are still non-normal after that, either use a rank-based correlation (Spearman's rho or Kendall's tau-b) or rank each of the two variables separately, convert the ranks to Normal scores and compute the Pearson correlation of the scores (Bishara and Hittner, 2012).
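Two of these rank-based options can be sketched with the Python standard library: Spearman's rho as the Pearson correlation of the ranks, and a Normal-scores correlation. The function names are hypothetical, the data are made up, and the rankit formula (rank - 0.5) / n is one common way of turning ranks into Normal scores, assumed here for illustration.

```python
from statistics import NormalDist, mean

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def normal_scores(values):
    """Rankit Normal scores: inverse normal CDF of (rank - 0.5) / n."""
    n = len(values)
    return [NormalDist().inv_cdf((r - 0.5) / n) for r in average_ranks(values)]

# Illustrative (made-up) data: y rises monotonically but very non-linearly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4, 1096.6, 2981.0]

spearman = pearson(average_ranks(x), average_ranks(y))
rin = pearson(normal_scores(x), normal_scores(y))
print("Pearson:", round(pearson(x, y), 3))
print("Spearman:", round(spearman, 3))
print("Normal scores:", round(rin, 3))
```

Because each variable is ranked on its own, both rank-based coefficients depend only on the ordering of the observations, so the strongly skewed y values no longer pull the estimate down.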

de Winter, Gosling and Potter (2016) suggest using the Pearson correlation for 'light-tailed' distributions and the Spearman correlation for heavier-tailed distributions, e.g. when outliers are present.
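A small made-up example shows why the choice matters when an outlier is present. Nine points with a weak negative relationship are joined by one extreme point; the function names are hypothetical, and the Spearman shortcut formula used here assumes there are no tied values.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simple_ranks(values):
    """Ranks 1..n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_untied(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)); no ties allowed."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative (made-up) data; the last pair is the outlier.
x = [2, 4, 1, 5, 3, 6, 2.5, 4.5, 3.5, 30]
y = [5, 1, 4, 2, 6, 3, 1.5, 5.5, 3.2, 40]

print("Pearson, all points:", round(pearson(x, y), 2))
print("Pearson, outlier left out:", round(pearson(x[:-1], y[:-1]), 2))
print("Spearman, all points:", round(spearman_untied(x, y), 2))
```

Leaving the outlier out here is only for comparison, not a recommendation to delete it: the single extreme point turns a weak negative Pearson correlation into a strong positive one, while the Spearman coefficient stays close to zero.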

Outliers should not be deleted unless there is a measurement problem (Langkjaer-Bain, 2017).

Further Discussion

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399-417.

de Winter, J. C. F., Gosling, S. D., & Potter, J. (2016). Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological Methods, 21(3), 273-290.

Dunlap, W. P., Burke, M. J., & Greer, T. (1995). The effect of skew on the magnitude of product-moment correlations. Journal of General Psychology, 122, 365-377.

Langkjaer-Bain, R. (2017). The murky tale of Flint's deceptive water data. Significance, 14(2), 17-21.