Diff for "FAQ/skew" - CBU statistics Wiki
Differences between revisions 1 and 10 (spanning 9 versions)
Revision 1 as of 2007-07-20 09:05:42
Size: 860
Editor: PeterWatson
Comment:
Revision 10 as of 2016-08-31 14:07:04
Size: 1861
Editor: PeterWatson
Comment:
Line 1: Line 1:
  = How should I deal with skew when doing correlations? =
Line 2: Line 4:
= How should I deal with skew when doing correlations? Skewness (where the data is bunched at one end e.g. ceiling or floor effects) and in particular outliers can give [[attachment:outlier.ppt|spurious Pearson correlations.]]
Line 4: Line 6:
Skewness (where the data is bunched at one end e.g. ceiling or floor effects) and in particular outliers can give [attachment:outlier.ppt spurious Pearson correlations.] To properly analyse the effects of skew one should look at the residuals from a regression using one of the variables as a predictor of the other. If the residuals are not normally distributed about zero the Pearson correlation could be unreliable. This can be checked by plotting - see regression talk at StatsCourse2006.
Line 6: Line 8:
To properly analyse the effects of skew one should look at the residuals from a regression using one of the variables as a predictor of the other. If the residuals are not normally distributed about zero the Pearson correlation could be unreliable. This can be checked by plotting - see [:Graduate talk on regression.] A suggested strategy is to transform one of the two variables, using either a power transform, or if the residuals are still non-normal after that, a rank transform (Spearman's rho or Kendall's tau-b) or compute Normal scores after separately ranking each pair of variables which are to be correlated (Bishara and Hittner, 2012).
Line 8: Line 10:
A suggested strategy is to transform one of the two variables, using either a power transform, or if the residuals are still non-normal that, a rank transform (Spearman's rho or Kendall's tau-b). de Winter, Gosling and Potter (2016) suggest using Pearson correlations for 'light-tailed' distributions and the Spearman correlation for heavier tailed distributions e.g. when outliers are present.
Line 10: Line 12:
 * [http://findarticles.com/p/articles/mi_m2405/is_n4_v122/ai_17848623/pg_7 Further discussion] __Further Discussion__

Bishara, A. J. and Hittner, J. B. (2012) Testing the Significance of a Correlation With Nonnormal Data: Comparison of Pearson, Spearman, Transformation, and Resampling Approaches.
''Psychological Methods'' '''17 (3)''' 399–417.

de Winter, J. C. F., Gosling, S. D. & Potter, J. (2016) Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: a tutorial using simulations and empirical data. ''Psychological Methods'' '''21(3)''' 273-290.

[[http://findarticles.com/p/articles/mi_m2405/is_n4_v122/ai_17848623/pg_7|Dunlap, W. P., Burke, M. J., & Greer, T. (1995). The effect of skew on the magnitude of product-moment correlations. Journal of General Psychology, 122, 365-377.]]

How should I deal with skew when doing correlations?

Skewness (where the data are bunched at one end, e.g. through ceiling or floor effects) and, in particular, outliers can give spurious Pearson correlations.
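As a rough illustration of the point above, the following sketch adds a single outlier to two unrelated variables. The sample size, the seed and the outlier position (10, 10) are arbitrary choices made for this example and do not come from the FAQ.

```python
import numpy as np
from scipy import stats

# Two unrelated variables: any correlation between them is chance.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = rng.normal(size=30)

# Add one extreme point (an arbitrary, illustrative outlier).
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)

r_clean, _ = stats.pearsonr(x, y)            # Pearson without the outlier
r_out, _ = stats.pearsonr(x_out, y_out)      # Pearson with the outlier
rho_out, _ = stats.spearmanr(x_out, y_out)   # Spearman with the outlier

print(f"Pearson, no outlier:    {r_clean:.2f}")
print(f"Pearson, with outlier:  {r_out:.2f}")
print(f"Spearman, with outlier: {rho_out:.2f}")
```

The single added point dominates the Pearson coefficient, whereas the rank-based Spearman coefficient changes much less because the outlier only contributes its rank.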

To properly analyse the effects of skew, one should look at the residuals from a regression that uses one of the variables as a predictor of the other. If the residuals are not normally distributed about zero, the Pearson correlation could be unreliable. This can be checked by plotting the residuals, for example as a histogram or a normal Q-Q plot; see the regression talk at StatsCourse2006.
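A minimal sketch of this residual check, assuming SciPy is available. The helper name `residual_check`, the simulated data and the use of the Shapiro-Wilk test as a numerical companion to the plot are choices made for this example, not part of the FAQ.

```python
import numpy as np
from scipy import stats

def residual_check(x, y):
    """Regress y on x and summarise how normal the residuals look."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    w_stat, p_value = stats.shapiro(residuals)  # Shapiro-Wilk normality test
    return {
        "residuals": residuals,
        "skewness": float(stats.skew(residuals)),
        "shapiro_w": float(w_stat),
        "shapiro_p": float(p_value),
    }

# Simulated data: same predictor, normal versus right-skewed errors.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y_normal = 0.5 * x + rng.normal(size=200)
y_skewed = 0.5 * x + rng.exponential(size=200)

check_normal = residual_check(x, y_normal)
check_skewed = residual_check(x, y_skewed)
print(check_normal["skewness"], check_normal["shapiro_p"])
print(check_skewed["skewness"], check_skewed["shapiro_p"])
```

For the plot itself, `scipy.stats.probplot(residuals, dist="norm", plot=ax)` draws a normal Q-Q plot on a Matplotlib axis `ax`; points that bend away from the straight line suggest non-normal residuals.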

A suggested strategy is to transform one of the two variables using a power transform. If the residuals are still non-normal after that, use a rank transform (Spearman's rho or Kendall's tau-b), or compute Normal scores after separately ranking each of the two variables to be correlated (Bishara and Hittner, 2012).
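The options in this strategy might be sketched as follows. Box-Cox is used here as one member of the power-transform family, and the Normal scores use the rankit formula (rank - 0.5) / n; both are assumptions made for illustration, as are the helper name `normal_scores` and the simulated data.

```python
import numpy as np
from scipy import stats

def normal_scores(values):
    """Rank the values, then map ranks to standard Normal quantiles."""
    values = np.asarray(values, dtype=float)
    ranks = stats.rankdata(values)  # ties receive average ranks
    return stats.norm.ppf((ranks - 0.5) / len(values))

# Simulated data: x is normal, y is positive and right-skewed.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = np.exp(0.8 * x + 0.6 * rng.normal(size=100))

# 1. Power transform of one variable (Box-Cox requires positive data).
y_bc, lam = stats.boxcox(y)
r_raw, _ = stats.pearsonr(x, y)
r_bc, _ = stats.pearsonr(x, y_bc)

# 2. Rank-based coefficients; kendalltau computes tau-b by default.
rho, _ = stats.spearmanr(x, y)
tau, _ = stats.kendalltau(x, y)

# 3. Pearson correlation of the Normal scores of each variable.
r_scores, _ = stats.pearsonr(normal_scores(x), normal_scores(y))

print(f"raw r={r_raw:.2f}, Box-Cox r={r_bc:.2f} (lambda={lam:.2f})")
print(f"Spearman rho={rho:.2f}, Kendall tau-b={tau:.2f}")
print(f"Normal-scores r={r_scores:.2f}")
```

Because the rank-based coefficients depend only on the ordering of the data, they are unchanged by any monotone increasing transform such as Box-Cox; only the Pearson coefficient is affected by the choice of transform.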

de Winter, Gosling and Potter (2016) suggest using the Pearson correlation for 'light-tailed' distributions and the Spearman correlation for heavier-tailed distributions, e.g. when outliers are present.
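A small simulation in the spirit of this recommendation is sketched below. It is not the authors' code: the function name `sampling_sd`, the sample size, the number of replicates and the use of exp(2z) to produce a heavy right tail are all assumptions made for this example.

```python
import numpy as np
from scipy import stats

def sampling_sd(transform, rho=0.5, n=50, reps=500, seed=3):
    """Return (SD of Pearson r, SD of Spearman rho) across simulated samples."""
    rng = np.random.default_rng(seed)
    pearson, spearman = [], []
    for _ in range(reps):
        # Correlated standard normal pair, then the same monotone
        # transform applied to both variables.
        z1 = rng.normal(size=n)
        z2 = rng.normal(size=n)
        x = transform(z1)
        y = transform(rho * z1 + np.sqrt(1 - rho**2) * z2)
        pearson.append(stats.pearsonr(x, y)[0])
        spearman.append(stats.spearmanr(x, y)[0])
    return float(np.std(pearson)), float(np.std(spearman))

light = sampling_sd(lambda z: z)                 # normal, light-tailed
heavy = sampling_sd(lambda z: np.exp(2.0 * z))   # lognormal, heavy right tail

print("light-tailed: SD Pearson=%.3f, SD Spearman=%.3f" % light)
print("heavy-tailed: SD Pearson=%.3f, SD Spearman=%.3f" % heavy)
```

With the same seed, the Spearman results are identical in both settings because the transform preserves ranks, while the Pearson coefficient becomes noticeably more variable from sample to sample once the tails are heavy.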

Further Discussion

Bishara, A. J. & Hittner, J. B. (2012) Testing the Significance of a Correlation With Nonnormal Data: Comparison of Pearson, Spearman, Transformation, and Resampling Approaches. Psychological Methods 17(3), 399–417.

de Winter, J. C. F., Gosling, S. D. & Potter, J. (2016) Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: a tutorial using simulations and empirical data. Psychological Methods 21(3), 273–290.

Dunlap, W. P., Burke, M. J. & Greer, T. (1995) The effect of skew on the magnitude of product-moment correlations. Journal of General Psychology 122, 365–377.

FAQ/skew (last edited 2017-04-26 09:33:20 by PeterWatson)