FAQ/Winsor - CBU statistics Wiki

Revision 27 as of 2013-04-25 09:32:45


Methods of handling outliers

The purpose of this note is to describe several approaches to reducing the effect of outliers in a particular variable in a data set, most of which do not involve removing the outlying data points.

Winsorising a variable replaces a fixed percentage of its highest and lowest values by the corresponding percentiles.

e.g. Winsorising the values 1 4 7 9 10 25 at the quartiles replaces 1 by 2.5 and 25 by 17.5, the 25th and 75th percentiles respectively (as computed here by averaging adjacent ordered values; other percentile conventions give slightly different replacements).
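Winsorising can be sketched in Python with numpy. Note that numpy's default percentile convention (linear interpolation) differs from the hand calculation above, so the replacement values come out slightly differently:

```python
import numpy as np

x = np.array([1, 4, 7, 9, 10, 25], dtype=float)

# 25th and 75th percentiles under numpy's default linear-interpolation
# convention (other conventions give different values).
lo, hi = np.percentile(x, [25, 75])

# Winsorise: values below the lower percentile are raised to it,
# values above the upper percentile are lowered to it.
x_wins = np.clip(x, lo, hi)
print(x_wins)
```

Here the extremes 1 and 25 are pulled in to the quartile values while the middle of the distribution is left untouched.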

Trimming a variable removes a fixed percentage of its lowest and highest values. Consequently a 20% trimmed mean is the mean of a variable after its highest 20% and lowest 20% of values have been removed.
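A trimmed mean can be spelled out directly in Python with numpy (scipy.stats.trim_mean computes the same quantity):

```python
import numpy as np

def trimmed_mean(x, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values."""
    x = np.sort(np.asarray(x, dtype=float))
    cut = int(proportion * len(x))      # number of values cut from each end
    return x[cut:len(x) - cut].mean()

# Drops 1 and 25, leaving the mean of 4, 7, 9, 10.
print(trimmed_mean([1, 4, 7, 9, 10, 25]))
```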

Some authors recommend computing estimates of variability for trimmed statistics by taking bootstrap samples (e.g. Wilcox et al., 2000, use trimming to downweight the effect of outliers in repeated measures ANOVA).
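The bootstrap idea can be sketched as follows: resample the data with replacement many times, recompute the trimmed mean each time, and summarise the spread of the resampled estimates. This is a minimal illustration, not the specific procedure of Wilcox et al.:

```python
import numpy as np

def trimmed_mean(x, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values."""
    x = np.sort(np.asarray(x, dtype=float))
    cut = int(proportion * len(x))
    return x[cut:len(x) - cut].mean()

rng = np.random.default_rng(0)
data = np.array([1, 4, 7, 9, 10, 25], dtype=float)

# Resample with replacement and recompute the trimmed mean each time.
boots = np.array([trimmed_mean(rng.choice(data, size=len(data), replace=True))
                  for _ in range(2000)])

se = boots.std(ddof=1)                  # bootstrap standard error
ci = np.percentile(boots, [2.5, 97.5])  # percentile confidence interval
print(se, ci)
```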

There are also weighting techniques, collectively known as M-estimation, which give different emphasis (weights) to each observation, with more outlying observations being given smaller weights.
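The idea can be sketched with a Huber-type M-estimate of location, computed by iteratively reweighted averaging. The tuning constant k = 1.345 and the MAD-based scale estimate are common conventions, assumed here for illustration:

```python
import numpy as np

def huber_location(x, k=1.345, n_iter=50):
    """Huber M-estimate of location via iteratively reweighted means."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - mu))  # robust (MAD) scale estimate
    for _ in range(n_iter):
        r = np.abs(x - mu)
        # Observations within k*scale of the current estimate get full
        # weight 1; more outlying observations get smaller weights.
        w = np.minimum(1.0, k * scale / np.maximum(r, 1e-12))
        mu = np.sum(w * x) / np.sum(w)
    return mu

data = [1, 4, 7, 9, 10, 25]
print(huber_location(data))  # pulled less toward 25 than the plain mean
```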

Transforming data using power transforms, such as the logarithm or square root, can also help downweight outliers. The Box-Cox transformation can be used to determine an optimal power transform. SPSS code is available.
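As an illustration (a Python sketch rather than the SPSS code referred to above), scipy's boxcox function estimates the optimal power parameter lambda by maximum likelihood; it requires strictly positive data:

```python
import numpy as np
from scipy import stats

data = np.array([1, 4, 7, 9, 10, 25], dtype=float)  # must be > 0 for Box-Cox

# boxcox returns the transformed data together with the
# maximum-likelihood estimate of the power parameter lambda.
transformed, lam = stats.boxcox(data)
print(lam)
```

A lambda near 0 corresponds to a log transform and a lambda near 0.5 to a square root, linking the fitted value back to the familiar power transforms mentioned above.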

Sometimes outliers may simply be removed, e.g. reaction times of less than 100 ms, which are implausibly fast. These may be filtered out of the data file in SPSS.
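Such filtering can be sketched in Python with a boolean mask (the reaction times below are made-up values, and the 100 ms cutoff is the example threshold from above):

```python
import numpy as np

rt = np.array([45, 230, 310, 95, 520, 280])  # reaction times in ms (made up)

# Keep only reaction times of at least 100 ms.
rt_filtered = rt[rt >= 100]
print(rt_filtered)
```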

References

A summary of the above robust estimates for dealing with outliers is in: Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. and Tukey, J.W. (1972) Robust Estimates of Location: Survey and Advances. Princeton: Princeton University Press.

Details of bootstrapping are in: Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap. London: Chapman & Hall. Bootstrapping SPSS macros for regressions are here.

Keselman, H.J., Algina, J., Lix, L.M., Wilcox, R.R. and Deering, K.N. (2008) A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13(2), 110-129.

Keselman, H.J., Wilcox, R.R., Lix, L.M., Algina, J. and Fradette, K. (2010) Adaptive robust estimation and testing. British Journal of Mathematical and Statistical Psychology, 60(2), 267-293.

Wilcox, R.R., Keselman, H.J., Muska, J. and Cribbie, R. (2000) Repeated measures ANOVA: some results on comparing trimmed means and means. British Journal of Mathematical and Statistical Psychology, 53, 69-82.