FAQ/RegressionOutliers - CBU statistics Wiki

Checking for outliers in regression

According to Hoaglin and Welsch (1978), observations with leverage values above 2(p+1)/N, where p is the number of predictors in a regression on N observations (items), are influential. If the sample size is below 30, a stiffer criterion such as 3(p+1)/N is suggested.
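As a minimal sketch of how this rule might be applied, the Python below computes the leverages (the diagonal of the hat matrix) with NumPy and compares them with the cut-off. The function names `leverage_values` and `leverage_cutoff`, the simulated data and the planted extreme row are all illustrative assumptions rather than part of the original FAQ.

```python
import numpy as np

def leverage_values(X):
    """Leverages: diagonal of the hat matrix for X plus an intercept column."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Z = np.column_stack([np.ones(len(X)), X])
    # With Z = QR, the hat matrix is Q Q', so each leverage is a row sum of squares of Q
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q**2, axis=1)

def leverage_cutoff(n, p):
    """2(p+1)/n, or the stiffer 3(p+1)/n when the sample size is below 30."""
    multiplier = 3 if n < 30 else 2
    return multiplier * (p + 1) / n

# Hypothetical data: 40 observations on 2 predictors, with one extreme row planted
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
X[0] = [6.0, -6.0]

h = leverage_values(X)
cutoff = leverage_cutoff(n=40, p=2)
flagged = np.flatnonzero(h > cutoff)  # indices of observations above the cut-off
print(cutoff, flagged)
```

The leverages always sum to p+1, so their mean is (p+1)/N and the 2(p+1)/N rule flags observations with more than twice the average leverage.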

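Leverage is also related to the i-th observation's Mahalanobis distance. Writing MD(i) for the squared Mahalanobis distance of observation i from the centroid of the predictors, for a sample of size N

Leverage for observation i = MD(i)/(N-1) + 1/N

so the 2(p+1)/N leverage criterion corresponds to

Critical MD(i) = (2(p+1)/N - 1/N)(N-1) = (2p+1)(N-1)/N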

(See Tabachnick and Fidell)
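The relation between leverage and Mahalanobis distance can be checked numerically. The sketch below assumes that MD(i) is the squared Mahalanobis distance computed from the sample covariance matrix of the predictors (divisor N-1) and that the regression includes an intercept. The helper names `squared_mahalanobis` and `critical_md` and the simulated data are hypothetical.

```python
import numpy as np

def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of a 2-D array X from the column means."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance, divisor N-1
    inv = np.linalg.inv(cov)
    # Quadratic form centred_i' inv centred_i for every row i
    return np.einsum("ij,jk,ik->i", centred, inv, centred)

def critical_md(n, p):
    """Critical MD(i) implied by the 2(p+1)/N leverage cut-off."""
    return (2 * (p + 1) / n - 1 / n) * (n - 1)

# Hypothetical data: 50 observations on 3 predictors
rng = np.random.default_rng(1)
N, p = 50, 3
X = rng.normal(size=(N, p))

md2 = squared_mahalanobis(X)
leverage_from_md = md2 / (N - 1) + 1 / N

# Leverages computed directly from the hat matrix, for comparison
Z = np.column_stack([np.ones(N), X])
hat_diag = np.diag(Z @ np.linalg.solve(Z.T @ Z, Z.T))

print(np.allclose(leverage_from_md, hat_diag))
print(critical_md(N, p))
```

Because the two quantities are linked by this identity, flagging MD(i) above the critical value picks out exactly the same observations as flagging leverage above 2(p+1)/N.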

Other outlier detection methods include boxplots, covered in the Exploratory Data Analysis Graduate talk, and z-score tests such as Grubbs' test, for which further details and an on-line calculator are available.

Hair, Anderson, Tatham and Black (1998) suggest that Cook's distances greater than 1 are influential. Hair et al. mention that some people also use 4/(N-k-1), for k predictors and N points, as a threshold for Cook's distance, which usually gives a lower threshold than 1 (e.g. with 1 predictor and 27 observations this gives 4/(27-1-1) = 0.16). A third threshold of 4/N is also mentioned (Bollen and Jackman, 1990), which would give a threshold of 4/27 = 0.15 in the above example.
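The three thresholds can be compared on a small example. The sketch below uses the usual textbook formula for Cook's distance in terms of residuals and leverages, which this page does not itself state, so treat it as an assumption. The names `cooks_distance` and `cooks_thresholds` and the simulated data with one planted gross error are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for an ordinary least squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)             # leverages
    s2 = resid @ resid / (n - k - 1)     # residual mean square
    # D_i = e_i^2 / ((k+1) s^2) * h_i / (1 - h_i)^2
    return resid**2 / ((k + 1) * s2) * h / (1 - h) ** 2

def cooks_thresholds(n, k):
    """The three cut-offs mentioned above, for n points and k predictors."""
    return {"fixed": 1.0, "4/(N-k-1)": 4 / (n - k - 1), "4/N": 4 / n}

# Hypothetical data: 27 points, 1 predictor, gross error planted at the last point
rng = np.random.default_rng(2)
x = np.arange(27, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=27)
y[26] += 30.0

d = cooks_distance(x, y)
thresholds = cooks_thresholds(n=27, k=1)
for name, value in thresholds.items():
    print(name, round(value, 2), np.flatnonzero(d > value))
```

The two sample-size-based thresholds are far below 1 here, so they will generally flag at least as many points as the fixed cut-off of 1.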

References

Bollen, K. A. and Jackman, R. W. (1990). Regression diagnostics: An expository treatment of outliers and influential cases. In J. Fox and J. S. Long (eds.), Modern Methods of Data Analysis (pp. 257-91). Newbury Park, CA: Sage.

Hair, J., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.

Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician 32, 17-22.

These pages are maintained by Ian Nimmo-Smith and Peter Watson