R code to implement the extreme studentised deviate (ESD) test

Code taken from here.

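This test is a more robust form of the sequential Grubbs' test, which tests for a single outlier at a time, deleting any outlier found and reapplying the test until no more outliers are found. The ESD test instead requires only an upper bound on the number of outliers (10 in the example below).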
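In symbols, the quantities computed by the R code on this page can be written as below. This is a sketch read off from that code rather than a formal statement of the test: n is the sample size, alpha the significance level, and R_i and lambda_i correspond to the vectors R and lam in the code.

```latex
% Test statistic at step i, computed on the sample that remains after
% the i-1 most extreme observations have been removed
% (\bar{y} and s are the mean and standard deviation of that reduced sample).
R_i = \frac{\max_j \lvert y_j - \bar{y} \rvert}{s}

% Critical value at step i, where t_{p,\nu} is the p-th quantile of the
% t distribution with \nu degrees of freedom.
\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}
                 {\sqrt{\left(n-i-1+t_{p,\,n-i-1}^{2}\right)(n-i+1)}},
\qquad p = 1 - \frac{\alpha}{2(n-i+1)}
```

The number of outliers is then taken to be the largest i for which R_i exceeds lambda_i.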

R commands and output:

## Input data.
y = c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)

## Generate normal probability plot.
qqnorm(y)

## Create function to compute the test statistic.
rval = function(y){
       ares = abs(y - mean(y))/sd(y)
       df = data.frame(y, ares)
       r = max(df$ares)
       list(r, df)}

## Define values and vectors.
n = length(y)
alpha = 0.05
lam = c(1:10)
R = c(1:10)

## Compute test statistic until r=10 values have been
## removed from the sample.
for (i in 1:10){

## Use the full sample on the first pass and the reduced sample afterwards.
if(i==1){ycur = y} else {ycur = newdf$y}
rt = rval(ycur)
R[i] = rt[[1]]
df = rt[[2]]
## Remove only the single most extreme observation, even if values are tied.
newdf = df[-which.max(df$ares),]

## Compute critical value (tq avoids masking the base function t()).
p = 1 - alpha/(2*(n-i+1))
tq = qt(p,(n-i-1))
lam[i] = tq*(n-i) / sqrt((n-i-1+tq^2)*(n-i+1))

}
## Print results.
newdf = data.frame(c(1:10),R,lam)
names(newdf)=c("No. Outliers","Test Stat.", "Critical Val.")
newdf
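The steps above could also be wrapped in a single function so that the test can be rerun with a different upper bound or significance level. The following is a minimal sketch only: esd_test, its arguments r and alpha, and the names in the returned list are hypothetical and are not part of base R or of the original code. It assumes r is smaller than n-1 so that the t quantile has positive degrees of freedom.

```r
## Hypothetical wrapper (not a base R function): runs the ESD steps for up to r outliers.
esd_test = function(y, r = 10, alpha = 0.05){
  n = length(y)
  R = numeric(r)
  lam = numeric(r)
  ycur = y
  for (i in 1:r){
    ## Test statistic for the current (reduced) sample.
    ares = abs(ycur - mean(ycur))/sd(ycur)
    R[i] = max(ares)
    ## Remove the single most extreme observation.
    ycur = ycur[-which.max(ares)]
    ## Critical value, using the same formula as the code above.
    p = 1 - alpha/(2*(n-i+1))
    tq = qt(p, n-i-1)
    lam[i] = tq*(n-i) / sqrt((n-i-1+tq^2)*(n-i+1))
  }
  ## Largest i with R[i] > lam[i]; 0 if no statistic exceeds its critical value.
  exceed = which(R > lam)
  nout = if (length(exceed) > 0) max(exceed) else 0
  list(nout = nout, table = data.frame(i = 1:r, R = R, lam = lam))
}

## Same input data as above.
y = c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)
res = esd_test(y)
res$nout
res$table
```

For the example data, res$table should reproduce the test statistics and critical values shown in the output on this page.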

For this example, the largest number of outliers for which the test statistic exceeds the critical value (at the 5% level) is three. We therefore conclude that there are three outliers in this data set, namely the observations with the three largest values of abs(y-mean(y)).

##>    No. Outliers Test Stat. Critical Val.
##> 1             1   3.118906      3.158794
##> 2             2   2.942973      3.151430
##> 3             3   3.179424      3.143890
##> 4             4   2.810181      3.136165
##> 5             5   2.815580      3.128247
##> 6             6   2.848172      3.120128
##> 7             7   2.279327      3.111796
##> 8             8   2.310366      3.103243
##> 9             9   2.101581      3.094456
##> 10           10   2.067178      3.085425
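The decision rule can be checked directly from the table above. The sketch below copies the two columns of the table into vectors and picks the largest row in which the test statistic exceeds the critical value; the names exceed and nout are introduced here for illustration only.

```r
## Values copied from the output table above.
R = c(3.118906, 2.942973, 3.179424, 2.810181, 2.815580,
      2.848172, 2.279327, 2.310366, 2.101581, 2.067178)
lam = c(3.158794, 3.151430, 3.143890, 3.136165, 3.128247,
        3.120128, 3.111796, 3.103243, 3.094456, 3.085425)

## Rows where the test statistic exceeds the critical value.
exceed = which(R > lam)

## Number of outliers: the largest such row, or 0 if there is none.
nout = if (length(exceed) > 0) max(exceed) else 0
nout
```

Note that the statistics in rows 1 and 2 do not exceed their critical values, yet three outliers are declared because row 3 does. A procedure that stopped at the first non-significant result would have found no outliers here.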

The three outliers are the observations with the three largest absolute deviations from the sample mean, which can be computed as below.

abs(y-mean(y))

You may consider replacing the outlying value(s) with the next highest/lowest (non-outlying) value.
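One way this replacement might be done is sketched below. The helper replace_outliers and its argument is_out (a logical vector flagging the outliers) are hypothetical names used for illustration. To keep the example short it is applied only to the seven largest observations of the input data, of which the last three (5.34, 5.42 and 6.01) were flagged by the test.

```r
## Hypothetical helper: replace flagged values with the nearest non-flagged value.
replace_outliers = function(y, is_out){
  keep = y[!is_out]
  ynew = y
  ## High outliers are pulled down to the largest non-outlying value.
  ynew[is_out & y > max(keep)] = max(keep)
  ## Low outliers are pulled up to the smallest non-outlying value.
  ynew[is_out & y < min(keep)] = min(keep)
  ynew}

## Seven largest observations of the example data; the last three are the outliers.
ytail = c(3.59, 3.68, 4.30, 4.64, 5.34, 5.42, 6.01)
ynew = replace_outliers(ytail, ytail > 4.64)
ynew
```

Here the three flagged values are each replaced by 4.64, the largest value that was not flagged.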

MRC CBU Wiki
