FAQ/ESDtest - CBU statistics Wiki

R code to implement the extreme studentised deviate (ESD) test

Code taken from here.

This test is a more robust form of the sequential Grubbs' test, which tests for a single outlier at a time, deleting the outlier and reapplying the test until no more outliers are found. Here the ESD test is run for up to 10 possible outliers.
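For reference, the two quantities that the R code on this page computes at each step i = 1, ..., 10 can be written as follows. The notation is ours, chosen to match the variable names in the code (R, lam, p, alpha, n); the mean and standard deviation are taken over the observations still in the sample at step i.

```latex
R_i = \frac{\max_j \lvert y_j - \bar{y} \rvert}{s},
\qquad
\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}{\sqrt{\left(n-i-1+t_{p,\,n-i-1}^{2}\right)(n-i+1)}},
\qquad
p = 1 - \frac{\alpha}{2(n-i+1)}
```

Here t_{p, n-i-1} is the p-th quantile of the t distribution with n-i-1 degrees of freedom, which is what qt(p, n-i-1) returns.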

R commands and output:

## Input data.
y = c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)

## Generate normal probability plot.
qqnorm(y)

## Create function to compute the test statistic.
rval = function(y){
       ares = abs(y - mean(y))/sd(y)
       df = data.frame(y, ares)
       r = max(df$ares)
       list(r, df)}

## Define values and vectors.
n = length(y)
alpha = 0.05
lam = c(1:10)
R = c(1:10)

## Compute the test statistic until r = 10 values have been
## removed from the sample.
newy = y
for (i in 1:10){

## Test statistic for the current (reduced) sample.
rt = rval(newy)
R[i] = rt[[1]]
df = rt[[2]]

## Drop the observation(s) with the largest absolute studentised deviation.
newy = df$y[df$ares != max(df$ares)]

## Compute critical value.
p = 1 - alpha/(2*(n-i+1))
tval = qt(p, n-i-1)
lam[i] = tval*(n-i) / sqrt((n-i-1+tval^2)*(n-i+1))

}
## Print results.
newdf = data.frame(c(1:10),R,lam)
names(newdf)=c("No. Outliers","Test Stat.", "Critical Val.")
newdf

For this example, the largest number of outliers for which the test statistic exceeds the critical value (at the 5% level) is three, as the output below shows. We therefore conclude that there are three outliers in this data set, which correspond to the three largest values of abs(y-mean(y)).

##>    No. Outliers Test Stat. Critical Val.
##> 1             1   3.118906      3.158794
##> 2             2   2.942973      3.151430
##> 3             3   3.179424      3.143890
##> 4             4   2.810181      3.136165
##> 5             5   2.815580      3.128247
##> 6             6   2.848172      3.120128
##> 7             7   2.279327      3.111796
##> 8             8   2.310366      3.103243
##> 9             9   2.101581      3.094456
##> 10           10   2.067178      3.085425
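Note that the rule picks the largest step at which the test statistic exceeds the critical value, not the first step and not the count of such steps: here only step 3 exceeds, yet three outliers are declared. A minimal sketch of that rule, using the values printed in the table above (the names exceed and n_out are ours):

```r
## Test statistics and critical values copied from the printed table above.
R   <- c(3.118906, 2.942973, 3.179424, 2.810181, 2.815580,
         2.848172, 2.279327, 2.310366, 2.101581, 2.067178)
lam <- c(3.158794, 3.151430, 3.143890, 3.136165, 3.128247,
         3.120128, 3.111796, 3.103243, 3.094456, 3.085425)

## Steps at which the test statistic exceeds its critical value.
exceed <- which(R > lam)

## Number of outliers = largest such step (0 if there is none).
n_out <- if (length(exceed) > 0) max(exceed) else 0
n_out
## [1] 3
```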

The three outliers are the observations with the three largest absolute deviations from the mean of the variable, which can be computed as below.

abs(y-mean(y))
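The expression above gives the deviations but not which observations they belong to. As a minimal sketch, the helper below reruns the test while recording the position of the value removed at each step. The name gesd, its arguments and its return values are hypothetical and not part of the original code. It removes one value per step using which.max, so it would differ from the code above only if several values tied for the largest deviation.

```r
## Hypothetical helper: generalised ESD test that records which
## observation is removed at each step.
gesd <- function(x, max_out = 10, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)            # original positions still in the sample
  removed <- integer(max_out)  # position removed at each step
  R <- numeric(max_out)
  lam <- numeric(max_out)
  for (i in seq_len(max_out)) {
    cur <- x[idx]
    ares <- abs(cur - mean(cur)) / sd(cur)
    k <- which.max(ares)       # first maximum only
    R[i] <- ares[k]
    removed[i] <- idx[k]
    idx <- idx[-k]
    ## Critical value, same formula as in the code above.
    p <- 1 - alpha / (2 * (n - i + 1))
    tq <- qt(p, n - i - 1)
    lam[i] <- tq * (n - i) / sqrt((n - i - 1 + tq^2) * (n - i + 1))
  }
  exceed <- which(R > lam)
  n_out <- if (length(exceed) > 0) max(exceed) else 0
  list(n_out = n_out, outlier_index = removed[seq_len(n_out)],
       table = data.frame(step = seq_len(max_out), R = R, lambda = lam))
}

y <- c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)

res <- gesd(y)
res$outlier_index     # positions of the outliers in y, in order of removal
y[res$outlier_index]  # the outlying values themselves
```

For these data the positions returned are the last three elements of y, i.e. the values 6.01, 5.42 and 5.34.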

You may consider replacing the outlier value(s) with the next highest/lowest (non-outlier) value.
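One way to carry out this replacement is sketched below. The function name replace_outliers and its arguments are hypothetical. The sketch assumes the three outliers found above and picks them out by their absolute deviation from the mean. High outliers are set to the largest remaining value and low outliers to the smallest remaining value.

```r
## Hypothetical helper: replace outliers by the nearest non-outlier value.
replace_outliers <- function(x, out_idx) {
  if (length(out_idx) == 0) return(x)       # nothing to replace
  keep <- x[-out_idx]                       # non-outlying values
  hi <- out_idx[x[out_idx] > max(keep)]     # outliers above the rest
  lo <- out_idx[x[out_idx] < min(keep)]     # outliers below the rest
  x[hi] <- max(keep)
  x[lo] <- min(keep)
  x
}

y <- c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)

## Assumption: three outliers, taken as the three largest absolute deviations.
out_idx <- order(abs(y - mean(y)), decreasing = TRUE)[1:3]

y_new <- replace_outliers(y, out_idx)
tail(y_new, 5)
```

In this example the three outlying values are all at the upper end, so each is replaced by 4.64, the largest non-outlier.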