Diff for "FAQ/ESDtest" - CBU statistics Wiki
location: Diff for "FAQ/ESDtest"
Differences between revisions 1 and 2
Revision 1 as of 2015-01-23 10:11:55
Size: 2702
Editor: PeterWatson
Comment:
Revision 2 as of 2015-01-23 10:17:17
Size: 2708
Editor: PeterWatson
Comment:
Deletions are marked like this. Additions are marked like this.
Line 79: Line 79:
The three outliers are the observations with the three largest absolute deviations from the variable mean which we can compute as below. The three outliers are the observations which have the three largest absolute deviations from the variable mean which we can compute as below.

R code to implement the extreme studentised deviate (ESD) test

Code taken from here.

This test is a more robust form of the sequential Grubb's test which tests for a single outlier each time, either deleting or equating the outlier to the next nearest value and reapplying until no outliers remain.

R commands and output:

## Input data.
y = c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
       1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
       1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
       1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
       2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
       2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
       2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
       4.30, 4.64, 5.34, 5.42, 6.01)

## Generate normal probability plot.
qqnorm(y)

## Create function to compute the test statistic.
rval = function(y){
       ares = abs(y - mean(y))/sd(y)
       df = data.frame(y, ares)
       r = max(df$ares)
       list(r, df)}

## Define values and vectors.
n = length(y)
alpha = 0.05
lam = c(1:10)
R = c(1:10)

## Compute test statistic until r=10 values have been
## removed from the sample.
for (i in 1:10){

if(i==1){
rt = rval(y)
R[i] = unlist(rt[1])
df = data.frame(rt[2])
newdf = df[df$ares!=max(df$ares),]}

else if(i!=1){
rt = rval(newdf$y)
R[i] = unlist(rt[1])
df = data.frame(rt[2])
newdf = df[df$ares!=max(df$ares),]}

## Compute critical value.
p = 1 - alpha/(2*(n-i+1))
t = qt(p,(n-i-1))
lam[i] = t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1))

}
## Print results.
newdf = data.frame(c(1:10),R,lam)
names(newdf)=c("No. Outliers","Test Stat.", "Critical Val.")
newdf

For this example, the largest number of outliers for which the test statistic is greater than the critical value (at the 5 % level) is three. We therefore conclude that there are three outliers in this data set which correspond to the three largest values of abs(y-mean(y)).

##>    No. Outliers Test Stat. Critical Val.
##> 1             1   3.118906      3.158794
##> 2             2   2.942973      3.151430
##> 3             3   3.179424      3.143890
##> 4             4   2.810181      3.136165
##> 5             5   2.815580      3.128247
##> 6             6   2.848172      3.120128
##> 7             7   2.279327      3.111796
##> 8             8   2.310366      3.103243
##> 9             9   2.101581      3.094456
##> 10           10   2.067178      3.085425

The three outliers are the observations which have the three largest absolute deviations from the variable mean which we can compute as below.

abs(y-mean(y))

None: FAQ/ESDtest (last edited 2017-06-14 08:50:25 by PeterWatson)