Diff for "FAQ/cp2" - CBU statistics Wiki
location: Diff for "FAQ/cp2"
Differences between revisions 15 and 16
Revision 15 as of 2011-01-06 11:59:09
Size: 3288
Editor: PeterWatson
Comment:
Revision 16 as of 2013-03-08 10:17:57
Size: 3290
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked like this. Additions are marked like this.
Line 80: Line 80:
You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [attachment:cpt2.pdf this plot]. You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [[attachment:cpt2.pdf|this plot]].

Fitting a regression model assuming two changepoints

For two changepoints, K1 and K2, the regression can be parameterised as:

Y = A + B1 (time-K1) I(time<=K1) + B2 (time-K1) I(K1 <= time <= K2) + B3 (time-K2) I(time >=K2)

where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards and I() is the indicator function.

In fact it follows the general parameterisation for a model with J-1 changepoints (separating J lines) is $$Y = A + B_text{1} (\mbox{time}-K_text{1}) I(\mbox{time} <= K_text{1}) + \sum_text{i}^text{J-1} [B_text{i} (\mbox{time}-K_text{i-1}) I(K_text{i-1} <= \mbox{time} <= K_text{i}) ] + B_text{J} (\mbox{time} - K_text{J-1}) I(\mbox{time} >= K_text{J-1}) $$

where A is the predicted average response at $$K_text{1}$$, $$A + B_text{i} (k_text{i}-k_text{i-1})$$ is the predicted response at $$K_text{i}$$ (i > 1) and $$B_text{i}$$ is the slope of the line leading up to $$K_text{i}$$.

As a first step we plot the example data (dat) using the R code below.

npts <- 11
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)
stripchart(grp ~ dat, method = "stack")

The data from the 11 time points appear to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. The code below fits this two changepoint model in R to the above data set, dat.

npts <- 11

grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)

rss2 <- matrix(NA,npts,npts)

for (k1 in 1:npts-1) {
 for (k2 in (k1+1):npts) {
gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out

rss2[k1,k2] <- sum(out$residuals^2)
}
}

coor <- which(rss2 == min(rss2,na.rm=TRUE), arr.ind = TRUE)

The best fitting model with two changepoints (K1 and K2) is one with changepoints at 5 and 9.

> print(k1)
[1] 5
> print(k2)
[1] 9

You can also output further diagnostics for the best fitting model. The R code below, for example, outputs the predicted values.

k1 <- coor[1]
k2 <- coor[2]

gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out$fitted.values

You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces this plot.

plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)

One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.

None: FAQ/cp2 (last edited 2013-03-08 10:17:57 by localhost)