Differences between revisions 15 and 16

Fitting a regression model assuming two changepoints

For two changepoints, K1 and K2, the regression can be parameterised as:

Y = A + B1 (time-K1) I(time<=K1) + B2 (time-K1) I(K1 <= time <= K2) + B3 (time-K2) I(time >=K2)

where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards and I() is the indicator function.

In fact it follows the general parameterisation for a model with J-1 changepoints (separating J lines) is $$Y = A + B_text{1} (\mbox{time}-K_text{1}) I(\mbox{time} <= K_text{1}) + \sum_text{i}^text{J-1} [B_text{i} (\mbox{time}-K_text{i-1}) I(K_text{i-1} <= \mbox{time} <= K_text{i}) ] + B_text{J} (\mbox{time} - K_text{J-1}) I(\mbox{time} >= K_text{J-1}) $$

where A is the predicted average response at $$K_text{1}$$, $$A + B_text{i} (k_text{i}-k_text{i-1})$$ is the predicted response at $$K_text{i}$$ (i > 1) and $$B_text{i}$$ is the slope of the line leading up to $$K_text{i}$$.

As a first step we plot the example data (dat) using the R code below.

npts <- 11
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)
stripchart(grp ~ dat, method = "stack")

The data from the 11 time points appear to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. The code below fits this two changepoint model in R to the above data set, dat.

npts <- 11

grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)

rss2 <- matrix(NA,npts,npts)

for (k1 in 1:npts-1) {
 for (k2 in (k1+1):npts) {
gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out

rss2[k1,k2] <- sum(out$residuals^2)
}
}

coor <- which(rss2 == min(rss2,na.rm=TRUE), arr.ind = TRUE)

The best fitting model with two changepoints (K1 and K2) is one with changepoints at 5 and 9.

> print(k1)
[1] 5
> print(k2)
[1] 9

You can also output further diagnostics for the best fitting model. The R code below, for example, outputs the predicted values.

k1 <- coor[1]
k2 <- coor[2]

gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out$fitted.values

You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces this plot.

plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)

One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.

-  ⇤ ← Revision 15 as of 2011-01-06 11:59:09 → 
  Size: 3288
  Editor: PeterWatson
  Comment:
+   ← Revision 16 as of 2013-03-08 10:17:57 → ⇥
  Size: 3290
  Editor: localhost
  Comment: converted to 1.6 markup
-Deletions are marked like this.
+Additions are marked like this.
 Line 80:
-You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [attachment:cpt2.pdf this plot].
+You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [[attachment:cpt2.pdf|this plot]].

MRC CBU Wiki

Quick Links

Search Wiki

Page Tools

Fitting a regression model assuming two changepoints