FAQ/cp2 - CBU statistics Wiki
location: FAQ / cp2

Fitting a regression model assuming two changepoints

For two changepoints, K1 and K2, the regression can be parameterised as:

Y = A + B1 (time-K1) I(time<=K1) + B2 (time-K1) I(K1 <= time <= K2) + B3 (time-K2) I(time >=K2)

where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards and I() is the indicator function.

In fact it follows the general parameterisation for a model with J-1 changepoints (separating J lines) is $$Y = A + B_text{1} (\mbox{time}-K_text{1}) I(\mbox{time} <= K_text{1}) + \sum_text{i}^text{J-1} [B_text{i} (\mbox{time}-K_text{i-1}) I(K_text{i-1} <= \mbox{time} <= K_text{i}) ] + B_text{J} (\mbox{time} - K_text{J-1}) I(\mbox{time} >= K_text{J-1}) $$

where A is the predicted average response at $$K_text{1}$$, $$A + B_text{i} (k_text{i}-k_text{i-1})$$ is the predicted response at $$K_text{i}$$ (i > 1) and $$B_text{i}$$ is the slope of the line leading up to $$K_text{i}$$.

As a first step we plot the example data (dat) using the R code below.

npts <- 11
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)
stripchart(grp ~ dat, method = "stack")

The data from the 11 time points appear to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. The code below fits this two changepoint model in R to the above data set, dat.

npts <- 11

grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)

rss2 <- matrix(NA,npts,npts)

for (k1 in 1:npts-1) {
 for (k2 in (k1+1):npts) {
gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out

rss2[k1,k2] <- sum(out$residuals^2)
}
}

coor <- which(rss2 == min(rss2,na.rm=TRUE), arr.ind = TRUE)

The best fitting model with two changepoints (K1 and K2) is one with changepoints at 5 and 9.

> print(k1)
[1] 5
> print(k2)
[1] 9

You can also output further diagnostics for the best fitting model. The R code below, for example, outputs the predicted values.

k1 <- coor[1]
k2 <- coor[2]

gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out$fitted.values

You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces this plot.

plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)

One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.

None: FAQ/cp2 (last edited 2013-03-08 10:17:57 by localhost)