The model for outcome, Y, consists of the average response (A) at changepoint time, K1, and two regression coefficients which correspond to the change (B1) in the response occurring in time points upto the changepoint time and following it (B2).
Y = A + B1 (time-K1)[I(time<=k1)] + B2 (time-K1) [I(time>=k1)]
where I() is the indicator function. npts in the code below are the number of time points (equal to 10 in this example).
You may wish, initially, to plot the data (dat) which an be done using the R code below.
npts <- 10 grp <- c(1:npts) dat <- c(10,10,11,10,11,9,4,2,1,0) stripchart(grp ~ dat, method = "stack")
There are ten time points (grp) and a response (dat) which decreases sharply around time point five. The code below compares the residual sum of squares for each possible location of changepoint or knot and reports which changepoint position has the lowest residual sum of squares and hence the best fit.
npts <- 10 grp <- c(1:npts) dat <- c(10,10,11,10,11,9,4,2,1,0) rss1 <- rep(NA,npts) for (k1 in 1:npts) { gp1 <- rep(0,npts) gp1 <- (grp[]-k1) * (grp[]-k1 <= 0) gp2 <- rep(0,npts) gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) out <- lm(dat ~ gp1 + gp2) rss1[k1] <- sum(out$residuals^2) } print(rss1) coor <- which(rss1 == min(rss1,na.rm=TRUE)) k1 <- coor[1]
The output shows the minimum residual sum of squares occurs at changepoint five suggesting (plausibly) the trend changes direction at time five.
> print(rss1) [1] 37.587879 26.072222 15.911538 11.560790 8.494118 15.123529 27.277812 [8] 33.815385 36.472222 37.587879 > print(k1) [1] 5
You can also obtain the predicted values using the code below:
coor <- which(rss1 == min(rss1,na.rm=TRUE)) k1 <- coor[1] gp1 <- rep(0,npts) gp1 <- (grp[]-k1) * (grp[]-k1 <= 0) gp2 <- rep(0,npts) gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) out <- lm(dat ~ gp1 + gp2) out$fitted.values print(rss1[coor[1]])
You can then plot the line of best fit and the original data points using the code below which produces this plot.
plot(grp, out$fitted.values,type="n") lines(grp,out$fitted.values,lty=1) points(grp,dat,pch=1)
There is analogous model you can fit for two changepoints which is presented here. One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.
A very similar group of models which uses max and min terms as regressors called M(ultivariate) A(daptive) R(egression) Splines (MARS) models may also be fitted in R.