FAQ/Rcpt - CBU statistics Wiki

Upload page content

You can upload content for the page named below. If you change the page name, you can also upload content for another page. If the page name is empty, we derive the page name from the file name.

File to load page content from
Page name
Comment
Type the odd characters out in each group: abz2a 125t7 HhHaHh year.s 5433r21 worl3d

location: FAQ / Rcpt

Fitting 2 regression lines and finding the location of a single changepoint

The model for outcome, Y, consists of the average response (A) at changepoint time, K1, and two regression coefficients which correspond to the change (B1) in the response occurring in time points upto the changepoint time and following it (B2).

Y = A + B1 (time-K1)[I(time<=k1)] + B2 (time-K1) [I(time>=k1)]

where I() is the indicator function. npts in the code below are the number of time points (equal to 10 in this example).

You may wish, initially, to plot the data (dat) which an be done using the R code below.

npts <- 10
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,1,0)
 stripchart(grp ~ dat, method = "stack")

There are ten time points (grp) and a response (dat) which decreases sharply around time point five. The code below compares the residual sum of squares for each possible location of changepoint or knot and reports which changepoint position has the lowest residual sum of squares and hence the best fit.

npts <- 10

grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,1,0)

rss1 <- rep(NA,npts)

for (k1 in 1:npts) {
gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0)

out <- lm(dat ~ gp1 + gp2)
rss1[k1] <- sum(out$residuals^2)
}

print(rss1)
coor <- which(rss1 == min(rss1,na.rm=TRUE))

k1 <- coor[1]

The output shows the minimum residual sum of squares occurs at changepoint five suggesting (plausibly) the trend changes direction at time five.

> print(rss1)
 [1] 37.587879 26.072222 15.911538 11.560790  8.494118 15.123529 27.277812
 [8] 33.815385 36.472222 37.587879
> print(k1)
[1] 5

You can also obtain the predicted values using the code below:

coor <- which(rss1 == min(rss1,na.rm=TRUE))

k1 <- coor[1]

gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0)

out <- lm(dat ~ gp1 + gp2)
out$fitted.values
print(rss1[coor[1]])

You can then plot the line of best fit and the original data points using the code below which produces this plot.

plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)

There is analogous model you can fit for two changepoints which is presented here. One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.

A very similar group of models which uses max and min terms as regressors called M(ultivariate) A(daptive) R(egression) Splines (MARS) models may also be fitted in R.