cox.pht {mgcv} | R Documentation |
The cox.ph
family only allows one set of covariate values per subject. If each subject has several time varying covariate measurements then it is still possible to fit a proportional hazards regression model, via an equivalent Poisson model. The recipe is provided by Whitehead (1980) and is equally valid in the smooth additive case. Its drawback is that the equivalent Poisson dataset can be quite large.
The trick is to generate an artificial Poisson observation for each subject in the risk set at each non-censored event time. The corresponding covariate values for each subject are whatever they are at the event time, while the Poisson response is zero for all subjects except those experiencing the event at that time (this corresponds to Peto's correction for ties). The linear predictor for the model must include an intercept for each event time (the cumulative sum of the exponential of these is the Breslow estimate of the baseline hazard).
Below is some example code employing this trick for the pbcseq
data from the survival
package. It uses bam
for fitting with the discrete=TRUE
option for efficiency: there is some approximation involved in doing this, and the exact equivalent to what is done in cox.ph
is rather obtained by using gam
with method="REML"
(taking some 14 times the computational time for the example below).
The function tdpois
in the example code uses crude piecewise constant interpolation for the covariates, in which the covariate value at an event time is taken to be whatever it was the previous time that it was measured. Obviously more sophisticated interpolation schemes might be preferable.
Whitehead (1980) Fitting Cox's regression model to survival data using GLIM. Applied Statistics 29(3):268-275
require(mgcv);require(survival) ## First define functions for producing Poisson model data frame app <- function(x,t,to) { ## wrapper to approx for calling from apply... y <- if (sum(!is.na(x))<1) rep(NA,length(to)) else approx(t,x,to,method="constant",rule=2)$y if (is.factor(x)) factor(levels(x)[y],levels=levels(x)) else y } ## app tdpois <- function(dat,event="z",et="futime",t="day",status="status1", id="id") { ## dat is data frame. id is patient id; et is event time; t is ## observation time; status is 1 for death 0 otherwise; ## event is name for Poisson response. if (event %in% names(dat)) warning("event name in use") require(utils) ## for progress bar te <- sort(unique(dat[[et]][dat[[status]]==1])) ## event times sid <- unique(dat[[id]]) prg <- txtProgressBar(min = 0, max = length(sid), initial = 0, char = "=",width = NA, title="Progress", style = 3) ## create dataframe for poisson model data dat[[event]] <- 0; start <- 1 dap <- dat[rep(1:length(sid),length(te)),] for (i in 1:length(sid)) { ## work through patients di <- dat[dat[[id]]==sid[i],] ## ith patient's data tr <- te[te <= di[[et]][1]] ## times required for this patient ## Now do the interpolation of covariates to event times... um <- data.frame(lapply(X=di,FUN=app,t=di[[t]],to=tr)) ## Mark the actual event... if (um[[et]][1]==max(tr)&&um[[status]]==1) um[[event]][nrow(um)] <- 1 um[[et]] <- tr ## reset time to relevant event times dap[start:(start-1+nrow(um)),] <- um ## copy to dap start <- start + nrow(um) setTxtProgressBar(prg, i) } close(prg) dap[1:(start-1),] } ## tdpois ## The following takes too long for CRAN checking ## (but typically takes a minute or less). ## Convert pbcseq to equivalent Poisson form... pbcseq$status1 <- as.numeric(pbcseq$status==2) ## death indicator pb <- tdpois(pbcseq) ## conversion pb$tf <- factor(pb$futime) ## add factor for event time ## Fit Poisson model... b <- bam(z ~ tf - 1 + sex + trt + s(sqrt(protime)) + s(platelet)+ s(age)+ s(bili)+s(albumin), family=poisson,data=pb,discrete=TRUE,nthreads=2) par(mfrow=c(2,3)) plot(b,scale=0) ## compute residuals... chaz <- tapply(fitted(b),pb$id,sum) ## cum haz by subject d <- tapply(pb$z,pb$id,sum) ## censoring indicator mrsd <- d - chaz ## Martingale drsd <- sign(mrsd)*sqrt(-2*(mrsd + d*log(chaz))) ## deviance ## plot survivor function and s.e. band for subject 25 te <- sort(unique(pb$futime)) ## event times di <- pbcseq[pbcseq$id==25,] ## data for subject 25 pd <- data.frame(lapply(X=di,FUN=app,t=di$day,to=te)) ## interpolate to te pd$tf <- factor(te) X <- predict(b,newdata=pd,type="lpmatrix") eta <- drop(X%*%coef(b)); H <- cumsum(exp(eta)) J <- apply(exp(eta)*X,2,cumsum) se <- diag(J%*%vcov(b)%*%t(J))^.5 plot(stepfun(te,c(1,exp(-H))),do.points=FALSE,ylim=c(0.7,1), ylab="S(t)",xlab="t (days)",main="",lwd=2) lines(stepfun(te,c(1,exp(-H+se))),do.points=FALSE) lines(stepfun(te,c(1,exp(-H-se))),do.points=FALSE) rug(pbcseq$day[pbcseq$id==25]) ## measurement times