Recurrent events are common in longitudinal studies. They are often neglected in typical time-to-event analyses, which only take the first occurrence of an event into account. In this scenario, any event that occurs after the first event is disregarded. Sometimes, the first event makes a subsequent event more likely. The second event may be of the same type as the first (such as serial myocardial infarctions [MI]), or of a different type (such as heart failure [HF]) that is associated with the first. Multi-type recurrent events (MTREs) arise naturally in situations where more than one event type is of interest, and may possibly recur; a possible MTRE scenario would be an individual who experiences 2 back-to-back MIs, followed by HF, and subsequently dies. Death in this case would be a terminating event, after which no further events can occur.
Existing packages in R that handle recurrent events include the
survival package that uses the coxph()
function to fit Andersen-Gill (AG) models with robust variances. The
coxph() function can also be utilized to fit multiple types
of events, possibly recurrent, by fitting separate AG models for each
event type. Frailty models provide an alternative approach for modeling
recurrent events. The frailtypack package can fit joint
MTRE models with time-dependent covariates, including up to 2
terminating event types; the possible dependence structure between
multiple event types that are attributed to the unobserved random
effects is modeled with one or more parameters. Usually, the parameter
estimate of the dependence structure is interpreted as respective
pairwise correlation measures between 2 or more event types. The
msm package fits multi-state models, which are often used
to model disease states; a typical scenario is a chronic disease with
staged progression, illustrated by the 3-state illness-death model
(comprising health, illness, and death states, respectively).
Multi-state models can be adapted to model multiple types of events,
including recurrent events and a terminating event as an absorbing
state. If we wish to model MIs and HFs, both of which can recur, there
are many distinct states that can arise from the simple case of a
participant experiencing 2 MIs and 2 HFs. Such a participant could
experience MI → MI → HF → HF; MI → HF → MI → HF; etc. If we make the
reasonable assumption that the transition from the first MI to the
second MI (with no events in between) is different from the transition
between the first MI to the first HF, and so on, then MSM models can
rapidly grow in complexity given the many possible permutations of
successive states.
multiRec and common scenarios for its
useWe wanted to develop an MTRE model that readily handles the aforementioned scenario of related events, one or more of which can recur, and one or more of which can be terminating. We wanted the model to be flexible and intuitive, capable of handling a wide variety of common scenarios that include multiple events, where the goal is to explicitly model the risk conferred by one event to one or more subsequent events. This has practical uses. For example, it is well known that stroke increases the risk of HF, and vice versa; it is also known that atrial fibrillation (AF) increases the risk of both stroke and HF. Suppose we wish to characterize the bidirectional relationship between stroke and HF. We would want to estimate the risk that a stroke imparts to a future HF, and how much of that risk of HF that arises from the stroke is due to AF, and likewise estimate how much of the risk conferred by HF on future stroke is due to AF. This is of clinical importance to the physician who treats a patient with AF, as good management of their AF could prevent both future stroke and HF. Moreover, the risk of HF that arises from a prior stroke may differ between men and women, or between older vs younger age groups, or between individuals who have diabetes vs those who do not. There are of course many other clinical and demographic variables that may capture subgroups of individuals that experience a disease course differently, including response to therapeutic management. Diabetes itself is known to have a bidirectional relationship with HF; these complex relationships are readily captured by a model that explicitly models the risk of multiple event types that arise from prior events of one or more types, and how that risk is modified by one or more subject characteristics (such as demographics, one or more disease phenotypes, or clinical indices such as biomarkers). The most common diseases are often chronic, with a natural history of complications (including other disease processes). Diabetes is once again a typical example; it is increasing in both prevalence and incidence worldwide and rapidly becoming a public health crisis. Diabetes greatly increases the risk of cardiovascular disease (CVD), chronic kidney disease (CKD), and even certain types of cancer, as well as dementia. A comprehensive risk analysis guides optimal therapy and management of these individuals, and can be used to develop risk scores and refine existing ones for CVD, CKD, and other sequelae.
Another important application of our MTRE model is its ability to address informative censoring, a common challenge in longitudinal studies when participants drop out for reasons that may be related to study factors. By explicitly modeling study dropout as an additional event type, the MTRE framework can characterize the risk factors associated with dropout—such as participant characteristics or the occurrence of other event types that may be influenced by the study intervention (or lack thereof). Treating study dropout as an event also improves the estimation of parameters for other event types, as it mitigates the bias introduced when potentially informative dropout is instead treated as noninformative censoring.
In our previous work [1], we proposed a joint MTRE model and applied
the model to a large cardiovascular clinical trial with 3 types of
recurrent events: MI, stroke, and HF; and a terminating event, death. We
developed an R package called multiRec to fit our MTRE
models. The technical details of our model including the formulation,
construction of the full likelihood, parameter estimation, variance
correction via robust standard errors, model selection and diagnostics,
and relative efficiency under common misspecifications are described in
our main manuscript and its appendices (Web Appendix 1 and 2,
respectively). In the following sections, we describe how to use
multiRec to fit MTRE models for common clinical
scenarios.
multiRecTo illustrate our MTRE model, we include 2 datasets within the
multiRec package, one with two types of events
(multiRecCVD2) and one with four types of events
(multiRecCVD4). Both multiRecCVD2 and
multiRecCVD4 were adapted from the nafld1 and
nafld3 datasets in the survival package. The
nafld1 and nafld3 datasets contain
longitudinal information on participants with nonalcoholic fatty liver
disease (NAFLD), including multiple recurrent events. We processed and
restructured these data using the tmerge() function from
survival to construct interval-based datasets
(multiRecCVD2 and multiRecCVD4, respectively)
suitable for MTRE modeling.
## id tstart tstop age bmi male event
## 1 1 0.0000000 0.5364267 60 36.6 TRUE afib
## 2 1 0.5364267 1.2405306 61 36.4 TRUE hf
## 3 1 1.2405306 1.5913715 62 34.7 TRUE
## 4 1 1.5913715 2.2905664 62 38.4 TRUE
## 5 1 2.2905664 3.0129040 63 36.3 TRUE
## 6 1 3.0129040 3.5933029 63 34.4 TRUE
## id age male bmi tstart tstop event
## 1 4 56 1 37.83010 0.0000000 0.1670089
## 2 4 56 1 37.83010 0.1670089 3.2772074
## 3 5 68 1 NA 0.0000000 1.6892539 death
## 4 10 79 0 23.71853 0.0000000 0.3641342
## 5 10 79 0 23.71853 0.3641342 2.3436003 death
## 6 26 55 0 30.59225 0.0000000 2.4804928
Any dataset for fitting models in multiRec needs to have
the following structure. The id variable denotes the
participant, who can have one or more rows; each row represents a time
interval, bookended by the tstart and tstop
variables. The tstart variable represents the beginning of
the time interval, and the tstop variable represents the
end of the time interval at which an event may occur (only 1 event
allowed per time interval), or no event occurs (considered to be
noninformative censoring). Therefore, a minimum of 4 columns are
required for the dataset:
id, which links the rows to the participant;tstart, which marks the start of the time
interval;tstop, which marks the end of the time interval;
andevent, which denotes what type of event occurred (if
any). If an event type is terminating (e.g. death), the row that
contains it should be the last row for that participant, since no
further events may occur in that participant after that interval
ends.This type of multi-interval dataset is common in survival analysis
and can be constructed using the tmerge() function in the
survival package. The parent dataset must include event
types and corresponding event times for each participant. The
survival package contains instructions for how to use
tmerge(). Note that the event is assumed to occur at the
end of the interval. Moreover, the names id,
tstart, tstop and event are the
default ones and can be overridden by using the idVar=,
startTimeVar=, stopTimeVar=, and
eventVar= arguments in multiRec().
Beyond the 4 required columns, additional covariates that may
represent participant characteristics (e.g. demographic variables,
clinical indices, metabolic data, biomarkers, etc.) can be included as
columns. For each such column/variable, the value at each row represents
the value at the beginning of the time interval (denoted by
tstart). So for example for participant 1, who is a male,
in the multiRecCVD2 dataset above, the second row starts at
time 0.5364 (in years), and so the BMI of 36.4 represents his BMI at
time 0.5364. It is assumed that his BMI remains at 36.4 until the end of
that interval, which occurs at time 1.2405. The beginning of the very
next interval (in row 3) picks up right at that point (time 1.2405), and
his BMI is then assumed to be 34.7. His BMI remains at 34.7 until time
1.59137, and so on.
We will now fit several models to illustrate how to fit MTRE models
using multiRec(). We start with a simple model:
fit = multiRec(afib ~ age + male, # The model for the AF hazard
hf ~ age + male + bmi, # The model for the HF hazard
data = multiRecCVD2 # The dataset used
)In Model 1, we utilize the dataset multiRecCVD2 to
jointly model 2 event types: AF (denoted by variable name
afib) and HF (denoted by variable name hf),
with specific covariates for each event type. Of note,
multiRec() requires a model formula for the hazard of each
event type in the dataset. The event name must appear before the
~, spelled exactly as it appears in the dataset. The terms
to the right of the ~ specify the linear model for the
hazard, in the usual R notation. In the example above, the hazard for AF
depends on age and male and the hazard for HF
depends on age, male, and
bmi.
The idVar argument specifies the name of the variable
that contains the participant ID (unique to each participant, and which
links one or more rows of observations to that participant).
The multiRec() function accepts several additional
parameters, including
link=, which selects the link function by which we
model the hazards; the default is the log link which yields
multiplicative hazards.robust= which requests robust standard errors if set to
TRUE, and naive standard errors if set to
FALSE. In practice, robust=TRUE is
preferred.na.action=, which determines how to handle missing
values; the default is na.omit, which removes missing
values.method=, which specifies the optimization method
(e.g. BFGS, CG, SANN, or
Nelder-Mead) to be used; the default is BFGS.
Please consult the optim() function for details.SANN.init=, which sets the number of simulated
annealing iterations that will be used to initially refine the starting
values of the parameters to be estimated, after which the optimization
method specified in the method= argument will be called.
This may help in cases where multiRec() has trouble
converging.fitDetails=, which includes additional information in
the fit object when TRUE. This is required for model fit
diagnostics.To examine the Model 1 results, we simply type fit at
the R console to return the fit object:
## Link: log
##
## Call: afib ~ age + male
## component parameter estimate exp(estimate) se(estimate) z P(>|Z|)
## (Intrinsic) (Intercept) -6.5746 0.0014 1.1261 -5.8382 <0.0001
## (Intrinsic) age 0.0552 1.0568 0.0160 3.4514 0.0006
## (Intrinsic) maleTRUE 1.0015 2.7224 0.2208 4.5356 <0.0001
##
## Call: hf ~ age + male + bmi
## component parameter estimate exp(estimate) se(estimate) z P(>|Z|)
## (Intrinsic) (Intercept) -1.9163 0.1472 1.8396 -1.0417 0.2975
## (Intrinsic) age 0.0041 1.0041 0.0162 0.2515 0.8014
## (Intrinsic) maleTRUE 0.2425 1.2744 0.1832 1.3239 0.1855
## (Intrinsic) bmi -0.0139 0.9862 0.0424 -0.3284 0.7426
##
## AIC=1332.824, BIC=1371.218
Keep in mind that the default link is the log link, so all parameter estimates in this model are for the log of the hazard. For AF, we interpret the parameter estimates as follows. The intrinsic risk of AF is the absolute risk that arises from subject characteristics, in this case, age and sex, respectively. Thus, a female of age 65 years would have an absolute risk of AF calculated to be: \(\exp(-6.5746+0.0552\times 65+1.0015\times 0)=0.05\); therefore, she would have a 5% chance of experiencing AF in the next year. If the individual was male, and also 65 years of age, his absolute risk of AF would then be: \(\exp(-6.5746+0.0552\times 65+1.0015\times 1)=0.137\), and his risk would be \(\exp(1.0015)=2.72\) times the risk of his female counterpart, so a relative risk (RR) of 2.72 for males vs females in this cohort in experiencing AF, if all other covariates are held constant. In the same manner, the intrinsic risk of HF for a 65-year-old male with a BMI of 30 would be: \(\exp(-1.9163+0.0041\times 65+0.2425\times 1+30\times (-0.0139))=0.16\) in this cohort. Since BMI is not significantly associated with HF in this model, it would be reasonable to refit the model without BMI and estimate the parameters of the more parsimonious model.
In this example, AF and HF can both recur multiple times for the same person. More generally, the model can include terminating events such as death or study dropout, after which no further events are possible.
We now fit a more complex model using the dataset
multiRecCVD4, which contains 4 event types: stroke, HF, AF,
and death. Among these, stroke, HF, and AF may each recur within a
participant, whereas death — being a terminating event — can only occur
once. Since this dataset contains 4 event types in total, the first 4
arguments of the multiRec() function must consist of 4
formulas, one for each of these event types. Note that for the
multiRec() function, all event types specified in the
event column of the dataset must be included in the model
as events to be modeled jointly.
fit = multiRec(stroke ~ nPrior.hf(),
afib ~ age,
hf ~ nPrior.stroke(),
death ~ age + nPrior.stroke() + nPrior.afib() + nPrior.hf(),
data = multiRecCVD4,
robust = TRUE,
SANN.init = 50
)In the model, the nPrior.() functions indicate that the
hazard depends on the number of prior events of the type specified after
the period. For example, the model assumes that the risk of stroke at
any time depends on the number of HFs the participant experiences in the
past. As another example, the risk of death at any time depends on age
and the numbers of prior strokes, AFs and HFs.
To examine the Model 2A results, we again type fit at
the R console to return the fit object:
## Link: log
##
## Call: stroke ~ nPrior.hf()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -3.6202 0.0268 0.0644 -56.2432
## N of prior hf (Intercept) 0.4101 1.5070 0.1290 3.1801
## P(>|Z|)
## <0.0001
## 0.0015
##
## Call: afib ~ age
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -5.3791 0.0046 0.4338 -12.3997
## (Intrinsic) age 0.0223 1.0226 0.0061 3.6314
## P(>|Z|)
## <0.0001
## 0.0003
##
## Call: hf ~ nPrior.stroke()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -3.4289 0.0324 0.0604 -56.7455
## N of prior stroke (Intercept) 0.5323 1.7028 0.1018 5.2286
## P(>|Z|)
## <0.0001
## <0.0001
##
## Call: death ~ age + nPrior.stroke() + nPrior.afib() + nPrior.hf()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -4.7053 0.0090 0.3584 -13.1270
## (Intrinsic) age 0.0114 1.0115 0.0051 2.2625
## N of prior stroke (Intercept) 0.4300 1.5373 0.1038 4.1435
## N of prior afib (Intercept) 0.5857 1.7962 0.1138 5.1458
## N of prior hf (Intercept) 0.9181 2.5045 0.0957 9.5912
## P(>|Z|)
## <0.0001
## 0.0237
## <0.0001
## <0.0001
## <0.0001
##
## AIC=9746.077, BIC=9812.575
We begin our interpretation of this model with stroke. The intrinsic risk of stroke, which has no covariates, comes from the intercept term, which is exponentiated (since we are using the log link) to yield \(\exp(-3.6222)=0.0267\). Note that this result can also be obtained directly from the 1st row of the 4th column, as the 4th column exponentiates the parameter estimates from the 3rd column. Thus, the absolute risk for stroke in the next year for all participants who have not experienced HF is 0.0267. For each prior HF event, the absolute risk of future stroke increases multiplicatively by 1.5127 (p<0.001). Thus, the risk of stroke for a participant who had two prior HFs is \(\exp(-3.6222 + 1.5127\times 2)=0.55\).
On the other hand, the intrinsic risk of HF, which likewise has no covariates in this model, also arises from the intercept term to yield 0.0324. For each prior stroke event, the absolute risk of future HF increases multiplicatively by 1.7127 (p<0.0001). We can see from our model results thus far that prior stroke increases the risk of HF, and HF in turn increases the risk of future stroke.
The interpretation of the absolute risk of AF is straightforward and can be calculated in the same manner as described for Model 1. We now turn to death, which is the terminating event in this model. Death here has both an intrinsic covariate (age) and all 3 prior event types (stroke, AF, and HF). We can see that age significantly increases the risk of death as we expect (relative risk of death is 1.01 for every 1-year increase in age) (p<0.05). The risk of death increases the most from prior HF (relative risk of death is 2.50 for every HF event [p<0.0001]), followed by AF (relative risk of death is 1.80 for every AF event [p<0.0001]), and finally stroke (relative risk of death is 1.53 for every stroke event [p<0.0001]).
Let us now examine the potential role of AF in the association
between stroke and HF. We again utilize the dataset
multiRecCVD4 and fit the 4 event types (stroke, AF, HF, and
death); this time, we include prior AF event as a covariate of stroke
and HF, respectively. Of note, for more complex models, the optional
method.seed argument can be used to make the stochastic
optimization step (simulated annealing) reproducible.
fit = multiRec(stroke ~ nPrior.hf() + nPrior.afib(),
afib ~ nPrior.hf() + nPrior.stroke(),
hf ~ nPrior.stroke() + nPrior.afib(),
death ~ age + nPrior.stroke()
+ nPrior.afib()
+ nPrior.hf(~ male, alpha=NA),
data = multiRecCVD4,
robust = TRUE,
SANN.init = 50,
method.seed = 1)Displaying the Model 2B results:
## Link: log
##
## Call: stroke ~ nPrior.hf() + nPrior.afib()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -3.6510 0.0260 0.0663 -55.0990
## N of prior hf (Intercept) 0.3623 1.4366 0.1324 2.7352
## N of prior afib (Intercept) 0.3057 1.3576 0.1516 2.0164
## P(>|Z|)
## <0.0001
## 0.0062
## 0.0438
##
## Call: afib ~ nPrior.hf() + nPrior.stroke()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -4.0166 0.0180 0.0813 -49.4158
## N of prior hf (Intercept) 0.4710 1.6016 0.1415 3.3296
## N of prior stroke (Intercept) 0.4229 1.5264 0.1394 3.0338
## P(>|Z|)
## <0.0001
## 0.0009
## 0.0024
##
## Call: hf ~ nPrior.stroke() + nPrior.afib()
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -3.5408 0.0290 0.0648 -54.6180
## N of prior stroke (Intercept) 0.4131 1.5115 0.1065 3.8808
## N of prior afib (Intercept) 0.8144 2.2578 0.1138 7.1588
## P(>|Z|)
## <0.0001
## 0.0001
## <0.0001
##
## Call: death ~ age + nPrior.stroke() + nPrior.afib() + nPrior.hf(~male, alpha = NA)
## component parameter estimate exp(estimate) se(estimate) z
## (Intrinsic) (Intercept) -4.6714 0.0094 0.3606 -12.9532
## (Intrinsic) age 0.0113 1.0114 0.0051 2.2348
## N of prior stroke (Intercept) 0.4035 1.4971 0.1095 3.6845
## N of prior afib (Intercept) 0.5926 1.8087 0.1132 5.2360
## N of prior hf (Intercept) 0.6627 1.9400 0.1572 4.2168
## N of prior hf male 0.1658 1.1803 0.1231 1.3469
## hf (alpha) 1.4256 4.1604 0.3135 4.5470
## P(>|Z|)
## <0.0001
## 0.0254
## 0.0002
## <0.0001
## <0.0001
## 0.1780
## <0.0001
##
## AIC=9700.285, BIC=9797.009
We can see that AF indeed increases the risk of both stroke and HF, respectively; the relative risk of stroke arising from prior AF is 1.36 (p=0.04) and the relative risk of HF from prior AF is 2.26 (p<0.001). Notably, the risk that stroke and HF confer on each other is lower when AF is accounted for in the model: in Model 2B, the relative risk of stroke from HF is 1.44 when AF is accounted for (vs in Model 2A, where the relative risk of stroke from HF is 1.51); and the relative risk of HF from stroke is 1.51 when AF is accounted (vs in Model 2A, where the relative risk of HF from stroke is 1.71). This suggests that AF may be a potential mediator and partially explains the association between stroke and HF. This type of analysis is particularly useful when investigating potential mediation pathways between two or more disease states or occurrences.
We now briefly describe the alpha parameter that we
estimate for illustrative purposes in Model 2B for death as a
terminating event. Here, alpha is a power function that
describes whether the relative risk of death that arises from prior HF
stays the same, or increases, or decreases with each additional prior HF
event that has occurred. For example, if alpha is estimated
to be 1 (i.e. 1 lies within the 95% confidence interval), then we
interpret that to mean that the power function for HF is linear, so that
each HF increases the relative risk of death by, in this case, 1.94 (for
females; for males, the relative risk arising from HF is higher compared
to females, but the attendant p-value is not significant). So, 2 prior
HF events would increase the relative risk of death by \(1.94^2=3.76\). If the alpha is
estimated to be <1, then the relative risk of death from prior HFs
diminishes with every successive HF (as an example: the relative risk of
death from 2 prior HFs would be less than \(1.94^2=3.76\)). If the alpha
is estimated to be >1, then the relative risk of death from prior HFs
increases with each successive HF. The linear power function
(alpha=1) provides a reasonable estimate for relative risk
arising from prior events in many or most cases, in which case
alpha can be assumed to be 1 and does not need to be
estimated. Of note, when estimating the alpha parameter for
prior events on a future event, it is important to verify that there are
multiple prior events of that type in multiple participants so that
there is enough data for the effect of those prior events to be properly
estimated.
The AIC and BIC provide straightforward, relative comparisons of
model fit between any 2 or more models, and is particularly helpful when
comparing non-nested models (for example, models with different link
functions). For nested models, the likelihood ratio test can be used.
The AIC and BIC are automatically provided in every model fit and is
returned as part of the output for the fit object. In our preceding
examples using multiRecCVD4, the AIC and BIC were higher
for Model 2A than for Model 2B, which suggests that Model 2B is a better
fit compared to Model 2A. The resid() function returns
martingale residuals (the default) and if desired, score residuals can
be obtained using the score option. Residuals can be
summarized in various ways, including plots, to investigate outliers.
Goodness of fit for a selected model can be assessed by comparing the
observed number of events of each type to the expected number of events
of each type and calculating the chi-squared test statistic for observed
vs expected events. The technical details of a goodness of fit
calculation for an MTRE model are provided in our companion manuscript
and its supplemental material (Web Appendix 3).