Using Custom Outcome Models in gfoRmula

By default, the package uses a pooled logistic regression model for survival outcomes, logistic regression model for binary end-of-follow-up outcomes, and a linear regression model for continuous end-of-follow-up outcomes. Starting from version 1.1.0, the package allows users to apply their own type of outcome models. This document describes how to specify such custom outcome models. This document assumes that readers have read the long-form package documentation of McGrath et al. (2020).

Specifying custom outcome models

To specify custom outcome models, users must provide functions that fit the outcome model and obtain estimates from the fitted model through the parameters and , respectively, in the function.

The function for fitting the outcome model must take the parameters and . Below, we illustrate a function for fitting an outcome model using a random forest. This code uses the package.

ymodel_fit_custom <- function(ymodel, obs_data){
  return(randomForest::randomForest(formula = ymodel, data = obs_data))
}

The function for obtaining estimates from the model must take the parameters (the fitted outcome model) and (a containing the simulated dataset at time \(t\)). This function must return the estimated probability of the outcome for survival and binary end-of-follow-up outcomes or the estimated mean of the outcome for continuous end-of-follow-up outcomes in . Continuing with the random forest example, the code below obtains the estimated outcome mean for a continuous end-of-follow-up outcome. This code leverages the function in the package.

ymodel_predict_custom <- function(fit, newdf){
  return(as.numeric(predict(object = fit, newdata = newdf)))
}

Example

We perform an analysis similar to that Example 3 in McGrath et al. (2020), except we use the custom outcome model from the previous section.

library('Hmisc')
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
id <- 'id'
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'continuous_eof'
covtypes <- c('categorical', 'normal', 'binary')
histories <- c(lagged)
histvars <- list(c('A', 'L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag1_L1 + L3 + t0 +
                                  rcspline.eval(lag1_L2, knots = c(-1, 0, 1)),
                                L2 ~ lag1_A + L1 + lag1_L1 + lag1_L2 + L3 + t0,
                                A ~ lag1_A + L1 + L2 + lag1_L1 + lag1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3
intervention1.A <- list(static, rep(0, 7))
intervention2.A <- list(static, rep(1, 7))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000

gform_cont_eof <- gformula(obs_data = continuous_eofdata,
                           id = id, time_name = time_name,
                           covnames = covnames, outcome_name = outcome_name,
                           outcome_type = outcome_type, covtypes = covtypes,
                           covparams = covparams, ymodel = ymodel,
                           ymodel_fit_custom = ymodel_fit_custom, 
                           ymodel_predict_custom = ymodel_predict_custom,
                           intervention1.A = intervention1.A,
                           intervention2.A = intervention2.A,
                           int_descript = int_descript,
                           histories = histories, histvars = histvars,
                           basecovs = c("L3"), nsimul = nsimul, seed = 1234)
gform_cont_eof
## PREDICTED RISK UNDER MULTIPLE INTERVENTIONS
## 
## Intervention      Description
## 0         Natural course
## 1         Never treat
## 2         Always treat
## 
## Sample size = 2500, Monte Carlo sample size = 10000
## Number of bootstrap samples = 0
## Reference intervention = natural course (0)
##  
## 
##      k Interv.   NP mean g-form mean Mean ratio Mean difference % Intervened On
##  <num>   <num>     <num>       <num>      <num>           <num>           <num>
##      6       0 -4.414543   -4.336936   1.000000       0.0000000             0.0
##      6       1        NA   -3.118344   0.719020       1.2185924           100.0
##      6       2        NA   -4.597293   1.060032      -0.2603564            74.2
##  Aver % Intervened On
##                 <num>
##               0.00000
##              81.78714
##              17.61286

References

McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, Young JG. gfoRmula: an R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020 Jun 12;1(3).