Fitting Ascariasis data

This vignette demonstrates the AMISforInfectiousDiseases package through a case study considering Ascariasis in Ethiopia in a single time point.

Loading packages

library("AMISforInfectiousDiseases")

Prevalence map

Ascariasis is one of the most common helminth infections in people worldwide. This disease is caused by Ascaris lumbricoides, also known as roundworms, which are an intestinal nematode. Ascaris surveys for Ethiopia are available upon request from the Global Atlas of Helminth Infection developed by the London Applied & Spatial Epidemiology Research Group (LASER) (Rachel L et al. 2014). Retkute et al. (2021) used these surveys to obtain national level prevalence maps and made them available as a supplementary material. We will use these maps, which consist of \(M=100\) samples of a prevalence map for Ascaris in Ethiopia with 11,369 locations at 10km \(\times\) 10km resolution.

In this example, we will use a subset of 1,000 locations and convert the percentage prevalence values to their decimals:

ascariasis_url <- "https://projecteuclid.org/journals/supplementalcontent/10.1214/21-AOAS1486/aoas1486suppb.zip"
tempFile <- tempfile()
download.file(ascariasis_url, tempFile)
tempFolder <- tempdir()
unzip(tempFile, exdir=tempFolder)
unlink(tempFile)
load(paste0(tempFolder, "/AMISEpi-main/data/prev.rda"))
L <- 1000 # taking a subset of L locations
wh <- sample(1:nrow(prev), L)
ascaris_map <- prev[wh, ]/100

Model

Following Anderson and May (1991), we assume that the number of worms per host, \(X\), has negative binomial distribution NB(\(W\),\(k\)), where \(W\) is the mean number of worms and \(k\) the dispersion parameter describing the degree of clumping of parasites. The probability mass function is given by \[\begin{equation} \text{Pr}(X=x) = \frac{\Gamma(x+k)}{\Gamma(k)x!} \left(\frac{k}{k+W}\right)^k \left(\frac{W}{k+W}\right)^x, \quad x=0,1,2,\dots. \end{equation}\]

Therefore, the prevalence is given by the probability of observing worms:

\[\begin{align} P &= 1 - \text{Pr}\big(X=0 \ \vert \ W, k \big)\\ &= 1 - \left( 1 + \frac{W}{k} \right)^{-k}. \end{align}\]

By sampling \(W\) in log-scale, the disease model passed to function amis() can be written as follows:

ascaris_model <- function(seeds, params, n_tims=1){
  p <- matrix(NA, nrow=nrow(params), ncol=n_tims)
  for(t in 1:n_tims){
    W <- exp(params[,1])
    k <- params[,2]
    p[,t] <- 1-(1+W/k)^(-k)
  }
  return(p)
}

Priors

The priors from Retkute et al. (2021) for the model parameters are borrowed in this example. These priors reflect what is currently known about the interaction between the worm burden and clumping coefficient based on the literature. In summary, a uniform prior is placed on the log worm burden and the relationship between the clumping coefficient and worm burden is estimated through a linear regression. Formally this is \[\begin{align} \log(W) &\sim U\big(\log(0.01), \log(60)\big), \\ k | W &\sim N\left(\hat{\alpha}+ \hat{\beta} W, \sigma(W)^2\right), \\ \end{align}\] where \(\hat{\alpha}\) and \(\hat{\beta}\) are the intercept and slope coefficients respectively for the regression line of worm burden against the clumping coefficient, and \(\sigma(W)\) is five times the standard error from this analysis. Refer to Retkute et al. (2021) for further details on the prior selection.

The prior object passed to amis() is given by

fit.v <- c(0.33371009, 0.01719462)
fit.inflation.factor <- 5
fit.hess<-matrix(0, nrow = 2, ncol = 2)
fit.hess[[1,1]] <- 5138.97
fit.hess[[1,2]] <- 49499.4
fit.hess[[2,1]] <- 49499.4
fit.hess[[2,2]] <- 677831

rprior <- function(n) {
  params <- matrix(NA,n,2)
  colnames(params) <- c("logW","k")
  params[,1] <- runif(n,log(0.01),log(60))
  params[,2] <- sapply(1:n, function(i) {
    rnorm(1, mean = fit.v[1] + fit.v[2]*exp(params[i,1]), 
          sd = fit.inflation.factor*sqrt(c(1,exp(params[i,1]))%*%solve(fit.hess,c(1,exp(params[i,1])))))
  })
  return(params)
}

dprior <- function(x, log=FALSE) {
  if (log) {
    dlogW <- dunif(x[1],log(0.01),log(60),log=TRUE)
    dk <- dnorm(x[2], mean = fit.v[1] + fit.v[2]*exp(x[1]), 
               sd = fit.inflation.factor*sqrt(c(1,exp(x[1]))%*%solve(fit.hess,c(1,exp(x[1])))),log=TRUE) 
    return(sum(dlogW,dk))
  } else {
    dlogW <- dunif(x[1],log(0.01),log(60))
    dk <- dnorm(x[2], mean = fit.v[1] + fit.v[2]*exp(x[1]), 
               sd = fit.inflation.factor*sqrt(c(1,exp(x[1]))%*%solve(fit.hess,c(1,exp(x[1]))))) 
    return(prod(dlogW,dk))
  }
}
ascaris_prior <- list(rprior=rprior, dprior=dprior)

Running AMIS

The list of default control parameters can be seen in default_amis_params(). This example uses the Gaussian kernel in the nonparametric estimation of the probability density \(f\). To use the Gaussian kernel, one needs to define sigma in the object amis_params. In addition, we will will set model parameter boundaries to ensure that \(-\infty < \log(W) < \infty\) and \(k>0\).

amis_params <- default_amis_params()
amis_params$boundaries_param <- matrix(c(-Inf, Inf, 0, Inf), 2, 2, byrow=TRUE)
amis_params$sigma <- 0.0025

Note that instead of defining boundaries through the object amis_params, the user could alternatively set the prevalence to NA directly in the model when \(k\) is negative and thus out of bounds. The boundaries for prevalence values, which are in \([0,1]\), are taken into account by default (through amis_params$boundaries).

Now we have all the objects we need to run the AMIS algorithm:

set.seed(123)
out <- amis(ascaris_map, ascaris_model, ascaris_prior, amis_params)

Analysing AMIS outputs

Given the output returned by amis(), we can use the function print() to see the data dimensions, model choices, and control parameters used to run the AMIS algorithm. The function summary() provides some brief information about the number of samples and effective sample sizes achieved by the algorithm.

print(out)
#> Data dimensions:
#> - Number of time points:  1
#> - Number of locations:  1000
#> - Number of map samples in each location:  100
#> -------------------------------------------------------------
#> Model and algorithm specifications:
#> - For the nonparametric estimation of the density of the likelihood:
#>      Gaussian kernel was used with smoothing parameter sigma = 0.0025
#> - Lower and upper boundaries for prevalences:  0, 1
#> - Number of new samples drawn within each AMIS iteration:  500
#> - Maximum number of iterations:  12
#> - Target effective sample size:  500
summary(out)
#> Fitted model:
#> - Number of iterations:  6
#> - Total number of simulated samples:  3000
#> - Target effective sample size:  500
#> Number of locations whose ESS exceeded the target ESS:  1000
#> Number of locations whose ESS was lower the target ESS:  0

It is possible to look at the parameter samples proposed at a given iteration (by default, the last iteration) and the corresponding proposal density:

par(mfrow=c(1,2))
plot_mixture_components(out, what = "uncertainty", cex=3)
plot_mixture_components(out, what = "density", nlevels=200)
plot of chunk asc_plot_mixt
plot of chunk asc_plot_mixt

For two arbitrary locations, we show below the posterior distribution of weighted samples for the model parameters and corresponding simulated prevalences.

locs <- sample(1:L, 2)
par(mfrow=c(3,2))
plot(out, what = "logW", type="hist", locations=locs, breaks = 50)
plot(out, what = "k", type="hist", locations=locs, breaks = 50)
plot(out, what = "prev", type="hist", locations=locs, time=1, breaks = 50)
plot of chunk asc_plot_posteriors
plot of chunk asc_plot_posteriors

We can also produce plots of credible intervals for these statistics for all locations as follows:

par(mfrow=c(1,3))
plot(out, what=c("logW","k","prev"), type="CI", locations=1:L, 
     cex=0.1, lwd=0.1, measure_central="median", order_locations_by="prev")
plot of chunk asc_plot_credible_intervals
plot of chunk asc_plot_credible_intervals

To obtain the numerical values of summary statistics for specific locations, we can run

calculate_summaries(out, what="prev", time=1, locations=locs, alpha=0.05)
#> $mean
#>  iu345  iu364 
#> 0.1421 0.2362 
#> 
#> $median
#>  iu345  iu364 
#> 0.1288 0.2321 
#> 
#> $quantiles
#>         iu345  iu364
#>  2.5% 0.01171 0.0312
#> 97.5% 0.29781 0.4146
#> 
#> $exceedance_probability
#>   iu345   iu364 
#> 0.01447 0.07241

References

Anderson, Roy M, and Robert M May. 1991. Infectious Diseases of Humans: Dynamics and Control. Oxford university press.
Rachel L, Pullan, Smith Jennifer L, Rashmi Jasrasaria, and Simon J. Brooker. 2014. “Global Numbers of Infection and Disease Burden of Soil Transmitted Helminth Infections in 2010.” Parasites & Vectors 7 (37).
Retkute, Renata, Panayiota Touloupou, María-Gloria Basáñez, T. Déirdre Hollingsworth, and Simon E. F. Spencer. 2021. Integrating geostatistical maps and infectious disease transmission models using adaptive multiple importance sampling.” The Annals of Applied Statistics 15 (4): 1980–98. https://doi.org/10.1214/21-AOAS1486.