Key Components of the Package

  • CCVA Misclassification Matrices: Stored as CCVA_missmat, this is the inventory of uncertainty-quantified misclassification matrices for three computer-coded verbal autopsy (CCVA) algorithms: the expert algorithm (EAVA), InSilicoVA, and InterVA. The matrices are derived by applying the modeling framework of Pramanik et al. (2025) to data collected by the Child Health and Mortality Prevention Surveillance (CHAMPS) project (see Pramanik et al. (2026) for details). Due to file size limits, the posterior samples are not included in the package. They are hosted on the GitHub repository, with the .rda file available under the repository's release. Please refer to this package and the GitHub repository for future updates.

  • Example VA-only Data from COMSA-Mozambique: Stored as comsamoz_CCVAoutput, this object contains the outputs of CCVA analyses of publicly available verbal autopsy (VA) survey data from COMSA-Mozambique for children under age five. It includes outputs from EAVA (using the EAVA package) and from InSilicoVA and InterVA (using the openVA package). The analyses cover two age groups: neonates (0-27 days) and children (1-59 months).

  • vacalibration(): This is the main function for performing calibration. For EAVA, InSilicoVA, and InterVA, it directly takes outputs from EAVA and openVA and produces calibrated estimates of cause-specific mortality fractions (CSMFs). More generally, it calibrates population-level prevalence estimates derived from single-class predictions of discrete classifiers. In that case, users need to provide fixed or uncertainty-quantified misclassification matrices.

  • plot_vacalib(): Produces a figure that shows the misclassification matrix used for calibration and compares the uncalibrated and calibrated CSMF estimates.
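To make the role of the misclassification matrix concrete, here is a minimal base-R sketch of the relationship that calibration inverts. It assumes that rows of the matrix index the true (CHAMPS) cause and columns the CCVA-assigned cause. Under that convention, M[i, j] is the probability that a death truly due to cause i is assigned cause j, and each row sums to 1. The numbers are toy values, not entries of CCVA_missmat. The naive linear solve at the end is only an illustration. It is not the Bayesian method implemented in vacalibration().

```r
# Toy 3-cause example (hypothetical numbers, not taken from CCVA_missmat)
causes <- c("pneumonia", "sepsis_meningitis_inf", "other")

# Assumed convention: M[i, j] = P(CCVA cause j | true cause i); rows sum to 1
M <- matrix(c(0.70, 0.20, 0.10,
              0.15, 0.75, 0.10,
              0.10, 0.10, 0.80),
            nrow = 3, byrow = TRUE,
            dimnames = list(true = causes, ccva = causes))

# Hypothetical true CSMF
p_true <- c(pneumonia = 0.3, sepsis_meningitis_inf = 0.5, other = 0.2)

# Forward model: the CSMF we would expect the CCVA algorithm to report
q_uncalib <- drop(t(M) %*% p_true)

# Naive calibration: solve t(M) %*% p = q for p.
# This ignores sampling noise and uncertainty in M.
p_naive <- solve(t(M), q_uncalib)

round(q_uncalib, 3)
round(p_naive, 3)
```

With an estimated matrix and a noisy uncalibrated CSMF, this direct inversion can return negative fractions. It also treats M as known, and a model-based calibration is meant to avoid both of these problems.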

Getting Started

We start by installing and loading the vacalibration package in R.

Install from CRAN:

install.packages("vacalibration")
library(vacalibration) # load

Install from GitHub:

install.packages("devtools")  # install "devtools" if not already installed
devtools::install_github("sandy-pramanik/vacalibration")
library(vacalibration) # load

Example: COMSA-Mozambique Data

For illustration, we use the VA-only data included in this package. Stored as comsamoz_CCVAoutput, it contains the outputs of EAVA, InSilicoVA, and InterVA applied to the publicly available VA-only data for children under age 5 in COMSA-Mozambique. It can be loaded with:

data("comsamoz_CCVAoutput")

comsamoz_CCVAoutput$neonate$eava  # output from EAVA for neonates
comsamoz_CCVAoutput$neonate$insilicova  # output from InSilicoVA for neonates
comsamoz_CCVAoutput$neonate  # list of outputs for neonates from EAVA, InSilicoVA, and InterVA

Outputs for children can be similarly accessed as comsamoz_CCVAoutput$child.

CCVA Misclassification Matrices

This is the inventory of uncertainty-quantified misclassification matrices for the CCVA algorithms EAVA, InSilicoVA, and InterVA. When these algorithms are applied, the matrices enable VA-Calibration to obtain calibrated CSMF estimates. The matrices are estimated using the misclassification-matrix modeling framework of Pramanik et al. (2025) and paired CHAMPS–VA cause-of-death data from the Child Health and Mortality Prevention Surveillance (CHAMPS) project (see Pramanik et al. (2026) for details). The inventory can be loaded with:

data("CCVA_missmat")

For EAVA among neonates in Mozambique, the average misclassification matrix, its uncertainty-quantified Dirichlet approximation (used as a prior), and distributional summaries can be accessed as follows:

CCVA_missmat$neonate$eava$postmean$Mozambique  # average
CCVA_missmat$neonate$eava$asDirich$Mozambique  # Dirichlet approximation
CCVA_missmat$neonate$eava$postsumm$Mozambique  # summary of distribution
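The Dirichlet approximation can be read row by row, with each row of the misclassification matrix treated as a Dirichlet random vector. For illustration only, the sketch below assumes a matrix of Dirichlet concentration parameters with one row per true cause. The exact layout of the asDirich component may differ, so check the package documentation. The sketch draws one matrix by normalizing independent Gamma variables, which is the standard way to sample a Dirichlet in base R. It then computes the row-wise Dirichlet mean, alpha / sum(alpha). The helpers rdirichlet_row() and draw_missmat() are hypothetical.

```r
# Hypothetical Dirichlet parameters (toy values): one row per true cause,
# one column per CCVA-assigned cause
causes <- c("pneumonia", "sepsis_meningitis_inf", "other")
alpha <- matrix(c(14,  4,  2,
                   3, 15,  2,
                   2,  2, 16),
                nrow = 3, byrow = TRUE, dimnames = list(causes, causes))

# Draw one Dirichlet(a) vector by normalizing independent Gamma(a, 1) draws
rdirichlet_row <- function(a) {
  g <- rgamma(length(a), shape = a, rate = 1)
  g / sum(g)
}

# Draw one misclassification matrix; rows are sampled independently
draw_missmat <- function(alpha) {
  M <- t(apply(alpha, 1, rdirichlet_row))
  dimnames(M) <- dimnames(alpha)
  M
}

set.seed(1)
M_draw <- draw_missmat(alpha)

# The mean of a Dirichlet(a) vector is a / sum(a), applied here row by row
M_mean <- alpha / rowSums(alpha)

round(M_draw, 3)
M_mean
```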

Matrices for other algorithms, countries, and age groups can be accessed in the same way. Currently, CCVA_missmat provides misclassification matrices for three CCVA algorithms (EAVA, InSilicoVA, and InterVA) and two age groups (neonates aged 0-27 days and children aged 1-59 months). Country-specific estimates are available for Bangladesh, Ethiopia, Kenya, Mali, Mozambique, Sierra Leone, and South Africa. A combined estimate for all other countries is stored as other, which enables calibration worldwide.
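The object is a nested list ordered by age group, algorithm, component, and country. Matrices can therefore also be fetched programmatically with [[ ]] and character variables, for example inside a loop over countries. The sketch uses a toy stand-in list with placeholder values. get_missmat() is a hypothetical helper, and its fallback to other is an assumption made for illustration. It does not describe how vacalibration() selects matrices.

```r
# Toy stand-in mimicking the nesting: age group -> algorithm -> component -> country
# (placeholder strings instead of real matrices)
toy_missmat <- list(
  neonate = list(
    eava    = list(postmean = list(Mozambique = "eava-neonate-MOZ",
                                   other      = "eava-neonate-other")),
    interva = list(postmean = list(Mozambique = "interva-neonate-MOZ",
                                   other      = "interva-neonate-other"))
  )
)

# Hypothetical helper: fetch one component for one country, falling back to
# the combined "other" estimate when no country-specific entry exists
get_missmat <- function(obj, age_group, algorithm, component = "postmean", country) {
  by_country <- obj[[age_group]][[algorithm]][[component]]
  if (!country %in% names(by_country)) country <- "other"
  by_country[[country]]
}

get_missmat(toy_missmat, "neonate", "eava", country = "Mozambique")
get_missmat(toy_missmat, "neonate", "interva", country = "Nigeria")
```

If the real object follows the same nesting, get_missmat(CCVA_missmat, "neonate", "eava", country = "Mozambique") should return the same matrix as CCVA_missmat$neonate$eava$postmean$Mozambique.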

For each age group, misclassification matrices are provided for the following broad causes:

  • Neonates: "congenital_malformation", "pneumonia", "sepsis_meningitis_inf" (sepsis/meningitis/infections), "ipre" (intrapartum-related events), "other", and "prematurity".
  • Children: "malaria", "pneumonia", "diarrhea", "severe_malnutrition", "hiv", "injury", "other", "other_infections", and "nn_causes" (neonatal causes consisting of IPRE, congenital malformation, and prematurity).

If misclassification matrices are available for the age group, algorithm, and country of interest, users only need to provide va_data (a list named by algorithm), age_group, and country. vacalibration() then automatically fetches the appropriate misclassification matrix from CCVA_missmat. If no matching matrices are available, users must provide them (see the missmat argument of vacalibration() for details).
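The matrices in CCVA_missmat can be used directly only if the causes in va_data are among the broad causes listed above. The base-R check below uses setdiff() to show the idea. non_champs_causes() is a hypothetical helper, and vacalibration() may validate causes in its own way. The study causes come from the CA CODE Bangladesh example in Causes Outside CHAMPS Broad Causes.

```r
# CHAMPS broad causes for neonates, as listed above
champs_neonate_causes <- c("congenital_malformation", "pneumonia",
                           "sepsis_meningitis_inf", "ipre", "other", "prematurity")

# Hypothetical helper: causes in a VA output that are not CHAMPS broad causes
non_champs_causes <- function(va_causes, broad_causes) setdiff(va_causes, broad_causes)

# Study causes from the CA CODE Bangladesh example (matching is case-sensitive)
study_causes <- c("Intrapartum", "Congenital", "Diarrhoeal", "LRI",
                  "Sepsis", "Preterm", "Tetanus", "Other")

non_champs_causes(study_causes, champs_neonate_causes)           # all 8: a studycause_map is needed
non_champs_causes(c("pneumonia", "other"), champs_neonate_causes) # character(0): no mapping needed
```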

vacalibration() also supports posterior samples of misclassification matrices, such as those in the full CCVA_missmat object available from the GitHub repository. For the example above, the samples can be accessed as CCVA_missmat$neonate$eava$postsamples$Mozambique.
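One common way to work with posterior samples is to summarize them elementwise. The sketch assumes the samples are stored as a three-dimensional array indexed by draw, true cause, and CCVA-assigned cause. This layout is an assumption, so inspect the real object with dim() first. The toy draws come from row-wise Dirichlet distributions.

```r
set.seed(2)
causes <- c("pneumonia", "sepsis_meningitis_inf", "other")
n_samp <- 500

# Toy Dirichlet parameters, one row per true cause (hypothetical values)
alpha <- matrix(c(14, 4, 2, 3, 15, 2, 2, 2, 16), nrow = 3, byrow = TRUE)

# Assumed layout: [draw, true cause, CCVA cause]
samples <- array(NA_real_, dim = c(n_samp, 3, 3),
                 dimnames = list(NULL, true = causes, ccva = causes))
for (s in seq_len(n_samp)) {
  # g[i, j] ~ Gamma(alpha[i, j], 1); normalizing each row gives a Dirichlet draw
  g <- matrix(rgamma(9, shape = alpha), nrow = 3)
  samples[s, , ] <- g / rowSums(g)
}

# Elementwise posterior mean and 95% interval over the draw dimension
post_mean <- apply(samples, c(2, 3), mean)
post_lo   <- apply(samples, c(2, 3), quantile, probs = 0.025)
post_hi   <- apply(samples, c(2, 3), quantile, probs = 0.975)

round(post_mean, 3)
```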

Implementing VA-Calibration

In the following example, we demonstrate how vacalibration() can be used to perform algorithm-specific and ensemble calibrations, and generate calibrated CSMF estimates. For brevity, we exclude the diagnostic and summary plots as well as the detailed output of the posterior sampling.

Integration with VA Workflow

Algorithm-Specific

Below is an example of EAVA-specific VA-Calibration for neonates in Mozambique:

vacalib_eava = vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava), 
                             age_group = "neonate", country = "Mozambique")

# CSMF
vacalib_eava$p_uncalib[1,]  # uncalibrated estimates
vacalib_eava$p_calib[1,,]  # posterior of calibrated estimates
vacalib_eava$pcalib_postsumm[1,,]  # posterior summary of calibrated estimates

# death counts
vacalib_eava$va_deaths_uncalib[1,]  # uncalibrated
vacalib_eava$va_deaths_calib_algo[1,]  # calibrated

InSilicoVA- and InterVA-specific VA-Calibration can be performed in the same way by setting va_data = list("insilicova" = comsamoz_CCVAoutput$neonate$insilicova) or va_data = list("interva" = comsamoz_CCVAoutput$neonate$interva), respectively.

Use missmat_type to control uncertainty propagation. missmat_type = "fixed" calibrates using a fixed misclassification matrix (by default, the average matrix in CCVA_missmat) and does not propagate its uncertainty. missmat_type = "prior" (the package default) and missmat_type = "samples" propagate uncertainty in the misclassification matrix and are recommended.
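A crude Monte Carlo sketch illustrates the difference between a fixed matrix and propagated uncertainty. It first calibrates once with a single matrix. It then repeats the calibration over many matrix draws and reports the spread of the results. The naive linear inversion of q = t(M) p, the toy Dirichlet parameters, and the helper naive_calib() are all assumptions for illustration. vacalibration() instead propagates uncertainty within its Bayesian model.

```r
set.seed(3)
causes <- c("pneumonia", "sepsis_meningitis_inf", "other")

# Toy Dirichlet parameters for the misclassification matrix (rows = true cause)
alpha <- matrix(c(14, 4, 2, 3, 15, 2, 2, 2, 16), nrow = 3, byrow = TRUE,
                dimnames = list(causes, causes))
# Toy uncalibrated CSMF
q_uncalib <- c(pneumonia = 0.305, sepsis_meningitis_inf = 0.455, other = 0.24)

# Hypothetical helper: naive inversion of q = t(M) %*% p for a given M
naive_calib <- function(M, q) solve(t(M), q)

# "fixed"-style: one matrix (here the Dirichlet mean) gives one estimate
M_fixed <- alpha / rowSums(alpha)
p_fixed <- naive_calib(M_fixed, q_uncalib)

# Propagating uncertainty: repeat the calibration over matrix draws
n_draw <- 1000
p_draws <- t(replicate(n_draw, {
  g <- matrix(rgamma(length(alpha), shape = alpha), nrow = nrow(alpha))
  naive_calib(g / rowSums(g), q_uncalib)
}))
colnames(p_draws) <- causes

round(p_fixed, 3)
# Spread of calibrated CSMFs attributable to matrix uncertainty alone
apply(p_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

With a fixed matrix, this source of uncertainty is simply absent from the result. Individual draws of the naive inversion can also fall outside [0, 1], which a proper model avoids.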

To calibrate with posterior samples, set missmat_type = "samples" and missmat = CCVA_missmat$neonate$eava$postsamples$Mozambique in the example above. Note: the CCVA_missmat object included in the package does not contain posterior samples because of file size limits. If needed, obtain them from the CCVA_missmat object in the GitHub repository and pass them to vacalibration().

Ensemble

To perform ensemble calibration, provide a list of algorithm-specific CCVA outputs. vacalibration() then performs both algorithm-specific calibration and ensemble calibration. Set ensemble = FALSE to turn off ensemble calibration.

vacalib_ensemble = 
  vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava,
                               "insilicova" = comsamoz_CCVAoutput$neonate$insilicova,
                               "interva" = comsamoz_CCVAoutput$neonate$interva),
                age_group = "neonate", country = "Mozambique")

# CSMF
vacalib_ensemble$p_uncalib  # uncalibrated estimates

# posterior of calibrated CSMF
vacalib_ensemble$p_calib["eava",,]  # EAVA
vacalib_ensemble$p_calib["insilicova",,]  # InSilicoVA
vacalib_ensemble$p_calib["interva",,]  # InterVA
vacalib_ensemble$p_calib["ensemble",,]  # ensemble

# posterior summary of calibrated CSMF
vacalib_ensemble$pcalib_postsumm["eava",,]  # EAVA
vacalib_ensemble$pcalib_postsumm["insilicova",,]  # InSilicoVA
vacalib_ensemble$pcalib_postsumm["interva",,]  # InterVA
vacalib_ensemble$pcalib_postsumm["ensemble",,]  # ensemble

# death counts
vacalib_ensemble$va_deaths_uncalib  # uncalibrated
vacalib_ensemble$va_deaths_calib_algo  # calibrated counts from algorithm-specific calibration
vacalib_ensemble$va_deaths_calib_ensemble  # calibrated counts from ensemble calibration

If missmat includes user-specified matrices, then age_group and country are not required.

Calibration for children can be performed similarly.

Visualization

The output of the vacalibration() function can be directly passed to plot_vacalib() to generate a plot that summarizes the main components of VA-Calibration. By default, it displays the misclassification matrix used for calibration and shows both the uncalibrated and calibrated CSMF estimates. For instance, when calibrating for EAVA as demonstrated above, the summary plot can be obtained as:

plot_vacalib(vacalib_fit = vacalib_eava)

Grey rows and columns in the misclassification matrix indicate causes that are not calibrated. Set toplot = "missmat" or toplot = "csmf" in plot_vacalib() to plot only the misclassification matrix or only the comparison of CSMF estimates. The same applies to InSilicoVA, InterVA, and ensemble VA-Calibration. The plotted misclassification matrix depends on the missmat_type specified in vacalibration(). If missmat_type = "fixed", the fixed misclassification matrix used in calibration is plotted. If missmat_type is "prior" or "samples", the average misclassification matrix is displayed.

Causes Outside CHAMPS Broad Causes

As discussed in CCVA Misclassification Matrices, the matrices in CCVA_missmat are defined over the CHAMPS broad causes. When the causes in va_data are not a subset of the CHAMPS broad causes, a cause-mapping step is required. One such application is the CA CODE project, which compiles VA-based death counts across multiple countries. For example, a study in Bangladesh analyzed 302 neonatal deaths using EAVA and reported 82 deaths due to Intrapartum, 17 due to Congenital, 6 due to Diarrhoeal, 33 due to LRI, 108 due to Sepsis, 35 due to Preterm, 14 due to Tetanus, and 7 due to Other.

In such cases, vacalibration() requires specifying studycause_map, a mapping from the study causes to the CHAMPS broad causes. For this example, following expert guidance, we define:

set_studycause_map = c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
                       "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
                       "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity", 
                       "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")

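To see how the map groups the reported deaths, the counts can be summed within each CHAMPS broad cause using tapply(). This is only a way to inspect the mapping. According to the description below, vacalibration() uses studycause_map to align the misclassification matrices with the study causes, and the aggregation here is not part of that step.

```r
# Study-cause to CHAMPS broad-cause map, as defined above
set_studycause_map <- c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
                        "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
                        "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity",
                        "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")

# Reported EAVA death counts from the Bangladesh study
cacode_counts <- c("Intrapartum" = 82, "Congenital" = 17, "Diarrhoeal" = 6, "LRI" = 33,
                   "Sepsis" = 108, "Preterm" = 35, "Tetanus" = 14, "Other" = 7)

# Every study cause must appear in the map
stopifnot(all(names(cacode_counts) %in% names(set_studycause_map)))

# Sum study-cause deaths within each broad cause
broad_counts <- tapply(cacode_counts, set_studycause_map[names(cacode_counts)], sum)
broad_counts
```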
This mapping converts the misclassification matrices in CCVA_missmat to align with the study causes, which enables VA-Calibration. Calibration can then be run as:

vacalib_cacode = vacalibration(va_data = list("eava" = c("Intrapartum" = 82, "Congenital" = 17,
                                                         "Diarrhoeal" = 6, "LRI" = 33,
                                                         "Sepsis" = 108, "Preterm" = 35, 
                                                         "Tetanus" = 14, "Other" = 7)), 
                               age_group = "neonate", country = "Bangladesh",
                               studycause_map = set_studycause_map)

# CSMF
vacalib_cacode$p_uncalib[1,]  # uncalibrated estimates
vacalib_cacode$p_calib[1,,]  # posterior of calibrated estimates
vacalib_cacode$pcalib_postsumm[1,,]  # posterior summary of calibrated estimates

# death counts
vacalib_cacode$va_deaths_uncalib[1,]  # uncalibrated
vacalib_cacode$va_deaths_calib_algo[1,]  # calibrated
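In this example, the uncalibrated CSMF should simply be the share of reported deaths assigned to each study cause. If that is how p_uncalib is computed (not verified here), the values can be reproduced in base R from the counts:

```r
# Reported EAVA death counts from the Bangladesh study
cacode_counts <- c("Intrapartum" = 82, "Congenital" = 17, "Diarrhoeal" = 6, "LRI" = 33,
                   "Sepsis" = 108, "Preterm" = 35, "Tetanus" = 14, "Other" = 7)

# Uncalibrated CSMF: proportion of deaths assigned to each cause
p_uncalib_manual <- cacode_counts / sum(cacode_counts)

round(p_uncalib_manual, 3)
```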

The mapping is required only when using the misclassification matrices from CCVA_missmat. If missmat includes user-specified matrices, then age_group, country, and studycause_map are not required. As with algorithm-specific calibration, vacalib_cacode can be passed to plot_vacalib() to generate the summary plot of the misclassification matrix and CSMF estimates.
