CCVA Misclassification Matrices: Stored
as CCVA_missmat, this is the inventory of
uncertainty-quantified misclassification matrices for three
computer-coded verbal autopsy (CCVA) algorithms: the expert algorithm (EAVA), InSilicoVA, and
InterVA. The
matrices are derived using the modeling framework of Pramanik et al. (2025)
applied to the data collected in the Child Health and Mortality
Prevention Surveillance (CHAMPS)
project (see Pramanik et
al. (2026) for details). Due to file size limits, the posterior
samples are hosted on the GitHub
repository, with the .rda file available under the release.
Please refer to this package and the GitHub repository for future
updates.
Example VA-only Data from
COMSA-Mozambique: Stored as
comsamoz_CCVAoutput, this object contains CCVA outputs
for the publicly
available verbal autopsy (VA) survey data from COMSA-Mozambique for children
under age five. It includes outputs from EAVA (using the EAVA package), and
InSilicoVA and InterVA (using the openVA package).
The analyses cover two age groups: neonates (0-27 days) and children
(1-59 months).
vacalibration(): This is the main
function for performing calibration. For EAVA, InSilicoVA, and InterVA,
it directly takes outputs from EAVA and openVA, and
produces calibrated estimates of cause-specific mortality fractions
(CSMFs). More generally, this calibrates population-level prevalence
derived from single-class predictions of discrete classifiers. For this,
users need to provide fixed or uncertainty-quantified misclassification
matrices.
plot_vacalib(): This function produces a
figure that shows the misclassification matrix used for calibration and
compares the uncalibrated and calibrated estimates of CSMFs.
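The idea behind calibration can be illustrated with a small base R sketch. It assumes a made-up 3-cause misclassification matrix M, where M[i, j] is the probability that the algorithm assigns cause j when the true cause is i, and it uses a naive matrix inversion instead of the Bayesian model implemented in vacalibration(). All object names and numbers below are hypothetical.

```r
# Hypothetical 3-cause example; every number here is made up for illustration.
causes <- c("pneumonia", "sepsis", "other")

# M[i, j] = P(algorithm predicts cause j | true cause is i); each row sums to 1.
M <- matrix(c(0.8, 0.1, 0.1,
              0.2, 0.7, 0.1,
              0.1, 0.2, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(true = causes, predicted = causes))

# Assumed true cause-specific mortality fractions.
p_true <- c(pneumonia = 0.5, sepsis = 0.3, other = 0.2)

# Uncalibrated CSMF implied by misclassification: q = t(M) %*% p.
q_uncalib <- drop(t(M) %*% p_true)

# Naive calibration: solve t(M) %*% p = q for p.
# (This is NOT the package's method; it ignores uncertainty and can
# produce values outside [0, 1] for noisy inputs.)
p_recovered <- drop(solve(t(M), q_uncalib))

print(round(q_uncalib, 3))
print(round(p_recovered, 3))
```

The uncalibrated fractions differ from p_true because of misclassification, and inverting the relationship recovers them in this noise-free setting.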
We start by installing and loading the vacalibration
package in R.
Install from CRAN:
install.packages("vacalibration")
library(vacalibration) # load
Install from GitHub:
# install the "devtools" R package first, if it is not already installed
install.packages("devtools")
devtools::install_github("sandy-pramanik/vacalibration")
library(vacalibration) # load
For illustration, we use the VA-only data included in this package.
Stored as comsamoz_CCVAoutput, it contains the EAVA, InSilicoVA, and
InterVA outputs for the publicly available
VA-only data for children under age 5 in COMSA-Mozambique. It can be
loaded with:
data("comsamoz_CCVAoutput")
comsamoz_CCVAoutput$neonate$eava # output from EAVA for neonates
comsamoz_CCVAoutput$neonate$insilicova # output from InSilicoVA for neonates
comsamoz_CCVAoutput$neonate # list of outputs for neonates from EAVA, InSilicoVA, and InterVA
Outputs for children can be similarly accessed as
comsamoz_CCVAoutput$child.
This is the inventory of uncertainty-quantified misclassification matrices for the CCVA algorithms EAVA, InSilicoVA, and InterVA. When applying these algorithms, the matrices enable VA-Calibration to obtain calibrated CSMF estimates. The matrices are estimated using the misclassification-matrix modeling framework of Pramanik et al. (2025) and paired CHAMPS–VA cause-of-death data from the Child Health and Mortality Prevention Surveillance (CHAMPS) project (see Pramanik et al. (2026) for details). It can be loaded with:
data("CCVA_missmat")
For EAVA among neonates in Mozambique, you can access: the average misclassification matrix, the uncertainty-quantified misclassification matrix as a Dirichlet prior, and distributional summaries, as follows:
CCVA_missmat$neonate$eava$postmean$Mozambique # average
CCVA_missmat$neonate$eava$asDirich$Mozambique # Dirichlet approximation
CCVA_missmat$neonate$eava$postsumm$Mozambique # summary of distribution
Matrices for other algorithms, countries, and age groups can be
accessed in the same way. Currently, CCVA_missmat provides
misclassification matrices for three CCVA algorithms (EAVA,
InSilicoVA, and InterVA) and two age groups
(neonates aged 0-27 days and children aged 1-59
months) across countries (country-specific estimates for
Bangladesh, Ethiopia, Kenya,
Mali, Mozambique, Sierra Leone,
and South Africa, and a combined estimate for all other
countries, stored as other), enabling global calibration.
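When looping over algorithms or countries, the `$` access shown above can be replaced with `[[ ]]` and character variables. The sketch below uses a small mock list that only mimics the nesting (age group, algorithm, summary type, country); the helper get_postmean(), the 2x2 matrices, and their values are hypothetical and are not part of the package.

```r
# Build a placeholder 2x2 "misclassification matrix" (values are made up).
mock_mat <- function(diag_val) {
  m <- matrix(c(diag_val, 1 - diag_val,
                1 - diag_val, diag_val), nrow = 2, byrow = TRUE)
  dimnames(m) <- list(c("cause_a", "cause_b"), c("cause_a", "cause_b"))
  m
}

# Mock object with the same nesting order as CCVA_missmat (assumed from the text).
mock_missmat <- list(
  neonate = list(
    eava = list(postmean = list(Mozambique = mock_mat(0.8),
                                Bangladesh = mock_mat(0.7))),
    insilicova = list(postmean = list(Mozambique = mock_mat(0.9),
                                      Bangladesh = mock_mat(0.6)))
  )
)

# Hypothetical helper: programmatic access by name.
get_postmean <- function(obj, age_group, algo, country) {
  obj[[age_group]][[algo]][["postmean"]][[country]]
}

m <- get_postmean(mock_missmat, "neonate", "insilicova", "Bangladesh")

# Diagonal entries (sensitivities) for every algorithm in one country.
sens <- sapply(names(mock_missmat$neonate), function(a) {
  diag(get_postmean(mock_missmat, "neonate", a, "Mozambique"))
})
print(sens)
```

With the package loaded, the same pattern should apply to CCVA_missmat itself, for example CCVA_missmat[["neonate"]][["eava"]][["postmean"]][["Mozambique"]].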
For each age group, misclassification matrices are provided for the following broad causes:
Neonates: "congenital_malformation", "pneumonia",
"sepsis_meningitis_inf" (sepsis/meningitis/infections),
"ipre" (intrapartum-related events), "other",
and "prematurity".
Children: "malaria",
"pneumonia", "diarrhea",
"severe_malnutrition", "hiv",
"injury", "other",
"other_infections", and "nn_causes" (neonatal
causes consisting of IPRE, congenital malformation, and
prematurity).
If misclassification matrices are available for the age group,
algorithm, and country of interest, users only need to provide
va_data with the algorithm name, age_group, and
country, and vacalibration() automatically
fetches the appropriate misclassification matrix from
CCVA_missmat. If no matching matrices are available, users
must provide them (see the missmat argument in
vacalibration() for details).
This function also supports posterior samples of misclassification
matrices, such as those included in CCVA_missmat (available
from the GitHub
repository). For the example above, the samples can be accessed as
CCVA_missmat$neonate$eava$postsamples$Mozambique.
In the following example, we demonstrate how
vacalibration() can be used to perform algorithm-specific
and ensemble calibrations, and generate calibrated CSMF estimates. For
brevity, we exclude the diagnostic and summary plots as well as the
detailed output of the posterior sampling.
Below is an example of EAVA-specific VA-Calibration for neonates in Mozambique:
vacalib_eava = vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava),
                             age_group = "neonate", country = "Mozambique")
# CSMF
vacalib_eava$p_uncalib[1,] # uncalibrated estimates
vacalib_eava$p_calib[1,,] # posterior of calibrated estimates
vacalib_eava$pcalib_postsumm[1,,] # posterior summary of calibrated estimates
# death counts
vacalib_eava$va_deaths_uncalib[1,] # uncalibrated
vacalib_eava$va_deaths_calib_algo[1,] # calibrated
InSilicoVA- and InterVA-specific VA-Calibration can be
performed similarly by setting
va_data = list("insilicova" = comsamoz_CCVAoutput$neonate$insilicova)
or
va_data = list("interva" = comsamoz_CCVAoutput$neonate$interva).
Use missmat_type to control uncertainty propagation.
missmat_type = "fixed" calibrates using a fixed
misclassification matrix (by default, the average matrix in
CCVA_missmat) and does not propagate uncertainty.
missmat_type = "prior" (package default) or
missmat_type = "samples" propagates uncertainty and is
recommended.
To calibrate with posterior samples, use
missmat_type = "samples" and
missmat = CCVA_missmat$neonate$eava$postsamples$Mozambique
in the example. Note: CCVA_missmat included in the package
does not contain posterior samples due to file size limits. If needed,
obtain them from the CCVA_missmat object in the GitHub
repository and pass them to vacalibration().
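The difference between a fixed matrix and an uncertainty-quantified one can be sketched in base R. The example assumes a hypothetical matrix of Dirichlet concentration parameters (one Dirichlet distribution per row), draws matrices from it, and pushes each draw through a naive inversion. It illustrates uncertainty propagation in general and does not reproduce the Bayesian model inside vacalibration(); all names and numbers are made up.

```r
set.seed(1)
causes <- c("cause_a", "cause_b", "cause_c")

# Hypothetical Dirichlet parameters: row i describes predictions given true cause i.
alpha <- matrix(c(80, 10, 10,
                  20, 70, 10,
                  10, 20, 70),
                nrow = 3, byrow = TRUE, dimnames = list(causes, causes))

# Draw one row-stochastic matrix: each row ~ Dirichlet(alpha[i, ]),
# obtained by normalizing independent gamma variates.
rdirichlet_rows <- function(alpha) {
  g <- matrix(rgamma(length(alpha), shape = as.vector(alpha)),
              nrow = nrow(alpha), dimnames = dimnames(alpha))
  g / rowSums(g)
}

q_uncalib <- c(0.48, 0.30, 0.22)  # made-up uncalibrated CSMF

# "Fixed"-style: a single average matrix, a single point estimate.
M_fixed <- alpha / rowSums(alpha)
p_fixed <- solve(t(M_fixed), q_uncalib)

# "Prior"/"samples"-style: many matrices, a distribution of estimates.
M_draws <- replicate(1000, rdirichlet_rows(alpha))          # 3 x 3 x 1000 array
p_draws <- apply(M_draws, 3, function(M) solve(t(M), q_uncalib))  # 3 x 1000
p_interval <- apply(p_draws, 1, quantile, probs = c(0.025, 0.975))

print(round(p_fixed, 3))
print(round(p_interval, 3))
```

The fixed matrix yields one estimate per cause, whereas the draws yield an interval that reflects uncertainty in the misclassification rates.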
To perform ensemble calibration, provide a list of algorithm-specific
CCVA outputs. This performs both algorithm-specific calibration and an
ensemble calibration. Set ensemble = FALSE to turn off
ensemble calibration.
vacalib_ensemble =
  vacalibration(va_data = list("eava" = comsamoz_CCVAoutput$neonate$eava,
                               "insilicova" = comsamoz_CCVAoutput$neonate$insilicova,
                               "interva" = comsamoz_CCVAoutput$neonate$interva),
                age_group = "neonate", country = "Mozambique")
# CSMF
vacalib_ensemble$p_uncalib # uncalibrated estimates
# posterior of calibrated CSMF
vacalib_ensemble$p_calib["eava",,] # EAVA
vacalib_ensemble$p_calib["insilicova",,] # InSilicoVA
vacalib_ensemble$p_calib["interva",,] # InterVA
vacalib_ensemble$p_calib["ensemble",,] # ensemble
# posterior summary of calibrated CSMF
vacalib_ensemble$pcalib_postsumm["eava",,] # EAVA
vacalib_ensemble$pcalib_postsumm["insilicova",,] # InSilicoVA
vacalib_ensemble$pcalib_postsumm["interva",,] # InterVA
vacalib_ensemble$pcalib_postsumm["ensemble",,] # ensemble
# death counts
vacalib_ensemble$va_deaths_uncalib # uncalibrated
vacalib_ensemble$va_deaths_calib_algo # calibrated counts from algorithm-specific calibration
vacalib_ensemble$va_deaths_calib_ensemble # calibrated counts from ensemble calibration
If missmat includes user-specified matrices, then
age_group and country are not required.
Calibration for children can be performed similarly.
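The indexing used above suggests that p_calib is a three-dimensional array indexed as [algorithm, posterior draw, cause]. Under that assumption, custom posterior summaries can be computed with apply(). The sketch below works on a mock array with random values, so it runs without the package; the object names, dimensions, and numbers are hypothetical.

```r
set.seed(42)
algos   <- c("eava", "insilicova", "ensemble")
causes  <- c("cause_a", "cause_b", "cause_c")
n_draws <- 500

# Mock array laid out as [algorithm, draw, cause] (assumed layout).
mock_p_calib <- array(NA_real_,
                      dim = c(length(algos), n_draws, length(causes)),
                      dimnames = list(algos, NULL, causes))
for (a in algos) {
  # Each draw is a random probability vector (normalized gamma variates).
  g <- matrix(rgamma(n_draws * length(causes), shape = 5), nrow = n_draws)
  mock_p_calib[a, , ] <- g / rowSums(g)
}

# Posterior mean and 95% interval for every algorithm and cause.
post_summ <- apply(mock_p_calib, c(1, 3), function(x) {
  c(mean = mean(x), quantile(x, c(0.025, 0.975)))
})

# apply() puts the summaries in the first dimension; reorder to
# [algorithm, summary, cause].
summ <- aperm(post_summ, c(2, 1, 3))
print(round(summ["ensemble", , ], 3))
```

The reordered array can then be indexed by algorithm name, in the same way as pcalib_postsumm above.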
The output of the vacalibration() function can be
directly passed to plot_vacalib() to generate a plot that
summarizes the main components of VA-Calibration. By default, it
displays the misclassification matrix used for calibration and shows
both the uncalibrated and calibrated CSMF estimates. For instance, when
calibrating for EAVA as demonstrated above, the summary plot can be
obtained as:
plot_vacalib(vacalib_fit = vacalib_eava)
Grey rows
and columns in the misclassification matrix indicate causes that are not
calibrated. Set
toplot = "missmat" or toplot = "csmf" in
plot_vacalib() to plot only the misclassification matrix or only
the comparison of CSMF estimates, respectively. The same applies to InSilicoVA,
InterVA, and ensemble VA-Calibration. The plotted misclassification
matrix depends on the missmat_type specified in
vacalibration(). If missmat_type="fixed", the
fixed misclassification matrix used in calibration is plotted. If
missmat_type equals "prior" or
"samples", the average misclassification matrix is
displayed.
As discussed in CCVA Misclassification
Matrices, the matrices in CCVA_missmat are available
for CHAMPS broad causes. In cases where the causes in
va_data are not a subset of the CHAMPS broad causes, a
cause-mapping step is required. One such application is the CA CODE project, which
compiles VA-based death counts across multiple countries. For example, a
study in Bangladesh analyzed 302 neonatal deaths using EAVA, and
reported 82 deaths due to Intrapartum, 17 due to
Congenital, 6 due to Diarrhoeal, 33 due to
LRI, 108 due to Sepsis, 35 due to Preterm, 14
due to Tetanus, and 7 due to Other.
In such cases, vacalibration() requires specifying
studycause_map, a mapping from the study causes to the
CHAMPS broad causes. For this example, following expert guidance, we
define:
set_studycause_map = c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
                       "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
                       "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity",
                       "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")
This mapping converts the misclassification matrices in
CCVA_missmat to align with the study causes, enabling
VA-Calibration. This can then be implemented as:
vacalib_cacode = vacalibration(va_data = list("eava" = c("Intrapartum" = 82, "Congenital" = 17,
                                                         "Diarrhoeal" = 6, "LRI" = 33,
                                                         "Sepsis" = 108, "Preterm" = 35,
                                                         "Tetanus" = 14, "Other" = 7)),
                               age_group = "neonate", country = "Bangladesh",
                               studycause_map = set_studycause_map)
# CSMF
vacalib_cacode$p_uncalib[1,] # uncalibrated estimates
vacalib_cacode$p_calib[1,,] # posterior of calibrated estimates
vacalib_cacode$pcalib_postsumm[1,,] # posterior summary of calibrated estimates
# death counts
vacalib_cacode$va_deaths_uncalib[1,] # uncalibrated
vacalib_cacode$va_deaths_calib_algo[1,] # calibrated
This is required only when using the misclassification matrices from
CCVA_missmat. If missmat includes
user-specified matrices, then age_group,
country, and studycause_map are not required.
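Before calibrating, it can help to sanity-check a cause mapping in base R. The sketch below reuses the Bangladesh counts and mapping from this example, verifies that every study cause maps to a CHAMPS neonatal broad cause, and shows the counts implied for each broad cause. It only illustrates what the mapping implies for the data; it is not how vacalibration() uses studycause_map internally, and the object names are hypothetical.

```r
# Study-level death counts and cause mapping from the example above.
study_counts <- c("Intrapartum" = 82, "Congenital" = 17,
                  "Diarrhoeal" = 6, "LRI" = 33,
                  "Sepsis" = 108, "Preterm" = 35,
                  "Tetanus" = 14, "Other" = 7)
study_map <- c("Intrapartum" = "ipre", "Congenital" = "congenital_malformation",
               "Diarrhoeal" = "sepsis_meningitis_inf", "LRI" = "pneumonia",
               "Sepsis" = "sepsis_meningitis_inf", "Preterm" = "prematurity",
               "Tetanus" = "sepsis_meningitis_inf", "Other" = "other")

# CHAMPS broad causes for neonates, as listed earlier in this document.
champs_neonate <- c("congenital_malformation", "pneumonia",
                    "sepsis_meningitis_inf", "ipre", "other", "prematurity")

# Every study cause must be mapped, and every target must be a broad cause.
stopifnot(setequal(names(study_counts), names(study_map)),
          all(study_map %in% champs_neonate))

# Death counts aggregated to the broad causes implied by the mapping.
broad_counts <- tapply(study_counts,
                       unname(study_map[names(study_counts)]), sum)

# Uncalibrated CSMF on the study-cause scale.
csmf_uncalib <- study_counts / sum(study_counts)

print(broad_counts)
print(round(csmf_uncalib, 3))
```

Here three study causes (Diarrhoeal, Sepsis, and Tetanus) collapse into "sepsis_meningitis_inf", which is why the mapping is many-to-one.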
As with the algorithm-specific calibration described above,
vacalib_cacode can be passed to
plot_vacalib() to generate the summary plot of the
misclassification matrix and CSMF estimates.