Neural estimators are neural networks that transform data into parameter point estimates. These estimators are likelihood-free and amortised, in the sense that, after an initial setup cost, inference from observed data can be made in a fraction of the time required by conventional approaches. They are also approximate Bayes estimators and, therefore, are often referred to as neural Bayes estimators (Sainsbury-Dale, Zammit-Mangion, and Huser, 2024).
The package NeuralEstimators facilitates the user-friendly development of neural Bayes estimators and related neural inferential approaches. It caters for any model for which simulation is feasible by allowing the user to implicitly define their model via simulated data. This vignette describes the R interface to the Julia version of the package, whose documentation is available here.
The goal of parametric point estimation is to estimate a \(p\)-dimensional parameter \(\boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^p\) from data \(\boldsymbol{Z} \in \mathbb{R}^n\) using an estimator, \(\hat{\boldsymbol{\theta}} : \mathbb{R}^n\to\Theta\). A ubiquitous decision-theoretic approach to the construction of estimators is based on average-risk optimality. Consider a nonnegative loss function \(L: \Theta \times \Theta \to [0, \infty)\). The Bayes risk of an estimator \(\hat{\boldsymbol{\theta}}(\cdot)\) is its loss averaged over all possible data realisations and parameter values, that is, \[ \int_\Theta \int_{\mathbb{R}^n} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{z}))f(\boldsymbol{z} \mid \boldsymbol{\theta}) \rm{d} \boldsymbol{z} \rm{d} \Pi(\boldsymbol{\theta}), \] where \(\Pi(\cdot)\) is a prior measure for \(\boldsymbol{\theta}\). Any minimiser of the Bayes risk is said to be a Bayes estimator with respect to \(L(\cdot, \cdot)\) and \(\Pi(\cdot)\).
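Writing the Bayes risk as an iterated expectation shows that this minimisation can be performed pointwise: for (almost) every data realisation \(\boldsymbol{z}\), a Bayes estimator minimises the posterior expected loss, \[ \hat{\boldsymbol{\theta}}(\boldsymbol{z}) = \underset{\boldsymbol{t} \in \Theta}{\mathrm{arg\,min}} \int_\Theta L(\boldsymbol{\theta}, \boldsymbol{t}) \rm{d}\Pi(\boldsymbol{\theta} \mid \boldsymbol{z}), \] where \(\Pi(\cdot \mid \boldsymbol{z})\) denotes the posterior distribution.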
Bayes estimators are theoretically attractive. For example, unique Bayes estimators are admissible and, under suitable regularity conditions, they are consistent and asymptotically efficient. They are also highly interpretable: Bayes estimators are functionals of the posterior distribution, and the specific functional is determined by the choice of loss function. For instance, under quadratic loss, the Bayes estimator is the posterior mean.
Despite their attractive properties, Bayes estimators are typically unavailable in closed form. A way forward is to assume a flexible parametric model for the estimator, and to optimise the parameters within that model in order to approximate the Bayes estimator. Neural networks are ideal candidates, since they are universal function approximators and extremely fast to evaluate.
Let \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})\) denote a neural point estimator, where \(\boldsymbol{\gamma}\) contains the neural-network parameters. Bayes estimators may be approximated with \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)\), where \(\boldsymbol{\gamma}^*\) solves the optimisation task, \[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} \; \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}; \boldsymbol{\gamma})), \] whose objective function is a Monte Carlo approximation of the Bayes risk made using a set \(\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}\) of parameter vectors sampled from the prior and, for each \(k\), simulated data \(\boldsymbol{Z}^{(k)} \sim f(\boldsymbol{z} \mid \boldsymbol{\theta}^{(k)})\). Note that this procedure does not involve evaluation, or knowledge, of the likelihood function, and that the optimisation task can be performed straightforwardly using back-propagation and stochastic gradient descent.
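To make the Monte Carlo objective concrete, a minimal R sketch (a hypothetical helper, not part of the package) that evaluates this objective for a matrix of true parameters and the corresponding estimates, under the mean-absolute-error loss used as the default later in this vignette:
# Hypothetical helper: Monte Carlo approximation of the Bayes risk
# theta: p x K matrix of parameters sampled from the prior
# theta_hat: p x K matrix of the corresponding neural point estimates
empirical_risk <- function(theta, theta_hat) {
  mean(abs(theta - theta_hat)) # average absolute error over parameters and samples
}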
Parameter estimation from replicated data is commonly required in statistical applications. A parsimonious architecture for such estimators is based on the so-called DeepSets representation (Zaheer et al., 2017). Suppressing dependence on neural-network parameters \(\boldsymbol{\gamma}\) for notational convenience, a neural Bayes estimator couched in the DeepSets framework has the form,
\[ \hat{\boldsymbol{\theta}}(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m) = \boldsymbol{\phi}\left(\boldsymbol{a}(\{\boldsymbol{\psi}(\boldsymbol{Z}_i) : i = 1, \dots, m\})\right), \] where \(\{\boldsymbol{Z}_i : i = 1, \dots, m\}\) are mutually conditionally independent replicates from the statistical model, \(\boldsymbol{\psi}(\cdot)\) and \(\boldsymbol{\phi}(\cdot)\) are neural networks referred to as the summary and inference networks, respectively, and \(\boldsymbol{a}(\cdot)\) is a permutation-invariant aggregation function (typically the element-wise mean). The architecture of \(\boldsymbol{\psi}(\cdot)\) depends on the structure of each \(\boldsymbol{Z}_i\): for example, a CNN is used for gridded data, and a fully-connected multilayer perceptron (MLP) for unstructured multivariate data. The architecture of \(\boldsymbol{\phi}(\cdot)\) is always an MLP.
The DeepSets representation has several motivations. First, Bayes estimators are invariant to permutations of independent replicates; estimators constructed in the DeepSets representation are guaranteed to exhibit this property. Second, the DeepSets architecture is a universal approximator for continuously differentiable permutation-invariant functions and, therefore, any Bayes estimator that is a continuously differentiable function of the data can be approximated arbitrarily well by a neural Bayes estimator in the DeepSets form. Third, estimators constructed using DeepSets may be used with an arbitrary number \(m\) of conditionally independent replicates, therefore amortising the cost of training with respect to this choice. See Sainsbury-Dale, Zammit-Mangion, and Huser (2024) for further details on the use of DeepSets in the context of neural Bayes estimation, and for a discussion on the architecture’s connection to conventional estimators.
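To convey the structure of this representation, a purely conceptual R sketch (psi and phi stand in for the summary and inference networks; here they are placeholder functions, not neural networks):
# Conceptual sketch of DeepSets: summarise each replicate, aggregate the
# summaries by the element-wise mean, then map to the parameter space
deepset_estimate <- function(Z, psi, phi) {
  summaries <- sapply(Z, psi) # psi returns a vector of summaries; one column per replicate
  phi(rowMeans(summaries))    # permutation-invariant aggregation, then inference
}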
Uncertainty quantification with neural Bayes estimators often proceeds through the bootstrap distribution. Bootstrap-based approaches are particularly attractive when nonparametric bootstrap is possible (e.g., when the data are independent replicates), or when simulation from the fitted model is fast, in which case parametric bootstrap is also computationally efficient. However, these conditions are not always met. For example, when making inference from a single spatial field, nonparametric bootstrap is not possible without breaking the spatial dependence structure, and the cost of simulation from the fitted model is often non-negligible (e.g., exact simulation from a Gaussian process model requires the factorisation of an \(n \times n\) matrix, where \(n\) is the number of spatial locations, a task that is \(O(n^3)\) in computational complexity). Further, although bootstrap-based methods for uncertainty quantification are often considered fairly accurate and preferable to methods based on asymptotic normality, there are situations where bootstrap procedures are unreliable (see, e.g., Canty et al., 2006, pg. 6).
Alternatively, by leveraging ideas from (Bayesian) quantile regression, one may construct a neural Bayes estimator that approximates a set of marginal posterior quantiles (Fisher et al., 2023; Sainsbury-Dale et al., 2025, Sec. 2.2.4), which can then be used to construct credible intervals for each parameter. Inference then remains fully amortised since, once the estimators are trained, both point estimates and credible intervals can be obtained with virtually zero computational cost.
Posterior quantiles can be targeted by employing the quantile loss function which, for a single parameter \(\theta\), is \[ L_\tau(\theta, \hat{\theta}) = (\hat{\theta} - \theta)(\mathbb{I}(\hat{\theta} > \theta) - \tau), \quad \tau \in (0, 1), \] where \(\mathbb{I}(\cdot)\) denotes the indicator function. In particular, the Bayes estimator under the above loss function is the posterior \(\tau\)-quantile. When there are \(p > 1\) parameters, \(\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)'\), the Bayes estimator under the joint loss \({L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \sum_{k=1}^p L_\tau(\theta_k, \hat{\theta}_k)}\) is the vector of marginal posterior quantiles since, in general, a Bayes estimator under a sum of univariate loss functions is given by the vector of marginal Bayes estimators (Sainsbury-Dale et al., 2025, Thm. 2).
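For concreteness, the quantile (pinball) loss above translates directly into R; a minimal sketch:
# Quantile loss for a single parameter
# theta: true value; theta_hat: estimate; tau: probability level in (0, 1)
quantile_loss <- function(theta, theta_hat, tau) {
  (theta_hat - theta) * ((theta_hat > theta) - tau)
}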
Neural Bayes estimators are conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. They also have marked practical appeal, as the general workflow for their construction is only loosely connected to the statistical model being considered. The workflow is as follows: (i) sample parameter vectors from the prior; (ii) simulate data from the model conditional on the sampled parameters; (iii) design a neural-network architecture suited to the structure of the data; (iv) train the network by minimising the Monte Carlo approximation of the Bayes risk given above; and (v) assess the trained estimator with simulated test data before applying it to observed data.
The package NeuralEstimators is designed to implement this workflow in a user-friendly manner, as we illustrate in the following examples.
We now consider several examples where the data are univariate (Section 2.1), multivariate and unstructured (Section 2.2), collected over a regular grid (Section 2.3), or collected over arbitrary spatial locations (Section 2.4).
A note on data format: The following general rules regarding assumed data format apply: sampled parameter vectors are stored as \(p \times K\) matrices, with \(p\) the number of parameters and \(K\) the number of sampled parameter vectors; simulated data are stored as a list of length \(K\), with each element containing one data set simulated conditional on one parameter vector; and, within each data set, the independent replicates are stored in the final dimension of the array.
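To make these conventions concrete, a minimal illustration with hypothetical values \(p = 2\), \(K = 3\), and \(m = 5\) univariate replicates per data set:
p <- 2; K <- 3; m <- 5
theta <- matrix(rnorm(p * K), nrow = p) # parameters: p x K matrix
Z <- lapply(1:K, function(k) matrix(rnorm(m), nrow = 1)) # data: list of K data sets, each a 1 x m matrix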
Before proceeding, we load the required packages (see the README for instructions on installing Julia and the Julia packages NeuralEstimators and Flux). If you have access to an NVIDIA graphics processing unit (GPU), you can utilise it by simply adding CUDA to the list of Julia packages in the final line below:
library("NeuralEstimators")
library("JuliaConnectoR")
library("ggplot2")
library("ggpubr")
juliaEval('using NeuralEstimators, Flux')
#> Starting Julia ...
We first develop a neural Bayes estimator for \(\boldsymbol{\theta} \equiv (\mu, \sigma)'\) from data \(Z_1, \dots, Z_m\) that are independent and identically distributed according to a \(\rm{Gau}(\mu, \sigma^2)\) distribution.
First, we sample parameters from the prior \(\Pi(\cdot)\) to construct parameter sets used for training and validating the estimator. Here, we use the priors \(\mu \sim \rm{Gau}(0, 1)\) and \(\sigma \sim \rm{Gamma}(1, 1)\), and we assume that the parameters are independent a priori. In NeuralEstimators, the sampled parameters are stored as \(p \times K\) matrices, with \(p\) the number of parameters in the model and \(K\) the number of sampled parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
mu <- rnorm(K)
sigma <- rgamma(K, 1)
Theta <- matrix(c(mu, sigma), byrow = TRUE, ncol = K)
return(Theta)
}
K <- 10000
theta_train <- prior(K)
theta_val <- prior(K/10)
Next, we implicitly define the statistical model with simulated data. In NeuralEstimators, the simulated data are stored as a list, where each element of the list corresponds to a data set simulated conditional on one parameter vector. Since each replicate of our model is univariate, we take the summary network of the DeepSets framework to be an MLP with a single input neuron, and each simulated data set is stored as a matrix with the independent replicates in the columns (i.e., a \(1 \times m\) matrix).
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) t(rnorm(m, theta[1], theta[2])), simplify = FALSE)
}
m <- 15
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
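As a quick sanity check (purely illustrative), one can confirm that each element of the training set is a \(1 \times m\) matrix:
str(Z_train[[1]]) # each data set is a 1 x m matrix of replicates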
We now design architectures for the summary network and outer network of the DeepSets framework, and initialise our neural point estimator. This can be done using the R helper function initialise_estimator(), or using Julia code directly, as follows:
# Initialise the estimator
estimator <- juliaEval('
d = 1 # dimension of each replicate (univariate data)
p = 2 # number of parameters in the statistical model
w = 32 # number of neurons in each hidden layer
psi = Chain(Dense(d, w, relu), Dense(w, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, p))
deepset = DeepSet(psi, phi)
estimator = PointEstimator(deepset)
')
A note on long-term stability: If constructing a neural Bayes estimator for repeated and long-term use, we recommend defining the architecture using Julia code directly. This is because initialise_estimator() is subject to change, as methods for designing neural-network architectures improve over time and these improved methods are incorporated into the package.
Once the estimator is initialised, it is fitted using train(), here using the default mean-absolute-error loss. We use the sets of parameters and simulated data constructed earlier; one may alternatively use the arguments sampler and simulator to pass functions for sampling from the prior and simulating from the model, respectively. These functions can be defined in R (as we have done in this example) or in Julia using the package JuliaConnectoR, and this approach facilitates the technique known as “on-the-fly” simulation.
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 50
)
#> Computing the initial validation risk... Initial validation risk = 0.8906495
#> Computing the initial training risk... Initial training risk = 0.8892015
#> Epoch: 1 Training risk: 0.295 Validation risk: 0.222 Run time of epoch: 7.941 seconds
#> Epoch: 2 Training risk: 0.207 Validation risk: 0.201 Run time of epoch: 0.235 seconds
#> Epoch: 3 Training risk: 0.197 Validation risk: 0.188 Run time of epoch: 0.213 seconds
#> Epoch: 4 Training risk: 0.192 Validation risk: 0.191 Run time of epoch: 0.211 seconds
#> Epoch: 5 Training risk: 0.188 Validation risk: 0.188 Run time of epoch: 0.228 seconds
#> Epoch: 6 Training risk: 0.189 Validation risk: 0.186 Run time of epoch: 0.21 seconds
#> Epoch: 7 Training risk: 0.187 Validation risk: 0.185 Run time of epoch: 0.208 seconds
#> Epoch: 8 Training risk: 0.185 Validation risk: 0.183 Run time of epoch: 0.218 seconds
#> Epoch: 9 Training risk: 0.184 Validation risk: 0.184 Run time of epoch: 0.22 seconds
#> Epoch: 10 Training risk: 0.183 Validation risk: 0.182 Run time of epoch: 0.229 seconds
#> Epoch: 11 Training risk: 0.183 Validation risk: 0.178 Run time of epoch: 0.348 seconds
#> Epoch: 12 Training risk: 0.182 Validation risk: 0.181 Run time of epoch: 0.236 seconds
#> Epoch: 13 Training risk: 0.182 Validation risk: 0.178 Run time of epoch: 0.221 seconds
#> Epoch: 14 Training risk: 0.181 Validation risk: 0.186 Run time of epoch: 0.226 seconds
#> Epoch: 15 Training risk: 0.181 Validation risk: 0.178 Run time of epoch: 0.21 seconds
#> Epoch: 16 Training risk: 0.18 Validation risk: 0.189 Run time of epoch: 0.216 seconds
#> Epoch: 17 Training risk: 0.18 Validation risk: 0.179 Run time of epoch: 0.216 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
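As an aside, the alternative interface mentioned above can be sketched as follows (a hedged sketch: the arguments sampler and simulator are as described in the text, while m below is an assumed tuning argument; consult the package documentation for the exact interface):
# "On-the-fly" simulation: parameters and data are simulated during training
estimator <- train(
  estimator,
  sampler = prior, # function for sampling from the prior
  simulator = simulate, # function for simulating from the model
  m = m, # assumed argument: number of replicates per simulated data set
  epochs = 50
)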
If the argument savepath in train() is specified, the neural-network state (e.g., the weights and biases) will be saved during training as bson files, and the best state (as measured by validation risk) will be saved as best_network.bson. To load these saved states into a neural network in later R sessions, one may use the function loadstate(). Note that one may also manually save a trained estimator using savestate().
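For illustration, a hedged sketch of manual saving and loading (the file path is hypothetical, and the exact call signatures should be checked against the package documentation):
savestate(estimator, "estimator_state.bson") # save trained state (hypothetical path)
estimator <- loadstate(estimator, "estimator_state.bson") # restore in a later R session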
Once a neural Bayes estimator has been trained, its performance should be assessed. Simulation-based (empirical) methods for evaluating the performance of a neural Bayes estimator are ideal, since simulation is already required for constructing the estimator, and because the estimator can be applied to thousands of simulated data sets at almost no computational cost.
To assess the accuracy of the resulting neural Bayes estimator, one may use the function assess(). The resulting object can then be passed to the functions risk(), bias(), and rmse() to compute a Monte Carlo approximation of the Bayes risk, the bias, and the RMSE, or passed to the function plotestimates() to visualise the estimates against the corresponding true values:
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "NBE")
#> Running NBE...
parameter_labels <- c("θ1" = expression(mu), "θ2" = expression(sigma))
plotestimates(assessment, parameter_labels = parameter_labels)
In addition to assessing the estimator over the entire parameter space \(\Theta\), it is often helpful to visualise the empirical sampling distribution of an estimator for a particular parameter configuration. This can be done by calling assess() with \(J > 1\) data sets simulated under a single parameter configuration, and providing the resulting estimates to plotdistribution(). This function can be used to plot the empirical joint and marginal sampling distribution of the neural Bayes estimator, with the true parameter values demarcated in red.
theta <- as.matrix(c(0, 0.5)) # true parameters
J <- 400 # number of data sets
Z <- lapply(1:J, function(i) simulate(theta, m)[[1]]) # simulate data
assessment <- assess(estimator, theta, Z, estimator_names = "NBE")
#> Running NBE...
joint <- plotdistribution(assessment, type = "scatter",
parameter_labels = parameter_labels)
marginal <- plotdistribution(assessment, type = "box",
parameter_labels = parameter_labels,
return_list = TRUE)
ggarrange(plotlist = c(joint, marginal), nrow = 1, common.legend = TRUE)
Once the neural Bayes estimator has been trained and assessed, it can be applied to observed data using the function estimate(), and non-parametric bootstrap estimates can be computed using the function bootstrap(). Below, we use simulated data as a surrogate for observed data:
theta <- as.matrix(c(0, 0.5)) # true parameters
Z <- simulate(theta, m) # pretend that this is observed data
estimate(estimator, Z) # point estimate from the "observed data"
#> [,1]
#> [1,] 0.08637698
#> [2,] 0.50336009
bootstrap(estimator, Z)[, 1:6] # non-parametric bootstrap estimates
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.1543499 0.1173697 0.2904001 -0.06560165 -0.06802403 0.1969504
#> [2,] 0.6238579 0.4553554 0.5176821 0.39032412 0.33412695 0.4491137
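Since bootstrap() returns a matrix with one column per bootstrap sample (as shown above), percentile-based confidence intervals follow directly; a minimal sketch:
B <- bootstrap(estimator, Z) # matrix of bootstrap estimates (parameters x samples)
apply(B, 1, quantile, probs = c(0.025, 0.975)) # 95% percentile intervals, one column per parameter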
Suppose now that our data consist of \(m\) replicates of a \(d\)-dimensional multivariate distribution.
For unstructured \(d\)-dimensional data, the estimator is based on a multilayer perceptron (MLP), and each data set is stored as a two-dimensional array (a matrix), with the second dimension storing the independent replicates. That is, in this case, we store the data as a list of \(d \times m\) matrices, and the summary network of the DeepSets representation has \(d\) neurons in the input layer.
For example, consider the task of estimating \(\boldsymbol{\theta} \equiv (\mu_1, \mu_2, \sigma, \rho)'\) from data \(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m\) that are independent and identically distributed according to the bivariate Gaussian distribution, \[ \rm{Gau}\left(\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \rho \\ \rho & 1\end{bmatrix}\right). \] Then, to construct a neural Bayes estimator for this model, one may use the following code for defining a prior, the data simulator, and the neural-network architecture:
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
mu1 <- rnorm(K)
mu2 <- rnorm(K)
sigma <- rgamma(K, 1)
rho <- runif(K, -1, 1)
theta <- matrix(c(mu1, mu2, sigma, rho), byrow = TRUE, ncol = K)
return(theta)
}
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) {
mu <- c(theta[1], theta[2])
sigma <- theta[3]
rho <- theta[4]
Sigma <- sigma^2 * matrix(c(1, rho, rho, 1), 2, 2)
Z <- MASS::mvrnorm(m, mu, Sigma)
t(Z)
}, simplify = FALSE)
}
# Initialise the estimator
estimator <- juliaEval('
d = 2 # dimension of each replicate
w = 32 # number of neurons in each hidden layer
# Layer to ensure valid estimates
final_layer = Parallel(
vcat,
Dense(w, 2, identity), # mean parameters
Dense(w, 1, softplus), # scale parameter (sigma > 0)
Dense(w, 1, tanh) # correlation parameter
)
psi = Chain(Dense(d, w, relu), Dense(w, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), Dense(w, w, relu), final_layer)
deepset = DeepSet(psi, phi)
estimator = PointEstimator(deepset)
')
The training, assessment, and application of the neural Bayes estimator then remains as in the example above:
# Construct training and validation data sets
K <- 5000
m <- 250
theta_train <- prior(K)
theta_val <- prior(K/10)
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 50
)
#> Computing the initial validation risk... Initial validation risk = 0.69000894
#> Computing the initial training risk... Initial training risk = 0.69885015
#> Epoch: 1 Training risk: 0.435 Validation risk: 0.275 Run time of epoch: 4.549 seconds
#> Epoch: 2 Training risk: 0.229 Validation risk: 0.203 Run time of epoch: 1.959 seconds
#> Epoch: 3 Training risk: 0.2 Validation risk: 0.19 Run time of epoch: 2.043 seconds
#> Epoch: 4 Training risk: 0.187 Validation risk: 0.179 Run time of epoch: 1.85 seconds
#> Epoch: 5 Training risk: 0.177 Validation risk: 0.172 Run time of epoch: 1.9 seconds
#> Epoch: 6 Training risk: 0.169 Validation risk: 0.17 Run time of epoch: 1.966 seconds
#> Epoch: 7 Training risk: 0.164 Validation risk: 0.158 Run time of epoch: 2.004 seconds
#> Epoch: 8 Training risk: 0.158 Validation risk: 0.154 Run time of epoch: 1.854 seconds
#> Epoch: 9 Training risk: 0.155 Validation risk: 0.15 Run time of epoch: 1.83 seconds
#> Epoch: 10 Training risk: 0.151 Validation risk: 0.149 Run time of epoch: 1.976 seconds
#> Epoch: 11 Training risk: 0.148 Validation risk: 0.144 Run time of epoch: 2.002 seconds
#> Epoch: 12 Training risk: 0.145 Validation risk: 0.144 Run time of epoch: 1.792 seconds
#> Epoch: 13 Training risk: 0.143 Validation risk: 0.141 Run time of epoch: 1.943 seconds
#> Epoch: 14 Training risk: 0.141 Validation risk: 0.137 Run time of epoch: 1.937 seconds
#> Epoch: 15 Training risk: 0.14 Validation risk: 0.14 Run time of epoch: 1.895 seconds
#> Epoch: 16 Training risk: 0.137 Validation risk: 0.139 Run time of epoch: 2.153 seconds
#> Epoch: 17 Training risk: 0.135 Validation risk: 0.133 Run time of epoch: 2.124 seconds
#> Epoch: 18 Training risk: 0.133 Validation risk: 0.134 Run time of epoch: 2.119 seconds
#> Epoch: 19 Training risk: 0.133 Validation risk: 0.137 Run time of epoch: 2.113 seconds
#> Epoch: 20 Training risk: 0.132 Validation risk: 0.139 Run time of epoch: 2.128 seconds
#> Epoch: 21 Training risk: 0.131 Validation risk: 0.131 Run time of epoch: 1.965 seconds
#> Epoch: 22 Training risk: 0.129 Validation risk: 0.127 Run time of epoch: 2.11 seconds
#> Epoch: 23 Training risk: 0.128 Validation risk: 0.135 Run time of epoch: 1.872 seconds
#> Epoch: 24 Training risk: 0.126 Validation risk: 0.126 Run time of epoch: 1.981 seconds
#> Epoch: 25 Training risk: 0.125 Validation risk: 0.123 Run time of epoch: 1.942 seconds
#> Epoch: 26 Training risk: 0.125 Validation risk: 0.123 Run time of epoch: 1.986 seconds
#> Epoch: 27 Training risk: 0.123 Validation risk: 0.125 Run time of epoch: 2.112 seconds
#> Epoch: 28 Training risk: 0.123 Validation risk: 0.125 Run time of epoch: 2.197 seconds
#> Epoch: 29 Training risk: 0.122 Validation risk: 0.127 Run time of epoch: 2.251 seconds
#> Epoch: 30 Training risk: 0.121 Validation risk: 0.123 Run time of epoch: 2.417 seconds
#> Epoch: 31 Training risk: 0.12 Validation risk: 0.121 Run time of epoch: 2.061 seconds
#> Epoch: 32 Training risk: 0.12 Validation risk: 0.125 Run time of epoch: 1.852 seconds
#> Epoch: 33 Training risk: 0.12 Validation risk: 0.119 Run time of epoch: 1.814 seconds
#> Epoch: 34 Training risk: 0.118 Validation risk: 0.119 Run time of epoch: 1.78 seconds
#> Epoch: 35 Training risk: 0.117 Validation risk: 0.117 Run time of epoch: 1.908 seconds
#> Epoch: 36 Training risk: 0.117 Validation risk: 0.117 Run time of epoch: 1.844 seconds
#> Epoch: 37 Training risk: 0.116 Validation risk: 0.125 Run time of epoch: 1.951 seconds
#> Epoch: 38 Training risk: 0.118 Validation risk: 0.121 Run time of epoch: 1.785 seconds
#> Epoch: 39 Training risk: 0.116 Validation risk: 0.121 Run time of epoch: 1.859 seconds
#> Epoch: 40 Training risk: 0.115 Validation risk: 0.116 Run time of epoch: 1.912 seconds
#> Epoch: 41 Training risk: 0.115 Validation risk: 0.113 Run time of epoch: 2.029 seconds
#> Epoch: 42 Training risk: 0.116 Validation risk: 0.117 Run time of epoch: 2.093 seconds
#> Epoch: 43 Training risk: 0.114 Validation risk: 0.117 Run time of epoch: 1.894 seconds
#> Epoch: 44 Training risk: 0.115 Validation risk: 0.118 Run time of epoch: 2.014 seconds
#> Epoch: 45 Training risk: 0.115 Validation risk: 0.115 Run time of epoch: 2.109 seconds
#> Epoch: 46 Training risk: 0.115 Validation risk: 0.115 Run time of epoch: 2.017 seconds
#> Epoch: 47 Training risk: 0.115 Validation risk: 0.121 Run time of epoch: 1.814 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test,
estimator_names = "NBE",
parameter_names = c("mu1", "mu2", "sigma", "rho"))
#> Running NBE...
plotestimates(assessment)
For data collected over a regular grid, neural Bayes estimators are based on a convolutional neural network (CNN). We give specific attention to this case in a separate vignette. There, we also outline how to perform neural Bayes estimation with incomplete/missing data based on methods described by Wang et al. (2024) and Sainsbury-Dale et al. (2025b).
To cater for spatial data collected over arbitrary spatial locations, one may employ a graph neural network (GNN; see Sainsbury-Dale, Zammit-Mangion, Richards, and Huser, 2025). The overall workflow remains as given in previous examples, with two key additional steps: first, the spatial data and their locations must be stored as graphs, which is done below with the function spatialgraphlist(); second, the summary network of the DeepSets representation must be a GNN, so that it can operate on these graphical data.
For illustration, we develop a neural Bayes estimator for a spatial Gaussian process model, where \(\boldsymbol{Z} \equiv (Z_{1}, \dots, Z_{n})'\) are data collected at locations \(\{\boldsymbol{s}_{1}, \dots, \boldsymbol{s}_{n}\}\) in a spatial domain that is a subset of \(\mathbb{R}^2\). The data are modelled as spatially-correlated mean-zero Gaussian random variables with exponential covariance function, given by \[ \textrm{cov}(Z_i, Z_j) = \textrm{exp}(-\|\boldsymbol{s}_i - \boldsymbol{s}_j\|/\theta), \quad i, j = 1, \dots, n, \] with unknown range parameter \(\theta > 0\). Here, we take the spatial domain to be the unit square, we sample spatial configurations from a Matern cluster process, and we adopt a uniform prior \(\theta \sim \rm{Unif}(0, 0.4)\).
To begin, we define a function for sampling from the prior, and a function for sampling spatial configurations and simulating from the statistical model conditional on the sampled spatial configurations and parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
theta <- runif(K, max = 0.4) # draw from the prior
theta <- t(theta) # reshape to matrix
return(theta)
}
# Marginal simulation from the statistical model
# theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
library("parallel") # mclapply()
library("spatstat") # rMatClust()
simulate <- function(theta, m = 1) {
# Number of parameter vectors
K <- ncol(theta)
# Simulate spatial configurations over the unit square, with each configuration
# drawn from a Matern cluster process with different, randomly sampled hyperparameters
n <- sample(100:300, K, replace = TRUE) # approximate sample size
lambda <- runif(K, 10, 50) # intensity of parent Poisson point process
mu <- n/lambda # mean number of daughter points
r <- runif(K, 0.05, 0.2) # cluster radius
S <- lapply(1:K, function(k) {
pts <- rMatClust(lambda[k], r = r[k], mu = mu[k])
cbind(x = pts$x, y = pts$y)
})
# Simulate conditionally independent replicates for each pair of parameters
# and spatial configurations
Z <- mclapply(1:K, function(k) {
D <- as.matrix(dist(S[[k]])) # distance matrix
n <- nrow(D) # sample size
Sigma <- exp(-D/theta[k]) # covariance matrix
L <- t(chol(Sigma)) # Cholesky factor of the covariance matrix
e <- matrix(rnorm(n*m), nrow = n, ncol = m) # standard normal variates
Z <- L %*% e # conditionally independent replicates from the model
Z
})
# Define a graph from the spatial configurations and associated spatial data
G <- spatialgraphlist(S, Z)
return(G)
}
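As a quick illustrative check (assuming spatialgraphlist() returns one graph per parameter vector), one can simulate a few graphs and confirm the length of the output:
theta <- prior(3)
G <- simulate(theta)
length(G) # one graph per parameter vector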
Next, we define our GNN architecture and initialise the estimator. Here, we use an architecture tailored to isotropic spatial dependence models; for further details, see Section 2.2.1 of Sainsbury-Dale et al. (2025). In this example our goal is to construct a point estimator, however other kinds of estimators can be constructed by substituting the appropriate estimator class in the final line below:
# Initialise the estimator
estimator <- juliaEval('
# Load Julia packages for GNN functionality
using GraphNeuralNetworks
using Statistics: mean
# Spatial weight functions: continuous surrogates for 0-1 basis functions
h_max = 0.15 # maximum distance to consider
q = 10 # output dimension of the spatial weights
w = KernelWeights(h_max, q)
# Propagation module
propagation = GNNChain(
SpatialGraphConv(1 => q, relu, w = w, w_out = q),
SpatialGraphConv(q => q, relu, w = w, w_out = q)
)
# Readout module
readout = GlobalPool(mean)
# Summary network
psi = GNNSummary(propagation, readout)
# Expert summary statistics, the empirical variogram
V = NeighbourhoodVariogram(h_max, q)
# Inference network
phi = Chain(
Dense(2q => 128, relu),
Dense(128 => 128, relu),
Dense(128 => 1, softplus) # softplus activation to ensure positive estimates
)
# DeepSet object
deepset = DeepSet(psi, phi; S = V)
# Point estimator
estimator = PointEstimator(deepset)
')
Next, we train the estimator:
# Construct training and validation data sets
K <- 5000
m <- 1
theta_train <- prior(K)
theta_val <- prior(K/10)
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 10
)
#> Computing the initial validation risk... Initial validation risk = 0.4399248
#> Computing the initial training risk... Initial training risk = 0.44623217
#> Epoch: 1 Training risk: 0.048 Validation risk: 0.029 Run time of epoch: 42.912 seconds
#> Epoch: 2 Training risk: 0.026 Validation risk: 0.026 Run time of epoch: 26.98 seconds
#> Epoch: 3 Training risk: 0.025 Validation risk: 0.026 Run time of epoch: 26.99 seconds
#> Epoch: 4 Training risk: 0.024 Validation risk: 0.025 Run time of epoch: 26.394 seconds
#> Epoch: 5 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.872 seconds
#> Epoch: 6 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.353 seconds
#> Epoch: 7 Training risk: 0.022 Validation risk: 0.024 Run time of epoch: 27.15 seconds
#> Epoch: 8 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.4 seconds
#> Epoch: 9 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 27.199 seconds
#> Epoch: 10 Training risk: 0.022 Validation risk: 0.025 Run time of epoch: 26.314 seconds
Note that the computations in GNNs are performed in parallel, making them particularly well-suited to GPUs, which typically contain thousands of cores. If you have access to an NVIDIA GPU, you can utilise it by simply loading the Julia package CUDA, for example, by running juliaEval('using CUDA').
Next, we assess the trained estimator:
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "GNN NBE")
#> Running GNN NBE...
plotestimates(assessment)
Finally, once the estimator has been assessed, it may be applied to observed data:
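Mirroring the univariate example above, a minimal sketch that uses simulated data as a surrogate for observed data:
theta <- prior(1) # true parameter
Z <- simulate(theta) # pretend that this is observed data
estimate(estimator, Z) # point estimate from the "observed data"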
Various other methods are implemented in the package, and will be described in forthcoming vignettes. These topics include constructing a neural Bayes estimator for settings in which some data are censored (Richards et al., 2025); constructing posterior credible intervals using a neural Bayes estimator trained under the quantile loss function (e.g., Fisher et al., 2023; Sainsbury-Dale et al., 2025, Sec. 2.2.4); and performing full posterior or likelihood-based inference using neural likelihood-to-evidence ratios (see, e.g., Hermans et al., 2020; Walchessen et al., 2024; Zammit-Mangion et al., 2025, Sec. 5.2).
If you’d like to implement these methods before these vignettes are made available, please contact the package maintainer.