Neural estimators are neural networks that transform data into parameter point estimates. These estimators are likelihood-free and amortised, in the sense that, after an initial setup cost, inference from observed data can be made in a fraction of the time required by conventional approaches. They are also approximate Bayes estimators and, therefore, are often referred to as neural Bayes estimators (Sainsbury-Dale, Zammit-Mangion, and Huser, 2024).
The package NeuralEstimators facilitates the user-friendly development of neural Bayes estimators and related neural inferential approaches. It caters for any model for which simulation is feasible by allowing the user to implicitly define their model via simulated data. This vignette describes the R interface to the Julia version of the package, whose documentation is available here.
The goal of parametric point estimation is to estimate a \(p\)-dimensional parameter \(\boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^p\) from data \(\boldsymbol{Z} \in \mathbb{R}^n\) using an estimator, \(\hat{\boldsymbol{\theta}} : \mathbb{R}^n\to\Theta\). A ubiquitous decision-theoretic approach to the construction of estimators is based on average-risk optimality. Consider a nonnegative loss function \(L: \Theta \times \Theta \to [0, \infty)\). The Bayes risk of an estimator \(\hat{\boldsymbol{\theta}}(\cdot)\) is its loss averaged over all possible data realisations and parameter values, that is, \[ \int_\Theta \int_{\mathbb{R}^n} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{z}))f(\boldsymbol{z} \mid \boldsymbol{\theta}) \rm{d} \boldsymbol{z} \rm{d} \Pi(\boldsymbol{\theta}), \] where \(\Pi(\cdot)\) is a prior measure for \(\boldsymbol{\theta}\). Any minimiser of the Bayes risk is said to be a Bayes estimator with respect to \(L(\cdot, \cdot)\) and \(\Pi(\cdot)\).
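Writing the Bayes risk as an iterated expectation shows that this minimisation can be performed pointwise: for (almost) every data realisation \(\boldsymbol{z}\), a Bayes estimator minimises the posterior expected loss, \[ \hat{\boldsymbol{\theta}}(\boldsymbol{z}) = \underset{\boldsymbol{t} \in \Theta}{\mathrm{arg\,min}} \int_\Theta L(\boldsymbol{\theta}, \boldsymbol{t}) \rm{d}\Pi(\boldsymbol{\theta} \mid \boldsymbol{z}), \] where \(\Pi(\cdot \mid \boldsymbol{z})\) denotes the posterior distribution.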
Bayes estimators are theoretically attractive. For example, unique Bayes estimators are admissible and, under suitable regularity conditions, they are consistent and asymptotically efficient. They are also highly interpretable: Bayes estimators are functionals of the posterior distribution, and the specific functional is determined by the choice of loss function. For instance, under quadratic loss, the Bayes estimator is the posterior mean.
Despite their attractive properties, Bayes estimators are typically unavailable in closed form. A way forward is to assume a flexible parametric model for the estimator, and to optimise the parameters within that model in order to approximate the Bayes estimator. Neural networks are ideal candidates, since they are universal function approximators and extremely fast to evaluate.
Let \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})\) denote a neural point estimator, where \(\boldsymbol{\gamma}\) contains the neural-network parameters. Bayes estimators may be approximated with \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)\), where \(\boldsymbol{\gamma}^*\) solves the optimisation task, \[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} \; \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}; \boldsymbol{\gamma})), \] whose objective function is a Monte Carlo approximation of the Bayes risk made using a set \(\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}\) of parameter vectors sampled from the prior and, for each \(k\), simulated data \(\boldsymbol{Z}^{(k)} \sim f(\boldsymbol{z} \mid \boldsymbol{\theta}^{(k)})\). Note that this procedure does not involve evaluation, or knowledge, of the likelihood function, and that the optimisation task can be performed straightforwardly using back-propagation and stochastic gradient descent.
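To make the Monte Carlo objective concrete, a minimal R sketch (a hypothetical helper, not part of the package) that evaluates this objective for a matrix of true parameters and the corresponding estimates, under the mean-absolute-error loss used as the default later in this vignette:
# Hypothetical helper: Monte Carlo approximation of the Bayes risk
# theta: p x K matrix of parameters sampled from the prior
# theta_hat: p x K matrix of the corresponding neural point estimates
empirical_risk <- function(theta, theta_hat) {
  mean(abs(theta - theta_hat)) # average absolute error over parameters and samples
}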
Parameter estimation from replicated data is commonly required in statistical applications. A parsimonious architecture for such estimators is based on the so-called DeepSets representation (Zaheer et al., 2017). Suppressing dependence on neural-network parameters \(\boldsymbol{\gamma}\) for notational convenience, a neural Bayes estimator couched in the DeepSets framework has the form,
\[ \hat{\boldsymbol{\theta}}(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m) = \boldsymbol{\phi}\left(\boldsymbol{a}(\{\boldsymbol{\psi}(\boldsymbol{Z}_i) : i = 1, \dots, m\})\right), \] where \(\{\boldsymbol{Z}_i : i = 1, \dots, m\}\) are mutually conditionally independent replicates from the statistical model, \(\boldsymbol{\psi}(\cdot)\) and \(\boldsymbol{\phi}(\cdot)\) are neural networks referred to as the summary and inference networks, respectively, and \(\boldsymbol{a}(\cdot)\) is a permutation-invariant aggregation function (typically the element-wise mean). The architecture of \(\boldsymbol{\psi}(\cdot)\) depends on the structure of each \(\boldsymbol{Z}_i\): for example, a CNN is used for gridded data, and a fully-connected multilayer perceptron (MLP) for unstructured multivariate data. The architecture of \(\boldsymbol{\phi}(\cdot)\) is always an MLP.
The DeepSets representation has several motivations. First, Bayes estimators are invariant to permutations of independent replicates; estimators constructed in the DeepSets representation are guaranteed to exhibit this property. Second, the DeepSets architecture is a universal approximator for continuously differentiable permutation-invariant functions and, therefore, any Bayes estimator that is a continuously differentiable function of the data can be approximated arbitrarily well by a neural Bayes estimator in the DeepSets form. Third, estimators constructed using DeepSets may be used with an arbitrary number \(m\) of conditionally independent replicates, therefore amortising the cost of training with respect to this choice. See Sainsbury-Dale, Zammit-Mangion, and Huser (2024) for further details on the use of DeepSets in the context of neural Bayes estimation, and for a discussion on the architecture’s connection to conventional estimators.
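To convey the structure of this representation, a purely conceptual R sketch (psi and phi stand in for the summary and inference networks; here they are placeholder functions, not neural networks):
# Conceptual sketch of DeepSets: summarise each replicate, aggregate the
# summaries by the element-wise mean, then map to the parameter space
deepset_estimate <- function(Z, psi, phi) {
  summaries <- sapply(Z, psi) # psi returns a vector of summaries; one column per replicate
  phi(rowMeans(summaries))    # permutation-invariant aggregation, then inference
}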
Uncertainty quantification with neural Bayes estimators often proceeds through the bootstrap distribution. Bootstrap-based approaches are particularly attractive when nonparametric bootstrap is possible (e.g., when the data are independent replicates), or when simulation from the fitted model is fast, in which case parametric bootstrap is also computationally efficient. However, these conditions are not always met. For example, when making inference from a single spatial field, nonparametric bootstrap is not possible without breaking the spatial dependence structure, and the cost of simulation from the fitted model is often non-negligible (e.g., exact simulation from a Gaussian process model requires the factorisation of an \(n \times n\) matrix, where \(n\) is the number of spatial locations, a task that is \(O(n^3)\) in computational complexity). Further, although bootstrap-based methods for uncertainty quantification are often considered fairly accurate and preferable to methods based on asymptotic normality, there are situations where bootstrap procedures are unreliable (see, e.g., Canty et al., 2006, pg. 6).
Alternatively, by leveraging ideas from (Bayesian) quantile regression, one may construct a neural Bayes estimator that approximates a set of marginal posterior quantiles (Fisher et al., 2023; Sainsbury-Dale et al., 2025, Sec. 2.2.4), which can then be used to construct credible intervals for each parameter. Inference then remains fully amortised since, once the estimators are trained, both point estimates and credible intervals can be obtained with virtually zero computational cost.
Posterior quantiles can be targeted by employing the quantile loss function which, for a single parameter \(\theta\), is \[ L_\tau(\theta, \hat{\theta}) = (\hat{\theta} - \theta)(\mathbb{I}(\hat{\theta} > \theta) - \tau), \quad \tau \in (0, 1), \] where \(\mathbb{I}(\cdot)\) denotes the indicator function. In particular, the Bayes estimator under the above loss function is the posterior \(\tau\)-quantile. When there are \(p > 1\) parameters, \(\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)'\), the Bayes estimator under the joint loss \({L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \sum_{k=1}^p L_\tau(\theta_k, \hat{\theta}_k)}\) is the vector of marginal posterior quantiles since, in general, a Bayes estimator under a sum of univariate loss functions is given by the vector of marginal Bayes estimators (Sainsbury-Dale et al., 2025, Thm. 2).
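For concreteness, the quantile (pinball) loss above translates directly into R; a minimal sketch:
# Quantile loss for a single parameter
# theta: true value; theta_hat: estimate; tau: probability level in (0, 1)
quantile_loss <- function(theta, theta_hat, tau) {
  (theta_hat - theta) * ((theta_hat > theta) - tau)
}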
Neural Bayes estimators are conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. They also have marked practical appeal, as the general workflow for their construction is only loosely connected to the statistical model being considered. The workflow is as follows: (i) sample parameter vectors from the prior; (ii) simulate data from the model conditional on the sampled parameters; (iii) design a neural-network architecture suited to the structure of the data; (iv) train the network by minimising the Monte Carlo approximation of the Bayes risk given above; and (v) assess the trained estimator with simulated test data before applying it to observed data.
The package NeuralEstimators is designed to implement this workflow in a user-friendly manner, as we illustrate in the following examples.
We now consider several examples where the data are univariate (Section 2.1), multivariate and unstructured (Section 2.2), collected over a regular grid (Section 2.3), or collected over arbitrary spatial locations (Section 2.4).
A note on data format: The following general rules regarding assumed data format apply: sampled parameter vectors are stored as \(p \times K\) matrices, with \(p\) the number of parameters and \(K\) the number of sampled parameter vectors; simulated data are stored as a list of length \(K\), with each element containing one data set simulated conditional on one parameter vector; and, within each data set, the independent replicates are stored in the final dimension of the array.
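To make these conventions concrete, a minimal illustration with hypothetical values \(p = 2\), \(K = 3\), and \(m = 5\) univariate replicates per data set:
p <- 2; K <- 3; m <- 5
theta <- matrix(rnorm(p * K), nrow = p) # parameters: p x K matrix
Z <- lapply(1:K, function(k) matrix(rnorm(m), nrow = 1)) # data: list of K data sets, each a 1 x m matrix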
Before proceeding, we load the required packages (see the README for instructions on installing Julia and the Julia packages NeuralEstimators and Flux). If you have access to an NVIDIA graphics processing unit (GPU), you can utilise it by simply adding CUDA to the list of Julia packages in the final line below:
library("NeuralEstimators")
library("JuliaConnectoR")
library("ggplot2")
library("ggpubr")
juliaEval('using NeuralEstimators, Flux')
#> Starting Julia ...
We first develop a neural Bayes estimator for \(\boldsymbol{\theta} \equiv (\mu, \sigma)'\) from data \(Z_1, \dots, Z_m\) that are independent and identically distributed according to a \(\rm{Gau}(\mu, \sigma^2)\) distribution.
First, we sample parameters from the prior \(\Pi(\cdot)\) to construct parameter sets used for training and validating the estimator. Here, we use the priors \(\mu \sim \rm{Gau}(0, 1)\) and \(\sigma \sim \rm{Gamma}(1, 1)\), and we assume that the parameters are independent a priori. In NeuralEstimators, the sampled parameters are stored as \(p \times K\) matrices, with \(p\) the number of parameters in the model and \(K\) the number of sampled parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
mu <- rnorm(K)
sigma <- rgamma(K, 1)
Theta <- matrix(c(mu, sigma), byrow = TRUE, ncol = K)
return(Theta)
}
K <- 10000
theta_train <- prior(K)
theta_val <- prior(K/10)
Next, we implicitly define the statistical model with simulated data. In NeuralEstimators, the simulated data are stored as a list, where each element of the list corresponds to a data set simulated conditional on one parameter vector. Since each replicate of our model is univariate, we take the summary network of the DeepSets framework to be an MLP with a single input neuron, and each simulated data set is stored as a matrix with the independent replicates in the columns (i.e., a \(1 \times m\) matrix).
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) t(rnorm(m, theta[1], theta[2])), simplify = FALSE)
}
m <- 15
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
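As a quick sanity check (purely illustrative), one can confirm that each element of the training set is a \(1 \times m\) matrix:
str(Z_train[[1]]) # each data set is a 1 x m matrix of replicates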
We now design architectures for the summary network and outer network of the DeepSets framework, and initialise our neural point estimator. This can be done using the R helper function initialise_estimator(), or using Julia code directly, as follows:
# Initialise the estimator
estimator <- juliaEval('
d = 1 # dimension of each replicate (univariate data)
p = 2 # number of parameters in the statistical model
w = 32 # number of neurons in each hidden layer
psi = Chain(Dense(d, w, relu), Dense(w, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, p))
deepset = DeepSet(psi, phi)
estimator = PointEstimator(deepset)
')
A note on long-term stability: If constructing a neural Bayes estimator for repeated and long-term use, we recommend defining the architecture using Julia code directly. This is because initialise_estimator() is subject to change, as methods for designing neural-network architectures improve over time and these improved methods are incorporated into the package.
Once the estimator is initialised, it is fitted using train(), here using the default mean-absolute-error loss. We use the sets of parameters and simulated data constructed earlier; one may alternatively use the arguments sampler and simulator to pass functions for sampling from the prior and simulating from the model, respectively. These functions can be defined in R (as we have done in this example) or in Julia using the package JuliaConnectoR, and this approach facilitates the technique known as “on-the-fly” simulation.
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 50
)
#> Computing the initial validation risk... Initial validation risk = 0.8906495
#> Computing the initial training risk... Initial training risk = 0.8892015
#> Epoch: 1 Training risk: 0.295 Validation risk: 0.222 Run time of epoch: 7.941 seconds
#> Epoch: 2 Training risk: 0.207 Validation risk: 0.201 Run time of epoch: 0.235 seconds
#> Epoch: 3 Training risk: 0.197 Validation risk: 0.188 Run time of epoch: 0.213 seconds
#> Epoch: 4 Training risk: 0.192 Validation risk: 0.191 Run time of epoch: 0.211 seconds
#> Epoch: 5 Training risk: 0.188 Validation risk: 0.188 Run time of epoch: 0.228 seconds
#> Epoch: 6 Training risk: 0.189 Validation risk: 0.186 Run time of epoch: 0.21 seconds
#> Epoch: 7 Training risk: 0.187 Validation risk: 0.185 Run time of epoch: 0.208 seconds
#> Epoch: 8 Training risk: 0.185 Validation risk: 0.183 Run time of epoch: 0.218 seconds
#> Epoch: 9 Training risk: 0.184 Validation risk: 0.184 Run time of epoch: 0.22 seconds
#> Epoch: 10 Training risk: 0.183 Validation risk: 0.182 Run time of epoch: 0.229 seconds
#> Epoch: 11 Training risk: 0.183 Validation risk: 0.178 Run time of epoch: 0.348 seconds
#> Epoch: 12 Training risk: 0.182 Validation risk: 0.181 Run time of epoch: 0.236 seconds
#> Epoch: 13 Training risk: 0.182 Validation risk: 0.178 Run time of epoch: 0.221 seconds
#> Epoch: 14 Training risk: 0.181 Validation risk: 0.186 Run time of epoch: 0.226 seconds
#> Epoch: 15 Training risk: 0.181 Validation risk: 0.178 Run time of epoch: 0.21 seconds
#> Epoch: 16 Training risk: 0.18 Validation risk: 0.189 Run time of epoch: 0.216 seconds
#> Epoch: 17 Training risk: 0.18 Validation risk: 0.179 Run time of epoch: 0.216 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
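As an aside, the alternative interface mentioned above can be sketched as follows (a hedged sketch: the arguments sampler and simulator are as described in the text, while m below is an assumed tuning argument; consult the package documentation for the exact interface):
# "On-the-fly" simulation: parameters and data are simulated during training
estimator <- train(
  estimator,
  sampler = prior, # function for sampling from the prior
  simulator = simulate, # function for simulating from the model
  m = m, # assumed argument: number of replicates per simulated data set
  epochs = 50
)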
If the argument savepath in train() is specified, the neural-network state (e.g., the weights and biases) will be saved during training as bson files, and the best state (as measured by validation risk) will be saved as best_network.bson. To load these saved states into a neural network in later R sessions, one may use the function loadstate(). Note that one may also manually save a trained estimator using savestate().
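For illustration, a hedged sketch of manual saving and loading (the file path is hypothetical, and the exact call signatures should be checked against the package documentation):
savestate(estimator, "estimator_state.bson") # save trained state (hypothetical path)
estimator <- loadstate(estimator, "estimator_state.bson") # restore in a later R session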
Once a neural Bayes estimator has been trained, its performance should be assessed. Simulation-based (empirical) methods for evaluating the performance of a neural Bayes estimator are ideal, since simulation is already required for constructing the estimator, and because the estimator can be applied to thousands of simulated data sets at almost no computational cost.
To assess the accuracy of the resulting neural Bayes estimator, one may use the function assess(). The resulting object can then be passed to the functions risk(), bias(), and rmse() to compute a Monte Carlo approximation of the Bayes risk, the bias, and the RMSE, or passed to the function plotestimates() to visualise the estimates against the corresponding true values:
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "NBE")
#> Running NBE...
parameter_labels <- c("θ1" = expression(mu), "θ2" = expression(sigma))
plotestimates(assessment, parameter_labels = parameter_labels)
In addition to assessing the estimator over the entire parameter space \(\Theta\), it is often helpful to visualise the empirical sampling distribution of an estimator for a particular parameter configuration. This can be done by calling assess() with \(J > 1\) data sets simulated under a single parameter configuration, and providing the resulting estimates to plotdistribution(). This function can be used to plot the empirical joint and marginal sampling distribution of the neural Bayes estimator, with the true parameter values demarcated in red.
theta <- as.matrix(c(0, 0.5)) # true parameters
J <- 400 # number of data sets
Z <- lapply(1:J, function(i) simulate(theta, m)[[1]]) # simulate data
assessment <- assess(estimator, theta, Z, estimator_names = "NBE")
#> Running NBE...
joint <- plotdistribution(assessment, type = "scatter",
parameter_labels = parameter_labels)
marginal <- plotdistribution(assessment, type = "box",
parameter_labels = parameter_labels,
return_list = TRUE)
ggarrange(plotlist = c(joint, marginal), nrow = 1, common.legend = TRUE)
Once the neural Bayes estimator has been trained and assessed, it can be applied to observed data using the function estimate(), and non-parametric bootstrap estimates can be computed using the function bootstrap(). Below, we use simulated data as a surrogate for observed data:
theta <- as.matrix(c(0, 0.5)) # true parameters
Z <- simulate(theta, m) # pretend that this is observed data
estimate(estimator, Z) # point estimate from the "observed data"
#> [,1]
#> [1,] 0.08637698
#> [2,] 0.50336009
bootstrap(estimator, Z)[, 1:6] # non-parametric bootstrap estimates
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.1543499 0.1173697 0.2904001 -0.06560165 -0.06802403 0.1969504
#> [2,] 0.6238579 0.4553554 0.5176821 0.39032412 0.33412695 0.4491137
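Since bootstrap() returns a matrix with one column per bootstrap sample (as shown above), percentile-based confidence intervals follow directly; a minimal sketch:
B <- bootstrap(estimator, Z) # matrix of bootstrap estimates (parameters x samples)
apply(B, 1, quantile, probs = c(0.025, 0.975)) # 95% percentile intervals, one column per parameter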
Suppose now that our data consist of \(m\) replicates of a \(d\)-dimensional multivariate distribution.
For unstructured \(d\)-dimensional data, the estimator is based on a multilayer perceptron (MLP), and each data set is stored as a two-dimensional array (a matrix), with the second dimension storing the independent replicates. That is, in this case, we store the data as a list of \(d \times m\) matrices, and the summary network of the DeepSets representation has \(d\) neurons in the input layer.
For example, consider the task of estimating \(\boldsymbol{\theta} \equiv (\mu_1, \mu_2, \sigma, \rho)'\) from data \(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m\) that are independent and identically distributed according to the bivariate Gaussian distribution, \[ \rm{Gau}\left(\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \rho \\ \rho & 1\end{bmatrix}\right). \] Then, to construct a neural Bayes estimator for this model, one may use the following code for defining a prior, the data simulator, and the neural-network architecture:
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
mu1 <- rnorm(K)
mu2 <- rnorm(K)
sigma <- rgamma(K, 1)
rho <- runif(K, -1, 1)
theta <- matrix(c(mu1, mu2, sigma, rho), byrow = TRUE, ncol = K)
return(theta)
}
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) {
mu <- c(theta[1], theta[2])
sigma <- theta[3]
rho <- theta[4]
Sigma <- sigma^2 * matrix(c(1, rho, rho, 1), 2, 2)
Z <- MASS::mvrnorm(m, mu, Sigma)
t(Z)
}, simplify = FALSE)
}
# Initialise the estimator
estimator <- juliaEval('
d = 2 # dimension of each replicate
w = 32 # number of neurons in each hidden layer
# Layer to ensure valid estimates
final_layer = Parallel(
vcat,
Dense(w, 2, identity), # mean parameters
Dense(w, 1, softplus), # scale parameter (sigma > 0)
Dense(w, 1, tanh) # correlation parameter
)
psi = Chain(Dense(d, w, relu), Dense(w, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), Dense(w, w, relu), final_layer)
deepset = DeepSet(psi, phi)
estimator = PointEstimator(deepset)
')
The training, assessment, and application of the neural Bayes estimator then remains as in the example above:
# Construct training and validation data sets
K <- 5000
m <- 250
theta_train <- prior(K)
theta_val <- prior(K/10)
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 50
)
#> Computing the initial validation risk... Initial validation risk = 0.69000894
#> Computing the initial training risk... Initial training risk = 0.69885015
#> Epoch: 1 Training risk: 0.435 Validation risk: 0.275 Run time of epoch: 4.549 seconds
#> Epoch: 2 Training risk: 0.229 Validation risk: 0.203 Run time of epoch: 1.959 seconds
#> Epoch: 3 Training risk: 0.2 Validation risk: 0.19 Run time of epoch: 2.043 seconds
#> Epoch: 4 Training risk: 0.187 Validation risk: 0.179 Run time of epoch: 1.85 seconds
#> Epoch: 5 Training risk: 0.177 Validation risk: 0.172 Run time of epoch: 1.9 seconds
#> Epoch: 6 Training risk: 0.169 Validation risk: 0.17 Run time of epoch: 1.966 seconds
#> Epoch: 7 Training risk: 0.164 Validation risk: 0.158 Run time of epoch: 2.004 seconds
#> Epoch: 8 Training risk: 0.158 Validation risk: 0.154 Run time of epoch: 1.854 seconds
#> Epoch: 9 Training risk: 0.155 Validation risk: 0.15 Run time of epoch: 1.83 seconds
#> Epoch: 10 Training risk: 0.151 Validation risk: 0.149 Run time of epoch: 1.976 seconds
#> Epoch: 11 Training risk: 0.148 Validation risk: 0.144 Run time of epoch: 2.002 seconds
#> Epoch: 12 Training risk: 0.145 Validation risk: 0.144 Run time of epoch: 1.792 seconds
#> Epoch: 13 Training risk: 0.143 Validation risk: 0.141 Run time of epoch: 1.943 seconds
#> Epoch: 14 Training risk: 0.141 Validation risk: 0.137 Run time of epoch: 1.937 seconds
#> Epoch: 15 Training risk: 0.14 Validation risk: 0.14 Run time of epoch: 1.895 seconds
#> Epoch: 16 Training risk: 0.137 Validation risk: 0.139 Run time of epoch: 2.153 seconds
#> Epoch: 17 Training risk: 0.135 Validation risk: 0.133 Run time of epoch: 2.124 seconds
#> Epoch: 18 Training risk: 0.133 Validation risk: 0.134 Run time of epoch: 2.119 seconds
#> Epoch: 19 Training risk: 0.133 Validation risk: 0.137 Run time of epoch: 2.113 seconds
#> Epoch: 20 Training risk: 0.132 Validation risk: 0.139 Run time of epoch: 2.128 seconds
#> Epoch: 21 Training risk: 0.131 Validation risk: 0.131 Run time of epoch: 1.965 seconds
#> Epoch: 22 Training risk: 0.129 Validation risk: 0.127 Run time of epoch: 2.11 seconds
#> Epoch: 23 Training risk: 0.128 Validation risk: 0.135 Run time of epoch: 1.872 seconds
#> Epoch: 24 Training risk: 0.126 Validation risk: 0.126 Run time of epoch: 1.981 seconds
#> Epoch: 25 Training risk: 0.125 Validation risk: 0.123 Run time of epoch: 1.942 seconds
#> Epoch: 26 Training risk: 0.125 Validation risk: 0.123 Run time of epoch: 1.986 seconds
#> Epoch: 27 Training risk: 0.123 Validation risk: 0.125 Run time of epoch: 2.112 seconds
#> Epoch: 28 Training risk: 0.123 Validation risk: 0.125 Run time of epoch: 2.197 seconds
#> Epoch: 29 Training risk: 0.122 Validation risk: 0.127 Run time of epoch: 2.251 seconds
#> Epoch: 30 Training risk: 0.121 Validation risk: 0.123 Run time of epoch: 2.417 seconds
#> Epoch: 31 Training risk: 0.12 Validation risk: 0.121 Run time of epoch: 2.061 seconds
#> Epoch: 32 Training risk: 0.12 Validation risk: 0.125 Run time of epoch: 1.852 seconds
#> Epoch: 33 Training risk: 0.12 Validation risk: 0.119 Run time of epoch: 1.814 seconds
#> Epoch: 34 Training risk: 0.118 Validation risk: 0.119 Run time of epoch: 1.78 seconds
#> Epoch: 35 Training risk: 0.117 Validation risk: 0.117 Run time of epoch: 1.908 seconds
#> Epoch: 36 Training risk: 0.117 Validation risk: 0.117 Run time of epoch: 1.844 seconds
#> Epoch: 37 Training risk: 0.116 Validation risk: 0.125 Run time of epoch: 1.951 seconds
#> Epoch: 38 Training risk: 0.118 Validation risk: 0.121 Run time of epoch: 1.785 seconds
#> Epoch: 39 Training risk: 0.116 Validation risk: 0.121 Run time of epoch: 1.859 seconds
#> Epoch: 40 Training risk: 0.115 Validation risk: 0.116 Run time of epoch: 1.912 seconds
#> Epoch: 41 Training risk: 0.115 Validation risk: 0.113 Run time of epoch: 2.029 seconds
#> Epoch: 42 Training risk: 0.116 Validation risk: 0.117 Run time of epoch: 2.093 seconds
#> Epoch: 43 Training risk: 0.114 Validation risk: 0.117 Run time of epoch: 1.894 seconds
#> Epoch: 44 Training risk: 0.115 Validation risk: 0.118 Run time of epoch: 2.014 seconds
#> Epoch: 45 Training risk: 0.115 Validation risk: 0.115 Run time of epoch: 2.109 seconds
#> Epoch: 46 Training risk: 0.115 Validation risk: 0.115 Run time of epoch: 2.017 seconds
#> Epoch: 47 Training risk: 0.115 Validation risk: 0.121 Run time of epoch: 1.814 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test,
estimator_names = "NBE",
parameter_names = c("mu1", "mu2", "sigma", "rho"))
#> Running NBE...
plotestimates(assessment)
For data collected over a regular grid, neural Bayes estimators are based on a convolutional neural network (CNN). We give specific attention to this case in a separate vignette. There, we also outline how to perform neural Bayes estimation with incomplete/missing data based on methods described by Wang et al. (2024) and Sainsbury-Dale et al. (2025b).
To cater for spatial data collected over arbitrary spatial locations, one may employ a graph neural network (GNN; see Sainsbury-Dale, Zammit-Mangion, Richards, and Huser, 2025). The overall workflow remains as given in previous examples, with two key additional steps: first, the spatial data and their locations must be stored as graphs, which is done below with the function spatialgraphlist(); second, the summary network of the DeepSets representation must be a GNN, so that it can operate on these graphical data.
For illustration, we develop a neural Bayes estimator for a spatial Gaussian process model, where \(\boldsymbol{Z} \equiv (Z_{1}, \dots, Z_{n})'\) are data collected at locations \(\{\boldsymbol{s}_{1}, \dots, \boldsymbol{s}_{n}\}\) in a spatial domain that is a subset of \(\mathbb{R}^2\). The data are modelled as spatially-correlated mean-zero Gaussian random variables with exponential covariance function, given by \[ \textrm{cov}(Z_i, Z_j) = \textrm{exp}(-\|\boldsymbol{s}_i - \boldsymbol{s}_j\|/\theta), \quad i, j = 1, \dots, n, \] with unknown range parameter \(\theta > 0\). Here, we take the spatial domain to be the unit square, we sample spatial configurations from a Matern cluster process, and we adopt a uniform prior \(\theta \sim \rm{Unif}(0, 0.4)\).
To begin, we define a function for sampling from the prior, and a function for sampling spatial configurations and simulating from the statistical model conditional on the sampled spatial configurations and parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
prior <- function(K) {
theta <- runif(K, max = 0.4) # draw from the prior
theta <- t(theta) # reshape to matrix
return(theta)
}
# Marginal simulation from the statistical model
# theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
library("parallel") # mclapply()
library("spatstat") # rMatClust()
simulate <- function(theta, m = 1) {
# Number of parameter vectors
K <- ncol(theta)
# Simulate spatial configurations over the unit square, with each configuration
# drawn from a Matern cluster process with different, randomly sampled hyperparameters
n <- sample(100:300, K, replace = TRUE) # approximate sample size
lambda <- runif(K, 10, 50) # intensity of parent Poisson point process
mu <- n/lambda # mean number of daughter points
r <- runif(K, 0.05, 0.2) # cluster radius
S <- lapply(1:K, function(k) {
pts <- rMatClust(lambda[k], r = r[k], mu = mu[k])
cbind(x = pts$x, y = pts$y)
})
# Simulate conditionally independent replicates for each pair of parameters
# and spatial configurations
Z <- mclapply(1:K, function(k) {
D <- as.matrix(dist(S[[k]])) # distance matrix
n <- nrow(D) # sample size
Sigma <- exp(-D/theta[k]) # covariance matrix
L <- t(chol(Sigma)) # Cholesky factor of the covariance matrix
e <- matrix(rnorm(n*m), nrow = n, ncol = m) # standard normal variates
Z <- L %*% e # conditionally independent replicates from the model
Z
})
# Define a graph from the spatial configurations and associated spatial data
G <- spatialgraphlist(S, Z)
return(G)
}
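As a quick illustrative check (assuming spatialgraphlist() returns one graph per parameter vector), one can simulate a few graphs and confirm the length of the output:
theta <- prior(3)
G <- simulate(theta)
length(G) # one graph per parameter vector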
Next, we define our GNN architecture and initialise the estimator. Here, we use an architecture tailored to isotropic spatial dependence models; for further details, see Section 2.2.1 of Sainsbury-Dale et al. (2025). In this example our goal is to construct a point estimator, however other kinds of estimators can be constructed by substituting the appropriate estimator class in the final line below:
# Initialise the estimator
estimator <- juliaEval('
# Load Julia packages for GNN functionality
using GraphNeuralNetworks
using Statistics: mean
# Spatial weight functions: continuous surrogates for 0-1 basis functions
h_max = 0.15 # maximum distance to consider
q = 10 # output dimension of the spatial weights
w = KernelWeights(h_max, q)
# Propagation module
propagation = GNNChain(
SpatialGraphConv(1 => q, relu, w = w, w_out = q),
SpatialGraphConv(q => q, relu, w = w, w_out = q)
)
# Readout module
readout = GlobalPool(mean)
# Summary network
psi = GNNSummary(propagation, readout)
# Expert summary statistics, the empirical variogram
V = NeighbourhoodVariogram(h_max, q)
# Inference network
phi = Chain(
Dense(2q => 128, relu),
Dense(128 => 128, relu),
Dense(128 => 1, softplus) # softplus activation to ensure positive estimates
)
# DeepSet object
deepset = DeepSet(psi, phi; S = V)
# Point estimator
estimator = PointEstimator(deepset)
')
Next, we train the estimator:
# Construct training and validation data sets
K <- 5000
m <- 1
theta_train <- prior(K)
theta_val <- prior(K/10)
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 10
)
#> Computing the initial validation risk... Initial validation risk = 0.4399248
#> Computing the initial training risk... Initial training risk = 0.44623217
#> Epoch: 1 Training risk: 0.048 Validation risk: 0.029 Run time of epoch: 42.912 seconds
#> Epoch: 2 Training risk: 0.026 Validation risk: 0.026 Run time of epoch: 26.98 seconds
#> Epoch: 3 Training risk: 0.025 Validation risk: 0.026 Run time of epoch: 26.99 seconds
#> Epoch: 4 Training risk: 0.024 Validation risk: 0.025 Run time of epoch: 26.394 seconds
#> Epoch: 5 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.872 seconds
#> Epoch: 6 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.353 seconds
#> Epoch: 7 Training risk: 0.022 Validation risk: 0.024 Run time of epoch: 27.15 seconds
#> Epoch: 8 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 26.4 seconds
#> Epoch: 9 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 27.199 seconds
#> Epoch: 10 Training risk: 0.022 Validation risk: 0.025 Run time of epoch: 26.314 seconds
Note that the computations in GNNs are performed in parallel, making them particularly well-suited to GPUs, which typically contain thousands of cores. If you have access to an NVIDIA GPU, you can utilise it by simply loading the Julia package CUDA, for example, by running juliaEval('using CUDA').
Next, we assess the trained estimator:
# Assess the estimator
theta_test <- prior(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "GNN NBE")
#> Running GNN NBE...
plotestimates(assessment)
Finally, once the estimator has been assessed, it may be applied to observed data:
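Mirroring the univariate example above, a minimal sketch that uses simulated data as a surrogate for observed data:
theta <- prior(1) # true parameter
Z <- simulate(theta) # pretend that this is observed data
estimate(estimator, Z) # point estimate from the "observed data"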
Various other methods are implemented in the package, and will be described in forthcoming vignettes. These topics include constructing a neural Bayes estimator for settings in which some data are censored (Richards et al., 2025); constructing posterior credible intervals using a neural Bayes estimator trained under the quantile loss function (e.g., Fisher et al., 2023; Sainsbury-Dale et al., 2025, Sec. 2.2.4); and performing full posterior or likelihood-based inference using neural likelihood-to-evidence ratios (see, e.g., Hermans et al., 2020; Walchessen et al., 2024; Zammit-Mangion et al., 2025, Sec. 5.2).
If you’d like to implement these methods before these vignettes are made available, please contact the package maintainer.