Processing math: 0%

MCMC scheme for posterior inference of Gaussian DAG models: the learn_DAG() function

library(BCDAG)

This is the second of a series of three vignettes for the R package BCDAG. In this vignette we focus on function learn_DAG(), which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG-parameters under the Gaussian assumption.

Model description

The underlying Bayesian Gaussian DAG-model can be summarized as follows: \begin{eqnarray} X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\ (\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\ p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}} \end{eqnarray}

In particular \mathcal{D}=(V,E) denotes a DAG structure with set of nodes V=\{1,\dots,q\} and set of edges E\subseteq V \times V. Moreover, (\boldsymbol L, \boldsymbol D) are model parameters providing the decomposition of the precision (inverse-covariance) matrix \boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top; specifically, \boldsymbol{L} is a (q, q) matrix of coefficients such that for each (u, v)-element \boldsymbol{L}_{uv} with u \ne v, \boldsymbol{L}_{uv} \ne 0 if and only if (u, v) \in E, while \boldsymbol{L}_{uu} = 1 for each u = 1,\dots, q; also, \boldsymbol{D} is a (q, q) diagonal matrix with (u, u)-element \boldsymbol{D}_{uu}.

The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti & Mascaro (2021).

Conditionally to \mathcal{D}, a prior to (\boldsymbol{L}, \boldsymbol{D}) is assigned through a compatible DAG-Wishart distribution with rate hyperparameter \boldsymbol{U}, a (q,q) s.p.d. matrix, and shape hyperparameter \boldsymbol{a}^{\mathcal {D}}_{c}, a (q,1) vector; see also Cao et al. (2019) and Peluso & Consonni (2020).

Finally, a prior on DAG \mathcal{D} is assigned through a Binomial distribution on the number of edges in the graph; in p(\mathcal{D}), w \in (0,1) is a prior probability of edge inclusion, while |\mathcal{S_{\mathcal{D}}}| denotes the number of edges in \mathcal{D}; see again Castelletti & Mascaro (2021) for further details.

Target of the MCMC scheme is therefore the joint posterior of (\boldsymbol{L},\boldsymbol{D},\mathcal{D}), \begin{equation} p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D}) \,p(\mathcal{D}), \end{equation} where \boldsymbol{X} denotes a (n,q) data matrix as obtained through n i.i.d. draws from the Gaussian DAG-model and p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D}) is the likelihood function. See also Castelletti & Mascaro (2022+) for full details.

Generating data

We first randomly generate a DAG \mathcal{D}, the DAG parameters (\boldsymbol{L},\boldsymbol{D}) and then n=1000 i.i.d. observations from a Gaussian DAG-model as follows:

set.seed(1)
q <- 8
w <- 0.2
DAG <- rDAG(q,w)
a <- q
U <- diag(1,q)
outDL <- rDAGWishart(n=1, DAG, a, U)
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)
Sigma <- solve(Omega)
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)

See also our vignette about data generation from Gaussian DAG-models.

learn_DAG()

Function learn_DAG() implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2012) which, at each iteration:

  1. Updates the DAG through a Metropolis-Hastings (MH) step where, given the current DAG, a new (direct successor) DAG is drawn from a suitable proposal distribution and accepted with a probability given by the MH acceptance rate (see also section A note on fast = TRUE);
  2. Samples from the posterior distribution of the (updated DAG) parameters; see also Castelletti & Consonni (2021) for more details.

We implement it as follows:

out <- learn_DAG(S = 5000, burn = 1000, data = X,
                 a, U, w, 
                 fast = FALSE, save.memory = FALSE, collapse = FALSE)

Input

Inputs of learn_DAG() correspond to three different sets of arguments:

Output

The output of learn_DAG() is an object of class bcdag:

class(out)
#> [1] "bcdag"

bcdag objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of learn_DAG(); these are stored in the attributes of the object::

str(out)
#> List of 3
#>  $ Graphs: num [1:8, 1:8, 1:5000] 0 0 0 0 1 0 0 1 0 0 ...
#>  $ L     : num [1:8, 1:8, 1:5000] 1 0 0 0 -0.0194 ...
#>  $ D     : num [1:8, 1:8, 1:5000] 0.164 0 0 0 0 ...
#>  - attr(*, "class")= chr "bcdag"
#>  - attr(*, "type")= chr "complete"
#>  - attr(*, "input")=List of 10
#>   ..$ S          : num 5000
#>   ..$ burn       : num 1000
#>   ..$ data       : num [1:1000, 1:8] -0.299 -0.351 -0.631 0.219 -0.351 ...
#>   ..$ a          : num 8
#>   ..$ U          : num [1:8, 1:8] 1 0 0 0 0 0 0 0 0 1 ...
#>   ..$ w          : num 0.2
#>   ..$ fast       : logi FALSE
#>   ..$ save.memory: logi FALSE
#>   ..$ collapse   : logi FALSE
#>   ..$ verbose    : logi TRUE

Attribute type refers to the output of learn_DAG(), whose structure depends on the choice of the arguments save.memory and collapse.

When both are set equal to FALSE, as in the previous example, the output of learn_DAG() is a complete bcdag object, collecting three (q,q,S) arrays with the DAG structures (in the form of q \times q adjacency matrices) and the DAG parameters \boldsymbol{L} and \boldsymbol{D} (both as q \times q matrices) sampled by the MCMC:

out$Graphs[,,1]
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    0    0    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    1    0    0
#> [3,]    0    0    0    0    0    0    0    1
#> [4,]    0    0    0    0    0    0    0    0
#> [5,]    1    0    0    0    0    0    0    0
#> [6,]    0    0    0    0    0    0    0    0
#> [7,]    0    1    0    1    0    0    0    0
#> [8,]    1    0    0    0    0    0    0    0
round(out$L[,,1],2)
#>       [,1]  [,2] [,3]  [,4] [,5] [,6] [,7] [,8]
#> [1,]  1.00  0.00    0  0.00    0 0.00    0 0.00
#> [2,]  0.00  1.00    0  0.00    0 0.07    0 0.00
#> [3,]  0.00  0.00    1  0.00    0 0.00    0 0.94
#> [4,]  0.00  0.00    0  1.00    0 0.00    0 0.00
#> [5,] -0.02  0.00    0  0.00    1 0.00    0 0.00
#> [6,]  0.00  0.00    0  0.00    0 1.00    0 0.00
#> [7,]  0.00 -0.02    0 -1.97    0 0.00    1 0.00
#> [8,]  0.40  0.00    0  0.00    0 0.00    0 1.00
round(out$D[,,1],2)
#>      [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8]
#> [1,] 0.16 0.00 0.00 0.00  0.00  0.0 0.00 0.00
#> [2,] 0.00 0.57 0.00 0.00  0.00  0.0 0.00 0.00
#> [3,] 0.00 0.00 0.74 0.00  0.00  0.0 0.00 0.00
#> [4,] 0.00 0.00 0.00 0.94  0.00  0.0 0.00 0.00
#> [5,] 0.00 0.00 0.00 0.00 77.94  0.0 0.00 0.00
#> [6,] 0.00 0.00 0.00 0.00  0.00  0.8 0.00 0.00
#> [7,] 0.00 0.00 0.00 0.00  0.00  0.0 0.31 0.00
#> [8,] 0.00 0.00 0.00 0.00  0.00  0.0 0.00 0.96

When collapse = TRUE and save.memory = FALSE the output of learn_DAG() is a collapsed bcdag object, consisting of a (q,q,S) array with the adjacency matrices of the DAGs sampled by the MCMC:

collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
                 a, U, w, 
                 fast = FALSE, save.memory = FALSE, collapse = TRUE)
names(collapsed_out)
#> [1] "Graphs"
class(collapsed_out)
#> [1] "bcdag"
attributes(collapsed_out)$type
#> [1] "collapsed"
collapsed_out$Graphs[,,1]
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    0    0    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    0    0    0
#> [3,]    0    0    0    0    0    0    0    0
#> [4,]    0    0    0    0    0    0    1    0
#> [5,]    1    0    0    0    0    0    0    0
#> [6,]    0    0    0    0    0    0    0    0
#> [7,]    0    0    0    0    0    0    0    1
#> [8,]    1    0    1    0    0    0    0    0

When save.memory = TRUE and collapse = FALSE, the output is a compressed bcdag object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of a vector of strings:

compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
                 a, U, w, 
                 fast = FALSE, save.memory = TRUE, collapse = FALSE)
names(compressed_out)
#> [1] "Graphs" "L"      "D"
class(compressed_out)
#> [1] "bcdag"
attributes(compressed_out)$type
#> [1] "compressed"

In such a case, we can access to the MCMC draws as:

compressed_out$Graphs[1]
#> [1] "0;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0"
compressed_out$L[1]
#> [1] "1;0;0;0;-0.020714882228288;0;0;0.389080449183579;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0.46449274423891;0;0;0;1;0;0;-1.76444008583527;-0.0201974091006512;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1"
compressed_out$D[1]
#> [1] "0.154822760388321;0;0;0;0;0;0;0;0;0.544421200660814;0;0;0;0;0;0;0;0;0.395261505201722;0;0;0;0;0;0;0;0;1.04755025995277;0;0;0;0;0;0;0;0;67.9926231564593;0;0;0;0;0;0;0;0;0.83164913327451;0;0;0;0;0;0;0;0;0.303043124888535;0;0;0;0;0;0;0;0;1.76018923453861"

In addition, we implement bd_decode, an internal function that can be used to visualize the previous objects as matrices:

BCDAG:::bd_decode(compressed_out$Graphs[1])
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    0    0    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    0    0    0
#> [3,]    0    0    0    0    0    0    0    0
#> [4,]    0    0    0    0    0    0    0    0
#> [5,]    1    0    0    0    0    0    0    0
#> [6,]    0    0    0    0    0    0    0    0
#> [7,]    0    0    0    1    0    0    0    0
#> [8,]    1    0    1    1    0    0    0    0
round(BCDAG:::bd_decode(compressed_out$L[1]),2)
#>       [,1] [,2] [,3]  [,4] [,5] [,6] [,7] [,8]
#> [1,]  1.00    0 0.00  0.00    0    0    0    0
#> [2,]  0.00    1 0.00  0.00    0    0    0    0
#> [3,]  0.00    0 1.00  0.00    0    0    0    0
#> [4,]  0.00    0 0.00  1.00    0    0    0    0
#> [5,] -0.02    0 0.00  0.00    1    0    0    0
#> [6,]  0.00    0 0.00  0.00    0    1    0    0
#> [7,]  0.00    0 0.00 -1.76    0    0    1    0
#> [8,]  0.39    0 0.46 -0.02    0    0    0    1
round(BCDAG:::bd_decode(compressed_out$D[1]),2)
#>      [,1] [,2] [,3] [,4]  [,5] [,6] [,7] [,8]
#> [1,] 0.15 0.00  0.0 0.00  0.00 0.00  0.0 0.00
#> [2,] 0.00 0.54  0.0 0.00  0.00 0.00  0.0 0.00
#> [3,] 0.00 0.00  0.4 0.00  0.00 0.00  0.0 0.00
#> [4,] 0.00 0.00  0.0 1.05  0.00 0.00  0.0 0.00
#> [5,] 0.00 0.00  0.0 0.00 67.99 0.00  0.0 0.00
#> [6,] 0.00 0.00  0.0 0.00  0.00 0.83  0.0 0.00
#> [7,] 0.00 0.00  0.0 0.00  0.00 0.00  0.3 0.00
#> [8,] 0.00 0.00  0.0 0.00  0.00 0.00  0.0 1.76

Finally, if save.memory = TRUE and collapse = TRUE, the output of learn_DAG() is a compressed and collapsed bcdag object collecting only the sampled DAGs represented as vector of strings:

comprcoll_out <- learn_DAG(S = 5000, burn = 1000, data = X,
                 a, U, w, 
                 fast = FALSE, save.memory = TRUE, collapse = TRUE)
names(comprcoll_out)
#> [1] "Graphs"
class(comprcoll_out)
#> [1] "bcdag"
attributes(comprcoll_out)$type
#> [1] "compressed and collapsed"
BCDAG:::bd_decode(comprcoll_out$Graphs[1])
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    0    0    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    1    0    0
#> [3,]    0    0    0    0    0    0    0    0
#> [4,]    1    0    0    0    0    0    0    0
#> [5,]    1    0    0    0    0    0    0    0
#> [6,]    0    0    0    0    0    0    0    0
#> [7,]    0    0    0    1    0    0    0    0
#> [8,]    1    0    1    0    0    0    0    0

A note on fast = TRUE

Step 1. of the MCMC scheme implemented by learn_DAG() updates DAG \mathcal{D} by randomly drawing a new candidate DAG \mathcal{D}' from a proposal distribution and then accepting it with probability given by the Metropolis Hastings (MH) acceptance rate; see also Castelletti & Mascaro (2021). For a given DAG \mathcal{D}, the proposal distribution q(\mathcal{D}'\,|\,\mathcal{D}) is built over the set \mathcal{O}_{\mathcal{D}} of direct successors DAGs that can be reached from \mathcal{D} by inserting, deleting or reversing a single edge in \mathcal{D}. A DAG \mathcal{D}' is then proposed uniformly from the set \mathcal{O}_{\mathcal{D}} so that q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|. Moreover, the MH rate requires to evaluate the ratio of proposals q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|, and accordingly the construction of both \mathcal{O}_{\mathcal{D}} and \mathcal{O}_{\mathcal{D}'}.

If fast = FALSE, the proposal ratio is computed exactly; this requires the enumerations of \mathcal{O}_\mathcal{D} and \mathcal{O}_{\mathcal{D}'} which may become computationally expensive, especially when q is large. However, the ratio approaches 1 as the number of variables q increases: option fast = TRUE implements such an approximation, which therefore avoids the construction of \mathcal{O}_\mathcal{D} and \mathcal{O}_{\mathcal{D}'}. A comparison between fast = FALSE and fast = TRUE in the execution of learn_DAG() produces the following results in terms of computational time:

# No approximation
time_nofast <- system.time(out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X, 
                      a, U, w, 
                      fast = FALSE, save.memory = FALSE, collapse = FALSE))
# Approximation
time_fast <- system.time(out_fast <- learn_DAG(S = 5000, burn = 1000, data = X, 
                      a, U, w, 
                      fast = TRUE, save.memory = FALSE, collapse = FALSE))
time_nofast
#>    user  system elapsed 
#>   5.760   0.040   5.815
time_fast
#>    user  system elapsed 
#>   2.310   0.008   2.322

Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:

round(get_edgeprobs(out_nofast), 2)
#>      1    2    3    4 5    6    7    8
#> 1 0.00 0.06 0.07 0.00 0 0.00 0.00 0.00
#> 2 0.05 0.00 0.02 0.03 0 0.21 0.04 0.00
#> 3 0.09 0.02 0.00 0.00 0 0.01 0.02 0.52
#> 4 0.00 0.06 0.00 0.00 0 0.01 0.52 0.00
#> 5 1.00 0.00 0.00 0.00 0 0.00 0.01 0.00
#> 6 0.03 0.26 0.00 0.00 0 0.00 0.02 0.00
#> 7 0.03 0.01 0.00 0.48 0 0.03 0.00 0.02
#> 8 1.00 0.00 0.48 0.01 0 0.00 0.03 0.00
round(get_edgeprobs(out_fast), 2)
#>      1    2    3    4    5    6    7    8
#> 1 0.00 0.04 0.01 0.02 0.00 0.04 0.01 0.00
#> 2 0.05 0.00 0.00 0.03 0.00 0.20 0.01 0.00
#> 3 0.05 0.00 0.00 0.00 0.01 0.01 0.06 0.55
#> 4 0.02 0.03 0.00 0.00 0.00 0.02 0.55 0.01
#> 5 1.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00
#> 6 0.06 0.24 0.01 0.00 0.01 0.00 0.03 0.01
#> 7 0.00 0.02 0.03 0.45 0.00 0.01 0.00 0.00
#> 8 1.00 0.00 0.45 0.02 0.00 0.04 0.00 0.00

References