This tutorial demonstrates how to use the SEAGLE
package when the user inputs a matrix G. We’ll begin by loading the SEAGLE
package.
library(SEAGLE)
As an example, we’ll generate some synthetic data for usage in this tutorial. Let’s consider a dataset with n=5000 individuals and L=100 loci, where the first 40 are causal.
The makeSimData
function generates a covariate matrix ˜X∈Rn×3, where the first column is the all ones vector for the intercept and the second and third columns are X∼N(0,1) and E∼N(0,1), respectively. The last two columns are scaled to have 0 mean and unit variance.
The makeSimData
function additionally generates the genetic marker matrix G with synthetic haplotype data from the COSI software. Detailed procedures for generating G can be found in the accompanying journal manuscript. Finally, the makeSimData
function also generates a continuous phenotype y according to the following fixed effects model
{\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}_{GE} + {\bf e}.
Here, \boldsymbol{\gamma}_{\tilde{\bf X}} is the all ones vector of length P=3, \boldsymbol{\gamma}_{G} \in \mathbb{R}^{L}, \boldsymbol{\gamma}_{GE}\in \mathbb{R}^{L}, and {\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}_{n}). The entries of \boldsymbol{\gamma}_{G} and \boldsymbol{\gamma}_{GE} pertaining to causal loci are set to be \gamma_{G} = gammaG
and \gamma_{GE} = gammaGE
, respectively. The remaining entries of \boldsymbol{\gamma}_{G} and \boldsymbol{\gamma}_{GE} pertaining to non-causal loci are set to 0.
<- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1) dat
Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our {\bf y}, {\bf X}, {\bf E}, and {\bf G} into the prep.SEAGLE
function. The intercept = 1
parameter indicates that the first column of {\bf X} is the all ones vector for the intercept.
This preparation procedure formats the input data for the SEAGLE
function by checking the dimensions of the input data. It also pre-computes a QR decomposition for \widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n} & {\bf X} & {\bf E} \end{pmatrix}, where {\bf 1}_{n} denotes the all ones vector of length n.
<- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G) objSEAGLE
Finally, we’ll input the prepared data into the SEAGLE
function to compute the score-like test statistic T and its corresponding p-value. The init.tau
and init.sigma
parameters are the initial values for \tau and \sigma employed in the REML EM algorithm.
<- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res $T
res#> [1] 246.1886
$pv
res#> [1] 0.8441451
The score-like test statistic T for the G\timesE effect and its corresponding p-value can be found in res$T
and res$pv
, respectively.