SKAT, weights, and projections

The original SKAT statistic for linear and generalised linear models is

$Q=(y-\hat\mu)'GW^2G'(y-\hat\mu)=(y-\hat\mu)'K(y-\hat\mu)$ where

$G$ is

$N\times M$ genotype matrix, and

$W$ is a weight matrix that in practice is diagonal. I’ve changed the original notation from

$W$ to

$W^2$ , because everyone basically does. The Harvard group has a factor of 1/2 somewhere in here, the BU/CHARGE group doesn’t.

When the adjustment model isn’t ordinary linear regression, there is a second weight matrix, which I’ll write

$\Sigma$ , giving the metric that makes

$y\mapsto y-\hat\mu$ the projection orthogonal to the range of

$X$ . That is

$\hat\mu = \left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)Y$ Note that both

$\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)$ and

$I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)$ are projections.

The matrix whose eigenvalues are needed for SKAT is

$H=P_0^{1/2}KP_0^{1/2}$ (or

$K^{1/2}P_0K^{1/2}$ ) where

$P_0=V^{1/2}\left[I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\right]V^{1/2}$ is the covariance matrix of the the residuals, with

$V=\mathop{var}[Y]$ . Usually

$V=\Sigma$ , but that’s not necessary.

famSKAT has test statistic

$Q=(y-\hat\mu)'V^{-1}GW^2G'V^{-1}(y-\hat\mu)=(y-\hat\mu)'V^{-1}KV^{-1}(y-\hat\mu)$ so the matrix

$H$ is

$H=P_0^{1/2}V^{-1}KV^{-1}P_0^{1/2}$ .

When we want to take a square root of

$P_0$ it helps a lot that the central piece is a projection, and so is idempotent: we can define

$\Pi_0=\left[I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\right]$ and write

$P_0=V^{1/2}\Pi_0V^{1/2}=V^{1/2}\Pi_0\Pi_0V^{1/2}$ .

Now consider

$\tilde G$ , with

$H=\tilde G^T\tilde G$ . We can take

$\tilde G = WG'V^{-1}V^{1/2}\Pi_0=WG'V^{-1/2}\Pi_0$ where

$G$ is sparse,

$W$ is diagonal. The projection

$\Pi_0$ was needed to fit the adjustment model, so it will be fast. In family data where

$V=\Sigma$ is based on expected relatedness from a pedigree, the Cholesky square root

$R=V^{1/2}=\Sigma^{1/2}$ will be sparse.

Let

$f$ be the size of the largest pedigree. We should still be able to multiply a vector by

$\tilde G$ in

$O(MN\alpha+Nf^2)$ time where

$\alpha\ll 1$ depends on the sparseness of

$G$ . If so, we can compute the leading-eigenvalue approximation in

$O(MNk\alpha+Nkf^2)$ time. (In fact, we can replace

$f^2$ by the average of the squares of number of relatives for everyone in the sample)

The relevant bits of the code, the functions that multiply by

$\tilde G$ and

$\tilde G^T$ , look like

CholSigma<-t(chol(SIGMA))
Z<-nullmodel$x
qr<-qr(as.matrix(solve(CholSigma,Z)))
rval <- list(
    mult = function(X) {
      base::qr.resid(qr,as.matrix(solve(CholSigma,(spG %*% X))))
        }, 
    tmult = function(X) {
      crossprod(spG, solve(t(CholSigma), base::qr.resid(qr,X)))
    })

SKAT, weights, and projections

Thomas Lumley