Processing math: 100%

SKAT, weights, and projections

Thomas Lumley

The original SKAT statistic for linear and generalised linear models is Q=(yˆμ)GW2G(yˆμ)=(yˆμ)K(yˆμ) where G is N×M genotype matrix, and W is a weight matrix that in practice is diagonal. I’ve changed the original notation from W to W2, because everyone basically does. The Harvard group has a factor of 1/2 somewhere in here, the BU/CHARGE group doesn’t.

When the adjustment model isn’t ordinary linear regression, there is a second weight matrix, which I’ll write Σ, giving the metric that makes yyˆμ the projection orthogonal to the range of X. That is ˆμ=(Σ1/2X(XTΣ1X)1XTΣ1/2)Y Note that both (Σ1/2X(XTΣ1X)1XTΣ1/2) and I(Σ1/2X(XTΣ1X)1XTΣ1/2) are projections.

The matrix whose eigenvalues are needed for SKAT is H=P1/20KP1/20 (or K1/2P0K1/2) where P0=V1/2[I(Σ1/2X(XTΣ1X)1XTΣ1/2)]V1/2 is the covariance matrix of the the residuals, with V=var[Y]. Usually V=Σ, but that’s not necessary.

famSKAT has test statistic Q=(yˆμ)V1GW2GV1(yˆμ)=(yˆμ)V1KV1(yˆμ) so the matrix H is H=P1/20V1KV1P1/20.

When we want to take a square root of P0 it helps a lot that the central piece is a projection, and so is idempotent: we can define Π0=[I(Σ1/2X(XTΣ1X)1XTΣ1/2)] and write P0=V1/2Π0V1/2=V1/2Π0Π0V1/2.

Now consider ˜G, with H=˜GT˜G. We can take ˜G=WGV1V1/2Π0=WGV1/2Π0 where G is sparse, W is diagonal. The projection Π0 was needed to fit the adjustment model, so it will be fast. In family data where V=Σ is based on expected relatedness from a pedigree, the Cholesky square root R=V1/2=Σ1/2 will be sparse.

Let f be the size of the largest pedigree. We should still be able to multiply a vector by ˜G in O(MNα+Nf2) time where α1 depends on the sparseness of G. If so, we can compute the leading-eigenvalue approximation in O(MNkα+Nkf2) time. (In fact, we can replace f2 by the average of the squares of number of relatives for everyone in the sample)

The relevant bits of the code, the functions that multiply by ˜G and ˜GT, look like

CholSigma<-t(chol(SIGMA))
Z<-nullmodel$x
qr<-qr(as.matrix(solve(CholSigma,Z)))
rval <- list(
    mult = function(X) {
      base::qr.resid(qr,as.matrix(solve(CholSigma,(spG %*% X))))
        }, 
    tmult = function(X) {
      crossprod(spG, solve(t(CholSigma), base::qr.resid(qr,X)))
    })