The original SKAT statistic for linear and generalised linear models is Q=(y−ˆμ)′GW2G′(y−ˆμ)=(y−ˆμ)′K(y−ˆμ) where G is N×M genotype matrix, and W is a weight matrix that in practice is diagonal. I’ve changed the original notation from W to W2, because everyone basically does. The Harvard group has a factor of 1/2 somewhere in here, the BU/CHARGE group doesn’t.
When the adjustment model isn’t ordinary linear regression, there is a second weight matrix, which I’ll write Σ, giving the metric that makes y↦y−ˆμ the projection orthogonal to the range of X. That is ˆμ=(Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)Y Note that both (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2) and I−(Σ−1/2X(XTΣ−1X)−1XTΣ−1/2) are projections.
The matrix whose eigenvalues are needed for SKAT is H=P1/20KP1/20 (or K1/2P0K1/2) where P0=V1/2[I−(Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)]V1/2 is the covariance matrix of the the residuals, with V=var[Y]. Usually V=Σ, but that’s not necessary.
famSKAT
has test statistic Q=(y−ˆμ)′V−1GW2G′V−1(y−ˆμ)=(y−ˆμ)′V−1KV−1(y−ˆμ) so the matrix H is H=P1/20V−1KV−1P1/20.
When we want to take a square root of P0 it helps a lot that the central piece is a projection, and so is idempotent: we can define Π0=[I−(Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)] and write P0=V1/2Π0V1/2=V1/2Π0Π0V1/2.
Now consider ˜G, with H=˜GT˜G. We can take ˜G=WG′V−1V1/2Π0=WG′V−1/2Π0 where G is sparse, W is diagonal. The projection Π0 was needed to fit the adjustment model, so it will be fast. In family data where V=Σ is based on expected relatedness from a pedigree, the Cholesky square root R=V1/2=Σ1/2 will be sparse.
Let f be the size of the largest pedigree. We should still be able to multiply a vector by ˜G in O(MNα+Nf2) time where α≪1 depends on the sparseness of G. If so, we can compute the leading-eigenvalue approximation in O(MNkα+Nkf2) time. (In fact, we can replace f2 by the average of the squares of number of relatives for everyone in the sample)
The relevant bits of the code, the functions that multiply by ˜G and ˜GT, look like
<-t(chol(SIGMA))
CholSigma<-nullmodel$x
Z<-qr(as.matrix(solve(CholSigma,Z)))
qr<- list(
rval mult = function(X) {
::qr.resid(qr,as.matrix(solve(CholSigma,(spG %*% X))))
base
}, tmult = function(X) {
crossprod(spG, solve(t(CholSigma), base::qr.resid(qr,X)))
})