Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.
The model definition is the same as what we have in Details of the model fitting in
mmrm
. We are using the same notations.
For each subject i we observe a vector Yi=(yi1,…,yimi)⊤∈Rmi and given a design matrix Xi∈Rmi×p and a corresponding coefficient vector β∈Rp we assume that the observations are multivariate normal distributed: Yi∼N(Xiβ,Σi) where the covariance matrix Σi∈Rmi×mi is derived by subsetting the overall covariance matrix Σ∈Rm×m appropriately by Σi=G−1/2iS⊤iΣSiG−1/2i where the subsetting matrix Si∈{0,1}m×mi contains in each of its mi columns contains a single 1 indicating which overall time point is matching tih. Gi∈Rmi×mi>0 is the diagonal weight matrix.
Conditional on the design matrices Xi, the coefficient vector β and the covariance matrix Σ we assume that the observations are independent between the subjects.
We can write the linear model for all subjects together as Y=Xβ+ϵ where Y∈RN combines all subject specific observations vectors Yi such that we have in total N=∑ni=1mi observations, X∈RN×p combines all subject specific design matrices and ϵ∈RN has a multivariate normal distribution ϵ∼N(0,Ω) where Ω∈RN×N is block-diagonal containing the subject specific Σi covariance matrices on the diagonal and 0 in the remaining entries.
The mathematical derivation of the Kenward-Roger method is based on
the Taylor expansion of the obtained covariance matrix of ˆβ to get a more accurate estimate
for it. All these derivations are based on the restricted maximum
likelihood. Following the same notation, the
covariance matrix, Ω can be
represented as a function of covariance matrix parameters θ=(θ1,…,θk)⊤, i.e. Ω(θ). Here after model fitting
with mmrm
, we obtain the estimate ˆβ=Φ(ˆθ)X⊤Ω(ˆθ)−1Y, where Φ(θ)={X⊤Ω(θ)−1X}−1 is the asymptotic covariance matrix of ˆβ. However, Kackar and Harville (1984) suggests that
although the ˆβ is unbiased
for β, the covariance matrix,
ˆΦ={X⊤ˆΩX}−1 can be biased. They showed that the variability
of ˆβ can be partitioned
into two components,
ΦA=Φ+Λ
where Φ is the variance-covariance matrix of the asymptotic distribution of ˆβ as n→∞ as defined above, and Λ represents the amount to which the asymptotic variance-covariance matrix underestimates ΦA.
Based on a Taylor series expansion around θ, Λ can be approximated by
Λ≃Φ{k∑h=1k∑j=1Whj(Qhj−PhΦPj)}Φ where Ph=X⊤∂Ω−1∂θhX Qhj=X⊤∂Ω−1∂θhΩ∂Ω−1∂θjX
W is the inverse of the Hessian matrix of the log-likelihood function of θ evaluated at the estimate ˆθ, i.e. the observed Fisher Information matrix as a consistent estimator of the variance-covariance matrix of ˆθ.
Again, based on a Taylor series expansion about θ, Kenward and Roger (1997) show that ˆΦ≃Φ+k∑h=1(ˆθh−θh)∂Φ∂θh+12k∑h=1k∑j=1(ˆθh−θh)(ˆθj−θj)∂2Φ∂θh∂θj Ignoring the possible bias in ˆθ, E(ˆΦ)≃Φ+12k∑h=1k∑j=1Whj∂2Φ∂θh∂θj Using previously defined notations, this can be further written as ∂2Φ∂θh∂θj=Φ(PhΦPj+PjΦPh−Qhj−Qjh+Rhj)Φ where Rhj=X⊤Ω−1∂2Ω∂θh∂θjΩ−1X
substituting Φ and Λ back to the ˆΦA, we have
ˆΦA=ˆΦ+2ˆΦ{k∑h=1k∑j=1Whj(Qhj−PhˆΦPj−14Rhj)}ˆΦ
where Ω(ˆθ) replaces Ω(θ) in the right-hand side.
Please note that, if we ignore Rhj, the second-order derivatives, we will get a different estimate of adjusted covariance matrix, and we call this the linear Kenward-Roger approximation.
In mmrm models, Ω is a block-diagonal matrix, hence we can calculate P, Q and R for each subject and add them up.
Ph=N∑i=1Pih=N∑i=1X⊤i∂Σ−1i∂θhXi
Qhj=N∑i=1Qihj=N∑i=1X⊤i∂Σ−1i∂θhΣi∂Σ−1i∂θjXi
Rhj=N∑i=1Rihj=N∑i=1X⊤iΣ−1i∂2Σi∂θh∂θjΣ−1iXi
The derivative of the overall covariance matrix Σ with respect to the variance parameters can be calculated through the derivatives of the Cholesky factor, and hence obtained through automatic differentiation, following matrix identities calculations. ∂Σ∂θh=∂LL⊤∂θh=∂L∂θhL⊤+L∂L⊤∂θh
∂2Σ∂θh∂θj=∂2L∂θh∂θjL⊤+L∂2L⊤∂θh∂θj+∂L∂θh∂LT∂θj+∂L∂θj∂L⊤∂θh
The derivatives of Σ−1 can be calculated through
∂ΣΣ−1∂θh=∂Σ∂θhΣ−1+Σ∂Σ−1∂θh=0 ∂Σ−1∂θh=−Σ−1∂Σ∂θhΣ−1
If a subject do not have all visits, the corresponding covariance matrix can be represented as Σi=S⊤iΣSi
and the derivatives can be obtained through
∂Σi∂θh=S⊤i∂Σ∂θhSi
∂2Σi∂θh∂θj=S⊤i∂2Σ∂θh∂θjSi
The derivative of the Σ−1i, ∂Σ−1i∂θh can be calculated through Σ−1i and ∂Σi∂θh using the above.
When fitting grouped mmrm
models, the covariance matrix
for subject i of group g(i), can be
written as Σi=S⊤iΣg(i)Si$. Assume there are B
groups, the number of parameters is increased by B times. With the fact that for each
group, the corresponding θ
will not affect other parts, we will have block-diagonal P, Q
and R matrices, where the blocks
are given by:
Ph=(Ph,1…Ph,B)
Qhj=(Qhj,10……00Qhj,20…0⋮⋱⋮⋮⋱⋮0……0Qhj,B)
Rhj=(Rhj,10……00Rhj,20…0⋮⋱⋮⋮⋱⋮0……0Rhj,B)
Use Pj,b to denote the block diagonal part for group b, we have Ph,b=∑g(i)=bPih=∑g(i)=bX⊤i∂Σ−1i∂θhXi
Qhj,b=∑g(i)=bQihj=∑g(i)=bX⊤i∂Σ−1i∂θhΣi∂Σ−1i∂θjXi
Rhj,b=∑g(i)=bRihj=∑g(i)=bX⊤iΣ−1i∂2Σi∂θh∂θjΣ−1iXi
Similarly for R.
Under weights mmrm model, the covariance matrix for subject i, can be represented as
Σi=G−1/2iS⊤iΣSiG−1/2i
Where Gi is a diagonal matrix of the weights. Then, when deriving P, Q and R, there are no mathematical differences as they are constant, and having Gi in addition to Si does not change the algorithms and we can simply multiply the formulas with G−1/2i, similarly as above for the subsetting matrix.
Suppose we are testing the linear combination of β, Cβ with C∈Rc×p, we can use the following F-statistic F=1c(ˆβ−β)⊤C(C⊤ˆΦAC)−1C⊤(ˆβ−β) and F∗=λF follows exact Fc,m distribution.
When we have only one coefficient to test, then C is a column vector containing a single 1 inside. We can still follow the same calculations as for the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.
λ and m can be calculated through
M=C(C⊤ΦC)−1C⊤
A1=k∑h=1k∑j=1Whjtr(MΦPhΦ)tr(MΦPjΦ)
A2=k∑h=1k∑j=1Whjtr(MΦPhΦMΦPjΦ)
B=12c(A1+6A2)
g=(c+1)A1−(c+4)A2(c+2)A2
c1=g3c+2(1−g)
c2=c−g3c+2(1−g)
c3=c+2−g3c+2(1−g) E∗={1−A2c}−1 V∗=2c{1+c1B(1−c2B)2(1−c3B)}
ρ=V∗2(E∗)2
m=4+c+2cρ−1 λ=mE∗(m−2)
While the Kenward-Roger adjusted covariance matrix is adopting a Taylor series to approximate the true value, the choices of parameterization can change the result. In a simple example of unstructured covariance structure, in our current approach, where the parameters are elements of the Cholesky factor of Σ (see parameterization), the second-order derivatives of Σ over our parameters, are non-zero matrices. However, if we use the elements of Σ as our parameters, then the second-order derivatives are zero matrices. However, the differences can be small and will not affect the inference. If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.
mmrm
In package mmrm
, we have implemented Kenward-Roger
calculations based on the previous sections. Specially, for the
first-order and second-order derivatives, we use automatic
differentiation to obtain the results easily for non-spatial covariance
structure. For spatial covariance structure, we derive the exact
results.
For spatial exponential covariance structure, we have
θ=(θ1,θ2) σ=eθ1 ρ=eθ21+eθ2
Σij=σρdij where dij is the distance between time point i and time point j.
So the first-order derivatives can be written as:
∂Σij∂θ1=∂σ∂θ1ρdij=eθ1ρdij=Σij
∂Σij∂θ2=σ∂ρdij∂θ2=σρdij−1dij∂ρ∂θ2=σρdij−1dijρ(1−ρ)=σρdijdij(1−ρ)
Second-order derivatives can be written as:
∂2Σij∂θ1∂θ1=∂Σij∂θ1=Σij
∂2Σij∂θ1∂θ2=∂2Σij∂θ2∂θ1=∂Σij∂θ2=σρdij−1dijρ(1−ρ)=σρdijdij(1−ρ)
∂2Σij∂θ2∂θ2=∂σρdijdij(1−ρ)∂θ2=σρdijdij(1−ρ)(dij(1−ρ)−ρ)