
Kenward-Roger

Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.

Model definition

The model definition is the same as in the Details of the model fitting in mmrm vignette, and we use the same notation.

Linear model

For each subject $i$ we observe a vector $Y_i = (y_{i1}, \dots, y_{im_i})^\top \in \mathbb{R}^{m_i}$ and, given a design matrix $X_i \in \mathbb{R}^{m_i \times p}$ and a corresponding coefficient vector $\beta \in \mathbb{R}^{p}$, we assume that the observations are multivariate normally distributed: $$Y_i \sim N(X_i\beta, \Sigma_i)$$ where the covariance matrix $\Sigma_i \in \mathbb{R}^{m_i \times m_i}$ is derived by subsetting the overall covariance matrix $\Sigma \in \mathbb{R}^{m \times m}$ appropriately via $$\Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}$$ where the subsetting matrix $S_i \in \{0, 1\}^{m \times m_i}$ contains in each of its $m_i$ columns a single 1 indicating which overall time point matches $t_{ih}$, and $G_i \in \mathbb{R}^{m_i \times m_i}$ is the positive diagonal weight matrix.

Conditional on the design matrices Xi, the coefficient vector β and the covariance matrix Σ we assume that the observations are independent between the subjects.

We can write the linear model for all subjects together as $$Y = X\beta + \epsilon$$ where $Y \in \mathbb{R}^N$ combines all subject-specific observation vectors $Y_i$ such that we have in total $N = \sum_{i=1}^{n} m_i$ observations, $X \in \mathbb{R}^{N \times p}$ combines all subject-specific design matrices, and $\epsilon \in \mathbb{R}^N$ has a multivariate normal distribution $$\epsilon \sim N(0, \Omega)$$ where $\Omega \in \mathbb{R}^{N \times N}$ is block-diagonal, containing the subject-specific $\Sigma_i$ covariance matrices on the diagonal and 0 in the remaining entries.
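To make the block-diagonal stacking concrete, here is a minimal numpy sketch (the helper name `assemble_omega` is our own, not part of mmrm) that places per-subject covariance blocks $\Sigma_i$ on the diagonal of $\Omega$:

```python
import numpy as np

def assemble_omega(sigmas):
    """Place the per-subject covariance blocks Sigma_i on the diagonal
    of the overall N x N covariance matrix Omega (zeros elsewhere)."""
    n_total = sum(s.shape[0] for s in sigmas)
    omega = np.zeros((n_total, n_total))
    pos = 0
    for s in sigmas:
        m_i = s.shape[0]
        omega[pos:pos + m_i, pos:pos + m_i] = s
        pos += m_i
    return omega

# two subjects with m_1 = 2 and m_2 = 3 observed visits
sigma_1 = np.array([[1.0, 0.3],
                    [0.3, 1.0]])
sigma_2 = np.eye(3)
omega = assemble_omega([sigma_1, sigma_2])  # 5 x 5, block-diagonal
```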

Mathematical Details of Kenward-Roger method

The mathematical derivation of the Kenward-Roger method is based on a Taylor expansion of the estimated covariance matrix of $\hat\beta$, to obtain a more accurate estimate of it. All derivations are based on the restricted maximum likelihood. Following the same notation, the covariance matrix $\Omega$ can be represented as a function of the covariance matrix parameters $\theta = (\theta_1, \dots, \theta_k)$, i.e. $\Omega(\theta)$. After fitting the model with mmrm, we obtain the estimate $$\hat\beta = \Phi(\hat\theta) X^\top \Omega(\hat\theta)^{-1} Y$$ where $$\Phi(\theta) = \{X^\top \Omega(\theta)^{-1} X\}^{-1}$$ is the asymptotic covariance matrix of $\hat\beta$. However, Kackar and Harville (1984) suggest that although $\hat\beta$ is unbiased for $\beta$, the covariance matrix estimate $\hat\Phi = \{X^\top \hat\Omega^{-1} X\}^{-1}$ can be biased. They showed that the variability of $\hat\beta$ can be partitioned into two components,

$$\Phi_A = \Phi + \Lambda$$

where $\Phi$ is the variance-covariance matrix of the asymptotic distribution of $\hat\beta$ as $n \to \infty$, as defined above, and $\Lambda$ represents the amount by which the asymptotic variance-covariance matrix underestimates $\Phi_A$.

Based on a Taylor series expansion around θ, Λ can be approximated by

$$\Lambda \approx \Phi \left\{\sum_{h=1}^{k} \sum_{j=1}^{k} W_{hj} (Q_{hj} - P_h \Phi P_j)\right\} \Phi$$ where $$P_h = X^\top \frac{\partial \Omega^{-1}}{\partial \theta_h} X$$ $$Q_{hj} = X^\top \frac{\partial \Omega^{-1}}{\partial \theta_h} \Omega \frac{\partial \Omega^{-1}}{\partial \theta_j} X$$

$W$ is the inverse of the Hessian matrix of the log-likelihood function of $\theta$ evaluated at the estimate $\hat\theta$, i.e. the inverse of the observed Fisher information matrix, which is a consistent estimator of the variance-covariance matrix of $\hat\theta$.

Again, based on a Taylor series expansion about $\theta$, Kenward and Roger (1997) show that $$\hat\Phi \approx \Phi + \sum_{h=1}^{k} (\hat\theta_h - \theta_h) \frac{\partial \Phi}{\partial \theta_h} + \frac{1}{2} \sum_{h=1}^{k} \sum_{j=1}^{k} (\hat\theta_h - \theta_h)(\hat\theta_j - \theta_j) \frac{\partial^2 \Phi}{\partial \theta_h \partial \theta_j}.$$ Ignoring the possible bias in $\hat\theta$, $$E(\hat\Phi) \approx \Phi + \frac{1}{2} \sum_{h=1}^{k} \sum_{j=1}^{k} W_{hj} \frac{\partial^2 \Phi}{\partial \theta_h \partial \theta_j}.$$ Using the previously defined notation, this can be further written as $$\frac{\partial^2 \Phi}{\partial \theta_h \partial \theta_j} = \Phi (P_h \Phi P_j + P_j \Phi P_h - Q_{hj} - Q_{jh} + R_{hj}) \Phi$$ where $$R_{hj} = X^\top \Omega^{-1} \frac{\partial^2 \Omega}{\partial \theta_h \partial \theta_j} \Omega^{-1} X.$$

Substituting $\Phi$ and $\Lambda$ back into $\hat\Phi_A$, we have

$$\hat\Phi_A = \hat\Phi + 2\hat\Phi \left\{\sum_{h=1}^{k} \sum_{j=1}^{k} W_{hj} \left(Q_{hj} - P_h \hat\Phi P_j - \frac{1}{4} R_{hj}\right)\right\} \hat\Phi$$

where $\Omega(\hat\theta)$ replaces $\Omega(\theta)$ on the right-hand side.

Please note that if we ignore $R_{hj}$, i.e. the terms involving second-order derivatives of $\Omega$, we obtain a different estimate of the adjusted covariance matrix; we call this the linear Kenward-Roger approximation.
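As a sketch of how the adjusted covariance formula can be evaluated once $W$, $P_h$, $Q_{hj}$ and $R_{hj}$ are available (a hypothetical numpy translation for illustration, not the actual mmrm implementation):

```python
import numpy as np

def kr_adjusted_vcov(phi, w, p, q, r):
    """Phi_A = Phi + 2 Phi { sum_{h,j} W_hj (Q_hj - P_h Phi P_j - R_hj / 4) } Phi.
    p[h] is the matrix P_h; q[h][j] is Q_hj; r[h][j] is R_hj."""
    k = len(p)
    inner = np.zeros_like(phi)
    for h in range(k):
        for j in range(k):
            inner += w[h, j] * (q[h][j] - p[h] @ phi @ p[j] - r[h][j] / 4.0)
    return phi + 2.0 * phi @ inner @ phi

# toy check with k = 1 variance parameter and 2 coefficients
phi = np.eye(2)
w = np.array([[1.0]])
p = [np.eye(2)]
q = [[2.0 * np.eye(2)]]
r = [[np.zeros((2, 2))]]
phi_a = kr_adjusted_vcov(phi, w, p, q, r)  # I + 2 I (2I - I) I = 3 I
```

Setting the $R_{hj}$ entries to zero matrices here corresponds to the linear Kenward-Roger approximation mentioned above.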

Special Considerations for mmrm models

In mmrm models, $\Omega$ is a block-diagonal matrix, hence we can calculate the contributions to $P$, $Q$ and $R$ for each subject and sum them up:

$$P_h = \sum_{i=1}^{n} P_{ih} = \sum_{i=1}^{n} X_i^\top \frac{\partial \Sigma_i^{-1}}{\partial \theta_h} X_i$$

$$Q_{hj} = \sum_{i=1}^{n} Q_{ihj} = \sum_{i=1}^{n} X_i^\top \frac{\partial \Sigma_i^{-1}}{\partial \theta_h} \Sigma_i \frac{\partial \Sigma_i^{-1}}{\partial \theta_j} X_i$$

$$R_{hj} = \sum_{i=1}^{n} R_{ihj} = \sum_{i=1}^{n} X_i^\top \Sigma_i^{-1} \frac{\partial^2 \Sigma_i}{\partial \theta_h \partial \theta_j} \Sigma_i^{-1} X_i$$
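A single subject's contributions can be sketched as follows (hypothetical numpy code for illustration; mmrm computes these internally), given $X_i$, $\Sigma_i$ and the derivatives of $\Sigma_i$, and using the identity $\partial \Sigma_i^{-1}/\partial \theta_h = -\Sigma_i^{-1} (\partial \Sigma_i/\partial \theta_h) \Sigma_i^{-1}$:

```python
import numpy as np

def subject_pqr(x_i, sigma_i, d_sig_h, d_sig_j, d2_sig_hj):
    """Contributions P_ih, Q_ihj, R_ihj of one subject, given Sigma_i and
    its first/second derivatives with respect to theta_h and theta_j."""
    s_inv = np.linalg.inv(sigma_i)
    d_inv_h = -s_inv @ d_sig_h @ s_inv  # derivative of Sigma_i^{-1}
    d_inv_j = -s_inv @ d_sig_j @ s_inv
    p_ih = x_i.T @ d_inv_h @ x_i
    q_ihj = x_i.T @ d_inv_h @ sigma_i @ d_inv_j @ x_i
    r_ihj = x_i.T @ s_inv @ d2_sig_hj @ s_inv @ x_i
    return p_ih, q_ihj, r_ihj

# toy example: identity design and covariance, dSigma/dtheta = I
p_ih, q_ihj, r_ihj = subject_pqr(
    np.eye(2), np.eye(2), np.eye(2), np.eye(2), np.zeros((2, 2))
)
```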

Derivative of the overall covariance matrix Σ

The derivative of the overall covariance matrix $\Sigma$ with respect to the variance parameters can be calculated through the derivatives of its Cholesky factor $L$ (where $\Sigma = LL^\top$), which are obtained through automatic differentiation, using the product rule: $$\frac{\partial \Sigma}{\partial \theta_h} = \frac{\partial L L^\top}{\partial \theta_h} = \frac{\partial L}{\partial \theta_h} L^\top + L \frac{\partial L^\top}{\partial \theta_h}$$

$$\frac{\partial^2 \Sigma}{\partial \theta_h \partial \theta_j} = \frac{\partial^2 L}{\partial \theta_h \partial \theta_j} L^\top + L \frac{\partial^2 L^\top}{\partial \theta_h \partial \theta_j} + \frac{\partial L}{\partial \theta_h} \frac{\partial L^\top}{\partial \theta_j} + \frac{\partial L}{\partial \theta_j} \frac{\partial L^\top}{\partial \theta_h}$$
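The product-rule identity for the first derivative can be verified numerically with a toy lower-triangular factor (illustrative code, not part of mmrm):

```python
import numpy as np

def chol_factor(theta):
    # toy Cholesky factor whose (2, 1) entry is the parameter theta
    return np.array([[1.0, 0.0],
                     [theta, 1.5]])

theta, eps = 0.4, 1e-6
big_l = chol_factor(theta)
d_l = np.array([[0.0, 0.0],
                [1.0, 0.0]])  # exact dL/dtheta for this factor

# dSigma/dtheta = dL/dtheta L' + L dL'/dtheta
analytic = d_l @ big_l.T + big_l @ d_l.T
# central finite difference of Sigma(theta) = L(theta) L(theta)'
numeric = (chol_factor(theta + eps) @ chol_factor(theta + eps).T
           - chol_factor(theta - eps) @ chol_factor(theta - eps).T) / (2 * eps)
```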

Derivative of $\Sigma^{-1}$

The derivatives of $\Sigma^{-1}$ can be calculated by differentiating the identity $\Sigma \Sigma^{-1} = I$:

$$\frac{\partial \Sigma \Sigma^{-1}}{\partial \theta_h} = \frac{\partial \Sigma}{\partial \theta_h} \Sigma^{-1} + \Sigma \frac{\partial \Sigma^{-1}}{\partial \theta_h} = 0$$ $$\frac{\partial \Sigma^{-1}}{\partial \theta_h} = -\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_h} \Sigma^{-1}$$
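This identity is easy to check numerically with a toy one-parameter covariance matrix (illustrative code, with a parameterization chosen only for this check):

```python
import numpy as np

def sigma(theta):
    # toy covariance Sigma(theta) with a single off-diagonal parameter
    return np.array([[1.0, theta],
                     [theta, 2.0]])

theta, eps = 0.3, 1e-6
d_sigma = np.array([[0.0, 1.0],
                    [1.0, 0.0]])  # exact dSigma/dtheta
s_inv = np.linalg.inv(sigma(theta))

# analytic: d(Sigma^{-1})/dtheta = -Sigma^{-1} dSigma/dtheta Sigma^{-1}
analytic = -s_inv @ d_sigma @ s_inv
# numeric: central finite difference of Sigma(theta)^{-1}
numeric = (np.linalg.inv(sigma(theta + eps))
           - np.linalg.inv(sigma(theta - eps))) / (2 * eps)
```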

Subjects with missed visits

If a subject does not have all visits, the corresponding covariance matrix can be represented as $$\Sigma_i = S_i^\top \Sigma S_i$$

and the derivatives can be obtained through

$$\frac{\partial \Sigma_i}{\partial \theta_h} = S_i^\top \frac{\partial \Sigma}{\partial \theta_h} S_i$$

$$\frac{\partial^2 \Sigma_i}{\partial \theta_h \partial \theta_j} = S_i^\top \frac{\partial^2 \Sigma}{\partial \theta_h \partial \theta_j} S_i$$

The derivative of $\Sigma_i^{-1}$, $\frac{\partial \Sigma_i^{-1}}{\partial \theta_h}$, can then be calculated from $\Sigma_i^{-1}$ and $\frac{\partial \Sigma_i}{\partial \theta_h}$ using the identity above.
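The subsetting can be illustrated as follows (illustrative numpy code; the helper `subset_matrix` is our own): for an overall schedule of 3 visits and a subject who missed the second visit,

```python
import numpy as np

def subset_matrix(visits, m):
    """S_i in {0,1}^{m x m_i}: column h has a single 1 at the overall
    time point the subject attended as their h-th visit."""
    s = np.zeros((m, len(visits)))
    for col, t in enumerate(visits):
        s[t, col] = 1.0
    return s

sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])
s_i = subset_matrix([0, 2], 3)     # subject attended visits 1 and 3
sigma_i = s_i.T @ sigma @ s_i      # 2 x 2 subset of Sigma
```

The same pre- and post-multiplication by $S_i$ applies unchanged to the first and second derivatives of $\Sigma$.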

Scenario under group specific covariance estimates

When fitting grouped mmrm models, the covariance matrix for subject $i$ of group $g(i)$ can be written as $$\Sigma_i = S_i^\top \Sigma_{g(i)} S_i.$$ Assuming there are $B$ groups, the number of variance parameters is increased $B$-fold. Since the $\theta$ parameters of each group do not affect the other groups, we obtain block-diagonal $P$, $Q$ and $R$ matrices, where the blocks are given by:

$$P_h = \begin{pmatrix} P_{h,1} \\ \vdots \\ P_{h,B} \end{pmatrix}$$

$$Q_{hj} = \begin{pmatrix} Q_{hj,1} & 0 & \cdots & 0 \\ 0 & Q_{hj,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Q_{hj,B} \end{pmatrix}$$

$$R_{hj} = \begin{pmatrix} R_{hj,1} & 0 & \cdots & 0 \\ 0 & R_{hj,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_{hj,B} \end{pmatrix}$$

Using $P_{h,b}$ to denote the block for group $b$, we have $$P_{h,b} = \sum_{g(i)=b} P_{ih} = \sum_{g(i)=b} X_i^\top \frac{\partial \Sigma_i^{-1}}{\partial \theta_h} X_i$$

$$Q_{hj,b} = \sum_{g(i)=b} Q_{ihj} = \sum_{g(i)=b} X_i^\top \frac{\partial \Sigma_i^{-1}}{\partial \theta_h} \Sigma_i \frac{\partial \Sigma_i^{-1}}{\partial \theta_j} X_i$$

$$R_{hj,b} = \sum_{g(i)=b} R_{ihj} = \sum_{g(i)=b} X_i^\top \Sigma_i^{-1} \frac{\partial^2 \Sigma_i}{\partial \theta_h \partial \theta_j} \Sigma_i^{-1} X_i$$


Scenario under weighted mmrm

Under the weighted mmrm model, the covariance matrix for subject $i$ can be represented as

$$\Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}$$

where $G_i$ is the diagonal matrix of the subject's weights. When deriving $P$, $Q$ and $R$, nothing changes mathematically because $G_i$ is constant with respect to $\theta$; having $G_i$ in addition to $S_i$ does not change the algorithm, and we can simply pre- and post-multiply the formulas by $G_i^{-1/2}$, analogously to the subsetting matrix above.
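Continuing the toy-example style (illustrative code, not mmrm internals, assuming the $G_i^{-1/2}$ convention where larger weights mean smaller variance):

```python
import numpy as np

sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
weights = np.array([2.0, 0.5])
g_inv_sqrt = np.diag(1.0 / np.sqrt(weights))  # G_i^{-1/2}
s_i = np.eye(2)                               # subject attends all visits

# the same pre-/post-multiplication applies to Sigma and its derivatives
sigma_i = g_inv_sqrt @ s_i.T @ sigma @ s_i @ g_inv_sqrt
d_sigma = np.array([[1.0, 0.0],
                    [0.0, 0.0]])              # a toy dSigma/dtheta_h
d_sigma_i = g_inv_sqrt @ s_i.T @ d_sigma @ s_i @ g_inv_sqrt
```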

Inference

Suppose we are testing the linear combination of $\beta$, $C\beta$ with $C \in \mathbb{R}^{c \times p}$. We can use the F-statistic $$F = \frac{1}{c} (\hat\beta - \beta)^\top C^\top (C \hat\Phi_A C^\top)^{-1} C (\hat\beta - \beta)$$ and the scaled statistic $F^* = \lambda F$ follows an exact $F_{c,m}$ distribution.

When we have only one coefficient to test, $C$ is a single row vector ($c = 1$) containing a single 1. We can still follow the same calculations as for the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.

$\lambda$ and $m$ can be calculated through

$$M = C^\top (C \Phi C^\top)^{-1} C$$

$$A_1 = \sum_{h=1}^{k} \sum_{j=1}^{k} W_{hj} \operatorname{tr}(M \Phi P_h \Phi) \operatorname{tr}(M \Phi P_j \Phi)$$

$$A_2 = \sum_{h=1}^{k} \sum_{j=1}^{k} W_{hj} \operatorname{tr}(M \Phi P_h \Phi M \Phi P_j \Phi)$$

$$B = \frac{1}{2c}(A_1 + 6A_2)$$

$$g = \frac{(c+1)A_1 - (c+4)A_2}{(c+2)A_2}$$

$$c_1 = \frac{g}{3c + 2(1-g)}$$

$$c_2 = \frac{c-g}{3c + 2(1-g)}$$

$$c_3 = \frac{c+2-g}{3c + 2(1-g)}$$

$$E^* = \left\{1 - \frac{A_2}{c}\right\}^{-1}$$ $$V^* = \frac{2}{c}\left\{\frac{1 + c_1 B}{(1 - c_2 B)^2 (1 - c_3 B)}\right\}$$

$$\rho = \frac{V^*}{2(E^*)^2}$$

$$m = 4 + \frac{c+2}{c\rho - 1}$$ $$\lambda = \frac{m}{E^*(m-2)}$$
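The scalar pipeline from $A_1$, $A_2$ and $c$ to $m$ and $\lambda$ can be sketched directly (hypothetical helper for illustration, not the mmrm implementation):

```python
def kr_df(a1, a2, c):
    """Degrees of freedom m and scale lambda from A1, A2 and the number
    of contrasts c, following the Kenward-Roger (1997) formulas."""
    b = (a1 + 6.0 * a2) / (2.0 * c)
    g = ((c + 1.0) * a1 - (c + 4.0) * a2) / ((c + 2.0) * a2)
    denom = 3.0 * c + 2.0 * (1.0 - g)
    c1, c2, c3 = g / denom, (c - g) / denom, (c + 2.0 - g) / denom
    e_star = 1.0 / (1.0 - a2 / c)
    v_star = (2.0 / c) * (1.0 + c1 * b) / ((1.0 - c2 * b) ** 2 * (1.0 - c3 * b))
    rho = v_star / (2.0 * e_star ** 2)
    m = 4.0 + (c + 2.0) / (c * rho - 1.0)
    lam = m / (e_star * (m - 2.0))
    return m, lam

m, lam = kr_df(0.1, 0.1, 1)  # toy A1, A2 values for a single contrast
```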

Parameterization methods and Kenward-Roger

While the Kenward-Roger adjusted covariance matrix uses a Taylor series to approximate the true value, the choice of parameterization can change the result. Take the unstructured covariance structure as a simple example: in our current approach, where the parameters are elements of the Cholesky factor of $\Sigma$ (see the parameterization vignette), the second-order derivatives of $\Sigma$ with respect to our parameters are non-zero matrices, whereas if we used the elements of $\Sigma$ itself as parameters, the second-order derivatives would be zero matrices. However, the differences are usually small and will not affect the inference. If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.

Implementations in mmrm

In the mmrm package, we have implemented the Kenward-Roger calculations based on the previous sections. Specifically, for the first-order and second-order derivatives we use automatic differentiation to obtain the results easily for non-spatial covariance structures. For spatial covariance structures, we derive the exact results.

Spatial Exponential Derivatives

For the spatial exponential covariance structure, we have

$$\theta = (\theta_1, \theta_2)$$ $$\sigma = e^{\theta_1}$$ $$\rho = \frac{e^{\theta_2}}{1 + e^{\theta_2}}$$

$$\Sigma_{ij} = \sigma \rho^{d_{ij}}$$ where $d_{ij}$ is the distance between time point $i$ and time point $j$.

So the first-order derivatives can be written as:

$$\frac{\partial \Sigma_{ij}}{\partial \theta_1} = \frac{\partial \sigma}{\partial \theta_1} \rho^{d_{ij}} = e^{\theta_1} \rho^{d_{ij}} = \Sigma_{ij}$$

$$\frac{\partial \Sigma_{ij}}{\partial \theta_2} = \sigma \frac{\partial \rho^{d_{ij}}}{\partial \theta_2} = \sigma \rho^{d_{ij}-1} d_{ij} \frac{\partial \rho}{\partial \theta_2} = \sigma \rho^{d_{ij}-1} d_{ij} \rho(1-\rho) = \sigma \rho^{d_{ij}} d_{ij} (1-\rho)$$

Second-order derivatives can be written as:

$$\frac{\partial^2 \Sigma_{ij}}{\partial \theta_1 \partial \theta_1} = \frac{\partial \Sigma_{ij}}{\partial \theta_1} = \Sigma_{ij}$$

$$\frac{\partial^2 \Sigma_{ij}}{\partial \theta_1 \partial \theta_2} = \frac{\partial^2 \Sigma_{ij}}{\partial \theta_2 \partial \theta_1} = \frac{\partial \Sigma_{ij}}{\partial \theta_2} = \sigma \rho^{d_{ij}-1} d_{ij} \rho(1-\rho) = \sigma \rho^{d_{ij}} d_{ij} (1-\rho)$$

$$\frac{\partial^2 \Sigma_{ij}}{\partial \theta_2 \partial \theta_2} = \frac{\partial\, \sigma \rho^{d_{ij}} d_{ij} (1-\rho)}{\partial \theta_2} = \sigma \rho^{d_{ij}} d_{ij} (1-\rho) \left(d_{ij}(1-\rho) - \rho\right)$$
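These closed forms can be checked against central finite differences (illustrative code with arbitrary toy values for $\theta_1$, $\theta_2$ and $d_{ij}$):

```python
import numpy as np

def sigma_ij(theta1, theta2, d):
    """Spatial exponential element: sigma * rho^d."""
    sigma = np.exp(theta1)
    rho = np.exp(theta2) / (1.0 + np.exp(theta2))
    return sigma * rho ** d

t1, t2, d = 0.2, -0.5, 1.7
eps, eps2 = 1e-6, 1e-4
rho = np.exp(t2) / (1.0 + np.exp(t2))
val = sigma_ij(t1, t2, d)

# closed-form derivatives from above
d_t1 = val                                       # w.r.t. theta_1
d_t2 = val * d * (1.0 - rho)                     # w.r.t. theta_2
d2_t2 = val * d * (1.0 - rho) * (d * (1.0 - rho) - rho)

# finite-difference counterparts
num_t1 = (sigma_ij(t1 + eps, t2, d) - sigma_ij(t1 - eps, t2, d)) / (2 * eps)
num_t2 = (sigma_ij(t1, t2 + eps, d) - sigma_ij(t1, t2 - eps, d)) / (2 * eps)
num2_t2 = (sigma_ij(t1, t2 + eps2, d) - 2 * val
           + sigma_ij(t1, t2 - eps2, d)) / eps2 ** 2
```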

References

Kackar RN, Harville DA (1984). “Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models.” Journal of the American Statistical Association, 79(388), 853–862. https://doi.org/10.1080/01621459.1984.10477102.
Kenward MG, Roger JH (1997). “Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.” Biometrics, 53(3), 983–997. https://doi.org/10.2307/2533558.