library(serosv)
Proposed model
Penalized splines
A general model relating the prevalence to age can be written as a GLM
g(P(Yi=1|ai))=g(π(ai))=η(ai)
The linear predictor can be estimated semi-parametrically using penalized spline with truncated power basis functions of degree p and fixed knots κ1,...,κk as followed
η(ai)=β0+β1ai+...+βpapi+Σkk=1uk(ai−κk)p+
Where
(ai−κk)p+={0,ai≤κk(ai−κk)p,ai>κk
In matrix notation, the mean structure model for η(ai) becomes
η=Xβ+Zu
Where η=[η(ai)...η(aN)]T, β=[β0β1....βp]T, and u=[u1u2...uk]T are the regression with corresponding design matrices
X=[1a1a21...ap11a2a22...ap2⋮⋮⋮…⋮1aNa2N...apN],Z=[(a1−κ1)p+(a1−κ2)p+…(a1−κk)p+(a2−κ1)p+(a2−κ2)p+…(a2−κk)p+⋮⋮…⋮(aN−κ1)p+(aN−κ2)p+…(aN−κk)p+]
FOI can then be derived as
ˆλ(ai)=[^β1,2^β2ai,...,pˆβap−1i+Σkk=1pˆuk(ai−κk)p−1+]δ(ˆη(ai))
Refer to Chapter 8.2.1
Proposed approach
A first approach to fit the model is by maximizing the following penalized likelihood
ϕ−1[yT(Xβ+Zu)−1Tc(Xβ+Zu)]−12λ2[βu]TD[βu]Where:
Xβ+Zu is the linear predictor (refer to ??)
D is a known semi-definite penalty matrix (Wahba 1978), (Green and Silverman 1993)
y is the response vector
1 the unit vector, c(.) is determined by the link function used
λ is the smoothing parameter (larger values –> smoother curves)
ϕ is the overdispersion parameter and equals 1 if there is no overdispersion
Fitting data
To fit the data using the penalized likelihood framework, specify framework = "pl"
Basis function can be defined via the s
parameter, some values for s includes:
"tp"
thin plate regression splines
"cr"
cubic regression splines
"ps"
P-splines proposed by (Eilers and Marx 1996)
"ad"
for Adaptive smoothers
For more options, refer to the mgcv documentation (Wood 2017)
<- parvob19_be_2001_2003
data <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "pl")
pl $info
pl#>
#> Family: binomial
#> Link function: logit
#>
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#>
#> Estimated degrees of freedom:
#> 6.16 total = 7.16
#>
#> UBRE score: 0.1206458
plot(pl)
Refer to Chapter 8.2.2
Proposed approach
Looking back at (1), a constraint for u would be Σku2k<C for some positive value C
This is equivalent to choosing (β,u) to maximise (1) with D=diag(0,1) where 0 denotes zero vector length p+1 and 1 denotes the unit vector of length K
For a fixed value for λ this is equivalent to fitting the following generalized linear mixed model Ngo and Wand (2004)
f(y|u)=exp{ϕ−1[yT(Xβ+Zu)−c(Xβ+Zu)]+1Tc(y)},u∼N(0,G)
Thus Z is penalized by assuming the corresponding coefficients u are random effect with u∼N(0,σ2uI).
Fitting data
To fit the data using the penalized likelihood framework, specify framework = "glmm"
<- parvob19_be_2001_2003
data <- penalized_spline_model(data$age, status = data$seropositive, s = "tp", framework = "glmm")
glmm #>
#> Maximum number of PQL iterations: 20
#> iteration 1
#> iteration 2
#> iteration 3
#> iteration 4
$info$gam
glmm#>
#> Family: binomial
#> Link function: logit
#>
#> Formula:
#> spos ~ s(age, bs = s, sp = sp)
#>
#> Estimated degrees of freedom:
#> 6.45 total = 7.45
plot(glmm)