Model specification, log-likelihood, scores and second derivatives

Notation

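\boldsymbol{\beta} — column vector of coefficients (intercept \beta_0 followed by \beta_1, \ldots, \beta_k).
\boldsymbol{X} — covariate matrix with one row per observation and a leading column of ones for the intercept.
\boldsymbol{X_i} — i-th row of \boldsymbol{X}.
\boldsymbol{Y} — column vector of binary outcomes.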
k — number of covariates.
n — number of observations.
i — observation index.
j — covariate index.
y_i, x_{j, i} — observed values of Y_i and X_{j, i}, as used in the formulas below.

\begin{align}
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
\quad
\boldsymbol{X} = \begin{bmatrix}
1 & X_{1, 1} & X_{2, 1} & \ldots & X_{k, 1} \\
1 & X_{1, 2} & X_{2, 2} & \ldots & X_{k, 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1, n} & X_{2, n} & \ldots & X_{k, n} \\
\end{bmatrix}
= \begin{bmatrix} \boldsymbol{X_1} \\ \boldsymbol{X_2} \\ \vdots \\ \boldsymbol{X_n} \end{bmatrix}
\quad
\boldsymbol{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\end{align}

Model

P(Y_i = 1) = \frac{\lambda}{1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta})} = \frac{\text{exp}(\theta)}{(1 + \text{exp}(\theta))(1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}))}

\theta is the logit transformation of \lambda: \theta = \text{log}\big(\frac{\lambda}{1-\lambda}\big), so that \lambda = \frac{\text{exp}(\theta)}{1 + \text{exp}(\theta)}.

Optimisation is done on \theta rather than \lambda: \lambda must lie in (0, 1), whereas \theta can take any real value, so the likelihood can be maximised without constraints.
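The model and the \theta parameterisation can be sketched in a few lines of Python. This is only an illustration, not the implementation behind this document: the function names (`inv_logit`, `logit`, `prob_positive`) and the numeric values are hypothetical, and each row of X is assumed to start with a 1 for the intercept.

```python
import math

def inv_logit(theta):
    """Map an unconstrained theta back to lambda in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-theta))

def logit(lam):
    """theta = log(lambda / (1 - lambda)); requires 0 < lambda < 1."""
    return math.log(lam / (1.0 - lam))

def prob_positive(theta, beta, x_row):
    """P(Y_i = 1) = lambda / (1 + exp(X_i beta)).

    x_row is one row of X and starts with 1 for the intercept beta_0.
    """
    eta = sum(x * b for x, b in zip(x_row, beta))  # linear predictor X_i beta
    return inv_logit(theta) / (1.0 + math.exp(eta))

# Hypothetical values, chosen only for illustration.
theta = logit(0.8)                     # lambda = 0.8 on the unconstrained scale
p = prob_positive(theta, beta=[-1.0, 0.5], x_row=[1.0, 2.0])
# Here X_i beta = 0, so p equals lambda / 2 = 0.4 up to floating-point rounding.
```

Any real value of `theta` gives a valid probability, which is why an optimiser can move freely on that scale.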

Log likelihood

l(\theta, \boldsymbol{\beta}) = \sum_i \Big[ y_i \theta - \text{log} \big( 1+\text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big) - \text{log} \big( 1+\text{exp}(\theta) \big) + (1-y_i) \, \text{log} \Big( 1 + \text{exp} \big( \boldsymbol{X_i}\boldsymbol{\beta} \big) \big( 1 + \text{exp}(\theta) \big) \Big) \Big]
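As a sanity check, the closed form above can be compared with the Bernoulli log-likelihood computed directly from P(Y_i = 1). The sketch below does this on made-up data. The function names and the toy values are assumptions for illustration, and no care is taken over numerical overflow for large linear predictors.

```python
import math

def log_lik(theta, beta, X, y):
    """Closed-form log-likelihood l(theta, beta), summed over observations."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        eta = sum(x * b for x, b in zip(x_row, beta))  # X_i beta
        total += (
            y_i * theta
            - math.log1p(math.exp(eta))
            - math.log1p(math.exp(theta))
            + (1 - y_i) * math.log1p(math.exp(eta) * (1.0 + math.exp(theta)))
        )
    return total

def log_lik_direct(theta, beta, X, y):
    """Bernoulli log-likelihood computed straight from P(Y_i = 1)."""
    lam = 1.0 / (1.0 + math.exp(-theta))
    total = 0.0
    for x_row, y_i in zip(X, y):
        eta = sum(x * b for x, b in zip(x_row, beta))
        p = lam / (1.0 + math.exp(eta))
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Hypothetical toy data: intercept column plus one covariate.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.5], [1.0, 2.0]]
y = [1, 1, 0, 0]
theta, beta = 1.2, [-0.5, 0.8]

closed = log_lik(theta, beta, X, y)
direct = log_lik_direct(theta, beta, X, y)
# The two values agree up to floating-point rounding.
```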

Scores

\begin{align}
\begin{bmatrix} \frac{\partial l}{\partial \theta} \\ \frac{\partial l}{\partial \beta_j} \end{bmatrix}
=
\begin{bmatrix}
\sum_i \Big( y_i - \frac{\text{exp}(\theta)}{1 + \text{exp}(\theta)} + \frac{ (1 - y_i) \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \text{exp}(\theta) }{ 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) } \Big) \\
\sum_i x_{j, i} \Big( -\frac{ \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) }{ 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) } + \frac{ (1 - y_i) \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) }{ 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) } \Big) \\
\end{bmatrix}
\end{align}
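One way to check analytic scores is to compare them with central finite differences of the log-likelihood. The following sketch does that, taking the first entry as the derivative with respect to \theta (the parameter that is optimised). The function names, toy data and step size h are assumptions made for illustration only.

```python
import math

def log_lik(theta, beta, X, y):
    """Closed-form log-likelihood l(theta, beta)."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        a = math.exp(sum(x * b for x, b in zip(x_row, beta)))  # exp(X_i beta)
        t = math.exp(theta)
        total += (y_i * theta - math.log1p(a) - math.log1p(t)
                  + (1 - y_i) * math.log1p(a * (1.0 + t)))
    return total

def scores(theta, beta, X, y):
    """Return [dl/dtheta, dl/dbeta_0, ..., dl/dbeta_k]."""
    t = math.exp(theta)
    d_theta = 0.0
    d_beta = [0.0] * len(beta)
    for x_row, y_i in zip(X, y):
        a = math.exp(sum(x * b for x, b in zip(x_row, beta)))
        denom = 1.0 + a * (1.0 + t)
        d_theta += y_i - t / (1.0 + t) + (1 - y_i) * a * t / denom
        # Factor shared by every beta_j; it is multiplied by x_{j,i} below.
        common = -a / (1.0 + a) + (1 - y_i) * a * (1.0 + t) / denom
        for j, x in enumerate(x_row):
            d_beta[j] += x * common
    return [d_theta] + d_beta

def numeric_scores(theta, beta, X, y, h=1e-6):
    """Central finite differences of log_lik, used only as a check."""
    params = [theta] + list(beta)
    out = []
    for idx in range(len(params)):
        up, down = params[:], params[:]
        up[idx] += h
        down[idx] -= h
        diff = log_lik(up[0], up[1:], X, y) - log_lik(down[0], down[1:], X, y)
        out.append(diff / (2.0 * h))
    return out

# Hypothetical toy data: intercept column plus one covariate.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.5], [1.0, 2.0]]
y = [1, 1, 0, 0]
theta, beta = 1.2, [-0.5, 0.8]

analytic = scores(theta, beta, X, y)
numeric = numeric_scores(theta, beta, X, y)
max_err = max(abs(u - v) for u, v in zip(analytic, numeric))
```

A small `max_err` indicates that the analytic expressions and the log-likelihood are consistent with each other at the chosen point. It does not prove correctness everywhere.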

Second derivatives

\begin{align}
\frac{\partial^2 l}{\partial \theta^2} &= \sum_i \Big( - \frac{ \text{exp}(\theta) }{ \big( 1+\text{exp}(\theta) \big)^2 } + \frac{ (1-y_i)(1+\text{exp}(\boldsymbol{X_i}\boldsymbol{\beta})) \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \text{exp}(\theta) }{ \Big( 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) \Big)^2 } \Big) \\
\frac{\partial^2 l}{\partial \theta \, \partial \beta_j} &= \sum_i x_{j,i} \Big( \frac{ (1-y_i) \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \text{exp}(\theta) }{ \Big( 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) \Big)^2 } \Big) \\
\frac{\partial^2 l}{\partial \beta_j \, \partial \beta_{j'}} &= \sum_i x_{j,i} \, x_{j',i} \Big( - \frac{ \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) }{ \big( 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big)^2 } + \frac{ (1-y_i)(1+\text{exp}(\theta))\text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) }{ \Big( 1 + \text{exp}(\boldsymbol{X_i}\boldsymbol{\beta}) \big( 1 + \text{exp}(\theta) \big) \Big)^2 } \Big)
\end{align}

Here j' is a second covariate index. The matrix of second derivatives is symmetric, so the entry for (\beta_j, \theta) equals the entry for (\theta, \beta_j).
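The second derivatives can be checked in the same spirit by differencing the scores numerically. In the sketch below the \beta block is built from the product of two covariate values, x_{j,i} x_{j',i}, which is what differentiating the \beta_j score with respect to a second coefficient \beta_{j'} gives. All function names, the toy data and the step size are assumptions for illustration.

```python
import math

def scores(theta, beta, X, y):
    """Return [dl/dtheta, dl/dbeta_0, ..., dl/dbeta_k]."""
    t = math.exp(theta)
    out = [0.0] * (len(beta) + 1)
    for x_row, y_i in zip(X, y):
        a = math.exp(sum(x * b for x, b in zip(x_row, beta)))  # exp(X_i beta)
        denom = 1.0 + a * (1.0 + t)
        out[0] += y_i - t / (1.0 + t) + (1 - y_i) * a * t / denom
        common = -a / (1.0 + a) + (1 - y_i) * a * (1.0 + t) / denom
        for j, x in enumerate(x_row):
            out[j + 1] += x * common
    return out

def hessian(theta, beta, X, y):
    """Matrix of second derivatives; parameter order is (theta, beta_0..beta_k)."""
    t = math.exp(theta)
    size = len(beta) + 1
    H = [[0.0] * size for _ in range(size)]
    for x_row, y_i in zip(X, y):
        a = math.exp(sum(x * b for x, b in zip(x_row, beta)))
        denom2 = (1.0 + a * (1.0 + t)) ** 2
        H[0][0] += -t / (1.0 + t) ** 2 + (1 - y_i) * (1.0 + a) * a * t / denom2
        cross = (1 - y_i) * a * t / denom2                      # theta-beta term
        bb = -a / (1.0 + a) ** 2 + (1 - y_i) * (1.0 + t) * a / denom2
        for j, xj in enumerate(x_row):
            H[0][j + 1] += xj * cross
            H[j + 1][0] += xj * cross                           # symmetric entry
            for m, xm in enumerate(x_row):
                H[j + 1][m + 1] += xj * xm * bb                 # x_{j,i} * x_{j',i}
    return H

def numeric_hessian(theta, beta, X, y, h=1e-6):
    """Central finite differences of the scores, used only as a check."""
    params = [theta] + list(beta)
    size = len(params)
    N = [[0.0] * size for _ in range(size)]
    for c in range(size):
        up, down = params[:], params[:]
        up[c] += h
        down[c] -= h
        s_up = scores(up[0], up[1:], X, y)
        s_down = scores(down[0], down[1:], X, y)
        for r in range(size):
            N[r][c] = (s_up[r] - s_down[r]) / (2.0 * h)
    return N

# Hypothetical toy data: intercept column plus one covariate.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.5], [1.0, 2.0]]
y = [1, 1, 0, 0]
theta, beta = 1.2, [-0.5, 0.8]

H = hessian(theta, beta, X, y)
N = numeric_hessian(theta, beta, X, y)
max_err = max(abs(H[r][c] - N[r][c]) for r in range(3) for c in range(3))
```

Agreement between `H` and `N` at one point is a consistency check rather than a proof, but it catches most sign and index mistakes.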

References

Dunning AJ (2006). “A model for immunological correlates of protection.” Statistics in Medicine, 25(9), 1485-1497. https://doi.org/10.1002/sim.2282.