This vignette sets out the statistics behind respondeR: the cut-point approach, each pooling method and its variance, the relative effect measures, the threshold-free common-language effect size, the standardized-mean-difference bridge, random effects, the refinement options, and the assumptions and their limits. It closes with a guide to choosing a method.
For one study arm with mean change \(\mu\), standard deviation \(\sigma\) and a minimal important difference (MID) threshold \(m\), assume the patient-level change \(X\) is Normally distributed. A responder is a patient whose change crosses the threshold. The responder probability is
\[ p = \Pr(X > m) = \Phi\!\left(\frac{\mu - m}{\sigma}\right) \quad\text{(higher change is better),} \]
or \(p = \Phi\!\left(\frac{m - \mu}{\sigma}\right)\) when a lower change is better. This is the cut-point (“dichotomization”) method reviewed by Thorlund and colleagues (2011) and detailed by Anzures-Cabrera, Sarpatwari & Higgins (2011). The between-arm contrast is then a familiar binary effect measure: by default the risk difference \(\mathrm{RD} = p_e - p_c\).
respondeR keeps proportions on the \([0, 1]\) scale internally and converts to percentages only for display.
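The cut-point calculation can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper `responder_prob()` and all numeric inputs are hypothetical.

```r
# Hypothetical summary statistics for one study (illustrative values only).
mu_e <- 2.0; sd_e <- 2.5   # experimental arm: mean change and SD
mu_c <- 1.0; sd_c <- 2.5   # control arm: mean change and SD
mid  <- 1.0                # minimal important difference (threshold m)

# Responder probability under a Normal model for the change score.
# responder_prob() is a hypothetical helper, not a respondeR function.
responder_prob <- function(mu, sigma, mid, higher_better = TRUE) {
  z <- (mu - mid) / sigma
  if (higher_better) pnorm(z) else pnorm(-z)  # pnorm(-z) = Phi((m - mu) / sigma)
}

p_e <- responder_prob(mu_e, sd_e, mid)
p_c <- responder_prob(mu_c, sd_c, mid)
rd  <- p_e - p_c            # risk difference, kept on the [0, 1] scale

round(c(p_e = p_e, p_c = p_c, rd = rd), 3)
```

Here the control mean sits exactly on the threshold, so its responder probability is one half; the experimental arm lies 0.4 SD above it.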
Studies report per-arm summaries; the methods differ in how those are combined. Throughout, study \(i\) contributes \((\bar d_{e,i}, s_{e,i}, n_{e,i})\) for the experimental arm and \((\bar d_{c,i}, s_{c,i}, n_{c,i})\) for the control arm.
The individual method (method = "individual") dichotomizes each study, forms its risk difference, then pools the study-level differences. With \(p_{e,i} = \Phi((\bar d_{e,i} - m)/s_{e,i})\) and likewise \(p_{c,i}\),
\[ \mathrm{RD}_i = p_{e,i} - p_{c,i}, \qquad \widehat{\mathrm{RD}} = \frac{\sum_i w_i \mathrm{RD}_i}{\sum_i w_i}, \quad w_i = 1/\widehat{\mathrm{Var}}(\mathrm{RD}_i). \]
The per-study variance follows se_method:

- "binomial" (default): \(\widehat{\mathrm{Var}}(\mathrm{RD}_i) = \frac{p_{e,i}(1 - p_{e,i})}{n_{e,i}} + \frac{p_{c,i}(1 - p_{c,i})}{n_{c,i}}\).
- "delta": propagates the uncertainty in the estimated mean and SD through the Normal CDF. For each arm, \(\widehat{\mathrm{Var}}(p) = \phi(a)^2\left[\frac{1}{n} + \frac{a^2}{2(n-1)}\right]\) with \(a = (\mu - m)/\sigma\); the two arm variances are summed to give \(\widehat{\mathrm{Var}}(\mathrm{RD}_i)\).

The "binomial" form is a pseudo-binomial approximation: \(p_{e,i}\) and \(p_{c,i}\) are probabilities implied by the estimated mean and SD, not proportions of observed dichotomized patients, so it does not carry the uncertainty in the reported mean and SD. The "delta" form does, and is generally preferable for summary-statistic inputs; "binomial" is the default only for continuity with earlier results. Of the pooling methods, the individual method is the most defensible because it respects each study’s own scale.
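The two variance forms and the inverse-variance pooling can be sketched as follows. This is a base-R illustration under stated assumptions: the data frame, `arm_stats()` and `pool_rd_individual()` are hypothetical names, the arms are treated as independent, and the interval is a plain Wald interval.

```r
# Hypothetical three-study data set (illustrative values, not package data).
dat <- data.frame(
  mean_e = c(2.1, 1.8, 2.4), sd_e = c(2.5, 2.2, 2.8), n_e = c(60, 85, 40),
  mean_c = c(0.9, 1.0, 1.1), sd_c = c(2.4, 2.3, 2.9), n_c = c(58, 80, 42)
)

# Per-arm responder probability and its variance (hypothetical helper).
arm_stats <- function(mean, sd, n, mid, se_method = c("binomial", "delta")) {
  se_method <- match.arg(se_method)
  a <- (mean - mid) / sd
  p <- pnorm(a)
  v <- if (se_method == "binomial") {
    p * (1 - p) / n                               # pseudo-binomial form
  } else {
    dnorm(a)^2 * (1 / n + a^2 / (2 * (n - 1)))    # delta-method form
  }
  list(p = p, var = v)
}

# Dichotomize each study, form its RD, then pool by inverse variance.
pool_rd_individual <- function(dat, mid, se_method) {
  trt  <- arm_stats(dat$mean_e, dat$sd_e, dat$n_e, mid, se_method)
  ctl  <- arm_stats(dat$mean_c, dat$sd_c, dat$n_c, mid, se_method)
  rd_i <- trt$p - ctl$p
  w    <- 1 / (trt$var + ctl$var)   # arms assumed independent
  rd   <- sum(w * rd_i) / sum(w)
  se   <- sqrt(1 / sum(w))
  z    <- qnorm(0.975)
  c(rd = rd, lb = rd - z * se, ub = rd + z * se)
}

res_binomial <- pool_rd_individual(dat, mid = 1, se_method = "binomial")
res_delta    <- pool_rd_individual(dat, mid = 1, se_method = "delta")
rbind(binomial = res_binomial, delta = res_delta)
```

The two rows share nearly the same point estimate and differ mainly in interval width, because se_method changes the weights and the standard error rather than the per-study risk differences.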
The weighted method pools before dichotomizing. Within each arm, the mean is combined by inverse variance and the SD by the within-study pooled SD:
\[ \bar d^{\star} = \frac{\sum_i \bar d_i / v_i}{\sum_i 1/v_i},\; v_i = \frac{s_i^2}{n_i}; \qquad s^{\star} = \sqrt{\frac{\sum_i (n_i - 1)\, s_i^2}{\sum_i (n_i - 1)}}. \]
Then \(p^{\star} = \Phi((\bar d^{\star} - m)/s^{\star})\) and the risk-difference variance comes from the delta method, propagating uncertainty in both the pooled mean and the pooled SD, \[ \mathrm{Var}(p^{\star}) \approx \left(\frac{\partial p^{\star}}{\partial \mu}\right)^2 \mathrm{Var}(\bar d^{\star}) + \left(\frac{\partial p^{\star}}{\partial \sigma}\right)^2 \mathrm{Var}(s^{\star}), \qquad \mathrm{Var}(s^{\star}) \approx \frac{s^{\star 2}}{2 \sum_i (n_i - 1)} . \] Including the SD term keeps this method consistent with the individual delta method and avoids intervals that are too narrow. This is the paper-aligned “pool-then-dichotomize” estimator.
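A base-R sketch of the pool-then-dichotomize steps for a single arm is given below. It assumes \(\mathrm{Var}(\bar d^{\star}) = 1/\sum_i (1/v_i)\), the usual inverse-variance result, which the text above does not state explicitly; `pool_arm()` and its inputs are hypothetical.

```r
# Pool one arm's summaries, dichotomize, and propagate uncertainty
# in both the pooled mean and the pooled SD (hypothetical helper).
pool_arm <- function(mean, sd, n, mid) {
  v        <- sd^2 / n
  mean_st  <- sum(mean / v) / sum(1 / v)      # inverse-variance pooled mean
  var_mean <- 1 / sum(1 / v)                  # assumed variance of the pooled mean
  df       <- sum(n - 1)
  sd_st    <- sqrt(sum((n - 1) * sd^2) / df)  # within-study pooled SD
  var_sd   <- sd_st^2 / (2 * df)

  a         <- (mean_st - mid) / sd_st
  dp_dmu    <- dnorm(a) / sd_st               # partial derivative of p* w.r.t. mu
  dp_dsigma <- -dnorm(a) * a / sd_st          # partial derivative of p* w.r.t. sigma

  list(mean = mean_st, sd = sd_st, p = pnorm(a),
       var = dp_dmu^2 * var_mean + dp_dsigma^2 * var_sd)
}

# Illustrative inputs (not package data).
trt <- pool_arm(c(2.1, 1.8, 2.4), c(2.5, 2.2, 2.8), c(60, 85, 40), mid = 1)
ctl <- pool_arm(c(0.9, 1.0, 1.1), c(2.4, 2.3, 2.9), c(58, 80, 42), mid = 1)

rd    <- trt$p - ctl$p
se_rd <- sqrt(trt$var + ctl$var)              # arms assumed independent
c(rd = rd, lb = rd - qnorm(0.975) * se_rd, ub = rd + qnorm(0.975) * se_rd)
```

Dropping the `dp_dsigma` term would reproduce the narrower intervals that the SD term is there to avoid.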
The unweighted and median methods replace the pooled summaries with the arithmetic mean or the median, respectively, of the study means and SDs. These are useful robustness summaries but have no variance model, so respondeR reports the point estimate with NA intervals rather than a spurious confidence interval.
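For a single arm, the two summaries amount to the following base-R sketch, with hypothetical study-level inputs:

```r
# Hypothetical study means and SDs for one arm (illustrative values only).
study_means <- c(2.1, 1.8, 3.0)
study_sds   <- c(2.5, 2.2, 2.8)
mid         <- 1

# Arithmetic-mean and median summaries, then dichotomize once.
p_unweighted <- pnorm((mean(study_means)   - mid) / mean(study_sds))
p_median     <- pnorm((median(study_means) - mid) / median(study_sds))

# Point estimates only: no variance model, so no interval is attached.
round(c(unweighted = p_unweighted, median = p_median), 3)
```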
responder_analysis(sample_responder_data, mid = 1)[,
c("method", "p_e", "p_c", "rd", "rd_lb", "rd_ub")]
#> method p_e p_c rd rd_lb rd_ub
#> 1 individual NA NA 0.2554475 0.1869705 0.3239244
#> 2 weighted 0.4742782 0.2205372 0.2537410 0.1985865 0.3088955
#> 3 unweighted 0.4767051 0.2279613 0.2487438 NA NA
#> 4 median 0.4869694 0.2150781 0.2718912 NA NA
By default (control = "matched") the control responder proportion is pooled the same way as the experimental arm, so each summary method contrasts like with like.
The simulation study that motivated this package (Sofi-Mahmudi, 2024)
instead held the baseline risk fixed at the median control
arm for every summary method, varying only how the experimental
arm was pooled. That choice is available via
control = "median". It treats the control event rate as a
single nuisance baseline, much as a GRADE summary-of-findings table
takes one representative control risk, and reports the experimental
pooling against it. Because the median control arm carries no
sampling-variance model, this option returns point estimates only.
matched <- responder_analysis(sample_responder_data, mid = 1)
medbase <- responder_analysis(sample_responder_data, mid = 1, control = "median")
keep <- matched$method %in% c("median", "unweighted", "weighted")
data.frame(
method = matched$method[keep],
pc_matched = round(matched$p_c[keep], 3),
pc_median = round(medbase$p_c[keep], 3),
rd_matched = round(matched$rd[keep], 3),
rd_median = round(medbase$rd[keep], 3)
)
#> method pc_matched pc_median rd_matched rd_median
#> 1 weighted 0.221 0.215 0.254 0.259
#> 2 unweighted 0.228 0.215 0.249 0.262
#> 3 median 0.215 0.215 0.272 0.272
Under control = "median" every summary method shares one control proportion (the median control arm); the median method is unchanged, and the individual and smd methods, which pool per-study contrasts, ignore the option.
From \(p_e\) and \(p_c\) (and their variances) respondeR also reports relative measures on the log scale and the number needed to treat:
\[ \mathrm{RR} = \frac{p_e}{p_c}, \quad \mathrm{OR} = \frac{p_e/(1 - p_e)}{p_c/(1 - p_c)}, \quad \mathrm{NNT} = \frac{1}{\mathrm{RD}}. \]
Confidence intervals for RR and OR are formed on the log scale and
back-transformed. Following Altman (1998), when the risk-difference
interval excludes zero the NNT bounds are the reciprocals of
the RD bounds; when it includes zero the NNT is unbounded and
respondeR returns NA bounds to flag it.
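The sketch below works through these measures using the weighted-method values printed earlier for p_e, p_c and the RD bounds. The arm variances are hypothetical, and the delta-method standard errors on the log scale are an assumption about how such intervals are typically built, not a statement of respondeR's internals.

```r
# Point estimates and RD bounds taken from the weighted row shown earlier.
p_e   <- 0.4742782; p_c <- 0.2205372
rd_lb <- 0.1985865; rd_ub <- 0.3088955

# Hypothetical variances of p_e and p_c (illustrative values only).
var_pe <- 0.0005; var_pc <- 0.0004

rd <- p_e - p_c
rr <- p_e / p_c
or <- (p_e / (1 - p_e)) / (p_c / (1 - p_c))

# Assumed delta-method standard errors on the log scale.
z         <- qnorm(0.975)
se_log_rr <- sqrt(var_pe / p_e^2 + var_pc / p_c^2)
se_log_or <- sqrt(var_pe / (p_e * (1 - p_e))^2 + var_pc / (p_c * (1 - p_c))^2)
rr_ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)   # back-transformed
or_ci <- exp(log(or) + c(-1, 1) * z * se_log_or)

# NNT: reciprocal of RD; bounds only when the RD interval excludes zero.
nnt    <- 1 / rd
nnt_ci <- if (rd_lb > 0 || rd_ub < 0) sort(1 / c(rd_lb, rd_ub)) else c(NA_real_, NA_real_)

list(rr = rr, rr_ci = rr_ci, or = or, or_ci = or_ci, nnt = nnt, nnt_ci = nnt_ci)
```

Taking reciprocals reverses the order of the bounds, hence the `sort()`: the upper RD bound yields the lower NNT bound.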
Choosing a MID can be contentious. The common-language effect size (CLES, the probabilistic index) is the probability that a randomly chosen treated patient has a better change than a randomly chosen control. Under a Normal model it is exact:
\[ \mathrm{CLES} = \Phi(\delta), \qquad \delta = \frac{\mu_e - \mu_c}{\sqrt{\sigma_e^2 + \sigma_c^2}}. \]
Per-study \(\delta_i\) are pooled by inverse variance (with a delta-method variance) and back-transformed. No threshold is required.
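For one study, the closed form can be checked against direct numerical integration of \(\Pr(X_e > X_c)\). The helper `cles()` and the inputs are hypothetical, higher change is taken as better, and the pooling step is not shown.

```r
# Closed-form CLES under independent Normal arms (hypothetical helper).
cles <- function(mu_e, sd_e, mu_c, sd_c) {
  pnorm((mu_e - mu_c) / sqrt(sd_e^2 + sd_c^2))
}

# Illustrative inputs (not package data).
mu_e <- 2.0; sd_e <- 2.5
mu_c <- 1.0; sd_c <- 2.0

closed_form <- cles(mu_e, sd_e, mu_c, sd_c)

# Numerical check: Pr(X_e > X_c) = integral of Pr(X_c < x) * f_e(x) dx.
numeric_check <- integrate(
  function(x) pnorm(x, mean = mu_c, sd = sd_c) * dnorm(x, mean = mu_e, sd = sd_e),
  lower = -Inf, upper = Inf
)$value

c(closed_form = closed_form, numeric_check = numeric_check)
```

The two values agree to within the integration tolerance, and neither depends on a threshold \(m\).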
The SMD bridge (method = "smd") follows the second approach of Anzures-Cabrera et al. (2011): pool the standardized mean difference and map it to an odds ratio. respondeR pools Hedges’ \(g\), applies the logistic link \(\ln\mathrm{OR} = \frac{\pi}{\sqrt 3}\, g\) (Chinn, 2000), and combines the result with the weighted-pooled control responder rate to recover risks. It is a useful cross-check on the cut-point methods because it reaches risks through a different distributional assumption (logistic rather than Normal).
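The mapping from a pooled \(g\) to risks can be sketched in a few lines of base R. The pooled \(g\) and the control responder rate below are hypothetical, and a positive \(g\) is assumed to favor the experimental arm.

```r
# Hypothetical inputs (illustrative values only).
g_pooled <- 0.45   # pooled Hedges' g
p_c      <- 0.22   # pooled control responder rate

# Logistic link from the SMD to the log odds ratio.
log_or <- pi / sqrt(3) * g_pooled
or     <- exp(log_or)

# Apply the odds ratio to the control odds to recover the experimental risk.
odds_c <- p_c / (1 - p_c)
odds_e <- or * odds_c
p_e    <- odds_e / (1 + odds_e)

rd <- p_e - p_c
round(c(or = or, p_e = p_e, rd = rd), 3)
```

Because the odds ratio is applied to a single control rate, the recovered risk difference depends on that baseline as well as on \(g\).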
The individual and SMD methods pool across studies and so can use
random effects (pooling = "random"). respondeR offers
DerSimonian-Laird (closed-form, dependency-free) or REML
(tau_method = "REML", via metafor), and reports
Cochran’s \(Q\), \(I^2\), \(\tau^2\) and a prediction interval.
responder_analysis(sample_responder_data, mid = 1, method = "individual",
pooling = "random")[, c("tau2", "i2", "q", "q_p",
"pi_lb", "pi_ub")]
#> tau2 i2 q q_p pi_lb pi_ub
#> 1 0 0 1.6054 0.4481173 -0.18848 0.6993749
Prediction intervals use a \(t_{k-2}\) critical value and are unstable for very few studies; interpret them cautiously when \(k\) is small.
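The closed-form DerSimonian-Laird calculation can be sketched as below. This is a generic textbook version, not respondeR's code: `dl_pool()` is a hypothetical helper, the inputs are illustrative, the confidence interval is a Wald interval, and the prediction interval is assumed to take the form \(\hat\mu \pm t_{k-2}\sqrt{\hat\tau^2 + \mathrm{SE}^2}\).

```r
# DerSimonian-Laird random-effects pooling of estimates y with variances v.
dl_pool <- function(y, v, level = 0.95) {
  k        <- length(y)
  w        <- 1 / v
  mu_fixed <- sum(w * y) / sum(w)
  q        <- sum(w * (y - mu_fixed)^2)               # Cochran's Q
  c_dl     <- sum(w) - sum(w^2) / sum(w)
  tau2     <- max(0, (q - (k - 1)) / c_dl)            # truncated at zero
  i2       <- if (q > 0) max(0, (q - (k - 1)) / q) else 0   # as a proportion

  w_re <- 1 / (v + tau2)
  mu   <- sum(w_re * y) / sum(w_re)
  se   <- sqrt(1 / sum(w_re))

  alpha <- 1 - level
  z     <- qnorm(1 - alpha / 2)
  tcrit <- if (k > 2) qt(1 - alpha / 2, df = k - 2) else NA_real_

  list(estimate = mu,
       ci   = mu + c(-1, 1) * z * se,
       tau2 = tau2, i2 = i2, q = q,
       q_p  = pchisq(q, df = k - 1, lower.tail = FALSE),
       pi   = mu + c(-1, 1) * tcrit * sqrt(tau2 + se^2))
}

# Illustrative per-study risk differences and variances (not package data).
fit <- dl_pool(y = c(0.30, 0.22, 0.27), v = c(0.0040, 0.0030, 0.0050))
str(fit)
```

With only three studies the \(t\) critical value has one degree of freedom (about 12.7), which is why the prediction interval is so wide. Applying the same formula with \(\tau^2 = 0\) and the standard error implied by the individual-method interval printed earlier (0.187 to 0.324) gives roughly -0.188 to 0.699, in line with pi_lb and pi_ub above.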
For the pooled confidence interval itself, the default Normal (Wald)
interval can under-cover when \(k\) is
small, because \(\tau^2\) is poorly
estimated. Set ci_method = "hksj" for the
Hartung-Knapp-Sidik-Jonkman interval, a \(t\)-based interval whose width adapts to
the observed dispersion of the study estimates and which is better
calibrated for few-study meta-analyses (Rover, Knapp & Friede,
2015). The example below has only three studies, exactly where this
matters.
rbind(
wald = responder_analysis(sample_responder_data, mid = 1, method = "individual",
pooling = "random", ci_method = "wald")[, c("rd", "rd_lb", "rd_ub")],
hksj = responder_analysis(sample_responder_data, mid = 1, method = "individual",
pooling = "random", ci_method = "hksj")[, c("rd", "rd_lb", "rd_ub")]
)
#> rd rd_lb rd_ub
#> wald 0.2554475 0.1869705 0.3239244
#> hksj 0.2554475 0.1207656 0.3901293
The refinement options are:

- ci_type = "logit": proportion intervals are formed on the logit scale and risk-difference intervals by Newcombe’s MOVER method, so they stay within \([0, 1]\) and \([-1, 1]\) even for extreme proportions.
- mid_sd: if the threshold is itself estimated, supplying its SD propagates that uncertainty into the effect-measure variances, with the correct between-arm correlation through the shared threshold.
- dist: the change scores can be modeled as lognormal or Student-\(t\) instead of Normal, as a sensitivity analysis for skewed or heavy-tailed data (variances are obtained numerically).

The assumptions and their limits:

- The cut-point methods assume Normally distributed change scores; where that is doubtful, use dist = "lognormal"/"t" as a sensitivity check.
- Default intervals are not guaranteed to stay within bounds for extreme proportions; use ci_type = "logit" there.

| If you want… | Use |
|---|---|
| A defensible default that respects each study’s scale | individual (fixed or random) |
| The paper’s pool-then-dichotomize estimator | weighted |
| A robustness or sensitivity summary | median / unweighted (point estimates) |
| A cross-check via a different bridge to risks | smd |
| To avoid choosing a threshold altogether | responder_cles() |
| Relative rather than absolute effects | the rr / or columns; nnt for impact |
| Between-study heterogeneity quantified | pooling = "random" |
Sofi-Mahmudi, A. (2024). Identifying an optimal strategy for converting pain as a continuous outcome to a responder analysis [Master’s thesis, McMaster University]. MacSphere. https://hdl.handle.net/11375/30210
Thorlund, K., Walter, S. D., Johnston, B. C., Furukawa, T. A., & Guyatt, G. H. (2011). Pooling health-related quality of life outcomes in meta-analysis: a tutorial and review of methods for enhancing interpretability. Research Synthesis Methods, 2(3), 188-203. doi:10.1002/jrsm.46
Altman, D. G. (1998). Confidence intervals for the number needed to treat. BMJ, 317(7168), 1309-1312.
Anzures-Cabrera, J., Sarpatwari, A., & Higgins, J. P. T. (2011). Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine, 30(25), 2867-2880. doi:10.1002/sim.4298
Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19(22), 3127-3131.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361-365.
Rover, C., Knapp, G., & Friede, T. (2015). Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Medical Research Methodology, 15, 99. doi:10.1186/s12874-015-0091-1