irt_params_* helper family is now complete across
all registered IRT models: irt_params_1pl(),
irt_params_2pl(), irt_params_3pl(),
irt_params_grm(), irt_params_pcm(), and
irt_params_gpcm(). Each helper shares the same
distribution-aware signature pattern (<param>_dist,
<param>_mean, <param>_sd,
seed) plus model-specific extras
(c_mean/c_sd for 3PL guessing;
n_categories for the polytomous family). All six delegate
to the model registry’s generate_default_params() method,
so generation defaults live next to the model definitions and stay in
sync automatically. irt_params_2pl() and
irt_params_grm() were refactored to use this shared method
with no change to their signatures, defaults, or return shapes.

irt_design() and irt_simulate() now
support three-parameter logistic (model = "3PL"), partial
credit (model = "PCM"), and generalized partial credit
(model = "GPCM") models, in addition to the existing 1PL,
2PL, and GRM. irt_study(estimation_model = ...) accepts the
full set for generation/estimation cross-fits within a response format
(binary: 1PL/2PL/3PL; polytomous: GRM/PCM/GPCM).

irt_simulate() no longer crashes with
"missing value where TRUE/FALSE needed" when
mirt’s convergence flag returns NA on a hard
or large fit. The flag is coerced to a strict logical via
isTRUE() at the source and consumer, so an unconfirmable
fit routes to the existing non-converged branch (records NA
estimates for that iteration) instead of aborting the whole simulation.
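
The coercion described above can be illustrated in base R. This is a minimal sketch, not the package's internal code: `converged_flag` is a hypothetical stand-in for the convergence flag of a fit, and the branch labels are illustrative only.

```r
# Hypothetical stand-in for a fit's convergence flag (not irtsim's or mirt's API).
converged_flag <- NA

# A bare `if (converged_flag)` signals an error when the flag is NA;
# tryCatch() captures the message here so the script keeps running.
bare_result <- tryCatch(
  if (converged_flag) "converged" else "not converged",
  error = function(e) conditionMessage(e)
)

# isTRUE() is TRUE only for a single non-NA TRUE, so NA maps to FALSE
# and the fit falls through to the non-converged branch.
strict_flag <- isTRUE(converged_flag)
safe_result <- if (strict_flag) "converged" else "not converged"
```

Under this pattern, a fit that cannot be confirmed as converged is treated the same way as one that reports non-convergence.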
Most often seen on GRM studies at ~60+ items.

irt_simulate() no longer crashes when
mirt::fscores() throws during theta scoring on GRM fits
with sparsely-observed response categories (likelier at large item
counts). The call is now wrapped in tryCatch(); scoring
failure degrades to NA theta recovery for that iteration
while item-parameter estimates — the primary sample-size-planning output
— are preserved. This unblocks GRM studies at realistic operational test
lengths (60–200 items).

vignettes/ is older (by last-commit timestamp) than its
source in vignettes-raw/, catching the
precompute-then-commit drift that previously relied on developer
memory.

irt_design(),
irt_study(), and irt_simulate() updated to
list the full registered model set (1PL, 2PL, 3PL, GRM, PCM, GPCM).

benchmarks/,
excluded from the installed package). Findings
(benchmarks/README.md): wall-clock scales roughly as
n_items^1.7, mirt::mirt() accounts for ~96% of
run time, and peak resident heap stays ~335–395 MB across an
n_items {30,60,100,150} x N {200,500,1000} grid — use
parallel = TRUE for large designs.

choosing-item-parameters vignette
(vignette("choosing-item-parameters")) — deeper reference
for the three item-parameter specification workflows introduced in the
getting-started vignette: import from a prior mirt fit
(with a slope-intercept-to-IRT conversion worked example) or a CSV /
Excel parameter table; domain-typical preset values for cognitive
ability, personality, clinical, and achievement assessments with cited
reference ranges; and hypothesized / content-based specification with
explicit translation from item-review judgements to distribution
arguments. The vignette uses the knitr engine with eval=TRUE; no new exported helpers in this
release.

irtsim getting-started vignette
(vignette("irtsim")) walking a stranger from “I need to
plan an IRT study” through to recommended_n(). The vignette
builds live (eval=TRUE) so it cannot drift from the package API. Three
item-parameter specification paths are demonstrated: by hand, via
irt_params_2pl(), and from a prior mirt
calibration.

irt_design() now aborts with an informative error if
n_factors != 1. Multidimensional IRT support is planned for
v0.4.0; until then, the parameter is retained on the design for forward
compatibility. Previously, silently accepting n_factors > 1
produced a cryptic mirt internal error downstream. The new abort fires
up front and points users at the planned support.

recommended_n() gains an aggregate
parameter ("max" / "mean" /
"median" / "none", default
"max"). The default return is now an integer scalar — the
smallest sample size that powers every item/param at the requested
threshold — with a details attribute carrying the per-item
data frame plus aggregate, criterion, and
threshold attributes. "mean" and
"median" round up via ceiling() so the
recommendation never falls below the central tendency. Pass
aggregate = "none" to recover the previous per-item data
frame return. Behavior change: the default return shape
changed from a per-item data frame to a scalar; closes the footgun where
users could under-power by forgetting to take max() across
items.

paper-reproduction-gaps vignette. Its
content was a scorecard of paper Examples 2 and 3 reproduction gaps that
pointed at deferred objectives (Obj 30/31). Those objectives are now
superseded by a planned pluggable fit_fn /
extract_fn hook (Obj 39, targeted for v0.3.0); the
standalone gaps vignette no longer reflects the roadmap.
Cross-references to it from paper-example-2-mcar and
paper-example-3-grm have also been removed.

CRAN resubmission. Documentation-only changes; no user-facing API or behavior changes.
DESCRIPTION: expanded all acronyms on first use (API,
IRT, 1PL, 2PL, MCAR, MAR, MSE, RMSE, SE) per CRAN reviewer request.

man/: replaced \dontrun{} with
\donttest{} in irt_simulate,
summary.irt_results, plot.irt_results,
plot.summary_irt_results, recommended_n,
print.irt_results, and
print.summary_irt_results examples per CRAN reviewer
request. Examples remain wrapped (not unwrapped) because each depends on
a ~300-fit irt_simulate() call that exceeds the 5-second
CRAN example-execution budget.Initial CRAN release.
irt_design() specifies the data-generating IRT model
(items, parameters, theta distribution).

irt_study() adds study conditions (sample sizes,
missing-data mechanism, optional separate estimation model).

irt_simulate() runs the Monte Carlo simulation loop
with deterministic seeding and optional parallelism.

summary(), plot(), and
recommended_n() methods extract simulation-based
sample-size recommendations from irt_results objects.

"none": complete data
"mcar": missing completely at random
"mar": missing at random (monotone, trait-dependent)
"booklet": structured booklet assignment with common-item overlap
"linking": two-form linked design with user-supplied linking matrix

mse), root mean squared error
(rmse), bias, absolute bias, standard error
(se), empirical coverage, Monte Carlo SE of MSE
(mcse_mse).

R/criterion_registry.R.

criterion_fn
argument to summary.irt_results() — callbacks receive
estimates, true_value, ci_lower,
ci_upper, and converged and return named
numeric vectors appended to item_summary.

irt_study(estimation_model = ...) allows fitting a
different IRT model than the one used to generate data (e.g., generate
2PL, fit 1PL). Compatible cross-pairs: (1PL, 2PL),
(2PL, 1PL), same-model. GRM is not cross-compatible with
dichotomous models.

irt_simulate(parallel = TRUE) dispatches iterations
across workers via future.apply::future_lapply().

parallel setting) guaranteed. Cross-mode
results differ because serial uses Mersenne-Twister and parallel uses
L’Ecuyer-CMRG substreams; both are statistically valid.

future::plan().

cli::cli_progress_bar() replaces
cat()-based progress reporting (suppressible with
progress = FALSE).

cli::cli_abort() error messages with
valid-option enumerations for invalid model, criterion, missing
mechanism, and estimation_model arguments.

R.rsp::asis
because re-running the Monte Carlo simulations during package checks
would exceed CRAN’s build-time budget. The source .Rmd
files and data-raw/precompute_vignettes.R are available in
the GitHub repository for users who wish to reproduce results
locally.

cli, future.apply, ggplot2, mirt, rlang

future, knitr, R.rsp, rmarkdown, scales, testthat

Schroeders, U., and Gnambs, T. (2025). Sample size planning in item response theory: A 10-decision framework. Advances in Methods and Practices in Psychological Science. https://doi.org/10.1177/25152459251314798