Title: Example Datasets for Clinical Submission Readiness
Version: 0.1.0
Description: Provides realistic synthetic example datasets for the R4SUB (R for Regulatory Submission) ecosystem. Includes a pharma study evidence table, ADaM (Analysis Data Model) and SDTM (Study Data Tabulation Model) metadata following CDISC (Clinical Data Interchange Standards Consortium) conventions (https://www.cdisc.org), traceability mappings, a risk register based on ICH (International Council for Harmonisation) Q9 quality risk management principles (https://www.ich.org/page/quality-guidelines), and regulatory indicator definitions. Designed for demos, vignettes, and package testing.
License: MIT + file LICENSE
URL: https://github.com/R4SUB/r4subdata
BugReports: https://github.com/R4SUB/r4subdata/issues
Depends: R (≥ 4.2)
Imports: tibble
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.3
LazyData: true
NeedsCompilation: no
Packaged: 2026-02-18 17:44:18 UTC; aeroe
Author: Pawan Rama Mali [aut, cre, cph]
Maintainer: Pawan Rama Mali <prm@outlook.in>
Repository: CRAN
Date/Publication: 2026-02-20 11:40:02 UTC

r4subdata: Example Datasets for Clinical Submission Readiness

Description

Provides realistic synthetic example datasets for the R4SUB (R for Regulatory Submission) ecosystem. Includes a pharma study evidence table, ADaM (Analysis Data Model) and SDTM (Study Data Tabulation Model) metadata following CDISC (Clinical Data Interchange Standards Consortium) conventions (https://www.cdisc.org), traceability mappings, a risk register based on ICH (International Council for Harmonisation) Q9 quality risk management principles (https://www.ich.org/page/quality-guidelines), and regulatory indicator definitions. Designed for demos, vignettes, and package testing.

Author(s)

Maintainer: Pawan Rama Mali <prm@outlook.in> [copyright holder]

See Also

Useful links:

https://github.com/R4SUB/r4subdata

Report bugs at https://github.com/R4SUB/r4subdata/issues

ADaM Variable-Level Metadata

Description

ADaM (Analysis Data Model) variable-level metadata for ADSL (Subject-Level Analysis Dataset, 16 vars), ADAE (Adverse Events Analysis Dataset, 10 vars), and ADLB (Laboratory Results Analysis Dataset, 10 vars). Follows CDISC (Clinical Data Interchange Standards Consortium) ADaM conventions.

Usage

adam_metadata

Format

A tibble with 36 rows and 6 columns:

dataset

Character. ADaM dataset name (ADSL, ADAE, ADLB).

variable

Character. Variable name.

label

Character. Variable label.

type

Character. Variable type (Char or Num).

length

Integer. Variable length.

format

Character. SAS (Statistical Analysis System) format (or NA).

Source

Synthetic metadata based on CDISC ADaM (Analysis Data Model) standards.

Examples

data(adam_metadata)
table(adam_metadata$dataset)
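adam_metadata and sdtm_metadata share the same six-column layout, so one small schema check can serve both. The sketch below uses base R only. check_var_metadata() is a hypothetical helper that is not part of r4subdata, and the rows are invented for illustration. The third row deliberately uses an invalid type.

```r
# Hypothetical helper (not part of r4subdata): verifies that the six documented
# columns exist and returns rows whose `type` is neither "Char" nor "Num".
check_var_metadata <- function(meta) {
  required <- c("dataset", "variable", "label", "type", "length", "format")
  missing_cols <- setdiff(required, names(meta))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  meta[!meta$type %in% c("Char", "Num"), , drop = FALSE]
}

# Invented rows in the adam_metadata layout; the third has an invalid type
toy_meta <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADAE"),
  variable = c("USUBJID", "AGE", "AEDECOD"),
  label    = c("Unique Subject Identifier", "Age", "Dictionary-Derived Term"),
  type     = c("Char", "Num", "Character"),
  length   = c(20L, 8L, 200L),
  format   = NA_character_
)

bad <- check_var_metadata(toy_meta)
bad
# Variables per dataset, analogous to table(adam_metadata$dataset)
table(toy_meta$dataset)
```

With the package loaded, the same check could be run as check_var_metadata(adam_metadata). A zero-row result means every row passed.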

Dataset Column Dictionary

Description

Returns column names, types, and descriptions for a given r4subdata dataset.

Usage

dataset_dictionary(dataset)

Arguments

dataset

Character. Name of the dataset (e.g., "evidence_pharma").

Value

A tibble with columns: column, type, description.

Examples

dataset_dictionary("evidence_pharma")
dataset_dictionary("adam_metadata")
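dataset_dictionary() supplies curated descriptions. The column and type parts of such a dictionary can also be derived from any data frame. The sketch below takes the first element of class() as the type, which may not match the type strings r4subdata itself reports. column_types() and the toy data are hypothetical.

```r
# Hypothetical helper (not part of r4subdata): one row per column with its
# name and first class. Unlike dataset_dictionary(), it has no descriptions.
column_types <- function(df) {
  data.frame(
    column = names(df),
    type   = unname(vapply(df, function(x) class(x)[1], character(1))),
    row.names = NULL
  )
}

# Invented rows using three evidence_pharma column names
toy <- data.frame(
  run_id       = c("run-1", "run-2"),
  metric_value = c(0.5, NA),
  created_at   = as.POSIXct(c("2026-01-01 10:00:00", "2026-01-02 10:00:00"),
                            tz = "UTC")
)

ct <- column_types(toy)
ct
```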


Pharma Study Evidence Table

Description

A realistic evidence table for study CDISCPILOT01 (Clinical Data Interchange Standards Consortium Pilot Study 01) covering all four R4SUB (R for Regulatory Submission) pillars (quality, trace, risk, usability). It contains 250 rows reporting 18 distinct indicators across multiple datasets and sources.

Usage

evidence_pharma

Format

A tibble with 250 rows and 17 columns:

run_id

Character. Unique run identifier.

study_id

Character. Study identifier (CDISCPILOT01).

asset_type

Character. Asset type: dataset, define, program, validation, spec, other.

asset_id

Character. Asset identifier (e.g., ADSL, define.xml).

source_name

Character. Source of the evidence (e.g., pinnacle21).

source_version

Character. Version of the source tool.

indicator_id

Character. Indicator identifier (e.g., Q-MISS-VAR).

indicator_name

Character. Human-readable indicator name.

indicator_domain

Character. Domain: quality, trace, risk, usability.

severity

Character. Severity: info, low, medium, high, critical.

result

Character. Result: pass, fail, warn, na.

metric_value

Numeric. Metric value (if applicable).

metric_unit

Character. Unit for metric_value.

message

Character. Descriptive message.

location

Character. Location reference (e.g., ADSL:AGE).

evidence_payload

Character. JSON payload with additional details.

created_at

POSIXct. Timestamp when evidence was created.

Source

Synthetic data based on the CDISC (Clinical Data Interchange Standards Consortium) Pilot Study 01 structure.

Examples

data(evidence_pharma)
head(evidence_pharma)
table(evidence_pharma$indicator_domain)
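A typical first look at evidence_pharma cross-tabulates indicator_domain against result, then isolates failures of high or critical severity. The sketch below uses invented rows that follow the documented column names and allowed values. Fixing the factor levels keeps every documented domain and result in the table even when a category has no rows.

```r
# Invented evidence rows using the documented columns and values
ev <- data.frame(
  indicator_domain = c("quality", "quality", "trace", "risk", "usability", "trace"),
  severity         = c("high", "low", "medium", "critical", "info", "high"),
  result           = c("fail", "pass", "warn", "fail", "pass", "pass")
)

# Fix levels so every documented category appears, even with zero counts
ev$result <- factor(ev$result, levels = c("pass", "fail", "warn", "na"))
ev$indicator_domain <- factor(ev$indicator_domain,
                              levels = c("quality", "trace", "risk", "usability"))

# Domain-by-result counts
tab <- table(ev$indicator_domain, ev$result)
tab

# Share of failing checks per domain
fail_share <- tab[, "fail"] / rowSums(tab)
round(fail_share, 2)

# High and critical failures: the rows most likely to need action
urgent <- ev[ev$result == "fail" & ev$severity %in% c("high", "critical"), ]
urgent
```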

List Available r4subdata Datasets

Description

Returns a summary of all datasets included in the r4subdata package.

Usage

list_datasets()

Value

A tibble with columns: name, description, n_rows, n_cols.

Examples

list_datasets()
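A summary with the same columns as list_datasets() can be approximated for any named list of data frames. summarise_datasets() is a hypothetical helper, and the example uses the base R mtcars and iris datasets so that it runs without r4subdata installed.

```r
# Hypothetical helper (not part of r4subdata) returning the columns that
# list_datasets() documents: name, description, n_rows, n_cols.
summarise_datasets <- function(datasets, descriptions) {
  data.frame(
    name        = names(datasets),
    description = unname(descriptions[names(datasets)]),
    n_rows      = unname(vapply(datasets, nrow, integer(1))),
    n_cols      = unname(vapply(datasets, ncol, integer(1)))
  )
}

# Base R datasets stand in for the r4subdata tables
s <- summarise_datasets(
  list(mtcars = mtcars, iris = iris),
  c(mtcars = "Motor Trend Car Road Tests", iris = "Edgar Anderson's Iris Data")
)
s
```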


Regulatory Indicator Definitions

Description

Reference table of 30 indicator definitions across all four R4SUB (R for Regulatory Submission) domains (quality, trace, risk, usability). Each indicator has a unique ID, default severity, typical source, and descriptive tags.

Usage

regulatory_indicators

Format

A tibble with 30 rows and 7 columns:

indicator_id

Character. Unique indicator identifier.

indicator_name

Character. Human-readable indicator name.

domain

Character. Indicator domain: quality, trace, risk, usability.

description

Character. Detailed description.

severity_default

Character. Default severity level.

source

Character. Typical source tool.

tags

Character. Comma-separated tags.

Source

Curated indicator definitions for the R4SUB (R for Regulatory Submission) ecosystem.

Examples

data(regulatory_indicators)
table(regulatory_indicators$domain)
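Because tags are stored as a single comma-separated string, filtering indicators by tag means splitting that string first. The sketch below assumes tags may carry stray spaces around the commas. The indicator IDs and tags are invented.

```r
# Invented indicator rows; tags follow the documented comma-separated form
ind <- data.frame(
  indicator_id = c("IND-A", "IND-B", "IND-C"),
  domain       = c("quality", "trace", "quality"),
  tags         = c("missing, adam", "define, traceability", "adam,controlled-terminology")
)

# Split each tag string on commas and trim surrounding whitespace
tag_list <- lapply(strsplit(ind$tags, ","), trimws)

# TRUE for each indicator that carries the given tag
has_tag <- function(tag) vapply(tag_list, function(t) tag %in% t, logical(1))
ind$indicator_id[has_tag("adam")]

# Long format: one row per (indicator, tag) pair
long <- data.frame(
  indicator_id = rep(ind$indicator_id, lengths(tag_list)),
  tag          = unlist(tag_list)
)
table(long$tag)
```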

Pharma Risk Register

Description

A Failure Mode and Effects Analysis (FMEA)-based risk register with 18 risks covering data quality, traceability, documentation, programming, and compliance categories. Includes probability, impact, and detectability scores on a 1-5 scale. Structured according to ICH (International Council for Harmonisation) Q9 quality risk management principles.

Usage

risk_register_pharma

Format

A tibble with 18 rows and 9 columns:

risk_id

Character. Unique risk identifier (RISK-001 to RISK-018).

description

Character. Risk description.

category

Character. Risk category.

probability

Integer. Probability of occurrence (1-5).

impact

Integer. Impact severity (1-5).

detectability

Integer. Detectability rating (1-5).

owner

Character. Risk owner name.

mitigation

Character. Mitigation action (or NA).

status

Character. Status: open, mitigated, closed, accepted.

Source

Synthetic risk register based on ICH (International Council for Harmonisation) Q9 quality risk management principles.

Examples

data(risk_register_pharma)
table(risk_register_pharma$category)
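The register records probability, impact, and detectability separately but documents no combined score. A common FMEA convention multiplies the three into a Risk Priority Number (RPN) that ranges from 1 to 125. The sketch below assumes that convention, and also assumes that a higher detectability score means a risk is harder to detect, as is usual in FMEA. r4subdata states neither, and the rows are invented.

```r
# Invented risks on the documented 1-5 scales
risks <- data.frame(
  risk_id       = c("TOY-1", "TOY-2", "TOY-3"),
  probability   = c(4L, 2L, 3L),
  impact        = c(5L, 3L, 4L),
  detectability = c(3L, 2L, 4L),
  status        = c("open", "mitigated", "open")
)

# Assumed FMEA Risk Priority Number: P x I x D, from 1 to 125
risks$rpn <- with(risks, probability * impact * detectability)

# Open risks, highest RPN first
open_risks <- risks[risks$status == "open", ]
ranked <- open_risks[order(open_risks$rpn, decreasing = TRUE), c("risk_id", "rpn")]
ranked
```

ICH Q9 does not mandate any particular scoring formula, so an RPN is one option among several for prioritizing the register.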

SDTM Variable-Level Metadata

Description

SDTM (Study Data Tabulation Model) variable-level metadata for DM (Demographics, 17 vars), AE (Adverse Events, 14 vars), and LB (Laboratory Results, 12 vars). Follows CDISC (Clinical Data Interchange Standards Consortium) SDTM conventions.

Usage

sdtm_metadata

Format

A tibble with 43 rows and 6 columns:

dataset

Character. SDTM domain name (DM, AE, LB).

variable

Character. Variable name.

label

Character. Variable label.

type

Character. Variable type (Char or Num).

length

Integer. Variable length.

format

Character. SAS (Statistical Analysis System) format (or NA).

Source

Synthetic metadata based on CDISC SDTM (Study Data Tabulation Model) standards.

Examples

data(sdtm_metadata)
table(sdtm_metadata$dataset)
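SDTM datasets submitted to the FDA are generally delivered as SAS Version 5 transport (XPT) files. That format limits variable names to 8 characters, labels to 40 characters, and character variables to 200 bytes. A table in the sdtm_metadata layout makes these limits easy to screen. The rows below are invented (the format column is omitted for brevity), and the third row is deliberately non-compliant.

```r
# Invented rows in the sdtm_metadata layout
sdtm <- data.frame(
  dataset  = c("DM", "AE", "LB"),
  variable = c("USUBJID", "AEDECOD", "LBSTRESCX"),
  label    = c("Unique Subject Identifier",
               "Dictionary-Derived Term",
               "Character Result/Finding in Standard Format, Extended"),
  type     = c("Char", "Char", "Char"),
  length   = c(20L, 200L, 250L)
)

# SAS V5 transport limits: names <= 8, labels <= 40, character length <= 200
issues <- with(sdtm, data.frame(
  variable       = variable,
  name_too_long  = nchar(variable) > 8,
  label_too_long = nchar(label) > 40,
  char_too_long  = type == "Char" & length > 200
))
flagged <- issues[issues$name_too_long | issues$label_too_long | issues$char_too_long, ]
flagged
```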

ADaM-to-SDTM Traceability Mapping

Description

Maps ADaM (Analysis Data Model) variables to their SDTM (Study Data Tabulation Model) source variables with derivation text and confidence scores. Includes direct copies, derived variables, and unmapped entries. Follows CDISC (Clinical Data Interchange Standards Consortium) traceability conventions.

Usage

trace_mapping

Format

A tibble with 25 rows and 6 columns:

adam_dataset

Character. ADaM dataset name.

adam_var

Character. ADaM variable name.

sdtm_domain

Character. SDTM source domain (NA if derived).

sdtm_var

Character. SDTM source variable (NA if derived).

derivation_text

Character. Derivation description text.

confidence

Numeric. Mapping confidence score (0-1, NA if unmapped).

Source

Synthetic traceability mapping based on CDISC conventions.

Examples

data(trace_mapping)
table(trace_mapping$adam_dataset)
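The documentation distinguishes direct copies, derived variables, and unmapped entries but stores no explicit category column. The sketch below assumes the three can be told apart from the documented NA conventions: a missing confidence marks an unmapped entry, and a missing SDTM variable marks a derived one. That rule, the rows, and the variable name CUSTOMV are hypothetical.

```r
# Invented mapping rows following the documented NA conventions
tm <- data.frame(
  adam_dataset    = c("ADSL", "ADSL", "ADAE", "ADLB"),
  adam_var        = c("USUBJID", "AGEGR1", "AEDECOD", "CUSTOMV"),
  sdtm_domain     = c("DM", NA, "AE", NA),
  sdtm_var        = c("USUBJID", NA, "AEDECOD", NA),
  derivation_text = c("Copied from DM.USUBJID", "Derived from AGE",
                      "Copied from AE.AEDECOD", NA),
  confidence      = c(1.0, 0.8, 0.95, NA)
)

# Assumed rule: NA confidence = unmapped; NA SDTM variable = derived; else direct
tm$mapping_type <- ifelse(is.na(tm$confidence), "unmapped",
                   ifelse(is.na(tm$sdtm_var), "derived", "direct"))

table(tm$adam_dataset, tm$mapping_type)

# Share of ADaM variables with any mapping (direct or derived)
mean(tm$mapping_type != "unmapped")
```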