Concatenating cohort records

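library(CDMConnector)   # provides eunomiaDir() and cdmFromCon()
library(CohortConstructor)
library(CohortCharacteristics)
library(dplyr)          # provides filter(), used below
library(ggplot2)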

For this example we’ll use the Eunomia synthetic data from the CDMConnector package.

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main",
                                writeSchema = "main", writePrefix = "my_study_")

Let’s start by creating a cohort of acetaminophen users.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = list("acetaminophen" = 1127433), 
                                 name = "medications")
cohortCount(cdm$medications)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1           9365            2580

We can merge cohort records using the collapseCohorts() function in the CohortConstructor package. Its gap argument specifies the maximum number of days allowed between two cohort entries for them to be merged into a single record.

Let’s first define a new cohort where records within 1095 days (~ 3 years) of each other will be merged.

cdm$medications_collapsed <- cdm$medications |>
  collapseCohorts(
    gap = 1095,
    name = "medications_collapsed"
  )

Let’s compare how this function would change the records of a single individual.

cdm$medications |>
  filter(subject_id == 1)
#> # Source:   SQL [?? x 4]
#> # Database: DuckDB v1.1.3 [eburn@Windows 10 x64:R 4.4.0/C:\Users\eburn\AppData\Local\Temp\Rtmp2b5z04\file31074e24582.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                  <int>      <int> <date>            <date>         
#> 1                    1          1 1980-03-15        1980-03-29     
#> 2                    1          1 1971-01-04        1971-01-18     
#> 3                    1          1 1982-09-11        1982-10-02     
#> 4                    1          1 1976-10-20        1976-11-03
cdm$medications_collapsed |>
  filter(subject_id == 1)
#> # Source:   SQL [?? x 4]
#> # Database: DuckDB v1.1.3 [eburn@Windows 10 x64:R 4.4.0/C:\Users\eburn\AppData\Local\Temp\Rtmp2b5z04\file31074e24582.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                  <int>      <int> <date>            <date>         
#> 1                    1          1 1971-01-04        1971-01-18     
#> 2                    1          1 1976-10-20        1976-11-03     
#> 3                    1          1 1980-03-15        1982-10-02

Subject 1 initially had four records between 1971 and 1982. After specifying that records within three years of each other are to be merged, the number of records decreases to three. The record from 1980-03-15 to 1980-03-29 and the record from 1982-09-11 to 1982-10-02 are less than three years apart, so they are merged into a new record from 1980-03-15 to 1982-10-02. The two earlier records are each more than three years away from their neighbours and are left unchanged.
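To make the merging rule concrete, here is a minimal base R sketch that reproduces the result for subject 1 on a plain data frame. The helper collapse_records() is hypothetical and is not how CohortConstructor implements collapseCohorts(); in particular, treating a gap of exactly 1095 days as "within the gap" is an assumption.

```r
# Subject 1's four records, copied from the output above.
records <- data.frame(
  cohort_start_date = as.Date(c("1980-03-15", "1971-01-04", "1982-09-11", "1976-10-20")),
  cohort_end_date   = as.Date(c("1980-03-29", "1971-01-18", "1982-10-02", "1976-11-03"))
)

# Hypothetical helper (not part of CohortConstructor): merge the records of a
# single subject whenever a record starts at most `gap` days after the latest
# end date seen so far.
collapse_records <- function(x, gap) {
  x <- x[order(x$cohort_start_date), ]
  starts <- as.numeric(x$cohort_start_date)
  ends   <- as.numeric(x$cohort_end_date)
  # Running maximum of end dates, so nested records are handled as well.
  running_end <- cummax(ends)
  # Days between the previous running end and the start of each later record.
  days_since_previous <- starts[-1] - running_end[-length(running_end)]
  # A new group begins whenever that distance exceeds the gap
  # (assumption: a distance equal to `gap` still merges).
  group <- cumsum(c(TRUE, days_since_previous > gap))
  data.frame(
    cohort_start_date = as.Date(as.vector(tapply(starts, group, min)), origin = "1970-01-01"),
    cohort_end_date   = as.Date(as.vector(tapply(ends, group, max)), origin = "1970-01-01")
  )
}

collapsed <- collapse_records(records, gap = 1095)
collapsed
```

With gap = 1095 this returns three rows, the last one running from 1980-03-15 to 1982-10-02, which matches the collapsed cohort shown above. The last two records are 896 days apart, while the other neighbouring pairs are separated by more than 1095 days.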

Now let’s look at how the cohorts have been changed.

summary_attrition <- summariseCohortAttrition(cdm$medications_collapsed)
tableCohortAttrition(summary_attrition)
An OMOP CDM database; acetaminophen

                                            Variable name
Reason                                      number_records  number_subjects  excluded_records  excluded_subjects
Initial qualifying events                            9,365            2,580                 0                  0
Record start <= record end                           9,365            2,580                 0                  0
Record in observation                                9,365            2,580                 0                  0
Merge overlapping records                            9,365            2,580                 0                  0
Collapse cohort with a gap of 1095 days.             7,975            2,580             1,390                  0
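As a quick check on how to read these figures, the sketch below recomputes the excluded record counts from the number_records values above. It assumes that excluded_records is simply the drop in records from the previous step, which is what the reported numbers suggest; the vector names are illustrative only.

```r
# number_records at each attrition step, copied from the table above.
number_records <- c(9365, 9365, 9365, 9365, 7975)

# Assumption: records excluded at a step = previous count minus current count.
excluded_records <- c(0, -diff(number_records))
excluded_records

# Share of the original records removed by collapsing with a 1095-day gap.
share_removed <- (number_records[1] - number_records[5]) / number_records[1]
round(100 * share_removed, 1)
```

Only the final step changes the counts: 1,390 records are absorbed into neighbouring records, while the number of subjects stays at 2,580 because collapsing merges records and never removes an individual.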