Applying cohort table requirements

In this vignette we show how to apply requirements based on the data contained in the cohort table itself, for example the order of each person’s entries, their dates, and the number of subjects in each cohort. For this we use the Eunomia synthetic data.

library(CodelistGenerator)
library(CohortConstructor)
library(CohortCharacteristics)
library(ggplot2)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main", 
                    writeSchema = "main", writePrefix = "my_study_")

Let’s start by creating a cohort of acetaminophen users. Individuals will have a cohort entry for each acetaminophen drug exposure record, with cohort exit set to the drug record’s end date. Note that when the cohort is created, any overlapping records are merged.

acetaminophen_codes <- getDrugIngredientCodes(cdm, 
                                              name = "acetaminophen", 
                                              nameStyle = "{concept_name}")
cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                   conceptSet = acetaminophen_codes, 
                                   exit = "event_end_date",
                                   name = "acetaminophen")
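The merging step mentioned above can be made concrete with a toy sketch in base R that collapses overlapping records within each person. This is not CohortConstructor's implementation. The data are made up and the column names `start` and `end` are hypothetical. The sketch assumes two records overlap when one starts on or before the other ends; the package's exact treatment of adjacent records may differ.

```r
# Toy drug exposure records for two hypothetical people (not Eunomia data)
records <- data.frame(
  subject_id = c(1, 1, 1, 2),
  start = as.Date(c("2010-01-01", "2010-01-10", "2010-03-01", "2011-05-01")),
  end   = as.Date(c("2010-01-15", "2010-01-20", "2010-03-05", "2011-05-10"))
)

# Collapse overlapping [start, end] intervals within each subject.
# Assumption: a record overlaps if it starts on or before the current end date.
merge_overlaps <- function(df) {
  df <- df[order(df$subject_id, df$start), ]
  out <- list()
  for (id in unique(df$subject_id)) {
    s <- df[df$subject_id == id, ]
    cur_start <- s$start[1]
    cur_end <- s$end[1]
    if (nrow(s) > 1) {
      for (i in 2:nrow(s)) {
        if (s$start[i] <= cur_end) {
          # Overlapping record: extend the current merged interval
          cur_end <- max(cur_end, s$end[i])
        } else {
          # Gap found: store the merged interval and start a new one
          out[[length(out) + 1]] <- data.frame(subject_id = id, start = cur_start, end = cur_end)
          cur_start <- s$start[i]
          cur_end <- s$end[i]
        }
      }
    }
    out[[length(out) + 1]] <- data.frame(subject_id = id, start = cur_start, end = cur_end)
  }
  do.call(rbind, out)
}

merged <- merge_overlaps(records)
merged
```

Person 1's first two records overlap and become a single entry running from 2010-01-01 to 2010-01-20, so four input records become three.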

At this point we have just created our base cohort without having applied any restrictions.

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
Attrition diagram:
Initial events: N subjects = 2,679, N records = 14,205
Record start <= record end: excluded 0 subjects, 0 records
Record in observation: excluded 0 subjects, 0 records
Merge overlapping records: excluded 0 subjects, 297 records
Final events: N subjects = 2,679, N records = 13,908
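The diagram is read top to bottom. Each step lists the subjects and records it excluded, so the records remaining after a step equal the initial count minus all exclusions so far. Here is a small worked check in base R that uses only the record counts shown in the diagram:

```r
# Record counts read from the acetaminophen attrition diagram
initial_records <- 14205
excluded_records <- c(
  "Record start <= record end" = 0,
  "Record in observation"      = 0,
  "Merge overlapping records"  = 297
)

# Records remaining after each step = initial count minus cumulative exclusions
remaining_records <- initial_records - cumsum(excluded_records)
remaining_records
```

The last value, 13,908, matches the final events box.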

Keep only the first record per person

We can see that in our starting cohort individuals have multiple entries, one for each use of acetaminophen. However, we could keep only their earliest cohort entry by using requireIsFirstEntry() from CohortConstructor.

cdm$acetaminophen <- cdm$acetaminophen |> 
  requireIsFirstEntry()

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
Attrition diagram:
Initial events: N subjects = 2,679, N records = 14,205
Record start <= record end: excluded 0 subjects, 0 records
Record in observation: excluded 0 subjects, 0 records
Merge overlapping records: excluded 0 subjects, 297 records
Restricted to first entry: excluded 0 subjects, 11,229 records
Final events: N subjects = 2,679, N records = 2,679

The number of individuals is unchanged at 2,679. However, every record after each individual’s first has been excluded, which leaves one record per person.

To keep the latest record per person instead of the earliest, we would use requireIsLastEntry(). To keep a range of records per person, for example each person’s first to third entries, we can use requireIsEntry().
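The logic behind these three functions can be illustrated on a toy table. First number each person's records in date order. Then keep entry 1 (first), the highest entry (last), or the entries within a range. This base R sketch uses made-up data and a hypothetical `entry` column. It mimics the behaviour described above rather than reproducing how CohortConstructor implements it.

```r
# Toy cohort: three hypothetical people with differing numbers of entries
cohort <- data.frame(
  subject_id = c(1, 1, 1, 2, 3, 3),
  cohort_start_date = as.Date(c("2012-01-01", "2013-06-01", "2014-02-01",
                                "2011-03-01", "2015-07-01", "2016-01-01"))
)

# Number each person's entries in chronological order (1 = earliest)
cohort <- cohort[order(cohort$subject_id, cohort$cohort_start_date), ]
cohort$entry <- ave(seq_len(nrow(cohort)), cohort$subject_id, FUN = seq_along)
cohort$n_entries <- ave(cohort$entry, cohort$subject_id, FUN = length)

# Analogous to requireIsFirstEntry(): keep each person's earliest entry
first_entry <- cohort[cohort$entry == 1, ]
# Analogous to requireIsLastEntry(): keep each person's latest entry
last_entry <- cohort[cohort$entry == cohort$n_entries, ]
# Analogous to an entry range of 1 to 2 with requireIsEntry()
entries_1_2 <- cohort[cohort$entry >= 1 & cohort$entry <= 2, ]
```

Every person keeps exactly one record under the first and last filters. The range filter keeps up to two records per person.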

Keep only records within a date range

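Individuals may contribute multiple records over extended periods. We can filter out records that fall outside a specified date range using requireInDateRange(). Below we first recreate the base cohort so that all records are available again. We then keep only records whose cohort start date falls between 2010-01-01 and 2015-01-01.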

cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                   conceptSet = acetaminophen_codes, 
                                   name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |> 
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2015-01-01")))

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
Attrition diagram:
Initial events: N subjects = 2,679, N records = 14,205
Record start <= record end: excluded 0 subjects, 0 records
Record in observation: excluded 0 subjects, 0 records
Merge overlapping records: excluded 0 subjects, 297 records
cohort_start_date after 2010-01-01: excluded 1,403 subjects, 12,219 records
cohort_start_date before 2015-01-01: excluded 490 subjects, 800 records
Final events: N subjects = 786, N records = 889
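Each record is checked against the date range independently. A person can therefore keep some records and lose others, which is why subjects and records fall by different amounts in the diagram. Below is a toy base R sketch of that per-record filter. The data are made up, and treating both ends of the range as inclusive is an assumption.

```r
# Toy cohort records for three hypothetical people
cohort <- data.frame(
  subject_id = c(1, 1, 2, 3),
  cohort_start_date = as.Date(c("2009-12-31", "2010-01-01", "2014-12-31", "2015-01-02"))
)
date_range <- as.Date(c("2010-01-01", "2015-01-01"))

# Keep records whose start date lies within the range (assumed inclusive at both ends)
in_range <- cohort$cohort_start_date >= date_range[1] &
  cohort$cohort_start_date <= date_range[2]
kept <- cohort[in_range, ]
kept
```

Person 1 keeps only their 2010-01-01 record and person 2 keeps their single record. Person 3 is removed because their only record starts after the end of the range.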

Applying multiple cohort requirements

Multiple restrictions can be applied to a cohort. However, the order in which they are applied will often matter.

cdm$acetaminophen_1 <- conceptCohort(cdm = cdm, 
                                     conceptSet = acetaminophen_codes, 
                                     name = "acetaminophen_1") |> 
  requireIsFirstEntry() |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01")))

cdm$acetaminophen_2 <- conceptCohort(cdm = cdm, 
                                     conceptSet = acetaminophen_codes, 
                                     name = "acetaminophen_2") |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01"))) |> 
  requireIsFirstEntry()

summary_attrition_1 <- summariseCohortAttrition(cdm$acetaminophen_1)
summary_attrition_2 <- summariseCohortAttrition(cdm$acetaminophen_2)

Here we see the attrition when the entry requirement is applied before the date requirement. The result is a cohort of people whose first-ever acetaminophen record occurs within our study period. Anyone whose first record falls outside the period is excluded entirely.

plotCohortAttrition(summary_attrition_1)
Attrition diagram:
Initial events: N subjects = 2,679, N records = 14,205
Record start <= record end: excluded 0 subjects, 0 records
Record in observation: excluded 0 subjects, 0 records
Merge overlapping records: excluded 0 subjects, 297 records
Restricted to first entry: excluded 0 subjects, 11,229 records
cohort_start_date after 2010-01-01: excluded 2,665 subjects, 2,665 records
cohort_start_date before 2016-01-01: excluded 1 subject, 1 record
Final events: N subjects = 13, N records = 13

And here we see the attrition when the date requirement is applied before the entry requirement. The result is a cohort of people with their first acetaminophen record within the study period, which is not necessarily their first record ever. This is why the final cohort is much larger: 907 subjects compared with 13.

plotCohortAttrition(summary_attrition_2)
Attrition diagram:
Initial events: N subjects = 2,679, N records = 14,205
Record start <= record end: excluded 0 subjects, 0 records
Record in observation: excluded 0 subjects, 0 records
Merge overlapping records: excluded 0 subjects, 297 records
cohort_start_date after 2010-01-01: excluded 1,403 subjects, 12,219 records
cohort_start_date before 2016-01-01: excluded 369 subjects, 629 records
Restricted to first entry: excluded 0 subjects, 153 records
Final events: N subjects = 907, N records = 907
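The difference between the two orderings can be reproduced on a toy example. In this base R sketch the data are made up. The helpers `keep_first()` and `keep_in_range()` are hypothetical stand-ins for requireIsFirstEntry() and requireInDateRange(). Person 1 has one record before the study period and another within it.

```r
# Toy records: person 1 has a record before and one during the study period
cohort <- data.frame(
  subject_id = c(1, 1, 2),
  cohort_start_date = as.Date(c("2005-03-01", "2012-06-01", "2013-01-01"))
)
study <- as.Date(c("2010-01-01", "2016-01-01"))

# Hypothetical helper: keep each person's earliest record
keep_first <- function(df) {
  df <- df[order(df$subject_id, df$cohort_start_date), ]
  df[!duplicated(df$subject_id), ]
}
# Hypothetical helper: keep records starting within the range (assumed inclusive)
keep_in_range <- function(df, range) {
  df[df$cohort_start_date >= range[1] & df$cohort_start_date <= range[2], ]
}

# First entry, then date range: first-ever record that falls in the study period
first_then_range <- keep_in_range(keep_first(cohort), study)
# Date range, then first entry: first record within the study period
range_then_first <- keep_first(keep_in_range(cohort, study))
```

Applying the first-entry filter first discards person 1, because their first-ever record (2005) lies outside the period. Applying the date filter first keeps their 2012 record. This mirrors the gap between 13 and 907 subjects in the diagrams above.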

Keep only records from cohorts with a minimum number of individuals

Another useful function, particularly when working with multiple cohorts or performing a network study, is requireMinCohortCount(). It keeps only those cohorts with at least a minimum number of subjects and removes all records from cohorts with fewer.

As an example, let’s create a cohort for every drug ingredient we see in Eunomia. We first get the drug ingredient codes.

medication_codes <- getDrugIngredientCodes(cdm = cdm, nameStyle = "{concept_name}")
medication_codes
#> 
#> - acetaminophen (7 codes)
#> - albuterol (2 codes)
#> - alendronate (2 codes)
#> - alfentanil (1 codes)
#> - alteplase (2 codes)
#> - amiodarone (2 codes)
#> along with 85 more codelists

When we create all of these cohorts, we can see that many contain only a small number of individuals.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = medication_codes,
                                 name = "medications")


cohortCount(cdm$medications) |> 
  filter(number_subjects > 0) |> 
  ggplot() +
  geom_histogram(aes(number_subjects),
                 colour = "black",
                 binwidth = 25) +  
  xlab("Number of subjects") +
  theme_bw()

If we apply a minimum cohort count of 500, we are left with far fewer non-empty cohorts, each with at least 500 subjects.

cdm$medications <- cdm$medications |> 
  requireMinCohortCount(minCohortCount = 500)

cohortCount(cdm$medications) |> 
  filter(number_subjects > 0) |> 
  ggplot() +
  geom_histogram(aes(number_subjects),
                 colour = "black",
                 binwidth = 25) + 
  xlim(0, NA) + 
  xlab("Number of subjects") +
  theme_bw()
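Conceptually, requireMinCohortCount() counts the subjects in each cohort and drops every record belonging to a cohort below the threshold. The toy base R sketch below shows that filtering step. It uses made-up data, a threshold of 2 instead of 500, and an assumed "at least" comparison.

```r
# Toy multi-cohort table (hypothetical data); cohort_definition_id identifies each cohort
cohort <- data.frame(
  cohort_definition_id = c(1, 1, 1, 2, 3, 3),
  subject_id = c(10, 11, 12, 10, 13, 14)
)
min_count <- 2

# Count distinct subjects per cohort
n_subjects <- tapply(cohort$subject_id, cohort$cohort_definition_id,
                     function(x) length(unique(x)))
# Cohorts meeting the minimum (assumed inclusive: at least min_count subjects)
keep_ids <- as.numeric(names(n_subjects)[n_subjects >= min_count])

# Drop every record belonging to a cohort below the minimum
filtered <- cohort[cohort$cohort_definition_id %in% keep_ids, ]
filtered
```

Cohort 2 has a single subject, so its record is removed. Cohorts 1 and 3 are kept unchanged.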