Insert, update, or remove rows in a database

James Wondrasek

2024-11-24

This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods:

The dm_rows_* process

All six methods take the same arguments and using them follows the same process:

  1. Create a temporary changeset dm object that defines the intended changes on the RDBMS
  2. If desired, simulate changes with in_place = FALSE to double-check
  3. Apply changes with in_place = TRUE.

To start, a dm object is created containing the tables and rows that you want to change. This changeset dm is then copied into the same source as the dm you want to modify. With the dm in the same RDBMS as the destination dm, you call the appropriate method, such as dm_rows_insert(), to make your planned changes, along with an argument of in_place = FALSE so you can confirm you achieve the changes that you want.

This verification can be done visually, looking at row counts and the like, or using {dm}’s constraint checking method, dm_examine_constraints(). The biggest danger is damaging key relations between data spread across multiple tables by deleting or duplicating rows and their keys. dm_examine_constraints() will catch errors where primary keys are duplicated or foreign keys do not have a matching primary key (unless the foreign key value is NA).

With the changes confirmed, you execute the method again, this time with the argument in_place = TRUE to make the changes permanent. Note that in_place = FALSE is the default: you must opt in to actually change data on the database.

Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values that uniquely identify rows.

Method Requirements
dm_rows_insert() Records with existing primary keys are silently ignored (via dplyr::rows_insert(conflict = "ignore")).
dm_rows_append() All records are inserted, the underlying database might check for uniqueness of primary keys (and fail the operation) if a constraint is set.
dm_rows_update() Primary keys must match for all records to be updated.
dm_rows_patch() Updates missing values in existing records. Primary keys must match for all records to be patched.
dm_rows_upsert() Updates existing records and adds new records, based on the primary key.
dm_rows_delete() Removes matching records based on the primary key. Primary keys must match for all records to be deleted.

To ensure the integrity of all relations during the process, all methods automatically determine the correct processing order for the tables involved. For operations that create records, parent tables (which hold primary keys) are processed before child tables (which hold foreign keys). For dm_rows_delete(), child tables are processed before their parent tables. Note that the user is still responsible for setting transactions to ensure integrity of operations across multiple tables. For more details on this see vignette("howto-dm-theory") and vignette("howto-dm-db").

Usage

To demonstrate the use of these table modifying methods, we will create a simple dm object with two tables linked by a foreign key. Note that the child table has a foreign key missing (NA).

library(dm)
parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
parent
child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1, 1, NA))
child
demo_dm <-
  dm(parent = parent, child = child) %>%
  dm_add_pk(parent, pk) %>%
  dm_add_pk(child, pk) %>%
  dm_add_fk(child, fk, parent)

demo_dm %>%
  dm_draw(view_type = "all")

{dm} doesn’t check your key values when you create a dm, we add this check:1

dm_examine_constraints(demo_dm)

Then we copy demo_dm into an SQLite database. Note: the default for the method used, copy_dm_to(), is to create temporary tables that will be automatically deleted when your session ends. As demo_sql will be the destination dm for the examples, the argument temporary = FALSE is used to make this distinction apparent.

library(DBI)
sqlite_db <- DBI::dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
demo_sql

{dm}’s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} – manually building a sequence of operations using temporary results until it is complete and correct, and then committing the result.

dm_rows_insert()

To demonstrate dm_rows_insert(), we create a dm with tables containing the rows to insert and copy it to sqlite_db, the same source as demo_sql. For all of the dm_rows_...() methods, the source and destination dm objects must be in the same RDBMS. You will get an error message if this is not the case.

The code below adds parent and child table entries for the letter “D”. First, the changeset dm is created and temporarily copied to the database:

new_parent <- tibble(value = "D", pk = 4)
new_parent
new_child <- tibble(value = "d", pk = 4, fk = 4)
new_child
dm_insert_in <-
  dm(parent = new_parent, child = new_child) %>%
  copy_dm_to(sqlite_db, ., temporary = TRUE)

The changeset dm is then used as an argument to dm_rows_insert().

dm_insert_out <-
  demo_sql %>%
  dm_rows_insert(dm_insert_in)

This gives us a warning that changes will not persist (i.e., they are temporary). Inspecting the child table of the resulting dm_insert_out and demo_sql, we can see that’s exactly what happened. {dm} returned to us a dm object with our inserted rows in place, but the underlying database has not changed.

dm_insert_out$child
demo_sql$child

We repeat the operation, this time with the argument in_place = TRUE and the changes now persist in demo_sql.

dm_insert_out <-
  demo_sql %>%
  dm_rows_insert(dm_insert_in, in_place = TRUE)

demo_sql$child

dm_rows_update()

dm_rows_update() works the same as dm_rows_insert(). We create the dm object and copy it to the same source as the destination. Here we will change the foreign key for the row in child containing “b” to point to the correct row in parent. And we will persist the changes.

updated_child <- tibble(value = "b", pk = 2, fk = 2)
updated_child
dm_update_in <-
  dm(child = updated_child) %>%
  copy_dm_to(sqlite_db, ., temporary = TRUE)

dm_update_out <-
  demo_sql %>%
  dm_rows_update(dm_update_in, in_place = TRUE)

demo_sql$child

dm_rows_delete()

dm_rows_delete() is not currently implemented to work with an RDBMS, so we will shift our demonstrations back to the local R environment. We’ve made changes to demo_sql, so we use collect() to copy the current tables out of SQLite. Note that persistence is not a concern for local dm objects. Every operation returns a new dm object containing the changes made.

local_dm <- collect(demo_sql)

local_dm$parent
local_dm$child
dm_deleted <-
  dm(parent = new_parent, child = new_child) %>%
  dm_rows_delete(local_dm, .)

dm_deleted$child

dm_rows_patch()

dm_rows_patch() updates missing values in existing records. We use it here to fix the missing foreign key in the child table.

patched_child <- tibble(value = "c", pk = 3, fk = 3)
patched_child
dm_patched <-
  dm(child = patched_child) %>%
  dm_rows_patch(dm_deleted, .)

dm_patched$child

dm_rows_upsert()

dm_rows_upsert() updates rows with supplied values if they exist or inserts the supplied values as new rows if they don’t. In this example we add the letter “D” back to our dm, and update the foreign key for “b”.

upserted_parent <- tibble(value = "D", pk = 4)
upserted_parent
upserted_child <- tibble(value = c("b", "d"), pk = c(2, 4), fk = c(3, 4))
upserted_child
dm_upserted <-
  dm(parent = upserted_parent, child = upserted_child) %>%
  dm_rows_upsert(dm_patched, .)

dm_upserted$parent
dm_upserted$child

When done, do not forget to disconnect:

DBI::dbDisconnect(sqlite_db)

Conclusion

The dm_rows_...() methods give you row-level granularity over the modifications you need to make to your relational model. Using the common in_place argument, they all can construct and verify your modifications before committing them. There are a few limitations, as mentioned in the tutorial, but these will be addressed in future updates to {dm}.

Further Reading

If this tutorial answered some questions, but opened others, these resources might be of assistance.

Is your data in an RDBMS? vignette("howto-dm-db") offers a detailed look at working with an existing relational data model.

If your data is in data frames, then you may want to read vignette("howto-dm-df") next.

If you would like to know more about relational data models in order to get the most out of dm, check out vignette("howto-dm-theory").

If you’re familiar with relational data models but want to know how to work with them in dm, then any of vignette("tech-dm-join"), vignette("tech-dm-filter"), or vignette("tech-dm-zoom") is a good next step.


  1. Be aware that when using dm_examine_constraints(), missing (denoted by NULL in SQL, while NA in R) foreign keys are allowed and will be counted as a match. In some cases this doesn’t make sense and non-NULL columns should be enforced by the RDBMS. Currently, {dm} does not specify or check non-NULL constraints for columns.↩︎