Build times

Thanks to Jasper Clarkberg, drake records how long it takes to build each target.

library(drake)
load_basic_example() # Get the code with drake_example("basic").
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Unloading targets from environment:
##   small
##   large
##   coef_regression2_small
## connect 68 imports: my_storr, additions, output_plan, data_plan, faster_cache...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
make(my_plan, jobs = 2, verbose = FALSE) # See also max_useful_jobs(my_plan).

build_times(digits = 8) # From the cache.
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
##                      item   type elapsed   user system
## 1            'report.Rmd' import  0.003s     0s 0.004s
## 2             'report.md' target  0.038s 0.028s 0.008s
## 3  coef_regression1_large target  0.007s 0.008s     0s
## 4  coef_regression1_small target  0.008s 0.004s 0.004s
## 5  coef_regression2_large target  0.005s     0s 0.004s
## 6  coef_regression2_small target  0.005s 0.004s     0s
## 7              data.frame import  0.013s 0.012s     0s
## 8                    knit import  0.006s 0.004s     0s
## 9                   large target  0.007s 0.008s     0s
## 10                     lm import  0.004s 0.004s     0s
## 11                 mtcars import  0.002s     0s     0s
## 12                   nrow import  0.005s 0.004s     0s
## 13                   reg1 import  0.004s 0.004s     0s
## 14                   reg2 import  0.006s     0s 0.004s
## 15      regression1_large target  0.008s 0.004s 0.004s
## 16      regression1_small target  0.008s 0.004s 0.004s
## 17      regression2_large target  0.006s 0.004s 0.004s
## 18      regression2_small target  0.006s 0.004s     0s
## 19             sample.int import  0.008s 0.008s     0s
## 20               simulate import  0.002s 0.004s     0s
## 21                  small target  0.007s 0.004s     0s
## 22 summ_regression1_large target  0.004s     0s 0.004s
## 23 summ_regression1_small target  0.005s 0.004s     0s
## 24 summ_regression2_large target  0.004s 0.004s     0s
## 25 summ_regression2_small target  0.004s     0s 0.004s
## 26                summary import  0.006s 0.004s 0.004s
## 27       suppressWarnings import  0.005s 0.004s     0s

build_times(digits = 8, targets_only = TRUE)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
##                      item   type elapsed   user system
## 2             'report.md' target  0.038s 0.028s 0.008s
## 3  coef_regression1_large target  0.007s 0.008s     0s
## 4  coef_regression1_small target  0.008s 0.004s 0.004s
## 5  coef_regression2_large target  0.005s     0s 0.004s
## 6  coef_regression2_small target  0.005s 0.004s     0s
## 9                   large target  0.007s 0.008s     0s
## 15      regression1_large target  0.008s 0.004s 0.004s
## 16      regression1_small target  0.008s 0.004s 0.004s
## 17      regression2_large target  0.006s 0.004s 0.004s
## 18      regression2_small target  0.006s 0.004s     0s
## 21                  small target  0.007s 0.004s     0s
## 22 summ_regression1_large target  0.004s     0s 0.004s
## 23 summ_regression1_small target  0.005s 0.004s     0s
## 24 summ_regression2_large target  0.004s 0.004s     0s
## 25 summ_regression2_small target  0.004s     0s 0.004s

For drake version 4.1.0 and earlier, build_times() just measures the elapsed runtime of each command in my_plan$command. For later versions, the build times also account for all the internal operations in drake:::build(), such as storage and hashing.
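
To make the elapsed, user, and system columns concrete, here is a minimal base R sketch. It assumes the columns carry the same meaning as the fields reported by system.time(); the timed command is a hypothetical stand-in, not part of the basic example, and the sketch does not reproduce drake's internal bookkeeping.

```r
# Time a stand-in command (hypothetical; not from the drake example).
timing <- system.time({
  d <- data.frame(x = rnorm(1000), y = rnorm(1000))
  fit <- lm(y ~ x, data = d)
})

timing[["elapsed"]]   # wall-clock seconds
timing[["user.self"]] # CPU seconds spent executing R code
timing[["sys.self"]]  # CPU seconds spent in system calls made for R
```

A timing taken this way covers only the command itself, which corresponds to what the older versions measured; the later versions described above would also include the storage and hashing overhead.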

Predicting runtime

Drake uses these times to predict the runtime of the next make(). At this moment, everything in the current example is up to date, so the next make() should be fast. Here, we factor in only the build times of the targets: targets_only = TRUE excludes the imports.

config <- drake_config(my_plan, verbose = FALSE)
predict_runtime(
  config,
  digits = 8,
  targets_only = TRUE
)
## [1] "0s"

But you can also predict the elapsed time of a full run from scratch (either after clean() or with make(..., trigger = "always")).

predict_runtime(
  config,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.122s"

Suppose we change a dependency to make some targets out of date. Now, even though from_scratch is FALSE, the next make() should take some time.

reg2 <- function(d){
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

predict_runtime(
  config,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.068s"

We can also factor in parallelism using the future_jobs argument, which plays the role of the jobs argument in a hypothetical next make().

predict_runtime(
  config,
  future_jobs = 1,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.122s"

predict_runtime(
  config,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.086s"

predict_runtime(
  config,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
## [1] "0.068s"

Rate-limiting targets

To predict the next runtime with multiple parallel jobs, drake makes some assumptions.

  1. The outdated targets are spread out evenly over the available jobs.
  2. One job gets all the slowest targets (pessimistic scenario).

Then, drake takes the targets from the slowest job in each parallelizable stage and sums the corresponding elapsed build times. A parallelizable stage is usually a column in the workflow graph, but if there are up-to-date targets in a column, drake skips ahead to try to fit as many targets as possible in a stage.
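
The two assumptions above can be sketched in a few lines of base R. This is an illustration only, not drake's implementation: sketch_runtime() is a hypothetical helper, and the stage numbers and elapsed times are copied by hand from the parallel_stages() and build_times() output in this section.

```r
# Targets, stages, and elapsed build times (seconds) copied from the
# parallel_stages() and build_times() output shown in this section.
timings <- data.frame(
  item = c(
    "large", "small",
    "regression1_large", "regression1_small",
    "regression2_large", "regression2_small",
    "coef_regression1_large", "coef_regression1_small",
    "coef_regression2_large", "coef_regression2_small",
    "summ_regression1_large", "summ_regression1_small",
    "summ_regression2_large", "summ_regression2_small",
    "'report.md'"
  ),
  stage = c(3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6),
  elapsed = c(
    0.007, 0.007,
    0.008, 0.008, 0.006, 0.006,
    0.007, 0.008, 0.005, 0.005,
    0.004, 0.005, 0.004, 0.004,
    0.038
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a drake function).
sketch_runtime <- function(timings, jobs = 1) {
  per_stage <- vapply(
    split(timings$elapsed, timings$stage),
    function(elapsed) {
      # Assumption 1: targets are spread evenly, so the busiest job
      # in a stage receives ceiling(n / jobs) targets.
      n_slowest <- ceiling(length(elapsed) / jobs)
      # Assumption 2: that job receives the slowest targets in the stage.
      sum(head(sort(elapsed, decreasing = TRUE), n_slowest))
    },
    numeric(1)
  )
  # Stages run one after another, so their times add up.
  sum(per_stage)
}

sketch_runtime(timings, jobs = 1)
sketch_runtime(timings, jobs = 2)
sketch_runtime(timings, jobs = 4)
```

With these inputs, the three calls give 0.122, 0.086, and 0.068 seconds, which agrees with the from-scratch predict_runtime() results shown earlier for future_jobs of 1, 2, and 4.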

# Hover, click, drag, zoom, and pan.
vis_drake_graph(my_plan, width = "100%", height = "500px")

You can explore the rate-limiting targets:

rate_limiting_times(
  config,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
##                      item   type elapsed  user system stage
## 1                   large target   0.007 0.008  0.000     3
## 2                   small target   0.007 0.004  0.000     3
## 3       regression1_large target   0.008 0.004  0.004     4
## 4       regression1_small target   0.008 0.004  0.004     4
## 5       regression2_large target   0.006 0.004  0.004     4
## 6       regression2_small target   0.006 0.004  0.000     4
## 7  coef_regression1_small target   0.008 0.004  0.004     5
## 8  coef_regression1_large target   0.007 0.008  0.000     5
## 9  coef_regression2_large target   0.005 0.000  0.004     5
## 10 coef_regression2_small target   0.005 0.004  0.000     5
## 11 summ_regression1_small target   0.005 0.004  0.000     5
## 12 summ_regression2_large target   0.004 0.004  0.000     5
## 13 summ_regression1_large target   0.004 0.000  0.004     5
## 14 summ_regression2_small target   0.004 0.000  0.004     5
## 15            'report.md' target   0.038 0.028  0.008     6

rate_limiting_times(
  config,
  future_jobs = 2,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
##                     item   type elapsed  user system stage
## 1                  large target   0.007 0.008  0.000     3
## 2      regression1_large target   0.008 0.004  0.004     4
## 3      regression1_small target   0.008 0.004  0.004     4
## 4 coef_regression1_small target   0.008 0.004  0.004     5
## 5 coef_regression1_large target   0.007 0.008  0.000     5
## 6 coef_regression2_large target   0.005 0.000  0.004     5
## 7 coef_regression2_small target   0.005 0.004  0.000     5
## 8            'report.md' target   0.038 0.028  0.008     6

rate_limiting_times(
  config,
  future_jobs = 4,
  from_scratch = TRUE,
  digits = 8,
  targets_only = TRUE
)
##                     item   type elapsed  user system stage
## 1                  large target   0.007 0.008  0.000     3
## 2      regression1_large target   0.008 0.004  0.004     4
## 3 coef_regression1_small target   0.008 0.004  0.004     5
## 4 coef_regression1_large target   0.007 0.008  0.000     5
## 5            'report.md' target   0.038 0.028  0.008     6

You can also explore the parallelizable stages in general.

parallel_stages(config, from_scratch = TRUE)
##                      item imported  file stage
## 1            'report.Rmd'     TRUE  TRUE     1
## 2              data.frame     TRUE FALSE     1
## 3                    knit     TRUE FALSE     1
## 4                      lm     TRUE FALSE     1
## 5                  mtcars     TRUE FALSE     1
## 6                    nrow     TRUE FALSE     1
## 7              sample.int     TRUE FALSE     1
## 8                 summary     TRUE FALSE     1
## 9        suppressWarnings     TRUE FALSE     1
## 10                   reg1     TRUE FALSE     2
## 11                   reg2     TRUE FALSE     2
## 12               simulate     TRUE FALSE     2
## 13                  large    FALSE FALSE     3
## 14                  small    FALSE FALSE     3
## 15      regression1_large    FALSE FALSE     4
## 16      regression1_small    FALSE FALSE     4
## 17      regression2_large    FALSE FALSE     4
## 18      regression2_small    FALSE FALSE     4
## 19 coef_regression1_large    FALSE FALSE     5
## 20 coef_regression1_small    FALSE FALSE     5
## 21 coef_regression2_large    FALSE FALSE     5
## 22 coef_regression2_small    FALSE FALSE     5
## 23 summ_regression1_large    FALSE FALSE     5
## 24 summ_regression1_small    FALSE FALSE     5
## 25 summ_regression2_large    FALSE FALSE     5
## 26 summ_regression2_small    FALSE FALSE     5
## 27            'report.md'    FALSE  TRUE     6

A word of caution

Drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and lists the names of the untimed targets.
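
One way to check for untimed targets ahead of time is to compare the planned target names with the timed items. The sketch below uses hypothetical stand-in vectors so that it runs without drake; in a real session the two vectors would presumably come from my_plan$target and the item column of build_times(), which is an assumption based on the output shown above.

```r
# Hypothetical stand-ins for my_plan$target and build_times()$item.
planned_targets <- c("small", "large", "regression1_small", "'report.md'")
timed_items <- c("small", "large", "'report.md'")

# Targets in the plan that have no logged build time.
untimed <- setdiff(planned_targets, timed_items)
untimed
```

Here the only untimed target is regression1_small, so a prediction based on these logs would leave its build time out.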