Inspect and run your project.
library(drake)
load_basic_example() # Get the code with drake_example("basic").
config <- drake_config(my_plan) # Master configuration list
vis_drake_graph(config) # Hover, click, drag, zoom, pan.
make(my_plan) # Run the workflow.
outdated(config) # Everything is up to date.
Debug errors.
failed() # Targets that failed in the most recent `make()`
diagnose() # Targets that failed in any previous `make()`
error <- diagnose(large) # Most recent verbose error log of `large`
str(error) # Object of class "error"
error$calls # Call stack / traceback
Dive deeper into the built-in examples.
drake_example("basic") # Write the code files.
drake_examples() # List the other examples.
vignette("quickstart") # This vignette
Is there an association between the weight and the fuel efficiency of cars? To find out, we use the mtcars dataset from the datasets package. The mtcars dataset originally came from the 1974 Motor Trend US magazine, and it contains design and performance data on 32 models of automobile.
# ?mtcars # more info
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here, wt is weight in thousands of pounds, and mpg is fuel efficiency in miles per gallon. We want to figure out if there is an association between wt and mpg. The mtcars dataset itself only has 32 rows, so we generate two larger bootstrapped datasets and then analyze them with regression models. We summarize the regression models to see if there is an association.
Before you run your project, you need to set up the workspace. In other words, you need to gather the “imports”: functions, pre-loaded data objects, and saved files that you want to be available before the real work begins.
library(knitr) # Drake knows which packages you load.
library(drake)
We need a function to bootstrap larger datasets from mtcars.
simulate <- function(n){
  # Pick a random set of cars to bootstrap from the mtcars data.
  index <- sample.int(n = nrow(mtcars), size = n, replace = TRUE)
  data <- mtcars[index, ]
  # x is the car's weight, and y is the fuel efficiency.
  data.frame(
    x = data$wt,
    y = data$mpg
  )
}
We also need functions to apply the regression models we need for detecting associations.
# Is fuel efficiency linearly related to weight?
reg1 <- function(d){
  lm(y ~ x, data = d)
}
# Is fuel efficiency related to the SQUARE of the weight?
reg2 <- function(d){
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}
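Before handing these functions to drake, it can help to try them by hand. The following is a minimal sketch in plain R, outside of any drake workflow. The seed value is an arbitrary assumption, chosen only so the bootstrap sample is repeatable.

```r
# A standalone trial run of the helper functions (no drake involved).
set.seed(2018) # Arbitrary seed, assumed here only for repeatability.

simulate <- function(n){
  index <- sample.int(n = nrow(mtcars), size = n, replace = TRUE)
  data <- mtcars[index, ]
  data.frame(x = data$wt, y = data$mpg)
}
reg1 <- function(d){
  lm(y ~ x, data = d)
}
reg2 <- function(d){
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

small <- simulate(48)           # Bootstrapped dataset with 48 rows
fit1 <- reg1(small)             # Linear term
fit2 <- reg2(small)             # Squared term
summary(fit2)$coefficients      # Estimates, standard errors, p-values
```

The last line produces the same kind of coefficient table that the workflow later stores in targets such as coef_regression2_small.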
We want to summarize the final results in an R Markdown report, so we need the following report.Rmd source file.
path <- file.path("examples", "basic", "report.Rmd")
report_file <- system.file(path, package = "drake", mustWork = TRUE)
file.copy(from = report_file, to = getwd(), overwrite = TRUE)
## [1] TRUE
Here are the contents of the report. It will serve as a final summary of our work, and we will process it at the very end. Admittedly, some of the text spoils the punch line.
cat(readLines("report.Rmd"), sep = "\n")
## ---
## title: "Final results report for the basic example"
## author: You
## output: html_document
## ---
##
## # The weight and fuel efficiency of cars
##
## Is there an association between the weight and the fuel efficiency of cars? To find out, we use the `mtcars` dataset from the `datasets` package. The `mtcars` data originally came from the 1974 Motor Trend US magazine, and it contains design and performance data on 32 models of automobile.
##
## ```{r showmtcars}
## # ?mtcars # more info
## head(mtcars)
## ```
##
## Here, `wt` is weight in tons, and `mpg` is fuel efficiency in miles per gallon. We want to figure out if there is an association between `wt` and `mpg`. The `mtcars` dataset itself only has 32 rows, so we generated two larger bootstrapped datasets. We called them `small` and `large`.
##
## ```{r example_chunk}
## library(drake)
## head(readd(small)) # 48 rows
## loadd(large) # 64 rows
## head(large)
## ```
##
## Then, we fit a couple regression models to the `small` and `large` to try to detect an association between `wt` and `mpg`. Here are the coefficients and p-values from one of the model fits.
##
## ```{r second_example_chunk}
## readd(coef_regression2_small)
## ```
##
## Since the p-value on `x2` is so small, there may be an association between weight and fuel efficiency after all.
##
## # A note on knitr reports in drake projects.
##
## Because of the calls to `readd()` and `loadd()`, `drake` knows that `small`, `large`, and `coef_regression2_small` are dependencies of this R Markdown report. This dependency relationship is what causes the report to be processed at the very end.
Now, all our imports are set up. When the real work begins, drake will import functions and data objects from your R session environment
ls()
## [1] "Produc" "analysis_methods"
## [3] "analysis_plan" "b"
## [5] "bad_plan" "coef_regression2_small"
## [7] "combos" "command"
## [9] "commands" "config"
## [11] "data_plan" "dataset_plan"
## [13] "datasets" "debug_plan"
## [15] "envir" "error"
## [17] "f" "files"
## [19] "get_rmspe" "good_plan"
## [21] "large" "local"
## [23] "make_my_plot" "make_my_table"
## [25] "model_plan" "my_plan"
## [27] "myplan" "output_plan"
## [29] "output_types" "package_list"
## [31] "path" "plot_rmspe"
## [33] "predictors" "recent"
## [35] "reg1" "reg2"
## [37] "report_file" "report_plan"
## [39] "reportfile" "results"
## [41] "rmspe" "rmspe_plan"
## [43] "rmspe_results_plan" "rules"
## [45] "simulate" "small"
## [47] "targets" "tmp"
## [49] "whole_plan" "x"
and saved files from your file system.
list.files()
## [1] "best-practices.R" "best-practices.Rmd"
## [3] "best-practices.html" "best-practices.md"
## [5] "caution.R" "caution.Rmd"
## [7] "caution.html" "caution.md"
## [9] "debug.R" "debug.Rmd"
## [11] "debug.html" "debug.md"
## [13] "drake.R" "drake.Rmd"
## [15] "drake.html" "drake.md"
## [17] "example-gsp.R" "example-gsp.Rmd"
## [19] "example-gsp.html" "example-gsp.md"
## [21] "example-packages.R" "example-packages.Rmd"
## [23] "example-packages.html" "example-packages.md"
## [25] "figure" "graph.R"
## [27] "graph.Rmd" "graph.html"
## [29] "graph.md" "logo-vignettes.png"
## [31] "parallelism.R" "parallelism.Rmd"
## [33] "parallelism.html" "parallelism.md"
## [35] "quickstart.R" "quickstart.Rmd"
## [37] "report.R" "report.Rmd"
## [39] "storage.Rmd" "timing.Rmd"
Now that your workspace of imports is prepared, we can outline the real work step by step in a workflow plan data frame.
load_basic_example() # Get the code with drake_example("basic").
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Unloading targets from environment:
## small
## large
## coef_regression2_small
## connect 47 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
my_plan
## target
## 1 'report.md'
## 2 small
## 3 large
## 4 regression1_small
## 5 regression1_large
## 6 regression2_small
## 7 regression2_large
## 8 summ_regression1_small
## 9 summ_regression1_large
## 10 summ_regression2_small
## 11 summ_regression2_large
## 12 coef_regression1_small
## 13 coef_regression1_large
## 14 coef_regression2_small
## 15 coef_regression2_large
## command
## 1 knit('report.Rmd', quiet = TRUE)
## 2 simulate(48)
## 3 simulate(64)
## 4 reg1(small)
## 5 reg1(large)
## 6 reg2(small)
## 7 reg2(large)
## 8 suppressWarnings(summary(regression1_small$residuals))
## 9 suppressWarnings(summary(regression1_large$residuals))
## 10 suppressWarnings(summary(regression2_small$residuals))
## 11 suppressWarnings(summary(regression2_large$residuals))
## 12 suppressWarnings(summary(regression1_small))$coefficients
## 13 suppressWarnings(summary(regression1_large))$coefficients
## 14 suppressWarnings(summary(regression2_small))$coefficients
## 15 suppressWarnings(summary(regression2_large))$coefficients
Each row is an intermediate step, and each command generates a single target. A target is an output R object (cached when generated) or an output file (specified with single quotes), and a command is just an ordinary piece of R code (not necessarily a single function call). Commands make use of R objects imported from your workspace, targets generated by other commands, and initial input files. These dependencies give your project an underlying network representation.
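To see how commands alone can imply a network, here is a rough sketch in base R. It is not drake's actual code analysis, which is more thorough. The helper symbols_in() and the three-row plan are hypothetical and exist only for this illustration.

```r
# A toy plan: three targets and the commands that build them.
plan <- data.frame(
  target = c("small", "regression2_small", "coef_regression2_small"),
  command = c(
    "simulate(48)",
    "reg2(small)",
    "suppressWarnings(summary(regression2_small))$coefficients"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every symbol that appears in a command.
symbols_in <- function(command){
  all.names(parse(text = command)[[1]], unique = TRUE)
}

# A target depends on another target if the other target's name
# appears as a symbol in its command.
edges <- lapply(plan$command, function(cmd){
  intersect(symbols_in(cmd), plan$target)
})
names(edges) <- plan$target
edges
```

Each element of edges lists the upstream targets of one target, which is the raw material for a dependency graph.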
# Hover, click, drag, zoom, and pan.
config <- drake_config(my_plan)
vis_drake_graph(config, width = "100%", height = "500px") # Also drake_graph()
You can also check the dependencies of individual targets and imported functions.
deps(reg2)
## [1] "lm"
deps(my_plan$command[1]) # Files like report.Rmd are single-quoted.
## [1] "'report.Rmd'" "coef_regression2_small"
## [3] "knit" "large"
## [5] "small"
deps(my_plan$command[nrow(my_plan)])
## [1] "regression2_large" "summary" "suppressWarnings"
List all the reproducibly-tracked objects and files.
tracked(my_plan, targets = "small")
## connect 47 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## [1] "small" "simulate" "data.frame" "mtcars" "nrow"
## [6] "sample.int"
tracked(my_plan)
## connect 47 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## [1] "'report.md'" "small"
## [3] "large" "regression1_small"
## [5] "regression1_large" "regression2_small"
## [7] "regression2_large" "summ_regression1_small"
## [9] "summ_regression1_large" "summ_regression2_small"
## [11] "summ_regression2_large" "coef_regression1_small"
## [13] "coef_regression1_large" "coef_regression2_small"
## [15] "coef_regression2_large" "reg1"
## [17] "reg2" "simulate"
## [19] "'report.Rmd'" "knit"
## [21] "summary" "suppressWarnings"
## [23] "data.frame" "lm"
## [25] "mtcars" "nrow"
## [27] "sample.int"
Check for circular reasoning, missing input files, and other pitfalls.
check_plan(my_plan)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 47 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
The workflow plan data frame my_plan would be a pain to write by hand, so drake has functions to help you. Here are the commands to generate the bootstrapped datasets.
my_datasets <- drake_plan(
small = simulate(48),
large = simulate(64))
my_datasets
## target command
## 1 small simulate(48)
## 2 large simulate(64)
For multiple replicates:
expand_plan(my_datasets, values = c("rep1", "rep2"))
## target command
## 1 small_rep1 simulate(48)
## 2 small_rep2 simulate(48)
## 3 large_rep1 simulate(64)
## 4 large_rep2 simulate(64)
Here is a template for applying our regression models to our bootstrapped datasets.
methods <- drake_plan(
regression1 = reg1(dataset__),
regression2 = reg2(dataset__))
methods
## target command
## 1 regression1 reg1(dataset__)
## 2 regression2 reg2(dataset__)
We evaluate the dataset__ wildcard to generate all the regression commands we need.
my_analyses <- plan_analyses(methods, data = my_datasets)
my_analyses
## target command
## 1 regression1_small reg1(small)
## 2 regression1_large reg1(large)
## 3 regression2_small reg2(small)
## 4 regression2_large reg2(large)
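Conceptually, evaluating a wildcard is text substitution over every combination of method and dataset. The sketch below reproduces the four rows above with base R only. It is an illustration of the idea, not how drake implements plan_analyses(), and the object names method_templates and grid are assumptions made for this example.

```r
# Templates with the dataset__ wildcard, as in the methods plan.
method_templates <- data.frame(
  target = c("regression1", "regression2"),
  command = c("reg1(dataset__)", "reg2(dataset__)"),
  stringsAsFactors = FALSE
)
datasets <- c("small", "large")

# All (dataset, method) combinations; the dataset varies fastest.
grid <- expand.grid(
  dataset = datasets,
  method = seq_len(nrow(method_templates)),
  stringsAsFactors = FALSE
)

# Substitute the wildcard and suffix the target names.
analyses <- data.frame(
  target = paste(method_templates$target[grid$method], grid$dataset, sep = "_"),
  command = mapply(
    function(cmd, ds) gsub("dataset__", ds, cmd, fixed = TRUE),
    method_templates$command[grid$method],
    grid$dataset,
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)
analyses
```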
Next, we summarize each analysis of each dataset. We calculate descriptive statistics on the residuals, and we collect the regression coefficients and their p-values.
summary_types <- drake_plan(
summ = suppressWarnings(summary(analysis__$residuals)),
coef = suppressWarnings(summary(analysis__))$coefficients
)
summary_types
## target command
## 1 summ suppressWarnings(summary(analysis__$residuals))
## 2 coef suppressWarnings(summary(analysis__))$coefficients
results <- plan_summaries(summary_types, analyses = my_analyses,
datasets = my_datasets, gather = NULL)
results
## target
## 1 summ_regression1_small
## 2 summ_regression1_large
## 3 summ_regression2_small
## 4 summ_regression2_large
## 5 coef_regression1_small
## 6 coef_regression1_large
## 7 coef_regression2_small
## 8 coef_regression2_large
## command
## 1 suppressWarnings(summary(regression1_small$residuals))
## 2 suppressWarnings(summary(regression1_large$residuals))
## 3 suppressWarnings(summary(regression2_small$residuals))
## 4 suppressWarnings(summary(regression2_large$residuals))
## 5 suppressWarnings(summary(regression1_small))$coefficients
## 6 suppressWarnings(summary(regression1_large))$coefficients
## 7 suppressWarnings(summary(regression2_small))$coefficients
## 8 suppressWarnings(summary(regression2_large))$coefficients
The gather feature reduces a collection of targets to a single target. The resulting commands are long, so gathering is deactivated for the sake of readability.
For the dynamic report 'report.Rmd'/'report.md', be sure the file names are single-quoted. Single quotes denote file targets/imports, and double quotes denote literal strings that should not be treated as dependencies. To tell drake to look for the other dependencies of 'report.md', be sure the source file 'report.Rmd' exists and knit() is in the workflow plan command. That way, drake searches the active code chunks in 'report.Rmd' for any targets/imports mentioned in calls to loadd() and readd().
report <- drake_plan(
report.md = knit('report.Rmd', quiet = TRUE), # nolint
file_targets = TRUE, strings_in_dots = "filenames")
report
## target command
## 1 'report.md' knit('report.Rmd', quiet = TRUE)
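To build intuition for how targets can be discovered inside report code, here is a simplified sketch in base R. It walks the parse tree of some chunk code and collects the bare symbols passed to readd() and loadd(). The helper find_loaded() is hypothetical, and drake's real detection handles more cases (for example, character arguments), so treat this only as an illustration.

```r
# Code as it might appear in the evaluated chunks of a report.
chunk_code <- c(
  "library(drake)",
  "head(readd(small))",
  "loadd(large)",
  "head(large)",
  "readd(coef_regression2_small)"
)

# Hypothetical helper: recursively collect symbols given to readd()/loadd().
find_loaded <- function(expr){
  if (!is.call(expr)) return(character(0))
  found <- character(0)
  fn <- expr[[1]]
  if (is.name(fn) && as.character(fn) %in% c("readd", "loadd")){
    args <- as.list(expr)[-1]
    found <- unlist(lapply(args, function(a){
      if (is.name(a)) as.character(a) else character(0)
    }))
  }
  c(found, unlist(lapply(as.list(expr), find_loaded)))
}

exprs <- parse(text = chunk_code, keep.source = FALSE)
report_deps <- unique(unlist(lapply(exprs, find_loaded)))
report_deps
```

Note that head(large) contributes nothing by itself. Only the calls to readd() and loadd() mark a symbol as a dependency in this sketch.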
Finally, consolidate your workflow using rbind(). Row order does not matter.
my_plan <- rbind(report, my_datasets, my_analyses, results)
my_plan
## target
## 1 'report.md'
## 2 small
## 3 large
## 4 regression1_small
## 5 regression1_large
## 6 regression2_small
## 7 regression2_large
## 8 summ_regression1_small
## 9 summ_regression1_large
## 10 summ_regression2_small
## 11 summ_regression2_large
## 12 coef_regression1_small
## 13 coef_regression1_large
## 14 coef_regression2_small
## 15 coef_regression2_large
## command
## 1 knit('report.Rmd', quiet = TRUE)
## 2 simulate(48)
## 3 simulate(64)
## 4 reg1(small)
## 5 reg1(large)
## 6 reg2(small)
## 7 reg2(large)
## 8 suppressWarnings(summary(regression1_small$residuals))
## 9 suppressWarnings(summary(regression1_large$residuals))
## 10 suppressWarnings(summary(regression2_small$residuals))
## 11 suppressWarnings(summary(regression2_large$residuals))
## 12 suppressWarnings(summary(regression1_small))$coefficients
## 13 suppressWarnings(summary(regression1_large))$coefficients
## 14 suppressWarnings(summary(regression2_small))$coefficients
## 15 suppressWarnings(summary(regression2_large))$coefficients
If your workflow does not fit the rigid datasets/analyses/summaries framework, consider using functions expand_plan(), evaluate_plan(), and gather_plan().
df <- drake_plan(data = simulate(center = MU, scale = SIGMA))
df
## target command
## 1 data simulate(center = MU, scale = SIGMA)
df <- expand_plan(df, values = c("rep1", "rep2"))
df
## target command
## 1 data_rep1 simulate(center = MU, scale = SIGMA)
## 2 data_rep2 simulate(center = MU, scale = SIGMA)
evaluate_plan(df, wildcard = "MU", values = 1:2)
## target command
## 1 data_rep1_1 simulate(center = 1, scale = SIGMA)
## 2 data_rep1_2 simulate(center = 2, scale = SIGMA)
## 3 data_rep2_1 simulate(center = 1, scale = SIGMA)
## 4 data_rep2_2 simulate(center = 2, scale = SIGMA)
evaluate_plan(df, wildcard = "MU", values = 1:2, expand = FALSE)
## target command
## 1 data_rep1 simulate(center = 1, scale = SIGMA)
## 2 data_rep2 simulate(center = 2, scale = SIGMA)
evaluate_plan(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = FALSE)
## target command
## 1 data_rep1 simulate(center = 1, scale = 0.1)
## 2 data_rep2 simulate(center = 2, scale = 1)
evaluate_plan(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1, 10)))
## target command
## 1 data_rep1_1_0.1 simulate(center = 1, scale = 0.1)
## 2 data_rep1_1_1 simulate(center = 1, scale = 1)
## 3 data_rep1_1_10 simulate(center = 1, scale = 10)
## 4 data_rep1_2_0.1 simulate(center = 2, scale = 0.1)
## 5 data_rep1_2_1 simulate(center = 2, scale = 1)
## 6 data_rep1_2_10 simulate(center = 2, scale = 10)
## 7 data_rep2_1_0.1 simulate(center = 1, scale = 0.1)
## 8 data_rep2_1_1 simulate(center = 1, scale = 1)
## 9 data_rep2_1_10 simulate(center = 1, scale = 10)
## 10 data_rep2_2_0.1 simulate(center = 2, scale = 0.1)
## 11 data_rep2_2_1 simulate(center = 2, scale = 1)
## 12 data_rep2_2_10 simulate(center = 2, scale = 10)
gather_plan(df)
## target command
## 1 target list(data_rep1 = data_rep1, data_rep2 = data_rep2)
gather_plan(df, target = "my_summaries", gather = "rbind")
## target command
## 1 my_summaries rbind(data_rep1 = data_rep1, data_rep2 = data_rep2)
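The gathered commands above are ordinary strings assembled from target names. As a rough illustration only, the hypothetical helper gather_command() below builds the same command text in base R. It is not part of drake.

```r
# Hypothetical helper: paste target names into one gathering command.
gather_command <- function(targets, gather = "list"){
  args <- paste(targets, "=", targets, collapse = ", ")
  paste0(gather, "(", args, ")")
}

targets <- c("data_rep1", "data_rep2")
gather_command(targets)                   # Gather into a list
gather_command(targets, gather = "rbind") # Gather with rbind()
```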
You may want to check for outdated or missing targets/imports first.
config <- drake_config(my_plan, verbose = FALSE)
outdated(config) # Targets that need to be (re)built.
## [1] "'report.md'" "coef_regression1_large"
## [3] "coef_regression1_small" "coef_regression2_large"
## [5] "coef_regression2_small" "large"
## [7] "regression1_large" "regression1_small"
## [9] "regression2_large" "regression2_small"
## [11] "small" "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"
missed(config) # Checks your workspace.
## character(0)
Then just run make(my_plan).
make(my_plan)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 53 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, sample.int, ...
## check 3 items: reg1, reg2, simulate
## check 2 items: large, small
## target large
## target small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## target regression1_large
## target regression1_small
## target regression2_large
## target regression2_small
## check 8 items: coef_regression1_large, coef_regression1_small, coef_regressio...
## target coef_regression1_large
## target coef_regression1_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression1_large
## target summ_regression1_small
## target summ_regression2_large
## target summ_regression2_small
## check 1 item: 'report.md'
## unload 11 items: regression1_small, regression1_large, regression2_small, reg...
## target 'report.md'
For the reg2() model on the small dataset, the p-value on x2 is so small that there may be an association between weight and fuel efficiency after all.
readd(coef_regression2_small)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.1005134 0.9653295 28.07385 1.364319e-30
## x2 -0.6200367 0.0677864 -9.14692 6.410805e-12
The non-file dependencies of your last target are already loaded in your workspace.
ls()
## [1] "Produc" "analysis_methods"
## [3] "analysis_plan" "b"
## [5] "bad_plan" "coef_regression2_small"
## [7] "combos" "command"
## [9] "commands" "config"
## [11] "data_plan" "dataset_plan"
## [13] "datasets" "debug_plan"
## [15] "df" "envir"
## [17] "error" "f"
## [19] "files" "get_rmspe"
## [21] "good_plan" "large"
## [23] "local" "make_my_plot"
## [25] "make_my_table" "methods"
## [27] "model_plan" "my_analyses"
## [29] "my_datasets" "my_plan"
## [31] "myplan" "output_plan"
## [33] "output_types" "package_list"
## [35] "path" "plot_rmspe"
## [37] "predictors" "recent"
## [39] "reg1" "reg2"
## [41] "report" "report_file"
## [43] "report_plan" "reportfile"
## [45] "results" "rmspe"
## [47] "rmspe_plan" "rmspe_results_plan"
## [49] "rules" "simulate"
## [51] "small" "summary_types"
## [53] "targets" "tmp"
## [55] "whole_plan" "x"
outdated(config) # Everything is up to date.
## character(0)
build_times(digits = 4) # How long did it take to make each target?
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## item type elapsed user system
## 1 'report.Rmd' import 0s 0s 0s
## 2 'report.md' target 0.037s 0.036s 0s
## 3 coef_regression1_large target 0.004s 0.004s 0s
## 4 coef_regression1_small target 0.011s 0.008s 0.004s
## 5 coef_regression2_large target 0.004s 0.004s 0s
## 6 coef_regression2_small target 0.003s 0.004s 0s
## 7 data.frame import 0.005s 0.008s 0s
## 8 knit import 0.002s 0s 0s
## 9 large target 0.004s 0.004s 0s
## 10 lm import 0.001s 0.004s 0s
## 11 mtcars import 0s 0s 0s
## 12 nrow import 0.004s 0.004s 0s
## 13 reg1 import 0s 0.004s 0s
## 14 reg2 import 0.001s 0s 0s
## 15 regression1_large target 0.005s 0.008s 0s
## 16 regression1_small target 0.004s 0.004s 0s
## 17 regression2_large target 0.004s 0s 0.004s
## 18 regression2_small target 0.005s 0.004s 0s
## 19 sample.int import 0.003s 0.004s 0s
## 20 simulate import 0.001s 0s 0s
## 21 small target 0.004s 0.004s 0s
## 22 summ_regression1_large target 0.004s 0.004s 0s
## 23 summ_regression1_small target 0.003s 0.004s 0s
## 24 summ_regression2_large target 0.004s 0.004s 0s
## 25 summ_regression2_small target 0.003s 0.004s 0s
## 26 summary import 0.003s 0.004s 0s
## 27 suppressWarnings import 0.004s 0.004s 0s
See also predict_runtime() and rate_limiting_times().
In the new graph, the black nodes from before are now green.
# Hover, click, drag, zoom, and pan.
vis_drake_graph(config, width = "100%", height = "500px")
Optionally, get visNetwork nodes and edges so you can make your own plot with visNetwork() or render_drake_graph().
dataframes_graph(config)
Use readd() and loadd() to load targets into your workspace. (They are cached in the hidden .drake/ folder using storr.) There are many more functions for interacting with the cache.
readd(coef_regression2_large)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.0226962 0.76499386 39.24567 1.698338e-45
## x2 -0.8558643 0.06718172 -12.73954 5.759347e-19
loadd(small)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
head(small)
## x y
## 1 2.20 32.4
## 2 3.52 15.5
## 3 3.15 22.8
## 4 5.25 10.4
## 5 3.78 15.2
## 6 2.62 21.0
rm(small)
cached(small, large)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## small large
## TRUE TRUE
cached()
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] "'report.Rmd'" "'report.md'"
## [3] "coef_regression1_large" "coef_regression1_small"
## [5] "coef_regression2_large" "coef_regression2_small"
## [7] "data.frame" "knit"
## [9] "large" "lm"
## [11] "mtcars" "nrow"
## [13] "reg1" "reg2"
## [15] "regression1_large" "regression1_small"
## [17] "regression2_large" "regression2_small"
## [19] "sample.int" "simulate"
## [21] "small" "summ_regression1_large"
## [23] "summ_regression1_small" "summ_regression2_large"
## [25] "summ_regression2_small" "summary"
## [27] "suppressWarnings"
built()
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] "'report.md'" "coef_regression1_large"
## [3] "coef_regression1_small" "coef_regression2_large"
## [5] "coef_regression2_small" "large"
## [7] "regression1_large" "regression1_small"
## [9] "regression2_large" "regression2_small"
## [11] "small" "summ_regression1_large"
## [13] "summ_regression1_small" "summ_regression2_large"
## [15] "summ_regression2_small"
imported()
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] "'report.Rmd'" "data.frame" "knit"
## [4] "lm" "mtcars" "nrow"
## [7] "reg1" "reg2" "sample.int"
## [10] "simulate" "summary" "suppressWarnings"
head(read_drake_plan())
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## target command
## 1 'report.md' knit('report.Rmd', quiet = TRUE)
## 2 small simulate(48)
## 3 large simulate(64)
## 4 regression1_small reg1(small)
## 5 regression1_large reg1(large)
## 6 regression2_small reg2(small)
head(progress()) # See also in_progress()
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## 'report.Rmd' 'report.md' coef_regression1_large
## "finished" "finished" "finished"
## coef_regression1_small coef_regression2_large coef_regression2_small
## "finished" "finished" "finished"
progress(large)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## large
## "finished"
# drake_session() # sessionInfo() of the last make() # nolint
The next time you run make(my_plan), nothing will build because drake knows everything is already up to date.
config <- make(my_plan) # Will use config later. See also drake_config().
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Unloading targets from environment:
## large
## coef_regression2_small
## connect 53 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, sample.int, ...
## check 3 items: reg1, reg2, simulate
## check 2 items: large, small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## check 8 items: coef_regression1_large, coef_regression1_small, coef_regressio...
## check 1 item: 'report.md'
## All targets are already up to date.
But if you change one of your functions, commands, or other dependencies, drake will update the affected targets. Suppose we change the quadratic term to a cubic term in reg2(). We might want to do this if we suspect a cubic relationship between weight and miles per gallon.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}
The targets that depend on reg2() need to be rebuilt.
outdated(config)
## check 9 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, sample.int, ...
## check 3 items: reg1, reg2, simulate
## check 2 items: large, small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## check 4 items: coef_regression1_large, coef_regression1_small, summ_regressio...
## [1] "'report.md'" "coef_regression2_large"
## [3] "coef_regression2_small" "regression2_large"
## [5] "regression2_small" "summ_regression2_large"
## [7] "summ_regression2_small"
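The way a single change spreads downstream can be sketched with a small dependency list in base R. This is a toy model under stated assumptions, not drake's algorithm: the list deps is a hand-written subset of the example workflow, and downstream() is a hypothetical helper.

```r
# Each target and the things it depends on (a hand-written subset).
deps <- list(
  small = "simulate",
  large = "simulate",
  regression1_small = c("reg1", "small"),
  regression2_small = c("reg2", "small"),
  coef_regression2_small = "regression2_small",
  report = c("small", "large", "coef_regression2_small")
)

# Hypothetical helper: repeatedly mark targets that depend on
# anything changed or already marked, until nothing new appears.
downstream <- function(changed, deps){
  outdated <- character(0)
  repeat {
    dirty <- c(changed, outdated)
    hit <- names(deps)[vapply(deps, function(d) any(d %in% dirty), logical(1))]
    if (length(setdiff(hit, outdated)) == 0) break
    outdated <- union(outdated, hit)
  }
  outdated
}

downstream("reg2", deps)
```

In this toy graph, changing reg2 marks regression2_small, then coef_regression2_small, then report, while small, large, and regression1_small stay untouched.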
Advanced: To find out why a target is out of date, you can load the storr cache and compare the appropriate hash keys to the output of dependency_profile().
dependency_profile(target = "regression2_small", config = config)
## $cached_command
## [1] "{\n reg2(small) \n}"
##
## $current_command
## [1] "{\n reg2(small) \n}"
##
## $cached_file_modification_time
## NULL
##
## $cached_dependency_hash
## [1] "864b4e5947c8753982fecde3c41d2f1e4bdba1eaf0711d870496a66523fa08b0"
##
## $current_dependency_hash
## [1] "5c77977876a4292b634be67227691b8c3834501d0c5dfc56095fe01e3ada0137"
##
## $hashes_of_dependencies
## reg2 small
## "d47109544c89ca7a" "62ca42a74bf5f8e6"
config$cache$get_hash(key = "small") # same
## [1] "62ca42a74bf5f8e6"
config$cache$get_hash(key = "reg2") # different
## [1] "cd89057e24fe00ae"
# Hover, click, drag, zoom, and pan.
# Same as drake_graph():
vis_drake_graph(config, width = "100%", height = "500px")
The next make() will rebuild the targets depending on reg2() and leave everything else alone.
make(my_plan)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 53 imports: output_plan, data_plan, results, tmp, predictors, config,...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
## check 9 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, sample.int, ...
## check 3 items: reg1, reg2, simulate
## check 2 items: large, small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## check 4 items: coef_regression1_large, coef_regression1_small, summ_regressio...
## load 2 items: large, small
## target regression2_large
## target regression2_small
## check 4 items: coef_regression2_large, coef_regression2_small, summ_regressio...
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression2_large
## target summ_regression2_small
## check 1 item: 'report.md'
## unload 5 items: regression2_small, regression2_large, summ_regression2_small,...
## target 'report.md'
Trivial changes to whitespace and comments are totally ignored.
reg2 <- function(d) {
  d$x3 <- d$x ^ 3
    lm(y ~ x3, data = d) # I indented here.
}
outdated(config) # Everything is up to date.
## check 9 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, sample.int, ...
## check 3 items: reg1, reg2, simulate
## check 2 items: large, small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## check 8 items: coef_regression1_large, coef_regression1_small, coef_regressio...
## check 1 item: 'report.md'
## character(0)
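One way to picture why such edits are harmless: if a function is reduced to a standardized text form before it is compared, whitespace and comments drop out. The sketch below demonstrates that idea in base R with deparse(). It is an assumption-laden illustration rather than a description of drake's internals, and make_fn() and fingerprint() are hypothetical helpers.

```r
# Hypothetical helper: build a function from text, discarding source
# references so that only the parsed code remains.
make_fn <- function(code){
  eval(parse(text = code, keep.source = FALSE))
}

# Hypothetical helper: a standardized text form of a function.
fingerprint <- function(f){
  paste(deparse(f), collapse = "\n")
}

original <- make_fn(
  "function(d) {\n  d$x3 <- d$x ^ 3\n  lm(y ~ x3, data = d)\n}"
)
cosmetic <- make_fn(
  "function(d) {\n  d$x3 <- d$x ^ 3\n      lm(y ~ x3, data = d) # I indented here.\n}"
)
changed <- make_fn(
  "function(d) {\n  d$x4 <- d$x ^ 4\n  lm(y ~ x4, data = d)\n}"
)

identical(fingerprint(original), fingerprint(cosmetic)) # Cosmetic edit only
identical(fingerprint(original), fingerprint(changed))  # Real change
```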
Need to add new work on the fly? Just append rows to the workflow plan. If the rest of your workflow is up to date, only the new work is run.
new_simulation <- function(n){
  data.frame(x = rnorm(n), y = rnorm(n))
}
additions <- drake_plan(
new_data = new_simulation(36) + sqrt(10))
additions
## target command
## 1 new_data new_simulation(36) + sqrt(10)
my_plan <- rbind(my_plan, additions)
my_plan
## target
## 1 'report.md'
## 2 small
## 3 large
## 4 regression1_small
## 5 regression1_large
## 6 regression2_small
## 7 regression2_large
## 8 summ_regression1_small
## 9 summ_regression1_large
## 10 summ_regression2_small
## 11 summ_regression2_large
## 12 coef_regression1_small
## 13 coef_regression1_large
## 14 coef_regression2_small
## 15 coef_regression2_large
## 16 new_data
## command
## 1 knit('report.Rmd', quiet = TRUE)
## 2 simulate(48)
## 3 simulate(64)
## 4 reg1(small)
## 5 reg1(large)
## 6 reg2(small)
## 7 reg2(large)
## 8 suppressWarnings(summary(regression1_small$residuals))
## 9 suppressWarnings(summary(regression1_large$residuals))
## 10 suppressWarnings(summary(regression2_small$residuals))
## 11 suppressWarnings(summary(regression2_large$residuals))
## 12 suppressWarnings(summary(regression1_small))$coefficients
## 13 suppressWarnings(summary(regression1_large))$coefficients
## 14 suppressWarnings(summary(regression2_small))$coefficients
## 15 suppressWarnings(summary(regression2_large))$coefficients
## 16 new_simulation(36) + sqrt(10)
make(my_plan)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## Unloading targets from environment:
## small
## large
## coef_regression2_small
## connect 55 imports: additions, output_plan, data_plan, results, tmp, predicto...
## connect 16 targets: 'report.md', small, large, regression1_small, regression1...
## check 11 items: 'report.Rmd', data.frame, knit, lm, mtcars, nrow, rnorm, samp...
## check 4 items: new_simulation, reg1, reg2, simulate
## check 3 items: large, new_data, small
## check 4 items: regression1_large, regression1_small, regression2_large, regre...
## check 8 items: coef_regression1_large, coef_regression1_small, coef_regressio...
## check 1 item: 'report.md'
## target new_data
If you ever need to erase your work, use clean(). The next make() will rebuild any cleaned targets, so be careful. You may notice that by default, the size of the cache does not go down very much. To purge old data, you could use clean(garbage_collection = TRUE, purge = TRUE). To do garbage collection without removing any important targets, use drake_gc().
# Uncaches individual targets and imported objects.
clean(small, reg1, verbose = FALSE)
clean(verbose = FALSE) # Cleans all targets out of the cache.
drake_gc(verbose = FALSE) # Just garbage collection.
clean(destroy = TRUE, verbose = FALSE) # removes the cache entirely
As you have seen with reg2(), drake reacts to changes in dependencies. In other words, make() notices when your dependencies are different from last time, rebuilds any affected targets, and continues downstream. In particular, drake watches for nontrivial changes to the following.
For knitr reports: the targets and imports mentioned in calls to readd() and loadd() in the code chunks to be evaluated. Drake treats these targets and imports as dependencies of the compiled output target (say, 'report.md' or report.html). To activate this feature, the command in your workflow plan data frame must call knitr::knit() or rmarkdown::render(). Examples of acceptable commands:
knit('report.Rmd')
knitr::knit(input = 'report.Rmd', quiet = TRUE)
render('report.Rmd')
rmarkdown::render('report.Rmd', output_file = "report.html")
To enhance reproducibility beyond the scope of drake, you might consider packrat and Docker. Packrat creates a tightly-controlled local library of packages to extend the shelf life of your project. And with Docker, you can execute your project inside a container to reduce platform dependence. Together, packrat and Docker can help others reproduce your work even if they have different software and hardware.
Drake has extensive high-performance computing support, from local multicore processing to serious distributed computing across multiple nodes of a cluster. See the parallelism vignette for detailed instructions.