% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DataBackendDplyr.R
\name{DataBackendDplyr}
\alias{DataBackendDplyr}
\title{DataBackend for dplyr/dbplyr}
\format{\link[R6:R6Class]{R6::R6Class} object inheriting from \link[mlr3:DataBackend]{mlr3::DataBackend}.}
\description{
A \link[mlr3:DataBackend]{mlr3::DataBackend} using \code{\link[dplyr:tbl]{dplyr::tbl()}} from packages \CRANpkg{dplyr}/\CRANpkg{dbplyr}.
This includes \code{\link[tibble:tibble]{tibbles}} and abstract database connections interfaced by \CRANpkg{dbplyr}.
The latter allows \link[mlr3:Task]{mlr3::Task}s to operate on an out-of-memory database.
}
\section{Construction}{
\preformatted{DataBackendDplyr$new(data, primary_key = NULL, strings_as_factors = TRUE, connector = NULL)
}
\itemize{
\item \code{data} :: \code{\link[dplyr:tbl]{dplyr::tbl()}}\cr
The data object.
\item \code{primary_key} :: \code{character(1)}\cr
Name of the primary key column.
\item \code{strings_as_factors} :: \code{logical(1)} || \code{character()}\cr
Either a character vector of column names to convert to factors, or a single logical flag:
if \code{FALSE}, no column is converted; if \code{TRUE}, all string columns (except the primary key) are converted.
The backend is queried for the distinct values of these columns, which are stored as factor levels in \verb{$levels}.
\item \code{connector} :: \verb{function()}\cr
If not \code{NULL}, a function which reconnects to the database in case the connection has become invalid.
Database connections can become invalid due to timeouts or if the backend is serialized to the file system and then de-serialized again.
This round trip is often performed for parallelization, e.g. to send the objects to remote workers.
\code{\link[DBI:dbIsValid]{DBI::dbIsValid()}} is called to validate the connection.
The function must return just the connection, not a \code{\link[dplyr:tbl]{dplyr::tbl()}} object!

Note that this function is serialized together with the backend, including possibly sensitive information such as login credentials.
These can be retrieved from the stored \link[mlr3:DataBackend]{mlr3::DataBackend}/\link[mlr3:Task]{mlr3::Task}.
To protect your credentials, it is recommended to use the \CRANpkg{secret} package.
}
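
The following sketch shows how a \code{connector} might be combined with \code{strings_as_factors}.
It assumes that \CRANpkg{mlr3db} (the package providing this class), \CRANpkg{RSQLite} and \CRANpkg{dbplyr} are installed; the file path and the function name \code{connector} are illustrative choices.
The connection is closed manually to simulate a timeout, after which querying the backend is expected to trigger a reconnect via \code{connector}.
\preformatted{library(mlr3db)

# Copy iris to an SQLite file on disk, so that a new connection sees the table
path = tempfile(fileext = ".sqlite")
con = DBI::dbConnect(RSQLite::SQLite(), path)
data = iris
data$row_id = seq_len(nrow(data))
dplyr::copy_to(con, data, name = "iris", temporary = FALSE)

# Illustrative connector (hypothetical name): returns a new connection, not a tbl
connector = function() DBI::dbConnect(RSQLite::SQLite(), path)

b = DataBackendDplyr$new(dplyr::tbl(con, "iris"), primary_key = "row_id",
  strings_as_factors = "Species", connector = connector)

# Simulate an invalidated connection, e.g. after a timeout
DBI::dbDisconnect(con)
b$valid

# Querying the backend is expected to call connector() and reconnect
h = b$head(2L)
h

# Cleanup: disconnect the new connection and delete the file
b$finalize()
unlink(path)
}
A file-based database with \code{temporary = FALSE} is used here because every connection to \code{":memory:"} opens a separate, empty SQLite database, so a reconnect would not find the copied table.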

Alternatively, use \code{\link[mlr3:as_data_backend]{mlr3::as_data_backend()}} on a \code{\link[dplyr:tbl]{dplyr::tbl()}} to construct a \link{DataBackend} for you.
Note that only objects of class \code{"tbl_lazy"} will be converted to a \link{DataBackendDplyr} (this includes all connectors from \CRANpkg{dbplyr}).
Local \code{"tbl"} objects such as \code{\link[tibble:tibble]{tibbles}} will be converted to a \link[mlr3:DataBackendDataTable]{DataBackendDataTable}.
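
A minimal sketch of these conversion rules, assuming \CRANpkg{mlr3db}, \CRANpkg{RSQLite} and \CRANpkg{dbplyr} are installed and that \code{primary_key} is forwarded to the respective constructor:
\preformatted{library(mlr3db)

con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
data = iris
data$row_id = seq_len(nrow(data))

# copy_to() returns a lazy SQL table, i.e. an object inheriting from "tbl_lazy"
tbl = dplyr::copy_to(con, data, name = "iris")
b1 = mlr3::as_data_backend(tbl, primary_key = "row_id")
class(b1)

# A local tibble is converted to a DataBackendDataTable instead
b2 = mlr3::as_data_backend(tibble::as_tibble(data), primary_key = "row_id")
class(b2)

DBI::dbDisconnect(con)
}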
}

\section{Fields}{

All fields from \link[mlr3:DataBackend]{mlr3::DataBackend}, and additionally:
\itemize{
\item \code{levels} :: named \code{list()}\cr
List of factor levels, named with column names.
Referenced columns get automatically converted to factors in \verb{$data()} and \verb{$head()}.
\item \code{connector} :: \verb{function()}\cr
Function which is called to reconnect in case the connection has become invalid.
\item \code{valid} :: \code{logical(1)}\cr
Returns \code{NA} if the data does not inherit from \code{"tbl_sql"} (i.e., it is not an SQL database).
Returns the result of \code{\link[DBI:dbIsValid]{DBI::dbIsValid()}} otherwise.
}
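
A short sketch of the fields \verb{$levels} and \verb{$valid} on a local tibble, assuming \CRANpkg{mlr3db} is installed.
\code{Species} is converted to character first so that it clearly qualifies as a string column for \code{strings_as_factors}:
\preformatted{library(mlr3db)

data = tibble::as_tibble(iris)
data$Species = as.character(data$Species)
data$row_id = seq_len(nrow(data))
b = DataBackendDplyr$new(data, primary_key = "row_id", strings_as_factors = "Species")

# Levels are stored in a list named by column ...
b$levels

# ... and used to return Species as a factor
class(b$head(3L)$Species)

# A local tibble does not inherit from "tbl_sql", so this is NA
b$valid
}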
}

\section{Methods}{

All methods from \link[mlr3:DataBackend]{mlr3::DataBackend}, and additionally:
\itemize{
\item \code{finalize()}\cr
() -> \code{logical(1)}\cr
Finalizer which disconnects from the database.
It is called automatically during garbage collection, but may also be called manually.
}
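
A sketch of calling the finalizer explicitly, assuming \CRANpkg{mlr3db}, \CRANpkg{RSQLite} and \CRANpkg{dbplyr} are installed.
The expected values of \verb{$valid} in the comments follow from the description of that field above:
\preformatted{library(mlr3db)

con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
data = iris
data$row_id = seq_len(nrow(data))
tbl = dplyr::copy_to(con, data, name = "iris")
b = DataBackendDplyr$new(tbl, primary_key = "row_id")

b$valid       # expected TRUE while the connection is open
b$finalize()  # disconnect now instead of waiting for garbage collection
b$valid       # expected FALSE once disconnected
}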
}

\examples{
# Backend using an in-memory tibble
data = tibble::as_tibble(iris)
data$Sepal.Length[1:30] = NA
data$row_id = 1:150
b = DataBackendDplyr$new(data, primary_key = "row_id")

# Object supports all accessors of DataBackend
print(b)
b$nrow
b$ncol
b$colnames
b$data(rows = 100:101, cols = "Species")
b$distinct(b$rownames, "Species")

# Classification task using this backend
task = mlr3::TaskClassif$new(id = "iris_tibble", backend = b, target = "Species")
print(task)
task$head()

# Create a temporary SQLite data base
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(con, data)
tbl = dplyr::tbl(con, "data")

# Define a backend on a subset of the data base
tbl = dplyr::select_at(tbl, setdiff(colnames(tbl), "Sepal.Width")) # do not use column "Sepal.Width"
tbl = dplyr::filter(tbl, row_id \%in\% 1:120) # Use only first 120 rows
b = DataBackendDplyr$new(tbl, primary_key = "row_id")
print(b)

# Query distinct values
b$distinct(b$rownames, "Species")

# Query number of missing values
b$missings(b$rownames, b$colnames)

# SQLite does not support factors, so Species is stored as a character column;
# with strings_as_factors = TRUE (the default), $head() converts it back to a factor
lapply(b$head(), class)

# Cleanup
rm(tbl)
DBI::dbDisconnect(con)
}
