Help for package plmmr

Title:

Penalized Linear Mixed Models for Correlated Data

Version:

4.3.0

Description:

Fits penalized linear mixed models that correct for unobserved confounding factors. 'plmmr' infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in a genome-wide association study (GWAS), 'plmmr' eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for the appropriate processing of 'PLINK' files are also supplied. For examples, see the package homepage https://pbreheny.github.io/plmmr/.

License:

GPL-3

URL:

https://pbreheny.github.io/plmmr/, https://github.com/pbreheny/plmmr/

BugReports:

https://github.com/pbreheny/plmmr/issues/

Depends:

bigalgebra, bigmemory, R (≥ 4.4.0)

Imports:

biglasso (≥ 1.6.0), data.table, glmnet, Matrix, ncvreg, parallel, utils

Suggests:

bigsnpr, bigstatsr, graphics, grDevices, knitr, MASS, rmarkdown, R.utils, tinytest, withr

LinkingTo:

BH, bigmemory, Rcpp, RcppArmadillo (≥ 0.8.600)

LazyData:

true

VignetteBuilder:

knitr

Encoding:

UTF-8

Config/roxygen2/version:

8.0.0

NeedsCompilation:

yes

Packaged:

2026-05-19 13:27:29 UTC; pbreheny

Author:

Tabitha K. Peter [aut], Anna C. Reisetter [aut], Yujing Lu [aut], Oscar A. Rysavy [aut], Patrick J. Breheny [aut, cre]

Maintainer:

Patrick J. Breheny <patrick-breheny@uiowa.edu>

Repository:

CRAN

Date/Publication:

2026-06-11 07:20:02 UTC

plmmr: Penalized Linear Mixed Models for Correlated Data

Description

Author(s)

Maintainer: Patrick J. Breheny patrick-breheny@uiowa.edu (ORCID)

Authors:

Patrick J. Breheny patrick-breheny@uiowa.edu (ORCID)
Tabitha K. Peter tabitha.peter15@gmail.com (ORCID)
Anna C. Reisetter anna-reisetter@uiowa.edu (ORCID)
Yujing Lu
Oscar A. Rysavy oscar-rysavy@uiowa.edu (ORCID)

A helper function to add predictors to a filebacked matrix of data

Description

A helper function to add predictors to a filebacked matrix of data

Usage

add_predictors(obj, add_predictor, id_var, rds_dir, outfile, quiet)

Arguments

obj

A bigSNP object

add_predictor

Optional: add additional covariates/predictors/features from an external file (i.e., not a PLINK file).

id_var

String specifying which column of the PLINK .fam file has the unique sample identifiers.

rds_dir

The path to the directory in which you want to create the new .rds and .bk files. Defaults to data_dir (from the process_plink() call).

outfile

A string with the name of the filepath for the log file

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

A list of 2 components:

design_matrix - a bigSNP object with an added element representing the matrix that includes the additional predictors as the first few columns
unpen - an integer vector that ranges from 1 to the number of added predictors. Example: if 2 predictors are added, unpen = 1:2

Admix: Semi-simulated SNP data

Description

A dataset containing 100 SNPs, a demographic variable representing ancestry, and a simulated outcome.

Usage

admix

Format

A list with 3 components:

X: SNP matrix (197 observations of 100 SNPs)
y: 197 x 1 matrix of simulated (continuous) outcomes
ancestry: vector with ancestry categorization: 0 = African, 1 = African American, 2 = European, 3 = Japanese

Source

https://hastie.su.domains/CASI/

A helper function to support `create_design_filebacked()`

Description

A helper function to support create_design_filebacked()

Usage

align_ids(id_var, add_predictor, og_ids, outfile, quiet)

Arguments

id_var

String specifying the variable name of the ID column

add_predictor

External data to include in design matrix. This is the add_predictor arg in create_design_filebacked()

og_ids

Character vector with the PLINK ids (FID or IID) from the original data (i.e., the data before any subsetting from handling missing phenotypes)

outfile

A string with the name of the filepath for the log file

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

A matrix with the same dimensions as add_predictor

A version of `cbind()` for file-backed matrices

Description

A version of cbind() for file-backed matrices

Usage

big_cbind(A, B, C, quiet)

Arguments

A

in-memory data

B

file-backed data

C

file-backed placeholder for combined data

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

C, filled in with all column values of A and B combined
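The fill-in pattern described here (a pre-allocated C that receives the columns of A and then those of B) can be sketched with ordinary in-memory matrices. This is an illustration only: it uses base R matrices as stand-ins for file-backed objects and is not the package's implementation.

```r
# In-memory stand-ins; in big_cbind(), B and C would be file-backed.
A <- matrix(1:6, nrow = 3)    # 3 x 2 block to place first
B <- matrix(7:15, nrow = 3)   # 3 x 3 block to place second

# Pre-allocate the placeholder with room for every column of A and B.
C <- matrix(NA_real_, nrow = nrow(A), ncol = ncol(A) + ncol(B))

# Fill C block by block instead of building a new object with cbind().
C[, seq_len(ncol(A))] <- A
C[, ncol(A) + seq_len(ncol(B))] <- B
```

Filling a pre-allocated target avoids creating a second full copy of the combined data, which is the point of doing this with file-backed storage.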

Coef method for `cv_plmm` class

Description

Coef method for cv_plmm class

Usage

## S3 method for class 'cv_plmm'
coef(object, lambda, which = object$min, ...)

Arguments

object

An object of class cv_plmm.

lambda

A numeric vector of lambda values.

which

Vector of lambda indices for which to return coefficients. Defaults to lambda index with minimum CVE.

...

Additional arguments (not used).

Value

Returns a named numeric vector. Values are the coefficients of the model at the specified value(s) of either lambda or which. Names are the values of lambda.

Examples

cv_fit <- cv_plmm(admix$X, admix$y, return_fit = TRUE)
head(coef(cv_fit))

Coef method for `plmm` class

Description

Coef method for plmm class

Usage

## S3 method for class 'plmm'
coef(object, lambda, which = seq_along(object$lambda), drop = TRUE, ...)

Arguments

object

An object of class plmm.

lambda

A numeric vector of lambda values.

which

Vector of lambda indices for which to return coefficients.

drop

Logical. Should returned object be coerced to a vector if possible?

...

Additional arguments.

Value

Either a numeric matrix (if the model was fit on data stored in memory) or a sparse matrix (if the model was fit on filebacked data). Row names are feature names; columns correspond to values of lambda.

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
fit <- plmm(design = admix_design)
coef(fit)[1:10, 41:45]

A function to compute the BLUP

Description

A function to compute the BLUP

Usage

compute_blup(fit, Xb, Sigma_21, idx)

Arguments

fit

An object returned by plmm()

Xb

Linear predictor

Sigma_21

Covariance matrix between the training and the testing data. Extracted from estimated_Sigma that is generated using all observations

idx

Vector of indices of the penalty parameter lambda at which predictions are required. By default, all indices are returned.

Value

A matrix of the linear predictors + the estimated random effects
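As a rough illustration of "linear predictor + estimated random effect", the sketch below computes a textbook BLUP in base R. Everything in it is an assumption made for illustration: the simulated inputs, the covariance structure eta * K + (1 - eta) * I, and the names Sigma_11, y_train, Xb_train, and Xb_test are not taken from the package internals.

```r
set.seed(1)
n <- 8
train <- 1:6
test <- 7:8

# Hypothetical inputs: a similarity matrix K and a variance parameter eta.
Z <- matrix(rnorm(n * 20), n, 20)
K <- tcrossprod(Z) / 20
eta <- 0.4
Sigma <- eta * K + (1 - eta) * diag(n)   # assumed covariance structure

Sigma_11 <- Sigma[train, train]          # training-by-training block
Sigma_21 <- Sigma[test, train]           # testing-by-training block

# Hypothetical outcome and linear predictors (stand-ins for X %*% beta).
y_train <- rnorm(length(train))
Xb_train <- rep(0.1, length(train))
Xb_test <- rep(0.1, length(test))

# BLUP: linear predictor plus the predicted random effect for the test rows.
ranef_test <- Sigma_21 %*% solve(Sigma_11, y_train - Xb_train)
blup <- Xb_test + drop(ranef_test)
```

When Sigma_21 is all zeros (test observations uncorrelated with training observations), the random-effect term vanishes and the prediction reduces to the linear predictor.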

A function to construct the estimated variance matrix from a PLMM fit

Description

A function to construct the estimated variance matrix from a PLMM fit

Usage

construct_variance(fit, K = NULL, eta = NULL)

Arguments

fit

An object returned by plmm()

K

An optional matrix

eta

An optional numeric value between 0 and 1; if fit is not supplied, then this option must be specified.

Value

Sigma_hat, a matrix representing the estimated variance
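A minimal sketch of how such a variance matrix can be assembled from K and eta, assuming the common mixed-model form eta * K + (1 - eta) * I. That form, the simulated X, and the chosen eta are assumptions for illustration; the function's actual computation may differ.

```r
set.seed(2)
n <- 5
X <- matrix(rnorm(n * 30), n, 30)

# Similarity matrix in the default form described for K: (1/p) X X^T.
K <- tcrossprod(scale(X)) / ncol(X)

# Hypothetical value of eta, between 0 and 1.
eta <- 0.25

# Assumed form of the estimated variance: a blend of K and the identity.
Sigma_hat <- eta * K + (1 - eta) * diag(n)
```

Under this form, eta = 0 gives the identity (independent observations), and larger eta puts more weight on the correlation structure encoded in K.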

A helper function to count constant features

Description

A helper function to count constant features

Usage

count_constant_features(fbm, outfile, quiet)

Arguments

fbm

A filebacked big.matrix

outfile

String specifying name of log file

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

A numeric vector with the indices of the non-singular columns of the matrix associated with fbm
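For an in-memory matrix, the same idea (find the columns whose variance is not zero) can be sketched in base R. This is an illustration with a made-up matrix, not the filebacked implementation.

```r
# Toy matrix: column "b" is constant and carries no information.
X <- cbind(a = c(1, 2, 3),
           b = c(5, 5, 5),
           c = c(0, 1, 0))

# Indices of the non-constant (nonsingular) columns.
ns <- which(apply(X, 2, var) > 0)

# Number of constant features that would be dropped.
n_constant <- ncol(X) - length(ns)
```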

A helper function to count the number of cores available on the current machine

Description

A helper function to count the number of cores available on the current machine

Usage

count_cores()

Value

The number of cores to use. If parallel is installed, this is parallel::detectCores(); otherwise, 1 is returned.

A function to create a design for PLMM modeling

Description

A function to create a design for PLMM modeling

Usage

create_design(data_file = NULL, rds_dir = NULL, X = NULL, y = NULL, ...)

Arguments

data_file

For filebacked data (data from process_plink() or process_delim()), this is the filepath to the processed data. Defaults to NULL (this argument does not apply for in-memory data).

rds_dir

For filebacked data, this is the filepath to the directory/folder where you want the design to be saved. Note: do not include/append the name you want for the to-be-created file – the name is the argument new_file, passed to create_design_filebacked(). Defaults to NULL (this argument does not apply for in-memory data).

X

For in-memory data (data in a matrix or data frame), this is the design matrix. Defaults to NULL (this argument does not apply for filebacked data).

y

For in-memory data, this is the numeric vector representing the outcome. Defaults to NULL (this argument does not apply for filebacked data). Note: it is the responsibility of the user to ensure that the rows in X and the corresponding elements of y have the same row order, i.e., observations must be in the same order in both the design matrix and in the outcome vector.

...

Additional arguments to pass to create_design_filebacked() or create_design_in_memory(). See the documentation for those helper functions for details.

Details

This function is a wrapper for the other create_design...() inner functions; all arguments included here are passed along to the create_design...() inner function that matches the type of the data being supplied. Note which arguments are optional and which ones are not.

Additional arguments for all filebacked data:

new_file User-specified filename (without .bk/.rds extension) for the to-be-created .rds/.bk files. Must be different from any existing .rds/.bk files in the same folder.
feature_id Optional: A string specifying the column in the data X (the feature data) with the row IDs (e.g., identifiers for each row/sample/participant, etc.). No duplicates allowed.
  - for PLINK data: a string specifying an ID column of the PLINK .fam file. Options are "IID" (default) and "FID"
  - for all other filebacked data: a character vector of unique identifiers (IDs) for each row of the feature data (i.e., the data processed with process_delim())
  - if left NULL (default), X is assumed to have the same row-order as add_outcome. Note: if this assumption is made in error, calculations downstream will be incorrect. Pay close attention here.
add_outcome A data frame or matrix with two columns: an ID column and a column with the outcome value (to be used as 'y' in the final design). IDs must be characters, outcome must be numeric.
outcome_id A string specifying the name of the ID column in add_outcome
outcome_col A string specifying the name of the phenotype column in add_outcome
na_outcome_vals Optional: a vector of numeric values used to code NA values in the outcome. Defaults to c(-9, NA_integer_) (the -9 matches PLINK conventions).
overwrite Optional: logical - should existing .rds files be overwritten? Defaults to FALSE.
logfile Optional: the name (character string) of the prefix of the logfile to be written in rds_dir. Defaults to NULL (no log file written). Note: do not append a .log to the filename; this is done automatically.
quiet Optional: logical - should console messages be silenced? Defaults to FALSE

Additional arguments specific to PLINK data:

add_predictor Optional (for PLINK data only): a matrix or data frame to be used for adding additional unpenalized covariates/predictors/features from an external file (i.e., not a PLINK file). This matrix must have one column that is an ID column; all other columns aside from the ID column will be used as covariates in the design matrix. Columns must be named.
predictor_id Optional (for PLINK data only): A string specifying the name of the column in add_predictor with sample IDs. Required if add_predictor is supplied. The names will be used to subset and align this external covariate(s) with the supplied PLINK data.

Additional arguments specific to delimited file data:

unpen Optional: a character vector with the names of columns to mark as unpenalized (i.e., these features would always be included in a model). Note: if you choose to use this option, your delimited file must have column names.

Additional arguments for in-memory data:

unpen Optional: a character vector with the names of columns to mark as unpenalized (i.e., these features would always be included in a model). Note: if you choose to use this option, X must have column names.

Value

An object of class plmm_design, which is a named list with the design matrix, outcome, penalty factor vector, and other details needed for fitting a model. For filebacked data, this list is stored as an .rds file and a string with the path to that file is returned. For in-memory data, the list itself is returned.

Examples


## Example 1: matrix data in-memory ##
admix_design <- create_design(X = admix$X, y = admix$y, unpen = "Snp1")

## Example 2: delimited data ##
# process delimited data
temp_dir <- tempdir()
colon_dat <- process_delim(data_file = "colon2.txt",
 data_dir = find_example_data(parent = TRUE), overwrite = TRUE,
 rds_dir = temp_dir, rds_prefix = "processed_colon2", sep = "\t", header = TRUE)

# prepare outcome data
colon_outcome <- read.delim(find_example_data(path = "colon2_outcome.txt"))

# create a design
colon_design <- create_design(data_file = colon_dat, rds_dir = temp_dir, new_file = "std_colon2",
add_outcome = colon_outcome, outcome_id = "ID", outcome_col = "y", unpen = "sex",
overwrite = TRUE, logfile = "test.log")

# look at the results
colon_rds <- readRDS(colon_design)
str(colon_rds)

## Example 3: PLINK data ##

# process PLINK data
temp_dir <- tempdir()
unzip_example_data(outdir = temp_dir)

plink_data <- process_plink(data_dir = temp_dir,
  data_prefix = "penncath_lite",
  rds_dir = temp_dir,
  rds_prefix = "imputed_penncath_lite",
  # imputing the mode to address missing values
  impute_method = "mode",
  # overwrite existing files in temp_dir
  # (you can turn this feature off if you need to)
  overwrite = TRUE,
  # turning off parallelization - leaving this on causes problems knitting this vignette
  parallel = FALSE)

# get outcome data
penncath_pheno <- read.csv(find_example_data(path = 'penncath_clinical.csv'))

outcome <- data.frame(FamID = as.character(penncath_pheno$FamID),
                  CAD = penncath_pheno$CAD)

unpen_predictors <- data.frame(FamID = as.character(penncath_pheno$FamID),
                               sex = penncath_pheno$sex,
                               age = penncath_pheno$age)


# create design where sex and age are always included in the model
pen_design <- create_design(data_file = plink_data,
  feature_id = "FID",
  rds_dir = temp_dir,
  new_file = "std_penncath_lite",
  add_outcome = outcome,
  outcome_id = "FamID",
  outcome_col = "CAD",
  add_predictor = unpen_predictors,
  predictor_id = "FamID",
  logfile = "design",
  # again, overwrite if needed; use with caution
  overwrite = TRUE)

# examine the design - notice the components of this object
pen_design_rds <- readRDS(pen_design)

A function to create a design matrix, outcome, and penalty factor to be passed to a model fitting function

Description

A function to create a design matrix, outcome, and penalty factor to be passed to a model fitting function

Usage

create_design_filebacked(
  obj,
  rds_dir,
  new_file,
  add_outcome,
  outcome_id,
  outcome_col,
  na_outcome_vals = c(-9, NA_integer_),
  feature_id = NULL,
  add_predictor = NULL,
  predictor_id = NULL,
  unpen = NULL,
  logfile = NULL,
  overwrite = FALSE,
  quiet = FALSE
)

Arguments

obj

The RDS object read in by create_design()

rds_dir

The path to the directory in which you want to create the new .rds and .bk files.

new_file

User-specified filename (without .bk/.rds extension) for the to-be-created .rds/.bk files. Must be different from any existing .rds/.bk files in the same folder.

add_outcome

A data frame or matrix with two columns: an ID column and a column with the outcome value (to be used as 'y' in the final design). IDs must be characters, outcome must be numeric.

outcome_id

A string specifying the name of the ID column in add_outcome

outcome_col

A string specifying the name of the phenotype column in add_outcome

na_outcome_vals

A vector of numeric values used to code NA values in the outcome. Defaults to c(-9, NA_integer_) (the -9 matches PLINK conventions).

feature_id

A string specifying the column in the data X (the feature data) with the row IDs (e.g., identifiers for each row/sample/participant, etc.). No duplicates allowed.

for PLINK data: a string specifying an ID column of the PLINK .fam file. Options are "IID" (default) and "FID"
for all other filebacked data: a character vector of unique identifiers (IDs) for each row of the feature data (i.e., the data processed with process_delim())
if left NULL (default), X is assumed to have the same row-order as add_outcome. Note: if this assumption is made in error, calculations downstream will be incorrect. Pay close attention here.

add_predictor

Optional (for PLINK data only): a matrix or data frame to be used for adding additional unpenalized covariates/predictors/features from an external file (i.e., not a PLINK file). This matrix must have one column that is an ID column; all other columns aside from the ID column will be used as covariates in the design matrix. Columns must be named.

predictor_id

Optional (for PLINK data only): A string specifying the name of the column in add_predictor with sample IDs. Required if add_predictor is supplied. The names will be used to subset and align this external covariate with the supplied PLINK data.

unpen

Optional (for delimited file data only): a character vector with the names of columns to mark as unpenalized (i.e., these features would always be included in a model). Note: if you choose to use this option, X must have column names.

logfile

Optional: name of the .log file to be written – Note: do not append a .log to the filename; this is done automatically.

overwrite

Logical: should existing .rds files be overwritten? Defaults to FALSE.

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

A filepath to the created .rds file containing all the information for model fitting, including a standardized X and model design information

A function to create a design with an in-memory X matrix

Description

A function to create a design with an in-memory X matrix

Usage

create_design_in_memory(X, y, unpen = NULL)

Arguments

X

A numeric matrix in which rows correspond to observations (e.g., samples) and columns correspond to features.

y

A numeric vector representing the outcome for the model. Note: it is the responsibility of the user to ensure that y and X have the same row order!

unpen

An optional character vector with the names of columns to mark as unpenalized (i.e., these features would always be included in a model). Note: if you choose to use this option, X must have column names.

Value

A named list containing the standardized design matrix, outcome, penalty factor vector, and other details needed for fitting a model
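To make "standardized design matrix" and "penalty factor vector" concrete, here is a base R sketch under stated assumptions: scale() (which uses the sample standard deviation) stands in for whatever standardization the package applies, and the 0/1 coding of the penalty factor follows the usual convention in which 0 marks an unpenalized column. The toy matrix and the names std_X and penalty_factor are illustrative.

```r
# Toy design matrix with named columns (names are required to use unpen).
X <- cbind(sex  = c(0, 1, 1, 0),
           snp1 = c(0, 1, 2, 1),
           snp2 = c(2, 2, 1, 0))
unpen <- "sex"

# Center and scale each column (assumed form of standardization).
std_X <- scale(X)

# 0 = unpenalized (always kept in the model), 1 = penalized.
penalty_factor <- as.numeric(!(colnames(X) %in% unpen))
```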

Create the `.log` file

Description

Create the .log file

Usage

create_log(outfile)

Arguments

outfile

String specifying the name of the to-be-created file, without extension

Value

Nothing is returned; instead, a text file with the suffix .log is created. If outfile is NULL, the path to the null device is returned.

Cross-validation for plmm

Description

Performs k-fold cross validation for lasso-, MCP-, or SCAD-penalized linear mixed models over a grid of values for the regularization parameter lambda.

Usage

cv_plmm(
  design,
  y = NULL,
  K = NULL,
  eta = NULL,
  penalty = "lasso",
  type = NULL,
  gamma,
  alpha = 1,
  lambda_min,
  nlambda = 100,
  lambda,
  eps = 1e-04,
  max_iter = 10000,
  warn = TRUE,
  init = NULL,
  cluster,
  nfolds = 5,
  fold = NULL,
  seed,
  trace = FALSE,
  save_rds = NULL,
  return_fit = TRUE,
  ...
)

Arguments

design

The first argument must be one of three things: (1) a plmm_design object (as created by create_design()), (2) a string with the file path to a design object (the file path must end in .rds), or (3) a matrix or data.frame object representing the design matrix of interest.

y

Optional: In the case where design is a matrix or data.frame, the user must also supply a numeric outcome vector as the y argument. In this case, design and y will be passed internally to create_design(X = design, y = y).

K

Similarity matrix used to rotate the data. This should either be (1) a known matrix that reflects the covariance of y, (2) an estimate (Default is \frac{1}{p}(XX^T)), or (3) a list with components s and U, as returned by a previous plmm() model fit on the same data.
Note: If a user provides their own K matrix, it is decomposed as provided and will not be scaled. Providing K will change the default of type to 'lp' as a safeguard against potential data leakage. This can be overridden by specifying type = 'blup', but should be done with caution. Cross-validation with a user-provided K is not currently implemented for filebacked data.

eta

Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1.

penalty

The penalty to be applied to the model. Either "lasso" (the default), "SCAD", or "MCP".

type

A character argument indicating what should be returned from predict.plmm(). If type = 'lp', predictions are based on the linear predictor, X beta. If type = 'blup', predictions are based on the sum of the linear predictor and the estimated random effect (BLUP). Defaults to 'blup', as this has been shown to be a superior prediction method in many applications.

gamma

The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha = 1 is equivalent to MCP/SCAD penalty, while alpha = 0 would be equivalent to ridge regression. However, alpha = 0 is not supported; alpha may be arbitrarily small, but not exactly 0.

lambda_min

The smallest value for lambda, as a fraction of lambda.max. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.

nlambda

Length of the sequence of lambda. Default is 100.

lambda

A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale.

eps

Convergence threshold. The algorithm iterates until the RMSE for the change in linear predictors for each coefficient is less than eps. Default is 1e-4.

max_iter

Maximum number of iterations (total across entire path). Default is 10000.

warn

Return warning messages for failures to converge and model saturation? Default is TRUE.

init

Initial values for coefficients. Default is 0 for all columns of X.

cluster

Option for in-memory data only: cv_plmm() can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using parallel::makeCluster(). The cluster must then be passed to cv_plmm(). Note: this option is not yet implemented for filebacked data.

nfolds

The number of cross-validation folds. Default is 5.

fold

Which fold each observation belongs to. By default, the observations are randomly assigned.

seed

You may set the seed of the random number generator in order to obtain reproducible results.

trace

If set to TRUE, inform the user of progress by announcing the beginning of each CV fold. Default is FALSE.

save_rds

Optional: if a filepath and name without the .rds suffix is specified (e.g., save_rds = "~/dir/my_results"), then the model results are saved to the provided location (e.g., "~/dir/my_results.rds"). Defaults to NULL, which does not save the result. Note: Along with the model results, two .rds files ('loss' and 'yhat') will be created in the same directory as save_rds. These files contain the loss and predicted outcome values in each fold; both files are updated after prediction within each fold.

return_fit

Optional: a logical value indicating whether the fitted model should be returned as a plmm object in the current (assumed interactive) session. Defaults to TRUE.

...

Additional arguments to plmm_fit

Value

A list that includes 14 items:

type: The type of prediction used ('lp' or 'blup')
cve: A numeric vector with the cross validation error (CVE) at each value of lambda
cvse: A numeric vector with the estimated standard error associated with each value of cve
fold: An integer vector of length n indicating the fold to which each observation was assigned
lambda: A numeric vector of lambda values
fit: The overall fit of the object, including all predictors; this is a list as returned by plmm()
min: The index corresponding to the value of lambda that minimizes cve
lambda_min: The lambda value at which cve is minimized
min1se: The index corresponding to the value of lambda within 1 standard error of that which minimizes cve
lambda1se: The largest value of lambda such that cve is within 1 standard error of the minimum
null.dev: A numeric value representing the deviance for the intercept-only model. If you have supplied your own lambda sequence, this quantity may not be meaningful.
Y: A matrix with the predicted outcome (\hat{y}) values at each value of lambda. Rows are observations, columns are values of lambda.
loss: A matrix with the loss values at each value of lambda. Rows are observations, columns are values of lambda.
estimated_Sigma: If type = 'blup', an n x n matrix representing the estimated covariance matrix.
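The relationship between cve, cvse, min, lambda_min, min1se, and lambda1se can be sketched with made-up numbers. The values below are invented for illustration, and the rule is coded directly from the descriptions in the list above; it is not the package's source.

```r
# Hypothetical results for five lambda values (decreasing, as on a typical path).
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cve    <- c(2.0, 1.5, 1.1, 1.0, 1.05)
cvse   <- c(0.2, 0.2, 0.15, 0.15, 0.15)

# Index and value of lambda minimizing the cross-validation error.
min_idx <- which.min(cve)
lambda_min <- lambda[min_idx]

# Largest lambda whose CVE is within one standard error of the minimum.
within_1se <- cve <= cve[min_idx] + cvse[min_idx]
lambda1se <- max(lambda[within_1se])
min1se <- which(lambda == lambda1se)
```

Here the minimum CVE is at the fourth lambda (0.125), and the one-standard-error rule selects the third (0.25), which corresponds to a more heavily penalized, sparser model.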

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
cv_fit <- cv_plmm(design = admix_design)
print(summary(cv_fit))
plot(cv_fit)
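The default lambda grid is described in the arguments as nlambda values equally spaced on the log scale, with the smallest value expressed as a fraction (lambda_min) of lambda.max. A base R sketch of such a grid follows; the value of lambda_max is invented here, since computing it from data is not covered on this page.

```r
# Hypothetical inputs.
lambda_max <- 2      # assumed largest lambda (normally derived from the data)
lambda_min <- 0.05   # smallest lambda as a fraction of lambda_max
nlambda <- 5

# Decreasing sequence, equally spaced on the log scale.
lambda <- exp(seq(log(lambda_max),
                  log(lambda_min * lambda_max),
                  length.out = nlambda))
```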

Cross-validation internal function for `cv_plmm()`

Description

Internal function for cv_plmm() which calls plmm() on a fold subset of the original data.

Usage

cvf(i, fold, type, cv_args, ...)

Arguments

i

Fold number to be excluded from fit.

fold

n-length vector of fold-assignments.

type

A character argument indicating what should be returned from predict.plmm(). If type = 'lp' predictions are based on the linear predictor, X \beta. If type = 'blup', predictions are based on the linear predictor plus the estimated random effect (BLUP).

cv_args

List of additional arguments to be passed to plmm.

...

Optional arguments to predict_within_cv()

Value

A list with three elements:

loss: a numeric vector with the loss at each value of lambda
nl: a numeric value indicating the number of lambda values used
yhat: a numeric value with the predicted outcome values at each lambda
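The roles of i and fold can be illustrated with a small base R sketch. The balanced random assignment shown is one common scheme and is an assumption here; the package's own assignment may differ.

```r
set.seed(7)
n <- 10
nfolds <- 5

# n-length vector of fold assignments, balanced and randomly permuted.
fold <- sample(rep(seq_len(nfolds), length.out = n))

# Fold i is excluded from fitting and used only for prediction.
i <- 1
test_idx  <- which(fold == i)
train_idx <- which(fold != i)
```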

A function to take the eigendecomposition of K

Description

Note: This is faster than taking SVD of X when p >> n

Usage

eigen_K(std_X)

Arguments

std_X

The standardized design matrix.

Value

A list with three elements:

s: The non-zero eigenvalues of K
U: The eigenvectors of K associated with s
K: The fully computed K matrix
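The three returned elements can be mimicked in base R as a sketch. It assumes K = (1/p) X X^T on a column-standardized matrix, uses scale() as a stand-in for the package's standardization, and drops eigenvalues below an arbitrary tolerance; none of this is taken from the package source.

```r
set.seed(42)
n <- 5
p <- 20
std_X <- scale(matrix(rnorm(n * p), n, p))   # stand-in for the standardized design

# K is only n x n, so its eigendecomposition is cheap when p >> n.
K <- tcrossprod(std_X) / p
eig <- eigen(K, symmetric = TRUE)

# Keep the non-zero eigenvalues and their eigenvectors.
keep <- eig$values > 1e-8
s <- eig$values[keep]
U <- eig$vectors[, keep, drop = FALSE]
```

Because the columns of std_X are centered, K has rank at most n - 1, so one eigenvalue is numerically zero and is dropped. The retained eigenvalues equal the squared singular values of std_X divided by p.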

Estimate eta (to be used in rotating the data)

Description

This function is called internally by plmm()

Usage

estimate_eta(n, s, U, y, incpt_flag)

Arguments

n

The number of observations

s

The non-zero eigenvalues of K, the realized relationship matrix

U

The eigenvectors of K associated with s

y

Continuous outcome vector

incpt_flag

Logical: Does the model require fitting an intercept?

Value

a numeric value with the estimated value of eta, the variance parameter

A function to help with accessing example PLINK files.

Description

A function to help with accessing example PLINK files.

Usage

find_example_data(path, parent = FALSE)

Arguments

path

Argument (string) specifying a path (filename) for an external data file in extdata/.

parent

If the user wants the name of the parent directory where the example data is located, set parent = TRUE. Defaults to FALSE.

Value

If path = NULL, a character vector of file names is returned. If path is given, then a character string with the full file path.

Examples

find_example_data(parent = TRUE)

Read in processed data

Description

This function is intended to be called after either process_plink() or process_delim() has been called once.

Usage

get_data(path, returnX = FALSE, trace = TRUE)

Arguments

path

The file path to the RDS object containing the processed data. Do not add the .rds extension to the path.

returnX

Logical: should the design matrix be returned as a numeric matrix that will be stored in memory? Default is FALSE.

trace

Logical: should trace messages be shown? Default is TRUE.

Value

A list with these components:

std_X, the column-standardized design matrix as either (1) a numeric matrix or (2) a filebacked big.matrix object.
(if PLINK data) fam, a data frame containing the pedigree information (like a .fam file in PLINK)
(if PLINK data) map, a data frame containing the feature information (like a .bim file in PLINK)
ns: A vector indicating which columns of X contain nonsingular features (i.e., features with variance != 0).
center: A vector of values for centering each column in X
scale: A vector of values for scaling each column in X

A function to impute SNP data

Description

A function to impute SNP data

Usage

impute_snp_data(
  obj,
  X,
  chr,
  impute,
  impute_method,
  parallel,
  outfile,
  quiet,
  seed = as.numeric(Sys.Date()),
  ...
)

Arguments

obj

A bigSNP object (as created by read_plink_files())

X

A matrix of genotype data as returned by name_and_count_bigsnp()

chr

A numeric vector of chromosomal locations of the SNPs.

impute

Logical: should data be imputed? Defaults to TRUE.

impute_method

If impute = TRUE, this argument will specify the kind of imputation desired. Options are:

mode (default): Imputes the most frequent call. See bigsnpr::snp_fastImputeSimple() for details.
random: Imputes sampling according to allele frequencies.
mean0: Imputes the rounded mean.
mean2: Imputes the mean rounded to 2 decimal places.
xgboost: Imputes using an algorithm based on local XGBoost models. See bigsnpr::snp_fastImpute() for details. Note: this can take several minutes, even for a relatively small data set.

parallel

Logical: should the computations within this function be run in parallel? Defaults to TRUE. See count_cores() and ?bigparallelr::assert_cores for more details. In particular, the user should be aware that too much parallelization can make computations slower.

outfile

Optional: the name (character string) of the prefix of the logfile to be written.

quiet

Logical: should console messages be silenced? Defaults to FALSE

seed

Numeric value to be passed as the seed for impute_method = 'xgboost'. Defaults to as.numeric(Sys.Date())

...

Optional: additional arguments to bigsnpr::snp_fastImpute() (relevant only if impute_method = 'xgboost')

Value

Nothing is returned, but obj$genotypes is overwritten with the imputed version of the data.
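To give a feel for the simple imputation options listed under impute_method, the sketch below applies the "mode", "mean0", and "mean2" rules to a single toy SNP in base R. It illustrates the rules as described and is not the bigsnpr implementation; impute_mode is a hypothetical helper name.

```r
# One SNP coded 0/1/2 with a missing call.
x <- c(0, 1, 1, NA, 2, 2, 1)

# "mode": replace missing values with the most frequent observed call.
impute_mode <- function(x) {
  tab <- table(x)                     # table() ignores NA by default
  x[is.na(x)] <- as.numeric(names(tab)[which.max(tab)])
  x
}
x_mode <- impute_mode(x)

# "mean0": the rounded mean; "mean2": the mean rounded to 2 decimal places.
fill_mean0 <- round(mean(x, na.rm = TRUE))
fill_mean2 <- round(mean(x, na.rm = TRUE), 2)
```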

A function to align genotype and phenotype data

Description

A function to align genotype and phenotype data

Usage

index_samples(
  obj,
  rds_dir,
  indiv_id,
  add_outcome,
  outcome_id,
  outcome_col,
  na_outcome_vals,
  outfile,
  quiet
)

Arguments

obj

An object created by process_plink()

rds_dir

The path to the directory in which you want to create the new .rds and .bk files.

indiv_id

A character string indicating the ID column name in the 'fam' element of the genotype data list. Defaults to 'sample.ID', equivalent to 'IID' in PLINK. The other option is 'family.ID', equivalent to 'FID' in PLINK.

add_outcome

A data frame with at least two columns: an ID column and a phenotype column

outcome_id

A string specifying the name of the ID column in add_outcome

outcome_col

A string specifying the name of the phenotype column in add_outcome. This column will be used as the default y argument to plmm().

na_outcome_vals

A vector of numeric values used to code NA values in the outcome. Defaults to c(-9, NA_integer_) (the -9 matches PLINK conventions).

outfile

A string with the name of the filepath for the log file

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

a list with two items:

complete_samples: a data.table with rows corresponding to the samples for which both genotype and phenotype are available.
outcome_idx: a numeric vector with indices indicating which samples were 'complete' (i.e., which samples from add_outcome had corresponding data in the PLINK files)
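The alignment described here can be pictured with a small base R sketch. This is only an illustration under assumed inputs, not the package's implementation: the data frames fam and pheno, their column names, and the intermediate objects are all hypothetical.

```r
# Hypothetical sample IDs from the genotype data (in genotype row order)
fam <- data.frame(sample.ID = c("s1", "s2", "s3", "s4"),
                  stringsAsFactors = FALSE)

# Hypothetical external phenotype file; -9 codes a missing outcome
pheno <- data.frame(id  = c("s3", "s1", "s5", "s2"),
                    bmi = c(24.1, -9, 30.2, 27.5),
                    stringsAsFactors = FALSE)

na_outcome_vals <- c(-9, NA_integer_)

# 1. Drop samples whose outcome is coded as missing
usable <- pheno[!(pheno$bmi %in% na_outcome_vals), ]

# 2. Find the genotype row for each remaining phenotype record
geno_row <- match(usable$id, fam$sample.ID)
keep <- !is.na(geno_row)               # "s5" has no genotype data

# 3. Keep complete samples, ordered as in the genotype data
aligned <- data.frame(id = usable$id[keep],
                      geno_row = geno_row[keep],
                      bmi = usable$bmi[keep])
aligned <- aligned[order(aligned$geno_row), ]
```

In this toy example only s2 and s3 have both a genotype row and a non-missing outcome, so aligned has two rows.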

Generate nicely formatted lambda vector

Description

Generate nicely formatted lambda vector

Usage

lam_names(l)

Arguments

l

Vector of lambda values.

Value

A character vector of formatted lambda value names

Evaluate the negative log-likelihood of an intercept-only Gaussian plmm model

Description

This function allows you to evaluate the negative log-likelihood of a linear mixed model under the assumption of a null model in order to estimate the variance parameter, eta.

Usage

log_lik(eta, n, s, U, y, incpt_flag)

Arguments

eta

Estimated proportion of the variance in the outcome attributable to population/correlation structure

n

The number of observations

s

The non-zero eigenvalues of K, the realized relationship matrix

U

The eigenvectors of K associated with s

y

Continuous outcome vector

incpt_flag

Logical: Does the model require fitting an intercept? Passed from estimate_eta.

Value

The value of the negative log-likelihood of the PLMM, evaluated with the supplied parameters
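To make the role of eta, s, U, and y concrete, here is a minimal sketch of a null-model negative log-likelihood in base R. It assumes a centered outcome, a full-rank K, and a covariance of the form sigma^2 (eta K + (1 - eta) I) with sigma^2 profiled out; the function null_nll() is hypothetical, and the package's log_lik() may differ in details such as intercept handling.

```r
set.seed(1)
n <- 50; p <- 200
X <- matrix(rnorm(n * p), n, p)
K <- tcrossprod(X) / p                  # relationship matrix, full rank here
eig <- eigen(K, symmetric = TRUE)       # K = U diag(s) U^T

y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(n)
y <- y - mean(y)                        # centered outcome

# Hypothetical helper: profile negative log-likelihood as a function of eta
null_nll <- function(eta, s, U, y) {
  n <- length(y)
  d <- eta * s + (1 - eta)              # eigenvalues of eta * K + (1 - eta) * I
  rot_y <- drop(crossprod(U, y))        # U^T y
  sigma2 <- sum(rot_y^2 / d) / n        # profiled variance estimate
  0.5 * (n * log(2 * pi) + sum(log(d)) + n * log(sigma2) + n)
}

# Estimate eta by one-dimensional minimization over (0, 1)
opt <- optimize(null_nll, interval = c(0.01, 0.99),
                s = eig$values, U = eig$vectors, y = y)
eta_hat <- opt$minimum
```

Working with the eigendecomposition means each evaluation costs only a few vector operations, which is why the decomposition is computed once and reused.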

A helper function to label and summarize the contents of a `bigSNP`

Description

A helper function to label and summarize the contents of a bigSNP

Usage

name_and_count_bigsnp(obj, id_var, outfile, quiet)

Arguments

obj

a bigSNP object, possibly subset by add_external_phenotype()

id_var

String specifying which column of the PLINK .fam file has the unique sample identifiers. Options are "IID" (default) and "FID".

outfile

The string with the name of the .log file

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

a list with 7 components:

na_counts: vector of missing SNP counts in genotypes
obj: a modified bigSNP list with additional components
og_plink_ids: either the IID or FID column from .fam, determined by id_var
chr: a vector of length p containing the chromosome for each SNP
X: the obj$genotypes as its own FBM
pos: vector of physical positions of the SNPs
chr_range: vector containing the minimum and maximum values of chr. Character strings are treated as the maximum.

Fit a linear mixed model via penalized maximum likelihood.

Description

Fit a linear mixed model via penalized maximum likelihood.

Usage

plmm(
  design,
  y = NULL,
  K = NULL,
  eta = NULL,
  penalty = "lasso",
  init = NULL,
  gamma,
  alpha = 1,
  lambda_min,
  nlambda = 100,
  lambda,
  eps = 1e-04,
  max_iter = 10000,
  dfmax = NULL,
  warn = TRUE,
  trace = FALSE,
  save_rds = NULL,
  return_fit = TRUE,
  ...
)

Arguments

design

y

K

Similarity matrix used to rotate the data. This should either be: (1) a known matrix that reflects the covariance of y, (2) an estimate (Default is \frac{1}{p}(XX^T)), or (3) a list with components s and U, as returned by a previous plmm() model fit on the same data.
Note: If a user provides their own K matrix, it is decomposed as provided and will not be scaled. User-provided K functionality is currently not supported for filebacked data.

eta

Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1.

penalty

The penalty to be applied to the model. Either "lasso" (the default), "SCAD", or "MCP".

init

Initial values for coefficients. Default is 0 for all columns of X.

gamma

The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.

alpha

lambda_min

The smallest value for lambda, as a fraction of the maximum lambda. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.

nlambda

Length of the sequence of lambda. Default is 100.

lambda

A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale.

eps

Convergence threshold. The algorithm iterates until the RMSE for the change in linear predictors for each coefficient is less than eps. Default is 1e-4.

max_iter

Maximum number of iterations (total across entire path). Default is 10000.

dfmax

Maximum number of non-zero coefficients that may enter the model. Default is NULL (no maximum)

warn

Return warning messages for failures to converge and model saturation? Default is TRUE.

trace

If set to TRUE, inform the user of progress by announcing the beginning of each step of the modeling process. Default is FALSE.

save_rds

Optional: if a filepath and name without the .rds suffix is specified (e.g., save_rds = "~/dir/my_results"), then the model results are saved to the provided location (e.g., "~/dir/my_results.rds"). Accompanying the RDS file is a log file for documentation, e.g., "~/dir/my_results.log". Defaults to NULL, which does not save any RDS or log files.

return_fit

Optional: a logical value indicating whether the fitted model should be returned as a plmm object in the current (assumed interactive) session. Defaults to TRUE.

...

Additional optional arguments to plmm_checks()

Value

A list which includes 18 items:

beta_vals: The matrix of estimated coefficients. Rows are predictors (with the first row being the intercept), and columns are values of lambda.
std_Xbeta: A matrix of the linear predictors on the scale of the standardized design matrix. Rows are predictors, columns are values of lambda. Note: std_Xbeta will not include rows for the intercept or for constant features.
std_X_details: A list with 9 items:
- center: The center values used to center the columns of the design matrix
- scale: The scaling values used to scale the columns of the design matrix
- ns: An integer vector of the nonsingular columns of the original data
- unpen: An integer vector of indices of the unpenalized features, if any were specified in the design
- unpen_colnames: A character vector of the column names of any unpenalized features.
- X_colnames: A character vector with the column names of all features in the original design matrix
- X_rownames: A character vector with the row names of all features in the original design matrix; if none were provided, these are named 'row1', 'row2', etc.
- std_X_colnames: A subset of X_colnames representing only nonsingular columns (i.e., the columns indexed by ns)
- std_X_rownames: A subset of X_rownames representing rows that passed QC filtering and are represented in both the genotype and phenotype data sets (this only applies to PLINK data)
std_X: If design matrix is filebacked, the descriptor for the filebacked data is returned using bigmemory::describe(). If the data were stored in-memory, nothing is returned (std_X is NULL).
y: The outcome vector used in model fitting.
p: The total number of columns in the design matrix (including any singular columns, excluding the intercept).
plink_flag: Logical - did the data come from PLINK files?
lambda: A numeric vector of the tuning parameter values used in model fitting.
eta: A double between 0 and 1 representing the estimated proportion of the variance in the outcome attributable to population/correlation structure.
penalty: A character string indicating the penalty with which the model was fit (e.g., 'MCP')
gamma: A numeric value indicating the tuning parameter used for the SCAD or MCP penalties. Not relevant for lasso models.
alpha: A numeric value indicating the elastic net tuning parameter.
loss: A vector with the numeric values of the loss at each value of lambda (calculated on the rotated scale)
penalty_factor: A vector of indicators corresponding to each predictor, where 1 = predictor was penalized.
ns_idx: An integer vector with the indices of predictors which were non-singular features (i.e., features which had variation), where feature 1 is the intercept.
iter: An integer vector with the number of iterations needed in model fitting for each value of lambda
converged: A vector of logical values indicating whether the model fitting converged at each value of lambda
K: a list with 2 elements, s and U:
- s: a vector of the non-zero eigenvalues of the relatedness matrix K (note: K is the kinship matrix for genetic/genomic data; see the article on notation for details)
- U: a matrix of the eigenvectors of K associated with s

Examples

# using admix data
fit <- plmm(admix$X, admix$y)
s <- summary(fit, idx = 50)
print(s)
plot(fit)
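The lambda, nlambda, and lambda_min arguments together describe a decreasing grid that is equally spaced on the log scale. A minimal sketch of such a grid is below; the value of lambda_max is a hypothetical placeholder (in practice it would be derived from the data), and this is not the package's internal code.

```r
# Hypothetical inputs
lambda_max <- 2.5    # placeholder; normally computed from the data
lambda_min <- 0.05   # fraction of lambda_max, as in the lambda_min argument
nlambda    <- 100

# Equally spaced on the log scale, from lambda_max down to lambda_min * lambda_max
lambda <- exp(seq(log(lambda_max), log(lambda_min * lambda_max),
                  length.out = nlambda))
```

A grid built this way could be passed as the lambda argument in place of the automatically computed sequence.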

A function to perform checks on passed objects before model fitting.

Description

A function to perform checks on passed objects before model fitting.

Usage

plmm_checks(
  design,
  K = NULL,
  eta = NULL,
  penalty = "lasso",
  init = NULL,
  gamma,
  alpha = 1,
  dfmax = NULL,
  trace = FALSE,
  save_rds = NULL,
  return_fit = TRUE,
  ...
)

Arguments

design

The design object, as created by create_design()

K

eta

Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1.

penalty

The penalty to be applied to the model. Either "lasso" (the default), "SCAD", or "MCP".

init

Initial values for coefficients. Default is 0 for all columns of X.

gamma

The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.

alpha

dfmax

Maximum number of non-zero coefficients that may enter the model. Default is NULL (no maximum)

trace

If set to TRUE, inform the user of progress by announcing the beginning of each step of the modeling process. Default is FALSE.

save_rds

Optional: if a filepath and name is specified (e.g., save_rds = "~/dir/my_results.rds"), then the model results are saved to the provided location. Defaults to NULL, which does not save the result.

return_fit

Optional: a logical value indicating whether the fitted model should be returned as a plmm object in the current (assumed interactive) session. Defaults to TRUE.

...

Additional arguments to get_data()

Value

A list which includes 17 items:

std_X: The standardized design matrix. If design matrix is filebacked, the descriptor for the filebacked data is returned using bigmemory::describe().
std_X_details: Metadata for std_X.
std_X_n: Number of rows in std_X.
std_X_p: Number of columns in std_X.
y: Original outcome vector.
y_name: Variable name of y.
centered_y: The centered outcome vector.
K: The relationship matrix (as passed by plmm(), may be NULL)
eta: Estimated proportion of the variance in the outcome attributable to population/correlation structure (as passed by plmm(), may be NULL)
fbm_flag: Logical, is std_X filebacked?
plink_flag: Logical, does std_X originate from PLINK files?
penalty: A character string indicating the penalty type.
gamma: Tuning parameter for the SCAD or MCP penalties.
init: Initialized values for beta coefficients.
dfmax: Maximum number of non-zero coefficients that may enter the model.
n: Number of rows in the original design matrix prior to standardization procedures.
p: Number of columns in the original design matrix prior to standardization procedures.

PLMM fit: A function that fits a PLMM using the values returned by `plmm_prep()`

Description

PLMM fit: A function that fits a PLMM using the values returned by plmm_prep()

Usage

plmm_fit(
  prep,
  y,
  std_X_details,
  fbm_flag,
  penalty,
  gamma = 3,
  alpha = 1,
  lambda_min,
  nlambda = 100,
  lambda,
  eps = 1e-04,
  max_iter = 10000,
  init = NULL,
  dfmax = NULL,
  warn = TRUE,
  ...
)

Arguments

prep

A list as returned from plmm_prep

y

The original (not centered) outcome vector. Need this for intercept estimate

std_X_details

A list with components center (values used to center X), scale (values used to scale X), and ns (indices for nonsingular columns of X)

fbm_flag

Logical: is std_X a filebacked big.matrix object? Passed from plmm().

penalty

The penalty to be applied to the model. Either "lasso", "SCAD", or "MCP". No default is set in this function's signature.

gamma

The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.

alpha

lambda_min

The smallest value for lambda, as a fraction of the maximum lambda. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.

nlambda

Length of the sequence of lambda. Default is 100.

lambda

A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale.

eps

Convergence threshold. The algorithm iterates until the RMSE for the change in linear predictors for each coefficient is less than eps. Default is 1e-4.

max_iter

Maximum number of iterations (total across entire path). Default is 10000.

init

Initial values for coefficients. Default is 0 for all columns of X.

dfmax

Maximum number of non-zero coefficients that may enter the model. Default is NULL (no maximum).

warn

Return warning messages for failures to converge and model saturation? Default is TRUE.

...

Additional arguments that can be passed to biglasso::biglasso_simple_path()

Value

A list which includes 21 items:

y: The outcome vector used in model fitting.
std_scale_beta: The matrix of estimated coefficients on the standardized scale. Rows are predictors (with the first row being the intercept), and columns are values of lambda.
std_Xbeta: A matrix of the linear predictors on the scale of the standardized design matrix. Rows are predictors, columns are values of lambda. Note: std_Xbeta will not include rows for the intercept or for constant features.
centered_y: The centered outcome vector.
s: a vector of the non-zero eigenvalues of the relatedness matrix K (note: K is the kinship matrix for genetic/genomic data; see the article on notation for details)
U: a matrix of the eigenvectors of K associated with s
lambda: A numeric vector of the tuning parameter values used in model fitting.
penalty: A character string indicating the penalty with which the model was fit (e.g., 'MCP')
penalty_factor: A vector of indicators corresponding to each predictor, where 1 = predictor was penalized.
iter: An integer vector with the number of iterations needed in model fitting for each value of lambda
converged: A vector of logical values indicating whether the model fitting converged at each value of lambda
loss: A vector with the numeric values of the loss at each value of lambda (calculated on the rotated scale)
eta: A double between 0 and 1 representing the estimated proportion of the variance in the outcome attributable to population/correlation structure.
gamma: A numeric value indicating the tuning parameter used for the SCAD or MCP penalties. Not relevant for lasso models.
alpha: A numeric value indicating the elastic net tuning parameter.
nlambda: Length of the sequence of lambda.
eps: Convergence threshold. The algorithm iterates until the RMSE for the change in linear predictors for each coefficient is less than eps
max_iter: Maximum number of iterations (total across entire path)
warn: Return warning messages for failures to converge and model saturation?
trace: If set to TRUE, inform the user of progress by announcing the beginning of each step of the modeling process
std_X: If design matrix is filebacked, the descriptor for the filebacked data is returned using bigmemory::describe().

PLMM format: a function to format the output of a model constructed with `plmm_fit()`

Description

PLMM format: a function to format the output of a model constructed with plmm_fit()

Usage

plmm_format(fit, p, std_X_details, fbm_flag, plink_flag)

Arguments

fit

A list of parameters describing the output of a model constructed with plmm_fit()

p

The number of features in the original data (including constant features)

std_X_details

A list with 3 items:

center: the centering values for the columns of X
scale: the scaling values for the non-singular columns of X
ns: indices of nonsingular columns in std_X

fbm_flag

Logical: is the corresponding design matrix filebacked? Passed from plmm().

plink_flag

Logical: did these data come from PLINK files? Note: This flag matters because of how non-genomic features are handled for PLINK files – in data from PLINK files, unpenalized columns are not counted in the p argument. For delimited files, p does include unpenalized columns. This difference has implications for how the untransform() function determines the appropriate dimensions for the estimated coefficient matrix it returns.

Value

A list with 18 components:

beta_vals: the matrix of estimated coefficients on the original scale. Rows are predictors, columns are values of lambda
std_Xbeta: A matrix of the linear predictors on the scale of the standardized design matrix. Rows are predictors, columns are values of lambda. Note: std_Xbeta will not include rows for the intercept or for constant features.
std_X_details: A list with 9 items:
- center: The center values used to center the columns of the design matrix
- scale: The scaling values used to scale the columns of the design matrix
- ns: An integer vector of the nonsingular columns of the original data
- unpen: An integer vector of indices of the unpenalized features, if any were specified in the design
- unpen_colnames: A character vector of the column names of any unpenalized features.
- X_colnames: A character vector with the column names of all features in the original design matrix
- X_rownames: A character vector with the row names of all features in the original design matrix; if none were provided, these are named 'row1', 'row2', etc.
- std_X_colnames: A subset of X_colnames representing only nonsingular columns (i.e., the columns indexed by ns)
- std_X_rownames: A subset of X_rownames representing rows that passed QC filtering and are represented in both the genotype and phenotype data sets (this only applies to PLINK data)
y: The original outcome vector.
p: The total number of columns in the design matrix (including any singular columns, excluding the intercept).
plink_flag: Logical - did the data come from PLINK files?
lambda: a numeric vector of the lasso tuning parameter values used in model fitting.
eta: a number (double) between 0 and 1 representing the estimated proportion of the variance in the outcome attributable to population/correlation structure.
penalty: character string indicating the penalty with which the model was fit (e.g., 'MCP')
gamma: numeric value indicating the tuning parameter used for the SCAD or MCP penalties. Not relevant for lasso models.
alpha: numeric value indicating the elastic net tuning parameter.
loss: vector with the numeric values of the loss at each value of lambda (calculated on the rotated scale)
penalty_factor: vector of indicators corresponding to each predictor, where 1 = predictor was penalized.
ns_idx: vector with the indices of predictors which were nonsingular features (i.e., had variation).
iter: numeric vector with the number of iterations needed in model fitting for each value of lambda
converged: vector of logical values indicating whether the model fitting converged at each value of lambda
K: a list with 2 elements, s and U:
- s: a vector of the non-zero eigenvalues of the relatedness matrix K (note: K is the kinship matrix for genetic/genomic data; see the article on notation for details)
- U: a matrix of the eigenvectors of K associated with s
std_X: If design matrix is filebacked, the descriptor for the filebacked data is returned using bigmemory::describe().
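The relationship between coefficients on the standardized scale and beta_vals on the original scale can be sketched with the center, scale, and ns components of std_X_details. The example below is a generic back-transformation in base R, not the package's untransform() function; the data are simulated, and an ordinary least-squares fit stands in for the penalized fit.

```r
set.seed(4)
n <- 60
X <- cbind(a = rnorm(n, 5, 2), b = rnorm(n, -1, 0.5), const = 3)
y <- 1 + 2 * X[, "a"] - 3 * X[, "b"] + rnorm(n)

# Standardization details (names mirror std_X_details)
center <- colMeans(X)
scale_vals <- sqrt(colMeans(sweep(X, 2, center)^2))
ns <- which(scale_vals > 1e-8)          # nonsingular (non-constant) columns

std_X <- sweep(sweep(X[, ns, drop = FALSE], 2, center[ns]),
               2, scale_vals[ns], "/")

# Stand-in for a fit on the standardized scale (no penalty, for illustration)
std_beta <- coef(lm(y - mean(y) ~ std_X - 1))

# Back-transform: one row per original column, plus the intercept
beta <- numeric(ncol(X) + 1)
names(beta) <- c("(Intercept)", colnames(X))
beta[ns + 1] <- std_beta / scale_vals[ns]
beta[1] <- mean(y) - sum(center[ns] * beta[ns + 1])
```

The constant column keeps a coefficient of 0, which is why the original-scale coefficient matrix can have more rows than the standardized one.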

Loss method for `plmm` class

Description

Loss method for plmm class

Usage

plmm_loss(y, yhat)

Arguments

y

Observed outcomes (response) vector

yhat

Predicted outcomes (response) vector

Value

A numeric vector of the squared-error loss values for the given observed and predicted outcomes

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
fit <- plmm(design = admix_design)
yhat <- predict(object = fit, newX = admix$X, type = 'lp', lambda = 0.05)
head(plmm_loss(yhat = yhat, y = admix$y))
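For intuition, an elementwise squared-error loss can be written in one line of base R. The function sq_loss() below is a hypothetical stand-in used only to show the shape of the result; it is not the package's plmm_loss() source.

```r
# Hypothetical stand-in: one squared-error value per observation
sq_loss <- function(y, yhat) (y - yhat)^2

y    <- c(1, 2, 3)
yhat <- c(1, 1, 5)
loss <- sq_loss(y, yhat)   # 0 1 4
mean(loss)                 # averaging gives a mean squared error
```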

PLMM prep: a function to run checks, eigendecomposition, and rotation prior to fitting a PLMM model

Description

This is an internal function for plmm()

Usage

plmm_prep(
  std_X,
  std_X_n,
  std_X_p,
  centered_y,
  penalty_factor,
  K = NULL,
  eta = NULL,
  fbm_flag,
  trace = NULL,
  ...
)

Arguments

std_X

Column standardized design matrix. May include clinical covariates and other non-SNP data.

std_X_n

The number of observations in std_X (integer)

std_X_p

The number of features in std_X (integer)

centered_y

Continuous outcome vector, centered.

penalty_factor

A multiplicative factor for the penalty applied to each coefficient.

K

eta

Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1.

fbm_flag

Logical: is std_X a filebacked big.matrix object? This is set internally by plmm().

trace

If set to TRUE, inform the user of progress by announcing the beginning of each step of the modeling process. Default is FALSE.

...

Not used

Value

List with these components:

std_X: Standardized design matrix. If design matrix is filebacked, the descriptor for the filebacked data is returned using bigmemory::describe().
centered_y: Vector of centered outcomes
K: Similarity matrix
s: Vector of the non-zero eigenvalues of K
U: Matrix of eigenvectors of K associated with s (same as left singular vectors of X).
eta: The numeric value of the estimated eta parameter
penalty_factor: A multiplicative factor for the penalty applied to each coefficient.
incpt_flag: Logical: Does the model require fitting an intercept?
trace: If set to TRUE, inform the user of progress by announcing the beginning of each step of the modeling process
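The eigendecomposition and rotation steps can be illustrated in base R. The sketch below assumes K = XX^T / p and a covariance proportional to eta K + (1 - eta) I; the object names and the fixed value of eta are hypothetical, and the package's internal computations may be organized differently.

```r
set.seed(2)
n <- 30; p <- 100
X <- matrix(rnorm(n * p), n, p)
K <- tcrossprod(X) / p

# Eigendecomposition: K = U diag(s) U^T
eig <- eigen(K, symmetric = TRUE)
s <- eig$values
U <- eig$vectors

# Rotation weights for an assumed value of eta
eta <- 0.4
w <- 1 / sqrt(eta * s + (1 - eta))

# Rotate any n-row object M: diag(w) %*% t(U) %*% M
rotate <- function(M) w * crossprod(U, M)
rot_X <- rotate(X)

# After rotation, the assumed covariance becomes the identity
Sigma <- eta * K + (1 - eta) * diag(n)
W <- diag(w) %*% t(U)
whitened <- W %*% Sigma %*% t(W)
```

Because the rotated observations are uncorrelated under the assumed covariance, a standard penalized regression solver can then be applied to the rotated data.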

Plot method for `cv_plmm` class

Description

Plot method for cv_plmm class

Usage

## S3 method for class 'cv_plmm'
plot(
  x,
  log.l = TRUE,
  type = c("cve", "rsq", "scale", "snr", "all"),
  selected = TRUE,
  vertical.line = TRUE,
  col = "red",
  ...
)

Arguments

x

An object of class cv_plmm

log.l

Logical to indicate the plot should be returned on the natural log scale. Defaults to TRUE.

type

Type of plot to return. Options include:

cve: cross-validation error
rsq: estimated fraction of the deviance explained by the model (R^2)
scale: estimated standard deviation
snr: estimated signal-to-noise ratio
all: all of the above

selected

Logical to indicate if the number of variables selected should be plotted on the top axis. Defaults to TRUE.

vertical.line

Logical to indicate whether a vertical line should be plotted at the minimum/maximum value. Defaults to TRUE.

col

Color for the points along the CV curve. Defaults to "red".

...

Additional arguments.

Value

Nothing is returned; instead, a plot is drawn representing the relationship between the tuning parameter lambda value (x-axis) and the cross validation error (y-axis).

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
cvfit <- cv_plmm(design = admix_design)
plot(cvfit)

Plot method for `plmm` class

Description

Plot method for plmm class

Usage

## S3 method for class 'plmm'
plot(x, alpha = 1, log.l = FALSE, shade = TRUE, col, ...)

Arguments

x

An object of class plmm

alpha

log.l

Logical to indicate the plot should be returned on the natural log scale. Defaults to FALSE.

shade

Logical to indicate whether a local nonconvex region should be shaded. Defaults to TRUE.

col

Vector of colors for coefficient lines.

...

Additional arguments.

Value

Nothing is returned; instead, a plot of the coefficient paths is drawn at each value of lambda (one 'path' for each coefficient).

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
fit <- plmm(design = admix_design)
plot(fit)
plot(fit, log.l = TRUE)

Predict method for `cv_plmm` class

Description

Predict method for cv_plmm class

Usage

## S3 method for class 'cv_plmm'
predict(
  object,
  newX,
  type = c("blup", "coefficients", "vars", "nvars", "lp"),
  X,
  lambda,
  idx = object$min,
  ...
)

Arguments

object

An object of class cv_plmm.

newX

Matrix of values at which predictions are to be made (not used for type = "coefficients", "vars", or "nvars"). This can be either a filebacked big.matrix or a matrix object. Note: Columns of this argument must be named!

type

A character argument indicating what type of prediction should be returned. Options are "lp", "coefficients", "vars", "nvars", and "blup". See details.

X

Optional: if type = 'blup' and the model was fit in-memory, the design matrix used to fit the model represented in object must be supplied. When supplied, this design matrix will be standardized using the center/scale values in object$std_X_details, so please do not standardize this matrix before supplying here. Note: If the model was fit file-backed, then the filepath to the .bk file with this standardized design matrix is returned as std_X in the fit supplied to object.

lambda

A numeric vector of regularization parameter lambda values at which predictions are requested.

idx

Vector of indices of regularization parameter lambda at which predictions are requested. By default, this is the lambda index which minimizes the cross-validation error.

...

Additional optional arguments

Details

Define beta-hat as the coefficients estimated at the value of lambda that minimizes cross-validation error (CVE). Then options for type are as follows:

lp (linear predictor): uses the product of newX and the beta coefficients of object to predict new values of the outcome. This does not incorporate the correlation structure of the data.
blup (acronym for Best Linear Unbiased Predictor): adds to the lp a value that represents the estimated random effect. This addition is a way of incorporating the estimated correlation structure of data into our prediction of the outcome.
coefficients: returns the estimated beta-hat
vars: returns the indices of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.
nvars: returns the number of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.

Value

Depends on the type - see Details

Examples

set.seed(123)
train_idx <- sample(1:nrow(admix$X), 100)
# Note: ^ shuffling is important here! Keeps test and train groups comparable.
train <- list(X = admix$X[train_idx,], y = admix$y[train_idx])
train_design <- create_design(X = train$X, y = train$y)

test <- list(X = admix$X[-train_idx,], y = admix$y[-train_idx])
fit <- cv_plmm(design = train_design)

pred1 <- predict(object = fit, newX = test$X, X = train$X) # Minimum CVE lambda
pred2 <- predict(object = fit, newX = test$X, X = train$X, idx = fit$min1se) # 1 SE lambda

Predict method for `plmm` class

Description

Predict method for plmm class

Usage

## S3 method for class 'plmm'
predict(
  object,
  newX,
  type = c("blup", "coefficients", "vars", "nvars", "lp"),
  X = NULL,
  lambda,
  idx = seq_along(object$lambda),
  ...
)

Arguments

object

An object of class plmm.

newX

type

A character argument indicating what type of prediction should be returned. Options are "lp", "coefficients", "vars", "nvars", and "blup". See details.

X

lambda

A numeric vector of regularization parameter lambda values at which predictions are requested.

idx

Vector of indices of regularization parameter lambda at which predictions are requested. By default, all indices are returned.

...

Additional optional arguments

Details

The options for type are as follows:

lp (linear predictor): uses the product of newX and the beta coefficients of object to predict new values of the outcome. This does not incorporate the correlation structure of the data.
blup (default, acronym for Best Linear Unbiased Predictor): adds to the lp a value that represents the estimated random effect. This addition is a way of incorporating the estimated correlation structure of the data into our prediction of the outcome.
coefficients: returns the estimated beta coefficients.
vars: returns the indices of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.
nvars: returns the number of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.
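The difference between lp and blup can be sketched with the textbook BLUP formula, in which the linear predictor is adjusted by Sigma_21 Sigma_11^{-1} times the training residuals. Everything below is an illustration under stated assumptions: the data are simulated, the covariance is taken to be eta K + (1 - eta) I with a fixed eta, the intercept is omitted, and blup_predict() is a hypothetical helper rather than the package's method.

```r
set.seed(3)
n_train <- 40; n_test <- 10; p <- 80
n_all <- n_train + n_test
X_all <- matrix(rnorm(n_all * p), n_all, p)
tr <- seq_len(n_train)
te <- n_train + seq_len(n_test)

# Assumed covariance structure, with K = XX^T / p
eta <- 0.5
Sigma <- eta * tcrossprod(X_all) / p + (1 - eta) * diag(n_all)
Sigma_11 <- Sigma[tr, tr]   # training vs. training
Sigma_21 <- Sigma[te, tr]   # testing vs. training

# Hypothetical coefficients standing in for one column of a fitted path
beta <- c(1, -1, 0.5, rep(0, p - 3))
y_train <- drop(X_all[tr, ] %*% beta) + rnorm(n_train)

# Hypothetical helper: linear predictor plus estimated random effect
blup_predict <- function(newX, X, y, beta, Sigma_21, Sigma_11) {
  lp <- drop(newX %*% beta)
  resid <- y - drop(X %*% beta)
  lp + drop(Sigma_21 %*% solve(Sigma_11, resid))
}

lp   <- drop(X_all[te, ] %*% beta)
blup <- blup_predict(X_all[te, ], X_all[tr, ], y_train, beta,
                     Sigma_21, Sigma_11)
```

When the training residuals are all zero the adjustment vanishes and the two prediction types coincide, which is one way to see that blup only adds information carried by the correlation structure.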

Value

Depends on the type - see Details

Examples

set.seed(123)
train_idx <- sample(1:nrow(admix$X), 100)
# Note: ^ shuffling is important here! Keeps test and train groups comparable.
train <- list(X = admix$X[train_idx,], y = admix$y[train_idx])
train_design <- create_design(X = train$X, y = train$y)

test <- list(X = admix$X[-train_idx,], y = admix$y[-train_idx])
fit <- plmm(design = train_design)

# make predictions for all lambda values
pred1 <- predict(object = fit, newX = test$X, type = "lp")
pred2 <- predict(object = fit, newX = test$X, type = "blup", X = train$X)

# look at mean squared prediction error
mspe <- apply(pred1, 2, function(c){crossprod(test$y - c)/length(c)})
min(mspe)

mspe_blup <- apply(pred2, 2, function(c){crossprod(test$y - c)/length(c)})
min(mspe_blup) # BLUP is better

# compare the MSPE of our model to a null model, for reference
# null model = intercept only -> y_hat is always mean(y)
crossprod(mean(test$y) - test$y)/length(test$y)

Predict method to use in cross-validation (within `cvf()`)

Description

Predict method to use in cross-validation (within cvf())

Usage

predict_within_cv(fit, testX, type, fbm = FALSE, Sigma_21 = NULL)

Arguments

fit

A list with the components returned by plmm_fit.

testX

A design matrix used for computing predicted values (i.e, the test data).

type

A character argument indicating what type of prediction should be returned. Passed from cvf(). Options are "lp", "coefficients", "vars", "nvars", and "blup". See details.

fbm

Logical: is trainX a filebacked big.matrix object? If so, this function expects that testX is also an FBM. The two X matrices must be stored the same way.

Sigma_21

Covariance matrix between the training and the testing data. Required if type == 'blup'.

Details

lp (linear predictor): uses the product of testX and the beta coefficients of fit to predict new values of the outcome. This does not incorporate the correlation structure of the data.
blup (acronym for Best Linear Unbiased Predictor): adds to the lp a value that represents the estimated random effect. This addition is a way of incorporating the estimated correlation structure of the data into our prediction of the outcome.
coefficients: returns the estimated beta-hat
vars: returns the indices of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.
nvars: returns the number of variables (e.g., SNPs) with nonzero coefficients at each value of lambda. EXCLUDES intercept.

Note: the main difference between this function and the predict.plmm() method is that here in CV, the standardized testing data (std_test_X), Sigma_11, and Sigma_21 are calculated in cvf() instead of the function defined here.

Value

A numeric vector of predicted values

A function to format the time

Description

A function to format the time

Usage

pretty_time()

Value

A string with the formatted current date and time

Print method for `summary.cv_plmm` objects

Description

Print method for summary.cv_plmm objects

Usage

## S3 method for class 'summary.cv_plmm'
print(x, digits, ...)

Arguments

x

An object of class summary.cv_plmm

digits

The number of digits to use in formatting output

...

Not used

Value

Nothing is returned; instead, a message is printed to the console summarizing the results of the cross-validated model fit.

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
cv_fit <- cv_plmm(design = admix_design)
print(summary(cv_fit))

A function to print the summary of a `plmm` model

Description

A function to print the summary of a plmm model

Usage

## S3 method for class 'summary.plmm'
print(x, ...)

Arguments

x

A summary.plmm object

...

Not used

Value

Nothing is returned; instead, a message is printed to the console summarizing the results of the model fit.

Examples

lam <- rev(seq(0.01, 1, length.out=20)) |> round(2) # for sake of example
admix_design <- create_design(X = admix$X, y = admix$y)
fit <- plmm(design = admix_design, lambda = lam)
fit2 <- plmm(design = admix_design, penalty = "SCAD", lambda = lam)
print(summary(fit, idx = 18))
print(summary(fit2, idx = 18))

A function to read in large data files as a filebacked `big.matrix`

Description

A function to read in large data files as a filebacked big.matrix

Usage

process_delim(
  data_dir,
  data_file,
  feature_id,
  rds_dir = data_dir,
  rds_prefix,
  logfile = NULL,
  overwrite = FALSE,
  quiet = FALSE,
  ...
)

Arguments

data_dir

The path to the directory that contains the file.

data_file

The file to be read in, without the filepath. This should be a file of numeric values. Example: use data_file = "myfile.txt", not data_file = "~/mydirectory/myfile.txt". Note: if your file has headers/column names, set header = TRUE; this will be passed into bigmemory::read.big.matrix().

feature_id

A string specifying the column in the data X (the feature data) with the row IDs (e.g., identifiers for each row/sample/participant, etc.). No duplicates allowed.

rds_dir

The directory where the user wants to create the .rds and .bk files. Defaults to data_dir

rds_prefix

String specifying the user's preferred filename for the to-be-created .rds file (will be created inside the rds_dir folder). Note: rds_prefix cannot be the same as data_prefix

logfile

Optional: the name (character string) of the prefix of the logfile to be written in rds_dir. Defaults to NULL (no log file written). Note: do not append a .log to the filename; this is done automatically.

overwrite

Logical: if existing .bk/.rds files exist for the specified directory/prefix, should these be overwritten? Defaults to FALSE. Set to TRUE if you want to change the imputation method you're using, etc.

quiet

Logical: should console messages be silenced? Defaults to FALSE

...

Optional: other arguments to be passed to bigmemory::read.big.matrix(). Note: sep is an option to pass here, as is header.

Value

The file path to the newly created .rds file

Examples

temp_dir <- tempdir()
colon_dat <- process_delim(data_file = "colon2.txt",
 data_dir = find_example_data(parent = TRUE), overwrite = TRUE,
 rds_dir = temp_dir, rds_prefix = "processed_colon2", sep = "\t", header = TRUE)

colon2 <- readRDS(colon_dat)
str(colon2)

Preprocess PLINK files using the `bigsnpr` package

Description

Preprocess PLINK files using the bigsnpr package

Usage

process_plink(
  data_dir,
  data_prefix,
  rds_dir = data_dir,
  rds_prefix = NULL,
  logfile = NULL,
  impute = TRUE,
  impute_method = "mode",
  id_var = "IID",
  parallel = TRUE,
  quiet = FALSE,
  overwrite = FALSE,
  ...
)

Arguments

data_dir

The path to the bed/bim/fam data files, without a trailing "/" (e.g., use data_dir = '~/my_dir', not data_dir = '~/my_dir/')

data_prefix

The prefix (as a character string) of the bed/fam data files (e.g., data_prefix = 'mydata')

rds_dir

The path to the directory in which you want to create the new .rds and .bk files. Defaults to data_dir

rds_prefix

String specifying the user's preferred filename for the to-be-created .rds file (will be created inside the rds_dir folder). If no rds_prefix is provided, the processed data files will be returned in memory. Note: rds_prefix cannot be the same as data_prefix

logfile

impute

Logical: should data be imputed? Defaults to TRUE.

impute_method

If impute = TRUE, this argument will specify the kind of imputation desired. Options are:

mode (default): Imputes the most frequent call. See bigsnpr::snp_fastImputeSimple() for details.
random: Imputes sampling according to allele frequencies.
mean0: Imputes the rounded mean.
mean2: Imputes the mean rounded to 2 decimal places.
xgboost: Imputes using an algorithm based on local XGBoost models. See bigsnpr::snp_fastImpute() for details. Note: this can take several minutes, even for a relatively small data set.

id_var

String specifying which column of the PLINK .fam file has the unique sample identifiers. Options are "IID" (default) and "FID"

parallel

quiet

Logical: should console messages be silenced? Defaults to FALSE

overwrite

...

Optional: additional arguments to bigsnpr::snp_fastImpute() (relevant only if impute_method = 'xgboost')

Details

Three files are created in the location specified by rds_dir:

rds_prefix.rds: This is a list with three items: (1) X: the filebacked bigmemory::big.matrix object pointing to the imputed genotype data. This matrix has type double, which is important for downstream operations in create_design(); (2) map: a data.frame with the PLINK bim data (i.e., the variant information); (3) fam: a data.frame with the PLINK fam data (i.e., the pedigree information).
rds_prefix.bk: This is the backing file that stores the numeric data of the genotype matrix.
rds_prefix.desc: This is the description file, needed to attach the genotype matrix to the R session.

Note that process_plink() need only be run once for a given set of PLINK files; in subsequent data analysis/scripts, get_data() will access the .rds file.

For an example, see the vignette on processing PLINK files: vignette('plink_files', package = "plmmr").

Value

The filepath to the .rds object created; see details for explanation.
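
To make the default impute_method = "mode" concrete, the following base R sketch replaces each missing genotype call with the most frequent observed call in its column. It is an illustration only: the toy matrix G and the helper impute_mode() are hypothetical, and the package delegates the real work to bigsnpr, whose tie-breaking rules may differ from this sketch.

```r
# Toy genotype matrix coded 0/1/2 with missing calls (hypothetical data);
# each column is one variant, each row one sample
G <- matrix(c(0, 1, NA, 1,
              2, 2, 2, NA,
              0, NA, 0, 1), nrow = 4)

# Replace each missing call with the most frequent observed call in its column.
# Ties are broken here by taking the smallest call, which is an assumption.
impute_mode <- function(g) {
  counts <- table(g)                      # NA values are excluded by default
  g[is.na(g)] <- as.numeric(names(counts)[which.max(counts)])
  g
}

G_imputed <- apply(G, 2, impute_mode)
G_imputed
```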

A function to read in a large file as a numeric file-backed matrix

Description

Note: this function is a wrapper for bigmemory::read.big.matrix()

Usage

read_data_files(
  data_file,
  data_dir,
  rds_dir,
  rds_prefix,
  outfile,
  overwrite,
  quiet,
  ...
)

Arguments

data_file

The name of the file to read, not including its directory. Directory should be specified in data_dir

data_dir

The path to the directory where data_file is

rds_dir

The path to the directory in which you want to create the new .rds and .bk files. Defaults to data_dir

rds_prefix

String specifying the user's preferred filename for the to-be-created .rds/.bk files (will be created inside the rds_dir folder). Note: rds_prefix cannot be the same as data_file

outfile

Optional: the name (character string) of the prefix of the logfile to be written. Defaults to NULL (no log file written).

overwrite

Logical: if existing .bk/.rds files exist for the specified directory/prefix, should these be overwritten? Defaults to FALSE. Set to TRUE if you want to change the imputation method you're using, etc.

quiet

Logical: should console messages be silenced? Defaults to FALSE

...

Optional: other arguments to be passed to bigmemory::read.big.matrix(). Note: sep is an option to pass here.

Value

.rds, .bk, and .desc files are created in data_dir, and obj (a filebacked bigmemory big.matrix object) is returned. See bigmemory documentation for more info on the big.matrix class.

A function to read in PLINK files using `bigsnpr` methods

Description

A function to read in PLINK files using bigsnpr methods

Usage

read_plink_files(
  data_dir,
  data_prefix,
  rds_dir,
  rds_prefix,
  outfile,
  parallel,
  overwrite,
  quiet
)

Arguments

data_dir

The path to the bed/bim/fam data files, without a trailing "/" (e.g., use data_dir = '~/my_dir', not data_dir = '~/my_dir/')

data_prefix

The prefix (as a character string) of the bed/fam data files (e.g., data_prefix = 'mydata')

rds_dir

The path to the directory in which you want to create the new .rds and .bk files. Defaults to data_dir

rds_prefix

String specifying the user's preferred filename for the to-be-created .rds file (will be created inside the rds_dir folder). If no rds_prefix is provided, the processed data files will be returned in memory. Note: rds_prefix cannot be the same as data_prefix

outfile

Optional: the name (character string) of the prefix of the logfile to be written. Defaults to NULL (no log written).

parallel

overwrite

quiet

Logical: should console messages be silenced?

Value

.rds and .bk files are created in data_dir, and obj (a bigSNP object) is returned. See bigsnpr documentation for more info on the bigSNP class.

Calculate a relatedness matrix

Description

Given a matrix of genotypes, this function estimates the genetic relatedness matrix (GRM, also known as the realized relationship matrix or RRM; see Hayes et al. 2009, doi:10.1017/S0016672308009981) among the subjects: \frac{1}{p} X X^T, where X is standardized and p is the number of columns of X.

Usage

relatedness_mat(X, std = TRUE)

Arguments

X

An n x p numeric matrix of genotypes (from fully-imputed data). Can be a filebacked big.matrix object. Note: This matrix should not include non-genetic features.

std

Logical: should X be standardized? If you set this to FALSE, you should have a good reason for doing so, as standardization is a best practice.

Value

An n x n numeric matrix capturing the genomic relatedness of the samples represented in X. In our notation, we call this matrix K for 'kinship'; this is also known as the GRM or RRM.

Examples

RRM <- relatedness_mat(X = admix$X)
RRM[1:5, 1:5]
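
The formula in the Description can also be written out directly in base R. The sketch below uses simulated genotypes and base scale(), which divides by the sample standard deviation; the standardization used inside relatedness_mat() may differ, so the numbers here are not expected to match the function's output exactly.

```r
# Simulated toy genotypes (hypothetical data, not the admix dataset)
set.seed(2)
n <- 5; p <- 40
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), n, p)

# Constant columns cannot be standardized, so drop them first
X <- X[, apply(X, 2, sd) > 0, drop = FALSE]

std_X <- scale(X)                        # center and scale each column
K <- tcrossprod(std_X) / ncol(std_X)     # (1/p) X X^T, an n x n matrix
round(K, 2)
```

With this choice of scaling, the diagonal of K sums to n - 1, and the off-diagonal entries measure how similar two samples' standardized genotypes are.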

A function to rotate filebacked data

Description

A function to rotate filebacked data

Usage

rotate_filebacked(prep, tocenter = TRUE, ...)

Arguments

prep

The object returned by plmm_prep()

tocenter

Should the matrix be centered in addition to scaled? Defaults to TRUE

...

Not used

Value

A list with 4 items:

stdrot_X: X on the rotated and re-standardized scale
rot_y: y on the rotated scale (a numeric vector)
stdrot_X_center: numeric vector of values used to center rot_X
stdrot_X_scale: numeric vector of values used to scale rot_X

Compute sequence of lambda values for `plmm` models

Description

Compute sequence of lambda values for plmm models

Usage

setup_lambda(X, y, alpha, lambda_min, nlambda, penalty_factor)

Arguments

X

Rotated and standardized design matrix which includes the intercept column if present. May include clinical covariates and other non-SNP data. This can be either a matrix or a filebacked big.matrix object.

y

Continuous outcome vector.

alpha

lambda_min

The smallest value for lambda, as a fraction of the maximum lambda. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise. A value of lambda_min = 0 is not supported.

nlambda

The desired number of lambda values in the sequence to be generated.

penalty_factor

A multiplicative factor for the penalty applied to each coefficient. If supplied, penalty_factor must be a numeric vector of length equal to the number of columns of X. The purpose of penalty_factor is to apply differential penalization if some coefficients are thought to be more likely than others to be in the model. In particular, penalty_factor can be 0, in which case the coefficient is always in the model without shrinkage.

Value

A numeric vector of lambda values, equally spaced on the log scale
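
As an illustration of what "equally spaced on the log scale" means, the sketch below builds such a sequence in base R. The value of lambda_max is made up for the example; in the package it is derived from X, y, alpha, and penalty_factor, and that computation is not reproduced here.

```r
lambda_max <- 2        # hypothetical; the package computes this from the data
lambda_min <- 0.001    # smallest lambda, as a fraction of lambda_max
nlambda    <- 100

# Equal spacing in log(lambda), running from lambda_max down to
# lambda_min * lambda_max
lambda <- exp(seq(log(lambda_max), log(lambda_min * lambda_max),
                  length.out = nlambda))
range(lambda)
```

In a construction like this one, lambda_min = 0 would require log(0), which is one way to see why that value cannot be used.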

A helper function to standardize a filebacked matrix

Description

A helper function to standardize a filebacked matrix

Usage

standardize_filebacked(X, outfile, quiet, tocenter = TRUE)

Arguments

X

A big.matrix object that has been subset and/or had any additional predictors appended as columns

outfile

Optional: the name (character string) of the logfile to be written.

quiet

Logical: should console messages be silenced? Defaults to FALSE

tocenter

Should the matrix be centered in addition to scaled? Defaults to TRUE.

Value

A list with a component called std_X, which is a filebacked big.matrix with column-standardized data. The list also includes several other indices and metadata describing the standardized matrix.

A helper function to standardize matrices

Description

A helper function to standardize matrices

Usage

standardize_in_memory(X, tocenter = TRUE)

Arguments

X

A matrix

tocenter

Should the matrix be centered in addition to scaled? Defaults to TRUE.

Details

This function is adapted from https://github.com/pbreheny/ncvreg/blob/master/R/std.R. Note: this function returns a matrix in memory. For standardizing filebacked data, use standardize_filebacked(); see src/big_standardize.cpp.

Value

A list containing the standardized X matrix and associated metadata
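
A rough picture of column standardization with an optional centering step is given below. This is a sketch under assumptions, not the package's code: the helper standardize_sketch() is hypothetical, and the choice to scale by the root mean square of each (centered) column and to drop near-constant columns is an assumption about the convention being followed.

```r
# Hypothetical helper: center (optionally), scale, and drop constant columns
standardize_sketch <- function(X, tocenter = TRUE) {
  center <- if (tocenter) colMeans(X) else rep(0, ncol(X))
  Xc <- sweep(X, 2, center, "-")
  scale_vals <- sqrt(colMeans(Xc^2))       # root mean square of each column
  ns <- which(scale_vals > 1e-6)           # indices of non-constant columns
  std_X <- sweep(Xc[, ns, drop = FALSE], 2, scale_vals[ns], "/")
  list(std_X = std_X, center = center[ns], scale = scale_vals[ns], ns = ns)
}

set.seed(3)
X <- cbind(matrix(rnorm(30), 10, 3), 1)    # the last column is constant
out <- standardize_sketch(X)
colMeans(out$std_X)                        # approximately zero
colMeans(out$std_X^2)                      # approximately one
```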

A helper function to subset `big.matrix` objects

Description

A helper function to subset big.matrix objects

Usage

subset_filebacked(X, new_file, complete_samples, ns, rds_dir, outfile, quiet)

Arguments

X

A filebacked big.matrix with the to-be-standardized design matrix

new_file

Optional user-specified new file for the to-be-created .rds/.bk files.

complete_samples

Numeric vector with indices marking the rows of the original data which have a non-missing entry in the 6th column of the .fam file

ns

Numeric vector with the indices of the non-singular columns

rds_dir

The path to the directory in which you want to create the new .rds and .bk files. Defaults to data_dir

outfile

Optional: the name (character string) of the logfile to be written.

quiet

Logical: should console messages be silenced? Defaults to FALSE

Value

A list with two components. First, a big.matrix object, subset_X, representing a design matrix wherein:

rows are subset to those with complete phenotype information
columns are subset so that no constant features remain – this is important for standardization downstream

The list also includes the integer vector ns, which marks the columns of the original matrix that were 'non-singular' (i.e., not constant features). The ns index plays an important role in plmm_format() and untransform().
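
The row and column subsetting described above can be mimicked on an ordinary in-memory matrix. The sketch below is illustrative only; the toy matrix and phenotype vector are hypothetical, and the real function operates on a filebacked big.matrix.

```r
# Toy design matrix and phenotype (hypothetical data)
X <- cbind(a = c(0, 1, 2, 1), b = c(1, 1, 1, 1), c = c(2, 0, 1, 0))
pheno <- c(1.2, NA, 0.7, 3.1)

# Rows with a non-missing phenotype
complete_samples <- which(!is.na(pheno))
keep_rows <- X[complete_samples, , drop = FALSE]

# Columns that are not constant among the retained rows
ns <- which(apply(keep_rows, 2, function(col) length(unique(col)) > 1))

subset_X <- keep_rows[, ns, drop = FALSE]
subset_X
```

Here column b is constant and is dropped, so ns records positions 1 and 3 of the original matrix.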

A summary function for `cv_plmm` objects

Description

A summary function for cv_plmm objects

Usage

## S3 method for class 'cv_plmm'
summary(object, lambda = "min", ...)

Arguments

object

A cv_plmm object

lambda

The regularization parameter value at which inference should be reported. Can choose a numeric value, 'min', or '1se'. Defaults to 'min'.

...

Not used

Value

The return value is an object with S3 class summary.cv_plmm. The class has its own print method and contains the following list elements:

lambda_min: The lambda value at the minimum cross validation error
lambda.1se: The maximum lambda value within 1 standard error of the minimum cross validation error
penalty: The penalty applied to the fitted model
nvars: The number of non-zero coefficients at the selected lambda value
cve: The cross validation error at all folds
min: The minimum cross validation error
fit: The plmm fit used in the cross validation

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
cv_fit <- cv_plmm(design = admix_design)
summary(cv_fit)
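
The 'min' and '1se' choices of lambda can be illustrated with a small hand-made example. The vectors below are invented for the illustration, and the name cvse for the standard errors is an assumption; the selection logic follows the definitions given in the Value section.

```r
# Hypothetical cross-validation results along a decreasing lambda path
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cve    <- c(2.0, 1.3, 1.05, 1.00, 1.20)    # cross-validation error
cvse   <- c(0.2, 0.1, 0.10, 0.10, 0.10)    # standard error of the CV error

# 'min': the lambda with the smallest cross-validation error
i_min <- which.min(cve)
lambda_min <- lambda[i_min]

# '1se': the largest lambda whose error is within one standard error
# of the minimum
threshold <- cve[i_min] + cvse[i_min]
lambda_1se <- max(lambda[cve <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

The '1se' choice is never smaller than the 'min' choice, so it corresponds to a model that is penalized at least as heavily.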

A summary method for `plmm` objects

Description

A summary method for plmm objects

Usage

## S3 method for class 'plmm'
summary(object, lambda, idx, eps = 1e-05, ...)

Arguments

object

An object of class plmm

lambda

The regularization parameter value at which inference should be reported.

idx

Alternatively, lambda may be specified by an index; idx = 10 means: report inference for the 10th value of lambda along the regularization path. If both lambda and idx are specified, lambda takes precedence.

eps

If lambda is given, eps is the tolerance for difference between the given lambda value and a lambda value from the object. Defaults to 0.00001 (1e-5)

...

Not used

Value

The return value is an object with S3 class summary.plmm. The class has its own print method and contains the following list elements:

penalty: The penalty used by plmm (e.g. SCAD, MCP, lasso)
n: Number of instances/observations
std_X_n: the number of observations in the standardized data; the only time this would differ from n is if data are from PLINK and the external data does not include all the same samples
p: Number of regression coefficients (not including the intercept)
converged: Logical indicator for whether the model converged
lambda: The lambda value at which inference is being reported
lambda_char: A formatted character string indicating the lambda value
nvars: The number of nonzero coefficients (again, not including the intercept) at that value of lambda
nonzero: The column names indicating the nonzero coefficients in the model at the specified value of lambda

Examples

admix_design <- create_design(X = admix$X, y = admix$y)
fit <- plmm(design = admix_design)
summary(fit, idx = 97)
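
How lambda, idx, and eps interact can be sketched as follows. The helper pick_lambda() and the lambda path are hypothetical and only mirror the rules stated in the Arguments section: a supplied lambda is matched to the path within tolerance eps, and lambda takes precedence over idx.

```r
# Hypothetical regularization path
lambda_path <- c(1, 0.5, 0.25, 0.125)

# Hypothetical helper returning the position on the path to report
pick_lambda <- function(lambda_path, lambda, idx, eps = 1e-5) {
  if (!missing(lambda)) {
    hit <- which(abs(lambda_path - lambda) < eps)
    if (length(hit) == 0) {
      stop("no lambda on the path is within eps of the requested value")
    }
    return(hit[1])
  }
  idx
}

pick_lambda(lambda_path, lambda = 0.250001)      # matched within eps
pick_lambda(lambda_path, idx = 2)                # selected by position
pick_lambda(lambda_path, lambda = 0.5, idx = 4)  # lambda takes precedence
```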

Untransform coefficient values back to the original scale

Description

This function unwinds the initial standardization of the data to obtain coefficient values on their original scale. It is called by plmm_format().

Usage

untransform(
  std_scale_beta,
  p,
  std_X_details,
  fbm_flag,
  plink_flag,
  use_names = TRUE
)

Arguments

std_scale_beta

The estimated coefficients on the standardized scale

p

The number of columns in the original design matrix

std_X_details

A list with 3 elements describing the standardized design matrix BEFORE rotation; this should have elements scale, center, and ns

fbm_flag

Logical: is the corresponding design matrix filebacked?

plink_flag

use_names

Logical: should names be added? Defaults to TRUE. Set to FALSE inside of cvf() helper, as ns will vary within CV folds.

Value

A matrix of estimated coefficients, untransformed_beta, that is on the scale of the original data.
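
The idea of unwinding a standardization can be checked on a small unpenalized example. The sketch below is not the package's implementation: it uses ordinary least squares, simulated data, and hypothetical object names, and it omits the handling of ns (dropped constant columns). It shows only the arithmetic: each slope is divided by its column's scale value, and the intercept is adjusted by the centering values.

```r
# Simulated data (hypothetical)
set.seed(4)
n <- 50
X <- cbind(x1 = rnorm(n, 5, 2), x2 = rnorm(n, -1, 0.5))
y <- 1 + 0.8 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n)

# Standardize the columns, keeping the centering and scaling values
center <- colMeans(X)
scale_vals <- apply(X, 2, sd)
std_X <- sweep(sweep(X, 2, center, "-"), 2, scale_vals, "/")

# Coefficients on the standardized scale (intercept first)
std_beta <- coef(lm(y ~ std_X))

# Unwind the standardization
slopes <- std_beta[-1] / scale_vals
intercept <- std_beta[1] - sum(center * slopes)
untransformed_beta <- c(intercept, slopes)

# Agrees with a fit on the original scale
orig_beta <- coef(lm(y ~ X))
all.equal(unname(untransformed_beta), unname(orig_beta))
```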

Untransform coefficient values back to the original scale for file-backed data

Description

This function unwinds the initial standardization of the data to obtain coefficient values on their original scale. It is called by plmm_format().

Usage

untransform_delim(std_scale_beta, p, std_X_details, use_names = TRUE)

Arguments

std_scale_beta

The estimated coefficients on the standardized scale

p

The number of columns in the original design matrix

std_X_details

A list with 3 elements describing the standardized design matrix BEFORE rotation; this should have elements scale, center, and ns

use_names

Logical: should names be added? Defaults to TRUE. Set to FALSE inside of cvf() helper, as ns will vary within CV folds.

Value

A matrix of estimated coefficients, untransformed_beta, that is on the scale of the original data.

Untransform coefficient values back to the original scale in memory

Description

This function unwinds the initial standardization of the data to obtain coefficient values on their original scale. It is called by plmm_format().

Usage

untransform_in_memory(std_scale_beta, p, std_X_details, use_names = TRUE)

Arguments

std_scale_beta

The estimated coefficients on the standardized scale

p

The number of columns in the original design matrix

std_X_details

A list with 3 elements describing the standardized design matrix BEFORE rotation; this should have elements scale, center, and ns

use_names

Logical: should names be added? Defaults to TRUE. Set to FALSE inside of cvf() helper, as ns will vary within CV folds.

Value

A matrix of estimated coefficients, untransformed_beta, that is on the scale of the original data.

Untransform coefficient values back to the original scale for file-backed data

Description

This function unwinds the initial standardization of the data to obtain coefficient values on their original scale. It is called by plmm_format().

Usage

untransform_plink(std_scale_beta, p, std_X_details, use_names = TRUE)

Arguments

std_scale_beta

The estimated coefficients on the standardized scale

p

The number of columns in the original design matrix

std_X_details

A list with 3 elements describing the standardized design matrix BEFORE rotation; this should have elements scale, center, and ns

use_names

Logical: should names be added? Defaults to TRUE. Set to FALSE inside of cvf() helper, as ns will vary within CV folds.

Value

A matrix of estimated coefficients, untransformed_beta, that is on the scale of the original data.

Companion function to unzip the `.gz` files that ship with the `plmmr` package.

Description

Companion function to unzip the .gz files that ship with the plmmr package.

Usage

unzip_example_data(outdir)

Arguments

outdir

The file path to the directory to which the .gz files should be written.

Details

For an example of this function, look at vignette('plink_files', package = "plmmr").

Value

Nothing is returned; the PLINK files that ship with the plmmr package are stored in the directory specified by outdir.
