% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cv-plmm.R
\name{cv_plmm}
\alias{cv_plmm}
\title{Cross-validation for plmm}
\usage{
cv_plmm(
  design,
  y = NULL,
  K = NULL,
  diag_K = NULL,
  eta_star = NULL,
  penalty = "lasso",
  type = "blup",
  gamma,
  alpha = 1,
  lambda_min,
  nlambda = 100,
  lambda,
  eps = 1e-04,
  max_iter = 10000,
  warn = TRUE,
  init = NULL,
  cluster,
  nfolds = 5,
  seed,
  fold = NULL,
  trace = FALSE,
  save_rds = NULL,
  return_fit = TRUE,
  ...
)
}
\arguments{
\item{design}{The first argument must be one of three things:
(1) a \code{plmm_design} object (as created by \code{create_design()}),
(2) a string with the file path to a design object (the file path must end in '.rds'), or
(3) a \code{matrix} or \code{data.frame} object representing the design matrix of interest.}

\item{y}{Optional: In the case where \code{design} is a \code{matrix} or \code{data.frame}, the user must also supply
a numeric outcome vector as the \code{y} argument. In this case, \code{design} and \code{y} will be passed
internally to \code{create_design(X = design, y = y)}.}

\item{K}{Similarity matrix used to rotate the data. This should either be (1) a known matrix that reflects the covariance of y, (2) an estimate (Default is \eqn{\frac{1}{p}(XX^T)}), or (3) a list with components 's' and 'u', as returned by choose_k().}

\item{diag_K}{Logical: should K be a diagonal matrix? This would reflect observations that are unrelated, or that can be treated as unrelated. Defaults to FALSE.
Note: plmm() does not check to see if a matrix is diagonal. If you want to use a diagonal K matrix, you must set diag_K = TRUE.}

\item{eta_star}{Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1.}

\item{penalty}{The penalty to be applied to the model. Either "lasso" (the default), "SCAD", or "MCP".}

\item{type}{A character argument indicating what should be returned from predict.plmm(). If type == 'lp', predictions are
based on the linear predictor, X beta. If type == 'blup', predictions are based on the sum of the linear predictor
and the estimated random effect (BLUP). Defaults to 'blup', as this has been shown to be a superior prediction method
in many applications.}

\item{gamma}{The tuning parameter of the MCP/SCAD penalty. Default is 3 for MCP and 3.7 for SCAD.}

\item{alpha}{Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0.}

\item{lambda_min}{The smallest value for lambda, as a fraction of lambda.max. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.}

\item{nlambda}{Length of the sequence of lambda. Default is 100.}

\item{lambda}{A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale.}

\item{eps}{Convergence threshold. The algorithm iterates until the RMSD for the change in linear predictors for each coefficient is less than eps. Default is \code{1e-4}.}

\item{max_iter}{Maximum number of iterations (total across entire path). Default is 10000.}

\item{warn}{Return warning messages for failures to converge and model saturation? Default is TRUE.}

\item{init}{Initial values for coefficients. Default is 0 for all columns of X.}

\item{cluster}{Option for \strong{in-memory data only}: cv_plmm() can be run in parallel across a cluster using the parallel package.
The cluster must be set up in advance using parallel::makeCluster(). The cluster must then be passed to cv_plmm().
\strong{Note}: this option is not yet implemented for filebacked data.}

\item{nfolds}{The number of cross-validation folds. Default is 5.}

\item{seed}{You may set the seed of the random number generator in order to obtain reproducible results.}

\item{fold}{Which fold each observation belongs to. By default, the observations are randomly assigned.}

\item{trace}{If set to TRUE, inform the user of progress by announcing the beginning of each CV fold. Default is FALSE.}

\item{save_rds}{Optional: if a filepath and name \emph{without} the '.rds' suffix is specified (e.g., \code{save_rds = "~/dir/my_results"}), then the model results are saved to the provided location (e.g., "~/dir/my_results.rds").
Defaults to NULL, which does not save the result.
\strong{Note}: Along with the model results, two '.rds' files ('loss' and 'yhat') will be created in the same directory as 'save_rds'.
These files contain the loss and predicted outcome values in each fold; both files will be updated after prediction within each fold.}

\item{return_fit}{Optional: a logical value indicating whether the fitted model should be returned as a \code{plmm} object in the current (assumed interactive) session. Defaults to TRUE.}

\item{...}{Additional arguments to \code{plmm_fit}}
}
\value{
A list whose items include:
\itemize{
\item type: The type of prediction used ('lp' or 'blup')
\item cve: A numeric vector with the cross validation error (CVE) at each value of \code{lambda}
\item cvse: A numeric vector with the estimated standard error associated with each value of \code{cve}
\item fold: An integer vector of length \code{n} indicating the fold to which each observation was assigned
\item lambda: A numeric vector of \code{lambda} values
\item fit: The overall fit of the object, including all predictors; this is a list as returned by \code{plmm()}
\item min: The index corresponding to the value of \code{lambda} that minimizes \code{cve}
\item lambda_min: The \code{lambda} value at which \code{cve} is minimized
\item min1se: The index corresponding to the largest value of \code{lambda} at which \code{cve} is
within 1 standard error of its minimum
\item lambda1se: The largest value of lambda such that \code{cve} is within 1 standard error of the minimum
\item null.dev: A numeric value representing the deviance for the intercept-only model. If you have supplied
your own \code{lambda} sequence, this quantity may not be meaningful.
\item Y: A matrix with the predicted outcome (\eqn{\hat{y}}) values at each value of \code{lambda}.
Rows are observations, columns are values of \code{lambda}.
\item loss: A matrix with the loss values at each value of lambda. Rows are observations,
columns are values of \code{lambda}.
\item estimated_Sigma: An n x n matrix representing the estimated covariance matrix.
}
}
\description{
Performs k-fold cross validation for lasso-, MCP-, or SCAD-penalized
linear mixed models over a grid of values for the regularization parameter \code{lambda}.
}
\examples{
admix_design <- create_design(X = admix$X, y = admix$y)
cv_fit <- cv_plmm(design = admix_design)
print(summary(cv_fit))
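
# A minimal sketch of the one-standard-error rule as described in the Value
# section: take the largest lambda whose CVE is within 1 SE of the minimum CVE.
# 'one_se_lambda' is a hypothetical helper written for illustration only; it is
# not part of the package and may not match the internal computation exactly.
one_se_lambda <- function(cve, cvse, lambda) {
  best <- which.min(cve)                 # index of the minimum CVE
  within <- cve <= cve[best] + cvse[best] # lambdas within 1 SE of the minimum
  max(lambda[within])                    # largest such lambda
}
# Apply the sketch to the cross-validation results computed above
one_se_lambda(cv_fit$cve, cv_fit$cvse, cv_fit$lambda)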
plot(cv_fit)
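
# A sketch of running the folds in parallel (in-memory data only), following
# the description of the 'cluster' argument. The choice of 2 workers is an
# arbitrary assumption for illustration.
\dontrun{
cl <- parallel::makeCluster(2)   # set up the cluster in advance
cv_fit_par <- cv_plmm(design = admix_design, cluster = cl)
parallel::stopCluster(cl)        # release the workers when done
}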
}
