% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fit.R
\name{gamlasso}
\alias{gamlasso}
\alias{gamlasso.formula}
\alias{gamlasso.default}
\title{Fitting a gamlasso model}
\usage{
\method{gamlasso}{formula}(
  formula,
  data,
  family = "gaussian",
  linear.penalty = "l1",
  smooth.penalty = "l2",
  num.knots = 5,
  offset = NULL,
  weights = NULL,
  interactions = F,
  seed = .Random.seed[1],
  num.iter = 100,
  tolerance = 1e-04,
  ...
)

\method{gamlasso}{default}(
  response,
  linear.terms,
  smooth.terms,
  data,
  family = "gaussian",
  linear.penalty = "l1",
  smooth.penalty = "l2",
  num.knots = 5,
  offset = NULL,
  weights = NULL,
  interactions = F,
  seed = .Random.seed[1],
  num.iter = 100,
  tolerance = 1e-04,
  prompts = F,
  verbose = T,
  ...
)
}
\arguments{
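\item{formula}{A formula describing the model to be fitted. Linear terms subject
to an L1 penalty are supplied as a model matrix named \code{X} in \code{data}
(see details below).}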

\item{response}{The name of the response variable. For a binomial fit with
success/failure counts this is a vector of two variable names (see details below).}

\item{linear.terms}{The names of the variables to be used as linear predictors}

\item{smooth.terms}{The names of the variables to be used as smoothers}

\item{data}{The data with which to fit the model}

\item{family}{The family describing the error distribution and link function
to be used in the model. A character string, one of
\code{"gaussian"} (default), \code{"binomial"}, \code{"poisson"} or
\code{"cox"}. For \code{family = "binomial"}, \code{response} can be
a vector of two variable names, and for \code{family = "cox"}, \code{weights}
must be provided (see details below).}

\item{linear.penalty}{The penalty used on the linear predictors. A character
string, one of \code{"l1"} (default), \code{"l2"} or \code{"none"}. If
\code{"l1"} is used, the gamlasso loop alternating between a gam fit and a
lasso fit is run. Otherwise only a gam model is fitted (with penalties on the
parametric terms if \code{linear.penalty = "l2"}).}

\item{smooth.penalty}{The penalty used on the smoothers. A character
string, either \code{"l1"} or \code{"l2"} (default). \code{"l2"} refers
to the inherent second-order penalty that smoothers have for controlling their
shape, so \code{"none"} is not an option. For \code{"l1"} the smoothers use the
basis \code{bs = "ts"}, otherwise \code{bs = "tp"} is used (see
\code{\link[mgcv]{gam}} for details on basis types).}

\item{num.knots}{Number of knots for each smoother. Either a single integer
(recycled for every smoother variable) or an integer vector with one entry
per smoother.}

\item{offset}{The name of the offset variable. \code{NULL} (default) if not provided}

\item{weights}{The name of the weights variable. \code{NULL} (default) if not
provided. See details below.}

\item{interactions}{logical. Should pairwise interactions be included as
covariates? If \code{TRUE} then the smoothers are fitted with \code{\link[mgcv]{ti}}
instead of \code{\link[mgcv]{s}} so that the additional effects of the
interactions can be quantified separately from the main effects.}

\item{seed}{The random seed, which can be specified for reproducibility. It is
used when fitting the gam and lasso models and is reset before each iteration
of the gamlasso loop.}

\item{num.iter}{Maximum number of iterations of the gamlasso loop.}

\item{tolerance}{Tolerance for convergence of the gamlasso loop (see the note below).}

\item{prompts}{logical. Should \code{gamlassoChecks} provide interactive
user prompts for corrective action when needed?}

\item{verbose}{logical. Should progress reports be printed to the
console while fitting the model?}

\item{...}{Additional arguments}
}
\value{
If the arguments fail the basic checks performed by \code{gamlassoChecks},
  \code{NULL} is returned. Otherwise the function calls \code{gamlassoFit}, which
  returns a list containing two models, \code{gam} and \code{cv.glmnet}.
  Either of these can be \code{NULL}, but if both are non-null then
  \code{convergence}, a matrix of values tracking the convergence
  of the gamlasso loop, is also returned.
  \code{gamlassoFit} also returns \code{inherit}, a list of selected
  arguments used to fit the \code{gamlasso} model together with further values
  needed for prediction.
}
\description{
This function fits a gamlasso model with the given penalties. In some
special cases, calling \code{\link[mgcv]{gam}} or \code{\link[glmnet]{glmnet}}
directly might be more efficient and/or flexible.
}
\details{
\code{gamlasso} allows models to be specified in two ways:
  1) with the formula approach, and 2) with the term specification approach.

  The formula approach is appropriate when the user wants an L1 penalty on the
  linear terms of the model, in which case the linear terms must be supplied
  as a model matrix named "X" appended to the input data frame. A typical formula
  would be "\code{y ~ X + s(z) + ...}", where "\code{X}" is the model matrix of
  linear terms subject to an L1 penalty, and everything to the right of "\code{X}" is
  treated as part of the gam formula (i.e. all smooth terms). For this formula,
  gamlasso iterates (until convergence) between the following two lines of pseudo code:

  \itemize{
    \item \code{model.cv.glmnet <- cv.glmnet(y=y, x=X, offset="model.gam fitted values")}
    \item \code{model.gam <- gam(y ~ s(z) + ..., offset="model.cv.glmnet fitted values")}
  }
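
  For a gaussian response the two steps above can be sketched directly with
  \code{mgcv} and \code{glmnet}. This is a minimal illustration under stated
  assumptions, not the internal \code{gamlassoFit} implementation: the simulated
  data, the single smooth covariate \code{z}, the helper \code{d} and the fixed
  cross-validation folds (standing in for the fixed seed) are all hypothetical
  choices made for this sketch.

  \preformatted{
## Hypothetical sketch of the gamlasso loop (gaussian response only)
library(mgcv)
library(glmnet)

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 10), n, 10)           # linear terms (model matrix)
z <- runif(n)                               # a single smooth covariate
y <- 2 * X[, 1] - X[, 2] + sin(2 * pi * z) + rnorm(n)

d <- function(x, y) mean((x - y)^2)         # distance used for convergence
foldid <- sample(rep(1:10, length.out = n)) # same CV folds in every iteration

lasso.fit <- rep(0, n)                      # current lasso contribution
gam.fit <- rep(0, n)                        # current gam contribution
for (iter in 1:100) {
  ## gam step: the lasso contribution enters as a fixed offset
  m.gam <- gam(y ~ s(z, k = 5), offset = lasso.fit)
  new.gam <- fitted(m.gam) - lasso.fit      # identity link: remove the offset
  ## lasso step: the gam contribution enters as a fixed offset
  m.lasso <- cv.glmnet(x = X, y = y, offset = new.gam, foldid = foldid)
  new.lasso <- as.numeric(predict(m.lasso, newx = X, s = "lambda.min",
                                  newoffset = rep(0, n)))
  ## stop once both contributions have stabilised
  converged <- d(new.gam, gam.fit) < 1e-4 && d(new.lasso, lasso.fit) < 1e-4
  gam.fit <- new.gam
  lasso.fit <- new.lasso
  if (converged) break
}
  }

  Fixing \code{foldid} keeps the cross-validated \code{lambda.min} from changing
  between iterations purely because of new random folds.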

  The term specification approach can fit the same type of models as the formula approach
  (i.e. models with an L1 penalty on the linear terms). However, it is more flexible in terms
  of penalty structure and can be useful for large data sets with many variables,
  where writing out the formula becomes cumbersome. In the term specification approach
  the user simply specifies the names of the data columns corresponding to the
  \code{response}, \code{linear.terms} and \code{smooth.terms} and then chooses
  \code{linear.penalty = "l1"}, \code{"l2"} or \code{"none"}
  (on \code{linear.terms}) and \code{smooth.penalty = "l1"} or
  \code{"l2"} (on \code{smooth.terms}).
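
  As an illustration of this flexibility, the call below (a sketch reusing the
  \code{simData} columns from the examples; the object name \code{gfit.l2} is
  arbitrary) places an L2 penalty on the linear terms, so that, as described under
  \code{linear.penalty}, only a gam model is fitted and no gamlasso loop is run.

  \preformatted{
library(plsmselect)
data(simData)

## L2 penalty on the linear terms and on the smoothers (bs = "tp")
gfit.l2 <- gamlasso(response = "Yg",
                    linear.terms = paste0("x", 1:10),
                    smooth.terms = paste0("z", 1:4),
                    data = simData,
                    linear.penalty = "l2",
                    smooth.penalty = "l2",
                    num.knots = 5,
                    seed = 1)
  }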

  When fitting a binomial model for binary (0/1) responses, put the response
  variable before "~" in the formula approach, or give a single variable name
  as the \code{response} argument in the term specification approach.
  If instead the responses are success/failure counts, the formula should
  start with something like \code{cbind(success, failure) ~ ...}, and in the
  term specification approach the \code{response} argument should be a
  vector of length two giving the names of the success and failure variables.
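
  A minimal sketch of the term specification approach for success/failure counts
  follows. The simulated data frame \code{dat}, its column names and the 20 trials
  per observation are hypothetical and chosen only for illustration.

  \preformatted{
library(plsmselect)

## Hypothetical binomial data: 20 trials per observation
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), z1 = runif(n))
p <- plogis(dat$x1 - dat$x2 + sin(2 * pi * dat$z1))
dat$success <- rbinom(n, size = 20, prob = p)
dat$failure <- 20 - dat$success

## response is a vector of two names: successes first, then failures
bfit <- gamlasso(response = c("success", "failure"),
                 linear.terms = c("x1", "x2", "x3"),
                 smooth.terms = "z1",
                 data = dat,
                 family = "binomial",
                 seed = 1)
  }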

  If \code{family = "cox"} then the \code{weights} argument must be provided
  and should name a status variable (1 minus the censoring indicator, i.e.
  1 = event observed, 0 = censored). For other families it would name a custom
  weights variable for the weighted log-likelihood, for example the total
  counts when fitting a binomial model. (Weights for families other than
  \code{"cox"} are currently not implemented.)
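
  A minimal sketch for a survival model, assuming (as in \code{\link[mgcv]{cox.ph}})
  that \code{response} names the survival time while \code{weights} names the
  status variable. The simulated data frame and its column names are hypothetical.

  \preformatted{
library(plsmselect)

## Hypothetical survival data
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), z1 = runif(n))
dat$time <- rexp(n, rate = exp(0.5 * dat$x1 + sin(2 * pi * dat$z1)))
dat$status <- rbinom(n, size = 1, prob = 0.8)  # 1 = event, 0 = censored

cfit <- gamlasso(response = "time",
                 linear.terms = c("x1", "x2", "x3"),
                 smooth.terms = "z1",
                 data = dat,
                 family = "cox",
                 weights = "status",
                 seed = 1)
  }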

  Both the formula and term specification approaches can also fit interaction models.
  There are three kinds of interactions: between two linear predictors,
  between two smooth predictors, and between a linear and a smooth predictor. In the
  formula approach the first type of interaction must be included as additional
  columns in the "\code{X}" matrix, while the other two types must be given in the
  smooth-terms part of the formula. In the term specification approach the argument
  \code{interactions} must be \code{TRUE}, in which case all pairwise
  interactions are used as predictors and variable selection is done on all of them.
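
  For example, the following call requests all pairwise interactions in the term
  specification approach. It is a sketch reusing the \code{simData} columns from
  the examples; the choice of three linear and two smooth terms is arbitrary and
  keeps the number of pairwise interactions small.

  \preformatted{
library(plsmselect)
data(simData)

ifit <- gamlasso(response = "Yg",
                 linear.terms = c("x1", "x2", "x3"),
                 smooth.terms = c("z1", "z2"),
                 data = simData,
                 linear.penalty = "l1",
                 smooth.penalty = "l1",
                 interactions = TRUE,
                 seed = 1)
  }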
}
\note{
The default values of \code{num.iter} and \code{tolerance} are
  essentially arbitrary. At each iteration, convergence is checked by comparing
  the new and old predictions of the gam and lasso components with the
  following distance metric, where \eqn{n} is the length of \eqn{x}:
  \deqn{d(x,y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2}
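
  In R this metric is simply the mean squared difference; the helper name
  \code{d} below is hypothetical.

  \preformatted{
## Mean squared difference between two prediction vectors of equal length
d <- function(x, y) mean((x - y)^2)

d(c(1, 2), c(1, 4))   # (0^2 + 2^2) / 2, returns 2
  }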
}
\examples{
library(plsmselect)

data(simData)

## Fit gaussian gamlasso model using the formula approach:
## (L1-penalty both on model matrix (X) and smooth terms (bs="ts"))
simData$X = model.matrix(~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10, data=simData)[,-1]

gfit = gamlasso(Yg ~ X +
                   s(z1, k=5, bs="ts") +
                   s(z2, k=5, bs="ts") +
                   s(z3, k=5, bs="ts") +
                   s(z4, k=5, bs="ts"),
                   data = simData,
                   seed=1)

\donttest{## Equivalently with term specification approach:
gfit = gamlasso(response="Yg",
                  linear.terms=paste0("x",1:10),
                  smooth.terms=paste0("z",1:4),
                  data=simData,
                  linear.penalty = "l1",
                  smooth.penalty = "l1",
                  num.knots = 5,
                  seed=1)
}
## The two main components of gfit are
## gfit$cv.glmnet (LASSO component) and gfit$gam (GAM component):

## Extract lasso estimates of linear terms:
coef(gfit$cv.glmnet, s="lambda.min")

## Plot the estimates of the smooth effects:
plot(gfit$gam, pages=1)

# See ?summary.gamlasso for an example fitting a binomial response model
# See ?predict.gamlasso for an example fitting a poisson response model
# See ?cumbasehaz for an example fitting a survival response model
}
\seealso{
\code{\link[mgcv]{gam}}, \code{\link[glmnet]{glmnet}}
}
