% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/survextrap.R
\name{mspline_spec}
\alias{mspline_spec}
\title{Make default M-spline knot specification given a survival dataset.}
\usage{
mspline_spec(
  formula,
  data,
  cure = FALSE,
  nonprop = NULL,
  backhaz = NULL,
  backhaz_strata = NULL,
  external = NULL,
  df = 10,
  add_knots = NULL,
  degree = 3,
  bsmooth = TRUE
)
}
\arguments{
\item{formula}{A survival formula in standard R formula syntax, with a call to \code{Surv()}
on the left-hand side.

Covariates included on the right-hand side of the formula will be
modelled with proportional hazards, or if \code{nonprop} is
\code{TRUE} then a non-proportional hazards model is used.

If \code{data} is omitted, so that the model is being fitted to
external aggregate data alone, without individual data, then the
formula should not include a \code{Surv()} call.  The left-hand
side of the formula will then be empty, and the right-hand side
specifies the covariates as usual.  For example, \code{formula =
~1} if there are no covariates.}

\item{data}{Data frame containing variables in \code{formula}.
Variables should be in a data frame, and not in the working
environment.

This may be omitted, in which case \code{external} must be
supplied.  This allows a model to be fitted to external aggregate
data alone, without any individual-level data.}

\item{cure}{If \code{TRUE}, a mixture cure model is used, where the
"uncured" survival is defined by the M-spline model, and the cure
probability is estimated.}

\item{nonprop}{Non-proportional hazards model specification.
This is achieved by modelling the spline basis coefficients in terms of the covariates.  See
the \href{https://chjackson.github.io/survextrap/articles/methods.html}{methods vignette} for more details.

If \code{TRUE}, then all covariates are modelled with
non-proportional hazards, using the same model formula as
\code{formula}.

If this is a formula, then this is assumed to define a model for
the dependence of the basis coefficients on the covariates.

If this is \code{NULL} or \code{FALSE} (the default) then any
covariates are modelled with proportional hazards.}

\item{backhaz}{Background hazard, that is, the hazard for causes of death
other than the cause of interest. This defines a
"relative survival" or "additive hazards" model.  The overall
hazard that describes the all-cause survival data (given in the
\code{data} and/or \code{external} argument) is then modelled as the sum of
a cause-specific hazard and a background hazard.
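
As a sketch, in notation introduced here only for illustration (these
symbols are not taken from the package), write \eqn{h(t)} for the
overall hazard at time \eqn{t}, \eqn{h_c(t)} for the cause-specific
hazard and \eqn{h_b(t)} for the background hazard.  The additive
hazards assumption is then
\deqn{h(t) = h_c(t) + h_b(t)}{h(t) = h_c(t) + h_b(t)}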

The background hazard is assumed to be known, and the
cause-specific hazard is modelled with the flexible parametric
model.

The background hazard can be supplied in two forms.  The meaning of predictions
from the model depends on which of these is used.

(a) A data frame with columns \code{"hazard"} and \code{"time"},
specifying the background hazard at all times as a
piecewise-constant (step) function.  Each row gives the background
hazard between the specified time and the next time.  The first
element of \code{"time"} should be 0, and the final row specifies
the hazard at all times greater than the last element of
\code{"time"}.  Predictions from the model fitted by \code{survextrap}
will then include this background hazard, because it is known at
all times.

(b) The (quoted) name of a variable in the data giving the
background hazard.  For censored cases, the exact value does not
matter.  The predictions from \code{survextrap} will then describe the
excess hazard or survival on top of this background.  The overall
hazard cannot be predicted in general, because the background
hazard is only specified over a limited range of time.
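
For illustration, a data frame in form (a) might look like the
following sketch.  The hazard values are made up, and the object name
\code{bh} is hypothetical.

\preformatted{
# Hypothetical piecewise-constant background hazard (illustrative values only)
bh <- data.frame(
  time   = c(0, 5, 10),          # start of each interval; the first must be 0
  hazard = c(0.01, 0.02, 0.05)   # hazard from this time until the next time
)
# The last row (0.05) applies at all times greater than 10
}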

If there is external data, and \code{backhaz} is supplied in form (b),
then the user should also supply the background survival at the
start and stop points in columns of the external data named
\code{"backsurv_start"} and \code{"backsurv_stop"}.  That is, the probability
of survival up to each of these times for someone alive at time 0.
This should describe the
same reference population as \code{backhaz}, though the package does not
check for consistency between these.

If there are stratifying variables specified in
\code{backhaz_strata}, then \code{backhaz} should contain one row
giving the background hazard value for each combination of time
period and values of the stratifying variables.

If \code{backhaz} is \code{NULL} (the default) then no background hazard
component is included in the model.}

\item{backhaz_strata}{A character vector of names of variables that
appear in \code{backhaz} that indicate strata, e.g.
\code{backhaz_strata = c("agegroup","sex")}.  This allows
different background hazard values to be used for different
subgroups.  These variables must also
appear in the datasets being modelled, that is, in \code{data},
\code{external} or both.  Each row of those datasets should then
have a corresponding row in \code{backhaz} which has the same
values of the stratifying variables.

This is \code{NULL} by default, indicating no stratification of the
background hazard.

If stratification is done, then \code{backhaz} must be supplied in
form (a), as a data frame rather than a variable in the data.}

\item{external}{External data as a data frame of aggregate survival counts with columns named:

\code{start}: Start time

\code{stop}: Follow-up time

\code{n}: Number of people alive at \code{start}

\code{r}: Number of those people who are still alive at \code{stop}
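
For illustration, an external dataset with these four columns might be
constructed as in the following sketch.  The counts are made up, and
the object name \code{ext} is hypothetical.

\preformatted{
# Hypothetical aggregate survival counts (illustrative values only)
ext <- data.frame(
  start = c(5, 10),     # start of each follow-up interval
  stop  = c(10, 15),    # end of each follow-up interval
  n     = c(100, 80),   # number alive at start
  r     = c(80, 50)     # of those n, number still alive at stop
)
}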

If there are covariates in \code{formula}, then the values they
take in the external data must be supplied as additional columns in
\code{external}.  Therefore if there are external data, the
covariates in \code{formula} and \code{data} should not be named
\code{start}, \code{stop}, \code{n} or \code{r}.}

\item{df}{Desired number of basis terms, or "degrees of freedom"
in the spline.  If \code{knots} is not supplied, the number of
knots is then chosen to satisfy this.}

\item{add_knots}{Additional knots, other than those determined from the quantiles of the individual data.
Typically used to add a maximum knot at the time that we want to extrapolate to.}

\item{degree}{Spline polynomial degree.  Can only be changed from
the default of 3 if \code{bsmooth} is \code{FALSE}.}

\item{bsmooth}{If \code{TRUE} then the spline function is constrained to
also have zero first and second derivatives at the boundary.}
}
\value{
A list with components:

\code{knots}: Knot locations.  The number of
knots will be equal to \code{df} + \code{degree} + 2.

\code{degree}: Spline polynomial degree (i.e. 3).

\code{nvars}: Number of basis variables (an alias for \code{df}).
}
\description{
Choose default M-spline knot locations given a dataset and desired
number of spline parameters.  Assumes a cubic spline, and knots
based on quantiles of event times observed in the individual data.
}
\details{
If there are also external data, then the knots are based on quantiles
of a vector defined by concatenating the event times in the
individual data with the unique start and stop times in the
external data.

This is designed to have the same arguments as
\code{\link{survextrap}}.  It is intended for use when we want to
fit a set of \code{\link{survextrap}} models with the same spline
specification.

See also \code{\link{mspline_list_init}} and \code{\link{mspline_init}},
which have lower-level interfaces, and are designed for use without
data, e.g. when illustrating a theoretical M-spline model.
}
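\examples{
# A hedged usage sketch, not taken from the package sources.  It assumes
# that the 'colons' dataset shipped with survextrap has columns 'years',
# 'status' and 'rx', and that survextrap() accepts the returned list
# through its 'mspline' argument.  The object names 'spec', 'mod1' and
# 'mod2' are hypothetical.
\dontrun{
library(survextrap)
library(survival)  # for Surv()

# Build one spline specification from the individual data
spec <- mspline_spec(Surv(years, status) ~ 1, data = colons, df = 5)
spec$knots

# Reuse the same specification across a set of models
mod1 <- survextrap(Surv(years, status) ~ 1, data = colons, mspline = spec)
mod2 <- survextrap(Surv(years, status) ~ rx, data = colons, mspline = spec)
}
}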
