% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ecmave.R
\name{ecmave}
\alias{ecmave}
\title{Build an averaged error correction model}
\usage{
ecmave(
  y,
  xeq,
  xtr,
  includeIntercept = TRUE,
  k,
  method = "boot",
  seed = 5,
  weights = NULL,
  ...
)
}
\arguments{
\item{y}{The target variable}

\item{xeq}{The variables to be used in the equilibrium term of the error correction model}

\item{xtr}{The variables to be used in the transient term of the error correction model}

\item{includeIntercept}{Logical; whether the y-intercept should be included}

\item{k}{The number of models or data partitions desired}

\item{method}{Whether to split data by folds ("fold"), nested folds ("nestedfold"), or bootstrapping ("boot")}

\item{seed}{Seed for reproducibility (only needed if method is "boot")}

\item{weights}{Optional vector of weights to be passed to the fitting process}

\item{...}{Additional arguments to be passed to the \code{lm} function (use with care: these may need to be modified for ecm, or may not be appropriate at all)}
}
\value{
An \code{lm} object representing an error correction model
}
\description{
Builds multiple error correction models (ECMs) on subsets of the data and averages them. See the \code{lmave} function for more details
on the methodology and use cases for this approach.
}
\details{
In some cases, instead of building an ECM on the entire dataset, it may be preferable to build k ECMs on k subsets of the data, each subset containing (k-1)/k*nrow(data)
observations of the full dataset, and then average their coefficients. Reasons to do this include controlling for overfitting or extending the training sample. For example,
in many time series modeling exercises, the holdout test sample is the latest few months' or years' worth of data. It is desirable to include these data in training, since
they likely have more predictive power for the future than older observations. However, training on the entire dataset could result in overfitting, while holding out a
different time period as the test sample may give results that are even less representative of future performance. One potential solution is to build multiple ECMs that
together use the entire dataset, each with a different holdout test sample, and then average them to get a final ECM. This approach is somewhat similar to the idea of
random forest regression, in which multiple regression trees are built on subsets of the data and then averaged.
}
\examples{
##Not run

#Use ecm to predict Wilshire 5000 index based on corporate profits, 
#Federal Reserve funds rate, and unemployment rate
data(Wilshire)

#Use 2015-12-01 and earlier data to build models
trn <- Wilshire[Wilshire$date<='2015-12-01',]

#Build five ECM models and average them to get one model
xeq <- xtr <- trn[c('CorpProfits', 'FedFundsRate', 'UnempRate')]
model1 <- ecmave(trn$Wilshire5000, xeq, xtr, includeIntercept=TRUE, k=5)
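
#A minimal sketch of the fold-based averaging idea described in the details
#section. It uses plain lm() on the built-in 'longley' data, not ecmave's own
#code, so it only illustrates the general technique. The names 'folds', 'fits'
#and 'avgCoef' are hypothetical, and the contiguous folds are an assumption
#made here for illustration.
k <- 4
n <- nrow(longley)
folds <- cut(seq_len(n), breaks = k, labels = FALSE)
fits <- lapply(seq_len(k), function(i) {
  #Hold out fold i and fit on the remaining (k-1)/k of the rows
  lm(Employed ~ GNP + Unemployed, data = longley[folds != i, ])
})
#Average the k coefficient vectors to get one final set of coefficients
avgCoef <- rowMeans(sapply(fits, coef))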

}
\seealso{
\code{lm}
}
