% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filter-repeat-visits.r
\name{filter_repeat_visits}
\alias{filter_repeat_visits}
\title{Filter observations to repeat visits for hierarchical modeling}
\usage{
filter_repeat_visits(
  x,
  min_obs = 2L,
  max_obs = 10L,
  annual_closure = TRUE,
  n_days = NULL,
  date_var = "observation_date",
  site_vars = c("locality_id", "observer_id"),
  ll_digits = 6L
)
}
\arguments{
\item{x}{\code{data.frame}; observation data, e.g. data from the eBird Basic
Dataset (EBD) zero-filled with \code{\link[=auk_zerofill]{auk_zerofill()}}. This function will also
work with an \code{auk_zerofill} object, in which case it will be converted to
a data frame with \code{\link[=collapse_zerofill]{collapse_zerofill()}}.
\strong{Note that these data must be for a single species}.}

\item{min_obs}{integer; minimum number of observations required for each
site.}

\item{max_obs}{integer; maximum number of observations allowed for each site.}

\item{annual_closure}{logical; whether the entire year should be treated as
the period of closure (the default). This can be useful, for example, if
the data have been subset to a period of closure prior to calling
\code{\link[=filter_repeat_visits]{filter_repeat_visits()}}.}

\item{n_days}{integer; number of days defining the temporal length of
closure. Ignored if \code{annual_closure = TRUE}.}

\item{date_var}{character; column name of the variable in \code{x} containing the
date. This column should either be in \code{Date} format or convertible to
\code{Date} format with \code{\link[=as.Date]{as.Date()}}.}

\item{site_vars}{character; names of one or more columns in \code{x} that define a
site, typically the location (e.g. latitude/longitude) and observer ID.}

\item{ll_digits}{integer; the number of digits to round latitude and longitude
to. If latitude and/or longitude are used as \code{site_vars}, it's usually best
to round them prior to identifying sites, otherwise locations that are only
slightly offset (e.g. a few centimeters) will be treated as different. This
argument can also be used to group sites together that are close but not
identical. Note that 1 degree of latitude is approximately 111 km, so the
default value of 6 for \code{ll_digits} corresponds to about 11 cm.}
}
\value{
A \code{data.frame} filtered to only retain observations from sites with
the allowed number of observations within the period of closure. The
results will be sorted such that sites are together and in chronological
order. The following variables are added to the data frame:
\itemize{
\item \code{site}: a unique identifier for each "site" corresponding to all the
variables in \code{site_vars} and \code{closure_id} concatenated together with
underscore separators.
\item \code{closure_id}: a unique ID for each closure period. If
\code{annual_closure = TRUE}, this will be the year. Otherwise, it will be the
number of blocks of \code{n_days} days since the earliest observation. Note that
in this latter case, there may be gaps in the IDs.
\item \code{n_observations}: number of observations at each site after all
filtering.
}
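
As a rough illustration of the block-based \code{closure_id} described above,
the following base R sketch assigns hypothetical dates to 30-day blocks counted
from the earliest date. The dates and the zero-based numbering are assumptions
made for illustration; the package's internal calculation may differ.

\preformatted{
# hypothetical observation dates, not taken from any dataset
dates <- as.Date(c("2020-01-01", "2020-01-10", "2020-03-15"))
n_days <- 30
# whole number of n_days blocks elapsed since the earliest date
days_elapsed <- as.numeric(dates - min(dates))
closure_id <- floor(days_elapsed / n_days)
closure_id
# 0 0 2: no date falls in block 1, so the IDs have a gap
}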
}
\description{
Hierarchical modeling of abundance and occurrence requires repeat visits to
sites to estimate detectability. These visits should all be within a
period of closure, i.e. when the population can be assumed to be closed.
eBird data, and many other data sources, do not explicitly follow this
protocol; however, subsets of the data can be extracted to produce data
suitable for hierarchical modeling. This function extracts a subset of
observation data that have a desired number of repeat visits within a period
of closure.
}
\details{
In addition to specifying the minimum and maximum number of
observations per site, users must specify the variables in the dataset that
define a "site". This is typically a combination of IDs defining the
geographic site and the unique observer (repeat visits are meant to be
conducted by the same observer). Finally, the period of closure must be
defined. By default (\code{annual_closure = TRUE}), each calendar year is treated
as a single period of closure; otherwise, \code{n_days} sets the length of the
period in days. Users should choose a period suitable for their species,
within which the population can reasonably be assumed to be closed.
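
When coordinates are used in \code{site_vars}, rounding controls which locations
are merged into one site. A minimal base R sketch, using made-up latitudes
rather than real data, of how the number of digits changes the number of
distinct values:

\preformatted{
# hypothetical latitudes: the first two differ by a few centimeters,
# the third is roughly 20 m further north
lat <- c(42.4800001, 42.4800004, 42.4802)
# six digits merges only the first two values
length(unique(round(lat, 6)))
# three digits merges all three values
length(unique(round(lat, 3)))
}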
}
\examples{
# read and zero-fill the ebd data
f_ebd <- system.file("extdata/zerofill-ex_ebd.txt", package = "auk")
f_smpl <- system.file("extdata/zerofill-ex_sampling.txt", package = "auk")
# data must be for a single species
ebd_zf <- auk_zerofill(x = f_ebd, sampling_events = f_smpl,
                       species = "Collared Kingfisher",
                       collapse = TRUE)
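# use 30-day periods of closure; per the argument descriptions above,
# n_days is ignored unless annual_closure = FALSE
filter_repeat_visits(ebd_zf, annual_closure = FALSE, n_days = 30)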
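# a hedged sketch of other argument combinations, using only the arguments
# listed in the usage section above
# annual closure (the default), keeping sites with 2 to 10 visits
visits_annual <- filter_repeat_visits(ebd_zf, min_obs = 2L, max_obs = 10L,
                                      annual_closure = TRUE)
# inspect the variables added to the data frame
head(visits_annual[, c("site", "closure_id", "n_observations")])
# define sites by coordinates rounded to 3 digits plus observer; this
# assumes the data contain latitude and longitude columns
filter_repeat_visits(ebd_zf,
                     site_vars = c("latitude", "longitude", "observer_id"),
                     ll_digits = 3L)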
}
\seealso{
Other modeling: 
\code{\link{format_unmarked_occu}()}
}
\concept{modeling}
