% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/thin_geo.R
\name{thin_geo}
\alias{thin_geo}
\title{Flag records that are close to each other in geographic space}
\usage{
thin_geo(
  occ,
  species = "species",
  long = "decimalLongitude",
  lat = "decimalLatitude",
  d,
  prioritary_column = NULL,
  decreasing = TRUE,
  remove_invalid = TRUE,
  optimize_memory = FALSE,
  verbose = TRUE
)
}
\arguments{
\item{occ}{(data.frame or data.table) a data frame containing the occurrence
records to be flagged. Must contain columns for species, longitude, and
latitude.}

\item{species}{(character) the name of the column in \code{occ} that contains the
species scientific names. Default is \code{"species"}.}

\item{long}{(character) the name of the column in \code{occ} that contains the
longitude values. Default is \code{"decimalLongitude"}.}

\item{lat}{(character) the name of the column in \code{occ} that contains the
latitude values. Default is \code{"decimalLatitude"}.}

\item{d}{(numeric) thinning distance in \strong{kilometers} (e.g., 10 for
10 km).}

\item{prioritary_column}{(character) name of a numeric column in \code{occ} used
to define retention priority (e.g., quality score, year). Default is
\code{NULL}. See details.}

\item{decreasing}{(logical) whether to sort records in decreasing order using
the \code{prioritary_column} (e.g., from most recent to oldest when the variable
is \code{"year"}). Only applicable when \code{prioritary_column} is not \code{NULL}.
Default is \code{TRUE}.}

\item{remove_invalid}{(logical) whether to remove invalid coordinates.
Default is \code{TRUE}.}

\item{optimize_memory}{(logical) whether to compute the distance matrix
using a C++ implementation that reduces memory usage at the cost of
increased computation time. Recommended for large datasets (> 10,000 records).
Default is \code{FALSE}.}

\item{verbose}{(logical) whether to display messages during function
execution. Set to \code{TRUE} to display messages, or \code{FALSE} to run
silently. Default is \code{TRUE}.}
}
\value{
The original \code{occ} data frame augmented with a new logical column named
\code{thin_geo_flag}. Records that are retained after thinning receive
\code{TRUE}, while records identified as too close to a higher-priority
record receive \code{FALSE}.
}
\description{
Flags occurrence records for spatial thinning so that, for each species, only
one record is retained within a radius of \code{d} kilometers.
}
\details{
This function is similar to the \code{thin()} function from the \strong{spThin} package,
but with an important difference: it allows specifying a priority order for
retaining records.

When a thinning distance is provided (e.g., 10 km), the function identifies
clusters of records within this distance. Within each cluster, it keeps the
record with the highest priority according to the column defined in
\code{prioritary_column} (for example, keeping the most recent record if
\code{prioritary_column = "year"}), and flags the remaining nearby records for
removal.

If \code{prioritary_column} is \code{NULL}, the priority follows the original order of
rows in the input \code{occ} data.frame.
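
As a rough illustration of the idea only (a base R sketch under stated
assumptions, not the package's actual implementation), priority-based thinning
can be written as a greedy pass over records sorted by priority. The helper
names \code{haversine_km} and \code{greedy_thin} are hypothetical, the sketch
handles a single species, and treating a distance exactly equal to \code{d} as
"too close" is an assumption:

\preformatted{
# Great-circle distance in km between one point and a vector of points
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Greedy thinning: visit records from highest to lowest priority and keep a
# record only if no already-kept record lies within d km (assumed rule)
greedy_thin <- function(lon, lat, d, priority = NULL, decreasing = TRUE) {
  ord <- if (is.null(priority)) seq_along(lon) else
    order(priority, decreasing = decreasing)
  keep <- logical(length(lon))
  for (i in ord) {
    kept <- which(keep)
    too_close <- length(kept) > 0 &&
      any(haversine_km(lon[i], lat[i], lon[kept], lat[kept]) <= d)
    keep[i] <- !too_close
  }
  keep  # logical vector in the original row order
}
}

With \code{priority = NULL} the sketch visits rows in their original order,
which mirrors the fallback described above.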
}
\examples{
# Load example data
data("occurrences", package = "RuHere")
# Subset occurrences for Araucaria angustifolia
occ <- occurrences[occurrences$species == "Araucaria angustifolia", ]
# Thin records using a 10 km distance threshold
occ_thin <- thin_geo(occ = occ, d = 10)
sum(!occ_thin$thin_geo_flag)  # Number of records flagged for removal
# Prioritizing more recent records within each cluster
occ_thin_recent <- thin_geo(occ = occ, d = 10, prioritary_column = "year")
sum(!occ_thin_recent$thin_geo_flag)  # Number of records flagged for removal
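# Keep only the retained records (a minimal follow-up sketch; it assumes
# that thin_geo_flag contains no NA values)
occ_kept <- occ_thin_recent[occ_thin_recent$thin_geo_flag, ]
nrow(occ_kept)  # Number of records retained
# Prioritizing older records instead, by sorting "year" in increasing order
occ_thin_old <- thin_geo(occ = occ, d = 10, prioritary_column = "year",
                         decreasing = FALSE)
sum(!occ_thin_old$thin_geo_flag)  # Number of records flagged for removal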

}
