% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DoU-classify-units.R
\name{DoU_classify_units}
\alias{DoU_classify_units}
\title{Create the DEGURBA spatial units classification}
\usage{
DoU_classify_units(
  data,
  id = "UID",
  level1 = TRUE,
  values = NULL,
  official_workflow = TRUE,
  rules_from_2021 = FALSE,
  filename = NULL
)
}
\arguments{
\item{data}{named list with the required data, as returned by the function \code{\link[=DoU_preprocess_units]{DoU_preprocess_units()}}}

\item{id}{character. Name of the column in the \code{units} data that uniquely identifies the spatial units}

\item{level1}{logical. Whether to classify the spatial units according to the first hierarchical level (\code{TRUE}) or the second hierarchical level (\code{FALSE}). For more details, see section "Classification rules" below.}

\item{values}{vector with the values assigned to the different classes in the resulting units classification:
\itemize{
\item If \code{level1=TRUE}: the vector should contain the values for (1) cities, (2) towns and semi-dense areas and (3) rural areas.
\item If \code{level1=FALSE}: the vector should contain the values for (1) cities, (2) dense towns, (3) semi-dense towns, (4) suburbs or peri-urban areas, (5) villages, (6) dispersed rural areas and (7) mostly uninhabited areas.
}}

\item{official_workflow}{logical. Whether to employ the official workflow of the GHSL (\code{TRUE}) or the alternative workflow (\code{FALSE}). For more details, see section "Workflow" below.}

\item{rules_from_2021}{logical. Whether to employ the original classification rules as described in the 2021 version of the DEGURBA manual. The DEGURBA Level 2 unit classification rules were modified in July 2024. By default, the function uses the most recent rules as described in the \href{https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Applying_the_degree_of_urbanisation_manual}{online version} of the methodological manual. For more details, see section "Modification of the unit classification rules" below.}

\item{filename}{character. Output filename (csv). The resulting classification together with a metadata file (in JSON format) will be saved if \code{filename} is not \code{NULL}.}
}
\value{
dataframe with, for each spatial unit, the classification and the share of population per grid class
}
\description{
The function reconstructs the spatial units classification of the Degree of Urbanisation based on the grid cell classification.
}
\section{Classification rules}{


The Degree of Urbanisation consists of two hierarchical levels. In level 1, the spatial units are classified into cities, towns and semi-dense areas, and rural areas. In level 2, towns and semi-dense areas are further divided into dense towns, semi-dense towns, and suburban or peri-urban areas. Rural areas are further divided into villages, dispersed rural areas and mostly uninhabited areas.

The detailed classification rules are as follows:

\strong{LEVEL 1:}
\itemize{
\item \strong{Cities:} units that have at least 50\% of their population in urban centres
\item \strong{Towns and semi-dense areas:} units that have less than 50\% of their population in urban centres and no more than 50\% of their population in rural grid cells
\item \strong{Rural areas:} units that have more than 50\% of their population in rural grid cells
}
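
The Level 1 rules above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the implementation used by the package: \code{classify_level1()} and its arguments are hypothetical names, and the population shares are assumed to be proportions between 0 and 1.

\preformatted{# Hypothetical helper for illustration only (not part of flexurba).
# share_urban_centre: share of the unit's population in urban centres
# share_rural:        share of the unit's population in rural grid cells
classify_level1 <- function(share_urban_centre, share_rural) {
  if (share_urban_centre >= 0.5) {
    "cities"
  } else if (share_rural > 0.5) {
    "rural areas"
  } else {
    # less than 50 percent in urban centres and no more than 50 percent rural
    "towns and semi-dense areas"
  }
}

classify_level1(share_urban_centre = 0.2, share_rural = 0.5)
}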

\strong{LEVEL 2:}
\itemize{
\item \strong{Cities:} units that have at least 50\% of their population in urban centres
\item \strong{Dense towns:} units that have at least 50\% of their population in the combination of urban centres and dense urban clusters
\item \strong{Semi-dense towns:} units that have less than 50\% of their population in the combination of urban centres and dense urban clusters, or have less than 50\% of their population in the combination of suburban and peri-urban cells and rural grid cells
\item \strong{Suburbs or peri-urban areas:} units that have at least 50\% of their population in the combination of suburban and peri-urban cells and rural grid cells
\item \strong{Villages}: units that have at least 50\% of their population in the combination of urban centres, urban clusters and rural clusters
\item \strong{Dispersed rural areas}: units that have less than 50\% of their population in the combination of urban centres, urban clusters and rural clusters, or have less than 50\% of their population in very low-density rural grid cells
\item \strong{Mostly uninhabited areas}: units that have at least 50\% of their population in very low-density rural grid cells
}
}

\section{Workflow}{


The classification of small spatial units requires a vector layer with the small spatial units, a raster layer with the grid cell classification, and a raster layer with the population grid. By default, the Degree of Urbanisation uses a population grid with a resolution of 100 m.

The function includes two different workflows to establish the spatial units classification based on these three data sources.

\strong{Official workflow according to the GHSL:}

For the official workflow, the three layers should be pre-processed by \code{\link[=DoU_preprocess_units]{DoU_preprocess_units()}}. In this function, the classification grid and population grid are resampled to a user-defined \code{resample_resolution} with the nearest neighbour algorithm (by default, the Degree of Urbanisation uses a resample resolution of 50 m). During resampling, the values of the population grid are divided by the oversampling ratio (for example, when going from a resolution of 100 m to a resolution of 50 m, the values of the grid are divided by 4).
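
The arithmetic behind the oversampling ratio can be checked with a small base R sketch. Plain matrices stand in for the raster layers here, and all object names are hypothetical; the sketch only mimics nearest neighbour resampling for the case where the finer grid aligns exactly with the coarser one, and it is not the code used by the package.

\preformatted{# Illustration only: matrices stand in for raster layers.
source_resolution   <- 100  # metres
resample_resolution <- 50   # metres

# Each source cell is split into cells_per_axis^2 finer cells
cells_per_axis     <- source_resolution / resample_resolution
oversampling_ratio <- cells_per_axis^2

# Toy population grid at 100 m (2 x 2 cells)
pop_100m <- matrix(c(8, 0, 12, 4), nrow = 2)

# Repeat every cell in a 2 x 2 block (nearest neighbour on an aligned grid),
# then divide by the oversampling ratio so the total population is preserved
pop_50m <- kronecker(pop_100m, matrix(1, cells_per_axis, cells_per_axis)) /
  oversampling_ratio

oversampling_ratio
all.equal(sum(pop_50m), sum(pop_100m))
}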

Afterwards, the spatial units classification is constructed with \code{\link[=DoU_classify_units]{DoU_classify_units()}} as follows. The vector layer with small spatial units is rasterised to match the population and classification grid. Based on the overlap of the three grids, the share of population per flexurba grid class is computed per spatial unit with a zonal statistics procedure. The units are subsequently classified according to the classification rules (see above).
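
The zonal statistics step can be pictured with a toy table in which every row is one grid cell. The sketch below is an assumption-laden illustration in base R, not the package's implementation: the data frame, its column names and the class codes (3, 2, 1) are hypothetical.

\preformatted{# Illustration only: one row per grid cell after overlaying the three grids.
cells <- data.frame(
  unit       = c("A", "A", "A", "B", "B"),  # rasterised spatial unit id
  grid_class = c(3, 2, 1, 1, 1),            # hypothetical grid class codes
  pop        = c(60, 30, 10, 5, 15)         # population in the cell
)

# Sum the population per unit and grid class
pop_per_class <- xtabs(pop ~ unit + grid_class, data = cells)

# Convert the sums to shares per unit (each row sums to 1)
shares <- prop.table(pop_per_class, margin = 1)
shares
}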

Apart from this, there are two special cases. First, if a unit has no population, it is classified according to the share of \emph{land area} in each of the flexurba grid classes (computed with a zonal statistics procedure). Second, if a unit initially could not be rasterised (this can occur if the unit is smaller than the \code{resample_resolution}), it is processed separately as follows. The unit is rasterised individually using all cells that it touches. The unit is then classified according to the share of population in the flexurba grid classes in these touching cells. However, to avoid double counting of population, no population is assigned to the unit in the result.

For more information about the official workflow to construct the units classification, see \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2023.pdf}{GHSL Data Package 2023 (Section 2.7.2.3)}.

\strong{Alternative workflow:}

Besides the official workflow of the GHSL, the function also includes an alternative workflow to construct the spatial units classification. The alternative workflow does not require rasterising the spatial units layer, but relies on the overlap between the spatial units layer and the grid layers.

The three layers should again be pre-processed by the function \code{\link[=DoU_preprocess_units]{DoU_preprocess_units()}}, but this time without specifying \code{resample_resolution}. For the classification in \code{\link[=DoU_classify_units]{DoU_classify_units()}}, the function \code{\link[exactextractr:exact_extract]{exactextractr::exact_extract()}} is used to (1) overlay the grids with the spatial units layer, and (2) summarise the values of the population grid and classification grid per unit. The units are subsequently classified according to the classification rules (see above). As an exception, if a unit has no population, it is classified according to the share of \emph{land area} in each of the flexurba grid classes. The alternative workflow is slightly more efficient as it does not require resampling the population and classification grids and rasterising the spatial units layer.
}

\section{Modification of the unit classification rules}{


The unit classification rules of Level 2 of DEGURBA were updated in July 2024. By default, the function \code{\link[=DoU_classify_units]{DoU_classify_units()}} applies the latest classification rules, as described in the \href{https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Applying_the_degree_of_urbanisation_manual}{online version} of the methodological manual. However, you can also use the original 2021 classification rules if desired, by setting the argument \code{rules_from_2021} to \code{TRUE}. In that case, the rules to classify units are as follows:
\itemize{
\item \strong{Cities:} units that have at least 50\% of their population in urban centres
\item \strong{Dense towns:} units that have a larger share of the population in dense urban clusters than in semi-dense urban clusters, and that have a larger share of the population in dense + semi-dense urban clusters than in suburban or peri-urban cells
\item \strong{Semi-dense towns:} units that have a larger share of the population in semi-dense urban clusters than in dense urban clusters, and that have a larger share of the population in dense + semi-dense urban clusters than in suburban or peri-urban cells
\item \strong{Suburbs or peri-urban areas:} units that have a larger share of the population in suburban or peri-urban cells than in dense + semi-dense urban clusters
\item \strong{Villages}: units that have the largest share of their rural grid cell population living in rural clusters
\item \strong{Dispersed rural areas}: units that have the largest share of their rural grid cell population living in low-density rural grid cells
\item \strong{Mostly uninhabited areas}: units that have the largest share of their rural grid cell population living in very low-density rural grid cells
}
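
As a rough illustration, the three 2021 rules for dense towns, semi-dense towns and suburbs or peri-urban areas can be written as a short base R function. This is a sketch under assumptions, not the package's implementation: \code{classify_town_2021()} and its arguments are hypothetical names, the unit is assumed to be one to which these three rules apply, and ties (which the rules as listed do not cover) are returned as \code{NA}.

\preformatted{# Hypothetical helper for illustration only (not part of flexurba).
# dense, semi_dense, suburban: population shares of the unit in dense urban
# clusters, semi-dense urban clusters and suburban or peri-urban cells
classify_town_2021 <- function(dense, semi_dense, suburban) {
  urban_cluster <- dense + semi_dense
  if (suburban > urban_cluster) {
    "suburbs or peri-urban areas"
  } else if (urban_cluster > suburban && dense > semi_dense) {
    "dense towns"
  } else if (urban_cluster > suburban && semi_dense > dense) {
    "semi-dense towns"
  } else {
    NA_character_  # ties are not covered by the rules as listed
  }
}

classify_town_2021(dense = 0.4, semi_dense = 0.2, suburban = 0.3)
}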
}

\examples{
# load the grid data
data_belgium <- flexurba::DoU_load_grid_data_belgium()
# load the units and filter for West-Flanders
units_data <- dplyr::filter(flexurba::units_belgium, GID_2 == "30000")
# classify the grid
classification <- DoU_classify_grid(data = data_belgium)

\donttest{
# official workflow
data1 <- DoU_preprocess_units(
  units = units_data,
  classification = classification,
  pop = data_belgium$pop,
  resample_resolution = 50
)
units_classification1 <- DoU_classify_units(data1)
}

# alternative workflow
data2 <- DoU_preprocess_units(
  units = units_data,
  classification = classification,
  pop = data_belgium$pop
)
units_classification2 <- DoU_classify_units(data2, official_workflow = FALSE)

# spatial units classification, dissolved at level 3 (Belgian districts)
data3 <- DoU_preprocess_units(
  units = units_data,
  classification = classification,
  pop = data_belgium$pop,
  dissolve_units_by = "GID_3"
)
units_classification3 <- DoU_classify_units(data3, id = "GID_3")
}
