% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dispersion_min_max_functions.R
\name{find_max_disp_tdm}
\alias{find_max_disp_tdm}
\title{Find the maximally dispersed distribution of each item in a term-document matrix}
\usage{
find_max_disp_tdm(
  tdm,
  row_partsize = "first",
  freq_adjust_method = "even"
)
}
\arguments{
\item{tdm}{A term-document matrix, where rows represent items and columns represent corpus parts; must also contain a row giving the size of the corpus parts (first or last row in the term-document matrix)}

\item{row_partsize}{Character string indicating which row in the term-document matrix contains the size of the corpus parts. Possible values are \code{"first"} (default) and \code{"last"}}

\item{freq_adjust_method}{Character string indicating which method to use for devising dispersion extremes. See details below. Possible values are \code{"even"} (default) and \code{"pervasive"}}
}
\value{
A matrix of integers with one row per item and one column per corpus part
}
\description{
This function takes as input a term-document matrix and returns, for each item (i.e. row), the (hypothetical) distribution of subfrequencies that represents the highest possible level of dispersion for the item across the corpus parts. This distribution is required for the min-max transformation proposed by Gries (2022: 184-191; 2024: 196-208) to obtain frequency-adjusted dispersion scores.
}
\details{
For each item in the term-document matrix, this function creates a hypothetical distribution of the item's total number of occurrences (i.e. the sum of its subfrequencies) across corpus parts. The argument \code{freq_adjust_method} determines which distributional feature defines the highest possible level of dispersion: pervasiveness (\code{"pervasive"}) or evenness (\code{"even"}). With \code{"pervasive"}, the occurrences are spread as broadly across corpus parts as possible; with \code{"even"}, they are allocated to corpus parts in proportion to their size. The choice between these methods is particularly relevant if corpus parts differ considerably in size. For details and explanations, see \code{vignette("frequency-adjustment")}. Since the dispersion of items that occur only once in the corpus (hapaxes) cannot be sensibly measured or manipulated, such items are disregarded; the function returns their observed subfrequencies.
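
The idea of allocating occurrences to corpus parts in proportion to their size can be illustrated with a minimal base-R sketch. The helper \code{allocate_proportionally()} is hypothetical and not part of this package; it resolves fractional counts with largest-remainder rounding, which is an assumption and may differ from what \code{find_max_disp_tdm()} does internally.

\preformatted{
# Hypothetical helper (not part of the package): distribute n occurrences
# across corpus parts in proportion to the size of each part
allocate_proportionally <- function(n, part_sizes) {
  expected <- n * part_sizes / sum(part_sizes)  # ideal, possibly fractional, counts
  counts <- floor(expected)                     # place whole occurrences first
  leftover <- n - sum(counts)                   # occurrences still to be placed
  if (leftover > 0) {
    # assumption: remaining occurrences go to the parts with the
    # largest fractional remainders
    top <- order(expected - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[top] <- counts[top] + 1
  }
  counts
}

# Six occurrences across three parts of 1000, 2000 and 3000 words
allocate_proportionally(6, c(1000, 2000, 3000))
}

The sketch only covers the proportional case; it makes no claim about how the \code{"pervasive"} method assigns occurrences to parts.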
}
\examples{
find_max_disp_tdm(
  tdm = biber150_spokenBNC2014[1:10,],
  row_partsize = "first",
  freq_adjust_method = "even")
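
# For comparison, the same call with the other documented method,
# "pervasive", which spreads occurrences as broadly across corpus
# parts as possible (see details)
find_max_disp_tdm(
  tdm = biber150_spokenBNC2014[1:10,],
  row_partsize = "first",
  freq_adjust_method = "pervasive")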

}
\references{
Gries, Stefan Th. 2022. What do (most of) our dispersion measures measure (most)? Dispersion? \emph{Journal of Second Language Studies} 5(2). 171--205. \doi{10.1075/jsls.21029.gri}

Gries, Stefan Th. 2024. \emph{Frequency, dispersion, association, and keyness: Revising and tupleizing corpus-linguistic measures}. Amsterdam: Benjamins.
}
\seealso{
\code{\link[=find_max_disp]{find_max_disp()}}
}
\author{
Lukas Soenning
}
