% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bind_tf_idf2.R
\name{bind_tf_idf2}
\alias{bind_tf_idf2}
\title{Bind term frequency and inverse document frequency}
\usage{
bind_tf_idf2(
  tbl,
  term = "token",
  document = "doc_id",
  n = "n",
  tf = c("tf", "tf2", "tf3", "itf"),
  idf = c("idf", "idf2", "idf3", "idf4", "df"),
  norm = FALSE,
  rmecab_compat = TRUE
)
}
\arguments{
\item{tbl}{A tidy text dataset.}

\item{term}{<\code{\link[rlang:args_data_masking]{data-masked}}>
Column containing terms.}

\item{document}{<\code{\link[rlang:args_data_masking]{data-masked}}>
Column containing document IDs.}

\item{n}{<\code{\link[rlang:args_data_masking]{data-masked}}>
Column containing document-term counts.}

\item{tf}{Method for computing term frequency.}

\item{idf}{Method for computing inverse document frequency.}

\item{norm}{Logical; if \code{TRUE}, TF-IDF values are normalized
by dividing them by their L2 norms.}

\item{rmecab_compat}{Logical; if \code{TRUE}, computes values in a way
that is compatible with 'RMeCab'.
Note that 'RMeCab' always computes IDF values using term frequency
rather than raw term counts, and thus TF-IDF values may be
doubly affected by term frequency.}
}
\value{
A data.frame.
}
\description{
Calculates and binds the term frequency, inverse document frequency,
and TF-IDF of the dataset.
This function experimentally supports 4 types of term frequencies
and 5 types of inverse document frequencies.
}
\details{
The type of term frequency is selected with the \code{tf} argument:
\itemize{
\item \code{tf} is term frequency (not raw count of terms).
\item \code{tf2} is logarithmic term frequency with base \code{exp(1)}.
\item \code{tf3} is binary-weighted term frequency.
\item \code{itf} is inverse term frequency. Use with \code{idf="df"}.
}

The type of inverse document frequency is selected with the \code{idf} argument:
\itemize{
\item \code{idf} is smoothed inverse document frequency with base 2.
'Smoothed' here simply means adding 1 to the values after taking the logarithm.
\item \code{idf2} is global frequency IDF.
\item \code{idf3} is probabilistic IDF with base 2.
\item \code{idf4} is global entropy, which is not actually an IDF.
\item \code{df} is document frequency. Use with \code{tf="itf"}.
}
}
\examples{
\donttest{
df <- dplyr::count(hiroba, doc_id, token)
bind_tf_idf2(df)
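
# A minimal sketch of selecting non-default weightings, assuming the same
# `df` as above. The argument values are taken from the usage section;
# the results are not shown here.

# Inverse term frequency is meant to be paired with document frequency.
bind_tf_idf2(df, tf = "itf", idf = "df")

# Logarithmic term frequency with probabilistic IDF, normalized by L2 norms
# and without the 'RMeCab' compatibility behavior.
bind_tf_idf2(df, tf = "tf2", idf = "idf3", norm = TRUE, rmecab_compat = FALSE)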
}
}
