% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/jackstraw_cluster.R
\name{jackstraw_cluster}
\alias{jackstraw_cluster}
\title{Jackstraw for a User-Defined Clustering Algorithm}
\usage{
jackstraw_cluster(
  dat,
  k,
  cluster = NULL,
  centers = NULL,
  algorithm = function(x, centers) kmeans(x, centers, ...),
  s = 1,
  B = 1000,
  center = TRUE,
  noise = NULL,
  covariate = NULL,
  verbose = FALSE,
  seed = NULL,
  ...
)
}
\arguments{
\item{dat}{a data matrix with \code{m} rows as variables and \code{n} columns as observations.}

\item{k}{the number of clusters.}

\item{cluster}{a vector of cluster assignments.}

\item{centers}{a matrix of all cluster centers.}

\item{algorithm}{a clustering algorithm to use, whose output must include \code{cluster} and \code{centers}. For the exact specification, see \code{kmeans}.}

\item{s}{the number of ``synthetic'' null variables. Out of \code{m} variables, \code{s} variables are independently permuted.}

\item{B}{the number of resampling iterations.}

\item{center}{a logical specifying whether to center the rows. By default, \code{TRUE}.}

\item{noise}{specify a parametric distribution to generate a noise term. If \code{NULL}, a non-parametric jackstraw test is performed.}

\item{covariate}{a model matrix of covariates with \code{n} observations. Must include an intercept in the first column.}

\item{verbose}{a logical specifying whether to print the computational progress. By default, \code{FALSE}.}

\item{seed}{a seed for the random number generator.}

\item{...}{optional arguments to control the clustering algorithm.}
}
\value{
\code{jackstraw_cluster} returns a list consisting of
\item{F.obs}{\code{m} observed F statistics between variables and cluster centers.}
\item{F.null}{F null statistics between null variables and cluster centers, from the jackstraw method.}
\item{p.F}{\code{m} p-values of membership.}
}
\description{
Test the cluster membership using a user-defined clustering algorithm.
}
\details{
The clustering algorithm assigns \code{m} rows to \code{k} clusters. This function enables statistical
evaluation of whether the cluster membership is correctly assigned. Each of the \code{m} p-values refers to
the statistical test of that row with regard to its assigned cluster.
The resampling strategy accounts for the over-fitting that results from computing the clusters directly from the observed data,
and protects against an anti-conservative bias.

The user is expected to explore the data with a given clustering algorithm and
determine the number of clusters \code{k}.
The user should also provide \code{cluster} and \code{centers}, as obtained by applying \code{algorithm} to \code{dat}.
The rows of \code{centers} correspond to the \code{k} clusters, which are also the levels available in \code{cluster}.
A parametric distribution for the noise term can be specified through \code{noise}; this is an experimental feature.
}
\references{
Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554 \url{https://academic.oup.com/bioinformatics/article/31/4/545/2748186}

Chung (2020) Statistical significance of cluster membership for unsupervised evaluation of cell identities. Bioinformatics, 36(10): 3107-3114 \url{https://academic.oup.com/bioinformatics/article/36/10/3107/5788523}
}
\author{
Neo Christopher Chung \email{nchchung@gmail.com}
}
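\examples{
\dontrun{
## A minimal sketch of the workflow described in Details, not an official
## example from the package. The simulated data, the seed, and the choices
## k = 2 and B = 100 are illustrative assumptions.
library(jackstraw)
set.seed(1)

## Simulate m = 200 variables (rows) measured on n = 20 observations (columns).
## Two groups of rows follow opposite patterns across the columns, so the
## group structure is still present after the rows are centered.
pattern <- c(rep(1, 10), rep(-1, 10))
signal <- rbind(
  matrix(pattern, nrow = 100, ncol = 20, byrow = TRUE),
  matrix(-pattern, nrow = 100, ncol = 20, byrow = TRUE)
)
dat <- signal + matrix(rnorm(200 * 20), nrow = 200, ncol = 20)

## Center each row up front so that the clustering below sees the same data
## that jackstraw_cluster uses when center = TRUE.
dat <- dat - rowMeans(dat)

## Apply the clustering algorithm to the observed data first.
km <- kmeans(dat, centers = 2)

## Test the membership of each row in its assigned cluster, passing the
## assignments and centers obtained above.
out <- jackstraw_cluster(dat, k = 2, cluster = km$cluster,
                         centers = km$centers, B = 100)

## One p-value per row of dat.
length(out$p.F)
head(out$p.F)
}
}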
