% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mst.R
\name{mst}
\alias{mst}
\alias{mst.default}
\alias{mst.dist}
\title{Minimum Spanning Tree of the Pairwise Distance Graph}
\usage{
mst(d, ...)

\method{mst}{default}(
  d,
  distance = c("euclidean", "l2", "manhattan", "cityblock", "l1", "cosine"),
  M = 1L,
  cast_float32 = TRUE,
  verbose = FALSE,
  ...
)

\method{mst}{dist}(d, M = 1L, verbose = FALSE, ...)
}
\arguments{
\item{d}{either a numeric matrix (or an object coercible to one,
e.g., a data frame with numeric-like columns) or an
object of class \code{dist}, see \code{\link[stats]{dist}}}

\item{...}{further arguments passed to or from other methods}

\item{distance}{metric used to compute the pairwise distances
if \code{d} is a matrix, one of:
\code{"euclidean"} (synonym: \code{"l2"}),
\code{"manhattan"} (a.k.a. \code{"l1"} and \code{"cityblock"}),
\code{"cosine"}}

\item{M}{smoothing factor; \code{M} = 1 gives the selected \code{distance};
otherwise, the mutual reachability distance is used}

\item{cast_float32}{logical; whether to compute the distances using 32-bit
instead of 64-bit precision floating-point arithmetic (up to 2x faster)}

\item{verbose}{logical; whether to print diagnostic messages
and progress information}
}
\value{
Matrix of class \code{mst} with n-1 rows and 3 columns:
\code{from}, \code{to} and \code{dist}. In each row, \code{from} < \code{to},
and the rows are sorted by \code{dist} in nondecreasing order.
The i-th row gives the i-th edge of the MST:
\code{(from[i], to[i])} are the indices of its endpoints (in 1,...,n)
and \code{dist[i]} gives its weight, i.e., the
distance between the corresponding points.

The \code{method} attribute gives the name of the distance used.
The \code{Labels} attribute gives the labels of all the input points.

If \code{M} > 1, the \code{nn} attribute gives the indices of the \code{M}-1
nearest neighbours of each point.
}
\description{
A parallelised implementation of a Jarnik (Prim/Dijkstra)-like
algorithm for determining
a minimum spanning tree (MST) of a complete undirected graph
representing a set of n points
with weights given by a pairwise distance matrix.

Note that a graph may have more than one minimum spanning tree
(e.g., when some pairwise distances are tied); only one of them is returned.
}
\details{
If \code{d} is a numeric matrix of size \eqn{n\times p}{n x p},
the \eqn{n (n-1)/2} distances are computed on the fly, so that only
\eqn{O(n M)} memory is used.

The algorithm is parallelised; set the \code{OMP_NUM_THREADS} environment
variable (e.g., via \code{\link[base]{Sys.setenv}}) to control the number
of threads used.

Time complexity is \eqn{O(n^2)} for the method accepting an object of
class \code{dist} and \eqn{O(p n^2)} otherwise.

If \code{M} >= 2, then the mutual reachability distance \eqn{m(i,j)} with smoothing
factor \code{M} (see Campello et al. 2015)
is used instead of the chosen "raw" distance \eqn{d(i,j)}.
It holds \eqn{m(i, j)=\max(d(i,j), c(i), c(j))}, where \eqn{c(i)} is
\eqn{d(i, k)} with \eqn{k} being the (\code{M}-1)-th nearest neighbour of \eqn{i}.
This pulls "noise" and "boundary" points away from each other.
The Genie++ clustering algorithm (see \code{\link{gclust}})
applied with respect to the mutual reachability distance thus gains
the ability to mark some observations as noise points.

Note that the case \code{M} = 2 corresponds to the original distance
(\eqn{c(i)} is then the distance to the nearest neighbour of \eqn{i},
which never exceeds \eqn{d(i,j)}), but we are still determining the
1-nearest neighbours separately, which is a bit
suboptimal; you can file a feature request if this makes your data analysis
tasks too slow.
}
\examples{
library("datasets")
data("iris")
X <- iris[1:4]
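## For comparison, a plain sequential sketch of the Jarnik (Prim) algorithm,
## running in O(n^2) time on a precomputed full distance matrix.
## It is an illustration only, not the parallelised implementation behind
## mst(); prim_mst is a hypothetical helper defined just for this example.
prim_mst <- function(D) {
    D <- as.matrix(D)
    n <- nrow(D)
    in_tree <- c(TRUE, rep(FALSE, n - 1))  # grow the tree from vertex 1
    best <- D[1, ]        # cheapest known edge weight linking each vertex to the tree
    parent <- rep(1L, n)  # tree endpoint of that cheapest edge
    edges <- matrix(NA_real_, nrow = n - 1, ncol = 3)
    for (k in seq_len(n - 1)) {
        cand <- which(!in_tree)
        j <- cand[which.min(best[cand])]  # closest vertex outside the tree
        edges[k, ] <- c(min(parent[j], j), max(parent[j], j), best[j])
        in_tree[j] <- TRUE
        upd <- !in_tree & D[j, ] < best   # vertices now closer via j
        best[upd] <- D[j, upd]
        parent[upd] <- j
    }
    edges[order(edges[, 3]), , drop = FALSE]  # nondecreasing weights, as in mst()
}
pt <- prim_mst(dist(iris[1:4]))
## The total weight of any MST is the same, so this should agree with
## sum(mst(X)[, 3]) up to rounding; the edges may differ if distances are tied.
sum(pt[, 3])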
tree <- mst(X)
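## A quick check of the properties documented in the Value section
## (a sketch assuming the genieclust package is attached): n-1 edges,
## from < to in each row, and nondecreasing weights in the third column.
library("genieclust")  # already attached when running the package examples
Y <- as.matrix(iris[1:4])
t1 <- mst(Y, cast_float32 = FALSE)  # 64-bit arithmetic for a like-for-like check
c(nrow(t1) == nrow(Y) - 1, all(t1[, 1] < t1[, 2]), !is.unsorted(t1[, 3]))
## The method for dist objects should yield a tree of the same total weight
## (the edges themselves may differ if some distances are tied).
t2 <- mst(dist(Y))
all.equal(sum(t1[, 3]), sum(t2[, 3]))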

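## A base-R sketch of the mutual reachability distance from the Details
## section, m(i,j) = max(d(i,j), c(i), c(j)), where c(i) is the distance
## from i to its (M-1)-th nearest neighbour. mutreach is a hypothetical
## helper for illustration only; mst() handles M internally.
mutreach <- function(D, M) {
    D <- as.matrix(D)
    ## after sorting a row, position 1 holds the zero distance of a point
    ## to itself, so its (M-1)-th nearest other point sits at position M
    core <- apply(D, 1, function(row) sort(row)[M])
    R <- pmax(D, outer(core, core, pmax))
    diag(R) <- 0
    R
}
D <- as.matrix(dist(iris[1:4]))
all.equal(mutreach(D, 2), D)  # M = 2 reproduces the original distances
all(mutreach(D, 5) >= D)      # larger M can only increase the distances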
}
\references{
Jarnik V., O jistem problemu minimalnim,
\emph{Prace Moravske Prirodovedecke Spolecnosti} 6, 1930, 57-63.

Olson C.F., Parallel algorithms for hierarchical clustering,
\emph{Parallel Comput.} 21, 1995, 1313-1325.

Prim R., Shortest connection networks and some generalisations,
\emph{Bell Syst. Tech. J.} 36, 1957, 1389-1401.

Campello R., Moulavi D., Zimek A., Sander J.,
Hierarchical density estimates for data clustering, visualization,
and outlier detection, \emph{ACM Transactions on Knowledge Discovery
from Data} 10(1), 2015, 5:1-5:51.
}
\seealso{
The official online manual of \pkg{genieclust} at \url{https://genieclust.gagolewski.com/}

Gagolewski M., \pkg{genieclust}: Fast and robust hierarchical clustering, \emph{SoftwareX} 15:100722, 2021, \doi{10.1016/j.softx.2021.100722}.

\code{\link{emst_mlpack}()} for a very fast alternative
in case of (very) low-dimensional Euclidean spaces (and \code{M} = 1).
}
\author{
\href{https://www.gagolewski.com/}{Marek Gagolewski} and other contributors
}
