% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/markerQC.R
\name{check_hwe}
\alias{check_hwe}
\title{Identification of SNPs showing a significant deviation from Hardy-Weinberg
equilibrium (HWE)}
\usage{
check_hwe(
  indir,
  name,
  qcdir = indir,
  hweTh = 1e-05,
  interactive = FALSE,
  path2plink = NULL,
  verbose = FALSE,
  showPlinkOutput = TRUE,
  keep_individuals = NULL,
  remove_individuals = NULL,
  exclude_markers = NULL,
  extract_markers = NULL,
  legend_text_size = 5,
  legend_title_size = 7,
  axis_text_size = 5,
  axis_title_size = 7,
  title_size = 9
)
}
\arguments{
\item{indir}{[character] /path/to/directory containing the basic PLINK data
files name.bim, name.bed and name.fam.}

\item{name}{[character] Prefix of PLINK files, i.e. name.bed, name.bim,
name.fam.}

\item{qcdir}{[character] /path/to/directory where results will be written to.
If \code{\link{perIndividualQC}} was conducted, this directory should be the
same as qcdir specified in \code{\link{perIndividualQC}}, i.e. it contains
name.fail.IDs with IIDs of individuals that failed QC. The user needs write
permission to qcdir. By default, qcdir=indir.}

\item{hweTh}{[double] Significance threshold for deviation from HWE.}

\item{interactive}{[logical] Should plots be shown interactively? When
choosing this option, make sure you have X-forwarding or a graphical interface
available for interactive plotting. Alternatively, set interactive=FALSE and
save the returned plot object (p_hwe) via ggplot2::ggsave(plot=p_hwe,
other_arguments) or pdf(outfile); print(p_hwe); dev.off().}

\item{path2plink}{[character] Absolute path to PLINK executable
(\url{https://www.cog-genomics.org/plink/1.9/}), i.e.
plink should be accessible as path2plink -h. The full name of the executable
should be specified: for Windows, this means path/plink.exe, for Unix
platforms this is path/plink. If not provided, it is assumed that PATH is set
up such that PLINK is found by \code{\link[sys]{exec}}('plink').}

\item{verbose}{[logical] If TRUE, progress information, including the PLINK
log, is printed to standard out.}

\item{showPlinkOutput}{[logical] If TRUE, plink log and error messages are
printed to standard out.}

\item{keep_individuals}{[character] Path to file with individuals to be
retained in the analysis. The file has to be a space/tab-delimited text file
with family IDs in the first column and within-family IDs in the second
column. All samples not listed in this file will be removed from the current
analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#indiv}.
Default: NULL, i.e. no filtering on individuals.}

\item{remove_individuals}{[character] Path to file with individuals to be
removed from the analysis. The file has to be a space/tab-delimited text file
with family IDs in the first column and within-family IDs in the second
column. All samples listed in this file will be removed from the current
analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#indiv}.
Default: NULL, i.e. no filtering on individuals.}

\item{exclude_markers}{[character] Path to file with markers to be
removed from the analysis. The file has to be a text file with a list of
variant IDs (usually one per line, but it's okay for them to just be
separated by spaces). All listed variants will be removed from the current
analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#snp}.
Default: NULL, i.e. no filtering on markers.}

\item{extract_markers}{[character] Path to file with markers to be
included in the analysis. The file has to be a text file with a list of
variant IDs (usually one per line, but it's okay for them to just be
separated by spaces). All unlisted variants will be removed from the current
analysis. See \url{https://www.cog-genomics.org/plink/1.9/filter#snp}.
Default: NULL, i.e. no filtering on markers.}

\item{legend_text_size}{[integer] Size for legend text.}

\item{legend_title_size}{[integer] Size for legend title.}

\item{axis_text_size}{[integer] Size for axis text.}

\item{axis_title_size}{[integer] Size for axis title.}

\item{title_size}{[integer] Size for plot title.}
}
\value{
Named list with i) fail_hwe, a [data.frame] with the columns CHR
(Chromosome code), SNP (Variant identifier), TEST (Type of test: one of
'ALL', 'AFF', 'UNAFF', 'ALL(QT)', 'ALL(NP)'), A1 (Allele 1; usually minor),
A2 (Allele 2; usually major), GENO ('/'-separated genotype counts: A1 hom,
het, A2 hom), O(HET) (Observed heterozygote frequency), E(HET) (Expected
heterozygote frequency) and P (Hardy-Weinberg equilibrium exact test p-value)
for all SNPs that failed the hweTh, and ii) p_hwe, a ggplot2 object
containing the HWE p-value distribution histogram, which can be shown via
print(p_hwe).
}
\description{
Runs plink --hardy and evaluates its results. It calculates the observed and
expected heterozygote frequencies for all variants in the individuals that
passed the \code{\link{perIndividualQC}} and computes the deviation of the
frequencies from Hardy-Weinberg equilibrium (HWE) by HWE exact test. The
p-values of the HWE exact test are displayed as histograms (one for all
p-values and one for low p-values only), with hweTh marked as the quality
control cut-off for SNPs.
}
\details{
\code{check_hwe} uses plink --remove name.fail.IDs --hardy to
calculate the observed and expected heterozygote frequencies per SNP in the
individuals that passed the \code{\link{perIndividualQC}}. No new dataset is
generated; the listed IDs are simply excluded when the statistics are
calculated.

For details on the output data.frame fail_hwe, check the original
description on the PLINK output format page:
\url{https://www.cog-genomics.org/plink/1.9/formats#hwe}.
}
\examples{
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
\dontrun{
# run on all individuals and markers
fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=FALSE,
verbose=TRUE, path2plink=path2plink)
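# The returned object is a named list (see the Value section). A minimal
# sketch of inspecting it, assuming the call above completed successfully:
# first rows of the data.frame with SNPs that failed hweTh
head(fail_hwe$fail_hwe)
# number of SNPs that failed hweTh
nrow(fail_hwe$fail_hwe)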

# run on subset of individuals and markers
remove_individuals_file <- system.file("extdata", "remove_individuals",
package="plinkQC")
extract_markers_file <- system.file("extdata", "extract_markers",
package="plinkQC")
fail_hwe <- check_hwe(qcdir=qcdir, indir=indir,
name=name, interactive=FALSE, verbose=TRUE, path2plink=path2plink,
remove_individuals=remove_individuals_file,
extract_markers=extract_markers_file)
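
# A minimal sketch of saving the histogram non-interactively, as suggested
# for the interactive argument. The output file name is a hypothetical
# choice for illustration and is not written by check_hwe itself.
hwe_plot_file <- file.path(qcdir, "data.hwe.pdf")
ggplot2::ggsave(filename=hwe_plot_file, plot=fail_hwe$p_hwe)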
}
}
