Overview
PCAmatchR optimally matches a set of population-based controls to cases. It converts user-provided principal components (PCs) into a Mahalanobis distance metric, which is then used to select a set of well-matched controls for each case.
PCAmatchR takes user-defined PCs and eigenvalues as input and directly outputs the optimal case and control matches.
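To make the distance idea concrete, here is a minimal base R sketch of a Mahalanobis distance computed on PC scores. It is an illustration only: the toy values and the names `pc_toy`, `S`, `d`, and `nearest` are hypothetical, the eigenvalue weighting that match_maker applies is not reproduced, and picking the single nearest control is a simplification of the optimal matching that optmatch performs.

```r
set.seed(42)

# Toy PC scores: 1 case and 5 candidate controls on 3 PCs (made-up values)
pc_toy <- matrix(rnorm(18), nrow = 6,
                 dimnames = list(c("case1", paste0("ctrl", 1:5)),
                                 paste0("PC", 1:3)))

# Covariance of the PC scores, used to scale the distance
S <- cov(pc_toy)

# stats::mahalanobis() returns squared distances, so take the square root
d <- sqrt(mahalanobis(pc_toy[-1, , drop = FALSE],
                      center = pc_toy["case1", ],
                      cov = S))

# Candidate control closest to the case under this metric
nearest <- names(which.min(d))
```

A smaller distance means the control sits closer to the case in PC space, which is the sense in which a control is "well-matched".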
Important Note
The optmatch code is not contained in this package. In order to use PCAmatchR, users must manually install and load the optmatch package (>=0.9-1) separately and accept its license. Manual loading is necessary due to software license issues. If the optmatch package is not loaded, the PCAmatchR main function, match_maker(), will fail and display an error message. For more information about the optmatch package, please see the reference below.
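One way to confirm that optmatch is available before calling match_maker() is sketched below. This is a convenience check written for this README, not code taken from PCAmatchR; the variable names `version_ok` and `optmatch_loaded` are hypothetical.

```r
# Check that optmatch is installed before trying to attach it
if (requireNamespace("optmatch", quietly = TRUE)) {
  # Confirm the installed version meets the stated minimum (>= 0.9-1)
  version_ok <- utils::packageVersion("optmatch") >= "0.9-1"
  library(optmatch)
} else {
  version_ok <- FALSE
  message("optmatch is not installed; run install.packages(\"optmatch\") first.")
}

# TRUE only when optmatch is attached to the search path
optmatch_loaded <- "package:optmatch" %in% search()
```

If `optmatch_loaded` is FALSE, install and load optmatch before running the examples that follow.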
Installation
To install the release version from CRAN:
install.packages("PCAmatchR")
To install the development version from GitHub:
devtools::install_github("machiela-lab/PCAmatchR")
Available functions
Function | Description |
---|---|
match_maker | Main function. Weighted matching of controls to cases using PCA results. |
plot_maker | Easily make a plot of matches from match_maker output. |
Available sample data sets
Data set | Description |
---|---|
PCs_1000G | First 20 principal components of 2504 individuals from Phase 3 of 1000 Genomes Project. |
eigenvalues_1000G | A sample data set containing the first 20 eigenvalues. |
eigenvalues_all_1000G | A sample data set containing all of the eigenvalues. |
library(PCAmatchR)
library(optmatch)
##### Input match_maker sample data
# Create PC data frame
pcs <- as.data.frame(PCs_1000G[,c(1,5:24)])
# Create eigenvalues vector
eigen_vals <- c(eigenvalues_1000G)$eigen_values
# Create full eigenvalues vector
all_eigen_vals <- c(eigenvalues_all_1000G)$eigen_values
# Create covariate data frame
cov_data <- PCs_1000G[,c(1:4)]
# Generate a case status variable
cov_data$case <- ifelse(cov_data$pop=="ESN", c(1), c(0))
###################
# Run match_maker #
###################
# 1 to 1 matching
test <- match_maker(PC = pcs,
                    eigen_value = eigen_vals,
                    data = cov_data,
                    ids = c("sample"),
                    case_control = c("case"),
                    num_controls = 1,
                    eigen_sum = sum(all_eigen_vals))
test$matches
test$weights
# 1 to 2 matching
test <- match_maker(PC = pcs,
                    eigen_value = eigen_vals,
                    data = cov_data,
                    ids = c("sample"),
                    case_control = c("case"),
                    num_controls = 2,
                    eigen_sum = sum(all_eigen_vals))
test$matches
test$weights
# 1 to 1 matching with exact "gender" matching
test <- match_maker(PC = pcs,
                    eigen_value = eigen_vals,
                    data = cov_data,
                    ids = c("sample"),
                    case_control = c("case"),
                    num_controls = 1,
                    eigen_sum = sum(all_eigen_vals),
                    exact_match = c("gender"))
test$matches
test$weights
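The calls above pass both the eigenvalues of the PCs being matched on (eigen_value) and the sum of all eigenvalues (eigen_sum). Together these make it possible to express each PC as a share of the total variance. The sketch below shows that calculation with made-up numbers; whether the weights returned by match_maker equal exactly these proportions is an assumption not confirmed here, and the names `eigen_toy`, `eigen_sum_toy`, and `prop_var` are hypothetical.

```r
# Hypothetical eigenvalues for the PCs used in matching
eigen_toy <- c(40, 25, 10)

# Hypothetical sum of all eigenvalues (total variance)
eigen_sum_toy <- 100

# Share of total variance attributed to each PC
prop_var <- eigen_toy / eigen_sum_toy
names(prop_var) <- paste0("PC", seq_along(prop_var))
prop_var
```

Under this reading, PCs that explain more variance would contribute more to the matching distance.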
Reference
Hansen BB, Klopfer SO. Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics. 2006 Sep 1;15(3):609-27.