DiffCorr: Analyzing and Visualizing Differential Correlation Networks in Transcriptomic and Metabolomic Data

Atsushi Fukushima

Kyoto Prefectural University/RIKEN Center for Sustainable Resource Science

2024-09-30

DiffCorr [1-2] is a package for identifying pattern changes between two experimental conditions in correlation networks (e.g., gene co-expression networks), which builds on a commonly used association measure, such as Pearson’s correlation coefficient. This document demonstrates typical correlation network analysis using transcriptome and metabolome data.

Installation

Release version (CRAN):

install.packages("DiffCorr")

Development version (GitHub):

install.packages("devtools")
install.packages(c("igraph", "fdrtool"))

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("pcaMethods", "multtest"))

library(devtools)
install_github("afukushima/DiffCorr")

Introduction

Molecular interactions can be modeled as networks by measuring associations between molecules in omics data. Gene co-expression analysis, commonly based on transcriptome datasets from microarray experiments and RNA-seq, uses metrics like Pearson’s correlation coefficient to quantify these relationships.

When gene correlations surpass a threshold, they form co-expression or correlation networks. These analyses, often using a “guide-gene” approach [3], offer insights into regulatory mechanisms and have been used to identify genes involved in plant secondary metabolism.
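The thresholding step can be illustrated with a short base-R sketch. It assumes molecules in rows and samples in columns (the layout of the datasets used below); the toy data, the 0.8 cutoff, and the helper name threshold.network() are hypothetical choices for illustration, not part of DiffCorr.

```r
## Sketch: build a correlation network by keeping molecule pairs whose |r| >= cutoff.
## Toy data and helper name are hypothetical (not DiffCorr functions).
set.seed(1)
n.samples <- 20
signal <- rnorm(n.samples)
expr <- rbind(
  geneA = signal + rnorm(n.samples, sd = 0.1),
  geneB = signal + rnorm(n.samples, sd = 0.1),
  geneC = -signal + rnorm(n.samples, sd = 0.1),  # anti-correlated with A and B
  geneD = rnorm(n.samples)                       # unrelated molecule
)  # rows = molecules, columns = samples

threshold.network <- function(x, cutoff = 0.8, method = "pearson") {
  r <- cor(t(x), method = method)          # molecule-by-molecule correlation matrix
  keep <- abs(r) >= cutoff & upper.tri(r)  # each pair once, no self-pairs
  idx <- which(keep, arr.ind = TRUE)       # row/column positions of kept pairs
  data.frame(from = rownames(r)[idx[, 1]],
             to   = rownames(r)[idx[, 2]],
             r    = r[keep],               # same column-major order as which()
             stringsAsFactors = FALSE)
}

edges <- threshold.network(expr, cutoff = 0.8)
edges
```

Both positive and negative correlations become edges here because the cutoff is applied to |r|; a signed network would test r >= cutoff instead.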

In addition to identifying differentially expressed genes (DEGs) between samples, changes in correlation patterns, or “differential correlations,” provide insights into molecular interactions [4]. Differential network analysis, which compares networks (e.g., normal vs. diseased), has been applied to both plant and animal studies and has been useful in metabolomics for understanding complex metabolic processes.
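For a single pair of molecules, a change in correlation between two conditions is commonly tested with Fisher’s z-test, the test DiffCorr uses according to the Conclusion. The sketch below is a minimal base-R version under the usual assumptions (independent samples in the two conditions, roughly bivariate-normal data); fisher.z.diff() is a hypothetical helper, not a DiffCorr function.

```r
## Sketch of Fisher's z-test for the difference between two independent
## correlation coefficients. The helper name is hypothetical.
fisher.z.diff <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                          # Fisher transformation, condition 1
  z2 <- atanh(r2)                          # Fisher transformation, condition 2
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
  z  <- (z1 - z2) / se                     # approximately N(0, 1) if r1 == r2
  p  <- 2 * pnorm(-abs(z))                 # two-sided p-value
  c(z = z, p = p)
}

## Illustrative values: weak correlation in condition 1, strong in condition 2
fisher.z.diff(r1 = 0.1, n1 = 30, r2 = 0.8, n2 = 30)
```

Note that atanh(1) is infinite, so pairs with a perfect correlation in either condition need special handling before the transformation.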

This document demonstrates typical correlation network analyses using transcriptome and metabolome data and showcases the utility of the DiffCorr [1-2] package by identifying biologically relevant, differentially correlated molecules in transcriptome co-expression and metabolite-to-metabolite correlation networks.

library(DiffCorr)

DiffCorr for Golub’s data (ALL/AML leukemia dataset)

This section is based on Additional File 3 included in the original DiffCorr package. As an example, we use Golub’s data (https://coxpress.sourceforge.net/golub.txt). The dataset consists of gene expression profiles from 38 tumor samples representing two leukemia subtypes: 27 acute lymphoblastic leukemia (ALL) and 11 acute myeloid leukemia (AML) samples (Golub et al., 1999). The microarray platform used, the Affymetrix GeneChip HuGeneFL (also known as HU6800), contains probes for about 6800 human genes. To demonstrate the usefulness of the DiffCorr package, we describe and discuss the results of analyzing this transcriptomic dataset.

Reading the Golub dataset

golub.df <- read.table("https://coxpress.sourceforge.net/golub.txt", 
                       sep = "\t", header = TRUE, row.names = 1)
dim(golub.df)
#> [1] 2568   38

Clustering each subset

hc.mol1 <- cluster.molecule(golub.df[, 1:27], "pearson", "average")  ## ALL (27 samples)
hc.mol2 <- cluster.molecule(golub.df[, 28:38], "pearson", "average") ## AML (11 samples)

Cut each tree at height h = 0.4 using the cutree function. Because molecules are clustered on the distance 1 - r, this corresponds to a correlation of 0.6.

g1 <- cutree(hc.mol1, h = 0.4)
g2 <- cutree(hc.mol2, h = 0.4)
##
res1 <- get.eigen.molecule(data = golub.df, groups = g1)
#> [1] 1
#> [1] 2
#> ... (one line per module; indices 3 to 21 omitted)
#> [1] 22
res2 <- get.eigen.molecule(data = golub.df, groups = g2)
#> [1] 1
#> [1] 2
#> ... (one line per module; indices 3 to 86 omitted)
#> [1] 87
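The clustering, cutting, and eigen-molecule steps above can be mimicked in base R to see what they do. This sketch assumes that cluster.molecule() clusters on the distance 1 - r (which is why h = 0.4 corresponds to r = 0.6) and that an eigen-molecule is the first principal component of a module’s standardized profiles, in the spirit of eigengenes; DiffCorr’s get.eigen.molecule() may differ in its details. The toy data and the helper eigen.molecule() are hypothetical.

```r
## Sketch: hierarchical clustering on 1 - r, cutting at h = 0.4, and a
## PC1-based "eigen-molecule" per module. Toy data and helper are hypothetical.
set.seed(42)
n.samples <- 30
f1 <- rnorm(n.samples)
f2 <- as.numeric(residuals(lm(rnorm(n.samples) ~ f1)))  # uncorrelated with f1
noisy <- function(f) f + rnorm(length(f), sd = 0.2)
toy <- rbind(m1 = noisy(f1), m2 = noisy(f1), m3 = noisy(f1),
             m4 = noisy(f2), m5 = noisy(f2))  # rows = molecules, columns = samples

d <- as.dist(1 - cor(t(toy), method = "pearson"))  # distance = 1 - r
hc <- hclust(d, method = "average")                 # average linkage
groups <- cutree(hc, h = 0.4)                       # cut height 0.4, i.e. r = 0.6

eigen.molecule <- function(x) {
  x <- t(scale(t(x)))                          # standardize each molecule (row)
  if (nrow(x) == 1) return(drop(x))            # single-molecule module
  pc1 <- prcomp(t(x))$x[, 1]                   # PC1 scores, one per sample
  if (cor(pc1, colMeans(x)) < 0) pc1 <- -pc1   # fix the arbitrary PCA sign
  pc1
}
eigen.list <- lapply(split(rownames(toy), groups),
                     function(ids) eigen.molecule(toy[ids, , drop = FALSE]))
table(groups)
```

With average linkage, two clusters stay separate at this cut when the average correlation between their members is below 0.6, so the toy data split into the f1-driven and f2-driven modules.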

Visualizing module networks

gg1 <- get.eigen.molecule.graph(res1)
## igraph::layout_with_fr() is the current name of layout.fruchterman.reingold()
plot(gg1, layout = igraph::layout_with_fr(gg1))


gg2 <- get.eigen.molecule.graph(res2)
plot(gg2, layout = igraph::layout_with_fr(gg2))

The module lists for each subset can be saved to text files:

write.modules(g1, res1, outfile = "module1_list.txt")
write.modules(g2, res2, outfile = "module2_list.txt")

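To examine the relationships between the ALL and AML modules, compute the Spearman correlation between every pair of eigen-molecules and print the pairs with |r| > 0.8: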

for (i in seq_along(res1$eigen.molecules)) {      ## modules in ALL
  for (j in seq_along(res2$eigen.molecules)) {    ## modules in AML
    r <- cor(res1$eigen.molecules[[i]], res2$eigen.molecules[[j]], method = "spearman")
    if (abs(r) > 0.8) {  ## strongly positive or negative module pairs
      print(paste("(i, j): ", i, " ", j, sep = ""))
      print(r)
    }
  }
}
#> [1] "(i, j): 2 8"
#> [1] 0.830303
#> [1] "(i, j): 4 83"
#> [1] 0.8424242
#> [1] "(i, j): 5 86"
#> [1] 0.8787879
#> [1] "(i, j): 10 56"
#> [1] -0.8666667
#> [1] "(i, j): 10 63"
#> [1] -0.8545455
#> [1] "(i, j): 13 47"
#> [1] -0.8060606
#> [1] "(i, j): 13 87"
#> [1] 0.8181818
#> [1] "(i, j): 21 24"
#> [1] -0.9515152
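The double loop above can also be written with a single cor() call on two matrices. This sketch assumes that res1$eigen.molecules and res2$eigen.molecules are lists of numeric vectors of equal length; compare.modules() and the toy lists are hypothetical stand-ins.

```r
## Sketch: all module-by-module correlations in one cor() call.
## Helper name and toy inputs are hypothetical.
compare.modules <- function(list1, list2, cutoff = 0.8, method = "spearman") {
  m1 <- do.call(cbind, list1)         # one column per module, condition 1
  m2 <- do.call(cbind, list2)         # one column per module, condition 2
  r  <- cor(m1, m2, method = method)  # r[i, j] = cor(module i, module j)
  hits <- which(abs(r) > cutoff, arr.ind = TRUE)
  data.frame(i = hits[, 1], j = hits[, 2], r = r[hits], row.names = NULL)
}

set.seed(7)
x <- rnorm(12)
toy1 <- list(x, rnorm(12))
toy2 <- list(rnorm(12), -x, rnorm(12))
compare.modules(toy1, toy2)
```

With the real objects, compare.modules(res1$eigen.molecules, res2$eigen.molecules) should report the same (i, j) pairs as the loop, returned as a data frame rather than printed lines.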

cor(res1$eigen.molecules[[2]], res2$eigen.molecules[[8]], method = "spearman")
#> [1] 0.830303
plot(res1$eigen.molecules[[2]], res2$eigen.molecules[[8]])

plot(res1$eigen.molecules[[21]], res2$eigen.molecules[[24]])

Examine groups of interest graphically

Look at ALL module 21 and AML module 24, the most strongly (negatively) correlated pair of eigen-molecules found above.

plotDiffCorrGroup(golub.df, g1, g2, 21, 24, 1:27, 28:38,
                  scale.center = TRUE, scale.scale = TRUE,
                  ylim = c(-5, 5))

Export the results (FDR < 0.05)

comp.2.cc.fdr(output.file = "res.txt", golub.df[, 1:27], golub.df[, 28:38], threshold = 0.05, save = TRUE)
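To see what an all-pairs scan involves, here is a base-R sketch that applies Fisher’s z-test to every molecule pair of two data matrices (molecules in rows) and keeps pairs below an FDR cutoff. Note the substitution: it uses Benjamini-Hochberg adjusted p-values from p.adjust(), whereas DiffCorr’s output reports local FDR (lfdr) values, so the selected pairs need not match those of comp.2.cc.fdr(). diff.corr.pairs() and the toy data are hypothetical.

```r
## Sketch: differential correlation for all molecule pairs with BH adjustment.
## Helper name and toy data are hypothetical; DiffCorr itself uses local FDR.
diff.corr.pairs <- function(x1, x2, method = "pearson", fdr = 0.05) {
  stopifnot(identical(rownames(x1), rownames(x2)))
  r1 <- cor(t(x1), method = method)           # condition 1 correlations
  r2 <- cor(t(x2), method = method)           # condition 2 correlations
  n1 <- ncol(x1); n2 <- ncol(x2)              # samples per condition
  ut <- which(upper.tri(r1), arr.ind = TRUE)  # each pair once
  a <- r1[ut]; b <- r2[ut]
  z <- (atanh(a) - atanh(b)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 2 * pnorm(-abs(z))                     # two-sided p-values
  out <- data.frame(x = rownames(x1)[ut[, 1]], y = rownames(x1)[ut[, 2]],
                    r1 = a, r2 = b, p = p, q = p.adjust(p, method = "BH"),
                    stringsAsFactors = FALSE)
  out[out$q < fdr, , drop = FALSE]
}

set.seed(3)
s <- rnorm(25)
cond1 <- rbind(a = s + rnorm(25, sd = 0.3), b = s + rnorm(25, sd = 0.3),
               c = rnorm(25), d = rnorm(25))      # a and b correlated
cond2 <- rbind(a = rnorm(30), b = rnorm(30),
               c = rnorm(30), d = rnorm(30))      # no built-in correlation
diff.corr.pairs(cond1, cond2)
```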

Exploring the metabolome data of flavonoid-deficient Arabidopsis

Kusano et al. [5] studied flavonoid-deficient Arabidopsis thaliana (Arabidopsis) mutants and wild-type plants using gas chromatography-mass spectrometry (GC-MS) for metabolite profiling [5-6]. The mutant, transparent testa 4 (tt4), lacks chalcone synthase (CHS), a key enzyme in the flavonoid biosynthesis pathway, and is unable to produce flavonoids, which protect plants from UV-B radiation.

AraMetLeaves dataset

The AraMetLeaves dataset, available in the DiffCorr package, includes metabolite profiles of 37 aerial-part samples, 17 from Columbia-0 wild-type (Col-0) and 20 from tt4 plants, covering a wide range of primary metabolites.

data(AraMetLeaves)
dim(AraMetLeaves)
#> [1] 59 50

The AraMetLeaves dataset contains 59 metabolites (rows) and 50 samples (columns). For comparison with data from aerial parts [5-6], these 59 metabolites were selected as those commonly detected across both datasets using MetMask (https://metmask.sourceforge.net). Note that the matrix also contains a third genotype, mto1 (columns 38 to 50), which is not used in the analysis below. For further information, refer to the help page of AraMetLeaves.

colnames(AraMetLeaves)
#>  [1] "Col0.1"  "Col0.2"  "Col0.3"  "Col0.4"  "Col0.5"  "Col0.6"  "Col0.7" 
#>  [8] "Col0.8"  "Col0.9"  "Col0.10" "Col0.11" "Col0.12" "Col0.13" "Col0.14"
#> [15] "Col0.15" "Col0.16" "Col0.17" "tt4.1"   "tt4.2"   "tt4.3"   "tt4.4"  
#> [22] "tt4.5"   "tt4.6"   "tt4.7"   "tt4.8"   "tt4.9"   "tt4.10"  "tt4.11" 
#> [29] "tt4.12"  "tt4.13"  "tt4.14"  "tt4.15"  "tt4.16"  "tt4.17"  "tt4.18" 
#> [36] "tt4.19"  "tt4.20"  "mto1.1"  "mto1.2"  "mto1.3"  "mto1.4"  "mto1.5" 
#> [43] "mto1.6"  "mto1.7"  "mto1.8"  "mto1.9"  "mto1.10" "mto1.11" "mto1.12"
#> [50] "mto1.13"
?AraMetLeaves

Differential correlation analysis for tt4 mutant and the wild-type plants

Differential correlation analysis between Col-0 and tt4, using log10-transformed metabolite levels, can be performed as follows:

comp.2.cc.fdr(output.file = "Met_DiffCorr_res.txt", 
              log10(AraMetLeaves[, 1:17]),   ## Col-0 (17 samples)
              log10(AraMetLeaves[, 18:37]),  ## tt4 (20 samples)
              method = "pearson",
              threshold = 1.0, save = TRUE)
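Instead of hard-coding the column ranges 1:17 and 18:37, the genotype columns can be selected by their name prefixes (Col0., tt4., mto1.), which avoids silently mixing in mto1 samples if the column order ever changes. In this sketch, genotype.cols() is a hypothetical helper, and a random toy matrix with the same column names as AraMetLeaves stands in for the real data.

```r
## Sketch: select genotype columns by name prefix instead of position.
## genotype.cols() is hypothetical; the toy matrix only mimics the column names.
genotype.cols <- function(x, prefix) {
  grepl(paste0("^", prefix, "\\."), colnames(x))  # e.g. "^Col0\\." matches "Col0.1"
}

toy.names <- c(paste0("Col0.", 1:17), paste0("tt4.", 1:20), paste0("mto1.", 1:13))
toy <- matrix(rexp(59 * 50), nrow = 59, dimnames = list(NULL, toy.names))

col0 <- log10(toy[, genotype.cols(toy, "Col0"), drop = FALSE])
tt4  <- log10(toy[, genotype.cols(toy, "tt4"),  drop = FALSE])
c(Col0 = ncol(col0), tt4 = ncol(tt4))

## With the real data, the same selections could be passed on, e.g.:
## comp.2.cc.fdr(output.file = "Met_DiffCorr_res.txt",
##               log10(AraMetLeaves[, genotype.cols(AraMetLeaves, "Col0")]),
##               log10(AraMetLeaves[, genotype.cols(AraMetLeaves, "tt4")]),
##               method = "pearson", threshold = 1.0, save = TRUE)
```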

As indicated in the ASCII result file “Met_DiffCorr_res.txt,” the DiffCorr package identified significant differential correlations between sinapate and aromatic metabolites in tt4 versus wild-type plants. Consistent with previous findings [2], aromatic metabolites in the shikimate pathway (specifically sinapate, phenylalanine, and tyrosine) exhibited significant correlations in tt4 but not in wild-type plants (Table 1). This suggests a connection to the role of sinapoyl-malate in protecting the flavonoid-deficient tt4 mutant against UV-B irradiation [5], and is consistent with the finding that Arabidopsis compensates for a deficiency in either flavonoid or sinapoyl-malate production by over-accumulating alternative protective compounds [7]. These findings suggest that DiffCorr is applicable not only to transcriptomic data but also to other post-genomic data types, including metabolomic data.

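Table 1. A typical result of pairwise differential correlations from the DiffCorr package, with Col-0 as condition 1 and tt4 as condition 2. r1 and p1 (r2 and p2) are the correlation coefficient and its p-value in condition 1 (condition 2); p (difference) is the p-value for the difference between the two correlations; the lfdr columns give local false discovery rates. The full list can be found in [1].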

| molecule X | molecule Y | r1 | p1 | r2 | p2 | p (difference) | (r1-r2) | lfdr (in cond. 1) | lfdr (in cond. 2) | lfdr (difference) |
|---|---|---|---|---|---|---|---|---|---|---|
| Malate | Threonine | 0.77 | 0.00034 | 0.94 | 1.5E-09 | 0.057 | -0.17 | 0.0049 | 3.2E-08 | 0.76 |
| Malate | Phenylalanine | 0.45 | 0.070 | 0.89 | 1.2E-07 | 0.0086 | -0.44 | 0.20 | 2.2E-06 | 0.64 |
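Assuming p (difference) is the two-sided p-value of Fisher’s z-test (the test named in the Conclusion) with n1 = 17 Col-0 and n2 = 20 tt4 samples, the Malate/Phenylalanine row of Table 1 can be approximately reproduced from its rounded coefficients, where Φ is the standard normal cumulative distribution function:

```latex
z_k = \operatorname{artanh}(r_k) = \frac{1}{2}\ln\frac{1 + r_k}{1 - r_k}, \qquad
Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}, \qquad
p = 2\left(1 - \Phi(|Z|)\right)

z_1 = \operatorname{artanh}(0.45) \approx 0.485, \qquad
z_2 = \operatorname{artanh}(0.89) \approx 1.422, \qquad
Z \approx \frac{0.485 - 1.422}{\sqrt{1/14 + 1/17}} \approx \frac{-0.937}{0.361} \approx -2.60, \qquad
p \approx 0.009
```

This is close to the tabulated 0.0086; the remaining gap is consistent with r1 and r2 being rounded to two decimals in the table.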

Conclusion

The R package DiffCorr provides a straightforward and efficient framework for detecting differential correlations between two conditions in omics data, utilizing Fisher’s z-test. It is a useful tool for inferring potential relationships and identifying biomarker candidates. Based on the concept of “differential network biology,” DiffCorr [1-2] is applicable not only to metabolomic data but also to transcriptome, proteome, and integrated omics datasets.

References

  1. Fukushima, Gene (2013) https://doi.org/10.1016/j.gene.2012.11.028

  2. Fukushima and Nishida, “Using the DiffCorr Package to Analyze and Visualize Differential Correlations in Biological Networks,” book chapter in Challenges of Computational Network Analysis with R (eds. Matthias Dehmer, Yongtang Shi, and Frank Emmert-Streib), Wiley.

  3. Saito et al. Trends Plant Sci (2008) https://doi.org/10.1016/j.tplants.2007.10.006

  4. de la Fuente, Trends Genet (2010) https://doi.org/10.1016/j.tig.2010.05.001

  5. Kusano et al. BMC Syst Biol (2007) https://doi.org/10.1186/1752-0509-1-53

  6. Fukushima et al. BMC Syst Biol (2011) https://doi.org/10.1186/1752-0509-5-1

  7. Kusano et al. Plant J (2011) https://doi.org/10.1111/j.1365-313x.2011.04599.x