Let Y={yi} and X={xi} be the two spatial cross sectional variable, where i=1,2,…,n denotes spatial units (e.g., regions or grid cells), the shadow manifolds of X can be constructed using the different spatial lag values of all spatial units:
Mx=[S(1)(x1)S(1+τ)(x1)⋯S(1+(E−1)τ)(x1)S(1)(x2)S(1+τ)(x2)⋯S(1+(E−1)τ)(x2)⋮⋮⋱⋮S(1)(xn)S(1+τ)(xn)⋯S(1+(E−1)τ)(xn)]
Here, S(j)(xi) denotes the j th-order spatial lag value of spatial unit i, $$ is the step size for the spatial lag order, and E is the embedding dimension and Mx corresponds to the shadow manifolds of X.
With the reconstructed shadow manifolds Mx, the state of Y can be predicted with the state of X through
ˆYs∣Mx=k∑i=1(ωsiYsi∣Mx)
where s represents a spatial unit at which the value of Y needs to be predicted, ˆYs is the prediction result, k is the number of nearest neighbors used for prediction, si is the spatial unit used in the prediction, Ysi is the observation value of Y at si. ωsi is the corresponding weight defined as:
ωsi∣Mx=weight(ψ(Mx,si),ψ(Mx,s))∑L+1i=1weight(ψ(Mx,si),ψ(Mx,s))
where ψ(Mx,si) is the state vector of spatial unit si in the shadow manifold $ M_x$, and weight(∗,∗) is the weight function between two states in the shadow manifold, defined as:
weight(ψ(Mx,si),ψ(Mx,s))=exp(−dis(ψ(Mx,si),ψ(Mx,s))dis(ψ(Mx,s1),ψ(Mx,s)))
where exp is the exponential function and dis(∗,∗) represents the distance function between two states in the shadow manifold defined as:
dis(ψ(Mx,si),ψ(Mx,s))=1EE∑m=1|ψm(Mx,si)−ψm(Mx,s)|
where dis(ψ(Mx,si),ψ(Mx,s)) denotes the average absolute difference between corresponding elements of the two state vectors in the shadow manifold Mx, E is the embedding dimension, and ψm(Mx,si) is the m-th element of the state vector ψ(Mx,si).
The skill of cross-mapping prediction ρ is measured by the Pearson correlation coefficient between the true observations and corresponding predictions, and the confidence interval of ρ can be estimated based the z-statistics with the normal distribution:
ρ=Cov(Y,ˆY∣Mx)√Var(Y)Var(ˆY∣Mx)
t=ρ√n−21−ρ2
where n is the number of observations to be predicted, and
z=12ln(1+ρ1−ρ)
The prediction skill ρ varies by setting different sizes of libraries, which means the quantity of observations used in reconstruction of the shadow manifold. We can use the convergence of ρ to infer the causal associations. For GCCM, the convergence means that ρ increases with the size of libraries and is statistically significant when the library becomes largest.
ρx→y=lim
where \rho_{x \to y} is the correlation after convergence, used to measure the causation effect from Y to X, despite the notation suggesting the reverse direction.
Load the spEDM
package and its county-level population
density data:
library(spEDM)
popd_nb = spdep::read.gal(system.file("case/popd_nb.gal",package = "spEDM"))
## Warning in spdep::read.gal(system.file("case/popd_nb.gal", package = "spEDM")):
## neighbour object has 4 sub-graphs
popd = readr::read_csv(system.file("case/popd.csv",package = "spEDM"))
## Rows: 2806 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (7): x, y, popd, elev, tem, pre, slope
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
popd_sf = sf::st_as_sf(popd, coords = c("x","y"), crs = 4326)
popd_sf
## Simple feature collection with 2806 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346
## Geodetic CRS: WGS 84
## # A tibble: 2,806 × 6
## popd elev tem pre slope geometry
## * <dbl> <dbl> <dbl> <dbl> <dbl> <POINT [°]>
## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879)
## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877)
## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548)
## 4 258. 23 17.4 1555. 0.932 (116.241 30.104)
## 5 211. 101 16.3 1494. 3.34 (116.173 30.495)
## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839)
## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412)
## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514)
## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558)
## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983)
## # ℹ 2,796 more rows
Run GCCM:
startTime = Sys.time()
pd_res = gccm(data = popd_sf,
cause = "pre",
effect = "popd",
libsizes = seq(100, 2800, by = 200),
E = c(2,5),
k = 6,
nb = popd_nb,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 3.026651 mins
pd_res
## libsizes pre->popd popd->pre
## 1 100 0.1712430 0.03310752
## 2 300 0.2734076 0.06281068
## 3 500 0.3373114 0.08695222
## 4 700 0.4129516 0.10770835
## 5 900 0.4659603 0.12410538
## 6 1100 0.5035359 0.13743351
## 7 1300 0.5375355 0.14767426
## 8 1500 0.5659653 0.15726216
## 9 1700 0.5986755 0.16723965
## 10 1900 0.6305533 0.17562842
## 11 2100 0.6614280 0.18275701
## 12 2300 0.6901668 0.18874968
## 13 2500 0.7178725 0.19471162
## 14 2700 0.7433577 0.20087427
Visualize the result:
Load the spEDM
package and its farmland NPP data:
library(spEDM)
npp = terra::rast(system.file("case/npp.tif", package = "spEDM"))
npp
## class : SpatRaster
## dimensions : 404, 483, 5 (nrow, ncol, nlyr)
## resolution : 10000, 10000 (x, y)
## extent : -2625763, 2204237, 1877078, 5917078 (xmin, xmax, ymin, ymax)
## coord. ref. : CGCS2000_Albers
## source : npp.tif
## names : npp, pre, tem, elev, hfp
## min values : 164.00, 384.3409, -47.8194, -122.2004, 0.03390418
## max values : 16606.33, 23878.3555, 263.6938, 5350.4902, 44.90312195
# To save the computation time, we will aggregate the data by 3 times
npp = terra::aggregate(npp, fact = 3, na.rm = TRUE)
npp
## class : SpatRaster
## dimensions : 135, 161, 5 (nrow, ncol, nlyr)
## resolution : 30000, 30000 (x, y)
## extent : -2625763, 2204237, 1867078, 5917078 (xmin, xmax, ymin, ymax)
## coord. ref. : CGCS2000_Albers
## source(s) : memory
## names : npp, pre, tem, elev, hfp
## min values : 187.50, 390.3351, -47.8194, -110.1494, 0.04434316
## max values : 15381.89, 23734.5330, 262.8576, 5217.6431, 42.68803711
# Inspect NA values
terra::global(npp,"isNA")
## isNA
## npp 14815
## pre 14766
## tem 14766
## elev 14760
## hfp 14972
terra::ncell(npp)
## [1] 21735
nnamat = terra::as.matrix(npp[[1]], wide = TRUE)
nnaindice = which(!is.na(nnamat), arr.ind = TRUE)
dim(nnaindice)
## [1] 6920 2
# Select 1500 non-NA pixels to predict:
set.seed(2025)
indices = sample(nrow(nnaindice), size = 1500, replace = FALSE)
libindice = nnaindice[-indices,]
predindice = nnaindice[indices,]
Run GCCM:
startTime = Sys.time()
npp_res = gccm(data = npp,
cause = "pre",
effect = "npp",
libsizes = matrix(rep(seq(10,130,20),2),ncol = 2),
E = c(2,10),
k = 8,
lib = nnaindice,
pred = predindice,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 5.069043 mins
npp_res
## libsizes pre->npp npp->pre
## 1 10 0.1295281 0.1013898
## 2 30 0.2607783 0.2286559
## 3 50 0.3788341 0.2956426
## 4 70 0.4961658 0.3394660
## 5 90 0.5972341 0.3376894
## 6 110 0.7358447 0.3412305
## 7 130 0.8386849 0.3663077
Visualize the result:
plot(npp_res,xlimits = c(5, 135),ylimits = c(0.05,1)) +
ggplot2::theme(legend.justification = c(0.25,1))