The CpDyna package is dedicated to dynamic graphs and implements the algorithm described in (Corneli, Latouche, and Rossi 2018). In that paper, a model based approached is proposed in order to cluster the vertices of a dynamic graph, while detecting multiple change points in the interaction intensities. In this vignette, we show how tu use the package on simulated and real datasets.
set.seed(1)
library(CpDyna)
The package contains 1 simulated dataset and 2 real datasets.
Gnu is a \((3\times40)\) simulated data matrix. It reports on the first row \(40\) interacion times. One interaction time corresponds to an undirected interaction from a source node to a destination node. Source nodes are reported on the second row of Gnu, destination nodes on the third one.
Loading the simulated dataset:
data("Gnu")
A custom partition is created based on the time horizon in Gnu:
tail(Gnu[1,])
## [1] 99.43402 99.44803 99.73759 99.82499 99.87731 99.97434
custom_ptn <- c(1:100)
Call to the function ModFit:
res <- ModFit(Gnu,
4,
4,
MinimalPartition = FALSE,
N_initializations = 20,
custom_partition = custom_ptn)
Looking at the estimated clusters/change points:
res$est_z
## [1] 3 3 3 3 3 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 2 2 2 2
res$est_cp
## [1] 20 30 60 100
and plotting the results:
par(mfrow = c(1,4))
ModPlot(Gnu, res, type = "adjacency")
## Warning: package 'knitr' was built under R version 3.4.3
Loading the real datasets:
data(wtab)
data(stations_df)
In order to take a look at the dynamics, we plot an histogram of the interaction dates:
par(mfrow=c(1,1), bty="n", cex.axis=1.5, cex.lab=1.5)
hist(wtab$Start.Date, breaks = 60, xaxt="n", yaxt="n", main = "", xlab = "time (hours)")
last <- 86340 # roughly 24h
time.seq<-seq(from=1, to=last, by=3600)
axis(1, at=time.seq, lab=c(0:23), lwd=.2)
axis(2, lwd=.2)
Gnu is slightly manipulated to fit the ModFit function format:
Gnu <- as.matrix(wtab)
Gnu <- t(Gnu)
Finally, the ModFit function is called to perform node clustering and change point detection:
N <- max(Gnu[2,], Gnu[3,]) # number of nodes/stations
step <- 900 # in seconds, corresponds to 15 minutes
custom_ptn <- seq(from = step, to = max(Gnu[1,]), by = step) # user defind partition
res <- ModFit(Gnu, 4, 4, eps=10^(-1),MinimalPartition = FALSE, custom_partition = custom_ptn, N_initializations = 1, verbose = TRUE)
The estimated change points are added to the histogram:
cp <- res$est_cp
abline(v = cp, col="red", lwd=1.5)
library(mapview)
library(sp)
new_df <- stations_df[, c(1,2,4,5)]
#converting columns from factor to proper formats
sq<-c(1,3,4)
for(i in 1:length(sq)) new_df[,sq[i]]<-as.numeric(levels(new_df[,sq[i]]))[new_df[,sq[i]]]
new_df[,2]<-as.character(levels(new_df[,2]))[new_df[,2]]
WhoIsWho <- seq(1:N)
match_pos <- res$est_z[match(new_df$id, WhoIsWho)] # for each station (id) in new_df, i look for its position in WhoIsWho and take outz at this position
new_df<-cbind(new_df, match_pos)
pos_na <- which(is.na(new_df$match_pos))
new_df <- new_df[-pos_na, ]
tav <-c ("RoyalBlue3","red", "green3","gold")
sub_df <- new_df[,c(3,4)]
coordinates(sub_df) <- ~long+lat
proj4string(sub_df) <- CRS("+init=epsg:4326")
mapview(sub_df, color=tav[new_df$match_pos], cex= 0.5, alpha = 0.8, lwd=6)
Corneli, Marco, Pierre Latouche, and Fabrice Rossi. 2018. “Multiple Change Points Detection and Clustering in Dynamic Networks.” Statistics and Computing 28 (5): 989–1007. doi:10.1007/s11222-017-9775-1.