library(ripserr)
In this vignette, we generate a point cloud by sampling 2-dimensional points on the circumference of the unit circle; the sample is stored in a variable named circle2d.
# create reproducible dataset
set.seed(42)
unif_angles <- runif(25, 0, 2 * pi)
circle2d <- data.frame(x = cos(unif_angles),
                       y = sin(unif_angles))
# take a peek at first 6 rows
head(circle2d)
#> x y
#> 1 0.8601210 -0.5100900
#> 2 0.9228553 -0.3851468
#> 3 -0.2251251 0.9743299
#> 4 0.4842164 -0.8749483
#> 5 -0.6289353 -0.7774577
#> 6 -0.9928106 -0.1196957
Above, each of the 25 rows represents a single point, and each of the 2 columns holds one Cartesian coordinate.
Column x contains the x-coordinates of the 25 points and column y contains the corresponding y-coordinates.
To confirm that the points in circle2d do lie on the circumference of a circle, we can quickly create a scatterplot.
# scatterplot of circle2d
plot(circle2d, xlab = "x", ylab = "y", main = "2-d circle point cloud")
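The scatterplot gives a visual check; the same property can also be checked numerically. The following is a minimal sketch in base R that rebuilds the point cloud from the code above and tests whether every point sits at distance 1 from the origin. The names radii and on_circle are introduced here purely for illustration.

```r
# Rebuild the point cloud exactly as in the vignette code above
set.seed(42)
unif_angles <- runif(25, 0, 2 * pi)
circle2d <- data.frame(x = cos(unif_angles),
                       y = sin(unif_angles))

# Euclidean distance of each point from the origin
radii <- sqrt(circle2d$x^2 + circle2d$y^2)

# all.equal() compares within a small numerical tolerance,
# which absorbs floating-point rounding in cos() and sin()
on_circle <- isTRUE(all.equal(radii, rep(1, nrow(circle2d))))
print(on_circle)
```

A tolerance-based comparison is used instead of == because exact equality on floating-point values is fragile.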
Given that the points in circle2d are uniformly distributed across the circumference of a circle without any error or noise, we expect a single prominent 1-cycle to be present in its persistent homology.
The Ripser C++ library is wrapped by R using Rcpp, and performs calculations on a Vietoris-Rips complex created with the input point cloud [@Rcpp-paper].
These calculations result in a numeric matrix that contains all the necessary information to characterize the persistence of homological features within circle2d, and can be performed with a single line of R code using ripserr.
# calculate persistent homology
circle.phom <- vietoris_rips(circle2d)
# print first 6 features (ordered by dimension and birth)
head(circle.phom)
#> dimension birth death
#> 1 0 0 0.01509939
#> 2 0 0 0.01846671
#> 3 0 0 0.02540582
#> 4 0 0 0.02859409
#> 5 0 0 0.04180345
#> 6 0 0 0.06699952
# print last 6 features (ordered by dimension and birth)
tail(circle.phom)
#> dimension birth death
#> 20 0 0.000000 0.4582335
#> 21 0 0.000000 0.5059727
#> 22 0 0.000000 0.5793416
#> 23 0 0.000000 0.5812266
#> 24 0 0.000000 0.7170409
#> 25 1 1.026735 1.7859821
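As a rough cross-check on the dimension-0 rows, note that connected components of a Vietoris-Rips complex merge at the same scales at which single-linkage hierarchical clustering merges clusters. The sketch below uses only base R (dist and hclust from the stats package) and is not how ripserr computes its result. It assumes that the death values are reported on the pairwise-distance scale; under that assumption the merge heights should agree with the dimension-0 death column up to floating-point error. The names single_link and h0_deaths are illustrative.

```r
# Rebuild the point cloud exactly as in the vignette code above
set.seed(42)
unif_angles <- runif(25, 0, 2 * pi)
circle2d <- data.frame(x = cos(unif_angles),
                       y = sin(unif_angles))

# Single-linkage clustering on Euclidean distances: each merge height is
# the distance at which two connected components join
single_link <- hclust(dist(circle2d), method = "single")

# n points give n - 1 merges, i.e. n - 1 finite 0-dimensional features
h0_deaths <- sort(single_link$height)
length(h0_deaths)
head(h0_deaths)
```

With 25 points there are 24 merges, which matches the 24 dimension-0 rows in the output above.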
Each row in the homology matrix returned by the vietoris_rips function (variable named circle.phom) represents a single feature (cycle).
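The homology matrix has 3 columns in the following order: dimension (the dimension of the feature, e.g. 0 for a 0-cycle and 1 for a 1-cycle), birth (the radius at which the feature appears), and death (the radius at which the feature disappears).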
Persistence of a feature is generally defined as the length of the interval of the radius within which the feature exists.
This can be calculated by subtracting the second column (birth) from the third column (death) of the homology matrix.
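The subtraction can be sketched as follows. To keep the example self-contained, it does not call ripserr; instead it uses a small data frame, hypothetically named phom_tail, whose values are transcribed from the tail output shown earlier. The persistence column is added here for illustration and is not part of the output of vietoris_rips.

```r
# Last six features, transcribed from the tail() output shown earlier
phom_tail <- data.frame(
  dimension = c(0, 0, 0, 0, 0, 1),
  birth     = c(0, 0, 0, 0, 0, 1.026735),
  death     = c(0.4582335, 0.5059727, 0.5793416,
                0.5812266, 0.7170409, 1.7859821)
)

# Persistence is the length of the interval [birth, death]
phom_tail$persistence <- phom_tail$death - phom_tail$birth

# Sort so that the most persistent features come first
phom_tail[order(phom_tail$persistence, decreasing = TRUE), ]
```

The same two lines should apply to the full circle.phom object, assuming it exposes birth and death as named columns as the printed output suggests.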
As the output of the head and tail functions above confirms, the homology matrix is ordered by dimension, with the birth column used to order features of the same dimension.
As expected for circle2d, the homology matrix contains a single prominent 1-cycle (last line of tail's output).
Although we suspect the feature to be a persistent 1-cycle, comparison with the other features in the homology matrix is required to confirm that it is sufficiently persistent.
This task is done far more easily with an appropriate visualization than by eyeballing the contents of circle.phom.
Below, we use the TDAstats R package to quickly generate a topological barcode for persistent homology visualization.
# plot topological barcode
TDAstats::plot_barcode(as.matrix(circle.phom))
The single blue bar represents the expected 1-cycle in a 2-dimensional, circular point cloud.