Statistical Applications of taxodist

The taxodist distance matrix is a standard dist object and integrates directly with the R ecosystem for multivariate analysis. This vignette shows how to use it alongside vegan and ape for ecological and phylogenetic analyses. All examples use a set of threatened mammals from Brazil spanning five broad taxonomic groups.


Example dataset: threatened mammals of Brazil

taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles",
  "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia",
  "Trichechus", "Mazama", "Blastocerus"
)

The dataset covers xenarthrans (Priodontes, Myrmecophaga), a marsupial (Didelphis), ungulates (Tapirus, Mazama, Blastocerus), primates (Leontopithecus, Brachyteles), carnivores (Chrysocyon, Panthera, Pteronura, Puma), cetaceans (Sotalia, Pontoporia), and a sirenian (Trichechus).

The pairwise distance matrix is computed with a single call. Lineages are cached after the first retrieval, so subsequent analyses on the same taxa are instantaneous.

library(taxodist)
mat <- distance_matrix(taxa_brazil)
print(mat)
#>                Priodontes Myrmecophaga Chrysocyon    Tapirus  Didelphis
#> Myrmecophaga   0.01639344
#> Chrysocyon     0.01694915   0.01694915
#> Tapirus        0.01694915   0.01694915 0.01587302
#> Didelphis      0.01754386   0.01754386 0.01754386 0.01754386
#> Leontopithecus 0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Brachyteles    0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Panthera       0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Pteronura      0.01694915   0.01694915 0.01470588 0.01587302 0.01754386
#> Puma           0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Sotalia        0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Pontoporia     0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Trichechus     0.01666667   0.01666667 0.01694915 0.01694915 0.01754386
#> Mazama         0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Blastocerus    0.01694915   0.01694915 0.01587302 0.01562500 0.01754386

Didelphis shows the largest distances to all other taxa (0.01754), consistent with the early divergence of marsupials from placental mammals. Carnivores (Panthera, Pteronura, Puma) show smaller distances among themselves, reflecting a more recent common ancestor within Carnivora.


Phylogenetic tree with ape

taxo_cluster() performs hierarchical clustering on the distance matrix and returns an object that ape can convert directly to a phylo tree for publication-quality visualisation.

library(ape)

cl   <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)

plot(tree,
     main      = "Threatened Mammals of Brazil",
     cex       = 0.85,
     tip.color = "gray20")

The phylo object is fully compatible with all ape tree manipulation and plotting functions, including nodelabels(), edgelabels(), and export to Newick format via write.tree().

ape::write.tree(tree, file = "taxa_brazil.nwk")

Taxonomic diversity with vegan

Clarke and Warwick indices (taxondive)

vegan::taxondive() computes a family of taxonomic diversity indices (Clarke and Warwick 1998, 2001) from a community matrix and a taxonomic distance matrix. These indices capture not just species richness but how distantly related the species in a community are: a community of closely related species scores lower than one with the same richness but spanning multiple orders.

The function expects a community matrix (sites x species) and the dist object from distance_matrix(). Here we define three hypothetical communities, each dominated by one broad taxonomic group.

set.seed(123)
library(vegan)

comm <- matrix(c(
  1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   # xenarthrans + marsupial
  0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,   # primates + carnivores
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1    # cetaceans + sirenian + ungulates
), nrow = 3, byrow = TRUE)

colnames(comm) <- taxa_brazil
rownames(comm) <- c("community_A", "community_B", "community_C")

vegan::taxondive(comm, mat)
#>               Species      Delta       Delta*     Lambda+     Delta+     S Delta+
#> community_A  3.0000e+00  1.7160e-02  1.7160e-02  2.9410e-07  1.7160e-02   0.0515
#> community_B  5.0000e+00  1.5905e-02  1.5905e-02  8.8325e-07  1.5905e-02   0.0795
#> community_C  5.0000e+00  1.5750e-02  1.5750e-02  1.1834e-06  1.5750e-02   0.0788
#> Expected                -9.9265e-02  1.6544e-02              1.6457e-02         

The key indices are:

Delta (\(\Delta\)): mean taxonomic distance between all pairs of species in the community.

Delta* (\(\Delta^*\)): mean pairwise distance excluding self-comparisons; comparable across communities of different sizes.

Delta+ (\(\Delta^+\)): mean pairwise distance weighted by species abundances. For presence-absence data this equals \(\Delta^*\).

Lambda+ (\(\Lambda^+\)): variance in taxonomic distances, measuring how evenly a community spans the taxonomic tree.

community_A, dominated by xenarthrans and a marsupial, scores the highest \(\Delta\) (0.01673), reflecting deeper divergences among its members. community_B (primates and carnivores) scores the lowest (0.01591), as its members share more recent common ancestors within Placentalia.


Mantel test with vegan

The Mantel test assesses the correlation between two distance matrices by permutation. A typical application is testing whether taxonomic distance correlates with geographic distance, a signature of phylogeographic structure.

Here we use simulated coordinates for illustration. In practice, replace coords with observed latitude and longitude values.

set.seed(42)
coords <- matrix(rnorm(30), ncol = 2)
rownames(coords) <- taxa_brazil
geo_dist <- dist(coords)

vegan::mantel(mat, geo_dist)
#> Mantel statistic based on Pearson's product-moment correlation
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist)
#>
#> Mantel statistic r: -0.0569
#>       Significance: 0.653
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99% 
#> 0.188 0.237 0.272 0.336 
#> Permutation: free
#> Number of permutations: 999

The non-significant result (r = -0.27, p = 0.972) is expected with random coordinates. For non-normal distance distributions, "spearman" is more robust than the default "pearson".

vegan::mantel(mat, geo_dist, method = "spearman", permutations = 9999)
#> Mantel statistic based on Spearman's rank correlation rho 
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist, method = "spearman",
#>     permutations = 9999)
#>
#> Mantel statistic r: -0.07405
#>       Significance: 0.672
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99%
#> 0.189 0.244 0.293 0.353 
#> Permutation: free
#> Number of permutations: 9999

PERMANOVA with vegan

PERMANOVA (Anderson 2001) tests whether groups of taxa differ significantly in taxonomic distance space. It partitions total variance in the distance matrix into within-group and between-group components and assesses significance by permutation, with no assumption of multivariate normality.

groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate",  "marsupial",
  "primate",   "primate",
  "carnivore", "carnivore", "carnivore",
  "cetacean",  "cetacean",
  "sirenian",  "ungulate",  "ungulate"
)

vegan::adonis2(mat ~ groups)
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#>
#> vegan::adonis2(formula = mat ~ groups)
#>          Df  SumOfSqs      R2      F Pr(>F)
#> Model     6 0.00100049 0.52646 1.4823  0.001 ***
#> Residual  8 0.00089992 0.47354  
#> Total    14 0.00190041 1.00000   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The taxonomic grouping explains \(49.6\%\) of the total variance in the distance matrix (\(\text{R}^2 = 0.496\), \(\text{p} = 0.001\)). This confirms that the broad mammalian orders occupy distinct regions of taxonomic distance space, consistent with their deep evolutionary divergences.

For pairwise comparisons between groups after a significant PERMANOVA, see pairwise.adonis2() in the pairwiseAdonis package.


A complete workflow

The following script chains all steps from taxon names to multivariate analysis.

library(taxodist)
library(vegan)
library(ape)

# 1. define taxa
taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles",
  "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia",
  "Trichechus", "Mazama", "Blastocerus"
)

# 2. compute distance matrix
mat <- distance_matrix(taxa_brazil)

# 3. hierarchical clustering and phylo plot
cl   <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)
plot(tree, main = "Threatened Mammals of Brazil", cex = 0.85)

# 4. PCoA ordination
ord <- taxo_ordinate(mat)
plot(ord, main = "PCoA: Taxonomic Distance Space")

# 5. PERMANOVA
groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate",  "marsupial",
  "primate",   "primate",
  "carnivore", "carnivore", "carnivore",
  "cetacean",  "cetacean",
  "sirenian",  "ungulate",  "ungulate"
)
vegan::adonis2(mat ~ groups)

References

Anderson, M.J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26, 32–46.

Clarke, K.R. and Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523–531.

Clarke, K.R. and Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Marine Ecology Progress Series, 216, 265–278.

Brands, S.J. (1989 onwards). Systema Naturae 2000. Amsterdam, The Netherlands. Retrieved from The Taxonomicon, http://taxonomicon.taxonomy.nl.