
High Dimensional Data Visualization

Wayne Oldford and Zehao Xu

2024-04-08

Serialaxes coordinate

Serial axes coordinates are a method for visualizing p-dimensional geometry and multivariate data. As the name suggests, all axes are shown in series. The axes can span a finite p-dimensional space or be transformed to an infinite space (e.g. by a Fourier transformation).

In the finite p-dimensional space, the axes can be displayed in parallel, which is known as parallel coordinates; alternatively, they can be displayed in a polar layout, often known as radial coordinates or a radar plot. In the infinite space, a mathematical transformation is applied. More details are given in the sub-section Infinite axes.

A point in Euclidean p-space $\mathbb{R}^p$ is represented as a polyline in serial axes coordinates. This induces a point ↔ line duality in the Euclidean plane $\mathbb{R}^2$ (A. Inselberg and Dimsdale 1990).
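To make the point-to-polyline representation concrete, here is a minimal base R sketch. It is not how ggmulti draws the plot internally; the rescaling of each variate to [0, 1] and the names `S` and `p_dim` are assumptions made for illustration.

```r
# Base R sketch (not ggmulti's implementation): each row of the data
# becomes one polyline across p parallel axes.
X <- as.matrix(iris[, 1:4])   # p = 4 numeric variates
p_dim <- ncol(X)

# Assumption: rescale every variate to [0, 1] so all axes share one height.
col_min <- apply(X, 2, min)
col_max <- apply(X, 2, max)
S <- sweep(sweep(X, 2, col_min, "-"), 2, col_max - col_min, "/")

# Axis j sits at horizontal position j; observation i is the polyline
# through the points (j, S[i, j]) for j = 1, ..., p.
matplot(seq_len(p_dim), t(S), type = "l", lty = 1,
        col = as.integer(iris$Species), xaxt = "n",
        xlab = "", ylab = "scaled value")
axis(1, at = seq_len(p_dim), labels = colnames(X))
```

Each of the 150 observations in iris, a single point in $\mathbb{R}^4$, appears as one line with four vertices.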

Before we start, a couple of things should be noticed:

Finite axes

Suppose we are interested in the data set iris. A parallel coordinate chart can be created as follows:

library(ggmulti)
# parallel axes plot
ggplot(iris, 
       mapping = aes(
         Sepal.Length = Sepal.Length,
         Sepal.Width = Sepal.Width,
         Petal.Length = Petal.Length,
         Petal.Width = Petal.Width,
         colour = factor(Species))) +
  geom_path(alpha = 0.2)  + 
  coord_serialaxes() -> p
p

A histogram layer can be displayed by adding the layer geom_histogram:

p + 
  geom_histogram(alpha = 0.3, 
                 mapping = aes(fill = factor(Species))) + 
  theme(axis.text.x = element_text(angle = 30, hjust = 0.7))

A density layer can be drawn by adding the layer geom_density:

p + 
  geom_density(alpha = 0.3, 
               mapping = aes(fill = factor(Species)))

A parallel coordinate plot can be converted to a radial coordinate plot by setting axes.layout = "radial" in the function coord_serialaxes. For an existing plot object, the layout can be switched in place:

p$coordinates$axes.layout <- "radial"
p

Note that layers such as geom_histogram and geom_density are not yet implemented in the radial coordinate.
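As a sketch of the other route mentioned above, the layout can also be passed to coord_serialaxes when the plot is built, rather than modifying the coordinates of an existing object. The name `p_radial` is introduced here only for illustration, and the example assumes ggmulti is installed.

```r
library(ggplot2)
library(ggmulti)

# Same mapping as the parallel axes plot, but the radial layout is
# requested directly through the `axes.layout` argument.
p_radial <- ggplot(iris,
                   mapping = aes(
                     Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Petal.Length = Petal.Length,
                     Petal.Width = Petal.Width,
                     colour = factor(Species))) +
  geom_path(alpha = 0.2) +
  coord_serialaxes(axes.layout = "radial")
p_radial
```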

Infinite axes

The Andrews plot (Andrews 1972) projects multi-response observations onto a function f(t), defined as the inner product of the observed responses with orthonormal functions of t:

$$f_{y_i}(t) = \langle y_i, a_t \rangle$$

where $y_i$ is the ith response vector and $a_t$ is a set of functions that are orthonormal on some interval. Andrews suggests using the Fourier basis

$$a_t = \left\{ \frac{1}{\sqrt{2}}, \sin(t), \cos(t), \sin(2t), \cos(2t), \ldots \right\}$$

which are orthonormal on the interval (-π, π). In other words, we can map a point in p-dimensional space to a function on (-π, π). The following example shows how to construct an Andrews plot.

p <- ggplot(iris, 
            mapping = aes(Sepal.Length = Sepal.Length,
                          Sepal.Width = Sepal.Width,
                          Petal.Length = Petal.Length,
                          Petal.Width = Petal.Width,
                          colour = Species)) +
  geom_path(alpha = 0.2, 
            stat = "dotProduct")  + 
  coord_serialaxes()
p

A quantile layer can be displayed on top:

p + 
 geom_quantiles(stat = "dotProduct",
                quantiles = c(0.25, 0.5, 0.75),
                linewidth = 2,
                linetype = 2) 
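To see what the inner product with the Fourier basis computes, the Andrews curves can be sketched in base R. This is an illustration under stated assumptions, not the code behind stat = "dotProduct": the helper `andrews_basis` is hypothetical, and the data are used unscaled, so the curves may differ in scale from the ggmulti figure.

```r
# Hypothetical helper: evaluate the first p Fourier basis functions
# 1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ... at each value of t.
andrews_basis <- function(t, p) {
  A <- matrix(0, nrow = length(t), ncol = p)
  for (j in seq_len(p)) {
    k <- j %/% 2                       # frequency: 0, 1, 1, 2, 2, ...
    A[, j] <- if (j == 1) {
      1 / sqrt(2)
    } else if (j %% 2 == 0) {
      sin(k * t)
    } else {
      cos(k * t)
    }
  }
  A
}

t_grid <- seq(-pi, pi, length.out = 200)
Y <- as.matrix(iris[, 1:4])            # one row per observation y_i

# Row i of `curves` holds f_{y_i}(t) = <y_i, a_t> over the grid of t values.
curves <- Y %*% t(andrews_basis(t_grid, ncol(Y)))

matplot(t_grid, t(curves), type = "l", lty = 1,
        col = as.integer(iris$Species),
        xlab = "t", ylab = "f(t)")
```

Each observation contributes one curve on (-π, π), and observations that are close in the original four dimensions produce curves that stay close over the whole interval.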

A couple of things should be noticed:

An alternative way to create a serial axes plot

Rather than calling the function coord_serialaxes, an alternative way to create a serial axes plot is to add a geom_serialaxes_... layer to the plot.

For example, Figures 1 to 4 can be created by calling

g <- ggplot(iris, 
            mapping = aes(Sepal.Length = Sepal.Length,
                          Sepal.Width = Sepal.Width,
                          Petal.Length = Petal.Length,
                          Petal.Width = Petal.Width,
                          colour = Species))
g + geom_serialaxes(alpha = 0.2)
g + 
  geom_serialaxes(alpha = 0.2) + 
  geom_serialaxes_hist(mapping = aes(fill = Species), alpha = 0.2)
g + 
  geom_serialaxes(alpha = 0.2) + 
  geom_serialaxes_density(mapping = aes(fill = Species), alpha = 0.2)
# radial axes can be created by 
# calling `coord_radial()` 
# this is slightly different, check it out! 
g + 
  geom_serialaxes(alpha = 0.2) + 
  coord_radial()

Figures 5 and 7 can be created by setting the “stat” and “transform” arguments of geom_serialaxes; for Figure 6, geom_serialaxes_quantile can be added to create a serial axes quantile layer.

Some slight differences should be noted here:

# The serial axes are `Sepal.Length`, `Sepal.Width`, `Sepal.Length`
# With meaningful labels
ggplot(iris, 
       mapping = aes(Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Sepal.Length = Sepal.Length)) + 
  geom_path() + 
  coord_serialaxes()

# The serial axes are `Sepal.Length`, `Sepal.Length`
# No meaningful labels
ggplot(iris, 
       mapping = aes(Sepal.Length = Sepal.Length,
                     Sepal.Width = Sepal.Width,
                     Sepal.Length = Sepal.Length)) + 
  geom_serialaxes()

Also, if the data have many dimensions, typing each variate in the mapping aesthetics is tedious. The parameter axes.sequence is provided to specify the axes instead. For example, a serial axes plot can be created as

ggplot(iris) + 
  geom_path() + 
  coord_serialaxes(axes.sequence = colnames(iris)[-5])

Finally, please report bugs here. Enjoy high dimensional visualization! “Don’t panic… Just do it in ‘serial’” (Alfred Inselberg 1999).

References

Andrews, David F. 1972. “Plots of High-Dimensional Data.” Biometrics, 125–136.
Gnanadesikan, Ram. 2011. “Methods for Statistical Data Analysis of Multivariate Observations.” In, 321:207–218. John Wiley & Sons.
Inselberg, A., and B. Dimsdale. 1990. “Parallel Coordinates: A Tool for Visualizing Multi-Dimensional Geometry.” In Proceedings of the First IEEE Conference on Visualization: Visualization ’90, 361–378.
Inselberg, Alfred. 1999. “Don’t Panic... Just Do It in Parallel!” Computational Statistics 14 (1): 53–77.