Serial axes coordinate is a methodology for visualizing the p-dimensional geometry and multivariate data. As the name suggested, all axes are shown in serial. The axes can be a finite p space or transformed to an infinite space (e.g. Fourier transformation).
In the finite p space, all axes
can be displayed in parallel which is known as the parallel coordinate;
also, all axes can be displayed under a polar coordinate that is often
known as the radial coordinate or radar plot. In the infinite space, a
mathematical transformation is often applied. More details will be
explained in the sub-section Infinite axes
A point in Euclidean p-space Rp is represented as a polyline in serial axes coordinate, it is found that a point <–> line duality is induced in the Euclidean plane R2 (A. Inselberg and Dimsdale 1990).
Before we start, a couple of things should be noticed:
In the serial axes coordinate system, no x
or
y
(even group
) are required; but other
aesthetics, such as colour
, fill
,
size
, etc, are accommodated.
Layer geom_path
is used to draw the serial lines;
layer geom_histogram
, geom_quantiles
, and
geom_density
are used to draw the histograms, quantiles
(not quantile
regression) and densities. Users can
also customize their own layer (i.e. geom_boxplot
,
geom_violin
, etc) by editing function
add_serialaxes_layers
.
Suppose we are interested in the data set iris
. A
parallel coordinate chart can be created as followings:
library(ggmulti)
# parallel axes plot
ggplot(iris,
mapping = aes(
Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Petal.Length = Petal.Length,
Petal.Width = Petal.Width,
colour = factor(Species))) +
geom_path(alpha = 0.2) +
coord_serialaxes() -> p
p
A histogram layer can be displayed by adding layer
geom_histogram
p +
geom_histogram(alpha = 0.3,
mapping = aes(fill = factor(Species))) +
theme(axis.text.x = element_text(angle = 30, hjust = 0.7))
A density layer can be drawn by adding layer
geom_density
A parallel coordinate can be converted to radial coordinate by
setting axes.layout = "radial"
in function
coord_serialaxes
.
Note that: layers, such as
geom_histogram
, geom_density
, etc, are not
implemented in the radial coordinate yet.
Andrews (1972) plot is a way to project multi-response observations into a function f(t), by defining f(t) as an inner product of the observed values of responses and orthonormal functions in t
fyi(t)=<yi,at>
where yi is the ith responses and at is the orthonormal functions under certain interval. Andrew suggests to use the Fourier transformation
at={1√2,sin(t),cos(t),sin(2t),cos(2t),...}
which are orthonormal on interval (−π,π). In other word, we can project a p dimensional space to an infinite (−π,π) space. The following figure illustrates how to construct an “Andrew’s plot”.
p <- ggplot(iris,
mapping = aes(Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Petal.Length = Petal.Length,
Petal.Width = Petal.Width,
colour = Species)) +
geom_path(alpha = 0.2,
stat = "dotProduct") +
coord_serialaxes()
p
A quantile layer can be displayed on top
p +
geom_quantiles(stat = "dotProduct",
quantiles = c(0.25, 0.5, 0.75),
linewidth = 2,
linetype = 2)
A couple of things should be noticed:
mapping aesthetics is used to define the p dimensional space, if not provided, all
columns in the dataset ‘iris’ will be transformed. An alternative way to
determine the p dimensional space
to set parameter axes.sequence
in each layer or in
coord_serialaxes
.
To construct a dot product serial axes plot, say Fourier
transformation, “Andrew’s plot”, we need to set the parameter
stat
in geom_path
to “dotProduct”. The default
transformation function is the Andrew’s (function andrews
).
Users can customize their own, for example, Tukey suggests the following
projected space
at={cos(t),cos(√2t),cos(√3t),cos(√5t),...}
where t∈[0,kπ] (Gnanadesikan 2011).
tukey <- function(p = 4, k = 50 * (p - 1), ...) {
t <- seq(0, p* base::pi, length.out = k)
seq_k <- seq(p)
values <- sapply(seq_k,
function(i) {
if(i == 1) return(cos(t))
if(i == 2) return(cos(sqrt(2) * t))
Fibonacci <- seq_k[i - 1] + seq_k[i - 2]
cos(sqrt(Fibonacci) * t)
})
list(
vector = t,
matrix = matrix(values, nrow = p, byrow = TRUE)
)
}
ggplot(iris,
mapping = aes(Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Petal.Length = Petal.Length,
Petal.Width = Petal.Width,
colour = Species)) +
geom_path(alpha = 0.2, stat = "dotProduct", transform = tukey) +
coord_serialaxes()
Note that: Tukey’s suggestion, element at can “cover” more spheres in p dimensional space, but it is not orthonormal.
Rather than calling function coord_serialaxes
, an
alternative way to create a serial axes object is to add a
geom_serialaxes_...
object in our model.
For example, Figure 1 to 4 can be created by calling
g <- ggplot(iris,
mapping = aes(Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Petal.Length = Petal.Length,
Petal.Width = Petal.Width,
colour = Species))
g + geom_serialaxes(alpha = 0.2)
g +
geom_serialaxes(alpha = 0.2) +
geom_serialaxes_hist(mapping = aes(fill = Species), alpha = 0.2)
g +
geom_serialaxes(alpha = 0.2) +
geom_serialaxes_density(mapping = aes(fill = Species), alpha = 0.2)
# radial axes can be created by
# calling `coord_radial()`
# this is slightly different, check it out!
g +
geom_serialaxes(alpha = 0.2) +
geom_serialaxes(alpha = 0.2) +
coord_radial()
Figure 5 and 7 can be created by setting “stat” and “transform” in
geom_serialaxes
; to Figure 6,
geom_serialaxes_quantile
can be added to create a serial
axes quantile layer.
Some slight difference should be noticed here:
One benefit of calling coord_serialaxes
rather than
geom_serialaxes_...
is that coord_serialaxes
can accommodate duplicated axes in mapping aesthetics (e.g. Eulerian
path, Hamiltonian path, etc). However, in
geom_serialaxes_...
, duplicated axes will be
omitted.
Meaningful axes labels in coord_serialaxes
can be
created automatically, while in geom_serialaxes_...
, users
have to set axes labels by ggplot2::scale_x_continuous
or
ggplot2::scale_y_continuous
manually.
As we turn the serial axes into interactive graphics (via package
loon.ggplot),
serial axes lines in coord_serialaxes()
could be turned as
interactive but in geom_serialaxes_...
all objects are
static.
# The serial axes is `Sepal.Length`, `Sepal.Width`, `Sepal.Length`
# With meaningful labels
ggplot(iris,
mapping = aes(Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Sepal.Length = Sepal.Length)) +
geom_path() +
coord_serialaxes()
# The serial axes is `Sepal.Length`, `Sepal.Length`
# No meaningful labels
ggplot(iris,
mapping = aes(Sepal.Length = Sepal.Length,
Sepal.Width = Sepal.Width,
Sepal.Length = Sepal.Length)) +
geom_serialaxes()
Also, if the dimension of data is large, typing each variate in
mapping aesthetics is such a headache. Parameter
axes.sequence
is provided to determine the axes. For
example, a serialaxes
object can be created as
At very end, please report bugs here. Enjoy the high dimensional visualization! “Don’t panic… Just do it in ‘serial’” (Alfred Inselberg 1999).