The R package ewens gives functions for the
Ewens
sampling distribution. The Ewens distribution is a probability
distribution over partitions of an integer \(n\). It is governed by a single
non-negative parameter, \(\theta\). The
larger the value of \(\theta\), the
more parts there are in the partition. If \(\theta\) is equal to zero, there is just a
single partition with size \(n\).
The Ewens distribution can be thought of as a distribution over partitions of an integer, but it originated in population genetics to give the probability of alleles in a sample, and sampling from the Ewens distribution involves running through the Chinese Restaurant Process \(n\) times. For further information on the Ewens sampling formula, see Simon Tavaré’s The magical Ewens sampling formula or Harry Crane’s The Ubiquitous Ewens Sampling Formula.
You can install the development version of ewens like
so:
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
remotes::install_github("chrishanretty/ewens")Suppose I wish to divide a class of 24 into groups of different
sizes. I set \(\theta\) to one, and
call rewens
set.seed(2923)
library(ewens)
allocation <- rewens(24, theta = 1)The vector allocation gives the groups to which each
student has been assigned. The sizes of the groups can be recovered by
tabulating the result.
table(allocation)Suppose a student challenges their allocation into a particular group. What can we say about the probability of the particular allocation, given \(\theta = 1\)?
pr <- dewens(allocation, theta = 1)
prThe probability of drawing this particular allocation is very low – but then again, given that there are 1575 different partitions of twenty-four, we should not be too surprised.
It is also possible to calculate the probability that there are \(x\) parts in a partition of \(n\) given \(\theta\). This is done using
dewens_k. Thus, the probability of creating the groups
above is given by
dewens_k(k = length(unique(allocation)),
n = 24,
theta = 1)