Recall that a random variable X has (univariate) negative binomial law with parameters κ>0,0<p<1, i.e., X∼NB(κ,p) if its probability mass function is given by P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}.
A random vector {\bf X}=(X_1,\dots,X_n)' is said to follow the multivariate negative binomial distribution with parameters \kappa, p_1, \dots, p_n if its probability mass function is given by P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa}, where, for i=1,\dots,n, x_i\in\{0,1,\dots\}, 0<p_i<1 such that \sum_{i=1}^np_i<1 and \kappa>0.
We note that the function stats::rnbinom
can be used to simulate from the univariate negative binomial distribution. The trawl
package introduces the function Bivariate_NBsim
which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate X_1 from the univariate negative binomial distribution NB(\kappa,p_1/(1-p_2)). Second, we simulate X_2|X_1=x_1 from the univariate negative binomial distribution NB(\kappa+x_1,p_2), see for instance (Dunn 1967).
###Example
set.seed(1)
<- 3
kappa<- 0.1
p1 <- 0.85
p2 <- p1+p2
p <-1-p1-p2
p0
<- 10000
N
#Simulate from the bivariate negative binomial distribution
<- trawl::Bivariate_NBsim(N,kappa,p1,p2)
y
#Compare the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 5.9653
*p1/(1-p)
kappa#> [1] 6
#Compare the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 18.0047
*p1*(1-p2)/(1-p)^2
kappa#> [1] 18
#Compare the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 50.9742
*p2/(1-p)
kappa#> [1] 51
#Compare the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 926.3358
*p2*(1-p1)/(1-p)^2
kappa#> [1] 918
#Compare the empirical and theoretical correlation between the two components
::cor(y[,1],y[,2])
stats#> [1] 0.7891007
*p2/(p0+p1)*(p0+p2))^(1/2)
(p1#> [1] 0.7141428
We say that a vector {\bf X}=(X_1,\dots,X_n)' follows the multivariate logarithmic series distribution (LSD), see, e.g., (Patil and Bildikar 1967). {\bf X} \sim \text{LSD}(p_1,\ldots,p_n), where 0<p_i<1, p:=\sum_{i=1}^np_i <1 if for {\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}, if its probability mass function is given by
P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.
Note that each component X_i follows the modified univariate logarithmic distribution with parameters \tilde p_i = p_i/(1-p+p_i) and \delta_i= \ln(1-p+p_i)/\ln(1-p), i.e., X_i\sim\text{ModLSD}(\tilde p_i, \delta_i) with
P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\
(1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.
Simulations from the univariate LSD can be carried out using the function Runuran::urlogarithmic
. The trawl
package implements the functions Bivariate_LSDsim
and Trivariate_LSDsim
to simulate from both the bivariate and the trivariate logarithmic series distribution.
The function Bivariate_NBsim
can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector {\bf X}=(X_1,X_2)' following the bivariate logarithmic series distribution with parameters 0<p_1, p_2<1 with p:=p_1+p_2<1 is given by P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!}
\frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}}, for x_1,x_2=0,1,2,\dots such that x_1+x_2>0.
The simulation proceeds in two steps: First, X_1 is simulated from the modified logarithmic distribution with parameters \tilde p_1=p_1/(1-p_2) and \delta_1=\ln(1-p_2)/\ln(1-p). Then we simulate X_2 conditional on X_1. We note that X_2|X_1=x_1 follows the logarithmic series distribution with parameter p_2 when x_1=0, and the negative binomial distribution with parameters (x_1,p_2) when x_1>0.
Next we provide an example of a simulation from the bivariate LSD and we showcase the functions ModLSD_Mean
, ModLSD_Var
, BivLSD_Cor
and BivLSD_Cov
which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.
set.seed(1)
<-0.15
p1<-0.3
p2
<-10000
N
#Simulate N realisations from the bivariate LSD
<-trawl::Bivariate_LSDsim(N, p1, p2)
y
#Compute the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 0.4575
::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
trawl#> [1] 0.45619
#Compute the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 0.8995
::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
trawl#> [1] 0.91238
#Compute the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 0.3686306
::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
trawl#> [1] 0.3724961
#Compute the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 0.5690567
::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
trawl#> [1] 0.5776045
##Compute the empirical and theoretical correlation between the two components
::cor(y[,1],y[,2])
stats#> [1] -0.3961486
::BivLSD_Cor(p1,p2)
trawl#> [1] -0.3608673
##Compute the empirical and theoretical covariance between the two components
::cov(y[,1],y[,2])
stats#> [1] -0.1814394
::BivLSD_Cov(p1,p2)
trawl#> [1] -0.1673877
The function Trivariate_NBsim
can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, X_1 is simulated from the modified logarithmic distribution with parameters \tilde p_1=p_1/(1-p_2-p_3) and \delta_1=\ln(1-p_2-p_3)/\ln(1-p). Then we simulate (X_2,X_3)' conditional on X_1. We note that (X_2,X_3)'|X_1=x_1 follows the bivariate logarithmic series distribution with paramaters (p_2,p_3) when x_1=0, and the bivariate negative binomial distribution with parameters (x_1,p_2,p_3) when x_1>0.
set.seed(1)
<-0.15
p1<-0.25
p2<-0.55
p3
<- 10000
N
#Simulate N realisations from the bivariate LSD
<-trawl::Trivariate_LSDsim(N, p1, p2, p3)
y
#Compute the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 1.0152
::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
trawl#> [1] 1.001425
#Compute the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 1.6857
::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
trawl#> [1] 1.669041
#Compute the empirical and theoretical mean of the third component
::mean(y[,3])
base#> [1] 3.7275
::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
trawl#> [1] 3.67189
#Compute the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 2.999069
::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
trawl#> [1] 3.002847
#Compute the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 7.255041
::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
trawl#> [1] 7.228548
#Compute the empirical and theoretical variance of the third component
::var(y[,3])
stats#> [1] 30.87613
::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
trawl#> [1] 30.5799
#Computing the bivariate covariances and correlations
#Cor(X1,X2):
<- base::log(1-p3)/base::log(1-p1-p2-p3)
delta <-p1/(1-p3)
hatp1 <-p2/(1-p3)
hatp2
::cov(y[,1],y[,2])
stats#> [1] 3.316309
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 3.335704
::cor(y[,1],y[,2])
stats#> [1] 0.7109545
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.6041258
#Cor(X1,X3):
<- log(1-p2)/log(1-p1-p2-p3)
delta <-p1/(1-p2)
hatp1 <-p3/(1-p2)
hatp2
::cov(y[,1],y[,3])
stats#> [1] 7.287671
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 7.338549
::cor(y[,1],y[,3])
stats#> [1] 0.7573281
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.7220044
#Cor(X2,X3):
<- log(1-p1)/log(1-p1-p2-p3)
delta <-p2/(1-p1)
hatp1 <-p3/(1-p1)
hatp2
::cov(y[,2],y[,3])
stats#> [1] 12.31899
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 12.23092
::cor(y[,2],y[,3])
stats#> [1] 0.8230829
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.7969081
##References