The multinomial probit is obtained within the same modeling framework that we used to present the random utility model. The utility of an alternative is still the sum of two components: $U_j = V_j + \epsilon_j$, but the joint distribution of the error terms is now multivariate normal, with mean 0 and a covariance matrix denoted $\Omega$.
Alternative $l$ is chosen if:
$$
\begin{cases}
U_1 - U_l = (V_1 - V_l) + (\epsilon_1 - \epsilon_l) < 0 \\
U_2 - U_l = (V_2 - V_l) + (\epsilon_2 - \epsilon_l) < 0 \\
\quad\vdots \\
U_J - U_l = (V_J - V_l) + (\epsilon_J - \epsilon_l) < 0
\end{cases}
$$
which implies, denoting $V^l_j = V_j - V_l$:
$$
\begin{cases}
\epsilon^l_1 = \epsilon_1 - \epsilon_l < -V^l_1 \\
\epsilon^l_2 = \epsilon_2 - \epsilon_l < -V^l_2 \\
\quad\vdots \\
\epsilon^l_J = \epsilon_J - \epsilon_l < -V^l_J
\end{cases}
$$
The initial vector of errors $\epsilon$ is transformed as follows:
$$\epsilon^l = M^l \epsilon$$
where the transformation matrix $M^l$ is a $(J-1)\times J$ matrix obtained by inserting into an identity matrix an $l^{\text{th}}$ column of $-1$. For example, if $J=4$ and $l=3$:
$$
M^3 = \begin{pmatrix}
1 & 0 & -1 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & -1 & 1
\end{pmatrix}
$$
The covariance matrix of the error differences is then:
$$V(\epsilon^l) = V(M^l \epsilon) = M^l V(\epsilon) {M^l}^{\top} = M^l \Omega {M^l}^{\top} = \Omega^l$$
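As a quick check, here is a minimal R sketch (the helper make_Ml and the example Omega are ours, for illustration only) that builds $M^l$ and computes $\Omega^l$:

# Build the (J-1) x J transformation matrix M^l: an identity
# matrix with an l-th column of -1 inserted.
make_Ml <- function(J, l) {
  M <- matrix(0, nrow = J - 1, ncol = J)
  M[, -l] <- diag(J - 1)
  M[, l] <- -1
  M
}
make_Ml(4, 3)    # reproduces the M^3 example above

# Covariance matrix of the error differences for an arbitrary Omega
Omega <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.2, 0.3,
                  0.2, 0.3, 0.8), 3, 3)
Ml <- make_Ml(3, 1)
Ml %*% Omega %*% t(Ml)    # Omega^1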
The probability of choosing $l$ is then:
$$P_l = P(\epsilon^l_1 < -V^l_1 \;\&\; \epsilon^l_2 < -V^l_2 \;\&\; \ldots \;\&\; \epsilon^l_J < -V^l_J)$$
Under the normality hypothesis, this writes:
$$P_l = \int_{-\infty}^{-V^l_1} \int_{-\infty}^{-V^l_2} \ldots \int_{-\infty}^{-V^l_J} \phi(\epsilon^l)\; d\epsilon^l_1\, d\epsilon^l_2 \ldots d\epsilon^l_J$$
with:
$$\phi(\epsilon^l) = \frac{1}{(2\pi)^{(J-1)/2} \mid\Omega^l\mid^{1/2}}\; e^{-\frac{1}{2} {\epsilon^l}^{\top} {\Omega^l}^{-1} \epsilon^l}$$
Two problems arise with this model: first, the parameters of the covariance matrix $\Omega$ are not all identified; second, the probability is a $(J-1)$-dimensional integral with no closed form, which must be approximated by simulation.
Consider the identification problem first. The meaningful parameters are those of the covariance matrix of the errors $\Omega$. For example, with $J=3$:
$$
\Omega = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}
\end{pmatrix}
$$
$$
\Omega^1 = M^1 \Omega {M^1}^{\top} = \begin{pmatrix}
\sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13} \\
\sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13} & \sigma_{11}+\sigma_{33}-2\sigma_{13}
\end{pmatrix}
$$
The overall scale of utility being unidentified, one has to impose the value of one of the variances; for example, the first one is fixed to 1. We then have:
$$
\Omega^1 = \begin{pmatrix}
1 & \frac{\sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\
\frac{\sigma_{11}+\sigma_{23}-\sigma_{12}-\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} & \frac{\sigma_{11}+\sigma_{33}-2\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}}
\end{pmatrix}
$$
Therefore, out of the 6 structural parameters of the covariance matrix, only the 3 entries of $\Omega^1$ are identified, and only up to scale: once the first variance is fixed to 1, just 2 free parameters remain. Moreover, these parameters are almost impossible to interpret. More generally, with $J$ alternatives, the covariance matrix has $(J+1)\times J/2$ parameters, of which only $J\times(J-1)/2 - 1$ are identified.
The integral, for its part, is approximated by simulation. Let $L^l$ be the Choleski decomposition of the covariance matrix of the error differences:
$$\Omega^l = L^l {L^l}^{\top}$$
This matrix is lower triangular, of dimension $(J-1)$:
$$
L^l = \begin{pmatrix}
l_{11} & 0 & 0 & \ldots & 0 \\
l_{21} & l_{22} & 0 & \ldots & 0 \\
l_{31} & l_{32} & l_{33} & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & \ldots & l_{(J-1)(J-1)}
\end{pmatrix}
$$
Let $\eta$ be a vector of standard normal deviates:
$$\eta \sim N(0, I)$$
Therefore, we have:
$$V(L^l \eta) = L^l V(\eta) {L^l}^{\top} = L^l I {L^l}^{\top} = \Omega^l$$
Therefore, if we draw a vector of standard normal deviates $\eta$ and apply this transformation to it, we get a realization of $\epsilon^l$.
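To make this concrete, here is a minimal R sketch, with arbitrary example values of $\Omega^l$ and $V^l$ (ours, not from the text), that draws realizations of $\epsilon^l$ this way and uses them as a crude frequency simulator of $P_l$:

# Draw realizations of eps^l = L^l eta with eta ~ N(0, I), then
# estimate P_l as the share of draws satisfying all inequalities.
set.seed(1)
Omega_l <- matrix(c(1.0, 0.5,
                    0.5, 1.5), 2, 2)  # arbitrary example Omega^l
V_l <- c(-0.5, 0.3)                   # arbitrary values of V^l_j
Ll <- t(chol(Omega_l))                # lower-triangular Choleski factor
R <- 1e5
eta <- matrix(rnorm(2 * R), 2, R)     # R draws of eta ~ N(0, I)
eps <- Ll %*% eta                     # R realizations of eps^l
mean(colSums(eps < -V_l) == 2)        # frequency simulator of P_l

The GHK simulator developed below achieves the same goal with a much smoother, lower-variance estimate.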
This joint probability can be written as a product of conditional and marginal probabilities:
$$
\begin{aligned}
P_l = P(\epsilon^l_1 < -V^l_1 \;\&\; \ldots \;\&\; \epsilon^l_J < -V^l_J)
= {} & P(\epsilon^l_1 < -V^l_1) \\
& \times P(\epsilon^l_2 < -V^l_2 \mid \epsilon^l_1 < -V^l_1) \\
& \times P(\epsilon^l_3 < -V^l_3 \mid \epsilon^l_1 < -V^l_1 \;\&\; \epsilon^l_2 < -V^l_2) \\
& \;\;\vdots \\
& \times P(\epsilon^l_J < -V^l_J \mid \epsilon^l_1 < -V^l_1 \;\&\; \ldots \;\&\; \epsilon^l_{J-1} < -V^l_{J-1})
\end{aligned}
$$
The vector of error difference deviates is:
$$
\begin{pmatrix} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{pmatrix}
=
\begin{pmatrix}
l_{11} & 0 & 0 & \ldots & 0 \\
l_{21} & l_{22} & 0 & \ldots & 0 \\
l_{31} & l_{32} & l_{33} & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & \ldots & l_{(J-1)(J-1)}
\end{pmatrix}
\times
\begin{pmatrix} \eta_1 \\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_{J-1} \end{pmatrix}
$$
which develops to:
$$
\begin{pmatrix} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{pmatrix}
=
\begin{pmatrix}
l_{11}\eta_1 \\
l_{21}\eta_1 + l_{22}\eta_2 \\
l_{31}\eta_1 + l_{32}\eta_2 + l_{33}\eta_3 \\
\vdots \\
l_{(J-1)1}\eta_1 + l_{(J-1)2}\eta_2 + \ldots + l_{(J-1)(J-1)}\eta_{J-1}
\end{pmatrix}
$$
Let's now investigate the marginal and conditional probabilities. The marginal probability of the first inequality is simply:
$$P(\epsilon^l_1 < -V^l_1) = P\left(\eta_1 < -\frac{V^l_1}{l_{11}}\right) = \Phi\left(-\frac{V^l_1}{l_{11}}\right)$$
and, given $\eta_1$, the second inequality writes $\eta_2 < -\frac{V^l_2 + l_{21}\eta_1}{l_{22}}$, and so on. These probabilities can therefore easily be simulated by drawing numbers from a truncated normal distribution.
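One convenient way to obtain such draws (an implementation choice of ours, not prescribed by the text) is the inverse-CDF method; this helper is reused in the GHK sketch below:

# Draw n values from a standard normal truncated above at a,
# using the inverse-CDF (probability integral transform) method.
rtnorm_upper <- function(n, a) {
  qnorm(runif(n) * pnorm(a))
}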
This so-called GHK algorithm (for Geweke, Hajivassiliou and Keane, who developed it) can be described as follows:
1. Compute $\Phi\left(-\frac{V^l_1}{l_{11}}\right)$.
2. Draw a number called $\eta^r_1$ from a standard normal distribution upper-truncated at $-\frac{V^l_1}{l_{11}}$ and compute $\Phi\left(-\frac{V^l_2 + l_{21}\eta^r_1}{l_{22}}\right)$.
3. Draw a number called $\eta^r_2$ from a standard normal distribution upper-truncated at $-\frac{V^l_2 + l_{21}\eta^r_1}{l_{22}}$ and compute $\Phi\left(-\frac{V^l_3 + l_{31}\eta^r_1 + l_{32}\eta^r_2}{l_{33}}\right)$.
4. Continue in the same way until the last inequality, computing $\Phi\left(-\frac{V^l_J + l_{(J-1)1}\eta^r_1 + \ldots + l_{(J-1)(J-2)}\eta^r_{J-2}}{l_{(J-1)(J-1)}}\right)$ after drawing $\eta^r_{J-2}$.
5. Multiply all these probabilities to get a realization of the probability, called $P^r_l$.
6. Repeat all these steps $R$ times and average these probabilities; this average is an estimate of the probability: $\bar{P}_l = \sum_{r=1}^{R} P^r_l / R$.

A minimal R implementation of this simulator is sketched below.
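Here is that sketch: an illustrative implementation (the function ghk_prob and its arguments are ours, not mlogit's internal code), reusing rtnorm_upper from above:

# Minimal GHK simulator for P_l.
# V_l: vector of the V^l_j thresholds (length J - 1)
# L:   lower-triangular Choleski factor of Omega^l
ghk_prob <- function(V_l, L, R = 1000) {
  J1 <- length(V_l)                # number of error differences, J - 1
  probs <- numeric(R)
  for (r in seq_len(R)) {
    eta <- numeric(J1)
    p <- 1
    for (j in seq_len(J1)) {
      # truncation point for the j-th inequality, given eta_1 .. eta_{j-1}
      a <- (-V_l[j] - sum(L[j, seq_len(j - 1)] * eta[seq_len(j - 1)])) / L[j, j]
      p <- p * pnorm(a)            # conditional probability Phi(a)
      if (j < J1) eta[j] <- rtnorm_upper(1, a)  # draw eta_j, truncated above at a
    }
    probs[r] <- p                  # one realization P^r_l
  }
  mean(probs)                      # average over the R replications
}

With the same arbitrary example values as before, the GHK estimate should agree closely with the crude frequency estimate:

set.seed(2)
Omega_l <- matrix(c(1.0, 0.5, 0.5, 1.5), 2, 2)
ghk_prob(V_l = c(-0.5, 0.3), L = t(chol(Omega_l)), R = 1000)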
We use again the Fishing data frame, with only a subset of three alternatives used. The multinomial probit model is estimated using mlogit with the probit argument equal to TRUE.
library("mlogit")
data("Fishing", package = "mlogit")
Fish <- dfidx(Fishing, varying = 2:9, choice = "mode", idnames = c("chid", "alt"))
Fish.mprobit <- mlogit(mode ~ price | income | catch, Fish, probit = TRUE, alt.subset = c("beach", "boat", "pier"))
summary(Fish.mprobit)
##
## Call:
## mlogit(formula = mode ~ price | income | catch, data = Fish,
## alt.subset = c("beach", "boat", "pier"), probit = TRUE)
##
## Frequencies of alternatives:choice
## beach boat pier
## 0.18356 0.57260 0.24384
##
## bfgs method
## 14 iterations, 0h:0m:11s
## g'(-H)^-1g = 9.77E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):boat 7.2514e-01 3.5809e-01 2.0250 0.0428661 *
## (Intercept):pier 6.2393e-01 2.7396e-01 2.2774 0.0227617 *
## price -1.2154e-02 1.7697e-03 -6.8681 6.505e-12 ***
## income:boat 2.4005e-06 3.6698e-05 0.0654 0.9478448
## income:pier -6.5419e-05 4.0832e-05 -1.6022 0.1091198
## catch:beach 1.5479e+00 4.3002e-01 3.5995 0.0003188 ***
## catch:boat 4.0010e-01 4.1600e-01 0.9618 0.3361595
## catch:pier 1.2747e+00 5.5863e-01 2.2819 0.0224968 *
## boat.pier 5.4570e-01 4.6263e-01 1.1795 0.2381809
## pier.pier 6.9544e-01 2.9294e-01 2.3740 0.0175973 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -478.43
## McFadden R^2: 0.32751
## Likelihood ratio test : chisq = 465.99 (p.value = < 2.22e-16)
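If, as the Choleski parameterization above suggests, the coefficients boat.pier and pier.pier are the free elements $l_{21}$ and $l_{22}$ of $L^l$ (with $l_{11}$ normalized to 1), the implied covariance matrix of the error differences can be recovered as follows; this mapping is our assumption, not something stated by the output:

# Hypothetical reconstruction: treat boat.pier and pier.pier as the
# free Choleski elements l_21 and l_22, with l_11 normalized to 1.
Ll <- matrix(c(1, coef(Fish.mprobit)["boat.pier"],
               0, coef(Fish.mprobit)["pier.pier"]), 2, 2)
Ll %*% t(Ll)   # implied covariance matrix of the error differences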
Daganzo, C. 1979. Multinomial Probit: The Theory and Its Application to Demand Forecasting. Academic Press, New York.
Geweke, J., M. Keane, and D. Runkle. 1994. “Alternative Computational Approaches to Inference in the Multinomial Probit Model.” Review of Economics and Statistics 76: 609–32.
Hausman, J., and D. Wise. 1978. "A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences." Econometrica 46: 403–26.