% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/budgetIV_scalar.R
\name{budgetIV_scalar}
\alias{budgetIV_scalar}
\title{Efficient partial identification of a scalar causal effect parameter with invalid instruments}
\usage{
budgetIV_scalar(
  beta_y,
  beta_phi,
  tau_vec = NULL,
  b_vec = NULL,
  delta_beta_y = NULL,
  bounds_only = TRUE
)
}
\arguments{
\item{beta_y}{A \eqn{d_{Z}}-dimensional vector representing the (estimated) 
cross covariance \eqn{\mathrm{Cov}(Y, Z)}.}

\item{beta_phi}{A \eqn{d_{Z}}-dimensional vector representing the (estimated) 
cross covariance \eqn{\mathrm{Cov}(\Phi (X), Z)}.}

\item{tau_vec}{A \eqn{K}-dimensional vector of increasing, non-negative 
thresholds representing degrees of IV invalidity. 
The default value \code{NULL} corresponds to a single threshold at \eqn{0}.}

\item{b_vec}{A \eqn{K}-dimensional vector of increasing positive integers 
representing the minimum number of IVs that must lie within each threshold. 
The default value \code{NULL} corresponds to a single threshold at \eqn{0}, with at least \eqn{50\%} of IVs assumed to be valid.}

\item{delta_beta_y}{A \eqn{d_{Z}}-dimensional vector of positive half-widths for box-shaped 
confidence bounds on \code{beta_y}. 
The default value \code{NULL} ignores finite-sample uncertainty.}

\item{bounds_only}{A boolean \code{TRUE} or \code{FALSE}. If \code{TRUE} (the default), overlapping intervals in the confidence set are stored as a single interval, 
so the output consists only of disjoint bounds. 
If \code{FALSE}, a separate interval is stored for each value of \code{budget_assignment} (see the return value and Penn et al. (2025) for further details). 
These intervals may touch but never overlap, and the budget assignment corresponding to each bound is also returned.}
}
\value{
A data.table with each row corresponding to bounds on the scalar causal effect parameter \eqn{\theta} for a particular budget assignment \eqn{U} (see Penn et al. (2025)). 
The returned table has the following columns: a logical \code{is_point} indicating whether the upper and lower bounds are equal; numerical \code{lower_bound}
and \code{upper_bound} giving the lower and upper bounds; and a list \code{budget_assignment} giving the value of \eqn{U} for each candidate instrument. 
\code{budget_assignment} is only returned if the user sets \code{bounds_only = FALSE}.

A list of two entries: \code{intervals}, which is a two-column matrix with rows corresponding to disjoint bounds containing plausible values of \eqn{\theta}; 
and \code{points}, which is a one-column matrix consisting of lone plausible values of \eqn{\theta}---relevant when using \eqn{\tau_1 = 0}.
}
\description{
Partial identification and coverage of a causal effect parameter using summary statistics and budget constraint assumptions.
See Penn et al. (2025) for technical definitions.
}
\details{
Instrumental variables are defined by three structural assumptions: (A1) they are associated with the treatment; 
(A2) they are unconfounded with the outcome; and (A3) they affect the outcome exclusively through the treatment. 
Assumption (A1) can be checked with a simple statistical test, whereas for many data generating processes (A2) and (A3) 
may be false in ways that cannot be proven from the data. 
The \code{budgetIV} and \code{budgetIV_scalar} algorithms allow for valid causal inference when some proportion, 
possibly a small minority, of candidate instruments satisfy both (A2) and (A3).

\code{budgetIV} and \code{budgetIV_scalar} assume a homogeneous treatment effect, which implies the separable structural 
equation \eqn{Y = \theta \Phi(X) + g_y(Z, \epsilon_x)}. 
The difference between the algorithms is that \code{budgetIV_scalar} assumes \eqn{\Phi(X)} and \eqn{\theta} take
scalar values, which is exploited for super-exponential computational speedup and allows for causal inference
with thousands of candidate instruments \eqn{Z}.
Both methods assume ground truth knowledge of the functional form of \eqn{\Phi (X)}, e.g., a linear, 
logistic, Cox hazard, principal component based or other model. 
The parameter \eqn{\theta} captures the unknown treatment effect.
Violation of (A2) and/or (A3) will bias classical IV approaches through the statistical dependence
between \eqn{Z} and \eqn{g_y(Z, \epsilon_x)}, summarized by the covariance parameter 
\eqn{\gamma := \mathrm{Cov} (g_y(Z, \epsilon_x), Z)}.

\code{budgetIV} and \code{budgetIV_scalar} constrain \eqn{\gamma} through a series of non-negative thresholds 
\eqn{0 \leq \tau_1 < \tau_2 < \ldots < \tau_K} and corresponding integer budgets \eqn{0 < b_1 < b_2 < \ldots < b_K \leq d_Z}. 
It is assumed for each \eqn{i \in \{ 1, \ldots, K\}} that at least \eqn{b_i} components of \eqn{\gamma} are no greater in 
magnitude than \eqn{\tau_i}.
For instance, taking \eqn{d_Z = 100}, \eqn{K = 1}, \eqn{b_1 = 5} and \eqn{\tau_1 = 0} means 
assuming \eqn{5} of the \eqn{100} candidates are valid instrumental variables (in the sense that their ratio 
estimates \eqn{\theta_j := \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(\Phi(X), Z_j)} are unbiased).

With \code{delta_beta_y = NULL}, \code{budgetIV} and \code{budgetIV_scalar} return the identified set
of causal effects that agree with both the budget constraints described above and the values of
\eqn{\mathrm{Cov}(Y, Z)} and \eqn{\mathrm{Cov}(\Phi(X), Z)}, assumed to be known exactly. 
Unlike classical partial identification methods (see Manski (1990) for a canonical example), the non-convex mixed-integer
budget constraints yield a possibly disconnected identified set. 
Each connected subset has a different interpretation as to which of the candidate instruments \eqn{Z} 
are valid up to each threshold.
\code{budgetIV_scalar} returns these interpretations alongside the corresponding bounds on \eqn{\theta}. 

When \code{delta_beta_y} is not \code{NULL}, its entries are used as box constraints that quantify uncertainty in \code{beta_y}. 
In the examples, \code{delta_beta_y} is calculated through a Bonferroni correction and gives an (asymptotically) 
valid confidence set over \code{beta_y}. 
Under the so-called "no measurement error" (NOME) assumption (see Bowden et al. (2016)), which is commonly applied in Mendelian randomisation, it is
assumed that the estimate of \code{beta_y} is the dominant source of finite-sample uncertainty, with uncertainty in \code{beta_phi}
entirely negligible. 
With an (asymptotically) valid confidence set over \code{beta_y} and under the NOME assumption, \code{budgetIV_scalar} 
returns an (asymptotically) valid confidence set for \eqn{\theta}.
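
To illustrate how the box constraints translate into bounds on \eqn{\theta} for a single candidate, the sketch below assumes \eqn{\tau_1 = 0} and the NOME assumption, so that a valid instrument \eqn{j} satisfies \eqn{\mathrm{Cov}(Y, Z_j) = \theta \mathrm{Cov}(\Phi(X), Z_j)}. 
It uses hypothetical toy values (the \code{_toy} vectors are assumptions made for illustration) and is not the internal algorithm of \code{budgetIV_scalar}.

\preformatted{
# Hypothetical toy summary statistics for three candidate instruments
beta_y_toy       <- c(0.10, 0.21, 0.05)
beta_phi_toy     <- c(0.50, -1.00, 0.25)
delta_beta_y_toy <- c(0.02, 0.03, 0.01)

# End points of the theta values compatible with each candidate being valid
end_a <- (beta_y_toy - delta_beta_y_toy) / beta_phi_toy
end_b <- (beta_y_toy + delta_beta_y_toy) / beta_phi_toy

# Dividing by a negative beta_phi flips the order, so sort the end points
lower <- pmin(end_a, end_b)
upper <- pmax(end_a, end_b)
print(cbind(lower, upper))
}

Values of \eqn{\theta} lying in the intervals of several candidates at once are those that remain plausible if all of those candidates are valid.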
}
\examples{
 
data(Do_et_al_summary_statistics)

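# Keep candidate instruments whose HDL association p-value is at most 1e-8
candidatesHDL <- Do_et_al_summary_statistics[Do_et_al_summary_statistics$pHDL <= 1e-8, ]

candidate_labels <- candidatesHDL$rsID
d_Z <- length(candidate_labels)

# Associations of each candidate with the exposure (betaHDL) and outcome (betaCAD)
beta_x <- candidatesHDL$betaHDL

beta_y <- candidatesHDL$betaCAD

# Recover standard errors of beta_y, treating pCAD as two-sided p-values
SE_beta_y <- abs(beta_y) / qnorm(1-candidatesHDL$pCAD/2)

# Bonferroni-corrected half-widths giving a joint (1 - alpha) confidence box
alpha <- 0.05
delta_beta_y <- qnorm(1 - alpha/(2*d_Z))*SE_beta_y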

feasible_region <- budgetIV_scalar(
  beta_y = beta_y,
  beta_phi = beta_x,
  tau_vec = c(0),
  b_vec = c(30),
  delta_beta_y = delta_beta_y,
  bounds_only = FALSE
)
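
# Sketch: the same call with the documented default bounds_only = TRUE,
# which stores overlapping intervals as a single interval.
# It reuses beta_y, beta_x and delta_beta_y from above; the name
# merged_region is introduced here purely for illustration.
merged_region <- budgetIV_scalar(
  beta_y = beta_y,
  beta_phi = beta_x,
  tau_vec = c(0),
  b_vec = c(30),
  delta_beta_y = delta_beta_y,
  bounds_only = TRUE
)

print(merged_region)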

}
\references{
Jordan Penn, Lee Gunderson, Gecia Bravo-Hermsdorff,
Ricardo Silva, and David Watson. (2024). BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments. \emph{arXiv}
preprint, 2411.06913.

Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala A Sheehan, and John R Thompson. (2016). Assessing the suitability of summary data for 
two-sample Mendelian randomization analyses using MR-Egger regression: the role of the \eqn{I^2} statistic. \emph{Int. J. Epidemiol.} 45.6, pp. 1961--1974.

Charles F Manski. (1990). Nonparametric bounds on treatment effects. \emph{Am. Econ. Rev.} 80.2, pp. 319--323.
}
