\name{kottby.user}
\alias{kottby.user}
\alias{global}
\alias{user.estimator}
\title{Estimation for user-defined estimators}
\description{
Computes estimates, standard errors and confidence intervals for user-defined estimators (including non-analytic ones), for the whole population or for subpopulations.
}
\usage{
kottby.user(deskott, by = NULL, user.estimator, na.replace = NULL,
            conf.int = FALSE, conf.lev = 0.95, 
            df = attr(deskott, "nrg") - 1, ...)

global(deskott)
}
\arguments{
  \item{deskott}{Object of class \code{kott.design} containing the replicated survey data.}
  \item{by}{Formula specifying the variables that define the "estimation domains". If \code{NULL} (the default option) estimates refer to the whole population.}
  \item{user.estimator}{\R function to compute the value of the desired estimator on the original survey sample (see also 'Details' and 'Defining a user estimator function').}
  \item{na.replace}{Value to be used to replace any \code{NA}s in the output estimates (see 'Details').}
  \item{conf.int}{Boolean (\code{logical}) value to request confidence intervals for the estimates: the default is \code{FALSE}.}
  \item{conf.lev}{Probability specifying the desired confidence level: the default value is \code{0.95}.}
  \item{df}{Degrees of freedom for the t distribution used to build confidence intervals (see 'Details').}
  \item{\dots}{Additional parameters (if any) to be passed to the \code{user.estimator} function.}
}
\details{
The \code{kottby.user} function is designed to fully exploit the versatility of the DAGJK [Kott 99-01] replication method. It provides a convenient tool for computing estimates, standard errors and confidence intervals for estimators defined by the users themselves. Weighted estimates for the \emph{"user-defined estimator"} are computed with weights that depend on the class of \code{deskott}: calibrated weights if \code{deskott} is of class \code{kott.cal.design}, direct weights otherwise.

The optional argument \code{by} specifies the variables that define the "estimation domains", i.e. the subpopulations for which the estimates are to be calculated. If \code{by=NULL} (the default option), the estimates produced by \code{kottby.user} refer to the whole population. Estimation domains must be defined by a formula: for example, the statement \code{by=~B1:B2} selects as estimation domains the subpopulations determined by crossing the levels of variables \code{B1} and \code{B2}. The \code{deskott} variables referenced by \code{by} (if any) must be factors and must not contain any missing values (\code{NA}).
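
Domain estimation can be pictured as applying the estimator separately to each cell of the cross-classification defined by \code{by}. The base-\R sketch below illustrates this idea on a small hypothetical data frame with factors \code{B1}, \code{B2} and a weights column \code{w8} (all names are assumptions). It only mimics the domain point estimates and is not the actual implementation of \code{kottby.user}, which also computes DAGJK standard errors.
\preformatted{
# Hypothetical toy sample: two domain factors and a weights column
toy <- data.frame(B1 = factor(c("a", "a", "b", "b", "b")),
                  B2 = factor(c("x", "y", "x", "x", "y")),
                  w8 = c(10, 20, 30, 40, 50))
# The 'ones' estimator of the Examples: estimated number of final units
ones <- function(d, w) sum(d[, w])
# Cross the levels of B1 and B2 (as by = ~B1:B2 would), dropping empty cells
cells <- interaction(toy$B1, toy$B2, drop = TRUE)
# Apply the estimator to each domain separately
dom.est <- sapply(split(toy, cells), ones, w = "w8")
dom.est
# With a kott.design object the corresponding call would be:
# kottby.user(deskott, by = ~B1:B2, user.estimator = ones)
}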

The mandatory argument \code{user.estimator} specifies how the "user-defined estimator" is to be calculated. More precisely, the value bound to the formal argument \code{user.estimator} must be a function (an \R object of class \code{function}, possibly anonymous) able to compute the value of the required estimator on the sample data frame contained in \code{deskott}. The return value of \code{user.estimator} need not be a single number: it can be, for instance, a vector, a matrix or an array. In any case, it will be silently coerced to an array by \code{kottby.user}. Detailed guidance on how to write the \code{user.estimator} function is given in the 'Defining a user estimator function' section below.
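
As an illustration of a non-scalar return value, the hypothetical estimator \code{totals} below (not part of the package) returns a named vector of weighted totals, one for each variable listed in its extra argument \code{vars}. It is shown on a toy data frame whose column names are assumptions.
\preformatted{
# Hypothetical estimator returning a vector: weighted totals of several variables
totals <- function(d, w, vars) {
    colSums(d[, w] * d[, vars, drop = FALSE])
}
toy <- data.frame(w8 = c(2, 3, 5), y1 = c(1, 4, 2), x1 = c(10, 0, 1))
tot <- totals(toy, "w8", vars = c("y1", "x1"))
tot
# kottby.user would coerce such a vector to an array, e.g. with:
# kottby.user(deskott, user.estimator = totals, vars = c("y1", "x1"))
}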

The optional argument \code{na.replace} makes it possible to specify a value to be used to replace any missing values generated by \code{user.estimator} in the \code{kottby.user} function output. By default \code{na.replace=NULL} and the missing values are returned as \code{NA}s. 

The \code{conf.int} argument can be used to request confidence intervals for the estimates. By default \code{conf.int=FALSE}, i.e. no confidence intervals are computed.

Whenever confidence intervals are requested (i.e. \code{conf.int=TRUE}), the desired confidence level can be specified through the \code{conf.lev} argument. The \code{conf.lev} value must be a probability (\code{0<=conf.lev<=1}); its default is \code{0.95}.
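
For reference, a symmetric t-based interval at level \code{conf.lev} has the form estimate plus or minus q times the standard error, where q is the \code{1 - (1 - conf.lev)/2} quantile of the t distribution with \code{df} degrees of freedom. The sketch below computes such an interval from a hypothetical estimate and standard error, assuming \code{nrg = 15} random groups as in the Examples and the default \code{df = nrg - 1}. It is only meant to show the arithmetic under the assumption of this usual form, not to reproduce the internal code of \code{kottby.user}.
\preformatted{
# Hypothetical point estimate and DAGJK standard error
est <- 12.3
se <- 0.8
conf.lev <- 0.95
nrg <- 15          # number of random groups, as in kottdesign(..., nrg = 15)
df <- nrg - 1      # default value of the df argument
# Two-sided t quantile for the requested confidence level
t.q <- qt(1 - (1 - conf.lev) / 2, df = df)
ci <- c(lower = est - t.q * se, upper = est + t.q * se)
ci
}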

Given an input \code{kott.design} object with \code{nrg} random groups, \code{kottby.user} \emph{by default} builds confidence intervals using a t distribution with \code{nrg-1} degrees of freedom, since the \code{df} argument defaults to \code{nrg-1}. Notice, however, that this default should be used only when \code{user.estimator} estimates a univariate parameter of interest. For example, if \code{user.estimator} were designed to estimate the regression coefficients of a multiple linear regression with \emph{p} predictors and no intercept, the appropriate choice would be \code{df = nrg-p}.
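
The hypothetical user estimator \code{wls.coef} below computes the weighted least-squares coefficients of a linear regression without intercept by solving the weighted normal equations X'WX b = X'Wy. With \emph{p} predictors it returns \emph{p} values, so \code{df = nrg-p} would be passed to \code{kottby.user}. The variable names (\code{y}, \code{x1}, \code{x2}, \code{w8}) and the toy data are assumptions made for illustration.
\preformatted{
# Hypothetical estimator: weighted LS coefficients, no intercept
wls.coef <- function(d, w, y, xvars) {
    X <- as.matrix(d[, xvars, drop = FALSE])
    W <- d[, w]
    # Solve the weighted normal equations t(X) W X b = t(X) W y
    b <- solve(crossprod(X, W * X), crossprod(X, W * d[, y]))
    drop(b)
}
# Toy data with an exact linear relation y = 2*x1 - x2
toy <- data.frame(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 0, 3, 1),
                  w8 = c(1, 2, 1, 3, 1))
toy$y <- 2 * toy$x1 - toy$x2
b <- wls.coef(toy, "w8", y = "y", xvars = c("x1", "x2"))
b
# With p = 2 predictors the kottby.user call would use df = nrg - 2, e.g.:
# kottby.user(deskott, user.estimator = wls.coef, y = "y",
#             xvars = c("x1", "x2"), conf.int = TRUE,
#             df = attr(deskott, "nrg") - 2)
}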

The special argument \code{\dots} (\emph{dot-dot-dot}) can be used to pass additional arguments to the user-defined \code{user.estimator} function.
}
\section{Defining a user estimator function}{
In order to be correctly invoked by \code{kottby.user}, the function that encodes the "user-defined estimator" must comply with specific \emph{syntactical} restrictions. On the other hand, there is (at least in principle) no constraint on the \emph{semantics} of the function, i.e. on "what it calculates".\cr
The fundamental constraint is that the list of formal arguments of the function meets some minimal requirements. Suppose, for simplicity, that the function bound to the \code{user.estimator} formal argument is named \code{user.estfun}; then its structure must necessarily be of the following type:

\code{user.estfun = function(data, weights, etc) {body}}    [1]

Structure [1] is to be read as follows: the body of \code{user.estfun} must contain all the instructions needed to compute the required estimator on the sample data contained in the \code{data} data frame, using the weights stored in the column of \code{data} identified by \code{weights} (in the 'Examples' section this column is accessed as \code{d[, w]}). The \code{"etc"} symbol in [1] stands for any other formal arguments of \code{user.estfun}, whose actual values can be specified, when invoking \code{kottby.user}, through its special argument \code{\dots} (\emph{dot-dot-dot}).
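
A minimal sketch of a function conforming to [1]: the hypothetical \code{wmean} below computes the weighted mean of the variable named by its extra argument \code{y}, which plays the role of \code{etc}; \code{weights} is here assumed to be the name of the weights column. The helper \code{call.estfun} is also hypothetical: it only mimics how a caller can forward the arguments collected by \code{\dots} to the user function, and is not the code of \code{kottby.user}.
\preformatted{
# Hypothetical user estimator with structure [1]:
# data = sample data frame, weights = name of the weights column (assumed),
# y = extra ('etc') argument supplied through the dots
wmean <- function(data, weights, y) {
    sum(data[, weights] * data[, y]) / sum(data[, weights])
}
# Hypothetical caller forwarding extra arguments via '...'
call.estfun <- function(data, weights, user.estimator, ...) {
    user.estimator(data, weights, ...)
}
toy <- data.frame(w8 = c(1, 1, 2), income = c(10, 20, 40))
m <- call.estfun(toy, "w8", wmean, y = "income")
m
# The analogous kottby.user call would be, e.g.:
# kottby.user(deskott, user.estimator = wmean, y = "income")
}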

Sometimes users may need to employ "global" quantities in the body of the \code{user.estfun} function, that is, quantities that, even when dealing with subpopulation estimates, \emph{should not be re-calculated} within the subpopulations themselves (re-calculation being the standard \code{kottby.user} behaviour). This need is met by the \code{global} function: wherever such a quantity is computed, the user only has to refer to the input data frame of \code{user.estfun} through the expression \code{global(data)} instead of the plain \code{data}.\cr
The \code{global} function only accepts objects of class \code{kott.design} and can only be used within user estimator functions (such as \code{user.estfun}) invoked by \code{kottby.user}. The calculation of poverty estimates clearly illustrates the usefulness of \code{global} (see the \code{poverty} function documented in the 'Examples' section below).
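
The effect of \code{global} can be mimicked in base \R on hypothetical incomes with unit weights (all values below are assumptions): a poverty threshold computed on the whole sample classifies no unit of the "not-poor" domain as poor, whereas a threshold recomputed inside that domain does. This parallels the comparison between \code{poverty} and \code{poverty2} in the Examples, without calling \code{global} itself.
\preformatted{
# Hypothetical incomes, all with weight 1
pop <- data.frame(w8 = rep(1, 7), income = c(5, 10, 20, 30, 40, 60, 100))
# Percentage of (weighted) units with income below a given threshold
pov.rate <- function(d, w, y, th) 100 * sum(d[d[, y] < th, w]) / sum(d[, w])
# Threshold computed once on the whole sample (the role of global(data))
th.global <- 0.6 * sum(pop$w8 * pop$income) / sum(pop$w8)
pop$status <- factor(ifelse(pop$income < th.global, "poor", "not-poor"))
notpoor <- pop[pop$status == "not-poor", ]
# Threshold recomputed inside the domain (what plain 'data' would give)
th.local <- 0.6 * sum(notpoor$w8 * notpoor$income) / sum(notpoor$w8)
rate.global <- pov.rate(notpoor, "w8", "income", th.global)
rate.local <- pov.rate(notpoor, "w8", "income", th.local)
c(global = rate.global, local = rate.local)
}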
}
\value{
The return value depends on the value of the input parameters. In the most general case, the function returns an object of class \code{list} (typically a list made up of data frames).
}
\note{
The freedom granted to the user in developing the \code{user.estimator} function has important consequences that are worth highlighting. The key point is that, since only the user knows the semantics of \code{user.estimator}, the user must vouch for its correct functioning. In particular:\cr 
(i) The \code{kottby.user} function must be able to invoke the \code{user.estimator} function on the \code{deskott} sample data frame and, if necessary, on its subsets defined by the \code{by} variables. Consequently, when developing the function, the user must make sure that the instructions in its \code{body} refer to variables that are actually contained in that data frame. \code{kottby.user} could not perform this check without limiting the user's freedom in writing \code{user.estimator};\cr
(ii) Likewise, because of the user's freedom in developing \code{user.estimator}, the \code{kottby.user} function cannot prevent missing values from appearing in its output. The \code{na.replace} argument is, therefore, purely "cosmetic".
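
A minimal sketch of both points, based on the \code{ratio} estimator of the Examples: the hypothetical \code{safe.ratio} below checks that the referenced variables exist before computing anything (point (i)), and the toy domain, whose denominator variable is identically zero, shows how a missing value can arise from the data rather than from a coding error (point (ii)). Variable names and data are assumptions.
\preformatted{
# Hypothetical variant of 'ratio' that checks its inputs first
safe.ratio <- function(d, w, num, den) {
    needed <- c(w, num, den)
    missing.vars <- needed[!is.element(needed, names(d))]
    if (length(missing.vars) > 0)
        stop("variables not found in data: ", paste(missing.vars, collapse = ", "))
    sum(d[, w] * d[, num]) / sum(d[, w] * d[, den])
}
# Toy domain in which the denominator variable is identically zero
dom <- data.frame(w8 = c(1, 2), y1 = c(0, 0), x1 = c(0, 0))
r <- safe.ratio(dom, "w8", num = "y1", den = "x1")
r    # 0/0 gives NaN, which is.na() treats as missing
# A misspelled variable name is caught explicitly
err <- tryCatch(safe.ratio(dom, "w8", num = "y1", den = "x9"),
                error = function(e) conditionMessage(e))
err
}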
}
\author{Diego Zardetto}
\references{
Kott, Phillip S. (1999) \emph{"The Extended Delete-A-Group Jackknife"}. Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.

Kott, Phillip S. (2001) \emph{"The Delete-A-Group Jackknife"}. Journal of Official Statistics, Vol.17, No.4, pp. 521-526.
}
\seealso{
\code{\link{kottby}} for estimating totals and means, \code{\link{kott.ratio}} for estimating ratios between totals, \code{\link{kott.quantile}} for estimating quantiles and \code{\link{kott.regcoef}} for estimating regression coefficients.
}
\examples{
# Some examples of user-defined estimators and illustration
# of their use via kottby.user. Remember that R functions
# expressing user-defined estimators must comply with the
# condition indicated in [1]. The 3 functions that appear
# in the following examples ('ones', 'ratio' and 'poverty')
# are contained in the data.examples file.
# The 'poverty' function (also) illustrates the correct use
# of the 'global' function.

data(data.examples)

# Creation of a kott.design object:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
      weights=~weight,nrg=15)


# 1) Estimator of the number of final units in the population.
#    Use the name 'ones' to refer to the R function that
#    expresses the estimator and define it as follows: 

#    ones <- function (d, w)
#    ######################################
#    #  Number of final units estimator.  #
#    ######################################
#    {
#        sum(d[, w])
#    }

#    Now using kottby.user is easy, for instance:
kottby.user(kdes,user.estimator=ones)


# 2) Estimator of ratios between totals (or means) for 2
#    quantitative variables. Use the name 'ratio' to refer
#    to the R function that expresses the estimator and
#    define it as follows (notice the use of the etc
#    arguments in [1]): 

#    ratio <- function (d, w, num, den)
#    ###########################################
#    #  Ratio estimator for totals (or means)  #
#    #  of quantitative variables.             #
#    ###########################################
#    {
#        sum(d[, w] * d[, num])/sum(d[, w] * d[, den])
#    }

#    Calculating ratio estimates and standard errors
#    is easy (notice the use of the \dots argument
#    of kottby.user):
kottby.user(kdes,user.estimator=ratio,num="y1",den="x1")


# 3) A non-analytic estimator: population percentage 
#    with income below the poverty threshold (defined,
#    for the sake of simplicity, as 0.6 times the
#    average income for the whole population).
#    Call 'poverty' the estimator and define it as follows:

#    poverty <- function (d, w, y, threshold)
#    ####################################################################
#    #  Population percentage with income below the poverty threshold.  #  
#    #  Suppose poverty threshold is defined as 0.6 times the average   #
#    #  income for the whole population.                                #
#    ####################################################################
#    {
#        if (missing(threshold)) {
#        # if I do want to take into account the variance of the poverty
#        # threshold letting it be re-calculated replicate by replicate.
#            d.global = global(d)
#            th.value = 0.6 * sum(d.global[, w] * d.global[, y])/sum(d.global[, w])
#        }
#        else {
#        # if I do not want to take into account the variance of the poverty
#        # threshold, I will supply its point estimate to the 'threshold' argument.
#            th.value = threshold 
#        }
#        est = 100 * sum(d[d[, y] < th.value, w])/sum(d[, w])
#        est
#    }


#    3.1) First use: neglect the variance of the poverty threshold
#         and supply to 'threshold' (by means of the \dots argument
#         of kottby.user) its point estimate obtained using kottby:
pov.line<-0.6*kottby(kdes,~income,estimator="mean")$mean
kottby.user(kdes,user.estimator=poverty,y="income",threshold=pov.line)

#    3.2) Second use: do take into account the variance of the poverty
#         threshold letting it be re-calculated replicate by replicate
#         (thus not supplying any actual value to 'threshold'):
kottby.user(kdes,user.estimator=poverty,y="income")

#    Notice that the standard error estimate for the 'poverty' estimator 
#    obtained in 3.2) cannot be calculated analytically by Taylor 
#    linearization. 

#    Notice the use of the 'global' function in the body of 'poverty':
#    since the poverty status of each final unit depends on a global
#    value (that is, the average income for the whole population),
#    'global' prevents this value from being calculated locally,
#    i.e. within the sub-population itself, whenever a sub-population
#    poverty estimate is needed.
#    In fact: 
pov.line<-0.6*kottby(kdes,~income,estimator="mean")$mean
kdes2<-kott.addvars(kdes,pov.status=as.factor(ifelse(income<pov.line,
                                              "poor","not-poor")))
kottby.user(kdes2,by=~pov.status,user.estimator=poverty,y="income")

#    If the 'global' function were not used in 'poverty' 
#    the poverty threshold would be calculated relative to  
#    each individual sub-population:

poverty2 <- function (d, w, y, threshold)
###############################################
#  Without relying on the 'global' function   #
###############################################
{
    if (missing(threshold)) {
        th.value = 0.6 * sum(d[, w] * d[, y])/sum(d[, w])
    }
    else {
        th.value = threshold 
    }
    est = 100 * sum(d[d[, y] < th.value, w])/sum(d[, w])
    est
}

kottby.user(kdes2,by=~pov.status,user.estimator=poverty2,y="income")

#    This shows that, without 'global', a non-zero fraction of poor units
#    would paradoxically be estimated for the "not-poor" sub-population
#    (and, conversely, a non-zero fraction of non-poor units among the "poor").
}
\keyword{survey}