\name{CBT}
\alias{CBT}
\alias{Emp_CBT}
\alias{Ana_CBT}
\alias{Uniform_Prior}
\alias{Sine_Prior}
\alias{Cosine_Prior}
\title{Confidence Bound Target (CBT) Algorithm}
\description{\code{CBT} and \code{Emp_CBT} provide simulations for the infinite-arms bandit problem with Bernoulli rewards.
\code{CBT} assumes the prior distribution is known, whereas \code{Emp_CBT} does not. \code{Ana_CBT} applies the analysis to real data.
}
\usage{


CBT(n, prior, bn = log(log(n)), cn = log(log(n)))
Emp_CBT(n, prior, bn = log(log(n)), cn = log(log(n)))
Ana_CBT(n, data,  bn = log(log(n)), cn = log(log(n)))

}
\arguments{
  \item{n}{total number of rewards.}
  \item{prior}{prior distribution on the mean of the rewards. Currently available priors: \code{"Uniform"}, \code{"Sine"} and \code{"Cosine"}.}
  \item{bn}{should increase slowly to infinity with \code{n}. Defaults to \code{log(log(n))}.}
  \item{cn}{should increase slowly to infinity with \code{n}. Defaults to \code{log(log(n))}.}
  \item{data}{a matrix or data frame. Each column is a population.}
}
\details{
If \code{bn} or \code{cn} is not specified, it takes the default value \code{log(log(n))}.\cr
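The confidence bound for an arm with \eqn{t} observations is
         \deqn{L = \max\left(\frac{\bar{x}}{b_n},\; \bar{x} - c_n \frac{\sigma}{\sqrt{t}}\right),}{L = max(xbar/bn, xbar - cn*sigma/sqrt(t)),}
where \eqn{\bar{x}}{xbar} and \eqn{\sigma}{sigma} are the mean and standard deviation of the rewards from that particular arm, and \eqn{b_n}{bn} and \eqn{c_n}{cn} are the arguments \code{bn} and \code{cn}.\cr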
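As an illustration only (the numbers here are hypothetical and are not taken from the package or the reference): with \code{n = 10000}, the default is \eqn{b_n = c_n = \log(\log(10000)) \approx 2.22}{bn = cn = log(log(10000)) ~ 2.22}. For an arm with \eqn{t = 100} observations, mean 0.5 and standard deviation 0.5, the bound would be
         \deqn{L = \max\left(\frac{0.5}{2.22},\; 0.5 - 2.22 \times \frac{0.5}{\sqrt{100}}\right) \approx \max(0.225,\; 0.389) = 0.389.}{L = max(0.5/2.22, 0.5 - 2.22*0.5/sqrt(100)) ~ max(0.225, 0.389) = 0.389.}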
CBT is a non-recalling algorithm. An arm is played until its confidence bound \eqn{L} drops below the target mean \eqn{\mu_*}, and it is not played after that.\cr
If the prior distribution is unknown, empirical CBT is applied instead: the target mean \eqn{\mu_*} is replaced by \eqn{S/n}, where \eqn{S} is the sum of rewards over all arms played up to the current stage. Unlike CBT, however, empirical CBT is a recalling algorithm: it decides from among all arms which to play further, rather than considering only the current arm.
}
\value{
 A list with the following elements:
  \item{regret}{cumulative regret generated by the \code{n} rewards.}
  \item{K}{total number of arms played.}
}
\references{H.P. Chan and S. Hu (2018). Infinite Arms Bandit: Optimality via Confidence Bounds. <arXiv:1805.11793>
}
\author{Hock Peng Chan and Shouri Hu}

\examples{
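# Number of independent simulation runs
R = 1000

# Storage for the cumulative regret and the number of arms tried in each run
cum_regret = numeric(R)
arms = numeric(R)

for(i in seq_len(R)){
  # One CBT run with n = 10000 rewards under the Sine prior
  result = CBT(n = 10000, prior = "Sine")
  cum_regret[i] = result$regret
  arms[i] = result$K
}

# Monte Carlo mean and standard error of the cumulative regret
mean(cum_regret)
sd(cum_regret)/sqrt(R)
# Monte Carlo mean and standard error of the number of arms
mean(arms)
sd(arms)/sqrt(R)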
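
# A hedged sketch of a similar experiment with empirical CBT, for the case
# where the prior is treated as unknown by the algorithm.
# Assumption: Emp_CBT returns the same list elements (regret and K) as CBT,
# as the Value section suggests. R_emp, emp_regret and emp_arms are
# illustrative names; fewer runs are used only to keep the example quick.
R_emp = 100
emp_regret = numeric(R_emp)
emp_arms = numeric(R_emp)

for(i in seq_len(R_emp)){
  # One empirical CBT run with n = 10000 rewards drawn under the Sine prior
  result = Emp_CBT(n = 10000, prior = "Sine")
  emp_regret[i] = result$regret
  emp_arms[i] = result$K
}

# Monte Carlo mean and standard error of the cumulative regret
mean(emp_regret)
sd(emp_regret)/sqrt(R_emp)
# Monte Carlo mean of the number of arms
mean(emp_arms)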

}
\keyword{Confidence Bound}
\keyword{Multi-armed Bandit}

