\name{wilcox.selection.split}
\alias{wilcox.selection.split}

\title{Wilcoxon-based variable selection in cross-validation (CV) and Monte-Carlo cross-validation (MCCV)}
\usage{
wilcox.selection.split(x,y,split,algo="new",pvalue=FALSE)
}
\arguments{
  \item{x}{a matrix or a data frame of size n x p giving the expression levels of the p variables (genes) for the n observations (arrays).
Variables correspond to columns, observations to rows.}
  \item{y}{a vector of length n giving the class membership of the n observations (arrays). \code{y} can be
either a factor or a numeric vector and must be coded as 0,1.}
\item{split}{A \code{niter} x \code{ntest} matrix giving the indices of the \code{ntest} observations included
in each of the \code{niter} test sets, as generated by the functions \code{\link{generate.split}} or \code{\link{generate.cv}}. The i-th row of \code{split} gives the indices of the observations included in the test data set for the i-th random splitting iteration.}
  \item{algo}{either \code{"new"} or \code{"naive"}. If \code{algo="new"}, the fast method
described in Boulesteix (2007) is used. If \code{algo="naive"}, results are obtained by calling the
function \code{wilcox.test} separately in each of the \code{niter} iterations.}
  \item{pvalue}{Logical. Should p-values be returned?}
}
\description{
The function \code{wilcox.selection.split} orders the variables according to the Wilcoxon rank sum test in each of the \code{niter} CV or MCCV iterations.
}

\details{The Wilcoxon rank sum statistic is defined as the sum of the X-ranks of
the observations with \code{y=0}. The Wilcoxon rank sum test is equivalent to the
Mann-Whitney test. It is implemented in the function \code{wilcox.test}.
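
As a point of reference, and as a property of base R rather than something specific to this package: the statistic \code{W} reported by \code{wilcox.test} is not the raw rank sum but a shifted version of it. Writing \eqn{R_0}{R_0} for the rank sum defined above and \eqn{n_0}{n_0} for the number of observations with \code{y=0} (both symbols are introduced here for illustration only), and assuming the \code{y=0} group is passed as the first sample,
\deqn{W = R_0 - \frac{n_0 (n_0 + 1)}{2}}{W = R_0 - n_0 (n_0 + 1) / 2}
The shift depends only on the group size, so both forms lead to the same p-value.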

In the context of cross-validation (CV) or Monte-Carlo cross-validation (MCCV), \code{wilcox.selection.split} computes the
Wilcoxon rank sum statistic for each variable in each iteration. At each iteration, a subset
of the \code{n} observations is excluded from the data set and used as the test data set;
the statistic is computed on the remaining observations.
The indices of the observations forming the test set in each of the \code{niter} iterations
are given in the \code{niter} x \code{ntest} matrix \code{split}.
}

\value{
  A list with the following components:
  \item{ordering.split}{A \code{niter} x p matrix giving the indices of the genes ordered by increasing p-value. For example,
the first column of \code{ordering.split} gives the index of the variable with the lowest p-value in each of the
\code{niter} random splitting iterations, and the second column gives the index of the variable with the second lowest p-value in each of these iterations. For the i-th iteration, the indices of the 50 best variables are given in the first 50 columns of row i.}
  \item{pvalue.split}{Returned only if \code{pvalue=TRUE}. A \code{niter} x p matrix of p-values. The element in the
i-th row and j-th column is the p-value of variable j in the i-th iteration.}
}

\references{
 
 A. L. Boulesteix (2007). WilcoxCV: an R package for fast variable selection in cross-validation. Bioinformatics 23:1702-1704.

}

\author{
  Anne-Laure Boulesteix (\url{http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html}) 
  
 

}
\seealso{\code{\link{wilcox.test}}, \code{\link{generate.split}}, \code{\link{generate.cv}},  \code{\link{wilcox.split}}}

\examples{
# Load the WilcoxCV package
library(WilcoxCV)

# Generate data: 90 observations, 10 variables, classes coded as 0,1
x<-matrix(rnorm(900),90,10)
y<-sample(c(0,1),90,replace=TRUE)

# Generate 50 MCCV splits with ratio 2:1 for a data set including 90 observations
my.split<-generate.split(niter=50,n=90,ntest=30)

# Order the variables by Wilcoxon p-value in each of the 50 iterations and also return the p-values.
wilcox.selection.split(x=x,y=y,split=my.split,algo="new",pvalue=TRUE)
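
# A minimal base-R sketch of what the per-iteration ordering amounts to
# conceptually: for each row of the split matrix, drop the test observations,
# run wilcox.test on every variable, and order the variables by p-value.
# This is an illustration only, not the package's internal code. The names
# demo.x, demo.y, demo.split and naive.order are hypothetical, and the split
# matrix is built by hand with sample() rather than with generate.split.
set.seed(1)
demo.x <- matrix(rnorm(300), 30, 10)       # 30 observations, 10 variables
demo.y <- rep(c(0, 1), each = 15)          # class labels coded as 0,1
# 5 random test sets of 10 observations each, one test set per row
demo.split <- t(replicate(5, sample(30, 10)))

naive.order <- t(apply(demo.split, 1, function(test.idx) {
  # keep only the observations that are not in the current test set
  train.x <- demo.x[-test.idx, , drop = FALSE]
  train.y <- demo.y[-test.idx]
  # p-value of the Wilcoxon rank sum test for each variable
  pv <- apply(train.x, 2, function(v) {
    wilcox.test(v[train.y == 0], v[train.y == 1])$p.value
  })
  # indices of the variables, smallest p-value first
  order(pv)
}))

# One row per iteration; column k holds the index of the k-th ranked variable
dim(naive.order)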
}
\keyword{htest}
