% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MutationProfiling.R
\name{observedMutations}
\alias{observedMutations}
\title{Calculate observed numbers of mutations}
\usage{
observedMutations(db, sequenceColumn = "SEQUENCE_IMGT",
  germlineColumn = "GERMLINE_IMGT_D_MASK", regionDefinition = NULL,
  mutationDefinition = NULL, ambiguousMode = c("eitherOr", "and"),
  frequency = FALSE, combine = FALSE, nproc = 1)
}
\arguments{
\item{db}{\code{data.frame} containing sequence data.}

\item{sequenceColumn}{\code{character} name of the column containing input 
sequences. IUPAC ambiguous characters for DNA are 
supported.}

\item{germlineColumn}{\code{character} name of the column containing 
the germline or reference sequence. IUPAC ambiguous 
characters for DNA are supported.}

\item{regionDefinition}{\link{RegionDefinition} object defining the regions
and boundaries of the Ig sequences. If \code{NULL}, mutations 
are counted for the entire sequence.}

\item{mutationDefinition}{\link{MutationDefinition} object defining replacement
and silent mutation criteria. If \code{NULL}, then 
replacement and silent mutations are determined by exact 
amino acid identity.}

\item{ambiguousMode}{whether to treat ambiguous characters in 
\code{"eitherOr"} or \code{"and"} mode when determining and 
counting the type(s) of mutations. Applicable only if
\code{sequenceColumn} and/or \code{germlineColumn} 
contain(s) ambiguous characters. One of 
\code{c("eitherOr", "and")}. Default is \code{"eitherOr"}.}

\item{frequency}{\code{logical} indicating whether or not to calculate
mutation frequencies. Default is \code{FALSE}.}

\item{combine}{\code{logical} indicating whether the mutation counts for the 
different regions (CDR, FWR) and mutation types should be combined for each 
sequence, returning a single count/frequency value per sequence instead of 
multiple values. Default is \code{FALSE}.}

\item{nproc}{number of cores to distribute the operation over. If the 
cluster has already been set, call the function with 
\code{nproc = 0} to avoid resetting or reinitializing it. Default is 
\code{nproc = 1}.}
}
\value{
A modified \code{db} \code{data.frame} with observed mutation counts for each 
          sequence listed. The column names are dynamically created based on the
          regions in the \code{regionDefinition}. For example, when using the
          \link{IMGT_V} definition, which defines positions for CDR and
          FWR, the following columns are added:
          \itemize{
            \item  \code{MU_COUNT_CDR_R}:  number of replacement mutations in CDR1 and 
                                           CDR2 of the V-segment.
            \item  \code{MU_COUNT_CDR_S}:  number of silent mutations in CDR1 and CDR2 
                                           of the V-segment.
            \item  \code{MU_COUNT_FWR_R}:  number of replacement mutations in FWR1, 
                                           FWR2 and FWR3 of the V-segment.
            \item  \code{MU_COUNT_FWR_S}:  number of silent mutations in FWR1, FWR2 and
                                           FWR3 of the V-segment.
          }
          If \code{frequency=TRUE}, R and S mutation frequencies are
          calculated over the number of non-N positions in the specified regions.
          \itemize{
            \item  \code{MU_FREQ_CDR_R}:  frequency of replacement mutations in CDR1 and 
                                           CDR2 of the V-segment.
            \item  \code{MU_FREQ_CDR_S}:  frequency of silent mutations in CDR1 and CDR2 
                                           of the V-segment.
            \item  \code{MU_FREQ_FWR_R}:  frequency of replacement mutations in FWR1, 
                                           FWR2 and FWR3 of the V-segment.
            \item  \code{MU_FREQ_FWR_S}:  frequency of silent mutations in FWR1, FWR2 and
                                           FWR3 of the V-segment.
          } 
          If \code{frequency=TRUE} and \code{combine=TRUE}, the mutations and non-N positions
          are aggregated and a single \code{MU_FREQ} value is returned:
          \itemize{
            \item  \code{MU_FREQ}:  frequency of replacement and silent mutations in the 
                                     specified region.
          }
}
\description{
\code{observedMutations} calculates the observed number of mutations for each 
sequence in the input \code{data.frame}.
}
\details{
Mutation counts are determined by comparing the input sequences (in the column specified 
by \code{sequenceColumn}) to the germline sequence (in the column specified by 
\code{germlineColumn}). See \link{calcObservedMutations} for more technical details, 
\strong{including criteria for which sequence differences are included in the mutation 
counts and which are not}.

The mutations are binned as either replacement (R) or silent (S) across the different 
regions of the sequences as defined by \code{regionDefinition}. Typically, this would 
be the framework (FWR) and complementarity determining (CDR) regions of IMGT-gapped 
nucleotide sequences. Mutation counts are appended to the input \code{db} as 
additional columns.
}
\examples{
# Subset example data
data(ExampleDb, package="alakazam")
db <- subset(ExampleDb, ISOTYPE == "IgG" & SAMPLE == "+7d")

# Calculate mutation frequency over the entire sequence
db_obs <- observedMutations(db, sequenceColumn="SEQUENCE_IMGT",
                            germlineColumn="GERMLINE_IMGT_D_MASK",
                            frequency=TRUE,
                            nproc=1)

# Count of V-region mutations split by FWR and CDR
# With mutations only considered replacement if charge changes
db_obs <- observedMutations(db, sequenceColumn="SEQUENCE_IMGT",
                            germlineColumn="GERMLINE_IMGT_D_MASK",
                            regionDefinition=IMGT_V,
                            mutationDefinition=CHARGE_MUTATIONS,
                            nproc=1)
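
# A minimal sketch of the combined-frequency case described in the Value
# section: with frequency=TRUE and combine=TRUE, mutations and non-N
# positions are aggregated into a single value per sequence.
# Assumption: the resulting column is named MU_FREQ, as documented above.
db_obs <- observedMutations(db, sequenceColumn="SEQUENCE_IMGT",
                            germlineColumn="GERMLINE_IMGT_D_MASK",
                            regionDefinition=IMGT_V,
                            frequency=TRUE,
                            combine=TRUE,
                            nproc=1)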
                     
}
\seealso{
\link{calcObservedMutations} is called by this function to get the number of mutations 
in each sequence grouped by the \link{RegionDefinition}. 
See \link{IMGT_SCHEMES} for a set of predefined \link{RegionDefinition} objects.
See \link{expectedMutations} for calculating expected mutation frequencies.
}
