\name{split_sumerian}
\alias{split_sumerian}
\title{Split a String into Sumerian Signs and Separators}
\description{
Splits a string of Sumerian text into its constituent signs and the separators between them. Three representations of signs are recognized: lowercase transliterations, uppercase sign names, and Unicode cuneiform characters.
}
\usage{
split_sumerian(x)
}
\arguments{
  \item{x}{A character string containing Sumerian text as lowercase transliterations, uppercase sign names, Unicode cuneiform characters, or a mix of these.}
}
\details{
The function identifies Sumerian signs based on three patterns:

\enumerate{
  \item \strong{Lowercase transliterations} (type 1): Sequences of lowercase letters (a-z), including special characters (\enc{ĝ}{g}, \enc{š}{sz}, ...) and the accented vowels \enc{á}{a}, \enc{é}{e}, \enc{í}{i}, \enc{ú}{u}, \enc{à}{a}, \enc{è}{e}, \enc{ì}{i}, and \enc{ù}{u}, optionally followed by a numeric index (as in \code{ke4}).

  \item \strong{Uppercase sign names} (type 2): Sequences starting with an uppercase letter, optionally followed by additional uppercase letters, digits, or the characters \code{+}, \code{/}, and \enc{×}{x}.

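  \item \strong{Cuneiform characters} (type 3): Unicode characters with code points from U+12000 to U+12500, a range that spans the Cuneiform and Cuneiform Numbers and Punctuation blocks and part of the Early Dynastic Cuneiform block.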
}

Signs and separators are returned so that the original string can be reconstructed exactly with \code{paste0(c("", signs), separators, collapse = "")}.
}
\value{
A list with three components:
  \item{signs}{A character vector containing the extracted Sumerian signs.}
  \item{separators}{A character vector of length \code{length(signs) + 1} containing the separators. The first element contains any text before the first sign, subsequent elements contain text between consecutive signs, and the last element contains any text after the final sign. Empty strings indicate no separator at that position.}
  \item{types}{An integer vector of the same length as \code{signs} indicating the type of each sign: \code{1} for lowercase transliterations, \code{2} for uppercase sign names, and \code{3} for cuneiform characters.}
}

\examples{

# Example 1: lowercase transliterations separated by hyphens
x <- "en-tarah-an-na-ke4"

result <- split_sumerian(x)

result

# Example 2: lowercase transliterations mixed with uppercase sign names

x <- "en-DARA3.AN.na-ke4"

result <- split_sumerian(x)

result

# Reconstruct the original string
paste0(c("", result$signs), result$separators, collapse = "")
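
# Example 3: cuneiform characters (type 3). The characters are built from
# code points with intToUtf8() so that this file stays ASCII-only; the two
# code points and the hyphen separator are illustrative choices.
x <- paste(intToUtf8(0x12000), intToUtf8(0x12001), sep = "-")

result <- split_sumerian(x)

result$types

# Example 4: text before the first sign and after the last sign is kept in
# the first and last elements of the separators. The square brackets and
# the trailing "?" are illustrative editorial marks, not required input.
x <- "[en]-tarah-an-na-ke4?"

result <- split_sumerian(x)

result$separators

# Check the documented round trip: this compares the rebuilt string with x
identical(paste0(c("", result$signs), result$separators, collapse = ""), x)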

}
\keyword{character}
\keyword{utilities}
