| Type: | Package |
| Title: | Functional Data Analysis Pipeline, Extracting Functional Traits from Biological Time-Series Data |
| Version: | 1.1.0 |
| Description: | Provides a pipeline of tools for analysing circadian time-series data using functional data analysis (FDA). The package supports smoothing of rhythmic time series, functional principle component analysis (FPCA), and extraction of group-level traits from functional representations. Analyses can incorporate multiple curve derivatives and optional temporal segmentation, enabling comparative analysis of circadian dynamics across experimental groups and time windows. |
| URL: | https://github.com/scllock/TimeTraits |
| BugReports: | https://github.com/scllock/TimeTraits/issues |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | stats, pracma, lomb, fda, fdaoutlier |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-18 06:02:16 UTC; sarahlock |
| Author: | Sarah Lock [aut, cre] |
| Maintainer: | Sarah Lock <sarah.lock@york.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-20 10:30:20 UTC |
TimeTraits: Functional Data Analysis Pipeline, Extracting Functional Traits from Biological Time-Series Data.
Description
Provides tools for analysing circadian time-series using functional data analysis, including smoothing, functional principal component analysis, and extraction of group-level traits.
Author(s)
Maintainer: Sarah Lock sarah.lock@york.ac.uk
See Also
Useful links:
Filter, detrend, and quality-control luminescence time-series data
Description
filter_curves preprocesses luminescence time-series data by applying
a series of quality control and rhythm-detection steps. The function first
subsets the data to a specified time window, removes samples with low signal
intensity, detrends each time series using linear regression, and then
assesses rhythmicity using Lomb–Scargle periodogram analysis. Only samples
showing statistically significant rhythmic components are retained.
Usage
filter_curves(
x,
time,
meta = NULL,
from = min(time, na.rm = TRUE),
to = max(time, na.rm = TRUE),
min_lum = 0,
periodogram_length = NULL
)
Arguments
x |
A numeric matrix of luminescence values, where rows correspond to time points and columns correspond to samples. Sparse matrices are allowed. |
time |
A numeric vector giving the time points corresponding to the rows
of |
meta |
An optional data frame containing sample metadata. Must include
a |
from |
Numeric value specifying the start time for analysis. Defaults
to the minimum of |
to |
Numeric value specifying the end time for analysis. Defaults to
the maximum of |
min_lum |
Numeric threshold specifying the minimum luminescence value required for a sample to be retained. Samples with a minimum value less than or equal to this threshold are removed. Default is 0. |
periodogram_length |
Numeric value specifying the length (in the same
units as |
Details
The function returns detrended curves that pass all filtering criteria, along with optional metadata for retained samples and a log of samples removed at each step.
After subsetting the data to the requested time window, samples are filtered based on minimum luminescence to remove low-signal or failed measurements. Each remaining time series is detrended by fitting a linear model against time and extracting residuals. Rhythmicity is then assessed using a Lomb–Scargle periodogram over periods between 18 and 30 hours. Samples whose strongest spectral peak does not exceed the corresponding significance threshold are classified as non-rhythmic and removed.
A record of all removed samples and the reason for their removal is returned to support transparent quality control.
Value
A list with the following elements:
- curves
A data frame containing the time vector and detrended luminescence curves that passed all filtering steps.
- meta_kept
A data frame of metadata corresponding to retained samples, or
NULLif no metadata were supplied.- removed
A data frame logging samples removed during filtering and the reason for removal, or
NULLif no samples were removed.
Examples
data(mydata_example)
data <- mydata_example[, -1]
time <- as.numeric(mydata_example[, 1])
res <- filter_curves(
x = data,
time = time,
from = 24,
to = 144,
min_lum = 200,
periodogram_length = 48
)
plot(res$curves[,1],res$curves[,2], xlab = "time", ylab = "relative luminescence")
Functional PCA traits for circadian rhythm data
Description
Performs functional principal component analysis (FPCA) on smoothed circadian time-series data and extracts group-level traits from the resulting FPCA score space.
Usage
functional_traits(
smooth_out,
time,
meta = NULL,
group_col = "Genotype",
groups = NULL,
derivative = c(0, 1, 2),
n_pc = NULL,
segments = NULL,
centerfns = FALSE
)
Arguments
smooth_out |
An object of class |
time |
A numeric vector giving the time points corresponding to the functional observations. |
meta |
An optional data frame containing sample metadata. Must include
a column named |
group_col |
Character string giving the name of the column in
|
groups |
Optional factor or vector defining group membership for each
sample. Used only if |
derivative |
Integer vector specifying which derivatives of the
functional data to analyse (e.g. 0 = original curve, 1 = first derivative).
Default is |
n_pc |
Integer giving the number of principal components to retain.
If |
segments |
Optional named list defining time segments over which
traits should be summarised. Each element must be a numeric vector of
length two giving the start and end times (e.g.
|
centerfns |
Logical indicating whether to centre functional observations
before FPCA. Passed to |
Details
The analysis can optionally be stratified by time segments (e.g. pre/post environmental shift) and by derivatives of the curves. When segments are supplied, functional observations from all segments are first re-evaluated onto a shared time grid and analysed jointly, ensuring that FPCA scores are directly comparable across segments.
For each derivative, FPCA is performed once across all samples and segments simultaneously. Segment- and group-specific traits are then computed post hoc from the shared FPCA score space.
Extracted traits include:
Mean and standard deviation of FPCA scores (per principal component)
Mean and standard deviation of distance to the FPCA-space centroid
Convex hull area in PC1–PC2 space (measure of within-group diversity)
FPCA is performed in a shared functional space for each derivative. When segments are provided, all segment-specific curves are interpolated onto a common time grid prior to FPCA, ensuring statistical comparability of FPCA scores across segments. Group-level traits are computed after FPCA and do not influence the decomposition itself.
Convex hull area is computed in PC1-PC2 space only and requires at least three samples per group and segment.
Value
A list with the following components:
- traits
A tidy data frame of group-level FPCA-derived traits, including marginal PC summaries and FPCA-space dispersion measures.
- scores
A data frame of FPCA scores for each sample, annotated with group, segment, and derivative information.
- fpca
A named list of
pca.fdobjects, one per derivative.- meta
The metadata data frame aligned to the FPCA input order. A two-column data frame containing
Sample_idandGenotype.
See Also
Examples
time_vec <- matrix(seq(0, 48, length.out = 100))
Genotype_A <- matrix(
sin(2 * pi * (1:100) / 24) +
matrix(rnorm(100 * 5, 0, 0.1), nrow = 100, ncol = 5),
nrow = 100,
ncol = 5,
dimnames = list(NULL, paste0("A", 1:5)))
Genotype_B <- matrix(
0.3 * sin(2 * pi * (1:100) / 24) +
matrix(rnorm(100 * 5, 0, 0.5), nrow = 100, ncol = 5),
nrow = 100,
ncol = 5,
dimnames = list(NULL, paste0("B", 1:5)))
raw_data <- cbind(Genotype_A, Genotype_B)
smooth_data <- smooth_fun(raw_data, time_vec)
meta_df <- data.frame(
Sample_id = colnames(raw_data),
Genotype = rep(c("A", "B"), each = 5))
fpca_res <- functional_traits(
smooth_out = smooth_data$smooth,
time = time_vec,
meta = meta_df,
n_pc = 2)
head(fpca_res$traits)
cols <- c(A = "red", B = "blue")
with(subset(fpca_res$scores, derivative == 0),
plot(PC1, PC2, pch = 19, xlab = "PC1", ylab = "PC2", col = cols[group]))
legend("topright", legend = names(cols), col = cols, pch = 19)
Arabidopsis thaliana luciferase circadian rhythm dataset
Description
Raw luminescence time-series data from an Arabidopsis thaliana luciferase reporter experiment examining circadian responses to changes in photoperiod.
Usage
data(mydata_example)
Format
A data frame with 300 rows and 189 variables:
- new_time
Numeric vector giving time since experiment start (hours).
- WS2.001–WS2.096
Luminescence counts per second for individual WS-2 wild-type plants. Each column corresponds to one biological replicate.
- phyb.001–phyb.096
Luminescence counts per second for individual phyB mutant plants. Each column corresponds to one biological replicate.
All luminescence values are raw photon counts per second (cps).
Details
Plants (Ws-2 wild-type and phyB mutant lines) were entrained under short-day conditions (8 hours light: 16 hours dark) and subsequently shifted to long-day conditions (16 hours light: 8 hours dark). Luminescence was recorded at regular time intervals for approximately 166 hours.
The dataset is provided in wide format, with one column per individual plant and one row per time point (h).
Source
Experimental data generated as part of a luciferase circadian rhythm study.
Examples
data(mydata_example)
head(mydata_example)
Select optimal number of basis functions and smoothing parameter
Description
This function performs leave-one-timepoint-out cross-validation (LOOCV)
on a single functional trajectory to determine the optimal number of
B-spline basis functions ('nbasis') and smoothing parameter ('lambda')
for fitting a functional data object using fda::smooth.basis.
Usage
select_basis_lambda_LOOCV(
data,
time,
nbasis_range = seq(10, 100, 5),
lambda_range = 10^seq(-6, 2, length.out = 10),
norder = 4,
penalty_order = 2
)
Arguments
data |
A numeric vector of observed values (one curve). |
time |
A numeric vector of time points corresponding to 'data'. |
nbasis_range |
A numeric vector specifying the range of basis function numbers to test. |
lambda_range |
A numeric vector of lambda values to test
(e.g. |
norder |
Integer specifying the order of the B-spline basis (default = 4, cubic). |
penalty_order |
Integer specifying the derivative order used in the roughness penalty (default = 2). |
Details
Each time point is left out in turn, the curve is refit using the remaining observations, and prediction error at the omitted time point is recorded. Mean squared prediction error is used as the selection criterion.
This function is computationally expensive and intended as a helper routine for selecting smoothing parameters prior to large-scale functional analyses.
The function searches for the best combination of number of basis functions
and roughness penalty by minimizing leave-one-out cross-validation error.
The smoothing parameter (\lambda) controls the trade-off between
goodness of fit and smoothness; larger values enforce smoother curves.
Value
A list with:
best_nbasis |
Number of basis functions minimizing LOOCV error. |
best_lambda |
Lambda value minimizing LOOCV error. |
cv_errors |
Matrix of mean squared LOOCV errors across all combinations. |
nbasis_range |
Tested nbasis values. |
lambda_range |
Tested lambda values. |
Examples
time <- seq(0, 48, length.out = 100)
y <- sin(2 * pi * time / 24) + rnorm(100, 0, 0.1)
result <- select_basis_lambda_LOOCV(y, time,
nbasis_range = seq(5, 25, 5),
lambda_range = 10^seq(-6, 0, length = 5))
print(result)
Smooth circadian time-series data with optional shape-based outlier detection
Description
smooth_fun smooths circadian time-series data using a functional data
analysis (FDA) framework based on cubic B-spline basis functions. Each curve
is optionally normalised, smoothed individually, and evaluated on a common
time grid. An optional shape-based outlier detection step can be applied using
total variation depth, after which the remaining curves are smoothed
collectively.
Usage
smooth_fun(
x,
time,
meta = NULL,
normalize = TRUE,
outlier = FALSE,
deriv = 1,
nbasis = 50,
lambda = 1
)
Arguments
x |
A numeric matrix of observations, where rows correspond to time points and columns correspond to samples. |
time |
A numeric matrix giving the time values corresponding to the rows
of |
meta |
An optional data frame containing sample metadata. Must include a
|
normalize |
Logical indicating whether each curve should be normalised
to the range [0, 1] prior to smoothing. Default is |
outlier |
Logical indicating whether shape-based outlier detection should
be performed using total variation depth. If |
deriv |
Integer specifying the derivative order at which curves are evaluated for outlier detection. Default is 1 (first derivative). |
nbasis |
Integer specifying the maximum number of B-spline basis functions used when smoothing individual curves. The effective number of basis functions is constrained by the number of observations per curve. Default is 50. |
lambda |
Numeric value controlling the roughness penalty applied during smoothing of individual curves. Higher values produce smoother curves. Default is 1. |
Details
The function supports metadata tracking, allowing identification of samples removed during outlier detection.
Each curve is first optionally normalised and stripped of missing values. Curves are smoothed individually using cubic B-spline basis functions with a second-derivative roughness penalty. Functional parameter objects are cached and reused for curves sharing the same basis configuration to improve computational efficiency.
When outlier = TRUE, curves are smoothed again using a higher roughness
penalty and evaluated at the specified derivative order. Shape outliers are
identified using total variation depth via
fdaoutlier::tvdmss and removed prior to final smoothing.
Value
A list with the following components:
- curves
A numeric matrix of smoothed curves evaluated on a common time grid, with columns corresponding to retained samples.
- smooth
A functional data object containing the final smoothed curves.
- time
A numeric vector giving the common time grid used for evaluation.
- meta_kept
A data frame of metadata for retained samples, or
NULLif no metadata were supplied.- meta_removed
A data frame of metadata for samples removed during outlier detection, or
NULLif no samples were removed.
Examples
data(mydata_example)
data <- mydata_example[, -1]
time <- as.matrix(mydata_example[, 1])
smooth_curves <- smooth_fun(
x = data,
time = time,
nbasis = 40,
lambda = 1,
outlier = FALSE
)
plot(smooth_curves$smooth)