couplr 1.2.0
New Features
Full Matching
- New full_match() function assigns every unit to a matched group with variable ratios (1:k or k:1):
  - Greedy group formation: match each left unit to its nearest right unit, then assign the remaining right units to the nearest matched left unit
  - Caliper support: caliper (absolute) or caliper_sd (SD-based)
  - Control group size constraints: min_controls, max_controls
  - Weights inversely proportional to group size
  - Returns full_matching_result S3 class
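
The greedy group formation above can be sketched in base R. This is an illustration under stated assumptions, not couplr's implementation: the helper greedy_full_groups() is hypothetical, the cost matrix has left units in rows and right units in columns with at least as many rights as lefts, step 1 is done without reuse, and each right unit is weighted by one over the number of right units in its group.

```r
# Hypothetical helper: a sketch of greedy 1:k group formation, not couplr code.
greedy_full_groups <- function(cost) {
  stopifnot(is.matrix(cost), nrow(cost) <= ncol(cost))
  n_left <- nrow(cost)
  group_of_right <- rep(NA_integer_, ncol(cost))

  # Step 1: each left unit, in row order, claims its nearest unclaimed right unit.
  for (i in seq_len(n_left)) {
    free <- which(is.na(group_of_right))
    group_of_right[free[which.min(cost[i, free])]] <- i
  }

  # Step 2: every remaining right unit joins the group of its nearest left unit.
  for (j in which(is.na(group_of_right))) {
    group_of_right[j] <- which.min(cost[, j])
  }

  # Assumed weighting: 1 / (number of right units in the group).
  sizes <- tabulate(group_of_right, nbins = n_left)
  data.frame(
    right = seq_along(group_of_right),
    group = group_of_right,
    weight = 1 / sizes[group_of_right]
  )
}

# Two left units (values 1 and 5) and four right units.
cost <- abs(outer(c(1, 5), c(1.2, 4.5, 5.4, 0.7), "-"))
res <- greedy_full_groups(cost)
res
```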
Coarsened Exact Matching (CEM)
- New cem_match() function implements coarsened exact matching:
  - Coarsens continuous variables into bins (Sturges, FD, Scott, or custom)
  - Exact matching on coarsened values with stratum-based weights
  - Support for categorical grouping variables via the grouping parameter
  - Custom cutpoints per variable via the cutpoints parameter
  - Returns cem_result S3 class with matched units and strata summary
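
As a rough picture of what coarsened exact matching does, the base R sketch below bins one covariate with Sturges' rule, keeps only strata that contain both groups, and applies the conventional CEM control weights. The data and column names are made up for illustration, and cem_match() may differ in its binning and weighting details.

```r
set.seed(1)
# Simulated data: 20 treated and 40 control units with one continuous covariate.
d <- data.frame(
  treated = rep(c(1, 0), times = c(20, 40)),
  age = c(rnorm(20, 40, 8), rnorm(40, 45, 10))
)

# Coarsen: Sturges' rule picks the number of equal-width bins.
d$age_bin <- cut(d$age, breaks = nclass.Sturges(d$age))

# Exact match on the coarsened value: keep strata containing both groups.
tab <- table(d$age_bin, d$treated)
keep <- rownames(tab)[tab[, "0"] > 0 & tab[, "1"] > 0]
matched <- d[d$age_bin %in% keep, ]

# Conventional CEM weights (an assumption about the weighting scheme):
# treated units get 1, controls get (n_t / n_c in stratum) * (N_c / N_t overall).
s <- as.character(matched$age_bin)
n_t <- tab[s, "1"]
n_c <- tab[s, "0"]
N_t <- sum(matched$treated == 1)
N_c <- sum(matched$treated == 0)
matched$weight <- ifelse(matched$treated == 1, 1, (n_t / n_c) * (N_c / N_t))

table(matched$age_bin, matched$treated)
```

With these weights the control weights sum to the number of matched controls, so weighted means are comparable across groups.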
Subclassification
- New subclass_match() function divides units into propensity score strata:
  - Quantile-based stratification with configurable number of subclasses
  - Supports pre-computed PS, pre-fitted models, or formula interface
  - Target estimands: ATT, ATE, ATC with appropriate weighting
  - Returns subclass_result S3 class with subclass summary
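
Quantile-based stratification can be illustrated without the package. The sketch below fits a logistic propensity model on simulated data and cuts the scores at their quantiles; taking quantiles over all units (rather than, say, treated units only) is an assumption, and subclass_match() may choose its breaks differently.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
treated <- rbinom(n, 1, plogis(0.8 * x))

# Propensity scores from a logistic regression.
ps <- fitted(glm(treated ~ x, family = binomial()))

# Cut the scores into n_sub strata of roughly equal size at their quantiles.
n_sub <- 5
breaks <- quantile(ps, probs = seq(0, 1, length.out = n_sub + 1))
subclass <- cut(ps, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Treated and control counts per subclass.
table(subclass, treated)
```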
Output Layer & Ecosystem Integration
- New match_data() generic converts any couplr result to analysis-ready format with treatment, weights, subclass, and distance columns. Methods for all result types (matching, full, CEM, subclass).
- New as_matchit() converter creates matchit-class objects from couplr results, enabling interop with cobalt, marginaleffects, and other MatchIt ecosystem packages.
- cobalt bal.tab() methods for all couplr result types. Requires the cobalt package (in Suggests).
Mahalanobis Distance Improvements
- Robust singularity check using rcond() instead of fragile det() == 0
- Custom sigma parameter in match_couples(), greedy_couples(), and compute_distance_matrix() for user-supplied covariance matrices
- Vectorized computation replacing nested R for-loops for ~10x speedup
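
The two ideas above, an rcond()-based singularity check and a loop-free distance computation, can be sketched together in base R. The function mahalanobis_cross() is a hypothetical name, the pooled default for sigma and the tolerance are assumptions, and the package's internals may be organized differently.

```r
# Hypothetical helper: all left-by-right Mahalanobis distances without R loops.
mahalanobis_cross <- function(left, right, sigma = cov(rbind(left, right))) {
  # rcond() near zero means sigma is numerically singular.
  # The tolerance used here is an assumption.
  if (rcond(sigma) < .Machine$double.eps) {
    stop("sigma is numerically singular; drop collinear variables or supply sigma")
  }
  # sigma = t(R) %*% R, so multiplying by solve(R) whitens the data and
  # Euclidean distance on the whitened rows equals Mahalanobis distance.
  r_inv <- solve(chol(sigma))
  zl <- left %*% r_inv
  zr <- right %*% r_inv
  d2 <- outer(rowSums(zl^2), rowSums(zr^2), "+") - 2 * tcrossprod(zl, zr)
  sqrt(pmax(d2, 0))  # clamp tiny negative values caused by rounding
}

set.seed(1)
left <- matrix(rnorm(10), ncol = 2)
right <- matrix(rnorm(14), ncol = 2)
dist <- mahalanobis_cross(left, right)
dim(dist)
```

A determinant test can miss near-singular matrices whose determinant is small but not exactly zero, which is why a condition-number check is the more robust guard.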
S3 Generics
balance_diagnostics() and join_matched() are now S3 generics with methods for all result types. Existing code is fully backward-compatible.
New Functions
- full_match() - Variable-ratio full matching
- cem_match() - Coarsened exact matching
- subclass_match() - Propensity score subclassification
- match_data() - Unified analysis-ready output
- as_matchit() - Convert to MatchIt format
couplr 1.1.0
New Features
Ratio and Replacement Matching
- k:1 ratio matching via the ratio parameter in match_couples() and greedy_couples(). Matches k control units to each treated unit by replicating the cost matrix, then deduplicating assignments.
- With-replacement matching via the replace parameter. Each treated unit independently selects its nearest control, allowing controls to be reused across multiple treated units.
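
Both mechanisms can be pictured on a small cost matrix. The sketch below is illustrative only: it shows how a cost matrix might be replicated so that a 1:1 solver can hand each treated unit k controls, and how with-replacement selection reduces to a row-wise minimum. The variable names are made up and the solver step itself is omitted.

```r
# 2 treated units (rows) and 5 controls (columns).
cost <- abs(outer(c(10, 20), c(9, 12, 19, 23, 30), "-"))
k <- 2

# Ratio matching: repeat each treated row k times, so a 1:1 assignment on the
# expanded matrix gives every treated unit up to k distinct controls.
expanded <- cost[rep(seq_len(nrow(cost)), each = k), , drop = FALSE]
treated_of_row <- rep(seq_len(nrow(cost)), each = k)  # maps rows back to treated units

# With replacement: each treated unit independently takes its nearest control,
# so the same control may be selected more than once.
nearest <- apply(cost, 1, which.min)

dim(expanded)
nearest
```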
Propensity Score Matching
- New ps_match() function wraps match_couples() with logistic regression:
  - Accepts a formula or pre-fitted glm object
  - Matches on the logit of propensity scores with a caliper
  - Default caliper: 0.2 SD of logit(PS) (Rosenbaum and Rubin recommendation)
  - Returns matching_result with PS model metadata
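
The logit-scale caliper can be made concrete with a few lines of base R. This sketch builds the treated-by-control distance matrix on logit(PS) and marks pairs outside 0.2 SD as forbidden with Inf; computing the SD over all units is an assumption, and the data are simulated.

```r
set.seed(7)
n <- 300
x <- rnorm(n)
treated <- rbinom(n, 1, plogis(-0.5 + x))

fit <- glm(treated ~ x, family = binomial())
logit_ps <- predict(fit, type = "link")  # linear predictor = logit of the PS

# Caliper of 0.2 standard deviations of the logit (SD over all units, assumed).
caliper <- 0.2 * sd(logit_ps)

# Treated-by-control distances on the logit scale; pairs beyond the caliper
# are marked as forbidden.
cost <- abs(outer(logit_ps[treated == 1], logit_ps[treated == 0], "-"))
cost[cost > caliper] <- Inf

mean(is.finite(cost))  # share of pairs that remain eligible
```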
Cardinality Matching
- New cardinality_match() function maximizes sample size subject to balance constraints:
  - Starts with a full optimal match, then iteratively prunes imbalanced pairs
  - Balance threshold via max_std_diff (default: 0.1 for excellent balance)
  - Configurable pruning speed with batch_fraction
  - Returns pruning diagnostics: iterations, pairs removed, final balance
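
The pruning loop can be sketched for a single covariate. Everything here is an assumption made for illustration: prune_pairs() is a hypothetical helper, the pooled SD is fixed from the initial match, and pairs are ranked by how strongly their within-pair difference pushes in the imbalanced direction. cardinality_match() may rank and remove pairs differently.

```r
# Hypothetical helper: drop a batch of pairs per iteration until balance is met.
prune_pairs <- function(x_left, x_right, max_std_diff = 0.1, batch_fraction = 0.1) {
  keep <- rep(TRUE, length(x_left))
  pooled_sd <- sqrt((var(x_left) + var(x_right)) / 2)  # fixed from the full match
  iterations <- 0
  repeat {
    std_diff <- (mean(x_left[keep]) - mean(x_right[keep])) / pooled_sd
    if (abs(std_diff) <= max_std_diff || sum(keep) <= 2) break
    # Pairs whose difference has the same sign as the imbalance contribute most.
    contrib <- sign(std_diff) * (x_left - x_right)
    contrib[!keep] <- -Inf  # already-removed pairs are never selected again
    n_drop <- max(1, floor(batch_fraction * sum(keep)))
    keep[order(contrib, decreasing = TRUE)[seq_len(n_drop)]] <- FALSE
    iterations <- iterations + 1
  }
  list(keep = keep, iterations = iterations, std_diff = std_diff)
}

set.seed(11)
x_left <- rnorm(100, mean = 1)
x_right <- rnorm(100)
res <- prune_pairs(x_left, x_right, batch_fraction = 0.02)
c(pairs_kept = sum(res$keep), iterations = res$iterations)
```

A smaller batch_fraction removes fewer pairs per step, which costs more iterations but is less likely to discard pairs that did not need to go.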
Sensitivity Analysis
- New sensitivity_analysis() function implements Rosenbaum bounds:
  - Tests sensitivity of matched comparisons to hidden bias
  - Uses Wilcoxon signed-rank statistic with upper/lower p-value bounds
  - Reports critical gamma (smallest gamma at which significance is lost)
  - S3 methods: print(), summary(), plot()
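
For readers unfamiliar with Rosenbaum bounds, the upper p-value bound for the signed-rank statistic can be sketched with a normal approximation. rosenbaum_upper_p() is a hypothetical helper for a one-sided test of a positive effect with no continuity correction; sensitivity_analysis() may use a different approximation or grid.

```r
# Hypothetical helper: upper bound on the one-sided p-value at bias level gamma.
rosenbaum_upper_p <- function(diffs, gamma) {
  diffs <- diffs[diffs != 0]          # zero differences carry no sign information
  ranks <- rank(abs(diffs))
  t_obs <- sum(ranks[diffs > 0])      # Wilcoxon signed-rank statistic
  p_plus <- gamma / (1 + gamma)       # largest chance of a positive sign under gamma
  mu <- p_plus * sum(ranks)
  sigma2 <- p_plus * (1 - p_plus) * sum(ranks^2)
  pnorm((t_obs - mu) / sqrt(sigma2), lower.tail = FALSE)
}

set.seed(3)
diffs <- rnorm(40, mean = 0.6)        # simulated treated-minus-control differences
gammas <- seq(1, 3, by = 0.25)
p_upper <- sapply(gammas, function(g) rosenbaum_upper_p(diffs, g))

# Critical gamma: first grid value where the bound exceeds 0.05 (NA if none does).
critical_gamma <- gammas[which(p_upper > 0.05)[1]]
data.frame(gamma = gammas, p_upper = p_upper)
```

At gamma = 1 the bound reduces to the usual signed-rank test; the bound rises as gamma grows, and the critical gamma marks where the conclusion stops being robust.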
Visualization
- autoplot() methods for ggplot2-based visualizations (requires ggplot2):
  - autoplot.matching_result(): histogram, density, or ECDF of distances
  - autoplot.balance_diagnostics(): love plot, histogram, or variance ratio plot
  - autoplot.sensitivity_analysis(): gamma vs p-value curve
- Enhanced summary.matching_result() now reports match rate and distance percentiles
New Functions
- ps_match() - Propensity score matching with logit caliper
- cardinality_match() - Balance-constrained cardinality matching
- sensitivity_analysis() - Rosenbaum bounds sensitivity analysis
Tests
- Added 58 new tests across 7 test files
- All 4916 tests passing across platforms
couplr 1.0.7
Bug Fixes
- Fixed undefined behavior (UB) in Gabow-Tarjan algorithm: replaced left bit-shift of potentially negative values with multiplication to avoid sanitizer errors on M1-SAN checks
- Fixed namespace conflict with select() in vignettes by using explicit dplyr::select() to prevent masking by MASS or other packages
couplr 1.0.6
Documentation
- Added Overview section to algorithms vignette with audience and prerequisites
- Fixed workflow diagram dark mode text handling in matching-workflows vignette
- Improved SVG theme-awareness for multi-line text labels
- Removed grid lines from matching-workflows plots for cleaner appearance
- Added threshold labels to balance comparison plot
couplr 1.0.0
Major New Features (2025-11-19 Update)
Automatic Preprocessing and Scaling
The package now includes automatic preprocessing to improve matching quality:
- New auto_scale parameter in match_couples() and greedy_couples() enables automatic preprocessing
- Variable health checks detect and handle problematic variables:
  - Constant columns (SD = 0) are automatically excluded with warnings
  - High missingness (>50%) triggers warnings
  - Extreme skewness (|skewness| > 2) is flagged
- Smart scaling method selection analyzes data and recommends:
  - “robust” scaling using median and MAD (resistant to outliers)
  - “standardize” for traditional mean-centering and SD scaling
  - “range” for min-max normalization
- New preprocess_matching_vars() function for manual preprocessing control
- Categorical variable encoding for binary and ordered factors
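
The health checks and the robust scaling rule listed above are simple enough to sketch in base R. Both helpers are hypothetical and only mirror the stated thresholds; the moment-based skewness estimator is an assumption, and note that base R's mad() applies a 1.4826 consistency constant by default.

```r
# Hypothetical helper: flags mirroring the thresholds listed above.
check_variable <- function(x) {
  obs <- x[!is.na(x)]
  centered <- obs - mean(obs)
  s <- sqrt(mean(centered^2))
  skew <- if (isTRUE(s > 0)) mean(centered^3) / s^3 else NA_real_
  c(
    constant = isTRUE(s == 0),                # SD = 0
    high_missing = mean(is.na(x)) > 0.5,      # more than 50% missing
    skewed = isTRUE(abs(skew) > 2)            # |skewness| > 2
  )
}

# Hypothetical helper: center on the median and scale by the MAD.
robust_scale <- function(x) {
  (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

x <- c(1, 2, 3, 4, 100)   # one outlier
check_variable(x)
robust_scale(x)
```

Because the median and MAD barely move when the outlier changes, the scaled values of the ordinary observations stay on a comparable footing, which is the motivation for the “robust” option.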
Balance Diagnostics
Comprehensive tools to assess matching quality:
- New balance_diagnostics() function computes multiple balance metrics:
  - Standardized differences: (mean_left - mean_right) / pooled_sd
  - Variance ratios: SD_left / SD_right
  - Kolmogorov-Smirnov tests for distribution comparison
  - Overall balance metrics (mean, max, % large imbalance)
- Quality thresholds with interpretation:
  - |Std Diff| < 0.10: Excellent balance
  - |Std Diff| 0.10-0.25: Good balance
  - |Std Diff| 0.25-0.50: Acceptable balance
  - |Std Diff| > 0.50: Poor balance
- Per-block statistics with quality ratings when blocking is used
- balance_table() creates publication-ready formatted tables
- Informative print methods with interpretation guides
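
The standardized difference and the quality bands above translate directly into a few lines of R. In this sketch the pooled SD is taken as the square root of the average of the two variances, and values that fall exactly on a boundary go to the worse band; both choices are assumptions, as are the helper names.

```r
# Hypothetical helper: standardized mean difference with an assumed pooled SD.
std_diff <- function(left, right) {
  pooled_sd <- sqrt((var(left) + var(right)) / 2)
  (mean(left) - mean(right)) / pooled_sd
}

# Hypothetical helper: map |std diff| to the quality labels listed above.
rate_balance <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.10, 0.25, 0.50, Inf),
      labels = c("Excellent", "Good", "Acceptable", "Poor"),
      right = FALSE)
}

std_diff(c(1, 2, 3), c(2, 3, 4))
rate_balance(c(0.05, -0.2, 0.3, 0.8))
```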
Joined Matched Dataset Output
Create analysis-ready datasets directly from matching results:
- New join_matched() function automates data preparation:
  - Joins matched pairs with original left and right datasets
  - Eliminates manual data wrangling after matching
  - Select specific variables via left_vars and right_vars parameters
  - Customizable suffixes (default: _left, _right) for overlapping columns
  - Optional metadata: pair_id, distance, block_id
  - Works with both optimal and greedy matching
- Broom-style augment() method for tidymodels integration:
  - S3 method following broom package conventions
  - Sensible defaults for quick exploration
  - Supports all join_matched() parameters
- Flexible output control:
  - include_distance - Include/exclude matching distance
  - include_pair_id - Include/exclude sequential pair IDs
  - include_block_id - Include/exclude block identifiers
  - Custom ID column support via left_id and right_id
  - Clean column ordering: pair_id → IDs → distance → block → variables
Precomputed and Reusable Distances
Performance optimization for exploring multiple matching strategies:
- New compute_distances() function precomputes and caches distance matrices:
  - Compute distances once, reuse across multiple matching operations
  - Store complete metadata: variables, distance metric, scaling method, timestamps
  - Preserve original datasets for seamless integration with join_matched()
  - Enable rapid exploration of different matching parameters
  - Performance improvement: ~60% faster when trying multiple matching strategies
- Distance objects (S3 class distance_object):
  - Self-contained: cost matrix, IDs, metadata, original data
  - Works with both match_couples() and greedy_couples()
  - Pass as first argument instead of datasets: match_couples(dist_obj, max_distance = 5)
  - Informative print and summary methods with distance statistics
- Constraint modification via update_constraints():
  - Apply new max_distance or calipers without recomputing distances
  - Creates new distance object following copy-on-modify semantics
  - Experiment with different constraints efficiently
- Backward compatible integration:
  - Modified function signatures: match_couples(left, right = NULL, vars = NULL, ...)
  - Automatically detects distance objects vs. datasets
  - All existing code continues to work unchanged
Parallel Processing
Speed up blocked matching with multi-core processing:
- New parallel parameter in match_couples() and greedy_couples():
  - Enable with parallel = TRUE for automatic configuration
  - Specify plan with parallel = "multisession" or other future plan
  - Works with any number of blocks and automatically determines whether parallel execution is beneficial
  - Gracefully falls back if future packages are not installed
- Powered by the future package:
  - Cross-platform support (Windows, Unix/Mac, clusters)
  - Respects user-configured parallel backends
  - Automatic worker management
  - Clean restoration of original plan after execution
- Performance:
  - Best for 10+ blocks with 50+ units per block
  - Speedup scales with number of cores and complexity
  - Minimal overhead for small problems
- Integration:
  - Works with all blocking methods (exact, fuzzy, clustering)
  - Compatible with distance caching via compute_distances()
  - Supports all matching parameters (constraints, calipers, scaling)
Fun Error Messages and Cost Checking
Like testthat, couplr makes errors light, memorable, and helpful with couple-themed messages:
- New check_costs parameter (default: TRUE) in match_couples() and greedy_couples():
  - Automatically checks distance distributions before matching
  - Provides friendly, actionable warnings for common problems
  - Set to FALSE to skip checks in production code
- Fun couple-themed error messages throughout the package:
  - 💔 “No matches made - can’t couple without candidates!”
  - 🔍 “Your constraints are too strict. Love can’t bloom in a vacuum!”
  - ✨ Helpful suggestions: “Try increasing max_distance or relaxing calipers”
  - 💖 Success messages: “Excellent balance! These couples are well-matched!”
- Automatic problem detection:
  - Too many zeros: Warns about duplicates or identical values (>10% zero distances)
  - Extreme costs: Detects skewed distributions (99th percentile > 10x the 95th)
  - Many forbidden pairs: Warns when constraints eliminate >50% of valid pairs
  - Constant distances: Alerts when all distances are identical
  - Constant variables: Detects and excludes variables with no variation
- New diagnostic function diagnose_distance_matrix():
  - Comprehensive analysis of cost distributions
  - Variable-specific problem detection
  - Actionable suggestions for fixes
  - Quality rating (good/fair/poor)
- Emoji control: Disable with options(couplr.emoji = FALSE) if preferred
- Philosophy: Errors should be less intimidating, more memorable, and provide clear guidance
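
The automatic problem detection thresholds listed in this section can be expressed as a small set of checks on a cost matrix. The function below is a hypothetical sketch that treats non-finite entries (NA or Inf) as forbidden pairs; it is not the code behind check_costs or diagnose_distance_matrix().

```r
# Hypothetical helper: flags mirroring the documented thresholds.
check_costs_sketch <- function(cost) {
  finite <- cost[is.finite(cost)]
  q <- quantile(finite, c(0.95, 0.99), names = FALSE)
  c(
    too_many_zeros = mean(finite == 0) > 0.10,      # >10% zero distances
    extreme_costs = q[2] > 10 * q[1],               # 99th pct > 10x the 95th
    many_forbidden = mean(!is.finite(cost)) > 0.50, # >50% of pairs eliminated
    constant = length(unique(finite)) == 1          # all distances identical
  )
}

cost <- matrix(c(0, 0, 1, 2, 3, Inf, 4, 5, 6), nrow = 3)
flags <- check_costs_sketch(cost)
flags
```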
New Functions
- preprocess_matching_vars() - Main preprocessing orchestrator
- balance_diagnostics() - Comprehensive balance assessment
- balance_table() - Formatted balance tables for reporting
- join_matched() - Create analysis-ready datasets from matching results
- augment.matching_result() - Broom-style interface for joined data
- compute_distances() - Precompute and cache distance matrices
- update_constraints() - Modify constraints on distance objects
- is_distance_object() - Type checking for distance objects
- diagnose_distance_matrix() - Comprehensive distance diagnostics
- check_cost_distribution() - Check for distribution problems
- Added robust scaling method using median and MAD
Documentation & Examples
- examples/auto_scale_demo.R - 5 preprocessing demonstrations
- examples/balance_diagnostics_demo.R - 6 balance diagnostic examples
- examples/join_matched_demo.R - 8 joined dataset demonstrations
- examples/distance_cache_demo.R - Distance caching and reuse examples
- examples/parallel_matching_demo.R - 7 parallel processing examples
- examples/error_messages_demo.R - 10 fun error message demonstrations
- Complete implementation documentation (claude/IMPLEMENTATION_STEP1.md through STEP6.md)
- All functions have full Roxygen documentation
Tests
- Added 34+ new tests (10 for preprocessing, 11 for balance diagnostics, 13 for joined datasets, plus tests for distance caching)
- All tests passing with full backward compatibility
Major Changes (Initial 1.0.0 Release)
Package Renamed: lapr → couplr
The package has been renamed from lapr to couplr to better reflect its purpose as a general pairing and matching toolkit.
couplr = Optimal pairing and matching via linear assignment
Clean 1.0.0 Release
First official stable release with a clean, well-organized codebase.
New Organization
R Code
- Eliminated 3 redundant files
- Consistent morph_* naming prefix
- Two-layer API: assignment() (low-level) + lap_solve() (tidy)
- 10 well-organized files (down from 13)
C++ Code
- Modular subdirectory structure:
  - src/core/ - Utilities and headers
  - src/interface/ - Rcpp exports
  - src/solvers/ - 14 LAP algorithms
  - src/gabow_tarjan/ - Gabow-Tarjan solver
  - src/morph/ - Image morphing
Features
Solvers
Hungarian, Jonker-Volgenant, Auction (3 variants), SAP/SSP, SSAP-Bucket, Cost-scaling, Cycle-cancel, Gabow-Tarjan, Hopcroft-Karp, Line-metric, Brute-force, Auto-select
High-Level
✅ Tidy tibble interface
✅ Matrix & data frame inputs
✅ Grouped data frames
✅ Batch solving + parallelization
✅ K-best solutions (Murty, Lawler)
✅ Rectangular matrices
✅ Forbidden assignments (NA/Inf)
✅ Maximize/minimize
✅ Pixel morphing visualization
API
- lap_solve() - Main tidy interface
- lap_solve_batch() - Batch solving
- lap_solve_kbest() - K-best solutions
- assignment() - Low-level solver
- Utilities: get_total_cost(), as_assignment_matrix(), etc.
- Visualization: pixel_morph(), pixel_morph_animate()
Development history under “lapr” is available in the git log before v1.0.0.