The goal of pudu is to provide function declarations and inline
function definitions that facilitate cleaning strings in C++ code before
passing them to R. It works with cpp11::strings and std::vector<std::string> objects.
The idea is the same as the janitor package, but for C++ code.
Why the name Pudu? The pudu is the smallest deer on planet Earth, and this package is tiny too. The original Pudu (unvectorized) was drawn by Pokanvas. This package emerged as a spinoff from the redatam package while cleaning strings in C++ code.
You can install the development version of pudu with:
remotes::install_github("pachadotdev/pudu")
Here is how you can use the functions in this package in C++ code:
#include <cpp11.hpp>
#include <pudu.hpp>
using namespace cpp11;
// Example 1
std::vector<std::string> x = {" REGION NAME "};
tidy_std_names(x); // returns 'region_name'
// Example 2
tidy_std_vars(x); // returns 'REGION NAME'
// Example 3
// test_tidy_r_names(" REGION NAME ") returns 'region_name'
[[cpp11::register]] cpp11::writable::strings test_tidy_r_names(
    const cpp11::strings& x) {
  cpp11::writable::strings res = tidy_r_names(x);
  return res;
}
// Example 4
// test_tidy_r_vars(" REGION NAME ") returns 'REGION NAME'
[[cpp11::register]] cpp11::writable::strings test_tidy_r_vars(
    const cpp11::strings& x) {
  cpp11::writable::strings res = tidy_r_vars(x);
  return res;
}
Messy strings such as “ DEPTO. .REF_ID_ ” are converted to either “depto_ref_id” or “DEPTO. .REF_ID_”, depending on the function used.
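To make the two kinds of output concrete, here is a minimal sketch of both conversions using only the C++ standard library. The helpers sketch_trim and sketch_snake_case are hypothetical names introduced for illustration; they are not part of pudu and are not claimed to match its implementation. The sketch assumes ASCII input and treats every byte that is not an ASCII letter or digit as a separator.

```cpp
#include <string>

// Hypothetical helper (not part of pudu): strip leading and trailing
// whitespace and leave everything else untouched.
std::string sketch_trim(const std::string& s) {
  const char* ws = " \t\n\r\f\v";
  const std::string::size_type first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const std::string::size_type last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Hypothetical helper (not part of pudu): lowercase ASCII letters, keep
// digits, collapse every run of other bytes into a single underscore, and
// never emit an underscore at either end of the result.
std::string sketch_snake_case(const std::string& s) {
  std::string out;
  bool pending_sep = false;  // true once a separator run has been seen
  for (unsigned char c : s) {
    const bool upper = (c >= 'A' && c <= 'Z');
    const bool lower = (c >= 'a' && c <= 'z');
    const bool digit = (c >= '0' && c <= '9');
    if (upper || lower || digit) {
      // Write the underscore lazily so trailing separators are dropped.
      if (pending_sep && !out.empty()) out += '_';
      out += static_cast<char>(upper ? c - 'A' + 'a' : c);
      pending_sep = false;
    } else {
      pending_sep = true;
    }
  }
  return out;
}
```

Under these assumptions, sketch_snake_case(" DEPTO. .REF_ID_ ") gives "depto_ref_id" and sketch_trim(" DEPTO. .REF_ID_ ") gives "DEPTO. .REF_ID_". Writing the underscore only when the next letter or digit arrives is what keeps the result free of leading, trailing, and repeated underscores.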
The following tests in R should give an idea of how the functions work:
# German
vars <- "Gau\xc3\x9f"
expect_equal(test_tidy_r_names(vars), "gau")
expect_equal(test_tidy_r_vars(vars), "Gau\u00df")
# French
vars <- "c\xc2\xb4est-\xc3\xa0-dire"
expect_equal(test_tidy_r_names(vars), "c_est_a_dire")
expect_equal(test_tidy_r_vars(vars), "c\u00b4est-\u00e0-dire")
# Spanish
vars <- "\xc2\xbfC\xc3\xb3mo est\xc3\xa1s\x3f"
expect_equal(test_tidy_r_names(vars), "como_estas")
expect_equal(test_tidy_r_vars(vars), "\u00bfC\u00f3mo est\u00e1s\u003f")
# Japanese
vars <- "Konnichiwa \xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf"
expect_equal(test_tidy_r_names(vars), "konnichiwa")
expect_equal(test_tidy_r_vars(vars), "Konnichiwa \u3053\u3093\u306b\u3061\u306f")
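The tests above suggest that the name-cleaning path works on UTF-8 bytes: some accented Latin letters are mapped to an ASCII base letter, while other non-ASCII characters are dropped. The following is a rough sketch of one way to get that behavior with the standard library alone. The function sketch_ascii_name and its small accent table are assumptions made for illustration, not pudu's actual code, and the table covers only a handful of letters.

```cpp
#include <string>

// Hypothetical sketch (not pudu's implementation): reduce a UTF-8 string to
// a lowercase ASCII identifier with words joined by single underscores.
std::string sketch_ascii_name(const std::string& s) {
  std::string out;
  bool pending_sep = false;  // a separator run is waiting to be written

  // Append one output character, writing a single underscore first if a
  // separator run preceded it and the output is not empty.
  auto emit = [&](char c) {
    if (pending_sep && !out.empty()) out += '_';
    out += c;
    pending_sep = false;
  };

  for (std::string::size_type i = 0; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) {  // plain ASCII byte
      if (c >= 'A' && c <= 'Z') {
        emit(static_cast<char>(c - 'A' + 'a'));
      } else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
        emit(static_cast<char>(c));
      } else {
        pending_sep = true;
      }
      continue;
    }
    // Lead byte 0xC3 starts the two-byte encodings of U+00C0..U+00FF.
    // Assumed, deliberately incomplete table of lowercase accented letters.
    if (c == 0xC3 && i + 1 < s.size()) {
      char base = 0;
      switch (static_cast<unsigned char>(s[i + 1])) {
        case 0xA0: case 0xA1: base = 'a'; break;  // U+00E0, U+00E1
        case 0xA8: case 0xA9: base = 'e'; break;  // U+00E8, U+00E9
        case 0xAD: base = 'i'; break;             // U+00ED
        case 0xB1: base = 'n'; break;             // U+00F1
        case 0xB3: base = 'o'; break;             // U+00F3
        case 0xBA: base = 'u'; break;             // U+00FA
        default: break;
      }
      if (base != 0) {
        emit(base);
        ++i;  // skip the continuation byte that was just consumed
        continue;
      }
    }
    // Every other non-ASCII byte acts as a separator, so unmapped
    // characters disappear from the result.
    pending_sep = true;
  }
  return out;
}
```

With this sketch, the German input "Gau\xc3\x9f" becomes "gau" because U+00DF is not in the table, and the Japanese input keeps only "konnichiwa". When copying the R test strings into C++, note that a C++ hex escape consumes every following hex digit, so a literal such as "\xc2\xbfC" has to be split into adjacent literals ("\xc2\xbf" "C").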