Kevin Ushey, written Jan 8, 2013
This is a quick example of how you might use Rcpp to pass R ‘strings’ (character vectors) to C++ and return them to R. We’ll demonstrate this with a few operations.
The first operation sorts the characters within each string of a character vector. Note that we can do this in R in a fairly fast way:
[1] "aelpps" "adn" "abceeinrrrs"
Let’s see if we can re-create the output with Rcpp.
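A minimal sketch of what such a function might look like is given below. It is written against the C++ standard library only, so that it compiles without R. The name cpp_str_sort matches the one used in the benchmark output; the body is an illustration. It also assumes that, in an Rcpp build, adding `#include <Rcpp.h>` and an Rcpp export attribute above the function is enough to call it from R.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch only. In an Rcpp build you would also include <Rcpp.h> and mark the
// function with the Rcpp export attribute, so that an R character vector is
// converted to and from std::vector<std::string> automatically.
std::vector<std::string> cpp_str_sort(std::vector<std::string> strings) {
    // The argument is taken by value, so we are sorting our own copy.
    for (std::string& s : strings) {
        // std::sort reorders the characters (bytes) of s in place.
        std::sort(s.begin(), s.end());
    }
    return strings;
}
```

Called on the strings "apples", "and" and "cranberries" (an assumed input, chosen because it is consistent with the R output shown earlier), this returns "aelpps", "adn" and "abceeinrrrs".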
Note the main things we do here:
- Rcpp handles the as-ing and wrap-ing of vectors for us; we even just specify our return type as std::vector< std::string >.
- We call std::sort, a function that returns void and sorts a string in place.

Now, let’s test it, and let’s benchmark it as well.
[1] "aelpps" "adn" "abceeinrrrs"
                        test replications elapsed relative user.self sys.self user.child sys.child
1 cpp_str_sort(long_strings)            3   0.898    1.000     0.883    0.014          0         0
2   R_str_sort(long_strings)            3   2.356    2.624     2.350    0.007          0         0
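Note that the C++ implementation is quite a bit faster (on my machine): about 2.6 times faster by elapsed time in the run above. However, std::sort compares individual bytes, so it will not correctly handle UTF-8 encoded strings that contain multi-byte characters.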
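To see why, here is a small sketch that sorts a string byte by byte, the same way as above. The helper name sort_bytes and the constant accented are hypothetical, introduced only for this illustration. The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9; after sorting, the continuation byte 0xA9 ends up before the lead byte 0xC3, so the result is no longer valid UTF-8.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: sorts the raw bytes of s, which is what std::sort
// does when it is handed a std::string's iterators.
std::string sort_bytes(std::string s) {
    std::sort(s.begin(), s.end());
    return s;
}

// "\xC3\xA9" is the UTF-8 encoding of U+00E9 (e with acute accent).
// The literal is split in two so that the trailing 'a' is not read as
// part of the preceding hex escape.
const std::string accented = "\xC3\xA9" "a";
```

Whether char is signed or unsigned on a given platform, 0xA9 compares less than 0xC3, so sort_bytes(accented) always separates or reverses the two bytes that made up the accented character.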
Now, let’s do something crazy – let’s see if we can use Rcpp to perform an operation that takes a vector of strings, and returns a list of vectors of strings. (Or, in R parlance, a list of vectors of type character).
We’ll do a simple ‘split’, such that each string is split every n indices.
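A minimal sketch of such a split follows, again using only the standard library so that it compiles without R. The function name cpp_str_split is an assumption (chosen by analogy with cpp_str_sort), and std::vector<std::vector<std::string>> stands in for the Rcpp List that an Rcpp version would return. The guard against n == 0 is also an addition for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch only: a vector of string vectors stands in for an Rcpp List,
// with one inner vector per input string.
std::vector<std::vector<std::string>> cpp_str_split(
        const std::vector<std::string>& strings, std::size_t n) {
    std::size_t num_strings = strings.size();
    std::vector<std::vector<std::string>> out(num_strings);
    if (n == 0) return out;  // guard added here: avoid dividing by zero

    for (std::size_t i = 0; i < num_strings; ++i) {
        // Integer division: only complete chunks of n characters are kept.
        std::size_t num_substr = strings[i].length() / n;
        std::vector<std::string> tmp;
        for (std::size_t j = 0; j < num_substr; ++j) {
            tmp.push_back(strings[i].substr(j * n, n));
        }
        out[i] = tmp;
    }
    return out;
}
```

With the assumed inputs "abcd", "efgh", "ijkl" and n = 2, each string is split into two pieces of two characters. With "abc" and "de", the leftover "c" is dropped, because only complete chunks are counted.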
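Main things to notice:
- We use Rcpp’s List type.
- We initialize a List container of size num_strings.
- Rcpp handles the conversion for us (with out[i] = tmp, we can assign our vector of strings directly as an element of the list).

The result is a list with one character vector per input string:

[[1]]
[1] "ab" "cd"

[[2]]
[1] "ef" "gh"

[[3]]
[1] "ij" "kl"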

[[1]]
[1] "ab"

[[2]]
[1] "de"
My solution is perhaps a bit deficient (bug or feature?) in that it truncates any string whose length is not a multiple of n: the leftover characters are dropped. Ideally, we’d either improve the C++ code or form an appropriate wrapper to the function in R (and warn the user if truncation might occur).
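One way the C++ side could be improved is sketched below: step through each string n characters at a time and keep the final, shorter piece rather than dropping it. The function name str_split_keep_tail is hypothetical, and as before a vector of string vectors stands in for the Rcpp List so that the code compiles without R.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: splits every n characters and keeps the final,
// shorter chunk instead of discarding it.
std::vector<std::vector<std::string>> str_split_keep_tail(
        const std::vector<std::string>& strings, std::size_t n) {
    std::vector<std::vector<std::string>> out(strings.size());
    if (n == 0) return out;  // nothing sensible to do for a zero chunk size

    for (std::size_t i = 0; i < strings.size(); ++i) {
        const std::string& s = strings[i];
        // substr clamps the requested length at the end of the string,
        // so the last chunk may hold fewer than n characters.
        for (std::size_t pos = 0; pos < s.length(); pos += n) {
            out[i].push_back(s.substr(pos, n));
        }
    }
    return out;
}
```

With the assumed input "abc" and n = 2, this yields "ab" and "c" rather than "ab" alone. Whether a short final piece is desirable depends on the use case, which is why a warning from an R wrapper remains a reasonable alternative.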
Hopefully this gives you a better idea how you might use Rcpp to perform more extensive string manipulation with R character vectors.