Using Rcpp with Boost.Regex for regular expressions
Dirk Eddelbuettel —
written Mar 1, 2013 —
updated Mar 4, 2018 —
Gabor asked about using Rcpp with regular expression libraries. This post shows a very simple example, based on one of the Boost.Regex examples.
There is one big difference between this example and other Boost examples, such as those using the header-only BH package: here we need to set linker options, because Boost.Regex requires its compiled library. Similar restrictions apply to the Boost System library, Boost Filesystem and a few other Boost libraries.
Now, if your computer has these libraries installed (as is common on Linux or macOS), this can be as simple as
Sys.setenv("PKG_LIBS" = "-lboost_regex")
provided the corresponding library libboost_regex
is indeed in one of the system library directories.
If so, the following example can be built, e.g. with Rcpp::sourceCpp():
// cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
#include <Rcpp.h>
#include <string>
#include <vector>
#include <boost/regex.hpp>

// Valid format: four groups of four digits, the first three each followed by '-' or ' '
bool validate_card_format(const std::string& s) {
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return boost::regex_match(s, e);   // the whole string must match
}

// Captures the four digit groups; separators are optional, the first group may have 3 or 4 digits
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");    // groups concatenated
const std::string human_format("\\1-\\2-\\3-\\4");   // groups joined by '-'

std::string machine_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
// [[Rcpp::export]]
Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
    int n = s.size();
    std::vector<bool> valid(n);
    std::vector<std::string> machine(n);
    std::vector<std::string> human(n);
    for (int i = 0; i < n; i++) {
        valid[i]   = validate_card_format(s[i]);
        machine[i] = machine_readable_card_number(s[i]);
        human[i]   = human_readable_card_number(s[i]);
    }
    return Rcpp::DataFrame::create(Rcpp::Named("input")   = s,
                                   Rcpp::Named("valid")   = valid,
                                   Rcpp::Named("machine") = machine,
                                   Rcpp::Named("human")   = human);
}
We can test regexDemo() using the same input as the Boost example:
s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
regexDemo(s)
input valid machine human
1 0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333 TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333 TRUE 0000111122223333 0000-1111-2222-3333
4 000-1111-2222-3333 FALSE 000111122223333 000-1111-2222-3333