Introduction to RcppNT2

Kevin Ushey — written Feb 1, 2016

Modern CPUs are built with extended instruction sets that optimize certain operations. One class of these, called Single Instruction / Multiple Data (SIMD) instructions, allows for vectorized operations. Although modern compilers will use these instructions when possible, they are often unable to prove that a particular block of code can safely be executed using SIMD instructions.
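
One common reason a compiler holds back, offered here as an illustration rather than something the article itself states, is that floating-point addition is not associative, so reordering a sum across SIMD lanes can change the result. The sketch below uses two hypothetical helpers, `sequential_sum` and `two_lane_sum`, written in plain C++ with no SIMD intrinsics; the second only mimics the order in which a two-lane reduction would add the values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain left-to-right sum: the order a scalar loop specifies.
double sequential_sum(const std::vector<double>& x) {
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) total += x[i];
  return total;
}

// Sum with two interleaved accumulators, mimicking the order a
// two-lane SIMD reduction would use (hypothetical helper).
double two_lane_sum(const std::vector<double>& x) {
  double lane0 = 0.0, lane1 = 0.0;
  std::size_t i = 0;
  for (; i + 1 < x.size(); i += 2) {
    lane0 += x[i];      // even-indexed elements
    lane1 += x[i + 1];  // odd-indexed elements
  }
  if (i < x.size()) lane0 += x[i];  // odd leftover element
  return lane0 + lane1;
}
```

With IEEE 754 doubles and no fast-math flags, the input `{1e17, 1.0, -1e17, 1.0}` sums to 1 sequentially (the first 1.0 is absorbed by 1e17) but to 2 in the two-lane order. A compiler that must preserve the scalar result therefore cannot reorder the loop on its own, which is why requesting SIMD explicitly can help.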

The Numerical Template Toolbox (NT2) is a collection of header-only C++ libraries that make it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. NT2 itself is powered by Boost, alongside two proposed Boost libraries – Boost.Dispatch, which provides a mechanism for efficient tag-based dispatch for functions, and Boost.SIMD, which provides a framework for the implementation of algorithms that take advantage of SIMD instructions. RcppNT2 wraps and exposes these libraries for use with R.

The primary abstraction that Boost.SIMD uses under the hood is the boost::simd::pack<> data structure. This type represents a small, contiguous pack of numeric values (e.g. doubles), and comes with a host of functions that facilitate the use of SIMD operations on those values when possible. Although you don’t need to know the details to use the high-level functionality provided by Boost.SIMD, it’s useful for understanding what happens behind the scenes.

Here’s a quick example of how we might compute the sum of elements in a vector, using NT2.

// [[Rcpp::depends(RcppNT2)]]
#include <RcppNT2.h>
using namespace RcppNT2;

#include <Rcpp.h>
using namespace Rcpp;

// Define a functor -- a C++ class which defines a templated
// 'function call' operator -- to perform the addition of 
// two pieces of data.
struct add_two {
  template <typename T>
  T operator()(const T& lhs, const T& rhs) {
    return lhs + rhs;
  }
};

// [[Rcpp::export]]
double simd_sum(NumericVector x) {
  // Pass the functor to 'simdReduce()'. This is an
  // algorithm provided by RcppNT2, which makes it
  // easy to apply nt2-style functor definitions
  // across a range of data.
  return simdReduce(x.begin(), x.end(), 0.0, add_two());
}

Behind the scenes, simdReduce() takes care of iteration over the provided sequence, and ensures that we use optimized SIMD instructions over packs of numbers when possible, and scalar instructions when not. By passing a templated functor, simdReduce() can automatically choose the correct template specialization depending on whether it’s working with a pack or not. In other words, two template specializations will be generated in this case: one with T = double, and another with T = boost::simd::pack<double>.
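
To make the two instantiations concrete, here is a minimal, self-contained sketch. It does not use RcppNT2 or Boost.SIMD: `ToyPack`, `kWidth`, and `toy_reduce` are hypothetical stand-ins invented for illustration, and the real simdReduce() may differ in its details (alignment handling, for example). The point is only that one templated functor is called with both a pack-like type and a plain double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count; a real pack's width depends on the hardware.
constexpr std::size_t kWidth = 4;

// Hypothetical stand-in for boost::simd::pack<double>: a few doubles
// that are added element-wise. A real pack maps onto a SIMD register.
struct ToyPack {
  double v[kWidth];

  // Load kWidth contiguous values starting at 'p'.
  static ToyPack load(const double* p) {
    ToyPack out;
    for (std::size_t i = 0; i < kWidth; ++i) out.v[i] = p[i];
    return out;
  }
};

inline ToyPack operator+(const ToyPack& a, const ToyPack& b) {
  ToyPack out;
  for (std::size_t i = 0; i < kWidth; ++i) out.v[i] = a.v[i] + b.v[i];
  return out;
}

// Same shape as the article's functor.
struct add_two {
  template <typename T>
  T operator()(const T& lhs, const T& rhs) const {
    return lhs + rhs;
  }
};

// Toy reduction: full packs first, then the scalar remainder.
template <typename F>
double toy_reduce(const double* begin, const double* end, double init, F f) {
  assert(begin <= end);
  const std::size_t n = static_cast<std::size_t>(end - begin);
  const std::size_t n_full = n - n % kWidth;

  double result = init;
  if (n_full > 0) {
    // Pack phase: 'f' is instantiated with T = ToyPack.
    ToyPack acc = ToyPack::load(begin);
    for (std::size_t i = kWidth; i < n_full; i += kWidth)
      acc = f(acc, ToyPack::load(begin + i));
    // Fold the lanes into the scalar result: T = double.
    for (std::size_t j = 0; j < kWidth; ++j)
      result = f(result, acc.v[j]);
  }
  // Scalar tail for the elements that do not fill a pack: T = double.
  for (std::size_t i = n_full; i < n; ++i)
    result = f(result, begin[i]);
  return result;
}
```

Calling `toy_reduce(x.data(), x.data() + x.size(), 0.0, add_two())` on a `std::vector<double>` holding 1 through 10 processes two full packs, folds the four lanes, and then adds the last two elements one at a time, giving 55.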

Let’s confirm that this produces the correct output, and run a small benchmark.

# helper function for printing microbenchmark output
printBm <- function(bm) {
  summary <- summary(bm)
  print(summary[, 1:7], row.names = FALSE)
}

# generate some data
data <- rnorm(1024 * 1000)

# verify that it produces the correct sum
all.equal(simd_sum(data), sum(data))
[1] TRUE
# compare results
library(microbenchmark)
bm <- microbenchmark(sum(data), simd_sum(data))
printBm(bm)
           expr     min       lq      mean    median       uq      max
      sum(data) 894.451 943.4145 1033.5598 1020.5000 1071.327 1429.533
 simd_sum(data) 280.585 293.6315  316.6797  307.8795  314.429  574.050

We get a noticeable gain by taking advantage of SIMD instructions here: the median time for simd_sum() is roughly 3.3 times lower than for sum(). However, it’s worth noting that we don’t handle NA and NaN with the same granularity as R.
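
As a rough illustration of why the distinction gets lost, the sketch below assumes (from my understanding of R's internals, not from this article) that R stores NA_real_ as a NaN whose low 32 bits hold the payload 1954. The helpers `make_r_na` and `is_r_na` are hypothetical and written in plain C++; they are not part of Rcpp or RcppNT2.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Build a double with the bit pattern assumed for R's NA_real_:
// a NaN whose low 32 bits hold the payload 1954.
double make_r_na() {
  const std::uint64_t bits = 0x7FF00000000007A2ULL;  // 0x7A2 == 1954
  double out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Recognize that pattern: a NaN carrying 1954 in its low 32 bits.
bool is_r_na(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return std::isnan(x) && (bits & 0xFFFFFFFFULL) == 1954;
}
```

To ordinary floating-point arithmetic, such a value is just another NaN: `std::isnan(make_r_na())` is true, and adding to it yields a NaN. A functor like add_two never inspects the payload, so any NA-versus-NaN bookkeeping would need an explicit check along the lines of `is_r_na()`.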

Learning More

This article provides just a taste of how RcppNT2 can be used. If you’re interested in learning more, please check out the RcppNT2 website.

tags: simd  parallel 
