Kevin Ushey, written Feb 1, 2016
Modern CPUs provide extended instruction sets that optimize certain operations. One class of these, the Single Instruction / Multiple Data (SIMD) instructions, allows vectorized operations: a single instruction is applied to several data elements at once. Although modern compilers will use these instructions when possible, they are often unable to prove that a particular block of code can safely be executed using SIMD instructions.
The Numerical Template Toolbox (NT2) is a collection of header-only C++ libraries that make it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. NT2 itself is powered by Boost, alongside two proposed Boost libraries: Boost.Dispatch, which provides a mechanism for efficient tag-based dispatch for functions, and Boost.SIMD, which provides a framework for the implementation of algorithms that take advantage of SIMD instructions. RcppNT2 wraps and exposes these libraries for use with R.
The primary abstraction that Boost.SIMD uses under the hood is the boost::simd::pack<> data structure. It represents a small, contiguous pack of arithmetic values (e.g. doubles), and comes with a host of functions that facilitate the use of SIMD operations on those values when possible. Although you don’t need to know the details to use the high-level functionality provided by Boost.SIMD, it’s useful for understanding what happens behind the scenes.
Here’s a quick example of how we might compute the sum of elements in a vector, using NT2.
Behind the scenes, simdReduce() takes care of iteration over the provided sequence, and ensures that we use optimized SIMD instructions over packs of numbers when possible, and scalar instructions when not. By passing a templated functor, simdReduce() can automatically choose the correct template specialization depending on whether it’s working with a pack or not. In other words, two template specializations will be generated in this case: one with T = double, and another with T = boost::simd::pack<double>.
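To make the dispatch concrete, here is a minimal, self-contained sketch in plain C++. It does not use RcppNT2 or Boost.SIMD: ToyPack, kToyWidth, AddTwo, toyReduce() and toySum() are hypothetical names invented for illustration, the lane count of 4 is an arbitrary assumption, and the loop is only one plausible way such a reduction might be organized, not the library's actual implementation. The point is that a single functor with a templated call operator gets instantiated once for the pack type and once for double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed lane count for the toy pack; real hardware and libraries decide this.
constexpr std::size_t kToyWidth = 4;

// Hypothetical stand-in for boost::simd::pack<double>. NOT the Boost.SIMD API:
// it uses ordinary loops, so it only models the shape of pack-based code.
struct ToyPack {
    std::array<double, kToyWidth> lanes;

    // Copy kToyWidth contiguous doubles starting at src.
    static ToyPack load(const double* src) {
        ToyPack p{};
        for (std::size_t i = 0; i < kToyWidth; ++i) p.lanes[i] = src[i];
        return p;
    }

    // Element-wise addition over all lanes.
    friend ToyPack operator+(const ToyPack& a, const ToyPack& b) {
        ToyPack r{};
        for (std::size_t i = 0; i < kToyWidth; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }
};

// Functor with a templated call operator: the same body serves both
// T = double and T = ToyPack.
struct AddTwo {
    template <typename T>
    T operator()(const T& lhs, const T& rhs) const { return lhs + rhs; }
};

// Hypothetical reduction loop: full packs first, then a scalar fold of the
// lanes, then a scalar pass over the leftover tail elements.
template <typename F>
double toyReduce(const double* begin, const double* end, double init, F f) {
    assert(begin <= end);
    const std::size_t n = static_cast<std::size_t>(end - begin);
    const std::size_t packed = (n / kToyWidth) * kToyWidth;  // elements in full packs
    double result = init;

    if (packed > 0) {
        ToyPack acc = ToyPack::load(begin);
        for (std::size_t i = kToyWidth; i < packed; i += kToyWidth)
            acc = f(acc, ToyPack::load(begin + i));  // instantiates T = ToyPack
        for (double lane : acc.lanes)
            result = f(result, lane);                // instantiates T = double
    }
    for (std::size_t i = packed; i < n; ++i)
        result = f(result, begin[i]);                // scalar tail, T = double
    return result;
}

// Convenience wrapper: sum a vector through the toy reduction.
double toySum(const std::vector<double>& x) {
    return toyReduce(x.data(), x.data() + x.size(), 0.0, AddTwo());
}
```

Calling toySum() on a ten-element vector processes two full packs and then two tail elements, so both instantiations of AddTwo are exercised. Note that this sketch changes the order in which elements are added, which can alter floating-point rounding relative to a strict left-to-right sum.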
Let’s confirm that this produces the correct output, and run a small benchmark.
[1] TRUE
           expr     min       lq      mean    median       uq      max
      sum(data) 894.451 943.4145 1033.5598 1020.5000 1071.327 1429.533
 simd_sum(data) 280.585 293.6315  316.6797  307.8795  314.429  574.050
We get a noticeable gain by taking advantage of SIMD instructions here: the median timing drops from about 1020 to about 308, roughly a 3.3x speedup. However, it’s worth noting that we don’t handle NA and NaN with the same granularity as R.
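The caveat about NA and NaN can be illustrated without any SIMD code. The sketch below is plain C++ and independent of RcppNT2; makeToyNA(), isToyNA() and plainSum() are hypothetical helpers. It assumes that R encodes its numeric NA as a NaN whose low 32 bits hold the value 1954, which matches R's C sources as far as we know but should be treated as an assumption here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Build a double with the bit pattern R is assumed to use for a numeric NA:
// all exponent bits set, and the low 32 bits equal to 1954 (0x7A2).
double makeToyNA() {
    const std::uint64_t bits = 0x7FF00000000007A2ULL;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Distinguish the assumed NA pattern from any other NaN by checking the low word.
bool isToyNA(double x) {
    if (!std::isnan(x)) return false;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits & 0xFFFFFFFFULL) == 1954;
}

// A sum built only from operator+, with no special handling of NA or NaN.
double plainSum(const double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += x[i];
    return total;
}
```

Both the assumed NA pattern and an ordinary NaN satisfy std::isnan(), and a reduction built only from + treats them alike: the result is simply some NaN. Reporting NA versus NaN the way R does would require extra checks on top of the arithmetic, which this sketch deliberately leaves out.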
This article provides just a taste of how RcppNT2 can be used. If you’re interested in learning more, please check out the RcppNT2 website.