JJ Allaire, written Jun 29, 2014
The RcppParallel package includes
high-level functions for doing parallel programming with Rcpp. For example, the
parallelReduce function can be used to aggregate values from a set of
inputs in parallel. This article describes using RcppParallel to sum an R
vector.
First, a serial version of computing the sum of a vector. For this we use
a simple call to the STL std::accumulate function.
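A minimal sketch of such a serial sum, written as standalone C++ so that it compiles without R. The name vectorSum matches the function timed in the benchmark near the end of this article; the std::vector<double> parameter is an assumption standing in for the R numeric vector that an Rcpp function would receive.

```cpp
#include <numeric>
#include <vector>

// Serial sum of a vector using std::accumulate.
// Assumption: std::vector<double> stands in for the R numeric vector.
double vectorSum(const std::vector<double>& x) {
    // The 0.0 initial value makes the accumulation happen in double.
    return std::accumulate(x.begin(), x.end(), 0.0);
}
```

Calling vectorSum on a vector holding 1.0, 2.0 and 3.5 returns 6.5.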
Now we adapt our code to run in parallel. We’ll use the parallelReduce
function to do this. As with the parallelFor function,
we implement a “Worker” function object with our logic and RcppParallel takes
care of scheduling work on threads and calling our function when required. For
parallelReduce the function object has three jobs:
1. Implement a standard and a “splitting” constructor. The standard constructor takes a pointer to the array that will be traversed and sets its sum variable to 0. The splitting constructor is called when work needs to be split onto other threads; it takes a reference to the instance it is being split from, copies the pointer to the input array, and sets its internal sum to 0.
2. Implement an operator() to perform the summing. Here we just call std::accumulate as we did in the serial version, but limit the accumulation to the items specified by the begin and end arguments (note that other threads will have been given the task of processing other items in the input array). We save the accumulated value in our value member variable.
3. Finally, we implement a join method which composes the operations of two Sum instances that were previously split. Here we simply add the accumulated sum of the instance we are being joined with to our own.
Here’s the definition of the
Sum function object:
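What follows is a standalone sketch of a function object with those three parts, not the package's own listing. It uses a plain pointer where the article uses RVector<double>, and it does not derive from RcppParallel::Worker, so it compiles without R. The Split tag type declared here is a hypothetical stand-in used only to mark the splitting constructor, and treating begin and end as a half-open range is an assumption of the sketch.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical stand-in for the tag type that marks the splitting constructor.
struct Split {};

struct Sum {
    const double* input;  // pointer to the array being traversed (not owned)
    double value;         // sum accumulated by this instance

    // Standard constructor: remember the input and start from 0.
    explicit Sum(const double* input) : input(input), value(0.0) {}

    // Splitting constructor: share the input pointer, start a fresh sum.
    Sum(const Sum& other, Split) : input(other.input), value(0.0) {}

    // Accumulate only the items in [begin, end).
    void operator()(std::size_t begin, std::size_t end) {
        value += std::accumulate(input + begin, input + end, 0.0);
    }

    // Fold in the sum of an instance that was split from this one.
    void join(const Sum& rhs) { value += rhs.value; }
};
```

Used by hand, one instance can process items 0 to 1 while a split copy processes items 2 to 4; calling join on the first instance then leaves the total in its value member.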
Sum derives from the
RcppParallel::Worker class. This is
required for function objects passed to parallelReduce.
Note also that we use the
RVector<double> type for accessing the vector.
This is because this code will execute on a background thread where it’s not
safe to call R or Rcpp APIs. The
RVector class is included in the
RcppParallel package and provides a lightweight, thread-safe wrapper around R
vectors.
Now that we’ve defined the functor, implementing the parallel sum
function is straightforward. Just initialize an instance of Sum
with the input vector and call parallelReduce with it.
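To make the split, run and join sequence concrete, here is a self-contained sketch that replaces the package's scheduler with a deliberately simple one. reduceInTwoHalves is a hypothetical helper, not RcppParallel's API: it always splits the range exactly once and runs the second half on a std::thread, whereas parallelReduce decides for itself how to partition and schedule the work. The Split tag and the pointer-based Sum are likewise stand-ins so the example compiles without R.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

struct Split {};  // hypothetical tag for the splitting constructor

// Compact worker with the three parts described earlier.
struct Sum {
    const double* input;
    double value;
    explicit Sum(const double* input) : input(input), value(0.0) {}
    Sum(const Sum& other, Split) : input(other.input), value(0.0) {}
    void operator()(std::size_t begin, std::size_t end) {
        value += std::accumulate(input + begin, input + end, 0.0);
    }
    void join(const Sum& rhs) { value += rhs.value; }
};

// Hypothetical two-way reducer: split once, run both halves, then join.
template <typename Worker>
void reduceInTwoHalves(std::size_t begin, std::size_t end, Worker& worker) {
    const std::size_t mid = begin + (end - begin) / 2;
    Worker other(worker, Split());                           // splitting constructor
    std::thread t([&other, mid, end] { other(mid, end); });  // second half on a new thread
    worker(begin, mid);                                      // first half on the calling thread
    t.join();                                                // wait before reading other's result
    worker.join(other);                                      // combine the partial sums
}

// Initialize a Sum with the input and hand it to the reducer.
double parallelVectorSum(const std::vector<double>& x) {
    Sum sum(x.data());
    reduceInTwoHalves(0, x.size(), sum);
    return sum.value;
}
```

Because the partial sums are added in a different order than in the serial version, results for general floating-point input can differ from the serial sum in the last few bits.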
A comparison of the performance of the two functions shows the parallel version running about 3.6 times as fast as the serial version on a machine with 4 cores:
                  test replications elapsed relative
2 parallelVectorSum(v)          100   0.248    1.000
1         vectorSum(v)          100   0.894    3.605
You can learn more about using RcppParallel at https://rcppcore.github.com/RcppParallel.