Ross Bennett — written Jan 6, 2013 — source
First, let us consider a running sum function in pure R. To get started,
I looked at the source code of the TTR package to see the algorithm
runSum function uses a Fortran routine to compute
the running/rolling sum of a vector. The
run_sum_R function below is
my interpretation of that algorithm implemented in R.
Many thanks to the package author, Joshua Ulrich, for pointing out to me
that, in many cases, the algorithm is more critical to performance than
test replications elapsed relative 1 run_sum_R(x, 50) 100 3.364 1.007 2 run_sum_R(x, 100) 100 3.339 1.000 3 run_sum_R(x, 150) 100 3.390 1.015 4 run_sum_R(x, 200) 100 3.590 1.075
For these benchmarks, I will just focus on the performance of the functions
for a fixed
x and varying the value of
n. The results of the benchmark
run_sum_R show that the elapsed time is fairly constant for the
given values of
n (i.e. O(1)).
Now let us consider a running sum function in C++, call it
One approach is to loop through each element of the given vector
calling std::accumulate to compute the running sum.
test replications elapsed relative 1 run_sum_v1(x, 50) 100 0.045 1.000 2 run_sum_v1(x, 100) 100 0.088 1.956 3 run_sum_v1(x, 150) 100 0.128 2.844 4 run_sum_v1(x, 200) 100 0.170 3.778
Although the elapsed times of
run_sum_v1 are quite fast, note that the
time increases approximately linearly as
n increases (i.e. O(N)). This
will become a problem if we use this function with large values of
Now let us write another running sum function in C++ that uses
the same algorithm that is used in
run_sum_R, call it
test replications elapsed relative 1 run_sum_v2(x, 50) 100 0.007 1 2 run_sum_v2(x, 100) 100 0.007 1 3 run_sum_v2(x, 150) 100 0.007 1 4 run_sum_v2(x, 200) 100 0.007 1
The benchmark results of
run_sum_v2 are quite fast and much more
favorable than both
run_sum_v1. The elapsed time is
approximately constant across the given values of
n (i.e O(N)).
Finally, let us benchmark all three functions as well as
the TTR package for a point of reference using larger values for the
test replications elapsed relative 3 run_sum_v2(y, 4500) 100 0.082 1.00 1 runSum(y, 4500) 100 0.889 10.84 4 run_sum_R(y, 4500) 100 33.717 411.18 2 run_sum_v1(y, 4500) 100 37.538 457.78
An interesting result of benchmarking with these larger values is
run_sum_R is faster than
run_sum_v1 for the given values.
This example demonstrates that it is not always the case that C++ code
is faster than R code. The inefficiency of
run_sum_v1 is due to having
std::accumulate inside the for loop. For a vector of size 100,000 and
n = 5000,
std::accumulate is called 95,001 times!
This is obviously not an “apples-to-apples” comparison because a different algorithm is used, but the point of the example is to demonstrate the importance of the algorithm regardless of the programming language.
It should be noted that
runSum does some extra
work in R such as checking for a valid
n, non-leading NAs, etc.
and should be considered when comparing the benchmark results of