Joshua French — written Jun 6, 2020 — source
Sometimes one needs to mimic the exact behavior of R’s Distributions
within C++
code. The incredible Rcpp team has provided access to these distributions through
Rmath.h
(in the R::
namespace), as well as through the Rcpp::
namespace where there
can be two forms: scalar as in R, and vectorized via Rcpp sugar. The behavior of these
functions may not always exactly match what the user expects from the standard R behavior,
particularly if attempting to use the functions in Rmath.h
. In particular, the functions
in Rmath.h
are not vectorized. In what follows, I will use Rcpp to mimic the
behavior of both the rmultinom
and rpois
functions available in base R so that this
functionality and behavior is provided in native C++.
The multinomial distribution
generalizes the binomial distribution to k
discrete outcomes instead of 2; consequently,
it is parameterized in terms of k
probabilities that must sum to 1. The base R function
rmultinom
used for generating multinomial data takes three arguments: n
the number of
simulated data sets to produce, size
, the number of multinomial outcomes to sample for
each data set, and prob
a numeric vector of probabilities. The function returns a k
$\times$ n
integer matrix.
The following C++ code uses the R::rmultinom
function available in Rmath.h
to generate
size
multinomial outcomes. The R::rmultinom
function relies on referencing a pointer
to an IntegerVector
to store the results. We create a helper function, rmultinom_1
,
that draws size
multinomial outcomes from the multinomial distribution based on the
probabilities in prob
. We then do this n
independent times in the function
rmultinom_rcpp
. To match the base R functionality, rmultinom_rcpp
returns a k
$\times$ n
IntegerMatrix
.
#include <Rcpp.h>
using namespace Rcpp;
IntegerVector rmultinom_1(unsigned int &size, NumericVector &probs, unsigned int &N) {
IntegerVector outcome(N);
rmultinom(size, probs.begin(), N, outcome.begin());
return outcome;
}
// [[Rcpp::export]]
IntegerMatrix rmultinom_rcpp(unsigned int &n, unsigned int &size, NumericVector &probs) {
unsigned int N = probs.length();
IntegerMatrix sim(N, n);
for (unsigned int i = 0; i < n; i++) {
sim(_,i) = rmultinom_1(size, probs, N);
}
return sim;
}
We now check if the rmultinom
and rmultinom_rcpp
functions produce the same results.
We generate a vector of 200 probabilities that sum to 1. We will sample 500 multinomial
outcomes and do this independently 20 times.
prob <- runif(200)
prob <- prob/sum(prob) # standardize the probabilities
size <- 500
n <- 20
set.seed(10)
sim_r <- rmultinom(n, size, prob)
set.seed(10)
sim_rcpp <- rmultinom_rcpp(n, size, prob)
all.equal(sim_r, sim_rcpp)
[1] TRUE
A benchmark of the functions suggests that the rmultinom_rcpp
function is very slightly
slower than the rmultinom
function, but that is not really a concern for our purposes.
microbenchmark::microbenchmark(
rmultinom(1000, size, prob),
rmultinom_rcpp(1000, size, prob)
)
Unit: milliseconds expr min lq mean median uq max neval cld rmultinom(1000, size, prob) 10.9042 11.1841 11.7729 11.6485 12.1532 14.1841 100 a rmultinom_rcpp(1000, size, prob) 11.1452 11.3780 12.0209 11.8841 12.2702 14.9434 100 b
The poisson distribution is a
non-negative discrete distribution characterized by having identical mean and
variance. The base R function rpois
used for generating Poisson data takes two
arguments: n
the number of simulated values to produce, and lambda
, a positive numeric
vector. The rpois
function cycles (and recycles) through the values in lambda
for each
successive value simulated. The function produces an integer vector of length n
. We
provide similar functionality using the R::rpois
function available in Rmath.h
. Note
that we cycle through the values of lambda
so that if the end of the lambda
vector is
reached before we have generated n
values, then we restart at the beginning of the
lambda
vector.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector rpois_rcpp(unsigned int &n, NumericVector &lambda) {
unsigned int lambda_i = 0;
IntegerVector sim(n);
for (unsigned int i = 0; i < n; i++) {
sim[i] = R::rpois(lambda[lambda_i]);
// update lambda_i to match next realized value with correct mean
lambda_i++;
// restart lambda_i at 0 if end of lambda reached
if (lambda_i == lambda.length()) {
lambda_i = 0;
}
}
return sim;
}
We now evaluate whether the rpois
and rpois
functions produce the same results. We
generate a positive vector with 200 values for lambda
and draw length(lambda) + 5
independent Poisson values.
lambda <- runif(200, 0.5, 3)
set.seed(10)
pois_sim_r <- rpois(length(lambda) + 5, lambda)
set.seed(10)
pois_sim_rcpp <- rpois_rcpp(length(lambda) + 5, lambda)
all.equal(pois_sim_r, pois_sim_rcpp)
[1] TRUE
A benchmark of the two functions suggests the rpois_rcpp
function may be slightly faster,
but once again, that is not our primary concern here.
microbenchmark::microbenchmark(
rpois(length(lambda) + 5, lambda),
rpois_rcpp(length(lambda) + 5, lambda)
)
Unit: microseconds expr min lq mean median uq max neval cld rpois(length(lambda) + 5, lambda) 7.455 7.7825 8.02154 7.909 8.2425 11.145 100 b rpois_rcpp(length(lambda) + 5, lambda) 6.737 6.9860 7.31607 7.182 7.4515 16.328 100 a
tags: basics
Tweet