Joshua French, written Jun 6, 2020
Sometimes one needs to mimic the exact behavior of R's distribution functions within C++ code. The incredible Rcpp team has provided access to these distributions through Rmath.h (in the R:: namespace), as well as through the Rcpp:: namespace, where they can come in two forms: scalar, as in R, and vectorized, via Rcpp sugar. The behavior of these functions may not always match what the user expects from standard R, particularly when using the functions in Rmath.h: unlike their R counterparts, they are not vectorized. In what follows, I use Rcpp to mimic the behavior of the base R functions rmultinom and rpois so that this functionality and behavior are available in native C++.
The multinomial distribution generalizes the binomial distribution to k discrete outcomes instead of 2; consequently, it is parameterized in terms of k probabilities that must sum to 1. The base R function rmultinom, used for generating multinomial data, takes three arguments: n, the number of simulated data sets to produce; size, the number of multinomial outcomes to sample for each data set; and prob, a numeric vector of probabilities. The function returns a $k \times n$ integer matrix with one column per simulated data set.
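For reference, the probability mass function of one multinomial draw, written with size trials and category probabilities $p_1, \ldots, p_k$ (the symbols $x_j$ and $p_j$ are introduced here for illustration), is below. Each column of the matrix returned by rmultinom is one such vector $(x_1, \ldots, x_k)$; setting $k = 2$ recovers the binomial distribution.

$$
P(X_1 = x_1, \ldots, X_k = x_k) = \frac{\text{size}!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k},
\qquad \sum_{j=1}^{k} x_j = \text{size}, \quad \sum_{j=1}^{k} p_j = 1.
$$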
The following C++ code uses the R::rmultinom function available in Rmath.h to generate size multinomial outcomes. R::rmultinom does not return its result; instead, it writes the counts through a pointer, so we pass it a pointer to the storage of an IntegerVector. We create a helper function, rmultinom_1, that draws size multinomial outcomes from the multinomial distribution based on the probabilities in prob. We then repeat this n independent times in the function rmultinom_rcpp. To match the base R functionality, rmultinom_rcpp returns a $k \times n$ IntegerMatrix.
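To make concrete what each column of rmultinom_rcpp contains, here is a standard-library-only sketch of the sequential conditional-binomial method, which, as far as I understand, is how R's C source generates multinomial draws: category j receives a binomial number of the trials still unassigned, with probability prob[j] divided by the probability mass not yet used. The names multinom_draw and multinom_matrix are hypothetical stand-ins for rmultinom_1 and rmultinom_rcpp. Because the sketch uses std::mt19937 and std::binomial_distribution instead of R's RNG, it will not reproduce R's numbers; it only illustrates the algorithm and the $k \times n$ column layout.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for rmultinom_1: one multinomial draw of `size`
// trials over the categories in `prob`, via sequential conditional binomials.
// Uses std::mt19937, NOT R's RNG, so it cannot reproduce R's numbers.
std::vector<int> multinom_draw(int size, const std::vector<double>& prob,
                               std::mt19937& rng) {
    assert(!prob.empty() && size >= 0);
    const std::size_t k = prob.size();
    std::vector<int> out(k, 0);
    // Remaining probability mass; starting from the sum normalizes prob.
    double p_rest = std::accumulate(prob.begin(), prob.end(), 0.0);
    assert(p_rest > 0.0);
    int n_left = size;  // trials not yet assigned to a category
    for (std::size_t j = 0; j + 1 < k && n_left > 0; ++j) {
        if (prob[j] <= 0.0) continue;  // empty category gets 0 trials
        // Probability of category j given the trial is not in 0..j-1.
        double pp = (p_rest > 0.0) ? prob[j] / p_rest : 1.0;
        if (pp >= 1.0) {  // all remaining trials land in category j
            out[j] = n_left;
            n_left = 0;
            break;
        }
        std::binomial_distribution<int> binom(n_left, pp);
        out[j] = binom(rng);
        n_left -= out[j];
        p_rest -= prob[j];
    }
    out[k - 1] = n_left;  // whatever is left goes to the last category
    return out;
}

// Hypothetical stand-in for rmultinom_rcpp: n independent draws stored as
// the columns of a k x n matrix, flattened in column-major order like R.
std::vector<int> multinom_matrix(int n, int size,
                                 const std::vector<double>& prob,
                                 std::mt19937& rng) {
    assert(n >= 0);
    const std::size_t k = prob.size();
    std::vector<int> mat(k * static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        const std::vector<int> col = multinom_draw(size, prob, rng);
        for (std::size_t j = 0; j < k; ++j) {
            mat[j + k * static_cast<std::size_t>(i)] = col[j];  // element (j, i)
        }
    }
    return mat;
}
```

Element (j, i) sits at index j + k * i, the same column-major layout R uses for matrices, and every column sums to size by construction.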
We now check whether the rmultinom and rmultinom_rcpp functions produce the same results, setting the same random seed before each call. We generate a vector of 200 probabilities that sum to 1, sample 500 multinomial outcomes, and do this independently 20 times.
[1] TRUE
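An equality check like this is meaningful only if both functions start from the same RNG state and consume random numbers in the same order; in R that presumably means calling set.seed() with the same value before each function (an assumption about how the comparison was set up). The same principle can be seen with the C++ standard library. The helper draw_stream below is hypothetical: it reseeds a std::mt19937 on every call, playing the role of set.seed(), so identical seeds give identical streams.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw n raw 32-bit values from a Mersenne Twister
// seeded with `seed`. Reseeding on every call mimics set.seed() in R.
std::vector<std::uint32_t> draw_stream(std::uint32_t seed, int n) {
    assert(n >= 0);
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> out;
    out.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        out.push_back(static_cast<std::uint32_t>(rng()));
    }
    return out;
}
```

As a sanity check, the C++ standard requires the 10000th output of a default-seeded std::mt19937 (seed 5489) to be 4123659995, so draw_stream(5489, 10000).back() returns that value on every conforming implementation.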
A benchmark drawing 1000 data sets suggests that rmultinom_rcpp is very slightly slower than rmultinom (about 2% by median time, 11.88 ms versus 11.65 ms), but that is not really a concern for our purposes.
Unit: milliseconds
expr                              min      lq       mean     median   uq       max      neval  cld
rmultinom(1000, size, prob)       10.9042  11.1841  11.7729  11.6485  12.1532  14.1841  100    a
rmultinom_rcpp(1000, size, prob)  11.1452  11.3780  12.0209  11.8841  12.2702  14.9434  100    b
The Poisson distribution is a discrete distribution on the non-negative integers whose mean and variance are equal. The base R function rpois, used for generating Poisson data, takes two arguments: n, the number of simulated values to produce, and lambda, a numeric vector of non-negative means. The rpois function cycles (and recycles) through the values in lambda for each successive value simulated, and it returns an integer vector of length n. We provide similar functionality using the R::rpois function available in Rmath.h. Note that we cycle through the values of lambda so that if the end of the lambda vector is reached before we have generated n values, we restart at the beginning of the lambda vector.
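The recycling rule can be sketched with the standard library alone. The name rpois_recycled is hypothetical, and std::poisson_distribution driven by std::mt19937 stands in for R::rpois and R's RNG, so the values will differ from R's; the point is the index arithmetic lambda[i % lambda.size()], which restarts at the beginning of lambda once its end is reached. std::poisson_distribution requires a strictly positive mean, so a zero mean is handled explicitly by returning 0, which is the only possible value for that case.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical analogue of rpois_rcpp using std::poisson_distribution.
// The i-th draw uses lambda[i % lambda.size()], so lambda is recycled
// from its first element once its end is reached.
std::vector<int> rpois_recycled(std::size_t n, const std::vector<double>& lambda,
                                std::mt19937& rng) {
    assert(!lambda.empty());
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double lam = lambda[i % lambda.size()];
        assert(lam >= 0.0);
        if (lam == 0.0) {  // std::poisson_distribution needs mean > 0
            out[i] = 0;
            continue;
        }
        std::poisson_distribution<int> pois(lam);
        out[i] = pois(rng);
    }
    return out;
}
```

With 200 values in lambda and n = length(lambda) + 5 = 205, the draws at 0-based positions 200 through 204 reuse lambda[0] through lambda[4].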
We now evaluate whether the rpois and rpois_rcpp functions produce the same results. We generate a positive vector with 200 values for lambda and draw length(lambda) + 5 independent Poisson values, so that the first five values of lambda are reused and the recycling behavior is exercised.
[1] TRUE
A benchmark of the two functions suggests that rpois_rcpp may be slightly faster (about 9% by median time, 7.18 versus 7.91 microseconds), but once again, that is not our primary concern here.
Unit: microseconds
expr                                    min    lq      mean     median  uq      max     neval  cld
rpois(length(lambda) + 5, lambda)       7.455  7.7825  8.02154  7.909   8.2425  11.145  100    b
rpois_rcpp(length(lambda) + 5, lambda)  6.737  6.9860  7.31607  7.182   7.4515  16.328  100    a
tags: basics