Joshua French — written Jun 6, 2020 — source

Sometimes one needs to mimic the exact behavior of R’s `Distributions` within C++
code. The incredible Rcpp team has provided access to these distributions through
`Rmath.h` (in the `R::` namespace), as well as through the `Rcpp::` namespace, where there
can be two forms: scalar as in R, and vectorized via Rcpp sugar. The behavior of these
functions may not always exactly match what the user expects from the standard R behavior,
particularly when using the functions in `Rmath.h`. In particular, the functions
in `Rmath.h` are not vectorized. In what follows, I will use Rcpp to mimic the
behavior of both the `rmultinom` and `rpois` functions available in base R so that this
functionality and behavior are available in native C++.

The multinomial distribution
generalizes the binomial distribution to `k` discrete outcomes instead of 2; consequently,
it is parameterized in terms of `k` probabilities that must sum to 1. The base R function
`rmultinom` used for generating multinomial data takes three arguments: `n`, the number of
simulated data sets to produce; `size`, the number of multinomial outcomes to sample for
each data set; and `prob`, a numeric vector of probabilities. The function returns a
`k` $\times$ `n` integer matrix.
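
To make the link to the binomial concrete, here is a minimal standalone sketch of one
common way to draw a single multinomial vector: visit the categories in order and give
each one a binomial share of the trials that are still unassigned. It uses `std::mt19937`
rather than R's generator, so it will not reproduce R's random stream, and the name
`multinomial_draw` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw one multinomial vector: `size` trials spread over prob.size() categories.
// Assumes the entries of `prob` are non-negative and sum to 1.
std::vector<int> multinomial_draw(int size, const std::vector<double>& prob,
                                  std::mt19937& gen) {
    assert(size >= 0 && !prob.empty());
    std::vector<int> counts(prob.size(), 0);
    int remaining = size;    // trials not yet assigned to a category
    double mass_left = 1.0;  // probability mass of the categories not yet visited

    // Every category except the last gets a conditional binomial draw.
    for (std::size_t j = 0; j + 1 < prob.size() && remaining > 0; ++j) {
        if (mass_left <= 0.0) break;
        // Rescale by the unused mass; clamp to guard against rounding error.
        const double p = std::min(1.0, std::max(0.0, prob[j] / mass_left));
        counts[j] = (p >= 1.0)
                        ? remaining
                        : std::binomial_distribution<int>(remaining, p)(gen);
        remaining -= counts[j];
        mass_left -= prob[j];
    }
    // Whatever is left over belongs to the last category.
    counts.back() += remaining;
    return counts;
}
```

With two categories the loop runs once and the draw is an ordinary binomial; in every
case the `k` counts sum to `size`.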

The C++ code for this uses the `R::rmultinom` function available in `Rmath.h` to generate
`size` multinomial outcomes. The `R::rmultinom` function relies on referencing a pointer
to an `IntegerVector` to store the results. We create a helper function, `rmultinom_1`,
that draws `size` multinomial outcomes from the multinomial distribution based on the
probabilities in `prob`. We then do this `n` independent times in the function
`rmultinom_rcpp`. To match the base R functionality, `rmultinom_rcpp` returns a
`k` $\times$ `n` `IntegerMatrix`.
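
The structure just described (a scalar sampler that writes through a pointer, called once
per column of a `k` $\times$ `n` matrix) can be sketched in standalone C++ so that it
compiles without Rcpp. This is an illustration under stated assumptions, not the original
listing: `IntMatrix`, `MultinomSampler`, and `rmultinom_sketch` are hypothetical names, and
the sampler is passed in as a callable. In an Rcpp build that callable would forward to
`R::rmultinom`, and the result would be an `IntegerMatrix`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Shape of the scalar sampler: draw `size` outcomes over `k` categories with
// probabilities `prob`, writing the k counts through the pointer `out`.
using MultinomSampler =
    std::function<void(int size, const double* prob, int k, int* out)>;

// Hypothetical stand-in for an integer matrix, stored column-major as in R.
struct IntMatrix {
    int nrow;
    int ncol;
    std::vector<int> data;
    int operator()(int i, int j) const {
        return data[static_cast<std::size_t>(j) * nrow + i];
    }
};

// Fill a k x n matrix, one independent multinomial draw per column.
IntMatrix rmultinom_sketch(int n, int size, const std::vector<double>& prob,
                           const MultinomSampler& draw_one) {
    assert(n >= 0 && size >= 0 && !prob.empty());
    const int k = static_cast<int>(prob.size());
    IntMatrix sim{k, n, std::vector<int>(static_cast<std::size_t>(k) * n, 0)};
    for (int j = 0; j < n; ++j) {
        // Column j occupies k contiguous ints, so the sampler writes in place.
        int* column = sim.data.data() + static_cast<std::size_t>(j) * k;
        draw_one(size, prob.data(), k, column);
    }
    return sim;
}
```

Because the storage is column-major, each column is a contiguous block and no temporary
vector is needed per draw. Keeping a separate helper that returns one column, as
`rmultinom_1` does, trades that small saving for readability.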

We now check if the `rmultinom` and `rmultinom_rcpp` functions produce the same results.
We generate a vector of 200 probabilities that sum to 1. We will sample 500 multinomial
outcomes and do this independently 20 times.

[1] TRUE

A benchmark of the functions suggests that the `rmultinom_rcpp` function is very slightly
slower than the `rmultinom` function, but that is not really a concern for our purposes.

Unit: milliseconds

| expr | min | lq | mean | median | uq | max | neval | cld |
|---|---|---|---|---|---|---|---|---|
| `rmultinom(1000, size, prob)` | 10.9042 | 11.1841 | 11.7729 | 11.6485 | 12.1532 | 14.1841 | 100 | a |
| `rmultinom_rcpp(1000, size, prob)` | 11.1452 | 11.3780 | 12.0209 | 11.8841 | 12.2702 | 14.9434 | 100 | b |

The Poisson distribution is a
non-negative discrete distribution characterized by having identical mean and
variance. The base R function `rpois` used for generating Poisson data takes two
arguments: `n`, the number of simulated values to produce, and `lambda`, a positive numeric
vector. The `rpois` function cycles (and recycles) through the values in `lambda` for each
successive value simulated. The function produces an integer vector of length `n`. We
provide similar functionality using the `R::rpois` function available in `Rmath.h`. Note
that we cycle through the values of `lambda` so that if the end of the `lambda` vector is
reached before we have generated `n` values, then we restart at the beginning of the
`lambda` vector.
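
The recycling rule can be sketched in standalone C++ as follows. This is a minimal
illustration rather than the original listing: `rpois_sketch` is a hypothetical name, and
the scalar sampler is passed in as a callable `draw` that takes one mean and returns one
value. In an Rcpp build, `draw` would be `R::rpois` and the result would be an
`IntegerVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Generate n values, using lambda[0], lambda[1], ... in turn and starting
// over at lambda[0] whenever the end of `lambda` is reached.
std::vector<int> rpois_sketch(int n, const std::vector<double>& lambda,
                              const std::function<double(double)>& draw) {
    assert(n >= 0 && !lambda.empty());
    std::vector<int> out(static_cast<std::size_t>(n));
    const std::size_t m = lambda.size();
    std::size_t j = 0;  // position within lambda
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<int>(draw(lambda[j]));
        if (++j == m) j = 0;  // end of lambda reached: restart at the beginning
    }
    return out;
}
```

Passing a sampler that simply returns its mean makes the cycling visible: with
`lambda = {1, 2, 3}` and `n = 8`, the means used are 1, 2, 3, 1, 2, 3, 1, 2.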

We now evaluate whether the `rpois` and `rpois_rcpp` functions produce the same results. We
generate a positive vector with 200 values for `lambda` and draw `length(lambda) + 5`
independent Poisson values.

[1] TRUE

A benchmark of the two functions suggests the `rpois_rcpp` function may be slightly faster,
but once again, that is not our primary concern here.

Unit: microseconds

| expr | min | lq | mean | median | uq | max | neval | cld |
|---|---|---|---|---|---|---|---|---|
| `rpois(length(lambda) + 5, lambda)` | 7.455 | 7.7825 | 8.02154 | 7.909 | 8.2425 | 11.145 | 100 | b |
| `rpois_rcpp(length(lambda) + 5, lambda)` | 6.737 | 6.9860 | 7.31607 | 7.182 | 7.4515 | 16.328 | 100 | a |

**tags:**
basics
