Tomasz Woźniak — written Sep 5, 2021 — source
Simulation smoothing involves drawing latent state variables in discrete-time state–space models from their conditional distribution given parameters and data as defined by McCausland, Miller, & Pelletier (2011). This tool greatly facilitates Bayesian estimation in the case of a popular family of conditionally linear Gaussian state-space models. This post presents Rcpp implementations of a sampling algorithm for a multivariate normal distribution. RcppArmadillo is used to efficiently calculate the Cholesky decomposition and implement the back-substitution to solve linear equations, which can greatly improves computation times when using routines for banded or tridiagonal matrices.
Consider the problem of sampling random numbers from a multivariate normal distribution specified by Chan & Jeliazkov (2009) as: \begin{equation}\label{eq1} \mathcal{N}_T\left(\Omega^{-1}\alpha, \Omega^{-1}\right) \end{equation} where:
the covariance matrix, denoted by $\Omega^{-1}$, is a $T\times T$ symmetric positive-definite matrix given by the inverse of a precision matrix $\Omega$ that, depending on the autoregressive lag order of the state equation, is band diagonal or tridiagonal and easy to compute, and
the mean of the distribution, denoted by $\Omega^{-1}\alpha$, is constructed as the product of the inverse of the precision matrix times an easy-to-compute $T$-vector $\alpha$.
Let $L$ be a lower-triangular matrix obtained by the Cholesky decomposition of the precision matrix $\Omega=LL’$. Consequently, the mean of the normal distribution above can be presented as $(LL’)^{-1}\alpha = L^{-1\prime}L^{-1}\alpha$. Let a $T$-column vector $\epsilon$ contain independent random draws from the standard normal distribution. A random draw from the target distribution can be computed then by: \begin{equation}\label{eq2} L^{-1\prime}L^{-1}\alpha + L^{-1\prime}\epsilon = L^{-1\prime}\left(L^{-1}\alpha + \epsilon\right) \end{equation} The formulation of the problem on the right-hand side of the equation above follows the proposition of McCausland, Miller, & Pelletier (2011) and leads to the sampling algorithm following the steps:
The computational gains from applying the algorithm presented above are twofold. First, taking into account that matrix $\Omega$ is band or tridiagonal makes computing the Cholesky decomposition fast if dedicated routines are applied and leads to a lower-triangular matrix $L$ with as many non-zero subdiagonals as in the precision matrix. Secondly, solving the linear equations using back-substitution bypasses the computation of the inverse of $L$ and takes advantage of the special type of matrix $L$. This strategy greatly reduces the computational time required to obtain the random draws and, therefore, was recommended by Frühwirth-Schnatter & Kastner (2014) and Bitto & Frühwirth-Schnatter (2019) following the developments by Chan & Jeliazkov (2009) and McCausland, Miller, & Pelletier (2011).
All the presented solutions are designed to sample n
draws from the target distribution and return them in columns of the output matrix draw
. The results are required to be reproducible. Therefore, the compatibility with R’s set.seed
is obtained by generating normal random draws using Rcpp::rnorm
rather than arma::randn
. All the considered functions take three arguments: the number of requested draws n
, the precision
matrix, and the covector location
. Then they read the dimension of the normal vector T
, fill matrix epsilon
with normal random draws, create the location_matrix
of appropriate dimensions whose columns are filled with location
using each_col
, and proceed to the implementation of steps 3 and 4 from the sampling algorithm presented above.
The first function rmvnorm_arma_inv
is based on the Cholesky decomposition computed by chol
subsequently declared as an upper-triangular matrix using trimatu
, and uses function inv
to compute the inverse. This matrix is then used to compute the draws from the target distribution using matrix multiplication and summation.
The second function rmvnorm_arma_solve
uses chol
and trimatu
the same way as the first one. However, this implementation uses solve
searching for the solutions to the linear equations using back-substitution. Additionally, function solve
checks if the matrix is band and applies dedicated numerical procedures as explained in the API Documentation for Armadillo 10.6. This additional check facilitates the computational gains. The development of the two functions considered thus far was somehow inspired by the RcppGallery article by Hardt, Ahmadou, & Christoffersen (2013-2020).
The third function rmvnorm_arma_stochvol
is based on three utility functions from package stochvol by Kastner (2016) and Hosszejni & Kastner. These functions implement dedicated numerical algorithms described in McCausland, Miller, & Pelletier (2011) taking advantage of the fact that, in many applications of the state-space models, all the elements on the first sub-diagonal of the precision matrix are given by the same number. In this case, the algorithms for the Cholesky decomposition, forward- and back-substitution simplify granting further computational time gains. The three functions taken from the stochvol package namespace are cholesky_tridiagonal
, forward_algorithm
, and backward_algorithm
. The only changes implemented in the functions presented in the listing below consider the definitions of object types that were simplified. The function rmvnorm_arma_stochvol
exhibits a slightly adjusted structure relative to the first two presented functions for the simulation smoother accommodating the fact that the functions forward_algorithm
, and backward_algorithm
accept only a vector as the third argument and not a matrix.
In addition to the cpp functions, three R functions are presented for the sake of comparing the computational time of their execution. These functions exhibit a similar structure as the cpp functions with respect to the definition of the arguments, outputs, and the order of implementing the sampling algorithm steps.
The first function rmvnorm_solve
uses the functionality of R’s base package and is very similar in its structure to function rmvnorm_arma_inv
. The Cholesky decomposition is performed using function chol
and the inversion of a matrix using function solve
. These functions seem to not check the type of the input matrix.
The remaining two functions use LAPACK library R wrappers included in package mgcv by Wood (2021). This package provides dedicated functions for the Cholesky decomposition for band and tridiagonal matrices. Additionally, these implementations use R’s base package wrappers for the level-3 BLAS routine dtrsm
for solving linear equations by forward- and back-substitution given by functions forwardsolve
and backsolve
respectively. They recognize a special type of the input matrix to be inverted to solve the linear equations.
The function rmvnorm_bandchol
is written for a band precision matrix and uses function bandchol
, being the implementation of LAPACK’s dpbtrf
routine, to compute the Cholesky decomposition.
The third function rmvnorm_trichol
is written for a tridiagonal precision matrix and uses function trichol
from package mgcv to compute the Cholesky decomposition using LAPACK’s dpttrf
routine.
The compiled versions of the three R functions were constructed using package compiler. However, they improved the computational time of their execution only by a small margin relative to the original functions and, thus, are skipped in the reporting below.
Another package SparseM by Koenker & Ng (2003) provides useful functions to sample random draws from the target distribution. Its application requires defining the precision matrix as sparse using function as.matrix.csr
, computation of the Cholesky decomposition using function chol
, and solving the linear equations using function backsolve
. The computational gains from using such an implementation are significant relative to the basic R’s implementation rmvnorm_solve
. However, as the package vignette points out, function chol
returns a solution subjected to a random permutation of rows and columns, which makes the results uncomparable with the reproducible implementations considered here.
The functions are compared using routine microbenchmark
from package microbenchmark in an exercise consisting of sampling n=10
draws from the target distribution with the location
matrix filled with standard normal random draws, and the precision
matrix imitating the structure of a precision matrix for a state-space equation given by an autoregressive process of order 1. It implies that this matrix is tridiagonal with all the elements on the first sub-diagonal being equal to the same number. The results are reported for the precision
and location
of size T=250
which is motivated by the number of observations available for quarterly macroeconomic aggregates for many developed countries.
Function microbenchmark
includes the set.seed
specification and returns the tabular outcome only if the matrices sampled using alternative functions are equal to one another.
Unit: microseconds expr min lq mean median uq max neval cld cpp.inv 790.184 882.677 3582.257 938.960 1678.659 49951.59 100 a cpp.solve 504.267 914.869 1016.789 965.231 1020.201 1711.95 100 a cpp.stochvol 172.530 322.851 401.402 346.279 364.765 4501.92 100 a r.solve 2352.417 2583.490 17877.209 3918.437 17634.294 279008.16 100 b r.bandchol 1834.725 4234.104 6738.529 4432.209 5653.852 93943.39 100 a r.trichol 484.612 813.264 4031.655 891.851 1605.639 97691.21 100 a
Unsurprisingly, function rmvnorm_arma_stochvol
that is custom-made for this example wins the performance comparison. It is twice as fast as another cpp implementation rmvnorm_arma_solve
, three times faster than the fastest R implementation rmvnorm_trichol
, and over thirteen times faster than R’s rmvnorm_bandchol
.
Two other unreported runs of this exercise were performed for T=750
and T=2500
, the sample sizes corresponding to the number of observations available for monthly macroeconomic aggregates and ten years of daily financial data respectively. The ordering of the functions in terms of their average execution time remained the same while the multipliers increased. For instance, when T=2500
the function rmvnorm_arma_stochvol
was over six times faster than both rmvnorm_arma_solve
and rmvnorm_trichol
and over seventeen times faster than rmvnorm_arma_inv
and rmvnorm_bandchol
.
One can expect substantial gains in the computation times by writing the functions in Rcpp for the simulation smoother. The more fine-tuned the simulation algorithm to the application at hand the more improvements are provided. The computational gains are significant in the context of using the simulation smoother in the Gibbs sampler, an iterative Bayesian estimation procedure in which a single or multiple draws from the multivariate normal distribution are sampled at each iteration, where the number of iterations ranges between several thousand up to even a million.
Tomasz thanks Alex Ballentine, Matthieu Droumaguet, and Gregor Kastner for their useful feedback.
tags: armadillo
Tweet