Armadillo subsetting

Dirk Eddelbuettel — written Jan 2, 2013 — updated Nov 19, 2016 — James Joseph Balamuta — source

Introduction

Subsetting in armadillo is a popular topic that frequently appears on StackOverflow. Prinicipally, the occurrences are due to the unique approach that armadillo takes on performing subset operations. Within this topic, light will be shed on the applicable use of armadillo’s subsetting capabilities as it relates to non-contiguous submatrix views.

Vector Subset and Assignment

To begin, we start with a recent inquiring on StackOverflow that dealt with the proposition of isolating pre-known locations and assigning the locations new values. The R equivalent would be in the lines of:

(v     <- 1:10)     # Generate and Preview Values

 [1]  1  2  3  4  5  6  7  8  9 10

pos    <- c(2,4,10) # Replacement IDs
rv     <- c(6,3,1)  # Replacement Values
v[pos] <- rv        # Change values
v                   # Observe updated vector

 [1] 1 6 3 3 5 6 7 8 9 1

Within armadillo, the submatrix views that we are most interested in to replicate this behavior is .elem(). The main difference of subsetting with this function is the need to use an arma::uvec containing the locations of elements to extract. The u prepended to the vec object standards for unsigned int, which are only able to store positive integer numbers (e.g. 0, 1, 2,…, 42, and so on.)

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::rowvec arma_sub(arma::rowvec x, arma::uvec pos, arma::vec vals) {
    
    x.elem(pos) = vals;  // Subset by element position and set equal to
  
    return x;
}

To execute this code, we would use:

(v <- 1:10)               # Generate and Preview Values

 [1]  1  2  3  4  5  6  7  8  9 10

arma_sub(v, pos - 1, rv)  # Note the pos shift from R to C++ index

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    6    3    3    5    6    7    8    9     1

Finding Values to Subset

Another common operation is to use logical indexing to ascertain where within a vector certain conditions are met. For instance, to find values that are greater than or equal to 0.5 in R the operation is governed by:

set.seed(112)        # Set seed for reproducibility
(v <- runif(10))     # Generate and Preview Values

 [1] 0.37667253 0.92201944 0.99187774 0.97723602 0.23629728 0.10706678
 [7] 0.03915213 0.72802690 0.13023496 0.39995203

v[v >= 0.5] <- 1     # Change values
v                    # Observe updated vector

 [1] 0.37667253 1.00000000 1.00000000 1.00000000 0.23629728 0.10706678
 [7] 0.03915213 1.00000000 0.13023496 0.39995203

Within armadillo, the use of logical values to subset is not possible without arma::find to replicate R’s logical index subset behavior. However, per the previous discussion, the values that .elem() takes requires indices. Therefore, the arma::find function is taking logical values and convert them into index positions. This operation is akin to the process behind R’s which() function. Furthermore, to specify a specific value that should be assigned to the subset, the use of .fill() is required.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::rowvec arma_sub_cond(arma::rowvec x, double val, double replace) {
    
    arma::uvec ids = find(x >= val); // Find indices
  
    x.elem(ids).fill(replace);       // Assign value to condition
  
    return x;
}

set.seed(112)            # Set seed for reproducibility
(v <- runif(10))         # Generate and preview Values

 [1] 0.37667253 0.92201944 0.99187774 0.97723602 0.23629728 0.10706678
 [7] 0.03915213 0.72802690 0.13023496 0.39995203

arma_sub_cond(v, 0.5, 1)

          [,1] [,2] [,3] [,4]      [,5]      [,6]       [,7] [,8]     [,9]
[1,] 0.3766725    1    1    1 0.2362973 0.1070668 0.03915213    1 0.130235
        [,10]
[1,] 0.399952

Subsetting a Matrix

As was the case with vectors, elements can be extracted from a matrix using both the arma::find function to obtain indices and the .elem() function in submatrix views to extract values. For example, to know the values of M * M' that are greater or equal to 100, one could do:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::vec matrix_sub(arma::mat M) {
    arma::mat Z = M * M.t();                        
    arma::vec v = Z.elem( find( Z >= 100 ) ); // Find IDs & obtain values
    return v;
}

The result:

(M <- matrix(1:9, 3, 3))   # Construct & Display Matrix

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

matrix_sub(M)

     [,1]
[1,]  108
[2,]  108
[3,]  126

Obtaining Values with Matrix Subscripts

Another form of subsetting within matrices is done using matrix notation. Matrix notation allows for the specification of an element based on their row and column location. This question recently popped up on StackOverflow as the non-continuous submatrix views were initially causing the user problems.

(M <- matrix(1:9, 3, 3))   # Construct & Display Matrix

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

(locs <- rbind(c(1, 2),    # List non-continuous locations
               c(3, 1),
               c(2, 3)))

     [,1] [,2]
[1,]    1    2
[2,]    3    1
[3,]    2    3

M[locs]                    # Subset the matrix

[1] 4 3 8

As one StackOverflow user discovered, the non-continuous submatrix views obtained via X.submat( vector_of_row_indices, vector_of_column_indices ) does not provide the traditional R matrix subset. Within armadillo 7.200, the ability to supply multiple matrix notations to sub2ind() was added. However, it requires a different matrix format than R. Instead of supplying the row and column as one row in the matrix, one must supply each observation as a column in the matrix. That is, the subscript matrix must be \(2 \times n\), where \(n\) is the number of matrix locations. The first row is associated with row location of the element in the matrix and the second row is requires the column location of the element.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::rowvec matrix_locs(arma::mat M, arma::umat locs) {
  
    arma::uvec eids = sub2ind( size(M), locs ); // Obtain Element IDs
    arma::vec v  = M.elem( eids );              // Values of the Elements
    
    return v.t();                               // Transpose to mimic R
}

The result:

cpp_locs <- locs - 1       # Shift indices from R to C++

(cpp_locs <- t(cpp_locs))  # Transpose matrix for 2 x n form

     [,1] [,2] [,3]
[1,]    0    2    1
[2,]    1    0    2

matrix_locs(M, cpp_locs)   # Subset the matrix

     [,1] [,2] [,3]
[1,]    4    3    8

Converting between `umat` matrix types to standard `arma` types.

Awhile back, on StackOverflow question, someone asked how convert to convert from arma::umat to arma::mat. The former is the format we have seen used for arma::find.

For the particular example at hand, a call to the conv_to<T>() converter provided the solution. For posterity, we rewrite the answer here.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat matrix_convert(arma::mat M) {
    
    arma::umat a = trans(M) > M;                      // Logical condition 
  
    arma::mat  N = arma::conv_to<arma::mat>::from(a); // Convert to double matrix
    
    return N;
}

(M <- matrix(1:9, 3, 3)) # Construct & Display Matrix

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

matrix_convert(M)        # Boolean display of M^T > M

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    1    1    0

tags: armadillo matrix vector

Extending R with C++ and Fortran — Dirk Eddelbuettel and JBrandon Duck-Mayr
Benchmarking Rcpp code with RcppClock — Zach DeBruine
Simulation Smoother using RcppArmadillo — Tomasz Woźniak