Chapter 21 Using C++ in R with Rcpp

  • Sometimes R code is not fast enough no matter how hard you try

  • In that case, you can consider writing that slow part of your program in C++

  • With the package Rcpp, you can connect C++ with R easily

  • A typical bottleneck that C++ can address is loops that cannot be (easily) vectorized because subsequent iterations depend on previous ones (e.g., Gibbs sampling when one perform Bayesian inference)

Basic references: https://adv-r.hadley.nz/rcpp.html (this notes is almost completely following this link)

More examples: https://gallery.rcpp.org/

Useful Reference: https://teuder.github.io/rcpp4everyone_en/

Installation:

  • On Windows, install Rtools (this is not a R package).

  • On Mac, install Xcode from the app store.

Then, install Rcpp using install.packages("Rcpp") in R.

21.1 Basic Examples

library(Rcpp)

Consider this R function that add \(3\) numbers:

add_R <- function(x, y, z) {
  x + y + z
}
add_R(1, 2, 3)
## [1] 6

Let’s write the C++ equivalent:

# need to use ' '  or " " inside cppFunction()
cppFunction(
'int add_C(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}'
)

When you run this code, Rcpp will compile the C++ code and construct an R function that connects to the compiled C++ function.

# add works like a regular R function
add_C
## function (x, y, z) 
## .Call(<pointer: 0x00007fff86cd1610>, x, y, z)

add_C(1, 2, 3)
## [1] 6

Important differences for writing functions between R and C++:

R C++
Declare output type no yes
Declare input type no yes
Require explicit return statement no yes
Statements end with; no yes

The common data types are:

Type Vector Scalar Matrix
numeric NumericVector double NumericMatrix
integer IntegerVector int IntegerMatrix
character CharacterVector String CharacterMatrix
logical LogicalVector bool LogicalMatrix

21.2 Scalar Input, Scalar Output

sign_R <- function(x) {
  if (x > 0) {
    1
  } else if (x == 0) {
    0
  } else {
    -1
  }
}

cppFunction(
'int sign_C(int x) {
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}'
)
  • if syntax is identical

21.3 Vector Input, Scalar Ouptut

sum_R <- function(x) {
  total <- 0
  n <- length(x)
  for (i in 1:n) {
    total <- total + x[i]
  }
  total
}

cppFunction(
'double sum_C(NumericVector x) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total = total + x[i];
  }
  return total;
}'
)
  • Need to declare the type of each variable (n, total, i)

  • for loop syntax is different.

  • In C++, need to use =, not <-

  • Can also use total += x[i]

  • To find the length of a vector in C++, use .size()

  • IMPORTANT: in C++, vector indices start at \(0\). If your vector has \(n\) elements, the last position is at \(n-1\).

Comparing performance using mark() from the package bench

library(bench)
x <- runif(1000)
mark(
  sum(x),
  sum_R(x),
  sum_C(x)
)
## # A tibble: 3 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 sum(x)        600ns    700ns  1373136.        0B        0
## 2 sum_R(x)     13.3µs   16.6µs    57293.   23.24KB        0
## 3 sum_C(x)      800ns      1µs   669550.    2.49KB        0

You can see sum_R is much slower than the built-in sum (which is highly optimized).

sum_C is much faster than sum_R

21.4 Vector Input, Vector Output

Consider a function that computes \[\begin{equation*} \sqrt{(y_i - x)^2}, \quad i=1,\ldots,n. \end{equation*}\]

# vectorized R function
pdist_R <- function(x, y) {
  sqrt((y - x) ^ 2)
}

# R function with for loop
pdist_R_for <- function(x, y) {
  n <- length(y)
  out <- rep(0, n)
  for (i in 1:n) {
    out[i] = sqrt((y[i] - x)^2)
  }
  out
}

cppFunction(
'NumericVector pdist_C(double x, NumericVector y) {
  int n = y.size();
  NumericVector out(n);

  for(int i = 0; i < n; ++i) {
    out[i] = sqrt(pow(y[i] - x, 2.0));
  }
  return out;
}'
)
  • To create a numeric vector called out with length \(n\), use NumericVector out(n)

  • C++ uses pow(), not ^, for exponentiation

Compare the performance:

y <- runif(1e6)
mark(
  pdist_R(0.5, y),
  pdist_R_for(0.5, y),
  pdist_C(0.5, y)
)
## # A tibble: 3 × 6
##   expression               min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 pdist_R(0.5, y)        3.9ms   4.38ms     224.     7.63MB     9.34
## 2 pdist_R_for(0.5, y)  43.15ms  45.24ms      21.9    7.67MB     0   
## 3 pdist_C(0.5, y)       2.43ms   2.59ms     361.     7.63MB    14.6

The C++ version is faster than the vectorized R function in this case.

In general, I recommend if you can use a vectorized R function, you do not have to write the C++ version (at least not the first part to be considered when you are optimizing your code).

21.5 Returning a list

Return a list with two elements with names sum and diff:

cppFunction('
List sum_diff_C(double x, double y) {
  return List::create(Named("sum") = x + y, Named("diff") = x - y);
}
')

sum_diff_C(2, 3)
## $sum
## [1] 5
## 
## $diff
## [1] -1

21.6 Using sourceCpp

We have illustrated how to write simple C++ functions with cppFunction(). For real problems, we have to break down our program in various pieces and we should use stand-alone C++ files.

For example, create a file called my_Rcpp.cpp that contains the following code:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double mean_C(NumericVector x) {
  int n = x.size();
  double total = 0;

  for(int i = 0; i < n; ++i) {
    total = total + x[i];
  }
  return total / n;
}

In R, use sourceCpp() to compile your my_Rcpp.cpp file.

sourceCpp("my_Rcpp.cpp")

Compare the performance:

x <- runif(1e5)
mark(
  mean(x),
  mean_C(x)
)
## # A tibble: 2 × 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 mean(x)     123.6µs  130.8µs     7312.        0B        0
## 2 mean_C(x)    40.6µs   42.6µs    22568.    2.49KB        0

21.7 Example: Gibbs Sampler

R code:

gibbs_R <- function(N, thin) {
  mat <- matrix(nrow = N, ncol = 2)
  x <- 0
  y <- 0

  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4)
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[i, ] <- c(x, y)
  }
  mat
}

C++:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gibbs_C(int N, int thin) {
  NumericMatrix mat(N, 2);
  double x = 0;
  double y = 0;

  for(int i = 0; i < N; i++) {
    for(int j = 0; j < thin; j++) {
      x = rgamma(1, 3, 1 / (y * y + 4))[0];
      y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
    }
    mat(i, 0) = x;
    mat(i, 1) = y;
  }
  return(mat);
}
  • Use ( instead of [ to index into the matrix

  • [0] is to convert the vector output into scalar

C++ version is about 20 times faster:

mark(
  gibbs_R(100, 10),
  gibbs_C(100, 10),
  check = FALSE
)
## # A tibble: 2 × 6
##   expression            min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 gibbs_R(100, 10)   1.81ms   2.19ms      421.    4.94MB     9.57
## 2 gibbs_C(100, 10)  153.9µs  191.3µs     4922.     4.1KB     6.36

21.8 More Details

  • Comment in C++ is //.

Vector

Access an individual element of a vector using [] or ().

R Rcpp
x <- rep(0, 5) NumericVector x(5);
x <- rep(1, 5) NumericVector x(5, 1);
x <- c(1, 2, 3) NumericVector x = {1, 2, 3};
x[2] x[1] or x(1)

Elementwise arithmetic operations are possible for NumericVector

For example:

cppFunction('
NumericVector vector_sum(NumericVector x, NumericVector y) {
  NumericVector sum = x + y;
  return sum;
}
')

x <- 1:5
y <- 2:6
vector_sum(x, y)
## [1]  3  5  7  9 11

For comparison operators, we need to use logical vectors with LogicalVector.

For example:

cppFunction('
LogicalVector vector_greater(NumericVector x, NumericVector y) {
  LogicalVector comp = x > y;
  return comp;
}
')

x <- 1:5
y <- 2:6
vector_greater(x, y)
## [1] FALSE FALSE FALSE FALSE FALSE

Other comparison operators: ==, !=, <, >, >=, <=

Matrix

R Rcpp
A <- matrix(0, nrow = 2, ncol = 3) NumericMatrix A(2, 3);
x <- A[1, 2] double x = A(0, 1);
x <- A[, 2] NumericVector x = A(_, 1);
cppFunction('
List matrix_operation(NumericMatrix A) {
  return List::create(Named("A") = A, 
  Named("row_number") = A.nrow(),
  Named("col_number") = A.ncol(),
  Named("transpose") = transpose(A));
}
')


A <- matrix(1:12, 3, 4)
matrix_operation(A)
## $A
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## $row_number
## [1] 3
## 
## $col_number
## [1] 4
## 
## $transpose
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12