Chapter 21 Using C++ in R with Rcpp
- Sometimes R code is not fast enough no matter how hard you try 
- In that case, you can consider writing that slow part of your program in C++ 
- With the package - Rcpp, you can connect C++ with R easily
- A typical bottleneck that C++ can address is loops that cannot be (easily) vectorized because subsequent iterations depend on previous ones (e.g., Gibbs sampling when one perform Bayesian inference) 
Basic references: https://adv-r.hadley.nz/rcpp.html (this notes is almost completely following this link)
More examples: https://gallery.rcpp.org/
Useful Reference: https://teuder.github.io/rcpp4everyone_en/
Installation:
- On Windows, install Rtools (this is not a R package). 
- On Mac, install Xcode from the app store. 
Then, install Rcpp using install.packages("Rcpp") in R.
21.1 Basic Examples
Consider this R function that add \(3\) numbers:
Let’s write the C++ equivalent:
# need to use ' '  or " " inside cppFunction()
cppFunction(
'int add_C(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}'
)When you run this code, Rcpp will compile the C++ code and construct an R function that connects to the compiled C++ function.
# add works like a regular R function
add_C
## function (x, y, z) 
## .Call(<pointer: 0x00007ff9bb9e1610>, x, y, z)
add_C(1, 2, 3)
## [1] 6Important differences for writing functions between R and C++:
| R | C++ | |
|---|---|---|
| Declare output type | no | yes | 
| Declare input type | no | yes | 
| Require explicit return statement | no | yes | 
| Statements end with; | no | yes | 
The common data types are:
| Type | Vector | Scalar | Matrix | 
|---|---|---|---|
| numeric | NumericVector | double | NumericMatrix | 
| integer | IntegerVector | int | IntegerMatrix | 
| character | CharacterVector | String | CharacterMatrix | 
| logical | LogicalVector | bool | LogicalMatrix | 
21.2 Scalar Input, Scalar Output
sign_R <- function(x) {
  if (x > 0) {
    1
  } else if (x == 0) {
    0
  } else {
    -1
  }
}
cppFunction(
'int sign_C(int x) {
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}'
)- ifsyntax is identical
21.3 Vector Input, Scalar Ouptut
sum_R <- function(x) {
  total <- 0
  n <- length(x)
  for (i in 1:n) {
    total <- total + x[i]
  }
  total
}
cppFunction(
'double sum_C(NumericVector x) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total = total + x[i];
  }
  return total;
}'
)- Need to declare the type of each variable ( - n,- total,- i)
- forloop syntax is different.
- In C++, need to use - =, not- <-
- Can also use - total += x[i]
- To find the length of a vector in C++, use - .size()
- IMPORTANT: in C++, vector indices start at \(0\). If your vector has \(n\) elements, the last position is at \(n-1\). 
Comparing performance using mark() from the package bench
library(bench)
x <- runif(1000)
mark(
  sum(x),
  sum_R(x),
  sum_C(x)
)
## # A tibble: 3 × 6
##   expression      min   median `itr/sec` mem_alloc
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 sum(x)        1.3µs    1.4µs   516137.        0B
## 2 sum_R(x)     25.4µs   27.6µs    35059.   23.21KB
## 3 sum_C(x)      1.7µs    1.9µs   420129.    2.49KB
## # ℹ 1 more variable: `gc/sec` <dbl>You can see sum_R is much slower than the built-in sum (which is highly optimized).
sum_C is much faster than sum_R
21.4 Vector Input, Vector Output
Consider a function that computes \[\begin{equation*} \sqrt{(y_i - x)^2}, \quad i=1,\ldots,n. \end{equation*}\]
# vectorized R function
pdist_R <- function(x, y) {
  sqrt((y - x) ^ 2)
}
# R function with for loop
pdist_R_for <- function(x, y) {
  n <- length(y)
  out <- rep(0, n)
  for (i in 1:n) {
    out[i] = sqrt((y[i] - x)^2)
  }
  out
}
cppFunction(
'NumericVector pdist_C(double x, NumericVector y) {
  int n = y.size();
  NumericVector out(n);
  for(int i = 0; i < n; ++i) {
    out[i] = sqrt(pow(y[i] - x, 2.0));
  }
  return out;
}'
)- To create a numeric vector called - outwith length \(n\), use- NumericVector out(n)
- C++ uses - pow(), not- ^, for exponentiation
Compare the performance:
y <- runif(1e6)
mark(
  pdist_R(0.5, y),
  pdist_R_for(0.5, y),
  pdist_C(0.5, y)
)
## # A tibble: 3 × 6
##   expression           min  median `itr/sec` mem_alloc
##   <bch:expr>       <bch:t> <bch:t>     <dbl> <bch:byt>
## 1 pdist_R(0.5, y)   6.41ms  6.68ms     128.     7.63MB
## 2 pdist_R_for(0.5… 71.86ms  74.1ms      11.8    7.67MB
## 3 pdist_C(0.5, y)    3.5ms  3.81ms     253.     7.63MB
## # ℹ 1 more variable: `gc/sec` <dbl>The C++ version is faster than the vectorized R function in this case.
In general, I recommend if you can use a vectorized R function, you do not have to write the C++ version (at least not the first part to be considered when you are optimizing your code).
21.6 Using sourceCpp
We have illustrated how to write simple C++ functions with cppFunction(). For real problems, we have to break down our program in various pieces and we should use stand-alone C++ files.
For example, create a file called my_Rcpp.cpp that contains the following code:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double mean_C(NumericVector x) {
  int n = x.size();
  double total = 0;
  for(int i = 0; i < n; ++i) {
    total = total + x[i];
  }
  return total / n;
}In R, use sourceCpp() to compile your my_Rcpp.cpp file.
Compare the performance:
21.7 Example: Gibbs Sampler
R code:
gibbs_R <- function(N, thin) {
  mat <- matrix(nrow = N, ncol = 2)
  x <- 0
  y <- 0
  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4)
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[i, ] <- c(x, y)
  }
  mat
}C++:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix gibbs_C(int N, int thin) {
  NumericMatrix mat(N, 2);
  double x = 0;
  double y = 0;
  for(int i = 0; i < N; i++) {
    for(int j = 0; j < thin; j++) {
      x = rgamma(1, 3, 1 / (y * y + 4))[0];
      y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
    }
    mat(i, 0) = x;
    mat(i, 1) = y;
  }
  return(mat);
}- Use - (instead of- [to index into the matrix
- [0]is to convert the vector output into scalar
C++ version is about 20 times faster:
mark(
  gibbs_R(100, 10),
  gibbs_C(100, 10),
  check = FALSE
)
## # A tibble: 2 × 6
##   expression          min   median `itr/sec` mem_alloc
##   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 gibbs_R(100, …   2.81ms   3.78ms      244.    4.94MB
## 2 gibbs_C(100, …    224µs 269.75µs     3612.     4.1KB
## # ℹ 1 more variable: `gc/sec` <dbl>21.8 More Details
- Comment in C++ is //.
Vector
Access an individual element of a vector using [] or ().
| R | Rcpp | 
|---|---|
| x <- rep(0, 5) | NumericVector x(5); | 
| x <- rep(1, 5) | NumericVector x(5, 1); | 
| x <- c(1, 2, 3) | NumericVector x = {1, 2, 3}; | 
| x[2] | x[1]orx(1) | 
Elementwise arithmetic operations are possible for NumericVector
For example:
cppFunction('
NumericVector vector_sum(NumericVector x, NumericVector y) {
  NumericVector sum = x + y;
  return sum;
}
')
x <- 1:5
y <- 2:6
vector_sum(x, y)
## [1]  3  5  7  9 11For comparison operators, we need to use logical vectors with LogicalVector.
For example:
cppFunction('
LogicalVector vector_greater(NumericVector x, NumericVector y) {
  LogicalVector comp = x > y;
  return comp;
}
')
x <- 1:5
y <- 2:6
vector_greater(x, y)
## [1] FALSE FALSE FALSE FALSE FALSEOther comparison operators: ==, !=, <, >, >=, <=
Matrix
| R | Rcpp | 
|---|---|
| A <- matrix(0, nrow = 2, ncol = 3) | NumericMatrix A(2, 3); | 
| x <- A[1, 2] | double x = A(0, 1); | 
| x <- A[, 2] | NumericVector x = A(_, 1); | 
cppFunction('
List matrix_operation(NumericMatrix A) {
  return List::create(Named("A") = A, 
  Named("row_number") = A.nrow(),
  Named("col_number") = A.ncol(),
  Named("transpose") = transpose(A));
}
')
A <- matrix(1:12, 3, 4)
matrix_operation(A)
## $A
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## $row_number
## [1] 3
## 
## $col_number
## [1] 4
## 
## $transpose
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12