Chapter 21 Using C++ in R with Rcpp
Sometimes R code is not fast enough no matter how hard you try
In that case, you can consider writing that slow part of your program in C++
With the package
Rcpp
, you can connect C++ with R easilyA typical bottleneck that C++ can address is loops that cannot be (easily) vectorized because subsequent iterations depend on previous ones (e.g., Gibbs sampling when one perform Bayesian inference)
Basic references: https://adv-r.hadley.nz/rcpp.html (this notes is almost completely following this link)
More examples: https://gallery.rcpp.org/
Useful Reference: https://teuder.github.io/rcpp4everyone_en/
Installation:
On Windows, install Rtools (this is not a R package).
On Mac, install Xcode from the app store.
Then, install Rcpp
using install.packages("Rcpp")
in R.
21.1 Basic Examples
Consider this R function that add \(3\) numbers:
Let’s write the C++ equivalent:
# need to use ' ' or " " inside cppFunction()
cppFunction(
'int add_C(int x, int y, int z) {
int sum = x + y + z;
return sum;
}'
)
When you run this code, Rcpp will compile the C++ code and construct an R function that connects to the compiled C++ function.
# add works like a regular R function
add_C
## function (x, y, z)
## .Call(<pointer: 0x00007fff86cd1610>, x, y, z)
add_C(1, 2, 3)
## [1] 6
Important differences for writing functions between R and C++:
R | C++ | |
---|---|---|
Declare output type | no | yes |
Declare input type | no | yes |
Require explicit return statement | no | yes |
Statements end with; | no | yes |
The common data types are:
Type | Vector | Scalar | Matrix |
---|---|---|---|
numeric | NumericVector |
double |
NumericMatrix |
integer | IntegerVector |
int |
IntegerMatrix |
character | CharacterVector |
String |
CharacterMatrix |
logical | LogicalVector |
bool |
LogicalMatrix |
21.2 Scalar Input, Scalar Output
sign_R <- function(x) {
if (x > 0) {
1
} else if (x == 0) {
0
} else {
-1
}
}
cppFunction(
'int sign_C(int x) {
if (x > 0) {
return 1;
} else if (x == 0) {
return 0;
} else {
return -1;
}
}'
)
if
syntax is identical
21.3 Vector Input, Scalar Ouptut
sum_R <- function(x) {
total <- 0
n <- length(x)
for (i in 1:n) {
total <- total + x[i]
}
total
}
cppFunction(
'double sum_C(NumericVector x) {
int n = x.size();
double total = 0;
for (int i = 0; i < n; ++i) {
total = total + x[i];
}
return total;
}'
)
Need to declare the type of each variable (
n
,total
,i
)for
loop syntax is different.In C++, need to use
=
, not<-
Can also use
total += x[i]
To find the length of a vector in C++, use
.size()
IMPORTANT: in C++, vector indices start at \(0\). If your vector has \(n\) elements, the last position is at \(n-1\).
Comparing performance using mark()
from the package bench
library(bench)
x <- runif(1000)
mark(
sum(x),
sum_R(x),
sum_C(x)
)
## # A tibble: 3 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 sum(x) 600ns 700ns 1373136. 0B 0
## 2 sum_R(x) 13.3µs 16.6µs 57293. 23.24KB 0
## 3 sum_C(x) 800ns 1µs 669550. 2.49KB 0
You can see sum_R
is much slower than the built-in sum
(which is highly optimized).
sum_C
is much faster than sum_R
21.4 Vector Input, Vector Output
Consider a function that computes \[\begin{equation*} \sqrt{(y_i - x)^2}, \quad i=1,\ldots,n. \end{equation*}\]
# vectorized R function
pdist_R <- function(x, y) {
sqrt((y - x) ^ 2)
}
# R function with for loop
pdist_R_for <- function(x, y) {
n <- length(y)
out <- rep(0, n)
for (i in 1:n) {
out[i] = sqrt((y[i] - x)^2)
}
out
}
cppFunction(
'NumericVector pdist_C(double x, NumericVector y) {
int n = y.size();
NumericVector out(n);
for(int i = 0; i < n; ++i) {
out[i] = sqrt(pow(y[i] - x, 2.0));
}
return out;
}'
)
To create a numeric vector called
out
with length \(n\), useNumericVector out(n)
C++ uses
pow()
, not^
, for exponentiation
Compare the performance:
y <- runif(1e6)
mark(
pdist_R(0.5, y),
pdist_R_for(0.5, y),
pdist_C(0.5, y)
)
## # A tibble: 3 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 pdist_R(0.5, y) 3.9ms 4.38ms 224. 7.63MB 9.34
## 2 pdist_R_for(0.5, y) 43.15ms 45.24ms 21.9 7.67MB 0
## 3 pdist_C(0.5, y) 2.43ms 2.59ms 361. 7.63MB 14.6
The C++ version is faster than the vectorized R function in this case.
In general, I recommend if you can use a vectorized R function, you do not have to write the C++ version (at least not the first part to be considered when you are optimizing your code).
21.6 Using sourceCpp
We have illustrated how to write simple C++ functions with cppFunction()
. For real problems, we have to break down our program in various pieces and we should use stand-alone C++ files.
For example, create a file called my_Rcpp.cpp
that contains the following code:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double mean_C(NumericVector x) {
int n = x.size();
double total = 0;
for(int i = 0; i < n; ++i) {
total = total + x[i];
}
return total / n;
}
In R, use sourceCpp()
to compile your my_Rcpp.cpp
file.
Compare the performance:
21.7 Example: Gibbs Sampler
R code:
gibbs_R <- function(N, thin) {
mat <- matrix(nrow = N, ncol = 2)
x <- 0
y <- 0
for (i in 1:N) {
for (j in 1:thin) {
x <- rgamma(1, 3, y * y + 4)
y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
}
mat[i, ] <- c(x, y)
}
mat
}
C++:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix gibbs_C(int N, int thin) {
NumericMatrix mat(N, 2);
double x = 0;
double y = 0;
for(int i = 0; i < N; i++) {
for(int j = 0; j < thin; j++) {
x = rgamma(1, 3, 1 / (y * y + 4))[0];
y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
}
mat(i, 0) = x;
mat(i, 1) = y;
}
return(mat);
}
Use
(
instead of[
to index into the matrix[0]
is to convert the vector output into scalar
C++ version is about 20 times faster:
21.8 More Details
- Comment in C++ is
//
.
Vector
Access an individual element of a vector using []
or ()
.
R | Rcpp |
---|---|
x <- rep(0, 5) |
NumericVector x(5); |
x <- rep(1, 5) |
NumericVector x(5, 1); |
x <- c(1, 2, 3) |
NumericVector x = {1, 2, 3}; |
x[2] |
x[1] or x(1) |
Elementwise arithmetic operations are possible for NumericVector
For example:
cppFunction('
NumericVector vector_sum(NumericVector x, NumericVector y) {
NumericVector sum = x + y;
return sum;
}
')
x <- 1:5
y <- 2:6
vector_sum(x, y)
## [1] 3 5 7 9 11
For comparison operators, we need to use logical vectors with LogicalVector
.
For example:
cppFunction('
LogicalVector vector_greater(NumericVector x, NumericVector y) {
LogicalVector comp = x > y;
return comp;
}
')
x <- 1:5
y <- 2:6
vector_greater(x, y)
## [1] FALSE FALSE FALSE FALSE FALSE
Other comparison operators: ==
, !=
, <
, >
, >=
, <=
Matrix
R | Rcpp |
---|---|
A <- matrix(0, nrow = 2, ncol = 3) |
NumericMatrix A(2, 3); |
x <- A[1, 2] |
double x = A(0, 1); |
x <- A[, 2] |
NumericVector x = A(_, 1); |
cppFunction('
List matrix_operation(NumericMatrix A) {
return List::create(Named("A") = A,
Named("row_number") = A.nrow(),
Named("col_number") = A.ncol(),
Named("transpose") = transpose(A));
}
')
A <- matrix(1:12, 3, 4)
matrix_operation(A)
## $A
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## $row_number
## [1] 3
##
## $col_number
## [1] 4
##
## $transpose
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12