Introduction
Sometimes R code just isn’t fast enough and there’s no simply way to make the code any faster.
The Rcpp
package provides a consistent API for seamlessly accessing, extending or modifying R objects at the C++ level. The API is a rewritten and extented version of an earlier API which we refer to as the ‘classic Rcpp API’.
Rcpp, R and C++
All examples in this chapter need at least version 0.10.1 of the Rcpp
package. This version includes cppFunction
and sourceCpp
, which makes it very easy to connect C++ to R. You’ll also need a working C++ compiler.
Before using any of Rccp facilities, we need at least:
1 |
require(Rcpp) |
1 |
## Loading required package: Rcpp |
Moreover, in order to perfom some comparisons, we also load:
1 |
require(rbenchmark) |
1 |
## Loading required package: rbenchmark |
1 |
require(microbenchmark) |
1 |
## Loading required package: microbenchmark |
A simple example with scalars
cppFunction
allows developer to write C++ functions in R like this:
1 2 3 4 5 6 |
cppFunction(' int add(int x, int y, int z) { int sum = x + y + z; return sum; }' ) |
Once the function is defined, its print method displays info about the function
1 |
add |
1 2 |
## function (x, y, z) ## .Primitive(".Call")(<pointer: 0x2b7921338c10>, x, y, z) |
and we can use it as any other R
function:
1 |
add(1, 2, 3) |
1 |
## [1] 6 |
Some notes about R and C++ that may be worth to recall before moving forward might be:
- In C++ we must declare the type of output the function returns.
- The classes for the most common types of R vectors are:
NumericVector
,IntegerVector
,CharacterVector
andLogicalVector
. - Scalars and vectors in C++ are different.
- The scalar equivalents of
numeric
,integer
,character
andlogical
R
vectors are, respectively,double
,int
,String
andbool
inC++
.
Brief introduction to vectors
Now let us try a slightly more complex example.
1 2 3 4 5 6 7 8 9 10 |
cppFunction(' double sumC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total; } ') |
The above function simply returns the sum of elements of a vector passed as a parameter to function sumC()
.
Note the use of NumericVector
to “say” to sumC()
that the argument x
is a vector containing floating point (double precision) numeric values.
Note also that the C++ code performs an explicit loop on all the vector elements to calculate the sum.
sumC()
function is available in .Global
environment, as all R functions created by user:
1 |
ls() |
1 |
## [1] "add" "sumC" |
And the speed of sumC()
is comparable to speed of native R sum()
function:
1 2 3 4 5 |
y=rnorm(1000000) microbenchmark( sumC(y), sum(y) ) |
1 2 3 4 |
## Unit: milliseconds ## expr min lq median uq max neval ## sumC(y) 1.002 1.012 1.036 1.057 1.198 100 ## sum(y) 1.021 1.047 1.055 1.074 1.201 100 |
Now let us try to produce a yet more complex example: a function that calculates the inner product between two equal-length vectors:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
cppFunction(' double sumP(NumericVector x, NumericVector y) { int nx = x.size(); int ny = y.size(); double sumup=0; if(nx==ny) { for(int i=0; i < nx; i++) sumup += x[i] * y[i]; } else sumup = NA_REAL; // NA_REAL: constant of NA value for numeric (double) values return sumup; }' ) |
We than compare our newly created function with two equivalent R
native approaches:
1 2 3 4 5 |
microbenchmark( sumP(rnorm(1E5),rnorm(1E5)), sum(rnorm(1E5)*rnorm(1E5)), t(rnorm(1E5)) %*% rnorm(1E5) ) |
1 2 3 4 5 |
## Unit: milliseconds ## expr min lq median uq max neval ## sumP(rnorm(1e+05), rnorm(1e+05)) 16.19 16.66 16.78 17.18 17.97 100 ## sum(rnorm(1e+05) * rnorm(1e+05)) 16.35 16.64 16.86 17.17 44.39 100 ## t(rnorm(1e+05)) %*% rnorm(1e+05) 17.22 17.51 17.72 18.24 46.14 100 |
In above example, the function sumP()
seems a just about faster than both sum()
and the linear algebra-based calculation. Moreover, some small changes to the function may reveal quite sensible advantages either in terms of memory usage, or in terms of performance:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
x <- rnorm(1E8) y <- rnorm(1E8) #Test C function mmC1=gc(reset=TRUE) tmC=system.time(sumP(x,y)) mmC2=gc() #Test standard R mmR1=gc(reset=TRUE) tmR=system.time(sum(x*y)) mmR2=gc() #Test Matrix calculation mmR3=gc(reset=TRUE) tmR1=system.time(t(x) %*% y) mmR4=gc() |
The amount of memory requested by sumP()
to perform the computations is close to 0: mmC2[2,6]
–mmC1[2,6]
=
0.1 MB, whereas sum()
needed mmR2[2,6]
–mmR1[2,6]
=
763 MB.
This because the calculation performed by sum()
is made of two steps:
- Calculate the vector containing the element by element product
- Sum the values of products
This produced another side effct: the time requested to perform the inner product alone with sumP()
is about 40% of the time requested by sum()
, since some time was needed to create the vector and the two calculations “steps” (instead of only one of sumP()
) in sum()
.
The same calculations performed by linear algebra operators return the worst performances both in terms of execution speed and in terms of memory usage.
That is, by a simple function written in C++, we gain sensibily faster and less memory hungry code.
Finally, we could object that the crossprod()
R function does the same computations performed by sumP()
. This is true, but as an example, this could be equally useful.
Now, let’s try a new problem: we need a function that “generates” (“simulates”) an n
-length realization from an AR(1) process with parameter phi
, starting from start
and having a white noise error with standard deviation equal to sigma
.
In this case the return value is not a scalar but a vector. However, the solution is simple:
1 2 3 4 5 6 7 8 9 10 11 12 |
cppFunction(' NumericVector arC(double start, int n, double phi, double sigma) { RNGScope scope; NumericVector out(n); out[0]=start; for(int i = 1; i < n; i++) { out[i] = phi*out[i-1]+rnorm(1,0,sigma)[0]; } return out; } ') |
The arC()
function will return a n
-length vector directly manageable by R.
Notice in above code the use of rnorm()
function, that behave exactly as the R rnorm()
function.
In Rcpp sugar (an abstraction layer of Rcpp that provide some facilities to the developer) all the main r/p/q/d R “vector” distribution functions are directly available to programmer.
Notice also the RNGScope scope;
line of code, which sets the random number generatator on entry to a block and resets it on exit.
Let’s try the function:
1 2 3 4 5 6 7 |
set.seed(1000) phi=.7 ts=arC(10,500,phi,2) plot(ts,type="l", main=paste("Plot of AR(1) Rcpp realization with parameter",phi)) abline(h=0,col="blue",lty=2) grid() |
1 2 3 |
op=par(mfrow=c(2,1)) acf(x=ts) pacf(x=ts) |
1 |
par(op) |
Obviously, to avoid many calls to rnorm()
in above function, a unique call like rnorm(n,0,sigma)
can be made, but this requires more RAM:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
cppFunction(' NumericVector arC2(double start, int n, double phi, double sigma) { RNGScope scope; NumericVector out(n); NumericVector rn=rnorm(n,0,sigma); out[0]=start; for(int i = 1; i < n; i++) { out[i] = phi*out[i-1]+rn[i]; } return out; } ') |
Assuming, we want to write a similar function in R
, we must write more complicated or much slower code. The simplest, and perhaps slowest, solution is:
1 2 3 4 5 6 7 8 9 |
arR=function(start, n , phi, sigma){ out=numeric(n) out[1]=start for( i in 2:n){ out[i]=phi*out[i-1]+rnorm(1,sd=sigma) } return(out) } |
And this is the plot:
1 2 3 4 5 6 7 |
set.seed(1000) phi=.7 tsR=arR(10,500,phi,2) plot(ts,type="l", main=paste("Plot of AR(1) R realization with parameter",phi)) abline(h=0,col="blue",lty=2) grid() |
Note that, given the assigned seed, the Rcpp and R functions produce exactly the same numeric output. That means that the compiled C++ code produced by Rcpp is totally integrated with R libraries:
1 |
sum(abs(ts-tsR)) |
1 |
## [1] 0 |
A slightly better R implementation of the same function could be:
1 2 3 4 5 6 7 8 9 10 |
arR2=function(start, n , phi, sigma){ out=numeric(n) out[1]=start err=rnorm(n,sd=sigma) for( i in 2:n){ out[i]=phi*out[i-1]+err[i] } return(out) } |
The four above functions give the following performance results:
1 2 3 4 5 6 |
(mb=microbenchmark( arC(10,500,phi,2), arC2(10,500,phi,2), arR(10,500,phi,2), arR2(10,500,phi,2) )) |
1 2 3 4 5 6 |
## Unit: microseconds ## expr min lq median uq max neval ## arC(10, 500, phi, 2) 50.39 54.11 56.86 59.74 71.03 100 ## arC2(10, 500, phi, 2) 33.12 35.13 36.51 38.93 59.90 100 ## arR(10, 500, phi, 2) 1940.79 2002.79 2029.68 2060.74 3306.61 100 ## arR2(10, 500, phi, 2) 625.46 651.32 665.47 694.64 2106.98 100 |
with average performances from 57 to 13 times better for C++ code.
Using sourceCpp
To simplify the initial presentation, the examples in this chapter have used inline C++ via cppFunction()
. For real problems, it’s usually easier to use standalone C++ files and then source them into R using the sourceCpp()
function.
This will enable us to take advantage of text editor support for C++ files (e.g. syntax highlighting) as well as make it easier to identify the line numbers of compilation errors.
Standalone C++ files can also contain embedded R code in special C++ comment blocks. This is really convenient if you want to run some R test code.
The standalone C++ file should have extension .cpp, and needs to start with:
1 2 |
#include <Rcpp.h> using namespace Rcpp; |
And for each function that we want available within R
, we need to prefix it with:
1 |
// [[Rcpp::export]] |
where the space after //
is mandatory
Then using sourceCpp("path/to/file.cpp")
will compile the C++ code, create the matching R functions and add them to current session.
Note that these functions, and the ones created using cppFunction()
, will not persist across save()
and load()
, such as when you restore your workspace.
Let’s try to copy/paste following code into a cppExample1.cpp
file in current directory, and compile/run the script as below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double meanC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total / n; } /*** R library(microbenchmark) x <- runif(1e5) microbenchmark( mean(x), meanC(x)) */ |
Note the R code block embedded between /*** R
and */
.
This code will be executed after the compilation to test the functions.
1 |
sourceCpp("cppExample1.cpp") |
1 2 3 4 5 6 7 8 9 10 |
## ## > library(microbenchmark) ## ## > x <- runif(1e+05) ## ## > microbenchmark(mean(x), meanC(x)) ## Unit: microseconds ## expr min lq median uq max neval ## mean(x) 189.34 190.27 195.52 197.23 254.4 100 ## meanC(x) 93.32 93.98 96.38 97.18 119.4 100 |
And now meanC()
is available in current workspace:
1 |
"meanC" %in% ls() |
1 |
## [1] TRUE |
1 |
meanC |
1 2 |
## function (x) ## .Primitive(".Call")(<pointer: 0x2b79220f2c50>, x) |
Matrix computations example
Now we want to write a very simple function that calculates a correlation matrix from a data matrix.
For sake of simplicity, we suppose that the input matrix is numeric and that it does not contain NAs.
In Rcpp, the data types for integer, double, logical, and character matrices are IntegerMatrix
, NumericMatrix
, LogicalMatrix
, CharacterMatrix
, respectively.
Of course, the return value of our example function shall be an object of NumericMatrix
type.
1 2 3 4 5 6 7 8 9 10 11 12 |
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericMatrix corrC(NumericMatrix X) { Environment stats("package:stats"); Function corr=stats["cor"]; NumericMatrix out=corr(X); return out; } |
The above code is saved in cppExample2.cpp
source C++ file.
Notice that the R cor()
function of stats
package has been used within the code. To use the function, an object of class Environment
“containing” the stats
package has been created, and then from this object a class Function
object named corr
has been created, where the cor()
function of stats
package has been assigned.
With these two simple lines of code, the R cor()
function has became available to C++ code.
Now we can launch the compilation of code:
1 |
sourceCpp("cppExample2.cpp") |
And now we can compare the performances of R and Rcpp implementations:
1 2 3 4 5 |
X=matrix(rnorm(100000),nrow=1000,ncol=100) microbenchmark( corrC(X), as.matrix(cor(X)) ) |
1 2 3 4 |
## Unit: milliseconds ## expr min lq median uq max neval ## corrC(X) 5.214 5.281 5.364 5.425 5.809 100 ## as.matrix(cor(X)) 5.193 5.271 5.344 5.406 5.853 100 |
In this specific case, the differences are very small, since the two codes actually implement the same function.
A last more complex example
The truncgof
library contains functions that allow one to calculate GoF statistics, and related p-values, for truncated distributions. The main drawback of these functions is that they use Montecarlo simulations to calculate p-values.
Since the estimates are often obtained by maximization of likelihood, the p-value calculation become a very time-consuming exercize, since for left-truncated distributions, as in this case, the maximum of likelihood is seldom obtainable in closed form, and then it requires a numeric optimization as provided by optim()
or similar functions.
We would then like to speed-up the optimization process, and then the p-value calculation too.
Let us try an example: suppose that we have to estimate via MLE the parameters of a left-truncated lognormal distribution. The left truncated distribution function is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
dtrlnorm=function(x,pars,L=0,log=FALSE){ if(L<=0){ L=0 } if(log){ ret=log(x>=L) + dlnorm(x,meanlog=pars[1],sdlog=pars[2],log=log)- log(1-plnorm(L,meanlog=pars[1],sdlog=pars[2])) } else{ ret=(x>=L)* dlnorm(x,meanlog=pars[1],sdlog=pars[2],log=log)/ (1-plnorm(L,meanlog=pars[1],sdlog=pars[2])) } return(ret) } |
and then the neg-log-likelihood:
1 2 3 4 5 |
nlglik=function(pars,x,L){ return( -sum(dtrlnorm(x=x,pars=pars,L=L,log=TRUE)) ) } |
The Rcpp code that calculates the truncated distribution and log-likelihood is below, and is saved as cppExample3.cpp
file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector dtrlnormC(NumericVector x, NumericVector pars, double L, int lg) { NumericVector out; if(lg==1){ double tmp=log(1-R::plnorm(L,pars[0],pars[1],1,0)); out= ifelse(x>=L,dlnorm(x,pars[0], pars[1],lg)-tmp,-INFINITY); } else{ double tmp=1-R::plnorm(L,pars[0],pars[1],1,0); out= ifelse(x>=L,dlnorm(x,pars[0], pars[1],lg)/tmp,0); } return out; } // [[Rcpp::export]] NumericVector dtrlnormC1(NumericVector x, NumericVector pars, double L, int lg) { int n=x.size(); NumericVector tmp(n); NumericVector out(n); tmp=dlnorm(x,pars[0], pars[1],lg); if(lg==1){ double cnst=log(1-R::plnorm(L,pars[0],pars[1],1,0)); for(int i=0;i<n;i++){ if(x[i]>=L){ out[i]=tmp[i]-cnst; } else{ out[i]=-INFINITY; } } } else{ double cnst=1-R::plnorm(L,pars[0],pars[1],1,0); for(int i=0;i<n;i++){ if(x[i]>=L){ out[i]=tmp[i]/cnst; } else{ out[i]=0; } } } return out; } // [[Rcpp::export]] double nlglikC(NumericVector pars, NumericVector x, double L){ return -sum(dtrlnormC(x,pars, L,1)); } // [[Rcpp::export]] double nlglikC1(NumericVector pars, NumericVector x, double L){ return -sum(dtrlnormC1(x,pars, L,1)); } |
Note that two implementations of the same dtrlnorm*
procedure have been reported: the first is slightly less performant, but more “concise”, since it uses some facilities of Rcpp Sugar that allows the developer to use vector functions (see the ifelse()
function). The only difference between the two nlglik*
functions, instead, is on the call to dtrlnormC
or dtrlnormC1
.
Now let’s compile the above code:
1 |
sourceCpp('cppExample3.cpp') |
Now, suppose that a sample from a left truncated lognormal distribution has been produced:
1 2 3 4 |
set.seed(1000) dt=rlnorm(n=1000,meanlog=9,sdlog=2) L=4000 dt=dt[dt>L] |
We can produce some comparison bechmarks between R and C++ implementations
1 2 3 4 5 6 7 8 9 10 11 |
# Start values for optimization process: init=c(mean(log(dt)),sd(log(dt))) microbenchmark( dtrlnorm(init,x=dt,L=L,log=FALSE), dtrlnormC(init,x=dt,L=L,lg=FALSE), dtrlnormC1(init,x=dt,L=L,lg=FALSE), dtrlnorm(init,x=dt,L=L,log=TRUE), dtrlnormC(init,x=dt,L=L,lg=TRUE), dtrlnormC1(init,x=dt,L=L,lg=TRUE) ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
## Unit: microseconds ## expr min lq median uq ## dtrlnorm(init, x = dt, L = L, log = FALSE) 84.52 86.06 88.40 90.73 ## dtrlnormC(init, x = dt, L = L, lg = FALSE) 65.21 66.28 67.72 68.72 ## dtrlnormC1(init, x = dt, L = L, lg = FALSE) 57.78 59.21 60.58 61.82 ## dtrlnorm(init, x = dt, L = L, log = TRUE) 98.00 100.92 102.91 105.77 ## dtrlnormC(init, x = dt, L = L, lg = TRUE) 61.53 63.08 64.52 65.60 ## dtrlnormC1(init, x = dt, L = L, lg = TRUE) 54.25 56.06 57.28 58.87 ## max neval ## 228.91 100 ## 97.59 100 ## 77.67 100 ## 121.96 100 ## 93.06 100 ## 71.54 100 |
1 2 3 4 5 |
microbenchmark( nlglik(init,x=dt,L=L), nlglikC(init,x=dt,L=L), nlglikC1(init,x=dt,L=L) ) |
1 2 3 4 5 |
## Unit: microseconds ## expr min lq median uq max neval ## nlglik(init, x = dt, L = L) 100.62 103.30 106.29 111.64 196.46 100 ## nlglikC(init, x = dt, L = L) 61.57 62.81 64.42 66.17 91.51 100 ## nlglikC1(init, x = dt, L = L) 54.58 56.30 57.27 58.77 76.16 100 |
Note that, at the best, the C++ implementation requires about one-half of time required by the “pure R” code.
An estimate of lognormal parameters may be obtained via maximum likelihood. Below an implementation of a function that performs such as calculation using optim()
:
1 2 3 4 5 |
lnMLE=function(x,nllik,L=0,method="Nelder-Mead", init=c(mean(log(x)),sd(log(x)))){ invisible( optim(par=init,fn=nllik,method=method,x=x,L=L)) } |
Note that the function takes as a parameter the specific instance of neg-log-likelihood function to be used, and that the C++ compiled Rcpp functions can be passed to optim()
like all standard R functions.
And now the estimates:
1 2 3 4 5 |
ests=lnMLE(x=dt,nllik=nlglik,L=L) estsC=lnMLE(x=dt,nllik=nlglikC,L=L) estsC1=lnMLE(x=dt,nllik=nlglikC1,L=L) ests |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
## $par ## [1] 9.418 1.718 ## ## $value ## [1] 7373 ## ## $counts ## function gradient ## 51 NA ## ## $convergence ## [1] 0 ## ## $message ## NULL |
1 |
ests$par-estsC$par |
1 |
## [1] 0 0 |
1 |
ests$par-estsC1$par |
1 |
## [1] 0 0 |
1 |
estsC$par-estsC1$par |
1 |
## [1] 0 0 |
1 |
ests$value-estsC$value |
1 |
## [1] -5.457e-12 |
1 |
ests$value-estsC1$value |
1 |
## [1] -5.457e-12 |
1 |
estsC$value-estsC1$value |
1 |
## [1] 0 |
Only very small differences appear between the different implementations.
Now a last benchmark on MLE procedures:
1 2 3 4 5 |
microbenchmark( lnMLE(x=dt,nllik=nlglik,L=L,init=init), lnMLE(x=dt,nllik=nlglikC,L=L,init=init), lnMLE(x=dt,nllik=nlglikC1,L=L,init=init),times=1000 ) |
1 2 3 4 5 6 7 8 9 |
## Unit: milliseconds ## expr min lq median ## lnMLE(x = dt, nllik = nlglik, L = L, init = init) 5.156 5.284 5.347 ## lnMLE(x = dt, nllik = nlglikC, L = L, init = init) 3.230 3.324 3.354 ## lnMLE(x = dt, nllik = nlglikC1, L = L, init = init) 2.905 2.985 3.016 ## uq max neval ## 5.402 8.243 1000 ## 3.389 6.859 1000 ## 3.054 5.200 1000 |
The C++ implementation requires about one half of time required by the R code.
And now, we will use the truncgof
package to apply a right-tailed AD2 test on simulated data (see the ad2up.test()
help for info about the function):
1 |
require(truncgof) |
1 2 3 4 5 6 7 8 |
## Loading required package: truncgof ## Loading required package: MASS ## ## Attaching package: 'truncgof' ## ## The following object is masked from 'package:stats': ## ## ks.test |
1 2 3 4 |
set.seed(100000) system.time(print(ad2up.test(x=dt,distn="plnorm",fit=ests$par,H=L, sim=1000,tol=0, estfun="as.list(lnMLE(x=dt, nllik=nlglik, L = H)$par)"))) |
1 2 3 4 5 6 7 |
## ## Quadratic Class Anderson-Darling Upper Tail Test ## ## data: dt ## AD2up = 5.373, p-value = 0.58 ## ## treshold = 4000, simulations: 1000 |
1 2 |
## user system elapsed ## 5.804 0.008 5.817 |
1 2 3 4 |
set.seed(100000) system.time(print(ad2up.test(x=dt,distn="plnorm",fit=estsC$par,H=L, sim=1000,tol=0, estfun="as.list(lnMLE(x=dt, nllik=nlglikC, L = H)$par)"))) |
1 2 3 4 5 6 7 |
## ## Quadratic Class Anderson-Darling Upper Tail Test ## ## data: dt ## AD2up = 5.373, p-value = 0.58 ## ## treshold = 4000, simulations: 1000 |
1 2 |
## user system elapsed ## 3.910 0.006 3.919 |
1 2 3 4 |
set.seed(100000) system.time(print(ad2up.test(x=dt,distn="plnorm",fit=estsC1$par,H=L, sim=1000,tol=0, estfun="as.list(lnMLE(x=dt, nllik=nlglikC1, L = H)$par)"))) |
1 2 3 4 5 6 7 |
## ## Quadratic Class Anderson-Darling Upper Tail Test ## ## data: dt ## AD2up = 5.373, p-value = 0.58 ## ## treshold = 4000, simulations: 1000 |
1 2 |
## user system elapsed ## 3.530 0.007 3.541 |
All the approaches produce the same results, but the best Rcpp implemetation requires a few more than one half of the “pure R” computational time.