Functions structure
When working with R we constantly use functions and, when developing, we create new ones, so functions feel like very familiar R objects. Nevertheless, understanding the theory and the rationale underlying R functions can help us write more efficient and possibly more elegant code.
We can create functions and assign them to variable names as we do with any other object:
f <- function(x, y = 0) {
  z <- x + y
  z
}
As with any other object, a function can be deleted with the usual call to rm() or remove().
Functions are objects with three basic components:
- a formal arguments list
- a body
- an environment.
Each component can be inspected with the function of the same name:
formals(f)
## $x
##
##
## $y
## [1] 0
body(f)
## {
##     z <- x + y
##     z
## }
environment(f)
## <environment: R_GlobalEnv>
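Since a function is fully described by these three components, it can also be assembled from them. The sketch below is only illustrative (add_xy() and add_xy2() are hypothetical names). It starts from an empty function and sets each component with the replacement functions formals<-, body<- and environment<-, then builds an equivalent function in one step with as.function().

```r
# Start from an empty shell and fill in the three components
add_xy <- function() NULL
formals(add_xy) <- alist(x = , y = 0)   # formal arguments (x has no default)
body(add_xy) <- quote(x + y)            # body: an unevaluated call
environment(add_xy) <- globalenv()      # enclosing environment
add_xy(1, 2)                            # 3

# as.function() takes a list whose last element becomes the body
add_xy2 <- as.function(alist(x = , y = 0, x + y))
add_xy2(1)                              # 1
```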
Formals
Formals are the formal arguments of a function, returned as an object of class pairlist. A pairlist can be thought of as something similar to a list, with one important difference:
is.null(pairlist())
## [1] TRUE
is.null(list())
## [1] FALSE
that is, a pairlist of length zero is NULL, while a list is not.
When we call a function, formal arguments can be specified by position or by name, and positional matching can be mixed with matching by name, so that the following calls are equivalent:
mean(x = 1:5, trim = 0.1)
## [1] 3
mean(1:5, trim = 0.1)
## [1] 3
mean(x = 1:5, 0.1)
## [1] 3
mean(1:5, 0.1)
## [1] 3
mean(trim = 0.1, x = 1:5)
## [1] 3
Along with position and name, we can also specify formals by partial matching, so that:
mean(1:5, tr = 0.1)
## [1] 3
mean(tr = 0.1, x = 1:5)
## [1] 3
would work anyway.
Formals may also take the form symbol = default. The default value is used whenever the argument is not explicitly supplied.
Specifically, mean() (through its default method mean.default()) also accepts a third argument, na.rm, which defaults to FALSE. As a result, passing a vector containing NA values to mean() returns NA:
mean(c(1, 2, NA))
## [1] NA
whereas by specifying na.rm = TRUE we get the mean of all non-missing elements of x:
mean(c(1, 2, NA), na.rm = TRUE)
## [1] 1.5
The order R uses for matching supplied arguments against formals is:
- Check for exact match for a named argument
- Check for a partial match
- Check for a positional match
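The three steps can be seen with a small sketch using a hypothetical function m(), whose formals were chosen only for illustration. In one call, one argument is matched exactly, one partially and the remaining one by position. A partial name that fits more than one formal is rejected before the body is evaluated.

```r
# Hypothetical function used only to show how arguments are matched
m <- function(alpha, beta, gamma) c(alpha = alpha, beta = beta, gamma = gamma)

# alpha: exact match; be -> beta: partial match; 3 -> gamma: positional match
m(3, be = 2, alpha = 1)

# "v" is a prefix of both "value" and "verbose", so matching fails
m2 <- function(value, verbose = FALSE) value
res <- try(m2(v = 1), silent = TRUE)
inherits(res, "try-error")   # TRUE
```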
Formals are normally used internally by the R evaluator, but function formals() exposes them explicitly:
formals(f)
## $x
##
##
## $y
## [1] 0
args() is another function that displays the formals in a more user-friendly way. In fact, args(fun) returns a function with the same arguments as fun but with an empty (NULL) body:
args(f)
## function (x, y = 0)
## NULL
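The two functions behave differently with primitive functions such as sum(). These are implemented in C and have no R-level formals. A short sketch:

```r
# Primitive functions have no formals, so formals() returns NULL
is.primitive(sum)
is.null(formals(sum))

# args() still builds a closure describing the arguments of sum(),
# and its formals can then be inspected as usual
names(formals(args(sum)))   # "..." and "na.rm"
```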
For programming purposes, formals() is the better choice because it returns a pairlist that can be handled as a list:
is.list(formals(mean))
## [1] TRUE
Function formals() also has a replacement method:
exists("formals<-")
## [1] TRUE
so the formals of a function can be modified, for instance with alist(). This is a list()-like function that does not evaluate its arguments, and x = with nothing after the equals sign denotes an argument without a default:
g <- function(x, y = 0) x + y
g(1)
## [1] 1
formals(g)
## $x
##
##
## $y
## [1] 0
formals(g) <- alist(x = , y = 1)
g(1)
## [1] 2
As a practical use of formals(), we can redefine mean() so that na.rm defaults to TRUE:
formals(mean.default)$na.rm <- TRUE
mean(c(1, 2, NA))
## [1] 1.5
We now have a modified copy of mean.default() in our globalenv. Setting inherits = FALSE ensures that only the global environment itself is searched:
exists("mean.default", envir = globalenv(), inherits = FALSE)
## [1] TRUE
Removing this copy with rm(mean.default) restores the original behaviour. Finally, note that:
environment(mean.default)
## <environment: namespace:base>
remains the namespace of package base, which is the environment where the function was created.
The "..." argument of a function is a special argument that can capture any number of additional arguments, named or not. In the formals list, "..." simply appears as one more element, with no default:
h <- function(x, ...) {0}
formals(h)
## $x
##
##
## $...
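Inside the function body, the content of "..." can be inspected without knowing its size in advance. The sketch below uses ...length(), which counts the arguments captured by "..." and has been available since R 3.5.0. It uses names(list(...)) to recover their names. describe_dots() is a hypothetical helper.

```r
# Hypothetical helper that reports what was captured by "..."
describe_dots <- function(...) {
  list(
    n = ...length(),          # number of arguments in "..."
    names = names(list(...))  # their names ("" when unnamed)
  )
}
res <- describe_dots(a = 1, 2)
res$n       # 2
res$names   # "a" ""
```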
The "..." argument is useful when the number of arguments is not known in advance. For example, a function that counts the rows of any number of data frames could be written as:
count_rows <- function(...) {
  dfs <- list(...)
  lapply(dfs, nrow)
}
count_rows(airquality, cars)
## [[1]]
## [1] 153
##
## [[2]]
## [1] 50
The "..." argument is also very handy when its content has to be passed on to another function, as often happens when calling plot() from within another function. The following example shows a basic plotting function for depth profiles, where additional graphics parameters are passed via "...":
time <- 1:13
depth <- c(0, 9, 18, 21, 21, 21, 21, 18, 9, 3, 3, 3, 0)
plot_depth <- function(time, depth, type = "l", ...) {
  plot(time, -depth, type = type, ylab = deparse(substitute(depth)), ...)
}
par(mfrow = c(1, 2))
plot_depth(time, depth, lty = 2)
plot_depth(time, depth, lwd = 4, col = "red")
Body of a function
The body of a function is a parsed R statement. In practice, this means the body must be syntactically correct, but it is not evaluated until the function is called.
As a result, this definition fails with a syntax error:
wrong <- function(x) {x =}
because its body is not a valid R statement.
This function, instead:
right <- function(x) {x + y}
is accepted by R because it is syntactically correct. Calling it, however, returns an error unless a variable y can be found along its environments:
right(x = 2)
## Error: object 'y' not found
The body of a function is usually a collection of statements in braces, but it can also be a single statement, a symbol or even a constant.
When the body is wrapped in braces, it is a call object (of mode "call") whose class is reported as "{":
f <- function(x) {x + 1}
class(body(f))
## [1] "{"
As a call object, the body of a function can be manipulated as a list:
as.list(body(f))
## [[1]]
## `{`
##
## [[2]]
## x + 1
Since body() has a replacement method, body<-(), the body of a function can also be modified easily:
body(f)[[2]][[1]] <- `-`
f(1)
## [1] 0
This technique can be used to test small changes to a function on the fly without rewriting its whole body.
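As mentioned above, a body does not need to be wrapped in braces. The following sketch, with illustrative function names, shows how the class of the body changes for a single call, a symbol and a constant. It also shows how the whole body can be replaced with an expression built by quote().

```r
one_call <- function(x) x + 1   # a single call
a_symbol <- function(x) x       # a symbol
a_const  <- function() 42       # a constant

class(body(one_call))   # "call"
class(body(a_symbol))   # "name"
class(body(a_const))    # "numeric"

# Replace the whole body with an unevaluated expression
body(one_call) <- quote(x * 10)
one_call(2)             # 20
```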
Environment of a function
The environment of a function is the environment that was active when the function was created. For user-defined functions this is generally the global environment:
f <- function(x) {x + 1}
environment(f)
## <environment: R_GlobalEnv>
When a function is defined within a package, it is the namespace of that package:
environment(mean)
## <environment: namespace:base>
The environment of a function is a structural component of the function and belongs to the function itself.
As an example, we can define a function f() that simply returns zero:
f <- function() 0
The environment of f() is the globalenv():
environment(f)
## <environment: R_GlobalEnv>
We can modify the environment of a function and assign a newly created environment to f():
env <- new.env()
environment(f) <- env
environment(f)
## <environment: 0x3ae6070>
If we now delete env:
rm(env)
f() keeps working:
f()
## [1] 0
This happens because env is only a name bound to the environment. rm(env) removes the name, but the environment itself survives as long as f() keeps a reference to it.
As an example, consider a function defined in a dedicated environment together with some other objects:
env <- new.env()
with(env, {
  y <- 1
  g <- function(x) {x + y}
})
with(env, g(1))
## [1] 2
g() knows that x = 1 because it was passed as an argument. It also finds y = 1 because y belongs to env, which is the environment of g().
The same behaviour often occurs when we develop R functions, and it can lead to errors when calling them. Suppose we write:
y <- 1
g <- function(x) {x + y}
g(2)
## [1] 3
The above example works because the environment of g() is now the global environment, where y is defined. But as soon as we do:
rm(y)
g() stops working, because object y no longer exists in the global environment:
g(1)
## Error: object 'y' not found
Consider also this odd function:
f <- function() x
It works only if it finds a variable x in its chain of enclosing environments. If we define:
env <- new.env()
env$x <- 0
environment(f) <- env
then f() returns zero, because it finds x within its environment:
f()
## [1] 0
If we now delete env:
rm(env)
f() keeps working:
f()
## [1] 0
because the environment, together with x, is still referenced by f() itself.
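Because a function keeps a reference to its environment, a function created inside another function can retain state between calls. This is the usual closure, or "function factory", pattern. make_counter() below is a hypothetical example. The <<- operator assigns to count in the enclosing environment instead of creating a local copy.

```r
make_counter <- function() {
  count <- 0            # lives in the environment of this call of make_counter()
  function() {
    count <<- count + 1 # update count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2

# The state is visible through the environment of the returned function
environment(counter)$count   # 2
```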
Along with the environment where the function was created, functions usually interact with at least two more environments:
- The evaluation environment
- The calling environment
The evaluation environment is created every time the function is called. Within this environment, the formal arguments of the function are matched with the supplied arguments and the body of the function is evaluated.
Like any other environment, the evaluation environment has a parent: the environment of the function. In other words, the function environment is the enclosure, or parent, of the evaluation environment.
As a proof of concept, we can write a simple function that returns its evaluation environment, and then retrieve the symbols bound within it:
f <- function(x) {
  env <- environment()
  env
}
env_f <- f(x = 0)
get("x", envir = env_f)
## [1] 0
Object x is bound to the evaluation environment of f().
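Two consequences of these rules can be checked directly: a fresh evaluation environment is created at each call, and its parent is the environment of the function. A minimal sketch with illustrative names:

```r
new_frame <- function() environment()           # returns its evaluation env
identical(new_frame(), new_frame())             # FALSE: a new env per call

enclosure_of_frame <- function() parent.env(environment())
identical(enclosure_of_frame(), environment(enclosure_of_frame))  # TRUE
```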
The calling environment is the environment the function is called from. When using R interactively, this is usually the global environment, but not always.
When a function is called, the free variables in its body are looked up first in the evaluation environment and then in its enclosure, which for user-defined functions is usually the global environment. If a variable is not found there, R keeps searching along the chain of parent environments until it reaches the empty environment. This process does not take the calling environment into account.
When using R interactively, the environment of a function and its calling environment often coincide: functions are defined in the global environment and called from the same environment.
To better understand the difference between the environment of a function and its calling environment, consider a new environment whose enclosure is forced to be the base environment, with a function f() defined in it:
env <- new.env(parent = baseenv())
with(env, f <- function(x) {is.function(x)})
Function f() takes a single argument and returns TRUE if it is a function, FALSE otherwise.
If we call this function with argument x = c:
with(env, f(c))
## [1] TRUE
f() returns TRUE because the symbol c resolves to function c() from the base environment.
If we define an object c within environment env:
with(env, c <- 0)
and call f() again:
with(env, f(x = c))
## [1] FALSE
f() now returns FALSE. The symbol c resolves to the variable c within env, so the search never reaches function c() in the base environment.
If we now remove c from env:
remove(c, envir = env)
and redefine c within our global environment:
c <- 0
then calling f(x = c):
with(env, f(x = c))
## [1] TRUE
returns TRUE despite the c <- 0 assignment in the global environment.
This is because with() evaluates f(x = c) inside env. The argument c is therefore looked up starting from env and, if necessary, along its parent environments. In this case those environments do not include the globalenv.
R provides at least two useful functions to deal with the environments of a function:
- parent.env()
- parent.frame()
Called within a function as parent.env(environment()), parent.env() returns the environment in which the function was defined. parent.frame(n = 1) identifies the environment from which the function was invoked.
To illustrate these concepts, we can define:
env_of_fun <- function() {
  evaluated_in <- environment()
  defined_in <- parent.env(evaluated_in)
  called_from <- parent.frame(n = 1)
  c(evaluated_in = evaluated_in, defined_in = defined_in, called_from = called_from)
}
env_of_fun()
## $evaluated_in
## <environment: 0x2826418>
##
## $defined_in
## <environment: R_GlobalEnv>
##
## $called_from
## <environment: R_GlobalEnv>
This function was defined in the global environment and called from the global environment.
Suppose we now define a new environment env and move env_of_fun() into it:
env <- new.env()
env$env_of_fun <- env_of_fun
rm(env_of_fun)
When we now call env_of_fun():
with(env, env_of_fun())
## $evaluated_in
## <environment: 0x38f7308>
##
## $defined_in
## <environment: R_GlobalEnv>
##
## $called_from
## <environment: 0x2ca1fb8>
the calling environment is now different from the definition environment.
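A function needs parent.frame() when it must create or modify objects in the environment it is called from, rather than in its own evaluation environment. A short sketch, where set_in_caller() and outer_fun() are hypothetical names:

```r
# Create a variable in the environment from which set_in_caller() is called
set_in_caller <- function(name, value) {
  assign(name, value, envir = parent.frame())
}

outer_fun <- function() {
  set_in_caller("z", 42)  # z is created in outer_fun()'s evaluation env
  z
}
outer_fun()                # 42
```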
Understanding this idea can help improve clarity and avoid annoying conflicts.
As an example, we can define function f() within a newly created environment env and use function parent.frame() inside it. Here substitute(x) captures the unevaluated expression passed as x, so that eval() decides where that expression is looked up:
rm(list = ls())
env <- new.env(parent = baseenv())
with(env, f <- function(x) {
  x <- eval(substitute(x), envir = parent.frame(n = 1))
  is.function(x)
})
Observe that:
env$f(c)
## [1] TRUE
c <- 1
env$f(c)
## [1] FALSE
with(env, f(c))
## [1] TRUE
Function parent.frame() forced f() to look for c first inside the calling environment rather than inside the creation environment env or its parent. The calling environment is the global environment for env$f(c) and env itself for with(env, f(c)).
Similarly, to avoid conflicts between objects passed as arguments to a function and objects stored in other environments, such as the global environment, we could define f() within env as:
env <- new.env(parent = baseenv())
with(env, f <- function(x) eval(substitute(x), parent.env(environment())))
In this case, whenever we call f(), it looks for the value of x in env or its parent, the base environment.
For example, if we call:
with(env, f(x = pi))
## [1] 3.142
pi <- 0
with(env, f(x = pi))
## [1] 3.142
f(x = pi) always returns the correct value of pi.
Example: Remove all objects from the workspace
As an example of using the environment of a function, we can consider several strategies for writing a function that removes all objects from the globalenv. We can start with a simple function:
clear = function(env = globalenv()) {
  obj = ls(envir = env)
  rm(list = obj, envir = env)
}
Function clear() removes all objects from a specified environment and seems to work correctly:
x <- 1; y <- 2; z <- 3
ls()
## [1] "c" "clear" "env" "pi" "x" "y" "z"
clear()
ls()
## character(0)
The drawback of this solution should now be obvious: clear() also deletes itself and therefore cannot be reused without redefining it:
a <- 2
clear()
## Error: could not find function "clear"
The function can be improved so that clear() is kept when all other objects are deleted:
clear <- function(env = globalenv()) {
  objects <- objects(env)
  objects <- objects[objects != "clear"]
  rm(list = objects, envir = env)
  invisible(NULL)
}
Now the function can be used more than once:
a <- b <- c <- 0
clear()
a <- b <- c <- 1
clear()
Unfortunately, this version also has a drawback: it stops working when it is assigned to a different name:
clean <- clear
rm(clear)
a <- b <- c <- 0
clean()
As defined above, clean() also removes itself, because only an object named clear is preserved:
a <- 3
clean()
## Error: could not find function "clean"
To preserve the function under whatever name it is called with, we can modify clear() as follows:
clear <- function(env = globalenv()) {
  fname <- as.character(match.call()[[1]])
  objects <- objects(env)
  objects <- objects[objects != fname]
  rm(list = objects, envir = env)
  invisible(NULL)
}
A smarter way to obtain the same result is the following:
assign("clean", function(env = globalenv()) {
  rm(list = ls(envir = env), envir = env)
}, envir = attach(NULL, name = "myenv", pos = 2))
Through assign(), function clean() is created in a new environment called myenv, which is attached to the search path. All objects in the global environment can then be removed without deleting clean():
a <- b <- c <- 0
ls()
## [1] "a" "b" "c" "clear"
clean()
ls()
## character(0)
Because myenv was attached at position 2, search() lists it right after .GlobalEnv, and that is where clean() is found.
Return Value
A function returns the value of the last expression it evaluates, and that value is available for assignment. A function can return only a single object, but in practice this is not a limitation, because that object can be a list containing any number of objects.
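A sketch of both ideas, using a hypothetical summary_stats() function. Several results are bundled in a named list, and return() is used to exit early (here, for an empty input) instead of relying on the last evaluated expression.

```r
summary_stats <- function(x) {
  if (length(x) == 0) return(NULL)   # early exit: nothing to summarise
  # the last expression evaluated is the return value
  list(mean = mean(x), sd = sd(x), n = length(x))
}

res <- summary_stats(c(2, 4, 6))
res$mean    # 4
res$sd      # 2
is.null(summary_stats(numeric(0)))  # TRUE
```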
Objects can be returned visibly or invisibly. This choice has no effect on assignment, but it determines whether the result is printed when the function is called at the console:
g <- function(n) {
  out <- runif(n)
  cat(head(out))
  invisible(out)
}
x <- g(10^5)
## 0.3526 0.3616 0.3935 0.1849 0.8523 0.5663
length(x)
## [1] 100000
Sometimes we want a function that performs a task but returns nothing useful. In this case the usual practice is to return NULL invisibly.
For example, a function that prints a message with cat() can be written as:
msg <- function(x) {
  cat(x, "\n")
  invisible(NULL)
}
and used as:
msg("test message")
## test message
with neither an assignment nor a printed return value.
Operators
Operators in R are simply functions. Specifically, they are infix functions, as opposed to standard prefix functions, whose name comes before their arguments. User-defined infix operators are created like any other function, with the only constraint that their name must start and end with %. A simple operator that concatenates strings can be defined as:
"%+%" = function(x, y) {paste(x, y, sep = "")}
"we " %+% "love " %+% "R !"
## [1] "we love R !"
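Since operators are ordinary functions, built-in ones such as + and * can be called in prefix form by quoting their name with backticks. They can also be passed to higher-order functions like any other function. A short illustration:

```r
`+`(2, 3)                  # same as 2 + 3
sapply(1:3, `*`, 10)       # multiplies each element by 10
`%+%` <- function(x, y) paste(x, y, sep = "")
`%+%`("we ", "love R")     # a user-defined operator in prefix form
```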
A more elaborate approach relies on R's object-oriented capabilities and on the fact that + is a generic function:
methods(`+`)
## [1] +.Date +.POSIXt
As a result, different methods for the generic function + can be defined for different classes of objects.
As an example, we can define a class of objects named string:
string <- function(x) {
  s <- as.character(x)
  class(s) <- "string"
  s
}
with a + method that concatenates strings:
`+.string` <- function(s1, s2) paste(s1, s2, sep = "")
The result is:
a <- string("Mickey")
b <- string("Mouse")
a + b
## [1] "MickeyMouse"
Lazy evaluation
With few exceptions, function arguments are lazy by default. They are not evaluated when the function is called, but only when their value is actually needed.
Take as an example this simple function, where the y argument is never used within the function body:
rm(list = ls())
f = function(x, y) {
  x + 1
}
We can call f() and pass a non-existing object z to argument y. Evaluating z on its own would result in an error, because z is not defined, but passing it as a function argument works:
f(x = 0, y = z)
## [1] 1
z is passed to y and z does not exist, yet R does not return any error. This is because y is never evaluated within the function body.
As a second example, consider this basic function that simply prints its arguments:
h <- function(a, b) {
  cat("a is:", a, "\n")
  cat("b is:", b, "\n")
  invisible(NULL)
}
If we call h() without passing any value to b, we see that:
h(a = "we love R")
## a is: we love R
## Error: argument "b" is missing, with no default
h() returns an error only when the evaluation of b is required. Up to that point, the function works perfectly.
Therefore, when a function returns an error about a missing argument before that argument is evaluated, a control mechanism has usually been programmed within the function body:
g <- function(x, y) {
  call <- match.call()
  args <- match(c("x", "y"), names(call))
  if (any(is.na(args))) stop("All args must be provided!")
  pi
}
g(y = 1)
## Error: All args must be provided!
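Base R also offers missing(), which tests whether a formal argument was supplied in the call without evaluating it. The sketch below performs the same check as g() using missing(). g2() is an illustrative name.

```r
g2 <- function(x, y) {
  # missing() inspects the call without forcing the promises
  if (missing(x) || missing(y)) stop("All args must be provided!")
  pi
}

res <- try(g2(y = 1), silent = TRUE)
inherits(res, "try-error")   # TRUE
g2(x = 1, y = 1)             # returns pi
```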
More formally, an unevaluated argument is called a promise. A promise is an object made of three slots:
- a value
- an expression
- an environment
When a function is called, each argument is associated with a promise object. The promise stores the expression supplied for that argument and a pointer to the environment where the expression will eventually be evaluated and its value assigned to the argument symbol.
Evaluation of an argument is required when:
- Interfacing with a foreign language
- Selecting a method for a generic function
- The argument needs to be assigned within a function
There is generally no way within R to check whether an object is a promise, nor is there a way to determine the environment of a promise.
Lazy evaluation permits flexible handling of missing arguments, and computations that depend on the expression for the argument rather than on its value.
The following example is a good case in point:
rescale = function(x, location = min(x), scale = max(y)) {
  y <- x - location
  y / scale
}
rescale(1:4)
## [1] 0.0000 0.3333 0.6667 1.0000
By default, this function rescales any vector to the [0, 1] range.
The default for argument scale depends on y, which does not exist when the function is called. It is defined by y <- x - location before scale is first evaluated in y / scale.
Function delayedAssign() gives direct access to the promise mechanism outside a function:
delayedAssign("promise", {x + y})
x <- 0
y <- 1
eval(promise)
## [1] 1
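Laziness can also cause surprises, because a promise that has not been forced still points to its original expression. The classic case is a function factory, sketched below with hypothetical names. force() is a base function that simply evaluates its argument. Here it fixes the value of k when the inner function is created.

```r
make_lazy <- function(k) function(x) x * k  # k is still an unevaluated promise
make_forced <- function(k) {
  force(k)                                  # evaluate the promise right now
  function(x) x * k
}

m <- 2
times_lazy <- make_lazy(m)
times_forced <- make_forced(m)
m <- 10                                     # change m before first use

times_lazy(3)     # 30: k is evaluated only now, when m is 10
times_forced(3)   # 6: k was fixed to 2 at creation time
```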
Function calls
Functions in R can be called directly or through another function such as do.call(), which also accepts a string containing the function name.
do.call()
Function do.call() takes two main arguments:
- either a function or a non-empty character string naming the function to be called
- a list of arguments to the function call
Basically:
mean(x = 1:100, trim = 0.2)
corresponds to:
do.call("mean", list(x = 1:100, trim = 0.2))
do.call(mean, list(x = 1:100, trim = 0.2))
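do.call() is most useful when the argument list is only known at run time. Any list, including one built programmatically, can be spread into separate arguments. A minimal sketch:

```r
parts <- as.list(c("a", "b", "c"))           # one list element per argument
do.call(paste, c(parts, sep = "-"))          # same as paste("a", "b", "c", sep = "-")

# A common idiom: bind a list of data frames of unknown length
dfs <- list(data.frame(v = 1), data.frame(v = 2), data.frame(v = 3))
do.call(rbind, dfs)$v                        # 1 2 3
```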
Example: Maximum Likelihood Estimates
As an example, consider a maximum likelihood estimator for normal distributions:
mle = function(theta, x) {
  ml = function(theta, x) {
    ml = dnorm(x = x, mean = theta[1], sd = theta[2])
    ml = -sum(log(ml))
  }
  optim(theta, ml, x = x)$par
}
mle(theta = c(0, 1), x = rnorm(100, 5, 2))
## [1] 4.759 1.979
We can re-implement the estimator using do.call():
mle = function(theta, x) {
  ml = function(theta, x) {
    ml = do.call(dnorm, list(x, theta[1], theta[2]))
    ml = -sum(log(ml))
  }
  optim(theta, ml, x = x)$par
}
mle(theta = c(0, 1), x = rnorm(100, 5, 2))
## [1] 5.252 2.060
The name of the distribution can then be passed as an argument to mle() and, in turn, to do.call(). This only requires a minor modification to the internal function ml():
mle = function(theta, x, dist) {
  dist = paste("d", dist, sep = "")
  ml = function(dist, theta, x) {
    ml = do.call(dist, list(x, theta[1], theta[2]))
    -sum(log(ml))
  }
  optim(theta, ml, dist = dist, x = x)$par
}
mle(dist = "norm", theta = c(0, 1), x = rnorm(10, 5, 2))
## [1] 4.758 1.644
It now works with most two-parameter distributions, provided a suitable initial theta is supplied:
mle(dist = "lnorm", theta = c(0, 1), x = rlnorm(100, 3, 1))
## [1] 3.0405 0.9058
mle(dist = "weibull", theta = c(1, 1), x = rweibull(100, 3, 1))
## [1] 3.458 1.017
This is a valuable generalization for very little programming effort.
match.call()
Function match.call() is used within functions. It returns the call to the function in which it is used, with all arguments specified by their full names:
f = function(a, b) {
  call = match.call()
  call
}
my_call = f(2, 3)
my_call
## f(a = 2, b = 3)
class(my_call)
## [1] "call"
Any call is an object of class call that can be explored as a list:
my_call_list <- as.list(my_call)
my_call_list
## [[1]]
## f
##
## $a
## [1] 2
##
## $b
## [1] 3
Call objects can also be manipulated as lists and then evaluated:
my_call$a <- 0
eval(my_call)
## f(a = 0, b = 3)
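It helps to compare match.call() with sys.call(). sys.call() returns the call exactly as it was typed, while match.call() expands positional and partially matched arguments to their full names. A short sketch with a hypothetical show_call() function:

```r
show_call <- function(alpha, beta) {
  list(as_typed = sys.call(), matched = match.call())
}
res <- show_call(1, be = 2)
res$as_typed   # show_call(1, be = 2)
res$matched    # show_call(alpha = 1, beta = 2)
```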
Example: Function anyway()
As an example, consider a function with two arguments, a and b. If both arguments are numeric, it returns their sum. Otherwise it returns the character string obtained by pasting them together with "+" (for instance "a+b"):
anyway = function(a, b) {
  call <- match.call()
  if (is.numeric(a) && is.numeric(b)) {
    call[[1]] <- as.name("sum")
  } else {
    call[[1]] <- as.name("paste")
    call$sep <- "+"
  }
  eval(call)
}
anyway(3, 6)
## [1] 9
anyway("c", 2)
## [1] "c+2"
Example: Function write.csv() revisited
As a second application of match.call(), consider write.csv(). This function is a wrapper to write.table() that forces sep = "," and dec = ".".
Such a function could easily be written as:
write.csv <- function(...) write.table(sep = ",", dec = ".", ...)
siris <- head(iris, 3)
write.csv(siris)
## "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
## "1",5.1,3.5,1.4,0.2,"setosa"
## "2",4.9,3,1.4,0.2,"setosa"
## "3",4.7,3.2,1.3,0.2,"setosa"
However, if we pass either sep or dec via the "..." argument, the function returns an error:
write.csv(siris, sep = ";")
## Error: formal argument "sep" matched by multiple actual arguments
The "..." argument may take any argument to be passed to write.table() except sep and dec. If either of these arguments is explicitly passed to the function, it has to be forced back to the desired value, "," and "." respectively.
A simplified version of write.csv() can be rewritten as:
write.csv = function(...) {
  call = match.call()
  call[[1]] = as.name("write.table")
  call$sep = ","
  call$dec = "."
  eval(call)
}
write.csv(siris, sep = ";")
## "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
## "1",5.1,3.5,1.4,0.2,"setosa"
## "2",4.9,3,1.4,0.2,"setosa"
## "3",4.7,3.2,1.3,0.2,"setosa"
Now, sep = ";" is simply ignored.
Recursive functions
A recursive function is a function that calls itself until a certain condition is met.
As an example, consider a function that takes x as an argument and keeps dividing it by 2 until the result is no longer greater than 2. This can be implemented with a simple while() loop:
one_c <- function(x) {
  while (x > 2) {
    x <- x / 2
  }
  x
}
one_c(10)
## [1] 1.25
or with function Recall(), a placeholder for the name of the function in which it is called. Recall() allows recursive functions to keep working after being renamed:
one_r <- function(x) {
  if (x > 2) {
    x <- x / 2
    x <- Recall(x)
  }
  x
}
one_r(10)
## [1] 1.25
In this case, Recall(x) is equivalent to one_r(x).
Recursion may look redundant in this simple example, but it shows how much a function can change with this single concept.
For more complex problems, recursion can indeed simplify our code.
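Recursion is a natural fit for data that are themselves recursive, such as nested lists. The sketch below uses a hypothetical list_depth() function to compute how deeply a list is nested. The base case is any non-list object.

```r
list_depth <- function(x) {
  if (!is.list(x)) return(0L)          # base case: not a list
  if (length(x) == 0) return(1L)       # an empty list still adds one level
  # one level for this list plus the deepest of its elements
  1L + max(vapply(x, list_depth, integer(1)))
}

list_depth(1)                           # 0
list_depth(list(1, 2))                  # 1
list_depth(list(1, list(2, list(3))))   # 3
```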
Example: Quicksort
Quicksort is a good example of the advantages, and possible disadvantages, of recursive functions. It is a divide-and-conquer algorithm that first splits a large list into two smaller sub-lists, the low elements and the high elements, and then recursively sorts the sub-lists.
A simple implementation is:
quick_sort_r <- function(x) {
  if (length(x) > 1) {
    base <- x[1]
    l <- Recall(x[x < base])
    m <- x[x == base]
    h <- Recall(x[x > base])
    c(l, m, h)
  } else x
}
which results in:
quick_sort_r(sample(1:10))
## [1] 1 2 3 4 5 6 7 8 9 10
A non-recursive implementation could be:
quick_sort_c <- function(x, max_lev = 1000) {
  n <- length(x)
  i <- 1
  beg <- end <- max_lev
  beg[1] <- 1
  end[1] <- n + 1
  while (i >= 1) {
    L <- beg[i]
    R <- end[i] - 1
    if (L < R) {
      piv <- x[L]
      if (i == max_lev) stop("Error: max_lev reached")
      while (L < R) {
        while (x[R] >= piv && L < R) {
          R <- R - 1
        }
        if (L < R) {
          x[L] <- x[R]
          L <- L + 1
        }
        while (x[L] <= piv && L < R) {
          L <- L + 1
        }
        if (L < R) {
          x[R] <- x[L]
          R <- R - 1
        }
      }
      x[L] <- piv
      beg[i + 1] <- L + 1
      end[i + 1] <- end[i]
      end[i] <- L
      i <- i + 1
    } else {
      i <- i - 1
    }
  }
  return(x)
}
It also works:
quick_sort_c(sample(1:10))
## [1] 1 2 3 4 5 6 7 8 9 10
but it is much less clear.
In terms of performance, the recursive version is also much faster here. This is mainly because it relies on vectorised subsetting, while the iterative version moves single elements inside R-level loops:
x <- sample(1:10^5)
system.time(quick_sort_r(sample(x)))
## user system elapsed
## 0.50 0.01 0.51
system.time(quick_sort_c(sample(x)))
## user system elapsed
## 2.910 0.052 2.967
Example: Left join
Suppose we want to implement a left join between three data frames:
df1 <- data.frame(id = 1:6, x1 = 1:6)
df2 <- data.frame(id = 2:4, x2 = 2:4)
df3 <- data.frame(id = 3:5, x1 = 3:5)
This takes two steps:
df12 <- merge(df1, df2, by = "id", all.x = TRUE)
df123 <- merge(df12, df3, by = "id", all.x = TRUE)
df123
## id x1.x x2 x1.y
## 1 1 1 NA NA
## 2 2 2 2 NA
## 3 3 3 3 3
## 4 4 4 4 4
## 5 5 5 NA 5
## 6 6 6 NA NA
If we have to repeat this task several times, especially with a variable number of data frames, we could define function left_join() as:
left_join <- function(df_list, by, all.x = TRUE) {
  df_merged <- merge(df_list[[1]], df_list[[2]], by = by, all.x = all.x)
  df_list <- df_list[-1]
  df_list[[1]] <- df_merged
  if (length(df_list) > 1) {
    df_merged <- Recall(df_list, by = by, all.x = all.x)
  }
  df_merged
}
and use it as:
left_join(list(df1, df2, df3), by = "id")
## id x1.x x2 x1.y
## 1 1 1 NA NA
## 2 2 2 2 NA
## 3 3 3 3 3
## 4 4 4 4 4
## 5 5 5 NA 5
## 6 6 6 NA NA
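The same left join can also be written without explicit recursion by folding the list with base R's Reduce(), which repeatedly applies a two-argument function to an accumulated result. This is an alternative technique, not the recursive approach described above:

```r
df1 <- data.frame(id = 1:6, x1 = 1:6)
df2 <- data.frame(id = 2:4, x2 = 2:4)
df3 <- data.frame(id = 3:5, x1 = 3:5)

# merge(merge(df1, df2), df3), keeping all rows of the left data frame
joined <- Reduce(function(left, right) merge(left, right, by = "id", all.x = TRUE),
                 list(df1, df2, df3))
joined
```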
Replacement functions
For some functions f(), R allows expressions like f(x) <- y. For example, given a data.frame:
df <- data.frame(x = 1:3, y = 3:1)
we can query the names of its variables with:
names(df)
## [1] "x" "y"
To replace the variable names, we often use:
names(df) <- c("xx", "yy")
names(df)
## [1] "xx" "yy"
This is possible because a function `names<-`() exists. It is known as the replacement method for names():
get("names<-")
## function (x, value) .Primitive("names<-")
In principle, any replacement function takes the general form "f<-"(x, value), where value is the replacement argument.
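R evaluates names(df) <- value roughly as df <- `names<-`(df, value = value). In other words, it calls the replacement function and assigns the result back to df. A minimal check of this equivalence:

```r
df <- data.frame(x = 1:3, y = 3:1)

# What names(df) <- c("a", "b") does, written out explicitly
df <- `names<-`(df, value = c("a", "b"))
names(df)
```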
Example: Trim and replace
As an example, consider function trim(), which trims any vector at the quantile corresponding to the p (probability) argument:
trim <- function(x, p) {
  x[x <= quantile(x, p)]
}
trim(1:10, p = .25)
## [1] 1 2 3
A simple replacement method for this function can be written as:
"trim<-" <- function(x, p, value) {
  x[x <= quantile(x, p)] <- value
  x
}
and used as:
y <- 1:10
trim(x = y, p = .25) <- 0
y
## [1] 0 0 0 4 5 6 7 8 9 10
Replacing non-assigned objects
A replacement function requires the object passed as argument x to be a named object (a variable), because R has to assign the modified result back to that name. As a proof of concept:
df <- data.frame(x = 0, y = 1)
names(df) <- c("a", "b")
works normally, while
names(data.frame(x = 0, y = 1)) <- c("a", "b")
## Error: target of assignment expands to non-language object
does not work, because the data.frame to be modified is not bound to any name.
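When the object to be modified is a temporary value, the replacement function can still be called in its prefix form. setNames() from the stats package also works. Both return the modified copy instead of assigning it:

```r
# Prefix form of the replacement function: returns a renamed copy
renamed <- `names<-`(data.frame(x = 0, y = 1), c("a", "b"))
names(renamed)

# stats::setNames() is a convenient wrapper for the same task
renamed2 <- setNames(data.frame(x = 0, y = 1), c("a", "b"))
names(renamed2)
```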