The "read parse evaluate" loop
The "read parse evaluate" loop is at the core of the R system. Understanding this mechanism, and being able to manage and control it, is key to writing efficient R code.
When we type R commands, two things happen:
- the characters are parsed into an R expression
- the expression is then evaluated by the internal evaluator
This process, known as the "read parse evaluate" loop, is performed internally by the R system, but the same process is available to the end user by means of two functions: parse() and eval().
Effectively, when we type a <- 1, what happens is
eval(parse(text = "a <- 1"))
The inner parse() call returns an object of class expression, and this expression is then evaluated by the evaluator.
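The two steps can also be run one at a time. The following is a minimal sketch; the names parsed and result are chosen here for illustration only.

```r
# Step 1: parse the text into an unevaluated expression object
parsed <- parse(text = "a <- 1")
class(parsed)

# Step 2: evaluate the parsed expression; the assignment takes place
# and eval() returns the assigned value
result <- eval(parsed)
a
```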
When we call the functions eval() and parse() directly, we generally pass character strings as arguments to the parse() function, either from quoted text strings or from external files:
parse(text = "a <- 1")
## expression(a <- 1)
parse(file = './input.R')
Expression objects are special language objects of class expression which contain parsed but unevaluated R statements. Parsed expressions are stored in an R object that can be explored as a standard list object.
expr <- parse(text = "a <- 1")
class(expr)
## [1] "expression"
str(expr)
## expression(a <- 1)
as.list(expr)
## [[1]]
## a <- 1
as.list(expr[[1]])
## [[1]]
## `<-`
##
## [[2]]
## a
##
## [[3]]
## [1] 1
and even manipulated as standard list objects:
expr <- parse(text = "1+2")
as.list(expr[[1]])
## [[1]]
## `+`
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 2
expr[[1]][[3]] <- 7
as.list(expr[[1]])
## [[1]]
## `+`
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 7
The evaluation part of the R program consists of passing the object resulting from parsing the current expression to the R evaluator.
The parsed expression is then evaluated by the function eval():
eval(expr)
## [1] 8
and the result is returned.
Because of the way R works, evaluating expressions is, with few exceptions, about evaluating function calls. This is clearly true when we call a standard function in R, such as:
mean(x = 1:10)
but this is also true when we write an assignment statement. In fact, R translates:
x <- 0
into a function call:
`<-`(x, 0)
and even a conditional construct such as:
if (pi > 0) 1 else 0
## [1] 1
translates into a call to a function:
`if`(pi > 0, 1, 0)
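That these constructs really are calls can be checked on the quoted (unevaluated) code. A small sketch using quote(), is.call() and the `[[` operator from base R; the names assignment and conditional are illustrative only.

```r
# Quote the code so that it is captured instead of evaluated
assignment  <- quote(x <- 0)
conditional <- quote(if (pi > 0) 1 else 0)

is.call(assignment)    # TRUE
is.call(conditional)   # TRUE

# The first element of a call is the function being called
assignment[[1]]
conditional[[1]]

# Calling the backtick-quoted function directly gives the same result
# as the usual if ... else syntax
`if`(pi > 0, 1, 0)
```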
The parse-eval mechanism has at least three exceptions: constants, names and promises.
Constants in R evaluate to themselves. An expression such as:
1
## [1] 1
is the evaluation of a constant, while
-1
## [1] -1
turns into a call to a function:
`-`(1)
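The difference between a constant and a call can be seen by inspecting the quoted code. A short sketch using only base R:

```r
# A bare number is a constant: quoting it gives back the number itself
is.numeric(quote(1))   # TRUE
is.call(quote(1))      # FALSE

# A negative number is parsed as a call to the unary minus function
is.call(quote(-1))     # TRUE
as.list(quote(-1))
```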
More on how R evaluates functions can be found in the chapter dedicated to functions.
A symbol is a variable name with a value associated with it: x is a symbol, or a symbol name:
x <- 0
class(quote(x))
## [1] "name"
Symbols in R may be made of lowercase or uppercase letters, digits and the special characters "." and "_". Most combinations of these characters give a valid symbol:
x <- 0
x0 <- 0
x_0 <- 0
x.0 <- 0
.x <- 0
Standard symbols cannot start with a digit or a "_", nor with a "." followed by a digit. Any name starting with a "." is a hidden name, meaning that it is not returned by a call to ls() unless argument all.names is set to TRUE.
.x <- 0
ls()
## character(0)
ls(all.names = TRUE)
## [1] ".x"
When we ask R to evaluate a symbol, R looks for the value associated with that symbol, first in the current environment and, if the symbol is not found there, progressively in all the parent environments, until the value is found and returned or an error occurs because the symbol is not found. This key idea will be fully discussed in the chapter dedicated to environments.
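As a small illustration of this lookup, the sketch below defines a function that uses a symbol it does not define itself. The names rate, add_tax and local_rate are hypothetical and used only for this example.

```r
rate <- 0.2   # defined in the environment where the functions are created

add_tax <- function(price) {
  # 'rate' is not defined inside the function, so R looks it up
  # in the enclosing environment
  price * (1 + rate)
}

local_rate <- function(price) {
  rate <- 0.5   # a local 'rate' is found first and hides the outer one
  price * (1 + rate)
}

add_tax(100)
local_rate(100)
```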
Assignment
When typing a = 1 at the command prompt, the value 1 is assigned to the symbol a. The = operator is used to perform the assignment. In fact, R provides three operators for assignment: =, <- and <<-, the last two also having rightward forms, -> and ->>.
Operators = and <- assign into the environment in which they are evaluated. Therefore, at the command prompt, a = 1 is equivalent to a <- 1.
Inside a function call the two operators behave differently. In an argument list, = does not assign anything in the calling environment: it only matches the value to the named formal parameter, which exists in the environment where the function body is evaluated. The <- operator, instead, performs a real assignment in the environment from which the function is called, and the assigned value is then passed as the argument. As a simple example, consider a call to median(). First we clean up our workspace:
rm(list = ls())
Then we call median() using both the = and <- operators to pass the argument:
median(x = 1:10)
## [1] 5.5
exists("x")
## [1] FALSE
median(x <- 1:10)
## [1] 5.5
exists("x")
## [1] TRUE
x
## [1] 1 2 3 4 5 6 7 8 9 10
In practice, the way the evaluator understands assignment is:
'='(a, 1)
a
## [1] 1
'<-'(b, 2)
b
## [1] 2
R supports multiple assignments with both operators.
x = xx = 1
y <- yy <- 0
c(x, xx, y, yy)
## [1] 1 1 0 0
Attention should be paid as:
k = p <- 0
c(k, p)
## [1] 0 0
works correctly but:
k <- p = 0
returns an error. This happens because k = p <- 0 translates to '='(k, '<-'(p, 0)) while k <- p = 0 is interpreted as '='('<-'(k, p), 0), as the <- operator takes precedence over the = operator.
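The grouping of the two statements can be inspected without evaluating them, by parsing the text and looking at the resulting call. This is only a sketch; ok and bad are illustrative names.

```r
ok  <- parse(text = "k = p <- 0")[[1]]
bad <- parse(text = "k <- p = 0")[[1]]

# In both cases the outermost call is to `=`
ok[[1]]
bad[[1]]

# ... but the left-hand side differs: a plain symbol in the first case,
# a call to `<-` in the second, which R cannot use as an assignment target
ok[[2]]
bad[[2]]
```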
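Finally, operator <<- is used to assign into a parent environment: R searches the enclosing environments for an existing variable with that name and modifies it where it is found, or creates it in the global environment otherwise. As an example consider:
env <- new.env()
parent.env(env)
## <environment: R_GlobalEnv>
with(env, x <<- 8)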
In this case the assignment x <<- 8 is performed in the parent environment of env, that is R_GlobalEnv. Thus:
ls(envir = env)
## character(0)
does not show any x symbol, while x is still reachable from env:
get("x", envir = env)
## [1] 8
because, since the evaluator does not find x in the local frame, it looks for x in the parent environment. In fact:
x
## [1] 8
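A common use of <<- is to update a variable that lives in the enclosing environment of a function. The sketch below is illustrative; make_counter, count and counter are hypothetical names.

```r
make_counter <- function() {
  count <- 0               # lives in the environment of make_counter()
  function() {
    # '<<-' finds 'count' in the enclosing environment and updates it there,
    # instead of creating a new local variable
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
counter()
```

Each call to make_counter() creates a new enclosing environment, so two counters built this way do not share their count.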
Removing objects
To remove objects, the function rm() can be used. The function remove() may be considered as an alias for the rm() function.
As seen above, ls() returns a character vector with the names of all objects in the current environment. To remove all objects in the current environment, all you need is
rm(list = ls())
ls()
## character(0)
Of course, the list argument can contain any character vector with object names.
x <- 1; y <- 2; z <- 3
ls()
## [1] "x" "y" "z"
rm(list = c("x", "y", "z"))
ls()
## character(0)
When the names are not already in a vector, they can be passed directly:
x <- 1; y <- 2; z <- 3
ls()
## [1] "x" "y" "z"
rm("x", "y", "z")
ls()
## character(0)
When arguments are passed directly, and not in the character vector list, it is not mandatory to quote them.
x <- 1; y <- 2; z <- 3
ls()
## [1] "x" "y" "z"
rm(x, y, z)
ls()
## character(0)
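Since list accepts any character vector, it can be combined with the pattern argument of ls() to remove only some objects. A sketch, with hypothetical object names:

```r
tmp_a <- 1; tmp_b <- 2; keep_me <- 3

# ls(pattern = ) returns only the names matching the regular expression
rm(list = ls(pattern = "^tmp_"))

exists("tmp_a")     # FALSE
exists("keep_me")   # TRUE
```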
Garbage collection
When objects are no longer used, which clearly happens when objects are deleted and nothing else refers to them, R can release the memory they occupied. This is done automatically by the garbage collector; calling gc() explicitly triggers a collection.
We can also call gc() to see how much memory R is using for allocating objects:
gc()
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 147432  7.9     350000 18.7   350000 18.7
## Vcells 338920  2.6     786432  6.0   669952  5.2
As a check, we can create a matrix of 100 x 10^7 (that is, 10^9) elements
n <- 100*10^7
big_matrix <- matrix(1:n, ncol = 100)
and check how much memory R is using:
gc()
##             used   (Mb) gc trigger   (Mb) max used   (Mb)
## Ncells    147450    7.9  3.500e+05   18.7  3.5e+05   18.7
## Vcells 500338959 3817.3  1.051e+09 8015.5  1.0e+09 7632.4
The increase in memory usage is related to the newly created matrix, which takes:
print(object.size(big_matrix), units = "Gb")
## 3.7 Gb
When this matrix is removed and the garbage collector runs, the memory it occupied is freed, as the used columns show. Whether that memory goes back to the operating system right away may depend on the platform.
rm(big_matrix)
gc()
##          used (Mb) gc trigger   (Mb) max used   (Mb)
## Ncells 147755  7.9     350000   18.7  3.5e+05   18.7
## Vcells 339529  2.6  840481003 6412.4  1.0e+09 7632.4
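The same steps can be tried on a much smaller object that fits comfortably in memory. This is a sketch only; small_vec is an illustrative name and the figures reported by gc() will differ between systems.

```r
small_vec <- integer(1e6)   # one million integers, 4 bytes each

# Roughly 4 MB of data, plus a small fixed overhead for the vector header
print(object.size(small_vec), units = "Mb")

rm(small_vec)     # remove the only reference to the vector
invisible(gc())   # run a collection so that its memory can be reclaimed
exists("small_vec")
```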