
## Introduction

Assuming that we are all familiar with classic R objects such as vectors, matrices, lists and data frames, this chapter considers a critical type of object: the environment.

Within the R computation mechanism, environments play a crucial role: R uses them constantly behind the scenes of interactive computation.

An environment is an object that takes care of mapping variable names to values. Each mapping is called a binding.

Being able to understand and manage environments represents a key step in the R programming learning curve.

## Environments in R

The definition of an environment is clearly stated in the R Language Definition manual:

Environments can be thought of as consisting of two things.

• A frame, consisting of a set of symbol-value pairs,
• an enclosure, a pointer to an enclosing environment.

Given that a frame is a set of objects, each associated with a name, where a name is a simple character string, we can in practice consider an environment as a self-contained portion of memory containing a frame. Each environment can directly access one and only one other environment, known as the parent environment.

Environments in R are created, and eventually destroyed, under many circumstances.

Any R session has an associated environment known as the global environment, as returned by functions globalenv() and, at the top level, environment():
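As a minimal sketch, the two calls can be compared directly. The variable name `same_env` is only illustrative; the comparison is expected to hold when the code is typed at the prompt, since inside a function environment() returns the function's own environment instead.

```r
# Both calls return an environment object.
globalenv()
environment()

# At the top level of a session the two are the same object.
same_env <- identical(globalenv(), environment())
same_env
```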

When we are working with R in interactive mode, we are using the frame within the globalenv as a container for our objects:
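A small illustration, assuming a hypothetical object called `x_demo`: once assigned at the prompt, it shows up in the listing of the global environment's frame.

```r
# Assigning at the prompt creates a binding in the global environment.
x_demo <- 1:3

# List the names currently bound in the global environment.
ls(envir = globalenv())

# Check that the symbol can be found from here.
exists("x_demo")
```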

Any package has at least one environment:
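For instance, taking the stats package (attached by default in a standard session) as an example, both its package environment and its namespace environment can be retrieved. The variable names below are illustrative only.

```r
# The environment holding the exported objects of an attached package.
stats_pkg_env <- as.environment("package:stats")
stats_pkg_env

# The namespace environment, where the package's functions are defined.
stats_ns_env <- asNamespace("stats")
stats_ns_env

# Both are ordinary environment objects.
is.environment(stats_pkg_env)
is.environment(stats_ns_env)
```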

Almost all functions have an environment as part of their definition:
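A short sketch of what "almost all" means here: closures such as mean() and sd() carry an environment, while primitive functions such as sum() do not.

```r
# Closures carry the environment in which they were defined.
environment(mean)   # namespace of package base
environment(sd)     # namespace of package stats

# Primitive functions are the exception: they have no environment.
environment(sum)    # NULL
is.primitive(sum)
```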

User defined functions have an environment too:

Function environmentName() returns the name of an environment. As a result we may query R for the environment of function f():
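As an illustration, assume a trivial user-defined function `f()` (its body is hypothetical). When `f()` is defined at the prompt, its environment is the global environment, and environmentName() reports it as such.

```r
# A hypothetical user-defined function.
f <- function(x) x + 1

# The environment attached to f: the one in which f was defined.
environment(f)

# Its name, as a character string.
f_env_name <- environmentName(environment(f))
f_env_name
```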

or for the name of the environment associated to a package:
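Taking the stats package as an example again, a sketch of both queries could look as follows; note that the package environment and the namespace environment report different names.

```r
# Name of the package environment (the entry on the search path).
pkg_name <- environmentName(as.environment("package:stats"))
pkg_name   # "package:stats"

# Name of the namespace environment.
ns_name <- environmentName(asNamespace("stats"))
ns_name    # "stats"
```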

Unfortunately, function environmentName() does not always return the expected results:
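Two cases where the result may surprise, sketched with hypothetical names `e` and `g`: a freshly created environment and the execution environment of a function both yield an empty string.

```r
# A user-created environment has no name.
e <- new.env()
new_env_name <- environmentName(e)
new_env_name   # ""

# Neither does the environment created when a function is called.
g <- function() environmentName(environment())
call_env_name <- g()
call_env_name  # ""
```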

Environment names for packages and namespaces are assigned at the C level. Therefore, user-created environments do not reveal names. There is no setter counterpart to the, possibly misleadingly named, function environmentName(): this function is really only meant for packages and namespaces. For any other environment it reports a name only if a "name" attribute has been attached to that environment.

## The environment tree structure

The definition of environment also states that an environment has an enclosure: a pointer to an enclosing environment. As a consequence, any environment has a parent environment which, being an environment itself, has a parent environment in turn. This chain of parent environments, known as the environment tree structure, is rooted in a special environment called the empty environment that, as its name states, contains no objects and has no parent.

R has a very useful function, known as parent.env(), that returns the parent of any given environment:
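A brief sketch of parent.env() at both ends of the chain. The parent of the global environment depends on which packages are attached, so no particular output is assumed for the first call.

```r
# The parent of the global environment is the most recently attached
# package or object on the search path.
parent.env(globalenv())

# The parent of the base environment is the empty environment,
# which is where the chain ends.
parent.env(baseenv())
identical(parent.env(baseenv()), emptyenv())
```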

In order to visualize the environment tree structure we can easily define a function that returns this structure starting from any given environment:
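The original definition is not reproduced here; what follows is one possible sketch of such a function, named `tree()` to match the text. It is an assumption about how it could be written: it collects environment names by walking up through parent.env() and uses Recall() for the recursive step.

```r
# Return the names of all environments from `env` up to the empty one.
tree <- function(env = globalenv()) {
  name <- environmentName(env)
  # The empty environment has no parent: stop here.
  if (identical(env, emptyenv())) {
    return(name)
  }
  # Recall() calls the function currently being evaluated.
  rest <- Recall(parent.env(env))
  c(name, rest)
}
```

Returning a character vector rather than printing keeps the result easy to compare with other listings of the search path.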

The above function makes use of function Recall(), which will be examined in the chapter dedicated to functions.

We can test tree() starting with globalenv() as argument:
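A self-contained sketch of such a test, with a compact and hypothetical version of `tree()` defined inline so that the snippet runs on its own. The exact listing depends on the attached packages, so only the first and last entries are commented.

```r
# Compact, assumed version of tree(): names from `env` to the root.
tree <- function(env) {
  if (identical(env, emptyenv())) return(environmentName(env))
  rest <- Recall(parent.env(env))
  c(environmentName(env), rest)
}

env_chain <- tree(globalenv())
env_chain
env_chain[1]                   # "R_GlobalEnv"
env_chain[length(env_chain)]   # "R_EmptyEnv"
```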

Or we may want to use the built-in function search(), which returns similar results:
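For example, search() lists the global environment first and the base package last, with attached packages in between. The variable name `search_path` is illustrative.

```r
# The search path: global environment, attached packages, then base.
search_path <- search()
search_path

search_path[1]                     # ".GlobalEnv"
search_path[length(search_path)]   # "package:base"
```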

When we attach a list, usually a data.frame, we actually insert an entry in the environment tree structure at the position given by the pos argument of function attach(). As this parameter defaults to pos = 2L, most of the time we attach just underneath the global environment:

When loading packages, functions library() and require() work on a similar basis and use the same default pos = 2L.
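A minimal sketch with a hypothetical data frame `demo_df`: after attach() it sits at position 2 of the search path, and its columns become visible by name until it is detached.

```r
# A hypothetical data frame with a single column.
demo_df <- data.frame(demo_col = 1:3)

attach(demo_df)                  # pos = 2L by default
second_entry <- search()[2]
second_entry                     # "demo_df"

# The column is now found through the search path.
col_found <- demo_col

# Remove the entry again to leave the search path as it was.
detach(demo_df)
```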

## How R looks for objects

When R looks for an object, that is a symbol-value pair, by default it looks for a matching symbol in the current environment and, if one is found, returns the corresponding value.

In case we want to search starting from a different environment we are usually able to specify it directly. As an example, we may consider the well known function get() that has an argument envir specifying which environment to search, at least as a starting point.

As a result, we can create an object named Formaldehyde in the current environment:

and use get() to find it, specifying the environment in which to look:
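A sketch of both steps. The value given to `Formaldehyde` is hypothetical, and environment() stands for the current environment, which is the global environment when the code is typed at the prompt.

```r
# Create a local object that shares its name with a dataset.
Formaldehyde <- "my own object"   # hypothetical value

# Look the symbol up, starting from the current environment.
found_here <- get("Formaldehyde", envir = environment())
found_here
```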

Note that an object with the same name exists in the environment of package:datasets and we can find it by specifying the right environment:
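For example, pointing get() at the environment of the datasets package (attached by default in a standard session) retrieves the dataset regardless of any object with the same name defined elsewhere. The variable name `from_datasets` is illustrative.

```r
# Start the search in the datasets package environment.
from_datasets <- get("Formaldehyde",
                     envir = as.environment("package:datasets"))

# The dataset is a small data frame.
class(from_datasets)
names(from_datasets)
```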

When R does not find the required symbol in the current environment, R looks in the parent environment and then in the parent of the parent until R either finds the symbol in some environment or reaches the empty environment. In the latter case, as by definition the empty environment contains no objects, R returns an error.

Given this search mechanism, R stops searching as soon as it finds an object with the corresponding name, ignoring any object with the same name in any environment further along the environment tree structure.

This effect, known as masking, may result in quite embarrassing situations.

As a very simple example, suppose we define a simple function for computing circumference length given radius as argument:

and that, at some point in our working session, we defined:

The result we would get looks quite embarrassing:
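The whole scenario can be sketched as follows. The function name `circumference()` and the value assigned to `pi` are assumptions made for illustration.

```r
# Circumference of a circle of radius r.
circumference <- function(r) 2 * pi * r

before <- circumference(1)
before                       # 2 * pi, about 6.283185

# An accidental redefinition somewhere in the session.
pi <- 0

after <- circumference(1)
after                        # 0: our pi masks the one in base

# Remove our definition so that base pi is visible again.
rm(pi)
```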

In this case the object pi in globalenv():

masks the same symbol in the base environment
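Both objects can be inspected side by side by telling get() where to start. In this sketch the masking value 0 is hypothetical, and assign() is used only to make the target environment explicit.

```r
# Place a masking definition in the global environment.
assign("pi", 0, envir = globalenv())

# The same symbol, looked up in two different environments.
global_pi <- get("pi", envir = globalenv())
base_pi   <- get("pi", envir = baseenv())
global_pi   # 0
base_pi     # 3.141593

# Clean up the masking definition.
rm("pi", envir = globalenv())
```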

A robust method that reduces the risk of masking consists in specifying the package we are calling objects from. We can achieve this goal by using the :: operator:

Finally, any conflict is reported by function conflicts():
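A sketch of the safer variant, assuming a hypothetical function `safe_circumference()`: because the package is named explicitly, a later redefinition of `pi` has no effect on the result.

```r
# Refer to pi through its package, not through the search path.
safe_circumference <- function(r) 2 * base::pi * r

pi <- 0                              # the same accidental redefinition
safe_result <- safe_circumference(1)
safe_result                          # still 2 * pi
rm(pi)
```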
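A short sketch using conflicts(), which lists names that occur more than once along the search path. The masking definition of `pi` is again hypothetical, and the full output depends on the packages attached.

```r
# Create a masking definition in the global environment.
assign("pi", 0, envir = globalenv())

# Names defined in more than one place on the search path.
masked <- conflicts()
"pi" %in% masked

# Clean up.
rm("pi", envir = globalenv())
```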

## Computing with Environments

As we have seen, environments are an essential component of the R working mechanism. As a consequence, it should not come as a surprise if environments are defined as R objects themselves.

As a consequence of being R objects, environments can be created:

and eventually deleted:
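A minimal sketch of both operations, with a hypothetical name `my_env`. Note that rm() removes the binding; the environment itself is reclaimed by the garbage collector once nothing refers to it any more.

```r
# Create a new, empty environment.
my_env <- new.env()
is.environment(my_env)
length(ls(my_env))        # 0: nothing in its frame yet

# Delete it by removing the only reference to it.
rm(my_env)
still_there <- exists("my_env", inherits = FALSE)
still_there               # FALSE
```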

The frame component of an environment can be used as a placeholder for objects, almost as we do with lists. We can place objects within an environment in at least three different ways:
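The original listing is not reproduced here; one plausible set of three ways, offered as an assumption, is assign(), the `$` operator and the `[[` operator. The names `e`, `a`, `b` and `c` are illustrative.

```r
e <- new.env()

# 1. assign(): the name is given as a character string.
assign("a", 1, envir = e)

# 2. The $ operator, as with lists.
e$b <- 2

# 3. The [[ operator, also as with lists.
e[["c"]] <- 3

# All three objects now live in the frame of e.
ls(e)
mget(c("a", "b", "c"), envir = e)
```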

## Hashed environments

When we create a new environment with hash = TRUE, the default value, we create a hashed environment.

In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys, to associated values. Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

Hashed environments allow value lookup by symbol faster than traditional methods, at the price of building the hash table.

As a proof of concept we may consider the following example.

First, we create a simple data frame whose rows represent name-value pairs:

Secondly, we create a new environment and we fill it with the name-value pairs so that we define, within the newly created environment, n objects of value i and name p.i:
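These two steps can be sketched as follows. The size `n = 1000` and the names `lookup`, `e` and `build_time` are assumptions chosen for illustration; the measured time depends on the machine and on n.

```r
n <- 1000   # hypothetical number of name-value pairs

# Data frame whose rows are name-value pairs: p.1 -> 1, p.2 -> 2, ...
lookup <- data.frame(name = paste0("p.", seq_len(n)),
                     value = as.numeric(seq_len(n)),
                     stringsAsFactors = FALSE)

# Fill a hashed environment with one object per row, timing the loop.
e <- new.env(hash = TRUE)
build_time <- system.time(
  for (i in seq_len(n)) {
    assign(lookup$name[i], lookup$value[i], envir = e)
  }
)
build_time

# The environment now holds n objects named p.1, ..., p.n.
length(ls(e))
```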

As we can see, implementing the hash table requires a certain amount of computing time.

We now define a random sample of names:

and finally, we want to create a vector out containing the values corresponding to each name. In practice, if we selected what <- c("p.1", "p.2", "p.3") we would like R to return c(1, 2, 3).

In order to achieve this result we may use either a crazy for loop approach

or the common R vectorized approach

or, finally, the new hash approach
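The three approaches can be compared with a self-contained sketch. Everything below is an assumption about how the comparison might be written: the sizes, the variable names, and in particular the reading of the "vectorized approach" as one vectorized comparison per key. Timings vary with the machine, the R version and the way each lookup is coded, so no particular ranking is asserted here.

```r
set.seed(1)
n <- 1000                                   # hypothetical sizes
keys <- paste0("p.", seq_len(n))
vals <- as.numeric(seq_len(n))
what <- sample(keys, 100)                   # random sample of names

# Hashed environment holding the same name-value pairs.
e <- new.env(hash = TRUE)
for (i in seq_len(n)) assign(keys[i], vals[i], envir = e)

# 1. Explicit double loop: scan the keys one by one for each name.
t_loop <- system.time({
  out_loop <- numeric(length(what))
  for (j in seq_along(what)) {
    for (i in seq_len(n)) {
      if (keys[i] == what[j]) {
        out_loop[j] <- vals[i]
        break
      }
    }
  }
})

# 2. One vectorized comparison per requested name.
t_vec <- system.time(
  out_vec <- unname(sapply(what, function(k) vals[keys == k]))
)

# 3. Lookup by name in the hashed environment.
t_hash <- system.time(
  out_hash <- unname(unlist(mget(what, envir = e)))
)

# All three must agree; the timings can then be compared.
identical(out_loop, out_vec) && identical(out_vec, out_hash)
rbind(loop = t_loop, vectorized = t_vec, hash = t_hash)[, "elapsed"]
```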

Definitely, the hash approach, if we are willing to pay the computational price required for building the hash table, offers a clear advantage.