Object Oriented Programming

Object oriented programming is a programming paradigm based on classes and methods.

A class is an abstract definition of a concrete real world object. A class is generally made of ordered and named slots.

For instance, a rectangle is defined given the lengths of its sides. Therefore objects of class rectangle can be defined by a class containing two slots of type numeric, named for instance x and y, corresponding to its sides. The specific rectangle of sides x = 3 and y = 6 represents an instance of the class rectangle.

Given a class, any number of dedicated methods can be written for that class. A specific method, for a given class, is a function that performs specific actions on an object of that class. Specific methods are defined as particular cases of general methods.

This mechanism is used almost everywhere in R. For instance:

When calling head(cars), R understands that cars is an object of class data.frame and that head is a generic function; therefore, R looks for a specific head method for objects of class data.frame. This method exists and is named head.data.frame, as shown by:

As head.data.frame is defined as a non-visible function within the namespace of utils, its content can be visualized by typing:
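The lookup described above can be reproduced with a short sketch. It assumes only base R and the utils package; getS3method() is used here as one way to retrieve the non-exported method, and the variable names head_df and first_rows are illustrative.

```r
# List the S3 methods registered for the generic head()
methods(head)

# Retrieve the non-exported data.frame method from the S3 registry
head_df <- getS3method("head", "data.frame")

# Typing the name without parentheses prints the function body;
# utils:::head.data.frame reaches the same function through the namespace
head_df

# Calling the generic dispatches to that method: first six rows of cars
first_rows <- head(cars)
first_rows
```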

Finally, when calling head() on an object of class function, such as lm:

According to the same mechanism, R returns the first six lines of the lm() function by calling utils:::head.function.
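A minimal sketch of this last case, assuming a standard R installation where lm() from the stats package is available; the name lm_top is illustrative.

```r
# lm is an ordinary function, so its class is "function"
class(lm)

# head() dispatches to the function method, which deparses lm
# and keeps the first six lines of its source
lm_top <- head(lm)
lm_top
```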

End users are not interested in the class structure itself but do care about methods that are available to access the class. The R way of reaching this goal is to use generic functions and method dispatch: the same function performs different computations depending on the types of its arguments.

R is both interactive and has a system for object orientation. The interactive component of R is a great tool for data analysis and quick development. Nevertheless, when it comes to software development, especially at enterprise level, a rigorous object oriented programming system is recommended.

R tries to achieve a compromise between object orientation and interactive programming and, although compromises are never optimal with respect to all goals they try to reach, they often work surprisingly well in practice.

Being able to understand when interactive code has to be converted and structured into an object oriented library is key to making the best use of R.

The S language, of which R is a dialect, has two object systems, known informally as S3 and S4. Their names come from the versions of S in which they first appeared. S3 objects, classes and methods have been available in R from the beginning. S4 objects, classes and methods have been available in R through the methods package, attached by default since R version 1.7.0.

S3

S3 objects, classes and methods have been available in R from the beginning. They are informal, yet "very interactive". S3 was first described in the "White Book" (Statistical Models in S).

S3 is not a real class system, it mostly is a set of naming conventions. Classes are attached to objects as simple attributes. Method dispatch looks for the class of the first argument and then searches for functions conforming to a naming convention: do() methods for objects of class obj are called do.obj(). If no do method is found, S3 searches for do.default().
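The naming convention can be illustrated with a small sketch. The generic do() and the class obj are the placeholder names used in the text; the returned strings and the object thing are purely illustrative.

```r
# Generic: UseMethod() dispatches on the class of the first argument
do <- function(x, ...) UseMethod("do")

# Method for class "obj", found purely through its name do.obj
do.obj <- function(x, ...) "do.obj() called"

# Fallback used when no class-specific method exists
do.default <- function(x, ...) "do.default() called"

# A class is only an attribute attached to an ordinary object
thing <- structure(list(), class = "obj")

do(thing)  # dispatches to do.obj()
do(42)     # no do.numeric() exists, so do.default() is used
```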

This system is simple and powerful at the same time. Objects of widely used classes such as lm or glm are still implemented as S3:

Nevertheless, S3 is far from being structured and validated:

The system should not accept that a simple string can be defined as an object of class linear model.
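Both points can be seen in a short sketch, assuming the built-in cars data set; the names fit and fake are illustrative.

```r
# A fitted linear model is a plain S3 object
fit <- lm(dist ~ speed, data = cars)
class(fit)  # "lm"
isS4(fit)   # FALSE

# Nothing stops us from labelling an arbitrary string as a linear model
fake <- "I am not a model"
class(fake) <- "lm"
inherits(fake, "lm")  # TRUE, although fake has none of the lm components
```

No check is run when the class attribute is assigned; problems only surface later, when a method such as print() or summary() tries to use components that do not exist.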

S4

S4 objects, classes and methods are much more formal and rigorous, hence "less interactive". S4 was first described in the "Green Book" (Programming with Data). In R it is available through the methods package, attached by default since version 1.7.0.

Example: Class rectangle

As a simple example, we can consider a class rectangle. As any rectangle can be entirely defined by the dimensions of its sides, a class for objects of type rectangle can be defined as made of two numeric slots: x and y representing the sides of the rectangle.

Note the use of the prototype argument of setClass(). This argument makes it possible to create a rectangle of sides x = 1 and y = 1 whenever its dimensions are not explicitly given.

Once the class is defined, an object of class rectangle can be created by:

Generally, objects are not created directly by using function new(). We usually define a specific function in order to perform this task:

The prototype argument of class definition allows great flexibility when passing arguments to function rectangle:
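The class definition, direct creation with new(), and a constructor function can be sketched as follows. This is a reconstruction under assumptions: the slot names x and y and the unit prototype come from the text, while the form of the constructor rectangle() and the object names r1 to r4 are illustrative.

```r
library(methods)

# Class definition: two numeric slots, defaulting to a unit square
setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

# Direct creation with new()
r1 <- new("rectangle", x = 3, y = 6)

# Constructor function (illustrative) forwarding its arguments to new()
rectangle <- function(...) new("rectangle", ...)

r2 <- rectangle()              # both sides fall back to the prototype
r3 <- rectangle(x = 3)         # y falls back to the prototype
r4 <- rectangle(x = 3, y = 6)  # both sides given explicitly

# Slot types are checked by the class definition itself
bad <- tryCatch(rectangle(x = "a"),
                error = function(e) conditionMessage(e))
bad
```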

As seen, the class definition performs some validity checks by itself. Nevertheless, neither zeros nor negative numbers should be accepted as valid side dimensions. For an appropriate validity check, a specific validity method can be defined by using function setValidity(). Note that validity methods are stored together with class definitions.

Testing the class after the validity method is defined shows the tighter control over input arguments:
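A minimal sketch of such a validity method, assuming the rectangle class described above; the error message text and the object names are illustrative.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

# Validity method: return TRUE, or a character string describing the problem
setValidity("rectangle", function(object) {
  if (any(c(object@x, object@y) <= 0)) {
    "sides must be strictly positive"
  } else {
    TRUE
  }
})

# new() runs the validity method when slots are supplied
invalid <- tryCatch(new("rectangle", x = -1, y = 2),
                    error = function(e) conditionMessage(e))
invalid

# Direct slot assignment does not re-run the check; validObject() does
r <- new("rectangle", x = 3, y = 6)
r@x <- 0
recheck <- tryCatch(validObject(r),
                    error = function(e) conditionMessage(e))
recheck
```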

After the class is defined, we can define basic methods, generally show, print, summary and plot. Method show() is usually the first method we develop, as it is applied when an object is evaluated on its own at the prompt, and it allows objects to be displayed in an ordered and clear fashion.

We can define a method print(), with identical output to show():

We can write a more exhaustive output with method summary():

The area and the perimeter, once computed, are returned invisibly by method summary().
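The three methods could look like the sketch below. The exact output format is an assumption; only the behaviour described in the text (print() matching show(), summary() returning area and perimeter invisibly) is taken from the source.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

# show() is called when the object is auto-printed at the prompt
setMethod("show", "rectangle", function(object) {
  cat("rectangle: x =", object@x, ", y =", object@y, "\n")
})

# print() delegates to show(), so both produce the same output
setMethod("print", "rectangle", function(x, ...) {
  show(x)
  invisible(x)
})

# summary() prints more detail and returns the computed values invisibly
setMethod("summary", "rectangle", function(object, ...) {
  area <- object@x * object@y
  perimeter <- 2 * (object@x + object@y)
  show(object)
  cat("area =", area, ", perimeter =", perimeter, "\n")
  invisible(list(area = area, perimeter = perimeter))
})

r <- new("rectangle", x = 3, y = 6)
r
print(r)
s <- summary(r)
s$area
```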

Method plot() closes the list of standard methods usually developed for any class:
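A possible plot() method is sketched below. The drawing choices (lower-left corner at the origin, blue border, equal aspect ratio) are assumptions made for illustration, not the original figure.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

# plot() method for a rectangle passed alone, with no y argument
setMethod("plot", signature(x = "rectangle", y = "missing"),
          function(x, y, ...) {
            # Empty canvas large enough to hold the rectangle
            plot(c(0, x@x), c(0, x@y), type = "n", asp = 1,
                 xlab = "x", ylab = "y", ...)
            # Draw the rectangle with its lower-left corner at the origin
            rect(0, 0, x@x, x@y, border = "blue")
            invisible(x)
          })

r <- new("rectangle", x = 3, y = 6)
if (interactive()) plot(r)
```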


print(), plot() and summary() are existing generic functions. If required, we can define a new generic. For instance, a rotate() method that rotates the rectangle by 90 degrees can be defined in two steps:

  • Define the generic rotate() function, as it does not exist by default in R.
  • Define a specific rotate() method for objects of class rectangle.

Given the rotate generic, a specific rotate method for class rectangle can be written as:
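The two steps can be sketched as follows. Rotation is implemented here by swapping the two sides, which is an assumption about what the original method did.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

# Step 1: define the generic, which only dispatches
setGeneric("rotate", function(object, ...) standardGeneric("rotate"))

# Step 2: define the method for class rectangle
# (a 90 degree rotation swaps the two sides)
setMethod("rotate", "rectangle", function(object, ...) {
  new("rectangle", x = object@y, y = object@x)
})

r <- new("rectangle", x = 3, y = 6)
rotated <- rotate(r)
c(rotated@x, rotated@y)
```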


Example: Class parallelepiped

Given class rectangle, class parallelepiped can be defined as an extension of class rectangle. The whole structure of class rectangle is inherited by parallelepiped. Therefore, when defining the new class, only additional slots need to be defined. Specifically, only slot z representing the third dimension of the parallelepiped needs to be defined. Slots x and y are implicitly inherited from parent class rectangle along with all defined methods.

Class parallelepiped is explicitly defined as an extension of class rectangle and R tracks all of this within the definitions of both rectangle and parallelepiped classes.

Specific methods can be written for class parallelepiped. Alternatively, methods of the parent class rectangle are used. Note that this may lead to some confusion:

Clearly this is not all the information someone would expect about a parallelepiped. A new print method should be written that includes, at least, side z:
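The inheritance and the confusion it can cause may be sketched as below. The print output format and the object names are illustrative; the slot z and the contains relationship follow the text.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

setMethod("print", "rectangle", function(x, ...) {
  cat("rectangle: x =", x@x, ", y =", x@y, "\n")
  invisible(x)
})

# Only the additional slot z is declared; x and y are inherited
setClass("parallelepiped",
         representation(z = "numeric"),
         prototype(z = 1),
         contains = "rectangle")

p <- new("parallelepiped", x = 2, y = 3, z = 4)
is(p, "rectangle")  # TRUE

# The inherited method knows nothing about z
inherited_out <- capture.output(print(p))
inherited_out

# A dedicated method reports all three sides
setMethod("print", "parallelepiped", function(x, ...) {
  cat("parallelepiped: x =", x@x, ", y =", x@y, ", z =", x@z, "\n")
  invisible(x)
})

own_out <- capture.output(print(p))
own_out
```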

Example: Class square

The same mechanism can be used the other way round in order to define classes that are specific cases of an existing class. Again, methods are inherited from parent to child:


Moreover, a coercion from class rectangle to class square can be defined with function setAs(). As an example, the definition may impose that any rectangle(x, y) is coerced into a square(x).

The definition written within setAs() is then used by R when calling function as():
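A sketch of class square and of the coercion follows. The validity rule for square and the show() format are assumptions; the coercion rule (keep side x, discard side y) is the one stated in the text.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))

setMethod("show", "rectangle", function(object) {
  cat("rectangle: x =", object@x, ", y =", object@y, "\n")
})

# A square is a rectangle whose two sides are equal
setClass("square", contains = "rectangle",
         validity = function(object) {
           if (!isTRUE(all.equal(object@x, object@y))) {
             "sides x and y must be equal"
           } else {
             TRUE
           }
         })

sq <- new("square", x = 2, y = 2)
sq  # displayed by the show() method inherited from rectangle

# Coercion rule: keep side x and use it for both sides
setAs("rectangle", "square",
      function(from) new("square", x = from@x, y = from@x))

r <- new("rectangle", x = 3, y = 6)
r_as_square <- as(r, "square")
c(r_as_square@x, r_as_square@y)
```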

Example: Rolygons, S4 with closures

Combining S4 methods with functional programming techniques permits some quite interesting coding patterns.

In this case we want to generate a set of functions, each of them returning a regular polygon (square, pentagon, etc.) with a built-in plot method.

Thus, we first define a rolygon() function that returns a function capable of generating specific regular polygons, with the plot method inherited from the environment of rolygon():

Note that class rolygon, its plot method and the f() function are all defined within the evaluation environment of rolygon(). When rolygon() is evaluated, f() is returned and f() remembers class rolygon and its plotting method.

As a result, we can define a heptagon() function as:

A specific heptagon of side = 1 becomes:

As heptagon() has a plot method built in, we only need:
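A simplified sketch of the idea is given below. It is an assumption-laden reconstruction: the slot names n and s, the object name e1 and the drawing code are illustrative. The calls to setClass() and setMethod() are made inside rolygon(), but with their default arguments R registers the class and the method in the top-level environment; what the returned closure carries is the number of sides n.

```r
library(methods)

rolygon <- function(n) {
  # Regular polygon with n sides of length s
  setClass("rolygon", representation(n = "numeric", s = "numeric"))

  setMethod("plot", signature(x = "rolygon", y = "missing"),
            function(x, y, ...) {
              # Circumradius of a regular polygon with side s
              radius <- x@s / (2 * sin(pi / x@n))
              angle <- seq(0, 2 * pi, length.out = x@n + 1)
              plot(radius * cos(angle), radius * sin(angle),
                   type = "l", asp = 1, xlab = "", ylab = "", ...)
              invisible(x)
            })

  # Returned closure: n is remembered from the enclosing call
  f <- function(s) new("rolygon", n = n, s = s)
  f
}

# Generator of heptagons
heptagon <- rolygon(n = 7)

# A specific heptagon of side 1
e1 <- heptagon(1)

# The plot method is available without further definitions
if (interactive()) plot(e1)
```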


Finally, with a bit of imagination:


S4 Housekeeping

Package methods provides several functions for S4 object oriented programming, and most of them have already been illustrated in the previous sections:

  • define classes: setClass()
  • create objects: new()
  • define generics: setGeneric()
  • define methods: setMethod()
  • delete classes: removeClass()
  • delete methods: removeMethod()
  • convert objects: as(), setAs()
  • check object validity: setValidity(), validObject()
  • access registry: showClass(), showMethods(), getMethod()

When a class or a method is created, R saves it in a dedicated registry within the working environment. Each package has its own dedicated registry. Methods and classes are usually accessed by dedicated functions. Functions showClass() and getClass() return the structure of a class. For instance, in order to display the structure of class rectangle:

The validity function, if defined, of a given class is obtained by function getValidity():

The function showMethods() checks whether a method exists for a given class; to check the show and print methods for class rectangle:

Note that omitting argument f within showMethods() returns all methods for a given class:

The definition of a given method can be displayed by:

Just as methods and classes are created, they can be deleted with functions removeMethod() and removeClass() respectively.
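The housekeeping functions can be tried on a small class, as in this sketch. The class, its validity rule and its show() method are re-created here only so that the example is self-contained; the variable names are illustrative.

```r
library(methods)

setClass("rectangle",
         representation(x = "numeric", y = "numeric"),
         prototype(x = 1, y = 1))
setValidity("rectangle", function(object) {
  if (any(c(object@x, object@y) <= 0)) "sides must be strictly positive" else TRUE
})
setMethod("show", "rectangle", function(object) {
  cat("rectangle: x =", object@x, ", y =", object@y, "\n")
})

# Structure of the class
getClass("rectangle")
slot_names <- slotNames("rectangle")

# Validity function stored with the class definition
validity_fun <- getValidity(getClass("rectangle"))

# Is there a show method for the class?
showMethods("show", classes = "rectangle")

# Omitting f lists every generic with a method for the class
showMethods(classes = "rectangle")

# Definition of a given method
getMethod("show", "rectangle")

# Clean up: remove the method first, then the class
removed_method <- removeMethod("show", "rectangle")
removed_class <- removeClass("rectangle")
```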

References

Kindly provided at: stackoverflow

On the web

  • The methods help files : help files from the package methods, where much of the necessary information can be found
  • S4 classes in 15 pages : Short introduction on the programming with S4 objects.
  • How S4 methods work : more explanation about the underlying mechanisms.
  • Not so short introduction to S4 : with practical examples of how to construct the classes and some useful tips. It contains a handy overview as appendix, but contains errors as well. Care should be taken using this one.
  • OOP in R : handout notes with practical examples on S3 and S4
  • S4 Objects : presentation by Thomas Lumley about S4 objects

Books

  • Software for Data Analysis: Programming with R (J. Chambers) : A classic, although not positively reviewed everywhere, that contains a large section on S4
  • R programming for Bioinformatics (R. Gentleman) : specifically directed towards working with Bioconductor, which is completely based on S4. But it gives a broad overview and is useful for many other people too.

RC reference classes

A more recent development in R is reference classes, also known as RC or R5.

RC brings the object oriented programming paradigm of R very close to the ones implemented in C++ or Java.

On the other hand, when approaching reference classes we should also take into account that:

  • Documentation on reference classes is still very limited
  • RC requires learning a new form of programming syntax
  • Mutable state does not fit well with the side-effect-free nature of most R functions

Example: zero_one, a toy example

As a first basic example, consider creating a new class zero_one with two self-explanatory methods associated with it: $set_to_zero() and $set_to_one()

First notice that:

  • RC does not simply register a class, as setClass() in S4 does, but holds the newly created class in an object. Now, object zero_one holds class zero_one.
  • fields corresponds to representation in S4
  • methods defines functions as true methods belonging to the class

The call to setRefClass() defines class zero_one and returns a generator object for class zero_one.

By using method $new() we create a new object of class zero_one

We can now apply methods $set_to_zero() and $set_to_one() to the newly created object:

and see how zero_one_test modifies its fields

R functions usually do not have any side effects. Objects are modified by assignment, and this happens under a copy-on-modify rule. Reference classes instead allow us to mutate the state of objects without duplicating them.

Reference class methods can use the operator <<-. This modifies the value of a field in place by using a combination of environment and makeActiveBinding().
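A minimal sketch of the zero_one class follows. The field name x and its numeric type are assumptions, since the original field definition is not shown; the method names and the object name zero_one_test come from the text.

```r
library(methods)

# setRefClass() defines the class and returns a generator object
zero_one <- setRefClass("zero_one",
  fields = list(x = "numeric"),
  methods = list(
    set_to_zero = function() {
      x <<- 0           # modifies the field in place
      invisible(.self)
    },
    set_to_one = function() {
      x <<- 1
      invisible(.self)
    }
  )
)

# Create an object with the generator's $new() method
zero_one_test <- zero_one$new(x = 5)

zero_one_test$set_to_zero()
zero_one_test$x  # 0

# Reference semantics: a second name points to the same object
alias <- zero_one_test
alias$set_to_one()
zero_one_test$x  # 1, changed through alias without any copy
```

With an ordinary R object the last assignment through alias would have modified a copy and left zero_one_test untouched.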

Example: A stack implementation

In this example we define a stack implementation with methods $put_in() and $get_out(), where the latter comes in two flavors:

  • fifo: first in first out
  • lifo: last in first out

We first define the reference class:

And now we test it:
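One way to write and test such a class is sketched below. The field name items, the generator name stack_class and the argument type of $get_out() are assumptions; only the method names and the two retrieval orders come from the text.

```r
library(methods)

stack_class <- setRefClass("stack",
  fields = list(items = "list"),
  methods = list(
    put_in = function(item) {
      # Append the new item at the end of the list
      items <<- c(items, list(item))
      invisible(.self)
    },
    get_out = function(type = "fifo") {
      if (!type %in% c("fifo", "lifo")) stop("type must be 'fifo' or 'lifo'")
      if (length(items) == 0) stop("the stack is empty")
      # fifo takes the oldest item, lifo the most recent one
      idx <- if (type == "fifo") 1 else length(items)
      item <- items[[idx]]
      items <<- items[-idx]
      item
    }
  )
)

s <- stack_class$new()
s$put_in(1)
s$put_in(2)
s$put_in(3)

first_out <- s$get_out("fifo")  # oldest item
last_out <- s$get_out("lifo")   # most recent item
remaining <- length(s$items)
```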