# R and the R Project

## The R-Project: a bit of history

R is a programming environment for data analysis, graphics and statistical computing. The R language is widely used among statisticians for developing statistical software and data analysis.

R was initially developed in the early 1990s by Robert Gentleman and Ross Ihaka at the Department of Statistics of the University of Auckland as a dialect of the S language.

The R name is partly based on the (first) names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly a play on the name of S.

### What is S and a bit of history

S is a statistical programming language developed by John Chambers and others at Bell Laboratories.

A bit of history:

- 1976: the first version of S was developed as an internal statistical analysis environment. It was originally implemented as Fortran libraries.
- 1980: the first version of S was distributed outside of Bell Laboratories. In 1981, source versions were made available.
- 1984: Richard A. Becker and John M. Chambers, “S. An Interactive Environment for Data Analysis and Graphics”. (Brown Book). Historical interest only.
- 1988: Richard A. Becker, John M. Chambers and Allan R. Wilks, “The New S Language”. London: Chapman & Hall. (Blue Book). It introduced what is now known as S version 2. The system was rewritten in C and began to resemble the system that we have today.
- 1992: John M. Chambers and Trevor J. Hastie, “Statistical Models in S”. (White Book). It introduced S version 3, often abbreviated S3, which added structures to facilitate statistical modeling in S.
- 1998: John M. Chambers, “Programming with Data”. (Green Book). It introduced S version 4, often abbreviated S4, which provided advanced object-oriented features. S4 classes differ markedly from S3 classes.

The S language itself has not changed dramatically since 1998.
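Because R is a dialect of S, both the S3 and the S4 object systems mentioned above are available in R. The following is a minimal sketch of how the two differ, not a complete treatment; the class names `person3` and `Person4` and the generic `describe` are hypothetical names chosen for illustration.

```r
library(methods)

# S3: a class is only an attribute, and methods are found by their name
# (generic.class). Nothing checks the contents of the object.
p3 <- structure(list(name = "Ada", age = 36), class = "person3")
format.person3 <- function(x, ...) paste("S3 person:", x$name)
format(p3)

# S4: classes are declared formally and slot types are enforced.
setClass("Person4", slots = c(name = "character", age = "numeric"))
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Person4", function(object) {
  paste("S4 person:", object@name)
})
p4 <- new("Person4", name = "Ada", age = 36)
describe(p4)

# A slot of the wrong type is rejected when the S4 object is created.
bad <- tryCatch(
  new("Person4", name = "Ada", age = "old"),
  error = function(e) "rejected"
)
bad
```

In S3 the same mistake (a character `age`) would go unnoticed until some later computation failed, which is one reason the two systems feel so different in practice.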

### What is S-PLUS and a bit of history

S-PLUS is a commercial implementation of the S programming language.

S-PLUS provides a number of fancy features (GUIs, mostly) on top of it, hence the “PLUS”.

A bit of history:

- 1993: Statistical Sciences, Inc. acquires the exclusive license to distribute S and merges with MathSoft.
- 2001: MathSoft sells its Cambridge-based Engineering and Education Products Division (EEPD) and changes its name to Insightful Corporation.
- 2004: Insightful purchases the S language from Lucent Technologies for $2 million.
- 2008: TIBCO acquires Insightful Corporation.

### R: a bit of history

- 1993: First announcement of R to the public.
- 1995: Martin Maechler convinces Ross Ihaka and Robert Gentleman to use the GNU General Public License to make R free software.
- 1997: The R Development Core Team is formed. The team controls the source code for R.
- 2000: R version 1.0.0 released. Developers considered R stable enough for production use.
- 2004: R version 2.0.0 released. Introduced lazy loading, which enables fast loading of data with minimal expense of system memory.
- 2013: R version 3.0.0 released. Introduced long vectors.

### The R-project and R licence

R is supported by a wide community of academic users, professors, companies and developers. This community makes up the so-called “R-project”. The “R-project” is supported by the “R Foundation”, a not-for-profit organisation.

R is an official part of the Free Software Foundation’s GNU project. The R Foundation has similar goals to other open source software foundations like the Apache Foundation or the GNOME Foundation. R is free and open source software. It is released under the GPL (version 2) licence.

R is free:

- you can have R without paying for it (freeware);
- you can copy and re-use the software (free software);
- you can access source code and modify it (open source).

## R Commercial Support

### Revolution R

Revolution Analytics (www.revolutionanalytics.com) was founded in 2007 to provide commercial support for Revolution R. Revolution R is the distribution of R developed by Revolution Analytics which also includes components developed by the company.

Revolution R Enterprise includes all of R’s advanced data analysis and graphics capabilities, plus additional components. Major additional components include: ParallelR (for parallel computing), the R Productivity Environment IDE, RevoScaleR (for big data analysis), RevoDeployR (a web services framework), and the ability to read and write data in the SAS file format.

## What does R do?

R provides a suite of software facilities for:

- matrix algebra;
- hash tables and regular expressions;
- reading and manipulating data;
- computation;
- programming language: loops, subroutines, functions, etc.;
- conducting statistical analyses;
- graphics and tables;
- displaying the results.
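The facilities listed above can be illustrated with a short session. This is only a sketch using base R and the built-in `cars` data set; the variable names and the function `cumulative_total` are hypothetical, chosen for illustration.

```r
# Matrix algebra: solve the linear system A %*% x = b.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
x <- solve(A, b)

# Regular expressions: which strings are one capital letter plus digits?
ids <- c("A12", "b7", "C3x")
valid <- grepl("^[A-Z][0-9]+$", ids)

# Hash table: an environment maps string keys to values.
first_year <- new.env(hash = TRUE)
first_year[["S"]] <- 1976
first_year[["R"]] <- 1993
keys <- ls(first_year)

# Programming constructs: a function containing a loop.
cumulative_total <- function(v) {
  total <- 0
  for (value in v) total <- total + value
  total
}

# Statistical analysis on a built-in data set, then a table of results.
fit <- lm(dist ~ speed, data = cars)
coef_table <- round(coef(summary(fit)), 3)
print(coef_table)

# Graphics: write a scatter plot with the fitted line to a temporary PDF.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(cars, main = "Stopping distance vs speed")
abline(fit)
invisible(dev.off())
```

The plot is written to a PDF file rather than to the screen so that the sketch also runs in a non-interactive session; in an interactive session the `pdf()` and `dev.off()` calls can simply be dropped.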

On the other hand, R:

- is not a database, but it connects to databases;
- does not provide a graphical interface of its own, but it uses Java, Tcl/Tk and, under Windows, COM to provide graphical interfaces;
- is not a spreadsheet, but it connects to spreadsheets;
- does not come with commercial support. Revolution R is a commercially supported distribution of R.

In short, R is an interpreted computer language. R provides a platform for the development and implementation of new algorithms and for technology transfer. Most user-visible functions are written in R itself, calling upon a smaller set of internal primitives. It is possible to interface procedures written in C, C++, or Fortran for efficiency, and to write additional primitives. System commands can be called from within R.
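The distinction between functions written in R and internal primitives, and the ability to call system commands, can be checked directly. The sketch below assumes a Unix-like system where an `echo` command is available; on Windows the `system2()` call would need a different command.

```r
# `sum` is an internal primitive; `sd` is an ordinary function written in R.
is_prim_sum <- is.primitive(sum)
is_prim_sd <- is.primitive(sd)

# The R source of a non-primitive function can be inspected.
print(body(sd))

# Run a system command and capture its output as a character vector.
# Assumption: an `echo` command exists on the system (Unix-like shells).
out <- system2("echo", "hello from the shell", stdout = TRUE)
print(out)
```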

### R advantages and disadvantages

Main R advantages are:

- Fast and free.
- State of the art: statistical researchers provide their methods as R packages, so new methods are often available in R well before they reach SPSS or SAS.
- Excellent for graphics.
- Mx, WinBUGS, and other programs use or will use R.
- Active user community.
- Excellent for simulation, programming, computer intensive analyses, etc.
- Forces you to think about your analysis.
- Interfaces with database storage software (SQL).

Main R disadvantages are:

- Not user friendly at start: steep learning curve, minimal GUI.
- Sometimes, figuring out correct methods or how to use a function on your own can be frustrating.
- It is easy to make mistakes without noticing.
- Working with large datasets is limited by RAM.
- Data preparation and cleaning can be messier and more error-prone in R than in SPSS or SAS.
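Regarding the RAM limit mentioned above: R normally keeps objects in memory, so it helps to know how much space an object takes. A minimal sketch, using only base R; the vector length is an arbitrary choice for illustration.

```r
# One million double-precision numbers, 8 bytes each plus a small header.
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))

# Human-readable size of the same object.
print(format(object.size(x), units = "MB"))

# Rough estimate for a purely numeric table before loading it:
# rows * columns * 8 bytes (assumes every column is stored as double).
estimated_bytes <- 5e6 * 20 * 8
print(estimated_bytes / 1024^3)  # size in gigabytes
```

If the estimate approaches the available RAM, the data usually has to be sampled, processed in chunks, or kept in a database and queried from R.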

## R Resources

### R-project website

The R-project website (www.r-project.org) is the starting point for R materials.

The website contains:

- the software and packages;
- the search engine interface (the same queries can be submitted with the RSiteSearch("query") function within R);
- the on-line documentation, in both HTML and PDF format. The HTML version can be accessed with the help.start() function within R;
- the R Journal. The R Journal is the open access, refereed journal of the R project. It features short to medium length articles covering topics that might be of interest to users or developers of R;
- the interface to the mailing list;
- the wiki, suggested books and many others.
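The help functions mentioned above can be tried from an R session. This is a small sketch: the search terms are arbitrary examples, and the calls that open a web browser or need an internet connection are wrapped in `interactive()` so that they only run in an interactive session.

```r
# Look up the help page of a known function (equivalent to ?mean).
h <- help("mean")

# Find objects whose names match a regular expression.
hits <- apropos("^read\\.csv")
print(hits)

if (interactive()) {
  print(h)                        # display the help page
  help.start()                    # HTML documentation in a browser
  RSiteSearch("mixed models")     # same search as on the website
  RShowDoc("R-intro")             # open the "An Introduction to R" manual
}
```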

The on-line documentation includes the following manuals, written by the R Development Core Team itself. They are a valuable source of information.

- *An Introduction to R* gives an introduction to the language and how to use R for doing statistical analysis and graphics.
- *Writing R Extensions* covers how to create your own packages, write R help files, and the foreign language (C, C++, Fortran, …) interfaces.
- *R Data Import/Export* describes the import and export facilities available either in R itself or via packages which are available from CRAN.
- *R Installation and Administration*.

Other manuals and tutorials provided by R users can be downloaded from the R-project website (cran.r-project.org/other-docs.html).

Mailing lists are the most important tool for contacting the R community. They can be accessed from the R-project website (www.r-project.org/mail.html).

There are four general mailing lists devoted to R:

- *R-announce*: This list is for major announcements about the development of R and the availability of new code.
- *R-packages*: This list is for announcements as well, usually on the availability of new or enhanced contributed packages (on CRAN, typically).
- *R-help*: The “main” R mailing list, for discussion about problems and solutions using R, announcements about the availability of new functionality for R and documentation of R, comparison and compatibility with S-PLUS, and for the posting of nice examples and benchmarks.
- *R-devel*: This list is intended for questions and discussion about code development in R.

### Other on-line resources

It is very difficult to estimate how many websites about R are on-line. As a rough indication, a Google search for “R stat blog” returns 224,000,000 results. Even if only 0.1% of these results are actually about R, that is still about 224,000 sites.

R-bloggers (www.r-bloggers.com) is a blog aggregator of content collected from bloggers who write about R. R-bloggers contains R news and tutorials contributed by hundreds of R bloggers.

Other useful websites about R are:

- Quick-R (www.statmethods.net) is a useful on-line guide to R. It provides many examples and useful tips.
- RSeek (rseek.org) uses Google to search a selected list of websites about R.
- Graphiques avec R (zoonek2.free.fr/UNIX/48_R_2004/04.html) also provides a gallery of R-made graphics.

### Books

A partially annotated list of books related to S or R can be found on the R-project website (www.r-project.org/doc/bib/R-books.html).

The following book may be considered the milestone book about R: William N. Venables and Brian D. Ripley. *Modern Applied Statistics with S*. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0.

Other suggested books are:

- Everitt and Hothorn (2009). *A handbook of statistical analyses using R*. Chapman & Hall/CRC.
- Chambers (2008). *Software for Data Analysis*. Springer.
- Chambers (1998). *Programming with Data*. Springer.
- Murrell (2005). *R Graphics*. Chapman & Hall/CRC Press.
- Dalgaard (2002). *Introductory Statistics with R*. Springer.
- Kabacoff (2011). *R in Action*. Manning.
- Braun and Murdoch (2007). *A First Course in Statistical Programming with R*. Cambridge University Press.

Springer is developing a series of books called *Use R!*.