Using R functions

Mathematical Functions

The names of the basic functions and mathematical operators in R follow the conventions of most programming languages. This section covers the functions and operators used to perform basic mathematical operations. R can also perform more complex calculations, such as matrix operations or calculations with complex numbers.

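Functions are usually applied to one or more vectors. In this case, operations are performed element by element. The vectors should normally be the same length; if they are not, R recycles the elements of the shorter vector, and issues a warning when the longer length is not a multiple of the shorter one.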
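
As a minimal sketch of element-wise evaluation, the vectors x and y below are illustrative values that do not come from the original text:

```r
# Two illustrative numeric vectors of the same length
x <- c(1, 4, 9, 16)
y <- c(2, 2, 3, 4)

# Arithmetic operators work element by element
x + y      # 3 6 12 20
x / y      # 0.5 2 3 4

# Mathematical functions are applied to every element
sqrt(x)    # 1 2 3 4
```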

The sum() function calculates the sum of all the elements of a vector.
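
A short illustration with an arbitrary vector v (a name chosen here only for the example):

```r
# Illustrative vector; the values are arbitrary
v <- c(2, 5, 8, 10)

total <- sum(v)   # 2 + 5 + 8 + 10
total             # 25

# Missing values make the sum NA unless na.rm = TRUE is set
sum(c(v, NA))                 # NA
sum(c(v, NA), na.rm = TRUE)   # 25
```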

The floor, ceiling, trunc and round functions can be used to round a number. floor() returns a numeric vector containing the largest integers not greater than the corresponding input elements. ceiling() returns a numeric vector containing the smallest integers not less than the corresponding input elements. trunc() returns a numeric vector containing the integers formed by truncating the input values toward 0. round() rounds the values in its first argument to the specified number of decimal places (default 0).
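
The difference between the four functions is easiest to see on negative numbers. The vector z below is an illustrative choice, not taken from the original text:

```r
# Illustrative values, including negative numbers
z <- c(-2.7, -0.2, 1.4, 3.8)

floor(z)     # -3 -1  1  3
ceiling(z)   # -2  0  2  4
trunc(z)     # -2  0  1  3
round(z)     # -3  0  1  4

# The second argument of round() is the number of decimal places
round(3.14159, 2)   # 3.14
```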

Probabilistic Functions

Probabilistic functions in R fall into four categories:

  1. r* functions for generating random numbers,
  2. d* functions for calculating the value of the density function at a point,
  3. p* functions for calculating the cumulative distribution function,
  4. q* functions for calculating quantiles.

The asterisk stands for the distribution to be used: norm for the normal distribution, t for Student’s t-distribution, binom for the binomial distribution, gamma for the gamma distribution, beta for the beta distribution, weibull for the Weibull distribution, etc. R includes numerous statistical distributions. The list of all the probability distributions included in base R can be obtained by typing help(Distributions). Other probability distributions become available when additional packages are loaded.

The r* functions can be used to generate, for example:

  • 10 pseudorandom values from a normal distribution with parameters (0, 1);
  • 20 pseudorandom values from a normal distribution with parameters (3, 5);
  • 50 pseudorandom values from a binomial distribution with \(n = 20\) and \(\pi = 0.8\);
  • 50 pseudorandom values from a Weibull distribution with parameters (5, 3).
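
The four cases listed above could be written as in the following sketch. It assumes that the parameter pairs are (mean, standard deviation) for the normal distribution and (shape, scale) for the Weibull distribution, which is the argument order of rnorm() and rweibull(). The call to set.seed() and the variable names are additions made here so that the draws can be reproduced.

```r
set.seed(123)   # arbitrary seed, used only for reproducibility

n1 <- rnorm(10)                            # standard normal: mean 0, sd 1
n2 <- rnorm(20, mean = 3, sd = 5)          # assumed: mean 3, sd 5
b  <- rbinom(50, size = 20, prob = 0.8)    # binomial with n = 20, pi = 0.8
w  <- rweibull(50, shape = 5, scale = 3)   # assumed: shape 5, scale 3

length(n1); length(n2); length(b); length(w)   # 10 20 50 50
```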

The d* functions can be used to calculate, for example:

  • the probability that x is equal to 20, if x is distributed as a binomial distribution with \(n = 20\) and \(\pi = 0.8\);
  • the values of the density function of a standard normal distribution at the integers between -5 and 5. As expected, the highest value is obtained at 0.
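
A possible way to write these two calculations is sketched below; the variable name d is an illustrative choice:

```r
# P(X = 20) for X ~ Binomial(n = 20, pi = 0.8); this equals 0.8^20
dbinom(20, size = 20, prob = 0.8)

# Standard normal density at the integers from -5 to 5
d <- dnorm(-5:5)
d

# Position of the largest density value: the 6th element, i.e. x = 0
which.max(d)   # 6
```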

The p* functions can be used to calculate, for example:

  • the value of the cumulative distribution function of a standard normal distribution at zero; as expected, the result is 0.5;
  • the value of the cumulative distribution function of a binomial distribution with parameters \(n = 20\) and \(\pi = 0.8\) at 20; as expected, the result is 1.
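
These two values can be obtained with pnorm() and pbinom(), for instance as follows:

```r
# CDF of the standard normal distribution at 0
pnorm(0)                            # 0.5

# CDF of Binomial(n = 20, pi = 0.8) at 20, the largest possible value
pbinom(20, size = 20, prob = 0.8)   # 1
```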

The q* functions can be used to calculate, for example:

  • the quantile that leaves a probability of 0.5 to its left in a standard normal distribution;
  • the quantile that leaves a probability of 0.5 to its left in a binomial distribution with parameters \(n = 20\) and \(\pi = 0.8\).
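
A sketch of the corresponding calls is given below. For a discrete distribution, qbinom() returns the smallest value whose cumulative probability is at least the requested one.

```r
# Median of the standard normal distribution
qnorm(0.5)                           # 0

# Median of Binomial(n = 20, pi = 0.8)
qbinom(0.5, size = 20, prob = 0.8)   # 16
```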

Statistical Functions

A very wide range of statistical analyses can be performed in R thanks to the built-in functions of the base version and the numerous additional packages. The functions used to calculate the main descriptive statistics are described below.

The mean(), median(), sd() and var() functions are used to calculate the mean, the median, the sample standard deviation and the sample variance of a numeric vector.
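
As an illustration, the four functions can be applied to a small vector x whose values are made up for this example:

```r
# Illustrative data; the values are arbitrary
x <- c(12, 15, 11, 19, 23, 15, 17)

mean(x)     # 16
median(x)   # 15
var(x)      # 17, sample variance (denominator n - 1)
sd(x)       # square root of var(x), about 4.12
```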

The quantile() function calculates one or more quantiles.
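
The probs argument selects which quantiles are returned. A minimal sketch on an illustrative vector x:

```r
# Illustrative data; the values are arbitrary
x <- c(12, 15, 11, 19, 23, 15, 17)

# Default: minimum, the three quartiles and maximum
quantile(x)

# Specific quantiles, here the 10th and 90th percentiles
quantile(x, probs = c(0.1, 0.9))

# The 0.5 quantile coincides with the median
quantile(x, probs = 0.5)
```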

The min() and max() functions return the minimum and maximum value respectively.
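
For example, on an illustrative vector x:

```r
# Illustrative data; the values are arbitrary
x <- c(12, 15, 11, 19, 23, 15, 17)

min(x)   # 11
max(x)   # 23
```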

The generic summary() function, applied to a numeric vector, returns the minimum, the maximum, the quartiles (including the median) and the arithmetic mean.
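
A brief sketch on an illustrative vector x; the name s is chosen here only to hold the result:

```r
# Illustrative data; the values are arbitrary
x <- c(12, 15, 11, 19, 23, 15, 17)

s <- summary(x)
s

# The components are named
names(s)   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```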

Correlation and covariance can be calculated with the cor() and cov() functions, respectively.
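
As a sketch, consider two illustrative vectors h and w (hypothetical measurements invented for this example):

```r
# Hypothetical paired measurements
h <- c(150, 160, 170, 180, 190)
w <- c(52, 58, 69, 77, 84)

cov(h, w)   # sample covariance
cor(h, w)   # Pearson correlation (the default method)

# The correlation is the covariance scaled by both standard deviations
cov(h, w) / (sd(h) * sd(w))
```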

String Manipulation in R

Base R provides a wide set of functions for manipulating character strings.

Some of the functions that are most useful when working with character vectors are shown below. R also has many other, sometimes complex, tools for manipulating characters, for example regular expressions.

Consider the vector with the names of some cars of the F1 championship.
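
The original values of this vector are not shown in the text. The sketch below therefore uses a hypothetical carnames vector, with "McLaren Mercedes" in position 2 as the later discussion implies:

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")
carnames
```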

The nchar() function returns the number of characters contained in a text string.
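
Applied to a vector, nchar() returns one count per element; spaces are counted too. The carnames values below are hypothetical:

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")

nchar(carnames)   # 7 16 8 7
```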

The tolower() and toupper() functions change all the characters in a string to lower and upper cases respectively.
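
For instance, on a hypothetical carnames vector:

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")

tolower(carnames)   # "ferrari" "mclaren mercedes" "red bull" "renault"
toupper(carnames)   # "FERRARI" "MCLAREN MERCEDES" "RED BULL" "RENAULT"
```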

The paste() function merges two or more strings, using the value of the sep parameter as separator (a single space by default). Because paste() converts numbers to text, it is also useful for inserting the result of a numerical operation into a string.
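
A minimal sketch of both uses, with strings and numbers chosen only for illustration:

```r
# Default separator is a single space
paste("McLaren", "Mercedes")              # "McLaren Mercedes"

# Custom separator
paste("McLaren", "Mercedes", sep = "-")   # "McLaren-Mercedes"

# Inserting the result of a calculation into a string
paste("The mean is", mean(c(2, 4, 6)))    # "The mean is 4"
```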

The substring() function extracts part of a string. The first and last parameters give the positions of the first and last characters to be extracted.
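
For example, with an illustrative string:

```r
car <- "McLaren Mercedes"   # illustrative string

# Characters 1 to 7
substring(car, first = 1, last = 7)    # "McLaren"

# Characters 9 to 16
substring(car, first = 9, last = 16)   # "Mercedes"
```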

The strsplit() function divides a string into several elements. The string is split at each occurrence of the value given in the split parameter. The resulting segments are stored in a vector which, in turn, is stored inside a list. To obtain a plain vector of substrings, apply the unlist() function to the resulting list.
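
A short sketch, splitting an illustrative string at the space; the name parts is chosen here for the example:

```r
# Split at each space; the result is a list with one vector per input string
parts <- strsplit("McLaren Mercedes", split = " ")

class(parts)    # "list"
parts[[1]]      # "McLaren" "Mercedes"

# Flatten the list into a plain character vector
unlist(parts)   # "McLaren" "Mercedes"
```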

The sub() function replaces part of a string. Its parameters are pattern, which contains the substring to be replaced; replacement, which contains the substring that will replace it; and x, which contains the string on which the replacement is performed.
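
As an illustration (the replacement value "Honda" is an arbitrary choice for this example):

```r
# Replace the first occurrence of "Mercedes" in the string
sub(pattern = "Mercedes", replacement = "Honda", x = "McLaren Mercedes")
# "McLaren Honda"
```

Note that pattern is interpreted as a regular expression by default; fixed = TRUE can be passed to treat it as a literal string.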

The sub() function replaces only the first match. If there is more than one substring to be replaced, use the gsub() function, for example to remove all the spaces contained in a string.
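
The difference can be seen on an illustrative string s that contains several spaces:

```r
s <- "Red Bull Racing Team"   # illustrative string

# sub() removes only the first space
sub(" ", "", s)    # "RedBull Racing Team"

# gsub() removes every space
gsub(" ", "", s)   # "RedBullRacingTeam"
```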

The match() function looks up a string in a character vector and returns the position of the element that is exactly equal to it.

Only the first match in the vector is returned. For instance, if the carnames vector contained “McLaren Mercedes” at position 2 and again at a later position, match() would still return 2.
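
This behaviour can be checked with a hypothetical carnames vector (the values are illustrative):

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")

match("McLaren Mercedes", carnames)                          # 2

# A later duplicate does not change the result
match("McLaren Mercedes", c(carnames, "McLaren Mercedes"))   # 2

# Only whole elements are compared, so a substring gives NA
match("McLaren", carnames)                                   # NA
```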

The match() function does not recognize substrings. To search for a partial match at the beginning of a string, use the pmatch() function.

pmatch() returns NA if the substring x partially matches more than one element of the vector given in table.
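
A sketch with a hypothetical carnames vector in which two names start with "Re":

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")

# Unique partial match at the beginning of an element
pmatch("McL", carnames)        # 2

# Ambiguous: both "Red Bull" and "Renault" start with "Re"
pmatch("Re", carnames)         # NA

# Not at the beginning of any element
pmatch("Mercedes", carnames)   # NA
```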

Use the grep() function to search for a pattern inside a string.
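
Unlike match() and pmatch(), grep() finds the pattern anywhere in each element and returns the indices of all matching elements. A minimal sketch with a hypothetical carnames vector:

```r
# Hypothetical vector of car names (illustrative values)
carnames <- c("Ferrari", "McLaren Mercedes", "Red Bull", "Renault")

# Indices of the elements containing the pattern
grep("Mercedes", carnames)           # 2
grep("Re", carnames)                 # 3 4

# value = TRUE returns the matching elements instead of their indices
grep("Re", carnames, value = TRUE)   # "Red Bull" "Renault"
```

The pattern is a regular expression and the search is case-sensitive unless ignore.case = TRUE is set.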

Summary

In this chapter, we introduced the main mathematical, probabilistic and statistical functions in R. We also manipulated strings using the character manipulation functions. In the next chapter, we’ll explore the graphical capabilities of R.