Using R functions
Mathematical Functions
The names of the basic functions and mathematical operators in R follow the conventions common to most programming languages. This section covers the functions and operators used to perform basic mathematical operations. R can also be used to perform more complex calculations, such as matrix operations or calculations with complex numbers.
Functions are usually applied to one or more vectors, in which case the operation is performed element by element. The vectors should be the same length; otherwise R recycles the shorter one.
x = 1:10
y = 11:20
z = -4:5
x + y + z
## [1] 8 11 14 17 20 23 26 29 32 35
exp(x)
## [1] 2.718 7.389 20.086 54.598 148.413 403.429 1096.633
## [8] 2980.958 8103.084 22026.466
log(z)
## Warning: NaNs produced
## [1] NaN NaN NaN NaN -Inf 0.0000 0.6931 1.0986 1.3863 1.6094
abs(z)
## [1] 4 3 2 1 0 1 2 3 4 5
sqrt(x)
## [1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000 3.162
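When the vectors do not have the same length, R does not stop with an error: it reuses (recycles) the elements of the shorter vector, and warns only if the longer length is not a multiple of the shorter one. The sketch below illustrates this with made-up vectors `a`, `b` and `d`, which are not part of the original example.

```r
# Hypothetical vectors, chosen only for illustration
a = 1:6
b = c(10, 20)           # length 2 divides length 6: recycled silently
d = c(10, 20, 30, 40)   # length 4 does not divide 6: recycled with a warning

a + b                   # b is reused as 10 20 10 20 10 20

# suppressWarnings() hides the warning R would otherwise print here
res = suppressWarnings(a + d)   # d is reused as 10 20 30 40 10 20
res
```

Silent recycling is convenient for operations such as `x * 2`, but it can hide mistakes when two long vectors were expected to match in length.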
The sum() function calculates the sum of all the elements of a vector.
sum(x)
## [1] 55
The floor(), ceiling(), trunc() and round() functions can be used to round a number. floor() returns a numeric vector containing the largest integers not greater than the corresponding input elements. ceiling() returns a numeric vector containing the smallest integers not less than the corresponding input elements. trunc() returns a numeric vector containing the integers formed by truncating the input values toward 0. round() rounds the values in its first argument to the specified number of decimal places (default 0).
floor(3.14)
## [1] 3
floor(3.67)
## [1] 3
ceiling(3.14)
## [1] 4
ceiling(3.67)
## [1] 4
trunc(3.14)
## [1] 3
trunc(3.67)
## [1] 3
round(3.14, digits = 1)
## [1] 3.1
round(3.19, digits = 1)
## [1] 3.2
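With positive numbers, floor() and trunc() return the same result, so the difference between them only shows up with negative values. A minimal sketch, using the arbitrarily chosen value -3.67:

```r
v = -3.67    # arbitrary negative value, for illustration only

floor(v)     # largest integer not greater than v
ceiling(v)   # smallest integer not less than v
trunc(v)     # drops the fractional part, moving toward 0
round(v, digits = 1)
```

Here floor() moves away from zero while trunc() and ceiling() move toward it.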
Probabilistic Functions
Probabilistic functions in R fall into four categories:
- r* functions for generating random numbers;
- d* functions for calculating the value of the density function at a point;
- p* functions for calculating the cumulative distribution function;
- q* functions for calculating quantiles.
The asterisk stands for the distribution being used: norm for the normal distribution, t for Student’s t-distribution, binom for the binomial distribution, gamma for the gamma distribution, beta for the beta distribution, weibull for the Weibull distribution, etc. R includes numerous statistical distributions. The list of all the probability distributions included in base R can be obtained by typing help(Distributions). Other probability distributions become available when additional packages are loaded.
The following functions:
rnorm(n = 10)
## [1] -1.41236 -0.76566 0.18043 0.07248 0.68410 -0.06415 -0.50363
## [8] -2.83625 -0.17094 0.01284
rnorm(n = 20, mean = 3, sd = 5)
## [1] 7.58407 7.01845 5.32963 5.53594 4.49414 8.08674 -3.86649
## [8] 6.56173 -2.86031 -3.68357 4.55096 3.72336 7.77772 -7.12217
## [15] 9.44867 0.04972 7.44873 6.23496 0.79544 -1.91693
rbinom(n = 50, size = 20, prob = 0.8)
## [1] 12 16 16 14 17 18 12 17 18 17 17 16 16 16 16 15 12 13 17 13 15 16 18
## [24] 17 16 12 13 17 13 15 13 13 15 15 18 16 19 17 15 19 18 12 15 14 17 14
## [47] 17 15 16 15
rweibull(n = 30, shape = 5, scale = 3)
## [1] 1.328 2.991 3.018 3.112 2.918 2.492 2.640 2.954 3.316 2.061 3.025
## [12] 3.008 2.747 2.686 2.041 3.711 3.233 3.156 1.809 3.218 1.448 2.144
## [23] 2.668 2.958 1.559 3.035 0.804 2.080 2.026 3.428
generate, respectively:
- 10 pseudorandom values from a normal distribution with mean 0 and standard deviation 1;
- 20 pseudorandom values from a normal distribution with mean 3 and standard deviation 5;
- 50 pseudorandom values from a binomial distribution with \(n = 20\) and \(\pi = 0.8\);
- 30 pseudorandom values from a Weibull distribution with shape 5 and scale 3.
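Because these values are pseudorandom, each run returns different numbers. If reproducible draws are needed, the state of the generator can be fixed with set.seed() before sampling. The sketch below shows the idea; the seed value 123 and the names `first` and `second` are arbitrary choices made for illustration.

```r
set.seed(123)            # arbitrary seed, chosen only for illustration
first = rnorm(n = 5)

set.seed(123)            # resetting the same seed...
second = rnorm(n = 5)    # ...reproduces exactly the same draws

identical(first, second)
```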
The following functions:
dbinom(x = 20, size = 20, prob = 0.8)
## [1] 0.01153
dnorm(x = -5:5, mean = 0, sd = 1)
## [1] 1.487e-06 1.338e-04 4.432e-03 5.399e-02 2.420e-01 3.989e-01 2.420e-01
## [8] 5.399e-02 4.432e-03 1.338e-04 1.487e-06
calculate, respectively:
- the probability that x is equal to 20, if x follows a binomial distribution with \(n = 20\) and \(\pi = 0.8\);
- the values of the density function of a standard normal distribution at the integers from -5 to 5. As expected, the highest value is obtained at 0.
The following functions:
pnorm(q = 0, mean = 0, sd = 1)
## [1] 0.5
pbinom(q = 20, size = 20, prob = 0.8)
## [1] 1
calculate, respectively:
- the value of the cumulative distribution function of a standard normal distribution at zero; as expected, the result is 0.5;
- the value of the cumulative distribution function of a binomial distribution with parameters \(n = 20\) and \(\pi = 0.8\) at 20; as expected, the result is 1.
The following functions:
qnorm(p = 0.5, mean = 0, sd = 1)
## [1] 0
qbinom(p = 0.5, size = 20, prob = 0.8)
## [1] 16
calculate, respectively:
- the quantile that leaves a probability of 0.5 to its left in a standard normal distribution;
- the quantile that leaves a probability of at least 0.5 to its left in a binomial distribution with parameters \(n = 20\) and \(\pi = 0.8\), that is, the smallest value whose cumulative probability reaches 0.5.
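The p* and q* functions are linked: for a continuous distribution they are inverses of each other, while for a discrete distribution q* returns the smallest value whose cumulative probability reaches p. The sketch below checks both cases with the distributions used above; the point 1.5 is an arbitrary choice.

```r
# Continuous case: qnorm() undoes pnorm()
q = 1.5                                    # arbitrary point, for illustration
p = pnorm(q, mean = 0, sd = 1)
all.equal(qnorm(p, mean = 0, sd = 1), q)

# Discrete case: 16 is the smallest value with cumulative probability >= 0.5
pbinom(15, size = 20, prob = 0.8) < 0.5
pbinom(16, size = 20, prob = 0.8) >= 0.5
```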
Statistical Functions
Most common statistical analyses can be performed in R using the built-in functions of the base version or the numerous additional packages. The functions for calculating the main descriptive statistics are explained below.
The mean(), median(), sd() and var() functions are used to calculate the mean, the median, the sample standard deviation and the sample variance of a numeric vector.
x = mtcars$mpg
mean(x)
## [1] 20.09
median(x)
## [1] 19.2
sd(x)
## [1] 6.027
var(x)
## [1] 36.32
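Both sd() and var() use the sample denominator \(n - 1\) rather than \(n\). The sketch below recomputes them by hand to make this explicit; `n`, `m` and `manual_var` are names introduced here for illustration.

```r
x = mtcars$mpg
n = length(x)
m = sum(x) / n                          # arithmetic mean
manual_var = sum((x - m)^2) / (n - 1)   # sample variance, n - 1 denominator

all.equal(manual_var, var(x))           # same value as var()
all.equal(sqrt(manual_var), sd(x))      # sd() is the square root of var()
```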
The quantile() function calculates one or more quantiles.
quantile(x, .9)
##   90%
## 30.09
quantile(x, c(.3, .84))
##   30%   84%
## 15.98 26.05
quantile(x, c(.25, .50, .75))
##   25%   50%   75%
## 15.43 19.20 22.80
The min() and max() functions return the minimum and maximum value respectively.
min(x)
## [1] 10.4
max(x)
## [1] 33.9
The generic summary() function, applied to a numeric vector, returns the minimum, the maximum, the quartiles and the arithmetic mean.
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    10.4    15.4    19.2    20.1    22.8    33.9
Correlation and covariance can be calculated with the cor() and cov() functions, respectively.
data = mtcars[, c(1, 3, 4, 5, 6)]
cor(data)
##          mpg    disp      hp    drat      wt
## mpg   1.0000 -0.8476 -0.7762  0.6812 -0.8677
## disp -0.8476  1.0000  0.7909 -0.7102  0.8880
## hp   -0.7762  0.7909  1.0000 -0.4488  0.6587
## drat  0.6812 -0.7102 -0.4488  1.0000 -0.7124
## wt   -0.8677  0.8880  0.6587 -0.7124  1.0000
cov(data)
##           mpg     disp       hp     drat       wt
## mpg    36.324  -633.10  -320.73   2.1951  -5.1167
## disp -633.097 15360.80  6721.16 -47.0640 107.6842
## hp   -320.732  6721.16  4700.87 -16.4511  44.1927
## drat    2.195   -47.06   -16.45   0.2859  -0.3727
## wt     -5.117   107.68    44.19  -0.3727   0.9574
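The two matrices are related: a correlation is a covariance divided by the product of the two standard deviations. As a sketch of that relationship, the correlation between mpg and wt can be rebuilt from cov() and sd(), and the whole matrix can be rescaled with the base function cov2cor(). The name `r_manual` is introduced here for illustration.

```r
data = mtcars[, c(1, 3, 4, 5, 6)]

# Correlation of one pair, rebuilt from covariance and standard deviations
r_manual = cov(data$mpg, data$wt) / (sd(data$mpg) * sd(data$wt))
all.equal(r_manual, cor(data$mpg, data$wt))

# Whole matrix at once: cov2cor() rescales a covariance matrix
all.equal(cov2cor(cov(data)), cor(data))
```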
String Manipulation in R
Base R provides a wide set of functions for manipulating character strings.
Some of the most useful ones for working with character vectors are shown below. R also has numerous other, sometimes complex, tools for manipulating characters, for example regular expressions.
Consider a vector containing the names of some cars in the F1 championship.
load("charmanip.Rda")
carnames
## [1] "Red Bull Racing Renault" "McLaren Mercedes"
## [3] "Ferrari" "Mercedes"
## [5] "Renault" "Force India Mercedes"
## [7] "Sauber Ferrari" "Scuderia Toro Rosso Ferrari"
The nchar() function returns the number of characters contained in a text string.
nchar("My auto is dark grey")
## [1] 20
nchar(carnames)
## [1] 23 16 7 8 7 20 14 27
The tolower() and toupper() functions change all the characters in a string to lower case and upper case respectively.
tolower(carnames)
## [1] "red bull racing renault" "mclaren mercedes"
## [3] "ferrari" "mercedes"
## [5] "renault" "force india mercedes"
## [7] "sauber ferrari" "scuderia toro rosso ferrari"
toupper(carnames)
## [1] "RED BULL RACING RENAULT" "MCLAREN MERCEDES"
## [3] "FERRARI" "MERCEDES"
## [5] "RENAULT" "FORCE INDIA MERCEDES"
## [7] "SAUBER FERRARI" "SCUDERIA TORO ROSSO FERRARI"
The paste() function joins two or more strings, using the sep parameter as separator (a single space by default). It is useful, for example, for inserting the result of a numerical operation into a string.
ret = 4 + 2 + 1 + 5 + 4 + 1 + 5 + 7 + 8 + 4
paste("Scuderia Toro Rosso Ferrari scored", ret, "points in 2011")
## [1] "Scuderia Toro Rosso Ferrari scored 41 points in 2011"
paste("Scuderia Toro Rosso score (2011)", ret, sep = ": ")
## [1] "Scuderia Toro Rosso score (2011): 41"
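paste() also accepts a collapse argument, which joins the elements of a single vector into one string, and paste0() is a shortcut for paste() with sep = "". A small sketch of both follows; the vector `teams` is a made-up subset used only for illustration.

```r
teams = c("Ferrari", "Mercedes", "Renault")   # illustrative subset

paste(teams, collapse = ", ")   # one string, elements joined by ", "
paste0("Team", 1:3)             # same as paste("Team", 1:3, sep = "")
paste("Team", 1:3, sep = "-")   # paste() is vectorized over its arguments
```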
The substring() function extracts part of a string. The extraction is controlled by the first and last parameters, which give the positions of the first and last characters to be extracted.
substring(carnames, first = 1, last = 3)
## [1] "Red" "McL" "Fer" "Mer" "Ren" "For" "Sau" "Scu"
The strsplit() function splits a string into several elements. The split parameter defines the separator at which the string is cut. The substrings obtained from each input string are stored in a vector, and these vectors are in turn stored in a list. To obtain a single vector with all the substrings as components, apply the unlist() function to that list.
carlist = strsplit(carnames, " ")
head(carlist)
## [[1]]
## [1] "Red" "Bull" "Racing" "Renault"
##
## [[2]]
## [1] "McLaren" "Mercedes"
##
## [[3]]
## [1] "Ferrari"
##
## [[4]]
## [1] "Mercedes"
##
## [[5]]
## [1] "Renault"
##
## [[6]]
## [1] "Force" "India" "Mercedes"
unlist(carlist)
## [1] "Red" "Bull" "Racing" "Renault" "McLaren" "Mercedes"
## [7] "Ferrari" "Mercedes" "Renault" "Force" "India" "Mercedes"
## [13] "Sauber" "Ferrari" "Scuderia" "Toro" "Rosso" "Ferrari"
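Because strsplit() returns a list with one element per input string, the pieces can also be processed name by name instead of being flattened with unlist(). The sketch below counts the words in each name and extracts the last word; carnames is rebuilt by hand from the output printed earlier, since the file charmanip.Rda is not assumed to be available.

```r
# carnames rebuilt by hand from the printed output
carnames = c("Red Bull Racing Renault", "McLaren Mercedes", "Ferrari",
             "Mercedes", "Renault", "Force India Mercedes",
             "Sauber Ferrari", "Scuderia Toro Rosso Ferrari")
carlist = strsplit(carnames, " ")

lengths(carlist)                             # number of words in each name
sapply(carlist, function(w) w[length(w)])    # last word of each name
```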
The sub() function replaces part of a string. Its parameters are pattern, which contains the substring to be replaced; replacement, which contains the substring that will replace it; and x, which contains the string on which the replacement is performed.
x = "basic statistics course"
sub(pattern = "basic", replacement = "advanced", x = x)
## [1] "advanced statistics course"
The sub() function replaces only the first match. If more than one occurrence has to be replaced, use the gsub() function. The following example removes the spaces contained in the string.
x = "Basic Statistics Course With R"
sub(pattern = " ", replacement = "", x = x)
## [1] "BasicStatistics Course With R"
gsub(pattern = " ", replacement = "", x = x)
## [1] "BasicStatisticsCourseWithR"
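In both functions, pattern is interpreted as a regular expression by default, which matters when the text to be replaced contains special characters such as a dot. The sketch below uses a hypothetical file name `f` to show the difference; setting fixed = TRUE switches to literal matching.

```r
f = "report.final.csv"   # hypothetical string, for illustration

gsub(pattern = ".", replacement = "_", x = f)                # "." matches any character
gsub(pattern = ".", replacement = "_", x = f, fixed = TRUE)  # literal dot only

# A regular expression can also collapse runs of spaces into one
gsub(pattern = " +", replacement = " ", x = "Basic   Statistics  Course")
```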
The match() function returns, for each element of x, the position of the first element of table that matches it exactly. It can be used to compare a character vector with a single string.
match(x = "McLaren Mercedes", table = carnames)
## [1] 2
match(x = carnames, table = "McLaren Mercedes")
## [1] NA 1 NA NA NA NA NA NA
match(x = carnames, table = "McLaren Mercedes", nomatch = 0)
## [1] 0 1 0 0 0 0 0 0
Only the first match in table is reported. If the carnames vector in the example above contained another “McLaren Mercedes” element after position 2, the first call would still return 2.
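A short sketch of this first-match behaviour, using a hypothetical vector `dup` that contains a repeated element. The base R operator %in%, which is built on match(), returns TRUE or FALSE instead of positions.

```r
# Made-up vector with a duplicated element, for illustration only
dup = c("Ferrari", "McLaren Mercedes", "Renault", "McLaren Mercedes")

match(x = "McLaren Mercedes", table = dup)   # position of the first occurrence only
which(dup == "McLaren Mercedes")             # all positions
dup %in% "McLaren Mercedes"                  # logical result for each element of dup
```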
The match() function does not recognize substrings; for that, the pmatch() function is used. pmatch() looks for partial matches at the beginning of a string:
match(x = "Scuderia Toro", table = carnames)
## [1] NA
pmatch(x = "Scuderia Toro", table = carnames)
## [1] 8
pmatch(x = "Toro Rosso", table = carnames)
## [1] NA
The result of the function is NA if x partially matches more than one element of the vector contained in table.
pmatch(x = "Re", table = carnames)
## [1] NA
pmatch(x = "Red", table = carnames)
## [1] 1
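When several elements share the same prefix, pmatch() returns NA rather than choosing one of them. If all of them are wanted, one option in base R is startsWith(); this is an alternative to pmatch(), not a variant of it. In the sketch below, carnames is rebuilt by hand from the output printed earlier.

```r
# carnames rebuilt by hand from the printed output
carnames = c("Red Bull Racing Renault", "McLaren Mercedes", "Ferrari",
             "Mercedes", "Renault", "Force India Mercedes",
             "Sauber Ferrari", "Scuderia Toro Rosso Ferrari")

pmatch(x = "Re", table = carnames)     # NA: two elements start with "Re"
which(startsWith(carnames, "Re"))      # positions of every element with that prefix
```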
Use the grep() function to search for a pattern inside the elements of a character vector; it returns the positions of the elements that contain the pattern.
pos = grep(pattern = "Mercedes", x = carnames)
pos
## [1] 2 4 6
carnames[pos]
## [1] "McLaren Mercedes" "Mercedes" "Force India Mercedes"
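grep() has a few closely related forms: value = TRUE returns the matching strings directly, grepl() returns a logical vector, and ignore.case = TRUE makes the search case-insensitive. A sketch of the three follows, with carnames rebuilt by hand from the output printed earlier.

```r
# carnames rebuilt by hand from the printed output
carnames = c("Red Bull Racing Renault", "McLaren Mercedes", "Ferrari",
             "Mercedes", "Renault", "Force India Mercedes",
             "Sauber Ferrari", "Scuderia Toro Rosso Ferrari")

grep(pattern = "Mercedes", x = carnames, value = TRUE)        # matching elements, not positions
grepl(pattern = "Mercedes", x = carnames)                     # TRUE/FALSE for each element
grep(pattern = "ferrari", x = carnames, ignore.case = TRUE)   # case-insensitive search
```

The logical form is convenient for filtering, for example `carnames[grepl("Mercedes", carnames)]`.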
Summary
In this chapter, we introduced the main mathematical, probabilistic and statistical functions in R, and we manipulated strings using the character manipulation functions. In the next chapter, we’ll explore the graphical capabilities of R.