Using R functions

Mathematical Functions

The names of the basic functions and mathematical operators in R follow the conventions of most programming languages. This section deals with the functions and operators used to perform basic mathematical operations. R can also be used to perform more complex calculations, such as matrix operations or calculations with complex numbers.

Functions are usually applied to one or more vectors. In this case, operations are performed element by element. The vectors should have the same length; if they do not, R recycles the elements of the shorter vector.

x = 1:10
y = 11:20
z = -4:5
x + y + z

##  [1]  8 11 14 17 20 23 26 29 32 35

exp(x)

##  [1]     2.718     7.389    20.086    54.598   148.413   403.429  1096.633
##  [8]  2980.958  8103.084 22026.466

log(z)

## Warning: NaNs produced

##  [1]    NaN    NaN    NaN    NaN   -Inf 0.0000 0.6931 1.0986 1.3863 1.6094

abs(z)

##  [1] 4 3 2 1 0 1 2 3 4 5

sqrt(x)

##  [1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000 3.162
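The examples above use vectors of equal length. As a minimal sketch of what happens otherwise, the following code adds two vectors of different lengths; the names a, b, long and short are illustrative and do not appear in the original examples.

```r
# Element-wise addition with vectors of different lengths:
# R recycles the shorter vector until it matches the longer one.
a <- 1:6
b <- c(10, 20)
recycled <- a + b            # b is reused as 10 20 10 20 10 20
recycled                     # 11 22 13 24 15 26

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning.
long  <- 1:4
short <- 1:3
partial <- suppressWarnings(long + short)
partial                      # 2 4 6 5
```

Silent recycling is convenient with scalars (for example x * 2), but with longer vectors it can hide mistakes, so checking length() first is often worthwhile.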

The sum() function calculates the sum of all the elements of a vector.

sum(x)

## [1] 55

The floor, ceiling, trunc and round functions can be used to round a number. floor() returns a numeric vector containing the largest integers not greater than the corresponding input elements. ceiling() returns a numeric vector containing the smallest integers not less than the corresponding input elements. trunc() returns a numeric vector containing the integers formed by truncating the input values toward 0. round() rounds the values in its first argument to the specified number of decimal places (default 0).

floor(3.14)

## [1] 3

floor(3.67)

## [1] 3

ceiling(3.14)

## [1] 4

ceiling(3.67)

## [1] 4

trunc(3.14)

## [1] 3

trunc(3.67)

## [1] 3

round(3.14, digits = 1)

## [1] 3.1

round(3.19, digits = 1)

## [1] 3.2
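For positive numbers floor() and trunc() give the same result, so the difference between them only shows up with negative values. A short sketch, using an illustrative vector v that is not part of the original examples:

```r
# Compare the four rounding functions on negative and positive values.
v <- c(-3.67, -3.14, 3.14, 3.67)

floor(v)     # -4 -4  3  3  (towards minus infinity)
ceiling(v)   # -3 -3  4  4  (towards plus infinity)
trunc(v)     # -3 -3  3  3  (towards zero)
round(v)     # -4 -3  3  4  (to the nearest integer)
```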

Probabilistic Functions

Probabilistic functions in R fall into four categories:

r* functions for generating random numbers,
d* functions for calculating the value of the density function (or, for discrete distributions, the probability mass function) at a point,
p* functions for calculating the cumulative distribution function,
q* functions for calculating quantiles.

The asterisk stands for the name of the distribution: norm for the normal distribution, t for Student’s t-distribution, binom for the binomial distribution, gamma for the gamma distribution, beta for the beta distribution, weibull for the Weibull distribution, and so on. R includes numerous statistical distributions. The list of all the probability distributions included in base R can be obtained by typing help(Distributions). Other probability distributions become available when additional packages are loaded.

The following functions:

rnorm(n = 10)

##  [1] -1.41236 -0.76566  0.18043  0.07248  0.68410 -0.06415 -0.50363
##  [8] -2.83625 -0.17094  0.01284

rnorm(n = 20, mean = 3, sd = 5)

##  [1]  7.58407  7.01845  5.32963  5.53594  4.49414  8.08674 -3.86649
##  [8]  6.56173 -2.86031 -3.68357  4.55096  3.72336  7.77772 -7.12217
## [15]  9.44867  0.04972  7.44873  6.23496  0.79544 -1.91693

rbinom(n = 50, size = 20, prob = 0.8)

##  [1] 12 16 16 14 17 18 12 17 18 17 17 16 16 16 16 15 12 13 17 13 15 16 18
## [24] 17 16 12 13 17 13 15 13 13 15 15 18 16 19 17 15 19 18 12 15 14 17 14
## [47] 17 15 16 15

rweibull(n = 30, shape = 5, scale = 3)

##  [1] 1.328 2.991 3.018 3.112 2.918 2.492 2.640 2.954 3.316 2.061 3.025
## [12] 3.008 2.747 2.686 2.041 3.711 3.233 3.156 1.809 3.218 1.448 2.144
## [23] 2.668 2.958 1.559 3.035 0.804 2.080 2.026 3.428

generate, respectively:

10 pseudorandom values from a standard normal distribution (mean 0, standard deviation 1);
20 pseudorandom values from a normal distribution with mean 3 and standard deviation 5;
50 pseudorandom values from a binomial distribution with $n = 20$ and $\pi = 0.8$;
30 pseudorandom values from a Weibull distribution with shape 5 and scale 3.
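Because the values are pseudorandom, rerunning these calls gives different numbers each time. A minimal sketch of how the draws can be made reproducible with set.seed(); the seed value 123 and the variable names are arbitrary choices made for this illustration.

```r
# Fixing the seed makes a sequence of pseudorandom draws reproducible.
set.seed(123)               # 123 is an arbitrary seed
first <- rnorm(n = 5)

set.seed(123)               # resetting the seed replays the same sequence
second <- rnorm(n = 5)

identical(first, second)    # TRUE

# Without resetting the seed, the generator simply continues the sequence.
third <- rnorm(n = 5)
identical(second, third)    # FALSE
```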

The following functions:

dbinom(x = 20, size = 20, prob = 0.8)

## [1] 0.01153

dnorm(x = -5:5, mean = 0, sd = 1)

##  [1] 1.487e-06 1.338e-04 4.432e-03 5.399e-02 2.420e-01 3.989e-01 2.420e-01
##  [8] 5.399e-02 4.432e-03 1.338e-04 1.487e-06

calculate, respectively:

the probability that x is equal to 20, if x follows a binomial distribution with $n = 20$ and $\pi = 0.8$;
the values of the density function of a standard normal distribution at the integers from -5 to 5. As expected, the highest value is obtained at 0.

The following functions:

pnorm(q = 0, mean = 0, sd = 1)

## [1] 0.5

pbinom(q = 20, size = 20, prob = 0.8)

## [1] 1

calculate, respectively:

the value of the cumulative distribution function of a standard normal distribution at zero; as expected, the result is 0.5;
the value of the cumulative distribution function of a binomial distribution with parameters $n = 20$ and $\pi = 0.8$ at 20; as expected, the result is 1, because 20 is the largest value the variable can take.
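For a discrete distribution, the cumulative distribution function is the running sum of the probabilities returned by the d* function. The sketch below checks this for the binomial example; the variable names size, prob, mass and cdf are illustrative.

```r
# For a discrete distribution the p* function is the cumulative sum
# of the values returned by the d* function.
size <- 20
prob <- 0.8

mass <- dbinom(x = 0:size, size = size, prob = prob)   # P(X = 0), ..., P(X = 20)
cdf  <- pbinom(q = 0:size, size = size, prob = prob)   # P(X <= 0), ..., P(X <= 20)

all.equal(sum(mass), 1)        # the probabilities add up to 1
all.equal(cumsum(mass), cdf)   # the running sum reproduces pbinom()
```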

The following functions:

qnorm(p = 0.5, mean = 0, sd = 1)

## [1] 0

qbinom(p = 0.5, size = 20, prob = 0.8)

## [1] 16

calculate, respectively:

the quantile that leaves a probability of 0.5 to its left in a standard normal distribution;
the quantile that leaves a probability of at least 0.5 to its left in a binomial distribution with parameters $n = 20$ and $\pi = 0.8$.
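The q* functions invert the p* functions. A short sketch of that relationship, assuming the same distributions as above; the names probs, quants and k are introduced only for this illustration.

```r
# For a continuous distribution, q* and p* are exact inverses.
probs  <- c(0.025, 0.5, 0.975)
quants <- qnorm(p = probs, mean = 0, sd = 1)
all.equal(pnorm(q = quants, mean = 0, sd = 1), probs)   # TRUE

# For a discrete distribution, q* returns the smallest value whose
# cumulative probability reaches p.
k <- qbinom(p = 0.5, size = 20, prob = 0.8)
pbinom(q = k - 1, size = 20, prob = 0.8) < 0.5    # TRUE: one step lower is not enough
pbinom(q = k,     size = 20, prob = 0.8) >= 0.5   # TRUE
```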

Statistical Functions

A wide range of statistical analyses can be performed in R thanks to the built-in functions of the base version and the numerous additional packages. The functions for calculating the main descriptive statistics are explained below.

The mean(), median(), sd() and var() functions are used to calculate the mean, the median, the sample standard deviation and the sample variance of a numeric vector.

x = mtcars$mpg
mean(x)

## [1] 20.09

median(x)

## [1] 19.2

sd(x)

## [1] 6.027

var(x)

## [1] 36.32
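The word "sample" matters here: var() divides by n - 1 rather than n, and sd() is the square root of that quantity. A minimal check on the same data; manual_var and n are illustrative names.

```r
# var() uses the n - 1 denominator; sd() is the square root of var().
x <- mtcars$mpg                       # mtcars is shipped with R (datasets package)
n <- length(x)

manual_var <- sum((x - mean(x))^2) / (n - 1)

all.equal(manual_var, var(x))         # TRUE
all.equal(sqrt(manual_var), sd(x))    # TRUE
```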

The quantile() function calculates one or more quantiles.

quantile(x, .9)

##   90%
## 30.09

quantile(x, c(.3, .84))

##   30%   84%
## 15.98 26.05

quantile(x, c(.25, .50, .75))

##   25%   50%   75%
## 15.43 19.20 22.80
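The quantiles are consistent with the other summary functions: the 50% quantile is the median, and the 0% and 100% quantiles are the minimum and maximum. A brief sketch, using the probs and names arguments of quantile(); q50 and quartiles are illustrative names.

```r
x <- mtcars$mpg

# names = FALSE drops the "50%" label and returns a plain number.
q50 <- quantile(x, probs = 0.5, names = FALSE)
all.equal(q50, median(x))                  # TRUE

# A sequence of probabilities gives the minimum, the quartiles and the maximum.
quartiles <- quantile(x, probs = seq(0, 1, by = 0.25))
names(quartiles)                           # "0%" "25%" "50%" "75%" "100%"
quartiles[["0%"]] == min(x)                # TRUE
quartiles[["100%"]] == max(x)              # TRUE
```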

The min() and max() functions return the minimum and maximum value respectively.

min(x)

## [1] 10.4

max(x)

## [1] 33.9

The summary() generic function applied to a numeric vector returns minimum, maximum, quartiles and arithmetic mean.

summary(x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    10.4    15.4    19.2    20.1    22.8    33.9

Correlation and covariance can be calculated with the cor() and cov() functions, respectively.

data = mtcars[, c(1, 3, 4, 5, 6)]
cor(data)

##          mpg    disp      hp    drat      wt
## mpg   1.0000 -0.8476 -0.7762  0.6812 -0.8677
## disp -0.8476  1.0000  0.7909 -0.7102  0.8880
## hp   -0.7762  0.7909  1.0000 -0.4488  0.6587
## drat  0.6812 -0.7102 -0.4488  1.0000 -0.7124
## wt   -0.8677  0.8880  0.6587 -0.7124  1.0000

cov(data)

##           mpg     disp      hp     drat       wt
## mpg    36.324  -633.10 -320.73   2.1951  -5.1167
## disp -633.097 15360.80 6721.16 -47.0640 107.6842
## hp   -320.732  6721.16 4700.87 -16.4511  44.1927
## drat    2.195   -47.06  -16.45   0.2859  -0.3727
## wt     -5.117   107.68   44.19  -0.3727   0.9574
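The two matrices are linked: a correlation is a covariance divided by the product of the two standard deviations. A minimal sketch of that relationship; cars_sub and manual_cor are illustrative names, and the columns are selected by name rather than by position.

```r
# Same five columns as above, selected by name.
cars_sub <- mtcars[, c("mpg", "disp", "hp", "drat", "wt")]

# Correlation of one pair computed by hand from the covariance.
manual_cor <- cov(cars_sub$mpg, cars_sub$wt) /
  (sd(cars_sub$mpg) * sd(cars_sub$wt))
all.equal(manual_cor, cor(cars_sub$mpg, cars_sub$wt))   # TRUE

# cov2cor() rescales a whole covariance matrix into a correlation matrix.
all.equal(cov2cor(cov(cars_sub)), cor(cars_sub))        # TRUE
```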

String Manipulation in R

Base R provides a wide set of functions for manipulating character strings.

Some of the most useful ones for working with character vectors are shown below. R also has many other, sometimes more complex, tools for manipulating characters, for example regular expressions.

Consider the vector with the names of some cars of the F1 championship.

load("charmanip.Rda")
carnames

## [1] "Red Bull Racing Renault"     "McLaren Mercedes"
## [3] "Ferrari"                     "Mercedes"
## [5] "Renault"                     "Force India Mercedes"
## [7] "Sauber Ferrari"              "Scuderia Toro Rosso Ferrari"
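The charmanip.Rda file is not reproduced here. Assuming it contains only the character vector printed above, the same object can be recreated by hand so that the following examples can be run without the file:

```r
# Recreate the carnames vector from the printed output
# (assumption: charmanip.Rda holds nothing else that is needed).
carnames <- c("Red Bull Racing Renault", "McLaren Mercedes",
              "Ferrari", "Mercedes",
              "Renault", "Force India Mercedes",
              "Sauber Ferrari", "Scuderia Toro Rosso Ferrari")

length(carnames)   # 8
```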

The nchar() function returns the number of characters contained in a text string.

nchar("My auto is dark grey")

## [1] 20

nchar(carnames)

## [1] 23 16  7  8  7 20 14 27

The tolower() and toupper() functions convert all the characters in a string to lower case and upper case, respectively.

tolower(carnames)

## [1] "red bull racing renault"     "mclaren mercedes"
## [3] "ferrari"                     "mercedes"
## [5] "renault"                     "force india mercedes"
## [7] "sauber ferrari"              "scuderia toro rosso ferrari"

toupper(carnames)

## [1] "RED BULL RACING RENAULT"     "MCLAREN MERCEDES"
## [3] "FERRARI"                     "MERCEDES"
## [5] "RENAULT"                     "FORCE INDIA MERCEDES"
## [7] "SAUBER FERRARI"              "SCUDERIA TORO ROSSO FERRARI"

The paste() function joins two or more strings, using the value of the sep argument (a single space by default) as separator. This is useful, for example, for inserting the result of a numerical operation into a string.

ret = 4 + 2 + 1 + 5 + 4 + 1 + 5 + 7 + 8 + 4
paste("Scuderia Toro Rosso Ferrari scored", ret, "points in 2011")

## [1] "Scuderia Toro Rosso Ferrari scored 41 points in 2011"

paste("Scuderia Toro Rosso score (2011)", ret, sep = ": ")

## [1] "Scuderia Toro Rosso score (2011): 41"
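Two related tools are worth knowing alongside sep: the collapse argument, which joins the elements of a single vector into one string, and paste0(), which pastes without any separator. A short sketch; the vector words and the "car" labels are made up for this illustration.

```r
words <- c("Scuderia", "Toro", "Rosso")

# collapse joins the elements of one vector into a single string.
paste(words, collapse = " ")      # "Scuderia Toro Rosso"

# paste() is vectorised: the shorter argument is recycled.
paste("car", 1:3, sep = "_")      # "car_1" "car_2" "car_3"

# paste0() is paste() with sep = "".
paste0("car", 1:3)                # "car1" "car2" "car3"
```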

The substring() function extracts part of a string. The first and last arguments give the positions of the first and last characters to be extracted.

substring(carnames, first = 1, last = 3)

## [1] "Red" "McL" "Fer" "Mer" "Ren" "For" "Sau" "Scu"
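The positions do not have to be constants. As a sketch, combining substring() with nchar() extracts the last characters of strings of different lengths; the vector teams is a shortened, illustrative version of carnames.

```r
teams <- c("Ferrari", "Mercedes", "Renault")
len <- nchar(teams)

# Last three characters of each string: first and last are vectors here.
substring(teams, first = len - 2, last = len)   # "ari" "des" "ult"

# Omitting last extracts everything from first to the end of the string.
substring(teams, first = 5)                     # "ari" "edes" "ult"
```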

The strsplit() function splits a string into several elements. The split argument defines the separator at which the string is cut. The resulting substrings are stored in a vector, which in turn is stored inside a list (one list element per input string). To obtain a single vector with all the substrings as components, apply the unlist() function to the list.

carlist = strsplit(carnames, " ")
head(carlist)

## [[1]]
## [1] "Red"     "Bull"    "Racing"  "Renault"
##
## [[2]]
## [1] "McLaren"  "Mercedes"
##
## [[3]]
## [1] "Ferrari"
##
## [[4]]
## [1] "Mercedes"
##
## [[5]]
## [1] "Renault"
##
## [[6]]
## [1] "Force"    "India"    "Mercedes"

unlist(carlist)

##  [1] "Red"      "Bull"     "Racing"   "Renault"  "McLaren"  "Mercedes"
##  [7] "Ferrari"  "Mercedes" "Renault"  "Force"    "India"    "Mercedes"
## [13] "Sauber"   "Ferrari"  "Scuderia" "Toro"     "Rosso"    "Ferrari"
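unlist() discards the information about which word came from which string. When that link matters, the list can be processed element by element instead. A minimal sketch, using an illustrative three-element vector teams and helper names parts, first_word and last_word:

```r
teams <- c("Red Bull Racing Renault", "McLaren Mercedes", "Ferrari")
parts <- strsplit(teams, " ")

# Number of words in each original string.
lengths(parts)                                              # 4 2 1

# First and last word of each string, one result per input element.
first_word <- sapply(parts, function(p) p[1])
last_word  <- sapply(parts, function(p) p[length(p)])
first_word                                                  # "Red" "McLaren" "Ferrari"
last_word                                                   # "Renault" "Mercedes" "Ferrari"
```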

The sub() function replaces part of a string. Its main arguments are pattern, the substring (or regular expression) to be replaced; replacement, the substring that takes its place; and x, the string on which the replacement is performed.

x = "basic statistics course"
sub(pattern = "basic", replacement = "advanced", x = x)

## [1] "advanced statistics course"

The sub() function replaces only the first occurrence of the pattern. If more than one occurrence has to be replaced, use the gsub() function. In the following example sub() removes only the first space in the string, whereas gsub() removes all of them.

x = "Basic Statistics Course With R"
sub(pattern = " ", replacement = "", x = x)

## [1] "BasicStatistics Course With R"

gsub(pattern = " ", replacement = "", x = x)

## [1] "BasicStatisticsCourseWithR"
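By default, sub() and gsub() interpret pattern as a regular expression, which matters as soon as the pattern contains a special character such as a dot. A small sketch with an illustrative string; the variable name version is an assumption of this example.

```r
version <- "3.0.2"

# As a regular expression, "." matches any single character.
gsub(pattern = ".", replacement = "-", x = version)                  # "-----"

# fixed = TRUE treats the pattern as a literal string.
gsub(pattern = ".", replacement = "-", x = version, fixed = TRUE)    # "3-0-2"

# Escaping the dot has the same effect.
gsub(pattern = "\\.", replacement = "-", x = version)                # "3-0-2"
```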

The match() function looks up each element of x in the table vector and returns the position of its first exact match, or NA if there is none.

match(x = "McLaren Mercedes", table = carnames)

## [1] 2

match(x = carnames, table = "McLaren Mercedes")

## [1] NA  1 NA NA NA NA NA NA

match(x = carnames, table = "McLaren Mercedes", nomatch = 0)

## [1] 0 1 0 0 0 0 0 0

match() returns only the position of the first match. If the carnames vector in the example above contained another “McLaren Mercedes” element after position 2, the results would be the same.
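A small sketch of this behaviour with a vector that does contain a repeated element; the vector engines and the value "Honda" are made up for the illustration.

```r
engines <- c("Renault", "Mercedes", "Ferrari", "Mercedes")

# match() reports only the first position.
match(x = "Mercedes", table = engines)     # 2

# which() with == reports every position.
which(engines == "Mercedes")               # 2 4

# %in% only answers whether there is a match at all.
"Mercedes" %in% engines                    # TRUE
"Honda" %in% engines                       # FALSE
```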

The match() function does not recognize substrings; for partial matching, use the pmatch() function. pmatch() only looks for partial matches at the beginning of a string:

match(x = "Scuderia Toro", table = carnames)

## [1] NA

pmatch(x = "Scuderia Toro", table = carnames)

## [1] 8

pmatch(x = "Toro Rosso", table = carnames)

## [1] NA

The result of the function is NA if x partially matches more than one element of the vector passed as table.

pmatch(x = "Re", table = carnames)

## [1] NA

pmatch(x = "Red", table = carnames)

## [1] 1
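When every element that begins with a given prefix is wanted, rather than one unambiguous match, base R's startsWith() is one possible alternative. A minimal sketch with an illustrative three-element vector teams:

```r
teams <- c("Red Bull Racing Renault", "Ferrari", "Renault")

# pmatch() gives up when the prefix is ambiguous.
pmatch(x = "Re", table = teams)        # NA

# startsWith() returns one logical value per element,
# so which() can list all the positions that share the prefix.
startsWith(teams, "Re")                # TRUE FALSE TRUE
which(startsWith(teams, "Re"))         # 1 3
```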

Use the grep() function to search for a pattern inside a string.

pos = grep(pattern = "Mercedes", x = carnames)
pos

## [1] 2 4 6

carnames[pos]

## [1] "McLaren Mercedes"     "Mercedes"             "Force India Mercedes"
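grep() has a few arguments and a sibling function that often save the extra indexing step. A short sketch on an illustrative vector teams (a subset of the names used above):

```r
teams <- c("McLaren Mercedes", "Ferrari", "Mercedes", "Sauber Ferrari")

# grepl() returns a logical vector instead of positions.
grepl(pattern = "Mercedes", x = teams)                            # TRUE FALSE TRUE FALSE

# value = TRUE returns the matching elements themselves;
# ignore.case = TRUE makes the search case-insensitive.
grep(pattern = "ferrari", x = teams, ignore.case = TRUE, value = TRUE)
# "Ferrari" "Sauber Ferrari"

# The pattern is a regular expression: ^ and $ anchor it to the whole string.
grep(pattern = "^Mercedes$", x = teams)                           # 3
```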

Summary

In this chapter, we introduced the main mathematical, probabilistic and statistical functions in R and manipulated strings using the character manipulation functions. In the next chapter, we’ll explore the graphical capabilities of R.
