Tables with R
The istat data set contains gender, geographical area, weight and height for 1806 people. A frequency table for gender or geographical area can be obtained with the table() function.
    load("istat.Rda")
    head(istat)

    ##   Gender Area Weight Height
    ## 1 Female Nord     64    172
    ## 2 Female Nord     57    167
    ## 3   Male Nord     68    175
    ## 4   Male Nord     64    173
    ## 5   Male Nord     70    172
    ## 6   Male Nord     60    170
    table(istat$Gender)

    ##
    ## Female   Male
    ##    908    898
    table(istat$Area)

    ##
    ## Centro  Isole   Nord    Sud
    ##    388    214    738    466
A contingency table of gender by geographical area can be built by passing both variables to table().
    table(istat$Gender, istat$Area)

    ##
    ##          Centro Isole Nord Sud
    ##   Female    202   100  366 240
    ##   Male      186   114  372 226
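As a minimal sketch of the same idea on data that can be typed in directly, the block below builds a two-way table from a small made-up data frame. The `people` data frame and its values are hypothetical and are not taken from the istat data set. The `dnn` argument of table() is used here to label the two dimensions.

```r
# Hypothetical toy data; NOT the istat data set.
people <- data.frame(
  Gender = c("Female", "Male", "Female", "Male", "Female"),
  Area   = c("Nord", "Nord", "Sud", "Sud", "Nord")
)

# dnn gives a name to each dimension of the resulting table.
tab_ga <- table(people$Gender, people$Area, dnn = c("Gender", "Area"))
print(tab_ga)

# Cells can be addressed by level name: number of females in the Nord area.
female_nord <- tab_ga["Female", "Nord"]
```

Naming the dimensions makes the printed table self-describing, which helps once several tables appear in the same report.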
The airquality data set contains daily air quality measurements in New York from May to September 1973. On how many days in each month was the Ozone measurement greater than 80 ppb? Note that the Ozone variable contains NA values.
    table(airquality$Ozone > 80, airquality$Month)

    ##
    ##          5  6  7  8  9
    ##   FALSE 25  9 20 19 27
    ##   TRUE   1  0  6  7  2
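If only the number of days above the threshold is needed, the TRUE row can also be computed directly. The sketch below is one possible alternative, not the approach used in this tutorial: it sums the logical condition within each month using tapply() and ignores missing values with `na.rm = TRUE`. The variable names `high` and `days_above` are illustrative.

```r
# airquality ships with base R (datasets package).
high <- airquality$Ozone > 80          # logical vector, NA where Ozone is NA

# Sum the TRUE values within each month, skipping NA.
days_above <- tapply(high, airquality$Month, sum, na.rm = TRUE)
print(days_above)

# Cross-check against the TRUE row of the contingency table.
true_row <- table(high, airquality$Month)["TRUE", ]
all(days_above == true_row)
```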
The table above does not show missing values. The useNA argument controls how NA values are handled. Its default value is "no", which omits them. "ifany" adds an NA level only when missing values are present. "always" adds an NA level even when there are no missing values.
    table(airquality$Ozone > 80, airquality$Month, useNA = "ifany")

    ##
    ##          5  6  7  8  9
    ##   FALSE 25  9 20 19 27
    ##   TRUE   1  0  6  7  2
    ##   <NA>   5 21  5  5  1
    table(airquality$Ozone > 80, airquality$Month, useNA = "always")

    ##
    ##          5  6  7  8  9 <NA>
    ##   FALSE 25  9 20 19 27    0
    ##   TRUE   1  0  6  7  2    0
    ##   <NA>   5 21  5  5  1    0
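The three useNA settings are easier to compare on small vectors. The sketch below uses two made-up vectors: `x` contains one NA and `y` contains none. Both vectors are hypothetical.

```r
# Hypothetical vectors: x has one NA, y has none.
x <- c(1, 2, NA, 2)
y <- c("a", "b", "a")

no_x     <- table(x)                     # default useNA = "no": the NA is dropped
ifany_x  <- table(x, useNA = "ifany")    # NA level appears because x has an NA
ifany_y  <- table(y, useNA = "ifany")    # no NA level because y has none
always_y <- table(y, useNA = "always")   # NA level appears with a count of 0

print(no_x)
print(ifany_x)
print(ifany_y)
print(always_y)
```

"always" is mainly useful when several tables must share the same layout, regardless of whether a particular subset happens to contain missing values.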
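To get relative frequencies, use the prop.table() function. Its inputs are a table, as returned by table(), and the index of the margin to compute proportions over: 1 gives proportions within each row (each row sums to 1), 2 gives proportions within each column (each column sums to 1).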
    tab <- table(airquality$Ozone > 80, airquality$Month, useNA = "ifany")
    prop.table(tab, 1)

    ##
    ##               5       6       7       8       9
    ##   FALSE 0.25000 0.09000 0.20000 0.19000 0.27000
    ##   TRUE  0.06250 0.00000 0.37500 0.43750 0.12500
    ##   <NA>  0.13514 0.56757 0.13514 0.13514 0.02703
    prop.table(tab, 2)

    ##
    ##               5       6       7       8       9
    ##   FALSE 0.80645 0.30000 0.64516 0.61290 0.90000
    ##   TRUE  0.03226 0.00000 0.19355 0.22581 0.06667
    ##   <NA>  0.16129 0.70000 0.16129 0.16129 0.03333
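A quick way to see what the margin argument does is to check which sums equal 1. The sketch below uses a small hypothetical table of counts (`counts` and its values are made up) and also shows the case where the margin is omitted, in which proportions are taken over the whole table.

```r
# Hypothetical 2 x 3 table of counts (filled column by column).
counts <- as.table(matrix(
  c(2, 6, 3, 1, 5, 3), nrow = 2,
  dimnames = list(c("FALSE", "TRUE"), c("5", "6", "7"))
))

by_row  <- prop.table(counts, 1)   # proportions within each row
by_col  <- prop.table(counts, 2)   # proportions within each column
overall <- prop.table(counts)      # proportions of the grand total

rowSums(by_row)    # every row sums to 1
colSums(by_col)    # every column sums to 1
sum(overall)       # all cells together sum to 1
```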
The margin.table() function returns the row sums (margin 1) or the column sums (margin 2) of a table.
    margin.table(tab, 1)

    ##
    ## FALSE  TRUE  <NA>
    ##   100    16    37
    margin.table(tab, 2)

    ##
    ##  5  6  7  8  9
    ## 31 30 31 31 30
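The totals returned by margin.table() can be cross-checked against rowSums() and colSums(). The sketch below rebuilds the table so that it runs on its own. The names `row_totals`, `col_totals` and `grand` are illustrative. Calling margin.table() without a margin returns the grand total.

```r
# airquality ships with base R (datasets package).
tab <- table(airquality$Ozone > 80, airquality$Month, useNA = "ifany")

row_totals <- margin.table(tab, 1)   # one total per row (summing over months)
col_totals <- margin.table(tab, 2)   # one total per month (summing over rows)
grand      <- margin.table(tab)      # no margin: the grand total

# Cross-checks: same numbers as rowSums()/colSums(); with useNA = "ifany"
# every day is counted, so the grand total equals the number of rows.
all(row_totals == rowSums(tab))
all(col_totals == colSums(tab))
grand == nrow(airquality)
```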
Margins can be added to a table using the addmargins() function.
    addmargins(tab)

    ##
    ##           5   6   7   8   9 Sum
    ##   FALSE  25   9  20  19  27 100
    ##   TRUE    1   0   6   7   2  16
    ##   <NA>    5  21   5   5   1  37
    ##   Sum    31  30  31  31  30 153
addmargins() accepts any summary function through its FUN argument, in place of the default sum. The code below adds the mean.
    addmargins(tab, FUN = mean)

    ## Margins computed over dimensions
    ## in the following order:
    ## 1:
    ## 2:

    ##
    ##             5     6     7     8     9  mean
    ##   FALSE 25.00  9.00 20.00 19.00 27.00 20.00
    ##   TRUE   1.00  0.00  6.00  7.00  2.00  3.20
    ##   <NA>   5.00 21.00  5.00  5.00  1.00  7.40
    ##   mean  10.33 10.00 10.33 10.33 10.00 10.20
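addmargins() also takes a `margin` argument to add a margin along one dimension only, and a `quiet` argument to suppress the message about the order in which margins are computed. The sketch below is a small illustration of both. It rebuilds the table so that it runs on its own, and the result names are illustrative.

```r
# airquality ships with base R (datasets package).
tab <- table(airquality$Ozone > 80, airquality$Month, useNA = "ifany")

# margin = 1 collapses dimension 1: one extra row holding the column totals.
with_total_row <- addmargins(tab, margin = 1)

# margin = 2 collapses dimension 2: one extra column holding the row totals.
with_total_col <- addmargins(tab, margin = 2)

# quiet = TRUE suppresses the "Margins computed over dimensions" message.
with_means <- addmargins(tab, FUN = mean, quiet = TRUE)

dim(tab)
dim(with_total_row)
dim(with_total_col)
dim(with_means)
```

Restricting the margin is convenient when only column totals are wanted, for example under a table of monthly counts.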