Queries
In database theory a query is a request for data or information from a database table or combination of tables.
Since dplyr, we have something in R that conceptually resembles a query quite closely:
    require(dplyr)
    ## Warning: package 'dplyr' was built under R version 3.2.5
    require(pryr)

    mtcars %>%
      tbl_df() %>%
      group_by(cyl) %>%
      summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
    ## # A tibble: 3 × 3
    ##     cyl mean_mpg   sd_mpg
    ##   <dbl>    <dbl>    <dbl>
    ## 1     4 26.66364 4.509828
    ## 2     6 19.74286 1.453567
    ## 3     8 15.10000 2.560048
What I particularly appreciate about dplyr is the possibility of building my query as a step-by-step set of R statements that I can test progressively at each step.
Views
Again in database theory, a view is the result set of a stored query on the data, which the database users can query just as they would in a table.
I would like to have something similar to a view in R.
As far as I know, I can achieve this goal in three ways:
- Function makeActiveBinding()
- Operator %<a-% from package pryr
- My proposed %>>% operator
Function makeActiveBinding()
Function makeActiveBinding(sym, fun, env) installs a function fun in an environment env so that getting the value of sym calls fun with no arguments.
As a basic example, I can actively bind a function that simulates a die to an object named dice:

    makeActiveBinding("dice", function() sample(1:6, 1), env = globalenv())

so that:

    replicate(5, dice)
    ## [1] 5 1 6 2 3
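To make the re-evaluation visible, the following sketch counts how many times the bound function runs. The environment env and the counter calls are names introduced here for illustration only; a separate environment is used simply to avoid touching the global workspace.

```r
# Illustrative sketch: 'env' and 'calls' are hypothetical names, not from the post.
env <- new.env()
calls <- 0

makeActiveBinding(
  "dice",
  function() {
    calls <<- calls + 1   # record every evaluation of the binding
    sample(1:6, 1)
  },
  env = env
)

rolls <- replicate(5, env$dice)  # each access triggers a fresh call
calls                            # 5: one call per access
```

Nothing is stored under the symbol itself: every read of dice is a new function call, which is exactly the property a view needs.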
Similarly, I can wrap a dplyr expression into a function:

    f <- function() {
      mtcars %>%
        group_by(cyl) %>%
        summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
    }

and then actively bind it to a symbol:

    makeActiveBinding('view', f, env = globalenv())

so that, any time we call view, the result of function f() is computed again:

    view
    ## # A tibble: 3 × 3
    ##     cyl mean_mpg   sd_mpg
    ##   <dbl>    <dbl>    <dbl>
    ## 1     4 26.66364 4.509828
    ## 2     6 19.74286 1.453567
    ## 3     8 15.10000 2.560048
As a result, if I change any value of mpg within mtcars, view is automatically updated:

    mtcars$mpg[c(1,3,5)] <- 0
    view
    ## # A tibble: 3 × 3
    ##     cyl mean_mpg   sd_mpg
    ##   <dbl>    <dbl>    <dbl>
    ## 1     4 24.59091 9.231192
    ## 2     6 16.74286 7.504189
    ## 3     8 13.76429 4.601606
I have to admit that all of this looks quite unfriendly, at least to me.
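Two properties of such a binding are worth checking before relying on it as a view. The sketch below uses only base R (tapply() instead of dplyr) and an illustrative environment named views, both assumptions made here so that the example runs on its own. It shows how to test whether a symbol is an active binding, what happens on assignment, and how to remove the binding.

```r
# Illustrative sketch: 'views', 'cars' and 'mean_mpg' are hypothetical names.
views <- new.env()
cars <- mtcars

makeActiveBinding(
  "mean_mpg",
  function() tapply(cars$mpg, cars$cyl, mean),  # base R stand-in for the dplyr chain
  env = views
)

# Is the symbol an active binding?
is_active <- bindingIsActive("mean_mpg", views)

# Assignment calls the bound function with the new value as its argument;
# a zero-argument function rejects it, so the view behaves as read-only.
outcome <- tryCatch(
  {
    assign("mean_mpg", 0, envir = views)
    "overwritten"
  },
  error = function(e) "assignment rejected"
)

# Removing the symbol removes the view.
rm("mean_mpg", envir = views)
still_there <- exists("mean_mpg", envir = views, inherits = FALSE)
```

Because the bound function takes no arguments, the view cannot be overwritten by accident; it has to be removed explicitly with rm().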
Operator %<a-%
A valid alternative, which hides the complexity of makeActiveBinding(), is provided by the operator %<a-% from package pryr:

    view %<a-% {mtcars %>%
      group_by(cyl) %>%
      summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))}
Again, if I change any value of mpg within mtcars, the value of view gets automatically updated:

    mtcars$mpg[c(1,3,5)] <- 50
    view
    ## # A tibble: 3 × 3
    ##     cyl mean_mpg    sd_mpg
    ##   <dbl>    <dbl>     <dbl>
    ## 1     4 29.13636  8.159568
    ## 2     6 23.88571 11.593451
    ## 3     8 17.33571  9.688503
Note that in this case I have to enclose the whole expression within curly brackets.
Moreover, the assignment operator %<a-% goes on the left-hand side of my chain of dplyr statements.
Operator %>>%
Finally, I would like to propose a third alternative, still based on makeActiveBinding(), that I named %>>%:

    `%>>%` <- function(expr, x) {
      x <- substitute(x)            # target symbol, left unevaluated
      call <- match.call()[-1]      # unevaluated arguments of this call
      fun <- function() {NULL}
      body(fun) <- call$expr        # the whole chain becomes the function body
      makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
      invisible(NULL)
    }

that can be used as:

    mtcars %>%
      group_by(cyl) %>%
      summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg)) %>>%
      view
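The reason no curly brackets are needed is that all %op% operators share the same precedence and associate from left to right, so %>>% receives the entire preceding chain as its first argument. This can be checked without running the chain, by inspecting the parsed expression with quote(). The variable names in the sketch are illustrative only.

```r
# Parse the chain without evaluating it (no packages needed for this step).
e <- quote(
  mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg)) %>>%
    view
)

top_operator <- as.character(e[[1]])   # outermost call in the parse tree
left_side    <- e[[2]]                 # everything before %>>%, still unevaluated
right_side   <- as.character(e[[3]])   # the target symbol

top_operator  # "%>>%"
right_side    # "view"
```

Since the left-hand side arrives as a single unevaluated expression, it can be stored as the body of the bound function and re-run on every access.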
And again, if I change the values of mpg:

    mtcars$mpg[c(1,3,5)] <- 100

the content of view changes accordingly:

    view
    ## # A tibble: 3 × 3
    ##     cyl mean_mpg   sd_mpg
    ##   <dbl>    <dbl>    <dbl>
    ## 1     4 33.68182 22.41624
    ## 2     6 31.02857 30.44321
    ## 3     8 20.90714 22.88454
I believe this operator offers two advantages:
- It avoids the use of curly brackets around my dplyr expression
- It allows me to actively assign the result of my chain of dplyr statements in a more natural way, at the end of the chain
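One caveat seems worth noting: as written, the function created inside %>>% is evaluated with the operator's own frame as its enclosure, so a chain that refers to a variable named x, expr, call or fun, or to a variable local to a calling function, may not see the intended object. A possible variant, sketched here under the hypothetical name %as_view%, evaluates the chain in the caller's environment instead. The example uses base R only so that it runs on its own.

```r
# Hypothetical variant of the operator; the name %as_view% is not from the post.
`%as_view%` <- function(expr, x) {
  target <- substitute(x)       # symbol to bind, unevaluated
  chain  <- substitute(expr)    # the whole left-hand chain, unevaluated
  caller <- parent.frame()

  fun <- function() NULL
  body(fun) <- chain
  environment(fun) <- caller    # look up names where the operator was used

  makeActiveBinding(sym = deparse(target), fun = fun, env = caller)
  invisible(NULL)
}

# Works inside a function, even with a local variable called 'x'.
demo <- function() {
  x <- c(1, 2, 3)
  mean(x) %as_view% avg
  first <- avg                  # 2
  x <- c(10, 20, 30)
  second <- avg                 # 20, recomputed from the new x
  c(first, second)
}

demo()
```

At the top level both versions should behave the same way; the difference only matters when the chain is built inside a function or uses one of the operator's internal names.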
This is great! Thanks for sharing!
See also https://cran.r-project.org/web/packages/refset/index.html