R with Database and Big Data

 

This course presents current techniques for working with big data in the R environment: manipulating, analyzing, and visualizing data structures that exceed the capacity of a single computer, all in idiomatic R style. The large amount of data available today is a tangled and hidden source of knowledge. Being able to extract high-value information from it quickly and effectively is a powerful driver of success in a competitive market.

Audience

This course is suitable for those who already use R. No previous knowledge of big data technology is required.

Attendees

6 attendees max.

Course organization

The first day focuses on accessing and manipulating databases. After an introduction to databases, you will learn how to connect to them from R. You will then work with the tools for manipulating data, in particular those provided by the tidyverse, such as dplyr and tidyr.

The second day is dedicated to distributed infrastructure. After an introduction to distributed systems such as Spark and Hadoop, you will learn how to work with them using the tools available in R. You will also learn about the SparkML libraries for out-of-memory data modeling and ad hoc techniques for big data visualization.

Outline

  • Introduction to databases
  • Connecting to databases through R: ODBC and RSQLite
  • Data manipulation with dplyr
  • Using dplyr with databases
  • Introduction to distributed infrastructure
  • Spark and Hadoop
  • sparklyr
  • Distributed data manipulation with dplyr
  • Distributed machine learning with SparkML
  • Data visualization for big data

Cost

The cost of the 2-day course is 800 + VAT per person. This includes lunch, comprehensive course materials, and 1 hour of individual online post-course support for each student within 30 days of the course date.

Discounts

We offer a discounted rate to those engaged in full-time study or research, and to private attendees. For them, the cost of the 2-day course is 500 + VAT.

Date

Date to be announced.

Location

Via Vitruvio, 1
20124 Milano
Italy

Teachers

Andrea Spanò
Andrea Spanò is an RStudio certified instructor who has worked as an R trainer and consultant for over 20 years. He runs the consulting firm Quantide and teaches in the postgraduate course on Big Data Management at Luiss University.

Andrea Melloncelli
Andrea holds a degree in Physics. He has solid experience in R, C, C++ and Python programming and development, along with extensive skills in Unix system management, IT automation tools, cloud technologies, and big data platforms such as Hadoop and Spark.