The tidyverse is a collection of R packages that were designed to work together, which makes for a consistent user experience. Three of the big packages are dplyr (pronounced "d-plier", like pliers the tool; the d stands for data), tidyr, and ggplot2.

dplyr provides verbs for slicing and dicing data frames. Its capabilities are SQL-like, and some of its functions are named after their SQL counterparts.

  • select allows picking which columns of the data frame to operate on
  • filter keeps only the rows that match a condition, like a SQL WHERE clause
  • arrange does sorting
  • mutate allows creating new variables from others
  • summarize (also spelled summarise) works like a reducer, collapsing many rows down to a single value per group
  • across lets you apply a function over multiple columns
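The verbs above can be sketched in one short pipeline. This is only an illustration: the scores table and its column names are made up, and group_by (not covered in the list above) is used so that summarize produces one row per class.

```r
library(dplyr)

# Hypothetical sample data, invented for this example
scores <- tibble(
  student = c("Ana", "Ben", "Cho", "Dev"),
  class   = c("A", "A", "B", "B"),
  math    = c(90, 70, 85, 60),
  art     = c(80, 95, 75, 65)
)

result <- scores %>%
  select(student, math, art) %>%   # pick the columns to work with
  filter(math >= 65) %>%           # keep rows matching a condition (like WHERE)
  mutate(total = math + art) %>%   # derive a new variable from existing ones
  arrange(desc(total))             # sort, highest total first

# summarize collapses each group to one row; across applies mean()
# to both score columns at once
class_means <- scores %>%
  group_by(class) %>%
  summarize(across(c(math, art), mean))
```

Here result holds the three students with a math score of at least 65, sorted by total, and class_means holds one row of averages per class.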

dplyr also lets you join data frames together with left, right, full, and inner joins (left_join, right_join, full_join, and inner_join).
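As a rough sketch of how the four joins differ, here are two small made-up tables that only partly overlap on a hypothetical id column:

```r
library(dplyr)

# Hypothetical tables: ids 2 and 3 appear in both, 1 and 4 in only one
students <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cho"))
grades   <- tibble(id = c(2, 3, 4), grade = c("B", "A", "C"))

inner <- inner_join(students, grades, by = "id")  # only ids found in both tables
left  <- left_join(students, grades, by = "id")   # every student; grade is NA when missing
right <- right_join(students, grades, by = "id")  # every grade; name is NA when missing
full  <- full_join(students, grades, by = "id")   # every id from either table
```

The choice of join comes down to which side's unmatched rows you want to keep.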

tidyr lets you tidy data. It provides facilities to go from a wide data format to a long one (pivot_longer) and vice versa (pivot_wider), which is helpful because different visualization tools expect different shapes. It also helps you deal with missing values through functions like fill, drop_na, and replace_na.
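A minimal sketch of the reshaping and missing-value helpers, using a made-up sales table (the city, q1, q2, quarter, and sales names are assumptions for illustration):

```r
library(tidyr)

# Hypothetical wide data: one column per quarter, one missing value
wide <- data.frame(
  city = c("Oslo", "Lima"),
  q1   = c(10, NA),
  q2   = c(12, 20)
)

# Wide to long: one row per city/quarter pair
long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")

# Long back to wide: one column per quarter again
wide_again <- pivot_wider(long, names_from = quarter, values_from = sales)

complete_rows <- drop_na(long)                     # drop rows containing NA
zero_filled   <- replace_na(long, list(sales = 0)) # replace NA with a fixed value
carried       <- fill(long, sales)                 # carry the previous value downward
```

Note that fill simply copies the last non-missing value from the row above, so in real data you would usually group or sort first so that values do not carry over between unrelated rows.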

ggplot2 is often described as one of the best data visualization libraries in any programming language. It’s based on Leland Wilkinson’s Grammar of Graphics. In his paper on it, ggplot2 author Hadley Wickham said:

A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the “scatterplot”) and gain insight into the deep structure that underlies statistical graphics.

The approach builds a plot from a coordinate system and a stack of layers, each made up of different elements. This has several benefits. The layers provide modularity: you can compose them together, which makes it easy to build plots incrementally. The layers also provide a consistent interface to perform operations against.
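The incremental, layered style might look like the sketch below. It uses R's built-in mtcars dataset; the particular layers, axis limits, and labels are arbitrary choices for illustration.

```r
library(ggplot2)

# Start with data and aesthetic mappings only; no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layer 1: one point per car
p <- p + geom_point()

# Layer 2: a straight-line fit drawn over the points
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Coordinate system and labels are separate components added the same way
p <- p +
  coord_cartesian(xlim = c(1, 6)) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Printing p (or calling print(p)) renders the plot
```

Each step uses the same + operator, so a layer can be added, swapped, or removed without touching the rest of the plot.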