The tidyverse is a collection of R packages designed to work together, which makes for a consistent user experience across them. Three of the major packages are dplyr (pronounced "d-plier", like the tool; the d stands for data), tidyr, and ggplot2.
dplyr allows for slicing and dicing data frames. It has SQL-like capabilities, and some of its functions are named similarly to their SQL counterparts.
- select allows picking which columns of the data frame to operate on
- filter keeps only the rows that match a condition, like a SQL WHERE clause
- arrange does sorting
- mutate allows creating new variables from others
- summarize (also spelled summarise) is like a reducer that collapses each group of rows down to a single value
- across lets you apply a function over multiple columns
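As a rough sketch of how these verbs chain together, the example below runs them over a small made-up data frame. The `orders` data and its column names are hypothetical and exist only for illustration; `%>%` is the pipe that dplyr re-exports.

```r
library(dplyr)

# Hypothetical example data, made up purely for illustration.
orders <- data.frame(
  customer = c("ann", "bob", "ann", "cat", "bob"),
  price    = c(10, 20, 30, 5, 40),
  qty      = c(1, 2, 1, 4, 1),
  note     = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

totals <- orders %>%
  select(customer, price, qty) %>%     # pick columns, dropping `note`
  filter(price >= 10) %>%              # keep matching rows, like SQL WHERE
  mutate(amount = price * qty) %>%     # new variable from existing ones
  group_by(customer) %>%
  summarise(total = sum(amount)) %>%   # reduce each group to one value
  arrange(desc(total))                 # sort, largest total first

# across() applies one function to several columns at once.
averages <- orders %>%
  summarise(across(c(price, qty), mean))

print(totals)
print(averages)
```

Each step takes a data frame and returns a data frame, which is what lets the verbs compose in a single pipeline.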
dplyr also lets you join data frames together with left, right, full, and inner joins.
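The four join types differ in which unmatched rows they keep. A minimal sketch, using two hypothetical data frames (`customers` and `purchases`, invented for this example) that share an `id` column:

```r
library(dplyr)

# Hypothetical example data, made up purely for illustration.
customers <- data.frame(
  id   = c(1, 2, 3),
  name = c("ann", "bob", "cat"),
  stringsAsFactors = FALSE
)
purchases <- data.frame(
  id     = c(1, 1, 4),
  amount = c(10, 30, 99)
)

# inner: only ids present in both tables (id 1, twice)
inner <- inner_join(customers, purchases, by = "id")

# left: every customer, with NA amounts where there is no purchase
left <- left_join(customers, purchases, by = "id")

# right: every purchase, with NA names where there is no customer
right <- right_join(customers, purchases, by = "id")

# full: every row from both sides
full <- full_join(customers, purchases, by = "id")

print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))
```

With this data the joins return 2, 4, 3, and 5 rows respectively, because customer 1 matches two purchases and id 4 has no matching customer.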
tidyr lets you tidy data. It provides facilities to go from a wide data format to a long one (pivot_longer) and vice versa (pivot_wider), which is helpful because different visualization tools expect different shapes. It also lets you clean up missing values with functions like fill, drop_na, and replace_na.
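The sketch below shows a wide-to-long round trip and the three missing-value helpers. Both data frames (`wide` and `readings`) and their column names are hypothetical, made up for this example.

```r
library(tidyr)

# Hypothetical wide data: one row per city, one column per year.
wide <- data.frame(
  city  = c("Oslo", "Lima"),
  y2020 = c(1.5, 3.0),
  y2021 = c(2.5, 4.0),
  stringsAsFactors = FALSE
)

# Wide to long: one row per city/year pair.
long <- pivot_longer(wide, cols = c(y2020, y2021),
                     names_to = "year", values_to = "value")

# Long back to wide: one column per distinct year again.
back <- pivot_wider(long, names_from = year, values_from = value)

# Hypothetical readings with gaps, to show the missing-value helpers.
readings <- data.frame(day = 1:4, temp = c(20, NA, NA, 23))

filled   <- fill(readings, temp)                   # carry last value down
dropped  <- drop_na(readings)                      # remove rows with NA
replaced <- replace_na(readings, list(temp = 0))   # swap NA for a default

print(long)
print(filled)
```

Note that fill carries values down across the whole column by default, so on grouped data it is usually worth grouping first to avoid filling across group boundaries.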
ggplot2 is widely regarded as one of the best data visualization libraries in any programming language. It is based on Leland Wilkinson’s The Grammar of Graphics. In his paper on it, ggplot2 author Hadley Wickham said:
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the “scatterplot”) and gain insight into the deep structure that underlies statistical graphics.
The approach builds a plot from a coordinate system and a stack of layers, each made up of different elements. This has several benefits. The layers provide modularity: you can compose them together, which makes it easy to build plots incrementally. The layers also provide a consistent interface to perform operations against.
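To illustrate the incremental style, here is a small sketch that starts from data and aesthetic mappings alone and then adds layers, a coordinate system, and labels one at a time with `+`. It uses the `mtcars` dataset that ships with R; the particular choice of layers is just one possible example.

```r
library(ggplot2)

# Start with data and aesthetic mappings only; no layers are drawn yet.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add components one at a time with `+`.
p <- base +
  geom_point(aes(colour = factor(cyl))) +                    # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # layer 2: fitted line
  coord_cartesian(xlim = c(1, 6)) +                          # coordinate system
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")

print(p)
```

Because `base` is an ordinary object, the same starting point can be reused with a different set of layers to produce a different plot of the same data.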