Grammar of Graphics

Jiratchaya Nuanpirom

What is the Grammar of Graphics?

 

  • Data: Your input data (in long format)

  • Aesthetics: what makes your data visible, e.g., size, line color, variables to plot, fill color, line type, transparency, etc.

  • Geometry: determines the type of plot.

  • Statistics: statistical transformation of continuous data

  • Facets: split a plot into subplots.

  • Coordinates: numeric systems that limit, break down, or transform the position of geometry.

  • Themes: the overall look of the plot and its customization.

Building a plot layer-by-layer

  1. Load data with ggplot()
# Load library
library(ggplot2)
# Define the data; no aesthetics or layers yet, so the plot is empty
ggplot(diamonds)

Building a plot layer-by-layer

  2. Add aesthetics with aes()
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = cut))

Building a plot layer-by-layer

  3. Add geometry with geom_*()
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8)

Building a plot layer-by-layer

  4. Add statistics
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8) +
  stat_smooth(color = "black", linewidth = 0.8)

Building a plot layer-by-layer

  5. Add facets
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8) +
  stat_smooth(color = "black", linewidth = 0.8) +
  facet_grid(cut ~ color)

Building a plot layer-by-layer

  6. Add coordinates
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8) +
  stat_smooth(color = "black", linewidth = 0.8) +
  facet_grid(cut ~ color) +
  scale_y_continuous(breaks = seq(from = 0, to = 20000, by = 10000))

Building a plot layer-by-layer

  7. Add a theme
# Load library
library(ggplot2)
# Define data and global aesthetics
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8) +
  stat_smooth(color = "black", linewidth = 0.8) +
  facet_grid(cut ~ color) +
  scale_y_continuous(breaks = seq(from = 0, to = 20000, by = 10000)) +
  theme_bw()
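
Because each + adds one component to the same plot object, the finished plot can be stored and inspected. The following is a minimal sketch, assuming ggplot2 3.4.0 or later (for the linewidth argument); the names p and p_log are introduced here only for illustration.

```r
library(ggplot2)

# Build the layered plot, but store it instead of printing it
p <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.8) +
  stat_smooth(color = "black", linewidth = 0.8) +
  facet_grid(cut ~ color) +
  theme_bw()

# Each geom or stat call contributed one layer
length(p$layers)

# Components can still be added to a stored plot
p_log <- p + scale_y_log10()
```

Printing p or p_log draws the plot; until then, nothing is rendered.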

Aesthetics

  • aes() describes how variables in the data map to visual properties (aesthetics).

  • The position of each data point is described by its x and y values.

  • Shape, size, and color can also be mapped in aes().

Commonly used aesthetics in data visualization: position, shape, size, color, line width, line type. Figure from Wilke (2019)
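
A common point of confusion is the difference between mapping an aesthetic to a variable and setting it to a constant. The sketch below contrasts the two on the diamonds data; the names p_mapped and p_fixed and the color "steelblue" are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# Mapping: color depends on a variable, so it goes inside aes()
p_mapped <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.8)

# Setting: one constant color for every point goes outside aes()
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "steelblue", alpha = 0.8)
```

Only p_mapped gets a legend, because only there does color carry information from the data.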

Types of Variables Used in Aesthetics

Continuous variables

  • A variable or a set of values you can measure.

  • Continuous values have arbitrarily fine intermediates: between any two values there is always another.

  • Age, height, BMI, date, assignment score, etc.

  • Sometimes a series of continuous values is better treated as a discrete variable, for example when the numbers are stored as text.

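my_seq <- seq(from = 0, to = 10, by = 1)
class(my_seq)    # "numeric"
sum(my_seq)      # 55
length(my_seq)   # 11 values

# The same values stored as text are no longer numbers
my_seq2 <- as.character(my_seq)
class(my_seq2)   # "character"
try(sum(my_seq2))  # error: sum() does not accept character vectors
length(my_seq2)  # 11 values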
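
The class of a variable also decides which kind of scale ggplot2 picks. As a small sketch (the mtcars columns and the plot names are illustrative choices), the same column gives a continuous gradient when numeric and distinct colors when converted to a factor:

```r
library(ggplot2)

# cyl is stored as a number, so ggplot2 uses a continuous color gradient
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(size = 3)

# Converted to a factor, each cylinder count gets its own distinct color
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3)
```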

Discrete variables

  • A variable or a set of values you can count.

  • Grade (A, B, C, D), name, type, number of people in a room, etc.

  • A discrete variable can be given an explicit order by converting it to a factor with ordered levels. This is often required before plotting a time series.

my_var <- c("0_hpi", "12hpi", "Control", "48_hpi", "24_hpi")
my_var

my_var_2 <- factor(my_var, 
                   levels = c("Control", "0_hpi", "12hpi",
                              "24_hpi", "48_hpi"))
my_var_2
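
To see why the level order matters for plotting, here is a minimal sketch. The data frame df_time and its value column are made-up numbers for illustration only.

```r
library(ggplot2)

# Hypothetical measurements, one per time point (values are invented)
df_time <- data.frame(
  time  = c("0_hpi", "12hpi", "Control", "48_hpi", "24_hpi"),
  value = c(2, 5, 1, 9, 7)
)

# Fix the order of the time points explicitly
df_time$time <- factor(df_time$time,
                       levels = c("Control", "0_hpi", "12hpi",
                                  "24_hpi", "48_hpi"))

# The integer codes follow the level order, not the order of appearance
as.integer(df_time$time)

# The x-axis now follows the level order as well
ggplot(df_time, aes(x = time, y = value, group = 1)) +
  geom_line() +
  geom_point()
```

Without the factor() step, ggplot2 would order the character values by sorting them, which rarely matches the experimental time course.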

Geoms

Frequently used geoms (explore more plots in the R Graph Gallery: https://r-graph-gallery.com)

Figure 1: geom_histogram()

Figure 2: geom_bar()

Figure 3: geom_line() with geom_point()

Figure 4: geom_boxplot() with geom_jitter()

Figure 5: geom_violin()

Figure 6: geom_density()

Figure 7: geom_point() with geom_smooth()

Figure 8: geom_tile()
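
Geoms combine by adding them as layers. As one possible sketch of the box plot with jitter combination listed above (the iris data and the name p_box are illustrative choices):

```r
library(ggplot2)

# Box plots summarize each group; jittered points show the raw observations
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = NA) +      # hide outlier marks; the jitter layer shows all points
  geom_jitter(width = 0.2, alpha = 0.5)   # spread points horizontally to reduce overlap

p_box
```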

Position scales and axes

Numeric position scales

  • Limit
ggplot(airquality, aes(x = Wind, y = Temp)) +
  geom_point() +
  scale_x_continuous(limits = c(0,15)) +
  scale_y_continuous(limits = c(60,80))

  • Breaks
ggplot(airquality, aes(x = Wind, y = Temp)) +
  geom_point() +
  scale_x_continuous(breaks = seq(from = 0, to = 20, by = 2)) +
  scale_y_continuous(breaks = seq(from = 0, to = 100, by = 5))
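
One detail worth knowing about limits: a limit set on a scale treats observations outside the range as missing, so they are dropped before any statistics are computed. The sketch below contrasts this with coord_cartesian(), which is not used in the original slides and is shown here only for comparison; the smoothing layer is added to make the difference visible.

```r
library(ggplot2)

# Scale limits: observations with Wind > 15 are dropped (with a warning),
# so the fitted line is based on fewer points
p_limits <- ggplot(airquality, aes(x = Wind, y = Temp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_continuous(limits = c(0, 15))

# Coordinate limits: all observations are kept, only the view is zoomed
p_zoom <- ggplot(airquality, aes(x = Wind, y = Temp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_cartesian(xlim = c(0, 15))
```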

Position scales and axes

Numeric position scales (2)

  • Expand
ggplot(airquality,
       aes(x = Temp, 
           group = as.factor(Month), 
           fill = as.factor(Month))) +
  geom_density(alpha = 0.6) +
  scale_y_continuous(expand = c(0,0)) +
  scale_x_continuous(expand = c(0,0))

  • log transformation
ggplot(diamonds,
       aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.6) +
  scale_y_log10()
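
For the expand argument, ggplot2 also provides the helper expansion(), which lets the lower and upper padding differ. A minimal sketch, assuming ggplot2 3.3.0 or later; the name top_only is introduced here for illustration.

```r
library(ggplot2)

# No padding below the data, 5% padding above it
top_only <- expansion(mult = c(0, 0.05))

# The curves sit on the x-axis, with some headroom above the tallest peak
ggplot(airquality, aes(x = Temp, fill = as.factor(Month))) +
  geom_density(alpha = 0.6) +
  scale_y_continuous(expand = top_only) +
  scale_x_continuous(expand = c(0, 0))
```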

Position scales and axes

Date-time position scale

ggplot(economics, aes(x = date, y = psavert)) + 
  geom_line(na.rm = TRUE) +
  scale_x_date(date_breaks = "15 years")

lim <- as.Date(c("2004-01-01", "2005-01-01"))
ggplot(economics, aes(x = date, y = psavert)) + 
  geom_line(na.rm = TRUE) +
  scale_x_date(limits = lim, date_labels = "%B\n%Y")

Date scales behave like numeric scales, but it is often more convenient to control them with the date-specific arguments date_breaks and date_labels, which accept interval strings such as "15 years" and strftime-style format strings such as "%B\n%Y". More available formatting strings: https://ggplot2-book.org/scales-position.html#sec-date-labels.
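
The format strings can be tried outside a plot with format(), which uses the same codes. A short sketch; the example date and the "5 years" interval are arbitrary choices.

```r
library(ggplot2)

# %Y = four-digit year, %m = two-digit month
ym <- format(as.Date("2004-03-15"), "%Y-%m")
ym

# One break every 5 years, labelled with the two-digit year
ggplot(economics, aes(x = date, y = psavert)) +
  geom_line(na.rm = TRUE) +
  scale_x_date(date_breaks = "5 years", date_labels = "%y")
```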

Position scales and axes

Binned position scales

ggplot(airquality, aes(x = Month, y = Ozone, color = Ozone)) +
  geom_count(na.rm = TRUE) +
  scale_y_binned(n.breaks = 10)

Color scales and legends

Color blindness

Available color palettes from package colorBlindness.

# Load package
library(colorBlindness)
displayAvailablePalette(color="white")

More information on R colorBlindness package: https://cran.r-project.org/web/packages/colorBlindness/vignettes/colorBlindness.html
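
If installing an extra package is not an option, base R (4.0 or later) ships the Okabe-Ito palette, which was designed to stay distinguishable under common forms of color blindness. This sketch uses base R's palette.colors() rather than the colorBlindness package; the names okabe_ito and df_cb are illustrative.

```r
library(ggplot2)

# Okabe-Ito colors from base R; drop the names so that
# scale_fill_manual() assigns the colors by position
okabe_ito <- unname(palette.colors(n = 5, palette = "Okabe-Ito"))

df_cb <- data.frame(x = c("a", "b", "c", "d", "e"), y = c(3, 4, 1, 2, 5))

ggplot(df_cb, aes(x, y, fill = x)) +
  geom_col(color = "black") +
  scale_fill_manual(values = okabe_ito)
```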

Color scales and legends

Continuous color scales: viridis color palettes

erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_x_continuous(NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, expand = c(0, 0))
# Plot
erupt
erupt + scale_fill_viridis_c(option = "viridis")
erupt + scale_fill_viridis_c(option = "magma")
erupt + scale_fill_viridis_c(option = "plasma")
erupt + scale_fill_viridis_c(option = "rocket")
erupt + scale_fill_viridis_c(option = "turbo")

default continuous palette

viridis - viridis

viridis - magma

viridis - plasma

viridis - rocket

viridis - turbo

Color scales and legends

Continuous color scales: distiller color palettes

erupt + scale_fill_distiller(palette = "RdBu")
erupt + scale_fill_distiller(palette = "Pastel1")
erupt + scale_fill_distiller(palette = "OrRd")

distiller - Diverging

distiller - Qualitative

distiller - Sequential

The distiller scales extend the brewer color palettes to continuous data by smoothly interpolating 7 colors from any palette. For more brewer color palettes, see https://colorbrewer2.org.
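
The distiller scales reverse the palette by default (direction = -1). The sketch below shows how to keep ColorBrewer's original order instead; the plot is rebuilt under the hypothetical name erupt_demo so the block runs on its own.

```r
library(ggplot2)

erupt_demo <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster()

# Default: palette applied in reversed order (direction = -1)
erupt_rev <- erupt_demo + scale_fill_distiller(palette = "OrRd")

# direction = 1: palette applied in ColorBrewer's original order
erupt_fwd <- erupt_demo + scale_fill_distiller(palette = "OrRd", direction = 1)
```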

Color scales and legends

Continuous color scales: ggsci color palettes

library(ggsci)
library(dplyr)   # %>%
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_longer()
dt_hm <- scale(as.matrix(mtcars)[1:10, ], center = TRUE, scale = TRUE)
p_hm <- as.data.frame(dt_hm) %>% rownames_to_column(var = "cars") %>% 
  pivot_longer(!cars) %>%
  ggplot(aes(x = name, y = cars, fill = value)) +
  geom_tile(color = "black") +
  coord_equal() +
  labs(x=NULL, y = NULL) +
  theme(legend.position = "none",
        axis.text.x = element_blank())

p_hm
p_hm + scale_fill_gsea()
p_hm + scale_fill_material("yellow")
p_hm + scale_fill_material("grey")

Default heatmap color

ggsci - GSEA

ggsci - material (yellow)

ggsci - material (grey)

Discover more continuous ggsci color palettes: https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html

Color scales and legends

More continuous color scales: paletteer color palettes

library(paletteer)
erupt + scale_fill_paletteer_c("ggthemes::Green-Blue Diverging")
erupt + scale_fill_paletteer_c("ggthemes::Red-Blue-White Diverging")
erupt + scale_fill_paletteer_c("ggthemes::Temperature Diverging")
erupt + scale_fill_paletteer_c("grDevices::rainbow")
erupt + scale_fill_paletteer_c("grDevices::heat.colors")
erupt + scale_fill_paletteer_c("grDevices::Viridis")

ggthemes::Green-Blue Diverging

ggthemes::Red-Blue-White Diverging

ggthemes::Temperature Diverging

grDevices::rainbow

grDevices::heat.colors

grDevices::Viridis

More continuous paletteer color palettes can be found at: https://pmassicotte.github.io/paletteer_gallery.

Color scales and legends

Discrete color scales: default palette

df <- data.frame(x = c("a", "b", "c", "d", "e"), y = c(3, 4, 1, 2, 5))
bars <- ggplot(df, aes(x, y, fill = x)) + 
  geom_col(color = "black") + 
  labs(x = NULL, y = NULL) +
  theme(legend.position = "none")

bars
bars + scale_fill_hue()
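
The default discrete palette consists of evenly spaced hues, and scale_fill_hue() exposes its luminance (l) and chroma (c). A small sketch; the values l = 40 and c = 80 and the names default_cols, df_hue, and bars_hue are arbitrary illustrations.

```r
library(ggplot2)

# The five default fill colors, as hex codes
default_cols <- scales::hue_pal()(5)
default_cols

df_hue <- data.frame(x = c("a", "b", "c", "d", "e"), y = c(3, 4, 1, 2, 5))
bars_hue <- ggplot(df_hue, aes(x, y, fill = x)) +
  geom_col(color = "black")

# Darker (lower luminance) and less saturated (lower chroma) than the default
bars_hue + scale_fill_hue(l = 40, c = 80)
```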

Color scales and legends

Discrete color scales: RColorBrewer palettes

bars + scale_fill_brewer(palette = "BrBG")
bars + scale_fill_brewer(palette = "RdYlGn")
bars + scale_fill_brewer(palette = "Dark2")

Diverging - BrBG

Diverging - RdYlGn

Qualitative - Dark2

Interactive RColorBrewer picker: https://colorbrewer2.org
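
The type of each brewer palette can also be looked up in R. A short sketch using the brewer.pal.info table from RColorBrewer; the name pal_types is illustrative.

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its type:
# "div" (diverging), "qual" (qualitative), or "seq" (sequential)
pal_types <- brewer.pal.info[c("BrBG", "RdYlGn", "Dark2"), "category", drop = FALSE]
pal_types
```

Qualitative palettes suit unordered categories, while sequential and diverging palettes imply an order.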

Color scales and legends

Discrete color scales: ggsci palettes

ggsci offers high-quality color palettes based on color schemes used in scientific journals, data visualization libraries, and science fiction movies.

bars + scale_fill_aaas()
bars + scale_fill_npg()
bars + scale_fill_nejm()
bars + scale_fill_frontiers()
bars + scale_fill_rickandmorty()
bars + scale_fill_flatui()
bars + scale_fill_startrek()
bars + scale_fill_simpsons()

Inspired by Science

Inspired by Nature

Inspired by NEJM

Inspired by Frontiers

Inspired by Rick & Morty

Inspired by Flat UI design

Inspired by Star Trek

Inspired by The Simpsons

Color scales and legends

More discrete color scales from paletteer

bars + scale_fill_paletteer_d("awtools::bpalette")
bars + scale_fill_paletteer_d("basetheme::ink")
bars + scale_fill_paletteer_d("calecopal::kelp1")
bars + scale_fill_paletteer_d("fishualize::Centropyge_loricula")

awtools::bpalette

basetheme::ink

calecopal::kelp1

fishualize::Centropyge_loricula

Interactive gallery of discrete paletteer color palettes: https://emilhvitfeldt.github.io/r-color-palettes/discrete.html

Color scales and legends

Manual discrete color scale

bars + 
  scale_fill_manual(values = c("sienna1", "sienna4", 
                               "hotpink1", "hotpink4", "salmon"))
bars + scale_fill_manual(values = c("a" = "#C62828", "b" = "#9C27B0",
                                    "c" = "#2196F3", "d" = "#4CAF50",
                                    "e" = "#FF9800"))

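A named vector of colors has one practical advantage over an unnamed one: each category keeps its color even when some categories are missing from the data. A minimal sketch, in which my_cols, df_sub, and p_sub are names introduced only for illustration:

```r
library(ggplot2)

# One fixed color per category
my_cols <- c("a" = "#C62828", "b" = "#9C27B0",
             "c" = "#2196F3", "d" = "#4CAF50",
             "e" = "#FF9800")

# A subset of the data in which category "b" does not occur
df_sub <- data.frame(x = c("a", "c", "d", "e"), y = c(3, 1, 2, 5))

# Colors are matched by name, so "c", "d", and "e" keep their usual colors
p_sub <- ggplot(df_sub, aes(x, y, fill = x)) +
  geom_col(color = "black") +
  scale_fill_manual(values = my_cols)

p_sub
```

With an unnamed vector, colors are assigned by position, so dropping "b" would shift every later category to a different color.
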
Color scales and legends

Alpha

The alpha scale maps a numeric variable to the transparency of a shade; alpha values run from 0 (fully transparent) to 1 (fully opaque).

ggplot(faithfuld, aes(waiting, eruptions, alpha = density)) +
  geom_raster(fill = "maroon") +
  scale_x_continuous(expand = c(0, 0)) + 
  scale_y_continuous(expand = c(0, 0))
# A constant alpha can also be set outside aes() to reduce overplotting
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 4, alpha = 0.3, color = "blue")
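
When alpha is mapped to a variable, the output range of the scale can be adjusted with the range argument of scale_alpha(), whose default is 0.1 to 1. A short sketch; the range c(0.2, 1) and the name p_alpha are arbitrary illustrations.

```r
library(ggplot2)

# Lowest densities are drawn at alpha 0.2 instead of the default 0.1
p_alpha <- ggplot(faithfuld, aes(waiting, eruptions, alpha = density)) +
  geom_raster(fill = "maroon") +
  scale_alpha(range = c(0.2, 1))

p_alpha
```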

Color scales and legends

Legend positions

bars + theme(legend.position = "left")
bars + theme(legend.position = "right")
bars + theme(legend.position = "top")
bars + theme(legend.position = "bottom")
bars + theme(legend.position = "none")
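
Position is only one legend setting. The sketch below also sets the legend title with labs() and lays the keys out in a single row with guide_legend(); the bar chart is rebuilt under the hypothetical names df_leg and p_leg so the block runs on its own, and the title "Group" is an invented label.

```r
library(ggplot2)

df_leg <- data.frame(x = c("a", "b", "c", "d", "e"), y = c(3, 4, 1, 2, 5))

p_leg <- ggplot(df_leg, aes(x, y, fill = x)) +
  geom_col(color = "black") +
  labs(fill = "Group") +                     # legend title
  guides(fill = guide_legend(nrow = 1)) +    # all keys in one row
  theme(legend.position = "bottom")

p_leg
```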

References

Wilke, Claus O. 2019. Fundamentals of Data Visualization. https://clauswilke.com/dataviz/.