R ggplot cookbook


Overview

About this document

This document is a cookbook for R ggplot2 and ggplot2 add-on packages (such as cowplot). It collects recipes that I’ve developed and those I’ve had to repeatedly look up.

A couple of notes about the code examples:

  • All examples assume the ggplot2 library has been loaded
  • At the end of each example subsection, I list the versions of R, ggplot2 and all other packages used to create each example at the time of writing
  • In the code examples, p or any p* (e.g. p1, p2, pall) variable is a ggplot2 object

This document will be updated periodically.



Last updated: 2021-08-19


Create a plot

Create a ribbon plot (fill between two lines)

Ribbon plots, created with geom_ribbon(), can be used to fill the area between two lines or between two series of points.

# Data must be in wider (spread) format

ggplot(plotdata, aes(xvar)) +
  # Providing 'color' automatically creates a label
  geom_line(aes(y = yvar1, color = "Label 1"), size = 0.75) +
  geom_line(aes(y = yvar2, color = "Label 2"), size = 0.75) + 

  # 'geom_ribbon' fills between a 'ymin' and 'ymax' value. Adding
  # a transparency (alpha) value makes the plot look better.
  geom_ribbon(aes(ymin = yvar1, ymax = yvar2), alpha = 0.20)

R: 3.6.2 ◊ ggplot2: 3.2.1
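As a self-contained illustration of the recipe above, here is a minimal sketch that fills between two synthetic curves. The data frame `demo` and its columns `x`, `lower`, and `upper` are made up for this example.

```r
library(ggplot2)

# Hypothetical wide-format data: one row per x value, one column per line
demo <- data.frame(x = seq(0, 10, by = 0.5))
demo$lower <- sin(demo$x)
demo$upper <- sin(demo$x) + 1

p <- ggplot(demo, aes(x)) +
  # Shade the band first so the lines are drawn on top of it
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.20) +
  geom_line(aes(y = lower, color = "Lower")) +
  geom_line(aes(y = upper, color = "Upper"))

# Computed data for the ribbon layer (layer 1)
ribbon_data <- layer_data(p, 1)
```

Drawing the ribbon before the lines is a design choice: the lines stay crisp on top of the semi-transparent band.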

Create a correlation plot

Use the ggplot2 extension GGally. Its ggpairs() function produces a variety of matrix-type plots between variables.

# The %>% pipe comes from magrittr (also re-exported by dplyr)
library(magrittr)

fpp3::us_change %>%
  GGally::ggpairs(columns = 2:6)

R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ GGally: 2.0.0

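Create a histogram with a superimposed density curve

cvar is the continuous or count variable being histogrammed. For the overlay to line up, the histogram and the density curve must share the same y scale: either both are drawn as counts (Example 1) or both are normalized to unit area (Example 2).

stat(density) and stat(count) refer to variables computed by the layer's statistical transformation rather than columns of the original data. The older ..density.. and ..count.. notation is equivalent to stat(density) and stat(count), respectively.

  • ..density..: Draw the histogram as a density instead of its default count
  • ..count..: Draw the density curve as a count (density times the number of observations) instead of its default density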

Example 1: Standard overlay (density curve is scaled to the count data)

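stat(count) is now preferred to the older ..count.. notation. From ggplot2 3.3.0 onward, after_stat(count) supersedes both.

ggplot(plotdata, aes(x = cvar)) +
  geom_histogram(binwidth = 2, color="white") +
  # stat(count) is density times the number of observations. Multiply by the
  # bin width (2) so the curve matches the histogram's per-bin counts.
  geom_density(aes(y = stat(count) * 2), color="dodgerblue", size = 1) +
  #geom_density(aes(y = ..count.. * 2), color="dodgerblue", size = 1) +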
  theme_linedraw()

Example 2: Density overlay (count data is scaled to unit area)

library(scales)

ggplot(plotdata, aes(cvar)) +

  geom_histogram(aes(y = ..density..), binwidth = 2, color="white") +
  geom_density(color="dodgerblue", size = 1) +

  # Convert the y labels from fractional numbers to percent.
  # Uses the scales package.
  scale_y_continuous(labels = percent) +
  theme_linedraw()

R: 4.0.5 ◊ ggplot2: 3.3.3
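To check the unit-area scaling used in Example 2, the following sketch uses the built-in `faithful` dataset (my choice; any numeric column works) and `after_stat()`, which assumes ggplot2 3.3.0 or later. It extracts the computed histogram with `layer_data()` and sums the bar areas.

```r
library(ggplot2)

binwidth <- 2

p <- ggplot(faithful, aes(x = waiting)) +
  # Histogram drawn as a density rather than a count
  geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                 color = "white") +
  # geom_density() is already a density, so no rescaling is needed
  geom_density(color = "dodgerblue")

# Computed values for the histogram layer (layer 1)
hist_data <- layer_data(p, 1)

# Area under the histogram: sum of bar height times bar width
hist_area <- sum(hist_data$density * binwidth)
```

`hist_area` should come out as 1 up to floating-point error, which is what lets the density curve sit directly on top of the bars.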

Create a boxplot

A boxplot compares distributions of different groups of numerical data by plotting their quartiles.

Create a boxplot

Compare the distribution of numerical variable numvar across groups defined by categorical variable catvar.

ggplot(plotdata, aes(x = catvar, y=numvar)) +
  geom_boxplot()

R: 4.1.1 ◊ ggplot2: 3.3.5

Create a boxplot with added mean values

As above, but with the addition of point symbols indicating the mean value of each distribution. Box plots, by definition, only have an indicator for the median value (Q_2 or 50th percentile).

ggplot(plotdata, aes(x = catvar, y=numvar)) +
  geom_boxplot() +
  # Add a point for the mean value. Give it a different color to distinguish it
  # from the box plot.
  stat_summary(fun = mean, geom = "point", color="red")

R: 4.1.1 ◊ ggplot2: 3.3.5

Create a grid of boxplots

As above, but create a grid of boxplots with respect to a second categorical variable catvar2.

ggplot(plotdata, aes(x = catvar1, y=numvar)) +
  geom_boxplot() +
  # Optional addition of the mean value
  stat_summary(fun = mean, geom = "point", color="red") + 
  # facet_wrap creates the grid of plots with respect to catvar2
  facet_wrap(~catvar2)

R: 4.1.1 ◊ ggplot2: 3.3.5
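A self-contained sketch of the three boxplot recipes, using the `mpg` dataset that ships with ggplot2 (my choice for illustration, not part of the original recipes). The `fun = mean` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# drv (drive train) is the grouping variable, hwy (highway mpg) the numeric
# variable, and year the second categorical variable used for the grid
p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  # Red point at the mean of each group
  stat_summary(fun = mean, geom = "point", color = "red") +
  # One panel per model year
  facet_wrap(~year)

# Computed group means (layer 2): one row per drv/year combination
means <- layer_data(p, 2)
```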

Create a ridgeline plot

A ridgeline plot, also known as a joyplot, compares the distribution of a numerical variable across groups by arranging a sequence of density plots. The collection of density plots looks like a mountain ridgeline. Ridgeline plots work well when the number of groups is medium to large.

Create a ridgeline plot grouped by week

Each group represents a binned time period (binvar). This type of plot can show how the distribution of a numerical variable evolves in time.

library(ggridges)

# Bin by binvar. binvar could be date, week, month, etc.
ggplot(plotdata, aes(x=numvar, y = binvar, group=binvar)) +
    ggridges::geom_density_ridges()

R: 4.1.1 ◊ ggplot2: 3.3.5 ◊ ggridges: 0.5.3

Additional parameters I often pass to geom_density_ridges() to improve the plot's appearance:

  • alpha: Transparency of each individual density plot. Because ridgelines can overlap when the number of groups is medium to large, I set an alpha value (e.g. 0.6).
  • scale: Scale the height of each individual ridgeline by the factor scale.
  • size: Set the line thickness.
  • rel_min_height: Remove trailing tails by setting a cutoff, expressed as a fraction of the highest point of any density curve.

Create a cumulative ridgeline plot grouped by week

In a normal density plot or histogram, the height of the distribution at a given numerical value is proportional to the number of observations at that value. That approach doesn’t work for a cumulative ridgeline plot: each sequential density plot contains the data of all previous time periods plus the current one, so the heights would grow from one ridgeline to the next.

Instead, we normalize each density plot to the same height by manually supplying the heights.

The general algorithm [1]:

flowchart TB
  algorithmStart((Start)) --> calcStartEnd("Calculate start/end weeks")
  calcStartEnd --> forLoop{For\nweek=start:end}
  forLoop --> CumulativeHistogram
  subgraph CumulativeHistogram
    gatherData(Build cumulative dataset) --> histogram(Calculate histogram)
    histogram --> normHistogram(Normalize histogram height to 1)
    normHistogram --> extractValues(Extract histogram values)
  end
  CumulativeHistogram --> lastWeek{week=wEnd?}
  lastWeek --> |No| CumulativeHistogram
  lastWeek --> |Yes| plotRidgeline(Plot ridgeline plot)
  plotRidgeline --> algorithmEnd((End))

And the associated code:

library(dplyr)

# Display cutoff for the numerical variable
maxdays <- 400

startweek <- min(plotdata$week)
endweek <- max(plotdata$week)

# Create a single cumulative histogram using data from start week to the
# passed week
getCumulativeDensityHistogram <- function(week){

  # Pull all the days-to-complete data for the cumulative time slice of
  # interest
  weekseq <- seq(startweek, week, by='weeks')
  cumdata <- plotdata %>% 
      filter(week %in% weekseq) %>% 
      pull(days_to_complete)

  # Histogram the cumulative data. Chop off values that exceed the plot
  # range (0 to maxdays) to prevent `hist()` from throwing an error.
  histdata <- hist(cumdata[cumdata >=0 & cumdata < maxdays],
                   breaks = seq(0,maxdays,1), plot=FALSE)

  # Extract the histogram data. Normalize the histogram heights to 1.
  histx <- histdata$breaks[1:maxdays]
  histy <- histdata$density[1:maxdays] / max(histdata$density)

  # Put the data into a data frame
  df <- data.frame(x = histx, y = histy) %>%
    mutate(week = week)

  return(df)
}

# Process all weeks. For loop is done as a map.
# Complete dataframe of cumulative data
plotdata_cumulative <- do.call("rbind",
    # List of per-week histogram data
    lapply(
      seq(startweek, endweek, by='weeks'),
      getCumulativeDensityHistogram
    )
  ) %>%
  arrange(week,x) %>%
  select(week,x,y)

# Plot the data. We've directly supplied the histogram height at each value
# of x, so set aesthetics correctly and use stat="identity".
p <- ggplot(plotdata_cumulative,
            aes(x = x, y = week, height=y, group=week)) +
        ggridges::geom_density_ridges(
            stat="identity", alpha=0.5, scale=5, size=0.25,
            rel_min_height = 1e-2)

R: 4.1.1 ◊ ggplot2: 3.3.5 ◊ ggridges: 0.5.3
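The normalization step can be checked in isolation. This base R sketch uses simulated values (the `days_small` and `days_large` vectors, the helper `normalized_heights()`, and the cutoff of 50 are all assumptions for illustration). It shows that dividing the densities by their maximum gives every ridgeline the same peak height of 1, regardless of how many observations have accumulated.

```r
set.seed(1)

# Simulated days-to-complete values for an early and a later cumulative slice
days_small <- rexp(100, rate = 1 / 10)
days_large <- c(days_small, rexp(900, rate = 1 / 10))

normalized_heights <- function(values, maxdays = 50) {
  # Keep only values inside the histogram range so hist() does not error
  kept <- values[values >= 0 & values < maxdays]
  h <- hist(kept, breaks = seq(0, maxdays, by = 1), plot = FALSE)
  # Scale so the tallest bin has height 1
  h$density / max(h$density)
}

y_small <- normalized_heights(days_small)
y_large <- normalized_heights(days_large)
```

Both `max(y_small)` and `max(y_large)` equal 1 even though the second slice holds ten times as many observations.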


Change axes limits and labels

Add x,y axis labels

# Using labs
p + labs(x = "x label",
         y = "y label")

# Using xlab and ylab
p + xlab("x label") + ylab("y label")

R: 3.6.2 ◊ ggplot2: 3.2.1

Rotate x or y axis tick labels

Method 1 (preferred): Use guide_axis() function

guide_axis() was introduced in ggplot2 version 3.3.0. It positions each tick label so that it abuts its tick, without the need to provide adjustment parameters.

p + scale_x_discrete(guide = guide_axis(angle = 45))

R: 4.0.2 ◊ ggplot2: 3.3.2

Method 2 (outdated): Modify theme element axis.text.x

p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

R: 4.0.2 ◊ ggplot2: 3.3.2

Remove x or y axis labels

Remove labels but keep the space that the axis labels took up

p + labs(x = "",
         y = "")

Remove labels and remove the space that the axis labels took up

p + labs(x = NULL,
         y = NULL)

R: 4.0.2 ◊ ggplot2: 3.3.2

Remove x or y axis tick labels

p + theme(axis.text.x = element_blank(),
          axis.text.y = element_blank())

R: 4.0.2 ◊ ggplot2: 3.3.2

Put axis tick labels into scientific (SI) notation

Method 1 (preferred): Use the scales package

The scales package has multiple functions for scaling axes, formatting labels, and determining breaks.

library(scales)

# Continuous y-axis (also applies to histograms and other count data).
# Note: scales 1.2.0 and later deprecate label_number_si() in favor of
# label_number(scale_cut = cut_short_scale()).
p + scale_y_continuous(labels = scales::label_number_si())

R: 4.0.4 ◊ ggplot2: 3.3.3 ◊ scales: 1.1.1

Method 2: Define a SI formatting function

Provide a labels function to scale_y_continuous() that converts the default labels to scientific SI notation.

The format_si() function is based on code by Ben Tupper. See https://stat.ethz.ch/pipermail/r-help/2012-January/299804.html.

format_si <- function(...) {

  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                1e-9,  1e-6,  1e-3,  1e0,   1e3,
                1e6,   1e9,   1e12,  1e15,  1e18,
                1e21,  1e24,  1e27,  1e30,  1e33)
    prefix <- c("y",   "z",   "a",   "f",   "p",
                "n",   "µ",   "m",   " ",   "k",
                "M",   "G",   "T",   "P",   "E",
                "Z",   "Y",   "kY",  "MY",  "GY")

    # Vector with array indices according to position in intervals
    i <- findInterval(abs(x), limits)

    # Set prefix to " " for very small values < 1e-24
    i <- ifelse(i==0, which(limits == 1e0), i)

    paste(format(round(x/limits[i], 1),
                 trim=TRUE, scientific=FALSE, ...),
          prefix[i])
  }
}

p + scale_y_continuous(labels = format_si())

R: 3.6.2 ◊ ggplot2: 3.2.1
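The core of format_si() is the findInterval() lookup. A reduced base R sketch with only four prefixes (chosen here for brevity, and without the rounding step) shows how each value is matched to its prefix:

```r
# Lower bound of each prefix's range and the matching prefix
limits <- c(1e0, 1e3, 1e6, 1e9)
prefix <- c("", "k", "M", "G")

x <- c(950, 1500, 2.5e6, 3e9)

# Index of the largest limit that does not exceed each value
i <- findInterval(abs(x), limits)

# Rescale each value by its limit and append the prefix
labels <- paste0(x / limits[i], prefix[i])
# labels is c("950", "1.5k", "2.5M", "3G")
```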

Put y axis into percentage format

p + scale_y_continuous(labels = scales::percent)

R: 3.6.2 ◊ ggplot2: 3.2.1

Make y-axis origin start at zero

Method 1 (preferred): Use expand_limits()

This method correctly expands the y-axis to start at \(y = 0\).

p + expand_limits(y=0)

R: 4.0.2 ◊ ggplot2: 3.3.2

Method 2: Use scale_y_continuous()

This method works but I feel like it shifts the scale to start at zero rather than expands the scale to start at zero. I prefer Method 1.

p + scale_y_continuous(expand = c(0, 0), limits = c(0, NA))

The same logic applies to the x-axis using scale_x_continuous().

R: 4.0.2 ◊ ggplot2: 3.3.2
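Both methods can be compared by inspecting the trained y scale. In this sketch (using the built-in `mtcars` data as a stand-in), `layer_scales()` retrieves the scale and its `get_limits()` method reports the limits before any padding is applied. Relying on `get_limits()` is an assumption on my part; it is a method on the scale object rather than a documented top-level function.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Method 1: extend the data range so that it includes zero
p1 <- p + expand_limits(y = 0)

# Method 2: fix the lower limit at zero and drop the padding
p2 <- p + scale_y_continuous(expand = c(0, 0), limits = c(0, NA))

limits_default <- layer_scales(p)$y$get_limits()
limits_method1 <- layer_scales(p1)$y$get_limits()
limits_method2 <- layer_scales(p2)$y$get_limits()
```

Both methods give a lower limit of 0. The visible difference is that Method 2 also removes the default padding around the data, because of `expand = c(0, 0)`.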


Add, remove, or modify a plot legend

Add a legend to a single plot

Legends are created automatically if color (also spelled colour) is mapped to the categorical or group variable. This requires the data to be in the longer (gathered) format.

ggplot(plotdata, aes(xvar,yvar,color=groupvar)) +
  geom_line()

Alternatively, if the data is in wider (spread) format, color can be assigned within each individual aesthetic with the legend label of choice.

# Data must be in wider (spread) format.
# Manually assign labels through the 'color' variable.

ggplot(plotdata, aes(x = xvar))+
  geom_line(aes(y = yvar1, color = "Label 1")) +
  geom_line(aes(y = yvar2, color = "Label 2"))

R: 3.6.2 ◊ ggplot2: 3.2.1

Add a common legend to multiple plots using cowplot

Recipe assumes that all plots share the same legend.

Steps:

  1. Create the individual plots and suppress their legends using theme(legend.position = "none")
  2. Create a common legend by taking the legend details from one of the plots. Each individual plot should have the same legend, so it doesn’t matter which one we grab. Change the parameters to make it visible and give it the desired shape (e.g. a single row for the plot bottom)
  3. Create a cowplot::plot_grid for just the plots
  4. Create a new cowplot::plot_grid with the first plot grid and the legend

# Suppress legends for the individual plots
p1 <- p1 + theme(legend.position = "none")
p2 <- p2 + theme(legend.position = "none")
p3 <- p3 + theme(legend.position = "none")
p4 <- p4 + theme(legend.position = "none")

# Extract details for the common legend from any one of the individual plots
legend <- cowplot::get_legend(
  p1 + 
    # Make the legend visible. Bottom alignment.
    theme(legend.position = "bottom") +
    guides(color = guide_legend(nrow = 1))
)

# Combine the four individual plots into a single combination plot
pall <- cowplot::plot_grid(p1, p2, p3, p4)

# Join the combination plot and the legend into a single, final plot.
# The `rel_heights` parameter gives the relative share of the final plot's
# height that each component (combination plot, legend) should take.
cowplot::plot_grid(pall, legend, ncol = 1, rel_heights = c(1, .1))

R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ cowplot: 1.0.0

Overwrite or change automatic legend labels

Overwrite the labels. The labels must be provided in the same order as the entries of the default legend (alphabetical for character variables, factor-level order for factors).

Method 1: Use scale_color_discrete()

p + scale_color_discrete(labels = c("label1", "label2", "label3"))

Method 2 (for point or line plots): Use scale_color_manual()

scale_color_manual() also needs a values argument, so use Method 2 when the color palette has to change along with the labels, e.g. with the cbPalette defined under “Change the default color palette”:

p + scale_color_manual(labels = c("label1", "label2", "label3"),
                       values = cbPalette)

We can’t use both scale_color_discrete() and scale_color_manual() on the same plot. Both set the color scale, so the one added last replaces the other.

Method 3 (for fill-type plots): Use scale_fill_discrete() or scale_fill_manual()

As Methods 1 and 2, but for fill-type plots.

p + scale_fill_discrete(labels = c("label1", "label2", "label3"))

# Or if we also need to specify a color palette
p + scale_fill_manual(labels = c("label1", "label2", "label3"),
                      values = cbPalette)

R: 4.0.1 ◊ ggplot2: 3.3.2

Move a legend to different location

Legend position is set through the legend.position parameter of theme(). The parameter accepts keywords, e.g. “top”, “bottom”, “left”, “right”, or relative x,y coordinates such as c(0.8,0.8).

The legend is automatically reshaped for its new location, e.g. a legend.position of "right" (default) creates a column-like legend, while a legend.position of "bottom" creates a row-like legend.

# Example 1
p + theme(legend.position = "top")

# Example 2
p + theme(legend.position = c(0.8,0.8))

R: 3.6.2 ◊ ggplot2: 3.2.1

Remove entire legend

p + theme(legend.position = "none")

R: 3.6.2 ◊ ggplot2: 3.2.1

Remove legend title

p + theme(legend.title = element_blank())

R: 3.6.2 ◊ ggplot2: 3.2.1

Change legend title

Example 1: For point- or line-like plots

# For geom_line, either
p + labs(color="New Legend Title")
# or
p + labs(color=guide_legend(title="New Legend Title"))

Example 2: For fill-like plots

# For geom_boxplot, either 
p + labs(fill="New Legend Title")
# or 
p + labs(fill=guide_legend(title="New Legend Title"))

R: 3.6.2 ◊ ggplot2: 3.2.1

Change legend shape

Method 1: Specifying number of rows (e.g. put everything in a single row)

# nrow = 2 arranges the legend entries in two rows; set 'nrow = 1' for a
# single row (everything arranged horizontally)
p + guides(color = guide_legend(nrow = 2))

R: 3.6.2 ◊ ggplot2: 3.2.1

Method 2: Specify the box shape

p + theme(legend.box = "horizontal")

R: 3.6.2 ◊ ggplot2: 3.2.1

Make legend background transparent

Useful when positioning a legend inside a plot. The default legend background is a white rectangle and it will overlay and hide plot elements like grid lines or plot borders.

p + theme(legend.background = element_rect(fill = 'transparent', color=NA))

R: 3.6.2 ◊ ggplot2: 3.2.1

Change legend marker attributes (make symbols bigger, non-transparent, etc.)

Sometimes when plotting, choices are made that look good on the plot but that don’t translate well to the plot legend.

For example, when plotting hundreds of thousands of data points, I may make the points very small and with high transparency. In the plot they superimpose into a solid, distinguishing color. But as single dots in the legend, they are too small and transparent to read.

To fix, override the legend parameters.

# The most common legend items I need to fix are marker size and transparency
p + guides(colour = guide_legend(override.aes = list(size=10, alpha=1.0)))

R: 4.1.0 ◊ ggplot2: 3.3.3
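A self-contained sketch of this situation, using the `diamonds` dataset bundled with ggplot2 (my choice of data; the size and alpha values are arbitrary). It also checks that `override.aes` changes only the legend keys, not the plotted points.

```r
library(ggplot2)

# Small, mostly transparent points suit dense data but give faint legend keys
p <- ggplot(diamonds, aes(carat, price, colour = cut)) +
  geom_point(size = 0.5, alpha = 0.1) +
  # Draw the legend keys larger and fully opaque; the plotted points keep
  # their original size and transparency
  guides(colour = guide_legend(override.aes = list(size = 4, alpha = 1)))

# The layer data is unaffected by override.aes
point_data <- layer_data(p, 1)
```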


Add or modify plot text

Put a portion of a title or subtitle in italics or bold font

Use the ggtext package. Its theme element element_markdown() understands basic markup characters like * and **.

library(ggtext)

ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(
    title = "Sepal length and sepal width of various *Iris* species",
    x = "Sepal length (cm)",
    y = "Sepal width (cm)"
  ) +
  # Use the ggtext::element_markdown() theme element instead of element_text().
  # It understands basic markup characters like `*` (italics) and `**` (bold).
  theme(plot.title = ggtext::element_markdown())

R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ ggtext: 0.1.0

Change title font size

Adjust plot title font size through modifying the theme.

p <- p + theme(plot.title = element_text(size=12))

R: 4.1.0 ◊ ggplot2: 3.3.3


Change plot aesthetics

Change the default color palette (e.g. to a colorblind palette)

The automatic ggplot2 color choices can be overridden by providing a color palette. For example, two colorblind-friendly palettes:

# Colorblind palettes. Both palettes are identical except for the first
# color, which is either grey or black.

# Colorblind palette with grey
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
                "#0072B2", "#D55E00", "#CC79A7")

# Colorblind palette with black
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
                 "#0072B2", "#D55E00", "#CC79A7")

Example 1: For line and point colors use scale_color_manual()

p + scale_color_manual(values = cbPalette)

Example 2: For fills use scale_fill_manual()

p + scale_fill_manual(values = cbPalette)

R: 4.0.2 ◊ ggplot2: 3.3.2
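To see which palette entries a plot actually picks up, this sketch applies the grey palette to the `mpg` dataset bundled with ggplot2 (the dataset and the `drv` grouping are my choices for illustration) and reads the assigned colors back with `layer_data()`.

```r
library(ggplot2)

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7")

# drv has three levels, so the first three palette colors are used in order
p <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = cbPalette)

# Colors actually assigned to the points
used_colors <- unique(layer_data(p, 1)$colour)
```

The palette must have at least as many entries as there are groups; surplus entries are simply unused.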

Increase line thickness

Use the size parameter. In ggplot2 3.4.0 and later, the line thickness parameter is named linewidth instead.

p + geom_line(size=1.0)

R: 3.6.2 ◊ ggplot2: 3.2.1

Combine multiple plots into a single plot

Using ggplot2::facet_wrap()

Function facet_wrap() wraps multiple plots that differ by a faceting variable (i.e. a factor variable) into a grid. An example taken directly from ggplot2 documentation:

p <- ggplot(mpg, aes(displ, hwy)) +
       geom_point() +
       # Use vars() to supply faceting variables
       facet_wrap(vars(class))

R: 4.1.0 ◊ ggplot2: 3.3.3

Using ‘grid’ library

I don’t use this method, but I’ve included it as reference.

Using package grid:

library(grid)

grid.newpage()
# 'p1','p2' are ggplot objects
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))

This can be modified to write a PDF of the plot to disk.

library(grid)

# Open the PDF device first so that the drawing goes to the file
pdf(file="composite_plot1.pdf")
grid.newpage()
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))
dev.off()

R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ grid: 4.0.2

Using ‘cowplot’ library

library(cowplot)

# p1, p2, etc. are ggplot objects. Any number of these can be supplied as
# arguments.
pall <- cowplot::plot_grid(p1, p2, p3, p4, ncol = 2)

# Save the combined plot to disk
cowplot::save_plot("myplot.png", pall)

R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ cowplot: 1.0.0


Saving a plot

Save a plot as a png image

Function ggsave() saves the last rendered plot by default.

# With no plot argument, ggsave() saves the last rendered plot
ggsave("myplot.png", width = 6, height = 4.25, units = "in")

R: 3.6.2 ◊ ggplot2: 3.2.1


Miscellaneous

Plot a variable given as a string

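See the ggplot2 reference: defining aesthetic mappings programmatically.

Example 1: Use quasi-quotation

sym() converts the string to a symbol (a column name), and !! unquotes it inside aes().

# yvar is a string holding a column name
ggplot(plotdata, aes(xvar, !! rlang::sym(yvar))) +
    geom_line()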

Example 2: Use aes_string() to define the aesthetic (deprecated; see note after example).

The passed variable string can be an expression. For example, assuming we had the variables INCOMING and OUTGOING in our dataset, we could pass yvar = "INCOMING - OUTGOING" to our function plotVariable().

# xvarstr, yvarstr, and groupvar are strings
ggplot(plotdata, aes_string(xvarstr, yvarstr, color = groupvar)) +
    geom_line()

The function aes_string() is deprecated. Use the regular aes() with quasi-quotation instead. See Example 1 above.

R: 3.6.2 ◊ ggplot2: 3.2.1
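Another option for string column names, not covered in the examples above, is the `.data` pronoun, which as far as I know works inside aes() from ggplot2 3.0.0 onward. The helper name `plot_variable()` below is hypothetical, and `mpg` is used only as sample data.

```r
library(ggplot2)

# Hypothetical helper: xvar and yvar are column names given as strings
plot_variable <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point()
}

p <- plot_variable(mpg, "displ", "hwy")

# The built layer holds the values of the requested columns
point_data <- layer_data(p, 1)
```

Unlike aes_string(), this form takes a column name only; an expression string such as "INCOMING - OUTGOING" would not be evaluated.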

Suppress plot messages

Sometimes when working in an RMarkdown computational notebook, executing a cell with a ggplot plot also produces unwanted messages. To suppress them, depending on the message type, try either message=FALSE or warning=FALSE in the cell header:

```{r, message=FALSE, warning=FALSE}
# PLOT CODE
```

If that doesn’t work, use suppressMessages().
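For example, geom_histogram() emits a message about its default bin count when the plot is built. This sketch uses the built-in `faithful` data (my choice) and wraps the build step in suppressMessages().

```r
library(ggplot2)

# No binwidth or bins given, so building the plot emits a message
p <- ggplot(faithful, aes(waiting)) + geom_histogram()

# Messages are raised when the plot is built, so wrap the build or print step
built <- suppressMessages(ggplot_build(p))

# In a notebook cell, the equivalent is: suppressMessages(print(p))
```

Note that wrapping only the line that creates `p` has no effect, because nothing is computed until the plot is built or printed.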


  1. Flowchart was written using Mermaid and displayed using the Grav Mermaid Diagrams plugin.