This document is a cookbook for R ggplot2 and ggplot2 add-on packages (such as cowplot). It collects recipes that I've developed and recipes I've had to look up repeatedly.
A couple of notes about the code examples:
p, or any p* variable (e.g. p1, p2, pall), is a ggplot2 object.
This document will be updated periodically.
Last updated: 2021-08-19
Ribbon
geom_ribbon() can also be used to fill the area between two lines or between two sets of points (scatter plots).
# Data must be in wider (spread) format
ggplot(plotdata, aes(xvar)) +
  # Providing 'color' inside aes() automatically creates a legend entry
  geom_line(aes(y = yvar1, color = "Label 1"), size = 0.75) +
  geom_line(aes(y = yvar2, color = "Label 2"), size = 0.75) +
  # 'geom_ribbon' fills between a 'ymin' and 'ymax' value. Adding
  # a transparency (alpha) value makes the plot look better.
  geom_ribbon(aes(ymin = yvar1, ymax = yvar2), alpha = 0.20)
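Below is a self-contained version of the recipe that can be run as-is. It is a sketch under one assumption: the data frame plotdata and its columns xvar, yvar1 and yvar2 are made up here only for illustration.

```r
library(ggplot2)

# Hypothetical wide-format data: one x column and one column per series
set.seed(42)
plotdata <- data.frame(xvar = 1:50)
plotdata$yvar1 <- sin(plotdata$xvar / 5)
plotdata$yvar2 <- plotdata$yvar1 + runif(50, min = 0.2, max = 1)

p <- ggplot(plotdata, aes(xvar)) +
  # A constant string mapped to color becomes a legend entry
  geom_line(aes(y = yvar1, color = "Label 1")) +
  geom_line(aes(y = yvar2, color = "Label 2")) +
  # The ribbon fills the gap between the two lines
  geom_ribbon(aes(ymin = yvar1, ymax = yvar2), alpha = 0.20)

print(p)
```

The ribbon inherits x from the top-level aes(), so it only needs ymin and ymax.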
R: 3.6.2 ◊ ggplot2: 3.2.1
Use the ggplot2 extension GGally. Its function ggpairs() produces a matrix of plots between pairs of variables (e.g. scatter plots, densities, and correlations).
library(magrittr)  # provides the %>% pipe
fpp3::us_change %>%
  GGally::ggpairs(columns = 2:6)
R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ GGally: 2.0.0
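Overlay a density curve on a histogram. Both layers must be on the same scale. You can either scale the density curve up to the histogram counts (Example 1), or normalize the histogram to unit area so that it matches the density curve (Example 2).
cvar is the continuous or count variable being histogrammed.
stat(density) and stat(count) refer to variables computed by the statistical transformation (the stat), not to the original data. The older ..density.. and ..count.. notation is equivalent:
..density.. (same as stat(density)): transform from count to density.
..count.. (same as stat(count)): transform from density to count.
Example 1: Standard overlay (density curve is scaled to the count data)
stat(count) is now preferred to ..count..; ggplot2 3.3.0 and later also provide after_stat(count), which supersedes both.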
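ggplot(plotdata, aes(x = cvar)) +
  geom_histogram(binwidth = 2, color = "white") +
  # stat(count) is density x n, while each bar counts the observations in a
  # bin of width 2, so multiply by the binwidth to match the bar heights
  geom_density(aes(y = stat(count) * 2), color = "dodgerblue", size = 1) +
  #geom_density(aes(y = ..count.. * 2), color = "dodgerblue", size = 1) +
  theme_linedraw()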
Example 2: Density overlay (histogram is normalized to unit area)
library(scales)
ggplot(plotdata, aes(cvar)) +
  geom_histogram(aes(y = stat(density)), binwidth = 2, color = "white") +
  geom_density(color = "dodgerblue", size = 1) +
  # Convert the y labels from fractional numbers to percent.
  # Uses the scales package.
  scale_y_continuous(labels = percent) +
  theme_linedraw()
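The two normalizations can be checked numerically with layer_data() on simulated data. This is a sketch that assumes ggplot2 ≥ 3.3.0 (for after_stat()); the 500 normal draws are made up. It checks three things. First, a histogram's computed density sums to 1 over the bars. Second, a density curve's computed count is its density times the number of observations. Third, matching bars of width 2 therefore needs one more factor of 2.

```r
library(ggplot2)

# Hypothetical data: 500 draws of a continuous variable
set.seed(1)
n <- 500
plotdata <- data.frame(cvar = rnorm(n, mean = 50, sd = 8))

p <- ggplot(plotdata, aes(x = cvar)) +
  geom_histogram(binwidth = 2, color = "white") +
  # after_stat(count) is density x n; times the binwidth (2) it matches the bars
  geom_density(aes(y = after_stat(count) * 2), color = "dodgerblue")

hist_layer <- layer_data(p, 1)
dens_layer <- layer_data(p, 2)

# Histogram: density x bin width sums to 1 over all bars (unit area)
sum(hist_layer$density * (hist_layer$xmax - hist_layer$xmin))
# Density curve: count / density recovers the number of observations (n)
range(dens_layer$count / dens_layer$density)
```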
R: 4.0.5 ◊ ggplot2: 3.3.3
A boxplot compares distributions of different groups of numerical data by plotting their quartiles.
Compare the distribution of numerical variable numvar across groups defined by categorical variable catvar.
ggplot(plotdata, aes(x = catvar, y=numvar)) +
geom_boxplot()
R: 4.1.1 ◊ ggplot2: 3.3.5
As above, but with the addition of point symbols indicating the mean value of each distribution. Box plots, by definition, only have an indicator for the median value (\(Q_2\), or 50th percentile).
ggplot(plotdata, aes(x = catvar, y=numvar)) +
geom_boxplot() +
# Add a point for the mean value. Give it a different color to distinguish it
# from the box plot.
stat_summary(fun = mean, geom = "point", color="red")
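Here is a runnable sketch of the same recipe using ggplot2's built-in mpg data (highway mileage hwy by vehicle class). In place of the placeholder catvar and numvar, it uses the mpg columns class and hwy. It also pulls the computed means out of the summary layer, which makes it easy to confirm that they are the plain group means.

```r
library(ggplot2)

# Built-in mpg data: highway mileage (hwy) by vehicle class
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  # Red point at the mean of each group
  stat_summary(fun = mean, geom = "point", color = "red")

# Layer 2 holds the computed means; sort by x position (classes are alphabetical)
means <- layer_data(p, 2)
means <- means[order(as.numeric(means$x)), ]
means$y
```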
R: 4.1.1 ◊ ggplot2: 3.3.5
As above, but create a grid of boxplots with respect to a second categorical variable catvar2.
ggplot(plotdata, aes(x = catvar1, y=numvar)) +
geom_boxplot() +
# Optional addition of the mean value
stat_summary(fun = mean, geom = "point", color="red") +
# facet_wrap creates the grid of plots with respect to catvar2
facet_wrap(~catvar2)
R: 4.1.1 ◊ ggplot2: 3.3.5
A ridgeline plot, also known as a joyplot, compares the distribution of a numerical variable across groups by arranging a sequence of density plots. The collection of density plots looks like a mountain ridgeline. Ridgeline plots work well when the number of groups is medium to large.
Here, each group represents a binned time period (binvar), so this type of plot can show how the distribution of a numerical variable evolves over time.
library(ggridges)
# binvar is the binned time variable, e.g. date, week, month, etc.
ggplot(plotdata, aes(x=numvar, y = binvar, group=binvar)) +
ggridges::geom_density_ridges()
R: 4.1.1 ◊ ggplot2: 3.3.5 ◊ ggridges: 0.5.3
Additional parameters I often pass to geom_density_ridges() to improve their appearance:
alpha: Transparency of each individual density plot. Because ridgelines overlap when the number of groups is medium to large, I set an alpha value (e.g. 0.6).
scale: Scaling factor for the ridgeline heights relative to the spacing between them. With scale = 1, the tallest ridgeline just touches the baseline above it; larger values increase the overlap.
size: Set line thickness.
rel_min_height: Remove trailing tails by setting a cutoff relative to the highest point among all ridgelines (e.g. 0.01 removes everything below 1% of that height).

In a normal density plot or histogram, the height of the distribution at a given numerical value is proportional to the number of observations at that value. In a cumulative ridgeline plot, however, the standard method won't work. Each sequential density plot contains the data of all previous time periods plus the current time period, so the heights of the sequential density plots would keep increasing.
Instead, we normalize each density plot to the same height by manually supplying the heights.
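The general algorithm¹ works as follows. For each week, collect all observations from the start week through that week, histogram them, and normalize the histogram so that its tallest bin has a height of 1. Then stack the per-week results into one data frame and plot them with stat = "identity".
And the associated code: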
library(dplyr)
library(ggplot2)
library(ggridges)

# Display cutoff value of the numerical variable (days)
maxdays <- 400
startweek <- min(plotdata$week)
endweek <- max(plotdata$week)

# Create a single cumulative histogram using data from the start week
# through the passed week
getCumulativeDensityHistogram <- function(week){
  # Pull all the days-to-complete data for the cumulative time slice of
  # interest. Inside filter(), `week` refers to the data column.
  weekseq <- seq(startweek, week, by = 'weeks')
  cumdata <- plotdata %>%
    filter(week %in% weekseq) %>%
    pull(days_to_complete)
  # Histogram the cumulative data. Chop off values outside the plot range
  # (0 to maxdays) to prevent `hist()` from throwing an error.
  histdata <- hist(cumdata[cumdata >= 0 & cumdata < maxdays],
                   breaks = seq(0, maxdays, 1), plot = FALSE)
  # Extract the histogram data (left bin edges and densities). Normalize
  # the histogram heights so that the tallest bin is 1.
  histx <- histdata$breaks[1:maxdays]
  histy <- histdata$density / max(histdata$density)
  # Put the data into a data frame. Here `week` is the function argument.
  df <- data.frame(x = histx, y = histy) %>%
    mutate(week = week)
  return(df)
}

# Process all weeks (lapply() replaces a for loop) and row-bind the
# per-week histogram data into one complete data frame
plotdata_cumulative <- do.call("rbind",
  # List of per-week histogram data
  lapply(
    seq(startweek, endweek, by = 'weeks'),
    getCumulativeDensityHistogram
  )
) %>%
  arrange(week, x) %>%
  select(week, x, y)

# Plot the data. We've directly supplied the histogram height at each value
# of x, so set aesthetics correctly and use stat="identity".
p <- ggplot(plotdata_cumulative,
            aes(x = x, y = week, height = y, group = week)) +
  ggridges::geom_density_ridges(
    stat = "identity", alpha = 0.5, scale = 5, size = 0.25,
    rel_min_height = 1e-2)
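The base R sketch below shows why the normalization step is needed, using simulated data. The three weeks of exponentially distributed durations and the 10-day bins are assumptions made only for illustration. Each cumulative histogram includes all earlier weeks, so its raw bin counts can only grow or stay the same. Dividing by the tallest bin puts every week on the same unit height. Normalizing counts gives the same result as normalizing densities, because the two differ only by a constant factor within each week.

```r
set.seed(7)
# Hypothetical data: 3 weeks of 100 task durations each (in days)
week <- rep(1:3, each = 100)
days <- round(rexp(300, rate = 1 / 60))
maxdays <- 400
breaks <- seq(0, maxdays, by = 10)

cumulative <- lapply(1:3, function(w) {
  # Keep every observation from week 1 through week w inside the plot range
  cumdata <- days[week <= w & days < maxdays]
  h <- hist(cumdata, breaks = breaks, plot = FALSE)
  data.frame(week = w,
             x = head(h$breaks, -1),        # left edge of each bin
             raw = h$counts,                # never shrinks week over week
             y = h$counts / max(h$counts))  # tallest bin is always 1
})

# Tallest raw bin per week grows; the normalized height is 1 for every week
sapply(cumulative, function(d) max(d$raw))
sapply(cumulative, function(d) max(d$y))
```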
R: 4.1.1 ◊ ggplot2: 3.3.5 ◊ ggridges: 0.5.3
# Using labs
p + labs(x = "x label",
y = "y label")
# Using xlab and ylab
p + xlab("x label") + ylab("y label")
R: 3.6.2 ◊ ggplot2: 3.2.1
guide_axis() function
The guide_axis() function was introduced in ggplot2 version 3.3.0. It positions each rotated tick label so that it abuts its tick, without you having to provide justification (hjust, vjust) parameters.
p + scale_x_discrete(guide = guide_axis(angle = 45))
R: 4.0.2 ◊ ggplot2: 3.3.2
axis.text.x
p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
R: 4.0.2 ◊ ggplot2: 3.3.2
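# Empty strings blank the titles, but space for them is still reserved
p + labs(x = "",
         y = "")
# NULL removes the titles and the space they take up
p + labs(x = NULL,
         y = NULL)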
R: 4.0.2 ◊ ggplot2: 3.3.2
p + theme(axis.text.x = element_blank(),
axis.text.y = element_blank())
R: 4.0.2 ◊ ggplot2: 3.3.2
scales package
Use the scales package. It has multiple functions for scaling axes, formatting labels, and determining breaks.
library(scales)
# Continuous y-axis (even applies to histograms, other count data)
p + scale_y_continuous(labels = scales::label_number_si())
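label_number_si() was deprecated in scales 1.2.0. In newer versions, a comparable labeller comes from the scale_cut argument of label_number(). This sketch assumes scales ≥ 1.2.0, and its data frame is made up. cut_short_scale() abbreviates with K, M, B and T suffixes.

```r
library(ggplot2)
library(scales)

# Hypothetical data with large y values
plotdata <- data.frame(x = 1:5, y = c(2e3, 5e4, 3e5, 1.2e6, 4e6))

p <- ggplot(plotdata, aes(x, y)) +
  geom_col() +
  # Requires scales >= 1.2.0: abbreviate labels with K, M, B, T suffixes
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()))

# The labeller is an ordinary function and can be tried on a vector directly
label_number(scale_cut = cut_short_scale())(c(1500, 2e6))
```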
R: 4.0.4 ◊ ggplot2: 3.3.3 ◊ scales: 1.1.1
Provide a labels function to scale_y_continuous() that converts the default labels to SI prefix notation (k, M, G, ...).
The format_si() function is based on code by Ben Tupper. See https://stat.ethz.ch/pipermail/r-help/2012-January/299804.html.
format_si <- function(...) {
function(x) {
limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
1e-9, 1e-6, 1e-3, 1e0, 1e3,
1e6, 1e9, 1e12, 1e15, 1e18,
1e21, 1e24, 1e27, 1e30, 1e33)
prefix <- c("y", "z", "a", "f", "p",
"n", "µ", "m", " ", "k",
"M", "G", "T", "P", "E",
"Z", "Y", "kY", "MY", "GY")
# Vector with array indices according to position in intervals
i <- findInterval(abs(x), limits)
# Set prefix to " " for very small values < 1e-24
i <- ifelse(i==0, which(limits == 1e0), i)
paste(format(round(x/limits[i], 1),
trim=TRUE, scientific=FALSE, ...),
prefix[i])
}
}
p + scale_y_continuous(labels = format_si())
R: 3.6.2 ◊ ggplot2: 3.2.1
p + scale_y_continuous(labels = scales::percent)
R: 3.6.2 ◊ ggplot2: 3.2.1
Method 1: expand_limits()
This method correctly expands the y-axis to start at \(y = 0\).
p + expand_limits(y=0)
R: 4.0.2 ◊ ggplot2: 3.3.2
Method 2: scale_y_continuous()
This method works, but rather than expanding the scale to include zero, it fixes the lower limit at zero and removes the default padding at both ends of the axis (expand = c(0, 0)). I prefer Method 1.
p + scale_y_continuous(expand = c(0, 0), limits = c(0, NA))
The same logic applies to the x-axis using scale_x_continuous().
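The sketch below compares the visible y-range produced by the expand_limits() and scale_y_continuous() approaches on ggplot2's built-in mpg data (hwy runs from 12 to 44). One caution: it reads ggplot_build()'s layout$panel_params, which is an internal structure. That structure holds in ggplot2 3.3 to 3.5 but may change in other versions.

```r
library(ggplot2)

# Built-in mpg data: hwy ranges from 12 to 44, so zero is not included by default
base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

p_expand <- base + expand_limits(y = 0)
p_limits <- base + scale_y_continuous(expand = c(0, 0), limits = c(0, NA))

# Visible y-range of the panel after ggplot2 applies its scale expansion
yrange <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range

yrange(p_expand)  # default 5% padding is kept: starts slightly below 0
yrange(p_limits)  # starts exactly at 0, with no padding at the top either
```

With the default 5% expansion of the range 0 to 44, the first plot spans -2.2 to 46.2, while the second spans exactly 0 to 44.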
R: 4.0.2 ◊ ggplot2: 3.3.2
A legend is automatically created when color (also spelled colour) is mapped to a categorical or group variable. This requires the data to be in the longer (gathered) format.
ggplot(plotdata, aes(xvar,yvar,color=groupvar)) +
geom_line()
Alternatively, if the data is in the wider (spread) format, color can be assigned within each individual aesthetic, using the legend label of choice.
# Data must be in wider (spread) format.
# Manually assign labels through the 'color' variable.
ggplot(plotdata, aes(x = xvar))+
geom_line(aes(y = yvar1, color = "Label 1")) +
geom_line(aes(y = yvar2, color = "Label 2"))
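The sketch below builds both versions side by side from one small made-up data set, so you can see that they produce the same two legend colors. The wide and long data frames and their columns are hypothetical. The long format is built with base R so that tidyr is not needed.

```r
library(ggplot2)

# Hypothetical data in wide format: one column per series
wide <- data.frame(xvar = 1:10, yvar1 = (1:10)^1.5, yvar2 = 30 - 2 * (1:10))

# Same data in long format: one row per (xvar, series) pair
long <- data.frame(
  xvar = rep(wide$xvar, times = 2),
  yvar = c(wide$yvar1, wide$yvar2),
  groupvar = rep(c("Label 1", "Label 2"), each = nrow(wide))
)

# Long format: map color to the group column
p_long <- ggplot(long, aes(xvar, yvar, color = groupvar)) +
  geom_line()

# Wide format: a constant string inside aes() becomes the legend label
p_wide <- ggplot(wide, aes(x = xvar)) +
  geom_line(aes(y = yvar1, color = "Label 1")) +
  geom_line(aes(y = yvar2, color = "Label 2"))
```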
R: 3.6.2 ◊ ggplot2: 3.2.1
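cowplot
This recipe assumes that all plots share the same legend.
Steps:
1. Suppress the legends of the individual plots (theme(legend.position = "none")).
2. Extract the common legend from one of the plots with get_legend().
3. Combine just the plots with cowplot::plot_grid.
4. Combine the first plot grid and the legend with a second cowplot::plot_grid.
library(cowplot)
# Suppress legends for the individual plots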
p1 <- p1 + theme(legend.position = "none")
p2 <- p2 + theme(legend.position = "none")
p3 <- p3 + theme(legend.position = "none")
p4 <- p4 + theme(legend.position = "none")
# Extract details for the common legend from any one of the individual plots
legend <- get_legend(
p1 +
# Make the legend visible. Bottom alignment.
theme(legend.position = "bottom") +
guides(color = guide_legend(nrow = 1))
)
# Combine the four individual plots into a single combination plot
pall <- cowplot::plot_grid(p1, p2, p3, p4)
# Join the legend and the combination plot into a single, final plot.
# The `rel_heights` parameter gives the relative height proportion each
# component (combination plot, legend) should take of the final plot's
# real estate.
plot_grid(pall, legend, ncol = 1, rel_heights = c(1, .1))
R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ cowplot: 1.0.0
Overwrite the legend labels. The labels must be provided in the same order as the default legend entries (alphabetical order for character variables).
Method 1: scale_color_discrete()
p + scale_color_discrete(labels = c("label1", "label2", "label3"))
Method 2: scale_color_manual()
p + scale_color_manual(labels = c("label1", "label2", "label3"))
Method 2 is required when we also need to change the color palette, e.g.
p + scale_color_manual(labels = c("label1", "label2", "label3"),
                       values = cbPalette)
We can't use both scale_color_discrete() and scale_color_manual() on the same plot. Both define the color scale, so the one added last replaces the other.
Method 3: scale_fill_manual()
As Method 2, but for fill-type plots.
p + scale_fill_manual(labels = c("label1", "label2", "label3"))
# Or if we also need to specify a color palette
p + scale_fill_manual(labels = c("label1", "label2", "label3"),
                      values = cbPalette)
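Here is a runnable fill-type example that sets labels and values together. It uses ggplot2's built-in mpg data, and the label text is made up. The levels of drv are "4", "f" and "r" in alphabetical order, so the labels and colors are given in that same order.

```r
library(ggplot2)

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7")

# drv levels are "4", "f", "r" (alphabetical); labels follow that order
p <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_boxplot() +
  scale_fill_manual(labels = c("4-wheel", "Front", "Rear"),
                    values = cbPalette[2:4])
```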
R: 4.0.1 ◊ ggplot2: 3.3.2
Legend position is set through the parameter legend.position within theme(). The parameter accepts keywords, e.g. "top", "bottom", "left", "right", or relative x,y coordinates such as c(0.8, 0.8), where c(0, 0) is the bottom-left corner and c(1, 1) the top-right corner.
The legend is automatically reshaped for its new location. For example, a legend.position of "right" (the default) creates a vertical (column-like) legend, while a legend.position of "bottom" creates a horizontal (row-like) legend.
# Example 1
p + theme(legend.position = "top")
# Example 2
p + theme(legend.position = c(0.8,0.8))
R: 3.6.2 ◊ ggplot2: 3.2.1
p + theme(legend.position = "none")
R: 3.6.2 ◊ ggplot2: 3.2.1
p + theme(legend.title = element_blank())
R: 3.6.2 ◊ ggplot2: 3.2.1
Example 1: For point- or line-like plots
# For geom_line, either
p + labs(color="New Legend Title")
# or
p + guides(color = guide_legend(title = "New Legend Title"))
Example 2: For fill-like plots
# For geom_boxplot, either
p + labs(fill="New Legend Title")
# or
p + guides(fill = guide_legend(title = "New Legend Title"))
R: 3.6.2 ◊ ggplot2: 3.2.1
# set 'nrow = 1' for a single row (everything arranged horizontally)
p + guides(color = guide_legend(nrow = 2))
R: 3.6.2 ◊ ggplot2: 3.2.1
p + theme(legend.box = "horizontal")
R: 3.6.2 ◊ ggplot2: 3.2.1
Useful when positioning a legend inside a plot. The default legend background is a white rectangle and it will overlay and hide plot elements like grid lines or plot borders.
p + theme(legend.background = element_rect(fill = 'transparent', color=NA))
R: 3.6.2 ◊ ggplot2: 3.2.1
Sometimes when plotting, choices are made that look good on the plot but that don’t translate well to the plot legend.
For example, when plotting hundreds of thousands of data points, I may make the points very small and highly transparent. In the plot they superimpose into a solid, distinguishable color. But as single dots in the legend, they are too small and transparent to read.
To fix, override the legend parameters.
# The most common legend items I need to fix are marker size and transparency
p + guides(colour = guide_legend(override.aes = list(size=10, alpha=1.0)))
R: 4.1.0 ◊ ggplot2: 3.3.3
Use the library ggtext. Its theme element element_markdown() understands basic markup characters like * and **.
library(ggplot2)
library(ggtext)
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point() +
labs(
title = "Sepal length and sepal width of various *Iris* species",
x = "Sepal length (cm)",
y = "Sepal width (cm)"
) +
# Use the ggtext::element_markdown() theme element instead of element_text().
# It understands basic markup characters like `*` (italics) and `**` (bold).
theme(plot.title = ggtext::element_markdown())
R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ ggtext: 0.1.0
Adjust plot title font size through modifying the theme.
p <- p + theme(plot.title = element_text(size=12))
R: 4.1.0 ◊ ggplot2: 3.3.3
Automatic ggplot color choices can be overridden by providing a color palette. For example, here are two colorblind-friendly palettes:
# Colorblind-friendly palettes. Both palettes are identical except for the first
# color which is either grey or black.
# Colorblind palette with grey
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7")
# Colorblind palette with black
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7")
Example 1: For line and point colors use scale_color_manual()
p + scale_color_manual(values = cbPalette)
Example 2: For fills use scale_fill_manual()
p + scale_fill_manual(values = cbPalette)
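A manual palette must have at least as many colors as the mapped variable has levels. The sketch below uses ggplot2's built-in mpg data, whose class column has 7 levels, so the 8-color palette is long enough. A manual scale with fewer values than levels raises an error when the plot is built.

```r
library(ggplot2)

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
                "#0072B2", "#D55E00", "#CC79A7")

# mpg$class has 7 levels; the first 7 palette colors are used in level order
p <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  scale_color_manual(values = cbbPalette)
```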
R: 4.0.2 ◊ ggplot2: 3.3.2
Use the size parameter
p + geom_line(size=1.0)
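In ggplot2 3.4.0 and later, line thickness moved to a dedicated linewidth aesthetic. size still works for lines there but emits a deprecation warning. Here is a sketch for newer versions; it assumes ggplot2 ≥ 3.4.0, and its data frame is made up.

```r
library(ggplot2)

plotdata <- data.frame(x = 1:10, y = (1:10)^2)  # hypothetical data

# ggplot2 >= 3.4.0: set line thickness with linewidth instead of size
p <- ggplot(plotdata, aes(x, y)) +
  geom_line(linewidth = 1.0)
```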
R: 3.6.2 ◊ ggplot2: 3.2.1
ggplot2::facet_wrap()
Function facet_wrap() wraps multiple plots that differ by a faceting variable (i.e. a factor variable) into a grid. An example taken directly from the ggplot2 documentation:
p <- ggplot(mpg, aes(displ, hwy)) +
geom_point() +
# Use vars() to supply faceting variables
facet_wrap(vars(class))
R: 4.1.0 ◊ ggplot2: 3.3.3
I don't use this method, but I've included it as a reference.
Using package grid:
library(grid)
grid.newpage()
# 'p1','p2' are ggplot objects
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))
This can be modified to write a PDF of the plot to disk:
library(grid)
# Open the PDF device first so the composite plot is drawn into the file
pdf(file = "composite_plot1.pdf")
grid.newpage()
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))
dev.off()
R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ grid: 4.0.2
library(cowplot)
# p1, p2, etc. are ggplot objects. Any number of these can be supplied as
# arguments.
pall <- cowplot::plot_grid(p1, p2, p3, p4, ncol = 2)
# Save the combined plot to disk. ncol/nrow scale the output size to the grid.
cowplot::save_plot("myplot.png", pall, ncol = 2, nrow = 2)
R: 4.0.2 ◊ ggplot2: 3.3.2 ◊ cowplot: 1.0.0
Function ggsave() saves the last rendered plot by default.
# Without a 'plot' argument, ggsave() saves the last rendered plot
ggsave("myplot.png", width = 6, height = 4.25, units = "in")
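In a long script, relying on the last rendered plot can save the wrong one. A safer pattern is to pass the plot explicitly through ggsave()'s plot argument. In this sketch, the plot uses the built-in mpg data, the output path under tempdir() is an arbitrary choice, and the ".pdf" extension selects the output device.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Save a specific plot object rather than the last one displayed;
# the ".pdf" file extension selects the PDF device
outfile <- file.path(tempdir(), "myplot.pdf")
ggsave(outfile, plot = p, width = 6, height = 4.25, units = "in")
```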
R: 3.6.2 ◊ ggplot2: 3.2.1
See the ggplot2 reference: defining aesthetic mappings programmatically.
Example 1: Use quasi-quotation
!! unquotes an expression, such as a symbol. Convert a string to a symbol with rlang::sym() first.
# yvarstr is a string naming a column of plotdata
yvar <- rlang::sym(yvarstr)
ggplot(plotdata, aes(xvar, !! yvar)) +
  geom_line()
Example 2: Use aes_string() to define the aesthetic (deprecated; see note after example).
The passed variable string can be an expression. For example, assuming we had the variables INCOMING and OUTGOING in our dataset, we could pass yvar = "INCOMING - OUTGOING" to our function plotVariable().
# xvarstr, yvarstr, and groupvar are strings
ggplot(plotdata, aes_string(xvarstr, yvarstr, color = groupvar)) +
  geom_line()
The function aes_string() is deprecated. Use the regular aes() with quasi-quotation instead. See Example 1 above.
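This sketch shows a helper that takes column names as strings, built on ggplot2's built-in mpg data. It assumes ggplot2 ≥ 3.0 and the rlang package (a ggplot2 dependency), and the helper name plot_xy is hypothetical. The .data[[...]] pronoun handles plain column names. rlang::parse_expr() turns a string such as "hwy - cty" into an expression, which covers the expression use case that aes_string() supported.

```r
library(ggplot2)

# Hypothetical helper: x and y are given as strings
plot_xy <- function(data, xvarstr, yvarstr) {
  ggplot(data, aes(x = .data[[xvarstr]],
                   # parse_expr() turns the string into an expression;
                   # !! splices that expression into aes()
                   y = !!rlang::parse_expr(yvarstr))) +
    geom_point()
}

p1 <- plot_xy(mpg, "displ", "hwy")        # plain column names
p2 <- plot_xy(mpg, "displ", "hwy - cty")  # an expression, as with aes_string()
```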
R: 3.6.2 ◊ ggplot2: 3.2.1
Sometimes when working in an RMarkdown computational notebook, executing a cell with a ggplot plot may also output unwanted messages or warnings. To suppress them, depending on their type, try message=FALSE or warning=FALSE in the cell header:
```{r, message=FALSE, warning=FALSE}
# PLOT CODE
```
If that doesn't work, wrap the code that prints the plot in suppressMessages() (or suppressWarnings() for warnings).
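Many ggplot2 messages are emitted when the plot is built or printed, not when the plot object is created. suppressMessages() therefore has to wrap the print() call, not the ggplot() call. Here is a minimal sketch using ggplot2's built-in mpg data. It assumes a histogram without a bins or binwidth argument, which triggers such a message.

```r
library(ggplot2)

# Creating the object is silent; building or printing it emits the
# "stat_bin() using bins = 30" message
p <- ggplot(mpg, aes(hwy)) + geom_histogram()

# Wrap the call that actually builds and prints the plot
suppressMessages(print(p))
```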
¹ Flowchart was written using Mermaid and displayed using the Grav Mermaid Diagrams plugin.