Chapter 6 Data Visualization with ggplot2

Main reference for this chapter: R graphics cookbook (https://r-graphics.org/)

In the previous chapter, we learned how to create some basic plots with base R. In this chapter, we will see how to use ggplot2 for data visualization.

We will use some of the datasets from the package gcookbook, so we install it first.

install.packages("gcookbook")

Load the packages gcookbook, tidyverse and nycflights13.

library(gcookbook) # contains some datasets for illustration
library(tidyverse) # contains ggplot2 and dplyr
library(nycflights13) # contains the dataset "flights"

6.1 Bar charts

We will start with bar charts. Many of the techniques discussed in this section also carry over to other types of plots.

Recall that there are two types of bar charts:

  1. Bar chart of values. x-axis: discrete variable, y-axis: numeric data (not necessarily count data)
  2. Bar chart of counts. x-axis: discrete variable, y-axis: count of cases in the discrete variable

Using ggplot:

  1. For bar chart of values, we use geom_col(), which is the same as using geom_bar(stat = "identity").
  2. For bar chart of counts, we use geom_bar(), which is the same as using geom_bar(stat = "count"). That is, the default for geom_bar() is to use stat = "count".
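The relationship between the two geoms can be checked directly. The following sketch counts the rows of mtcars per gear with dplyr's count() and compares the bar heights that geom_bar() computes from the raw data with those that geom_col() draws from the pre-computed counts. layer_data() is a ggplot2 helper that returns the data a layer draws; the object names are chosen only for illustration.

```r
library(ggplot2)
library(dplyr)

# Pre-compute the counts: one row per gear, with the count in column n
gear_counts <- count(mtcars, gear)

# geom_bar() counts the rows itself (stat = "count")
p_bar <- ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar()

# geom_col() plots the pre-computed counts as they are (stat = "identity")
p_col <- ggplot(gear_counts, aes(x = factor(gear), y = n)) +
  geom_col()

# Extract the bar heights that each layer draws
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```

Both vectors should contain the same three heights, one per value of gear.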

Bar chart of values:

pg_mean is a simple dataset with groupwise means of some plant growth data.

pg_mean
##   group weight
## 1  ctrl  5.032
## 2  trt1  4.661
## 3  trt2  5.526
ggplot(data = pg_mean, mapping = aes(x = group, y = weight)) +
  geom_col()

Recall the mtcars dataset. Let’s create a bar chart of values for the mean weights grouped by the number of gears. First, we summarize the data using summarize.

by_gear <- group_by(mtcars, gear)
mtcars_wt <- summarize(by_gear, mean_wt_by_gear = mean(wt))

# Alternatively, using %>%
mtcars_wt <- mtcars %>% 
  group_by(gear) %>% 
  summarize(mean_wt_by_gear = mean(wt))

Create the bar chart:

ggplot(mtcars_wt, aes(x = gear, y = mean_wt_by_gear)) +
  geom_col()

To change the colour of the bars, use fill.

ggplot(mtcars_wt, aes(x = gear, y = mean_wt_by_gear)) +
  geom_col(fill = "lightblue")

By default, there is no outline around the fill. To add an outline, use colour (or color).

ggplot(mtcars_wt, aes(x = gear, y = mean_wt_by_gear)) +
  geom_col(color = "red")

Of course, you can combine the two settings:

ggplot(mtcars_wt, aes(x = gear, y = mean_wt_by_gear)) +
  geom_col(fill = "lightblue", color = "red")

Graph with grouped bars

The most basic bar chart of values has one categorical variable on the x-axis and one continuous variable on the y-axis. If you want to include another categorical variable to divide up the data, you can use a graph with grouped bars.

In mtcars, vs represents the shape of the car's engine, with 0 = V-shaped and 1 = straight. We can use vs to divide up the data in addition to gear by mapping it to fill. To create a grouped bar chart, set position = "dodge" in geom_col(); otherwise, you will get a stacked bar chart.

# prepare the data
by_gear_vs <- group_by(mtcars, gear, vs)
mtcars_wt2 <- summarize(by_gear_vs, mean_wt = mean(wt))
# convert to factor in the data
mtcars_wt2$vs <- as.factor(mtcars_wt2$vs) 

# Alternatively, using %>%
mtcars_wt2 <- mtcars %>% 
  group_by(gear, vs) %>% 
  summarize(mean_wt = mean(wt)) %>% 
  ungroup() %>% 
  mutate(vs = as.factor(vs))

# plot
ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col(position = "dodge")

Without position = "dodge", we get a stacked bar chart:

ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col()

You can also convert vs to a factor in the call to ggplot():

# prepare the data
by_gear_vs <- group_by(mtcars, gear, vs)
mtcars_wt3 <- summarize(by_gear_vs, mean_wt = mean(wt))

# plot
ggplot(mtcars_wt3, aes(x = gear, y = mean_wt, fill = factor(vs))) +
  geom_col(position = "dodge")

To change the colours of the bars:

ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Pastel2")

You can try with different palettes:

library(RColorBrewer)
display.brewer.all()

Using palette = "Oranges":

ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Oranges")

Using a manually defined palette:

ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("#cc6666", "#66cccc")) 
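With an unnamed vector, the colours are assigned to the factor levels in order. A possible refinement, sketched below, is to name the elements of the vector after the levels of vs so that each colour is tied to a specific level regardless of the order in which it is written. The object names vs_colours, p_named and used_fills are illustrative only, and the summary is recomputed so that the block runs on its own.

```r
library(ggplot2)
library(dplyr)

# Same summary as mtcars_wt2 in the text
mtcars_wt2 <- mtcars %>%
  group_by(gear, vs) %>%
  summarize(mean_wt = mean(wt), .groups = "drop") %>%
  mutate(vs = as.factor(vs))

# Names refer to the levels of vs, so the order of the vector does not matter
vs_colours <- c("1" = "#66cccc", "0" = "#cc6666")

p_named <- ggplot(mtcars_wt2, aes(x = gear, y = mean_wt, fill = vs)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = vs_colours)

# Colours actually used by the bars
used_fills <- unique(layer_data(p_named)$fill)
```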

Bar Charts of Counts

Creating a bar chart of counts is very similar to creating a bar chart of values.

Bar chart of the number of cars by gear in mtcars:

ggplot(mtcars, aes(x = gear)) +
  geom_bar()

Bar chart of the number of flights by month in flights (from nycflights13):

ggplot(flights, aes(x = factor(month))) + 
  geom_bar(fill = "lightblue")

Controlling the width (by default, width = 0.9):

ggplot(flights, aes(x = month)) + 
  geom_bar(fill = "lightblue", width = 0.5)

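Bar chart of the number of flights by origin and month. Since month is stored as an integer, we convert it to a factor so that ggplot draws one bar per month:

ggplot(flights, aes(x = origin, fill = factor(month))) + 
  geom_bar(position = "dodge", color = "black") 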

6.2 Line Graph

Suppose you want to make a line graph of the daily average departure delay in flights. From now on, we will use %>% whenever it is appropriate.

avg_delay <- 
  flights %>% 
  group_by(month, day) %>% 
  summarize(delay = mean(dep_delay, na.rm = TRUE)) %>% 
  ungroup() %>% 
  mutate(Time = 1:365) 

ggplot(avg_delay, aes(x = Time, y = delay)) +
  geom_line()

Labeling the graph:

# notice how we put each argument on its own line when the arguments
# do not all fit on one line
ggplot(avg_delay, aes(x=Time, y=delay)) +
  geom_line() +
  labs(
    y = "Average Delay", 
    title = "Daily Average Departure Delay of Flights from NYC in 2013"
  )

By default, the range of the y-axis of a line graph is just enough to include all the y values in the data. Sometimes, you may want to change the range manually. For example, the range of the y-axis in the following graph does not include 0.

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line()

If you want to include 0 in the y range, you can use ylim:

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  ylim(0, max(BOD$demand))
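If you only need the axis to include 0 and are happy with the automatic upper limit, expand_limits() is a possible alternative that avoids computing the maximum by hand. The sketch below uses the same BOD data; layer_scales() is a ggplot2 helper for inspecting the scales of a built plot, and the names p_zero and y_limits are illustrative.

```r
library(ggplot2)

# expand_limits() stretches the axis to include 0 and keeps the
# automatic upper limit
p_zero <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  expand_limits(y = 0)

# Limits of the trained y scale (before the default axis padding is added)
y_limits <- layer_scales(p_zero)$y$get_limits()
```

The resulting limits should run from 0 to the largest value of demand, the same range as with ylim(0, max(BOD$demand)).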

Line Graph with multiple lines

Suppose we want to create a line graph showing the monthly average departure delay from the 3 airports in flights.

# prepare the data
flights_delay <- flights %>%
  group_by(month, origin) %>% 
  summarize(delay = mean(dep_delay, na.rm = TRUE)) %>% 
  ungroup()

Line Graph:

ggplot(flights_delay, aes(x = month, y = delay, color = origin)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)

With different line types:

ggplot(flights_delay, aes(x = month, y = delay, linetype = origin)) +
  geom_line()

Add the points on top of the lines:

ggplot(flights_delay, aes(x = month, y = delay, linetype = origin, color = origin)) +
  geom_line() +
  geom_point()

Change the point shapes according to origin:

ggplot(flights_delay, aes(x = month, y = delay, 
                          linetype = origin, color = origin, shape = origin)) +
  geom_line() +
  geom_point()

To use one single shape for the points, we can specify the shape in geom_point(). In ggplot2, the default shape is a solid circle (shape = 19) and the default size is size = 1.5. fill is only applicable for shape = 21 to 25.

ggplot(flights_delay, aes(x = month, y = delay, linetype = origin, color = origin)) +
  geom_line() +
  geom_point(shape = 22, size = 3, fill = "white", color = "darkred")

Using another colour palette and changing the width of the lines:

# for lines, linewidth replaces size in ggplot2 3.4.0 and later
ggplot(flights_delay, aes(x = month, y = delay, color = origin)) +
  geom_line(linewidth = 2) +
  geom_point(shape = 22, size = 3, fill = "white", color = "darkred") +
  scale_colour_brewer(palette = "Set2")

6.3 Scatter Plots

Scatter plots are often used to visualize the relationship between two continuous variables. It is also possible to use a scatter plot when either or both variables are discrete.

The dataset heightweight contains sex, age, height and weight of some schoolchildren.

head(heightweight)
##   sex ageYear ageMonth heightIn weightLb
## 1   f   11.92      143     56.3     85.0
## 2   f   12.92      155     62.3    105.0
## 3   f   12.75      153     63.3    108.0
## 4   f   13.42      161     59.0     92.0
## 5   f   15.92      191     62.5    112.5
## 6   f   14.25      171     62.5    112.0

To create a basic scatter plot, use geom_point():

ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()

You can control the shape, size, and color of the points as illustrated in the last section.

ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point(size = 1.5, shape = 4, color = "blue")

If shape is between 21 and 25, you can control the fill color and the outline color of the points using fill and color, respectively.

ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point(size = 1.5, shape = 22, fill = "red", color = "blue")

Visualizing an additional discrete variable

Suppose you want to use different colours for the points according to different categories of sex.

ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) +
  geom_point()

Suppose you want to use different shapes for the points according to different categories of sex.

ggplot(heightweight, aes(x = ageYear, y = heightIn, shape = sex)) +
  geom_point()

You can use colours and shapes at the same time:

ggplot(heightweight, aes(x = ageYear, y = heightIn, shape = sex, color = sex)) +
  geom_point()

You can change the shapes or colours manually:

ggplot(heightweight, aes(x = ageYear, y = heightIn, shape = sex, color = sex)) +
  geom_point() +
  scale_shape_manual(values = c(21,22)) +
  scale_colour_brewer(palette = "Set2")

Visualizing an additional continuous variable

You may map an additional continuous variable to color.

ggplot(heightweight, aes(x = ageYear, y = heightIn, color = weightLb)) +
  geom_point()

Visualizing two additional discrete variables

Let’s create a new column to indicate whether the child weighs < 100 or >= 100 pounds (this is a discrete variable).

heightweight2 <- heightweight %>%
  mutate(weightgroup = ifelse(weightLb < 100, "< 100", ">= 100"))

Now, we can add both sex and weightgroup in the plot in the following way:

ggplot(heightweight2, aes(x = ageYear, y = heightIn, shape = sex, fill = weightgroup)) +
  geom_point() +
  scale_shape_manual(values = c(21, 24)) +
  scale_fill_manual(
    values = c("red", "black"),
    guide = guide_legend(override.aes = list(shape = 21)) # to change the legend
  ) 

Changing the tick marks, limits and labels of the x-axis and y-axis:

ggplot(heightweight2, aes(x = ageYear, y = heightIn, shape = sex, fill = weightgroup)) +
  geom_point() +
  scale_shape_manual(values = c(21, 24)) +
  scale_fill_manual(
    values = c("red", "black"),
    guide = guide_legend(override.aes = list(shape = 21)) # to change the legend
  ) +
  scale_x_continuous(name = "Age (Year)", breaks = 11:18, limits = c(11, 18)) +
  scale_y_continuous(name = "Height (In)", breaks = seq(50, 70, 5), limits = c(50, 73))

6.3.1 Overplotting

Overplotting refers to the situation when you have a large dataset so that the points in a scatter plot overlap and obscure each other.

# We can create a variable to store the "ggplot" 
diamonds_ggplot <- ggplot(diamonds, aes(x = carat, y = price))
diamonds_ggplot +
  geom_point()

Possible solutions for overplotting:

  1. Use smaller points (size)
# with diamonds_ggplot, we do not have to type 
# ggplot(diamonds, aes(x = carat, y = price))
diamonds_ggplot +
  geom_point(size = 0.1)

  2. Make the points semitransparent (alpha)
diamonds_ggplot +
  geom_point(alpha = 0.05, size = 0.1) # 0.05 = 95% transparent

We can see some vertical bands at some values of carats, meaning that diamonds tend to be cut to those sizes.

  3. Bin the data into rectangles (stat_bin2d)

bins controls the number of bins in the x and y directions. The color of the rectangle indicates how many data points there are in the region.

# by default, bins = 30
diamonds_ggplot +
  stat_bin2d(bins = 3)  

With bins = 50:

diamonds_ggplot +
  stat_bin2d(bins = 50) +
  scale_fill_gradient(low = "lightblue", high = "red")
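To see what the fill colour encodes, one can inspect the data computed by the layer. The sketch below assumes the default behaviour of stat_bin2d(), in which empty rectangles are dropped, and sums the count column returned by layer_data(); the names p_bins, bin_data and total_in_bins are illustrative.

```r
library(ggplot2)

p_bins <- ggplot(diamonds, aes(x = carat, y = price)) +
  stat_bin2d(bins = 50)

# One row per non-empty rectangle; count is the number of diamonds in it
bin_data <- layer_data(p_bins)
total_in_bins <- sum(bin_data$count)
```

Every diamond falls into exactly one rectangle, so the counts should add up to the number of rows of diamonds.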

  4. Jitter the points. Overplotting can also occur when the data is discrete on one or both axes.

In the following example, we use the dataset ChickWeight, where Time is a discrete variable.

head(ChickWeight)
## Grouped Data: weight ~ Time | Chick
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
# create a base plot
cw_ggplot <- ggplot(ChickWeight, aes(x = Time, y = weight))
cw_ggplot +
  geom_point()

You may randomly jitter the points:

cw_ggplot +
  geom_point(position = "jitter")

Jittering the points means a small amount of random variation is added to the location of each point. If you only want to jitter in the x-direction:

cw_ggplot +
  geom_point(position = position_jitter(width = 0.5, height = 0))
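Because the jitter is random, the plot changes slightly every time it is drawn. If a reproducible figure is needed, position_jitter() accepts a seed argument. The following sketch fixes the seed (the value 1 is arbitrary) and extracts the jittered coordinates twice with layer_data(); the object names are illustrative.

```r
library(ggplot2)

p_jit <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(position = position_jitter(width = 0.5, height = 0, seed = 1))

# Build the layer twice: with a fixed seed the jittered positions coincide
jittered <- layer_data(p_jit)
jittered_again <- layer_data(p_jit)

# Distance from each jittered x to the nearest original Time value
time_values <- unique(ChickWeight$Time)
x_shift <- sapply(jittered$x, function(v) min(abs(v - time_values)))
```

With height = 0 the y values are left untouched, and each x value moves by at most width = 0.5.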

6.3.2 Labelling points in a scatter plot

We can use annotate() or geom_text_repel() to label points in a scatter plot. For the latter, we have to install the package ggrepel.

We will use the countries dataset in the package gcookbook and visualize the relationship between health expenditures and infant mortality rate. We will consider a subset of the data, focusing on 2009 and on countries with health expenditures of more than 2,000 USD per capita:

countries_subset <- countries %>% 
  filter(Year == 2009, healthexp > 2000)

Using annotate:

# find out the x and y coordinates for the point corresponding to Canada
canada_x <- filter(countries_subset, Name == "Canada")$healthexp
canada_y <- filter(countries_subset, Name == "Canada")$infmortality

ggplot(countries_subset, aes(x = healthexp, y = infmortality)) + 
  geom_point() +
  annotate("text", x = canada_x, y = canada_y + 0.2, label = "Canada")

# + 0.2 shifts the label up so that it is not drawn on top of the point

Label all the points with geom_text_repel:

# to use geom_text_repel, load the package ggrepel
library(ggrepel)
ggplot(countries_subset, aes(x = healthexp, y = infmortality)) + 
  geom_point() +
  geom_text_repel(aes(label = Name), size = 3)

Label all the points with geom_label_repel (with a box around the label):

# geom_label_repel also depends on the package ggrepel
ggplot(countries_subset, aes(x = healthexp, y = infmortality)) + 
  geom_point() +
  geom_label_repel(aes(label = Name), size = 3)

6.4 Summarizing Data Distributions

6.4.1 Histogram

A histogram can be used to visualize the distribution of a variable. We will illustrate how to create histograms using the dataset birthwt from the package MASS.

library(MASS)

birthwt contains data of 189 birth weights with some covariates of the mothers.

Take a look at the dataset:

head(birthwt)
##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622

Basic histogram:

ggplot(birthwt, aes(x=bwt)) +
  geom_histogram()

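Plot a histogram with density (not frequency) on the y-axis. after_stat(density) refers to the density computed by the stat; the older ..density.. notation is deprecated since ggplot2 3.4.0.

ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(aes(y = after_stat(density)))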
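On the density scale, the bar heights are rescaled so that the total area of the bars is 1. A small numerical check, sketched here with layer_data() and an explicitly chosen bins = 30, multiplies each bar height by its width and sums the result. MASS::birthwt is accessed without attaching MASS, and the names p_dens, hist_data and total_area are illustrative.

```r
library(ggplot2)

p_dens <- ggplot(MASS::birthwt, aes(x = bwt)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# One row per bar: density is the height, xmin and xmax are the bin edges
hist_data <- layer_data(p_dens)
total_area <- sum(hist_data$density * (hist_data$xmax - hist_data$xmin))
```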

To compare two histograms

  1. Use facet_grid() to display two histograms in the same plot.

Suppose we group the data according to the smoking status during pregnancy and we want to display the two histograms of the birth weight:

ggplot(birthwt, aes(x = bwt)) +
  geom_histogram() +
  facet_grid(smoke ~ .)

To change the label, we can change the content of the variable:

# create another dataset
birthwt_mod <- birthwt
birthwt_mod$smoke <- ifelse(birthwt_mod$smoke == 1, "Smoke", "No Smoke")

ggplot(birthwt_mod, aes(x = bwt)) +
  geom_histogram() +
  facet_grid(smoke ~ .)

Alternatively, we can use recode_factor on the original 0/1 variable:

birthwt_mod$smoke <- recode_factor(birthwt$smoke, 
                                   "0" = "No Smoke", "1" = "Smoke")

  2. Use the fill aesthetic to put the two groups in the same plot with different colors. We need to set position = "identity"; otherwise, the bars will be stacked on top of each other vertically, which is not what we want. Setting alpha makes the overlapping bars semitransparent.

ggplot(birthwt_mod, aes(x = bwt, fill = smoke)) +
  geom_histogram(position = "identity", alpha = 0.4)

It is also possible to use both facet_grid and fill when we want to group the data by two discrete variables. We will illustrate this by grouping according to the smoking status and the race. We also add scales = "free" so that the ranges of the y-axes will be adjusted according to the data in each histogram.

# change the name so that the labels can be understood easily
birthwt_mod$race[which(birthwt_mod$race==1)] = "White"
birthwt_mod$race[which(birthwt_mod$race==2)] = "Black"
birthwt_mod$race[which(birthwt_mod$race==3)] = "Other"

ggplot(birthwt_mod, aes(x = bwt, fill = smoke)) +
  geom_histogram(position = "identity", alpha = 0.4) +
  facet_grid(race ~ ., scales = "free")

Note: the dataset in this example is small, so grouping by two variables may not give us a very good understanding of the data.

6.4.2 Kernel Density Estimate

Kernel density estimation is a nonparametric method for estimating the density of a sample. Nonparametric means that we do not impose a parametric model, that is, a model indexed by a finite-dimensional parameter \(\theta \in \mathbb{R}^d\) for some finite \(d\).

Let \(x_1,\ldots,x_n\) be observations of i.i.d. random variables \(X_1,\ldots,X_n\) from some distribution with density \(f\). The histogram estimate of \(f\) at a point \(x_0\) is \[\begin{equation*} \hat{f}(x_0) = \frac{\text{number of $x_i$ in the bin containing $x_0$}}{n h}, \end{equation*}\] where \(h\) is the bin width. As we already know, the histogram does not give a smooth estimate of the density. An alternative that can produce a smooth estimate is the kernel density estimator, \[\begin{equation*} \hat{f}_n(x_0) = \frac{1}{nh}\sum^n_{i=1} K \bigg( \frac{x_0 - x_i}{h} \bigg), \end{equation*}\] where \(K\) is a kernel and \(h\) is the bandwidth.

For our purposes, a kernel is a non-negative symmetric function such that \(\int^\infty_{-\infty}K(x)dx = 1\) and \(\int^\infty_{-\infty} x K(x)dx =0\). For example, \[\begin{eqnarray*} \text{the boxcar kernel:} && K(x) = \frac{1}{2}I(|x| \leq 1)\\ \text{the Gaussian kernel:} && K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \\ \text{the Epanechnikov kernel:} && K(x) = \frac{3}{4}(1-x^2)I(|x| \leq 1) \\ \text{the tricube kernel:} && K(x) = \frac{70}{81}(1-|x|^3)^3I(|x| \leq 1), \end{eqnarray*}\] where \(I(|x| \leq 1) = 1\) if \(|x| \leq 1\) and equals \(0\) otherwise.

Since each kernel is symmetric around \(0\), its value at \((x_0-x_i)/h\) depends only on \(|x_0-x_i|/h\), the distance between \(x_i\) and \(x_0\) measured in units of the bandwidth. For the above kernels, the value of the kernel does not increase as we move away from \(0\) (for the boxcar kernel it is constant on \(|x| \leq 1\) and then drops to \(0\)). Therefore, observations close to \(x_0\) receive weights at least as large as those of observations further away when we compute \(\hat{f}_n(x_0)\).
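To make the formula concrete, the following sketch evaluates the kernel density estimator with the Gaussian kernel directly from its definition and compares it with R's built-in density(). The helper kde_gaussian() is a hypothetical name introduced here for illustration, and bw.nrd0() is used for the bandwidth because it is the default rule of density(); this is a plain, unoptimized implementation under those assumptions, not how density() works internally.

```r
# Gaussian kernel density estimate evaluated at the points in x0,
# following the formula above: (1 / (n h)) * sum_i K((x0 - x_i) / h)
kde_gaussian <- function(x0, x, h) {
  sapply(x0, function(pt) sum(dnorm((pt - x) / h)) / (length(x) * h))
}

x <- MASS::birthwt$bwt   # birth weights
h <- bw.nrd0(x)          # rule-of-thumb bandwidth, the default of density()

# Evaluate the estimate on a grid that extends past the data on both sides
grid <- seq(min(x) - 4 * h, max(x) + 4 * h, length.out = 512)
f_hat <- kde_gaussian(grid, x, h)

# A density should integrate to 1: approximate the integral by a Riemann sum
area <- sum(f_hat) * (grid[2] - grid[1])

# Compare with R's built-in estimator evaluated on the same grid
f_builtin <- density(x, bw = h, from = min(grid), to = max(grid), n = 512)$y
max_rel_diff <- max(abs(f_hat - f_builtin)) / max(f_hat)
```

The area should be close to 1, and the two estimates should agree up to a small numerical error, since density() uses a fast approximation on a grid rather than the exact sum.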

The bandwidth will control the smoothness of the estimate: larger bandwidth will result in a smoother curve and smaller bandwidth will result in a noisy and rough curve. We can create a kernel density estimate of the distribution using geom_density().

ggplot(birthwt, aes(x = bwt)) +
  geom_density() +
  geom_density(adjust = 0.25, color = "red") + # smaller bandwidth -> noisy
  geom_density(adjust = 2, color = "blue") # large bandwidth -> smoother
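The adjust argument is a multiplier applied to the default bandwidth. This can be verified with stats::density(), which, as far as the ggplot2 documentation describes, is what geom_density() relies on to compute the estimate. The variable names below are illustrative.

```r
x <- MASS::birthwt$bwt

# Bandwidths actually used by density() for different values of adjust
bw_default <- density(x)$bw
bw_small <- density(x, adjust = 0.25)$bw
bw_large <- density(x, adjust = 2)$bw
```

Here bw_small is a quarter of the default bandwidth and bw_large is twice the default, which explains the rougher red curve and the smoother blue curve.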

Overlaying a density curve with a histogram

# after_stat(density) replaces the deprecated ..density.. notation
ggplot(birthwt, aes(x = bwt)) +
  geom_histogram(fill = "cornsilk", aes(y = after_stat(density))) +
  geom_density()

Displaying kernel density Estimates from grouped data

To use geom_density() to display kernel density estimates from grouped data, the grouping variable must be a factor or a character vector. Recall that in birthwt_mod, which we created earlier, the smoke variable is a character vector (or a factor, if you used recode_factor).

With color:

ggplot(birthwt_mod, aes(x = bwt, color = smoke)) +
  geom_density()

With fill:

ggplot(birthwt_mod, aes(x = bwt, fill = smoke)) +
  geom_density(alpha = 0.3) # to control the transparency 

With facet_grid():

ggplot(birthwt_mod, aes(x = bwt)) +
  geom_density() +
  facet_grid(smoke ~ .)

6.5 Saving your plots

There are two types of image files: vector and raster (bitmap).

Raster images are pixel-based. When you zoom in on the image, you can see the individual pixels. Two examples are JPG and PNG files. JPG uses lossy compression, so for plots with sharp lines and text its quality is usually lower than that of PNG, which is lossless.

Vector images are constructed using mathematical formulas. You can resize the image without a loss in image quality. When you zoom in on the image, it is still smooth and clear. Two examples are AI and PDF files.

6.5.1 Outputting to pdf vector files

Suppose you want to save the plot from the following code:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
# first argument is the file name
# width and height are in inches
pdf("filename.pdf", width = 4, height = 4)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

dev.off()

Outputting to a pdf file:

  • usually the best option
  • usually smaller than bitmap files such as PNG files.
  • when you have overplotting (many points on the plot), a PDF file can be much larger than a PNG file.
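As an alternative to opening and closing a graphics device by hand, ggplot2 provides ggsave(), which chooses the device from the file extension. The sketch below writes the same scatter plot to a PDF file in a temporary directory; the file name and the object names are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary directory so that the example leaves no files
# behind in the working directory; the file name is only an illustration
pdf_file <- file.path(tempdir(), "mtcars_scatter.pdf")

# width and height are in inches by default; the device is chosen
# from the file extension
ggsave(pdf_file, plot = p, width = 4, height = 4)
```

With a file name ending in .png and the additional argument dpi = 300, the same call should produce a 4x4-inch bitmap at print resolution.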

6.5.2 Outputting to bitmap files

# width and height are in pixels
png("png_plot.png", width = 600, height = 600)
ggplot(mtcars, aes(x=wt,y=mpg)) + 
  geom_point()
dev.off()

For high-quality print output, it is recommended to use at least 300 ppi (ppi = pixels per inch). Suppose you want to create a 4x4-inch PNG file with 300 ppi:

ppi <- 300
png("png_plot.png", width = 4*ppi, height = 4*ppi, res = ppi)
ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point()
dev.off()

6.6 Axes, appearance

6.6.1 Swapping X- and Y-axes

library(ggpubr) # to use ggarrange

plot1 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

plot2 <- plot1 +
  coord_flip()

ggarrange(plot1, plot2)

6.6.2 Setting the range of a continuous axis

m_plot <- ggplot(marathon, aes(x = Half, y = Full)) +
  geom_point()

m_plot2 <- m_plot +
  xlim(0, max(marathon$Half)) +
  ylim(0, max(marathon$Full))

ggarrange(m_plot, m_plot2)  
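The limits above are wider than the data, so no observation is lost. When the limits are narrower than the data, xlim() and ylim() behave differently from zooming: values outside the limits are treated as missing, whereas coord_cartesian() only changes the visible region. The following sketch illustrates the difference with mtcars; the limits 15 and 25 and the object names are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# ylim() sets the scale limits: points outside 15-25 are dropped
p_clip <- p + ylim(15, 25)

# coord_cartesian() zooms in: all points are kept, some are out of view
p_zoom <- p + coord_cartesian(ylim = c(15, 25))

# Number of points with a usable y value in each plot
n_clip <- sum(!is.na(layer_data(p_clip)$y))
n_zoom <- sum(!is.na(layer_data(p_zoom)$y))
```

The distinction matters most for layers that compute summaries, such as boxplots or smoothing lines, because with ylim() the summaries are computed from the remaining data only.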

6.6.3 Changing the text of axis labels

hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

hw_plot +
  xlab("Age\n(years)") +
  ylab("Height in Inches")

6.6.4 Adding Title

library(ggpubr) # to use ggarrange
hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point() 
hw_plot1 <- hw_plot +
  ggtitle("Age and Height of Schoolchildren")
hw_plot2 <- hw_plot +
  ggtitle("Age and Height \nof Schoolchildren")
ggarrange(hw_plot1, hw_plot2)

6.6.5 Adding Subtitle

You can add a subtitle by providing a string as the second argument of ggtitle(). It will display with slightly smaller text than the main title.

hw_plot +
  ggtitle("Age and Height of Schoolchildren", 
          "11.5 to 17.5 years old")

6.6.6 Using Themes

# Grey theme (default theme)

hw_plot_grey <- hw_plot + 
  theme_grey()

hw_plot_classic <- hw_plot +
  theme_classic()

hw_plot_bw <- hw_plot +
  theme_bw()

hw_plot_minimal <- hw_plot +
  theme_minimal()

ggarrange(hw_plot_grey, hw_plot_classic, hw_plot_bw, hw_plot_minimal)

6.7 Summary

6.7.1 Bar charts

  1. examples of using pipe %>% together with ggplot
  2. create bar charts of counts
  3. create bar charts of values
  4. change “fill” and “outline” of the bars
  5. create grouped bar charts
  6. create stacked bar charts
  7. convert a variable into factor in ggplot
  8. use different colour palette
  9. control the width of the bars

6.7.2 Line graphs

  1. create line graphs
  2. label the graph
  3. change the range of y-axis
  4. create line graphs with multiple lines
  5. use multiple geoms (geometric objects) (e.g. adding the points on top of the lines)
  6. change shape, size, fill, outline of points
  7. change line type

6.7.3 Scatter plot

  1. create scatter plots
  2. visualize an additional discrete variable
  3. visualize an additional continuous variable
  4. visualize two additional discrete variables
  5. overplotting (use smaller points, make points semitransparent, bin data into rectangles, jitter the points)
  6. label points in a scatter plot

6.7.4 Summarizing data distributions

  1. create histograms (frequency and density)
  2. compare two histograms (facet_grid(), fill)
  3. create histograms with two additional discrete variables
  4. create kernel density estimates
  5. overlay a density curve with a histogram
  6. display kernel density estimates from grouped data (color, fill, facet_grid)

6.7.5 Saving your plots

  1. output to pdf vector files
  2. output to bitmap files

6.7.6 Axes, appearance

  1. swapping x- and y-axes
  2. setting the range of a continuous axis
  3. change the text of axis labels
  4. adding title
  5. adding subtitle
  6. using themes