Chapter 5 Creating Some Basic Plots
Reference: R graphics cookbook (https://r-graphics.org/)
The base R contains many basic methods for producing graphics. We will learn some of them in this chapter. These plotting functions are good for very quick exploration of data.
For more elegant plots, we will use the package
ggplot2
(next chapter).We will use some simple datasets in base R to illustrate how to create some basic plots in this chapter.
5.1 Scatter Plot
Let’s take a look at the mtcars
dataset. This dataset comes with base R.
head(mtcars) # this is a data frame
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars) # display the structure of the data frame
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
mpg
: miles/gallon
wt
: weight (1000lbs)
Scatter plot with base graphics
# x-axis: mtcars$wt
# y-axis: mtcars$mpg
plot(x = mtcars$wt, y = mtcars$mpg) # "x =", "y =" are optional
You can produce the same plot with plot(mtcars$wt, mtcars$mpg)
Scatter plot with base graphics (when you only have \(x\) and \(y\) but not a dataset)
5.2 Line Graph
The dataset pressure
(also in base R) contains the relation between temperature in degrees Celsius and vapor pressure of mercury in millimeters (of mercury).
Line graph with base graphics
# the only difference from a scatter plot is that we add type="l"
plot(pressure$temperature, pressure$pressure, type = "l") #l = line
Line graph with base graphics with points
plot(pressure$temperature, pressure$pressure, type = "l")
points(pressure$temperature, pressure$pressure) # add some points
Line graph with base graphics with another line and points (with color)
plot(pressure$temperature, pressure$pressure, type = "l")
points(pressure$temperature, pressure$pressure)
# the additional line may not have a physical meaningful
# just an illustration on how to add a line with base graphics
lines(pressure$temperature, pressure$pressure / 2, col = "red")
points(pressure$temperature, pressure$pressure / 2, col = "red")
Colors in R You can go to https://www.r-graph-gallery.com/ggplot2-color.html and read more about colors in R.
For example, you can specify the color by name, rgb, number and hex code.
plot(pressure$temperature, pressure$pressure, type = "l", col = rgb(0.1, 0.2, 0.5, 1))
lines(pressure$temperature, pressure$pressure / 2, type = "l", col = 2)
lines(pressure$temperature, pressure$pressure * 2, col = "#8B2813")
lines(pressure$temperature, pressure$pressure * 3, col = "cornflowerblue")
5.3 Bar Chart
Two types of bar charts:
- Bar chart of values. x-axis: discrete variable, y-axis: numeric data (not necessarily count data)
- Bar chart of count. x-axis: discrete variable, y-axis: count of cases in the discrete variable
Remark: for histogram: x-axis = continuous variable, y-axis = count of cases in the interval.
The BOD
data set has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality.
str(BOD)
## 'data.frame': 6 obs. of 2 variables:
## $ Time : num 1 2 3 4 5 7
## $ demand: num 8.3 10.3 19 16 15.6 19.8
## - attr(*, "reference")= chr "A1.4, p. 270"
Bar chart of values with base graphics
# names.arg = a vector of names to be plotted below each bar or group of bars.
barplot(BOD$demand, names.arg = BOD$Time)
Bar chart of counts with base graphics
In the dateset mtcars
, cyl
is the number of cylinders in the car. The possible values are \(4, 6\), and \(8\). We first find the count of each unique value in mtcars$cyl
:
To plot the bar chart, we use
5.4 Histogram
mpg
in mtcars
is the miles/gallon of the car. It is a continuous variable.
Histogram with base graphics
Histogram with base graphics
Remark: different bin widths will give you histograms with different looks.
5.5 Box Plot
A box plot (or box-and-whisker plot) display the dateset based on the five-number summary (minimum, maximum, sample median, first (\(Q_1\)) and third (\(Q_3\)) quartiles).
IQR = Interquartile range = \(Q_3 - Q_1\)
A box plot usually includes two parts, a box and a set of whiskers. The box is drawn from the first quartile to third quartile with a horizontal line drawn in the middle denoting the median. The boundaries of the whiskers is often based on the 1.5 IQR value. From above \(Q_3\), a distance of \(1.5\) times the IQR is measured out and a whisker is drawn up to the largest observed data point from the dataset that falls within this distance. Similar for the lower one. All other data points outside the buondary of the whiskers are plotted as outliers.
Let’s take a look at another dataset ToothGrowth
. In particular, supp
is a factor.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Box plot with basic graphics (using plot
)
Box plot with basic graphics (using boxplot
)
Box plot with basic graphics + interaction of two variables on x-axis (using boxplot
)
5.6 Plotting a function curve
Alternatively:
Plotting a built-in function
Plotting a self-defined function
Plotting a function with additional arguments
5.7 More on plots with base R
5.7.2 Type of Plot
Option | Type |
---|---|
type = "p" |
Points (default) |
type = "l" |
Lines connecting the data points |
type = "b" |
Points and non-overlapping lines |
type = "h" |
Height lines |
type = "o" |
Points and overlapping lines |
5.7.3 Parameters of a plot
Parameter | Meaning |
---|---|
type |
See Type of Plot |
main |
Title |
sub |
Subtitle |
xlab |
x-axis label |
ylab |
y-axis label |
xlim |
x-axis range |
ylim |
y-axis range |
pch |
Symbol of data points |
col |
Color of data points |
lty |
Type of the line |
To illustrate some of the components: