We will begin our journey into statistical graphics with the package ggplot2. This is another package by Hadley Wickham and is part of the tidyverse. This means we can use chaining to build our graphics, piping data in with %>% and adding layers with +. After going through this material, if you would like further information, please check out the following books:
A good place to start might be with what ggplot2 cannot do. From here we will introduce what it can do.
For this section of the course we will consider the New York City Flights 2013 data, available in the nycflights13 package. This dataset contains information on all flights that departed from the NYC airports (EWR, JFK and LGA) in 2013. The variables in this dataset are:
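The variables can also be listed directly in R. The following is a minimal sketch, assuming the nycflights13 package is installed; it only inspects the flights table and does not change it.

```r
library(nycflights13)

# Number of rows (flights) and columns (variables) in the full table
dim(flights)

# Names of the variables
names(flights)

# Type and first few values of each variable
str(flights)
```

Both distance and dep_delay, which are used in the scatter plot later in this section, appear among these variables.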
As we start with ggplot2 it is important to understand its structure. The base graphics built into R require many different functions, and each of them seems to have its own conventions for how it is used. ggplot2 is more consistent, and the more you learn about it, the more impressive the graphics you can create. We will get started with the components of every ggplot2 plot.
For example, we will create a simple scatter plot of distance by departure delay:
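    library(dplyr)
    library(ggplot2)
    library(nycflights13)

    # Take a random 1% sample of the flights data
    data = flights %>% sample_frac(.01)

    # Map distance to x and departure delay to y, then add a layer of points
    ggplot(data, aes(x = distance, y = dep_delay)) +
      geom_point()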
The code first takes a random 1% sample of the flights data. The original data has 336,776 flights, and it is hard to visualize that much data with any clarity, so we work with a sample instead. Next, the aesthetic mapping puts distance on the x-axis and departure delay on the y-axis. Finally, we add a layer of points. This leads to the following graph:
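To make the separate pieces more visible, the same plot can be built in two steps. This is only a sketch: the names flights_sample, p and p_points are made up for illustration, and looking inside the plot object with $ relies on how ggplot2 happens to store plots rather than on anything you need for everyday use.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# A 1% random sample, as in the example above
flights_sample <- flights %>% sample_frac(0.01)

# Data and aesthetic mapping only: printing p would draw empty axes
p <- ggplot(flights_sample, aes(x = distance, y = dep_delay))

# Adding a layer with + gives the scatter plot
p_points <- p + geom_point()

# Inspect the pieces stored in the plot object
p_points$mapping         # the aesthetic mapping (x and y)
length(p$layers)         # layers before geom_point() was added
length(p_points$layers)  # layers after geom_point() was added
```

Printing p_points draws the scatter plot. Flights with a missing departure delay cannot be placed on the plot, so ggplot2 drops them and reports this in a warning.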
As we proceed through this section we will build our graphs using the following pattern:
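One common way to write such a pattern is sketched below. It is an assumption about the general shape (data, then aesthetic mapping, then one or more layers) rather than the exact template used in the course. The histogram and the name delay_hist are chosen purely for illustration.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# General shape (placeholders, not runnable as written):
#   <data> %>%
#     ggplot(aes(<mappings>)) +
#     geom_<type>()

# One concrete instance: a histogram of departure delays on a 1% sample
delay_hist <- flights %>%
  sample_frac(0.01) %>%
  ggplot(aes(x = dep_delay)) +
  geom_histogram(bins = 30)

delay_hist
```

The data is passed along with %>% until it reaches ggplot(), and from that point on the plot is extended with +.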