There are many ways to organize data. Some make analysis easy; others lead to a lot of frustration. This is where tidy data comes in. Tidy data is a concept from Hadley Wickham’s 2014 paper "Tidy Data".
In the framework of tidy data, every row is an observation, every column is a variable, and every cell of the data frame holds a single value. R for Data Science sums this up with the following graphic:
For data to be workable in this way, all of these features must line up. Consider the following datasets:
    #table1
    # A tibble: 6 × 4
          country  year  cases population
           <fctr> <int>  <int>      <int>
    1 Afghanistan  1999    745   19987071
    2 Afghanistan  2000   2666   20595360
    3      Brazil  1999  37737  172006362
    4      Brazil  2000  80488  174504898
    5       China  1999 212258 1272915272
    6       China  2000 213766 1280428583

    #table2
    # A tibble: 12 × 4
           country  year        key      value
            <fctr> <int>     <fctr>      <int>
     1 Afghanistan  1999      cases        745
     2 Afghanistan  1999 population   19987071
     3 Afghanistan  2000      cases       2666
     4 Afghanistan  2000 population   20595360
     5      Brazil  1999      cases      37737
     6      Brazil  1999 population  172006362
     7      Brazil  2000      cases      80488
     8      Brazil  2000 population  174504898
     9       China  1999      cases     212258
    10       China  1999 population 1272915272
    11       China  2000      cases     213766
    12       China  2000 population 1280428583

    #table3
    # A tibble: 6 × 3
          country  year              rate
           <fctr> <int>             <chr>
    1 Afghanistan  1999      745/19987071
    2 Afghanistan  2000     2666/20595360
    3      Brazil  1999   37737/172006362
    4      Brazil  2000   80488/174504898
    5       China  1999 212258/1272915272
    6       China  2000 213766/1280428583
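Of the tables above, only table1 is actually tidy: table2 spreads each observation across two rows, and table3 packs two variables (cases and population) into a single column. We will consider how to create tidy data from the other two, as well as from some other examples, as we move through this unit.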
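To see what the tidy layout buys us, here is a minimal base R sketch that rebuilds table1 by hand and works with it directly. The column name `rate_per_10k` and the variable `cases_by_year` are illustrative names chosen for this example, not something defined in the unit.

```r
# Rebuild table1 by hand so the example needs no packages.
table1 <- data.frame(
  country    = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year       = rep(c(1999L, 2000L), times = 3),
  cases      = c(745, 2666, 37737, 80488, 212258, 213766),
  population = c(19987071, 20595360, 172006362, 174504898,
                 1272915272, 1280428583),
  stringsAsFactors = FALSE
)

# Each variable is its own column, so a rate is one vectorized step.
# (rate_per_10k is an illustrative name.)
table1$rate_per_10k <- table1$cases / table1$population * 10000

# Each row is one observation, so totals per year are a single grouped sum.
cases_by_year <- aggregate(cases ~ year, data = table1, FUN = sum)

print(table1)
print(cases_by_year)
```

Doing the same with table2 or table3 would first require picking rows apart or splitting strings, which is the kind of friction tidy data is meant to avoid.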
To get the data set ready we will use the `tidyr` package. To then transform and work with the data so that we can model and graph it, we will use the `dplyr` package. Both are part of the `tidyverse`.
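As a rough preview of how the two packages divide the work, the sketch below rebuilds table2 and table3 by hand, reshapes each into the tidy layout of table1 with `tidyr`, and then adds a computed column with `dplyr`. It assumes `tidyr` 1.0.0 or later (for `pivot_wider()`) and `dplyr` are installed. The choice of `pivot_wider()` and `separate()` is just one possible way to do this and is not necessarily the set of functions this unit focuses on; `rate_per_10k` is an illustrative name.

```r
library(tidyr)
library(dplyr)

# Hand-built copies of table2 and table3 from the printouts above.
table2 <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 4),
  year    = rep(rep(c(1999L, 2000L), each = 2), times = 3),
  key     = rep(c("cases", "population"), times = 6),
  value   = c(745, 19987071, 2666, 20595360,
              37737, 172006362, 80488, 174504898,
              212258, 1272915272, 213766, 1280428583),
  stringsAsFactors = FALSE
)

table3 <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year    = rep(c(1999L, 2000L), times = 3),
  rate    = c("745/19987071", "2666/20595360", "37737/172006362",
              "80488/174504898", "212258/1272915272", "213766/1280428583"),
  stringsAsFactors = FALSE
)

# table2: each observation spans two rows, so move the entries of `key`
# into their own columns, filled from `value`.
tidy2 <- pivot_wider(table2, names_from = key, values_from = value)

# table3: one column holds two variables, so split it at the "/".
# convert = TRUE turns the resulting strings into numbers.
tidy3 <- separate(table3, rate, into = c("cases", "population"),
                  sep = "/", convert = TRUE)

# With tidy data in hand, dplyr handles the transformation step.
with_rate <- mutate(tidy2, rate_per_10k = cases / population * 10000)

print(tidy2)
print(tidy3)
print(with_rate)
```

Both `tidy2` and `tidy3` end up with the same four columns as table1: country, year, cases, and population.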
In the `tidyr` package we will focus on the following 4 functions:
In order to learn R you must do R. Follow the steps below in your RStudio console:
You will be prompted to choose a course. Type the number shown in front of 02 Getting Data. This takes you to a menu of lessons. For now we will use only lesson 6: type 6 to choose Looking at Data, then follow the instructions until you are finished.
Once you are finished with the lesson, come back to this course and continue.