Learning Objectives
Following this assignment students should be able to:
- install and load an R package
- understand the data manipulation functions of dplyr
- execute a simple import and analyze data scenario
Reading
Readings
Optional Resources:
Lecture Notes
Place this code at the start of the assignment to load all the required packages.
library(dplyr)
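library() only works for packages that are already installed. As a minimal sketch, assuming an internet connection and a configured CRAN mirror, the package can be installed once and then loaded like this:

```r
# Install dplyr only if it is not already available (one-time step).
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the package so its functions can be called without the dplyr:: prefix.
library(dplyr)
```

The requireNamespace() check is a convenience to avoid reinstalling on every run; calling install.packages("dplyr") once by hand works just as well.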
Exercises
Shrub Volume Aggregation (10 pts)
This is a follow-up to Shrub Volume Data Basics.
Dr. Granger wants some summary data of the plants at her sites and for her experiments. Check if the file shrub-volume-data.csv is already in your workspace (your instructor may have already added it). If not, download the shrub dimensions data.
This code calculates the average height of a plant at each site:
shrub_dims <- read.csv('shrub-volume-data.csv')
by_site <- group_by(shrub_dims, site)
avg_height <- summarize(by_site, avg_height = mean(height))
- Modify the code to calculate and print the average height of a plant in each experiment.
- Use max() to determine the maximum height of a plant at each site.
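The group-then-summarize pattern used above can be tried on a small made-up table first. The sketch below uses a hypothetical data frame called plants with invented columns plot and height (not the shrub data) to show how one group_by() can feed several summary values at once:

```r
library(dplyr)

# Hypothetical toy data, not the shrub data set.
plants <- data.frame(
  plot = c("A", "A", "B", "B", "B"),
  height = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Split the rows into one group per plot.
by_plot <- group_by(plants, plot)

# Collapse each group to a single row holding summary values.
height_summary <- summarize(
  by_plot,
  avg_height = mean(height),
  max_height = max(height)
)

print(height_summary)
```

Changing the column named in group_by() changes what the summary is calculated "per"; changing the function inside summarize() changes what is calculated.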
Shrub Volume Join (10 pts)
This is a follow-up to Shrub Volume Aggregation.
In addition to the main data table on shrub dimensions, Dr. Granger has two additional data tables. The first describes the manipulation for each experiment. The second provides information about the different sites. Check if the files shrub-volume-experiments.csv and shrub-volume-sites.csv are in your workspace (your instructor may have already added them). If not, download the experiments data and the sites data.
- Import the experiments data and then use inner_join to combine it with the shrub dimensions data to add a manipulation column to the shrub data.
- Import the sites data and then combine it with both the data on shrub dimensions and the data on experiments to produce a single data frame that contains all of the data.
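To see what inner_join() does before applying it to the shrub tables, here is a minimal sketch on two hypothetical toy tables that share a key column named id. All table and column names are invented for illustration:

```r
library(dplyr)

# Hypothetical toy tables that share a key column named `id`.
measurements <- data.frame(id = c(1, 2, 2, 3), value = c(10, 20, 25, 30))
labels <- data.frame(id = c(1, 2, 4), label = c("low", "high", "unused"))

# Keep only rows whose `id` appears in both tables and add the `label` column.
joined <- inner_join(measurements, labels, by = "id")

print(joined)
```

Rows with an id found in only one of the two tables (3 and 4 here) are dropped, which is the defining behavior of an inner join.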
Portal Data Aggregation (10 pts)
Download a copy of the Portal Teaching Database surveys table and load it into R using read.csv().
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID.
- Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID in each year.
- Use the filter(), group_by(), and summarize() functions to get the mean mass of species DO in each year.
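Counting rows per group and averaging a column that contains missing values are the two ideas these tasks rely on. The following sketch illustrates both on a hypothetical records table with invented columns kind, year, and mass; it is not the surveys table:

```r
library(dplyr)

# Hypothetical toy records; NA marks a missing measurement.
records <- data.frame(
  kind = c("x", "x", "y", "y", "y"),
  year = c(2000, 2001, 2000, 2000, 2001),
  mass = c(5, 7, NA, 12, 14)
)

# n() returns the number of rows in each group.
counts <- records %>%
  group_by(kind, year) %>%
  summarize(count = n()) %>%
  ungroup()

# Keep one kind, then average per year, skipping missing values.
mean_mass_y <- records %>%
  filter(kind == "y") %>%
  group_by(year) %>%
  summarize(mean_mass = mean(mass, na.rm = TRUE))
```

Without na.rm = TRUE, any group containing an NA would get NA as its mean.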
Fix the Code (15 pts)
This is a follow-up to Shrub Volume Aggregation. If you don’t already have the shrub volume data in your working directory, download it.
The following code is supposed to import the shrub volume data and calculate the average shrub volume for each site and, separately, for each experiment.
read.csv("shrub-volume-data.csv")
shrub_data %>%
  mutate(volume = length * width * height) %>%
  group_by(site) %>%
  summarize(mean_volume = max(volume))
shrub_data %>%
  mutate(volume = length * width * height)
  group_by(experiment) %>%
  summarize(mean_volume = mean(volume))
- Fix the errors in the code so that it does what it’s supposed to
- Add a comment to the top of the code explaining what it does
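As a reminder of how the pipe operator works, the sketch below writes the same calculation twice on a hypothetical boxes table (invented columns shelf, w, and h): once as nested function calls and once as a pipeline. It is meant only to illustrate that %>% passes the result on its left as the first argument of the function on its right:

```r
library(dplyr)

# Hypothetical toy data with made-up column names.
boxes <- data.frame(
  shelf = c("top", "top", "bottom"),
  w = c(1, 2, 3),
  h = c(2, 2, 1)
)

# Nested form: read from the inside out.
nested <- summarize(group_by(mutate(boxes, area = w * h), shelf), total = sum(area))

# Piped form: the same steps, read top to bottom.
piped <- boxes %>%
  mutate(area = w * h) %>%
  group_by(shelf) %>%
  summarize(total = sum(area))
```

Both forms produce the same table, so the pipeline can be checked step by step against the nested version.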
Portal Data Joins (15 pts)
If surveys.csv, species.csv, and plots.csv are not available in your workspace, download them. Load them into R using read.csv().
- Use inner_join() to create a table that contains the information from both the surveys table and the species table.
- Use inner_join() twice to create a table that contains the information from all three tables.
- Use inner_join() and filter() to get a data frame with the information from the surveys and plots tables where the plot_type is Control.
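Joining three tables is two joins applied one after the other. A minimal sketch on hypothetical toy tables (visits, animals, and places, with invented key columns animal_id and place_id) shows the chaining and a filter on a joined column:

```r
library(dplyr)

# Hypothetical toy tables: `visits` links to `animals` and `places` by key columns.
visits <- data.frame(animal_id = c("a", "b", "a"), place_id = c(1, 2, 2))
animals <- data.frame(animal_id = c("a", "b"), name = c("ant", "bee"))
places <- data.frame(place_id = c(1, 2), habitat = c("Forest", "Meadow"))

# Each inner_join() adds the columns of one more table.
combined <- visits %>%
  inner_join(animals, by = "animal_id") %>%
  inner_join(places, by = "place_id")

# filter() then keeps only rows that match a condition on a joined column.
meadow_visits <- filter(combined, habitat == "Meadow")
```

Naming the key explicitly with by = makes it clear which column links each pair of tables; the actual key columns in the Portal tables should be checked in the data rather than assumed from this sketch.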
Portal Data dplyr Review (20 pts)
If surveys.csv, species.csv, and plots.csv are not available in your workspace, download them. Load them into R using read.csv().
We want to do an analysis comparing the size of individuals on the Control plots to the Long-term Krat Exclosures. Create a data frame with the year, genus, species, weight and plot_type for all cases where the plot type is either Control or Long-term Krat Exclosure. Only include cases where Taxa is Rodent. Remove any records where the weight is missing.
Expected outputs for Portal Data dplyr Review: 1
Extracting vectors from data frames (10 pts)
Using the Portal data surveys table:
- Use $ to extract the weight column into a vector
- Use [] to extract the month column into a vector
- Extract the hindfoot_length column into a vector and calculate the mean hindfoot length ignoring missing (NA) values.
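The different ways of pulling a column out of a data frame behave slightly differently. The sketch below uses a hypothetical animals data frame with invented columns mass and month, and needs only base R:

```r
# Hypothetical toy data frame; NA marks a missing value.
animals <- data.frame(
  mass = c(10, NA, 30),
  month = c(1, 2, 3)
)

# $ extracts a single column as a vector.
mass_vec <- animals$mass

# [[ ]] also returns a vector; single brackets with a name return a data frame.
month_vec <- animals[["month"]]
month_df <- animals["month"]

# With row-and-column indexing, a single column is simplified to a vector.
month_vec2 <- animals[, "month"]

# na.rm = TRUE drops missing values before averaging.
mean_mass <- mean(mass_vec, na.rm = TRUE)
```

Note that the simplification in animals[, "month"] applies to plain data frames created with data.frame() or read.csv(); other table types may keep the result as a table.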
Building data frames from vectors (10 pts)
You have data on the length, width, and height of 10 individuals of the yew Taxus baccata stored in the following vectors:
length <- c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9)
width <- c(1.3, 2.2, 1.5, 4.5, 3.1, NA, 1.8, 0.5, 2.0, 2.7)
height <- c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2)
Make a data frame that contains these three vectors as columns along with a genus column containing the name Taxus on all rows and a species column containing the word baccata on all rows.
Expected outputs for Building data frames from vectors: 1
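When building a data frame, a single value supplied for a column is repeated to fill every row. A minimal base R sketch with hypothetical vectors leaf_length and leaf_width and an invented tree column illustrates this:

```r
# Hypothetical toy vectors of equal length.
leaf_length <- c(4.1, 3.8, 5.0)
leaf_width <- c(1.2, NA, 1.5)

# A length-one value is recycled to fill every row of its column.
leaves <- data.frame(
  tree = "oak",
  leaf_length = leaf_length,
  leaf_width = leaf_width
)
```

The vectors passed as columns must all have the same length (or length one), otherwise data.frame() stops with an error.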