Data in Tables · Data Science with R for the Life Sciences

Learning Objectives

Following this assignment students should be able to:

install and load an R package

understand the data manipulation functions of dplyr

carry out a simple data import and analysis workflow

Readings

R for Data Science - Data transformation 5.1-5.6

Optional Resources:

Lecture Notes

Place this code at the start of the assignment to load all the required packages.

library(dplyr)
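
If dplyr is not yet installed, library(dplyr) will fail. The following is a minimal sketch of one common way to install a package only when it is missing and then load it. The CRAN mirror URL is an assumption; any mirror should work.

```r
# Install dplyr only if it is not already available.
# The mirror below is an assumed choice; substitute your preferred CRAN mirror.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Load the package so its functions can be called directly.
library(dplyr)
```

Installation is needed once per computer, while library() must be run in every new R session.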

Exercises

Shrub Volume Data Basics (20 pts)

Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. She has conducted an experiment looking at the effect of three different treatments on shrub volume at four different locations. She has placed the data file on the web for you to download:
- Shrub dimensions data
Download the file into your data folder, import it using read.csv() to get familiar with the data, and then use dplyr to complete the following tasks.
1. Select the data from the length column and print it out (using select).
2. Select the data from the site and experiment columns and print it out (using select).
3. Add a new column named area containing the area of the shrub, which is the length times the width (using mutate).
4. Sort the data by length (using arrange).
5. Filter the data to include only plants with heights greater than 5 (using filter).
6. Filter the data to include only plants with heights greater than 4 and widths greater than 2 (using , or & to include two conditions).
7. Filter the data to include only plants from Experiment 1 or Experiment 3 (using | for “or”).
8. Filter the data to remove rows with missing (NA) values in the height column (using !is.na()).
9. Create a new data frame called shrub_volumes that includes all of the original data and a new column containing the volumes (length * width * height), and display it.
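
As a rough illustration of the dplyr functions named in the tasks above, here is a sketch on a small made-up table. The data frame plants and its columns are hypothetical and are not the shrub data set.

```r
library(dplyr)

# Hypothetical toy data, not the shrub data set from the exercise
plants <- data.frame(
  plot = c(1, 1, 2, 2),
  length = c(2.0, 3.5, 1.2, 4.1),
  width = c(1.0, 2.0, 0.5, NA)
)

# select() keeps only the named columns
plot_lengths <- select(plants, plot, length)

# mutate() adds a column computed from existing columns
with_area <- mutate(plants, area = length * width)

# arrange() sorts rows, ascending by default
sorted <- arrange(plants, length)

# filter() keeps rows where the condition is TRUE;
# a comma (or &) requires both conditions to hold
long_and_wide <- filter(plants, length > 2, width > 1)

# | means "or": rows matching either condition are kept
plot_1_or_long <- filter(plants, plot == 1 | length > 4)

# !is.na() drops rows with a missing value in the given column
complete <- filter(plants, !is.na(width))

print(with_area)
```

Each function takes a data frame as its first argument and returns a new data frame, so the original plants table is never modified.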
Code Shuffle (20 pts)

We are interested in understanding the monthly variation in precipitation in Gainesville, FL. We’ll use some data from the NOAA National Climatic Data Center.

Each row of the data file is a year (from 1961 to 2013) and each column is a month (January to December).

Rearrange the following program so that it:
- Imports the data
- Calculates the average precipitation in each month across years
- Plots the monthly averages as a simple line plot
Finally, add a comment above the code that describes what it does. The comment character in R is #.

It’s OK if you don’t know exactly how the details of the program work at this point. You just need to figure out the right order of the lines based on when variables are defined and when they are used.
```
plot(monthly_mean_ppt, type = "l", xlab = "Month", ylab = "Mean Precipitation")
monthly_mean_ppt <- colMeans(ppt_data)
ppt_data <- read.csv("https://datacarpentry.org/semester-biology/data/gainesville-precip.csv", header = FALSE)
```
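
To get a feel for what colMeans() does, the sketch below applies it to a tiny made-up table. The data frame rain and its values are hypothetical and are not the Gainesville data.

```r
# Hypothetical toy table: 3 rows (years) by 4 columns (periods)
rain <- data.frame(
  V1 = c(1, 2, 3),
  V2 = c(4, 4, 4),
  V3 = c(0, 5, 10),
  V4 = c(2, 2, 8)
)

# colMeans() averages down each column, returning one value per column.
# Note that rain must be defined before this line can use it.
period_means <- colMeans(rain)

print(period_means)
```

The result is a named numeric vector with one mean per column, here 2, 4, 5, and 4.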
Bird Banding (20 pts)

The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following vector. Counts are entered in site order and sites are numbered starting at one, so position 1 of the vector holds the count for site 1. Copy and paste the vector into your assignment and then answer the following questions by using code and printing the result to the screen. Some R functions that will come in handy include length(), max(), min(), sum(), and mean(), along with [] for indexing.
```
number_of_birds <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 
36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32, 
900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92, 
83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
102, 273, 600, 10, 11)
```
1. How many sites are there?
2. How many birds were counted at site 42?
3. What is the total number of birds counted across all of the sites?
4. What is the smallest number of birds counted?
5. What is the largest number of birds counted?
6. What is the average number of birds seen at a site?
7. How many birds were counted at the last site? Have the computer choose the last site automatically in some way, not by manually entering its position. Do you know a function that will give you the position of the last value? (Since positions start at 1, the position of the last value in a vector is the same as the vector's length.)
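
The functions mentioned above can be tried out on a short made-up vector first. This is a minimal sketch; the vector counts and its values are hypothetical and are not the bird banding data.

```r
# Hypothetical toy vector, not the bird counts from the exercise
counts <- c(4, 0, 17, 9, 23)

n_values <- length(counts)      # number of values: 5
third <- counts[3]              # value at position 3: 17
total <- sum(counts)            # 53
smallest <- min(counts)         # 0
largest <- max(counts)          # 23
average <- mean(counts)         # 10.6

# Positions start at 1, so the last position equals the vector's length
last <- counts[length(counts)]  # 23

print(last)
```

Indexing with length() keeps working if values are later added to or removed from the vector, which a hard-coded position would not.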
Portal Data Manipulation (20 pts)

Download a copy of the Portal Teaching Database surveys table and load it into R using read.csv(). (If you are using RStudio Cloud in a class, it may already have been added to your workspace.)

Do not use pipes for this exercise.
1. Use select() to create a new data frame with just the year, month, day, and species_id columns in that order.
2. Use mutate(), select(), and filter() with !is.na() to create a new data frame with the year, species_id, and weight in kilograms of each individual, with no null weights. The weight in the table is given in grams so you will need to create a new column for weight in kilograms by dividing the weight column by 1000.
3. Use the filter() function to get all of the rows in the data frame for the species ID SH.
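
Without pipes, each step stores its result in an intermediate variable that the next step takes as input. The sketch below shows that pattern on a small made-up table. The data frame animals and its columns are hypothetical and are not the Portal surveys table.

```r
library(dplyr)

# Hypothetical toy table, not the Portal surveys data
animals <- data.frame(
  id = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  mass_g = c(1500, NA, 250, 40)
)

# Step 1: add a column converting grams to kilograms
animals_kg <- mutate(animals, mass_kg = mass_g / 1000)

# Step 2: keep only the columns of interest, in the order listed
animals_small <- select(animals_kg, id, mass_kg)

# Step 3: drop rows where the new column is missing
animals_clean <- filter(animals_small, !is.na(mass_kg))

# A single filter() on a text column uses == with a quoted value
group_x <- filter(animals, group == "x")

print(animals_clean)
```

The order of steps matters: mass_kg has to be created by mutate() before select() or filter() can refer to it.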
Portal Data Manipulation Pipes (20 pts)

Download a copy of the Portal Teaching Database surveys table and load it into R using read.csv().

Use pipes (%>%) to combine the following operations to manipulate the data.
1. Use mutate(), select(), and filter() with !is.na() to create a new data frame with the year, species_id, and weight in kilograms of each individual, with no null weights.
2. Use filter() and select() to get the year, month, day, and species_id columns for all of the rows in the data frame where species_id is SH.
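
The pipe passes the result on its left as the first argument of the function on its right, so a chain of steps needs no intermediate variables. Here is a minimal sketch on a small made-up table. The data frame animals and its columns are hypothetical and are not the Portal surveys table.

```r
library(dplyr)

# Hypothetical toy table, not the Portal surveys data
animals <- data.frame(
  id = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  mass_g = c(1500, NA, 250, 40)
)

# Read the chain top to bottom: take animals, then mutate, then select, then filter
animals_clean <- animals %>%
  mutate(mass_kg = mass_g / 1000) %>%
  select(id, mass_kg) %>%
  filter(!is.na(mass_kg))

# Filter rows first, then keep a subset of columns
group_x_ids <- animals %>%
  filter(group == "x") %>%
  select(id, group)

print(animals_clean)
```

Because the data frame arrives through the pipe, it is omitted from each function call inside the chain.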
Portal Data Challenge (optional)

Develop a data manipulation pipeline for the Portal surveys table that produces a table of data for only the three Dipodomys species (DM, DO, DS). The species IDs should be presented as lower case, not upper case. The table should contain information on the date, the species ID, the weight and hindfoot length. The data should not include null values for either weight or hindfoot length. The table should be sorted first by the species (so that each species is grouped together) and then by weight, with the largest weights at the top.
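
A few building blocks that may help with a pipeline like this are %in% for matching several values, tolower() for changing case, and desc() inside arrange() for descending order. The sketch below shows them on a small made-up table. The data frame fruit and its columns are hypothetical and are not the Portal surveys table.

```r
library(dplyr)

# Hypothetical toy table, not the Portal surveys data
fruit <- data.frame(
  kind = c("AP", "BA", "AP", "CH", "BA"),
  mass = c(150, 120, NA, 8, 130)
)

result <- fruit %>%
  filter(kind %in% c("AP", "BA")) %>%  # keep rows whose kind is in the set
  filter(!is.na(mass)) %>%             # drop rows with missing mass
  mutate(kind = tolower(kind)) %>%     # overwrite kind with a lower-case copy
  arrange(kind, desc(mass))            # sort by kind, then largest mass first

print(result)
```

Listing several columns in arrange() sorts by the first and breaks ties with the next, which is how grouped, descending output is produced here.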

Assignment submission & checklist