#possum.csv needs to be saved in the same directory as this file.
#possum <- read.csv("possum.csv")
Notes B
Wrangling data with dplyr
: filter
, select
, arrange
Importing Data
In this class, we are going to be working with the possum
dataset which consists of nine morphometric measurements on each of 104 mountain brushtail possums, trapped at seven Australian sites from Southern Victoria to central Queensland.
Source: Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995. Morphological variation among columns of the mountain brushtail possum, Trichosurus caninus Ogilby (Phalangeridae: Marsupiala). Australian Journal of Zoology 43: 449-458.
What is a .csv file?
How do we import it into R?
Alternatively, you can download it directly from the DAAD
package.
#You need to install the DAAG package first. Then you can call it using the code below.
library(DAAG)
data(possum)
Let’s take a look at this data for a minute to see what information has been recorded.
This data frame contains the following columns:
case
observation number site
one of seven locations where possums were trapped. The sites were, in order,Cambarville, Bellbird, Whian Whian, Byrangery, Conondale, Allyn River and Bulburin Pop
a factor which classifies the sites as Vic Victoria, other New South Wales or Queensland sex
a factor with levels f female, m male age
age hdlngth
head length, in cm skullw
skull width, in cm totlngth
total length, in cm taill
tail length, in cm footlgth
foot length, in cm earconch
ear conch length, in cm eye
distance from medial canthus to lateral canthus of right eye, in cm chest
chest girth, in cm belly
belly girth, in cm
Installing and Using Packages
Sometimes everything we need (data, functions, etc) are not available in base R. In R, expert users will package up useful things like data and functions into packages that be download and used.
First, you need to download the package from the right hand menu –> You only need to do this once.
In each new .Rmd document, you need to call any packages you want to use but adding the code library(packagename)
inside an R chunk.
For example, in this class we will use the tidyverse
package a lot.
library(tidyverse)
dplyr
There are actually many commonly used packages wrapped up inside one tidyverse
package.
Today we are specifically going to be talking about the package dplyr
which is useful to manipulating data sets.
filter
We can use the filter
function to extract rows from the data that have a particular characteristic.
For example, we may be interested in only looking at female possums.
#start with the possums dataset, the pipe "%" means apply the action on the following line to the previous line. In this case, pick out only the rows were the sex variable is "f"
# %>% == and then do this
%>%
possum filter(sex == "f")
case site Pop sex age hdlngth skullw totlngth taill footlgth earconch
C5 2 1 Vic f 6 92.5 57.6 91.5 36.5 72.5 51.2
C10 3 1 Vic f 6 94.0 60.0 95.5 39.0 75.4 51.9
C15 4 1 Vic f 6 93.2 57.1 92.0 38.0 76.1 52.2
C23 5 1 Vic f 2 91.5 56.3 85.5 36.0 71.0 53.2
C24 6 1 Vic f 1 93.1 54.8 90.5 35.5 73.2 53.6
C27 8 1 Vic f 6 94.8 57.6 91.0 37.0 72.7 53.9
C28 9 1 Vic f 9 93.4 56.3 91.5 37.0 72.4 52.9
C31 10 1 Vic f 6 91.8 58.0 89.5 37.5 70.9 53.4
C32 11 1 Vic f 9 93.3 57.2 89.5 39.0 77.2 51.3
C34 12 1 Vic f 5 94.9 55.6 92.0 35.5 71.7 51.0
C45 17 1 Vic f 1 94.7 67.7 89.5 36.5 73.2 53.2
C48 19 1 Vic f 5 94.4 55.4 90.5 35.0 73.4 53.9
C50 20 1 Vic f 4 94.8 56.3 89.0 38.0 73.8 52.4
C54 21 1 Vic f 3 95.9 58.1 96.5 39.5 77.9 52.9
C58 23 1 Vic f 4 92.5 56.1 89.0 36.0 72.8 53.3
C63 27 1 Vic f 2 90.5 54.5 85.0 35.0 70.3 50.8
A1 29 1 Vic f 3 92.8 56.0 88.0 35.0 74.9 51.8
A2 30 1 Vic f 2 92.1 54.4 84.0 33.5 70.6 50.8
A4 32 1 Vic f 4 94.3 56.7 94.0 39.0 74.8 52.0
BB17 37 2 Vic f 2 89.3 54.8 82.5 35.0 71.2 52.0
BB31 39 2 Vic f 1 84.7 51.5 75.0 34.0 68.7 53.4
BB33 40 2 Vic f 3 91.0 55.0 84.5 36.0 72.8 51.4
BB36 41 2 Vic f 5 88.4 57.0 83.0 36.5 NA 40.3
BB40 43 2 Vic f 2 90.0 55.5 81.0 32.0 72.0 49.4
WW4 50 3 other f 5 91.6 56.4 88.0 38.0 65.0 47.2
WW5 51 3 other f 5 95.6 59.6 85.0 36.0 64.0 43.9
WW7 53 3 other f 3 93.1 58.1 91.0 38.0 67.4 46.0
BR4 57 4 other f 4 95.1 59.4 93.0 41.0 67.2 45.3
BR7 60 4 other f 2 91.3 57.7 88.0 39.0 63.1 47.0
CD2 62 5 other f 3 91.3 58.0 90.5 39.0 65.5 41.3
CD3 63 5 other f 6 92.0 56.4 88.5 38.0 64.1 46.3
CD4 64 5 other f 3 96.9 56.5 89.5 38.5 63.0 45.1
CD5 65 5 other f 5 93.5 57.4 88.5 38.0 68.2 41.7
CD6 66 5 other f 3 90.4 55.8 86.0 36.5 63.2 44.2
CD10 70 5 other f 7 91.9 56.4 87.0 38.0 65.4 44.1
BSF1 74 6 other f 4 88.7 52.0 83.0 38.0 61.5 45.9
BSF9 82 6 other f 4 86.0 54.0 82.0 36.5 60.7 42.9
BSF10 83 6 other f 3 90.0 53.8 81.5 36.0 62.0 43.3
BSF13 86 6 other f 3 88.2 53.2 86.5 38.5 60.3 43.7
BTP3 88 7 other f 2 89.6 58.0 87.5 38.0 66.7 43.5
BTP15 99 7 other f 3 93.3 56.2 86.5 38.5 64.8 43.8
BTP19 102 7 other f 6 92.4 55.0 89.0 38.0 63.5 45.4
BTP21 104 7 other f 3 93.6 59.9 89.0 40.0 67.6 46.0
eye chest belly
C5 16.0 28.5 33.0
C10 15.5 30.0 34.0
C15 15.2 28.0 34.0
C23 15.1 28.5 33.0
C24 14.2 30.0 32.0
C27 14.5 29.0 34.0
C28 15.5 28.0 33.0
C31 14.4 27.5 32.0
C32 14.9 31.0 34.0
C34 15.3 28.0 33.0
C45 14.7 29.0 31.0
C48 15.2 28.0 32.0
C50 15.5 27.0 36.0
C54 14.2 30.0 40.0
C58 15.4 28.0 35.0
C63 14.2 23.0 28.0
A1 14.0 24.0 32.0
A2 14.5 24.5 33.0
A4 14.9 28.0 34.0
BB17 13.6 28.0 31.5
BB31 13.0 25.0 25.0
BB33 13.6 27.0 30.0
BB36 15.9 27.0 30.5
BB40 13.4 29.0 31.0
WW4 14.9 28.0 36.0
WW5 17.4 28.0 38.5
WW7 16.5 26.0 33.5
BR4 14.5 31.0 39.0
BR7 14.4 26.0 30.0
CD2 16.0 27.0 32.0
CD3 15.2 25.5 28.5
CD4 17.1 25.5 33.0
CD5 14.0 29.0 38.5
CD6 15.7 26.5 34.0
CD10 13.0 27.0 34.0
BSF1 14.7 26.0 34.0
BSF9 15.4 26.0 32.0
BSF10 14.0 25.0 29.0
BSF13 13.6 26.0 31.0
BTP3 16.0 25.5 31.5
BTP15 14.0 28.0 35.0
BTP19 13.0 25.0 30.0
BTP21 14.8 28.5 33.5
##note the sex is text/categorical and so quotation marks are needed.
##R doesn't care about whether they are double quotation marks (") or single ('). They work the same.
# If we don't assign it to an object, then it just prints out for us to see!
#oftentimes, we want to take our subset and give it a new name. This takes our subset and assigns it to a new dataset called `possum_f`.
<- possum %>%
possum_f filter(sex == "f")
#Notice if you assign it to an object that it doesn't print out the contents.
# You'll see the new object in your environment on the top right --->
# If you click on the word `possum_f` (not the blue play button) it will open the object so you can see what is saved inside.
It can also be used with numeric criteria.
Suppose we want a list of all the possums whose total length is less than 80cm.
%>%
possum filter(totlngth < 80)
case site Pop sex age hdlngth skullw totlngth taill footlgth earconch eye
BB31 39 2 Vic f 1 84.7 51.5 75 34.0 68.7 53.4 13.0
BB38 42 2 Vic m 3 85.3 54.1 77 32.0 62.7 51.2 13.8
BB41 44 2 Vic m NA 85.1 51.5 76 35.5 70.3 52.6 14.4
chest belly
BB31 25.0 25
BB38 25.5 33
BB41 23.0 27
The logical operators are given below:
select
select is used to extract only certain columns. For example, perhaps we only want to print out a dataset with only the population, the sex, and the total length of each possum.
# %>%
#MAC (CMD + Shift + M)
#PC (Ctrl + Shift + M)
%>%
possum select(Pop, sex, totlngth)
Pop sex totlngth
C3 Vic m 89.0
C5 Vic f 91.5
C10 Vic f 95.5
C15 Vic f 92.0
C23 Vic f 85.5
C24 Vic f 90.5
C26 Vic m 89.5
C27 Vic f 91.0
C28 Vic f 91.5
C31 Vic f 89.5
C32 Vic f 89.5
C34 Vic f 92.0
C36 Vic m 89.5
C37 Vic m 91.5
C39 Vic m 85.5
C40 Vic m 86.0
C45 Vic f 89.5
C47 Vic m 90.0
C48 Vic f 90.5
C50 Vic f 89.0
C54 Vic f 96.5
C55 Vic m 91.0
C58 Vic f 89.0
C59 Vic m 84.0
C60 Vic m 91.5
C61 Vic m 90.0
C63 Vic f 85.0
C64 Vic m 87.0
A1 Vic f 88.0
A2 Vic f 84.0
A3 Vic m 93.0
A4 Vic f 94.0
AD1 Vic m 89.0
BB4 Vic m 85.5
BB13 Vic m 85.0
BB15 Vic m 88.0
BB17 Vic f 82.5
BB25 Vic m 80.5
BB31 Vic f 75.0
BB33 Vic f 84.5
BB36 Vic f 83.0
BB38 Vic m 77.0
BB40 Vic f 81.0
BB41 Vic m 76.0
BB44 Vic m 81.0
BB45 Vic m 84.0
WW1 other m 89.0
WW2 other m 85.0
WW3 other m 85.0
WW4 other f 88.0
WW5 other f 85.0
WW6 other m 93.5
WW7 other f 91.0
BR1 other m 91.5
BR2 other m 92.5
BR3 other m 93.7
BR4 other f 93.0
BR5 other m 91.0
BR6 other m 96.0
BR7 other f 88.0
CD1 other m 86.0
CD2 other f 90.5
CD3 other f 88.5
CD4 other f 89.5
CD5 other f 88.5
CD6 other f 86.0
CD7 other m 85.0
CD8 other m 88.5
CD9 other m 88.0
CD10 other f 87.0
CD11 other m 90.0
CD12 other m 80.5
CD13 other m 82.0
BSF1 other f 83.0
BSF2 other m 89.0
BSF3 other m 89.0
BSF4 other m 84.0
BSF5 other m 81.0
BSF6 other m 81.0
BSF7 other m 84.0
BSF8 other m 85.5
BSF9 other f 82.0
BSF10 other f 81.5
BSF11 other m 80.5
BSF12 other m 92.0
BSF13 other f 86.5
BTP1 other m 93.0
BTP3 other f 87.5
BTP4 other m 84.5
BTP5 other m 85.0
BTP6 other m 89.0
BTP7 other m 85.0
BTP8 other m 82.0
BTP9 other m 84.0
BTP10 other m 88.5
BTP12 other m 83.0
BTP13 other m 86.0
BTP14 other m 84.0
BTP15 other f 86.5
BTP16 other m 81.5
BTP17 other m 82.5
BTP19 other f 89.0
BTP20 other m 82.5
BTP21 other f 89.0
We can combine criteria together as well in one command with multiple pipes:
%>%
possum filter(sex == "f") %>%
filter(totlngth < 80) %>%
select(Pop, sex, age, totlngth)
Pop sex age totlngth
BB31 Vic f 1 75
arrange
The arrange
function allows us to order the rows of the data frame by the values of a particular column.
For example, arrange all the female possums by the longest length to the shortest total length
%>%
possum_f arrange(-totlngth)
case site Pop sex age hdlngth skullw totlngth taill footlgth earconch
C54 21 1 Vic f 3 95.9 58.1 96.5 39.5 77.9 52.9
C10 3 1 Vic f 6 94.0 60.0 95.5 39.0 75.4 51.9
A4 32 1 Vic f 4 94.3 56.7 94.0 39.0 74.8 52.0
BR4 57 4 other f 4 95.1 59.4 93.0 41.0 67.2 45.3
C15 4 1 Vic f 6 93.2 57.1 92.0 38.0 76.1 52.2
C34 12 1 Vic f 5 94.9 55.6 92.0 35.5 71.7 51.0
C5 2 1 Vic f 6 92.5 57.6 91.5 36.5 72.5 51.2
C28 9 1 Vic f 9 93.4 56.3 91.5 37.0 72.4 52.9
C27 8 1 Vic f 6 94.8 57.6 91.0 37.0 72.7 53.9
WW7 53 3 other f 3 93.1 58.1 91.0 38.0 67.4 46.0
C24 6 1 Vic f 1 93.1 54.8 90.5 35.5 73.2 53.6
C48 19 1 Vic f 5 94.4 55.4 90.5 35.0 73.4 53.9
CD2 62 5 other f 3 91.3 58.0 90.5 39.0 65.5 41.3
C31 10 1 Vic f 6 91.8 58.0 89.5 37.5 70.9 53.4
C32 11 1 Vic f 9 93.3 57.2 89.5 39.0 77.2 51.3
C45 17 1 Vic f 1 94.7 67.7 89.5 36.5 73.2 53.2
CD4 64 5 other f 3 96.9 56.5 89.5 38.5 63.0 45.1
C50 20 1 Vic f 4 94.8 56.3 89.0 38.0 73.8 52.4
C58 23 1 Vic f 4 92.5 56.1 89.0 36.0 72.8 53.3
BTP19 102 7 other f 6 92.4 55.0 89.0 38.0 63.5 45.4
BTP21 104 7 other f 3 93.6 59.9 89.0 40.0 67.6 46.0
CD3 63 5 other f 6 92.0 56.4 88.5 38.0 64.1 46.3
CD5 65 5 other f 5 93.5 57.4 88.5 38.0 68.2 41.7
A1 29 1 Vic f 3 92.8 56.0 88.0 35.0 74.9 51.8
WW4 50 3 other f 5 91.6 56.4 88.0 38.0 65.0 47.2
BR7 60 4 other f 2 91.3 57.7 88.0 39.0 63.1 47.0
BTP3 88 7 other f 2 89.6 58.0 87.5 38.0 66.7 43.5
CD10 70 5 other f 7 91.9 56.4 87.0 38.0 65.4 44.1
BSF13 86 6 other f 3 88.2 53.2 86.5 38.5 60.3 43.7
BTP15 99 7 other f 3 93.3 56.2 86.5 38.5 64.8 43.8
CD6 66 5 other f 3 90.4 55.8 86.0 36.5 63.2 44.2
C23 5 1 Vic f 2 91.5 56.3 85.5 36.0 71.0 53.2
C63 27 1 Vic f 2 90.5 54.5 85.0 35.0 70.3 50.8
WW5 51 3 other f 5 95.6 59.6 85.0 36.0 64.0 43.9
BB33 40 2 Vic f 3 91.0 55.0 84.5 36.0 72.8 51.4
A2 30 1 Vic f 2 92.1 54.4 84.0 33.5 70.6 50.8
BB36 41 2 Vic f 5 88.4 57.0 83.0 36.5 NA 40.3
BSF1 74 6 other f 4 88.7 52.0 83.0 38.0 61.5 45.9
BB17 37 2 Vic f 2 89.3 54.8 82.5 35.0 71.2 52.0
BSF9 82 6 other f 4 86.0 54.0 82.0 36.5 60.7 42.9
BSF10 83 6 other f 3 90.0 53.8 81.5 36.0 62.0 43.3
BB40 43 2 Vic f 2 90.0 55.5 81.0 32.0 72.0 49.4
BB31 39 2 Vic f 1 84.7 51.5 75.0 34.0 68.7 53.4
eye chest belly
C54 14.2 30.0 40.0
C10 15.5 30.0 34.0
A4 14.9 28.0 34.0
BR4 14.5 31.0 39.0
C15 15.2 28.0 34.0
C34 15.3 28.0 33.0
C5 16.0 28.5 33.0
C28 15.5 28.0 33.0
C27 14.5 29.0 34.0
WW7 16.5 26.0 33.5
C24 14.2 30.0 32.0
C48 15.2 28.0 32.0
CD2 16.0 27.0 32.0
C31 14.4 27.5 32.0
C32 14.9 31.0 34.0
C45 14.7 29.0 31.0
CD4 17.1 25.5 33.0
C50 15.5 27.0 36.0
C58 15.4 28.0 35.0
BTP19 13.0 25.0 30.0
BTP21 14.8 28.5 33.5
CD3 15.2 25.5 28.5
CD5 14.0 29.0 38.5
A1 14.0 24.0 32.0
WW4 14.9 28.0 36.0
BR7 14.4 26.0 30.0
BTP3 16.0 25.5 31.5
CD10 13.0 27.0 34.0
BSF13 13.6 26.0 31.0
BTP15 14.0 28.0 35.0
CD6 15.7 26.5 34.0
C23 15.1 28.5 33.0
C63 14.2 23.0 28.0
WW5 17.4 28.0 38.5
BB33 13.6 27.0 30.0
A2 14.5 24.5 33.0
BB36 15.9 27.0 30.5
BSF1 14.7 26.0 34.0
BB17 13.6 28.0 31.5
BSF9 15.4 26.0 32.0
BSF10 14.0 25.0 29.0
BB40 13.4 29.0 31.0
BB31 13.0 25.0 25.0
#use arrange(variable) to go from least to most
#use arrange(desc(variable)) to go from most to least, arrange(-variable) also works
slice
The slice function will allow us to pick only a subset of the rows based on their numeric order (1st through last).
For example, if I want a list the 5 female possums with longest total lengths.
%>%
possum_f arrange(-totlngth) %>%
slice(1:5)
case site Pop sex age hdlngth skullw totlngth taill footlgth earconch
C54 21 1 Vic f 3 95.9 58.1 96.5 39.5 77.9 52.9
C10 3 1 Vic f 6 94.0 60.0 95.5 39.0 75.4 51.9
A4 32 1 Vic f 4 94.3 56.7 94.0 39.0 74.8 52.0
BR4 57 4 other f 4 95.1 59.4 93.0 41.0 67.2 45.3
C15 4 1 Vic f 6 93.2 57.1 92.0 38.0 76.1 52.2
eye chest belly
C54 14.2 30 40
C10 15.5 30 34
A4 14.9 28 34
BR4 14.5 31 39
C15 15.2 28 34