• Spatial data is often larger than we need it to be
  • E.g., Raster data is typically a large rectangular block of data
  • And we are often only interested in the portion of that data that is located inside our study region
  • To get the piece of spatial data that we want for our maps or analysis we can “crop” the data
  • We’ll look at this using the data from Harvard forest we’ve been working with so far
  • Including the DTM file that covers the entire site
library(stars)
library(sf)
library(ggplot2)

harv_boundary <- st_read("data/HARV/harv_boundary.shp")
harv_dtm <- read_stars("data/HARV/HARV_dtmFull.tif")
  • If we plot this data we see that we have elevation data for a large square surrounding the site
ggplot() +
  geom_stars(data = harv_dtm) +
  scale_fill_viridis_c() +
  geom_sf(data = harv_boundary, alpha = 0)

Cropping a raster using to a vector polygon

  • There are two general approaches to cropping data
  • The first is to crop a raster to only keep the portion that falls inside some vector data
  • For example, we often only want the portion of a raster dataset that falls inside the boundar of our study site
  • We can do this using the st_crop function
  • The first argument is the raster we want to crop
  • The second argument is the vector data we want to crop it to
  • Let’s crop our raster to only include points inside the site boundary
harv_dtm_cropped <- st_crop(harv_dtm, harv_boundary)
  • We can see that the data has been cropped by looking at the extents
harv_dtm
harv_dtm_cropped
  • harv_dtm_cropped has smaller values for from and to in both the x and y dimensions
  • Let’s plot the cropped raster to see what it looks like
ggplot() +
  geom_stars(data = harv_dtm_cropped) +
  scale_fill_viridis_c() +
  geom_sf(data = harv_boundary, alpha = 0)
  • We now see colored values only for the part of the part of the raster inside the boundary
  • But we still see gray boxes outside of it
  • These boxes indicate that the values have been replaced with null values
  • The raster has been “cropped” to the limits of the vector object but rasters are always rectangular
  • So these null values fill the space outside of the vector object but within it’s x and y extent
  • We can choose to not show these null values by setting their color to be transparent
ggplot() +
  geom_stars(data = harv_dtm_cropped) +
  scale_fill_viridis_c(na.value = "transparent") +
  geom_sf(data = harv_boundary, alpha = 0)
  • Cropping removes the portion of the raster that is outside the x/y extent of the vector
  • If we want to keep the full dimensions of the raster but convert all values outside the vector to NA we “mask” the data instead of cropping it
  • Do this with an optional argument crop = FALSE
harv_dtm_masked <- st_crop(harv_dtm, harv_boundary, crop = FALSE)
harv_dtm_masked
  • We can see that it still has the same dimensions as the original raster, 150 x 150
  • But if we plot it in the same way as the cropped data we will only see the values inside the vector
ggplot() +
  geom_stars(data = harv_dtm_masked) +
  scale_fill_viridis_c(na.value = "transparent") +
  geom_sf(data = harv_boundary, alpha = 0)

Cropping to a bounding box

  • The other common approach to cropping is to crop spatial objects to only include those within a bounding box
  • E.g., we might only want to explore a specific area within Harvard Forest
  • We still use st_crop to do this but we pass it a square region instead of a polygon
  • To do this we need to know the values for the region we want to crop
  • To figure out these values let’s modify our plot to plot in the CRS of our data
ggplot() +
  geom_stars(data = harv_dtm_cropped) +
  scale_fill_viridis_c(na.value = "transparent") +
  geom_sf(data = harv_boundary, alpha = 0) +
  coord_sf(datum = st_crs(harv_dtm))
  • We create a bounding box using the st_bbox function
  • We describe the square based on the largest and smallest values of both x and y
  • We provide this information in a named vector
  • We create this using the c function, but giving names to each value using the name we want and the =
  • The names for the bounding box are xmin, xmax, ymin, ymax
  • And we also need to provide the CRS
bbox <- st_bbox(c(xmin = 731000, ymin = 4713000, xmax = 732000, ymax = 4714000), crs = st_crs(harv_dtm))
  • So let’s crop a square in this region here
harv_dtm_small <- st_crop(harv_dtm, bbox)
  • We can also perform bounding box cropping on vector data
  • Let’s load and then crop our soils data
harv_soils <- st_read("data/HARV/harv_soils.shp")
harv_soils_small <- st_crop(harv_soils, bbox)
  • For sf vector data we can also skip the st_bbox function and just pass the named vector with xmin, xmax, ymin, and ymax directly
  • Let’s plot our cropped regions together
ggplot() +
  geom_stars(data = harv_dtm_small) +
  scale_fill_viridis_c(na.value = "transparent") +
  geom_sf(data = harv_soils_small, alpha = 0)