1 Introduction

This page will demonstrate one situation where the difference between site and core location coordinates matters when using paleodata from the Neotoma Paleoecology Database: getting sediment depths associated with locations.

2 Our sites

Our sites and cores are mapped out below. Sites and associated cores are filled with the same color. Sites have black outlines, and cores have white outlines, and if you click on a site or core its name should pop up. For those sites that are described as polygons, the point location plotted by the neotoma2::plotLeaflet() function is also rendered as a big point with a black outline.

3 Depths

Let’s compare the depths associated with the point site locations to the depths associated with the core locations. We grab all the sites from Neotoma with “Tanganyika” in their names. Then we download the sample data from all the sites just to get the elevation field, using the neotoma2::samples(), get_downloads(), and get_datasets() functions.

Meanwhile, we define the object site_sf as the point locations of our sites, and we download and filter for the relevant collection units associated with those sites. (See this tutorial for more detail about the collection units and site information.)

tanganyikaSites = get_sites(sitename = "%Tanganyika%")

samps = get_downloads(get_datasets(tanganyikaSites))
## ........................
site_sf = as.data.frame(tanganyikaSites)[c(4,3,2,1)] %>% st_as_sf(coords=c("long","lat"), crs="WGS84")




text="collectionunits"
collunits = content(GET(paste0('https://api.neotomadb.org/v2.0/data/dbtables/',text,'?count=false&limit=99999&offset=0')))$data



collunit_mat = matrix(nrow=length(collunits),ncol=20)

for (i in seq(length(collunits))) {
  for (j in seq(20)) {
    if (!is.null(collunits[[i]][[j]])) {
      collunit_mat[[i,j]] = collunits[[i]][[j]]
    }
  }
}

collunit_df = collunit_mat %>% as.data.frame()

names(collunit_df) = c("collectionunitid","handle","siteid","colltypeid","depenvtid","collunitname","colldate","colldevice","gpslatitude","gpslongitude","gpsaltitude","gpserror","waterdepth","substrateid","slopeaspect","slopeangle","location","notes","recdaterecreated","recdatemodified")


filtered_colls = collunit_df %>% dplyr::filter(collectionunitid %in% siteMetadata_df$collectionunitid) 
filteredColl_sf = collunit_df %>% dplyr::filter(collectionunitid %in% siteMetadata_df$collectionunitid) %>% st_as_sf(coords=c("gpslongitude","gpslatitude"), crs="WGS84")

Now that we’ve defined all our objects, we are going to look for all the relevant depth and elevation data. We can load up a bathymetric map of Lake Tanganyika and vectors of the depths of Lake Tanganyika provided by Mike McGlue and tcarta. When we plot the bathymetric raster map, we get a qualitative sense of the depth profile of Lake Tanganyika, but we want actual numbers, so we need to use the depth contour vectors. (The difference between raster and vector data is beyond the scope of this tutorial, but if you need a refresher, see this page.)

bathy = stack("tanganyika_bathy.tif")

contours = st_read("TCarta_Tanganyika_contours.shp")
## Reading layer `TCarta_tanganyika_contours' from data source 
##   `C:\Users\Blois Lab Generic\Documents\GitHub\neotomaTutorials\TCarta_tanganyika_contours.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 8101 features and 4 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 29.08808 ymin: -8.81098 xmax: 31.20154 ymax: -3.347897
## Geodetic CRS:  WGS 84
tm_shape(bathy[[1:3]]) + tm_rgb(r = 1, g = 2, b = 3)
## stars object downsampled to 625 by 1600 cells. See tm_shape manual (argument raster.downsample)

To get the values of our site and collection unit points, we need to do some spatial interpolation, because the contours don’t overlap our points of interest. We’ll use a simple method called inverse distance-weighted spatial interpolation, implemented in the R package gstat. First we get 100,000 points from our contour lines, filtering out any unsampled contour lines, and then we build our spatial interpolation model, which simply looks for the nearest neighbors of of our site and collection unit points from the points we sampled, and averages them, giving more weight to points that are closer. Finally, we use our model to predict depths based on site location and based on core location, and display them, as well as their difference, in our table below.

Note: while creating this exercise, the Neotoma team noticed an error in the Neotoma elevation metadata and corrected it. People curate these data, which makes them high value, but sometimes people make mistakes!

depth_multipoints = st_sample(contours,100000)
## although coordinates are longitude/latitude, st_sample assumes that they are
## planar
sf_object <- st_sf(depth = contours$ELEVATION, geometry = depth_multipoints) %>% filter(!st_is_empty(geometry))

idw_model <- gstat(
  formula = depth ~ 1, # Depth is the dependent variable
  locations = sf_object,
  nmax = 10, # Optional: Limit to nearest neighbors
  set = list(idp = 2) # Set power for IDW
)

# Predict at specific points
predictions_site <- predict(idw_model, newdata = rbind(site_sf))
## [inverse distance weighted interpolation]
predictions_coll <- predict(idw_model, newdata = rbind(filteredColl_sf))
## [inverse distance weighted interpolation]
site_df = st_drop_geometry(site_sf) %>% cbind(round(predictions_site$var1.pred))

coll_df = st_drop_geometry(filteredColl_sf)
coll_df$depth_coll = (round(predictions_coll$var1.pred))
names(site_df) = c("sitename","siteid","depth_site")

depths_all = as.data.frame(samps) %>% left_join(coll_df,by=join_by("siteid" == "siteid")) %>%  left_join(site_df,by=join_by("siteid" == "siteid")) %>% dplyr::select(siteid,collectionunitid, elev,gpsaltitude,location,depth_site,depth_coll)

depths_all = depths_all %>% dplyr::mutate(depth_difference = depth_site - depth_coll)
datatable(depths_all,rownames=FALSE, options = list(pageLength=11))

The difference between the depth estimated from the site location and the depth estimated from the core location is in the column “depth_difference” in the above table. “Depth_site,” “depth_coll”, and “depth_difference” are all measured in meters. We can see there is one case where the depth based on the collection unit is equal to the depth based on the site - but that’s our one point where the site is in fact the location of a single core, so we expect that. Otherwise, there’s always a difference between the purported site depth and the actual collection unit depth that can be as big as 500 m. That’s a significant difference if you’re interested in the aquatic ecology of the collection unit area. Also notice that when the depth is reported in the Neotoma metadata (in the location field), except for our one site where there’s no difference, the core location depth is always closer to the reported depth.

4 Conclusion

We compared the depths associated with nominal point site locations and with actual core locations for Neotoma paleodata from Lake Tanganyika. We found that there’s always a difference between the site and core location depth, confirming our intuition: we better be careful about the difference between sites and collection units in Neotoma.

If you would like to provide feedback about this tutorial, please complete this form or reach out to Nick Hoffman at .