1 Introduction

When you are using and comparing pollen data from multiple sites, it is important to make sure pollen taxa are named consistently across sites. This process is called taxon harmonization. Throughout this tutorial we will be using Olea, the olive, as an example.

There are a few different reasons why taxon harmonization is important.

  1. Taxon names change over time. For example, the African olive once was thought to be its own species, Olea africana. However, with new botanical and molecular information, we now know that it is actually a subspecies, rather than its own species. So the accepted name has now been updated to Olea europaea subsp. cuspidata. Older datasets in Neotoma may refer to the same taxon by its older name.
  2. A particular taxon can be identified to varying resolution. For instance, Olea capensis pollen can be identified to a species-level morphotype, but sometimes analysts will only identify it to the genus level. Depending on the nature of your analysis, you may therefore want to aggregate all the Olea capensis pollen in your data to a broader Olea category.
  3. Many plant taxa are cosmopolitan. Across a regional or continental synthesis, it isn’t obvious how you should deal with the question of splitting and lumping. In the case of Olea, depending on your question, you may want to treat the population from southern Africa as distinct from the east African population - or you may not. This is something you should be intentional about.

There are multiple good ways to harmonize. It all depends on what best suits the analysis you intend.
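To make the first two reasons concrete, here is a minimal sketch of harmonization with a lookup table. Everything in it is invented for illustration: the sites, the counts, and the `lookup` table with its `accepted` and `genus` columns are assumptions, not the structure of any real harmonization table. It uses only base R, so it runs before any packages are loaded.

```r
# Toy pollen counts. Site names and counts are invented for illustration.
counts <- data.frame(
  site  = c("A", "A", "B", "B"),
  taxon = c("Olea africana", "Olea capensis",
            "Olea europaea subsp. cuspidata", "Olea"),
  count = c(12, 5, 8, 3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per name as it appears in the data,
# with the name we want to use at two different resolutions.
lookup <- data.frame(
  taxon    = c("Olea africana", "Olea europaea subsp. cuspidata",
               "Olea capensis", "Olea"),
  accepted = c("Olea europaea subsp. cuspidata", "Olea europaea subsp. cuspidata",
               "Olea capensis", "Olea"),
  genus    = rep("Olea", 4),
  stringsAsFactors = FALSE
)

# Attach the harmonized names. match() keeps the row order of `counts`,
# and the check fails loudly if any name is missing from the lookup table.
idx <- match(counts$taxon, lookup$taxon)
stopifnot(!anyNA(idx))
counts$accepted <- lookup$accepted[idx]
counts$genus    <- lookup$genus[idx]

# Option 1: merge synonyms only (the old and new names are summed together).
by_accepted <- aggregate(count ~ site + accepted, data = counts, FUN = sum)

# Option 2: lump everything to the genus level.
by_genus <- aggregate(count ~ site + genus, data = counts, FUN = sum)
```

Which of the two summaries you keep depends on the question you are asking. The lookup-table pattern is the same either way; only the target column changes.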

Luckily for us, the African Pollen Database (APD) curates a valuable table for harmonizing taxon names across African pollen records. This guide will walk you through using the APD taxon harmonization table with a simple example.

1.1 Packages and Data

We’ll first load up some packages we’re going to be using, and then grab some pollen data to play with from Neotoma.

# Install pacman if it is missing, then use it to install (if needed) and load everything else
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}
library(pacman)
p_load(neotoma2, tidyverse, DT, geojsonsf, sf, leaflet, httr, stringr, ggplot2, tmap, rosm, osmdata, plotly)

We make a bounding box that encompasses all of Africa. Then we grab all Neotoma sites that fall inside that box, and keep just the datasets that contain pollen. Finally, we use the neotoma2 package to download all of those pollen data.

lats = c(38, 38, -36, -36)
lons = c(-18, 52, 52, -18) # Reordered for a rectangle

# Create a data frame with coordinates
coordinates = data.frame(lat = lats, lon = lons)

# Convert to sf object and create a polygon
coordinates_sf = coordinates %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON")

# Plot to check
tm_shape(osm.raster(coordinates_sf)) +
  tm_rgb() +
  tm_shape(coordinates_sf) +
  tm_polygons(alpha = 0.5)
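
If the map tiles are slow to load, or you just want a non-visual check, the same rectangle can be built and inspected numerically. This is a sketch of an alternative construction, not a replacement for the code above; the object names `africa_bbox` and `africa_poly` are introduced here for illustration only.

```r
library(sf)

# Build the same rectangle directly from its corner extents
africa_bbox <- st_bbox(
  c(xmin = -18, ymin = -36, xmax = 52, ymax = 38),
  crs = st_crs(4326)
)

# Convert the bounding box to a polygon geometry
africa_poly <- st_as_sfc(africa_bbox)

# Check that the polygon is valid and that its extent is what we expect
st_is_valid(africa_poly)
st_bbox(africa_poly)
```

Running `st_bbox(coordinates_sf)` on the polygon built earlier should report the same extent.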