4.4.2. Solutions and recommendations for taxonomic data integration
What could possibly go wrong when you try to integrate two data sets with species names? This GBIF blog post highlights some pitfalls of matching species names from new data sets to the GBIF taxonomic backbone.
Common pitfalls include the following.
Variable spelling of species name
- Data set 1: Carabus arvensis Herbst 1784 (valid name)
- Data set 2: Carabus arcensis Herbst 1784 (synonym)
- Data set 3: Carabus aruensis Herbst 1784 (misspelt epithet)
Variable spelling of authority
- Data set 1: Carabus arvensis Herbst 1784
- Data set 2: Carabus arvensis Hbst., 1784 (official abbreviation of authority)
- Data set 3: Carabus arvensis Hrst. 1784 (misspelt authority)
Missing authority
- Data set 1: Glocianus punctiger
- Data set 2: Glocianus punctiger
Mismatch of ranks
- Data set 1: Carabus arvensis Herbst 1784
- Data set 2 (Red List Germany 2009 ff.; red-list status was evaluated for three subspecies):
  - Carabus arvensis arvensis Herbst, 1784 - Vorwarnliste (similar to IUCN Red List Near Threatened (NT), but different methodology)
  - Carabus arvensis noricus Sokolar, 1910 - Ungefährdet (similar to IUCN Red List Least Concern (LC), but different methodology)
  - Carabus arvensis sylvaticus Dejean, 1826 - Gefährdet (similar to IUCN Red List Endangered (EN), but different methodology)
- Data set 3: Carabus sp.
- Data set 4: Carabidae sp. 4
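The pitfalls above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a recommended tool: the `REFERENCE` dictionary is a hypothetical two-entry stand-in for a real taxonomic reference database, the helper names are invented for this example, and the parsing rule (epithets are plain lower-case words, everything after them is authority) is a simplification that will fail on hyphenated epithets, hybrids and many other real-world names.

```python
import re
from difflib import get_close_matches

# Hypothetical mini reference list (canonical name -> authority).
# A real workflow would query a taxonomic reference database instead.
REFERENCE = {
    "Carabus arvensis": "Herbst, 1784",
    "Glocianus punctiger": None,  # no authority is given in the examples above
}


def split_name(raw):
    """Split a raw name string into (canonical name, remaining text).

    Simplifying assumption: the first token is the genus (or higher taxon),
    following all-lower-case tokens are epithets, the rest is authority
    or an open-nomenclature qualifier such as "sp.".
    """
    tokens = raw.split()
    name_parts = [tokens[0]]
    i = 1
    while i < len(tokens) and re.fullmatch(r"[a-z]+", tokens[i]):
        name_parts.append(tokens[i])
        i += 1
    return " ".join(name_parts), " ".join(tokens[i:])


def normalise_authority(authority):
    """Lower-case an authority string and drop punctuation before comparing."""
    return " ".join(re.sub(r"[^\w\s]", " ", authority).lower().split())


def guess_rank(raw):
    """Guess the rank from the number of words in the canonical name."""
    canonical, _ = split_name(raw)
    n_words = len(canonical.split())
    if n_words >= 3:
        return "subspecies"
    if n_words == 2:
        return "species"
    return "genus or higher"


def match_name(raw, reference=REFERENCE, cutoff=0.9):
    """Return (matched reference name or None, match type)."""
    canonical, _ = split_name(raw)
    if canonical in reference:
        return canonical, "exact"
    close = get_close_matches(canonical, list(reference), n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"
```

With these helpers, `match_name("Carabus aruensis Herbst 1784")` returns a fuzzy match to `Carabus arvensis`, and `guess_rank("Carabidae sp. 4")` flags the record as identified above species level. Note that the synonym `Carabus arcensis` is also returned as a fuzzy match: string similarity alone cannot tell a synonym from a misspelling, which is one reason why a curated reference database is needed.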
To integrate taxonomic data sets for your own purposes, you should match both sets of species names against a taxonomic reference database. Many databases use alphanumeric unique taxon identifiers (UTIs) in addition to species names to facilitate taxonomic data integration. A multitude of software tools and R packages help to access and use these taxonomic databases, matching both UTIs and species names. Many research teams are working to improve taxonomic integration at all levels of data curation. Schellenberger Costa et al. (2023) [1] compared four global authoritative checklists for vascular plants and proposed workflows to better integrate these important databases in the future. Grenié et al. (2021, 2022) [2][3] compiled a living database of taxonomic databases and R packages and gave an overview of tools, databases and best practices for matching species names. Sandall et al. (2023) [4] proposed key elements of a global integrated structure of taxonomy (GIST) to reconcile different aspects, approaches and cultures of taxonomy.
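As a rough illustration of why unique taxon identifiers help, the following sketch joins two data sets through identifiers rather than through raw name strings. Everything in it is assumed for the example: the `BACKBONE` table, the `TAXON:…` identifiers and the function names are hypothetical, and a real workflow would obtain identifiers from one of the reference databases or tools discussed in the works cited above.

```python
# Hypothetical backbone: every accepted name and known synonym points to one
# taxon identifier. The identifiers are invented for illustration only.
BACKBONE = {
    "Carabus arvensis": "TAXON:0001",
    "Carabus arcensis": "TAXON:0001",  # treated as a synonym, as in the pitfall examples
    "Glocianus punctiger": "TAXON:0002",
}


def harmonise(records, backbone=BACKBONE):
    """Map (name, value) pairs to taxon identifiers.

    Returns a dict {taxon_id: [values]} and a list of names that could not
    be matched and need manual checking.
    """
    matched, unmatched = {}, []
    for name, value in records:
        taxon_id = backbone.get(name)
        if taxon_id is None:
            unmatched.append(name)
        else:
            matched.setdefault(taxon_id, []).append(value)
    return matched, unmatched


def integrate(set_a, set_b, backbone=BACKBONE):
    """Join two data sets on taxon identifier instead of on name strings."""
    a, missing_a = harmonise(set_a, backbone)
    b, missing_b = harmonise(set_b, backbone)
    shared = {taxon_id: (a[taxon_id], b[taxon_id]) for taxon_id in a.keys() & b.keys()}
    return shared, missing_a + missing_b
```

Called with one data set that uses `Carabus arvensis` and another that uses `Carabus arcensis`, `integrate` links both records under the same identifier, while a name such as `Carabus sp.` ends up in the list of unmatched names instead of being silently dropped.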
Figure 5. Taxonomic reference databases are essential to integrate taxonomic data sets. [5], Figure 2, CC BY 4.0
Taxonomic databases serve specific tasks and encompass different aspects of biodiversity. The suitability of a particular database for your project depends on the type of data you are looking for [6][7]:
- Databases creating new taxonomic frameworks: Examples include FishBase, Avibase, Amphibian Species of the World (ASW). These databases develop novel taxonomic backbones.
- Databases aggregating primary taxonomic lists and linking databases: Notable databases in this category are the Catalogue of Life (COL+), Encyclopedia of Life (EoL). They bring together primary taxonomic information and link various databases.
- Databases combining new frameworks and taxonomic lists: Databases like ITIS and WoRMS fall into this category. They not only produce new taxonomic backbones but also integrate primary taxonomic lists.
- Databases focused on biodiversity data aggregation: This group includes databases like GBIF, OBIS, Map of Life (MOL), INSDC and IUCN. While not primarily taxonomic, their main purpose is to aggregate and provide comprehensive biodiversity data.
Understanding the nature and focus of these databases is essential in choosing the right one for your research or project. If you are re-using taxonomic data collected by others, you should therefore take care to:
- Find out which taxonomic reference was used to identify the species
- Consider the possibility of misidentification (e.g. if a genetic sequence is not linked to a voucher specimen, or if the authority is missing)
- Consider the possibility of misspelling of both species names and authority names
- Select a taxonomic backbone that is complete and suitable for your taxon of interest for taxonomic harmonisation
If you are collecting new taxonomic data yourself, or in collaboration with taxonomic experts, you should additionally take care to:
- Note the scientific name and authority of a species
- Note the identification literature that was used to identify specimens and the person(s) who identified specimens (when working with taxonomic experts: ask them which reference literature they used)
- Note the taxonomic references for each taxon, i.e. the taxonomic references on which the identification literature is based
- Note the place and date when you found a specimen
- Note details about the method used to record an occurrence
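One way to keep track of these items is a simple record structure with a completeness check. The sketch below is only a suggestion: the class name, the field names and the helper `missing_metadata` are hypothetical and not taken from any standard, so they should be adapted to whatever data standard your project uses.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TaxonRecord:
    """One identified specimen with the taxonomic metadata listed above.

    Field names are illustrative assumptions, not an established standard.
    """
    scientific_name: str
    authority: Optional[str] = None
    identification_literature: Optional[str] = None
    identified_by: Optional[str] = None
    taxonomic_reference: Optional[str] = None
    locality: Optional[str] = None
    date_found: Optional[str] = None  # e.g. an ISO 8601 date string
    recording_method: Optional[str] = None


def missing_metadata(record):
    """Return the names of all fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

For a record created with only a scientific name and a locality, `missing_metadata` lists the authority, identification literature, identifier, taxonomic reference, date and method as still to be filled in.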
Species names connect physical, functional, spatial, and genetic data sets. Providing detailed taxonomic metadata will help to keep your data alive and re-usable, both for your own research and for that of others.
[1] Schellenberger Costa, D., Boehnisch, G., Freiberg, M., Govaerts, R., Grenié, M., Hassler, M., et al. (2023). The big four of plant taxonomy – a comparison of global checklists of vascular plant names. New Phytologist, nph. https://doi.org/10.1111/nph.18961
[2] Grenié, M., Berti, E., Carvajal-Quintero, J., Dädlow, G., Sagouis, A. & Winter, M. (2021). taxharmonizexplorer - Navigate the Taxonomic Harmonization landscape. taxharmonizeexplorer. Available at: https://mgrenie.shinyapps.io/taxharmonizexplorer/. Last accessed 24 October 2023.
[3] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[4] Sandall, E.L., Maureaud, A.A., Guralnick, R., McGeoch, M.A., Sica, Y.V., Rogan, M.S., et al. (2023). A globally integrated structure of taxonomy to support biodiversity science and conservation. Trends in Ecology & Evolution, S016953472300215X. https://doi.org/10.1016/j.tree.2023.08.004
[5] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[6] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[7] Sandall, E.L., Maureaud, A.A., Guralnick, R., McGeoch, M.A., Sica, Y.V., Rogan, M.S., et al. (2023). A globally integrated structure of taxonomy to support biodiversity science and conservation. Trends in Ecology & Evolution, S016953472300215X. https://doi.org/10.1016/j.tree.2023.08.004