
4.4.2. Solutions and recommendations for taxonomic data integration

What can go wrong when you try to integrate two data sets that contain species names? A GBIF blog post highlights some pitfalls of matching species names from new data sets to the GBIF taxonomic backbone.
Common pitfalls include:
Variable spelling of species name:
  • Data set 1: Carabus arvensis Herbst 1784 (valid name)
  • Data set 2: Carabus arcensis Herbst 1784 (synonym)
  • Data set 3: Carabus aruensis Herbst 1784 (misspelt epithet)
Variable spelling of authority:
  • Data set 1: Carabus arvensis Herbst 1784
  • Data set 2: Carabus arvensis Hbst., 1784 (official abbreviation of authority)
  • Data set 3: Carabus arvensis Hrst. 1784 (misspelt authority)
Missing authority
Mismatch of ranks
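The spelling pitfalls above can be illustrated with a minimal sketch in Python. It assumes a small hard-coded reference list (hypothetical; a real workflow would query a taxonomic reference database) and uses the standard-library `difflib` module for fuzzy matching. The helper names `split_name` and `match_name` and the cutoff of 0.9 are illustrative choices, not part of any established tool.

```python
import difflib

# Hypothetical reference list. A real workflow would query a taxonomic
# reference database instead of a hard-coded list.
REFERENCE_NAMES = ["Carabus arvensis", "Carabus auratus", "Carabus nemoralis"]


def split_name(raw):
    """Split a raw name into (genus + epithet, authority string)."""
    parts = raw.strip().split()
    return " ".join(parts[:2]), " ".join(parts[2:])


def match_name(raw, reference=REFERENCE_NAMES, cutoff=0.9):
    """Return (matched reference name or None, match type)."""
    # The authority is set aside so that its spelling variants
    # do not prevent a match on the name itself.
    canonical, _authority = split_name(raw)
    if canonical in reference:
        return canonical, "exact"
    # Fuzzy fallback: catches small spelling differences in the epithet.
    close = difflib.get_close_matches(canonical, reference, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"


for raw in ["Carabus arvensis Herbst 1784",
            "Carabus arvensis Hbst., 1784",
            "Carabus aruensis Herbst 1784"]:
    print(raw, "->", match_name(raw))
```

Note that a fuzzy match cannot tell a misspelling from a different but similarly spelled name (such as a synonym), so fuzzy hits should be reviewed manually rather than accepted automatically.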
To integrate taxonomic data sets for your own purposes, you should match both sets of species names against a taxonomic reference database. Many databases use alphanumeric unique taxon identifiers (UTIs) in addition to species names to facilitate taxonomic data integration. A multitude of software tools and R packages help to access and use these taxonomic databases, matching both UTIs and species names. Many research teams are working to improve taxonomic integration at all levels of data curation. Schellenberger Costa et al. (2023) [1] compared four global authoritative checklists for vascular plants and proposed workflows to better integrate these databases in the future. Grenié et al. (2021, 2022) [2][3] compiled a living database of taxonomic databases and R packages and gave an overview of tools, databases and best practices for matching species names. Sandall et al. (2023) [4] proposed key elements of a global integrated structure of taxonomy (GIST) to reconcile different aspects, approaches and cultures of taxonomy.
This image is a diagram explaining the concept of "Taxonomic Harmonization" between two datasets, A and B.
On the left side, labeled "Dataset A," there are circles with different colors and labels representing the conservation status of species: green with 'LC' for Sp1, yellow with 'VU' for Sp2, teal with 'NT' for Sp3, and red with 'CR' for Sp5.
In the center, there is a column with the title "Taxonomic Reference Database" with a list of species, Sp1 to Sp6. The species from Dataset A (Sp1, Sp2, Sp3, and Sp5) are matched with their counterparts in the database. Sp5 from Dataset A is dashed, indicating an uncertain match with Sp6 in the database.
On the right side, labeled "Dataset B," there are colored bars representing the traits of species Sp2, Sp3, Sp4, and Sp6, with each species having a unique combination of colors.
Arrows show the correspondence between the datasets and the taxonomic reference database, illustrating the process of harmonization where species are cross-referenced and aligned between different datasets based on a standard taxonomic concept.
Figure 5. Taxonomic reference databases are essential to integrate taxonomic data sets. Source: [5], Figure 2, CC BY 4.0
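The harmonisation shown in the figure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the taxon identifiers (`T1` to `T6`), the trait values in Dataset B, and the helper names `to_taxon_id` and `harmonize` are all hypothetical. Only the species labels and conservation statuses follow the figure.

```python
# Hypothetical reference database: name -> unique taxon identifier.
REFERENCE_IDS = {f"Sp{i}": f"T{i}" for i in range(1, 7)}

# Uncertain match from the figure: Sp5 in Dataset A may correspond
# to Sp6 in the reference database.
UNCERTAIN_MATCHES = {"Sp5": "Sp6"}

# Dataset A: conservation status per species (as in the figure).
dataset_a = {"Sp1": "LC", "Sp2": "VU", "Sp3": "NT", "Sp5": "CR"}
# Dataset B: one trait value per species (values invented for illustration).
dataset_b = {"Sp2": 12.0, "Sp3": 8.5, "Sp4": 20.1, "Sp6": 15.3}


def to_taxon_id(name, accept_uncertain=False):
    """Map a species name to a reference identifier, or None if unresolved."""
    if name in UNCERTAIN_MATCHES:
        if not accept_uncertain:
            return None
        name = UNCERTAIN_MATCHES[name]
    return REFERENCE_IDS.get(name)


def harmonize(a, b, accept_uncertain=False):
    """Join two data sets on reference identifiers instead of raw names."""
    def by_id(data):
        out = {}
        for name, value in data.items():
            taxon_id = to_taxon_id(name, accept_uncertain)
            if taxon_id is not None:
                out[taxon_id] = value
        return out

    a_ids, b_ids = by_id(a), by_id(b)
    shared = sorted(a_ids.keys() & b_ids.keys())
    return {taxon_id: (a_ids[taxon_id], b_ids[taxon_id]) for taxon_id in shared}


print(harmonize(dataset_a, dataset_b))
print(harmonize(dataset_a, dataset_b, accept_uncertain=True))
```

Keeping the uncertain match behind an explicit flag makes the decision to accept it visible in the analysis rather than hidden in the data.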
Taxonomic databases serve specific tasks and encompass different aspects of biodiversity. The suitability of a particular database for your project depends on the type of data you are looking for [6][7]:
  • Databases creating new taxonomic frameworks: Examples include FishBase, Avibase and Amphibian Species of the World (ASW). These databases develop novel taxonomic backbones.
  • Databases aggregating primary taxonomic lists and linking databases: Notable databases in this category are the Catalogue of Life (COL+) and the Encyclopedia of Life (EoL). They bring together primary taxonomic information and link various databases.
  • Databases combining new frameworks and taxonomic lists: Databases like ITIS and WoRMS fall into this category. They not only produce new taxonomic backbones but also integrate primary taxonomic lists.
  • Databases focused on biodiversity data aggregation: This group includes databases like GBIF, OBIS, Map of Life (MOL), INSDC and IUCN. These databases are not primarily taxonomic; their main purpose is to aggregate and provide comprehensive biodiversity data.
Understanding the nature and focus of these databases is essential in choosing the right one for your research or project. If you are re-using taxonomic data collected by others, you should therefore take care to:
  • Find out which taxonomic reference was used to identify the species
  • Consider the possibility of misidentification (e.g. if a genetic sequence is not linked to a voucher specimen, or if the authority is missing)
  • Consider the possibility of misspelling of both species names and authority names
  • Select a taxonomic backbone that is complete and suitable for your taxon of interest for taxonomic harmonisation
If you are collecting new taxonomic data yourself, or in collaboration with taxonomic experts, you should additionally take care to:
  • Note the scientific name and authority of a species
  • Note the identification literature that was used to identify specimens and the person(s) who identified specimens (when working with taxonomic experts: ask them which reference literature they used)
  • Note the taxonomic references for each taxon, i.e. which taxonomic references the identification literature you used is based on
  • Note the place and date when you found a specimen
  • Note details about the method used to record an occurrence
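One way to make sure these items are recorded consistently is a structured record with a completeness check. The sketch below is an assumption-laden illustration: the field names are borrowed from Darwin Core terms, but whether your repository or data portal expects exactly these fields is an assumption, and the class `OccurrenceRecord`, the helper `missing_fields` and all example values are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class OccurrenceRecord:
    scientificName: str            # species name
    scientificNameAuthorship: str  # authority
    identifiedBy: str              # person(s) who identified the specimen
    identificationReferences: str  # identification literature used
    nameAccordingTo: str           # taxonomic reference behind the name
    eventDate: str                 # date of the find (ISO 8601)
    locality: str                  # place of the find
    samplingProtocol: str          # method used to record the occurrence


def missing_fields(record):
    """Return the names of all fields that were left empty."""
    return [key for key, value in asdict(record).items() if not str(value).strip()]


# Hypothetical example record with the authority left empty.
record = OccurrenceRecord(
    scientificName="Carabus arvensis",
    scientificNameAuthorship="",
    identifiedBy="Example Person (hypothetical)",
    identificationReferences="Example identification key (hypothetical)",
    nameAccordingTo="Example taxonomic checklist (hypothetical)",
    eventDate="2023-06-15",
    locality="Example site (hypothetical)",
    samplingProtocol="pitfall trap",
)

print(missing_fields(record))
```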
Species names connect physical, functional, spatial, and genetic data sets. Providing detailed taxonomic metadata will help to keep your data alive and re-usable for your own and for others' research.

[1] Schellenberger Costa, D., Boehnisch, G., Freiberg, M., Govaerts, R., Grenié, M., Hassler, M., et al. (2023). The big four of plant taxonomy – a comparison of global checklists of vascular plant names. New Phytologist, nph. https://doi.org/10.1111/nph.18961
[2] Grenié, M., Berti, E., Carvajal-Quintero, J., Dädlow, G., Sagouis, A. & Winter, M. (2021). taxharmonizexplorer - Navigate the Taxonomic Harmonization landscape. taxharmonizeexplorer. Available at: https://mgrenie.shinyapps.io/taxharmonizexplorer/. Last accessed 24 October 2023.
[3] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[4] Sandall, E.L., Maureaud, A.A., Guralnick, R., McGeoch, M.A., Sica, Y.V., Rogan, M.S., et al. (2023). A globally integrated structure of taxonomy to support biodiversity science and conservation. Trends in Ecology & Evolution, S016953472300215X. https://doi.org/10.1016/j.tree.2023.08.004
[5] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[6] Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G.M.L., Sagouis, A. & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol Evol, 2041–210X.13802. https://doi.org/10.1111/2041-210X.13802
[7] Sandall, E.L., Maureaud, A.A., Guralnick, R., McGeoch, M.A., Sica, Y.V., Rogan, M.S., et al. (2023). A globally integrated structure of taxonomy to support biodiversity science and conservation. Trends in Ecology & Evolution, S016953472300215X. https://doi.org/10.1016/j.tree.2023.08.004 

