6.3. Data preservation using repositories and data centres

Data preservation in a research data repository usually goes hand in hand with publishing the data. Access to such publications can be restricted and, for sensitive data such as personal data, must be. In accordance with good scientific practice, repositories must ensure that published research data are stored and kept available for at least ten years. After that, availability is no longer guaranteed, although it is usually continued. If the operator decides to remove data from the repository after this minimum retention period, the reference to the metadata must remain available.
Data centres, and repositories too, make it possible to restrict access to confidential and sensitive data while still allowing the data to be shared for research and educational purposes. The data held in data centres and archives are generally not publicly accessible: users must register and may then use the data only for specific purposes. They sign an end-user licence in which they agree to certain conditions, such as not using the data for commercial purposes and not attempting to identify individuals in the data. The type of access permitted is agreed in advance with the data originator, and data centres can impose additional access rules for confidential data [1].
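The access conditions described above can be pictured as a chain of checks that must all pass before data are released. The following Python sketch is purely illustrative: the classes `AccessPolicy` and `User` and the function `may_access` are hypothetical, do not reflect the software of any particular data centre, and real procedures may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical conditions agreed with the data originator for one dataset."""
    allowed_purposes: frozenset[str]
    requires_signed_licence: bool = True


@dataclass
class User:
    """Hypothetical registered user of a data centre."""
    name: str
    registered: bool = False
    signed_licence: bool = False


def may_access(user: User, purpose: str, policy: AccessPolicy) -> bool:
    """Grant access only if every condition in the policy is met."""
    if not user.registered:
        return False  # data are not public: registration comes first
    if policy.requires_signed_licence and not user.signed_licence:
        return False  # the end-user licence must be signed before use
    return purpose in policy.allowed_purposes  # use is limited to agreed purposes


policy = AccessPolicy(allowed_purposes=frozenset({"research", "teaching"}))
alice = User("alice", registered=True, signed_licence=True)
print(may_access(alice, "research", policy))    # True
print(may_access(alice, "commercial", policy))  # False
```

Because each check returns early, a request fails at the first unmet condition; a real system would probably also record why access was refused.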
Repositories are usually divided into three types: institutional repositories, subject-specific repositories and interdisciplinary (generic) repositories. A fourth, more specialised type is the software repository, in which software or source code can be published. Software repositories are usually dedicated to a single programming language (e.g. PyPI for Python).
Institutional repositories are those provided by institutions, most of them state-recognised, such as universities, museums, research institutions or other bodies with an interest in making research results or other scientifically important documents available to the public. The DFG's “Guidelines for Safeguarding Good Research Practice” formally require that the research data underlying a scientific work be “archived in an accessible and identifiable manner for a period of ten years at the institution where the data were produced or in cross-location repositories” [2]. The archiving period begins on the date when the results are made publicly available.
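Because the ten-year period runs from the date of public availability, the date on which it ends is simple date arithmetic. The sketch below uses only the Python standard library; the function names are hypothetical. It assumes the period ends on the tenth anniversary of publication and, as our own assumption, maps 29 February to 28 February when the target year is not a leap year.

```python
from datetime import date


def retention_end(published: date, years: int = 10) -> date:
    """Anniversary, `years` after the date the results were made public.

    Assumption: the retention period ends on the anniversary date itself.
    """
    try:
        return published.replace(year=published.year + years)
    except ValueError:
        # Only 29 February can fail here; we assume 28 February in a non-leap year
        return published.replace(year=published.year + years, day=28)


def still_within_retention(published: date, today: date, years: int = 10) -> bool:
    """True while the minimum retention period has not yet elapsed."""
    return today < retention_end(published, years)


print(retention_end(date(2023, 9, 28)))                              # 2033-09-28
print(retention_end(date(2020, 2, 29)))                              # 2030-02-28
print(still_within_retention(date(2023, 9, 28), date(2033, 9, 27)))  # True
```

After this date the operator may remove the data, but the reference to the metadata must stay available.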
Subject-specific repositories are equipped to handle domain-specific data and hold valuable community resources for research; the European Nucleotide Archive (ENA) is one example. Publishing in a renowned subject-specific repository, for example one run by a natural history collection, can considerably enhance your scientific reputation, but the submission needs careful preparation. For biological data there are dedicated repositories, for example OBIS for marine biodiversity data or the Plant Genomics and Phenomics Research Data Repository (e!DAL-PGP). Species observations should be deposited in repositories that deliver data to the Global Biodiversity Information Facility (GBIF), for example PANGAEA, the Data Publisher for Earth & Environmental Science. To find out whether a suitable subject-specific repository exists for your research area, search the registries re3data and FAIRsharing.org. Environmental scientists and ecologists can use the GFBio submission service as a central point of entry for federated archiving of different project outputs in specialist repositories (including ENA, PANGAEA and several collection data centres).
If no suitable subject-specific repository exists, the fallback is a large interdisciplinary repository. Zenodo, funded by the European Commission, is free of charge and is probably the most frequently used option in Europe. If your university is a member of Dryad, you can also publish there free of charge. RADAR offers a fee-based data publication service in Germany and is also a good option for larger projects with centralised data management. When publishing on Zenodo, make sure to assign your research data to one or more communities that reflect your subject area, which adds a degree of subject-specificity within this generic service.
The main difference between PANGAEA/Dryad and Zenodo/RADAR is that PANGAEA and Dryad have expert editors who curate the data: they transform and harmonise the data for interoperability and reusability, making them FAIR. Zenodo and RADAR mostly store data as delivered, often as undocumented files or zip archives, which severely hampers reuse.
Many repositories, particularly generic ones, are domain-agnostic and accommodate a wide range of research fields. They may lack standardised protocols and structures, which leads to variability in data organisation and metadata. In addition, repositories do not always guarantee long-term archiving, because their sustainability can depend on institutional or project-specific factors. This can make them less reliable for the extended preservation of research data than the more comprehensive and standardised services of data centres. Data centres, in contrast, are well equipped to ensure long-term preservation, often following domain-specific standards and best practices for data management and archiving. They are typically connected to data portals that systematically harvest and disseminate metadata, which improves the discoverability and accessibility of data on a broader scale. This structured approach and their commitment to domain-specific standards make data centres a reliable choice for researchers seeking robust, standardised long-term preservation.
NFDI4Biodiversity collaborates with a network of specialised data centres, each with distinct expertise. They include centres focused on nucleotide, plant and environmental data, such as ENA, e!DAL-PGP and PANGAEA, as well as data centres associated with natural science collections: the Botanical Garden and Botanical Museum Berlin (BGBM), DSMZ (German Collection of Microorganisms and Cell Cultures), Leibniz-Institut zur Analyse des Biodiversitätswandels (LIB), Museum für Naturkunde Berlin (MfN), Senckenberg, Staatliches Museum für Naturkunde Stuttgart (SMNS) and Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB).
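The systematic harvesting of metadata by data portals mentioned above is often implemented with OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting; whether a given portal or data centre uses it is an assumption to verify case by case. The sketch below builds a `ListRecords` request URL and extracts identifier, title and creators from a response in the Dublin Core (`oai_dc`) format, using only Python's standard library. The base URL and the simplified sample response are invented, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH base URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, title and creators from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        records.append({
            "identifier": identifier,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
        })
    return records


# Invented, simplified sample response, for illustration only
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example occurrence dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(build_request_url("https://repo.example.org/oai"))
for rec in parse_records(SAMPLE):
    print(rec)
```

A real harvester would also follow the `resumptionToken` that OAI-PMH uses to page through large result sets.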
These centres can handle a wide range of data types crucial for biodiversity research, from occurrence data and environmental variables to molecular sequences, multimedia files, experimental measurements and even geospatial data such as orthophotos and digital surface models, and so offer comprehensive support for biodiversity research. How data are submitted depends on the data centre you choose: you can submit data directly or use tools such as Diversity Workbench. The GFBio Data Submission Service also offers professional assistance, which is particularly helpful if you are dealing with heterogeneous datasets or are unsure which data centre best suits your needs.
Regardless of where you publish your data, always include a descriptive “metadata file” alongside the data that describes them and sets out the context in which they were collected. When choosing a repository, also check whether it is certified (e.g. CoreTrustSeal); certification status can be looked up in re3data.
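A descriptive metadata file can be as simple as a structured text file stored next to the data. The Python sketch below writes one as JSON and refuses to write it if key fields are empty. The field names in `REQUIRED_FIELDS` and the example values are hypothetical; repositories and data centres prescribe their own metadata schemas, which take precedence.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical minimal field set; use the schema your repository prescribes instead
REQUIRED_FIELDS = ("title", "creators", "description", "collection_context", "licence")


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


def write_metadata_file(path: Path, metadata: dict) -> None:
    """Write the metadata as UTF-8 JSON, refusing incomplete records."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("metadata incomplete, missing: " + ", ".join(missing))
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")


# Invented example record, for illustration only
example = {
    "title": "Vegetation survey plots, example site, 2022",
    "creators": ["Doe, Jane"],
    "description": "Percentage cover per species and plot.",
    "collection_context": "Annual visual cover estimates on permanent plots.",
    "licence": "CC-BY-4.0",
    "data_files": ["plots.csv"],
}

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "metadata.json"
    write_metadata_file(target, example)
    print(json.loads(target.read_text(encoding="utf-8"))["title"])
```

Checking completeness before writing catches a missing collection context early, which is exactly the information later users most often lack.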
Recommended selection criteria for repositories:
  • Repositories should be trustworthy (recommended: CoreTrustSeal or DINI certificate)
  • As discipline-specific and as suited to your data type as possible
  • Published data should be fully citable via a persistent identifier (e.g. DOI, URN)
  • Data can be cross-referenced with journal articles
  • Searchable across publishers and institutions, by humans and machines
  • Metadata are machine-harvestable, contents machine-retrievable
  • Provides steps for data preparation and quality control
  • Transparent usage and licensing conditions
  • Open terms of reuse, no paywalls
  • Transparent cost/business model
  • The repository provides professional support during the preparation and archiving of your data
  • Preservation services, including format migration
Check re3data.org for information on the services individual repositories provide.
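When comparing several candidate repositories, the criteria above can be recorded as a yes/no profile per repository. The Python sketch below is a hypothetical helper, not a tool offered by re3data: the answers have to be collected by hand, for example from a repository's re3data entry, and the field names are our own shorthand for the twelve criteria.

```python
from dataclasses import dataclass, fields


@dataclass
class RepositoryProfile:
    """Hypothetical yes/no answers for one repository, one field per criterion."""
    name: str
    certified: bool = False               # CoreTrustSeal or DINI certificate
    discipline_specific: bool = False     # fits discipline and data type
    persistent_identifiers: bool = False  # e.g. DOI, URN
    links_to_articles: bool = False       # cross-referencing with journal articles
    searchable: bool = False              # by humans and machines
    machine_harvestable: bool = False     # metadata harvestable, contents retrievable
    quality_control: bool = False         # data preparation and QC steps
    transparent_licensing: bool = False
    open_reuse: bool = False              # no paywalls
    transparent_costs: bool = False
    curation_support: bool = False        # professional support
    format_migration: bool = False        # preservation services


def unmet_criteria(profile: RepositoryProfile) -> list[str]:
    """Return the names of all criteria this repository does not satisfy."""
    return [f.name for f in fields(profile)
            if f.name != "name" and not getattr(profile, f.name)]


candidates = [
    RepositoryProfile("Repository A", certified=True, persistent_identifiers=True, open_reuse=True),
    RepositoryProfile("Repository B", certified=True, discipline_specific=True,
                      persistent_identifiers=True, machine_harvestable=True, open_reuse=True),
]
# List candidates with the fewest unmet criteria first
for repo in sorted(candidates, key=lambda p: len(unmet_criteria(p))):
    print(repo.name, unmet_criteria(repo))
```

Ranking by the number of unmet criteria treats all criteria as equally important, which is a simplification; you may prefer to treat certification as a hard requirement.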

[1] forschungsdaten.info (2023a). Datenschutzrecht. Schutz von personenbezogenen Forschungsdaten [Data protection law: protecting personal research data]. https://forschungsdaten.info/themen/rechte-und-pflichten/datenschutzrecht/ (last accessed 28 September 2023)
[2] DFG (2022). Guidelines for Safeguarding Good Research Practice. Code of Conduct. https://doi.org/10.5281/zenodo.6472827
