ILIAS of the Philipps-Universität Marburg
Authors: Marius Dieckmann, Prof. Dr. Alexander Goesmann
Abstract

The German Network for Bioinformatics Infrastructure (de.NBI) consists of eight service centers joining forces to provide bioinformatics services for scientists in biotechnology and biomedicine. As part of the Bielefeld/Giessen (BiGi) service center, the Goesmann Lab at Justus Liebig University Giessen (JLU) has a strong focus on high-performance computing services, including an OpenStack infrastructure for cloud computing and a repository of reusable workflows suitable for high-throughput sequence data analysis. For this task, we have direct access to a comprehensive hardware and cloud infrastructure that is operated by the Bioinformatics Core Facility (BCF) at JLU. In cooperation with national and international partners from different scientific disciplines, we analyze tremendous amounts of biological sequence data using our own bioinformatics tools and provide support and user training in the field of microbial bioinformatics to the life science community.

To meet the need for efficient and sustainable data storage and evaluation, we have started to develop data management solutions. In this context, we are also part of the recently funded initiative NFDI4BioDiversity, where we develop a project data management system for the researchers involved. It is designed to support state-of-the-art data lifecycle management that can handle petabytes of data and millions of data objects, and it also covers the storage and usage of metadata files. Within this lifecycle, data are not only generated, analyzed, and stored once; they are also part of a refinement process undergoing constant improvement. These improvements include (i) small bugfixes, (ii) major changes in the dataset due to the incorporation of additional data, and (iii) changes in the data format. The improved results are stored and versioned to allow for reproduction.
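To make the versioning idea concrete, the following minimal Python sketch shows one way the three kinds of improvement listed above could map to version numbers while every earlier result stays retrievable. It is an illustration only, not the system described in this abstract: the `Change` enum, the `(format, data, fix)` version triple, and the `VersionedDataset` class are hypothetical names chosen for this example, and the checksums stand in for references to stored data objects.

```python
from enum import Enum
from typing import Dict, Tuple

# Hypothetical version scheme: (format, data, fix). Not taken from the abstract.
Version = Tuple[int, int, int]


class Change(Enum):
    BUGFIX = "bugfix"          # (i) small bugfix
    DATA_UPDATE = "data"       # (ii) additional data incorporated
    FORMAT_CHANGE = "format"   # (iii) change in the data format


def next_version(current: Version, change: Change) -> Version:
    """Return the version that follows `current` for the given kind of change."""
    fmt, data, fix = current
    if change is Change.FORMAT_CHANGE:
        return (fmt + 1, 0, 0)
    if change is Change.DATA_UPDATE:
        return (fmt, data + 1, 0)
    return (fmt, data, fix + 1)


class VersionedDataset:
    """Keeps every stored version so that earlier results can be reproduced."""

    def __init__(self, name: str, checksum: str) -> None:
        self.name = name
        self.latest: Version = (1, 0, 0)
        # Maps each version to a content checksum; entries are never overwritten.
        self.history: Dict[Version, str] = {self.latest: checksum}

    def commit(self, change: Change, checksum: str) -> Version:
        """Store an improved result under a new version and return that version."""
        self.latest = next_version(self.latest, change)
        self.history[self.latest] = checksum
        return self.latest


if __name__ == "__main__":
    ds = VersionedDataset("example-genomes", checksum="aaa")
    ds.commit(Change.BUGFIX, checksum="bbb")
    ds.commit(Change.DATA_UPDATE, checksum="ccc")
    ds.commit(Change.FORMAT_CHANGE, checksum="ddd")
    for version, checksum in sorted(ds.history.items()):
        print(version, checksum)
```

Because old entries in `history` are never replaced, an analysis that recorded the version it used can later be rerun against exactly the same data, which is the reproducibility property the abstract asks for.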