AUTHOR=Dorst Mathieu , Zeevenhooven Nathan , Wilding Rory , Mende Daniel , Brandt Bernd W. , Zaura Egija , Hoekstra Alfons , Sheraton Vivek M. TITLE=FAIR compliant database development for human microbiome data samples JOURNAL=Frontiers in Cellular and Infection Microbiology VOLUME=14 YEAR=2024 URL=https://www.frontiersin.org/journals/cellular-and-infection-microbiology/articles/10.3389/fcimb.2024.1384809 DOI=10.3389/fcimb.2024.1384809 ISSN=2235-2988 ABSTRACT=Introduction

Sharing microbiome data among researchers fosters new innovations and reduces cost for research. Practically, this means that the (meta)data will have to be standardized, transparent and readily available for researchers. The microbiome data and associated metadata will then be described with regards to composition and origin, in order to maximize the possibilities for application in various contexts of research. Here, we propose a set of tools and protocols to develop a real-time FAIR (Findable. Accessible, Interoperable and Reusable) compliant database for the handling and storage of human microbiome and host-associated data.

Methods

The conflicts arising from privacy laws with respect to metadata, possible human genome sequences in the metagenome shotgun data and FAIR implementations are discussed. Alternate pathways for achieving compliance in such conflicts are analyzed. Sample traceable and sensitive microbiome data, such as DNA sequences or geolocalized metadata are identified, and the role of the GDPR (General Data Protection Regulation) data regulations are considered. For the construction of the database, procedures have been realized to make data FAIR compliant, while preserving privacy of the participants providing the data.

Results and discussion

An open-source development platform, Supabase, was used to implement the microbiome database. Researchers can deploy this real-time database to access, upload, download and interact with human microbiome data in a FAIR complaint manner. In addition, a large language model (LLM) powered by ChatGPT is developed and deployed to enable knowledge dissemination and non-expert usage of the database.