AUTHOR=Kumar Bablu, Lorusso Erika, Fosso Bruno, Pesole Graziano TITLE=A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions JOURNAL=Frontiers in Microbiology VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2024.1343572 DOI=10.3389/fmicb.2024.1343572 ISSN=1664-302X ABSTRACT=
Metagenomics, Metabolomics, and Metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standardized and comprehensive metadata associated with raw data. This gap makes it harder to stratify data robustly and to account for confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges of collecting standardized metadata. We also assess the limitations of metadata collection in existing public repositories of metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage the potential of metagenomic data. Furthermore, we explore future directions for applying Machine Learning (ML) to metadata retrieval, which offer promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in the development of ML models.