ORIGINAL RESEARCH article

Front. Genet.
Sec. RNA
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1442759

ML-GAP: Machine Learning-Enhanced Genomic Analysis Pipeline with Autoencoders and Data Augmentation

Provisionally accepted
  • 1 Brown University, Providence, United States
  • 2 Erciyes University, Kayseri, Türkiye
  • 3 Warren Alpert Medical School, Brown University, Providence, Rhode Island, United States
  • 4 University of Rhode Island, Kingston, Rhode Island, United States

The final, formatted version of the article will be published soon.

    The advent of RNA sequencing (RNA-Seq) has significantly advanced our understanding of the transcriptomic landscape, revealing intricate gene expression patterns across biological states and conditions. However, the complexity and volume of RNA-Seq data make it difficult to identify differentially expressed genes (DEGs), which are critical for understanding the molecular basis of diseases such as cancer. We introduce a novel Machine Learning-Enhanced Genomic Data Analysis Pipeline (ML-GAP) that incorporates autoencoders and data augmentation strategies, notably the MixUp method, to overcome these challenges. By creating synthetic training examples through a linear combination of input pairs and their labels, MixUp significantly enhances the model's ability to generalize from the training data to unseen examples. This, in turn, suggests that ML-GAP not only has the potential to detect DEGs more accurately but also opens new avenues for therapeutic intervention and research. By integrating explainable artificial intelligence (XAI) techniques, ML-GAP provides a transparent and interpretable analysis that highlights the significance of the identified genetic markers. Our results demonstrate ML-GAP's superiority in accuracy, efficiency, and insights, with the MixUp method making a substantial contribution to the pipeline's effectiveness, greatly advancing genomic data analysis and setting a new standard in the field.
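
The MixUp step described in the abstract (a linear combination of input pairs and their labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mixup`, the `alpha` parameter, the use of one-hot labels, and the choice to draw the mixing weight from a Beta(alpha, alpha) distribution (as in the commonly used MixUp formulation) are all assumptions made for illustration.

```python
import numpy as np


def mixup(X, y_onehot, alpha=0.2, rng=None):
    """Return convex combinations of randomly paired samples and labels.

    Hypothetical helper for illustration; not taken from ML-GAP.
    X        : (n_samples, n_features) array, e.g. normalized expression values
    y_onehot : (n_samples, n_classes) one-hot label array
    alpha    : shape parameter of the Beta(alpha, alpha) mixing distribution
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    # One mixing weight per sample, shaped (n, 1) so it broadcasts over columns.
    lam = rng.beta(alpha, alpha, size=(n, 1))
    # Pair each sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(n)
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return X_mix, y_mix


# Toy usage: 6 samples, 4 features, 2 classes (values are random, not real data).
rng = np.random.default_rng(0)
X_toy = rng.random((6, 4))
y_toy = np.eye(2)[[0, 1, 0, 1, 1, 0]]
X_aug, y_aug = mixup(X_toy, y_toy, alpha=0.4, rng=rng)
```

Each augmented label row is a soft label that still sums to 1, so the augmented set can be passed to any classifier trained with a cross-entropy loss on probability targets.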

    Keywords: RNA-Seq, autocorrelation, MixUp, machine learning, feature selection

    Received: 02 Jun 2024; Accepted: 03 Sep 2024.

    Copyright: © 2024 Agraz, Goksuluk, Zhang, Choi, Clements, Choudhary and Karniadakis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Melih Agraz, Brown University, Providence, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.