ORIGINAL RESEARCH article

Front. Genet.
Sec. Computational Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1492752
This article is part of the Research Topic Computational Approaches Integrate Multi-Omics Data for Disease Diagnosis and Treatment.

AI-Enabled Pipeline for Virus Detection, Validation, and SNP Discovery from Next-Generation Sequencing Data

Provisionally accepted
  • 1 Nuclear Science and Technology Research Institute (NSTRI), Karaj, Iran
  • 2 Department of Medical and Surgical Science, University of Magna Graecia, Catanzaro, Calabria, Italy

The final, formatted version of the article will be published soon.

    The rapid and accurate detection of viruses and the discovery of single nucleotide polymorphisms (SNPs) are critical for disease management and for understanding viral evolution. This study presents a pipeline for virus detection, validation, and SNP discovery from next-generation sequencing (NGS) data. By integrating state-of-the-art bioinformatics tools with artificial intelligence, the pipeline processes raw sequencing data to identify viral sequences with high accuracy and sensitivity. Quality control and adapter trimming are performed before the reads are aligned to the reference genomes, to ensure the integrity of the data. Unmapped reads are then subjected to de novo assembly to reveal novel viral sequences and genetic elements. The effectiveness of the pipeline is demonstrated by the identification of virus sequences, illustrating its potential for detecting known and emerging pathogens. SNP discovery is performed using a custom Python script that compares the entire population of sequenced viral reads to a reference genome. This approach provides a comprehensive overview of viral genetic diversity, identifying both dominant variants and a spectrum of lower-frequency genetic variations. The robustness of the pipeline is confirmed by the recovery of complete viral sequences, which improves our understanding of viral genomics. This research aims to develop an automated bioinformatics pipeline, implemented in Python with AI support, for novel viral sequence discovery, in vitro validation, and SNP discovery, in order to understand viral evolution. This study highlights the synergy between traditional bioinformatics techniques and modern approaches, providing a robust tool for analyzing viral genomes and contributing to the broader field of viral genomics.
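
The population-level SNP comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `call_snps`, the `(start, sequence)` read representation, and the `min_depth` and `min_freq` thresholds are all assumptions made for illustration, and the reads are assumed to be already aligned to the reference without gaps.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_freq=0.2):
    """Report positions where a non-reference base is common among the reads.

    reference     -- reference genome as a string of A/C/G/T
    aligned_reads -- iterable of (start, sequence) pairs; start is the 0-based
                     reference position of the first base (hypothetical,
                     gapless alignment format chosen for this sketch)
    min_depth     -- minimum number of reads covering a position (assumed)
    min_freq      -- minimum fraction of reads carrying the variant (assumed)
    """
    # One base counter per reference position.
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore bases outside the reference and ambiguous calls such as N.
            if 0 <= pos < len(reference) and base in "ACGT":
                counts[pos][base] += 1

    snps = []
    for pos, (ref_base, column) in enumerate(zip(reference, counts)):
        depth = sum(column.values())
        if depth < min_depth:
            continue
        for alt, n in sorted(column.items()):
            if alt != ref_base and n / depth >= min_freq:
                # Positions are reported 1-based.
                snps.append((pos + 1, ref_base, alt, n, depth))
    return snps


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"),
    (2, "GTTCGT"),
    (2, "GTTCGT"),
    (4, "TCGTAC"),
    (3, "TACGTA"),
]
print(call_snps(reference, reads))
```

Because every read contributes to the per-position counts, a sketch like this reports minority variants alongside the dominant one; a real pipeline would also need to handle insertions, deletions, and base quality.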

    Keywords: Virus detection, Next-generation sequencing, Bioinformatics analysis, SNP discovery, Viral genomics, AI-Assisted Genomics

    Received: 07 Sep 2024; Accepted: 28 Oct 2024.

    Copyright: © 2024 Ghorbani, Rostami and Guzzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Abozar Ghorbani, Nuclear Science and Technology Research Institute (NSTRI), Karaj, Iran
    Pietro Hiram Guzzi, Department of Medical and Surgical Science, University of Magna Graecia, Catanzaro, 88100, Calabria, Italy

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.