
ORIGINAL RESEARCH article

Front. Genet.
Sec. Computational Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1492226
This article is part of the Research Topic Computational Approaches Integrate Multi-Omics Data for Disease Diagnosis and Treatment

Dinucleotide Composition Representation (DCR)-based deep learning to predict scoliosis-associated Fibrillin-1 genotypes

Provisionally accepted
  • 1 State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Science, Beijing, Beijing Municipality, China
  • 2 College of Basic Medical Sciences, Inner Mongolia Medical University, Hohhot, Inner Mongolia Autonomous Region, China
  • 3 Laboratory of Advanced Biotechnology, Academy of Military Medical Sciences, Beijing, China
  • 4 College of Veterinary Medicine, Shanxi Agricultural University, Jinzhong, Shanxi Province, China

The final, formatted version of the article will be published soon.

    Scoliosis is a pathological deformation of the spine that is predominantly classified as "idiopathic" because its etiology is unknown. However, it has been suggested that scoliosis may have a polygenic background. It is therefore crucial to identify genetic backgrounds related to Adolescent Idiopathic Scoliosis (AIS) before scoliosis onset. The present study was designed to automatically parse, decompose and predict AIS-related variants in the ClinVar database. Possible AIS-related variant records downloaded from ClinVar were parsed for various labels, decomposed into Dinucleotide Composition Representation (DCR) and other traits, screened for high-risk genes with statistical analysis, and then used to train deep learning models that predict high-risk AIS genotypes. The resulting framework covers all technical stages: data parsing, scoliosis genotyping, genome encoding, machine learning (ML) / deep learning (DL), and scoliosis genotype prediction. In total, 58,000 scoliosis-related records were automatically parsed and statistically analyzed to identify high-risk genes and genotypes, such as FBN1, LAMA2 and SPG11. All variant genes were decomposed into DCR and other traits. Unsupervised ML showed marked inter-group separation and intra-group clustering of the DCR of FBN1, LAMA2 or SPG11 across the five variant types (Pathogenic, Pathogeniclikely, Benign, Benignlikely and Uncertain). An FBN1 DCR-based Convolutional Neural Network (CNN), trained on Pathogenic and Benign/Benignlikely variants, performed accurately on validation data and predicted 179 high-risk scoliosis variants. The trained predictor was interpretable, as variant types and variant locations showed similar distributions within the 2D structure units of the predicted 3D structure of FBN1. In summary, scoliosis risk is predictable by deep learning based on genome-decomposed DCR features, and the DCR-based classifier identified additional scoliosis-risk FBN1 variants in the ClinVar database.
DCR-based models may be promising for genotype-to-phenotype prediction in other disease types.
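The parsing and screening step described in the abstract can be illustrated with a simple per-gene tally. The sketch below assumes tab-separated records with `GeneSymbol` and `ClinicalSignificance` columns, as in ClinVar's `variant_summary.txt` (check the column names against the release you download), and maps ClinVar significance strings onto the five variant classes named above. The mapping, the helper names `tally_by_gene` and `rank_genes`, and the ranking by Pathogenic count are hypothetical choices for illustration; the abstract does not state which statistical analysis the authors used, and a raw count is not a statistical test.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical mapping from ClinVar clinical-significance strings to the five
# variant classes named in the abstract; any other string is skipped.
LABELS = {
    "Pathogenic": "Pathogenic",
    "Likely pathogenic": "Pathogeniclikely",
    "Benign": "Benign",
    "Likely benign": "Benignlikely",
    "Uncertain significance": "Uncertain",
}


def tally_by_gene(tsv_text: str) -> dict[str, Counter]:
    """Count variant classes per gene from tab-separated ClinVar-style rows.

    Assumes 'GeneSymbol' and 'ClinicalSignificance' header columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    tallies: dict[str, Counter] = defaultdict(Counter)
    for row in reader:
        label = LABELS.get(row["ClinicalSignificance"].strip())
        if label is None:
            continue  # conflicting, combined or other labels are ignored here
        tallies[row["GeneSymbol"]][label] += 1
    return dict(tallies)


def rank_genes(tallies: dict[str, Counter], top: int = 3) -> list[tuple[str, int]]:
    """Rank genes by the number of records labelled Pathogenic (ties by name)."""
    ranked = sorted(
        ((gene, counts["Pathogenic"]) for gene, counts in tallies.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top]


if __name__ == "__main__":
    # Made-up rows with placeholder gene names, for illustration only.
    toy = (
        "GeneSymbol\tClinicalSignificance\n"
        "GENE_A\tPathogenic\n"
        "GENE_A\tLikely benign\n"
        "GENE_B\tPathogenic\n"
        "GENE_A\tPathogenic\n"
        "GENE_C\tUncertain significance\n"
        "GENE_C\tConflicting classifications of pathogenicity\n"
    )
    print(rank_genes(tally_by_gene(toy)))
```

On the toy rows, `GENE_A` ranks first with two Pathogenic records, and the "Conflicting classifications of pathogenicity" row is dropped because it does not map to any of the five classes. A real screen would add a statistical comparison between genes rather than ranking on raw counts.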
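The abstract does not spell out how a gene sequence is decomposed into DCR. As a minimal sketch, assuming DCR reduces to the frequencies of the 16 overlapping dinucleotides arranged as a 4×4 matrix (a shape a CNN could take as input), the Python below computes that matrix for a sequence and applies a hypothetical single-nucleotide variant so that the reference and variant compositions can be compared. The names `dinucleotide_composition`, `as_labeled` and `apply_snv` are illustrative, not the authors' code, and their actual DCR may use a different layout, windowing or normalization.

```python
from itertools import product

BASES = "ACGT"
# The 16 dinucleotides in row-major order of the 4x4 matrix: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def dinucleotide_composition(seq: str) -> list[list[float]]:
    """Return a 4x4 matrix of overlapping dinucleotide frequencies.

    Assumed DCR layout: row = first base, column = second base (A, C, G, T).
    Pairs containing non-ACGT symbols (e.g. N) are skipped.
    """
    index = {base: i for i, base in enumerate(BASES)}
    seq = seq.upper()
    counts = [[0] * 4 for _ in range(4)]
    total = 0
    for first, second in zip(seq, seq[1:]):
        if first in index and second in index:
            counts[index[first]][index[second]] += 1
            total += 1
    if total == 0:
        return [[0.0] * 4 for _ in range(4)]
    return [[c / total for c in row] for row in counts]


def as_labeled(matrix: list[list[float]]) -> dict[str, float]:
    """Flatten the matrix into {dinucleotide: frequency}."""
    flat = [value for row in matrix for value in row]
    return dict(zip(DINUCLEOTIDES, flat))


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical helper: substitute one base at 0-based position pos."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy sequence, not FBN1
    variant = apply_snv(reference, 2, "G", "A")  # gives "ACATACGT"
    ref_dcr = as_labeled(dinucleotide_composition(reference))
    var_dcr = as_labeled(dinucleotide_composition(variant))
    # Dinucleotides whose frequency changed because of the variant
    delta = {k: var_dcr[k] - ref_dcr[k] for k in DINUCLEOTIDES if var_dcr[k] != ref_dcr[k]}
    print(delta)
```

In this toy case the G→A substitution replaces one CG and one GT with CA and AT, so only those four frequencies change. A single-nucleotide variant alters at most the two dinucleotides that overlap the changed base, which is why the DCR of a variant gene stays close to that of its reference.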

    Keywords: Scoliosis, genotypes, deep learning, FBN1, Genome composition

    Received: 06 Sep 2024; Accepted: 10 Oct 2024.

    Copyright: © 2024 Zhang, Dai, Yin, Kang, Zeng, Jiang, Zhao, Li and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Tao Jiang, State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Science, Beijing, Beijing Municipality, China
    Guang-Yu Zhao, State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Science, Beijing, Beijing Municipality, China
    Xiao-He Li, College of Basic Medical Sciences, Inner Mongolia Medical University, Hohhot, Inner Mongolia Autonomous Region, China
    Jing Li, State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Science, Beijing, Beijing Municipality, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.