AUTHOR=Ren Jinjin , Chen Xiaozhen , Zhang Zhengqian , Shi Haoran , Wu Shuxiang TITLE=DPred_3S: identifying dihydrouridine (D) modification on three species epitranscriptome based on multiple sequence-derived features JOURNAL=Frontiers in Genetics VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1334132 DOI=10.3389/fgene.2023.1334132 ISSN=1664-8021 ABSTRACT=

Introduction: Dihydrouridine (D) is a conserved modification of tRNA among all three life domains. D modification enhances the flexibility of a single nucleotide base in the spatial structure and is disease- and evolution-associated. Recent studies have also suggested the presence of dihydrouridine on mRNA.

Methods: To identify D in epitranscriptome, we provided a prediction framework named “DPred_3S” based on the machine learning approach for three species D epitranscriptome, which used epitranscriptome sequencing data as training data for the first time.

Results: The optimal features were evaluated by the F-score and integration of different features; our model achieved area under the receiver operating characteristic curve (AUROC) scores 0.955, 0.946, and 0.905 for Saccharomyces cerevisiae, Escherichia coli, and Schizosaccharomyces pombe, respectively. The performances of different machine learning algorithms were also compared in this study.

Discussion: The high performances of our model suggest the D sites can be distinguished based on their surrounding sequence, but the lower performance of cross-species prediction may be limited by technique preferences.