AUTHOR=Hu Si-Le, Chen Ying-Li, Zhang Lu-Qiang, Bai Hui, Yang Jia-Hong, Li Qian-Zhong TITLE=LncSTPred: a predictive model of lncRNA subcellular localization and decipherment of the biological determinants influencing localization JOURNAL=Frontiers in Molecular Biosciences VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2024.1452142 DOI=10.3389/fmolb.2024.1452142 ISSN=2296-889X ABSTRACT=Introduction

Long non-coding RNAs (lncRNAs) serve as genetic markers and play crucial roles in genome rearrangement, chromatin modification, and other biological processes. Increasing evidence suggests that the functions of lncRNAs are closely related to their subcellular localization. However, lncRNAs are unevenly distributed across subcellular compartments: the number located in the nucleus is more than ten times the number located in the exosome.

Methods

In this study, we propose a new oversampling method to construct a predictive dataset and develop a predictive model called LncSTPred. The model uses an improved AdaBoost algorithm to predict subcellular localization from 3-mer, 3-RF sequence, and minimum free energy structure features.
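
As an illustration of the 3-mer features mentioned above, the following minimal Python sketch computes normalized k-mer frequencies for a single sequence. The function name `kmer_frequencies`, the ACGU alphabet, the T-to-U conversion, and the handling of ambiguous bases are assumptions made for this example; they are not taken from the LncSTPred implementation.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized frequency of every k-mer over `alphabet`.

    Hypothetical helper for illustration only. Windows containing
    characters outside the alphabet (e.g. N) are skipped.
    """
    # Assumption: DNA-style input is converted to RNA letters.
    seq = seq.upper().replace("T", "U")

    # Enumerate all len(alphabet)**k possible k-mers (64 for 3-mers).
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    total = 0
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1

    # Normalize by the number of valid windows so frequencies sum to 1.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


features = kmer_frequencies("ACGUACGU")
```

Each sequence is mapped to a fixed-length vector of 64 values, which is the kind of input a boosted classifier such as AdaBoost can consume directly.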

Results and Discussion

Our improved AdaBoost algorithm achieved better prediction accuracy for lncRNA subcellular localization. In addition, we evaluated feature importance using the F-score and analyzed the influence of highly relevant features on lncRNAs. Our study shows that the ANA features may be a key factor in predicting lncRNA subcellular localization, and that they correlate with the composition of stems and loops in the secondary structure of lncRNAs.
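
The abstract does not give the F-score formula. As a rough sketch, assuming it refers to the widely used Fisher-style discriminative score in its two-class form (between-class separation of the means divided by the sum of within-class sample variances), the score for one feature could be computed as below. The function name `f_score` and the two-class simplification are assumptions for this example; a multi-location setting would need a multi-class generalization.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of a single feature (illustrative sketch).

    `pos` and `neg` are lists holding the feature's values in the two
    classes; each list needs at least two values.
    """
    overall = mean(pos + neg)

    # Between-class term: squared distance of each class mean
    # from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2

    # Within-class term: sum of the sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)

    # A zero within-class variance makes the ratio undefined;
    # this sketch reports it as infinity.
    return between / within if within else float("inf")


score = f_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A larger score means the feature separates the two classes more clearly, so ranking features by this value gives a simple importance ordering.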