AUTHOR=Premkumar Keshav Aditya R. , Bharanikumar Ramit , Palaniappan Ashok
TITLE=Riboflow: Using Deep Learning to Classify Riboswitches With ∼99% Accuracy
JOURNAL=Frontiers in Bioengineering and Biotechnology
VOLUME=8
YEAR=2020
URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2020.00808
DOI=10.3389/fbioe.2020.00808
ISSN=2296-4185
ABSTRACT=
Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Specificity to cognate ligand and diversity of such ligands have expanded the functional repertoire of riboswitches to mediate mounting apt responses to sudden metabolic demands and signal changes in environmental conditions. Given their critical role in microbial life, riboswitch characterisation remains a challenging computational problem. Here we have addressed the issue with advanced deep learning frameworks, namely convolutional neural networks (CNN), and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrated the accurate performance of both the deep learning models (CNN and RNN) relative to conventional hyperparameter-optimized machine learning classifiers on all key performance metrics, including the ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing learning method for identifying the ligand-specificity of riboswitches with an accuracy >0.99 and macro-averaged F-score of 0.96. An additional attraction is that the deep learning models do not require prior feature engineering. A dynamic update functionality is built into the models to factor for the constant discovery of new riboswitches, and extend the predictive modeling to new classes. Our work would enable the design of genetic circuits with custom-tuned riboswitch aptamers that would effect precise translational control in synthetic biology. The associated software is available as an open-source Python package and standalone resource for use in genome annotation, synthetic biology, and biotechnology workflows.