AUTHOR=Schlieben Lea D., Prokisch Holger, Yépez Vicente A.

TITLE=How Machine Learning and Statistical Models Advance Molecular Diagnostics of Rare Disorders Via Analysis of RNA Sequencing Data

JOURNAL=Frontiers in Molecular Biosciences

VOLUME=Volume 8 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2021.647277

DOI=10.3389/fmolb.2021.647277

ISSN=2296-889X

ABSTRACT=Rare diseases, though individually rare, collectively affect approximately 350 million people worldwide. Currently, nearly 6000 distinct rare disorders with a known molecular basis have been described, yet establishing a specific diagnosis based on the clinical phenotype is challenging. Increasing integration of whole exome sequencing into routine diagnostics of rare diseases is improving diagnostic rates. Nevertheless, about half of patients do not receive a genetic diagnosis due to challenges of variant detection and interpretation. 
In recent years, RNA sequencing has increasingly been used as a complementary diagnostic tool that provides functional data. Initially, arbitrary thresholds were applied to call aberrant expression, aberrant splicing, and mono-allelic expression. As RNA sequencing came to be used in the search for molecular diagnoses, robust statistical models applied to normalized read counts made it possible to detect significant outliers with correction for multiple testing. More recently, machine learning methods have been developed to improve the normalization of RNA sequencing read count data by taking confounders into account. Together, these methods have increased the power and sensitivity of detecting and interpreting pathogenic variants, leading to diagnostic rates of 10–35% in rare diseases. In this review, we provide an overview of the methods used to analyze RNA sequencing data and illustrate how they can improve the diagnostic yield for rare diseases.