AUTHOR=Wong Wilson K. M., Thorat Vinod, Joglekar Mugdha V., Dong Charlotte X., Lee Hugo, Chew Yi Vee, Bhave Adwait, Hawthorne Wayne J., Engin Feyza, Pant Aniruddha, Dalgaard Louise T., Bapat Sharda, Hardikar Anandwardhan A. TITLE=Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells JOURNAL=Frontiers in Endocrinology VOLUME=13 YEAR=2022 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2022.853863 DOI=10.3389/fendo.2022.853863 ISSN=1664-2392 ABSTRACT=
Machine learning (ML) workflows enable unprejudiced and robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). The prediction accuracy and sensitivity of each ML workflow were tested in a separate validation dataset (N=2,913). Ensemble ML workflows, in particular the Random Forest algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared with the other algorithms. The transcripts identified through these analyses also correlated significantly with insulin in bulk RNA-seq data from human islets. The top-10 features, (including