AUTHOR=Hernandez Bernard, Stiff Oliver, Ming Damien K., Ho Quang Chanh, Nguyen Lam Vuong, Nguyen Minh Tuan, Nguyen Van Vinh Chau, Nguyen Minh Nguyet, Nguyen Quang Huy, Phung Khanh Lam, Dong Thi Hoai Tam, Dinh The Trung, Huynh Trung Trieu, Wills Bridget, Simmons Cameron P., Holmes Alison H., Yacoub Sophie, Georgiou Pantelis, on behalf of the Vietnam ICU Translational Applications Laboratory (VITAL) investigators

TITLE=Learning meaningful latent space representations for patient risk stratification: Model development and validation for dengue and other acute febrile illness

JOURNAL=Frontiers in Digital Health

VOLUME=Volume 5 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2023.1057467

DOI=10.3389/fdgth.2023.1057467

ISSN=2673-253X

ABSTRACT=Increased data availability has prompted the creation of clinical decision support systems. These systems utilise clinical information to enhance health care provision, either to predict the likelihood of specific clinical outcomes or to evaluate the risk of further complications. However, their adoption remains low due to concerns regarding the quality of recommendations, and a lack of clarity on how results are best obtained and presented. We used autoencoders capable of reducing the dimensionality of complex datasets in order to produce a 2D representation, denoted as latent space, to support understanding of complex clinical data. In this output, meaningful representations of individual patient profiles are spatially mapped in an unsupervised manner according to their input clinical parameters. This technique was then applied to a large real-world clinical dataset of over 12,000 patients with an illness compatible with dengue infection in Ho Chi Minh City, Vietnam between 1999 and 2021. Dengue is a systemic viral disease which exerts a significant health and economic burden worldwide, and up to 5% of hospitalised patients develop life-threatening complications. The latent space produced by the selected autoencoder aligns with established clinical characteristics exhibited by patients with dengue infection, as well as features of disease progression. Similar clinical phenotypes are represented close to each other in the latent space and clustered according to outcomes broadly described by the World Health Organisation dengue guidelines. Balancing distance metrics and density metrics produced results covering most of the latent space, and improved visualisation whilst preserving utility, with similar patients grouped closer together.
In this case, this balance is achieved by using the sigmoid activation function and one hidden layer with three neurons, in addition to the latent dimension layer, which produces the output (Pearson, 0.840; Spearman, 0.830; Procrustes, 0.301; GMM, 0.321). This study demonstrates that, when adequately configured, autoencoders can produce two-dimensional representations of a complex dataset that conserve the distance relationship between points. The output visualisation groups patients with clinically relevant features closely together and inherently supports user interpretability. Work is underway to incorporate these findings into an electronic clinical decision support system to guide individual patient management.
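
The abstract does not define how the Pearson and Spearman figures are computed. One plausible reading, taken here as an assumption, is that they correlate pairwise distances between patients in the input space with pairwise distances between the same patients in the 2D latent space. The sketch below illustrates that reading in plain Python. The helper names (`pairwise_distances`, `pearson`, `ranks`, `spearman`, `distance_preservation`) are hypothetical, and the data are toy values, not the study's dataset.

```python
from itertools import combinations
from math import dist, sqrt


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [dist(a, b) for a, b in combinations(points, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def distance_preservation(original, latent):
    """Correlate input-space distances with latent-space distances (assumed metric)."""
    d_original = pairwise_distances(original)
    d_latent = pairwise_distances(latent)
    return pearson(d_original, d_latent), spearman(d_original, d_latent)


# Toy data (not from the study): four 3-feature profiles and a 2D embedding
# that is a uniform scaling of their first two features.
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
latent = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 6.0)]
p, s = distance_preservation(original, latent)
print(f"Pearson={p:.3f} Spearman={s:.3f}")
```

Under this reading, values close to 1 mean that patients who are far apart in the input space remain far apart in the latent space. The toy embedding preserves every distance up to a constant factor, so both coefficients equal 1 up to floating-point error. The Procrustes and GMM figures are not covered by this sketch.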