AUTHOR=Yamga Eric, Mullie Louis, Durand Madeleine, Cadrin-Chenevert Alexandre, Tang An, Montagnon Emmanuel, Chartrand-Lefebvre Carl, Chassé Michaël

TITLE=Interpretable clinical phenotypes among patients hospitalized with COVID-19 using cluster analysis

JOURNAL=Frontiers in Digital Health

VOLUME=Volume 5 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2023.1142822

DOI=10.3389/fdgth.2023.1142822

ISSN=2673-253X

ABSTRACT=Background: Multiple clinical phenotypes have been proposed for coronavirus disease (COVID-19), but few have used multimodal data. Using clinical and imaging data, we aimed to identify distinct clinical phenotypes in patients admitted with COVID-19 and to assess their clinical outcomes. Our secondary objective was to demonstrate the clinical applicability of this method by developing an interpretable model for phenotype assignment.

Methods: We analyzed data from 547 patients hospitalized with COVID-19 at a Canadian academic hospital. We processed the data by applying a factor analysis of mixed data (FAMD) and compared four clustering algorithms: k-means, partitioning around medoids (PAM), and divisive and agglomerative hierarchical clustering. We used imaging data and 34 clinical variables collected within the first 24 hours of admission to train our algorithm. We conducted a survival analysis to compare the clinical outcomes across phenotypes. With the data split into training and validation sets (75/25 ratio), we developed a decision-tree-based model to facilitate the interpretation and assignment of the observed phenotypes.
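As a rough illustration of the agglomerative hierarchical clustering step named above, the following minimal sketch merges the two closest clusters repeatedly until three remain. It is not the authors' implementation: the FAMD preprocessing is omitted, the data are synthetic 2-D points rather than patient variables, the function names `euclidean` and `agglomerative` are hypothetical, and average linkage is an assumption, since the abstract does not state which linkage was used.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with every point in its own cluster and repeatedly merges the
    pair of clusters with the smallest mean pairwise distance.
    Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Synthetic data: three well-separated groups of 10 points each (not patient data).
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(10)
]

labels = agglomerative(points, n_clusters=3)
sizes = [labels.count(k) for k in sorted(set(labels))]
print(sizes)
```

On real mixed clinical data, the points passed to the clustering step would be the coordinates produced by the dimensionality reduction, and an established library routine would normally replace this brute-force loop.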

Results: Agglomerative hierarchical clustering was the most robust algorithm. We identified three clinical phenotypes: 79 patients (14%) in Cluster 1, 275 patients (50%) in Cluster 2, and 203 patients (37%) in Cluster 3. Cluster 2 and Cluster 3 were both characterized by a low-risk respiratory and inflammatory profile but differed in terms of demographics. Compared with Cluster 3, Cluster 2 comprised older patients with more comorbidities. Cluster 1 represented the group with the most severe clinical presentation, as inferred by the highest rate of hypoxemia and the highest radiological burden. Intensive care unit (ICU) admission and mechanical ventilation risks were highest in Cluster 1. Using only two to four decision rules, the classification and regression tree (CART) phenotype assignment model achieved an AUC of 84% (95% CI, 81.5-86.5%) on the validation set.

Conclusions: We conducted a multidimensional phenotypic analysis of adult inpatients with COVID-19 and identified three distinct phenotypes associated with different clinical outcomes. We also demonstrated the clinical usability of this approach, as phenotypes can be accurately assigned using a simple decision tree. Further research is needed to incorporate these phenotypes into the management of patients with COVID-19.