AUTHOR=Chen Hui, Zhu Zhu, Su Nan, Wang Jun, Gu Jun, Lu Shu, Zhang Li, Chen Xuesong, Xu Lei, Shao Xiangrong, Yin Jiangtao, Yang Jinghui, Sun Baodi, Li Yongsheng TITLE=Identification and Prediction of Novel Clinical Phenotypes for Intensive Care Patients With SARS-CoV-2 Pneumonia: An Observational Cohort Study JOURNAL=Frontiers in Medicine VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2021.681336 DOI=10.3389/fmed.2021.681336 ISSN=2296-858X ABSTRACT=

Background: Phenotypes with important prognostic and therapeutic implications have been identified within heterogeneous diseases such as acute respiratory distress syndrome and sepsis. The present study sought to determine whether phenotypes can be derived from intensive care patients with coronavirus disease 2019 (COVID-19), to assess their correlation with prognosis, and to develop a parsimonious model for phenotype identification.

Methods: Adult patients with COVID-19 from Tongji Hospital between January 2020 and March 2020 were included. Consensus k-means clustering and latent class analysis (LCA) were applied to identify phenotypes using 26 clinical variables. We then used machine learning algorithms to select up to five important classifier variables, which were in turn used to build nested logistic regression models for phenotype identification.
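As an illustration of the consensus clustering idea named in the Methods, the following minimal sketch repeatedly runs k-means on random subsamples and records how often each pair of observations lands in the same cluster. It is not the study's pipeline: the data are synthetic two-dimensional points, and the function names (`kmeans`, `consensus_matrix`), the 80% subsampling fraction, and the 50 runs are all assumptions chosen for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means with random initial centers; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times pair (i, j) co-clustered
    sampled = [[0] * n for _ in range(n)]   # times pair (i, j) was sampled together
    for _ in range(runs):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Synthetic data (assumption): two well-separated groups of 15 points each.
data_rng = random.Random(1)
group_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(15)]
group_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(15)]
points = group_a + group_b

M = consensus_matrix(points, k=2)
print("within-group consensus (points 0 and 1):", round(M[0][1], 2))
print("cross-group consensus (points 0 and 20):", round(M[0][20], 2))
```

Consensus values near 1 within a group and near 0 across groups suggest a stable partition. In practice the same matrix would be computed for several candidate values of k and the most stable solution retained.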

Results: Both consensus k-means clustering and LCA showed that a two-phenotype model was the best fit for the present cohort (N = 504). A total of 182 patients (36.1%) were classified as the hyperactive phenotype; they exhibited higher 28-day mortality and higher rates of organ dysfunction than those in the hypoactive phenotype. The top five variables used to assign phenotypes were the neutrophil-to-lymphocyte ratio (NLR), the ratio of pulse oxygen saturation to the fractional concentration of oxygen in inspired air (Spo2/Fio2 ratio), lactate dehydrogenase (LDH), tumor necrosis factor α (TNF-α), and urea nitrogen. Among the nested logistic models, the three-variable (NLR, Spo2/Fio2 ratio, and LDH) and four-variable (three-variable plus TNF-α) models were judged to be the best performing, with areas under the curve of 0.95 [95% confidence interval (CI) = 0.94–0.97] and 0.97 (95% CI = 0.96–0.98), respectively.
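To make the notion of nested logistic models and their area under the curve (AUC) concrete, here is a minimal sketch that fits logistic regressions on the first one, two, and three columns of a feature matrix and compares their in-sample AUCs. Everything in it is an assumption for illustration: the data are synthetic, the three columns are generic standardized features rather than the study's NLR, Spo2/Fio2 ratio, and LDH measurements, and the helper names (`auc`, `fit_logistic`, `predict`) and the gradient-descent settings are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns [intercept, *coefficients]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))) - target
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict(w, X):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))) for row in X]

# Synthetic cohort (assumption): 200 rows, 3 standardized features, invented effect sizes.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if 1.5 * a - 1.0 * b + 1.0 * c + rng.gauss(0, 1) > 0 else 0 for a, b, c in X]

# Nested models: each adds one more feature to the previous model.
nested_aucs = []
for n_features in (1, 2, 3):
    X_sub = [row[:n_features] for row in X]
    weights = fit_logistic(X_sub, y)
    nested_aucs.append(auc(y, predict(weights, X_sub)))
    print(f"{n_features}-variable model: in-sample AUC = {nested_aucs[-1]:.3f}")
```

The AUCs here are computed on the training data and will be optimistic. A real evaluation would use held-out data or resampling and report confidence intervals, as the abstract does.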

Conclusion: We identified two phenotypes among intensive care patients with COVID-19, with different host responses and outcomes. The phenotypes can be accurately identified with parsimonious classifier models using three or four variables.