AUTHOR=Lv Xiang, Luo Jiesi, Huang Wei, Guo Hui, Bai Xue, Yan Pijun, Jiang Zongzhe, Zhang Yonglin, Jing Runyu, Chen Qi, Li Menglong
TITLE=Identifying diagnostic indicators for type 2 diabetes mellitus from physical examination using interpretable machine learning approach
JOURNAL=Frontiers in Endocrinology
VOLUME=15
YEAR=2024
URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2024.1376220
DOI=10.3389/fendo.2024.1376220
ISSN=1664-2392
ABSTRACT=
Background

Identification of patients at risk for type 2 diabetes mellitus (T2DM) can not only prevent complications and reduce suffering but also ease the health care burden. While routine physical examination can provide useful information for diagnosis, manual exploration of routine physical examination records is not feasible due to the high prevalence of T2DM.

Objectives

We aim to build interpretable machine learning models for T2DM diagnosis and uncover important diagnostic indicators from physical examination, including age- and sex-related indicators.

Methods

In this study, we present three weighted diversity density (WDD)-based algorithms for T2DM screening that use physical examination indicators. The algorithms are highly transparent and interpretable, and two of them tolerate missing values.

Patients

We collected data on 43 physical examination indicators from 11,071 T2DM patients and 126,622 healthy controls at the Affiliated Hospital of Southwest Medical University. After data processing, we used a data matrix containing 16,004 electronic health records (EHRs) and 43 clinical indicators for modelling.

Results

The indicators were ranked according to their model weights, and the top 25% of indicators were found to be directly or indirectly related to T2DM. We further investigated the clinical characteristics of different age and sex groups, and found that the algorithms can detect relevant indicators specific to these groups. The algorithms performed well in T2DM screening, with the highest area under the receiver operating characteristic curve (AUC) reaching 0.9185.
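
For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from case/control labels and model scores. It uses the standard rank-based definition (the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half). The function name `auc_score` and the toy labels and scores are illustrative assumptions; they are not taken from the study and do not reproduce the WDD-based algorithms.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (case) or 0 (control)
    scores: iterable of model scores, higher meaning more likely a case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the paper.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)  # 7 of 9 case/control pairs are ordered correctly
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; for datasets of the size described here, a sort-based implementation would be the more practical choice.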

Conclusion

This work utilized the interpretable WDD-based algorithms to construct T2DM diagnostic models based on physical examination indicators. By modeling data grouped by age and sex, we identified several predictive markers related to age and sex, uncovering characteristic differences among various groups of T2DM patients.