AUTHOR=Tian Fang, Lin Yongchun, Wang Liangjiao, Fang Fei, Hou Kaiwen TITLE=Construction of a risk screening and visualization system for pulmonary nodule in physical examination population based on feature self-recognition machine learning model JOURNAL=Frontiers in Medicine VOLUME=11 YEAR=2025 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1424750 DOI=10.3389/fmed.2024.1424750 ISSN=2296-858X ABSTRACT=Objective

To assess the effectiveness of a feature self-recognition machine learning model in screening for pulmonary nodule risk in a physical examination population and to evaluate the constructed visualization system.

Methods

We analyzed data from 4,861 individuals who underwent chest CT during physical examinations at the Western Theater General Hospital of the People’s Liberation Army between January 2023 and November 2023. Of these, 1,168 had CT reports positive for pulmonary nodules and 3,693 had negative findings. We built a machine learning model with the XGBoost algorithm and used an improved sooty tern optimization algorithm (ISTOA) for feature selection. The significance of the selected features was assessed by univariate analysis and stepwise multivariable logistic regression. A visualization system was created to estimate the risk of developing pulmonary nodules.
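
The feature selection step described above follows the general pattern of wrapper-based selection: a search procedure proposes subsets of features, and each subset is scored by how well a classifier performs with it. The following is a minimal sketch of that pattern only, not the authors' implementation. It assumes synthetic data, substitutes a simple bit-flip hill climb for ISTOA, and substitutes a nearest-centroid classifier for XGBoost so that it runs with the Python standard library alone. All names (`make_rows`, `fitness`, `select_features`, `INFORMATIVE`) and the penalty value are hypothetical.

```python
import random

N_FEATURES = 8
INFORMATIVE = {0, 1, 2}  # hypothetical: only these columns carry signal


def make_rows(n_rows, rng):
    """Synthetic cohort: every fourth row is positive, near the 24% rate in the text."""
    data = []
    for i in range(n_rows):
        y = 1 if i % 4 == 0 else 0
        x = [rng.gauss(1.0 if (y and j in INFORMATIVE) else 0.0, 1.0)
             for j in range(N_FEATURES)]
        data.append((x, y))
    return data


def centroid(rows, cols):
    """Mean of each selected column over the given feature vectors."""
    return [sum(x[c] for x in rows) / len(rows) for c in cols]


def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy of a nearest-centroid classifier restricted to the
    masked columns, minus a small penalty per selected feature."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    pos = centroid([x for x, y in train if y == 1], cols)
    neg = centroid([x for x, y in train if y == 0], cols)
    correct = 0
    for x, y in valid:
        d_pos = sum((x[c] - p) ** 2 for c, p in zip(cols, pos))
        d_neg = sum((x[c] - n) ** 2 for c, n in zip(cols, neg))
        correct += int((d_pos < d_neg) == (y == 1))
    return correct / len(valid) - penalty * len(cols)


def select_features(train, valid, iters=200, seed=0):
    """Bit-flip hill climb over feature masks (a stand-in for ISTOA)."""
    rng = random.Random(seed)
    best = [True] * N_FEATURES
    best_fit = fitness(best, train, valid)
    for _ in range(iters):
        cand = best[:]
        j = rng.randrange(N_FEATURES)
        cand[j] = not cand[j]  # toggle one feature in or out
        cand_fit = fitness(cand, train, valid)
        if cand_fit > best_fit:  # keep only strict improvements
            best, best_fit = cand, cand_fit
    return best, best_fit


rng = random.Random(0)
train, valid = make_rows(400, rng), make_rows(200, rng)
baseline = fitness([True] * N_FEATURES, train, valid)
mask, score = select_features(train, valid)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
print("baseline fitness:", round(baseline, 3), "selected fitness:", round(score, 3))
```

In a real pipeline the fitness function would more plausibly be a cross-validated metric of the actual classifier, and the search would be the metaheuristic itself; both choices here are simplifications made to keep the sketch self-contained.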

Results

Multivariable analysis identified older age, smoking or passive smoking, high psychological stress within the past year, occupational exposure (e.g., air pollution at the workplace), presence of chronic lung diseases, and elevated carcinoembryonic antigen levels as significant risk factors for pulmonary nodules. The feature self-recognition machine learning model further highlighted age, smoking or passive smoking, high psychological stress, occupational exposure, chronic lung diseases, family history of lung cancer, decreased albumin levels, and elevated carcinoembryonic antigen as key predictors of early pulmonary nodule risk, and it demonstrated superior performance.

Conclusion

The feature self-recognition machine learning model effectively aids in the early prediction and clinical identification of pulmonary nodule risk, facilitating timely intervention and improving patient prognosis.