Construction of a risk screening and visualization system for pulmonary nodule in physical examination population based on feature self-recognition machine learning model

Tian, Fang; Lin, Yongchun; Wang, Liangjiao; Fang, Fei; Hou, Kaiwen

doi:10.3389/fmed.2024.1424750

ORIGINAL RESEARCH article

Front. Med., 04 March 2025

Sec. Pulmonary Medicine

Volume 11 - 2024 | https://doi.org/10.3389/fmed.2024.1424750

This article is part of the Research Topic: Advancements in Multimodal Data Analysis for Lung Tumor Diagnosis

Fang Tian¹

Yongchun Lin¹

Liangjiao Wang¹

Fei Fang²

Kaiwen Hou¹*

¹Department of Outpatient, Western Theater Command General Hospital of PLA, Chengdu, Sichuan, China
²Department of Emergency, Tibet Command General Hospital of PLA, Lhasa, China

Objective: To assess the effectiveness of a feature self-recognition machine learning model in screening for pulmonary nodule risk in a physical examination population and to evaluate the constructed visualization system.

Methods: We analyzed data from 4,861 individuals who underwent chest CT exams during their physical examinations at the Western Theater General Hospital of the People’s Liberation Army from January 2023 to November 2023. Among them, 1,168 had positive CT reports for pulmonary nodules, while 3,693 had negative findings. We developed a machine learning model using the XGBoost algorithm and employed an improved sooty tern optimization algorithm (ISTOA) for feature selection. The significance of the selected features was evaluated through univariate analysis and multivariable logistic stepwise regression analysis. A visualization system was created to estimate the risk of developing pulmonary nodules.

Results: Multivariable analysis identified older age, smoking or passive smoking, high psychological stress within the past year, occupational exposure (e.g., air pollution at the workplace), presence of chronic lung diseases, and elevated carcinoembryonic antigen levels as significant risk factors for pulmonary nodules. The feature self-recognition machine learning model further highlighted age, smoking or passive smoking, high psychological stress, occupational exposure, chronic lung diseases, family history of lung cancer, decreased albumin levels, and elevated carcinoembryonic antigen as key predictors for early pulmonary nodule risk, demonstrating superior performance.

Conclusion: The feature self-recognition machine learning model effectively aids in the early prediction and clinical identification of pulmonary nodule risk, facilitating timely intervention and improving patient prognosis.

1 Introduction

The widespread implementation of lung cancer screening programs has markedly increased the detection rates of pulmonary nodules. These nodules, characterized as focal, round-shaped, solid or subsolid lung opacities not exceeding 3 cm in diameter on imaging, can evolve into malignant tumors if not diagnosed and managed promptly. This progression significantly deteriorates the quality of life for affected individuals (1, 2). Lung cancer remains the most prevalent and deadliest of all malignant tumors, with most patients presenting at advanced stages, resulting in low five-year survival rates and poor prognoses (3, 4). Consequently, the effective management of pulmonary nodules is crucial in the prevention and control of lung tumors.

The clinical manifestations of pulmonary nodules are non-specific, complicating the diagnostic process and increasing the likelihood of misdiagnosis. Traditionally, the assessment of these nodules for benign or malignant characteristics involves the analysis of chest CT images or the employment of invasive techniques such as surgery or biopsy to obtain a definitive lesion characterization (5, 6). However, recent advancements in artificial intelligence (AI) have facilitated the extraction of feature information and the development of predictive models. These innovations are proving instrumental in aiding physicians to diagnose suspicious pulmonary nodules non-invasively. Such technological progress not only enhances the potential for early disease detection and prognosis but also significantly improves the diagnostic accuracy of pulmonary nodules (7, 8). For instance, studies have demonstrated that deep learning models are capable of learning subtle image features from complex imaging data, features that are often elusive to traditional methods (2, 9). These advancements have not only improved the accuracy in distinguishing benign from malignant pulmonary nodules but have also shortened the diagnostic process, providing quicker decision support for patient treatment (10, 11). Furthermore, recent research has explored how improvements in algorithms and model structures can enhance the generalizability and interpretability of diagnostic systems, making their application in clinical practice more widespread and effective (12). These findings not only confirm the potential of artificial intelligence technology in non-invasive diagnostics but also highlight future research directions, specifically how to better integrate these advanced technologies into routine clinical diagnostic processes to improve early disease detection and treatment outcomes (13). 
This study aims to evaluate the value of pulmonary nodule risk screening in a physical examination population and the performance of a visualization system built on a feature self-recognition machine learning model.

2 Methods and materials

2.1 Study population

A total of 4,861 individuals who underwent chest CT examinations as part of their physical examinations at the Western Theater General Hospital of the People’s Liberation Army from January 2023 to November 2023 were included in this study. Access to the study data began on January 1, 2024. Among them, 1,168 individuals had positive CT reports for pulmonary nodules, while 3,693 had negative findings. Inclusion criteria were as follows: (1) normal mental status, clear cognition, and ability to cooperate with inquiries; (2) complete clinical data. Exclusion criteria were as follows: (1) presence of severe diseases such as cardiovascular, liver, or kidney disorders; (2) history of previous tumors; (3) pregnant or breastfeeding women.

This retrospective study was approved by the Ethics Committee of the Western Theater General Hospital of the People’s Liberation Army (Approval No.: 2022ky105-3), and the requirement for informed consent was waived. All data were anonymized.

2.2 Data collection

General information and laboratory test results of the study participants were obtained from the Health Management System of the Western Theater General Hospital of the People’s Liberation Army, comprising a total of 33 features. The general information included gender, age, smoking history, alcohol consumption history, place of residence, education level, chronic lung diseases, family history of lung cancer, regular exercise, body mass index, presence of high psychological stress within the past year, and presence of depressive symptoms within the past year. The laboratory test results included carcinoembryonic antigen, thyroid-stimulating hormone, white blood cell count, lymphocyte count, platelet count, hemoglobin, eosinophil count, basophil count, albumin, globulin, albumin-globulin ratio, alanine aminotransferase, aspartate aminotransferase, indirect bilirubin, high-density lipoprotein, low-density lipoprotein, triglycerides, fasting blood glucose, creatinine, blood urea nitrogen, and uric acid.

2.3 Feature self-recognition machine learning model

In this study, we proposed a feature self-recognition machine learning model (FSRML) that does not require preliminary feature selection before running. All 33 features studied were included in the model training. The FSRML utilizes the powerful global optimization capability of swarm intelligence algorithms to automatically perform feature selection. The schematic diagram of the model is shown in Figure 1.

Figure 1

Figure 1. Schematic diagram of the FSRML model. First, the swarm intelligence algorithm initializes its population with chaotic random sequences. Each chaotic sequence is converted to a binary encoding that filters the features (0 represents feature deletion, 1 represents feature retention). The retained features are then fed into the machine learning model, and the predictive accuracy is calculated as the objective function. As the population of the swarm intelligence algorithm iterates, the feature selection and model construction results are gradually optimized.

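The encode, filter, and score loop described for Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the 0.5 binarization threshold, the function names, the toy data, and the nearest-centroid classifier that stands in for XGBoost are all hypothetical choices made to keep the example self-contained.

```python
def sequence_to_mask(sequence, threshold=0.5):
    """Binary-encode a sequence in (0, 1): 1 retains the feature, 0 deletes it.
    The 0.5 threshold is an assumption; the paper does not state one."""
    return [1 if value > threshold else 0 for value in sequence]


def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[x for x, keep in zip(row, mask) if keep] for row in rows]


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in classifier (NOT XGBoost): predict the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        members = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def squared_distance(row, label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))

    correct = sum(
        1
        for row, y in zip(test_X, test_y)
        if min(centroids, key=lambda label: squared_distance(row, label)) == y
    )
    return correct / len(test_y)


def fitness(mask, train_X, train_y, test_X, test_y):
    """Objective function: predictive accuracy using only the retained features."""
    if not any(mask):
        return 0.0  # an empty feature set cannot be scored
    return nearest_centroid_accuracy(
        apply_mask(train_X, mask), train_y, apply_mask(test_X, mask), test_y
    )


# Hypothetical toy data: column 0 is informative, column 1 is noise.
TRAIN_X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [0.9, 0.0]]
TRAIN_Y = [0, 0, 1, 1]
TEST_X = [[0.1, 0.5], [0.8, 4.5]]
TEST_Y = [0, 1]

for candidate in ([0.9, 0.1], [0.9, 0.8]):
    mask = sequence_to_mask(candidate)
    print(mask, fitness(mask, TRAIN_X, TRAIN_Y, TEST_X, TEST_Y))
```

In this toy setting the mask that drops the noisy column scores higher than the mask that keeps both columns, which is the signal a swarm optimizer would use to steer its population toward smaller, more predictive feature subsets.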

To better guide the feature optimization task mentioned above, this study improves upon the sooty tern optimization algorithm (STOA) (14) by incorporating three enhancement strategies: Bernoulli chaotic mapping (15), Cauchy mutation perturbation (16), and longitudinal-lateral crossover mutation (17). These improvements lead to the development of a hybrid chaotic sooty tern optimization algorithm that combines longitudinal-lateral crossover and Cauchy mutation. It is referred to as the improved sooty tern optimization algorithm (ISTOA). The specific improvement strategies are as follows:

(1) Bernoulli chaotic mapping

Swarm intelligence optimization algorithms generally generate populations through randomization. However, when the population size is small, the populations generated by random arrays may lack sufficient ergodicity, potentially causing the optimization results to fall into local optima. Chaotic sequences, characterized by strong ergodicity, unpredictability, and sensitivity to initial values, are better suited for the task of initializing populations. Bernoulli mapping is a typical example of chaotic mapping, and its expression is as follows:

x_{n+1} = \begin{cases} \dfrac{x_{n}}{1 - \lambda}, & 0 < x_{n} < 1 - \lambda \\ \dfrac{x_{n} - (1 - \lambda)}{\lambda}, & 1 - \lambda < x_{n} < 1 \end{cases}

In the above expression, we set the parameter λ to 0.4. First, a random number x₀ between 0 and 1 is generated. Then, the chaotic sequence is produced by applying the formula iteratively.
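As a rough illustration of this initialization step, the following Python sketch iterates the Bernoulli map with λ = 0.4 from a random starting value. The function names and the `seed` argument are hypothetical, and assigning the boundary point x = 1 − λ to the first branch is an assumption, since the expression above leaves that point undefined.

```python
import random


def bernoulli_map(x, lam=0.4):
    """Apply one step of the Bernoulli map with parameter lam."""
    # Assumption: the boundary x == 1 - lam is handled by the first branch.
    if x <= 1.0 - lam:
        return x / (1.0 - lam)
    return (x - (1.0 - lam)) / lam


def chaotic_sequence(length, lam=0.4, seed=None):
    """Generate `length` chaotic values starting from a random x0 in [0, 1)."""
    rng = random.Random(seed)
    x = rng.random()
    sequence = []
    for _ in range(length):
        x = bernoulli_map(x, lam)
        sequence.append(x)
    return sequence


# One value per candidate feature (the study uses 33 features).
individual = chaotic_sequence(33, lam=0.4, seed=2024)
print(len(individual), min(individual), max(individual))
```

Each value stays within the unit interval, so a sequence of length 33 can be thresholded into a keep-or-drop decision for each of the 33 candidate features.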

(2) Cauchy mutation disturbance

Based on the original STOA, we set a certain probability to perform a position update using Cauchy mutation disturbance. This enhances the algorithm’s ability to escape local optima. The formula for the Cauchy probability density function is as follows:

f(x; x_{0}, \lambda) = \frac{1}{\pi \lambda \left[ 1 + \left( \frac{x - x_{0}}{\lambda} \right)^{2} \right]} = \frac{1}{\pi} \left[ \frac{\lambda}{(x - x_{0})^{2} + \lambda^{2}} \right]

Incorporating this into the STOA position update formula, we have:

X_{\text{newbest}} = X_{\text{best}} + X_{\text{best}} \times \mathrm{Cauchy}(0, 1)

where Cauchy(0, 1) denotes a random number drawn from the standard Cauchy distribution (location x₀ = 0, scale λ = 1), X_newbest is the position after mutation, and X_best is the best position before mutation.
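The Cauchy mutation step can be sketched as follows. This is a hedged illustration rather than the authors' code: the mutation probability of 0.5, the independent draw per dimension, the greedy acceptance rule, and the assumption that the objective is minimized (as with benchmark functions) are all choices made here because the text does not specify them.

```python
import math
import random


def cauchy_pdf(x, x0=0.0, lam=1.0):
    """Cauchy probability density with location x0 and scale lam."""
    return lam / (math.pi * ((x - x0) ** 2 + lam ** 2))


def standard_cauchy(rng):
    """Draw from Cauchy(0, 1) using the inverse CDF: tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_mutate(best_position, rng):
    """X_newbest = X_best + X_best * Cauchy(0, 1), drawn per dimension (assumption)."""
    return [x + x * standard_cauchy(rng) for x in best_position]


def mutate_if_better(best_position, objective, rng, probability=0.5):
    """With the given probability, try a Cauchy mutation and keep it only if it
    lowers the objective (greedy acceptance and minimization are assumptions)."""
    if rng.random() >= probability:
        return best_position
    candidate = cauchy_mutate(best_position, rng)
    return candidate if objective(candidate) < objective(best_position) else best_position


def sphere(position):
    """Simple test objective: sum of squares."""
    return sum(x * x for x in position)


rng = random.Random(0)
best = [1.5, -2.0, 0.5]
for _ in range(20):
    best = mutate_if_better(best, sphere, rng)
print(best, sphere(best))
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is what gives the perturbation a chance to move the best solution out of a local optimum.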

2.4 Software system development

A visualization prediction system was built on the constructed predictive model. The system was developed in MATLAB R2022a using App Designer, which produced an initial *.mlapp file. The *.mlapp file was then compiled into a standalone *.exe executable that does not require a full MATLAB installation. The software runs on any computer with MATLAB Runtime installed, which reduces its runtime environment requirements and improves its portability.

2.5 Statistical analysis

The predictive model construction was performed using MATLAB 2022a, and data analysis was conducted using SPSS 26.0 software. A significance level of p < 0.05 was used to indicate statistically significant differences. Count data were presented as [n (%)] and compared using the chi-square test, while normally distributed continuous data were expressed as (mean ± standard deviation) and compared using the t-test.

3 Results

3.1 Performance testing of ISTOA optimization

To comprehensively evaluate performance and efficiency, this study assessed each algorithm on 23 standard benchmark functions, which are listed in Supplementary File 1. The results showed that ISTOA converged significantly faster and had superior global optimization capability compared with the other algorithms (Figure 2a). ISTOA therefore showed clear advantages in optimization, making it suitable as the guiding algorithm for automatic feature selection in this study.

3.2 Lung nodule prediction model construction

3.2.1 Model construction overview

We randomly selected 80% of the dataset as the training set (3,889 cases) and chose several base models: logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), backpropagation neural network (BP), support vector machine (SVM), random forest (RF), and XGBoost. Each model was integrated with ISTOA for automatic feature selection to construct an FSRML model. Five-fold cross-validation was performed for all models. The five-fold cross-validation results showed that XGBoost had clear advantages (Table 1; Figures 2b,c).
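The partitioning described here can be reproduced with a short Python sketch. The rounding rule, the fixed seed, and the interleaved fold assignment are assumptions chosen for illustration; with 4,861 cases, an 80% split rounded to the nearest integer gives the 3,889 training cases reported above and leaves 972 for testing.

```python
import random

TOTAL_CASES = 4861  # cohort size reported in the study


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle case indices and split them into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


def k_folds(indices, k=5):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = split_indices(TOTAL_CASES)
print(len(train_idx), len(test_idx))
for fold_number, (tr, va) in enumerate(k_folds(train_idx), start=1):
    print(fold_number, len(tr), len(va))
```

Every training case appears in exactly one validation fold, so each candidate feature subset is scored on data that the corresponding fold model did not see.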

Table 1

Table 1. Cross-validation results of FSRML model training (validation set).

Figure 2

Figure 2. Performance analysis of ISTOA on 23 standard benchmark functions. The 3D surface plots in the figure depict the two-dimensional search space for each benchmark function. The convergence curves illustrate the convergence trends of the first solution dimension for each benchmark function, with a comparison between STOA (blue curve) and the improved ISTOA (red curve).

3.2.2 Machine learning model performance testing

After model construction, the remaining 20% of the samples (972 cases) were used as the test set to evaluate the predictive performance of each model on data not seen during training. The results showed that using XGBoost as the base model for FSRML yielded significant improvements in predictive performance (Table 2; Figures 3, 4).
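For readers less familiar with how such test-set comparisons are computed, the sketch below derives common classification metrics and the area under the ROC curve from predicted risk scores. The labels and scores are hypothetical toy values, not study data, and the 0.5 decision threshold and the choice of metrics are assumptions, since the specific metrics are reported in the tables rather than in the text.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is nodule-positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and precision at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy labels and predicted risks, for illustration only.
Y_TRUE = [1, 1, 0, 0, 0, 1]
SCORES = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8]

print(classification_metrics(Y_TRUE, SCORES))
print(roc_auc(Y_TRUE, SCORES))
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and avoids choosing a threshold, which is why it is a convenient single number for comparing base models.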

Table 2

Table 2. FSRML model performance comparison results (test set).

Figure 3

Figure 3. ROC curve of validation set.

Figure 4

Figure 4. PR curves of validation set.

3.2.3 Compared with other automatic machine learning methods

The FSRML we developed was compared with other AutoML models, and the results showed that the FSRML model constructed in this study had the best prediction performance on the test set (Table 3; Figures 5, 6).

Table 3

Table 3. Comparison of FSRML and other AutoML prediction performance.

Figure 5

Figure 5. ROC curve of test set.

Figure 6

Figure 6. PR curve of test set.

3.3 Feature validation through model-automated selection

The features selected automatically by the FSRML model were age, smoking or frequent passive smoking, significant psychological stress in the past year, occupational exposure (presence of air pollution in the work environment), presence of chronic lung disease, family history of lung cancer, decreased levels of albumin, and elevated levels of carcinoembryonic antigen. The value of these selected features was assessed through univariate analysis and multivariate logistic stepwise regression analysis.

3.3.1 Univariate analysis for feature selection

The univariate analysis showed that, compared with patients with negative findings, patients with positive lung nodules were older and had higher proportions of smoking or frequent passive smoking, significant psychological stress in the past year, occupational exposure (presence of air pollution in the work environment), chronic lung disease, family history of lung cancer, and elevated levels of carcinoembryonic antigen. Additionally, the level of albumin was lower in patients with positive lung nodules. These differences were statistically significant (p < 0.05) (Table 4).

Table 4

Table 4. Results of univariate analysis of model selection feature.

3.3.2 Multivariate analysis for feature selection

The results of the multivariate analysis showed that advanced age, smoking or frequent passive smoking, significant psychological stress in the past year, occupational exposure (presence of air pollution in the work environment), presence of chronic lung disease, and elevated levels of carcinoembryonic antigen were identified as risk factors for predicting the occurrence of lung nodules (Table 5).

Table 5

Table 5. Results of multivariate logistic regression stepwise regression analysis.

3.4 Development of visualization system

In clinical practice, the changes in various features related to lung nodules can be complex and difficult to visually interpret, making it challenging to determine whether a patient is at risk of developing lung nodules. Existing artificial intelligence methods also face the challenge of high implementation barriers, requiring clinicians to possess advanced coding skills and extensive literature review, which hinders widespread adoption in hospitals. To address this issue, this study innovatively developed a practical visualization system called “A Risk Prediction System for Pulmonary Nodules in Physical Examination Population.” This system offers intuitive, convenient, and practical advantages.

Users can input patients’ basic information in the “Basic Information Input” section and then click the “Prediction” button. The predicted results will be displayed in the “Prediction Result Output” section, providing users with easy access to the prediction outcomes (Figures 7, 8).

Figure 7

Figure 7. Low risk of pulmonary nodules.

Figure 8

Figure 8. High risk of pulmonary nodules.

4 Discussion

In recent years, with changes in people’s lifestyles and the influence of environmental factors, the incidence of lung nodules, one of the early signs of lung cancer, has been increasing year by year (18). Currently, the diagnostic techniques for lung nodules mainly include CT scanning, needle biopsy, and pathological examination after surgery. CT scans rely on comprehensive analysis by physicians of lesion location, size, density, shape, and other information to make a qualitative diagnosis. However, different pathological subtypes of lung nodules often exhibit similar imaging features, and the diagnosis of the same lesion may be influenced by subjective differences among diagnosticians, making it difficult to accurately diagnose early-stage lung cancer (19, 20). Pathological examination is considered the gold standard for diagnosing lesions, but it does not provide a comprehensive assessment of the entire lesion. Therefore, pathological results can differ depending on the site of sample collection (21). Positron emission tomography/computed tomography (PET/CT) plays a significant role in the evaluation and management of pulmonary nodules. PET/CT is instrumental in distinguishing between benign and malignant nodules, enhancing diagnostic accuracy beyond what is achievable with CT alone. This imaging modality integrates metabolic and anatomic information, providing a more comprehensive assessment of nodule activity. Studies have shown that PET/CT can significantly improve the sensitivity and specificity of lung cancer detection, especially in nodules that are indeterminate in size and appearance (22). However, the high cost of PET/CT makes it difficult to promote in clinical practice.

Machine learning is an interdisciplinary field that combines statistics, various domains of knowledge, and computer technology to process large volumes of data. It is a subfield of artificial intelligence. By utilizing machine learning algorithms, researchers can extract the necessary feature variables from massive datasets, thereby enhancing learning efficiency (23, 24). Machine learning has been widely applied in the medical field. In this study, a machine learning model based on XGBoost was developed for feature recognition. This model automates the preliminary work of machine learning, including data preparation, encoding, feature selection/extraction, and setup of the engineering environment. The model generation process involves algorithm selection, optimization, iteration, and validation (25, 26). Additionally, this study utilized the ISTOA to optimize the performance of the machine learning model. The XGBoost base model builds upon the decision tree algorithm and continuously improves precision by accumulating the outputs of successive trees (27). An essential aspect of implementing data-driven models in medicine is ensuring the feasibility and integration of these processes within healthcare service providers. The successful deployment of our feature self-recognition machine learning model for pulmonary nodule risk screening hinges not only on its predictive accuracy but also on its practical application in clinical settings. According to recent studies, it is crucial to consider factors such as interoperability with existing healthcare systems, ease of use for clinical staff, and the ability to handle large-scale data efficiently. Our model has been designed with these considerations in mind, featuring an intuitive visualization system that can seamlessly integrate with electronic health records (EHR) and other hospital information systems (HIS).
Additionally, the model’s reliance on routinely collected clinical data ensures that its implementation does not require significant changes to current workflows, thereby facilitating its adoption in real-world healthcare environments. Future work will focus on pilot testing the system in various healthcare settings to further validate its feasibility and gather feedback for continuous improvement.

In this study, univariate analysis and multivariate logistic regression identified six influencing factors: advanced age, smoking or frequent passive smoking, significant psychological stress in the past year, occupational exposure (presence of air pollution in the work environment), presence of chronic lung disease, and elevated levels of carcinoembryonic antigen. In contrast, the feature self-recognition machine learning model identified eight features: age, smoking or frequent passive smoking, significant psychological stress in the past year, occupational exposure (presence of air pollution in the work environment), presence of chronic lung disease, family history of lung cancer, decreased levels of albumin, and elevated levels of carcinoembryonic antigen. These features can be used for early diagnosis and prediction of the risk of developing lung nodules. The difference between the two feature sets may arise because a high degree of linear correlation among the independent variables in regression models leads to inaccurate, unstable, and even unreliable estimates of the regression coefficients. This affects the predictive ability of the regression models and suggests that machine learning outperforms traditional multivariate analysis.

An analysis of the aforementioned risk factors reveals that both men and women have an increased incidence of lung nodules with age. This is because as the body ages, the immune system weakens, cell self-repair capabilities decline, and various carcinogenic factors stimulate the development of multiple diseases, promoting tumor growth (28, 29). Smoking intensity and duration are positively correlated with the incidence of lung nodules. This is due to the presence of dozens of carcinogens in tobacco, which can cause genetic mutations and promote chronic tumor growth. Passive smokers unknowingly inhale smoke, leading to lung function impairment (30, 31). Smoking also causes constriction of small blood vessels in the lungs and thickening of vessel walls, resulting in elevated levels of carcinoembryonic antigen (32). Air pollution and smoking have a synergistic effect on the occurrence and development of lung cancer, continuously increasing the incidence of lung nodules. Previous studies have shown that exposure to kitchen fumes and occupational exposure increase the risk of developing lung nodules. This is because kitchen fumes mainly contain carcinogens such as benzopyrene, volatile nitrosamines, and heterocyclic amine compounds, which exert cytotoxic effects on lung tissue and damage the respiratory system. Occupational exposure to substances like aluminum, arsenic, asbestos, coke, and coal gas has carcinogenic effects on the lungs (33, 34). Most lung nodules are caused by lung inflammation, and underlying lung diseases such as pneumonia, emphysema, chronic bronchitis, chronic obstructive pulmonary disease (COPD), and asthma are all inflammatory conditions that can recur and increase the incidence of lung nodules (35, 36). A positive family history of lung cancer and a history of lung disease are positively associated with the development of lung nodules, increasing the risk of their occurrence (37). 
Reasons for low albumin levels include inadequate intake, excessive consumption, excessive elimination, and insufficient synthesis. In patients with lung nodules, low albumin levels can be caused by malnutrition, impaired liver function, tumor metastasis, digestive tract tumors, liver tumors, and other factors (38, 39).

Compared to traditional statistical models such as logistic regression, the feature recognition machine learning model improves model accuracy. Additionally, the use of the ISTOA significantly reduces the barrier to entry for artificial intelligence technology. Healthcare professionals can utilize this tool to screen individuals undergoing medical examinations for their risk of developing lung nodules and implement targeted intervention measures. For example, they can promote a balanced diet, encourage physical exercise, foster healthy lifestyle habits, and establish regulations to restrict smoking (40, 41).

5 Conclusion

The application of feature recognition machine learning models can help clinicians identify characteristics of lung nodule patients, thereby enabling early prediction of disease occurrence, assisting in the development of treatment plans, and improving prognosis. However, this study also has certain limitations. Firstly, it is a retrospective and single-center study, which may introduce selection bias and affect the accuracy of the research findings. To further validate the results, more multicenter sample data is needed. Additionally, CT imaging plays a crucial role in the diagnosis of lung nodules in clinical practice. However, this study did not include CT imaging radiomics features, indicating the need for further analysis in future studies.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the Western Theater General Hospital of the People’s Liberation Army. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because this study was conducted retrospectively.

Author contributions

KH: Conceptualization, Methodology, Project administration, Resources, Writing – review & editing. FT: Data curation, Investigation, Resources, Writing – original draft. YL: Data curation, Investigation, Writing – review & editing. LW: Investigation, Writing – review & editing. FF: Formal analysis, Methodology, Software, Visualization, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2024.1424750/full#supplementary-material

References

1. Godoy, MCB, Odisio, EGLC, Truong, MT, de Groot, PM, Shroff, GS, and Erasmus, JJ. Pulmonary nodule Management in Lung Cancer Screening: a pictorial review of lung-RADS version 1.0. Radiol Clin North Am. (2018) 56:353–63. doi: 10.1016/j.rcl.2018.01.003

2. De Margerie-Mellon, C, and Chassagnon, G. Artificial intelligence: a critical review of applications for lung nodule and lung cancer. Diagn Interv Imaging. (2023) 104:11–7. doi: 10.1016/j.diii.2022.11.007

3. Wu, Z, Wang, F, Cao, W, Qin, C, Dong, X, Yang, Z, et al. Lung cancer risk prediction models based on pulmonary nodules: a systematic review. Thorac Cancer. (2022) 13:664–77. doi: 10.1111/1759-7714.14333

4. Zheng, F, Lavin, J, and Sprafka, JM. Patient out-of-pocket costs for suspicious pulmonary nodule biopsy in lung cancer patients. J Med Econ. (2021) 24:1173–7. doi: 10.1080/13696998.2021.1988282

5. Zhang, Y, Jiang, B, Zhang, L, Greuter, MJW, de Bock, GH, Zhang, H, et al. Lung nodule detectability of artificial intelligence-assisted CT image Reading in lung Cancer screening. Curr Med Imaging. (2022) 18:327–34. doi: 10.2174/1573405617666210806125953

6. Ghossein, J, Gingras, S, and Zeng, W. Differentiating primary from secondary lung cancer with FDG PET/CT and extra-pulmonary tumor grade. J Med Imaging Radiat Sci. (2023) 54:451–6. doi: 10.1016/j.jmir.2023.05.045

7. Li, W, Yu, S, Yang, R, Tian, Y, Zhu, T, Liu, H, et al. Machine learning model of ResNet50-ensemble voting for malignant-benign small pulmonary nodule classification on computed tomography images. Cancers (Basel). (2023) 15:5417. doi: 10.3390/cancers15225417

8. Liu, M, Zhou, Z, Liu, F, Wang, M, Wang, Y, Gao, M, et al. CT and CEA-based machine learning model for predicting malignant pulmonary nodules. Cancer Sci. (2022) 113:4363–73. doi: 10.1111/cas.15561

9. Pei, Q, Luo, Y, Chen, Y, Li, J, Xie, D, and Ye, T. Artificial intelligence in clinical applications for lung cancer: diagnosis, treatment and prognosis. Clin Chem Lab Med. (2022) 60:1974–83. doi: 10.1515/cclm-2022-0291

10. Chassagnon, G, de Margerie-Mellon, C, Vakalopoulou, M, Marini, R, Hoang-Thi, TN, Revel, MP, et al. Artificial intelligence in lung cancer: current applications and perspectives. Jpn J Radiol. (2023) 41:235–44. doi: 10.1007/s11604-022-01359-x

11. Goncalves, S, Fong, PC, and Blokhina, M. Artificial intelligence for early diagnosis of lung cancer through incidental nodule detection in low- and middle-income countries-acceleration during the COVID-19 pandemic but here to stay. Am J Cancer Res. (2022) 12:1–16.

12. Viswanathan, VS, Toro, P, Corredor, G, Mukhopadhyay, S, and Madabhushi, A. The state of the art for artificial intelligence in lung digital pathology. J Pathol. (2022) 257:413–29. doi: 10.1002/path.5966

13. Zhang, K, and Chen, K. Artificial intelligence: opportunities in lung cancer. Curr Opin Oncol. (2022) 34:44–53. doi: 10.1097/CCO.0000000000000796

14. Xia, Q, Ding, Y, Zhang, R, Zhang, H, Li, S, and Li, X. Optimal performance and application for seagull optimization algorithm using a hybrid strategy. Entropy. (2022) 24:973. doi: 10.3390/e24070973

15. Yang, C, Pan, P, and Ding, Q. Image encryption scheme based on mixed chaotic Bernoulli measurement matrix block compressive sensing. Entropy. (2022) 24:273. doi: 10.3390/e24020273

16. Araújo, MO, Marinho, LS, and Felinto, D. Observation of nonclassical correlations in biphotons generated from an ensemble of pure two-level atoms. Phys Rev Lett. (2022) 128:083601. doi: 10.1103/PhysRevLett.128.083601

17. Zhi, Z, Bian, Z, Chen, Y, Zhang, X, Wu, Y, and Wu, H. Horizontal and vertical comparison of microbial community structures in a low permeability reservoir at the local scale. Microorganisms. (2023) 11:2862. doi: 10.3390/microorganisms11122862

18. Senent-Valero, M, Librero, J, and Pastor-Valero, M. Solitary pulmonary nodule malignancy predictive models applicable to routine clinical practice: a systematic review. Syst Rev. (2021) 10:308. doi: 10.1186/s13643-021-01856-6

19. Silva, M, Milanese, G, Sestini, S, Sabia, F, Jacobs, C, van Ginneken, B, et al. Lung cancer screening by nodule volume in Lung-RADS v1.1: negative baseline CT yields potential for increased screening interval. Eur Radiol. (2021) 31:1956–68. doi: 10.1007/s00330-020-07275-w

20. Ha, T, Kim, W, Cha, J, Lee, YH, Seo, HS, Park, SY, et al. Differentiating pulmonary metastasis from benign lung nodules in thyroid cancer patients using dual-energy CT parameters. Eur Radiol. (2022) 32:1902–11. doi: 10.1007/s00330-021-08278-x

21. Wang, Y, Huang, Q, and Li, J. Analysis of clinical and pathological features of malignant pulmonary nodules. Altern Ther Health Med. (2023) 29:188–93.

22. Evangelista, L, Cuocolo, A, Pace, L, Mansi, L, del Vecchio, S, Miletto, P, et al. Performance of FDG-PET/CT in solitary pulmonary nodule based on pre-test likelihood of malignancy: results from the ITALIAN retrospective multicenter trial. Eur J Nucl Med Mol Imaging. (2018) 45:1898–907. doi: 10.1007/s00259-018-4016-1

23. Zhang, Y, Feng, W, Wu, Z, Li, W, Tao, L, Liu, X, et al. Deep-learning model of ResNet combined with CBAM for malignant-benign pulmonary nodules classification on computed tomography images. Medicina. (2023) 59:1088. doi: 10.3390/medicina59061088

24. Chen, Y, Hou, X, Yang, Y, Ge, Q, Zhou, Y, and Nie, S. A novel deep learning model based on multi-scale and multi-view for detection of pulmonary nodules. J Digit Imaging. (2023) 36:688–99. doi: 10.1007/s10278-022-00749-x

25. Huang, W, Zhang, H, Ge, Y, Duan, S, Ma, Y, Wang, X, et al. Radiomics-based machine learning methods for volume doubling time prediction of pulmonary ground-glass nodules with baseline chest computed tomography. J Thorac Imaging. (2023) 38:304–14. doi: 10.1097/RTI.0000000000000725

26. Qi, J, Hong, B, Tao, R, Sun, R, Zhang, H, Zhang, X, et al. Prediction model for malignant pulmonary nodules based on cfMeDIP-seq and machine learning. Cancer Sci. (2021) 112:3918–23. doi: 10.1111/cas.15052

27. Lin, RY, Zheng, YN, Lv, FJ, Fu, BJ, Li, WJ, Liang, ZR, et al. A combined non-enhanced CT radiomics and clinical variable machine learning model for differentiating benign and malignant sub-centimeter pulmonary solid nodules. Med Phys. (2023) 50:2835–43. doi: 10.1002/mp.16316

28. Xu, L, Su, Z, and Xie, B. Diagnostic value of conventional tumor markers in young patients with pulmonary nodules. J Clin Lab Anal. (2021) 35:23912. doi: 10.1002/jcla.23912

29. Huang, C, Sun, Y, Wu, Q, Ma, C, Jiao, P, Wang, Y, et al. Simultaneous bilateral pulmonary resection via single-utility port VATS for multiple pulmonary nodules: a single-center experience of 16 cases. Thorac Cancer. (2021) 12:525–33. doi: 10.1111/1759-7714.13791

30. Gendarme, S, and Chouaid, C. Monitoring subsolid pulmonary nodules in high-risk patients is even more cost-effective when combined with a stop-smoking program. J Thorac Oncol. (2020) 15:1268–70. doi: 10.1016/j.jtho.2020.04.023

31. Trejo Gallego, C, Bueno, J, Cruces, E, Stelow, EB, Mancheño, N, and Flors, L. Pulmonary histiocytosis: beyond Langerhans cell histiocytosis related to smoking. Radiologia. (2019) 61:215–24. doi: 10.1016/j.rxeng.2019.03.004

32. Huang, CS, Chen, CY, Huang, LK, Wang, WS, and Yang, SH. Prognostic value of postoperative serum carcinoembryonic antigen levels in colorectal cancer patients who smoke. PLoS One. (2020) 15:233687. doi: 10.1371/journal.pone.0233687

33. Hirano, T, Numakura, T, Moriyama, H, Saito, R, Shishikura, Y, Shiihara, J, et al. The first case of multiple pulmonary granulomas with amyloid deposition in a dental technician; a rare manifestation as an occupational lung disease. BMC Pulm Med. (2018) 18:77. doi: 10.1186/s12890-018-0654-0

34. Hung, SC, Wang, YT, and Tseng, MH. An interpretable three-dimensional artificial intelligence model for computer-aided diagnosis of lung nodules in computed tomography images. Cancers. (2023) 15:4655. doi: 10.3390/cancers15184655

35. Namireddy, MK, Consul, N, and Sher, AC. FDG-avid pulmonary nodules and tracheobronchial mural inflammation in IgG4-related disease. Clin Nucl Med. (2021) 46:e125–6. doi: 10.1097/RLU.0000000000003358

36. Liao, J, Guan, H, Yu, M, Zhou, P, Han, Y, Peng, X, et al. Pulmonary granulomatous inflammation after ceritinib treatment in advanced ALK-rearranged pulmonary adenocarcinoma. Investig New Drugs. (2022) 40:1141–5. doi: 10.1007/s10637-022-01270-2

37. Uthoff, JM, Mott, SL, Larson, J, Neslund-Dudas, CM, Schwartz, AG, and Sieren, JC. Computed tomography features of lung structure have utility for differentiating malignant and benign pulmonary nodules. Chronic Obstr Pulm Dis. (2022) 9:154–64. doi: 10.15326/jcopdf.2021.0271

38. Wei, Q, Fang, W, Chen, X, Yuan, Z, Du, Y, Chang, Y, et al. Establishment and validation of a mathematical diagnosis model to distinguish benign pulmonary nodules from early non-small cell lung cancer in Chinese people. Transl Lung Cancer Res. (2020) 9:1843–52. doi: 10.21037/tlcr-20-460

39. Dailey, WA, Frey, GT, McKinney, JM, Paz-Fumagalli, R, Sella, DM, Toskich, BB, et al. Percutaneous computed tomography-guided radiotracer-assisted localization of difficult pulmonary nodules in uniportal video-assisted thoracic surgery. J Laparoendosc Adv Surg Tech A. (2018) 28:1451–7. doi: 10.1089/lap.2018.0248

40. Kao, MW. Intracorporeal direct measurement for localizing peripheral pulmonary nodules during thoracoscopy. J Thorac Dis. (2019) 11:4119–26. doi: 10.21037/jtd.2019.10.06

41. Kim, H, Goo, JM, and Park, CM. A simple prediction model using size measures for discrimination of invasive adenocarcinomas among incidental pulmonary subsolid nodules considered for resection. Eur Radiol. (2019) 29:1674–83. doi: 10.1007/s00330-018-5739-x

Keywords: machine learning, pulmonary nodules, risk screening, visualization system, algorithm

Citation: Tian F, Lin Y, Wang L, Fang F and Hou K (2025) Construction of a risk screening and visualization system for pulmonary nodule in physical examination population based on feature self-recognition machine learning model. Front. Med. 11:1424750. doi: 10.3389/fmed.2024.1424750

Received: 28 April 2024; Accepted: 22 October 2024;
Published: 04 March 2025.

Edited by:

Liang Zhao, Dalian University of Technology, China

Reviewed by:

Elizaveta Savchenko, Ariel University, Israel
Salvatore Annunziata, Fondazione Policlinico Universitario A. Gemelli IRCCS, Italy

Copyright © 2025 Tian, Lin, Wang, Fang and Hou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kaiwen Hou, hkwcc@126.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
