ORIGINAL RESEARCH article

Front. Oncol., 02 April 2025

Sec. Cancer Imaging and Image-directed Interventions

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1510386

Prediction of EGFR mutations in non-small cell lung cancer: a nomogram based on 18F-FDG PET and thin-section CT radiomics with machine learning

Jianbo Li1†, Qin Shi2†, Yi Yang2, Jikui Xie2, Qiang Xie2, Ming Ni2*, Xuemei Wang1,2*
  • 1Department of Nuclear Medicine, The Affiliated Hospital of Inner Mongolia Medical University, Hohhot, China
  • 2Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China

Background: This study aimed to develop and validate radiomics-based nomograms for the identification of EGFR mutations in non-small cell lung cancer (NSCLC).

Methods: A retrospective analysis was performed on 313 NSCLC patients, who were randomly divided into training (n = 250) and validation (n = 63) groups. Radiomic features were extracted from 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) and thin-section computed tomography (CT) scans. After selecting optimal radiomic features, four machine learning algorithms, including logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were used to develop and validate radiomics models. A combined model, incorporating the Rad score from the best performing radiomics model with clinical and radiological features, was then formulated. Finally, the integrated nomogram was generated. Its predictive performance and clinical utility were evaluated using receiver operating characteristic curves, calibration curves, and decision curve analysis.

Results: Among the radiomics models, the RF model showed the best performance with AUCs of 0.785 (95% CI, 0.726-0.844) and 0.776 (95% CI, 0.662-0.889) in the training and validation groups, respectively. The AUCs of the clinical and radiological models in both groups were 0.711 (95% CI, 0.645-0.776) and 0.758 (95% CI, 0.627-0.890), and 0.632 (95% CI, 0.564-0.699) and 0.677 (95% CI, 0.531-0.822), respectively. The combined model achieved the highest AUCs of 0.872 (95% CI, 0.829-0.915) and 0.831 (95% CI, 0.723-0.940) in the training and validation groups, respectively. The DeLong test confirmed the superiority of the combined model over the other three models. Both the calibration curve and the DCA indicated that the radiomics nomogram was consistent and clinically useful.

Conclusions: Radiomics combined with machine learning and based on 18F-FDG PET/CT images can effectively determine EGFR mutation status in NSCLC patients. Radiomics-based nomograms provide a non-invasive and visually intuitive prediction tool for screening NSCLC patients with EGFR mutations in a clinical setting.

1 Introduction

As of 2024, lung cancer remains the leading cause of cancer-related deaths worldwide (1). Non-small cell lung cancer (NSCLC) accounts for approximately 80-85% of all lung cancers (2). Unfortunately, most NSCLC patients are diagnosed at an advanced stage, resulting in a poor prognosis (3). With the advent of precision medicine and personalized treatment strategies, targeting the epidermal growth factor receptor (EGFR) with tyrosine kinase inhibitors (TKIs) has become the standard of care for advanced NSCLC. This approach has significantly improved progression-free survival and overall survival in patients with EGFR mutations (4, 5). However, resistance to TKIs inevitably develops over time (6). The National Comprehensive Cancer Network (NCCN) guidelines recommend molecular detection of EGFR mutations in patients with advanced or metastatic NSCLC (7). Therefore, rapid and accurate identification of EGFR mutations is of paramount importance for tailoring individualized treatment plans.

Currently, gene mutation detection relies primarily on histological samples from primary or metastatic lesions. Invasive procedures often yield limited tissue or cell samples that may not accurately represent the overall tumor profile or capture intra- and inter-tumor heterogeneity. In addition, approximately 5% to 20% of patients with advanced NSCLC cannot undergo molecular genetic testing using histological samples (8). Liquid biopsy has emerged as a novel method to assess EGFR mutation status. Although it offers convenience, speed and affordability, its sensitivity and stability remain suboptimal (9). Therefore, there is an urgent need to develop non-invasive, simple, rapid and reliable techniques for detecting gene mutations.

Phenotypic analysis by imaging is a promising non-invasive method for predicting EGFR mutations. Previous studies have shown that CT signs, including ground glass composition, air-bronchial sign, vacuole sign and pleural indentation sign, correlate with EGFR mutations (10-12). However, these CT signs rely on subjective visual assessment and lack quantification. 18F-FDG PET/CT, recognized as a molecular imaging technique for tumors, has become an integral part of the clinical management of NSCLC (7). While EGFR mutation may influence FDG uptake via the NADPH oxidase 4 (NOX4)/reactive oxygen species (ROS)/glucose transporter protein 1 (GLUT1) axis (13), the predictive value of 18F-FDG PET/CT metabolic parameters - such as the maximum standard uptake value (SUVmax), mean standard uptake value (SUVmean), total lesion glycolysis (TLG) and metabolic tumor volume (MTV) - remains controversial in the context of EGFR mutation status (14).

Radiomics offers a departure from traditional image analysis by transforming medical images into high-dimensional, mineable data through the high-throughput extraction of quantitative features. This approach has shown significant potential in tumor diagnosis, treatment evaluation and prognosis prediction (15). While numerous machine learning models based on radiomics features have been reported to identify EGFR mutation status in NSCLC patients (16-18), most radiomics studies have typically used a single modelling method, which may affect predictive outcomes. To improve the accuracy of radiomics in predicting EGFR mutations, our study used different machine learning algorithms to build multiple models. Furthermore, we constructed a combined model that integrates PET/CT radiomics with clinical and radiological features to optimize prediction efficiency. A radiomics-based nomogram was then developed to predict EGFR mutation status.

2 Materials and methods

2.1 Patients

This study was conducted in accordance with the Declaration of Helsinki, as revised in 2013, and was approved by the Ethics Committee of The First Affiliated Hospital of University of Science and Technology of China (approval number 2023-RE-018). The data are anonymous, and the requirement for informed consent was therefore waived. A total of 313 patients from January 2015 to June 2021 were retrospectively analyzed. The inclusion criteria were: NSCLC diagnosis confirmed by surgical or puncture biopsy with subsequent EGFR gene detection; 18F-FDG PET/CT scan within one month prior to treatment; and no history of other malignancies. The exclusion criteria were: antitumor treatment prior to the PET/CT scan; poor image quality due to significant respiratory or motion artefacts, or unclear tumor boundaries making it difficult to delineate the lesion volume of interest (VOI); a lesion VOI of less than 1.0 cm3, which could introduce a partial volume effect; lesions identified as ground glass nodules (GGNs) or with an SUVmax < 2.5; multiple lung cancer lesions (≥ 2); and incomplete clinical or imaging data. Of the 313 patients analyzed, 123 had a wild-type EGFR genotype, while 190 had EGFR mutations. These patients were randomly assigned in an 8:2 ratio to a training group (n = 250) and a validation group (n = 63). Both clinical and radiological features were documented.
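The 8:2 random assignment described above can be illustrated with a minimal sketch. The integer patient identifiers, the fixed seed and the helper name `split_patients` are assumptions made for illustration; the article does not state how the randomization was implemented or whether it was stratified by mutation status.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs and split them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)  # 313 * 0.8 = 250.4, rounded to 250
    return ids[:n_train], ids[n_train:]


# 313 hypothetical patient identifiers, matching the cohort size reported above
train_ids, valid_ids = split_patients(range(313))
print(len(train_ids), len(valid_ids))  # 250 63
```

With 313 patients, rounding 80% to the nearest integer reproduces the reported group sizes of 250 and 63.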

2.2 EGFR mutation detection

Tumor tissue samples were obtained by either surgical resection or biopsy. EGFR mutation status was analyzed using the human EGFR gene mutation detection kit provided by Wuhan Friends Medical Technology Co, Ltd, China. Mutations in EGFR exons 18, 19, 20 and 21 were identified using the real-time PCR/amplification retardation mutation system (RT-PCR/ARMS). PCR analysis was performed on the PRISM 7500 system from Applied Biosystems, Inc. Pathologists with over a decade of experience interpreted and confirmed both histological findings and EGFR mutation results.

2.3 Image acquisition

Prior to the scan, patients were required to fast for more than 6 hours and maintain a blood glucose level of less than 11.10 mmol/L. They were then administered 18F-FDG at a dose of 3.7-7.4 MBq/kg. After a rest period of approximately 60 ± 10 minutes, patients underwent a PET/CT scan using a Biograph 16HR PET/CT scanner (SIEMENS, Germany). A low-dose CT scan was performed first, followed by a PET scan. The PET acquisition used a three-dimensional mode over 6-8 beds, with each bed taking approximately 2 minutes. PET images, attenuated with CT data, were reconstructed using the ordered subset expectation maximization method (3 iterations, 24 subsets, and a 4 mm full width at half maximum). To obtain more detailed morphological information, a breath-hold thin-section CT scan was performed immediately after the PET/CT scan. The acquisition parameters for this scan were set to a voltage of 120 kV, a current of 200 mA, a pitch of 1.15, a collimator width of 0.75 mm, a reconstruction slice thickness of 0.625 mm, and a matrix of 512 × 512.

2.4 Tumor segmentation

In accordance with the Image Biomarker Standardization Initiative (IBSI) (19), the volume of interest (VOI) was delineated by two nuclear medicine physicians with more than 10 years of experience using the TrueD toolkit on the Syngo via workstation (version VB10B, SIEMENS). The VOI included areas of necrosis, hemorrhage and calcification while excluding normal lung tissue, atelectasis or surrounding tumor inflammation. For PET images, physicians initially set the SUVmax threshold at 40% and then used the adaptive brush tool for semi-automated 3D segmentation to ensure that the VOI visually encompassed the entire primary tumor (20). Metabolic volume parameters such as SUVmax, SUVmean, peak standard uptake value (SUVpeak) and MTV were calculated automatically on the same post-processing workstation. TLG was derived using the formula TLG = SUVmean × MTV. For thin-section CT images, 3D semi-automated segmentation software was used, based on the region growth segmentation method (taking into account homogeneity and grey level differences), followed by manual layer-by-layer adjustments. Maximum tumor diameter (MTD) and gross tumor volume (GTV) were determined automatically. Thin-section CT radiological features included lesion size (MTD and GTV), lobulation, spiculation, pleural indentation, vacuole sign, cavity sign, vascular convergence, air bronchogram and calcification. Intra- and interobserver intraclass correlation coefficients (ICCs) were calculated to evaluate the reliability and reproducibility of the radiomics features. Both physicians were blinded to patient pathology results and EGFR mutation status. Physician 1 and Physician 2 each drew VOIs on the PET and thin-section CT images of 72 patients randomly selected from the enrolled group, and Physician 1 repeated the delineation after 2 weeks. VOI segmentation of the remaining cases was performed by Physician 1.
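The reproducibility check described above can be sketched as follows. The article does not say which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the function name `icc_2_1` and the demo values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per lesion and one column per reader (or session).
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of readers or repeated sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical values of one radiomics feature measured by two readers on five lesions
demo = [[0.71, 0.70], [0.55, 0.57], [0.83, 0.82], [0.62, 0.60], [0.90, 0.91]]
demo_icc = icc_2_1(demo)
print(f"ICC(2,1) = {demo_icc:.3f}")
```

The same function covers the intraobserver case if the two columns hold Physician 1's first and repeated delineations instead of two different readers.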

2.5 Feature extraction and selection

Radiomics features were extracted using the PyRadiomics package in Python (version 3.0.1) (21). To minimize errors in the image data acquisition process, all images were normalized and resampled to a uniform resolution of 1 × 1 × 1 mm3 by interpolation prior to feature extraction. From the original images, the radiomics features included 14 shape-based, 18 first-order statistics, 24 grey level co-occurrence matrix (GLCM), 16 grey level run length matrix (GLRLM), 16 grey level size zone matrix (GLSZM), 14 grey level dependence matrix (GLDM), and 5 neighborhood grey tone difference matrix (NGTDM) features. The morphological features were only extracted from the original image. In addition to the original image features, we also extracted features after wavelet and local binary pattern (LBP) filtering to capture more efficient attributes. Wavelet filtering resulted in eight decomposed images representing all combinations of high-pass (H) or low-pass (L) filters applied in three dimensions, namely: wavelet-HHH, wavelet-HHL, wavelet-HLH, wavelet-HLL, wavelet-LHH, wavelet-LHL, wavelet-LLH and wavelet-LLL. A total of 944 PET and 944 CT radiomics features were extracted. Initial refinement using Pearson’s correlation test eliminated 1,447 features with a correlation coefficient |r| ≥ 0.90, leaving 441 features. Subsequent analysis of variance (ANOVA) identified the top 60 features with the most significant variance. To ensure feature reliability and reproducibility during segmentation, only 43 features with an ICC greater than 0.9 were retained. Finally, to mitigate overfitting, the least absolute shrinkage and selection operator (Lasso) algorithm combined with 10-fold cross-validation was used for optimal selection of the radiomics feature subset. 
Baseline clinical variables including age, sex, smoking status, tumor location (central/peripheral), lung lobe, clinical staging, TNM staging, tumor indicators (CEA, SCC and CYFRA21-1), histological subtypes and 13 other variables were collected, yielding a total of 33 clinical features. Radiological features included the above thin-section CT features and PET features, resulting in a total of 23 variables. Pearson correlation analysis, the Mann-Whitney U test, ANOVA, and a 10-fold cross-validated Lasso regression model were used to screen clinical variables and radiological features.
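The first screening step, removal of highly correlated features (|r| ≥ 0.90), can be sketched in plain Python. The article does not state which member of a correlated pair was discarded, so the greedy "keep the first, drop later duplicates" rule, the function names and the toy feature values below are all assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature is treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.90):
    """Keep a feature only if |r| < threshold against every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: feature name -> values across five patients
toy_features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly a multiple of feat_a
    "feat_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept_features = drop_correlated(toy_features)
print(kept_features)  # ['feat_a', 'feat_c']
```

A greedy pass like this depends on feature order; a different tie-breaking rule would keep a different but equally non-redundant subset.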

2.6 Construction and validation of the model

Machine learning model construction and performance evaluation were performed using Python. Four machine learning algorithms - logistic regression (LR), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost) - were used to construct the radiomics models. A 5-fold cross-validation was implemented to ensure model robustness. The predictive ability of each algorithm was primarily assessed using the area under the curve (AUC) from receiver operating characteristic (ROC) curve analysis. The model with the highest AUC was considered the optimal radiomics model, from which the radiomics score (Rad score) was derived. Similarly, clinical variables and radiological features were screened and models were constructed using logistic regression. To combine radiomics, radiological and clinical features, we developed a combined model and assessed its performance. The identified clinical and radiological features, together with the Rad score, were then used to construct the nomograms. The goodness of fit of the nomograms was assessed using the calibration curve and the Hosmer-Lemeshow test (22). The clinical utility of the different models was assessed using decision curve analysis (DCA). The workflow of our study is shown in Figure 1.
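The selection rule above (the model with the highest AUC is taken as the optimal radiomics model) can be sketched without any machine learning library. Here the AUC is computed as the probability that a randomly chosen mutation-positive case receives a higher score than a randomly chosen wild-type case; the scores and the helper names `roc_auc` and `pick_best_model` are hypothetical and not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def pick_best_model(labels, scores_by_model):
    """Return the name of the model with the highest AUC and all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical predicted probabilities for six patients (1 = EGFR mutation)
y_true = [1, 1, 1, 0, 0, 0]
hypothetical_scores = {
    "RF": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "LR": [0.9, 0.2, 0.7, 0.3, 0.8, 0.1],
}
best_name, model_aucs = pick_best_model(y_true, hypothetical_scores)
print(best_name, model_aucs)
```

In a cross-validated setting the same function would be applied to the held-out predictions of each fold, and the per-fold AUCs averaged before comparing models.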

Figure 1. The workflow of this study.

2.7 Statistical analysis

Statistical analyses were performed using SPSS software (version 26.0) and R software (version 4.2.2). Continuous variables were presented as either mean ± standard deviation or median (interquartile range), while categorical variables were expressed as percentages. For continuous variables, comparisons were made using independent samples t-tests or the Wilcoxon rank-sum test. Categorical variables were compared using the χ2 test or Fisher’s exact test. The DeLong test was used to assess statistical differences in the AUCs of the models. The nomogram, calibration curve, Hosmer-Lemeshow test and DCA were calculated using R (version 4.2.2, http://www.r-project.org). A two-tailed P value of less than 0.050 was considered statistically significant.
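For readers unfamiliar with the DeLong test, the following is a minimal sketch of its two-model form for correlated AUCs measured on the same patients. It is not the authors' implementation (the analysis software function is not named in the article); the function names and the six-patient example are hypothetical, and the quadratic-time formulation is chosen for readability rather than speed.

```python
import math


def _placements(pos, neg):
    """Structural components: per-positive and per-negative placement values."""
    def psi(p, q):
        return 1.0 if p > q else 0.5 if p == q else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (AUC_a, AUC_b, z, two-sided P) for two models scored on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of the AUC difference, including the covariance between models
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return auc_a, auc_b, z, p_value


# Hypothetical scores from two models on six patients (1 = EGFR mutation)
y = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
auc_a, auc_b, z_stat, p_val = delong_test(y, model_a, model_b)
print(auc_a, auc_b, z_stat, p_val)
```

Because both models are evaluated on the same patients, the covariance terms matter: ignoring them would treat the AUCs as independent and generally misstate the P value.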

3 Results

3.1 Patient characteristics

The distribution of characteristics across the dataset is shown in Table 1. Gender and histological type showed significant differences between the EGFR wild-type and EGFR mutation groups in both the training (P < 0.001, P < 0.001, respectively) and validation groups (P = 0.019, P = 0.016, and P = 0.015, respectively). Never-smokers and patients with elevated CEA levels or specific CT radiological features (spiculation, pleural indentation and air bronchogram) were more likely to have EGFR mutations in the training group (P < 0.001, P = 0.023, P = 0.036, P = 0.003 and P = 0.041, respectively). However, these differences were not statistically significant in the validation group (all P > 0.05). Compared to the wild-type group, lesion size (MTD and GTV) and all metabolic parameters were lower in the EGFR mutation group. However, with the exception of SUVmean (training group P = 0.020, validation group P = 0.005), no other significant differences were observed between the two groups (all P > 0.05).

Table 1. Patient characteristics.

3.2 Feature selection and model performance evaluation

Finally, six optimal features were identified using the LASSO algorithm and tenfold cross-validation, consisting of four CT and two PET radiomics features. These features included: CT-original shape sphericity, CT-wavelet-HLL_glszm_SizeZoneNonUniformity, CT-wavelet-LHL_glszm_GrayLevelNonUniformity, CT-wavelet-LLH_glszm_GrayLevelVariance, PET-wavelet-HHH_glszm_GrayLevelNonUniformity, and PET-wavelet-LHL_firstorder_Minimum (as shown in Figure 2). The intraobserver and interobserver ICCs of these six radiomics features were 0.9055-0.9979 and 0.9645-0.9979, respectively. In this study, four classifiers were used to construct radiomics-based models: LR, RF, SVM and XGBoost. The detailed performance metrics for each model are shown in Table 2. In the training group, the DeLong test showed that the AUC of the RF model was significantly higher than those of the LR and SVM models (P < 0.001) and comparable to that of the XGBoost model. In the validation group, RF exhibited the highest AUC, although the differences between models were not significant. Consequently, the RF model demonstrated consistent and superior predictive performance and was chosen as the radiomics model for this study. Separately, 4 clinical variables (smoking history, sex, adenocarcinoma and squamous cell carcinoma) and 3 radiological features (air bronchus sign, pleural indentation sign, and spiculation sign) were obtained through multi-step screening to establish corresponding models. The clinical model for distinguishing EGFR mutations achieved an AUC of 0.711 (95% CI, 0.645-0.776) in the training group and 0.758 (95% CI, 0.627-0.890) in the validation group. The performance of the radiological model was moderate, with AUC values of 0.632 (95% CI, 0.564-0.699) and 0.677 (95% CI, 0.531-0.822) in the training and validation groups, respectively.

Figure 2. The LASSO algorithm and tenfold cross-validation for the selection of radiomic features. (A) MSE trend associated with the change in λ: λ is selected by 10-fold cross-validation. When λ=0.0486, the MSE is minimum and the Lasso regression model shows the best performance. (B) Coefficient trend of each feature along with λ: 6 radiomic features with non-zero coefficients were selected in the final model. (C) The coefficient values of these radiomic features in the LASSO model. LASSO, least absolute shrinkage and selection operator; MSE, mean square error; λ, lambda.

Table 2. The diagnostic performance of each model in identifying EGFR mutations.

3.3 Establishment and evaluation of the nomogram prediction model

To improve predictive accuracy, we combined radiomic, clinical and radiological features to create a combined model. This model achieved an AUC of 0.872 (95% CI, 0.829-0.915) and 0.831 (95% CI, 0.723-0.940) in the training and validation groups, respectively, as shown in Table 3 and Figure 3. In the training group, the AUC of the combined model was significantly higher than that of the RF radiomics model (P = 0.010), the clinical model (P < 0.001) and the radiological model (P < 0.001). In the validation group, it was higher than those of the other three models, but the difference was significant only against the radiological model (P = 0.017). Overall, the combined model showed superior predictive ability. As a result, we created an individualized nomogram in the validation group, which provides an intuitive visualization of the prediction results and their influencing factors (Figure 4A). Among the predictors, the Rad score was the most influential in predicting EGFR mutation. The Hosmer-Lemeshow test indicated good calibration of the combined model in both the training (χ2 = 7.3975, P = 0.495) and validation groups (χ2 = 9.8997, P = 0.272). Calibration curves further highlighted the agreement between observed and predicted results (Figure 4B). Decision curve analysis (DCA), shown in Figure 5, revealed that the area under the decision curve of the combined model exceeded that of the other models, highlighting its superior clinical utility.
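As a worked check of the Hosmer-Lemeshow results, the P values can be recomputed from the reported χ2 statistics. This sketch assumes the conventional 10 risk groups and therefore 8 degrees of freedom, which the article does not state; the helper name `chi2_sf_even_df` is hypothetical and uses the closed-form chi-square tail that exists for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


DF = 8  # assumption: 10 groups minus 2
p_training = chi2_sf_even_df(7.3975, DF)
p_validation = chi2_sf_even_df(9.8997, DF)
print(f"training P = {p_training:.3f}, validation P = {p_validation:.3f}")
```

Under this assumption the recomputed values agree with the reported P = 0.495 and P = 0.272 to within rounding, which is consistent with a 10-group test. P values well above 0.05 indicate no evidence of miscalibration.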

Table 3. The DeLong test for the RF radiomic, clinical, radiological and combined models.

Figure 3. ROC curves for each prediction model in the training group (A) and validation group (B). All model results were evaluated using 5-fold cross-validation. ROC, receiver operating characteristic; AUC, area under the curve; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Figure 4. Nomogram evolution and performance. (A) Nomogram based on the combined model. Histological type: 1 represents SCC, 0 represents ADC; Smoking: 1 represents current or former smoker, 0 represents never smoker; Gender: 1 represents female, 0 represents male; Radiological signs: 1 represents yes, 0 represents no. (B) Calibration curve of the nomogram in the training and validation groups.

Figure 5. Decision curve analysis (DCA) for RF radiomic, clinical, radiological and combined models in the training group (A) and validation group (B). The combined model for EGFR prediction added more value than the use of the treat-all scheme for threshold probabilities >20% in both the training and validation groups.
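The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), and evaluates the treat-all reference using the whole-cohort counts reported in Section 2.1 (190 mutated, 123 wild-type). The function names, the ">= threshold" classification rule and the four-patient toy example are assumptions, and the whole-cohort prevalence is used only for illustration because the curves in the figure are drawn per group.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted probability is >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient as mutated."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Whole-cohort outcome labels from Section 2.1: 190 EGFR-mutated, 123 wild-type
cohort_labels = [1] * 190 + [0] * 123
nb_all_20 = net_benefit_treat_all(cohort_labels, 0.20)
print(f"treat-all net benefit at 20% threshold: {nb_all_20:.3f}")

# Hypothetical model output on four patients
toy_nb = net_benefit([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1], 0.5)
print(f"toy model net benefit at 50% threshold: {toy_nb:.2f}")
```

A model adds value over the treat-all scheme at a given threshold when its net benefit exceeds the treat-all value computed at that threshold.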

4 Discussion

Given the clear benefit of TKIs for NSCLC patients with an EGFR mutation, accurate detection of EGFR gene mutation status becomes critical for informed clinical treatment decisions. In this retrospective study, we developed individualized nomograms integrating 18F-FDG PET/CT radiomics, radiological and clinical features to provide a non-invasive approach to predict EGFR mutation status in NSCLC patients.

Previous research has shown that females, adenocarcinoma patients and non-smokers are more likely to have EGFR mutations (22, 23), a finding consistent with our results. By integrating these variables, our clinical model achieved AUCs of 0.711 and 0.758 in the training and validation groups, respectively. While numerous studies have investigated the relationship between 18F-FDG uptake and EGFR mutation status in NSCLC, their results have been inconsistent. Our analysis of PET metabolic parameters between the EGFR mutation and wild-type groups (detailed in Supplementary Table S1) is largely consistent with previous studies (24, 25). We observed minimal correlation between 18F-FDG uptake and EGFR mutation status in both groups, leading us to exclude PET metabolic parameters from our radiological model. Notably, while several studies have suggested that a higher SUVmax indicates the presence of an EGFR mutation, others have found no association between 18F-FDG uptake and EGFR mutation status (26, 27). Such discrepancies may be due to differences in sample size, sample characteristics, or ROI selection and measurement methods. In conclusion, PET metabolic parameters appear to have limited predictive value for EGFR mutations.

The integration of CT equipment into PET/CT scanners enhances the clarity of morphological features, potentially improving the diagnostic accuracy for NSCLC patients. Our study confirmed that CT morphological features such as air bronchial sign, pleural indentation sign and spicule sign were associated with an increased risk of EGFR mutation, which is consistent with previous studies (12, 28, 29). In addition, other studies have found that ground glass nodules (GGNs), which include both pure and mixed ground glass nodules, are often indicative of EGFR mutations in NSCLC patients (30, 31). Given the challenges in delineating GGNs on PET images, they were excluded from our study. Notably, our radiological model based on CT morphological features showed moderate predictive performance, with AUC values of 0.632 and 0.677 in the training and validation groups, respectively.

Research on radiomics for EGFR prediction is growing rapidly, with many studies confirming its feasibility and potential benefits. In our study, we used four machine learning classifiers (LR, RF, SVM and XGBoost) to construct EGFR mutation prediction models using six optimal features refined by a four-step dimensionality reduction process. The AUC values for these models ranged from 0.659 to 0.794 in the training group and from 0.710 to 0.776 in the validation group. Given the relative stability and commendable predictive performance of the RF model, with AUC values of 0.785 and 0.776 in the training and validation groups respectively, it was selected for further research. Although the difference between models in the validation group was not statistically significant, we expect that a larger sample size would resolve this. Introduced by Breiman in 2001, RF is an ensemble learning method suitable for both classification and regression. It uses a collection of decision trees to create a diversified prediction model. Due to its robust predictive accuracy, resistance to overfitting, ability to model complex non-linear relationships and interpretability, RF has gained traction in biomedical engineering (32-34). Wang et al. (35) demonstrated that an RF model based on preoperative CT radiomics features could detect EGFR mutations in lung adenocarcinoma patients, achieving AUC values of 0.70 and 0.64 for the training and validation groups, respectively. Gu et al. highlighted the superior performance of an RF-based radiomics classifier (AUC=0.776) in predicting Ki-67 expression levels in NSCLC patients (36). Some studies (24, 26, 37) have suggested that radiomics signatures from 18F-FDG PET/CT images provide better EGFR mutation predictions than those from stand-alone CT or conventional PET images. Recent studies typically report AUC values between 0.57 and 0.86 when relying on PET/CT radiomics features (16, 24, 38, 39). While factors such as image data sources, spatial resolution, post-processing, model algorithms and data size can introduce variability, the collective body of work, including our study, underscores the potential of radiomics-based machine learning models for EGFR mutation prediction.

In our study, PET and CT images were filtered and pre-processed prior to feature extraction. This step is critical because it minimizes image acquisition errors and ensures that the results are both stable and reliable. Our results underline that the final six radiomics features are highly reliable. Even when different machine learning algorithms are used to construct models using these features, the resulting models consistently show commendable predictive performance. Notably, five of the six radiomics features were wavelet features, highlighting the central role of features derived from wavelet-filtered images in the radiomics model. Wavelet transform, a widely used method for noise reduction, data smoothing and filtering, is excellent at revealing specific patterns hidden in large datasets. By capturing tumor heterogeneity, wavelet features potentially improve the predictive power of the model (40). Similarly, Zhang et al. (41) found that seven out of twelve wavelet-transformed features correlated with EGFR mutations. This suggests that texture and high-dimensional features may have a more robust association with EGFR mutation status.

Although the RF radiomics model in our study showed slightly better predictive power than the clinical and radiological models, reliance on it alone for clinical applications may be limited. Zhang et al. (38) found that a model combining PET/CT radiomics with clinical features (gender and smoking history) outperformed a model based on PET/CT radiomics alone, with AUC values of 0.86 vs 0.79 in the training group and 0.87 vs 0.85 in the validation group. Another study constructed an integrated model using CT radiomics, CT radiological features and clinical features to predict EGFR mutations in adenocarcinoma patients, achieving AUCs of 0.849 and 0.835 in the training and validation groups, respectively (41). Chang et al. (37) also showed that a combined model integrating PET/CT radiomics with CT morphological features was more effective in predicting EGFR mutations in lung adenocarcinoma than a model based on PET/CT radiomics alone (AUC: 0.84 vs. 0.76 in the training group and 0.81 vs. 0.75 in the validation group). In our study, the combined model incorporating the Rad score, clinical and radiological features achieved AUCs of 0.872 and 0.831 in the training and validation groups, respectively, outperforming the stand-alone clinical and radiological models. To further improve clinical utility, we developed a radiomics-based nomogram that integrates the Rad score with the aforementioned clinical and radiological features to provide a visualized prediction. Decision curve analysis (DCA) further supported the clinical applicability of this nomogram.

Our study has several limitations that need to be considered. First, it is a single-center retrospective study with a relatively small sample size. A larger sample size can enhance research reliability, interpretability, and generalization, while mitigating selection bias. To further validate and improve model performance, we plan to expand the sample size and incorporate multicenter data for both single-center and external validation in future work. Second, the use of manual and semi-automated outlining methods introduces the possibility of human error. These methods may also lack the repeatability seen with fully automated outlining. Third, while our study provides an initial exploration of radiomics using four different classifiers, the optimal feature selection method and machine learning algorithm for specific applications remain a matter of debate. Future research will combine radiomics with deep learning to achieve fully automated analysis of the entire process from tumor segmentation to prediction, and to improve prediction efficiency (AUC > 0.9) and thereby enhance clinical applicability.

5 Conclusions

In conclusion, the combination of radiomics and machine learning using 18F-FDG PET/CT images offers a promising approach to identify EGFR mutation status in NSCLC patients. The integration of clinical and radiological features with the Rad score further improves the predictive accuracy. Radiomics-based nomograms provide a valuable, non-invasive and visually intuitive tool for screening patients with EGFR mutations in a clinical setting.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethics Committee of The First Affiliated Hospital of University of Science and Technology of China (approval number 2023-RE-018). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin because the data are anonymous, and the requirement for informed consent was therefore waived. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article because the data are anonymous, and the requirement for informed consent was therefore waived.

Author contributions

JL: Data curation, Investigation, Writing – original draft. QS: Data curation, Investigation, Software, Writing – original draft. YY: Data curation, Formal Analysis, Writing – original draft. JX: Data curation, Formal Analysis, Writing – original draft. QX: Data curation, Formal Analysis, Software, Supervision, Writing – original draft. MN: Funding acquisition, Project administration, Validation, Writing – review & editing. XW: Conceptualization, Supervision, Validation, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Joint Fund for Medical Artificial Intelligence (MAI2022Q017).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1510386/full#supplementary-material

References

1. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin. (2024) 74:12–49. doi: 10.3322/caac.21820

2. Singh T, Fatehi Hassanabad M, Fatehi Hassanabad A. Non-small cell lung cancer: Emerging molecular targeted and immunotherapeutic agents. Biochim Biophys Acta (BBA) - Rev Cancer. (2021) 1876:188636. doi: 10.1016/j.bbcan.2021.188636

3. Eguchi T, Bains S, Lee M-C, Tan KS, Hristov B, Buitrago DH, et al. Impact of increasing age on cause-specific mortality and morbidity in patients with stage I non-small-cell lung cancer: A competing risks analysis. J Clin Oncol. (2016) 35:281–90. doi: 10.1200/JCO.2016.69.0834

4. Liu JF, Sun XS, Yin JH, Xu XE. Adjuvant EGFR-TKI therapy in resected EGFR-mutation positive non-small cell lung cancer: A real-world study. Front Oncol. (2023) 13:1132854. doi: 10.3389/fonc.2023.1132854

5. Grant C, Nagasaka M. Neoadjuvant EGFR-TKI therapy in Non-Small cell lung cancer. Cancer Treat Rev. (2024) 126:102724. doi: 10.1016/j.ctrv.2024.102724

6. Wu S-G, Shih J-Y. Management of acquired resistance to EGFR TKI-targeted therapy in advanced non-small cell lung cancer. Mol Cancer. (2018) 17:38. doi: 10.1186/s12943-018-0777-1

7. Riely GJ, Wood DE, Ettinger DS, Aisner DL, Akerley W, Bauman JR, et al. Non-small cell lung cancer, version 4.2024, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Network. (2024) 22:249–74. doi: 10.6004/jnccn.2204.0023

8. Shiau CJ, Babwah JP, da Cunha Santos G, Sykes JR, Boerner SL, Geddie WR, et al. Sample features associated with success rates in population-based EGFR mutation testing. J Thorac Oncol. (2014) 9:947–56. doi: 10.1097/JTO.0000000000000196

9. Rolfo C, Mack PC, Scagliotti GV, Baas P, Barlesi F, Bivona TG, et al. Liquid biopsy for advanced non-small cell lung cancer (NSCLC): A statement paper from the IASLC. J Thorac Oncol. (2018) 13:1248–68. doi: 10.1016/j.jtho.2018.05.030

10. Liu Y, Kim J, Qu F, Liu S, Wang H, Balagurunathan Y, et al. CT features associated with epidermal growth factor receptor mutation status in patients with lung adenocarcinoma. Radiology. (2016) 280:271–80. doi: 10.1148/radiol.2016151455

11. Suh YJ, Lee H-J, Kim YJ, Kim KG, Kim H, Jeon YK, et al. Computed tomography characteristics of lung adenocarcinomas with epidermal growth factor receptor mutation: A propensity score matching study. Lung Cancer. (2018) 123:52–9. doi: 10.1016/j.lungcan.2018.06.030

12. Zhang H, Cai W, Wang Y, Liao M, Tian S. CT and clinical characteristics that predict risk of EGFR mutation in non-small cell lung cancer: a systematic review and meta-analysis. Int J Clin Oncol. (2019) 24:649–59. doi: 10.1007/s10147-019-01403-3

13. Chen L, Zhou Y, Tang X, Yang C, Tian Y, Xie R, et al. EGFR mutation decreases FDG uptake in non-small cell lung cancer via the NOX4/ROS/GLUT1 axis. Int J Oncol. (2019) 54:370–80. doi: 10.3892/ijo.2018.4626

14. Abdurixiti M, Nijiati M, Shen R, Ya Q, Abuduxiku N, Nijiati M. Current progress and quality of radiomic studies for predicting EGFR mutation in patients with non-small cell lung cancer using PET/CT images: a systematic review. Br J Radiology. (2021) 94:20201272. doi: 10.1259/bjr.20201272

15. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. (2017) 14:749–62. doi: 10.1038/nrclinonc.2017.141

16. Koyasu S, Nishio M, Isoda H, Nakamoto Y, Togashi K. Usefulness of gradient tree boosting for predicting histological subtype and EGFR mutation status of non-small cell lung cancer on 18F-FDG-PET/CT. Ann Nucl Med. (2020) 34:49–57. doi: 10.1007/s12149-019-01414-0

17. Hinzpeter R, Kulanthaivelu R, Kohan A, Murad V, Mirshahvalad SA, Avery L, et al. Predictive [18F]-FDG PET/CT-based radiogenomics modelling of driver gene mutations in non-small cell lung cancer. Acad Radiology. (2024) 31:5314–23. doi: 10.1016/j.acra.2024.06.038

18. Ma N, Yang W, Wang Q, Cui C, Hu Y, Wu Z. Predictive value of 18F-FDG PET/CT radiomics for EGFR mutation status in non-small cell lung cancer: a systematic review and meta-analysis. Front Oncol. (2024) 14:1281572. doi: 10.3389/fonc.2024.1281572

19. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. (2020) 295:328–38. doi: 10.1148/radiol.2020191145

20. Krarup MMK, Nygård L, Vogelius IR, Andersen FL, Cook G, Goh V, et al. Heterogeneity in tumours: Validating the use of radiomic features on 18F-FDG PET/CT scans of lung cancer patients as a prognostic tool. Radiotherapy Oncol. (2020) 144:72–8. doi: 10.1016/j.radonc.2019.10.012

21. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. (2017) 77:e104–7. doi: 10.1158/0008-5472.CAN-17-0339

22. Kramer AA, Zimmerman JE. Assessing the calibration of mortality benchmarks in critical care: The Hosmer-Lemeshow test revisited. Crit Care Med. (2007) 35:2052–56. doi: 10.1097/01.CCM.0000275267.64078.B0

23. Wang HC, Wang ZM, Hu WD, Liang XQ, Cui LL. Correlation of FDG PET/CT, tumor markers and Ki-67 index with EGFR mutation or positive ALK expression in patients with non-small cell lung cancer. Q J Nucl Med Mol Imaging. (2024) 68:169–75. doi: 10.23736/S1824-4785.24.03535-0

24. Li X, Yin G, Zhang Y, Dai D, Liu J, Chen P, et al. Predictive power of a radiomic signature based on 18F-FDG PET/CT images for EGFR mutational status in NSCLC. Front Oncol. (2019) 9. doi: 10.3389/fonc.2019.01062

25. Guo Y, Zhu H, Che C, Li X, Liu F, Yao Z. A nomogram based on 18F-fluorodeoxyglucose PET/CT and clinical features to predict epidermal growth factor receptor mutation status in patients with lung adenocarcinoma. Quantitative Imaging Med Surgery. (2022) 12:5239–50. doi: 10.21037/qims-22-248

26. Li S, Li Y, Zhao M, Wang P, Xin J. Combination of 18F-fluorodeoxyglucose PET/CT radiomics and clinical features for predicting epidermal growth factor receptor mutations in lung adenocarcinoma. Korean J Radiology. (2022) 23:921. doi: 10.3348/kjr.2022.0295

27. Caicedo C, Garcia-Velloso MJ, Lozano MD, Labiano T, Vigil Diaz C, Lopez-Picazo JM, et al. Role of [18F]FDG PET in prediction of KRAS and EGFR mutation status in patients with advanced non-small-cell lung cancer. Eur J Nucl Med Mol Imaging. (2014) 41:2058–65. doi: 10.1007/s00259-014-2833-4

28. AlGharras A, Kovacina B, Tian Z, Alexander JW, Semionov A, van Kempen LC, et al. Imaging-based surrogate markers of epidermal growth factor receptor mutation in lung adenocarcinoma: A local perspective. Can Assoc Radiologists J. (2020) 71:208–16. doi: 10.1177/0846537119888387

29. Han X, Fan J, Li Y, Cao Y, Gu J, Jia X, et al. Value of CT features for predicting EGFR mutations and ALK positivity in patients with lung adenocarcinoma. Sci Rep. (2021) 11:5679. doi: 10.1038/s41598-021-83646-7

30. Cheng Z, Shan F, Yang Y, Shi Y, Zhang Z. CT characteristics of non-small cell lung cancer with epidermal growth factor receptor mutation: a systematic review and meta-analysis. BMC Med Imaging. (2017) 17:5. doi: 10.1186/s12880-016-0175-3

31. Qiu X, Yuan H, Sima B. Relationship between EGFR mutation and computed tomography characteristics of the lung in patients with lung adenocarcinoma. Thorac Cancer. (2019) 10:170–4. doi: 10.1111/tca.2019.10.issue-2

32. Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinf. (2006) 7:3. doi: 10.1186/1471-2105-7-3

33. Dutschmann T-M, Baumann K. Evaluating high-variance leaves as uncertainty measure for random forest regression. Molecules. (2021) 26:6514. doi: 10.3390/molecules26216514

34. Jia T-Y, Xiong J-F, Li X-Y, Yu W, Xu Z-Y, Cai X-W, et al. Identifying EGFR mutations in lung adenocarcinoma by noninvasive imaging using radiomics features and random forest modeling. Eur Radiology. (2019) 29:4742–50. doi: 10.1007/s00330-019-06024-y

35. Wang S, Shi J, Ye Z, Dong D, Yu D, Zhou M, et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. Eur Respir J. (2019) 53:1800986. doi: 10.1183/13993003.00986-2018

36. Gu Q, Feng Z, Liang Q, Li M, Deng J, Ma M, et al. Machine learning-based radiomics strategy for prediction of cell proliferation in non-small cell lung cancer. Eur J Radiology. (2019) 118:32–7. doi: 10.1016/j.ejrad.2019.06.025

37. Chang C, Zhou S, Yu H, Zhao W, Ge Y, Duan S, et al. A clinically practical radiomics-clinical combined model based on PET/CT data and nomogram predicts EGFR mutation in lung adenocarcinoma. Eur Radiology. (2021) 31:6259–68. doi: 10.1007/s00330-020-07676-x

38. Zhang J, Zhao X, Zhao Y, Zhang J, Zhan Z, Wang J, et al. Value of pre-therapy 18F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur J Nucl Med Mol Imaging. (2020) 47:1137–46. doi: 10.1007/s00259-019-04592-1

39. An N, Zhang Y, Niu H, Li Z, Cai J, Zhao Q, et al. EGFR-TKIs versus taxanes agents in therapy for nonsmall-cell lung cancer patients: A PRISMA-compliant systematic review with meta-analysis and meta-regression. Medicine. (2016) 95:e5601. doi: 10.1097/MD.0000000000005601

40. Cui Y, Liu H, Ren J, Du X, Xin L, Li D, et al. Development and validation of a MRI-based radiomics signature for prediction of KRAS mutation in rectal cancer. Eur Radiology. (2020) 30:1948–58. doi: 10.1007/s00330-019-06572-3

41. Zhang G, Cao Y, Zhang J, Ren J, Zhao Z, Zhang X, et al. Predicting EGFR mutation status in lung adenocarcinoma: development and validation of a computed tomography-based radiomics signature. Am J Cancer Res. (2021) 11:546–60.

Keywords: nomogram, non-small cell lung cancer, PET/CT, machine learning, epidermal growth factor receptor

Citation: Li J, Shi Q, Yang Y, Xie J, Xie Q, Ni M and Wang X (2025) Prediction of EGFR mutations in non-small cell lung cancer: a nomogram based on 18F-FDG PET and thin-section CT radiomics with machine learning. Front. Oncol. 15:1510386. doi: 10.3389/fonc.2025.1510386

Received: 12 October 2024; Accepted: 14 March 2025;
Published: 02 April 2025.

Edited by:

Xin Tang, Hangzhou Wuyunshan Hospital, China

Reviewed by:

Hailin Tang, Sun Yat-sen University Cancer Center (SYSUCC), China
Xiaoliang Shao, the Third Affiliated Hospital of Soochow University, China

Copyright © 2025 Li, Shi, Yang, Xie, Xie, Ni and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xuemei Wang, wangxuemei201010@163.com; Ming Ni, niming1984@ustc.edu.cn

†These authors have contributed equally to this work
