Machine learning based ultrasomics noninvasive predicting EGFR expression status in hepatocellular carcinoma patients

Ma, Yujing; Duan, Shaobo; Ren, Shanshan; Bu, Didi; Li, Yahong; Cai, Xiguo; Zhang, Lianzhong

doi:10.3389/fmed.2024.1483291

ORIGINAL RESEARCH article

Front. Med., 19 November 2024

Sec. Hepatobiliary Diseases

Volume 11 - 2024 | https://doi.org/10.3389/fmed.2024.1483291

Yujing Ma¹

Shaobo Duan²

Shanshan Ren³

Didi Bu⁴

Yahong Li⁴

Xiguo Cai⁵

Lianzhong Zhang¹,⁶*

¹Henan University People’s Hospital, Henan Provincial People’s Hospital, Zhengzhou, China
²Department of Health Management, Henan Provincial People’s Hospital, Zhengzhou, China
³Department of Ultrasound, Henan Provincial People’s Hospital, Zhengzhou, China
⁴Zhengzhou University People’s Hospital, Henan Provincial People’s Hospital, Zhengzhou, China
⁵Henan Rehabilitation Clinical Medical Research Center, Henan Provincial People’s Hospital, Zhengzhou, China
⁶Henan International Joint Laboratory of Ultrasonic Nanotechnology and Artificial Intelligence in Precision Theragnostic Systems, Henan Provincial People’s Hospital, Zhengzhou, China

Objective: To investigate the ability of ultrasomics to noninvasively predict epidermal growth factor receptor (EGFR) expression status in patients with hepatocellular carcinoma (HCC).

Methods: A total of 198 HCC patients were included in the study (n = 138 in the training dataset and n = 60 in the test dataset). EGFR expression was detected by immunohistochemistry. Ultrasomics features were extracted from gray-scale ultrasound images. Intra-class correlation coefficient (ICC) screening, variance filtering, the mutual information method, and the extreme gradient boosting (XGBoost) embedding method were applied to select the best features. Five machine learning algorithms, namely random forest (RF), XGBoost, support vector machine (SVM), decision tree (DT), and logistic regression (LR), were used to construct clinical models, ultrasomics models, and clinical-ultrasomics combined models. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, decision curve analysis (DCA), and calibration curves were used to assess the predictive performance of the models.

Results: In 198 patients, high EGFR expression was observed in 100 patients and low EGFR expression was observed in 98 patients. The RF machine learning ultrasomics model was found to perform well, with the AUC of the training and test dataset being 0.929 (95%CI, 0.874–0.966) and 0.807 (95%CI, 0.684–0.897) respectively, the sensitivity being 0.843 and 0.767 respectively, the specificity being 0.857 and 0.800 respectively, and the accuracy being 0.850 and 0.783, respectively. The predictive performance of the combined model established by integrating ultrasomics features and clinical baseline characteristics was improved, with the AUC, sensitivity, specificity, and accuracy of the RF machine learning combined model for the training and test dataset reaching 0.937 (95%CI, 0.884–0.971), 0.822 (95%CI, 0.702–0.909); 0.857, 0.833; 0.857, 0.800; 0.857, 0.817, respectively.

Conclusion: To predict the status of EGFR expression in HCC patients, the ultrasomics model and combined model created by five machine learning algorithms can be utilized as efficient and noninvasive techniques, and the ultrasomics model and combined model established by RF classifier have the best predictive performance.

1 Introduction

Hepatocellular carcinoma (HCC) is one of the three leading cancers with the lowest survival rates worldwide (1). In China, liver cancer carries a heavy disease burden and has an insidious onset; most HCC patients are at an advanced stage at presentation, particularly those with cirrhosis or severe liver fibrosis, and have often lost the opportunity for surgical resection (2). Tyrosine kinase inhibitors (TKIs) and other systemic treatments have become the preferred choice for patients with advanced hepatocellular carcinoma (aHCC) (3, 4). However, the molecular biological and genetic changes that arise during the division of cancer cells endow HCC with heterogeneous characteristics (5, 6), which affect the therapeutic response and prognosis of patients (7).

Epidermal growth factor receptor (EGFR) is located on the cell membrane surface and is a receptor for cell proliferation and signal transduction, and its expression status is related to tumor progression and prognosis (8, 9). EGFR high expression (EGFR^high) status activates more downstream signaling pathways and promotes proliferation, metastasis and invasiveness of tumor cells, resulting in poor tumor prognosis (10, 11). EGFR is highly expressed in 40–70% of HCC patients, and it has been shown that HCC patients with EGFR^high have a poor prognosis and have a shorter survival time than those with low EGFR expression (EGFR^low) (12, 13). A recent study published in Nature found that HCC patients with EGFR^high were more likely to develop resistance to TKIs, particularly lenvatinib (12). Only in HCC patients with EGFR^high, lenvatinib induces the feedback activation of EGFR and its downstream PAK2-ERK5 signaling pathway by inhibiting FGFR and downstream ERK1/2 (14), and simultaneously activates the downstream signaling pathway MEK1/2-ERK1/2, which is common with FGFR, resulting in strong proliferation ability of HCC cells while lenvatinib was administered. EGFR inhibitors effectively blocked feedback activation, and combined with Lenvatinib, produced synergistic antitumor effects, indicating that HCC patients with EGFR^high could benefit from this combination (12, 15). Thus, prediction of EGFR expression status not only allows assessment of HCC prognosis, but also enables precise treatment strategies for risk stratification of patients.

Determining EGFR expression requires immunohistochemical analysis of surgical resection specimens or biopsies. However, such an invasive and poorly reproducible approach is not suitable for patients with aHCC. Radiomics performs feature extraction and deep mining of standard medical images; it not only quantifies image features but also analyzes the molecular phenotype of tumor cells, allowing tumor heterogeneity to be explored in a noninvasive and reproducible manner (16, 17). Previous studies have reported that radiomics features based on computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound (US) can noninvasively characterize biomarkers such as cytokeratin 19 (CK19), vascular endothelial growth factor receptor (VEGFR), and P53, with promising predictive results (18–20). To date, there are few reports on the use of radiomics features to predict EGFR expression in patients with HCC. Because ultrasound is noninvasive, radiation-free, highly repeatable, and reasonably priced, it is one of the most commonly used techniques for liver examination (20). The current study therefore investigates the value of ultrasomics features based on gray-scale ultrasound images for noninvasive prediction of EGFR expression status in patients with HCC, with the aim of providing more objective evidence for the precise treatment of aHCC.

2 Materials and methods

2.1 Study population

A total of 735 HCC patients who underwent surgical resection at Henan Provincial People’s Hospital from January 2021 to December 2023 were retrospectively analyzed. Inclusion criteria: (1) pathological diagnosis of HCC; (2) liver ultrasound examination within 4 weeks before surgery; (3) complete ultrasound image and clinical data. Exclusion criteria: (1) previous local or systemic treatment or liver transplantation; (2) tumors in other organs; (3) poor image quality or incomplete lesion display. Finally, 198 HCC patients were included in the study. These 198 patients were randomly split, with stratification, at a 7:3 ratio into a training dataset (n = 138) and a test dataset (n = 60). Class imbalance in the training dataset was addressed using the Synthetic Minority Over-sampling Technique (SMOTE) (21).
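
The split and oversampling steps can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the ceiling rounding rule for the per-class test share, the helper names `stratified_split` and `smote_like`, the random seeds, and the two-dimensional toy feature vectors are all assumptions made for illustration. Only the class sizes (100 EGFR-high, 98 EGFR-low) come from the article.

```python
import random

def stratified_split(labels, test_num=3, test_den=10, seed=0):
    """Split indices class by class so each class sends about 30% to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Ceiling of the class's test share (an assumed rounding rule).
        n_test = (len(idx) * test_num + test_den - 1) // test_den
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

# Class sizes from the article: 100 EGFR-high (1) and 98 EGFR-low (0).
labels = [1] * 100 + [0] * 98
train_idx, test_idx = stratified_split(labels)
n_train_high = sum(labels[i] for i in train_idx)
n_train_low = len(train_idx) - n_train_high
print(len(train_idx), len(test_idx), n_train_high, n_train_low)

# Hypothetical 2-D feature vectors for the minority (EGFR-low) training cases.
rng = random.Random(1)
minority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_train_low)]
extra = smote_like(minority, n_new=n_train_high - n_train_low)
print(len(minority) + len(extra))
```

Under these assumptions the split yields 138 training and 60 test cases, and only two synthetic minority samples are needed to balance the training classes.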

Age, gender, maximum tumor diameter, tumor number, Child-Pugh class (A/B/C), cirrhosis (yes/no), HBsAg/HBcAb (positive/negative), portal hypertension (yes/no), Edmondson-Steiner grade, alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TBIL), γ-glutamyl transpeptidase (GGT), serum alpha-fetoprotein (AFP), neutrophil-to-lymphocyte ratio (NLR), and other clinical data were obtained from medical records. A flow chart of patient selection is shown in Figure 1.

Figure 1. The patients were screened and enrolled according to the established exclusion criteria.

2.2 EGFR immunohistochemical analysis

Liver tumors were surgically excised from all patients. The preparation of the immunohistochemical sections is described in Supplementary material 1.

Blinded to the patients’ information, two observers assessed the membrane staining intensity of each section and the percentage of cells at each staining intensity under a light microscope, and scores were calculated using the H-score formula; any disagreement was resolved by a third observer. Staining intensity was graded into four levels: 0, no staining; 1+, weak staining (light brown membrane staining); 2+, moderate staining (between 1+ and 3+); and 3+, strong staining (dark brown linear membrane staining) (22). The H-score formula is: 1 × (% of 1+ cells) + 2 × (% of 2+ cells) + 3 × (% of 3+ cells) (23). Scores range from 0 to 300, and the threshold was set at 200: HCC patients were divided into a low-expression group (H < 200) and a high-expression group (H ≥ 200) according to their scores (12, 22) (Figures 2C,F).
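
As a worked illustration of the H-score formula and the 200-point cut-off, the following short sketch computes the score for one hypothetical section. The function names and the example percentages are illustrative assumptions, not values from the study.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """H-score = 1*(% 1+ cells) + 2*(% 2+ cells) + 3*(% 3+ cells); range 0-300."""
    if min(pct_weak, pct_moderate, pct_strong) < 0:
        raise ValueError("percentages must be non-negative")
    if pct_weak + pct_moderate + pct_strong > 100:
        raise ValueError("stained percentages cannot exceed 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong

def egfr_group(score, threshold=200):
    """Apply the study's cut-off: H >= 200 is high expression."""
    return "EGFR-high" if score >= threshold else "EGFR-low"

# Hypothetical section: 10% weak, 30% moderate, 50% strong, 10% unstained.
score = h_score(10, 30, 50)
print(score, egfr_group(score))
```

Here the score is 10 + 60 + 150 = 220, so this hypothetical section would fall in the high-expression group.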

Figure 2. Representative images of lesion segmentation (arrows) and corresponding pathological images of two HCC patients. (A–C) show the gray-scale ultrasound image, lesion segmentation image, and EGFR^low pathological image of a 63-year-old male patient (H < 200); (D–F) show the gray-scale ultrasound image, lesion segmentation image, and EGFR^high pathological image of a 55-year-old male patient (H ≥ 200).

2.3 Image acquisition

Image scans were performed by physicians with over 8 years of abdominal ultrasound experience, and the following ultrasound imaging features were qualitatively assessed: (1) lesion margin (clear/unclear); (2) lesion echo (hypo-/iso-/hyperechoic). The image showing the largest tumor diameter was stored in Digital Imaging and Communications in Medicine (DICOM) format for further study (Figures 2A,D). Ultrasonographic parameters are presented in Supplementary material 2.

2.4 Image segmentation

HCC lesions were defined as regions of interest (ROIs). All ultrasound images were imported into ITK-SNAP (version 3.8.0, www.itksnap.org). Lesions were delineated independently by two sonographers with 10 and 15 years of experience, and the delineations were confirmed by a senior sonographer with 25 years of experience; all three physicians were blinded to the patients’ clinical data to limit the effect of inter- and intra-observer differences on the results. Thirty ultrasound images were randomly selected to assess interobserver reproducibility. The intra-class correlation coefficient (ICC) was used to evaluate each feature, and features with ICC ≥ 0.80 were considered to have good agreement (20, 24) and were retained to improve feature repeatability. Segmented images of the lesions are shown in Figures 2B,E.
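
The article does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) for one feature measured by two observers; the choice of this form, the function name `icc_2_1`, and the five feature values are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per image, one column per observer.
    """
    n = len(ratings)      # number of images
    k = len(ratings[0])   # number of observers
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical values of one feature from two observers on five images.
feature_by_observer = [
    [10.1, 10.3],
    [12.0, 11.8],
    [9.5, 9.7],
    [14.2, 14.0],
    [11.1, 11.3],
]
icc = icc_2_1(feature_by_observer)
print(f"ICC = {icc:.3f}, reproducible = {icc >= 0.80}")
```

A feature would be kept when its ICC reaches the 0.80 threshold used in the study; in practice the calculation is repeated for every extracted feature.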

2.5 Feature extraction

Before feature extraction, raw images were preprocessed with 14 filters to obtain the corresponding derived images and to reduce the impact of different ultrasound devices on the features. PyRadiomics 2.1.2, an open-source software package, was used to extract quantitative features from all raw and derived images. Details of the feature extraction are presented in Supplementary material 3.

2.6 Feature selection

After all features were extracted, missing values for each feature were filled with the feature mean. High-dimensional feature sets are prone to low computational efficiency and overfitting (16, 25). Z-score normalization was used to eliminate differences in scale before feature selection. Features with ICC ≥ 0.8 were selected first, as these were considered reproducible. Features with zero variance (i.e., features that contribute nothing to classification) were then removed by variance filtering. Linear and nonlinear associations between features and labels were captured using the mutual information method, and features with a maximal information coefficient (MIC) of zero were excluded. Finally, the most valuable ultrasomics features were selected using the XGBoost embedding method.
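
The normalization, variance filtering, and mutual information steps can be sketched with the standard library as follows. This is a simplified stand-in: the equal-width binning estimator of mutual information, the tolerance `1e-12`, and the three toy features are assumptions, and the study's actual estimator and software settings are not described in the text.

```python
import math
from collections import Counter

def zscore(column):
    """Standardise one feature column; constant columns map to all zeros."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    if var == 0:
        return [0.0] * len(column)
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in column]

def variance(column):
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def mutual_information(column, labels, bins=2):
    """Plug-in MI (in nats) between an equal-width-binned feature and labels."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0
    binned = [min(int((x - lo) / width), bins - 1) for x in column]
    n = len(labels)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum(
        (c / n) * math.log((c / n) / ((px[b] / n) * (py[y] / n)))
        for (b, y), c in joint.items()
    )

# Toy data: 1 = EGFR-high, 0 = EGFR-low (hypothetical values).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "constant": [5.0] * 8,
    "informative": [0.1, 0.2, 0.15, 0.3, 0.9, 1.1, 1.0, 0.8],
    "uninformative": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
}

kept = {}
for name, column in features.items():
    z = zscore(column)
    if variance(z) == 0:                       # variance filtering
        continue
    if mutual_information(z, labels) > 1e-12:  # drop zero-MI features
        kept[name] = z
print(list(kept))
```

In this toy example only the feature that separates the two classes survives both filters; the constant feature fails the variance filter and the alternating feature carries no information about the label.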

2.7 Modeling and performance evaluation

Five machine learning algorithms (RF, XGBoost, SVM, DT, and LR) were used to construct clinical models, ultrasomics models, and clinical-ultrasomics combined models, for a total of 15 models.

First, univariate analysis was performed to compare characteristics between the EGFR^high and EGFR^low groups, including clinical data [age, gender, maximum tumor diameter, tumor number, Child-Pugh class (A/B/C), HBsAg/HBcAb (positive/negative), cirrhosis (yes/no), portal hypertension (yes/no), Edmondson-Steiner grade, ALT, AST, TBIL, GGT, NLR, AFP] and qualitative imaging characteristics [lesion margin (clear/unclear), lesion echo (hypo-/iso-/hyperechoic)]. Variables with p < 0.05 in the univariate analysis were entered into univariate and multivariate logistic regression to identify independent predictors. These independent predictors were used to construct the clinical model with each of the five machine learning algorithms.

The most valuable ultrasomics features extracted were used to construct the ultrasomics model through five machine learning algorithms. Finally, ultrasomics features were fused with clinical baseline features to build five combined models to investigate whether the accuracy of the model in predicting EGFR expression status could be improved.

The predictive ability of each model was evaluated using the area under the curve (AUC) together with sensitivity, specificity, and accuracy. To evaluate the clinical practicability and efficiency of the models, decision curve analysis (DCA) and calibration curve analysis were employed. Model construction and evaluation were performed in Python with the scikit-learn 0.23.2 package. The workflow is illustrated in Figure 3.
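
To make the evaluation metrics concrete, the sketch below computes AUC, sensitivity, specificity, and accuracy from first principles on toy data. The study used scikit-learn for this; the pure-Python functions, the 0.5 classification threshold, and the eight toy scores are assumptions chosen only to keep the example self-contained.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed probability threshold."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

# Toy predictions: 1 = EGFR-high, 0 = EGFR-low (hypothetical scores).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = auc_score(y_true, y_score)
metrics = classification_metrics(y_true, y_score)
print(auc, metrics)
```

AUC is threshold-free, whereas sensitivity, specificity, and accuracy depend on the cut-off applied to the predicted probability, which is why the article reports both kinds of measure.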

Figure 3. The ultrasomics workflow and study flowchart. (A) Clinical data. (B) Image segmentation, feature extraction and selection. (C) Model building. (D) Model evaluation.

2.8 Statistical analysis

Statistical analysis was conducted using SPSS 26.0 and R 4.4.1. Continuous variables that were normally distributed were assessed using the independent sample t-test, while non-normal distributions were evaluated using a Mann–Whitney U test. Categorical variables were assessed using the chi-square test or Fisher’s exact test. A p-value of less than 0.05 was considered to indicate statistical significance.
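
For readers unfamiliar with the categorical test named above, the following sketch computes a Pearson chi-square statistic and p-value for a 2 × 2 table. The study ran its tests in SPSS and R; this Python version, the absence of a continuity correction, and the example counts are illustrative assumptions.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = two groups, columns = feature present / absent.
chi2, p_value = chi_square_2x2(30, 10, 15, 25)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, significant = {p_value < 0.05}")
```

When any expected cell count is small, Fisher’s exact test is the usual alternative, as noted in the text.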

3 Results

3.1 Clinical features

In the present study, 198 patients were included, with an average age of 57.07 ± 9.02 years, of whom 77.8% (n = 154) were male. There were no statistically significant differences in EGFR expression status or clinical baseline characteristics between the training and test datasets (p > 0.05). Table 1 summarizes the clinical baseline characteristics of all patients.

Table 1. Patient clinical baseline characteristics of the training and test datasets.

3.2 Feature extraction and selection

A total of 1,409 features were extracted from the original and derived images. First, 285 features with ICC less than 0.8 were excluded. Of the remaining 1,124 features, 16 with zero variance and 495 with zero mutual information were eliminated by variance filtering and the mutual information method, respectively, leaving 613 features. A further 602 features were excluded using the XGBoost embedded method, ultimately identifying 11 of the most valuable ultrasomics features, including original, shape, first-order, second-order texture, square, exponential, gradient, and higher-order (e.g., wavelet) features. The feature extraction and selection process is detailed in Supplementary materials 3, 4.
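
The feature counts reported above can be tallied step by step. The sketch below uses only the numbers given in this section; the variable names are illustrative.

```python
extracted = 1409
steps = [
    ("ICC < 0.8", 285),
    ("zero variance", 16),
    ("zero mutual information", 495),
    ("XGBoost embedded selection", 602),
]

remaining = extracted
history = []
for reason, removed in steps:
    remaining -= removed
    history.append(remaining)
    print(f"after removing {removed:>3} ({reason}): {remaining}")
```

The running totals are 1,124, 1,108, 613, and 11, which agrees with the 11 features carried forward to modeling.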

3.3 Predictive performance of clinical models

Univariate analysis of characteristics between the EGFR^high and EGFR^low groups showed significant differences in three clinical variables (age, AFP, NLR) and one qualitative imaging feature (lesion margin) (p < 0.05). Univariate and multivariate logistic regression analyses of these variables identified age (OR = 0.958, 95% CI 0.926–0.991, p = 0.013) and lesion margin (OR = 2.114, 95% CI 1.164–3.839, p = 0.014) as independent predictors (Table 2). These variables were used to construct clinical models with the five machine learning algorithms, of which the RF and XGBoost classifiers performed best. In the test dataset, the AUCs of the RF and XGBoost clinical models were 0.713 (95%CI, 0.582–0.823) and 0.733 (95%CI, 0.603–0.839), respectively (Figures 4A,D); sensitivity was 0.700 and 0.700, specificity was 0.767 and 0.633, and accuracy was 0.733 and 0.667, respectively. The RF clinical model had higher specificity and accuracy than the XGBoost clinical model, indicating that the RF classifier produced the clinical model with the better overall predictive performance. In the test dataset, the RF clinical model correctly identified 21 of 30 EGFR^high patients and 23 of 30 EGFR^low patients (Figure 5).

Table 2. Univariate and multivariate assessments of variables related to EGFR expression status.

Figure 4. (A) clinical model of the RF algorithm. (B) ultrasomics model of the RF algorithm. (C) combined model of the RF algorithm. (D) clinical model of the XGBoost algorithm. (E) AUC comparison of five machine learning algorithms constructed as combined models in the training dataset. (F) AUC comparison of five machine learning algorithms constructed as combined models in the test dataset. (G) DCA of three RF algorithm models. (H) The calibration curve for the RF algorithm’s combined model.

Figure 5. Number of true positive, false positive, true negative, and false negative events in the training and test dataset for clinical models, ultrasomics models, and combined models of RF algorithm.

3.4 Predictive performance of ultrasomics and combined models

The 11 most valuable ultrasomics features were used to build ultrasomics models with the five machine learning algorithms. The RF ultrasomics model performed well in predicting EGFR expression in HCC patients, with AUCs of 0.929 (95%CI, 0.874–0.966) and 0.807 (95%CI, 0.684–0.897) (Figure 4B), sensitivities of 0.843 and 0.767, specificities of 0.857 and 0.800, and accuracies of 0.850 and 0.783 in the training and test datasets, respectively. In the test dataset, the RF ultrasomics model correctly identified 23 of 30 EGFR^high patients and 24 of 30 EGFR^low patients (Figure 5).

Finally, ultrasomics features were fused with clinical baseline features to build combined models, which further improved predictive performance. In the test dataset, the AUCs of the combined models built with the five machine learning algorithms were 0.822 (RF), 0.811 (XGBoost), 0.753 (DT), 0.751 (SVM), and 0.733 (LR); the RF combined model outperformed the others (Figures 4C,F). In the test dataset, the RF combined model correctly identified 25 of 30 EGFR^high patients and 24 of 30 EGFR^low patients (Figure 5). Performance measures for the models are shown in Table 3.
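
The test-set sensitivity, specificity, and accuracy of the three RF models follow directly from the counts of correctly identified patients reported in Sections 3.3 and 3.4. The sketch below recomputes them; the dictionary layout and variable names are illustrative.

```python
# (correctly identified EGFR-high, correctly identified EGFR-low), each out of 30
rf_test_counts = {
    "clinical": (21, 23),
    "ultrasomics": (23, 24),
    "combined": (25, 24),
}
n_high = n_low = 30

results = {}
for name, (true_pos, true_neg) in rf_test_counts.items():
    results[name] = {
        "sensitivity": true_pos / n_high,
        "specificity": true_neg / n_low,
        "accuracy": (true_pos + true_neg) / (n_high + n_low),
    }
    r = results[name]
    print(f"{name}: sensitivity={r['sensitivity']:.3f} "
          f"specificity={r['specificity']:.3f} accuracy={r['accuracy']:.3f}")
```

The recomputed values (0.700/0.767/0.733, 0.767/0.800/0.783, and 0.833/0.800/0.817) match the figures reported for the clinical, ultrasomics, and combined RF models.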

Table 3. Predicted results of clinical models, ultrasomics models, and combined models.

Among the five machine learning algorithms, the three RF models demonstrated the best predictive performance. Of these, the combined model showed the highest clinical net benefit, indicating greater applicability in clinical practice (Figure 4G). In addition, its calibration curve showed good agreement between the predicted and actual EGFR expression status (Figure 4H), indicating more stable predictive performance.
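
Decision curve analysis compares models by their net benefit across threshold probabilities. A minimal sketch of the standard net benefit formula is given below; the toy labels and predicted probabilities are hypothetical, and the article does not report the thresholds or software used for its curves.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(y_true)
    tp = sum(y == 1 and p >= threshold for y, p in zip(y_true, y_prob))
    fp = sum(y == 0 and p >= threshold for y, p in zip(y_true, y_prob))
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that labels every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data: 1 = EGFR-high, 0 = EGFR-low (hypothetical probabilities).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero by definition.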

4 Discussion

HCC is the most common type of liver cancer; it has a poor prognosis, with a 5-year survival rate of 18%, and is one of several malignancies with a high fatality rate worldwide (3). Multiple molecular biomarkers, such as EGFR, Ki-67, VEGF, and P53, have been identified as major factors involved in HCC progression and prognosis (24, 26–28). Of these, EGFR is often highly expressed in HCC and is involved in the proliferation, invasion, and metastasis of tumor cells, resulting in a poor prognosis (29–31). In the present study, EGFR was highly expressed in 50.5% of HCC patients, which is close to previous findings (8, 10).

Recently, some studies have found that EGFR^high in liver cancer cells is associated with resistance to targeted agents such as lenvatinib (32, 33). Lenvatinib has been approved by the Food and Drug Administration (FDA) as a first-line treatment for aHCC. However, the objective response rate was only 24.1%, indicating that lenvatinib needs to be combined with other drugs to improve its clinical benefit (34, 35). Jin et al. (12) found that, in HCC patients with high EGFR expression, lenvatinib induced feedback activation of EGFR and its downstream signaling pathways by inhibiting FGFR, so that HCC cells retained strong proliferative capacity. EGFR inhibitors could block the feedback-activated signaling pathways, enhancing the antitumor effect. A clinical trial initiated by Renji Hospital (NCT04642547) recruited aHCC patients with high EGFR expression for combination treatment with lenvatinib and EGFR inhibitors, and the clinical response rate reached 50% (12, 15, 36). Therefore, noninvasive identification of HCC patients with EGFR^high is important for therapeutic management.

In the present study, we compared the predictive performance of clinical models, ultrasomics models, and combined models constructed with five machine learning algorithms for noninvasive prediction of EGFR expression in HCC patients. The findings demonstrated that ultrasomics models built with all five algorithms could differentiate EGFR expression status in HCC patients in both the training and test datasets (p < 0.05), and that the ultrasomics model and the combined model constructed with the RF classifier outperformed the others in terms of predictive performance (Figures 4E,F; Table 3).

The ultrasomics models developed with the five machine learning algorithms showed good predictive ability, with the RF classifier performing best. Compared with the RF clinical model, the AUC of the RF ultrasomics model increased from 0.846 (95%CI, 0.776–0.902) to 0.929 (95%CI, 0.874–0.966) in the training dataset and from 0.713 (95%CI, 0.582–0.823) to 0.807 (95%CI, 0.684–0.897) in the test dataset. This improvement arises because ultrasomics can extract more features associated with tumor heterogeneity from images and assess them quantitatively (17, 37, 38). Wu et al. developed a radiomics model based on energy-enhanced CT to predict EGFR expression status in peripheral lung cancer (39). Features such as the arterial-phase Laplacian of Gaussian filter GLSZM Small Area Low Gray Level Emphasis, the wavelet HHL gray level co-occurrence matrix (GLCM) MCC, and the venous-phase wavelet LHL first-order root mean square were extracted. A multiphasic model based on the features from both phases showed good predictive performance (AUC 0.950). These results suggest that imaging features, especially higher-order features, can better predict EGFR expression. In this study, 7 of the 11 best features were higher-order features obtained by wavelet filtering, which indicates that higher-order features capture more EGFR-related information, consistent with previous research. The seven wavelet features were derived primarily from gray-level size-zone matrix (GLSZM) features and first-order features. GLSZM features describe the spatial distribution of gray level values and the size of regions in the image, while first-order features mainly describe the symmetry, uniformity, and distribution of voxel intensities (38, 40).

Among the 11 features, the wavelet HLH first-order Minimum, wavelet LHL first-order Median, square gray-level run length matrix (GLRLM) RunEntropy, and wavelet HHL GLSZM Size Zone NonUniformity features had the highest coefficients (Figure 6). The square filter squares each pixel value in the image to enhance the contrast of gray level values (41). These results indicate that wavelet features could increase the predictive value of radiomic features, are more sensitive to tumor heterogeneity, and can be used to predict EGFR expression status (42). When clinical baseline characteristics were incorporated into the ultrasomics model, the RF combined model demonstrated better predictive performance, with AUCs of 0.937 (95%CI, 0.884–0.971) for the training dataset and 0.822 (95%CI, 0.702–0.909) for the test dataset, a slight improvement over the ultrasomics model alone. Therefore, the ultrasomics model and combined model established by the RF classifier can better predict EGFR expression status in HCC patients.

Figure 6. Weighting coefficients of the 11 ultrasomics features.

On multivariate logistic regression, the qualitative lesion margin on ultrasound images was an independent predictor of EGFR expression status. An unclear lesion boundary was a risk factor, which may be related to the greater aggressiveness and proliferative ability of EGFR^high tumors leading to changes in lesion morphology (40, 43). In univariate analysis, serum AFP and NLR differed significantly between groups (p < 0.05), and correlation analysis showed that the correlation coefficients (r) of AFP and NLR with EGFR expression status were 0.146 and 0.154, respectively (p < 0.05) (Supplementary Figure S3), indicating only a weak correlation between serum AFP or NLR and EGFR expression status. Fan et al. predicted VEGF expression in HCC patients based on MRI radiomics, and multivariate logistic regression analysis showed that AFP, NLR, and irregular lesion boundaries were independent predictors; the AUCs in the training and test datasets were 0.709 and 0.725 for the clinical model and 0.892 and 0.800 for the radiomics model (44). In this study, the AUC of the RF clinical model was 0.846 (95%CI, 0.776–0.902) and 0.713 (95%CI, 0.582–0.823) in the training and test datasets, respectively, lower than that of the ultrasomics model (AUC 0.929 and 0.807). These data suggest that clinical baseline characteristics have limited predictive power for molecular tumor phenotypes.

This study has several limitations. First, owing to the limited availability of EGFR immunohistochemical testing in patients, this was a single-center study without external validation, which limits the generalizability of the predictive model. Second, manual segmentation was time-consuming and inefficient; convenient, efficient, and repeatable automatic segmentation software, clinically validated, is needed in the future. Third, the baseline images were acquired on different ultrasound instruments; although the images were preprocessed before feature extraction, confounding factors may still have affected the results. Finally, the present study analyzed only gray-scale ultrasound and did not incorporate contrast-enhanced ultrasound, elastography, or other imaging modalities. In future work, we will explore multimodal imaging radiomics for predicting EGFR expression levels in HCC patients.

5 Conclusion

In conclusion, ultrasomics models based on gray-scale ultrasound images and constructed with five machine learning algorithms can be used as noninvasive and effective tools to predict EGFR expression status in HCC patients. Furthermore, the ultrasomics model and combined model established by the RF classifier have the best predictive performance. The present study provides a new noninvasive method for predicting EGFR expression status and guiding precise treatment in patients with aHCC.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Medical Ethics Committee of Henan Provincial People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because the study was retrospective.

Author contributions

YM: Software, Methodology, Investigation, Formal analysis, Data curation, Writing – review & editing, Writing – original draft. SD: Supervision, Writing – review & editing, Funding acquisition. SR: Writing – review & editing, Software. DB: Writing – review & editing, Data curation. YL: Writing – review & editing, Data curation. XC: Writing – review & editing, Funding acquisition. LZ: Writing – review & editing, Supervision, Funding acquisition.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was sponsored by the National Natural Science Foundation of China (grant no. 82371987), the Key Research and Development Program of Henan Province (no. 221111310400), the Henan Rehabilitation Clinical Medical Research Center and Medical Appropriate Technology Promotion Project of Henan Province (no. SYJS2022018), and the Science and Technology Breakthrough Plan Project of Henan Province (no. 242102311104).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2024.1483291/full#supplementary-material

References

1. Siegel, RL, Miller, KD, Wagle, NS, and Jemal, A. Cancer statistics, 2023. CA Cancer J Clin. (2023) 73:17–48. doi: 10.3322/caac.21763

2. Cao, MD, Wang, H, Shi, JF, Bai, FZ, Cao, MM, Wang, YT, et al. Liver cancer disease burden in the Chinese population: an updated meta-analysis of evidence from multiple trial sources. Chin J Epidemiol. (2020) 11:271. doi: 10.3760/cma.j.cn112338-20200306-00271

3. Vogel, A, Meyer, T, Sapisochin, G, Salem, R, and Saborowski, A. Hepatocellular carcinoma. Lancet. (2022) 400:1345–62. doi: 10.1016/S0140-6736(22)01200-4

4. Gordan, JD, Kennedy, EB, Abou-Alfa, GK, Beg, MS, Brower, ST, Gade, TP, et al. Systemic therapy for advanced hepatocellular carcinoma: ASCO guideline. J Clin Oncol. (2020) 38:4317–45. doi: 10.1200/JCO.20.02672

5. Fu, J, and Wang, H. Precision diagnosis and treatment of liver cancer in China. Cancer Lett. (2018) 412:283–8. doi: 10.1016/j.canlet.2017.10.008

6. Li, L, and Wang, H. Heterogeneity of liver cancer and personalized therapy. Cancer Lett. (2016) 379:191–7. doi: 10.1016/j.canlet.2015.07.018

7. Chen, S, Cao, Q, Wen, W, and Wang, H. Targeted therapy for hepatocellular carcinoma: challenges and opportunities. Cancer Lett. (2019) 460:1–9. doi: 10.1016/j.canlet.2019.114428

8. Ito, Y, Takeda, T, Sakon, M, Tsujimoto, M, Higashiyama, S, Noda, K, et al. Expression and clinical significance of erb-B receptor family in hepatocellular carcinoma. Br J Cancer. (2001) 84:1377–83. doi: 10.1054/bjoc.2000.1580

9. Arteaga, CL, and Engelman, JA. ERBB receptors: from oncogene discovery to basic science to mechanism-based Cancer therapeutics. Cancer Cell. (2014) 25:282–303. doi: 10.1016/j.ccr.2014.02.025

10. Lanaya, H, Natarajan, A, Komposch, K, Li, L, Amberg, N, Chen, L, et al. EGFR has a tumour-promoting role in liver macrophages during hepatocellular carcinoma formation. Nat Cell Biol. (2014) 16:972–81. doi: 10.1038/ncb3031

11. Nikolova, D, Chalovska, V, Ivanova, MG, Nikolovska, E, Volkanovska, A, Orovchanec, N, et al. Immunohistochemical expression of epidermal growth factor receptor in hepatocellular carcinoma. Prilozi. (2018) 39:21–8. doi: 10.2478/prilozi-2018-0038

12. Jin, H, Shi, Y, Lv, Y, Yuan, S, Ramirez, CFA, Lieftink, C, et al. EGFR activation limits the response of liver cancer to lenvatinib. Nature. (2021) 595:730–4. doi: 10.1038/s41586-021-03741-7

13. Qin, LX, and Tang, ZY. The prognostic significance of clinical and pathological features in hepatocellular carcinoma. World J Gastroenterol. (2002) 8:193–9. doi: 10.3748/wjg.v8.i2.193

14. Vaseva, AV, Blake, DR, Gilbert, TSK, Ng, S, Hostetter, G, Azam, SH, et al. KRAS suppression-induced degradation of MYC is antagonized by a MEK5-ERK5 compensatory mechanism. Cancer Cell. (2018) 34:807–822.e7. doi: 10.1016/j.ccell.2018.10.001

15. Wang, Y, Kui, L, and Wang, G. Combination therapy for HCC: from CRISPR screening to the design of clinical therapies. Signal Transduct Target Ther. (2021) 6:359. doi: 10.1038/s41392-021-00775-1

16. Guiot, J, Vaidyanathan, A, Deprez, L, Zerka, F, Danthine, D, Frix, AN, et al. A review in radiomics: making personalized medicine a reality via routine imaging. Med Res Rev. (2022) 42:426–40. doi: 10.1002/med.21846

17. Wakabayashi, T, Ouhmich, F, Gonzalez-Cabrera, C, Felli, E, Saviano, A, Agnus, V, et al. Radiomics in hepatocellular carcinoma: a quantitative review. Hepatol Int. (2019) 13:546–59. doi: 10.1007/s12072-019-09973-0

18. Wang, Q, Zhang, Y, Zhang, E, Xing, X, Chen, Y, Nie, K, et al. A multiparametric method based on clinical and CT-based Radiomics to predict the expression of p53 and VEGF in patients with spinal Giant cell tumor of bone. Front Oncol. (2022) 12:894696. doi: 10.3389/fonc.2022.894696

19. Yang, F, Wan, Y, Xu, L, Wu, Y, Shen, X, Wang, J, et al. MRI-Radiomics prediction for cytokeratin 19-positive hepatocellular carcinoma: a multicenter study. Front Oncol. (2021) 11:672126. doi: 10.3389/fonc.2021.672126

20. Zhang, L, Qi, Q, Li, Q, Ren, S, Liu, S, Mao, B, et al. Ultrasomics prediction for cytokeratin 19 expression in hepatocellular carcinoma: a multicenter study. Front Oncol. (2022) 12:994456. doi: 10.3389/fonc.2022.994456

21. Blagus, R, and Lusa, L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. (2013) 14:106. doi: 10.1186/1471-2105-14-106

22. Mazières, J, Brugger, W, Cappuzzo, F, Middel, P, Frosch, A, Bara, I, et al. Evaluation of EGFR protein expression by immunohistochemistry using H-score and the magnification rule: re-analysis of the SATURN study. Lung Cancer. (2013) 82:231–7. doi: 10.1016/j.lungcan.2013.07.016

23. Pirker, R, Pereira, JR, von Pawel, J, Krzakowski, M, Ramlau, R, Park, K, et al. EGFR expression as a predictor of survival for first-line chemotherapy plus cetuximab in patients with advanced non-small-cell lung cancer: analysis of data from the phase 3 FLEX study. Lancet Oncol. (2012) 13:33–42. doi: 10.1016/S1470-2045(11)70318-7

24. Zhang, L, Duan, S, Qi, Q, Li, Q, Ren, S, Liu, S, et al. Noninvasive prediction of Ki-67 expression in hepatocellular carcinoma using machine learning-based Ultrasomics: a multicenter study. J Ultrasound Med. (2023) 42:1113–22. doi: 10.1002/jum.16126

25. Park, JE, Park, SY, Kim, HJ, and Kim, HS. Reproducibility and generalizability in Radiomics modeling: possible strategies in radiologic and statistical perspectives. Korean J Radiol. (2019) 20:1124–37. doi: 10.3348/kjr.2018.0070

26. Qin, L, Huang, DN, Huang, J, and Huang, H. New biomarkers and therapeutic targets of human liver cancer: transcriptomic findings. Biofactors. (2021) 47:1016–31. doi: 10.1002/biof.1775

27. Yang, C, Zhang, ZM, Zhao, ZP, Wang, ZQ, Zheng, J, Xiao, HJ, et al. Radiomic analysis based on magnetic resonance imaging for the prediction of VEGF expression in hepatocellular carcinoma patients. Abdom Radiol. (2024) 49:3824–33. doi: 10.1007/s00261-024-04427-0

28. Tseng, P, Tai, M, Huang, C, Wang, CC, Lin, JW, Hung, CH, et al. Overexpression of VEGF is associated with positive p53 immunostaining in hepatocellular carcinoma (HCC) and adverse outcome of HCC patients. J Surg Oncol. (2008) 98:349–57. doi: 10.1002/jso.21109

29. Guo, J, Zhao, J, Xu, Q, and Huang, D. Resistance of Lenvatinib in hepatocellular carcinoma. Curr Cancer Drug Targets. (2022) 22:865–78. doi: 10.2174/1568009622666220428111327

30. Tao, M, Han, J, Shi, J, Liao, H, Wen, K, Wang, W, et al. Application and resistance mechanisms of Lenvatinib in patients with advanced hepatocellular carcinoma. J Hepatocell Carcinoma. (2023) 10:1069–83. doi: 10.2147/JHC.S411806

31. Nicholson, RI, Gee, JMW, and Harper, ME. EGFR and cancer prognosis. Eur J Cancer. (2001) 37:9–15. doi: 10.1016/S0959-8049(01)00231-3

32. He, X, Hikiba, Y, Suzuki, Y, Nakamori, Y, Kanemaru, Y, Sugimori, M, et al. EGFR inhibition reverses resistance to lenvatinib in hepatocellular carcinoma cells. Sci Rep. (2022) 12:8007. doi: 10.1038/s41598-022-12076-w

33. Hu, B, Zou, T, Qin, W, Shen, X, Su, Y, Li, J, et al. Inhibition of EGFR overcomes acquired Lenvatinib resistance driven by STAT3–ABCB1 signaling in hepatocellular carcinoma. Cancer Res. (2022) 82:3845–57. doi: 10.1158/0008-5472.CAN-21-4140

34. Kudo, M, Finn, RS, Qin, S, Han, KH, Ikeda, K, Piscaglia, F, et al. Lenvatinib versus sorafenib in first-line treatment of patients with unresectable hepatocellular carcinoma: a randomised phase 3 non-inferiority trial. Lancet. (2018) 391:1163–73. doi: 10.1016/S0140-6736(18)30207-1

35. Nair, A, Reece, K, Donoghue, MB, Yuan, W(V), Rodriguez, L, Keegan, P, et al. FDA supplemental approval summary: Lenvatinib for the treatment of Unresectable hepatocellular carcinoma. Oncologist. (2021) 26:e484–91. doi: 10.1002/onco.13566

36. Hindson, J. Lenvatinib plus EGFR inhibition for liver cancer. Nat Rev Gastroenterol Hepatol. (2021) 18:675. doi: 10.1038/s41575-021-00513-6

37. Che, F, Xu, Q, Li, Q, Huang, ZX, Yang, CW, Wang, LY, et al. Radiomics signature: a potential biomarker for β-arrestin1 phosphorylation prediction in hepatocellular carcinoma. World J Gastroenterol. (2022) 28:1479–93. doi: 10.3748/wjg.v28.i14.1479

38. Gu, JX, Bao, SL, Akemuhan, R, Jia, Z, Zhang, Y, and Huang, C. Radiomics based on contrast-enhanced CT for Recognizing c-met-positive hepatocellular carcinoma: a noninvasive approach to predict the outcome of Sorafenib resistance. Mol Imaging Biol. (2023) 25:1073–83. doi: 10.1007/s11307-023-01870-1

39. Wu, L, Li, J, Ruan, X, Ren, J, Ping, X, and Chen, B. Prediction of VEGF and EGFR expression in peripheral lung Cancer based on the Radiomics model of spectral CT enhanced images. Int J Gen Med. (2022) 15:6725–38. doi: 10.2147/IJGM.S374002

40. Zhang, N, Wu, M, Zhou, Y, Yu, C, Shi, D, Wang, C, et al. Radiomics nomogram for prediction of glypican-3 positive hepatocellular carcinoma based on hepatobiliary phase imaging. Front Oncol. (2023) 13:1209814. doi: 10.3389/fonc.2023.1209814

41. Feng, X, Tustison, NJ, Patel, SH, and Meyer, CH. Brain tumor segmentation using an ensemble of 3D U-nets and overall survival prediction using Radiomic features. Front Comput Neurosci. (2020) 14:25. doi: 10.3389/fncom.2020.00025

42. Davnall, F, Yip, CSP, Ljungqvist, G, Selmi, M, Ng, F, Sanghera, B, et al. Assessment of tumor heterogeneity: an emerging imaging tool for clinical practice? Insights Imaging. (2012) 3:573–89. doi: 10.1007/s13244-012-0196-6

43. Wu, TH, Hatano, E, Yamanaka, K, Seo, S, Taura, K, Yasuchika, K, et al. A non-smooth tumor margin on preoperative imaging predicts microvascular invasion of hepatocellular carcinoma. Surg Today. (2016) 46:1275–81. doi: 10.1007/s00595-016-1320-x

44. Fan, T, Li, S, Li, K, Xu, J, Zhao, S, Li, J, et al. A potential prognostic marker for recognizing VEGF-positive hepatocellular carcinoma based on magnetic resonance Radiomics signature. Front Oncol. (2022) 12:857715. doi: 10.3389/fonc.2022.857715

Keywords: hepatocellular carcinoma (HCC), machine learning, ultrasomics, epidermal growth factor receptor (EGFR), lenvatinib

Citation: Ma Y, Duan S, Ren S, Bu D, Li Y, Cai X and Zhang L (2024) Machine learning based ultrasomics noninvasive predicting EGFR expression status in hepatocellular carcinoma patients. Front. Med. 11:1483291. doi: 10.3389/fmed.2024.1483291

Received: 19 August 2024; Accepted: 01 November 2024; Published: 19 November 2024.

Edited by:

Luis Castro-Sánchez, University of Colima, Mexico

Reviewed by:

Naveena Yanamala, The State University of New Jersey, United States
Yu-quan Wu, Guangxi Medical University, China

Copyright © 2024 Ma, Duan, Ren, Bu, Li, Cai and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lianzhong Zhang, zlz8777@zzu.edu.cn
