AUTHOR=Ma Yujing, Duan Shaobo, Ren Shanshan, Bu Didi, Li Yahong, Cai Xiguo, Zhang Lianzhong TITLE=Machine learning based ultrasomics noninvasive predicting EGFR expression status in hepatocellular carcinoma patients JOURNAL=Frontiers in Medicine VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1483291 DOI=10.3389/fmed.2024.1483291 ISSN=2296-858X ABSTRACT=Objective

To investigate the ability of ultrasomics to noninvasively predict epidermal growth factor receptor (EGFR) expression status in patients with hepatocellular carcinoma (HCC).

Methods

A total of 198 HCC patients were included in the study (n = 138 in the training dataset and n = 60 in the test dataset). EGFR expression was detected by immunohistochemistry. Ultrasomics features were extracted from gray-scale ultrasound images. Intra-class correlation coefficient (ICC) screening, variance filtering, the mutual information method, and the extreme gradient boosting (XGBoost) embedding method were applied to select the best features. Five machine learning algorithms, namely random forest (RF), XGBoost, support vector machine (SVM), decision tree (DT), and logistic regression (LR), were used to construct clinical models, ultrasomics models, and clinical-ultrasomics combined models. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, decision curve analysis (DCA), and calibration curves were used to assess the predictive performance of the models.
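As an illustration of one step in the feature-selection chain described above, variance filtering can be sketched as follows. This is a minimal sketch only: the feature names, the toy values, and the threshold of 1e-3 are hypothetical and are not taken from the study, which does not report its threshold here.

```python
from statistics import pvariance


def variance_filter(rows, names, threshold=1e-3):
    """Keep features whose variance across patients exceeds `threshold`.

    rows: one list of feature values per patient.
    names: feature names, in the same order as the columns of `rows`.
    threshold: hypothetical cut-off; the abstract does not state one.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in kept]
    kept_rows = [[row[i] for i in kept] for row in rows]
    return kept_names, kept_rows


# Toy data (hypothetical): feat_b is constant and feat_c is nearly constant,
# so neither can help separate high from low EGFR expression.
names = ["feat_a", "feat_b", "feat_c"]
rows = [
    [0.10, 5.0, 1.0],
    [0.40, 5.0, 1.0001],
    [0.90, 5.0, 0.9999],
    [0.30, 5.0, 1.0],
]
kept_names, kept_rows = variance_filter(rows, names)
print(kept_names)
```

Features that barely vary across patients carry little discriminative information, so removing them first reduces the work left for the later mutual information and embedding steps.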

Results

Among the 198 patients, high EGFR expression was observed in 100 and low EGFR expression in 98. The RF ultrasomics model performed well: in the training and test datasets, respectively, the AUC was 0.929 (95% CI, 0.874–0.966) and 0.807 (95% CI, 0.684–0.897), the sensitivity was 0.843 and 0.767, the specificity was 0.857 and 0.800, and the accuracy was 0.850 and 0.783. Integrating ultrasomics features with clinical baseline characteristics improved predictive performance: in the training and test datasets, respectively, the RF combined model reached an AUC of 0.937 (95% CI, 0.884–0.971) and 0.822 (95% CI, 0.702–0.909), a sensitivity of 0.857 and 0.833, a specificity of 0.857 and 0.800, and an accuracy of 0.857 and 0.817.
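To make the relationship between the reported metrics concrete, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts are hypothetical: the abstract does not report a confusion matrix or the class split of the test dataset, so a 30/30 split of the 60 test patients is assumed here, with counts chosen so that the rounded values reproduce the reported test-set figures for the RF models.

```python
def summarize(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy), each rounded to 3 decimals."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return round(sensitivity, 3), round(specificity, 3), round(accuracy, 3)


# Hypothetical counts, assuming 30 high-EGFR and 30 low-EGFR test patients.
ultrasomics_test = summarize(tp=23, fn=7, tn=24, fp=6)
combined_test = summarize(tp=25, fn=5, tn=24, fp=6)

print(ultrasomics_test)  # (0.767, 0.8, 0.783)
print(combined_test)     # (0.833, 0.8, 0.817)
```

Under this assumed split, the gain of the combined model on the test dataset corresponds to two additional correctly identified high-expression patients, with specificity unchanged.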

Conclusion

Ultrasomics models and clinical-ultrasomics combined models built with the five machine learning algorithms can serve as effective, noninvasive tools for predicting EGFR expression status in HCC patients. Among them, the ultrasomics model and the combined model built with the RF classifier showed the best predictive performance.