Enhanced preoperative prediction of pancreatic fistula using radiomics and clinical features with SHAP visualization

Li, Yan; Zong, Kenzhen; Zhou, Yin; Sun, Yuan; Liu, Yanyao; Zhou, Baoyong; Wu, Zhongjun

doi:10.3389/fbioe.2025.1510642

ORIGINAL RESEARCH article

Front. Bioeng. Biotechnol., 04 April 2025

Sec. Biomechanics

Volume 13 - 2025 | https://doi.org/10.3389/fbioe.2025.1510642

Yan Li¹^†

Kenzhen Zong¹^†

Yin Zhou²^†

Yuan Sun¹

Yanyao Liu¹*

Baoyong Zhou¹,³*

Zhongjun Wu¹*

¹Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
²Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
³Department of Hepatobiliary Surgery, Bishan Hospital of Chongqing Medical University, Chongqing, China

Background: Clinically relevant postoperative pancreatic fistula (CR-POPF) is a significant complication after pancreaticoduodenectomy (PD), so its early prediction is of paramount importance. This study therefore sought to develop a CR-POPF prediction model that combines radiomics and clinical features, using SHapley Additive exPlanations (SHAP) for visualization.

Methods: Extensive radiomics features were extracted from preoperative contrast-enhanced computed tomography (CT) images of patients scheduled for PD. Pertinent radiomics and clinical features were then selected using Least Absolute Shrinkage and Selection Operator (Lasso) regression and the random forest (RF) algorithm. Finally, 15 CR-POPF prediction models were developed using five machine learning (ML) predictors, each trained on the selected radiomics features, the selected clinical features, or a combination of both. Differences in the area under the receiver operating characteristic curve (AUC) between models were compared using DeLong’s test.

Results: Among the 15 models, the XGBoost model trained on the combination of radiomics and clinical features selected by Lasso regression and RF performed best, achieving an accuracy of 0.85, an AUC of 0.93, a recall of 0.63, a precision of 0.65, and an F1 score of 0.64. DeLong’s test showed statistically significant AUC differences (P < 0.05) relative to the radiomics-only and clinical-only models.

Conclusion: The proposed CR-POPF prediction model, based on the XGBoost predictor and the combination of radiomics and clinical features selected by Lasso regression and RF, can effectively predict CR-POPF and may provide strong support for its early clinical management.

1 Introduction

Pancreaticoduodenectomy (PD) is one of the most complex surgical procedures and remains the gold standard for treating pancreatic and periampullary neoplasms (Bergeat et al., 2020). Owing to significant advancements in surgical techniques and perioperative care, mortality rates in high-volume centers have been reduced to below 3% (Zureikat et al., 2016; Wu et al., 2023). However, clinically relevant postoperative pancreatic fistula (CR-POPF) persists as a major complication, occurring in 10%–20% of patients and leading to prolonged hospitalization, increased costs, and elevated morbidity and mortality (Ma et al., 2017; Williamsson et al., 2017; Hirono et al., 2019; Casciani et al., 2021). Early prediction of CR-POPF is critical for risk stratification and personalized management (Mungroop et al., 2019). Existing risk scoring models (Callery et al., 2013; Kantor et al., 2017; Mungroop et al., 2019; Mungroop et al., 2021), such as the Fistula Risk Score (FRS), rely on subjective intraoperative assessments (e.g., pancreatic texture) or postoperative parameters, limiting their utility for preoperative decision-making. Consequently, there is an urgent need for robust preoperative prediction tools that integrate objective, quantifiable biomarkers to guide clinical interventions.

Computed tomography (CT), widely used for preoperative evaluation, offers a non-invasive platform for objective risk stratification. However, conventional CT analysis focuses on macroscopic features (e.g., ductal morphology), which lack the granularity to capture subtle parenchymal heterogeneity linked to CR-POPF pathogenesis. Radiomics, an emerging paradigm, bridges this gap by converting medical images into high-dimensional quantitative features that reflect underlying pathophysiological processes (Gillies et al., 2016; Lambin et al., 2017; Rigiroli et al., 2021). These features, such as texture and shape parameters, quantify pancreatic fibrosis, microlobular fat infiltration, and ductal microcalcifications (Lubner et al., 2017; Chitalia and Kontos, 2019; Kim et al., 2019; Abunahel et al., 2022)—factors strongly associated with anastomotic integrity. Nevertheless, unimodal radiomics models often overlook systemic clinical variables (Huang et al., 2022; Tan et al., 2022; Mack et al., 2024), such as inflammatory markers or metabolic indices, which may synergize with imaging biomarkers to enhance predictive accuracy.

Machine learning (ML) provides a powerful framework to integrate radiomics with clinical data, enabling the development of multimodal predictive models. Prior studies demonstrate that combined models outperform unimodal approaches by capturing both microenvironmental heterogeneity and systemic physiological states (Capretti et al., 2022; Shen et al., 2022; Verma et al., 2024). For instance, texture features derived from gray-level matrices quantify pancreatic stiffness, while clinical variables like main pancreatic duct (MPD) diameter and platelet-to-albumin ratio (PAR) reflect anatomical risk and systemic inflammation, respectively. However, the clinical adoption of ML models has been hindered by their “black-box” nature, which obscures the interpretability of feature contributions (Azodi et al., 2020).

To address these challenges, we propose an interpretable ML framework that synergizes preoperative CT radiomics with clinical features for CR-POPF prediction. Our approach achieves superior predictive performance (AUC: 0.93) while addressing key limitations of existing methods—namely, their reliance on subjective intraoperative assessments, dependence on intraoperative or postoperative parameters, isolated use of unimodal data (radiomics or clinical features), and the opacity of traditional machine learning algorithms. By employing SHAP to elucidate feature contributions (Lundberg et al., 2020), we transform the model into a clinically interpretable tool. This integration not only enhances predictive accuracy but also provides mechanistic insights into how specific variables collectively influence fistula risk, bridging the gap between algorithmic performance and clinical trust.

2 Materials and methods

2.1 Study cohort

This retrospective cohort study was approved by the Ethics Committee of the First Affiliated Hospital of Chongqing Medical University (Ethics Approval Number: 2024-087-01). Informed consent was waived due to the retrospective design. We reviewed 336 patients who underwent PD between October 2018 and June 2023. Inclusion criteria were: (1) complete clinical and pathological data, (2) preoperative contrast-enhanced CT within 1 month before surgery. Exclusion criteria included: (1) non-curative resection, (2) prior neoadjuvant therapy, (3) poor CT image quality. After screening (Figure 1), 241 patients were included and stratified into CR-POPF (N = 55, 22.8%) and non-CR-POPF (N = 186, 77.2%) groups based on ISGPS 2016 criteria (Bassi et al., 2017).

Figure 1. Flowchart of inclusion and exclusion criteria for eligible patients in the study. PD, pancreaticoduodenectomy; CT, computed tomography; CR-POPF, clinically relevant postoperative pancreatic fistula.

Demographic and clinical comparisons between groups are summarized in Table 1. Age, gender, diabetes, hypertension, cardiovascular/pulmonary diseases, smoking history, and prior abdominal surgery showed no significant differences (P > 0.05). However, CR-POPF patients exhibited higher BMI (23.2 vs. 22.2 kg/m², P = 0.035), increased alcohol consumption (43.6% vs. 28.0%, P = 0.042), and smaller MPD diameter (2.77 mm vs. 4.25 mm, P < 0.001). Preoperative laboratory tests revealed elevated platelet-to-albumin ratio (PAR: 6.55 vs. 5.53, P = 0.012) and bilirubin levels (129 vs. 78.2 μmol/L, P = 0.004) in the CR-POPF group. Pancreatic head lesions were more frequent in CR-POPF patients (67.3% vs. 32.7%, P = 0.006). No differences were observed in preoperative biliary drainage, ASA classification, or surgical approach (P > 0.05).

Table 1. Clinical baseline characteristics of patients.

The cohort was randomly split into training (n = 193, 80%) and test sets (n = 48, 20%) using an 8:2 ratio. Reporting followed TRIPOD guidelines (Collins et al., 2015).
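The split sizes above can be reproduced with a short sketch. It assumes scikit-learn's `train_test_split`, a placeholder feature matrix, and an arbitrary seed; the source states only that the split was random at an 8:2 ratio, so the absence of stratification here is also an assumption. Passing `test_size=0.2` would round 241 × 0.2 up to 49 test cases, so the sketch requests 48 explicitly to match the reported 193/48 split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Cohort sizes reported in the study: 55 CR-POPF and 186 non-CR-POPF patients.
y = np.array([1] * 55 + [0] * 186)

# Placeholder features (hypothetical values); 28 columns mirror the final feature set.
X = rng.normal(size=(y.size, 28))

# An integer test_size gives exactly 48 test cases and 193 training cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=48, random_state=0
)

print(len(y_train), len(y_test))
```

With only 55 positive cases, passing `stratify=y` would keep the CR-POPF rate similar in both subsets; whether the authors did so is not stated.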

2.2 CT technique

Contrast-enhanced abdominal CT scans were performed using Siemens SOMATOM Force, GE Discovery CT750 HD, or GE LightSpeed VCT scanners. Scanning parameters were 120 kV and 200 mA. All images were reconstructed using a standard reconstruction kernel with the following parameters: pitch of 1, rotation time of 0.5 s, field of view of 350 mm × 350 mm, matrix size of 512 × 512, slice thickness of 5 mm, interval of 5 mm, and reconstruction slice thickness of 1 mm. Patients were required to fast and avoid drinking for at least 3 h before the examination. A non-ionic iodinated contrast agent (300–400 mg I/mL) was administered intravenously at a dose of 1–1.5 mL/kg with an injection rate of 3 mL/s. Arterial phase scanning was delayed by 15–18 s. Portal venous and delayed phase scans were performed with delays of 33–36 s and 180 s, respectively. Enhanced CT images were exported from the Picture Archiving and Communication System (PACS) in DICOM format for further analysis.

2.3 Image preprocessing and segmentation

Image preprocessing included artifact removal, grayscale normalization (0–255), and enhancement via contrast adjustment, sharpening, and noise reduction.
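As an illustration of the grayscale normalization step, the sketch below rescales intensities linearly to the 0–255 range. The source does not say how the normalization was performed, so min-max scaling, the function name `normalize_to_uint8`, and the optional clipping window are all assumptions.

```python
import numpy as np

def normalize_to_uint8(image, lower=None, upper=None):
    """Linearly rescale intensities to 0-255 (min-max scaling; an assumed method)."""
    img = np.asarray(image, dtype=np.float64)
    lo = img.min() if lower is None else lower
    hi = img.max() if upper is None else upper
    if hi <= lo:
        # A constant image carries no contrast; map it to zeros.
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (np.clip(img, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical 2 x 2 patch of CT intensities in Hounsfield units.
ct_patch = np.array([[-100.0, 0.0], [40.0, 300.0]])
normalized = normalize_to_uint8(ct_patch)
print(normalized)
```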

Two radiologists (>5 years of experience) manually delineated pancreatic parenchyma (body and tail) as regions of interest (ROIs) on portal venous phase images using ITK-SNAP (v3.6.0). The portal vein served as the anatomical landmark to differentiate the pancreatic head from the body. Segmentation masks were saved in NIfTI format. A senior radiologist (>10 years of experience) validated 50 randomly selected samples. Intra- and inter-observer intraclass correlation coefficients (ICCs) were calculated, with ICC >0.8 indicating satisfactory reproducibility.
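The reproducibility check can be illustrated with a small ICC computation. The source does not state which ICC form was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), applied to one feature measured by two readers. The function name `icc_2_1` and the numbers are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per subject and one column per rater.
    """
    x = np.asarray(ratings, dtype=np.float64)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical feature values from two readers on five cases.
reader_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
identical = icc_2_1(np.column_stack([reader_a, reader_a]))
offset = icc_2_1(np.column_stack([reader_a, reader_a + 1.0]))
print(round(identical, 3), round(offset, 3))
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC even though the two sets of values are perfectly correlated.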

2.4 Feature extraction and selection

Radiomics feature extraction was performed using the PyRadiomics library (v3.0.1) in Python, based on original CT images and their preprocessed variants, including those filtered with Laplacian of Gaussian (LoG) and wavelet transforms. The extracted features encompassed first-order statistics (e.g., mean, variance, skewness), shape features (e.g., volume, sphericity, maximum diameter), and texture features derived from matrices such as the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM).
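To make the idea of a texture feature concrete, the sketch below builds a gray-level co-occurrence matrix by hand and derives one feature (contrast) from it. It is a simplification for illustration, not the PyRadiomics implementation: it uses a single horizontal offset on a 2D image, whereas a full pipeline typically aggregates several directions in 3D. The function names are hypothetical.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Symmetric, normalized co-occurrence matrix for horizontally adjacent pixels."""
    img = np.asarray(image)
    counts = np.zeros((levels, levels), dtype=np.float64)
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    # Count every (left, right) gray-level pair, including repeated pairs.
    np.add.at(counts, (left, right), 1)
    # Make the matrix symmetric so that pair order does not matter.
    counts = counts + counts.T
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Tiny image already discretized to two gray levels.
toy_image = np.array([[0, 0, 1],
                      [1, 1, 0]])
p = glcm_horizontal(toy_image, levels=2)
contrast = glcm_contrast(p)
print(p, contrast)
```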

To optimize feature selection and reduce dimensionality, the Lasso regression combined with cross-validation (Lasso-CV) was applied. Regularization parameters were optimized using grid search, and features with non-zero coefficients were retained. Additionally, the Random Forest (RF) algorithm was employed to rank the importance of clinical variables and identify the most relevant features for model development.
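The two selection steps can be sketched with scikit-learn on synthetic data. `LassoCV` searches a grid of regularization strengths by cross-validation and leaves uninformative coefficients at zero; `RandomForestClassifier.feature_importances_` provides the ranking for the clinical variables. The data, column counts, seeds, and the choice of 5 folds are assumptions made for illustration, not the study's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_train = 193

# Synthetic "radiomics" matrix: only columns 0 and 1 carry signal.
X_rad = rng.normal(size=(n_train, 50))
y = (X_rad[:, 0] + X_rad[:, 1] + 0.5 * rng.normal(size=n_train) > 0).astype(int)

# Lasso is scale-sensitive, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X_rad)
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

# Keep features with non-zero coefficients, ranked by absolute coefficient.
selected = np.flatnonzero(lasso.coef_)
ranked = selected[np.argsort(-np.abs(lasso.coef_[selected]))]

# Synthetic "clinical" matrix with 8 columns: column 0 is shifted by the label.
X_clin = rng.normal(size=(n_train, 8))
X_clin[:, 0] += 2.0 * y
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_clin, y)
clinical_ranking = np.argsort(-forest.feature_importances_)

print(ranked[:5], clinical_ranking)
```

Fitting both selectors on the training subset only, as the paper describes, keeps information from the test set out of the chosen feature list.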

2.5 Model construction and validation

In the training cohort, five machine learning predictors—XGBoost, Random Forest (RF), Extra Trees (ET), Gradient Boosting (GB), and AdaBoost—were employed to develop 15 CR-POPF prediction models (the model parameters are presented in Table 2). These models were trained on three datasets: radiomics-only, clinical-only, and a combined radiomics-clinical dataset. Feature selection and model training were performed exclusively on the training set. The test set remained entirely independent and was only used for final model evaluation to prevent data leakage. Hyperparameters, including learning rate, maximum tree depth, subsampling rate, and regularization terms, were optimized via 5-fold cross-validated grid search to balance model complexity and generalizability. To mitigate overfitting, early stopping mechanisms and maximum iteration limits (10,000 iterations) were enforced during training. Model performance was rigorously evaluated using accuracy, AUC, precision, recall, and F1 score.
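A minimal sketch of the tuning loop follows, assuming scikit-learn's `GridSearchCV` with 5-fold cross-validation. The data are synthetic, the grids are deliberately tiny placeholders rather than the values in Table 2, and scoring by ROC AUC with stratified folds is an assumption. XGBoost is left out so that the sketch depends on scikit-learn alone; `xgboost.XGBClassifier` follows the same estimator interface and could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)

def make_cohort(n):
    """Synthetic cohort: two informative columns, roughly 30% positives."""
    X = rng.normal(size=(n, 6))
    signal = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)
    return X, (signal > 0.8).astype(int)

X_train, y_train = make_cohort(193)
X_test, y_test = make_cohort(48)

# Placeholder grids (hypothetical values, not those reported in Table 2).
candidates = {
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [3, 5]}),
    "ET": (ExtraTreesClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [3, 5]}),
    "GB": (GradientBoostingClassifier(random_state=0),
           {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    # Tuning sees only the training set; the test set is scored once at the end.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
    results[name] = (search.best_params_, test_auc)

for name, (params, auc) in results.items():
    print(name, params, round(auc, 3))
```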

Table 2. Hyperparameters of machine learning models.

Pairwise comparisons of AUC values between models were conducted using DeLong’s test, with results visualized as a heatmap (Supplementary Material S1) to highlight statistically significant differences (P < 0.05). Calibration curves quantified the agreement between predicted probabilities and observed outcomes, while decision curve analysis (DCA) assessed clinical utility by quantifying net benefits across threshold probabilities (Supplementary Material S2). Model interpretability was enhanced via SHAP analysis, elucidating feature contributions globally and locally. The workflow is summarized in Figure 2.
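For readers unfamiliar with DeLong's test, the sketch below implements the standard two-model comparison for correlated AUCs (both models scored on the same patients) using only NumPy and the standard library. The function names and the toy scores are hypothetical, and this is an illustration rather than the authors' code. DeLong's estimator counts tied scores as half a win.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Per-positive and per-negative placement values (ties count 0.5)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two AUCs computed on the same cases."""
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for scores in (scores_a, scores_b):
        s = np.asarray(scores, dtype=float)
        per_pos, per_neg = _placements(s[y], s[~y])
        v10.append(per_pos)
        v01.append(per_neg)
        aucs.append(per_pos.mean())
    m, n = int(y.sum()), int((~y).sum())
    # 2 x 2 covariance of the two AUC estimates.
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 1e-12:
        # Effectively identical rankings: no evidence of a difference.
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return aucs[0], aucs[1], z, p_value

# Toy example: 4 positive and 6 negative cases scored by two models.
y_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.2, 0.1, 0.05]
model_b = [0.6, 0.3, 0.8, 0.2, 0.5, 0.7, 0.1, 0.4, 0.3, 0.25]
auc_a, auc_b, z_stat, p_val = delong_test(y_toy, model_a, model_b)
print(round(auc_a, 3), round(auc_b, 3), round(p_val, 3))
```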

Figure 2. Workflow of model development. PD, pancreaticoduodenectomy; CT, computed tomography; Lasso, Lasso regression; SHAP, Shapley Additive explanation.

2.6 Evaluation metrics

To comprehensively evaluate the performance of the predictive models, five standard metrics were employed: accuracy, precision, recall, F1-score, and AUC. The definitions and corresponding formulas are as follows:

• Accuracy: The proportion of correctly classified instances relative to the total instances (Sokolova and Lapalme, 2009).

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

• Precision: The proportion of true positive predictions among all positive predictions (Sokolova and Lapalme, 2009).

\text{Precision} = \frac{TP}{TP + FP}

• Recall: The proportion of true positive predictions among all actual positive instances (Sokolova and Lapalme, 2009).

\text{Recall} = \frac{TP}{TP + FN}

• F1-score: The harmonic mean of precision and recall (Sokolova and Lapalme, 2009).

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

• AUC: The area under the receiver operating characteristic (ROC) curve, calculated as the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance. For M positive and N negative instances (Fawcett, 2006):

\text{AUC} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} I(P_{i} > P_{j})}{M \times N}

where P_i and P_j denote the predicted probabilities of the i-th positive and j-th negative instance, respectively, and I(⋅) is an indicator function equal to 1 if P_i > P_j and 0 otherwise.
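The definitions above translate directly into a few lines of Python. The confusion-matrix counts in the example are hypothetical (chosen only to sum to the 48 test cases), while the final line checks the F1 score against the precision (0.65) and recall (0.63) reported in the abstract. The AUC function follows the strict-inequality formula given here, so tied scores contribute nothing.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs with the positive scored higher."""
    wins = sum(p > n for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts for a 48-patient test set.
metrics = classification_metrics(tp=7, tn=34, fp=4, fn=3)
toy_auc = pairwise_auc([0.9, 0.8], [0.1, 0.85])
reported_f1 = round(f1_from(0.65, 0.63), 2)
print(metrics, toy_auc, reported_f1)
```

The last value confirms that the reported precision and recall are consistent with the reported F1 score of 0.64.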

2.7 Statistical analysis

Statistical analyses were performed in Python (version 3.7; https://www.python.org/). Normally distributed quantitative data are reported as mean ± standard deviation (SD), and non-normally distributed quantitative data as median with interquartile range. Categorical data are reported as numbers and percentages (N, %). Model performance was assessed with accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Pairwise comparisons of AUC values between models were conducted using DeLong’s test. Statistical significance was set at P < 0.05.

3 Results

3.1 Feature selection outcomes

The PyRadiomics library was used to extract 1,719 radiomics features from the CT images. To preserve model performance and interpretability, Lasso regression was applied to select among these high-dimensional features. Figures 3A, B show how model performance changed with the regularization parameter α, from which the optimal α and the corresponding number of features were determined. Features with non-zero Lasso coefficients were then ranked by the absolute values of their coefficients (Figure 3C); they included texture features (e.g., wavelet-HHL_glszm_GrayLevelNonUniformity, original_glcm_ClusterProminence) and shape features (e.g., original_shape_Maximum2DDiameterRow, original_shape_Sphericity).

Figure 3. Feature screening results and their visualization. (A) and (B) are the results of different α parameters in the Lasso algorithm; (C) is the radiomics features with contribution values not equal to 0.

For clinical features, RF analysis ranked eight predictors by importance (Figure 4): MPD diameter (2.77 mm vs. 4.25 mm in CR-POPF vs. non-CR-POPF patients), lesion location (67.3% vs. 32.7% pancreatic head involvement), bilirubin (129 vs. 78.2 μmol/L), PAR (6.55 vs. 5.53), ALT (95.0 vs. 137 U/L), BMI (mean: 23.2 vs. 22.2 kg/m²), systemic immune-inflammation index (SII) (820 vs. 688), and AST (82.0 vs. 87.0 U/L).

Combining the 20 selected radiomics features with these eight clinical features, we established a multimodal feature set of 28 variables for subsequent training and validation of the ML models.

Figure 4. Contribution ranking of key clinical features selected using Random Forest analysis.

3.2 Model performance comparison

A total of fifteen CR-POPF prediction models were developed using five distinct ML predictors, incorporating selected radiomics features, selected clinical features, and a combination of both. Among the radiomics-based models, the AdaBoost model demonstrated the highest predictive performance, achieving the highest AUC of 0.87 (Figure 5A), along with the best recall (0.76) and precision (0.71). In contrast, the RF and Extra Trees models exhibited the highest accuracy (0.83); however, the RF model showed lower robustness, with an AUC of 0.74. Regarding clinical models, the Extra Trees model achieved the highest AUC (0.85, Figure 5B) while maintaining a balanced performance in terms of precision (0.75) and recall (0.75).

Figure 5. ROC curves and AUC values of each model. (A–C) represent the prediction model results based on radiomics, clinical, and radiomics-clinical, respectively.

The CR-POPF prediction model based on the XGBoost predictor with the combination of the selected radiomics and clinical features demonstrated superior performance among all 15 CR-POPF prediction models, achieving an accuracy of 0.85 and an AUC of 0.93 (Figure 5C). Compared to unimodal models, the combined model exhibited statistically significant improvements. Specifically, when compared to the radiomics-only model using the Extra Trees predictor, the AUC difference was 0.17 (P = 0.041), and when compared to the clinical-only model using the Random Forest predictor, the AUC difference was 0.11 (P = 0.041). These results, validated by DeLong’s test (Supplementary Material S1, where red cells denote P < 0.05), highlight the synergistic value of multimodal integration.

Detailed performance metrics are summarized in Table 3, while calibration and decision curve analysis (DCA) curves (Supplementary Material S2) further validated the clinical utility of the combined model across threshold probabilities. These results highlight that the integration of CT radiomics and clinical data significantly enhances preoperative CR-POPF risk stratification.
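Decision curve analysis rests on the net benefit at a threshold probability p_t, conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes that quantity for a model and for the "treat all" reference strategy; the function names and the five-patient example are hypothetical and are not taken from the study.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y = np.asarray(y_true).astype(bool)
    treated = np.asarray(y_prob, dtype=float) >= threshold
    n = y.size
    tp = np.sum(treated & y)
    fp = np.sum(treated & ~y)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated regardless of risk."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes and predicted risks for five patients.
y_demo = [1, 1, 0, 0, 0]
p_demo = [0.9, 0.4, 0.6, 0.2, 0.1]
nb_model = net_benefit(y_demo, p_demo, threshold=0.5)
nb_all = net_benefit_treat_all(y_demo, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve plots these values across a range of thresholds; a model is useful at thresholds where its net benefit exceeds both the treat-all line and zero (treat none).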

Table 3. Performance comparison of each model.

3.3 SHAP interpretation of the combined XGBoost model

The SHAP method was applied to the combined XGBoost model to explain its output by computing each variable’s contribution to the prediction. This strategy yields two kinds of explanation: global explanations at the feature level and local explanations at the individual level.

Global explanations describe the overall behavior of the model and the importance of its features. In the SHAP bar chart and the SHAP summary plot (Figures 6A, C), the influence of each feature is assessed via mean SHAP values and presented in descending order, highlighting the top 20 variables that contribute most to the model. The three variables with the highest contribution are wavelet-HHL_glszm_GrayLevelNonUniformity, original_shape_Maximum2DDiameterRow, and lesion location. The SHAP heatmap (Figure 6B) shows the direction and magnitude of each feature’s effect across all cases.

SHAP dependence plots (Figure 7) show how a single feature influences the output of the XGBoost model. The x-axis is the value of the feature and the y-axis is its SHAP value, so the plot shows how the feature’s contribution changes with its value. A SHAP value above zero pushes the prediction toward the positive class, that is, toward an elevated risk of CR-POPF.

Local explanations show how the prediction for an individual case is built from that patient’s input data. Figure 8 presents four representative cases, two CR-POPF negative and two CR-POPF positive. Each SHAP waterfall plot shows the contribution of every feature to the prediction for a single case. The baseline value is the model’s baseline prediction probability, while each feature’s contribution (its SHAP value) gives the direction and magnitude of that feature’s influence on the prediction. Positive values mean that the feature raises the predicted likelihood of CR-POPF. The final prediction probability, denoted f(x), is the sum of the baseline value and all feature contributions.
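The additive property described here, baseline plus feature contributions equals f(x), can be checked on a toy model by computing exact Shapley values through enumeration of all feature subsets. This is an illustration only: the logistic model, its three hypothetical inputs, and the background sample are assumptions, and the enumeration below is not the tree-specific algorithm that the shap library uses for models such as XGBoost.

```python
from itertools import combinations
from math import factorial

import numpy as np

def toy_model(data):
    """Hypothetical risk model over three standardized inputs."""
    z = 0.8 * data[:, 0] - 0.6 * data[:, 1] + 1.0 * data[:, 2] - 0.5
    return 1.0 / (1.0 + np.exp(-z))

def shapley_values(model, x, background):
    """Exact Shapley values; 'absent' features are averaged over the background."""
    n = x.size

    def value(subset):
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]  # fix the 'present' features to the case's values
        return float(model(data).mean())

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return value(()), phi

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))   # hypothetical reference cohort
case = np.array([1.2, -0.9, 1.0])       # hypothetical patient

baseline, contributions = shapley_values(toy_model, case, background)
prediction = float(toy_model(case[None, :])[0])
print(baseline, contributions, prediction)
```

Whether f(x) is expressed as a probability or on another scale (such as log-odds) depends on how the explainer is configured; the text above reports it as a probability.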

Figure 6. Global model explanation by the SHAP method. (A) The SHAP bar chart. (B) The SHAP heatmap plot shows the direction and intensity of influence for each feature of all cases in the model. (C) The SHAP beeswarm plot for the top 20 features in the model. Each dot represents a patient for each feature, with red denoting a higher feature value and blue denoting a lower feature value. The x-axis represents the SHAP values that describe the impact of each feature on model prediction. Positive SHAP values indicate an increased risk of CR-POPF, whereas negative SHAP values indicate a decreased risk. The dots are stacked vertically to show density. SHAP, Shapley Additive explanation; CR-POPF, clinically relevant postoperative pancreatic fistula; MPD, main pancreatic duct diameter; BMI, body mass index; ALT, alanine aminotransferase; PAR, platelet-to-albumin ratio.

Figure 7. SHAP dependence plot. Each dependence plot shows how a single feature affects the output of the prediction model, and each dot represents a single patient. SHAP, Shapley Additive explanation; MPD, main pancreatic duct diameter.

Figure 8. Local model explanation by the SHAP method. The SHAP waterfall plots illustrate how each feature contributes to individual predictions [(A, B) are CR-POPF negative cases; (C, D) are CR-POPF positive cases]. On a waterfall plot, the value at the bottom represents the expected value of the model output, and each row represents the contribution of each feature to the model output. A red arrow indicates an increased risk of CR-POPF, while a blue arrow indicates a decreased risk. The gray text before the feature names shows the value of each feature for the case. SHAP, Shapley Additive explanation; CR-POPF, clinically relevant postoperative pancreatic fistula; MPD, main pancreatic duct diameter; BMI, body mass index.

4 Discussion

4.1 Synergistic feature selection strategy

The integration of radiomics and clinical features through ML offers a transformative approach for preoperative prediction of CR-POPF. In this study, we extracted 1,719 radiomics features from preoperative portal venous phase CT images of 241 PD patients and combined them with clinical variables to develop a multimodal predictive model. The dual application of Lasso regression and the RF algorithm for feature selection proved instrumental in balancing dimensionality reduction with biological relevance. Lasso’s regularization distilled the 1,719 radiomics features to 20 non-redundant predictors, mitigating overfitting while preserving texture and shape parameters critical for quantifying pancreatic heterogeneity, a strategy validated in pancreatic cancer studies by Kim et al. (2019). Meanwhile, RF’s inherent ability to rank nonlinear interactions among clinical variables identified MPD diameter, lesion location, PAR, and other features as key contributors, reflecting anatomical risk (e.g., MPD diameter) and systemic inflammation (e.g., PAR) (Huang et al., 2022; Tan et al., 2022). This hybrid approach combines the strengths of both methods, namely Lasso’s sparsity induction for dimensionality reduction and RF’s robustness in handling multicollinearity, in line with methodological frameworks advocating combined techniques for high-dimensional biomedical data (Azodi et al., 2020; Kumarasamy et al., 2021).

4.2 Performance comparison between unimodal models

The experimental results underscore the differential performance of ML predictors when using single-modal versus multimodal features. Models trained solely on selected radiomics features achieved moderate predictive accuracy (AUC: 0.74–0.87), with texture parameters such as GLSZM and Gray Level Dependence Matrix (GLDM) features emerging as pivotal predictors. This is consistent with studies emphasizing their utility in quantifying tissue heterogeneity and fibrosis, which are key determinants of pancreatic anastomotic integrity (Lubner et al., 2017; Chitalia and Kontos, 2019; Kim et al., 2019). For instance, Abunahel et al. linked GLSZM features to pancreatic stiffness, a surrogate for soft pancreatic texture widely associated with CR-POPF (Abunahel et al., 2022). Similarly, Capretti et al. reported comparable AUCs (0.75–0.81) using CT texture analysis, underscoring the reproducibility of radiomics in pancreatic risk stratification (Capretti et al., 2022). However, the inherent limitations of unimodal radiomics models, such as their inability to incorporate systemic physiological variables, highlight the necessity of integrating clinical data to enhance generalizability.

Our clinical-only models, incorporating variables such as MPD diameter, lesion location, and PAR, achieved AUCs of 0.82–0.85. This performance is in line with, and moderately higher than, the external validation results reported for established risk scores such as the FRS and the updated alternative FRS (ua-FRS) (AUC: 0.74–0.82) (Mungroop et al., 2019; Mungroop et al., 2021), which suggests a potential advantage of integrating modern ML frameworks with preoperative clinical indices. Notably, the significantly smaller MPD diameter in CR-POPF patients (2.77 vs. 4.25 mm, P < 0.001) reflects multifactorial pathophysiology involving impaired drainage, reduced fibrosis-mediated anastomotic stability, and elevated duct-to-mucosa tension, which together increase fistula risk (Casciani et al., 2021; Lee et al., 2023). Despite these strengths, clinical models struggle to capture subvisual parenchymal changes, such as microlobular fat infiltration or ductal microcalcifications, which radiomics excels in detecting (Chitalia and Kontos, 2019; Abunahel et al., 2022). This limitation highlights the necessity of integrating multimodal data to address the multifactorial nature of CR-POPF pathogenesis.

4.3 Superiority and interpretability of the combined model

The multimodal XGBoost model (AUC: 0.93) outperformed all unimodal approaches, underscoring the synergistic value of combining radiomics and clinical data. This aligns with emerging paradigms in precision oncology, where combined models consistently outperform unimodal approaches by encapsulating both macroscopic pathophysiology and microenvironmental heterogeneity (Shen et al., 2022; Verma et al., 2024). However, ML techniques are frequently characterized as “black boxes,” with limited studies dedicated to elucidating the sources of their predictions. This underscores an additional advantage of our study: following the training and evaluation of the model, we employed SHAP methods to interpret the “black box” nature of the ML model. By presenting the SHAP values, we elucidated the relationship between critical covariates and the estimated risk of CR-POPF: wavelet-HHL_glszm_GrayLevelNonUniformity (reflecting parenchymal disorganization) and MPD diameter jointly drove predictions, mirroring the interplay between ductal anatomy and tissue integrity. Such findings resonate with Lambin et al.’s assertion that radiomics bridges qualitative imaging and quantitative biology, thereby advancing clinical decision-making (Lambin et al., 2017).

4.4 Clinical implications for personalized prevention

Furthermore, case analysis elucidates the contributions of critical features within individual cases and computes the final Shapley values to derive the ultimate prediction probabilities, thereby facilitating personalized predictions. For patients at high risk of CR-POPF, preoperative preventive strategies, including nutritional support, optimization of diabetes and exocrine insufficiency, and respiratory training, may confer substantial benefits (Ausania et al., 2019; Bundred et al., 2020). Additionally, prophylactic medications, such as somatostatin analogs or hydrocortisone, have demonstrated efficacy in reducing complications associated with pancreatic surgery (Allen et al., 2014; Laaninen et al., 2016; Tarvainen et al., 2020). Risk assessment identifies patients best suited for interventions, cutting unnecessary medication costs. Evaluating the risk of CR-POPF also facilitates the management of drainage by enabling the early removal of drains in low-risk patients, consequently diminishing the risks of infection and erosion (Conlon et al., 2001; McMillan et al., 2017). Such personalized preventive measures are essential for mitigating the adverse effects associated with CR-POPF.

4.5 Limitations and future directions

Our study has several limitations. First, the retrospective design may introduce selection bias. Second, because the model was derived from a single center with a relatively small sample, its external applicability needs further testing. Third, although our ML model can be used to assess the risk of CR-POPF in precision medicine, the large number of features limits its clinical application. Future studies should validate this framework prospectively and explore streamlined feature sets to facilitate real-world deployment.

5 Conclusion

This study presents a novel machine learning framework for preoperative prediction of CR-POPF by integrating CT radiomics and clinical features. The model leverages radiomic signatures, such as parenchymal heterogeneity, alongside clinical predictors, including MPD diameter and platelet-to-albumin ratio, achieving superior predictive performance with an AUC of 0.93. Enhanced interpretability is provided through SHAP, which identifies critical feature contributions, such as wavelet-HHL_glszm_GrayLevelNonUniformity, and enables patient-specific risk stratification. The framework offers significant clinical applicability, supporting perioperative interventions like prophylactic medication and optimized drain management to reduce morbidity. By combining quantitative imaging with actionable insights, this work advances precision surgery and highlights the transformative potential of explainable AI in pancreatic surgical oncology.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the First Affiliated Hospital of Chongqing Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because this study is retrospective.

Author contributions

YLi: Writing–original draft, Writing–review and editing, Conceptualization, Data curation, Formal Analysis, Methodology. KZ: Data curation, Methodology, Writing–original draft, Writing–review and editing. YZ: Resources, Writing–original draft, Writing–review and editing. YS: Writing–review and editing. YLu: Supervision, Validation, Writing–review and editing. BZ: Funding acquisition, Project administration, Supervision, Writing–review and editing. ZW: Funding acquisition, Project administration, Supervision, Validation, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This project was supported by the Natural Science Foundation of Chongqing (CSTB2023NSCQ-BHX0131), the Postdoctoral Cultivation Project of the First Affiliated Hospital of Chongqing Medical University (CYYY-BSHPYXM-202315), and the Joint Project of Chongqing Health Commission and Science and Technology Bureau (2024MSXM093).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2025.1510642/full#supplementary-material

References

Abunahel, B. M., Pontre, B., and Petrov, M. S. (2022). Effect of gray value discretization and image filtration on texture features of the pancreas derived from magnetic resonance imaging at 3T. J. Imaging 8 (8), 220. doi:10.3390/jimaging8080220

Allen, P. J., Gönen, M., Brennan, M. F., Bucknor, A. A., Robinson, L. M., Pappas, M. M., et al. (2014). Pasireotide for postoperative pancreatic fistula. N. Engl. J. Med. 370 (21), 2014–2022. doi:10.1056/NEJMoa1313688

Ausania, F., Senra, P., Meléndez, R., Caballeiro, R., Ouviña, R., and Casal-Núñez, E. (2019). Prehabilitation in patients undergoing pancreaticoduodenectomy: a randomized controlled trial. Rev. Esp. Enferm. Dig. 111 (8), 603–608. doi:10.17235/reed.2019.6182/2019

Azodi, C. B., Tang, J., and Shiu, S. H. (2020). Opening the black box: interpretable machine learning for geneticists. Trends Genet. 36 (6), 442–455. doi:10.1016/j.tig.2020.03.005

Bassi, C., Marchegiani, G., Dervenis, C., Sarr, M., Abu Hilal, M., Adham, M., et al. (2017). The 2016 update of the International Study Group (ISGPS) definition and grading of postoperative pancreatic fistula: 11 years after. Surgery 161 (3), 584–591. doi:10.1016/j.surg.2016.11.014

Bergeat, D., Merdrignac, A., Robin, F., Gaignard, E., Rayar, M., Meunier, B., et al. (2020). Nasogastric decompression vs no decompression after pancreaticoduodenectomy: the randomized clinical IPOD trial. JAMA Surg. 155 (9), e202291. doi:10.1001/jamasurg.2020.2291

Bundred, J. R., Kamarajah, S. K., Hammond, J. S., Wilson, C. H., Prentis, J., and Pandanaboyana, S. (2020). Prehabilitation prior to surgery for pancreatic cancer: a systematic review. Pancreatology 20 (6), 1243–1250. doi:10.1016/j.pan.2020.07.411

Callery, M. P., Pratt, W. B., Kent, T. S., Chaikof, E. L., and Vollmer, C. M. (2013). A prospectively validated clinical risk score accurately predicts pancreatic fistula after pancreatoduodenectomy. J. Am. Coll. Surg. 216 (1), 1–14. doi:10.1016/j.jamcollsurg.2012.09.002

Capretti, G., Bonifacio, C., De Palma, C., Nebbia, M., Giannitto, C., Cancian, P., et al. (2022). A machine learning risk model based on preoperative computed tomography scan to predict postoperative outcomes after pancreatoduodenectomy. Updat. Surg. 74 (1), 235–243. doi:10.1007/s13304-021-01174-5

Casciani, F., Bassi, C., and Vollmer, C. M. (2021). Decision points in pancreatoduodenectomy: insights from the contemporary experts on prevention, mitigation, and management of postoperative pancreatic fistula. Surgery 170 (3), 889–909. doi:10.1016/j.surg.2021.02.064

Chitalia, R. D., and Kontos, D. (2019). Role of texture analysis in breast MRI as a cancer biomarker: a review. J. Magn. Reson. Imaging 49 (4), 927–938. doi:10.1002/jmri.26556

Collins, G. S., Reitsma, J. B., Altman, D. G., and Moons, K. G. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 350, g7594. doi:10.1136/bmj.g7594

Conlon, K. C., Labow, D., Leung, D., Smith, A., Jarnagin, W., Coit, D. G., et al. (2001). Prospective randomized clinical trial of the value of intraperitoneal drainage after pancreatic resection. Ann. Surg. 234 (4), 487–494. doi:10.1097/00000658-200110000-00008

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognit. Lett. 27 (8), 861–874. doi:10.1016/j.patrec.2005.10.010

Gillies, R. J., Kinahan, P. E., and Hricak, H. (2016). Radiomics: images are more than pictures, they are data. Radiology 278 (2), 563–577. doi:10.1148/radiol.2015151169

Hirono, S., Kawai, M., Okada, K. I., Miyazawa, M., Kitahata, Y., Hayami, S., et al. (2019). Modified Blumgart mattress suture versus conventional interrupted suture in pancreaticojejunostomy during pancreaticoduodenectomy: randomized controlled trial. Ann. Surg. 269 (2), 243–251. doi:10.1097/sla.0000000000002802

Huang, Z., Zheng, Q., Yu, Y., Zheng, H., Wu, Y., Wang, Z., et al. (2022). Prognostic significance of platelet-to-albumin ratio in patients with esophageal squamous cell carcinoma receiving definitive radiotherapy. Sci. Rep. 12 (1), 3535. doi:10.1038/s41598-022-07546-0

Kantor, O., Talamonti, M. S., Pitt, H. A., Vollmer, C. M., Riall, T. S., Hall, B. L., et al. (2017). Using the NSQIP pancreatic demonstration project to derive a modified fistula risk score for preoperative risk stratification in patients undergoing pancreaticoduodenectomy. J. Am. Coll. Surg. 224 (5), 816–825. doi:10.1016/j.jamcollsurg.2017.01.054

Kim, B. R., Kim, J. H., Ahn, S. J., Joo, I., Choi, S. Y., Park, S. J., et al. (2019). CT prediction of resectability and prognosis in patients with pancreatic ductal adenocarcinoma after neoadjuvant treatment using image findings and texture analysis. Eur. Radiol. 29 (1), 362–372. doi:10.1007/s00330-018-5574-0

Kumarasamy, C., Tiwary, V., Sunil, K., Suresh, D., Shetty, S., Muthukaliannan, G. K., et al. (2021). Prognostic utility of platelet-lymphocyte ratio, neutrophil-lymphocyte ratio and monocyte-lymphocyte ratio in head and neck cancers: a detailed PRISMA compliant systematic review and meta-analysis. Cancers (Basel) 13 (16), 4166. doi:10.3390/cancers13164166

Laaninen, M., Sand, J., Nordback, I., Vasama, K., and Laukkarinen, J. (2016). Perioperative hydrocortisone reduces major complications after pancreaticoduodenectomy: a randomized controlled trial. Ann. Surg. 264 (5), 696–702. doi:10.1097/sla.0000000000001883

Lambin, P., Leijenaar, R. T. H., Deist, T. M., Peerlings, J., de Jong, E. E. C., van Timmeren, J., et al. (2017). Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 14 (12), 749–762. doi:10.1038/nrclinonc.2017.141

Lee, B., Yoon, Y. S., Kang, C. M., Choi, M., Lee, J. S., Hwang, H. K., et al. (2023). Validation of original, alternative, and updated alternative fistula risk scores after open and minimally invasive pancreatoduodenectomy in an Asian patient cohort. Surg. Endosc. 37 (3), 1822–1829. doi:10.1007/s00464-022-09633-9

Lubner, M. G., Smith, A. D., Sandrasegaran, K., Sahani, D. V., and Pickhardt, P. J. (2017). CT texture analysis: definitions, applications, biologic correlates, and challenges. Radiographics 37 (5), 1483–1503. doi:10.1148/rg.2017170056

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 (1), 56–67. doi:10.1038/s42256-019-0138-9

Ma, L. W., Dominguez-Rosado, I., Gennarelli, R. L., Bach, P. B., Gonen, M., D'Angelica, M. I., et al. (2017). The cost of postoperative pancreatic fistula versus the cost of pasireotide: results from a prospective randomized trial. Ann. Surg. 265 (1), 11–16. doi:10.1097/sla.0000000000001892

Mack, A., Vanden Hoek, T., and Du, X. (2024). Thromboinflammation and the role of platelets. Arterioscler. Thromb. Vasc. Biol. 44 (6), 1175–1180. doi:10.1161/atvbaha.124.320149

McMillan, M. T., Malleo, G., Bassi, C., Allegrini, V., Casetti, L., Drebin, J. A., et al. (2017). Multicenter, prospective trial of selective drain management for pancreatoduodenectomy using risk stratification. Ann. Surg. 265 (6), 1209–1218. doi:10.1097/sla.0000000000001832

Mungroop, T. H., Klompmaker, S., Wellner, U. F., Steyerberg, E. W., Coratti, A., D'Hondt, M., et al. (2021). Updated alternative fistula risk score (ua-FRS) to include minimally invasive pancreatoduodenectomy: pan-European validation. Ann. Surg. 273 (2), 334–340. doi:10.1097/sla.0000000000003234

Mungroop, T. H., van Rijssen, L. B., van Klaveren, D., Smits, F. J., van Woerden, V., Linnemann, R. J., et al. (2019). Alternative fistula risk score for pancreatoduodenectomy (a-FRS): design and international external validation. Ann. Surg. 269 (5), 937–943. doi:10.1097/sla.0000000000002620

Rigiroli, F., Hoye, J., Lerebours, R., Lafata, K. J., Li, C., Meyer, M., et al. (2021). CT radiomic features of superior mesenteric artery involvement in pancreatic ductal adenocarcinoma: a pilot study. Radiology 301 (3), 610–622. doi:10.1148/radiol.2021210699

Shen, Z., Chen, H., Wang, W., Xu, W., Zhou, Y., Weng, Y., et al. (2022). Machine learning algorithms as early diagnostic tools for pancreatic fistula following pancreaticoduodenectomy and guide drain removal: a retrospective cohort study. Int. J. Surg. 102, 106638. doi:10.1016/j.ijsu.2022.106638

Sokolova, M., and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45 (4), 427–437. doi:10.1016/j.ipm.2009.03.002

Tan, J., Song, G., Wang, S., Dong, L., Liu, X., Jiang, Z., et al. (2022). Platelet-to-albumin ratio: a novel IgA nephropathy prognosis predictor. Front. Immunol. 13, 842362. doi:10.3389/fimmu.2022.842362

Tarvainen, T., Sirén, J., Kokkola, A., and Sallinen, V. (2020). Effect of hydrocortisone vs pasireotide on pancreatic surgery complications in patients with high risk of pancreatic fistula: a randomized clinical trial. JAMA Surg. 155 (4), 291–298. doi:10.1001/jamasurg.2019.6019

Verma, A., Balian, J., Hadaya, J., Premji, A., Shimizu, T., Donahue, T., et al. (2024). Machine learning-based prediction of postoperative pancreatic fistula following pancreaticoduodenectomy. Ann. Surg. 280 (2), 325–331. doi:10.1097/sla.0000000000006123

Williamsson, C., Ansari, D., Andersson, R., and Tingstedt, B. (2017). Postoperative pancreatic fistula-impact on outcome, hospital cost and effects of centralization. HPB Oxf. 19 (5), 436–442. doi:10.1016/j.hpb.2017.01.004

Wu, Y., Peng, B., Liu, J., Yin, X., Tan, Z., Liu, R., et al. (2023). Textbook outcome as a composite outcome measure in laparoscopic pancreaticoduodenectomy: a multicenter retrospective cohort study. Int. J. Surg. 109 (3), 374–382. doi:10.1097/js9.0000000000000303

Zureikat, A. H., Postlewait, L. M., Liu, Y., Gillespie, T. W., Weber, S. M., Abbott, D. E., et al. (2016). A multi-institutional comparison of perioperative outcomes of robotic and open pancreaticoduodenectomy. Ann. Surg. 264 (4), 640–649. doi:10.1097/sla.0000000000001869

Keywords: computed tomography, clinically relevant postoperative pancreatic fistula, machine learning, radiomics, Shapley additive explanations

Citation: Li Y, Zong K, Zhou Y, Sun Y, Liu Y, Zhou B and Wu Z (2025) Enhanced preoperative prediction of pancreatic fistula using radiomics and clinical features with SHAP visualization. Front. Bioeng. Biotechnol. 13:1510642. doi: 10.3389/fbioe.2025.1510642

Received: 13 October 2024; Accepted: 21 March 2025;
Published: 04 April 2025.

Edited by:

Hyunjin Park, Sungkyunkwan University, Republic of Korea

Reviewed by:

Lixia Wang, Cedars Sinai Medical Center, United States
Yingjian Yang, Shenzhen Lanmage Medical Technology Co., Ltd, China

Copyright © 2025 Li, Zong, Zhou, Sun, Liu, Zhou and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zhongjun Wu; Baoyong Zhou; Yanyao Liu

^†These authors have contributed equally to this work and share first authorship
