AUTHOR=Luo Yi , Cuneo Kyle C. , Lawrence Theodore S. , Matuszak Martha M. , Dawson Laura A. , Niraula Dipesh , Ten Haken Randall K. , El Naqa Issam TITLE=A human-in-the-loop based Bayesian network approach to improve imbalanced radiation outcomes prediction for hepatocellular cancer patients with stereotactic body radiotherapy JOURNAL=Frontiers in Oncology VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.1061024 DOI=10.3389/fonc.2022.1061024 ISSN=2234-943X ABSTRACT=Background

Imbalanced outcome is one of common characteristics of oncology datasets. Current machine learning approaches have limitation in learning from such datasets. Here, we propose to resolve this problem by utilizing a human-in-the-loop (HITL) approach, which we hypothesize will also lead to more accurate and explainable outcome prediction models.

Methods

A total of 119 HCC patients with 163 tumors were used in the study. 81 patients with 104 tumors from the University of Michigan Hospital treated with SBRT were considered as a discovery dataset for radiation outcomes model building. The external testing dataset included 59 tumors from 38 patients with SBRT from Princess Margaret Hospital. In the discovery dataset, 100 tumors from 77 patients had local control (LC) (96% of 104 tumors) and 23 patients had at least one grade increment of ALBI (I-ALBI) during six-month follow up (28% of 81 patients). Each patient had a total of 110 features, where 15 or 20 features were identified by physicians as expert knowledge features (EKFs) for LC or I-ALBI prediction. We proposed a HITL based Bayesian network (HITL-BN) approach to enhance the capability of selecting important features from imbalanced data in terms of accuracy and explainability through humans’ participation by integrating feature importance ranking and Markov blanket algorithms. A pure data-driven Bayesian network (PD-BN) method was applied to the same discovery dataset of HCC patients as a benchmark.

Results

In the training and testing phases, the areas under receiver operating characteristic curves of the HITL-BN models for LC or I-ALBI prediction during SBRT are 0.85 (95% confidence interval: 0.75-0.95) or 0.89 (0.81-0.95) and 0.77 or 0.78, respectively. They significantly outperformed the during-treatment PD-BN model in predicting LC or I-ALBI based on the discovery cross-validation and testing datasets from the Delong tests.

Conclusion

By allowing the human expert to be part of the model building process, the HITL-BN approach yielded significantly improved accuracy as well as better explainability when dealing with imbalanced outcomes in the prediction of post-SBRT treatment response of HCC patients when compared to the PD-BN method.