AUTHOR=Kim Ji-Yeon, Lee Yong Seok, Yu Jonghan, Park Youngmin, Lee Se Kyung, Lee Minyoung, Lee Jeong Eon, Kim Seok Won, Nam Seok Jin, Park Yeon Hee, Ahn Jin Seok, Kang Mira, Im Young-Hyuck

TITLE=Deep Learning-Based Prediction Model for Breast Cancer Recurrence Using Adjuvant Breast Cancer Cohort in Tertiary Cancer Center Registry

JOURNAL=Frontiers in Oncology

VOLUME=11

YEAR=2021

URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2021.596364

DOI=10.3389/fonc.2021.596364

ISSN=2234-943X

ABSTRACT=<p>Several prognosis prediction models have been developed for breast cancer (BC) patients who have undergone curative surgery, but there is still an unmet need to precisely determine BC prognosis for individual patients in real time. This is an analysis of retrospectively collected data from the adjuvant BC registry at Samsung Medical Center between January 2000 and December 2016. The initial data set contained 325 clinical data elements: baseline characteristics with demographics, clinical and pathologic information, and follow-up clinical information including laboratory and imaging data during surveillance. The Weibull Time To Event Recurrent Neural Network (WTTE-RNN) by Martinsson was implemented as the machine learning model. We searched for the optimal window size for the time-stamped inputs. To develop the prediction model, data from 13,117 patients were split into training (60%), validation (20%), and test (20%) sets. The median follow-up duration was 4.7 years and the median number of visits was 8.4. We identified 32 features related to BC recurrence and considered them in further analyses. Point-in-time performance was calculated using Harrell's C-index and the area under the curve (AUC) at the 2-, 5-, and 7-year time points. After 200 training epochs with a batch size of 100, the C-index reached 0.92 for the training data set and 0.89 for the validation and test data sets. The AUC values were 0.90 at the 2-year point, 0.91 at the 5-year point, and 0.91 at the 7-year point. The deep learning-based final model outperformed three other machine learning-based models. In terms of pathologic characteristics, the median absolute error (MAE) and weighted mean absolute error (wMAE) were as low as 3.5%. This BC prognosis model, which determines the probability of BC recurrence in real time, was developed with an RNN machine learning model using information from the time of BC diagnosis and the follow-up period.</p>