Background

AUTHOR=Huber Markus, Luedi Markus M., Schubert Gerrit A., Musahl Christian, Tortora Angelo, Frey Janine, Beck Jürgen, Mariani Luigi, Christ Emanuel, Andereggen Lukas

TITLE=Machine Learning for Outcome Prediction in First-Line Surgery of Prolactinomas

JOURNAL=Frontiers in Endocrinology

VOLUME=Volume 13 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2022.810219

DOI=10.3389/fendo.2022.810219

ISSN=1664-2392

ABSTRACT=<sec><title>Background</title><p>First-line surgery for prolactinomas has gained increasing acceptance, but the indication remains controversial. Thus, accurate prediction of unfavorable outcomes after upfront surgery in prolactinoma patients is critical for the triage of therapy and for interdisciplinary decision-making.</p></sec><sec><title>Objective</title><p>To evaluate whether contemporary machine learning (ML) methods can facilitate this crucial prediction task in a large cohort of prolactinoma patients with first-line surgery, we investigated the performance of various classes of supervised classification algorithms. The primary endpoint was ML-based risk prediction of long-term dopamine agonist (DA) dependency. The secondary outcome was the prediction of the early and long-term control of hyperprolactinemia.</p></sec><sec><title>Methods</title><p>By jointly examining two independent performance metrics – the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) – in combination with a stacked <italic>super learner</italic>, we present a novel perspective on how to assess and compare the discrimination capacity of a set of binary classifiers.</p></sec><sec><title>Results</title><p>We demonstrate that for upfront surgery in prolactinoma patients there is no <italic>one-algorithm-fits-all</italic> solution in outcome prediction: different algorithms perform best for different time points and different outcome parameters. In addition, ML classifiers outperform logistic regression in both performance metrics in our cohort when predicting the primary outcome at long-term follow-up and the secondary outcome at early follow-up, and thus provide an added benefit in risk prediction modeling.
In such a setting, the stacking framework of combining the predictions of individual <italic>base learners</italic> in a so-called <italic>super learner</italic> offers great potential: the <italic>super learner</italic> exhibits very good prediction skill for the primary outcome (AUROC: mean 0.9, 95% CI: 0.92 – 1.00; MCC: 0.85, 95% CI: 0.60 – 1.00). In contrast, predicting control of hyperprolactinemia is challenging, particularly at early follow-up (AUROC: 0.69, 95% CI: 0.50 – 0.83) compared with long-term follow-up (AUROC: 0.80, 95% CI: 0.58 – 0.97). It is of clinical importance that baseline prolactin levels are by far the most important outcome predictor at early follow-up, whereas remission at 30 days dominates the ML prediction skill for DA dependency over the long term.</p></sec><sec><title>Conclusions</title><p>This study highlights the performance benefits of combining a diverse set of classification algorithms to predict the outcome of first-line surgery in prolactinoma patients. We demonstrate the added benefit of considering two performance metrics jointly to assess the discrimination capacity of a diverse set of classifiers.</p></sec>
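NOTE=The abstract evaluates each binary classifier on two metrics at once, AUROC and MCC. The following is a minimal sketch of how such a joint evaluation could be computed. It is not the authors' code: the function names (`auroc`, `mcc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the study.

```python
from math import sqrt


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the 2x2 confusion matrix.
    Returns 0.0 when the denominator is zero (a common convention)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def evaluate(y_true, scores, threshold=0.5):
    """Report both metrics for one classifier.
    AUROC uses the raw scores; MCC uses labels obtained by thresholding
    (the 0.5 threshold is an assumption of this sketch)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {"auroc": auroc(y_true, scores), "mcc": mcc(y_true, y_pred)}


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
result = evaluate(y_true, scores)
print(result)
```

In this toy case the ranking-based AUROC is fairly high while the threshold-based MCC is much lower, which illustrates why the two metrics can give different impressions of the same classifier and why reading them together may be more informative than either alone.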