ORIGINAL RESEARCH article

Front. Endocrinol., 16 February 2022
Sec. Pituitary Endocrinology
This article is part of the Research Topic Pituitary Adenomas: Targeted Therapy

Machine Learning for Outcome Prediction in First-Line Surgery of Prolactinomas

  • 1Department of Anaesthesiology and Pain Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  • 2Department of Neurosurgery, Kantonsspital Aarau, Aarau, Switzerland
  • 3Department of Gynecology and Obstetrics, Kantonsspital Lucerne, Lucerne, Switzerland
  • 4Department of Neurosurgery, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  • 5Department of Neurosurgery, Medical Center, University of Freiburg, Freiburg, Germany
  • 6Department of Neurosurgery, University Hospital of Basel, Basel, Switzerland
  • 7Department of Endocrinology, Diabetes and Metabolism, University Hospital of Basel, Basel, Switzerland
  • 8Faculty of Medicine, University of Bern, Bern, Switzerland

Background: First-line surgery for prolactinomas has gained increasing acceptance, but the indication still remains controversial. Thus, accurate prediction of unfavorable outcomes after upfront surgery in prolactinoma patients is critical for the triage of therapy and for interdisciplinary decision-making.

Objective: To evaluate whether contemporary machine learning (ML) methods can facilitate this crucial prediction task in a large cohort of prolactinoma patients with first-line surgery, we investigated the performance of various classes of supervised classification algorithms. The primary endpoint was ML-applied risk prediction of long-term dopamine agonist (DA) dependency. The secondary outcome was the prediction of the early and long-term control of hyperprolactinemia.

Methods: By jointly examining two independent performance metrics – the area under the receiver operating characteristic (AUROC) and the Matthews correlation coefficient (MCC) – in combination with a stacked super learner, we present a novel perspective on how to assess and compare the discrimination capacity of a set of binary classifiers.

Results: We demonstrate that for upfront surgery in prolactinoma patients there is no one-algorithm-fits-all solution in outcome prediction: different algorithms perform best for different time points and different outcome parameters. In addition, ML classifiers outperform logistic regression in both performance metrics in our cohort when predicting the primary outcome at long-term follow-up and the secondary outcome at early follow-up, and thus provide an added benefit in risk prediction modeling. In such a setting, the stacking framework of combining the predictions of individual base learners in a so-called super learner offers great potential: the super learner exhibits very good prediction skill for the primary outcome (AUROC: mean 0.9, 95% CI: 0.92 – 1.00; MCC: 0.85, 95% CI: 0.60 – 1.00). In contrast, predicting control of hyperprolactinemia is challenging, in particular at early follow-up (AUROC: 0.69, 95% CI: 0.50 – 0.83) compared with long-term follow-up (AUROC: 0.80, 95% CI: 0.58 – 0.97). It is of clinical importance that baseline prolactin levels are by far the most important outcome predictor at early follow-up, whereas remission at 30 days dominates the ML prediction skill for DA dependency over the long term.

Conclusions: This study highlights the performance benefits of combining a diverse set of classification algorithms to predict the outcome of first-line surgery in prolactinoma patients. We demonstrate the added benefit of considering two performance metrics jointly to assess the discrimination capacity of a diverse set of classifiers.

Introduction

Dopamine agonists (DAs) are the treatment of choice for prolactinomas, given their effectiveness in controlling hyperprolactinemia and restoring gonadal function (1–3). However, in contrast to previous reports, remission rates are low and most patients will need prolonged treatment with DAs (4). Additionally, potential long-term effects (5, 6), including personality changes (7–10), have contributed to the increased acceptance of first-line surgery in prolactinomas in recent years (11–15). Although upfront surgery has recently been given a more dominant role in the treatment of prolactinomas (16, 17), its indication remains controversial in selected patients (18, 19). Thus, accurate prediction of unfavorable outcomes after upfront surgery in prolactinoma patients is crucial to the triage of therapy and interdisciplinary decision-making. In this context of medical prognosis and prediction analysis, combining patient data with the statistical methods, algorithms and tools that constitute the field of Machine Learning (ML) has a distinct impact on medical research and clinical practice (20–25). As such, we aimed at examining whether and how contemporary ML methods can facilitate outcome prediction of first-line surgery in prolactinoma patients. In addition, we aimed at investigating the performance of various classes of supervised classification algorithms in predicting the risk of dependence on DAs over the long term, as well as the control of hyperprolactinemia at early and long-term follow-up.

In particular, instead of finding a single best-performing model determined by a single performance metric, such as the commonly employed area under the receiver operating characteristic (AUROC), we aimed at focusing on quantifying and illustrating similarities and differences of the various classifiers by investigating two performance metrics jointly for our set of classifiers. We further aimed at providing a statistical framework to examine the cases for which ML methods offer an added benefit compared to traditional statistical approaches such as logistic regression. We will argue that by considering and combining multiple ML classifiers on the one hand and by examining two performance metrics jointly on the other hand, the utility of a set of patient- and treatment-related characteristics in predicting dependence on DAs and the risk of persistent hyperprolactinemia can be robustly investigated.

Methods

Study Design and Preoperative Assessment

This cohort study analyzed data from prolactinoma patients stored in our institutional database and prospectively maintained from January 1996 to December 2015. The Human Research Ethics Committee of Bern (Cantonal Ethikkommission KEK Bern, Bern, Switzerland) approved the project (KEK n° 10-10-2006 and 8-11-2006). Collected data included all consecutive prolactinoma patients who underwent upfront surgery for either a micro- or macroprolactinoma. A tumor diameter of 1–10 mm was characterized as a microadenoma and >10 mm as a macroadenoma. Invasiveness of the cavernous sinus was defined as Knosp grading ≥1 (11, 26, 27). Diagnosis of prolactinoma was based on biochemical and clinical assessment as well as on a standard protocol for the detection of pituitary adenomas with magnetic resonance imaging (MRI) (28–30). Biochemical measurements of PRL levels, including the immunoradiometric PRL assay to overcome the high-dose PRL hook effect, were completed (31), and the presence of macroprolactin was examined (32). PRL levels >20 ng/mL were defined as hyperprolactinemia (33). The diagnosis was additionally confirmed immunohistochemically with a PRL antibody, according to the WHO classification of neuroendocrine tumors (34).

Partial hypopituitarism was considered when there was impaired secretion of one or more pituitary hormones. Secondary hypocorticism was defined in the presence of low serum cortisol (<50 nmol/L), or normal cortisol but inadequate responses to the insulin tolerance test or the adrenocorticotropin (ACTH) stimulation test. Secondary hypothyroidism was characterized by the presence of low-normal thyroid-stimulating hormone (TSH) levels along with a low free thyroxin (FT4) level. Central hypogonadism was defined as low-normal levels of gonadotropins in parallel with low estradiol/testosterone levels.

The indication for surgery was discussed by an interdisciplinary group at the weekly pituitary tumor board meeting, with consensus tailored to preventing patients from becoming dependent on DA therapy over the long term. The treatment decision was again discussed with the patient and the choice was based on his or her preference. Patients who had previously received DAs were excluded from the study.

Postoperative and Long-Term Assessment

Early (short-term) follow-up occurred three months following surgery. If serum PRL levels were > 20 µg/L at that time, DA therapy was initiated (35), except in patients with prolactin levels slightly above the normal range but lacking clinical symptoms. In these patients, prolactin levels were subsequently reassessed. Late (long-term) follow-up was defined as the last documented visit to the endocrine outpatient clinic. After initiation of DAs, medical therapy was tapered at 24 months if PRL levels were in the normal range (36, 37). A serum PRL level < 20 µg/L at last follow-up was characterized as remission.

Primary and Secondary Endpoints

The primary outcome is defined as long-term dependence on DAs. The secondary outcomes are defined as the successful control of hyperprolactinemia at early and long-term follow-up.

Statistical Analysis and Prediction Modeling

Descriptive Statistics and Predictors

In terms of descriptive statistics, continuous variables were examined with the Shapiro-Wilk normality test and are presented with mean and standard deviation for normally distributed variables and with median and interquartile range (IQR) otherwise. Categorical variables are presented with counts and percentages.

The following patient- and treatment-related characteristics were available as predictors: age (numerical), sex (binary), adenoma size (binary, i.e. micro- vs. macro-adenoma), the presence of headache at presentation (binary), partial hypopituitarism (binary), cavernous sinus invasion (binary), baseline prolactin levels (numerical) and remission at 30 days (binary; only used as a predictor of the long-term outcomes).

Machine Learning Algorithms and Hyperparameter Selection

The selection of ML algorithms (the corresponding R packages are listed in italics) features a broad spectrum of algorithmic diversity and includes decision-tree-based algorithms [Random Forest, randomForest (38)], a distance-based algorithm [k-Nearest Neighbor, kknn (39)], standard (Logistic Regression) and penalized regression-based algorithms [Elasticnet Regularization; glmnet (40)], a feed-forward neural network with a single hidden layer [nnet (41)], flexible discriminant analysis [earth (42)], support vector machines [e1071 (43)] as well as gradient boosting machines [gbm (44)]. A detailed description of each algorithm is beyond the scope of the present study and we refer the reader to the pertinent literature, e.g. (45, 46).

We adopted a heuristic approach to examine which algorithm-dependent hyperparameters needed to be optimized in our setting. For each ML algorithm, we examined all hyperparameters and selected only those which (i) were tunable and (ii) featured a default value. For categorical hyperparameters, we sampled all possible predefined values uniformly. In case of integer or continuous hyperparameters, we sampled randomly and uniformly from an order of magnitude lower than the default value up to an order of magnitude greater than the default value (where numerically possible), thus accounting for the skewed nature of most continuous hyperparameters. For example, the default number of decision trees (ntree) in the Random Forest algorithm was set to 50, and we sampled accordingly from 5 to 500 trees. The importance of each hyperparameter was assessed by randomly sampling 50 values and examining the area under the curve (AUROC) in a three-fold repeated cross-validation sampling (RepCV) with 4 repetitions. Based on the AUROC distribution of each hyperparameter, we chose two hyperparameters for each algorithm. These were subsequently co-sampled. In addition to computing the performance of individual classifiers (so-called base learners), we combined the predictions of the base learners in a stacking framework into a so-called super learner (47). We chose a gradient boosting machine as the super learner.
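The sampling scheme described above can be illustrated with a minimal sketch. It is written in Python rather than in the R/mlr setup used in the study, and the function names are hypothetical. It also assumes that values are drawn uniformly on a logarithmic scale between one tenth of and ten times the default, which is one reading of the description and not a detail stated in the text.

```python
import math
import random


def sample_around_default(default, n_samples, integer=False, seed=0):
    """Draw values between default/10 and default*10.

    Assumption: the draw is uniform in log10 space, so that each decade
    is equally likely; the article does not state the sampling scale.
    """
    rng = random.Random(seed)
    low, high = math.log10(default / 10), math.log10(default * 10)
    values = [10 ** rng.uniform(low, high) for _ in range(n_samples)]
    if integer:
        # Round to the nearest integer and keep at least 1 (e.g. one tree).
        values = [max(1, round(v)) for v in values]
    return values


def sample_categorical(levels, n_samples, seed=0):
    """Sample predefined categorical levels uniformly."""
    rng = random.Random(seed)
    return [rng.choice(levels) for _ in range(n_samples)]


# Example from the text: a default of 50 trees and 50 random draws,
# giving values between 5 and 500 trees.
ntree_samples = sample_around_default(50, n_samples=50, integer=True)
```

Each sampled value would then be evaluated in the repeated cross-validation, and the spread of the resulting AUROC values indicates how sensitive a classifier is to that hyperparameter.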

Cross-Validation and Missing Data

A three-fold RepCV sampling with 100 repetitions was computed for each classifier and each outcome (the so-called inner loop), which was repeated for 100 different, randomly sampled hyperparameter combinations of each algorithm (the so-called outer loop).
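As an illustration of the inner loop, the following sketch generates the train/test index sets of a repeated three-fold cross-validation. It is a Python stand-in for the mlr resampling used in the study; the function name is hypothetical, and the folds are not stratified by outcome, which is a simplifying assumption.

```python
import random


def repeated_kfold(n_patients, k=3, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_patients))
        rng.shuffle(idx)  # new random partition in every repetition
        folds = [idx[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            # All folds except the current one form the training set.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


# 86 patients, three folds, 100 repetitions: 300 train/test splits.
splits = list(repeated_kfold(n_patients=86, k=3, repeats=100))
```

The outer loop would wrap this generator, running all 300 splits once for each of the 100 sampled hyperparameter combinations of a given algorithm.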

The dataset features missing data at random in several variables, and data availability is indicated in Table 1. Patients with missing data in the outcome variables were omitted from the prediction modeling (complete-case analysis). A single imputation method was used for missing predictor values: missing numerical data were imputed using the median value across the available patients, whereas the mode value was used for missing categorical variables. The single imputed dataset was used in the RepCV sampling.
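A minimal sketch of this single imputation step is given below, in Python rather than R. The function name, column names and toy records are hypothetical and serve only to show the median/mode rule; they are not study data.

```python
from statistics import median, mode


def impute_single(rows, numeric_cols, categorical_cols):
    """Fill None entries with the column median (numeric) or mode (categorical)."""
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    # Return new records so that the original data remain unchanged.
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]


# Hypothetical toy records; values are illustrative only.
patients = [
    {"age": 30, "prolactin": 120.0, "sex": "f"},
    {"age": None, "prolactin": 300.0, "sex": "f"},
    {"age": 40, "prolactin": None, "sex": None},
    {"age": 35, "prolactin": 90.0, "sex": "m"},
]
completed = impute_single(patients, ["age", "prolactin"], ["sex"])
```

Because the fill values are computed once over all available patients, the same completed dataset enters every cross-validation split, as described above.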

Table 1 Patients’ characteristics at diagnosis.

Performance Metrics and Predictor Importance

We assess the discrimination ability of the various classifiers using two independent performance metrics: the area under the receiver operating characteristic (AUROC) and the Matthews correlation coefficient (MCC). One of the advantages of the MCC is that it is based on the full confusion matrix (i.e. true positives, false positives, true negatives and false negatives) (48); another is that it performs well on imbalanced data sets (49). By considering the two performance indicators together we get a more detailed and comprehensive assessment of the performance of a binary classifier: whereas the AUROC indicator measures diagnostic ability by comparing the true positive rate (TPR) with the false positive rate (FPR) while varying the threshold (or cutoff) used to make the classification, the MCC is not based on varying the threshold but rather explicitly accounts for the balance of the four entries in the confusion matrix.
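To make the difference between the two metrics concrete, the sketch below computes both from predicted probabilities. It is a plain Python illustration and not the mlr implementation used in the study; the decision threshold of 0.5 and the toy labels and scores are assumptions.

```python
import math


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true, scores, threshold=0.5):
    """Return (TP, FP, TN, FN) at a fixed decision threshold (assumed 0.5)."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0 by convention if the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels and predicted probabilities (not study data).
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
print(auroc(y, p), mcc(*confusion(y, p)))
```

The AUROC depends only on how the scores rank the patients, whereas the MCC depends on the classifications made at one threshold. Two classifiers with the same ranking quality can therefore differ in MCC.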

The importance of each predictor is assessed within a permutation framework: we choose the AUROC as the performance metric and compute the change in AUROC when the values of a particular predictor (e.g. age) are permuted across patients. The larger the change in the AUROC with respect to the AUROC based on the original, unpermuted data, the more important a predictor is considered to be.
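The permutation idea can be sketched as follows. This is a Python illustration with a hypothetical, already fitted scoring function and toy data; the number of permutations and the use of the mean change are assumptions, as the text does not specify them.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, column, n_perm=50, seed=0):
    """Mean change in AUROC after shuffling one predictor across patients."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(row) for row in X])
    changes = []
    for _ in range(n_perm):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [{**row, column: v} for row, v in zip(X, shuffled)]
        changes.append(auroc(y, [predict(r) for r in X_perm]) - baseline)
    return sum(changes) / n_perm


# Hypothetical fitted model: the score depends only on "prolactin".
def predict(row):
    return row["prolactin"]


# Toy data (not study data): high prolactin goes with the positive class.
X = [{"prolactin": v, "age": a} for v, a in
     [(50, 30), (80, 45), (120, 28), (300, 51), (450, 33), (900, 40)]]
y = [0, 0, 0, 1, 1, 1]

imp_prl = permutation_importance(predict, X, y, "prolactin")
imp_age = permutation_importance(predict, X, y, "age")
```

In this toy setting, shuffling the prolactin column lowers the AUROC (a negative change), while shuffling age leaves it unchanged because the hypothetical model ignores age.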

Statistical Software

All computations were performed with R version 4.0.5 (50). In particular, the machine learning workbench mlr (51) is used to compute and evaluate the various ML algorithms.

Results

Characteristics of the Study Population

Patients’ demographic and baseline characteristics are summarized in Table 1. For the 86 patients undergoing first-line surgery, median age was 32 years (IQR, 27 - 42 years) and 82.6% were female. A macroadenoma was diagnosed in 41 patients (47.7%). Fifty-three patients (76.8%) exhibited secondary (hypogonadotropic) hypogonadism, with secondary hypothyroidism present in 4 patients (5.3%) and secondary hypocorticism present in 3 patients (4.1%). Median prolactin levels were 199 µg/L (IQR, 97.6 - 443.0 µg/L).

Outcomes at early and long-term follow-up are shown in Table 2. With surgery alone, remission was achieved in 52 (63%) patients at early follow-up, and in 49 (59%) patients in the long term. For the control of hyperprolactinemia, DA was ultimately required in 19 (22%) patients at early follow-up, and in 31 (36%) patients at long-term follow-up. None of the patients with long-term DA dependency showed remission at early follow-up.

Table 2 Patients’ characteristics at early (30 days postoperatively) and long-term follow-up.

Daily doses of DAs at early follow-up were as follows (mean ± SD): bromocriptine 7.1 ± 1.0 mg, and cabergoline 0.08 ± 0.03 mg. Daily doses at last follow-up were 5.9 ± 2.9 mg for bromocriptine, and 0.09 ± 0.03 mg for cabergoline.

Patients with short-term remission had significantly lower PRL levels than those without short-term remission (133 μg/L (IQR 78–224 μg/L) vs. 303 μg/L (IQR 211–900 μg/L), p < 0.001).

Cavernous sinus invasion was a significant predictor for long-term dependence on DAs (p=0.03) when excluding the predictor remission from the multivariable regression due to the near-complete separation.

Secondary hypothyroidism was present in 8 patients (9.4%), with levothyroxine substitution therapy being prescribed in all but one of them.

Diabetes insipidus (DI) or syndrome of inappropriate antidiuretic hormone secretion (SIADH) was biochemically documented in case of clinical suspicion only. SIADH was present in 10% and DI in 13% of patients.

Hyperparameter Tuning

The range of AUROC values derived from perturbing the default hyperparameters for each classifier is illustrated in Figure 1. The target variable for this hyperparameter sensitivity analysis was DA dependency at long-term follow-up (primary outcome). Most classifiers perform very well, with AUROC values above 0.9 with default hyperparameter settings. Only a few classifiers displayed marked sensitivity to hyperparameter settings, and thus had the potential to achieve higher AUROC performances by hyperparameter tuning, notably the Gradient Boosting Machine (GBM), the Neural Network (NNET) and the k-nearest neighbor (KNN) classifiers. Note that the logistic regression features performance metrics similar to those of the other algorithms, even outperforming them in the case of the NNET classifier. From here onwards, we selected two hyperparameters for each classifier, based on their individual capability to increase the discrimination ability of the corresponding classifier, and sampled them jointly.

Figure 1 Hyperparameter tuning in our set of machine learning classifiers. The impact of varying the default values of a single hyperparameter on the area under the curve (AUROC) is illustrated for a selection of hyperparameters in each algorithm (shown on the ordinate). Each hyperparameter is sampled 50 times and its performance is assessed within a repeated cross-validation sampling (three-fold, 4 repeats), resulting in an AUROC distribution, which is illustrated with a box-and-whisker plot. The outcome was dependence on dopamine agonists at long-term follow-up. For comparison, the range of AUROC values derived using the default hyperparameter settings is shown as DEFAULT in each panel. Due to the repeated cross-validation sampling, the default hyperparameter settings also feature AUROC distributions, despite using only a fixed set of hyperparameters.

Relationship Between the Two Performance Metrics AUROC and MCC

Figure 2 depicts the relationship between two performance metrics in a set of 500 randomly sampled hyperparameters: the area under the curve (AUROC) on the abscissa and the Matthews correlation coefficient (MCC) on the ordinate are shown for each classifier and hyperparameter combination.

Figure 2 Relationship between two performance metrics in a set of supervised classification algorithms resulting from randomly sampling two hyperparameters in each algorithm (N=500 samples). The area under the curve (AUROC) performance indicator is shown on the abscissa, whereas the corresponding value for the Matthews correlation coefficient (MCC) is shown on the ordinate. The outcomes are (A) dependency on DA at long-term follow-up and (B) successful control of hyperprolactinemia at early follow-up. For illustration purposes, Locally Weighted Scatterplot Smoothing (LOESS) curves with associated 95% confidence intervals are shown for each classification algorithm.

We found a quasi-linear relationship between the AUROC and the MCC for most algorithms, suggesting that a high AUROC performance for an algorithm also features a high MCC. Interestingly, some ML methods such as the k-nearest neighbor and penalized regression display non-linear relationships in AUROC and MCC, implying that some choices of hyperparameters result in performance gains only in one of the performance metrics, while the performance measured by the other metric decreases. Figure 2 further shows that hyperparameter tuning can result in very broad performance ranges, notably by sampling the size of a neural network for the prediction of the primary outcome (Figure 2A). A further insight from Figure 2 is that the range of performances of the standard logistic regression resulting from the RepCV-sampling procedure can be compared to the performance range of “modern” machine learning algorithms resulting from hyperparameter sampling.

Figure 2 further highlights that, depending on the choice of hyperparameters, the classifiers can display similar AUROC performances while their performance as measured with the MCC can differ markedly, at least for the outcomes and predictors available for the present study. For example, for the prediction of successful control of hyperprolactinemia at early follow-up, a Neural Network with a particular choice of hyperparameters can display an AUROC of 0.65 and a (low) MCC of roughly 0.2, whereas a logistic regression can feature the same AUROC value of 0.65 but a comparatively larger MCC of 0.3 (Figure 2B). The added value of ML methods in the present modeling setup is that hyperparameter tuning gives some ML algorithms the opportunity to outperform logistic regression in both metrics, an added benefit over the more traditional prediction by logistic regression. Note, however, that the performance of logistic regression can be considered competitive with respect to other algorithms, and hyperparameter tuning is often required to achieve the performance gain displayed by other machine learning methods.

Overall, the take-home message of this Figure is that examining the two performance indicators together provides a more comprehensive picture of the overall discrimination ability of a particular classifier, and can facilitate the comparison and choice of a particular machine learning algorithm.

Primary and Secondary Outcomes

Figure 3 shows the median AUROC and MCC values and associated 95% confidence intervals (computed from the repeated cross-validation) for early and long-term dependency on DAs based on optimized hyperparameter settings. In terms of predicting DA dependence, Figure 3B demonstrates that the prediction performance is particularly high for the long term (primary endpoint): a Random Forest classifier features a median AUROC of 0.98 and an MCC of 0.93. In this case, all ML algorithms consistently outperform logistic regression. For the prediction of DA dependence at early follow-up, the classifiers feature only moderate performances (median AUROC range: 0.73–0.85, median MCC range: 0.21–0.48, Figure 3B).

Figure 3 Area under the curve (AUROC) and Matthews correlation coefficient (MCC) values for the outcomes at early- and long-term follow-up. Median and 95% confidence intervals are shown, where the latter were derived in a repeated cross-validation sampling (three-fold, 100-repeats). For each machine learning algorithm, two influential hyperparameters (refer to Figure 1) were sampled 100 times and the hyperparameters settings resulting in the best AUROC performance were selected.
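One way to obtain such a median and interval from the resampled metric values is sketched below in Python. The caption does not state how the 95% intervals were computed, so the use of empirical 2.5th and 97.5th percentiles is an assumption, and the function name is hypothetical.

```python
from statistics import median, quantiles


def summarize(values):
    """Median and empirical 2.5th/97.5th percentiles of resampled metric values.

    Assumption: the 95% interval is taken as the empirical percentile range
    of the cross-validation results.
    """
    # n=40 yields 39 cut points spaced at 2.5% steps; the first and last
    # are the 2.5th and 97.5th percentiles.
    cuts = quantiles(values, n=40, method="inclusive")
    return median(values), cuts[0], cuts[-1]


# Hypothetical AUROC values from individual cross-validation iterations.
aurocs = [0.91, 0.95, 0.97, 0.98, 0.98, 0.99, 1.00, 1.00, 0.96, 0.94]
center, lower, upper = summarize(aurocs)
```

Applied to the AUROC or MCC values of all cross-validation iterations of one classifier, this gives the point estimate and interval for one entry of the figure.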

The high prediction performance of the classifiers for the primary outcome is strongly related to its association with remission at 30 days: of the 52 out of 83 patients who did not show DA dependency, 49 showed remission at 30 days, whereas none of the patients with long-term DA dependence showed remission at 30 days. We thus find almost complete separation in these two variables. The importance of remission at 30 days will be further quantified below.

To predict the control of hyperprolactinemia at early follow-up, all classifiers displayed only moderate performance, with median AUROC values ranging from 0.62 to 0.75 and median MCC performance ranging from 0.27 to 0.35. In terms of predicting the long-term outcome in hyperprolactinemia, the overall performance was slightly increased, with moderate median AUROC values ranging from 0.62 (Support Vector Machine) to 0.86 (Gradient Boosting Machine). All MCC values are equal to zero, likely due to the small sample size and the imbalanced datasets: an MCC of zero can result when a row or a column of the confusion matrix is exactly zero while the other two entries are non-zero (14). As there were only seven patients with a successful long-term hyperprolactinemia outcome, the data splitting in the cross-validation might result in zero entries in the confusion matrix.
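The zero MCC values can be understood from the definition of the coefficient, written here with the usual symbols for the four entries of the confusion matrix (true positives TP, false positives FP, true negatives TN, false negatives FN):

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```

If, for instance, a classifier assigns no patient in a test fold to the positive class, then TP = FP = 0. The numerator becomes 0 · TN − 0 · FN = 0 and the factor (TP + FP) in the denominator is also zero. A common convention, assumed here and not confirmed for the software used in the study, is to report an MCC of zero in this undefined case. The same happens when a test fold contains no positive cases at all (TP = FN = 0).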

Overall, we noted that there was no single classifier outperforming all other classifiers and that different algorithms performed best for different times and different outcomes. In the context of this near-complete separation for the primary outcome and remission at 30 days, Figure 3 indicates that the ML algorithms might be more capable of handling such variable separation compared to logistic regression, as these classifiers showed better performance metrics and narrower confidence ranges. The complete data table of Figure 3 is provided in the Supplementary Material.

To complete the evaluation of the classifiers on the outcomes considered in our analyses, Table 3 presents the performance metrics for a super learner, which combines the predictions of individual base learners (see Methods). The super learner generally ranks high compared to most individual base learners; however, it does not always outperform them.

Table 3 Performance metrics of a stacked super learner combining the outcome predictions of the individual classifiers (referred to as base learners; see method section).

Variable Importance

We next examined the importance of each variable in predicting the outcome at early and long-term follow-up. The decrease in the AUROC values when the values of a particular predictor are perturbed is illustrated in Figure 4. Thus, the more negative the importance metric on the ordinate is, the more important the predictor is considered to be. Prolactin levels are the most important predictor at early follow-up, both for the control of hyperprolactinemia and for dependence on DAs (Figures 4A, C). In addition, remission from hyperprolactinemia at 30 days is the most important predictor for long-term dependency on DAs, and this finding is robust across most classifiers, likely due to near-complete separation in the two variables (Figures 4B, D). Of secondary importance are the presence of prolactinoma invasion into the cavernous sinus, as well as patients’ age, BMI and sex.

Figure 4 Importance of the available set of variables in predicting early and long-term outcome. The variable importance metric is based on a permutation approach, where the impact of perturbing the values of a given predictor on a particular performance metric [in this case: area under the curve (AUROC)] is assessed: the larger the decrease in the AUROC metric, the more important a predictor is considered. The variable importance is assessed for each classification algorithm with optimized hyperparameters, and the importance values for each predictor are simply stacked upon each other to illustrate the overall importance of a particular predictor and to visualize the inter-algorithm agreement in the assessment of the importance of a single predictor.

Discussion

Our results highlight the benefits of employing an ML approach in addition to traditional methods such as logistic regression for outcome prediction in prolactinoma patients treated with first-line surgery, in particular in a situation of near-complete variable separation, as is the case here for the primary outcome and the predictor remission at 30 days.

In a systematic review featuring 71 studies, no superior performance of ML algorithms compared to logistic regression was found for clinical prediction models (52). In a similar vein, it was demonstrated that logistic regression and ML methods have a similar ability to predict major chronic diseases with low incidences and only simple clinical predictors (53). Against this background, we demonstrate that there was no one-algorithm-fits-all solution in predicting early and long-term outcome in prolactinoma patients treated with first-line surgery: different algorithms performed best for different outcomes and at different times, and there are instances when logistic regression featured similar (or better) performance scores than ML methods (Figure 3A). We thus argue and highlight in this study that by jointly examining two independent performance metrics – the area under the receiver operating characteristic (AUROC) and the Matthews correlation coefficient (MCC) – the discrimination capacity of a set of binary classifiers can be more holistically investigated than by focusing on a single performance metric such as the AUROC. Importantly, with the stacking framework of the super learner (47), ML offers a viable methodology to combine different classifiers. In general, the super learner exhibits high performance compared to individual classifiers. In this regard, ML adds to the current statistical methods when it comes to outcome prediction of first-line surgery in prolactinoma patients.

Our data indicate that baseline serum prolactin levels are by far the most important outcome predictor at early follow-up, whereas remission at 30 days dominated the prediction of long-term dependence on DAs. Initial high serum PRL levels have been associated with recurrence of hyperprolactinemia (54, 55), corroborating our results. Likewise, in a large cohort of prolactinoma patients, Mattogno and colleagues reported that in those with a follow-up of > 5 years, surgery and female gender were independent predictors of control of hyperprolactinemia (17). Because symptoms such as amenorrhea are investigated at an early time point in women, their prolactin levels are usually not as high as in men, who harbor larger adenomas owing to unreported or subclinical symptoms of hypogonadism (13, 56, 57).

DAs can be tapered 24 months after initiation of medical therapy in case of normalization of the respective serum PRL values (1). However, early recurrence of hyperprolactinemia following discontinuation of DAs has been described (58), in particular in patients with macroprolactinomas (14, 59–61) or those with adenoma extension into the cavernous sinus (11). In surgical series, recurrences in as many as one-third of patients with prolactinomas have been reported, including late recurrences after more than 10 years (62). In this regard, the number of patients who remain off medication is an important outcome measure to report (11, 63), as surgery can be an effective alternative treatment option in selected patients (11–13, 64, 65). However, whether surgery of prolactinomas is preferable to DAs as a first-line approach or should remain a second-line treatment is a matter of debate, with the PRolaCT trial hopefully providing insights on this important issue (16).

This study has inherent limitations. First, the set of available variables and study population size is somewhat limited, suggesting only exploratory findings with regard to the prediction capacity of the models (66). However, the available dataset still represents one of the largest cohorts of patients with a surgery-first approach, reaching a long-term follow-up of almost 10 years, which we think is crucial. In addition, the dataset features missing data in variables, and the (single) imputation approach in the repeated cross-validation might impact the training and test sets and thus the two performance metrics. Second, we consider only a limited set of ML classifiers. Third, computational resources constrained the sampling of the hyperparameter space of each classifier. However, given the robustness of the classifier performance – i.e., consider the similar AUROC and MCC performances in Figure 3 – it seems not very likely that sampling more hyperparameters would have resulted in a fundamental performance increase.

From a clinical point of view, a follow-up period of <24 months in a few patients may have confounded the results for long-term DA dependence, as our treatment strategy follows current consensus guidelines in tapering DAs 24 months after initiation of medical therapy in case of normalized serum prolactin levels and/or prolactinoma size reduction of >50%. Moreover, not all patients with normoprolactinemia at follow-up were subsequently screened with pituitary MRI. In addition, we cannot exclude that a very small number of adenomas diagnosed as prolactinomas were GH co-secreting or non-secreting adenomas. Finally, not all patients were systematically screened for growth hormone deficiency using validated dynamic testing; such testing was reserved for cases with a clinical suspicion of significant adult GH deficiency in which the patient agreed to treat the condition by daily injections.

Conclusion

There were benefits in employing an ML approach and in using a set of diverse classification algorithms to predict long-term DA dependency following first-line surgery in prolactinoma patients. We can confirm that baseline prolactin levels are by far the most important outcome predictor at early follow-up, whereas remission at 30 days dominates the prediction skill for DA dependence over the long term.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

The Cantonal Ethikkommission KEK Bern (Bern, Switzerland) approved the project (KEK no 10-10-2006 and 8-11-2006). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

Conception and design: LA, EC, MH. Acquisition of data: JF, LA. Analysis and interpretation of data: MH, LA. Drafting the article: MH, LA. Critically revising the article: MH, ML, EC, LA. Reviewed submitted version of manuscript: all authors. Statistical analysis: MH. Administrative/technical/material support: ML. Study supervision: LA, EC. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The assistance of Ms. Jeannie Wurz in editing the manuscript is greatly appreciated.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2022.810219/full#supplementary-material

Abbreviations

DA, dopamine agonist; MCC, Matthews correlation coefficient; ML, machine learning; MRI, magnetic resonance imaging; PRL, prolactin; TSS, transsphenoidal surgery.

References

1. Colao A, Di Sarno A, Guerra E, Pivonello R, Cappabianca P, Caranci F, et al. Predictors of Remission of Hyperprolactinaemia After Long-Term Withdrawal of Cabergoline Therapy. Clin Endocrinol (2007) 67(3):426–33. doi: 10.1111/j.1365-2265.2007.02905.x

2. Kars M, Souverein PC, Herings RMC, Romijn JA, Vandenbroucke JP, de Boer A, et al. Estimated Age- and Sex-Specific Incidence and Prevalence of Dopamine Agonist-Treated Hyperprolactinemia. J Clin Endocrinol Metab (2009) 94(8):2729–34. doi: 10.1210/jc.2009-0177

3. Levy A. Pituitary Disease: Presentation, Diagnosis, and Management. J Neurol Neurosurg Psychiatry (2004) 75:47–52. doi: 10.1136/jnnp.2004.045740

4. Dekkers OM, Lagro J, Burman P, Jørgensen JO, Romijn JA, Pereira AM. Recurrence of Hyperprolactinemia After Withdrawal of Dopamine Agonists: Systematic Review and Meta-Analysis. J Clin Endocrinol Metab (2010) 95(1):43–51. doi: 10.1210/jc.2009-1238

5. Herring N, Szmigielski C, Becher H, Karavitaki N, Wass JAH. Valvular Heart Disease and the Use of Cabergoline for the Treatment of Prolactinoma. Clin Endocrinol (2009) 70(1):104–8. doi: 10.1111/j.1365-2265.2008.03458.x

6. Zanettini R, Antonini A, Gatto G, Gentile R, Tesei S, Pezzoli G. Valvular Heart Disease and the Use of Dopamine Agonists for Parkinson’s Disease. New Engl J Med (2007) 356(1):39–46. doi: 10.1056/NEJMoa054830

7. Moore TJ, Glenmullen J, Mattison DR. Reports of Pathological Gambling, Hypersexuality, and Compulsive Shopping Associated With Dopamine Receptor Agonist Drugs. JAMA Internal Med (2014) 174(12):1930–3. doi: 10.1001/jamainternmed.2014.5262

8. Weiss HD, Pontone GM. Dopamine Receptor Agonist Drugs and Impulse Control Disorders. JAMA Internal Med (2014) 174(12):1935–7. doi: 10.1001/jamainternmed.2014.4097

9. Bancos I, Nannenga MR, Bostwick JM, Silber MH, Erickson D, Nippoldt TB. Impulse Control Disorders in Patients With Dopamine Agonist-Treated Prolactinomas and Nonfunctioning Pituitary Adenomas: A Case–Control Study. Clin Endocrinol (2014) 80(6):863–8. doi: 10.1111/cen.12375

10. Hinojosa-Amaya JM, Johnson N, González-Torres C, Varlamov EV, Yedinak CG, McCartney S, et al. Depression and Impulsivity Self-Assessment Tools to Identify Dopamine Agonist Side Effects in Patients With Pituitary Adenomas. Front Endocrinol (2020) 11:579606(728). doi: 10.3389/fendo.2020.579606

11. Andereggen L, Frey J, Andres RH, Luedi MM, El-Koussy M, Widmer HR, et al. First-Line Surgery in Prolactinomas: Lessons From a Long-Term Follow-Up Study in a Tertiary Referral Center. J Endocrinol Invest (2021) 44(12):2621–33. doi: 10.1007/s40618-021-01569-6

12. Andereggen L, Frey J, Andres RH, Luedi MM, Gralla J, Schubert GA, et al. Impact of Primary Medical or Surgical Therapy on Prolactinoma Patients’ BMI and Metabolic Profile Over the Long-Term. J Clin Trans Endocrinol (2021) 24:100258. doi: 10.1016/j.jcte.2021.100258

13. Andereggen L, Frey J, Andres RH, Luedi MM, Widmer HR, Beck J, et al. Persistent Bone Impairment Despite Long-Term Control of Hyperprolactinemia and Hypogonadism in Men and Women With Prolactinomas. Sci Rep (2021) 11(1):5122. doi: 10.1038/s41598-021-84606-x

14. Andereggen L, Frey J, Christ E. Long-Term IGF-1 Monitoring in Prolactinoma Patients Treated With Cabergoline Might Not be Indicated. Endocrine (2021) 72(1):216–22. doi: 10.1007/s12020-020-02557-1

15. Zielinski G, Ozdarski M, Maksymowicz M, Szamotulska K, Witek P. Prolactinomas: Prognostic Factors of Early Remission After Transsphenoidal Surgery. Front Endocrinol (2020) 11:439(439). doi: 10.3389/fendo.2020.00439

16. Zandbergen IM, Zamanipoor Najafabadi AH, Pelsma ICM, van den Akker-van Marle ME, Bisschop PHLT, Boogaarts HDJ, et al. The PRolaCT Studies — a Study Protocol for a Combined Randomised Clinical Trial and Observational Cohort Study Design in Prolactinoma. Trials (2021) 22(1):653. doi: 10.1186/s13063-021-05604-y

17. Mattogno PP, D’Alessandris QG, Chiloiro S, Bianchi A, Giampietro A, Pontecorvi A, et al. Reappraising the Role of Trans-Sphenoidal Surgery in Prolactin-Secreting Pituitary Tumors. Cancers (2021) 13(13):3252. doi: 10.3390/cancers13133252

18. Ma Q, Su J, Li Y, Wang J, Long W, Luo M, et al. The Chance of Permanent Cure for Micro- and Macroprolactinomas, Medication or Surgery? A Systematic Review and Meta-Analysis. Front Endocrinol (2018) 9:636(636). doi: 10.3389/fendo.2018.00636

19. Zamanipoor Najafabadi AH, Zandbergen IM, de Vries F, Broersen LHA, van den Akker-van Marle ME, Pereira AM, et al. Surgery as a Viable Alternative First-Line Treatment for Prolactinoma Patients. A Systematic Review and Meta-Analysis. J Clin Endocrinol Metab (2019) 105(3):e32–41. doi: 10.1210/clinem/dgz144

20. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. New Engl J Med (2019) 380(14):1347–58. doi: 10.1056/NEJMra1814259

21. Lu P-J, Barakovic M, Weigel M, Rahmanzadeh R, Galbusera R, Schiavi S, et al. GAMER-MRI in Multiple Sclerosis Identifies the Diffusion-Based Microstructural Measures That Are Most Sensitive to Focal Damage: A Deep-Learning-Based Analysis and Clinico-Biological Validation. Front Neurosci (2021) 15:647535(258). doi: 10.3389/fnins.2021.647535

22. Khera R, Haimovich J, Hurley NC, McNamara R, Spertus JA, Desai N, et al. Use of Machine Learning Models to Predict Death After Acute Myocardial Infarction. JAMA Cardiol (2021) 6(6):633–41. doi: 10.1001/jamacardio.2021.0122

23. Fang Y, Wang H, Feng M, Zhang W, Cao L, Ding C, et al. Machine-Learning Prediction of Postoperative Pituitary Hormonal Outcomes in Nonfunctioning Pituitary Adenomas: A Multicenter Study. Front Endocrinol (2021) 12:748725. doi: 10.3389/fendo.2021.748725

24. Thomasian NM, Kamel IR, Bai HX. Machine Intelligence in Non-Invasive Endocrine Cancer Diagnostics. Nat Rev Endocrinol (2021) 18(2):81–95. doi: 10.1038/s41574-021-00543-9

25. Park YW, Eom J, Kim S, Kim H, Ahn SS, Ku CR, et al. Radiomics With Ensemble Machine Learning Predicts Dopamine Agonist Response in Patients With Prolactinoma. J Clin Endocrinol Metab (2021) 106(8):e3069–e77. doi: 10.1210/clinem/dgab159

26. Knosp E, Steiner E, Kitz K, Matula C. Pituitary Adenomas With Invasion of the Cavernous Sinus Space: A Magnetic Resonance Imaging Classification Compared With Surgical Findings. Neurosurgery (1993) 33(4):610–7. doi: 10.1227/00006123-199310000-00008

27. Micko ASG, Wöhrer A, Wolfsberger S, Knosp E. Invasion of the Cavernous Sinus Space in Pituitary Adenomas: Endoscopic Verification and Its Correlation With an MRI-Based Classification. J Neurosurg (2015) 122(4):803–11. doi: 10.3171/2014.12.JNS141083

28. Andereggen L, Gralla J, Schroth G, Mordasini P, Andres RH, Widmer HR, et al. Influence of Inferior Petrosal Sinus Drainage Symmetry on Detection of Adenomas in Cushing’s Syndrome. J Neuroradiol (2021) 48(1):10–5. doi: 10.1016/j.neurad.2019.05.004

29. Andereggen L, Hess B, Andres R, El-Koussy M, Mariani L, Raabe A, et al. A Ten-Year Follow-Up Study of Treatment Outcome of Craniopharyngiomas. Swiss Med Weekly (2018) 148:w14521. doi: 10.4414/smw.2018.14521

30. Andereggen L, Mariani L, Beck J, Andres RH, Gralla J, Luedi MM, et al. Lateral One-Third Gland Resection in Cushing Patients With Failed Adenoma Identification Leads to Low Remission Rates: Long-Term Observations From a Small, Single-Center Cohort. Acta Neurochir (Wien) (2021) 163(11):3161–9. doi: 10.1007/s00701-021-04830-2

31. Karavitaki N, Thanabalasingham G, Shore HCA, Trifanescu R, Ansorge O, Meston N, et al. Do the Limits of Serum Prolactin in Disconnection Hyperprolactinaemia Need Re-Definition? A Study of 226 Patients With Histologically Verified non-Functioning Pituitary Macroadenoma. Clin Endocrinol (2006) 65(4):524–9. doi: 10.1111/j.1365-2265.2006.02627.x

32. Cattaneo F, Kappeler D, Müller B. Macroprolactinaemia, the Major Unknown in the Differential Diagnosis of Hyperprolactinaemia. Swiss Med Weekly (2001) 131(9-10):122–6.

33. Melmed S, Casanueva FF, Hoffman AR, Kleinberg DL, Montori VM, Schlechte JA, et al. Diagnosis and Treatment of Hyperprolactinemia: An Endocrine Society Clinical Practice Guideline. J Clin Endocrinol Metab (2011) 96(2):273–88. doi: 10.1210/jc.2010-1692

34. Lopes MBS. The 2017 World Health Organization Classification of Tumors of the Pituitary Gland: A Summary. Acta Neuropathol (2017) 134(4):521–35. doi: 10.1007/s00401-017-1769-8

35. Arduc A, Gokay F, Isik S, Ozuguz U, Akbaba G, Tutuncu Y, et al. Retrospective Comparison of Cabergoline and Bromocriptine Effects in Hyperprolactinemia: A Single Center Experience. J Endocrinol Invest (2015) 38(4):447–53. doi: 10.1007/s40618-014-0212-4

36. Wass JAH. When to Discontinue Treatment of Prolactinoma? Nat Clin Pract Endocrinol Metab (2006) 2(6):298–9. doi: 10.1038/ncpendmet0162

37. Colao A, Di Sarno A, Cappabianca P, Di Somma C, Pivonello R, Lombardi G. Withdrawal of Long-Term Cabergoline Therapy for Tumoral and Nontumoral Hyperprolactinemia. New Engl J Med (2003) 349(21):2023–33. doi: 10.1056/NEJMoa022657

38. Breiman L, Cutler A, Liaw A, Wiener M. Randomforest: Breiman and Cutler’s Random Forests for Classification and Regression. (2018). Available at: https://cran.r-project.org/web/packages/randomForest/.

39. Schliep K, Hechenbichler K. Kknn: Weighted K-Nearest Neighbors. (2016). Available at: https://cran.r-project.org/web/packages/kknn/.

40. Friedman J, Hastie T, Tibshirani R, Narasimhan B, Tay K, Simon N. Glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. (2021). Available at: https://cran.r-project.org/web/packages/glmnet/.

41. Ripley B. Nnet: Feed-Forward Neural Networks and Multinomial Log-Linear Models. (2021). Available at: https://cran.r-project.org/web/packages/nnet/.

42. Milborrow S. Earth: Multivariate Adaptive Regression Splines. (2021). Available at: https://cran.r-project.org/web/packages/earth/.

43. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. (2021). Available at: https://cran.r-project.org/web/packages/e1071/.

44. Greenwell B, Boehmke B, Cunningham J, Developers GBM. Gbm: Generalized Boosted Regression Models. (2020). Available at: https://cran.r-project.org/web/packages/gbm/.

45. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer-Verlag (2009).

46. Boehmke B, Greenwell BM. Hands-On Machine Learning With R Vol. 488. New York, United States: CRC Press - Taylor & Francis Group (2019).

47. van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol (2007) 6(1). doi: 10.2202/1544-6115.1309

48. Chicco D, Jurman G. The Advantages of the Matthews Correlation Coefficient (MCC) Over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genomics (2020) 21(1):6. doi: 10.1186/s12864-019-6413-7

49. Boughorbel S, Jarray F, El-Anbari M. Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PloS One (2017) 12(6):e0177678. doi: 10.1371/journal.pone.0177678

50. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria (2020). Available at: https://www.R-project.org/.

51. Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, et al. Mlr: Machine Learning in R. J Mach Learn Res (2016) 17(170):1–5.

52. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A Systematic Review Shows No Performance Benefit of Machine Learning Over Logistic Regression for Clinical Prediction Models. J Clin Epidemiol (2019) 110:12–22. doi: 10.1016/j.jclinepi.2019.02.004

53. Nusinovici S, Tham YC, Chak Yan MY, Wei Ting DS, Li J, Sabanayagam C, et al. Logistic Regression was as Good as Machine Learning for Predicting Major Chronic Diseases. J Clin Epidemiol (2020) 122:56–69. doi: 10.1016/j.jclinepi.2020.03.002

54. Teixeira M, Souteiro P, Carvalho D. Prolactinoma Management: Predictors of Remission and Recurrence After Dopamine Agonists Withdrawal. Pituitary (2017) 20(4):464–70. doi: 10.1007/s11102-017-0806-x

55. Dogansen SC, Selcukbiricik OS, Tanrikulu S, Yarman S. Withdrawal of Dopamine Agonist Therapy in Prolactinomas: In Which Patients and When? Pituitary (2016) 19(3):303–10. doi: 10.1007/s11102-016-0708-3

56. Daly AF, Rixhon M, Adam C, Dempegioti A, Tichomirowa MA, Beckers A. High Prevalence of Pituitary Adenomas: A Cross-Sectional Study in the Province of Liège, Belgium. J Clin Endocrinol Metab (2006) 91(12):4769–75. doi: 10.1210/jc.2006-1668

57. Wu ZB, Su ZP, Wu JS, Zheng WM, Zhuge QC, Zhong M. Five Years Follow-Up of Invasive Prolactinomas With Special Reference to the Control of Cavernous Sinus Invasion. Pituitary (2008) 11(1):63–70. doi: 10.1007/s11102-007-0072-4

58. Hu J, Zheng X, Zhang W, Yang H. Current Drug Withdrawal Strategy in Prolactinoma Patients Treated With Cabergoline: A Systematic Review and Meta-Analysis. Pituitary (2015) 18(5):745–51. doi: 10.1007/s11102-014-0617-2

59. Kwancharoen R, Auriemma RS, Yenokyan G, Wand GS, Colao A, Salvatori R. Second Attempt to Withdraw Cabergoline in Prolactinomas: A Pilot Study. Pituitary (2014) 17(5):451–6. doi: 10.1007/s11102-013-0525-x

60. Xia MY, Lou XH, Lin SJ, Wu ZB. Optimal Timing of Dopamine Agonist Withdrawal in Patients With Hyperprolactinemia: A Systematic Review and Meta-Analysis. Endocrine (2018) 59(1):50–61. doi: 10.1007/s12020-017-1444-9

61. Sala E, Bellaviti Buttoni P, Malchiodi E, Verrua E, Carosi G, Profka E, et al. Recurrence of Hyperprolactinemia Following Dopamine Agonist Withdrawal and Possible Predictive Factors of Recurrence in Prolactinomas. J Endocrinol Invest (2016) 39(12):1377–82. doi: 10.1007/s40618-016-0483-z

62. Primeau V, Raftopoulos C, Maiter D. Outcomes of Transsphenoidal Surgery in Prolactinomas: Improvement of Hormonal Control in Dopamine Agonist-Resistant Patients. Eur J Endocrinol (2012) 166(5):779–86. doi: 10.1530/EJE-11-1000

63. Andereggen L, Christ E. Commentary: “Prolactinomas: Prognostic Factors of Early Remission After Transsphenoidal Surgery”. Front Endocrinol (2021) 12:695498(559). doi: 10.3389/fendo.2021.695498

64. Donegan D, Atkinson JD, Jentoft M, Natt N, Nippoldt TB, Erickson B, et al. Surgical Outcomes of Prolactinomas in Recent Era: Results of a Heterogenous Group. Endocrine Pract (2017) 23(1):37–45. doi: 10.4158/EP161446.OR

65. Andereggen L, Frey J, Andres RH, El-Koussy M, Beck J, Seiler RW, et al. 10-Year Follow-Up Study Comparing Primary Medical vs. Surgical Therapy in Women With Prolactinomas. Endocrine (2017) 55(1):223–30. doi: 10.1007/s12020-016-1115-2

66. Steyerberg EW. Validation in Prediction Research: The Waste by Data Splitting. J Clin Epidemiol (2018) 103:131–3. doi: 10.1016/j.jclinepi.2018.07.010

Keywords: dopamine agonists, long-term outcome, machine learning, primary surgical therapy, prolactinoma, prediction modeling

Citation: Huber M, Luedi MM, Schubert GA, Musahl C, Tortora A, Frey J, Beck J, Mariani L, Christ E and Andereggen L (2022) Machine Learning for Outcome Prediction in First-Line Surgery of Prolactinomas. Front. Endocrinol. 13:810219. doi: 10.3389/fendo.2022.810219

Received: 06 November 2021; Accepted: 17 January 2022;
Published: 16 February 2022.

Edited by:

Francesco Doglietto, University of Brescia, Italy

Reviewed by:

Giuseppe Jurman, Bruno Kessler Foundation (FBK), Italy
Atanaska Petrova Elenkova, Medical University-Sofia, Bulgaria
Andrea Glezer, University of São Paulo, Brazil

Copyright © 2022 Huber, Luedi, Schubert, Musahl, Tortora, Frey, Beck, Mariani, Christ and Andereggen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lukas Andereggen, lukas.andereggen@ksa.ch; orcid.org/0000-0003-1764-688X

These authors have contributed equally to this work and share last authorship
