Machine learning is a promising tool for suicide prevention because it can model the combined effects of multiple risk factors and their complex interactions. This capability has led to an influx of studies on suicide prediction, as well as a few recent reviews. Our study distinguished between data sources and reported the most important predictors of suicide outcomes identified in the literature.
Our study aimed to identify studies that applied machine learning techniques to administrative and survey data, summarize the performance metrics reported in those studies, and enumerate the important risk factors for suicidal thoughts and behaviors that they identified.
We performed a systematic literature search of PubMed, Medline, Embase, PsycINFO, Web of Science, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Allied and Complementary Medicine Database (AMED) to identify all studies that used machine learning to predict suicidal thoughts and behaviors from administrative and survey data. The search covered articles published between January 1, 2019 and May 11, 2022. In addition, all articles identified in three recently published systematic reviews (the last of which included studies up until January 1, 2019) were retained if they met our inclusion criteria. The predictive power of machine learning methods for suicidal thoughts and behaviors was explored using box plots that summarized the distribution of area under the receiver operating characteristic curve (AUC) values by machine learning method and suicide outcome (i.e., suicidal thoughts, suicide attempt, and death by suicide). Mean AUCs with 95% confidence intervals (CIs) were computed for each suicide outcome by study design, data source, total sample size, sample size of cases, and machine learning method employed. The most important risk factors were listed.
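The grouped summary described above (a mean AUC with a 95% CI per suicide outcome) can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_auc_with_ci` is hypothetical, the interval uses a normal approximation (mean ± z · SD / √n) that the review does not specify, and the AUC values are made up for demonstration rather than taken from any included study.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_auc_with_ci(records, level=0.95):
    """Group (outcome, auc) pairs by outcome; return mean AUC and a CI.

    Assumption: a normal-approximation interval, mean +/- z * sd / sqrt(n).
    Groups with fewer than two AUC values get no interval (None bounds).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    groups = defaultdict(list)
    for outcome, auc in records:
        groups[outcome].append(auc)

    summary = {}
    for outcome, values in groups.items():
        m = mean(values)
        if len(values) < 2:
            summary[outcome] = (m, None, None)
            continue
        half_width = z * stdev(values) / sqrt(len(values))
        # AUC is bounded to [0, 1], so clip the interval to that range.
        summary[outcome] = (m, max(0.0, m - half_width), min(1.0, m + half_width))
    return summary


# Made-up AUC values for illustration only; not results from the review.
example = [
    ("suicidal thoughts", 0.80),
    ("suicidal thoughts", 0.84),
    ("suicidal thoughts", 0.88),
    ("suicide attempt", 0.78),
    ("suicide attempt", 0.86),
    ("death by suicide", 0.82),
]
summary = mean_auc_with_ci(example)
for outcome, (m, lo, hi) in summary.items():
    print(outcome, round(m, 3), lo, hi)
```

The same function could be applied to any other grouping variable named in the methods (study design, data source, sample size band, or machine learning method) by changing the first element of each pair. With few studies per group, a t-based interval would be wider than this normal approximation.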
The search strategy identified 2,200 unique records, of which 104 articles met the inclusion criteria. Machine learning algorithms achieved good prediction of suicidal thoughts and behaviors (i.e., an AUC between 0.80 and 0.89); however, their predictive power appeared to differ across suicide outcomes. Boosting algorithms achieved good prediction of suicidal thoughts, death by suicide, and all suicide outcomes combined, while neural network algorithms achieved good prediction of suicide attempts. The risk factors for suicidal thoughts and behaviors differed depending on the data source and the population under study.
The predictive utility of machine learning for suicidal thoughts and behaviors largely depends on the approach used. The findings of the current review should prove helpful in preparing future machine learning models using administrative and survey data.