
METHODS article

Front. Digit. Health
Sec. Health Informatics
Volume 7 - 2025 | doi: 10.3389/fdgth.2025.1495879

A Novel Machine Learning Methodology for the Systematic Extraction of Chronic Kidney Disease Comorbidities from Abstracts

Provisionally accepted
  • 1 University of Pécs, Pécs, Hungary
  • 2 Royal College of Surgeons in Ireland, Dublin, County Dublin, Ireland

The final, formatted version of the article will be published soon.

    Background: Chronic Kidney Disease (CKD) is a global health concern and is frequently underdiagnosed due to its subtle initial symptoms, contributing to increasing morbidity and mortality. A comprehensive understanding of CKD comorbidities could lead to the identification of risk groups, more effective treatment and improved patient outcomes. Our research has a two-fold objective: developing an effective machine learning (ML) workflow for text classification and entity relation extraction, and assembling a broad list of diseases influencing CKD development and progression.

    Methods: We analysed 39,680 abstracts with CKD in the title from the Embase library. Abstracts about a disease affecting CKD development and/or progression were selected by multiple ML classifiers trained on a human-labelled sample. The best-performing classifier was further trained with active learning. The names of the diseases in question were extracted from the selected abstracts using a novel entity relation extraction methodology. The resulting disease list and the corresponding abstracts were checked manually, and a final disease list was created.

    Findings: The support vector machine (SVM) model gave the best results and was chosen for further training with active learning. This optimised ML workflow enabled us to identify 68 comorbidities across 15 ICD-10 disease groups contributing to CKD development or progression. Reading the ML-selected abstracts showed that some diseases have a direct causal effect on CKD, while others, such as schizophrenia, have an indirect causal effect.

    Interpretation: These findings have the potential to guide future CKD investigations by facilitating the inclusion of a broader array of comorbidities in CKD prognostic models. Ultimately, our study enhances understanding of prognostic comorbidities and supports clinical practice by enabling improved patient monitoring, preventive strategies and early detection for individuals at higher risk of CKD development or progression.
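    The abstract names an SVM classifier refined with active learning but does not state the features, SVM implementation or query strategy used. The sketch below illustrates the general idea only, under loudly stated assumptions: bag-of-words features, a from-scratch Pegasos-style linear SVM (swapped in so the example runs without third-party libraries), and margin-based uncertainty sampling, where the unlabelled abstract closest to the decision boundary is sent for labelling. All function names, the `oracle` stand-in for the human annotator, and the toy texts are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words counts plus a constant bias feature (hypothetical choice)."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    x = Counter(t for t in tokens if t)
    x["__bias__"] = 1
    return x


def score(w, x):
    """Linear decision value w . x over sparse dict features."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss; labels in {-1, +1}."""
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, x)   # computed before the shrink step
            for f in w:                # regularisation: shrink all weights
                w[f] *= 1.0 - eta * lam
            if margin < 1:             # hinge loss active: step towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def decision(w, text):
    """Signed SVM score of an abstract; > 0 means 'relevant' (label +1)."""
    return score(w, featurize(text))


def query_most_uncertain(w, pool):
    """Uncertainty sampling: index of the unlabelled abstract nearest the boundary."""
    return min(range(len(pool)), key=lambda i: abs(decision(w, pool[i])))


def active_learning(docs, labels, pool, oracle, rounds):
    """Query one abstract per round, label it via `oracle`, then retrain."""
    docs, labels, pool = list(docs), list(labels), list(pool)
    w = train_linear_svm(docs, labels)
    for _ in range(min(rounds, len(pool))):
        text = pool.pop(query_most_uncertain(w, pool))
        docs.append(text)
        labels.append(oracle(text))    # a human annotator in a real workflow
        w = train_linear_svm(docs, labels)
    return w, docs, labels, pool


if __name__ == "__main__":
    seed_docs = ["diabetes raises risk", "dialysis cost analysis"]
    pool = ["hypertension risk factor", "staffing cost survey", "gout progression risk"]
    oracle = lambda text: 1 if "risk" in text else -1  # toy labelling rule
    w, docs, labels, rest = active_learning(seed_docs, [1, -1], pool, oracle, rounds=2)
    print(len(docs), len(rest))  # prints 4 1
```

    Retraining from scratch after every query keeps the sketch simple; a real pipeline over tens of thousands of abstracts would more likely query in batches and use an optimised SVM library.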

    Keywords: Chronic Kidney Disease, comorbidities, Systematic Literature Review, machine learning, Active Learning, named entity recognition, Entity Relation

    Received: 17 Sep 2024; Accepted: 21 Jan 2025.

    Copyright: © 2025 Saghy, Elsharkawy, Moriarty, Kovács, Wittmann and Zemplenyi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Eszter Saghy, University of Pécs, Pécs, Hungary

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.