ORIGINAL RESEARCH article

Front. Surg.
Sec. Visceral Surgery
Volume 12 - 2025 | doi: 10.3389/fsurg.2025.1523684
This article is part of the Research Topic: Exploring Machine Learning Applications in Visceral Surgery

Random Forests Algorithm Using Basic Medical Data for Predicting the Presence of Colonic Polyps

Provisionally accepted
Mihaela-Flavia Avram 1*, Nicolae Lupa 2, Dimitrios Koukoulas 3, Daniela Cornelia Lazar 1, Mihaela Ioana Maris 1, Marius Sorin Murariu 1, Sorin Olariu 1
  • 1 Victor Babes University of Medicine and Pharmacy, Timisoara, Romania
  • 2 Politehnica University of Timișoara, Timișoara, Romania
  • 3 Department of Gastroenterology, Municipal Hospital “Dr. Teodor Andrei” Lugoj, Lugoj, Romania

The final, formatted version of the article will be published soon.

    Background: Colorectal cancer is considered to be triggered by the malignant transformation of colorectal polyps. Early diagnosis and excision of colorectal polyps have been found to lower the mortality and morbidity associated with colorectal cancer. The aim of this study is to offer a predictive model for the presence of colorectal polyps based on the Random Forests machine learning algorithm, using basic patient information and common laboratory test results.

    Materials and methods: 164 patients were included in the study. The following data were collected: sex, residence, age, diabetes mellitus, body mass index, fasting blood glucose levels, hemoglobin, platelets, total, LDL and HDL cholesterol, triglycerides, serum glutamic-oxaloacetic transaminase, chronic gastritis, and presence of colonic polyps at colonoscopy. 80% of patients were assigned to the training set used to build the Random Forests model, and the remaining 20% formed the test set. External validation was performed on data from 42 patients. The performance of the Random Forests model was compared with that of a generalized linear model (GLM) and a support vector machine (SVM) built and tested on the same datasets.

    Results: The Random Forests prediction model gave an AUC of 0.820 on the test set. The top five variables in order of importance were: body mass index, platelets, hemoglobin, triglycerides, and glutamic-oxaloacetic transaminase. For external validation, the AUC was 0.79. The GLM achieved an AUC of 0.788 in internal validation and 0.65 in external validation. The SVM achieved an AUC of 0.785 in internal validation and 0.685 in external validation.

    Conclusion: A Random Forests prediction model was developed using patients' demographic data, medical history and common blood test results. This algorithm can predict the presence of colonic polyps with good predictive power.
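To make the evaluation described in the abstract concrete, the following is a minimal sketch of an 80/20 split and of how an AUC value is computed from predicted scores. It is not the authors' code: the abstract does not say how patients were assigned to the two sets, so the shuffled split, the seed, the helper names `train_test_split_indices` and `auc`, and the toy labels and scores are all assumptions made for illustration. No patient data or study results are reproduced.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists.

    Assumption: a simple random split; the abstract does not state
    whether the authors stratified by outcome.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Split sizes for a cohort of 164 patients (the cohort size from the abstract).
train_idx, test_idx = train_test_split_indices(164)

# Hypothetical labels (1 = polyp present) and model scores, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)
```

With 164 patients, this split leaves 131 in the training set and 33 in the test set. In the toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. Read the same way, an AUC of 0.820 means that a randomly chosen patient with polyps receives a higher score than a randomly chosen patient without polyps in about 82% of such pairs.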

    Keywords: colorectal polyps, Random forests, machine learning, Colorectal cancer prevention, Risk prediction model

    Received: 06 Nov 2024; Accepted: 10 Feb 2025.

    Copyright: © 2025 Avram, Lupa, Koukoulas, Lazar, Maris, Murariu and Olariu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Mihaela-Flavia Avram, Victor Babes University of Medicine and Pharmacy, Timisoara, Romania

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.