ORIGINAL RESEARCH article

Front. Genet.
Sec. Computational Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1491602

BIMSSA: Enhancing Cancer Prediction with Salp Swarm Optimization and Ensemble Machine Learning Approaches

Provisionally accepted
  • 1 C.V. Raman Global University, Bhubaneswar, Odisha, India
  • 2 Siksha O Anusandhan University, Bhubaneswar, India
  • 3 Vardhaman College of Engineering, Hyderabad, Telangana, India
  • 4 Hainan University, Haikou, Hainan Province, China
  • 5 Coventry University, Coventry, West Midlands, United Kingdom
  • 6 Parul University, Waghodia, Gujarat, India

The final, formatted version of the article will be published soon.

    Cancer rates are rising rapidly and contribute substantially to global mortality. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, which may reduce deaths. An ML-based cancer diagnostic model can use a patient's genetic information, such as microarray data. Microarray data are high dimensional, which can degrade the performance of ML-based models, so feature selection is essential. The model proposed in this work, BIMSSA, builds on Boruta, Improved Maximum Relevance and Minimum Redundancy (IMRMR), and the Salp Swarm Algorithm (SSA). BIMSSA implements a pipelined feature selection method to handle high-dimensional microarray data effectively. First, Boruta and IMRMR are applied to select relevant gene-expression features. SSA is then applied to optimize the size of the feature subset. Five machine learning classifiers, Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost, are applied as base learners on the resulting feature space. Majority voting is then used to build an ensemble of the top three algorithms. The ensemble model is evaluated on microarray data from four different cancer types: Adult acute lymphoblastic leukemia and Acute myelogenous leukemia (ALL-AML), Lymphoma, Mixed-lineage leukemia (MLL), and Small round blue cell tumors (SRBCT). In the empirical evaluation, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved accuracies of 96.7% on ALL-AML, 96.2% on Lymphoma, 95.1% on MLL, and 97.1% on the SRBCT dataset. These results suggest that the proposed approach can accurately predict different forms of cancer, which may be useful for both physicians and researchers.
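The final ensemble step described in the abstract (majority voting over the top three of five base learners) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the top three are ranked or how ties are resolved, so the sketch assumes ranking by validation accuracy and breaks ties in favor of the higher-ranked model. The accuracy values and predictions below are hypothetical placeholders, and the function names `top_k_models` and `majority_vote` are introduced here for illustration only.

```python
from collections import Counter


def top_k_models(val_accuracy, k=3):
    """Return the names of the k models with the highest validation accuracy.

    Assumption: ranking by validation accuracy; ties are broken by name
    so that the result is deterministic.
    """
    ranked = sorted(val_accuracy, key=lambda name: (-val_accuracy[name], name))
    return ranked[:k]


def majority_vote(predictions, members):
    """Hard majority vote per sample across the selected ensemble members.

    predictions maps model name -> list of predicted labels (same length).
    members is ordered from best to worst; on a tie, the label predicted by
    the highest-ranked member among the tied labels wins (an assumption).
    """
    n_samples = len(predictions[members[0]])
    voted = []
    for i in range(n_samples):
        counts = Counter(predictions[m][i] for m in members)
        best = max(counts.values())
        for m in members:
            label = predictions[m][i]
            if counts[label] == best:
                voted.append(label)
                break
    return voted


# Hypothetical validation accuracies (NOT results from the paper).
val_accuracy = {
    "SVM": 0.95,
    "RF": 0.93,
    "ELM": 0.90,
    "AdaBoost": 0.91,
    "XGBoost": 0.94,
}

# Hypothetical per-sample predictions on five test samples.
predictions = {
    "SVM":      ["ALL", "AML", "ALL", "AML", "ALL"],
    "RF":       ["ALL", "ALL", "ALL", "AML", "AML"],
    "ELM":      ["AML", "AML", "AML", "AML", "ALL"],
    "AdaBoost": ["ALL", "ALL", "AML", "AML", "ALL"],
    "XGBoost":  ["AML", "AML", "ALL", "AML", "AML"],
}

members = top_k_models(val_accuracy, k=3)
ensemble = majority_vote(predictions, members)
print(members)
print(ensemble)
```

With an odd number of members and two classes, ties cannot occur; the tie-break only matters for multi-class datasets such as MLL or SRBCT, where three members may each predict a different label.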

    Keywords: Cancer prediction, Microarray data, Feature selection, Swarm intelligence, Ensemble learning

    [13] SSA, SVM. Breast Cancer accuracy: 98.75; Bladder accuracy: 100; Colon Cancer accuracy: 99.75
    [14] SSA, KNN. Breast Cancer accuracy: 97.08; Lung Cancer accuracy: 60; Breast EW accuracy: 97.08
    [15] ISSA, KNN. Breast Cancer accuracy: 95.70; Lung Cancer accuracy: 59.78; Breast EW accuracy: 96.10; DLBCL accuracy: 92.7; Leukemia accuracy: 92.9; Lung accuracy: 98.8; SRBCT accuracy: 93.6
    [16] KNN

    Received: 06 Oct 2024; Accepted: 11 Dec 2024.

    Copyright: © 2024 Panda, Bisoy, Panigrahi, Pati, Sahu, Guo, Liu and Jain. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Zheshan Guo, Hainan University, Haikou, 570228, Hainan Province, China
    Prince Jain, Parul University, Waghodia, Gujarat, India

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.