ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 7 - 2024 | doi: 10.3389/frai.2024.1446368

A Unified Foot and Mouth Disease Dataset for Uganda: Evaluating Machine Learning Predictive Performance Degradation under Varying Distributions

Provisionally accepted
Geofrey Kapalaga 1*, Florence N. Kivunike 1, Susan D. Kerfua 2, Daudi Jjingo 1, Savino Biryomumaisho 3, Justus Rutaisire 2, PAUL SSAJJAKAMBWE 2, Swidiq Mugerwa 2, Yusuf Kiwala 4
  • 1 College of Computing and Information Science, Makerere University, Kampala, Uganda
  • 2 National Livestock Resources Research Institute, Tororo, Uganda
  • 3 College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
  • 4 College of Business and Management Sciences (CoBAMS), Makerere University, Kampala, Uganda

The final, formatted version of the article will be published soon.

    In Uganda, the absence of a unified dataset for building machine learning models that predict Foot and Mouth Disease (FMD) outbreaks hinders preparedness. Although machine learning models predict FMD outbreaks well under stationary conditions, their performance is susceptible to degradation in non-stationary environments. Rainfall and temperature are key factors influencing these outbreaks, and their variability under climate change can significantly affect predictive performance. This study created a unified FMD dataset by integrating disparate sources and pre-processing the data using mean imputation, duplicate removal, visualization, and merging. To evaluate performance degradation, seven machine learning models were trained and assessed using accuracy, area under the receiver operating characteristic curve (AUC), recall, precision, and F1-score. The dataset showed a significant class imbalance, with more non-outbreaks than outbreaks, which required data augmentation methods. Variability in rainfall and temperature caused notable degradation in predictive performance. Random Forest with Borderline-SMOTE was the top-performing model in a stationary environment, achieving 92% accuracy, an AUC of 0.97, recall of 0.94, precision of 0.90, and an F1-score of 0.92. Under varying distributions, however, all models degraded significantly: Random Forest accuracy dropped to 46%, AUC to 0.58, recall to 0.03, precision to 0.24, and F1-score to 0.06. This study contributes a unified FMD dataset for Uganda and demonstrates significant performance degradation in seven machine learning models under varying distributions. These findings highlight the need for new methods that address the impact of distribution variability on predictive performance.
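    The stationary versus shifted comparison above can be made concrete with a small, dependency-free sketch. It assumes outbreaks are labelled 1 and non-outbreaks 0, computes the metrics named in the abstract from predictions, and treats "degradation" as the relative drop (baseline minus shifted) divided by baseline. The abstract does not define its degradation rate, so that formula is an assumption, and the function names `binary_metrics`, `roc_auc`, and `relative_degradation` are hypothetical, not the authors' code.

```python
"""Sketch: binary classification metrics and relative performance degradation.

Assumptions (not taken from the article): labels are 0/1 with outbreaks as 1,
and "degradation" is the relative drop (baseline - shifted) / baseline.
"""
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with outbreak (1) as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn  # valid only for strictly 0/1 labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random outbreak is scored above a
    random non-outbreak (Mann-Whitney formulation; ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_degradation(baseline: Dict[str, float],
                         shifted: Dict[str, float]) -> Dict[str, float]:
    """Relative drop per metric (assumed definition); skips zero baselines."""
    return {k: (baseline[k] - shifted[k]) / baseline[k]
            for k in baseline if baseline[k] != 0}


if __name__ == "__main__":
    # Random Forest figures as reported in the abstract (stationary vs. shifted).
    stationary = {"accuracy": 0.92, "auc": 0.97, "recall": 0.94, "precision": 0.90, "f1": 0.92}
    shifted = {"accuracy": 0.46, "auc": 0.58, "recall": 0.03, "precision": 0.24, "f1": 0.06}
    for metric, drop in relative_degradation(stationary, shifted).items():
        print(f"{metric}: {drop:.0%} relative drop")
```

    With the reported Random Forest figures, this assumed definition gives a 50% relative drop in accuracy and roughly a 97% drop in recall. AUC is computed pairwise here for readability; it is quadratic in the number of examples, so a rank-based or library implementation would be preferable on a full dataset.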

    Keywords: Foot and mouth disease, machine learning, distribution shifts, performance degradation rates, class imbalance

    Received: 09 Jun 2024; Accepted: 09 Jul 2024.

    Copyright: © 2024 Kapalaga, Kivunike, Kerfua, Jjingo, Biryomumaisho, Rutaisire, SSAJJAKAMBWE, Mugerwa and Kiwala. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Geofrey Kapalaga, College of Computing and Information Science, Makerere University, Kampala, Uganda

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.