The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Mar. Sci.
Sec. Physical Oceanography
Volume 12 - 2025
doi: 10.3389/fmars.2025.1514746
Metrics Matters: A Deep Assessment of Deep Learning CNN Method for ENSO Forecast
Provisionally accepted
- 1 University of Zanjan, Zanjan, Zanjan, Iran
- 2 Shahid Chamran University of Ahvaz, Ahvaz, Khuzestan, Iran
- 3 Shiraz University, Shiraz, Fars, Iran
Recent advances in machine learning (ML) have improved performance across diverse applications, including climate science and physical oceanography. Reproducibility remains critical, however, because it underpins the reliability and trustworthiness of results. In this study, we reproduced the state-of-the-art CNN method originally published by Ham et al. in Nature for predicting the El Niño-Southern Oscillation (ENSO), documenting the entire process from the creation of ocean anomaly maps to the post-processing of outputs. Because the transformation of raw repository data into input anomaly maps is complex and inadequately described in the original article, we developed these maps from scratch to guarantee their validity and to provide a clear methodology that other researchers can replicate. Each component of the CNN model was built from the ground up on the Google Colaboratory platform to support full reproducibility and accessibility. Our analysis of the CNN method, evaluated with six distinct metrics, showed that each metric illuminates a different facet of model performance. It also revealed a notable disparity in CNN accuracy between the 20th and 21st centuries, which may indicate overfitting; we therefore recommend employing generalization techniques to reconcile prediction accuracy between the two periods. Although the correlation metric suggested that the CNN method and the advanced 3D-Geoformer have similar skill, error-based metrics showed that the 3D-Geoformer outperformed the CNN, whose RMSE for short-term predictions is five times higher. This highlights the importance of using diverse metrics for a thorough evaluation of machine learning models.
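As a rough illustration of why correlation and error-based metrics can disagree, the following sketch compares two forecasts of a synthetic index. Everything here is an assumption made for illustration: the series are randomly generated rather than taken from the study, and the function names `correlation_skill` and `rmse` are hypothetical, not the authors' code.

```python
import numpy as np


def correlation_skill(obs, pred):
    """Pearson correlation between observed and predicted series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.corrcoef(obs, pred)[0, 1])


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


# Synthetic stand-in for a monthly index (NOT real ENSO data).
rng = np.random.default_rng(0)
months = np.arange(240)
obs = np.sin(2 * np.pi * months / 48) + 0.1 * rng.standard_normal(months.size)

# Forecast A tracks the observations closely.
forecast_a = obs + 0.05 * rng.standard_normal(months.size)
# Forecast B has the same pattern as A but a strongly damped amplitude.
forecast_b = 0.3 * forecast_a

corr_a, corr_b = correlation_skill(obs, forecast_a), correlation_skill(obs, forecast_b)
rmse_a, rmse_b = rmse(obs, forecast_a), rmse(obs, forecast_b)

print(f"correlation: A={corr_a:.3f}  B={corr_b:.3f}")
print(f"RMSE:        A={rmse_a:.3f}  B={rmse_b:.3f}")
```

Because Pearson correlation is invariant to a positive rescaling of the forecast, the two forecasts score identically on correlation while their RMSE values differ widely, which is the kind of gap a single metric would hide.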
We also identified latent climatologies within the input data that may falsely improve future predictions, a phenomenon also observed in other recent methodologies. Moreover, we noted a common misapplication of historical CMIP runs for training machine learning models: the training data frequently overlap with the validation datasets, potentially inflating model skill by as much as 6%. Finally, we achieved a 10% reduction in model execution time on Google Colaboratory without compromising accuracy.
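One simple guard against the overlap described above is to drop training samples whose years fall inside or after the validation window. The sketch below assumes hypothetical period boundaries and a hypothetical helper named `split_without_overlap`; the abstract does not state the exact years or the procedure used in the study.

```python
import numpy as np


def split_without_overlap(train_years, val_start, gap_years=0):
    """Return a boolean mask keeping only training years that end
    at least `gap_years` before the validation period begins."""
    train_years = np.asarray(train_years)
    cutoff = val_start - gap_years
    return train_years < cutoff


# Illustrative placeholders, not values confirmed by the abstract.
train_years = np.arange(1861, 2005)   # years covered by the training runs
val_start, val_end = 1984, 2017       # validation window

# Years that appear in both the training runs and the validation window.
overlap = train_years[(train_years >= val_start) & (train_years <= val_end)]

# Training years that remain once the overlap is removed.
mask = split_without_overlap(train_years, val_start)
clean_train_years = train_years[mask]

print(f"overlapping years: {overlap.size}")
print(f"last training year kept: {clean_train_years.max()}")
```

The optional `gap_years` argument leaves a buffer between the two periods, which may matter when a forecast lead time spans the boundary; whether such a buffer is needed depends on the experimental design.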
Keywords: Metric, CNN, deep learning, assessment, ENSO (El Niño-Southern Oscillation) prediction, forecast, training mistakes
Received: 21 Oct 2024; Accepted: 21 Jan 2025.
Copyright: © 2025 Naisipour, Saeedpanah, Adib and Ganji. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Iraj Saeedpanah, University of Zanjan, Zanjan, 38791-45371, Zanjan, Iran
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.