AUTHOR=Sen Anindya, Stevens Nathaniel T., Tran N. Ken, Agarwal Rishav R., Zhang Qihuang, Dubin Joel A.

TITLE=Forecasting daily COVID-19 cases with gradient boosted regression trees and other methods: evidence from U.S. cities

JOURNAL=Frontiers in Public Health

VOLUME=Volume 11 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1259410

DOI=10.3389/fpubh.2023.1259410

ISSN=2296-2565

ABSTRACT=There is a vast literature on the performance of different short-term forecasting models for country-specific COVID-19 cases, but much less research on city-level cases. This paper uses daily case counts for 25 Metropolitan Statistical Areas (MSAs) in the U.S. to evaluate the efficacy of Gradient Boosted Regression Trees (GBRT) and other methods in generating daily predictions for November 2020 to March 2021. This exercise is important because previous studies have found forecasts from Susceptible-Infectious-Recovered (SIR) models to be inaccurate. Consistent with other research that has employed Machine Learning (ML) based methods, we find that Median Absolute Percentage Error (MAPE) values for both 7-day-ahead and 28-day-ahead predictions from GBRTs are lower than the corresponding values from SIR, Linear Mixed Effects (LME), and Seasonal Autoregressive Integrated Moving Average (SARIMA) specifications for the majority of MSAs during November-December 2020 and January 2021. Neither GBRT nor SARIMA models offer high-quality predictions for February 2021. For March 2021, however, SARIMA-generated MAPE values for 28-day-ahead predictions are slightly lower than the corresponding GBRT estimates. The results of this research demonstrate that basic ML models can lead to relatively accurate forecasts at the local level, which is important for resource allocation decisions and epidemiological surveillance by policymakers.
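The abstract compares models by their median absolute percentage error (MAPE). As a minimal sketch, assuming the common definition (the median over forecast days of |actual − predicted| / |actual| × 100) and assuming that days with zero observed cases are skipped because the percentage error is undefined there, the metric could be computed as below. The function name `median_ape` and all numbers are hypothetical illustrations, not the authors' code or data.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent (hypothetical helper).

    Assumption: days with zero observed cases are skipped, since the
    percentage error is undefined when the actual value is zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Absolute percentage error for each day with a nonzero actual count.
    errors = [
        abs(a - p) * 100.0 / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no days with nonzero actual counts")
    # The median, unlike the mean, is not dominated by a few very bad days.
    return median(errors)


# Invented daily counts for one city and one forecast window.
actual = [100, 200, 50, 400, 80]
forecast = [110, 180, 50, 300, 100]
print(median_ape(actual, forecast))  # 10.0
```

Using the median rather than the mean keeps a handful of days with tiny actual counts (and hence huge percentage errors) from dominating a city's score; comparing models then amounts to computing this value per MSA and horizon and checking which is lower.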