AUTHOR=Ke Huabing , Gong Sunling , He Jianjun , Zhang Lei , Mo Jingyue TITLE=A hybrid XGBoost-SMOTE model for optimization of operational air quality numerical model forecasts JOURNAL=Frontiers in Environmental Science VOLUME=10 YEAR=2022 URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2022.1007530 DOI=10.3389/fenvs.2022.1007530 ISSN=2296-665X ABSTRACT=

Air quality numerical models are a main technical tool for forecasting atmospheric pollutants, and their development is of great significance to the atmospheric environment and human health. In this study, a hybrid XGBoost-SMOTE model, which automatically finds the optimal hyperparameters and features without human intervention, was developed and applied to optimize the PM2.5 and O3 concentrations forecast by the Chinese operational air quality forecasting model, the CMA Unified Atmospheric Chemistry Environment model (CUACE). The hybrid model is supported by a knowledge base that includes ground-observed and CUACE-forecasted pollutant and meteorological data as well as some auxiliary variables. An evaluation over 46 selected key national cities found that the XGBoost-SMOTE model achieves satisfactory optimization of the operational model output, with a particularly significant improvement in extreme pollutant values on high-pollution days. After optimization, the 5-day average correlation coefficient (R), mean error (ME) and root mean square error (RMSE) reach 0.87, 10.34 µg/m3 and 16.53 µg/m3 for PM2.5, and 0.89, 14.53 µg/m3 and 18.83 µg/m3 for O3, far better than those from the original CUACE model and the XGBoost model. Furthermore, the optimization of the spatial distribution of pollutants from the CUACE model was explored, and the impact of the input features was analyzed with the SHAP method. The developed hybrid model shows good application prospects in the field of environmental meteorology forecasting.
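
The three evaluation statistics quoted in the abstract (R, ME and RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the function name `forecast_scores` and the sample values are hypothetical, and because the abstract does not state whether ME is a signed or an absolute mean error, the sketch returns both under separate keys.

```python
import math


def forecast_scores(observed, forecast):
    """Return Pearson R, signed mean error, mean absolute error and RMSE.

    Hypothetical helper for illustration; the abstract does not define ME,
    so both the signed ("ME") and absolute ("MAE") variants are reported.
    """
    if len(observed) != len(forecast) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_fc = sum(forecast) / n
    errors = [f - o for f, o in zip(forecast, observed)]
    cov = sum((o - mean_obs) * (f - mean_fc) for o, f in zip(observed, forecast))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_fc = sum((f - mean_fc) ** 2 for f in forecast)
    denom = math.sqrt(var_obs * var_fc)
    return {
        # Pearson correlation is undefined when either series is constant.
        "R": cov / denom if denom > 0 else float("nan"),
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


# Made-up concentrations in µg/m3, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
fc = [12.0, 18.0, 33.0, 41.0]
scores = forecast_scores(obs, fc)
```

Applied per city and per forecast lead day, and then averaged, statistics of this kind would yield summary figures comparable in form to the 5-day averages reported above.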