AUTHOR=He Zheyu , Yang Yuanjian , Fang Runzhuo , Zhou Shaohui , Zhao Wenchuan , Bai Yingjie , Li Junsheng , Wang Bo TITLE=Integration of shapley additive explanations with random forest model for quantitative precipitation estimation of mesoscale convective systems JOURNAL=Frontiers in Environmental Science VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2022.1057081 DOI=10.3389/fenvs.2022.1057081 ISSN=2296-665X ABSTRACT=
Mesoscale convective cloud systems have a small horizontal scale and a short lifetime, which poses great challenges for quantitative precipitation estimation (QPE) by satellite remote sensing. Combining machine learning models with geostationary satellite spectral information is an effective method for the QPE of mesoscale convective clouds, but the outputs of such models remain difficult to interpret. In this study, based on Himawari-8 data, high-density automatic weather station observations, and reanalysis data over the North China Plain, a random forest (RF) machine learning model for satellite-based QPE was established and verified. The output of this RF model was then interpreted using the Shapley Additive Explanations (SHAP) algorithm. Results showed that the correlation coefficient between the precipitation intensity predicted by the RF model and the observed precipitation intensity was 0.64, with a root-mean-square error of 0.27 mm/h. The importance ranking obtained by the SHAP model is completely consistent with the output of the random forest importance function. The SHAP method can display the importance ranking of global features with positive/negative contribution values (e.g., current precipitation, column water vapor/black body temperature, cloud base height), and can visualize the marginal contribution values of local features under interaction. Therefore, combining the RF and SHAP methods provides a valuable way to interpret the output of machine learning models for satellite-based QPE, as well as an important basis for the selection of input variables for satellite-based QPE.
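
As an illustration of the "positive/negative contribution values" that SHAP assigns to each input, the following is a minimal sketch of exact Shapley values for a single prediction. It is not the study's code. The `toy_model` function, its coefficients, the feature values, and the baseline are all hypothetical stand-ins for a trained RF, and absent features are replaced by a single baseline value, which is a simplification of the background-data expectations a SHAP explainer would typically use.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, loosely following the inputs named in the abstract.
FEATURES = ["column_water_vapor", "black_body_temperature", "cloud_base_height"]


def toy_model(x):
    """Hypothetical stand-in for a trained QPE model (NOT the paper's RF)."""
    cwv, tbb, cbh = x
    # The last term makes water vapor and brightness temperature interact.
    return 0.02 * cwv - 0.01 * (tbb - 220.0) - 0.1 * cbh + 0.001 * cwv * (240.0 - tbb)


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values, the rest the baseline's.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Assumed sample and baseline values, chosen only for illustration.
sample = [50.0, 210.0, 1.0]
baseline = [30.0, 250.0, 2.0]
phi = shapley_values(toy_model, sample, baseline)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so in practice a tree-specific explainer would be used for an RF.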