AUTHOR=Ju Aobo, Wang Hu, Wang Lequan, Weng Yuang
TITLE=Application of machine learning algorithms for prediction of ultraviolet absorption spectra of chromophoric dissolved organic matter (CDOM) in seawater
JOURNAL=Frontiers in Marine Science
VOLUME=10
YEAR=2023
URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2023.1065123
DOI=10.3389/fmars.2023.1065123
ISSN=2296-7745
ABSTRACT=
The ultraviolet absorption spectra of chromophoric dissolved organic matter (CDOM) can be used to trace its sources and to explore the dynamics of the CDOM pool. In previous studies, only the spectra above 240 nm could be used directly to characterize the CDOM in seawater, because CDOM absorption spectra below 240 nm overlap with those of inorganic chemicals such as NO3−, NO2−, Cl− and Br−. In this study, three machine learning models, back propagation neural network (BPNN), random forest (RF) and extreme gradient boosting (XGBoost), were built to predict the CDOM ultraviolet absorption spectra between 215 and 350 nm after being trained with the raw absorption spectra of seawater. The optimal input wavelength range of the raw seawater spectra is 250-350 nm, and the optimal model parameters of the machine learning algorithms were determined using five-fold cross-validation. The results show that all three models can predict the CDOM absorption spectra well, with the XGBoost model giving the best prediction results. This may be because the XGBoost algorithm focuses on the residuals generated by the previous iteration, which can reduce both variance and bias, especially for datasets with small sample sizes. Based on the spectra predicted by the XGBoost algorithm, we calculated the spectral slopes at short wavelengths between 215 and 240 nm (S215-240) and between 215 and 275 nm (S215-275). The results show that S215-240 and S215-275 are ~2 times the widely used spectral slope between 275 and 295 nm (S275-295) obtained by the traditional method based on the raw spectra. Moreover, S215-240 and S215-275 are more closely related to salinity for marine CDOM than S275-295, suggesting that spectral slopes at shorter wavelengths might be better proxies for marine CDOM than those at longer wavelengths.
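
The abstract does not state how the spectral slopes (S215-240, S215-275, S275-295) were fitted. As an illustration only, the sketch below assumes the common single-exponential model a(λ) = a(λ0)·exp(−S(λ − λ0)) and estimates S by a least-squares line through ln a(λ) within the chosen wavelength window. The function name `spectral_slope`, its parameters, and the synthetic spectrum are hypothetical and are not taken from the article.

```python
import math


def spectral_slope(wavelengths_nm, absorption, lo_nm, hi_nm):
    """Estimate the spectral slope S (nm^-1) over [lo_nm, hi_nm].

    Assumption (not from the article): absorption decays exponentially,
    a(lambda) = a(lambda0) * exp(-S * (lambda - lambda0)), so ln(a) is
    linear in wavelength and S is the negated regression slope.
    """
    # Keep only points inside the window with strictly positive absorption,
    # because the logarithm is undefined otherwise.
    points = [
        (wl, math.log(a))
        for wl, a in zip(wavelengths_nm, absorption)
        if lo_nm <= wl <= hi_nm and a > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two valid points in the window")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all wavelengths in the window are identical")

    # Ordinary least-squares slope of ln(a) versus wavelength, sign-flipped.
    return -sxy / sxx


# Synthetic spectrum for demonstration only (not data from the study):
# a pure exponential with a known slope of 0.02 nm^-1 from 215 to 350 nm.
wavelengths = list(range(215, 351))
true_slope = 0.02
spectrum = [5.0 * math.exp(-true_slope * (wl - 215)) for wl in wavelengths]

s_215_240 = spectral_slope(wavelengths, spectrum, 215, 240)
s_275_295 = spectral_slope(wavelengths, spectrum, 275, 295)
```

On this synthetic single-exponential spectrum both windows recover the same slope by construction. A ratio of about 2 between the short-wavelength slopes and S275-295, as reported in the abstract, would therefore indicate that the real CDOM spectrum is steeper below 275 nm than a single exponential would predict.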