AUTHOR=Ren Huiying , Hou Zhangshuan , Duan Zhuoran , Song Xuehang , Perkins William A. , Richmond Marshall C. , Arntzen Evan V. , Scheibe Timothy D. TITLE=Spatial Mapping of Riverbed Grain-Size Distribution Using Machine Learning JOURNAL=Frontiers in Water VOLUME=2 YEAR=2020 URL=https://www.frontiersin.org/journals/water/articles/10.3389/frwa.2020.551627 DOI=10.3389/frwa.2020.551627 ISSN=2624-9375 ABSTRACT=
Recent alluvial sediments in riverbeds play a significant role in controlling hydrologic exchange flows (HEFs) in river systems. The alluvial layer is usually associated with strong heterogeneity in physical properties (e.g., permeability and hydraulic conductivity), which affects local HEFs and therefore biogeochemical processes. The spatial distribution of these physical properties needs to be determined to inform the numerical models used to reveal the realistic hydro-biogeochemical behaviors. Such information can be obtained based on the intrinsic link between sediment grain-size distribution and hydraulic properties where sediment texture information is available. However, grain-size measurements are usually spatially sparse and do not have adequate coverage and resolution, particularly for a relatively large domain such as the Hanford Reach of the Columbia River. In this paper, we adopted machine learning (ML) approaches for categorizing and mapping the spatial distributions of riverbed substrate grain size and filling in missing areas of substrate data using the ML models along the reach. Such ML models for substrate size mapping were trained at 13,372 locations using measured substrate sizes along with observed and simulated attributes, including bathymetric attributes (e.g., elevation, slope, and aspect ratio) from LIDAR and bathymetric surveys, and hydrodynamic properties (e.g., water depth, velocity, shear stress, and their statistical moments). An ensemble bagging-based ML technique, Random Forest, was adopted to identify the most influential factors as predictors to develop the predictive models with over-fitting issues addressed. The models were evaluated with respect to each individual substrate size class and the lumped group, and then used to generate the final substrate size maps covering all the grid cells in the numerical modeling domain.