AUTHOR=Li Chunhui , Zarzycki Piotr TITLE=A computational pipeline to generate a synthetic dataset of metal ion sorption to oxides for AI/ML exploration JOURNAL=Frontiers in Nuclear Engineering VOLUME=1 YEAR=2022 URL=https://www.frontiersin.org/journals/nuclear-engineering/articles/10.3389/fnuen.2022.977743 DOI=10.3389/fnuen.2022.977743 ISSN=2813-3412 ABSTRACT=

Charged mineral/electrolyte interfaces are ubiquitous in the surface and subsurface, including the surroundings of geological disposal sites for radioactive waste. Understanding how ions interact with charged surfaces is therefore critically important for predicting radionuclide mobility in the event of waste leakage. At present, Surface Complexation Models (SCMs) are the most successful thermodynamic frameworks for describing ion retention by mineral surfaces. SCMs are interfacial speciation models that account for the effect of the electric field generated by charged surfaces on sorption equilibria. These models have been used successfully to analyze and interpret a broad range of experimental observations, including potentiometric and electrokinetic titrations and spectroscopy. However, many of the current procedures for solving SCMs and fitting them to experimental data are not optimal, which leads to a non-transferable or non-unique description of interfacial electrostatics and, consequently, of the strength and extent of ion retention by mineral surfaces. Recent developments in Artificial Intelligence (AI) offer a new avenue: replacing SCM solvers and fitting algorithms with trained AI surrogates. What has been missing is a standardized dataset covering a wide range of SCM parameter values for AI exploration and training, a gap filled by this study. Here, we describe a computational pipeline to generate synthetic SCM data and discuss approaches to transform this dataset into AI-learnable input. As a first step, we use this pipeline to generate a synthetic dataset of electrostatic properties for a broad range of prototypical oxide/electrolyte interfaces. The next step is to extend this dataset to include complex radionuclide sorption and complexation and, finally, to provide trained AI architectures able to rapidly infer SCM parameter values from experimental data.
We also illustrate AI-surrogate development using ensemble learning algorithms such as Random Forest and Gradient Boosting. These surrogate models predict SCM parameters rapidly, do not rely on an initial guess, and guarantee convergence in all cases.