AUTHOR=Sauzède Raphaëlle , Bittig Henry C. , Claustre Hervé , Pasqueron de Fommervault Orens , Gattuso Jean-Pierre , Legendre Louis , Johnson Kenneth S. TITLE=Estimates of Water-Column Nutrient Concentrations and Carbonate System Parameters in the Global Ocean: A Novel Approach Based on Neural Networks JOURNAL=Frontiers in Marine Science VOLUME=4 YEAR=2017 URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2017.00128 DOI=10.3389/fmars.2017.00128 ISSN=2296-7745 ABSTRACT=

A neural network-based method (CANYON: CArbonate system and Nutrients concentration from hYdrological properties and Oxygen using a Neural-network) was developed to estimate biogeochemically relevant variables throughout the water column (i.e., from the surface to 8,000 m depth) of the global ocean. These variables are the concentrations of three nutrients [nitrate (NO3−), phosphate (PO43−), and silicate (Si(OH)4)] and four carbonate system parameters [total alkalinity (AT), dissolved inorganic carbon (CT), pH (pHT), and partial pressure of CO2 (pCO2)]. They are estimated from concurrent in situ measurements of temperature, salinity, hydrostatic pressure, and oxygen (O2), together with the sampling latitude, longitude, and date. Seven neural networks, one per variable, were developed using the GLODAPv2 database, which is largely representative of the diversity of open-ocean conditions, making CANYON potentially applicable to most oceanic environments. Before training, the data from eight 10° × 10° zones were removed from the database to provide an “independent data-set” for additional validation. For each variable, CANYON was then trained on a randomly chosen 80% of the remaining data, and the other 20% were used as the validation test of the neural network. Overall, CANYON retrieved the variables with high accuracy (RMSE): 1.04 μmol kg−1 (NO3−), 0.074 μmol kg−1 (PO43−), 3.2 μmol kg−1 (Si(OH)4), 0.020 (pHT), 9 μmol kg−1 (AT), 11 μmol kg−1 (CT), and 7.6% (pCO2; i.e., 30 μatm at 400 μatm). This accuracy was confirmed for the eight independent zones not included in the training process. CANYON was also applied to the Hawaiian Time Series site to produce a 22-year-long simulated time series of the seven variables. Modeled and measured data agreed well there too, with RMSE values of the same order of magnitude as those from the validation test. CANYON is thus a promising method to derive distributions of key biogeochemical variables.
It could be used for a variety of global and regional applications, ranging from data quality control to the production of datasets of hard-to-obtain variables that are required for the initialization and validation of biogeochemical models. In particular, combining the CANYON approach with the increased coverage of the global Biogeochemical-Argo program, in which O2 is one of the core variables and is now very accurately measured, offers the prospect of obtaining large-scale estimates of key biogeochemical variables with unprecedented spatial and temporal resolution. Matlab and R code for the proposed algorithms is provided as Supplementary Material.
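
The data partitioning and the accuracy metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab or R code and does not reproduce CANYON. The function names (`in_any_zone`, `split_samples`, `rmse`), the sample layout (dictionaries with `lat` and `lon` keys), and the convention of defining each 10° × 10° zone by its south-west corner are all assumptions made for illustration.

```python
import math
import random


def in_any_zone(lat, lon, zones, size=10.0):
    """Return True if (lat, lon) lies inside any held-out box.

    Each zone is given by its south-west corner (lat0, lon0); this corner
    convention is an assumption of the sketch, not taken from the paper.
    """
    return any(
        lat0 <= lat < lat0 + size and lon0 <= lon < lon0 + size
        for lat0, lon0 in zones
    )


def split_samples(samples, zones, train_fraction=0.8, seed=0):
    """Split samples into (train, test, independent) subsets.

    Samples inside the held-out zones form the independent set; the rest
    are shuffled and divided into training and test subsets.
    """
    independent = [s for s in samples if in_any_zone(s["lat"], s["lon"], zones)]
    remaining = [s for s in samples if not in_any_zone(s["lat"], s["lon"], zones)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(remaining)
    n_train = round(train_fraction * len(remaining))
    return remaining[:n_train], remaining[n_train:], independent


def rmse(estimated, measured):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(measured) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = ((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(sum(squared) / len(estimated))
```

With this sketch, `split_samples(samples, zones)` would be called once per target variable, and `rmse` would be evaluated separately on the test subset and on the independent zones. For pCO2, the relative figure quoted above converts to an absolute one by simple multiplication: 0.076 × 400 μatm = 30.4 μatm, which the abstract rounds to 30 μatm.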