AUTHOR=Contractor Steefan , Roughan Moninya TITLE=Efficacy of Feedforward and LSTM Neural Networks at Predicting and Gap Filling Coastal Ocean Timeseries: Oxygen, Nutrients, and Temperature JOURNAL=Frontiers in Marine Science VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2021.637759 DOI=10.3389/fmars.2021.637759 ISSN=2296-7745 ABSTRACT=
Ocean data timeseries are vital for a diverse range of stakeholders (ranging from government, to industry, to academia) to underpin research, support decision making, and identify environmental change. However, continuous monitoring and observation of ocean variables is difficult and expensive. Moreover, since oceans are vast, observations are typically sparse in spatial and temporal resolution. In addition, the hostile ocean environment creates challenges for collecting and maintaining data sets, such as instrument malfunctions and servicing, often resulting in temporal gaps of varying lengths. Neural networks (NN) have proven effective in many diverse big data applications, but few oceanographic applications have been tested using modern frameworks and architectures. Therefore, here we demonstrate a “proof of concept” neural network application using a popular “off-the-shelf” framework called “TensorFlow” to predict subsurface ocean variables including dissolved oxygen and nutrient (nitrate, phosphate, and silicate) concentrations, and temperature timeseries and show how these models can be used successfully for gap filling data products. We achieved a final prediction accuracy of over 96% for oxygen and temperature, and mean squared errors (MSE) of 2.63, 0.0099, and 0.78, for nitrates, phosphates, and silicates, respectively. The temperature gap-filling was done with an innovative contextual Long Short-Term Memory (LSTM) NN that uses data before and after the gap as separate feature variables. We also demonstrate the application of a novel dropout based approach to approximate the Bayesian uncertainty of these temperature predictions. This Bayesian uncertainty is represented in the form of 100 monte carlo dropout estimates of the two longest gaps in the temperature timeseries from a model with 25% dropout in the input and recurrent LSTM connections. Throughout the study, we present the NN training process including the tuning of the large number of NN hyperparameters which could pose as a barrier to uptake among researchers and other oceanographic data users. Our models can be scaled up and applied operationally to provide consistent, gap-free data to all data users, thus encouraging data uptake for data-based decision making.