AUTHOR=Lee Taylor R. , Phrampus Benjamin J. , Obelcz Jeffrey TITLE=The necessary optimization of the data lifecycle: Marine geosciences in the big data era JOURNAL=Frontiers in Earth Science VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2022.1089112 DOI=10.3389/feart.2022.1089112 ISSN=2296-6463 ABSTRACT=

In the marine geosciences, observations are typically acquired using research vessels to understand a given phenomenon or area of interest. Despite the plateauing of ship time and of the number of active research vessels in the last decade, the rate of marine geoscience data production has continued to increase. Simultaneously, large quantities of legacy data exist within data repositories; however, these data are rarely curated to be both discoverable and machine-readable (i.e., accessible). This results in inefficient use, or even omission, of high-quality data that are both increasingly important to utilize and impractical to recollect. The proliferation of newly acquired data and the increasing importance of legacy data have been met with only incremental evolution in the methods of data integration. This paper describes some improvements at each stage of the data lifecycle (acquisition, curation, and integration) that could align the marine geosciences better with the “big data” paradigm. We have encountered several major issues in coordinating these efforts, which we outline here: 1) geologic anomalies are the primary focus of data acquisition, which makes it difficult to understand the dominant (i.e., baseline) marine geology; 2) marine geoscience data are rarely curated to be accessible; and 3) the aforementioned issues preclude the use of efficient integration tools that can make optimal use of data. In this paper, we discuss the challenges associated with these issues and solutions for overcoming them in future decades of marine geoscience. The successful execution of these interconnected steps will optimize the lifecycle of marine geoscience data in the “big data” era.