REVIEW article

Front. Big Data
Sec. Data Science
Volume 7 - 2024 | doi: 10.3389/fdata.2024.1441869

When We Talk About Big Data, What Do We Really Mean?

Provisionally accepted
Xiaoyao Han*, Oskar J. Gstrein, Vasilios Andrikopoulos
  • University of Groningen, Groningen, Netherlands

The final, formatted version of the article will be published soon.

    Despite the lack of consensus on an official definition of Big Data, research has continued to progress on this "no consensus" basis over the years. However, without a clear definition and scope for Big Data, scientific research and communication lack a common ground. Even with the popular "V" characteristics, Big Data remains elusive. The term is broad and is used differently across studies, often referring to entirely different concepts, and this is rarely stated explicitly in papers. While many studies and reviews attempt to develop a comprehensive understanding of Big Data, there has been little systematic research on the position and practical implications of the term in research environments. To address this gap, this paper presents a Systematic Literature Review (SLR) of secondary studies to provide a comprehensive overview of how Big Data is used and understood across different scientific domains. Our objective was to monitor the application of the Big Data concept in science, identify which technologies are prevalent in which fields, and investigate the discrepancies between the theoretical understanding and practical usage of the term. Our study found that various Big Data technologies, including machine learning algorithms, distributed computing frameworks, and other tools, are being used in different scientific fields. These manifestations of Big Data can be classified into four major categories: abstract concepts, large datasets, machine learning techniques, and the Big Data ecosystem. The study also revealed that, despite general agreement on the "V" characteristics, researchers in different scientific fields hold varied implicit understandings of Big Data. These implicit understandings significantly influence the content and discussions of studies involving Big Data, although they are often not explicitly stated. We call for a clearer articulation of the meaning of Big Data in research to facilitate smoother scientific communication.

    Keywords: Big Data definition, systematic literature review, scientific research, Big Data review, Big Data epistemology

    Received: 31 May 2024; Accepted: 12 Aug 2024.

    Copyright: © 2024 Han, Gstrein and Andrikopoulos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Xiaoyao Han, University of Groningen, Groningen, Netherlands

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.