About this Research Topic
Data science gives value to data for advancing scientific goals. The exponential growth of data has often been mentioned, at least since the early 20th century; the Schmidt quote and the influential quantification of it by Varian, Google’s chief economist, are notable examples (“Over the last two years alone 90 percent of the data in the world was generated”). Open Data has long been recognized as an important asset since it encourages the sharing and interlinking of research data via large digital infrastructures. Examples include the sharing of data on the Ebola virus and, more recently, of the first genome sequence of the SARS-CoV-2 virus. These examples provide an inspiring model for how global research collaborations can help address societal challenges.
This Research Topic takes a holistic view of data science, one that has implications for the way scientists do science. Holistic data science and AI imply a new way of understanding data that goes beyond the individual dataset. The Research Topic aims to understand how data science can help solve the problems scientists face and advance scientific goals, e.g., working with massive datasets and complex metadata, analyzing and reasoning about data, ensuring scientific reproducibility, and facilitating the scientific cycle (exploration, analysis, interpretation, and communication).
Access to data is a critical factor, and there are many initiatives around the world to create data pools and data spaces that encourage the sharing and interlinking of research data via large digital infrastructures. However, many current uses rely on proprietary data, and much of the critical data is not open. Many European regulations (e.g., GDPR and the Commission’s recently proposed AI Regulation Act) have implications for the use of AI for data collection and processing.
Research that tracks the use of open data, and that develops approaches to address the opacity of algorithms (how they can be kept under ‘control’, traced, and checked) through open data and open data infrastructures, is likely to become more urgent. Also, the world of data is not flat. It is organized around disciplines, lines of research, or schools of thought. Scientific disciplines are called upon to make data findable, accessible, interoperable, and reusable (FAIR) in a way that crosses scientific boundaries.
There has been a lot of discussion about non-reproducibility in science. The single most important challenge is whether data science (and AI) can play a key role in improving the credibility and efficiency of research, one of the cornerstones on which science is built. Areas to be covered in this Research Topic may include, but are not limited to:
• Questions that science needs to raise with regard to data science; for example, how to interact with data (which includes complex metadata), and how data science can facilitate the scientific cycle (exploration, analysis, interpretation, communication).
• How can the credibility and efficiency of research be improved across the whole reproducibility spectrum (from 'publication only' to 'full replication' with linked and executable code and data), and how can all research outputs be shared in a way that is findable, accessible, interoperable, and reusable (FAIR)?
• How can the potential danger of Data Science and AI generating ‘fake science’ be tackled, e.g., by deploying AI models to detect fake contributions (in publishing) or ‘fake’ science?
Keywords: Data Science, Artificial Intelligence, Open Data, research life cycle, knowledge production
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.