AUTHOR=Hey Tony TITLE=Open science and Big Data in South Africa JOURNAL=Frontiers in Research Metrics and Analytics VOLUME=7 YEAR=2022 URL=https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2022.982435 DOI=10.3389/frma.2022.982435 ISSN=2504-0537 ABSTRACT=

With the Square Kilometer Array (SKA) project and the new Multi-Purpose Reactor (MPR) soon coming on-line, South Africa and other collaborating countries in Africa will need to make the management, analysis, publication, and curation of “Big Scientific Data” a priority. In addition, the recent draft Open Science policy from the South African Department of Science and Innovation (DSI) requires both Open Access to scholarly publications and research outputs, and an Open Data policy that facilitates equal opportunity of access to research data. The policy also endorses the deposit, discovery and dissemination of data and metadata in a manner consistent with the FAIR principles – making data Findable, Accessible, Interoperable and Re-usable (FAIR). The challenge to achieve Open Science in Africa starts with open access for research publications and the provision of persistent links to the supporting data. With the deluge of research data expected from the new experimental facilities in South Africa, the problem of how to make such data FAIR takes center stage. One promising approach to make such scientific datasets more “Findable” and “Interoperable” is to rely on the Dataset representation of the Schema.org vocabulary which has been endorsed by all the major search engines. The approach adds some semantic markup to Web pages and makes scientific datasets more “Findable” by search engines. This paper does not address all aspects of the Open Science agenda but instead is focused on the management and analysis challenges of the “Big Scientific Data” that will be produced by the SKA project. The paper summarizes the role of the SKA Regional Centers (SRCs) and then discusses the goal of ensuring reproducibility for the SKA data products. Experiments at the new MPR neutron source will also have to conform to the DSI's Open Science policy. The Open Science and FAIR data practices used at the ISIS Neutron source at the Rutherford Appleton Laboratory in the UK are then briefly described. The paper concludes with some remarks about the important role of interdisciplinary teams of research software engineers, data engineers and research librarians in research data management.