AUTHOR=Bratt Sarah , Langalia Mrudang , Nanoti Abhishek TITLE=North-south scientific collaborations on research datasets: a longitudinal analysis of the division of labor on genomic datasets (1992–2021) JOURNAL=Frontiers in Big Data VOLUME=6 YEAR=2023 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2023.1054655 DOI=10.3389/fdata.2023.1054655 ISSN=2624-909X ABSTRACT=

Collaborations between scientists from the global north and global south (N-S collaborations) are a key driver of the “fourth paradigm of science” and have proven crucial to addressing global crises like COVID-19 and climate change. However, despite their critical role, N-S collaborations on datasets are not well understood. Science of science studies tend to rely on publications and patents to examine N-S collaboration patterns. To this end, the rise of global crises requiring N-S collaborations to produce and share data presents an urgent need to understand the prevalence, dynamics, and political economy of N-S collaborations on research datasets. In this paper, we employ a mixed methods case study research approach to analyze the frequency of and division of labor in N-S collaborations on datasets submitted to GenBank over 29 years (1992–2021). We find: (1) there is a low representation of N-S collaborations over the 29-year period. When they do occur, N-S collaborations display “burstiness” patterns, suggesting that N-S collaborations on datasets are formed and maintained reactively in the wake of global health crises such as infectious disease outbreaks; (2) The division of labor between datasets and publications is disproportionate to the global south in the early years, but becomes more overlapping after 2003. An exception in the case of countries with lower S&T capacity but high income, where these countries have a higher prevalence on datasets (e.g., United Arab Emirates). We qualitatively inspect a sample of N-S dataset collaborations to identify leadership patterns in dataset and publication authorship. The findings lead us to argue there is a need to include N-S dataset collaborations in measures of research outputs to nuance the current models and assessment tools of equity in N-S collaborations. The paper contributes to the SGDs objectives to develop data-driven metrics that can inform scientific collaborations on research datasets.