Bad bibliometrics don’t add up for research, or why research publishing policy needs sound science

The integrity and reliability of bibliometric studies hold a pivotal role in shaping the trajectory of scientific research, influencing policy-making processes, and guiding the allocation of funding resources.

Photo credit: Shutterstock

From gatekeepers to guardians: science publishing in the 21st century

Growth in scientific output and scientific publishing is nothing new. Ever since the Industrial Revolution, societies have invested exponentially in education, research, and development, producing ever-increasing numbers of literate and university-educated people, and with them more researchers and more scientific output, measured primarily in research articles (Figure 1: Historical Article Growth) but also in patents. This has been, and continues to be, the underpinning of modern knowledge economies, prosperity, and nearly every advance that has allowed us to live better, more prosperous, and healthier lives over the last 200 years: cures for diseases, vaccines, better hygiene and sanitation, and higher-yielding crops that allow us to produce enough food and better care for 8 billion people (for an excellent review of the exponential trends in research and development, visit: https://ourworldindata.org/research-and-development).

Figure 1. Historical Article Growth. DATA SOURCE: Web of Science Core Collection 2024-02-16. Scopus 2024-02-19. Dimensions (publication type = articles) 2024-02-19. Indexation for 2023 will be incomplete for all.

Two factors accelerated the remarkable shift in science communications we have seen in the last two decades. The first is technical: digital publishing has transformed not only the volume of research that can be disseminated publicly but also its accessibility and usability, accelerating the exchange of scientific ideas and collaboration. The second is political and principled: the research community and science policymakers have moved to democratize access to research by demanding it be openly accessible. In practice, this democratization means findings are available globally, removing paywall barriers and facilitating greater engagement by both researchers and the wider public, including the taxpayers who ultimately pay for government-funded research.

Open science advocates eschew unaccountable gatekeeping that restricts access to research. Instead, open science privileges an editorial guardianship that upholds critical values such as quality, accessibility, transparency (particularly as regards peer review), interoperability, and equity.

Researchers have shifted to open access publishing because they share these values and benefit in turn from enhanced visibility, accessibility, and potential impact. This switch ensures that science is a global, interconnected endeavor where even minor advances contribute to the broader community's progress, facilitating cross-disciplinary developments and cutting research waste. Open science refuses to gatekeep which contributions add value, provided quality is guarded – research communities themselves determine how all collective knowledge and data is dynamically developed and used, now and in the future. 

Principles and prejudice: science publishing and bibliometric evidence

The exponential increase in scientific publications has long surpassed an individual researcher’s capacity to ingest every article and poses challenges, both practical and principled. On a practical front, keeping pace with the sheer volume of literature demands new approaches beyond traditional methods and publishing models. The real issue is not the quantity of scientific output but how it is managed, accessed, and ultimately utilized by researchers and beyond. Outdated publishing models and information-sharing methods are insufficient. Modern tools like bibliometrics, data science, and AI offer solutions, enhancing navigation through complex information, improving research relevance, and fostering innovation across disciplines.

At the same time, there are principled questions posed to the open science movement, particularly to vanguard open access publishers who process at scale, about the maintenance of quality when the gates are opened, not just to read but to publish. There are also questions about how expanding opportunities to publish affects research culture and pressure on researchers, encapsulated in the term ‘publish or perish.’ We need to acknowledge the queries from detractors of open science who argue on principle that quality cannot be guaranteed at volume and that pressure to publish is putting too much strain on researchers. But if principled arguments cannot be supported by evidence-based analysis, principled opinion risks becoming misjudged prejudice. This is why Frontiers wants to learn from the best available data and analysis about research publishing, and why we scrutinize analyses of science publishing with the same rigor we apply in our own publication processes. This is particularly true of bibliometric studies.

Bibliometrics is arguably one of the most valuable research areas for guiding researchers, institutions, policymakers, and publishers. While most bibliometric studies meet some of the highest standards of integrity in the scientific community, anyone considering arguments arising from bibliometric analysis should apply the same rigor and skepticism they would bring to any other scientific article. This means acting as a guardian: rejecting flawed methods and questionable data, and identifying personal biases and preconceptions.

Such robust assessment, as in the best peer review, prevents problematic studies from gaining traction and amplifying misinformation, including through media coverage. It is important that this corrective and objective counterbalance is available to policymakers and funders making decisions about science publishing and communications, to avoid sub-par and potentially biased bibliometrics distorting the debate on how science should be communicated and who can contribute to scientific knowledge.

To mitigate the harmful effects of such studies, the scientific community, and reputable bibliometricians in particular, needs to continue to promote integrity, rigor, transparency, and a sense of responsibility in analyses, because without the necessary cautions a seemingly sound conclusion may introduce unforeseen and even more harmful problems. This will protect the principles of scientific research and ensure bibliometrics remains a valuable tool for knowledge advancement.

Taking our own cue, Frontiers provides such a case study below, applying those principles of integrity, rigor, and transparency to a bibliometric preprint entitled “The strain on scientific publishing” by Hanson et al.

Case Study: The Misrepresentation of Strain

The study posited that the scientific community is under strain due to a declining workforce and an exponential increase in published articles, caused by special issues in open access journals. Frontiers' own review of the article has identified shortcomings in methodology and analysis, and we invite all researchers and policymakers to impartially consider the points raised below. We also very much welcome further objective analysis of the available bibliometric data.

Misleading data visualization: correlation is not causation

The study presents a graph that claims to show an “exponential increase” in publications from 2013 to 2022 and overlays a plot of PhD graduates declining markedly in 2020. The study then makes the case that the “dramatic” growth in articles “outpaces” a declining scientific workforce, placing a strain on researchers to write, review, and edit articles. It then zooms in on what appears to be a pre-ordained and isolated cause: special issue articles by open access publishers. No other potential drivers are identified; the preprint neglects to consider and rule out any other causes (see below).

Incomplete and selective use of data

The authors resorted to unverifiable data obtained through web scraping, which means that the study and its conclusions were predicated on an incomplete and unbalanced dataset of publisher activity. In addition, the available data was edited and cut to display only the years in which special issues increased, creating a skewed visual impression of a “dramatic” “rise of special issues.” When contacted for access to their dataset, the authors responded that they had “embargoed” their data, with the result that no one can verify or replicate their findings.

Findings that cannot be reproduced

The study’s key findings could not be reproduced. This was due to two fundamental issues with the data used in the study: 1) the reliance on unverifiable proxy data sources while portraying that data as reflecting the original sources, and 2) the omission of any data or evidence that contradicts the cornerstone correlation and conclusion that special issues cause significant increases in research output and add to the strain on researchers.

The authors leaned heavily on the visual impression of the data, using apparent correlations to imply causality with no scientific analysis – a methodologically unsound and misrepresentative practice. Employing reproducible data from original, verifiable sources paints a starkly different picture (Figure 2: False Pillars of Strain):

Figure 2. False Pillars of Strain. (A) compares the output of the main indexation collections with the data derived from Scimago by Crosetto et al. (B) shows the total number of active researchers. (C) superimposes published output in Scopus and Web of Science with the contribution from Frontiers Research Topics. DATA SOURCES: Web of Science Core Collection, Scopus, Dimensions, all extracted 2023-02-16. Frontiers Research Topics, internal data warehouse. Researcher numbers from OECD, 2023-12-04.

When the preprint’s data is examined and tested in this way, the three pillars of the study’s argument start to look unstable.  

  1. The proxy data was not reproducible and not representative of the original sources, Scopus and Web of Science. The original data does not show the claimed “exponential increase” in the total number of articles published during the study period (2013-2022); in fact, growth is linear during this period (Figure 2A).

  2. The study uses PhD graduates as a proxy for the active researchers who write, review, and edit articles, even though a more direct measure of active researchers is available. The authors furthermore extrapolate a downward trend to 2021, ignore the upward trend available from the same data source they used (Figure 2B), and fail to mention or consider that PhD graduations were disrupted during the pandemic. A direct measure of the number of active researchers shows the opposite of what they claim: a continuous increase.

  3. The preprint’s argument relies on a visual leap from their incorrect Figure 1A to their selectively edited Figure 2A, creating the misleading impression of a correlation without providing any correlation or causal analysis. In fact, using the same data Frontiers provided to the preprint authors, as well as the correct original Scopus data, we find no correlation between the total number of articles and the number of special issue articles (Figure 2C).

Frontiers’ review concludes that no evidence is available, and no analysis has been provided, to support the preprint’s assertion that special issue articles are a (let alone the) cause of strain on scientific publishing by driving an exponential increase in total publications amidst a dwindling global scientific workforce. In short, the article’s premise and evidence are wrong.
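The two quantitative checks at issue here, whether a yearly series grows linearly or exponentially, and whether two series are correlated at all, are straightforward to run once verifiable data is in hand. As a minimal sketch (in Python, using purely illustrative synthetic numbers, since the preprint's dataset is embargoed), one might proceed as follows:

```python
import numpy as np

def fit_r2(x, y):
    """R^2 of an ordinary least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((y - np.mean(y))**2)

def growth_profile(years, counts):
    """Compare a linear model (line fit on raw counts) with an
    exponential model (line fit on log counts) for a yearly series."""
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return {"linear_r2": fit_r2(years, counts),
            "exponential_r2": fit_r2(years, np.log(counts))}

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

# Illustrative synthetic numbers only -- not the indexed totals discussed above.
years = np.arange(2013, 2023)
total_articles = 2_000_000 + 120_000 * (years - 2013)  # steady linear growth

profile = growth_profile(years, total_articles)
# For an exactly linear series, the linear fit explains all the variance,
# so profile["linear_r2"] is 1.0 and exceeds profile["exponential_r2"].
```

On real indexed totals, comparing the two R-squared values (or inspecting the residuals) distinguishes linear growth of the kind shown in Figure 2A from a genuinely exponential increase, and a near-zero Pearson coefficient between total output and special issue output is the kind of result summarized in Figure 2C. This is a sketch of the generic method, not the exact analysis behind those figures.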

A biased attribution of growth?

Attributing the growth of scientific output solely to gold open access publishers, which publish far less than subscription publishers, overlooks the real factors driving scientific advancement. The surge in scientific output is due to a combination of educational expansion, increased number of researchers, higher investment in R&D, and technological progress. 

The global expansion of educational institutions has widened the pool of potential researchers, with a notable rise in doctoral degrees and specialized academic fields. This, coupled with longer career spans and more financial support for research, has significantly contributed to the growth in scientific inquiry. Technological advancements have revolutionized research methods and international collaboration, boosting the pace of discoveries. Government policies promoting research, the push for commercialization, and the rise of interdisciplinary studies have all spurred publication rates.  

Additionally, the role of conferences, specialized software, corporate research, and the expansion of language and regional journals have increased publication volumes. Artificial intelligence has streamlined data analysis, enhancing research efficiency.  

Most significantly, national initiatives like China's goal to lead in scientific publications, and the international response to the COVID-19 pandemic (reflected in the over 1 million articles indexed in the CORD-19 database), have massively contributed to scientific output at a global scale. These are just two of the indisputable technological, scholarly, and political drivers behind the increase in scientific publication output over the past 20 years – none of which is analyzed or discussed by the authors of the preprint. This lack of consideration not only undermines the article’s argument but is also a missed opportunity to use bibliometric analysis to understand how science publishing is being shaped by a range of interlinked and interdisciplinary global factors. It is clearly flawed to single out the shift of academic publishing towards open access as the sole driver of the increase in scientific output, ignoring the multifaceted and mostly positive drivers behind this growth.

At a higher level, we point out that the availability of more publications does not lower their collective value. Delayed or missed recognition is not uncommon in the history of science. Prematurely and artificially limiting the number of publications simply because “there are too many” will hurt science; the view presented by the authors of the preprint takes a negative and unconstructive premise as its starting point.

Figure 3. Regional Publishing Output. DATA SOURCES: Dimensions 2023-01-27. Note that not all published articles have country-affiliation data; these are excluded. Data for 2023 will be incomplete. Regional classifications, Frontiers' own.

Figure 3 shows total article output by country (data from Dimensions, retrieved on 14 January 2024). The rise of Chinese research output, along with that of other countries in Asia, has had the most significant impact on the total number of research articles. The surge of articles related to COVID-19 is seen here in all regions but is most prevalent in the numbers for Europe and North America. The number of articles published is now generally falling back to the long-term trend line.

Shifting dynamics in academic publishing 

As noted before, the academic publishing landscape is undergoing a significant transformation. While the overall production of scholarly articles continues to grow, there is strong growth in publications in gold open access (OA) journals (Figure 4: Publishing Output by Business Model). The steady progression in total publications is maintained even as the number of OA articles increases steeply, indicating a shift in business models driven by authors' preference for OA. The popularity of gold OA is challenging the traditional dominance of subscription-based publishing.

Figure 4. Publishing Output by Business Model. DATA SOURCE: Scopus 2024-02-15.

Frontiers’ solution: A human-centric AI approach 

Frontiers is tackling the challenge of the ever-increasing scientific output by integrating advanced data science and AI into the peer review process, enhancing efficiency and quality, while keeping the human researcher at the core of decision-making. These tools assist reviewers and editors by quickly providing context, relevance, and connections to related work, tasks that would be time-consuming for humans. 

This approach ensures that research is accurately positioned within the global scientific dialogue, maximizing its impact and accessibility. Frontiers' AI assists in making the peer review process more thorough and efficient, benefiting the entire scientific community without replacing the essential human judgment. 

The success of Frontiers, surpassing century-old publishers in citations per article, highlights the effectiveness of its innovative, technology-enhanced services (see Quality blog post). This achievement reflects Frontiers' commitment to improving scientific communication through the continuous evolution of its AI and data science tools, always with the goal of supporting human researchers in navigating the vast landscape of scientific knowledge.

When things go wrong – and we know from experience that mistakes will happen as science and research publishing adapts to rapidly evolving technologies – Frontiers will lead in owning and learning from those mistakes.

Frontiers' adaptive publishing solution: Research Topics  

Frontiers has revolutionized thematic publishing with "Research Topics," a forward-thinking adaptation to the dynamic nature of scientific research. Moving away from the traditional special issues model, which was less inclusive and less flexible, Research Topics embrace a collaborative and open approach. They cater to the interdisciplinary and fast-evolving demands of modern science, allowing for a broader participation from across the scientific community. 

These Research Topics act as mini journals, curated by Topic Editors who are leaders in their fields. They invite contributions from established experts across academic disciplines and open the topic to spontaneous submissions from the wider community, promoting a rich diversity of ideas and fostering unexpected research directions. Managed closely with the hosting journal's Editor-in-Chief, each submission is rigorously peer-reviewed, maintaining equally high standards of research integrity and quality. 

Beyond serving as a publication platform, Research Topics are a catalyst for scientific evolution, breaking down traditional barriers to encourage a fluid exchange of ideas. This initiative supports the cross-pollination of disciplines and the pursuit of innovative solutions to complex problems, embodying the open, collaborative, and interdisciplinary spirit of 21st-century science. 

Frontiers' Research Topics reflect a deep understanding of the changing landscape of scientific inquiry, offering a flexible and progressive platform that meets the needs of today's researchers. This approach not only addresses current scientific community needs but also lays the foundation for future discoveries, ensuring the continued advancement of knowledge. 

Unknown science: research volume and open access  

The strategic pivot to OA aims to extend the reach and influence of scholarly work, making it accessible to a broader audience, including researchers, practitioners, policymakers, and the public. This increased visibility is crucial for speeding up scientific discoveries and promoting cross-disciplinary collaborations.  

The rapid expansion of OA publications marks a collective shift towards a more open scientific ecosystem, posing a challenge to traditional publishing models. As this trend continues, it sets the stage for OA to emerge as the leading model in academic publishing, maximizing research value for the global community. 

This vast volume of research, growing for centuries, requires us to prepare for even greater increases, driven by the demands of a sustainable planet and healthy lives. We cannot predict what research, evidence, or data may be of value to future researchers. The first step to unlocking the potential of research volume is to make that research openly available, now and in the future. This is both the power and the imperative of open science. The COVID-19 pandemic underscored the value of open science, with over 1 million related articles in the CORD-19 database demonstrating that science saves lives. We do not know what the next pandemic will be, or what research in what form will be needed to meet the challenges to come. Policymakers are thus encouraged to support investments and collaborations: not to restrict growth because of the volume of information, but to increase access to knowledge for all.

 


Acknowledgements: The author thanks Chaomei Chen, Professor of Informatics, College of Computing and Informatics at Drexel University, for his helpful contribution to this blog post.