ORIGINAL RESEARCH article
Front. Disaster Emerg. Med.
Sec. Emergency Health Services
Volume 3 - 2025 | doi: 10.3389/femer.2025.1558200
This article is part of the Research Topic: Electronic Health Records in Emergency Medicine: From Accountability to Opportunity
A Cost-Effective Approach to Counterbalance the Scarcity of Medical Datasets
Provisionally accepted
- 1 Bruno Kessler Foundation (FBK), Trento, Italy
- 2 University of Padua, Padua, Veneto, Italy
This paper presents a methodology for addressing the critical issue of data scarcity in clinical research, specifically within emergency departments. Inspired by recent advances in the generative abilities of Large Language Models (LLMs), we devised an automated, LLM-based approach that extends an existing, publicly available English dataset to new languages. We constructed a pipeline of automated components that first converts an existing annotated dataset from its complex standard format to a simpler inline-annotated format, then uses LLMs to generate inline annotations in the target language, and finally converts the generated target-language inline annotations back to the dataset's standard format. Manual validation is envisaged to correct erroneous and missing annotations. By automating the translation and annotation transfer process, the proposed method significantly reduces the resource-intensive work of collecting and manually annotating data, and thus represents a crucial step towards bridging the gap between the need for clinical research and the availability of high-quality data.
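The two format conversions described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: this page does not specify the dataset's standard format, so the sketch uses character-offset (standoff) spans and XML-like inline tags, and the function names, the `SYMPTOM` label, and the example sentences are hypothetical. The LLM step is not implemented; the tagged Italian sentence simply stands in for what such a model might return.

```python
import re
from typing import List, Tuple

# Hypothetical standoff annotation: (start, end, label), with end exclusive.
Span = Tuple[int, int, str]

def standoff_to_inline(text: str, spans: List[Span]) -> str:
    """Wrap each annotated span in <LABEL>...</LABEL> tags.

    Assumes the spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)

# Matches a non-nested <LABEL>mention</LABEL> pair.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def inline_to_standoff(tagged: str) -> Tuple[str, List[Span]]:
    """Strip inline tags and recover plain text plus character offsets."""
    pieces = []
    spans: List[Span] = []
    cursor = 0   # position in the tagged string
    length = 0   # length of the plain text built so far
    for match in TAG.finditer(tagged):
        before = tagged[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        mention = match.group(2)
        spans.append((length, length + len(mention), match.group(1)))
        pieces.append(mention)
        length += len(mention)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans

# Step 1: source-language standoff annotations become inline tags.
english = "Patient reports chest pain and fever."
english_spans = [(16, 26, "SYMPTOM"), (31, 36, "SYMPTOM")]
english_inline = standoff_to_inline(english, english_spans)

# Step 2 (not implemented here): an LLM would be prompted with the inline
# text and asked for a tagged version in the target language.
# The string below is a hand-written stand-in for such an output.
italian_inline = (
    "Il paziente riferisce <SYMPTOM>dolore toracico</SYMPTOM> "
    "e <SYMPTOM>febbre</SYMPTOM>."
)

# Step 3: target-language inline tags become standoff annotations again.
italian, italian_spans = inline_to_standoff(italian_inline)
for start, end, label in italian_spans:
    print(label, repr(italian[start:end]))
```

Working with inline tags keeps each mention attached to its label during generation, so offsets only need to be recomputed once, after the target-language text is final. Output that contains unbalanced or unknown tags would not match the pattern and would be a natural candidate for the manual validation step.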
Keywords: clinical data scarcity, multilingual dataset expansion, large language models, automated annotation transfer, emergency department data, low-resource settings, clinical multilingual NLP
Received: 09 Jan 2025; Accepted: 10 Apr 2025.
Copyright: © 2025 Magnini, Farzi, Ferrazzi, Ghosh, Lavelli, Mezzanotte and Speranza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Bernardo Magnini, Bruno Kessler Foundation (FBK), Trento, Italy
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.