ORIGINAL RESEARCH article

Front. Digit. Health
Sec. Health Technology Implementation
Volume 7 - 2025 | doi: 10.3389/fdgth.2025.1535168
This article is part of the Research Topic Advances in Artificial Intelligence Transforming the Medical and Healthcare Sectors

ChestX-Transcribe: A Multimodal Transformer for Automated Radiology Report Generation from Chest X-rays

Provisionally accepted
  • Lovely Professional University, Phagwara, India

The final, formatted version of the article will be published soon.

    Radiology departments are under increasing pressure to meet the demand for timely and accurate diagnostics, especially for chest X-rays, a key modality for assessing pulmonary conditions. Producing comprehensive and accurate radiological reports is time-consuming and error-prone, particularly in high-volume clinical environments. Automated report generation can play a crucial role in alleviating radiologists' workload, improving diagnostic accuracy, and ensuring consistency. This paper introduces ChestX-Transcribe, a multimodal transformer model that combines the Swin Transformer for extracting high-resolution visual features with DistilGPT for generating clinically relevant, semantically rich medical reports. Trained on the Indiana University Chest X-ray dataset, ChestX-Transcribe demonstrates state-of-the-art performance across BLEU, ROUGE, and METEOR metrics, outperforming prior models in producing clinically meaningful reports. However, the reliance on the Indiana University dataset introduces potential limitations, including selection bias, because the dataset was collected from specific hospitals within the Indiana Network for Patient Care. Demographics or conditions that are not prevalent in those healthcare settings may therefore be underrepresented, potentially skewing model predictions when the model is applied to more diverse populations or different clinical environments. The ethical implications of handling sensitive medical data, including patient privacy and data security, are also considered. Despite these challenges, ChestX-Transcribe shows promise for enhancing real-world radiology workflows by automating the creation of medical reports, reducing diagnostic errors, and improving efficiency. The findings highlight the transformative potential of multimodal transformers in healthcare, with future work focusing on improving model generalizability and optimizing clinical integration.
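
    For readers unfamiliar with the n-gram overlap metrics named in the abstract, the following is a minimal sketch of single-reference sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is an illustration only, not the authors' evaluation code: the function names, the whitespace tokenization, the absence of smoothing, and the example sentences are all assumptions made here, and published results are normally computed with an established evaluation toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence pair.

    Assumptions (not from the article): lowercase whitespace tokenization,
    uniform weights over n = 1..max_n, and a score of 0.0 whenever any
    n-gram order has no overlap (including sentences shorter than max_n).
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Made-up report sentences, used only to exercise the function.
reference_report = "the heart size is normal"
generated_report = "the heart size is enlarged"
score = sentence_bleu(reference_report, generated_report)
```

    In this made-up example the candidate shares four of five unigrams, three of four bigrams, two of three trigrams, and one of two 4-grams with the reference, so the score is the fourth root of their product. A single changed word ("normal" versus "enlarged") still yields a fairly high overlap score even though the clinical meaning differs, which illustrates why n-gram metrics alone do not establish clinical correctness.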

    Keywords: medical report generation, multimodal transformers, Swin Transformer, DistilGPT, vision-language models, radiology workflow

    Received: 27 Nov 2024; Accepted: 06 Jan 2025.

    Copyright: © 2025 Singh and Singh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Sudhakar Singh, Lovely Professional University, Phagwara, India

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.