ORIGINAL RESEARCH article

Front. Cardiovasc. Med.
Sec. General Cardiovascular Medicine
Volume 12 - 2025 | doi: 10.3389/fcvm.2025.1458289
This article is part of the Research Topic: The Role of Artificial Intelligence Technologies in Revolutionizing and Aiding Cardiovascular Medicine

Assessing the Performance of Zero-Shot Visual Question Answering in Multimodal Large Language Models for 12-Lead ECG Image Interpretation

Provisionally accepted
Tomohisa Seki 1*, Yoshimasa Kawazoe 1,2, Yu Akagi 2, Toru Takiguchi 1, Kazuhiko Ohe 1,2
  • 1 The University of Tokyo Hospital, Tokyo, Japan
  • 2 The University of Tokyo, Bunkyo, Tōkyō, Japan

The final, formatted version of the article will be published soon.

    Large language models (LLMs) are increasingly multimodal, and zero-shot visual question answering (VQA) shows promise for image interpretation. If zero-shot VQA could be applied to the 12-lead electrocardiogram (ECG), a widely used diagnostic tool, the benefits to clinical practice would be substantial. This study evaluated the diagnostic performance of zero-shot VQA with multimodal LLMs on 12-lead ECG images. The results revealed that multimodal LLMs tended to make more errors in extracting and verbalizing image features than in describing preconditions and making logical inferences. Even when the answers were correct, erroneous descriptions of image features were common. These findings suggest a need for improved control over image hallucination and indicate that the percentage of correct answers to multiple-choice questions may not be sufficient on its own for assessing performance in VQA tasks.

    Keywords: Large Language Model, Electrocardiography, Visual question answering, Hallucination, Zero-shot learning

    Received: 02 Jul 2024; Accepted: 15 Jan 2025.

    Copyright: © 2025 Seki, Kawazoe, Akagi, Takiguchi and Ohe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Tomohisa Seki, The University of Tokyo Hospital, Tokyo, Japan

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.