ORIGINAL RESEARCH article

Front. Physiol.

Sec. Computational Physiology and Medicine

Volume 16 - 2025 | doi: 10.3389/fphys.2025.1580985

Cross-Modal Attention Model Integrating Tongue Images and Descriptions: A Novel Intelligent TCM Approach for Pathological Organ Diagnosis

Provisionally accepted
Quan Gan1*, Chen Wang1, Zhaoman Zhong1, Jiaying Wu1, Qiwei Ge2, Lei Shi1, Jiaqing Shang1, Chuanxia Liu1*
  • 1Jiangsu Ocean University, Lianyungang, China
  • 2Yamaguchi University, Yamaguchi, Yamaguchi, Japan

The final, formatted version of the article will be published soon.

Introduction: Tongue diagnosis is a fundamental technique in traditional Chinese medicine (TCM), in which clinicians evaluate the tongue's appearance to infer the condition of pathological organs. However, most existing research on intelligent tongue diagnosis focuses on analyzing tongue images and neglects the descriptive text that accompanies them, even though this text is an essential component of clinical diagnosis. To close this gap, we propose a novel Cross-Modal Pathological Organ Diagnosis Model that integrates tongue images and textual descriptions for more accurate classification of pathological organs.

Methods: Our model extracts features from both the tongue images and the corresponding textual descriptions. These features are then fused using a cross-modal attention mechanism, which lets the model combine visual and textual information and addresses the limitations of using either modality alone.

Results: We evaluated the model on a self-constructed dataset. It outperforms common models in overall accuracy. Ablation studies, in which either tongue images or textual descriptions were used alone, confirmed the significant benefit of multimodal fusion for diagnostic accuracy.

Discussion: This study introduces a new perspective on intelligent tongue diagnosis in TCM by combining visual and textual data. The experimental findings highlight the importance of cross-modal feature fusion for improving the accuracy of pathological diagnosis. Our approach contributes to the development of more effective diagnostic systems and paves the way for further automation of TCM diagnosis.
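
The abstract does not specify how the cross-modal attention is built, so the following is only a minimal sketch of one common form, scaled dot-product cross-attention, in which image features act as queries over text features. The function names (`cross_modal_attention`, `fuse`), the toy feature vectors, and the choice to concatenate each image feature with its text-attended counterpart are all assumptions made for illustration, not the authors' implementation. A trained model would also apply learned query, key, and value projections, which are omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(image_feats, text_feats):
    """Let each image feature (query) attend over all text features (keys/values).

    Returns (attended, weights): one text-informed vector per image feature,
    plus the attention weights used to build it.
    """
    dim = len(text_feats[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in image_feats:
        # Similarity of this image feature to every text feature.
        row = softmax([dot(query, key) / scale for key in text_feats])
        # Weighted average of the text features.
        mixed = [sum(w * value[d] for w, value in zip(row, text_feats))
                 for d in range(dim)]
        attended.append(mixed)
        weights.append(row)
    return attended, weights


def fuse(image_feats, text_feats):
    """Concatenate each image feature with its text-attended counterpart (hypothetical fusion)."""
    attended, _ = cross_modal_attention(image_feats, text_feats)
    return [img + att for img, att in zip(image_feats, attended)]


# Toy 3-dimensional features: two image regions and three text tokens (made-up values).
image_feats = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
text_feats = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
fused = fuse(image_feats, text_feats)
```

In this sketch, `fused` holds one 6-dimensional vector per image feature, which a classifier head could then map to pathological organ labels. The single-modality ablations mentioned in the Results would correspond to passing only `image_feats` or only `text_feats` features to that classifier.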

Keywords: Tongue diagnosis, Pathological organ, Tongue image analysis, Textual descriptions, Cross-modal attention

Received: 21 Feb 2025; Accepted: 07 Apr 2025.

Copyright: © 2025 Gan, Wang, Zhong, Wu, Ge, Shi, Shang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Quan Gan, Jiangsu Ocean University, Lianyungang, China
Chuanxia Liu, Jiangsu Ocean University, Lianyungang, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
