ORIGINAL RESEARCH article

Front. Med.
Sec. Pathology
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1546452
This article is part of the Research Topic Advancing Computational Pathology: Integrating Multi-Modal Data and Efficient Model Training.

Abnormality-Aware Multimodal Learning for WSI Classification

Provisionally accepted
Thao M. Dang 1*, Qifeng Zhou 1, Yuzhi Guo 1*, Hehuan Ma 1, Saiyang Na 1*, Thao Bich Dang 2*, Jean Gao 1, Junzhou Huang 1*
  • 1 University of Texas at Arlington, Arlington, United States
  • 2 University of Arizona, Tucson, Arizona, United States

The final, formatted version of the article will be published soon.

    Whole slide images (WSIs) play a vital role in cancer diagnosis and prognosis. However, their gigapixel resolution, the lack of pixel-level annotations, and the reliance of existing pipelines on unimodal visual data make accurate and efficient computational analysis difficult. Existing methods typically divide WSIs into thousands of patches, which increases computational demands and makes it hard to focus on diagnostically relevant regions. Furthermore, these methods frequently rely on feature extractors pretrained on natural images, which are not optimized for pathology tasks, and they overlook multimodal data sources such as cellular and textual information that can provide critical insights. To address these limitations, we propose the Abnormality-Aware MultiModal (AAMM) learning framework, which integrates abnormality detection and multimodal feature learning for WSI classification. AAMM incorporates a Gaussian Mixture Variational Autoencoder (GMVAE) to identify and select the most informative patches, reducing computational complexity while retaining critical diagnostic information. It further integrates multimodal features from pathology-specific foundation models, combining patch-level, cell-level, and text-level representations through cross-attention mechanisms. This approach supports a more comprehensive analysis of WSIs for cancer diagnosis and subtyping. Extensive experiments on normal-tumor classification and cancer subtyping demonstrate that AAMM achieves superior performance compared to state-of-the-art methods. By combining abnormality detection with multimodal feature integration, our framework offers an efficient and scalable solution for advancing computational pathology.
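
The two steps the abstract describes, selecting informative patches and fusing patch-level, cell-level, and text-level features with cross-attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a per-patch abnormality score is already available (how AAMM derives it from the GMVAE is not stated here), uses random arrays as stand-ins for foundation-model embeddings, and uses single-head attention without learned projections. The function names, the residual-sum fusion, and the mean pooling are all hypothetical choices made for illustration.

```python
import numpy as np


def select_top_k_patches(features, abnormality_scores, k):
    """Keep the k patches with the highest abnormality score.

    `abnormality_scores` is a hypothetical per-patch score (for example a
    reconstruction error from a generative model); it is an assumption of
    this sketch, not the scoring rule used by AAMM.
    """
    k = min(k, len(abnormality_scores))
    # argsort is ascending, so take the last k indices and reverse them.
    top_idx = np.argsort(abnormality_scores)[-k:][::-1]
    return features[top_idx], top_idx


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention, no learned projections.

    queries: (n_q, d) array, e.g. selected patch embeddings.
    context: (n_c, d) array, e.g. cell-level or text-level embeddings.
    Returns the (n_q, d) attended context and the (n_q, n_c) weights.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context, weights


rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(1000, 64))  # stand-in patch embeddings
scores = rng.random(1000)                  # stand-in abnormality scores
cell_feats = rng.normal(size=(50, 64))     # stand-in cell-level embeddings
text_feats = rng.normal(size=(8, 64))      # stand-in text-level embeddings

selected, idx = select_top_k_patches(patch_feats, scores, k=100)
cell_ctx, cell_w = cross_attention(selected, cell_feats)
text_ctx, text_w = cross_attention(selected, text_feats)

# Fuse by residual sum, then mean-pool into one slide-level vector.
slide_vector = (selected + cell_ctx + text_ctx).mean(axis=0)
print(slide_vector.shape)
```

In this sketch only 100 of the 1000 patches reach the attention step, which is where the reduction in computation would come from. A trained model would replace the raw dot products with learned query, key, and value projections and feed the pooled vector to a classifier.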

    Keywords: WSI analysis, multimodal fusion, abnormal detection, foundation model, Gaussian mixture autoencoder

    Received: 16 Dec 2024; Accepted: 04 Feb 2025.

    Copyright: © 2025 Dang, Zhou, Guo, Ma, Na, Dang, Gao and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Thao M. Dang, University of Texas at Arlington, Arlington, United States
    Yuzhi Guo, University of Texas at Arlington, Arlington, United States
    Saiyang Na, University of Texas at Arlington, Arlington, United States
    Thao Bich Dang, University of Arizona, Tucson, 85721, Arizona, United States
    Junzhou Huang, University of Texas at Arlington, Arlington, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.