ORIGINAL RESEARCH article

Front. Oncol.

Sec. Head and Neck Cancer

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1551876

This article is part of the Research Topic Challenges to Research on Oral Potentially Malignant Disorders and Oral Cancer

Fusion Feature-Based Hybrid Methods for Diagnosing Oral Squamous Cell Carcinoma in Histopathological Images

Provisionally accepted
  • Baoan Central Hospital, Fifth Affiliated Hospital of Shenzhen University, Shenzhen, China

The final, formatted version of the article will be published soon.

    This experimental study assesses the effectiveness of the Cross-Attention Vision Transformer (CrossViT) in the early detection of oral squamous cell carcinoma (OSCC) and proposes a hybrid model that combines CrossViT features with manually extracted features to improve the accuracy and robustness of OSCC diagnosis. We employed the CrossViT architecture, which uses a dual attention mechanism to process multi-scale features, in combination with convolutional neural network (CNN) technology for the effective analysis of image patches. In parallel, experts manually extracted features from OSCC pathological images, and these were fused with the features extracted by CrossViT to enhance diagnostic performance. Classification was performed with an artificial neural network (ANN). Model performance was evaluated by classification accuracy on two independent OSCC datasets. The proposed hybrid feature model performed well in pathological diagnosis, achieving accuracies of 99.36% and 99.59% on the two datasets, respectively. Compared to CNN and Vision Transformer (ViT) models, the hybrid model was more effective in distinguishing between malignant and benign lesions, significantly improving diagnostic accuracy. Combining CrossViT with expert features significantly enhanced diagnostic accuracy for OSCC, supporting the potential of hybrid artificial intelligence models in clinical pathology. Future research will expand the dataset and explore the model's interpretability to facilitate its practical application in clinical settings.
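
The fusion-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the two feature sets are fused or how the ANN is configured, so the sketch assumes per-feature standardization followed by simple concatenation, and a one-hidden-layer network trained by gradient descent. The CrossViT embeddings and the expert-extracted features are assumed to be already available as arrays; the function names `fuse_features`, `train_ann`, and `predict` are hypothetical.

```python
import numpy as np


def fuse_features(deep_feats, handcrafted_feats):
    """Standardize each feature block, then concatenate along the feature axis.

    Assumption: fusion is plain concatenation; the source does not specify it.
    """
    def zscore(x):
        std = x.std(axis=0, keepdims=True)
        # Guard against constant columns to avoid division by zero.
        return (x - x.mean(axis=0, keepdims=True)) / np.where(std == 0, 1.0, std)

    return np.concatenate([zscore(deep_feats), zscore(handcrafted_feats)], axis=1)


def train_ann(x, y, hidden=16, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer binary classifier with full-batch gradient descent.

    Hypothetical stand-in for the ANN classifier mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.3, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.3, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                      # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))      # sigmoid output
        grad_out = (p - y) / len(x)                   # gradient of mean BCE w.r.t. logits
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)   # backprop through tanh
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (x.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return w1, b1, w2, b2


def predict(params, x):
    """Return 0/1 class labels (1 where the output logit is positive)."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)
    return (h @ w2 + b2 > 0).astype(int).ravel()
```

In use, `fuse_features(crossvit_embeddings, expert_features)` would yield one row per image, which is then passed to `train_ann` together with the benign/malignant labels. Standardizing before concatenation is a design choice made here so that neither feature block dominates purely because of its scale.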

    Keywords: oral squamous cell carcinoma, convolutional neural networks, vision transformer, cross vision transformer, artificial neural networks (ANN)

    Received: 26 Dec 2024; Accepted: 11 Mar 2025.

    Copyright: © 2025 . This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: 佳兴 李, Baoan Central Hospital, Fifth Affiliated Hospital of Shenzhen University, Shenzhen, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
