ORIGINAL RESEARCH article

Front. Neurosci.
Sec. Neuromorphic Engineering
Volume 18 - 2024 | doi: 10.3389/fnins.2024.1493163
This article is part of the Research Topic Deep Spiking Neural Networks: Models, Algorithms and Applications

ClinClip: A Multimodal Language Pretraining Model Integrating EEG Data for Enhanced English Medical Listening Assessment

Provisionally accepted
  • The Basic Department, The Tourism College of Changchun University, Changchun 130000, P. R. China

The final, formatted version of the article will be published soon.

    In medical listening assessments, accurate transcription and effective management of cognitive load are critical for healthcare delivery. Traditional speech recognition systems, while successful in general applications, often struggle in medical contexts, where the listener's cognitive state plays a significant role. These methods typically rely on audio-only input and cannot account for the listener's cognitive load, which reduces their accuracy and effectiveness in complex medical environments. To address these limitations, this study introduces ClinClip, a novel multimodal model that integrates EEG signals with audio data through a transformer-based architecture. ClinClip is designed to adjust dynamically to the listener's cognitive state, thereby improving transcription accuracy and robustness in medical settings. The model uses cognitive-enhanced strategies, including EEG-based modulation and hierarchical fusion of multimodal data, to overcome the challenges faced by traditional methods. Experiments on four datasets (EEGEyeNet, DEAP, PhyAAt, and eSports Sensors) show that ClinClip significantly outperforms six state-of-the-art models on both Word Error Rate (WER) and Cognitive Modulation Efficiency (CME). These results underscore the model's effectiveness in handling complex medical audio and highlight its potential to improve the accuracy of medical listening assessments. By addressing the cognitive aspects of the listening process, ClinClip contributes to more reliable healthcare delivery and offers a substantial advance over traditional speech recognition approaches.
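    For readers unfamiliar with the WER metric cited in the abstract, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the abstract does not describe the authors' actual evaluation pipeline or text normalization, and it does not define CME, so that metric is not sketched here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Illustrative helper only (hypothetical name); tokenization is a plain
    whitespace split with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: "acute" is misheard as "a cute" (one substitution plus
# one insertion against five reference words).
wer = word_error_rate(
    "the patient has acute bronchitis",
    "the patient has a cute bronchitis",
)
print(f"WER = {wer:.2f}")
```

    Lower values are better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.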

    Keywords: CLIP, Multimodal Language Pretraining, EEG data, English Medical Speech Recognition, Robotics

    Received: 08 Sep 2024; Accepted: 16 Dec 2024.

    Copyright: © 2024 Guangyu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Sun Guangyu, The Basic Department, The Tourism College of Changchun University, Changchun 130000, P. R. China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.