ORIGINAL RESEARCH article

Front. Neurol.

Sec. Epilepsy

Volume 16 - 2025 | doi: 10.3389/fneur.2025.1521001

This article is part of the Research Topic Advanced EEG Analysis Techniques for Neurological Disorders

Data Transformation of Unstructured Electroencephalography Reports by Natural Language Processing: Improving Data Usability for Large-Scale Epilepsy Studies

Provisionally accepted
Yoon Gi Chung 1, Jaeso Cho 1, Young Ho Kim 1, Hyun Woo Kim 1, Hunmin Kim 1*, Yong Seo Koo 2, Seo-Young Lee 3, Young-Min Shon 4
  • 1 Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi, Republic of Korea
  • 2 Asan Medical Center, Seoul, Republic of Korea
  • 3 School of Medicine, Kangwon National University, Chuncheon, Gangwon, Republic of Korea
  • 4 Samsung Medical Center, Sungkyunkwan University, Seoul, Republic of Korea

The final, formatted version of the article will be published soon.

    Introduction: Electroencephalography (EEG) is a widely used technique that provides neurologists with electrographic findings and clinical interpretations. However, these findings are predominantly recorded as unstructured free text, which complicates data extraction and analysis. In this study, we introduce a hierarchical algorithm that uses natural language processing (NLP) to transform unstructured EEG reports from pediatric patients with epilepsy into structured data.

    Methods: The proposed algorithm consists of two phases: deep learning-based text classification followed by a series of rule-based keyword extraction procedures. First, we classified the EEG reports into two primary groups: normal and abnormal. We then systematically identified key indicators of cerebral dysfunction or seizures, distinguished focal from generalized seizures, and identified epileptiform discharges and their anatomical locations. We retrospectively analyzed a dataset of 17,172 EEG reports from 3,423 pediatric patients, from which we selected 6,173 normal and 6,173 abnormal reports confirmed by neurologists for algorithm development.

    Results: The developed algorithm successfully classified EEG reports into 1,000 normal and 1,000 abnormal reports and effectively identified the presence of cerebral dysfunction or seizures within these reports. Furthermore, the algorithm translated abnormal reports into structured tabular data with an accuracy exceeding 98.5% in determining seizure type (focal or generalized). The accuracy for detecting epileptiform discharges and their respective locations exceeded 88.5%. These outcomes were validated through internal and external assessments involving 800 reports from two different medical institutions.

    Discussion: Our primary focus was to convert EEG reports into structured datasets, diverging from traditional approaches centered on clinical notes or discharge summaries. We developed a hierarchical, streamlined approach built on keyword selection guided by neurologists, which contributed to the algorithm's strong performance. Overall, this methodology enhances data accessibility and expands the potential for further research and clinical applications in pediatric epilepsy management.
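    The two-phase hierarchy described under Methods (classify first, then extract keywords only from abnormal reports) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study's deep learning classifier is replaced by a caller-supplied function (here a trivial stand-in, the hypothetical `keyword_stand_in`), and the lexicons (`FOCAL_TERMS`, `DISCHARGE_TERMS`, `LOCATION_TERMS`, and so on), the `StructuredReport` fields, and the single-sentence negation rule are illustrative assumptions, not the neurologist-curated keywords used in the study.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical lexicons for illustration only; the study's
# neurologist-curated keyword lists are not reproduced here.
DYSFUNCTION_TERMS = ["slowing", "cerebral dysfunction"]
FOCAL_TERMS = ["focal seizure", "partial seizure"]
GENERALIZED_TERMS = ["generalized seizure", "generalized tonic-clonic"]
DISCHARGE_TERMS = ["spike", "sharp wave", "epileptiform discharge"]
LOCATION_TERMS = ["frontal", "central", "temporal", "parietal", "occipital"]

# Assumed negation rule: a cue word earlier in the same sentence
# (no period in between) cancels the match.
NEGATION = re.compile(r"\b(no|without|absence of)\b[^.]*$")


@dataclass
class StructuredReport:
    """One row of the structured (tabular) output; fields are hypothetical."""
    abnormal: bool
    cerebral_dysfunction: bool = False
    seizure_type: Optional[str] = None  # "focal", "generalized", "both", or None
    epileptiform_discharges: bool = False
    locations: List[str] = field(default_factory=list)


def mentions(text: str, terms: List[str]) -> List[str]:
    """Return the terms found as whole words (optionally plural) that are
    not preceded by a negation cue earlier in the same sentence."""
    found = []
    for term in terms:
        for m in re.finditer(r"\b" + re.escape(term) + r"s?\b", text):
            if not NEGATION.search(text[: m.start()]):
                found.append(term)
                break
    return found


def keyword_stand_in(text: str) -> bool:
    """Placeholder for the study's deep learning normal/abnormal classifier.
    Expects lowercased text; returns True (abnormal) unless it contains
    the phrase 'normal eeg' as whole words."""
    return re.search(r"\bnormal eeg\b", text) is None


def structure_report(text: str, is_abnormal: Callable[[str], bool]) -> StructuredReport:
    text = text.lower()
    # Phase 1: text classification; normal reports skip extraction.
    if not is_abnormal(text):
        return StructuredReport(abnormal=False)

    # Phase 2: rule-based keyword extraction on abnormal reports only.
    focal = bool(mentions(text, FOCAL_TERMS))
    generalized = bool(mentions(text, GENERALIZED_TERMS))
    if focal and generalized:
        seizure_type = "both"
    elif focal:
        seizure_type = "focal"
    elif generalized:
        seizure_type = "generalized"
    else:
        seizure_type = None

    # Locations are taken only from sentences that mention a discharge.
    discharges = False
    locations: List[str] = []
    for sentence in text.split("."):
        if mentions(sentence, DISCHARGE_TERMS):
            discharges = True
            for loc in mentions(sentence, LOCATION_TERMS):
                if loc not in locations:
                    locations.append(loc)

    return StructuredReport(
        abnormal=True,
        cerebral_dysfunction=bool(mentions(text, DYSFUNCTION_TERMS)),
        seizure_type=seizure_type,
        epileptiform_discharges=discharges,
        locations=locations,
    )


report = ("Abnormal EEG. Frequent spike-and-wave discharges over the left "
          "temporal and central regions. History of focal seizures. "
          "No generalized seizures were recorded.")
print(structure_report(report, keyword_stand_in))
# → StructuredReport(abnormal=True, cerebral_dysfunction=False, seizure_type='focal', epileptiform_discharges=True, locations=['central', 'temporal'])
```

    Gating phase 2 on the phase 1 label mirrors the hierarchy described above: normal reports never reach the keyword rules, so only abnormal reports are mapped onto the tabular fields. A production system would need much more careful negation and laterality handling than the crude same-sentence rule assumed here.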

    Keywords: natural language processing, electroencephalography, epilepsy, deep learning, keyword extraction

    Received: 01 Nov 2024; Accepted: 17 Feb 2025.

    Copyright: © 2025 Chung, Cho, Kim, Kim, Kim, Koo, Lee and Shon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Hunmin Kim, Seoul National University Bundang Hospital, Seongnam-si, 13620, Gyeonggi, Republic of Korea

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
