AUTHOR=Somaskandhan Pranavan , Leppänen Timo , Terrill Philip I. , Sigurdardottir Sigridur , Arnardottir Erna Sif , Ólafsdóttir Kristín A. , Serwatko Marta , Sigurðardóttir Sigurveig Þ. , Clausen Michael , Töyräs Juha , Korkalainen Henri TITLE=Deep learning-based algorithm accurately classifies sleep stages in preadolescent children with sleep-disordered breathing symptoms and age-matched controls JOURNAL=Frontiers in Neurology VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2023.1162998 DOI=10.3389/fneur.2023.1162998 ISSN=1664-2295 ABSTRACT=Introduction

Visual sleep scoring has several shortcomings, including inter-scorer inconsistency, which may adversely affect diagnostic decision-making. Although automatic sleep staging in adults has been extensively studied, it is uncertain whether such algorithms generalize well to different pediatric age groups, whose EEG characteristics are distinctive. The preadolescent age group (10–13-year-olds) is relatively understudied; we therefore aimed to develop an automatic deep learning-based sleep stage classifier specifically targeting this cohort.

Methods

A dataset (n = 115) containing polysomnographic recordings of Icelandic preadolescent children with sleep-disordered breathing (SDB) symptoms and age- and sex-matched controls was used. We developed a combined convolutional and long short-term memory neural network architecture relying on electroencephalography (F4-M1), electrooculography (E1-M2), and chin electromyography signals. Performance relative to human scoring was further evaluated by analyzing intra- and inter-rater agreement in a subset (n = 10) of the data with repeat scoring from two manual scorers.

Results

The deep learning-based model achieved an overall cross-validated accuracy of 84.1% (Cohen’s kappa κ = 0.78). There was no meaningful performance difference between the SDB-symptomatic (n = 53) and control (n = 52) subgroups [83.9% (κ = 0.78) vs. 84.2% (κ = 0.78)]. The inter-rater reliability between the manual scorers was 84.6% (κ = 0.78), and the automatic method reached similar agreement with each of the two scorers: 83.4% (κ = 0.76) and 82.7% (κ = 0.75).
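
The accuracy and Cohen's kappa values reported above are both epoch-by-epoch agreement measures between two hypnograms. As a minimal sketch of how such figures are typically computed (not the authors' code; the function names, the stage labels, and the ten-epoch example are illustrative assumptions, not study data):

```python
from collections import Counter


def accuracy(scoring_a, scoring_b):
    """Fraction of epochs on which two scorings assign the same stage."""
    if len(scoring_a) != len(scoring_b) or not scoring_a:
        raise ValueError("scorings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(scoring_a, scoring_b)) / len(scoring_a)


def cohens_kappa(scoring_a, scoring_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(scoring_a)
    p_observed = accuracy(scoring_a, scoring_b)
    counts_a, counts_b = Counter(scoring_a), Counter(scoring_b)
    # Chance agreement: product of the two scorers' marginal stage
    # frequencies, summed over all stages.
    p_expected = sum(
        counts_a[s] * counts_b[s] for s in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both scorings use a single identical stage; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ten-epoch hypnograms (illustrative only).
manual = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "R", "R"]
automatic = ["W", "N1", "N1", "N2", "N2", "N3", "N3", "N3", "R", "R"]

acc = accuracy(manual, automatic)
kappa = cohens_kappa(manual, automatic)
```

In this toy example the two scorings agree on 8 of 10 epochs and the chance agreement is 0.2, so kappa is (0.8 - 0.2) / (1 - 0.2) = 0.75. The same two functions apply whether the comparison is between the model and a scorer or between two scorers.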

Conclusion

The developed algorithm achieved high classification accuracy and substantial agreement with two manual scorers; the performance metrics compared favorably with typical inter-rater reliability between manual scorers and with performance reported in previous studies. These results suggest that our algorithm may facilitate reliable and less labor-intensive automatic sleep scoring in preadolescent children.