AUTHOR=Somaskandhan Pranavan, Leppänen Timo, Terrill Philip I., Sigurdardottir Sigridur, Arnardottir Erna Sif, Ólafsdóttir Kristín A., Serwatko Marta, Sigurðardóttir Sigurveig Þ., Clausen Michael, Töyräs Juha, Korkalainen Henri
TITLE=Deep learning-based algorithm accurately classifies sleep stages in preadolescent children with sleep-disordered breathing symptoms and age-matched controls
JOURNAL=Frontiers in Neurology
VOLUME=14
YEAR=2023
URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2023.1162998
DOI=10.3389/fneur.2023.1162998
ISSN=1664-2295
ABSTRACT=Introduction: Visual sleep scoring has several shortcomings, including inter-scorer inconsistency, which may adversely affect diagnostic decision-making. Although automatic sleep staging in adults has been extensively studied, it is uncertain whether such sophisticated algorithms generalize well to different pediatric age groups due to distinctive EEG characteristics. The preadolescent age group (10–13-year-olds) is relatively understudied, and thus, we aimed to develop an automatic deep learning-based sleep stage classifier specifically targeting this cohort.
Methods: A dataset (n = 115) containing polysomnographic recordings of Icelandic preadolescent children with sleep-disordered breathing (SDB) symptoms and age- and sex-matched controls was used. We developed a combined convolutional and long short-term memory neural network architecture relying on electroencephalography (F4-M1), electrooculography (E1-M2), and chin electromyography signals. Performance relative to human scoring was further evaluated by analyzing intra- and inter-rater agreements in a subset (n = 10) of data with repeat scoring from two manual scorers.
Results: The deep learning-based model achieved an overall cross-validated accuracy of 84.1% (Cohen’s kappa κ = 0.78). There was no meaningful performance difference between the SDB-symptomatic (n = 53) and control (n = 52) subgroups [83.9% (κ = 0.78) vs. 84.2% (κ = 0.78)]. The inter-rater reliability between manual scorers was 84.6% (κ = 0.78), and the automatic method reached similar agreement with the two scorers, 83.4% (κ = 0.76) and 82.7% (κ = 0.75).
Conclusion: The developed algorithm achieved high classification accuracy and substantial agreement with two manual scorers; the performance metrics compared favorably with typical inter-rater reliability between manual scorers and with performance reported in previous studies. These findings suggest that our algorithm may facilitate less labor-intensive and reliable automatic sleep scoring in preadolescent children.
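
The accuracy and Cohen's kappa figures quoted in the abstract each compare two epoch-by-epoch sequences of sleep-stage labels (model vs. scorer, or scorer vs. scorer). The following is a minimal illustrative sketch of how these two metrics are conventionally computed. It is not the authors' code, and the function name `accuracy_and_kappa` and the toy label sequences are hypothetical.

```python
from collections import Counter
import math


def accuracy_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two label sequences.

    Hypothetical helper for illustration; labels can be any hashable
    stage names such as "W", "N1", "N2", "N3", "REM".
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of epochs given the same stage by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each stage, the product of the two raters'
    # marginal proportions, summed over all stages.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)

    # Kappa is undefined when chance agreement is already perfect
    # (both raters used one identical label throughout).
    if expected == 1.0:
        return observed, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy example with made-up labels for eight 30-second epochs.
scorer = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
model = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
acc, kappa = accuracy_and_kappa(scorer, model)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance given each rater's stage distribution, which is why it is reported alongside accuracy when stage frequencies are unbalanced.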