AUTHOR=Choo Bryan Peide, Mok Yingjuan, Oh Hong Choon, Patanaik Amiya, Kishan Kishan, Awasthi Animesh, Biju Siddharth, Bhattacharjee Soumya, Poh Yvonne, Wong Hang Siang TITLE=Benchmarking performance of an automatic polysomnography scoring system in a population with suspected sleep disorders JOURNAL=Frontiers in Neurology VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2023.1123935 DOI=10.3389/fneur.2023.1123935 ISSN=1664-2295 ABSTRACT=Aim

The current gold standard for measuring sleep disorders is polysomnography (PSG), which is manually scored by a sleep technologist. Scoring a PSG is time-consuming and tedious, with substantial inter-rater variability. A deep-learning-based sleep analysis software module can perform autoscoring of PSG. The primary objective of the study is to validate the accuracy and reliability of the autoscoring software. The secondary objective is to measure workflow improvements in terms of time and cost via a time-motion study.

Methodology

The performance of an automatic PSG scoring software was benchmarked against the performance of two independent sleep technologists on PSG data collected from patients with suspected sleep disorders. The technologists at the hospital clinic and a third-party scoring company scored the PSG records independently. The scores were then compared between the technologists and the automatic scoring system. An observational study was also performed in which the time taken by sleep technologists at the hospital clinic to manually score PSGs was tracked, along with the time taken by the automatic scoring software, to assess potential time savings.
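The scorer-vs-scorer and scorer-vs-software comparisons described above reduce to epoch-by-epoch agreement between two label sequences. A minimal sketch of that comparison, using illustrative toy hypnograms (the stage labels and record length are assumptions, not data from the study):

```python
# Sketch: epoch-by-epoch comparison of two scorers' hypnograms.
# Stage labels and record length are illustrative, not from the study.
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]

def agreement_metrics(scorer_a, scorer_b):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    assert len(scorer_a) == len(scorer_b)
    n = len(scorer_a)
    # Observed agreement: fraction of epochs where the labels match.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Expected chance agreement from each scorer's marginal stage frequencies.
    freq_a = Counter(scorer_a)
    freq_b = Counter(scorer_b)
    expected = sum(freq_a[s] * freq_b[s] for s in STAGES) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Toy hypnograms: ten 30-s epochs each (one disagreement at epoch 7).
manual = ["W", "W", "N1", "N2", "N2", "N3", "N3", "N2", "REM", "REM"]
auto   = ["W", "W", "N1", "N2", "N2", "N3", "N2", "N2", "REM", "REM"]
acc, kappa = agreement_metrics(manual, auto)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.90, kappa=0.87
```

Cohen's kappa discounts the agreement expected by chance from the two scorers' stage distributions, which is why it is the standard companion to raw accuracy in inter-rater comparisons such as this one.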

Results

Pearson's correlation between the manually scored apnea–hypopnea index (AHI) and the automatically scored AHI was 0.962, demonstrating near-perfect agreement. The autoscoring system demonstrated similar results in sleep staging. The agreement between automatic staging and manual scoring was higher, in terms of both accuracy and Cohen's kappa, than the agreement between experts. The autoscoring system took an average of 42.7 s to score each record, compared with 4,243 s for manual scoring. Following a manual review of the autoscores, an average time savings of 38.6 min per PSG was observed, amounting to 0.25 full-time equivalent (FTE) savings per year.
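The per-record savings can be converted to the annual FTE figure with straightforward arithmetic. A sketch, where the annual PSG volume and FTE hours are illustrative assumptions (the study does not state them in the abstract):

```python
# Sketch of the time-savings arithmetic. The annual PSG volume and
# FTE hours below are assumptions, not figures from the study.
MINUTES_SAVED_PER_PSG = 38.6   # reported average saving per record
FTE_HOURS_PER_YEAR = 2080      # assumption: 40 h/week x 52 weeks
ANNUAL_PSG_VOLUME = 800        # assumption: records scored per year

hours_saved = MINUTES_SAVED_PER_PSG * ANNUAL_PSG_VOLUME / 60
fte_saved = hours_saved / FTE_HOURS_PER_YEAR
print(f"{hours_saved:.0f} h/year, about {fte_saved:.2f} FTE")
```

Under these assumed inputs the calculation lands at roughly 0.25 FTE, consistent with the figure reported above; a lab with a different annual volume would scale the result proportionally.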

Conclusion

The findings indicate a potential reduction in the burden of manual PSG scoring for sleep technologists and may be of operational significance for sleep laboratories in the healthcare setting.