AUTHOR=Csizek Zsófia , Mikó-Baráth Eszter , Budai Anna , Frigyik Andrew B. , Pusztai Ágota , Nemes Vanda A. , Závori László , Fülöp Diána , Czigler András , Szabó-Guth Kitti , Buzás Péter , Piñero David P. , Jandó Gábor TITLE=Artificial intelligence-based screening for amblyopia and its risk factors: comparison with four classic stereovision tests JOURNAL=Frontiers in Medicine VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2023.1294559 DOI=10.3389/fmed.2023.1294559 ISSN=2296-858X ABSTRACT=Introduction

The development of cost-effective and sensitive screening solutions to prevent amblyopia and identify its risk factors (strabismus, refractive problems, or a mixture of these) is a significant priority of pediatric ophthalmology. The main objective of our study was to compare the classification performance of various vision screening tests: classic, stereoacuity-based tests (Lang II, TNO, Stereo Fly, and Frisby) and non-stereoacuity-based tests using low-density static, dynamic, and noisy anaglyphic random dot stereograms. We also determined whether combining the non-stereoacuity-based tests in the simplest artificial intelligence (AI) model could provide an alternative method for vision screening.

Methods

Our study, conducted in Spain and Hungary, is a non-experimental, cross-sectional diagnostic test assessment of pediatric eye conditions. Using convenience sampling, we enrolled 423 children aged 3.6–14 years: children diagnosed with amblyopia, strabismus, or refractive errors, and age-matched emmetropic controls. Diagnoses were established by comprehensive pediatric ophthalmologic examinations. Participants wore filter glasses for the stereovision tests and red-green goggles for the AI-based test over their prescribed glasses. Our metrics were sensitivity, specificity, and the area under the ROC curve (AUC), with sensitivity as the primary endpoint. AUCs were compared using DeLong's method, and binary classifications (pathologic vs. normal) were evaluated using McNemar's matched-pair test and Fisher's nonparametric test.
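The abstract names DeLong's method but does not describe the computation. The following minimal Python sketch (standard library only, Python 3.9+) illustrates the technique under stated assumptions: each test yields a continuous score where a higher value means "more likely pathologic", each test's AUC is estimated as the Mann-Whitney statistic, and two AUCs measured on the same children are compared using DeLong's variance for correlated ROC curves. All function names and the toy scores are hypothetical illustrations, not the study's code or data.

```python
import math


def psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the diseased case scores higher, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def cov(a: list[float], b: list[float]) -> float:
    """Sample covariance with denominator len - 1."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def components(pos: list[float], neg: list[float]) -> tuple[list[float], list[float]]:
    """DeLong structural components: V10 per diseased case, V01 per control."""
    v10 = [mean([psi(x, y) for y in neg]) for x in pos]
    v01 = [mean([psi(x, y) for x in pos]) for y in neg]
    return v10, v01


def delong_paired(y_true: list[int], scores_a: list[float], scores_b: list[float]):
    """Two-sided DeLong test for two correlated AUCs measured on the same subjects.

    Assumption (hypothetical setup): y_true uses 1 = pathologic, 0 = normal,
    and higher scores indicate a more likely pathologic result.
    """
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    a10, a01 = components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    b10, b01 = components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = mean(a10), mean(b10)  # AUC equals the mean V10 component
    # Variance of AUC_a - AUC_b from the covariance of the structural components.
    var = (cov(a10, a10) + cov(b10, b10) - 2 * cov(a10, b10)) / len(pos) + (
        cov(a01, a01) + cov(b01, b01) - 2 * cov(a01, b01)
    ) / len(neg)
    if var <= 0:
        raise ValueError("degenerate variance; the AUC difference cannot be tested")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return auc_a, auc_b, z, p


# Hypothetical toy data: 3 children with the condition, 3 controls.
y = [1, 1, 1, 0, 0, 0]
test_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
test_b = [0.6, 0.4, 0.2, 0.5, 0.3, 0.1]
print(delong_paired(y, test_a, test_b))
```

Pairing matters here: because both tests are applied to the same children, their AUC estimates are correlated, and the covariance terms in the variance account for that, which an unpaired comparison would ignore.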

Results

Four non-overlapping groups were studied: (1) amblyopia (n = 46), (2) amblyogenic (n = 55), (3) non-amblyogenic (n = 128), and (4) emmetropic (n = 194); a fifth group combined the amblyopia and amblyogenic groups. Based on AUCs, the AI combination of non-stereoacuity-based tests detected amblyopia and its risk factors significantly better (AUC = 0.908, 95% CI: 0.829–0.958) than most classic tests (Lang II: 0.704, 95% CI: 0.648–0.755; Stereo Fly: 0.780, 95% CI: 0.714–0.837; Frisby: 0.754, 95% CI: 0.688–0.812; p < 0.02, n = 91, DeLong's method). At the optimum ROC point, McNemar's test showed significantly higher sensitivity, consistent with the AUCs. The AI solution also had significantly higher sensitivity than TNO (p = 0.046, n = 134, Fisher's test), while specificity did not differ.
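The abstract does not state how the optimum ROC point was selected. The Python sketch below assumes Youden's J (sensitivity + specificity - 1) as the criterion, dichotomizes a hypothetical AI score at that cut-off, and compares the resulting sensitivity with another test's on the same diseased children using the exact (binomial) form of McNemar's test. The function names, the Youden criterion, and the toy data are illustrative assumptions, not the study's procedure.

```python
import math


def sens_spec(y_true: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is classified as pathologic."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


def youden_threshold(y_true: list[int], scores: list[float]) -> float:
    """Assumed optimum: the cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(y_true, scores, c)))


def mcnemar_exact(y_true: list[int], pred_a: list[int], pred_b: list[int]) -> tuple[int, int, float]:
    """Exact McNemar test comparing two tests' sensitivities on the same diseased children.

    Only discordant pairs among children with the condition (y == 1) carry information.
    Under H0 (equal sensitivity), only_a ~ Binomial(only_a + only_b, 0.5).
    """
    rows = [(pa, pb) for t, pa, pb in zip(y_true, pred_a, pred_b) if t == 1]
    only_a = sum(1 for pa, pb in rows if pa == 1 and pb == 0)  # detected by A, missed by B
    only_b = sum(1 for pa, pb in rows if pa == 0 and pb == 1)  # missed by A, detected by B
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return only_a, only_b, min(1.0, 2 * tail)  # two-sided exact p-value


# Hypothetical toy data: 6 children with the condition, 4 controls.
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
ai_scores = [0.9, 0.8, 0.7, 0.75, 0.6, 0.65, 0.2, 0.3, 0.1, 0.4]
cut = youden_threshold(y, ai_scores)
ai_pred = [1 if s >= cut else 0 for s in ai_scores]
stereo_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical classic-test results
print(cut, sens_spec(y, ai_scores, cut), mcnemar_exact(y, ai_pred, stereo_pred))
```

With only five discordant pairs, even a 5-to-0 split gives a two-sided exact p-value of 0.0625, which shows why sample size among the diseased children drives the power of a sensitivity comparison.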

Discussion

Combining multiple tests that use anaglyphic random dot stereograms with varying parameters (density, noise, dynamism) in an AI model yields the most advanced and sensitive screening test for identifying amblyopia and amblyogenic conditions among all the tests studied.