Background

AUTHOR=Parkash Om , Siddiqui Asra Tus Saleha , Jiwani Uswa , Rind Fahad , Padhani Zahra Ali , Rizvi Arjumand , Hoodbhoy Zahra , Das Jai K. TITLE=Diagnostic accuracy of artificial intelligence for detecting gastrointestinal luminal pathologies: A systematic review and meta-analysis JOURNAL=Frontiers in Medicine VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.1018937 DOI=10.3389/fmed.2022.1018937 ISSN=2296-858X ABSTRACT=Background

Artificial Intelligence (AI) holds considerable promise for diagnostics in the field of gastroenterology. This systematic review and meta-analysis aims to assess the diagnostic accuracy of AI models compared with the gold standard of experts and histopathology for the diagnosis of various gastrointestinal (GI) luminal pathologies including polyps, neoplasms, and inflammatory bowel disease.

Methods

We searched PubMed, CINAHL, Wiley Cochrane Library, and Web of Science electronic databases to identify studies assessing the diagnostic performance of AI models for GI luminal pathologies. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. We performed a meta-analysis and hierarchical summary receiver operating characteristic curves (HSROC). The risk of bias was assessed using Quality Assessment for Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Subgroup analyses were conducted based on the type of GI luminal disease, AI model, reference standard, and type of data used for analysis. This study is registered with PROSPERO (CRD42021288360).

Findings

We included 73 studies, of which 31 were externally validated and provided sufficient information for inclusion in the meta-analysis. The overall sensitivity of AI for detecting GI luminal pathologies was 91.9% (95% CI: 89.0–94.1) and specificity was 91.7% (95% CI: 87.4–94.7). Deep learning models (sensitivity: 89.8%, specificity: 91.9%) and ensemble methods (sensitivity: 95.4%, specificity: 90.9%) were the most commonly used models in the included studies. Majority of studies (n = 56, 76.7%) had a high risk of selection bias while 74% (n = 54) studies were low risk on reference standard and 67% (n = 49) were low risk for flow and timing bias.

Interpretation

The review suggests high sensitivity and specificity of AI models for the detection of GI luminal pathologies. There is a need for large, multi-center trials in both high income countries and low- and middle- income countries to assess the performance of these AI models in real clinical settings and its impact on diagnosis and prognosis.

Systematic review registration

[https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=288360], identifier [CRD42021288360].