Skip to main content

ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 7 - 2024 | doi: 10.3389/frai.2024.1454441
This article is part of the Research Topic Generative AI for Enhanced Predictive Models: From Disease Diagnosis to Diverse Applications View all articles

Is Synthetic Data Generation Effective in Maintaining Clinical Biomarkers? Investigating Diffusion Models Across Diverse Imaging Modalities

Provisionally accepted
  • Weill Cornell Medicine- Qatar, Ar-Rayyan, Qatar

The final, formatted version of the article will be published soon.

    The integration of recent technologies in medical imaging has become a cornerstone of modern healthcare, facilitating detailed analysis of internal anatomy and pathology. Traditional methods, however, often grapple with data-sharing restrictions due to privacy concerns. Emerging techniques in artificial intelligence offer innovative solutions to overcome these constraints, with synthetic data generation enabling the creation of realistic medical imaging datasets, but the preservation of critical hidden medical biomarkers is an open question. This study employs state-of-the-art Denoising Diffusion Probabilistic Models integrated with a Swin-transformer-based network to generate synthetic medical data. Three distinct areas of medical imaging - radiology, ophthalmology, and histopathology - are explored. The quality of synthetic images is evaluated through a classifier trained to identify the preservation of medical biomarkers. The diffusion model effectively preserves key medical features, such as lung markings and retinal abnormalities, producing synthetic images closely resembling real data. Classifier performance demonstrates the reliability of synthetic data for downstream tasks, with F1 and AUC reaching 0.8-0.99. This work provides valuable insights into the potential of diffusion-based models for generating realistic, biomarker-preserving synthetic images across various medical imaging modalities. These findings highlight the potential of synthetic data to address challenges such as data scarcity and privacy concerns in clinical practice, research, and education.

    Keywords: Synthetic data generation, clinical biomarkers, Denoising Diffusion Models, medical imaging, Swin-Transformer Network

    Received: 25 Jun 2024; Accepted: 31 Dec 2024.

    Copyright: © 2024 Hosseini and Serag. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Ahmed Serag, Weill Cornell Medicine- Qatar, Ar-Rayyan, Qatar

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.