Background: Because laboratory tests are less reliable for skin diseases, these conditions are well suited to diagnosis with AI models. Few AI dermatology diagnostic models combine images and text; of these, few target Asian populations, and few cover the most common types of diseases.
Methods: Using a dataset sourced from Asia comprising over 200,000 images and 220,000 medical records, we developed DIET-AI, a deep learning-based system that uses dual-channel images and extracted text to diagnose 31 skin diseases, which cover the majority of common skin diseases. From 1 September to 1 December 2021, we prospectively collected images and medical records from 6,043 cases at 15 hospitals in seven provinces in China. We then compared the performance of DIET-AI with that of six doctors of different seniority levels on this clinical dataset.
Results: Across the 31 diseases, the average performance of DIET-AI was not inferior to that of the doctors at any seniority level. A comparison of the area under the curve, sensitivity, and specificity showed that the DIET-AI model is effective in clinical scenarios. In addition, medical records affected the performance of DIET-AI and of the physicians to varying degrees.
Conclusion: This is the largest dermatological dataset for the Chinese demographic. For the first time, we built a dual-channel image classification model on a non-cancer dermatitis dataset with both images and medical records, and it achieved diagnostic performance comparable to that of senior doctors for common skin diseases. These findings provide a reference for future studies of the feasibility and performance of DIET-AI in clinical use.
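The three metrics named in the Results can be illustrated with a minimal sketch. It does not reproduce the DIET-AI evaluation: the disease labels, predictions, and scores below are hypothetical, and the function names are assumptions introduced only for illustration. Each disease is treated one-vs-rest, and the area under the curve is computed with the pairwise (Mann-Whitney) formulation.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single disease label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Return NaN when a rate is undefined (no positives or no negatives).
    sens = tp / (tp + fn) if (tp + fn) else math.nan
    spec = tn / (tn + fp) if (tn + fp) else math.nan
    return sens, spec


def auc(scores, is_positive):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: five cases, three candidate diagnoses.
y_true = ["eczema", "psoriasis", "eczema", "acne", "eczema"]
y_pred = ["eczema", "eczema", "eczema", "acne", "psoriasis"]
eczema_scores = [0.9, 0.6, 0.8, 0.1, 0.4]  # assumed model scores for "eczema"

sens, spec = sensitivity_specificity(y_true, y_pred, positive="eczema")
eczema_auc = auc(eczema_scores, [t == "eczema" for t in y_true])
```

Averaging such per-disease values over all labels gives one possible summary figure for comparing a model with each doctor; the averaging scheme used in the study is not stated in the abstract.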
In the information age, evidence based on real-world data can help extrapolate and supplement data from randomized controlled trials, which can benefit clinical trials and drug development and improve public health decision-making. However, the legitimate use of real-world data in China is limited by concerns over patient confidentiality. The use of personal information is a core element of data governance in public health. China's public health data governance faces practical problems, such as balancing the protection of personal information against public value. In 2021, China adopted the Personal Information Protection Law (PIPL) to provide a consistent legal framework for protecting personal information, including sensitive medical and health data. Although the PIPL offers critical legal safeguards for processing health data, further clarification is needed on specific issues, including the meaning of “separate consent,” cross-border data transfer requirements, and exceptions for scientific research. A shift in the legal and regulatory framework is necessary to advance public health research and to realize the potential benefits of combining real-world evidence and digital health while respecting privacy in an era of technological and demographic change.
Wikipedia is an open-source online encyclopedia and one of the most-read sources of online health information. Wikipedia page views have also been analyzed to inform public health services and policies. The present review analyzed 29 studies that used Wikipedia page views for health research. Most reviewed studies were published in recent years and originated from high-income countries. In addition to Wikipedia page views, most studies used data from other internet sources, such as Google, Twitter, YouTube, and Reddit. The reviewed studies covered various non-communicable diseases, infectious diseases, and health interventions, and they used page views to describe changes in the utilization of online health information from Wikipedia, to examine the effect of public events on public interest in and use of health-related Wikipedia pages, to estimate and predict the incidence and prevalence of diseases, to predict data from other internet data sources, to evaluate the effectiveness of health education activities, and to explore the evolution of a health topic. Given the limitations in replicating some of the reviewed studies, future research should specify the Wikipedia page or pages analyzed, the language of the pages examined, the dates of data collection, the dates explored, the type of data, whether page views were limited to Internet users, and whether web crawlers and redirects to the Wikipedia page were included. Future research can also explore public interest in other commonly read health topics available on Wikipedia, develop Wikipedia-based models that can be used to predict disease incidence, and improve Wikipedia-based health education activities.
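The reporting items recommended above can be captured in a small query specification. The sketch below is an illustration under stated assumptions, not a method taken from the reviewed studies: the class and field names are hypothetical, and the URL pattern and the allowed agent and access values follow the per-article endpoint of the Wikimedia Pageviews REST API as commonly documented, which should be checked against the current documentation before use. The code only builds the request URL and makes no network call.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import quote

# Assumed base path of the per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

# Assumed allowed values; verify against the current API documentation.
AGENTS = {"all-agents", "user", "spider", "automated"}
ACCESS = {"all-access", "desktop", "mobile-app", "mobile-web"}


@dataclass(frozen=True)
class PageviewQuery:
    """Records what a study would need to report for replication."""
    language: str              # Wikipedia language edition, e.g. "en"
    article: str               # exact page title analyzed
    start: date                # first date explored
    end: date                  # last date explored
    agent: str = "user"        # "user" excludes web crawlers
    access: str = "all-access"
    granularity: str = "daily"

    def url(self) -> str:
        if self.end < self.start:
            raise ValueError("end date precedes start date")
        if self.agent not in AGENTS or self.access not in ACCESS:
            raise ValueError("unknown agent or access value")
        # Titles use underscores for spaces and must be URL-encoded.
        title = quote(self.article.replace(" ", "_"), safe="")
        return "/".join([
            BASE,
            f"{self.language}.wikipedia",
            self.access,
            self.agent,
            title,
            self.granularity,
            self.start.strftime("%Y%m%d"),
            self.end.strftime("%Y%m%d"),
        ])


# Hypothetical example: English page, human readers only, daily counts.
query = PageviewQuery("en", "Influenza vaccine", date(2021, 9, 1), date(2021, 12, 1))
request_url = query.url()
```

Redirect handling is not expressed in this URL, so a study would still need to state separately whether views of redirect pages were added to the counts, along with the date on which the data were retrieved.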