Artificial intelligence and skin cancer

Wei, Maria L.; Tada, Mikio; So, Alexandra; Torres, Rodrigo

doi:10.3389/fmed.2024.1331895

REVIEW article

Front. Med., 19 March 2024

Sec. Dermatology

Volume 11 - 2024 | https://doi.org/10.3389/fmed.2024.1331895

This article is part of the Research TopicArtificial Intelligence in Cutaneous Lesions: Where do we Stand and What is Next?View all 11 articles

Artificial intelligence and skin cancer

Maria L. Wei^1,2^*

Mikio Tada³

Alexandra So⁴

Rodrigo Torres²

¹Department of Dermatology, University of California, San Francisco, San Francisco, CA, United States
²Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
³Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, United States
⁴School of Medicine, University of California, San Francisco, San Francisco, CA, United States

Artificial intelligence is poised to rapidly reshape many fields, including that of skin cancer screening and diagnosis, both as a disruptive and assistive technology. Together with the collection and availability of large medical data sets, artificial intelligence will become a powerful tool that can be leveraged by physicians in their diagnoses and treatment plans for patients. This comprehensive review focuses on current progress toward AI applications for patients, primary care providers, dermatologists, and dermatopathologists, explores the diverse applications of image and molecular processing for skin cancer, and highlights AI’s potential for patient self-screening and improving diagnostic accuracy for non-dermatologists. We additionally delve into the challenges and barriers to clinical implementation, paths forward for implementation and areas of active research.

Introduction

Artificial intelligence (AI) stands at the forefront of technological innovation and has permeated into almost every industry and field. In dermatology, significant progress has been made toward the application of AI in skin cancer screening and diagnosis. Notably, a milestone that marked the era of modern artificial intelligence in dermatology was the demonstration of skin cancer classification abilities by deep learning convolutional neural networks (CNNs), which was on par with the performance of board-certified dermatologists (1). This CNN was trained on a dataset that was two orders of magnitude greater than those previously utilized. The dermatologist-level classification ability has since been experimentally validated by other papers (2, 3). Recent progress in the field of AI enables models to not only analyze image data but also integrate clinical information, including patient demographics and past medical history (4–6). Advancements allow for the simultaneous evaluation and identification of multiple lesions from wide-field images (7, 8). Moreover, models can now gain information from whole slide images without having to use costly pixel-wise human-made annotations (9). Despite these advancements, research has found that AI models lack robustness to simple data variations, have proven inadequate in real-world dermatologic practice performance, and that barriers remain before achieving clinical readiness (2, 10–14).

Clinical applications

Artificial intelligence has been employed to predict the most common types of skin cancers, melanoma (1) and non-melanoma skin cancer (1), through image analysis. In addition, machine learning has been used on RNA datasets to develop classifiers that also predict skin cancer, as well as the prognosis of skin lesions. Several of these methods can be, or have the potential to be, readily deployed by patients, primary care practitioners, dermatologists, and dermatopathologists.

Patients

With the rising prevalence of smartphone usage, patients can directly screen for and monitor lesions with AI applications. These applications can run AI models on patients’ own local devices, which ensures the protection of patient data (15). The feasibility of an AI model to assist patients’ with self-assessed risk using smartphones has been validated with a model that was trained on pictures captured from patients’ smartphones, and which exhibited comparable performance to general practitioners’ ability to distinguish lower-risk vs. higher-risk pigmented lesions (16). Moreover, AI significantly increased the abilities of 23 non-medical professionals to correctly determine a diagnosis of malignancy from 47.6 to 87.5% without compromising specificity (12). In the future, AI models may assist with overseeing and assessing changes to lesions as they progress (17) and collaborate with apps that allow patients to examine themselves and document moles (18, 19).

Despite progress with these AI models, there is no smartphone application that is endorsed on the market in the United States for non-professionals to evaluate their lesions as they do not have satisfactory performance or generalizability (20). Limitations include biases introduced due to the narrow range of lesion types, skin pigmentation types, and low number of high-quality curated images used in training. Further, inadequate follow-up has been a limitation with regards to identifying false negative diagnoses (21). Notably, users may not be adequately protected from the risks of using smartphone diagnostic apps by Conformit Europenne (CE) certification, which endorsed two apps with flaws (SkinVision and TeleSkin’s skinScan app). A prospective trial of SkinVision found low sensitivity and specificity for melanoma classification (22). In contrast to CE, the US Food and Drug Administration’s (FDA) requirements for endorsement are more stringent (21).

Primary care

Artificial intelligence applications can enhance skin cancer screening in the primary care setting and streamline referrals to dermatologists. Referral data from primary care practitioners to teledermatology consultations were used to train a model capable of a top-3 accuracy and specificity of 93 and 83%, respectively, given 26 skin conditions that makeup 80% of encountered primary care cases (4). This performance was on par with dermatologists and surpassed primary care physicians (PCPs) and nurse practitioners. This type of model could assist PCPs in diagnosing patients more accurately and broadening their differential diagnoses. In cases in which the top 3 diagnoses from the model have the same management strategy, patients may start treatment while awaiting further workup or follow-up with dermatology. Nevertheless, further testing on populations with a low prevalence of skin cancer is essential to demonstrate efficacy in the broader population (23).

Dermatology

Models have been trained to use electronic health record (EHR) data and/or gene sequencing data to predict an individual’s likelihood of developing melanoma (24–27) or nonmelanoma skin cancer (27–31). While AI models could potentially flag patients at high risk of skin cancer to be screened, studies are limited by the variability of included predictive factors, inconsistent methods of evaluating models, and inadequate validation (32). Moreover, EHRs often do not include some of the most important risk determinants for skin cancer, such as exposure to UV light and the patient’s familial history; the omission of such data may result in decreased performance (28).

Artificial intelligence has the potential to supplement dermatologists’ diagnostic and treatment capabilities in what is known as augmented intelligence (AuI). For diagnosis, AuI might assist dermatologists in more effectively managing teledermatology referrals (4) and increase the efficacy of in-person visits (33). However, in a prospective trial comparing AI to dermatologists in a teledermatology setting, dermatologists outperformed the AI (13). Despite AI currently underperforming dermatologists, AI could provide a new perspective that could still be beneficial as AI and humans exhibit distinct types of errors. For instance, models may provide insights into certain images’ classification ambiguity, whereas humans are better able to distinguish variability in image quality such as blurriness or shadowing (12).

Augmented intelligence can also assist with suggesting clinical decisions given inputted images, such as recommending whether a lesion warrants excision (34). The integration of AuI into dermatologic patient management resulted in a 19.2% reduction in unnecessary excisions of benign lesions (35). Although current CNNs’ performance has been shown to fall short when compared with using sequential dermatoscopic photography in predicting melanoma, AuI may be used in the future by dermatologists to evaluate and monitor lesion change (36). Of interest, in this study, neither dermatologists nor the CNN had satisfactory diagnostic performance levels on baseline images, but both dermatologists and CNN had improved performances when follow-up images were provided, and the best performance was combining CNN and dermatologist assessment together.

Integration of AI into advanced imaging techniques may reduce the extent of training necessary to use them (37). One area of application is in the detection of the dermal-epidermal junction, which is crucial in a non-invasive method of skin cancer diagnosis called reflectance confocal microscopy (RCM) imaging (38). Furthermore, there are ongoing efforts to analyze RCM images with AI (39).

The FDA has not approved any medical devices or algorithms based on artificial intelligence in the field of dermatology (40, 41). On the other hand, the FotoFinder Moleanalyzer Pro, an AI application for dermatology, was approved in the European market. It demonstrated performance on par with dermatologists in store-and-forward dermatology (42) and a prospective diagnostic study (43), however, the latter had extensive exclusion criteria, e.g., excluding patients of skin type IV and greater. The first randomized controlled trial comparing AI skin lesion prediction to dermatologists’ assessment reported that AI did not exceed attending dermatologists in skin cancer detection (44).

Dermatopathology

With the growing application of whole slide imaging (WSI) in the field of dermatopathology (45), AI can potentially support dermatopathologists in several ways, particularly skin cancer recognition. Among the AI models trained to detect melanoma from digitized slides (5, 46–50), two models were able to match the performance of pathologists in an experimental setting. These models were limited in that they were only given either a part of (46) or a single (49) hematoxylin and eosin (H&E)-stained slide. In contrast, pathologists can utilize supplementary data such as immunohistochemistry or relevant patient data. However, integrating patient information, such as age, sex, and lesion location, into CNN models did not enhance performance (5). One limitation to implementing AI in dermatopathology is the unreliable prediction that may be made when a model is given an input that differs from the training dataset. One potential solution is the use of conformal prediction, which has been shown to increase accuracy of prostate biopsy diagnosis by flagging unreliable predictions (51).

Studies have been done to evaluate AI’s ability for diagnosing basal cell carcinoma (BCC) using WSI (9, 52, 53). Campanella et al. showed the ability of a convolutional neural network to achieve 100% sensitivity for detecting BCC, on the test set; importantly, a multiple instance learning approach was introduced that obviated the necessity of time-consuming pixel-level slide annotations to distinguish between areas with and without disease (9). Kimeswenger et al. subsequently incorporated an “attention” function to draw attention to areas of digital slides that include indications of BCC. Interestingly, CNN pattern recognition varied from that of pathologists for BCC diagnosis as tissues were flagged based on different image regions (53). These CNNs could also be applied to identify and filter slides for Mohs micrographic surgery (52). In the setting of rising caseloads, AI can help to decrease pathologists’ workload generated by these commonly diagnosed, low risk entities. Duschner et al. applied AI to automated diagnosis of BCCs, and demonstrated both sensitivity and specificity of over 98%. Notably, the model demonstrated successful generalization to samples from other centers with similar sensitivity and specificity (54).

Artificial intelligence has also had some success in predicting sentinel lymph node status (55), visceral recurrence, and death (56) based on histology of primary melanoma tumors. In the future, AI could be utilized to identify mitotic figures, delineate tumor margins, and determine the results of immunohistochemistry stains; further, AI could recommend more immunostaining or genetic panels that could be of use diagnostically (57). While AI predictions have not been consistently successful for melanoma (58), AI has been demonstrated to identify the mutation given a lung adenocarcinoma slide that has been stained with H&E (59–61).

Machine learning applied to RNA profiles

While AI in dermatology is most often associated with using deep learning techniques on clinical and histological images, machine learning methods have been utilized in developing gene expression profile (GEP) classifiers for predicting skin cancer diagnosis and prognosis. Generally, simpler machine learning models that require tuning of fewer parameters compared to more complex neural nets have been employed to analyze GEP. They still, however, share the benefits of the ability to use iterative learning optimized to find patterns in complex non-linear relationships not possible in traditional statistical and linear models, assuming sufficient data is available. Some common models include many Kernel methods such as support vector machines (SVM) or tree-based models, e.g., Random Forest and XGBoost that have often been found to produce the best performance for tabular gene expression data. These models also often use some method to feature select (62) to both maximize performance and find the most relevant features for the classification task. This also allows for a better sense of interpretability as with fewer features there is the ability to assess their relevance individually. Reproducibility is of great concern and has often been the critique of many biomarker and classifier studies, since there is often little to no overlap in targets, which understandably can lead to general skepticism of the results, especially considering the generally small sample sizes employed in many studies. Despite this, there has been a push to make use of molecular profiling to assist in different aspects of melanoma management.

Currently, the GEPs developed for use in melanoma management fall into two categories. First, some GEPs are used as a diagnostic tool to help determine the malignancy of a pigmented lesion either pre- or post-biopsy. Pre-biopsy there is an epidermal tape sampling test that can predict melanoma with 94% sensitivity and 69% specificity (63) with an improved sensitivity of 97%, when TERT mutation assessment is included (64). There are, however, reported limitations to this test as it cannot be used on mucous membranes or acral skin and there is the possibility of non-actionable results due to insufficient sample collected for testing (65). Post-biopsy GEPs can be used to help with diagnostically difficult cases such as Spitz nevi, but have poorer performance on Spitz melanomas and pediatric patients (66). Machine learning has also been applied with success to miRNA profiles to differentiate melanomas from nevi (67).

Second, there are GEPs, derived from biopsy material, that are used as prognostic tools to stratify the risk of melanoma recurrence or metastasis (68), however subsequent management protocols for high risk early-stage disease are not in place (68). Despite optimism for prognostic use of prognostic GEP classifiers, the expert consensus is that there is currently insufficient evidence to support routine use (69). The climate, however, is evolving, with new reports incorporating additional clinicopathological data together with patient outcomes (70). Overall, there remains a lack of consensus on the use of the GEP biopsy and tape sampling tests (71, 72). Further studies are needed, such as non-interventional retrospective studies, followed by prospective interventional trials, but there remains promise that they can become additional tools in providers’ arsenal of available tests.

Barriers to clinical implementation

Image quality

Image quality significantly impacts the prediction performance of AI computer vision (73). Several factors can result in subpar images, including inadequate focus or lighting, color misrepresentations, unfavorable angles or framing, obstructing objects, and poor resolution. Moreover, while humans can readily ignore items such as blurred focus, scale bars, and surgical markings, these artifacts affect AI prediction performance (11, 74, 75).

Obtaining consistently high-quality images in the fast-paced environment of a clinic presents many challenges. Barriers such as limited time, insufficient training, inadequate imaging equipment, and other constraints may hinder the process. Guidelines for skin lesion imaging have been suggested to facilitate the capture of high-quality images (76, 77). These guidelines include suggestions for adequate lighting, background, field of view, image orientation, and color calibration. Additional recommendations are suggested for photographing skin of color (78).

A comprehensive, multifaceted solution is necessary to enhance image quality. Educating dermatology residents in photography might contribute to improving image quality in a clinical setting (79). Moreover, a study done in United Kingdom primary care facilities showed enhanced photo quality when patients were educated with the “4 Key Instructions” (Framing—requesting at least one near and one distant image; Flash—educating about the use of flash to enhance image sharpness, emphasizing not to use it too closely; Focus—educating patients to give the camera time to auto-focus; Scale—asking for a comparison like a ruler or a coin) (73). Among 191 digital applications for skin imaging, 57% included one or more strategies to enhance quality, but it was rare for applications to have more than one (80). An immediate feedback feature for image quality shows promise, although it is still in the early stages of development (81).

Algorithmic bias and health equity

There is a risk for indiscriminately implemented AI to potentially exacerbate health inequities by incorporating pre-existing and newly emerging biases (82) (Table 1). Pre-existing biases include pre-coding biases in datasets used to train the model or personal biases inadvertently introduced by developers. Emergent biases can be introduced by relying on models in new or unexpected contexts and not adjusting models for new knowledge and shifting cultural norms.

Table 1

Table 1. Challenges in AI in dermatology.

Artificial intelligence models for early melanoma detection have relied on large datasets from individuals with mostly lighter pigmented skin. While melanoma is more prevalent among individuals with lighter skin, those with darker skin frequently come in with a more severe stage of disease and experience lower survival rates. An AI model trained on lighter skin tones for melanoma prediction had lower performance for lesions on darker skin tones (83). The International Skin Imaging Collaboration (ISIC) archive, one of the most extensive and widely used databases for individuals in the United States, Europe, and Australia, and a prospective diagnostic accuracy paper comparing an AI model with other noninvasive imaging techniques did not include individuals with Fitzpatrick phototype III or higher (43, 84). Efforts to collect lesions from individuals of all skin tones should be a priority, and transparency in the characteristics of training datasets as well as the quality and range of disease labels should be disclosed (85).

AI model validation

It is crucial to carefully validate AI models before applying them in real-world settings (Table 1). Computational stress testing is necessary to guarantee efficacy in actual clinical scenarios (2). Validation should be performed using large amounts of external data as determining performance solely on internal data has been shown to often lead to overestimation (2, 86). The reason for the lower model performance on external validation datasets can arise from training data that is not representative of the general population or from leakage of additional data, either between the training and testing data or from the future drift of data (86). Unfortunately, most models are not open code, limiting research into the external validation of these models. On the other hand, Han et al. share the use of their models publicly, setting a standard that should be followed (7, 12, 87). Along with publicly shared models, having publicly shared benchmarks such as the melanoma classification benchmark (88) and accessible databases (such as DataDerm) is crucial for comprehensive validation (89). Few public datasets have representation of all skin types. A rigorous testing of outcome metrics with and without the support of an AI model in randomized controlled trials would be optimal.

Though CNNs routinely and autonomously identify image features pertinent for classification, this ability can lead to the incorporation of unintended biases. An example of possible bias is the use of ink markings (75) or scale bars (74) in melanoma identification. It is important to assess whether and how changes to inputted images can affect the prediction output. Changes to test include image quality, rotation, brightness/contrast adjustments, adversarial noise, and the presence of artifacts, such as those aforementioned (2, 10, 74, 75, 90, 91). Testing for robustness given such uncertainties can assist users in understanding the model’s scope and reasons for error (92).

The path to clinical implementation

Given the rapid pace of advancements in AI in the medical field, the American Academy of Dermatology (AAD) issued a position statement regarding how to integrate augmented intelligence into dermatologic clinical settings (93). The AAD underscored the importance of high-quality validated models, open transparency to patients and providers, and efforts to actively engage stakeholders.

For AI to be broadly accepted in dermatology, studies need to demonstrate a significant improvement in health outcomes. The first randomized controlled trial of an AI’s ability to augment clinicians’ diagnostic accuracy on skin lesions highlighted the potential for AI to augment non-dermatologists diagnostic performance in a real-world setting, but not that of dermatology residents in training, and found superior performance by experienced dermatologists—who use patient metadata as well as images—compared to the AI model (44). It also noted that if the model’s top 3 diagnoses were incorrect, trainees’ diagnostic accuracy fell after consulting the AI model, highlighting a pitfall of using current AI models.

Increasing access to dermatological care

AI offers hope for increasing health equity through increasing access, and democratizing skin screenings. Access to dermatologists is a problem, especially in rural areas, where it may take longer for a patient to obtain a biopsy of suspected melanoma (94). As of 2018, 69% of counties in the United States do not have access to dermatologists (95). Further exacerbating the issue, many dermatology clinics closed during the COVID-19 pandemic (96). AI-augmented teledermatology may be able to enhance accessibility by streamlining referrals and reducing waiting times, and it could help increase the accessibility in areas with a scarcity of dermatopathologists. AI may also help dermatologists more accurately diagnose skin disease in patients whose skin is not well-represented in the local population (97).

Human-computer collaboration

Clinicians are indispensable to synthesizing relevant context and offering patient counseling and subsequent care. Furthermore, given the enhanced accuracy of diagnosis when integrating AI into decision-making, the future of dermatology will likely entail human-computer collaboration (98). Embedding Collective Human Intelligence (CoHI) or even swarm intelligence (CoHI with interaction between participating humans) as checkpoints within an AI model may help overcome the limited ability of AI to contextualize and generalize (99).

When interacting with AI, potential cognitive errors and biases may be exacerbated, especially when there is discordance in diagnosis between clinicians and AI (100). The use of AI introduces a new kind of bias called automation bias, in which humans tend to unquestioningly trust automated decisions from AI (100). When physicians used AI decision support for reading chest X-rays, experienced physicians rated diagnostic advice as lower quality when they thought the advice was generated by AI, but not physicians with less experience (101). Though rated as less trustworthy, inaccurate advice by AI still led to decreased diagnostic accuracy (101). It will be important for AI developers and medical educators, the latter when teaching AI applications, to take such human factors into account.

Areas of active research

There are several areas of active computational research that are anticipated to aid in bringing validated image analysis models to clinical use (Table 2).

Table 2

Table 2. Future advances in AI.

Federated learning

A problem with training models for clinical use to detect skin cancer or other disorders is the limitation in sharing clinical images due to privacy concerns and the inherent limitations in collecting sufficient images of rare skin cancer types and disorders and of different skin pigmentation. The current approach for multi-institution model training necessitates the forwarding of patient data to a centralized location, termed collective data sharing (102). Alternatively, federated learning uses a decentralized training system in which a shared global model learns collaboratively while keeping data locally. Each device’s data comes with its own inherent bias and different properties due to demographic variations. Instead of sending data to a central server, the model itself travels to each device, learns from the locally-stored data, and then updates the global model with this newly acquired training. By not sharing the training data across devices, federated learning enables the preservation of privacy of sensitive data (103). In a study across 10 institutions, the performance from federated learning was shown to better than that of a single institution model and shown to be comparable to that of collective data sharing (102). Moreover, the federated learning approach would be a method to virtually aggregate data on rare skin cancers or disorders from different centers, such as Merkel cell cancer, or data from patients with rarer subtypes of skin cancers, such as mucosal or acral melanoma. An analogy of federated learning is a team of dermatologists who visit multiple clinics to learn and share knowledge, rather than asking patients to visit a single central hospital to see the team. A model trained with federated learning can offer more accurate diagnoses on rare skin cancer types and disorders, including lesions found on differing skin pigmentations, and still maintain patient privacy.

Deploying federated learning faces several challenges. Ensuring fairness across different demographic groups and data security while optimizing the overall performance of the global model is computationally complicated. Establishing computational infrastructure capable of seamless communication, such as transmission of a model, may require additional IT assistance. These obstacles pose a barrier to the practical implementation of federated learning (104).

Uncertainty estimation

Whereas many studies on the applications of skin cancer classification models have reported high accuracy, these models rarely concurrently report uncertainty estimates for the predictions and when assessed, models have been found to be overconfident (2). As a result, medical practitioners may hesitate to incorporate these models into their diagnostic workflow. Uncertainty estimation provides a meaningful confidence level, with regards to when to trust a model prediction. To safely deploy a computer-aided diagnostic system in a clinical setting, it is crucial to incorporate not only a model’s prediction but also a confidence score. Clinicians are then equipped to decide whether to trust the prediction or alternatively disregard the AI prediction and rely on provider assessment (94).

Multimodal learning

Most skin disease diagnosis models are trained only on one data modality: clinical or histological images or RNA sequence data. However, medical data is inherently multimodal by nature, and dermatologists use patient information in addition to clinical images to make a diagnosis. Metadata from patients, such as age, ethnicity, and anatomic location of lesion, can also be useful to enhance skin cancer classification models. Multimodal learning is a technique where a single model learns from multiple types of data simultaneously (105). One skin disease classifier that integrated up to six clinical images and 45 demographic items and medical history to classify 26 skin conditions as the primary prediction outperformed six primary care physicians and six nurse practitioners (4). Another study showed that a model integrating dermatoscopic and macroscopic images with three patient metadata variables outperformed models with just one image modality for binary and multiclass classification setting (106, 107).

Incremental learning

Current skin disease diagnostic models are static, wherein data distribution is already known and the target skin diseases are pre-set. However, in the clinical setting, as the database size grows over time, with the accumulation of new images, a shift in data distribution can occur, for example after the inclusion of new skin disease classes, or with improved or new devices. Changes or differences in image acquisition tools, such as mobile phone cameras, also can shift dataset distribution by changing the quality of images captured. This results in the need to adapt models to new images while not degrading model performance on the pre-existing data. Incremental learning enables a model to continue learning the attributions of new data while preserving learned features from the data acquired before; successful incremental learning strategies on dermatology images have been recently reported (97, 108, 109).

Generative adversarial networks modeling

The ability to synthesize new data that closely resembles real skin lesion images can augment training on rare skin diseases and create a diverse and balanced dataset (110). While the potential to fill the data gaps is promising, models’ performance does not show significant improvement when trained on synthesized data (111). The stylized images should be used cautiously, so as to not degrade the quality or reliability of the dataset and model by adding unintentional bias, and also ensure alignment with real-world conditions for clinical application (111, 112).

Emerging new model architectures—vision transformers

Vision transformer has emerged as an advanced model architecture, challenging the traditional dominance of convolutional neural networks (CNNs). CNNs have been the default choice for in both medical imaging and natural image tasks (113, 114). However, inspired by the success of Transformer in natural language processing (NLP), researchers have increasingly utilized ViTs or hybrid models of CNN and ViT and demonstrated promising results across various medical imaging tasks (115, 116). Concurrently, a resurgence of CNN is occurring with advanced CNN architectures such as ConveNeXt, showcasing competitive performance alongside Transformers in natural image task (117). These ongoing explorations and adaptations of ViTs address the challenges and uncertainties in deciding on model architecture.

Applications of large language model

Large language model is a type of natural language processing model that is trained to “understand” and generate human-like text, and has potential applications in enhancing clinical decision-making and overall patient care. For example, ChatGPT-style LLMs designed only for clinical diagnosis can accelerate clinical diagnoses by helping patients better understand their medical conditions and communicate with doctors remotely (118). Another application of LLM in clinic is AI-enabled digital scribes that can record and summarize patients visit information for treatment plans and billing purposes, eliminating the workload due to medical charting (119, 120). While there are positive aspects of LLM utilization for clinical care, there are also concerns such as the need for continued oversight of such models. It is essential to recognize that LLMs and doctors can complement each other, with LLM providing efficiency in processing large amounts of information while doctors offer interpretation of the data, emotional intelligence and compassion to patients, thus improving patient care (121). However, caution should be used when utilizing LLM for medical advice. A recent study demonstrated that 4 LLM provided erroneous race-based responses to queries designed to detect race-based medical misapprehensions (122). To address this, testing of LLMs is critical before clinical implementation, and human feedback can help to correct errors.

Self-supervised learning

Self-supervised learning offers a promising approach to enhance the robustness and generalizability of models by enabling them to learn meaningful representations from unlabeled data. Traditionally, the efficacy of training deep learning models has relied on access to large-scale labeled datasets (123). However, in the medical field, acquiring such data is costly and requires specialized expertise. As a result, the scarcity of annotated data poses a significant obstacle to the development of robust models for various clinical settings. SSL addresses this challenge by developing a versatile model capable of efficiently adapting to new data distributions with a reduced number of labeled data during fine-tuning, while ensuring strong performance (124). Thus, SSL is a promising method to bridge the gap between AI research in the medical field and its clinical implementation.

Conclusion

Artificial intelligence currently is able to augment non-dermatologists’ performance in a synergistic fashion and performs at the level of experienced dermatologists in a randomized controlled trial assessing skin malignancies. This achievement opens the door to aiding primary care physicians’ discriminative triaging of patients to dermatologists and likely will decrease referrals for benign lesions, thereby freeing up dermatology practices to address true malignancies in a timely manner. Similarly, the potential for patients to self-refer for lesions concerning for malignancy may be possible in the near future, with models that can assess regional anatomic sites for lesions with concerning features. Through the implementation of AI, access to dermatologic care may become more democratic and accessible to the general population, including underserved subpopulations.

Limitations in performance include misdiagnosis by the model when assessing out of distribution diagnoses, leading clinicians astray; a solution might be for models to provide confidence estimates together with diagnostic predictions. A formidable problem in training models is the large number of diagnoses in dermatology, including numerous low incidence but aggressive malignancies (such a Merkel cell carcinoma, microcytic adnexal carcinoma, dermatofibrosarcoma tuberans, and angiosarcoma), or low incidence chronic malignancies such as cutaneous T cell lymphoma with potential for aggressive progression; one solution is federated training through the collaboration of multiple academic centers, some of which have specialty clinics focused on these diagnoses; or the formation of a central shared databank. In the future, models likely will be utilized to aid experienced dermatologists and dermatopathologists, as well as primary care providers and patients, particularly after training on multimodal datasets.

Author contributions

MW: Writing – review & editing, Writing – original draft, Supervision, Resources, Conceptualization. MT: Writing – review & editing, Writing – original draft. RT: Writing – review & editing, Writing – original draft, Visualization. AS: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Funding provided by Department of Defense grant W81XWH2110982 (MW, RT) and Department of Veterans Affairs grant 1I01HX003473 (MW, MT).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Esteva, A, Kuprel, B, Novoa, RA, Ko, J, Swetter, SM, Blau, HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. (2017) 542:115–8. doi: 10.1038/nature21056