ORIGINAL RESEARCH article

Front. Med., 06 October 2023
Sec. Dermatology
This article is part of the Research Topic: Artificial Intelligence in Cutaneous Lesions: Where do we Stand and What is Next?

Effectiveness of an image analyzing AI-based Digital Health Technology to identify Non-Melanoma Skin Cancer and other skin lesions: results of the DERM-003 study

Helen Marsden1*, Caroline Morgan2, Stephanie Austin2, Claudia DeGiovanni3, Marcello Venzi1, Polychronis Kemos1, Jack Greenhalgh1, Dan Mullarkey1, Ioulios Palamaras4
  • 1Skin Analytics Ltd., London, United Kingdom
  • 2Dermatology Unit, University Hospitals Dorset, Poole Hospital, Poole, United Kingdom
  • 3Dermatology Unit, University Hospitals Sussex NHS Foundation Trust, Brighton, United Kingdom
  • 4Department of Dermatology, Barnet and Chase Farm Hospitals, Royal Free London NHS Foundation Trust, London, United Kingdom

Introduction: Identification of skin cancer by an Artificial Intelligence (AI)-based Digital Health Technology could help improve the triage and management of suspicious skin lesions.

Methods: The DERM-003 study (NCT04116983) was a prospective, multi-center, single-arm, masked study that aimed to demonstrate the effectiveness of an AI as a Medical Device (AIaMD) to identify Squamous Cell Carcinoma (SCC), Basal Cell Carcinoma (BCC), pre-malignant and benign lesions from dermoscopic images of suspicious skin lesions. Suspicious skin lesions that were suitable for photography were photographed with three smartphone cameras (iPhone 6S, iPhone 11, Samsung Galaxy S10) with a DL1 dermoscopic lens attachment. Dermatologists provided clinical diagnoses and histopathology results were obtained for biopsied lesions. Each image was assessed by the AIaMD and the output compared to the ground truth diagnosis.

Results: 572 patients (49.5% female, mean age 68.5 years, 96.9% Fitzpatrick skin types I-III) were recruited from 4 UK NHS Trusts, providing images of 611 suspicious lesions. 395 (64.6%) lesions were biopsied; 47 (11%) were diagnosed as SCC and 184 (44%) as BCC. The AIaMD AUROC on images taken by iPhone 6S was 0.88 (95% CI: 0.83–0.93) for SCC and 0.87 (95% CI: 0.84–0.91) for BCC. For the Samsung S10 the AUROCs were 0.85 (95% CI: 0.79–0.90) and 0.87 (95% CI: 0.83–0.90), and for the iPhone 11 they were 0.88 (95% CI: 0.84–0.93) and 0.89 (95% CI: 0.86–0.92) for SCC and BCC, respectively. Using pre-determined diagnostic thresholds on images taken on the iPhone 6S, the AIaMD achieved a sensitivity and specificity of 98% (95% CI: 88–100%) and 38% (95% CI: 33–44%) for SCC; and 94% (95% CI: 90–97%) and 28% (95% CI: 21–35%) for BCC. All 16 lesions diagnosed as melanoma in the study were correctly classified by the AIaMD.

Discussion: The AIaMD has the potential to support the timely diagnosis of malignant and premalignant skin lesions.

1. Introduction

Non-Melanoma Skin Cancer (NMSC) is the fifth most common form of all types of cancer worldwide, with the most common NMSC types being Basal Cell Carcinoma (BCC), accounting for 75% of cases, and Squamous Cell Carcinoma (SCC), accounting for 23% of NMSC cases (1). In the UK, there are around 156,000 NMSC cases diagnosed, resulting in 920 deaths, per annum. The actual incidence of NMSC may be higher, however, as it is known to be under-reported due to the number of multiple diagnoses per patient. Incidence rates of skin cancer have increased more than 2.5-fold (169%) since the early 1990s and are projected to rise by 14% in the UK between 2023 and 2025 (2). While NMSCs make up most skin cancer diagnoses, melanoma has a much higher mortality rate due to its high risk of metastasis, and early diagnosis is critical. When melanoma is caught early, the chances of survival are greatly improved (3).

Currently, diagnosis of NMSC is usually clinical, with subsequent histological confirmation following excision and specialist interpretation (4). To facilitate early diagnosis, alongside managing patient concern, a high proportion of ‘suspicious moles’ are referred from primary care on the two-week wait pathway, which has seen an increase from 332,000 referrals in 2015/16 to 509,000 referrals in 2019/20 (5). However, a high proportion of these lesions are benign (6), with the main diagnoses being melanocytic naevi or seborrheic keratosis. Due to the nature of these referrals, they are awarded an inappropriate priority at the expense of more serious disorders. As a result, healthcare services are under pressure from the number of patients being referred for specialist evaluation, onward biopsies and subsequent management of suspicious skin lesions, such that a decreasing percentage of patients referred on a two-week wait pathway are seen within 14 days (5). There is a need to improve diagnostic accuracy of skin lesions earlier on in this process, in order to minimize unnecessary referrals and skin biopsies.

Deep Ensemble for the Recognition of Malignancy (DERM) is a Digital Health Technology that includes an Artificial Intelligence as a Medical Device (AIaMD) algorithm that is able to analyze dermoscopic images of a skin lesion and determine the presence of melanoma in pigmented lesions, with a similar accuracy to clinicians specialized in skin cancer detection (7). The AIaMD has been trained and tested on dermoscopic images of skin lesions with confirmed diagnoses of a range of malignant and non-malignant lesions and sub-types. This helps ensure that, for example, melanoma lesions with different clinical appearance like amelanotic melanoma (8), would be classified as melanoma. However, the AIaMD would not be expected to identify skin cancer from different image types, such as that from reflectance confocal microscopy. The AIaMD is also able to detect BCC and SCC, premalignant and selected benign lesions [such as Intraepidermal Carcinoma (IEC/SCC in situ), actinic keratosis, seborrheic keratosis, and benign melanocytic nevi] providing additional information to aid the clinician in differentiating skin cancers, including melanoma, from benign conditions. The AIaMD provides a high degree of accuracy in the diagnosis of NMSC using historical dermoscopic images, but clinical validation is necessary to demonstrate its utility in clinical practice. DERM is a Class IIa UKCA marked medical device and has been deployed in clinical pathways within the UK since 2020.

2. Materials and methods

The DERM-003 study was a prospective, multi-center, single-arm, cross-sectional, blinded study (NCT04116983), designed to demonstrate the effectiveness of the AIaMD to identify SCC and BCC. Secondary objectives included demonstrating the effectiveness of the AIaMD to identify premalignant and benign conditions, comparing the AIaMD performance to dermatologists, and demonstrating the feasibility of image capture in a clinic setting. Ethical approval for the study was granted by the Leicester South National Research Ethics committee.

Eligible participants were patients attending dermatology clinics with at least one suspicious skin lesion that was suitable for photographing. Lesions were defined as suspicious by a dermatologist, with no requirement on lesions being of a particular type or pigmentation. Patients provided written informed consent for the study. Recruitment was on a consecutive, competitive recruitment basis in 4 UK hospitals between June 2020 and February 2022. Lesions needed to be less than 15 mm in diameter, not located on an anatomical site unsuitable for photographing (genitals, hair-bearing areas, under nails) or in an area of visible scarring or tattooing, and not previously biopsied, excised or otherwise traumatized. Suitable lesions were photographed by three smartphones (iPhone 6S and iPhone 11, Apple Inc.; Samsung Galaxy S10) with (dermoscopic image) or without (macroscopic image) a Dermlite DL1 Basic (DermLite LLC) lens attached, providing a 10x magnification. In addition, one dermoscopic image of healthy skin was also taken by each camera. The AIaMD assessment was not shared with the investigator, who managed the patient in accordance with standard of care. The patient had completed the protocol-defined procedures once the photographs had been taken. For each lesion included in the study, a clinical diagnosis and the clinician’s assessment of the likelihood of skin cancer, using a four-point Likert scale (unlikely, equivocal, likely, highly likely), were collected. Where a biopsy was taken, the histopathology-confirmed diagnosis was collected and categorized as melanoma, SCC, BCC, IEC, Actinic Keratosis (AK), Atypical, Benign or other. When there was histopathological uncertainty in the diagnosis, investigators reported the most likely diagnosis. ‘Other’ diagnoses were reviewed by the Chief Investigator.

Images of skin lesions were captured electronically and securely transferred to DERM for analysis by the AIaMD. All images were analyzed by DERM v3 after the completion of the study. The AIaMD generates a numeric output (continuous scale) for each of the examined classes, which reflects its confidence that the lesion is that condition. The sum of the numeric output of all classes is always 1. Threshold settings are defined for each lesion type, above which a lesion is classified as that lesion type. The AIaMD returns the most serious lesion type where the confidence score is above the threshold setting.
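The labeling rule described above can be illustrated with a minimal sketch. The class list, the severity order beyond melanoma ranking above SCC and BCC, the threshold values, and the behavior when no class clears its threshold are all assumptions made for illustration; the study does not publish them, and this is not the DERM implementation.

```python
# Illustrative only: the class list, severity order and threshold values below
# are hypothetical and are not taken from DERM.
SEVERITY_ORDER = ["melanoma", "SCC", "BCC", "IEC", "AK", "atypical", "benign"]
THRESHOLDS = {
    "melanoma": 0.08, "SCC": 0.10, "BCC": 0.15,
    "IEC": 0.20, "AK": 0.20, "atypical": 0.20, "benign": 0.50,
}


def classify(scores, thresholds=THRESHOLDS, order=SEVERITY_ORDER):
    """Return the most serious lesion type whose score exceeds its threshold."""
    # The per-class outputs are described as always summing to 1.
    if abs(sum(scores.values()) - 1.0) > 1e-6:
        raise ValueError("class scores must sum to 1")
    for lesion_type in order:  # walk from most to least serious
        if scores.get(lesion_type, 0.0) > thresholds[lesion_type]:
            return lesion_type
    return None  # assumption: no label if nothing clears its threshold


# BCC has the highest score, but melanoma clears its (lower) threshold and
# ranks higher in the hierarchy, so melanoma is the returned label.
example = {"melanoma": 0.10, "SCC": 0.05, "BCC": 0.60, "IEC": 0.05,
           "AK": 0.05, "atypical": 0.05, "benign": 0.10}
print(classify(example))  # melanoma
```

Under this kind of rule the returned label is not necessarily the class with the highest confidence score, which is why sensitivity for serious lesion types can be high while specificity is comparatively low.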

2.1. Statistical aspects

Patients and lesions that did not meet the inclusion criteria were excluded from the Intention To Treat population (ITT), as were those lesions without a final diagnosis available. Lesions with no AIaMD result available (missing dermoscopic images, and/or where these failed the DERM v3 image quality assessment) were excluded from the Per Protocol (PP) population. The primary analyses were conducted on biopsied lesions in the PP population only.

Area Under the Receiver Operator Characteristic (AUROC) curves were used to examine the association of the algorithm’s confidence scale with the histopathology-confirmed diagnosis (biopsied lesions) or clinical diagnosis (non-biopsied lesions). The co-primary outcome measures of the study were the one-against-all AUROC for both SCC and BCC. The iPhone 6S camera was used as the reference device. The study aimed to demonstrate both co-primary endpoints were above 0.9.
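As an illustration of a one-against-all AUROC, the sketch below uses the rank-based (Mann-Whitney) formulation: the AUROC equals the probability that a randomly chosen lesion of the target type receives a higher confidence score than a randomly chosen lesion of any other type, with ties counted as one half. The function name and the toy data are hypothetical; the study's analysis was run in R and its exact routine is not described.

```python
def one_vs_all_auroc(scores, labels, target):
    """AUROC for `target` against all other diagnoses (rank-based form)."""
    pos = [s for s, lab in zip(scores, labels) if lab == target]
    neg = [s for s, lab in zip(scores, labels) if lab != target]
    if not pos or not neg:
        raise ValueError("need at least one target and one non-target lesion")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # target lesion scored higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: AIaMD confidence for SCC on five lesions and their diagnoses.
scc_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
diagnoses = ["SCC", "SCC", "BCC", "benign", "benign"]
auroc = one_vs_all_auroc(scc_scores, diagnoses, "SCC")
print(round(auroc, 3))  # 0.833
```

The pairwise loop is quadratic in the number of lesions, which is acceptable for a study of a few hundred lesions; a sorting-based computation would be preferable for larger datasets.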

Assuming the true AUROC curve of the AIaMD is 0.98 and an incidence rate of 11% for SCC and 43% for BCC, a sample size of 45 SCC and 50 BCC lesions was required to demonstrate the AUROCs were superior to 0.9 at alpha = 0.05, with 90% power. A sample size of 543 patients, with an average of 1.2 lesions per patient, was expected to provide sufficient numbers of lesions diagnosed as SCC and BCC, but recruitment remained open until 45 SCC lesions had been included in the study.

Diagnostic accuracy indices (sensitivity, specificity, predictive values, false-positive rates, and false-negative rates) were calculated using decision thresholds determined prior to the image analysis, and applying the hierarchy within the AIaMD. The hierarchy means that, if the AIaMD identifies a lesion as potentially either a BCC or melanoma, it will return the classification of melanoma. Therefore, for a lesion diagnosed as SCC, an output from the AIaMD of “suspected melanoma” is considered a true positive, whereas for a lesion diagnosed as melanoma, an output from the AIaMD of “suspected SCC” is a false negative. The definition of true positive will therefore vary depending on the lesion type being assessed. The likelihood assessment scale was used to calculate a clinician AUROC that could be compared to the AIaMD.
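The hierarchy-dependent definition of a true positive can be sketched as follows. The severity order shown is a shortened, hypothetical list, and the treatment of specificity (any output at or above the target's severity counts as a positive call for that target, even when the lesion is a different type) is an assumption; the text above only specifies how true positives and false negatives are assigned.

```python
# Hypothetical, shortened severity order (most serious first).
SEVERITY_ORDER = ["melanoma", "SCC", "BCC", "benign"]


def is_positive_for(target, output, order=SEVERITY_ORDER):
    """An output counts as positive for `target` if it names that lesion
    type or any more serious one."""
    if output is None:
        return False
    return order.index(output) <= order.index(target)


def sensitivity_specificity(truth, outputs, target):
    """Sensitivity and specificity for one lesion type under the hierarchy."""
    tp = fn = tn = fp = 0
    for diagnosis, output in zip(truth, outputs):
        positive = is_positive_for(target, output)
        if diagnosis == target:
            if positive:
                tp += 1
            else:
                fn += 1
        else:
            if positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


truth = ["SCC", "SCC", "melanoma", "benign", "BCC"]
outputs = ["melanoma", "BCC", "SCC", "benign", "SCC"]
# An SCC labeled "melanoma" is a true positive for SCC, whereas a melanoma
# labeled "SCC" is a false negative for melanoma.
scc_sens, scc_spec = sensitivity_specificity(truth, outputs, "SCC")
mel_sens, mel_spec = sensitivity_specificity(truth, outputs, "melanoma")
```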

The influence of patient and lesion variables that may affect the AIaMD’s accuracy were investigated. The following co-variates were examined: age, sex, Fitzpatrick skin type, skin cancer risk factors including past medical history of skin cancer, lesion body location, experience of reviewing clinician, lesion change, patient’s level of concern, clinician’s assessment of likelihood of skin cancer, malignancy sub-type and staging.

A p-value of <0.05 was regarded as statistically significant, and all tests were two-tailed. Statistical estimates of accuracy are reported with 95% Confidence Intervals (CIs). Statistical analysis was conducted using R language version 4.1.3 (The R Project for Statistical Computing).

3. Results

A total of 572 patients consented to the study, providing 611 suspicious lesions. Nine patients (6 lesions) were withdrawn / excluded from the study. Eighteen lesions were excluded from the ITT population due to failing to meet eligibility criteria, resulting in 18 patients being excluded due to no eligible lesions. Two further lesions were excluded from the PP population due to missing AIaMD results, resulting in 1 further patient being excluded from the PP population (Figure 1). Of the lesions included in the PP population, 96.7% had images available from all three combinations of hardware, 2.9% had 2 images available, and 2 lesions had just one image available. Nine images failed image quality checks.

Figure 1. Consort diagram. Number of patients in the ITT/PP population = number of patients who have at least one lesion that fulfills the ITT/PP inclusion criteria for at least one capture device; Number of lesions in the ITT/PP datasets = number of lesions from patients included in the ITT/PP population, that fulfill the ITT/PP inclusion criteria for at least one capture device.

The PP population was equally distributed between females and males, mostly White (94%) and ranged in age from 18 to 97 years (median 73). Most patients (97.8%) had Fitzpatrick skin type I-III, with over half (56.8%) the patients reporting having Fitzpatrick skin type II (Table 1). Most lesions were located on the face and scalp (46.3%), posterior chest and back (14.5%), arms (13.5%), and legs (12.3%). On average, lesions were 8.9 (±3.5 standard deviation) mm in size, ranging from 0.8 to 15 mm (Table 2).

Table 1. Patient demographics by analysis population.

Table 2. Lesion characteristics by analysis population.

Forty-three lesions in the PP population were diagnosed as SCC and 176 as BCC (Table 3) by histopathology. A further 22 lesions were diagnosed as SCC or BCC by clinical diagnosis only and were excluded from the primary analysis. These lesions did not undergo a biopsy because the dermatologist chose to treat the lesion (n = 10), the patient refused biopsy (n = 3), or for another reason (n = 9), including the biopsy occurring outside the study window. The PP population also included 16 lesions diagnosed as melanoma, and two lesions diagnosed as other malignancies [one Neuroendocrine, and one Spitzoid tumor of uncertain malignant potential (STUMP)] (Supplementary Table 1). Most malignancies were at an early stage.

Table 3. Breakdown of lesion diagnoses in the PP population.

The AUROCs for SCC and BCC produced on images of biopsied lesions captured on each camera were: iPhone 6S 88.5% (95% CI: 83.9–93.1%) and 89.6% (95% CI: 86.5–92.7%) respectively; iPhone 11 88.9% (95% CI: 83.8–94.0%) and 89.5% (95% CI: 86.4–92.6%) respectively; and Samsung S10 84.9% (95% CI: 79.1–90.7%) and 87.2% (95% CI: 83.8–90.7%) respectively (Figure 2 and Table 4). The AUROCs for BCC and SCC, when calculated on all lesions, were > 90% except for SCC in images captured on the Samsung S10 camera, where the AUROC was 87% (Figure 3). The AUROC for benign lesions produced by the AIaMD ranged from 74.9% to 76.8% when assessing biopsied lesions only, and from 79.8% to 80.9% when all lesions were assessed. The AUROC for melanoma was ≥91.8% for all cameras, both for biopsied lesions and for all lesions. Moderate concordance (72.9% agreement) was found between the AIaMD output labels using images from the two iPhones; the agreement of the AIaMD output label was 60.3% between the iPhone 6S and Samsung S10, and 61.7% between the iPhone 11 and Samsung S10.

Figure 2. ROC curves for SCC (left) and BCC (right) produced by the AIaMD when assessing images of biopsied lesions, taken by different cameras.

Table 4. AUROCs produced by DERM, using images taken on each camera.

Figure 3. ROC curves for SCC (left) and BCC (right) produced by the AIaMD when assessing images of all lesions, taken by different cameras.

The AUROC for SCC and BCC produced by clinicians were 74.0% (95% CI: 66.4–81.6%) and 85.6% (95% CI: 81.8–89.3%) for biopsied lesions, and 76.9% (95% CI: 69.6–84.3%) and 90.0% (95% CI: 87.3–92.7%) for all lesions, respectively (Table 5). The AUROCs for SCC lesions were significantly lower than those produced by the AIaMD (p < 0.026 for each camera). The clinician AUROCs were also significantly lower than those produced by the AIaMD (p ≤ 0.04) for lesions diagnosed as IEC, AK and benign by histopathology. A weak to moderate level of agreement between clinical and histopathology diagnosis labels was found (percentage agreement 66.4%; Cohen’s kappa = 0.52, p < 0.001).
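For readers less familiar with the two agreement statistics quoted above, the following sketch computes percentage agreement and Cohen's kappa for two sets of diagnosis labels. The label lists are toy data, not study data, and the function names are illustrative.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of lesions given the same label by both raters."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k]
                   for k in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


clinical = ["BCC", "BCC", "SCC", "benign", "benign", "benign"]
histology = ["BCC", "SCC", "SCC", "benign", "benign", "BCC"]
agreement = percent_agreement(clinical, histology)
kappa = cohens_kappa(clinical, histology)
print(round(agreement, 3), round(kappa, 3))  # 0.667 0.5
```

Kappa is lower than raw agreement because some matches would be expected by chance alone, which is why a 66.4% agreement can correspond to a kappa of only 0.52.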

Table 5. AUROC of clinician assessment of likelihood of skin cancer.

When pre-set threshold settings were applied, the sensitivity of the AIaMD to identify malignant lesions was above 90%, and the specificity of the AIaMD for malignant lesions was above 41.5% for each individual malignant lesion type and for all malignant lesions (Table 6). Both “other malignant” lesions were classified as malignant by the AIaMD using images from all cameras. The sensitivity and specificity of the AIaMD was more variable for other lesion types, particularly atypical lesions where the sensitivity varied between 38.1% for the Samsung and 86.4% for the iPhone 6S. In comparison, when considering the suspected diagnosis documented by the clinician at the time of their assessment, they labeled fewer melanoma and SCC lesions accurately compared to the AIaMD (melanoma sensitivity of 81.2% compared to >93% by the AIaMD, SCC sensitivity of 63.6% compared to >90%), and more BCC lesions (sensitivity of 97.5% compared to <96%). Conversely, clinicians achieved a much higher specificity for malignant lesions and were more accurate at identifying benign lesions than the AIaMD.

Table 6. Diagnostic performance metrics of clinicians and DERM, using images from each camera, for all lesions in the Per Protocol population.

Univariate analyses and multiple logistic regression analyses were performed on the FA population, filtered for those images with a final diagnosis available, to identify patient and lesion characteristics that might have influenced the accuracy of the AIaMD results and clinical diagnosis. Age above 60 was associated with a non-significant reduction in the accuracy of both dermatologists and the AIaMD to identify malignant lesions in images from the iPhones (Odds Ratio (OR) = 0.37–0.88, p > 0.16) and a minor improvement in images from the Samsung S10 (OR = 1.07–1.18, p > 0.7). The impact only reached significance (p = 0.034) for the AIaMD with images from the iPhone 11, in patients aged 74–82. Fitzpatrick skin type had no significant impact on the ability of either the AIaMD or clinicians to accurately identify malignant lesions; however, no cancers were detected in patients with Fitzpatrick skin types V and VI. Indeed, the only factor associated with a significant improvement in the accuracy of dermatologists to identify malignant lesions was a likely or high likelihood of skin cancer (OR > 7, p < 0.018), and for the AIaMD it was a high level of patient concern (OR = 1.95, p = 0.008).

4. Discussion

The DERM-003 study is the first prospective, powered, clinical validation study that specifically evaluates the ability of the AIaMD to identify NMSC. Previously, the performance of the AIaMD to identify melanoma was evaluated (7), though this was on an earlier version of the software which focused solely on the identification of melanoma. DERM v3 is designed to identify SCC and BCC, alongside melanoma, as well as a range of premalignant, atypical and benign lesions often mistaken for skin cancer. The study recruited patients in dermatology clinics across the UK, such that the population reflects the aging, primarily Caucasian, population seen in these clinics. Although patients with Fitzpatrick skin types V and VI were recruited, no skin cancers were diagnosed in these patients. Indeed, only 2.2% of the study population had Fitzpatrick skin type IV-VI, limiting the generalizability of these results for patients with darker skin tones. However, this reflects the trend seen in other clinical studies, and in the real world, where few patients with Fitzpatrick skin types IV-VI are seen in dermatology clinics with suspicious skin lesions (7, 9) and as such the study population can be seen as representative of the population that DERM would be used on. Robust performance evaluation of technologies, such as DERM, in patients with darker skin types may only be possible through post-market surveillance analyses, where more patients with these skin types can be evaluated (10). Similarly, the study included lesions across a good distribution of body locations, including those with higher sun exposure (head, neck, upper body) and lower limbs, where lesions can look different, and a range of skin cancer sub-types and stages that are seen in dermatology clinics. The study also included two “other malignant” lesions, which were diagnosed as STUMP and neuroendocrine, and a range of benign lesions.

When the study was designed, the calculations used to determine the success criteria and sample size were based on in silico performance data, which provided an assumption that the true AUROC for both SCC and BCC was 98%. The clinical performance of AI-based devices has frequently been shown to be lower than that of laboratory-based data (11–13), and as such an expectation that the true AUROC achieved by the AIaMD on fresh clinical data would be comparable to laboratory results was perhaps unrealistic. Although the study failed to meet either of the co-primary endpoints, the AUROCs achieved by the AIaMD for SCC and BCC were still high and at least comparable to those of dermatologists. Indeed, the AUROCs of the clinical diagnosis for SCC and BCC lesions do not reach 90% either, indicating that even between clinician and histology there is considerable diagnostic variability. This may be a reflection of clinical practice, where uncertainty of diagnosis drives a conservative view and decision to biopsy. Reassuringly, the AUROC produced by the AIaMD for melanoma was higher than that previously reported (7), demonstrating an improved performance of the AIaMD over the earlier version of the algorithm.

It should be noted that for non-biopsied lesions, the clinical diagnosis was used as the ground truth against which both the AIaMD and clinical diagnosis were compared. Clinical diagnosis therefore will appear more accurate in an all-lesion population, compared to a biopsy-only population, for those lesions where a high proportion do not have a histopathology diagnosis, specifically BCC, AK, and benign lesions. Despite this, the AUROCs achieved by the AIaMD for non-malignant lesions are comparable to those achieved by dermatologists in an all-lesion population, and indeed are notably higher than dermatologists in a biopsy only population.

The study assessed the performance of the AIaMD on images captured by three smartphone cameras available in the UK market at the time of the study. They were chosen to demonstrate performance of the AIaMD across different physical hardware devices (camera specification), operating systems, and price points and included a reference combination (iPhone 6S/DL1) which Skin Analytics has used in a previous study (7). Across the three cameras, the AUROCs for melanoma, SCC and BCC were very similar, indicating a good generalizability of the algorithm across the image capture hardware used. Although a greater variability across the cameras is seen for non-malignant lesions, the AUROCs achieved by the AIaMD from all cameras are still high.

The thresholds used to determine the sensitivity and specificity of the AIaMD were defined to be suitable for use in a secondary care setting at the beginning of the study. The sensitivities achieved by the AIaMD for melanoma, SCC and all malignant lesions were higher than those achieved by clinical diagnosis alone, though clinicians referred these lesions for biopsy, so their management decision ensured a sensitivity of 100%. Even for BCC, sensitivity achieved by the AIaMD was around 95% using images from all cameras, and the sensitivity and specificity of the AIaMD to identify premalignant and atypical lesions are at a level that is clinically useful. Additionally, the specificity and negative predictive value (NPV) for malignant lesions indicate that the AIaMD could aid the appropriate management of benign lesions. The threshold settings used in live deployments of the AIaMD differ from those used in this study, and the sensitivity across all malignant lesions achieved in the real world has been demonstrated to be even higher (10), demonstrating the value in optimizing the settings within the AIaMD for the population it is being used to assess. The sensitivities achieved by the AIaMD for non-malignant lesions are more variable across the cameras than seen for malignant lesions, specifically atypical and benign lesions. Similarly, there was only a moderate concordance between the outputs produced by the AIaMD when analyzing images captured by the different image capture hardware. This may be due to variances in the hardware and post-processing software, or a factor of the threshold settings used by the AIaMD to assign the output label. If the confidence scores produced by the AIaMD on images of the same lesion taken on two different cameras were similar, but fell either side of the threshold set, the AIaMD output label from each image could be different. Since the AUROCs for these lesions were similar, this suggests that the thresholds applied could be optimized for the image capture hardware being used, to achieve the best sensitivity.
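The threshold effect described above can be made concrete with a small sketch. The scores and the threshold are invented for illustration and do not come from the study; the point is only that paired scores can be numerically close yet yield different output labels.

```python
def straddles(score_a, score_b, threshold):
    """True when two scores for the same lesion fall on opposite sides of
    the decision threshold, so the assigned labels would differ."""
    return (score_a > threshold) != (score_b > threshold)


# Hypothetical confidence scores for one lesion type on four lesions,
# each imaged with two different cameras: (camera A, camera B).
paired_scores = [(0.14, 0.16), (0.40, 0.45), (0.05, 0.07), (0.152, 0.149)]
threshold = 0.15  # hypothetical decision threshold

flipped = [pair for pair in paired_scores if straddles(*pair, threshold)]
label_agreement = 1 - len(flipped) / len(paired_scores)
print(flipped)          # [(0.14, 0.16), (0.152, 0.149)]
print(label_agreement)  # 0.5
```

In this toy example every pair differs by at most 0.05, yet half of the lesions receive different labels, which is consistent with similar AUROCs coexisting with only moderate label concordance.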

The multivariate analysis identified a different impact of patient factors on the accuracy of malignant lesion detection by the AIaMD compared with previously reported analyses (7). This may reflect a change in how the AIaMD works between the two versions assessed. However, since the impact of patient factors on the accuracy of dermatologists is also different, it may be more a reflection that the previous study focused on melanoma detection, whereas this analysis considered all malignant lesions included in the study population. Further analyses are needed to understand whether these translate into a clinically relevant reduction in sensitivity and/or specificity of the AIaMD in different patient groups.

The main limitation to the DERM-003 study is the clinical setting in which it was conducted, and therefore the population studied. The study was conducted in UK secondary care dermatology clinics in order to include sufficient numbers of SCC and BCC lesions in the study population, and to easily capture the histopathology confirmed diagnosis of biopsied lesions and a dermatologist’s clinical assessment of the lesion. This means the study population was made up of patients and lesions that dermatologists determined were suitable for inclusion in the study, which may not be representative of all patients and lesions that would be assessed by DERM. For example, lesions that were clearly benign may have been excluded by a study dermatologist, but on which a less experienced clinician may use DERM to support their patient management decision. That said, the study recruited a broader spectrum of lesions in the study population compared to a previous study (7), where the study population was limited to patients with a pigmented lesion that was due for biopsy. The results of this study are therefore more generalizable to the population of patients seen in secondary care in the UK. Indeed, data from ongoing post-market surveillance monitoring indicates that DERM can be deployed safely as an adjuvant tool in live clinical services accessible to patients with eligible skin lesions (i.e., excluding those under nails, on genitalia or on hairy areas of skin), from a broad range of age groups and most representative skin types with suspicious skin lesions, with sensitivity and specificity in-line with target thresholds and performance demonstrated in clinical studies (10).

Finally, the reliance on clinical diagnosis as the ground truth for non-biopsied lesions not only artificially increases the performance metrics for the dermatologists, as discussed above, but potentially impacts the apparent performance of the AIaMD on non-biopsied lesions. The clinical diagnosis of skin cancer by clinicians is based on the subjective interpretation of morphological features and as such variability in the clinical diagnoses given by dermatologists is known to exist (14). The reliance on one dermatologist to provide the clinical diagnosis used as the ground truth for non-biopsied lesions introduces a potential bias to the results for both the AIaMD and dermatologists. The use of a panel of dermatologists to provide a consensus diagnosis would have provided a greater confidence in the clinical diagnosis ground truth, and provided an independent diagnosis against which to compare the investigating dermatologist.

In conclusion, even though the study failed to meet its co-primary endpoints, the results from the DERM-003 study showed that the AIaMD can detect NMSC and premalignant lesions with a similar level of accuracy as dermatologists, and that taking the images was a quick and well tolerated process. DERM could provide dermatologist level assessment of suspicious skin lesions earlier in the patient pathway, potentially enabling the earlier diagnosis of malignant lesions and improvement of differentiation between harmless and potentially harmful lesions by non-specialists.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Leicester South National Research Ethics Committee. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

HM: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft. CM: Data curation, Investigation, Writing – review & editing. SA: Data curation, Investigation, Writing – original draft. CD: Data curation, Investigation, Writing – review & editing. MV: Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing. PK: Conceptualization, Formal analysis, Methodology, Writing – review & editing. JG: Methodology, Software, Writing – review & editing. DM: Investigation, Resources, Supervision, Writing – review & editing. IP: Conceptualization, Investigation, Supervision, Writing – review & editing. SA: Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Skin Analytics, London, UK sponsored and funded this study, as part of an InnovateUK BioMedical Catalyst project, and was involved with the study design, data collection, statistical analysis and interpretation of the data.

Acknowledgments

The authors would like to thank all patients who consented to the study, and all research staff involved in the conduct and data collection for the study. In particular, the authors thank Philip Hampton for recruiting additional patients at the Royal Victoria Infirmary, and Alicja Raginis-Zborowska and other Skin Analytics staff involved in the operationalization of the study.

Conflict of interest

SA has received a non-financial gift from Skin Analytics for presenting the results of this research. PK was an employee of Skin Analytics. DM, JG, HM, and MV are employees of Skin Analytics and have received Skin Analytics shares or share options. JG is named as an inventor on patents (pending) relating to DERM.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2023.1288521/full#supplementary-material

Keywords: skin cancer, Artificial Intelligence, Digital Health Technology, skin lesions, smartphone cameras

Citation: Marsden H, Morgan C, Austin S, DeGiovanni C, Venzi M, Kemos P, Greenhalgh J, Mullarkey D and Palamaras I (2023) Effectiveness of an image analyzing AI-based Digital Health Technology to identify Non-Melanoma Skin Cancer and other skin lesions: results of the DERM-003 study. Front. Med. 10:1288521. doi: 10.3389/fmed.2023.1288521

Received: 04 September 2023; Accepted: 18 September 2023;
Published: 06 October 2023.

Edited by:

Mara Giavina-Bianchi, Albert Einstein Israelite Hospital, Brazil

Reviewed by:

Giusto Trevisan, University of Trieste, Italy
Darius Mehregan, Wayne State University, United States

Copyright © 2023 Marsden, Morgan, Austin, DeGiovanni, Venzi, Kemos, Greenhalgh, Mullarkey and Palamaras. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Helen Marsden, helen@skinanalytics.co.uk