Poster Presentation Australasian Society for Dermatology Research Annual Scientific Meeting 2024

Assessing the impact of lesion pigmentation on the performance of deep learning models: Diagnostic accuracy cross-sectional study (#72)

Ibukun AIO Oloruntoba 1, Claire CF Felmingham 1 2 3, Deval DM Mehta 4, Miki Wada 5, Maithili Sashindranath 1, Nikki Adler 1 2, Zhen ZY Yu 4, Asa AI Ingvar 1 6, Cristina CVA Vico-Alonso 1, Toan TN Nguyen 4, John JK Kelly 2, Yan YP Pan 2, Alex AC Chamberlain 2, Zongyuan ZG Ge 4, Rory RW Wolfe 1, Victoria VM Mar 1 2
  1. SPHPM, Monash University, Melbourne, Victoria, Australia
  2. Victorian Melanoma Service, Alfred Health, Melbourne, Victoria, Australia
  3. Skin Health Institute, Melbourne, Victoria, Australia
  4. Monash Medical Artificial Intelligence, Monash University, Melbourne, Victoria, Australia
  5. Department of Dermatology, Alfred Health, Melbourne, Victoria, Australia
  6. Department of Dermatology, Skåne University Hospital, Lund, Sweden

Abstract

Background

Hypomelanotic and amelanotic melanomas, characterised by little or no pigment, pose significant clinical challenges. These melanomas are difficult to diagnose early, increasing the risk of late detection and poorer outcomes.

Objectives

The objective of this study was to assess the impact of varying lesion pigmentation on the diagnostic accuracy of two deep learning models with distinct architectures. CNN1 was a domain-specific model, whereas CNN2 was a domain-generalisable model.
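
One way to make "the impact of varying lesion pigmentation" concrete is to compute melanoma sensitivity separately for each pigmentation stratum. The sketch below is hypothetical: the abstract does not say how pigmentation was graded, so the stratum labels (`pigmented`, `hypomelanotic`, `amelanotic`), the 0.5 decision threshold and the helper name `sensitivity_by_stratum` are illustrative assumptions, and the records are made-up toy values, not study data.

```python
from collections import defaultdict


def sensitivity_by_stratum(records, threshold=0.5):
    """Melanoma sensitivity (true-positive rate) per pigmentation stratum.

    Hypothetical helper: `records` is an iterable of
    (stratum, is_melanoma, melanoma_probability) tuples. The stratum labels
    and the 0.5 threshold are assumptions, not taken from the study.
    """
    detected = defaultdict(int)  # melanomas scored at or above the threshold
    total = defaultdict(int)     # all melanomas seen in the stratum
    for stratum, is_melanoma, prob in records:
        if not is_melanoma:
            continue  # benign lesions do not contribute to sensitivity
        total[stratum] += 1
        if prob >= threshold:
            detected[stratum] += 1
    return {stratum: detected[stratum] / total[stratum] for stratum in total}


# Toy records (made up for illustration, not study data).
toy_records = [
    ("pigmented", True, 0.92),
    ("pigmented", True, 0.81),
    ("pigmented", True, 0.77),
    ("pigmented", True, 0.40),
    ("pigmented", False, 0.10),
    ("hypomelanotic", True, 0.55),
    ("hypomelanotic", True, 0.30),
    ("amelanotic", True, 0.20),
    ("amelanotic", True, 0.35),
]

print(sensitivity_by_stratum(toy_records))
# {'pigmented': 0.75, 'hypomelanotic': 0.5, 'amelanotic': 0.0}
```

In this toy example sensitivity falls as pigment decreases, which is the kind of pattern a model that underestimates the malignancy of lightly pigmented lesions would produce. With small strata, a real analysis would also report confidence intervals alongside each proportion.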

Methods

CNN1 was a 26-class architecture with a pretrained Inception-v4 backbone, whereas CNN2 used an ImageNet-pretrained ResNet-50 backbone combined with a transformer. CNN1 was trained on benign and malignant melanocytic lesions, making it more domain-specific, while CNN2 was more domain-generalisable owing to its architecture and its training on a wider range of lesion classes. The test dataset comprised 488 images (237 pathology-confirmed melanomas and 251 benign melanocytic lesions), predominantly from individuals with Fitzpatrick skin types I and II.
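
Accuracy and AUROC on a binary melanoma-versus-benign test set of this kind can be computed as in the minimal sketch below, assuming each model yields a single melanoma probability per image. The abstract does not say how CNN1's 26-class output was reduced to one score or which decision threshold was used, so the 0.5 threshold and the function names are illustrative assumptions, and the scores are made-up toy values.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of images whose thresholded prediction matches the label.

    labels: 1 for melanoma, 0 for benign; probs: melanoma probability.
    The 0.5 threshold is an assumption, not taken from the study.
    """
    correct = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return correct / len(labels)


def auroc(labels, probs):
    """AUROC as the Mann-Whitney statistic: the probability that a randomly
    chosen melanoma scores higher than a randomly chosen benign lesion,
    with ties counted as one half."""
    pos = [p for y, p in zip(labels, probs) if y]
    neg = [p for y, p in zip(labels, probs) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores, not study data).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(accuracy(labels, probs), 3))  # 0.667
print(round(auroc(labels, probs), 3))     # 0.889
```

The pairwise loop is quadratic, but 237 melanomas by 251 benign lesions is only about 59,500 comparisons; for larger sets, a rank-based formula or scikit-learn's `roc_auc_score` gives the same value more efficiently.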

Results

Whilst CNN1 performed better overall than CNN2 (accuracy 87.7% vs 80.1%; AUROC 0.956 vs 0.926), suggesting that this is a domain-specific problem, both CNN models tended to underestimate the malignancy of lightly pigmented lesions. Grad-CAM heatmaps provided insight into the models' decision-making processes, indicating potential areas for improvement in their training.
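
For reference, Grad-CAM heatmaps are usually defined as follows (the standard formulation of Selvaraju et al., 2017). Here $A^k$ is the $k$-th feature map of a chosen convolutional layer, $A^k_{ij}$ its value at spatial position $(i, j)$, $Z$ the number of spatial positions, and $y^c$ the pre-softmax score for class $c$ (for example, melanoma). The abstract does not state which layer or class score was used for either model, so these symbols are generic placeholders.

```latex
\alpha_k^{c} = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\Big(\sum_{k}\alpha_k^{c}A^{k}\Big)
```

The weights $\alpha_k^c$ measure how strongly each feature map pushes the class score up, and the ReLU keeps only regions with a positive influence on $y^c$; the resulting map is then upsampled to the input resolution and overlaid on the lesion image.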

Conclusions

This study highlights the critical need to consider model architecture carefully, so that hypomelanotic and amelanotic melanomas are recognised more accurately and the risk of false-negative reassurance is not compounded.