Abstract
Background
Hypomelanotic and amelanotic melanomas, characterised by little or no pigment, pose significant clinical challenges. Their lack of pigment makes early diagnosis difficult, increasing the risk of late detection and poorer outcomes.
Objectives
The objective of this study was to assess the impact of varying lesion pigmentation on the diagnostic accuracy of two deep learning models with distinct architectures: CNN1, a domain-specific model, and CNN2, a domain-generalisable model.
Methods
CNN1 was a 26-class classifier built on a pretrained Inception-v4 backbone, and CNN2 combined an ImageNet-pretrained ResNet-50 backbone with a transformer. CNN1 was trained on benign and malignant melanocytic lesions, making it more domain-specific, whereas CNN2 was more domain-generalisable owing to its architecture and its training on a wider range of lesion classes. The test dataset comprised 488 images, including 237 pathology-confirmed melanomas and 251 benign melanocytic lesions, predominantly from individuals with Fitzpatrick skin types I and II.
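The abstract does not specify implementation details of CNN2; purely as an illustration of the backbone-plus-transformer pattern described above, a minimal PyTorch-style sketch might look like the following. The class count, embedding dimension, encoder depth, and pooling choice are assumptions for the example, not the authors' configuration.

```python
# Illustrative sketch only: an image classifier pairing an ImageNet-pretrained
# ResNet-50 backbone with a transformer encoder over its spatial feature map.
# All hyperparameters here are assumed, not taken from the study.
import torch
import torch.nn as nn
from torchvision import models

class HybridResNetTransformer(nn.Module):
    def __init__(self, num_classes=2, d_model=2048, n_heads=8, n_layers=2):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        # Keep the convolutional stages; drop global pooling and the FC head
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                    # (B, 2048, H', W')
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H'*W', 2048) spatial tokens
        encoded = self.encoder(tokens)              # self-attention across locations
        return self.classifier(encoded.mean(dim=1))  # pool tokens, then classify
```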
Results
Although CNN1 performed better overall (accuracy 87.7%, AUROC 0.956 vs accuracy 80.1%, AUROC 0.926), suggesting that this is a domain-specific problem, both CNN models tended to underestimate the malignancy of lightly pigmented lesions. Grad-CAM heatmaps provided insight into the models' decision-making processes, indicating potential areas for improvement in their training.
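The abstract does not state how the Grad-CAM maps were produced; as a non-authoritative sketch, class-activation heatmaps for a convolutional backbone can be generated with forward and backward hooks roughly as follows. The model and target-layer arguments are hypothetical placeholders.

```python
# Illustrative Grad-CAM sketch (not the authors' pipeline): capture activations
# and gradients at a chosen convolutional layer, weight the activation maps by
# the spatially averaged gradients for the target class, and ReLU the result.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image.unsqueeze(0))            # (1, num_classes)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()   # default to the predicted class
    model.zero_grad()
    logits[0, class_idx].backward()

    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1) channel weights
    cam = F.relu((weights * acts["a"]).sum(dim=1))        # (1, H', W') raw heatmap
    return (cam / (cam.max() + 1e-8)).squeeze(0)          # normalised to [0, 1]
```

The returned map can then be upsampled to the input resolution and overlaid on the lesion image to visualise which regions drove the prediction.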
Conclusions
This study highlights the critical need for careful consideration of model architecture to enable more accurate recognition of hypomelanotic and amelanotic melanomas and to avoid compounding the risk of false-negative reassurance.