Introduction & Objectives:
Differentiating dysplastic naevi from malignant melanoma is a formidable challenge for clinicians and pathologists alike, complicated by concerns about melanoma overdiagnosis. Convolutional neural networks (CNNs) hold great promise for assisting clinicians with this task. However, their implementation is fraught with challenges, beginning with the need for high-quality training data. Here we assess the implications of pathologist consensus for the labelling of artificial intelligence training data.
Materials & Methods:
A total of 210 lesions suspected of melanoma were imaged and biopsied at an Australian general practice clinic. The ground truth diagnosis was established by histopathological consensus of five independent dermatopathologists. Probability-weighted diagnoses were assigned by two dermoscopic CNNs built on the ResNet-50 architecture and trained on data from the 2018 International Skin Imaging Collaboration Challenge; one was pre-trained on ImageNet data (CNN-1) and the other on images from Australian teledermatology clinics (SMARTI).
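The abstract does not state how the five independent reads were combined into a consensus label. As a minimal sketch only, assuming a strict-majority rule (an assumption, not the authors' stated procedure), consensus labelling could look like the following. The function name `consensus_label` and the label strings are hypothetical.

```python
from collections import Counter


def consensus_label(votes):
    """Return the strict-majority label among the raters' votes, or None.

    Assumption: consensus means more than half of the raters agree.
    The source abstract does not specify the actual rule used.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, top_count = Counter(votes).most_common(1)[0]
    # A strict majority requires more than half of all votes.
    if top_count * 2 > len(votes):
        return label
    return None


# Hypothetical example: five dermatopathologists read one lesion.
reads = ["melanoma", "melanoma", "naevus", "melanoma", "naevus"]
print(consensus_label(reads))
```

Under this assumed rule, a lesion without a strict majority returns `None` and would need to be resolved some other way, for example by discussion among the readers.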
Results:
CNN-1 yielded an area under the receiver operating characteristic curve of 0.682, while SMARTI yielded 0.725. CNN-1 had a specificity of 0.35 (95% confidence interval (CI) 0.27-0.45) and a sensitivity of 0.91 (95% CI 0.84-0.96), whereas SMARTI demonstrated a specificity of 0.26 (95% CI 0.19-0.35) at a sensitivity of 0.95 (95% CI 0.88-0.98). Inter-rater agreement among pathologists was higher for lesions correctly classified by SMARTI (Fleiss’ kappa 0.788) than for lesions misclassified by SMARTI (Fleiss’ kappa 0.406). Lesions misclassified by the AI model were therefore also divisive for pathologists.
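For readers unfamiliar with the agreement statistic, the following is a minimal sketch of the standard Fleiss' kappa formula for a fixed number of raters per lesion. It is illustrative only and is not the authors' analysis code; the function name `fleiss_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one label per rater.

    Every item must be rated by the same number of raters (at least two).
    Returns NaN when chance agreement is 1 (all ratings in one category),
    where kappa is undefined.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    # counts[i][j]: number of raters assigning item i to category j
    counts = [[Counter(row)[c] for c in categories] for row in ratings]

    # Observed agreement: mean over items of the per-item pairwise agreement.
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_chance = sum(p * p for p in p_cats)

    if p_chance == 1.0:
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical example: two lesions, five raters each, split 3-2 and 2-3.
example = [
    ["melanoma", "melanoma", "melanoma", "naevus", "naevus"],
    ["melanoma", "melanoma", "naevus", "naevus", "naevus"],
]
print(fleiss_kappa(example))
```

Computing the statistic separately on the lesions a model classified correctly and on those it misclassified gives the kind of comparison reported above.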
Conclusion:
Our finding that CNNs struggle with the same lesions as pathologists highlights the importance of consensus diagnosis when labelling training data, given that melanomas are difficult for dermoscopists and pathologists alike. Future work should explore the integration of comprehensive histopathological data and multi-modal learning approaches to refine the diagnostic precision and reliability of neural networks in dermatology.