Sara Beqiri, Eneda Rustemi, Madeline Kelly, Robbert Struyven, Edward Korot, Pearse Andrew Keane; Qualitative comparison of AutoML explainability tools with bespoke saliency methods. Invest. Ophthalmol. Vis. Sci. 2022;63(7):200 – F0047.
Google Cloud Platform (GCP) empowers clinicians to explore Artificial Intelligence (AI) via its code-free user interface. This, however, comes with limited transparency and control over algorithm design. As a partial solution, GCP's Explainable AI features can produce saliency heatmaps with minimal coding, highlighting the image regions that most influenced the model's prediction and thereby guiding clinicians' understanding. We present a qualitative evaluation of these explanations through a user survey, comparing them to maps produced with the same saliency technique from a bespoke model.
We trained two algorithms for the binary classification of referable vs non-referable Diabetic Retinopathy (DR), using the same 60,133 fundus images from publicly available datasets. The AutoML and bespoke models reached accuracies of 93.7% and 96.5% respectively, sufficient for our purpose of saliency map assessment. Twelve test images were selected to represent varying degrees of AutoML prediction confidence. For each image, an XRAI saliency map was produced for both the AutoML and the bespoke algorithm: the former involved minimal coding, whereas the latter required a fully coded Jupyter notebook. These maps were provided to a consultant ophthalmologist with 20 years of experience, who answered three specified questions per image via a 5-point Likert scale, focusing on each map's localisation ability, clarity of information, and overall quality.
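The core idea behind a saliency map is to score each image region by how much it contributes to the model's prediction. The sketch below is not XRAI (which combines integrated gradients with image segmentation and requires access to a trained model) but a much simpler occlusion-sensitivity stand-in that illustrates the same principle; the 4x4 "image" and toy classifier are hypothetical placeholders.

```python
# Occlusion-sensitivity sketch of saliency attribution: mask each patch of
# the image and record how much the model's score drops. This is a simpler
# stand-in for XRAI, which additionally uses gradients and segmentation.

# Hypothetical 4x4 "image" whose signal lives in the top-left 2x2 patch.
image = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]

def toy_model(img):
    """Hypothetical classifier score: mean intensity of the top-left 2x2 patch."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4.0

def occlusion_map(img, model, patch=2):
    """Slide a patch x patch zero-mask over img; saliency = drop in model score."""
    base = model(img)
    n = len(img)
    heat = [[0.0] * n for _ in range(n)]
    for r0 in range(0, n, patch):
        for c0 in range(0, n, patch):
            masked = [row[:] for row in img]
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    masked[r][c] = 0.0
            drop = base - model(masked)
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    heat[r][c] = drop
    return heat

heat = occlusion_map(image, toy_model)
# heat is largest over the top-left patch, where the toy model's signal lies.
```

In practice both the AutoML and bespoke pipelines delegate this attribution step to the platform or to a saliency library; the value of the sketch is only to make explicit what a heatmap pixel encodes.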
Paired t-tests showed no statistically significant difference between the AutoML and bespoke map scores for overall quality (3.25±0.75 vs 3.58±0.51); however, localisation ability and clarity of information were rated significantly higher for the bespoke model (2.75±1.14 vs 3.92±0.51, and 2.67±1.07 vs 3.83±0.58, respectively). A combined score across the three questions also differed significantly, with means of 11.3±2.9 for the bespoke model and 8.67±1.44 for AutoML.
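The paired t-test used above compares per-image score differences against zero. A minimal pure-Python sketch follows; the per-image Likert scores are hypothetical placeholders (the actual ratings are not reported in the abstract), shown only to illustrate the computation.

```python
import math
import statistics

# Hypothetical per-image Likert scores (1-5) for the 12 test images;
# the study's actual per-image ratings are not given in the abstract.
bespoke = [4, 4, 3, 4, 5, 4, 4, 3, 4, 4, 4, 4]
automl  = [3, 2, 3, 3, 4, 2, 3, 2, 3, 3, 4, 2]

def paired_t(x, y):
    """Paired t-statistic: mean of per-pair differences over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

t = paired_t(bespoke, automl)
# Two-tailed critical value for df = 11 at alpha = 0.05 is 2.201;
# |t| above this threshold indicates a significant paired difference.
significant = abs(t) > 2.201
```

In a real analysis one would report an exact p-value (e.g. via `scipy.stats.ttest_rel`), but the critical-value comparison above conveys the same significance decision for a fixed alpha.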
Our results show that applying the same saliency method and dataset to two different models can yield significantly differing maps. In our qualitative evaluation, bespoke saliency maps were superior to the AutoML explainability tools in both localisation and clarity of information.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.