Abstract
Purpose:
Machine Learning (ML) models suffer from a lack of interpretability, particularly in healthcare settings. We used Google Brain’s What-if Tool (WIT) in a retrospective cohort study to analyse the decision boundaries of a multi-classification model that predicts visual acuity (VA) outcomes for patients with wet age-related macular degeneration (AMD).
Methods:
Our AMD dataset consisted of 3961 eyes from patients who had attended Moorfields Eye Hospital in the UK and were undergoing anti-vascular endothelial growth factor treatment. For each patient, VA was measured at the start of treatment and one year later using Early Treatment Diabetic Retinopathy Study charts. VA after one year of treatment was binned into three labels: "Good" for scores of 70 or above, "Neutral" for scores of 36-69, and "Poor" for scores of 35 or below. A Google Cloud AutoML Tables model was then trained on these data to predict the VA outcome labels from VA at baseline, age, ethnicity and gender.
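The binning rule described above can be sketched as a simple threshold function (the function name and signature are illustrative, not taken from the study's code):

```python
def bin_va_outcome(va_score: int) -> str:
    """Map a one-year ETDRS visual acuity score to an outcome label.

    Thresholds follow the abstract: 70 or above -> "Good",
    36-69 -> "Neutral", 35 or below -> "Poor".
    """
    if va_score >= 70:
        return "Good"
    if va_score >= 36:
        return "Neutral"
    return "Poor"
```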
We report the model's AUROC, precision and recall. To explore decision boundaries, nearest-counterfactual analysis using L1 distance was performed with the WIT, a model-agnostic explainable artificial intelligence tool, run as a Jupyter notebook extension.
Results:
The trained AutoML model performed with an AUROC of 0.892, a precision of 73.1% and a recall of 71.9%. We present a case study of an 84-year-old British male patient with an initial VA of 70, and his nearest counterfactual, an 84-year-old British female patient, also with an initial VA of 70. The ground truth for both patients was “Good”; this was correctly predicted in the male patient, whilst the model predicted a “Neutral” outcome for the female.
Conclusions:
We present a novel way in which clinicians can easily view nearest counterfactuals using the WIT, allowing for a greater understanding of how ML models arrive at their decisions at the level of an individual patient. In our example, there is no clinically strong evidence to support the model's prediction of a "Neutral" outcome in the female patient in comparison to the male patient. Importantly, minimal coding experience is required both to train the model on AutoML Tables and to perform the analysis using the WIT. This approach could therefore contribute to the democratisation of ML in healthcare.
This is a 2021 ARVO Annual Meeting abstract.