Abstract
Purpose:
As machine learning (ML) algorithms become more common in the clinical setting, the importance of explainable AI grows. The aim of this study was to explore the use of the What-If Tool (WIT) to help clinicians understand how an ML algorithm makes decisions.
Methods:
A supervised deep learning model was trained using Google AutoML Tables to predict visual acuity (VA) in diabetic macular oedema patients receiving anti-VEGF injections. It was trained on a public dataset consisting of 2614 eyes of 1964 patients at a tertiary referral centre in London and optimised for mean absolute error (MAE).
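For reference, the MAE used as the optimisation target is the average absolute difference between the predicted and observed VA, in letters:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert \hat{y}_i - y_i \rvert$$

where $\hat{y}_i$ is the predicted and $y_i$ the observed VA for eye $i$.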
The model was interrogated using the WIT via a Jupyter notebook extension. To see how it treated different subgroups, the WIT was used to slice the data by the input features gender and ethnicity, and the MAE was reported for each slice. To explore individual data points, 10 male patients were chosen at random and had their gender hypothetically changed using the WIT’s partial dependence plots, which vary a single input feature and report how the model’s prediction changes in response.
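As an illustration only (not the code used in this study), the WIT can be embedded in a Jupyter notebook roughly as follows; the file name, column names and predict_fn are hypothetical placeholders standing in for the cohort data and the deployed AutoML Tables model:

```python
import pandas as pd
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

# Hypothetical cohort file containing gender, ethnicity and VA columns.
df = pd.read_csv("dmo_cohort.csv")

def predict_fn(examples):
    # Placeholder: in practice this would call the deployed AutoML Tables model
    # and return one predicted VA (in letters) per example.
    return [0.0 for _ in examples]

# WIT takes a list of examples, their feature names and a prediction function.
config = (WitConfigBuilder(df.values.tolist(), df.columns.tolist())
          .set_custom_predict_fn(predict_fn)
          .set_model_type("regression"))
WitWidget(config, height=800)
```

Once the widget is loaded, slicing by feature and building partial dependence plots are done interactively in the tool rather than in code.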
Results:
The MAE of the model in predicting VA was 8.060 letters. Slicing the data in the WIT by male and female eyes showed that the model treated both broadly equally, with MAEs of 7.510 and 8.855 letters, respectively. Further slicing by both gender and ethnicity found that the largest difference in MAE was between white males (6.888) and white females (10.400).
Partial dependence plots of the randomly chosen points showed a mean change in VA prediction of +0.952 letters when gender was hypothetically changed from male to female (range: -8.676 to +10.906). In this sample, changing gender from male to female increased the predicted VA in 6 out of 10 cases. This was investigated further with a global partial dependence plot, which showed that the average change when changing gender from male to female was -0.57.
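The counterfactual behind these figures can also be sketched outside the WIT. The snippet below (assuming the hypothetical df and predict_fn above, with gender stored as the strings "male" and "female") flips the gender feature and compares predictions; the mean of the differences corresponds to the reading from the global partial dependence plot:

```python
import numpy as np

def gender_flip_deltas(df, predict_fn):
    # Take male eyes, create copies with gender hypothetically set to female,
    # and compare the model's predictions for the two versions.
    male = df[df["gender"] == "male"]
    as_female = male.copy()
    as_female["gender"] = "female"
    original = np.asarray(predict_fn(male.values.tolist()))
    flipped = np.asarray(predict_fn(as_female.values.tolist()))
    return flipped - original  # per-eye change in predicted VA (letters)

deltas = gender_flip_deltas(df, predict_fn)
print(deltas.mean())  # analogous to the average change shown by the global plot
```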
Conclusions:
Our results show two ways the WIT can scrutinise an ML model. Firstly, the freedom to slice the data by any chosen feature allows us to look beyond overall performance metrics. Secondly, the use of partial dependence plots to observe the model’s behaviour in hypothetical scenarios offers a more granular understanding than the basic explainable AI features on the Google Cloud Platform. Overall, the WIT presents a novel method for clinicians to understand how ML algorithms make decisions.
This is a 2021 ARVO Annual Meeting abstract.