Abstract
Purpose:
Feature attribution methods provide insight into the decision-making process of convolutional neural networks by highlighting the pixels that strongly influence the classification decision. Training a network on adversarial examples causes it to emphasize the most relevant image features, resulting in more focused feature maps compared with conventional training. In this study, we investigated whether an adversarially trained network produces more robust feature maps for an OCT B-scan classifier.
Methods:
61,058 B-scans from 478 independent eyes were used to train an Xception network both conventionally and with adversarial attacks for a binary classification task (i.e., each B-scan is either normal or abnormal). A B-scan was considered "abnormal" if at least one of two retina specialists graded it as containing one of the pathologies shown in Table 1. The performance of each network was evaluated on a hold-out test set of 15,338 B-scans from 120 eyes. Pathology was present/absent in 41%/59% of the training samples and 38%/62% of the test samples. All B-scans were acquired with CIRRUS™ HD-OCT 4000 or CIRRUS™ HD-OCT 5000 devices (ZEISS, Dublin, CA).
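The abstract does not specify which adversarial attack was used during training; as a minimal illustrative sketch (not the authors' implementation), one common choice is a fast gradient sign method (FGSM) training step in TensorFlow/Keras, where the perturbation strength epsilon, the optimizer, and the [0, 1] input scaling below are all assumptions:

    # Minimal FGSM adversarial-training step (illustrative sketch; the attack
    # type, epsilon, optimizer, and input scaling are assumptions).
    import tensorflow as tf

    model = tf.keras.applications.Xception(
        weights=None, input_shape=(299, 299, 3), classes=2)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    optimizer = tf.keras.optimizers.Adam(1e-4)

    @tf.function
    def adversarial_train_step(images, labels, epsilon=0.01):
        # Craft adversarial examples: nudge each pixel in the direction
        # that increases the classification loss.
        with tf.GradientTape() as tape:
            tape.watch(images)
            loss = loss_fn(labels, model(images, training=False))
        adv_images = images + epsilon * tf.sign(tape.gradient(loss, images))
        adv_images = tf.clip_by_value(adv_images, 0.0, 1.0)
        # Update the network on the perturbed batch.
        with tf.GradientTape() as tape:
            adv_loss = loss_fn(labels, model(adv_images, training=True))
        grads = tape.gradient(adv_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return adv_loss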
Five feature attribution methods – Grad-CAM, SmoothGrad², VarGrad², Integrated Gradients, and Vanilla Gradients – were used to generate feature maps indicating how much each input pixel contributed to the model's prediction. The maps were qualitatively compared between the two training regimes, as sketched below.
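As a point of reference for how such maps are computed, the simplest of these methods, Vanilla Gradients, backpropagates the class score to the input pixels; the sketch below assumes the Keras model from the previous snippet, and the function name and channel reduction are illustrative choices:

    # Vanilla-gradients saliency map for a single B-scan (illustrative sketch;
    # reuses the `model` defined above; `image` is a float array in [0, 1]).
    def vanilla_gradients(model, image, class_index):
        image = tf.convert_to_tensor(image)[tf.newaxis, ...]  # add batch axis
        with tf.GradientTape() as tape:
            tape.watch(image)
            score = model(image, training=False)[0, class_index]
        grads = tape.gradient(score, image)[0]
        # Collapse the channel axis so the map matches the B-scan resolution.
        return tf.reduce_max(tf.abs(grads), axis=-1)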
Results:
For all feature attribution methods except Grad-CAM, the model trained with adversarial examples produced clearer and more focused feature maps, as shown in Figure 1. However, the gain in feature attribution map interpretability came at the cost of a small loss in model performance: the conventionally trained model achieved an accuracy of 96.14% and an AUC of 0.992, while the adversarially trained model achieved an accuracy of 94.74% and an AUC of 0.989.
Conclusions:
The results consistently show that, when feature attribution maps are used for model interpretability, adversarially training the model yields enhanced feature attribution maps, at a small cost in classification performance.
This is a 2021 ARVO Annual Meeting abstract.