Abstract
Purpose :
Interpretable staged transfer learning (iSTL), a pipeline for improving image classification with small sample sizes, was used to train a model to classify optical coherence tomography (OCT) images. Model attention was visualised using SHAP maps to allow interpretation of the model's behaviour.
Methods :
A disease classification task was carried out using the Inception-v4 convolutional neural network architecture and the iSTL training pipeline. Target OCT images are available under licence (Gholami et al). Normal (n=206), macular hole (MH; n=104), diabetic retinopathy (DR; n=109), and central serous retinopathy (CSR; n=104) scans were used. 50 images were used for data augmentation and training; the remainder were used for validation. The model was pre-trained on ImageNet and given randomly initialised output layers, and the early layers were frozen. The model was then trained on an intermediate bridge dataset (Kermany et al), the output layers were replaced, and final training was carried out on the target dataset. 10 models were trained, and the best-performing model was selected for SHAP attention map visualisation.
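The staged procedure described above (freeze early layers, train on a bridge dataset, replace the output layer, train on the target dataset) might be sketched in PyTorch roughly as follows. This is an illustrative sketch only, not the authors' implementation: a tiny stand-in network replaces the ImageNet-pre-trained Inception-v4, random tensors replace the bridge and target OCT datasets, and the names `make_backbone`, `StagedClassifier`, `train_stage`, `NUM_BRIDGE_CLASSES`, and `NUM_TARGET_CLASSES`, along with all hyperparameters, are assumptions introduced for the example.

```python
import torch
from torch import nn


def make_backbone() -> nn.Sequential:
    # Tiny stand-in for Inception-v4 (assumption: a real run would load
    # ImageNet-pre-trained weights instead of random initialisation).
    return nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),   # "early" layers
        nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),  # later layers
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )


class StagedClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feature_dim, num_classes)  # randomly initialised output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

    def freeze_early_layers(self, n_modules: int) -> None:
        # Stop gradient updates for the first n_modules backbone modules.
        for module in list(self.backbone.children())[:n_modules]:
            for param in module.parameters():
                param.requires_grad = False

    def replace_head(self, num_classes: int) -> None:
        # Discard the old output layer and attach a fresh, randomly initialised one.
        self.head = nn.Linear(self.head.in_features, num_classes)


def train_stage(model, images, labels, steps=5, lr=1e-2) -> float:
    # Optimise only the parameters that are still trainable.
    params = [p for p in model.parameters() if p.requires_grad]
    optimiser = torch.optim.SGD(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()
    return loss.item()


torch.manual_seed(0)
NUM_BRIDGE_CLASSES = 4  # assumption: class count of the bridge dataset
NUM_TARGET_CLASSES = 4  # normal, MH, DR, CSR

# Random tensors stand in for the bridge and target images (assumption).
bridge_x, bridge_y = torch.randn(8, 1, 32, 32), torch.randint(0, NUM_BRIDGE_CLASSES, (8,))
target_x, target_y = torch.randn(8, 1, 32, 32), torch.randint(0, NUM_TARGET_CLASSES, (8,))

model = StagedClassifier(make_backbone(), feature_dim=16, num_classes=NUM_BRIDGE_CLASSES)
model.freeze_early_layers(n_modules=2)           # freeze the first conv + ReLU
early_before = model.backbone[0].weight.clone()  # snapshot to confirm freezing

bridge_loss = train_stage(model, bridge_x, bridge_y)  # stage 1: bridge dataset
model.replace_head(NUM_TARGET_CLASSES)                # swap the output layer
target_loss = train_stage(model, target_x, target_y)  # stage 2: target dataset
```

Collecting the trainable parameters inside `train_stage` means the replaced output layer is picked up automatically in the final stage, while the frozen early layers keep their weights throughout.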
Results :
SHAP maps showed that both image-specific and disease-specific features were allocated importance during prediction. High-importance areas have a large impact on the model’s prediction; low-importance areas have less impact. When classifying CSR, high importance was allocated to areas surrounding posterior epithelial detachment and not to areas of subretinal fluid. With DR, retinal microaneurysms and intraretinal oedema were highlighted but subretinal fluid was not. In MH scans, strong regional importance was allocated to the vertical edge of full-thickness holes and to surrounding intraretinal cysts when present. Normal images typically showed medium to high importance on the retinal pigment epithelium adjacent to the macula and on the inner limiting membrane surface. In many images, moderate attention fell within areas of the choroid and vitreous that had no apparent clinical importance. When present in the scan field, the optic nerve head was not allocated high importance.
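To illustrate what "importance" means in a SHAP map, the sketch below computes exact Shapley values for a toy scoring function over three image regions. It is an assumption-laden illustration, not the analysis used in this work: the regions, the scoring function `toy_score`, and its coefficients are hypothetical, and SHAP tools for deep networks approximate these values instead of enumerating every subset as done here.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value_fn):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [p for p in range(n_features) if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of region i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical regions of a macular hole scan (illustration only).
REGIONS = ["hole_edge", "vitreous", "choroid"]


def toy_score(visible):
    """Toy class score when only the regions in `visible` are unmasked."""
    score = 0.1                      # baseline score with everything masked
    if 0 in visible:
        score += 0.6                 # hole edge contributes strongly
    if 2 in visible:
        score += 0.1                 # choroid contributes weakly
    if 0 in visible and 2 in visible:
        score += 0.2                 # interaction between the two regions
    return score                     # vitreous (index 1) never changes the score


phi = shapley_values(len(REGIONS), toy_score)
importance = dict(zip(REGIONS, phi))
```

In this toy setting the hole-edge region receives the largest attribution and the vitreous region receives none, and the attributions sum to the difference between the fully unmasked score and the baseline.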
Conclusions :
Confounding factors are a concern when training deep learning models, particularly with small datasets. The iSTL model appeared to predominantly use clinically applicable features to make predictions. Further work is needed to determine whether some features used to make predictions are confounds or genuine clinical features related to disease biomarkers.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.