Abstract
Purpose:
Eye movements of experts on medical images could offer valuable objective information about the image regions used for disease detection. Toward the goal of training artificial intelligence (AI) systems with expert eye movements to improve the accuracy and interpretability of AI disease detection, we present an AI model trained on eye-tracking data from non-expert subjects as they viewed optical coherence tomography (OCT) reports (OCT is commonly used in the assessment of ophthalmic diseases such as glaucoma). The resulting model serves as a proof of principle: a method for predicting non-expert eye fixations on OCT reports that can be translated to predicting expert fixations on the same reports.
Methods:
Twenty OCT reports were viewed by three non-expert subjects. A Pupil Labs Core eye tracker was used to track each subject's eyes. Each OCT report (originally 1280x720 pixels, downsampled to 224x224 pixels) was represented as a 12x7 grid. A convolutional neural network (CNN) was trained to predict whether a given grid location was fixated by the subject. We first trained the CNN on publicly available eye movement data from a line-drawing dataset [1] and then fine-tuned it on the non-expert OCT fixations. The CNN architecture consisted of a ResNet50 [2] pre-trained on ImageNet-V1 [3].
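The following is a minimal sketch of how such a grid-fixation predictor could be set up, assuming a PyTorch implementation; the class name GridFixationNet, the loss, the optimizer, and the learning rate are illustrative assumptions and not the authors' exact configuration.

    # Sketch (PyTorch assumed): ResNet50 backbone with one output logit per
    # 12x7 grid cell; details beyond those stated in the abstract are illustrative.
    import torch
    import torch.nn as nn
    from torchvision import models

    class GridFixationNet(nn.Module):
        def __init__(self, grid_cells=12 * 7):
            super().__init__()
            # ImageNet-V1 weights, matching the pre-training described in the abstract
            self.backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
            # Replace the 1000-class ImageNet head with one logit per grid cell
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, grid_cells)

        def forward(self, x):        # x: (N, 3, 224, 224) downsampled OCT report images
            return self.backbone(x)  # (N, 84) fixation logits, one per grid cell

    model = GridFixationNet()
    criterion = nn.BCEWithLogitsLoss()  # each cell: fixated vs. not fixated
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)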
Results:
Results were computed for each non-expert subject using an 80%:20% train-test split. Average accuracy (correctly predicted fixations divided by the total number of fixations) was 0.88, while average recall and precision were 0.31 and 0.68, respectively. The CNN-predicted fixations were compared with the non-expert subject fixations (ground truth) on the test set (Fig. 1; blue points depict non-expert fixations on the left and CNN-predicted fixations on the right); note the similarity in the locations of the CNN-predicted and ground-truth fixations.
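For illustration, the sketch below shows standard per-grid-cell definitions of accuracy, precision, and recall (NumPy assumed); the exact counting used in the study may differ from these definitions.

    # Per-grid-cell evaluation sketch; arrays are binary (1 = fixated cell).
    import numpy as np

    def grid_metrics(pred, truth):
        """pred, truth: binary arrays of shape (n_reports, 84)."""
        tp = np.sum((pred == 1) & (truth == 1))
        tn = np.sum((pred == 0) & (truth == 0))
        fp = np.sum((pred == 1) & (truth == 0))
        fn = np.sum((pred == 0) & (truth == 1))
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return accuracy, precision, recall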
Conclusions:
Our CNN robustly predicted fixations made by non-expert subjects on OCT reports, demonstrating that the model can learn viewer-fixated regions on these reports. Future work will focus on reducing false negatives by customizing the CNN's loss function to improve recall. Our approach can be extended to the prediction of expert fixations on OCT reports, offering the potential to aid the training of eye-movement-informed AI systems, improve CNN model interpretability, and support medical education. [1] Kietzmann et al., 2015; [2] He et al., 2014, 2016; [3] Russakovsky et al., 2015.
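One possible form of the loss customization mentioned above is to up-weight the positive (fixated) class in the binary cross-entropy loss, which penalizes false negatives more heavily; this is a hedged sketch only, and the weight value is an arbitrary illustration, not the authors' planned approach.

    # Sketch: class-weighted loss to favor recall (PyTorch assumed).
    import torch
    import torch.nn as nn

    # e.g., weight fixated cells 5x more than non-fixated ones (value is illustrative)
    pos_weight = torch.tensor([5.0] * 84)  # one weight per grid cell
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)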
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.