Abstract
Purpose :
Deep learning-based models for object detection and segmentation have proven effective in analyzing retinal optical coherence tomography (OCT) images. Our team has applied deep learning techniques to retinal OCT scans from children with sickle cell disease (SCD). In sickle cell retinopathy (SCR), retinal injury presents as an area of inner retinal thinning that often spans multiple OCT B-scans. Because changes at the disease margin can be subtle, ophthalmologists use adjacent scans as a reference to distinguish artifacts from genuine retinal damage. We present the Detachable Encoder Transformer (DEnT), which applies a similar strategy to identify SCR in OCT images.
Methods :
We use two models: the DEnT pre-trainer and the DEnT SCR detector. The DEnT pre-trainer employs a contrastive learning framework to pre-train on OCT B-scans. It uses a Siamese network to analyze a pair of augmented B-scans and determine whether they originate from the same OCT study (positive case) or not (negative case). Through this approach, the model learns how contour, shape, and thickness change across a set of B-scans. The DEnT detector uses the DEnT pre-trainer as an encoder to attend to three adjacent B-scans simultaneously. It applies the positional encoding of one B-scan embedding to query features in the other two. This approach allows the model to leverage the volumetric nature of OCT and analyze multiple cross-sections concurrently rather than individually.
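The two ideas above, labeling B-scan pairs by study of origin and letting one B-scan query its two neighbors, can be illustrated with a minimal NumPy sketch. This is not the DEnT implementation: the function names (`label_pairs`, `cross_attend`), the token and embedding shapes, the use of unprojected single-head attention, and the choice to add the positional encoding to the center scan's embedding to form the queries are all assumptions made for illustration.

```python
import itertools
import numpy as np


def label_pairs(study_ids):
    """Label every unordered pair of B-scans: 1 if both come from the
    same OCT study (positive case), 0 otherwise (negative case).
    `study_ids[i]` is the (hypothetical) study identifier of B-scan i."""
    return [
        (i, j, int(study_ids[i] == study_ids[j]))
        for i, j in itertools.combinations(range(len(study_ids)), 2)
    ]


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(center, neighbors, pos):
    """Single-head cross-attention without learned projections (assumption).

    center:    (n_tokens, d)     embedding of the middle B-scan
    neighbors: (2, n_tokens, d)  embeddings of the two adjacent B-scans
    pos:       (n_tokens, d)     positional encoding of the middle B-scan

    Queries are built from the center scan plus its positional encoding;
    keys and values are the concatenated tokens of both neighbors.
    """
    q = center + pos
    kv = neighbors.reshape(-1, neighbors.shape[-1])  # (2 * n_tokens, d)
    scores = q @ kv.T / np.sqrt(q.shape[-1])         # scaled dot product
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ kv, weights


# Toy usage with random embeddings (4 tokens per scan, 8 dimensions).
rng = np.random.default_rng(0)
pairs = label_pairs(["study_A", "study_A", "study_B"])
attended, weights = cross_attend(
    center=rng.normal(size=(4, 8)),
    neighbors=rng.normal(size=(2, 4, 8)),
    pos=rng.normal(size=(4, 8)),
)
```

In this sketch each token of the center scan receives a weighted mixture of features from both neighbors, which is one way to read "attending to three adjacent B-scans simultaneously"; a trained model would add learned query, key, and value projections and multiple heads.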
Results :
We trained the DEnT pre-trainer on 47,787 positive and 337,700 negative pairs of B-scans, and the DEnT detector on 331 OCT studies from 90 SCD patients, containing 9,320 sets of three adjacent B-scans. We trained three detection models: (a) without pre-training, (b) with pre-training, and (c) with pre-training and fine-tuning. We evaluated our method using mean average precision (mAP) and compared it against popular object detectors such as YOLO, Faster R-CNN, and DETR. The results are shown in Table 1.
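For readers unfamiliar with the metric, the following is a minimal sketch of how average precision can be computed for one class, with mAP being the mean over classes. The abstract does not state which AP variant or IoU threshold was used, so the non-interpolated form below, the function name `average_precision`, and the toy inputs are assumptions; each detection is assumed to have already been marked as a true or false positive by matching against ground truth.

```python
import numpy as np


def average_precision(scores, is_true_positive, n_ground_truth):
    """Non-interpolated average precision for a single class.

    scores:           confidence score of each detection
    is_true_positive: 1 if the detection matched a ground-truth box, else 0
    n_ground_truth:   total number of ground-truth boxes for this class
    """
    order = np.argsort(scores)[::-1]                 # highest confidence first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)                           # true positives so far
    precision = cum_tp / np.arange(1, len(tp) + 1)   # precision at each rank
    # Sum the precision at every rank where a true positive is found,
    # then normalize by the number of ground-truth objects.
    return float((precision * tp).sum() / n_ground_truth)


# Toy example: four detections, three ground-truth lesions.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[1, 0, 1, 1],
    n_ground_truth=3,
)
# mAP over classes is the mean of the per-class AP values.
mean_ap = float(np.mean([ap]))
```

In the toy example the true positives fall at ranks 1, 3, and 4, where precision is 1, 2/3, and 3/4, so the AP is their sum divided by the three ground-truth objects, about 0.806.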
Conclusions :
Our proposed network, DEnT, makes three primary contributions: 1) It uses a transformer-based pre-training network to detect subtle patterns within the retinal layers of OCT images. 2) It employs pre-trained embeddings and attention mechanisms to extract and analyze similar features from adjacent B-scans. 3) It is one of the first deep learning-based frameworks dedicated exclusively to SCR detection from OCT images.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.