Alexandra Van Brummen, Julia P Owen, Theodore Spaide, Colin Froines, Randy Lu, Megan Lacy, Marian Blazes, Cecilia S Lee, Aaron Y Lee, Matthew Zhang; Artificial intelligence automation of eyelid and periorbital measurements. Invest. Ophthalmol. Vis. Sci. 2021;62(8):2149.
© ARVO (1962-2015); The Authors (2016-present)
There are currently no standardized automated tools for assessing oculoplastic metrics, despite the importance of periorbital measurements in evaluating clinical disease and surgical outcomes. To date, only the margin reflex distances (MRD1/MRD2) have been automated. To address this gap, we used a deep learning semantic segmentation network to fully automate 9 periorbital measurements.
Periorbital photos were collected from a routine oculoplastics clinic. In the retrospective phase, photos from 397 patients were collected. Three areas in each photo (eye aperture, iris, and eyebrow) were segmented bilaterally by 3 graders. The segmentations were used to train a deep learning semantic segmentation model consisting of a vanilla PSPNet with a ResNet50 backbone and a U-Net-style upsampling arm. A post-processing algorithm was then developed to derive the periorbital measurements from the segmentations, including MRD1, MRD2, medial canthal height (MCH), lateral canthal height (LCH), medial brow height (MBH), lateral brow height (LBH), medial inter-canthal distance (MID), and lateral inter-canthal distance (LID). In the prospective phase, 3 human graders used Photoshop version 22.0.1 to segment and measure the 9 metrics in photos from 46 participants. The images, together with the grader-derived segmentations and measurements, formed the independent test set. The trained network and the post-processing algorithm were then used to obtain periorbital measurements for the test-set images (Fig 1).
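The abstract does not describe the post-processing algorithm itself, so the following is only a minimal sketch of how MRD1 and MRD2 might be read off a pair of binary segmentation masks. It assumes that the iris centroid is an acceptable stand-in for the corneal light reflex, that the lid margins are the top and bottom of the eye-aperture mask in the column through that centroid, and that a pixel-to-millimeter scale is already known. The function name and the `mm_per_pixel` parameter are hypothetical.

```python
import numpy as np


def margin_reflex_distances(aperture, iris, mm_per_pixel):
    """Estimate (MRD1, MRD2) in mm from binary masks of one eye.

    aperture, iris: 2D arrays, nonzero where the region was segmented.
    mm_per_pixel: assumed image scale (hypothetical parameter).
    """
    iris_rows, iris_cols = np.nonzero(iris)
    if iris_rows.size == 0:
        raise ValueError("iris mask is empty")

    # Assumption: the iris centroid approximates the corneal light reflex.
    center_row = float(iris_rows.mean())
    center_col = int(round(float(iris_cols.mean())))

    # Aperture pixels in the column that passes through the centroid.
    lid_rows = np.nonzero(aperture[:, center_col])[0]
    if lid_rows.size == 0:
        raise ValueError("aperture mask does not cover the iris center column")

    upper_margin = lid_rows.min()  # image rows grow downward
    lower_margin = lid_rows.max()

    mrd1 = (center_row - upper_margin) * mm_per_pixel
    mrd2 = (lower_margin - center_row) * mm_per_pixel
    return mrd1, mrd2
```

The canthal and brow heights could plausibly be obtained in a similar way from landmark points on the aperture and eyebrow masks, but the abstract gives no details on how those landmarks were defined.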
For MRD1, the mean absolute difference ranged from 0.43 to 0.57 mm between AI and human graders and from 0.24 to 0.30 mm among the 3 human graders. For MRD2, it ranged from 0.38 to 0.39 mm between AI and human graders and from 0.28 to 0.35 mm among human graders. On average, the periorbital measurements deviated by less than 4.5 mm between every pair of raters across all metrics (Fig 2). The 95% confidence intervals largely overlap across all pairs of raters, indicating that the variation between human graders was similar to that between humans and AI.
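As an illustration of the comparison reported above, the sketch below computes the mean absolute difference for every pair of raters on a single metric, along with a 95% confidence interval. The abstract does not state how its intervals were computed; a normal approximation on the per-image absolute differences is assumed here, and the function name and input layout are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_mean_abs_diff(measurements):
    """Mean absolute difference with a 95% CI for every pair of raters.

    measurements: dict mapping rater name -> measurements in mm, with all
    raters listing the same images in the same order (at least 2 images).
    Returns {(rater_a, rater_b): (mean, ci_low, ci_high)}.
    """
    results = {}
    for a, b in combinations(measurements, 2):
        diffs = np.abs(
            np.asarray(measurements[a], dtype=float)
            - np.asarray(measurements[b], dtype=float)
        )
        mean = diffs.mean()
        # Assumption: normal-approximation interval, not the authors' method.
        half_width = 1.96 * diffs.std(ddof=1) / np.sqrt(diffs.size)
        results[(a, b)] = (mean, mean - half_width, mean + half_width)
    return results
```

Calling this once per metric with the AI and the 3 graders as raters would yield the six rater pairs per metric that a comparison like Fig 2 summarizes.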
We present, to the best of our knowledge, the first machine learning automation of 9 different periorbital measurements. The tool's variability is similar to that of human graders, and it could be clinically useful for objectively tracking disease progression and surgical outcomes.
This is a 2021 ARVO Annual Meeting abstract.
Fig 1. Example (not from training set) with AI-derived segmentations and measures.
Fig 2. Mean absolute difference calculated between every pair of raters, including AI, with 95% confidence interval error bars.