Abstract
Purpose :
The quality and diversity of datasets determine the effectiveness of artificial intelligence (AI) models for automated disease detection. Publicly available datasets for glaucoma are limited in size and attribute diversity, are not extendable, and do not follow a standard format, which constrains the development and transparent evaluation of AI models that assist in glaucoma diagnosis and management. The objective of this work is to increase the accessibility of high-quality glaucoma data for AI by developing the largest open-source, standardized, labeled glaucoma dataset for classification and segmentation tasks.
Methods :
All fundus images (and corresponding segmentation data) were standardized with an algorithm that maximizes cropping of information outside the field-of-view, centers images with missing information, and resizes images to 512x512 pixels while preserving the aspect ratio. Associated metadata were also standardized. The dataset was divided into labeled training (9,916), validation (1,000), and test (1,000) sets. To provide a glaucoma classification benchmark, we trained a classifier using an initialized DenseNet201 as the base architecture (hyperparameters: SGD optimizer, learning rate = 1e-5, momentum = 0.9, epochs = 50, batch size = 32).
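The standardization step described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `standardize_fundus`, the brightness `threshold` used to detect the field-of-view, the zero padding, and the nearest-neighbour resampling are all assumptions chosen to keep the example dependency-light (NumPy only).

```python
import numpy as np

def standardize_fundus(image, size=512, threshold=10):
    """Crop to the field-of-view, center on a square canvas, resize to size x size.

    Hypothetical helper: `threshold` and the nearest-neighbour resize are
    illustrative assumptions, not values taken from the abstract.
    """
    # Treat any pixel brighter than `threshold` (in any channel) as inside the FOV.
    gray = image.max(axis=2) if image.ndim == 3 else image
    mask = gray > threshold
    if not mask.any():
        raise ValueError("no field-of-view pixels found")

    # Tight bounding box around the FOV removes the uninformative border.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    cropped = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Pad the shorter side with zeros so the content is centered on a square
    # canvas; this keeps the aspect ratio of the retina unchanged.
    h, w = cropped.shape[:2]
    side = max(h, w)
    canvas = np.zeros((side, side) + cropped.shape[2:], dtype=image.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = cropped

    # Nearest-neighbour resize of the square canvas to the target resolution.
    idx = np.arange(size) * side // size
    return canvas[idx][:, idx]
```

Nearest-neighbour sampling is used here because it also preserves integer labels if the same transform is applied to a segmentation mask; in that case the crop box computed from the fundus image would have to be reused for the mask, which this sketch does not do.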
Results :
The final dataset (publicly accessible at https://github.com/TheBeastCoding/standardized-multichannel-dataset-glaucoma) contains 12,049 unique, multi-channel datapoints, comprising 7,299 non-glaucoma, 4,617 glaucoma, and 133 glaucoma-suspect instances. It is diverse in image capture environment, country of origin, presented segmentation data, glaucoma type, and image format. All datapoints contain at least one fundus image and a corresponding glaucoma diagnosis. The benchmark model achieves F1=87.57%, accuracy=87.60%, precision=87.55%, and recall=87.60%.
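The reported recall equals the reported accuracy (87.60%). That is what support-weighted averaging of per-class metrics always produces, and it also occurs with macro averaging on a class-balanced test set. The abstract does not state which averaging was used, so the sketch below is an assumption: a hypothetical helper `weighted_metrics`, in plain Python, that computes support-weighted precision, recall, and F1 alongside accuracy.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1.

    Hypothetical helper for illustration; the averaging scheme is an
    assumption, not something stated in the abstract.
    """
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = (tp + fn) / n  # class support as a fraction of all samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With support weighting, each class contributes its true positives divided by the total sample count, so the weighted recall sums to the overall fraction of correct predictions, which is the accuracy.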
Conclusions :
We present the largest publicly available labeled glaucoma dataset and an associated classification benchmark. The dataset is open-source, multi-channel, extendable, standardized, and AI-ready for segmentation and classification tasks. Access to a high-quality, benchmarked glaucoma dataset is critical for accelerating research on the application of AI to glaucoma diagnosis and management.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.