Investigative Ophthalmology & Visual Science
June 2023
Volume 64, Issue 8
Open Access
ARVO Annual Meeting Abstract | June 2023
Classification of Fungal and Bacterial Infectious Keratitis - A Comparison of Transformer and CNN Models
Author Affiliations & Notes
  • Alex Hassett
    Camas High School, Camas, Washington, United States
  • Xubo Song
    School of Medicine, Oregon Health & Science University, Portland, Oregon, United States
  • Prajna Lalitha
    Department of Microbiology, Aravind Eye Hospital, Madurai, Tamil Nadu, India
  • Venkatesh Prajna
    Department of Microbiology, Aravind Eye Hospital, Madurai, Tamil Nadu, India
  • Rameshkumar Gunasekaran
    Department of Microbiology, Aravind Eye Hospital, Madurai, Tamil Nadu, India
  • Travis Redd
    School of Medicine, Oregon Health & Science University, Portland, Oregon, United States
  • Footnotes
    Commercial Relationships: Alex Hassett, None; Xubo Song, Oregon Health & Science University, Code E (Employment); Prajna Lalitha, Aravind Eye Hospital, Code E (Employment); Venkatesh Prajna, Aravind Eye Hospital, Code E (Employment); Rameshkumar Gunasekaran, Aravind Eye Hospital, Code E (Employment); Travis Redd, Oregon Health & Science University, Code E (Employment)
    Support: None
Investigative Ophthalmology & Visual Science June 2023, Vol.64, 1097. doi:
      Alex Hassett, Xubo Song, Prajna Lalitha, Venkatesh Prajna, Rameshkumar Gunasekaran, Travis Redd; Classification of Fungal and Bacterial Infectious Keratitis - A Comparison of Transformer and CNN Models. Invest. Ophthalmol. Vis. Sci. 2023;64(8):1097.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Convolutional neural networks (CNNs) have dominated deep learning for image analysis and have led to unprecedented improvements in biomedical image analysis, including ocular imaging. The recently developed Vision Transformer (ViT) has surpassed CNN models in many application domains and is considered a generally superior model. Its advantage over CNNs is attributed to the attention mechanism, which captures long-range dependencies in the image in the early layers of the model; ViT is also generally more robust than CNNs. In this study, we evaluate and compare the performance of ViT and CNN-based models in classifying fungal and bacterial infectious keratitis.
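
To illustrate the long-range dependency argument above, here is a minimal NumPy sketch of single-head scaled dot-product self-attention over image patches. It is not the authors' model: the patch count, embedding size, and random projection matrices (`w_q`, `w_k`, `w_v`) are assumptions chosen only for illustration.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (N, N) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)      # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over all patches
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, dim = 16, 8   # hypothetical: a 4x4 grid of patches, 8-dim embeddings
tokens = rng.normal(size=(num_patches, dim))
# Random projections stand in for learned weights (assumption for illustration).
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

output, weights = self_attention(tokens, w_q, w_k, w_v)

# Each row of `weights` spans all 16 patches, so patch 0 (top-left corner)
# receives information from patch 15 (bottom-right corner) in a single layer.
print(weights.shape, bool(weights[0, 15] > 0))
```

A 3x3 convolution, by contrast, mixes only adjacent positions in one layer, so distant image regions interact only after several layers are stacked.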

Methods : We implemented ViT and CNN-based models on the Keras platform. We evaluated two ViT models: the original ViT and the ViT with multilayer perceptrons (ViT-MLP Mixer). We evaluated three CNN models: ResNet, EfficientNet, and MobileNet. All models were pretrained on the public ImageNet dataset. MobileNet has been reported to perform best on a similar dataset. Our dataset contains images taken with handheld cameras of patients with culture-proven corneal ulcers in South India, who were recruited as part of clinical trials conducted between 2006 and 2015. There are 671 images: 440 fungal and 231 bacterial. We evaluated the models with 5-fold cross-validation: in each fold, 80% of the dataset was used for training and the remaining 20% for validation, so each model was trained 5 times. Model performance was measured on the validation data by categorical accuracy, area under the receiver operating characteristic curve (AUC), specificity, and sensitivity.
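
The 5-fold split described above can be sketched in plain Python as follows. The class counts come from the abstract; the stratification by class, the random seed, and the helper name `stratified_folds` are assumptions, since the abstract does not say how images were assigned to folds.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Return k (train_idx, val_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    val_sets = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            val_sets[pos % k].append(idx)   # deal each class round-robin into folds
    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(val)), sorted(val)) for val in val_sets]

labels = ["fungal"] * 440 + ["bacterial"] * 231   # class counts from the abstract
folds = stratified_folds(labels)

for i, (train, val) in enumerate(folds):
    n_fungal = sum(labels[j] == "fungal" for j in val)
    print(f"fold {i}: train={len(train)} val={len(val)} fungal_in_val={n_fungal}")
```

Each image appears in exactly one validation set, so every model is trained five times, each time on roughly 80% of the images and validated on the remaining 20%.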

Results : Among the ViT models, the ViT-MLP Mixer performed better than the original ViT. Among the CNN models, the ResNet50 CNN performed best and outperformed a model from a similar study. Interestingly, the ViT-MLP Mixer fell behind the ResNet50 CNN on all evaluation criteria: accuracy, sensitivity, specificity, and area under the ROC curve. The performance discrepancies are significant.
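
For reference, the four evaluation criteria named above can be computed from validation labels and predicted scores as in the sketch below. The toy labels and scores are made up for illustration and are not data from the study; treating one class as "positive" (label 1) and using a 0.5 decision threshold are also assumptions, since the abstract does not specify either.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and ROC AUC for binary labels (1 = positive)."""
    y_pred = [int(s >= threshold) for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # AUC: probability that a random positive outscores a random negative
    # (ties count as half a win).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": wins / (len(pos) * len(neg)),
    }

# Toy values only (hypothetical), not results from the abstract.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = binary_metrics(y_true, y_score)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.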

Conclusions : Transformer-based models often outperform CNN-based models, but not always. A transformer model requires more training data, in particular because of the complexity of its embedding layers and attention module. When the dataset is small, as with our infectious keratitis data, a CNN model may be more appropriate.

This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.
