Abstract
Purpose:
To evaluate interobserver concordance in measured corneal fluorescein staining (CFS) using the National Eye Institute/Industry (NEI) grading scale and the Corneal Fluorescein Staining Index (CFSi), a computer-assisted, objective, centesimal scoring system.
Methods:
We conducted a study to evaluate CFS in clinical photographs of patients with corneal epitheliopathy. One group of clinicians graded CFS in the images using the NEI while a second group applied the CFSi. We evaluated the level of interobserver agreement and differences among CFS scores with each method, level of correlation between the two methods, and distribution of cases based on the CFS severity assigned by each method.
Results:
The level of interobserver agreement was 0.65 (P < 0.001) with the NEI, and 0.99 (P < 0.001) with the CFSi. There were statistically significant differences among clinicians' measurements obtained with the NEI (P < 0.001), but not with the CFSi (P = 0.78). There was a statistically significant correlation between the CFS scores obtained with the two methods (R = 0.72; P < 0.001). The NEI scale allocated the majority of cases (65%) to the highest quartile of the scale's severity range (12–15/15). In contrast, the CFSi allocated the majority of cases (61%) to the lowest quartile of the scale's severity range (0–25/100).
Conclusions:
The CFSi is easy to implement, provides higher interobserver consistency, and, owing to its continuous score, can discriminate smaller differences in CFS. Reproducibility of the computer-based system is higher and, interestingly, the system allocates cases of epitheliopathy to different severity categories than clinicians do. The CFSi can be an alternative for objective CFS evaluation in the clinic and in clinical trials.
Ocular surface disease is characterized by clinical signs and symptoms that are indispensable for the assessment, staging, clinical management, and follow-up of the condition.[1–4] Punctate keratitis (corneal epitheliopathy) is a clinical sign that reflects, physiologically and anatomically, the viability of the corneal epithelium.[3,4] Punctate keratitis and other forms of corneal epitheliopathy (e.g., confluent defects) are easily studied and assessed in the clinic through dyes applied onto the ocular surface.[3,4] Sodium fluorescein has been used for decades as a biomarker that can be excited with cobalt blue light to depict the degree of corneal epitheliopathy, a technique commonly referred to as corneal fluorescein staining (CFS).[3–5]
Corneal fluorescein staining is easy to perform in the clinic and is cardinal for the diagnosis, treatment, and follow-up of ocular surface diseases, including dry eye disease.[1–5] Although various CFS grading scales have been described, clinicians generally rely on practical ordinal scales, for example, “mild,” “moderate,” “severe” or 0+, 1+, 2+, and so on, or variations thereof. Although used less frequently, other ordinal CFS scales that define the degree of punctate keratitis more specifically are also available, including the Oxford grading scale and the National Eye Institute/Industry grading scale (NEI).[6,7] The Oxford scale relies on a comparative chart to define the degree of CFS, with different images defining six levels of severity (0 [absent] to 5 [severe]).[6] The NEI scale relies on a chart that divides the cornea into five sections and assigns a value from 0 (absent) to 3 (severe) to each section, based on the amount, size, and confluence of the punctate keratitis, for a maximum of 15 points.[7] Currently, the Oxford and NEI scales are the most commonly used CFS grading scales in clinical trials, owing to their systematic methodology and the popularity they have gained among clinicians and researchers over the past years. However, even though they are more systematic, these CFS scales are still observer dependent and subjective, and thus susceptible to high inter- and intraobserver variance.
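As an illustration of the NEI arithmetic described above (five corneal sections, each graded 0 to 3, for a maximum of 15 points), the following minimal Python sketch sums the section grades. The function name `nei_total` and the example grades are hypothetical and are not part of the NEI chart or of this study.

```python
def nei_total(section_grades):
    """Sum five per-section grades (each 0-3) into an NEI total (0-15).

    Hypothetical helper for illustration; the grades themselves are
    assigned subjectively by a clinician using the NEI chart.
    """
    if len(section_grades) != 5:
        raise ValueError("the NEI chart uses exactly five corneal sections")
    for grade in section_grades:
        if grade not in (0, 1, 2, 3):
            raise ValueError("each section grade must be 0, 1, 2, or 3")
    return sum(section_grades)


# Illustrative grades for the five sections (not study data).
example_total = nei_total([3, 2, 3, 1, 2])
print(example_total)  # 11
```

Because each section contributes only one of four possible integers, the total can take only 16 distinct values, which is the coarse granularity that the continuous CFSi score is intended to overcome.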
Objective methods to assess CFS based on corneal image analysis have been proposed, but none has been implemented for routine CFS assessment in the clinic or in clinical trials.[8–11] We proposed that a systematic method that objectively evaluates CFS, minimizes subjective input from human observers, and facilitates comparisons by reducing inter- and intrarater variability would represent an important advance for the evaluation of CFS changes over time or after an intervention. We thus developed the Corneal Fluorescein Staining Index (CFSi), a computer-guided method that objectively quantifies CFS in clinical photographs and delivers a continuous, centesimal (0–100) score.
Herein we present the results of a prospective study involving consecutive patients with ocular surface disease and corneal epitheliopathy who attended our clinic. We acquired clinical images of the cornea; CFS was evaluated by a group of clinicians using the NEI scale and by a group of nonclinician raters using the newly developed CFSi, and we compared the results obtained with the two methods.
Photographs of the cornea showing punctate keratitis were acquired with the SL-D7 Topcon photography system (Topcon Medical Systems, Inc., Oakland, NJ, USA) using cobalt blue light and a yellow filter. Fluorescein was prepared by adding a standard drop (approximately 20 μL) of sterile saline solution (AddiPak; Hudson RCI, Research Triangle Park, NC, USA) to a sterile fluorescein strip (BioGlo; HUB Pharmaceuticals, Rancho Cucamonga, CA, USA) and was then instilled into the inferior eyelid cul-de-sac; patients were instructed to blink smoothly, and excess fluorescein was removed. After 2 minutes, patients were instructed to gaze forward while the eyelids were held open to expose the entire cornea, and the photographs were acquired. The same light intensity (maximum), slit width (14 mm), magnification (×10), and camera settings (exposure time, aperture, and shutter speed [automatic]) were used in all cases. The process of image acquisition took approximately 5 minutes.
A cornea specialist selected all the photographs considered to be of sufficient clinical quality to permit another clinician to score CFS outside the clinic. All the selected photographs were evaluated clinically (NEI system) and with the CFSi, and all the scores obtained with both techniques were included in the analysis.
When the plug-in starts, a circle is generated, and the user may adjust the circle to fit the corneal size. Before generating a fluorescein map of the cornea, areas with specular reflections are detected by a flooding algorithm. Areas with specular reflections are then excluded from the calculation of the fluorescein staining score, as the specular reflection does not represent the true colors. The plug-in then reads the RGB value of each pixel within the grid and converts the RGB values to HSV space,[13] where the value component is given by:
\begin{equation}\tag{1}V = \max(R,G,B).\end{equation}
The saturation component is given by:
\begin{equation}\tag{2}S = \begin{cases} \dfrac{\mathit{delta}}{\max(R,G,B)} & \text{if } \max(R,G,B) \ne 0 \\ 0 & \text{else} \end{cases},\end{equation}
where delta = max(R, G, B) − min(R, G, B).
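Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the channel values are assumed to be scaled to [0, 1] (consistent with the later statement that V falls within [0, 1]), and the function name `value_and_saturation` is hypothetical rather than taken from the plug-in.

```python
def value_and_saturation(r, g, b):
    """Return (V, S, delta) for one pixel, following Equations 1 and 2.

    Assumes r, g, b are already scaled to [0, 1]; this helper is a
    hypothetical illustration, not the plug-in's actual code.
    """
    v = max(r, g, b)                  # Equation 1
    delta = v - min(r, g, b)          # delta = max(R,G,B) - min(R,G,B)
    s = delta / v if v != 0 else 0.0  # Equation 2
    return v, s, delta


# A predominantly green pixel (illustrative values).
v, s, delta = value_and_saturation(0.2, 0.8, 0.4)
print(v, round(s, 2), round(delta, 2))  # 0.8 0.75 0.6
```

A black pixel (all channels 0) takes the second branch of Equation 2, so its saturation is 0 rather than an undefined division.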
The hue component is given by:
\begin{equation}\tag{3}H = \begin{cases} 0 & \text{if } \mathit{delta} = 0 \\ \dfrac{60 \times \left( \dfrac{G - B}{\mathit{delta}} \right) + 360}{360} & \text{if } \mathit{delta} \ne 0 \text{ and } \max(R,G,B) = R \\ \dfrac{60 \times \left( \dfrac{B - R}{\mathit{delta}} \right) + 360}{360} & \text{if } \mathit{delta} \ne 0 \text{ and } \max(R,G,B) = G \\ \dfrac{60 \times \left( \dfrac{R - G}{\mathit{delta}} \right) + 360}{360} & \text{otherwise} \end{cases}\end{equation}
The hue value is then mapped from [0°, 360°] to [0, 1] using a parabolic curve. Based on experimental data, we found that, for the range [200°, 220°], a linear function is a better marker than a parabolic function, as it avoids confusion between blue (from the light source used to excite fluorescein) and green (fluorescence from epithelial defects). The final fluorescence (greenness) value is calculated as:
\begin{equation}\tag{4}greenness = H \times S \times V\end{equation}
At this point the greenness score falls within the range [0, 1], as the H, S, and V values all fall within the range [0, 1]. A score of 0 means there is no greenness at all in the corneal selection, and a score of 1 means the entire selection is positive (preset threshold) for green. When the percentage of the fluorescent area is calculated, the fluorescence threshold is applied; that is, a pixel is regarded as fluorescein staining if its greenness score is above the threshold. This threshold was incorporated into the algorithm to avoid false positives triggered by a stained tear film or other artifacts, and was selected after careful review of multiple fluorescent maps by an experienced clinician.
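The thresholding step can be illustrated with a short Python sketch. It assumes that the centesimal score corresponds to the percentage of pixels in the corneal selection whose greenness exceeds the threshold; the threshold value of 0.5, the function name `percent_stained`, and the pixel values are all hypothetical, since the preset threshold is not reported here.

```python
def percent_stained(greenness_values, threshold):
    """Percentage (0-100) of pixels whose greenness exceeds the threshold.

    greenness_values: per-pixel greenness scores in [0, 1] for the corneal
    selection, with specular-reflection pixels already excluded.
    """
    if not greenness_values:
        raise ValueError("no pixels in the corneal selection")
    positive = sum(1 for g in greenness_values if g > threshold)
    return 100.0 * positive / len(greenness_values)


HYPOTHETICAL_THRESHOLD = 0.5  # assumption; the actual preset value is not given
pixels = [0.05, 0.10, 0.62, 0.71, 0.30, 0.02, 0.55, 0.08]  # illustrative only
score = percent_stained(pixels, HYPOTHETICAL_THRESHOLD)
print(score)  # 37.5
```

In this toy selection, three of eight pixels exceed the threshold, giving 37.5 on the 0 to 100 scale; faint greenness below the threshold, such as that from a stained tear film, does not contribute.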
We obtained 61 corneal images with punctate keratitis in patients with ocular surface disease. The mean CFS scores obtained by the clinical observers using the NEI scale were as follows: observer A, 13.8/15 (range, 5–15); observer B, 11.5/15 (3–15); observer C, 10.7/15 (2–15); and observer D, 12.7/15 (3–15). The mean CFS scores obtained by the observers using the computer-assisted system were as follows: observer E, 21.3 (range, 1–69); observer F, 21.4 (0.5–69); observer G, 21.3 (0.5–68); and observer D, 21.2 (0.5–68). We found a statistically significant correlation between the scores obtained by the control observer (observer D) using the two different methods (R = 0.72; P < 0.001).
The intraclass correlation coefficient for absolute agreement among clinical observers using the NEI scale was 0.65 (P < 0.001; 95% confidence interval [CI] 0.482–0.775), and it was 0.99 (P < 0.001; 95% CI 0.997–0.999) for the observers using the CFSi. The Friedman test showed statistically significant differences among the mean CFS scores obtained by clinical observers using the NEI scale (P < 0.001). Repeated-measures ANOVA did not show significant differences among the mean scores obtained by the observers using the CFSi method (P = 0.78).
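For readers unfamiliar with the statistic, the sketch below shows one common way to compute an intraclass correlation coefficient for absolute agreement. It assumes the two-way random-effects, single-measure form; the text above does not state which form was used, so this choice, the function name `icc_absolute_agreement`, and the small ratings matrix (illustrative, not study data) are assumptions.

```python
def icc_absolute_agreement(scores):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    scores: list of rows, one row per image and one column per observer.
    Assumed form for illustration; the study's exact ICC model is not stated.
    """
    n = len(scores)      # number of images
    k = len(scores[0])   # number of observers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for images (rows), observers (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 images scored by 4 observers (not study data).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_absolute_agreement(ratings)
print(round(icc, 2))  # 0.29
```

The observer (column) mean square appears in the denominator, so systematic offsets between observers lower the coefficient; this is what distinguishes absolute agreement from consistency, and it is why observers who rank images similarly but score at different levels can still show only moderate agreement.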
Finally, we analyzed the distribution of the CFS scores obtained with both methods across all 244 measurements per method (61 images, each scored by four observers). With the NEI scale, 4 measurements were allocated to the scale's lowest (first) severity quartile (0–3/15), 24 to the second quartile (4–7/15), 57 to the third quartile (8–11/15), and 159 to the highest (fourth) severity quartile (12–15/15). With the CFSi, 150 measurements were allocated to the scale's lowest severity quartile (0–25/100), 77 to the second quartile (26–50/100), and 17 to the third quartile (51–75/100). None was allocated to the highest (fourth) severity quartile (76–100/100).
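The quartile counts above can be converted to the percentages quoted elsewhere in the text with a short calculation. The sketch below uses only the reported counts; the variable and function names are hypothetical.

```python
# Reported counts per severity quartile, lowest to highest.
nei_counts = [4, 24, 57, 159]    # NEI: 0-3, 4-7, 8-11, 12-15
cfsi_counts = [150, 77, 17, 0]   # CFSi: 0-25, 26-50, 51-75, 76-100


def shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


nei_shares = shares(nei_counts)
cfsi_shares = shares(cfsi_counts)

# Both totals equal 61 images x 4 observers = 244 measurements.
print(sum(nei_counts), sum(cfsi_counts))  # 244 244
print(nei_shares)   # [2, 10, 23, 65]
print(cfsi_shares)  # [61, 32, 7, 0]
```

The output reproduces the 65% (NEI, highest quartile), 61% (CFSi, lowest quartile), and 2% (NEI, lowest quartile) figures, and shows that the two methods place the same set of images at opposite ends of their respective scales.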
Objective CFS scoring using the CFSi in clinical images was feasible and convenient. Observers did not report difficulties or require assistance using the CFSi, which is a fundamental aspect to consider when assessment of CFS is performed a posteriori in large studies or multicenter trials. We found a clear correlation between CFS scores obtained with the two different methods. However, it is important to note that this association reflects only that scores were allocated (ranked) in a similar order; meaningful differences in CFS score magnitude were present. The level of interobserver consistency found with the NEI scale (0.65) is considered to be fair to moderate, in contrast to the CFSi, which showed a level of consistency (0.99) considered excellent.[14,15]
The CFSi objectively evaluates the amount of fluorescein staining avoiding subjective input from a clinician, which is influenced by the size, number, confluence, or localization of CFS. Although these punctate keratitis characteristics are important for clinical judgment and management, in multicenter studies or clinical trials, CFS scales with low levels of consistency can compromise the reliability of the results. The CFSi linearly assigns higher scores to larger CFS areas and lower scores to areas without CFS, avoiding exponential changes arising from subjective judgment. CFS grading scales such as the NEI and Oxford attempt to address two needs simultaneously: One is giving weight in the severity scale to clinically important features, and the other is being less subjective by providing more quantifiable features in their system. Nevertheless, our data show that these scales lack consistency and are highly variable, limiting further development in the study of the ocular surface.
Ordinal scales are by definition characterized by unclear intervals between values. Ordinal CFS scales often assign very high scores to cases considered clinically severe, even when a limited area of the cornea is affected, and by doing so preclude the assignment of higher scores as CFS increases. This phenomenon often leads to arbitrarily using the same “severe” score in cases with different degrees of CFS (Fig. 2). In fact, in some cases with very severe CFS (e.g., one that has obtained the maximum score) there is no room to increase the score if the condition worsens (Fig. 2), and the same applies if the condition improves but not sufficiently to be reflected by the immediately lower value in the CFS scale (i.e., from 15/15 to 14/15) (Figs. 2, 3). Another limitation of ordinal scales arises when cases rated with severe scores improve and the scale cannot reflect the improvement, either because the improvement is modest and there is no appropriate value in between, or because the case still meets the criteria for the highest CFS score, creating a “plateau” effect (Fig. 4). In this study clinical observers mostly assigned severe scores, allocating 65% of the cases to the upper quartile of severity of the NEI scale (score 12–15/15); should CFS worsen, it would be difficult to reflect that change in many of these cases. Other examples of ordinal scale limitations include variability in the perception or interpretation of guidelines, creating a “fickle-grid” effect (Fig. 5), and judgment bias when cases are evaluated next to each other and the scores are influenced by the observer's previous decisions.
The CFSi objectively quantifies the number of fluorescence-positive pixels in the corneal area, thus reflecting even minimal CFS fluctuations regardless of the initial severity. In contrast to the NEI scale, the CFSi allocated most measurements (61%) to the lowest quartile of a 100-point scale, while clinicians allocated only 2% of the measurements to the lowest quartile (NEI 0–3/15). This does not signify that the CFSi underestimates (or overestimates) CFS scores as compared to the NEI scale, but only that the measurements, although correlated, are different in nature. CFS values and severity ranges differ between the two grading systems, and each CFS score is inherent to the way it is calculated and allocated within its own severity scale; therefore, CFS values should be considered only in the context of the corresponding CFS grading system. A limited number of studies have explored the applicability of CFS image analysis in the clinical setting, mostly correlating objective image-analysis techniques to ordinal scales used during evaluation of contact lens–related complications.[8–11] These studies show that in general, CFS scores based on image analysis are significantly correlated with clinical scores, independently of the clinical CFS scale used. One report showed that levels of interobserver agreement with image analysis are approximately seven times better than with subjective clinical matching scales.[9] Evidence also shows that depending on the algorithm used for CFS image analysis, repeatability of CFS scores can vary, and the correlation with clinical observations can change dramatically (from positive to negative).[9] Another study compared a CFS image analysis scoring technique with the Oxford and NEI scales and found good levels of correlation with both.[8]
To date, investigators have focused mainly on the correlation between CFS scores obtained with image analysis and ordinal clinical scores, but an intuitive continuous scale has not consistently been pursued. We developed a novel method that combines objective CFS assessment and an intuitive continuous centesimal scale. A continuous scale can fit an unlimited number of values; in contrast, the use of subjective CFS scales with arbitrary intervals may be one of the reasons multiple clinical trials continue to fail in detecting changes in CFS. In the series of cases we studied, we observed that the subjective scale presented a clear trend to assign a high degree of severity, while the objective scale did exactly the opposite. This might be related to the fact that the cases in the current study fit into what is “typically” considered severe dry eye, but the same could have happened if patients with “typically” mild disease had been evaluated (allocating the majority to the lower quartile). In both cases, detecting subtle changes in CFS, either worsening in severe cases or improvement in mild cases, would be challenging with an ordinal scale. It is important to note that application of an image-based CFS scoring method has some challenges, mostly related to achieving a standardized and repeatable image acquisition process. However, these challenges are not insurmountable and can be overcome by using a defined protocol and with some practice. Specifically, it is critical to minimize variation in factors such as illumination intensity, slit-lamp technique, fluorescein application, and camera settings. In some cases with severe ocular surface disease, clinical image acquisition may be difficult, and thus clinical and research personnel must be familiar with the protocols and ready to perform a procedure that is as quick and comfortable as possible. In any case, these limitations are present in virtually all scenarios where ocular surface image–based analysis is conducted.
Corneal fluorescein staining is an essential biomarker to understand ocular surface pathology; it is fundamental to guide treatment, evaluate its efficacy, and develop new therapies, and it is a cornerstone for regulatory purposes. The CFSi is an alternative way to assess CFS that is easy to implement and has a higher level of interobserver consistency than an ordinal scale such as the NEI/Industry grading scale. Additionally, our data provide new perspectives suggesting that objective continuous CFS scales are more accurate in detecting differences in measured CFS. Objective continuous scales could lead to an alternative way of corneal epitheliopathy profiling in ocular surface disease and to some paradigm changes, perhaps by showing some interventions to be more effective than previously thought or by improving understanding of the real relationship between clinical signs and patients' symptoms.
The Massachusetts Eye and Ear holds intellectual property pertaining to the use of the technology mentioned in this manuscript. Francisco Amparo and Reza Dana are listed as co-inventors of this technology.
Disclosure: F. Amparo, None; H. Wang, None; J. Yin, None; A. Marmalidou, None; R. Dana, None