Cornea | August 2015
Focused Tortuosity Definitions Based on Expert Clinical Assessment of Corneal Subbasal Nerves
Author Affiliations & Notes
  • Neil Lagali
Department of Ophthalmology, Institute for Clinical and Experimental Medicine, Faculty of Health Sciences, Linköping University, Linköping, Sweden
  • Enea Poletti
    Department of Information Engineering, University of Padova, Padova, Italy
  • Dipika V. Patel
    Department of Ophthalmology, New Zealand National Eye Centre, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
  • Charles N. J. McGhee
    Department of Ophthalmology, New Zealand National Eye Centre, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
  • Pedram Hamrah
    Ocular Surface Imaging Center, Schepens Eye Research Institute, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
  • Ahmad Kheirkhah
    Ocular Surface Imaging Center, Schepens Eye Research Institute, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
  • Mitra Tavakoli
    Centre for Endocrinology & Diabetes, Institute of Human Development, University of Manchester, Manchester, United Kingdom
  • Ioannis N. Petropoulos
    Centre for Endocrinology & Diabetes, Institute of Human Development, University of Manchester, Manchester, United Kingdom
    Weill Cornell Medical College in Qatar, Qatar Foundation, Education City, Doha, Qatar
  • Rayaz A. Malik
    Centre for Endocrinology & Diabetes, Institute of Human Development, University of Manchester, Manchester, United Kingdom
    Weill Cornell Medical College in Qatar, Qatar Foundation, Education City, Doha, Qatar
  • Tor Paaske Utheim
    Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway
    Department of Oral Biology, Faculty of Dentistry, University of Oslo, Oslo, Norway
  • Andrey Zhivov
    Department of Ophthalmology, University of Rostock, Rostock, Germany
  • Oliver Stachs
    Department of Ophthalmology, University of Rostock, Rostock, Germany
  • Karen Falke
    Department of Ophthalmology, University of Rostock, Rostock, Germany
  • Sabine Peschel
    Department of Ophthalmology, University of Rostock, Rostock, Germany
  • Rudolf Guthoff
    Department of Ophthalmology, University of Rostock, Rostock, Germany
  • Cecilia Chao
    School of Optometry and Vision Science, University of New South Wales, Sydney, New South Wales, Australia
  • Blanka Golebiowski
    School of Optometry and Vision Science, University of New South Wales, Sydney, New South Wales, Australia
  • Fiona Stapleton
    School of Optometry and Vision Science, University of New South Wales, Sydney, New South Wales, Australia
  • Alfredo Ruggeri
    Department of Information Engineering, University of Padova, Padova, Italy
  • Correspondence: Neil Lagali, Department of Ophthalmology, Institute for Clinical and Experimental Medicine, Faculty of Health Sciences, Linköping University, 581 83 Linköping, Sweden; neil.lagali@liu.se
Investigative Ophthalmology & Visual Science, August 2015, Vol. 56, Issue 9, 5102-5109. doi:10.1167/iovs.15-17284
Abstract

Purpose: We examined agreement among experts in the assessment of corneal subbasal nerve tortuosity.

Methods: Images of corneal subbasal nerves were obtained from investigators at seven sites (Auckland, Boston, Linköping, Manchester, Oslo, Rostock, and Sydney) using laser-scanning in vivo confocal microscopy. A set of 30 images was assembled and ordered by increasing tortuosity by 10 expert graders from the seven sites. In a first experiment, graders assessed tortuosity without a specific definition and performed grading three times, with at least 1 week between sessions. In a second experiment, graders assessed the same image set using four focused tortuosity definitions. Intersession and intergrader repeatability for the experiments were determined using the Spearman rank correlation.

Results: Expert graders without a specific tortuosity definition had high intersession (Spearman correlation coefficient 0.80) but poor intergrader (0.62) repeatability. Specific definitions improved intergrader repeatability to 0.79. In particular, tortuosity defined by frequent small-amplitude directional changes (short-range tortuosity) or by infrequent large-amplitude directional changes (long-range tortuosity) represented largely independent measures and resulted in improved repeatability across graders. A further refinement, grading only the most tortuous nerve in a given image, improved the average correlation of a given grader's ordering of images with the group average to 0.86 to 0.90.

Conclusions: Definitions of tortuosity specifying short- or long-range tortuosity and considering only the most tortuous nerve in an image improved the agreement in tortuosity grading among a group of expert observers. These definitions could improve accuracy and consistency in quantifying subbasal nerve tortuosity in clinical studies.

Tortuosity is defined as the degree of twistedness of a curved structure and is a particularly evident feature of corneal subbasal nerves. Tortuosity is increasingly used to quantitatively describe and/or compare subbasal nerve status in healthy corneas,1–5 in ocular pathology such as dry eye disease5–7 and keratoconus,4 and in systemic diseases such as diabetes,8–15 rheumatoid arthritis,16 inflammatory polyneuropathy,17 Wilson's disease,18 and amyotrophic lateral sclerosis.19 There is, however, no standardized definition of tortuosity, and its interpretation and quantification can vary, leading to poor reproducibility of this measure.20,21 While some methods to describe tortuosity are purely quantitative and mathematically defined,4 others are semiquantitative and rely on observer judgment,1 and still others are based on quantitative descriptions derived from subjective observation.8 This plurality of definitions not only makes it difficult to compare results across studies, but may also cause important differences in nerve morphology to be overlooked once a particular definition is adopted. Computing a single value for tortuosity may, for example, mask the complexity of nerve patterns, such that nerves with widely differing morphologic characteristics could receive the same numeric tortuosity value.
As a starting point for developing more standardized and reproducible measures of tortuosity, it is important to understand how nerve tortuosity is interpreted and perceived. Human judgment may take into account nerve architecture and morphologic features that have not yet been explicitly identified or mathematically defined. Moreover, an ideal tortuosity grading and quantification scheme should be consistent with the judgment of those experienced in assessing nerve images and making clinical decisions. To date, the perception of tortuosity and the level of agreement between expert observers have not been fully explored.
This study, therefore, had several aims: to investigate the perception of nerve tortuosity and measure its intra- and interobserver agreement across a range of subbasal nerve images from healthy and pathologic corneas, representing patients and experienced graders from geographically diverse regions; to explore alternative, more focused definitions of tortuosity and determine the agreement on these focused definitions among a group of expert graders; and to provide reference image sets representing ground-truth agreement among experienced graders for the future development of new algorithms to measure tortuosity in clinical studies.
Starting with a reference set of diverse images and a group of examiners experienced in the assessment of corneal nerve images, it was hypothesized that a consensus would emerge that could eventually lead to standardization in the assessment and quantification of subbasal nerve tortuosity.
Methods
Image Compilation Across Centers
Investigators from seven geographically dispersed clinical research centers (Auckland, Boston, Linköping, Manchester, Oslo, Rostock, and Sydney) were each asked to contribute 6 to 10 images of the subbasal nerve plexus depicting various severities of tortuous subbasal nerves (by subjective assessment). Images were selected from a database at each investigator's clinic. All centers used the same type of laser-scanning in vivo confocal microscope to acquire the images (Heidelberg Retinal Tomograph 2/3 with Rostock Corneal Module, HRT-RCM; Heidelberg Engineering, Heidelberg, Germany), and all images were from patients who had given informed consent at the time of examination. All images were anonymized at the center of origin by removing identifying patient information except for age, sex, and disease. As the acquisition of the original images at individual centers was approved by the respective local ethical review committees, occurred with informed consent, and followed the tenets of the Declaration of Helsinki, no further ethical approval was sought for a retrospective analysis of the resulting image compilation. From the 67 received images, a set of 30 was selected by including high-quality images without tilt (oblique orientation) that depicted clear, high-contrast subbasal nerves, while excluding similar (redundant) images. The final set represented 30 eyes from 30 subjects with the following conditions: healthy (10 subjects), dry eye disease (4), post-LASIK (4), diabetes mellitus type 2 (3), diabetes mellitus type 1 (2), keratoconus (2), corneal transplantation (1), contact lens wearer (1), granular dystrophy (1), autism (1), and untreated gastrointestinal cancer (1).
Experiment 1: Grading of Generic Tortuosity
Ten experienced clinical graders from the seven centers were asked to sort the set of 30 images by arranging them in order of increasing tortuosity. To facilitate this task, a custom-developed image-sorting tool, TorTsorT (available in the public domain at http://bioimlab.dei.unipd.it), was provided. TorTsorT sequentially displays pairs of images, asking the user to select the more tortuous image, and then combines the pairwise choices into a full ordering by means of the merge-sort algorithm. Merge sort is a comparison-based, divide-and-conquer sorting algorithm that requires an asymptotically optimal O(n log n) comparisons in both the average and worst case. Using this software, each grader provided input to sort the set of 30 images from lowest to highest tortuosity.
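Purely for illustration of this comparison-driven approach, a minimal sketch of a merge sort driven by pairwise "which image is more tortuous?" judgments is shown below. The function names and the interactive prompt are hypothetical and are not the TorTsorT implementation.

```python
# A minimal, comparison-driven merge sort, assuming image identifiers are
# strings and that prefer(a, b) encapsulates a grader's pairwise judgment.
# Illustrative only; this is not the TorTsorT implementation.
from typing import Callable, List


def merge_sort(items: List[str], prefer: Callable[[str, str], bool]) -> List[str]:
    """Return the items ordered from least to most tortuous.

    prefer(a, b) must return True when image a is judged MORE tortuous than b.
    A comparison-based merge sort needs O(n log n) judgments in the worst case.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], prefer)
    right = merge_sort(items[mid:], prefer)

    merged: List[str] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Place the less tortuous image first so the final list is ascending.
        if prefer(left[i], right[j]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def ask_grader(a: str, b: str) -> bool:
    """Hypothetical interactive prompt standing in for the grader's judgment."""
    return input(f"Is {a} more tortuous than {b}? [y/n] ").strip().lower() == "y"


if __name__ == "__main__":
    images = [f"image_{k:02d}" for k in range(1, 31)]  # the 30-image set
    print(merge_sort(images, ask_grader))
```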
No specific definition of tortuosity or instruction on how to assess tortuosity was provided to the graders. Each grader performed the sorting three times (sessions) on the same image set. Sessions were separated by at least 1 week, and the images for a given session were randomly renamed before being provided to the centers, so as to minimize any memory effect. In addition to sorting the images, graders were asked to note any difficulties or comments that arose while performing the sorting across the sessions.
Experiment 2: Grading by Specific Definitions of Tortuosity
Based on grader comments from the first experiment and on inspection of the images whose positions in the ordered lists differed most, on average, between graders, specific definitions of tortuosity were provided to the graders. Four definitions were proposed: T1, a measure of short-range changes in direction, considering an average of all nerves in the image; T2, a measure of short-range changes in direction, considering only the nerve with maximum tortuosity; T3, a measure of long-range changes in direction, considering an average of all nerves in the image; and T4, a measure of long-range changes in direction, considering only the nerve with maximum tortuosity.
These short- and long-range definitions are illustrated in Figure 1. The 10 graders were then asked to sort the same 30-image set from Experiment 1 again, in order of increasing tortuosity, once for each of the four definitions T1 through T4.
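The study deliberately left T1 through T4 as perceptual criteria for the graders to interpret. Purely as a hypothetical illustration of how short-range and long-range directional changes might be separated computationally (for example, when later developing automated measures against the reference image sets), the sketch below splits the total turning of a sampled nerve path into a long-range component (turning of a smoothed copy of the path) and a short-range residual. None of the function names or parameter choices come from the study.

```python
# Hypothetical toy only; NOT the study's definitions, which were perceptual.
# Splits the total turning of a sampled nerve path into a long-range part
# (turning of a smoothed copy) and a short-range residual. Parameters are
# arbitrary and for illustration only.
import numpy as np


def turning_angles(points):
    """Unsigned turning angles (radians) between successive polyline segments."""
    seg = np.diff(points, axis=0)
    heading = np.arctan2(seg[:, 1], seg[:, 0])
    d = np.diff(heading)
    return np.abs((d + np.pi) % (2 * np.pi) - np.pi)  # wrap to [0, pi]


def smooth(points, window=25):
    """Moving-average smoothing along the path (a crude low-pass filter)."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(points[:, k], kernel, mode="valid")
                            for k in range(points.shape[1])])


def short_and_long_range(points, window=25):
    """Return (short_range, long_range) turning scores for a sampled path."""
    total = turning_angles(points).sum()
    long_range = turning_angles(smooth(points, window)).sum()
    short_range = max(total - long_range, 0.0)
    return short_range, long_range


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 400)
    # A gently curving path with small high-frequency wiggles superimposed.
    path = np.column_stack([400.0 * t,
                            40.0 * np.sin(2 * np.pi * t) + 3.0 * np.sin(60 * np.pi * t)])
    print(short_and_long_range(path))
```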
Figure 1. Schematic illustration of the more focused tortuosity definitions used in Experiment 2. Short-range directional changes often were small-amplitude and high frequency (short periodicity), while long-range directional changes tended to be large-amplitude and lower frequency (long periodicity).
Consensus Rank
To express an overall or average image ordering (for example, to derive a ground-truth assessment for comparison) and to enable comparisons between graders and/or tortuosity definitions, we defined the consensus rank, the equivalent for ordered lists of the average for numerical quantities. For a given set of ordered lists of images, each image was given a score equal to its average position (rank) across the lists, and the consensus rank was obtained by ordering the images by these scores.
In Experiment 1, to assess intergrader agreement, the Spearman rank correlation coefficient was calculated between the ordered list of images from a given grader at a given session and the consensus rank of all other graders for that session (excluding the grader in question). To assess intragrader agreement across sessions, Spearman correlations were computed between pairs of image lists ordered by the same grader at different sessions (i.e., session 1 vs. 2, 2 vs. 3, and 1 vs. 3). The average Spearman correlation coefficient with the consensus rank across all graders was then computed for each session (CRS1–S3), and the three session ranks were averaged to represent an overall consensus rank of images across all graders and sessions (CRS).
In Experiment 2, to assess the agreement of individual grader orderings with the consensus rank for each tortuosity definition T1 through T4, the Spearman correlation between each grader's ordering and the consensus rank for that definition was computed, and this value was averaged across all graders for each definition (CRT1–T4). To compare agreement between the different definitions of tortuosity (i.e., no specific definition in Experiment 1 and the specific definitions in Experiment 2), Spearman correlations among the consensus ranks CRT1 through CRT4 and CRS1 through CRS3 were computed.
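As a concrete illustration of these computations, a minimal Python sketch using SciPy is given below. The image identifiers, helper names, and demonstration data are hypothetical and are not the study's data or code; the leave-one-out construction mirrors the description above of excluding the grader in question from the consensus.

```python
# A minimal sketch of the consensus rank and Spearman agreement measures
# described above; identifiers and demonstration data are illustrative only.
from scipy.stats import spearmanr


def consensus_rank(ordered_lists):
    """Order images by their average position (rank) across the ordered lists."""
    sums = {}
    for ordering in ordered_lists:
        for pos, img in enumerate(ordering, start=1):
            sums[img] = sums.get(img, 0) + pos
    return sorted(sums, key=lambda img: sums[img] / len(ordered_lists))


def positions(ordering):
    """Map each image to its 1-based position in an ordered list."""
    return {img: pos for pos, img in enumerate(ordering, start=1)}


def spearman_between(ordering_a, ordering_b):
    """Spearman rank correlation between two orderings of the same image set."""
    pa, pb = positions(ordering_a), positions(ordering_b)
    images = sorted(pa)
    rho, _ = spearmanr([pa[i] for i in images], [pb[i] for i in images])
    return rho


def intergrader_agreement(orderings, grader_index):
    """Correlation of one grader's ordering with the consensus rank of all
    other graders (leave-one-out), as in Experiment 1."""
    others = [o for k, o in enumerate(orderings) if k != grader_index]
    return spearman_between(orderings[grader_index], consensus_rank(others))


if __name__ == "__main__":
    # Three hypothetical graders ordering five images from least to most tortuous.
    graders = [["A", "B", "C", "D", "E"],
               ["B", "A", "C", "E", "D"],
               ["A", "C", "B", "D", "E"]]
    print(consensus_rank(graders))
    print(intergrader_agreement(graders, grader_index=0))
    # Intersession repeatability is simply spearman_between(session1, session2).
```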
Results
Experiment 1: Grading of Generic Tortuosity
All 10 graders completed the sorting in all three sessions, resulting in a complete data set for analysis. The correlation of the sorted list provided by each grader with the consensus rank of all 10 graders for a given session is given in Table 1. Correlation coefficients ranged from 0.43 to 0.73, with an overall mean correlation of 0.62 across all graders and sessions. The images from the data set with the lowest and highest perceived tortuosity are shown in Figure 2.
Table 1. Spearman Rank Correlation Coefficients for Each Grader and Each Session
Figure 2. Images with the lowest (A, B) and highest (C, D) tortuosity on average, from the 30-image data set, as graded subjectively by 10 expert graders in Experiment 1. All images, 400 × 400 μm.
To investigate intragrader repeatability across sessions, the correlation between each grader's orderings from the different sessions was computed (Table 2), along with the average repeatability. Repeatability across sessions ranged from 0.51 to 0.89, with an overall repeatability of 0.80 across all graders. One grader, however, had poor repeatability (mean intragrader correlation of 0.51).
Table 2. Spearman Rank Correlation Coefficients Between Orderings Provided by the Same Grader in Different Sessions
Without a specific definition of tortuosity, several graders noted difficulty in assessing and comparing nerves with numerous small-amplitude directional changes versus nerves with few large-amplitude directional changes. Another noted difficulty was whether to consider the average tortuosity of the nerves in a given image or only the nerve with the maximum tortuosity. These difficulties were partially evident in the images that had the largest average sorting error across the 10 observers (Fig. 3), where the average sorting error was derived as the absolute difference between a grader's rank of an image and the consensus rank, averaged across all graders. Nerves in these images clearly adopted nonlinear paths, often with few but large-scale changes in direction and without a clear “start” or “end” point.
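For illustration, the sketch below computes this per-image sorting error, together with the root mean squared variant quoted in the Figure 3 caption. The function and variable names are hypothetical; the orderings are assumed to be lists of image identifiers and the consensus an ordered list such as that produced by the consensus-rank sketch above.

```python
# A minimal sketch of the per-image sorting error: the absolute difference
# between each grader's rank of an image and its consensus rank, averaged
# across graders, plus the root mean squared variant from the Figure 3 caption.
# `orderings` is a list of graders' ordered image lists; `consensus` is a
# single ordered list (e.g., from the consensus_rank() sketch above).
import math


def sorting_errors(orderings, consensus):
    consensus_pos = {img: pos for pos, img in enumerate(consensus, start=1)}
    mean_abs, rms = {}, {}
    for img, cpos in consensus_pos.items():
        diffs = [abs(ordering.index(img) + 1 - cpos) for ordering in orderings]
        mean_abs[img] = sum(diffs) / len(diffs)
        rms[img] = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_abs, rms
```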
Figure 3. Images with the greatest sorting error among observers. The root mean squared error in the rank of an image from the consensus rank across all graders for that image was 12, 10, 7, 7, and 7 positions for images (A–E), respectively (from a list with 30 possible positions corresponding to the 30-image set). All images, 400 × 400 μm.
Experiment 2: Grading by Specific Definitions of Tortuosity
For each of the four definitions of tortuosity, T1 through T4, all graders completed grading of the image set. Agreement between each grader's sorted list and the consensus rank for a specific tortuosity definition was assessed by the Spearman rank correlation coefficient and is given in Table 3.
Table 3. Spearman Rank Correlation Coefficient for Each Grader and Tortuosity Definition
Very low correlations (<0.35) were obtained by three graders for two tortuosity definitions each; these graders most likely misunderstood the instructions or confused the definitions of tortuosity. Across the remaining seven graders, the minimum correlation was 0.55, although this value was based on a consensus rank that still included the grading results of the three low-correlation graders. To overcome this limitation, the very poor correlations (from graders 2, 8, and 10) were considered outliers and their grading results were excluded from the analysis. The resulting correlations, using only data from seven graders, are given in Table 4. In this case, correlations ranged from 0.60 to 0.84, with a mean correlation across all seven graders of 0.66 to 0.70, depending on the tortuosity definition. Interestingly, for all seven graders, T2 correlated more strongly with the group average than T1, and T4 more strongly than T3. Definitions T2 and T4 considered only the most tortuous nerve in the image.
Table 4. Spearman Rank Correlation Coefficient for Each Grader and Tortuosity Definition Based on Seven Graders
To graphically represent the degree of variability in image grading among the seven graders for the four tortuosity definitions, the error in image rank for each image and grader was plotted against the consensus rank, with images ordered from lowest to highest average tortuosity (i.e., image 1 had the lowest average tortuosity and image 30 the highest, for a particular tortuosity definition). These plots are shown in Figure 4.
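A minimal matplotlib sketch of this style of plot is given below, using randomly generated rank errors in place of the study's grading data; the layout (seven points per image, dashed lines at ±10 positions) follows the Figure 4 caption, and all names are illustrative.

```python
# A minimal sketch of a Figure-4-style error plot, using randomly generated
# demonstration data in place of the study's grading results.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_images, n_graders = 30, 7

# Hypothetical rank errors: for each image (ordered by consensus tortuosity),
# the signed difference between each grader's rank and the consensus rank.
errors = rng.integers(-12, 13, size=(n_images, n_graders))

fig, ax = plt.subplots(figsize=(7, 3))
for img in range(n_images):
    ax.plot([img + 1] * n_graders, errors[img], "k.", markersize=4)
ax.axhline(10, linestyle="--", color="gray")
ax.axhline(-10, linestyle="--", color="gray")
ax.set_xlabel("Image number (ordered by increasing consensus tortuosity)")
ax.set_ylabel("Rank error vs. consensus")
plt.tight_layout()
plt.show()
```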
Figure 4. Distribution of grading error for seven observers for images ordered by increasing tortuosity, based on four different tortuosity definitions T1 through T4. For each image number on the x-axis, seven black dots indicate the error in ranking of that image by each observer for the given tortuosity definition (relative to the consensus rank for that particular image and definition). Dashed lines are indicated at an error level of ±10 image positions in the ranking to facilitate comparisons between the various definitions.
Several features became apparent when comparing the resulting distributions. The distributions confirmed that T2 and T4 had fewer outliers and better overall agreement among graders (lower overall error) than T1 and T3, respectively. In addition, images with the lowest and the highest tortuosity had better interobserver agreement than images with a moderate level of tortuosity. The definitions T2 and T4 also appeared to improve agreement for images with a moderate to high level of tortuosity, compared with the T1 and T3 definitions. To numerically assess the level of agreement across the different tortuosity definitions, the consensus ranks (i.e., the average ranked lists of images across all seven observers) were determined, and the average correlation with the consensus rank across the seven observers was computed for each definition (Table 5).
Table 5. Average Spearman Rank Correlation Coefficients Between Graders and the Consensus Rank for the Generic Tortuosity Assessment in Experiment 1 (CRS, Average Correlation Across All Three Sessions) and for Each Specific Tortuosity Definition in Experiment 2 (CRT, Average Correlation of Graders With the Consensus Rank for a Specific Tortuosity Definition)
Table 5 shows that the lower error for T2 and T4 noted in Table 4 and Figure 4 corresponds to a higher overall interobserver agreement for CRT2 and CRT4 relative to CRT1 and CRT3, respectively. In other words, assessing only the nerve with the highest tortuosity in the image improved the average interobserver agreement with the consensus rank. Moreover, the focused definitions of tortuosity in Experiment 2 all had better interobserver agreement than the subjective tortuosity assessment in Experiment 1. The consensus-rank image orderings for the T2 and T4 definitions are shown in Figure 5. Finally, to assess the level of overlap between the generic and specific tortuosity definitions in terms of grader perception, Spearman correlations between all pairs of consensus ranks CRT1 through CRT4 and CRS1 through CRS3 were computed (Table 6).
Figure 5. Images sorted by the T2 consensus rank, upper panels (increasing tortuosity from top left to bottom right), and images sorted by the T4 consensus rank, lower panels (increasing tortuosity from top left to bottom right). The positions of the images are substantially different depending on the tortuosity definition used. All images, 400 × 400 μm.
Table 6. Spearman Rank Correlation Between Pairs of Consensus Ranks in Experiments 1 and 2
The consensus ranks CRS1 through CRS3 were highly correlated with each other (correlation coefficients 0.93–0.94), indicating very good reproducibility of the generic tortuosity assessment between sessions. The correlation between the generic assessments CRS1 through CRS3 and CRT1 through CRT2 (short-range tortuosity) was relatively weak (0.50–0.66), whereas CRS1 through CRS3 correlated strongly (0.86–0.91) with CRT3 through CRT4 (long-range tortuosity). For the specific definitions, CRT1 and CRT2 were highly correlated with each other (0.94), as were CRT3 and CRT4 (0.97). Interestingly, however, the correlations between perceptions of short- and long-range tortuosity were very weak (0.27–0.40).
Discussion
Increased tortuosity in peripheral nerves, such as those of the corneal subbasal nerve plexus, may reflect pathophysiologic processes in the body that may otherwise be difficult to assess. The cornea in particular presents an ideal location for noninvasive, two-dimensional imaging of a nerve plexus within which morphologic parameters, such as tortuosity, can be assessed. Other techniques of peripheral nerve examination, such as skin biopsy,22 do not provide a two-dimensional view and are, therefore, less amenable to investigation of long-range tortuous nerve paths. Nevertheless, despite the ease of imaging the corneal subbasal nerve plexus, standard definitions of tortuosity are lacking.
Several distinct methods to quantify subbasal nerve tortuosity have been described.1,4,8,15 One widely adopted method, first published by Oliveira-Soto and Efron,1 involves a subjective, semiquantitative grading of tortuosity on a scale from 0 (no tortuosity) to 4 (maximum tortuosity), exemplified by representative images containing relatively few nerves. Using this method, in which the observer considers the frequency and amplitude of directional changes, a normal, healthy cornea had subbasal nerves with an average tortuosity grade of 1.2. Although the scale is simple to implement, differences in interpretation have led to variability in tortuosity values even for healthy subject groups (e.g., see the comparison by Patel and McGhee2). Moreover, while the scale has been used widely, no information is available concerning agreement between expert observers, and the scale itself was developed based only on subbasal nerves in healthy corneas.
In another common method, first described by Kallinikos et al.,8 a numerical tortuosity coefficient is computed from the number and frequency of directional changes, derived from the first and second derivatives of the nerve paths. Whereas the strengths of this method are its precise mathematical definition, reproducibility, and objectivity, it is nevertheless difficult for an observer to intuitively relate the numeric coefficient to the level of tortuosity in a given image, owing to the use of different scaling factors. Moreover, this nerve fiber tortuosity coefficient has proven highly variable in patients with diabetic neuropathy, having been reported as increased, decreased, and, indeed, unchanged compared with control subjects.10,12,20
Scarpa et al.4 developed an alternative mathematical description of tortuosity by taking into account the number of changes in the curvature of a nerve combined with the amplitude of these changes. The measure also included an empirical weighting function to boost the average tortuosity of nerves in an image with one or a few very tortuous nerves. The strength of this technique is the combination of an objective mathematical description with perceptions of a human observer. A weakness, however, is that algorithm parameters were empirically derived based on one image set analyzed by a single observer. Moreover, no information about the perception of multiple observers or interobserver variation was given. 
Recently, another completely automated algorithm for tortuosity quantification was described by Ziegler et al.15 in a population of healthy subjects and patients newly diagnosed with diabetes. The measure, called CNFTo, quantified “total absolute nerve fiber curvature.”15 Although its precise definition was unclear, it did not differ between diabetic patients and control subjects, and the numeric value was difficult to interpret with respect to nerve morphology. 
Likely owing to differences in the interpretation and implementation of the above tortuosity measures, quantifying subbasal nerve tortuosity in different pathologies has led to conflicting results. For example, using the same method of Oliveira-Soto and Efron,1 one study reported significantly increased subbasal nerve tortuosity in both Sjögren's and non-Sjögren-associated dry eye,5 while another study found no difference in tortuosity between healthy and dry eye subjects.6 Sampling limitations and differences in study populations, however, cannot be ruled out. Likewise, a recent meta-analysis of corneal nerve assessment in patients with diabetic peripheral neuropathy concluded that there was no difference in nerve tortuosity between healthy subjects and neuropathy patients, or between diabetic patients with and without peripheral neuropathy.23 The meta-analysis also highlighted conflicting results across the examined studies and a lack of standardization.
To improve the reproducibility of nerve tortuosity as a parameter describing subbasal nerves, the present study compared the clinical perception of tortuosity among a group of expert graders. Without a specific tortuosity definition, generic grading of tortuosity was repeatable by a given grader across three sessions (average correlation coefficient of 0.80), but the correlation among different graders was low to moderate (average 0.62). The grading exercise led to the identification of two subtypes of tortuosity, describing either short-range, small-amplitude or long-range, large-amplitude directional changes in the nerves. These specific definitions improved the intergrader agreement compared with generic grading (from 0.62 to 0.79). The short-range and long-range tortuosity measures were poorly correlated with each other, suggesting that they are two distinct and independent measures of tortuosity that are clearly distinguishable by expert graders. The average correlation of a given grader with the consensus rank for the specific tortuosity definitions rose to 0.86 to 0.90 when only the nerve with the maximum tortuosity in an image was considered. Excluding the additional information from the other nerves in an image did not appear to affect the intergrader agreement, since the correlation between using all nerves and using only the most tortuous nerve was very high (0.94–0.97). This approach, therefore, provides a robust means to assess a specific type of tortuosity with high reproducibility and with ease and speed of implementation (since only the most tortuous nerve per image is assessed). Such reproducibility is essential for comparing the results of studies conducted in different centers. Moreover, a reproducible measure of tortuosity could also aid in the possible normalization of subbasal nerve density measurements.24 It is also interesting to note that the specific tortuosity definitions in this study are largely independent of prior tortuosity parameters; for example, the correlation between the consensus rank for T2 (CRT2) and the tortuosity coefficient reported by Kallinikos et al.8 is 0.24.
The set of 30 “consensus rank”–ordered subbasal nerve images based on short- and long-range tortuosity definitions is shown in Figure 5, and is available in the public domain at http://bioimlab.dei.unipd.it. It is envisioned that these ordered image sets, representing a wide selection of tortuous nerves from various pathologies and firmly based on a consensus between expert graders, will be used by researchers to develop algorithms to quantify short- and long-range tortuosity. This in turn can lead to objective, reproducible, and expert-validated automated means to assess the two distinct types of nerve tortuosity in future studies. 
While this study focused on perception and definitions of tortuosity, reproducibility in tortuosity analysis can be subject to sampling bias regardless of the definition used. Further efforts are required to develop a standardized sampling strategy taking into consideration the sampling area, number of distinct images to use, averaging, location of images, whorl region, and so on. 
Although it remains to be seen how short-range and long-range tortuosity parameters are related to subbasal nerves in healthy and pathologic corneas, it is hypothesized that one or both of these distinct parameters may better describe the morphology of the nerves (and may better correlate to disease) than the single tortuosity parameters used today. 
Acknowledgments
Disclosure: N. Lagali, None; E. Poletti, None; D.V. Patel, None; C.N.J. McGhee, None; P. Hamrah, None; A. Kheirkhah, None; M. Tavakoli, None; I.N. Petropoulos, None; R.A. Malik, None; T.P. Utheim, None; A. Zhivov, None; O. Stachs, None; K. Falke, None; S. Peschel, None; R. Guthoff, None; C. Chao, None; B. Golebiowski, None; F. Stapleton, None; A. Ruggeri, None 
References
1. Oliveira-Soto L, Efron N. Morphology of corneal nerves using confocal microscopy. Cornea. 2001; 20: 374–384.
2. Patel DV, McGhee CNJ. In vivo confocal microscopy of human corneal nerves in health, in ocular and systemic disease, and following corneal surgery: a review. Br J Ophthalmol. 2009; 93: 853–860.
3. Patel DV, Tavakoli M, Craig JP, Efron N, McGhee CNJ. Corneal sensitivity and slit scanning in vivo confocal microscopy of the subbasal nerve plexus of the normal central and peripheral human cornea. Cornea. 2009; 28: 735–740.
4. Scarpa F, Zheng X, Ohashi Y, Ruggeri A. Automatic evaluation of corneal nerve tortuosity in images from in vivo confocal microscopy. Invest Ophthalmol Vis Sci. 2011; 52: 6404–6408.
5. Benítez del Castillo JM, Wasfy MA, Fernandez C, Garcia-Sanchez J. An in vivo confocal masked study on corneal epithelium and subbasal nerves in patients with dry eye. Invest Ophthalmol Vis Sci. 2004; 45: 3030–3035.
6. Hoşal BM, Ornek N, Zilelioğlu G, Elhan AH. Morphology of corneal nerves and corneal sensation in dry eye: a preliminary study. Eye. 2005; 19: 1276–1279.
7. Labbé A, Liang Q, Wang Z, et al. Corneal nerve structure and function in patients with non-Sjogren dry eye: clinical correlations. Invest Ophthalmol Vis Sci. 2013; 54: 5144–5150.
8. Kallinikos P, Berhanu M, O'Donnell C, Boulton AJ, Efron N, Malik RA. Corneal nerve tortuosity in diabetic patients with neuropathy. Invest Ophthalmol Vis Sci. 2004; 45: 418–422.
9. De Cillà S, Ranno S, Carini E, et al. Corneal subbasal nerves changes in patients with diabetic retinopathy: an in vivo confocal study. Invest Ophthalmol Vis Sci. 2009; 50: 5155–5158.
10. Mehra S, Tavakoli M, Kallinikos PA, et al. Corneal confocal microscopy detects early nerve regeneration after pancreas transplantation in patients with type 1 diabetes. Diabetes Care. 2007; 30: 2608–2612.
11. Pritchard N, Edwards K, Shahidi AM, et al. Corneal markers of diabetic neuropathy. Ocul Surf. 2011; 9: 17–28.
12. Edwards K, Pritchard N, Vagenas D, Russell A, Malik RA, Efron N. Utility of corneal confocal microscopy for assessing mild diabetic neuropathy: baseline findings of the LANDMark study. Clin Exp Optom. 2012; 95: 348–354.
13. Nitoda E, Kallinikos P, Pallikaris A, et al. Correlation of diabetic retinopathy and corneal neuropathy using confocal microscopy. Curr Eye Res. 2012; 37: 898–906.
14. Maddaloni E, Sabatino F, Del Toro R, et al. In vivo corneal confocal microscopy as a novel non-invasive tool to investigate cardiac autonomic neuropathy in Type 1 diabetes. Diabet Med. 2015; 32: 262–266.
15. Ziegler D, Papanas N, Zhivov A, et al; German Diabetes Study (GDS) Group. Early detection of nerve fiber loss by corneal confocal microscopy and skin biopsy in recently diagnosed type 2 diabetes. Diabetes. 2014; 63: 2454–2463.
16. Villani E, Galimberti D, Viola F, Mapelli C, Del Papa N, Ratiglia R. Corneal involvement in rheumatoid arthritis: an in vivo confocal study. Invest Ophthalmol Vis Sci. 2008; 49: 560–564.
17. Schneider C, Bucher F, Cursiefen C, Fink GR, Heindl LM, Lehmann HC. Corneal confocal microscopy detects small fiber damage in chronic inflammatory demyelinating polyneuropathy (CIDP). J Peripher Nerv Syst. 2014; 19: 322–327.
18. Sturniolo GC, Lazzarini D, Bartolo O, et al. Small fiber peripheral neuropathy in Wilson disease: an in vivo documentation by corneal confocal microscopy. Invest Ophthalmol Vis Sci. 2015; 56: 1390–1395.
19. Ferrari G, Grisan E, Scarpa F, et al. Corneal confocal microscopy reveals trigeminal small sensory fiber neuropathy in amyotrophic lateral sclerosis. Front Aging Neurosci. 2014; 6: 278.
20. Hertz P, Bril V, Orszag A, et al. Reproducibility of in vivo corneal confocal microscopy as a novel screening test for early diabetic sensorimotor polyneuropathy. Diabet Med. 2011; 28: 1253–1260.
21. Patel DV, McGhee CN. Quantitative analysis of in vivo confocal microscopy images: a review. Surv Ophthalmol. 2013; 58: 466–475.
22. Lauria G, Devigili G. Skin biopsy as a diagnostic tool in peripheral neuropathy. Nat Clin Pract Neurol. 2007; 3: 546–557.
23. Jiang MS, Yuan Y, Gu ZX, Zhuang SL. Corneal confocal microscopy for assessment of diabetic peripheral neuropathy: a meta-analysis [published online ahead of print February 12, 2015]. Br J Ophthalmol. doi:10.1136/bjophthalmol.2014.306038.
24. Edwards K, Pritchard N, Vagenas D, Russell A, Malik RA, Efron N. Standardizing corneal nerve fiber length for nerve tortuosity increases its association with measures of diabetic neuropathy. Diabet Med. 2014; 31: 1205–1209.