**Purpose**:
To assess the performance of a novel system for automated tortuosity estimation and interpretation.

**Methods**:
A supervised strategy (driven by observers' grading) was employed to automatically identify the combination of tortuosity measures (i.e., tortuosity representation) leading to the best agreement with the observers. We investigated 18 tortuosity measures including curvature and density of inflection points, computed at multiple spatial scales. To leverage tortuosity interpretation, we propose the tortuosity plane (TP) onto which each image is mapped. Experiments were carried out on 140 images of subbasal nerve plexus of the central cornea, covering four levels of tortuosity. Three experienced observers graded each image independently.

**Results**:
The best tortuosity representation was the combination of mean curvature at spatial scales 2 and 5. These tortuosity measures were the axes of the proposed TP (*interpretation*). The system for tortuosity *estimation* revealed strong agreement with the observers on a *global* and *per-level* basis. The agreement with each observer (Spearman's correlation) was statistically significant (*α*_{s} = 0.05, *P* < 0.0001) and higher than that of at least one of the other observers in two out of three cases (*ρ*_{OUR} = 0.7594 versus *ρ*_{Obs3} = 0.7225; *ρ*_{OUR} = 0.8880 versus *ρ*_{Obs1} = 0.8017, *ρ*_{Obs3} = 0.7315). Based on paired-sample *t*-tests, these improvements were significant (*P* < 0.001).

**Conclusions**:
Our automated system stratifies images by four tortuosity levels (discrete scale) matching or exceeding the accuracy of experienced observers. Of importance, the TP allows the assessment of tortuosity on a two-dimensional continuous scale, thus leading to a finer discrimination among images.

For clinical evaluation of the corneal nerve structures in patients,^{1–7} in vivo confocal microscopy (IVCM) is often used, which visualizes corneal nerves and in particular the subbasal nerve plexus.^{6,8,9} In addition to measuring the corneal nerve density, this imaging modality enables the assessment of morphologic features of the nerves such as width, reflectivity, orientation, branching patterns, bead-like formations, and tortuosity.^{10,11}

However, tortuosity has no widely accepted definition,^{12–19} and it is often estimated qualitatively by comparing patient images with reference images.^{11} Moreover, several corneal nerve fibers with different tortuosity characteristics (e.g., amplitude, number of inflection points) might be present in the same image. This inevitably introduces a degree of subjectivity; therefore, quantitative, objective, and repeatable methods of tortuosity estimation would advance the field.

Automated tortuosity estimation has been investigated extensively for retinal vessels^{20–27} and for the brain vasculature^{28} (see Ref. 29 for a recent comparative study). In contrast, there is very limited published work on corneal nerve fibers; Kallinikos et al.^{30} were the first to propose an objective, semiautomated method for quantifying the tortuosity of corneal nerve fibers. Scarpa et al.^{31} adapted the algorithm proposed by Grisan et al.^{21} for retinal vessels to work with corneal nerve fibers.

**Figure 1**


In our previous work,^{32} we addressed points 1 and 2, investigating the role played by different spatial scales (i.e., different frequencies of turns along the fiber) in tortuosity estimation (Fig. 1). Experimental results suggested that the notion of tortuosity can be captured quantitatively in a richer and more informative way by computing the tortuosity measures at multiple spatial scales, and we proposed an automated methodology to compute it. The need to measure tortuosity at multiple spatial scales is further strengthened by a recent study^{33} highlighting the roles played by “short- and long-range tortuosity” in tortuosity grading, which we (a) formalized mathematically with multiscale analysis and (b) discovered through a machine learning algorithm.^{32}

Images were acquired from the subbasal layer. For imaging, a disposable sterile polymethylmethacrylate cap (Tomo-Cap; Heidelberg Engineering) was filled with a layer of hypromellose 0.3% gel (GenTeal; Alcon, Fort Worth, TX, USA) and mounted on the Cornea Module. After topical anesthesia with 0.5% proparacaine hydrochloride (Alcon), a drop of hypromellose 0.3% gel was applied to both eyes. One drop of this gel was also put on the outside tip of the cap to improve optical coupling. Then, the Cornea Module of the confocal microscope was advanced until the gel contacted the central surface of the cornea. Digital images were then recorded at a rate of 3 frames per second in sequence mode, including 100 images per sequence, from all layers of the cornea.^{11}

Nerves were traced using NeuronJ (in the public domain at http://www.imagescience.org/meijering/software/neuronj/), a plug-in for the ImageJ software (in the public domain at http://imagej.nih.gov/ij).

**Figure 2**


These three measures^{22,24,25,34} are computed at six spatial scales to take into account the contribution of turns at six different frequencies. As Figure 3 shows, we start by measuring curvatures and inflection points on the original fiber (including all the frequencies, scale 1). Then, we smooth out its shape to remove high-frequency turns and concentrate on lower-frequency turns (scales 2–6).

**Table 1**

**Figure 3**


As in our previous work,^{32} we use the density of inflection points (i.e., the number of inflection points per unit length) rather than their number, to yield a consistent comparison among fibers of different length.
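The per-fiber computation described above can be sketched in code. A minimal Python sketch, assuming Gaussian smoothing of the traced polyline as the scale operator and finite differences for curvature; the paper does not specify these implementation details, and all names and sigma values are illustrative:

```python
import numpy as np

def gaussian_smooth(points, sigma):
    """Smooth a 2-D fiber polyline (N x 2) with a truncated Gaussian kernel."""
    if sigma == 0:
        return points.astype(float).copy()
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    out = np.empty_like(points, dtype=float)
    for d in range(points.shape[1]):
        # pad by edge replication so the fiber endpoints stay anchored
        padded = np.pad(points[:, d].astype(float), radius, mode="edge")
        out[:, d] = np.convolve(padded, kernel, mode="valid")
    return out

def curvature_and_inflections(points):
    """Mean absolute signed curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^1.5
    and density of inflection points (sign changes of k per unit length)."""
    x, y = points[:, 0], points[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    k = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    length = np.sum(np.hypot(np.diff(x), np.diff(y)))
    inflections = np.count_nonzero(np.diff(np.sign(k)) != 0)
    return np.mean(np.abs(k)), inflections / length

# Six spatial scales: scale 1 is the raw tracing; larger scales smooth away
# high-frequency turns. A synthetic fiber with low- and high-frequency turns:
t = np.linspace(0, 4 * np.pi, 400)
fiber = np.column_stack([t, 0.5 * np.sin(t) + 0.05 * np.sin(10 * t)])
for scale, sigma in enumerate([0, 2, 4, 8, 16, 32], start=1):
    mean_curv, infl_density = curvature_and_inflections(gaussian_smooth(fiber, sigma))
```

At the coarser scales the high-frequency ripple is removed, so the inflection-point density drops while the low-frequency turns remain, which is the behavior the multi-scale analysis relies on.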

Feature selection^{35–37} is a class of machine learning algorithms identifying the most discriminative features (or measures) for a task, here tortuosity assessment. As our pool of measures is computed at multiple spatial scales, it also selects the most discriminative scales.

We adopt a feature selection scheme^{37} in which the classification is based on a multinomial logistic ordinal regressor (MLOR). The MLOR models the log-cumulative odds^{38}; that is, the logarithm of the ratio of two probabilities: the probability that an image *y* has tortuosity level *c*_{j} or lower, Pr(*y* ≤ *c*_{j}), and the probability that its tortuosity level is greater than *c*_{j}, Pr(*y* > *c*_{j}), using a linear combination of the *p* tortuosity measures (features) *X*_{i}:

log [ Pr(*y* ≤ *c*_{j}) / Pr(*y* > *c*_{j}) ] = *α*_{j} + Σ_{i=1}^{p} *β*_{i} *X*_{i}

where *j* ∈ {1, 2, 3}, and *α*_{j} and *β*_{i} are the model weights determining the final assignment.

Each candidate combination of tortuosity measures is used to train an MLOR model (i.e., to estimate the weights *α* and *β*), and the trained model is used to assign the tortuosity level to a subset of training images. In this framework, the best combination of tortuosity measures and their scales is the one minimizing the number of misclassified images. Notice, we used cross-validation (i.e., randomly splitting the data set into training and test sets) to provide a fair assessment.
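The selection loop just described can be sketched as an exhaustive wrapper search over feature subsets scored by cross-validated misclassification counts. A minimal Python sketch; the nearest-centroid classifier is a deliberately simple stand-in for the paper's MLOR, and the function names and synthetic data are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def nearest_centroid_errors(X_tr, y_tr, X_te, y_te):
    """Misclassification count of a nearest-centroid classifier
    (a simple stand-in for the MLOR used in the paper)."""
    levels = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in levels])
    dists = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    pred = levels[np.argmin(dists, axis=1)]
    return int(np.sum(pred != y_te))

def wrapper_selection(X, y, n_folds=5, seed=0):
    """Score every feature subset by cross-validated misclassification
    count; return the subset with the fewest errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_subset, best_errors = None, np.inf
    for r in range(1, X.shape[1] + 1):
        for subset in combinations(range(X.shape[1]), r):
            errors = 0
            for k in range(n_folds):
                te = folds[k]
                tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
                errors += nearest_centroid_errors(
                    X[np.ix_(tr, subset)], y[tr], X[np.ix_(te, subset)], y[te])
            if errors < best_errors:
                best_subset, best_errors = subset, errors
    return best_subset, best_errors

# Synthetic check: features 0 and 2 track the level, feature 1 is noise
rng = np.random.default_rng(1)
y = np.repeat([1, 2, 3, 4], 30)
X = np.column_stack([
    y + 0.1 * rng.normal(size=y.size),      # informative measure
    rng.normal(size=y.size),                # pure noise
    2 * y + 0.2 * rng.normal(size=y.size),  # informative measure
])
best_subset, n_errors = wrapper_selection(X, y)
```

On such data the search discards the noise feature, mirroring how the paper's procedure retains only the most discriminative measures and scales.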

Once the best tortuosity representation is identified, we train the *final* MLOR model (i.e., using all the training images). The best weights *α* and *β* are estimated for each case (i.e., level 1 versus level 2, 3, or 4; level 1 or 2 versus level 3 or 4; level 1, 2, or 3 versus level 4). Then, the selected tortuosity measures are computed on the new image, and its tortuosity level is obtained using the maximum a posteriori criterion: the tortuosity level that the MLOR predicts as the most likely among the four considered is taken as the *estimated* tortuosity level.
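The maximum a posteriori assignment can be sketched directly from the cumulative-logit model. A minimal Python sketch, assuming the standard form Pr(*y* ≤ *c*_{j}) = sigmoid(*α*_{j} + *β* · *x*); the weight values are illustrative assumptions, not the fitted values from the paper:

```python
import numpy as np

def mlor_level_probs(x, alphas, betas):
    """Posterior over the four tortuosity levels under a cumulative-logit
    (MLOR) model: Pr(y <= c_j) = sigmoid(alpha_j + beta . x) for j = 1..3,
    with Pr(y <= c_4) = 1; level probabilities follow by differencing."""
    z = np.asarray(alphas, float) + np.dot(np.asarray(betas, float), x)
    cum = 1.0 / (1.0 + np.exp(-z))          # Pr(y <= c_1), ..., Pr(y <= c_3)
    cum = np.append(cum, 1.0)               # Pr(y <= c_4) = 1
    return np.diff(np.concatenate(([0.0], cum)))

def map_tortuosity_level(x, alphas, betas):
    """Maximum a posteriori criterion: pick the most likely of the 4 levels."""
    return int(np.argmax(mlor_level_probs(x, alphas, betas))) + 1

# Illustrative weights (assumptions): alphas must be increasing so the
# cumulative probabilities are monotone in j; negative betas mean that
# higher curvature values push the image toward higher tortuosity levels.
ALPHAS = [1.0, 2.5, 4.0]
BETAS = [-1.5, -0.8]
```

With these weights an image with low values of the two selected measures falls in level 1, and one with high values in level 4, as expected.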

Existing approaches typically summarize an image with a *single* tortuosity level (or coefficient value), obfuscating the different effects of the factors considered and limiting interpretations that could be important to the ophthalmologist for diagnosis. Using the geometric interpretation of the MLOR model, we map each IVCM image onto a plane whose axes are the best tortuosity measures identified automatically by the feature selection procedure (in our case, weighted mean curvatures at spatial scales 2 and 5, corresponding to the fourth and 13th tortuosity measures in Table 1, respectively). Geometrically, the best weights *α* and *β* of the MLOR model define the best (in the MLOR sense) linear *decision boundaries* separating the TP into four regions corresponding to the four tortuosity levels (black lines in Fig. 4). Any new corneal nerve image can be plotted as a point on the TP by computing the values of the two best tortuosity measures (Fig. 4 shows four examples). The region containing the new point gives the estimated tortuosity level for the corneal nerve image.

**Figure 4**


The TP also provides a *level of confidence* for the estimated tortuosity level, quantifying the reliability of the system. This confidence is given by the probability of belonging to one of the four regions (i.e., tortuosity levels) estimated automatically by the MLOR model and is intuitively proportional to the distance of the point (i.e., image) from the linear decision boundaries. In fact, the closer a point lies to the boundary between two adjacent regions, the less reliable the estimated tortuosity level. The level of confidence for *all* the points on the TP can be estimated once the system is trained (i.e., before analyzing the target images) and color coded for immediate, intuitive visualization (Fig. 4).
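The color-coded confidence map can be sketched by evaluating the model posterior over a dense grid covering the TP. A Python sketch under the same cumulative-logit assumption, with illustrative (not fitted) weights:

```python
import numpy as np

# Illustrative MLOR weights for the two TP axes (assumptions, not fitted):
ALPHAS = np.array([1.0, 2.5, 4.0])   # increasing thresholds
BETAS = np.array([-1.5, -0.8])

def level_probs(points):
    """Posterior over the four levels for each point (N x 2) on the TP."""
    z = points @ BETAS
    cum = 1.0 / (1.0 + np.exp(-(z[:, None] + ALPHAS[None, :])))
    cum = np.hstack([cum, np.ones((len(points), 1))])   # Pr(y <= c_4) = 1
    return np.diff(np.hstack([np.zeros((len(points), 1)), cum]), axis=1)

# Dense grid over the plane of the two selected tortuosity measures
xs = np.linspace(0.0, 4.0, 200)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])

probs = level_probs(grid)
regions = probs.argmax(axis=1).reshape(gx.shape) + 1   # tortuosity level map
confidence = probs.max(axis=1).reshape(gx.shape)       # color-codable map
```

The confidence is lowest near the decision boundaries, matching the intuition above; `regions` and `confidence` are exactly the two layers one would render to reproduce a figure like Figure 4.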

Following Sokolova and Lapalme,^{39} we use accuracy (Acc), sensitivity (Se), specificity (Sp), positive predictive value (Ppv), and negative predictive value (Npv), defined for the *i*-th tortuosity level as

Acc_{i} = (TP_{i} + TN_{i}) / (TP_{i} + TN_{i} + FP_{i} + FN_{i}), Se_{i} = TP_{i} / (TP_{i} + FN_{i}), Sp_{i} = TN_{i} / (TN_{i} + FP_{i}), Ppv_{i} = TP_{i} / (TP_{i} + FP_{i}), Npv_{i} = TN_{i} / (TN_{i} + FN_{i}),

where TP_{i} indicates true positives; TN_{i}, true negatives; FP_{i}, false positives; and FN_{i}, false negatives for the *i*-th tortuosity level (*N* = 4). We use weighted averaging across the *N* levels as our data set is unbalanced, as suggested by Sokolova and Lapalme.^{39} The above performance measures do not take into account the distance from the actual value when considering errors. To this aim, we also include the mean squared error (MSE), defined as

MSE = (1 / *N*_{imgs}) Σ_{i=1}^{N_{imgs}} (*y*_{i} − *l*_{i})^{2},

where *N*_{imgs} is the total number of test images, *y*_{i} is the predicted tortuosity level, and *l*_{i} the true one, for the *i*-th image.
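These per-level definitions and the weighted averaging translate directly into code. A minimal Python sketch (function names are ours; the weighting is by the number of true images per level, our reading of the weighted averaging suggested for unbalanced data):

```python
import numpy as np

def per_level_counts(y_true, y_pred, level):
    """TP/TN/FP/FN treating one tortuosity level as the positive class."""
    tp = np.sum((y_pred == level) & (y_true == level))
    tn = np.sum((y_pred != level) & (y_true != level))
    fp = np.sum((y_pred == level) & (y_true != level))
    fn = np.sum((y_pred != level) & (y_true == level))
    return tp, tn, fp, fn

def weighted_metrics(y_true, y_pred, levels=(1, 2, 3, 4)):
    """Acc/Se/Sp/Ppv/Npv per level, averaged with weights proportional to
    the number of true images per level (weighted averaging for an
    unbalanced data set)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    weights = np.array([np.sum(y_true == c) for c in levels], dtype=float)
    weights /= weights.sum()
    out = {m: 0.0 for m in ("Acc", "Se", "Sp", "Ppv", "Npv")}
    for w, level in zip(weights, levels):
        tp, tn, fp, fn = per_level_counts(y_true, y_pred, level)
        out["Acc"] += w * (tp + tn) / (tp + tn + fp + fn)
        out["Se"]  += w * tp / (tp + fn) if (tp + fn) else 0.0
        out["Sp"]  += w * tn / (tn + fp) if (tn + fp) else 0.0
        out["Ppv"] += w * tp / (tp + fp) if (tp + fp) else 0.0
        out["Npv"] += w * tn / (tn + fn) if (tn + fn) else 0.0
    return out

def mse(y_true, y_pred):
    """Mean squared error between predicted and true tortuosity levels."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_pred - y_true) ** 2))
```

Unlike the count-based metrics, `mse` penalizes a two-level mistake more than a one-level mistake, which is why it is reported alongside them.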

The statistical significance of performance differences in terms of MSE was assessed with paired-sample *t*-tests.

Tortuosity grading by human observers is subjective, and observers may disagree on the same image (see Ref. 40). To address this problem, we create a consensus ground truth (CGT): the tortuosity level of each image is given by majority voting (i.e., the CGT tortuosity level for an image is the one assigned independently by at least two expert observers). Consistent with the difficulty of the task, 11 images were discarded owing to complete disagreement (all assigned levels different). We trained the system using the CGT and achieved promising global-level performance, reported previously (Annunziata et al.,^{32} Table 1). However, some tortuosity levels might be easier to estimate than others, or a specific level could be predicted with poor performance. To better investigate this point, we compute each performance measure on a per-level basis and show the results in Table 2. Our system achieves almost 90% accuracy for tortuosity levels 1 and 4. Performance decreases for the middle levels owing to smaller differences with neighboring regions. Quantitatively, sensitivity and positive predictive value decrease considerably owing to the increase in false negatives. Qualitative considerations on the TP are made below.
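The majority-voting rule for the CGT can be sketched as follows; the image identifiers and gradings are illustrative:

```python
from collections import Counter

def consensus_ground_truth(gradings):
    """Majority voting over observers' levels per image: keep the level
    assigned by at least two observers; discard images with complete
    disagreement (all assigned levels different)."""
    cgt, discarded = {}, []
    for image_id, levels in gradings.items():
        level, count = Counter(levels).most_common(1)[0]
        if count >= 2:
            cgt[image_id] = level
        else:
            discarded.append(image_id)
    return cgt, discarded

# Hypothetical gradings by three observers
gradings = {"img01": (2, 2, 3), "img02": (1, 2, 4), "img03": (4, 4, 4)}
cgt, discarded = consensus_ground_truth(gradings)
```

Here `img02` is discarded because the three observers assigned three different levels, the same criterion that removed 11 images from the data set.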

**Table 2**

The agreement of our system with each observer (Spearman's correlation) was statistically significant (*α*_{s} = 0.05, *P* < 0.0001). Moreover, the first and second rows of Table 3 show that the correlation of our system with the reference observer is higher than that of at least one of the other observers. Computed *P* values for the paired-sample *t*-tests on the MSE show that these improvements are significant for both hypotheses “MSE_{OURS} < MSE_{Obs3}” (row 1, *P* < 0.001) and “MSE_{OURS} < MSE_{Obs1}” (row 2, *P* < 0.001).
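Spearman's correlation on discrete tortuosity levels (which contain many ties) is the Pearson correlation of tie-averaged ranks. A self-contained Python sketch for illustration; the paper presumably used a standard statistics package for the reported values:

```python
import numpy as np

def average_ranks(values):
    """Ranks with ties replaced by their average rank (needed because
    discrete tortuosity levels produce many tied gradings)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(np.sum(ra * rb) / np.sqrt(np.sum(ra**2) * np.sum(rb**2)))
```

Because it operates on ranks, the statistic rewards any monotone agreement between two graders, which is the appropriate notion of agreement for an ordinal scale.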

**Table 3**

Figure 5 shows the test images mapped onto the TP, color coded by the true tortuosity level (*l* in Equation 7) of each image as per CGT. The four clusters reflecting the tortuosity levels are clearly visible and separated automatically by our system, although some misclassifications are present. We notice that the maximum error made by the system is always within one tortuosity level, consistent with the very low MSE values. Moreover, different regions show different spread on the TP, especially region 2 (very low) and region 4 (very high), suggesting that *intra*-level tortuosity characteristics should be better investigated in future studies. Of interest, tortuosity does not vary much along the *Y* axis, but does vary along the *X* axis, for levels 1 and 2; at levels 3 and 4, tortuosity varies along both the *Y* and *X* axes.

**Figure 5**


Unlike a previous study,^{33} the proposed method attempts to discover the best tortuosity representation and the most discriminative spatial scales for a specific data set, using a fully automated procedure (feature selection). This novel approach to tortuosity estimation has the potential to answer interesting research questions related to the role played by each tortuosity measure and spatial scale for specific disease conditions.

No significant correlation was found (*ρ* = 0.0174, *P* = 0.85). Likewise, averaging each tortuosity measure at the fiber level is important to avoid the effect of higher-order aberrations, which could favor longer fibers, for instance. No significant correlation between total nerve length per image and the tortuosity level assigned by the system was found (*ρ* = −0.1693, *P* = 0.0657).

A representation based on three tortuosity measures could still be visualized as a *tortuosity volume*, but four or more cannot. Although further tests are needed, the need for a higher-dimensional representation including more than three tortuosity measures seems unlikely, given that we used a representative set of clinical IVCM images including different diseases.

Future work will include automated tracing of corneal nerve fibers.^{41–43} This will lead to a fully automated system, which would contribute to disease stratification by tortuosity and better diagnosis based on objective, highly informative measurements.

**R. Annunziata**, None; **A. Kheirkhah**, None; **S. Aggarwal**, None; **B.M. Cavalcanti**, None; **P. Hamrah**, None; **E. Trucco**, None

1. *Semin Ophthalmol*. 2010; 25: 171–177.
2. *Invest Ophthalmol Vis Sci*. 2011; 52: 5136–5143.
3. *Cornea*. 2012; 31: 437–441.
4. *Ophthalmology*. 2015; 122: 2200–2209.
5. *Clin Experiment Ophthalmol*. 2009; 37: 100–117.
6. *Surv Ophthalmol*. 2013; 58: 466–475.
7. *Curr Opin Allergy Clin Immunol*. 2013; 13: 569–576.
8. *Br J Ophthalmol*. 2009; 93: 853–860.
9. *Curr Eye Res*. 2014; 39: 213–231.
10. *Ocul Surf*. 2015; 13: 250–262.
11. *Cornea*. 2001; 20: 374–384.
12. *Cornea*. 2015; 34: 1114–1119.
13. *Am J Ophthalmol*. 2006; 142: 488–490.
14. *Invest Ophthalmol Vis Sci*. 2009; 50: 5155–5158.
15. *Ophthalmology*. 2013; 120: 40–47.
16. *Ophthalmology*. 2010; 117: 1930–1936.
17. *Cornea*. 2015; 34: 745–749.
18. *Eye*. 2012; 26: 126–132.
19. *Stem Cells*. 2012; 30: 1734–1745.
20. *Proceedings of the 2011 18th Iranian Conference of Biomedical Engineering (ICBME)*. Tehran: IEEE; 2011: 181–184.
21. *IEEE Trans Med Imaging*. 2008; 27: 310–319.
22. *Int J Med Inform*. 1999; 53: 239–252.
23. *Med Image Anal*. 2002; 6: 407–429.
24. *Exp Eye Res*. 2013; 106: 40–46.
25. *Proceedings of the 2011 International Conference on Electrical Control and Computer Engineering (INECCE)*. Pahang: IEEE; 2011: 183–186.
26. *IEEE Trans Biomed Eng*. 2010; 57: 2239–2247.
27. *Invest Ophthalmol Vis Sci*. 2008; 49: 3577–3585.
28. *IEEE Trans Med Imaging*. 2003; 22: 1163–1171.
29. *Conf Proc IEEE Eng Med Biol Soc*. 2014: 5414–5417.
30. *Invest Ophthalmol Vis Sci*. 2004; 45: 418–422.
31. *Invest Ophthalmol Vis Sci*. 2011; 52: 6404–6408.
32. *Proceedings of the Ophthalmic Medical Image Analysis (OMIA) First International Workshop, MICCAI 2014*. Boston: Iowa Research Online; 2014: 113–120.
33. *Invest Ophthalmol Vis Sci*. 2015; 56: 5102–5109.
34. *Invest Ophthalmol Vis Sci*. 2008; 49: 3577–3585.
35. *IEEE Trans Pattern Anal Mach Intell*. 2005; 27: 1226–1238.
36. *IEEE/ACM Trans Comput Biol Bioinform*. 2012; 9: 1106–1119.
37. *J Mach Learn Res*. 2003; 3: 1157–1182.
38. *Pattern Recognition and Machine Learning*. New York: Springer-Verlag; 2006.
39. *Inf Process Manag*. 2009; 45: 427–437.
40. *Cornea*. 2013; 32: e83–e89.
41. *Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015*. Munich: Springer International Publishing; 2015: 588–595.
42. *Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015*. Munich: Springer International Publishing; 2015: 596–603.
43. *Engineering in Medicine and Biology Society - EMBS*. Milan: IEEE; 2015: 5655–5658.