The median (interquartile range, IQR) baseline age of the 34 patients was 57.6 (50.2, 61.9) years. The median (IQR) follow-up was 9.0 (7.8, 10.1) years with 18 (15, 20) HRT images. The median (IQR) absolute time difference between the baseline HRT examination and the baseline disc photograph was 0 (0, 70) days, and that between the last HRT examination and the last disc photograph was 0 (0, 1) days. The median (IQR) baseline age of the 34 normal control subjects used to derive the TCA specificity estimates was 56.0 (45.4, 64.5) years. They were followed for 10.2 (7.7, 10.8) years with 14 (11, 17) HRT images. The median (IQR) mean pixel height standard deviation (MPHSD) of the HRT images was 21 (16, 27) μm in patients and 19 (14, 29) μm in control subjects. There were no statistically significant differences between patients and control subjects in baseline age (P = 0.52, Mann–Whitney test) or length of follow-up (P = 0.08); however, patients had more examinations (P = 0.01) and a higher MPHSD (P = 0.01).
Observer scores for the 10 optic disc photographs used for the specificity estimates are shown in Figure 2. There were three patients (cases 6, 7, and 9; Fig. 2) in whom all four observers scored 0 (definitely no change) and nine patients in whom at least two observers scored 0 (all except case 3; Fig. 2). A score of 0 (definitely no change) yielded specificities ranging from 50% (observer B) to 90% (observer A). A score of either 0 or 1 (definitely no change or probably no change) yielded specificities ranging from 60% (observer B) to 100% (observer A).
For the remaining 34 photographs in the test set, the agreement between observers was fair according to Landis and Koch’s descriptors (ref. 27), with κ values of 0.22 (95% CI, 0.18–0.26) when a score of 1, 2, or 3 was used for defining progression, 0.24 (95% CI, 0.11–0.38) when a score of 2 or 3 was used, and 0.38 (95% CI, 0.34–0.43) when a score of 3 was used.
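The interobserver κ computation can be sketched as follows. This is a minimal illustration that assumes Fleiss' κ was the multi-rater statistic (the text does not name the variant) and uses made-up scores rather than the study data; the function names `binarize`, `fleiss_kappa`, and `landis_koch` are hypothetical. Scores are first reduced to progression or no progression at a chosen cutoff, and the resulting κ is then mapped to the Landis and Koch descriptor.

```python
from typing import List, Sequence


def binarize(scores: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Map 0-3 observer scores to 1 (progression) if score >= threshold, else 0."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for binary ratings.

    One row per photograph, one column per observer (assumed variant;
    the source does not state which multi-rater kappa was used).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # Number of observers choosing each category (0, 1) for every photograph.
    counts = [[list(row).count(0), list(row).count(1)] for row in ratings]
    # Proportion of agreeing observer pairs for each photograph.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects
    # Overall share of ratings in each category gives the chance agreement.
    share = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in (0, 1)]
    p_expected = sum(p * p for p in share)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch(kappa: float) -> str:
    """Verbal descriptor of agreement strength after Landis and Koch."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical scores (NOT study data): 4 photographs x 4 observers.
toy_scores = [[3, 3, 3, 3], [1, 0, 2, 3], [0, 0, 0, 0], [2, 3, 1, 0]]
kappa_2_or_3 = fleiss_kappa(binarize(toy_scores, threshold=2))
print(round(kappa_2_or_3, 3), landis_koch(kappa_2_or_3))
```

Changing `threshold` to 1 or 3 corresponds to the other two observer criteria described above.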
The agreement rates between the TCA and observers depended on the criterion used to define progression. For the moderate TCA criterion, the overall agreement decreased with increasingly conservative observer criteria (Fig. 3). Agreement was highest for observer B, who classified the highest number of discs as progressing, and lowest for observer A, who classified the lowest number of discs as progressing. For the conservative TCA criterion, the overall agreement generally increased with increasingly conservative observer criteria (Fig. 4) and was highest for observer A and lowest for observer C. The κ statistics of agreement between the TCA and observers are shown in Figure 5 for the moderate and conservative TCA criteria. These data indicate that the agreement between the TCA and observers was generally poorer than that between the observers (Fig. 5); however, there was at least one observer criterion where the TCA-observer agreement was close to or better than the equivalent interobserver agreement. In some instances, κ was <0, indicating worse than chance agreement between the TCA and observers.
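To illustrate how κ can fall below 0, the sketch below computes a two-rater κ between a binary TCA outcome and one observer's binarized score. It assumes Cohen's κ for the pairwise comparison (the text does not specify the statistic), and the inputs are invented values, not study data; `cohen_kappa` is a hypothetical helper name.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) classifications of the same eyes."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must classify the same non-empty set of eyes")
    n = len(a)
    # Observed proportion of eyes on which the two methods agree.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each method's marginal rate of calling progression.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters give one constant label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical calls (NOT study data): 1 = progression, 0 = no progression.
tca_calls = [1, 0, 1, 0]
observer_calls = [0, 1, 0, 1]  # disagrees with the TCA on every eye
print(cohen_kappa(tca_calls, observer_calls))
```

With systematic disagreement the observed agreement is below the chance level, so κ is negative; identical calls give κ = 1.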
Illustrative cases (Figs. 6–8) show the spectrum of agreement between the two methods. For case 1 (right eye), a 54-year-old patient at baseline followed for 10 years, all four observers gave a classification of definitely change (score, 3), and all noted changes in the superior temporal, temporal, and inferior temporal sectors of the disc (Fig. 6). The TCA showed progression with the conservative criterion, with increasing changes throughout the disc over the same follow-up period (Fig. 6 and Movie S1). In case 2 (left eye), a 58-year-old patient at baseline followed for 8 years, there were two observer classifications of probably no change (score, 1) and two of definitely change (score, 3) in the superior and inferior temporal quadrants (Fig. 7). The TCA showed progression with the conservative criterion, with significant changes in the superior temporal, superior nasal, and inferior temporal quadrants (Fig. 7 and Movie S2). Finally, for case 3 (right eye), a 45-year-old patient at baseline followed for 11 years, there was one classification of definitely no change (score, 0), two of probably change (score, 2), and one of definitely change (score, 3) in the temporal and inferior temporal sectors (Fig. 8). There was significant progression with the conservative TCA criterion, with overall widening of the cup and additional surface changes in the neuroretinal rim (Fig. 8 and Movie S3).
The TCA hit rates (proportion of progressing cases) for the three criteria, liberal (specificity, 81%), moderate (specificity, 94%), and conservative (specificity, 97%), were 94%, 77%, and 35%, respectively (Fig. 9). Each observer’s hit rates for the three criteria used are also shown in Figure 9: probably no change, probably change, or definitely change (score, 1, 2, or 3); probably change or definitely change (score, 2 or 3); and definitely change (score, 3). These data show substantial differences among the observers. For example, observer A yielded high specificity, but even the most conservative TCA criterion yielded a higher hit rate with a slightly reduced specificity for the two more conservative observer criteria. Although observer B yielded comparable hit rates to the TCA, the specificity was notably poorer. The responses of observers C and D were between those of observers A and B.
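The relation between a score cutoff, the hit rate, and the specificity can be sketched as below. This is an illustrative reading of the definitions in the text, assuming that the hit rate is the fraction of test-set eyes scored at or above the cutoff and that specificity is the fraction of specificity-set photographs scored below it; the helper names and the scores are hypothetical and are not study data.

```python
from typing import Sequence


def hit_rate(test_scores: Sequence[int], threshold: int) -> float:
    """Fraction of test-set eyes called progressing (score >= threshold)."""
    return sum(s >= threshold for s in test_scores) / len(test_scores)


def specificity(stable_scores: Sequence[int], threshold: int) -> float:
    """Fraction of specificity-set photographs called not progressing (score < threshold)."""
    return sum(s < threshold for s in stable_scores) / len(stable_scores)


# Hypothetical scores from a single observer (NOT study data).
toy_test_scores = [3, 2, 0, 1]
toy_stable_scores = [0, 0, 1, 2, 0]

for cutoff in (1, 2, 3):
    print(
        cutoff,
        hit_rate(toy_test_scores, cutoff),
        specificity(toy_stable_scores, cutoff),
    )
```

Raising the cutoff can only lower the hit rate and raise the specificity, which is the trade-off against which the TCA criteria are compared.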
Using combined responses from the observers allowed comparison with the TCA over a wide range of criteria, from the most liberal (progression defined as a score of 1, 2, or 3 from only one observer) to the most conservative (progression defined as a score of 3 from all four observers). These data show that for each of the three observer criteria, there was one of four possible combined responses where the observer performance was similar to or better than that of the TCA; however, in all other cases, the observer specificity and hit rates were lower than those of the TCA (Fig. 10).
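One way to read the combined observer responses is as a rule of the form "at least k of the four observers gave a score at or above the cutoff". The sketch below enumerates that grid under this assumption; the exact combination rule is inferred from the description of the most liberal and most conservative cases, and the function names and example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def combined_progression(scores: Sequence[int], threshold: int, min_observers: int) -> bool:
    """True if at least `min_observers` observers scored the eye >= threshold."""
    return sum(s >= threshold for s in scores) >= min_observers


def all_combined_criteria(
    n_observers: int = 4, thresholds: Sequence[int] = (1, 2, 3)
) -> List[Tuple[int, int]]:
    """Every (threshold, min_observers) pair, from most liberal to most conservative."""
    return [(t, k) for t in thresholds for k in range(1, n_observers + 1)]


# Hypothetical scores from four observers for one eye (NOT study data).
toy_eye = (3, 1, 0, 2)
for threshold, k in all_combined_criteria():
    print(threshold, k, combined_progression(toy_eye, threshold, k))
```

Three score cutoffs and four observer counts give 12 combined criteria, matching the four possible combined responses per observer criterion mentioned above.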