Abstract
Purpose. To compare the performance of neural networks for perimetric glaucoma diagnosis when using different types of data inputs: numerical threshold sensitivities, Statpac Total Deviation and Pattern Deviation, and probability scores based on Total and Pattern Deviation probability maps (Carl Zeiss Meditec, Inc., Dublin, CA).
Methods. The results of SITA Standard visual field tests in 213 healthy subjects, 127 patients with glaucoma, 68 patients with concomitant glaucoma and cataract, and 41 patients with cataract only were included. The five different types of input data were entered into five identically designed artificial neural networks. Network thresholds were adjusted for each network. Receiver operating characteristic (ROC) curves were constructed to display the combinations of sensitivity and specificity.
Results. Input data in the form of Pattern Deviation probability scores gave the best results, with an area of 0.988 under the ROC curve, and were significantly better (P < 0.001) than threshold sensitivities, numerical Total Deviations, and Total Deviation probability scores. The second best result was obtained with numerical Pattern Deviations, with an area of 0.980.
Conclusions. The choice of type of data input had important effects on the performance of the neural networks in glaucoma diagnosis. Refined input data, based on Pattern Deviations, resulted in higher sensitivity and specificity than did raw threshold values. Neural networks may have high potential in the production of useful clinical tools for the classification of visual field tests.
Perimetry is one of the most important examinations for diagnosis and monitoring of glaucoma. Static computerized threshold perimetry, in which white stimuli are shown on an evenly illuminated white background, has been the most common type of perimetry in clinical glaucoma management for a long time. The way in which perimetric findings are analyzed and presented is important in the interpretation of test results. Reading fields by looking only at maps of numerical threshold sensitivities, or gray-scale representations of such values, is difficult even for experts. Programs such as the Humphrey Statpac (Carl Zeiss Meditec, Inc., Dublin, CA) [1] for computer-assisted interpretation were developed in the mid- to late 1980s. The probability maps included in the Statpac program are often able to highlight early glaucomatous field defects before they become visible in gray-scale representations of raw threshold values [2] and can also reduce effects caused by cataract [3]. This probability map concept has enjoyed wide acceptance and has subsequently been applied in most new perimetric devices and perimetric modalities, such as frequency-doubling perimetry [4] and short-wavelength automated perimetry [5, 6].
The Glaucoma Hemifield Test (GHT) [7], included in Statpac, is a rather simple expert system based on up-and-down hemifield differences between probability scores calculated from Pattern Deviation probability maps. The GHT was one of the first computerized systems that was able to classify field test results reliably as normal or abnormal and improved the ability of ordinary clinicians to assess visual field test results [8].
In the beginning of the 1990s, artificial neural networks (ANNs), one of many algorithms in the machine learning classifier concept, were tested as a tool for the interpretation of perimetric results (Goldbaum MH, et al. IOVS 1990;31:ARVO Abstract 2471; Keating D, et al. IOVS 1992;33:ARVO Abstract 1394) [9]. ANNs were reported to be able to differentiate between glaucoma and normal visual field status at least as well as trained readers [10]. In other papers, it was also reported that machine learning classifiers discriminate better between normal and glaucomatous fields than do global visual field indices [11, 12]. Global visual field indices are far from ideal as diagnostic tools, however, because they condense all threshold data into one number, resulting in loss of valuable spatial information, and visual field indices are not particularly sensitive to early localized glaucomatous visual field loss [13, 14, 15].
The performance of ANNs has also been compared with that of other types of field interpretation criteria based on localized loss [11]. Disc topography data have also been added to visual field data to improve the diagnostic ability of ANNs [16].
We hypothesized that it may be possible to enhance the diagnostic performance of ANNs further by using input data from which the effects of age and media opacities have been eliminated or reduced and in which measured sensitivities have already been compared to the range of age-corrected normal sensitivities and subsequently translated into probabilities. The Statpac program provides two important analyses: (1) Numerical Total Deviations represent the deviation at each tested point of the measured threshold from age-corrected normal values. (2) Numerical Pattern Deviations represent a modification of the Total Deviation results in which a correction has been applied to account for any general elevation or depression of the field caused by media opacities or changes in pupil size. Total and Pattern Deviation probability maps are graphic presentations of the significances of the numerical deviations, relative to the known ranges of normal values at each test point location.
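The relationship between the two numerical analyses described above can be illustrated with a short Python sketch. It is not the Statpac algorithm: the age-corrected normal values are placeholder numbers, and the general-height correction is assumed here to be a high percentile (85th) of the Total Deviation values, which is one common way of estimating a general depression of the field. The function names are hypothetical.

```python
import numpy as np

def total_deviation(thresholds_db, age_normal_db):
    """Pointwise difference (dB) between measured and age-corrected normal thresholds."""
    return np.asarray(thresholds_db, dtype=float) - np.asarray(age_normal_db, dtype=float)

def pattern_deviation(td_db, percentile=85.0):
    """Subtract an estimate of the general height of the field from the TD values.

    ASSUMPTION: the general height is estimated as a high percentile of the
    Total Deviation values; the exact Statpac rule is not given in the text.
    """
    td = np.asarray(td_db, dtype=float)
    general_height = np.percentile(td, percentile)
    return td - general_height

# Toy field with 74 test points (placeholder normal values of 30 dB everywhere).
age_normal = np.full(74, 30.0)
measured = age_normal - 4.0      # uniform 4 dB depression, as cataract might cause
measured[:6] -= 10.0             # additional localized defect at six points

td = total_deviation(measured, age_normal)
pd_values = pattern_deviation(td)
```

In this toy field, the uniform 4 dB depression appears at every point in the Total Deviation values but is removed from the Pattern Deviation values, leaving only the localized 10 dB defect.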
The purpose of this study was to test our hypothesis by comparing sensitivities and specificities achieved by ANNs for glaucoma diagnosis by using different types of perimetric inputs: numerical threshold values in decibels, and Statpac numerical Total and Pattern Deviations and probabilities.
The field tests of patients with glaucoma were randomly selected from the directory of fields included in the database of one of our Humphrey Field Analyzers. This database consisted of 11,134 tests of 3,629 patients, almost all assessed by the 30-2 SITA Standard program. The directory was sorted in alphabetic order according to the patient’s surname. Starting with the letter A, one field test was randomly selected from every fifth patient; a patient’s first field test was never selected, to avoid learning effects. The selected patients were then matched to our glaucoma register. Only patients with a diagnosis of glaucoma or suspected glaucoma were eligible, and patient records were retrieved. In this way, 643 SITA Standard 30-2 test results were selected to be evaluated for inclusion. At this point the only information available was that the patient had undergone 30-2 SITA Standard visual field testing at least twice, and that the patient had a diagnosis of suspected glaucoma or glaucoma. After the patient records were retrieved, disc photographs obtained before the selected field test were inspected. Fields of all eyes with glaucomatous disc appearance were deemed usable. A comprehensive description of disc topography was required in patient records lacking disc photographs. A description of lens status was also required. The absence of such a description or a notation of a clear lens or pseudophakic eyes was regarded as glaucoma without cataract, whereas data indicating the presence of any type or stage of cataract classified the eyes as having glaucoma plus cataract. After exclusion of eyes according to these criteria, 127 tests of 127 eyes with glaucoma and 68 tests of 68 eyes with concomitant glaucoma and cataract remained.
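The sampling rule described above (alphabetical directory, every fifth patient, one random test per patient, never the first test) can be sketched as follows. This is an illustration only: the directory structure, the function name `select_fields`, and the choice to start counting at the first patient are assumptions, not details taken from the study.

```python
import random

def select_fields(tests_by_surname, step=5, seed=0):
    """Pick one random non-first test from every `step`-th patient, by surname order.

    `tests_by_surname` is a hypothetical mapping from surname to that patient's
    tests in chronological order.
    """
    rng = random.Random(seed)
    selected = {}
    # ASSUMPTION: counting starts at the first patient in alphabetical order.
    for surname in sorted(tests_by_surname)[::step]:
        later_tests = tests_by_surname[surname][1:]  # exclude the first field test
        if later_tests:                              # skip patients tested only once
            selected[surname] = rng.choice(later_tests)
    return selected

# Toy directory: 20 patients with three tests each.
directory = {
    f"Patient{i:02d}": [f"P{i:02d}-test{k}" for k in range(1, 4)]
    for i in range(20)
}
selected = select_fields(directory)
```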
The mean age of the 127 patients with glaucoma was 75 years, ranging from 40 to 96. MDs ranged from −31.18 to +0.74 dB (Fig. 2B). The group with both glaucoma and cataract averaged 77 years of age, ranging from 51 to 97, and had MDs ranging from −29.99 to −0.12 dB (Fig. 2D). In some eyes, the selected field test results appeared normal, but the disc appeared suspicious or pathologic, and later field tests, not included in the analysis, showed glaucomatous field loss.
Our networks were fully connected feed-forward multilayer perceptrons built using commercial software (Neural Network Toolbox, ver. 4.0 of MatLab; The MathWorks Inc., Natick, MA). The network architecture, consisting of an input layer, two hidden layers, and an output layer, was the same for all sets of input data. There were 74 units in the input layer, each unit corresponding to one test point in the 30-2 test point pattern. The two hidden layers contained 25 and 5 processing elements, respectively. The output layer, one neuron with a logistic transfer function, provided the network’s output: glaucoma or normal.
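The architecture described above (74 inputs, hidden layers of 25 and 5 units, one logistic output) can be sketched in NumPy as a forward pass. This is not the MatLab implementation used in the study: the hidden-layer transfer function (tanh), the random weight initialization, and the convention that outputs at or above the network threshold mean glaucoma are all assumptions made for illustration, and training is omitted.

```python
import numpy as np

LAYER_SIZES = (74, 25, 5, 1)  # input, two hidden layers, output (from the text)

def init_params(rng, sizes=LAYER_SIZES):
    """Small random weights and zero biases for each fully connected layer."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Return one output in (0, 1) per visual field (rows of x)."""
    a = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        a = np.tanh(a @ w + b)        # ASSUMPTION: tanh hidden units
    w, b = params[-1]
    return logistic(a @ w + b)[..., 0]  # logistic output neuron (from the text)

def classify(outputs, network_threshold=0.5):
    """ASSUMPTION: True means 'glaucoma' when output >= network threshold."""
    return outputs >= network_threshold

rng = np.random.default_rng(0)
params = init_params(rng)
fields = rng.normal(size=(4, 74))     # four synthetic 74-point fields
outputs = forward(params, fields)
labels = classify(outputs)
```

With these layer sizes the network has 2011 adjustable weights and biases, which is small relative to modern networks but not negligible relative to a few hundred visual fields.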
ANNs have been suggested as tools for interpretation of automated visual field test results in patients with glaucoma [10, 11]. Other types of machine learning classifiers, such as support vector machines or committee machines, have also been reported to interpret visual fields adequately [12]. In all studies that we have been able to find, however, the classifiers have been trained and tested with unprocessed threshold sensitivities as input. There is no reason to believe that different types of machine learning classifiers would yield different results when different types of input data are compared. We found that using the more refined input data available from a program for computer-assisted interpretation (i.e., Statpac data) could significantly enhance sensitivity and specificity. Pattern Deviation probability scores based on the Pattern Deviation probability maps produced the largest area under the ROC curve, indicating high performance in discrimination between normal and glaucomatous fields.
The improved results obtained when field data were entered as Pattern Deviations are probably explained by the reduced influence of cataract on Pattern Deviations. Both Pattern Deviation numerical displays and probability maps were designed to reduce the effect of media opacities. Pattern Deviation misclassified only 2 normal eyes with cataract, whereas 13 were misclassified when Total Deviation was used. The network was designed to identify the absence or presence of glaucomatous visual field loss. Thus, we included subjects with cataract in the normal group and patients with concomitant cataract and glaucoma in the glaucoma group. We used this approach because cataract frequently occurs in the age groups where glaucoma is most prevalent.
The normal fields obtained in healthy subjects without cataract were randomly selected from a larger multicenter database used for calculation of Statpac normal values and normal limits for SITA fields. We do not believe that this has biased our results. A large database including data from multiple centers is probably more representative of a normal population than a smaller sample collected at one center only. We did not use the full database; 66% of the records were randomly selected for the purpose of this study. We also included normal fields of patients with media opacities in our set of normal fields. The results, as presented in ROC curves, depended considerably more on the network output than on the Statpac normal limits. Further, our purpose was to compare different inputs derived from the same normal and pathologic fields, and the selection of the normal data would not be expected to bias that comparison, as its effects would be equal for all five input types.
The five different ANNs correctly classified most fields; but, as expected, normal eyes with substantial cataract were more often classified correctly by the two Pattern Deviation–based ANNs compared with the Total Deviation and unprocessed threshold ANNs (Fig. 4). In fields with severe damage, Pattern Deviation–based ANNs did not perform as well as ANNs trained with Total Deviation and threshold sensitivities. This was also anticipated, as the Pattern Deviation concept cannot presently be successfully used in end-stage fields [27, 28].
The selection of subjects is crucial when evaluating diagnostic methods. Testing the method only in patients with obvious moderate to severe field defects would give results suggesting better discrimination than would be found in patients with early defects. We randomly selected our glaucoma fields from the directory of tests on the hard disk in one of our perimeters. This resulted in a representative selection of patients with a wide range of visual field defects, including glaucomatous eyes without apparent field loss. With this method, 39% had MDs better than −5 dB and thus could be considered to have mild loss. If only fields with clear-cut reproducible defects were selected, one would expect higher sensitivities for all types of input data. Our selection of fields including a random sample of glaucomatous eyes has advantages, but the selection, in principle, should not be critical when comparing the performance of neural networks that all use different input data from the same normal and glaucomatous visual fields.
Our results suggest that the ability of artificial neural networks to classify visual fields can be further improved if refined input data based on Pattern Deviations are used. Such input data resulted in higher sensitivity and specificity than did raw threshold sensitivity values, probably because of the former’s ability to separate field loss caused by glaucoma from that caused by cataract. Further studies including independent visual field data not used for training of the networks are needed to evaluate a more general applicability of ANNs for classification of visual field test results. Neural networks and other machine classifiers seem to have great potential to become useful clinical tools in the diagnosis of glaucomatous visual field loss, and it may be of value to study the performance of a range of types of data inputs with different machine classifiers.
Supported by Grant K2002-74X-10426-10A from the Swedish Research Council, by the Järnhardt Foundation, and by funds administered by Malmö University Hospital.
Submitted for publication February 10, 2005; revised April 14 and May 12, 2005; accepted July 1, 2005.
Disclosure: B. Bengtsson, None; D. Bizios, None; A. Heijl, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Boel Bengtsson, Department of Ophthalmology, Malmö University Hospital, Lund University, SE-205 02 Malmö, Sweden; boel.bengtsson@oftal.mas.lu.se.
Table 1. Performance of Neural Network in Classifying Standard Automated Perimetric Visual Fields, using Different Input Data
| Input data | Pattern Deviation, Prob. Scores | | Pattern Deviation, dB | | Threshold Sensitivity, dB | | Total Deviation, Prob. Scores | | Total Deviation, dB | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Network threshold | 0.50 | 0.30 (best) | 0.50 | 0.37 (best) | 0.50 | 0.43 (best) | 0.50 | 0.42 (best) | 0.50 | 0.47 (best) |
| Sensitivity (%) | 89.7 | 93.9 | 86.7 | 90.8 | 81.5 | 85.1 | 79.5 | 82.1 | 79.5 | 80.5 |
| Specificity (%) | 97.6 | 96.5 | 98.0 | 94.9 | 95.3 | 91.3 | 94.9 | 93.3 | 94.9 | 94.9 |
| Area under ROC curve | 0.988* | | 0.980† | | 0.960 | | 0.943 | | 0.942 | |
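The quantities reported in Table 1 can be illustrated with a minimal Python sketch that computes sensitivity and specificity at a chosen network threshold and the area under the ROC curve from network outputs. The outputs and labels below are invented toy values, not study data; the helper names are hypothetical, and the area is computed with the rank-based (Mann-Whitney) formulation, which is an assumption about method rather than a description of the software used in the study.

```python
import numpy as np

def sens_spec(outputs, is_glaucoma, network_threshold):
    """Sensitivity and specificity when outputs >= threshold are called glaucoma."""
    outputs = np.asarray(outputs, dtype=float)
    is_glaucoma = np.asarray(is_glaucoma, dtype=bool)
    flagged = outputs >= network_threshold
    sensitivity = flagged[is_glaucoma].mean()        # flagged fraction of glaucoma fields
    specificity = (~flagged[~is_glaucoma]).mean()    # unflagged fraction of normal fields
    return sensitivity, specificity

def roc_auc(outputs, is_glaucoma):
    """Probability that a glaucoma field scores higher than a normal field (ties count 0.5)."""
    outputs = np.asarray(outputs, dtype=float)
    is_glaucoma = np.asarray(is_glaucoma, dtype=bool)
    diff = outputs[is_glaucoma][:, None] - outputs[~is_glaucoma][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Toy outputs: four glaucoma fields followed by four normal fields.
outputs = np.array([0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.1, 0.05])
is_glaucoma = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

sens_050, spec_050 = sens_spec(outputs, is_glaucoma, 0.50)
sens_030, spec_030 = sens_spec(outputs, is_glaucoma, 0.30)
auc = roc_auc(outputs, is_glaucoma)
```

In the toy data, lowering the network threshold from 0.50 to 0.30 raises sensitivity without changing specificity, while the area under the ROC curve does not depend on any single threshold.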
The authors thank Ola Engwall, MSc (Lund, Sweden), for performing the necessary programming in the MatLab environment.
1. Heijl A, Lindgren G, Olsson J. A package for statistical analysis of computerized fields. Doc Ophthalmol Proc Ser. 1987;49:153–168.
2. Heijl A, Bengtsson B. Early visual field defects in glaucoma: a study of eyes developing field loss. In: Bucci MG, ed. Glaucoma: Decision Making in Therapy. Milan, Italy: Springer Verlag; 1996:75–78.
3. Bengtsson B, Lindgren A, Heijl A, Lindgren G, Åsman P, Patella VM. Perimetric probability maps to separate change caused by glaucoma from that caused by cataract. Acta Ophthalmol Scand. 1997;75:184–188.
4. Johnson CA, Wall M, Fingeret M, Lalle P. A Primer for Frequency Doubling Technology. Skaneateles, NY: Welch-Allyn Inc.; 1998:9–11.
5. Johnson CA, Adams AJ, Casson EJ, Brandt JD. Blue-on-yellow perimetry can predict the development of glaucomatous visual field loss. Arch Ophthalmol. 1993;111:645–650.
6. Heijl A, Patella VM. Essential Perimetry: The Field Analyzer Primer. 3rd ed. Dublin, CA: Carl Zeiss Meditec, Inc.; 2002:37.
7. Åsman P, Heijl A. Glaucoma Hemifield Test: automated visual field evaluation. Arch Ophthalmol. 1992;110:812–819.
8. Katz J, Sommer A, Gaasterland DE, Anderson DR. Comparison of analytic algorithms for detecting glaucomatous visual field loss. Arch Ophthalmol. 1991;109:1684–1689.
9. Mutlukan E, Keating D. Visual field interpretation with a personal computer based neural network. Eye. 1994;8:321–323.
10. Goldbaum MH, Sample PA, White H, et al. Interpretation of automated perimetry for glaucoma by neural network. Invest Ophthalmol Vis Sci. 1994;35:3362–3373.
11. Lietman T, Eng J, Katz J, Quigley HA. Neural networks for visual field analysis: how do they compare with other algorithms? J Glaucoma. 1999;8:77–80.
12. Goldbaum MH, Sample PA, Chan K, et al. Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry. Invest Ophthalmol Vis Sci. 2002;43:162–169.
13. Harrington DO, Drake MW. Computerized perimeters. In: The Visual Fields: Text and Atlas of Clinical Perimetry. 6th ed. St. Louis: CV Mosby; 1990:54–55.
14. Chauhan BC, Drance SM, Lai C. A cluster analysis for threshold perimetry. Graefes Arch Clin Exp Ophthalmol. 1989;227:216–220.
15. Åsman P, Heijl A, Olsson J, Rootzen H. Spatial analyses of glaucomatous visual fields; a comparison with traditional visual field indices. Acta Ophthalmol Scand. 1992;70:679–686.
16. Brigatti L, Hoffman D, Caprioli J. Neural networks to identify glaucoma with structural and functional measurements. Am J Ophthalmol. 1996;121:511–521.
17. Katz J, Sommer A. Reliability indexes of automated perimetric tests. Arch Ophthalmol. 1988;106:1252–1254.
18. Bengtsson B, Heijl A. False-negative responses in glaucoma perimetry: indicators of patient performance or test reliability? Invest Ophthalmol Vis Sci. 2000;41:2201–2204.
19. Heijl A, Lindgren G, Olsson J. The effect of perimetric experience in normal subjects. Arch Ophthalmol. 1989;107:81–86.
20. Wild JM, Dengler-Harles M, Searle AE, O’Neill EC, Crews SJ. The influence of the learning effect on automated perimetry in patients with suspected glaucoma. Acta Ophthalmol Scand. 1989;67:537–545.
21. Heijl A, Bengtsson B. The effect of perimetric experience in patients with glaucoma. Arch Ophthalmol. 1996;114:19–22.
22. Bengtsson B, Heijl A. Inter-subject variability and normal limits of the SITA Standard, SITA Fast, and the Humphrey Full Threshold computerized perimetry strategies, SITA STATPAC. Acta Ophthalmol Scand. 1999;77:125–129.
23. Møller MF. A scaled conjugate gradient algorithm for fast supervised training. Neural Networks. 1993;6:525–533.
24. Stone M. Cross-validation choice and assessment of statistical predictions. J R Stat Soc. 1974;B36:111–147.
25. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240:1285–1293.
26. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
27. Heijl A, Patella VM. Essential Perimetry: The Field Analyzer Primer. 3rd ed. Dublin, CA: Carl Zeiss Meditec, Inc.; 2002:50.
28. Blumenthal E, Sapir-Pichhadze R. Misleading statistical calculations in far-advanced glaucomatous visual field loss. Ophthalmology. 2003;110:196–200.