purpose. To interpret individual results from automated perimeters, a normative database must be developed. Typically, a set of criteria determines those subjects that may be included in the database. This study examined whether a criterion of normal performance on an established perimeter generates a subgroup with supernormal perimetric performance.

methods. The right-eye perimetric results of 100 subjects were analyzed. Subjects had visual acuities of 6/12 or better, no history of eye disease, and normal slit lamp biomicroscopic and ophthalmoscopic examinations. Subjects performed test–retest visual field examinations on a Humphrey Field Analyzer (HFA) 24-2 test (Zeiss Humphrey Systems, Dublin, CA), and on a custom frequency-doubling (FD) perimeter with targets spaced in the same 24-2 pattern.

results. Test–retest correlations (Spearman rank correlation coefficients, *r*_{s}) for mean defect (MD) and pattern SD (PSD) were 0.65 and 0.40 (HFA), and 0.82 and 0.39 (FD perimeter). Three subjects with HFA MDs in the lower 5% had similarly low MDs on retest, whereas no subject was common between the test and retest for the lower 5% of HFA PSD. Correlations between the HFA and FD test results were 0.41 (MD) and 0.05 (PSD). Based on these correlations, the bias introduced into perimetric probability limits was determined by using Monte Carlo simulations.

conclusions. Although a criterion of a normal MD may produce a subpopulation with supernormal perimetric performance, a criterion of a normal PSD is less likely to do so. Also, a criterion on one test type is less likely to create a supernormal group on a different test type. The bias introduced into perimetric probability limits is small.

^{1,2} These criteria may include a negative history of ocular disease, normal findings in slit lamp biomicroscopic and ophthalmoscopic examinations, visual acuity better than a prescribed limit, and a restricted range of refractive errors. It is also possible to establish a perimetric criterion based on a subject's performance on an established clinical perimeter^{2-5}—for example, mean defect (MD) and pattern SD (PSD) within the 95% limits of normality, on the Humphrey Visual Field Analyzer (HFA; Carl Zeiss Meditec, Inc., Dublin, CA), where the index MD provides a measure of uniform loss or loss involving a large fraction of the visual field, and PSD provides a measure of local irregularity. Such perimetric criteria may be useful in detecting visual pathway disease not manifest on ophthalmoscopy (e.g., vascular^{6} and compressive^{7} lesions), or early ocular disease in which the ocular fundus is not frankly abnormal (e.g., early glaucoma). Both the original HFA analysis package^{8} and the newer Swedish interactive test algorithm (SITA) (both achromatic^{9} and short-wavelength automated perimetry [SWAP]^{5}) are based on analyses of subject groups from which those with abnormal visual field results were excluded. It should be noted, however, that the presence of an abnormal field result does not necessarily mean that eye disease is present. Indeed, 5% of the visual fields of healthy eyes should be judged abnormal when 95% probability limits are used. It is important, therefore, to appreciate that a distinction exists between visual fields from healthy eyes and visual fields that are statistically normal, with the latter being a subset of the former. Although in this study we examined perimetry specifically, the issue of normative databases pervades ophthalmology, particularly in functional testing (e.g., normative limits for contrast sensitivity charts) and in imaging (e.g., normative limits for optic nerve head parameters in retinal tomography). Because of this, it is important to appreciate the factors underlying normative databases, as well as any limitations that may result.

^{10,11} and subject inclusion criteria provide a means through which to do this. Inherent in the use of inclusion criteria for subjects in a study of normal observers is the assumption that a classification of "normal" is equivalent to that of "disease-free." Unfortunately, "normal" and "disease" may be part of a continuum, as in a disease process such as hypertension, thereby making the distinction between the two categories less clear. To avoid making this distinction, it is possible to create a perimetric database without any specified inclusion criteria for subjects. Ignoring the possible effects of unintentional recruitment bias,^{12} the resultant probability limits give the likelihood of a particular index value arising from the population as a whole (i.e., diseased and disease-free observers), rather than from a group of normal observers. Such limits, however, would have a reduced sensitivity for detecting ocular disease, particularly once the prevalence of disease rose above the probability limit defining an abnormal result. Because the prevalence of glaucoma alone rises above 5% (a commonly accepted limit for "normality") in older populations,^{13} the use of a criterion-determined normal population to create perimetric databases is important for maximizing a test's sensitivity for detecting ocular disease.^{12}

Using a perimetric-based criterion raises an interesting question, however: Is it appropriate to use an inclusion criterion (perimetric performance) that is based on the variable for which normal limits are being determined? In particular, what is the meaning of normative probability limits of 2%, 1%, and 0.5% when they are based on a group from which the lowest 5% was removed? For example, using 5% probability levels as a criterion for normality results in a database that contains only the top 95% of normal performers. Furthermore, requiring subjects in the database to have two normal indices (e.g., MD and PSD) at the 5% level makes things worse, resulting in a database of only the top 90% (0.95 squared, assuming complete independence of the indices) of performers. Such a database would produce a high false-positive rate for detecting abnormal visual fields, as the subject group that formed the database had perimetrically "supernormal" performance.
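The "top 90%" arithmetic can be checked with a short simulation. The sketch below is our own illustration with made-up index values, not the study's code: it applies independent 5% criteria to two simulated indices and recovers a retention fraction close to 0.95^{2} = 0.9025.

```python
import numpy as np

# Simulate two independent, normally distributed indices for a large group of
# hypothetical normal observers (values are illustrative, not study data).
rng = np.random.default_rng(42)
n = 100_000
md = rng.standard_normal(n)   # stand-in for the MD index (low = worse)
psd = rng.standard_normal(n)  # stand-in for the PSD index (high = worse)

# Apply a 5% criterion to each index independently.
pass_md = md > np.quantile(md, 0.05)     # exclude the lowest 5% of MD
pass_psd = psd < np.quantile(psd, 0.95)  # exclude the highest 5% of PSD

retained = np.mean(pass_md & pass_psd)
print(f"retained fraction: {retained:.4f}")  # close to 0.95**2 = 0.9025
```

With correlated indices the retained fraction would lie between 0.9025 and 0.95, which is why the independence assumption gives the worst case.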

^{14-16} and is likely to be lower, given the comparatively restricted range of test indices returned by normal observers.^{17} Although it may be expected that good correlation exists among perimeters that have similar test parameters, this may not be true among perimeters designed to measure different visual functions (e.g., frequency-doubling [FD] perimetry^{18} or SWAP^{3}). Previous work has failed to find a significant correlation between the MD index for conventional increment–threshold perimetry and FD perimetry in a group of normal observers, despite the presence of a strong correlation when a similarly sized group of glaucomatous observers was used.^{15}

^{19} Based on our empirical findings, we performed a Monte Carlo investigation of the effects of inclusion criteria on perimetric normative database probability limits.

*P* ≥ 5%). Because of this, our subjects were not naïve perimetric observers for the data presented in this study and so may not demonstrate the same improvement in performance with serial field testing (the "learning effect") expected in a naïve sample. The significance of this is discussed in the following sections.

^{2} A test that is used primarily to monitor patients should not account for a learning effect, however, as it is expected that subjects taking the test either are not perimetrically naïve or are naïve and so may require training to achieve consistent results. We believe that it generally is undesirable to have a database that is influenced by a learning effect and that it is preferable to be aware that some naïve subjects may require training. Our approach is consistent with that used in the development of the Humphrey Visual Field Analyzer, in which only subjects experienced with perimetry were included. As noted by Heijl et al.,^{2} "… [I]f a model of the normal visual field were to be based on subjects without any previous experience in visual field testing, the normal variability would be very large and nonrepresentative for many clinically examined patients."

^{19} with testing spaced over four visits and interspersed with other perimetric tests (not analyzed in this study). We performed customized calculations^{20} of the indices MD and PSD for each eye and for each test type, for both test and retest sessions, using a linear model for the effect of aging^{2,19} and the formulas used by the commercial HFA device.^{21} Both test indices also are available on the newer SITA test algorithm for the HFA perimeter.^{9} Percentile limits were calculated empirically, using linear interpolation. Table 1 shows the distribution of subjects, by age.
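As a concrete illustration of empirical percentile limits with linear interpolation, the sketch below uses NumPy's default interpolating percentile on a small set of made-up MD values (the data and variable names are hypothetical, not from the study).

```python
import numpy as np

# Hypothetical MD values (dB) for ten normal observers; illustrative only.
md_values = np.array([-2.1, -1.4, -0.9, -0.6, -0.3, 0.0, 0.2, 0.5, 0.8, 1.3])

# np.percentile linearly interpolates between order statistics by default:
# the 5th percentile falls at position (n - 1) * 0.05 = 0.45, i.e. 45% of
# the way from the lowest value (-2.1) to the next (-1.4).
lower_5pct_limit = np.percentile(md_values, 5)
print(lower_5pct_limit)  # approximately -1.785
```

An MD below this empirical limit would be flagged at the 5% level in such a database.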

*P* = 0.59 and 0.24, respectively), although test and retest distributions of HFA PSD were significantly different from normal (*P* = 0.03 and *P* < 0.001, respectively). The distributions of these indices are given in Figure 1. Because of these departures from normality, we used Spearman rank correlation coefficients (*r*_{s}) to assess the monotonicity of relationships between tests nonparametrically, with a value of 1 indicating a perfect monotone relationship among ranks.^{22} Correlation analyses typically are inappropriate for comparing test methods, as the level of correlation depends on the range of intersubject variability in the study.^{17} This presents significant problems when assessing correlation between perimetric indices in ocular disease, because the correlation obtained depends, in part, on the range of disease severity included in the study. In our study of normal subjects, however, the range of intersubject variability is essentially fixed, and so this problem is avoided.
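For readers who wish to reproduce this style of analysis, a rank-based correlation can be computed as below. This is a minimal sketch with made-up data, not the study's own calculation; ties receive the average of their ranks, as is standard for *r*_{s}.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):           # average ranks within each tie group
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rs(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# Hypothetical test and retest MD values for five observers (illustrative).
test = [-1.2, 0.3, -0.5, 0.9, 0.1]
retest = [-0.8, 0.2, -0.9, 1.1, 0.4]
print(spearman_rs(test, retest))  # 0.8 for these ranks
```

Because only ranks enter the calculation, the coefficient is unaffected by the non-normal index distributions noted above.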

*X* and *Y*; mean = 0, SD = 1) using the following equation:

*Z* = α*X* + (1 − α)*Y*

where α weights the contribution of *X* to the combined distribution *Z* and can vary between 0 and 1. It should be noted that α is identical with neither Pearson's coefficient of determination (*r*^{2}) nor the Spearman rank correlation coefficient (*r*_{s}). The variance of the distribution *Z* is:

σ_{Z}^{2} = α^{2}σ_{X}^{2} + (1 − α)^{2}σ_{Y}^{2}

where σ_{X}^{2} and σ_{Y}^{2} are the variances of distributions *X* and *Y*, respectively. Distribution *Z* could then be normalized to unit SD by dividing each element by the root of this variance:

*Z*(norm) = *Z*/σ_{Z}

*X*, and another set of normally distributed indices, *Z*(norm), with a known correlation between the two sets. We simulated 2000 indices for each of the two distributions *X* and *Z*(norm), using two combined multiplicative congruential random number generators, as implemented by Press et al.,^{23} giving a period of approximately 2.3 × 10^{18}. Serial correlations were removed by using a Bays and Durham shuffle.^{23}
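The construction of correlated index distributions can be sketched as follows. This is our reading of the method, using NumPy's default PCG64 generator in place of the Press et al. combined generator; the weight α = 0.7 is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(1)  # modern PCG64 stands in for the ran2-style generator

n = 2000      # number of simulated indices per distribution
alpha = 0.7   # mixing weight between 0 and 1; illustrative value

# Two independent unit-normal distributions of simulated indices.
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

# Weighted combination: alpha controls how strongly Z tracks X.
Z = alpha * X + (1 - alpha) * Y

# Var(Z) = alpha^2 + (1 - alpha)^2 for unit-variance X and Y;
# dividing by its square root restores SD = 1.
Z_norm = Z / np.sqrt(alpha**2 + (1 - alpha)**2)

# Population correlation of X with Z_norm is alpha / sqrt(alpha^2 + (1-alpha)^2),
# about 0.92 for alpha = 0.7.
print(np.corrcoef(X, Z_norm)[0, 1])
```

Sweeping α from 0 to 1 spans correlations from 0 to 1, which is how a grid of known correlations can be simulated.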

*r*_{s} = 0), screening subjects on an established perimeter has no influence on the probability limits in the new database. When there is perfect correlation between the two tests (*r*_{s} = 1), the 5% and 1% probability limits exclude 9.75% (5% + 0.05 × 95%) and 5.95% (5% + 0.01 × 95%) of a normal population to which no criterion for prior performance on the established perimeter has been applied. Between these two extremes, both the 5% and 1% limits show an accelerating function with *r*_{s}.
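The arithmetic behind these two figures can be written out directly (the function name is ours, for illustration only):

```python
# With perfect correlation, a database built from subjects passing a 5%
# screening criterion excludes the screened-out 5% of an unscreened normal
# population, plus a further fraction p of the remaining 95%.
def excluded_fraction(p, screen=0.05):
    return screen + p * (1 - screen)

print(excluded_fraction(0.05))  # ~0.0975: the 9.75% figure for the 5% limit
print(excluded_fraction(0.01))  # ~0.0595: the 5.95% figure for the 1% limit
```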

*P* < 5% do not represent a constant proportion of an otherwise normal population. Further support for this latter finding comes from the observation that six of our hundred subjects had an abnormal (*P* ≤ 5%) MD and/or PSD for their right and/or left eye on initial testing, as determined by the commercial HFA database, despite having normal (*P* ≥ 5%) indices at a prior screening visit (see the Methods section).

*P* ≥ 5%) MD index on the HFA. The correlation between the FD perimeter and the HFA was 0.41 (95% CI, 0.23–0.57; Fig. 4), suggesting that the 5% limits in our new database would actually exclude approximately 5.9% (95% CI, 5.5%–6.7%) of the normal population not previously screened on the HFA (Fig. 5). Similarly, the 1% limits would exclude approximately 1.5% (95% CI, 1.2%–1.8%) of the normal population. Such shifts are small compared with those expected if there were a perfect correlation between the two tests (9.75% and 5.95% for the 5% and 1% limits, respectively; see the Results section). As the correlation between any new test and the HFA should be less than the autocorrelation (i.e., test–retest correlation) of the HFA, the test–retest correlation of the HFA should set an upper limit on the probability limit shifts expected in practice. We found a test–retest correlation coefficient of 0.65 (95% CI, 0.52–0.76) for the HFA index MD (Fig. 2), which predicts an upper limit of 7.2% (6.4%–7.9%) and 2.0% (1.7%–2.6%) for the 5% and 1% probability limit shifts, respectively (Fig. 5). Given the uniformly lower correlations found for the index PSD, it is likely that a criterion based on PSD will cause probability limit shifts smaller than those expected with a criterion based on the index MD. It is possible that many or all of the described probability limit shifts are smaller than those introduced by inexact modeling of the change in sensitivity with age^{19} or by assuming that the variance of sensitivity distributions is constant with age.^{2}
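A compact version of this kind of shift estimate can be sketched as follows. This is our own Monte Carlo construction, not the study's code: the bivariate-normal generation, sample size, and function name are assumptions for illustration.

```python
import numpy as np

def limit_shift(r, n=200_000, screen=0.05, limit=0.05, seed=0):
    """Estimate the fraction of an unscreened normal population falling below
    the `limit` percentile of a database built only from subjects who passed a
    `screen` criterion on a correlated established test."""
    rng = np.random.default_rng(seed)
    established = rng.standard_normal(n)
    # New-test index with Pearson correlation r to the established index.
    new = r * established + np.sqrt(1 - r**2) * rng.standard_normal(n)
    # Keep only subjects above the screening cutoff on the established test.
    passed = new[established > np.quantile(established, screen)]
    db_limit = np.quantile(passed, limit)  # probability limit from screened database
    return np.mean(new < db_limit)         # exclusion rate in unscreened population

print(limit_shift(0.0))  # ~0.05: zero correlation, no shift
print(limit_shift(1.0))  # ~0.0975: perfect correlation reproduces the 9.75% figure
```

Intermediate correlations (e.g., the 0.41 and 0.65 reported above) give exclusion rates between these two bounds, consistent with the accelerating functions described in the Results section.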

^{15} In particular, variability must be viewed in light of the range of index values encountered clinically. Because of this, a test index may show little or no repeatability among normal observers, but still be a useful diagnostic index, provided diseased observers return index values outside the normal range. For example, we found PSD to have poorer test–retest reliability than MD in normal observers, which could be taken to suggest that PSD would be the poorer choice for detecting early disease. In contrast, though, previous work has found that PSD is superior to MD in detecting glaucomatous visual field damage^{24} and that PSD (in an analogous form, corrected loss variance) is superior to MD for detecting the onset and progression of glaucomatous visual field damage.^{25}

^{20,26} and some of the subjects initially falling outside the normative criteria may become eligible for inclusion in the database. If retesting is allowed, however, it is important that this be noted in the eligibility criteria.

Age Range (y) | Frequency (N = 100)
---|---
25–34 | 7
35–44 | 18
45–54 | 25
55–64 | 13
65–74 | 14
75–84 | 22
85–94 | 1

**Figure 1.**

**Figure 2.**

**Figure 3.**

 | *r*_{s} (95% CI) | *P* | Subjects Common to Lower 5th Percentile (95% CI for proportion; %)
---|---|---|---
Test vs. retest, MD | 0.82 (0.74 to 0.87) | <0.001 | 3 (0.62%–8.5%)
Test vs. retest, PSD | 0.39 (0.21 to 0.55) | <0.001 | 3 (0.62%–8.5%)
Test MD vs. test PSD | −0.30 (−0.48 to −0.11) | 0.002 | 1 (0.03%–5.4%)

**Figure 4.**

**Figure 5.**