purpose. To compare the baseline Collaborative Initial Glaucoma Treatment Study (CIGTS) visual field (VF) score and mean deviation (MD), investigate test–retest variability, and identify variables associated with VF loss and VF measurement variability.

methods. Baseline data from a randomized clinical trial of 607 patients with newly diagnosed open-angle glaucoma were collected at 14 clinical centers. The CIGTS VF score and MD were obtained from 24-2 VF tests (Zeiss-Humphrey Systems, Dublin, CA) at two visits approximately 2 weeks apart.

results. Although most baseline CIGTS VF scores showed limited field loss, 15% (91/607) of patients showed a substantial deficit (VF score >10 on a 0–20 scale). A small but significant learning effect was seen over the two baseline measures for CIGTS VF score and MD. CIGTS VF score and MD correlate highly (*r =* −0.93); both have high test–retest correlation (0.83 and 0.91, respectively). Variables associated with greater baseline VF loss for both CIGTS VF score and MD include (probabilities for VF only): male sex (*P* = 0.018), black race (*P* ≤ 0.0001), lower visual acuity (*P* ≤ 0.0001), higher intraocular pressure if more than 30 mm Hg (*P* = 0.0034), poor field reliability score (*P* ≤ 0.0001), cardiovascular disease (*P* = 0.015), reduced patient-reported alertness (*P* = 0.023), and CIGTS clinical center (*P* ≤ 0.0001). Predictors of increased CIGTS VF score variability include a midrange VF score (*P* ≤ 0.0001), first-tested eye (*P* = 0.0027), reduced patient-reported alertness (*P* = 0.0177), increasing age (*P* = 0.0040), current smoker (*P* = 0.0014), and CIGTS clinical center (*P* = 0.0215).

conclusions. The CIGTS VF score provides a measure of VF strikingly similar to the MD. Variables associated with VF loss and VF variability may help identify patients who need greater clinical scrutiny.

^{ 1 }

^{ 2 }

^{ 3 }

^{ 4 }

^{ 5 }

^{ 6 }

^{ 7 }due to factors that include fatigue of the patient, learning effects, visual artifacts, measurement error, and perhaps inherent variability of the VF itself.

^{ 8 } administered by telephone. Once eligibility was established, written informed consent was obtained. These procedures followed the tenets of the Declaration of Helsinki and were approved by the University of Michigan Institutional Review Board (IRB) as well as by the IRB at each of 14 clinical centers. Details of the CIGTS study design, eligibility criteria, and patient baseline characteristics are given in Musch et al.^{ 9 } Before randomization, the clinician chose one eye (usually the more severely affected) as the first eye to be treated under the randomized treatment assignment. Only the baseline data for the studied eye are presented.

^{ 10 }were followed in developing the protocol.

^{ 11 } The proprietary distributions are built into the VF test software and are not available for inspection. The probability at each of the 52 points is reported as no defect, *P* ≤ 0.05, *P* ≤ 0.02, *P* ≤ 0.01, or *P* ≤ 0.005, meaning that the measured value at that point was at or below the respective percentile of the age-specific empiric distribution at that position of the field for normal subjects. Because artifacts may result in isolated points of defect in the field, we counted only defects forming clusters, as described later.

*P* ≤ 0.05 is given a weight of zero. For example, a point at *P* ≤ 0.01 with only two neighboring points of defect, both at *P* ≤ 0.05, would receive a weight of 1. The weights for all 52 points in the field are summed, resulting in a value between 0 and 208 (52 × 4). The sum is then scaled to a range of 0 to 20 (dividing the sum by 10.4), to yield values in the same range as the VF score previously developed by the Advanced Glaucoma Intervention Study (AGIS).^{ 1 } The resultant score is a nearly continuous measure of VF loss. An illustration of a CIGTS VF score calculation from a hypothetical Humphrey VF deviation plot is given in Figure 1.
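The weighting and scaling steps described above can be sketched in Python. This is a minimal illustration, not the study software. The level mapping (1 through 4 for *P* ≤ 0.05 through *P* ≤ 0.005) and the rule that a point's weight is the shallowest level among the point and its two deepest defective neighbors are assumptions inferred from the worked example in the text, and the function names are hypothetical. The layout of neighboring points in the 24-2 field is not encoded; the caller supplies each point's neighbors.

```python
# Assumed depth levels for the total deviation probability categories.
# None means "no defect"; deeper defects get larger levels.
LEVEL = {None: 0, 0.05: 1, 0.02: 2, 0.01: 3, 0.005: 4}


def point_weight(p, neighbor_ps):
    """Weight (0-4) for one field point.

    p           -- probability category of the point
    neighbor_ps -- probability categories of its neighboring points
    Assumption: the weight is the minimum level among the point and its two
    most defective neighbors, and 0 if fewer than two neighbors are defective.
    """
    own = LEVEL[p]
    defective = sorted(
        (LEVEL[q] for q in neighbor_ps if LEVEL[q] > 0), reverse=True
    )
    if own == 0 or len(defective) < 2:
        return 0
    return min(own, defective[0], defective[1])


def cigts_vf_score(weights):
    """Sum the 52 point weights (0 to 208) and rescale to the 0 to 20 range."""
    if len(weights) != 52:
        raise ValueError("expected weights for 52 field points")
    return sum(weights) / 10.4


# Example from the text: a point at P <= 0.01 whose only defective
# neighbors are two points at P <= 0.05 receives a weight of 1.
example_weight = point_weight(0.01, [0.05, 0.05, None])
```

A field with every point at weight 4 sums to 208 and therefore maps to the top of the scale, whereas a field with no clustered defects scores 0.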

^{ 12 } and are also described in Mills.^{ 13 }

^{ 8 }a widely used measure of functional health status. The SIP provides 12 category subscales, including a 10-item alertness behavior scale. Examples of items in this scale are “I do not keep my attention on any activity for long” and “I react slowly to things that are said or done.” Each item receives a yes/no answer. The alertness behavior score used in the study is simply the number of items endorsed, and higher scores therefore represent more problems with attention-related behavior. Because the Humphrey VF test requires steady concentration for periods up to half an hour, difficulty with alertness may contribute to a poor VF score.

*t*-tests were used to assess learning effects. Variables associated with both baseline CIGTS VF scores and MD were investigated with regression analyses, using a nonautomated step-down procedure for variable selection. CIGTS clinical center was considered as a random effect in all models. All analyses were performed with SAS software^{ 14 } (SAS, Cary, NC), with SAS Proc Mixed used for the regressions. CIGTS VF and MD variability were measured as the absolute difference between the first and second baseline values. Predictors of variability were investigated by using regression models, with square root transformations used in both cases to reduce skewness of the outcome measures. Because few study participants self-identified as of a race other than black or white, all other races were grouped with whites for analysis. A significance level of 0.05 was used throughout.
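The variability outcome described above (absolute difference between the two baseline values, then a square root transformation) can be sketched as follows. The function name and the example pairs are hypothetical and are not study data.

```python
import math


def variability_outcome(first, second):
    """Return (absolute difference, square root of absolute difference)
    for one patient's first and second baseline values."""
    abs_diff = abs(first - second)
    return abs_diff, math.sqrt(abs_diff)


# Hypothetical (first, second) baseline VF scores for three patients.
pairs = [(4.0, 4.0), (6.5, 5.5), (12.0, 8.0)]
outcomes = [variability_outcome(a, b) for a, b in pairs]
```

The square root compresses the long right tail of the absolute differences, which is why it is a common choice for a skewed, non-negative outcome.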

*r* = 0.30, *P* < 0.0001). Within the average CIGTS baseline VF, the distribution of defects was as follows: 57% of points had no defect, 12% had a defect at *P* ≤ 0.05, 7% had *P* ≤ 0.02, 6% had *P* ≤ 0.01, and 18% had *P* ≤ 0.005. Approximately two thirds of patients (66%) had evidence of a central VF defect, as measured by the central four points of the Humphrey field. By the Humphrey glaucoma hemifield test, 9% of CIGTS patients were scored borderline and 70% were outside normal limits (indicating glaucoma), with the remaining 21% within normal limits (meeting CIGTS eligibility criteria with elevated IOP and a glaucomatous optic disc).

*r* = −0.93; negative because of opposite scaling), and fairly high correlations with the other three measures (*r* = 0.64–0.75).

*P* = 0.0067 by paired *t*-test, Table 1). Although evidence of a learning effect was present (the second scores are lower than the first on average, indicating less defect), the magnitude of the effect is small. For MD, the first and second baseline scores were −5.6 and −5.3, respectively, for a difference of −0.26 (representing improved VF; *P* = 0.0007).
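As a rough check on the size of the learning effect, a paired *t* statistic can be computed from summary values alone. The sketch below uses the rounded CIGTS VF entries from Table 1 (mean difference 0.28, SD 2.6) and assumes that all 607 patients contributed a pair of baseline tests. The *P* value uses a normal approximation to the *t* distribution, so it will not reproduce the reported value exactly. The function name is hypothetical.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic and an approximate two-sided P value.

    The P value uses the standard normal distribution in place of the
    t distribution, which is a reasonable approximation for large n.
    """
    standard_error = sd_diff / math.sqrt(n)
    t_stat = mean_diff / standard_error
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_approx


# Rounded Table 1 values; n = 607 is an assumption about the paired sample.
t_stat, p_approx = paired_t_from_summary(0.28, 2.6, 607)
```

With these rounded inputs the statistic is about 2.65, which is in line with a small but statistically significant shift between the two baseline tests.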

*P* = 0.017), based on a linear regression of the difference in VF scores by the number of days (up to 30) between tests. When we tested again 3 days later, we observed a mean decrease of 0.61 in the VF score. After 20 days, essentially no decrease was observed. A similar pattern was observed for MD (*P* = 0.002), with a learning effect near −0.57 VF units (i.e., improved VF) on day 3, increasing to zero by day 20.

*n* = 61), the third score was between the first two 75% of the time, above the higher score 3% of the time, and below the lower score 21% of the time. The small number of patients (*n* = 2) with the third score higher than either of the first two scores may indicate that one of the first two scores was artificially high for some reason, rather than representing random variation.

*r*= 0.36). The correlation is 0.83 for CIGTS VF scores and 0.91 for MDs. Although the correlation is not necessarily a good measure of test–retest agreement, because it measures the strength of the linear relationship, regardless of any differences between the two measures in location or scale, it can be useful when location or scale differences are negligible. In our case, the small location shifts in the second CIGTS VF or MD measures due to learning effects were negligible for practical purposes. Plots of the differences between visits for CIGTS VF and MD versus their respective averages revealed fairly symmetrical distributions above and below zero, with greater variability near the center for both measures than at the upper or lower ends of the distributions.
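The caveat about correlation as a measure of agreement can be illustrated with a short sketch: adding a constant shift to the second measurement leaves the Pearson correlation unchanged, even though agreement is worse. The data values and function names below are hypothetical, and the difference-versus-average pairs correspond to the kind of plot described in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference_vs_average(x, y):
    """(average, difference) pairs for a difference-versus-average plot."""
    return [((a + b) / 2, a - b) for a, b in zip(x, y)]


# Hypothetical first and second baseline scores.
first = [1.0, 3.5, 6.0, 9.5, 14.0]
second = [1.5, 3.0, 7.0, 9.0, 13.0]
shifted = [value + 2.0 for value in second]  # same pattern, shifted location

r_plain = pearson_r(first, second)
r_shifted = pearson_r(first, shifted)  # equal to r_plain up to rounding
points = difference_vs_average(first, second)
```

Because the correlation ignores the shift, it is informative about agreement only when such location or scale differences are negligible, as the text argues is the case here.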

*df* standard deviation estimates from the first two baseline values from each patient). The pooled standard deviation estimates were 1.8 for CIGTS VF score and 1.4 for MD. These estimates reflect the variability of repeated scores in the same person around the person’s mean.
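One common way to pool within-patient standard deviation estimates from duplicate measurements is sketched below. It is an assumption that the study pooled in exactly this way: each pair contributes a one-degree-of-freedom variance estimate, (first − second)² / 2, and the pooled SD is the square root of the average of those estimates. The function name and example values are hypothetical.

```python
import math


def pooled_sd_from_pairs(pairs):
    """Pooled within-subject SD from (first, second) measurement pairs.

    Each pair gives a variance estimate with one degree of freedom,
    (first - second) ** 2 / 2; the estimates are averaged and the
    square root is taken.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    total_variance = sum((a - b) ** 2 / 2 for a, b in pairs)
    return math.sqrt(total_variance / len(pairs))


# Hypothetical pairs: variance estimates are 2.0 and 0.0, so the pooled SD is 1.0.
example_sd = pooled_sd_from_pairs([(0.0, 2.0), (5.0, 5.0)])
```

As a rough consistency check, applying the same formula to the rounded Table 1 learning-effect summaries, sqrt((2.6² + 0.28²) / 2) ≈ 1.85 for the CIGTS VF score and sqrt((1.9² + 0.26²) / 2) ≈ 1.36 for MD, gives values close to the reported 1.8 and 1.4.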

*P* < 0.0001), although technicians within clinical centers were not significantly different (*P* = 0.48). The proportion of variation explained increased from 16% to 23% after including clinical center in the model. The distribution of center effects (after adjusting for all other effects) was fairly normal with an SD of 1.0 VF unit; the three centers that varied the most from the mean had deviations of 2.0, −1.4, and −1.1 VF units from the adjusted mean of all centers. Although the regression assumption of normally distributed residual errors was not met because of the floor effect in the VF measurements, no transformation of the data could adequately correct the problem.

*P* = 0.0027), increasing age (*P* = 0.0040), current smoking (*P* = 0.0014), an increased (worse) SIP alertness score (*P* = 0.0177), and a fourth-degree polynomial in the VF score itself (i.e., terms included for VF, VF^{2}, VF^{3}, and VF^{4}), reflecting lower variability for scores near zero and 16 and a plateau of constant variability between scores of approximately 3 and 13 (*P* < 0.0001). CIGTS clinical center effects were also significant (*P* = 0.0160). The proportion of variation explained by the model, *R*^{2}, was 39% on the transformed (square root) scale, but only 21% on the original scale of absolute differences.

*P* = 0.0017), increasing age (*P* = 0.0178), current smoking (*P* = 0.0249), and a quadratic polynomial in the MD score itself (*P* = 0.0001), reflecting lower variability near the lower and upper limits of MD, and increased variability in the middle. New predictors of variability in this model included an increased (worse) reliability score (*P* = 0.0079) and high blood pressure (*P* = 0.0176). CIGTS clinical center effects were also significant in this model (*P* = 0.0101). The number of days between baseline VF tests was not significantly associated with variability in either model.

*r* = 0.92), the CIGTS score was on average 1 unit larger than the AGIS score (paired *t*-test *P* = 0.0004), with the difference increasing with the magnitude of the scores.

^{ 1 }

^{ 15 }

^{ 16 } Heijl and Bengtsson^{ 15 } found substantial learning effects in MDs (of 2.8 dB) between first and second VF tests, but no statistically significant effect in the three subsequent tests. They tested only 25 patients, however, so the statistical power was fairly low, and their results are consistent with the small learning effect observed in this study, in which all patients had had at least one VF test before enrollment. The learning effects reported in Heijl and Bengtsson were present at approximately the same magnitude when the first and second tests were separated in time by weeks or even months, with similar results reported by Wild et al.^{ 16 } The smaller learning effects that we observed, after a pre-enrollment VF test, diminished with time between the CIGTS baseline tests, with little or no learning effect left by 20 days after the first baseline test. These findings support the idea that requiring a preliminary VF test within the previous several months is sufficient to minimize substantial learning effects on subsequent VF testing.

*r*= 0.92 in a sample of VF tests from non-CIGTS patients).

*r* = −0.93). A conceptual difference between the CIGTS VF score and the MD is that the MD reflects an average VF depression (in decibels) over the field, and even minor defects can depress the MD score. The CIGTS VF score is based on the probabilities from the total deviation probability plot, and only probabilities of 0.05 or less can potentially increase the score. Because it is based on the total deviation probability plot, the CIGTS VF score has a potential advantage over MD: it may more accurately reflect field loss at points where the age-specific distribution is quite skewed. Also, the CIGTS VF score is not affected until at least three neighboring points are all depressed below the 5th percentile, potentially avoiding artifactual VF depressions at one or two points. The CIGTS VF score is somewhat less reproducible than the MD when comparing the first and second baseline measures. We conclude that the CIGTS VF score and the MD are quite comparable. The behavior of both scores over follow-up will provide a more complete basis of comparison.

^{ 17 }

^{ 11 } The point near central vision (*x* = 3°, *y* = 3°) had a symmetric and reasonably Gaussian distribution of decibel values. The “*P* < 0.01” percentile was at approximately −5 dB. The distribution for a somewhat more peripheral point (*x* = 3°, *y* = 15°) was slightly more skewed, with larger variance and the “*P* < 0.01” percentile at approximately −13 dB. The most peripheral point (*x* = 3°, *y* = 27°) had a highly skewed distribution, with the largest variance and the “*P* < 0.01” percentile at approximately −22 dB. Clearly, the age-adjusted distribution at each point should be considered in any measure of VF loss. Averaging the actual estimated percentile data would be an improvement over both the MD and the CIGTS VF score. However, the fact that the CIGTS VF score and the MD, calculated in such different ways, correlated so highly, gives some assurance that neither is far off the mark.

^{ 18 }

^{ 4 }

^{ 19 } sex, race, visual acuity, and IOP^{ 20 }). Although cardiovascular disease has not been explicitly associated with VF loss, VF loss has been associated with some risk factors for cardiovascular disease.

^{ 20 }

^{ 21 } Street et al.^{ 22 } reported a weak association of atherosclerotic disease with visually significant cataract requiring surgery in Medicare beneficiaries. Vogel et al.^{ 20 } reported a significant correlation between their measure of VF defect and initial IOP (*r* = −0.26, where lower VF scores indicate greater defect; *P* = 0.0001), although the plot they present hints at a threshold rather than a linear relationship. Patients with initial IOPs less than 50 mm Hg had the whole range of VF scores, whereas patients with IOPs over 50 mm Hg had consistently poor baseline VF scores. The effects in our study of race and sex may be due to differential access to medical care or treatment-seeking behavior. Such effects are well documented in the literature.

^{ 23 }

^{ 24 } As in AGIS,^{ 1 } we found no relationship between VF loss and baseline age.

^{ 1 } Predictors of CIGTS VF score (and MD) variability included reliability of the VF score. A similar association between MD variability and reliability score was reported by McMillan et al.^{ 19 } Katz et al.^{ 4 } have noted that patients with glaucoma report greater difficulty in meeting the reliability criteria of the Humphrey software than normal subjects. Our finding that increased VF variability is associated with increased age has been reported previously by Katz and Sommer,^{ 25 } but not Boeglin et al.^{ 2 }

^{ 26 }

^{ 27 }Although patients with clinically significant cataract were excluded from the CIGTS, it is possible that subclinical cataract was more common among the smokers. We have observed in CIGTS follow-up data that cataract formation is associated with worsening VF scores. The denser lens media among smokers may have globally suppressed the VF measurement, lowering sensitivity and increasing variability.

^{ 28 } (decreased MD and sensitivity, and increased PSD, number of stimulus presentations, and false negatives) and use of antihistamines^{ 29 } (higher SF). The observed effect of the subject’s alertness on VF variability could have implications for clinical practice. In patients with alertness problems, the effect may be partially diminished by scheduling the VF testing early in the day and before the clinical examination. The observed increase in variability with the right eye is probably associated with the fact that right eyes were always tested first. By the time the left eye was tested, the patient had settled into the routine of the test and was more consistent.

*P* = 0.0017) and VA (*P* = 0.0005), and even a marginally significant center effect on alertness scores (*P* = 0.0501), where patient population differences are the likely causes. However, the center effects for VF and MD were stronger than those seen for the other effects tested. Because the centers’ VF machines are calibrated regularly, it is unlikely that the clinical center effects represent machine differences. However, other factors related to the setup may have more impact than previously considered. We tested for technician differences within clinical center among patients who were tested by the same technician at both baseline visits and found no significant effect.

**Figure 1.**


Measure | Mean ± SD | Range | Correlation with CIGTS VF Score (95% CI)^{*} | Learning Effects (1st Minus 2nd Baseline Measures) (Mean ± SD)^{†} | Correlation of 1st and 2nd Baseline Measures (95% CI)^{*, †} |
---|---|---|---|---|---|
CIGTS VF Score | 4.9 ± 4.3 | 0.0 to 16.0 | 1.00 | 0.28 ± 2.6^{‡} | 0.83 (0.80 to 0.86) |
MD | −5.5 ± 4.3 | −23.5 to 3.4 | −0.93 (−0.94 to −0.92) | −0.26 ± 1.9^{§} | 0.91 (0.88 to 0.92) |
SF | 2.1 ± 0.7 | 0.8 to 4.7 | 0.64 (0.58 to 0.70) | 0.08 ± 1.0 | 0.36 (0.26 to 0.45) |
PSD | 5.7 ± 3.5 | 1.2 to 17.0 | 0.75 (0.70 to 0.79) | 0.07 ± 1.3 | 0.93 (0.92 to 0.94) |
CPSD | 5.0 ± 3.7 | 0.0 to 16.8 | 0.73 (0.68 to 0.78) | 0.06 ± 1.5 | 0.91 (0.90 to 0.93) |

**Figure 2.**


**Figure 3.**


Variable | Coefficient ± SE | *P* | Direction of Effect |
---|---|---|---|
Reliability score | 2.66 ± 0.48 | 0.0001 | ↑Reliability score (less reliable) ⇒ ↑VF score |
Sex | 0.69 ± 0.33 | 0.0390 | Males ⇒ ↑VF score |
Race | 1.52 ± 0.35 | 0.0001 | Blacks ⇒ ↑VF score |
Visual acuity | −0.15 ± 0.03 | 0.0001 | ↓VA ⇒ ↑VF score |
Cardiovascular disease | 1.06 ± 0.45 | 0.0176 | Cardiovascular disease ⇒ ↑VF score |
Diabetes^{*} | −1.76 ± 0.44 | 0.0001^{*} | Diabetes ⇒ ↓VF score |
IOP ≤30^{*} | −0.18 ± 0.05 | 0.0005^{*} | ↑IOP up to 30 ⇒ ↓VF score |
IOP >30 | 0.16 ± 0.06 | 0.0059 | ↑IOP over 30 ⇒ ↑VF score |
SIP alertness^{†} | 0.24 ± 0.10 | 0.0129 | ↑Alertness score ⇒ ↑VF score |

1. *Ophthalmology* 101, 1445-1455

2. *Am J Ophthalmol* 113, 396-400

3. *Invest Ophthalmol Vis Sci* 41, 3429-3436

4. *Ophthalmology* 98, 70-75

5. *Am J Ophthalmol* 108, 130-135

6. *Invest Ophthalmol Vis Sci* 37, 1419-1428

7. *Invest Ophthalmol Vis Sci* 30, 1083-1089

8. *Ophthalmology* 108, 887-897; discussion 898

9. *Ophthalmology* 106, 653-662

10. *Ophthalmology* 103, 186-189

11. *Arch Ophthalmol* 107, 204-208

12. *Statpac 2 User’s Guide*. Allergan Humphrey, San Leandro, CA.

13. *J Ocul Pharmacol* 7, 89-95

14. *SAS/STAT Users Guide Version 8*. SAS Institute Inc., Cary, NC.

15. *Arch Ophthalmol* 114, 19-22

16. *Acta Ophthalmol (Copenh)* 69, 210-216

17. *Ophthalmology* 106, 391-395

18. *Acta Ophthalmol* 61, 186-194

19. *Acta Ophthalmol* 70, 665-670

20. *Br J Ophthalmol* 74, 3-6

21. *Am J Ophthalmol* 123, 338-346

22. *Arch Ophthalmol* 114, 1407-1411

23. *J Am Med Womens Assoc* 51, 133-136

24. *Am J Public Health* 87, 811-816

25. *Arch Ophthalmol* 105, 1083-1086

26. *JAMA* 268, 994-998

27. *JAMA* 268, 989-993

28. *Jpn J Ophthalmol* 34, 291-297

29. *Perimetry Update 1988/1989*, 439-445. Kugler and Ghedini, Amsterdam.