The recent study by Åsman et al.^{1} demonstrated the practical limitations of a method commonly used to identify the general height (86th percentile value) of visual field sensitivity. In that paper, outcomes were simulated by substituting, within normal regions of the visual field obtained from a large sample (*n* = 82) of normal observers, a zone of abnormality whose features were derived individually from a large group (*n* = 123) of patients having glaucoma, to yield a synthetic abnormal field. The outcomes derived from this synthetic field were then compared with those of the normal field before corruption. These simulations showed that the presence of a local scotoma results in an underestimate of the general height (overestimate of the mean defect; MD), with the magnitude of error being related to the size of the scotoma (number of involved points). Although the average effect on MD was small (range, −0.2 to −2.3 dB), it produced a substantial corruption of the pattern defect index and its associated probability scales, frustrating the detection of progression. The authors^{1} conclude that improved methods are needed for describing the general height or sensitivity of the visual field. In this article we describe and evaluate two candidate methods that can be applied for such purposes.

Our logic stems from the fact that one of the challenges in clinical science is to identify normal signals given the presence of abnormality or noise. In perimetry, the distribution of outcomes for the dependent variable (in this case decibels) can be described by a probability density function (PDF).^{2} A bell-shaped or unimodal PDF (Fig. 1A) can be summarized with descriptive statistics of central tendency, such as the mean or median, and its spread.^{3} Unfortunately, single-peaked distributions are not typically found in clinical populations.^{4,5,6} Clinical PDFs have long tails or become altered by disease^{7} to show multi-lobed distributions.^{4,5,6} This effect has been demonstrated in patients with primary open-angle glaucoma, optic neuritis, and/or ocular hypertension, as well as persons with normal eyes with reduced sensitivity.^{7} Indeed, in some cases, disease can yield a bi-lobed PDF,^{4,5,6} with few normal values.^{8} These multi-lobed distributions challenge traditional descriptors of central tendency and their ability to summarize normal parts of the visual field. The problem rests in extracting those few remaining normal data points from the abnormal values, because these will assist in monitoring the development of new defects in such eyes.^{9} Moreover, the recent work of Åsman et al.^{1} also shows that such extraction can have a significant bearing on diagnostic capacity.

There are two problems in this process for perimetry: the adoption of traditional statistical descriptors, such as the mean, to derive perimetric indices, and the determination of the general height or *typical* sensitivity of the patient, which is often determined from the 86th percentile value. Although traditional methods have adopted these two different approaches for these applications, we will describe an alternative approach that can yield both outcomes simultaneously.

Automated perimetry adopts conventional statistical indices to summarize data^{10} and express the average sensitivity on an absolute (mean sensitivity) or relative basis (MD).^{11,12} One consequence of averaging over all points is that the mean can return an index that *dilutes* or *misrepresents* local losses^{1,7,12} or yields nonsense data in the presence of a bi-modal distribution, an issue that is not often considered or discussed in the literature. For example, with outcomes of 30, 30, 30, 29, 28, 28, and 28 dB, the mean of 29 dB or the 86th percentile of 30 dB provides a reasonable estimate of this distribution. However, the bi-modal data of 30, 30, 28, 0, 0, 0, and 0 dB return an 86th percentile of 30 dB and a mean of 12.6 dB, which fails to represent either the defective (0 dB) or normal (30 and 28 dB) values: a *meaningless* mean. In the latter example, we propose that the normal values (28 and 30 dB) carry the greatest information for detecting new scotomata and that the null values (0 dB) can be considered as outliers, as they are data not representative of the normal state. Even though the 86th percentile succeeded in the example, it can fail in clinical applications,^{1,7} and so the challenge is to produce methods that reliably extract normal data and can summarize these values. In a similar manner, false positives or disease may yield higher than expected thresholds that should also be identified and removed from any calculations, as these will act to corrupt the estimate for diffuse (general) loss as identified by Åsman et al.^{1}

The literature^{13} proposes two procedures that can be used to yield robust means, where "robustness" refers to the capacity to remain unaffected by outliers. The first is called data trimming and the other is termed weighting. For perimetry, robustness is important, given that perimetric outcomes are negatively skewed.^{4,5,6} This skew leads to misrepresentation of the central tendency by a mean, and the average collapses in the presence of far-advanced visual field losses,^{8} so the 86th percentile has been suggested as a more robust indicator of threshold.^{11,12} Although intuitively reasonable, formal testing of the capacity of the 86th percentile to return robust outcomes was lacking until the seminal work of Åsman et al.^{1} Their approach is based on real data and does not cover the full range of factors that can act to corrupt a clinical data set, such as false-positive responses, as these are not common in patients or may have been censored from acceptable outcomes. Some of these issues have been previously canvassed by Turpin et al.,^{14} who propose that the only way to appreciate fully the limitations of a method is with simulation. We agree with their proposal and describe such an approach.

In this article, we compare methods that can be used to enhance the robustness of the estimate of the general height, identify the remaining normal data points by trimming and weighting,^{13} and compare the outcomes from these methods to the 86th percentile estimate of general height. The two approaches being considered can be used to identify *normal* values in the presence of disease or high variability. We demonstrate the benefits of implementing these processes with simulations. We have chosen to use simulations instead of empiric evaluation because the threshold PDF of diseased eyes is not known, and the true endpoint can never be known in clinical data.^{4} However, to ensure meaningful outcomes, our simulation is based on real data sets^{3,5,6} and has been tested by applying the methods to clinical data sets where it is known that the usual summary index (MD) fails.^{8} Finally, our findings should complement those derived from clinical simulations.^{1}

Methods

The problem just described is one of identifying and extracting outliers to return *normal* values that can then be used to yield the general height. In the following, our methods make use of simulated clinical data, so we first define the clinical PDFs used in our simulations and then describe the different methods for extracting outliers (trimming or weighting).

A PDF gives the probability that various threshold values (in decibels) occur across a visual field, being the normalized frequency distribution of the *total* population. Obviously, this can never be determined, and our estimate of the normal PDF (Fig. 1A) is derived from the normal data extracted from 11,400 central (0–30°) thresholds (75 control subjects; ages, 52–85 years), as detailed elsewhere.^{5,6} These can be described by the hyperbolic secant of equation 1,

\[\mathrm{PDF}_{x}=A_{x}/\left[B_{x}e^{-C_{x}(T_{x}-d)}+C_{x}e^{B_{x}(T_{x}-d)}\right], \tag{1}\]

where *d* is the dB level of interest, *T*_{x} is the modal threshold, *A*_{x} its amplitude, *B*_{x} the rising slope, *C*_{x} the falling slope, and *x* specifies the population being considered (normal; see Fig. 1).

The PDF of any clinical population varies, depending on the inclusion criteria of the patient group.^{4,5,6} To control for such variability, we have chosen to develop composite PDFs that were created by polling at a specified rate from a normal PDF, an abnormal PDF, and a false-positive PDF. Composite PDFs were created with equation 2,

\[\mathrm{PDF}_{\mathrm{clinical}}=\mathrm{PDF}_{\mathrm{normal}}+\mathrm{PDF}_{\mathrm{disease}}+\mathrm{PDF}_{\text{false-positive}}, \tag{2}\]

where PDF_{normal} defines the normal distribution (Fig. 1A), PDF_{disease} refers to one of the three disease distributions (Fig. 1B), and PDF_{false-positive} allows for false-positive responses (Fig. 1A). Three abnormal probability density functions (PDF_{disease}; Fig. 1B) were developed to reflect mild (μ_{d1} = 16 dB), moderate (μ_{d2} = 6.9 dB), or severe (μ_{d3} = 0.8 dB) magnitudes of loss and to allow increased variability with decreasing threshold.^{7} The resultant composite PDFs are shown in Figures 1C, 1D, and 1E. The mild condition was chosen so that 25% of the data were polled from the mild distribution of Figure 1C (μ_{d1} = 16 dB), giving an MD of −11.8 dB. The moderate condition had 50% of data polled from the moderate distribution of Figure 1D (μ_{d2} = 6.9 dB), giving an MD of −20.9 dB. The severe condition had 75% of its data polled from the severe distribution of Figure 1E (μ_{d3} = 0.8 dB), giving an MD of −27.5 dB. All the clinical PDFs were coupled to a 10% false-positive rate. We acknowledge that our choice of distributions for moderate and severe represents advanced clinical losses, but they were chosen because current perimetric indices are known to fail with such losses (see Blumenthal and Sapir-Pichhadze^{8}) and they will provide a challenging test of the outlier detection methods.

For completeness, we also simulated a generalized depression in field sensitivity (Fig. 1F), using PDFs with a −6 dB loss (μ_{g6} = 20.5 dB), giving an MD of −6.3 dB, and a −12 dB loss (μ_{g12} = 15.2 dB), giving an MD of −12.9 dB. Again, any reduction in threshold was associated with an increased variability.^{7} In developing the composite PDF for generalized loss, 80% of values were drawn from the distribution having the generalized depression, 10% came from the normal distribution, and the remainder represent false-positive responses.

We also considered the effect that a large false-positive rate (40%) can have on our algorithms. In these cases, the normal-to-abnormal ratio was kept constant. For example, with severe defects, 40% of data were polled from the false-positive distribution and 60% came from the other two distributions in a 15:75 ratio (see previous), meaning that normal data comprised 10% and abnormal data comprised 50% of the composite PDF. False responses were drawn from a Gaussian whose mean was displaced beyond the upper 0.5% (99.5th percentile) limit of the normal data, with little variability (μ_{fp} = 45 ± 1 dB; Fig. 1A). We acknowledge that our approach should not be taken as descriptive of the false-positive PDF, which can never be determined, but we have adopted it because the proximity of our false-positive PDF to the normal PDF should provide the most severe test of the robustness of the outlier method for detecting small departures from normality.

Data trimming^{13} removes outliers to give asymptotic (near constant) outcomes and can be improved with an iterative process. The iteration can cease when it fails to affect outcomes by a reasonable amount, in our case 2 dB, which for our purpose was found to occur by the sixth iteration in all cases. We recognize that this criterion can vary according to the precision needed in a particular setting. The challenge for the trimming process is to define the window that can be used to identify normal data.

Most statistical approaches recommend that the trim profile should be symmetric and located at some statistically meaningful limits, such as 95% confidence limits or ±1.96 SD from the mean.^{13} This approach loses efficiency when dealing with an asymmetric distribution where the mean value will not represent the central tendency. In these cases, an alternative would be to apply the same limits to the median value, being the central point of a skewed distribution. Note that, as the distribution becomes less skewed, the median and mean collapse onto each other, justifying such an approach. However, in cases in which bi-modal PDFs exist, the median also provides a poor representation of the central tendency of the data set. We propose another approach to trimming skewed data, such that the trim profile is determined from the PDF, where its upper limit is set to the upper 95% confidence limit (+1.96 SD) and the lower limit is set to have a common area under the curve (integral). This leads to trim limits of −0.78 and +1.96 standard deviations for the PDF, as in Figure 1A. This window should yield optimal performance with skewed data, especially in the presence of sensitivity losses due to disease.

In terms of our previous example, the bi-modal data set returns a mean of 12.6 dB and an 86th percentile of 30 dB, with an SD of 15.7 dB. Trimming around either of these values using the 95% confidence interval (±1.96 SD; ±30.8 dB) will yield the same outcome, because the large SD places every datum inside the window and nothing is removed. However, applying our asymmetric trim window gives limits of 0.3 dB (12.6 − 0.78 · 15.7) and 43.3 dB (12.6 + 1.96 · 15.7), removing the low outliers and returning a trimmed mean of 29.3 dB (30, 30, 28). Although in this example a robust mean was returned after a single trimming procedure, we find an iterative process is needed before the mean changes by less than 2 dB in most applications (Figs. 2A, 2C).
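The asymmetric trimming procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `trim_once` and `iterative_trimmed_mean` are hypothetical, the sample SD is used, and the stopping rule is read as "stop, and keep the current estimate, once a further trim would move the mean by less than 2 dB". Only the −0.78 and +1.96 SD limits and the 2 dB criterion come from the text.

```python
from statistics import mean, stdev

LOWER_K = 0.78  # lower trim limit in SD units (from the text)
UPPER_K = 1.96  # upper trim limit in SD units (from the text)


def trim_once(values, centre, lower_k=LOWER_K, upper_k=UPPER_K):
    """Keep only values inside [centre - lower_k*SD, centre + upper_k*SD]."""
    sd = stdev(values)  # sample SD; an assumption, the text does not specify
    low, high = centre - lower_k * sd, centre + upper_k * sd
    return [v for v in values if low <= v <= high]


def iterative_trimmed_mean(values, tol_db=2.0, max_iter=20):
    """Trim repeatedly until another trim would shift the mean by < tol_db.

    Returns (estimate, retained_values, number_of_trims_applied).
    """
    kept = list(values)
    estimate = mean(kept)
    trims = 0
    while trims < max_iter and len(kept) >= 2:
        candidate = trim_once(kept, estimate)
        if not candidate:
            break  # defensive guard: never trim away every datum
        candidate_mean = mean(candidate)
        if abs(candidate_mean - estimate) < tol_db:
            break  # estimate is stable to within the criterion
        kept, estimate = candidate, candidate_mean
        trims += 1
    return estimate, kept, trims


# Bi-modal example from the text: one trim removes the 0 dB outliers.
est, kept, trims = iterative_trimmed_mean([30, 30, 28, 0, 0, 0, 0])
print(round(est, 1), kept, trims)  # 29.3 [30, 30, 28] 1
```

On the unimodal example (30, 30, 30, 29, 28, 28, 28 dB) the same function applies no trim and returns the plain mean of 29 dB, because the first candidate trim would move the mean by only 0.75 dB.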

In the example just given, trimming acts by removing *unwanted* data. In contrast, weighting retains the total data set and applies a weighting function, called a tuning constant.^{15} In this case, all data (appropriately weighted) contribute to the determination of the index to return a weighted or *W estimate*. The trimming process detailed previously can be considered as a tuning constant with a binary weight, being either 1 or 0, depending on the displacement of the datum from the mean. The purpose of the tuning constant is to emphasize the desired domain of values, which can reflect normal or defective profiles, depending on the application. In the visual field, weighting may be applied on a regional basis to accentuate patterns of loss, but here we will detail its global application. A related approach has been successfully applied to the extraction of oscillatory potentials in electroretinograms.^{16,17}

The purpose is to define a suitable *tuning* constant. The starting point is to use the *normal* PDF because this emphasizes data that fall within the region of interest and thus de-emphasizes outliers. The literature describes weighted measures as Horvitz-Thompson estimators and, where weighting is based on the PDF, as a Hajek estimator.^{15} Determination of the Hajek estimate (*M*_{w}) is given by equation 3,

\[M_{w}=\frac{\sum_{i=1}^{n}w_{i}x_{i}}{\sum_{i=1}^{n}w_{i}}, \tag{3}\]

where *w*_{i} is the weight determined from the PDF for the particular dB level *x*_{i}, and *n* is the number of points in the field. We consider a tuning constant that returns a Hajek estimate (normal PDF) as well as power functions of this tuning constant (i.e., *w*_{i}^{2}).

We evaluate the accuracy of the trimming and weighting algorithms by comparing their estimates to the modal level or the 86th percentile of the normal PDF. We also compare the capacity of the various methods to extract *normal* data and to return sensible estimates in clinical cases in which the general height and MD have been shown to fail.^{1,8}

Results

Our data show that the Hajek estimate provides a robust outcome of perimetric data except for extremely large defects, for which the quadratic of the weighting function performs better. The outcomes of the algorithms are compared in Figure 2. Figures 2A and 2C show the mean returned from a trimming process (in decibels) as a function of the iteration step, whereas Figures 2B and 2D show the W estimate (weighted mean) for various tuning constants. The leftmost symbols in each panel identify the mean for the entire data set, as might be specified by many modern perimeters. As expected, this mean value gets progressively reduced with increasing defect severity. Figures 2A and 2C refer to the trimming procedure, with Figure 2A showing the outcome for the three clinical PDFs and Figure 2C giving the results for generalized depressions in sensitivity and high false-positive (40%) responses. Figure 2A shows that trimming reaches the ≤2 dB criterion by two to four iterations (Fig. 2, asterisk) in mild (large open circles), moderate (filled circles), and severe (small open circles) defects. However, it is only in mild defects that the procedure returns a robust estimate of the true threshold (solid horizontal line). Figure 2C shows that, in the presence of 40% false-positive responses (shaded squares), trimming fails to give the desired outcome, as the procedure asymptotes onto the false-positive mean of 45 dB; and although trimming rapidly extracts the real threshold with mild generalized depressions, it requires many iterations with severe generalized losses.
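For readers who wish to reproduce the form of the W estimate, equations 1 and 3 can be sketched as follows. The sketch is illustrative only: the function names and the parameter values (*T* = 30 dB, *A* = 1, *B* = 0.3, *C* = 1.0) are assumptions chosen to give a negatively skewed curve with its mode at 30 dB, not the fitted values of the normal PDF, which are detailed elsewhere.^{5,6}

```python
import math


def normal_pdf_weight(d, T=30.0, A=1.0, B=0.3, C=1.0):
    """Hyperbolic-secant form of equation 1, evaluated at dB level d.

    T, A, B and C are hypothetical illustration values, not fitted ones.
    """
    u = T - d
    return A / (B * math.exp(-C * u) + C * math.exp(B * u))


def w_estimate(thresholds, power=1, weight_fn=normal_pdf_weight):
    """Weighted mean of equation 3; power=2 gives the quadratic tuning constant."""
    weights = [weight_fn(x) ** power for x in thresholds]
    weighted_sum = sum(w * x for w, x in zip(weights, thresholds))
    return weighted_sum / sum(weights)


field = [30, 30, 28, 0, 0, 0, 0]       # bi-modal example from the text
hajek = w_estimate(field)              # c^1 tuning constant
quadratic = w_estimate(field, power=2) # c^2 tuning constant
print(round(hajek, 2), round(quadratic, 2))
```

With these assumed parameters both estimates fall between 29 and 30 dB, because the 0 dB values receive weights several orders of magnitude smaller than the values near the mode; a 45 dB false-positive response is down-weighted in the same way. Squaring the weights narrows the effective window further, which is the trade-off between robustness and a limited normal domain.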

The results for the W estimate are shown in Figures 2B and 2D. Figure 2B shows the effect that various tuning constants have on the W estimate. In all cases, except in the severe defect group, the W estimate provides a robust statistic of the underlying mean. However, uncertainty (error) of the estimate is large with moderate (filled circles) and severe (small open circles) cases for the Hajek estimate (c^{1} tuning constant). Weighting with a quadratic tuning constant (c^{2}) yields not only robust estimates but also reasonable errors in these values. Higher-order power functions fail to give any greater advantage beyond the quadratic. The results of the simulations indicate that the W estimate asymptotes onto the 60th percentile value of the normal PDF.

The simulations indicate that the presence of high false-positive rates can modify outcomes, as can severe loss, both of which challenge the adoption of the 86th percentile for this purpose. Indeed, Blumenthal and Sapir-Pichhadze^{8} recognized this limitation in their paper, in which they describe the problem of extracting a summary index in far-advanced glaucomatous field losses.

Discussion

We have detailed a mathematical procedure that can be used to return values unaffected by outliers, typical of diseased eyes. We described how a robust estimate of central tendency can be derived using a weighted mean (Figs. 2B, 2D). The method is superior to that based on visual inspection by the clinician, as it removes any intrinsic biases introduced by subjective assessment. This W index is robust to false-positive responses and at least 12 dB of generalized depression, as might be found with severe cataract or refractive error, and although we have applied the procedure to the entire field, we feel that similar applications on a regional basis may detect false responses better and provide added diagnostic information. Although we have simulated losses in our study, the method has the potential to flag the presence of *supernormal thresholds* as well. Finally, it would be useful to compare the methods described in this communication to those presently being used, to emphasize their benefits, and to consider practical implementations.

Practical Applications

In Figure 3, the candidate methods have been compared to the mean and 86th percentile values of the normal PDF. Figure 3A summarizes our simulation findings for all candidate methods, to show that the trim mean (circles) and the 86th percentile (squares) provide useful indices that become corrupted by the presence of false positives and moderate or severe defects. In comparison, the W indices (Fig. 3A, triangles) provide a robust statistic in all cases except for the Hajek estimate (c^{1} tuning constant) with severe defects. As an aside, one benefit of these procedures is that the normal values identified by these methods can be indicated to clinicians to guide their attention to the unaffected data to detect early change. The simulations imply that the power tuning constant (c^{2}) gives better capacity for very severe losses. In the following, we consider a practical application of these methods, as earlier we argued that the full range of factors used in simulation may not manifest in clinical patients. Figure 3B evaluates the methods in clinical patients with severe losses of visual field (patients 1, 9, and 12 of Blumenthal and Sapir-Pichhadze^{8}). Blumenthal and Sapir-Pichhadze have already reported that the 86th percentile fails to return sensible outcomes,^{8} and Figure 3B confirms their assertion, showing how the mean (leftmost datum) for each of the three patients with glaucoma approaches 0 dB, because of the extent of involvement, thus masking any useful information about the healthier remnant in the visual field. On the other hand, both W estimates (c^{1} and c^{2}) return meaningful and similar outcomes, derived from the thresholds of the few remaining data. These would also afford the clinician a supplementary index with which to evaluate progression by identifying the normal (or abnormal) data. Our practical implementation does not find any advantage for the power tuning constant (c^{2}), which implies that the severity of our simulated losses (Fig. 3A) produced unnatural outcomes.

The Hajek estimate could be easily adopted by current perimeters. Although the simulations show that a quadratic tuning constant (c^{2}) is most robust to the presence of severe defects (Fig. 3A), there is a tradeoff between adopting the higher-order tuning constants for robustness and an overexpression of abnormality, as the kurtotic nature of these higher-order tuning constants yields a limited normal domain. Given the findings that we obtained from clinical data, we favor the adoption of the c^{1} tuning constant (Hajek). The effect that the shape of the tuning constant has on disease detection in clinical patients needs further consideration.

Finally, further to the observation of Åsman et al.^{1} that there is a need for the development of novel and robust methods for data extraction that can be used to return the general height of visual field data, we detail such methods in this communication. As suggested by them, this approach can also be used with other clinical tests, such as optic nerve indices or laboratory assays, to yield robust means in the presence of outliers. Here, some prior knowledge of expected outcomes, as may be provided by a pilot trial, is needed to generate a PDF that can then be applied as the tuning constant. As experimental data are more likely to be homogeneous rather than bi-modal, we propose that a Hajek estimate should be returned, to give robust outcomes of an unbiased data set.

Corresponding author: Algis J. Vingrys, Department of Optometry and Vision Sciences, The University of Melbourne, Victoria, 3010, Australia; algis@unimelb.edu.au.

**Figure 1.**

**Figure 2.**

**Figure 3.**

1. Åsman P, Wild JM, Heijl A. Appearance of the pattern deviation map as a function of change in area of localized field loss. *Invest Ophthalmol Vis Sci*. 2004;45:3099–3106.
2. Lyons RG. *Understanding Digital Signal Processing*. Reading, MA: Addison-Wesley; 1997.
3. King-Smith PE, Grigsby SS, Vingrys AJ, Benes SC, Supowit A. Efficient and unbiased modifications of the QUEST threshold method: theory, simulations, experimental evaluation and practical implementation. *Vis Research*. 1994;34:885–912.
4. Turpin A, McKendrick AM, Johnson CA, Vingrys AJ. Development of efficient threshold strategies for frequency doubling technology perimetry using computer simulation. *Invest Ophthalmol Vis Sci*. 2002;43:322–331.
5. Vingrys AJ, Pianta MJ. Developing a clinical probability density function for automated perimetry. *Aust NZJ Ophthalmol*. 1998;26:S101–S103.
6. Vingrys AJ, Pianta MJ. A new look at threshold estimation algorithms for automated static perimetry. *Optom Vis Sci*. 1999;76:588–595.
7. Henson DB, Chaudry S, Artes PH, Faragher EB, Ansons A. Response variability in the visual field: comparison of optic neuritis, glaucoma, ocular hypertension, and normal eyes. *Invest Ophthalmol Vis Sci*. 2000;41:417–421.
8. Blumenthal EZ, Sapir-Pichhadze R. Misleading statistical calculations in far-advanced glaucomatous visual field loss. *Ophthalmology*. 2003;110:196–200.
9. Corallo G, Gandolfo E. A method for detecting progression in glaucoma patients with deep, localized perimetric defects. *Eur J Ophthalmol*. 2003;13:49–56.
10. Flammer J. The concept of visual field indices. *Graefes Arch Clin Exp Ophthalmol*. 1986;224:389–392.
11. Anderson DR. *Automated Static Perimetry*. St. Louis: Mosby; 1992.
12. Bebie H. Computer techniques of visual field analyses. In: Drance SM, Anderson DR, eds. *Automatic Perimetry in Glaucoma: A Practical Guide*. Orlando, FL: Grune & Stratton, Inc.; 1985:61–174.
13. Fisher LD, van Belle G. Nonparametric, distribution-free and permutation methods: robust procedures. *Biostatistics. A Methodology for the Health Sciences*. New York: J Wiley & Sons; 1993.
14. Turpin A, McKendrick AM, Johnson CA, Vingrys AJ. Performance of efficient test procedures for frequency-doubling technology perimetry in normal and glaucomatous eyes. *Invest Ophthalmol Vis Sci*. 2002;43:709–715.
15. Hulliger B. Simple and robust estimators for sampling. *Proceedings of the Section on Survey Research Methods of the American Statistical Association, 1999*. Alexandria, VA: American Statistical Association; 2000:54–63.
16. Bui BV, Armitage JA, Vingrys AJ. Extraction and modelling of oscillatory potentials. *Doc Ophthalmol*. 2002;104:17–36.
17. Derr PH, Meyer AU, Haupt EJ, Brigell MG. Extraction and modelling of the oscillatory potential: signal conditioning to obtain minimally corrupted oscillatory potentials. *Doc Ophthalmol*. 2002;104:37–55.